Factor graph based detection approach for high-mobility OFDM systems with large FFT modes

In this article, a novel detector design is proposed for orthogonal frequency division multiplexing (OFDM) systems over frequency selective and time varying channels. Namely, we focus on systems with large OFDM symbol lengths where design and complexity constraints have to be taken into account and many of the existing ICI reduction techniques can not be applied. We propose a factor graph (FG) based approach for maximum a posteriori (MAP) symbol detection which exploits the frequency diversity introduced by the ICI in the OFDM symbol. The proposed algorithm provides high diversity orders allowing to outperform the free-ICI performance in high-mobility scenarios with an inherent parallel structure suitable for large OFDM block sizes. The performance of the mentioned near-optimal detection strategy is analyzed over a general bit-interleaved coded modulation (BICM) system applying low-density parity-check (LDPC) codes. The inclusion of pilot symbols is also considered in order to analyze how they assist the detection process.


Introduction
Orthogonal frequency-division multiplexing (OFDM) is one of the most popular modulation schemes suitable for high-data rate wireless communication systems. The fact that the OFDM symbol period is much longer than a data symbol in a single-carrier system, makes it robust against intersymbol interference (ISI) caused by frequency-selective channels. For this reason it has been adopted by most of the last-generation wireless communication systems such as, IEEE's 802. 16 framework, the long-term evolution (LTE) project, or the second generation digital video broadcasting (DVB) specifications. Mobility support is one of the key features of these communication systems, which try to deal with the challenge of enabling mobile broadband services at high vehicular speed. Either the second generation DVB standards, with both terrestrial (DVB-T2) and handheld (DVB-NGH) versions, or the IEEE 802. For conventional OFDM receivers, it is assumed that the channel remains static in an OFDM symbol period. In this case, equalization can be drastically simplified by turning the frequency selective channel into several parallel flatfading channels. However, when the channel varies within an OFDM block, the subcarriers are not longer orthogonal and the system performance drops down severely. In fact, it is well known that one of the main drawbacks of OFDM is its susceptibility to the loss of orthogonality amongst subcarriers due to Doppler frequency shifts (mobile reception) or oscillator offsets, which lead to the so-called intercarrier interference (ICI). However, since it is shown that ICI can be modeled as a near-Gaussian random process that leads to an error floor [1][2][3], the use of advanced forward error correction (FEC) techniques such as low-density parity-check codes (LDPC) or turbo codes allows error-free reception at high-mobility scenarios without any particular reception strategy for ICI compensation.
With the aim of increasing spectral efficiency or enhancing the robustness in single-frequency networks (SFN), http://jwcn.eurasipjournals.com/content/2012/1/331 there is a growing trend toward using large OFDM symbol lengths (which means densely spaced subcarriers) in most of the new wireless communication systems (e.g., 32 K subcarriers in DVB-T2). As the delay spread increases, symbol duration must also be increased in order to maintain a near-flat channel in every subcarrier. Nevertheless, since the ICI power depends on the subcarrier spacing, the effect of time-varying channels becomes critical when large OFDM symbols are used and it is necessary to develop appropriate signal processing techniques to combat the mobility-induced ICI problem. Hence, from an ICI cancelation perspective, the most challenging scenario is a high-mobility scenario employing a large FFT mode.
This article focuses on systems with very large OFDM symbol lengths, and proposes a novel FG approach for ICI compensation based on the Forney observation model [4]. The proposed detector approaches the free-ICI case in high mobility scenarios and presents an intrinsic parallel structure suitable for high speed detection. We also incorporate the turbo principle applied to numerous fields [5], by which means the extrinsic information generated by the decoder is used as a priori information in the detection process. In fact, the receiver becomes a triple iterative scheme where iterations are performed in the detector, in the decoder and between both of them. Moreover, when LDPC codes are employed, a combined FG can be drawn which accomplishes both the detection and decoding process in a fully parallel manner [6], leading to potential hardware or DSP code saving.
The rest of this article is organized as follows. Section 2 gives a brief overview of the state of the art. Section 3 describes the system model in frequency selective channels affected by ICI. Section 4 reviews the principles behind the BP detection both for Forney and Ungerboeck approaches and proposes how to deal with pilot tones in the FG. Section 5 faces up to the complexity analysis. Section 6 analyzes the turbo reception scheme. Section 7 evaluates the proposed detector performance for several mobile scenarios, and finally, Section 8 summarizes the main conclusions of the article.
Notation: We denote the kth element of s by s k , and s k,l the element in the kth row and lth column of the matrix S. The conjugate transpose is denoted as (·) H and {·} refers to the real part of a complex number. I N is the identity matrix and ∝ operator means that two terms are equivalent up to irrelevant additive and multiplicative values.

Background on ICI suppressing
In recent years, a wide range of OFDM receiver designs have been proposed in the literature for time-varying multipath channels. We can make a difference between linear equalizers based on zero-forcing (ZF) [7] or minimum mean-square error criterion [8][9][10], and non-linear equalizers based on ICI cancelation or decision-feedback [11][12][13][14][15][16]. Linear equalization requires the inversion of the frequency channel matrix, which is prohibitively complex for large OFDM symbols. Although many approaches have been proposed for solving this problem [7,8], in general terms, they show poor performance at high-mobility scenarios.
Non-linear equalizers outperform linear approaches at the cost of higher complexity. Since there are similarities between the ICI and the multiuser interference (MUI), well-known interference cancelation techniques such as, successive interference cancelation (SIC) and parallel interference cancelation (PIC) can be directly applied to ICI suppressing. While the former present higher computational complexity and processing delays, the latter suffer from performance loss at high Doppler frequencies. In [17], a combination of the MMSE method and PIC is presented, which outperforms the conventional PIC schemes. Recently, space alternating generalized expectation maximization (SAGE) has been proposed for ICI cancelation [18,19].
Most of the techniques mentioned above, either do not show good performance in high mobility scenarios or they are not suited for large FFT modes in terms of complexity and design constraints. Peng and Ryan suggest a parallel implementation of a two-stage equalizer [27] suitable for large FFT sizes, which has low complexity comparing with other proposals when the OFDM symbol length is very large. Recently, a time domain per sub-block equalizer has been presented [28] to tackle the same problem formulated in this article. One of the drawbacks of this proposal is that, pilot tones, used in most of wireless communication systems can not be used for channel estimation.
Factor graphs (FG) have received the attention of many researchers and have been applied to solve many technical problems in communications [29,30]. Essentially, FGs represent graphically the factorization of a function, and have been shown to be a good alternative to solve complex inference problems. Many recent articles address the use of FG-based algorithms (also known as belief propagation (BP) algorithms) for detection or equalization in channels with memory [31][32][33]. Recently, this framework has been extended to ICI channels: a set of detection algorithms have been presented based on the Ungerboeck approach for MAP symbol detection [34], while in [35], progressive parallel ICI cancelation has been proposed based on FGs. http://jwcn.eurasipjournals.com/content/2012/1/331

System model
We consider a single user bit interleaved coded modulation (BICM)-OFDM system over a frequency selective time varying channel. OFDM modulation divides a data stream with sample period T into N substreams, which are modulated on different subcarriers, equally spaced at a distance of f = 1/(TN). N coded symbols s = [s 1 , . . . , s N ] T , equally likely and chosen from a complex constellation χ are transformed into the time domain applying the inverse discrete Fourier transform (IDFT) rule. A cyclic prefix (CP) of length G is added at the output of the IDFT block. No interblock interference (IBI) is considered since it is assumed that the CP is longer than the channel delay spread.
According to the COST 207 model, the channel taps are considered independent random processes with Rayleigh statistics, and the Doppler frequency F d is modeled with the widely used Jakes' Doppler spectrum. The mobile scenarios considered in this article are characterized by the normalized Doppler frequency f d = F d f . The received signal after CP removal can be expressed in matrix form as [8] where H is a N × N time-domain channel matrix, F stands for the standard N-point DFT matrix, and vectors r and z are the received signal and the additive white Gaussian noise (AWGN) discrete samples, respectively. We assume that z is a complex Gaussian noise vector with zero mean and covariance E zz H = σ 2 n I N . Applying the discrete Fourier transform (DFT) leads to the received signal in frequency domain where H = FHF H is the frequency-domain channel matrix, and z f = Fz is the frequency domain noise vector which has the same statistics as time domain noise z. Note that y = Fr.
In time invariant channels (f d = 0), H is circulant and therefore H becomes diagonal, leading to the low complexity per subcarrier equalization that characterizes OFDM systems. Nevertheless, when the channel is time variant, the off-diagonal elements of H are not longer zero, and the ICI comes up making the implementation of conventional equalizers prohibitively complex. The ICI power P ICI is expressed by [2] where J 0 (·) is the zero-order Bessel function of the first kind. From (3) it is clear that the ICI power is a function of the Doppler frequency, the subcarrier spacing and the OFDM block length, and does not depend on the signal constellation.
Since most of the energy in H is concentrated around the main diagonal, the matrix can be approximated by its banded version, where q defines how many off-diagonals above and below the main diagonal are taken into consideration ( Figure 1). For example, more than 64 % of the energy clusters in the neighboring three subcarriers (q = 1) for f d = 0.13. As a result of increasing the vehicular speed, or the OFDM block size, more and more energy leaks into the off-diagonal elements of H, and it is necessary to define a proper value of q to get a trade-off between performance and computational complexity. For many applications, it is shown that q = 1 is enough to achieve good BER performance in relatively high mobility scenarios. In this article, q = 1 is further assumed.
The proposed BP detector's performance is evaluated considering the 64800 bits-long LDPC code adopted by DVB [36] as channel code. Ideal channel state information (CSI) and perfect synchronization are assumed at the receiver. Nevertheless, we consider the presence of pilot carriers in order to describe how they can assist the detection process. The PP1 DVB-T2 pilot pattern [36] has been adopted for this purpose. The structure of the receiver is shown in Figure 2.

Factor graph & sum-product algorithm
A FG is a bipartite graph G = {V , E} which represents the factorization of a complex function into the product of local functions, where V represents the set of vertices of the graph and E is the set of edges. Let s = {s 1 , . . . , s D } represent a set of variables and f (s) a multivariate function. We consider the factorization of f (s) into K factors: where ψ k ⊆ {s 1 , . . . , s D } is the kth variable subset, and f k (·) is a real-valued function. Variable nodes s k are associated with vertex called variable nodes, factors f k are represented with vertex called function nodes and there is an edge e = s n , f k ∈ E for a factor f k and every variable s n ∈ ϑ f k , being ϑ(v) the set of neighbors of a given node v.
Assuming that f k (ψ k ) is a probability mass function (pmf ), and the graph has no cycles, the application of the sum-product algorithm (SPA) to the FG leads to the exact computation of the marginal pmfs, avoiding the cumbersome evaluation of the joint pmf. Denoting by μ s n →f k (s n ) the message sent from a variable node s n to a factor node f k (ψ k ), and by μ f k →s n (s n ) the message in the opposite direction (log domain is assumed), the message computation in variable and factor nodes, respectively, becomes where ∼{s n } indicates the sum over all variables excluding s n (summary operation). After a fix number of iterations between variable and factor nodes, marginal a posteriori pmfs P s n |y are computed, necessary to evaluate the MAP symbol detection rule given bŷ as follows The order in which the messages are updated in the SPA is defined by the message-passing schedule. Different schedules can be considered, and the performance of the algorithm varies depending on it. In this article, in order to parallelize the algorithm as much as possible, the so-called flooding schedule is adopted: all variable and function nodes pass new messages to their neighbors in each iteration.
If the FG is not cycle-free, the algorithm does not converge to the exact marginal pmfs and it is hard in general to state whether it converges to reasonable distributions or not. Consequently, approximate marginal a posteriori distributions are computed. However, for many applications, it has been shown that loopy FGs provide excellent empirical results and turn out to be a feasible alternative for designing near optimal algorithms [37].

MAP symbol detection based on Forney approach
In this article, we are interested in evaluating the joint a posteriori pmf of the transmitted information sequence s conditioned on an observation sequence y (Figure 1). Based on the Bayesian estimation theory, it is well known that the transmission of a random signal over an AWGNaffected random channel can be modeled as P(s|y) ∝ P(y|s)P(s).
For the system model assumed in this article (2), and assuming q = 1, the ψ variable subset described in (4) is defined as ψ k = s k−1 , s k , s k+1 . Based on the Forney observation model [4], the likelihood function P(y|s) is factorized as follows where implicitly s k = 0 for k ≤ 0 and k > N and Part of the corresponding FG is depicted in Figure 3, where it is clear that the shortest cycle in the graph is length 4 (the graph has girth 4). Although it has been shown that length 4 cycles degrade severely the performance of the algorithm, it is known that in particular cases they do not pose significant problems in terms of BER. Throughout this article, the performance of the proposed BP algorithm is compared with the optimal forward backward (FB) algorithm with the aim of determining the performance loss due to the presence of cycles in the graph. http://jwcn.eurasipjournals.com/content/2012/1/331 Figure 3 Factor Graph for the Forney approach-based MAP symbol detection.

MAP symbol detection based on Ungerboeck approach
Two approaches have been proposed for MAP sequence detection until now: Forney approach [4] and the Ungerboeck approach [38]. The former uses the output of a whitened match filter for computing the branch metrics of the Viterbi algorithm (VA) leading to a Markovian channel model, while the latter can work directly with the output of a filter matched to the received pulse. Both strategies are equivalent and only differ in the expression of the VA branch metrics. Nevertheless, when the MAP symbol detection strategy is adopted, while the Forney observation model can be directly applied, the extension of the Ungerboeck observation model is not trivial. In [39], this problem is solved and it is proven that both models are also equivalent for MAP symbol detection. Based on the keen observation made in [39], a BP detection algorithm based on the Ungerboeck approach [40] was proposed for dispersive channels providing impressive complexity reduction comparing with [33]. Similar low-complexity FGs have been derived for channels with memory in [32]. Recently, a novel BP detector employing the Ungerboeck observation model has been presented [34] for ICI channels which shows a good performance-complexity trade-off at low Doppler frequencies.
Based on the Ungerboeck observation model the likelihood function P(y|s) is factorized as follows where x = H H y and G is the Hermitian matrix defined as G = H H H. The same factorization can be expressed in scalar form as By defining the next functions the likelihood function can be expressed in a compact form as where implicitly s k = 0 for k ≤ 0 and k > N. Part of the FG derived from (16) is depicted in Figure 4, where it can be appreciated that has girth 6. This fact seems indicate that the algorithm based on the Ungerboeck approach may perform better than the algorithm based on the Forney observation model (Figure 3). However, numerical results presented in Section 5 show that for high Doppler frequencies, the performance of the Ungerboeck approach drops down severely. Note from (16) that there is still a fundamental complexity difference with respect to the factorization in (11). The complexity of both approaches will be further analyzed in Section 5.

Pilot-assisted graph-based detection
Many wireless communication systems use training data sequences for channel estimation. For instance, DVB standards consider different pilot patterns in order to adjust the transmission requirements to different channel conditions. Due to the frequency correlation introduced by the ICI, BP detection has to be performed before pilot removal. This fact makes the FG longer since variable and function nodes corresponding to pilot carriers have to be taken into account, but, at the same time, there is known information that we can use in order to assist the detection process. Figure 5 shows how to deal with pilot carriers in the message passing process, where s n and f k are the variable and function nodes corresponding to a given pilot symbol, respectively. On one hand, outgoing messages from f k are more accurate representations of marginal distributions since known information corresponding to s n has been used in (11). On the other hand, messages transmitted by variable node s n are uniform distributions of the corresponding pilot symbol, which are useless for data symbol detection, thus, we can consider variable node s n as an idle node and disconnect from their neighbors. As we can appreciate in Figure 5, as a consequence of disconnecting variable nodes corresponding to pilot symbols, some cycles are straightforwardly removed from the FG. As an example, considering PP1 pilot pattern of DVB-T2 where pilot carriers represent the 8% of the total carriers, 8% of length 4 cycles are directly removed, which yields an additional performance improvement.

Complexity analysis
As it has been mentioned before, the implementation of the SPA is accomplished in log domain. This can be done using the Jacobian logarithm which only requires additions and the evaluation of a non-linear function.
The complexity of the proposed detection algorithm is mainly a function of the following parameters: constellation size M, number of total subcarriers N and the bandwidth parameter q. Table 1 describes the complexity of different algorithms considered in this article. As it can be appreciated, it is obvious the weakness of the proposed algorithm comparing with the Ungerboeck approach based algorithm presented in [34] in terms of complexity, since the Forney approach leads to exponential dependence on the bandwidth parameter q.
We can adopt different strategies to reduce the complexity of the algorithm proposed in this article. Compared with FB algorithms, the complexity reduction is much easier in BP algorithms due to there are no constraints imposed by the Trellis structure. Reduced state sequence detection (RSSD) [41,42] can be an effective method for reducing the complexity with a minor performance loss.

Turbo reception
The turbo principle has been applied to numerous fields beyond the practical decoding of powerful channel codes such as turbo codes and LDPC codes. In general terms, it consists of iterative exchanging of soft information between different blocks in a communications receiver with the aim of improving the system performance. While the original turbo receiver is conceived as two independent processors exchanging extrinsic soft information [5] (classical approach), there is a generalized view of the turbo approach derived from [6] which is described as passing soft decision messages on cyclic graphical models (graphical approach).
Considering that LDPC codes are decoded using BP algorithms, the joint data detection and decoding process can be depicted by a higher-order FG following the recent trend derived from the graphical approach. A triple iterative scheme is proposed where messages are exchanged among internal LDPC decoder nodes (LDPC iterations), internal detector nodes (BP iterations), and between the detector and the decoder (turbo iterations) ( Figure 6). In order to reach a good balance between the latency and system performance, we are strongly interested in analyzing the convergence behavior of the proposed receiver and finding out the best turbo schedule, i.e., how many BP and LDPC iterations perform in each turbo iteration. In this article, this task is faced up by extensive simulation results.

Numerical results
In this section, numerical results show the performance of the proposed BP detector for several mobile scenarios. The next table summarizes the main simulation parameters.
The mobile scenarios considered in this article are characterized by the normalized Doppler frequency. For the simulation parameters described in Table 2, f d = 0.13 corresponds to about 200 km/h of vehicular speed and f d = 0.5 represents the same vehicular speed for the 32 K OFDM mode. 6-taps typical urban (TU6) channel and 6taps rural area (RA6) channel are used for modeling urban and rural environments, respectively. The channels of interest in this article correspond to Doppler frequencies beyond f d = 0.3.

Forney approach versus Ungerboeck approach
The performance of the proposed detector is compared with the Ungerboeck approach-based BP detector presented in [34]. TU6 channel model has been simulated for QPSK constellation. Five iterations have been performed in the detector in both cases, and no turbo reception (no turbo iterations) is considered for simplicity. Min-sum (MS) algorithm is adopted for LDPC decoding with code rate (CR) 2/3. Both BP algorithms are compared with the optimal FB algorithm in order to quantize the performance loss of both BP algorithms with respect to the optimal MAP symbol detection, although it is well known that FB algorithms can not be considered for large FFT sizes due to its intrinsic serial structure. Figure 6 High order factor graph performing joint data detection and decoding following the graphical approach for turbo reception. http://jwcn.eurasipjournals.com/content/2012/1/331 As it is shown in Figure 7, both approaches show almost the same performance for low Doppler frequencies and they are very closed to the optimal FB. Consequently, and taking into account that the Ungerboeck approach is less complex than the proposed Forney approach, it is clear that for low mobile scenarios, the best option is the Ungerboeck approach-based BP detector [34]. However, it turns out that the performance of the Ungerboeck approachbased BP detector drops down severely at high-mobility scenarios. In Figure 8, it is shown that the Ungerboeck approach tends to an error-floor whereas the Forney approach presents a performance loss of about 0.6 dB at BER = 10 −4 with respect to the optimal FB. A detailed observation of the messages into the FG at asymptotic regime (high SNR and high Doppler frequency) indicates that the factorization based on the Ungerboeck observation model (13) collapses when in the presence of very reliable input information factor nodes output uniform distributions. Therefore, the Ungerboeck approach is not a good option for the channels of interest in this article.
On the other hand, we can notice that the performance loss in comparison with the optimal FB is higher for high Doppler frequencies, but even for the highest Doppler frequency considered in this section (f d = 0.4), our proposal is closed to the optimal one. Hence, it is shown that, in this case, short cycles do not prevent from an acceptable performance of the SPA and the BP algorithm provides good ICI compensation.

BP detection versus MAP detection with ICI cancelation
The proposed detector is also compared with the MAP ICI canceler proposed in [27]. This detector presents a fully parallel architecture which accommodates hardware implementation and its complexity is much less than other proposals' in the literature when the OFDM symbol length is very large. As a result, it consists of a good candidate for ICI compensation when very large OFDM symbol lengths are used. In general terms, it is composed of two stages: the first one estimates the transmitted data by means of a Viterbi-like algorithm and the second one performs the MAP detection suppressing the ICI previously reconstructed with the output of stage one. TU6 channel model and QPSK constellation is adopted, 5 BP iterations have been performed in the detector, and again no turbo reception (no turbo iterations) is considered for simplicity.
As it is shown in Figure 9, both schemes are very closed to each other for f d = 0.13. Nevertheless, the estimates produced by the first stage of the MAP ICI canceler degrade severely at high Doppler frequencies, thus it suffers from a performance loss which leads to an error-floor. Hence, although the MAP ICI canceler proposed in [27] is well suited to large FFT sizes from an implementation point of view, it is clearly shown that the proposed BP detector outperforms the MAP ICI canceler in terms of BER performance for the channels of interest in this article.

BER performance analysis for the graphical approach
The performance of the proposed BP detector is evaluated for QPSK and 16QAM constellations and over two channel models. Turbo reception based on the graphical approach presented in Section 6 is considered and 2 BP iterations and 20 LDPC iterations are performed in each of the 5 turbo iterations. MS algorithm and CR= 2/3 are considered for LDPC decoding in QPSK simulations and SPA and CR= 1/2 in 16QAM simulations. Figure 10 shows the system performance for the TU6 channel model. Regarding the QPSK constellation, it can  be appreciated that the BP detector is able to remove the error-floor caused by the ICI even for the highest Doppler frequency considered in this article (f d = 0.5). We can also appreciate that the BER curve shifts to the right side as the Doppler frequency increases. This is caused by the residual ICI due to not considering the bandwidth parameter q higher that 1, which would enhance the BER performance with the cost of much higher complexity. It is considered that q = 1 gives a good trade-off between performance/complexity, since the performance loss comparing with the free-ICI case is about 0.8 dB for f d = 0.5. In the case of 16QAM constellation, the mentioned residual ICI, which can be modeled as a near-Gaussian noise, affects much more to the signal preventing from achieving the free-ICI performance. Still, the BP detector is able to remove the error-floor caused by the Doppler frequency and approach the free-ICI case up to 1.2 dB.  Figure 11 BER performance of the proposed BP detector over RA6 channel model. Turbo reception based on the graphical approach is adopted. QPSK and 16QAM constellations are considered. http://jwcn.eurasipjournals.com/content/2012/1/331 Figure 11 shows the performance of the BP detector for the RA6 channel, where it is shown that the behavior of the receiver changes comparing to the previous case: the BER performance improves significantly as the Doppler frequency increases. The reason behind this behavior is that the maximum frequency diversity order that a BICM-OFDM system can achieve is defined by min(d free , L) [43], where d free refers to the Hamming distance of the channel code and L stands for the number of channel taps. Since the Hamming distance of the LDPC code is very high, the diversity order of the BICM-OFDM system is determined by the total number of channel taps, which is very low for RA6 (6-taps) comparing with TU6 (47-taps).
In the case of RA6 channel, the exploitation of the frequency diversity introduced by the ICI in the BP detector increases the diversity order of the system becoming the BER curve steeper as the Doppler frequency increases. Taking into account that high mobility scenarios usually take place in rural environments, it is concluded that the combination of the BICM-OFDM scheme with the proposed BP detector consists of an excellent solution for high mobility scenarios. For instance, it is shown that, the BP detector achieves free-ICI performance for f d = 0.22 with 16QAM constellation.

Conclusions
A novel FG-based detector has been proposed for wireless mobile OFDM systems which enables exploitation of the frequency diversity available in the received signal affected by ICI. The implementation of this novel scheme for signal detection allows high speed joint detection and decoding providing excellent performance at high mobility scenarios comparing with other ICI compensation schemes in the literature. The performance of the proposed detector has been analyzed over TU6 and RA6 channel models using LDPC codes. Furthermore, the inclusion of pilot symbols has also been considered in order to analyze how they can assist the detection process. Compared to the traditional FB receiver, the proposed BP detector is better suited to large FFT sizes.