- Open Access
Iterative receivers with channel estimation for multi-user MIMO-OFDM: complexity and performance
© Hammarberg et al; licensee Springer. 2012
- Received: 15 May 2011
- Accepted: 1 March 2012
- Published: 1 March 2012
A family of iterative receivers is evaluated in terms of complexity and performance for the case of an uplink multi-user (MU) multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) system. Transmission over block fading channels is considered. The analyzed class of receivers performs channel estimation inside the iterative detection loop, which has been shown to improve estimation performance. As part of our results, we illustrate the ability of this type of receiver to reduce the required number of pilot symbols. A remaining question is which combinations of estimation and detection algorithms provide the best trade-off between performance and complexity. We address this issue by considering MU detectors and channel estimators of varying algorithmic complexity. For MU detection, two algorithms based on parallel interference cancellation (PIC) are considered and compared with the optimal symbol-wise maximum a-posteriori probability (MAP) detector. For channel estimation, an algorithm performing joint minimum-mean-square-error (MMSE) estimation is considered along with a low-complexity replica making use of a Krylov subspace method. An estimator based on the space alternating generalized expectation-maximization (SAGE) algorithm is also considered. Our results show that low-complexity algorithms provide the best trade-off, even though more receiver iterations are needed to reach a desired performance.
- multi-user detection
- iterative receiver
- channel estimation
In future wireless systems, high data rate transmissions need to be supported, requiring larger bandwidths. At the same time, spectral efficiency is becoming increasingly important. A technology that has become popular in recent years, and has found its way into many wireless standards such as LTE, is the use of multiple-input multiple-output (MIMO) antenna systems in combination with orthogonal frequency division multiplexing (OFDM). OFDM is used to efficiently combat inter-symbol interference (ISI), inherent in broadband transmissions, while MIMO is used to improve the spectral efficiency and/or suppress interference.
Introducing multiple users (MU) into such systems creates a MU-MIMO-OFDM system. In the uplink, accurate MU receivers are needed to harvest the available gains. A significant number of algorithms of varying complexity have been proposed for this task, ranging from the simple zero-forcing detector to the high-complexity maximum-likelihood (ML) detector. Overviews are available in the literature.
The degree of channel state information (CSI) available at the receiver plays an important role in the design of the receiver structure. While it is convenient for theoretical investigations to assume that perfect CSI is available, practical receivers need to obtain CSI via, e.g., noisy pilot symbol observations. In the case of a large coherence time, the accuracy of the channel estimate can be made high since many symbols can be dedicated for pilot information without any significant effect on the spectral efficiency. In fast fading environments, or packet-based systems, the number of pilot symbols must, however, be kept small to maintain a reasonable spectral efficiency. To this end, other more sophisticated transceiver structures have been developed [3–5]. These receivers jointly detect the data symbols and estimate the transmission channel, which allows for a lower number of inserted pilot symbols as compared to traditional pilot based transceiver systems. While the prospect of reducing the number of pilot symbols is important, these receivers are of limited utility since they have grossly larger computational complexity than traditional pilot based receivers. This complexity amplifies dramatically if the data is coded.
The discovery of the turbo principle brought radical changes to the entire communication field. It is today understood that highly complex problems, such as jointly detecting coded data and estimating the underlying transmission channel, can be efficiently handled by iteratively solving much simpler sub-problems. In particular, during the last decade there has been a growing interest in iteratively solving the joint coded data detection and channel estimation problem [7–10]. The receiver alternates between decoding the outer error correcting code, performing multi-user detection (MUD), and estimating the transmission channel, in an iterative manner. A theoretical framework has also been presented for this otherwise ad hoc choice of receiver design, strengthening the motivation for it.
Even though iterative algorithms can reduce the complexity of the digital receiver, they may still be of prohibitive complexity in many practical scenarios, manifesting itself in a large chip area and high power consumption. It is therefore important to find low-complexity algorithms that are power efficient and can deliver the performance required to reach high spectral efficiencies.
In the current literature, an impressive number of low-complexity algorithms have been proposed for the different components of an iterative receiver. However, few have studied the trade-off between complexity and performance for the entire receiver, including MUD, channel estimation and channel decoder. In earlier work, we performed a trade-off analysis for an interleave division multiple access (IDMA) system, where a number of channel estimation algorithms were evaluated. Another exception is a study in which the complexity and performance of a set of receiver algorithms for MIMO multi-carrier code division multiple access (MC-CDMA) systems are investigated. In contrast to these works, this article evaluates a family of iterative receivers for an uplink MU MIMO-OFDM system, operating over block fading channels. Furthermore, we have tried to place a greater focus on the convergence properties of the different receiver configurations. The convergence speed is important since more iterations require a larger computational effort. Also worth mentioning is a performance-complexity comparison of receivers for downlink MIMO-OFDM systems. Unlike in our comparison, the investigated receivers do not contain any channel estimator.
A trade-off analysis between complexity and performance is performed for a MU MIMO-OFDM system incorporating iterative channel estimation and MUD. Two popular channel estimation algorithms, one based on expectation maximization (EM) and one performing a joint minimum-mean-square-error (MMSE) estimation of all user channels [8, 9], are evaluated. A low-complexity approximation of the latter, based on a Krylov subspace projection method, is also evaluated. Three popular MUDs are considered: two parallel interference cancellation (PIC) based detectors and one full maximum a-posteriori probability (MAP) detector, the latter being a natural performance benchmark.
In the tradeoff analysis, the total complexity, in terms of complex multiplications, required to reach a given bit error rate (BER) is derived for all algorithm combinations at different signal-to-noise ratios and number of users. The results show that low-complexity schemes are generally providing the best tradeoff.
The convergence properties of the different receiver combinations are presented in terms of BER and mean square estimation error, and through the use of extrinsic information transfer (EXIT) charts. The EXIT charts visualize the exchange of extrinsic information between the outer code and the rest of the receiver incorporating channel estimation and MUD.
The rest of this article is organized as follows. In Section 2, a description of the considered MU-MIMO-OFDM system is given. The algorithms for obtaining the channel estimate are presented in Section 3, and the MUD algorithms in Section 4. In Section 5 the complexity of the algorithms is discussed, and in Section 6 the performance of different algorithm combinations is investigated. A complexity versus performance analysis is performed in Section 7, before the paper is summarized in Section 8.
2.1 MU-MIMO-OFDM system overview
At the receiver, the signal is demodulated into the complex baseband, where an iterative receiver is implemented. The complexity-performance trade-off of this receiver is the focal point of this article. The receiver consists of three blocks: a channel estimator, a MUD, and a bank of soft-input-soft-output (SISO) channel decoders. First, an initial channel estimation is performed, based on the transmitted pilot symbols. This estimate is then used in the MUD to separate the different user streams, which are fed to the SISO decoders after de-interleaving ($\Pi^{-1}$). The outputs of the decoders are then used in the next iteration to update the channel estimate, and to further improve the user separation in the MUD. Multiple iterations are performed in the same way. The different components are described in detail in later sections.
2.2 Input-output relationship of the channel
Next, we turn to the input-output relationship of the channel used in this article. The notation introduced here will also be used for the description of the various algorithms. Furthermore, a low-rank description of the channel, used by the channel estimation algorithms, is also introduced in this section.
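The signal received at subcarrier m and OFDM symbol s is modeled as
$$\mathbf{r}[m,s] = \mathbf{H}[m]\,\mathbf{x}[m,s] + \mathbf{w}[m,s],$$
where $\mathbf{H}[m]$ is the $N \times K$ matrix with elements $h_{n,k}[m]$ describing the channel from the K autonomous users to the N-antenna base-station at subcarrier m. For later use, we define $\mathbf{h}_{:,k}[m] = [h_{1,k}[m], \ldots, h_{N,k}[m]]^T$, and similarly for $\mathbf{h}_{n,:}[m]$ and $\mathbf{h}_{n,k}[:]$. Note that due to the block-fading assumption, the matrix $\mathbf{H}[m]$ does not depend on s. Furthermore, $\mathbf{r}[m,s]$, $\mathbf{x}[m,s]$, and $\mathbf{w}[m,s]$ are column vectors which contain the received signal, the composite transmitted vector from the K users, and the noise vector (complex Gaussian distributed), respectively, at subcarrier m and OFDM symbol s.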
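Collecting the received samples at receive antenna n across all subcarriers of OFDM symbol s gives
$$\mathbf{r}_n[s] = \sum_{k=1}^{K} \mathbf{X}_k[s]\,\mathbf{h}_{n,k}[:] + \mathbf{w}_n[s],$$
where $\mathbf{X}_k[s]$ is a diagonal matrix which contains user k's transmitted data in OFDM symbol s along its diagonal, and $\mathbf{w}_n[s]$ is a vector collecting the noise at receive antenna n across subcarriers.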
All channel estimation algorithms to be evaluated in this article are based on low-rank approximations of the wireless channel. The assumption made is that the channel is limited in the delay domain, and can therefore be accurately represented by a relatively small number of base functions. The optimal set of base functions is known as the discrete prolate spheroidal (DPS) sequences. Their use for low-complexity channel estimation has been proposed previously, and estimators using the same type of base functions have also appeared elsewhere in the literature.
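Stacking all observations, the signal model can be written as
$$\mathbf{r} = \boldsymbol{\Xi}\,\boldsymbol{\psi} + \mathbf{w}, \qquad (3)$$
where $\mathbf{r}$ collects the received signal in all time-frequency positions and at all receive antennas, $\boldsymbol{\Xi}$ is an observation matrix collecting the transmitted symbols and channel base functions, $\boldsymbol{\psi}$ collects the channel coefficients for all users, and $\mathbf{w}$ collects the noise. More explicitly, $\mathbf{r} = (\mathbf{r}^T[1], \ldots, \mathbf{r}^T[S])^T$ and $\mathbf{r}[s] = (\mathbf{r}^T[1,s], \ldots, \mathbf{r}^T[M,s])^T$.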
The DPS base functions are obtained by solving the eigenvalue equation [8, 18, 20] $\mathbf{C}\mathbf{u}_i = \lambda_i \mathbf{u}_i$, where $\mathbf{C}$ is a channel correlation matrix. For later use, the eigenvalues $\lambda_i$ are collected in a vector, $\boldsymbol{\lambda} = [\lambda_1, \ldots, \lambda_I]^T$. For indices beyond $\lceil \tau_{\max} M \rceil + 1$, where $\lceil\cdot\rceil$ denotes the ceiling operation, the eigenvalues are small and can in general be neglected. This value sets a bound on the number of DPS sequences that are needed to represent the channel accurately.
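As an illustration of the eigenvalue problem above, the following minimal sketch builds a correlation matrix and extracts the leading eigenvectors. It assumes a flat delay power spectrum on the normalized interval [0, τmax], which gives a closed-form Hermitian Toeplitz matrix; the function name `dps_basis` and this particular choice of C are assumptions made for the example, not taken from the article.

```python
import numpy as np

def dps_basis(M, tau_max, I):
    """Return the I largest eigenvalues/eigenvectors of an assumed correlation matrix.

    Assumption: flat delay power spectrum on [0, tau_max], so that
    C[m, m'] = integral_0^tau_max exp(-j 2 pi (m - m') tau) d tau.
    """
    d = np.arange(M)[:, None] - np.arange(M)[None, :]
    # Closed form of the integral; np.sinc(x) = sin(pi x) / (pi x).
    C = tau_max * np.sinc(d * tau_max) * np.exp(-1j * np.pi * d * tau_max)
    lam, U = np.linalg.eigh(C)          # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]      # reorder to descending
    return lam[:I], U[:, :I]

M, tau_max = 256, 0.1
essential_dim = int(np.ceil(tau_max * M)) + 1   # bound discussed in the text
lam, U = dps_basis(M, tau_max, I=36)
print("essential dimension:", essential_dim)
print("largest / smallest kept eigenvalue:", lam[0], abs(lam[-1]))
```

With these settings the eigenvalues drop off sharply beyond the essential dimension, which is why a few dozen base functions suffice to represent 256 subcarriers.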
In order to achieve satisfactory detection performance, high-accuracy channel estimates need to be made available at the receiver. A large number of appropriate algorithms have been proposed in the literature. Amongst these, two popular families of algorithms have received a great deal of attention: algorithms performing joint estimation for all users [8, 22, 23], and algorithms based on interference cancellation [15, 24]. In this article, two algorithms from the first family and one from the second are considered. The algorithms make use of the transmitted pilot symbols, as well as decoded data symbols. Thus, they all use the turbo principle to iteratively improve the channel estimate as the reliability of the decoded data symbols increases. Furthermore, the algorithms have in common that they all use the same underlying low-rank channel model, the one given in Section 2.2.
The first algorithm, previously presented for MC-CDMA systems in [8, 25] and later for MIMO-OFDM, performs a joint MMSE estimate of the composite channel matrices H[m] based on the model in (3). The second algorithm, originally presented for MC-CDMA, uses a Krylov subspace method to approximate a costly matrix inverse in the joint MMSE estimator. The third algorithm uses the EM framework and iteratively performs per-user channel estimation, i.e., it estimates the columns of H[m]. We slightly modify the third algorithm by using the improved space alternating generalized expectation-maximization (SAGE) algorithm. The three algorithms are described below.
3.1 Joint MMSE estimator using soft decisions (joint MMSE)
where has the same structure as Ξ, but contains both known pilot symbols and soft estimates of the transmitted data carrying symbols; , with ϑ= (ϑT,..., ϑT[S])T, ϑ[s] = (ϑ[1, s],..., ϑ[M, s])T, , and are either pilots or soft symbol outputs from the decoder, and 1 N is the all-ones column vector of length N. Further, note that , and C ψ is the covariance matrix of the DPS sequences.
Due to the sizes of the matrices involved in (7), the computational complexity can be expected to be significant. The computational burden is significantly decreased, but still large, if the sparsity and regularity of the observation matrix are taken into account. We will elaborate more on this in Section 5.
3.2 Krylov subspace reduced joint MMSE estimator using soft decisions (Krylov MMSE)
As mentioned above, the implementation of the joint MMSE estimator carries a significant computational cost. Multiplication of matrices of large dimensions, along with a costly matrix inversion, adds greatly to the receiver complexity. An approach to reduce these costs has been proposed previously. The algorithm makes use of a Krylov subspace method, more precisely the unconditional conjugate gradient method, to iteratively solve (7). The method iteratively finds the solution $\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$ of the linear equation system $\mathbf{A}\mathbf{x} = \mathbf{b}$, based on an initial guess $\mathbf{x}_0$, by expressing the solution as a sum of terms spanning a Krylov subspace. The number of terms $S_k$ gives the dimensionality of the Krylov subspace, and equals the number of iterations in the algorithm.
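Table 1: Outline of the Krylov subspace projection method
Input: A, b and x0
r = b - A x0, p = r
ρ_1 = r^H r
q = A p
α = ρ_1 / (p^H q)
x = x0 + α p
r = r - α q
for s = 2, ..., S_k (or while ρ_s > ϵ)
  ρ_s = r^H r
  β = ρ_s / ρ_{s-1}
  p = r + β p
  q = A p
  α = ρ_s / (p^H q)
  x = x + α p
  r = r - α q
Output: x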
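The iteration outlined above corresponds to the standard conjugate gradient recursion for a Hermitian positive definite system. The following is a minimal NumPy sketch of that recursion, not the authors' implementation; the function name `conjugate_gradient` and the stopping tolerance `tol` are assumptions introduced for the example.

```python
import numpy as np

def conjugate_gradient(A, b, x0, n_iter, tol=1e-20):
    """Approximately solve A x = b for Hermitian positive definite A.

    n_iter plays the role of the Krylov subspace dimension S_k.
    """
    x = np.array(x0, dtype=complex)
    r = b - A @ x                       # initial residual
    p = r.copy()                        # initial search direction
    rho = np.vdot(r, r).real            # rho = r^H r
    for _ in range(n_iter):
        if rho <= tol:                  # residual already negligible
            break
        q = A @ p                       # the single matrix-vector product per iteration
        alpha = rho / np.vdot(p, q).real
        x = x + alpha * p
        r = r - alpha * q
        rho_new = np.vdot(r, r).real
        p = r + (rho_new / rho) * p     # beta = rho_s / rho_{s-1}
        rho = rho_new
    return x

# Small usage example on a random Hermitian positive definite system.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
A = B.conj().T @ B + np.eye(6)
b = rng.standard_normal(6) + 1j * rng.standard_normal(6)
x_cg = conjugate_gradient(A, b, np.zeros(6), n_iter=20)
```

Each iteration costs one product with A, which is why the number of iterations dominates the complexity of this estimator.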
3.3 SAGE based estimator (SAGE ML)
Even though the Krylov subspace method can significantly reduce the complexity of the joint MMSE estimator, the complexity is still high, since large matrix-vector multiplications are required in each Krylov iteration. A low-complexity alternative, which has shown good performance, is to use an algorithm based on EM/SAGE. In SAGE, given a received signal, the ML solution is iteratively generated based on an underlying subspace model of the data. One such algorithm, producing an optimal low-rank MMSE estimate of the channel, has been presented previously. The details of that algorithm are outlined below, where a conversion from EM to SAGE has been performed.
Initialization: For all k and s(9)
For each iteration i:
In (12), the matrix stems from the low-rank MMSE estimator, and in (13) averaging is performed to make use of the assumption that the channel is static within a block.
The value of X k [s] is only perfectly known at time instances where pilots are transmitted. On all other positions, symbol estimates must be used. The estimates are updated by the SISO decoders in every iteration, using the most recent channel estimate. Here, hard decisions of the decoded soft symbols are used for channel estimation, and soft for interference cancellation.
At the very first receiver iteration, no channel estimate is available. Therefore, the algorithm is initialized with . Furthermore, to improve the accuracy of the initial estimate, several internal iterations can be performed within the estimator itself. This can be seen as the algorithm being reinitialized with its own updated channel estimate, without waiting for updates on the symbol estimates. In this article, this is only performed at the initial pilot based stage, where the gain is observed to be the largest. In later stages, multiple internal iterations are not producing any significant gain, thus mainly adding to the computational complexity.
With estimates of the transmission channel having been made available by the channel estimator, the next stage of the iterative receiver structure is to produce likelihood-ratios of the coded data symbols. This operation is performed by the MUD, which apart from the received signal and channel estimate, uses a-priori information of the transmitted symbols. This information is provided, from the previous iteration, by the channel decoder. The optimal SISO detector is the symbol-wise MAP detector, implemented through the BCJR algorithm . Unfortunately, the complexity of the MAP detector in the MIMO case is prohibitive in most situations, except for the cases when the number of users K is small. Therefore, reduced complexity techniques have to be considered for most practical applications. Furthermore, although optimal detection is not generally feasible in practice, it remains important as a benchmark reference, and will therefore be considered in this article. The principles behind the MAP algorithm are outlined in Section 4.1.
Many reduced complexity detection algorithms have been proposed in the literature . To restrict the investigations, two such algorithms have been selected and are presented in Section 4.2. Both algorithms are based on PIC. The first algorithm applies a matched filter (MF) after the cancellation, while the other applies an MMSE filter, in an attempt to further suppress the inter-user interference. While the latter approach yields better performance it is also more complex. In later sections we shall investigate whether the performance gain motivates the increased complexity.
4.1 Maximum a-posteriori probability
As stated previously, the optimal MUD is the symbol-wise MAP detector. While the PIC-based algorithms, being introduced in Section 4.2, only make use of the mean values , the symbol-wise MAP detector works with the probability mass function of x[m,s], denoted Pa(x[m, s]).
As was discussed above, the complexity of the symbol-wise MAP detector (16) may in many cases be prohibitively large, showing the demand for low complexity schemes.
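To make the exponential growth in the number of users concrete, the sketch below computes symbol-wise MAP extrinsic LLRs at a single subcarrier and OFDM symbol by enumerating all 4^K QPSK hypotheses. It is an illustration under stated assumptions rather than a reproduction of (16): Gray-mapped unit-energy QPSK, Gaussian noise with variance N0 per receive antenna, and the convention LLR = ln P(b=0)/P(b=1). The names `qpsk` and `map_detect` are hypothetical.

```python
import itertools
import numpy as np

def qpsk(bits):
    """Map bit pairs to unit-energy QPSK symbols (assumed Gray mapping)."""
    b = np.asarray(bits).reshape(-1, 2)
    return ((1 - 2 * b[:, 0]) + 1j * (1 - 2 * b[:, 1])) / np.sqrt(2)

def log_sum_exp(v):
    m = v.max()
    return m + np.log(np.sum(np.exp(v - m)))

def map_detect(r, H, N0, La):
    """Extrinsic LLRs for all 2K code bits, given a priori LLRs La."""
    K = H.shape[1]
    hyps = np.array(list(itertools.product((0, 1), repeat=2 * K)))  # 4^K rows
    X = np.array([qpsk(h) for h in hyps])                           # (4^K, K)
    dist = np.sum(np.abs(r[None, :] - X @ H.T) ** 2, axis=1)
    # Log-likelihood plus log-prior (up to a constant common to all hypotheses).
    metric = -dist / N0 + 0.5 * ((1 - 2 * hyps) @ La)
    post = np.array([
        log_sum_exp(metric[hyps[:, i] == 0]) - log_sum_exp(metric[hyps[:, i] == 1])
        for i in range(2 * K)
    ])
    return post - La                    # remove the a priori part

# Noise-free usage example with K = 3 users and N = 4 antennas.
H = np.eye(4, 3, dtype=complex)
tx_bits = np.array([0, 1, 1, 0, 0, 0])
r = H @ qpsk(tx_bits)
llr = map_detect(r, H, N0=0.1, La=np.zeros(6))
```

The hypothesis table has 4^K rows, so each additional user multiplies the work by four.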
4.2 PIC based detectors
where is equal to , except for element k, which is set to zero. A filtering of the signal is then applied to produce an estimate of the transmitted symbol x k [m, s]. A mapping to LLR values then follows.
is the variance of the residual interference plus noise for user k.
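A minimal sketch of the PIC detector with a matched filter is given below for one subcarrier and OFDM symbol. It assumes unit-energy QPSK symbols, so the residual variance of each soft symbol is 1 - |mean|^2, and models the residual interference plus noise as Gaussian. The function name `pic_mf` and the exact normalization are assumptions for illustration and may differ from the expressions used in the article.

```python
import numpy as np

def pic_mf(r, H, x_mean, N0):
    """Parallel interference cancellation followed by a matched filter.

    r: received vector (N,), H: channel estimate (N, K),
    x_mean: soft symbol means from the decoders (K,), N0: noise variance.
    Returns symbol estimates, residual variances and QPSK LLRs (K, 2).
    """
    K = H.shape[1]
    # Residual variance of each soft symbol (assumes unit-energy symbols).
    var_sym = np.clip(1.0 - np.abs(x_mean) ** 2, 0.0, None)
    z = np.empty(K, dtype=complex)
    sigma2 = np.empty(K)
    for k in range(K):
        x_others = np.array(x_mean, dtype=complex)
        x_others[k] = 0                      # do not cancel the user of interest
        r_k = r - H @ x_others               # interference cancellation
        h_k = H[:, k]
        g = np.vdot(h_k, h_k).real
        z[k] = np.vdot(h_k, r_k) / g         # normalized matched filter
        cross = np.abs(H.conj().T @ h_k) ** 2
        cross[k] = 0
        sigma2[k] = (cross @ var_sym + N0 * g) / g ** 2
    # QPSK LLRs under the Gaussian approximation, ln P(b=0)/P(b=1).
    llr = 2 * np.sqrt(2) * np.stack([z.real, z.imag], axis=1) / sigma2[:, None]
    return z, sigma2, llr

# Usage example: with perfect soft symbols and no noise, the symbols are recovered.
rng = np.random.default_rng(1)
H = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
x = np.array([1 + 1j, -1 + 1j, 1 - 1j]) / np.sqrt(2)
z, sigma2, llr = pic_mf(H @ x, H, x, N0=0.1)
```

Replacing the matched filter with a per-user MMSE filter follows the same structure, at the cost of a matrix inverse per symbol.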
When it comes to practical implementations of iterative MU receivers, complexity considerations are of great importance. Since several receiver iterations are generally needed to reach a desired performance, the total computational effort can grow very large. To get an estimate of this cost, we have chosen to present and compare the complexity of the addressed algorithms in terms of the required number of complex-valued multiplications. This measure is chosen since it provides a reasonable estimate of the complexity, while being analytically tractable. Obviously, the final computational and hardware complexity depends on a large number of parameters, such as memory requirements, parallelization, hardware reuse, word lengths, etc.
Table 2: Expressions for the complexity per user for the different receiver components, as well as the required number of complex multiplications per information bit
Total no. of complex mult.*
Mult. per bit
2MNS +2MNL +IN
$MS(3 + K) + KMI(1 + I) + K^2 I^3 + N(MS + 2MI + KI^2)$
$3MS + MSN + 2IMN + C_{Ax}(S_k + 1) + IN(5S_k + 2) + 3NS_k/K$
$C_{Ax} = 3MSN + IN(M + 1)$
$4SM + 3SMN + SMNK + SMK^3 + MNK$
(42M(S - Sp))/3
5.1 Channel estimator complexity
Three different channel estimation algorithms were presented in Section 3, joint MMSE, Krylov MMSE and SAGE ML. As seen in Table 2, the difference in complexity is significant. For the discussions below, we will assume that the number of OFDM symbols in each block is smaller than the number of subcarriers, i.e., S < M.
Looking at the first algorithm, the optimal joint MMSE algorithm, the complexity is large, as previously discussed. Since all user channels are estimated jointly, using all available frequency and time samples, the dimensionality of the problem to solve becomes very large. A straightforward implementation of (7) would be very costly due to the dimensionality of the involved data structures. Fortunately, considerable reductions can be achieved. Firstly, under the assumption of independent receive antenna channels, the same estimator can be used independently on each antenna. Secondly, under the block fading assumption, the observation matrix is the product of a block diagonal matrix and a block matrix with diagonal sub-matrices. Thus, the operations involving this structure can be computed efficiently. It should be noted that under the assumption of independent receive antennas, Ξ is block diagonal with identical sub-matrices. The estimator only involves one of these SM × KI sub-matrices. In the end, the main part of the complexity is related to two operations: forming the matrix product that precedes the inversion, and inverting the resulting KI × KI matrix. The computational complexity of the former is approximately M(IK)^2, and that of the latter approximately (KI)^3. For the system settings considered in this article the two are of comparable size. Also note that the Hermitian properties of the data structures can be exploited to further reduce complexity.
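The claim that the two dominant terms are of comparable size can be checked with a short calculation, using the simulation settings given later in the article (M = 256 subcarriers, I = 36 base functions, and K = 4 or 7 users). The helper name `dominant_costs` is introduced only for this example.

```python
M, I = 256, 36   # subcarriers and number of DPS base functions

def dominant_costs(K):
    """Approximate multiplication counts of the two dominant operations."""
    product = M * (I * K) ** 2    # forming the KI x KI matrix
    inverse = (K * I) ** 3        # inverting it
    return product, inverse

for K in (4, 7):
    product, inverse = dominant_costs(K)
    print(K, product, inverse, round(product / inverse, 2))
# prints:
# 4 5308416 2985984 1.78
# 7 16257024 16003008 1.02
```

The ratio equals M/(KI), so the inversion overtakes the product once KI exceeds the number of subcarriers.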
The second algorithm makes use of a Krylov subspace method to avoid the explicit matrix inversion in (7). At the same time, the explicit computation of the matrix product preceding the inversion can be avoided. This is beneficial as long as S < M. Referring back to Section 3.2 and Table 1, the main part of the complexity lies in calculating the matrix-vector product with A, which is performed once for every subspace dimension $S_k$. From a complexity point of view, it is preferable to keep $S_k$ low. On the other hand, too small a value will provide a poor approximation of the matrix inverse, and thus poor performance. The value thus needs to be chosen with care, trading complexity for performance. An upper limit on the number of dimensions may be set by timing constraints in the receiver.
The last algorithm, based on SAGE, has the lowest complexity and performs a separate channel estimate for each user channel. SAGE ML has less than half the complexity of Krylov MMSE with $S_k$ = 1. This suboptimal approach has an attractively low complexity and, as will be seen in Section 6, also delivers good performance. The complexity is linear in the number of users, i.e., the complexity per user is constant. The main part of the complexity is shared between the per-symbol estimate, the interference cancellation, and the subspace filtering, i.e., the utilization of the frequency correlation.
The former two are proportional to the number of OFDM symbols S, while the latter is proportional to the subspace order I, all with the same proportionality constant. The complexity can thus be reduced by lowering the number of OFDM symbols taken into account when performing the estimation, or by reducing I. Both actions would come at the price of a performance loss.
5.2 MUD complexity
As for the different channel estimation algorithms, the complexity of the considered MUDs differs significantly, as seen from Table 2. The one with the lowest complexity is the PIC-MF, which due to its simplicity requires relatively few arithmetic operations. The complexity is shared between the interference cancellation plus MF, and generating the LLRs, with the former requiring somewhat more computational effort. Despite its low complexity, as will be seen in Section 6, the performance is still competitive at low user loads.
Using a soft-information-based MMSE filter instead of the MF, the performance will be shown to improve. This comes at the cost of increased complexity due to the MMSE filter in (21), which needs to be calculated for each user and for each data symbol. The filter includes an inverse of a K × K matrix. At high user loads, computing the inverse will dominate the complexity. If the number of users grows very large, subspace methods such as the one used in the Krylov MMSE estimator could be used to reduce the complexity.
If the optimal MAP receiver is considered, the complexity is significantly increased. The complexity, as derived in , grows exponentially in the number of users. For few users, the complexity is manageable, but as the number of users grows, it rapidly becomes prohibitive. It should be noted that there exist a number of reduced complexity MAP-like detectors which are based upon searching trees [32, 33], which are not included in our comparison.
In order to investigate the receiver performance under the use of the different algorithms, computer simulations were performed. In the simulations, each user transmits S = 20 OFDM symbols, each with M = 256 subcarriers. If nothing else is stated, a single OFDM symbol is dedicated to training information, i.e., $S_p$ = 1, which is generated randomly for each user. Non-orthogonal transmission of the pilot symbols is assumed, i.e., all users transmit their pilot symbols simultaneously in time and frequency. This may incur a loss in performance, but is motivated by the flexibility it brings to the system configuration if a varying number of users is to be supported. A rate 1/2 convolutional code with generator polynomials $(7, 5)_8$ is used to generate the code bits, which after random interleaving are mapped to QPSK symbols. For the receiver, we restrict the investigation to N = 4 antennas, while different numbers of transmitting users are considered.
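The transmit chain described above can be sketched as follows. The encoder follows the stated rate 1/2 code with octal generators (7, 5); the zero initial state, the absence of trellis termination, and the Gray QPSK mapping are assumptions, as the article does not specify them here. The function names are hypothetical.

```python
import numpy as np

def conv_encode_75(info_bits):
    """Rate-1/2 feedforward convolutional encoder with generators (7, 5) octal."""
    u1 = u2 = 0                          # two memory elements, zero initial state
    out = []
    for u in info_bits:
        u = int(u)
        out.append(u ^ u1 ^ u2)          # generator 7 = 111 (binary)
        out.append(u ^ u2)               # generator 5 = 101 (binary)
        u1, u2 = u, u1                   # shift the register
    return np.array(out, dtype=int)

def qpsk_map(code_bits):
    """Map bit pairs to unit-energy QPSK symbols (assumed Gray mapping)."""
    b = np.asarray(code_bits).reshape(-1, 2)
    return ((1 - 2 * b[:, 0]) + 1j * (1 - 2 * b[:, 1])) / np.sqrt(2)

# Usage example: encode, randomly interleave, and map one block of bits.
rng = np.random.default_rng(0)
info = rng.integers(0, 2, size=100)
coded = conv_encode_75(info)
perm = rng.permutation(coded.size)       # random interleaver
symbols = qpsk_map(coded[perm])
```

The receiver applies the inverse permutation before decoding, which is the de-interleaving step in the iterative loop.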
where $\alpha_{p,k,n}$ are zero-mean complex Gaussian random variables with an exponential power delay profile, where C is a constant, and the delays $\tau_{p,k,n}$ are uniformly distributed within the CP. In this article, the length of the channel, normalized to the symbol duration, is $\tau_{\max}$ = 0.1, the root mean square delay spread is set to $\tau_{\mathrm{rms}}$ = 0.03, and the number of multi-path components is P = 100. The channel delay is assumed to be no longer than the CP, and the block fading channel is generated independently for each user and receive antenna link. The number of DPS sequences used in the channel estimation process is chosen as I = 36, guided by the discussion in Section 2.2, and adding a few for improved performance at high SNR. The subspace order in the Krylov MMSE estimator is set to $S_k$ = 5, if nothing else is stated.
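A channel realization of this kind can be generated along the following lines. This is a sketch under assumptions, since the exact expressions are not reproduced here: the tap powers are taken proportional to exp(-τ/τrms), the constant C is chosen so that the total power is one, and the frequency response at subcarrier m is the sum of the taps weighted by exp(-j2πmτ) with delays normalized to the symbol duration. The function name `block_fading_channel` is hypothetical.

```python
import numpy as np

def block_fading_channel(M, P, tau_max, tau_rms, rng):
    """Frequency response over M subcarriers for one user/antenna link."""
    tau = rng.uniform(0.0, tau_max, size=P)      # delays uniform within the CP
    pdp = np.exp(-tau / tau_rms)                 # assumed exponential power delay profile
    pdp /= pdp.sum()                             # assumption: C normalizes total power to one
    alpha = np.sqrt(pdp / 2) * (rng.standard_normal(P) + 1j * rng.standard_normal(P))
    m = np.arange(M)
    return np.exp(-2j * np.pi * np.outer(m, tau)) @ alpha

# Usage example with the settings from the text.
rng = np.random.default_rng(0)
h = block_fading_channel(M=256, P=100, tau_max=0.1, tau_rms=0.03, rng=rng)
avg_power = np.mean([
    np.mean(np.abs(block_fading_channel(256, 100, 0.1, 0.03, rng)) ** 2)
    for _ in range(200)
])
```

One such response would be drawn independently for every user and receive antenna pair and kept fixed over the S OFDM symbols of a block.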
In the following, the motivation for performing the complex operation of channel estimation in the loop of an iterative receiver is first illustrated with an example. In the example, the average BER performance at different $E_b/N_0$ is compared for receivers using the channel estimator inside or outside of the iterative loop. It will be seen that performing the estimation inside the loop can provide significant performance gains. Here, $E_b$ is the average bit energy at the receiver. Furthermore, the impact of the array gain has been removed by scaling the noise variance by N.
We then study the evolution of the BER and MSE of the channel estimate, over the receiver iterations. This is done for different user loads. The results illustrate the difference in convergence speed of the different receiver configurations, which is important when assessing the total computational complexity needed to reach a certain level of performance. Finally, the convergence analysis is extended with the use of EXIT charts; providing additional insight on the receiver.
6.1 Illustration of the gains of using channel estimation inside the detection loop
As was seen in Section 5, performing channel estimation adds significantly to the total receiver complexity. Furthermore, having the estimation inside the loop of an iterative receiver, this costly operation needs to be performed multiple times. It would therefore, from a complexity point of view, be attractive to move the estimation outside the loop, only performing it once for each code block based on the transmitted pilot symbols.
As seen from the figure, if only pilot-based estimates are used, there is a significant performance loss compared to using channel estimation in the iterative loop. For few pilot symbols, a loss in performance of 1-3 dB is observed, while if the number of pilot symbols is increased to $S_p$ = 10, the loss is small. Remember that the total number of OFDM symbols in a block is S = 20, thus transmitting ten pilot symbols yields a 50% pilot overhead, which is unacceptable for most applications. Transmitting orthogonal boosted pilots also results in a loss of up to 1 dB. The performance achieved with orthogonal pilots is only slightly better than when transmitting $S_p$ = 4 non-orthogonal pilots, since joint channel estimation is performed. Furthermore, if the channel estimates are iteratively updated, close to single-user performance with PCSI is achieved. It can therefore be concluded that the use of channel estimation inside an iterative receiver can give significant performance gains compared to pure pilot-based approaches. This means that pilot density can be kept low without sacrificing performance, thus improving the system throughput.
6.2 Convergence performance: BER and MSE
Starting with the BER in Figure 3, it is clear that convergence properties differ between algorithm combinations. At the smaller user load, i.e., K = 4, the difference in convergence is relatively small, with all algorithms reaching roughly the same BER within 3-8 iterations. The fastest convergence is achieved using the MAP-based MUD with joint MMSE channel estimation, while the slowest is obtained when using the PIC-MF detector with SAGE ML estimation. With the Krylov MMSE estimator and $S_k$ = 5, a small performance loss compared to joint MMSE is observed. When this value is increased to $S_k$ = 10, close to joint MMSE performance has been observed. At a system load of K = 7 users, a behavior similar to that with K = 4 is seen. Comparing the performance achieved when using the different MUDs, the best performance is given by the MAP. A gain of 1-5 iterations over the PIC-MMSE detector is observed. There is a large difference in convergence depending on which estimator is used, and additional insight on this will be given when looking at the EXIT charts in the next section. Furthermore, at this high user load, the PIC-MF cannot provide sufficient detection performance for receiver convergence. It is also interesting to note that performance close to that of a single user with PCSI at the receiver is achieved for all receiver configurations, except for PIC-MF at K = 7 users. This illustrates the good performance obtained by the iterative receiver.
Looking at the average MSE, as shown in Figure 4, similar trends as for the BER are seen. The convergence speed of the joint MMSE estimator is better than that of SAGE ML, and the difference increases with the user load. Furthermore, in the first iteration, only pilot symbols are used for channel estimation, and a large MSE is obtained due to the relatively small number of available pilots. In the iterative process, as the reliability of the symbol estimates increases with iterations, so does the accuracy of the channel estimate.
6.3 Convergence performance: EXIT charts
Even though the BER and MSE convergence provide some insight on the behavior of the different algorithms, they have some limitations. One significant drawback is that the performance of the channel estimation and detection algorithms cannot be separated from that of the code. Other means are therefore of interest for the receiver evaluation.
One popular technique for visualizing the convergence behavior of iterative decoders is the EXIT chart. The charts are used to visualize the exchange of extrinsic information between the SISO units making up an iterative decoder. It has been shown that the MUD can be seen as a SISO unit serially concatenated with the outer channel decoder. In our case, we have three units: the MUD, the channel estimator and the decoder. Even though it is possible to visualize the exchange between all three SISO units [36, 37], it is more convenient to combine the estimator and the MUD into a single SISO unit, referred to as MUD/CE.
where Ia = I (x; Λa) is the a priori input mutual information and Iext = I (x; Λext) is the output extrinsic information.
where , and .
where the probability density function is approximated using histogram calculations. The transfer functions are then averaged over 20 channel realizations. The transfer function for the SISO decoder is obtained in a similar way.
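A histogram-based estimate of the mutual information between transmitted bits and their log-likelihood ratios can be sketched as follows. The sketch assumes equiprobable binary symbols x in {-1, +1} and generates a priori LLRs from the commonly used Gaussian model with mean (sigma^2/2)x and variance sigma^2; the function names, bin count and sample size are illustrative assumptions, not the settings used for the results in this article.

```python
import numpy as np

def gaussian_apriori_llrs(x, sigma, rng):
    """Draw LLRs for symbols x in {-1,+1}: mean sigma^2/2 * x, variance sigma^2."""
    return 0.5 * sigma**2 * x + sigma * rng.standard_normal(x.shape)

def mutual_information_hist(x, llr, num_bins=100):
    """Histogram estimate of I(x; llr) in bits, assuming equiprobable x."""
    edges = np.linspace(llr.min(), llr.max(), num_bins + 1)
    # Conditional histograms approximate p(llr | x = +1) and p(llr | x = -1).
    p_pos, _ = np.histogram(llr[x > 0], bins=edges)
    p_neg, _ = np.histogram(llr[x < 0], bins=edges)
    p_pos = p_pos / p_pos.sum()
    p_neg = p_neg / p_neg.sum()
    mix = 0.5 * (p_pos + p_neg)     # unconditional distribution of the LLR
    info = 0.0
    for p in (p_pos, p_neg):
        mask = p > 0                # empty bins contribute nothing (0 log 0 = 0)
        info += 0.5 * np.sum(p[mask] * np.log2(p[mask] / mix[mask]))
    return info

rng = np.random.default_rng(1)
x = rng.choice([-1.0, 1.0], size=200_000)
mi = {
    sigma: mutual_information_hist(x, gaussian_apriori_llrs(x, sigma, rng))
    for sigma in (1.0, 2.0, 4.0)
}
for sigma, value in mi.items():
    print(f"sigma = {sigma}: I = {value:.3f} bits")
```

Sweeping sigma moves the a priori information from near 0 towards 1 bit. Feeding the resulting LLRs into a SISO unit and applying the same histogram estimate to its extrinsic output gives one point of that unit's transfer function.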
When generating the transfer function for the MUD/CE, an initial guess has to be provided for the Krylov MMSE and SAGE ML estimators. In the receiver, this value is given by the estimate obtained in the previous iteration. Since this value is not available when generating the transfer function, we work around it by running the channel estimator twice, first initialized with the all-ones channel and then reinitialized with the resulting output. This potentially leads to overestimated performance at high Iext. For SAGE ML, it also leads to underestimated performance at low values.
Comparing the channel estimation algorithms, Krylov MMSE, used with S_K = 5, delivers performance identical to joint MMSE. For SAGE ML, the performance is much worse, although the performance at low Ia is somewhat underestimated, as discussed above. From Figure 5, we also see the impact of inaccurate CSI, which manifests as a gap between the transfer functions obtained with channel estimation and with PCSI. As the reliability of the a priori information increases, this gap decreases, since the produced estimates become increasingly accurate. Looking at the MUDs, the MAP detector has the best performance, as expected, followed by PIC-MMSE and PIC-MF. Furthermore, when the SNR is reduced (essentially leading to a downward shift of the transfer functions of the MUD/CE), or when the user load is increased (essentially changing the slope of the transfer functions), the PIC-MF will be the first MUD to close the gap to the SISO decoder transfer function, and thus to fail to converge.
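How a closing gap between the two transfer functions translates into failed convergence can be sketched by tracing the iteration trajectory between them. The two transfer functions below are purely hypothetical toy curves (a linear MUD/CE characteristic and a quadratic decoder characteristic), chosen only to show an open and a closed tunnel; they are not the measured curves of Figure 5.

```python
def exit_trajectory(t_det, t_dec, target=0.99, max_iters=50, tol=1e-6):
    """Trace the staircase between two transfer functions.

    t_det maps a priori information to MUD/CE extrinsic information,
    t_dec maps decoder input information to decoder extrinsic information.
    Returns (iterations_used, final_decoder_output, converged).
    """
    i_a = 0.0                       # no a priori information initially
    for it in range(1, max_iters + 1):
        i_det = t_det(i_a)          # MUD/CE extrinsic output
        i_dec = t_dec(i_det)        # decoder output, fed back as a priori
        if i_dec >= target:
            return it, i_dec, True
        if i_dec - i_a < tol:       # no progress: the tunnel is closed
            return it, i_dec, False
        i_a = i_dec
    return max_iters, i_a, False

# Hypothetical decoder characteristic (assumption for illustration only).
def decoder(i_in):
    return i_in ** 2

# Hypothetical MUD/CE characteristics: same slope, shifted downwards
# to mimic a lower SNR.
def mud_ce_high_snr(i_a):
    return min(1.0, 0.6 + 0.4 * i_a)

def mud_ce_low_snr(i_a):
    return min(1.0, 0.4 + 0.4 * i_a)

open_result = exit_trajectory(mud_ce_high_snr, decoder)
closed_result = exit_trajectory(mud_ce_low_snr, decoder)
print("high SNR:", open_result)
print("low SNR: ", closed_result)
```

With the upper toy curve the trajectory reaches the target, while with the downward-shifted curve it stalls at the point where the two characteristics intersect. A narrower open tunnel also means more iterations, which is the mechanism behind the differences in convergence speed between estimators.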
Overall, we see that the insight given by the EXIT chart matches fairly well with what was observed for the BER. Furthermore, for the MAP detector with K = 7 users in Figure 3, a large difference in convergence performance between the MMSE estimators and SAGE ML was observed. This can be explained by the fact that the gap in the EXIT chart is smaller for the latter estimator. From an algorithm design point of view, it is also interesting to observe that, for the case presented in Figure 5, there is still room for further simplifications of the receiver structure. Additionally, the performance obtained when using an alternative channel code can be estimated by replacing the transfer function for the chosen convolutional code in Figure 5.
From a receiver design point of view, the trade-off between performance and complexity is an important aspect. In an attempt to shed some light on this aspect, the total receiver complexity, in terms of the number of complex multiplications needed to reach a specific target BER, is investigated. The total complexity depends on the choice of channel estimator and MUD, as well as on the number of iterations needed to reach the target. For the evaluation, a target BER of 10^-3 is chosen. The system settings are the same as described in Section 6, i.e., N = 4 receive antennas, S_p = 1 and S = 20 OFDM symbols, M = 256 subcarriers and I = 36 DPS sequences. The subspace order in Krylov MMSE is set to S_K = 5.
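The complexity measure described above can be sketched as the per-iteration cost of the estimator and the MUD multiplied by the number of iterations needed to reach the target BER. The configuration names and all numbers below are hypothetical placeholders chosen for illustration; they are not the values reported in this article.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReceiverConfig:
    name: str
    estimator_cost: float        # complex multiplications per info bit, per iteration
    mud_cost: float              # complex multiplications per info bit, per iteration
    iterations_to_target: int    # iterations needed to reach the target BER

    @property
    def total_cost(self) -> float:
        """Total complex multiplications per information bit."""
        return self.iterations_to_target * (self.estimator_cost + self.mud_cost)

# Hypothetical numbers: a costly configuration that converges quickly
# versus a cheap one that needs more iterations.
configs = [
    ReceiverConfig("high-complexity", estimator_cost=150.0, mud_cost=250.0,
                   iterations_to_target=2),
    ReceiverConfig("low-complexity", estimator_cost=10.0, mud_cost=10.0,
                   iterations_to_target=5),
]

cheapest = min(configs, key=lambda c: c.total_cost)
for c in configs:
    print(f"{c.name}: {c.total_cost:.0f} multiplications per information bit")
print("lowest total complexity:", cheapest.name)
```

The sketch shows why a configuration that needs more iterations can still have the lowest total cost when its per-iteration cost is sufficiently small. It deliberately ignores latency, which depends on the iteration count alone.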
As was previously seen in Figure 3, under these system settings, all receivers reach the same BER performance of ~10^-4. On the other hand, looking at the number of multiplications needed to reach this value, there is more than an order of magnitude difference between the receiver configurations. The receiver configurations using the MAP detector are found on the right, requiring the largest number of multiplications to reach convergence. To the left, we find the PIC-based MUDs using SAGE ML, providing the cheapest alternative. Looking at the target BER of 10^-3, the algorithm with the lowest total complexity is PIC-MF, followed by PIC-MMSE, reaching the target in about 70 and 100 complex multiplications per information bit, respectively. When using the MMSE Krylov estimator, we see that PIC-MF and PIC-MMSE reach the target using approximately the same number of multiplications, though PIC-MF requires one more iteration.
The results shown in Figure 7 take only the overall computational complexity into account and may therefore fail to show other interesting trade-offs. An example of this is seen in Figure 3, where the difference in convergence speed between the algorithms is large. Depending on the hardware architecture used, this may affect the latency of the system, and for time-critical systems the preferred combination of algorithms may therefore be a different one. We believe, however, that our evaluation shows that combinations of algorithms with low computational complexity, when used in an iterative receiver, can deliver very competitive performance for a large range of scenarios.
In this article, we have studied the trade-off between complexity and performance for uplink receivers in a packet-based MU MIMO-OFDM system. The considered iterative receivers contained three main components: a MUD, a channel estimator and a convolutional decoder. Three different MUD algorithms were considered, two suboptimal approaches based on PIC and one optimal approach based on MAP. For channel estimation, three algorithms were evaluated: one optimal joint MMSE-based estimator, a low-complexity Krylov subspace-based version of the same, and one sub-optimal estimator based on SAGE. The difference in complexity between the algorithms was shown to be large.
When only considering performance, the high-complexity algorithms naturally showed the fastest convergence. The low-complexity algorithms showed BER performance similar to that of the more complex ones, when converging, but at a generally slower convergence speed. More insight on the convergence was also provided through EXIT charts. When taking complexity into account, we demonstrated that the sub-optimal low-complexity algorithms are often the most attractive choice. Even though a larger number of receiver iterations was needed, the total number of complex multiplications was still lower, due to a significantly lower computational cost per iteration. At the same time, it should be noted that the simplest receiver failed earlier than the others at high user loads, which indicates that an appropriate balance between complexity reduction and performance needs to be achieved. Furthermore, for time-critical systems where convergence speed is in focus, high-complexity algorithms may be a better choice.
a. In general, the notation is such that sub-indices state which user and receive antenna are considered, while the time and frequency position is given in brackets.
- Dahlman E, Parkvall S, Sköld J, Beming P: 3G Evolution: HSPA and LTE for Mobile Broadband. Academic Press, Oxford; 2008.
- Larsson EG: MIMO detection methods: how they work. IEEE Signal Process Mag 2009, 26(3):91-95.
- Vikalo H, Hassibi B, Stoica P: Efficient joint maximum-likelihood channel estimation and signal detection. IEEE Trans Wirel Commun 2006, 5(7):1838-1845.
- Xu W, Stojnic M, Hassibi B: On exact maximum-likelihood detection for non-coherent MIMO wireless systems: a branch-estimate-bound optimization framework. In Proc IEEE International Symposium on Information Theory. Toronto, Canada; 2008:2017-2021.
- Ryan DJ, Collings IB, Clarkson IVL: GLRT-optimal noncoherent lattice decoding. IEEE Trans Signal Process 2007, 55(7):3773-3786.
- Berrou C, Glavieux A: Near optimum error correcting coding and decoding: turbo-codes. IEEE Trans Commun 1996, 44(10):1261-1271. 10.1109/26.539767
- Wautelet X, Dejonghe A, Vandendorpe L: MMSE-based fractional turbo receiver for space-time BICM over frequency-selective MIMO fading channels. IEEE Trans Signal Process 2004, 52(6):1804-1809. 10.1109/TSP.2004.827198
- Zemen T, Mecklenbrauker C, Wehinger J, Muller R: Iterative joint time-variant channel estimation and multi-user detection for MC-CDMA. IEEE Trans Wirel Commun 2006, 5(6):1469-1478.
- Salvo Rossi P, Müller R: Slepian-based two-dimensional estimation of time-frequency variant MIMO-OFDM channels. IEEE Signal Process Lett 2008, 15:21-24.
- Hu B, Land I, Rasmussen L, Piton R, Fleury B: A divergence minimization approach to joint multiuser decoding for coded CDMA. IEEE J Sel Areas Commun 2008, 26(3):432-445.
- Honig ME: Advances in Multiuser Detection. Wiley, Hoboken; 2009.
- Hammarberg P, Rusek F, Edfors O: Channel estimation algorithms for OFDM-IDMA: complexity and performance. IEEE Trans Wirel Commun, in press.
- Dumard C, Zemen T: Low-complexity MIMO multiuser receiver: a joint antenna detection scheme for time-varying channels. IEEE Trans Signal Process 2008, 56(7):2931-2940.
- Ketonen J, Juntti M, Cavallaro J: Performance-complexity comparison of receivers for a LTE MIMO-OFDM system. IEEE Trans Signal Process 2010, 58(6):3360-3372.
- Gao J, Liu H: Low-complexity MAP channel estimation for mobile MIMO-OFDM systems. IEEE Trans Wirel Commun 2008, 7(3):774-780.
- ten Brink S: Convergence of iterative decoding. IEEE Electron Lett 1999, 35(10):806-808. 10.1049/el:19990555
- Ylioinas J, Raghavendra M, Juntti M: Avoiding matrix inversion in DD SAGE channel estimation in MIMO-OFDM with M-QAM. In Proc IEEE Vehicular Technology Conference 2009 fall. Anchorage, AK; 2009:1-5.
- Slepian D: Prolate spheroidal wave functions, Fourier analysis, and uncertainty-V: the discrete case. Bell Syst Tech J 1978, 57(5):1371-1430.
- Zemen T, Mecklenbrauker C: Time-variant channel estimation using discrete prolate spheroidal sequences. IEEE Trans Signal Process 2005, 53(9):3597-3607.
- Edfors O, Wilson S, Börjesson P: OFDM channel estimation by singular value decomposition. IEEE Trans Commun 1998, 46(7):931-939. 10.1109/26.701321
- Kay SM: Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory. 1st edition. Prentice Hall, Upper Saddle River; 1993.
- Salvo Rossi P, Müller R: Joint iterative time-variant channel estimation and multi-user detection for MIMO-OFDM systems. In Proc IEEE Global Communications Conference. Washington, DC; 2007:4263-4268.
- Li Y, Seshadri N, Ariyavisitakul S: Channel estimation for OFDM systems with transmitter diversity in mobile wireless channels. IEEE J Sel Areas Commun 1999, 17(3):461-471. 10.1109/49.753731
- Münster M, Hanzo L: Parallel-interference-cancellation-assisted decision-directed channel estimation for OFDM systems using multiple transmit antennas. IEEE Trans Wirel Commun 2005, 4(5):2148-2162.
- Zemen T, Loncar M, Wehinger J, Mecklenbrauker C, Muller R: Improved channel estimation for iterative receivers. In Proc IEEE Global Communications Conference. Volume 1. San Francisco, CA; 2003:257-261.
- Fessler J, Hero A: Space-alternating generalized expectation-maximization algorithm. IEEE Trans Signal Process 1994, 42(10):2664-2677. 10.1109/78.324732
- Barrett R, Berry M, Chan TF, Demmel J, Donato J, Dongarra J, Eijkhout V, Pozo R, Romine C, Van der Vorst H: Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. 2nd edition. SIAM, Philadelphia; 1994.
- Bahl L, Cocke J, Jelinek F, Raviv J: Optimal decoding of linear codes for minimizing symbol error rate. IEEE Trans Inf Theory 1974, 20(2):284-287.
- Lee H, Lee I, Lee B: Iterative detection and decoding with an improved V-BLAST for MIMO-OFDM systems. IEEE J Sel Areas Commun 2006, 24(3):504-513.
- Costello D, Banerjee A, He C, Massey P: A comparison of low complexity turbo-like codes. In Signals Systems and Computers Conference Record of the Thirty-Sixth Asilomar Conference on. Volume 1. Pacific Grove, CA; 2002:16-20.
- Roy S, Duman T: Soft input soft output Kalman equalizer for MIMO frequency selective fading channels. IEEE Trans Wirel Commun 2007, 6(2):506-514.
- Boutros J, Gressety N, Brunel L, Fossorier M: Soft-input soft-output lattice sphere decoder for linear channels. In Proc IEEE Global Communications Conference. San Francisco, CA; 2003:1583-1587.
- Wong KKV: The soft-output M-algorithm and its applications. Ph.D. thesis, Dept. of Electrical and Computer Eng., Queens University, Canada; 2006.
- Hoeher P: A statistical discrete-time model for the WSSUS multipath channel. IEEE Trans Veh Technol 1992, 41(4):461-468. 10.1109/25.182598
- Alexander P, Grant AJ, Reed MC: Iterative detection in code-division multiple-access with error control coding. Eur Trans Telecommun 1998, 9(5):419-425. 10.1002/ett.4460090504
- Brännström F, Rasmussen L, Grant A: Convergence analysis and optimal scheduling for multiple concatenated codes. IEEE Trans Inf Theory 2005, 51(9):3354-3364. 10.1109/TIT.2005.853312
- Shepherd D, Shi Z, Anderson M, Reed M: EXIT chart analysis of an iterative receiver with channel estimation. In Proc IEEE Global Communications Conference. Washington, DC, USA; 2007:4010-4014.
- Otnes R, Tüchler M: EXIT chart analysis applied to adaptive turbo equalization. In Proc 5th Nordic Signal Processing Symp. Hurtigruten from Tromso to Trondheim, Norway; 2002.
- ten Brink S: Convergence behavior of iteratively decoded parallel concatenated codes. IEEE Trans Commun 2001, 49(10):1727-1737. 10.1109/26.957394
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.