 Research
 Open Access
Iterative receivers with channel estimation for multiuser MIMO-OFDM: complexity and performance
 Peter Hammarberg^{1},
 Fredrik Rusek^{1} and
 Ove Edfors^{1}
https://doi.org/10.1186/1687-1499-2012-75
© Hammarberg et al; licensee Springer. 2012
 Received: 15 May 2011
 Accepted: 1 March 2012
 Published: 1 March 2012
Abstract
A family of iterative receivers is evaluated in terms of complexity and performance for an uplink multiuser (MU) multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) system. Transmission over block fading channels is considered. The analyzed class of receivers performs channel estimation inside the iterative detection loop, which has been shown to improve estimation performance. As part of our results, we illustrate the ability of this type of receiver to reduce the required number of pilot symbols. A remaining question is which combinations of estimation and detection algorithms provide the best trade-off between performance and complexity. We address this issue by considering MU detectors and channel estimators of varying algorithmic complexity. For MU detection, two algorithms based on parallel interference cancellation (PIC) are considered and compared with the optimal symbol-wise maximum a posteriori probability (MAP) detector. For channel estimation, an algorithm performing joint minimum-mean-square-error (MMSE) estimation is considered along with a low-complexity replica making use of a Krylov subspace method. An estimator based on the space-alternating generalized expectation-maximization (SAGE) algorithm is also considered. Our results show that low-complexity algorithms provide the best trade-off, even though more receiver iterations are needed to reach a desired performance.
Keywords
 MIMO
 OFDM
 multiuser detection
 iterative receiver
 complexity
 channel estimation
1 Introduction
In future wireless systems, high data rate transmissions need to be supported, requiring larger bandwidths to be used. At the same time, spectral efficiency is becoming increasingly important. A technology that has become popular in recent years, and has also found its way into many wireless standards such as LTE [1], is the use of multiple-input multiple-output (MIMO) antenna systems in combination with orthogonal frequency division multiplexing (OFDM). OFDM is used to efficiently combat inter-symbol interference (ISI), inherent in broadband transmissions, while MIMO is used for improving the channel spectral efficiency and/or suppressing interference.
Introducing multiple users (MU) into such systems creates an MU-MIMO-OFDM system. In the uplink, accurate MU receivers are needed to harvest the available gains. A significant number of algorithms, with varying complexity, have been proposed for this task, ranging from the simple zero-forcing detector to the high-complexity maximum-likelihood (ML) detector. See [2] for an overview.
The degree of channel state information (CSI) available at the receiver plays an important role in the design of the receiver structure. While it is convenient for theoretical investigations to assume that perfect CSI is available, practical receivers need to obtain CSI via, e.g., noisy pilot symbol observations. In the case of a large coherence time, the accuracy of the channel estimate can be made high, since many symbols can be dedicated to pilot information without any significant effect on the spectral efficiency. In fast fading environments, or packet-based systems, the number of pilot symbols must, however, be kept small to maintain a reasonable spectral efficiency. To this end, other more sophisticated transceiver structures have been developed [3–5]. These receivers jointly detect the data symbols and estimate the transmission channel, which allows for a lower number of inserted pilot symbols compared to traditional pilot-based transceiver systems. While the prospect of reducing the number of pilot symbols is important, these receivers are of limited utility since they have a much larger computational complexity than traditional pilot-based receivers. This complexity grows dramatically if the data is coded.
The discovery of the turbo principle [6] brought radical changes to the entire communication field. It is today understood that highly complex problems, such as jointly detecting coded data and estimating the underlying transmission channel, can be efficiently handled by iteratively solving much simpler subproblems. In particular, during the last decade there has been a growing interest in iteratively solving the joint coded data detection and channel estimation problem [7–10]. The receiver alternates between decoding the outer error correcting code, performing multiuser detection (MUD), and estimating the transmission channel, in an iterative manner. In [10], a theoretical framework is presented for this otherwise ad hoc choice of receiver design, strengthening the motivation for this choice.
Even though iterative algorithms can reduce the complexity of the digital receiver, they may still be of prohibitive complexity in many practical scenarios, manifesting itself in a large chip area and high power consumption. It is therefore important to find low-complexity algorithms that are both power efficient and can deliver the performance required to reach high spectral efficiencies.
In the current literature, an impressive number of low-complexity algorithms have been proposed for the different components of an iterative receiver, see e.g., [11]. However, few have studied the trade-off between complexity and performance for the entire receiver, including MUD, channel estimation and channel decoder. In [12], we have performed a trade-off analysis for an interleave division multiple access (IDMA) system, where a number of channel estimation algorithms are evaluated. One other exception is [13], where the complexity and performance of a set of receiver algorithms for MIMO multi-carrier code division multiple access (MC-CDMA) systems are investigated. In contrast to [13], this article evaluates a family of iterative receivers for an uplink MU MIMO-OFDM system, operating over block fading channels. Furthermore, we have tried to place a greater focus on the convergence properties of the different receiver configurations. The convergence speed is important since more iterations require a larger computational effort. Also worth mentioning is the work in [14], where a performance-complexity comparison of receivers for downlink MIMO-OFDM systems is performed. Unlike in our comparison, the investigated receivers do not contain any channel estimator.

A trade-off analysis between complexity and performance is performed for an MU MIMO-OFDM system incorporating iterative channel estimation and MUD. Two popular channel estimation algorithms, one based on expectation maximization (EM) [15], and one performing a joint minimum-mean-square-error (MMSE) estimation of all user channels [8, 9], are evaluated. A low-complexity approximation of the latter based on a Krylov subspace projection method, as presented in [13], is also evaluated. Three popular MUDs are considered: two parallel interference cancellation (PIC) based detectors and one full maximum a posteriori probability (MAP) detector, the latter being a natural performance benchmark.

In the trade-off analysis, the total complexity, in terms of complex multiplications, required to reach a given bit error rate (BER) is derived for all algorithm combinations at different signal-to-noise ratios and numbers of users. The results show that low-complexity schemes generally provide the best trade-off.

The convergence properties of the different receiver combinations are presented in terms of BER and mean square estimation error, and through the use of extrinsic information transfer (EXIT) charts [16]. The EXIT charts visualize the exchange of extrinsic information between the outer code and the rest of the receiver incorporating channel estimation and MUD.
The rest of this article is organized as follows. In Section 2, a description of the considered MU-MIMO-OFDM system is given. The algorithms for obtaining the channel estimate are presented in Section 3, and the MUD algorithms in Section 4. In Section 5 the complexity of the algorithms is discussed, and in Section 6 the performance of different algorithm combinations is investigated. A complexity versus performance analysis is performed in Section 7, before the paper is summarized in Section 8.
2 System description
2.1 MU-MIMO-OFDM system overview
At the receiver, the signal is demodulated into the complex baseband, where an iterative receiver is implemented. The complexity-performance trade-off of this receiver is the focal point of this article. The receiver consists of three blocks: a channel estimator, a MUD, and a bank of soft-input soft-output (SISO) channel decoders. First, an initial channel estimation is performed, based on the transmitted pilot symbols. This estimate is then used in the MUD to separate the different user streams, which are then fed to the SISO decoders after deinterleaving (Π^{-1}). The outputs of the decoders are then used in the next iteration to update the channel estimate, and to further improve the user separation in the MUD. Multiple iterations are then performed in the same way. The different components are described in detail in later sections.
2.2 Input-output relationship of the channel
Next we turn our attention to a description of the input-output relationship of the channel used in this article. The notation introduced here will also be used for the description of the various algorithms. Furthermore, a low-rank description of the channel, used by the channel estimation algorithms, is also introduced in this section.
from the K autonomous users to the N-antenna base station at subcarrier m. For later use, we define h_{:,k}[m] = [h_{1,k}[m], ..., h_{N,k}[m]]^{T}, and similarly for h_{n,:}[m] and h_{n,k}[:].^{a} Note that, due to the block-fading assumption, the matrix H[m] does not depend on s. Furthermore, r[m, s], x[m, s], and w[m, s] are column vectors which contain the received signal, the composite transmitted vector from the K users, and the noise vector (distributed as $\mathcal{CN}(0, \sigma_w^2 I)$), respectively, at subcarrier m and OFDM symbol s.
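The display equation that this paragraph describes is not present in the text above. As a minimal sketch, assuming the usual per-subcarrier linear model r[m, s] = H[m] x[m, s] + w[m, s] (an assumption consistent with the definitions given, not a quotation), one received vector could be generated as follows. The helper names `cgauss` and `received_vector`, the QPSK alphabet and all numeric values are illustrative.

```python
import random

def cgauss(rng, var=1.0):
    """Draw one circularly symmetric complex Gaussian sample with variance var."""
    s = (var / 2.0) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def received_vector(H, x, sigma2, rng):
    """Return r = H x + w for one subcarrier m and one OFDM symbol s.

    H is an N x K list of lists (N receive antennas, K users),
    x is the length-K composite transmit vector, sigma2 the noise variance.
    """
    return [sum(h_nk * x_k for h_nk, x_k in zip(row, x)) + cgauss(rng, sigma2)
            for row in H]

rng = random.Random(0)
N, K = 4, 2                                   # illustrative sizes
H = [[cgauss(rng) for _ in range(K)] for _ in range(N)]
qpsk = [complex(a, b) / 2 ** 0.5 for a in (1, -1) for b in (1, -1)]
x = [rng.choice(qpsk) for _ in range(K)]      # one QPSK symbol per user
r = received_vector(H, x, sigma2=0.1, rng=rng)
```

Because of the block-fading assumption, the same `H` would be reused for every OFDM symbol s at a given subcarrier.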
where $X_k[s] \in \mathbb{C}^{M \times M}$ is a diagonal matrix which contains user k's transmitted data in OFDM symbol s along its diagonal, and $w_n[s] \in \mathbb{C}^{M \times 1}$ is a vector collecting the noise at receive antenna n across subcarriers.
All channel estimation algorithms to be evaluated in this article are based on low-rank approximations of the wireless channel. The assumption made is that the channel is limited in the delay domain, and can therefore be accurately represented by a relatively small number of base functions. The optimal set of base functions is presented in [18], and is known under the name discrete prolate spheroidal (DPS) sequences. Their use for low-complexity channel estimation was proposed in [19], and estimators using the same type of base functions have also been proposed in, e.g., [20].
where $r \in \mathbb{C}^{SMN \times 1}$ collects the received signal in all time-frequency positions and at all receive antennas, $\Xi \in \mathbb{C}^{SMN \times KNI}$ is an observation matrix collecting the transmitted symbols and channel base functions, $\psi \in \mathbb{C}^{KNI \times 1}$ collects the channel coefficients for all users, and $w \in \mathbb{C}^{SMN \times 1}$ collects the noise. More explicitly, the data structures are given by: r = (r^{T}[1], ..., r^{T}[S])^{T}, r[s] = (r^{T}[1, s], ..., r^{T}[M, s])^{T}, $\Xi = \bar{X}_N \bar{U}_N$, $\bar{X}_N = \bar{X} \otimes I_N$, $\bar{X} = (X_1, \dots, X_K)$, $X_k = (X_k^T[1], \dots, X_k^T[S])^T$, $\bar{U}_N = U \otimes I_{NK}$, $\psi = (\psi_1^T, \dots, \psi_N^T)^T$, and $\psi_n = (\psi_{n,1}^T, \dots, \psi_{n,K}^T)^T$.
The DPS base functions are obtained by solving the eigenvalue equation [8, 18, 20], Cu_{i} = λ_{i}u_{i}, where $C \in \mathbb{C}^{M \times M}$ is a channel correlation matrix. For later use, the eigenvalues λ_{i} are collected in a vector, λ = [λ_{1}, ..., λ_{I}]^{T}. For indices $i \geq \lceil \tau_{\max} M \rceil + 1$, where ⌈·⌉ denotes the ceiling operation, the eigenvalues are small and can in general be neglected [18]. This value sets a bound on the number of DPS sequences that are needed to represent the channel accurately.
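As a small worked example of this bound, the snippet below evaluates ⌈τ_max M⌉ + 1 for the values used later in the simulations (τ_max = 0.1 and M = 256). The function name `dps_order_bound` is only illustrative.

```python
import math

def dps_order_bound(tau_max, M):
    """Evaluate ceil(tau_max * M) + 1, the bound on the number of DPS sequences."""
    return math.ceil(tau_max * M) + 1

bound = dps_order_bound(0.1, 256)  # tau_max = 0.1, M = 256 as in Section 6
print(bound)                       # 27
```

The simulations use I = 36, i.e., nine sequences more than this bound, in line with the remark in Section 6 that a few are added for improved performance at high SNR.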
3 Channel estimation algorithms
In order to achieve satisfactory detection performance, high-accuracy channel estimates need to be made available at the receiver. A large number of appropriate algorithms have been proposed in the literature. Amongst these, two popular families of algorithms have received a great deal of attention: algorithms performing joint estimation for all users [8, 22, 23], and algorithms based on interference cancellation [15, 24]. In this article, two algorithms from the first family and one from the second are considered. The algorithms make use of the transmitted pilot symbols, as well as decoded data symbols. Thus, they all use the turbo principle to iteratively improve the channel estimate as the reliability of the decoded data symbols increases. Furthermore, the algorithms have in common that they all use the same underlying low-rank channel model, the one given in Section 2.2.
The first algorithm, previously presented for MC-CDMA systems in [8, 25] and later for MIMO-OFDM in [22], performs a joint MMSE estimate of the composite channel matrices H[m] based on the model in (3). The second algorithm, presented in [13] for MC-CDMA, uses a Krylov subspace method to approximate a costly matrix inverse in the joint MMSE estimator. The third algorithm, based on [15], uses the EM framework and iteratively performs per-user channel estimation, i.e., estimates of the columns of H[m]. We slightly modify the third algorithm by using the improved space-alternating generalized expectation-maximization (SAGE) [26] algorithm. The three algorithms are described below.
3.1 Joint MMSE estimator using soft decisions (joint MMSE)
where $\widehat{\Xi}$ has the same structure as Ξ, but contains both known pilot symbols and soft estimates of the transmitted data-carrying symbols; $\Delta = \mathrm{diag}(\vartheta) + \sigma_w^2 I_{NMS}$, with ϑ = (ϑ^{T}[1], ..., ϑ^{T}[S])^{T}, ϑ[s] = (ϑ[1, s], ..., ϑ[M, s])^{T}, $\vartheta[m,s] = \left(\sum_{k=1}^{K}\left(1 - |\tilde{x}_k[m,s]|^2\right)\right) 1_N$, where $\tilde{x}_k[m,s]$ are either pilots or soft symbol outputs from the decoder, and 1_{N} is the all-ones column vector of length N. Further, note that $\mathrm{diag}(\vartheta) = E\{\Xi \psi \psi^H \Xi^H\} - \widehat{\Xi} C_\psi \widehat{\Xi}^H$, where C_{ψ} is the covariance matrix of the DPS sequences.
Due to the sizes of the matrices involved in (7), the computational complexity can be expected to be significant. The computational burden is significantly decreased, but still large, if the sparsity and regularity of $\widehat{\Xi}$ are taken into account. We will elaborate more on this in Section 5.
3.2 Krylov subspace reduced joint MMSE estimator using soft decisions (Krylov MMSE)
As mentioned above, the implementation of the joint MMSE estimator carries a significant computational cost. Multiplication of matrices of large dimensions, along with a costly matrix inversion, adds greatly to the receiver complexity. In [13] an approach to reduce these costs was proposed. The algorithm makes use of a Krylov subspace method, more precisely the unconditional conjugate gradient method [27], to iteratively solve (7). The method iteratively finds the solution $x = A^{-1}b$ of the linear equation system Ax = b, based on an initial guess x_{0}, using that $A^{-1} = \sum_{r=1}^{R} a_r A^r \approx \sum_{r=1}^{S_k} a_r A^r$. The number of terms S_{k} gives the dimensionality of the Krylov subspace, and equals the number of iterations in the algorithm.
Table 1 Outline of the Krylov subspace projection method
Steps
Input: A, b and $\hat{\psi}_0$
$r = b - A\hat{\psi}_0$
ρ_{1} = r^{H}r
p = r
q = Ap
α = ρ_{1}/p^{H}q
$\hat{\psi}_1 = \hat{\psi}_0 + \alpha p$
r = r - α q
for s = 2, ..., S_{k} (or while ρ_{s} > ϵ)
  ρ_{s} = r^{H}r
  β = ρ_{s}/ρ_{s-1}
  p = r + β p
  q = Ap
  α = ρ_{s}/p^{H}q
  $\hat{\psi}_s = \hat{\psi}_{s-1} + \alpha p$
  r = r - α q
end
Output: $\hat{\psi}_{S_k}$
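The steps listed above follow the standard conjugate gradient recursion. A minimal sketch in plain Python is given below, assuming A is Hermitian positive definite and stored as a list of rows. The function names (`matvec`, `vdot`, `krylov_solve`) are illustrative and not taken from [13]; the first step of the listing is folded into the loop rather than written out separately.

```python
def matvec(A, v):
    """Matrix-vector product A v for A given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def vdot(u, v):
    """Inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def krylov_solve(A, b, psi0, S_k, eps=0.0):
    """Approximate the solution of A psi = b with at most S_k iterations."""
    psi = list(psi0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, psi))]    # r = b - A psi_0
    p = list(r)                                           # p = r
    rho_prev = None
    for s in range(1, S_k + 1):
        rho = vdot(r, r).real                             # rho_s = r^H r
        if rho <= eps:                                    # residual small enough
            break
        if s > 1:
            beta = rho / rho_prev                         # beta = rho_s / rho_{s-1}
            p = [ri + beta * pi for ri, pi in zip(r, p)]  # p = r + beta p
        q = matvec(A, p)                                  # q = A p
        alpha = rho / vdot(p, q).real                     # alpha = rho_s / p^H q
        psi = [x + alpha * pi for x, pi in zip(psi, p)]   # psi_s = psi_{s-1} + alpha p
        r = [ri - alpha * qi for ri, qi in zip(r, q)]     # r = r - alpha q
        rho_prev = rho
    return psi

# Illustrative 2 x 2 Hermitian positive definite system
A = [[2 + 0j, 1j], [-1j, 3 + 0j]]
b = [1 + 0j, 1 + 0j]
psi_hat = krylov_solve(A, b, [0j, 0j], S_k=2)
```

Each iteration costs one product Ap, which is why the subspace dimension S_{k} directly scales the complexity of the estimator.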
3.3 SAGE based estimator (SAGE ML)
Even though the Krylov subspace method can significantly reduce the complexity of the joint MMSE estimator, the complexity is still high, since large matrix-vector multiplications are required in each Krylov iteration. A low-complexity alternative, which has shown good performance, is to use an algorithm based on EM/SAGE. In SAGE, given a received signal, the ML solution is iteratively generated based on an underlying subspace model of the data. In [15] one such algorithm was presented, producing an optimal low-rank MMSE estimate of the channel. The details of that algorithm are outlined below, where a conversion from EM to SAGE has been performed.

Initialization: For all k and s, $\hat{s}_{k,n}^{(0)}[s] = \tilde{X}_k[s] U \hat{\psi}_{k,n}^{(0)}$. (9)

For each iteration i:
In (12), the matrix $\Delta_m = \mathrm{diag}\left(\frac{\lambda_1}{\lambda_1 + \sigma_w^2}, \dots, \frac{\lambda_I}{\lambda_I + \sigma_w^2}\right)$ stems from the low-rank MMSE estimator, and in (13) averaging is performed to make use of the assumption that the channel is static within a block.
The value of X_{k}[s] is only perfectly known at time instances where pilots are transmitted. On all other positions, symbol estimates must be used. The estimates are updated by the SISO decoders in every iteration, using the most recent channel estimate. Here, hard decisions $\hat{X}_k[s] = \mathrm{sign}(\tilde{X}_k[s])$ of the decoded soft symbols are used for channel estimation, and soft decisions for interference cancellation.
At the very first receiver iteration, no channel estimate is available. Therefore, the algorithm is initialized with $\hat{s}_{k,n}^{(0)}[s] = X_k[s] 1_M$. Furthermore, to improve the accuracy of the initial estimate, several internal iterations can be performed within the estimator itself. This can be seen as the algorithm being reinitialized with its own updated channel estimate, without waiting for updates on the symbol estimates. In this article, this is only performed at the initial pilot-based stage, where the gain is observed to be the largest. In later stages, multiple internal iterations do not produce any significant gain, and thus mainly add to the computational complexity.
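To illustrate the role of Δ_m, the sketch below computes its diagonal entries λ_i / (λ_i + σ_w²) and applies them to a vector of subspace coefficients. The eigenvalues and noise variance are made-up illustrative numbers, and the helper names are hypothetical. How the weights enter (12) is not reproduced here, since that equation is not shown in the text above.

```python
def lowrank_mmse_weights(eigenvalues, sigma2):
    """Diagonal of Delta_m: lambda_i / (lambda_i + sigma2) for each DPS eigenvalue."""
    return [lam / (lam + sigma2) for lam in eigenvalues]

def apply_weights(coeffs, weights):
    """Scale each subspace coefficient by its MMSE weight."""
    return [w * c for w, c in zip(weights, coeffs)]

# Illustrative values only: three eigenvalues and unit noise variance
weights = lowrank_mmse_weights([4.0, 1.0, 0.01], sigma2=1.0)
filtered = apply_weights([1 + 1j, 1 + 1j, 1 + 1j], weights)
```

Components with eigenvalues well above the noise variance pass almost unchanged, while components with eigenvalues far below it are strongly attenuated.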
4 Soft-input soft-output MU detectors
With estimates of the transmission channel having been made available by the channel estimator, the next stage of the iterative receiver structure is to produce likelihood ratios of the coded data symbols. This operation is performed by the MUD, which, apart from the received signal and channel estimate, uses a priori information on the transmitted symbols. This information is provided, from the previous iteration, by the channel decoder. The optimal SISO detector is the symbol-wise MAP detector, implemented through the BCJR algorithm [28]. Unfortunately, the complexity of the MAP detector in the MIMO case is prohibitive in most situations, except when the number of users K is small. Therefore, reduced complexity techniques have to be considered for most practical applications. Although optimal detection is not generally feasible in practice, it remains important as a benchmark reference, and will therefore be considered in this article. The principles behind the MAP algorithm are outlined in Section 4.1.
Many reduced complexity detection algorithms have been proposed in the literature [2]. To restrict the investigations, two such algorithms have been selected and are presented in Section 4.2. Both algorithms are based on PIC. The first algorithm applies a matched filter (MF) after the cancellation, while the other applies an MMSE filter, in an attempt to further suppress the inter-user interference. While the latter approach yields better performance, it is also more complex. In later sections we shall investigate whether the performance gain justifies the increased complexity.
4.1 Maximum a posteriori probability
As stated previously, the optimal MUD is the symbol-wise MAP detector. While the PIC-based algorithms, introduced in Section 4.2, only make use of the mean values $\tilde{x}_k[m,s]$, the symbol-wise MAP detector works with the probability mass function of x[m, s], denoted P_{a}(x[m, s]).
As was discussed above, the complexity of the symbol-wise MAP detector (16) may in many cases be prohibitively large, which motivates the use of low-complexity schemes.
4.2 PIC based detectors
where $\tilde{x}_{\ne k}[m,s]$ is equal to $\tilde{x}[m,s]$, except for element k, which is set to zero. A filtering of the signal $\tilde{r}_k[m,s]$ is then applied to produce an estimate of the transmitted symbol x_{k}[m, s]. A mapping to LLR values then follows.
is the variance of the residual interference plus noise for user k.
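The cancellation and filtering equations of this section are not present in the text above. As a rough sketch of the PIC-MF idea only, the code below assumes the cancellation $\tilde{r}_k = r - \hat{H}\tilde{x}_{\ne k}$ followed by a matched filter normalized by the channel energy of user k. Both the exact form and the normalization are assumptions, the function name `pic_mf` is hypothetical, and the MMSE filter variant and the LLR mapping are omitted.

```python
def pic_mf(r, H_hat, x_soft, k):
    """Soft interference cancellation followed by a matched filter for user k.

    r      : received vector (length N) at one subcarrier and OFDM symbol
    H_hat  : N x K channel estimate as a list of rows
    x_soft : length-K soft symbol estimates from the decoders
    """
    N, K = len(H_hat), len(x_soft)
    # x_soft with element k set to zero, as described in the text
    x_others = [0j if j == k else x_soft[j] for j in range(K)]
    # Assumed cancellation step: remove the estimated contribution of the other users
    r_k = [r[n] - sum(H_hat[n][j] * x_others[j] for j in range(K)) for n in range(N)]
    # Matched filter with the channel column of user k (assumed normalization)
    h_k = [H_hat[n][k] for n in range(N)]
    energy = sum(abs(h) ** 2 for h in h_k)
    return sum(h.conjugate() * v for h, v in zip(h_k, r_k)) / energy

# Noiseless toy example with perfect soft symbols: the filter returns x_k exactly
H_hat = [[1 + 0j, 1j], [2 + 0j, 1 + 0j]]
x = [1 + 0j, -1 + 0j]
r = [sum(h * s for h, s in zip(row, x)) for row in H_hat]
estimates = [pic_mf(r, H_hat, x, k) for k in range(2)]
```

With imperfect soft symbols or channel estimates, residual interference remains after the cancellation, which is what the variance term defined above accounts for.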
5 Complexity analysis
When it comes to practical implementations of iterative MU receivers, complexity considerations are of great importance. Since several receiver iterations are generally needed to reach a desired performance, the total computational effort can grow very large. To get an estimate of this cost, we have chosen to present and compare the complexity of the addressed algorithms in terms of the required number of complex-valued multiplications. This measure is chosen since it provides a reasonable estimate of the complexity, while being analytically tractable. Obviously, the final computational and hardware complexity depends on a large number of parameters, such as memory requirements, parallelization, hardware reuse, word lengths, etc.
Table 2 Expressions for the complexity per user for the different receiver components, as well as the required number of complex multiplications per information bit

| Algorithm | Total no. of complex mult.* | Mult. per bit |
|---|---|---|
| Channel estimators | | |
| SAGE ML | 2MNS + 2MNL + IN | 24 |
| Joint MMSE | MS(3 + K) + KMI(1 + I) + K^{2}I^{3} + N(MS + 2MI + KI^{2}) | 465 |
| Krylov MMSE | 3MS + MSN + 2IMN + C_{Ax}(S_{k} + 1) + IN(5S_{k} + 2) + 3NS_{k}/K | 145 |
| with | C_{Ax} = 3MSN + IN(M + 1) | |
| Multiuser detection | | |
| PIC-MF | 2MSN + 4MS + MNK | 13 |
| PIC-MMSE | 4SM + 3SMN + SMNK + SMK^{3} + MNK | 97 |
| MAP | SMN2^{2K}/K | 256 |
| SISO decoder | | |
| MAP** | (42M(S - S_{p}))/3 | 56 |
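As a sanity check of the multiuser detection rows, the snippet below evaluates the three expressions for M = 256, S = 20 and N = 4 (the simulation settings of Section 6). The user load K = 4 and the normalization by M·S are assumptions made here, since the table footnotes are not available in the text. Under these assumptions the rounded values agree with the "Mult. per bit" column for the three detectors.

```python
# M, S, N follow the simulation setup; K = 4 and the M*S normalization are assumptions
M, S, N, K = 256, 20, 4, 4

def pic_mf_mults(M, S, N, K):
    return 2 * M * S * N + 4 * M * S + M * N * K

def pic_mmse_mults(M, S, N, K):
    return 4 * S * M + 3 * S * M * N + S * M * N * K + S * M * K ** 3 + M * N * K

def map_mults(M, S, N, K):
    return S * M * N * 2 ** (2 * K) // K

detectors = {"PIC-MF": pic_mf_mults, "PIC-MMSE": pic_mmse_mults, "MAP": map_mults}
norm = M * S  # assumed normalization (5120)
per_bit = {name: round(f(M, S, N, K) / norm) for name, f in detectors.items()}
print(per_bit)  # {'PIC-MF': 13, 'PIC-MMSE': 97, 'MAP': 256}
```

The same check is not attempted for the channel estimator or decoder rows, whose per-bit values depend on parameters not defined in the text above.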
5.1 Channel estimator complexity
Three different channel estimation algorithms were presented in Section 3, joint MMSE, Krylov MMSE and SAGE ML. As seen in Table 2, the difference in complexity is significant. For the discussions below, we will assume that the number of OFDM symbols in each block is smaller than the number of subcarriers, i.e., S < M.
Looking at the first algorithm, the optimal joint MMSE algorithm, the complexity is large, as previously discussed. Since all user channels are estimated jointly, using all available frequency and time samples, the dimensionality of the problem becomes very large. Looking at (7), a straightforward implementation would be very costly due to the dimensionality of the involved data structures. Fortunately, considerable reductions can be achieved. Firstly, under the assumption of independent receive antenna channels, the same estimator can be used independently on each antenna. Secondly, under the block fading assumption, the matrix $\Xi = \bar{X}_N \bar{U}_N$ is the product of a block diagonal matrix and a block matrix with diagonal submatrices. Thus, the operations involving this structure can be computed efficiently. It should be noted that under the assumption of independent receive antennas, Ξ is block diagonal with identical submatrices. The estimator only involves one of these SM × KI submatrices. In the end, the main part of the complexity is related to two operations: the product $\widehat{\Xi}^{H} \Delta^{-1} \widehat{\Xi}$ and the inversion of a KI × KI matrix. The computational complexity of the former is approximately M(IK)^{2}, and that of the latter approximately (KI)^{3}. For the system settings considered in this article the two are of comparable size. Also note that the Hermitian properties of the data structures can be exploited to further reduce complexity.
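To make "comparable size" concrete, the two approximate costs can be evaluated for the settings used in the simulations (M = 256, I = 36) at the two user loads considered later, K = 4 and K = 7. This is plain arithmetic on the approximations quoted above, not a new complexity model; the helper name is illustrative.

```python
M, I = 256, 36  # subcarriers and number of DPS sequences from the simulation setup

def joint_mmse_costs(K):
    """Return (cost of the matrix product, cost of the KI x KI inversion)."""
    product_cost = M * (I * K) ** 2
    inverse_cost = (K * I) ** 3
    return product_cost, inverse_cost

for K in (4, 7):
    product_cost, inverse_cost = joint_mmse_costs(K)
    print(K, product_cost, inverse_cost, round(product_cost / inverse_cost, 2))
# 4 5308416 2985984 1.78
# 7 16257024 16003008 1.02
```

At K = 4 the product dominates by less than a factor of two, and at K = 7 the two terms are nearly equal, since KI = 252 is then close to M = 256.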
The second algorithm makes use of a Krylov subspace method to avoid the explicit matrix inversion in (7). At the same time, the explicit computation of $\widehat{\Xi}^{H} \Delta^{-1} \widehat{\Xi}$ can be avoided. This will be beneficial as long as S < M. Referring back to Section 3.2 and Table 1, the main part of the complexity lies in calculating the matrix-vector product q = Ap, which is performed once for every subspace dimension S_{k}. From a complexity point of view, it is preferable to keep S_{k} low. On the other hand, a too small value will provide a poor approximation of the matrix inverse, and thus poor performance. The value thus needs to be chosen with care, trading complexity for performance. An upper limit on the number of dimensions may be set by timing constraints in the receiver.
The last algorithm, based on SAGE, has the lowest complexity and performs a separate channel estimate for each user channel. SAGE ML has less than half the complexity of Krylov MMSE with S_{k} = 1. This suboptimal approach has an attractively low complexity and, as will be seen in Section 6, also delivers good performance. The complexity is linear in the number of users, i.e., the complexity per user is constant. The main part of the complexity is shared between the per-symbol estimate, the interference cancellation, and the subspace filtering, i.e., the utilization of the frequency correlation.
The former two are proportional to the number of OFDM symbols S, while the latter is proportional to the subspace order I, all with the same proportionality constant. The complexity can thus be reduced by lowering the number of OFDM symbols taken into account when performing the estimation, or by reducing I. Both actions would come at the price of a performance loss.
5.2 MUD complexity
As for the different channel estimation algorithms, the complexity of the considered MUDs differs significantly, as seen from Table 2. The one with the lowest complexity is the PIC-MF, which due to its simplicity requires relatively few arithmetic operations. The complexity is shared between the interference cancellation plus MF and the generation of the LLRs, with the former requiring somewhat more computational effort. Despite its low complexity, as will be seen in Section 6, the performance is still competitive at low user loads.
Using a soft-information-based MMSE filter instead of the MF, the performance will be shown to improve. This comes at the cost of an increased complexity due to the MMSE filter in (21), which needs to be calculated for each user and for each data symbol. The filter includes an inverse of a K × K matrix. At high user loads, computing the inverse will dominate the complexity. If the number of users grows very large, subspace methods such as the one used in the Krylov MMSE estimator could be used to reduce the complexity.
If the optimal MAP receiver is considered, the complexity is significantly increased. The complexity, as derived in [31], grows exponentially in the number of users. For few users, the complexity is manageable, but as the number of users grows, it rapidly becomes prohibitive. It should be noted that there exist a number of reduced complexity MAP-like detectors based on tree search [32, 33], which are not included in our comparison.
6 Simulation results
In order to investigate the receiver performance under the use of the different algorithms, computer simulations were performed. In the simulations, each user transmits S = 20 OFDM symbols, each with M = 256 subcarriers. Unless otherwise stated, a single OFDM symbol is dedicated to training information, i.e., S_{p} = 1, which is generated randomly for each user. Non-orthogonal transmission of the pilot symbols is assumed, i.e., all users transmit their pilot symbols simultaneously in time and frequency. This may incur a loss in performance, but is motivated by the flexibility it brings to the system configuration if a varying number of users is to be supported. A rate 1/2 convolutional code with generator polynomial (7, 5)_{8} is used to generate the code bits, which after random interleaving are mapped to QPSK symbols. For the receiver, we restrict the investigation to N = 4 antennas, while different numbers of transmitting users are considered.
where α_{p,k,n} are zero-mean complex Gaussian random variables with an exponential power delay profile, $\theta(\tau_{p,k,n}) = C e^{-\tau_{p,k,n}/\tau_{\mathrm{rms}}}$, where C is a constant, and the delays τ_{p,k,n} are uniformly distributed within the CP. In this article, the length of the channel, normalized to the symbol duration, is τ_{max} = 0.1, the root mean square delay spread is set to τ_{rms} = 0.03, and the number of multipath components is P = 100. The channel delay is assumed to be no longer than the CP, and the block fading channel is generated independently for each user and receive antenna link. The number of DPS sequences used in the channel estimation process is chosen as I = 36, guided by the discussion in Section 2.2, and adding a few for improved performance at high SNR. The subspace order in the Krylov MMSE estimator is set to S_{k} = 5, unless otherwise stated.
In the following, the motive behind performing the complex operation of channel estimation in the loop of an iterative receiver is first illustrated with an example. In the example, the average BER performance at different E_{b}/N_{0} is compared for receivers using the channel estimator inside or outside of the iterative loop. It will be seen that performing the estimation inside the loop can provide significant performance gains. Here, E_{b} is the average bit energy at the receiver. Furthermore, the impact of the array gain has been removed by scaling the noise variance by N.
We then study the evolution of the BER and MSE of the channel estimate, over the receiver iterations. This is done for different user loads. The results illustrate the difference in convergence speed of the different receiver configurations, which is important when assessing the total computational complexity needed to reach a certain level of performance. Finally, the convergence analysis is extended with the use of EXIT charts; providing additional insight on the receiver.
6.1 Illustration of the gains of using channel estimation inside the detection loop
As was seen in Section 5, performing channel estimation adds significantly to the total receiver complexity. Furthermore, with the estimation inside the loop of an iterative receiver, this costly operation needs to be performed multiple times. It would therefore, from a complexity point of view, be attractive to move the estimation outside the loop, performing it only once for each code block based on the transmitted pilot symbols.
As seen from the figure, if only pilot based estimates are used, there is a significant performance loss compared to when using channel estimation in the iterative loop. For few pilot symbols, a loss in performance of 1-3 dB is observed, while if the number of pilot symbols is increased to S_{ p }= 10, the loss is small. Remember that the total number of OFDM symbols in a block is S = 20, thus transmitting ten pilot symbols yields a 50% pilot overhead, which is unacceptable for most applications. Transmitting orthogonal boosted pilots also results in a loss of up to 1 dB. The performance achieved with orthogonal pilots is only slightly better than when transmitting S_{ p }= 4 non-orthogonal pilots, since joint channel estimation is performed. Furthermore, if the channel estimates are updated iteratively, close to single user performance with PCSI is achieved. It can therefore be concluded that the use of channel estimation inside an iterative receiver can give significant performance gains compared to pure pilot based approaches. This means that pilot density can be kept low without sacrificing performance, thus improving the system throughput.
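The pilot overhead figures quoted above follow directly from the block structure. The short sketch below reproduces the arithmetic; the function names are hypothetical, and the information bit count ignores the cyclic prefix and any trellis termination bits, which is a simplifying assumption.

```python
def pilot_overhead(S_p, S=20):
    """Fraction of the OFDM symbols in a block that carry pilots."""
    return S_p / S

def info_bits_per_block(S_p, S=20, M=256, bits_per_symbol=2, code_rate=0.5):
    """Information bits per user and block (QPSK, rate 1/2 code assumed)."""
    return int((S - S_p) * M * bits_per_symbol * code_rate)

# S_p = 10 of S = 20 symbols gives the 50% overhead mentioned in the text,
# while S_p = 1 gives 5% and leaves 19 data symbols per block.
overhead_ten = pilot_overhead(10)
overhead_one = pilot_overhead(1)
bits_one = info_bits_per_block(1)
```

Under these assumptions, moving from ten pilot symbols to a single one nearly doubles the number of information bits carried per block.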
6.2 Convergence performance: BER and MSE
Starting with the BER in Figure 3, it is clear that convergence properties differ between algorithm combinations. At the smaller user load, i.e., K = 4, the difference in convergence is relatively small, with all algorithms reaching roughly the same BER within 3-8 iterations. The fastest convergence is achieved using the MAP based MUD with joint MMSE channel estimation, while the slowest is obtained when using the PIC-MF detector with SAGE ML estimation. By using the MMSE Krylov estimator with S_{ K }= 5, a small performance loss compared to joint MMSE is observed. When this value is increased to S_{ K }= 10, close to joint MMSE performance has been observed. Looking at a system load of K = 7 users, a similar behavior as with K = 4 is seen. Comparing the performance achieved when using the different MUDs, the best performance is given by the MAP. A gain of 1-5 iterations over the PIC-MMSE detector is observed. There is a large difference in convergence depending on which estimator is used, and additional insight into this will be given when looking at the EXIT charts in the next section. Furthermore, at this high user load, the PIC-MF cannot provide sufficient detection performance for receiver convergence. It is also interesting to note that performance close to that of a single user with PCSI at the receiver is achieved for all receiver configurations, except for PIC-MF at K = 7 users. This illustrates the good performance obtained by the iterative receiver.
Looking at the average MSE, as shown in Figure 4, similar trends as for the BER are seen. The convergence speed of the joint MMSE estimator is better than that of SAGE ML, and the difference increases with the user load. Furthermore, in the first iteration, only pilot symbols are used for channel estimation, and a large MSE is obtained due to the relatively small number of available pilots. In the iterative process, as the reliability of the symbol estimates increases with iterations, so does the accuracy of the channel estimate.
6.3 Convergence performance: EXIT charts
Even though the BER and MSE convergence provide some insight on the behavior of the different algorithms, they have some limitations. One significant drawback is that the performance of the channel estimation and detection algorithms cannot be separated from that of the code. Other means are therefore of interest for the receiver evaluation.
One popular technique for visualizing the convergence behavior of iterative decoders is the EXIT chart [16]. The charts are used to visualize the exchange of extrinsic information between the SISO units making up an iterative decoder. In [35], it was shown that the MUD could be seen as a SISO unit serially concatenated with the outer channel decoder. In our case, we have three units: the MUD, the channel estimator and the decoder. Even though it is possible to visualize the exchange between all three SISO units [36, 37], it is more convenient to combine the estimator and the MUD into a single SISO unit [38], referred to as MUD/CE.
where I_{a} = I (x; Λ_{a}) is the a priori input mutual information and I_{ext} = I (x; Λ_{ext}) is the output extrinsic information.
where $w \sim \mathcal{N}\left(0,1\right)$, and ${\sigma}_{\mathsf{\text{ext}}}^{2}={J}^{-1}\left({I}_{a}\right)$.
where the probability density function, $p\left(\widehat{d}\mid d\right)$, is approximated using histogram calculations. The transfer functions are then averaged over 20 channel realizations. The transfer function for the SISO decoder is obtained in a similar way.
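To make the EXIT chart procedure more concrete, the sketch below shows one common way of generating a priori LLRs with a prescribed mutual information and of estimating mutual information from LLR samples with histograms. It is an illustration only and not the authors' implementation: the consistent Gaussian LLR model, the parameterization of J by the standard deviation σ, and all function names are assumptions of this example.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite nodes and weights for the weight function exp(-z^2 / 2).
_Z, _W = hermegauss(80)

def J(sigma):
    """Mutual information between x in {+1, -1} and a consistent Gaussian LLR."""
    llr = sigma**2 / 2.0 + sigma * _Z              # LLR values given x = +1
    loss = np.logaddexp(0.0, -llr) / np.log(2.0)   # log2(1 + exp(-llr))
    return 1.0 - np.sum(_W * loss) / np.sqrt(2.0 * np.pi)

def J_inv(info, lo=0.0, hi=30.0, steps=60):
    """Invert J by bisection, using that J increases with sigma."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if J(mid) < info:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def a_priori_llrs(x, info_a, rng):
    """Draw a priori LLRs for symbols x with mutual information info_a."""
    sigma = J_inv(info_a)
    return sigma**2 / 2.0 * x + sigma * rng.standard_normal(x.shape)

def mutual_information_hist(llr, x, bins=200):
    """Histogram estimate of I(x; llr) for equiprobable x in {+1, -1}."""
    edges = np.linspace(llr.min(), llr.max(), bins + 1)
    p_pos = np.histogram(llr[x == 1], bins=edges)[0].astype(float)
    p_neg = np.histogram(llr[x == -1], bins=edges)[0].astype(float)
    p_pos /= p_pos.sum()
    p_neg /= p_neg.sum()
    mix = 0.5 * (p_pos + p_neg)
    info = 0.0
    for p in (p_pos, p_neg):
        mask = p > 0
        info += 0.5 * np.sum(p[mask] * np.log2(p[mask] / mix[mask]))
    return info
```

In a transfer function measurement along these lines, `a_priori_llrs` would feed the unit under test for a grid of I_{a} values, and `mutual_information_hist` would be applied to the extrinsic LLRs at its output.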
When generating the transfer function for the MUD/CE, an initial guess for the Krylov MMSE and SAGE ML estimators has to be provided. In the receiver, this value is given by the estimate obtained in the previous iteration. Since this value is unknown here, we work around the problem by running the channel estimator twice, first initialized with the all-ones channel and then reinitialized with the new output. This potentially leads to an overestimated performance at high I_{ext}. For SAGE ML this also leads to an underestimated performance at low values.
Comparing the channel estimation algorithms, Krylov MMSE, used with S_{ K }= 5, delivers performance identical to joint MMSE. For SAGE ML, the performance is much worse, but the performance at low I_{a} is somewhat underestimated as discussed above. From Figure 5, we also see the impact of inaccurate CSI, which manifests as a gap between the transfer functions obtained when using channel estimation and when having PCSI. As the reliability of the a priori information increases, this gap decreases since the produced estimates become increasingly accurate. Looking at the MUDs, the MAP obviously has the best performance, followed by PIC-MMSE and PIC-MF. Furthermore, when the SNR is reduced (essentially leading to a downward shift of the transfer functions of the MUD/CE), or when the user load is increased (essentially changing the slope of the transfer functions), the PIC-MF will be the first MUD closing the gap to the SISO decoder transfer function, and thus failing to converge.
Overall, we see that the insight given by the EXIT chart matches fairly well with what was observed for the BER. Furthermore, observing the MAP detector for K = 7 users in Figure 3, a large difference in convergence performance between using the MMSE estimators and SAGE ML was observed. This could be explained by the fact that the gap in the EXIT chart is smaller for the latter estimator. From an algorithm design point of view, it is also interesting to observe that for the case presented in Figure 5 there is still room for further simplifications of the receiver structure. Additionally, the performance obtained when using an alternative channel code can be estimated by replacing the transfer function for the chosen convolutional code in Figure 5.
7 Complexity versus performance tradeoff
From a receiver design point of view, the tradeoff between performance and complexity is an important aspect. In an attempt to shed some light on this aspect, the total receiver complexity, in terms of the number of complex multiplications needed to reach a specific target BER, is investigated. The total complexity depends both on the choice of channel estimator and MUD, as well as on the number of iterations needed to reach the target. For the evaluation, a target BER of 10^{-3} is chosen. The system settings are the same as described in Section 6, i.e., N = 4 receive antennas, S_{ p }= 1 and S = 20 OFDM symbols, M = 256 subcarriers and I = 36 DPS sequences. The subspace order in Krylov MMSE is set to S_{ K }= 5.
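The evaluation principle can be sketched as follows: for each receiver configuration, find the first iteration at which the BER reaches the target and multiply the iteration count by the cost per iteration. The function names and the assumption of a constant cost per iteration are simplifications made for this example, and any numbers passed in are placeholders rather than results from the article.

```python
def iterations_to_target(ber_per_iteration, target_ber=1e-3):
    """Return the first (1-based) iteration whose BER meets the target."""
    for iteration, ber in enumerate(ber_per_iteration, start=1):
        if ber <= target_ber:
            return iteration
    return None  # the receiver never reaches the target

def total_complexity(ber_per_iteration, mults_per_iteration, target_ber=1e-3):
    """Total multiplications to reach the target, assuming a constant
    cost per iteration (a simplifying assumption)."""
    iterations = iterations_to_target(ber_per_iteration, target_ber)
    if iterations is None:
        return None
    return iterations * mults_per_iteration

# Hypothetical BER curves: a cheap receiver needing more iterations can still
# have a lower total cost than an expensive one that converges quickly.
cheap = total_complexity([1e-1, 3e-2, 5e-3, 8e-4], mults_per_iteration=10.0)
costly = total_complexity([5e-2, 9e-4], mults_per_iteration=100.0)
```

With these placeholder values, the cheap configuration needs four iterations at a total cost of 40 units, while the costly one needs two iterations at a total cost of 200 units.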
As was previously seen in Figure 3, under these system settings, all receivers reach the same BER performance of ~10^{-4}. On the other hand, looking at the number of multiplications needed to reach this value, there is more than an order of magnitude difference between the receiver configurations. The receiver configurations using the MAP detector are found on the right, requiring the largest number of multiplications to reach convergence. To the left, we find the PIC based MUDs using SAGE ML, providing the cheapest alternative. Looking at the target BER of 10^{-3}, the algorithms with the lowest total complexity are PIC-MF followed by PIC-MMSE, reaching the target in about 70 and 100 complex multiplications per information bit, respectively. When using the MMSE Krylov estimator, we see that PIC-MF and PIC-MMSE reach the target using approximately the same number of multiplications, though PIC-MF requires one more iteration.
The results shown in Figure 7 take the overall computational complexity into account and may therefore fail to show other interesting tradeoffs. An example of this is seen in Figure 3, where the difference in convergence speed between the algorithms is large. Depending on the hardware architecture used, this may affect the latency of the system, and for time critical systems, a different combination of algorithms may therefore be preferable. We believe, however, that our evaluation shows that combinations of algorithms with low computational complexity, when used in an iterative receiver, can deliver very competitive performance for a large range of scenarios.
8 Conclusion
In this article, we have studied the tradeoff between complexity and performance for uplink receivers in a packet based MU MIMO-OFDM system. The considered iterative receivers contained three main components: a MUD, a channel estimator and a convolutional decoder. Three different MUD algorithms were considered, two suboptimal approaches based on PIC and one optimal based on MAP. For channel estimation, three algorithms were evaluated: one optimal joint MMSE based estimator, a low complexity Krylov subspace based version of the same, and one suboptimal estimator based on SAGE. The difference in complexity between the algorithms was shown to be large.
When only considering performance, the high complexity algorithms naturally showed the fastest convergence. The low-complexity algorithms showed BER performance similar to that of the more complex ones when converging, but at a generally slower convergence speed. More insight into the convergence was also provided through EXIT charts. When taking complexity into account, we demonstrated that the suboptimal low-complexity algorithms are often the most attractive choice. Even though a larger number of receiver iterations was needed, the total number of complex multiplications was still lower, due to a significantly lower computational cost per iteration. At the same time, it should be noted that the simplest receiver failed earlier than the others at high user loads, which indicates that an appropriate balance between complexity reduction and performance needs to be achieved. Furthermore, for time critical systems where convergence speed is the focus, high complexity algorithms may be a better choice.
Endnote
^{a}In general the notation will be that subindices state which user and receive antenna is considered, while the time and frequency position will be given in brackets.
References
 Dahlman E, Parkvall S, Sköld J, Beming P: 3G Evolution HSPA and LTE for Mobile Broadband. Academic Press, Oxford; 2008.
 Larsson EG: MIMO detection methods: how they work. IEEE Signal Process Mag 2009, 26(3):91-95.
 Vikalo H, Hassibi B, Stoica P: Efficient joint maximum-likelihood channel estimation and signal detection. IEEE Trans Wirel Commun 2006, 5(7):1838-1845.
 Xu W, Stojnic M, Hassibi B: On exact maximum-likelihood detection for noncoherent MIMO wireless systems: a branch-estimate-bound optimization framework. In Proc IEEE International Symposium on Information Theory. Toronto, Canada; 2008:2017-2021.
 Ryan DJ, Collings IB, Clarkson IVL: GLRT-optimal noncoherent lattice decoding. IEEE Trans Signal Process 2007, 55(7):3773-3786.
 Berrou C, Glavieux A: Near optimum error correcting coding and decoding: turbo-codes. IEEE Trans Commun 1996, 44(10):1261-1271. 10.1109/26.539767
 Wautelet X, Dejonghe A, Vandendorpe L: MMSE-based fractional turbo receiver for space-time BICM over frequency-selective MIMO fading channels. IEEE Trans Signal Process 2004, 52(6):1804-1809. 10.1109/TSP.2004.827198
 Zemen T, Mecklenbrauker C, Wehinger J, Muller R: Iterative joint time-variant channel estimation and multiuser detection for MC-CDMA. IEEE Trans Wirel Commun 2006, 5(6):1469-1478.
 Salvo Rossi P, Müller R: Slepian-based two-dimensional estimation of time-frequency variant MIMO-OFDM channels. IEEE Signal Process Lett 2008, 15:21-24.
 Hu B, Land I, Rasmussen L, Piton R, Fleury B: A divergence minimization approach to joint multiuser decoding for coded CDMA. IEEE J Sel Areas Commun 2008, 26(3):432-445.
 Honig ME: Advances in Multiuser Detection. Wiley, Hoboken; 2009.
 Hammarberg P, Rusek F, Edfors O: Channel estimation algorithms for OFDM-IDMA: complexity and performance. IEEE Trans Wirel Commun, in press.
 Dumard C, Zemen T: Low-complexity MIMO multiuser receiver: a joint antenna detection scheme for time-varying channels. IEEE Trans Signal Process 2008, 56(7):2931-2940.
 Ketonen J, Juntti M, Cavallaro J: Performance-complexity comparison of receivers for a LTE MIMO-OFDM system. IEEE Trans Signal Process 2010, 58(6):3360-3372.
 Gao J, Liu H: Low-complexity MAP channel estimation for mobile MIMO-OFDM systems. IEEE Trans Wirel Commun 2008, 7(3):774-780.
 ten Brink S: Convergence of iterative decoding. IEEE Electron Lett 1999, 35(10):806-808. 10.1049/el:19990555
 Ylioinas J, Raghavendra M, Juntti M: Avoiding matrix inversion in DD SAGE channel estimation in MIMO-OFDM with M-QAM. In Proc IEEE Vehicular Technology Conference 2009 fall. Anchorage, AK; 2009:1-5.
 Slepian D: Prolate spheroidal wave functions, Fourier analysis, and uncertainty-V: the discrete case. Bell Syst Tech J 1978, 57(5):1371-1430.
 Zemen T, Mecklenbrauker C: Time-variant channel estimation using discrete prolate spheroidal sequences. IEEE Trans Signal Process 2005, 53(9):3597-3607.
 Edfors O, Wilson S, Börjesson P: OFDM channel estimation by singular value decomposition. IEEE Trans Commun 1998, 46(7):931-939. 10.1109/26.701321
 Kay SM: Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory. 1st edition. Prentice Hall, Upper Saddle River; 1993.
 Salvo Rossi P, Müller R: Joint iterative time-variant channel estimation and multiuser detection for MIMO-OFDM systems. In Proc IEEE Global Communications Conference. Washington, DC; 2007:4263-4268.
 Li Y, Seshadri N, Ariyavisitakul S: Channel estimation for OFDM systems with transmitter diversity in mobile wireless channels. IEEE J Sel Areas Commun 1999, 17(3):461-471. 10.1109/49.753731
 Münster M, Hanzo L: Parallel-interference-cancellation-assisted decision-directed channel estimation for OFDM systems using multiple transmit antennas. IEEE Trans Wirel Commun 2005, 4(5):2148-2162.
 Zemen T, Loncar M, Wehinger J, Mecklenbrauker C, Muller R: Improved channel estimation for iterative receivers. In Proc IEEE Global Communications Conference. Volume 1. San Francisco, CA; 2003:257-261.
 Fessler J, Hero A: Space-alternating generalized expectation-maximization algorithm. IEEE Trans Signal Process 1994, 42(10):2664-2677. 10.1109/78.324732
 Barrett R, Berry M, Chan TF, Demmel J, Donato J, Dongarra J, Eijkhout V, Pozo R, Romine C, Van der Vorst H: Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. 2nd edition. SIAM, Philadelphia; 1994.
 Bahl L, Cocke J, Jelinek F, Raviv J: Optimal decoding of linear codes for minimizing symbol error rate. IEEE Trans Inf Theory 1974, 20(2):284-287.
 Lee H, Lee I, Lee B: Iterative detection and decoding with an improved V-BLAST for MIMO-OFDM systems. IEEE J Sel Areas Commun 2006, 24(3):504-513.
 Costello D, Banerjee A, He C, Massey P: A comparison of low complexity turbo-like codes. In Signals Systems and Computers Conference Record of the Thirty-Sixth Asilomar Conference on. Volume 1. Pacific Grove, CA; 2002:16-20.
 Roy S, Duman T: Soft input soft output Kalman equalizer for MIMO frequency selective fading channels. IEEE Trans Wirel Commun 2007, 6(2):506-514.
 Boutros J, Gressety N, Brunel L, Fossorier M: Soft-input soft-output lattice sphere decoder for linear channels. In Proc IEEE Global Communications Conference. San Francisco, CA; 2003:1583-1587.
 Wong KKV: The soft-output M-algorithm and its applications. Ph.D. thesis, Dept. of Electrical and Computer Eng., Queens University, Canada; 2006.
 Hoeher P: A statistical discrete-time model for the WSSUS multipath channel. IEEE Trans Veh Technol 1992, 41(4):461-468. 10.1109/25.182598
 Alexander P, Grant AJ, Reed MC: Iterative detection in code-division multiple-access with error control coding. Eur Trans Telecommun 1998, 9(5):419-425. 10.1002/ett.4460090504
 Brännström F, Rasmussen L, Grant A: Convergence analysis and optimal scheduling for multiple concatenated codes. IEEE Trans Inf Theory 2005, 51(9):3354-3364. 10.1109/TIT.2005.853312
 Shepherd D, Shi Z, Anderson M, Reed M: EXIT chart analysis of an iterative receiver with channel estimation. In Proc IEEE Global Communications Conference. Washington, DC, USA; 2007:4010-4014.
 Otnes R, Tüchler M: EXIT chart analysis applied to adaptive turbo equalization. In Proc 5th Nordic Signal Processing Symp. Hurtigruten from Tromso to Trondheim, Norway; 2002.
 ten Brink S: Convergence behavior of iteratively decoded parallel concatenated codes. IEEE Trans Commun 2001, 49(10):1727-1737. 10.1109/26.957394
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.