To investigate the receiver performance with the different algorithms, computer simulations were performed. In the simulations, each user transmits *S* = 20 OFDM symbols, each with *M* = 256 subcarriers. Unless otherwise stated, a single OFDM symbol is dedicated to training information, i.e., *S*_{p} = 1, which is generated randomly for each user. Non-orthogonal transmission of the pilot symbols is assumed, i.e., all users transmit their pilot symbols simultaneously in time and frequency. This may incur a loss in performance, but is motivated by the flexibility it brings to the system configuration if a varying number of users is to be supported. A rate 1/2 convolutional code with generator polynomials (7, 5)_{8} is used to generate the code bits, which after random interleaving are mapped to QPSK symbols. For the receiver, we restrict the investigation to *N* = 4 antennas, while different numbers of transmitting users are considered.
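
The transmit chain described above can be sketched as follows. This is a minimal illustration and not the simulator used for the results: the function names, the unterminated encoder, the Gray QPSK labeling, and the assumption that one code block fills all (*S* - *S*_{p}) · *M* data positions are choices made here for the example.

```python
import numpy as np

def conv_encode_75(bits):
    """Rate-1/2 feedforward convolutional encoder with generators (7, 5) octal."""
    s1 = s2 = 0                      # shift register; s1 holds the most recent bit
    out = []
    for b in bits:
        b = int(b)
        out.append(b ^ s1 ^ s2)      # generator 7 = 111
        out.append(b ^ s2)           # generator 5 = 101
        s1, s2 = b, s1
    return np.array(out, dtype=np.int64)

def qpsk_map(code_bits):
    """Map bit pairs to unit-energy QPSK symbols (Gray labeling is an assumption)."""
    pairs = np.asarray(code_bits).reshape(-1, 2)
    return ((1 - 2 * pairs[:, 0]) + 1j * (1 - 2 * pairs[:, 1])) / np.sqrt(2)

def transmit_block(S=20, S_p=1, M=256, rng=None):
    """Generate the data part of one user's block: encode, interleave, map."""
    rng = np.random.default_rng() if rng is None else rng
    n_data = (S - S_p) * M                    # QPSK data symbols in the block
    # At rate 1/2 with 2 bits per symbol, n_data symbols carry n_data info bits.
    info = rng.integers(0, 2, size=n_data)
    coded = conv_encode_75(info)
    interleaved = coded[rng.permutation(coded.size)]   # random interleaver
    symbols = qpsk_map(interleaved).reshape(S - S_p, M)
    return info, symbols
```

Each user would call `transmit_block` with its own random generator, so that information bits and interleavers differ between users.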

A fading multi-path IID channel is assumed, mimicking a rich scattering environment. The channel impulse response between user *k* and receive antenna *n* is given by [34]

{g}_{k,n}\left(\tau \right)=\sum _{p=0}^{P-1}{\alpha}_{p,k,n}\delta \left(\tau -{\tau}_{p,k,n}\right),

where *α*_{p,k,n} are zero-mean complex Gaussian random variables with an exponential power delay profile, \theta \left({\tau}_{p,k,n}\right)=C{e}^{-{\tau}_{p,k,n}/{\tau}_{\mathsf{\text{rms}}}}, where *C* is a constant, and the delays *τ*_{p,k,n} are uniformly distributed within the CP. In this article, the length of the channel, normalized to the symbol duration, is *τ*_{max} = 0.1, the root mean square delay spread is set to *τ*_{rms} = 0.03, and the number of multi-path components is *P* = 100. The channel delay is assumed to be no longer than the CP, and the block fading channel is generated independently for each user and receive antenna link. The number of DPS sequences used in the channel estimation process is chosen as *I* = 36, guided by the discussion in Section 2.2, with a few sequences added for improved performance at high SNR. The subspace order in the Krylov MMSE estimator is set to *S*_{K} = 5, unless otherwise stated.
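
As an illustration of the channel model above, the following sketch draws one realization for a single user and receive antenna link and evaluates it on the *M* subcarriers. Several details are assumptions made here and are not stated in the text: the delays are drawn uniformly on [0, *τ*_{max}], the constant *C* is chosen so that the average total power is one, and "normalized to the symbol duration" is read as normalization to the useful OFDM symbol duration, so that subcarrier *m* sees the phase 2π*mτ*.

```python
import numpy as np

def channel_frequency_response(M=256, P=100, tau_max=0.1, tau_rms=0.03, rng=None):
    """One block-fading realization of the multi-path channel on M subcarriers."""
    rng = np.random.default_rng() if rng is None else rng
    tau = rng.uniform(0.0, tau_max, size=P)       # normalized path delays
    pdp = np.exp(-tau / tau_rms)                  # exponential power delay profile
    pdp /= pdp.sum()                              # assumption: C gives unit total power
    # Zero-mean complex Gaussian path gains with variance pdp[p]
    alpha = np.sqrt(pdp / 2) * (rng.standard_normal(P) + 1j * rng.standard_normal(P))
    m = np.arange(M)
    # Fourier transform of sum_p alpha_p * delta(tau - tau_p) at subcarrier m
    return np.exp(-2j * np.pi * np.outer(m, tau)) @ alpha

# Independent realizations for K users and N receive antennas (hypothetical sizes)
rng = np.random.default_rng(1)
K, N = 4, 4
H = np.array([[channel_frequency_response(rng=rng) for _ in range(N)] for _ in range(K)])
```

The array `H` then has shape (*K*, *N*, *M*), with one independently generated link per user and antenna pair.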

In the following, the motivation for performing the complex operation of channel estimation in the loop of an iterative receiver is first illustrated with an example. In the example, the average BER performance at different *E*_{b}/*N*_{0} is compared for receivers using the channel estimator inside or outside of the iterative loop. It will be seen that performing the estimation inside the loop can provide significant performance gains. Here, *E*_{b} is the average bit energy at the receiver. Furthermore, the impact of the array gain has been removed by scaling the noise variance by *N*.

We then study the evolution of the BER and the MSE of the channel estimate over the receiver iterations. This is done for different user loads. The results illustrate the difference in convergence speed of the different receiver configurations, which is important when assessing the total computational complexity needed to reach a certain level of performance. Finally, the convergence analysis is extended with the use of EXIT charts, providing additional insight into the receiver.

### 6.1 Illustration of the gains of using channel estimation inside the detection loop

As was seen in Section 5, performing channel estimation adds significantly to the total receiver complexity. Furthermore, with the estimation inside the loop of an iterative receiver, this costly operation needs to be performed multiple times. From a complexity point of view, it would therefore be attractive to move the estimation outside the loop, performing it only once for each code block based on the transmitted pilot symbols.

To illustrate the motivation for using channel estimation inside the iterative receiver, simulations are performed for a system with *N* = 4 receive antennas and *K* = 4 users. Two different receiver configurations are considered. The first performs pilot-based channel estimation only, while the other performs channel estimation inside the iterative loop. For both receivers, the MAP MUD is used in combination with the joint MMSE channel estimator. In Figure 2, the BER performance is shown for different numbers of transmitted pilot symbols. For the purely iterative receiver, only one pilot OFDM symbol is used, while for the pilot-based receiver *S*_{p} = 1, 2, and 10 pilot symbols are transmitted. For comparison, single user performance when perfect channel state information (PCSI) is available at the receiver is also shown. In addition, an example with orthogonal pilots is provided, where the users consecutively transmit one pilot symbol each during the first four symbol intervals. Each pilot has been boosted to contain the equivalent energy of four regular symbols.

As seen from the figure, if only pilot-based estimates are used, there is a significant performance loss compared to using channel estimation in the iterative loop. For few pilot symbols, a loss in performance of 1-3 dB is observed, while if the number of pilot symbols is increased to *S*_{p} = 10, the loss is small. Recall that the total number of OFDM symbols in a block is *S* = 20, so transmitting ten pilot symbols yields a 50% pilot overhead, which is unacceptable for most applications. Transmitting orthogonal boosted pilots also results in a loss of up to 1 dB. The performance achieved with orthogonal pilots is only slightly better than when transmitting *S*_{p} = 4 non-orthogonal pilots, since joint channel estimation is performed. Furthermore, if the channel estimates are updated iteratively, close to single user performance with PCSI is achieved. It can therefore be concluded that the use of channel estimation inside an iterative receiver can give significant performance gains compared to purely pilot-based approaches. This means that the pilot density can be kept low without sacrificing performance, thus improving the system throughput.

### 6.2 Convergence performance: BER and MSE

In the previous section we illustrated how iterative channel estimation can provide a significant performance gain. At the same time, the complexity can be significant, as seen in Section 5. Since the computational cost increases linearly with the number of iterations, the convergence properties of the different receiver configurations are of importance. To illustrate these properties, the BER and the MSE are shown as functions of the number of iterations in Figures 3 and 4, respectively. The results are shown for the cases of *K* = 4 and 7 users, at *E*_{b}/*N*_{0} = 10 dB.

Starting with the BER in Figure 3, it is clear that the convergence properties differ between algorithm combinations. At the smaller user load, i.e., *K* = 4, the difference in convergence is relatively small, with all algorithms reaching roughly the same BER within 3-8 iterations. The fastest convergence is achieved using the MAP-based MUD with joint MMSE channel estimation, while the slowest is obtained when using the PIC-MF detector with SAGE ML estimation. By using the Krylov MMSE estimator with *S*_{K} = 5, a small performance loss compared to joint MMSE is observed. When this value is increased to *S*_{K} = 10, close to joint MMSE performance has been observed. Looking at a system load of *K* = 7 users, a behavior similar to that with *K* = 4 is seen. Comparing the performance achieved when using the different MUDs, the best performance is given by the MAP. A gain of 1-5 iterations over the PIC-MMSE detector is observed. There is a large difference in convergence depending on which estimator is used, and additional insight into this will be given when looking at the EXIT charts in the next section. Furthermore, at this high user load, the PIC-MF cannot provide sufficient detection performance for receiver convergence. It is also interesting to note that performance close to that of a single user with PCSI at the receiver is achieved for all receiver configurations, except for PIC-MF at *K* = 7 users. This illustrates the good performance obtained by the iterative receiver.

Looking at the average MSE, as shown in Figure 4, similar trends as for the BER are seen. The convergence speed of the joint MMSE estimator is better than that of SAGE ML, and the difference increases with the user load. Furthermore, in the first iteration, only pilot symbols are used for channel estimation, and a large MSE is obtained due to the relatively small number of available pilots. In the iterative process, as the reliability of the symbol estimates increases with iterations, so does the accuracy of the channel estimate.

### 6.3 Convergence performance: EXIT charts

Even though the BER and MSE convergence provide some insight on the behavior of the different algorithms, they have some limitations. One significant drawback is that the performance of the channel estimation and detection algorithms cannot be separated from that of the code. Other means are therefore of interest for the receiver evaluation.

One popular technique for visualizing the convergence behavior of iterative decoders is the EXIT chart [16]. The charts are used to visualize the exchange of extrinsic information between the SISO units making up an iterative decoder. In [35], it was shown that the MUD could be seen as a SISO unit serially concatenated with the outer channel decoder. In our case, we have three units: the MUD, the channel estimator, and the decoder. Even though it is possible to visualize the exchange between all three SISO units [36, 37], it is more convenient to combine the estimator and the MUD into a single SISO unit [38], referred to as MUD/CE.

To produce an EXIT chart, the information transfer functions of the SISO units have to be obtained. Each unit can be seen as an LLR transformer (*Λ*_{a} → *Λ*_{ext}), where the transfer function measures the improvement of the LLR transformation in terms of mutual information between the LLRs and the underlying variables *x*. The transfer function is given as [39]

{I}_{\mathsf{\text{ext}}}=T\left({I}_{\mathsf{\text{a}}}\right),

(23)

where *I*_{a} = *I* (*x*; *Λ*_{a}) is the *a priori* input mutual information and *I*_{ext} = *I* (*x*; *Λ*_{ext}) is the output extrinsic information.

When producing the transfer functions, all elements of *Λ*_{ext} (becoming *Λ*_{a} for the next component decoder) are assumed independent and to follow a Gaussian distribution, \mathcal{N}\left(x{\mu}_{\mathsf{\text{ext}}},{\sigma}_{\mathsf{\text{ext}}}^{2}\right), with consistency condition {\mu}_{\mathsf{\text{ext}}}={\sigma}_{\mathsf{\text{ext}}}^{2}/2 and where *x* = ±1. With this distribution of the LLRs, there is a one-to-one mapping between the mutual information *I*_{ext} and the variance {\sigma}_{\mathsf{\text{ext}}}^{2} given by

{I}_{\mathsf{\text{ext}}}=J\left({\sigma}_{\mathsf{\text{ext}}}\right),

(24)

where the *J*-function is defined in [16]. When generating the transfer functions, the *J*-function is used for generating input sequences with different *a priori* information content. More specifically, given an input symbol *x*, and a value for the *a priori* information *I*_{a}, the input LLRs are given by

{\Lambda}_{\mathsf{\text{ext}}}\left({I}_{a}\right)=\frac{{\sigma}_{\mathsf{\text{ext}}}^{2}}{2}x+w{\sigma}_{\mathsf{\text{ext}}},

(25)

where w\sim \mathcal{N}\left(0,1\right), and {\sigma}_{\mathsf{\text{ext}}}={J}^{-1}\left({I}_{a}\right), in agreement with (24).
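
The generation of input LLRs in (25) can be sketched as follows. The text only states that the *J*-function is defined in [16]; the sketch assumes the usual definition as the mutual information between *x* = ±1 and a consistent Gaussian LLR, evaluates it by plain numerical integration, and inverts it by bisection. The function names, the integration grid, and the bisection interval are choices made for this example.

```python
import numpy as np

def j_function(sigma, n=4001):
    """Mutual information between x = +/-1 and an LLR ~ N(x*sigma^2/2, sigma^2)."""
    if sigma < 1e-6:
        return 0.0
    mu = sigma ** 2 / 2
    xi = np.linspace(mu - 10 * sigma, mu + 10 * sigma, n)
    pdf = np.exp(-(xi - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    integrand = pdf * np.logaddexp(0.0, -xi) / np.log(2.0)   # pdf * log2(1 + e^-xi)
    value = 1.0 - np.sum(integrand) * (xi[1] - xi[0])        # simple Riemann sum
    return float(np.clip(value, 0.0, 1.0))

def j_inverse(i_a, lo=0.0, hi=30.0, iters=60):
    """Invert the monotonically increasing J-function by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if j_function(mid) < i_a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def apriori_llrs(x, i_a, rng=None):
    """Draw LLRs according to (25) for symbols x in {-1, +1}."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = j_inverse(i_a)
    w = rng.standard_normal(np.shape(x))
    return 0.5 * sigma ** 2 * np.asarray(x) + sigma * w
```

Sweeping `i_a` over a grid in [0, 1] and feeding the resulting LLRs to a SISO unit gives the abscissa of its transfer function.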

For the MUD/CE, as shown in Figure 1, the transfer function is now derived for a number of *I*_{ext} ∈ [0,1]. We first generate the soft input symbols {\tilde{x}}_{k}=\tanh\left({\Lambda}_{\mathsf{\text{ext}}}\left({I}_{a}\right)/2\right) and the known pilot symbols for all users. After QPSK mapping, channel estimation and MUD are performed. The LLR output generated by the MUD is then fed to a sink, where the mutual information is computed through [39]

{I}_{\mathsf{\text{ext}}}=\frac{1}{2}\sum_{x=\pm 1}\int_{-\infty}^{\infty}p\left(\widehat{x}|x\right){\log}_{2}\left(\frac{2p\left(\widehat{x}|x\right)}{p\left(\widehat{x}|-1\right)+p\left(\widehat{x}|1\right)}\right)d\widehat{x},

(26)

where the probability density function, p\left(\widehat{x}|x\right), is approximated using histogram calculations. The transfer functions are then averaged over 20 channel realizations. The transfer function for the SISO decoder is obtained in a similar way.
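
A minimal sketch of the histogram-based evaluation of (26) is given below. It assumes equiprobable symbols *x* = ±1 and uses a common set of bins for both conditional histograms; the function name and the number of bins are arbitrary choices for the example, and the estimate carries a small positive bias for a finite number of samples.

```python
import numpy as np

def mutual_information_hist(llr, x, bins=100):
    """Estimate I(x; llr) from samples, with p(llr | x) replaced by histograms."""
    llr = np.asarray(llr, dtype=float)
    x = np.asarray(x)
    edges = np.histogram_bin_edges(llr, bins=bins)     # shared bins for both classes
    p_pos, _ = np.histogram(llr[x == 1], bins=edges)
    p_neg, _ = np.histogram(llr[x == -1], bins=edges)
    p_pos = p_pos / p_pos.sum()                        # probability mass per bin
    p_neg = p_neg / p_neg.sum()
    mi = 0.0
    for p in (p_pos, p_neg):
        mask = p > 0                                   # empty bins contribute zero
        mi += 0.5 * np.sum(p[mask] * np.log2(2 * p[mask] / (p_pos[mask] + p_neg[mask])))
    return float(mi)

# Example with synthetic LLRs: mean x*sigma^2/2 and variance sigma^2, sigma = 2
rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=20000)
llr = 2.0 * x + 2.0 * rng.standard_normal(x.size)
mi_estimate = mutual_information_hist(llr, x)
```

Because the bin width is the same for both conditional histograms, it cancels inside the logarithm, so bin probabilities can be used in place of densities.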

When generating the transfer function for the MUD/CE, an initial guess for the Krylov MMSE and SAGE ML estimators has to be provided. In the receiver, this value is given by the estimate obtained in the previous iteration. Since this value is unknown here, we handle this by running the channel estimator twice, first initialized with the all-ones channel and then reinitialized with the new output. This potentially leads to overestimated performance at high *I*_{ext}. For SAGE ML, it also leads to underestimated performance at low values.

In Figure 5, the EXIT chart is shown for the different receiver combinations for the case of *N* = 4 receive antennas and *K* = 4 users at *E*_{b}/*N*_{0} = 10 dB. The transfer functions in the case of PCSI are also shown. Furthermore, the convergence path for PIC-MF with SAGE ML estimation is shown as a dashed line, and the receiver is estimated to converge in five iterations. This coincides with the observation for the BER in Figure 3. For the receivers where SAGE ML is used, a dip is seen in the transfer function at low *I*_{a}. This occurs since the algorithm does not take the quality of the soft symbols into account, thus producing estimates based on very unreliable hard estimates of the transmitted symbols. This dip could be partly removed if only pilots are considered (*I*_{a} = 0) in the estimator when the reliability of the produced soft symbols is low.
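
A convergence path of the kind described above is obtained by stepping back and forth between the two transfer functions. The sketch below traces such a staircase for purely hypothetical curves; they are not the measured MUD/CE or decoder characteristics of Figure 5, and the stopping tolerance is an arbitrary choice.

```python
import numpy as np

def exit_trajectory(t_mud, t_dec, max_iter=50, tol=1e-3):
    """Trace the staircase path between two EXIT transfer functions.

    t_mud maps the decoder's extrinsic information to the MUD/CE output,
    t_dec maps the MUD/CE output to the decoder's extrinsic information.
    Returns one tuple (I_in, I_mud, I_dec) per receiver iteration.
    """
    i_dec = 0.0                      # no a priori information in the first pass
    path = []
    for _ in range(max_iter):
        i_mud = t_mud(i_dec)         # detector and estimator step
        i_new = t_dec(i_mud)         # decoder step
        path.append((i_dec, i_mud, i_new))
        if i_new - i_dec < tol:      # no further progress: converged or stuck
            break
        i_dec = i_new
    return path

# Hypothetical transfer curves, for illustration only
open_tunnel = exit_trajectory(lambda i: 0.7 + 0.3 * i, lambda i: i ** 2)
closed_tunnel = exit_trajectory(lambda i: 0.6 + 0.35 * i, lambda i: i ** 4)
```

In the first case the path climbs towards full information, while in the second the curves intersect early and the path gets stuck at the intersection, which corresponds to a receiver that fails to converge.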

Comparing the channel estimation algorithms, Krylov MMSE, used with *S*_{K} = 5, delivers performance identical to joint MMSE. For SAGE ML, the performance is much worse, although the performance at low *I*_{a} is somewhat underestimated, as discussed above. From Figure 5, we also see the impact of inaccurate CSI, which shows up as a gap between the transfer functions obtained with channel estimation and with PCSI. As the reliability of the *a priori* information increases, this gap decreases, since the produced estimates become increasingly accurate. Looking at the MUDs, the MAP has the best performance, followed by PIC-MMSE and PIC-MF. Furthermore, when the SNR is reduced (essentially leading to a downward shift of the transfer functions of the MUD/CE), or when the user load is increased (essentially changing the slope of the transfer functions), the PIC-MF will be the first MUD to close the gap to the SISO decoder transfer function, and thus fail to converge.

Overall, we see that the insight given by the EXIT chart matches fairly well with what was observed for the BER. Furthermore, for the MAP detector with *K* = 7 users in Figure 3, a large difference in convergence performance between the MMSE estimators and SAGE ML was observed. This could be explained by the fact that the gap in the EXIT chart is smaller for the latter estimator. From an algorithm design point of view, it is also interesting to observe that, for the case presented in Figure 5, there is still room for further simplification of the receiver structure. Additionally, the performance obtained when using an alternative channel code can be estimated by replacing the transfer function for the chosen convolutional code in Figure 5.