Performance evaluation of IB-DFE-based strategies for SC-FDMA systems

The aim of this paper is to propose and evaluate multi-user iterative block decision feedback equalization (IB-DFE) schemes for the uplink of single-carrier frequency-division multiple access (SC-FDMA)-based systems. It is assumed that a set of single-antenna users share the same physical channel to transmit their own information to the base station, which is equipped with an antenna array. Two space-frequency multi-user IB-DFE-based processing schemes are considered: iterative successive interference cancellation and parallel interference cancellation. In the first approach, the equalizer vectors are computed by minimizing the mean square error (MSE) of each individual user at each subcarrier. In the second one, the equalizer matrices are obtained by minimizing the overall MSE of all users at each subcarrier. For both cases, we propose a simple yet accurate analytical approach for obtaining the performance of the discussed receivers. The proposed schemes allow an efficient user separation, with a performance close to the one given by the matched filter bound for severely time-dispersive channels, with only a few iterations.


Introduction
Single-carrier frequency-division multiple access (SC-FDMA), a modified form of orthogonal frequency-division multiple access (OFDMA), is a promising solution technique for high data rate uplink communications in future cellular systems.
When compared with OFDMA, SC-FDMA has similar throughput and essentially the same overall complexity. A principal advantage of SC-FDMA is the peak-to-average power ratio (PAPR), which is lower than that of OFDMA [1,2]. SC-FDMA was adopted as the uplink multiple access scheme of the current long-term evolution (LTE) cellular system [3].
Single-carrier frequency domain equalization (SC-FDE) is widely recognized as an excellent alternative to OFDM, especially for the uplink of broadband wireless systems [4,5]. Like other block transmission techniques, SC-FDE is suitable for high data rate transmission over severely time-dispersive channels due to the frequency domain implementation of the receivers. Conventional SC-FDE schemes employ a linear FDE optimized under the minimum mean square error (MMSE) criterion. However, the residual interference levels might still be too high, leading to performance that is still several decibels from the matched filter bound (MFB). Nonlinear time domain equalizers are known to outperform linear equalizers, and decision feedback equalizers (DFEs) are known to have good performance-complexity tradeoffs [6]. For this reason, there has been significant interest in the design of nonlinear FDEs in general and decision feedback FDEs in particular, with the IB-DFE being the most promising nonlinear FDE [7,8]. IB-DFE was originally proposed in [9] and was extended to a wide range of scenarios in the last 10 years, including diversity scenarios [10,11], MIMO systems [12], CDMA systems [13,14], and multi-access scenarios [15,16], among many others. Essentially, the IB-DFE can be regarded as a low complexity turbo equalizer [17][18][19][20] implemented in the frequency domain that does not require the channel decoder output in the feedback loop, although true turbo equalizers based on the IB-DFE concept can also be designed [21][22][23]. An IB-DFE-based scheme specially designed for offset constellations (e.g., OQPSK and OQAM) was also proposed in [24]. In the context of cooperative systems, an IB-DFE approach was derived to separate the quantized received signals from the different base stations (BSs) [25].
Works related to IB-DFE specifically designed for SC-FDMA-based systems are scarce in the literature. In [26], the authors proposed an IB-DFE structure consisting of a frequency domain feedforward filter and a time domain feedback filter for single-user SC-FDMA systems. An iterative frequency domain multi-user detection for spectrally efficient relaying protocols was proposed in [27], and a frequency domain soft-decision feedback equalization scheme for single-user SISO SC-FDMA systems with insufficient cyclic prefix was proposed in [28].
In this paper, we consider a broadband wireless transmission over severely time-dispersive channels, and we design and evaluate multi-user receiver structures for uplink single-input multiple-output (SIMO) SC-FDMA systems that are based on the IB-DFE principle. It is assumed that a set of single-antenna user equipments (UEs) share the same physical channel to transmit their own information to the base station, which is equipped with an antenna array. Two multi-user IB-DFE-based processing schemes are considered, both with the feedforward and feedback filters designed in the space-frequency domain: iterative successive interference cancellation (SIC) and parallel interference cancellation (PIC). In the first approach, the equalizer vectors are computed by minimizing the mean square error (MSE) of each individual user at each subcarrier. In the second one, the equalizer matrices are obtained by minimizing the overall MSE of all users at each subcarrier. For both cases, we propose a quite accurate analytical approach for obtaining the performance of the proposed receivers.
The remainder of the paper is organized as follows: Section 2 presents the multi-user SIMO SC-FDMA system model. Section 3 presents in detail the considered multi-user IB-DFE-based receiver structures; the feedforward and feedback filters are derived for both cases, and an analytical approach for obtaining the performance is discussed. Section 4 presents the main performance results, both numerical and analytical. Conclusions are drawn in Section 5.
Notation: Throughout this paper, we use the following notation. Lowercase and uppercase letters are used for scalars in the time and frequency domains, respectively. Boldface uppercase letters are used for both vectors and matrices in the frequency domain. The index (n) is used in time while the index (l) is used in frequency. $(\cdot)^H$, $(\cdot)^T$, and $(\cdot)^*$ represent the complex conjugate transpose, transpose, and complex conjugate operators, respectively; $E[\cdot]$ represents the expectation operator; $\mathbf{I}_N$ is the identity matrix of size N × N; $CN(\cdot,\cdot)$ denotes a circularly symmetric complex Gaussian vector; $\mathrm{tr}(\mathbf{A})$ is the trace of matrix $\mathbf{A}$; and $\mathbf{e}_k$ is a column vector of appropriate size with 0 in all positions except the kth position, which is 1.

System model
Figure 1 shows the considered uplink SC-FDMA-based transmitter of the kth user equipment. We consider a BS equipped with M antennas and K single-antenna UEs that share the same physical channel, i.e., the information from all UEs is transmitted in the same frequency band. An SC-FDMA scheme is employed by each UE, and the data block associated with the kth UE is $\{s_{k,n}; n = 0, \ldots, L-1\}$, where each constellation symbol $s_{k,n}$ is selected from the data according to a given mapping rule. Then, the L-length data block symbols are moved to the frequency domain, obtaining $\{S_{k,l}; l = 0, \ldots, L-1\} = \mathrm{DFT}\{s_{k,n}; n = 0, \ldots, L-1\}$. After that, the frequency domain signals are interleaved so that they are widely separated in the OFDM symbol, therefore increasing the frequency diversity order. Finally, an OFDM modulation is performed and a cyclic prefix is inserted to avoid inter-symbol interference (ISI). Without loss of generality, we concentrate on a single L-length data block, although in practical systems several data blocks are mapped into the OFDM symbol.
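The transmitter chain described above (L-point DFT, interleaved subcarrier mapping, OFDM modulation, cyclic prefix) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `scfdma_transmit`, the `offset` parameter, and the 80-sample cyclic prefix are assumptions made for the example, and the mapping assumes that the FFT size is a multiple of the block length.

```python
import numpy as np

def scfdma_transmit(symbols, n_fft=1024, cp_len=80, offset=0):
    """Build one SC-FDMA symbol (with cyclic prefix) from an L-length data block."""
    L = len(symbols)
    spacing = n_fft // L                 # gap between occupied subcarriers (assumes n_fft % L == 0)
    S = np.fft.fft(symbols)              # L-point DFT: time -> frequency domain
    X = np.zeros(n_fft, dtype=complex)
    X[offset::spacing] = S               # interleaved (widely separated) subcarrier mapping
    x = np.fft.ifft(X)                   # OFDM modulation (N-point IFFT)
    return np.concatenate([x[-cp_len:], x])   # prepend the cyclic prefix

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=(128, 2))
qpsk = (1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])   # QPSK symbols in {±1 ± j}
tx = scfdma_transmit(qpsk)
```

With `offset=0`, the time domain signal is a scaled repetition of the QPSK block, so its envelope is constant, which is consistent with the low PAPR that motivates SC-FDMA.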
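The received signal in the frequency domain (i.e., after cyclic prefix removal, N-FFT, and chip demapping operations) at the mth BS antenna and on subcarrier l can be expressed as $Y_{m,l} = \sum_{k=1}^{K} H_{k,m,l} S_{k,l} + N_{m,l}$ (1), assuming that the cyclic prefix is long enough to account for the channel impulse responses between the UEs and the BS. In (1), $H_{k,m,l}$ represents the channel between user k and the mth antenna of the BS on subcarrier l, and $N_{m,l}$ is the noise. In matrix format, (1) can be rewritten as $\mathbf{Y}_l = \sum_{k=1}^{K} \mathbf{H}_{k,l} S_{k,l} + \mathbf{N}_l = \mathbf{H}_l \mathbf{S}_l + \mathbf{N}_l$, with $\mathbf{Y}_l = [Y_{1,l} \ldots Y_{M,l}]^T$, $\mathbf{S}_l = [S_{1,l} \ldots S_{K,l}]^T$, $\mathbf{N}_l = [N_{1,l} \ldots N_{M,l}]^T$, and $\mathbf{H}_l = [\mathbf{H}_{1,l} \ldots \mathbf{H}_{K,l}]$. The channel vector of the kth user is defined as $\mathbf{H}_{k,l} = [H_{k,1,l} \ldots H_{k,M,l}]^T$.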

Multi-user IB-DFE receiver strategies
In this section, we present in detail the multi-user iterative frequency domain receiver design strategies based on the IB-DFE concept [6]. Two iterative approaches are considered: SIC and PIC.

IB-DFE SIC approach
Figure 2 shows the main blocks of the IB-DFE SIC-based process. For each iteration, we detect all K UEs on the lth subcarrier in a successive way, using the most up-to-date estimates of the transmit data symbols associated with each UE to cancel the corresponding interference. Thus, this receiver can be regarded as an iterative SIC scheme. However, as with conventional single-user IB-DFE-based receivers, we take into account the reliability of the block data estimates associated with the UEs for each detection and interference cancellation procedure.
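From Figure 2, we can see that at the ith iteration, the signal on the lth subcarrier associated with the kth UE, before the L-IDFT operation, is given by $\tilde{S}^{(i)}_{k,l} = \mathbf{F}^{(i)T}_{k,l}\mathbf{Y}_l - \mathbf{B}^{(i)T}_{k,l}\bar{\mathbf{S}}^{(i-1)}_{k,l}$, with $\mathbf{F}^{(i)}_{k,l}$ (of size M × 1) and $\mathbf{B}^{(i)}_{k,l}$ (of size K × 1) denoting the feedforward and feedback vector coefficients of the kth UE applied on the lth subcarrier, respectively. The vector $\bar{\mathbf{S}}^{(i-1)}_{k,l}$ is the DFT of the block of time domain average values conditioned on the detector output for user k and iteration i. Clearly, the elements $\bar{S}_{k',l}$ of this vector are associated with the current iteration for the UEs already detected (k' < k) and with the previous iteration for the UE that is being detected, as well as for the UEs still not detected in this iteration. For normalized QPSK constellations (i.e., $s_{k,n} = \pm 1 \pm j$), the average values are given by [13] $\bar{s}_{k,n} = \tanh(L^{\mathrm{Re}}_{k,n}/2) + j\tanh(L^{\mathrm{Im}}_{k,n}/2)$, where $L^{\mathrm{Re}}_{k,n} = \frac{2}{\sigma_k^2}\mathrm{Re}\{\tilde{s}_{k,n}\}$, $L^{\mathrm{Im}}_{k,n} = \frac{2}{\sigma_k^2}\mathrm{Im}\{\tilde{s}_{k,n}\}$, and $\sigma_k^2 = \frac{1}{2}E[|s_{k,n} - \tilde{s}_{k,n}|^2] \approx \frac{1}{2L}\sum_{n=0}^{L-1}|\hat{s}_{k,n} - \tilde{s}_{k,n}|^2$ (7). We should emphasize that although we only consider QPSK constellations, IB-DFE-based schemes in general and our techniques in particular can easily be extended to other constellations. For this purpose, we just need to employ the generalized IB-DFE design of references [29,30]. The hard decisions $\hat{s}_{k,n}$ associated with the symbols $s_{k,n}$ have DFT $\hat{S}_{k,l} \approx \rho_k S_{k,l} + \Delta_{k,l}$, which means that $\bar{S}_{k,l} \approx \rho_k^2 S_{k,l} + \rho_k \Delta_{k,l}$, and in matrix form, we have $\bar{\mathbf{S}}_l \approx \mathbf{P}^2\mathbf{S}_l + \mathbf{P}\boldsymbol{\Delta}_l$. It can be shown that the error $\boldsymbol{\Delta}_l = [\Delta_{1,l} \ldots \Delta_{K,l}]^T$ has zero mean and $\mathbf{P} = \mathrm{diag}(\rho_1, \ldots, \rho_K)$, with the correlation coefficients, defined as $\rho_k = E[\hat{s}_{k,n} s^*_{k,n}] / E[|s_{k,n}|^2]$, being a measure of the reliability of the estimates associated with the ith iteration, approximately given by $\rho_k \approx \frac{1}{2L}\sum_{n=0}^{L-1}\left(|\tanh(L^{\mathrm{Re}}_{k,n}/2)| + |\tanh(L^{\mathrm{Im}}_{k,n}/2)|\right)$. For larger constellations, an estimate of the correlation coefficient can be computed as in [29,30]. For a given iteration and the detection of the kth UE, the iterative receiver equalizer is composed of the coefficients $\mathbf{F}^{(i)}_{k,l}$ and $\mathbf{B}^{(i)}_{k,l}$. These coefficients are computed to maximize the overall signal-to-interference plus noise ratio (SINR) at the FDE output and, therefore, minimize the bit error rate (BER).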
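As an illustration of how the average values and the correlation coefficient could be obtained from the time domain equalizer output for normalized QPSK, the sketch below assumes the standard IB-DFE expressions (hyperbolic tangent of half the log-likelihood ratio, with the error variance estimated from the hard decisions). The function name `qpsk_soft_estimates` and the small variance floor are assumptions of this example, not part of the original derivation.

```python
import numpy as np

def qpsk_soft_estimates(s_tilde):
    """Hard decisions, average values, and reliability for QPSK symbols ±1 ± j."""
    s_hat = np.sign(s_tilde.real) + 1j * np.sign(s_tilde.imag)       # hard decisions
    # Per-dimension error variance, estimated from the hard decisions (assumed estimator);
    # the floor avoids a division by zero when the input is noiseless.
    sigma2 = max(np.mean(np.abs(s_hat - s_tilde) ** 2) / 2.0, 1e-12)
    llr_re = 2.0 * s_tilde.real / sigma2                              # log-likelihood ratios
    llr_im = 2.0 * s_tilde.imag / sigma2
    s_bar = np.tanh(llr_re / 2.0) + 1j * np.tanh(llr_im / 2.0)        # average values
    rho = np.mean(np.abs(np.tanh(llr_re / 2.0)) + np.abs(np.tanh(llr_im / 2.0))) / 2.0
    return s_hat, s_bar, rho

rng = np.random.default_rng(3)
bits = rng.integers(0, 2, size=(256, 2))
s = (1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])
noise = rng.standard_normal(256) + 1j * rng.standard_normal(256)
hat_clean, bar_clean, rho_clean = qpsk_soft_estimates(s + 0.01 * noise)   # reliable estimates
hat_noisy, bar_noisy, rho_noisy = qpsk_soft_estimates(s + 1.0 * noise)    # unreliable estimates
```

When the equalizer output is reliable, the average values approach the hard decisions and the correlation coefficient approaches one; when it is noisy, the coefficient drops and the feedback is weighted less.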
If we consider a normalized FDE, this is formally equivalent to minimizing the MSE. For a QPSK constellation with Gray mapping, the BER can be approximated as in (11), where Q(x) denotes the well-known Gaussian Q-function and $\mathrm{MSE}_{k,l}$ is the mean square error on the frequency domain samples, given by (12). For the sake of simplicity, the dependence on the iteration index is dropped in (12) and in the following equations. After some mathematical manipulations, it can be shown that (12) reduces to (13). The different correlation matrices of (13) are given by (14), with $\mathbf{R}_S = \sigma_S^2\mathbf{I}_K$ and $\mathbf{R}_N = \sigma_N^2\mathbf{I}_M$ being the correlation matrices of the data symbols and the noise on each subcarrier.
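The link between MSE and BER can be illustrated with a short sketch. It assumes an unbiased equalizer output whose overall error is Gaussian with equal variance on the real and imaginary parts, and it takes the time domain MSE as input; the function names and the `symbol_energy` parameter are assumptions of this example, and the normalization may differ from the one used in the analytical expressions of the paper.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def qpsk_ber_from_mse(mse, symbol_energy=2.0):
    """Approximate BER of Gray-mapped QPSK for a given complex error variance.

    Per dimension, the amplitude is sqrt(symbol_energy / 2) and the error
    variance is mse / 2, so the argument of Q reduces to sqrt(symbol_energy / mse).
    The default symbol_energy = 2 corresponds to symbols ±1 ± j.
    """
    return q_function(math.sqrt(symbol_energy / mse))

ber_low_mse = qpsk_ber_from_mse(0.1)
ber_high_mse = qpsk_ber_from_mse(1.0)
```

Because Q is monotonically decreasing, any reduction of the MSE translates directly into a lower BER, which is why the equalizer coefficients are designed under an MSE criterion.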
From (11), we can see that to minimize the BER of each UE, we need to minimize the MSE of each UE on each subcarrier. However, minimizing the MSE alone may lead to biased estimates; to avoid this, we force the received amplitude of each user to one. The resulting problem is therefore the minimization of the MSE subject to this amplitude constraint. We use the Karush-Kuhn-Tucker (KKT) conditions [31] to solve the optimization at each step with all but one variable fixed. In the Lagrangian associated with this problem, $\mu_k$ is the Lagrangian multiplier [32]. After straightforward but lengthy mathematical manipulation of the KKT conditions, we obtain the feedforward and feedback vector coefficients with the iterative index dependence. The Lagrangian multiplier is selected, at each iteration i, to ensure the amplitude constraint. It should be emphasized that for the first iteration (i = 1), and for the first UE to be detected, $\mathbf{P}^{(0)}$ is a null matrix and $\bar{\mathbf{S}}^{(0)}_{k,l}$ is a null vector.
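For the first iteration, when no reliable feedback is available, the receiver behaves like a conventional MMSE frequency domain multi-user equalizer with the amplitude normalization discussed above. The sketch below illustrates only that special case, not the full iterative solution: the function name `mmse_first_iteration`, the array layout, and the convention that the filter is applied as `W[l] @ Y[l]` are assumptions of this example.

```python
import numpy as np

def mmse_first_iteration(H, noise_to_signal):
    """Per-subcarrier MMSE filters for the first iteration (no feedback).

    H: array of shape (L, M, K), one M x K channel matrix per subcarrier.
    noise_to_signal: ratio sigma_N^2 / sigma_S^2.
    Returns W of shape (L, K, M); the equalized samples are W[l] @ Y[l].
    """
    L, M, K = H.shape
    Hh = np.conj(np.swapaxes(H, 1, 2))                   # H^H per subcarrier, shape (L, K, M)
    A = H @ Hh + noise_to_signal * np.eye(M)             # M x M matrix to invert per subcarrier
    W = Hh @ np.linalg.inv(A)                            # unnormalized MMSE filters
    gain = np.einsum("lkm,lmk->k", W, H) / L             # average amplitude seen by each user
    return W / gain[:, None]                             # force unit average amplitude per user

rng = np.random.default_rng(1)
H = (rng.standard_normal((16, 2, 2)) + 1j * rng.standard_normal((16, 2, 2))) / np.sqrt(2)
W = mmse_first_iteration(H, noise_to_signal=0.1)
W_low_noise = mmse_first_iteration(H, noise_to_signal=1e-9)
```

As the noise-to-signal ratio goes to zero with M = K, the filter approaches channel inversion, so `W_low_noise[l] @ H[l]` is close to the identity on every subcarrier.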

IB-DFE PIC approach
Figure 3 shows the main blocks of the IB-DFE PIC-based process. For each iteration, we detect all K UEs on the lth subcarrier in a parallel way, using the most up-to-date estimates of the transmit data symbols to cancel the residual interference that could not be cancelled in the first equalizer block. Thus, this receiver can be regarded as an iterative PIC scheme [20]. However, as with conventional IB-DFE-based receivers and the above SIC approach, we take into account the reliability of the block data estimates for each detection procedure.
From Figure 3, the received signal on the lth subcarrier of all UEs, before the L-IDFT operation, is given by (21), where $\mathbf{F}^{(i)}_l$ is a matrix of size M × K with all UEs' feedforward vector coefficients and $\mathbf{B}^{(i)}_l$ is a matrix of size K × K with all UEs' feedback vector coefficients. For this approach, the matrices $\mathbf{F}^{(i)}_l$ and $\mathbf{B}^{(i)}_l$ are computed to minimize the average bit error rate (BER) of all UEs; for a QPSK constellation, the average BER can be approximated as in (22), where $\mathrm{MSE}_l$ is the overall mean square error on the frequency domain samples, given by (23). Replacing (21) in (23) and after some mathematical manipulations, it can be shown that (23) reduces to an expression that involves the additional correlation matrices $\mathbf{R}^{l}_{Y,S}$ and $\mathbf{R}_{\bar{S},S}$. Note that the correlation matrices $\mathbf{R}^{l}_{Y}$, $\mathbf{R}_{\bar{S},\bar{S}}$, and $\mathbf{R}^{l}_{\bar{S},Y}$ were already defined in (14). Contrary to the SIC approach, to minimize the average BER, we need to minimize the overall MSE at each subcarrier. Here, to avoid the bias, we force the received amplitude to K. The resulting problem is the minimization of the overall MSE subject to this amplitude constraint, and we also use the KKT conditions to solve it; in the associated Lagrangian, μ is the Lagrangian multiplier. After lengthy mathematical manipulation of the KKT conditions, we finally obtain the feedforward and feedback matrices with the iterative index dependence. In this approach, the Lagrangian multiplier is selected, at each iteration i, to ensure the amplitude constraint. Since all users are detected in parallel, for the first iteration (i = 1), $\mathbf{P}^{(0)}$ is a null matrix and $\bar{\mathbf{S}}^{(0)}_l$ is a null vector.
The complexity of the SIC approach is slightly higher than that of the PIC one. For the SIC, we need to invert a matrix of size M × M for each user on each iteration, while for the PIC, we need to invert a single matrix of size M × M for all users on each iteration, i.e., the SIC approach requires K − 1 more matrix inversions per iteration. Since in the SIC receiver structure each user is detected individually and sequentially, the delay is also higher.
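The complexity comparison can be made concrete with a small counting sketch. It assumes that the inversions are counted per subcarrier and that nothing is reused between iterations; the function name `matrix_inversions` is hypothetical.

```python
def matrix_inversions(n_users, n_iterations, scheme):
    """Count the M x M matrix inversions per subcarrier; scheme is 'SIC' or 'PIC'."""
    if scheme not in ("SIC", "PIC"):
        raise ValueError("scheme must be 'SIC' or 'PIC'")
    # SIC inverts one matrix per user and iteration, PIC one matrix per iteration.
    per_iteration = n_users if scheme == "SIC" else 1
    return per_iteration * n_iterations

sic_count = matrix_inversions(n_users=4, n_iterations=4, scheme="SIC")
pic_count = matrix_inversions(n_users=4, n_iterations=4, scheme="PIC")
extra_per_iteration = (sic_count - pic_count) // 4
```

For K = 4 users and four iterations, this gives 16 inversions for the SIC against 4 for the PIC, i.e., K − 1 = 3 additional inversions per iteration.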

Performance results
In this section, we present a set of performance results, analytical and numerical, for the proposed IB-DFE-based PIC and SIC receiver schemes. Two different scenarios are considered: in scenario 1, we assume two UEs (K = 2) and a BS equipped with two antennas (M = 2); in scenario 2, we assume four UEs (K = 4) and a BS equipped with four antennas (M = 4).
For both scenarios, the main parameters used in the simulations are as follows: N-FFT size of 1,024; L-DFT size of 128 (this represents the data symbol block associated with each UE); sampling frequency of 15.36 MHz; useful symbol duration of 66.6 μs; cyclic prefix duration of 5.21 μs; overall OFDM symbol duration of 71.86 μs; subcarrier separation of 15 kHz; and a QPSK constellation under the Gray mapping rule, unless otherwise stated. Most of the parameters are based on the LTE system [33].
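The relation between these parameters can be checked with simple arithmetic. The sketch below derives the subcarrier separation and the durations from the FFT size and the sampling frequency; rounding the cyclic prefix to a whole number of samples is an assumption of this example.

```python
n_fft = 1024
sampling_frequency = 15.36e6                                # Hz

subcarrier_separation = sampling_frequency / n_fft          # Hz
useful_duration_us = n_fft / sampling_frequency * 1e6       # microseconds
cp_samples = round(5.21e-6 * sampling_frequency)            # cyclic prefix length in samples
cp_duration_us = cp_samples / sampling_frequency * 1e6
total_duration_us = useful_duration_us + cp_duration_us
```

This gives a subcarrier separation of 15 kHz, a useful duration of about 66.67 μs, and a cyclic prefix of 80 samples (about 5.21 μs). The resulting total of about 71.88 μs differs slightly from the quoted overall duration, presumably because of rounding.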
The channels between each UE and the BS are uncorrelated and severely time dispersive, each one with rich multipath propagation and uncorrelated Rayleigh fading for different multipath components. Specifically, we assume an $L_p$ = 32-path frequency-selective block Rayleigh fading channel with uniform power delay profile (i.e., each path with average power of $1/L_p$). The same conclusions could be drawn for other multipath fading channels, provided that the number of separable multipath components is high. Also, we assume perfect channel state information, perfect synchronization, and $|\alpha_k|^2 = 1, \forall k$. The results are presented in terms of the average bit error rate (BER) as a function of $E_b/N_0$, with $E_b$ denoting the average bit energy and $N_0$ denoting the one-sided noise power spectral density. In all scenarios, we present the theoretical and simulation average BER performances for both proposed receiver structures: IB-DFE PIC and SIC. For the sake of comparison, we also include the matched filter bound (MFB) performance.
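A channel realization of this kind can be generated as sketched below. The example assumes sample-spaced paths, which the text does not specify, and the function name `rayleigh_frequency_response` is hypothetical; one independent realization would be drawn per UE and BS antenna pair.

```python
import numpy as np

def rayleigh_frequency_response(n_paths=32, n_fft=1024, rng=None):
    """Frequency response of one block-fading Rayleigh channel with uniform power delay profile."""
    rng = np.random.default_rng() if rng is None else rng
    scale = np.sqrt(1.0 / (2.0 * n_paths))               # each path has average power 1 / n_paths
    taps = scale * (rng.standard_normal(n_paths) + 1j * rng.standard_normal(n_paths))
    return np.fft.fft(taps, n_fft)                       # zero-padded FFT of the impulse response

rng = np.random.default_rng(2)
H = np.array([rayleigh_frequency_response(rng=rng) for _ in range(500)])
average_power = np.mean(np.abs(H) ** 2)                  # close to 1 by construction
```

Averaged over many realizations, the power per subcarrier is close to one, so the average received energy per user does not depend on the number of paths.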
Figures 4 and 5 show the performance results for the first scenario, considering IB-DFE PIC and IB-DFE SIC, respectively. Starting with the results presented in Figure 4, it is clear that the proposed analytical approach is very precise, especially for the first iteration. Note that for this iteration, the IB-DFE PIC reduces to the conventional MMSE frequency domain multi-user equalizer, since $\mathbf{P}^{(0)}$ is a null matrix and $\bar{\mathbf{S}}^{(0)}_l$ is a null vector. Although there is a small difference between theoretical and simulated results for the subsequent iterations, mainly due to errors in the estimation of the variance of the overall error at the FDE output (see (7)) and the non-Gaussian nature of the overall error, our analytical approach is still very accurate, with differences of just a few tenths of a decibel. As expected, the BER performance improves with the iterations, and it can be observed that after a few iterations the performance is close to the one given by the MFB, mainly in the high-SNR regime. Therefore, the proposed IB-DFE PIC scheme is quite efficient at separating the users and achieving the maximum system diversity order, with only a few iterations.
From Figure 5, we can also see that the analytical approach proposed for the IB-DFE SIC structure is very accurate. With as few as four iterations, the BER performance comes very close to the limit given by the MFB. This means that this receiver structure is also able to efficiently separate the UEs, while taking advantage of the space-frequency diversity inherent to the MIMO SC-FDMA-based systems. Comparing the SIC and PIC approaches, it is clear that for the first iteration the SIC approach outperforms the PIC one: a penalty of approximately 1 dB of the PIC relative to the SIC can be observed at a BER of $10^{-3}$. This is because the SIC-based structure, when detecting a given user, takes into account the previously detected ones, with the exception of the first user. However, as the number of iterations increases, the performance of the PIC approach tends to the one given by the SIC approach. We can observe that the BER performance of both approaches is basically the same for four iterations. Figures 6 and 7 show the performance results for the second scenario, considering IB-DFE PIC and SIC, respectively. From these figures, we can basically draw the same conclusions as from the previous ones. We can see that, similarly to the first scenario, the proposed analytical approaches for both the IB-DFE SIC and PIC structures are very accurate. However, comparing the results obtained for this scenario with the ones obtained for scenario 1, we can see that the overall performance is much better. This is because our receiver structures can benefit from the higher space-diversity order available in this scenario, since they are efficient in removing both multi-user and inter-carrier interference.
The previous results indicate that IB-DFE receivers can have excellent performance, close to the MFB, for MIMO systems with QPSK constellations. One question that arises naturally is whether this is still valid for larger constellations such as QAM constellations. In fact, the performance of a DFE for larger constellations can be seriously affected by error propagation effects. As an example, we present in Figure 8 the performance results for 16-QAM constellations in the second scenario, considering the IB-DFE SIC approach. Clearly, we are still able to approach the MFB, although we need more iterations, the convergence is less smooth, and we only approach the MFB for lower BER (and, naturally, larger SNR). Although these good results might be somewhat surprising, we should keep in mind that an IB-DFE is not a conventional DFE due to the non-causal nature of the feedback. Moreover, the error propagation effects are much lower in IB-DFE receivers for the following reasons. First, symbol errors (which are in the time domain) are spread over all frequencies; due to the frequency domain nature of the feedback loop input, a symbol error has only a minor effect on each frequency. Second, the FDE is designed to take into account the reliability of the estimates employed in the feedback loop; when we have a large number of symbol errors, the reliability decreases and the weight of the feedback part decreases. Third, when we have a decision error, we usually move to one of the closest constellation symbols, i.e., the magnitude of the error is usually the minimum Euclidean distance of the constellation, regardless of the constellation size. This is especially important for larger constellations.
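The first of these reasons can be illustrated numerically: a single decision error in the time domain block has the same magnitude on every frequency domain sample. The block length, the position of the error, and its magnitude of 2 (the minimum distance of the ±1 ± j constellation) are arbitrary choices made for this sketch.

```python
import numpy as np

L = 128
error = np.zeros(L, dtype=complex)
error[17] = 2.0                        # one wrong decision, off by the minimum distance
E = np.fft.fft(error)                  # frequency domain view seen by the feedback loop
energy_share = np.abs(E) ** 2 / np.sum(np.abs(E) ** 2)
```

Each of the 128 frequency samples carries only 1/128 of the error energy, so a single wrong decision perturbs every subcarrier only slightly.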
As we pointed out, an IB-DFE can be regarded as a low complexity turbo equalizer implemented in the frequency domain which does not employ a channel decoder in the feedback loop. For this reason, it has a turbo-like behavior, with good performance provided that the BER is low enough. That is why we can only approach the MFB for larger SNR.

Conclusions
In this paper, we designed and evaluated multi-user receiver structures based on the IB-DFE principle for uplink SIMO SC-FDMA systems. Two multi-user IB-DFE processing schemes were considered, one SIC-based and one PIC-based. In the first approach, the equalizer vectors were computed by minimizing the mean square error (MSE) of each individual user at each subcarrier. In the second one, the equalizer matrices were obtained by minimizing the overall MSE of all users at each subcarrier. For both cases, we proposed a quite accurate analytical approach for obtaining the performance of the proposed receivers.
The results have shown that the proposed receiver structures are quite efficient at separating the users, while allowing a close-to-optimum space-diversity gain, with performance close to the MFB for severely time-dispersive channels with only a few iterations. The performance of both PIC and SIC receiver structures is basically the same after three or four iterations. However, the main drawback of the SIC approach is the delay in the detection procedure, which is larger than for the PIC, since it detects one user at a time. Thus, for practical systems where the delay is a critical issue, the PIC approach can be the best choice.
To conclude, we can clearly state that these techniques are an excellent choice for the uplink SC-FDMA-based systems, already adopted by the LTE standard.

Figure 2 Iterative receiver structure for UE k based on IB-DFE SIC approach.

Figure 3 Iterative receiver structure based on IB-DFE PIC approach.

Figure 5 Performance of the IB-DFE SIC structure for scenario 1.

Figure 6 Performance of the IB-DFE PIC structure for scenario 2.

Figure 7 Performance of the IB-DFE SIC structure for scenario 2.

Figure 8 Performance of the IB-DFE SIC structure for scenario 2 and 16-QAM.

l represents the channel between user k and the mth antenna of the BS on subcarrier l, where H
Figure 4 Performance of the IB-DFE PIC structure for scenario 1.