A Chip-Level BSOR-Based Linear GSIC Multiuser Detector for Long-Code CDMA Systems

We introduce a chip-level linear group-wise successive interference cancellation (GSIC) multiuser structure that is asymptotically equivalent to block successive over-relaxation (BSOR) iteration, which is known to outperform the conventional block Gauss-Seidel iteration by an order of magnitude in terms of convergence speed. The main advantage of the proposed scheme is that it uses the spreading codes directly instead of the cross-correlation matrix. It therefore avoids the calculation of the cross-correlation matrix, which requires $2NK^2$ floating point operations (flops), where N is the processing gain and K is the number of users, and this significantly reduces the overall computational complexity. Thus it is suitable for long-code CDMA systems such as IS-95 and UMTS, where the cross-correlation matrix changes every symbol. We study the convergence behavior of the proposed scheme using two approaches and prove that it converges to the decorrelator detector if the over-relaxation factor is in the interval ]0, 2[. Simulation results are in excellent agreement with theory.


INTRODUCTION
Current cellular systems such as IS-95 and UMTS are long-code CDMA systems. The spreading codes used in the uplink channels are long codes which span thousands of symbols. These spreading codes are also known as random codes since they appear to change randomly from one symbol period to another.
The main reason for not incorporating multiuser detectors in current cellular systems is that the latter are long-code systems while most multiuser detectors developed until now assume a short-code system [1]. Depending on whether a long-code or a short-code system is considered, multiuser detectors can be divided into two categories: symbol level (also known as narrowband) and chip level (also known as wideband) [2]. Symbol-level multiuser detectors act on the matched filter outputs while chip-level multiuser detectors act directly on the received signal. Moreover, symbol-level multiuser detectors make use of the cross-correlation coefficients whereas chip-level multiuser detectors use the spreading codes directly and thus avoid the calculation of the cross-correlation matrix. This very attractive property of chip-level multiuser detectors is the key point for developing low-complexity multiuser detectors for long-code CDMA systems.
Chip-level linear multistage detectors have received significant attention in recent years due to their ability to approximate the decorrelator/LMMSE detectors efficiently but with much less computational complexity [2]. At each stage, the estimated interference from the current user/group of users is subtracted from the total signal to reduce the interference seen by other users. Depending on the interference cancellation procedure implemented at each stage, two types of multistage detectors are covered in the literature: successive interference cancellation (SIC) and parallel interference cancellation (PIC) detectors [3][4][5]. Successive interference cancellation is one of the simplest multiuser detection techniques. It requires only marginal additional computational complexity over the conventional matched filter detector. Chip-level linear successive interference cancellation and chip-level linear parallel interference cancellation detectors are shown to be equivalent to the Gauss-Seidel and Jacobi iterative methods used in matrix inversion, respectively [4,5]. While the linear chip-level PIC is not stable, the chip-level linear SIC is stable and exhibits less computational complexity at the expense of a longer detection delay. In order to reduce the long detection delay of the chip-level linear SIC, chip-level linear GSIC detectors were proposed in [6,7]. The authors in [6] suggested a chip-level linear GSIC detection scheme and showed that, if it converges, it converges to the decorrelator detector only. The authors in [7] proposed a chip-level linear GSIC detection scheme that can converge not only to the decorrelator detector, as in [6], but to the LMMSE detector as well.
In this work, we prove that the scheme proposed in [6] is in fact equivalent to the block Gauss-Seidel iterative method if the group-detection scheme is the decorrelator detector. Moreover, we propose a new scheme that is equivalent to the BSOR iterative method, which is well known to outperform the conventional block Gauss-Seidel method by an order of magnitude in terms of convergence speed. We study its convergence behavior and determine the condition of convergence using two different approaches that lead to the same result.
This work makes two contributions. The first consists of identifying the structure proposed in [6] as a block Gauss-Seidel iterative method if the group-detection scheme is the decorrelator detector. This is very important because it enables the use of the rich theory of iterative methods to study the convergence behavior of the scheme in [6]. The second contribution, which is based on the theory of iterative methods, consists of proposing a weighted group SIC structure that is equivalent to the block SOR iterative method, which is known to exhibit fast convergence.
The work of [7], which is based on matrix transformation, is therefore quite different from the one proposed here: the former proposes a group SIC structure that is able to converge to either the decorrelator detector or the LMMSE detector, whereas the structure proposed here converges to the decorrelator detector only.
Finally, it is important to note that the work reported here considers linear group detection only. Nonlinear group detection is treated in works such as [8].

SYSTEM MODEL AND THE PROPOSED BSOR-BASED LINEAR GSIC STRUCTURE
In this work, we consider an uplink channel in which K users transmit simultaneously over a synchronous additive white Gaussian noise (AWGN) channel using binary phase shift keying (BPSK). Each user is characterized by its own pseudonoise code of length N chips. The received signal is expressed in vector form as

$$r = SAb + n, \tag{1}$$

where $S$ is an $N \times K$ matrix of linearly independent spreading codes, $A$ is a $K \times K$ diagonal matrix of the received amplitudes, $b$ is a $K$-length vector of transmitted binary symbols, and $n$ is an $N$-length vector of independent and identically distributed white Gaussian samples with zero mean and variance $\sigma^2$. These quantities are defined as

$$S = (s_1, s_2, \ldots, s_K), \quad A = \mathrm{diag}(a_1, a_2, \ldots, a_K), \quad b = (b_1, b_2, \ldots, b_K)^T. \tag{2}$$

Here, $s_k$, $a_k$, and $b_k$ are the $N \times 1$ spreading code vector, the received amplitude, and the data symbol of the kth user, respectively.
In the following, we assume that the K users are partitioned into G groups, where the gth group consists of $U_g$ users such that $\sum_{g=1}^{G} U_g = K$, and thus the matrix of spreading codes can be partitioned as $S = (S_1, S_2, \ldots, S_g, \ldots, S_G)$, where $S_g$ is the $N \times U_g$ matrix of spreading codes of the gth group. We define $R = S^T S$ as the cross-correlation matrix of the spreading codes, $R_{i,j} = S_i^T S_j$ as the $(i, j)$th submatrix of $R$, and $A_g$ as the gth diagonal submatrix of $A$. We assume that $R$ and $R_{g,g}$ (for g = 1, 2, . . ., G) are nonsingular (since the spreading codes are assumed to be linearly independent). Therefore, both $R$ and $R_{g,g}$ (for g = 1, 2, . . ., G) are positive definite.
The proposed linear weighted GSIC detector, which we call for brevity the chip-level linear BSOR-GSIC detector, consists of group interference cancellation units (GICU) arranged in a multistage structure of M stages as illustrated in Figure 1. The basic linear GICU is shown in Figure 2. The residual signal $e_{m,g}$ at the input of the mth-stage, gth-group GICU is first despread, multiplied by a transformation matrix $R_{g,g}^{-1}$, and then multiplied by a weighting factor $\mu$ to estimate the vector of partial decision variables $\Delta y_{m,g}$ of the users of the gth group at the mth stage, that is, $\Delta y_{m,g} = \mu R_{g,g}^{-1} S_g^T e_{m,g}$. The vector of decision variables of the users of the gth group at the mth stage is obtained by summing the vector of decision variables of the previous stage $y_{m-1,g}$ and the vector of partial decision variables of the current stage $\Delta y_{m,g}$, that is, $y_{m,g} = \Delta y_{m,g} + y_{m-1,g}$.
The residual signal for the next GICU is obtained by spreading the vector of partial decision variables $\Delta y_{m,g}$ and subtracting it from the residual signal of the current GICU $e_{m,g}$, that is, $e_{m,g+1} = e_{m,g} - S_g \Delta y_{m,g}$.
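The GICU recursion described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `bsor_gsic`, the random binary codes, the noise level, and the choice of 200 stages are all hypothetical and serve only to check numerically that the decision variables approach the decorrelator output.

```python
import numpy as np

def bsor_gsic(r, S_groups, mu, M):
    """Chip-level linear BSOR-GSIC sketch: M stages over the groups in S_groups."""
    e = r.astype(float).copy()                        # residual signal, e_{1,1} = r
    y = [np.zeros(Sg.shape[1]) for Sg in S_groups]    # decision variables, y_{0,g} = 0
    for _ in range(M):                                # stages
        for g, Sg in enumerate(S_groups):             # one GICU per group
            Rgg = Sg.T @ Sg                           # group cross-correlation R_{g,g}
            dy = mu * np.linalg.solve(Rgg, Sg.T @ e)  # despread, decorrelate, weight
            y[g] = y[g] + dy                          # accumulate decision variables
            e = e - Sg @ dy                           # respread and cancel
    return np.concatenate(y)

# Hypothetical test scenario: random binary codes, unit amplitudes, mild noise.
rng = np.random.default_rng(0)
N, K, G = 31, 8, 4
S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)
b = rng.choice([-1.0, 1.0], size=K)
r = S @ b + 0.1 * rng.standard_normal(N)

y = bsor_gsic(r, np.split(S, G, axis=1), mu=1.2, M=200)
y_dec = np.linalg.solve(S.T @ S, S.T @ r)             # decorrelator output
print(np.max(np.abs(y - y_dec)))
```

Only the group matrices $R_{g,g}$ are formed; the full cross-correlation matrix appears in the last lines solely to compute the reference decorrelator output.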

CONVERGENCE ANALYSIS
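Let $e_{1,1} = r$ be the input signal to the chip-level linear BSOR-GSIC scheme. The vector of decision variables of the gth group of users at the mth stage of the chip-level linear BSOR-GSIC detector is derived as

$$y_{m,g} = (1-\mu)\, y_{m-1,g} + \mu R_{g,g}^{-1}\Big(S_g^T r - \sum_{j=1}^{g-1} R_{g,j}\, y_{m,j} - \sum_{j=g+1}^{G} R_{g,j}\, y_{m-1,j}\Big). \tag{3}$$

The exact derivation of (3) is given in Appendix A. At convergence, we have $y_{m,g} = y_{m-1,g} = y_{\infty,g}$, where $y_{\infty,g}$ is the vector of decision variables at convergence; therefore (3) is equivalent to

$$y_{\infty,g} = (1-\mu)\, y_{\infty,g} + \mu R_{g,g}^{-1}\Big(S_g^T r - \sum_{j \neq g} R_{g,j}\, y_{\infty,j}\Big). \tag{4}$$

Equation (4) is equivalent to

$$\mu R_{g,g}^{-1}\Big(S_g^T r - \sum_{j=1}^{G} R_{g,j}\, y_{\infty,j}\Big) = 0. \tag{5}$$

Since $R_{g,g}$ is nonsingular, (5) can be written in matrix form as

$$S^T r - R\, y_{\infty} = 0, \tag{6}$$

where $y_{\infty} = (y_{\infty,1}^T, \ldots, y_{\infty,G}^T)^T$. Finally, (6) can be written as

$$y_{\infty} = R^{-1} S^T r. \tag{7}$$

Therefore, if the proposed scheme converges, it converges to the decorrelator detector.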

First approach
This approach allows the identification of the proposed scheme as the BSOR iterative method, which facilitates the determination of the condition of convergence. Let us first establish the analogy between the proposed scheme and the corresponding iterative method used to solve a set of linear equations, which is known as the BSOR method.
The matrix $R$ can be decomposed into three parts, that is, $R = D - L - U$, where $D$ is a block diagonal matrix, that is, $D = \mathrm{diag}(R_{1,1}, R_{2,2}, \ldots, R_{G,G})$, and $L$ and $U$ are the remaining lower-left and upper-right block triangular parts of $R$, respectively. After some manipulations, (3) can be written in matrix form as

$$y_m = (D - \mu L)^{-1}\big[(1-\mu) D + \mu U\big]\, y_{m-1} + \mu (D - \mu L)^{-1} S^T r, \tag{8}$$

where $y_m = (y_{m,1}^T, \ldots, y_{m,G}^T)^T$, which is exactly the BSOR iteration. See Appendix B for the exact derivation of (8). Note that if μ = 1 (this is the case for the scheme proposed in [6] where the group-detection scheme is the decorrelator detector), the iteration in (8) reduces to the block Gauss-Seidel iteration. For the convergence of (8), we use Corollary 1 [9]. Thus, for real μ, the iteration in (8) converges if μ ∈ ]0, 2[. Nevertheless, one should set μ within the interval ]1, 2[, which corresponds to over-relaxation (acceleration), since the interval ]0, 1[ corresponds to under-relaxation (deceleration) and is basically used to ensure convergence of the block Gauss-Seidel iteration when it is not convergent. The optimum value of μ, for which the convergence speed is maximum, depends on the maximum eigenvalue of the iteration matrix, which is costly to compute. However, one can get a cheap and fairly accurate estimate of the optimum value of μ based on an upper bound on the maximum eigenvalue of the iteration matrix as in [10].
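The role of the relaxation factor can be checked numerically by forming the BSOR iteration matrix $(D - \mu L)^{-1}[(1-\mu)D + \mu U]$ and computing its spectral radius. The sketch below is illustrative only: the helper names `bsor_matrices`, `bsor_iteration_matrix`, and `spectral_radius`, as well as the random binary codes and group sizes, are assumptions made for the example.

```python
import numpy as np

def bsor_matrices(R, sizes):
    """Split R = D - L - U into its block diagonal part and the negated
    strictly lower / strictly upper block triangular parts."""
    K = R.shape[0]
    edges = np.cumsum([0] + list(sizes))
    mask = np.zeros((K, K), dtype=bool)
    for a, b in zip(edges[:-1], edges[1:]):
        mask[a:b, a:b] = True                 # mark the diagonal blocks R_{g,g}
    D = np.where(mask, R, 0.0)
    L = -np.tril(R - D)                       # R - D is zero inside diagonal blocks
    U = -np.triu(R - D)
    return D, L, U

def bsor_iteration_matrix(R, sizes, mu):
    D, L, U = bsor_matrices(R, sizes)
    return np.linalg.solve(D - mu * L, (1 - mu) * D + mu * U)

def spectral_radius(T):
    return np.max(np.abs(np.linalg.eigvals(T)))

# Hypothetical example: random binary codes, four groups of two users.
rng = np.random.default_rng(0)
N, K, sizes = 31, 8, [2, 2, 2, 2]
S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)
R = S.T @ S

radii = {mu: spectral_radius(bsor_iteration_matrix(R, sizes, mu))
         for mu in (0.5, 1.0, 1.5, 2.1)}
for mu, rho in radii.items():
    print(f"mu = {mu}: spectral radius = {rho:.3f}")
```

For a positive definite $R$, the radius stays below one for the values of μ inside ]0, 2[. For μ = 2.1 it cannot be smaller than |1 − μ| = 1.1, because the determinant of the iteration matrix equals $(1-\mu)^K$.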

Second approach
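This approach was used in [6] to study the convergence behavior of the GSIC detector. We adopt it here to determine the condition of convergence of the proposed scheme. From Figure 2, the residual signal passed from one GICU to the next satisfies

$$e_{m,g+1} = \big(I - \mu S_g R_{g,g}^{-1} S_g^T\big)\, e_{m,g} = B_g\, e_{m,g},$$

where $B_g = I - \mu S_g R_{g,g}^{-1} S_g^T$. Following this recursion over one complete stage, the residual signal at the input of the gth GICU evolves as

$$e_{m+1,g} = \Omega_g\, e_{m,g}, \qquad \Omega_g = B_{g-1} \cdots B_1 B_G \cdots B_g.$$

Therefore, the chip-level linear BSOR-GSIC converges if

$$|\lambda_{\max}(\Omega_g)| < 1, \quad 1 \le g \le G,$$

where $\lambda_{\max}$ is the maximum eigenvalue. Since for square matrices $X$ and $Y$ with the same dimensions the matrices $XY$ and $YX$ have the same eigenvalues, all the $\Omega_g$, $1 \le g \le G$, have the same eigenvalues. Thus, we consider the case where g = G.
Consequently, the chip-level linear BSOR-GSIC converges if

$$\big|\lambda_{\max}(B_{G-1} \cdots B_1 B_G)\big| < 1. \tag{14}$$

To verify this condition, we use a lemma from [6].
By this lemma, if $|\lambda_{\max}(B_g)|$, $1 \le g \le G$, is less than one, then the condition in (14) is satisfied and the linear BSOR-GSIC is guaranteed to converge. We have $\max_{1 \le g \le G} |\lambda_{\max}(B_g)| = \max_{1 \le g \le G} |1 - \mu \lambda_{\max}(S_g R_{g,g}^{-1} S_g^T)|$, but $S_g R_{g,g}^{-1} S_g^T$ is a projection matrix and thus $|\lambda_{\max}(S_g R_{g,g}^{-1} S_g^T)| = 1$. The condition therefore becomes $|1 - \mu| < 1$, that is, 0 < μ < 2, which is the same condition as for the BSOR method.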
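The projection property used in this argument is easy to verify numerically. The following sketch, with a hypothetical random group code matrix, checks that $S_g R_{g,g}^{-1} S_g^T$ is idempotent with trace $U_g$, and that $I - \mu S_g R_{g,g}^{-1} S_g^T$ scales the column space of $S_g$ by $1 - \mu$, whose magnitude is below one exactly when 0 < μ < 2.

```python
import numpy as np

# Hypothetical group: U_g = 4 random binary codes of length N = 31.
rng = np.random.default_rng(1)
N, Ug, mu = 31, 4, 1.2
Sg = rng.choice([-1.0, 1.0], size=(N, Ug)) / np.sqrt(N)

P = Sg @ np.linalg.solve(Sg.T @ Sg, Sg.T)   # S_g R_{g,g}^{-1} S_g^T
B = np.eye(N) - mu * P                      # cancellation operator of one GICU

print(np.allclose(P @ P, P))                # idempotent: P is a projection
print(np.trace(P))                          # equals the rank U_g, up to rounding
print(np.allclose(B @ Sg, (1 - mu) * Sg))   # B acts as (1 - mu) on span(S_g)
```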

COMPUTATIONAL COMPLEXITY
The proposed detector requires

$$M \sum_{g=1}^{G} U_g (4N + 1) + \sum_{g=1}^{G} \Big(11 U_g^3 + \tfrac{3}{2} U_g^2 + U_g\Big)$$

flops. In addition, the evaluation of $R_{g,g} = S_g^T S_g$ needs $2N \sum_{g=1}^{G} U_g^2$ flops. Thus, the total is

$$M \sum_{g=1}^{G} U_g (4N + 1) + \sum_{g=1}^{G} \Big(11 U_g^3 + \big(2N + \tfrac{3}{2}\big) U_g^2 + U_g\Big) \tag{15}$$

flops. The computational complexity of the symbol-level linear BSOR-GSIC detector, which is illustrated in Figure 3, is given by (16). However, the evaluation of the matched filter outputs and of $R = S^T S$ needs $(2N - 1)K$ and $2NK^2$ flops, respectively, and these must be added to obtain the total for the symbol-level detector. Finally, a lower bound on the number of flops needed by the decorrelator detector is given in [11]. It is clear from the expressions above that the computational complexity is a function of the number of users K and the number of users within each group $U_g$. The remaining parameters, such as the processing gain N and the number of stages M, are fixed. It is important to note that the number of stages M needed for convergence should be less than K so that the computational complexity is of order $O(K^2)$ rather than $O(K^3)$ as for the decorrelator detector. This is the situation in most practical cases, as shown in Figure 6. The computational complexity of the proposed chip-level linear BSOR-GSIC detector and of the symbol-level linear BSOR-GSIC detector is illustrated in Figure 4.
Finally, note that for the case of the asynchronous multipath-fading channel, the received signal is not processed in a symbol-by-symbol approach due to the asynchronism of users; instead, a processing window of length W symbols is used. For this case, all the above expressions remain the same except for the number of users K, which should be substituted by WK.
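As a worked example of the chip-level count, the sketch below evaluates the flop expression for the proposed detector with the coefficients taken exactly as printed in the text. The function names `chip_level_flops` and `full_crosscorr_flops` are hypothetical, and the parameter values (N = 31, K = 20, G = 4, M = K/2) are borrowed from the simulation settings purely for illustration.

```python
def chip_level_flops(N, M, group_sizes):
    """Flop count of the chip-level BSOR-GSIC detector, using the per-term
    expressions as printed in the text."""
    stage_cost = M * sum(U * (4 * N + 1) for U in group_sizes)        # M stages of GICUs
    inverse_cost = sum(11 * U**3 + 1.5 * U**2 + U for U in group_sizes)
    corr_cost = 2 * N * sum(U**2 for U in group_sizes)                # R_{g,g} = S_g^T S_g
    return stage_cost + inverse_cost + corr_cost

def full_crosscorr_flops(N, K):
    """Cost of forming the full K-by-K matrix R = S^T S, as stated in the text."""
    return 2 * N * K**2

# N = 31, K = 20 users in G = 4 equal groups, M = K/2 = 10 stages.
print(chip_level_flops(31, 10, [5, 5, 5, 5]))   # 25000 + 5670 + 6200 = 36870.0
print(full_crosscorr_flops(31, 20))             # 24800
```

For the window-based processing of the asynchronous case, the same functions can be reused with the group sizes scaled so that they sum to WK.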

EFFECT OF GROUPING
The effect of users' grouping on the convergence behavior of the GSIC detector was studied partially in [6] and in detail in [12].It was shown that if the group-detection scheme is the decorrelator detector, as in our case, the convergence speed increases with decreasing the number of groups.Thus, it is favorable to decrease the number of groups as much as possible.However, decreasing the number of groups results in increasing the size of each group and therefore increasing the computational complexity of the proposed detector since the cross-correlation matrix of each group of users has to be inverted.Hence, users' grouping (the number of groups) is a system design parameter that is determined by the tradeoff between convergence speed and computational complexity.
Simulation results showing the effect of grouping are provided in Section 8.
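The dependence on grouping can be explored with a short numerical sketch that computes the spectral radius of the block Gauss-Seidel iteration matrix (the case μ = 1) for several numbers of equally sized groups. The helper `block_gs_radius` and the random binary codes are assumptions made for this example, so the printed values only indicate the trend for one random draw. The limiting case G = 1 gives a zero radius, since a single group is the decorrelator itself.

```python
import numpy as np

def block_gs_radius(R, G):
    """Spectral radius of the block Gauss-Seidel iteration matrix
    (D - L)^{-1} U for G equally sized groups, with R = D - L - U."""
    K = R.shape[0]
    group = np.repeat(np.arange(G), K // G)        # group index of each user
    same = group[:, None] == group[None, :]        # diagonal blocks
    lower = group[:, None] > group[None, :]        # strictly lower blocks
    D_minus_L = np.where(same | lower, R, 0.0)     # block lower part of R
    U = -np.where(~(same | lower), R, 0.0)         # negated strictly upper part
    T = np.linalg.solve(D_minus_L, U)
    return np.max(np.abs(np.linalg.eigvals(T)))

# Hypothetical example with K = 8 users and random binary codes.
rng = np.random.default_rng(3)
N, K = 31, 8
S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)
R = S.T @ S

radii = {G: block_gs_radius(R, G) for G in (1, 2, 4, 8)}
for G, rho in radii.items():
    print(f"G = {G}: spectral radius = {rho:.3f}")
```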

EXTENSION TO THE CASE OF ASYNCHRONOUS MULTIPATH FADING CDMA CHANNEL
For the case of the asynchronous CDMA multipath fading channel, the structure presented in Figures 1 and 2 remains the same; only the spreading code matrix of the gth group $S_g$ is substituted by $\tilde{S}_g$, where $\tilde{S}_g = (\tilde{s}_{g,1}, \tilde{s}_{g,2}, \ldots, \tilde{s}_{g,u_g}, \ldots, \tilde{s}_{g,U_g})$ and $\tilde{s}_{g,u_g} = s_{g,u_g} * h_{u_g}$. Here $h_{u_g}$ is the vector of the complex fading coefficients of the $u_g$th user's channel and $*$ denotes the convolution operation. Moreover, the conjugate transpose operators (H) should replace all the transpose operators (T). In this case, the cross-correlation matrix $R = \tilde{S}^H \tilde{S}$ is Hermitian (as a result of combining the code cross-correlation matrix and the complex gain multipath fading channel matrix) and may become singular in some cases. In [11], it was found that the cross-correlation matrix is nonsingular if KL ≤ 3N, where L is the number of multipaths. Based on this practically supported fact, it follows that the proposed structure will converge to the decorrelator detector if the condition KL ≤ 3N is satisfied. Moreover, all the aforementioned convergence analysis and conditions of convergence are valid for this case as well, that is, the proposed structure converges to the decorrelator detector if it converges, and the condition of convergence is 0 < μ < 2. This will be validated in the simulation section.
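The substitution of the spreading codes by their channel-filtered versions can be sketched as follows. This is a simplified illustration under explicit assumptions: chip-spaced channel taps, no relative user delays, and randomly drawn complex Gaussian coefficients; the variable names (`S_eff`, `Lp`) are hypothetical.

```python
import numpy as np

# Hypothetical group: U_g = 3 users, N = 31 chips, Lp = 4 chip-spaced paths.
rng = np.random.default_rng(2)
N, Ug, Lp = 31, 3, 4
S_g = rng.choice([-1.0, 1.0], size=(N, Ug)) / np.sqrt(N)
h = (rng.standard_normal((Lp, Ug)) + 1j * rng.standard_normal((Lp, Ug))) / np.sqrt(2 * Lp)

# Effective code of each user: spreading code convolved with that user's channel.
S_eff = np.stack([np.convolve(S_g[:, u], h[:, u]) for u in range(Ug)], axis=1)

# Group cross-correlation with the conjugate transpose in place of the transpose.
R_gg = S_eff.conj().T @ S_eff

print(S_eff.shape)                                # (N + Lp - 1, U_g)
print(np.allclose(R_gg, R_gg.conj().T))           # Hermitian
print(np.linalg.eigvalsh(R_gg).min() > 0)         # positive definite for this draw
```

With `S_eff` in place of the group code matrix and `.conj().T` in place of `.T`, the GICU operations described earlier carry over unchanged.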

SIMULATION RESULTS
To show the significant reduction in computational complexity that can be gained by using the proposed chip-level multiuser detector, the computational complexity of the chip-level and symbol-level BSOR-GSIC detectors is evaluated using the expressions in Section 6 and plotted in Figure 4. Here G is equal to 4 and N is set to 31 throughout the simulations. Two cases are assumed: in (a) the number of stages needed to approximate the decorrelator detector's average BER performance (average BER of all users) is M = K/2, while in (b) M = K/5. It is clear that in both cases the computational complexity of the proposed chip-level BSOR-GSIC detector is less than that of the symbol-level BSOR-GSIC detector, and the difference between the two increases significantly for high loads, that is, if K/N ≈ 1.
In all subsequent simulations and for the sake of comparison, one should note that the scheme proposed in [6], which we use as a benchmark, is obtained by setting the relaxation factor μ = 1. In the following, we simulate the convergence behavior of the proposed linear BSOR-GSIC multiuser detector in an AWGN channel. For all simulations conducted, Gold codes are used and thus the cross-correlation between users is equal. This removes any effect of a particular grouping or order of cancellation. In Figure 5, the relaxation factor is varied in the interval ]0, 2[ to illustrate its impact on the average BER (average of all users' BER) of the proposed scheme. The SNR is set to 10 dB, M = 4, K = 20, and perfect power control is assumed. Two different groupings are used, specifically, G = 2 and G = 10 equally sized groups. It can be seen that the minimum achievable average BER level is obtained for a relaxation factor of about 1.2 for G = 2 and 1.4 for G = 10. Note that the optimum relaxation factor differs from one grouping to another; this is mainly because the iteration matrix $[D - \mu L]^{-1}[(1 - \mu)D + \mu U]$, on which the optimum relaxation factor relies, depends on the grouping through the block diagonal matrix D.
In Figure 6, the convergence behavior of the proposed detector is investigated. The SNR is set to 8 dB, K = 20, G = 2, and perfect power control is assumed. The number of chip-level linear BSOR-GSIC stages is varied between 1 and 15 and the average BER performance of the proposed detector is evaluated for μ = 1, 1.2, 1.4, 1.6, and 1.8. It is clear that the chip-level linear BSOR-GSIC detector with μ = 1.2 results in the fastest convergence speed (4 stages are enough to converge to the decorrelator detector's average BER performance). One can also notice that for μ = 1.8 the average BER performance of the proposed detector exhibits an oscillating behavior, which is expected because we are close to the region of divergence ([2, +∞)). In Figure 7, the capacity (number of users) of the proposed scheme is evaluated. Here, the SNR is set to 10 dB, G = 2, μ = 1.2, and perfect power control is assumed. We note that with an increasing number of stages the linear BSOR-GSIC detector can support more users; for example, for an average BER of $10^{-3}$ the proposed scheme with M = 3 can support up to 25 users whereas that with M = 2 can support 20 users.
In Figures 8 and 9, the near-far resistance of the proposed scheme is assessed.For the near-far ratio, the amplitude of the first user is fixed and the amplitude of the other users is varied from one to 20 times that of the first user.The BER of the first user versus the near-far ratio is then plotted.The SNR is set to 10 dB, M = 4, and K = 20.For Figure 8 (G = 2), the near-far resistance is maximum for a relaxation factor of 1.2 whereas the near-far resistance is maximum for a relaxation factor of 1.4 in Figure 9 (G = 10).
In Figure 10, the effect of grouping is illustrated. It is clear that as the number of groups decreases (the size of each group increases), the convergence speed of the proposed structure increases. However, the computational complexity increases as well. This agrees well with the results obtained in [12].
In Figure 11, we change the relaxation factor in the interval ]0, 2[ to illustrate its impact on the average BER (average of all users' BER) of the proposed scheme in an asynchronous CDMA multipath Rayleigh fading channel. Now, the SNR is set to 6 dB, M = 4, K = 24, the vehicular A outdoor channel power delay profile for WCDMA is used, and perfect power control is assumed. Two different groupings are used, specifically, G = 2 and G = 12 equally sized groups. We see that the minimum achievable average BER level is for a relaxation factor of about 1.2 for G = 2 and 1.4 for G = 12. It is easy to note that the proposed scheme converges if the relaxation factor is between 0 and 2. Moreover, the minimum achievable BER is for a relaxation factor of about 0.8, which is in good agreement with the theory. Finally, it is important to note that the detection delay is reduced by a factor G/K compared to that of the linear SIC detector.

Corollary 1.
Let $R$ be a K-by-K Hermitian matrix and $R = D - L - U$, where $D$ is a block diagonal matrix, and $L$ and $U$ are the remaining lower-left and upper-right block triangular parts of $R$. If $D$ is positive definite, then the block successive over-relaxation method is convergent for all $y_0$ if and only if 0 < μ < 2 and $R$ is positive definite.

Figure 5: Average BER of the chip-level linear BSOR-GSIC detector versus the relaxation factor.

Figure 6: Convergence behavior of the chip-level linear BSOR-GSIC detector for different values of the relaxation factor.

Figure 8: Near-far resistance of the chip-level linear BSOR-GSIC detector for different values of the relaxation factor (G = 2).

Figure 9: Near-far resistance of the chip-level linear BSOR-GSIC detector for different values of the relaxation factor (G = 10).

Figure 10: Effect of grouping on the convergence behavior of the BSOR-GSIC detector.