Low complexity lattice reduction scheme for STBC two-user uplink MIMO systems

Recently, lattice reduction-aided (LRA) multiple-input multiple-output detection schemes have been proposed in conjunction with linear (as well as nonlinear) detectors. It is well known that these schemes provide full diversity and that their complexity is comparable to that of linear detectors for block fading channels. For fast-varying channels, however, the decoding complexity of LRA detection is unreasonably high. This article proposes an efficient iterative lattice reduction (LR) scheme for an uplink system with two receive antennas at the base station and two users, each employing the Alamouti space-time block code (STBC). By taking advantage of the inherent STBC structure of the symbols transmitted by the users, the proposed scheme provides the same performance as conventional LR while saving about 80% of the computational complexity. We also show that it can be extended to handle the scenario where another interfering user, also employing the Alamouti STBC, is present.


Introduction
Exploiting multiple-input multiple-output (MIMO) techniques in wireless communication systems has been shown to increase both the system capacity and the reliability of reception in rich scattering environments [1,2]. To take advantage of these benefits, space-time block coding (STBC)-oriented diversity schemes have been widely adopted in future wireless communication standards [3], such as 3GPP LTE and WiMax. The STBC scheme originally proposed by Alamouti in [4] achieves transmit diversity without channel knowledge. Even though Alamouti's STBC was originally designed for two transmit antennas and one receive antenna, it has been generalized by Tarokh in [5] and extended to systems with four transmit antennas. One such scheme, double space-time transmit diversity (DSTTD), also called "stacked STBC" [6], allows two STBC signals to be sent simultaneously. The theoretical performance of STBC and DSTTD was analyzed in [7][8][9]. There has also been considerable research on making STBC systems work in multi-user environments [10][11][12][13][14].
As systems with multiple antennas are adopted commercially and the number of multiplexed streams increases, more efficient detection schemes are required. There is always a tradeoff between complexity and performance, and a practical multi-antenna system is limited by its complexity. Maximum likelihood detection (MLD) for multi-user MIMO has a large complexity of $O(\Lambda^{N_t K})$, where $\Lambda$ is the size of the symbol constellation, $N_t$ is the number of transmit antennas, and $K$ is the number of users who access the base station (BS) simultaneously. Although there have been some attempts to reduce the complexity while achieving near-ML performance, such as sphere decoding and modified MLD [15][16][17], these schemes still have large complexity compared with linear detection based on the zero-forcing (ZF) or minimum mean square error criterion. Even though linear detection schemes have much lower complexity, their performance degradation is often intolerable.
Recently, LRA linear detection has been proposed as an alternative, offering the same diversity order as MLD [18,19]. Moreover, its complexity is close to that of a linear detection scheme when the channel remains constant over a frame. Since lattice reduction (LR) then needs to be performed only at the beginning of each frame, the overhead caused by the LR procedure is negligible for large block lengths on slowly varying channels [20][21][22].
In most mobile wireless channel environments, however, LR often has to be performed for every symbol to prevent the performance degradation caused by channel variation, which results in a high LR complexity. There have been a few attempts to mitigate this complexity. In [23,24], an LR method for complex lattices is introduced, which achieves an average complexity saving of nearly 50% (in terms of floating-point operations) without performance degradation. In [25][26][27], low-complexity LRA detection schemes are proposed for temporally, spatially, and spectrally correlated MIMO channels, exploiting the channel correlation and the unimodular property of the transformation matrix. Although these schemes reduce the detection complexity significantly, the complexity of LR still needs to be reduced further to make LR attractive for MIMO detection in practical wireless environments.
In this study, we propose an efficient iterative LR scheme for an uplink system with two receive antennas at the BS and two users, each employing the Alamouti STBC. By taking advantage of the inherent STBC structure of the transmitted symbols, the proposed scheme provides the same performance as conventional LR while saving about 80% of the computational complexity. Furthermore, it is shown that the proposed scheme remains applicable when a whitening filter is used as a preprocessing step in interference environments.
The remainder of this article is organized as follows. In Section 2, we introduce the system model and establish the notation. In Section 3, conventional LR and the LRA ZF detector are reviewed. In Section 4, the proposed scheme for STBC multi-user uplink MIMO systems and simulation results are presented. Section 5 provides an application of the proposed scheme to an interference-limited STBC multi-user uplink system. Finally, concluding remarks follow.

System model
We consider an uplink multi-user MIMO system with two receive antennas (N r = 2) at the BS and two users (K = 2), each mobile station (MS) being equipped with two transmit antennas (N t = 2) and employing the Alamouti STBC, as shown in Figure 1. The MIMO channels of the users are assumed to be homogeneous, without consideration of the users' geometry. If there are more than two users in the system, we assume that a scheduling scheme selects two users for simultaneous transmission. This article, however, focuses on the receiver processing for MIMO uplink detection rather than on scheduling.
The received signal vector $\mathbf{r}$ at the two receive antennas over two symbol times can be written as in (1), where $r_i(t)$ is the received signal at the ith receive antenna at symbol time t, $\mathbf{x}_k$ is the 2 × 1 transmitted symbol vector of the kth user with $E[\mathbf{x}_k^H \mathbf{x}_k] = E_s$, $\mathbf{n}_i$ is a 2 × 1 additive white Gaussian noise vector with zero mean and variance 0.5 per dimension at the ith receive antenna, and $\mathbf{H}_{i,k}$ for 1 ≤ i, k ≤ 2 is the effective channel matrix from the kth user to the ith receive antenna. Following the Alamouti scheme [4] used by each user, $\mathbf{H}_{i,k}$ takes the form given in (2), where $h_{i,k}(j)$ is the channel coefficient from the jth transmit antenna of user k to the ith BS receive antenna, and $(\cdot)^*$, $(\cdot)^T$, and $(\cdot)^H$ denote complex conjugate, transpose, and Hermitian transpose, respectively.
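The displayed forms of (1) and (2) are not reproduced in the text above. As a reading aid, one standard way to write the stacked two-user Alamouti model that is consistent with the definitions just given is sketched below. The ordering of the entries of $\mathbf{r}_i$ and the sign convention inside $\mathbf{H}_{i,k}$ are assumptions and may differ from the original equations.

```latex
\begin{align}
\mathbf{r} &= \begin{bmatrix}\mathbf{r}_1\\ \mathbf{r}_2\end{bmatrix}
 = \begin{bmatrix}\mathbf{H}_{1,1} & \mathbf{H}_{1,2}\\
                  \mathbf{H}_{2,1} & \mathbf{H}_{2,2}\end{bmatrix}
   \begin{bmatrix}\mathbf{x}_1\\ \mathbf{x}_2\end{bmatrix}
 + \begin{bmatrix}\mathbf{n}_1\\ \mathbf{n}_2\end{bmatrix},
 \qquad
 \mathbf{r}_i = \begin{bmatrix} r_i(1)\\ r_i^*(2)\end{bmatrix}, \\
\mathbf{H}_{i,k} &= \begin{bmatrix} h_{i,k}(1) & h_{i,k}(2)\\
                                    h_{i,k}^*(2) & -h_{i,k}^*(1)\end{bmatrix}.
\end{align}
```

Under this assumed form, $\mathbf{H}_{i,k}^H \mathbf{H}_{i,k} = \left(|h_{i,k}(1)|^2 + |h_{i,k}(2)|^2\right)\mathbf{I}_2$, which is the per-user orthogonality that the structured scheme relies on.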
In this case, the equivalent channel model is the same as that of the DSTTD scheme, whose performance with the optimal MLD and the suboptimal LRA ZF detector was analyzed in [8,9]. (Note that the system model with uplink multi-user MIMO detection appears as a generalization of known single-user MIMO concepts to the multi-user case.)

Conventional LLL-based lattice reduction scheme

LLL algorithm
In this section, we describe an LLL (Lenstra-Lenstra-Lovász) [28]-based LR (LLL-LR) algorithm. Since the LLL algorithm treats real values, we first transform the received signals in (1) to the equivalent real-valued form (3), with the real-valued quantities defined in (4), where Re{·} and Im{·} denote the real and imaginary parts, respectively. Assuming perfect channel estimation at the receiver, the LLL-LR algorithm iteratively executes three functional processes, namely Gram-Schmidt orthogonalization (GSO), size reduction, and basis swapping, and finds the basis-reduced matrix $H'_R = H_R T$, in which $T$ is a unimodular integer matrix.
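The complex-to-real transformation can be illustrated with a short sketch. It assumes the common stacking $r_R = [\mathrm{Re}\,r;\ \mathrm{Im}\,r]$ and $H_R = [\mathrm{Re}\,H,\ -\mathrm{Im}\,H;\ \mathrm{Im}\,H,\ \mathrm{Re}\,H]$, which may differ in ordering from the original (3) and (4); the function names are hypothetical.

```python
def to_real(H):
    """Map a complex matrix (list of rows) to [[Re H, -Im H], [Im H, Re H]]."""
    top = [[z.real for z in row] + [-z.imag for z in row] for row in H]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in H]
    return top + bottom


def vec_to_real(v):
    """Stack real parts over imaginary parts: [Re v; Im v]."""
    return [z.real for z in v] + [z.imag for z in v]


def matvec(A, x):
    """Plain matrix-vector product for lists of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Toy 2x2 complex channel and symbol vector (arbitrary illustrative values).
H = [[1 + 2j, 0.5 - 1j], [-1j, 2 + 0j]]
x = [1 - 1j, -1 + 3j]

real_side = matvec(to_real(H), vec_to_real(x))   # H_R x_R
complex_side = vec_to_real(matvec(H, x))         # real form of H x
```

Both sides agree, so the noiseless real-valued model carries exactly the same information as the complex one while doubling the dimensions.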
Equations 6 and 7 are the size-reduction and basis-swapping conditions, respectively. If (6) is fulfilled, the basis is called size-reduced or weakly reduced. In general, δ is set to 3/4 to ensure faster convergence [28]. If (7) is not satisfied, the two corresponding basis vectors are swapped and the algorithm returns to the GSO procedure; otherwise, the LLL-reduction process ends. The complexity of the LLL algorithm, therefore, depends strongly on the number of column exchanges within the algorithm.
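The loop described above can be sketched as follows. This is a minimal textbook LLL written for clarity rather than speed (the GSO is recomputed from scratch), assuming the usual forms of the two conditions: $|\mu_{k,l}| \le 1/2$ for size reduction and $\delta\|\hat{h}_{k-1}\|^2 \le \|\hat{h}_k\|^2 + \mu_{k,k-1}^2\|\hat{h}_{k-1}\|^2$ for swapping. The function names and the returned swap counter are illustrative choices, not taken from the article.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(B):
    """Return the orthogonalized vectors and the mu coefficients of basis B."""
    n = len(B)
    Bs, mu = [], [[0.0] * n for _ in range(n)]
    for k in range(n):
        v = list(B[k])
        for l in range(k):
            mu[k][l] = dot(B[k], Bs[l]) / dot(Bs[l], Bs[l])
            v = [vi - mu[k][l] * bi for vi, bi in zip(v, Bs[l])]
        Bs.append(v)
    return Bs, mu


def lll(columns, delta=0.75):
    """Reduce a list of basis columns; return (reduced, T_columns, swaps)."""
    B = [list(c) for c in columns]
    n = len(B)
    # T[j] holds column j of the unimodular matrix T (starts as identity).
    T = [[1 if i == j else 0 for i in range(n)] for j in range(n)]
    swaps, k = 0, 1
    while k < n:
        for l in range(k - 1, -1, -1):           # size reduction of column k
            _, mu = gram_schmidt(B)
            q = round(mu[k][l])
            if q:
                B[k] = [a - q * b for a, b in zip(B[k], B[l])]
                T[k] = [a - q * b for a, b in zip(T[k], T[l])]
        Bs, mu = gram_schmidt(B)
        lhs = delta * dot(Bs[k - 1], Bs[k - 1])
        rhs = dot(Bs[k], Bs[k]) + mu[k][k - 1] ** 2 * dot(Bs[k - 1], Bs[k - 1])
        if lhs > rhs:                            # swapping condition violated
            B[k - 1], B[k] = B[k], B[k - 1]
            T[k - 1], T[k] = T[k], T[k - 1]
            swaps += 1
            k = max(k - 1, 1)
        else:
            k += 1
    return B, T, swaps


cols = [[1, 1, 1], [-1, 0, 2], [3, 5, 6]]        # small integer toy basis
reduced, T, swaps = lll(cols)
```

The `swaps` counter corresponds to the number of column exchanges, which is the complexity measure used later in the simulation results.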

LR-aided ZF detection
By applying the LLL algorithm, the channel matrix $H_R$ is transformed into the reduced channel matrix $H'_R = H_R T$ with near-orthogonal columns. This near-orthogonal property of the reduced matrix enables an LRA linear receiver such as ZF to achieve the same diversity order as MLD [20][21][22].
Introducing the unimodular transformation matrix $T$ into the system model (3), the received signal $r_R$ can be rewritten in terms of the reduced channel matrix $H_R T$, where the symbols to be detected become the transformed symbols $T^{-1} x_R$. For the ZF detector, the estimate is obtained by applying $(H_R T)^{-1}$ to $r_R$. Since $(H_R T)^{-1}$ is also well conditioned, the noise enhancement and coloring are relatively small. In order to estimate the transmitted symbols, a shift, scaling, and quantization operation has to be applied [21,22], where d = 6/(M − 1) is a given constant for M-QAM, $Q\{\cdot\}$ denotes componentwise quantization with respect to the infinite integer space, and $1_n$ is an n × 1 vector of ones.
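For orientation, the chain of steps described above can be written out as follows. This is a sketch of the commonly used LRA-ZF formulation, not a verbatim restoration of the missing equations. It assumes that the real-valued symbols lie on a shifted, scaled integer grid, $x_R \in d\,(\mathbb{Z}^n + \tfrac{1}{2}1_n)$, with the constant $d$ of the text interpreted as the grid spacing; the symbols $z$, $\tilde{z}$, and $\hat{x}_R$ are names introduced here.

```latex
\begin{align}
\mathbf{r}_R &= \mathbf{H}_R\mathbf{T}\,\mathbf{T}^{-1}\mathbf{x}_R + \mathbf{n}_R
             = \mathbf{H}'_R\,\mathbf{z} + \mathbf{n}_R,
             \qquad \mathbf{z} = \mathbf{T}^{-1}\mathbf{x}_R, \\
\tilde{\mathbf{z}} &= (\mathbf{H}_R\mathbf{T})^{-1}\mathbf{r}_R
             = \mathbf{z} + (\mathbf{H}_R\mathbf{T})^{-1}\mathbf{n}_R, \\
\hat{\mathbf{x}}_R &= d\,\mathbf{T}\,
   Q\!\left\{\tfrac{1}{d}\,\tilde{\mathbf{z}} - \tfrac{1}{2}\,\mathbf{T}^{-1}\mathbf{1}_n\right\}
   + \tfrac{d}{2}\,\mathbf{1}_n .
\end{align}
```

The shift by $\tfrac{1}{2}\mathbf{T}^{-1}\mathbf{1}_n$ moves the transformed constellation onto the integer lattice so that plain rounding can be used, and multiplying by $\mathbf{T}$ afterwards maps the decision back to the original symbol domain.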

Structured lattice reduction scheme
Since wireless channels are often time-selective, the LR procedure has to be performed as fast as the channel varies in order to offer the full diversity order. Therefore, the complexity of LR needs to be reduced further to make the LR technique more practical for MIMO detection. In this section, the LLL-LR scheme is modified to offer substantial complexity savings for Alamouti STBC-based multi-user uplink MIMO signal detection, while providing the same performance as the conventional scheme.
The structured lattice reduction (SLR) scheme, which exploits the inherent structure of multi-user STBC, is proposed to cut down the computational complexity further. As shown in Table 1, the proposed scheme consists of two stages, and each stage is carried out by the orthogonal LR (OLR)-block. The OLR-block consists of initial sorting, LLL-LR, re-ordering, and remaining basis generation, which are explained in the following subsections.
Assuming the given channel matrix of (1) as $H = [h_1, h_2, h_3, h_4]$, the real-valued channel matrix $H_R$ of (4) can be rewritten in terms of the real and imaginary parts of these columns. Since the transmitted symbols of each user have an orthogonal STBC structure, the matrices $H$ and $H_R$ inherit an orthogonal property between the columns that belong to the same user. Using this property, the first stage of the SLR scheme splits the columns of the channel matrix into two half-size matrices, $H_a$ and $H_b$. Then, $H_a$ is transformed into real-valued form as in (4), and this is the input matrix $H_{in,a}$ of (16) for the OLR-block at the first stage of the SLR scheme.
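The orthogonal property can be checked numerically with a small sketch. It assumes the per-antenna Alamouti block $[h(1),\ h(2);\ h^*(2),\ -h^*(1)]$ and the split $H_a = [h_1, h_3]$, $H_b = [h_2, h_4]$, in line with the later statement that the first stage works on the 1st and 3rd columns of $H$. All function names and the random toy channel are illustrative.

```python
import random


def alamouti_block(h1, h2):
    """Assumed 2x2 effective channel of one user at one receive antenna."""
    return [[h1, h2], [h2.conjugate(), -h1.conjugate()]]


def stacked_channel(h):
    """h[i][k] = (h_ik(1), h_ik(2)); returns the 4x4 matrix H as a list of rows."""
    rows = []
    for i in range(2):
        blocks = [alamouti_block(*h[i][k]) for k in range(2)]
        for r in range(2):
            rows.append(blocks[0][r] + blocks[1][r])
    return rows


def column(H, j):
    return [row[j] for row in H]


def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


rng = random.Random(0)
h = [[(complex(rng.gauss(0, 1), rng.gauss(0, 1)),
       complex(rng.gauss(0, 1), rng.gauss(0, 1))) for k in range(2)]
     for i in range(2)]
H = stacked_channel(h)
h1, h2, h3, h4 = (column(H, j) for j in range(4))

# Half-size column sets used by the two branches of the first stage.
H_a, H_b = [h1, h3], [h2, h4]
```

Within each user the two columns are orthogonal and have equal norm, so once $H_a$ has been reduced, the partner columns in $H_b$ carry no additional information that would require a separate full-size reduction.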

Initial sorting
An LLL-LR scheme iterates the basis reduction until no further basis vector swapping occurs, which means that the basis vectors are ordered so as to satisfy the condition of (7). In view of (7), one heuristic and efficient method that reduces the number of iterations is to sort the basis vectors according to the magnitude of their norms before the LLL-LR is applied [21,23]. Let θ be the permutation order of the basis vectors that sorts the columns of the OLR-block input matrix $H_{in,a}$ by their norms. We then obtain an ordered matrix $S_a = [s_1, s_2, s_3, s_4]$, where $s_i = h_{a,\theta(i)}$ for 1 ≤ i ≤ 4, and this is the input matrix of the LLL-LR.
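A minimal sketch of the initial sorting step is given below, assuming ascending order of column norm and 0-based indices; `sort_by_norm` and the toy columns are hypothetical.

```python
def sort_by_norm(columns):
    """Return (sorted_columns, theta) with columns in ascending order of norm.

    theta[i] is the original index of the column placed at position i,
    i.e. s_i = h_{a, theta(i)} in the notation of the text (0-based here).
    """
    norms = [sum(x * x for x in c) for c in columns]
    theta = sorted(range(len(columns)), key=lambda i: norms[i])
    return [columns[i] for i in theta], theta


# Four toy real-valued columns with squared norms 9, 1, 4, and 0.5.
H_in_a = [[3.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
S_a, theta = sort_by_norm(H_in_a)
```

For these toy columns the permutation is `theta = [3, 1, 2, 0]`, so the shortest column enters the LLL-LR first.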

LLL lattice reduction
After all the basis vectors are sorted in ascending order of magnitude, the conventional LLL-LR is performed with the ordered matrix $S_a$, which gives a reduced channel matrix $G_a = [g_1, g_2, g_3, g_4]$ of (18) as its output.

[Table 1: Function of OLR-block]
As mentioned in Section 3.1, the basis vectors are exchanged if the swapping condition in (7) is not satisfied. Therefore, the order of the basis vectors changes whenever a swapping event occurs. Let π be the permutation order of the basis vectors caused by the swapping events, so that we can express $s'_{\pi(i)} = g_i$, where $s'_i$ is the reduced basis vector corresponding to $s_i$.

Re-ordering
The proposed scheme begins with the LLL-LR of a submatrix containing half of the columns of the channel matrix, from which the whole reduced matrix is then constructed by taking advantage of the known STBC structure. Therefore, the order of the columns in $G_a$ of (18) and $H_{in,a}$ of (16) must correspond in order to maintain the STBC structure. To undo the column reordering introduced by the initial sorting and by the LLL-LR, the inverse permutations of θ and π are applied.
First, re-ordering is executed to undo the change of order caused by the LLL-LR.
Then, a second re-ordering corrects the change of order caused by the initial sorting, which yields a re-ordered reduced matrix $H'_{in,a}$, where $h'_{a,\theta(i)} = u_i$ (20).
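The two inverse permutations can be illustrated with column labels instead of numeric vectors. The sketch assumes that the intermediate vectors $u_i$ coincide with the $s'_i$ obtained after undoing π, which is one reading of the text; `undo_permutations` and the labels are hypothetical, and indices are 0-based.

```python
def undo_permutations(G, pi, theta):
    """Undo the LLL swap order pi, then the initial sorting order theta."""
    n = len(G)
    s_red = [None] * n
    for i in range(n):
        s_red[pi[i]] = G[i]           # s'_{pi(i)} = g_i
    h_red = [None] * n
    for i in range(n):
        h_red[theta[i]] = s_red[i]    # h'_{a,theta(i)} = u_i, with u_i = s'_i assumed
    return h_red


# Labels stand in for the reduced versions of the original columns h1..h4.
theta = [3, 1, 2, 0]                  # initial sorting: S = [h4, h2, h3, h1]
pi = [0, 2, 1, 3]                     # LLL exchanged positions 1 and 2 of S
G = ["h4'", "h3'", "h2'", "h1'"]      # LLL output in its final column order
H_red = undo_permutations(G, pi, theta)
```

After both steps, `H_red` lists the reduced columns in the original column order, which is what the remaining basis generation needs in order to pair each reduced column with its STBC partner.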

Remaining basis generation
Finally, the reduced matrix of the remaining basis vectors, $H'_{in,b} = [h'_{b,1}, h'_{b,2}, h'_{b,3}, h'_{b,4}]$, is simply generated from $H'_{in,a}$. According to the known STBC structure of (2), each column vector of $H'_{in,b}$ can be obtained from the entries of the corresponding column vector of $H'_{in,a}$, where $h'_{a,l}(j)$ denotes the jth row entry of $h'_{a,l}$. Accordingly, the resulting matrices $H'_{in,a}$ and $H'_{in,b}$ satisfy the STBC structure, and together they form the output matrix of the first stage of the SLR scheme.
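How a partner column can be generated without a second reduction is sketched below. The index pattern assumes the Alamouti block $[h(1),\ h(2);\ h^*(2),\ -h^*(1)]$ and the real stacking $[\mathrm{Re}\,v;\ \mathrm{Im}\,v]$, so it may differ from the original relationship in entry order or signs; `partner_real` and the toy coefficients are hypothetical.

```python
def partner_real(g):
    """Real 8-vector [Re v; Im v] -> real form of [v2*, -v1*, v4*, -v3*]."""
    return [g[1], -g[0], g[3], -g[2], -g[5], g[4], -g[7], g[6]]


def to_real_vec(v):
    return [z.real for z in v] + [z.imag for z in v]


# Toy Alamouti-structured column pair of one user (two receive antennas).
a1, b1, a2, b2 = 1 + 2j, -0.5 + 1j, 0.3 - 0.7j, 2 + 0j
h1 = [a1, b1.conjugate(), a2, b2.conjugate()]
h2 = [b1, -a1.conjugate(), b2, -a2.conjugate()]

# A reduced column is a Gaussian-integer combination of the original columns;
# a single-column combination c * h1 is used here to keep the example small.
c = 2 - 1j
reduced_a = to_real_vec([c * z for z in h1])
expected_b = to_real_vec([c.conjugate() * z for z in h2])
generated_b = partner_real(reduced_a)
```

The generated vector equals the real form of $c^* h_2$, so it lies in the lattice spanned by the partner columns, has the same norm as the reduced column, and is orthogonal to it. The mapping is only a signed permutation of entries, which is why this step adds almost no cost.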
In the second stage, we update the partial matrices $H'_{R,a}$ and $H'_{R,b}$ as in (23) and (24), where $H'_{R,a}$ is a combination of the column vectors of the first-stage output $H'_R$ related to the 1st and 4th (correspondingly, 5th and 8th) columns, so that all the channel vectors are jointly involved. By applying to (23) the same procedure as in the first stage of the SLR, from (16) to (21), we can find the basis-reduced matrix $H''_{in,a}$ and generate the remaining matrix $H''_{in,b}$. Eventually, the proposed scheme ends up with the basis-reduced matrix $H''_R$ as follows: $H_{out,2} = H''_R = [h''_{R,1}, \ldots, h''_{R,4}, \tilde{h}''_{R,1}, \ldots, \tilde{h}''_{R,4}]$ (25), where $h''_{R,l}$ and $\tilde{h}''_{R,l}$ are the reduced basis vectors obtained from $h'_{R,l}$ and $\tilde{h}'_{R,l}$ of $H'_R$.

Iterative scheme of structured lattice reduction
The SLR scheme reduces the complexity by decomposing the common LR into two stages of LR, each of which uses a half-size matrix. However, the basis swapping at the second stage may affect the orthogonal structure obtained by the first stage. Therefore, in this subsection we propose an iterative SLR (I-SLR) scheme to reduce the basis vectors further. As shown in Figure 3, the I-SLR is executed with the 1st and 3rd columns of H in the first stage of the OLR-block and the 2nd and 4th columns of H in the second stage of the OLR-block, which is the same as in the SLR scheme. If a swap event occurs at the second stage, the iteration begins again with the updated partial input matrix $H_{R,a} := [h''_{R,1}, h''_{R,3}, \tilde{h}''_{R,1}, \tilde{h}''_{R,3}]$ from (25). Otherwise, the output matrix $H''_R$ is taken as the final reduced matrix, and the process ends. Note that this iterative approach still exploits the STBC structure.

Simulation results
In this section, we compare the empirical complexity and the quality of the reduced matrix for each LR scheme. First, we discuss the empirical complexity in terms of the average number of column exchanges required within each LR algorithm. The complexity of the LLL algorithm depends on the size of the input matrix and on the number of column exchanges. Even though the proposed schemes have additional processes, all the processes of the SLR and I-SLR schemes operate on an input matrix of half the size of the conventional one. In order to investigate the impact of this half-size matrix, Table 2 shows the average number of required column exchanges (c) for each LR scheme. A common LLL-LR scheme requires an average of 7.03 column exchanges. With the proposed schemes, the average number of column exchanges is reduced by a factor of about 10, to 0.70 for the SLR and to 0.73 for the I-SLR, so that about 90% of the complexity can be saved. For a fair comparison, we also investigate the common LLL-LR with initial sorting, which reduces the average number of column exchanges of the common LLL-LR from 7.03 to 3.07. In this case, the saving of the proposed scheme over LLL-LR with initial sorting is about 77%.
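The quoted percentages follow directly from the average exchange counts given in the text. The short calculation below reproduces them; it treats the number of column exchanges as the only complexity measure, which is the proxy used in this section, and the `saving` helper is a hypothetical name.

```python
def saving(baseline, proposed):
    """Relative reduction in column exchanges with respect to a baseline."""
    return 1.0 - proposed / baseline


# Average numbers of column exchanges as reported in the text.
exchanges = {"LLL-LR": 7.03, "LLL-LR + sorting": 3.07, "SLR": 0.70, "I-SLR": 0.73}

for name in ("SLR", "I-SLR"):
    vs_plain = saving(exchanges["LLL-LR"], exchanges[name])
    vs_sorted = saving(exchanges["LLL-LR + sorting"], exchanges[name])
    print(f"{name}: {vs_plain:.1%} vs LLL-LR, {vs_sorted:.1%} vs LLL-LR with sorting")
```

This gives roughly 90.0% and 77.2% for the SLR, and roughly 89.6% and 76.2% for the I-SLR, consistent with the "about 90%" and "about 77%" figures above.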
Next, we verify the quality of the LR in terms of the bit error rate (BER). Figures 4 and 5 show that the SLR-aided ZF detector (SLRA-ZF) has almost the same performance as a conventional LLL-LR-aided ZF detector (LLL-LRA-ZF) when the required BER is less than 0.005. Moreover, the I-SLR-aided ZF detector (I-SLRA-ZF) provides full diversity and the same performance as the LLL-LRA-ZF. Note that this performance comes at almost no cost, because the complexity of the I-SLR measured by column exchanges is comparable to that of the SLR.
From Table 2 and Figures 4 and 6, it is noted that the reduced matrices obtained by the conventional LLL-LR and I-SLR schemes provide almost the same performance when they are applied to the detection of a spatially multiplexed STBC signal. The complexity saving obtained does not come with any loss in performance. We can also observe that, although the SLR scheme does not always provide full diversity, its output $H''_R$ is well conditioned compared with the non-reduced channel matrix $H_R$.

LRA linear detection scheme in the interference limited STBC multi-user uplink systems
In this section, the proposed scheme is extended to handle the scenario where another interfering user (also employing the Alamouti STBC) is present. We examine the feasibility of the proposed SLR scheme combined with a whitening filter in the interference environment.
Suppose there is a co-channel user in a multi-cell environment. The detection capability of the receiver could then be significantly degraded by the interference from that user. In this scenario, the received signal at the BS can be written as the desired signal of (1) plus an interference term, where $H_{i,j}$ is the 2 × 2 interference channel matrix, which has the same STBC structure, and $x_i$ is the interference signal. Optimum preprocessing requires pre-whitening against the interference.

Conventional LRA detection with whitening in the interference limited channel
In order to overcome the performance degradation caused by the correlated interference, we can apply a whitening filter before the LRA detection. The interference whitening procedure is given by (27) and (28), where $I_m$ denotes the m × m identity matrix. By applying the conventional LRA detection scheme to the whitened effective channel matrix $H_W$ in (28), we can recover the transmitted symbols of the two intended co-channel users with interference whitening.
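As a reading aid, a generic form of interference whitening that is consistent with the definition $H_W = W^{-1/2} H$ used in this section is sketched below. It is not a restoration of the original equations. The symbols $\mathbf{H}_I$ (stacked 4 × 2 interference channel), $\mathbf{x}_I$, $\sigma_I^2$, $\sigma_n^2$, $\mathbf{r}_W$, and $\mathbf{n}_W$ are names assumed here, and the interference symbols are assumed to be i.i.d., zero-mean, and uncorrelated with the noise.

```latex
\begin{align}
\mathbf{r} &= \mathbf{H}\mathbf{x} + \mathbf{H}_I\mathbf{x}_I + \mathbf{n}, \\
\mathbf{W} &= E\!\left[(\mathbf{H}_I\mathbf{x}_I + \mathbf{n})
                       (\mathbf{H}_I\mathbf{x}_I + \mathbf{n})^H\right]
            = \sigma_I^2\,\mathbf{H}_I\mathbf{H}_I^H + \sigma_n^2\,\mathbf{I}_4, \\
\mathbf{r}_W &= \mathbf{W}^{-1/2}\mathbf{r}
            = \mathbf{H}_W\mathbf{x} + \mathbf{n}_W,
  \qquad \mathbf{H}_W = \mathbf{W}^{-1/2}\mathbf{H},
  \qquad E\!\left[\mathbf{n}_W\mathbf{n}_W^H\right] = \mathbf{I}_4 .
\end{align}
```

After whitening, the combined interference-plus-noise term is spatially white, so a detector designed for white noise, including the LRA-ZF detector, can be applied to $\mathbf{H}_W$ directly.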

Proposed I-SLR-aided detection with whitening in the interference limited channel
As shown in Section 4, the SLR and I-SLR schemes can reduce the complexity of the LR procedure if the channel matrix has the known STBC structure. In order to apply the proposed iterative scheme to LRA detection, the input channel matrix must have a known orthogonal structure. Therefore, as long as the whitened effective channel matrix $H_W = W^{-1/2} H$ keeps the known orthogonal structure, the proposed iterative scheme can be applied successfully.
First, the whitening filter $W^{-1/2}$ in (27) can be written in the structured form of (30), whose component blocks are derived in the Appendix. The whitening filter of (30) can therefore be expressed as (34). By applying the whitening filter $W^{-1/2}$ of (34) to the effective channel matrix $H$, the whitened effective channel matrix $H_W$ can be rewritten as (35). In (35), it is shown that the whitened effective channel matrix $H_W$ has the same known orthogonal structure as the original effective channel matrix $H$. Therefore, we can apply the proposed SLR-aided detection scheme to the whitened effective channel matrix $H_W$. As shown in Section 4, the proposed scheme has the same performance as an LLL-LR scheme while saving about 80% of the complexity.

Simulation results
The simulations are performed using an STBC multi-user uplink MIMO system with a BS with two receive antennas and two co-channel users, each equipped with two transmit antennas. It is assumed that there is one STBC-coded interfering user. Figure 6 shows four groups of BER curves for three different values of signal-to-interference ratio. In each group, the upper two curves indicate the LLL-LRA-ZF and the I-SLRA-ZF without pre-whitening, and the bottom two curves indicate the LLL-LRA-ZF and the I-SLRA-ZF with pre-whitening. It is shown that applying the whitening procedure before the I-SLRA-ZF gives the same performance as the conventional scheme while reducing the complexity.

Conclusion
In this article, an efficient (iterative) structured LR scheme for the two-user STBC uplink is proposed, which significantly reduces the complexity of the LR by exploiting the orthogonal structure of the STBC. The proposed scheme is shown to provide the same performance as a conventional LRA-ZF detector with reduced complexity. The numerical results show that the proposed scheme achieves a complexity reduction of about 80%. It is also shown that the SLR remains applicable with a pre-whitening filter in the interference environment.