Near-Optimum Detection with Low Complexity for Uplink Virtual MIMO Systems



Introduction
Multiple input multiple output (MIMO) techniques are expected to be adopted in most mobile communication systems to increase wireless channel capacity. Because placing multiple antennas at the transmitter and the receiver creates a number of independent radio channels, MIMO schemes can achieve diversity gain and coding gain simultaneously [1]. In particular, multiuser MIMO schemes have recently drawn increased attention as a way to guarantee user throughput in the uplink [2][3][4]. Uplink MIMO techniques have also been adopted in mobile worldwide interoperability for microwave access (WiMAX) systems based on the IEEE 802.16e-2005 standard [5] and in 3rd generation partnership project long-term evolution (3GPP-LTE) systems [6]. For the uplink of mobile WiMAX in particular, virtual MIMO, called "collaborative spatial multiplexing (CSM)," is adopted as a mandatory profile in the IEEE 802.16e standard. In the virtual MIMO scheme, the data streams of two portable subscriber stations (PSSs) are transmitted simultaneously on the same OFDMA resources. Assuming perfect cancellation of multiuser interference, the achievable channel capacity of uplink mobile WiMAX with virtual MIMO increases in proportion to the number of PSSs.
In 2007, the WiMAX Forum mobile radio conformance test (RCT) [7] provided a criterion for verifying the system performance of mobile WiMAX Wave-II MIMO, based on the well-known optimal maximum likelihood detection (MLD), which achieves the best performance [8]. However, the optimal MLD requires a large amount of computation at the receiver, and this cost grows exponentially with the number of transmit antennas and the modulation level. Suboptimal detection algorithms with lower complexity are therefore required. Among previous studies, QR decomposition with M-algorithm MLD (QRM-MLD) and sphere decoding (SD) have been reported to achieve near-ML performance. However, these schemes require additional computation after the hard decision to obtain log-likelihood ratio (LLR) information for all bits [9, 10]. Another previous work proposed parallel detection (PD) based on successive symbol cancellation [11]. By exploiting the properties of the MIMO channel, PD attains near-MLD performance with only a slight increase in computational complexity. However, PD cannot obtain LLR information for all transmit layers, because it considers all possible symbols for the first transmit layer only.
In this paper, we propose a two-step MIMO decoding scheme that extends PD with low computational complexity, enabling feasible implementation of uplink mobile WiMAX systems whose iterative channel decoder uses bit LLR information. Unlike the optimal MLD, which fully searches all possible combinations of symbols across all layers to determine the LLR values, the proposed scheme searches only one transmit layer in the first step and then determines the LLR values of the residual transmit layers in a simple second step. These procedures require only 15.75% of the real multiplications of the optimal MLD when both PSSs transmit 16 quadrature amplitude modulation (QAM) signals, and only 3.77% for 64 QAM signals. Nevertheless, the proposed scheme achieves block error rate (BLER) performance comparable to that of the optimal MLD. We also propose a second MIMO decoding scheme that performs an independent search for each transmit layer. This scheme achieves exactly the same BLER performance as the optimal MLD with only 28% and 7.22% of its computational complexity when both PSSs transmit 16 QAM and 64 QAM signals, respectively. This paper is organized as follows. Section 2 introduces uplink virtual MIMO systems and describes the proposed MIMO decoding algorithms. The computational complexity of the MLD and the proposed schemes is analyzed in Section 3. Section 4 introduces the link-level simulation environment for mobile WiMAX systems and discusses the simulation results, and Section 5 concludes the paper.

Virtual MIMO Decoding Schemes
Figure 1 shows a virtual MIMO system in which, for simplicity, we consider two PSSs and one radio access station (RAS). Each PSS has a single transmitting antenna, and the two PSSs transmit their data streams simultaneously on the same OFDMA resources. Because the RAS receives the streams through two antennas, the result is a 2 × 2 MIMO channel with independent fading. Figures 2 and 3 show block diagrams of the transmitter of one PSS and of the receiver of the RAS with two receiving antennas in mobile WiMAX systems, respectively.
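For a more general discussion, we consider a MIMO system with $N_r$ receiving antennas and $N_t$ PSSs, where $N_r \ge N_t$. The overall channel $\mathbf{H}$, an $N_r \times N_t$ complex matrix, can be represented as

$$\mathbf{H} = \begin{bmatrix} h_{1,1} & \cdots & h_{1,N_t} \\ \vdots & \ddots & \vdots \\ h_{N_r,1} & \cdots & h_{N_r,N_t} \end{bmatrix} = \left[\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_{N_t}\right], \tag{1}$$

where $h_{n,m}$ is the channel coefficient from the $m$th PSS to the $n$th receiving antenna and $\mathbf{h}_m$ is the $m$th column of $\mathbf{H}$. The modulated symbols transmitted from the $N_t$ PSSs can be written as

$$\mathbf{s} = \left[s_1, s_2, \ldots, s_{N_t}\right]^T, \tag{2}$$

where $(\cdot)^T$ denotes the transpose operation. The received signal vector $\mathbf{r}$ of size $N_r \times 1$ is then expressed as

$$\mathbf{r} = \mathbf{H}\mathbf{s} + \mathbf{n}, \tag{3}$$

where $\mathbf{n}$ is the zero-mean additive white Gaussian noise (AWGN) vector. We now explain the two proposed low-complexity schemes for MIMO signal detection, and we also introduce the optimal MLD scheme for comparison.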
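As a minimal sketch of the received signal model $\mathbf{r} = \mathbf{H}\mathbf{s} + \mathbf{n}$, the following standard-library Python builds an $N_r \times N_t$ channel and a noisy received vector. The i.i.d. Rayleigh channel and the helper names `rayleigh_channel` and `receive` are illustrative assumptions, not part of the described system; the simulations in Section 4 use the ITU-R Vehicular-A channel instead.

```python
import random

def rayleigh_channel(n_r, n_t, rng):
    """Assumed i.i.d. Rayleigh channel: H[n][m] links PSS m to receive antenna n."""
    return [[complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / 2 ** 0.5
             for _ in range(n_t)] for _ in range(n_r)]

def receive(H, s, noise_var, rng):
    """Return r = H s + n, with n zero-mean circularly symmetric complex AWGN."""
    sigma = (noise_var / 2) ** 0.5  # standard deviation per real dimension
    return [sum(h_nm * s_m for h_nm, s_m in zip(row, s))
            + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for row in H]

# Usage: two PSSs, two RAS antennas, one unnormalized 16-QAM point per PSS.
rng = random.Random(2009)
H = rayleigh_channel(2, 2, rng)
s = [3 + 1j, -1 - 3j]
r = receive(H, s, noise_var=0.1, rng=rng)
r_clean = receive(H, s, noise_var=0.0, rng=rng)  # noiseless: r equals H s
```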
2.1. Proposed Decoding Scheme I
The proposed decoding scheme I, an extension of modified PD, consists of two steps. In the first step, it performs a full interference cancellation, over all symbol candidates, for only one transmit layer; in the second step, it simply determines the LLR values of the residual transmit layers. Because the full search covers only one transmit layer, the computational complexity is significantly reduced. Figure 4 illustrates the block diagram of the proposed decoding scheme.
Before the first step, the decoder calculates the signal-to-noise ratio (SNR) of the two transmit layers from the channel responses at the receiver. Because all symbol candidates of the second layer are cancelled exhaustively, the PSS with the lower SNR is selected as the second transmit layer.
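The layer-ordering rule can be sketched as follows, assuming each layer's SNR is estimated as $\|\mathbf{h}_m\|^2/\sigma^2$ (the energy of channel column $m$ over the noise variance, with equal transmit powers). The text does not specify the exact SNR estimator, so this metric and the function names `layer_snr` and `order_layers` are hypothetical.

```python
def layer_snr(H, m, noise_var):
    """Assumed per-layer SNR: ||h_m||^2 / noise_var, where h_m is column m of H."""
    return sum(abs(row[m]) ** 2 for row in H) / noise_var

def order_layers(H, noise_var):
    """Return (first_layer, second_layer) for two PSSs.

    The PSS with the lower SNR becomes the second layer, i.e. the layer whose
    constellation is searched exhaustively in the first step of scheme I.
    """
    snr = [layer_snr(H, m, noise_var) for m in range(2)]
    second = 0 if snr[0] < snr[1] else 1
    return 1 - second, second

# Usage: PSS 1 (column 1) is received much more weakly than PSS 0.
H = [[1.2 + 0.4j, 0.1 - 0.2j],
     [0.7 - 0.9j, 0.3 + 0.1j]]
first, second = order_layers(H, noise_var=0.05)
```

Putting the weaker PSS in the exhaustively searched layer means its symbol is never decided by slicing alone, which is the motivation stated above.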
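In the first step, the scheme performs a full interference cancellation of the second layer as follows:

$$\mathbf{r}_i = \mathbf{r} - \mathbf{h}_2 s_{2,i}, \quad i = 1, 2, \ldots, M_2, \tag{4}$$

where $M_m$ is the number of constellation points for the $m$th transmit layer and $s_{m,i}$ is the $i$th constellation symbol of the $m$th layer. The reconstructed symbols for the first transmit layer can be expressed as

$$\hat{s}_{1,i} = \mathcal{D}_1\!\left(\frac{\mathbf{h}_1^H \mathbf{r}_i}{\|\mathbf{h}_1\|^2}\right), \quad i = 1, 2, \ldots, M_2, \tag{5}$$

where $\mathcal{D}_1(\cdot)$ denotes a hard decision to the nearest constellation point of the first layer and $(\cdot)^H$ is the conjugate transpose. The scheme then calculates the squared Euclidean distance (ED) after cancelling the reconstructed symbols and adds it to a list:

$$ED_i^2 = \left\|\mathbf{r}_i - \mathbf{h}_1 \hat{s}_{1,i}\right\|^2, \quad i = 1, 2, \ldots, M_2. \tag{6}$$

As the last part of the first step, we find the index of the minimum squared ED in the list:

$$p = \arg\min_{i} ED_i^2, \tag{7}$$

and denote the corresponding first-layer symbol $\hat{s}_{1,p}$ by $s_{1,p}$. The idea of the second step is to generate only the neighboring symbols of $s_{1,p}$, by inverting each bit of $s_{1,p}$, as additional symbol candidates for the first transmit layer. Their squared EDs are added to the list. Because of these additional entries with bit-inverted first-layer symbols, $C_{k,l} \neq \emptyset$ ($l = 0, 1$) for every $k$th bit. The scheme can therefore calculate the LLR information of all bits directly, without the additional computation required by other conventional suboptimum schemes [9][10][11].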
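The first step can be sketched in Python for two layers with 16 QAM. The Gray labelling (two bits per I/Q rail, most significant bit first), the matched-filter-plus-slicing reconstruction of the first layer, and the names `QAM16`, `slice_label`, and `first_step` are our assumptions for illustration; `h1` and `h2` stand for the channel columns of the first and second layers.

```python
# Gray-coded 4-PAM per rail: 2 bits -> amplitude (unnormalized).
PAM4_GRAY = {0b00: -3, 0b01: -1, 0b11: 1, 0b10: 3}
# 16-QAM: label bits b1 b2 (MSB first) select I, bits b3 b4 select Q.
QAM16 = {lab: complex(PAM4_GRAY[lab >> 2], PAM4_GRAY[lab & 0b11]) for lab in range(16)}

def slice_label(z, const):
    """Hard decision: label of the constellation point nearest to z."""
    return min(const, key=lambda lab: abs(z - const[lab]))

def first_step(r, h1, h2, const1=QAM16, const2=QAM16):
    """Scheme I, first step: returns the ED list and the index p of its minimum.

    Each entry is (ED^2, layer-1 label, layer-2 label) for one layer-2 candidate.
    """
    e1 = sum(abs(x) ** 2 for x in h1)
    listed = []
    for lab2, s2 in const2.items():
        r_i = [a - b * s2 for a, b in zip(r, h2)]                  # cancel layer-2 candidate
        z = sum(b.conjugate() * a for a, b in zip(r_i, h1)) / e1  # h1^H r_i / ||h1||^2
        lab1 = slice_label(z, const1)                             # reconstruct layer 1
        ed2 = sum(abs(a - b * const1[lab1]) ** 2 for a, b in zip(r_i, h1))
        listed.append((ed2, lab1, lab2))
    p = min(range(len(listed)), key=lambda i: listed[i][0])
    return listed, p

# Usage: noiseless 2 x 2 example; the minimum entry recovers the transmitted labels.
h1, h2 = [1.0 + 0.5j, 0.3 - 0.2j], [0.2 + 0.1j, 0.9 - 0.4j]
r = [a * QAM16[5] + b * QAM16[10] for a, b in zip(h1, h2)]
listed, p = first_step(r, h1, h2)
```

The list has exactly $M_2$ entries, one per second-layer candidate, which is where the complexity saving over a joint search of $M_1 M_2$ pairs comes from.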
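In the second step, the scheme takes $s_{1,p}$ and reconstructs the symbol $s_{1,j}$ whose $j$th bit is inverted:

$$s_{1,j} = Q_j\!\left(s_{1,p}\right), \quad j = 1, 2, \ldots, \log_2 M_1, \tag{8}$$

where $Q_j(\cdot)$ is the $j$th bit inverse operator. For example, if $s_{1,p}$ is the 16 QAM symbol "0001" in bit representation, inverting its second bit gives $s_{1,2}$ = "0101". The scheme then calculates new squared EDs after recancelling the bit-inverted symbols:

$$ED_{M_2+j}^2 = \left\|\mathbf{r} - \mathbf{h}_2 s_{2,p} - \mathbf{h}_1 s_{1,j}\right\|^2, \quad j = 1, 2, \ldots, \log_2 M_1, \tag{9}$$

where $s_{2,p}$ is the second-layer candidate symbol associated with the minimum squared ED of the first step. After the second step, the list contains $(M_2 + \log_2 M_1)$ squared EDs. The LLR of the $k$th bit for the proposed scheme can be calculated directly from these squared EDs as

$$\mathrm{LLR}_k = \min_{C^{\mathrm{Prop}}_{k,1}} \mathbf{ED}^2 - \min_{C^{\mathrm{Prop}}_{k,0}} \mathbf{ED}^2, \tag{10}$$

where $C^{\mathrm{Prop}}_{k,l}$ ($l = 0, 1$) is the set of listed candidate symbols whose $k$th bit is $l$ (the two sets together contain all $(M_2 + \log_2 M_1)$ candidates), and $\mathbf{ED}^2$ is the vector of squared EDs of these candidates.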
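The bit-inverse operator $Q_j$, the second step, and the min-based (max-log) LLR computation can be sketched as follows. As our own reading of "recancellation", each bit-inverted first-layer symbol is paired with the second-layer symbol $s_{2,p}$ of the best first-step entry. Labels are integers with bit 1 as the most significant bit; the LLR sign convention (positive favours bit 0) and the names `bit_inverse`, `second_step`, and `max_log_llr` are likewise assumptions.

```python
def bit_inverse(label, j, n_bits):
    """Q_j: invert the j-th bit (1-based, most significant bit first) of a label."""
    return label ^ (1 << (n_bits - j))

def second_step(r, h1, h2, const1, lab1_p, s2_p, n_bits1):
    """Scheme I, second step: (ED^2, layer-1 label) for each bit-inverted neighbour.

    Assumption: the layer-2 symbol is held at s_{2,p}, i.e.
    ED^2 = ||r - h2 s_{2,p} - h1 s_{1,j}||^2.
    """
    out = []
    for j in range(1, n_bits1 + 1):
        lab1 = bit_inverse(lab1_p, j, n_bits1)
        ed2 = sum(abs(a - b1 * const1[lab1] - b2 * s2_p) ** 2
                  for a, b1, b2 in zip(r, h1, h2))
        out.append((ed2, lab1))
    return out

def max_log_llr(candidates, n_bits):
    """candidates: (ED^2, label) pairs. LLR_k = min ED^2 over bit k = 1 minus over bit k = 0.

    Returns None for a bit whose candidate set C_{k,0} or C_{k,1} is empty.
    """
    llrs = []
    for k in range(1, n_bits + 1):
        mask = 1 << (n_bits - k)
        d1 = [ed for ed, lab in candidates if lab & mask]
        d0 = [ed for ed, lab in candidates if not lab & mask]
        llrs.append(min(d1) - min(d0) if d0 and d1 else None)
    return llrs

# Usage: the example from the text, "0001" with its second bit inverted -> "0101".
flipped = bit_inverse(0b0001, 2, 4)
```

In a complete scheme I, each listed entry would carry the concatenated layer-1 and layer-2 bits so that `max_log_llr` sees all $(M_2 + \log_2 M_1)$ candidates; the `None` case shows exactly what the bit-inverted neighbours prevent.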
Note that the proposed scheme can be extended to $N_t$ transmit antennas by applying the zero-forcing (ZF) or minimum mean-square error (MMSE) algorithm for the symbol reconstruction of the residual transmit layers in (5) and executing the second step $N_t - 1$ times.

Proposed Decoding Scheme II.
The other proposed scheme performs a full interference cancellation for all layers in parallel. In other words, the proposed scheme II performs only the first step of the proposed scheme I, in parallel, for every transmit layer. Figure 5 illustrates the block diagram of the proposed decoding scheme II for the first transmit layer. The following procedure describes the decoding algorithm for the $m$th transmit layer.
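First, the scheme performs a full interference cancellation of the $m$th transmit layer as

$$\mathbf{r}_{m,i} = \mathbf{r} - \mathbf{h}_m s_{m,i}, \quad i = 1, 2, \ldots, M_m, \tag{11}$$

where $M_m$ is the number of constellation points for the $m$th layer and $s_{m,i}$ is the $i$th constellation symbol of the $m$th layer. In the 2 × 2 case, the symbols reconstructed for the residual transmit layer $\bar{m}$ ($\bar{m} \neq m$) can be expressed as

$$\hat{s}_{\bar{m},i} = \mathcal{D}_{\bar{m}}\!\left(\frac{\mathbf{h}_{\bar{m}}^H \mathbf{r}_{m,i}}{\|\mathbf{h}_{\bar{m}}\|^2}\right), \quad i = 1, 2, \ldots, M_m, \tag{12}$$

where $\mathcal{D}_{\bar{m}}(\cdot)$ denotes a hard decision to the nearest constellation point of layer $\bar{m}$. The scheme then calculates the squared ED after cancelling the reconstructed symbols and adds it to the list:

$$ED_{m,i}^2 = \left\|\mathbf{r}_{m,i} - \mathbf{h}_{\bar{m}} \hat{s}_{\bar{m},i}\right\|^2, \quad i = 1, 2, \ldots, M_m. \tag{13}$$

The LLR of the $k$th bit for the $m$th transmit layer can be calculated directly from these squared EDs as

$$\mathrm{LLR}_{m,k} = \min_{C^{m}_{k,1}} \mathbf{ED}_m^2 - \min_{C^{m}_{k,0}} \mathbf{ED}_m^2. \tag{14}$$

Here, $C^m_{k,l}$ ($l = 0, 1$) is the set of candidate symbols for the $m$th transmit layer whose $k$th bit is $l$; the two sets together contain all $M_m$ candidates. Moreover, $\mathbf{ED}_m^2$ is the vector of squared EDs of the candidate symbols for the $m$th transmit layer.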
To take all the LLR information from N t transmit layers, these procedures are performed in parallel for every transmit layer.
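Scheme II's per-layer search can be sketched for the 2 × 2 case and checked numerically against a brute-force max-log MLD over all $16 \times 16$ symbol pairs. The Gray-mapped 16-QAM labelling, the matched-filter-plus-slicing reconstruction of the other layer, the LLR sign convention, and the function names are our assumptions.

```python
import random

PAM4_GRAY = {0b00: -3, 0b01: -1, 0b11: 1, 0b10: 3}
QAM16 = {lab: complex(PAM4_GRAY[lab >> 2], PAM4_GRAY[lab & 0b11]) for lab in range(16)}
N_BITS = 4

def llr_from(cands):
    """Max-log LLR per bit (MSB first) from (ED^2, label) pairs: min over 1 minus min over 0."""
    out = []
    for k in range(N_BITS):
        mask = 1 << (N_BITS - 1 - k)
        out.append(min(e for e, lab in cands if lab & mask)
                   - min(e for e, lab in cands if not lab & mask))
    return out

def scheme_two(r, cols):
    """Scheme II for two layers; cols[m] is column h_m of H. Returns per-layer LLRs."""
    llrs = []
    for m in (0, 1):
        h_m, h_o = cols[m], cols[1 - m]
        e_o = sum(abs(x) ** 2 for x in h_o)
        cands = []
        for lab, s in QAM16.items():                               # every candidate of layer m
            r_mi = [a - b * s for a, b in zip(r, h_m)]              # cancel it
            z = sum(b.conjugate() * a for a, b in zip(r_mi, h_o)) / e_o
            s_o = min(QAM16.values(), key=lambda c: abs(z - c))     # reconstruct other layer
            cands.append((sum(abs(a - b * s_o) ** 2 for a, b in zip(r_mi, h_o)), lab))
        llrs.append(llr_from(cands))
    return llrs

def mld(r, cols):
    """Brute-force max-log MLD over all 16 x 16 symbol pairs, per-layer LLRs."""
    pairs = [(sum(abs(r[n] - cols[0][n] * QAM16[a] - cols[1][n] * QAM16[b]) ** 2
                  for n in range(len(r))), a, b) for a in QAM16 for b in QAM16]
    return [llr_from([(e, a) for e, a, _ in pairs]),
            llr_from([(e, b) for e, _, b in pairs])]

# Usage: one noisy 2 x 2 channel use; both detectors should give the same LLRs.
rng = random.Random(11)
cols = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)] for _ in range(2)]
s = (QAM16[rng.randrange(16)], QAM16[rng.randrange(16)])
r = [cols[0][n] * s[0] + cols[1][n] * s[1] + complex(rng.gauss(0, 0.5), rng.gauss(0, 0.5))
     for n in range(2)]
llr_ii, llr_ml = scheme_two(r, cols), mld(r, cols)
max_gap = max(abs(x - y) for a, b in zip(llr_ii, llr_ml) for x, y in zip(a, b))
```

The agreement is exact up to rounding because, for a single residual layer, $\|\mathbf{r} - \mathbf{h}s\|^2 = \|\mathbf{h}\|^2 |z - s|^2 + \text{const}$ with $z = \mathbf{h}^H\mathbf{r}/\|\mathbf{h}\|^2$, so nearest-point slicing of $z$ yields the minimizing symbol.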
The BLER performance of the proposed scheme II is exactly the same as that of the MLD in the 2 × 2 MIMO configuration (i.e., systems with two PSSs and one RAS with two receiving antennas). Because the decoding procedure is performed independently for each transmit layer, the computational complexity increases linearly with the number of transmit antennas $N_t$ and with the QAM modulation level $M$.

Optimal MLD Scheme
The MLD [12] detects the desired signal by calculating the minimum squared ED over all possible combinations of the symbol set $C^{N_t}$. The error distance vector $\mathbf{e}$ and the detected signal can be expressed as

$$\mathbf{e} = \mathbf{r} - \mathbf{H}\mathbf{s}, \qquad \hat{\mathbf{s}}_{\mathrm{ML}} = \arg\min_{\mathbf{s} \in C^{N_t}} \|\mathbf{e}\|^2. \tag{15}$$

The LLR of the $k$th bit can be calculated as

$$\mathrm{LLR}_k = \min_{\mathbf{s} \in C^{N_t}_{k,1}} \|\mathbf{e}\|^2 - \min_{\mathbf{s} \in C^{N_t}_{k,0}} \|\mathbf{e}\|^2, \tag{16}$$

where $C^{N_t}_{k,l}$ ($l = 0, 1$) is the set of $N_t$-symbol combinations whose $k$th bit is $l$.
This scheme is optimal and achieves the best performance for MIMO systems. However, its computational complexity increases exponentially with the number of transmitting antennas: with $N_t$ layers that each use an $M$-point constellation, the search covers $M^{N_t}$ symbol combinations.
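For reference, a brute-force MLD for an arbitrary number of layers can be sketched as follows; it enumerates all $M^{N_t}$ combinations with `itertools.product`, which makes the exponential growth explicit. Gray-mapped QPSK, the LLR sign convention (positive favours bit 0), and the name `mld_detect` are our assumptions.

```python
import itertools

# Gray-mapped QPSK (unnormalized): first bit -> sign of I, second bit -> sign of Q.
QPSK = {0b00: 1 + 1j, 0b01: 1 - 1j, 0b10: -1 + 1j, 0b11: -1 - 1j}
N_BITS = 2

def mld_detect(r, H):
    """Exhaustive MLD over all M^{N_t} symbol combinations.

    Returns (detected labels, per-bit max-log LLRs in layer-major order, number of candidates).
    """
    n_t = len(H[0])
    listed = []
    for labels in itertools.product(QPSK, repeat=n_t):
        e = [r[n] - sum(H[n][m] * QPSK[labels[m]] for m in range(n_t))  # e = r - H s
             for n in range(len(r))]
        listed.append((sum(abs(x) ** 2 for x in e), labels))
    detected = min(listed)[1]
    llrs = []
    for m in range(n_t):
        for k in range(N_BITS):
            mask = 1 << (N_BITS - 1 - k)
            d1 = min(ed for ed, lab in listed if lab[m] & mask)
            d0 = min(ed for ed, lab in listed if not lab[m] & mask)
            llrs.append(d1 - d0)  # positive favours bit 0
    return detected, llrs, len(listed)

# Usage: noiseless 3 x 3 example, so the search covers 4 ** 3 = 64 combinations.
H = [[1.0, 0.2, 0.0], [0.0, 1.0, 0.3], [0.1, 0.0, 1.0]]
sent = (1, 2, 3)
r = [sum(H[n][m] * QPSK[sent[m]] for m in range(3)) for n in range(3)]
detected, llrs, count = mld_detect(r, H)
```

Adding one more QPSK layer multiplies `count` by 4, and with 64 QAM each extra layer multiplies it by 64.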

Complexity Analysis and Comparison
To compare computational complexity, we count real multiplications and real additions for two PSSs and two receiving antennas at the RAS. Tables 1, 2, and 3 show the complexity analyses per subcarrier for the optimal MLD, the proposed scheme I, and the proposed scheme II, respectively. The symbols in the tables are defined in Section 2.
Table 1: Complexity analysis for the optimal MLD scheme ($N_t = N_r = 2$, per subcarrier).

Columns: Operations, Real multiplication, Real addition, No. of iterations.

Table 3: Complexity analysis for the proposed scheme II ($N_t = N_r = 2$, per subcarrier).

EURASIP Journal on Wireless Communications and Networking
From these analyses, we can compare the number of operations for various modulation levels. Tables 4 and 5 show numerical comparisons of real multiplications and real additions per subcarrier, respectively.
As shown in these tables, when both PSSs transmit 16 QAM signals, the complexity of the proposed scheme I is only 15.75% and 15.29% of that of the optimal MLD in terms of real multiplication and real addition, respectively. Under the same condition, the proposed scheme II requires only about 28% and 27.17% of the computational complexity of the optimal MLD. When both PSSs transmit 64 QAM signals, the computational complexity of the proposed scheme I is only 3.77% and 3.67% of that of the optimal MLD in terms of real multiplication and real addition, respectively, and the proposed scheme II needs only about 7.22% and 7.02%. The relative computational complexity of the proposed schemes thus decreases significantly as the modulation level increases.
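The downward trend can be illustrated by counting how many squared-ED entries each detector lists per subcarrier. This is only a rough proxy we introduce for illustration (the function `listed_eds` is hypothetical): it ignores the cancellation and symbol-reconstruction work behind each entry, which is presumably why its ratios (for example, 20/256 ≈ 7.8% for scheme I with 16 QAM) are lower than the 15.75% obtained from the operation counts in Tables 4 and 5.

```python
from math import log2

def listed_eds(m1, m2):
    """Squared-ED entries evaluated per subcarrier for two layers with M1- and M2-point
    constellations. Only list entries are counted, not real multiplications/additions."""
    mld = m1 * m2                    # every symbol pair
    scheme_1 = m2 + int(log2(m1))    # M2 cancellations + log2(M1) bit-inverted neighbours
    scheme_2 = m1 + m2               # exhaustive search on each layer in turn
    return mld, scheme_1, scheme_2

for m in (4, 16, 64):
    mld, s1, s2 = listed_eds(m, m)
    print(f"{m:>2}-QAM: MLD {mld:>4}, scheme I {s1:>3} ({100 * s1 / mld:5.2f}%), "
          f"scheme II {s2:>3} ({100 * s2 / mld:5.2f}%)")
```

The MLD count grows as $M^2$ while both proposed counts grow roughly as $M$, so the ratio shrinks as the modulation level rises, matching the qualitative trend in the tables.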

Simulation Results
The performance of the proposed MIMO decoding schemes was evaluated through link-level simulations under the mobile WiMAX specifications. We considered the Vehicular-A channel environment of Recommendation ITU-R M.1225 [13] at a mobile velocity of 60 km/h, and assumed no correlation between the two PSSs. Convolutional turbo coding (CTC) with code rate r was used as the channel coding, with a maximum of eight decoding iterations. The packet size equaled the CTC block size, which was 144, 216, 288, and 432 bits for the modulation and coding scheme (MCS) levels (QPSK, r = 1/2), (QPSK, 3/4), (16 QAM, 1/2), and (16 QAM, 3/4), respectively. We assumed that the uplink channel response was perfectly known at the RAS, that there were no time or frequency offsets in the system, and that the power offset between the two PSSs was 0 dB. The data packets fully loaded 12 OFDMA symbols per frame, and we considered only the partial usage of subchannels (PUSC) [5] mode with subchannel rotation enabled as the subchannelization type. The main system parameters of the mobile WiMAX simulation are listed in Table 6.
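The four CTC block sizes are consistent with a fixed allocation of three slots per block, if one assumes that each uplink PUSC slot carries 48 data subcarriers (one subchannel of six tiles with eight data subcarriers each, over three OFDMA symbols). The slot count and the helper `info_bits_per_slot` are our inference for illustration; the text states only the block sizes.

```python
# Assumed: one uplink PUSC slot = 6 tiles x 8 data subcarriers = 48 data subcarriers.
DATA_SUBCARRIERS_PER_SLOT = 48

def info_bits_per_slot(bits_per_symbol, rate):
    """Information bits one slot carries for a given modulation order and code rate."""
    num, den = rate
    return DATA_SUBCARRIERS_PER_SLOT * bits_per_symbol * num // den

# (MCS label, bits per symbol, code rate, CTC block size in bits from the text)
MCS_TABLE = [("QPSK, 1/2", 2, (1, 2), 144), ("QPSK, 3/4", 2, (3, 4), 216),
             ("16 QAM, 1/2", 4, (1, 2), 288), ("16 QAM, 3/4", 4, (3, 4), 432)]

for label, bps, rate, block in MCS_TABLE:
    per_slot = info_bits_per_slot(bps, rate)
    print(f"({label}): {per_slot} bits per slot, {block / per_slot:g} slots per block")
```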
For the BLER performance comparison, other conventional spatial multiple access decoding schemes, including the optimal MLD, MMSE nulling, and MMSE-ordered successive interference cancellation (MMSE-OSIC) [14], are included in our link-level simulations. Figures 6, 7, 8, and 9 show the BLER performance of the first PSS when the MCS of both PSSs is (QPSK, 1/2), (QPSK, 3/4), (16 QAM, 1/2), and (16 QAM, 3/4), respectively. As these results show, the proposed scheme I suffers at most 0.2 dB of performance degradation relative to the optimal MLD at a BLER of $10^{-2}$. In Figure 9, the BLER performance of the proposed algorithm is almost the same as that of the MLD for (16 QAM, 3/4). Moreover, the proposed scheme II shows no performance degradation relative to the optimal MLD for any MCS set. Figures 10 and 11 show the BLER performance of the first PSS when the MCS combinations are {(16 QAM, 1/2), (QPSK, 1/2)} and {(16 QAM, 3/4), (QPSK, 3/4)}, respectively. Here, {A, B} means that the first PSS transmits with MCS level A and the second PSS with MCS level B. As these results show, the proposed scheme I has about 1.0 dB and 0.3 dB of performance degradation relative to the optimal MLD at a BLER of $10^{-3}$, respectively. However, the proposed scheme II, which has roughly twice the complexity of the proposed scheme I, shows almost the same BLER performance as the optimal MLD. We also observe that MMSE nulling and MMSE-OSIC suffer more performance degradation as the code rate increases.

Conclusion
In this paper, we have proposed decoding schemes that achieve near-optimal ML performance with low computational complexity in uplink virtual MIMO systems whose channel decoder uses LLR information. Because the proposed schemes nearly satisfy the criterion of the WiMAX Forum mobile RCT, which is based on ML performance, the overall system retains a larger margin for implementation loss. The link-level simulations assume perfect synchronization between the two PSSs; the performance of the virtual MIMO system may degrade under imperfect synchronization. The link-level results show that the proposed schemes achieve almost the same BLER performance as the optimal MLD. The proposed scheme I requires only 15.75% of the real multiplications of the optimal MLD when both PSSs transmit 16 QAM signals, and only 3.77% for 64 QAM signals. Moreover, the proposed scheme II achieves exactly the same BLER performance as the optimal MLD with only 28% and 7.22% of its computational complexity when both PSSs transmit 16 QAM and 64 QAM signals, respectively. We expect an even larger relative complexity reduction in systems that use higher modulation levels.

Figure 3: Receiver architecture of RAS with two receiving antennas.

Figure 4: Block diagram of the proposed decoding scheme I.

Figure 5: Block diagram of the proposed decoding scheme II for the first transmit layer.

Figure 6: Comparison of the BLER performance of the first PSS for the proposed and the conventional schemes when both PSSs transmit with (QPSK, 1/2).

Figure 8: Comparison of the BLER performance of the first PSS for the proposed and the conventional schemes when both PSSs transmit with (16 QAM, 1/2).

Figure 9: Comparison of the BLER performance of the first PSS for the proposed and the conventional schemes when both PSSs transmit with (16 QAM, 3/4).

Figure 11: Comparison of the BLER performance of the first PSS for the proposed and the conventional schemes when the MCS combination is {(16 QAM, 3/4), (QPSK, 3/4)}.
Figure 2: Transmitter architecture of one PSS.

Table 2: Complexity analysis for the proposed scheme I ($N_t = N_r = 2$, per subcarrier).

Table 4: Complexity comparison for real multiplication operations ($N_t = N_r = 2$, per subcarrier).

Table 5: Complexity comparison for real addition operations ($N_t = N_r = 2$, per subcarrier).