Implementation of low-complexity MIMO detector and efficient soft-output demapper for MIMO-OFDM-based wireless LAN systems

In this paper, we describe a simplified soft-output demapper designed to support coded multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM)-based systems using only 3-bit soft information. The IEEE 802.11n standard requires a relatively high punctured convolutional code rate of R = 5/6 for its spectrally efficient high-throughput data rate settings. To extract soft-bit information effectively without degrading the packet error rate performance, we introduce bit-rounding and effective-bit threshold adjustment techniques.


Introduction
In many areas of modern digital communication, modulation and coding schemes (MCS) have been adopted in conjunction with the bit-interleaved coded modulation technique. The IEEE 802.11a/g [1] and 802.11n [2] standards are good examples. Bit-interleaved coded modulation brings substantial performance enhancement not only to single-input single-output systems but also to multiple-input multiple-output (MIMO) antenna systems. To decode multiple layers of transmit signals propagated through the wireless fading channel, the MIMO detector plays the crucial role of extracting soft-bit signals for optimal decoding performance.
For a MIMO detector, effective performance and hardware implementation complexity are the important issues. The maximum likelihood (ML) detector with log-likelihood ratio (LLR) output is known as the optimum detector. Its complexity, however, increases exponentially with the number of transmit antennas and the modulation order [3]. On the other hand, applying linear MIMO detectors such as simple zero-forcing (ZF) or minimum mean squared error nulling techniques to extract soft bits requires much lower implementation complexity, at the cost of a significant performance loss. This lower complexity is appreciated especially when the modulation order is high, as in 64-quadrature amplitude modulation (QAM). When linear MIMO detectors are used, complementary techniques that compensate receiver performance should be devised in the soft demapper. In addition, the soft-bit width should be kept as small as possible for low implementation complexity. The latency induced by the heavy arithmetic operations in the channel decoder is also expected to be proportional to the resolution of the soft-demapped bits; this resolution should therefore be minimized in implementation.
*Correspondence: hjlee@hansung.ac.kr. Department of Information and Communications Engineering, Hansung University, Seoul 136-792, South Korea.
The IEEE 802.11n standard requires a punctured convolutional code rate of R = 5/6 as one of the mandatory high-throughput (HT) data rate settings. To extract 3-bit soft information effectively without the performance degradation induced by such a high channel coding rate, we apply bit-rounding and effective-bit threshold adjustment techniques during the extraction of soft bits at the demapper. The system under investigation is based on bit-interleaved coded MIMO-orthogonal frequency-division multiplexing (OFDM) and supports IEEE 802.11n HT transmission modes up to MCS 15. A low-complexity ZF linear detector with an extra receive antenna is assumed prior to the calculation of the soft-demapped bits, as a cost-effective solution for achieving both low implementation complexity and performance [4]. We use the following notation throughout this paper. The superscripts $(\cdot)^T$, $(\cdot)^*$, and $(\cdot)^H$ denote transpose, complex conjugate, and Hermitian operations, respectively. $\Pr(\cdot)$ denotes probability, $E[\cdot]$ stands for expectation, and $\Re(\alpha)$ denotes the real part of a complex number $\alpha$.
http://jwcn.eurasipjournals.com/content/2013/1/143

Receiver system model
The overall receiver block diagram is shown in Figure 1. The three received signals from the three antennas are fed into digital amplifiers that adjust the power of the incoming signals to a target value. The digital front-end operations are applied to only two of the three available received signal paths to reduce implementation complexity. The power of the input signal is measured, and the gain update is calculated in the automatic gain control (AGC) block. Next, the DC offset and I/Q imbalance introduced by the RF components and analog-to-digital conversion (ADC) are compensated in each RX path. The received signals are directed to a channel mixer for +10 and −10 MHz frequency shifting. The input OFDM symbols are stored in the fast Fourier transform (FFT) input buffer, and the carrier frequency offset (CFO) is corrected at the input of the FFT block. At the start of the data OFDM symbols, residual frequency and phase errors are estimated and corrected using pilot tones in the phase tracking block. After synchronization, the CFO-compensated packets are transformed to the frequency domain by a 128-point radix-$2^3$ decimation-in-frequency FFT block. The FFT output is transferred to the MIMO detector. Finally, the output signal of the MIMO detector is processed in the soft demapper.

MIMO signal detection
We apply a simple linear ZF nulling scheme to detect symbols with low implementation complexity. Note that the linear MIMO detector considered here does not involve the popular ordering and successive interference cancelation techniques [5]. The received symbol vector $\mathbf{r}$ of a MIMO system with $N_T = 2$ transmit antennas and $N_R = 3$ receive antennas is given by

$$\mathbf{r} = \mathbf{H}\mathbf{x} + \mathbf{n},$$

where $\mathbf{r} = [r_0, r_1, r_2]^T$ is the received symbol column vector and $\mathbf{H}$ is the 3 × 2 channel matrix, assumed time-invariant per subcarrier. The element $h_{ij}$ of $\mathbf{H}$ is the channel gain between the i-th receive antenna and the j-th transmit antenna. $\mathbf{x} = [x_0, x_1]^T$ is the transmitted symbol vector of two independent streams with the total transmit power normalized to unity. The vector $\mathbf{n}$ is additive white Gaussian noise generated at the receiver side with variance $\sigma^2$. Assuming the channel delay lies completely within the guard interval, the channel matrix and its detection algorithm can be handled separately on a per-subcarrier basis. The ZF detection method de-correlates the spatially multiplexed signals while ignoring the noise enhancement effect. The ZF filter coefficient matrix $\mathbf{W}$ is therefore defined by

$$\mathbf{W}^H = \left(\mathbf{H}^H\mathbf{H}\right)^{-1}\mathbf{H}^H,$$

where the term to be inverted, $\mathbf{H}^H\mathbf{H}$, can be expressed through channel norm and correlation terms,

$$\mathbf{H}^H\mathbf{H} = \begin{bmatrix} \lVert\mathbf{h}_0\rVert^2 & \rho \\ \rho^* & \lVert\mathbf{h}_1\rVert^2 \end{bmatrix},$$

where $\mathbf{h}_j$ denotes the j-th column of $\mathbf{H}$ and the channel norms of the transmit antennas are

$$\lVert\mathbf{h}_j\rVert^2 = \sum_{i=0}^{2} \lvert h_{ij}\rvert^2, \quad j = 0, 1.$$

The correlation term, interpreted as the interference among transmit signals over the air interface, can be expressed as

$$\rho = \sum_{i=0}^{2} h_{i0}^{*}\, h_{i1}.$$

Inverting $\mathbf{H}^H\mathbf{H}$ requires a division by its determinant $\Delta = \lVert\mathbf{h}_0\rVert^2\lVert\mathbf{h}_1\rVert^2 - \lvert\rho\rvert^2$. However, due to fixed-point implementation complexity and numerical instability issues, division by the determinant should be avoided if possible. In fact, it is unnecessary to normalize the estimated transmit symbols during the matrix inversion step; the normalization can be deferred to the soft demapper. The scaled ZF filter coefficient matrix $\mathbf{W}$ can then be rewritten as

$$\mathbf{W}^H = \begin{bmatrix} \lVert\mathbf{h}_1\rVert^2 & -\rho \\ -\rho^* & \lVert\mathbf{h}_0\rVert^2 \end{bmatrix}\mathbf{H}^H.$$

Next, the filter coefficient matrix is multiplied by the received signal vector $\mathbf{r}$ to estimate the transmitted signal vector $\hat{\mathbf{x}}$. The estimated symbol vector $\hat{\mathbf{x}}$ is a scaled version of the transmitted signal, scaled exactly by the determinant $\Delta$. The resulting estimated transmit vector can be written as

$$\hat{\mathbf{x}} = \mathbf{W}^H\mathbf{r} = \Delta\,\mathbf{x} + \mathbf{W}^H\mathbf{n}. \qquad (5)$$

Note that the noise vector $\mathbf{n}$ is boosted by $\mathbf{W}^H$; for the unscaled filter, the covariance matrix of the noise vector is $\mathbf{R} = \sigma^2\left(\mathbf{H}^H\mathbf{H}\right)^{-1}$. In other words, the noise is generally correlated [6] after ZF equalization. For the detection of single-spatial-stream signals such as IEEE 802.11a or 802.11n HT one-spatial-stream (i.e., SIMO) frames, the estimated symbol is obtained by simply redefining $\mathbf{w}$ and $\Delta$ as

$$\mathbf{w} = [h_{00}, h_{10}, h_{20}]^T, \qquad \Delta = \lVert\mathbf{w}\rVert^2 = \sum_{i=0}^{2}\lvert h_{i0}\rvert^2,$$

where the vector $\mathbf{w}$ is simply the channel response per subcarrier at each receive antenna and $\Delta$ is the channel norm. The transmitted symbol estimation in this case, $\hat{x} = \mathbf{w}^H\mathbf{r}$, is simply maximal ratio combining.

Knowing the theoretical mapping points of the estimated QAM signals, we account for the scaling by $\Delta$ at the soft demapper during the extraction of soft bits. This is described in the next section.
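The division-free detection step described above can be illustrated with a short sketch. This is only a floating-point illustration under stated assumptions, not the fixed-point design of the paper: the function name `zf_detect_scaled`, the list-of-rows layout of H, and the example channel values are hypothetical. It forms the adjugate of H^H H instead of its inverse, so the output is the transmitted vector scaled by the determinant.

```python
def zf_detect_scaled(H, r):
    """Division-free ZF detection for a 3x2 channel (illustrative sketch).

    H: three rows [h_i0, h_i1] of complex gains; r: three complex samples.
    Returns (x_hat, delta, w_norm_sq), where x_hat equals delta * x plus
    filtered noise and w_norm_sq holds the squared norms of the filter rows.
    """
    # Entries of H^H H: column norms and cross-correlation.
    n0 = sum(abs(row[0]) ** 2 for row in H)
    n1 = sum(abs(row[1]) ** 2 for row in H)
    rho = sum(row[0].conjugate() * row[1] for row in H)
    delta = n0 * n1 - abs(rho) ** 2  # determinant of H^H H

    # Rows of the scaled filter W^H = adj(H^H H) H^H; no division is used.
    wh0 = [n1 * row[0].conjugate() - rho * row[1].conjugate() for row in H]
    wh1 = [n0 * row[1].conjugate() - rho.conjugate() * row[0].conjugate()
           for row in H]

    x_hat = [sum(w * ri for w, ri in zip(wh, r)) for wh in (wh0, wh1)]
    w_norm_sq = [sum(abs(w) ** 2 for w in wh) for wh in (wh0, wh1)]
    return x_hat, delta, w_norm_sq


# Hypothetical noiseless example: r = H x.
H = [[1 + 0j, 0j], [0j, 1 + 0j], [1 + 0j, 1 + 0j]]
x = [1 + 1j, -1 + 0j]
r = [sum(h * xj for h, xj in zip(row, x)) for row in H]
x_hat, delta, w_norm_sq = zf_detect_scaled(H, r)
print(x_hat, delta, w_norm_sq)
```

In the noiseless case the returned estimate is the transmitted vector multiplied by the determinant, which is why the decision thresholds in the demapper have to be scaled by the same factor.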

Soft-output demapper
At the demapper, the soft-output bit extraction algorithm in [7] is complex, so sub-optimal solutions [8] need to be adopted. We first briefly review conventional soft-demapping techniques and then focus on a simplified sub-optimal approach for multi-level modulation.

Log-likelihood ratio-based bit metric
The optimum bit metric for Viterbi decoding is given by the LLR. For spatial stream i, the bit metric of the j-th bit of the symbol estimated on the k-th subcarrier is defined as the logarithm of the ratio between the likelihoods accumulated over two sets of candidate transmit vectors: $S^0$ is the subset of $\tfrac{1}{2}M^{N_T}$ vectors $\mathbf{s}$ for which the j-th bit of the corresponding symbol is equal to bit 0, and $S^1$ is the complementary subset for bit 1. This metric can be simplified by the max-sum approximation, which eliminates the calculation of the logarithm [9].
As the above bit metric is the procedure for exact soft ML MIMO detection, a total of $M^{N_T} \times N_R$ Euclidean distance computations is required.
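For reference, a standard textbook form of this LLR and of its max-sum (max-log) approximation is sketched below. The symbol $\Lambda$ and the sign convention (bit 1 in the numerator) are assumptions made here for illustration and may differ from the convention of the original equation.

```latex
\Lambda\bigl(b_{i,j}[k]\bigr)
  = \ln\frac{\sum_{\mathbf{s}\in S^{1}}
        \exp\!\bigl(-\lVert\mathbf{r}[k]-\mathbf{H}[k]\mathbf{s}\rVert^{2}/\sigma^{2}\bigr)}
       {\sum_{\mathbf{s}\in S^{0}}
        \exp\!\bigl(-\lVert\mathbf{r}[k]-\mathbf{H}[k]\mathbf{s}\rVert^{2}/\sigma^{2}\bigr)}
  \approx \frac{1}{\sigma^{2}}\Bigl(
        \min_{\mathbf{s}\in S^{0}}\lVert\mathbf{r}[k]-\mathbf{H}[k]\mathbf{s}\rVert^{2}
      - \min_{\mathbf{s}\in S^{1}}\lVert\mathbf{r}[k]-\mathbf{H}[k]\mathbf{s}\rVert^{2}\Bigr)
```

The approximation keeps only the dominant term of each sum, so the logarithm and exponentials disappear, but every candidate vector still has to be visited to find the two minima.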

ZF equalizer output-based bit metric
Where implementation complexity is a concern, the ZF-equalized output can be used to extract soft bits instead of the maximum likelihood LLR bit metric described above. The simplified ZF equalizer output-based soft demapper bit metric, Eq. (10), is computed from $\hat{x}_i$, the ZF-equalized output of the i-th spatial stream, and offers a trade-off between complexity and performance. Note that in the high signal-to-noise ratio (SNR) region, Eq. (10) is a piecewise linear function of the real or imaginary part of the ZF-equalized output $\hat{x}_i$, as suggested in [8]. Furthermore, a much simpler bit metric algorithm is suggested in [9], where the complexity of the demapper is kept at almost the same level for all multi-level modulation modes.
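To make the piecewise-linear behavior concrete, a commonly used high-SNR approximation for the in-phase bits of Gray-mapped 16-QAM is sketched below. It assumes a normalized equalizer output $y = \hat{x}_i/\Delta$ with $\Delta$ the determinant of $\mathbf{H}^H\mathbf{H}$, a unit-average-power constellation with levels $\pm d, \pm 3d$, and a bit labeling in which positive metrics favor bit 1; the symbols $D_{I,1}$, $D_{I,2}$, and $d$ are introduced here only for illustration.

```latex
D_{I,1} = \Re(y), \qquad
D_{I,2} = -\lvert \Re(y) \rvert + 2d, \qquad
d = \frac{1}{\sqrt{10}}
```

The quadrature bits follow in the same way from the imaginary part of $y$. Each metric needs only an absolute value and an addition, which is the source of the complexity reduction relative to the exhaustive distance search.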

Noise power weighting-based bit metric
As the ZF MIMO detector disregards the noise enhancement effect, the boosted colored noise term is taken into account when generating soft-bit information at the demapper. Eq. (5) shows that the covariance matrix of the noise vector $\mathbf{n}$ is affected by the ZF filter matrix $\mathbf{W}$. From Eq. (5), the variance $\sigma_i^2$ of the product of the nulling vector and the noise vector can be expressed for each stream as

$$\sigma_0^2 = \left(\lvert w_{00}\rvert^2 + \lvert w_{10}\rvert^2 + \lvert w_{20}\rvert^2\right)\sigma^2, \qquad \sigma_1^2 = \left(\lvert w_{01}\rvert^2 + \lvert w_{11}\rvert^2 + \lvert w_{21}\rvert^2\right)\sigma^2.$$

Thus, by multiplying the estimated symbol $\hat{x}_i$ by the inverse of the squared norm of the ZF filter coefficients, $1/\lVert\mathbf{W}^H_i\rVert^2$, the exact variance of the colored noise affecting the soft-bit information is conveyed to the channel decoder, which improves channel decoding performance. Here $\mathbf{W}^H_i$ denotes the i-th row vector of $\mathbf{W}^H$. Soft-bit weighting is especially crucial for coded multicarrier OFDM systems, since subcarriers with boosted colored noise suffer from a low signal-to-interference-plus-noise ratio. Since colored noise is always present unless the SNR approaches infinity, an estimated symbol with a small $\lVert\mathbf{W}^H_i\rVert^2$ value (i.e., high SNR) can be considered more reliable than one with a large $\lVert\mathbf{W}^H_i\rVert^2$ value. The simplified noise-power-weighted soft-bit LLR metric for the real part in the case of 64-QAM modulation is then defined in Eq. (13). Since $\hat{x}_i$ and $\lVert\mathbf{W}_i\rVert^2$ are both scaled (by $\Delta$ and $\Delta^2$, respectively, where $\Delta$ is the determinant of $\mathbf{H}^H\mathbf{H}$), the soft-bit weighting of noise enhancement should be $\Delta/\lVert\mathbf{W}_i\rVert^2$ instead of $1/\lVert\mathbf{W}_i\rVert^2$, so that the scaling effect cancels. In summary, only one division operation is necessary to normalize the estimated transmit signal $\hat{x}_i$. In addition, we found that the ratio $\Delta/\lVert\mathbf{W}_i\rVert^2$ is typically in the range of 0 to 8. Therefore, only 3 bits are enough to express the colored noise weighting as well as the normalization factor.

Figure 3 Effective 3-bit extraction process.
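The weighting described above can be sketched in floating point as follows. This is a hedged illustration rather than the metric of Eq. (13) itself: it assumes the common piecewise-linear 64-QAM approximation with unit-average-power levels at odd multiples of 1/sqrt(42), a labeling in which positive metrics favor bit 1, and thresholds scaled by the determinant `delta`. The function name `weighted_soft_bits_64qam` is hypothetical.

```python
import math


def weighted_soft_bits_64qam(x_hat_re, delta, w_norm_sq):
    """Three noise-weighted soft bits for the real axis of 64-QAM (sketch).

    x_hat_re : real part of the scaled ZF output (transmitted value times delta)
    delta    : determinant of H^H H (scale factor of x_hat)
    w_norm_sq: squared norm of the scaled filter row for this stream
    """
    d = delta / math.sqrt(42.0)   # scaled half-distance between adjacent levels
    weight = delta / w_norm_sq    # single division: weighting plus normalization
    a = abs(x_hat_re)
    metrics = (x_hat_re,                 # sign bit
               4 * d - a,                # inner/outer half of the axis
               2 * d - abs(a - 4 * d))   # position within each half
    return [weight * m for m in metrics]


# Hypothetical example chosen so that d = 1 and weight = 1.
delta = math.sqrt(42.0)
print(weighted_soft_bits_64qam(3.0, delta, delta))
```

Doubling `w_norm_sq` halves all three metrics, which is how a strongly noise-enhanced subcarrier is made to contribute less to the Viterbi path metric.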

Hardware architecture
The hardware architecture of the soft demapper consists of two functional parts, as illustrated in Figure 2. The first half, during clock delay intervals 1 to 5, calculates the exact soft-bit values, and the other half, at delay 6, is dedicated to the effective 3-bit extraction process. Before the MIMO detector output signals are processed, the scaled estimated symbol $\hat{x}_i$ is multiplied by an amplitude tracking value to account for channel variation over time, $\tilde{x}_0 = \mathrm{amp0\_trk} \cdot \hat{x}_0$, where "ampi_trk" is the amplitude tracking result of the i-th spatial stream. This tracking coefficient reflects the variation of the average magnitude of the pilot tones allocated to every data-field OFDM symbol. Although the time selectivity of wideband nomadic systems such as wireless local area networks (WLAN) is considered negligible, the average channel power may change over a long packet when the packet aggregation feature is enabled.
After the incoming ZF-equalized signal is multiplied by the amplitude tracking result amp0_trk, the 6 least significant bits (LSBs), which correspond to the fractional part, are cut off, leaving the scaled signal $\hat{x}_0 \cdot \mathrm{amp0\_trk}$ 20 bits wide. Meanwhile, the higher-order modulation threshold values corresponding to the decision boundaries of 16-QAM and 64-QAM signals are calculated so that the soft demapper can extract sub-optimal LLR values. Note that these decision boundaries are scaled by the determinant of $\mathbf{H}^H\mathbf{H}$.
As shown in Figure 2, the "ratefield_rx" control signal indicates the modulation order of the incoming symbols. For OFDM-based IEEE 802.11 WLAN systems, this signal is set to either 16-QAM or 64-QAM after the transmission rate indicated in the Legacy/HT signal field has been detected. At processing delay 4, a clipping operation is performed, since any multiplication wider than 20 bits typically requires more than one built-in multiplier block in a field-programmable gate array (FPGA) implementation. Saturation is therefore necessary to lower complexity. Acquiring the soft bits takes a total of five clock delays. Finally, the result is multiplied by the normalization factor $\Delta/\lVert\mathbf{W}_i\rVert^2$, where $\Delta$ is the determinant of $\mathbf{H}^H\mathbf{H}$, as indicated in Eq. (13), before the 3-bit extraction process (Table 1).
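The amplitude-tracking multiplication, LSB removal, and clipping steps can be modeled with integer arithmetic as in the sketch below. The 6 discarded LSBs and the 20-bit width come from the text; the function names `clip_signed` and `track_and_truncate`, and the assumption that amp0_trk is an integer with 6 fractional bits (so that 64 represents unity gain), are hypothetical.

```python
def clip_signed(value, bits):
    """Saturate an integer to the range of a signed two's-complement word."""
    hi = (1 << (bits - 1)) - 1
    return max(-hi - 1, min(hi, value))


def track_and_truncate(x_hat, amp_trk, frac_bits=6, width=20):
    """Apply amplitude tracking, drop fractional LSBs, clip to `width` bits."""
    product = x_hat * amp_trk
    # Python's >> on negative integers is an arithmetic shift (rounds toward
    # minus infinity), matching truncation of a two's-complement word.
    return clip_signed(product >> frac_bits, width)


print(track_and_truncate(1000, 64))      # unity gain under the assumed format
print(track_and_truncate(1 << 30, 64))   # large input is clipped
```

Keeping every operand within 20 bits is what allows each later multiplication to fit a single built-in multiplier block, as noted in the text.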

Three-bit extraction/quantization
With the colored noise power weight bit metric discussed in the previous section, the soft-bit calculation procedure (Eq. 13) is a piecewise linear function [9], which is simple to implement. The 3-bit quantization process, in contrast, is a non-linear function. During the extraction, the magnitude of the soft-bit values is analyzed over a given range.
The bit metric output signal, as shown in Figure 3, is first saturated to a 9-, 10-, 11-, or 12-bit signal; the saturation length depends on the register control value "wgt_max_dmp". This can be set to a fixed value since normalization results in $E[\lVert x_i\rVert^2] = 1$ for BPSK signals. Next, the LSBs of the saturated signal are removed; the number of LSBs removed also depends on a pre-determined register control value. This is the effective-bit threshold adjustment technique. What is left is an LLR with three effective bits.
At this point, the 3-bit LLR is checked to see whether its absolute value is greater than or equal to 3. If so, the final 3-bit output is set to either 3 or −3, limiting the 3-bit signed number to seven levels instead of eight (i.e., −3 to 3 rather than −4 to 3). If not, a bit-rounding scheme, equivalent to adding 0.5 bit, is applied to mitigate the negative-value bias caused by the fixed-point quantization effect in two's complement conversion. We later find that this bias affects soft-input Viterbi decoding performance significantly, especially for the high code rate R = 5/6. Note that the above procedure can be applied similarly to find the 3-bit value of $\Delta/\lVert\mathbf{W}_i\rVert^2$, where $\Delta$ is the determinant of $\mathbf{H}^H\mathbf{H}$. In this case, $b_0$ of Figure 3 can be replaced with $\lvert\Delta - \lVert\mathbf{W}_i\rVert^2\rvert$ when $\Delta \ge \lVert\mathbf{W}_i\rVert^2$, or with $\lvert\Delta - \lVert\mathbf{W}_i\rVert^2\rvert/4$ when $\Delta < \lVert\mathbf{W}_i\rVert^2$, since both $\Delta$ and $\lVert\mathbf{W}_i\rVert^2$ are positive values.
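One possible reading of the extraction steps above is sketched below in integer arithmetic. It is an illustration under stated assumptions, not the implemented logic of Figure 3: the function name `extract_3bit` is hypothetical, the saturation width is passed directly rather than decoded from "wgt_max_dmp", and rounding is applied before the seven-level clamp rather than after the range check.

```python
def extract_3bit(metric, sat_bits=10, rounding=True):
    """Reduce an integer soft-bit metric to a seven-level value in [-3, 3]."""
    # Step 1: saturate to a signed word of sat_bits (9 to 12 in the text).
    hi = (1 << (sat_bits - 1)) - 1
    value = max(-hi - 1, min(hi, metric))

    # Step 2: keep the three most significant bits. A plain arithmetic shift
    # truncates toward minus infinity, which biases the result negative, so
    # half an LSB of the output is added first when rounding is enabled.
    shift = sat_bits - 3
    if rounding:
        value += 1 << (shift - 1)
    quantized = value >> shift

    # Step 3: clamp to seven symmetric levels instead of the eight levels
    # (-4 to 3) of a raw 3-bit two's-complement number.
    return max(-3, min(3, quantized))


for m in (-512, -200, -1, 0, 200, 511):
    print(m, extract_3bit(m, rounding=False), extract_3bit(m))
```

Without rounding, a metric of −1 maps to −1 while +1 maps to 0, which is the negative bias the bit-rounding step is meant to remove; with rounding, inputs of equal magnitude and opposite sign map to opposite outputs except at exact ties.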

Simulation results
In this section, we present fixed-point simulation results for the implemented soft demapper applied to an IEEE 802.11n-based 2 × 3 MIMO system with a Viterbi decoder that takes 3-bit LLR input. The simulation results are based on bit- and cycle-true, clock-synchronized C/RTL code targeted at the IEEE 802.11n system implemented and verified with FPGA technology, as shown in Figure 4.
We apply an exponentially decaying channel model with a 50-ns root mean square (rms) delay spread. The channel sampling rate is set to 80 Msps. All simulation cases include a frequency/time offset of 40 ppm, introduced by the TX/RX carrier frequency mismatch and the TX/RX sampling mismatch of the ADC, respectively. RF impairments are also included: a Rapp power amplifier model with 10-dB backoff and phase noise with a pole-zero model. All the system impairments mentioned above are compensated by the AGC, carrier frequency offset compensation, time synchronization, and phase tracking algorithms. Channel estimation is performed by capturing the frequency-domain, tone-interleaved (per transmit antenna) version of the long sequence preambles defined in [2] and saving them as H in 3 × 2 matrix form in the MIMO detector. The simulation parameters are given in Table 2, and the packet size is fixed to 1,000 bytes, as suggested in [10].
Figure 5 plots the packet error rate (PER) of various soft-demapping schemes as a function of the average SNR per receive antenna. Here, floating-point simulation means that not only the soft demapper but all arithmetic calculations, including the entire RX front-end, are based on floating-point operations. Out of the 16 MCS modes, four transmission rates are chosen for analyzing the fixed-point effect of the soft demapper in comparison with the floating-point simulation results. From MCS 12 to 14, the performance gap between floating point and fixed point due to quantization error is kept to a minimum with the proposed effective 3-bit extraction soft demapper, whereas the conventional eight-level quantization scheme consistently shows an SNR loss of about 0.5 dB. In addition, a PER error floor is observed with the conventional scheme in MCS 14 and 15. In contrast, the error floor is eliminated by applying the proposed fixed-point soft demapper. The low-complexity soft demapper enables the MIMO-OFDM-based IEEE 802.11n WLAN system to achieve its peak data rate of 270 Mbps with a packet error rate of 1% at an SNR of 31 dB.

Conclusions
We have proposed a linear MIMO detector-based soft-demapping metric, together with a hardware architecture that is simple to implement. By combining seven-level 3-bit effective quantization with bit-rounding and effective-bit threshold adjustment, a considerable gain can be realized in coded MIMO-OFDM-based systems in high data rate transmission modes, especially at high code rates. The proposed soft-bit demapper has been tested and verified with Xilinx Virtex II XC2V8000 FPGAs operating at 80 MHz.