Hardware oriented, quasi-optimal detectors for iterative and non-iterative MIMO receivers
- Alessandro Tomasoni^{1},
- Massimiliano Siti^{2},
- Marco Ferrari^{1} and
- Sandro Bellini^{3}
https://doi.org/10.1186/1687-1499-2012-62
© Tomasoni et al; licensee Springer. 2012
Received: 14 May 2011
Accepted: 24 February 2012
Published: 24 February 2012
Abstract
In this article we study hardware-oriented versions of the recently appeared Layered ORthogonal lattice Detector (LORD) and Turbo LORD (T-LORD). LORD and T-LORD are attractive Multiple-Input Multiple-Output (MIMO) detection algorithms, that aim to approach the optimal Maximum-Likelihood (ML) and Maximum-A-Posteriori (MAP) performance, respectively, yet allowing a complexity quadratic in the number of transmitting antennas rather than exponential. LORD and T-LORD are also well suited to a hardware (e.g., ASIC or FPGA) implementation because of their regularity, parallelism, deterministic latency, and complexity. Nevertheless, their complexity is still high in case of high cardinality constellations, such as the 64-QAM foreseen by the 802.11n standard. We show that, when only global latency constraints exist, e.g., a fixed time to detect the whole OFDM symbol, the LORD and T-LORD complexity can be remarkably reduced, still approaching the ML and MAP performance, respectively. Notwithstanding the suboptimal low-complexity and hardware-oriented implementation, LORD and T-LORD approach the EXtrinsic Information Transfer of the ML and MAP detectors, respectively. To focus on a specific setting, we consider the indoor MIMO wireless LAN 802.11n standard, taking into account errors in channel estimation and a frequency selective, spatially correlated channel model.
Keywords
MIMO systems; space-frequency BICM; OFDM; ML and MAP detection; Turbo detection; EXIT charts
1 Introduction
Because of the increasing demand for data rate and link robustness in wireless transmissions, Multiple-Input Multiple-Output (MIMO) technologies are nowadays an indispensable option in the wireless communications standards recently released or under definition, such as IEEE 802.11n [1], WiMax [2], and mobile long term evolution (LTE) [3]. In fact, the capacity of the wireless link grows linearly with the number of transmitting or receiving antennas [4, 5], when spatial diversity is available. In practice, MIMO is often combined with space-frequency bit interleaved coded modulation (BICM) and orthogonal frequency-division multiplexing (OFDM) [1, 2], which ensure that almost uncorrelated channels are experienced by different tones within an OFDM symbol.
To increase the spectral efficiency of the link, the transmitting antennas can be used in layered mode, i.e., each antenna transmits a different symbol in the same bandwidth at the same time. On the one hand, a sophisticated receiver is needed to resolve spatial inter-symbol interference and effectively exploit the theoretical advantages. On the other hand, mobile devices must consume little power and remain moderately priced.
The ideal receiver should consider the likelihood of the received vectors for each possible codeword, jointly performing detection and decoding. This has prohibitive complexity, except for simple space-time codes. In practice, detection and decoding are decoupled, and Soft-Input Soft-Output (SISO) detectors are used in conjunction with SISO decoders in iterative schemes [6], to approximate the ideal receiver through disjoint stages, according to the turbo principle [7]. Turbo detectors exploit the information fed back by the channel decoder as a priori information about the transmitted vectors of symbols. If needed, detector and decoder can be simply applied in cascade, as a special case of turbo detection and decoding without iterations. In the former case, the optimal detector to be used is the Maximum-A-Posteriori (MAP) detector, while in the latter case the MAP detector degenerates into the symbol-by-symbol Maximum-Likelihood (ML) detector, since there is no available a-priori information.
Despite these simplifications, the complexity of the optimal MAP and ML detectors still increases exponentially with N_{ t }N_{ b }, where N_{ t } is the number of transmitting antennas and N_{ b } the modulation order [bits/dimension]. This is why many researchers have sought suboptimal detection strategies, trying to approach the ideal detector with limited complexity, e.g., via (Turbo) MMSE detection [8, 9] or sphere detection [10–14]. The former strategy combines "soft decision" subtraction (soft-DFE) and linear minimum mean square error (MMSE) spatial equalization [8, 9, 15]. This method, although computationally affordable, can lead to largely sub-optimal results, especially when very high spectral efficiencies are sought with large N_{ b } and N_{ t } and high rate codes. Conversely, sphere decoding (SD) detectors [16] reduce the MAP and ML complexity by restricting the search to a subset of the whole hyper-symbol constellation. The SD family can be roughly divided into two groups: the depth-first [10] and the breadth-first [11–14, 17–19] SDs. The second family has several advantages, such as fixed latency and lower complexity. However, for a small number of candidates, breadth-first SD leads to a performance degradation. Recently, to improve its behavior in iterative receivers, SD has been combined with MMSE [20]: the linear detector assists the SD, finding a good center of the search sphere and thus improving the performance of the iterative receiver.
One of the most promising proposals is the Layered ORthogonal lattice Detector (LORD) [21, 22], and its iterative version, namely Turbo LORD (T-LORD) [23]. A proposal similar to LORD was published a few years later [24]. LORD detects the ML hyper-symbol, or one close to it, depending on the number of antennas involved. In a similar way, the T-LORD approaches the MAP performance, combining the low-complexity spatial DFE principles of LORD with a simple yet accurate method to handle a priori log-likelihood ratios (LLRs). LORD and T-LORD are particularly suited for hardware, parallel implementation and soft-output bit detection, and perform very well in combination with soft decoders like SOVA [25] or BCJR [26] for convolutional codes, or with an LDPC decoder [27]. Nevertheless, their complexity is still high in case of high cardinality constellations, such as the 64-QAM foreseen, e.g., by the 802.11n standard.
In this article, we propose simplified LORD and T-LORD versions, that keep their vocation for hardware implementation, maintaining deterministic complexity (quadratic in the number of transmitting antennas) and flexibility for setting the desired performance-complexity trade-off. Besides, we show that, when only global latency constraints exist, e.g., a fixed time to detect the whole OFDM symbol, the LORD and T-LORD complexity can be remarkably reduced, still approaching the ML and MAP performance, respectively.
In all cases the performance is very good. We show in particular that LORD and T-LORD can perform very close to the ideal MAP detector for at least up to four antennas and for modulation orders of 3 bits per dimension, even in a realistic setting, with imperfect channel state information (CSI) and correlated channels. Furthermore, we show that our detectors exhibit the same EXIT chart behavior as the MMSE-assisted SD [20], yet with a lower complexity, even more reduced w.r.t. [23]. Recently, a strategy that recalls in principle the idea proposed in this article for OFDM tones processing has been proposed in [28], with a different detector.
The article is organized as follows. Section 2 describes the system model. Section 3 recalls the full-complexity LORD and T-LORD algorithms. Section 4 motivates and describes our low-complexity hardware-oriented LORD and T-LORD proposals. Section 5 shows their performance. Section 6 concludes the article.
2 System model
The received vector at each OFDM tone is modeled as $\mathbf{y}=\mathbf{H}\mathbf{x}+\mathbf{n}$, where $\mathbf{x}\in \mathbb{C}^{N_t}$ contains the transmitted symbols and $\mathbf{n}\in \mathbb{C}^{N_r}$ is an i.i.d. Gaussian complex white noise vector with covariance matrix: R_{ n } = E [nn^{ H }] = N_{0}I. The transmitted M-QAM symbols are uncorrelated, with zero mean and variance $\sigma_X^2=1$, for each transmitting antenna. Therefore, the transmitter Signal-to-Noise-Ratio SNR_{ TX } equals N_{ t }/N_{0}.
The MIMO (frequency selective) channel is represented by $\mathbf{H}\in \mathbb{C}^{N_r\times N_t}$, whose elements h_{ r,t } are the complex (flat) gains of the paths between transmitter t and receiver r, at a certain tone. These elements are normalized, i.e., E [|h_{ r,t }|^{2}] = 1, with t = 1, 2,..., N_{ t } and r = 1, 2,..., N_{ r }. More details on channel assumptions are given in Section 5.
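As an illustration of these definitions, the following minimal sketch draws one channel use of the linear model y = Hx + n with unit-power symbols and unit-power i.i.d. gains. The function name `simulate_tone`, the unnormalized PAM level list, and the 23 dB operating point are illustrative assumptions, not part of the original description.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_tone(n_t, n_r, n0, pam_levels, rng):
    """Draw one channel use of y = Hx + n (hypothetical helper)."""
    # Square QAM built from the PAM levels, normalised to unit average energy.
    grid = np.array([a + 1j * b for a in pam_levels for b in pam_levels])
    grid = grid / np.sqrt(np.mean(np.abs(grid) ** 2))
    x = rng.choice(grid, size=n_t)
    # i.i.d. complex Gaussian gains with E[|h|^2] = 1.
    H = (rng.standard_normal((n_r, n_t))
         + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
    # Complex white noise with E[n n^H] = N0 * I.
    n = np.sqrt(n0 / 2) * (rng.standard_normal(n_r)
                           + 1j * rng.standard_normal(n_r))
    return H @ x + n, H, x

n_t, n_r = 2, 2
snr_tx_db = 23.0                      # assumed operating point
n0 = n_t / 10 ** (snr_tx_db / 10)     # from SNR_TX = N_t / N_0
y, H, x = simulate_tone(n_t, n_r, n0, [-7, -5, -3, -1, 1, 3, 5, 7], rng)
```

The noise power follows directly from the relation SNR_TX = N_t/N_0 stated above; everything else is a plain Monte Carlo draw.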
Here ϕ_{ t }(n) is the a posteriori LLR of b_{ n }(x_{ t }), computed during the detection process. In the next section we focus on this detection stage.
3 Detectors outline
In this section, we give a concise description of LORD and T-LORD, which is indispensable to understand the implementation proposed in the following sections. For more details about LORD and T-LORD and their various versions, we refer the reader to [23]. We also recall the optimal MAP and ML detectors. Rather than practicable solutions, they are a reference for the performance of any other detector. The reader interested in other techniques, such as the MMSE and the sphere detectors, can refer again to [23], where a detailed comparison with the "standard" T-LORD is reported, both in terms of complexity and performance.
3.1 MAP and ML detectors
With no a priori information, i.e., ξ_{ t }(i) = 0, (4) reduces to the ML metric.
The last term in (5) is the (typically very accurate) Max-Log-MAP approximation. As mentioned in the introduction, the number of complex Euclidean distances (EDs) and a priori probabilities to compute, for both MAP and ML as well as for their Max-Log approximations, increases exponentially with N_{ b }N_{ t }.
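To make the exponential growth concrete, here is a minimal exhaustive Max-Log sketch without a priori information. It assumes the metric ‖y − Hx‖²/N_0, a sign convention in which positive LLRs favor bit 1, and an arbitrary QPSK labeling; these choices are illustrative assumptions and may differ from the definitions in (4) and (5).

```python
import itertools
import numpy as np

def maxlog_llrs(y, H, n0, points, labels):
    """Exhaustive Max-Log LLRs; positive values favour bit 1 (assumed convention)."""
    n_t = H.shape[1]
    n_bits = labels.shape[1]
    # best[t, n, b]: smallest metric among hyper-symbols whose bit n on antenna t equals b.
    best = np.full((n_t, n_bits, 2), np.inf)
    # The loop visits len(points) ** n_t hyper-symbols: exponential in n_t.
    for idx in itertools.product(range(len(points)), repeat=n_t):
        x = points[list(idx)]
        metric = np.linalg.norm(y - H @ x) ** 2 / n0
        for t, i in enumerate(idx):
            for n in range(n_bits):
                b = labels[i, n]
                best[t, n, b] = min(best[t, n, b], metric)
    return best[:, :, 0] - best[:, :, 1]

# QPSK with an illustrative (assumed) bit labeling.
points = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
labels = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

rng = np.random.default_rng(0)
H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
tx_idx = [2, 1]
y = H @ points[tx_idx]                # noiseless observation for a sanity check
llr = maxlog_llrs(y, H, 0.1, points, labels)
```

In the noiseless case the LLR signs reproduce the transmitted bits; with 64-QAM and four antennas the same loop would visit 64^4 hyper-symbols, which is what motivates the reduced searches discussed next.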
3.2 LORD detector
without impairing the receiver performance as long as the AWGN is spatially i.i.d., since $\tilde{\mathbf{n}}$ is still Gaussian with $\mathbf{R}_{\tilde{\mathbf{n}}}=\mathbf{Q}^{H}(t)\,\mathbf{R}_{\mathbf{n}}\,\mathbf{Q}(t)=N_0\mathbf{I}$. The evaluation burden of this phase is negligible if the channel can be supposed constant for several consecutive OFDM symbols, thus we focus on the second stage of the algorithm.
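The whiteness of the rotated noise can be checked numerically. The sketch below uses a plain reduced QR factorization from NumPy and does not reproduce the layer permutation or the modified decomposition of the original algorithm; the sizes and the value of N_0 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n_r, n_t, n0 = 4, 3, 0.5              # illustrative sizes and noise power

H = (rng.standard_normal((n_r, n_t))
     + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)

# Reduced QR: Q has orthonormal columns (n_r x n_t), R is upper triangular (n_t x n_t).
Q, R = np.linalg.qr(H)

R_n = n0 * np.eye(n_r)                # covariance of the spatially white noise
R_rot = Q.conj().T @ R_n @ Q          # covariance of the rotated noise Q^H n
```

Since Q has orthonormal columns, `R_rot` equals N_0 times the identity, so the triangular system can be processed layer by layer with unchanged noise statistics.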
3.3 T-LORD detector
Using the terminology introduced in [15] in the scalar case, (15) and (16) are said to obey the a priori and the distance criteria: T-LORD assumes that the symbol with the smallest (4) is either x^{ A }or x^{ D }.
When N_{ t } > 2, the distance to be minimized is a function of two or more variables, hence the solution cannot be found through simple slicing. As in the previous section, one can rely on a DFE process, but there is no guarantee that the chosen symbol is actually the closest, because wrong decisions in intermediate layers can happen.
We denote this sequentially chosen hyper-symbol as $\mathbf{x}^{D,...,D}(t,\bar{x})$, to underline the layer by layer application of the distance criterion, over the upper layers. In the same way, even the a-priori criterion can be applied layer by layer, processing blocks of 2N_{ b } LLRs per layer, letting their signs drive the choice of QAM candidates and subtracting their interference from the upper layers, this time without introducing errors.
is the LLR permutation, coherent with matrix Π(t). The K-best algorithm builds $U_t(\bar{x})$ as follows. At the first step, the partial set $U_t^{N_t}(\bar{x})$ contains only $\bar{x}$. At the k-th step, according to each possible criterion (a priori, distance and possibly flipping), the K-best algorithm expands each candidate in $U_t^{k+1}(\bar{x})$ (note that index k decreases while the algorithm proceeds). Only the K best results, out of this expanded set, are retained in the partial set $U_t^{k}(\bar{x})$, while the other ones are discarded. At the last step, when k = 1, we declare $U_t(\bar{x})=U_t^{1}(\bar{x})$ and the T-LORD search goes on as explained in Section 3.2.
We remark that (18) can be recursively updated, layer by layer, adding the new partial ED and a priori terms to the previous partial metric, saving computing time and power. A very interesting choice, as shown in the next section, is K = 1. This corresponds to retaining at each layer only the best new candidate. This algorithm can be interpreted as a decision feedback equalizer (DFE), driven by the aforementioned criteria.
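The K = 1 case can be sketched as follows, restricted for simplicity to the distance criterion (no a priori LLRs) and to a single root layer taken from a plain QR factorization. The helper names, the unnormalized PAM levels, and the absence of the layer permutation are assumptions made for illustration only.

```python
import numpy as np

def slice_qam(z, levels):
    """Nearest point of a square QAM built from the given PAM levels."""
    re = levels[np.argmin(np.abs(levels - z.real))]
    im = levels[np.argmin(np.abs(levels - z.imag))]
    return re + 1j * im

def dfe_from_root(y_rot, R, x_root, levels):
    """K = 1: fix the root symbol, then decide one symbol per upper layer."""
    n_t = R.shape[0]
    x = np.zeros(n_t, dtype=complex)
    x[-1] = x_root
    for k in range(n_t - 2, -1, -1):                 # index k decreases
        interference = R[k, k + 1:] @ x[k + 1:]      # already decided layers
        x[k] = slice_qam((y_rot[k] - interference) / R[k, k], levels)
    return x

def root_search(y, H, levels):
    """Span every root candidate and keep the hyper-symbol with the smallest ED."""
    Q, R = np.linalg.qr(H)
    y_rot = Q.conj().T @ y
    best_x, best_ed = None, np.inf
    for a in levels:
        for b in levels:
            x = dfe_from_root(y_rot, R, a + 1j * b, levels)
            ed = np.linalg.norm(y_rot - R @ x) ** 2
            if ed < best_ed:
                best_x, best_ed = x, ed
    return best_x, best_ed

levels = np.array([-3.0, -1.0, 1.0, 3.0])            # unnormalised 16-QAM levels
rng = np.random.default_rng(2)
H = (rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))) / np.sqrt(2)
x_tx = np.array([1 - 3j, -3 + 1j, 3 + 3j])
x_hat, ed = root_search(H @ x_tx, H, levels)         # noiseless sanity check
```

Each root candidate costs one pass over the remaining layers, so the number of visited QAM symbols grows linearly with the constellation size for a given root layer, rather than exponentially with the number of antennas.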
One can first compute (12) for all t, then cross data to obtain the improved ${S}_{t}^{\prime}$. This enhancement implies no growth of the number of checked hyper-symbols, but only extra latency and complexity due to additional metric comparisons and selections.
4 Hardware-oriented, low-complexity LORD
This pdf is plotted in Figure 2b (solid blue line), along with other simulated pdfs (dashed red lines), referring to a MIMO system with N_{ t }= N_{ r }= 2 and spatially correlated channel gains. It can be observed that the correlated case is even worse, because the matrix H can easily be ill-conditioned, i.e., with the last diagonal elements of R(t) close to zero.
Even more importantly, the pdfs always exhibit long tails at high standard deviations. This suggests the following interpretation of the SNR behaviour at the root layer: quite often, the noise level is moderate; occasionally it is very high. The square side reduction is expected to be applicable only in the former case, since any approximation in the latter case would likely be harmful. This idea will be made practical in the following section.
4.1 Algorithm description
Aiming at square subsets (see Figure 2a) that include the transmitted symbol with a high probability, square side lengths larger than $\alpha \sqrt{{N}_{0}/2}$ (with α > 1) should be considered. Probably, one could approach the performance of the full complexity LORD with ad-hoc tuning of the parameter α, fundamental to find a good tradeoff between complexity reduction and capability to detect the ML point. Nevertheless, this solution would be sensitive to the parameter α, thus not well suited to any device implementation.
On the contrary, we propose an attractive alternative solution, which does not require any parameter to be tuned once the architecture has been chosen. Indeed, we suggest performing the full-complexity search over the whole QAM constellation for the carriers affected by the worst-case noise (in the following, "worst carriers" for brevity), limiting the search for all the other ones to a square subset with a fixed number of points. This way, we always reserve the more robust (full-complexity) algorithm for the carriers affected by higher noise powers at the root layer. This significantly reduces the probability that a transmitted point falls outside the reduced square subset of Figure 2a. Clearly, the higher the hardware capability of the device, the higher the fraction of carriers that can be fully spanned. The fundamental hypothesis, at the basis of this solution, is the existence of detection time constraints (measured in number of clock cycles) only at the level of the entire OFDM symbol, and not carrier-by-carrier. This hypothesis seems quite reasonable, since devices for typical applications have to conclude the detection of an entire OFDM symbol within a fixed time, say T_{max}.
Let us focus on the QAM symbols transmitted by the same antenna within an OFDM symbol. Define the parallelism P^{lc} as the minimum number of DFE processors (8) able to exhaustively analyze only the worst $N_c^{\text{full}}$ carriers, limiting the search for the rest of the tones within a square subset of cardinality S^{2}, as in Figure 2.
As a special case, when $N_c^{\text{full}}=N_{\text{dc}}$ we obtain the full-complexity LORD parallelism P^{full}, satisfying $T_{\text{elem}}\lceil N_{\text{dc}}M/P^{\text{full}}\rceil \le T_{\text{max}}$.
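One possible way to rank the carriers is sketched below. It assumes that the root-layer noise level of a tone is governed by the magnitude of the last diagonal element of R, so that the worst carriers are those with the smallest such magnitude; this ranking rule and the function name `worst_carriers` are assumptions for illustration, and the actual selection rule of the detector may differ.

```python
import numpy as np

def worst_carriers(H_tones, n_full):
    """Indices of the n_full tones with the weakest root layer (assumed ranking rule)."""
    # |r_NN| of a plain QR per tone; a small value means a noisy root layer.
    gains = np.array([abs(np.linalg.qr(H)[1][-1, -1]) for H in H_tones])
    order = np.argsort(gains)                  # ascending: weakest first
    return np.sort(order[:n_full]), gains

rng = np.random.default_rng(3)
n_dc, n_t = 52, 2                              # 52 data carriers, 2x2 MIMO
H_tones = (rng.standard_normal((n_dc, n_t, n_t))
           + 1j * rng.standard_normal((n_dc, n_t, n_t))) / np.sqrt(2)
worst, gains = worst_carriers(H_tones, n_full=8)
```

The selected tones would then be searched over the whole constellation, while the remaining ones use the reduced square subset.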
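The latency inequality can be turned into a small search for the minimum parallelism. In the sketch below, the full-complexity workload N_dc·M follows the inequality above, whereas the low-complexity workload (full search on the worst carriers plus S² points on the others) and the budget of 320 elementary time slots are assumptions introduced only for illustration.

```python
def min_parallelism(workload, t_elem, t_max):
    """Smallest P such that t_elem * ceil(workload / P) <= t_max."""
    if t_elem > t_max:
        raise ValueError("one elementary operation already exceeds t_max")
    p = 1
    while t_elem * -(-workload // p) > t_max:   # -(-a // b) is an integer ceiling
        p += 1
    return p

n_dc, m = 52, 64            # data carriers and constellation size
t_elem, t_max = 1, 320      # hypothetical budget, in clock cycles

# Full-complexity LORD: every carrier spans the whole constellation.
p_full = min_parallelism(n_dc * m, t_elem, t_max)

# Low-complexity variant (assumed workload model, not taken from the text).
s, n_full = 4, 8
workload_lc = n_full * m + (n_dc - n_full) * s ** 2
p_lc = min_parallelism(workload_lc, t_elem, t_max)
```

With these illustrative numbers the sketch returns 11 processors for the full search and 4 for the reduced one, which shows how relaxing the constraint to the whole OFDM symbol lowers the required parallelism.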
To efficiently compute not only (8) but also (10), we impose further regularity to the hardware supposing that at each clock cycle the device can work only over a row/column of the square.
clock cycles and, with a reasoning analogous to (25), leads to a smaller number of available fully-spanned carriers, as reported in Figure 3 (dashed lines). Curves in case of tessellation have been plotted only up to S = 6, which is the limit case, since with 6 clock cycles (on average) the detection time is exhausted for performing the low-complexity algorithm over each carrier (i.e., $N_c^{\text{full}}=0$). If S > 6, it is not possible to meet the overall OFDM time constraints of our example. Nevertheless, it will be shown in the next section that S = 5 is enough for all cases of practical interest we have tested. This means that the ML performance can be approached with S ⋅ P^{lc}/M = 25/64 ≈ 40% of the original LORD complexity.
4.2 Extension to T-LORD
From the description of LORD and T-LORD outlined in Section 3, it is clear that the LORD algorithm is actually a special case of the T-LORD, when all the a-priori LLRs ξ_{ t }(n) are zero. Indeed, in this case there is no point in applying the a-priori criterion, since any symbol has the same a-priori probability equal to $\frac{1}{M}$. Only the distance criterion makes sense, and its DFE process is actually the same as (8). Finally, having just one meaningful candidate per set $U_t(\bar{x})$, the K-best approach also becomes superfluous. To summarize, the distance criterion in the T-LORD works as the LORD algorithm. For this reason we can generalize the LORD hardware-oriented simplification, presented in the previous section, to the T-LORD.
In case of null a-priori information, $\sigma_A^2(t)=\sigma_X^2$ and the square subset choice is practically the same as the LORD one, since $\sigma_X^2=1$ is typically much greater than $\sigma_D^2(t)$. Conversely, when the a-priori information is high, the received symbol is ignored in the computation of $\widehat{x}(t)$, since $\sigma_A^2(t)$ is small. Finally we remark that (29) and (30) can be efficiently computed with techniques similar to [15].
4.3 Related issues
Another problem is how to efficiently find the square subset of Figure 2a. An efficient solution is to apply, for each dimension, the Euchner-Schnorr "zig-zag" algorithm [29], which determines the symbol closest to the received one or to the estimated $\widehat{x}(t)$, and then alternately adds points on its left and right until the boundary of the constellation or the square subset size is reached.
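A minimal sketch of this per-dimension zig-zag enumeration follows. The function names and the unnormalized 8-PAM levels (one dimension of 64-QAM) are illustrative assumptions; the square subset is obtained as the Cartesian product of the two one-dimensional lists.

```python
def zigzag(z, levels, s):
    """Up to s PAM levels around z: nearest first, then alternating sides."""
    levels = sorted(levels)
    c = min(range(len(levels)), key=lambda i: abs(levels[i] - z))
    out = [levels[c]]
    lo, hi = c - 1, c + 1
    go_right = z >= levels[c]          # start on the side where z lies
    while len(out) < min(s, len(levels)):
        if go_right and hi < len(levels):
            out.append(levels[hi])
            hi += 1
        elif not go_right and lo >= 0:
            out.append(levels[lo])
            lo -= 1
        # At a constellation boundary nothing is added on that side; just alternate.
        go_right = not go_right
    return out

def square_subset(z, levels, s):
    """S x S square of QAM points around the complex value z."""
    return [a + 1j * b
            for a in zigzag(z.real, levels, s)
            for b in zigzag(z.imag, levels, s)]

pam8 = [-7, -5, -3, -1, 1, 3, 5, 7]    # one dimension of unnormalised 64-QAM
inner = zigzag(0.6, pam8, 4)           # centre well inside the constellation
edge = zigzag(6.5, pam8, 3)            # centre close to the boundary
subset = square_subset(0.6 - 2.2j, pam8, 4)
```

For a center at 0.6 the visiting order is 1, −1, 3, −3, and near the boundary (center 6.5) the enumeration continues on the only available side, giving 7, 5, 3.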
4.4 LORD and T-LORD complexity
EDs, while the MMSE essentially requires 2 points per coded bit (we can exploit the Gray mapping regularity as in [30], "folding" the constellation), i.e., 2N_{ t }log_{2} M on the whole. Though the computation of the number of EDs is only a preliminary tool to evaluate complexity, it reveals that the ML cost is exponential in the number of antennas, practically unaffordable even for small arrays, while the LORD complexity is only quadratic. Analogous considerations apply to iterative detectors.
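As a rough worked check of the orders of magnitude for N_{ t } = 4 and M = 64, taking the exhaustive ML count as one ED per hyper-symbol (the standard count, assumed here) and the MMSE count from the expression above:

$$
M^{N_t} = 64^{4} = 16\,777\,216, \qquad 2N_t\log_2 M = 2\cdot 4\cdot 6 = 48 .
$$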
Table 1 Number of real operations per tone and iteration (N_{ t } = 4, M = 64)
Detector | Symbols | Multiplications (fixed) | Multiplications (variable) | Additions & comparisons (fixed) | Additions & comparisons (variable) |
---|---|---|---|---|---|
SIC-MMSE | 56 | 672 | 372 | 536 | 2077 |
Sphere detector | 6976 | 744 | 4414 | 583 | 19456 |
LC K-best T-LORD | 842 | 848 | 616 | 971 | 22278 |
K-best T-LORD | 2304 | 832 | 1082 | 571 | 28919 |
T-LORD Full | 9984 | 832 | 7994 | 571 | 83495 |
Focusing on the low-complexity K-best T-LORD, we have chosen a reduced square subset of side S = 4, the smallest available parallelism P^{lc} = 4, and a full search over $N_c^{\text{full}}=8$ carriers. The square subset center is driven both by the received symbols and the a-priori LLRs. This solution is very attractive, since the loss w.r.t. the full-complexity T-LORD vanishes after some iterations, as shown in the next section. Furthermore, since the square subset is 4 × 4, the same hardware could also be used to detect a 16-QAM constellation, with negligible incremental costs. As we can see in Table 1, the low-complexity K-best T-LORD saves fewer multiplications and additions than one would expect from the number of computed EDs. Indeed, there are additional operations to perform, e.g., (29)-(31) and the identification of the $N_c^{\text{full}}$ worst carriers. The crossing between candidate sets $\mathcal{S}_t$ also limits the complexity reduction, since it must be performed within the entire constellation, and not only among points of the square subset. In any case, the simplified T-LORD greatly benefits from the reduction of the spanned symbols, and almost halves the number of required variable multiplications. The number of variable additions is also slightly smaller (the a-priori criterion remains almost unchanged).
To conclude, we remark that not only the device area (i.e., the number of required logic gates) benefits from the simplification proposed in this paper, but also the power consumption, as reported in [31]. E.g., in case of LORD with N_{ t }= N_{ r }= 2 and M = 64, assuming a 65 nm CMOS technology with an 80 MHz clock frequency, the area is reduced from 0.64 mm^{2} to 0.21 mm^{2} and the power from 38 to 14 mW, respectively. Therein, a comparison with prior designs can be found, too.^{d}
5 Simulation results
In this section, we provide performance results, both in terms of extrinsic information delivered by several detectors and Monte Carlo simulation of the receivers embedding them.
We assume two different environments, referred to in the following as the "ideal" and the "real" one. First, we use a rich scattering channel, whose coefficients h_{r,t} are i.i.d. complex Gaussian values with unit power. Perfect CSI at the receiver side is also assumed. Then we consider a more realistic channel, with an exponentially decaying power delay profile (PDP) and a short time delay spread τ_{rms} = 50 ns (equal to the sampling time). The spatial correlation is assumed equal to $r(t_1,t_2)=0.5^{|t_1-t_2|}$, where t_{1} and t_{2} are the antenna indexes, sorted in ascending order from one end of the linear array. The perfect CSI hypothesis is abandoned and replaced by pilot-aided tone-by-tone channel estimation (CE): owing to the averaging of subsequent orthogonal long training sequences (LTS), as in [32], each channel tap estimate $\hat{h}_{r,t}$ is affected by i.i.d. Gaussian noise with power $\sigma_{CE}^{2}=\frac{N_0}{2^{\lceil \log_2 N_t\rceil}}$. Neither time nor frequency smoothing, such as [32, 33], is adopted, since this would have reduced the difference between the ideal and real settings.
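The two quantities defined for the realistic setting can be computed as in the sketch below. The correlation matrix and the estimation noise power follow the expressions above; the last helper, which colors i.i.d. gains on the transmit side through a Cholesky factor, is only one possible way to impose the correlation and is an assumption, as are all function names.

```python
import math
import numpy as np

def spatial_correlation(n_ant, rho=0.5):
    """Matrix with entries rho ** |t1 - t2|."""
    idx = np.arange(n_ant)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def ce_noise_power(n0, n_t):
    """Channel estimation noise power N0 / 2 ** ceil(log2(N_t))."""
    return n0 / 2 ** math.ceil(math.log2(n_t))

def correlated_gains(n_r, n_t, rng, rho=0.5):
    """Transmit-side coloring of i.i.d. gains (assumed correlation model)."""
    L = np.linalg.cholesky(spatial_correlation(n_t, rho))
    Hw = (rng.standard_normal((n_r, n_t))
          + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
    return Hw @ L.T                    # each row has covariance L L^T

rng = np.random.default_rng(4)
R3 = spatial_correlation(3)
sigma2_ce = {n_t: ce_noise_power(1.0, n_t) for n_t in (2, 3, 4)}
H = correlated_gains(2, 3, rng)
```

For N_0 = 1 the estimation noise power is 0.5 with two antennas and 0.25 with three or four, reflecting the averaging over a power-of-two number of training sequences.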
5.1 EXIT charts
In Figure 6, we plot (both in ideal and realistic channel conditions) the EXIT curve of the hardware oriented T-LORD developed so far and, as references, the SIC-MMSE [9] and the MAP.
We choose SNR = 23 dB, corresponding to a target packet error rate (PER) close to 10^{-2}. The T-LORD algorithm produces at the output almost the same information as the MAP detector, for any input information I_{ A }. On the contrary, SIC-MMSE is largely suboptimal and is expected to introduce severe losses in an iterative receiver. Monte Carlo simulations will confirm these predictions.
An interesting choice is S = P^{lc} = 4 with the update enabled (denoted throughout figures as "moving square"), exhibiting an abrupt change of the delivered information, as the a-priori information gets large. When no a-priori information is available, the reduced square positioning is based only on the received, noisy symbol, therefore when the transmitted signal lies outside the reduced square, some extrinsic information is lost. Conversely, when the square positioning can benefit from a priori information, a relevant fraction of extrinsic information is recovered. This closes the gap iteration by iteration, as confirmed by simulation in the following. Similar results hold for the realistic channel.
5.2 Monte Carlo simulations
In this section we provide simulation results for the above described low-complexity LORD detectors, with different square sides and parallelisms. For comparison, we plot also the MAP, MMSE [9] and full-complexity T-LORD [23] curves. Simulations are floating-point. Iterations range from 1 to 4. Iteration 1 means no extrinsic information is available to the detector, i.e., LORD and T-LORD coincide, as well as MAP and ML. Aiming to achieve very high spectral efficiencies, up to 15 bit/carrier, and to test the simplified T-LORD in challenging conditions, we always consider a channel code rate R_{ c }= 5/6, i.e., the most sensitive 64-QAM mode in the 802.11n standard [1].
As we can see, the T-LORD performance is close to the MAP detector, and largely outperforms the MMSE receiver, plagued by ill-conditioned channel matrices. Only the T-LORD with S = P^{lc} = 4 and fixed subset choice at the root layer has a modest loss, say 0.2 dB more than the full-complexity T-LORD.
equal to 3 dB when N_{ t } = 2 or N_{ t } = 4, and slightly smaller (2.4 dB) when N_{ t } = 3, since the 802.11n preamble contains one more LTS than the number of transmitting antennas. The remaining 3 dB loss, that one would experience even in case of ideal CE, is due to the severe channel described in Section 2, with an exponentially decaying PDP, short time delay spread and spatial correlation. In this challenging case with less spatial diversity, the MMSE receiver completely fails to improve with iterations, while the full-complexity K-best T-LORD misses the MAP performance by only 0.5 dB in the realistic case. This gap is probably due to error propagation in the DFE process.
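The quoted figures are consistent with a simple model, stated here as an assumption rather than as the authors' derivation: if the estimation error on each of the N_{ t } channel columns acts as extra white noise, the effective noise power becomes $N_0+N_t\sigma_{CE}^{2}$ with $\sigma_{CE}^{2}=N_0/2^{\lceil \log_2 N_t\rceil}$, and the SNR loss is

$$
\Delta = 10\log_{10}\left(1+\frac{N_t}{2^{\lceil \log_2 N_t\rceil}}\right)\ \text{dB}
= \begin{cases} 10\log_{10} 2 \approx 3.0\ \text{dB}, & N_t = 2 \text{ or } N_t = 4,\\ 10\log_{10} 1.75 \approx 2.4\ \text{dB}, & N_t = 3. \end{cases}
$$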
Focusing now on LORD detectors, S = P^{lc} = 5 is enough to approach the optimal ML detector performance as the full-complexity LORD does, while in case of S = P^{lc} = 4 LORD suffers some performance degradation. This can be explained by a higher probability that noise overcomes the square subset borders or the soft output generation misses some EDs in (10) or (14). Thus, the former parameters have been chosen for the HDL implementation of LORD [31]. Conversely, focusing on the fourth iteration, the above gap is almost closed by the S = P^{lc} = 4 T-LORD with the square subset center update (as predicted by EXIT charts), performing even better than the case S = P^{lc} = 5, with fixed subset.
To conclude, we refer readers interested in a performance comparison with sphere detectors to [23]. Results show that T-LORD achieves better performance while computing fewer EDs, in every simulated case.
6 Conclusions
In this article, we have proposed innovative hardware-oriented, soft-output LORD and T-LORD algorithms, that can heavily reduce the number of parallel elementary processing units, required to meet the latency constraints in MIMO-OFDM systems, when high-cardinality QAM constellations are deployed. The simplified versions preserve the features of the original algorithms, i.e., fixed complexity, deterministic latency and a remarkable parallel structure. The proposed solution is regular, scalable and does not require any ad hoc parameter tuning, e.g., depending on the experienced average SNR or the actual channel realization.
Besides, the loss in 802.11n systems w.r.t. the ML and MAP detectors is very small (few tenths of dB). We tried several configurations up to 20 bit/carrier (N_{ t }= 4), corresponding to a system throughput of 260 Mb/s, if we consider the 802.11n standard. We also tested the system with very noisy channel estimates, as well as a more realistic channel offering less spatial and frequency diversity, due to correlation. In each case, the simplified LORD and T-LORD showed comforting robustness, outperforming the non-iterative ML and the iterative SIC-MMSE receiver, and always approaching the receiver with the ideal detector. These features make LORD and T-LORD good candidates for VLSI MIMO receivers.
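The quoted spectral efficiency and throughput can be checked with a short calculation, assuming 64-QAM (6 bits per symbol), code rate R_{ c } = 5/6, and the 52 data carriers and 4 μs OFDM symbol duration mentioned in the endnotes:

$$
N_t \log_2 M \cdot R_c = 4\cdot 6\cdot \tfrac{5}{6} = 20\ \text{bit/carrier}, \qquad
\frac{52 \cdot 20\ \text{bit}}{4\ \mu\text{s}} = 260\ \text{Mb/s}.
$$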
Endnotes
^{a}For brevity, we give a simplified description of the LORD algorithm. For more details, refer to [21] and [22], where a real-domain modified QR decomposition avoids the normalization of the columns of Q. Nevertheless, the low-complexity, hardware-oriented LORD and T-LORD presented in this article can also be applied to that framework, as shown in [31].
^{b}Obviously this value depends on the hardware, but it is reasonable for an FPGA device (with a clock frequency of tens of MHz) aiming to process in real time an OFDM symbol lasting 4 μs and carrying 52 data carriers, as in [1].
^{c}Note that the measure does not refer to the number of N_{ t }-dimensional hyper-symbols, but to the number of spanned QAM symbols throughout the algorithm.
^{d}A fair comparison of area and power consumption is hard to achieve, since many parameters change from one design to the other (e.g., clock frequency, CMOS technology, modulations, antennas, soft-output generation). Nevertheless, in [31] it is shown that LORD provides a very good trade-off in any case.
^{e}E.g., the standardization group for 802.11n chose PER = 10^{-2} for performance comparison purposes.
^{f}A direct performance comparison between systems with a different number of antennas is hard to achieve. The SNR letting the system meet the target PER changes when the number of antennas gets large and its trend is hard to foresee, for at least two reasons. On the one hand, we exploit the capacity growth to increase the throughput, not to strengthen the communication. On the other hand, a larger number of antennas makes the data packet shorter, since more information is conveyed at each channel use: this reduces the PER for a given SNR.
Declarations
Acknowledgements
We warmly thank Eng. Teo Cupaiuolo for the VLSI design of LORD.
References
- Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specification-Amendment 5: Enhancements for Higher Throughput 2009.Google Scholar
- WiMAX Forum Mobile System Profile Release 1.0 Approved Specification 2005.Google Scholar
- Evolved universal terrestrial radio access (EUTRA) and evolved universal terrestrial radio access network (EU-TRAN); overall description 2009.Google Scholar
- Telatar E: Capacity of multi-antenna gaussian channels. Eur Trans Telcommun 1999, 10(6):585-595. 10.1002/ett.4460100604View ArticleGoogle Scholar
- Foschini GJ, Gans M: On limits of wireless communications in a fading environment when using multiple antennas. Wirl Pers Commun 1998, 6(3):311-335. 10.1023/A:1008889222784View ArticleGoogle Scholar
- ten Brink S: Convergence of iterative decoding. Electron Lett 1999, 35(10):806-808. 10.1049/el:19990555View ArticleGoogle Scholar
- Berrou C, Glavieux A: Near optimum error correcting coding and decoding: turbo-codes. IEEE Trans Commun 1996, 44(10):1261-1271. 10.1109/26.539767View ArticleGoogle Scholar
- Tuchler M, Singer AC, Koetter R: Minimum mean squared error equalization using a priori information. IEEE Trans Acous Speech Signal Process 2002, 50(3):673-683.View ArticleGoogle Scholar
- Zuyderhoff D, Wautelet X, Dejonghe A, Vandendorpe L: MMSE turbo receiver for space-frequency bit-interleaved coded OFDM,". In Proc IEEE Vehicular Tech Conf. Orlando, FL; 2003:567-571.Google Scholar
- Viterbo E, Boutros J: A universal lattice code decoder for fading channels. IEEE Trans Inf Theory 1999, 45(5):1639-1642. doi:10.1109/18.771234
- Wong K, Tsui C, Cheng RSK, Mow M: A VLSI architecture of a k-best lattice decoding algorithm for MIMO channels. In Proc IEEE Int Symp on Circuits and Systems. Scottsdale, AZ; 2002:273-276.
- Baro S, Hagenauer J, Witzke M: Iterative detection of MIMO transmission using a list-sequential (LISS) detector. In Proc IEEE Int Conf Communications. Anchorage, AK; 2003:2653-2657.
- Guo Z, Nilsson P: Algorithm and implementation of the k-best sphere decoding for MIMO detection. IEEE J Sel Areas Commun 2006, 24(3):491-503.
- Radosavljevic P, Cavallaro JR: Soft sphere detection with bounded search for high-throughput MIMO receivers. In Proc Asilomar Conf Signals, Systems, and Computers. Pacific Grove, CA; 2006:1175-1179.
- Tomasoni A, Ferrari M, Gatti D, Osnato F, Bellini S: A low complexity turbo MMSE receiver for W-LAN MIMO systems. In Proc IEEE Int Conf Communications. Istanbul, Turkey; 2006:4119-4124.
- Hochwald BM, ten Brink S: Achieving near-capacity on a multiple-antenna channel. IEEE Trans Commun 2003, 51(3):389-399. doi:10.1109/TCOMM.2003.809789
- Li Q, Wang Z: Improved K-best sphere decoding algorithms for MIMO systems. In Proc IEEE Int Symp on Circuits and Systems. Kos, Greece; 2006:1159-1162.
- Barbero LG, Ratnarajah T, Cowan C: A low-complexity soft-MIMO detector based on the fixed-complexity sphere decoder. In Proc IEEE Int Conf on Acoustics, Speech and Signal Processing. Las Vegas, NV; 2008:2669-2672.
- Wu Y, Liu YT, Liao Y, Chang H: Early-pruned k-best sphere decoding algorithm based on radius constraints. In Proc IEEE Int Conf Communications. Beijing, China; 2008:4496-4500.
- Wang L, Xu L, Chen S, Hanzo L: Generic iterative search-centre-shifting k-best sphere detection for rank-deficient SDM-OFDM systems. Electron Lett 2008, 44(8):552-553. doi:10.1049/el:20083279
- Siti M, Fitz MP: Layered orthogonal lattice detector for two transmit antenna communications. In Proc Allerton Conference on Communication, Control, and Computing. Monticello, IL; 2005:287-296.
- Siti M, Fitz MP: A novel soft-output layered orthogonal lattice detector for multiple antenna communications. In Proc IEEE Int Conf Communications. Istanbul, Turkey; 2006:1686-1691.
- Tomasoni A, Siti M, Ferrari M, Bellini S: Low complexity, quasi-optimal MIMO detectors for iterative receivers. IEEE Trans Wirel Commun 2010, 9(10):3166-3177.
- Ahn CJ: Parallel detection algorithm using multiple QR decompositions with permuted channel matrix for SDM/OFDM. IEEE Trans Veh Technol 2008, 57(4):2578-2582.
- Hagenauer J, Hoeher P: A Viterbi algorithm with soft-decision outputs and its applications. In Proc IEEE Global Telecommunications Conf. Dallas, TX; 1989:1680-1686.
- Bahl L, Cocke J, Jelinek F, Raviv J: Optimal decoding of linear codes for minimizing symbol error rate. IEEE Trans Inf Theory 1974, 20(2):284-287.
- ten Brink S, Kramer G, Ashikhmin A: Design of low-density parity-check codes for modulation and detection. IEEE Trans Commun 2004, 52(4):670-678. doi:10.1109/TCOMM.2004.826370
- Baek MS, You Y, Song HK: Combined QRD-M and DFE detection technique for simple and efficient signal detection in MIMO-OFDM systems. IEEE Trans Wirel Commun 2009, 8(4):1632-1638.
- Agrell E, Eriksson T, Vardy A, Zeger K: Closest point search in lattices. IEEE Trans Inf Theory 2002, 48(8):2201-2214. doi:10.1109/TIT.2002.800499
- Tosato F, Bisaglia P: Simplified soft-output demapper for binary interleaved COFDM with application to HIPERLAN/2. In Proc IEEE Int Conf Communications. New York City, NY; 2002:664-668.
- Cupaiuolo T, Siti M, Tomasoni A: Low-complexity and high throughput VLSI architecture of soft-output ML MIMO detector. In Proc Design, Automation & Test in Europe. Dresden, Germany; 2010:1396-1401.
- Tomasoni A, Gallizio E, Bellini S: Low complexity and low latency training assisted channel estimation for MIMO-OFDM systems. In Proc IEEE Personal Indoor and Mobile Radio Conf. Athens, Greece; 2007:1-5.
- Li YG, Seshadri N, Ariyavisitakul S: Channel estimation for OFDM systems with transmitter diversity in mobile wireless channels. IEEE J Sel Areas Commun 1999, 17(3):461-471. doi:10.1109/49.753731
- Tüchler M: Convergence prediction for iterative decoding of threefold concatenated systems. In Proc IEEE Global Telecommunications Conf. Taipei, Taiwan; 2002:1358-1362.
- Lechner G, Sayir J, Land I: Optimization of LDPC codes for receiver frontends. In Proc IEEE Int Symp Inform Theory. Seattle, WA; 2006:2388-2392.
- ten Brink S: Convergence behavior of iteratively decoded parallel concatenated codes. IEEE Trans Commun 2001, 49(10):1727-1737. doi:10.1109/26.957394
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.