Hardware oriented, quasi-optimal detectors for iterative and non-iterative MIMO receivers

Tomasoni, Alessandro; Siti, Massimiliano; Ferrari, Marco; Bellini, Sandro

doi:10.1186/1687-1499-2012-62

Research
Open access
Published: 24 February 2012

EURASIP Journal on Wireless Communications and Networking volume 2012, Article number: 62 (2012)

Abstract

In this article we study hardware-oriented versions of the recently proposed Layered ORthogonal lattice Detector (LORD) and Turbo LORD (T-LORD). LORD and T-LORD are attractive Multiple-Input Multiple-Output (MIMO) detection algorithms that aim to approach the optimal Maximum-Likelihood (ML) and Maximum-A-Posteriori (MAP) performance, respectively, with a complexity that is quadratic, rather than exponential, in the number of transmitting antennas. LORD and T-LORD are also well suited to hardware (e.g., ASIC or FPGA) implementation because of their regularity, parallelism, and deterministic latency and complexity. Nevertheless, their complexity is still high for high-cardinality constellations, such as the 64-QAM specified by the 802.11n standard. We show that, when only global latency constraints exist, e.g., a fixed time to detect the whole OFDM symbol, the LORD and T-LORD complexity can be remarkably reduced while still approaching the ML and MAP performance, respectively. Notwithstanding the suboptimal, low-complexity, hardware-oriented implementation, LORD and T-LORD approach the EXtrinsic Information Transfer (EXIT) behavior of the ML and MAP detectors, respectively. To focus on a specific setting, we consider the indoor MIMO wireless LAN 802.11n standard, taking into account channel estimation errors and a frequency-selective, spatially correlated channel model.

1 Introduction

Because of the increasing demand for data rate and link robustness in wireless transmissions, Multiple-Input Multiple-Output (MIMO) technologies are nowadays an indispensable option in wireless communications standards recently released or under definition, such as IEEE 802.11n [1], WiMax [2], and mobile long term evolution (LTE) [3]. In fact, when spatial diversity is available, the capacity of the wireless link grows linearly with the smaller of the numbers of transmitting and receiving antennas [4, 5]. In practice, MIMO is often combined with space-frequency bit interleaved coded modulation (BICM) and orthogonal frequency-division multiplexing (OFDM) [1, 2], which ensure that different tones within an OFDM symbol experience almost uncorrelated channels.

To increase the spectral efficiency of the link, the transmitting antennas can be used in layered mode, i.e., each antenna transmits a different symbol in the same bandwidth at the same time. On the one hand, a sophisticated receiver is needed to resolve the spatial inter-symbol interference and effectively exploit the theoretical advantages. On the other hand, mobile devices must have low power consumption and moderate cost.

The ideal receiver should evaluate the likelihood of the received vectors for each possible codeword, performing detection and decoding jointly. This has prohibitive complexity, except for simple space-time codes. In practice, detection and decoding are decoupled, and Soft-Input Soft-Output (SISO) detectors are used in conjunction with SISO decoders in iterative schemes [6], which approximate the ideal receiver through disjoint stages according to the turbo principle [7]. Turbo detectors exploit the information fed back by the channel decoder as a priori information about the transmitted vectors of symbols. Alternatively, detector and decoder can simply be applied in cascade, as a special case of turbo detection and decoding without iterations. In the former case, the optimal detector is the Maximum-A-Posteriori (MAP) detector, while in the latter case the MAP detector degenerates into the symbol-by-symbol Maximum-Likelihood (ML) detector, since no a priori information is available.

Despite these simplifications, the complexity of the optimal MAP and ML detectors still increases exponentially with N_t N_b, where N_t is the number of transmitting antennas and N_b the modulation order [bits/dimension]. This is why many researchers have sought suboptimal detection strategies that approach the ideal detector with limited complexity, e.g., (Turbo) MMSE detection [8, 9] or sphere detection [10–14]. The former strategy combines "soft decision" subtraction (soft-DFE) and linear minimum mean square error (MMSE) spatial equalization [8, 9, 15]. This method, although computationally affordable, can be largely suboptimal, especially when very high spectral efficiencies are sought with large N_b and N_t and high-rate codes. Conversely, sphere decoding (SD) detectors [16] reduce the MAP and ML complexity by restricting the search to a subset of the whole hyper-symbol constellation. The SD family can be roughly divided into two groups: depth-first [10] and breadth-first [11–14, 17–19] SDs. The latter group has several advantages, such as fixed latency and lower complexity. However, with a small number of candidates, breadth-first SD suffers a performance degradation. Recently, to improve its behavior in iterative receivers, SD has been combined with MMSE [20]: the linear detector assists the SD by finding a good center for the search sphere, thus improving the performance of the iterative receiver.

One of the most promising proposals is the Layered ORthogonal lattice Detector (LORD) [21, 22], together with its iterative version, Turbo LORD (T-LORD) [23]. A proposal similar to LORD was published a few years later [24]. LORD detects the ML hyper-symbol, or a close one, depending on the number of antennas involved. Similarly, T-LORD approaches the MAP performance by combining the low-complexity spatial DFE principles of LORD with a simple yet accurate method to handle a priori log-likelihood ratios (LLRs). LORD and T-LORD are particularly suited to parallel hardware implementation and soft-output bit detection, and perform very well in combination with soft decoders such as SOVA [25] or BCJR [26] for convolutional codes, or with an LDPC decoder [27]. Nevertheless, their complexity is still high for high-cardinality constellations, such as the 64-QAM specified, e.g., by the 802.11n standard.

In this article, we propose simplified LORD and T-LORD versions that retain their suitability for hardware implementation, keeping a deterministic complexity (quadratic in the number of transmitting antennas) and the flexibility to set the desired performance-complexity trade-off. Moreover, we show that, when only global latency constraints exist, e.g., a fixed time to detect the whole OFDM symbol, the LORD and T-LORD complexity can be remarkably reduced while still approaching the ML and MAP performance, respectively.

In all the cases we consider, the performance is very good. In particular, we show that LORD and T-LORD can perform very close to the ideal MAP detector for up to at least four antennas and modulation orders of 3 bits per dimension (64-QAM), even in a realistic setting with imperfect channel state information (CSI) and correlated channels. Furthermore, we show that our detectors exhibit the same EXIT chart behavior as the MMSE-assisted SD [20], with a lower complexity, further reduced with respect to [23]. A strategy for processing OFDM tones that recalls, in principle, the idea proposed in this article has recently been proposed in [28], with a different detector.

The article is organized as follows. Section 2 describes the system model. Section 3 recalls the full-complexity LORD and T-LORD algorithms. Section 4 motivates and describes our low-complexity hardware-oriented LORD and T-LORD proposals. Section 5 shows their performance. Section 6 concludes the article.

2 System model

Consider a MIMO communication system with N_t antennas at the transmitter side and N_r ≥ N_t at the receiver side. To focus on a practical application, we adopt many of the parameters of the 802.11n standard [1]. At the transmitter, each Wireless LAN packet carrying 390 data bytes is encoded by a 64-state binary convolutional encoder, space-frequency interleaved, and Gray-mapped onto an M-QAM constellation. For each spatial stream, an OFDM modulator splits the overall frequency band (20 MHz) into N_f = 64 sub-bands (tones), of which N_dc = 52 are data carriers. The block diagram in Figure 1 summarizes the main operations applied to a packet when N_t = 2 and N_r = 2. The extension to more than two antennas is straightforward.

The OFDM format allows each tone to be considered separately. Therefore, the dependency on the carrier index is omitted in the rest of the article, and all the equations refer to a single OFDM tone. The received signal $y \in ℂ^{N_{r}}$ reads

y = H x + n

(1)

where $x \in ℂ^{N_{t}}$ contains the transmitted symbols and $n \in ℂ^{N_{r}}$ is an i.i.d. complex Gaussian white noise vector with covariance matrix $R_{n} = E[n n^{H}] = N_{0} I$. The transmitted M-QAM symbols are uncorrelated, with zero mean and variance $σ_{X}^{2} = 1$ for each transmitting antenna. Therefore, the transmitter Signal-to-Noise Ratio SNR_TX equals N_t/N₀.

At a given tone, the MIMO (frequency selective) channel is represented by $H \in ℂ^{N_{r} \times N_{t}}$, whose element $h_{r,t}$ is the complex (flat) gain of the path between transmitter t and receiver r. These elements are normalized, i.e., $E[|h_{r,t}|^{2}] = 1$, with t = 1, 2, ..., N_t and r = 1, 2, ..., N_r. More details on the channel assumptions are given in Section 5.

At the receiver (see Figure 1), after OFDM demodulation, the received vector y is passed to the detector, which computes the LLRs λ_t(n) of the coded bits, with t = 1, 2, ..., N_t and n = 1, 2, ..., 2N_b, where N_b = log₂(M)/2 is the number of bits per dimension. The soft values are de-interleaved and passed to a decoder. In case of serially concatenated detection and decoding, the decoder, e.g., a Viterbi decoder, outputs hard bits. Conversely, in case of turbo detection and decoding, the SISO decoder, such as the BCJR [26] or the SOVA [25], also outputs the extrinsic information used (after interleaving) as the a priori LLR ξ_t(n) at the detector:

ξ_{t} (n) = ln \frac{P (b_{n} (x_{t}) = 0)}{P (b_{n} (x_{t}) = 1)}

(2)

where b_n(x_t) ∈ {0,1} is the n-th bit of symbol x_t. Taking advantage of the soft information fed back by the decoder, at each iteration the detector produces more reliable extrinsic information to be passed to the decoder itself:

λ_{t} (n) = ϕ_{t} (n) - ξ_{t} (n) .

(3)

Here ϕ_t(n) is the a posteriori LLR of b_n(x_t), computed during the detection process. In the next section we focus on this detection stage.

3 Detectors outline

In this section, we give a brief description of LORD and T-LORD, needed to understand the implementation proposed in the following sections. For more details about LORD and T-LORD and their various versions, we refer the reader to [23]. We also recall the optimal MAP and ML detectors: rather than practical solutions, they serve as a performance reference for any other detector. The reader interested in other techniques, such as MMSE and sphere detection, can again refer to [23], where a detailed comparison with the "standard" T-LORD is reported, in terms of both complexity and performance.

3.1 MAP and ML detectors

A MAP detector accepts the received vector y and the a priori information, coming from the decoder, and evaluates the probability of all possible transmitted vectors x. These two contributions can be easily identified in the following metric:

φ (x) = - \frac{{∥y - H x∥}^{2}}{N_{0}} + \sum_{t = 1}^{N_{t}} \sum_{i = 1}^{2 N_{b}} b_{i} (x_{t}) ξ_{t} (i)

(4)

With no a priori information, i.e., ξ_t(i) = 0, (4) reduces to the ML metric.

Equation (4) is the basis for the computation of the a posteriori LLRs:

ϕ_{t} (n) = \ln \frac{\sum_{x : b_{n} (x_{t}) = 0} \exp φ (x)}{\sum_{x : b_{n} (x_{t}) = 1} \exp φ (x)} \approx \max_{x : b_{n} (x_{t}) = 0} φ (x) - \max_{x : b_{n} (x_{t}) = 1} φ (x)

(5)

The last term in (5) is the (typically very accurate) Max-Log-MAP approximation. As mentioned in the introduction, the number of complex Euclidean distances (EDs) and a priori probabilities to compute, for both MAP and ML as well as for their Max-Log approximations, increases exponentially with N_b N_t, since there are $M^{N_{t}} = 2^{2 N_{b} N_{t}}$ candidate hyper-symbols.
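To make the exponential growth concrete, the following sketch computes the exhaustive Max-Log ML LLRs of (5), i.e., (4) with ξ_t(i) = 0, by enumerating all M^{N_t} hyper-symbols. It is a minimal illustration, assuming a Gray-mapped QPSK alphabet (N_b = 1) and the sign convention ln P(b = 0)/P(b = 1) of (2); the names `QPSK` and `maxlog_ml_llrs` are hypothetical.

```python
import itertools
import math

# Illustrative Gray-mapped QPSK (N_b = 1): bit 0 sets the real sign, bit 1 the imaginary sign.
QPSK = {(b0, b1): complex(1 - 2 * b0, 1 - 2 * b1) / math.sqrt(2)
        for b0 in (0, 1) for b1 in (0, 1)}


def maxlog_ml_llrs(y, H, N0):
    """Exhaustive Max-Log ML LLRs: max_{b=0} phi(x) - max_{b=1} phi(x), with xi = 0 in (4).

    y: list of N_r complex samples; H: N_r x N_t complex matrix (list of rows).
    Returns (llrs[t][n], number of hyper-symbols visited)."""
    Nr, Nt = len(H), len(H[0])
    best = {}      # (antenna t, bit n, bit value) -> largest metric seen so far
    visited = 0
    for labels in itertools.product(QPSK, repeat=Nt):    # all M**N_t hypotheses
        visited += 1
        x = [QPSK[lab] for lab in labels]
        ed = sum(abs(y[r] - sum(H[r][t] * x[t] for t in range(Nt))) ** 2
                 for r in range(Nr))
        phi = -ed / N0                                    # ML metric: (4) with xi = 0
        for t, lab in enumerate(labels):
            for n, bit in enumerate(lab):
                key = (t, n, bit)
                best[key] = max(best.get(key, -math.inf), phi)
    llrs = [[best[(t, n, 0)] - best[(t, n, 1)] for n in range(2)] for t in range(Nt)]
    return llrs, visited
```

The loop visits 4^{N_t} hypotheses here; with 64-QAM and N_t = 4 the same enumeration would visit 64^4 = 16,777,216 hyper-symbols per tone, which is why the exhaustive detector serves only as a reference.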

3.2 LORD detector

The LORD algorithm is composed of two stages.^a The first is a pre-processing step, common to several received vectors y as long as the channel can be assumed constant over several OFDM symbols. It consists of N_t QR factorizations of the channel matrix H, with permuted column orders:

Q (t) R (t) = H Π (t)

(6)

where $Π (t) = [u_{1} \dots u_{t - 1} u_{t + 1} \dots u_{N_{t}} u_{t}]$, with u_i the i-th column of the N_t × N_t identity matrix, is the permutation matrix that moves the t-th column of H to the last position. Thus, the received vector and the system model can be rewritten as

\tilde{y} (t) = Q (t)^{H} y = R (t) Π (t)^{H} x + Q (t)^{H} n = R (t) \tilde{x} (t) + \tilde{n} (t)

(7)

without impairing the receiver performance as long as the AWGN is spatially i.i.d., since $\tilde{n} (t)$ is still Gaussian with $R_{\tilde{n}} = Q^{H} (t) R_{n} Q (t) = N_{0} I$. The computational burden of this phase is negligible if the channel can be assumed constant over several consecutive OFDM symbols; thus, we focus on the second stage of the algorithm.
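The pre-processing (6) can be sketched as follows: for each t, the columns of H are reordered as prescribed by Π(t) and a thin QR factorization is computed. This is only an illustration, assuming matrices stored as Python lists of rows and 0-based indices; modified Gram-Schmidt is used here for brevity, not because the article prescribes it, and all helper names are hypothetical.

```python
import math


def permutation(Nt, t):
    """Column order of Pi(t) (0-based): every column except t, then column t last."""
    return [c for c in range(Nt) if c != t] + [t]


def qr_mgs(A):
    """Thin QR of an N_r x N_t complex matrix by modified Gram-Schmidt (R has real diagonal)."""
    Nr, Nt = len(A), len(A[0])
    cols = [[complex(A[r][c]) for r in range(Nr)] for c in range(Nt)]
    Q = [[0j] * Nt for _ in range(Nr)]
    R = [[0j] * Nt for _ in range(Nt)]
    for k in range(Nt):
        norm = math.sqrt(sum(abs(v) ** 2 for v in cols[k]))
        R[k][k] = complex(norm)
        q = [v / norm for v in cols[k]]
        for r in range(Nr):
            Q[r][k] = q[r]
        for j in range(k + 1, Nt):            # remove the q_k component from later columns
            proj = sum(q[r].conjugate() * cols[j][r] for r in range(Nr))
            R[k][j] = proj
            cols[j] = [cols[j][r] - proj * q[r] for r in range(Nr)]
    return Q, R


def lord_preprocessing(H):
    """Return [(Q(t), R(t), column order of Pi(t))] for t = 0..N_t-1, as in (6)."""
    Nt = len(H[0])
    out = []
    for t in range(Nt):
        order = permutation(Nt, t)
        Hp = [[row[c] for c in order] for row in H]   # H Pi(t)
        Q, R = qr_mgs(Hp)
        out.append((Q, R, order))
    return out
```

The rotated observation of (7) is then ỹ(t) = Q(t)^H y, one entry per layer, with the root layer in the last position.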

For each permutation Π(t), the LORD algorithm explores every possible transmitted QAM symbol x_t, moved to the lowest position of $\tilde{x} (t)$, called the "root layer" from now on. For each hypothesis $x_{t} = {\tilde{x}}_{N_{t}} (t) = \bar{x}$, the algorithm subtracts its interference from the upper layers, chooses the closest symbol on the new interference-free layer (through a simple slicing), and iterates this process up to $\tilde{y}_{1} (t)$. In formulas:

{\hat{x}}_{j} (t, \bar{x}) = \begin{cases} \bar{x}, & j = N_{t} \\ \underset{x}{\arg \min} \left| {\tilde{y}}_{j} (t) - r_{j, j} (t) x - \sum_{i = j + 1}^{N_{t}} r_{j, i} (t) {\hat{x}}_{i} (t, \bar{x}) \right|, & j < N_{t} \end{cases}

(8)
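The DFE recursion (8) for one root hypothesis x̄, together with the squared ED later used in (10), might look like the sketch below. It assumes that ỹ(t) and the upper-triangular R(t) from the pre-processing are given (0-based layers, so the root layer is the last entry), uses a hypothetical QPSK alphabet, and implements slicing as a brute-force search over the alphabet rather than the constant-time slicer a hardware design would use.

```python
import math

# Hypothetical unit-energy QPSK alphabet used only for illustration.
QPSK = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]


def lord_dfe(y_tilde, R, xbar, alphabet=QPSK):
    """DFE of (8) for one root hypothesis xbar; returns x_hat(t, xbar) in permuted order."""
    Nt = len(R)
    x_hat = [0j] * Nt
    x_hat[Nt - 1] = xbar                        # root layer fixed to the hypothesis
    for j in range(Nt - 2, -1, -1):             # climb the layers one by one
        # cancel the interference of the layers already decided
        z = y_tilde[j] - sum(R[j][i] * x_hat[i] for i in range(j + 1, Nt))
        # slicing: symbol minimising |z - r_jj x|
        x_hat[j] = min(alphabet, key=lambda s: abs(z - R[j][j] * s))
    return x_hat


def squared_distance(y_tilde, R, x):
    """||y_tilde - R x||^2 for x in permuted order (the ED entering (10))."""
    Nt = len(R)
    return sum(abs(y_tilde[j] - sum(R[j][i] * x[i] for i in range(j, Nt))) ** 2
               for j in range(Nt))
```

Running `lord_dfe` for every x̄ in the alphabet and mapping each result back with Π(t) yields the set S_t of (9).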

Finally, the algorithm builds N_t different sets, back in the original order:

S_{t} = \{ Π (t) \hat{x} (t, \bar{x}), \ \forall \bar{x} \}

(9)

and performs the log-likelihood search only over the elements in (9):

λ_{t} (m) \approx \min_{x \in S_{t} : b_{m} (x_{t}) = 1} \frac{\| \tilde{y} (t) - R (t) Π^{H} (t) x \|^{2}}{N_{0}} - \min_{x \in S_{t} : b_{m} (x_{t}) = 0} \frac{\| \tilde{y} (t) - R (t) Π^{H} (t) x \|^{2}}{N_{0}}

(10)
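Given the EDs of the candidates in S_t, the Max-Log LLRs of (10) reduce to two minimum searches per bit. The sketch below assumes that each candidate is summarized by the bit label of its root symbol x_t and its squared ED; the function name is hypothetical, and it returns `None` when one of the two minima is undefined, the situation discussed in Section 4.3.

```python
def lord_llrs(candidates, N0, n_bits):
    """Max-Log LLRs of (10) for the root antenna, from (root_bits, ed) pairs of S_t.

    Sign convention ln P(b=0)/P(b=1): min ED over b=1 minus min ED over b=0.
    A None entry marks a bit for which S_t lacks one of the two values."""
    llrs = []
    for m in range(n_bits):
        d1 = min((ed for bits, ed in candidates if bits[m] == 1), default=None)
        d0 = min((ed for bits, ed in candidates if bits[m] == 0), default=None)
        llrs.append(None if d0 is None or d1 is None else (d1 - d0) / N0)
    return llrs
```

For example, `lord_llrs([((0, 0), 1.0), ((0, 1), 2.0)], 0.5, 2)` yields `[None, 2.0]`: the first bit never takes the value 1 among the candidates.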

If N_t = 2, it can be shown that, for each bit of x_t and each of its two values, the set $S_{t}$ contains the closest hyper-symbol x having that value of b_m(x_t): once the root symbol is fixed, the single remaining symbol that minimizes the ED is found exactly by slicing. Thus, the algorithm achieves the same performance as the Max-Log ML detector. On the contrary, if N_t > 2, this is not guaranteed, because of possible error propagation in the decision feedback equalization (DFE) (8). This suboptimal behavior can be mitigated by crossing the sets $S_{j}$, i.e., letting a hyper-symbol $x \in S_{t}$ be replaced by a candidate $x^{'} \in S_{j}$ if the latter has a smaller ED and $x_{t}^{'} = x_{t} = \bar{x}$:

{S^{'}}_{t} = \left\{ \underset{x^{'} \in S_{j}, \forall j \,:\, {x^{'}}_{t} = \bar{x}}{\arg \min} \left\| \tilde{y} (t) - R (t) Π^{H} (t) x^{'} \right\|^{2}, \ \forall \bar{x} \right\}

(11)

3.3 T-LORD detector

Turbo-LORD is a generalization of LORD, able to manage a-priori information. Basically, (9) is replaced by

S_{t} = \{arg max_{x \in U_{t} (\bar{x})} φ (x), \forall \bar{x}\}

(12)

where the metric φ (x) is the same as (4) and all the elements in the same subset $U_{t} (\bar{x})$ share the same root symbol $\bar{x}$ :

x \in U_{t} (\bar{x}) \Rightarrow x_{t} = \bar{x}

(13)

Thus, the a-posteriori LLR (5) is eventually approximated as

ϕ_{t} (n) \approx \max_{x \in S_{t} : b_{n} (x_{t}) = 0} φ (x) - \max_{x \in S_{t} : b_{n} (x_{t}) = 1} φ (x)

(14)

The cardinality of $U_{t} (\bar{x})$ is selected according to the desired complexity-performance trade-off. For example, one could consider just two hyper-symbols: the one with the highest a priori probability and the one with the smallest ED:

x^{A} (t, \bar{x}) = \left\{ x : x_{t} = \bar{x} \ \text{and} \ b_{i} (x_{j}) = \mathrm{sign} (ξ_{j} (i)) \ \forall i, \forall j \neq t \right\}

(15)

x^{D} (t, \bar{x}) = \arg \min_{x : x_{t} = \bar{x}} \left\| \tilde{y} (t) - R (t) Π^{H} (t) x \right\|^{2}

(16)

U_{t} (\bar{x}) = \{ x^{A} (t, \bar{x}), x^{D} (t, \bar{x}) \}

(17)

Using the terminology introduced in [15] for the scalar case, (15) and (16) are said to obey the a priori and the distance criteria, respectively: T-LORD assumes that the hyper-symbol with the largest metric (4) is either x^A or x^D.

When N_t > 2, the distance to be minimized is a function of two or more variables, hence the solution cannot be found through simple slicing. As in the previous section, one can rely on a DFE process, but there is no guarantee that the chosen hyper-symbol is actually the closest, because wrong decisions can occur in intermediate layers.

We denote this sequentially chosen hyper-symbol as $x^{D, \dots, D} (t, \bar{x})$, to underline the layer-by-layer application of the distance criterion over the upper layers. In the same way, the a priori criterion can also be applied layer by layer, processing blocks of 2N_b LLRs per layer, letting their signs drive the choice of the QAM candidates and subtracting their interference from the upper layers, this time without error propagation, since the choice at each layer depends only on that layer's LLRs.

In [23], other criteria have been proposed to choose the hyper-symbols in $U_{t} (\bar{x})$, e.g., to take into account not only the most probable a priori symbol but also the second most probable one, $x^{F} (t, \bar{x})$, which can be easily computed by flipping the weakest a priori LLR. Furthermore, there is no need to apply the same criterion at each layer: the criteria can be mixed, retaining only the K candidates with the best Partial A-Posteriori Probability (PAPP)

φ ({x^{'}}_{k \div N_{t}} (t)) = - \frac{{∥{y^{'}}_{k \div N_{t}} (t) - {R^{'}}_{k \div N_{t}, k \div N_{t}} (t) {x^{'}}_{k \div N_{t}} (t)∥}^{2}}{N_{0}} + \sum_{j = k}^{N_{t}} \sum_{i = 1}^{2 N_{b}} b_{i} ({x^{'}}_{j} (t)) ξ_{t^{'} (j, t)} (i)

(18)

where k ÷ N_t stands for k, k + 1, ..., N_t, and

t^{'} (j, t) = \begin{cases} j, & j < t \\ N_{t}, & j = t \\ j - 1, & j > t \end{cases}

(19)

is the LLR permutation consistent with matrix Π(t). The K-best algorithm builds $U_{t} (\bar{x})$ as follows. At the first step, the partial set $U_{t}^{N_{t}} (\bar{x})$ contains only $\bar{x}$. At the k-th step, the K-best algorithm expands each candidate in $U_{t}^{k + 1} (\bar{x})$ according to each possible criterion (a priori, distance and possibly flipping); note that index k decreases as the algorithm proceeds. Only the K best results of this expanded set are retained in the partial set $U_{t}^{k} (\bar{x})$, while the others are discarded. At the last step, when k = 1, we set $U_{t} (\bar{x}) = U_{t}^{1} (\bar{x})$ and the T-LORD search goes on as explained in Section 3.2.

We remark that (18) can be recursively updated, layer by layer, by adding the new partial ED and a priori terms to the previous partial metric, which saves computing time and power. A very interesting choice, as shown in the next section, is K = 1, which corresponds to retaining at each layer only the best new candidate. This algorithm can be interpreted as a decision feedback equalizer (DFE) driven by the aforementioned criteria.
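With K = 1, the generalized DFE keeps a single candidate per layer and updates the partial metric (18) recursively. The sketch below is a simplified, hypothetical illustration rather than the T-LORD implementation of [23]: it considers only the distance and a priori criteria (no flipping), works in the permuted order of R(t) with 0-based layers, and passes the a priori term directly as a per-symbol log-probability. That term equals the bit-LLR sum of (18) up to a constant, so using it avoids committing to an LLR sign convention.

```python
import math

# Hypothetical unit-energy QPSK alphabet used only for illustration.
QPSK = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]


def tlord_k1(y_tilde, R, xbar, logprior, N0):
    """K = 1 DFE driven by the distance and a priori criteria (flipping omitted).

    logprior[j]: dict mapping each symbol to its a priori log-probability at layer j.
    Returns the hyper-symbol (permuted order) and its metric accumulated layer by layer."""
    Nt = len(R)
    x = [0j] * Nt
    x[-1] = xbar

    def increment(j, resid, s):
        # partial ED term plus a priori term contributed by layer j, as in (18)
        return -abs(resid - R[j][j] * s) ** 2 / N0 + logprior[j][s]

    metric = increment(Nt - 1, y_tilde[Nt - 1], xbar)        # root layer
    for j in range(Nt - 2, -1, -1):
        resid = y_tilde[j] - sum(R[j][i] * x[i] for i in range(j + 1, Nt))
        cand_d = min(logprior[j], key=lambda s: abs(resid - R[j][j] * s))   # distance
        cand_a = max(logprior[j], key=logprior[j].get)                      # a priori
        best = max((cand_d, cand_a), key=lambda s: increment(j, resid, s))
        x[j] = best
        metric += increment(j, resid, best)                  # recursive PAPP update
    return x, metric
```

When the channel term dominates (small N0) the distance candidate wins; when N0 is large and the prior is confident, the a priori candidate is kept even if it is far from the received sample.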

Finally, although the above enhancements proved effective, the generalized DFE process may still fail, missing the correct decision at some intermediate layer. However, when S_j is computed for another transmitted symbol, the distance and a priori criteria may select some hyper-symbols x with $x_{t} = \bar{x}$ in the upper layers that have a better metric (4). So, in analogy with (11), one can augment S_t as follows:

S_{t}^{'} = \{arg max_{x \in U_{t} (\bar{x}) \lor x \in S_{j \neq t} : x_{t} = \bar{x}} φ (x), \forall \bar{x}\}

(20)

One can first compute (12) for all t, and then cross the data to obtain the improved $S_{t}^{'}$. This enhancement does not increase the number of checked hyper-symbols; it only adds latency and complexity due to the additional metric comparisons and selections.

4 Hardware-oriented, low-complexity LORD

As we are going to show, the number of QAM symbols that LORD must check is largely determined by M, the cardinality of the constellation, which can reach hundreds of points. Therefore, we aim to reduce the number of candidates in $S_{t}$ by exploring only a subset of the QAM constellation at the root layer. Thus, the DFE search described in (8) is performed only for root symbols belonging to this reduced set. To preserve the regularity and parallelism typical of the full-complexity LORD, we restrict our attention to square subsets of the QAM constellation, centered on the received (equalized) signal $\tilde{y}_{N_{t}} (t) / r_{N_{t}, N_{t}} (t)$ at the root layer (red cross in Figure 2a).

The performance of the detector depends on the probability that the transmitted symbol does not belong to the square QAM subset since, when this happens, the LORD algorithm fails with high probability. To characterize this border violation, we keep the QAM constellation size fixed and properly rescale the noise power. After the pre-processing operations in (7), the channel gain $r_{N_{t}, N_{t}} (t)$ multiplies the symbol ${\tilde{x}}_{N_{t}} (t)$, and the noise at the root layer reads

ñ_{N_{t}} (t) = q_{N_{t}}^{H} (t) n,

(21)

which is a Gaussian variable with zero mean and unchanged power $E [{|ñ_{N_{t}}|}^{2}] = N_{0} q_{N_{t}}^{H} (t) q_{N_{t}} (t) = N_{0}$ . Thus, the actual noise power affecting the fixed-size QAM constellation of Figure 2a reads

σ_{\mathrm{act.noise}}^{2} = E \left[ \left| \frac{\tilde{n}_{N_{t}} (t)}{r_{N_{t}, N_{t}} (t)} \right|^{2} \right] = \frac{N_{0}}{r_{N_{t}, N_{t}}^{2} (t)}

(22)

This provides better insight into the problem. Indeed, let us assume that the channel entries are i.i.d. complex Gaussian variables. Because of the Gram-Schmidt orthogonalization in the QR process, when N_r = N_t, $r_{N_{t}, N_{t}} (t)$ is a Rayleigh random variable with unit power and probability density function given by

p_{R_{N_{t}, N_{t}}} (r_{N_{t}, N_{t}}) = 2 r_{N_{t}, N_{t}} exp (- r_{N_{t}, N_{t}}^{2})

(23)

The pdf of the actual noise standard deviation, normalized by $\sqrt{N_{0}}$, follows easily, since it is the output of the function $d = f (r_{N_{t}, N_{t}}) = \frac{1}{r_{N_{t}, N_{t}}}$:

p_{D} (d) = \frac{p_{R_{N_{t}, N_{t}}} (f^{- 1} (d))}{|f^{'} (f^{- 1} (d))|} = \frac{2}{d^{3}} exp (- \frac{1}{d^{2}})

(24)
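Integrating (24) gives the CDF P(D ≤ d) = exp(−1/d²), which a short Monte Carlo run can check. The sketch below assumes N_t = N_r = 2 with i.i.d. unit-power complex Gaussian entries and obtains |r_{2,2}| from |det H| = r_{1,1} |r_{2,2}| instead of a full QR; the function names, sample size and seed are arbitrary choices.

```python
import math
import random


def sample_d(rng):
    """One draw of d = 1 / |r_22| for a 2 x 2 i.i.d. unit-power complex Gaussian H."""
    s = math.sqrt(0.5)   # per-component standard deviation of a CN(0, 1) entry
    h1 = [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(2)]  # first column
    h2 = [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(2)]  # root (last) column
    r11 = math.sqrt(abs(h1[0]) ** 2 + abs(h1[1]) ** 2)
    det = h1[0] * h2[1] - h1[1] * h2[0]
    return r11 / abs(det)   # |det H| = r11 * |r22|, hence 1/|r22| = r11 / |det H|


def cdf_check(points, n=100_000, seed=1):
    """Return (d, empirical CDF, exp(-1/d^2)) triples for the given points."""
    rng = random.Random(seed)
    ds = [sample_d(rng) for _ in range(n)]
    return [(d, sum(v <= d for v in ds) / n, math.exp(-1.0 / d ** 2)) for d in points]
```

For large d, P(D > d) = 1 − exp(−1/d²) ≈ 1/d², a polynomial rather than exponential decay; this is the long tail visible in Figure 2b. The sketch covers only the i.i.d. case, not the correlated channels.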

This pdf is plotted in Figure 2b (solid blue line), together with simulated pdfs (dashed red lines) for a MIMO system with N_t = N_r = 2 and spatially correlated channel gains. The correlated case is even worse, because the matrix H is more likely to be ill-conditioned, i.e., to have the last diagonal elements of R(t) close to zero.

Even more importantly, the pdfs always exhibit long tails at high standard deviations. This suggests the following interpretation of the SNR behavior at the root layer: quite often the noise level is moderate, but occasionally it is very high. Reducing the square side is expected to be safe only in the former case, since any approximation in the latter case would likely be harmful. This idea is made practical in the following subsection.

4.1 Algorithm description

To obtain square subsets (see Figure 2a) that include the transmitted symbol with high probability, square side lengths larger than $α \sqrt{N_{0} / 2}$ (with α > 1) should be considered. One could probably approach the performance of the full-complexity LORD by ad-hoc tuning of the parameter α, which sets the trade-off between complexity reduction and the capability to detect the ML point. Nevertheless, such a solution would be sensitive to the parameter α, and thus not well suited to a device implementation.

Instead, we propose an alternative solution that does not require any parameter tuning once the architecture has been chosen. We perform the full-complexity search over the whole QAM constellation for the carriers affected by the worst-case noise (in the following, "worst carriers" for brevity), and limit the search for all the other carriers to a square subset with a fixed number of points. This way, the more robust (full-complexity) algorithm is always reserved for the carriers affected by the highest noise powers at the root layer, which significantly reduces the probability that a transmitted point falls outside the reduced square subset of Figure 2a. Clearly, the higher the hardware capability of the device, the larger the fraction of carriers that can be fully spanned. The fundamental hypothesis behind this solution is that detection time constraints (measured in number of clock cycles) exist only at the level of the entire OFDM symbol, and not carrier by carrier. This hypothesis seems quite reasonable, since devices for typical applications must complete the detection of an entire OFDM symbol within a fixed time, say T_max.

Let us focus on the QAM symbols transmitted by the same antenna within an OFDM symbol. Define the parallelism P^lc as the minimum number of DFE processors (8) able to exhaustively analyze the worst $N_{c}^{full}$ carriers, while limiting the search for the remaining tones to a square subset of cardinality S², as in Figure 2.

Assume that each DFE run (8) takes T_elem clock cycles. Thus, the low-complexity LORD requires a number P^lc of elementary processing units per antenna such that

T_{elem} ⌈\frac{N_{c}^{full} M + (N_{dc} - N_{c}^{full}) S^{2}}{P^{lc}}⌉ \leq T_{max}

(25)

As a special case, when $N_{c}^{full} = N_{dc}$ we obtain the full-complexity LORD parallelism P^full, satisfying $T_{elem} ⌈N_{dc} M / P^{full}⌉ \leq T_{max}$ .

For example, assume that T_max = 6 N_dc T_elem, i.e., on average 6 clock cycles are allowed to complete the detection process for each antenna permutation at each data carrier.^b With N_t = 2, N_dc = 52 and a 64-QAM constellation, Figure 3 plots the maximum number of fully spanned tones (solid lines) as a function of the parallelism P^lc and of the square subset side S. Clearly, the higher the number of processing units, the more exhaustive searches can be afforded. Conversely, the larger the square side S, the lower the number of full-complexity detections.
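Constraint (25) can be solved in closed form for the number of fully spanned carriers: since the budget is an integer number of DFE runs, ⌈W/P^lc⌉ ≤ T_max/T_elem is equivalent to W ≤ P^lc T_max/T_elem. The sketch below uses the example above (T_max = 6 N_dc T_elem, N_dc = 52, 64-QAM) as defaults and, as an illustration of the carrier ranking, selects the worst carriers of one antenna permutation by the root-layer noise power (22), i.e., by increasing |r_{N_t,N_t}(t)|. All names are hypothetical.

```python
def max_full_carriers_eq25(P, S, M=64, N_dc=52, budget=6 * 52):
    """Largest N_c^full satisfying (25), or None if even N_c^full = 0 is infeasible.

    budget = T_max / T_elem (DFE runs per processor). Because budget is an integer,
    ceil(W / P) <= budget  <=>  W <= budget * P, with W = N_full*M + (N_dc - N_full)*S**2."""
    slack = budget * P - N_dc * S ** 2
    if slack < 0:
        return None
    if S ** 2 >= M:                       # the "subset" already covers the constellation
        return N_dc
    return min(N_dc, slack // (M - S ** 2))


def min_parallelism_full(M=64, N_dc=52, budget=6 * 52):
    """Smallest P^full with ceil(N_dc * M / P) <= budget (full-complexity LORD)."""
    return -(-N_dc * M // budget)         # integer ceiling division


def worst_carriers(r_root, n_full):
    """Indices of the n_full carriers with the largest root-layer noise N0 / |r|^2, cf. (22)."""
    order = sorted(range(len(r_root)), key=lambda k: abs(r_root[k]))
    return sorted(order[:n_full])
```

With these defaults the full-complexity LORD needs P^full = 11 processors, whereas P^lc = 5 and S = 5 leave room for 6 fully spanned carriers (14 with P^lc = 6).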

To efficiently compute not only (8) but also (10), we impose further regularity on the hardware, assuming that at each clock cycle the device can process only one row or column of the square.

When P^lc ≥ S, as in Figure 4a, the square is always processed in the same direction, e.g., by columns. On the contrary, when P^lc < S, the square is processed through a tessellation, as in Figure 4b. It can be shown that this kind of processing requires

⌊\frac{S}{P^{lc}}⌋ (2 S - P^{lc}) - {⌊\frac{S}{P^{lc}}⌋}^{2} P^{lc} + S

(26)

clock cycles and, with a reasoning analogous to (25), leads to a smaller number of available fully spanned carriers, as reported in Figure 3 (dashed lines). Curves for the tessellation case have been plotted only up to S = 6, which is the limit case: with 6 clock cycles (on average), the whole detection time is used up by running the low-complexity algorithm on every carrier (i.e., $N_{c}^{full} = 0$). If S > 6, the overall OFDM time constraint of our example cannot be met. Nevertheless, as shown in the next section, S = 5 is enough for all the cases of practical interest we have tested. This means that the ML performance can be approached with S ⋅ P^lc/M = 25/64 ≈ 40% of the original LORD complexity.
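Expression (26) and the resulting carrier budget under row/column processing can be sketched in the same way. We assume, following the example in the text, a budget of 6 N_dc = 312 clock cycles per antenna, and we also assume that the full search processes the 64-QAM as an 8 × 8 square with the same rule (26); this last point is our assumption, and all names are hypothetical.

```python
def square_cycles(S, P):
    """Clock cycles of (26) for an S x S square processed by rows/columns with P units."""
    q = S // P
    return q * (2 * S - P) - q * q * P + S


def max_full_carriers_tessellation(P, S, M_side=8, N_dc=52, budget=6 * 52):
    """Largest N_c^full when both the full M_side x M_side constellation and the
    S x S subset are processed as in (26); None if even N_c^full = 0 is infeasible."""
    c_full, c_sub = square_cycles(M_side, P), square_cycles(S, P)
    slack = budget - N_dc * c_sub
    if slack < 0:
        return None
    if c_full <= c_sub:
        return N_dc
    return min(N_dc, slack // (c_full - c_sub))
```

For P^lc = S = 5 this gives 5 fully spanned carriers, versus 6 from (25) alone. For S = 6 the budget is exhausted (0 carriers with P^lc = 6, infeasible with P^lc = 5), in line with the limit case discussed above.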

4.2 Extension to T-LORD

From the description of LORD and T-LORD outlined in Section 3, it is clear that the LORD algorithm is actually a special case of the T-LORD, when all the a-priori LLRs ξ_t(n) are zero. Indeed, in this case there is no point in applying the a-priori criterion, since any symbol has the same a-priori probability equal to $\frac{1}{M}$ . Only the distance criterion makes sense, and its DFE process is actually the same as (8). Finally, having just one meaningful candidate per set $U_{t} (\bar{x})$ , also the K-best approach becomes superfluous. To summarize, the distance criterion in the T-LORD works as the LORD algorithm. For this reason we can generalize the LORD hardware-oriented simplification, presented in the previous section, to the T-LORD.

Basically, the full T-LORD algorithm is performed only for the most attenuated carriers. For the others, the DFE process is run only for a subset of root-layer symbols. In this case, the candidate sets $U_{t} (\bar{x})$ are not determined for every hypothesis $\bar{x}$, but only for those belonging to a properly chosen square subset of the QAM constellation. The only difference with respect to LORD is in the way the square subset of cardinality S² is chosen. In principle, one could find it at the first iteration, as shown in Section 4, and keep it unchanged across iterations. Though attractive, this approach is potentially harmful, since it could inhibit the influence of the a priori LLRs on the detector outputs if the search gets stuck in a bad subset of root-layer symbols that does not contain the transmitted one. An effective solution is to let the a priori information drive the choice of the square subset, together with the observed signal $\tilde{y}_{N_{t}} (t)$. We compute the L-MMSE estimate of the transmitted symbol on the root layer by a weighted maximal ratio combining (MRC) of the equalized received signal ${\hat{x}}_{D} (t)$ and of the a priori expected symbol ${\hat{x}}_{A} (t)$:

{\hat{x}}_{D} (t) = \frac{ỹ_{N_{t}} (t)}{r_{N_{t}, N_{t}} (t)}

(27)

σ_{D}^{2} (t) = \frac{N_{0}}{{|r_{N_{t}, N_{t}} (t)|}^{2}}

(28)

{\hat{x}}_{A} (t) = \sum_{j = 1}^{M} x_{j} \prod_{i = 1}^{2 N_{b}} \frac{\exp (b_{i} (x_{j}) ξ_{t} (i))}{1 + \exp (ξ_{t} (i))}

(29)

σ_{A}^{2} (t) = \sum_{j = 1}^{M} {|x_{j} - {\hat{x}}_{A} (t)|}^{2} \prod_{i = 1}^{2 N_{b}} \frac{\exp (b_{i} (x_{j}) ξ_{t} (i))}{1 + \exp (ξ_{t} (i))}

(30)

\hat{x} (t) = \frac{σ_{A}^{2} (t) {\hat{x}}_{D} (t) + σ_{D}^{2} (t) {\hat{x}}_{A} (t)}{σ_{D}^{2} (t) + σ_{A}^{2} (t)}

(31)

In case of null a priori information, $σ_{A}^{2} (t) = σ_{X}^{2}$ and the square subset choice is practically the same as in LORD, since $σ_{X}^{2} = 1$ is typically much greater than $σ_{D}^{2} (t)$. Conversely, when the a priori information is high, $σ_{A}^{2} (t)$ is small and the received symbol has negligible weight in the computation of $\hat{x} (t)$. Finally, we remark that (29) and (30) can be efficiently computed with techniques similar to [15].
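A sketch of the root-layer estimate (27) to (31) follows. It assumes a Gray 16-QAM alphabet (4-PAM per dimension, real-part bits first) and bit probabilities P(b_i = 1) already derived from the a priori LLRs; these choices and the names are illustrative. The prior spread `var_a` is computed as the variance around the a priori mean x̂_A(t), so that it vanishes when the a priori information is reliable.

```python
import math

# Illustrative Gray 4-PAM per dimension (bits 00, 01, 11, 10 -> -3, -1, +1, +3);
# 16-QAM label = (real bits, imaginary bits), normalized to unit average energy.
PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}
QAM16 = {lr + li: complex(PAM4[lr], PAM4[li]) / math.sqrt(10)
         for lr in PAM4 for li in PAM4}


def root_lmmse(y_root, r_root, N0, p1, constellation=QAM16):
    """Centre of the square subset: MRC of the equalized sample and the a priori mean.

    p1[i] = P(b_i = 1) for the root-layer bits."""
    x_d = y_root / r_root                        # equalized sample, (27)
    var_d = N0 / abs(r_root) ** 2                 # its noise variance, (28)
    probs = {}
    for bits, x in constellation.items():         # symbol probabilities from bit probabilities
        p = 1.0
        for i, b in enumerate(bits):
            p *= p1[i] if b == 1 else 1.0 - p1[i]
        probs[x] = p
    x_a = sum(p * x for x, p in probs.items())                      # a priori mean
    var_a = sum(p * abs(x - x_a) ** 2 for x, p in probs.items())     # a priori variance
    return (var_a * x_d + var_d * x_a) / (var_d + var_a)            # weighted MRC, (31)
```

With uniform bit probabilities the estimate is the equalized sample shrunk by 1/(1 + σ_D²); with certain bits it collapses onto the a priori symbol, as described above.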

4.3 Related issues

Since the constellation search is restricted at the root layer, there is no guarantee that $S_{t}$ contains, for each bit of x_t, at least one candidate with that bit equal to 0 and one with it equal to 1. In this case, if the crossing processes (11) or (20) do not recover such a candidate in $S_{t}^{'}$, one of the two terms in (10) or (14) is missing for that particular bit. Clipping approximations, such as assigning a fixed (finite or infinite) value to its LLR based on the hard-decided ML or MAP symbol, are not completely satisfactory. Nevertheless, for a Gray-mapped 64-QAM constellation this approximation is required only when S ≤ 4. Indeed, as shown in Figure 5, any five or more adjacent symbols of an 8-PAM Gray constellation are guaranteed to include at least one symbol for each possible value of each bit.
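The claim about five adjacent symbols can be checked exhaustively. The sketch below assumes the binary-reflected Gray labeling of each 8-PAM dimension (most significant bit first) and finds the smallest window width for which every window of adjacent symbols contains both values of every bit; the function names are hypothetical.

```python
def gray_labels(n_bits):
    """Binary-reflected Gray labels of a 2**n_bits-PAM, left to right, as bit tuples (MSB first)."""
    return [tuple(((k ^ (k >> 1)) >> (n_bits - 1 - i)) & 1 for i in range(n_bits))
            for k in range(2 ** n_bits)]


def covers_both_values(labels, width):
    """True if every window of `width` adjacent symbols holds both values of every bit."""
    n_bits = len(labels[0])
    for start in range(len(labels) - width + 1):
        window = labels[start:start + width]
        for i in range(n_bits):
            if len({lab[i] for lab in window}) < 2:
                return False
    return True


def min_safe_side(n_bits):
    """Smallest S such that any S adjacent PAM symbols cover all bit values."""
    labels = gray_labels(n_bits)
    return next(w for w in range(1, len(labels) + 1) if covers_both_values(labels, w))
```

For 8-PAM (3 bits) the answer is 5: a window of four symbols can still miss one value of the first or of the second bit.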

Another problem is how to efficiently find the square subset of Figure 2a. An efficient solution is to apply, for each dimension, the Euchner-Schnorr "zig-zag" algorithm [29], which starts from the symbol closest to the received one (or to the estimate $\hat{x} (t)$) and alternately adds points on its left and right until the square side S is reached, never crossing the boundaries of the constellation.
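For one dimension, the zig-zag construction can be sketched on the unnormalized PAM grid {−(L−1), ..., −1, 1, ..., L−1}: start from the level nearest to the target and repeatedly add the closer of the two neighbors just outside the current interval, continuing on one side only when the other hits a constellation boundary. On a uniformly spaced grid this reproduces the left/right alternation of the zig-zag enumeration; the function name and the grid scaling are assumptions. Applying it to the real and imaginary parts of the (suitably scaled) center yields the S × S square.

```python
def zigzag_levels(y, num_levels, side):
    """The `side` PAM levels closest to y, listed in zig-zag order (nearest first)."""
    levels = [2 * k - (num_levels - 1) for k in range(num_levels)]
    # slicing: index of the nearest level, clamped to the constellation
    k0 = min(max(int(round((y + num_levels - 1) / 2)), 0), num_levels - 1)
    order, lo, hi = [k0], k0, k0
    while len(order) < min(side, num_levels):
        left_ok, right_ok = lo - 1 >= 0, hi + 1 < num_levels
        # pick the closer neighbour; switch side when a boundary is reached
        if left_ok and (not right_ok or abs(y - levels[lo - 1]) <= abs(y - levels[hi + 1])):
            lo -= 1
            order.append(lo)
        else:
            hi += 1
            order.append(hi)
    return [levels[k] for k in order]
```

For instance, `zigzag_levels(0.2, 8, 5)` returns `[1, -1, 3, -3, 5]`, while near the edge `zigzag_levels(6.5, 8, 5)` can only grow to the left.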

4.4 LORD and T-LORD complexity

In this section we discuss the complexity of the proposed hardware-oriented LORD and T-LORD. A simple measure of the complexity of any detector is the number of spanned modulated symbols, i.e., the number of EDs to compute, which is approximately proportional to the number of multiplications (usually more expensive than additions in hardware). As an example, consider the ML receiver, LORD and MMSE. With the above definition, the ML complexity is larger than^c $M^{N_{t}}$. Conversely, LORD evaluates

C_{L} = M N_{t} (N_{t} - 1)

(32)

EDs, while the MMSE essentially requires 2 points per coded bit (exploiting the Gray mapping regularity as in [30], i.e., "folding" the constellation), i.e., $2 N_{t} \log_{2} M$ in total. Although the number of EDs is only a preliminary tool to evaluate complexity, it reveals that the ML cost is exponential in the number of antennas, and practically unaffordable even for small arrays, while the LORD complexity is only quadratic. Analogous considerations hold for iterative detectors.
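The ED counts above can be tabulated directly. This is a small sketch using M^{N_t} as the ML figure (which, per endnote c, is only a lower bound), (32) for LORD and 2 N_t log₂ M for MMSE; the function names are illustrative.

```python
import math

def eds_ml(M, Nt):
    # Exhaustive ML: at least M**Nt spanned symbols (a lower bound, see endnote c)
    return M ** Nt

def eds_lord(M, Nt):
    # Eq. (32)
    return M * Nt * (Nt - 1)

def eds_mmse(M, Nt):
    # 2 points per coded bit with Gray "folding": 2 * Nt * log2(M)
    return 2 * Nt * int(math.log2(M))

for Nt in (2, 3, 4):
    print(Nt, eds_ml(64, Nt), eds_lord(64, Nt), eds_mmse(64, Nt))
```

For M = 64 the ML column grows as 4096, 262144 and 16777216 for N_t = 2, 3, 4, while LORD needs 128, 384 and 768 EDs: exponential versus quadratic growth in N_t.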

Here, we focus on the complexity reduction of the simplified LORD and K-best T-LORD w.r.t. the "original" ones. For an exhaustive complexity analysis of all the T-LORD versions, as well as of other detector families, we refer the reader to [23]. As reported therein, the K-best T-LORD (with K = 1 and all the enhancements enabled) computes

C_{KTL} = 3 M N_{t} (N_{t} - 1)

(33)

EDs. From (32) and (33), it is clear that, when the constellation cardinality M is large, it is the dominant factor in the LORD and T-LORD complexity. The simplification proposed in the previous subsections reduces that factor, and the complexity (averaged over the frequency tones) becomes

C_{LC - L} = \frac{N_{c}^{full} M + (N_{dc} - N_{c}^{full}) S^{2}}{N_{dc}} N_{t} (N_{t} - 1)

(34)

C_{LC - KTL} = 3 \frac{N_{c}^{full} M + (N_{dc} - N_{c}^{full}) S^{2}}{N_{dc}} N_{t} (N_{t} - 1)

(35)
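A small sketch evaluating (32)-(35) exactly with rational arithmetic. It uses the configuration adopted for Table 1 (S = 4, $N_{c}^{full} = 8$) and assumes N_dc = 52 data carriers, as in 802.11n (endnote b); the function and parameter names are illustrative.

```python
from fractions import Fraction

def eds_low_complexity(M, Nt, S, n_full, n_dc, scale=1):
    """Eq. (34) (scale=1, LORD) or eq. (35) (scale=3, K-best T-LORD), averaged over tones.

    n_full tones use a full M-point root search, the other n_dc - n_full an S x S square.
    """
    avg_root = Fraction(n_full * M + (n_dc - n_full) * S * S, n_dc)
    return scale * avg_root * Nt * (Nt - 1)

M, Nt = 64, 4
full_lord, full_ktl = M * Nt * (Nt - 1), 3 * M * Nt * (Nt - 1)   # eqs. (32), (33)
lc_lord = eds_low_complexity(M, Nt, S=4, n_full=8, n_dc=52)
lc_ktl = eds_low_complexity(M, Nt, S=4, n_full=8, n_dc=52, scale=3)
print(full_lord, float(lc_lord))   # LORD: 768 vs about 280.6
print(full_ktl, float(lc_ktl))     # K-best T-LORD: 2304 vs about 841.8
```

The ED count drops by roughly 2.7 times; the measured multiplication savings in Table 1 are smaller because of the extra operations listed in the discussion of that table.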

To strengthen the above analysis, we study the number of multiplications, additions and comparisons, distinguishing the operations performed just once (such as the QR decomposition) from those repeated for every detection process, i.e., for every tone, OFDM symbol and iteration. Results for these fixed and variable operations are reported in Table 1 for the most complex case that we have investigated, i.e., M = 64 and N_t = 4. The table also includes the complexity of the soft-output generation stage. For completeness, the SIC-MMSE [15], the full-complexity T-LORD [23] and a sphere detector (SD) are reported, too. Among the different SD families, we considered a breadth-first list detector, since it guarantees deterministic complexity and latency, as T-LORD does. The list size is K = 36, chosen to achieve a performance close to that of T-LORD.

Table 1 Number of real operations per tone and iteration (N_t= 4, M = 64)


Focusing on the low-complexity K-best T-LORD, we have chosen a reduced square subset of side S = 4, the smallest available parallelism P^lc = 4, and a full search over $N_{c}^{full} = 8$ carriers. The square subset center is driven by both the received symbols and the a-priori LLRs. This solution is very attractive, since the loss w.r.t. the full-complexity T-LORD vanishes after a few iterations, as shown in the next section. Furthermore, since the square subset is 4 × 4, the same hardware could also be used to detect a 16-QAM constellation, with negligible incremental cost. As Table 1 shows, the low-complexity K-best T-LORD saves fewer multiplications and additions than the number of computed EDs would suggest. Indeed, there are additional operations to perform, e.g., (29)-(31) and the identification of the $N_{c}^{full}$ worst carriers. The crossing between candidate sets $S_{t}$ also reduces the savings, since it must be performed within the entire constellation and not only among the points of the square subset. Nevertheless, the simplified T-LORD greatly benefits from the reduction of the spanned symbols, and almost halves the number of required variable multiplications. The number of variable additions is also slightly smaller (the a-priori criterion remains almost unchanged).

To conclude, we remark that the simplification proposed in this paper benefits not only the device area (i.e., the number of required logic gates) but also the power consumption, as reported in [31]. For example, for LORD with N_t = N_r = 2 and M = 64, assuming a 65 nm CMOS technology with an 80 MHz clock frequency, the area is reduced from 0.64 mm² to 0.21 mm² and the power from 38 mW to 14 mW. A comparison with prior designs can also be found therein.^d

5 Simulation results

In this section, we provide performance results, both in terms of the extrinsic information delivered by several detectors and of Monte Carlo simulations of the receivers embedding them.

We assume two different environments, referred to in the following as the "ideal" and the "real" one. First, we use a rich scattering channel, whose coefficients $h_{r,t}$ are i.i.d. complex Gaussian values with unit power, and we assume perfect CSI at the receiver. Then we consider a more realistic channel, with an exponentially decaying power delay profile (PDP) and a short rms delay spread τ_rms = 50 ns (equal to the sampling time). Spatial correlation is assumed equal to $r (t_{1}, t_{2}) = 0.5^{|t_{1} - t_{2}|}$, where t₁ and t₂ are the antenna indexes, sorted in ascending order from one end of the linear array. The perfect CSI hypothesis is abandoned and replaced by pilot-aided, tone-by-tone channel estimation (CE): since subsequent orthogonal long training sequences (LTS) are averaged as in [32], each channel tap estimate $ĥ_{r, t}$ is affected by i.i.d. Gaussian noise with power $σ_{CE}^{2} = \frac{N_{0}}{2^{\lceil \log_{2} N_{t} \rceil}}$. Neither time nor frequency smoothing, such as [32, 33], is adopted, since it would reduce the difference between the ideal and realistic settings.
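A minimal sketch of the quantities that define the realistic setting: the spatial correlation matrix, an exponential PDP and the CE noise power. The number of taps and the choice of a decay constant equal to τ_rms are illustrative assumptions (the text does not state them), and how the correlation is applied (e.g., a Kronecker model) is not specified here; function names are hypothetical.

```python
import math

def spatial_correlation(n, rho=0.5):
    """r(t1, t2) = rho**|t1 - t2| between the n elements of a uniform linear array."""
    return [[rho ** abs(t1 - t2) for t2 in range(n)] for t1 in range(n)]

def exponential_pdp(num_taps, tau_rms_ns=50.0, ts_ns=50.0):
    """Tap powers decaying as exp(-k*Ts/tau_rms), normalized to unit total power.

    Assumption: decay constant set equal to tau_rms and a finite, illustrative tap count.
    """
    w = [math.exp(-k * ts_ns / tau_rms_ns) for k in range(num_taps)]
    total = sum(w)
    return [p / total for p in w]

def ce_noise_power(n0, nt):
    """sigma_CE^2 = N0 / 2**ceil(log2 Nt): noise on each tone-by-tone channel estimate."""
    return n0 / 2 ** math.ceil(math.log2(nt))

print(spatial_correlation(3))
print([round(p, 3) for p in exponential_pdp(num_taps=6)])
print([ce_noise_power(1.0, nt) for nt in (2, 3, 4)])
```

The last line gives N_0/2, N_0/4 and N_0/4 for N_t = 2, 3, 4, reflecting the number of LTS averaged per estimate.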

5.1 EXIT charts

In an iterative receiver, a detector is rated by its capability to transfer extrinsic information to the decoder. EXIT charts are an effective tool to predict the convergence behavior of iterative systems [34] and to design component codes [27, 35], even in the case of (possibly MIMO) selective channels. The EXIT analysis assumes independent a-priori LLRs ξ_t(n), drawn at random from some probability density function (pdf), often modeled as Gaussian (i.e., as the output of an AWGN channel) with variance equal to twice the mean μ. The output pdf p(l) of the extrinsic LLRs is generally estimated experimentally. The mutual information for a consistent pdf is [36]

I = 1 - \int p (l) {log}_{2} (1 + \frac{p (- l)}{p (l)}) d l

(36)
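A minimal sketch of these EXIT ingredients. For a consistent pdf, p(-l) = e^{-l} p(l), so (36) reduces to I = 1 - E[log₂(1 + e^{-l})], which can be estimated by a sample mean instead of a histogram. The LLRs are assumed to be conditioned on the bit value they favour (positive mean μ, variance 2μ); function names, sample size and seed are illustrative.

```python
import math
import random

def consistent_gaussian_llrs(mu, n, seed=0):
    """A-priori LLRs with mean mu and variance 2*mu (the consistent AWGN model)."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * mu)
    return [rng.gauss(mu, sigma) for _ in range(n)]

def mutual_information(llrs):
    """Eq. (36) for a consistent pdf: I = 1 - E[log2(1 + exp(-l))], via a sample mean."""
    def log1p_exp_neg(l):
        # Numerically stable log(1 + exp(-l))
        return max(-l, 0.0) + math.log1p(math.exp(-abs(l)))
    return 1.0 - sum(log1p_exp_neg(l) for l in llrs) / (len(llrs) * math.log(2.0))

for mu in (0.0, 1.0, 4.0, 16.0):
    print(mu, round(mutual_information(consistent_gaussian_llrs(mu, 20000)), 3))
```

Sweeping μ generates the a-priori information I_A on the horizontal axis of an EXIT chart; applying the same estimator to the detector's extrinsic LLRs gives the output information.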

In case of serial concatenation of the detector and the decoder, the quality of the detector output can be evaluated by looking at the leftmost value in the graph, corresponding to the absence of a-priori information (see, e.g., Figure 6). In turbo receivers, instead, one can track the system convergence by overlapping the charts of the two iterative modules (with swapped axes), since the output of each module becomes the input of the other, and so on.

In Figure 6, we plot (in both ideal and realistic channel conditions) the EXIT curve of the hardware-oriented T-LORD developed so far and, as references, those of the SIC-MMSE [9] and MAP detectors.

We choose SNR = 23 dB, corresponding to a target packet error rate (PER) close to 10^-2. The T-LORD algorithm delivers almost the same output information as the MAP detector, for any input information I_A. On the contrary, SIC-MMSE is largely suboptimal and is expected to introduce severe losses in an iterative receiver. The Monte Carlo simulations confirm these predictions.

In Figure 7 we further compare the EXIT curves of different T-LORD detectors. As we can see, the gap between the hardware oriented T-LORD with S = P^lc = 5 and the full-complexity T-LORD is small. Conversely, the case S = P^lc = 4 without the update of the square subset center leads to some information loss.

An interesting choice is S = P^lc = 4 with the update enabled (denoted in the figures as "moving square"), which exhibits an abrupt increase in the delivered information as the a-priori information grows. When no a-priori information is available, the reduced square is positioned based only on the received, noisy symbol; therefore, when the transmitted symbol lies outside the reduced square, some extrinsic information is lost. Conversely, when the square positioning can benefit from a-priori information, a relevant fraction of the extrinsic information is recovered. This closes the gap iteration by iteration, as confirmed by the simulations below. Similar results hold for the realistic channel.

5.2 Monte Carlo simulations

In this section we provide simulation results for the low-complexity LORD detectors described above, with different square sides and parallelisms. For comparison, we also plot the MAP, MMSE [9] and full-complexity T-LORD [23] curves. All simulations use floating-point arithmetic. Iterations range from 1 to 4. Iteration 1 means that no extrinsic information is available to the detector, i.e., LORD and T-LORD coincide, as do MAP and ML. Aiming at very high spectral efficiencies, up to 20 bit/carrier (with N_t = 4), and to test the simplified T-LORD in challenging conditions, we always consider a channel code rate R_c = 5/6, i.e., the most sensitive 64-QAM mode in the 802.11n standard [1].
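The spectral efficiency follows from N_t spatial streams of 64-QAM (6 coded bits per symbol) at R_c = 5/6. This sketch also converts it to a raw throughput using the 52 data carriers and the 4 μs OFDM symbol mentioned in endnote b; constant and function names are illustrative.

```python
from fractions import Fraction

N_DC = 52       # data carriers per OFDM symbol (endnote b)
T_SYM_US = 4    # OFDM symbol duration in microseconds (endnote b)

def bits_per_carrier(nt, M, rate):
    """Information bits per data carrier: Nt streams x log2(M) coded bits x Rc."""
    return nt * (M.bit_length() - 1) * rate   # log2(M) for M a power of two

def throughput_mbps(nt, M, rate):
    # Information bits per OFDM symbol divided by its duration (bits/us = Mb/s)
    return bits_per_carrier(nt, M, rate) * N_DC / T_SYM_US

for nt in (2, 3, 4):
    print(nt, bits_per_carrier(nt, 64, Fraction(5, 6)), throughput_mbps(nt, 64, Fraction(5, 6)))
```

N_t = 2, 3, 4 give 10, 15 and 20 bit/carrier, i.e., 130, 195 and 260 Mb/s.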

Figures 8 and 9 plot PER vs SNR for the cases N_t = N_r = 2 and N_t = N_r = 3, respectively. The target PER is set to 10^-2, a common assumption in wireless LAN communications when retransmission is allowed.^e Here, we assume ideal channel conditions (i.e., uncorrelated Gaussian channels with perfect CSI at the receiver).

As we can see, the T-LORD performance is close to that of the MAP detector, and largely outperforms the MMSE receiver, which is plagued by ill-conditioned channel matrices. Only the T-LORD with S = P^lc = 4 and a fixed subset choice at the root layer shows a modest loss, about 0.2 dB w.r.t. the full-complexity T-LORD.

The robustness of T-LORD w.r.t. MMSE becomes even more pronounced in Figure 10, which assumes the realistic channel conditions described at the beginning of this section. Part of the SNR gap between ideal and realistic conditions can be ascribed to the noisy (tone-by-tone, ZF) channel estimates, computed by exploiting the orthogonal preambles in [1]. The estimation error can be interpreted as additional noise over the link, and one can expect an overall performance degradation

L_{CE} = \left. \frac{N_{0}}{N_{0} + N_{t} σ_{CE}^{2}} \right|_{dB} = - \left. \left(1 + \frac{N_{t}}{2^{\lceil \log_{2} N_{t} \rceil}}\right) \right|_{dB}

(37)

equal to 3 dB when N_t = 2 or N_t = 4, and slightly smaller (2.4 dB) when N_t = 3, since the 802.11n preamble then contains one more LTS than the number of transmitting antennas. The remaining 3 dB loss, which one would experience even with ideal CE, is due to the severe channel described in Section 2, with an exponentially decaying PDP, short delay spread and spatial correlation. In this challenging case, with less spatial diversity, the MMSE receiver completely fails to improve with iterations, while the full-complexity K-best T-LORD misses the MAP performance by only 0.5 dB. This gap is probably due to error propagation in the DFE process.
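A quick numerical check of (37), written as a sketch: substituting σ_CE² = N_0 / 2^{⌈log₂ N_t⌉} makes N_0 cancel, so the loss depends only on N_t. The function returns the magnitude of L_CE in dB; its name is illustrative.

```python
import math

def ce_degradation_db(nt):
    """Magnitude of L_CE in eq. (37): 10*log10(1 + Nt / 2**ceil(log2 Nt))."""
    n_lts = 2 ** math.ceil(math.log2(nt))   # LTS averaged per estimate, per sigma_CE^2
    return 10 * math.log10(1 + nt / n_lts)

for nt in (2, 3, 4):
    print(nt, round(ce_degradation_db(nt), 2))   # about 3.01, 2.43 and 3.01 dB
```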

For completeness, in Figures 11 and 12 we also report simulations for the case N_t = N_r = 4, for both ideal and realistic channels.^f In this case, the extremely time-consuming MAP has been replaced by a lower bound, assuming that the MAP receiver fails only when the full-complexity T-LORD also fails to recover the message.

Focusing now on LORD detectors, S = P^lc = 5 is enough to approach the optimal ML performance, as the full-complexity LORD does, while with S = P^lc = 4 LORD suffers some performance degradation. This can be explained by a higher probability that, because of noise, the transmitted symbol falls outside the square subset, or that the soft-output generation misses some EDs in (10) or (14). Thus, S = P^lc = 5 has been chosen for the HDL implementation of LORD [31]. Conversely, at the fourth iteration the above gap is almost closed by the S = P^lc = 4 T-LORD with the square subset center update (as predicted by the EXIT charts), which performs even better than the S = P^lc = 5 case with a fixed subset.

To conclude, we refer readers interested in a performance comparison with sphere detectors to [23], whose results show that T-LORD achieves better performance while computing fewer EDs in every simulated case.

6 Conclusions

In this article, we have proposed innovative hardware-oriented, soft-output LORD and T-LORD algorithms that can heavily reduce the number of parallel elementary processing units required to meet the latency constraints in MIMO-OFDM systems when high-cardinality QAM constellations are deployed. The simplified versions preserve the features of the original algorithms, i.e., fixed complexity, deterministic latency and a highly parallel structure. The proposed solution is regular, scalable and does not require any ad hoc parameter tuning, e.g., depending on the experienced average SNR or the actual channel realization.

Moreover, the loss in 802.11n systems w.r.t. the ML and MAP detectors is very small (a few tenths of a dB). We tried several configurations up to 20 bit/carrier (N_t = 4), corresponding to a system throughput of 260 Mb/s in the 802.11n standard. We also tested the system with very noisy channel estimates, as well as with a more realistic channel offering less spatial and frequency diversity due to correlation. In every case, the simplified LORD and T-LORD showed good robustness, outperforming the non-iterative ML and the iterative SIC-MMSE receivers, and always approaching the receiver with the ideal detector. These features make LORD and T-LORD good candidates for VLSI MIMO receivers.

Endnotes

^aFor brevity, we give a simplified description of the LORD algorithm. For more details, refer to [21] and [22], where a real-domain modified QR decomposition avoids the normalization of the columns of Q. Nevertheless, the low-complexity, hardware-oriented LORD and T-LORD presented in this article can also be applied to that framework, as shown in [31].

^bObviously this value depends on the hardware, but it is reasonable for an FPGA device (with a clock frequency of tens of MHz) aiming to process in real time an OFDM symbol lasting 4 μs and carrying 52 data carriers, as in [1].

^cNote that the measure does not refer to the number of N_t-dimensional hyper-symbols, but to the number of QAM symbols spanned throughout the algorithm.

^dA fair comparison of area and power consumption is hard to achieve, since many parameters change from one design to another (e.g., clock frequency, CMOS technology, modulations, antennas, soft-output generation). Nevertheless, [31] shows that LORD provides a very good trade-off in any case.

^eE.g., the 802.11n standardization group chose PER = 10^-2 for performance comparison purposes.

^fA direct performance comparison between systems with a different number of antennas is hard to achieve. The SNR at which the system meets the target PER changes when the number of antennas grows, and its trend is hard to foresee, for at least two reasons. On the one hand, we exploit the capacity growth to increase the throughput, not to strengthen the communication. On the other hand, a larger number of antennas makes the data packet shorter, since more information is conveyed at each channel use: this reduces the PER for a given SNR.

References

[1] Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specification. Amendment 5: Enhancements for Higher Throughput 2009.
[2] WiMAX Forum Mobile System Profile Release 1.0 Approved Specification 2005.
[3] Evolved universal terrestrial radio access (E-UTRA) and evolved universal terrestrial radio access network (E-UTRAN); overall description 2009.
[4] Telatar E: Capacity of multi-antenna Gaussian channels. Eur Trans Telecommun 1999, 10(6):585-595. 10.1002/ett.4460100604
[5] Foschini GJ, Gans M: On limits of wireless communications in a fading environment when using multiple antennas. Wirel Pers Commun 1998, 6(3):311-335. 10.1023/A:1008889222784
[6] ten Brink S: Convergence of iterative decoding. Electron Lett 1999, 35(10):806-808. 10.1049/el:19990555
[7] Berrou C, Glavieux A: Near optimum error correcting coding and decoding: turbo-codes. IEEE Trans Commun 1996, 44(10):1261-1271. 10.1109/26.539767
[8] Tüchler M, Singer AC, Koetter R: Minimum mean squared error equalization using a priori information. IEEE Trans Signal Process 2002, 50(3):673-683.
[9] Zuyderhoff D, Wautelet X, Dejonghe A, Vandendorpe L: MMSE turbo receiver for space-frequency bit-interleaved coded OFDM. In Proc IEEE Vehicular Tech Conf. Orlando, FL; 2003:567-571.
[10] Viterbo E, Boutros J: A universal lattice code decoder for fading channels. IEEE Trans Inf Theory 1999, 45(5):1639-1642. 10.1109/18.771234
[11] Wong K, Tsui C, Cheng RSK, Mow M: A VLSI architecture of a K-best lattice decoding algorithm for MIMO channels. In Proc IEEE Int Symp on Circuits and Systems. Scottsdale, AZ; 2002:273-276.
[12] Baro S, Hagenauer J, Witzke M: Iterative detection of MIMO transmission using a list-sequential (LISS) detector. In Proc IEEE Int Conf Communications. Anchorage, AK; 2003:2653-2657.
[13] Guo Z, Nilsson P: Algorithm and implementation of the K-best sphere decoding for MIMO detection. IEEE J Sel Areas Commun 2006, 24(3):491-503.
[14] Radosavljevic P, Cavallaro JR: Soft sphere detection with bounded search for high-throughput MIMO receivers. In Proc Asilomar Conf Signals, Systems, and Computers. Pacific Grove, CA; 2006:1175-1179.
[15] Tomasoni A, Ferrari M, Gatti D, Osnato F, Bellini S: A low complexity turbo MMSE receiver for W-LAN MIMO systems. In Proc IEEE Int Conf Communications. Istanbul, Turkey; 2006:4119-4124.
[16] Hochwald BM, ten Brink S: Achieving near-capacity on a multiple-antenna channel. IEEE Trans Commun 2003, 51(3):389-399. 10.1109/TCOMM.2003.809789
[17] Li Q, Wang Z: Improved K-best sphere decoding algorithms for MIMO systems. In Proc IEEE Int Symp on Circuits and Systems. Kos, Greece; 2006:1159-1162.
[18] Barbero LG, Ratnarajah T, Cowan C: A low-complexity soft-MIMO detector based on the fixed-complexity sphere decoder. In Proc IEEE Int Conf on Acoustics, Speech and Signal Processing. Las Vegas, NV; 2008:2669-2672.
[19] Wu Y, Liu YT, Liao Y, Chang H: Early-pruned K-best sphere decoding algorithm based on radius constraints. In Proc IEEE Int Conf Communications. Beijing, China; 2008:4496-4500.
[20] Wang L, Xu L, Chen S, Hanzo L: Generic iterative search-centre-shifting K-best sphere detection for rank-deficient SDM-OFDM systems. Electron Lett 2008, 44(8):552-553. 10.1049/el:20083279
[21] Siti M, Fitz MP: Layered orthogonal lattice detector for two transmit antenna communications. In Proc Allerton Conference on Communication, Control, and Computing. Monticello, IL; 2005:287-296.
[22] Siti M, Fitz MP: A novel soft-output layered orthogonal lattice detector for multiple antenna communications. In Proc IEEE Int Conf Communications. Istanbul, Turkey; 2006:1686-1691.
[23] Tomasoni A, Siti M, Ferrari M, Bellini S: Low complexity, quasi-optimal MIMO detectors for iterative receivers. IEEE Trans Wirel Commun 2010, 9(10):3166-3177.
[24] Ahn CJ: Parallel detection algorithm using multiple QR decompositions with permuted channel matrix for SDM/OFDM. IEEE Trans Veh Technol 2008, 57(4):2578-2582.
[25] Hagenauer J, Hoeher P: A Viterbi algorithm with soft-decision outputs and its applications. In Proc IEEE Global Telecommunications Conf. Dallas, TX; 1989:1680-1686.
[26] Bahl L, Cocke J, Jelinek F, Raviv J: Optimal decoding of linear codes for minimizing symbol error rate. IEEE Trans Inf Theory 1974, 20(2):284-287.
[27] ten Brink S, Kramer G, Ashikhmin A: Design of low-density parity-check codes for modulation and detection. IEEE Trans Commun 2004, 52(4):670-678. 10.1109/TCOMM.2004.826370
[28] Baek MS, You Y, Song HK: Combined QRD-M and DFE detection technique for simple and efficient signal detection in MIMO-OFDM systems. IEEE Trans Wirel Commun 2009, 8(4):1632-1638.
[29] Agrell E, Eriksson T, Vardy A, Zeger K: Closest point search in lattices. IEEE Trans Inf Theory 2002, 48(8):2201-2214. 10.1109/TIT.2002.800499
[30] Tosato F, Bisaglia P: Simplified soft-output demapper for binary interleaved COFDM with application to HIPERLAN/2. In Proc IEEE Int Conf Communications. New York City, NY; 2002:664-668.
[31] Cupaiuolo T, Siti M, Tomasoni A: Low-complexity and high throughput VLSI architecture of soft-output ML MIMO detector. In Proc Design, Automation & Test in Europe. Dresden, Germany; 2010:1396-1401.
[32] Tomasoni A, Gallizio E, Bellini S: Low complexity and low latency training assisted channel estimation for MIMO-OFDM systems. In Proc IEEE Personal Indoor and Mobile Radio Conf. Athens, Greece; 2007:1-5.
[33] Li YG, Seshadri N, Ariyavisitakul S: Channel estimation for OFDM systems with transmitter diversity in mobile wireless channels. IEEE J Sel Areas Commun 1999, 17(3):461-471. 10.1109/49.753731
[34] Tüchler M: Convergence prediction for iterative decoding of threefold concatenated systems. In Proc IEEE Global Telecommunications Conf. Taipei, Taiwan; 2002:1358-1362.
[35] Lechner G, Sayir J, Land I: Optimization of LDPC codes for receiver frontends. In Proc IEEE Int Symp Inform Theory. Seattle, WA; 2006:2388-2392.
[36] ten Brink S: Convergence behavior of iteratively decoded parallel concatenated codes. IEEE Trans Commun 2001, 49(10):1727-1737. 10.1109/26.957394


Acknowledgements

We warmly thank Eng. Teo Cupaiuolo for the VLSI design of LORD.

Author information

Authors and Affiliations

CNR-IEIIT Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy
Alessandro Tomasoni & Marco Ferrari
STMicroelectronics, Agrate Brianza, (MB), Italy
Massimiliano Siti
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milan, Italy
Sandro Bellini

Corresponding author

Correspondence to Marco Ferrari.

Additional information

Competing interests

This work has been partially supported by the Advanced System Technology Group of STMicroelectronics, via Olivetti 2, 20864 Agrate Brianza (MB), Italy. Some solutions are protected by US and EU patents.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Tomasoni, A., Siti, M., Ferrari, M. et al. Hardware oriented, quasi-optimal detectors for iterative and non-iterative MIMO receivers. J Wireless Com Network 2012, 62 (2012). https://doi.org/10.1186/1687-1499-2012-62


Received: 14 May 2011
Accepted: 24 February 2012
Published: 24 February 2012
DOI: https://doi.org/10.1186/1687-1499-2012-62
