A simplified hard output sphere decoder for large MIMO systems with the use of efficient search center and reduced domain neighborhood study
 Youssef Nasser^{1},
 Sebastien Aubert^{2, 3},
 Fabienne Nouvel^{3},
 Karim Y. Kabalan^{1} and
 Hassan A. Artail^{1}
https://doi.org/10.1186/s13638-015-0442-y
© Nasser et al. 2015
 Received: 16 February 2015
 Accepted: 4 September 2015
 Published: 17 October 2015
The Erratum to this article has been published in EURASIP Journal on Wireless Communications and Networking 2015 2015:251
Abstract
Multiple-input multiple-output (MIMO) with a spatial-multiplexing (SM) scheme is a topic of high interest for the next generation of wireless communications systems. At the receiver, neighborhood studies (NS) and lattice reduction (LR)-aided techniques are common solutions in the literature to approach the optimal but computationally complex maximum likelihood (ML) detection. However, when they are considered separately, the NS and LR solutions might not offer optimal performance for large-dimensional systems, i.e., systems with a large number of antennas and high-order constellations. In this paper, we propose a novel equivalent metric that combines these solutions by introducing a reduced domain neighborhood study. We show that the proposed metric offers a significant complexity reduction while maintaining near-ML performance. Moreover, the corresponding computational complexity is shown to be independent of the constellation size and quadratic in the number of transmit antennas. For instance, for a 4 × 4 MIMO system with 16-QAM modulation on each layer, the proposed solution is near-ML with both perfect and real channel estimation while being ten times less complex than the classical neighborhood-based K-best solution.
Keywords
 Minimum Mean Square Error
 MIMO System
 Lattice Reduction
 Zero Force
 Constellation Size
1 Introduction
Multiple-input multiple-output (MIMO) technology has attracted considerable attention in the last decade since it can improve link reliability without sacrificing bandwidth efficiency or, conversely, improve bandwidth efficiency without losing link reliability [1]. Recently, the concept of large MIMO systems, i.e., systems with a high number of antennas, has also gained research interest, and it is seen as a part of next-generation wireless communication systems [2, 3].
However, the main drawback of MIMO technology is the increased complexity at the receiver side when a non-orthogonal (NO) MIMO scheme with a large number of antennas and/or a large constellation size is implemented [4, 5]. For the detection process, although the performance of the maximum likelihood (ML) detector is optimal, its computational complexity increases exponentially with the number of transmit antennas and with the constellation size. In the literature, different MIMO detection techniques have been proposed. The linear-like detection (LD) [6] and decision-feedback detection (DFD) [7] are the baseline detection algorithms. Among the conventional linear MIMO detection techniques, we distinguish zero forcing (ZF) [8] and minimum mean square error (MMSE) [8]. Although linear detection approaches are attractive in terms of their computational complexity, they might lead to a non-negligible performance degradation [9].
Some nonlinear detectors have also been introduced. The sphere decoder (SD), one of the most famous MIMO detectors, is based on a tree search and is very popular due to its quasi-optimal performance [10]. However, this performance is reached at the expense of additional implementation complexity. Indeed, the SD achieves quasi-ML performance while its average complexity is shown to be polynomial (roughly cubic) in the constellation size and in the number of transmit antennas over a certain range of signal-to-noise ratio (SNR), whereas its worst-case complexity is still exponential [11]. From a hardware implementation point of view, the SD algorithm presents two main drawbacks. Firstly, its complexity coefficients can become large when the problem dimension is high, i.e., at high spectral efficiency, with a high number of antennas, and with a high number of users in the multi-user MIMO (MU-MIMO) context. Secondly, the variance of its computation time can also be large, leading to undesirable, highly variable decoding delays. Despite classical optimizations such as the Schnorr-Euchner (SE) enumeration [12], the SD originally presented in [11] has by nature a sequential tree search phase, which is an additional drawback for implementation. In order to deal with these two aspects, the authors in [13] have proposed a suboptimal solution denoted as the K-best [13, 14], where K is the number of stored neighbors at a given layer. However, even with a fixed computational complexity and a parallel nature of implementation, some optimizations are required, especially for high-order constellations and a large number of antennas (due to the large K required in this case) [15–18]. Different solutions have been proposed to reduce the neighborhood size (namely K, over all layers). For instance, the sorted QR decomposition (SQRD)-based dynamic K-best, which leads to the famous SQRD-based fixed throughput SD (FTSD), is proposed in [16].
Even with these efforts, the neighborhood size still induces a computationally expensive solution for achieving quasi-ML performance. An alternative trend was first presented in the literature by Wuebben et al. in [19]. It consists in adding a preprocessing step, namely the lattice reduction (LR), so that a classical detection is applied through a better-conditioned channel matrix [19–21]. This solution has been shown to offer the full reception diversity at the expense of an SNR offset in the system performance. This offset increases with the number of transmit antennas and with the modulation order.
Recently, a promising, although complex, association of the K-best and LR solutions has been considered. It provides a convenient performance-complexity tradeoff. The general idea consists in reducing the SNR offset through a neighborhood study, which yields near-ML performance for a reasonable K. The concept was first introduced by the authors of [22]. Later on, their basic solution was improved by modeling the sphere constraint in a reduced domain or by introducing an efficient symbol enumeration algorithm [23]. However, a major aspect of this combination has not been considered yet. In particular, any SD, including the K-best, may be advantageously applied by considering a better-conditioned channel matrix through the introduction of a Reduced Domain Neighborhood (RDN) study and a judicious search center. In [5], an improved LR technique dealing with the RDN has been proposed in the context of large MIMO systems. It is based on the decomposition of the space spanned by the channel matrix into small subspaces in order to improve the orthogonality of the quantization. In [24], the search center is found through an ant colony optimization, with an initial search based on the output of the MMSE detector.

The main contributions of this paper are summarized as follows:

▪A promising association of the K-best and LR solutions is proposed.

▪The SD neighborhood study is modified by applying a preprocessing step. This is accompanied by a new and efficient search center and MMSE detector. The equivalent expression of the lattice reduction-aided (LRA)-MMSE-centered SD, which corresponds to an efficient LRA-MMSE-successive interference canceller (SIC) Babai point, is proposed to improve the performance or reduce the complexity of the detector.

▪The (S)QRD is introduced in the formulas. It provides, to the best of the authors' knowledge, the best known pseudo-linear hard detector as a Babai point, for a large number of antennas as well as for high-order modulations.

▪The proposed expression is robust by nature to any search center and constellation order and offers close-to-optimal performance with medium K values. This applies to both perfect and real channel estimation.

▪The proposed solution offers a computational complexity that is independent of the constellation order. Therefore, it outperforms classical SD techniques for a reasonable complexity in the case of high-order constellations. We show, for example, that K = 2 neighbors are sufficient for a 4 × 4 MIMO system with 16-QAM modulation on each layer, and that the proposed solution lies less than 0.5 dB from the ML solution for 64 × 64 and 128 × 128 MIMO systems.

▪The proposed solution offers a computational complexity that is quasi-constant for a large number of antennas, which underlines its relevance for large MIMO systems.
This paper is organized as follows. Section 2 presents the problem statement of the SD. In Section 3, the different existing solutions are described and analyzed. In Section 4, we propose our generalized solution based on LR with the use of an efficient search center and a reduced domain neighborhood. In Section 5, the performance of the presented detectors is provided, compared, and discussed. In Section 6, we consider the computational complexity of the proposed solution in comparison with some reference detection techniques. Conclusions are drawn in Section 7.
2 Problem statement
2.1 Sphere decoder detector
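The received signal of a MIMO system with n_{T} transmit antennas and n_{R} receive antennas reads
\( \boldsymbol{y}=\boldsymbol{H}\boldsymbol{x}+\boldsymbol{n} \)  (1)
where H represents the (n_{R}, n_{T}) complex channel matrix assumed to be perfectly known at the receiver, x is the transmit symbol vector of dimensions (n_{T}, 1) where each entry is independently drawn from a constellation set ξ, and n is the additive white Gaussian noise of dimensions (n_{R}, 1) and of variance σ^{2}/2 per dimension. The basic idea of the SD, to reach the optimal estimate \( {\widehat{x}}_{\mathrm{ML}} \) (given by the ML detector) while avoiding an exhaustive search, is to observe only the lattice points that lie inside a sphere of radius d:
\( {\widehat{\boldsymbol{x}}}_{\mathrm{SD}}=\underset{\boldsymbol{x}\in {\xi}^{n_{\mathrm{T}}}}{\arg \min }\ \parallel {\boldsymbol{Q}}^H\boldsymbol{y}-\boldsymbol{R}\boldsymbol{x}{\parallel}^2\le {d}^2 \)  (2)
where H = QR, with the classical QRD definitions.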
The classical SD formula in (2) is centered on the received signal y. From now on, this detection will be denoted as the naïve SD. In the case of a depth-first search algorithm [13], the first solution given by this algorithm is defined as the Babai point [25, 26]. In order to write it, the classical SD expression may be rearranged, leading then to an exact formulation through an efficient partial Euclidean distance (PED) expression and early pruned nodes [27].
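As a hedged illustration of the model and of the sphere constraint discussed above, the following Python sketch draws a small random channel, evaluates the ML metric ∥y − Hx∥² and the QR-based metric ∥Q^H y − Rx∥² over all candidate vectors, and keeps the candidates that fall inside a sphere. The system size, the noise variance, the QPSK alphabet, and the radius rule `d2` are assumptions chosen for the example and are not taken from the paper.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_t = n_r = 3                                   # assumed small square system
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

# Rayleigh channel, random QPSK vector, complex Gaussian noise (sigma2/2 per dimension)
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
x = rng.choice(qpsk, n_t)
sigma2 = 0.01
n = np.sqrt(sigma2 / 2) * (rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r))
y = H @ x + n

Q, R = np.linalg.qr(H)                          # H = QR with R upper triangular


def ml_metric(c):
    """Exhaustive-search ML metric ||y - H c||^2."""
    return np.linalg.norm(y - H @ c) ** 2


def sd_metric(c):
    """QR-based metric ||Q^H y - R c||^2 used inside the sphere constraint."""
    return np.linalg.norm(Q.conj().T @ y - R @ c) ** 2


candidates = [np.array(c) for c in itertools.product(qpsk, repeat=n_t)]
x_ml = min(candidates, key=ml_metric)           # brute-force ML estimate

d2 = 4 * sigma2 * n_r                           # hypothetical squared radius
inside = [c for c in candidates if sd_metric(c) <= d2]
print(len(candidates), "candidates,", len(inside), "inside the sphere")
```

For a square channel matrix, Q is unitary and the two metrics coincide, so restricting the search to the sphere does not change the minimizer as long as the ML point lies inside it.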
In the literature, the SD principle leads to numerous implementation problems. In particular, it is a non-deterministic polynomial-time hard (NP-hard) problem [28]. This aspect has been partially solved through the introduction of an efficient solution that lies in a fixed neighborhood size algorithm (FNSA), commonly known as the K-best solution. However, this solution makes the detector suboptimal since it leads to a performance loss compared to the ML detector. This is particularly true in the case of an inappropriate choice of K with respect to the MIMO channel condition number and in the case of an inappropriate choice of d in (2). Indeed, an inappropriate choice of d could exclude the ML solution from the search tree. On the other hand, although a neighborhood study remains the only solution that achieves near-ML performance, it may require a large-size neighborhood scan, which corresponds to a dramatic increase of the computational complexity. This increase in complexity becomes prohibitive for high-order modulations.
2.2 Lattice reduction
With an LR preprocessing step, the system model can be rewritten as
\( \boldsymbol{y}=\tilde{\boldsymbol{H}}\boldsymbol{z}+\boldsymbol{n} \)  (3)
where \( \tilde{\boldsymbol{H}}=\boldsymbol{H}\boldsymbol{T} \) and z = T^{−1} x. The n_{T} × n_{T} complex matrix T (with |det{T}| = 1) is unimodular, i.e., its entries belong to the set of complex integers which reads ℤ_{ℂ} = ℤ + jℤ, with j^{2} = −1. The key idea of any LR-aided (LRA) detection scheme is to understand that the finite set of transmitted symbols \( {\xi}^{n_T} \) can be interpreted as a de-normalized, shifted, then scaled version of a subset of the infinite set of complex integers \( {\mathbf{\mathbb{Z}}}_{\mathbb{C}}^{n_T} \), according to the relations offered in [29].
To this end, various reduction algorithms have been proposed [19, 30–32]. In the following, we focus on the well-known Lenstra-Lenstra-Lovász (LLL) algorithm due to considerations presented in [30, 33]. The lattice-aided (LA) approach is a local approach [34] that transforms the channel matrix into an LLL-reduced basis that satisfies both the orthogonality and the norm reduction conditions [31]. While it has been shown in [33] that the QRD outputs of the channel matrix are a possible starting point for the LLL, it has subsequently been shown that the SQRD provides a better starting point [34]. In particular, it leads to a significant reduction of its computational complexity [35]. That is, the detection process in (3) is performed on z instead of x through the better-conditioned matrix \( \tilde{\boldsymbol{H}} \). Wuebben et al. [19] proposed a full description of some reference solutions, namely the LRA-ZF and LRA-ZF-SIC without noise power consideration and the LRA-MMSE, LRA-MMSE extended, and LRA-MMSE-SIC. LRA detectors are efficient in the sense of the high quality of their hard outputs. Indeed, they offer a low overall computational complexity while the ML diversity is reached within a constant offset. However, some important drawbacks exist. In particular, the aforementioned SNR offset is significant in the case of high-order modulations and of a large number of antennas. This issue is expected to be bypassed through an additional neighborhood study.
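To make the role of the unimodular matrix T more concrete, the sketch below implements a plain textbook complex LLL reduction in Python. It is only a minimal illustration under stated assumptions: it recomputes the QR decomposition at every step for readability, it starts from the unsorted channel matrix rather than from the SQRD output preferred above, and the function name `clll`, the parameter `delta = 0.75`, and the iteration guard `max_iter` are choices made for this example, not the authors' implementation.

```python
import numpy as np


def clll(H, delta=0.75, max_iter=10_000):
    """Textbook complex LLL sketch: returns (H_red, T) with H_red = H @ T."""
    n = H.shape[1]
    B = np.array(H, dtype=complex)              # basis being reduced
    T = np.eye(n, dtype=complex)                # accumulated unimodular transform
    k = 1
    for _ in range(max_iter):                   # guard against endless loops
        if k >= n:
            break
        R = np.linalg.qr(B, mode="r")           # triangular factor of the current basis
        # Size reduction of column k against columns k-1, ..., 0
        for j in range(k - 1, -1, -1):
            mu = np.round(R[j, k] / R[j, j])    # rounds real and imaginary parts
            if mu != 0:
                B[:, k] -= mu * B[:, j]
                T[:, k] -= mu * T[:, j]
                R[:, k] -= mu * R[:, j]
        # Lovász condition: swap the two columns if it is violated
        if delta * abs(R[k - 1, k - 1]) ** 2 > abs(R[k, k]) ** 2 + abs(R[k - 1, k]) ** 2:
            B[:, [k - 1, k]] = B[:, [k, k - 1]]
            T[:, [k - 1, k]] = T[:, [k, k - 1]]
            k = max(k - 1, 1)
        else:
            k += 1
    return B, T


rng = np.random.default_rng(1)
n_t = 4
H = (rng.standard_normal((n_t, n_t)) + 1j * rng.standard_normal((n_t, n_t))) / np.sqrt(2)
H_red, T = clll(H)

# Any complex-integer vector x maps to a complex-integer vector z = T^{-1} x
x = rng.integers(-3, 4, n_t) + 1j * rng.integers(-3, 4, n_t)
z = np.linalg.solve(T, x)
print("cond(H) =", np.linalg.cond(H), " cond(H T) =", np.linalg.cond(H_red))
```

Only column additions with complex-integer coefficients and column swaps are applied, so T keeps complex-integer entries and a determinant of unit modulus, and H x = (H T) z holds for every x.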
2.3 Lattice reduction-aided sphere decoder
Contrary to the LRA-(O)SIC receivers, the application of the LR preprocessing step followed by any SD detector is not straightforward. The main problem lies in the determination of the possible transmit symbol vectors in the reduced constellation since, unfortunately, the set of all possible transmit symbol vectors cannot be predetermined. The reason is that the solution does not only depend on the employed constellation but also on the T^{−1} matrix of (3). Hence, the number of children in the search tree and their values are not known in advance. A brute-force solution is then to determine the set of all possible transmit vectors in the reduced constellation, starting from the set of all possible transmit vectors in the original constellation and switching to the reduced domain through the T^{−1} matrix.
3 Detection process in the original domain neighborhood
3.1 Zero forcing-centered sphere decoder with original domain neighborhood study detection process
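In order to deal with the detection process, we first introduce the sphere center x_{C} search algorithm. It concerns any signal of the form ∥x_{C} − x∥^{2} ≤ d^{2}, where x is a possible signal. The SD expression in (2) can be rewritten as
\( {\widehat{\boldsymbol{x}}}_{\mathrm{ZF}\hbox{-}\mathrm{SD}}=\underset{\boldsymbol{x}\in {\xi}^{n_{\mathrm{T}}}}{\arg \min }\ \parallel \boldsymbol{R}{\boldsymbol{e}}_{\mathrm{ZF}}{\parallel}^2\le {d}^2 \)  (4)
where e_{ZF} = x_{ZF} − x and x_{ZF} = (H^{H} H)^{−1} H^{H} y.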
Equation (4) clearly shows that the naïve SD is centered on the unconstrained ZF estimate. It implicitly corresponds to a ZF-SIC solution with an Original Domain Neighborhood (ODN) study at each layer, where the number of layers equals the number of spatially multiplexed data streams. It can be noticed that, in the case of a large ODN study, the ML performance is achieved since the computed metrics are exactly the ML metrics. However, this comes at the expense of a large neighborhood study and consequently a large computational complexity.
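The statement that the naïve SD is ZF-centered can be checked numerically. The short Python sketch below, with an arbitrarily chosen 6 × 4 channel and random test vectors (both assumptions of the example), compares the naïve metric ∥Q^H y − Rx∥² with the ZF-centered metric ∥R(x_ZF − x)∥², and also shows that the ML metric ∥y − Hx∥² differs from them only by a term that does not depend on x.

```python
import numpy as np

rng = np.random.default_rng(2)
n_r, n_t = 6, 4                                  # assumed tall channel matrix
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
y = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)

Q, R = np.linalg.qr(H)                           # thin QR: Q is n_r x n_t
x_zf = np.linalg.solve(H.conj().T @ H, H.conj().T @ y)   # unconstrained ZF estimate


def naive(x):
    return np.linalg.norm(Q.conj().T @ y - R @ x) ** 2


def zf_centered(x):
    return np.linalg.norm(R @ (x_zf - x)) ** 2


def ml(x):
    return np.linalg.norm(y - H @ x) ** 2


# Random unnormalized QPSK-like test vectors
trials = [rng.choice([-1, 1], n_t) + 1j * rng.choice([-1, 1], n_t) for _ in range(20)]
offsets = np.array([ml(x) - zf_centered(x) for x in trials])
print("spread of the ML offset over the trials:", np.ptp(offsets))
```

The offset between the ML and ZF-centered metrics is the energy of the part of y that lies outside the column space of H, which is why both metrics select the same candidate.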
3.2 Minimum mean square error-centered sphere decoder with original domain neighborhood study detection process: equivalent formula
In this section, we introduce the minimum mean square error successive interference cancellation (MMSE-SIC), a closer-to-ML Babai point than the ZF-SIC. For the sake of clarity, we first give a general definition of the equivalence between two ML metrics.
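Two ML metrics are said to be equivalent if they yield the same lattice point as the argument of the minimum distance, i.e., if
\( \underset{\boldsymbol{x}\in {\xi}^{n_{\mathrm{T}}}}{\arg \min}\left\{\parallel \boldsymbol{y}-\boldsymbol{H}\boldsymbol{x}{\parallel}^2\right\}=\underset{\boldsymbol{x}\in {\xi}^{n_{\mathrm{T}}}}{\arg \min}\left\{\parallel \boldsymbol{y}-\boldsymbol{H}\boldsymbol{x}{\parallel}^2+c\right\} \)
where c is a constant.
Under this definition, the MMSE-centered metric
\( {\left({\boldsymbol{x}}_{\mathrm{MMSE}}-\boldsymbol{x}\right)}^H\left({\boldsymbol{H}}^H\boldsymbol{H}+{\sigma}^2\boldsymbol{I}\right)\left({\boldsymbol{x}}_{\mathrm{MMSE}}-\boldsymbol{x}\right) \), with \( {\boldsymbol{x}}_{\mathrm{MMSE}}={\left({\boldsymbol{H}}^H\boldsymbol{H}+{\sigma}^2\boldsymbol{I}\right)}^{-1}{\boldsymbol{H}}^H\boldsymbol{y} \),
is equivalent to the ML metric, where the signals x have to be of constant modulus, i.e., x^{H} x is a constant.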
This assumption is respected in the case of quadrature phase-shift keying (QPSK) modulations, but it is not directly applicable to 16-QAM and 64-QAM modulations. However, this assumption is not limiting in practice since a QAM constellation can be considered as a linear sum of QPSK points [36]. In Appendix 1, we discuss the constant modulus constraint on the signal x.
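Through the factorization \( {\boldsymbol{H}}^H\boldsymbol{H}+{\sigma}^2\boldsymbol{I}={\boldsymbol{U}}^H\boldsymbol{U} \) in the MMSE case (\( {\boldsymbol{H}}^H\boldsymbol{H}={\boldsymbol{U}}^H\boldsymbol{U} \) in the ZF case), the detector can be written as
\( \widehat{\boldsymbol{x}}=\underset{\boldsymbol{x}\in {\xi}^{n_{\mathrm{T}}}}{\arg \min }\ \parallel \boldsymbol{U}\left(\tilde{\boldsymbol{x}}-\boldsymbol{x}\right){\parallel}^2 \)  (11)
where U is upper triangular with real diagonal elements and \( \tilde{x} \) is any (ZF or MMSE) unconstrained linear estimate.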
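The role of the constant modulus assumption can be illustrated numerically. The Python sketch below evaluates the gap between the MMSE-centered quadratic form (x_MMSE − x)^H (H^H H + σ²I)(x_MMSE − x) and the ML metric over every candidate vector of a 2 × 2 system, once for QPSK and once for 16-QAM, and checks the triangular form obtained from a Cholesky factor U. The system size, the noise variance, and the helper names are assumptions of this example.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n_r = n_t = 2
sigma2 = 0.1                                     # assumed noise variance
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
y = rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r)

A = H.conj().T @ H + sigma2 * np.eye(n_t)        # H^H H + sigma^2 I
x_mmse = np.linalg.solve(A, H.conj().T @ y)      # unconstrained MMSE estimate
U = np.linalg.cholesky(A).conj().T               # A = U^H U, U upper triangular


def ml(x):
    return np.linalg.norm(y - H @ x) ** 2


def mmse_metric(x):
    e = x_mmse - x
    return np.real(e.conj() @ A @ e)


def chol_metric(x):
    return np.linalg.norm(U @ (x_mmse - x)) ** 2


def gaps(alphabet):
    """Difference between the MMSE-centered and ML metrics for every candidate."""
    cands = (np.array(c) for c in itertools.product(alphabet, repeat=n_t))
    return np.array([mmse_metric(c) - ml(c) for c in cands])


qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
pam = np.array([-3, -1, 1, 3])
qam16 = np.array([re + 1j * im for re in pam for im in pam]) / np.sqrt(10)

gap_qpsk = gaps(qpsk)                            # constant: the metrics are equivalent
gap_qam16 = gaps(qam16)                          # varies with sigma^2 * x^H x
print(np.ptp(gap_qpsk), np.ptp(gap_qam16))
```

Expanding the quadratic form shows that the gap equals σ² x^H x plus a term independent of x, so it is constant for QPSK and varies with the symbol energy for 16-QAM.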
4 Proposed detection process in the reduced domain neighborhood
Due to its implementation drawbacks, the optimal SD has been proposed to be replaced by a suboptimal FNSA. Hassibi et al. have discussed and shown in [11] that the detector performance is impacted by the noise power and the channel condition number. Hence, a well-conditioned channel could greatly reduce the required neighborhood size. This means that performing an LR step, which yields a better-conditioned channel matrix, followed by a neighborhood study is a very attractive solution. Accordingly, our proposed combined solution is detailed in the next subsections.
4.1 Preprocessing
All existing solutions rely on the efficient Cholesky factorization (CF) preprocessing step. However, these solutions are only functional in the case of a factorized formulation. Although this is the case in our context, most of the advanced studies have been carried out with the QRD. In particular, the advantageous SIC performance optimizations, such as ordering according to the corresponding decreasing SNR (from n_{T} to 1) in the ZF-SQRD case and SINR in the MMSE-SQRD case, have been proposed in [33]. Moreover, a complexity reduction of the LLL-based LR algorithm has been proposed by the same authors in [33]. In our work, we propose to modify the classical detectors by introducing the QRD instead of the CF, and subsequently the SQRD, in the (LRA-)MMSE-(O)SIC cases.

The following notation is used:

▪\( {Q}_{\xi^{n_{\mathrm{T}}}}\left\{.\right\} \) is the quantization operator in the original domain constellation,

▪\( {Q}_{{\mathrm{\mathbb{Z}}}_{\mathrm{\mathbb{C}}}^{n_{\mathrm{T}}}}\left\{.\right\} \) is the quantization operator in the reduced domain constellation,

▪a is the power normalization and scaling coefficient (i.e., \( 2/\sqrt{2},\ 2/\sqrt{10},\ \mathrm{and}\ 2/\sqrt{42} \) for QPSK, 16-QAM, and 64-QAM constellations, respectively),

▪\( \boldsymbol{d}=\frac{1}{2}{\boldsymbol{T}}^{-1}{\left[\begin{array}{ccc}\hfill 1+j\hfill & \hfill \dots \hfill & \hfill 1+j\hfill \end{array}\right]}^T \) is a complex displacement vector.
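By analogy with (4), the LRA-ZF-centered SD with an RDN study reads
\( \widehat{\boldsymbol{z}}=\underset{\boldsymbol{z}\in {\mathrm{\mathbb{Z}}}_{\mathrm{\mathbb{C}}}^{n_{\mathrm{T}}}}{\arg \min }\ \parallel \tilde{\boldsymbol{R}}{\boldsymbol{e}}_{\mathrm{LRA}\hbox{-}\mathrm{ZF}}{\parallel}^2\le {d}^2 \)
where \( \tilde{\boldsymbol{R}} \) is the LLL-based LR algorithm output, e_{LRA-ZF} = z_{LRA-ZF} − z, and \( {\mathrm{\mathbb{Z}}}_{\mathrm{\mathbb{C}}}^{n_{\mathrm{T}}} \) is the n_{T}-dimensional infinite set of complex integers.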
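The scaling coefficient a and the displacement vector d can be illustrated with a small numerical example. The Python sketch below assumes the convention s = x/a + (1 + j)/2 for the original domain and z = T^{−1}x/a + d for the reduced domain. This sign convention, the helper name `to_integer_domain`, and the 2 × 2 unimodular matrix T are hypothetical choices made for the illustration; the exact relations used by the authors are those of [29].

```python
import numpy as np


def to_integer_domain(x, a):
    """Scale by 1/a and shift by (1+j)/2 so that QAM points become complex integers."""
    return x / a + 0.5 * (1 + 1j)


# Normalized 16-QAM alphabet and its scaling coefficient a = 2/sqrt(10)
pam = np.array([-3, -1, 1, 3])
qam16 = np.array([re + 1j * im for re in pam for im in pam]) / np.sqrt(10)
a = 2 / np.sqrt(10)
s = to_integer_domain(qam16, a)                 # real and imaginary parts in {-1, 0, 1, 2}

# Hypothetical unimodular matrix with complex-integer entries (det = 1)
T = np.array([[1, 1j], [0, 1]])
T_inv = np.linalg.inv(T)

x = np.array([qam16[0], qam16[7]])              # an arbitrary 16-QAM symbol vector
d = 0.5 * T_inv @ np.full(2, 1 + 1j)            # displacement vector d
z = T_inv @ x / a + d                           # reduced-domain representation

print(np.round(s.real).astype(int)[:4], z)
```

Because T^{−1} has complex-integer entries, z is a vector of complex integers, and multiplying it by T returns the shifted and scaled original symbols.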
4.2 Lattice reduction-aided minimum mean square error-centered sphere decoder with reduced domain neighborhood study detection process
The corresponding detector is a suboptimal solution that consists in an RDN study around the unconstrained LRA-MMSE solution, obtained through QR decomposition. The output of this solution is the constrained LRA-MMSE detection plus a list of solutions in the neighborhood. The latter is generated according to a non-equivalent metric and would subsequently be reordered according to the exact ML metric. However, the list is not generated according to the correct distance minimization criterion and would not lead to a near-ML solution. Consequently, this detector does not offer an acceptable uncoded BER performance. In particular, the ML performance is not reached even in the case of a large neighborhood study.
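The proposed detector formula reads
\( \widehat{\boldsymbol{z}}=\underset{\boldsymbol{z}\in {\mathrm{\mathbb{Z}}}_{\mathrm{\mathbb{C}}}^{n_{\mathrm{T}}}}{\arg \min }\ \parallel \tilde{\boldsymbol{U}}\left(\tilde{\boldsymbol{z}}-\boldsymbol{z}\right){\parallel}^2 \)  (12)
where \( {\tilde{\boldsymbol{H}}}^H\tilde{\boldsymbol{H}}+{\sigma}^2{\boldsymbol{T}}^H\boldsymbol{T}={\tilde{\boldsymbol{U}}}^H\tilde{\boldsymbol{U}} \) in the MMSE case (\( {\tilde{\boldsymbol{H}}}^H\tilde{\boldsymbol{H}}={\tilde{\boldsymbol{U}}}^H\tilde{\boldsymbol{U}} \) in the ZF case), Ũ is upper triangular with real diagonal elements, and \( \tilde{\boldsymbol{z}} \) is any LRA (ZF or MMSE) unconstrained linear estimate. The proof of this detector formula is given in Appendix 3.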
The formula introduced in (12) offers an equivalent metric to the MMSE one introduced in (11), which has been shown to offer near-ML performance. The difference, and in particular the interest of the LRA case in (12), lies in the nature of the neighborhood study. In the case of an RDN study, the equivalent channel matrix \( \tilde{\boldsymbol{H}} \) is considered, and it is recalled that it is only roughly, and not exactly, orthogonal. Consequently, the layer-by-layer detection of the symbol vector x does not exactly correspond to its joint detection since the mutual influence of the transformed signal z is still present. This discussion not only shows the interest of SD-like techniques to further improve the performance of such a detector but also shows that achieving the ML performance remains a significant challenge.
In Fig. 2, the mapping of any estimate (or list of estimates) from the reduced domain ẑ to the original domain \( \tilde{\boldsymbol{x}} \) is processed through the T matrix multiplication (see Equation (3)). The additional quantization step aims at removing duplicate symbol vector outputs in the case of a list of solutions.
For the sake of simplicity, let us consider any LRA-SIC procedure with no neighborhood study. The search center is updated at each layer as follows. By considering the kth layer and with the knowledge of the \( {\widehat{\boldsymbol{z}}}_{k+1:{n}_{\mathrm{T}}} \) estimates at previous layers, the ẑ_{k} unconstrained Babai point can be provided. Then, it has to be de-normalized and shifted to make it belong to \( {\mathrm{\mathbb{Z}}}_{\mathrm{\mathbb{C}}}^{n_{\mathrm{T}}} \). After quantization, de-shifting, and normalization, the ẑ_{k} estimate at the kth layer is obtained so that the next (k − 1)th layer can be considered, until the whole symbol vector is detected. As previously introduced, the neighborhood generation is a problematic step due to the infinite and non-regular nature of the constellations in the reduced domain. This point is transparent with classical detectors such as LD and DFD, thanks to the straightforward quantization step in the reduced domain [39].
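The layer-by-layer procedure without neighborhood study can be sketched as follows in Python. This is a simplified ZF-type illustration under stated assumptions: the shift and scaling are applied once to the received vector instead of at each layer, the convention s = x/a + (1 + j)/2 is assumed, no clipping to the finite constellation is performed, and the unimodular matrix T is a hand-picked placeholder rather than the output of an LLL reduction. The function name `lra_sic` is hypothetical.

```python
import numpy as np


def lra_sic(y, H, T, a):
    """Reduced-domain Babai point: per-layer rounding with interference cancellation."""
    n_t = H.shape[1]
    c = 0.5 * (1 + 1j) * np.ones(n_t)           # shift of the integer-domain mapping
    y_s = y / a + H @ c                         # y_s = (H T) z + n / a, with z = T^{-1} s
    Q, R = np.linalg.qr(H @ T)                  # QR of the reduced channel matrix
    y_t = Q.conj().T @ y_s
    z_hat = np.zeros(n_t, dtype=complex)
    for k in range(n_t - 1, -1, -1):            # from the last layer down to the first
        center = (y_t[k] - R[k, k + 1:] @ z_hat[k + 1:]) / R[k, k]
        z_hat[k] = np.round(center)             # quantization to the complex integers
    s_hat = T @ z_hat                           # back to the original (integer) domain
    return a * (s_hat - c), z_hat               # de-shifted and normalized estimate


rng = np.random.default_rng(4)
n_t = 3
a = 2 / np.sqrt(10)                             # 16-QAM scaling coefficient
pam = np.array([-3, -1, 1, 3])
x_true = (rng.choice(pam, n_t) + 1j * rng.choice(pam, n_t)) / np.sqrt(10)
H = (rng.standard_normal((n_t, n_t)) + 1j * rng.standard_normal((n_t, n_t))) / np.sqrt(2)
T = np.array([[1, 1j, -1], [0, 1, 2], [0, 0, 1]], dtype=complex)   # placeholder, det = 1

y = H @ x_true                                  # noiseless observation
x_hat, z_hat = lra_sic(y, H, T, a)
print(np.allclose(x_hat, x_true))
```

In the noiseless case each layer center is exactly a complex integer, so the rounding recovers z and the transmitted vector is returned; with noise, the quality of the rounding depends on how well conditioned H T is.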
The SE strategy aims at finding the correct decision early, leading to a safe early termination criterion, which is not considered here for the sake of readability in the performance comparison. Also, all the corresponding PEDs are computed and then ordered. The K-best solutions in the reduced domain, namely those with the lowest PED, are stored (C_{ẑ}) together with their corresponding cumulative Euclidean distances (CED) (D_{tot}). The whole procedure is depicted in Fig. 2.
As a final step of the detector and in the case of an RDN-based SD, the list of possible symbol vectors has to be reordered according to the ML metrics in the original domain, and duplicate solutions are removed. This is because the noise causes some candidates to be mapped to non-legitimate constellation points in the reduced constellation, leading to non-acceptable points in the original constellation. The symbol vector associated with the minimal metric becomes the hard decision output of the detector and offers a near-ML solution. The proposed algorithm is described in detail in Appendix 4.
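The final step described above (mapping back to the original domain, quantization, duplicate removal, and reordering by the ML metric) can be sketched in Python as follows. The candidate list is supplied by hand instead of being produced by a K-best search, and the function name `finalize`, the integer-domain convention s = x/a + (1 + j)/2, and the clipping bounds `lo` and `hi` are assumptions of this example, not the authors' algorithm from Appendix 4.

```python
import numpy as np


def finalize(cands_z, T, y, H, a, lo, hi):
    """Map candidates to the original domain, quantize, deduplicate, sort by ML metric."""
    n_t = H.shape[1]
    c = 0.5 * (1 + 1j) * np.ones(n_t)
    seen = {}
    for z in cands_z:
        s = T @ np.asarray(z, dtype=complex)
        # Quantize: round, then clip to the legitimate integer range of the constellation
        s = np.clip(np.round(s.real), lo, hi) + 1j * np.clip(np.round(s.imag), lo, hi)
        key = tuple(s)                           # duplicates share the same key
        if key not in seen:
            x = a * (s - c)
            seen[key] = (np.linalg.norm(y - H @ x) ** 2, x)
    return sorted(seen.values(), key=lambda t: t[0])


rng = np.random.default_rng(5)
a = 2 / np.sqrt(10)                              # 16-QAM: integer levels -1..2
H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
T = np.array([[1, 1j], [0, 1]])                  # hypothetical unimodular matrix

s_true = np.array([0 + 1j, 1 - 1j])
x_true = a * (s_true - 0.5 * (1 + 1j))
z_true = np.linalg.solve(T, s_true)
y = H @ x_true                                   # noiseless observation

# Hand-made list: the true point, a duplicate, an out-of-range point, a neighbor
cands = [z_true, z_true.copy(), z_true + np.array([5, 0]), z_true + np.array([0, 1])]
ranked = finalize(cands, T, y, H, a, lo=-1, hi=2)
best_metric, x_best = ranked[0]
print(len(ranked), best_metric)
```

Sorting on the exact ML metric ∥y − Hx∥² in the original domain is what makes the first element of the list the hard decision output.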
5 System performance
ODN naïve (O)SIC FNSA, ODN ZF-(O)SIC FNSA, ODN MMSE-(O)SIC FNSA, RDN LRA-ZF-(O)SIC FNSA, RDN LRA-MMSE-(O)SIC FNSA, and ML formulas

| Technique designation | Corresponding metric |
| --- | --- |
| ODN naïve (O)SIC FNSA | \( \parallel {\boldsymbol{Q}}^H\boldsymbol{y}-\boldsymbol{R}\boldsymbol{x}{\parallel}^2,\ \boldsymbol{x}\in {\xi}^{n_{\mathrm{T}}} \) |
| ODN exact ZF-(O)SIC FNSA | \( \parallel \boldsymbol{R}\left({\boldsymbol{y}}_{\mathrm{ZF}}-\boldsymbol{x}\right){\parallel}^2,\ \boldsymbol{x}\in {\xi}^{n_{\mathrm{T}}} \) [14] |
| ODN equivalent MMSE-(O)SIC FNSA | \( {\left({\boldsymbol{y}}_{\mathrm{MMSE}}-\boldsymbol{x}\right)}^H\left({\boldsymbol{H}}^H\boldsymbol{H}+{\sigma}^2\boldsymbol{I}\right)\left({\boldsymbol{y}}_{\mathrm{MMSE}}-\boldsymbol{x}\right),\ \boldsymbol{x}\in {\xi}^{n_{\mathrm{T}}} \) [38] |
| RDN exact LRA-ZF-(O)SIC FNSA | \( \parallel \tilde{\boldsymbol{R}}\left({\boldsymbol{z}}_{\mathrm{LRA}\hbox{-}\mathrm{ZF}}-\boldsymbol{z}\right){\parallel}^2,\ \boldsymbol{z}\in {\mathrm{\mathbb{Z}}}_{\mathrm{\mathbb{C}}}^{n_{\mathrm{T}}} \) |
| RDN equivalent LRA-MMSE-(O)SIC FNSA (proposed) | \( {\left({\boldsymbol{z}}_{\mathrm{LRA}\hbox{-}\mathrm{MMSE}}-\boldsymbol{z}\right)}^H\left({\tilde{\boldsymbol{H}}}^H\tilde{\boldsymbol{H}}+{\sigma}^2{\boldsymbol{T}}^H\boldsymbol{T}\right)\left({\boldsymbol{z}}_{\mathrm{LRA}\hbox{-}\mathrm{MMSE}}-\boldsymbol{z}\right),\ \boldsymbol{z}\in {\mathrm{\mathbb{Z}}}_{\mathrm{\mathbb{C}}}^{n_{\mathrm{T}}} \) |
| ML | \( \parallel \boldsymbol{y}-\boldsymbol{H}\boldsymbol{x}{\parallel}^2,\ \boldsymbol{x}\in {\xi}^{n_{\mathrm{T}}} \) |
We should note that the RDN LRA-MMSE-OSIC FNSA, to which this paper relates, is particularly efficient in the case of rank-deficient MIMO systems, i.e., spatially correlated antenna systems, for high-order modulations, which are considered in the LTE-A standard, and for a large number of antennas, as in the future generation of cellular systems (beyond 4G networks). Moreover, since the equivalent channel matrix in the LRA case is only roughly orthogonal, the mutual influence of the transformed signal z is small but still present. Hence, a neighborhood study in the original constellation domain improves the performance compared to a SIC. However, contrary to classical solutions that are not LRA, the neighborhood size necessary for achieving the optimal performance is smaller.
It may also be noticed in Fig. 4 that the RDN LRA-ZF-SIC-centered FNSA does not reach the ML performance, contrary to the other techniques. This is due to the chosen neighborhood size in the reduced constellation (N = 5), which is not sufficient for this detector but is sufficient for the proposed LRA-MMSE-(O)SIC Babai points. With a larger N value, the RDN LRA-ZF-SIC-centered FNSA achieves the ML performance, similarly to the other presented detectors.
As depicted in Fig. 6 and with 16-QAM modulation, the performance is impacted by the fact that the strict equivalence assumption is not true, i.e., the term x^{H} x (or z^{H} z) is not exactly constant but only constant on average. As shown in this figure, this assumption is not constraining in terms of performance loss. Moreover, the loss is insignificant compared to the advantage of the LRA for high-order constellations, which would be lost with the use of a QPSK constellation.
6 Complexity evaluation
Computational complexity equivalences

| Complex operations | Real operations | Equivalent MUL |
| --- | --- | --- |
| ADD_{CC} | 2ADD | 0 |
| ADD_{RC} | ADD | 0 |
| ADD_{RR} | ADD | 0 |
| MUL_{CC} | 4MUL + 4ADD | 4 |
| MUL_{RC} | 2MUL | 2 |
| MUL_{RR} | 1MUL | 1 |
| DIV_{CC} | 6DIV + 6ADD | 96 |
| DIV_{RC} | 2DIV | 32 |
| DIV_{RR} | 1DIV | 16 |
| SQRT_{RR} | 1SQRT | 32 |
ODN ZF-(O)SIC FNSA, ODN MMSE-(O)SIC FNSA, RDN LRA-ZF-(O)SIC FNSA, RDN LRA-MMSE-(O)SIC FNSA, and ML formulas

| Technique designation | Corresponding computational complexity in MUL |
| --- | --- |
| ODN exact ZF-(O)SIC | \( 2MK{n}_{\mathrm{T}}^2+2MK{n}_{\mathrm{T}}-4MK+3M \) |
| ODN equivalent MMSE-(O)SIC | \( 2MK{n}_{\mathrm{T}}^2+2MK{n}_{\mathrm{T}}-4MK+3M \) |
| RDN exact LRA-ZF-(O)SIC | \( 2N \min \left\{K,N\right\}{n}_{\mathrm{T}}^2+30 \min \left\{K,N\right\}{n}_{\mathrm{T}}+2N \min \left\{K,N\right\}{n}_{\mathrm{T}}-4N \min \left\{K,N\right\}+6 \min \left\{K,N\right\}{n}_{\mathrm{T}}^2+4 \min \left\{K,N\right\}{n}_{\mathrm{R}}{n}_{\mathrm{T}}+2 \min \left\{K,N\right\}{n}_{\mathrm{R}}+4{n}_{\mathrm{T}}^2-32 \min \left\{K,N\right\}+2N \) |
| RDN equivalent LRA-MMSE-(O)SIC | \( 2N \min \left\{K,N\right\}{n}_{\mathrm{T}}^2+30 \min \left\{K,N\right\}{n}_{\mathrm{T}}+2N \min \left\{K,N\right\}{n}_{\mathrm{T}}-4N \min \left\{K,N\right\}+6 \min \left\{K,N\right\}{n}_{\mathrm{T}}^2+4 \min \left\{K,N\right\}{n}_{\mathrm{R}}{n}_{\mathrm{T}}+2 \min \left\{K,N\right\}{n}_{\mathrm{R}}+4{n}_{\mathrm{T}}^2-32 \min \left\{K,N\right\}+2N \) |
| ML | \( 4{n}_{\mathrm{R}}{n}_{\mathrm{T}}{M}^{n_{\mathrm{T}}} \) |
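The complexity formulas above can be evaluated directly. The Python sketch below encodes them with minus signs on the 4MK, 4N min{K,N}, and 32 min{K,N} terms, a reading that reproduces the tabulated MUL counts of this section when a 4 × 4 system and a reduced-domain neighborhood size N = 5 are assumed. The function names are hypothetical, and M denotes the constellation size.

```python
def mul_odn(M, K, n_t):
    """MUL count of the ODN ZF-(O)SIC and MMSE-(O)SIC FNSA."""
    return 2 * M * K * n_t**2 + 2 * M * K * n_t - 4 * M * K + 3 * M


def mul_rdn(N, K, n_t, n_r):
    """MUL count of the RDN LRA-ZF-(O)SIC and LRA-MMSE-(O)SIC FNSA."""
    m = min(K, N)
    return (2 * N * m * n_t**2 + 30 * m * n_t + 2 * N * m * n_t - 4 * N * m
            + 6 * m * n_t**2 + 4 * m * n_r * n_t + 2 * m * n_r
            + 4 * n_t**2 - 32 * m + 2 * N)


def mul_ml(M, n_t, n_r):
    """MUL count of the exhaustive ML search."""
    return 4 * n_r * n_t * M**n_t


n_t = n_r = 4                                    # assumed 4 x 4 system
N = 5                                            # assumed reduced-domain neighborhood size

for K in (1, 2, 4, 16):
    print(K, mul_odn(16, K, n_t), mul_rdn(N, K, n_t, n_r))

# ODN with K = 16 versus RDN with K = 2, both for 16-QAM
ratio = mul_odn(16, 16, n_t) / mul_rdn(N, 2, n_t, n_r)
print("complexity ratio:", round(ratio, 2))
```

The RDN count does not depend on M and saturates once K exceeds N, whereas the ODN count grows linearly with both M and K, which is where the gap at high-order constellations comes from.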
SNR loss (in dB) at BER = 10^{−4}, ODN ZF-SIC FNSA, ODN MMSE-SIC FNSA, RDN LRA-ZF-SIC FNSA, RDN LRA-MMSE-SIC FNSA, and RDN LRA-MMSE-OSIC FNSA compared to ML

| Technique | QPSK, K = 1 | QPSK, K = 2 | QPSK, K = 3 | QPSK, K = 4 | 16-QAM, K = 1 | 16-QAM, K = 2 | 16-QAM, K = 4 | 16-QAM, K = 16 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ODN exact ZF-SIC FNSA | >7.6 | >7.6 | >7.6 | 0.36 | >5.0 | >5.0 | >5.0 | 0 |
| ODN equivalent MMSE-SIC FNSA | >7.6 | >7.6 | 6.21 | 0.30 | >5.0 | >5.0 | >5.0 | 0.09 |
| RDN exact LRA-ZF-SIC FNSA | 4.43 | 2.90 | 1.92 | 1.71 | 3.21 | 2.04 | 1.27 | 0.62 |
| RDN equivalent LRA-MMSE-SIC FNSA | 2.90 | 0.73 | 0.52 | 0.27 | 2.12 | 0.76 | 0.53 | 0.40 |
| RDN equivalent LRA-MMSE-OSIC FNSA | 0.80 | 0.01 | 0 | 0 | 1.62 | 0.02 | 0 | 0 |
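ODN ZF-SIC, ODN MMSE-SIC, RDN LRA-ZF-SIC, RDN LRA-MMSE-SIC, RDN LRA-MMSE-OSIC, and ML computational complexities in MUL

| Technique | QPSK, K = 1 | QPSK, K = 2 | QPSK, K = 3 | QPSK, K = 4 | 16-QAM, K = 1 | 16-QAM, K = 2 | 16-QAM, K = 4 | 16-QAM, K = 16 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ODN ZF-(O)SIC FNSA | 156 | 300 | 444 | 588 | 624 | 1200 | 2352 | 9264 |
| ODN MMSE-(O)SIC FNSA | 156 | 300 | 444 | 588 | 624 | 1200 | 2352 | 9264 |
| RDN LRA-ZF-(O)SIC FNSA | 510 | 946 | 1382 | 1818 | 510 | 946 | 1818 | 2254 |
| RDN LRA-MMSE-(O)SIC FNSA | 510 | 946 | 1382 | 1818 | 510 | 946 | 1818 | 2254 |
| ML | 16,384 (all K) | | | | 4,194,304 (all K) | | | |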
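ODN MMSE-SIC, RDN LRA-MMSE-SIC, and RDN LRA-MMSE-OSIC computational complexities in MUL

| Technique | 16QAM extension, K = 1 | 16QAM extension, K = 2 | 16QAM extension, K = 3 | 16QAM extension, K = 4 | 16-QAM, K = 1 | 16-QAM, K = 2 | 16-QAM, K = 4 | 16-QAM, K = 16 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ODN equivalent MMSE-(O)SIC FNSA | 560 | 1120 | 1680 | 2240 | 624 | 1200 | 2352 | 9264 |
| RDN equivalent LRA-MMSE-(O)SIC FNSA | 1694 | 3122 | 4550 | 5978 | 510 | 946 | 1818 | 2254 |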

▪The equivalent expression of the LRA-MMSE-centered SD, which corresponds to an efficient LRA-MMSE-OSIC Babai point, improves the performance or reduces the complexity of the detector.

▪The proposed (S)QRD formulation with a reduced domain neighborhood induces the use of the best known hard detector as a Babai point, for both a large number of antennas and high-order modulations.

▪The proposed expression is robust by nature to any search center and constellation order and offers close-to-optimal performance for large K. Likewise, the proposed solution offers a computational complexity that is independent of the constellation order and consequently outperforms classical SD techniques for a reasonable computational complexity in the case of high-order constellations. For instance, the neighborhood study size has been reduced to K = 2 for a 16-QAM modulation compared to classical SD techniques.
7 Conclusions
In this paper, the LRA-MMSE-centered SD has been proposed with a K-best neighborhood generation. A detailed and hardware implementation-oriented computational complexity estimation has been provided and combined with performance results. It has been shown that the proposed detection technique outperforms the existing solutions. In particular, the corresponding implementation complexity has been shown to be independent of the constellation size and polynomial in the number of antennas while reaching the ML performance with both real and perfect channel estimation. This implies a ten times lower computational complexity compared to the classical K-best, even for a large MIMO system, with 16-QAM modulation on each layer.
It is worth mentioning that, with respect to our previous work in [1], this paper presents a detailed technical description of the proposed methodology, a detailed complexity analysis, and more results. This particularly includes a step by step implementation of the proposed algorithm in Appendix 4.
Declarations
Acknowledgements
This paper was partially presented in [1].
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 1. S Aubert, Y Nasser, F Nouvel, Lattice reduction-aided minimum mean square error k-best detection for MIMO systems, in Proc. of the International Conference on Computing, Networking and Communications (ICNC), 2012, pp. 1066–1070
 2. F Rusek, D Persson, BK Lau, EG Larsson, TL Marzetta, O Edfors, F Tufvesson, Scaling up MIMO: opportunities and challenges with very large arrays. IEEE Signal Processing Magazine 30(1), 40–46 (2013)
 3. EG Larsson, F Tufvesson, O Edfors, TL Marzetta, Massive MIMO for next generation wireless systems. IEEE Commun. Mag. 52(2), 186–195 (2014)
 4. Y Kong, Q Zhou, X Ma, Lattice reduction aided transceiver design for multiuser MIMO downlink transmissions, in Proc. of the IEEE Military Communications Conference (MILCOM), 2014, pp. 556–562
 5. KA Singhal, T Datta, A Chockalingam, Lattice reduction aided detection in large-MIMO systems, in Proc. of the IEEE 14th Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2013, pp. 594–598
 6. E Zimmermann, G Fettweis, Linear MIMO receivers vs. tree search detection: a performance comparison overview, in Proc. of the IEEE Personal Indoor and Mobile Radio Communications (PIMRC), 2006, pp. 1–7
 7. N Prasad, MK Varanasi, Analysis of decision feedback detection for MIMO Rayleigh-fading channels and the optimization of power and rate allocations. IEEE Transactions on Information Theory 50(6), 1009–1025 (2004)
 8. R Xu, FCM Lau, Performance analysis for MIMO systems using zero forcing detector over fading channels. IEE Proceedings on Communications 153(1), 74–80 (2006)
 9. Y Nasser, JF Hélard, M Crussière, System level evaluation of innovative coded MIMO-OFDM systems for broadcasting digital TV. EURASIP International Journal of Digital Multimedia Broadcasting 2008(359206), 12 (2008)
 10. E Viterbo, J Boutros, A universal lattice code decoder for fading channels. IEEE Trans. on Information Theory 45, 1639–1642 (1999)
 11. B Hassibi, H Vikalo, On the expected complexity of sphere decoding, in Proc. of the Asilomar Conference on Signals, Systems and Computers, 2001, pp. 1051–1055
 C Schnorr, M Euchner, Lattice basis reduction: improved practical algorithms and solving subset sum problems. Mathematical Programming 66, 181–199 (1994)MATHMathSciNetView ArticleGoogle Scholar
 Z Guo, P Nilsson, Algorithm and implementation of the Kbest sphere decoding for MIMO detection. IEEE Journal on Selected Areas in Communications 24(3), 491–503 (2006)View ArticleGoogle Scholar
 LG Barbero, JS Thompson, A fixedcomplexity MIMO detector based on the complex sphere decoder. IEEE 7th Workshop on Signal Processing Advances in Wireless Communications, 2006. SPAWC ’06. pp. 1, 5, 2–5 (2006)Google Scholar
 M Mohaisen, KyungHi Chang, On improving the efficiency of the fixedcomplexity sphere decoder. 2009 IEEE 70th Vehicular Technology Conference Fall (VTC 2009Fall), 20–23 Sept 2009, pp. 1, 5Google Scholar
 Y Ding, Y Wang, JF Diouris, Z Yao, Robust fixedcomplexity sphere decoders for rankdeficient MIMO systems. IEEE Trans. Wireless Commun 12(9), 4297–4305 (2013)View ArticleGoogle Scholar
 J Fink, S Roger, A Gonzalez, V Almenar, VM Garciay, Complexity assessment of sphere decoding methods for MIMO detection. 2009 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), 14–17 Dec 2009, pp. 9, 14Google Scholar
 C Hess, M Wenk, A Burg, P Luethi, C Studer, N Felber, W Fichtner, Reducedcomplexity MIMO detector with closeto ML error rate performance, in Proc. of the GLSVLSI, 2007, pp. 200–203View ArticleGoogle Scholar
 D Wuebben, R Bohnke, V Kuhn, KD Kammeyer, MMSEbased latticereduction for nearML detection of MIMO systems, in Proc. of the ITG Workshop on Smart Antennas, 2004, pp. 106–113View ArticleGoogle Scholar
 S Roger, A Gonzales, V Almenar, AM Vidal, Latticereductionaided Kbest MIMO detector based on the channel matrix condition number. 2010 4th International Symposium on Communications, Control and Signal Processing (ISCCSP), March 2010, pp. 1–4Google Scholar
 CF Liao, YH Huang, Cost reduction algorithm for 8x8 lattice reductionaided Kbest MIMO detector, in Proc. of the IEEE International Conference of Signal Processing, Communication and Computing, 2012, pp. 186–190Google Scholar
 XF Qi, K Holt, A latticereductionaided soft demapper for highrate coded MIMOOFDM systems. IEEE Signal Processing Letters 14(5), 305–308 (2007)View ArticleGoogle Scholar
 M Shabany, PG Gulak, The application of latticereduction to the Kbest algorithm for nearoptimal MIMO detection. IEEE International Symposium on Circuits and Systems, 2008. ISCAS 2008. 18–21 May 2008, pp. 316–319Google Scholar
 JC Marinello, T Abrao, Lattice reduction aided detector for dense MIMO via ant colony optimization, in Proc. of the IEEE Wireless Communications and Networking Conference (WCNC), 2013, pp. 2839–2844. ShanghaiGoogle Scholar
 LG Barbero, JS Thompson, A fixedcomplexity MIMO detector based on the complex sphere decoder, in Proc. of the Workshop on Signal Processing Advances for Wireless Communications, 2006, pp. 1–5Google Scholar
 E Agrell, T Eriksson, E Vardy, K Zeger, Closest point search in lattices. IEEE Transactions on Information Theory 48(8), 2201–2214 (2002)MATHMathSciNetView ArticleGoogle Scholar
 KW Wong, CY Tsui, SK Cheng, WH Mow, A VLSI architecture of a Kbest lattice decoding algorithm for MIMO channels, in Proc. of the IEEE International symposium on Circuits and Systems, vol. 3, 2002, pp. 273–276Google Scholar
 E Viterbo, J Boutros, A universal lattice code decoder for fading channels. IEEE Transactions on Information Theory 45(5), 1639–1642 (1999)MATHMathSciNetView ArticleGoogle Scholar
 S Aubert, M Mohaisen, From linear equalization to latticereductionaided spheredetector as an answer to the MIMO detection problematic in spatial multiplexing systems. Vehicular Technologies, 9789537619XX, INTECH, (2011)Google Scholar
 BA Lamacchia, Basis reduction algorithms and subset sum problems. Technical report, MSc Thesis, Massachusetts Institute of Technology, 1991Google Scholar
 AK Lenstra, HW Lenstra, L Lovász, Factoring polynomials with rational coefficients. Mathematische Annalen 261(4), 515–534 (1982)MATHMathSciNetView ArticleGoogle Scholar
 M Seysen, Simultaneous reduction of a lattice basis and its reciprocal basis. Combinatorica 13(3), 363–376 (1993)MATHMathSciNetView ArticleGoogle Scholar
 D Wübben, R Böhnke, V Kühm, KD Kammeyer, Nearmaximumlikelihood detection of MIMO systems using MMSEbased latticereduction, in Proc. of the IEEE International Conference on Communications, vol. 2, 2004, pp. 798–802Google Scholar
 S Roger, A Gonzalez, V Almenar, AM Vidal, On decreasing the complexity of latticereductionaided Kbest MIMO detectors, in Proc. of the European Signal Processing Conference, 2009, pp. 2411–2415Google Scholar
 B Gestner, W Zhang, X Ma, DV Anderson, VLSI implementation of a lattice reduction algorithm for lowcomplexity equalization, in Proc. of the IEEE International Conference on Circuits and Systems for Communications, 2008, pp. 643–647Google Scholar
 T Cui, C Tellambura, An efficient generalized sphere decoder for rankdeficient MIMO systems. IEEE Communications Letters 9(5), 423–425 (2005)View ArticleGoogle Scholar
 L Wang, L Xu, S Chen, L Hanzo, MMSE softinterferencecancellation aided iterative centershifting Kbest sphere detection for MIMO channels, in the Proc. of the IEEE International Conference on Communications, 2008, pp. 3819–3823Google Scholar
 J Jalden, B Ottersten, On the complexity of sphere decoding in digital communications. IEEE Transactions on Signal Processing 53(4), 1474–1484 (2005)MathSciNetView ArticleGoogle Scholar
 X Wang, Z He, K Niu, W Wu, X Zhang, An improved detection based on lattice reduction in MIMO systems, in Proc. of the IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), 2006, pp. 1–5Google Scholar
 C Studer, A Burg, H Bolcskei, Softoutput sphere decoding: algorithms and VLSI implementation. IEEE Journal on Selected Areas in Communications 26(2), 290–300 (2008)View ArticleGoogle Scholar
 W Zhang, M Xiaoli, Approaching optimal performance by latticereduction aided soft detectors. 41st Annual Conference on Information Sciences and Systems, 2007. CISS ’07. 14–16 March 2007, pp. .818–822Google Scholar
 M Pohst, On the computation of lattice vectors of minimal length, successive minima and reduced basis with applications. ACM SIGSAM Bull. 15, 37–44 (1981)MATHView ArticleGoogle Scholar