A simplified hard output sphere decoder for large MIMO systems with the use of efficient search center and reduced domain neighborhood study
EURASIP Journal on Wireless Communications and Networking volume 2015, Article number: 227 (2015)
Abstract
Multiple-input multiple-output (MIMO) with a spatial-multiplexing (SM) scheme is a topic of high interest for the next generation of wireless communications systems. At the receiver, neighborhood studies (NS) and lattice reduction (LR)-aided techniques are common solutions in the literature to approach the optimal and computationally complex maximum likelihood (ML) detection. However, the NS and LR solutions might not offer optimal performance for large dimensional systems, such as a large number of antennas, and high-order constellations when they are considered separately. In this paper, we propose a novel equivalent metric dealing with the association of these solutions by introducing a reduced domain neighborhood study. We show that the proposed metric presents a relevant complexity reduction while maintaining near-ML performance. Moreover, the corresponding computational complexity is shown to be independent of the constellation size, but it is quadratic in the number of transmit antennas. For instance, for a 4 × 4 MIMO system with 16-QAM modulation on each layer, the proposed solution is simultaneously near-ML with perfect and real channel estimation and ten times less complex than the classical neighborhood-based K-best solution.
Introduction
Multiple-input multiple-output (MIMO) technology has attracted considerable attention in the last decade since it can improve link reliability without sacrificing bandwidth efficiency or, conversely, it can improve the bandwidth efficiency without losing link reliability [1]. Recently, the concept of large MIMO systems, i.e., with a high number of antennas, has also gained research interest, and it is widely seen as a part of next-generation wireless communication systems [2, 3].
However, the main drawback of MIMO technology is the increased complexity at the receiver side when a non-orthogonal (NO) MIMO scheme with a large number of antennas and/or large constellation size is implemented [4, 5]. For the detection process, although the performance of the maximum likelihood (ML) detector is optimal, its computational complexity increases exponentially with the number of transmit antennas and with the constellation size. In the literature, different MIMO detection techniques have been proposed. The linear-like detection (LD) [6] and decision-feedback detection (DFD) [7] are the baseline detection algorithms. Here, we distinguish the conventional linear MIMO detection techniques zero forcing (ZF) [8] and minimum mean square error (MMSE) [8]. Although linear detection approaches are attractive in terms of their computational complexity, they might lead to a non-negligible degradation in terms of performance [9].
Some nonlinear detectors have also been introduced. The sphere decoder (SD), one of the best-known MIMO detectors, is based on a tree search and is very popular due to its quasi-optimal performance [10]. However, this performance comes at the cost of additional implementation complexity. Indeed, the SD achieves quasi-ML performance while its average complexity is shown to be polynomial (roughly cubic) in the constellation size and in the number of transmit antennas over a certain range of signal-to-noise ratio (SNR), while the worst-case complexity is still exponential [11]. From a hardware implementation point of view, the SD algorithm presents two main drawbacks. Firstly, its complexity coefficients can become large when the problem dimension is high, i.e., at high spectral efficiency, with a high number of antennas, and with a high number of users in a multi-user MIMO (MU-MIMO) context. Secondly, the variance of its computation time can also be large, leading to undesirable, highly variable decoding delays. Despite classical optimizations such as the Schnorr-Euchner (SE) enumeration [12], the SD originally presented in [11] is by nature a sequential tree search, which is an additional drawback for implementation. In order to deal with these two aspects, the authors in [13] have proposed a suboptimal solution denoted as the K-best [13, 14], where K is the number of stored neighbors at a given layer. However, even with a fixed computational complexity and a parallel implementation, some optimizations are required, especially for high-order constellations and large numbers of antennas (due to the large K required in this case) [15–18]. Aiming at reducing the neighborhood size (namely K, over all layers), different solutions have been proposed. For instance, the sorted QR decomposition (SQRD)-based dynamic K-best, which leads to the well-known SQRD-based fixed throughput SD (FTSD), is proposed in [16].
Even with these efforts, the neighborhood size still induces a computationally expensive solution for achieving quasi-ML performance. An alternative approach was first presented in the literature by Wuebben et al. in [19]. It consists in adding a preprocessing step, namely the lattice reduction (LR), so that a classical detection is applied through a better-conditioned channel matrix [19–21]. This solution has been shown to offer the full reception diversity at the expense of an SNR offset in the system performance. This offset increases with the number of transmit antennas and with the modulation order.
Recently, a promising, although complex, association of the K-best and LR solutions has been considered. It provides a convenient performance-complexity trade-off. The general idea consists in reducing the SNR offset through a neighborhood study which yields a near-ML performance for a reasonable K. The concept was first introduced by the authors of [22]. Later on, their basic solution has been improved by proposing to model the sphere constraint in a reduced domain or by introducing an efficient symbol enumeration algorithm [23]. However, a major aspect of this combination has not been considered yet. In particular, any SD, including the K-best, may be advantageously applied by considering a better-conditioned channel matrix through the introduction of a Reduced Domain Neighborhood (RDN) study and a judicious search center. In [5], an improved LR technique dealing with the RDN has been proposed in the context of large MIMO systems. It is based on the decomposition of the spanned space of the channel matrix into small subspaces in order to improve orthogonality of the quantization. In [24], the search center is found through an ant colony optimization and an initial search through the output of the MMSE detector.
In this paper, we adopt the K-best solution with fixed complexity as the basic solution of the SD. We propose to reduce the neighborhood size through an efficient preprocessing step which allows the SD process to apply a neighborhood study in a modified constellation domain. Then, using the modified domain, we propose a novel modified ML equation with an efficient search center. We show that the proposed metric presents a large complexity reduction while maintaining near-ML performance. Moreover, the corresponding complexity is shown to be independent of the constellation size and polynomial in the number of transmit antennas. In particular, for a 4 × 4 MIMO system with 16-QAM modulation on each layer, the proposed association presents near-ML performance while it is ten times less complex than the classical K-best solution. We note that because the complexity is fixed with such a detector, the exposed optimizations guarantee a performance gain for a given neighborhood size or a reduction of the neighborhood size for a given bit error rate (BER) target. The contributions of this paper are summarized as follows:

A promising association of the K-best and LR solutions is proposed.

The SD neighborhood study is modified by applying a preprocessing step. This is accompanied by a new and efficient search center and MMSE detector. The equivalent expression of the lattice reduction-aided (LRA)-MMSE-centered SD, which corresponds to an efficient LRA-MMSE-successive interference canceller (SIC) Babai point, is proposed to improve the performance or reduce the complexity of the detector.

The (S)QRD is introduced in formulas. It provides, to the best of the authors’ knowledge, the best known pseudo-linear hard detector as a Babai point, for large numbers of antennas as well as for high-order modulations.

The proposed expression is robust by nature to any search center and constellation order and offers close-to-optimal performance with medium K values. This applies for both perfect and real channel estimation.

The proposed solution offers a computational complexity that is independent of the constellation order. Therefore, it outperforms classical SD techniques for a reasonable complexity in the case of high-order constellations. We show for example that a number of neighbors K = 2 is sufficient for a 4 × 4 MIMO system with 16-QAM modulation on each layer, and that the SNR loss with respect to the ML solution is less than 0.5 dB for 64 × 64 and 128 × 128 MIMO systems.

The proposed solution offers a computational complexity that is quasi-constant for large numbers of antennas, which underlines its practical interest.
This paper is organized as follows. Section 2 presents the problem statement of the SD. In Section 3, the different existing solutions are described and analyzed. In Section 4, we propose our generalized solution based on LR with the use of an efficient search center and reduced domain neighborhood. In Section 5, the performance of the presented detectors is provided, compared, and discussed. In Section 6, we consider the computational complexity of the proposed solution in comparison with some reference detection techniques. Conclusions are drawn in Section 7.
Problem statement
Sphere decoder detector
Let us introduce an n _{R} × n _{T} MIMO system model with n _{T} transmit and n _{R} receive antennas. Then, the received symbol vector can be written as
\( \boldsymbol{y}=\boldsymbol{H}\boldsymbol{x}+\boldsymbol{n}, \) (1)
where H represents the (n _{R}, n _{T}) complex channel matrix assumed to be perfectly known at the receiver, x is the transmit symbol vector of dimensions (n _{T}, 1) where each entry is independently drawn from a constellation set ξ, and n is the additive white Gaussian noise of dimensions (n _{R}, 1) and of variance σ ^{2}/2 per dimension. The basic idea of the SD, to reach the optimal estimate \( {\widehat{x}}_{\mathrm{ML}} \) (given by the ML detector) while avoiding an exhaustive search, is to observe the lattice points that lie inside a sphere of radius d.
The SD solution starts from the ML equation \( {\widehat{\boldsymbol{x}}}_{\mathrm{ML}}=\underset{\boldsymbol{x}\in {\xi}^{n_{\mathrm{T}}}}{\mathrm{argmin}}\parallel \boldsymbol{y}-\boldsymbol{H}\boldsymbol{x}{\parallel}^2 \) and reads
\( {\widehat{\boldsymbol{x}}}_{\mathrm{SD}}=\underset{\boldsymbol{x}\in {\xi}^{n_{\mathrm{T}}}}{\mathrm{argmin}}\parallel {\boldsymbol{Q}}^H\boldsymbol{y}-\boldsymbol{R}\boldsymbol{x}{\parallel}^2\le {d}^2, \) (2)
where H = QR, with the classical QRD definitions.
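As a small numerical illustration of the ML metric and its QR-based form, the following sketch (not taken from the paper; the 3 × 3 QPSK setup and the names `ml_metric` and `qr_metric` are assumptions made for illustration) enumerates all candidate vectors by brute force and compares the two metrics. For a square channel matrix, Q is unitary, so both metrics coincide for every candidate.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_t = n_r = 3  # small square system so that exhaustive search stays cheap
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

# Rayleigh channel, random QPSK vector, and noise of variance sigma^2/2 per dimension
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
x_true = rng.choice(qpsk, n_t)
sigma = 0.1
noise = sigma / np.sqrt(2) * (rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r))
y = H @ x_true + noise

# QR decomposition H = QR
Q, R = np.linalg.qr(H)

# Exhaustive list of candidate vectors in xi^{n_T}
candidates = [np.array(c) for c in itertools.product(qpsk, repeat=n_t)]
ml_metric = np.array([np.linalg.norm(y - H @ c) ** 2 for c in candidates])
qr_metric = np.array([np.linalg.norm(Q.conj().T @ y - R @ c) ** 2 for c in candidates])

# Brute-force ML estimate
x_ml = candidates[int(np.argmin(ml_metric))]
```

The exhaustive list has 4^3 = 64 entries here; its exponential growth with the number of antennas and the constellation size is what the sphere constraint is meant to avoid.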
The classical SD formula in (2) is centered on the received signal y. From now on, this detection will be denoted as the naïve SD. In the case of a depth-first search algorithm [13], the first solution given by this algorithm is defined as the Babai point [25, 26]. In order to write it, the classical SD expression may be rearranged, leading then to an exact formulation through an efficient partial Euclidean distance (PED) expression and early pruned nodes [27].
In the literature, the SD principle leads to numerous implementation problems. In particular, it is a nondeterministic polynomial-time hard (NP-hard) problem [28]. This aspect has been partially solved through the introduction of an efficient solution that lies in a fixed neighborhood size algorithm (FNSA), commonly known as the K-best solution. However, this solution makes the detector suboptimal since it leads to a performance loss compared to the ML detector. This is particularly true in the case of an inappropriate choice of K according to the MIMO channel condition number and in the case of an inappropriate choice of d in (2). Indeed, an inappropriate choice of d could exclude the ML solution from the search tree. On the other hand, although a neighborhood study remains the only solution that achieves near-ML performance, it may require a large-size neighborhood scan, which corresponds to a dramatic increase of the computational complexity. This complexity increase becomes prohibitive for high-order modulations.
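The fixed neighborhood size principle can be sketched as follows. This is a minimal illustration under simplifying assumptions (plain QRD, no layer ordering, no sphere radius), not the authors' implementation; the function names `k_best` and `brute_force_ml` are hypothetical.

```python
import itertools
import numpy as np

def k_best(y, H, constellation, K):
    """Breadth-first tree search keeping the K lowest-PED candidates per layer."""
    n_t = H.shape[1]
    Q, R = np.linalg.qr(H)
    y_t = Q.conj().T @ y
    # Each survivor is (cumulative metric, symbols already decided for layers k+1..n_t-1)
    survivors = [(0.0, [])]
    for k in range(n_t - 1, -1, -1):  # detect from the last layer up to the first
        expanded = []
        for metric, tail in survivors:
            interference = sum(R[k, k + 1 + i] * s for i, s in enumerate(tail))
            for s in constellation:
                ped = metric + abs(y_t[k] - interference - R[k, k] * s) ** 2
                expanded.append((ped, [s] + tail))
        expanded.sort(key=lambda item: item[0])
        survivors = expanded[:K]  # fixed neighborhood size
    return np.array(survivors[0][1])

def brute_force_ml(y, H, constellation):
    """Exhaustive ML search, used only as a reference for small systems."""
    n_t = H.shape[1]
    cands = [np.array(c) for c in itertools.product(constellation, repeat=n_t)]
    return min(cands, key=lambda c: np.linalg.norm(y - H @ c) ** 2)

rng = np.random.default_rng(3)
n_t = 3
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
H = (rng.standard_normal((n_t, n_t)) + 1j * rng.standard_normal((n_t, n_t))) / np.sqrt(2)
y = H @ rng.choice(qpsk, n_t) + 0.1 * (rng.standard_normal(n_t) + 1j * rng.standard_normal(n_t))

x_full = k_best(y, H, qpsk, K=len(qpsk) ** n_t)  # nothing is pruned: equals ML
x_small = k_best(y, H, qpsk, K=2)                # fixed, small complexity; may be suboptimal
```

With K equal to the full candidate count the search is exhaustive and returns the ML solution; the interest of the K-best lies in small K, where the result is only guaranteed to be a valid candidate, not the ML one.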
Lattice reduction
Through the aforementioned considerations and by using the lattice definition in [26], the system model given in (1) rewrites
\( \boldsymbol{y}=\tilde{\boldsymbol{H}}\boldsymbol{z}+\boldsymbol{n}, \) (3)
where \( \tilde{\boldsymbol{H}}=\boldsymbol{H}\boldsymbol{T} \) and z = T ^{− 1} x. The n _{T} × n _{T} complex matrix T is unimodular (with |det{T}| = 1), i.e., its entries belong to the set of complex integers which reads ℤ_{ℂ} = ℤ + jℤ, with j ^{2} = −1. The key idea of any LR-aided (LRA) detection scheme is to understand that the finite set of transmitted symbols \( {\xi}^{n_T} \) can be interpreted as a de-normalized, shifted then scaled version of a subset of the infinite set of complex integers \( {\mathbf{\mathbb{Z}}}_{\mathbb{C}}^{n_T} \), according to the relations offered in [29].
To this end, various reduction algorithms have been proposed [19, 30–32]. In the following, we focus on the well-known Lenstra-Lenstra-Lovász (LLL) algorithm due to considerations presented in [30, 33]. The lattice-aided (LA) algorithm is a local approach [34] that transforms the channel matrix into an LLL-reduced basis that satisfies both the orthogonality and norm reduction conditions [31]. While it has been shown in [33] that the QRD outputs of the channel matrix are a possible starting point for the LLL, it has been subsequently shown that the SQRD provides a better starting point [34]. In particular, it leads to a significant reduction of its computational complexity [35]. That is, the detection process in (3) is performed on z instead of x through the better-conditioned matrix \( \tilde{\boldsymbol{H}} \). Wuebben et al. [19] proposed a full description of some reference solutions, namely the LRA-ZF and LRA-ZF-SIC without noise power consideration and the LRA-MMSE, LRA-MMSE extended, and LRA-MMSE-SIC. LRA detectors constitute efficient detectors in the sense of the high quality of their hard outputs. Indeed, they offer a low overall computational complexity while the ML diversity is reached within a constant offset. However, some important drawbacks exist. In particular, the aforementioned SNR offset is significant in the case of high-order modulations and of large numbers of antennas. This issue is expected to be bypassed through an additional neighborhood study.
Lattice reduction-aided sphere decoder
Contrary to the LRA-(O)SIC receivers, the application of the LR preprocessing step followed by any SD detector is not straightforward. The main problem lies in the consideration of the possible transmit symbol vectors in the reduced constellation since, unfortunately, the set of all possible transmit symbol vectors cannot be predetermined. The reason is that the solution does not only depend on the employed constellation but also on the T ^{−1} matrix of (3). Hence, the number of children in the search tree and their values are not known in advance. A brute-force solution is then to determine the set of all possible transmit vectors in the reduced constellation, starting from the set of all possible transmit vectors in the original constellation and by switching to the reduced domain, thanks to the T ^{−1} matrix.
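To make the reduction step concrete, here is a compact textbook LLL sketch. It is deliberately simplified: it works on a real-valued basis (the paper considers complex channel matrices), recomputes the Gram-Schmidt orthogonalization at every step for readability rather than speed, and does not use the (S)QRD starting point discussed above. The function name `lll_reduce` and the parameter `delta` are illustrative assumptions.

```python
import numpy as np

def gram_schmidt(B):
    """Gram-Schmidt orthogonalization of the columns of B (no normalization)."""
    n = B.shape[1]
    Bs = np.zeros_like(B)
    mu = np.eye(n)
    for i in range(n):
        Bs[:, i] = B[:, i]
        for j in range(i):
            mu[i, j] = (B[:, i] @ Bs[:, j]) / (Bs[:, j] @ Bs[:, j])
            Bs[:, i] -= mu[i, j] * Bs[:, j]
    return Bs, mu

def lll_reduce(H, delta=0.75):
    """Return (H_red, T) with H_red = H @ T and T an integer unimodular matrix."""
    B = H.astype(float).copy()
    n = B.shape[1]
    T = np.eye(n, dtype=int)
    k = 1
    while k < n:
        # Size reduction of column k against columns k-1, ..., 0
        for j in range(k - 1, -1, -1):
            _, mu = gram_schmidt(B)
            q = int(np.rint(mu[k, j]))
            if q != 0:
                B[:, k] -= q * B[:, j]
                T[:, k] -= q * T[:, j]
        Bs, mu = gram_schmidt(B)
        # Lovasz condition: advance if satisfied, otherwise swap and step back
        if Bs[:, k] @ Bs[:, k] >= (delta - mu[k, k - 1] ** 2) * (Bs[:, k - 1] @ Bs[:, k - 1]):
            k += 1
        else:
            B[:, [k - 1, k]] = B[:, [k, k - 1]]
            T[:, [k - 1, k]] = T[:, [k, k - 1]]
            k = max(k - 1, 1)
    return B, T

rng = np.random.default_rng(1)
H = rng.standard_normal((4, 4))
H_red, T = lll_reduce(H)

# The same received point is described in both domains: H x = H_red z with z = T^{-1} x
x = np.array([1.0, -3.0, 3.0, -1.0])
z = np.linalg.inv(T) @ x
```

Because T is integer with determinant ±1, its inverse is also integer, which is why z stays on an integer lattice whenever x does.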
Detection process in the original domain neighborhood
Zero forcing-centered sphere decoder with original domain neighborhood study detection process
In order to deal with the detection process, we firstly introduce the sphere center x _{C} search algorithm. It concerns any signal of the form ∥x _{C} − x∥ ^{2} ≤ d ^{2} where x is a possible signal.
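Based on this search algorithm, different possible sphere centers could be introduced. Using a ZF detector, the received symbols given in (2) are then estimated through
\( {\widehat{\boldsymbol{x}}}_{\mathrm{SD}}=\underset{\boldsymbol{x}\in {\xi}^{n_{\mathrm{T}}}}{\mathrm{argmin}}\parallel \boldsymbol{R}{\boldsymbol{e}}_{\mathrm{ZF}}{\parallel}^2\le {d}^2, \) (4)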
where e _{ZF} = x _{ZF} − x and x _{ZF} = (H ^{H} H)^{− 1} H ^{H} y.
Equation (4) clearly shows that the naïve SD is unconstrained ZF-centered. It implicitly corresponds to a ZF-SIC solution with an Original Domain Neighborhood (ODN) study at each layer, where the layers correspond to the spatially multiplexed data streams. It can be noticed that, in the case of a large ODN study, the ML performance is achieved since the computed metrics are exactly the ML metrics. However, this comes at the cost of a large neighborhood study and subsequently a large computational complexity.
Minimum mean square error-centered sphere decoder with original domain neighborhood study detection process: equivalent formula
In this section, we introduce the minimum mean square error successive interference cancellation (MMSE-SIC), a closer-to-ML Babai point than the ZF-SIC. For the sake of clarity, we first give a general definition of the equivalence between two ML metrics.
Definition Two ML equations are equivalent if the lattice point argument outputs of the minimum distance are the same, even in the case of different metrics. Two ML equations are equivalent iff:
\( \underset{\boldsymbol{x}\in {\xi}^{n_{\mathrm{T}}}}{\mathrm{argmin}}\left\{\parallel \boldsymbol{y}-\boldsymbol{H}\boldsymbol{x}{\parallel}^2\right\}=\underset{\boldsymbol{x}\in {\xi}^{n_{\mathrm{T}}}}{\mathrm{argmin}}\left\{\parallel \boldsymbol{y}-\boldsymbol{H}\boldsymbol{x}{\parallel}^2+c\right\}, \) (5)
where c is a constant.
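Using (5), Cui et al. [36] proposed a general equivalent minimization problem given by
\( {\widehat{\boldsymbol{x}}}_{\mathrm{ML}}=\underset{\boldsymbol{x}\in {\xi}^{n_{\mathrm{T}}}}{\mathrm{argmin}}\left\{\parallel \boldsymbol{y}-\boldsymbol{H}\boldsymbol{x}{\parallel}^2+\alpha {\boldsymbol{x}}^H\boldsymbol{x}\right\}, \) (6)
where α is a real constant and the signals x have to be of constant modulus, i.e., x ^{H} x is a constant.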
This assumption is respected in the case of quadrature phase-shift keying (QPSK) modulations, but it is not directly applicable to 16-QAM and 64-QAM modulations. However, this assumption is not limiting in practice since a QAM constellation can be considered as a linear sum of QPSK points [36]. In Appendix 1, we discuss the constant modulus constraint on the signal x.
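The authors of [37] proposed to apply this solution to the FNSA detection technique of the unconstrained MMSE center, leading to an MMSE-SIC procedure with an ODN study at each layer [37]. In this case, the equivalent ML equation reads
\( {\widehat{\boldsymbol{x}}}_{\mathrm{ML}}=\underset{\boldsymbol{x}\in {\xi}^{n_{\mathrm{T}}}}{\mathrm{argmin}}{\left({\boldsymbol{x}}_{\mathrm{MMSE}}-\boldsymbol{x}\right)}^H\left({\boldsymbol{H}}^H\boldsymbol{H}+{\sigma}^2\boldsymbol{I}\right)\left({\boldsymbol{x}}_{\mathrm{MMSE}}-\boldsymbol{x}\right), \) (7)
where x _{MMSE} = (H ^{H} H + σ ^{2} I)^{− 1} H ^{H} y is the unconstrained MMSE estimate.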
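Through the use of the Cholesky factorization (CF) of H ^{H} H + σ ^{2} I = U ^{H} U in the MMSE case (H ^{H} H = U ^{H} U in the ZF case), the ML expression equivalently rewrites, using the proof in Appendix 2, as
\( {\widehat{\boldsymbol{x}}}_{\mathrm{ML}}=\underset{\boldsymbol{x}\in {\xi}^{n_{\mathrm{T}}}}{\mathrm{argmin}}\parallel \boldsymbol{U}\left(\tilde{\boldsymbol{x}}-\boldsymbol{x}\right){\parallel}^2, \) (8)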
where U is upper triangular with real diagonal elements and \( \tilde{x} \) is any (ZF or MMSE) unconstrained linear estimate.
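The equivalence between the ML metric and the MMSE-centered, Cholesky-based metric can be checked numerically. The sketch below is an illustration only (the 3 × 3 QPSK setup and all variable names are assumptions): it evaluates both metrics over every constant-modulus candidate and verifies that they differ by the same constant, so that they share the same minimizer.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
n = 3
sigma2 = 0.1
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)  # |s|^2 = 1 for every symbol

H = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
y = H @ rng.choice(qpsk, n) + np.sqrt(sigma2 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

# Cholesky factorization H^H H + sigma^2 I = U^H U, with U upper triangular
A = H.conj().T @ H + sigma2 * np.eye(n)
U = np.linalg.cholesky(A).conj().T  # NumPy returns the lower factor L with A = L L^H

# Unconstrained MMSE estimate used as the search center
x_mmse = np.linalg.solve(A, H.conj().T @ y)

candidates = [np.array(c) for c in itertools.product(qpsk, repeat=n)]
ml_metric = np.array([np.linalg.norm(y - H @ c) ** 2 for c in candidates])
centered_metric = np.array([np.linalg.norm(U @ (x_mmse - c)) ** 2 for c in candidates])

# For constant-modulus symbols the two metrics differ by a candidate-independent constant
offset = ml_metric - centered_metric
```

Expanding both norms shows why: the centered metric equals the ML metric plus σ² x^H x minus a term that depends only on y, and x^H x is the same for every QPSK vector.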
Proposed detection process in the reduced domain neighborhood
Due to the implementation drawbacks, it has been proposed to replace the optimal SD by a suboptimal FNSA. Hassibi et al. have discussed and shown in [11] that the detector performance is impacted by the noise power and the channel condition number. Hence, a well-conditioned channel could highly reduce the neighborhood size. This means that an LR step followed by a neighborhood study is a very attractive solution, since the neighborhood study is then applied to a better-conditioned channel matrix. Accordingly, our proposed combined solution is detailed in the next subsections.
Preprocessing
All existing solutions rely on the utilization of the efficient CF preprocessing step. However, these solutions are only functional in the case of a factorized formulation form. Although it is the case in our context, most of the advanced studies have been provided with the applicable QRD. In particular, the advantageous SIC performance optimizations such as ordering according to the corresponding decreasing SNR (from n _{T} to 1) in the ZF-SQRD case and SINR in the MMSE-SQRD case have been proposed in [33]. Moreover, a complexity reduction of the LLL-based LR algorithm has been proposed by the same authors in [33]. In our work, we propose to modify the classical detectors by introducing the QRD instead of the CF, and subsequently the SQRD, in the (LRA-)MMSE-(O)SIC cases.
The MMSE criterion is introduced through the consideration of an extended system model [27], by introducing the (n _{R} + n _{T})-by-n _{T} matrix H _{ext} and the (n _{R} + n _{T}) vector y _{ext} such that
\( {\boldsymbol{H}}_{\mathrm{ext}}=\left[\begin{array}{c}\hfill \boldsymbol{H}\hfill \\ {}\hfill \sigma {\boldsymbol{I}}_{n_{\mathrm{T}}}\hfill \end{array}\right]\ \mathrm{and}\ {\boldsymbol{y}}_{\mathrm{ext}}=\left[\begin{array}{c}\hfill \boldsymbol{y}\hfill \\ {}\hfill {\mathbf{0}}_{n_{\mathrm{T}},1}\hfill \end{array}\right]. \) (9)
In this way, the preprocessing step is similar to the ZF-SQRD and the detection procedure equals that of LRA-ZF-SIC. The SQRD interest lies in the ordering of the detection symbols as a function of their S(I)NR, and consequently, it limits the error propagation in SIC procedures. Indeed, it has been shown by Wuebben et al. [19] that the optimum order offers a performance improvement even if the ML diversity is not reached. On the other hand, it was shown that once the ML diversity is achieved through an LRA technique, the performance may be significantly improved with this solution [19]. Thus, the LRA-MMSE-OSIC corresponds, to the best of the authors’ knowledge, to the best pseudo-linear detector proposed in the literature, in particular in the case of 4 × 4 MIMO systems with QPSK modulations on each layer [19]. For higher order constellations or larger numbers of antennas, it may be shown that our proposed solution offers convenient hard-decision performance with a highly reduced complexity. In order to deal with these statements, we introduce the reduced domain neighborhood by using the following notations:

\( {Q}_{\xi^{n_{\mathrm{T}}}}\left\{.\right\} \) is the quantization operator in the original domain constellation,

\( {Q}_{{\mathrm{\mathbb{Z}}}_{\mathrm{\mathbb{C}}}^{n_{\mathrm{T}}}}\left\{.\right\} \) is the quantization operator in the reduced domain constellation,

a is the power normalization and scaling coefficient (i.e., \( 2/\sqrt{2},\ 2/\sqrt{10},\ \mathrm{and}\ 2/\sqrt{42} \) for QPSK, 16-QAM, and 64-QAM constellations, respectively),

\( \boldsymbol{d}=\frac{1}{2}{\boldsymbol{T}}^{-1}{\left[\begin{array}{ccc}\hfill 1+j\hfill & \hfill \dots \hfill & \hfill 1+j\hfill \end{array}\right]}^T \) is a complex displacement vector.
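The role of a and d can be illustrated with a short sketch. It assumes the usual relation x = a (u + (1 + j)/2) between a normalized QAM symbol x and a complex integer u, which is consistent with the values of a and the definition of d listed above; the function names `to_integers` and `quantize_reduced` and the example matrix T are hypothetical.

```python
import numpy as np

def to_integers(x, a):
    """Scale and shift normalized QAM symbols onto complex integers: x = a * (u + (1+1j)/2)."""
    return x / a - (1 + 1j) / 2

def quantize_reduced(z_tilde, T_inv, a):
    """Round an unconstrained reduced-domain estimate onto the grid a * (Z_C^n + d)."""
    n = T_inv.shape[0]
    d = 0.5 * T_inv @ np.full(n, 1 + 1j)  # complex displacement vector
    shifted = z_tilde / a - d              # scale and shift
    rounded = np.round(shifted.real) + 1j * np.round(shifted.imag)
    return a * (rounded + d)               # undo the shift and scaling

# 16-QAM example with an illustrative unimodular matrix T
a = 2 / np.sqrt(10)
levels = np.array([-3, -1, 1, 3]) / np.sqrt(10)
x = np.array([levels[0] + 1j * levels[3], levels[2] + 1j * levels[1]])
T = np.array([[1, 1], [0, 1]])
T_inv = np.array([[1, -1], [0, 1]])

u = to_integers(x, a)                # complex integers in the original domain
z = T_inv @ x                        # exact reduced-domain vector
z_noisy = z + 0.05 * (1 - 1j)        # small perturbation, well below a/2
z_hat = quantize_reduced(z_noisy, T_inv, a)
```

Note that this rounding is unbounded: nothing restricts the output to points that map back into the finite constellation, which is the difficulty discussed for neighborhood generation in the reduced domain.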
The classical LRA-FNSA is implicitly unconstrained LRA-ZF-centered, which leads to an LRA-ZF-SIC procedure with an RDN study at each layer. The exact formula has not been clearly provided but is implicitly used by any LRA-FNSA [21] and may even be considered as an incremental extension of (4):
\( \widehat{\boldsymbol{z}}=\underset{\boldsymbol{z}\in {\mathrm{\mathbb{Z}}}_{\mathrm{\mathbb{C}}}^{n_{\mathrm{T}}}}{\mathrm{argmin}}\parallel \tilde{\boldsymbol{R}}{\boldsymbol{e}}_{\mathrm{LRA}-\mathrm{ZF}}{\parallel}^2\le {d}^2, \) (10)
where \( \tilde{\boldsymbol{R}} \) is the LLL-based LR algorithm output, e _{LRA-ZF} = z _{LRA-ZF} − z, and \( {\mathrm{\mathbb{Z}}}_{\mathrm{\mathbb{C}}}^{n_{\mathrm{T}}} \) is the n _{T}-dimensional infinite set of complex integers.
Lattice reduction-aided minimum mean square error-centered sphere decoder with reduced domain neighborhood study detection process
To the best of the authors’ knowledge, no convincing formula has been proposed until now. Even if Jalden et al. [38] proposed an LRA-MMSE-centered solution, the introduced metrics are not equivalent to the ML expression. The solution of [38] is given by
The corresponding detector is a suboptimal solution that consists in an RDN study around the unconstrained LRA-MMSE solution, obtained through QR decomposition. This solution’s output is the constrained LRA-MMSE detection plus a list of solutions in the neighborhood. The latter is generated according to a non-equivalent metric and is subsequently reordered according to the exact ML metric. However, since the list is not generated according to the correct distance minimization criterion, it does not lead to a near-ML solution, and the detector does not offer an acceptable uncoded BER performance. In particular, the ML performance is not reached even in the case of a large neighborhood study.
An efficient solution is derived from (11) and consists in an unconstrained LRA-MMSE center which leads to an LRA-MMSE-SIC procedure with an RDN study at each layer. The equivalent ML equation reads
\( \widehat{\boldsymbol{z}}=\underset{\boldsymbol{z}\in {\mathrm{\mathbb{Z}}}_{\mathrm{\mathbb{C}}}^{n_{\mathrm{T}}}}{\mathrm{argmin}}\parallel \tilde{\boldsymbol{U}}\left(\tilde{\boldsymbol{z}}-\boldsymbol{z}\right){\parallel}^2, \) (12)
where \( {\tilde{\boldsymbol{H}}}^H\tilde{\boldsymbol{H}}+{\sigma}^2{\boldsymbol{T}}^H\boldsymbol{T}={\tilde{\boldsymbol{U}}}^H\tilde{\boldsymbol{U}} \) in the MMSE case (\( {\tilde{\boldsymbol{H}}}^H\tilde{\boldsymbol{H}}={\tilde{\boldsymbol{U}}}^H\tilde{\boldsymbol{U}} \) in the ZF case) and by noting that Ũ is upper triangular with real diagonal elements and \( \tilde{\boldsymbol{z}} \) is any LRA (ZF or MMSE) unconstrained linear estimate. The proof of this detector formula is given in Appendix 3.
The formula introduced in (12) offers an equivalent metric to the MMSE one introduced in (11), which has been shown to offer near-ML performance. The difference, and in particular the interest in the LRA case in (12), relies on the nature of the neighborhood study. In the case of an RDN study, the equivalent channel matrix \( \tilde{\boldsymbol{H}} \) is considered, and it should be remembered that it is only roughly, and not exactly, orthogonal. Consequently, the layer-by-layer detection of the symbol vector x does not exactly correspond to its joint detection since the mutual influence of the transformed z signal is still present. This discussion not only shows the interest of SD-like techniques to further improve such a detector's performance but also highlights how challenging it is to achieve the ML performance.
The general principle of the RDN LRA-MMSE-OSIC-centered solution is depicted as a block diagram in Fig. 1. The detailed block diagram description of the proposed solution is given in Fig. 2.
In Fig. 2, the mapping of any estimate (or list of estimates) from the reduced domain ẑ to the original domain \( \tilde{\boldsymbol{x}} \) is processed through the T matrix multiplication (see Equation (3)). The additional quantization step aims at removing duplicate symbol vector outputs in the case of a list of solutions.
For the sake of simplicity, let us consider any LRA-SIC procedure with no neighborhood study. The search center is updated at each layer as follows. By considering the k-th layer and with the knowledge of the \( {\widehat{\boldsymbol{z}}}_{k+1:{n}_{\mathrm{T}}} \) estimates at previous layers, the ẑ _{ k } unconstrained Babai point can be provided. Then, it has to be de-normalized and shifted to make it belong to \( {\mathrm{\mathbb{Z}}}_{\mathrm{\mathbb{C}}}^{n_{\mathrm{T}}} \). After quantization, de-shifting, and normalization, the ẑ _{ k } estimate at the k-th layer is obtained so that the next (k − 1)-th layer can be considered, until the whole symbol vector is detected. As previously introduced, the neighborhood generation is a problematic step due to the infinite and non-regular nature of the constellations in the reduced domain. This point is transparent with classical detectors such as LD and DFD, thanks to the straightforward quantization step in the reduced domain [39].
However, the issue of infinite lattices, addressed through a sphere constraint, appears when working with the classical considerations. It leads to either a performance loss or an NP-hard complexity solution. Hence, our proposed solution relies on an SE enumeration. Starting from the LRA-SIC principle, a neighborhood is considered at each layer and leads to the RDN LRA-SIC FNSA principle. In particular and due to the implementation constraints, the RDN generation is processed for a bounded number N of possibilities and in an SE fashion, namely with ordered PEDs according to an increasing distance from \( {\tilde{\boldsymbol{z}}}_k \) at each layer as follows:
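The layer-by-layer procedure described above can be sketched as follows, with no neighborhood study and no ordering. This is an illustrative reading of the text rather than the authors' algorithm: the function name `lra_sic` is hypothetical, the unimodular matrix T is supplied by hand instead of coming from an LLL step, and the MMSE case is obtained through the extended system model with the factor [H; σ I] T, whose Gram matrix is the one stated for the LRA-MMSE case.

```python
import numpy as np

def lra_sic(y, H, T, a, sigma2=0.0):
    """LRA-SIC Babai point in the reduced domain, mapped back with T (sigma2 = 0 gives ZF)."""
    n_t = H.shape[1]
    # Extended reduced-domain model: R^H R = (HT)^H (HT) + sigma2 * T^H T
    H_ext = np.vstack([H, np.sqrt(sigma2) * np.eye(n_t)]) @ T
    y_ext = np.concatenate([y, np.zeros(n_t)])
    Q, R = np.linalg.qr(H_ext)
    y_t = Q.conj().T @ y_ext
    d = 0.5 * np.linalg.inv(T) @ np.full(n_t, 1 + 1j)  # displacement vector
    z_hat = np.zeros(n_t, dtype=complex)
    for k in range(n_t - 1, -1, -1):
        # Unconstrained estimate at layer k after cancelling already-detected layers
        z_tilde = (y_t[k] - R[k, k + 1:] @ z_hat[k + 1:]) / R[k, k]
        # Scale and shift onto the complex integers, round, then undo the shift and scaling
        shifted = z_tilde / a - d[k]
        z_hat[k] = a * (np.round(shifted.real) + 1j * np.round(shifted.imag) + d[k])
    return T @ z_hat

# Noise-free 16-QAM check with a hand-picked unimodular T
rng = np.random.default_rng(2)
n = 3
a = 2 / np.sqrt(10)
levels = np.array([-3, -1, 1, 3]) / np.sqrt(10)
x = rng.choice(levels, n) + 1j * rng.choice(levels, n)
H = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
T = np.array([[1, 1 + 1j, 0], [0, 1, -1], [0, 0, 1]])
x_hat = lra_sic(H @ x, H, T, a)
```

In the presence of noise the returned vector may fall outside the finite constellation, so a final quantization in the original domain is still needed.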
The SE strategy aims at finding the correct decision early, leading to a safe early termination criterion, which is not considered here for the sake of readability in performance comparison. Also, all the corresponding PEDs are computed and then ordered. The K-best solutions, namely those with the lowest PED, in the reduced domain are stored (C _{ ẑ }) together with their corresponding cumulative Euclidean distances (CED) (D _{tot}). The whole procedure is depicted in Fig. 2.
By adding the preprocessing steps, i.e., the SQRD-based then LLL-based LR blocks, and the computation of a close-to-ML unconstrained estimate (although linear) such as LRA-MMSE extended, a complete description of the detection may be obtained. Figure 3 shows the detailed block diagram of the complete proposed solution. The SQRD block offers an efficient layer reordering [19] that relies on the noise power. The latter is taken into account in the rest of the detector through the T matrix.
As a final step of the detector and in the case of an RDN-based SD, the list of possible symbol outputs has to be reordered according to the ML metrics in the original domain, and duplicate solutions are removed. This is because noise causes some candidates to be mapped onto non-legitimate constellation points in the reduced constellation, leading to non-acceptable points in the original constellation. The symbol vector associated with the minimal metric becomes the hard decision output of the detector and offers a near-ML solution. The proposed algorithm is described in detail in Appendix 4.
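A bounded, distance-ordered neighborhood around a layer estimate can be generated as in the sketch below. It is a simplified stand-in for a true Schnorr-Euchner zigzag: it lists the complex integers in a small window around the rounded center and sorts them by distance, which yields the same increasing-distance ordering for small N. The function name `se_neighbors` is hypothetical, and the center is assumed to be already scaled and shifted onto the integer grid.

```python
import numpy as np

def se_neighbors(center, N):
    """Return the N complex integers closest to `center`, nearest first."""
    base = np.round(center.real) + 1j * np.round(center.imag)
    r = int(np.ceil(np.sqrt(N))) + 1  # window half-width, large enough to hold the N nearest points
    offsets = np.array([dx + 1j * dy for dx in range(-r, r + 1) for dy in range(-r, r + 1)])
    cands = base + offsets
    order = np.argsort(np.abs(cands - center), kind="stable")
    return cands[order[:N]]

center = 0.3 + 0.2j
neighbors = se_neighbors(center, 5)
distances = np.abs(neighbors - center)
```

Each of the N neighbors would then receive its PED, and only the K candidates with the lowest cumulative distance would be kept before moving to the next layer.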
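The final reordering step can be sketched as follows. The code maps each reduced-domain candidate back with T, quantizes it onto the original constellation, discards duplicates, and keeps the candidate with the smallest ML metric. It is an illustration under assumptions: the function name `rerank`, the hand-picked T, and the candidate list are all hypothetical.

```python
import numpy as np

def rerank(y, H, T, z_list, constellation):
    """Pick the best original-domain candidate from a list of reduced-domain candidates."""
    seen = set()
    best, best_metric = None, np.inf
    for z in z_list:
        x = T @ z
        # Quantization in the original domain: nearest constellation point per entry
        idx = np.argmin(np.abs(x[:, None] - constellation[None, :]), axis=1)
        key = tuple(idx.tolist())
        if key in seen:  # duplicate after quantization
            continue
        seen.add(key)
        x_q = constellation[idx]
        metric = np.linalg.norm(y - H @ x_q) ** 2  # exact ML metric
        if metric < best_metric:
            best, best_metric = x_q, metric
    return best, best_metric, len(seen)

rng = np.random.default_rng(5)
n = 3
a = 2 / np.sqrt(10)
levels = np.array([-3, -1, 1, 3]) / np.sqrt(10)
qam16 = np.array([re + 1j * im for re in levels for im in levels])
x = rng.choice(qam16, n)
H = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
T = np.array([[1, 1 + 1j, 0], [0, 1, -1], [0, 0, 1]])
z_true = np.linalg.inv(T) @ x

# Candidate list: a neighbor, the true point, and a slightly perturbed copy of the true point
z_list = [z_true + a * np.array([1, 0, 0]), z_true, z_true + 0.01]
x_best, metric_best, n_unique = rerank(H @ x, H, T, z_list, qam16)
```

In this noise-free example the perturbed copy quantizes to the same constellation vector as the true point and is therefore dropped as a duplicate.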
System performance
In this section, we present and compare the system performance of the different techniques previously presented, and we compare them with our proposed solution. For clearness target, we summarize the detection metrics for each solution in Table 1.
We should note that the RDN LRAMMSEOSIC FNSA, to which this paper relates, is particularly efficient in the case of rankdeficient MIMO systems, i.e., spatially correlated antenna systems, for highorder modulation which are considered points of the LTEA norm and for large number of antennas as in the future generation of cellular systems (beyond 4G networks). Moreover, since the equivalent channel matrix in the LRA case is only roughly orthogonal, the mutual influence of the transformed z is small but still present. Hence, a neighborhood study in the original constellation domain improves the performance compared to a SIC. However, contrarily to classical solutions that are not LRA, the necessary size for achieving the optimal performance is smaller.
Figure 4 depicts the BER for the aforementioned techniques. Some notable points have to be highlighted from this figure. Contrary to the RDN LRAZF/MMSE(O)SIC FNSA, the ODN ZF/MMSESIC FNSA does not reach the ML diversity for a reasonable neighborhood size, even if there is a decrease of the SNR offset in the MMSESIC case. However, a BER offset can be observed in the low SNR range, due to error propagation. Consequently, there exists a switching point from low to high SNR between LRA detectors and others. This aspect is removed through the use of better techniques. In particular, the SQRD in the RDN LRAMMSEOSIC FNSA presented in this work offers ML diversity, and the BER offset in low SNR has been highly reduced compared to the RDN LRAMMSESIC FNSA and is now closetoML.
It may also be noticed in Fig. 4 that the RDN LRAZFSICcentered FNSA does not reach the ML performance, contrarily to other techniques. It is due to the chosen neighborhood size in the reduced constellation value (N = 5) that is not sufficient for this detector but that is sufficient for the proposed LRAMMSE(O)SIC Babai points. With a larger N value, the RDN LRAZFSICcentered FNSA achieves the ML performance, similarly to other presented detectors.
Similarly to Fig. 4, some notable points have to be highlighted from Fig. 5. There still exists a switching point from low to high SNR regime between LRA detectors and others. This aspect is removed through the use of better techniques. In particular, the SQRD in the RDN LRAMMSEOSIC FNSA offers ML diversity and the BER offset in low SNR has been importantly reduced compared to the RDN LRAMMSESIC FNSA, leading now to a closetoML solution. We can observe from both Figs. 4 and 5 that even though when ZFSIC and equivalent MMSESIC are not LRA, they achieve the ML performance at the detriment of a very large neighborhood study size; it is of the order of the number of symbols contained in the employed constellation. By comparing the impact on LRA detector performance of QPSK and 16QAM modulations, two fundamental points must be discussed. Firstly, there implicitly exists a constraint from the QPSK constellation construction that eliminates nearby lattice points that do not belong to \( {\xi}^{n_{\mathrm{T}}} \), due to the quantization operation \( {Q}_{\xi^{n_{\mathrm{T}}}}\left\{.\right\} \). This aspect annihilates a large part of the LRaid benefit and cannot be corrected despite the increase of the neighborhood study size since many lattice points considered in the RDN would be associated with the same constellation point after quantization in the original constellation. In the case of larger constellation orders, the LRA solution is more effective, as depicted in Fig. 5.
Secondly, we recall that the constant modulus constellation assumption has, in theory, to be fulfilled. This was not the case in Fig. 5, with 16-QAM modulation on each layer. However, it can be assumed that this constraint is almost respected in mean value, as shown in Appendix 1 (Fig. 12). In Fig. 6, the performance of the R(O)DN (LRA-)MMSE-(O)SIC FNSA detectors with and without this assumption being respected is depicted, but only for a neighborhood scan of 1 and 2 neighbors, for the sake of consistency between QPSK and 16-QAM performance.
As depicted in Fig. 6, with 16-QAM modulation the performance is impacted by the fact that the strict equivalence assumption does not hold, i.e., the term x ^{H} x (or z ^{H} z) is not exactly constant but only constant on average. As shown in this figure, this assumption is not constraining in terms of performance loss. Moreover, the loss is insignificant compared to the advantage of the LRA in high-order constellations, which would be annihilated by the use of a QPSK constellation.
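To give a feel for how close to constant x ^{H} x is, the following sketch computes the relative spread (standard deviation over mean) of x ^{H} x exactly. It assumes i.i.d. equiprobable symbols from a square QAM constellation normalised to unit average energy; the function names are hypothetical.

```python
from itertools import product
from math import sqrt

def qam_energies(levels_per_dim):
    """Normalised symbol energies |s|^2 of a square QAM constellation."""
    levels = range(-(levels_per_dim - 1), levels_per_dim, 2)  # e.g. -3, -1, 1, 3
    raw = [i * i + q * q for i, q in product(levels, repeat=2)]
    mean = sum(raw) / len(raw)
    return [e / mean for e in raw]  # unit average energy

def relative_spread(levels_per_dim, n_t):
    """Std of x^H x divided by its mean, for n_t i.i.d. equiprobable symbols."""
    energies = qam_energies(levels_per_dim)
    var = sum((e - 1.0) ** 2 for e in energies) / len(energies)
    # x^H x is a sum of n_t i.i.d. energies: mean n_t, variance n_t * var.
    return sqrt(n_t * var) / n_t

for n_t in (4, 8, 64, 128):
    # Columns: n_T, spread for QPSK (2 levels/dim), spread for 16-QAM (4 levels/dim).
    print(n_t, round(relative_spread(2, n_t), 4), round(relative_spread(4, n_t), 4))
```

Under these assumptions the spread is exactly zero for QPSK and decays as 1/sqrt(n_T) for 16-QAM, which is consistent with the assumption being better respected as the number of transmit antennas grows.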
The proposed solution is particularly efficient for a large number of antennas and for high-order constellations. This was not the case for the LRA-MMSE-OSIC, which has been shown to have worse BER performance than ML detection in 4 × 4 MIMO systems with 16-QAM modulation on each layer [40], while it was efficient for 4 × 4 MIMO systems with QPSK modulation on each layer [41]. For the sake of completeness, Fig. 7 shows the same results as those given in Fig. 5, but with 64-QAM modulation. Again, this figure shows that the proposed detection algorithm performs better with high-order constellations.
Figure 8 shows the comparison between the proposed RDN LRA-MMSE-OSIC-centered FNSA and the ML detection for a large number of antennas, namely n _{R} = n _{T} = N with N = 64 and N = 128, and K = 2. First, increasing the number of antennas clearly increases the performance gain. Secondly, the proposed solution shows performance comparable to that of the ML decoder. At a BER of 10^{−4}, the SNR loss is less than 0.4 dB for 16-QAM and less than 0.5 dB for 64-QAM, while the complexity of the proposed RDN LRA-MMSE-OSIC-centered FNSA solution is far lower than that of the ML decoder. This will be discussed in the next section.
Finally, even though it is not the target of the paper, we have included simulation results of the proposed solution with real channel estimation. Figure 9 shows the simulation results when the channel estimation error variance Δ is equal to 0.001 and 0.005, assuming that the channel coefficient power is normalized by the number of antennas. This figure shows that the proposed LRA-MMSE solution still offers quasi-ML detection even with real channel estimation.
Complexity evaluation
Based on the assumptions presented in Table 1, the computational complexities introduced in Table 2 can be demonstrated. The RDN study is processed in an infinite lattice, which does not involve boundary control; however, a finite set of displacements has been generated in an SE fashion in simulations. Its size has been fixed to an arbitrary value (N = 5), decided through simulations. Although an SE technique is used, the proposed solution does not consider any complexity reduction such as early termination.
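As an illustration of what generating a finite set of candidates "in an SE fashion" can look like, the sketch below orders integer candidates by increasing distance from a search center. It is a minimal sketch under assumptions: the one-dimensional zigzag is the classical Schnorr-Euchner ordering, while the complex variant simply sorts a small window of Gaussian integers by distance (a brute-force stand-in, not necessarily the authors' implementation). The function names and the `window` parameter are hypothetical.

```python
def se_zigzag(center, count):
    """First `count` integers in Schnorr-Euchner (zigzag) order around a real center."""
    first = round(center)
    step = 1 if center >= first else -1  # side of the second-closest integer
    out = [first]
    offset = 1
    while len(out) < count:
        out.append(first + step * offset)
        if len(out) < count:
            out.append(first - step * offset)
        offset += 1
    return out

def nearest_gaussian_integers(center, count, window=3):
    """`count` Gaussian integers closest to a complex center (brute-force sort).

    The window must be large enough to contain the `count` nearest points;
    window=3 is ample for small neighborhoods such as N = 5.
    """
    base_re, base_im = round(center.real), round(center.imag)
    grid = [complex(base_re + a, base_im + b)
            for a in range(-window, window + 1)
            for b in range(-window, window + 1)]
    grid.sort(key=lambda p: abs(p - center))
    return grid[:count]

print(se_zigzag(0.3, 5))
print(nearest_gaussian_integers(0.3 + 0.2j, 5))
```

No early termination or boundary control is applied here, in line with the description above: the set size is fixed in advance.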
As shown in Table 3, the computational complexities of the RDN LRA-ZF/MMSE-(O)SIC FNSA detectors do not depend on the constellation order log_{2}{M}. This can be checked in the numerical applications in Table 4, and it is the key advantage of the paper over classical techniques for high-order modulations such as 16(64)-QAM. The SNR losses compared to ML are given in Table 4. They have been measured for an uncoded BER of 10^{−4} in the case of the ML decoder. For all the configurations given in Table 4, the numerical application of the corresponding computational complexity is given in Table 5 for an RDN size N = 5.
Even if the proposed solution is two times more complex in the QPSK case, it offers near-ML performance and in particular an SNR gain of 0.3 dB at a BER of 10^{−4}. The interesting point concerns higher order modulations: starting from the 16-QAM modulation, the estimated complexity of the proposed solution is ten times lower than that of the classical one, for the same performance. The same conclusions hold for a 64-QAM modulation, in which case the complexity gain increases substantially, reaching a factor of one hundred. Similarly, the numerical application of the 16-QAM extension complexity is given in Table 6. As an example, in the case of 16-QAM modulations, the computational complexities read \( 8MK{n}_{\mathrm{T}}^2+4MK{n}_{\mathrm{T}}-4MK+3M \) for the ODN equivalent MMSE-(O)SIC and \( 8N \min \left\{K,N\right\}{n}_{\mathrm{T}}^2+60 \min \left\{K,N\right\}{n}_{\mathrm{T}}+4N \min \left\{K,N\right\}{n}_{\mathrm{T}}-4N \min \left\{K,N\right\}+24 \min \left\{K,N\right\}{n}_{\mathrm{T}}^2+8 \min \left\{K,N\right\}{n}_{\mathrm{R}}{n}_{\mathrm{T}}+2 \min \left\{K,N\right\}{n}_{\mathrm{R}}+16{n}_{\mathrm{T}}^2-32 \min \left\{K,N\right\}+2N \) for the RDN equivalent LRA-MMSE-(O)SIC, with M = 4 since a QPSK modulation is considered in this case. As depicted in Table 6, the computational complexity of the 16-QAM extension that respects the constant modulus criterion is higher than that of the straightforward but not strictly correct solution. Since no significant gain is provided, we conclude that the extension offers little advantage.
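The two closed-form expressions quoted above can be evaluated numerically with a few lines of code. This is a sketch under one assumption: where the extracted text shows two terms run together without an operator, the missing operator is taken to be a minus sign. The function names `odn_mul` and `rdn_mul` are hypothetical, and the printed values are only evaluations of these formulas, not entries of the paper's tables.

```python
def odn_mul(M, K, n_t):
    """MUL count of the ODN equivalent MMSE-(O)SIC expression quoted in the text."""
    return 8 * M * K * n_t**2 + 4 * M * K * n_t - 4 * M * K + 3 * M

def rdn_mul(N, K, n_t, n_r):
    """MUL count of the RDN equivalent LRA-MMSE-(O)SIC expression quoted in the text."""
    k = min(K, N)
    return (8 * N * k * n_t**2 + 60 * k * n_t + 4 * N * k * n_t - 4 * N * k
            + 24 * k * n_t**2 + 8 * k * n_r * n_t + 2 * k * n_r
            + 16 * n_t**2 - 32 * k + 2 * N)

# The ODN expression scales linearly with M; the RDN expression has no M at all.
for n_t in (4, 8):
    odn = [odn_mul(M, K=2, n_t=n_t) for M in (4, 16, 64)]
    rdn = rdn_mul(N=5, K=2, n_t=n_t, n_r=n_t)
    print(n_t, odn, rdn)
```

Every term of the ODN expression is proportional to M, so it grows with the constellation size, whereas the RDN expression depends only on N, K, and the antenna counts.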
Figure 10 shows the “measured” complexity of all solutions explored in this work versus the constellation size, expressed in terms of the exponent (in base 10) of the computational complexity in MUL, for n _{R} = n _{T} = 8 and K = 2. This figure shows, as explained earlier, that the complexity of the proposed solution is independent of the constellation size. This is crucial for future large MIMO systems exploiting large dimensions. Figure 11 is in line with the previous conclusion. It provides the computational complexity of the different MIMO detection solutions, expressed as a function of the number of antennas. This figure shows that the proposed solution is almost ten times less complex than the classical K-best solutions. Moreover, it presents almost equal complexity for n _{T} ≥ 32, yielding another important characteristic for large MIMO decoding.
Finally, to give a concrete example, Table 6 compares the ODN and RDN cases. It shows that the proposed solution offers an advantage over existing solutions when applied to any OFDM standard supporting the MIMO spatial-multiplexing mode, e.g., IEEE 802.16, IEEE 802.11, 3GPP LTE, and 3GPP LTE-A. It may be advantageously considered in the case of a large number of antennas and consequently in the case of the 3GPP LTE-A standard. The main advantages reside in the following points:

▪The equivalent expression of the LRA-MMSE-centered SD, which corresponds to an efficient LRA-MMSE-OSIC Babai point, improves the performance or reduces the complexity of the detector.

▪The proposed (S)QRD formulation with reduced domain neighborhood induces the use of the best known hard detector as a Babai point, for both large numbers of antennas and high-order modulations.

▪The proposed expression is robust by nature to any search center and constellation order and offers close-to-optimal performance for large K. Likewise, the proposed solution offers a computational complexity that is independent of the constellation order, and it therefore outperforms classical SD techniques at a reasonable computational complexity in the case of high-order constellations. For instance, the neighborhood study size K has been reduced to K = 2 for a 16-QAM modulation compared to classical SD techniques.
Conclusions
In this paper, the LRA-MMSE-centered SD has been proposed with a K-best neighborhood generation. A detailed, hardware implementation-oriented computational complexity estimation has been provided and combined with performance results. It has been shown that the proposed detection technique outperforms the existing solutions. In particular, the corresponding implementation complexity has been shown to be independent of the constellation size and polynomial in the number of antennas while reaching the ML performance with both real and perfect channel estimation. This implies a ten times lower computational complexity compared to the classical K-best, even for a large MIMO system, with 16-QAM modulation on each layer.
Notes
It is worth mentioning that, with respect to our previous work in [1], this paper presents a detailed technical description of the proposed methodology, a detailed complexity analysis, and more results. This particularly includes a step by step implementation of the proposed algorithm in Appendix 4.
References
S Aubert, Y Nasser, F Nouvel, Lattice reduction-aided minimum mean square error K-best detection for MIMO systems, in Proc. of the International Conference on Computing, Networking and Communications (ICNC), 2012, pp. 1066–1070
F Rusek, D Persson, BK Lau, EG Larsson, TL Marzetta, O Edfors, F Tufvesson, Scaling up MIMO: opportunities and challenges with very large arrays. IEEE Signal Processing Magazine 30(1), 40–46 (2013)
EG Larsson, F Tufvesson, O Edfors, TL Marzetta, Massive MIMO for next generation wireless systems. IEEE Commun. Mag. 52(2), 186–195 (2014)
Y Kong, Q Zhou, X Ma, Lattice reduction aided transceiver design for multiuser MIMO downlink transmissions, in Proc. of the IEEE Military Communications Conference (MILCOM), 2014, pp. 556–562
KA Singhal, T Datta, A Chockalingam, Lattice reduction aided detection in large-MIMO systems, in Proc. of the IEEE 14th Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2013, pp. 594–598
E Zimmermann, G Fettweis, Linear MIMO receivers vs. tree search detection: a performance comparison overview, in Proc. of the IEEE Personal Indoor and Mobile Radio Communications (PIMRC), 2006, pp. 1–7
N Prasad, MK Varanasi, Analysis of decision feedback detection for MIMO Rayleigh-fading channels and the optimization of power and rate allocations. IEEE Transactions on Information Theory 50(6), 1009–1025 (2004)
R Xu, FCM Lau, Performance analysis for MIMO systems using zero forcing detector over fading channels. IEE Proceedings on Communications 153(1), 74–80 (2006)
Y Nasser, JF Hélard, M Crussière, System level evaluation of innovative coded MIMO-OFDM systems for broadcasting digital TV. EURASIP International Journal of Digital Multimedia Broadcasting 2008(359206), 12 (2008)
E Viterbo, J Boutros, A universal lattice code decoder for fading channels. IEEE Transactions on Information Theory 45(5), 1639–1642 (1999)
B Hassibi, H Vikalo, On the expected complexity of sphere decoding, in Proc. of the Asilomar Conference on Signals, Systems and Computers, 2001, pp. 1051–1055
C Schnorr, M Euchner, Lattice basis reduction: improved practical algorithms and solving subset sum problems. Mathematical Programming 66, 181–199 (1994)
Z Guo, P Nilsson, Algorithm and implementation of the K-best sphere decoding for MIMO detection. IEEE Journal on Selected Areas in Communications 24(3), 491–503 (2006)
LG Barbero, JS Thompson, A fixed-complexity MIMO detector based on the complex sphere decoder, in Proc. of the IEEE 7th Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2006, pp. 1–5
M Mohaisen, KH Chang, On improving the efficiency of the fixed-complexity sphere decoder, in Proc. of the IEEE 70th Vehicular Technology Conference Fall (VTC 2009-Fall), 2009, pp. 1–5
Y Ding, Y Wang, JF Diouris, Z Yao, Robust fixed-complexity sphere decoders for rank-deficient MIMO systems. IEEE Trans. Wireless Commun. 12(9), 4297–4305 (2013)
J Fink, S Roger, A Gonzalez, V Almenar, VM Garcia, Complexity assessment of sphere decoding methods for MIMO detection, in Proc. of the IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), 2009, pp. 9–14
C Hess, M Wenk, A Burg, P Luethi, C Studer, N Felber, W Fichtner, Reduced-complexity MIMO detector with close-to ML error rate performance, in Proc. of the GLSVLSI, 2007, pp. 200–203
D Wuebben, R Bohnke, V Kuhn, KD Kammeyer, MMSE-based lattice-reduction for near-ML detection of MIMO systems, in Proc. of the ITG Workshop on Smart Antennas, 2004, pp. 106–113
S Roger, A Gonzalez, V Almenar, AM Vidal, Lattice-reduction-aided K-best MIMO detector based on the channel matrix condition number, in Proc. of the 4th International Symposium on Communications, Control and Signal Processing (ISCCSP), 2010, pp. 1–4
CF Liao, YH Huang, Cost reduction algorithm for 8x8 lattice reduction-aided K-best MIMO detector, in Proc. of the IEEE International Conference of Signal Processing, Communication and Computing, 2012, pp. 186–190
XF Qi, K Holt, A lattice-reduction-aided soft demapper for high-rate coded MIMO-OFDM systems. IEEE Signal Processing Letters 14(5), 305–308 (2007)
M Shabany, PG Gulak, The application of lattice-reduction to the K-best algorithm for near-optimal MIMO detection, in Proc. of the IEEE International Symposium on Circuits and Systems (ISCAS), 2008, pp. 316–319
JC Marinello, T Abrao, Lattice reduction aided detector for dense MIMO via ant colony optimization, in Proc. of the IEEE Wireless Communications and Networking Conference (WCNC), Shanghai, 2013, pp. 2839–2844
LG Barbero, JS Thompson, A fixed-complexity MIMO detector based on the complex sphere decoder, in Proc. of the Workshop on Signal Processing Advances for Wireless Communications, 2006, pp. 1–5
E Agrell, T Eriksson, A Vardy, K Zeger, Closest point search in lattices. IEEE Transactions on Information Theory 48(8), 2201–2214 (2002)
KW Wong, CY Tsui, SK Cheng, WH Mow, A VLSI architecture of a K-best lattice decoding algorithm for MIMO channels, in Proc. of the IEEE International Symposium on Circuits and Systems, vol. 3, 2002, pp. 273–276
E Viterbo, J Boutros, A universal lattice code decoder for fading channels. IEEE Transactions on Information Theory 45(5), 1639–1642 (1999)
S Aubert, M Mohaisen, From linear equalization to lattice-reduction-aided sphere-detector as an answer to the MIMO detection problematic in spatial multiplexing systems. Vehicular Technologies, 9789537619XX, INTECH, (2011)
BA Lamacchia, Basis reduction algorithms and subset sum problems. Technical report, MSc Thesis, Massachusetts Institute of Technology, 1991
AK Lenstra, HW Lenstra, L Lovász, Factoring polynomials with rational coefficients. Mathematische Annalen 261(4), 515–534 (1982)
M Seysen, Simultaneous reduction of a lattice basis and its reciprocal basis. Combinatorica 13(3), 363–376 (1993)
D Wübben, R Böhnke, V Kühn, KD Kammeyer, Near-maximum-likelihood detection of MIMO systems using MMSE-based lattice-reduction, in Proc. of the IEEE International Conference on Communications, vol. 2, 2004, pp. 798–802
S Roger, A Gonzalez, V Almenar, AM Vidal, On decreasing the complexity of lattice-reduction-aided K-best MIMO detectors, in Proc. of the European Signal Processing Conference, 2009, pp. 2411–2415
B Gestner, W Zhang, X Ma, DV Anderson, VLSI implementation of a lattice reduction algorithm for low-complexity equalization, in Proc. of the IEEE International Conference on Circuits and Systems for Communications, 2008, pp. 643–647
T Cui, C Tellambura, An efficient generalized sphere decoder for rank-deficient MIMO systems. IEEE Communications Letters 9(5), 423–425 (2005)
L Wang, L Xu, S Chen, L Hanzo, MMSE soft-interference-cancellation aided iterative center-shifting K-best sphere detection for MIMO channels, in Proc. of the IEEE International Conference on Communications, 2008, pp. 3819–3823
J Jalden, B Ottersten, On the complexity of sphere decoding in digital communications. IEEE Transactions on Signal Processing 53(4), 1474–1484 (2005)
X Wang, Z He, K Niu, W Wu, X Zhang, An improved detection based on lattice reduction in MIMO systems, in Proc. of the IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), 2006, pp. 1–5
C Studer, A Burg, H Bolcskei, Soft-output sphere decoding: algorithms and VLSI implementation. IEEE Journal on Selected Areas in Communications 26(2), 290–300 (2008)
W Zhang, X Ma, Approaching optimal performance by lattice-reduction aided soft detectors, in Proc. of the 41st Annual Conference on Information Sciences and Systems (CISS), 2007, pp. 818–822
M Pohst, On the computation of lattice vectors of minimal length, successive minima and reduced basis with applications. ACM SIGSAM Bull. 15, 37–44 (1981)
Acknowledgements
This paper was partially presented in [1].
Additional information
An erratum to this article is available at http://dx.doi.org/10.1186/s136380150478z.
Appendices
Appendix 1: the constant modulus constraint in (6)
The authors of [42] discussed the constant modulus constraint of x in (6) when n _{T} is large. It has been shown that the constant modulus signal assumption becomes an assumption on the average over the n _{T} entries x _{ i }. Figure 12 presents the probability density functions (PDF) of x ^{H} x for different numbers of transmit antennas and different modulations. This figure shows that, due to the weak law of large numbers, the term is approximately Gaussian and centered on a mean value that is constant in time. Consequently, the assumption may still be considered as fulfilled as n _{T} increases. It is worth mentioning that, in order to make (6) strictly equivalent to the ML metric, any M-QAM constellation may be represented as a weighted sum of QPSK constellations:
where x ^{(M-QAM)} is an n _{T}-symbol vector whose entries all belong to an M-QAM constellation and \( {\boldsymbol{x}}_i^{\left(\mathrm{QPSK}\right)} \) is an n _{T}-symbol vector whose entries all belong to a QPSK constellation.
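For reference, one standard way of writing such a decomposition is sketched here. It assumes a square M-QAM constellation with unnormalised entries on the odd-integer grid and QPSK entries in \( \left\{\pm 1\pm j\right\} \); any scaling factor needed for unit average power is omitted, so the weights may differ from the authors' by a constant:

\[ {\boldsymbol{x}}^{\left(M\text{-}\mathrm{QAM}\right)}=\sum_{i=0}^{\frac{1}{2}{\log}_2M-1}{2}^i\,{\boldsymbol{x}}_i^{\left(\mathrm{QPSK}\right)} \]

For 16-QAM this gives \( {\boldsymbol{x}}^{\left(16\text{-}\mathrm{QAM}\right)}={\boldsymbol{x}}_0^{\left(\mathrm{QPSK}\right)}+2{\boldsymbol{x}}_1^{\left(\mathrm{QPSK}\right)} \), whose real and imaginary parts take the values \( \pm 1\pm 2\in \left\{\pm 1,\pm 3\right\} \), as expected.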
Appendix 2: proof of Equation (8)
Let us introduce any term \( c \) such that \( \parallel \boldsymbol{y}-\boldsymbol{H}\boldsymbol{x}{\parallel}^2+c=\parallel \boldsymbol{U}\left(\tilde{\boldsymbol{x}}-\boldsymbol{x}\right){\parallel}^2 \), where \( \tilde{\boldsymbol{x}} \) is any (ZF or MMSE) unconstrained linear estimate:
\[ c={\boldsymbol{y}}^H\left(\boldsymbol{H}{\boldsymbol{G}}^{-1}{\boldsymbol{H}}^H-\boldsymbol{I}\right)\boldsymbol{y}+{\boldsymbol{x}}^H\left(\boldsymbol{G}-{\boldsymbol{H}}^H\boldsymbol{H}\right)\boldsymbol{x}, \]
by introducing \( \tilde{\boldsymbol{x}}={\boldsymbol{G}}^{-1}{\boldsymbol{H}}^H\boldsymbol{y} \) and \( {\tilde{\boldsymbol{x}}}^H={\boldsymbol{y}}^H\boldsymbol{H}{\boldsymbol{G}}^{-1} \) and where G = U ^{H} U = H ^{H} H in the ZF case and G = H ^{H} H + σ ^{2} I in the MMSE case.
In the ZF case, HG ^{− 1} H ^{H} = HH ^{− 1}(H ^{H})^{− 1} H ^{H} = I and G − H ^{H} H = 0, consequently c = 0 which is a constant term.
In the MMSE case, c = y ^{H}[H(H ^{H} H + σ ^{2} I)^{− 1} H ^{H} − I]y + σ ^{2} x ^{H} x which is a constant term in x iff the signal x entries are of constant modulus.
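The intermediate steps can be made explicit as follows. This is a sketch reconstructed from the definitions above (\( \tilde{\boldsymbol{x}}={\boldsymbol{G}}^{-1}{\boldsymbol{H}}^H\boldsymbol{y} \), with \( \boldsymbol{G}={\boldsymbol{U}}^H\boldsymbol{U} \) Hermitian so that \( {\tilde{\boldsymbol{x}}}^H\boldsymbol{G}={\boldsymbol{y}}^H\boldsymbol{H} \)), not a reproduction of the authors' displayed equation:

\[
\begin{aligned}
\parallel \boldsymbol{U}\left(\tilde{\boldsymbol{x}}-\boldsymbol{x}\right){\parallel}^2 &= {\tilde{\boldsymbol{x}}}^H\boldsymbol{G}\tilde{\boldsymbol{x}}-{\tilde{\boldsymbol{x}}}^H\boldsymbol{G}\boldsymbol{x}-{\boldsymbol{x}}^H\boldsymbol{G}\tilde{\boldsymbol{x}}+{\boldsymbol{x}}^H\boldsymbol{G}\boldsymbol{x} \\
&= {\boldsymbol{y}}^H\boldsymbol{H}{\boldsymbol{G}}^{-1}{\boldsymbol{H}}^H\boldsymbol{y}-{\boldsymbol{y}}^H\boldsymbol{H}\boldsymbol{x}-{\boldsymbol{x}}^H{\boldsymbol{H}}^H\boldsymbol{y}+{\boldsymbol{x}}^H\boldsymbol{G}\boldsymbol{x}, \\
\parallel \boldsymbol{y}-\boldsymbol{H}\boldsymbol{x}{\parallel}^2 &= {\boldsymbol{y}}^H\boldsymbol{y}-{\boldsymbol{y}}^H\boldsymbol{H}\boldsymbol{x}-{\boldsymbol{x}}^H{\boldsymbol{H}}^H\boldsymbol{y}+{\boldsymbol{x}}^H{\boldsymbol{H}}^H\boldsymbol{H}\boldsymbol{x}.
\end{aligned}
\]

Subtracting the second expression from the first cancels the cross terms and leaves \( {\boldsymbol{y}}^H\left(\boldsymbol{H}{\boldsymbol{G}}^{-1}{\boldsymbol{H}}^H-\boldsymbol{I}\right)\boldsymbol{y}+{\boldsymbol{x}}^H\left(\boldsymbol{G}-{\boldsymbol{H}}^H\boldsymbol{H}\right)\boldsymbol{x} \). With \( \boldsymbol{G}-{\boldsymbol{H}}^H\boldsymbol{H}={\sigma}^2\boldsymbol{I} \) the last term becomes \( {\sigma}^2{\boldsymbol{x}}^H\boldsymbol{x} \), which is the only part of the MMSE expression that depends on x.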
Appendix 3: proof of Equation (12)
The proof of Equation (12) is very similar to the proof of (8); however, in this appendix, we work on the LRA-based detector. Let us introduce any term \( c^{\prime } \) such that \( \parallel \boldsymbol{y}-\tilde{\boldsymbol{H}}\boldsymbol{z}{\parallel}^2+c^{\prime }=\parallel \tilde{\boldsymbol{U}}\left(\tilde{\boldsymbol{z}}-\boldsymbol{z}\right){\parallel}^2 \), where \( \tilde{\boldsymbol{z}} \) is any LRA (ZF or MMSE) unconstrained linear estimate:
\[ c^{\prime }={\boldsymbol{y}}^H\left(\tilde{\boldsymbol{H}}{\tilde{\boldsymbol{G}}}^{-1}{\tilde{\boldsymbol{H}}}^H-\boldsymbol{I}\right)\boldsymbol{y}+{\boldsymbol{z}}^H\left(\tilde{\boldsymbol{G}}-{\tilde{\boldsymbol{H}}}^H\tilde{\boldsymbol{H}}\right)\boldsymbol{z}, \]
by introducing \( \tilde{\boldsymbol{z}}={\tilde{\boldsymbol{G}}}^{-1}{\tilde{\boldsymbol{H}}}^H\boldsymbol{y} \) and \( {\tilde{\boldsymbol{z}}}^H={\boldsymbol{y}}^H\tilde{\boldsymbol{H}}{\tilde{\boldsymbol{G}}}^{-1} \),
where \( \tilde{\boldsymbol{G}}={\tilde{\boldsymbol{H}}}^H\tilde{\boldsymbol{H}} \) in the LRA-ZF case and \( \tilde{\boldsymbol{G}}={\tilde{\boldsymbol{H}}}^H\tilde{\boldsymbol{H}}+{\sigma}^2{\boldsymbol{T}}^H\boldsymbol{T} \) in the LRA-MMSE case.
In the ZF case, \( \tilde{\boldsymbol{H}}{\tilde{\boldsymbol{G}}}^{-1}{\tilde{\boldsymbol{H}}}^H=\tilde{\boldsymbol{H}}{\tilde{\boldsymbol{H}}}^{-1}{\left({\tilde{\boldsymbol{H}}}^H\right)}^{-1}{\tilde{\boldsymbol{H}}}^H=\boldsymbol{I} \) and \( \tilde{\boldsymbol{G}}-{\tilde{\boldsymbol{H}}}^H\tilde{\boldsymbol{H}}=0 \), consequently c′ = 0 is a constant term.
In the MMSE case, \( c^{\prime }={\boldsymbol{y}}^H\left[\tilde{\boldsymbol{H}}{\left({\tilde{\boldsymbol{H}}}^H\tilde{\boldsymbol{H}}+{\sigma}^2{\boldsymbol{T}}^H\boldsymbol{T}\right)}^{-1}{\tilde{\boldsymbol{H}}}^H-\boldsymbol{I}\right]\boldsymbol{y}+{\sigma}^2{\boldsymbol{z}}^H{\boldsymbol{T}}^H\boldsymbol{T}\boldsymbol{z} \) which is a constant term in x iff the signal x entries are of constant modulus since σ ^{2} z ^{H} T ^{H} Tz = σ ^{2} x ^{H} x.
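The LRA-MMSE identity can be checked numerically on a toy example. The sketch below is an illustration under assumptions, not the authors' code: it uses a random 3 × 3 complex channel, a hand-picked unimodular matrix T (upper triangular with unit diagonal), unit-modulus QPSK symbols, and small pure-Python linear algebra helpers whose names are hypothetical. Setting T to the identity reduces it to the MMSE case of Appendix 2.

```python
import random
from itertools import product

def herm(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[complex(a) for a in row] + [complex(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def lra_mmse_terms(H, T, x, y, sigma2):
    """Return (ML metric, c', LRA-MMSE-centred metric) for a candidate x = T z."""
    Ht = matmul(H, T)                      # reduced channel H~ = H T
    z = solve(T, x)                        # reduced-domain candidate z = T^{-1} x
    TtT = matmul(herm(T), T)
    HtH = matmul(herm(Ht), Ht)
    n = len(T)
    G = [[HtH[i][j] + sigma2 * TtT[i][j] for j in range(n)] for i in range(n)]
    b = matvec(herm(Ht), y)
    z_tilde = solve(G, b)                  # unconstrained LRA-MMSE estimate
    d = [a - c for a, c in zip(z_tilde, z)]
    centred = inner(d, matvec(G, d)).real  # ||U~ (z~ - z)||^2 = d^H G~ d
    resid = [yi - hi for yi, hi in zip(y, matvec(Ht, z))]
    ml = inner(resid, resid).real          # ||y - H~ z||^2
    c_prime = (inner(b, z_tilde) - inner(y, y)
               + sigma2 * inner(z, matvec(TtT, z))).real
    return ml, c_prime, centred

random.seed(1)
n = 3
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)] for _ in range(n)]
T = [[1, 2, -1], [0, 1, 1], [0, 0, 1]]    # unimodular: integer entries, determinant 1
qpsk = [complex(a, b) / 2 ** 0.5 for a in (1, -1) for b in (1, -1)]
y = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

results = [lra_mmse_terms(H, T, list(x), y, 0.1) for x in product(qpsk, repeat=n)]
max_gap = max(abs(ml + c - centred) for ml, c, centred in results)
c_spread = max(r[1] for r in results) - min(r[1] for r in results)
print(max_gap, c_spread)
```

Both printed quantities should be at the level of floating-point rounding: the first confirms the equality of the two metrics up to c′ for every QPSK candidate, the second that c′ does not vary across constant-modulus candidates.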
Appendix 4: description of the proposed detection algorithm: RDN LRA-ZF-(O)SIC K-best
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Nasser, Y., Aubert, S., Nouvel, F. et al. A simplified hard output sphere decoder for large MIMO systems with the use of efficient search center and reduced domain neighborhood study. J Wireless Com Network 2015, 227 (2015). https://doi.org/10.1186/s136380150442y
DOI: https://doi.org/10.1186/s136380150442y
Keywords
 Minimum Mean Square Error
 MIMO System
 Lattice Reduction
 Zero Force
 Constellation Size