Open Access

Successive interference cancellation aided sphere decoder for multi-input multi-output systems

EURASIP Journal on Wireless Communications and Networking 2016, 2016:51

Received: 15 June 2015

Accepted: 5 February 2016

Published: 16 February 2016


In this paper, sphere decoding algorithms are proposed for both hard detection and soft processing in multi-input multi-output (MIMO) systems. Both algorithms are based on the complex tree structure to reduce the complexity of searching the unique minimum Euclidean distance and multiple Euclidean distances, and obtain the corresponding transmit symbol vectors. The novel complex hard sphere decoder for MIMO detection is presented first, and then the soft processing of a novel sphere decoding algorithm for list generation is discussed. The performance and complexity of the proposed techniques are demonstrated via simulations in terms of bit error rate (BER), the number of nodes accessed and floating-point operations (FLOPS).


Complex sphere decoder (CSD); Detection algorithms; Multi-input multi-output (MIMO); QR decomposition

1 Introduction

To achieve high spectral efficiency, maximum likelihood (ML) detection should be employed with high-order constellations. However, “brute-force” ML detection is impractical even for a system with a small number of antennas. An alternative is the sphere decoder (SD), which has attracted significant attention in the last decade because of the considerable complexity reduction it achieves [1, 2]. The key idea behind the SD is to find the lattice point closest to the received signal within a sphere of a given radius. Although the computational complexity is greatly reduced, it remains very high for systems with a large number of antennas and high-order modulation. To reduce the complexity further, the authors in [3–8] have studied different search strategies and enumeration schemes. Additionally, some suboptimal methods with linear and decision feedback equalization (DFE) [9, 10] have been proposed to approach the ML performance.

In most applications, the complex-valued system is decoupled and reformulated as an equivalent real-valued system. Real-valued SDs can only process lattice-based modulation schemes such as quadrature amplitude modulation (QAM) and pulse amplitude modulation (PAM), while other modulations such as phase shift keying (PSK) cannot be processed as efficiently, because some invalid lattice points are included in the search. Additionally, the depth of the expanded tree for real-valued SDs is twice that of their complex counterparts. Hence, the complex-valued Schnorr-Euchner SD (SE-SD) and a modified version of SE-SD were proposed in [11, 12]. The complex-valued SDs avoid the decoupling of the complex system and can be applied to different modulations without reaching invalid lattice points. In particular, the latter can achieve a very low complexity compared to other real-valued and complex-valued SDs [13]. However, the intricacy of complex SE enumeration is still a weak point that makes real-valued SDs preferred for hardware implementation. Some novel low-complexity complex enumerators have been studied in [14–16], and these enumerators are interchangeable in most complex-valued SDs. Nevertheless, the enumeration must still be employed in each detection layer and performed several times once new branches are accessed. Furthermore, the authors in [17] investigate the practical performance of a novel sphere decoder (Geosphere) for multiuser detection. A novel two-dimensional zig-zag ordering strategy has been studied that reduces the number of path metric calculations. Additionally, the lower bound of the path metric is employed to eliminate the branches if the path metric is smaller than the lower bound. Another efficient ordering and pruning scheme is studied in [18], which performs horizontal pruning and vertical pruning with a novel tight lower limit for the path metric. These two schemes could also be used in complex-valued SDs with simple modifications.

Motivated by the description above and by the probabilistic tree pruning SD (PTP-SD) [7], we devise a novel complex-valued SD (CSD) with a statistical pruning strategy (SPS) and successive interference cancellation (SIC) aided modified probabilistic tree pruning (MPTP).

In addition to hard decision type detection, soft processing for multi-input multi-output (MIMO) systems has been recently studied in several works [11, 14, 19–21]. In [11], the authors report near-capacity MIMO detection using a list-SD (LSD) with a relatively large candidate list. With a large list, the performance of the LSD is very close to that of the maximum a posteriori probability (MAP) detector. The results in [14] illustrate the hardware implementation of an LSD with four candidates, which may not be considered an implementation of an approximate MAP detector [19]. This is because the list size is too small to achieve near-capacity performance. Hence, it is not an optimum detection technique. Additionally, log-likelihood ratio (LLR) clipping is required for the LSD with a small list, because the +1 or −1 hypothesis may be missing in a particular bit LLR calculation based on the max-log MAP criterion. This can be fixed by setting a given magnitude of LLR as in [11], but the performance loss is unavoidable. Although the authors in [20, 21] have fixed this problem for a non-iterative detection decoding scheme and for non-coherent detection, these LLR clipping techniques are still relatively complicated in most cases and may not be suitable for every iterative detection decoding structure. However, a large list results in irreducible complexity, which limits the applications of the LSD, because the candidate updates in the list are difficult in hardware implementation [22]. This is not a desirable feature. If the list generation for the LSD is simple, multiple searches will not be required in the iterative processing, in contrast to the single tree search (STS) sphere decoder. In the following, we will discuss a simple list generation for the LSD given the proposed CSD.

The main contribution of this paper is summarized as follows:
  • An efficient CSD has been proposed that approaches the linear complexity for practical large-scale MIMO systems. This is because the proposed CSD significantly reduces the required number of times for performing enumeration and the span of the detection tree.

  • Due to the additional conditions (ACs), the performance loss of the conventional CSD is compensated.

  • The search strategy of the proposed CSD can be easily extended to the LSD with lower complexity.

  • The scatter list generation and the ML ordering accelerate the construction of the list and make the LSD more suitable for the parallel hardware implementation.

Due to the requirement of enumerators, the SDs we discuss in this work are all based on complex-valued SE enumeration in [11, 12] for simplicity, namely computation of coordinate bound (CCB) enumeration.

The rest of the paper is organized as follows. Section 2 presents the system model and problem formulation. Section 3 describes the proposed complex sphere decoder and the algorithm table. The soft processing with LSD is also discussed in Section 4. In Section 5, the simulation results demonstrate the complexity and BER performance of CSD and LSD, respectively. A conclusion is drawn in Section 7.

2 System model

The channels between transmit and receive antennas are assumed to be independent frequency-flat fading channels. Here, \(\mathbb {C}_{N_{r} \times N_{t}}\) denotes the set of complex-valued matrices with vertical dimension \(N_{r}\) and horizontal dimension \(N_{t}\). The channel can be represented by \(\mathbf {H} \in \mathbb {C}_{N_{r} \times N_{t}}\). Defining the transmit symbol vector \(\mathbf {s}=[s_{1} \ldots s_{i} \ldots s_{N_{t}}]^{T} \in \mathbb {C}_{N_{t} \times 1}\), the received signal \(\mathbf {y}=[y_{1} \ldots y_{j} \ldots y_{N_{r}}]^{T} \in \mathbb {C}_{N_{r} \times 1}\), and the AWGN noise vector \(\mathbf {v}= [v_{1} \ldots v_{j} \ldots v_{N_{r}}]^{T} \in \mathbb {C}_{N_{r} \times 1}\), the MIMO system model can be written as
$$ \mathbf{y}=\mathbf{H}\mathbf{s}+\mathbf{v}, \tag{1} $$
where \(\mathbb {E}\lbrace \mathbf {s}\mathbf {s}^{H} \rbrace ={\sigma ^{2}_{s}}\mathbf {I}_{N_{t}\times N_{t}}\). Note that each element in the noise vector is assumed to be a zero-mean circularly symmetric complex Gaussian variable, which implies that a phase rotation of \(\mathbf {v}\) does not affect its statistical properties, and \(\mathbb {E}\lbrace \mathbf {v}\mathbf {v}^{H} \rbrace ={\sigma ^{2}_{v}}\mathbf {I}_{N_{r} \times N_{r}}\). After QR decomposition of the channel matrix, (1) can be reformulated as
$$ \mathbf{z}=\mathbf{R}\mathbf{s}+\tilde{\mathbf{v}}, \tag{2} $$
where
$$ \mathbf{R} = \left[ \begin{array}{cccc} r_{N_{r},N_{t}} & r_{N_{r},N_{t}-1} & \ldots & r_{N_{r},1} \\ 0 & r_{N_{t}-1,N_{r}-1} & \ldots & \vdots \\ \vdots & \ddots & \ldots & \vdots \\ \vdots & \vdots & r_{2,2} &\vdots\\ 0 & 0 & \ldots & r_{1,1} \\ \end{array} \right], \tag{3} $$
the receive signal vector is \(\mathbf {z}=\mathbf {Q}^{H}\mathbf {y}\), and the noise vector is \(\tilde {\mathbf {v}} = \mathbf {Q}^{H}\mathbf {v}\). Because the random variables in the vectors \(\tilde {\mathbf {v}}\) and \(\mathbf {v}\) have the same statistical properties, we use the same notation \(\mathbf {v}\) for both in the remainder of the paper. Note that we assume \(N_{t} \leq N_{r}\) throughout this paper. However, for rank-deficient MIMO systems, the authors in [23] and [24] proposed new forms of the Cholesky and QR decomposition to exploit the performance advantage of SDs for overloaded MIMO systems (\(N_{t} > N_{r}\)). The algorithms discussed in this paper could also be naturally extended to rank-deficient MIMO systems with such modified preprocessing techniques. To clarify the summation of the Euclidean distance over multiple layers, the full path metric (FPM) and the partial path metric (PPM) are defined as
$$ p_{m}=\sum\limits_{i=1}^{m}B_{i}, \tag{4} $$
where the branch metric is
$$ B_{i}=\left\vert y_{i}-r_{i,i}s_{i}-\sum\limits_{j=1}^{i-1}r_{i,j}s_{j} \right\vert^{2}. \tag{5} $$

For the \(i\)th detection layer, only the first \(i\) elements in the \(i\)th row of the matrix \(\mathbf {R}\) are used for the branch metric calculation in Eq. (5). In other words, the FPM \(p_{N_{t}}\) corresponds to the Euclidean distance accumulated over all detection layers, and the PPM \(p_{m}\) is the Euclidean distance accumulated over a subset of the detection layers. Note that \(p_{N_{t}}\) denotes the FPM when \(m=N_{t}\), and \(p_{m}\) denotes the PPM when \(m<N_{t}\).
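The definitions in (4) and (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the matrix is stored so that row i holds \(r_{i,1},\ldots,r_{i,i}\), matching the index convention of Eq. (5) rather than any particular QR routine.

```python
def branch_metric(i, y, R, s):
    """B_i of Eq. (5) for layer i (1-based), using the symbols s_1..s_i."""
    row = R[i - 1]
    # Interference from the previously considered layers 1..i-1.
    interference = sum(row[j] * s[j] for j in range(i - 1))
    return abs(y[i - 1] - row[i - 1] * s[i - 1] - interference) ** 2


def path_metric(m, y, R, s):
    """p_m of Eq. (4): the PPM for m < N_t and the FPM for m = N_t."""
    return sum(branch_metric(i, y, R, s) for i in range(1, m + 1))


# Hypothetical two-layer example (values chosen only for illustration).
R = [[2 + 0j, 0j],
     [1 + 1j, 1 + 0j]]
s = [1 + 0j, -1 + 0j]
y = [2 + 0j, 0 + 2j]
ppm = path_metric(1, y, R, s)   # first layer only
fpm = path_metric(2, y, R, s)   # all layers
```

In this toy case the first layer is matched exactly, so the PPM is zero and the FPM equals the branch metric of the second layer.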

3 Complex sphere decoder with successive interference cancellation-based tree pruning

The proposed CSD is based on the CCB enumeration scheme reviewed below and operates on the triangularized system model (2). Furthermore, the minimum mean square error-sorted QR-decomposition (MMSE-SQRD) is used as the preprocessing technique; several efficient QR decomposition methods are provided in [2]. For a fair comparison between different CSDs, the most promising decomposition method (SQRD) with the optimum signal to interference plus noise ratio (SINR) ordering is used for preprocessing to obtain the upper-triangular matrix \(\mathbf {R}\). Thus, the CSDs with MMSE-SQRD can achieve the ML performance at very low complexity.

3.1 The review of CCB

This subsection briefly reviews a complex-valued enumeration, namely CCB or complex SE enumeration. This bound was first proposed in [11], and an improved version was presented in [12], which separates the constellation points into groups located on one or multiple concentric rings and computes the phase bound based on the current sphere radius, the previously detected symbols \(\hat {s}_{k}\), and \(r_{i,k}\). The constellation points can then be tested against the bound to determine whether they lie in the circle around the nulling-cancelling point, which is the symbol estimate of the SIC. The constellation points \({s^{m}_{i}}=\gamma e^{j\theta _{m}}\) can be represented in polar coordinates, where \({s^{m}_{i}}\) denotes the \(m\)th candidate constellation point at layer \(i\), and \(0 \leq \theta _{m} < 2\pi \). Note that \(\gamma \) differs between concentric rings, as shown in [11]. Figure 1 illustrates that the phase bound of the constellation points for one particular concentric ring can be determined by the sphere radius. The red curves correspond to the phase bounds of concentric rings.

Fig. 1

Two examples in which constellation points are erroneously excluded. a In case I, the phases are bounded by \(\left [\frac {7\pi }{8},~-\frac {3\pi }{4}\right ]\), which is supposed to cover the constellation points between \(\left [\frac {3\pi }{2},~2\pi \right ]\), but these constellation points are excluded by the CCB because they are not inside the bound. b Case II has a similar problem: the constellation point at \(\frac {\pi }{4}\) is not in the bound \(\left [\frac {11\pi }{4},~\frac {11\pi }{6}\right ]\)

Accordingly, the nulling-cancelling point for the \(i\)th layer can be defined as
$$ \delta_{i}=\left(y_{i}-\sum\limits_{k=1}^{i-1}r_{i,k}\hat{s}_{k}\right) \bigg/ r_{i,i}, \tag{6} $$
where \(\hat {s}_{k}\) is defined as the accessed symbol at the \(k\)th detection layer in the surviving branch. With the aid of trigonometric functions, we calculate the phase bound of \(\theta _{m}\) as
$$ \cos(\theta_{m}-\theta_{\delta_{i}})=\frac{1}{2\gamma \vert \delta_{i} \vert}\left(\gamma^{2}+ \vert \delta_{i} \vert^{2}-\frac{p^{2}_{\text{SD}}}{r_{i,i}^{2}}\right)\triangleq \psi, \tag{7} $$
where \(\theta _{\delta _{i}}\) denotes the phase of the nulling-cancelling point obtained by Eq. (6), and \(p_{\text {SD}}\) denotes the sphere radius. If the previously detected symbols \(\hat {s}_{k}\) are correct, the equivalent sphere radius \(\frac {p_{\text {SD}}}{r_{i,i}}\) can be used to compute the phase bound (red curves) with the law of cosines \(\cos \theta =\frac {a^{2}+b^{2}-c^{2}}{2ab}\), where \(a=\gamma \), \(b=\vert \delta _{i} \vert \), and \(c=\frac {p_{\text {SD}}}{r_{i,i}}\). The vector of possible candidates \(\tilde {\mathbf {s}}_{i}\) in layer \(i\) for a given concentric ring can be categorized as
$$ \tilde{\mathbf{s}}_{i}=\left\lbrace \begin{array}{ll} \emptyset, & \psi>1,\\ {s^{m}_{i}},~ m=1,2,\ldots,M, & \psi<-1, \\ {s^{m}_{i}},~ \theta_{m} \in [\theta_{\delta_{i}}-\arccos(\psi),\theta_{\delta_{i}}+\arccos(\psi)], & -1 \leq \psi \leq 1,\\ \end{array} \right. \tag{8} $$

where \(0 \leq \arccos (\psi) \leq \pi \). From (8), no constellation points on a given concentric ring are included in the candidates if \(\psi >1\), which implies that the phase bound is too small to cover any constellation point except \(\delta _{i}\). For \(\psi <-1\), the corresponding phase bound \([\theta _{\delta _{i}}-\pi, \theta _{\delta _{i}}+\pi ]\) includes all constellation points on the ring. For \(-1 \leq \psi \leq 1\), only the constellation points inside the bound are used for the search. However, the phase bound described above may eliminate some candidates that should be included in the search. This is because the phases of the constellation points lie between 0 and \(2\pi \), whereas the computed phase bound may not be located within \([0,2\pi ]\). Thus, the mismatch between the phases of the constellation points and the phase bound must be fixed to avoid missing candidates.
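A minimal Python sketch of the classification in (7) and (8) may help make the three cases concrete. The function name and return convention are hypothetical; the special handling of \(\delta _{i}=0\), where (7) is undefined, is an assumption of this sketch rather than something stated in the text.

```python
import cmath
import math


def ccb_phase_bound(delta, radius, r_ii, gamma):
    """Classify one concentric ring of magnitude gamma according to Eq. (8).

    Returns ("none", None), ("all", None) or ("bounded", (lower, upper)),
    where the bound is centred on the phase of the nulling-cancelling point.
    """
    d = abs(delta)
    c = radius / abs(r_ii)          # equivalent radius p_SD / r_ii
    if d == 0.0:                    # assumption: ring is all-in or all-out
        return ("all", None) if gamma <= c else ("none", None)
    psi = (gamma ** 2 + d ** 2 - c ** 2) / (2.0 * gamma * d)   # Eq. (7)
    if psi > 1.0:
        return "none", None
    if psi < -1.0:
        return "all", None
    half_width = math.acos(psi)
    centre = cmath.phase(delta) % (2.0 * math.pi)
    return "bounded", (centre - half_width, centre + half_width)


# Illustrative values: unit ring, nulling-cancelling point close to pi/8.
delta = 0.9 * cmath.exp(1j * math.pi / 8)
kind, bound = ccb_phase_bound(delta, radius=0.5, r_ii=1.0, gamma=1.0)
```

With these illustrative values the lower edge of the bound is negative, which is exactly the kind of mismatch with phases defined on \([0,2\pi )\) that the paragraph above warns about.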

3.2 Novel search strategy and successive interference cancellation tree pruning

In this subsection, we present three different techniques to reduce the complexity: (1) a novel search strategy with the aid of SIC; (2) SPS; (3) the MPTP algorithm.

3.2.1 Search strategy

Compared to the conventional SE-CSD, the novel search strategy first performs SIC to obtain the nulling-cancelling points and the FPM, without calculating the PPMs of the other constellation points or sorting them in each layer. The radius \(p_{\text {SD}}\) may then be updated by the FPM, i.e., \(p_{N_{t}}\), once the search reaches the bottom layer. The rest of the search is performed upwards, starting from the nulling-cancelling point of the bottom layer rather than from the top layer as in the conventional SE-CSD. Additionally, the span of the tree can be further shrunk by the MPTP. In (7), the candidates chosen by CCB are determined by the updated radius obtained by the MPTP. Hence, the number of possible candidates for each layer can be significantly reduced. The details of the proposed algorithm are described in Algorithm 1.
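The initial SIC pass can be sketched as follows. This is not Algorithm 1 itself, only a hedged illustration of the first step: the function names are hypothetical, quantization is assumed to be nearest-point slicing, and the matrix rows are assumed to follow the index convention of Eq. (5).

```python
def quantize(point, constellation):
    """Nearest constellation point (assumed slicing rule)."""
    return min(constellation, key=lambda c: abs(point - c))


def sic_pass(y, R, constellation):
    """One nulling-cancelling pass over all layers.

    Returns the detected symbols and the resulting FPM, which can serve
    as the first sphere radius.
    """
    s_hat, fpm = [], 0.0
    for i, row in enumerate(R):
        # Cancel the contribution of the symbols detected so far.
        residual = y[i] - sum(row[k] * s_hat[k] for k in range(i))
        delta = residual / row[i]                  # Eq. (6)
        s_i = quantize(delta, constellation)
        s_hat.append(s_i)
        fpm += abs(residual - row[i] * s_i) ** 2   # branch metric, Eq. (5)
    return s_hat, fpm


# Hypothetical noisy two-layer example with an unnormalised QPSK alphabet.
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
R = [[2 + 0j, 0j],
     [1 + 1j, 1 + 0j]]
y = [2.1 + 1.9j, -0.9 + 1.1j]
s_hat, radius = sic_pass(y, R, qpsk)
```

Only one constellation point per layer is evaluated in this pass, which is the source of the savings relative to sorting all candidates in every layer.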

3.2.2 Statistical pruning strategy

As discussed in [25], the complexity of the SD is significantly affected by the initial radius. However, the new search strategy does not require the search to start with an appropriate initial radius to control the tree span, because the bottom layer is reached by the SIC process. For some extreme channel conditions, there is still a small chance that the radius derived by the SIC is very large. In other words, the result obtained by the proposed search strategy may be a local optimum rather than the global one. Hence, the initial radius for CSDs must still be considered. First, we assume that the ML solution is obtained. Then, the optimum radius would be the summation of the squared noise terms \(\left (\sum _{i=1}^{N_{t}}\vert v_{i} \vert ^{2}\right)\) of the detection layers, which follows a \(\chi ^{2}\) distribution with \(N_{t}\) degrees of freedom and is upper bounded by the initial radius \(p^{\prime }_{\text {SD}}\). The normalized radius can be written as \(\beta = \frac {p'_{\text {SD}}}{{\sigma ^{2}_{v}}}\), and the normalized summation of squared noise terms can be written as \(u=\frac {\sum _{i=1}^{N_{t}}\vert v_{i} \vert ^{2}}{{\sigma ^{2}_{v}}}\), where \({\sigma ^{2}_{v}}\) is the noise variance. In the following, an appropriate value of \(\beta \) is required. The cumulative distribution function (CDF) of the random variable \(u\) can be represented as
$$ \begin{aligned} \Pr(u<\beta) &=\int_{0}^{\beta}\frac{u^{N_{t}-1}}{\Gamma(N_{t})}e^{-u}du\\ &=1-\epsilon, \end{aligned} \tag{9} $$

where \(\epsilon \) is the pre-defined threshold probability, chosen according to empirical results for different numbers of antennas of the MIMO system, and \(\beta \) can be easily calculated by inverting (9), i.e., with the inverse incomplete Gamma function. Once \(p_{\text {SD}}>p^{\prime }_{\text {SD}}\), the quantity \(p_{\text {SD}}\) is replaced by \(p^{\prime }_{\text {SD}}\) at the bottom layer, without performing the multiple searches required in [25].
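The inversion of (9) can be sketched without any special-function library. The sketch below relies on the fact that, for an integer shape \(N_{t}\), the integral in (9) has the closed form \(1-e^{-\beta }\sum _{k=0}^{N_{t}-1}\beta ^{k}/k!\), and it inverts that expression by bisection; the function names and the tolerance are hypothetical choices, not part of the original method.

```python
import math


def gamma_cdf_int(x, a):
    """Integral in Eq. (9) for integer shape a >= 1 (closed form)."""
    term, total = 1.0, 1.0
    for k in range(1, a):
        term *= x / k
        total += term
    return 1.0 - math.exp(-x) * total


def gamma_cdf_inverse(prob, a, tol=1e-10):
    """Bisection inverse of gamma_cdf_int for 0 < prob < 1."""
    lo, hi = 0.0, 1.0
    while gamma_cdf_int(hi, a) < prob:   # bracket the solution
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma_cdf_int(mid, a) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sps_radius(n_t, noise_var, eps):
    """Initial radius p'_SD = sigma_v^2 * beta with Pr(u < beta) = 1 - eps."""
    beta = gamma_cdf_inverse(1.0 - eps, n_t)
    return noise_var * beta


# Example call with the simulation setting eps = 0.001 and N_t = 8.
radius = sps_radius(8, noise_var=1.0, eps=0.001)
```

A library routine for the inverse regularized incomplete Gamma function could replace the bisection; the pure-Python version is used here only to keep the sketch self-contained.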

3.2.3 Modified probabilistic tree pruning

We assume that the symbols of the remaining \(N_{t}-m\) layers are perfectly detected in (5). Then, the PPM is only affected by the noise. Hence, the current PPM \(p_{m}\) plus the norm of the remaining layers’ noise \(\sum _{i=m+1}^{N_{t}}\vert v_{i} \vert ^{2}\) must be smaller than the radius in most cases. Hence, the possible FPM can be represented as
$$ p_{m}+\sum\limits_{i=m+1}^{N_{t}}\vert v_{i} \vert^{2} \leq p_{\text{SD}}, \tag{10} $$
where \(p_{m}= \sum _{i=1}^{m}B_{i}\). Since \(\sum _{i=m+1}^{N_{t}}\vert v_{i} \vert ^{2}/{\sigma ^{2}_{v}} \sim \chi ^{2}\) with \(2(N_{t}-m)\) degrees of freedom [7, 25], the noise term after some manipulations can be given by
$$ \sum\limits_{i=m+1}^{N_{t}}\vert v_{i} \vert^{2}/{\sigma^{2}_{v}} \leq \left(p_{\text{SD}} - p_{m}\right)/{\sigma^{2}_{v}}. \tag{11} $$
Because the sphere radius \(p_{\text {SD}}\) is sufficiently large to avoid missing the ML solution, the value of \(\text {Pr}\left (\sum _{i=m+1}^{N_{t}}\vert v_{i} \vert ^{2}/{\sigma ^{2}_{v}}\leq (p_{\text {SD}} - p_{m})/{\sigma ^{2}_{v}}\right)\) is reasonably large for a node on the correct path. As discussed above, the summation \(\sum _{i=m+1}^{N_{t}}\vert v_{i} \vert ^{2}/{\sigma ^{2}_{v}}\) follows the chi-square distribution, so this probability is its CDF evaluated at \((p_{\text {SD}} - p_{m})/{\sigma ^{2}_{v}}\). A node is therefore regarded as unlikely to lie on the ML path when
$$ \text{Pr}\left(\sum\limits_{i=m+1}^{N_{t}}\vert v_{i} \vert^{2}/{\sigma^{2}_{v}} \leq (p_{\text{SD}} - p_{m})/{\sigma^{2}_{v}}\right)=\Xi\left((p_{\text{SD}} - p_{m})/{\sigma^{2}_{v}}; N_{t}-m\right)<\epsilon_{p}, \tag{12} $$
where \(\Xi (x;a)={\int _{0}^{x}}\frac {1}{\Gamma (a)}e^{-t}t^{a-1}dt\). Solving (12) for the PPM \(p_{m}\), a surviving node must satisfy
$$ p_{m}\leq p_{\text{SD}}-{\sigma^{2}_{v}}\Xi^{-1}\left(\epsilon_{p};N_{t}-m\right), \tag{13} $$
where \(\Xi ^{-1}(x;a)\) is the inverse of \(\Xi (x;a)\), and \(\epsilon _{p}\) is the pre-defined probability. Hence, the right-hand side (RHS) of Eq. (13) can be considered as the upper bound of the PPM at the \(m\)th detection layer. In other words, any PPM \(p_{m}\) larger than the RHS of (13) is unlikely to be on the correct path for the ML solution, so these nodes at the \(m\)th detection layer and their child nodes are eliminated from the search tree. To avoid the CCB, we introduce the quantized nulling-cancelling point \(Q(\delta _{m})\) obtained by SIC to calculate the minimum PPM for the \(m\)th layer as
$$ p^{\delta}_{m} = \left\vert y_{m}-r_{m,m}Q(\delta_{m})-\sum\limits_{j=1}^{m-1}r_{m,j}\hat{s}_{j} \right\vert^{2}+p_{m-1}, \tag{14} $$
and compare it with the threshold
$$ p^{\delta}_{m} > \rho_{m}, \tag{15} $$

where \(\rho _{m}=p_{\text {SD}}-{\sigma ^{2}_{v}}\Xi ^{-1}(\epsilon _{p};N_{t}-m)\), and \(Q(\delta _{m})\) is the quantized symbol for the given \(m\)th layer. If the inequality in (15) is satisfied, the nulling-cancelling (NC) point and the remaining nodes, together with their child nodes, are all pruned, and the CCB is not carried out. Otherwise, the quantity \(p^{\delta }_{m}\) is used in (7) to replace \(p_{\text {SD}}\) to further reduce the number of candidates. Note that the parameter \(\rho _{m}\) is pre-computed before the start of the transmission.
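As a worked illustration of the threshold, consider a layer with a single remaining layer, \(N_{t}-m=1\), and take \(\epsilon _{p}=0.2\), the value used later in the simulations. The closed form below follows directly from the definition of \(\Xi (x;a)\) given after (12); the choice of this particular layer is only an example.

$$ \Xi(x;1)=\int_{0}^{x}e^{-t}dt=1-e^{-x} \quad\Rightarrow\quad \Xi^{-1}(\epsilon_{p};1)=-\ln(1-\epsilon_{p}), $$
$$ \rho_{N_{t}-1}=p_{\text{SD}}-{\sigma^{2}_{v}}\left(-\ln 0.8\right)\approx p_{\text{SD}}-0.223\,{\sigma^{2}_{v}}. $$

Under these assumptions, a node at that layer would be pruned as soon as its minimum PPM comes within roughly \(0.223\,{\sigma ^{2}_{v}}\) of the current radius.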

3.3 Additional conditions for CCB

There are two ACs that should be considered to avoid missing candidates if we introduce \(\rho _{i}\) as the intra radius for CCB in each detection layer.
  • If \(\theta _{\delta _{i}}-\arccos (\psi)<0\) and \(\theta _{\delta _{i}}+\arccos (\psi)>0\), set \(-\pi \leq \theta _{k}<\pi \). If \(0 \leq \theta _{k}<2\pi \) is used instead, some constellation points located in \([\pi,2\pi ]\) will be eliminated erroneously.

  • If \(\theta _{\delta _{i}}+\arccos (\psi)>2\pi \), set \(\theta _{\delta _{i}}+\arccos (\psi)=\theta _{\delta _{i}}+ \arccos (\psi)-2\pi \). If the upper bound of the phase is greater than 2π, the constellation points in [0,π] will not be included.

In Fig. 1, we present two examples that can be fixed by the conditions described above. The phase range between \(-\frac {3\pi }{4}\) and \(\frac {7\pi }{8}\) does not match the above definition \(0 \leq \theta _{k}<2\pi \), so the two points between \(\frac {3\pi }{2}\) and \(2\pi \) within the red circle in Fig. 1 a will be pruned erroneously. In Fig. 1 b, the phase of the constellation point is \(\frac {\pi }{4}\), which should be considered as a candidate based on the phase range. But the upper bound of the phase obtained by CCB is greater than \(2\pi \), which eliminates the candidate at \(\frac {\pi }{4}\). Note that these ACs are not specified in previous works such as [11, 12], which employ an extremely large initial radius instead. For PSK modulation and 4-QAM, all constellation points are located on one ring, and the candidates can be obtained in one shot. For high-order QAM, the CCB must be performed multiple times for different concentric rings.
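One way to realise the effect of both ACs in software is to test membership on the circle rather than on a fixed interval. The sketch below is an assumed equivalent formulation, not the authors' implementation: it measures the phase counter-clockwise from the lower edge of the bound, so bounds with a negative lower edge or an upper edge above \(2\pi \) are handled by the same modular test. The second example reads the bound of Fig. 1 b as running from \(\frac {11\pi }{6}\) to \(\frac {11\pi }{4}\), which is an interpretation of the caption.

```python
import math

TWO_PI = 2.0 * math.pi


def in_phase_bound(theta, lower, upper):
    """True if phase theta lies on the arc running from lower to upper."""
    width = upper - lower
    if width >= TWO_PI:
        return True
    # Offset of theta from the lower edge, wrapped into [0, 2*pi).
    return (theta - lower) % TWO_PI <= width


def naive_in_bound(theta, lower, upper):
    """Direct interval test without wrapping, for comparison."""
    return lower <= theta <= upper


# Case I: bound [-3*pi/4, 7*pi/8], point at 7*pi/4 (phases in [0, 2*pi)).
case1 = (7 * math.pi / 4, -3 * math.pi / 4, 7 * math.pi / 8)
# Case II: point at pi/4, bound assumed to run from 11*pi/6 to 11*pi/4.
case2 = (math.pi / 4, 11 * math.pi / 6, 11 * math.pi / 4)
results = [(naive_in_bound(*c), in_phase_bound(*c)) for c in (case1, case2)]
```

In both cases the direct interval test rejects the point while the wrapped test accepts it, mirroring the two situations addressed by the ACs.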

4 List soft processing-based complex sphere decoder

As discussed above, the conventional LSD has a variable complexity. In our case, we extend the proposed CSD to an LSD with a simpler list generation. Following the original idea of the LSD, a list of symbol candidates with the smallest FPMs is required in the LLR calculation in (17). Furthermore, it would be possible to construct a list that contains the MAP solution. However, the complexity of list generation is variable and significant [11]. For a simple implementation of the LSD with a large number of candidates, the scatter list generation (SLG) is proposed.

4.1 Extrinsic LLR calculation of LSD

According to the MAP criterion, the extrinsic LLR can be evaluated by
$$ \begin{aligned} L_{e1}&=\frac{1}{2}\sum\limits_{\mathbf{b} \in \mathcal{B}_{k^{+}}}\left\lbrace -\frac{1}{{\sigma^{2}_{v}}/2}\Vert\mathbf{y}-\mathbf{H}\mathbf{s}_{k^{+}} \Vert^{2} + \mathbf{b}^{T}_{\bar{k}}\cdot \mathbf{L}_{e2}\left(\mathbf{b}_{\bar{k}}\right)\right\rbrace\\ &\quad-\frac{1}{2}\sum\limits_{\mathbf{b} \in \mathcal{B}_{k^{-}} }\left\lbrace -\frac{1}{{\sigma^{2}_{v}}/2}\Vert\mathbf{y}-\mathbf{H}\mathbf{s}_{k^{-}} \Vert^{2}+\mathbf{b}^{T}_{\bar{k}}\cdot \mathbf{L}_{e2}\left(\mathbf{b}_{\bar{k}}\right)\right\rbrace, \end{aligned} \tag{16} $$
where the vector \(\mathbf {b}_{\bar {k}}\) denotes the bit vector omitting the \(k\)th bit, the a priori LLR \(\mathbf {L}_{e2}(\mathbf {b}_{\bar {k}})\) denotes the LLR from the channel decoder corresponding to the bits in \(\mathbf {b}_{\bar {k}}\), and \(\mathcal {B}_{k^{\pm }}\) denotes the list of bit vectors obtained by the LSD having ±1 at the \(k\)th bit. The symbol vector \(\mathbf {s}_{k^{\pm }}\) denotes the possible symbol combinations corresponding to the set \(\mathcal {B}_{k^{\pm }}\), and the corresponding \(k\)th bit of \(\mathbf {s}_{k^{\pm }}\) equals ±1. Following the max-log approximation and the list obtained by the LSD [11], Eq. (16) becomes
$$ \begin{aligned} L_{e1}&\approx \frac{1}{2}\max_{\mathbf{b} \in \mathcal{B}_{k^{+}} }\left\lbrace -\frac{1}{{\sigma^{2}_{v}}/2}\Vert\mathbf{y}-\mathbf{H}\mathbf{s}_{k^{+}} \Vert^{2} + \mathbf{b}^{T}_{\bar{k}}\cdot \mathbf{L}_{e2}\left(\mathbf{b}_{\bar{k}}\right)\right\rbrace\\&\quad-\frac{1}{2}\max_{\mathbf{b} \in \mathcal{B}_{k^{-}} }\left\lbrace -\frac{1}{{\sigma^{2}_{v}}/2}\Vert\mathbf{y}-\mathbf{H}\mathbf{s}_{k^{-}} \Vert^{2}+\mathbf{b}^{T}_{\bar{k}}\cdot \mathbf{L}_{e2}\left(\mathbf{b}_{\bar{k}}\right)\right\rbrace. \end{aligned} \tag{17} $$

The LLR \(L_{e1}\) for the \(k\)th bit in the transmit symbol vector is obtained for the channel decoder. The extrinsic information \(L_{e1}\) from the LSD is fed forward to the channel decoder as its input, and the extrinsic information \(L_{e2}\) from the channel decoder is fed back to the LSD. Thus, information is exchanged iteratively between the two decoding components.
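The max-log expression (17) can be sketched over a candidate list as follows. The data layout (a list of bit-vector and distance pairs), the function name, and the clipping magnitude used when one hypothesis is missing from the list are all assumptions of this sketch; the clipping fallback corresponds to the fixed-magnitude approach mentioned in the introduction.

```python
def max_log_llr(k, candidates, noise_var, prior_llr, clip=8.0):
    """Extrinsic LLR of bit k following the max-log form of Eq. (17).

    candidates: list of (bits, dist), bits being a tuple of +1/-1 values
    and dist the squared distance ||y - H s||^2 of that candidate.
    prior_llr: a priori LLRs L_e2, one per bit.
    clip: hypothetical fixed magnitude used when a hypothesis is missing.
    """
    best = {+1: None, -1: None}
    for bits, dist in candidates:
        # b_kbar^T L_e2(b_kbar): prior contribution of all bits except k.
        prior = sum(b * l for j, (b, l) in enumerate(zip(bits, prior_llr))
                    if j != k)
        metric = 0.5 * (-dist / (noise_var / 2.0) + prior)
        sign = bits[k]
        if best[sign] is None or metric > best[sign]:
            best[sign] = metric
    if best[+1] is None:
        return -clip
    if best[-1] is None:
        return clip
    return best[+1] - best[-1]


# Hypothetical list of three candidates for a two-bit symbol vector.
cands = [((+1, +1), 1.0), ((-1, +1), 3.0), ((+1, -1), 2.0)]
llrs = [max_log_llr(k, cands, noise_var=1.0, prior_llr=[0.0, 0.0])
        for k in range(2)]
```

With a short list, the clipping branch is taken whenever every candidate agrees on a bit, which is the situation that motivates larger lists.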

4.2 Scatter list generation

To build a large list with simple implementation, a few modifications will be made to the proposed CSD:
  • Perform the search by the proposed CSD to obtain the branches accessed in the search, and rearrange these branches in an ascending order according to the PPMs. Start several searches with the ML ordering by traversing the spans of the sub-trees of the branches until the list is filled. Note that the sub-tree search will be terminated once it reaches the starting point of the neighbouring sub-tree search.

  • Replace the radius \(p_{\text {SD}}\) by the largest FPM of the symbol vectors in the list.

  • MPTP will be carried out given the new radius \(p_{\text {SD}}\).

  • The sphere radius \(p_{\text {SD}}\) may be updated in (9) with the new largest FPM in the list once a candidate with a smaller FPM is found.

The search strategy described above splits the entire tree into different sub-trees and searches them independently. The algorithm table is shown in Algorithm 2. Although the proposed CSD must be run several times for the scatter list generation, the complexity, measured via the number of updates in the list generation, is significantly reduced.
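The list bookkeeping implied by the second and fourth modifications can be sketched with a bounded max-heap. This covers only the list and radius updates, not the sub-tree search of Algorithm 2; the class name is hypothetical, and treating the radius as infinite until the list is full is an assumption of this sketch.

```python
import heapq


class CandidateList:
    """Keeps the `size` candidates with the smallest FPMs seen so far."""

    def __init__(self, size):
        self.size = size
        self._heap = []      # entries (-fpm, insertion index, symbols)
        self._count = 0

    def radius(self):
        """Largest FPM in the list, or infinity while the list is not full."""
        if len(self._heap) < self.size:
            return float("inf")
        return -self._heap[0][0]

    def offer(self, fpm, symbols):
        """Try to insert a candidate; return True if the list changed."""
        entry = (-fpm, self._count, symbols)
        if len(self._heap) < self.size:
            heapq.heappush(self._heap, entry)
        elif fpm < self.radius():
            heapq.heapreplace(self._heap, entry)   # drop the worst candidate
        else:
            return False
        self._count += 1
        return True

    def candidates(self):
        """Candidates as (fpm, symbols), sorted by ascending FPM."""
        ordered = sorted(self._heap, key=lambda e: (-e[0], e[1]))
        return [(-neg, sym) for neg, _, sym in ordered]


# Illustrative use with a list of two candidates.
cl = CandidateList(2)
for fpm, label in [(5.0, "a"), (3.0, "b"), (4.0, "c"), (6.0, "d")]:
    cl.offer(fpm, label)
```

Each successful `offer` on a full list shrinks the radius, which is the quantity the MPTP would then use; counting these calls gives the number of list updates.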

4.3 ML-based ordering

The ML solution can be exploited to re-order the remaining branches for the list generation of the LSD. When the list is full, the search goes back to the upper layers and proceeds down the tree. However, the unvisited nodes at the lower layers are unknown to this search, and these partial branches would otherwise be ordered according to the SE enumeration. The basic idea of ML-based ordering is to sort the remaining partial branches with the ML solution inserted in the low detection layers, rather than only computing their actual PPM for a given layer. Additionally, a large proportion of the remaining branches may be discarded if the distance \(p_{\text {SD}}\) is much smaller than the Euclidean distance of \(\mathbf {s}_{r}\). In our case, the following equation can be used for ordering at the \(i\)th layer:
$$ \kappa=\mathop{\arg\min}_{\mathbf{s}_{r} \in \mathcal{R}}\Vert\mathbf{y}-\mathbf{R}\mathbf{s}_{r}\Vert^{2}, \tag{18} $$

where the vector \(\mathbf {s}_{r}=\left [s^{1}_{\text {ML}},\ldots,s^{i-1}_{\text {ML}}, {s^{i}_{r}},\ldots,s^{N_{t}}_{r}\right ]^{T}\), which implies that the unknown \(i-1\) transmit symbols at the low detection layers are replaced by the symbols of the ML solution for the purpose of ordering. The quantity \(\mathcal {R}\) denotes the set of available branches for the \(i\)th layer, and \(\kappa \) denotes the smallest Euclidean distance in the sorting process. The ML-based ordering in (18) has a modest cost, requiring only \(\vert \mathcal {R} \vert (i+1)\) multiplications for each layer, where \(\vert \cdot \vert \) denotes the size of the set.
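The ordering rule (18) can be sketched directly from the definition of \(\mathbf {s}_{r}\). The function name and data layout are hypothetical, and the sketch evaluates the full distance for clarity rather than the reduced-cost form quoted above.

```python
def ml_order(branches, s_ml, y, R, i):
    """Sort partial branches for layer i (1-based) by the metric in Eq. (18).

    branches: list of symbol lists [s_i, ..., s_Nt] (the known part).
    s_ml: ML solution [s_1, ..., s_Nt]; its first i-1 entries fill the
    unknown low layers of each branch.
    """
    def distance(branch):
        s_r = list(s_ml[:i - 1]) + list(branch)
        return sum(
            abs(y[n] - sum(R[n][j] * s_r[j] for j in range(len(s_r)))) ** 2
            for n in range(len(y))
        )

    return sorted(branches, key=distance)


# Hypothetical two-layer example: order the two candidates of layer 2.
R = [[1 + 0j, 0j],
     [0.5 + 0j, 1 + 0j]]
s_ml = [1 + 0j, -1 + 0j]
y = [1 + 0j, -0.5 + 0j]          # equals R applied to s_ml (noise-free)
ordered = ml_order([[1 + 0j], [-1 + 0j]], s_ml, y, R, i=2)
```

The first element of the returned list is the branch attaining \(\kappa \); the remaining elements give the order in which the other sub-trees would be visited.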

5 Simulation results

In this section, we evaluate the proposed CSD in two different forms: (1) the hard output CSD and (2) the soft output CSD (LSD). For the hard output CSD, the performance and complexity of several CSDs are compared via BER and the number of visited nodes in an 8×8 MIMO system with 16QAM and 8PSK. An MPSK modulation in our simulation is defined as \(\lbrace \gamma e^{j(2n+1)\pi /M}: n=0,1,\ldots,M-1 \rbrace \), where \(\gamma \) is the magnitude of the modulation scheme and \(M\) is the size of the modulation. We consider the conventional SE-CSD, Pham-CSD [12], PTP-CSD [7], and the proposed CSD with and without AC for CCB, all of which are complex SE enumeration-based CSDs with \(p_{\text {SD}}=\infty \) at the beginning of the search. The PTP can be simply extended to Pham-CSD. The energy per bit to noise power spectral density ratio (\(E_{b}/N_{0}\)) is used. The MIMO channel coefficients (\(N_{t}=N_{r}\)) are generated according to Jakes' model, and the channel noise is additive white Gaussian noise, which is independent and identically distributed for each receive antenna as stated in the previous section. The probabilistic noise constraint is set to \(\epsilon _{p}=0.2\). The threshold \(\epsilon \) for SPS must be appropriately adjusted according to the dimensions and the modulation as stated earlier, and we set \(\epsilon =0.001\). The ISRC scheme [8] is not employed because of the difficulty of choosing parameters for the intra radius.
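For reference, the MPSK constellation defined above can be generated as follows; the function name is hypothetical and the default \(\gamma =1\) is only a convenient normalisation.

```python
import cmath
import math


def mpsk(M, gamma=1.0):
    """Points gamma * exp(j(2n+1)pi/M) for n = 0, ..., M-1."""
    return [gamma * cmath.exp(1j * (2 * n + 1) * math.pi / M)
            for n in range(M)]


# 8PSK as used in the simulations: phases pi/8, 3*pi/8, ..., 15*pi/8.
points = mpsk(8)
phases = [cmath.phase(p) % (2 * math.pi) for p in points]
```

All points lie on a single ring of radius \(\gamma \), so the CCB needs to be evaluated only once per layer for this modulation.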

As shown in Figs. 2 and 3, the proposed CSD reduces the number of visited nodes per channel use relative to the other CSDs by 25 % for 16QAM and by more than 25 % for 8PSK at high \(E_{b}/N_{0}\) values without any BER performance loss, even compared to the conventional SE-CSD in the mid-to-high \(E_{b}/N_{0}\) regime. The performance loss of the proposed algorithm without ACs is significant at high SNRs. In other words, it is more sensitive to missing candidates in low-noise scenarios. However, the complexity reduction is not obvious at low \(E_{b}/N_{0}\), because the CCB includes more unreliable constellation points. It can be observed that the curves of the number of visited nodes for different SDs converge at very high \(E_{b}/N_{0}\), so the improvement of the proposed SD is reduced there, but is still very promising.
Fig. 2

BER performance. Comparison with perfect channel estimates between the proposed and other CSDs for \(N_{t}=N_{r}=8\) with 8PSK and 16QAM. Note that the curves of the conventional SE-CSD, Pham-CSD, PTP-CSD, and proposed-CSD w/ AC are superimposed in (a) and (b), respectively

Fig. 3

The number of visited nodes per channel use. Comparison with perfect channel estimates between the proposed and other CSDs for \(N_{t}=N_{r}=8\) with 8PSK and 16QAM in (a) and (b), respectively

To show the robustness to channel estimation errors, the BER performance of the CSDs for an 8×8 MIMO system with 8PSK and least squares (LS) channel estimation [26] is plotted in Fig. 4. We observe that the proposed CSD with imperfect channel estimates still achieves the same BER performance as the other existing CSDs. The BER performance of 16QAM is not shown here, because its curves are similar to those in Fig. 4.
Fig. 4

BER performance with LS channel estimation. Comparison with imperfect channel estimates between the proposed and other CSDs for \(N_{t}=N_{r}=8\) with 8PSK. Note that the curves of the conventional SE-CSD, Pham-CSD, PTP-CSD, and proposed-CSD w/ AC are superimposed

The worst-case complexity is measured by the 99 % quantile of the total number of visited nodes per channel use, \(\text{Pr}(\mathcal{C}_{w}>\mathcal{C}_{\text{any}})=0.99\) [27], where \(\mathcal{C}_{w}\) denotes the number of nodes visited by the SDs in one particular channel use, and \(\mathcal{C}_{\text{any}}\) denotes the number of nodes visited by the SDs in any channel use. The corresponding worst-case complexity \(\mathcal{C}_{w}\) of the CSDs is plotted in Fig. 5, which implies that the number of visited nodes of the proposed CSD is tightly lower bounded by the complexity of SIC at high SNRs.
Fig. 5

Worst case. The number of visited nodes of CSDs against SNR, \(N_t = N_r = 8\) with 16QAM
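The 99 % quantile used above as the worst-case measure can be estimated from simulated node counts. The following is a minimal sketch, assuming the nearest-rank definition of an empirical quantile; the function name `worst_case_quantile` and the sample counts are hypothetical and are not taken from the simulations in this paper.

```python
import math


def worst_case_quantile(node_counts, q=0.99):
    """Return the smallest observed count c such that at least a fraction q
    of the channel uses visited c nodes or fewer (nearest-rank quantile)."""
    if not node_counts:
        raise ValueError("need at least one channel use")
    ordered = sorted(node_counts)
    # Nearest-rank rule: 1-indexed rank = ceil(q * n). The small tolerance
    # guards against floating-point round-off in q * n.
    rank = max(1, math.ceil(q * len(ordered) - 1e-9))
    return ordered[rank - 1]


# Hypothetical visited-node counts for 100 channel uses (illustrative only).
counts = [8] * 95 + [12, 15, 20, 40, 90]
print(worst_case_quantile(counts))
```

With these illustrative counts, the single most expensive channel use is excluded and the second most expensive one determines the reported worst case.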

Additionally, the complexity of SDs increases exponentially with increasing dimension. We therefore plot, in Fig. 6, the number of visited nodes against the dimension (\(N_t = N_r\)) at a high \(E_b/N_0\) (20 dB) to show that the proposed algorithm still reduces the complexity.
Fig. 6

Dimensions. The number of visited nodes against increasing dimensions \(N_t = N_r\) with 16QAM at SNR = 20 dB

The complexity discussed so far is based only on the number of visited nodes. To show the benefit of reducing the complexity of the complex SE enumeration (CCB) and of eliminating unnecessary candidates, the number of FLOPS is presented in Fig. 7. For a fair comparison, we assume that a complex addition requires 2 FLOPS and a complex multiplication requires 16 FLOPS. The proposed algorithm still outperforms the other CSDs, because it executes the complex SE enumeration fewer times and considers fewer candidates. The FLOPS required by the detection ordering are not considered, because all CSDs use the same preprocessing technique. Furthermore, the parameters for MPTP can be pre-computed before the start of the transmission.
Fig. 7

The number of FLOPS per channel use. Comparison with perfect channel estimates between the proposed and other CSDs for \(N_t = N_r = 8\) with 8PSK and 16QAM in (a) and (b), respectively
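The FLOPS accounting described above can be illustrated with a small counter. This is only a sketch: the per-operation costs (2 FLOPS per complex addition, 16 FLOPS per complex multiplication) are the ones stated in the text, whereas the per-layer operation count in `interference_cancellation_flops` (one multiplication and one addition or subtraction per already-detected symbol) is an illustrative assumption and not the accounting used to produce Fig. 7.

```python
# Per-operation costs as stated in the text.
FLOPS_PER_COMPLEX_ADD = 2
FLOPS_PER_COMPLEX_MUL = 16


def flops(n_complex_adds, n_complex_muls):
    """Total FLOPS for a given number of complex additions and multiplications."""
    return (n_complex_adds * FLOPS_PER_COMPLEX_ADD
            + n_complex_muls * FLOPS_PER_COMPLEX_MUL)


def interference_cancellation_flops(layer, n_tx):
    """Assumed cost of z_i - sum_{j>i} r_ij * s_j at 1-indexed layer i.

    Each of the (n_tx - layer) already-detected symbols contributes one complex
    multiplication and one complex addition (the subtraction is counted as an
    addition). This op count is a hypothetical illustration.
    """
    n_terms = n_tx - layer
    return flops(n_terms, n_terms)


n_tx = 8
total = sum(interference_cancellation_flops(i, n_tx) for i in range(1, n_tx + 1))
print(total)
```

Because a complex multiplication is assumed to cost eight times as much as a complex addition, the multiplications dominate the total in this kind of accounting.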

For the soft-output CSD (LSD), we consider an 8×8 system with 16QAM and, for simplicity, a half-rate NSC convolutional code with constraint length 3. In a coded system, the energy per bit to noise ratio is redefined as \(\left.\frac{E_{b}}{N_{0}}\right|_{\text{dB}}=\left.\frac{E_{s}}{N_{0}}\right|_{\text{dB}}+10\log_{10}\frac{N_{t}}{RN_{r}M}\), where R denotes the rate of the channel code. The performance and complexity of the proposed LSD are evaluated in terms of the BER and the number of updates in the list, respectively. The EXIT chart is also used to illustrate how the mutual information changes. Note that fixed-value clipping is adopted in our simulations, and appropriate clipping values can be obtained by evaluating the mutual information \(I_e\) as in [20]. In our case, the clipping value is set to ±12. Figure 8a shows the EXIT chart of the conventional LSD [11] and the proposed LSD for different list sizes. For a given list size the two perform almost identically, while the list size itself has a significant influence on the LSD performance. The EXIT chart of the LSD with a fixed list size at different SNRs is plotted in Fig. 8b; the SNR only moves the curves up and down without changing their shapes. The BER performance of the two LSDs with \(L=512\), presented in Fig. 9, agrees with the EXIT chart results, and the performance improves with an increasing number of iterations. The complexity comparison in terms of the CDF is shown in Fig. 10, which indicates that the proposed LSD significantly reduces the number of updates in the list when the list size is large. Furthermore, the search can be terminated early to suit the hardware implementation.
Fig. 8

EXIT chart. Conventional LSD and proposed LSD with different list sizes in (a) and with \(L=512\) at different SNRs in (b). \(N_t = N_r = 8\) with 16QAM

Fig. 9

BER performance. Conventional LSD and proposed LSD with different iterations, \(N_t = N_r = 8\) with 16QAM

Fig. 10

CDF. The number of visited nodes employing different list sizes at 12 dB, \(N_t = N_r = 8\) with 16QAM
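The redefinition of \(E_b/N_0\) for the coded system and the fixed-value LLR clipping can be sketched as follows. This is an illustration only: it assumes that M denotes the number of bits per constellation symbol (4 for 16QAM), and the function names `ebn0_db` and `clip_llr` are hypothetical.

```python
import math


def ebn0_db(esn0_db, n_tx, n_rx, code_rate, bits_per_symbol):
    """Eb/N0 in dB from Es/N0 in dB, following the redefinition in the text.

    bits_per_symbol plays the role of M (assumed meaning, see lead-in).
    """
    return esn0_db + 10.0 * math.log10(n_tx / (code_rate * n_rx * bits_per_symbol))


def clip_llr(llr, limit=12.0):
    """Fixed-value clipping of an LLR to the interval [-limit, +limit]."""
    return max(-limit, min(limit, llr))


# 8x8 system, half-rate code, 16QAM: offset between Eb/N0 and Es/N0 in dB.
offset = ebn0_db(0.0, n_tx=8, n_rx=8, code_rate=0.5, bits_per_symbol=4)
print(round(offset, 2))  # -3.01

# Clipping with the value of 12 used in the simulations.
print([clip_llr(x) for x in (-30.0, -3.5, 0.0, 7.25, 18.0)])
```

Under the stated assumption on M, the argument of the logarithm is 8/(0.5·8·4) = 0.5, so \(E_b/N_0\) lies about 3 dB below \(E_s/N_0\) for this configuration.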

6 Related work discussion

For a very large-scale integration (VLSI) implementation, the enumeration scheme becomes the bottleneck of the sphere decoding algorithm, as discussed in [14, 15, 17]. The enumeration must be implemented efficiently, which makes the “one-node-per-cycle” architecture one of the promising structures for hardware implementation. Our proposed sphere decoder is essentially a variant of the method proposed in [14], whose authors introduced an architecture consisting of two entities: (1) a metric computation unit (MCU) and (2) a metric enumeration unit (MEU). These two components operate in parallel to handle the forward search and the backward search separately. In our proposed method, the MEU and the MCU can work in the same way. Whenever an unvisited node is accessed, the search performs successive interference cancellation to obtain the nulling-cancelling points, while the MEU is used to find the remaining surviving nodes after this process. Furthermore, the latency requirement of the “one-node-per-cycle” architecture is not a problem in our case, because the second term in the modified probabilistic tree pruning is not calculated in real time, as described in Eq. (14). Only one additional cycle is needed to obtain the tight radius in each detection layer, and the reduction in the number of visited nodes outweighs these few additional cycles. Additionally, the statistical pruning used in this paper is also pre-computed; it prevents the radius obtained by the successive interference cancellation from being too large in some extreme cases. In [14], the authors modified the SE enumeration scheme to shorten the critical path. The modified scheme is not strictly compatible with the “one-node-per-cycle” architecture, but because of the complexity reduction and the small number of additional cycles required, it still works under that architecture. This situation is quite similar to the one discussed above.
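The successive interference cancellation step mentioned above, which yields the nulling-cancelling points and a radius, can be sketched as follows. This is a generic textbook-style SIC on an upper-triangular system and not the proposed decoder itself; it assumes that the QR decomposition has already been applied (so that z = Rs + noise), and the names `sic_detect` and `qam16` as well as the example matrix are hypothetical.

```python
def qam16():
    """16QAM alphabet with unnormalised levels {-3, -1, 1, 3} (assumed scaling)."""
    levels = (-3, -1, 1, 3)
    return [complex(re, im) for re in levels for im in levels]


def sic_detect(R, z, constellation):
    """Detect symbols layer by layer, from the last row of R to the first.

    Returns the nulling-cancelling point, its squared Euclidean distance
    (usable as a search radius) and the number of visited nodes.
    """
    n = len(z)
    s_hat = [0j] * n
    radius_sq = 0.0
    visited = 0
    for i in range(n - 1, -1, -1):
        # Cancel the interference of the symbols detected in lower layers.
        interference = sum(R[i][j] * s_hat[j] for j in range(i + 1, n))
        residual = z[i] - interference
        # Slice to the constellation point with the smallest branch metric.
        s_hat[i] = min(constellation, key=lambda c: abs(residual - R[i][i] * c))
        radius_sq += abs(residual - R[i][i] * s_hat[i]) ** 2
        visited += 1
    return s_hat, radius_sq, visited


# Noise-free toy example with an illustrative 3x3 upper-triangular matrix.
R = [[2.0 + 0j, 0.5 - 0.2j, 0.1j],
     [0j, 1.5 + 0j, -0.3 + 0.4j],
     [0j, 0j, 1.0 + 0j]]
s_true = [1 + 3j, -3 - 1j, 3 - 1j]
z = [sum(R[i][j] * s_true[j] for j in range(3)) for i in range(3)]
s_hat, radius_sq, visited = sic_detect(R, z, qam16())
print(s_hat == s_true, visited)
```

SIC visits exactly one node per layer, which is why the number of nodes visited by any tree search that starts from the SIC solution cannot fall below the number of layers.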

Furthermore, the complexity of the enumeration scheme can be further reduced by the methods introduced in [15, 17]. The authors proposed two-dimensional zigzag enumeration schemes that avoid the sorting process and unnecessary partial path metric calculations, both of which are computationally intensive in a practical hardware implementation. In other words, the partial path metrics of the unvisited nodes are calculated only when the expansion of the tree demands them. Hence, there is no need to visit all children of a parent node and sort them by their actual Euclidean distances. To the best of the authors’ knowledge, the two-dimensional zigzag enumeration scheme is the most efficient method for the complex sphere decoder.
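For contrast, the exhaustive enumeration that the zigzag schemes are designed to avoid can be sketched as below: every child of a parent node has its metric computed and the children are then sorted. This is only the brute-force baseline, not the two-dimensional zigzag enumeration of [15, 17]; the function name `sorted_children` and the 16QAM scaling are assumptions.

```python
def sorted_children(center, constellation):
    """Order all children by their squared distance to the unconstrained
    estimate `center`; returns the ordered points and the number of
    metric evaluations that were needed."""
    # The index is kept as a tie-breaker because complex numbers cannot be
    # compared directly when two metrics are equal.
    metrics = [(abs(center - c) ** 2, idx) for idx, c in enumerate(constellation)]
    metrics.sort()
    ordered = [constellation[idx] for _, idx in metrics]
    return ordered, len(metrics)


levels = (-3, -1, 1, 3)
constellation = [complex(re, im) for re in levels for im in levels]
order, n_evals = sorted_children(0.9 + 1.2j, constellation)
print(order[0], n_evals)
```

Even when only the first child is eventually expanded, this baseline evaluates one metric per constellation point, which is the cost that on-demand enumeration avoids.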

7 Conclusion

In this paper, novel sphere decoding algorithms for MIMO detection and for iterative detection and decoding have been presented. The proposed CSD, which incorporates the statistical tree pruning technique and SIC, first reaches the bottom layer and eliminates the candidates for the lower layers before the search reaches them, rather than eliminating these candidates at the lower layers. The simulation results show that the proposed CSD significantly reduces the computational effort compared to the conventional CSD. Furthermore, the proposed CSD has been naturally extended to the list SD (LSD) based on the scatter list generation. The proposed LSD makes better use of the ML solution to re-order the remaining branches, so the list generation becomes simpler than that of the conventional LSD. The complexity of the proposed sphere decoding algorithms for MIMO detection is significantly reduced, and the algorithms provide an attractive tradeoff between complexity and performance.



This work is supported by the Scientific Research Foundation of CUIT (No. KYTZ201415), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, and the Sichuan Provincial Department of Science and Technology Innovation and R&D projects in the Science and Technology Support Program (No. 2015RZ0060).

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

School of Communication Engineering, Chengdu University of Information Technology
Department of Electronics, University of York
CETUC, Pontifical Catholic University of Rio de Janeiro (PUC-Rio)


  1. E Viterbo, J Boutros, A universal lattice code decoder for fading channels. IEEE Trans. Inf. Theory. 45(5), 1639–1642 (1999).
  2. Y Li, N Seshadri, S Ariyavisitakul, On maximum-likelihood detection and the search for the closest lattice point. IEEE Trans. Inf. Theory. 49(10), 2389–2402 (2003).
  3. CP Schnorr, M Euchner, Lattice basis reduction: improved practical algorithms and solving subset sum problems. Math. Program. 66(2), 181–191 (1994).
  4. E Agrell, T Eriksson, A Vardy, K Zeger, Closest point search in lattices. IEEE Trans. Inf. Theory. 48(8), 2201–2214 (2002).
  5. AD Murugan, HE Gamal, et al., A unified framework for tree search decoding: rediscovering the sequential decoder. IEEE Trans. Inf. Theory. 52(3), 933–953 (2006).
  6. R Gowaikar, B Hassibi, Statistical pruning for near-maximum likelihood decoding. IEEE Trans. Signal Process. 55(6), 2661–2675 (2007).
  7. B Shim, I Kang, Sphere decoding with a probabilistic tree pruning. IEEE Trans. Signal Process. 56(10), 4867–4878 (2008).
  8. MX Chang, On further reduction of complexity in tree pruning based sphere search. IEEE Trans. Commun. 58(2), 471–422 (2010).
  9. GJ Foschini, Layered space-time architecture for wireless communication in a fading environment when using multiple antennas. Bell Labs Tech. J. 1(2), 41–59 (1996).
  10. RC de Lamare, R Sampaio-Neto, Minimum mean square error iterative successive parallel arbitrated decision feedback detectors for DS-CDMA systems. IEEE Trans. Commun. 56(5), 778–789 (2008).
  11. B Hochwald, S Ten Brink, Achieving near-capacity on a multiple-antenna channel. IEEE Trans. Commun. 51(3), 389–399 (2003).
  12. D Pham, KR Pattipati, et al., An improved complex sphere decoder for V-BLAST systems. IEEE Signal Process. Lett. 11(9), 748–751 (2004).
  13. K-C Lai, L-W Lin, Low-complexity adaptive tree search algorithm for MIMO detection. IEEE Trans. Wireless. Commun. 8(7), 3716–3726 (2009).
  14. A Burg, M Borgmann, et al., VLSI implementation of MIMO detection using the sphere decoding algorithm. IEEE J. Solid-State Circuits. 40(7), 1566–1577 (2005).
  15. M Shabany, K Su, P Gulak, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP). A pipelined scalable high throughput implementation of a near-ML K-Best complex lattice decoder, (2008), pp. 3173–3176, doi:10.1109/ICASSP.2008.4518324.
  16. M Barrenechea, M Mendicute, et al., in Proc. 19th European Signal Process. Conf. (EUSIPCO). Implementation of complex enumeration for multiuser MIMO vector precoding, (2011), pp. 739–743.
  17. K Nikitopoulos, J Zhou, B Congdon, et al., in Proc. 2014 ACM Conf. on SIGCOMM. Geosphere: Consistently turning MIMO capacity into throughput, pp. 631–642, doi:10.1145/2619239.2626301.
  18. K Nikitopoulos, A Karachalios, D Reisis, Exact Max-Log MAP Soft-Output Sphere Decoding via Approximate Schnorr-Euchner Enumeration. IEEE Trans. Veh. Technol. 64(6), 2749–2753 (2015).
  19. SA Laraway, B Farhang-Boroujeny, Implementation of a Markov Chain Monte Carlo based multiuser/MIMO detection. IEEE Trans. Circuits Syst. I. 56(1), 246–255 (2009).
  20. E Zimmermann, DL Milliner, et al., in Proc. IEEE GLOBECOM 2008. Optimal LLR clipping levels for mixed hard/soft output detection, (2008), pp. 1–5, doi:10.1109/GLOCOM.2008.ECP.222.
  21. RH Gohary, TJ Willink, On LLR clipping in BICM-ID non-coherent MIMO communications. IEEE Commun. Lett. 15(6), 650–652 (2011).
  22. C Studer, Iterative MIMO decoding: algorithms and VLSI implementation aspects (Ph.D. dissertation, Hartung-Gorre Verlag Konstanz, 2009).
  23. T Cui, C Tellambura, An efficient generalized sphere decoder for rank-deficient MIMO systems. IEEE Commun. Lett. 9(5), 423–425 (2005).
  24. P Wang, T Le-Ngoc, A low-complexity generalized sphere decoding approach for underdetermined linear communication systems: performance and complexity evaluation. IEEE Trans. Commun. 57(11), 3376–3388 (2009).
  25. B Hassibi, H Vikalo, On the sphere-decoding algorithm I. Expected complexity. IEEE Trans. Signal Process. 53(8), 2806–2818 (2005).
  26. S Kay, Fundamentals of Statistical Signal Processing: Estimation Theory (A Simon & Schuster Company, Upper Saddle River, New Jersey, 1993).
  27. DW Waters, JR Barry, The chase family of detection algorithms for multiple-input multiple-output channels. IEEE Trans. Signal Process. 56(2), 739–747 (2008).


© LI et al. 2016