 Research
 Open Access
Successive interference cancellation aided sphere decoder for multi-input multi-output systems
EURASIP Journal on Wireless Communications and Networking volume 2016, Article number: 51 (2016)
Abstract
In this paper, sphere decoding algorithms are proposed for both hard detection and soft processing in multi-input multi-output (MIMO) systems. Both algorithms are based on the complex tree structure to reduce the complexity of searching for the unique minimum Euclidean distance and multiple Euclidean distances, and to obtain the corresponding transmit symbol vectors. The novel complex hard sphere decoder for MIMO detection is presented first, and then the soft processing of a novel sphere decoding algorithm for list generation is discussed. The performance and complexity of the proposed techniques are demonstrated via simulations in terms of bit error rate (BER), the number of nodes accessed and floating-point operations (FLOPS).
Introduction
To achieve a high spectral efficiency, maximum likelihood (ML) detection should be employed with high-order constellations. However, "brute-force" ML detection is impractical even for a system with a small number of antennas. An alternative method is the sphere decoder (SD), which has attracted significant attention in the last decade due to the considerable complexity reduction it achieves [1, 2]. The key idea behind the SD is to find the lattice point closest to the received signal within a sphere of a given radius. Although the computational complexity is greatly reduced, it is still very high for systems with a large number of antennas and high-order modulation. To reduce the complexity, the authors in [3–8] have studied different search strategies and enumeration schemes. Additionally, some suboptimal methods with linear and decision feedback equalization (DFE) [9, 10] have been proposed to approach the ML performance.
In most applications, the complex-valued system is decoupled and reformulated as an equivalent real-valued system. Real-valued SDs can only process lattice-based modulation schemes such as quadrature amplitude modulation (QAM) and pulse amplitude modulation (PAM), while other modulations such as phase shift keying (PSK) cannot be processed as efficiently, because some invalid lattice points are included in the search. Additionally, the depth of the expanded tree for real-valued SDs is twice that of their complex counterparts. Hence, the complex-valued Schnorr-Euchner SD (SE-SD) and a modified version of SE-SD were proposed in [11, 12]. The complex-valued SDs avoid the decoupling of the complex system and can be widely applied to different modulations without reaching invalid lattice points. In particular, the latter can achieve a very low complexity compared to other real-valued and complex-valued SDs [13]. However, the intricacy of complex SE enumeration is still a weak point that makes the real-valued SDs preferred for hardware implementation. Some novel low-complexity complex enumerators have been studied in [14–16], and these enumerators are interchangeable in most complex-valued SDs. Nevertheless, the enumeration still must be employed in each detection layer and performed several times once new branches are accessed. Furthermore, the authors in [17] investigate the practical performance of a novel sphere decoder (Geosphere) for multiuser detection. A novel two-dimensional zigzag ordering strategy has been studied in the sense that the number of path metric calculations is reduced. Additionally, the lower bound of the path metric is employed to eliminate the branches if the path metric is smaller than the lower bound. Another efficient ordering and pruning scheme is studied in [18], which performs the horizontal pruning and vertical pruning with a novel tight lower limit for the path metric. These two schemes discussed above could also be used in the complex-valued SDs with simple modifications.
Motivated by the description above and the probabilistic tree pruning SD (PTP-SD) [7], we devise a novel complex-valued SD (CSD) with a statistical pruning strategy (SPS) and successive interference cancellation (SIC) aided modified probabilistic tree pruning (MPTP).
In addition to hard decision type detection, soft processing for multi-input multi-output (MIMO) systems has been recently studied in several works [11, 14, 19–21]. In [11], the authors report near-capacity MIMO detection using a list-SD (LSD) with a relatively large candidate list. With a large list, the performance of the LSD will be very close to that of the maximum a posteriori probability (MAP) detector. The results in [14] illustrate the hardware implementation of an LSD with four candidates, which may not be considered an implementation of an approximate MAP detector [19], because the list size is too small to achieve near-capacity performance. Hence, it is not an optimum detection technique. Additionally, log-likelihood ratio (LLR) clipping is also required for the LSD with a small list, because the +1 or −1 hypothesis may be missing in a particular bit LLR calculation based on the max-log MAP criterion. This can be fixed by setting a given magnitude of LLR as in [11], but the performance loss is unavoidable. Although the authors in [20, 21] have fixed this problem for a non-iterative detection decoding scheme and non-coherent detection, these LLR clipping techniques are still relatively complicated in most cases and may not be suitable for every iterative detection decoding structure. However, a large list will result in irreducible complexity, which limits the applications of the LSD, because the candidate updates in the list are difficult in hardware implementation [22]. This is not a desirable feature. If the list generation for the LSD is simple, multiple searches will not be required in the iterative processing, in contrast to the single tree search (STS) sphere decoder. In the following, we will discuss a simple list generation for the LSD given the proposed CSD.
The main contribution of this paper is summarized as follows:

An efficient CSD has been proposed that approaches the linear complexity for practical large-scale MIMO systems. This is because the proposed CSD significantly reduces the required number of times for performing enumeration and the span of the detection tree.

Due to the additional conditions (ACs), the performance loss of the conventional CSD is compensated.

The search strategy of the proposed CSD can be easily extended to the LSD with lower complexity.

The scatter list generation and the ML ordering accelerate the construction of the list and make the LSD more suitable for the parallel hardware implementation.
Due to the requirement of enumerators, the SDs we discuss in this work are all based on complex-valued SE enumeration in [11, 12] for simplicity, namely computation of coordinate bound (CCB) enumeration.
The rest of the paper is organized as follows. Section 2 presents the system model and problem formulation. Section 3 describes the proposed complex sphere decoder and the algorithm table. The soft processing with LSD is also discussed in Section 4. In Section 5, the simulation results demonstrate the complexity and BER performance of CSD and LSD, respectively. A conclusion is drawn in Section 7.
System model
The channels between transmit and receive antennas are assumed to be independent frequency flat fading. Here, \(\mathbb {C}_{N_{r} \times N_{t}}\) denotes the set of complex matrices with \(N_{r}\) rows and \(N_{t}\) columns. The channel can be represented by \(\mathbf {H} \in \mathbb {C}_{N_{r} \times N_{t}}\). Defining the transmit symbol vector \(\mathbf {s}=[s_{1} \ldots s_{i} \ldots s_{N_{t}}]^{T} \in \mathbb {C}_{N_{t} \times 1}\), the received signal \(\mathbf {y}=[y_{1} \ldots y_{j} \ldots y_{N_{r}}]^{T} \in \mathbb {C}_{N_{r} \times 1}\), and the AWGN noise vector \(\mathbf {v}= [v_{1} \ldots v_{j} \ldots v_{N_{r}}]^{T} \in \mathbb {C}_{N_{r} \times 1}\), the MIMO system model can be written as
\[\mathbf {y}=\mathbf {H}\mathbf {s}+\mathbf {v}, \tag{1}\]
where \(\mathbb {E}\lbrace \mathbf {s}\mathbf {s}^{H} \rbrace ={\sigma ^{2}_{s}}\mathbf {I}_{N_{t}\times N_{t}}\). Note that each element in the noise vector is assumed to be a zero-mean circularly symmetric complex Gaussian variable, which implies that a phase rotation of \(\mathbf {v}\) will not affect its statistical properties, and \(\mathbb {E}\lbrace \mathbf {v}\mathbf {v}^{H} \rbrace ={\sigma ^{2}_{v}}\mathbf {I}_{N_{r} \times N_{r}}\). After QR decomposition, the reformulated mathematical expression of (1) can be presented as follows:
\[\mathbf {z}=\mathbf {R}\mathbf {s}+\tilde {\mathbf {v}}, \tag{2}\]
where \(\mathbf {H}=\mathbf {Q}\mathbf {R}\), the receive signal vector \(\mathbf {z}=\mathbf {Q}^{H}\mathbf {y}\), and the noise vector \(\tilde {\mathbf {v}} = \mathbf {Q}^{H}\mathbf {v}\). Because the random variables in the vectors \(\tilde {\mathbf {v}}\) and \(\mathbf {v}\) have the same statistical properties, we use the same notation \(\mathbf {v}\) for both cases in the remaining part of the paper. Note that we assume \(N_{t}\leq N_{r}\) throughout this paper. However, for rank-deficient MIMO systems, the authors in [23] and [24] proposed new forms of the Cholesky and QR decomposition to exploit the performance advantage of the SDs for overloaded MIMO systems (\(N_{t}>N_{r}\)). The algorithms discussed in the paper could also be naturally extended to rank-deficient MIMO systems with modifications of such preprocessing techniques. To clarify the summation of the Euclidean distance over multiple single layers, the full path metric (FPM) and the partial path metric (PPM) are defined as
\[p_{N_{t}}=\sum _{i=1}^{N_{t}}B_{i}, \qquad p_{m}=\sum _{i=1}^{m}B_{i}, \quad m<N_{t},\]
where the branch metric is
\[B_{i}=\left \vert z_{i}-\sum _{k=1}^{i}r_{i,k}s_{k}\right \vert ^{2}. \tag{5}\]
For the ith detection layer, only the first i elements in the ith row of the matrix R are used for the branch metric calculation in Eq. (5). In other words, the FPM \(p_{N_{t}}\) corresponds to the Euclidean distance of all detection layers, and the PPM \(p_{m}\) is the Euclidean distance of a subset of the detection layers. Note that the quantity \(p_{N_{t}}\) denotes the FPM when \(m=N_{t}\), and the quantity \(p_{m}\) denotes the PPM when \(m<N_{t}\).
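As a minimal illustration of the path metrics, the following Python sketch evaluates branch metrics and their partial sums for a small triangular system. It assumes, following the statement that only the first i elements of the ith row of R are used, that layer i depends on symbols 1 to i. The function names and the numerical values are hypothetical and serve only as an example.

```python
def branch_metric(z, R, s, i):
    """Branch metric of layer i (0-based): |z_i - sum_{k<=i} r_{i,k} s_k|^2."""
    interference = sum(R[i][k] * s[k] for k in range(i + 1))
    return abs(z[i] - interference) ** 2


def partial_path_metric(z, R, s, m):
    """PPM p_m: sum of the branch metrics of the first m layers.

    With m == len(z) this is the full path metric (FPM).
    """
    return sum(branch_metric(z, R, s, i) for i in range(m))


# Hypothetical 2x2 example (values chosen only for illustration).
R = [[2 + 0j, 0j], [1j, 1 + 0j]]
s = [1 + 0j, -1 + 0j]
z = [2.5 + 0j, -1 + 1.5j]

ppm_1 = partial_path_metric(z, R, s, 1)  # first layer only
fpm = partial_path_metric(z, R, s, 2)    # all layers
```

Because the PPM is a running sum of non-negative terms, it can only grow as more layers are added, which is what allows a branch to be discarded as soon as its PPM exceeds the radius.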
Complex sphere decoder with successive interference cancellation-based tree pruning
The proposed CSD is based on the CCB enumeration scheme discussed as follows. Hence, the system model becomes (2). Furthermore, the minimum mean square error sorted QR decomposition (MMSE-SQRD) is used as the preprocessing technique [2], which provides several efficient QR decomposition methods. For a fair comparison between different CSDs, the one with the most promising decomposition method (SQRD) with the optimum signal to interference plus noise ratio (SINR) ordering is used for preprocessing to obtain the upper-triangular matrix R. Thus, the CSDs with MMSE-SQRD can achieve the ML performance at the expense of very low complexity.
The review of CCB
This subsection briefly reviews a complex-valued enumeration, namely CCB or complex SE enumeration. This bound was first proposed in [11], and an improved version was presented in [12], which separates the constellation points into groups located on one or multiple concentric rings and computes the phase bound based on the current sphere radius and previously detected symbols \(\hat {s}_{k}\) and \(r_{i,k}\). In this case, these constellation points can be tested according to the bound to determine whether they are in the circle of nulling-cancelling points, which are the symbol estimates of the SIC. The constellation points \({s^{m}_{i}}=\gamma e^{j\theta _{m}}\) can be represented in polar coordinates, where the quantity \({s^{m}_{i}}\) denotes the mth candidate constellation point at layer i, and \(0\leq \theta _{m}<2\pi \). Note that the quantity γ will be different in different concentric rings as shown in [11]. Figure 1 illustrates that the phase bound of the constellation points for one particular concentric ring can be determined by the sphere radius. The red curves correspond to the phase bounds of concentric rings. Accordingly, the nulling-cancelling point for the ith layer can be defined as
\[\delta _{i}=\frac {z_{i}-\sum _{k=1}^{i-1}r_{i,k}\hat {s}_{k}}{r_{i,i}}, \tag{6}\]
where the quantity \(\hat {s}_{k}\) is defined as the accessed symbol at the kth detection layer in the surviving branch. With the aid of trigonometric functions, we calculate the phase bound of θ _{ m } as
where the quantity \(\theta _{\delta _{i}}\) denotes the phase of the nulling-cancelling point obtained by Eq. (6), and the quantity \(p_{\text {SD}}\) denotes the sphere radius. If the previously detected symbols \(\hat {s}_{k}\) are perfect, the equivalent sphere radius \(\frac {p_{\text {SD}}}{r_{i,i}}\) can be used to compute the phase bound (red curves) with the law of cosines \(\cos \theta =\frac {a^{2}+b^{2}-c^{2}}{2ab}\), where \(a=\gamma \), \(b=\vert \delta _{i}\vert \), and \(c=\frac {p_{\text {SD}}}{r_{i,i}}\). The vector with possible candidates \(\tilde {\mathbf {s}}_{i}\) in layer i for a given concentric ring can be categorized as
where \(0\leq \arccos (\psi)\leq \pi \). From (8), no constellation points in one concentric ring will be included in the candidates if ψ>1, which implies that the phase bound is too small to cover any constellation points except \(\delta _{i}\). For ψ<−1, the corresponding phase bound \([\theta _{\delta _{i}}-\pi, \theta _{\delta _{i}}+\pi ]\) includes all constellation points. For −1≤ψ≤1, only the constellation points inside the bound can be used for the search. However, the phase bound described above may eliminate some candidates that should be included in the search. This is because the phases of the constellation points are between 0 and 2π, and the corresponding phase bound may not be located within [0,2π]. Thus, the mismatch between the phases of constellation points and the phase bound must be fixed to avoid missing candidates.
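The three cases for ψ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ψ is taken to be the cosine given by the law of cosines with a=γ, b=|δ_i| and c the equivalent radius, and c is treated here as a plain distance in the symbol plane. The function name `phase_bound` and the example values are hypothetical.

```python
import math


def phase_bound(gamma, delta, c):
    """Return (psi, half_width) for one concentric ring of radius gamma.

    half_width is the half-width of the admissible phase window around the
    phase of delta, or None when no point of the ring can be a candidate.
    """
    b = abs(delta)
    if b == 0.0:
        # Degenerate case: every ring point is at distance gamma from delta.
        if gamma <= c:
            return float("-inf"), math.pi
        return float("inf"), None
    psi = (gamma ** 2 + b ** 2 - c ** 2) / (2.0 * gamma * b)
    if psi > 1.0:
        return psi, None              # ring lies entirely outside the circle
    if psi < -1.0:
        return psi, math.pi           # whole ring is admissible
    return psi, math.acos(psi)        # window [theta - acos, theta + acos]


# Hypothetical example: unit ring, delta on the ring, radius equal to 1.
psi, half_width = phase_bound(1.0, 1 + 0j, 1.0)
```

In this example the chord from δ to a ring point has length at most 1 only within ±π/3 of the phase of δ, which matches arccos(0.5).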
Novel search strategy and successive interference cancellation tree pruning
In this subsection, we present three different techniques to reduce the complexity: (1) a novel search strategy with the aid of SIC; (2) SPS; (3) the MPTP algorithm.
Search strategy
Compared to the conventional SE-CSD, the novel search strategy first performs SIC to obtain the nulling-cancelling points and the FPM without calculating the PPM of other constellation points and sorting for each layer, and the radius \(p_{\text {SD}}\) may be updated by the FPM, i.e., \(p_{N_{t}}\), once the search reaches the bottom layer. The rest of the search can be performed upwards starting from the nulling-cancelling point of the bottom layer rather than from the top layer as in the conventional SE-CSD. Additionally, the span of the tree can be further shrunk by the MPTP. In (7), the candidates chosen by CCB can be determined by the new updated radius obtained by the MPTP. Hence, the number of possible candidates for each layer can be significantly reduced. The details of the proposed algorithm are described in Algorithm 1.
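The first step of this strategy, a single SIC pass that reaches the bottom layer and yields an initial radius, might look as follows. This is a hedged sketch rather than the authors' Algorithm 1: it assumes that layer i depends on the symbols of layers 1 to i−1, uses a nearest-point quantizer, and the names `quantize` and `sic_pass` as well as the example values are hypothetical.

```python
def quantize(x, constellation):
    """Nearest constellation point to x (plain minimum-distance slicer)."""
    return min(constellation, key=lambda c: abs(x - c))


def sic_pass(z, R, constellation):
    """One SIC pass from the first to the last layer.

    Returns the detected symbols and their full path metric, which can
    serve as the initial sphere radius for the remaining search.
    """
    s_hat = []
    fpm = 0.0
    for i in range(len(z)):
        # Cancel the contribution of the symbols detected so far.
        interference = sum(R[i][k] * s_hat[k] for k in range(i))
        delta = (z[i] - interference) / R[i][i]   # nulling-cancelling point
        s_i = quantize(delta, constellation)
        s_hat.append(s_i)
        fpm += abs(z[i] - interference - R[i][i] * s_i) ** 2
    return s_hat, fpm


# Hypothetical QPSK example with a small perturbation on each layer.
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
R = [[2 + 0j, 0j], [0.5j, 1 + 0j]]
z = [2.1 + 2j, -1.5 - 0.6j]
s_hat, radius = sic_pass(z, R, qpsk)
```

No sorting or per-layer enumeration is needed in this pass, which is the source of the saving described above.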
Statistical pruning strategy
As discussed in [25], the complexity of the SD is significantly affected by the initial radius. However, the new search strategy does not require the search to start with an appropriate initial radius to control the tree span, because the bottom layer is reached by the SIC process. For some extreme channel conditions, there is still a small chance that the radius derived by the SIC is very large. In other words, the result obtained by the proposed search strategy may reach a local optimum rather than the global one. Hence, the initial radius for CSDs must be considered. First, we assume the ML solution is obtained. Thus, the optimum radius would be the summation of squared noise terms \(\left (\sum _{i=1}^{N_{t}}\vert v_{i}\vert ^{2}\right)\) of the detection layers, which follows a \(\chi ^{2}\) distribution with \(N_{t}\) degrees of freedom and is upper bounded by the initial radius \(p^{\prime }_{\text {SD}}\). The normalized radius can be written as \(\beta = \frac {p'_{\text {SD}}}{{\sigma ^{2}_{v}}}\), and the normalized summation of squared noise terms can be written as \(u=\frac {\sum _{i=1}^{N_{t}}\vert v_{i}\vert ^{2}}{{\sigma ^{2}_{v}}}\). The quantity \({\sigma ^{2}_{v}}\) is the noise variance. In the following, an appropriate value of β is required. The cumulative distribution function (CDF) of the random variable u can be represented as
where the quantity ε is the predefined threshold probability according to the empirical results with the different number of antennas of the MIMO systems, and the quantity β can be easily calculated by the inverse calculation of (9), i.e., the inverse incomplete Gamma function. Once \(p_{\text {SD}}>p^{\prime }_{\text {SD}}\), the quantity \(p_{\text {SD}}\) will be replaced by \(p^{\prime }_{\text {SD}}\) at the bottom layer without performing multiple searches, in contrast to [25].
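The inverse calculation can be sketched with the Python standard library alone. The sketch assumes that (9) has the form Pr(u ≤ β) = 1 − ε with the regularized lower incomplete Gamma function as the CDF and an integer shape parameter equal to the number of layers; both the assumed form and the names `reg_lower_gamma`, `inv_reg_lower_gamma` and `sps_radius` are illustrative, not taken from the paper.

```python
import math


def reg_lower_gamma(x, n):
    """Regularized lower incomplete Gamma function for integer n >= 1.

    Uses the closed form 1 - exp(-x) * sum_{k<n} x^k / k!.
    """
    if x <= 0.0:
        return 0.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= x / k
        total += term
    return 1.0 - math.exp(-x) * total


def inv_reg_lower_gamma(p, n, tol=1e-10):
    """Invert reg_lower_gamma in x by bisection (0 < p < 1)."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(hi, n) < p:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sps_radius(n_t, sigma2_v, eps):
    """Initial radius p'_SD = beta * sigma_v^2 with Pr(u <= beta) = 1 - eps."""
    beta = inv_reg_lower_gamma(1.0 - eps, n_t)
    return beta * sigma2_v


# Hypothetical setting: 8 layers, unit noise variance, eps = 0.001.
p_sd_init = sps_radius(8, 1.0, 1e-3)
```

Since β depends only on the number of layers and ε, it can be tabulated once and scaled by the noise variance at run time.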
Modified probabilistic tree pruning
We assume that the remaining N _{ t }−m layers’ symbols are perfectly detected in (5). Then, the PPM is only affected by the noise. Hence, the current PPM p _{ m } plus the norm of the remaining layers’ noise \(\sum _{i=m+1}^{N_{t}}\vert v_{i} \vert ^{2}\) must be smaller than the radius in most cases. Hence, the possible FPM can be represented as
where \(p_{m}= \sum _{i=1}^{m}B_{i}\). Since \(\sum _{i=m+1}^{N_{t}}\vert v_{i} \vert ^{2}/{\sigma ^{2}_{v}} \sim \chi ^{2}\) with 2(N _{ t }−m) degrees of freedom [7, 25], the noise term after some manipulations can be given by
Accordingly, the value of \(\text {Pr}\left (\sum _{i=m+1}^{N_{t}}\vert v_{i} \vert ^{2}/{\sigma ^{2}_{v}}\leq (p_{\text {SD}}-p_{m})/{\sigma ^{2}_{v}}\right)\) is reasonably large, because the sphere radius \(p_{\text {SD}}\) is sufficiently large to avoid missing the ML solution. As discussed above, the summation \(\sum _{i=m+1}^{N_{t}}\vert v_{i} \vert ^{2}/{\sigma ^{2}_{v}}\) follows the chi-square distribution, and the CDF is \(\text {Pr}\left (\sum _{i=m+1}^{N_{t}}\vert v_{i} \vert ^{2}/{\sigma ^{2}_{v}} \leq (p_{\text {SD}}-p_{m})/{\sigma ^{2}_{v}}\right)\). Thus,
where \(\Xi (x;a)={\int _{0}^{x}}\frac {1}{\Gamma (a)}e^{-t}t^{a-1}dt\). In order to obtain the PPM \(p_{m}\), Eq. (12) can be reformulated as
where \(\Xi ^{-1}(x;a)\) is the inverse of \(\Xi (x;a)\), and the quantity \(\epsilon _{p}\) is the predefined probability. Hence, the left hand side (LHS) of Eq. (13) can be considered as the upper bound of the PPM at the mth detection layer. In other words, any PPM \(p_{m}\) larger than the LHS of (13) is unlikely to be the correct path for the ML solution, so these nodes at the mth detection layer with their child nodes are eliminated in the search tree. To avoid the CCB, we introduce the quantized nulling-cancelling point \(Q(\delta _{m})\) obtained by SIC to calculate the minimum PPM for the mth layer as
and
where \(\rho _{m}=p_{\text {SD}}-{\sigma ^{2}_{v}}\Xi ^{-1}(\epsilon _{p};N_{t}-m)\), and \(Q(\delta _{m})\) is a quantized symbol for the given mth layer. If the inequality in (15) is satisfied, the NC point and the remaining nodes with their child nodes are all pruned, and the CCB is not carried out. Otherwise, the quantity \(p^{\delta }_{m}\) is used in (7) to replace \(p_{\text {SD}}\) to further reduce the number of candidates. Note that the parameter \(\rho _{m}\) is precomputed before the start of the transmission.
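The pruning test can be illustrated without the inverse function. Since Ξ is monotonically increasing, comparing a PPM against \(\rho_m\) is equivalent to comparing Ξ((p_SD − p_m)/σ_v²; N_t − m) against ε_p. The sketch below uses this forward form under that assumption; a hardware-oriented implementation would instead tabulate the inverse term, as the text notes. The name `mptp_prune` and the numbers are hypothetical.

```python
import math


def reg_lower_gamma(x, n):
    """Regularized lower incomplete Gamma function for integer n >= 1."""
    if x <= 0.0:
        return 0.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= x / k
        total += term
    return 1.0 - math.exp(-x) * total


def mptp_prune(ppm, p_sd, sigma2_v, eps_p, layers_left):
    """Return True when a node with partial path metric ppm should be pruned.

    layers_left is the number of undetected layers (N_t - m).
    """
    if layers_left == 0:
        return ppm > p_sd                      # plain sphere constraint
    slack = (p_sd - ppm) / sigma2_v            # room left for the noise
    # Prune when the remaining noise fits into the slack with
    # probability below eps_p.
    return reg_lower_gamma(slack, layers_left) < eps_p


# Hypothetical values: radius 10, unit noise variance, eps_p = 0.2.
close_to_radius = mptp_prune(9.9, 10.0, 1.0, 0.2, 1)
well_inside = mptp_prune(9.0, 10.0, 1.0, 0.2, 1)
```

With one layer left the CDF reduces to 1 − exp(−x), so the threshold on the slack is −ln(0.8) ≈ 0.223 in this example.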
Additional conditions for CCB
There are two ACs we should consider to avoid missing candidates if we introduce \(\rho _{i}\) as the intra radius for CCB in each detection layer.

If \(\theta _{\delta _{i}}-\arccos (\psi)<0\) and \(\theta _{\delta _{i}}+\arccos (\psi)>0\), set \(-\pi \leq \theta _{k}<\pi \). If \(0\leq \theta _{k}<2\pi \) is used instead, some constellation points located in [π,2π] will be eliminated erroneously.

If \(\theta _{\delta _{i}}+\arccos (\psi)>2\pi \), set \(\theta _{\delta _{i}}+\arccos (\psi)=\theta _{\delta _{i}}+ \arccos (\psi)-2\pi \). If the upper bound of the phase is greater than 2π, the constellation points in [0,π] will not be included.
In Fig. 1, we present two examples that can be fixed by the conditions described above. The phase range between \(\frac {3\pi }{4}\) and \(\frac {7\pi }{8}\) does not match the above definition \(0\leq \theta _{k}<2\pi \), so the two points between \(\frac {3\pi }{2}\) and 2π will be pruned erroneously in Fig. 1 a within the red circle. For Fig. 1 b, the phase of the constellation point is \(\frac {\pi }{4}\), which should be considered as a candidate based on the phase range. But the upper bound of phase obtained by CCB is greater than 2π, which will eliminate the candidate at \(\frac {\pi }{4}\). Note that these ACs are not specified in previous works such as [11, 12], which employ an extremely large initial radius instead. For PSK modulation and QAM, all constellation points are located on one ring, and the candidates can be obtained in one shot. For high-order QAM, the CCB must be performed multiple times for different concentric rings.
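One way to realize the effect of both ACs in software is to compare wrapped phase differences instead of absolute phases. The sketch below contrasts this with a naive comparison on [0, 2π). It is an illustrative alternative under the assumption of a single PSK ring, not the authors' implementation, and all names and values are hypothetical.

```python
import cmath
import math


def wrapped_diff(a, b):
    """Signed phase difference a - b mapped into [-pi, pi)."""
    return (a - b + math.pi) % (2.0 * math.pi) - math.pi


def ring_candidates(points, delta, half_width):
    """Ring points whose phase lies within half_width of the phase of delta."""
    theta_delta = cmath.phase(delta)
    return [p for p in points
            if abs(wrapped_diff(cmath.phase(p), theta_delta)) <= half_width]


def naive_candidates(points, delta, half_width):
    """Same test with all phases forced into [0, 2*pi) and no wrap handling."""
    two_pi = 2.0 * math.pi
    theta_delta = cmath.phase(delta) % two_pi
    lo, hi = theta_delta - half_width, theta_delta + half_width
    return [p for p in points if lo <= cmath.phase(p) % two_pi <= hi]


# Hypothetical 8-PSK ring with phases (2n+1)*pi/8 and a nulling-cancelling
# point just below the positive real axis (phase -pi/16).
psk8 = [cmath.rect(1.0, (2 * n + 1) * math.pi / 8) for n in range(8)]
delta = cmath.rect(0.9, -math.pi / 16)

kept = ring_candidates(psk8, delta, math.pi / 4)
kept_naive = naive_candidates(psk8, delta, math.pi / 4)
```

Here the window straddles the 0/2π boundary: the wrapped test keeps the points at π/8 and 15π/8, while the naive test loses the point at π/8 because the upper bound exceeds 2π.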
List soft processing-based complex sphere decoder
As we discussed above, the conventional LSD has a variable complexity. In our case, we extend our proposed CSD to the LSD with a simpler list generation. From the original idea of the LSD, a list of symbol candidates with the smallest FPMs are required in the LLR calculation as (17). Furthermore, it would be possible to construct a list with the MAP solution inside. However, the complexity of list generation will be variable and significant [11]. For the simple implementation of LSD with a large number of candidates, the scatter list generation (SLG) is proposed.
Extrinsic LLR calculation of LSD
According to the MAP criterion, the extrinsic LLR can be evaluated by
where the vector \(\mathbf {b}_{\bar {k}}\) denotes the bit vector omitting the kth bit, the a priori LLR \(\mathbf {L}_{e2}(\mathbf {b}_{\bar {k}})\) denotes the LLR from the channel decoder corresponding to the bits in \(\mathbf {b}_{\bar {k}}\), and the quantity \(\mathcal {B}_{k^{\pm }}\) denotes the list of bit vectors obtained by the LSD having ±1 at the kth bit. The symbol vector \(\mathbf {s}_{k^{\pm }}\) denotes the possible symbol combinations corresponding to the set \(\mathcal {B}_{k^{\pm }}\), and the corresponding kth bit of \(\mathbf {s}_{k^{\pm }}\) equals ±1. Following the max-log approximation and the list obtained by the LSD [11], Eq. (16) becomes
The LLR L _{ e1} for the kth bit in the transmit symbol vector is obtained for the channel decoder. The extrinsic information L _{ e1} from the LSD will be fed forward to the channel decoder as the input, and the extrinsic information L _{ e2} from the channel decoder will be fed back to the LSD. Thus, the information between two decoding components exchanges iteratively.
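A max-log extrinsic LLR computation over a candidate list, including the fixed-value clipping used when one hypothesis is missing, can be sketched as follows. The metric −FPM/σ_v² plus half the a priori term is the commonly used max-log form and is an assumption here, as are the function name `max_log_llr`, the data layout and the example numbers.

```python
def max_log_llr(candidates, prior_llr, sigma2_v, clip=12.0):
    """Max-log extrinsic LLRs from a candidate list.

    candidates: list of (bits, fpm), bits being a tuple of +1/-1 values.
    prior_llr:  a priori LLRs from the channel decoder, one per bit.
    """
    llrs = []
    for k in range(len(prior_llr)):
        best = {+1: None, -1: None}
        for bits, fpm in candidates:
            # A priori contribution of all bits except the kth one.
            prior = 0.5 * sum(b * l for j, (b, l)
                              in enumerate(zip(bits, prior_llr)) if j != k)
            metric = -fpm / sigma2_v + prior
            if best[bits[k]] is None or metric > best[bits[k]]:
                best[bits[k]] = metric
        if best[+1] is None:        # no candidate with bit k = +1 in the list
            llrs.append(-clip)
        elif best[-1] is None:      # no candidate with bit k = -1 in the list
            llrs.append(clip)
        else:
            llrs.append(max(-clip, min(clip, best[+1] - best[-1])))
    return llrs


# Hypothetical list of three candidates for a two-bit vector.
cands = [((+1, +1), 0.5), ((+1, -1), 2.5), ((-1, +1), 1.5)]
llr = max_log_llr(cands, [0.0, 0.0], 1.0)
```

A short list makes the clipping branch more likely to be taken, which is the performance issue with small lists discussed in the introduction.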
Scatter list generation
To build a large list with simple implementation, a few modifications will be made to the proposed CSD:

Perform the search by the proposed CSD to obtain the branches accessed in the search, and rearrange these branches in an ascending order according to the PPMs. Start several searches with the ML ordering by traversing the spans of the subtrees of the branches until the list is filled. Note that the subtree search will be terminated once it reaches the starting point of the neighbouring subtree search.

Replace the radius p _{SD} by the largest FPM of the symbol vector in the list.

MPTP will be carried out given the new radius p _{SD}.

The sphere radius p _{SD} may be updated in (9) with the new largest FPM in the list once a candidate with a smaller FPM is found.
The search strategy described above splits the entire tree into different subtrees and searches them independently. The algorithm table is shown in Algorithm 2. Although the proposed CSD is needed to perform several times for the scatter list generation, its complexity has been significantly reduced, which is measured via the number of updates in the list generation.
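The list bookkeeping in these steps, where the radius equals the largest FPM in a full list and shrinks whenever a better candidate replaces the worst one, can be sketched with a bounded max-heap. This illustrates only the list maintenance, not the subtree search of Algorithm 2; the class name `CandidateList` and its interface are hypothetical.

```python
import heapq


class CandidateList:
    """Fixed-size list of the candidates with the smallest FPMs."""

    def __init__(self, size):
        self.size = size
        self._heap = []      # max-heap on FPM, stored as negated values
        self._seq = 0        # tie-breaker so symbol vectors are never compared
        self.updates = 0     # number of list updates (complexity measure)

    def radius(self):
        """Largest FPM in the list, or infinity while the list is not full."""
        if len(self._heap) < self.size:
            return float("inf")
        return -self._heap[0][0]

    def offer(self, fpm, symbols):
        """Try to insert a candidate; return True if the list changed."""
        self._seq += 1
        entry = (-fpm, self._seq, symbols)
        if len(self._heap) < self.size:
            heapq.heappush(self._heap, entry)
        elif fpm < self.radius():
            heapq.heapreplace(self._heap, entry)   # drop the current worst
        else:
            return False
        self.updates += 1
        return True


# Hypothetical usage with a list of size 2.
lst = CandidateList(2)
lst.offer(3.0, "a")
lst.offer(1.0, "b")
rejected = lst.offer(5.0, "c")   # outside the current radius
accepted = lst.offer(2.0, "d")   # replaces the candidate with FPM 3.0
```

The `updates` counter corresponds to the number of list updates used as the complexity measure for the LSD.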
ML-based ordering
The ML solution can be exploited to reorder the remaining branches for the list generation of the LSD. When the list is full, the search will go back to the upper layers and proceed down the tree. However, the unvisited nodes at the lower layers are unknown to this search, and these partial branches would be ordered according to the SE enumeration. The basic idea of ML-based ordering is to sort the remaining partial branches with the ML solution in the low detection layers rather than only computing their real PPM for a given layer. Additionally, a large proportion of the remaining branches may be discarded if the distance (\(p_{\text {SD}}\)) is much smaller than the Euclidean distance of \(\mathbf {s}_{r}\). In our case, the following equation can be used for ordering at the ith layer:
where the vector \(\mathbf {s}_{r}=\left [s^{1}_{\text {ML}},\ldots,s^{i-1}_{\text {ML}}, {s^{i}_{r}},\ldots,s^{N_{t}}_{r}\right ]^{T}\), which implies that the unknown i−1 transmit symbols at low detection layers are replaced by the symbols in the ML solution, which can be used for ordering. The quantity \(\mathcal {R}\) denotes the set of available branches for the ith layer, and the quantity κ denotes the smallest Euclidean distance in the sorting process. The calculation of ML-based ordering has a modest cost in (18), which only needs \(\vert \mathcal {R} \vert (i+1)\) multiplications for each layer. The notation \(\vert \cdot \vert \) denotes the size of the set.
Simulation results
In this section, we evaluate the proposed CSD in two different forms: (1) the hard output CSD and (2) the soft output CSD (LSD). For the hard output CSD, the performance and complexity of several CSDs are compared via BER and the number of visited nodes in an 8×8 MIMO system with 16-QAM and 8-PSK. An M-PSK modulation in our simulation is defined as \(\lbrace \gamma e^{j(2n+1)\pi /M}:n=0,1,\ldots,M-1\rbrace \). The quantity γ is defined as the magnitude of the modulation scheme, and the quantity M is the size of the modulation. We consider the conventional SE-CSD, Pham-CSD [12], PTP-CSD [7], and the proposed CSD with and without AC for CCB, all of which are complex SE enumeration-based CSDs with \(p_{\text {SD}}=\infty \) at the beginning of the search. The PTP can be simply extended to Pham-CSD. The energy per bit to noise ratio (\(E_{b}/N_{0}\)) is used. The MIMO channel coefficients (\(N_{t}=N_{r}\)) are generated according to the Jakes model, and the channel noise is additive white Gaussian noise, which is independent and identically distributed for each receive antenna as stated in the previous section. The probabilistic noise constraint is set to \(\epsilon _{p}=0.2\). The threshold ε for SPS must be appropriately adjusted according to the dimensions and the modulation as stated earlier, and we set ε=0.001. The ISRC scheme [8] is not employed because of the difficulty of choosing parameters for the intra radius.
As shown in Figs. 2 and 3, the complexity of the proposed CSD improves upon the others in terms of visited nodes per channel use by 25 % for 16-QAM and more than 25 % for 8-PSK at high \(E_{b}/N_{0}\) values without any BER performance loss, even compared to the conventional SE-CSD between the mid and high \(E_{b}/N_{0}\) regime. The performance loss of the proposed algorithm without ACs is significant at high SNRs. In other words, it is more sensitive to the missing candidates in low noise scenarios. However, the complexity reduction is not obvious at low \(E_{b}/N_{0}\) due to the CCB including more unreliable constellation points. It can be observed that the curves of the number of visited nodes for different SDs converge at very high \(E_{b}/N_{0}\), so the improvement of the proposed SD is reduced at high \(E_{b}/N_{0}\), but is still very promising.
To show the robustness to the channel estimation errors, the BER performance of CSDs for 8×8 MIMO system with 8PSK and least square (LS) channel estimation [26] is plotted in Fig. 4. We can observe that the BER performance of the proposed CSD with imperfect channel estimates can still achieve the same performance as other existing CSDs. The BER performance of 16QAM is not shown here, because it has similar curves as in Fig. 4.
The worst-case complexity is measured by the 99 % quantile of the total number of visited nodes per channel use (\(\text {Pr}(\mathcal {C}_{w}>\mathcal {C}_{\text {any}})=0.99\)) [27], where the quantity \(\mathcal {C}_{w}\) denotes the number of visited nodes accessed by the SDs in one particular channel use, and the quantity \(\mathcal {C}_{\text {any}}\) denotes the number of visited nodes accessed by the SDs in any channel use. The corresponding worst-case complexity \(\mathcal {C}_{w}\) of the CSDs is also plotted in Fig. 5, which implies that the number of visited nodes of the proposed CSD is tightly lower bounded by the complexity of SIC at high SNRs.
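For reference, a 99 % quantile of recorded node counts can be computed as in the following sketch. It uses the nearest-rank definition with integer arithmetic, which is one of several common quantile conventions and is an assumption here; the function name `worst_case_nodes` and the sample data are hypothetical.

```python
def worst_case_nodes(counts, pct=99):
    """Nearest-rank pct-quantile of the visited-node counts.

    Returns the smallest recorded count that is at least as large as
    pct percent of all recorded counts.
    """
    ordered = sorted(counts)
    rank = (pct * len(ordered) + 99) // 100   # ceil(pct * n / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical record: 100 channel uses with 1, 2, ..., 100 visited nodes.
c_w = worst_case_nodes(list(range(1, 101)))
```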
Additionally, the complexity of SDs increases exponentially with increasing dimension. We therefore plot the number of visited nodes against the dimensions (\(N_{t}=N_{r}\)) at a high \(E_{b}/N_{0}\) (20 dB) to show that the complexity is still reduced by our proposed algorithm in Fig. 6.
The complexity discussed so far is only based on the number of visited nodes. In order to show the advantages of the complexity reduction of complex SE enumeration (CCB) and of eliminating unnecessary candidates, the curves with the number of FLOPS are presented in Fig. 7. For a fair comparison, we assume that a complex addition requires 2 FLOPS, and a complex multiplication requires 16 FLOPS. The proposed algorithm still outperforms the other CSDs, because of fewer executions of complex SE enumeration and the reduced number of candidates. The number of FLOPS of the detection ordering is not considered, because the CSDs are performed with the same preprocessing technique. Furthermore, the parameters for MPTP can be precomputed before the start of the transmission.
For the soft output CSD (LSD), we consider 8×8 16-QAM and an NSC half rate convolutional code with constraint length 3 for simplicity. In a coded system, the energy per bit to noise ratio is redefined as \(\left.\frac {E_{b}}{N_{0}}\right \vert _{\text {dB}}=\left.\frac {E_{s}}{N_{0}}\right \vert _{\text {dB}}+10\log _{10}\frac {N_{t}}{RN_{r}M}\), where R denotes the rate of the channel code. The performance and complexity of the proposed LSD are evaluated by BER and the number of updates in the list, respectively. The EXIT chart is also used to illustrate how the mutual information changes. Note that a fixed value clipping has been adopted in our simulation, and the appropriate clipping values can be simply obtained by evaluating the mutual information \(I_{e}\) as in [20]. In our case, the clipping value is set to ±12. In Fig. 8 a, the EXIT chart of the conventional LSD [11] and the proposed LSD is illustrated with different list sizes. We can observe that both of them perform almost identically for each list size, and that the list size has a significant influence on the LSD performance. Additionally, the EXIT chart of the LSD with the same list size at different SNRs is plotted in Fig. 8 b. The SNR only moves the curves up and down without changing their shapes. Similarly, the BER performance of the two LSDs with L=512, presented in Fig. 9, agrees with the results in the EXIT chart, and the performance improves with the increasing number of iterations. The complexity comparison in terms of the CDF is shown in Fig. 10, which indicates that the number of updates in the list is significantly reduced by the proposed LSD with a large list size. Furthermore, the search can be terminated early to suit the hardware implementation.
Related work discussion
For a very-large-scale integration (VLSI) implementation, the enumeration scheme becomes the bottleneck of the sphere decoding algorithm, as discussed in [14, 15, 17]. The enumeration must be implemented efficiently, so the “one-node-per-cycle” architecture is one of the promising structures for hardware implementation. Our proposed sphere decoder is essentially a variant of the method proposed in [14]. The authors of [14] proposed a new architecture that consists of two entities: (1) a metric computation unit (MCU) and (2) a metric enumeration unit (MEU). These two components operate in parallel to handle the forward search and the backward search separately. In our proposed method, the enumeration scheme, i.e., the MEU, and the MCU can work in the same way. Once an unvisited node is accessed, the search always performs successive interference cancellation to obtain the nulling-cancelling points. Meanwhile, the MEU can be used to find the remaining surviving nodes after this process. Furthermore, the latency requirement of the “one-node-per-cycle” architecture is not a problem in our case, because the second term in the modified probabilistic tree pruning is not calculated in real time, as described in Eq. (14). Only one additional cycle is needed to obtain the tight radius in each detection layer, and the reduction in the number of visited nodes outweighs these few additional cycles. Additionally, the statistical pruning used in this paper is also precomputed, which prevents the radius obtained by successive interference cancellation from being too large in some extreme cases. In [14], the authors modified the SE enumeration scheme to reduce the critical path. However, the modified scheme is not strictly compatible with the “one-node-per-cycle” architecture. Owing to the complexity reduction and the small number of additional cycles required, the modified scheme still works under the architecture. This situation is quite similar to the one we discussed above.
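The nulling-cancelling step mentioned above can be sketched as plain back-substitution with slicing. This is a generic illustration and not the authors' hardware data path: it assumes the channel has already been QR-decomposed into an upper-triangular matrix `R` with `z = Q^H y`, and the names `sic_detect`, `R`, `z` and `constellation` are hypothetical. The returned path metric is one common way to seed a search radius; no claim is made that it matches the radius update of Eq. (14).

```python
def sic_detect(R, z, constellation):
    """Successive interference cancellation on an upper-triangular system.

    R: n x n upper-triangular matrix (list of lists), z: length-n vector.
    Detects from the last layer to the first: subtract the contribution of
    already detected symbols, divide by the diagonal entry, then slice to
    the nearest constellation point.
    Returns the detected vector and its squared Euclidean path metric.
    """
    n = len(z)
    s_hat = [0j] * n
    for i in range(n - 1, -1, -1):
        interference = sum(R[i][j] * s_hat[j] for j in range(i + 1, n))
        estimate = (z[i] - interference) / R[i][i]
        s_hat[i] = min(constellation, key=lambda c: abs(c - estimate))
    metric = sum(
        abs(z[i] - sum(R[i][j] * s_hat[j] for j in range(i, n))) ** 2
        for i in range(n)
    )
    return s_hat, metric


# Toy 2x2 example with QPSK symbols and a small perturbation on z
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
R = [[2.0, 0.5 + 0.5j], [0.0, 1.5]]
sent = [1 - 1j, -1 + 1j]
z = [sum(R[i][j] * sent[j] for j in range(2)) + 0.1 for i in range(2)]
detected, radius_sq = sic_detect(R, z, qpsk)
print(detected, radius_sq)
```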
Furthermore, the complexity of the enumeration scheme can be further reduced by the methods introduced in [15, 17]. The authors proposed two-dimensional zigzag enumeration schemes that avoid the sorting process and unnecessary partial path metric calculations, both of which are computationally intensive in a practical hardware implementation. In other words, the partial path metrics of the unvisited nodes are calculated only when the expansion of the tree demands them. Hence, there is no need to visit all children of a parent node and sort them by their actual Euclidean distances. To the best of the authors’ knowledge, the two-dimensional zigzag enumeration scheme is the most efficient method for the complex sphere decoder.
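The idea of computing metrics only on demand can be illustrated with a deliberately simplified one-dimensional version. The sketch below is not the two-dimensional scheme of [15, 17]; it is a generic lazy zigzag over a sorted set of real PAM levels, with the hypothetical name `zigzag_order`. It yields candidates in order of increasing distance from the unconstrained estimate without sorting the whole set, and evaluates a distance only for the two frontier candidates at each step.

```python
def zigzag_order(x, levels):
    """Lazily yield `levels` in order of increasing distance from x.

    levels must be sorted in ascending order. The generator starts at the
    nearest level and then expands left or right, comparing only the two
    frontier candidates, so no full sort of the children is required.
    """
    n = len(levels)
    start = min(range(n), key=lambda i: abs(levels[i] - x))
    yield levels[start]
    lo, hi = start - 1, start + 1
    while lo >= 0 or hi < n:
        take_left = hi >= n or (
            lo >= 0 and abs(levels[lo] - x) <= abs(levels[hi] - x)
        )
        if take_left:
            yield levels[lo]
            lo -= 1
        else:
            yield levels[hi]
            hi += 1


pam4 = [-3, -1, 1, 3]
order = zigzag_order(0.4, pam4)
first = next(order)   # only the nearest child is produced so far
second = next(order)  # the next sibling is produced only when requested
print(first, second, list(order))
```

Because the function is a generator, a tree search that prunes a parent after its first child never pays for the remaining siblings.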
Conclusion
In this paper, novel sphere decoding algorithms for MIMO detection and for iterative detection and decoding have been presented. The proposed CSD, which incorporates statistical tree pruning and SIC, first reaches the bottom layer and then eliminates candidates for the lower layers before the search reaches them, rather than eliminating these candidates at the lower layers themselves. The simulation results show that the proposed CSD requires significantly less computational effort than the conventional CSD. Furthermore, the proposed CSD has been naturally extended to the list SD (LSD) based on the scatter list generation. The proposed LSD makes better use of the ML solution to reorder the remaining branches, so the list generation becomes simpler than that of the conventional LSD. The complexity of the proposed sphere decoding algorithms for MIMO detection is significantly reduced, and the algorithms provide an attractive trade-off between complexity and performance.
References
 1
E Viterbo, J Boutros, A universal lattice code decoder for fading channels. IEEE Trans. Inf. Theory. 45(5), 1639–1642 (1999).
 2
Y Li, N Seshadri, S Ariyavisitakul, On maximum-likelihood detection and the search for the closest lattice point. IEEE Trans. Inf. Theory. 49(10), 2389–2402 (2003).
 3
CP Schnorr, M Euchner, Lattice basis reduction: improved practical algorithms and solving subset sum problems. Math. Program. 66(2), 181–191 (1994).
 4
E Agrell, T Eriksson, A Vardy, K Zeger, Closest point search in lattices. IEEE Trans. Inf. Theory. 48(8), 2201–2214 (2002).
 5
AD Murugan, HE Gamal, et al., A unified framework for tree search decoding: rediscovering the sequential decoder. IEEE Trans. Inf. Theory. 52(3), 933–953 (2006).
 6
R Gowaikar, B Hassibi, Statistical pruning for near-maximum likelihood decoding. IEEE Trans. Signal Process. 55(6), 2661–2675 (2007).
 7
B Shim, I Kang, Sphere decoding with a probabilistic tree pruning. IEEE Trans. Signal Process. 56(10), 4867–4878 (2008).
 8
MX Chang, On further reduction of complexity in tree pruning based sphere search. IEEE Trans. Commun. 58(2), 471–422 (2010).
 9
GJ Foschini, Layered space-time architecture for wireless communication in a fading environment when using multiple antennas. Bell Lab Technical J. 1(2), 41–59 (1996).
 10
RC de Lamare, R Sampaio-Neto, Minimum mean square error iterative successive parallel arbitrated decision feedback detectors for DS-CDMA systems. IEEE Trans. Commun. 56(5), 778–789 (2008).
 11
B Hochwald, S Ten Brink, Achieving near-capacity on a multiple-antenna channel. IEEE Trans. Commun. 51(3), 389–399 (2003).
 12
D Pham, KR Pattipati, et al., An improved complex sphere decoder for V-BLAST systems. IEEE Signal Process. Lett. 11(9), 748–751 (2004).
 13
KC Lai, LW Lin, Low-complexity adaptive tree search algorithm for MIMO detection. IEEE Trans. Wireless Commun. 8(7), 3716–3726 (2009).
 14
A Burg, M Borgmann, et al., VLSI implementation of MIMO detection using the sphere decoding algorithm. IEEE J. Solid-State Circuits. 40(7), 1566–1577 (2005).
 15
M Shabany, K Su, P Gulak, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP). A pipelined scalable high throughput implementation of a near-ML K-Best complex lattice decoder, (2008), pp. 3173–3176, doi:10.1109/ICASSP.2008.4518324.
 16
M Barrenechea, M Mendicute, et al., in Proc. 19th European Signal Process. Conf. (EUSIPCO). Implementation of complex enumeration for multiuser MIMO vector precoding, (2011), pp. 739–743.
 17
K Nikitopoulos, J Zhou, B Congdon, et al., in Proc. 2014 ACM Conf. on SIGCOMM. Geosphere: Consistently turning MIMO capacity into throughput, pp. 631–642, doi:10.1145/2619239.2626301.
 18
K Nikitopoulos, A Karachalios, D Reisis, Exact Max-Log MAP Soft-Output Sphere Decoding via Approximate Schnorr-Euchner Enumeration. IEEE Trans. Veh. Technol. 64(6), 2749–2753 (2015).
 19
SA Laraway, B Farhang-Boroujeny, Implementation of a Markov Chain Monte Carlo based multiuser/MIMO detection. IEEE Trans. Circuits Syst. I. 56(1), 246–255 (2009).
 20
E Zimmermann, DL Milliner, et al., in Proc. IEEE GLOBECOM 2008. Optimal LLR clipping levels for mixed hard/soft output detection, (2008), pp. 1–5, doi:10.1109/GLOCOM.2008.ECP.222.
 21
RH Gohary, TJ Willink, On LLR clipping in BICM-ID non-coherent MIMO communications. IEEE Commun. Lett. 15(6), 650–652 (2011).
 22
C Studer, Iterative MIMO decoding: algorithms and VLSI implementation aspects (Ph.D. dissertation, Hartung-Gorre Verlag Konstanz, 2009).
 23
T Cui, C Tellambura, An efficient generalized sphere decoder for rank-deficient MIMO systems. IEEE Commun. Lett. 9(5), 423–425 (2005).
 24
P Wang, T Le-Ngoc, A low-complexity generalized sphere decoding approach for underdetermined linear communication systems: performance and complexity evaluation. IEEE Trans. Commun. 57(11), 3376–3388 (2009).
 25
B Hassibi, H Vikalo, On the sphere-decoding algorithm I. Expected complexity. IEEE Trans. Signal Process. 53(8), 2806–2818 (2005).
 26
S Kay, Fundamentals of Statistical Signal Processing: Estimation Theory (A Simon & Schuster Company, Upper Saddle River, New Jersey, 1993).
 27
DW Waters, JR Barry, The chase family of detection algorithms for multiple-input multiple-output channels. IEEE Trans. Signal Process. 56(2), 739–747 (2008).
Acknowledgements
This work is supported by the Scientific Research Foundation of CUIT (NO. KYTZ201415), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry and the Sichuan Provincial Department of Science and Technology Innovation and R&D projects in Science and Technology Support Program (NO. 2015RZ0060).
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
The main contributions of this paper are summarized as follows: (1) An efficient CSD has been proposed that approaches linear complexity for practical large-scale MIMO systems. This is because the proposed CSD significantly reduces both the number of times enumeration must be performed and the span of the detection tree. (2) Due to the ACs, the performance loss of the conventional CSD is compensated. (3) The search strategy of the proposed CSD can be easily extended to the LSD with lower complexity. (4) The scatter list generation and the ML ordering accelerate the construction of the list and make the LSD more suitable for parallel hardware implementation. All authors read and approved the final manuscript.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
LI, L.A., de Lamare, R.C. & Burr, A.G. Successive interference cancellation aided sphere decoder for multiinput multioutput systems. J Wireless Com Network 2016, 51 (2016). https://doi.org/10.1186/s136380160555y
Keywords
 Complex sphere decoder (CSD)
 Detection algorithms
 Multi-input multi-output (MIMO)
 QR decomposition