Successive interference cancellation aided sphere decoder for multi-input multi-output systems
 LI Alex LI^{1},
 Rodrigo C. de Lamare^{2, 3} and
 Alister G. Burr^{2}
https://doi.org/10.1186/s136380160555y
© LI et al. 2016
Received: 15 June 2015
Accepted: 5 February 2016
Published: 16 February 2016
Abstract
In this paper, sphere decoding algorithms are proposed for both hard detection and soft processing in multi-input multi-output (MIMO) systems. Both algorithms are based on the complex tree structure to reduce the complexity of searching for the unique minimum Euclidean distance and for multiple Euclidean distances, and of obtaining the corresponding transmit symbol vectors. The novel complex hard sphere decoder for MIMO detection is presented first, and then the soft processing of a novel sphere decoding algorithm for list generation is discussed. The performance and complexity of the proposed techniques are demonstrated via simulations in terms of bit error rate (BER), the number of nodes accessed and floating-point operations (FLOPS).
Keywords
Complex sphere decoder (CSD); Detection algorithms; Multi-input multi-output (MIMO); QR decomposition
1 Introduction
To achieve a high spectral efficiency, maximum likelihood (ML) detection should be employed with high-order constellations. However, “brute-force” ML detection is impractical even for a system with a small number of antennas. An alternative method is the sphere decoder (SD), which has attracted significant attention in the last decade due to the considerable complexity reduction it achieves [1, 2]. The key idea behind the SD is to find the lattice point closest to the received signal within a sphere of a given radius. Although the computational complexity is greatly reduced, it is still very high for systems with a large number of antennas and high-order modulation. To reduce the complexity, the authors in [3–8] have studied different search strategies and enumeration schemes. Additionally, some suboptimal methods with linear and decision feedback equalization (DFE) [9, 10] have been proposed to approach the ML performance.
In most applications, the complex-valued system is decoupled and reformulated as an equivalent real-valued system. Real-valued SDs can only process lattice-based modulation schemes such as quadrature amplitude modulation (QAM) and pulse amplitude modulation (PAM), while other modulations such as phase shift keying (PSK) cannot be processed as efficiently, because some invalid lattice points are included in the search. Additionally, the depth of the expanded tree for real-valued SDs is twice that of their complex counterparts. Hence, the complex-valued SE-SD and a modified version of the SE-SD were proposed in [11, 12]. The complex-valued SDs avoid the decoupling of the complex system and can be widely applied to different modulations without reaching invalid lattice points. In particular, the latter can achieve a very low complexity compared to other real-valued and complex-valued SDs [13]. However, the intricacy of complex SE enumeration is still a weak point that makes the real-valued SDs preferred for hardware implementation. Some novel low-complexity complex enumerators have been studied in [14–16], and these enumerators are interchangeable in most complex-valued SDs. Nevertheless, the enumeration must still be employed in each detection layer and performed several times once new branches are accessed. Furthermore, the authors in [17] investigate the practical performance of a novel sphere decoder (Geosphere) for multiuser detection. A novel two-dimensional zigzag ordering strategy has been studied, which reduces the number of path metric calculations. Additionally, the lower bound of the path metric is employed to eliminate the branches if the path metric is smaller than the lower bound. Another efficient ordering and pruning scheme is studied in [18], which performs horizontal pruning and vertical pruning with a novel tight lower limit for the path metric. These two schemes could also be used in the complex-valued SDs with simple modifications.
Motivated by the description above and the probabilistic tree pruning SD (PTP-SD) [7], we devise a novel complex-valued SD (CSD) that combines a statistical pruning strategy (SPS) with successive interference cancellation (SIC)-aided modified probabilistic tree pruning (MPTP).
In addition to hard decision type detection, soft processing for multi-input multi-output (MIMO) systems has been recently studied in several works [11, 14, 19–21]. In [11], the authors report near-capacity MIMO detection using the list-SD (LSD) with a relatively large candidate list. With a large list, the performance of the LSD will be very close to that of the maximum a posteriori probability (MAP) detector. The results in [14] illustrate the hardware implementation of an LSD with four candidates, which may not be considered an implementation of an approximate MAP detector [19]. This is because the list size is too small to achieve near-capacity performance. Hence, it is not an optimum detection technique. Additionally, log-likelihood ratio (LLR) clipping is also required for the LSD with a small list, because the +1 or −1 hypothesis is missing in a particular bit LLR calculation based on the max-log MAP criterion. It can be fixed by setting a given magnitude of LLR as in [11], but the performance loss is unavoidable. Although the authors in [20, 21] have fixed this problem for a non-iterative detection decoding scheme and non-coherent detection, these LLR clipping techniques are still relatively complicated in most cases and may not be suitable for every iterative detection decoding structure. However, a large list will result in irreducible complexity, which limits the applications of the LSD, because the candidate updates in the list are difficult in hardware implementation [22]. This is not a desirable feature. If the list generation for the LSD is simple, the multiple search will not be required in the iterative processing compared to the single tree search (STS) sphere decoder. In the following, we will discuss a simple list generation for the LSD given the proposed CSD.
The main contributions of this work are summarized as follows.

An efficient CSD is proposed that approaches linear complexity for practical large-scale MIMO systems. This is because the proposed CSD significantly reduces the number of times the enumeration must be performed and the span of the detection tree.

Due to the additional conditions (ACs), the performance loss of the conventional CSD is compensated.

The search strategy of the proposed CSD can be easily extended to the LSD with lower complexity.

The scatter list generation and the ML ordering accelerate the construction of the list and make the LSD more suitable for the parallel hardware implementation.
Due to the requirement of enumerators, the SDs we discuss in this work are all based on the complex-valued SE enumeration in [11, 12] for simplicity, namely computation of coordinate bound (CCB) enumeration.
The rest of the paper is organized as follows. Section 2 presents the system model and problem formulation. Section 3 describes the proposed complex sphere decoder and the algorithm table. The soft processing with the LSD is discussed in Section 4. In Section 5, the simulation results demonstrate the complexity and BER performance of the CSD and the LSD, respectively. Section 6 discusses related work, and conclusions are drawn in Section 7.
2 System model
For the \(i\)th detection layer, only the first \(i\) elements in the \(i\)th row of the matrix \(\mathbf{R}\) are used for the branch metric calculation in Eq. (5). In other words, the FPM \(p_{N_{t}}\) corresponds to the Euclidean distance over all detection layers, and the PPM \(p_{m}\) is the Euclidean distance over only part of the detection layers. Note that \(p_{m}\) denotes the FPM when \(m=N_{t}\) and the PPM when \(m<N_{t}\).
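The relation between the PPM and the FPM can be illustrated with a minimal sketch. It assumes the usual QR-based formulation, in which the receiver works with a vector `z` and an upper-triangular matrix `R` and accumulates squared branch metrics from the last row upwards; the function name, the row ordering and the reading of PPM/FPM as partial and full path metrics are assumptions made for illustration, not the paper's exact Eq. (5).

```python
def partial_path_metrics(R, z, s):
    """Return [p_1, ..., p_Nt] for a candidate symbol vector s.

    Assumption: R is upper triangular, so layer m (counted from the
    bottom row) involves exactly m non-zero entries of one row of R.
    """
    n = len(s)
    metrics = []
    total = 0.0
    for m in range(1, n + 1):
        row = n - m  # layer m uses row n-m (0-based) of R
        # Only the m entries R[row][row:] contribute to this layer.
        predicted = sum(R[row][col] * s[col] for col in range(row, n))
        total += abs(z[row] - predicted) ** 2  # branch metric of layer m
        metrics.append(total)  # p_m; the last entry is the FPM
    return metrics


# Hypothetical 2x2 example: the last entry equals ||z - R s||^2.
R_demo = [[2, 1], [0, 1]]
print(partial_path_metrics(R_demo, [1, 0], [1, -1]))
```

Because each branch metric is non-negative, the sequence \(p_{1},\ldots,p_{N_{t}}\) is non-decreasing, which is what allows a branch to be pruned as soon as its PPM exceeds the radius.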
3 Complex sphere decoder with successive interference cancellation-based tree pruning
The proposed CSD is based on the CCB enumeration scheme reviewed below and operates on the system model in (2). Furthermore, the minimum mean square error sorted QR decomposition (MMSE-SQRD) is used as the preprocessing technique [2], which provides several efficient QR decomposition methods. For a fair comparison between different CSDs, the most promising decomposition method (SQRD) with the optimum signal-to-interference-plus-noise ratio (SINR) ordering is used for preprocessing to obtain the upper-triangular matrix \(\mathbf{R}\). Thus, the CSDs with MMSE-SQRD can achieve the ML performance at very low complexity.
3.1 The review of CCB
where \(0\leq \arccos(\psi)\leq \pi\). From (8), no constellation points on a concentric ring are included among the candidates if \(\psi>1\), which implies that the phase bound is too small to cover any constellation points except \(\delta_{i}\). For \(\psi<-1\), the corresponding phase bound is \([\theta_{\delta_{i}}-\pi, \theta_{\delta_{i}}+\pi]\), which includes all constellation points on the ring. For \(-1\leq \psi\leq 1\), only the constellation points inside the bound can be used for the search. However, the phase bound described above may eliminate some candidates that should be included in the search. This is because the phases of the constellation points lie between 0 and \(2\pi\), whereas the corresponding phase bound may not be located within \([0,2\pi]\). Thus, the mismatch between the phases of the constellation points and the phase bound must be fixed to avoid missing candidates.
3.2 Novel search strategy and successive interference cancellation tree pruning
In this subsection, we present three different techniques to reduce the complexity: (1) a novel search strategy with the aid of SIC; (2) SPS; (3) the MPTP algorithm.
3.2.1 Search strategy
Compared to the conventional SE-CSD, the novel search strategy first performs SIC to obtain the nulling-cancelling points and the FPM without calculating the PPMs of other constellation points and sorting them for each layer, and the radius \(p_{\text{SD}}\) may be updated by the FPM, i.e., \(p_{N_{t}}\), once the search reaches the bottom layer. The rest of the search can be performed upwards starting from the nulling-cancelling point of the bottom layer rather than from the top layer as in the conventional SE-CSD. Additionally, the span of the tree can be further shrunk by the MPTP. In (7), the candidates chosen by CCB can be determined by the newly updated radius obtained by the MPTP. Hence, the number of possible candidates for each layer can be significantly reduced. The details of the proposed algorithm are described in Algorithm 1.
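The first SIC pass described above can be sketched as follows. This is only an illustration under stated assumptions: an upper-triangular `R`, a preprocessed vector `z`, detection from the last row upwards, and nearest-point quantization onto the constellation. The names `sic_first_pass`, `R`, `z` and `constellation` are hypothetical, and the sketch omits the subsequent upward search and the MPTP.

```python
def sic_first_pass(R, z, constellation):
    """Nulling-cancelling pass: return the NC points and their FPM.

    Assumption: R is upper triangular with non-zero diagonal, and the
    bottom row of R is detected first.
    """
    n = len(z)
    s = [0j] * n
    fpm = 0.0
    for row in range(n - 1, -1, -1):
        # Cancel the interference of the symbols detected so far.
        residual = z[row] - sum(R[row][col] * s[col] for col in range(row + 1, n))
        delta = residual / R[row][row]  # unconstrained (nulled) point
        # Quantize to the nearest constellation point: the NC point.
        s[row] = min(constellation, key=lambda c: abs(c - delta))
        fpm += abs(residual - R[row][row] * s[row]) ** 2
    return s, fpm


# Hypothetical noiseless 2x2 QPSK example.
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
R_demo = [[2, 0.5], [0, 1.5]]
z_demo = [1.5 + 1.5j, -1.5 - 1.5j]
nc_points, radius = sic_first_pass(R_demo, z_demo, qpsk)
```

The returned FPM is a natural candidate for the initial radius \(p_{\text{SD}}\), since the NC solution is itself a valid lattice point inside the sphere of that radius.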
3.2.2 Statistical pruning strategy
where the quantity \(\epsilon\) is the predefined threshold probability, chosen according to empirical results for different numbers of antennas in the MIMO system, and the quantity \(\beta\) can be easily calculated by inverting (9), i.e., by the inverse incomplete Gamma function. Once \(p_{\text{SD}}>p^{\prime}_{\text{SD}}\), the quantity \(p_{\text{SD}}\) is updated by \(p^{\prime}_{\text{SD}}\) at the bottom layer without performing the multiple search used in [25].
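As a rough numerical illustration of the inverse incomplete Gamma step, the sketch below assumes that the normalized noise energy \(\Vert\mathbf{v}\Vert^{2}/\sigma^{2}_{v}\) follows a Gamma distribution with integer shape equal to the number of receive antennas and unit scale (complex Gaussian noise), and that \(\beta\) is chosen so that the noise falls inside the sphere with probability \(1-\epsilon\). Both the probability convention and the helper names are assumptions, since the exact form of (9) is not reproduced here.

```python
import math


def gamma_cdf_int(k, x):
    """Regularized lower incomplete Gamma function for integer shape k."""
    term, acc = 1.0, 1.0  # j = 0 term of sum_{j<k} x^j / j!
    for j in range(1, k):
        term *= x / j
        acc += term
    return 1.0 - math.exp(-x) * acc


def inverse_gamma_cdf_int(k, prob, tol=1e-10):
    """Invert gamma_cdf_int by bisection (the CDF is increasing in x)."""
    lo, hi = 0.0, 1.0
    while gamma_cdf_int(k, hi) < prob:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma_cdf_int(k, mid) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def statistical_radius(n_rx, sigma2, eps):
    """Hypothetical p'_SD = sigma^2 * beta with P(noise inside) = 1 - eps."""
    beta = inverse_gamma_cdf_int(n_rx, 1.0 - eps)
    return sigma2 * beta


# Example call with assumed values (8 receive antennas, eps = 0.001).
p_sd_prime = statistical_radius(8, 0.1, 0.001)
```

Since \(\beta\) depends only on the antenna count and \(\epsilon\), it can be tabulated offline, so the run-time cost reduces to one multiplication by the noise variance.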
3.2.3 Modified probabilistic tree pruning
where \(\rho_{m}=p_{\text{SD}}-\sigma^{2}_{v}\,\Xi^{-1}(\epsilon_{p};N_{t}-m)\), and \(Q(\delta_{m})\) is a quantized symbol for the given \(m\)th layer. If the inequality in (15) is satisfied, the NC point and the remaining nodes with their child nodes are all pruned, and the CCB is not carried out. Otherwise, the quantity \(p^{\delta}_{m}\) is used in (7) to replace \(p_{\text{SD}}\) to further reduce the number of candidates. Note that the probabilistic term in \(\rho_{m}\) is precomputed before the start of the transmission.
3.3 Additional conditions for CCB

If \(\theta_{\delta_{i}}-\arccos(\psi)<0\) and \(\theta_{\delta_{i}}+\arccos(\psi)>0\), set \(-\pi\leq \theta_{k}<\pi\). If \(0\leq \theta_{k}<2\pi\) were used instead, some constellation points located in \([\pi,2\pi]\) would be eliminated erroneously.

If \(\theta_{\delta_{i}}+\arccos(\psi)>2\pi\), replace the upper bound \(\theta_{\delta_{i}}+\arccos(\psi)\) by \(\theta_{\delta_{i}}+\arccos(\psi)-2\pi\). If the upper bound of the phase is left greater than \(2\pi\), the constellation points in \([0,\pi]\) will not be included.
In Fig. 1, we present two examples that can be fixed by the conditions described above. The phase range between \(\frac{3\pi}{4}\) and \(\frac{7\pi}{8}\) does not match the above definition \(0\leq \theta_{k}<2\pi\), so the two points between \(\frac{3\pi}{2}\) and \(2\pi\) (within the red circle in Fig. 1 a) will be pruned erroneously. In Fig. 1 b, the phase of the constellation point is \(\frac{\pi}{4}\), which should be considered as a candidate based on the phase range. However, the upper bound of the phase obtained by CCB is greater than \(2\pi\), which eliminates the candidate at \(\frac{\pi}{4}\). Note that these ACs are not specified in previous works such as [11, 12], which employ an extremely large initial radius instead. For PSK modulation and 4-QAM, all constellation points are located on one ring, and the candidates can be obtained in one shot. For high-order QAM, the CCB must be performed multiple times for different concentric rings.
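The wrap-around problem that the ACs address can be illustrated with a short sketch. Instead of adjusting the interval end points case by case, this version compares the wrapped angular distance between each constellation phase and \(\theta_{\delta_{i}}\) with \(\arccos(\psi)\), which avoids the mismatch between the phase bound and the \([0,2\pi)\) phase definition. This is an alternative formulation chosen for brevity, not the paper's procedure; \(\psi\) is treated as a given input, and the function names are hypothetical.

```python
import math


def wrapped_phase_distance(a, b):
    """Smallest absolute angular difference between two phases (radians)."""
    d = (a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def ring_candidates(theta_delta, psi, ring_phases):
    """Indices of the points on one ring that fall inside the phase bound."""
    if psi > 1.0:
        return []  # bound too small to cover any point on this ring
    if psi < -1.0:
        return list(range(len(ring_phases)))  # bound covers the whole ring
    half_width = math.acos(psi)
    return [
        k
        for k, theta_k in enumerate(ring_phases)
        if wrapped_phase_distance(theta_k, theta_delta) <= half_width + 1e-12
    ]


# 8-PSK phases (2n+1)*pi/8; the bound around 0.1 rad straddles phase 0.
phases = [(2 * n + 1) * math.pi / 8 for n in range(8)]
selected = ring_candidates(0.1, math.cos(math.pi / 4), phases)
```

In this example the point at \(15\pi/8\) lies within \(\pi/4\) of the centre phase once the angle is wrapped, so it is kept; a plain interval test against phases defined on \([0,2\pi)\) would have discarded it.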
4 List soft processing-based complex sphere decoder
As we discussed above, the conventional LSD has a variable complexity. In our case, we extend our proposed CSD to the LSD with a simpler list generation. From the original idea of the LSD, a list of symbol candidates with the smallest FPMs is required in the LLR calculation in (17). Furthermore, it would be possible to construct a list with the MAP solution inside. However, the complexity of list generation will be variable and significant [11]. For a simple implementation of the LSD with a large number of candidates, the scatter list generation (SLG) is proposed.
4.1 Extrinsic LLR calculation of LSD
The LLR \(L_{e1}\) for the \(k\)th bit in the transmit symbol vector is obtained for the channel decoder. The extrinsic information \(L_{e1}\) from the LSD is fed forward to the channel decoder as its input, and the extrinsic information \(L_{e2}\) from the channel decoder is fed back to the LSD. Thus, information is exchanged iteratively between the two decoding components.
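A minimal sketch of a list-based max-log LLR computation with clipping is given below. It is not the paper's expression (17): it ignores the a priori information \(L_{e2}\) (as in a first iteration), assumes the sign convention that a positive LLR favours bit 1, and uses a hypothetical clipping level `clip` for bits whose +1 or −1 hypothesis is missing from the list.

```python
def max_log_llrs(candidates, sigma2, clip=8.0):
    """Max-log LLRs from a candidate list of (bits, fpm) pairs.

    Assumptions: bits is a tuple of 0/1 labels, fpm is the full path
    metric of that candidate, and a priori terms are omitted.
    """
    n_bits = len(candidates[0][0])
    llrs = []
    for k in range(n_bits):
        best1 = min((f for b, f in candidates if b[k] == 1), default=None)
        best0 = min((f for b, f in candidates if b[k] == 0), default=None)
        if best1 is None:
            llr = -clip  # no candidate with bit k = 1: clip towards 0
        elif best0 is None:
            llr = clip  # no candidate with bit k = 0: clip towards 1
        else:
            llr = (best0 - best1) / sigma2
            llr = max(-clip, min(clip, llr))
        llrs.append(llr)
    return llrs


# Hypothetical list of three candidates with two bits each.
demo_list = [((1, 0), 0.5), ((0, 0), 2.5), ((1, 0), 1.0)]
demo_llrs = max_log_llrs(demo_list, sigma2=1.0)
```

In the example list no candidate has the second bit equal to 1, so that bit is clipped, which is the situation a small list produces more often than a large one.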
4.2 Scatter list generation

Perform the search by the proposed CSD to obtain the branches accessed in the search, and rearrange these branches in an ascending order according to the PPMs. Start several searches with the ML ordering by traversing the spans of the subtrees of the branches until the list is filled. Note that the subtree search will be terminated once it reaches the starting point of the neighbouring subtree search.

Replace the radius p _{SD} by the largest FPM of the symbol vector in the list.

MPTP will be carried out given the new radius p _{SD}.

The sphere radius p _{SD} may be updated in (9) with the new largest FPM in the list once a candidate with a smaller FPM is found.
The search strategy described above splits the entire tree into different subtrees and searches them independently. The algorithm table is shown in Algorithm 2. Although the proposed CSD must be run several times for the scatter list generation, its complexity, measured via the number of updates in the list generation, is significantly reduced.
4.3 ML-based ordering
where the vector \(\mathbf{s}_{r}=\left[s^{1}_{\text{ML}},\ldots,s^{i-1}_{\text{ML}}, s^{i}_{r},\ldots,s^{N_{t}}_{r}\right]^{T}\), which implies that the unknown \(i-1\) transmit symbols at the low detection layers are replaced by the symbols of the ML solution, which can be used for ordering. The quantity \(\mathcal{R}\) denotes the set of available branches for the \(i\)th layer, and the quantity \(\kappa\) denotes the smallest Euclidean distance in the sorting process. The calculation of the ML-based ordering in (18) has a modest cost, as it only needs \(\vert\mathcal{R}\vert(i+1)\) multiplications for each layer. The notation \(\vert\cdot\vert\) denotes the size of the set.
5 Simulation results
In this section, we evaluate the proposed CSD in two different forms: (1) the hard-output CSD and (2) the soft-output CSD (LSD). For the hard-output CSD, the performance and complexity of several CSDs are compared via the BER and the number of visited nodes in an 8×8 MIMO system with 16-QAM and 8-PSK. An M-PSK modulation in our simulation is defined as \(\{\gamma e^{j(2n+1)\pi/M}: n=0,1,\ldots,M-1\}\). The quantity \(\gamma\) is the magnitude of the modulation scheme, and the quantity \(M\) is the size of the modulation. We consider the conventional SE-CSD, Pham-CSD [12], PTP-CSD [7], and the proposed CSD with and without the ACs for CCB, all of which are complex SE enumeration-based CSDs with \(p_{\text{SD}}=\infty\) at the beginning of the search. The PTP can be simply extended to the Pham-CSD. The energy per bit to noise power spectral density ratio (\(E_{b}/N_{0}\)) is used. The MIMO channel coefficients (\(N_{t}=N_{r}\)) are generated according to Jakes' model, and the channel noise is additive white Gaussian noise, which is independent and identically distributed for each receive antenna as stated in the previous section. The probabilistic noise constraint is set to \(\epsilon_{p}=0.2\). The threshold \(\epsilon\) for the SPS must be appropriately adjusted according to the dimensions and the modulation as stated earlier, and we set \(\epsilon=0.001\). The ISRC scheme [8] is not employed because of the difficulty of choosing the parameters for the intra radius.
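For concreteness, the M-PSK constellation used in the simulations can be generated as in the short sketch below. It reads the exponent \((2n+1)\pi/M\) as a phase, i.e., an imaginary unit in the exponent is assumed, and the function name and default magnitude are hypothetical.

```python
import cmath
import math


def mpsk_constellation(M, gamma=1.0):
    """Points gamma * exp(j*(2n+1)*pi/M) for n = 0, ..., M-1."""
    return [gamma * cmath.exp(1j * (2 * n + 1) * math.pi / M) for n in range(M)]


# 8-PSK with unit magnitude: all points lie on a single ring.
points = mpsk_constellation(8)
```

With this definition the points are offset by \(\pi/M\) from the real axis, so no constellation point has zero phase.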
6 Related work discussion
For a very large-scale integration (VLSI) implementation, the enumeration scheme becomes the bottleneck of the sphere decoding algorithm, as discussed in [14, 15, 17]. The enumeration must be efficiently implemented, so the “one-node-per-cycle” architecture is one of the promising structures for hardware implementation. Our proposed sphere decoder is essentially a variant of the method proposed in [14]. The authors of [14] proposed a new architecture which consists of two entities: (1) the metric computation unit (MCU) and (2) the metric enumeration unit (MEU). These two components operate in parallel to handle the forward search and the backward search separately. In our proposed method, the enumeration scheme, i.e., the MEU, and the MCU can work in the same way. The search always performs successive interference cancellation to obtain the nulling-cancelling points once an unvisited node is accessed. Meanwhile, the MEU can be used to find the remaining surviving nodes after the above process. Furthermore, the latency requirement of the “one-node-per-cycle” architecture will not be a problem in our case. This is because the second term in the modified probabilistic tree pruning is not calculated in real time, as described in Eq. (14). Only one additional cycle is needed to obtain the tight radius in each detection layer. The reduction in the number of visited nodes outweighs the few additional cycles. Additionally, the statistical pruning used in this paper is also precomputed, to prevent the radius obtained by the successive interference cancellation from being too large in some extreme cases. In [14], the authors modified the SE enumeration scheme to reduce the critical path. However, the modified scheme is not strictly compatible with the “one-node-per-cycle” architecture. Owing to the complexity reduction and the few additional cycles required, the modified scheme still works under the architecture. This situation is quite similar to the one we discussed above.
Furthermore, the complexity of the enumeration scheme can be further reduced by the method introduced in [15, 17]. The authors proposed two-dimensional zigzag enumeration schemes to avoid the sorting process and unnecessary partial path metric calculations, which are very computationally intensive for practical hardware implementation. In other words, the partial path metrics of the unvisited nodes are only calculated once the expansion of the tree is demanded. Hence, there is no need to visit all children of a parent node with the real Euclidean distance sorting. To the best of the authors’ knowledge, the two-dimensional zigzag enumeration scheme is the most efficient method for the complex sphere decoder.
7 Conclusion
In this paper, novel sphere decoding algorithms for MIMO detection and iterative detection and decoding have been presented. The proposed CSD, incorporating the statistical tree pruning technique and SIC, first reaches the bottom layer and eliminates the candidates for the lower layers before the search reaches them, rather than eliminating these candidates at the lower layers. The simulation results show that the proposed CSD can significantly reduce the computational effort compared to the conventional CSD. Furthermore, the proposed CSD has been naturally extended to the list SD (LSD) based on the scatter list generation. The proposed LSD makes better use of the ML solution to reorder the remaining branches. Hence, the list generation becomes simpler than that of the conventional LSD. The complexity of the proposed sphere decoding algorithms for MIMO detection is significantly reduced, and the algorithms provide an attractive trade-off between complexity and performance.
Declarations
Acknowledgements
This work is supported by the Scientific Research Foundation of CUIT (NO. KYTZ201415), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry and the Sichuan Provincial Department of Science and Technology Innovation and R&D projects in Science and Technology Support Program (NO. 2015RZ0060).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
[1] E Viterbo, J Boutros, A universal lattice code decoder for fading channels. IEEE Trans. Inf. Theory. 45(5), 1639–1642 (1999).
[2] Y Li, N Seshadri, S Ariyavisitakul, On maximum-likelihood detection and the search for the closest lattice point. IEEE Trans. Inf. Theory. 49(10), 2389–2402 (2003).
[3] CP Schnorr, M Euchner, Lattice basis reduction: improved practical algorithms and solving subset sum problems. Math. Program. 66(2), 181–191 (1994).
[4] E Agrell, T Eriksson, A Vardy, K Zeger, Closest point search in lattices. IEEE Trans. Inf. Theory. 48(8), 2201–2214 (2002).
[5] AD Murugan, HE Gamal, et al., A unified framework for tree search decoding: rediscovering the sequential decoder. IEEE Trans. Inf. Theory. 52(3), 933–953 (2006).
[6] R Gowaikar, B Hassibi, Statistical pruning for near-maximum likelihood decoding. IEEE Trans. Signal Process. 55(6), 2661–2675 (2007).
[7] B Shim, I Kang, Sphere decoding with a probabilistic tree pruning. IEEE Trans. Signal Process. 56(10), 4867–4878 (2008).
[8] MX Chang, On further reduction of complexity in tree pruning based sphere search. IEEE Trans. Commun. 58(2), 471–422 (2010).
[9] GJ Foschini, Layered space-time architecture for wireless communication in a fading environment when using multiple antennas. Bell Labs Tech. J. 1(2), 41–59 (1996).
[10] RC de Lamare, R Sampaio-Neto, Minimum mean square error iterative successive parallel arbitrated decision feedback detectors for DS-CDMA systems. IEEE Trans. Commun. 56(5), 778–789 (2008).
[11] B Hochwald, S Ten Brink, Achieving near-capacity on a multiple-antenna channel. IEEE Trans. Commun. 51(3), 389–399 (2003).
[12] D Pham, KR Pattipati, et al., An improved complex sphere decoder for V-BLAST systems. IEEE Signal Process. Lett. 11(9), 748–751 (2004).
[13] KC Lai, LW Lin, Low-complexity adaptive tree search algorithm for MIMO detection. IEEE Trans. Wireless. Commun. 8(7), 3716–3726 (2009).
[14] A Burg, M Borgmann, et al., VLSI implementation of MIMO detection using the sphere decoding algorithm. IEEE J. Solid-State Circuits. 40(7), 1566–1577 (2005).
[15] M Shabany, K Su, P Gulak, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP). A pipelined scalable high throughput implementation of a near-ML K-Best complex lattice decoder, (2008), pp. 3173–3176, doi:10.1109/ICASSP.2008.4518324.
[16] M Barrenechea, M Mendicute, et al., in Proc. 19th European Signal Process. Conf. (EUSIPCO). Implementation of complex enumeration for multiuser MIMO vector precoding, (2011), pp. 739–743.
[17] K Nikitopoulos, J Zhou, B Congdon, et al., in Proc. 2014 ACM Conf. on SIGCOMM. Geosphere: Consistently turning MIMO capacity into throughput, pp. 631–642, doi:10.1145/2619239.2626301.
[18] K Nikitopoulos, A Karachalios, D Reisis, Exact Max-Log MAP Soft-Output Sphere Decoding via Approximate Schnorr-Euchner Enumeration. IEEE Trans. Veh. Technol. 64(6), 2749–2753 (2015).
[19] SA Laraway, B Farhang-Boroujeny, Implementation of a Markov Chain Monte Carlo based multiuser/MIMO detection. IEEE Trans. Circuits Syst. I. 56(1), 246–255 (2009).
[20] E Zimmermann, DL Milliner, et al., in Proc. IEEE GLOBECOM 2008. Optimal LLR clipping levels for mixed hard/soft output detection, (2008), pp. 1–5, doi:10.1109/GLOCOM.2008.ECP.222.
[21] RH Gohary, TJ Willink, On LLR clipping in BICM-ID non-coherent MIMO communications. IEEE Commun. Lett. 15(6), 650–652 (2011).
[22] C Studer, Iterative MIMO decoding: algorithms and VLSI implementation aspects (Ph.D. dissertation, Hartung-Gorre Verlag Konstanz, 2009).
[23] T Cui, C Tellambura, An efficient generalized sphere decoder for rank-deficient MIMO systems. IEEE Commun. Lett. 9(5), 423–425 (2005).
[24] P Wang, T Le-Ngoc, A low-complexity generalized sphere decoding approach for underdetermined linear communication systems: performance and complexity evaluation. IEEE Trans. Commun. 57(11), 3376–3388 (2009).
[25] B Hassibi, H Vikalo, On the sphere-decoding algorithm I. Expected complexity. IEEE Trans. Signal Process. 53(8), 2806–2818 (2005).
[26] S Kay, Fundamentals of Statistical Signal Processing: Estimation Theory (A Simon & Schuster Company, Upper Saddle River, New Jersey, 1993).
[27] DW Waters, JR Barry, The chase family of detection algorithms for multiple-input multiple-output channels. IEEE Trans. Signal Process. 56(2), 739–747 (2008).