This section first reviews the optimal MAP detection algorithm. It then discusses several sub-optimal detection algorithms, which can be divided into two main families, namely tree-search-based detection and interference cancellation-based detection. Tree-search-based detection generally falls into two main categories, namely depth-first search and breadth-first search. The classical sphere decoding is a depth-first approach, while the *K*-Best decoding and fixed sphere decoding are commonly seen as breadth-first approaches. We then present interference cancellation-based detection, which performs MMSE filtering in combination with soft symbol-aided interference cancellation. The interference cancellation can be carried out either in a successive way or in a parallel way. Finally, the LC-*K*-Best decoder is introduced [40].

### Maximum *a posteriori* probability (MAP) detection

The MAP algorithm uses an exhaustive search over all 2^{Q·M} possible symbol combinations to compute the exact *a posteriori* probability of each bit. Such probability is usually expressed in terms of a log-likelihood ratio (LLR). The sign of the LLR determines the binary decision about the corresponding bit, while its magnitude indicates the reliability of the decision. More concretely, the LLR of the *b*^{th} bit of the *i*^{th} symbol, *x*_{i,b}, can be computed as:

$$ L\left(x_{i,b}\right) = \log \frac{P(x_{i,b}=+1|\textbf{y})}{P(x_{i,b}=-1|\textbf{y})} = \log \frac{\sum_{s\in \chi_{i,b}^{+1}}p(\textbf{y}|\textbf{s})P(\textbf{s})}{\sum_{\textbf{s}\in \chi_{i,b}^{-1}}p(\textbf{y}|\textbf{s})P(\textbf{s})}, $$

((3))

where \(\chi _{i,b}^{+1}\) and \(\chi _{i,b}^{-1}\) denote the sets of symbol vectors having the *b*^{th} bit of the *i*^{th} symbol equal to +1 and −1 (representing a logical 1 and a logical 0), respectively. *p*(**y**|**s**) is the conditional probability density function given by:

$$ p(\textbf{y}|\textbf{s}) = \frac{1}{{\left(\pi N_{0}\right)}^{N}}\exp\left(-\frac{1}{N_{0}}\left\|\textbf{y}-\textbf{H}\textbf{s}\right\|^{2}\right), $$

((4))

and *P*(**s**) represents the *a priori* information provided by the channel decoder in the form of *a priori* LLRs:

$$ \begin{aligned} L_{A}(x_{i,b})& = \log \left(\frac{P\left(x_{i,b}=+1\right)}{P\left(x_{i,b}=-1\right)}\right), \quad \forall i,b\\ P(\textbf{s}) & = \prod\limits_{i=1}^{M}P\left(s_{i}\right)= \prod\limits_{i=1}^{M}\prod\limits_{b=1}^{Q}P\left(x_{i,b}\right). \end{aligned} $$

((5))

To reduce the computational complexity, LLR values can be calculated using the Max-Log-MAP approximation [22]:

$$ L \left(x_{i,b}\right) \approx \frac{1}{N_{0}}\min\limits_{\chi_{i,b}^{-1}} \left\{d_{1}\right\} -\frac{1}{N_{0}}\min\limits_{\chi_{i,b}^{+1}} \left\{d_{1}\right\}, $$

((6))

where

$$ \begin{aligned} d_{1} & = \left\|\textbf{y}-\textbf{H}\textbf{s}\right\|^{2} -N_{0}\log P(\textbf{s}) \\ & = \left\|\textbf{y}-\textbf{H}\textbf{s}\right\|^{2} -N_{0}\sum\limits_{i=1}^{M}\sum\limits_{b=1}^{Q}\log P(x_{i,b}), \end{aligned} $$

((7))

combines the Euclidean distance between the received vector **y** and the lattice point **Hs** with the *a priori* term.

Based on the *a posteriori* LLRs *L*(*x*_{i,b}) and the *a priori* LLRs *L*_{A}(*x*_{i,b}), the detector computes the extrinsic LLRs as *L*_{E}(*x*_{i,b}) = *L*(*x*_{i,b}) − *L*_{A}(*x*_{i,b}).

The exact computation of LLRs using MAP detection can only be used with low-order modulations and a small number of antennas [43] because its complexity increases exponentially with the number of transmit antennas and the modulation order. For example, in the case of a 2×2 MIMO system with 4-QAM, 2^{2×2}=16 possible solutions need to be searched. However, in the case of a 4×4 MIMO system with 16-QAM, there are 2^{4×4}=65,536 possible solutions. A number of MIMO detectors with reduced complexity have therefore been proposed, as will be discussed in the following.
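
As a concrete illustration of this cost, the Max-Log-MAP LLRs of Equation 6 can be computed by brute-force enumeration for a toy system. The following sketch assumes uniform priors (so the *a priori* term of *d*_{1} vanishes); the function and argument names are our own, not from the cited works:

```python
import itertools
import numpy as np

def max_log_map_llrs(y, H, N0, constellation, bit_map):
    """Brute-force Max-Log-MAP LLRs of Eq. 6 under uniform priors.
    `constellation` lists the symbol values; `bit_map[j]` gives the
    +1/-1 bit labels of constellation[j]. Illustrative sketch only."""
    M = H.shape[1]
    Q = len(bit_map[0])
    best = {}                                  # (i, b, bit) -> min d1
    for idx in itertools.product(range(len(constellation)), repeat=M):
        s = np.array([constellation[j] for j in idx])
        d1 = np.linalg.norm(y - H @ s) ** 2    # Euclidean distance only
        for i in range(M):
            for b in range(Q):
                key = (i, b, bit_map[idx[i]][b])
                best[key] = min(best.get(key, np.inf), d1)
    L = np.zeros((M, Q))
    for i in range(M):
        for b in range(Q):
            # Eq. 6: min over the -1 set minus min over the +1 set
            L[i, b] = (best[(i, b, -1)] - best[(i, b, +1)]) / N0
    return L
```

The outer loop visits all 2^{Q·M} hypotheses, which is exactly the exponential cost that the sub-optimal detectors below avoid.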

### Tree-search-based detection

#### List sphere decoder (LSD)

The sphere decoder transforms the symbol detection problem into a lattice search problem [9,11,12], which can be represented as a search on a tree. Using the QR decomposition, the channel matrix **H** can be factored into the product of two matrices **Q** and **R** (**H**=**QR**), where **Q** is an *N*×*M* unitary matrix (**Q**^{H}**Q**=**I**_{M}) and **R** is an *M*×*M* upper triangular matrix with real positive entries on its diagonal. With the modified received symbol vector \(\tilde {\textbf {y}} = \textbf {Q}^{H}\textbf {y}\), the distance in Equation 7 can be computed as \(\left \|\textbf {y}-\textbf {H}\textbf {s} \right \|^{2} =\left \|\tilde {\textbf {y}}-\textbf {Rs}\right \|^{2}\). Exploiting the triangular structure of **R**, the Euclidean distance metric *d*_{1} can be evaluated recursively through the accumulated partial Euclidean distance *d*_{i}, with *d*_{M+1}=0, as follows [25]:

$$ \begin{aligned} d_{i} &= d_{i+1} + \underbrace{\left|\tilde y_{i}-\sum\limits_{j=i}^{M}R_{i,j} s_{j}\right|^{2}}_{{m^{C}_{i}}}\\ &\quad+\underbrace{\frac{N_{0}}{2}\sum\limits_{b=1}^{Q}\left(\left|L_{A}\left(x_{i,b}\right)\right|-x_{i,b}L_{A}\left(x_{i,b}\right)\right)}_{{m^{A}_{i}}}, \quad i = M,..., 1. \end{aligned} $$

((8))

where \({m^{C}_{i}}\) and \({m^{A}_{i}}\) denote the channel-based partial metric and the *a priori*-based partial metric at the *i*^{th} level, respectively.
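
The recursion of Equation 8 can be checked numerically for a single candidate. The sketch below accumulates only the channel metric \({m^{C}_{i}}\) (the *a priori* metric \({m^{A}_{i}}\) is omitted, i.e. uniform priors); names are illustrative:

```python
import numpy as np

def partial_distances(y_tilde, R, s):
    """Accumulated partial Euclidean distances [d_M, ..., d_1] of Eq. 8
    for one candidate s, with the a priori metric omitted."""
    y_tilde, s = np.asarray(y_tilde), np.asarray(s)
    M = len(s)
    d, peds = 0.0, []
    for i in range(M - 1, -1, -1):             # level M down to level 1
        m_C = abs(y_tilde[i] - R[i, i:] @ s[i:]) ** 2   # channel metric
        d += m_C
        peds.append(d)
    return peds
```

By construction, the last element (the accumulated *d*_{1}) equals the full distance \(\|\tilde{\textbf{y}}-\textbf{Rs}\|^{2}\).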

The sphere decoder performs a depth-first search in both forward and backward directions. A pruning criterion can be used to reduce the number of visited nodes; for example, a sphere radius can be set to limit the search range. The tree has *M*+1 levels, where level *l* corresponds to the *l*^{th} transmit antenna. The tree search starts at the root and visits the node at level *M*, corresponding to the symbol transmitted by the *M*^{th} antenna, whose partial Euclidean distance *d*_{M} in Equation 8 is computed. If *d*_{M} satisfies the sphere radius constraint, the search continues at level *M*−1 and steps down the tree until a valid leaf node is found at level 1. The search then back-tracks to previous levels to find better candidates. Figure 2a illustrates the tree search in the case of *M*=2. In the hard-output sphere decoder, the candidate with the minimum Euclidean distance is chosen as an approximation of the ML solution. In the list sphere decoder [22], by contrast, a list \(\mathcal{L}\) of the most promising candidates and their corresponding Euclidean distances is used in the computation of the LLR values:

$$ L \left(x_{i,b}\right) = \frac{1}{N_{0}}\min\limits_{\mathcal{L}\cap \chi_{i,b}^{-1}}\left\{d_{1}\right\}-\frac{1}{N_{0}}\min\limits_{\mathcal{L}\cap \chi_{i,b}^{+1}}\left\{d_{1}\right\}. $$

((9))

Although the list sphere decoder is able to approach the theoretical channel capacity, the proximity to capacity depends on the list size. The list should be large enough to include at least one candidate for both possible hypotheses of every bit, but an excessively large list leads to high computational complexity. Conversely, a list that is too small causes inaccurate LLR approximations because some counter hypotheses are missing: no entry can be found in the list for a particular bit *x*_{i,b} = +1 or −1. The usual remedy is to set such LLRs to a predefined maximum value [22,27]. Moreover, two methods have been used to manage the list in the iterative receiver. The first generates the list during the first iteration and reuses it in subsequent iterations to update the soft information [22]. The second updates the list at each iteration, which further improves performance at the cost of additional computational complexity [27]. Additionally, several techniques can further reduce the complexity of tree-search algorithms. The Schnorr-Euchner (SE) enumeration [10], proposed as a refinement of the Fincke-Pohst (FP) enumeration, extends the nodes in ascending order of their Euclidean distance metrics, which reduces the average complexity. Layer ordering selects the most reliable symbols at the higher layers using the sorted QR (SQR) decomposition [44]; detecting reliable symbols first helps find the ML solution faster. MMSE pre-processing can also be used for further complexity reduction by applying the SQR decomposition to an extended channel matrix [45]. However, this method introduces a biasing factor in the metrics that should be removed in the LLR calculation to avoid performance degradation [46].
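
The LLR saturation used as a remedy for missing counter hypotheses is a one-line operation. A minimal sketch, where the clipping level `llr_max` is an assumed value rather than one prescribed by [22,27]:

```python
def clip_llr(llr, llr_max=8.0):
    """Saturate an LLR at +/- llr_max, as done when the counter
    hypothesis of a bit is missing from the candidate list.
    llr_max is an assumed clipping level."""
    return max(-llr_max, min(llr, llr_max))
```

The same clipping also bounds the dynamic range of extrinsic LLRs in fixed-point implementations.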

#### Single tree-search sphere decoder (STS-SD)

One of the two minima in Equation 6 corresponds to the MAP hypothesis **s**^{MAP}, while the other corresponds to the counter hypothesis. The LLR can then be computed as:

$$\begin{array}{*{20}l} L\left(x_{i,b}\right) &=\left\{ \begin{array}{ll} \frac{1}{N_{0}}\left(d_{i,b}^{\overline{\text{MAP}}} - d^{\text{MAP}}\right),& \text{if} \,\,\,x_{i,b}^{\text{MAP}} = +1\\ \frac{1}{N_{0}}\left(d^{\text{MAP}} - d_{i,b}^{\overline{\text{MAP}}}\right), & \text{if} \,\,\,x_{i,b}^{\text{MAP}} = -1. \end{array} \right.\\ d^{\text{MAP}} & = \left\|\tilde{\textbf{y}}-\textbf{R}\textbf{s}^{\text{MAP}}\right\|^{2} - N_{0}\log P\left(\textbf{s}^{\text{MAP}}\right), \end{array} $$

((10))

$$\begin{array}{*{20}l} d_{i,b}^{\overline {\text{MAP}}} & = \min\limits_{s \in \chi_{i,b}^{\overline {\text{MAP}}}} \left\{\left\|\tilde{\textbf{y}}-\textbf{R}\textbf{s}\right\|^{2} - N_{0}\log P\left(\textbf{s}\right)\right\}, \end{array} $$

((11))

$$\begin{array}{*{20}l} \textbf{s}^{\text{MAP}} & = \arg \min\limits_{s \in 2^{Q \cdot M}} \left\{\left \|\tilde{\textbf{y}}-\textbf{R}\textbf{s}\right\|^{2} - N_{0}\log P\left(\textbf{s}\right)\right\}, \end{array} $$

((12))

where \(\chi _{i,b}^{\overline {\text {MAP}}}\) denotes the set of bit-wise counter hypotheses of the MAP hypothesis, obtained by searching over all solutions whose *b*^{th} bit of the *i*^{th} symbol is opposite to that of the current MAP hypothesis. Originally, the MAP hypothesis and the counter hypotheses were found by repeating the tree search [47], which incurs a large computational cost. To overcome this, the single tree-search algorithm [24,25] was developed to compute all the LLRs concurrently: the *d*^{MAP} metric and the corresponding \(d_{i,b}^{\overline {\text {MAP}}}\) metrics are updated through one tree search. The basic idea of STS-SD is to search the sub-tree originating from a given node only if its Euclidean distance could lead to an update of either *d*^{MAP} or at least one of the \(d_{i,b}^{\overline {\text {MAP}}}\). Through the use of an extrinsic LLR clipping method, the STS-SD algorithm can be tuned between MAP performance and hard-output performance. Channel matrix regularization and a run-time constraint may also be used in STS-SD to reduce the decoding complexity at the price of performance degradation. Implementations of STS-SD have been reported in [31,32].

#### K-Best decoder

The *K*-Best decoder is a breadth-first search-based algorithm. Starting from the root node at level *M*+1 with *d*_{M+1}=0, the *K*-Best decoder expands each of the *K* surviving paths to all possible children nodes in the constellation and computes their corresponding partial Euclidean distances. It then sorts all \(K\sqrt {2^{Q}}\) distances and keeps only the *K* nodes with the minimum Euclidean distances, repeating this until the leaf nodes are reached, as illustrated in Figure 2b. The candidate with the minimum Euclidean distance is chosen as an approximation of the ML solution, whereas in an iterative receiver a list of the most likely candidates is retained. We note that the candidate list does not necessarily contain the lowest Euclidean distances.

The major drawbacks of the *K*-Best decoder are the expansion and sorting operations, which are very time consuming. Several proposals have been made in the literature to approximate the sorting operations, such as relaxed sorting [48], local sorting and merging, and distributed sorting [49], or even to avoid sorting altogether using an on-demand expansion scheme [50]. Moreover, the *K*-Best decoder suffers, like the LSD, from the missing counter-hypothesis problem due to the limited list size. Numerous approaches have been proposed to address this problem, such as smart candidate adding [51], bit flipping [52], path augmentation, and LLR clipping [22,27].

#### Fixed sphere decoder (FSD)

The fixed sphere decoder is a breadth-first search algorithm proposed to further reduce the complexity of the *K*-Best decoder. It performs a two-stage tree search: a full search in the first *T* levels, expanding all branches per node, followed by a single search in the remaining *M*−*T* levels, expanding only one branch per node. The parameter *T* is chosen such that (*N*−*M*)(*T*+1)+(*T*+1)^{2}>*N* in order to provide asymptotically ML performance. We note that in the FSD, the columns of **H** are ordered such that in the first *T* levels the signal has the largest post-processing noise amplification. In the soft-output FSD proposed in [30], the search finds not only the ML solution but also a set of candidates around it in order to compute the LLRs of all bits. A subset *S* is first chosen; the ML solution of this subset is then used to generate a subset *S*^{′}. The combined list *S*∪*S*^{′} is finally used to compute an approximation of the extrinsic LLRs. Efficient SISO FSD implementations have been proposed in [35,36].

### Interference cancellation (IC)-based detection

Interference cancellation-based detection is commonly used in combination with MMSE linear filtering. In a MIMO iterative receiver, the MIMO equalizer and the channel decoder exchange soft information according to the turbo equalization principle [4,6]. The MIMO equalizer produces an equalized symbol vector \(\tilde {\textbf {s}}\) from the received signal **y**. The soft estimated symbol vector \(\hat {\textbf {s}}\) is used to cancel the interference terms in the received signal. The interference cancellation can be carried out either successively as in VBLAST [8] or in parallel as in MMSE-IC [20,21].

#### Minimum mean square error-interference cancellation (MMSE-IC) equalizer

The MMSE-IC equalizer can be implemented using two filters [20]. The first filter **p**_{k} is applied to the received vector **y**, and the second filter **q**_{k} is applied to the estimated vector \(\hat {\textbf {s}}_{k}\), as shown in Figure 3. The equalized symbol \(\tilde {s}_{k}\) can be written as:

$$ \tilde{s}_{k} = \textbf{p}^{H}_{k} \textbf{y} - \textbf{q}^{H}_{k}\hat{\textbf{s}}_{k} \quad \text{with} \quad k\in\left[1,M\right], $$

((13))

where \(\hat {\textbf {s}}_{k}\) denotes the estimated vector given by the previous iteration with the *k*^{th} symbol omitted: \(\hat {\textbf {s}}_{k} = \left [\hat {s}_{1}\;...\; \hat {s}_{k-1} \quad 0 \quad \hat {s}_{k+1}\;...\; \hat {s}_{M}\right ]\). The soft symbol \(\hat {s}_{k}\) is calculated by the soft mapper as \(\hat {s}_{k} = \mathbb {E}\left [s_{k}\right ] = \sum _{s \in 2^{Q}} sP\left (s_{k}=s\right)\) [53].

The filters **p**_{k} and **q**_{k} are optimized under the MMSE criterion:

$$ \left(\textbf{p}^{opt}_{k},\textbf{q}^{opt}_{k}\right) = \arg \min_{\textbf{p}_{k},\textbf{q}_{k}} \mathbb{E}\left\{\left|s_{k}-\tilde{s}_{k}\right|^{2}\right\}, $$

((14))

and can be computed by [4]:

$$ \textbf{p}^{opt}_{k} = {\sigma_{s}^{2}} \left[\textbf{H}\textbf{V}_{k}\textbf{H}^{H} + N_{0}\textbf{I}_{N}\right]^{-1}\textbf{h}_{k}, $$

((15a))

$$ \textbf{q}^{opt}_{k} = \textbf{H}^{H}\textbf{p}^{opt}_{k}, $$

((15b))

where \({\sigma _{s}^{2}}\) is the power of the transmitted signal, **h**_{k} denotes the *k*^{th} column of the channel matrix **H**, and **V**_{k} is a diagonal matrix that depends on the residual error of each estimated symbol:

$$ \textbf{V}_{k} = {\sigma_{s}^{2}}e_{k}{e_{k}^{T}} + \sum\limits_{i=1,i\neq k}^{M}{\nu_{i}^{2}}e_{i}{e_{i}^{T}} $$

((16))

with \({\nu _{k}^{2}}\) defined as:

$$ {\nu_{k}^{2}} = \mathbb{E}\left\{\left|s_{k}-\hat{s}_{k}\right|^{2}|L_{A}\right\}, $$

((17a))

$$ {\nu_{k}^{2}} = \sum\limits_{s \in 2^{Q}} \left|s\right|^{2}P\left(\hat{s}_{k} = s\right)- \left|\hat{s}_{k}\right|^{2}. $$

((17b))

At the first iteration, since no *a priori* information is available, the equalization process reduces to the classical MMSE solution:

$$ \tilde{\textbf{s}} = \left[\textbf{H}^{H}\textbf{H} + \frac{{\sigma_{n}^{2}}}{{\sigma_{s}^{2}}}\textbf{I}_{M}\right]^{-1}\textbf{H}^{H}\textbf{y}. $$

((18))
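
This first-iteration solution is a single regularized least-squares filter. A minimal sketch of Equation 18, where `N0` stands for the noise variance \({\sigma _{n}^{2}}\):

```python
import numpy as np

def mmse_equalize(y, H, N0, sigma_s2=1.0):
    """Classical MMSE estimate of Eq. 18, used at the first iteration
    when no a priori information is available (sketch)."""
    M = H.shape[1]
    G = np.linalg.inv(H.conj().T @ H + (N0 / sigma_s2) * np.eye(M))
    return G @ (H.conj().T @ y)
```

An explicit inverse is used here for clarity; a practical implementation would solve the linear system instead (e.g. via a Cholesky factorization).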

Each equalized symbol \(\tilde {s}_{k}\) is associated with a bias factor *β*_{k} in addition to some residual noise plus interference *η*_{k}:

$$ \tilde{s}_{k}= \beta_{k}s_{k}+\eta_{k} $$

((19))

These equalized symbols are then used by the soft demapper to compute the LLR values using the Max-Log-MAP approximation [53]:

$$ L\left(x_{i,b}\right) = \frac{1}{\sigma_{\eta_{k}}^{2}}\left(\min_{s \in \chi_{i,b}^{-1}}\left|\tilde{s}_{k}-\beta_{k}s\right|^{2} - \min_{s \in \chi_{i,b}^{+1}}\left|\tilde{s}_{k}-\beta_{k}s\right|^{2}\right). $$

((20))

The MMSE-IC equalizer requires *M* matrix inversions for each symbol vector. For this reason, several approximations of MMSE-IC have been proposed.

The first approximation of MMSE-IC consists of replacing the variable \({\nu _{k}^{2}}\) by its mean \(\nu ^{2} = \mathbb{E}\left ({\nu _{k}^{2}}\right)={\sigma _{s}^{2}} -\sigma _{\hat {s}}^{2}\) [4]. Hence, a single matrix inversion is computed for all symbols. This approximation is denoted MMSE-IC1. The MMSE-IC1 algorithm significantly reduces the complexity of computing the filter coefficients; however, the equalizer coefficients must still be recomputed at each iteration.

A second approximation, denoted MMSE-IC2 [20], assumes perfect estimation of the transmitted symbols (\(\sigma _{\hat {s}}^{2} ={\sigma _{s}^{2}}\)) to avoid the matrix inversion at each iteration.

In [21], a low-complexity approach to MMSE-IC is described that performs a single matrix inversion without performance loss. We refer to this algorithm as LC-MMSE-IC.

#### Successive interference cancellation (SIC) equalizer

The SIC-based detector was initially used in VBLAST systems. In the VBLAST architecture [8], a successive cancellation step and an interference nulling step are used to detect the transmitted symbols. However, this method suffers from error propagation. Several methods have been proposed to mitigate this problem by taking decision errors into account [19,54]. An improved VBLAST for iterative detection and decoding is described in [54]. At the first iteration, an enhanced VBLAST that takes decision errors into account is employed. Once *a priori* LLRs are available from the channel decoder, soft symbols are computed by a soft mapper and used in the interference cancellation. To describe the enhanced VBLAST algorithm, we assume that the detection order follows the optimal detection order [8]. We define \(\hat {\textbf {s}}_{k-1}\) as \(\left [\hat {s}_{1} \quad \hat {s}_{2} \quad... \quad \hat {s}_{k-1}\right ]\), and **H**_{i:j} as [**h**_{i} **h**_{i+1}... **h**_{j}], where **h**_{i} denotes the *i*^{th} column of **H**. At step *k*, the pre-detected symbol vector \(\hat {\textbf {s}}_{k-1}\) up to step *k*−1 is cancelled from the received signal:

$$ \textbf{y}_{k} = \textbf{y} - \textbf{H}_{1:k-1}\hat{\textbf{s}}_{k-1}. $$

((21))

In the conventional VBLAST algorithm, the hard estimated symbol vector is used in the cancellation step, and MMSE filtering is then applied in the nulling step. The enhanced VBLAST algorithm instead uses the soft estimated symbol vector \(\hat {\textbf {s}}_{k-1}\) and a nulling matrix **W**_{k} based on the MMSE criterion that takes decision errors into account. **W**_{k} can be expressed as [19,54]:

$$ \textbf{W}_{k} = {\sigma_{s}^{2}}\left(\textbf{H}\Sigma_{k}\textbf{H}^{H}+N_{0}I_{N}\right)^{-1}\textbf{h}_{k}, $$

((22))

where *Σ*_{k} is the decision error covariance matrix defined as:

$$ \Sigma_{k} = \sum\limits_{i=1}^{k-1}{\epsilon_{i}^{2}}e_{i}{e_{i}^{T}} + \sum\limits_{i=k}^{M} {\sigma_{s}^{2}}e_{i}{e_{i}^{T}}, \qquad {\epsilon_{i}^{2}} = \mathbb{E}\left\{\left|s_{i}-\hat{s}_{i}\right|^{2}\,|\,\hat{\textbf{s}}_{i-1}\right\}. $$

((23))

The estimated symbol \(\tilde {s}_{k}\) can be expressed as:

$$ \tilde{s}_{k} = \textbf{W}_{k}^{H} \textbf{y}_{k} = \beta_{k}s_{k}+\eta_{k}. $$

((24))

A soft demapper is then used to compute the LLRs as in Equation 20. We refer to this algorithm as improved VBLAST (I-VBLAST) in the following.
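
The cancel-null-decide loop of Equations 21 to 24 can be sketched as follows. This toy version simplifies the error-aware covariance of Equation 23 to \(\Sigma_{k} = {\sigma_{s}^{2}}\textbf{I}\), uses hard decisions instead of soft symbols, and assumes a fixed natural detection order, so it is a plain SIC sketch rather than the full I-VBLAST:

```python
import numpy as np

def sic_detect(y, H, N0, constellation, sigma_s2=1.0):
    """Successive interference cancellation sketch (Eqs. 21-24) with a
    plain MMSE nulling filter and hard decisions -- a simplification of
    the enhanced VBLAST described above."""
    N, M = H.shape
    y_k = y.astype(complex).copy()
    s_hat = np.zeros(M, dtype=complex)
    for k in range(M):                        # fixed natural order
        # nulling filter, Eq. 22 with Sigma_k ~ sigma_s^2 * I (assumed)
        W = sigma_s2 * np.linalg.inv(H @ H.conj().T * sigma_s2
                                     + N0 * np.eye(N)) @ H[:, k]
        s_tilde = W.conj() @ y_k              # nulling step (Eq. 24)
        # hard decision: nearest constellation point
        s_hat[k] = min(constellation, key=lambda c: abs(s_tilde - c))
        y_k = y_k - H[:, k] * s_hat[k]        # cancellation step (Eq. 21)
    return s_hat
```

Replacing the hard decision with the soft mapper output and \(\Sigma_{k}\) with Equation 23 turns this sketch into the error-aware I-VBLAST recursion.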

### Low-complexity K-Best (LC-K-Best) decoder

The classical *K*-Best decoder computes \(K\sqrt {2^{Q}}\) Euclidean distances and then performs a sorting operation to choose the *K* best candidates, as illustrated in Figure 5 with an example for *K*=4. The LC-*K*-Best decoder, recently proposed in [40], introduces two improvements over the classical *K*-Best decoder to lower the complexity and latency.

*Simplified hybrid enumeration.* The first improvement simplifies the hybrid enumeration of the constellation points in the real-valued system model when the *a priori* information is incorporated into the tree search, using two look-up tables (LUTs). Hybrid enumeration was initially proposed in [55] for a soft-input sphere decoder in the complex system model. It separates the partial metric into two metrics: the channel metric \({m^{C}_{i}}\) and the *a priori* metric \({m^{A}_{i}}\). To simplify the enumeration, two LUTs are considered: one for the channel metric \({m^{C}_{i}}\) and one for the *a priori* metric \({m^{A}_{i}}\). The joint enumeration is then approximated by treating these two metrics independently. Figure 4 illustrates an example of the enumeration strategy. First, the constellation points are enumerated according to *m*^{C} and *m*^{A} and stored in the LUTs. The smallest Euclidean distances in *m*^{C} and *m*^{A} (S2 and S3) are then compared, and the one with the minimum distance (S2 in *m*^{C}) is chosen as the first point. Next, the first point in *m*^{A} (S3) is compared to the next point in *m*^{C} (S1); since S3 has a lower distance, it is taken as the second point, and so on.
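
This head-to-head comparison of two sorted lists is essentially a merge. The following idealized sketch reproduces the S2/S3/S1 example above; each LUT holds (metric, symbol) pairs already sorted by metric, and a symbol is emitted only once:

```python
def hybrid_enumerate(lut_channel, lut_prior):
    """Approximate joint enumeration from two independently sorted LUTs
    (channel metric m^C and a priori metric m^A): repeatedly emit the
    head with the smaller metric, skipping already-emitted symbols.
    An idealized sketch of the strategy described above."""
    order, i, j, seen = [], 0, 0, set()
    while i < len(lut_channel) or j < len(lut_prior):
        take_c = j >= len(lut_prior) or (
            i < len(lut_channel) and lut_channel[i][0] <= lut_prior[j][0])
        metric, sym = lut_channel[i] if take_c else lut_prior[j]
        i, j = (i + 1, j) if take_c else (i, j + 1)
        if sym not in seen:                   # each symbol emitted once
            seen.add(sym)
            order.append(sym)
    return order
```

Because the two metrics are compared but never summed, the resulting order only approximates the true joint-metric order, which is exactly the trade-off the simplified hybrid enumeration accepts.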

*Relaxed on-demand expansion.* The second improvement is a relaxed on-demand expansion that reduces the need for exhaustive expansion and sorting operations. On-demand expansion was proposed in [50] for a hard-output decoder: the first child of each parent node is expanded, the minimum among these children is selected, and the corresponding surviving path then expands its next child. In our approach, a portion *A* of the first children is chosen at once; the corresponding parents then expand their next children, and this operation is repeated until the *K* best nodes are obtained. The number of first children *A* is chosen so as to allow a parent node to extend all its possible children nodes, depending on the constellation and on the total number *K* of retained solutions. Figure 5 shows an example with *K*=4 and *A*=2. All parent nodes at the first level expand their first children, and the two children with the smallest Euclidean distances (nodes 1 and 7) are retained. The corresponding parent nodes (P1 and P2) then expand their next children (nodes 3 and 8). The distances are compared, and the two nodes (3 and 10) with the lowest distances are retained, yielding the 4 best candidates.

It has been shown in [40] that the LC-*K*-Best decoder achieves almost the same performance as the classical *K*-Best decoder for different modulations, while the computational complexity in terms of the number of visited nodes is significantly reduced, especially for high-order modulations.