
Soft information acceleration aided subspace suppression MIMO detection


In this paper, a multiple-input multiple-output (MIMO) detection structure called soft information acceleration (SIA) is proposed, which compresses the two-stage subspace marginalization with interference suppression (SUMIS) detector into a single stage. The proposed one-stage method outperforms conventional two-stage SUMIS when the subspace size is sufficiently large. This advantage comes mainly from the fact that, in SUMIS-SIA, the soft information of each symbol is updated on average as many times as the subspace size, rather than only once as in two-stage SUMIS. SUMIS-SIA therefore achieves a better trade-off between performance and complexity. To further reduce complexity, a channel-shortening method based on subspace suppression is proposed. Simulation results show that the resulting channel-shortened one-stage method, which also benefits from SIA, outperforms SUMIS as well.

1 Introduction

Fifth-generation (5G) mobile communication scenarios demand higher data transmission rates. Multiple-input multiple-output (MIMO) [1,2,3,4,5] technology is a powerful means of meeting this demand. MIMO systems enhance spectral efficiency through spatial multiplexing and use MIMO detection at the receiver to recover the symbols superimposed in the spatial domain. However, scaling up MIMO challenges base station receivers in the uplink. Linear detection algorithms [6], such as zero-forcing (ZF) and minimum mean square error (MMSE), keep complexity low but sacrifice performance. Nonlinear detection, in contrast, achieves better performance, but the number of symbol combinations it traverses grows exponentially. Because receivers still favor nonlinear detection for its performance, many nonlinear schemes approximating maximum likelihood (ML) detection have been proposed to reduce complexity, such as K-best decoding [7] and likelihood ascent search based on K symbols (K-LAS) [8]. Both are successful detection algorithms that offer a good trade-off between performance and complexity, approaching ML performance [9] with adjustable complexity.

Subspace suppression is another technique for trading off performance against complexity, as in the two-stage detector ‘subspace marginalization with interference suppression’ (SUMIS) [10]. SUMIS constructs an \(n_s\)-dimensional subspace for each symbol, and each subspace outputs soft information for that symbol only. The quality of each symbol estimate is determined by the subspace partitioning and by the precision of soft cancellation. Both stages of SUMIS are well suited to parallel processing [11]. However, because different symbols are coupled, the two stages of SUMIS can be compressed into one through serial processing of symbols, subspace sorting, and result updating. Another approach to this trade-off is the Ungerboeck observation model (UOM) with finite memory length. Previously, the UOM was used in tree-based approaches. Since Rusek proposed parameter optimization and channel shortening, the memory length of the UOM can be shortened on an information-theoretic basis [12], so that detection can run on a trellis.

The proposed soft information acceleration (SIA) structure improves SUMIS by introducing serial processing. It obtains soft information for all \(n_s\) symbols within a subspace directly, by classifying the traversed candidates. Each symbol therefore receives, on average, \(n_s\) versions of soft information. SIA uses these multiple versions to correct the cumulative soft information of each symbol, which makes soft cancellation more accurate. Under the SIA structure, SUMIS detection is compressed into a single serial stage. Before each subspace is detected, the covariance matrix of interference plus noise is updated from the previous detection results, which accelerates the convergence of detection performance. Although it has only one stage, SUMIS-SIA outperforms SUMIS as \(n_s\) increases. This improvement is mainly due to the average number of soft information updates being \(n_s\), instead of only one as in SUMIS. We then propose a channel-shortening method based on subspace detection. Using this method, we combine SUMIS-SIA and the UOM into another algorithm, USUMIS-SIA, which further reduces complexity. The results show that USUMIS-SIA also achieves a good trade-off, and its complexity can be adjusted through both \(n_{s}\) and the memory length v, which makes it more flexible.

The notation in this paper is as follows. Lowercase bold letters denote vectors, and uppercase bold letters denote matrices. \(\{\cdot \}^{T}\) and \(\{\cdot \}^{-1}\) denote matrix transpose and inverse, respectively. \(M_{i,j}\) is the entry in the i-th row and j-th column of matrix \(\textbf{M}\). \({\mathbb {E}}\{\cdot \}\) denotes mathematical expectation. \(\Vert \cdot \Vert\) denotes the norm of a vector, and \(|\cdot |\) denotes the cardinality of a set.

2 Preliminaries

For consistency with the SUMIS algorithm, the real-valued signal model is used throughout this paper. Consider an uplink MIMO system with \(N_T /2\) transmit antennas and \(N_R /2\) receive antennas. The real-valued received signal is

$$\begin{aligned} \textbf{y}=\textbf{H}\textbf{x}+\textbf{n}, \end{aligned}$$

where \(\textbf{y}\in {{\mathbb {R}}^{{{N_{R}}}\times 1}}\), \(\textbf{H}\in {{\mathbb {R}}^{ {N_{R} }\times N_T }}\), \(\textbf{x}\in {{\mathbb {R}}^{{N_T}\times 1}}\), \(\textbf{n}\in {{\mathbb {R}}^{{{N_{R}}}\times 1}}\) are the real received vector, real channel matrix, transmitted real symbol vector, and real white Gaussian noise, respectively. \(\textbf{x}\), \(\textbf{y}\), \(\textbf{n}\), and \(\textbf{H}\) are calculated from the corresponding complex form matrices \(\textbf{x}^\mathbb {C}\), \(\textbf{y}^\mathbb {C}\), \(\textbf{n}^\mathbb {C}\), \(\textbf{H}^\mathbb {C}\) as follows

$$\begin{aligned} \textbf{x}=\left[ \begin{matrix} {\mathcal {R}}\{\textbf{x}^\mathbb {C}\} \\ {\mathcal {I}}\{\textbf{x}^\mathbb {C}\} \end{matrix}\right] ,\quad \textbf{y}=\left[ \begin{matrix} {\mathcal {R}}\{\textbf{y}^\mathbb {C}\} \\ {\mathcal {I}}\{\textbf{y}^\mathbb {C}\} \end{matrix}\right] ,\quad \textbf{n}=\left[ \begin{matrix} {\mathcal {R}}\{\textbf{n}^\mathbb {C}\} \\ {\mathcal {I}}\{\textbf{n}^\mathbb {C}\} \end{matrix}\right] ,\quad \textbf{H}=\left[ \begin{matrix} {\mathcal {R}}\{\textbf{H}^\mathbb {C}\} & -{\mathcal {I}}\{\textbf{H}^\mathbb {C}\} \\ {\mathcal {I}}\{\textbf{H}^\mathbb {C}\} & {\mathcal {R}}\{\textbf{H}^\mathbb {C}\} \end{matrix}\right] . \end{aligned}$$

Each element of \(\textbf{n}\) follows \(\mathcal {N}(0,\gamma )\), where \(\gamma =\frac{N_T{N}_{0}}{4}\) and \(N_0\) is the noise power. Each element \({x}_{s}\) of \(\textbf{x}\), \(s\in {\{1, \cdots , N_T \}}\), belongs to the alphabet \(\chi\) (e.g., for 2PAM, \({{x}_{s}}\in \chi = \{1,-1\}\)), with \({\mathbb {E}}\{ {x}_{s} \}=0\) and unit variance.
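As a quick check of the real-valued model above, the following minimal Python sketch (plain lists, no external libraries) stacks real and imaginary parts and confirms that the real model reproduces the noiseless complex product. The helper names and the 2 × 2 channel are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of the complex-to-real mapping; names are illustrative.

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def real_vector(v):
    """Stack real parts over imaginary parts: [Re{v}; Im{v}]."""
    return [z.real for z in v] + [z.imag for z in v]

def real_matrix(H):
    """Map complex H to [[Re{H}, -Im{H}], [Im{H}, Re{H}]]."""
    top = [[z.real for z in row] + [-z.imag for z in row] for row in H]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in H]
    return top + bottom

# Hypothetical 2 x 2 complex channel and unnormalized 4QAM symbols.
Hc = [[1 + 2j, 0.5 - 1j], [-0.3 + 0.2j, 2 - 0.5j]]
xc = [1 - 1j, -1 + 1j]

y_from_complex = real_vector(matvec(Hc, xc))            # noiseless y^C, then stacked
y_from_real = matvec(real_matrix(Hc), real_vector(xc))  # real model H x
```

With \(N_T/2 = 2\) complex transmit antennas, \(\textbf{x}\) has \(N_T = 4\) real entries, and each 4QAM symbol becomes two 2PAM symbols.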

The optimal method for detecting \(\textbf{x}\) from (1) is ML [9], whose complexity is exponential. To avoid this complexity, SUMIS [10] constructs a subspace of size \(n_s\) for the s-th symbol and detects the s-th symbol within it. In two-stage SUMIS, the s-th subspace must contain the s-th symbol, so the \(N_T\) subspaces can also be indexed by s. The subspaces are formed from

$$\begin{aligned} {{\textbf{H}}^{T}}\textbf{H}=\left[ \begin{matrix} {{\sigma }^{2}_{1}} &{} {{\rho }_{1,2}} &{} \cdots \\ {{\rho }_{1,2}} &{} {{\sigma }^{2}_{2}} &{} {} \\ \vdots &{} {} &{} \ddots \\ \end{matrix} \right] , \end{aligned}$$

where \({\rho }_{s,j}\) is the inner product of the s-th and j-th columns of \(\textbf{H}\); the larger \(|{\rho }_{s,j}|\) is, the stronger the correlation between the s-th and j-th channels. Subspace s contains the s-th symbol and the \(n_{s}-1\) other symbols most strongly correlated with it. The reception model then becomes

$$\begin{aligned} \textbf{y}=\textbf{Hx}+\textbf{n}=\left[ \begin{matrix} {\bar{\textbf{H}}} &{} {\tilde{\textbf{H}}} \\ \end{matrix} \right] {{\left[ \begin{matrix} {{{\bar{\textbf{x}}}}^{T}} &{} {{{\tilde{\textbf{x}}}}^{T}} \\ \end{matrix} \right] }^{T}}+\textbf{n}={\bar{\textbf{H}}}{\bar{\textbf{x}}}+\tilde{\textbf{H}}\tilde{\textbf{x}}+\textbf{n}, \end{aligned}$$

where \({\bar{\textbf{H}}}{\bar{\textbf{x}}}\) and \(\tilde{\textbf{H}}\tilde{\textbf{x}}\) are the detected and interfering components of \(\textbf{y}\), respectively. By construction, the s-th symbol and its channel column are contained in \(\bar{\textbf{x}}\) and \(\bar{\textbf{H}}\).
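The subspace division can be sketched as follows. This is a minimal illustration under two assumptions of ours: correlation strength is measured by the magnitude \(|\rho_{s,j}|\), and indices are 0-based. The channel matrix is hypothetical.

```python
# Sketch of SUMIS-style subspace selection from the Gram matrix H^T H.
# Assumption: correlation strength is measured by |rho_{s,j}|; 0-based indices.

def gram(H):
    """Return H^T H for H stored as a list of rows (N_R x N_T)."""
    cols = list(zip(*H))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def select_subspace(G, s, n_s):
    """Symbol s followed by the n_s - 1 symbols j != s with the largest |G[s][j]|."""
    others = sorted((j for j in range(len(G)) if j != s),
                    key=lambda j: abs(G[s][j]), reverse=True)
    return [s] + others[:n_s - 1]

# Hypothetical 3 x 4 real channel.
H = [[1.0, 0.9, 0.0, 0.1],
     [0.0, 0.2, 1.0, -0.8],
     [0.5, 0.4, 0.3, 0.0]]
G = gram(H)
subspaces = [select_subspace(G, s, n_s=2) for s in range(len(G))]
print(subspaces)  # [[0, 1], [1, 0], [2, 3], [3, 2]]
```

In this example the inner product between columns 2 and 3 is negative (−0.8) yet still selected, which is why the magnitude is used.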

SUMIS consists of two stages. In the first stage (S1), SUMIS detects the \(N_T\) subspaces without a priori soft information about the interference subspace. The result of subspace s contains only the log-likelihood ratios (LLRs) of the s-th symbol. With the max-log approximation, the LLR of the i-th bit of symbol s is

$$\begin{aligned} \lambda _{{b}_{s,i}} \approx \underset{\bar{\textbf{x}}:{{b}_{s,i}}=0}{\mathop {\min }}\,\frac{1}{{2}}\left\| \textbf{y}-\bar{\textbf{H}}\bar{\textbf{x}} \right\| _\textbf{Q}^{2}- \underset{\bar{\textbf{x}}:{{b}_{s,i}}=1}{\mathop {\min }}\,\frac{1}{{2}}\left\| \textbf{y}-\bar{\textbf{H}}\bar{\textbf{x}} \right\| _\textbf{Q}^{2}, \end{aligned}$$

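where \({{b}_{s,i}}\) is the i-th bit of the s-th symbol and \(\textbf{Q}\) is the covariance matrix of \(\textbf{y}-\bar{\textbf{H}}\bar{\textbf{x}}\). \(\left\| \textbf{y}-\bar{\textbf{H}}\bar{\textbf{x}} \right\| _{\textbf{Q}}^{2}\) is shorthand for \((\textbf{y}-\bar{\textbf{H}}\bar{\textbf{x}})^{T}{\textbf{Q}}^{-1}(\textbf{y}-\bar{\textbf{H}}\bar{\textbf{x}})\). When it is expanded, the term \(\frac{1}{2}\textbf{y}^{T}\textbf{Q}^{-1}\textbf{y}\) does not depend on \(\bar{\textbf{x}}\), so minimizing \(\frac{1}{2}\left\| \textbf{y}-\bar{\textbf{H}}\bar{\textbf{x}} \right\| _{\textbf{Q}}^{2}\) is equivalent to maximizing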

$$\begin{aligned} {{\textbf{y}}^{T}}{{\bar{\textbf{H}}}^{r}}\bar{\textbf{x}}-\frac{1}{2}{{{\bar{\textbf{x}}}}^{T}}{{{\bar{\textbf{G}}^{r}}}}\bar{\textbf{x}}, \end{aligned}$$

where \({{\bar{\textbf{H}}}^{r}}={{\left( {{\textbf{Q}}^{-1}} \right) }}\bar{\textbf{H}}\), \({{\bar{\textbf{G}}}^{r}}={{\bar{\textbf{H}}}^{T}}{{\textbf{Q}}^{-1}}\bar{\textbf{H}}\) and \(\textbf{Q}=\tilde{\textbf{H}}\tilde{\textbf{H}}^{T}+\gamma \textbf{I}\).
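The equivalence between minimizing the weighted distance in (4) and maximizing the metric in (5) can be checked numerically. The sketch below uses a hypothetical subspace and a hypothetical symmetric positive definite \(\textbf{Q}^{-1}\); for every 2PAM candidate, \(\frac{1}{2}\Vert \textbf{y}-\bar{\textbf{H}}\bar{\textbf{x}}\Vert_{\textbf{Q}}^2\) plus the metric should equal the constant \(\frac{1}{2}\textbf{y}^{T}\textbf{Q}^{-1}\textbf{y}\).

```python
import itertools

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def metric(y, Hbar, Qinv, xbar):
    """y^T Hr x - 0.5 x^T Gr x with Hr = Q^{-1} Hbar and Gr = Hbar^T Q^{-1} Hbar."""
    Hr = matmul(Qinv, Hbar)
    Gr = matmul(transpose(Hbar), Hr)
    return dot(y, matvec(Hr, xbar)) - 0.5 * dot(xbar, matvec(Gr, xbar))

def half_q_norm(y, Hbar, Qinv, xbar):
    """0.5 * (y - Hbar x)^T Q^{-1} (y - Hbar x), the quantity minimized in (4)."""
    e = [a - b for a, b in zip(y, matvec(Hbar, xbar))]
    return 0.5 * dot(e, matvec(Qinv, e))

# Hypothetical subspace: N_R = 3, n_s = 2, symmetric positive definite Q^{-1}.
Hbar = [[1.0, 0.3], [0.2, 1.1], [-0.4, 0.5]]
Qinv = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 0.5]]
y = [0.9, -1.2, 0.3]
const = 0.5 * dot(y, matvec(Qinv, y))
gaps = [half_q_norm(y, Hbar, Qinv, x) + metric(y, Hbar, Qinv, x) - const
        for x in itertools.product([-1, 1], repeat=2)]
# Every gap is zero up to rounding: minimizing (4) is maximizing (5).
```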

In the second stage (S2), SUMIS detects the \(N_T\) subspaces again after softly canceling the interference subspace. The a posteriori expectation \({\mathbb {E}}\left\{ \tilde{\textbf{x}}|\textbf{y} \right\}\) of the interfering symbols, computed from the S1 results, is subtracted from \(\textbf{y}\), giving \(\mathbf {{y}'}\triangleq \textbf{y}-\tilde{\textbf{H}}{\mathbb {E}}\left\{ \tilde{\textbf{x}}|\textbf{y} \right\}\), and \(\textbf{Q}'=\tilde{\textbf{H}}\bar{\varvec{\Phi }}\tilde{\textbf{H}}^{T}+\gamma \textbf{I}\) is used instead of \(\textbf{Q}\). This cancellation is what we refer to as subspace suppression. \(\bar{\varvec{\Phi }}\) is a diagonal matrix of the a posteriori variances of the interfering symbols. Throughout the paper, the probability that symbol s takes the constellation point x is denoted \({p}_{s,x}\) and called its ‘soft information’. In SUMIS, the \({p}_{s,x}\) are computed from the S1 LLRs, and the a posteriori expectation and variance of the s-th symbol are \(\sum \limits _{x\in \chi }{x\cdot {{p}_{s,x}}}\) and \(\sum \limits _{x\in \chi }{x^{2}\cdot {{p}_{s,x}}}-\{ \sum \limits _{x\in \chi }{x\cdot {{p}_{s,x}}} \} ^2\), respectively. Note that in S1 the soft information of the s-th symbol comes only from the s-th subspace. Hence the soft information used for cancellation in S2 is updated only once.
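The path from S1 LLRs to the soft information \(p_{s,x}\) and the a posteriori moments can be sketched as below. It assumes independent bits, LLRs oriented as in (4) (positive values favor bit 1), and a Gray-labeled unit-energy 4PAM alphabet; the labeling is a hypothetical choice for illustration.

```python
import math

def symbol_probs(llrs, labels):
    """p_{s,x} from bit LLRs; labels maps each point x to its bit tuple.

    Assumes independent bits and llr = ln P(b=1)/P(b=0), the orientation of (4).
    """
    p = {}
    for x, bits in labels.items():
        prob = 1.0
        for llr, b in zip(llrs, bits):
            p1 = 1.0 / (1.0 + math.exp(-llr))  # P(b = 1)
            prob *= p1 if b == 1 else 1.0 - p1
        p[x] = prob
    return p

def posterior_moments(p):
    """A posteriori mean and variance: sum x p_{s,x} and sum x^2 p_{s,x} - mean^2."""
    mean = sum(x * px for x, px in p.items())
    var = sum(x * x * px for x, px in p.items()) - mean ** 2
    return mean, var

# Hypothetical Gray-labeled unit-energy 4PAM alphabet.
a = 1 / math.sqrt(5)
labels = {-3 * a: (0, 0), -a: (0, 1), a: (1, 1), 3 * a: (1, 0)}

mean0, var0 = posterior_moments(symbol_probs([0.0, 0.0], labels))  # no information yet
mean, var = posterior_moments(symbol_probs([20.0, 20.0], labels))  # confident in bits (1, 1)
```

With zero LLRs the symbol is uniform, so the mean is near 0 and the variance near 1 (the prior); confident LLRs concentrate the mass on one point and drive the variance toward 0, which is what shrinks \(\bar{\varvec{\Phi }}\) in \(\textbf{Q}'\).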

The LLRs in (4) are computed only for the s-th symbol, even though all candidates \(\bar{\textbf{x}}\) in the subspace and their metrics \(\left\| \textbf{y}-\bar{\textbf{H}}\bar{\textbf{x}} \right\| _{\textbf{Q}}^{2}\) are traversed. If the candidates \(\bar{\textbf{x}}\) are classified appropriately, the LLRs of the other \(n_{s}-1\) symbols can be obtained directly from the same traversal. Within each stage of SUMIS, the subspaces are detected independently, so each stage can be processed in parallel. The core idea of this paper is instead to use a single stage that detects the \(N_T\) subspaces serially and cumulatively updates the soft information to improve performance. Specifically, detection of the s-th subspace uses the soft information accumulated from the previous \(s-1\) subspaces and performs soft cancellation already in S1, rather than in S2.
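A minimal max-log sketch of this classification idea follows. Each candidate \(\bar{\textbf{x}}\) is scored once with the metric of (5), and the score is credited to every symbol of the subspace according to that symbol's bits, so one traversal yields LLRs for all \(n_s\) symbols with the orientation of (4). The 2PAM labeling, the noise level \(\textbf{Q}=0.5\textbf{I}\), and the subspace are assumptions for illustration.

```python
import itertools
import math

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def subspace_llrs(r, Gr, alphabet, labels):
    """Max-log LLRs of all n_s symbols of a subspace from a single traversal.

    r = Hr^T y and Gr as in (5); labels maps each point to its bit tuple.
    Returns llr[k][i] = max_{b=1} metric - max_{b=0} metric, i.e. (4).
    """
    n_s = len(Gr)
    n_bits = len(next(iter(labels.values())))
    # best[k][i][b]: best metric among candidates whose k-th symbol has bit i = b
    best = [[[-math.inf, -math.inf] for _ in range(n_bits)] for _ in range(n_s)]
    for xbar in itertools.product(alphabet, repeat=n_s):
        m = dot(r, xbar) - 0.5 * dot(xbar, matvec(Gr, xbar))
        for k, xk in enumerate(xbar):          # classify the candidate for every symbol
            for i, b in enumerate(labels[xk]):
                best[k][i][b] = max(best[k][i][b], m)
    return [[best[k][i][1] - best[k][i][0] for i in range(n_bits)]
            for k in range(n_s)]

# Hypothetical noiseless subspace with Q = 0.5 I, so Q^{-1} = 2 I.
Hbar = [[1.0, 0.3], [0.2, 1.0], [0.1, 0.1]]
x_true = [1, -1]
y = matvec(Hbar, x_true)
Hr = [[2.0 * h for h in row] for row in Hbar]                        # Q^{-1} Hbar
Gr = [[dot([row[i] for row in Hbar], [row[j] for row in Hr]) for j in range(2)]
      for i in range(2)]                                              # Hbar^T Q^{-1} Hbar
r = [dot([row[k] for row in Hr], y) for k in range(2)]                # Hr^T y
llrs = subspace_llrs(r, Gr, [-1, 1], {-1: (0,), 1: (1,)})
```

Because the example is noiseless and \(\bar{\textbf{H}}\) has full column rank, the signs of both LLRs match the transmitted symbols (1, −1), even though only one traversal was made.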

3 Methods

First, a one-stage detector, SUMIS-SIA, is proposed, based on serial detection and on interference cancellation that uses accumulated soft information. Then, parameter optimization and channel shortening methods based on subspace suppression are derived, and USUMIS-SIA is built on them to further reduce complexity.

Since SIA adjusts the order of the subspaces, let t denote the index of the sorted subspaces. The left side of Fig. 1 shows an example of subspace division with \(N_T=8\) and \(n_s=3\). For the subspace [1, 6, 4], SUMIS computes only the LLRs of symbol 1, whereas SUMIS-SIA and USUMIS-SIA output the LLRs of symbols 1, 6, and 4 directly, with almost no extra overhead. Define \(\varvec{\Sigma }^{t}_{a}\in {{\mathbb {R}}^{\log _{2}(\left| \chi \right| )\times {{N_T}}}}\), initialized as \(\varvec{\Sigma }^{0}_{a}=\textbf{0}\), to store the accumulated LLRs of all symbols throughout detection. Because detection is serial, the superscript t counts the updates, which increase with the number of detected subspaces. The s-th column of \(\varvec{\Sigma }^{t}_{a}\), \(\varvec{\sigma }^{t}_{s}\), holds the accumulated LLRs of the s-th symbol. Let \(\varvec{\lambda }^{t}_{s}\), \(s \in \{1,\cdots ,n_s\}\), denote the LLRs output by the t-th subspace. After the t-th subspace is detected, the \(\varvec{\lambda }^{t}_{s}\) of its \(n_s\) symbols are merged into the corresponding \(\varvec{\sigma }^{t-1}_{s}\) to obtain \(\varvec{\sigma }^{t}_{s}\). Soft cancellation for the (t+1)-th subspace is then based on \(\varvec{\Sigma }^{t}_{a}\), which improves the detection of the (t+1)-th subspace.


The details of SUMIS-SIA are given in Algorithm 1. Line 4 of Algorithm 1 indicates that each subspace is detected only once. SUMIS-SIA detects the subspaces serially in sorted order. Before a subspace is detected, the cumulative soft information of the interfering symbols is canceled. After a subspace is detected, the cumulative LLRs are merged with the LLRs produced by the current subspace. The rest of this subsection describes three changes of SUMIS-SIA relative to SUMIS: the sorting of subspaces, the improved subspace detection, and the LLR merging.

3.1.1 Sorting of subspaces

As in SUMIS, the subspaces are formed from \(\textbf{H}^{T}\textbf{H}\). In SUMIS, subspaces are detected in natural order (and in parallel), and symbol s must be included in the s-th subspace. SUMIS-SIA first adjusts this natural order. Counting all \(n_{s} \times N_T\) symbol entries across the subspaces shows that different symbols may occur different numbers of times. SUMIS-SIA sorts the subspaces in descending order of these occurrence counts, as shown in Fig. 1. In Fig. 1, Stream 4 (and 8) occurs 5 times in total across all subspaces, so Subspace 4 (or 8) is detected first. The occurrence counts of the 8 real symbols are “1, 4, 2, 5, 1, 4, 2, 5”, respectively. Sorting these counts in descending order gives the serial detection order “4, 8, 2, 6, 3, 7, 1, 5”. Because the LLRs in \(\varvec{\Sigma }^{t}_{a}\) become more accurate as detection proceeds, the descending order postpones the subspaces of rarely occurring symbols until more accurate LLRs are available, so these symbols, which receive few updates, still obtain accurate LLRs; this improves the accuracy of the accumulated LLRs overall. Figure 4 in Section 4 supports this statement.

Fig. 1

An example of adjusting subspace order. Stream 4 (and 8) appears a total of 5 times across all subspaces, so prioritize detecting Subspace 4 (or 8). The occurrence counts for the 8 real symbols are “1, 4, 2, 5, 1, 4, 2, 5” respectively. When these occurrence counts are arranged in descending order, SUMIS-SIA determines the detection order of the serial subspaces as: “4, 8, 2, 6, 3, 7, 1, 5”
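The ordering rule can be sketched in a few lines. Only the subspace [1, 6, 4] is taken from Fig. 1; the other seven subspaces are hypothetical, chosen so that the occurrence counts equal the quoted “1, 4, 2, 5, 1, 4, 2, 5”. Ties are broken by ascending subspace index (our assumption), which reproduces the quoted order.

```python
from collections import Counter

def sia_order(subspaces):
    """Detection order of SUMIS-SIA: subspace s ranked by how often symbol s occurs.

    subspaces[s - 1] lists the (1-based) symbols of subspace s, with s first.
    """
    counts = Counter(j for sub in subspaces for j in sub)
    order = sorted(range(1, len(subspaces) + 1), key=lambda s: -counts[s])
    return order, counts  # sorted() is stable, so ties keep ascending index

# Hypothetical subspaces (only [1, 6, 4] is from Fig. 1), chosen so that
# the occurrence counts equal "1, 4, 2, 5, 1, 4, 2, 5".
subspaces = [[1, 6, 4], [2, 4, 8], [3, 4, 8], [4, 8, 2],
             [5, 6, 8], [6, 2, 4], [7, 2, 3], [8, 6, 7]]
order, counts = sia_order(subspaces)
print([counts[s] for s in range(1, 9)])  # [1, 4, 2, 5, 1, 4, 2, 5]
print(order)                             # [4, 8, 2, 6, 3, 7, 1, 5]
```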

3.1.2 Subspace detection

Besides \(\varvec{\Sigma }^{t}_{a}\), define \(\varvec{\Sigma }^{t}_{c}\in {{\mathbb {R}}^{ \log _{2}(\left| \chi \right| ) \times N_T}}\) (initialized as \(\textbf{0}\)) to store, for each symbol, the most recent LLRs obtained from a subspace. \(\varvec{\Sigma }^{t}_c\) is needed because SUMIS-SIA, consistent with SUMIS, outputs subspace LLRs ‘\(\varvec{\lambda }\)’ rather than accumulated LLRs ‘\(\varvec{\sigma }\)’, and not every symbol belongs to the last (\(N_T\)-th) detected subspace; on the right side of Fig. 1, for example, the 1-st symbol does not belong to the 8-th subspace.

Before the t-th subspace is detected, SUMIS-SIA selects the columns \(\varvec{\sigma }^{t-1}_{m}\) of \(\varvec{\Sigma }^{t-1}_{a}\), where m indexes the symbols of the interference subspace, and computes \(\bar{\varvec{\Phi }}\) from them, so that cancellation uses the most accurate soft information available so far. The traversal of \(\bar{\textbf{x}}\) in the t-th subspace is the same as in two-stage SUMIS. The LLR vectors \(\varvec{\lambda }^{t}_{s}\) of the t-th detection then replace the corresponding columns of \(\varvec{\Sigma }^{t-1}_{c}\) to give \(\varvec{\Sigma }^{t}_{c}\), and are merged with the accumulated LLRs \(\varvec{\sigma }^{t-1}_{s}\) by the method of Section 3.1.3 to give \(\varvec{\sigma }^{t}_{s}\) and thus \(\varvec{\Sigma }^{t}_{a}\). \(\varvec{\Sigma }^{N_T}_{c}\) is the final output. This choice is again justified by the growing accuracy of the soft information.
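The soft cancellation performed before each subspace detection can be sketched for the simplest case. This is a minimal illustration restricted to 2PAM with one bit per real symbol, assuming bit 1 maps to \(x=+1\) and LLRs are oriented as in (4), so that \(\mathbb{E}\{x\}=\tanh(\sigma/2)\) and the variance is \(1-\mathbb{E}\{x\}^2\). The inputs \(\tilde{\textbf{H}}\), \(\textbf{y}\), and \(\gamma\) are hypothetical.

```python
import math

def soft_cancel(y, H_tilde, sigma_m, gamma):
    """Soft cancellation of the interfering symbols for one subspace (2PAM case).

    Assumes x in {-1, +1}, one bit per symbol, and LLRs oriented as in (4),
    so E{x} = tanh(sigma/2) and Var{x} = 1 - E{x}^2.
    Returns y' = y - H~ E{x~} and Q' = H~ Phi H~^T + gamma I.
    """
    mean = [math.tanh(s / 2) for s in sigma_m]
    var = [1.0 - m * m for m in mean]                 # diagonal of Phi
    y_p = [yi - sum(h * m for h, m in zip(row, mean)) for yi, row in zip(y, H_tilde)]
    n = len(y)
    Q_p = [[sum(H_tilde[i][j] * var[j] * H_tilde[k][j] for j in range(len(mean)))
            + (gamma if i == k else 0.0) for k in range(n)] for i in range(n)]
    return y_p, Q_p

# Hypothetical interference part of one subspace: N_R = 3, two interfering symbols.
H_tilde = [[0.5, -0.2], [0.1, 0.3], [0.0, 0.4]]
y = [1.0, -0.5, 0.2]
# t = 1: nothing detected yet (Sigma_a = 0), so y' = y and Q' = H~ H~^T + gamma I = Q.
y1, Q1 = soft_cancel(y, H_tilde, [0.0, 0.0], gamma=0.1)
# Later: confident accumulated LLRs remove the interference almost completely.
y2, Q2 = soft_cancel(y, H_tilde, [40.0, -40.0], gamma=0.1)
```

With \(\varvec{\Sigma }^{0}_{a}=\textbf{0}\), the first subspace sees \(\mathbf{y}'=\mathbf{y}\) and \(\mathbf{Q}'=\mathbf{Q}\), exactly as in S1 of SUMIS; as the accumulated LLRs grow, \(\mathbf{Q}'\) shrinks toward \(\gamma\mathbf{I}\).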

3.1.3 The merger of LLRs

SUMIS-SIA uses a damped merge, as in [13]. When a subspace detection produces new LLRs, they are merged with the previously accumulated LLRs as

$$\begin{aligned} \varvec{\sigma }^{t}_{s}=\zeta \cdot \varvec{\sigma }^{t-1}_{s}+(1-\zeta )\cdot \varvec{\lambda }^{t}_{s}. \end{aligned}$$

Extensive tests showed that \(\zeta = 0.6\) performs well across various simulation settings. In this way, the LLRs of a symbol from all subspaces that contain it are merged smoothly. Since \(\varvec{\Sigma }^{0}_{a}=\textbf{0}\), i.e., both values of every bit are initially equally likely a priori, SIA gradually corrects each LLR away from this neutral starting point.
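The bookkeeping of \(\varvec{\Sigma }^{t}_{c}\) and the damped merge (6) can be sketched as follows. Columns are stored in dictionaries keyed by symbol index instead of matrices, and the subspaces and LLR values are hypothetical.

```python
ZETA = 0.6  # damping factor reported for SUMIS-SIA

def sia_update(sigma_a, sigma_c, members, lambdas, zeta=ZETA):
    """Fold the LLRs of one detected subspace into the two stores.

    sigma_a[s]: accumulated LLRs of symbol s (a column of Sigma_a), merged by (6).
    sigma_c[s]: most recent LLRs of symbol s (a column of Sigma_c), overwritten.
    members: symbols of the subspace; lambdas: their LLR vectors, same order.
    """
    for s, lam in zip(members, lambdas):
        sigma_c[s] = list(lam)
        sigma_a[s] = [zeta * old + (1 - zeta) * new for old, new in zip(sigma_a[s], lam)]

# Hypothetical run: N_T = 3 symbols, one bit each, three subspaces of size 2.
sigma_a = {s: [0.0] for s in range(3)}   # Sigma_a^0 = 0
sigma_c = {s: [0.0] for s in range(3)}
sia_update(sigma_a, sigma_c, [0, 1], [[2.0], [-1.0]])
sia_update(sigma_a, sigma_c, [1, 2], [[-3.0], [0.5]])
sia_update(sigma_a, sigma_c, [2, 0], [[1.5], [2.0]])
# sigma_a is {0: [1.28], 1: [-1.44], 2: [0.72]} up to rounding;
# sigma_c holds the last lambda of each symbol.
```

For symbol 0, the two merges give \(0.4\cdot 2=0.8\) and then \(0.6\cdot 0.8+0.4\cdot 2=1.28\): a value seen twice moves only part of the way from the neutral prior 0 toward 2, which is the smoothing effect of \(\zeta\).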

Algorithm 1


3.2 USUMIS-SIA with trellis structure

Exploiting the UOM's ability to reduce complexity by shortening the memory length, we propose USUMIS-SIA, which, like SUMIS-SIA, uses a single stage but introduces the UOM trellis into the subspace detection at lines 5 to 7 of Algorithm 1. Based on \(\mathbf {{y}}\) and \(\textbf{Q}\) (or \(\mathbf {{y}'}\) and \(\textbf{Q}'\)) in (3), the channel input–output relationship is

$$\begin{aligned} \tilde{p}(\textbf{y}|\bar{\textbf{x}})\triangleq \exp \left( {{\textbf{y}}^{T}}{{\bar{\textbf{H}}}^{r}}\bar{\textbf{x}}-\frac{1}{2}{{{\bar{\textbf{x}}}}^{T}}{{{\bar{\textbf{G}}}}^{r}}\bar{\textbf{x}} \right) , \end{aligned}$$

where \({{\bar{\textbf{H}}}^{r}}\) and \({{\bar{\textbf{G}}}^{r}}\) are the same as in (5). The recursive factorization of (7) is

$$\begin{aligned} \tilde{p}(\textbf{y}|\bar{\textbf{x}}) =\prod \limits ^{n_{s}}_{k=1}{\exp \left( {{r}_{k}}{{{\bar{x}}}_{k}}-\frac{1}{2}{{\left| {{{\bar{x}}}_{k}} \right| }^{2}}\bar{G}_{k,k}^{r}-{{{\bar{x}}}_{k}}\sum \limits _{l=1}^{k-1}{\bar{G}_{l,k}^{r}{{{\bar{x}}}_{l}}} \right) }. \end{aligned}$$
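The factorization (8), and the truncated form used with memory v, can be checked numerically. The sketch below, with hypothetical \(\textbf{r}\) and \(\bar{\textbf{G}}^{r}\) (\(\textbf{r}\) as defined after (9)), compares the exponent of (7) with the sum of per-symbol log-factors over all 2PAM candidates. With memory v the truncated sum is exact only if \(\bar{\textbf{G}}^{r}\) is banded, which is why a shortened \(\bar{\textbf{G}}^{r}\) is needed.

```python
import itertools

def quad_exponent(r, Gr, xbar):
    """Exponent of (7) written with r = Hr^T y: r^T x - 0.5 x^T Gr x."""
    n = len(xbar)
    return (sum(r[k] * xbar[k] for k in range(n))
            - 0.5 * sum(xbar[i] * Gr[i][j] * xbar[j] for i in range(n) for j in range(n)))

def factored_exponent(r, Gr, xbar, v=None):
    """Sum of the per-symbol log-factors of (8); with memory v, keep only l >= k - v."""
    total = 0.0
    for k in range(len(xbar)):
        lo = 0 if v is None else max(0, k - v)
        total += (r[k] * xbar[k] - 0.5 * xbar[k] ** 2 * Gr[k][k]
                  - xbar[k] * sum(Gr[l][k] * xbar[l] for l in range(lo, k)))
    return total

r = [0.7, -1.1, 0.2]                                           # hypothetical
Gr = [[1.5, 0.4, -0.2], [0.4, 2.0, 0.3], [-0.2, 0.3, 1.2]]     # symmetric, not banded
Gr_band = [[1.5, 0.4, 0.0], [0.4, 2.0, 0.3], [0.0, 0.3, 1.2]]  # band matrix, v = 1
cands = list(itertools.product([-1, 1], repeat=3))
err_full = max(abs(quad_exponent(r, Gr, x) - factored_exponent(r, Gr, x)) for x in cands)
err_band = max(abs(quad_exponent(r, Gr_band, x) - factored_exponent(r, Gr_band, x, v=1))
               for x in cands)
err_trunc = max(abs(quad_exponent(r, Gr, x) - factored_exponent(r, Gr, x, v=1))
                for x in cands)
```

Here `err_full` and `err_band` vanish up to rounding, while `err_trunc` equals 0.2: with the non-banded \(\bar{\textbf{G}}^{r}\) and \(v=1\), the dropped term \(\bar{G}^{r}_{1,3}\bar{x}_1\bar{x}_3\) is lost.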

Combining (7) with channel shortening can further reduce the complexity of subspace detection for the same \(n_s\). For a limited memory v [14], the trellis has \({{\left( \sqrt{\left| \chi \right| } \right) }^{v}}\) states, and each trellis section has \({{\left( \sqrt{\left| \chi \right| } \right) }^{v+1}}\) branches. If \(v<n_{s}-1\), \(\bar{\textbf{G}}^{r}\) is a symmetric band matrix, i.e., \({{{\bar{G}}}^{r}_{m,n}}=0\) if \(|m-n|>v\). The recursive factorization [15] of (7) with memory v is

$$\begin{aligned} \tilde{p}(\textbf{y}|\bar{\textbf{x}}) =\prod \limits ^{n_{s}}_{k=1}{\exp \left( {{r}_{k}}{{{\bar{x}}}_{k}}-\frac{1}{2}{{\left| {{{\bar{x}}}_{k}} \right| }^{2}}\bar{G}_{k,k}^{r}-{{{\bar{x}}}_{k}}\sum \limits _{l=k-v}^{k-1}{\bar{G}_{l,k}^{r}{{{\bar{x}}}_{l}}} \right) }, \end{aligned}$$

where \(\textbf{r}={( {{{\bar{\textbf{H}}}}^{r}} )^{T}}\textbf{y}\). Comparing (8) with (9), channel shortening means that the k-th symbol interacts only with its v preceding symbols. Based on (9), the BCJR algorithm [16] is used for detection. The branch metric [17] (gamma) is

$$\begin{aligned} \varphi ({{\bar{x}}_{k}},\textbf{r}) =\exp \left( {{r}_{k}}{{{\bar{x}}}_{k}}-\frac{1}{2}{{\left| {{{\bar{x}}}_{k}} \right| }^{2}}\bar{G}_{k,k}^{r}-{{{\bar{x}}}_{k}}\sum \limits _{l=k-v}^{k-1}{\bar{G}_{l,k}^{r}{{{\bar{x}}}_{l}}} \right) . \end{aligned}$$

With (10), the remaining BCJR operations, namely the recursive computation of the \(\alpha\)s and \(\beta\)s and the computation of each symbol's marginal probability (the decision), run on the trellis.
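The following minimal max-log forward-backward recursion (our own sketch of a max-log BCJR, not the implementation of [16] or [18]) runs on the trellis defined by (10). A state stores the last v real symbols, so an M-ary PAM alphabet gives \(M^{v}\) states. Because each branch metric depends only on the current symbol and the state, the recursion should reproduce an exhaustive search over the same memory-v metric, which the included reference function computes. The values of \(\textbf{r}\) and \(\bar{\textbf{G}}^{r}\) are hypothetical.

```python
import itertools
import math

def branch_log(r, Gr, k, prev, a):
    """Log of the branch metric (10) for x_k = a; prev = (x_{k-len(prev)}, ..., x_{k-1})."""
    coupling = sum(Gr[k - len(prev) + j][k] * xl for j, xl in enumerate(prev))
    return r[k] * a - 0.5 * a * a * Gr[k][k] - a * coupling

def next_state(state, a, v):
    """Trellis state = the last (at most) v symbols."""
    return (state + (a,))[-v:] if v > 0 else ()

def maxlog_bcjr(r, Gr, alphabet, v):
    """Max-log forward-backward; out[k][a] = best total log-metric with x_k = a."""
    n = len(r)
    alphas = [{(): 0.0}]
    for k in range(n):                                    # forward recursion
        nxt = {}
        for s, val in alphas[k].items():
            for a in alphabet:
                t = next_state(s, a, v)
                nxt[t] = max(nxt.get(t, -math.inf), val + branch_log(r, Gr, k, s, a))
        alphas.append(nxt)
    betas = [None] * n + [{s: 0.0 for s in alphas[n]}]
    for k in range(n - 1, -1, -1):                        # backward recursion
        betas[k] = {s: max(branch_log(r, Gr, k, s, a) + betas[k + 1][next_state(s, a, v)]
                           for a in alphabet) for s in alphas[k]}
    out = []
    for k in range(n):                                    # per-symbol decision
        best = {a: -math.inf for a in alphabet}
        for s, val in alphas[k].items():
            for a in alphabet:
                m = val + branch_log(r, Gr, k, s, a) + betas[k + 1][next_state(s, a, v)]
                best[a] = max(best[a], m)
        out.append(best)
    return out

def brute_force(r, Gr, alphabet, v):
    """Reference: exhaustive max over sequences of the same memory-v metric."""
    n = len(r)
    out = [{a: -math.inf for a in alphabet} for _ in range(n)]
    for x in itertools.product(alphabet, repeat=n):
        total = sum(branch_log(r, Gr, k, x[max(0, k - v):k], x[k]) for k in range(n))
        for k in range(n):
            out[k][x[k]] = max(out[k][x[k]], total)
    return out

r = [0.4, -0.9, 1.3, 0.2]                         # hypothetical r = Hr^T y
Gr = [[2.0, 0.5, 0.1, 0.0], [0.5, 1.8, -0.4, 0.2],
      [0.1, -0.4, 1.5, 0.3], [0.0, 0.2, 0.3, 1.1]]
pam4 = [-3, -1, 1, 3]                             # unnormalized 4PAM
trellis = maxlog_bcjr(r, Gr, pam4, v=2)
```

The returned values are max-log symbol metrics, i.e., log-domain surrogates for \(p_{s,x}\) up to a common constant, which cancels in the bit conversion (17).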

The rest of this subsection describes how to shorten the channel under subspace suppression. Unlike in [14], with the channel shortened, \({{\bar{\textbf{H}}}^{r}}\) and \({{\bar{\textbf{G}}}^{r}}\) are given by

$$\begin{aligned} {{\bar{\textbf{H}}}^{r}}= & {} {{\left[ \bar{\textbf{H}}{{{\bar{\textbf{H}}}}^{T}}+\textbf{Q} \right] }^{-1}}\bar{\textbf{H}}\left[ {{{\bar{\textbf{G}}}}^{r}}+\textbf{I} \right] , \end{aligned}$$
$$\begin{aligned} {{\bar{\textbf{G}}}^{r}}= & {} {{{\mathbf {{V}}}}^{T}}\mathbf {{V}}-\textbf{I}, \end{aligned}$$

where the upper triangular matrix \(\textbf{V}\) satisfies \({V_{m,n}}=0\) if \(n-m>v\) and is computed as in [14].

Proposition 1


Let

$$\begin{aligned} \textbf{B}\triangleq -{\bar{\textbf{H}}}^{T}{{\left[ \bar{\textbf{H}}{\bar{\textbf{H}}}^{T}+\textbf{Q} \right] }^{-1}} \bar{\textbf{H}}+\textbf{I}. \end{aligned}$$

Define \(\textbf{B}_{k}^{v}\) as the submatrix of \(\textbf{B}\),

$$\begin{aligned} \textbf{B}_{k}^{v}=\left[ \begin{matrix} B_{k+1,k+1} &{} \cdots &{} B_{k+1,\min (n_{s},k+v)} \\ \vdots &{} \ddots &{} \vdots \\ B_{\min (n_{s},k+v),k+1} &{} \cdots &{} B_{\min (n_{s},k+v),\min (n_{s},k+v)}^{} \\ \end{matrix} \right] . \end{aligned}$$

Define \(\textbf{b}_{k}^{v}=\left[ B_{k,k+1},\cdots ,B_{k,\min (n_{s},k+v)} \right]\) as a row submatrix of \(\textbf{B}\), and \(\textbf{v}_{k}^{v}=\left[ V_{k,k+1},\cdots ,V_{k,\min (n_{s},k+v)} \right]\) as a row submatrix of \(\textbf{V}\). Define

$$\begin{aligned} {{c}_{k}}=B_{k,k}^{}-\textbf{b}_{k}^{v}{{\left( \textbf{B}_{k}^{v} \right) }^{-1}}{{\left( \textbf{b}_{k}^{v} \right) }^{T}}. \end{aligned}$$

Finally, define \({{v}_{k,k}}={{\left( {{c}_{k}} \right) }^{-\frac{1}{2}}}\) as the diagonal element of the matrix \(\textbf{V}\), and

$$\begin{aligned} \textbf{v}_{k}^{v}=-{{v}_{k,k}}\textbf{b}_{k}^{v}{{\left( \textbf{B}_{k}^{v} \right) }^{-1}} \end{aligned}$$

together with \({v}_{k,k}\) form the row submatrix of \(\textbf{V}\). \(\square\)
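Proposition 1 can be turned into a short numerical sketch. The code builds \(\textbf{B}\), the rows of \(\textbf{V}\) from \(c_k\) and \(\textbf{v}_{k}^{v}\), and then \(\bar{\textbf{G}}^{r}\) and \(\bar{\textbf{H}}^{r}\) from (12) and (11), using 0-based indices and a small Gauss-Jordan inverse of our own. As a consistency check that we derived for this sketch (it is not stated in the paper), with \(v=n_s-1\) the construction gives \(\textbf{V}^{T}\textbf{V}=\textbf{B}^{-1}=\textbf{I}+\bar{\textbf{H}}^{T}\textbf{Q}^{-1}\bar{\textbf{H}}\), so (11) and (12) reduce to the unshortened \(\bar{\textbf{H}}^{r}\) and \(\bar{\textbf{G}}^{r}\) of (5); with smaller v, \(\bar{\textbf{G}}^{r}\) is banded. The subspace data are hypothetical.

```python
def transpose(A):
    return [list(c) for c in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inv(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(A)
    M = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for i in range(n):
            if i != c:
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[c])]
    return [row[n:] for row in M]

def shortened_channel(Hbar, Q, v):
    """Hr and Gr of (11)-(12), with V built row by row as in Proposition 1."""
    n = len(Hbar[0])
    Ht = transpose(Hbar)
    S = inv([[a + q for a, q in zip(ra, rq)] for ra, rq in zip(matmul(Hbar, Ht), Q)])
    HtSH = matmul(matmul(Ht, S), Hbar)
    B = [[float(i == j) - HtSH[i][j] for j in range(n)] for i in range(n)]
    V = [[0.0] * n for _ in range(n)]
    for k in range(n):
        idx = list(range(k + 1, min(n, k + v + 1)))       # columns k+1 .. min(n_s, k+v)
        b = [B[k][j] for j in idx]                         # b_k^v
        if idx:
            Bk_inv = inv([[B[i][j] for j in idx] for i in idx])   # (B_k^v)^{-1}
            w = [sum(b[i] * Bk_inv[i][j] for i in range(len(idx))) for j in range(len(idx))]
        else:
            w = []
        c = B[k][k] - sum(wi * bi for wi, bi in zip(w, b))      # c_k
        V[k][k] = c ** -0.5
        for j, col in enumerate(idx):
            V[k][col] = -V[k][k] * w[j]                          # v_k^v
    VtV = matmul(transpose(V), V)
    Gr = [[VtV[i][j] - float(i == j) for j in range(n)] for i in range(n)]     # (12)
    GpI = [[Gr[i][j] + float(i == j) for j in range(n)] for i in range(n)]
    Hr = matmul(matmul(S, Hbar), GpI)                                          # (11)
    return Hr, Gr

# Hypothetical subspace: N_R = 4, n_s = 3, symmetric positive definite Q.
Hbar = [[1.0, 0.4, 0.2], [0.3, 1.1, -0.5], [0.0, 0.6, 0.9], [0.2, -0.3, 0.4]]
Q = [[0.6, 0.1, 0.0, 0.0], [0.1, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.7, 0.05], [0.0, 0.0, 0.05, 0.4]]
Hr_full, Gr_full = shortened_channel(Hbar, Q, v=2)    # v = n_s - 1: no shortening
Hr_short, Gr_short = shortened_channel(Hbar, Q, v=1)  # banded Gr
Qinv = inv(Q)
Gr_ref = matmul(matmul(transpose(Hbar), Qinv), Hbar)  # Hbar^T Q^{-1} Hbar, as in (5)
Hr_ref = matmul(Qinv, Hbar)                           # Q^{-1} Hbar, as in (5)
```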

The proof of (11) is given in Appendix A. Because of the decomposition (3), the derivation treats only \(\bar{\textbf{H}}\bar{\textbf{x}}\) as the signal, which is why \(\textbf{Q}\) takes the place of \(\gamma \textbf{I}\). Note that although the UOM directly yields symbol-level probabilities, USUMIS-SIA still follows the LLR merging method of Section 3.1.3. This requires converting the symbol-level probabilities into bit-level LLRs:

$$\begin{aligned} \lambda _{{b}_{s,i}} \approx \underset{{x \in \chi }:{{{b}_{s,i}}}=0}{\mathop {\max }}\, \text {ln}(p_{s,x}) -\underset{{x \in \chi }:{{{b}_{s,i}}}=1}{\mathop {\max }}\, \text {ln}(p_{s,x}), \end{aligned}$$

where \(p_{s,x}\) is the symbol-level probability from the BCJR decision. The other operations are no different from Algorithm 1.
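The conversion (17) can be sketched directly. The Gray labeling and the probabilities are hypothetical. As printed, (17) is positive when bit 0 is more likely, whereas (4) is positive when bit 1 is more likely; the sketch follows (17) literally, and an implementation would need a single orientation throughout before merging with (6).

```python
import math

def bit_llrs_from_symbol_probs(p, labels):
    """Max-log conversion of (17): max_{b_i=0} ln p_{s,x} - max_{b_i=1} ln p_{s,x}."""
    n_bits = len(next(iter(labels.values())))
    llrs = []
    for i in range(n_bits):
        m0 = max(math.log(p[x]) for x in p if labels[x][i] == 0)
        m1 = max(math.log(p[x]) for x in p if labels[x][i] == 1)
        llrs.append(m0 - m1)
    return llrs

# Hypothetical Gray labeling of unnormalized 4PAM and a BCJR symbol decision.
labels = {-3: (0, 0), -1: (0, 1), 1: (1, 1), 3: (1, 0)}
p = {-3: 0.1, -1: 0.2, 1: 0.6, 3: 0.1}
llrs = bit_llrs_from_symbol_probs(p, labels)   # [ln(0.2/0.6), ln(0.1/0.6)]
```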

3.3 Complexity analysis

This subsection compares the complexity of SUMIS, SUMIS-SIA, and USUMIS-SIA in terms of real additions and real multiplications. First, partial ML in a subspace requires pre-processing to compute \({{\bar{\textbf{H}}}^{T}}{{\textbf{Q}}^{-1}}\bar{\textbf{H}}\). Based on (5), the complexity of traversing all \(\bar{\textbf{x}}\) in one partial ML subspace detection is

$$\begin{aligned} \left\{ \begin{aligned}&{{\left| \chi \right| }^{{{n}_{s}}}}\cdot \left( n_{s}^{2}+2{{n}_{s}} \right) ,&{mult}\\&{{\left| \chi \right| }^{{{n}_{s}}}}\cdot 0.5 \cdot (n_{s}^{2}+3{{n}_{s}}-4),&{add} \end{aligned} \right. , \end{aligned}$$

which is the complexity of detecting a subspace once. For each of the \(N_T\) subspaces, SUMIS requires the above pre-processing and the traversal in (18) twice, whereas SUMIS-SIA requires them only once.
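The traversal cost (18) and the two-pass versus one-pass comparison can be tallied as below. The sketch counts only the traversal term, not the pre-processing of \(\bar{\textbf{H}}^{T}\textbf{Q}^{-1}\bar{\textbf{H}}\), and treats \(|\chi|\) as the real (PAM) alphabet size of Section 2; the \(16\times 16\) MIMO 16QAM example (\(N_T=32\), 4PAM) is one configuration from Section 4.

```python
def partial_ml_cost(q, n_s):
    """(mult, add) of one traversal of a subspace, as in (18); q = |chi|."""
    combos = q ** n_s
    mult = combos * (n_s ** 2 + 2 * n_s)
    add = combos * (n_s ** 2 + 3 * n_s - 4) // 2   # n_s^2 + 3 n_s - 4 is always even
    return mult, add

def traversal_totals(q, n_s, N_T, passes):
    """Traversal cost over all N_T subspaces (pre-processing excluded).

    passes = 2 for two-stage SUMIS, 1 for one-stage SUMIS-SIA.
    """
    mult, add = partial_ml_cost(q, n_s)
    return passes * N_T * mult, passes * N_T * add

# 16 x 16 MIMO 16QAM in complex form: N_T = 32 real symbols from 4PAM (q = 4).
sumis = traversal_totals(4, 3, 32, passes=2)       # (61440, 28672)
sumis_sia = traversal_totals(4, 3, 32, passes=1)   # (30720, 14336)
```

For the same \(n_s\), the one-stage structure halves the traversal cost: 30720 instead of 61440 real multiplications in this example.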

Second, USUMIS-SIA also needs to pre-process and traverse each of the \(N_T\) subspaces only once. BCJR detection in a subspace requires pre-processing of \({{\textbf{Q}}}\), (11), and (12). Because v is finite, the BCJR detection structure is a combination of a tree and a trellis. The BCJR algorithm of a USUMIS-SIA subspace iterates over a trellis whose number of branches is

$$\begin{aligned} ({{n}_{s}}-v)\cdot {{\left| \chi \right| }^{v+1}}+\sum \limits _{i=1}^{v-1}{{{\left| \chi \right| }^{i}}} \end{aligned}$$

within one subspace detection. The BCJR algorithm can use the max-log algorithm [18] to merge the \(\alpha\)s and \(\beta\)s and to compute each symbol's marginal probability (the decision) in every trellis section; these operations scale directly with the number of branches in (19). The gamma computation of a subspace requires

$$\begin{aligned} \left\{ \begin{aligned}&(v+4)({{n}_{s}}-v) {{\left| \chi \right| }^{v+1}}+3{\left| \chi \right| }+\sum \limits _{i=2}^{v-1}{(i+3){{\left| \chi \right| }^{i}}},&{mult}\\&(v+1)({{n}_{s}}-v) {{\left| \chi \right| }^{v+1}}+{\left| \chi \right| }+\sum \limits _{i=2}^{v-1}{{i{\left| \chi \right| }^{i}}},&{add} \end{aligned} \right. . \end{aligned}$$

In addition, the \(\alpha\) and \(\beta\) recursions each require two times (19) multiplications, and the decisions require (19) additions in total.
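The trellis counts (19) and (20) can be evaluated the same way. The functions implement the formulas exactly as printed, with q denoting \(|\chi|\); the parameter values mirror the \(n_s=3\), \(v\in\{1,2\}\) cases of Fig. 9 but are only an illustration.

```python
def bcjr_branches(q, n_s, v):
    """Branches traversed in one subspace, as written in (19); q = |chi|."""
    return (n_s - v) * q ** (v + 1) + sum(q ** i for i in range(1, v))

def gamma_cost(q, n_s, v):
    """(mult, add) of the branch-metric (gamma) computation, as written in (20)."""
    mult = ((v + 4) * (n_s - v) * q ** (v + 1) + 3 * q
            + sum((i + 3) * q ** i for i in range(2, v)))
    add = ((v + 1) * (n_s - v) * q ** (v + 1) + q
           + sum(i * q ** i for i in range(2, v)))
    return mult, add

for v in (1, 2):
    print(v, bcjr_branches(4, 3, v), gamma_cost(4, 3, v))
# 1 32 (172, 68)
# 2 68 (396, 196)
```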

Table 1 Complexity of one subspace detection for the configurations of Figs. 9, 10, and 11. Under the same configuration, the complexity ordering is “SUMIS > SUMIS-SIA > USUMIS-SIA”

4 Results and discussion

This section presents the experimental results of SUMIS-SIA and USUMIS-SIA in fast-fading MIMO channels. In the MIMO scenario, \(N_T/2\) codewords are transmitted in parallel over complex MIMO channels. Under fast-fading Rayleigh channels, modulation symbols mapped from the same codeword experience different channel fading coefficients. The elements of \(\textbf{H}\) are independent and identically distributed real Gaussian variables following \(\mathcal {N}(0,\frac{1}{2})\). For ease of expression, configurations are described in complex form; for example, the real-valued “32 (\(N_T\)) \(\times\) 32 (\(N_R\)) MIMO 4PAM” is described as the complex “\(16\times 16\) MIMO 16QAM”. Performance is compared in terms of the \(E_b/N_0\) required to reach a bit error rate (BER) of \(10^{-4}\), and each point is based on 300 codeword errors. The codewords are encoded with the length-576 LDPC codes of 802.16e [20], and the detector output is decoded by a layered decoder with up to 25 iterations. The code rate is 1/2 (for Fig. 2) or 3/4 (for Fig. 35, 6, 7, 8, 9, 10, 11 and 12).

Fig. 2
figure 2

Performance of SUMIS-SIA and comparative detection algorithms in \(6 \times 6\) MIMO 4QAM configuration. With such small-scale MIMO and low-order modulation configurations, SUMIS-SIA needs to achieve comparable detection performance at a complexity close to SUMIS. The 7 curves belong to ‘ML’, ‘SUMIS \(n_s=3\)’, ‘SUMIS-SIA \(n_s=3\)’, ‘SUMIS-SIA \(n_s=4\)’, ‘AIR-PM \(r=4\)’, ‘AIR-PM \(r=5\)’, and ‘MMSE’, respectively

Fig. 2 presents simulation results for complex \(6\times 6\) MIMO 4QAM with a rate-1/2 LDPC code, comparing SUMIS, SUMIS-SIA, ML, MMSE, and Achievable Information Rate based Partial Marginalization (AIR-PM) [21]. In AIR-PM, r determines the complexity of the maximum likelihood search, and r satisfies \(n_s=r+1\). SUMIS matches the detection performance of ML at \(n_s=3\) (with a gap of less than 0.1 dB). SUMIS-SIA needs \(n_s=4\) to come close to ‘\(n_s=3\) SUMIS’ (with a gap of 0.05 dB), and the search complexities of ‘\(n_s=3\) SUMIS’ and ‘\(n_s=4\) SUMIS-SIA’ are similar. This indicates that for small-scale MIMO with low-order modulation, SUMIS-SIA performs too few soft information updates for its advantage to show. Furthermore, SUMIS-SIA has at least a 1 dB advantage over linear detection. AIR-PM requires several times the pre-processing overhead of SUMIS-SIA to reach the performance of ‘\(n_s=4\) SUMIS-SIA’ at r=5. To demonstrate the advantages of SUMIS-SIA, the following experiments increase the MIMO scale and the modulation order.

Fig. 3

Comparison between SUMIS-SIA and SUMIS under \(16 \times 16\) MIMO 16QAM. Under this configuration, SUMIS-SIA already outperforms SUMIS for \(n_s > 2\). The 6 curves belong to ‘SUMIS \(n_s=2\)’, ‘SUMIS \(n_s=3\)’, ‘SUMIS \(n_s=4\)’, ‘SUMIS-SIA \(n_s=2\)’, ‘SUMIS-SIA \(n_s=3\)’, and ‘SUMIS-SIA \(n_s=4\)’, respectively

Fig. 4

SER simulation curves of 8 subspaces under \(4 \times 4\) MIMO 16QAM. According to the reordered subspace index, the hard decision accuracy of symbols in SUMIS-SIA subspace detection gradually increases as the index grows. This confirms that the soft information of the symbols becomes increasingly accurate. The 8 curves belong to ‘Subspace 1’, ‘Subspace 2’, ‘Subspace 3’, ‘Subspace 4’, ‘Subspace 5’, ‘Subspace 6’, ‘Subspace 7’, and ‘Subspace 8’, respectively

Fig. 5

Comparison between SUMIS-SIA and SUMIS under \(8 \times 8\) MIMO 4QAM. As in Fig. 2, the \(8 \times 8\) MIMO scale is not large enough for SUMIS-SIA to outperform SUMIS. The 8 curves belong to ‘SUMIS \(n_s=2\)’, ‘SUMIS \(n_s=3\)’, ‘SUMIS \(n_s=4\)’, ‘SUMIS \(n_s=5\)’, ‘SUMIS-SIA \(n_s=2\)’, ‘SUMIS-SIA \(n_s=3\)’, ‘SUMIS-SIA \(n_s=4\)’, and ‘SUMIS-SIA \(n_s=5\)’, respectively

Fig. 6

Comparison between SUMIS-SIA and SUMIS under \(16 \times 16\) MIMO 4QAM. Compared with Fig. 5, the \(16 \times 16\) MIMO scale is large enough for SUMIS-SIA to outperform SUMIS. The 8 curves belong to ‘SUMIS \(n_s=2\)’, ‘SUMIS \(n_s=3\)’, ‘SUMIS \(n_s=4\)’, ‘SUMIS \(n_s=5\)’, ‘SUMIS-SIA \(n_s=2\)’, ‘SUMIS-SIA \(n_s=3\)’, ‘SUMIS-SIA \(n_s=4\)’, and ‘SUMIS-SIA \(n_s=5\)’, respectively

Fig. 7

Comparison between SUMIS-SIA and SUMIS under \(8 \times 8\) MIMO 16QAM. Compared with Fig. 5, a modulation order of 16 is not large enough for SUMIS-SIA to outperform SUMIS. The 8 curves belong to ‘SUMIS \(n_s=2\)’, ‘SUMIS \(n_s=3\)’, ‘SUMIS \(n_s=4\)’, ‘SUMIS \(n_s=5\)’, ‘SUMIS-SIA \(n_s=2\)’, ‘SUMIS-SIA \(n_s=3\)’, ‘SUMIS-SIA \(n_s=4\)’, and ‘SUMIS-SIA \(n_s=5\)’, respectively

Fig. 8

Comparison between SUMIS-SIA and SUMIS under \(8 \times 8\) MIMO 64QAM. Compared with Fig. 7, a modulation order of 64 is large enough for SUMIS-SIA to outperform SUMIS. The 6 curves belong to ‘SUMIS \(n_s=2\)’, ‘SUMIS \(n_s=3\)’, ‘SUMIS \(n_s=4\)’, ‘SUMIS-SIA \(n_s=2\)’, ‘SUMIS-SIA \(n_s=3\)’, and ‘SUMIS-SIA \(n_s=4\)’, respectively

Fig. 9

Comparison between USUMIS-SIA and SUMIS under \(16 \times 16\) MIMO 16QAM with \(n_s\)=3. USUMIS-SIA (\(n_s\)=3), whose memory length v is shortened to 1, can still achieve performance comparable to SUMIS (\(n_s\)=3). The 3 curves belong to ‘SUMIS \(n_s=3\)’, ‘USUMIS-SIA \(n_s=3, v=2\)’, and ‘USUMIS-SIA \(n_s=3, v=1\)’, respectively

Fig. 10

Comparison between USUMIS-SIA and SUMIS under \(16 \times 16\) MIMO 16QAM with \(n_s=4\). With its memory length v shortened to 2, USUMIS-SIA (\(n_s\)=4) still outperforms SUMIS (\(n_s\)=4). The 4 curves correspond to ‘SUMIS \(n_s=4\)’, ‘USUMIS-SIA \(n_s=4, v=3\)’, ‘USUMIS-SIA \(n_s=4, v=2\)’, and ‘USUMIS-SIA \(n_s=4, v=1\)’, respectively

Fig. 11

Comparison between USUMIS-SIA and SUMIS under \(16 \times 16\) MIMO 16QAM with \(n_s=5\). With its memory length v shortened to 2, USUMIS-SIA (\(n_s\)=5) still outperforms SUMIS (\(n_s\)=5). The 5 curves correspond to ‘SUMIS \(n_s=5\)’, ‘USUMIS-SIA \(n_s=5, v=4\)’, ‘USUMIS-SIA \(n_s=5, v=3\)’, ‘USUMIS-SIA \(n_s=5, v=2\)’, and ‘USUMIS-SIA \(n_s=5, v=1\)’, respectively

Figure 3 shows how \(n_{s}\) affects the performance gap between SUMIS-SIA and SUMIS under \(16 \times 16\) MIMO 16QAM. As \(n_{s}\) increases from 2 to 4, SUMIS achieves the lower BER only at \(n_s = 2\); for larger \(n_{s}\), SUMIS-SIA outperforms SUMIS. SUMIS-SIA also converges faster in performance as \(n_s\) grows. This is because a larger \(n_s\) increases the average number of soft-information updates per symbol, which makes the soft information of each symbol more accurate.

To verify this explanation, this section compares the detection performance of the individual subspaces. In SUMIS, stage S2 softly cancels the interference-subspace symbols from the received signal. S2 outperforms S1 precisely because the soft information of the interference-subspace symbols is close to the transmitted symbols: the more accurate a symbol's soft information, the closer its symbol-level expectation is to the correct symbol. Hence, if the soft information in SUMIS-SIA becomes progressively more accurate, the accuracy of successive subspace detections should also increase. Figure 4 shows the SER results under \(4 \times 4\) MIMO 16QAM, with the subspaces numbered by the reordered subspace index described in Section 3.1.1. Across this group of 8 subspace detections, the hard-decision accuracy of the symbols increases steadily, which confirms that the soft information of the symbols becomes progressively more accurate.
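The symbol-level expectation used for soft cancellation can be illustrated with a small sketch. It assumes one real dimension of 16QAM (a 4-PAM symbol) with a hypothetical Gray mapping, levels normalized to unit average energy, independent bits, and the LLR convention \(L = \ln(P(b=0)/P(b=1))\); none of these choices is taken from the paper, and `soft_symbol` is a name introduced here. The mean is what a SUMIS-style detector subtracts during soft cancellation, and the variance measures the residual uncertainty that remains.

```python
import math

# Hypothetical Gray-mapped 4-PAM table (one real dimension of 16QAM).
# The mean of {9, 1, 1, 9} is 5, so scaling by 1/sqrt(5) gives unit energy.
SCALE = 1.0 / math.sqrt(5.0)
GRAY_4PAM = {
    (0, 0): -3 * SCALE,
    (0, 1): -1 * SCALE,
    (1, 1): 1 * SCALE,
    (1, 0): 3 * SCALE,
}


def prob_bit_zero(llr):
    """P(b = 0) for llr = ln(P(b=0)/P(b=1)), computed without overflow."""
    if llr >= 0:
        return 1.0 / (1.0 + math.exp(-llr))
    e = math.exp(llr)
    return e / (1.0 + e)


def soft_symbol(llrs):
    """Mean and variance of a 4-PAM symbol from its two (independent) bit LLRs."""
    p_zero = [prob_bit_zero(l) for l in llrs]
    mean = 0.0
    second_moment = 0.0
    for bits, level in GRAY_4PAM.items():
        prob = 1.0
        for bit, p0 in zip(bits, p_zero):
            prob *= p0 if bit == 0 else 1.0 - p0
        mean += prob * level
        second_moment += prob * level * level
    return mean, second_moment - mean * mean


print(soft_symbol([0.0, 0.0]))     # no information: mean ~0, variance ~1
print(soft_symbol([-20.0, 20.0]))  # confident bits (1, 0): mean ~3/sqrt(5), variance ~0
```

With uninformative LLRs the expectation sits at the origin and cancels nothing. As the LLRs become confident, the expectation approaches a constellation point and the variance shrinks, which is why more accurate soft information makes the cancellation more effective.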

Figures 3, 5, 6, 7 and 8 show the performance gains of SUMIS-SIA under different simulation configurations. As the modulation order or MIMO size increases, the advantage of SUMIS-SIA over SUMIS becomes more pronounced, so a smaller \(n_s\) suffices for SUMIS-SIA to achieve a lower BER than SUMIS. For example, in the \(16 \times 16\) MIMO 4QAM scenario, SUMIS-SIA outperforms SUMIS at \(n_s = 5\), whereas in the \(16 \times 16\) MIMO 16QAM scenario \(n_s = 3\) is sufficient. In the \(8 \times 8\) MIMO configuration, SUMIS-SIA outperforms SUMIS only when the modulation order reaches 64. Note that Figs. 3, 5, 6, 7 and 8 are intended only to show how the performance of SUMIS-SIA changes; in most scenarios, SUMIS-SIA is better than SUMIS in both BER and computational complexity. The advantage of SUMIS-SIA comes from the sufficient number of symbol updates provided by \(n_s\): under the SIA structure, the soft information of each symbol is updated \(n_s\) times on average, rather than only once as in SUMIS.
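The average of \(n_s\) updates per symbol follows from a counting argument, sketched below under stated assumptions: one subspace per real-valued symbol (consistent with the 8 subspaces of Fig. 4 for \(4 \times 4\) MIMO), each subspace containing \(n_s\) symbols, and every subspace detection refreshing the soft information of all symbols it contains. The selection rule in the hypothetical `build_subspaces` (the symbol itself plus the \(n_s - 1\) symbols with the largest off-diagonal Gram-matrix magnitude) is an assumption modeled on SUMIS-style subspace selection, and the Gram matrix is made up.

```python
def build_subspaces(gram, n_s):
    """Subspace k = symbol k plus the n_s - 1 symbols j != k with the largest
    |gram[k][j]| (an assumed, SUMIS-style selection rule)."""
    n = len(gram)
    subspaces = []
    for k in range(n):
        others = sorted((j for j in range(n) if j != k),
                        key=lambda j: abs(gram[k][j]), reverse=True)
        subspaces.append([k] + others[:n_s - 1])
    return subspaces


def update_counts(subspaces, n):
    """Number of subspace detections that refresh each symbol's soft information."""
    counts = [0] * n
    for sub in subspaces:
        for j in sub:
            counts[j] += 1
    return counts


# Hypothetical 4 x 4 Gram matrix (H^T H) of a real-valued channel.
GRAM = [[4.0, 1.5, 0.2, 0.7],
        [1.5, 3.0, 0.9, 0.1],
        [0.2, 0.9, 5.0, 1.2],
        [0.7, 0.1, 1.2, 2.0]]

subs = build_subspaces(GRAM, n_s=3)
counts = update_counts(subs, len(GRAM))
print(subs)                       # [[0, 1, 3], [1, 0, 2], [2, 3, 1], [3, 2, 0]]
print(sum(counts) / len(counts))  # 3.0
```

The average is exactly \(n_s\) whatever the selection rule, because N subspaces of size \(n_s\) contain \(N n_s\) symbol slots in total; only the per-symbol counts depend on the rule.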

The complexity of SUMIS-SIA is half that of SUMIS. Therefore, once \(n_s\) is large enough for SUMIS-SIA to outperform SUMIS, SUMIS-SIA also achieves a better performance-complexity trade-off: better performance than SUMIS at lower complexity.

USUMIS-SIA replaces the subspace detector of SUMIS-SIA with a tree- or trellis-structured detector based on the UOM. Within the SIA architecture, the soft information of each symbol is still updated \(n_s\) times on average. The preceding analysis showed that SUMIS-SIA outperforms SUMIS given a sufficiently large \(n_s\). The following experiments demonstrate that, as long as \(n_s\) is large enough, USUMIS-SIA still outperforms SUMIS even when the memory length v is moderately reduced. Figures 9, 10, and 11 show the performance of USUMIS-SIA with channel-shortened subspace detection. First, with \(n_s\)=3 and v=1, USUMIS-SIA achieves performance comparable to SUMIS with \(n_s\)=3. For \(n_s\)=4 and \(n_s\)=5, USUMIS-SIA still outperforms SUMIS even when the memory length of subspace detection is shortened to v=2. These results illustrate the role of \(n_s\) in SIA: a sufficiently large \(n_s\) yields sufficiently accurate soft information, which allows subspace detection to tolerate moderate channel shortening.
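Why shortening the memory length pays off can be seen by counting trellis states. The sketch below assumes a trellis over the \(n_s\) real-valued symbols of one subspace, with one symbol per stage, an alphabet of size M per real dimension (M = 4 for 16QAM), and a state made of the previous v symbols; it ignores termination. These are illustrative counts under that assumption, not the operation counts of Table 1, and `trellis_size` is a hypothetical helper.

```python
def trellis_size(m, n_s, v):
    """(max states, total branches) of a trellis with memory v over n_s symbols.

    At stage k (1-based) the state is the previous min(k - 1, v) symbols, so
    the stage has m ** min(k - 1, v) states, each with m outgoing branches,
    i.e. m ** min(k, v + 1) branches in total.
    """
    max_states = m ** v
    branches = sum(m ** min(k, v + 1) for k in range(1, n_s + 1))
    return max_states, branches


M, N_S = 4, 5  # 4-PAM per real dimension (16QAM), subspace size 5
for v in (4, 3, 2, 1):
    states, branches = trellis_size(M, N_S, v)
    print(f"v={v}: up to {states} states, {branches} branches")
```

Each unit reduction of v divides the state count by M, which is why moderate channel shortening (for example v=2 instead of v=4 at \(n_s\)=5) removes most of the trellis work.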

To make the complexity of the algorithms in these figures more concrete, Table 1 lists the subspace detection complexity for every configuration involved. Each entry is the total number of real multiplications and real additions required for subspace detection under the given algorithm and configuration. As Table 1 shows, under the same configuration the complexities are ordered as “SUMIS > SUMIS-SIA > USUMIS-SIA”. First, both the preprocessing and the traversal complexity of SUMIS-SIA are half those of SUMIS. USUMIS-SIA reduces the complexity further, mainly because it employs the max-log BCJR algorithm, in which some multiplications and additions are replaced by additions and comparisons.
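The max-log simplification replaces the Jacobian logarithm \(\ln\sum_i e^{a_i}\) with \(\max_i a_i\), so exponentials, logarithms and multiplications become comparisons. The minimal sketch below computes a bit LLR both ways from hypothetical log-domain path metrics, using the convention \(L = \ln(P(b=0)/P(b=1))\); the helper names are introduced here and are not taken from the paper.

```python
import math


def log_sum_exp(metrics):
    """Exact ln(sum(exp(a))), computed stably by factoring out the maximum."""
    peak = max(metrics)
    return peak + math.log(sum(math.exp(a - peak) for a in metrics))


def llr_exact(metrics_b0, metrics_b1):
    """Log-MAP bit LLR from log-domain metrics of hypotheses with b = 0 / b = 1."""
    return log_sum_exp(metrics_b0) - log_sum_exp(metrics_b1)


def llr_max_log(metrics_b0, metrics_b1):
    """Max-log approximation: only comparisons and one subtraction."""
    return max(metrics_b0) - max(metrics_b1)


b0 = [-1.0, -2.5, -4.0]  # hypothetical path metrics with bit = 0
b1 = [-3.0, -3.2, -6.0]  # hypothetical path metrics with bit = 1
print(llr_exact(b0, b1), llr_max_log(b0, b1))
```

For n hypotheses, each max term underestimates the exact log-sum by at most \(\ln n\), so the approximation costs some accuracy in exchange for removing every exp/log evaluation.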

If \(n_s\) is sufficiently large, USUMIS-SIA still outperforms SUMIS even with a moderately reduced memory length v. Together with the complexity statistics in Table 1, this shows that USUMIS-SIA also achieves a better performance-complexity trade-off than SUMIS: better performance than SUMIS at lower complexity.

Fig. 12

Comparison between SUMIS-SIA and ‘SUMIS-SIA with parallelism 2’ under \(16 \times 16\) MIMO 16QAM. Pairwise detection of subspaces does not cause significant performance loss. The 6 curves correspond to ‘SUMIS-SIA \(n_s=2\)’, ‘SUMIS-SIA \(n_s=3\)’, ‘SUMIS-SIA \(n_s=4\)’, ‘SUMIS-SIA parallel-2 \(n_s=2\)’, ‘SUMIS-SIA parallel-2 \(n_s=3\)’, and ‘SUMIS-SIA parallel-2 \(n_s=4\)’, respectively

The results show that, once \(n_s\) exceeds a certain threshold, SUMIS-SIA outperforms SUMIS thanks to the effective use and combination of soft information in SIA. As noted earlier, under the SIA architecture the soft information of each symbol is updated many times, instead of only once in S1 of SUMIS. This allows the detector to outperform SUMIS at a lower computational complexity. Simulation results show that both SUMIS-SIA and USUMIS-SIA offer a good trade-off between complexity and performance.

The following experiments evaluate the detection performance of a moderately parallelized SIA. According to Algorithm 1, SIA is a fully serial detection algorithm. In this experiment, the parallelism of SIA is increased to 2: subspaces \(t_o\) and \(t_o\)+1 (where \(t_o\) is odd) use the same set of soft information for soft cancellation, and after both subspace detections are completed, their outputs are merged into the accumulated soft information. Figure 12 compares the performance of SUMIS-SIA and SUMIS-SIA with a parallelism of 2 under \(8\times 8\) MIMO 16QAM. When \(n_s\)=2, serial SUMIS-SIA has an advantage of approximately 0.12 dB over the parallel version. When \(n_s\) increases to 3 and 4, the gap between SUMIS-SIA and ‘SUMIS-SIA with parallelism 2’ shrinks to only 0.07 dB. These results show that, as the average number of symbol updates increases, the performance loss caused by parallelism is partially compensated, which again reflects the multiple updates of symbol soft information in SIA.
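The difference between the serial schedule and parallelism 2 lies only in which snapshot of the soft information each subspace detection reads. The sketch below models that scheduling with a toy detector; `schedule`, `run_sia`, and `toy_detect` are hypothetical names, the soft information is keyed by subspace index rather than by symbol for simplicity, and the overwrite merge is a placeholder for whatever accumulation rule Algorithm 1 uses.

```python
def schedule(num_subspaces, parallelism):
    """Group 1-based subspace indices; parallelism 2 pairs t_o and t_o + 1 (t_o odd)."""
    return [list(range(start, min(start + parallelism, num_subspaces + 1)))
            for start in range(1, num_subspaces + 1, parallelism)]


def run_sia(num_subspaces, parallelism, detect):
    """Run subspace detections group by group.

    All members of a group read the same frozen snapshot of the soft
    information; their outputs are merged only after the whole group finishes.
    Returns the final soft information and, per subspace, how many entries
    were available in the snapshot it used.
    """
    soft_info = {}
    available = {}
    for group in schedule(num_subspaces, parallelism):
        snapshot = dict(soft_info)
        outputs = [detect(t, snapshot) for t in group]
        for t, out in zip(group, outputs):
            available[t] = len(snapshot)
            soft_info.update(out)  # placeholder merge rule: overwrite
    return soft_info, available


def toy_detect(t, snapshot):
    """Hypothetical detector: emits one soft value keyed by the subspace index."""
    return {t: 1.0}


print(schedule(8, 2))                # [[1, 2], [3, 4], [5, 6], [7, 8]]
print(run_sia(8, 1, toy_detect)[1])  # {1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 7}
print(run_sia(8, 2, toy_detect)[1])  # {1: 0, 2: 0, 3: 2, 4: 2, 5: 4, 6: 4, 7: 6, 8: 6}
```

In the paired schedule, every even-numbered subspace misses the output of its partner, which is the source of the small performance loss; the two detections in a pair are independent and can run concurrently.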

5 Conclusion

The main contribution of this paper is two detection algorithms, SUMIS-SIA and USUMIS-SIA, which combine the SUMIS architecture with soft information acceleration. For USUMIS-SIA, channel parameter optimization and channel-shortening methods suited to subspace suppression are also proposed, so that the algorithm can be flexibly adjusted through both the subspace size \(n_{s}\) and the memory length v. Under the same \(n_{s}\), both algorithms have lower complexity than SUMIS. Under scaled-up MIMO and high-order modulation, the performance of SUMIS-SIA converges faster than that of SUMIS as \(n_s\) increases; for sufficiently large \(n_s\), USUMIS-SIA with moderate channel shortening also outperforms SUMIS. In summary, by relying on SIA, both proposed algorithms achieve a better trade-off between performance and complexity.

Availability of data and materials

The datasets used during the current study are available from the corresponding author on reasonable request.

Materials availability

Not applicable.

Code availability

Not applicable.



Abbreviations

Multiple input multiple output
Soft information acceleration
Subspace marginalization with interference suppression
Zero forcing
Minimum mean square error
Maximum likelihood
Likelihood ascent search based on K symbols
Ungerboeck observation model
Log-likelihood ratios
Achievable information rate based partial marginalization
Bit error rate


  1. E.G. Larsson, O. Edfors, F. Tufvesson, T.L. Marzetta, Massive MIMO for next generation wireless systems. IEEE Commun. Mag. 52(2), 186–195 (2014).

  2. L. You, K.-X. Li, J. Wang, X. Gao, X.-G. Xia, B. Ottersten, Massive MIMO transmission for LEO satellite communications. IEEE J. Sel. Areas Commun. 38(8), 1851–1865 (2020).

  3. Z. Lin, M. Lin, T. Cola, J.-B. Wang, W.-P. Zhu, J. Cheng, Supporting IoT with rate-splitting multiple access in satellite and aerial-integrated networks. IEEE Internet Things J. 8(14), 11123–11134 (2021).

  4. Z. Lin, H. Niu, K. An, Y. Wang, G. Zheng, S. Chatzinotas, Y. Hu, Refracting RIS-aided hybrid satellite-terrestrial relay networks: Joint beamforming design and optimization. IEEE Trans. Aerosp. Electron. Syst. 58(4), 3717–3724 (2022).

  5. Z. Lin, M. Lin, B. Champagne, W.-P. Zhu, N. Al-Dhahir, Secrecy-energy efficient hybrid beamforming for satellite-terrestrial integrated networks. IEEE Trans. Commun. 69(9), 6345–6360 (2021).

  6. K.K. Gill, K. Kaur, Comparative analysis of ZF and MMSE detections for Nakagami-m faded MIMO channels, in Paper presented at 2015 annual IEEE India conference (INDICON), (2015), pp. 1–5.

  7. Z. Guo, P. Nilsson, Algorithm and implementation of the K-best sphere decoding for MIMO detection. IEEE J. Sel. Areas Commun. 24(3), 491–503 (2006).

  8. Z. Qin, J. Xu, X. Tao, Z. X., Improved Depth-First-Search Sphere Decoding Based on LAS for MIMO-OFDM Systems, in Paper presented at 2015 IEEE 82nd vehicular technology conference (VTC2015-Fall), (2015), pp. 1–5.

  9. M.O. Damen, H.E. Gamal, G. Caire, On maximum-likelihood detection and the search for the closest lattice point. IEEE Trans. Inf. Theory 49(10), 2389–2402 (2003).

  10. M. Čirkić, E.G. Larsson, SUMIS: near-optimal soft-in soft-out MIMO detection with low and fixed complexity. IEEE Trans. Signal Process. 62(12), 3084–3097 (2014).

  11. W. Haselmayr, G. Möstl, S. Seeber, A. Springer, Hardware implementation of the SUMIS detector using high-level synthesis, in Paper presented at 2015 IEEE international symposium on circuits and systems (ISCAS), (2015), pp. 2972–2975.

  12. F. Rusek, O. Edfors, An information theoretic characterization of channel shortening receivers, in Paper presented at 2013 Asilomar conference on signals, systems and computers, (2013), pp. 2108–2112.

  13. J. Gao, D. Zhang, J. Dai, K. Niu, C. Dong, ResNet-like belief-propagation decoding for polar codes. IEEE Wirel. Commun. Lett. 10(5), 934–937 (2021).

  14. F. Rusek, A. Prlja, Optimal channel shortening for MIMO and ISI channels. IEEE Trans. Wirel. Commun. 11(2), 810–818 (2012).

  15. F. Rusek, G. Colavolpe, C.E.W. Sundberg, 40 years with the Ungerboeck model: a look at its potentialities [lecture notes]. IEEE Signal Process. Mag. 32(3), 156–161 (2015).

  16. L. Bahl, J. Cocke, F. Jelinek, J. Raviv, Optimal decoding of linear codes for minimizing symbol error rate (corresp.). IEEE Trans. Inform. Theory 20(2), 284–287 (1974).

  17. F. Rusek, M. Loncar, A. Prlja, A comparison of Ungerboeck and Forney models for reduced-complexity ISI equalization, in Paper presented at IEEE GLOBECOM 2007—IEEE global telecommunications conference, (2007), pp. 1431–1436.

  18. D. Zhang, J. Liu, X. Yang, H. Ji, Max-Log-MAP and Log-MAP Decoding Algorithms for UST Symbol Based Turbo Code, in Paper presented at 2008 4th international conference on wireless communications, networking and mobile computing, (2007), pp. 1–5.

  19. A. Krishnamoorthy, D. Menon, Matrix inversion using Cholesky decomposition, in Paper presented at 2013 signal processing: algorithms, architectures, arrangements, and applications (SPA), (2013), pp. 70–72.

  20. IEEE P802.16e: Part 16: air interface for fixed and mobile broadband wireless access systems. IEEE 802.16 document (2005).

  21. S. Hu, F. Rusek, A soft-output MIMO detector with achievable information rate based partial marginalization. IEEE Trans. Signal Process. 65(6), 1622–1637 (2017).



Not applicable.


This research is funded by the Key Program of the National Natural Science Foundation of China under grant number 92067202.

Author information

Authors and Affiliations



The algorithm was jointly proposed by the four authors. Xiaoxiong Xiong completed the simulation work and wrote the paper.

Corresponding author

Correspondence to Chao Dong.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

USUMIS-SIA with trellis structure


In this paper, the lower bound to the information rate \({{I}_{\text {LB}}}\triangleq -{{\mathbb {E}}_{\textbf{Y}}}[{{\log }_{2}}(\tilde{p}(\textbf{y}))]+{{\mathbb {E}}_{\textbf{Y},\bar{\textbf{X}}}}[{{\log }_{2}}(\tilde{p}(\textbf{y}|\bar{\textbf{x}}))]\) [14] is used for the derivation. Integrating (7) over \(\bar{\textbf{x}}\) gives \(\tilde{p}(\textbf{y})\). Let \({{\bar{\textbf{G}}}^{\text {r}}}=\mathbf {U\Delta }{{\textbf{U}}^{T}}\) (\(\textbf{U}^{T}\textbf{U}=\textbf{I}\)) be the eigenvalue decomposition of \(\bar{\textbf{G}}^{\text {r}}\), and define \(\textbf{z}={{\textbf{U}}^{T}}\bar{\textbf{x}}\).

First, we derive the term \({{\mathbb {E}}_{\textbf{Y}}}[{{\log }_{2}}(\tilde{p}(\textbf{y}))]\). Since no a priori information about \(\bar{\textbf{x}}\) is available, \(\mathbb {E}\{\bar{\textbf{x}}\}=\textbf{0}\) and the elements of \(\bar{\textbf{x}}\) have unit variance. Using \(\int {e^{-x^{2}}}dx = \sqrt{\pi }\), substituting the above decomposition into (7) and integrating gives

$$\begin{aligned} \begin{aligned} \widetilde{p}(\textbf{y})&=\frac{1}{{\sqrt{\pi }^{{2n}_{s}}}}\int e^{\frac{-\Vert \textbf{z}{{\Vert }^{2}}}{2}} e^{ \left( {{\textbf{y}}^{T}}{{\bar{\textbf{H}}}^{r}}\textbf{U}\textbf{z}-\frac{1}{2}\textbf{z}^{T}\varvec{\Delta z}\right) }\text {d}\textbf{z} \\&=\frac{1}{{{\pi }^{{n}_{s}}}}\int {\prod \limits _{k=1}^{{n}_{s}}{\exp }}(z_{k}{{d}_{k}}-\frac{1}{2}{z}_{k}^{2}[\delta _{k}^{{}}+1])\text {d}{{z}_{k}}\\&=\prod \limits _{k=1}^{{n}_{s}}{\frac{1}{ 2\left( \delta _{k}^{{}}+1\right) }}\exp \left( \frac{|{{d}_{k}}{{|}^{2}}}{2\left( \delta _{k}^{{}}+1\right) } \right) \end{aligned}, \end{aligned}$$

where \({d}_{k}\) is from vector \(\textbf{d} \triangleq \textbf{U}^{T} \left( \bar{\textbf{H}}^{r} \right) ^{T} \mathbf {{y}}\) (or \(\textbf{d} \triangleq \textbf{U}^{T} \left( \bar{\textbf{H}}^{r} \right) ^{T} \mathbf {{y}'}\) for S2). According to (A1), the expectation \({{\mathbb {E}}_{\textbf{Y}}}[{{\log }_{2}}(\tilde{p}(\textbf{y}))]\) is

$$\begin{aligned} \begin{aligned} {\mathbb {E}_{\textbf{Y}}}[{{\log }_{2}}(\tilde{p}(\textbf{y}))]&= {{\mathbb {E}}_{\textbf{Y}}}\left[ \sum \limits _{k=1}^{{{n}_{s}}}{\left[ \log \frac{1}{2\left( \delta _{k}+1\right) } +\frac{|{{d}_{k}}{{|}^{2}}}{2\left( \delta _{k}+1\right) } \right] } \right] \\&= \sum \limits _{k=1}^{{{n}_{s}}}{\left[ -\log (2\left( \delta _{k}+1\right) )+\frac{{{\mathbb {E}}_{\textbf{Y}}}[|{{d}_{k}}{{|}^{2}}]}{2\left( \delta _{k}+1\right) } \right] } \end{aligned}. \end{aligned}$$

Define the covariance matrix \(\textbf{R}\) of \(\textbf{d}\) as

$$\begin{aligned} \left\{ \begin{array} {lll} \textbf{U}^{T} \left( \bar{\textbf{H}}^{r}\right) ^{T} \bar{\textbf{H}} \bar{\textbf{H}}^{T} \bar{\textbf{H}}^{r} \textbf{U} + \textbf{U}^{T} \left( \bar{\textbf{H}}^{r}\right) ^{T} \textbf{Q} \bar{\textbf{H}}^{r} \textbf{U}{} & {} S1\\\textbf{U}^{T} \left( \bar{\textbf{H}}^{r}\right) ^{T} \bar{\textbf{H}} \bar{\textbf{H}}^{T} \bar{\textbf{H}}^{r} \textbf{U} + \textbf{U}^{T} \left( \bar{\textbf{H}}^{r}\right) ^{T} \textbf{Q}' \bar{\textbf{H}}^{r} \textbf{U}{} & {} S2 \end{array} \right. . \end{aligned}$$

Here \({{{\mathbb {E}}_{\textbf{Y}}}[|{{d}_{k}}{{|}^{2}}]}\) is the kth diagonal element \(R_{kk}\) of \(\textbf{R}\).

Second, we derive the term \({{\mathbb {E}}_{\textbf{Y},\bar{\textbf{X}}}}[{{\log }_{2}}(\tilde{p}(\textbf{y}|\bar{\textbf{x}}))]\). In trace form, this expectation is

$$\begin{aligned} \begin{aligned} {{\mathbb {E}}_{\textbf{Y},\bar{\textbf{X}}}}[\log (\tilde{p}(\textbf{y}|\bar{\textbf{x}}))]&={{\mathbb {E}}_{\textbf{Y},\bar{\textbf{X}}}}\left[ {-\frac{1}{2}{{\bar{\textbf{x}}}}^{T}}{{{\bar{\textbf{G}}}}^{\text {r}}}\bar{\textbf{x}} +{{{\bar{\textbf{x}}}}^{T}}{{({{\bar{\textbf{H}}}^{\text {r}}})}^{T}}\textbf{y} \right] \\&=-\frac{1}{2}\text {Tr}({{\bar{\textbf{G}}}^{\text {r}}}) + \text {Tr}({{({{\bar{\textbf{H}}}^{\text {r}}})}^{T}}\bar{\textbf{H}}) \end{aligned}, \end{aligned}$$

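where \(\textbf{y}'\) replaces \(\textbf{y}\) in S2. Combining the two expectations above, and noting that \(\text {Tr}(\bar{\textbf{G}}^{\text {r}})=\sum _{k=1}^{n_s}\delta _{k}\), yields the lower bound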

$$\begin{aligned} {{I}_{\text {LB}}}=\sum \limits _{k=1}^{{{n}_{s}}}{\left[ \log (2(\delta _{k}+1))-\frac{{{R}_{kk}}}{2(\delta _{k}+1)}-\frac{\delta _{k}}{2} \right] }+\text {Tr}({{({{\bar{\textbf{H}}}^{\text {r}}})}^{T}}\bar{\textbf{H}}) \end{aligned}$$

Notice that the terms related to \(\bar{\textbf{H}}^{r}\) are \(\sum \limits _{k=1}^{{{n}_{s}}}\frac{{{R}_{kk}}}{2(\delta _{k}+1)}\) and \(\text {Tr}({{({{\bar{\textbf{H}}}^{\text {r}}})}^{T}}\bar{\textbf{H}})\). In S1, for example, rewrite \(\sum \limits _{k=1}^{{{n}_{s}}}\frac{{{R}_{kk}}}{2(\delta _{k}+1)}\) in matrix form as

$$\begin{aligned} \begin{aligned} \sum \limits _{k=1}^{{{n}_{s}}}\frac{{{R}_{kk}}}{2(\delta _{k}+1)}&=\frac{1}{2}\text {Tr}\left( \textbf{R}{{[\varvec{\Delta }+\textbf{I}]}^{-1}}\right) \\&=\frac{1}{2}\text {Tr}\left( {{\textbf{U}}^{T}}{{({{\bar{\textbf{H}}}^{\text {r}}})}^{T}}\left[ \bar{\textbf{H}}{{{\bar{\textbf{H}}}}^{T}}+\textbf{Q} \right] {{\bar{\textbf{H}}}^{\text {r}}}\textbf{U}{{[\varvec{\Delta }+\textbf{I}]}^{-1}}\right) \\&=\frac{1}{2}\text {Tr}\left( {{({{\bar{\textbf{H}}}^{\text {r}}})}^{T}}\left[ \bar{\textbf{H}}{{{\bar{\textbf{H}}}}^{T}}+\textbf{Q} \right] {{\bar{\textbf{H}}}^{\text {r}}}\textbf{U}{{[\varvec{\Delta }+\textbf{I}]}^{-1}}{{\textbf{U}}^{T}}\right) \\&=\frac{1}{2}\text {Tr}\left( {{({{{\bar{\textbf{H}}}}^{\text {r}}})}^{T}}\left[ \bar{\textbf{H}}{{{\bar{\textbf{H}}}}^{T}}+\textbf{Q} \right] {{{\bar{\textbf{H}}}}^{\text {r}}}{{[{{{\bar{\textbf{G}}}}^{\text {r}}}+\textbf{I}]}^{-1}} \right) \end{aligned}. \end{aligned}$$
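The last two steps of the trace rewriting above use the cyclic property of the trace, \(\text{Tr}(\textbf{U}^{T}\textbf{M}) = \text{Tr}(\textbf{M}\textbf{U}^{T})\), together with the following identity. It holds because \(\textbf{U}\) is square with \(\textbf{U}^{T}\textbf{U}=\textbf{I}\), so that \(\textbf{U}\textbf{U}^{T}=\textbf{I}\) and \(\textbf{U}^{-1}=\textbf{U}^{T}\):

$$\begin{aligned} \textbf{U}[\varvec{\Delta }+\textbf{I}]^{-1}\textbf{U}^{T} = \left( \textbf{U}[\varvec{\Delta }+\textbf{I}]\textbf{U}^{T}\right) ^{-1} = \left( \mathbf {U\Delta }\textbf{U}^{T}+\textbf{U}\textbf{U}^{T}\right) ^{-1} = [\bar{\textbf{G}}^{\text {r}}+\textbf{I}]^{-1}. \end{aligned}$$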

Define the objective function as

$$\begin{aligned} \begin{aligned} f\left( \bar{\textbf{H}}^{r} \right) =&\text {Tr}({{({{\bar{\textbf{H}}}^{\text {r}}})}^{T}}\bar{\textbf{H}})&-\frac{1}{2}\text {Tr}\left( {{({{{\bar{\textbf{H}}}}^{\text {r}}})}^{T}}\left[ \bar{\textbf{H}}{{{\bar{\textbf{H}}}}^{T}}+\textbf{Q} \right] {{{\bar{\textbf{H}}}}^{\text {r}}}{{[{{{\bar{\textbf{G}}}}^{\text {r}}}+\textbf{I}]}^{-1}}\right) \end{aligned}. \end{aligned}$$

Using the matrix-derivative identities \(\frac{\partial \text {Tr}\left( {{\textbf{Z}}^{T}}\textbf{A} \right) }{\partial \textbf{Z}}=\textbf{A}\) and \(\frac{\partial \text {Tr}\left( {{\textbf{Z}}^{T}}\textbf{BZA} \right) }{\partial \textbf{Z}}={{\textbf{B}}^{T}}\textbf{Z}{{\textbf{A}}^{T}}+\textbf{BZA}\), the gradient of (A7) is

$$\begin{aligned} \begin{aligned} \frac{\partial f\left( \bar{\textbf{H}}^{r} \right) }{\partial \bar{\textbf{H}}^{r}}=&\bar{\textbf{H}}-\frac{1}{2} \left[ \bar{\textbf{H}}{{{\bar{\textbf{H}}}}^{T}}+\textbf{Q} \right] ^{T} {{\bar{\textbf{H}}}^{\text {r}}} \left( [\bar{\textbf{G}}^{\text {r}}+\textbf{I}]^{-1}\right) ^{T}\\&-\frac{1}{2} \left[ \bar{\textbf{H}}{{{\bar{\textbf{H}}}}^{T}}+\textbf{Q} \right] {\bar{\textbf{H}}}^{\text {r}} [\bar{\textbf{G}}^{\text {r}}+\textbf{I}]^{-1} \\ =&\bar{\textbf{H}}- \left[ \bar{\textbf{H}}{{{\bar{\textbf{H}}}}^{T}}+\textbf{Q} \right] {\bar{\textbf{H}}}^{\text {r}} [\bar{\textbf{G}}^{\text {r}}+\textbf{I}]^{-1} \end{aligned}. \end{aligned}$$
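The two half-terms of the gradient merge into one because both matrices around \(\bar{\textbf{H}}^{\text {r}}\) are symmetric: \(\bar{\textbf{H}}\bar{\textbf{H}}^{T}+\textbf{Q}\) because \(\textbf{Q}\) is a covariance matrix, and \([\bar{\textbf{G}}^{\text {r}}+\textbf{I}]^{-1}\) because \(\bar{\textbf{G}}^{\text {r}}=\mathbf {U\Delta }\textbf{U}^{T}\) is symmetric. Writing \(\textbf{B}\) and \(\textbf{A}\) for these two matrices (shorthand introduced only for this step) and treating \(\bar{\textbf{G}}^{\text {r}}\) as fixed during differentiation, as the derivation above does:

$$\begin{aligned} \frac{1}{2}\left( \textbf{B}^{T}\bar{\textbf{H}}^{\text {r}}\textbf{A}^{T}+\textbf{B}\bar{\textbf{H}}^{\text {r}}\textbf{A}\right) =\textbf{B}\bar{\textbf{H}}^{\text {r}}\textbf{A}, \qquad \textbf{B}=\bar{\textbf{H}}\bar{\textbf{H}}^{T}+\textbf{Q},\quad \textbf{A}=[\bar{\textbf{G}}^{\text {r}}+\textbf{I}]^{-1}. \end{aligned}$$

Since \(\textbf{B}\) is positive definite, and \(\textbf{A}\) is positive definite whenever \(\delta _{k}>-1\) (which the Gaussian integrals in (A1) already require), the quadratic term of (A7) is concave in \(\bar{\textbf{H}}^{\text {r}}\); the stationary point is therefore a maximum for fixed \(\bar{\textbf{G}}^{\text {r}}\).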

Setting this gradient to zero gives

$$\begin{aligned} \bar{\textbf{H}}^{\text {r}} = \left[ \bar{\textbf{H}}{{{\bar{\textbf{H}}}}^{T}}+\textbf{Q} \right] ^{-1} \bar{\textbf{H}} [\bar{\textbf{G}}^{\text {r}}+\textbf{I}], \end{aligned}$$

which maximizes \(I_{\text{LB}}\) with respect to \(\bar{\textbf{H}}^{\text {r}}\). For S2, \(\textbf{Q}'\) replaces \(\textbf{Q}\) in (A9). This completes the proof of (11).
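The stationary point (A9) can be sanity-checked numerically. The sketch below uses small made-up \(2 \times 2\) matrices for \(\bar{\textbf{H}}\), \(\textbf{Q}\), and a symmetric \(\bar{\textbf{G}}^{\text {r}}\) (all hypothetical), holds \([\bar{\textbf{G}}^{\text {r}}+\textbf{I}]^{-1}\) fixed as in the derivation, and compares the closed-form gradient \(\bar{\textbf{H}}-[\bar{\textbf{H}}\bar{\textbf{H}}^{T}+\textbf{Q}]\bar{\textbf{H}}^{\text {r}}[\bar{\textbf{G}}^{\text {r}}+\textbf{I}]^{-1}\) with central finite differences of the objective (A7). Pure Python keeps the example dependency-free.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def inv2(a):
    """Inverse of a 2 x 2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def objective(z, h, b, a):
    """f(Z) = Tr(Z^T H) - 1/2 Tr(Z^T B Z A), i.e. (A7) with Z in place of H^r."""
    zt = transpose(z)
    return trace(matmul(zt, h)) - 0.5 * trace(matmul(matmul(matmul(zt, b), z), a))


def gradient(z, h, b, a):
    """Closed-form gradient: H - B Z A."""
    bza = matmul(matmul(b, z), a)
    return [[h[i][j] - bza[i][j] for j in range(len(h[0]))] for i in range(len(h))]


def numeric_gradient(z, h, b, a, eps=1e-6):
    """Central finite differences of the objective."""
    grad = [[0.0] * len(z[0]) for _ in z]
    for i in range(len(z)):
        for j in range(len(z[0])):
            zp = [row[:] for row in z]
            zm = [row[:] for row in z]
            zp[i][j] += eps
            zm[i][j] -= eps
            grad[i][j] = (objective(zp, h, b, a) - objective(zm, h, b, a)) / (2 * eps)
    return grad


I2 = [[1.0, 0.0], [0.0, 1.0]]
H = [[1.0, 0.4], [0.3, 0.8]]         # hypothetical channel
Q = [[0.5, 0.0], [0.0, 0.5]]         # hypothetical noise covariance
G_R = [[1.2, 0.3], [0.3, 0.9]]       # hypothetical symmetric G^r
B = add(matmul(H, transpose(H)), Q)  # H H^T + Q
A = inv2(add(G_R, I2))               # [G^r + I]^{-1}, held fixed

Z_TEST = [[0.2, -0.1], [0.5, 0.3]]
Z_OPT = matmul(matmul(inv2(B), H), add(G_R, I2))  # stationary point (A9)

print(gradient(Z_TEST, H, B, A))
print(numeric_gradient(Z_TEST, H, B, A))  # should agree closely with the line above
print(gradient(Z_OPT, H, B, A))           # close to zero
```

Because the objective is quadratic in \(\bar{\textbf{H}}^{\text {r}}\), central differences match the closed-form gradient up to rounding, and the objective at (A9) is not smaller than at any other test point.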

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Xiong, X., Dong, C., Bian, Y. et al. Soft information acceleration aided subspace suppression MIMO detection. J Wireless Com Network 2024, 51 (2024).


  • Received:

  • Accepted:

  • Published:

  • DOI: