
Low-complexity deep unfolded neural network receiver for MIMO systems based on the probability data association detector


Interest in applications that combine machine learning algorithms and communications has been rising in recent years. Machine learning and neural networks are being advocated as a way of improving the performance of several functions across all layers of future communication systems. Furthermore, in applications where complexity reduction is essential for system feasibility, at the cost of an affordable performance loss, more efficient systems might be achieved with the aid of machine learning algorithms. Signal detection for multiple-input multiple-output (MIMO) systems has become a hot topic in recent years given its prominent role in the fourth and fifth generations of mobile networks. However, the computational complexity in MIMO systems can become prohibitive when the number of antennas increases. Therefore, by leveraging neural network architectures, we propose a deep unfolded detector, whereby the algorithm of the probability data association (PDA) detector is adapted and enhanced by means of neural network learning capabilities. We show that the proposed detector is orders of magnitude less complex than the PDA detector, while presenting no severe penalties in performance in terms of bit error rate (BER).

1 Introduction

The promised success of multiple-input multiple-output (MIMO) systems has been confirmed since the fourth generation of mobile networks (4G), and MIMO continues to show its importance in recent deployments of the fifth generation of mobile networks (5G). Early studies on the sixth generation of mobile networks (6G) also identify MIMO systems as a key enabler for future wireless systems [1]. Their advantages over classical single-input single-output (SISO) systems are extremely attractive and relatively simple to understand from a theoretical standpoint [2, 3]: by increasing the number of service antennas, an overall increase in data throughput is obtained.

It was shown in [4] that detectors based on neural networks (NNs) have a competitive performance when compared to the optimum maximum likelihood detector (MLD), while the former are more robust to imperfect channel estimation and less complex than the latter. However, the system model in [4] considers a SISO system. Recently, several works [5, 6] proposed solutions that attempt to integrate machine learning (ML) and NNs into MIMO systems. One emerging solution involves adapting NN architectures according to model-driven detection algorithms, such that the iterations of the algorithm are unfolded onto NN layers. This solution is called deep unfolding [5, 7].

Therefore, in this work we propose a deep unfolded detector [8] based on the probability data association (PDA) detector [9] for MIMO systems. The main aim is to achieve the aforementioned advantages of data-driven detectors for SISO systems in MIMO systems, while maintaining the advantageous features of the PDA detector [3]. To the best of the authors' knowledge, this is the first attempt at combining the deep unfolded architecture with the algorithm of the PDA detector for MIMO systems.

MIMO systems are also largely used for beamforming and beamsteering in the most recent mobile networks, where precoding can provide spatial multiplexing and improve the system performance without increasing the complexity on the receiver side [10]. It is clear that precoding will play an important role in future MIMO systems for mobile communications. Nevertheless, this work focuses on MIMO systems where multiple antennas transmit data over a rich scattering environment without considering precoding, relying on detection techniques that can resolve the inter-antenna interference (IAI) with affordable complexity, a scenario where the PDA detector is an interesting solution [3].

1.1 Contributions and paper organization

In this paper, we make the following contributions:

  • We propose a novel combination of the data-driven deep unfolded detector and the PDA algorithm for signal detection in MIMO systems;

  • Differently from other similar proposals [5, 6, 8], we employ the categorical cross-entropy loss function and dispense with the use of optimal Gaussian denoisers;

  • The computational complexity of the proposed detector is evaluated and compared with the complexity presented by detectors of interest;

  • A low-complexity variation of the deep unfolded PDA (DU-PDA) is also presented, its computational complexity being lower than that of the linear zero-forcing (ZF) detector;

  • Numerical results from computational simulations compare the uncoded and coded error rates of the proposed detectors with other detectors under time-dispersive channels.

The remainder of this paper is organized as follows. In Sect. 2, we present the system model of the baseline orthogonal frequency division multiplexing (OFDM)-MIMO system. Section 3 then introduces the problem of signal detection for MIMO systems and gives a brief description of the PDA detector and of deep unfolding. This is followed by a description of the proposed DU-PDA and an analysis of the computational complexity of all detectors discussed throughout this paper. Next, in Sect. 4, we provide numerical results to evaluate the performance of all detectors studied in this paper, including the optimum MLD. Finally, Sect. 5 concludes the paper.

1.2 Notation

Throughout this paper, italicized letters (e.g., x or X) represent scalars, boldfaced lowercase letters (e.g., \({\mathbf {x}}\)) represent vectors, and boldfaced uppercase letters (e.g., \({\mathbf {X}}\)) denote matrices. The nth entry of the vector \({\mathbf {x}}\) is represented by \(x\left( n\right)\). The entry on the ith row and jth column of the matrix \({\mathbf {X}}\) is denoted by \(X_{i,j}\). The superscript \({\mathbf {x}}^{\left( n\right) }\) denotes the nth instance of the vector \({\mathbf {x}}\), such that \({\mathcal {X}} = \{{\mathbf {x}}^{\left( n\right) }\}_{\forall n}\) forms a collection of vectors or a dataset. The sets of real and complex numbers are represented by \({\mathbb {R}}\) and \({\mathbb {C}}\), respectively. The absolute value of the scalar \(x\in {\mathbb {R}}\) or the modulus of \(x\in {\mathbb {C}}\) is denoted by \(|x|\). The sets of vectors of dimension X with real and complex entries are, respectively, represented by \({\mathbb {R}}^{X}\) and \({\mathbb {C}}^{X}\). The sets of matrices of dimension \(X\times Y\) with real and complex entries are correspondingly described by \({\mathbb {R}}^{X\times Y}\) and \({\mathbb {C}}^{X\times Y}\). The transposition operation of a vector or matrix is represented as \(\left( \cdot \right) ^{\text {T}}\). The \(\ell _p\)-norm, \(p \ge 1\), of the vector \({\mathbf {x}}\) is given by \(\Vert {\mathbf {x}}\Vert _p = \left( |x\left( 0\right) |^p + |x\left( 1\right) |^p + \cdots + |x\left( n-1\right) |^p\right) ^{1/p}\). The expected value of the random variable z is denoted by \(\mathrm {E}\left[ z\right]\). The real and imaginary parts of \(z \in {\mathbb {C}}\) are denoted by \(\Re (z)\) and \(\Im (z)\). The estimate of a scalar x, a vector \({\mathbf {x}}\) or a matrix \({\mathbf {X}}\) is represented by \({\hat{x}}\), \(\hat{{\mathbf {x}}}\) and \(\hat{{\mathbf {X}}}\), respectively. The number of elements in a set \({\mathcal {X}}\) is given by \(\#{\mathcal {X}}\). 
Computational complexity is denoted by the asymptotic operator \({\mathcal {O}}(\cdot )\).

2 System model

Suppose that a multiple-antenna system has \(N_\text {t}\) transmitting antennas and \(N_\text {r}\) receiving antennas, thereby constituting an \(N_\text {t} \times N_\text {r}\) point-to-point baseband and fully digital MIMO system. Bits of data are demultiplexed into \(N_\text {t}\) substreams, which in turn are mapped to sequences of complex symbols. These symbols are transmitted by their respective transmit antennas using an OFDM system, for which it is assumed that the cyclic prefix (CP) length is larger than the maximum delay spread for all \(N_\text {t} N_\text {r}\) channels. Finally, after performing the discrete Fourier transform (DFT), we have the following representation of the received baseband signal at the kth subcarrier:

$$\begin{aligned} \mathbf {{\tilde{r}}}_k = \mathbf {{\tilde{H}}}_k\mathbf {{\tilde{a}}}_k + \mathbf {{\tilde{n}}}_k. \end{aligned}$$

Here, \(\mathbf {{\tilde{H}}}_k \in {\mathbb {C}}^{N_\text {r} \times N_\text {t}}\) is the matrix containing all channel frequency responses for the kth OFDM subcarrier; \(\mathbf {{\tilde{a}}}_k \in {\mathbb {C}}^{N_\text {t}}\) represents the symbol vector transmitted by the \(N_\text {t}\) transmit antennas on the kth subcarrier of the OFDM block and \(\mathbf {{\tilde{n}}}_k \in {\mathbb {C}}^{N_\text {r}}\) is the complex additive white Gaussian noise (AWGN) vector in the frequency domain at the kth subcarrier for the \(N_\text {r}\) receive antennas, with zero mean and covariance matrix given by \(\sigma ^2{\mathbf {I}}_{N_\text {r}}\).

For convenience, henceforth we make use of the real-valued representation [3, 8, 9] for MIMO systems. Therefore, let the received signal (1) be represented by the concatenation of its real and imaginary parts, such that

$$\begin{aligned} {\mathbf {r}}_k = {\mathbf {H}}_k{\mathbf {a}}_k + {\mathbf {n}}_k, \end{aligned}$$


$$\begin{aligned} {\mathbf {r}}_k&= \left[ \Re (\mathbf {{\tilde{r}}}_k)^\text {T} \ \Im (\mathbf {{\tilde{r}}}_k)^\text {T}\right] ^\text {T} \in {\mathbb {R}}^{2N_\text {r}}, \ \forall \ k, \end{aligned}$$
$$\begin{aligned} {\mathbf {H}}_k&= \begin{bmatrix} \Re (\mathbf {{\tilde{H}}}_k) &{} -\Im (\mathbf {{\tilde{H}}}_k) \\ \Im (\mathbf {{\tilde{H}}}_k) &{} \Re (\mathbf {{\tilde{H}}}_k) \\ \end{bmatrix} \in {\mathbb {R}}^{2N_\text {r} \times 2N_\text {t}}, \ \forall \ k, \end{aligned}$$
$$\begin{aligned} {\mathbf {a}}_k&= \left[ \Re (\mathbf {{\tilde{a}}}_k)^\text {T} \ \Im (\mathbf {{\tilde{a}}}_k)^\text {T}\right] ^\text {T} \in {\mathbb {R}}^{2N_\text {t}}, \ \forall \ k, \end{aligned}$$
$$\begin{aligned} {\mathbf {n}}_k&= \left[ \Re (\mathbf {{\tilde{n}}}_k)^\text {T} \ \Im (\mathbf {{\tilde{n}}}_k)^\text {T}\right] ^\text {T} \in {\mathbb {R}}^{2N_\text {r}}. \end{aligned}$$

Moreover, we assume that \(\Re (\mathbf {{\tilde{a}}}_k) \in {\mathbb {S}}^{N_\text {t}}\) and \(\Im (\mathbf {{\tilde{a}}}_k) \in {\mathbb {S}}^{N_\text {t}}\); that is, the real and imaginary parts of \(\mathbf {{\tilde{a}}}_k\) can take on different values from the finite set of coordinates pertaining to the square M-quadrature amplitude modulation (QAM) constellation. Hence, let \({{\mathbb {S}} = \{\pm E_0,\pm 3E_0,\ldots ,\pm (\sqrt{M} - 1)E_0\}}\), for \(E_0 = \sqrt{\frac{3}{2(M - 1)}}\), such that the constellation energy is normalized to 1 (unity).
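The real-valued decomposition in (2)-(6) and the coordinate set \({\mathbb {S}}\) can be illustrated with a minimal NumPy sketch. The function names, the random channel and the noiseless observation are illustrative assumptions and are not part of the system description above.

```python
import numpy as np

def qam_coordinates(M):
    """Per-dimension coordinates of square M-QAM with unit average symbol energy."""
    sqrt_M = int(round(np.sqrt(M)))
    E0 = np.sqrt(3.0 / (2.0 * (M - 1)))
    return E0 * np.arange(-(sqrt_M - 1), sqrt_M, 2)

def to_real(H_c, r_c):
    """Stack real and imaginary parts of the channel and the received vector."""
    H = np.block([[H_c.real, -H_c.imag],
                  [H_c.imag,  H_c.real]])
    r = np.concatenate([r_c.real, r_c.imag])
    return H, r

rng = np.random.default_rng(0)
Nt, Nr, M = 2, 4, 16                      # hypothetical sizes
S = qam_coordinates(M)
H_c = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
a_c = rng.choice(S, Nt) + 1j * rng.choice(S, Nt)
r_c = H_c @ a_c                           # noiseless, for checking only
H, r = to_real(H_c, r_c)
a = np.concatenate([a_c.real, a_c.imag])  # real-valued symbol vector
```

With these definitions, `H @ a` reproduces `r`, and twice the mean of `S**2` equals one, which matches the unit-energy normalization stated for \(E_0\).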

3 Detection in MIMO systems

A classical problem in the MIMO literature is to decide which symbols were transmitted by each antenna when only (2) is available at the receiver. This detection problem can be solved optimally, albeit at great computational effort, by the MLD for MIMO as follows:

$$\begin{aligned} \mathbf {{\hat{a}}}_k = \underset{{\mathbf {a}}_k \in {\mathbb {S}}^{2N_\text {t}}}{\arg \min } \Vert {\mathbf {r}}_k - {\mathbf {H}}_k{\mathbf {a}}_k\Vert _2^2, \end{aligned}$$

for which \(\mathbf {{\hat{a}}}_k \in {\mathbb {S}}^{2N_\text {t}}\) is the estimated vector of symbols’ coordinates.
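To make the cost of (7) concrete, the following sketch performs the exhaustive search directly. It is only an illustration under assumed sizes (4-QAM, \(N_\text {t} = 2\), \(N_\text {r} = 4\), noiseless observation); the function name `ml_detect` is hypothetical.

```python
import itertools
import numpy as np

def ml_detect(r, H, S):
    """Exhaustive search over all len(S)**(2*Nt) candidate coordinate vectors."""
    best, best_cost = None, np.inf
    for cand in itertools.product(S, repeat=H.shape[1]):
        a = np.array(cand)
        cost = np.sum((r - H @ a) ** 2)   # squared Euclidean distance
        if cost < best_cost:
            best, best_cost = a, cost
    return best

rng = np.random.default_rng(1)
S = np.array([-1.0, 1.0]) / np.sqrt(2)    # 4-QAM coordinates (M = 4)
H = rng.standard_normal((8, 4))           # real-valued 2Nr x 2Nt channel
a_true = rng.choice(S, 4)
r = H @ a_true                            # noiseless observation
a_hat = ml_detect(r, H, S)
```

The loop visits \(\sqrt{M}^{\,2N_\text {t}}\) candidates, so the search grows exponentially with the number of transmit antennas, which is the effort the alternative detectors try to avoid.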

It is known that the prohibitive complexity of the MLD has motivated research on several alternative detectors for MIMO over the last decades [3]. The PDA detector is one of these alternatives; it presents significantly lower complexity than the MLD, with an affordable bit error rate (BER) performance loss under specific conditions, as will be detailed in Sects. 3.5 and 4. In Sect. 3.1, the PDA detector's algorithm, first proposed in [9], is briefly revisited, followed by our proposed DU-PDA, for which the PDA is the underlying algorithm.

3.1 Probability data association detector

Before the detection task is carried out by the PDA detector, the received signal, \({\mathbf {r}}_k\), is preprocessed or equalized using the ZF principle as follows [2, 3, 9]

$$\begin{aligned} {{\textbf {z}}}_k = {{\textbf {H}}}^{\dagger }_{k} {{\textbf {r}}}_k = {{\textbf {a}}}_k + {{\textbf {v}}}_k, \end{aligned}$$

wherein \({\mathbf {H}}^{\dagger }_{k} = ({\mathbf {H}}^T_k {\mathbf {H}}_k)^{-1} {\mathbf {H}}^T_k\) is the left Moore–Penrose pseudoinverse and \({\mathbf {v}}_k = {\mathbf {H}}^{\dagger }_{k} {\mathbf {n}}_k\) is the enhanced AWGN. Let us rewrite (8), such that

$$\begin{aligned} {\mathbf {z}}_k = {\mathbf {e}}_i a_k (i) + \underbrace{\sum _{j \ne i}{{\mathbf {e}}_j a_k (j)} + {\mathbf {v}}_k}_{{\mathcal {V}}_i}, \ \forall \ i,j \in \{0,1,\ldots ,2N_\text {t} - 1\}, \end{aligned}$$

where \({\mathbf {e}}_i\) is the vector with 1 (one) at its ith entry and 0 (zero) otherwise, and \({\mathcal {V}}_i\) is a multivariate random variable (RV) that can be seen as the effective interference-plus-noise contaminating \(a_k (i)\) [9]. Therefore, the crux lies in detecting the symbol coordinate \(a_k (i)\), while considering that all other \(j \ne i\) transmitted symbols are interference added to the noise term, which is described by \({\mathcal {V}}_i\).

Therefore, the PDA detector associates, for each \(a_k (i)\), a probability vector \({\mathbf {p}}_i \in {\mathbb {R}}^{\sqrt{M}}\), which is given by the evaluation of \({P_m(a_k (i) = q\left( m\right) \ | \ {\mathbf {z}}_k,\{{\mathbf {p}}_j\}_{\forall j \ne i})}\); \({q\left( m\right) \in {\mathbb {S}}}\) being a coordinate of the M-QAM constellation and \({m \in \{0,1,\ldots ,\sqrt{M} - 1\}}\). It is important to remark that the PDA detector uses all \(\{{\mathbf {p}}_j\}_{\forall j \ne i}\) associated with interfering symbols already detected, thanks to the incorporation of a strategy similar to that of successive interference cancellation (SIC) detectors. This significantly reduces the computational complexity for calculating \({\mathbf {p}}_i\), since otherwise \({P_m(a_k (i) = q\left( m\right) \ | \ {\mathbf {z}}_k)}\) would have to be evaluated. The problem here is the requirement of computing multiple integrals for each received symbol, rendering this evaluation prohibitive in practice. Dropping the subscript \((\cdot )_k\) in order to simplify the notation and assuming that \({\mathcal {V}}_i\) has a Gaussian distribution [9, 11], then the likelihood function of \({{\mathbf {z}} \ | \ a(i) = q\left( m\right) }\) can be defined as

$$\begin{aligned} P_m({\mathbf {z}} \ | \ a(i) = q\left( m\right) ) \propto \exp \left( \alpha _m \left( i\right) \right) , \end{aligned}$$

for which

$$\begin{aligned} \alpha _m \left( i\right) = \left( {\mathbf {z}} - \varvec{\mu }_i - 0.5{\mathbf {e}}_i q\left( m\right) \right) ^\text {T} \varvec{\Omega }_i^{-1} {\mathbf {e}}_i q\left( m\right) , \end{aligned}$$

wherein \(\mathrm {E}\left[ {\mathcal {V}}_i \right] = \varvec{\mu }_i\) and \(\mathrm {COV}\left[ {\mathcal {V}}_i \right] = \varvec{\Omega }_i\) are given by

$$\begin{aligned} \varvec{\mu }_i&= \sum _{j \ne i}{{\mathbf {e}}_j\left( {\mathbf {q}}^\text {T} {\mathbf {p}}_j \right) }, \end{aligned}$$
$$\begin{aligned} \varvec{\Omega }_i&= \sum _{j \ne i}{{\mathbf {e}}_j{\mathbf {e}}_j^\text {T}\left( \left( {\mathbf {q}}^2\right) ^\text {T} {\mathbf {p}}_j - \varvec{\mu }_j^2 \right) } + 0.5\sigma ^2 {\mathbf {G}}^{-1}, \end{aligned}$$

where \({\mathbf {q}} = [q\left( 0\right) \ q\left( 1\right) \ \ldots \ q(\sqrt{M} - 1)]^\text {T}\) and \({{\mathbf {G}}^{-1} = ({\mathbf {H}}^\text {T} {\mathbf {H}})^{-1}}\) is the inverse of the Gram matrix [2] that accounts for the noise enhancement caused by the ZF. To evaluate the posterior probabilities associated with each symbol, we compute

$$\begin{aligned} P_m(a(i) = q\left( m\right) \ | \ {\mathbf {z}},\{{\mathbf {p}}_j\}_{\forall j \ne i}) \approx \frac{P_m({\mathbf {z}} \ | \ a(i) = q\left( m\right) )}{\sum \limits _{m=0}^{\sqrt{M} - 1}{P_m({\mathbf {z}} \ | \ a(i) = q\left( m\right) )}}, \end{aligned}$$

which can be seen as an approximate form of Bayes' theorem [11]. Then, substituting (10) into (14) yields

$$\begin{aligned} p_i \left( m\right) = \frac{\exp \left( \alpha _m \left( i\right) \right) }{\sum \limits _{m=0}^{\sqrt{M} - 1}{\exp \left( \alpha _m \left( i\right) \right) }}. \end{aligned}$$

Finally, the PDA detector procedure is given in Algorithm 1.

[Algorithm 1: PDA detector]

Note that the optimal detection sequence [9] used in Algorithm 1 can be found with the aid of the following operation:

$$\begin{aligned} \rho \left( i\right) = \frac{1}{{\mathbf {f}}_i^\text {T} {\mathbf {H}} {\mathbf {f}}_i}\max \left\{ 0,{\mathbf {f}}_i^\text {T} {\mathbf {h}}_i - \sum \limits _{j \ne i}{|{\mathbf {f}}_i^\text {T} {\mathbf {h}}_j |}\right\} ^2, \end{aligned}$$

where \({\mathbf {f}}_i^\text {T}\) represents the ith row of \({\mathbf {F}} = {\mathbf {H}}^{\dagger }\) and \({\mathbf {h}}_j\) denotes the jth column of \({\mathbf {H}}\). Note that larger magnitudes for \(\rho \left( i\right)\) mean that the ith antenna suffers less IAI [3]. In other words, the off-diagonal entries of the ith row of \({\mathbf {F}}{\mathbf {H}}\) have, combined, smaller magnitudes than its ith diagonal entry. It is easy to show that the optimal sequence is defined by sorting \({\varvec{\rho } = [\rho \left( 0\right) \ \rho \left( 1\right) \ \ldots \ \rho (2N_\text {t} - 1)]^\text {T}}\) in descending order, denoted as \({\{k_i \in \{0,1,\ldots ,2N_\text {t} - 1\} \ | \ \rho \left( k_0\right)> \rho \left( k_1\right)> \ldots > \rho \left( k_{2N_\text {t} - 1}\right) \}}\).
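The recursion formed by (8) and (11)-(15) can be sketched as follows. Because Algorithm 1 is not reproduced in this text, the loop structure, the uniform initialization, the number of outer iterations `n_iter` and the use of the natural detection order instead of the sorted sequence from (16) are all assumptions made for illustration.

```python
import numpy as np

def pda_detect(r, H, q, sigma2, n_iter=3):
    """Sketch of the PDA recursion with a fixed, natural detection order."""
    n = H.shape[1]                              # 2*Nt real-valued coordinates
    G_inv = np.linalg.inv(H.T @ H)              # inverse Gram matrix
    z = G_inv @ H.T @ r                         # ZF preprocessing, (8)
    P = np.full((n, q.size), 1.0 / q.size)      # uniform initial probabilities
    for _ in range(n_iter):
        for i in range(n):
            mean = P @ q                        # per-coordinate means, (12)
            var = P @ q**2 - mean**2            # per-coordinate variances, (13)
            mean[i] = 0.0                       # coordinate i is excluded (j != i)
            var[i] = 0.0
            Omega = np.diag(var) + 0.5 * sigma2 * G_inv
            e_i = np.zeros(n)
            e_i[i] = 1.0
            w = np.linalg.solve(Omega, e_i)     # Omega^{-1} e_i
            alpha = np.array([(z - mean - 0.5 * e_i * qm) @ w * qm for qm in q])  # (11)
            alpha -= alpha.max()                # stabilizes exp without changing (15)
            P[i] = np.exp(alpha) / np.exp(alpha).sum()
    return q[P.argmax(axis=1)], P

rng = np.random.default_rng(4)
q = np.array([-1.0, 1.0]) / np.sqrt(2)          # 4-QAM coordinates
H = rng.standard_normal((16, 4))                # 2Nr x 2Nt with Nt = 2, Nr = 8
a_true = rng.choice(q, 4)
sigma2 = 0.01
r = H @ a_true + np.sqrt(0.5 * sigma2) * rng.standard_normal(16)
a_hat, P = pda_detect(r, H, q, sigma2)
```

Each inner step solves a linear system with the full matrix \(\varvec{\Omega }_i\), which is the operation that dominates the cost of the PDA detector.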

3.2 Deep unfolding

Prior to presenting our proposed DU-PDA detector, a brief description of NNs and deep unfolding is provided in this section.

In general, the NN architecture has shown great potential for detecting signals, but its design and parameterization, among other problems, impose limitations [4]. Alternatively, this architecture can be adapted such that iterations of a given algorithm are unfolded on its layers [5, 6, 12], hence the term “unfolding.” It is also commonly assumed that the NN employs several layers and, consequently, the term “deep” is added.

More specifically, consider an algorithm with an input vector denoted by \({\mathbf {x}} \in {\mathbb {R}}^{N}\), for which its output is given by \({\mathbf {y}} \in {\mathbb {R}}^{S}\), then this algorithm can be expressed by [12]

$$\begin{aligned} y\left( s\right) = g \left( {\mathbf {x}},\varvec{\psi },\varvec{\Theta }\right) , \ \forall \ s \in \{0,1,\ldots ,S - 1\}, \end{aligned}$$

wherein \(\varvec{\Theta }\) is the set of all parameters used by the algorithm, \(g(\cdot )\) represents a mapping function, usually nonlinear, and \(\varvec{\psi }\) is iteratively updated as follows

$$\begin{aligned} \psi _\ell \left( s\right) = f \left( {\mathbf {x}},\psi _{\ell - 1} \left( s\right) ,\varvec{\Theta }\right) , \end{aligned}$$

where the \(\ell\)th iteration also involves an operation with a mapping function \(f(\cdot )\) and \(\varvec{\psi }_0\) denotes the initial value.
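The recursion in (17) and (18) can be sketched generically as a loop in which each layer applies the same update with its own parameter. The helper `unfolded_forward`, the choice of gradient steps on a least-squares cost as the underlying algorithm, and the fixed step size are illustrative assumptions; in a trained network the per-layer parameters would be learned.

```python
import numpy as np

def unfolded_forward(x, psi0, thetas, f, g):
    """Apply one update f per layer (18), then the output mapping g (17)."""
    psi = psi0
    for theta in thetas:                  # layer l uses its own parameter theta_l
        psi = f(x, psi, theta)
    return g(x, psi, thetas)

# Illustrative underlying algorithm: gradient steps on ||r - H a||^2.
rng = np.random.default_rng(2)
H = rng.standard_normal((8, 4)) / np.sqrt(8)
a_true = rng.standard_normal(4)
r = H @ a_true

def f(x, psi, step):
    H_, r_ = x
    return psi + step * H_.T @ (r_ - H_ @ psi)

def g(x, psi, thetas):
    return psi                            # identity output mapping

step = 1.0 / np.linalg.norm(H, 2) ** 2    # fixed here; one learnable value per layer in practice
L = 20
y = unfolded_forward((H, r), np.zeros(4), [step] * L, f, g)
```

The number of layers `L` fixes the depth of the network in advance, which is why the architecture needs few hyperparameter choices beyond the underlying algorithm itself.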

Therefore, in the deep unfolded context, \(\varvec{\psi }_\ell\) can be understood as the input-output relationship at the \(\ell\)th layer of an NN architecture, as illustrated in Fig. 1.

Fig. 1. Deep unfolding architecture. It is based on an underlying algorithm with an input vector given by \({\mathbf {x}}\) and an output determined by \({\mathbf {y}}\). Each hidden layer unfolds the \(\ell\)th iteration of this algorithm and its input–output relationship is expressed by (18), whereas the output layer is represented by (17).

Note that the dimensions of the learnable parameters \(\varvec{\Theta }\) are defined according to the underlying algorithm on which (17), (18) and the architecture depicted in Fig. 1 are based. This includes weights and biases, for example, which are optimized by the NN training algorithm [4, 12]. In other words, the number of layers and neurons is fixed, thereby considerably simplifying the process of defining what is commonly known as the NN hyperparameters.

Moreover, improvements are also obtained by incorporating the aforementioned learnable parameters directly into the iterative algorithm. That way, the learning capabilities of NNs can be applied to optimize algorithms such that their global performance, computational complexity, or even both, are improved. In Sect. 3.3, the PDA detector, reviewed in Sect. 3.1, is implemented using the deep unfolded architecture for NNs, unveiling our proposed DU-PDA detector for MIMO systems.

3.3 Proposed deep unfolded PDA detector

Aiming to take advantage of the iterative algorithm of the PDA detector, we propose the DU-PDA detector. Firstly, in the DU-PDA detector, the received signal, \({\mathbf {r}}\), is preprocessed at the \(\ell\)th layer by the following operation [8], [13, §IV-B, p. 1706]:

$$\begin{aligned} {\mathbf {z}}_\ell = \mathbf {{\hat{a}}}_\ell + w_\ell {\mathbf {H}}^\text {T}\left( {\mathbf {r}} - {\mathbf {H}}\mathbf {{\hat{a}}}_\ell \right) , \ \forall \ \ell \in \{0,1,\ldots ,L - 1\}, \end{aligned}$$

where \(\mathbf {{\hat{a}}}_\ell \in {\mathbb {R}}^{2N_\text {t}}\) is the estimated transmitted symbol vector and the scalar \(w_\ell \in {\mathbb {R}}\) represents a learnable parameter. Note that this preprocessing principle differs from the ZF, which is used by the PDA detector, as defined in (8). In contrast, the proposed DU-PDA employs a preprocessing based on the approximate message passing (AMP) algorithm [14], which also bears similarities to the Richardson method [2, §IV-6, p. 9]. In this way, \(\mathbf {{\hat{a}}}_\ell\) is updated iteratively until it converges to an acceptable approximation of the transmitted symbol vector. Interestingly, when we have \(\mathbf {{\hat{a}}}_\ell \rightarrow {\mathbf {a}}\), then the so-called residual term \(\left( {\mathbf {r}} - {\mathbf {H}}\mathbf {{\hat{a}}}_\ell \right) \rightarrow {\mathbf {n}}\), which gives us a result in (19) similar to (8).
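As a small illustration of (19), the sketch below applies the layer preprocessing to two extreme estimates. The function name `preprocess`, the random channel and the value chosen for the learnable scalar are assumptions for this example only.

```python
import numpy as np

def preprocess(r, H, a_hat, w):
    """Layer preprocessing (19): current estimate plus a scaled matched-filtered residual."""
    return a_hat + w * (H.T @ (r - H @ a_hat))

rng = np.random.default_rng(5)
H = rng.standard_normal((8, 4))
a = rng.choice(np.array([-1.0, 1.0]) / np.sqrt(2), 4)
r = H @ a                                         # noiseless observation
w = 0.3                                           # stands in for a learned w_l

z_exact = preprocess(r, H, a, w)                  # estimate already equals a
z_start = preprocess(r, H, np.zeros(4), w)        # all-zero initial estimate
```

When the estimate equals the transmitted vector and there is no noise, the residual vanishes and `z_exact` equals `a` for any `w`; with an all-zero estimate the operation reduces to a scaled matched filter, `w * H.T @ r`. No matrix inversion is involved in either case.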

The preprocessed signal of (19) is then fed into the following operation (see Footnote 1):

$$\begin{aligned} \psi _{\ell ^*} \left( m\right)&= \text {softm}\left( \left( {\mathbf {z}}_\ell - \varvec{\mu }_{\ell ^*} - 0.5{\mathbf {e}}_{\ell ^*} q\left( m\right) \right) ^\text {T} \varvec{\Omega }_{\ell ^*}^{-1} {\mathbf {e}}_{\ell ^*} q\left( m\right) \right) \nonumber \\&\forall \ m \in \{0,1,\ldots ,\sqrt{M} - 1\}, \end{aligned}$$


$$\begin{aligned} \text {softm}\left( x_\ell \left( m\right) \right) = \frac{e^{x_\ell \left( m\right) }}{\sum _{m=0}^{\sqrt{M} - 1} e^{x_\ell \left( m\right) }}. \end{aligned}$$

Note that the nonlinear function \(\text {softm}(\cdot )\) is applied at each layer. This makes (20) identical to (15) except that it is unfolded on successive layers and that \(\varvec{\psi }_j = {\mathbf {p}}_j\). Notably, this also distinguishes the proposed DU-PDA from other architectures [5, 8, 13] that instead use optimal denoisers at each layer, which do not account for interfering symbols as the underlying PDA algorithm of the DU-PDA does. Moreover, since the preprocessing is modified, it is necessary to redefine the covariance matrix, \(\varvec{\Omega }_{\ell ^*}\), as follows [15, §III-D, p. 2023], [8]

$$\begin{aligned} \varvec{\Omega }_{\ell ^*} = \sum _{j \ne {\ell ^*}}{{\mathbf {e}}_j{\mathbf {e}}_j^\text {T}\left( \left( {\mathbf {q}}^2\right) ^\text {T} \varvec{\psi }_j - \varvec{\mu }_j^2 \right) } + {\mathbf {e}}_{\ell ^*}{\mathbf {e}}_{\ell ^*}^\text {T}\mathrm {COV}\left[ {\mathbf {z}}_\ell - {\mathbf {a}}\right] , \end{aligned}$$


$$\begin{aligned} \mathrm {COV}\left[ {\mathbf {z}}_\ell - {\mathbf {a}}\right] = \frac{[\epsilon _\ell ]_+ \Vert {\mathbf {I}}_{2N_\text {t}} - w_\ell {\mathbf {H}}^\text {T} {\mathbf {H}}\Vert _2^2 + 0.5\sigma ^2 \Vert w_\ell {\mathbf {H}}^\text {T}\Vert _2^2}{2N_\text {t}}, \end{aligned}$$

wherein \([x]_+ = \max {(0,x)}\) and for which

$$\begin{aligned} \epsilon _\ell = \frac{\Vert {\mathbf {r}} - {\mathbf {H}}\mathbf {{\hat{a}}}_\ell \Vert _2^2 - N_\text {r} \sigma ^2}{\Vert {\mathbf {H}}\Vert _2^2}. \end{aligned}$$

Equation (23) can be understood as the empirical mean-squared error (MSE) estimator of the covariance matrix originated from the residual and noise terms of (19). More importantly, note that \(\varvec{\Omega }_{\ell ^*}\) is now a diagonal matrix. This means that computing \(\varvec{\Omega }_{\ell ^*}^{-1}\) is not as costly as its counterpart in (11), that is, in the PDA detector. More details about such implications are given in Sect. 3.4.
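The estimator in (23) and (24) can be sketched in a few lines. The sketch assumes that the matrix norms \(\Vert \cdot \Vert _2^2\) denote squared Frobenius norms, which is an interpretation and is not stated explicitly above; the function name `error_variance` is likewise hypothetical.

```python
import numpy as np

def error_variance(r, H, a_hat, w, sigma2):
    """Scalar estimate of COV[z - a] following (23)-(24), with Frobenius norms assumed."""
    n_rx2, n_tx2 = H.shape                       # 2*Nr and 2*Nt
    Nr = n_rx2 // 2
    eps = (np.sum((r - H @ a_hat) ** 2) - Nr * sigma2) / np.sum(H ** 2)   # (24)
    eps_plus = max(0.0, eps)                     # [x]_+ = max(0, x)
    B = np.eye(n_tx2) - w * (H.T @ H)
    return (eps_plus * np.sum(B ** 2)
            + 0.5 * sigma2 * np.sum((w * H.T) ** 2)) / n_tx2              # (23)

rng = np.random.default_rng(6)
H, _ = np.linalg.qr(rng.standard_normal((8, 4)))  # orthonormal columns: H^T H = I
a = rng.choice(np.array([-1.0, 1.0]) / np.sqrt(2), 4)
sigma2 = 0.1
r = H @ a + np.sqrt(0.5 * sigma2) * rng.standard_normal(8)
cov = error_variance(r, H, np.zeros(4), 1.0, sigma2)
```

For a channel with orthonormal columns and `w = 1`, the first term of (23) vanishes and the estimate equals `0.5 * sigma2`. Only matrix products and norms are needed, so no inversion of a full matrix appears in this step.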

Therefore, by considering developments presented in this subsection and the general model described in Sect. 3.2, we have

$$\begin{aligned} \psi _{{\ell ^*} + 1} \left( m\right)&= \text {softm}\left( {\mathbf {z}}_\ell ,\psi _{\ell ^*} \left( m\right) ,\{w_\ell , \varvec{\mu }_{\ell ^*}, \varvec{\Omega }_{\ell ^*}\}\right) , \end{aligned}$$

which is similar to what is evaluated in (15) with the addition, however, of a learnable parameter and a different preprocessing of the received signal. Note also that \(\varvec{\psi }_L = {\mathbf {y}}\), meaning that the last layer output is also given by (25). Furthermore, let

$$\begin{aligned} \mathbf {{\hat{a}}}_{\ell + 1} = \sum _{j \ne \ell }{{\mathbf {e}}_j z_\ell \left( j\right) + {\mathbf {e}}_\ell \left( {\mathbf {q}}^\text {T} \varvec{\psi }_{\ell ^*}\right) }, \end{aligned}$$

such that the convergence of (19) might be improved, given that the soft combining of symbols’ coordinates and their estimated associated probabilities are fed forward to the next layer.
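Because \(\varvec{\Omega }_{\ell ^*}\) is diagonal, the product \(\varvec{\Omega }_{\ell ^*}^{-1}{\mathbf {e}}_{\ell ^*}\) in (20) reduces to a single division, so one layer can be sketched as below. The function names, the index `i` standing in for \(\ell ^*\) and the numerical values are assumptions for illustration; Algorithm 2 itself is not reproduced in this text.

```python
import numpy as np

def softm(x):
    """Softmax over the sqrt(M) candidate coordinates, as in (21)."""
    e = np.exp(x - np.max(x))            # shift for numerical stability; result unchanged
    return e / e.sum()

def layer_posterior(z, i, q, mu, omega_diag):
    """Evaluate (20) for coordinate i when Omega is diagonal."""
    alpha = q * (z[i] - mu[i] - 0.5 * q) / omega_diag[i]
    return softm(alpha)

def soft_update(z, i, q, psi):
    """Update (26): keep z elsewhere, replace entry i by the soft estimate q^T psi."""
    a_next = z.copy()
    a_next[i] = q @ psi
    return a_next

q = np.array([-1.0, 1.0]) / np.sqrt(2)   # 4-QAM coordinates
z = np.array([0.7, -0.6])                # hypothetical preprocessed signal
mu = np.zeros(2)                         # entry i of mu is zero by construction (sum over j != i)
omega_diag = np.array([0.05, 0.5])       # hypothetical diagonal of Omega
psi = layer_posterior(z, 0, q, mu, omega_diag)
a_next = soft_update(z, 0, q, psi)
```

In this example the posterior concentrates on the positive coordinate, so the first entry of `a_next` moves from 0.7 to approximately \(1/\sqrt{2}\) while the second entry is passed on unchanged.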

In Algorithm 2, we detail the general procedure carried out by the proposed DU-PDA detector.

[Algorithm 2: DU-PDA detector]

The ground truth used for training the NN is defined by \({{\mathbf {I}}_{\ell ^*} = [I\left( 0\right) \ I\left( 1\right) \ \ldots \ I(\sqrt{M} - 1)]^\text {T}}\), such that \({\mathcal {I}} = \{{\mathbf {I}}_{\ell ^*}\}_{\forall \ell ^*}\). It indicates the known constellation coordinates that are transmitted for the training procedure; thus, \(I_{\ell ^*}\left( m\right) \in \{0, 1\} \ \forall \ m\). Observe also that the PDA detector outputs approximate posteriors, as shown in (15), which is leveraged by our proposed DU-PDA detector in Algorithm 2 when employing the categorical cross-entropy loss function:

$$\begin{aligned} {\mathcal {L}}\left( {\mathcal {I}},\varvec{\psi }\right) = \frac{-1}{\sqrt{M}}\sum \limits _{\ell ^*}{{\mathbf {I}}_{\ell ^*} \log {\left( \varvec{\psi }_{\ell ^*} \right) } + \left( 1 - {\mathbf {I}}_{\ell ^*} \right) \log {\left( 1 - \varvec{\psi }_{\ell ^*} \right) }}. \end{aligned}$$

Bear in mind that the loss is calculated considering the output of all L unfolded layers and not only the last one. Also, note that the use of (27) contrasts with the popular choice of the MSE loss function [5]. Additionally, it is a well-known fact that the cross-entropy loss function is more appropriate for classification tasks.
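A direct NumPy transcription of (27) is sketched below. The array layout (one row per unfolded output, one column per constellation coordinate) and the clipping constant `eps` are implementation assumptions; an actual training setup would typically rely on the loss provided by the NN library in use.

```python
import numpy as np

def cross_entropy_loss(I, psi, eps=1e-12):
    """Loss (27): I and psi have shape (num_outputs, sqrt(M)); rows of I are one-hot."""
    psi = np.clip(psi, eps, 1.0 - eps)   # avoid log(0); eps is an implementation assumption
    terms = I * np.log(psi) + (1.0 - I) * np.log(1.0 - psi)
    return -terms.sum() / I.shape[1]     # normalization by sqrt(M)

I = np.array([[1.0, 0.0],
              [0.0, 1.0]])               # ground-truth indicators for two outputs
confident = np.array([[0.99, 0.01],
                      [0.02, 0.98]])
uniform = np.full((2, 2), 0.5)

loss_confident = cross_entropy_loss(I, confident)
loss_uniform = cross_entropy_loss(I, uniform)
```

A confident and correct output yields a loss close to zero, whereas the uniform output gives \(2\ln 2\) in this two-output example, so the loss rewards probability mass placed on the transmitted coordinate.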

3.4 Simplified DU-PDA

The model of the DU-PDA presented in the previous subsection can be simplified even further if some assumptions are made. Therefore, a new variation of the proposed DU-PDA detector, namely the simplified DU-PDA detector, is presented in this subsection. For this detector, the calculations performed in (23) are simplified and the scalar \(0.5\sigma ^2\) is applied directly in (22). The reasoning behind this approach lies in the asymptotic case, that is, when \(N_\text {t} \rightarrow \infty\) and \(N_\text {r} \rightarrow \infty\). For this case, the first term of (23) vanishes, since (see Footnote 2)

$$\begin{aligned} {\mathbf {H}}^\text {T} {\mathbf {H}} \rightarrow {\mathbf {I}}_{2N_\text {t}}, \end{aligned}$$

and similarly for the second term we have

$$\begin{aligned} \Vert w_\ell {\mathbf {H}}^\text {T}\Vert _2^2 \rightarrow 2N_\text {t}, \end{aligned}$$

which yields

$$\begin{aligned} \mathrm {COV}\left[ {\mathbf {z}}_\ell - {\mathbf {a}}\right]&\rightarrow \frac{[\epsilon _\ell ]_+ \Vert {\mathbf {I}}_{2N_\text {t}} - {\mathbf {I}}_{2N_\text {t}}\Vert _2^2 + N_\text {t} \sigma ^2}{2N_\text {t}} \nonumber \\&\rightarrow 0.5\sigma ^2, \end{aligned}$$

wherein, for the sake of simplicity, the learnable parameter \(w_\ell\) is omitted. This is analogous to the channel hardening effect present in massive MIMO systems [2, 3], where values for \(N_\text {t}\) and \(N_\text {r}\) are large. Although we demonstrate via computational simulations in Sect. 4 that the simplified DU-PDA only presents marginal losses in performance, it is still unknown if other similar architectures proposed in the literature [5, 6, 8, 13] are robust enough to allow such simplifications.
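The limits in (28) and (29) can be checked numerically under an assumed channel normalization. The sketch below draws i.i.d. complex Gaussian entries with variance \(1/N_\text {r}\), which is an assumption and may differ from the normalization used in the paper, and it grows only \(N_\text {r}\) while keeping \(N_\text {t}\) fixed, which is a simpler regime than the joint limit discussed above.

```python
import numpy as np

def real_channel(Nr, Nt, rng):
    """Real-valued 2Nr x 2Nt channel from i.i.d. CN(0, 1/Nr) entries (assumed normalization)."""
    A = rng.standard_normal((Nr, Nt)) / np.sqrt(2 * Nr)   # real part
    B = rng.standard_normal((Nr, Nt)) / np.sqrt(2 * Nr)   # imaginary part
    return np.block([[A, -B], [B, A]])

rng = np.random.default_rng(3)
Nt = 4
for Nr in (8, 64, 2048):
    H = real_channel(Nr, Nt, rng)
    dev = np.max(np.abs(H.T @ H - np.eye(2 * Nt)))        # distance from identity, cf. (28)
    frob = np.sum(H ** 2)                                 # squared Frobenius norm, cf. (29)
    print(Nr, dev, frob)
```

As \(N_\text {r}\) grows, the largest deviation of \({\mathbf {H}}^\text {T}{\mathbf {H}}\) from the identity shrinks and the squared Frobenius norm approaches \(2N_\text {t}\), which is the behavior the simplified DU-PDA relies on.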

3.5 Computational complexity

According to the guidelines presented in [4, §IV-C, p. 122404], the global computational complexity of the PDA detector is approximately given by

$$\begin{aligned} {\mathcal {O}}(16N_\text {t}^4 + 8\sqrt{M}N_\text {t}^3 + 8N_\text {t}^2(N_\text {r} + \sqrt{M}) + 4N_\text {t} N_\text {r}). \end{aligned}$$

However, if we let \(N_\text {r} \gg \sqrt{M}\) and simplify constants, then it can be written more compactly as

$$\begin{aligned} {\mathcal {O}}(N_\text {t}^4 + \sqrt{M}N_\text {t}^3 + N_\text {t}^2N_\text {r} + N_\text {t} N_\text {r}). \end{aligned}$$

Note that \({\mathcal {O}}(8N_\text {t}^3 + 16N_\text {t}^2 N_\text {r} + 4N_\text {t} N_\text {r})\) refers to the local cost of (8), where the inverse of \({\mathbf {G}}\) costs \({\mathcal {O}}(8N_\text {t}^3)\) (see Footnote 3), and \({{\mathcal {O}}(16N_\text {t}^4 + 8\sqrt{M}(N_\text {t}^3 + N_\text {t}^2))}\) is the complexity due to computing (11), for which \(\varvec{\Omega }_i^{-1}\) costs \({\mathcal {O}}(8N_\text {t}^3)\) [9] per outer iteration in Algorithm 1.

Moreover, the DU-PDA detector has an approximate global complexity of

$$\begin{aligned} {\mathcal {O}}(4LN_\text {t}^2 + 4LN_\text {t}(4N_\text {r} + \sqrt{M}) + LN_\text {r}). \end{aligned}$$

Simplifying all constants again and assuming that \(N_\text {r} \gg \sqrt{M}\), (33) reduces to

$$\begin{aligned} {\mathcal {O}}(LN_\text {t}^2 + LN_\text {t}N_\text {r} + LN_\text {r}). \end{aligned}$$

The global complexity is composed mainly of the local cost of (19), given by \({\mathcal {O}}(8N_\text {t} N_\text {r})\) per layer, and the local cost of (23), expressed by \({\mathcal {O}}(4N_\text {t}^2 + 8N_\text {t} N_\text {r} + N_\text {r})\) for each layer (see Footnote 4). The cost of the NN training stage is not taken into account when calculating the computational complexity of the detection stage, since the training stage is assumed to be computed offline, as discussed in [4]. Nevertheless, in general, the backpropagation algorithm used for training NNs has a complexity that scales linearly with the number of training samples, \(N_\text {TR}\), and training iterations, say \(N_\text {TI}\). More importantly, it scales exponentially with the number of layers L because of the chain rule derivatives calculated during backpropagation. In principle, this is a high complexity when compared with the detection or forward-pass complexity, but once trained, the NN-based detector may serve multiple users during a prescribed timeline [16]. This means that the training complexity cost is distributed over time and users, whereas the detection complexity is fixed for each user and transmission cycle. Hence, since training is not performed in the detection cycle, its complexity is not considered, enabling a fair comparison with other detectors.

Furthermore, recall that the simplified calculation in (30) reduces the global complexity of the proposed DU-PDA detector even further. More specifically, the global complexity of the simplified DU-PDA detector is given approximately by \({\mathcal {O}}(LN_\text {t}N_\text {r})\), meaning that the cost is reduced by one order of magnitude when compared to the DU-PDA detector.

From the computational complexity associated with each detector, it is possible to conclude that the PDA is more complex than the proposed DU-PDA. More specifically, this cost difference is due to the higher-order term \(N_\text {t}^4\) included in the PDA global complexity. This is expected because of the matrix inversions performed by the PDA detector, which are not necessary for either the DU-PDA or the simplified DU-PDA detector. Also, notice that for both of these detectors, the total number of layers L might significantly increase the global complexity. It is demonstrated in Sect. 4, however, that this number is a multiple of \(N_\text {t}\), thus still implying a lower global complexity for the DU-PDA when compared with the PDA. In fact, the simplified DU-PDA complexity becomes even lower than that of the ZF in the aforementioned case. Additionally, an optimal detection sequence, such as (16), is not a general requirement for the DU-PDA, which further reduces its global complexity in relation to the PDA.
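As a short worked step, and assuming the choice \(L = 4N_\text {t}\) adopted in Sect. 4.1, substituting L into the simplified DU-PDA expressions above and dropping constants gives

$$\begin{aligned} {\mathcal {O}}(LN_\text {t}^2 + LN_\text {t}N_\text {r} + LN_\text {r})\big |_{L = 4N_\text {t}}&= {\mathcal {O}}(N_\text {t}^3 + N_\text {t}^2N_\text {r} + N_\text {t}N_\text {r}), \\ {\mathcal {O}}(LN_\text {t}N_\text {r})\big |_{L = 4N_\text {t}}&= {\mathcal {O}}(N_\text {t}^2N_\text {r}). \end{aligned}$$

Neither expression contains a term of total degree higher than three in the antenna counts, which is consistent with the statement that the \(N_\text {t}^4\) term of the PDA dominates the comparison.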

Although the above sheds light on how the detectors' computational complexities compare with each other, these are only asymptotic predictions of complexity. A detailed evaluation of the system end-to-end latency [17, 18], for example, is out of the scope of this work. However, for the typical \(4 \times 8\) MIMO considered in Sect. 4, it can be verified that the symbol detection (see Line 15 of Algorithm 2) of the DU-PDA takes approximately 50 milliseconds on average with negligible variance. Note that this value depends heavily on the implementation of the proposed detector, which in this work is based on the TensorFlow library [19] and is not yet optimized for a full-fledged hardware implementation. Indeed, implementations using a hardware description language (HDL) can provide a more reliable analysis of the end-to-end latency of the proposed detector.

For convenience, Table 1 summarizes the global computational complexity for all detectors of interest. Observe that the AMP detector and the sphere detector (SD) are also included for the sake of completeness. For the AMP, \(N_\text {I}\) refers to the number of iterations or updates executed, whereas for the SD we considered the fixed-complexity SD [3, §VIII-D, p. 20], since its performance is near-optimum. To conclude, note also in Table 1 how the complexity of all detectors increases polynomially with the number of transmitting antennas \(N_\text {t}\). The exceptions, however, are the MLD and the SD, whose complexity increases exponentially with \(N_\text {t}\) and \(\sqrt{N_\text {t}}\), respectively, as expected.

Table 1 Global computational complexity of detectors studied in this work. Note that they are given in the most compact form and are also ranked in an ascending order, that is, from less to more costly as lines progress to the bottom of the table

4 Numerical results and discussion

Before presenting numerical results on the detectors' performance, we list important system parameters in the following subsection.

4.1 System parameters

In this work, the following system parameters are adopted: (i) Before transmission, a frame of \(n_b\) data bits is encoded using the polar encoder [20] with a code rate of \({R < 1}\). Thus, \(n_b/R\) bits now represent the coded frame that is effectively transmitted; (ii) entries of the channel frequency response matrix, \({\mathbf {H}}\), are drawn from a complex Gaussian random process for all k subcarriers at each transmission of an OFDM frame and are normalized by \(1/\sqrt{N_\text {r}}\). Hence, we have \(H_{i,j} \sim {{\mathcal {C}}}{{\mathcal {N}}}\left( 0,1/N_\text {r}\right) , \ \forall \ i,j\) and, consequently, the system signal-to-noise ratio (SNR) per bit can be expressed as follows

$$\begin{aligned} \Gamma _k = \left( \sqrt{M}R\right) ^{-1} \frac{\mathrm {E}\left[ \Vert {\mathbf {H}}_k {\mathbf {a}}_k \Vert _2^2 \right] }{N_\text {r} \sigma ^2}, \ \forall \ k, \end{aligned}$$

which is henceforward assumed to be identical for all subcarriers.
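In a simulation, the SNR definition above is typically inverted to obtain the noise variance \(\sigma ^2\) for a target SNR. The sketch below illustrates this under an assumption that is not stated in the text: the symbols are independent of \({\mathbf {H}}\) and have average energy `e_s`, so that \(\mathrm {E}[\Vert {\mathbf {H}} {\mathbf {a}} \Vert _2^2] = N_\text {t} \cdot\) `e_s` for \(H_{i,j} \sim {{\mathcal {C}}}{{\mathcal {N}}}(0,1/N_\text {r})\). All function names are hypothetical, and the Monte Carlo routine is only there to check that expectation numerically for unit-energy QPSK.

```python
import math
import random


def noise_variance(snr_db, n_t, n_r, m, rate, e_s=1.0):
    """Solve the SNR-per-bit definition for sigma^2.

    Assumes E[||H a||^2] = n_t * e_s (see the lead-in for the conditions).
    """
    gamma = 10.0 ** (snr_db / 10.0)
    return n_t * e_s / (math.sqrt(m) * rate * n_r * gamma)


def mean_received_energy(n_t, n_r, trials=20000, seed=0):
    """Monte Carlo estimate of E[||H a||^2] for unit-energy QPSK symbols."""
    rng = random.Random(seed)
    std = math.sqrt(0.5 / n_r)  # standard deviation per real dimension
    qpsk = [complex(re, im) / math.sqrt(2) for re in (-1, 1) for im in (-1, 1)]
    total = 0.0
    for _ in range(trials):
        a = [rng.choice(qpsk) for _ in range(n_t)]
        for _ in range(n_r):  # one row of H per receiving antenna
            y = sum(complex(rng.gauss(0, std), rng.gauss(0, std)) * a_j
                    for a_j in a)
            total += abs(y) ** 2
    return total / trials


sigma2 = noise_variance(snr_db=0.0, n_t=4, n_r=8, m=4, rate=0.5)
print(sigma2, mean_received_energy(4, 8, trials=5000))
```

With the normalization by \(1/\sqrt{N_\text {r}}\), the estimated received energy should stay close to \(N_\text {t}\) regardless of \(N_\text {r}\).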

The BER is employed to measure the performance of the coded detectors and is obtained by averaging bit decision errors over multiple Monte Carlo experiments. Each experiment is a computer simulation that involves: (i) the generation of \(n_b = 256\) equiprobable data bits; (ii) the encoding of data bits by the polar encoder, resulting in a codeword of \(\frac{256}{R}\) bits; (iii) the mapping of coded bits into complex symbols \(\mathbf {{\tilde{a}}}_k \in {\mathbb {S}}^{N_\text {t}}\) for all k subcarriers; (iv) the transmission of the OFDM frame; (v) the generation of normalized channel coefficients to form the entries of the channel matrix \({\mathbf {H}}_k\); (vi) the generation of the complex AWGN samples present at the receiver; (vii) the final decision in favor of the symbol coordinate associated with the highest probability value; and (viii) the subsequent decoding of the decided symbols into bits via the polar decoder [21]. More specifically, we implement a tree-based architecture of successive cancellation list decoding [22], with code rate equal to R.
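Step (vii) and the averaging of bit errors can be sketched as follows. This is a simplified illustration, not the simulation used in this work: the polar encoding and decoding of steps (ii) and (viii) are omitted, the probability values are made up, and the bit labeling `QPSK_BITS` is a hypothetical mapping chosen only so that the example runs.

```python
# Hypothetical bit labels for the four QPSK constellation points.
QPSK_BITS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def hard_decision(posteriors):
    """Pick, for each symbol, the candidate with the highest probability."""
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


def bit_errors(sent_idx, decided_idx, labels=QPSK_BITS):
    """Count differing bits between transmitted and decided symbols."""
    return sum(b_sent != b_dec
               for s, d in zip(sent_idx, decided_idx)
               for b_sent, b_dec in zip(labels[s], labels[d]))


def ber(experiments, labels=QPSK_BITS):
    """Average bit errors over (sent_indices, posteriors) experiments."""
    errors = 0
    bits = 0
    for sent_idx, posteriors in experiments:
        errors += bit_errors(sent_idx, hard_decision(posteriors), labels)
        bits += len(sent_idx) * len(labels[0])
    return errors / bits


# One toy experiment with two transmitted symbols.
toy = [([1, 3], [[0.1, 0.7, 0.1, 0.1], [0.4, 0.3, 0.2, 0.1]])]
print(ber(toy))
```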

For the sake of brevity, some algorithmic procedures (see footnote 5) were omitted from Algorithm 2. However, it is worth mentioning that the DU-PDA training is performed with SNR values drawn from a uniform distribution \({\mathcal {U}}[\min (\text {SNR}),\max (\text {SNR})]\), as discussed in [4, §VI-A, p. 122405]. Additionally, it was decided heuristically to use a total of \(N_\text {TR} = 10^5\) samples for training and that the DU-PDA should include \(L = 4N_\text {t}\) layers (see footnote 6). More details about the proposed DU-PDA hyperparameters can be found in Table 2. These parameters are used for all scenarios presented in Sect. 4.2.
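The training setup described above can be sketched in a few lines. The SNR limits of 0 and 15 dB below are placeholders, since the actual range is not given in this passage, and the helper name is hypothetical. Only \(N_\text {TR} = 10^5\) and \(L = 4N_\text {t}\) are taken from the text.

```python
import random


def draw_training_snrs_db(n_samples, snr_min_db, snr_max_db, seed=None):
    """Draw one SNR value (in dB) per training sample from U[min, max]."""
    rng = random.Random(seed)
    return [rng.uniform(snr_min_db, snr_max_db) for _ in range(n_samples)]


N_T = 4        # number of transmitting antennas (illustrative)
N_TR = 10**5   # number of training samples, as stated in the text
L = 4 * N_T    # number of layers, as stated in the text

# Placeholder SNR limits; the actual range is an assumption of this sketch.
snrs_db = draw_training_snrs_db(N_TR, 0.0, 15.0, seed=1)
print(L, len(snrs_db), min(snrs_db), max(snrs_db))
```

Drawing a fresh SNR per sample exposes the network to the whole operating range during training, rather than specializing it to a single noise level.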

Furthermore, note that in this work we employ hard decoding for all detectors analyzed. However, in principle, soft decoding could also be integrated into the proposed DU-PDA, since soft outputs are available via (25) [11]. Nonetheless, for the proposed DU-PDA, the hard decoding approach attains a better performance-complexity trade-off, which is more aligned with the general aim of this work of proposing a low-complexity detector with affordable performance losses. This also allows for a fair comparison with algorithms that provide hard decoding sequences.

Table 2 Hyperparameters of interest for the proposed DU-PDA

4.2 Performance results

Figure 2 shows the uncoded detection performance for all detectors presented in Table 1, considering a square \(4 \times 4\) MIMO system (Fig. 2a) and an underloaded [3] \(4 \times 8\) MIMO system (Fig. 2b), both of which employ quadrature phase shift keying (QPSK) (\(M=4\)) modulation.

Fig. 2 Performance of the ZF, AMP, DU-PDA, PDA, MLD and SD detectors for the uncoded \(N_\text {t} \times N_\text {r}\) MIMO system. The performance metric is the probability of symbol vector error, \(P\left( \mathbf {{\hat{a}}} \ne {\mathbf {a}}\right)\), which is given as a function of a range of SNR values. Panel (a) illustrates the \(4 \times 4\) MIMO scenario and panel (b) the \(4 \times 8\) MIMO scenario, both considering the QPSK modulation

The detection performance is given as a function of multiple SNR values, and it is defined as the probability of occurrence of any error in the received symbol vector. This is done because bits are not encoded for the scenarios analyzed in Fig. 2.
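The distinction between the vector error probability \(P\left( \mathbf {{\hat{a}}} \ne {\mathbf {a}}\right)\) and a per-symbol error rate can be made concrete with a small sketch. The function names and the toy symbol indices below are illustrative only and are not taken from the simulations of this work.

```python
def vector_error_rate(sent, decided):
    """Fraction of symbol vectors with at least one wrongly decided entry."""
    wrong = sum(1 for a, a_hat in zip(sent, decided) if list(a) != list(a_hat))
    return wrong / len(sent)


def symbol_error_rate(sent, decided):
    """Fraction of individual symbols decided wrongly, for comparison."""
    total = sum(len(a) for a in sent)
    wrong = sum(x != y
                for a, a_hat in zip(sent, decided)
                for x, y in zip(a, a_hat))
    return wrong / total


# Four transmitted vectors of N_t = 4 symbol indices and their decisions.
sent = [(0, 1, 2, 3), (3, 3, 0, 1), (2, 0, 0, 1), (1, 1, 1, 1)]
decided = [(0, 1, 2, 3), (3, 2, 0, 0), (2, 0, 0, 1), (1, 1, 1, 0)]
print(vector_error_rate(sent, decided), symbol_error_rate(sent, decided))
```

Because a single wrong entry marks the whole vector as erroneous, the vector error rate is never smaller than the per-symbol error rate.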

First, observe in Fig. 2a that the performance of the PDA detector closely matches that reported in the seminal work [9], thus validating the simulation model. Moreover, notice that the DU-PDA detector performs prohibitively poorly in the \(4 \times 4\) MIMO scenario, which was also verified to be the case for other square MIMO systems. However, for the underloaded scenario shown in Fig. 2b, where \(N_\text {r} \gg N_\text {t}\), the DU-PDA detector performs better. All the same, if the performance of the DU-PDA relative to the ZF and, particularly, the AMP detectors is taken into account, then Fig. 2a and b shows that the DU-PDA outperforms these detectors over most of the SNR range analyzed, while having a comparable detection complexity (see footnote 7). It was verified, however, that for the underloaded \(4 \times 8\) MIMO scenario, the DU-PDA detector reaches a performance floor of \(P\left( \mathbf {{\hat{a}}} \ne {\mathbf {a}}\right) \approx 3\times 10^{-3}\), below which no improvement can be obtained, however high the SNR.

This motivated the integration of the polar encoder as described in Sect. 4.1, with the aim of improving the performance of the proposed DU-PDA relative to other detectors. Note in Fig. 3a that the \(4 \times 8\) MIMO scenario of Fig. 2 is illustrated again, now considering polar encoding with a code rate of \(R = 1/2\).

Fig. 3 Performance of the ZF, AMP, simplified DU-PDA, DU-PDA, PDA and MLD detectors for the coded \(N_\text {t} \times N_\text {r}\) MIMO system. Here, the performance metric is the BER, which is given as a function of a range of SNR values. Panel (a) presents the \(4 \times 8\) MIMO scenario with a code rate of \(R = 1/2\) and QPSK modulation, and panel (b) the \(4 \times 16\) MIMO scenario, also with \(R = 1/2\) but considering the 16-QAM modulation. Note that the SD curves are omitted here because the SD achieves the MLD performance

This is complemented by Fig. 3b, which presents the \(4 \times 16\) MIMO scenario with 16-QAM (\(M=16\)) modulation and the same code rate.

We begin by pointing out that the performance floor observed in Fig. 3a and b, although undesirable, is not as detrimental to the overall performance as in Fig. 2b. This happens because the introduced channel coding improves the performance over the whole SNR range under analysis. Therefore, the region where the DU-PDA is better than the ZF and AMP corresponds to the more interesting range of SNR \(< 10\) (dB). Admittedly, the performance floor is still present in Fig. 3a and b, but now at low values of BER \(\approx 2\times 10^{-4}\) and BER \(\approx 2\times 10^{-5}\), respectively. These observations support the conjecture that the uncoded DU-PDA detector is interference limited at high SNR values. In this SNR range, the distribution of (19) ceases to be approximately Gaussian because of the low AWGN levels and becomes defined for the most part by the non-Gaussian IAI distribution. This in turn violates the Gaussian distribution assumption mentioned in Sect. 3.1 regarding the PDA detector, which is the underlying algorithm of the proposed DU-PDA detector. Hence, we have the performance floor shown in Fig. 2b, which is partially mitigated by a robust coding scheme in Fig. 3. Furthermore, regarding the detection performance of the AMP detector in Figs. 2 and 3, one can see that this detector suffers from a severe performance floor at high SNR. This behavior is also explained by the reasoning described for the DU-PDA, which means that the violation of the Gaussian distribution assumption also severely affects the AMP detection performance [23].

Moreover, note also that Fig. 3 depicts the detection performance of the simplified DU-PDA detector. For this detector, the calculations performed in (23) are simplified, yielding (30). Although the dimensions of the MIMO systems illustrated in Fig. 3 are not large, the numerical BER results presented here show that the conclusions from Sect. 3.4 may still hold for a small number of antennas. Note in Fig. 3 that the detection performance of the simplified DU-PDA detector is practically identical to that of the DU-PDA detector, except in the high SNR region, where the simplified DU-PDA is marginally worse than the DU-PDA detector.

Finally, note also that the simplified DU-PDA complexity becomes even lower than that of the ZF and AMP detectors, especially when the number of layers used, \(L = 4N_\text {t}\), is considered. This makes the simplified DU-PDA detector the least costly of all detectors analyzed in this work, as can be verified in Table 1. Yet it performs approximately 2 dB better than the ZF in Fig. 3a for values of SNR \(< 10\) (dB), for example. More importantly, the simplified DU-PDA largely improves upon the performance of the AMP detector, in spite of using similar operations as described in (19).

Fig. 4 Performance of the ZF, simplified DU-PDA and PDA detectors for the coded (code rate of \(R = 1/2\)) MIMO system. The BER is plotted (a) against a range of SNR values for the \(8 \times 16\) MIMO and (b) as a function of the number of transmitting antennas for the \(N_\text {t} \times 12\) MIMO scenario. All scenarios presented consider the QPSK modulation. The remaining detectors described in Table 1 are not analyzed, either because their performance is prohibitively poor or because it is identical to the performance of detectors already shown here

Additionally, Fig. 4a shows the performance of relevant detectors for the \(8 \times 16\) MIMO scenario considering the QPSK modulation. Figure 4b in turn illustrates the detectors' performance, also considering the QPSK modulation, for multiple values of the number of transmitting antennas, \(N_\text {t}\), with the number of receiving antennas, \(N_\text {r} = 12\), and the SNR \(= 7\) (dB) fixed. Note that for this scenario we still assume the number of layers, L, of the DU-PDA detector to be tied to \(N_\text {t}\), such that \(L = 2cN_\text {t}\). This is adopted since each layer in the DU-PDA architecture outputs the posterior associated with one transmitted symbol, a by-product of the underlying PDA algorithm employed by the DU-PDA detector. However, we verified through experiments that \(c > 2\) brings no improvement in detection performance, while increasing the training and detection complexity. Therefore, the value \(L = 4N_\text {t}\) defined in Sect. 4.1 was shown to be the most suitable one.
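The relation \(L = 2cN_\text {t}\) can be read together with the index mapping of footnote 1, which reduces to a modulo operation: layer \(\ell\) updates symbol \(\ell ^* = \ell \bmod 2N_\text {t}\). The sketch below illustrates this reading. The function name is hypothetical, and interpreting the \(2N_\text {t}\) indices as the real-valued coordinates of the transmitted vector is an assumption of this example.

```python
from collections import Counter


def layer_to_symbol(layer, n_t):
    """Index of the symbol coordinate whose posterior a given layer outputs."""
    return layer % (2 * n_t)


n_t, c = 4, 2
n_layers = 2 * c * n_t  # L = 2 c N_t, i.e., L = 4 N_t for c = 2

# Count how many layers are devoted to each of the 2 N_t coordinates.
visits = Counter(layer_to_symbol(layer, n_t) for layer in range(n_layers))
print(n_layers, dict(visits))
```

Under this reading, c is simply the number of passes over all coordinates, which matches the observation that the PDA algorithm itself converges in about two iterations (footnote 6).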

In Fig. 4a, it can be observed for the larger MIMO system that the proposed simplified DU-PDA detector outperforms the ZF detector, particularly in the low BER \(< 10^{-3}\) region. It is important to remark that for higher values of SNR the performance floor of the coded simplified DU-PDA is still present, although it remains at low BER values of approximately \(10^{-5}\). Moreover, note that the simplified DU-PDA performance becomes worse relative to that of the PDA detector as the SNR increases, but recall that the simplified DU-PDA has the lowest complexity (see Table 1). In addition, Fig. 4b shows that the simplified DU-PDA detector performance varies approximately linearly with the number of transmitting antennas \(N_\text {t}\), while the performance of the ZF detector changes more abruptly with \(N_\text {t}\). This means that the proposed simplified DU-PDA detector not only outperforms the more complex ZF, but is also more robust for all considered MIMO system dimensions, assuming a target BER of \(10^{-3}\).

5 Conclusion

In this work, we proposed a detector for MIMO systems based upon the deep unfolded architecture for NNs, namely the DU-PDA detector. This detector unfolds iterations of the PDA algorithm in its layers, enhancing the model-driven PDA detector with the aid of its data-driven architecture.

It was shown that the DU-PDA detector, as well as its simplified form, outperforms both the AMP and ZF detectors over most of the SNR range evaluated. This can be verified, for instance, in coded detection for the \(8\times 16\) MIMO system. Moreover, the global computational complexity of the simplified DU-PDA detector is orders of magnitude lower than that of the ZF detector. Furthermore, the absence of matrix inverse computations in the DU-PDA architecture not only reduces its cost, but also simplifies its implementation in practical systems. This is the case when, for example, channels are correlated, which increases the condition number of \({\mathbf {G}}\) and makes the computation of its inverse impractical.

For future research, it would be interesting to broaden the scenarios and dimensions of the MIMO systems analyzed by increasing the number of transmitting and receiving antennas, evaluating practical underloaded and square MIMO systems alike. Moreover, the integration of soft decoding into the proposed DU-PDA can improve its performance and can be regarded as a natural progression of the research done in this work. The applicability of the proposed detector to MIMO systems that employ precoding is also an interesting research topic for future work. Finally, given the flexibility of the deep unfolding architecture, we maintain that other MIMO detection schemes could benefit greatly from the principles laid out in this work, which makes this a promising topic for future research.


  1. \(\{\ell ^* \in \{0,1,\ldots ,2N_\text {t} - 1\}, \ k \in \{1,2,\ldots ,\lceil L / 2N_\text {t}\rceil - 1\} \ | \ \ell ^* = \ell - k2N_\text {t}; \ k2N_\text {t} \le \ell < (k + 1)2N_\text {t}\}\)

  2. We adopt the normalization of the channel matrix by \(1/\sqrt{N_\text {r}}\) as detailed in Sect. 4.

  3. For the sake of brevity, we assume that the inverse of a matrix, say \({\mathbf {X}} \in {\mathbb {R}}^{N \times N}\), is computed by the well-known Gaussian elimination, whose cost is approximately \({\mathcal {O}}(N^3)\).

  4. Note that the squared norm of a matrix \({\mathbf {X}} \in {\mathbb {R}}^{M \times N}\) can be written as \(\Vert {\mathbf {X}}\Vert _2^2 = \sum _{\forall i}{\sum _{\forall j}{X_{i,j}^2}}\); thus, its cost is \({\mathcal {O}}(MN)\).

  5. As mentioned earlier, we used the TensorFlow library [19] to implement a customized deep unfolded NN model. The implementation code can be found at

  6. It was verified that the PDA algorithm converges within an average of 2 convergence iterations in Algorithm 1 (with \(\epsilon = 10^{-3}\)), for all scenarios of interest. Therefore, there is no loss of generality when comparing the costs of both detectors in the context of the results presented in this section.

  7. Note here that we consider \(L = 4N_\text {t}\) as stated in Sect. 4.1, making \(N_\text {t}^3\) the highest-order term within the DU-PDA complexity. Additionally, we also considered \(N_\text {I} = 50\) [8, §IV-A, p. 5] for the AMP detector, which clearly implies \(N_\text {I} \gg N_\text {t}\) and, consequently, also a polynomial whose highest-order term is cubic.



Abbreviations

4G: Fourth generation of mobile networks
5G: Fifth generation of mobile networks
6G: Sixth generation of mobile networks
AMP: Approximate message passing
AWGN: Additive white Gaussian noise
BER: Bit error rate
CP: Cyclic prefix
DFT: Discrete Fourier transform
DNN: Deep neural network
DU-PDA: Deep unfolded PDA
HDL: Hardware description language
IAI: Inter-antenna interference
i.i.d.: Independent identically distributed
LTE: Long-term evolution
MF: Matched filter
MIMO: Multiple-input multiple-output
MLP: Multilayer perceptron
MMSE: Minimum mean square error
ML: Machine learning
MLD: Maximum likelihood detector
MSE: Mean-squared error
NN: Neural network
OFDM: Orthogonal frequency division multiplexing
PDA: Probability data association
QAM: Quadrature amplitude modulation
QPSK: Quadrature phase shift keying
RV: Random variable
SD: Sphere detector
SIC: Successive interference cancellation
SISO: Single-input single-output
SNR: Signal-to-noise ratio




References

  1. J. Jeon, G. Lee, A.A.I. Ibrahim, J. Yuan, G. Xu, J. Cho, E. Onggosanusi, Y. Kim, J. Lee, J.C. Zhang, MIMO evolution toward 6G: modular massive MIMO in low-frequency bands. IEEE Commun. Mag. 59(11), 52–58 (2021).

  2. M.A. Albreem, M. Juntti, S. Shahabuddin, Massive MIMO detection techniques: a survey. IEEE Commun. Surveys Tutor. 21(4), 3109–3132 (2019).

  3. S. Yang, L. Hanzo, Fifty years of MIMO detection: the road to large-scale MIMOs. IEEE Commun. Surveys Tutor. 17(4), 1941–1988 (2015).

  4. P.H.C. De Souza, L.L. Mendes, M. Chafii, Compressive learning in communication systems: a neural network receiver for detecting compressed signals in OFDM systems. IEEE Access 9, 122397–122411 (2021).

  5. A. Balatsoukas-Stimming, C. Studer, Deep unfolding for communications systems: a survey and some new directions. In: 2019 IEEE International Workshop on Signal Processing Systems (SiPS), pp. 266–271 (2019).

  6. Q.-V. Pham, N.T. Nguyen, T. Huynh-The, L. Le Bao, K. Lee, W.-J. Hwang, Intelligent radio signal processing: a survey. IEEE Access 9, 83818–83850 (2021).

  7. C. Liu, J. Thompson, T. Arslan, A deep unfolding network for massive multi-user MIMO-OFDM detection. In: 2022 IEEE Wireless Communications and Networking Conference (WCNC), pp. 2405–2410 (2022).

  8. M. Khani, M. Alizadeh, J. Hoydis, P. Fleming, Adaptive neural signal detection for massive MIMO. IEEE Trans. Wireless Commun. 19(8), 5635–5648 (2020).

  9. D. Pham, K.R. Pattipati, P.K. Willett, J. Luo, A generalized probabilistic data association detector for multiple antenna systems. IEEE Commun. Lett. 8(4), 205–207 (2004).

  10. M.A. Albreem, A.H.A. Habbash, A.M. Abu-Hudrouss, S.S. Ikki, Overview of precoding techniques for massive MIMO. IEEE Access 9, 60764–60801 (2021).

  11. S. Yang, T. Lv, R.G. Maunder, L. Hanzo, From nominal to true a posteriori probabilities: an exact Bayesian theorem based probabilistic data association approach for iterative MIMO detection and decoding. IEEE Trans. Commun. 61(7), 2782–2793 (2013).

  12. A. Zappone, M. Di Renzo, M. Debbah, Wireless networks design in the era of deep learning: model-based, AI-based, or both? IEEE Trans. Commun. 67(10), 7331–7376 (2019).

  13. H. He, C.-K. Wen, S. Jin, G.Y. Li, Model-driven deep learning for MIMO detection. IEEE Trans. Signal Process. 68, 1702–1715 (2020).

  14. D.L. Donoho, A. Maleki, A. Montanari, Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences 106(45), 18914–18919 (2009). arXiv:

  15. J. Ma, L. Ping, Orthogonal AMP. IEEE Access 5, 2020–2033 (2017).

  16. S. Ali, W. Saad, N. Rajatheva, K. Chang, D. Steinbach, B. Sliwa, C. Wietfeld, K. Mei, H. Shiri, H.-J. Zepernick, T.M.C. Chu, I. Ahmad, J. Huusko, J. Suutala, S. Bhadauria, V. Bhatia, R. Mitra, S. Amuru, R. Abbas, B. Shao, M. Capobianco, G. Yu, M. Claes, T. Karvonen, M. Chen, M. Girnyk, H. Malik, 6G White Paper on Machine Learning in Wireless Communication Networks. arXiv:2004.13875 (2020).

  17. J. Chen, X. Ran, Deep learning with edge computing: a review. Proc. IEEE 107(8), 1655–1674 (2019).

  18. C. Zhang, P. Patras, H. Haddadi, Deep learning in mobile and wireless networking: a survey. IEEE Commun. Surveys Tutor. 21, 2224–2287 (2019).

  19. M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D.G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, X. Zheng, TensorFlow: A system for large-scale machine learning. arXiv (2016). arXiv:

  20. D.A. Guimarães, Digital Transmission: A Simulation-Aided Introduction with VisSim/Comm. Springer, Berlin Heidelberg (2009).

  21. K. Besser, Digcommpy 0.9. Accessed 2022-06-07

  22. I. Tal, A. Vardy, List decoding of polar codes. IEEE Trans. Inf. Theory 61(5), 2213–2226 (2015).

  23. E. Beck, C. Bockelmann, A. Dekorsy, CMDNet: learning a probabilistic relaxation of discrete variables for soft detection with low complexity. IEEE Trans. Commun. 69(12), 8214–8227 (2021).




This work was partially supported by RNP, with resources from MCTIC, Grant No. 01245.010604/2020-14, under the 6G Mobile Communications Systems project of the Radiocommunication Reference Center (Centro de Referência em Radiocomunicações - CRR) of the National Institute of Telecommunications (Instituto Nacional de Telecomunicações - Inatel), Brazil, FAPESP Grant No. 20/05127-2 under the SAMURAI project, CNPq-Brazil and CAPES.

Author information

Authors and Affiliations



Both authors contributed equally to this publication.

P. H. C. de Souza

P. H. C. S. was born in Santa Rita do Sapucaí, Minas Gerais, MG, Brazil, in 1992. He received the BS and MS degrees in telecommunications engineering from the National Institute of Telecommunications - INATEL, Santa Rita do Sapucaí, in 2015 and 2017, respectively, and is currently working toward the PhD degree in telecommunications engineering at INATEL. During the year of 2014, he was a Hardware Tester with the INATEL Competence Center - ICC. His main interests are: digital communication systems, mobile telecommunications systems, 6G, cognitive radio, convex optimization for telecommunication systems, compressive sensing/learning, embedded systems and embedded hardware/firmware.

L. L. Mendes

L. L. M. received the BSc and MSc degrees from Inatel, Brazil, in 2001 and 2003, respectively, and the Doctor degree from Unicamp, Brazil, in 2007, all in electrical engineering. Since 2001, he has been a Professor with Inatel, where he acted as the Technical Manager of the Hardware Development Laboratory from 2006 to 2012. From 2013 to 2015, he was a Visiting Researcher with the Technical University of Dresden in the Vodafone Chair Mobile Communications Systems, where he carried out his postdoctoral research. In 2017, he was elected Research Coordinator of the 5G Brazil Project, an association involving industries, telecom operators and academia which aims to fund and build an ecosystem toward 5G in Brazil. He is also the technical coordinator of the Brazil 6G Project.

Corresponding authors

Correspondence to Pedro H. C. de Souza or Luciano L. Mendes.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Souza, P.H.C.d., Mendes, L.L. Low-complexity deep unfolded neural network receiver for MIMO systems based on the probability data association detector. J Wireless Com Network 2022, 69 (2022).
