Low-complexity deep unfolded neural network receiver for MIMO systems based on the probability data association detector
EURASIP Journal on Wireless Communications and Networking volume 2022, Article number: 69 (2022)
Abstract
Interest in applications that combine machine learning algorithms and communications has been rising in recent years. Machine learning and neural networks are being advocated as a way of improving the performance of several functions across all layers of future communication systems. Furthermore, in applications where complexity reduction is essential for system feasibility at the cost of an affordable performance loss, more efficient systems might be achieved with the aid of machine learning algorithms. Signal detection for multiple-input multiple-output (MIMO) systems has become a hot topic in recent years given its prominent role in the fourth and fifth generations of mobile networks. However, the computational complexity in MIMO systems can become prohibitive when the number of antennas increases. Therefore, by leveraging neural network architectures, we propose a deep unfolded detector, whereby the algorithm of the probability data association (PDA) detector is adapted and enhanced by means of neural network learning capabilities. We show that the proposed detector is orders of magnitude less complex than the PDA detector, while incurring no severe penalty in performance in terms of bit error rate (BER).
1 Introduction
The success of multiple-input multiple-output (MIMO) systems has been confirmed since the fourth generation of mobile networks (4G), and MIMO continues to show its importance in recent deployments of the fifth generation of mobile networks (5G). Early studies on the sixth generation of mobile networks (6G) also point to MIMO systems as a key enabler for future wireless systems [1]. Their advantages over classical single-input single-output (SISO) systems are extremely attractive and relatively simple to understand from a theoretical standpoint [2, 3]: By increasing the number of service antennas, an overall increase in data throughput is obtained.
It was shown in [4] that detectors based on neural networks (NNs) have a competitive performance when compared to the optimum maximum likelihood detector (MLD), while being more robust to imperfect channel estimation and less complex. However, the system model in [4] considers a SISO system. Recently, several works [5, 6] proposed solutions that attempt to integrate machine learning (ML) and NNs into MIMO systems. One emerging solution involves adapting NN architectures according to model-driven detection algorithms, such that their iterations are unfolded onto NN layers. This solution is called deep unfolding [5, 7].
Therefore, in this work we propose a deep unfolded detector [8] based on the probability data association (PDA) detector [9] for MIMO systems. The main aim is to achieve the aforementioned advantages of data-driven detectors for SISO systems in MIMO systems, while maintaining the advantageous features of the PDA detector [3]. To the best of the authors' knowledge, this is the first attempt at combining the deep unfolded architecture with the algorithm of the PDA detector for MIMO systems.
MIMO systems are also widely used for beamforming and beam steering in the most recent mobile networks, where precoding can provide spatial multiplexing and improve the system performance without increasing the complexity on the receiver side [10]. It is clear that precoding will play an important role in future MIMO systems for mobile communications. Nevertheless, this work focuses on MIMO systems where multiple antennas transmit data over a rich scattering environment without considering precoding, relying on detection techniques that can resolve the inter-antenna interference (IAI) with affordable complexity, a scenario where the PDA detector is an interesting solution [3].
1.1 Contributions and paper organization
In this paper, we make the following contributions:

We propose a novel combination of the data-driven deep unfolded detector and the PDA algorithm for signal detection in MIMO systems;

Differently from other similar proposals [5, 6, 8], we employ the categorical cross-entropy loss function and dispense with the use of optimal Gaussian denoisers;

The computational complexity of the proposed detector is evaluated and compared with the complexity presented by detectors of interest;

A low-complexity variation of the deep unfolded PDA (DUPDA) is also presented, its computational complexity being lower than that of the linear zero-forcing (ZF) detector;

Numerical results from computational simulations compare the uncoded and coded error rates of the proposed detectors with other detectors under time-dispersive channels.
The remainder of this paper is organized as follows. In Sect. 2, we present the system model of the baseline orthogonal frequency division multiplexing (OFDM)-MIMO system. Section 3 then introduces the problem of signal detection for MIMO systems and gives a brief description of the PDA detector and of the deep unfolding learning. This is followed by a description of the proposed DUPDA and an analysis of the computational complexity of all detectors discussed throughout this paper. Next, in Sect. 4, we provide numerical results to evaluate the performance of all detectors studied in this paper, including the optimum MLD. Finally, Sect. 5 concludes the paper.
1.2 Notation
Throughout this paper, italicized letters (e.g., x or X) represent scalars, boldfaced lowercase letters (e.g., \({\mathbf {x}}\)) represent vectors, and boldfaced uppercase letters (e.g., \({\mathbf {X}}\)) denote matrices. The nth entry of the vector \({\mathbf {x}}\) is represented by \(x\left( n\right)\). The entry on the ith row and jth column of the matrix \({\mathbf {X}}\) is denoted by \(X_{i,j}\). The superscript in \({\mathbf {x}}^{\left( n\right) }\) denotes the nth instance of the vector \({\mathbf {x}}\), such that \({\mathcal {X}} = \{{\mathbf {x}}^{\left( n\right) }\}_{\forall n}\) forms a collection of vectors or a dataset. The sets of real and complex numbers are represented by \({\mathbb {R}}\) and \({\mathbb {C}}\), respectively. The absolute value of the scalar \(x\in {\mathbb {R}}\) or the modulus of \(x\in {\mathbb {C}}\) is denoted by \(|x|\). The sets of vectors of dimension X with real and complex entries are, respectively, represented by \({\mathbb {R}}^{X}\) and \({\mathbb {C}}^{X}\). The sets of matrices of dimension \(X\times Y\) with real and complex entries are correspondingly described by \({\mathbb {R}}^{X\times Y}\) and \({\mathbb {C}}^{X\times Y}\). The transposition operation of a vector or matrix is represented as \(\left( \cdot \right) ^{\text {T}}\). The \(\ell _p\)-norm, \(p \ge 1\), of the vector \({\mathbf {x}}\) is given by \(\Vert {\mathbf {x}}\Vert _p = \left( |x\left( 0\right) |^p + |x\left( 1\right) |^p + \cdots + |x\left( n-1\right) |^p\right) ^{1/p}\). The expected value of the random variable z is denoted by \(\mathrm {E}\left[ z\right]\). The real and imaginary parts of \(z \in {\mathbb {C}}\) are denoted by \(\Re (z)\) and \(\Im (z)\). The estimate of a scalar x, a vector \({\mathbf {x}}\) or a matrix \({\mathbf {X}}\) is represented by \({\hat{x}}\), \(\hat{{\mathbf {x}}}\) and \(\hat{{\mathbf {X}}}\), respectively. The number of elements in a set \({\mathcal {X}}\) is given by \(\#{\mathcal {X}}\).
Computational complexity is denoted by the asymptotic operator \({\mathcal {O}}(\cdot )\).
2 System model
Suppose that in a multiple antenna system we have \(N_\text {t}\) transmitting antennas and \(N_\text {r}\) receiving antennas, thereby constituting an \(N_\text {t} \times N_\text {r}\) point-to-point baseband and fully digital MIMO system. Bits of data are demultiplexed into \(N_\text {t}\) substreams, which in turn are mapped to sequences of complex symbols. These symbols are transmitted by their respective transmit antennas using an OFDM system, for which it is assumed that the cyclic prefix (CP) length is larger than the maximum delay spread for all \(N_\text {t} N_\text {r}\) channels. Finally, after performing the discrete Fourier transform (DFT) we have the following representation of the received baseband signal at the kth subcarrier:
\[ \mathbf {{\tilde{r}}}_k = \mathbf {{\tilde{H}}}_k \mathbf {{\tilde{a}}}_k + \mathbf {{\tilde{n}}}_k. \tag{1} \]
Here, \(\mathbf {{\tilde{H}}}_k \in {\mathbb {C}}^{N_\text {r} \times N_\text {t}}\) is the matrix containing all channel frequency responses for the kth OFDM subcarrier; \(\mathbf {{\tilde{a}}}_k \in {\mathbb {C}}^{N_\text {t}}\) represents the symbol vector transmitted by the \(N_\text {t}\) transmit antennas on the kth subcarrier of the OFDM block and \(\mathbf {{\tilde{n}}}_k \in {\mathbb {C}}^{N_\text {r}}\) is the complex additive white Gaussian noise (AWGN) vector in the frequency domain at the kth subcarrier for the \(N_\text {r}\) receive antennas, with zero mean and covariance matrix given by \(\sigma ^2{\mathbf {I}}_{N_\text {r}}\).
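For convenience, henceforth we make use of the real-valued representation [3, 8, 9] for MIMO systems. Therefore, let the received signal (1) be represented by the concatenation of its real and imaginary parts, such that
\[ {\mathbf {r}}_k = {\mathbf {H}}_k {\mathbf {a}}_k + {\mathbf {n}}_k, \tag{2} \]
where
\[ {\mathbf {r}}_k = \begin{bmatrix} \Re (\mathbf {{\tilde{r}}}_k) \\ \Im (\mathbf {{\tilde{r}}}_k) \end{bmatrix}, \quad {\mathbf {a}}_k = \begin{bmatrix} \Re (\mathbf {{\tilde{a}}}_k) \\ \Im (\mathbf {{\tilde{a}}}_k) \end{bmatrix}, \quad {\mathbf {n}}_k = \begin{bmatrix} \Re (\mathbf {{\tilde{n}}}_k) \\ \Im (\mathbf {{\tilde{n}}}_k) \end{bmatrix}, \quad {\mathbf {H}}_k = \begin{bmatrix} \Re (\mathbf {{\tilde{H}}}_k) & -\Im (\mathbf {{\tilde{H}}}_k) \\ \Im (\mathbf {{\tilde{H}}}_k) & \Re (\mathbf {{\tilde{H}}}_k) \end{bmatrix}. \tag{3--6} \]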
Moreover, we assume that \(\Re (\mathbf {{\tilde{a}}}_k) \in {\mathbb {S}}^{N_\text {t}}\) and \(\Im (\mathbf {{\tilde{a}}}_k) \in {\mathbb {S}}^{N_\text {t}}\); that is, the real and imaginary parts of \(\mathbf {{\tilde{a}}}_k\) can take on different values from the finite set of coordinates pertaining to the square M-quadrature amplitude modulation (QAM) constellation. Hence, let \({{\mathbb {S}} = \{\pm E_0,\pm 3E_0,\ldots ,\pm (\sqrt{M} - 1)E_0\}}\), for \(E_0 = \sqrt{\frac{3}{2(M - 1)}}\), such that the constellation energy is normalized to 1 (unity).
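As a minimal sketch of the real-valued representation and of the normalized coordinate set \({\mathbb {S}}\), the NumPy snippet below stacks real and imaginary parts and checks that the result matches the complex-valued model. The function names (`to_real`, `qam_coordinates`), the dimensions, the noise scale and the seed are illustrative assumptions, not part of the original work.

```python
import numpy as np

def to_real(H_c, a_c, n_c):
    """Stack real and imaginary parts into the real-valued model r = H a + n."""
    H = np.block([[H_c.real, -H_c.imag],
                  [H_c.imag,  H_c.real]])
    a = np.concatenate([a_c.real, a_c.imag])
    n = np.concatenate([n_c.real, n_c.imag])
    return H, a, n

def qam_coordinates(M):
    """Per-dimension coordinates of square M-QAM with unit average symbol energy."""
    root = int(round(np.sqrt(M)))
    E0 = np.sqrt(3.0 / (2.0 * (M - 1)))
    return E0 * np.arange(-(root - 1), root, 2)

rng = np.random.default_rng(0)            # fixed seed, illustrative only
Nt, Nr, M = 2, 3, 16                      # assumed toy dimensions
S = qam_coordinates(M)
H_c = (rng.normal(size=(Nr, Nt)) + 1j * rng.normal(size=(Nr, Nt))) / np.sqrt(2 * Nr)
a_c = rng.choice(S, size=Nt) + 1j * rng.choice(S, size=Nt)
n_c = 0.1 * (rng.normal(size=Nr) + 1j * rng.normal(size=Nr))

r_c = H_c @ a_c + n_c                     # complex-valued model
H, a, n = to_real(H_c, a_c, n_c)
r = H @ a + n                             # real-valued model
r_stacked = np.concatenate([r_c.real, r_c.imag])
symbol_energy = 2 * np.mean(S ** 2)       # real plus imaginary dimension
```

Under these assumptions, `r` and `r_stacked` agree up to floating-point error, and `symbol_energy` evaluates to one, in line with the normalization by \(E_0\).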
3 Detection in MIMO systems
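A classical problem in the MIMO literature is to decide which symbols were transmitted by each antenna when only (2) is available at the receiver. This detection problem can be solved optimally, albeit at great computational effort, by the MLD for MIMO as follows
\[ \mathbf {{\hat{a}}}_k = \mathop {\mathrm {arg\,min}}\limits _{{\mathbf {a}} \in {\mathbb {S}}^{2N_\text {t}}} \Vert {\mathbf {r}}_k - {\mathbf {H}}_k {\mathbf {a}}\Vert _2^2, \tag{7} \]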
for which \(\mathbf {{\hat{a}}}_k \in {\mathbb {S}}^{2N_\text {t}}\) is the estimated vector of symbols' coordinates.
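To make the cost of the MLD concrete, the following sketch performs the exhaustive search over all \((\sqrt{M})^{2N_\text {t}}\) candidate vectors for a small real-valued system. It assumes a least-squares (Euclidean distance) metric and a noiseless toy setup so that the check is deterministic; `ml_detect` and all numerical values are hypothetical names and choices made for illustration.

```python
import itertools
import numpy as np

def ml_detect(r, H, S):
    """Exhaustive search: return the candidate in S^(2Nt) closest to r in Euclidean distance."""
    best, best_cost = None, np.inf
    for cand in itertools.product(S, repeat=H.shape[1]):
        a = np.array(cand)
        cost = np.sum((r - H @ a) ** 2)
        if cost < best_cost:
            best, best_cost = a, cost
    return best

rng = np.random.default_rng(0)
S = np.array([-1.0, 1.0]) / np.sqrt(2.0)  # QPSK coordinates (M = 4)
H = rng.normal(size=(8, 4))               # real-valued channel of size 2Nr x 2Nt
a_true = rng.choice(S, size=4)
r = H @ a_true                            # noiseless, for a deterministic check
a_hat = ml_detect(r, H, S)
num_candidates = len(S) ** H.shape[1]     # grows exponentially with 2Nt
```

Even for this \(2 \times 4\) toy case the search visits 16 candidates; the count is multiplied by \(\sqrt{M}\) for every additional real-valued dimension, which is the exponential growth that motivates lower-complexity detectors.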
It is known that the prohibitive complexity of the MLD has motivated research on several alternative detectors for MIMO throughout the last decades [3]. The PDA detector is one of these alternatives; it presents significantly lower complexity than the MLD, with an affordable bit error rate (BER) performance loss under specific conditions, as will be detailed in Sects. 3.5 and 4. In Sect. 3.1, the PDA detector's algorithm, first proposed in [9], is briefly revisited, followed by our proposed DUPDA, for which the PDA is the underlying algorithm.
3.1 Probability data association detector
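Before the detection task is carried out by the PDA detector, the received signal, \({\mathbf {r}}_k\), is preprocessed or equalized using the ZF principle as follows [2, 3, 9]
\[ {\mathbf {z}}_k = {\mathbf {H}}^{\dagger }_{k} {\mathbf {r}}_k = {\mathbf {a}}_k + {\mathbf {v}}_k, \tag{8} \]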
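wherein \({\mathbf {H}}^{\dagger }_{k} = ({\mathbf {H}}^\text {T}_k {\mathbf {H}}_k)^{-1} {\mathbf {H}}^\text {T}_k\) is the left Moore-Penrose pseudoinverse and \({\mathbf {v}}_k = {\mathbf {H}}^{\dagger }_{k} {\mathbf {n}}_k\) is the enhanced AWGN. Let us rewrite (8), such that
\[ {\mathbf {z}}_k = {\mathbf {e}}_i a_k (i) + \underbrace{\sum _{j \ne i} {\mathbf {e}}_j a_k (j) + {\mathbf {v}}_k}_{{\mathcal {V}}_i}, \tag{9} \]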
where \({\mathbf {e}}_i\) is the vector with 1 (one) at its ith entry and 0 (zero) otherwise, and \({\mathcal {V}}_i\) is a multivariate random variable (RV) that can be seen as the effective interference-plus-noise contaminating \(a_k (i)\) [9]. Therefore, the crux lies in detecting the symbol transmitted by the ith antenna, while considering that all other \(j \ne i\) transmitted symbols are interference added to the noise term, which is described by \({\mathcal {V}}_i\).
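The ZF preprocessing described above can be sketched in a few lines of NumPy. The snippet forms the left pseudoinverse explicitly, which assumes that \({\mathbf {H}}\) has full column rank, and verifies that the equalized signal equals the transmitted vector plus the enhanced noise. The helper name `zf_equalize`, the dimensions and the noise scale are illustrative assumptions.

```python
import numpy as np

def zf_equalize(r, H):
    """Return z = H_pinv r and the left pseudoinverse H_pinv = (H^T H)^{-1} H^T."""
    H_pinv = np.linalg.inv(H.T @ H) @ H.T   # assumes full column rank
    return H_pinv @ r, H_pinv

rng = np.random.default_rng(1)
H = rng.normal(size=(8, 4))                 # real-valued channel, 2Nr x 2Nt
a = rng.choice([-1.0, 1.0], size=4)         # toy symbol coordinates
n = 0.1 * rng.normal(size=8)                # toy noise
z, H_pinv = zf_equalize(H @ a + n, H)
v = H_pinv @ n                              # enhanced noise term
```

In practice `np.linalg.pinv(H)` or a least-squares solver is usually preferred numerically; the explicit inverse is kept here only because it mirrors the expression for \({\mathbf {H}}^{\dagger }\) in the text.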
Therefore, the PDA detector associates, for each \(a_k (i)\), a probability vector \({\mathbf {p}}_i \in {\mathbb {R}}^{\sqrt{M}}\), which is given by the evaluation of \({P_m(a_k (i) = q\left( m\right) \ | \ {\mathbf {z}}_k,\{{\mathbf {p}}_j\}_{\forall j \ne i})}\); \({q\left( m\right) \in {\mathbb {S}}}\) being a coordinate of the M-QAM constellation and \({m \in \{0,1,\ldots ,\sqrt{M} - 1\}}\). It is important to remark that the PDA detector uses all \(\{{\mathbf {p}}_j\}_{\forall j \ne i}\) associated with interfering symbols already detected, thanks to the incorporation of a strategy similar to that of successive interference cancellation (SIC) detectors. This significantly reduces the computational complexity for calculating \({\mathbf {p}}_i\), since otherwise \({P_m(a_k (i) = q\left( m\right) \ | \ {\mathbf {z}}_k)}\) would have to be evaluated. The problem here is the requirement of computing multiple integrals for each received symbol, rendering this evaluation prohibitive in practice. Dropping the subscript \((\cdot )_k\) in order to simplify the notation and assuming that \({\mathcal {V}}_i\) has a Gaussian distribution [9, 11], the likelihood function of \({{\mathbf {z}} \ | \ a(i) = q\left( m\right) }\) can be defined as
for which
wherein \(\mathrm {E}\left[ {\mathcal {V}}_i \right] = \varvec{\mu }_i\) and \(\mathrm {COV}\left[ {\mathcal {V}}_i \right] = \varvec{\Omega }_i\) are given by
where \({\mathbf {q}} = [q\left( 0\right) \ q\left( 1\right) \ \ldots \ q(\sqrt{M} - 1)]^\text {T}\) and \({{\mathbf {G}}^{-1} = ({\mathbf {H}}^\text {T} {\mathbf {H}})^{-1}}\) is the inverse of the Gram matrix [2] that accounts for the noise enhancement caused by the ZF. To evaluate the posterior probabilities associated with each symbol, we compute
which can be seen as an approximate form of Bayes' theorem [11]. Then, substituting (10) into (14) yields
Finally, the PDA detector procedure is given in Algorithm 1.
Note that the optimal detection sequence [9] used in Algorithm 1 can be found with the aid of the following operation:
where \({\mathbf {f}}_i^\text {T}\) represents the ith row of \({\mathbf {F}} = {\mathbf {H}}^{\dagger }\) and \({\mathbf {h}}_j\) denotes the jth column of \({\mathbf {H}}\). Note that larger magnitudes for \(\rho \left( i\right)\) mean that the ith antenna suffers less IAI [3]. In other words, the off-diagonal entries of the ith row of \({\mathbf {F}}{\mathbf {H}}\) have, combined, smaller magnitudes than its ith diagonal entry. It is easy to show that the optimal sequence is defined by sorting \({\varvec{\rho } = [\rho \left( 0\right) \ \rho \left( 1\right) \ \ldots \ \rho (2N_\text {t} - 1)]^\text {T}}\) in descending order, denoted as \({\{k_i \in \{1,\ldots ,2N_\text {t}\} \ | \ \rho \left( k_0\right)> \rho \left( k_1\right)> \ldots > \rho \left( k_{2N_\text {t}}\right) \}}\).
3.2 Deep unfolding
Prior to presenting our proposed DUPDA detector, a brief description of NNs and deep unfolding is provided in this section.
In general, the NN architecture has shown great potential for detecting signals, but its design and parameterization, among other problems, impose limitations [4]. Alternatively, this architecture can be adapted such that iterations of a given algorithm are unfolded onto its layers [5, 6, 12], hence the term "unfolding." It is also commonly assumed that the NN employs several layers and, consequently, the term "deep" is added.
More specifically, consider an algorithm with an input vector denoted by \({\mathbf {x}} \in {\mathbb {R}}^{N}\), for which its output is given by \({\mathbf {y}} \in {\mathbb {R}}^{S}\), then this algorithm can be expressed by [12]
wherein \(\varvec{\Theta }\) is the set of all parameters used by the algorithm, \(g(\cdot )\) represents a mapping function, usually nonlinear, and \(\varvec{\psi }\) is iteratively updated as follows
where the \(\ell\)th iteration also involves an operation with a mapping function \(f(\cdot )\) and \(\varvec{\psi }_0\) denotes the initial value.
Therefore, in the deep unfolded context, \(\varvec{\psi }_\ell\) can be understood as the input-output relationship at the \(\ell\)th layer of an NN architecture, as illustrated in Fig. 1.
Note that the dimensions of the learnable parameters \(\varvec{\Theta }\) are defined according to the underlying algorithm on which (17), (18) and the architecture depicted in Fig. 1 are based. This includes weights and biases, for example, which are optimized by the NN training algorithm [4, 12]. In other words, the number of layers and neurons is fixed, thereby considerably simplifying the process of defining what is commonly known as the NN hyperparameters.
Moreover, improvements are also obtained by using the aforementioned learnable parameters directly in the iterative algorithm. That way, the learning capabilities of NNs can be applied for optimizing algorithms such that their global performance, computational complexity, or even both, are improved. In Sect. 3.3, the PDA detector, reviewed in Sect. 3.1, is implemented using the deep unfolded architecture for NNs, unveiling our proposed DUPDA detector for MIMO systems.
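The unfolding idea can be illustrated with a generic sketch in which every iteration of an algorithm becomes one "layer" with its own parameter. The snippet below is only an assumed toy instance: the update function `f` is a Richardson-type step for solving a small linear system, and the per-layer parameters `thetas` are fixed by hand, whereas in a deep unfolded network they would be learned by training. None of these names come from the original work.

```python
import numpy as np

def unfolded_forward(x, psi0, thetas, f):
    """Apply one layer per parameter: psi_l = f(x, psi_{l-1}, theta_l)."""
    psi = psi0
    outputs = []
    for theta in thetas:
        psi = f(x, psi, theta)
        outputs.append(psi)          # keep every layer output, e.g. for a multi-layer loss
    return outputs

# Toy algorithm to unfold: Richardson iteration for A psi = x.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
target = np.array([1.0, -1.0])
x = A @ target

def f(x, psi, theta):
    return psi + theta * (x - A @ psi)

thetas = [0.5] * 50                  # hand-picked step sizes; a trained network would learn these
outputs = unfolded_forward(x, np.zeros(2), thetas, f)
```

Because the number of layers equals the number of unfolded iterations and each layer carries only the parameters of the underlying algorithm, the architecture is fixed in advance, which is the hyperparameter simplification noted above.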
3.3 Proposed deep unfolded PDA detector
Aiming to take advantage of the iterative algorithm of the PDA detector, we propose the DUPDA detector. Firstly, in the DUPDA detector, the received signal, \({\mathbf {r}}\), is preprocessed at the \(\ell\)th layer by the following operation [8]; [13, §IV-B, p. 1706]
where \(\mathbf {{\hat{a}}}_\ell \in {\mathbb {R}}^{2N_\text {t}}\) is the estimated transmitted symbol vector and the scalar \(w_\ell \in {\mathbb {R}}\) represents a learnable parameter. Note that this preprocessing principle differs from the ZF, which is used by the PDA detector, as defined in (8). In contrast, the proposed DUPDA employs a preprocessing based on the approximate message passing (AMP) algorithm [14], which also bears similarities with the Richardson method [2, §IV-6, p. 9]. In this way, \(\mathbf {{\hat{a}}}_\ell\) is updated iteratively until it converges to an acceptable approximation of the transmitted symbol vector. Interestingly, when we have \(\mathbf {{\hat{a}}}_\ell \rightarrow {\mathbf {a}}\), then the so-called residual term \(\left( {\mathbf {r}} - {\mathbf {H}}\mathbf {{\hat{a}}}_\ell \right) \rightarrow {\mathbf {n}}\), which gives a result in (19) similar to (8).
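The behavior of the residual term can be checked with a short sketch. It assumes that the preprocessing takes the common residual-correction form \({\mathbf {z}} = \mathbf {{\hat{a}}} + w\,{\mathbf {H}}^\text {T}({\mathbf {r}} - {\mathbf {H}}\mathbf {{\hat{a}}})\), which is typical of AMP-like and Richardson-like updates; this exact form, the function name `preprocess` and the value of `w` are assumptions for illustration and may differ from (19).

```python
import numpy as np

def preprocess(r, H, a_hat, w):
    """Assumed residual-correction step: z = a_hat + w * H^T (r - H a_hat)."""
    return a_hat + w * (H.T @ (r - H @ a_hat))

rng = np.random.default_rng(2)
two_Nt, two_Nr = 4, 16                         # real-valued dimensions (toy values)
H = rng.normal(size=(two_Nr, two_Nt)) / np.sqrt(two_Nr)
a = rng.choice([-1.0, 1.0], size=two_Nt)
n = 0.05 * rng.normal(size=two_Nr)
r = H @ a + n
w = 0.8                                        # hypothetical value; learned per layer in a trained network

z_at_truth = preprocess(r, H, a, w)            # residual reduces to the noise n
z_from_zero = preprocess(r, H, np.zeros(two_Nt), w)
```

When the current estimate equals the transmitted vector, the output is the transmitted vector plus a filtered noise term, mirroring the structure of the ZF output; no matrix inversion is needed, only two matrix-vector products per layer.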
The preprocessed signal of (19) is then fed into the following operation^{Footnote 1}:
where
Note that the nonlinear function \(\text {softm}(\cdot )\) is applied at each layer. This makes (20) identical to (15) except that it is unfolded on successive layers and that \(\varvec{\psi }_j = {\mathbf {p}}_j\). Notably, this also distinguishes the proposed DUPDA from other architectures [5, 8, 13] that instead use optimal denoisers at each layer, which do not account for interfering symbols as the underlying PDA algorithm of the DUPDA does. Moreover, since the preprocessing is modified, it is necessary to redefine the covariance matrix, \(\varvec{\Omega }_{\ell ^*}\), as follows [15, §III-D, p. 2023], [8]
where
wherein \([x]_+ = \max {(0,x)}\) and for which
Equation (23) can be understood as the empirical mean-squared error (MSE) estimator of the covariance matrix originated from the residual and noise terms of (19). More importantly, note that \(\varvec{\Omega }_{\ell ^*}\) is now a diagonal matrix. This means that computing \(\varvec{\Omega }_{\ell ^*}^{-1}\) is not as costly as its counterpart in (11), that is, in the PDA detector. More details about such implications are given in Sect. 3.4.
Therefore, by considering the developments presented in this subsection and the general model described in Sect. 3.2, we have
which is similar to what is evaluated in (15) with the addition, however, of a learnable parameter and a different preprocessing of the received signal. Note also that \(\varvec{\psi }_L = {\mathbf {y}}\), meaning that the last layer output is also given by (25). Furthermore, let
such that the convergence of (19) might be improved, given that the soft combination of the symbols' coordinates and their estimated associated probabilities is fed forward to the next layer.
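A simplified sketch of the per-layer probability update and soft combining is given below. It assumes a diagonal covariance, so that each entry can be treated with a scalar Gaussian likelihood followed by a softmax over the constellation coordinates; interference terms and the learnable parameter are deliberately left out, and the function names and numerical values are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def posteriors_and_soft_estimate(z, q, var):
    """z: (2Nt,) preprocessed signal, q: (sqrt(M),) coordinates, var: (2Nt,) diagonal variances."""
    logits = -((z[:, None] - q[None, :]) ** 2) / (2.0 * var[:, None])
    P = softmax(logits, axis=1)      # row i is the probability vector p_i
    a_soft = P @ q                   # soft combining of coordinates and probabilities
    return P, a_soft

q = np.array([-1.0, 1.0]) / np.sqrt(2.0)       # QPSK coordinates
z = np.array([0.7, -0.6, 0.0])                 # toy preprocessed values
var = np.full(3, 0.1)                          # toy diagonal covariance entries
P, a_soft = posteriors_and_soft_estimate(z, q, var)
hard_decision = q[np.argmax(P, axis=1)]        # coordinate with the highest probability
```

With a diagonal covariance the inverse reduces to an element-wise division, which is why this step avoids the matrix inversion required by the PDA detector.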
In Algorithm 2, we detail the general procedure carried out by the proposed DUPDA detector.
The ground truth used for training the NN is defined by \({{\mathbf {I}}_{\ell ^*} = [I\left( 0\right) \ I\left( 1\right) \ \ldots \ I(\sqrt{M} - 1)]^\text {T}}\), such that \({\mathcal {I}} = \{{\mathbf {I}}_{\ell ^*}\}_{\forall \ell ^*}\). It indicates the known constellation coordinates that are transmitted for the training procedure; thus, \(I_{\ell ^*}\left( m\right) \in \{0, 1\} \ \forall \ m\). Observe also that the PDA detector outputs approximate posteriors, as shown in (15), which are leveraged by our proposed DUPDA detector in Algorithm 2 when employing the categorical cross-entropy loss function:
Bear in mind that the loss is calculated considering the output of all L unfolded layers and not only the last one. Also, note that the use of (27) contrasts with the popular choice of the MSE loss function [5]. Additionally, it is a well-known fact that the cross-entropy loss function is more appropriate for classification tasks.
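As a hedged illustration of a categorical cross-entropy accumulated over all layers, the sketch below sums the loss over layers, symbols and constellation coordinates using one-hot ground-truth labels. Any normalization or per-layer weighting used in (27) is not reproduced here; the function name, the clipping constant `eps` and the toy inputs are assumptions.

```python
import numpy as np

def multilayer_cross_entropy(P_layers, I_onehot, eps=1e-12):
    """P_layers: (L, 2Nt, sqrt(M)) layer outputs; I_onehot: (2Nt, sqrt(M)) one-hot labels."""
    P = np.clip(P_layers, eps, 1.0)                    # avoid log(0)
    return -np.sum(I_onehot[None, :, :] * np.log(P))   # summed over all L layers

I_onehot = np.array([[1.0, 0.0],
                     [0.0, 1.0]])                      # two symbols, QPSK coordinates
P_layers = np.stack([
    np.full((2, 2), 0.5),                              # layer 1: uninformative output
    I_onehot,                                          # layer 2: perfect output
])
loss = multilayer_cross_entropy(P_layers, I_onehot)
```

In this toy case only the first layer contributes, giving a loss of \(2\ln 2\); penalizing intermediate layers in this way encourages early layers to produce useful probability vectors as well.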
3.4 Simplified DUPDA
The model of the DUPDA presented in the previous subsection can be simplified even further if some assumptions are made. Therefore, a new variation of the proposed DUPDA detector, namely the simplified DUPDA detector, is presented in this subsection. For this detector, the calculations performed in (23) are simplified and the scalar \(0.5\sigma ^2\) is applied directly in (22). The reasoning behind this approach lies in the asymptotic case, that is, when \(N_\text {t} \rightarrow \infty\) and \(N_\text {r} \rightarrow \infty\). For this case, the first term of (23) vanishes, since^{Footnote 2}
and similarly for the second term we have
which yields
wherein, for the sake of simplicity, the learnable parameter \(w_\ell\) is omitted. This is analogous to the channel hardening effect present in massive MIMO systems [2, 3], where values for \(N_\text {t}\) and \(N_\text {r}\) are large. Although we demonstrate via computational simulations in Sect. 4 that the simplified DUPDA only presents marginal losses in performance, it is still unknown if other similar architectures proposed in the literature [5, 6, 8, 13] are robust enough to allow such simplifications.
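The channel hardening analogy can be checked numerically with a small experiment. The sketch below assumes channel entries distributed as \({{\mathcal {C}}}{{\mathcal {N}}}(0,1/N_\text {r})\), keeps \(N_\text {t}\) fixed and increases \(N_\text {r}\), which is a simplification of the asymptotic regime discussed above; it measures how far the Gram matrix \({\mathbf {H}}^\text {T}{\mathbf {H}}\) of the real-valued channel is from the identity. The function name and the chosen sizes are illustrative.

```python
import numpy as np

def gram_deviation(Nt, Nr, rng):
    """Largest absolute entry of H^T H - I for a real-valued channel with CN(0, 1/Nr) entries."""
    Hc = (rng.normal(size=(Nr, Nt)) + 1j * rng.normal(size=(Nr, Nt))) / np.sqrt(2 * Nr)
    H = np.block([[Hc.real, -Hc.imag],
                  [Hc.imag,  Hc.real]])
    G = H.T @ H
    return np.max(np.abs(G - np.eye(2 * Nt)))

rng = np.random.default_rng(3)
dev_small = gram_deviation(Nt=4, Nr=8, rng=rng)       # moderately sized array
dev_large = gram_deviation(Nt=4, Nr=2048, rng=rng)    # very large receive array
```

As the receive array grows, the Gram matrix concentrates around the identity, so that channel-dependent terms become nearly deterministic; this is the kind of behavior that the simplified DUPDA exploits.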
3.5 Computational complexity
According to the guidelines presented in [4, §IV-C, p. 122404], the global computational complexity of the PDA detector is approximately given by
However, if we let \(N_\text {r} \gg \sqrt{M}\) and simplify constants, then it can be written more compactly as
Note that \({\mathcal {O}}(8N_\text {t}^3 + 16N_\text {t}^2 N_\text {r} + 4N_\text {t} N_\text {r})\) refers to the local cost of (8), where the inverse of \({\mathbf {G}}\) costs \({\mathcal {O}}(8N_\text {t}^3)\)^{Footnote 3} and \({{\mathcal {O}}(16N_\text {t}^4 + 8\sqrt{M}(N_\text {t}^3 + N_\text {t}^2))}\) is the complexity due to computing (11), for which \(\varvec{\Omega }_i^{1}\) costs \({\mathcal {O}}(8N_\text {t}^3)\) [9] per outer iteration in Algorithm 1.
Moreover, the DUPDA detector has an approximate global complexity of
Considering again that all constants are simplified and that \(N_\text {r} \gg \sqrt{M}\), (33) simplifies to
The global complexity is composed mainly of the local cost of (19), given by \({\mathcal {O}}(8N_\text {t} N_\text {r})\) per layer, and the local cost of (23), expressed by \({\mathcal {O}}(4N_\text {t}^2 + 8N_\text {t} N_\text {r} + N_\text {r})\) for each layer^{Footnote 4}. The NN training stage cost is not taken into account when calculating the computational complexity of the detection stage, since the training stage is assumed to be computed offline as discussed in [4]. Nevertheless, in general, the backpropagation algorithm used for training NNs has a complexity that scales linearly with the number of training samples, \(N_\text {TR}\), and training iterations, say \(N_\text {TI}\). More importantly, it scales exponentially with the number of layers L because of the chain rule derivatives calculated during backpropagation. In principle, this is a high complexity when compared with the detection or forward-pass complexity, but once trained, the NN-based detector may serve multiple users during a prescribed timeline [16]. This means that the training complexity cost is distributed over time and users, whereas the detection complexity is fixed for each user and transmission cycle. Hence, since training is not performed in the detection cycle, its complexity is not considered, enabling a fair comparison with other detectors.
Furthermore, recall that the simplified form of calculation demonstrated by (30) reduces even further the global complexity of the proposed DUPDA detector. More specifically, the global complexity of the simplified DUPDA detector is given approximately by \({\mathcal {O}}(LN_\text {t}N_\text {r})\), meaning that the cost is reduced by one order of magnitude when compared to the DUPDA detector.
From the computational complexity associated with each detector, it is possible to conclude that the PDA is more complex than the proposed DUPDA. More specifically, this cost difference is due to the higher-order term \(N_\text {t}^4\), included in the PDA global complexity. This is expected because of the matrix inversions performed by the PDA detector, which are not necessary for either the DUPDA or the simplified DUPDA detector. Also, notice that for both of these detectors, the total number of layers L might significantly increase their global complexity. It is demonstrated in Sect. 4, however, that this number is a multiple of \(N_\text {t}\), thus still implying a lower global complexity for the DUPDA when compared to the PDA. In fact, the simplified DUPDA complexity becomes even lower than that of the ZF in the aforementioned case. Additionally, an optimal detection sequence, such as (16), is not a general requirement for the DUPDA, which further reduces its global complexity in relation to the PDA.
Despite shedding light on how the detectors' computational complexities compare to each other, these are only asymptotic predictions of complexity. A detailed evaluation of system end-to-end latency [17, 18], for example, is out of scope in this work. However, it can be verified for a typical \(4 \times 8\) MIMO considered in Sect. 4 that the symbol detection (see Line 15 of Algorithm 2) of the DUPDA takes approximately 50 milliseconds on average with negligible variance. Note that this time value heavily depends on the implementation of the proposed detector, which in this work is based on the TensorFlow library [19] and is not yet optimized for a full-fledged hardware implementation. Indeed, implementations using a hardware description language (HDL) can provide a more reliable analysis of the end-to-end latency of the proposed detector.
For convenience, Table 1 summarizes the global computational complexity for all detectors of interest. Observe that the AMP detector and the sphere detector (SD) are also included for the sake of completeness. For the AMP, \(N_\text {I}\) refers to the number of iterations or updates executed, whereas for the SD we considered the fixed-complexity SD [3, §VIII-D, p. 20], since its performance is near-optimum. To conclude, note also in Table 1 how the complexity of all detectors increases polynomially with the number of transmitting antennas \(N_\text {t}\). The exceptions, however, are the MLD and the SD, whose complexity increases exponentially with \(N_\text {t}\) and \(\sqrt{N_\text {t}}\), respectively, as expected.
4 Numerical results and discussion
Before presenting numerical results on the detectors' performance, we list important system parameters in the following subsection.
4.1 System parameters
In this work, the following system parameters are adopted: (i) Before transmission, a frame of \(n_b\) data bits is encoded using the polar encoder [20] with a code rate of \({R < 1}\). Thus, \(n_b/R\) bits now represent the coded frame that is effectively transmitted; (ii) entries of the channel frequency response matrix, \({\mathbf {H}}\), are drawn from a complex Gaussian random process for all k subcarriers at each transmission of an OFDM frame and are normalized by \(1/\sqrt{N_\text {r}}\). Hence, we have \(H_{i,j} \sim {{\mathcal {C}}}{{\mathcal {N}}}\left( 0,1/N_\text {r}\right) , \ \forall \ i,j\) and, consequently, the system signal-to-noise ratio (SNR) per bit can be expressed as follows
which is henceforward assumed to be identical for all subcarriers.
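The channel normalization in item (ii) can be sketched as follows. The snippet draws independent \(N_\text {r} \times N_\text {t}\) matrices with \({{\mathcal {C}}}{{\mathcal {N}}}(0,1/N_\text {r})\) entries, one per subcarrier, and checks the empirical entry power; the function name `draw_channels`, the number of subcarriers `K` and the seed are assumptions, and the SNR expression itself is not reproduced.

```python
import numpy as np

def draw_channels(K, Nr, Nt, rng):
    """Return K independent Nr x Nt matrices with CN(0, 1/Nr) entries."""
    re = rng.normal(size=(K, Nr, Nt))
    im = rng.normal(size=(K, Nr, Nt))
    return (re + 1j * im) / np.sqrt(2 * Nr)   # variance 1/(2Nr) per real dimension

rng = np.random.default_rng(4)
K, Nr, Nt = 512, 8, 4                          # assumed number of subcarriers and antennas
H = draw_channels(K, Nr, Nt, rng)
entry_power = np.mean(np.abs(H) ** 2)          # empirical E[|H_ij|^2], close to 1/Nr
column_energy = np.mean(np.sum(np.abs(H) ** 2, axis=1))  # average energy per transmit antenna
```

With this normalization the average energy collected from each transmit antenna across the receive array is close to one, regardless of \(N_\text {r}\).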
The BER is employed for measuring the coded detectors' performance, which is obtained by averaging bit decision errors over multiple Monte Carlo experiments. Each experiment is generated using a computational simulation that involves: (i) the generation of \(n_b = 256\) equiprobable data bits; (ii) the encoding of data bits by the polar encoder, resulting in a codeword of \(\frac{256}{R}\) bits; (iii) mapping of coded bits into complex symbols \(\mathbf {{\tilde{a}}}_k \in {\mathbb {S}}^{N_\text {t}}\) for all k subcarriers; (iv) transmission of the OFDM frame; (v) the generation of normalized channel coefficients to form entries of the channel matrix \({\mathbf {H}}_k\); (vi) the generation of complex AWGN samples present in the receiver; (vii) the final decision in favor of the symbol coordinate associated with the highest probability value; and (viii) the subsequent decoding of decided symbols into bits via the polar decoder [21]. More specifically, we implement a tree-based architecture of a successive cancellation list decoder [22], with code rate equal to R.
For the sake of brevity, some algorithmic procedures^{Footnote 5} were omitted from Algorithm 2. However, it is worth mentioning that the DUPDA training is performed considering that SNR values are drawn from a uniform distribution \({{\mathcal {U}}[\min (\text {SNR}),\max (\text {SNR})]}\), as discussed in [4, §VI-A, p. 122405]. Additionally, it was decided heuristically to use a total of \(N_\text {TR} = 10^5\) samples for training and also that the DUPDA should include \(L = 4N_\text {t}\) layers^{Footnote 6}. More details about the proposed DUPDA hyperparameters can be found in Table 2. These parameters are used for all scenarios demonstrated in Sect. 4.2.
Furthermore, note that in this work we employ hard decoding for all detectors analyzed. However, in principle, soft decoding could also be integrated into the proposed DUPDA since soft outputs are available via (25) [11]. Nonetheless, for the proposed DUPDA, the hard decoding approach attains a better performance-complexity tradeoff, which is more aligned with the general aim of this work of proposing a low-complexity detector with affordable performance losses. This also allows for a fair comparison with algorithms that provide hard decoding sequences.
4.2 Performance results
Figure 2 shows the uncoded detection performance for all detectors presented in Table 1, considering a square \(4 \times 4\) MIMO system (Fig. 2a) and an underloaded [3] \(4 \times 8\) MIMO system (Fig. 2b), both of which employ quadrature phase shift keying (QPSK) (\(M=4\)) modulation.
The detection performance is given as a function of multiple SNR values, and it is defined as the probability of occurrence of any error in the received symbol vector. This is done because bits are not encoded for the scenarios analyzed in Fig. 2.
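The uncoded performance metric, that is, the probability of any error in the detected symbol vector, can be estimated from Monte Carlo trials as sketched below. The function name and the toy index arrays are hypothetical; each row is assumed to hold the constellation indices of one transmitted or detected vector.

```python
import numpy as np

def vector_error_rate(A_hat, A_true):
    """Fraction of trials in which at least one entry of the detected vector is wrong."""
    return np.mean(np.any(A_hat != A_true, axis=1))

# Four toy trials with 2Nt = 4 real-valued coordinates each (constellation indices).
A_true = np.array([[0, 1, 1, 0],
                   [1, 1, 0, 0],
                   [0, 0, 0, 1],
                   [1, 0, 1, 1]])
A_hat = A_true.copy()
A_hat[2, 3] = 0                      # a single wrong coordinate in the third trial
rate = vector_error_rate(A_hat, A_true)
```

Here one of the four vectors contains an error, so the estimate is 0.25; note that a vector with several wrong coordinates still counts as a single error event under this metric.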
Firstly, observe in Fig. 2a that the performance of the PDA detector adheres closely to that reported in the seminal work of [9], thus validating the simulation model. Moreover, notice that the DUPDA detector has shown a prohibitively poor performance for the \(4 \times 4\) MIMO scenario, which was also verified to be the case for other square MIMO systems. However, for the underloaded scenario demonstrated in Fig. 2b, where \(N_\text {r} \gg N_\text {t}\), the DUPDA detector presents better performance. All the same, if the relative performance of the DUPDA against the ZF and, particularly, the AMP detectors is taken into account, then Fig. 2a and b show that the DUPDA outperforms these detectors for most of the SNR range analyzed, while presenting a comparable detection complexity^{Footnote 7}. It was verified, however, that for the underloaded scenario of \(4 \times 8\) MIMO, the DUPDA detector reaches a performance floor of \(P\left( \mathbf {{\hat{a}}} \ne {\mathbf {a}}\right) \approx 3\times 10^{-3}\), from which no improvement can be obtained irrespective of how high the SNR is.
This motivated the integration of the Polar encoder described in Sect. 4.1, with the aim of improving the performance of the proposed DUPDA relative to other detectors. Fig. 3a illustrates the \(4 \times 8\) MIMO scenario again, as in Fig. 2, but now with Polar encoding at a code rate of \(R = 1/2\).
Fig. 3b complements this result by presenting the \(4 \times 16\) MIMO scenario with 16-QAM (\(M=16\)) modulation, considering the same code rate.
We begin by pointing out that the performance floor observed in Fig. 3a and b, although undesirable, is not as detrimental to the overall performance as in Fig. 2b. This happens because the introduced channel coding improves the performance over the whole SNR range under analysis. Therefore, the BER values where the DUPDA is better than the ZF and AMP now fall within the more interesting region of SNR \(< 10\) dB. Admittedly, the performance floor is still present in Fig. 3a and b, but now at low values of BER \(\approx 2\times 10^{-4}\) and BER \(\approx 2\times 10^{-5}\), respectively. These observations support the conjecture that the uncoded DUPDA detector is interference limited for high SNR values. In this SNR range, the distribution of (19) ceases to be approximately Gaussian because of the low AWGN levels and becomes defined for the most part by the non-Gaussian IAI distribution. This in turn violates the Gaussian distribution assumption mentioned in Sect. 3.1 regarding the PDA detector, which is the underlying algorithm of the proposed DUPDA detector. This explains the performance floor shown in Fig. 2b, which is partially mitigated by a robust coding scheme in Fig. 3. Furthermore, regarding the detection performance of the AMP detector in Figs. 2 and 3, one can see that this detector suffers from a severe performance floor at high SNR. This behavior is also explained by the reasoning described for the DUPDA: the violation of the Gaussian distribution assumption also severely affects the AMP detection performance [23].
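The loss of Gaussianity at high SNR can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the interference-plus-noise term is a weighted sum of independent, equiprobable \(\pm 1\) symbols plus zero-mean Gaussian noise; the exact form of (19) is not reproduced here and the gains are hypothetical. Under that assumption the fourth cumulants add, so the excess kurtosis (zero for a Gaussian variable) has a closed form.

```python
def excess_kurtosis(gains, noise_var):
    """Excess kurtosis of sum_i gains[i]*a_i + n, with a_i = +/-1 equiprobable
    and n ~ N(0, noise_var); a Gaussian variable has excess kurtosis 0."""
    second = sum(h ** 2 for h in gains)          # interference variance
    fourth = -2.0 * sum(h ** 4 for h in gains)   # 4th cumulant of the +/-1 terms
    return fourth / (second + noise_var) ** 2

gains = [0.5, 0.5, 0.5]                          # hypothetical interferer gains
k_low_snr = excess_kurtosis(gains, noise_var=1.0)
k_high_snr = excess_kurtosis(gains, noise_var=0.01)
```

In this toy model the excess kurtosis moves away from zero as the noise variance shrinks. This agrees with the qualitative argument that the Gaussian approximation degrades when interference, rather than AWGN, dominates.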
Moreover, note that Fig. 3 also depicts the detection performance of the simplified DUPDA detector. For this detector, the calculations performed in (23) are simplified, yielding (30). Although the dimensions of the MIMO systems illustrated in Fig. 3 are not large, the numerical BER results presented here show that the conclusions from Sect. 3.4 may still hold for a small number of antennas. Note in Fig. 3 that the detection performance of the simplified DUPDA detector is practically identical to the DUPDA detector's performance, except in the high SNR region, where the simplified DUPDA is marginally worse than the DUPDA detector.
Finally, note also that the complexity of the simplified DUPDA becomes even lower than that of the ZF and AMP detectors, especially when the number of layers used, \(L = 4N_\text {t}\), is considered. This makes the simplified DUPDA detector the least costly of all detectors analyzed in this work, as can be verified in Table 1. Yet it performs approximately 2 dB better than the ZF in Fig. 3a for SNR \(< 10\) dB, for example. More importantly, the simplified DUPDA largely improves upon the performance of the AMP detector, despite using operations similar to those described in (19).
Additionally, Fig. 4a shows the performance of the relevant detectors for the \(8 \times 16\) MIMO scenario considering QPSK modulation. Figure 4b in turn illustrates the detectors' performance, also considering QPSK modulation, for multiple values of the number of transmitting antennas, \(N_\text {t}\), with the number of receiving antennas, \(N_\text {r} = 12\), and the SNR \(= 7\) dB held fixed. Note that for this scenario we still assume the number of layers, L, of the DUPDA detector to be tied to \(N_\text {t}\), such that \(L = 2cN_\text {t}\). This is adopted because each layer in the DUPDA architecture outputs the posterior associated with one transmitted symbol, a by-product of the underlying PDA algorithm employed by the DUPDA detector. However, we verified through experiments that for \(c > 2\) no improvement in detection performance was obtained, while the training and detection complexity increased. Therefore, the value \(L = 4N_\text {t}\) defined in Sect. 4.1 was shown to be the most suitable one.
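The relation between the number of layers and the transmitted symbols can be sketched in a few lines of Python. The sketch assumes zero-based layer indices and that the layers visit the \(2N_\text {t}\) real-valued symbols cyclically and in order, which is our reading of the index wrap-around rule given in the Notes. The function names are hypothetical and the mapping is not claimed to be the authors' exact scheduling.

```python
def num_layers(c, n_t):
    """Total number of layers L = 2*c*n_t."""
    return 2 * c * n_t

def layer_to_symbol(layer, n_t):
    """Map a layer index to (sweep, symbol index) over 2*n_t real-valued symbols."""
    sweep, symbol = divmod(layer, 2 * n_t)
    return sweep, symbol

n_t = 4
L = num_layers(c=2, n_t=n_t)                          # L = 4 * n_t
schedule = [layer_to_symbol(l, n_t) for l in range(L)]
```

With \(c = 2\) every real-valued symbol is visited exactly twice. This matches the interpretation of \(c\) as the number of sweeps over the symbol vector.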
In Fig. 4a, it can be observed that, for the larger MIMO system, the proposed simplified DUPDA detector outperforms the ZF detector, particularly in the low BER \(< 10^{-3}\) region. It is important to remark that for higher values of SNR the performance floor of the coded simplified DUPDA is still present, remaining, however, at low BER values of approximately \(10^{-5}\). Moreover, note that the simplified DUPDA performance becomes worse relative to the PDA detector's performance as the SNR increases, but recall that the simplified DUPDA presents the lowest complexity (see Table 1). In addition, Fig. 4b shows that the simplified DUPDA detector performance varies approximately linearly with the number of transmitting antennas \(N_\text {t}\), while the performance of the ZF detector changes more abruptly with \(N_\text {t}\). This means that the proposed simplified DUPDA detector not only outperforms the more complex ZF, but is also more robust for all considered MIMO system dimensions, assuming a target BER of \(10^{-3}\).
5 Conclusion
In this work, we proposed a detector for MIMO systems based upon the deep unfolded architecture for NNs, namely the DUPDA detector. This detector unfolds iterations of the PDA algorithm in its layers, enhancing the model-driven PDA detector with the aid of its data-driven architecture.
It was shown that the DUPDA detector, as well as its simplified form, outperforms both the AMP and ZF detectors over most of the SNR range evaluated. This can be verified, for instance, in coded detection for the \(8\times 16\) MIMO system. At the same time, the global computational complexity of the simplified DUPDA detector is orders of magnitude lower than that of the ZF detector. Furthermore, the lack of matrix inverse computations in the DUPDA architecture not only reduces its cost, but also simplifies its implementation in practical systems. This is the case when, for example, channels are correlated, which increases the condition number of \({\mathbf {G}}\) and makes the computation of its inverse impractical.
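The effect of correlation on the condition number can be seen in a toy case. The sketch below uses a hypothetical \(2 \times 2\) symmetric matrix with unit diagonal and off-diagonal entry \(\rho\) as a stand-in for a correlated Gram-type matrix; it is not the \({\mathbf {G}}\) of this work. Its eigenvalues are \(1 \pm \rho\), so the 2-norm condition number is \((1+|\rho |)/(1-|\rho |)\).

```python
def condition_number_2x2(rho):
    """2-norm condition number of [[1, rho], [rho, 1]] for |rho| < 1."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must satisfy |rho| < 1")
    return (1.0 + abs(rho)) / (1.0 - abs(rho))

# Condition number for increasing correlation levels.
conds = {rho: condition_number_2x2(rho) for rho in (0.0, 0.5, 0.9, 0.99)}
```

In this toy model the condition number grows without bound as \(|\rho |\) approaches 1. That is the regime in which a numerically computed inverse becomes unreliable.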
For future research, it would be interesting to extend the scenarios and dimensions of the MIMO systems analyzed by increasing the number of transmitting and receiving antennas, evaluating practical underloaded and square MIMO systems alike. Moreover, integrating soft decoding into the proposed DUPDA can improve its performance and can be regarded as a natural progression of the research done in this work. The applicability of the proposed detector to MIMO systems that employ precoding is also an interesting topic for future work. Finally, given the flexibility of the deep unfolding architecture, we maintain that other MIMO detection schemes could benefit greatly from the principles laid out in this work, which makes them a promising topic for future research.
Notes
\(\{\ell ^* \in \{0,1,\ldots ,2N_\text {t} - 1\}, \ k \in \{1,2,\ldots ,\lceil L / 2N_\text {t}\rceil - 1\} \ \mid \ \ell ^* = \ell - k2N_\text {t}; \ k2N_\text {t} \le \ell < (k + 1)2N_\text {t}\}\)
We adopt the normalization of the channel matrix by \(1/\sqrt{N_\text {r}}\) as detailed in Sect. 4.
For the sake of brevity, we assume that the inverse of a matrix, say \({\mathbf {X}} \in {\mathbb {R}}^{N \times N}\), is computed by the well-known Gaussian elimination, whose cost is approximately \({\mathcal {O}}(N^3)\).
Note that the squared norm of a matrix \({\mathbf {X}} \in {\mathbb {R}}^{M \times N}\) can be written as \(\Vert {\mathbf {X}}\Vert _2^2 = \sum _{\forall i}{\sum _{\forall j}{X_{i,j}^2}}\); thus, its cost is \({\mathcal {O}}(MN)\).
As mentioned earlier, we used the TensorFlow library [19] to implement a customized deep unfolded NN model. The implementation code can be found at https://github.com/PedroSouzaINATEL/DUPDAcoded.git.
It was verified that the PDA algorithm converges within an average of 2 iterations in Algorithm 1 (with \(\epsilon = 10^{-3}\)) for all scenarios of interest. Therefore, there is no loss of generality when comparing both detectors' costs in the context of the results presented in this section.
Note here that we consider \(L = 4N_\text {t}\) as stated in Sect. 4.1, making \(N_\text {t}^3\) the highest-order term within the DUPDA complexity. Additionally, we also considered \(N_\text {I} = 50\) [8, §IV-A, p. 5] for the AMP detector, which clearly implies \(N_\text {I} \gg N_\text {t}\) and, consequently, also a polynomial whose highest-order term is cubic.
Abbreviations
4G: Fourth generation of mobile networks
5G: Fifth generation of mobile networks
6G: Sixth generation of mobile networks
AMP: Approximate message passing
AWGN: Additive white Gaussian noise
BER: Bit error rate
CP: Cyclic prefix
DFT: Discrete Fourier transform
DNN: Deep neural network
DUPDA: Deep unfolded PDA
HDL: Hardware description language
IAI: Inter-antenna interference
iid: Independent identically distributed
LTE: Long-term evolution
MF: Matched filter
MIMO: Multiple-input multiple-output
MLP: Multilayer perceptron
MMSE: Minimum mean square error
ML: Machine learning
MLD: Maximum likelihood detector
MSE: Mean-squared error
NN: Neural network
OFDM: Orthogonal frequency division multiplexing
PDA: Probability data association
QAM: Quadrature amplitude modulation
QPSK: Quadrature phase shift keying
RV: Random variable
SD: Sphere detector
SIC: Successive interference cancellation
SISO: Single-input single-output
SNR: Signal-to-noise ratio
ZF: Zero-forcing
References
J. Jeon, G. Lee, A.A.I. Ibrahim, J. Yuan, G. Xu, J. Cho, E. Onggosanusi, Y. Kim, J. Lee, J.C. Zhang, MIMO evolution toward 6G: modular massive MIMO in low-frequency bands. IEEE Commun. Mag. 59(11), 52–58 (2021). https://doi.org/10.1109/MCOM.211.2100164
M.A. Albreem, M. Juntti, S. Shahabuddin, Massive MIMO detection techniques: a survey. IEEE Commun. Surveys Tutor. 21(4), 3109–3132 (2019). https://doi.org/10.1109/COMST.2019.2935810
S. Yang, L. Hanzo, Fifty years of MIMO detection: the road to large-scale MIMOs. IEEE Commun. Surveys Tutor. 17(4), 1941–1988 (2015). https://doi.org/10.1109/COMST.2015.2475242
P.H.C. De Souza, L.L. Mendes, M. Chafii, Compressive learning in communication systems: a neural network receiver for detecting compressed signals in OFDM systems. IEEE Access 9, 122397–122411 (2021). https://doi.org/10.1109/ACCESS.2021.3108061
A. Balatsoukas-Stimming, C. Studer, Deep unfolding for communications systems: a survey and some new directions. In: 2019 IEEE International Workshop on Signal Processing Systems (SiPS), pp. 266–271 (2019). https://doi.org/10.1109/SiPS47522.2019.9020494
Q.V. Pham, N.T. Nguyen, T. Huynh-The, L. Le Bao, K. Lee, W.J. Hwang, Intelligent radio signal processing: a survey. IEEE Access 9, 83818–83850 (2021). https://doi.org/10.1109/ACCESS.2021.3087136
C. Liu, J. Thompson, T. Arslan, A deep unfolding network for massive multi-user MIMO-OFDM detection. In: 2022 IEEE Wireless Communications and Networking Conference (WCNC), pp. 2405–2410 (2022). https://doi.org/10.1109/WCNC51071.2022.9771554
M. Khani, M. Alizadeh, J. Hoydis, P. Fleming, Adaptive neural signal detection for massive MIMO. IEEE Trans. Wireless Commun. 19(8), 5635–5648 (2020). https://doi.org/10.1109/TWC.2020.2996144
D. Pham, K.R. Pattipati, P.K. Willett, J. Luo, A generalized probabilistic data association detector for multiple antenna systems. IEEE Commun. Lett. 8(4), 205–207 (2004). https://doi.org/10.1109/LCOMM.2004.823405
M.A. Albreem, A.H.A. Habbash, A.M. Abu-Hudrouss, S.S. Ikki, Overview of precoding techniques for massive MIMO. IEEE Access 9, 60764–60801 (2021). https://doi.org/10.1109/ACCESS.2021.3073325
S. Yang, T. Lv, R.G. Maunder, L. Hanzo, From nominal to true a posteriori probabilities: an exact Bayesian theorem based probabilistic data association approach for iterative MIMO detection and decoding. IEEE Trans. Commun. 61(7), 2782–2793 (2013). https://doi.org/10.1109/TCOMM.2013.053013.120427
A. Zappone, M. Di Renzo, M. Debbah, Wireless networks design in the era of deep learning: model-based, AI-based, or both? IEEE Trans. Commun. 67(10), 7331–7376 (2019). https://doi.org/10.1109/TCOMM.2019.2924010
H. He, C.K. Wen, S. Jin, G.Y. Li, Model-driven deep learning for MIMO detection. IEEE Trans. Signal Process. 68, 1702–1715 (2020). https://doi.org/10.1109/TSP.2020.2976585
D.L. Donoho, A. Maleki, A. Montanari, Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences 106(45), 18914–18919 (2009). https://doi.org/10.1073/pnas.0909892106
J. Ma, L. Ping, Orthogonal AMP. IEEE Access 5, 2020–2033 (2017). https://doi.org/10.1109/ACCESS.2017.2653119
S. Ali, W. Saad, N. Rajatheva, K. Chang, D. Steinbach, B. Sliwa, C. Wietfeld, K. Mei, H. Shiri, H.J. Zepernick, T.M.C. Chu, I. Ahmad, J. Huusko, J. Suutala, S. Bhadauria, V. Bhatia, R. Mitra, S. Amuru, R. Abbas, B. Shao, M. Capobianco, G. Yu, M. Claes, T. Karvonen, M. Chen, M. Girnyk, H. Malik, 6G White Paper on Machine Learning in Wireless Communication Networks. arXiv:2004.13875 (2020)
J. Chen, X. Ran, Deep learning with edge computing: a review. Proc. IEEE 107(8), 1655–1674 (2019). https://doi.org/10.1109/JPROC.2019.2921977
C. Zhang, P. Patras, H. Haddadi, Deep learning in mobile and wireless networking: a survey. IEEE Commun. Surveys Tutor. 21, 2224–2287 (2019)
M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D.G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, X. Zheng, TensorFlow: A system for large-scale machine learning. arXiv:1605.08695 (2016). https://doi.org/10.48550/ARXIV.1605.08695
D.A. Guimarães, Digital Transmission: A Simulation-Aided Introduction with VisSim/Comm. Springer, Berlin Heidelberg (2009). https://doi.org/10.1007/9783642013591
K. Besser, Digcommpy 0.9. https://pypi.org/project/digcommpy/ Accessed 2022-06-07
I. Tal, A. Vardy, List decoding of polar codes. IEEE Trans. Inf. Theory 61(5), 2213–2226 (2015). https://doi.org/10.1109/TIT.2015.2410251
E. Beck, C. Bockelmann, A. Dekorsy, CMDNet: learning a probabilistic relaxation of discrete variables for soft detection with low complexity. IEEE Trans. Commun. 69(12), 8214–8227 (2021). https://doi.org/10.1109/TCOMM.2021.3114682
Acknowledgements
Not applicable.
Funding
This work was partially supported by RNP, with resources from MCTIC, Grant No. 01245.010604/202014, under the 6G Mobile Communications Systems project of the Radiocommunication Reference Center (Centro de Referência em Radiocomunicações - CRR) of the National Institute of Telecommunications (Instituto Nacional de Telecomunicações - Inatel), Brazil, FAPESP Grant No. 20/051272 under the SAMURAI project, CNPq-Brazil and CAPES.
Author information
Contributions
Both authors contributed equally to this publication.
P. H. C. de Souza
P. H. C. S. was born in Santa Rita do Sapucaí, Minas Gerais, MG, Brazil, in 1992. He received the BS and MS degrees in telecommunications engineering from the National Institute of Telecommunications - INATEL, Santa Rita do Sapucaí, in 2015 and 2017, respectively, and is currently working toward the PhD degree in telecommunications engineering at INATEL. During 2014, he was a Hardware Tester with the INATEL Competence Center - ICC. His main interests are digital communication systems, mobile telecommunications systems, 6G, cognitive radio, convex optimization for telecommunication systems, compressive sensing/learning, embedded systems and embedded hardware/firmware.
L. L. Mendes
L. L. M. received the BSc and MSc degrees from Inatel, Brazil, in 2001 and 2003, respectively, and the Doctor degree from Unicamp, Brazil, in 2007, all in electrical engineering. Since 2001, he has been a Professor with Inatel, where he acted as the Technical Manager of the Hardware Development Laboratory from 2006 to 2012. From 2013 to 2015, he was a Visiting Researcher with the Technical University of Dresden in the Vodafone Chair Mobile Communications Systems, where he carried out his postdoctoral research. In 2017, he was elected Research Coordinator of the 5G Brazil Project, an association involving industries, telecom operators and academia which aims to fund and build an ecosystem toward 5G in Brazil. He is also the technical coordinator of the Brazil 6G Project.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Souza, P.H.C.d., Mendes, L.L. Low-complexity deep unfolded neural network receiver for MIMO systems based on the probability data association detector. J Wireless Com Network 2022, 69 (2022). https://doi.org/10.1186/s13638022021520