# Decoupled signal detection for the uplink of massive MIMO in 5G heterogeneous networks

- Leonel Arévalo^{1} (corresponding author)
- Rodrigo C. de Lamare^{1}
- Martin Haardt^{2}
- Raimundo Sampaio-Neto^{1}

**2017**:131

https://doi.org/10.1186/s13638-017-0916-1

© The Author(s) 2017

**Received:** 8 October 2016

**Accepted:** 8 July 2017

**Published:** 21 July 2017

## Abstract

Massive multiple-input multiple-output (MIMO) systems are strong candidates for future fifth-generation (5G) heterogeneous cellular networks. For 5G, a network densification with a high number of different classes of users and data service requirements is expected. Such a large number of connected devices needs to be separated in order to allow the detection of the transmitted signals according to different data requirements. In this paper, a decoupled signal detection (DSD) technique which allows the separation of the uplink signals, for each user class, at the base station (BS) is proposed for massive MIMO systems. A mathematical signal model for massive MIMO systems with centralized and distributed antennas in heterogeneous networks is also developed. The performance of the proposed algorithm is evaluated and compared with existing detection schemes in a realistic scenario with distributed antennas. A sum-rate analysis and a computational cost study for DSD are also presented. Simulation results show an excellent performance of the proposed algorithm when combined with linear and successive interference cancellation detection techniques.

### Keywords

- Massive multiple-input multiple-output (MIMO) systems
- Heterogeneous networks
- Fifth-generation (5G) cellular networks
- Decoupled signal detection (DSD)

## 1 Introduction

Large-scale multiple-input multiple-output (MIMO) systems, also known as massive MIMO, are a promising technology that uses a large number of antennas to serve a large number of user terminals at the same time without requiring extra bandwidth resources [1–4]. This larger-scale version of traditional MIMO systems, in which only a restricted number of antennas is used, is designed to exploit the extra degrees of freedom obtained by the use of more antennas [5]. Massive MIMO can increase the spectral efficiency 10 times or more when compared with its predecessor [6]. In the Long Term Evolution (LTE) standard [7], which uses traditional MIMO systems with as many as eight antenna ports at the base station (BS) and operates in the frequency-division duplex (FDD) mode, the users estimate the channel response and feed it back to the BS. For massive MIMO with hundreds of antenna elements, this might not be feasible, because the number of channel coefficients that each user needs to estimate is proportional to the number of antennas at the BS. In this paper, we focus on the uplink. The reason is that the most natural transmission mode for massive MIMO is the time-division duplexing (TDD) mode, in which reciprocity between the uplink and downlink channels can be obtained if appropriate calibration techniques are used to combat the distortions induced by hardware imperfections, and in which the channel between the users’ terminals and the BS is estimated at the BS, which can offer more processing resources. On the downlink, it is possible to use different precoding schemes to mitigate the interference in the signals received at the mobile users. Such precoding schemes rely on the channels estimated on the uplink.

One of the main research challenges of massive MIMO is to develop computationally simple ways to process the large number of signals received at the BS. The interference between antennas and users, propagation effects such as correlation, path loss and shadowing, thermal noise, and signal degradation due to hardware imperfections all need to be suppressed. Linear detection techniques such as maximum ratio combining (MRC) and zero forcing (ZF) are a good option in terms of computational complexity; however, their performance is not compatible with the growing demand for high data rates. The performance of linear detectors can be improved using nonlinear suboptimal detectors based on successive interference cancellation (SIC) [8], e.g., multibranch SIC (MB-SIC) [9, 10] and multifeedback SIC (MF-SIC) [11]. However, SIC-based detectors have a considerable computational cost for high-dimensional systems. If the signals from all active users are coupled in the detection process, the BS could spend unnecessary processing resources, since some types of users might not require a very high performance.

In the next generation of wireless communication systems [12], it is expected that a large number of users with different configurations and requirements are connected to the network. Therefore, it is necessary to design heterogeneous networks capable of interconnecting the different user types with each other [13]. The received signals from this large number of connected devices such as metering equipment, sensors, environmental monitoring devices, health care gadgets, security management products, smart grid components, smart phones, and tablets need to be separated in order to detect the transmitted information according to their different data requirements. In this context, distributed antenna systems (DAS) with massive MIMO are a promising alternative for the 5G cellular architecture [14], where the BS will be equipped with a large number of antennas and some remote antenna arrays or radio heads will be distributed around the cell and connected to the BS via optical fiber. The signals associated with different remote antenna arrays are processed at the BS. DAS have low path loss effects and improve the coverage and the spectral efficiency [15, 16]. The energy consumption of users is reduced and the transmission quality is improved due to the shorter distances between users and some remote antenna arrays. For this vision of 5G wireless networks, which includes a combination of massive MIMO, heterogeneous networks, and distributed antenna systems, efficient signal processing techniques at the BS are necessary.

In this paper, we propose an algorithm for the uplink of massive MIMO systems to separate the combined received signal of all users at the BS into independent signals for each user class. The proposed decoupled signal detection (DSD) applies a decomposition into multiple independent single-user-class signals, where all users in a class have the same data requirements and a common complex modulation. Assuming that the channel state information (CSI) was previously estimated, DSD employs a common channel inversion and a QR decomposition to decouple the received signal. With the proposed algorithm, the computational cost of the signal processing is reduced and the BS gains flexibility in the choice of detection procedures. A signal model for heterogeneous networks with different classes of users and an arbitrary configuration of centralized antenna systems (CAS) and DAS is also introduced in this paper. A sum-rate analysis and a computational complexity study for the proposed DSD are presented. The performance of the proposed scheme is evaluated in a realistic scenario with propagation effects and compared with existing approaches.

The remainder of this paper is organized as follows. In Section 2 the proposed massive MIMO signal model is presented. The proposed DSD scheme is presented in Section 3. The sum rate analysis for the DSD scheme is described in Section 4. Section 5 presents simulation results. Section 6 gives the conclusions.

## 2 Massive MIMO signal model

We consider the uplink of a massive MIMO system with *N* different classes of active users simultaneously transmitting signals to one base station (BS) equipped with *D* remote antenna arrays distributed around the cell and \(N_{B}\) receive antennas at the BS. Each remote antenna array has *Q* antennas linked to the BS via wired links. Therefore, the total number of receive antennas is \(N_{r}=N_{B}+DQ\). The choice of \(N_{B}\), *D*, and *Q* is made based on the features of the network and the type of application scenario. For example, suppose that we have a city with a high density of users or devices in the center of a cell and sparsely distributed users or devices in the remaining part of the cell. In this case, we could use a number of centralized antennas to deal with the high density of users and distributed antennas to serve the remaining devices. The cardinality \(\mid C_{n}\mid \) of the *n*th user class represents the number of users in class *n*. The total number of active users is given by \(K={\sum \nolimits }_{n=1}^{N}\mid C_{n}\mid \). The *k*th user in the *n*th user class transmits data divided into \(N_{t_{k,n}}\) sub-streams through \(N_{t_{k,n}}\) antennas, where \(N_{r}\geqslant N_{t}={\sum \nolimits }_{n=1}^{N}{\sum \nolimits }_{k=1}^{\mid C_{n}\mid }N_{t_{k,n}} \) and \(N_{t}\) is the total number of transmit antennas. The received signal vector at the BS from all active users in all user classes is given by

\[\mathbf{y}=\sum_{n=1}^{N}\sum_{k=1}^{\mid C_{n}\mid }\boldsymbol{\Upsilon}_{k,n}\bar{\mathbf{H}}_{k,n}\mathbf{s}_{k,n}+\mathbf{n}, \qquad (1)\]

where \(\mathbf{s}_{k,n}\) is the \(N_{t_{k,n}}\times 1\) signal vector transmitted by the *k*th user of the *n*th user class in one time slot, with entries taken from a complex constellation denoted by \(\mathcal {A}=\{a_{1}, a_{2}, \ldots, a_{O} \}\). Each symbol carries *M* bits and \(O=2^{M}\). The vector **n** is an \(N_{r}\times 1\) zero-mean circularly symmetric complex Gaussian noise vector with covariance matrix \(\mathbf {K}_{\mathbf {n}}=\mathbb {E}\left [\mathbf {n}\mathbf {n}^{\mathcal {H}}\right ]=\sigma _{n}^{2}\mathbf {I}\). Moreover, \(\bar {\mathbf {H}}_{k,n}\) is the \(N_{r} \times N_{t_{k,n}}\) channel matrix of the *k*th user in class *n*, with elements \(\bar {h}_{i,j}^{(k,n)}\) corresponding to the complex channel gain from the *j*th transmit antenna of the *k*th user to the *i*th receive antenna. For the antenna elements located at the BS and at each remote radio head, the *D*+1 sub-matrices of \(\bar {\mathbf {H}}_{k,n}=\left [\left (\bar {\mathbf {H}}_{k,n}^{(1)}\right)^{T}, \left (\bar {\mathbf {H}}_{k,n}^{(2)}\right)^{T}, \ldots, \left (\bar {\mathbf {H}}_{k,n}^{(D+1)}\right)^{T}\right ]^{T}\) can be modeled using the Kronecker channel model [18], expressed by

\[\bar{\mathbf{H}}_{k,n}^{(j)}=\left(\mathbf{R}_{r}^{(j)}\right)^{1/2}\mathbf{G}_{k,n}^{(j)}\,\mathbf{R}_{t_{k,n}}^{1/2}, \qquad (2)\]

where \(\mathbf{G}_{k,n}^{(j)}\) is the matrix for the link between the *k*th user and the *j*th radio head, obtained from an independent and identically distributed random fading model whose coefficients are complex Gaussian random variables with zero mean and unit variance. \(\mathbf {R}_{r}^{(j)}\) and \(\mathbf {R}_{t_{k,n}}\) denote the receive correlation matrix of the *j*th radio head and the transmit correlation matrix, respectively. The components of the correlation matrices \(\mathbf {R}_{r}^{(j)}\) and \(\mathbf {R}_{t_{k,n}}\) are modeled as a variation of the model described in equations (3)-(5) of reference [18]:

\[\left[\mathbf{R}\right]_{i,j}=\rho^{(i-j)^{2}},\qquad i,j=1,\ldots,N_{a}, \qquad (3)\]

where \(N_{a}\) is the number of antennas and *ρ* is the correlation coefficient of neighboring antennas (\(\rho =\rho _{t_{x}}\) for the transmit antennas and \(\rho =\rho _{r_{x}}\) for the receive antennas), i.e., a decay of the correlation with antenna separation that is faster than exponential was adopted. Note that *ρ*=0 represents an uncorrelated scenario and *ρ*=1 implies a fully correlated scenario. The \(N_{r}\times N_{r}\) diagonal matrix \(\boldsymbol{\Upsilon}_{k,n}\) represents the large-scale propagation effects for user *k* of user class *n*, such as path loss and shadowing, given by

\[\boldsymbol{\Upsilon}_{k,n}=\text{diag}\left(\gamma_{k,n}^{1}\mathbf{I}_{N_{B}},\ \gamma_{k,n}^{2}\mathbf{I}_{Q},\ \ldots,\ \gamma_{k,n}^{D+1}\mathbf{I}_{Q}\right), \qquad (4)\]

where the parameter \(\gamma_{k,n}^{j}\) denotes the large-scale propagation effects from the *k*th user to the *j*th radio head described by

\[\gamma_{k,n}^{j}=\alpha_{k,n}^{j}\,\beta_{k,n}^{j}, \qquad (5)\]

\[\alpha_{k,n}^{j}=\sqrt{\frac{L_{k,n}^{j}}{\left(d_{k,n}^{j}\right)^{\tau}}}, \qquad (6)\]

where \(L_{k,n}^{j}\) is the power path loss of the link between the user and the *j*th radio head, \(d_{k,n}^{j}\) is the relative distance between this user and the *j*th radio head, and *τ* is the path loss exponent chosen between 2 and 4 depending on the environment. The log-normal random variable \(\beta _{k,n}^{j}\), which represents the shadowing between user *k* and the receiver, is given by

\[\beta_{k,n}^{j}=10^{\mu_{k,s}\vartheta^{j}_{k,s}/10}, \qquad (7)\]

where \(\mu_{k,s}\) is the shadowing spread in decibels and \(\vartheta ^{j}_{k,s}\) corresponds to a real-valued Gaussian random variable with zero mean and unit variance. Since the \(N_{r}\times N_{t_{k,n}}\) composite channel matrix includes large-scale and small-scale fading effects, it can be denoted as \(\mathbf {H}_{k,n}=\boldsymbol {\Upsilon }_{k,n}\bar {\mathbf {H}}_{k,n}\), and the expression in (1) can be written as

\[\mathbf{y}=\sum_{n=1}^{N}\sum_{k=1}^{\mid C_{n}\mid }\mathbf{H}_{k,n}\mathbf{s}_{k,n}+\mathbf{n}=\sum_{n=1}^{N}\mathbf{H}_{n}\mathbf{s}_{n}+\mathbf{n}, \qquad (8)\]

where \(\mathbf{H}_{n}=\left[\mathbf{H}_{1,n}\ \mathbf{H}_{2,n}\ldots \mathbf{H}_{\mid C_{n}\mid ,n}\right]\) and \(\mathbf{s}_{n}=\left[\mathbf{s}_{1,n}^{T}\ \mathbf{s}_{2,n}^{T}\ldots \mathbf{s}_{\mid C_{n}\mid ,n}^{T}\right]^{T}\) are the \(N_{r}\times N_{t_{n}}\) channel matrix and the \(N_{t_{n}}\times 1\) transmitted symbol vector of user class *n*, respectively, with \(N_{t_{n}}={\sum \nolimits }_{k=1}^{\mid C_{n}\mid }N_{t_{k,n}}\). The received signal vector can be expressed more conveniently as

\[\mathbf{y}=\mathbf{H}\mathbf{s}+\mathbf{n}, \qquad (9)\]

where \(\mathbf{H}=\left[\mathbf{H}_{1}\ \mathbf{H}_{2}\ldots \mathbf{H}_{N}\right]\) and \(\mathbf {s}=\left [\mathbf {s}_{1}^{T} \ \ \mathbf {s}_{2}^{T} \ldots \mathbf {s}_{N}^{T} \right ]^{T}\). The symbol vector **s** of all *N* user classes has zero mean and a covariance matrix \(\mathbf {K}_{\mathbf {s}}=\mathbb {E}\left [\mathbf {s}\mathbf {s}^{\mathcal {H}}\right ]=\text {diag}(\mathbf {p})\), where the elements of the vector **p** are the signal powers of the transmit antennas. To keep the notation simple in the subsequent analysis, we assume that all antenna elements at the users transmit with the same average power \(\sigma _{s}^{2}\), i.e., \(\mathbf {K}_{\mathbf {s}}=\sigma _{s}^{2}\mathbf {I}\). We assume that the channel matrix **H** was previously estimated at the BS. From (9) we can see that the signals arrive coupled at the BS. If we want to use different detection procedures for each user class according to its data requirements, we have to separate the received signal vector **y** into independent received signals for each user class. For the system model presented in this work, when the number of remote radio heads is set to zero, i.e., *D*=0, the DAS architecture reduces to the CAS scheme with \(N_{r}=N_{B}\).
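As a rough illustration of the channel model in this section, the following Python/NumPy sketch draws one composite channel matrix for a single user. It assumes correlation entries of the form ρ^((i−j)²), a path-loss term sqrt(L/d^τ), a log-normal shadowing term 10^(μϑ/10), and that the BS antennas occupy the first block of rows. The function names and the example parameter values are illustrative and are not taken from the paper.

```python
import numpy as np

def correlation_matrix(num_antennas, rho):
    """Correlation matrix with entries rho**((i - j)**2) (assumed form)."""
    idx = np.arange(num_antennas)
    return rho ** ((idx[:, None] - idx[None, :]) ** 2)

def sqrtm_psd(mat):
    """Hermitian square root of a positive semidefinite matrix."""
    w, v = np.linalg.eigh(mat)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def large_scale_gain(rng, path_loss, distance, tau, spread_db):
    """Path loss times log-normal shadowing for one user/radio-head link."""
    alpha = np.sqrt(path_loss / distance ** tau)
    beta = 10.0 ** (spread_db * rng.standard_normal() / 10.0)
    return alpha * beta

def user_channel(rng, n_b, d, q, n_tx, rho_rx, rho_tx, gains):
    """Composite N_r x n_tx channel of one user (Kronecker model per radio head).

    gains holds D + 1 large-scale coefficients, BS first, then the D arrays.
    """
    sizes = [n_b] + [q] * d
    rt_half = sqrtm_psd(correlation_matrix(n_tx, rho_tx))
    blocks = []
    for size, gain in zip(sizes, gains):
        # i.i.d. complex Gaussian fading with zero mean and unit variance
        g = (rng.standard_normal((size, n_tx))
             + 1j * rng.standard_normal((size, n_tx))) / np.sqrt(2)
        rr_half = sqrtm_psd(correlation_matrix(size, rho_rx))
        blocks.append(gain * (rr_half @ g @ rt_half))
    return np.vstack(blocks)

# Example: N_B = 8 antennas at the BS, D = 2 arrays with Q = 4 antennas each.
rng = np.random.default_rng(0)
gains = [large_scale_gain(rng, 0.7, 0.3, 2.0, 3.0) for _ in range(3)]
H_user = user_channel(rng, n_b=8, d=2, q=4, n_tx=2,
                      rho_rx=0.2, rho_tx=0.4, gains=gains)
```

Stacking such per-user matrices column-wise, class by class, would give the combined matrix used in the rest of the paper.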

## 3 Decoupled signal detection

As presented in Section 2, in heterogeneous networks different classes of users send parallel data streams through the massive MIMO channel operating with distributed antennas, and these streams arrive superposed at the BS. In this 5G context, we need to separate the data streams of each category of users efficiently. In this section, we describe the proposed decoupled signal detection (DSD), which allows us to separate the received signal of the *n*th user class from the others. To this end, we consider that authentication, identification, and channel estimation have already been carried out, i.e., the BS is able to group the users into classes according to their data requirements. Approaches similar to the proposed algorithm have been proposed for the downlink, such as block diagonalization (BD)-based techniques [19–22]. However, unlike prior work in which downlink BD is used, the proposed DSD scheme does not require any precoding at the transmit side. The receiver only needs to know the channels between the users and the receive antennas. Moreover, separating the users of a heterogeneous network into classes according to their requirements is a new approach. The first steps toward the concept proposed in this paper were presented in [23, 24].

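The received signal vector in (9) can be rewritten to single out the *n*th user class as

\[\mathbf{y}=\mathbf{H}_{n}\mathbf{s}_{n}+\sum_{\substack{m=1\\ m\neq n}}^{N}\mathbf{H}_{m}\mathbf{s}_{m}+\mathbf{n}, \qquad (10)\]

where \(\mathbf{H}_{n}\) and \(\mathbf{s}_{n}\) are the channel matrix and the transmitted symbol vector for the *n*th user class, respectively. From (10), we can see that the *n*th user class has inter-user class interference.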

### 3.1 Proposed decoupling strategy

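To separate the received signal of the *n*th user class, we can employ a linear operation to project the received signal vector **y** onto the subspace orthogonal or almost orthogonal to the subspace generated by the signals of the interfering classes. In DSD, a matrix \(\mathbf{A}_{n}\) is calculated employing a channel inversion method and a QR decomposition [25, 26], in order to decouple the received signal of the *n*th user class from the signals of the other user classes. To compute \(\mathbf{A}_{n}\), we construct the matrix \(\tilde {\mathbf {H}}_{n}\) excluding the channel matrix of the *n*th user class in the following form:

\[\tilde{\mathbf{H}}_{n}=\left[\mathbf{H}_{1}\ \ldots\ \mathbf{H}_{n-1}\ \mathbf{H}_{n+1}\ \ldots\ \mathbf{H}_{N}\right], \qquad (11)\]

where \(\tilde{\mathbf{H}}_{n}\in \mathbb{C}^{N_{r}\times (N_{t}-N_{t_{n}})}\) contains the channel matrices of all user classes except the *n*th user class. After that, the objective is to obtain a matrix \(\mathbf{A}_{n}\) that satisfies the following condition:

\[\mathbf{A}_{n}\tilde{\mathbf{H}}_{n}=\mathbf{0}. \qquad (12)\]

To obtain \(\mathbf{A}_{n}\), DSD first computes the MMSE channel inversion of the combined channel matrix **H** given by

\[\mathbf{H}^{\dagger}=\left(\mathbf{H}^{\mathcal{H}}\mathbf{H}+\frac{\sigma_{n}^{2}}{\sigma_{s}^{2}}\mathbf{I}\right)^{-1}\mathbf{H}^{\mathcal{H}}, \qquad (13)\]

where \(\mathbf{H}^{\dagger}\in \mathbb{C}^{N_{t}\times N_{r}}\) can be partitioned into row blocks \(\mathbf{H}_{n}^{\dagger}\in \mathbb{C}^{N_{t_{n}}\times N_{r}}\), one for each user class. To separate the *n*th user group from the other user groups, we employ a QR decomposition as described by

\[\mathbf{H}_{n}^{\dagger}=\mathbf{R}_{n}\mathbf{Q}_{n}, \qquad (15)\]

where \(\mathbf{R}_{n}\in \mathbb{C}^{N_{t_{n}}\times N_{t_{n}}}\) is a triangular matrix and \(\mathbf{Q}_{n}\in \mathbb{C}^{N_{t_{n}}\times N_{r}}\) has orthonormal rows. Since \(\mathbf{H}^{\dagger}\mathbf{H}\approx \mathbf{I}\), we have \(\mathbf{H}_{n}^{\dagger}\tilde{\mathbf{H}}_{n}\approx \mathbf{0}\) and therefore \(\mathbf{Q}_{n}\tilde{\mathbf{H}}_{n}=\mathbf{R}_{n}^{-1}\mathbf{H}_{n}^{\dagger}\tilde{\mathbf{H}}_{n}\approx \mathbf{0}\), so that \(\mathbf{Q}_{n}\) is a good approximation for \(\mathbf{A}_{n}\) in (12). Using \(\mathbf{A}_{n}=\mathbf{Q}_{n}\) as a linear combination with the received signal vector in (10), we have

\[\mathbf{Q}_{n}\mathbf{y}=\mathbf{Q}_{n}\mathbf{H}_{n}\mathbf{s}_{n}+\mathbf{Q}_{n}\sum_{\substack{m=1\\ m\neq n}}^{N}\mathbf{H}_{m}\mathbf{s}_{m}+\mathbf{Q}_{n}\mathbf{n}, \qquad (17)\]

where \(\mathbf{Q}_{n}\mathbf{H}_{n}\mathbf{s}_{n}\) is the desired signal of user class *n* and the term \(\mathbf {Q}_{n}{{\sum \nolimits }_{\substack {m=1\\ m\neq n}}^{N}} \mathbf {H}_{m}\mathbf {s}_{m}\ \approx \mathbf {0}\) represents the residual inter-user class interference. Then, we can transform the received signal vector into parallel single-user class signals as described by

\[\mathbf{y}_{n}=\mathbf{Q}_{n}\mathbf{y}=\check{\mathbf{H}}_{n}\mathbf{s}_{n}+\mathbf{n}_{n},\qquad n=1,\ldots,N, \qquad (18)\]

where \(\check {\mathbf {H}}_{n}=\mathbf {Q}_{n}\mathbf {H}_{n}\in \mathbb {C}^{N_{t_{n}}\times N_{t_{n}}}\) is the equivalent channel matrix of the *n*th user class after DSD and \(\mathbf {n}_{n}=\mathbf {Q}_{n}{{\sum \nolimits }_{\substack {m=1\\ m\neq n}}^{N}} \mathbf {H}_{m}\mathbf {s}_{m}+\mathbf {Q}_{n}\mathbf {n}\in \mathbb {C}^{N_{t_{n}}\times 1}\) is the equivalent noise vector. Note that \(\mathbf{H}^{\dagger}\) in (13) could have been provided by the pseudo-inverse of **H**. This option satisfies the zero interference constraint in (12); however, it results in a noise enhancement effect and imposes a restriction on the dimensions, i.e., \(N_{r}\geq N_{t}\). The proposed strategy can be used even when \(N_{r} < N_{t}\) and provides a balance between the inter-user class interference and the noise effects, since the noise is taken into account in the computation of (13).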
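The decoupling step can be sketched in Python/NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' reference implementation: it assumes a regularized (MMSE) channel inversion with regularization σ_n²/σ_s², and it obtains a matrix Q_n with orthonormal rows from a QR decomposition of the conjugate transpose of the rows of the inverse that belong to class *n*. The name `dsd_decouple` and its arguments are hypothetical.

```python
import numpy as np

def dsd_decouple(H, class_sizes, noise_var, signal_var=1.0):
    """Return one decoupling matrix Q_n (N_tn x N_r) per user class.

    H           : N_r x N_t combined channel, columns grouped class by class
    class_sizes : number of transmit antennas N_tn in each class
    """
    n_t = H.shape[1]
    # Regularized (MMSE) channel inversion, size N_t x N_r.
    gram = H.conj().T @ H + (noise_var / signal_var) * np.eye(n_t)
    H_dag = np.linalg.solve(gram, H.conj().T)
    Qs, start = [], 0
    for size in class_sizes:
        rows = H_dag[start:start + size, :]      # rows of class n, N_tn x N_r
        # QR of the conjugate transpose: rows = R_n @ Q_n with Q_n = q^H.
        q, _ = np.linalg.qr(rows.conj().T)       # q is N_r x N_tn
        Qs.append(q.conj().T)
        start += size
    return Qs

# Example: two classes with 2 and 4 transmit antennas, 16 receive antennas.
rng = np.random.default_rng(1)
H = (rng.standard_normal((16, 6)) + 1j * rng.standard_normal((16, 6))) / np.sqrt(2)
Qs = dsd_decouple(H, [2, 4], noise_var=0.1)
H_eq = [Qs[0] @ H[:, :2], Qs[1] @ H[:, 2:]]      # square equivalent channels
leak = np.linalg.norm(Qs[0] @ H[:, 2:])          # residual inter-class interference
```

With a small noise variance the regularized inverse approaches the pseudo-inverse, and the residual `leak` goes to zero; with a larger noise variance some interference is traded for less noise enhancement.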

An alternative way to satisfy (12) is based on the singular value decomposition (SVD) of \(\tilde {\mathbf {H}}_{n}\). If \(r_{n}\) is the rank of \(\tilde {\mathbf {H}}_{n}\), which corresponds to the number of non-zero singular values, i.e., \(r_{n}=\text {rank}(\tilde {\mathbf {H}}_{n})\leq N_{t}-N_{t_{n}} \), the SVD can be expressed equivalently as:

\[\tilde{\mathbf{H}}_{n}=\left[\tilde{\mathbf{U}}_{1,n}\ \ \tilde{\mathbf{U}}_{0,n}\right]\begin{bmatrix}\tilde{\boldsymbol{\Sigma}}_{1,n} & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}\left[\tilde{\mathbf{V}}_{1,n}\ \ \tilde{\mathbf{V}}_{0,n}\right]^{\mathcal{H}}, \qquad (19)\]

where \(\tilde{\boldsymbol{\Sigma}}_{1,n}\) is the \(r_{n}\times r_{n}\) diagonal matrix of non-zero singular values, \(\tilde {\mathbf {U}}_{1, n}~\in ~\mathbb {C}^{N_{r} \times r_{n}}\) holds the first \(r_{n}\) left singular vectors, \(\tilde {\mathbf {U}}_{0, n}~\in ~\mathbb {C}^{N_{r} \times \left (N_{r}-r_{n}\right)}\) holds the last \(N_{r}-r_{n}\) left singular vectors, \(\tilde {\mathbf {V}}_{1, n}~\in ~\mathbb {C}^{(N_{t}-N_{t_{n}})\times r_{n}}\) consists of the first \(r_{n}\) right singular vectors, and \(\tilde {\mathbf {V}}_{0, n}~\in ~\mathbb {C}^{(N_{t}-N_{t_{n}})\times (N_{t}-N_{t_{n}}-r_{n})}\) holds the last \(N_{t}-N_{t_{n}}-r_{n}\) right singular vectors. Thus \(\tilde {\mathbf {U}}_{0, n}\) and \(\tilde {\mathbf {V}}^{\mathcal {H}}_{0, n}\) form an orthogonal basis for the left null space and the null space of \(\tilde {\mathbf {H}}_{n}\), respectively. Then, the alternative solution for (12) could be:

\[\mathbf{A}_{n}=\tilde{\mathbf{U}}_{0,n}^{\mathcal{H}}. \qquad (20)\]

Although the matrix \(\tilde {\mathbf {U}}_{0, n}^{\mathcal {H}}\) eliminates the inter-user class interference completely, i.e., \(\tilde {\mathbf {U}}_{0, n}^{\mathcal {H}}{\sum _{\substack {m=1\\ m\neq n}}^{N}} \mathbf {H}_{m}\mathbf {s}_{m}=\mathbf {0}\), when we use \(\mathbf{y}_{n}=\mathbf{Q}_{n}\mathbf{y}\) as in the first approach, the noise effects in the detection procedure are mitigated because the computation of \(\mathbf{Q}_{n}\) takes the noise into account. The reduced noise effects improve the performance, even in the presence of residual interference. In addition, the equivalent channel matrix \(\check {\mathbf {H}}_{n}^{1}=\tilde {\mathbf {U}}_{0, n}^{\mathcal {H}}\mathbf {H}_{n}\) has dimensions \((N_{r}-r_{n})\times N_{t_{n}}\), as opposed to the matrix \(\check {\mathbf {H}}_{n}^{2}=\mathbf {Q}_{n}\mathbf {H}_{n}\), which has dimensions \(N_{t_{n}}\times N_{t_{n}}\). For this reason, the computational complexity of the detector is lower if we use the matrix \(\mathbf{Q}_{n}\) to decouple the received signal vector.

The fact that we obtain a square equivalent channel matrix also allows the possibility of using lattice reduction (LR)-based detectors which have a better performance for square channel matrices [27]. Further, the computational complexity to compute the channel inversion (13) and *N* QR decompositions (15) of matrices with dimensions \(N_{t_{n}}\times N_{r}\) is much lower than the computational cost of computing *N* SVD transformations (19) of matrices with dimensions \(\phantom {\dot {i}\!}N_{r}\times (N_{t}-N_{t_{n}})\). For these reasons, in this paper we will focus on the first alternative.
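For comparison, the SVD-based alternative can be sketched as below. The code assumes at least two user classes (so that the interfering matrix is not empty) and uses a relative tolerance to decide the numerical rank; `null_space_projector` and `tol` are illustrative names and values, not part of the paper.

```python
import numpy as np

def null_space_projector(H, start, size, tol=1e-10):
    """Rows spanning the left null space of H with the class columns removed.

    Returns U0^H with shape (N_r - r_n) x N_r, where r_n is the numerical
    rank of the interfering channel matrix.
    """
    H_tilde = np.delete(H, np.s_[start:start + size], axis=1)
    u, s, _ = np.linalg.svd(H_tilde, full_matrices=True)
    rank = int(np.sum(s > tol * s.max()))
    return u[:, rank:].conj().T

# Example: 16 receive antennas, classes with 2 and 4 transmit antennas.
rng = np.random.default_rng(2)
H = (rng.standard_normal((16, 6)) + 1j * rng.standard_normal((16, 6))) / np.sqrt(2)
A1 = null_space_projector(H, start=0, size=2)
H_eq_svd = A1 @ H[:, :2]          # (N_r - r_n) x N_tn, here 12 x 2
```

In this example the SVD-based equivalent channel has 12 rows, whereas the QR-based approach yields a 2 × 2 matrix for the same class, which is the dimension argument made in the text.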

As will be shown in the next section, the equivalent received signal vector in (18) indicates that the process in (11)-(17) is an effective way to separate the user classes at the BS, so that the data stream of the *n*th user class can be treated as independent of the received signals of the other user classes. In practice, this makes it possible to use different transmission and reception schemes for each user class. We can now apply traditional detectors to each class of users separately, and the reduced dimensions of the matrices to be processed also make more complex detection schemes affordable. The proposed algorithm is described in Algorithm 1.

### 3.2 Detection algorithms

In this subsection, we examine signal detection algorithms for massive MIMO in heterogeneous networks. To detect the data stream for each class of users independently, we assume that the DSD algorithm described in Algorithm 1 was previously employed.

#### 3.2.1 Linear detectors

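In linear detection, the equivalent received signal vector of the *n*th user class \(\mathbf {y}_{n} \in \mathbb {C}^{N_{t_{n}}\times 1}\) is processed by a linear filter to eliminate the channel effects [28]. The two linear detectors considered here are given by

\[\mathbf{W}_{n}^{\text{ZF}}=\left(\check{\mathbf{H}}_{n}^{\mathcal{H}}\check{\mathbf{H}}_{n}\right)^{-1}\check{\mathbf{H}}_{n}^{\mathcal{H}},\qquad \mathbf{W}_{n}^{\text{MMSE}}=\left(\check{\mathbf{H}}_{n}^{\mathcal{H}}\check{\mathbf{H}}_{n}+\frac{\sigma_{n}^{2}}{\sigma_{s}^{2}}\mathbf{I}\right)^{-1}\check{\mathbf{H}}_{n}^{\mathcal{H}}, \qquad (21)\]

where \(\check{\mathbf{H}}_{n}\) is the equivalent channel matrix of the *n*th user class. Note that for the MMSE detector, we consider the autocorrelation matrix of the equivalent noise vector as \(\mathbf {K}_{\mathbf {n}_{n}}\approx \sigma _{n}^{2}\mathbf {I}\). As the residual interference is very small, an excellent performance can be obtained with this approximation. The linear hard decision of \(\mathbf{s}_{n}\) is carried out as follows:

\[\hat{\mathbf{s}}_{n}=\mathbb{C}\left(\mathbf{W}_{n}\mathbf{y}_{n}\right), \qquad (22)\]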

where the function \(\mathbb {C}(x)\) returns the point of the complex signal constellation closest to *x*. The linear detectors have a lower computational complexity when compared with the non-linear detectors. However, due to the impact of interference and noise, linear detectors offer a limited performance.
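A minimal sketch of ZF and MMSE detection followed by a hard decision is given below. It assumes QPSK with unit average energy and the white-noise approximation for the equivalent noise; `linear_filter`, `slice_symbols`, and `linear_detect` are hypothetical helper names.

```python
import numpy as np

# Unit-energy QPSK constellation (an assumption made for this example).
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def linear_filter(H_eq, noise_var, signal_var=1.0, kind="mmse"):
    """ZF or MMSE receive filter for the equivalent channel of one class."""
    n = H_eq.shape[1]
    reg = noise_var / signal_var if kind == "mmse" else 0.0
    gram = H_eq.conj().T @ H_eq + reg * np.eye(n)
    return np.linalg.solve(gram, H_eq.conj().T)

def slice_symbols(x, constellation=QPSK):
    """Map each entry of x to the closest constellation point."""
    x = np.atleast_1d(x)
    nearest = np.argmin(np.abs(x[:, None] - constellation[None, :]), axis=1)
    return constellation[nearest]

def linear_detect(y_n, H_eq, noise_var, kind="mmse"):
    """Filter the decoupled signal of one class and take hard decisions."""
    return slice_symbols(linear_filter(H_eq, noise_var, kind=kind) @ y_n)

# Example: noiseless transmission of four QPSK symbols.
rng = np.random.default_rng(5)
H_eq = (rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))) / np.sqrt(2)
s = QPSK[rng.integers(0, 4, size=4)]
s_hat = linear_detect(H_eq @ s, H_eq, noise_var=1e-9, kind="zf")
```

The example uses a tall equivalent channel only to keep the toy problem well conditioned; after DSD the equivalent channel is square.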

#### 3.2.2 Successive interference cancellation

The SIC detector applied to the equivalent received signal of the *n*th user class in (18) consists of a bank of linear detectors, each of which detects a selected component \(s_{n,i}\) of \(\mathbf{s}_{n}\). The component obtained by the first detector is used to reconstruct the corresponding signal vector, which is then subtracted from the equivalent received signal to further reduce the interference at the input of the next linear receive filter. The successively cancelled received data vector that follows a chosen ordering in the *i*th stage is given by

\[\mathbf{y}_{n,i}=\mathbf{y}_{n}-\sum_{j=1}^{i-1}\check{\mathbf{h}}_{n,j}\hat{s}_{n,j}, \qquad (23)\]

where \(\check {\mathbf {h}}_{n,j}\) correspond to the columns of the channel matrix \(\check {\mathbf {H}}_{n}\) and \(\hat {s}_{n,j}\) is the estimated symbol obtained at the output of the *j*th linear detector.
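The cancellation loop can be sketched as follows, assuming MMSE filtering that is recomputed at each stage, norm-based ordering (strongest column first), and unit-energy QPSK. The function names are hypothetical and the sketch ignores error propagation analysis.

```python
import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def nearest_symbol(x, constellation=QPSK):
    """Closest constellation point to the scalar x."""
    return constellation[np.argmin(np.abs(x - constellation))]

def sic_detect(y_n, H_eq, noise_var, signal_var=1.0, constellation=QPSK):
    """MMSE-SIC for one decoupled user class with norm-based ordering."""
    n = H_eq.shape[1]
    order = np.argsort(-np.linalg.norm(H_eq, axis=0))   # strongest column first
    remaining = list(order)
    residual = np.array(y_n, dtype=complex)
    s_hat = np.zeros(n, dtype=complex)
    for _ in range(n):
        H_rem = H_eq[:, remaining]
        gram = (H_rem.conj().T @ H_rem
                + (noise_var / signal_var) * np.eye(len(remaining)))
        # Row 0 of the MMSE filter belongs to the stream detected at this stage.
        w = np.linalg.solve(gram, H_rem.conj().T)[0]
        idx = remaining.pop(0)
        s_hat[idx] = nearest_symbol(w @ residual, constellation)
        # Subtract the reconstructed contribution of the detected stream.
        residual = residual - H_eq[:, idx] * s_hat[idx]
    return s_hat

# Example: noiseless transmission of four QPSK symbols.
rng = np.random.default_rng(7)
H_eq = (rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))) / np.sqrt(2)
s = QPSK[rng.integers(0, 4, size=4)]
s_hat = sic_detect(H_eq @ s, H_eq, noise_var=1e-9)
```

The estimates are written back at their original positions, so no reordering step is needed at the end.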

#### 3.2.3 Multiple-branch SIC detection

In the MB-SIC detector for the *n*th user class, different orderings are explored for SIC. Each ordering is referred to as a branch, so that a detector with *L* branches produces a set of *L* estimated vectors. Each branch uses a column permutation matrix \(\mathbf{P}_{n}^{(l)}\). The estimate of the signal vector of branch *l*, \(\hat {\mathbf {x}}_{n}^{(l)}\), is obtained using a SIC receiver based on a new channel matrix \(\check {\mathbf {H}}_{n}^{(l)} =\check {\mathbf {H}}_{n}\mathbf {P}_{n}^{(l)}\). The order of the estimated symbols is rearranged to the original order by

\[\hat{\mathbf{s}}_{n}^{(l)}=\mathbf{P}_{n}^{(l)}\hat{\mathbf{x}}_{n}^{(l)},\qquad l=1,\ldots,L.\]

Other detectors could also be used with the proposed DSD technique; the choice of detector is left to the designer.

## 4 Sum-rate analysis

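In this section, we assume that the channel matrix **H** was previously estimated at the BS, that Gaussian signalling is used, and that the received signal vector was decoupled for each user class. Considering the received signal vector as presented in (18), the sum rate [29] that DSD can offer is defined as

\[R_{\text{DSD}}=\sum_{n=1}^{N}\log_{2}\det\left(\mathbf{I}+\mathbf{K}_{\mathbf{n}_{n}}^{-1}\check{\mathbf{H}}_{n}\mathbf{K}_{\mathbf{s}_{n}}\check{\mathbf{H}}_{n}^{\mathcal{H}}\right), \qquad (26)\]

where \(\mathbf{K}_{\mathbf{s}_{n}}\) and \(\mathbf{K}_{\mathbf{n}_{n}}\) are the covariance matrices of the transmitted symbol vector and of the equivalent noise vector of the *n*th user class, respectively. It is easy to show that (26) can be expressed as

\[R_{\text{DSD}}=\sum_{n=1}^{N}\log_{2}\det\left(\mathbf{I}+\mathbf{B}_{n}\mathbf{B}_{n}^{\mathcal{H}}\right)=\sum_{n=1}^{N}\log_{2}\det\left(\mathbf{I}+\boldsymbol{\Lambda}_{n}\right),\]

where \(\mathbf{B}_{n}=\mathbf{K}_{\mathbf{n}_{n}}^{-1/2}\check{\mathbf{H}}_{n}\mathbf{K}_{\mathbf{s}_{n}}^{1/2}\) and \(\boldsymbol{\Lambda}_{n}\) is a diagonal matrix whose diagonal elements are the eigenvalues of the matrix \(\mathbf {B}_{n}\mathbf {B}_{n}^{\mathcal {H}}\). Then, the reliable sum rate that the system can offer is

\[R_{\text{DSD}}=\sum_{n=1}^{N}\sum_{i=1}^{N_{t_{n}}}\log_{2}\left(1+\lambda_{i,n}\right), \qquad (29)\]

where the values \(\lambda_{i,n}\) in (29) can be obtained by computing the eigenvalues of \(\mathbf {B}_{n}^{\mathcal {H}}\mathbf {B}_{n}\). As mentioned before, for notational simplicity we assume that \(\mathbf {K}_{\mathbf {s}_{n}}=\sigma _{s}^{2}\mathbf {I}\). When the DSD algorithm is applied, the equivalent noise vector for the *n*th user class \(\mathbf {n}_{n}={\mathbf {Q}_{n}\sum _{\substack {m=1\\ m\neq n}}^{N} \mathbf {H}_{m}\mathbf {s}_{m}+\mathbf {Q}_{n}\mathbf {n}}\in \mathbb {C}^{N_{t_{n}}\times 1}\) is not white due to the residual inter-user class interference. Its autocorrelation matrix is then given by

\[\mathbf{K}_{\mathbf{n}_{n}}=\mathbb{E}\left[\mathbf{n}_{n}\mathbf{n}_{n}^{\mathcal{H}}\right]=\sigma_{s}^{2}\mathbf{Q}_{n}\tilde{\mathbf{H}}_{n}\tilde{\mathbf{H}}_{n}^{\mathcal{H}}\mathbf{Q}_{n}^{\mathcal{H}}+\sigma_{n}^{2}\mathbf{Q}_{n}\mathbf{Q}_{n}^{\mathcal{H}}. \qquad (30)\]

Then, the values \(\lambda_{i,n}\) in (29) are obtained from the eigenvalues of the matrix \(\mathbf {B}_{n}^{\mathcal {H}}\mathbf {B}_{n}\) given by

\[\mathbf{B}_{n}^{\mathcal{H}}\mathbf{B}_{n}=\sigma_{s}^{2}\check{\mathbf{H}}_{n}^{\mathcal{H}}\mathbf{K}_{\mathbf{n}_{n}}^{-1}\check{\mathbf{H}}_{n}. \qquad (31)\]

Removing the index *n* from the above analysis and considering that \(\mathbf {K}_{\mathbf {s}}=\sigma ^{2}_{s}\mathbf {I}\) and \(\mathbf {K}_{\mathbf {n}}=\sigma _{n}^{2}\mathbf {I}\), we get the well-known expression:

\[R=\sum_{i=1}^{N_{t}}\log_{2}\left(1+\lambda_{i}\right), \qquad (32)\]

where \(\lambda_{i}\) are the eigenvalues of the matrix \(\mathbf {B}^{\mathcal {H}}\mathbf {B}=\frac {\sigma ^{2}_{s}}{\sigma _{n}^{2}}\mathbf {H}^{\mathcal {H}}\mathbf {H}\) [30]. In Appendix 1, we show that, like the sum rate in (32) obtained when all user classes are detected together, the sum rate in (29) for the proposed algorithm is independent of the detection procedure. However, the lower bound on the achievable uplink sum rate obtained by using linear detectors is different for each detector [31]. In order to analyze the behavior of the lower bound on the sum rate for the proposed scheme, we consider that a linear detector according to (21) is applied to the equivalent received signal vector (18) to detect the transmitted symbol vector of user class *n*. Then we have

\[\tilde{\mathbf{y}}_{n}=\mathbf{W}_{n}\mathbf{y}_{n}=\mathbf{W}_{n}\check{\mathbf{H}}_{n}\mathbf{s}_{n}+\mathbf{W}_{n}\mathbf{n}_{n}. \qquad (33)\]

For the *k*th element of \(\tilde {\mathbf {y}}_{n}\) we have

\[\tilde{y}_{k,n}=\mathbf{w}_{k,n}\check{\mathbf{h}}_{n,k}s_{k,n}+\sum_{\substack{i=1\\ i\neq k}}^{N_{t_{n}}}\mathbf{w}_{k,n}\check{\mathbf{h}}_{n,i}s_{i,n}+\mathbf{w}_{k,n}\mathbf{n}_{n}, \qquad (34)\]

where \(\mathbf{w}_{k,n}\) is the *k*th row of \(\mathbf{W}_{n}\) and \(\check{\mathbf{h}}_{n,i}\) is the *i*th column of \(\check{\mathbf{H}}_{n}\). Modeling the noise, the inter-user class interference, and the inter-user interference within user class *n* in (34) as additive Gaussian noise independent of \(s_{k,n}\), considering (30), and assuming that the channel is ergodic so that each codeword spans a large number of realizations, we obtain the lower bound on the achievable rate for the DSD algorithm with linear detectors as described in (35).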

For the SIC receiver, each stream is filtered by a linear detector and then its contribution is subtracted from the received signal to improve the subsequent detection. For each layer, the linear filter is recalculated. The performance of SIC detectors can be improved if we choose the cancellation order as a function of the SINR at the output of the linear detector in each layer. The lower bound on the sum rate of the proposed algorithm when a SIC detector is used for each user class could be calculated by updating the expression (35) in each layer, i.e., the values of \(\mathbf{w}_{k,n}\) are recalculated for each detected stream.
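The quantities discussed in this section can be evaluated numerically for a single channel realization, as in the hedged sketch below. It assumes equal transmit power on all antennas, a decoupling matrix obtained from an MMSE inversion and a QR decomposition, and a per-stream SINR in which inter-stream interference, residual inter-class interference, and noise are all treated as Gaussian noise. No ergodic averaging is performed, and all function names are illustrative.

```python
import numpy as np

def decoupling_matrix(H, start, size, noise_var, signal_var):
    """Q_n from an MMSE channel inversion followed by a QR decomposition."""
    n_t = H.shape[1]
    gram = H.conj().T @ H + (noise_var / signal_var) * np.eye(n_t)
    rows = np.linalg.solve(gram, H.conj().T)[start:start + size, :]
    q, _ = np.linalg.qr(rows.conj().T)
    return q.conj().T

def equivalent_model(H, Q_n, start, size, signal_var, noise_var):
    """Equivalent channel and equivalent-noise covariance of one class."""
    H_eq = Q_n @ H[:, start:start + size]
    leak = Q_n @ np.delete(H, np.s_[start:start + size], axis=1)
    K_nn = signal_var * (leak @ leak.conj().T) + noise_var * (Q_n @ Q_n.conj().T)
    return H_eq, K_nn

def class_rate(H_eq, K_nn, signal_var):
    """Sum of log2(1 + eigenvalue) over the eigenvalues of B_n^H B_n."""
    M = signal_var * (H_eq.conj().T @ np.linalg.solve(K_nn, H_eq))
    lam = np.linalg.eigvalsh((M + M.conj().T) / 2)
    return float(np.sum(np.log2(1.0 + np.clip(lam, 0.0, None))))

def coupled_sum_rate(H, snr):
    """Sum rate when all classes are detected together, snr = sigma_s^2 / sigma_n^2."""
    lam = np.linalg.eigvalsh(snr * (H.conj().T @ H))
    return float(np.sum(np.log2(1.0 + np.clip(lam, 0.0, None))))

def linear_lower_bound(H_eq, W, K_nn, signal_var):
    """Per-stream rates with a linear filter W, interference treated as noise."""
    G = W @ H_eq                                   # G[k, i] = w_k h_i
    desired = signal_var * np.abs(np.diag(G)) ** 2
    interference = signal_var * np.sum(np.abs(G) ** 2, axis=1) - desired
    noise = np.real(np.einsum("ki,ij,kj->k", W, K_nn, W.conj()))
    return float(np.sum(np.log2(1.0 + desired / (interference + noise))))

# Example: 12 receive antennas, two classes with 3 transmit antennas each.
rng = np.random.default_rng(3)
H = (rng.standard_normal((12, 6)) + 1j * rng.standard_normal((12, 6))) / np.sqrt(2)
sv, nv = 1.0, 0.1
rates = []
for start, size in [(0, 3), (3, 3)]:
    Q_n = decoupling_matrix(H, start, size, nv, sv)
    H_eq, K_nn = equivalent_model(H, Q_n, start, size, sv, nv)
    rates.append(class_rate(H_eq, K_nn, sv))
dsd_rate, coupled_rate = sum(rates), coupled_sum_rate(H, sv / nv)
```

For any realization, the decoupled rate cannot exceed the coupled rate, and the linear-detector bound cannot exceed the rate of its class; the gap between them is what the numerical results quantify.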

## 5 Numerical results

In this section, we evaluate the performance of the proposed algorithm with different detectors in terms of the sum rate and the BER via simulations. Moreover, the computational complexity of the proposed and existing algorithms is also evaluated.

### 5.1 Sum rate

To evaluate the analytic results obtained in Section 4, the sum rate and the lower bounds for the proposed algorithm with different detection schemes are evaluated for the CAS and DAS configurations assuming perfect CSI. For the CAS configuration, we employ \(L_{k}=0.7\), *τ*=2, the distance \(d_{k}\) to the BS is obtained from a uniform discrete random variable distributed between 0.1 and 0.99, the shadowing spread is \(\sigma_{k}=3\) dB, and the receive and transmit correlation coefficients are \(\rho_{rx}=0.2\) and \(\rho_{tx}=0.4\) (when \(N_{t_{k,n}}>1\)), respectively. For DAS configurations, we consider a densely populated cell, where a fraction of the active users are in the center of the cell and the remaining users are in other locations of the cell. We explore different values for the fraction of users in the center and around the cell. Based on that, we choose specific values for \(N_{B}\), *D*, and *Q*. For the DAS configuration, we also consider \(L_{k,j}\) taken from a uniform random variable distributed between 0.7 and 1, *τ*=2, the distance \(d_{k,j}\) for each link to an antenna is obtained from a uniform discrete random variable distributed between 0.1 and 0.5, the shadowing spread is \(\sigma_{k,j}=3\) dB, and the receive and transmit correlation coefficients are \(\rho_{rx}=0.2\) and \(\rho_{tx}=0.4\), respectively.

In the DAS configuration, we consider \(N_{B}=N_{r}/2\) antennas at the BS. We also consider *D*=4 arrays of antennas distributed around the cell, each equipped with \(Q=N_{r}/8\) antennas. For Fig. 2 a, we consider *N*=4 classes of users with \(\mid C_{n}\mid =8\) users each and \(N_{t_{k,n}}=1\) antenna per user. We can see that the sum rate of the proposed DSD algorithm is close to the sum rate of the traditional MIMO system, while the detection procedures have a low computational complexity, as will be shown in the next subsection. For Fig. 2 b, we consider 16 active users in the system and that we need to detect each user independently, i.e., *N*=16 classes of users with \(\mid C_{n}\mid =1\) user in each class and \(N_{t_{k,n}}=2\) antennas per user. Under these conditions, the sum rate of the proposed scheme reaches the sum rate of the traditional MIMO system, especially for a large number of receive antennas. From the plot in Fig. 2, we can see that the sum rate for DAS is higher than that for the CAS configuration.

For Fig. 3 a, we consider *N*=3 classes of users with \(\mid C_{n}\mid =10\) users in each class and \(N_{t_{k,n}}=1\) antenna per user. We can see from the plot that, similarly to traditional MIMO systems, the lower bound on the sum rate for ZF and MMSE approaches the sum rate as \(N_{r}\) grows. For Fig. 3 b, we consider *N*=2 classes of users with \(\mid C_{n}\mid =16\) users in each class and \(N_{t_{k,n}}=1\) antenna per user. We can see that the SIC-MMSE achieves the sum rate and could be considered optimal in terms of sum rate.

For Fig. 4, we consider *N*=8 classes of users with \(\mid C_{n}\mid =1\) user in each class and \(N_{t_{k,n}}=8\) antennas per user transmitting with a high correlation between antennas, \(\rho_{tx}=0.85\). We consider the DAS configuration with \(N_{B}=96\), *D*=4, and *Q*=8. We can see from the plot that the lower bounds for the proposed algorithms are very close to the lower bounds obtained when the detection procedure is carried out jointly for all users. Since ZF-DSD separates the user classes by computing an MMSE matrix, which takes the noise component into account, it is the only detector that outperforms its coupled counterpart.

Except for the ZF-DSD case, there is a small loss in performance for the DSD schemes. However, with current processor technologies, using a SIC-MMSE detector in a 256×256 massive MIMO system could be infeasible. By dividing the users into classes, the computational complexity of the detection procedure decreases significantly, which makes the use of the SIC-MMSE and more complex detectors feasible.

### 5.2 Computational complexity analysis

In this subsection, the computational complexity of the proposed algorithm is evaluated and compared with that of the traditional coupled detection schemes, in which all user classes are detected together, by counting the number of floating point operations (FLOPs) per received vector **y**. Different detection schemes are considered, such as MMSE, SIC, and MB-SIC. The SIC-based receivers all use MMSE detection. Furthermore, the single-branch SIC and the first branch of the MB-SIC employ norm-based ordering. We consider QPSK modulation; however, the computational cost of these detectors does not change significantly with the modulation order. The number of FLOPs for the complex QR decomposition of an \(N_{t_{n}}\times N_{r}\) matrix is given in [32] as \(16\left (N_{r}^{2}N_{t_{n}}-N_{t_{n}}^{2}N_{r}+\tfrac{1}{3}N_{t_{n}}^{3}\right)\). To compute the number of FLOPs required for the remaining operations, we use the Lightspeed Matlab toolbox [33].
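The QR term of the cost can be evaluated directly from the expression quoted above. The short sketch below covers only that term (not the channel inversion or the detectors), and the function names and the example class sizes are illustrative.

```python
def qr_flops(n_tn, n_r):
    """FLOPs of a complex QR decomposition of an n_tn x n_r matrix (expression from [32])."""
    return 16 * (n_r ** 2 * n_tn - n_tn ** 2 * n_r + n_tn ** 3 / 3)

def dsd_qr_flops(class_sizes, n_r):
    """Total QR cost of DSD: one decomposition per user class."""
    return sum(qr_flops(size, n_r) for size in class_sizes)

# Example: N_r = 200 receive antennas and N = 5 classes of 40 transmit antennas each.
total = dsd_qr_flops([40] * 5, n_r=200)
```

Varying `class_sizes` for a fixed total number of transmit antennas shows how the QR contribution alone changes with the number of classes; the overall trend reported in the figures also includes the detector cost, which this sketch does not model.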

\(K=100\) active users, \(N_{t_{k,n}}=2\) transmit antennas per user and \(N_{r}=200\) receive antennas distributed around the cell. We consider an increasing number of classes of users. When \(K\) is not divisible by the number of classes, the number of active users is set to a smaller value so as to allow the division into \(N\) classes. We can see from the figure that the complexity of the SIC and the MB-SIC detectors with the DSD algorithm is lower than that of the coupled SIC and MB-SIC detectors, respectively. Furthermore, for receivers with DSD, the computational complexity is reduced as the number of user classes is increased. This is an important advantage of receivers with DSD, since it allows the use of more complex detectors for each user class according to its data requirements.
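The adjustment of the number of active users can be sketched as follows. The text only states that \(K\) is set to "a smaller value"; taking the largest multiple of \(N\) that does not exceed \(K\) is an assumption made here, and `split_users` is a hypothetical helper name.

```python
def split_users(k_users: int, n_classes: int, n_t_per_user: int) -> tuple[int, int, int]:
    """Split users evenly into classes.

    Returns (users actually served, users per class, transmit antennas per class).
    Assumption: K is reduced to the largest multiple of N not exceeding K.
    """
    if n_classes < 1 or n_classes > k_users:
        raise ValueError("need 1 <= n_classes <= k_users")
    users_per_class = k_users // n_classes
    k_effective = users_per_class * n_classes
    return k_effective, users_per_class, users_per_class * n_t_per_user


if __name__ == "__main__":
    # K = 100 users with 2 transmit antennas each, as in the setup described above.
    for n in (1, 3, 5, 8):
        print(n, split_users(100, n, 2))
```

With \(K=100\) and \(N=3\), for instance, this rule serves 99 users in classes of 33 users, i.e., 66 transmit antennas per class.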

\(N=5\) classes of users, \(N_{t_{k,n}}=2\) transmit antennas per user and \(N_{r}=3KN_{t_{k,n}}\) receive antennas. The MB-SIC and SIC detectors with DSD have a lower computational cost than the coupled SIC detector. In this 5G context, with a high number of antennas, efficient coupled detectors are not feasible to implement; however, if a specific user class requires the benefits of complex detectors, the DSD algorithm reduces the cost so that more complex detectors can be applied, as illustrated with MB-SIC in Fig. 6. For the results in Fig. 7, we consider \(K=10\) active users and assume that we need to detect each user independently, i.e., \(N=10\). The number of transmit antennas per user \(N_{t_{k,n}}\) is increased. We also consider that the number of receive antennas distributed around the cell is given by \(N_{r}=2KN_{t_{k,n}}\). The MB-SIC and SIC detectors with DSD have a significantly lower complexity than the SIC detector in which all users are coupled.

It is worth noting that the curves displayed in Figs. 5, 6, and 7 would decrease substantially if the channel does not change over a period of time, i.e., for quasi-static channels. In this case, the equivalent channel matrices for each user class can be stored for subsequent use. This would widen the gap, in terms of computational cost, for the detection schemes using the DSD algorithm.
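The effect of reusing stored matrices can be illustrated with a simple amortization. This is a sketch under assumptions: the split of the cost into a one-time "setup" part (computing the equivalent channel matrices) and a "per-vector" part, as well as all numbers below, are hypothetical and are not taken from the paper's FLOP counts.

```python
def amortized_flops(setup_flops: float, per_vector_flops: float, coherence_vectors: int) -> float:
    """Average FLOPs per received vector when the setup work is reused.

    setup_flops: one-time cost while the channel stays constant (assumed split)
    per_vector_flops: cost paid for every received vector
    coherence_vectors: number of received vectors over which the channel is static
    """
    if coherence_vectors < 1:
        raise ValueError("coherence_vectors must be at least 1")
    return setup_flops / coherence_vectors + per_vector_flops


if __name__ == "__main__":
    # Hypothetical costs, for illustration only.
    for t in (1, 10, 100):
        print(t, amortized_flops(1000.0, 10.0, t))
```

As the number of vectors per coherence interval grows, the average cost approaches the per-vector part alone.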

### 5.3 BER performance

\(K\) active users is considered. We also consider the DAS configuration in which, of the \(N_{r}=N_{B}+DQ\) receive antennas, \(DQ\) are distributed around the cell in \(D\) radio heads with \(Q\) antennas each and the remaining \(N_{B}\) antennas are located at the BS. We consider QPSK modulation. The SNR per transmitted information bit is defined as

where \(\sigma _{s}^{2}\) is the common variance of the transmitted symbols, \(\sigma _{n}^{2}\) is the noise variance at the receiver and \(M\) is the number of transmitted bits per symbol. The numerical results correspond to an average of 3000 simulation runs, with \(500N_{t}\) symbols transmitted per run. For the \(N_{B}\) antennas at the BS, we employ \(L_{k}=0.3\), \(\tau=2\), the distance to the users is obtained from a uniform discrete random variable distributed between 0.4 and 0.7, the shadowing spread is \(\sigma_{k}=1\) dB and the transmit and receive correlation coefficients are equal to \(\rho_{rx}=0.4\). For the \(R\) remote arrays of antennas, we use \(L_{k,j}\) taken from a uniform random variable distributed between 0.3 and 0.5, the shadowing spread \(\sigma_{k,j}=1\) dB and the receive correlation coefficients are equal to \(\rho_{rx}=0.5\). When the number of transmit antennas at the users is \(N_{t_{k,n}}>1\), the transmit correlation coefficient is equal to \(\rho_{tx}=0.55\).

\(K=12\) user devices, where each user is equipped with \(N_{t_{k,n}}=3\) transmit antennas and \(N=3\) classes of users. The system employs perfect channel state information and QPSK modulation. For the DAS configuration, we consider \(N_{B}=8\) receive antennas at the BS, \(D=4\) remote radio heads, and \(Q=7\) receive antennas per remote radio head. We can see from the figure that the decoupled SIC detector performs close to the coupled SIC detector, with a difference of around 2 dB in the high-SNR region. In addition, the decoupled SIC detector has a drastically lower computational cost than its coupled version. The results also indicate that the MB-SIC receiver with the DSD scheme clearly outperforms the coupled SIC detector while also having a lower computational complexity.

\(K=8\) active users, each user transmitting with \(N_{t_{k,n}}=8\) antennas. We also consider that we need to detect each user independently of the others, i.e., \(N=8\) classes of users with \(|C_{n}|=1\) user per class. For the DAS configuration, we consider \(N_{B}=64\) receive antennas at the BS, \(D=8\) remote radio heads, and \(Q=8\) receive antennas per remote radio head. Fig. 9 indicates that the performance of the SIC detector with DSD is close to that of the coupled SIC detector, at a lower computational complexity. Note that the results for MB-SIC with DSD show a very good performance with a computational complexity much lower than that of the SIC detector without DSD.

\(K=64\) active users, \(N=4\) classes of users, \(|C_{n}|=16\) users per class, and \(N_{t_{k,n}}=1\) transmit antenna per user. For the DAS configuration, we consider \(N_{B}=34\) receive antennas at the BS and \(Q=7\) receive antennas per remote radio head. To show the behavior of the BER performance with different numbers of distributed antennas, we consider two configurations of remote radio heads, \(D=8\) and \(D=6\). As expected, the results show that when the number of RRHs is increased, the BER performance improves, owing to the weaker propagation effects that result from the short distances between users and some remote antenna arrays. We can also see from the figure that the SIC and the MB-SIC detectors with DSD offer an excellent BER performance with a low computational cost.
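As a quick arithmetic check of the DAS configurations quoted in this subsection, the sketch below evaluates \(N_{r}=N_{B}+DQ\) and the ratio of receive antennas to transmitted streams. The helper names and the term "loading ratio" are introduced here for illustration and are not used in the paper.

```python
def total_receive_antennas(n_b: int, d: int, q: int) -> int:
    """N_r = N_B + D*Q: BS antennas plus D remote radio heads with Q antennas each."""
    return n_b + d * q


def loading_ratio(n_r: int, k_users: int, n_t_per_user: int) -> float:
    """Receive antennas per transmit antenna, N_r / (K * N_t per user)."""
    return n_r / (k_users * n_t_per_user)


# (K, transmit antennas per user, N_B, D, Q) as quoted in the text of this subsection.
scenarios = [
    (12, 3, 8, 4, 7),
    (8, 8, 64, 8, 8),
    (64, 1, 34, 8, 7),
    (64, 1, 34, 6, 7),
]

if __name__ == "__main__":
    for k, n_t, n_b, d, q in scenarios:
        n_r = total_receive_antennas(n_b, d, q)
        ratio = loading_ratio(n_r, k, n_t)
        print(f"K={k}, N_B={n_b}, D={d}, Q={q}: N_r={n_r}, N_r/(K*N_t)={ratio:.2f}")
```

Under these numbers, the first scenario has exactly as many receive antennas as transmit antennas (36 each), whereas reducing \(D\) from 8 to 6 in the last setup lowers \(N_{r}\) from 90 to 76 for the same 64 single-antenna users.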

## 6 Conclusions

In this paper, a mathematical signal model and the DSD algorithm for the uplink of massive MIMO systems operating in heterogeneous cellular networks with different classes of users using CAS and DAS configurations have been presented. The proposed algorithm efficiently separates the received signals for each category of users into independent parallel single-user-class signals at the receiver side, applying a common channel inversion and QR decomposition and assuming that the channel matrix was previously estimated. With the proposed scheme, it is possible to handle different classes of users in heterogeneous networks and to use different modulation and/or detection schemes for each class according to its data service requirements. The main advantage of DSD is the reduction in the computational cost of efficient detection schemes that, owing to their high computational complexity, are not feasible to implement when the signals received from all active users are coupled.

## 7 Appendix 1

\(n\)th user class as in (33). If we define the matrix \(\bar {\mathbf {A}}_{n}=\mathbf {W}_{n}\check {\mathbf {H}}_{n}\) and the vector \(\bar {\mathbf {n}}_{n}=\mathbf {W}_{n}\mathbf {n}_{n}\), we can rewrite (33) as

\(n\)th user class after the detection is given by

Note that (41) and (31) yield the same result, which proves that the sum rate for the DSD algorithm is independent of the linear detection procedure.
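One way to see why an invertible linear filter cannot change the rate is the standard log-determinant identity below. It is a sketch under assumptions not stated in this appendix: \(\mathbf {W}_{n}\) is taken to be square and invertible, and \(\mathbf {R}_{s}\) and \(\mathbf {R}_{n}\) (names introduced here for illustration) denote the covariance matrices of the transmitted symbols and of the noise \(\mathbf {n}_{n}\). It is not a reproduction of the paper's numbered equations.

```latex
\begin{align*}
&\log_{2}\det\left(\mathbf{I}+\left(\mathbf{W}_{n}\mathbf{R}_{n}\mathbf{W}_{n}^{H}\right)^{-1}
   \bar{\mathbf{A}}_{n}\mathbf{R}_{s}\bar{\mathbf{A}}_{n}^{H}\right)\\
&\quad=\log_{2}\det\left(\mathbf{I}+\mathbf{W}_{n}^{-H}\mathbf{R}_{n}^{-1}
   \check{\mathbf{H}}_{n}\mathbf{R}_{s}\check{\mathbf{H}}_{n}^{H}\mathbf{W}_{n}^{H}\right)\\
&\quad=\log_{2}\det\left(\mathbf{W}_{n}^{-H}\left(\mathbf{I}+\mathbf{R}_{n}^{-1}
   \check{\mathbf{H}}_{n}\mathbf{R}_{s}\check{\mathbf{H}}_{n}^{H}\right)\mathbf{W}_{n}^{H}\right)\\
&\quad=\log_{2}\det\left(\mathbf{I}+\mathbf{R}_{n}^{-1}
   \check{\mathbf{H}}_{n}\mathbf{R}_{s}\check{\mathbf{H}}_{n}^{H}\right)
\end{align*}
```

The first step substitutes \(\bar {\mathbf {A}}_{n}=\mathbf {W}_{n}\check {\mathbf {H}}_{n}\) and uses \(\left (\mathbf {W}_{n}\mathbf {R}_{n}\mathbf {W}_{n}^{H}\right)^{-1}=\mathbf {W}_{n}^{-H}\mathbf {R}_{n}^{-1}\mathbf {W}_{n}^{-1}\); the last step uses \(\det (\mathbf {W}_{n}^{-H})\det (\mathbf {W}_{n}^{H})=1\), so the filter drops out of the expression.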

## Declarations

### Acknowledgements

This work was supported in part by grants of CNPq, FAPERJ, and DAAD.

### Competing interests

The authors declare that they have no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## References

1. TL Marzetta, Noncooperative cellular wireless with unlimited numbers of base station antennas. IEEE Trans. Wireless Commun. **9**(11), 3590–3600 (2010).
2. EG Larsson, O Edfors, F Tufvesson, TL Marzetta, Massive MIMO for next generation wireless systems. IEEE Commun. Mag. **52**(2), 186–195 (2014).
3. F Rusek, D Persson, B Lau, E Larsson, T Marzetta, O Edfors, F Tufvesson, Scaling up MIMO: Opportunities, and challenges with very large arrays. IEEE Signal Process. Mag. **30**(1), 40–60 (2013).
4. RC de Lamare, Massive MIMO systems: Signal processing challenges and future trends. URSI Radio Science Bulletin (2013).
5. T Marzetta, Noncooperative cellular wireless with unlimited numbers of base station antennas. IEEE Trans. Wireless Commun. **9**(11), 3590–3600 (2010).
6. EG Larsson, F Tufvesson, O Edfors, TL Marzetta, Massive MIMO for next generation wireless systems. IEEE Commun. Mag. **52**(2), 186–195 (2014).
7. 3GPP-LTE, Technical specification group radio access network: evolved universal terrestrial radio access (E-UTRA): further advancements for E-UTRA physical layer aspects (release 13). 3GPP TR136.942 (2016).
8. GD Golden, CJ Foschini, RA Valenzuela, PW Wolniansky, Detection algorithm and initial laboratory results using V-BLAST space-time communication architecture. Electron. Lett. **35**(1), 14–16 (1999). doi:10.1049/el:19990058.
9. RC de Lamare, R Sampaio-Neto, Minimum mean-squared error iterative successive parallel arbitrated decision feedback detectors for DS-CDMA systems. IEEE Trans. Commun. **56**(5), 778–789 (2008).
10. RC de Lamare, Adaptive and iterative multi-branch MMSE decision feedback detection algorithms for multi-antenna systems. IEEE Trans. Wirel. Commun. **12**(10), 5294–5308 (2013). doi:10.1109/TWC.2013.092013.130233.
11. P Li, RC de Lamare, R Fa, Multiple feedback successive interference cancellation detection for multiuser MIMO systems. IEEE Trans. Wirel. Commun. **10**(8), 2434–2439 (2011).
12. P Demestichas, A Georgakopoulos, D Karvounas, K Tsagkaris, V Stavroulaki, J Lu, C Xiong, J Yao, 5G on the horizon: Key challenges for the radio-access network. IEEE Veh. Technol. Mag. **8**(3), 47–53 (2013).
13. H Pervaiz, L Musavian, Q Ni, in 2015 IEEE International Conference on Communication Workshop (ICCW). Area energy and area spectrum efficiency trade-off in 5G heterogeneous networks, (2015), pp. 1178–1183.
14. W Roh, A Paulraj, MIMO channel capacity for the distributed antenna systems. IEEE Veh. Technol. Conf. **3**, 1520–1524 (2002).
15. D Wang, J Wang, X You, Y Wang, M Chen, X Hou, Spectral efficiency of distributed MIMO systems. IEEE J. Selected Areas Commun. **31**(10), 2112–2127 (2002).
16. F Römer, M Fuchs, M Haardt, Distributed MIMO systems with spatial reuse for highspeed indoor mobile radio access. 20th Meeting of the Wireless World Research Forum (WWRF), Ottawa, Canada, (2008).
17. J Andrews, S Buzzi, W Choi, S Hanly, A Lozano, A Soong, J Zhang, What will 5G be? IEEE J. Selected Areas Commun. **32**(6), 1065–1082 (2014).
18. J Kermoal, L Schumacher, K Pedersen, et al, A stochastic MIMO radio channel model with experimental validation. IEEE J. Selected Areas Commun. **20**(6), 1211–1226 (2002).
19. QH Spencer, AL Swindlehurst, M Haardt, Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels. IEEE Trans. Signal Process. **52**(2), 461–471 (2004).
20. LU Choi, RD Murch, A transmit preprocessing technique for multiuser MIMO systems using a decomposition approach. IEEE Trans. Wireless Commun. **3**(1), 20–24 (2004).
21. V Stankovic, M Haardt, Generalized design of multi-user MIMO precoding matrices. IEEE Trans. Wireless Commun. **7**(3), 953–961 (2008).
22. CB Chae, S Shim, RW Heath, Block diagonalized vector perturbation for multiuser MIMO systems. IEEE Trans. Wireless Commun. **7**(11), 4051–4057 (2008).
23. L Arévalo, RC de Lamare, M Haardt, R Sampaio-Neto, in 2015 IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP). Uplink block diagonalization for massive MIMO-OFDM systems with distributed antennas, (2015).
24. V Stankovic, M Haardt, Improved diversity on the uplink of multi-user MIMO systems. 2005 European Conference on Wireless Technology (EuWiT), (2005), pp. 113–116.
25. K Zu, RC de Lamare, M Haardt, Generalized design of low-complexity block diagonalization type precoding algorithms for multiuser MIMO systems. IEEE Trans. Commun. **61**(10), 4232–4242 (2013).
26. H Sung, S Lee, I Lee, Generalized channel inversion methods for multiuser MIMO systems. IEEE Trans. Commun. **57**(11), 3489–3499 (2009).
27. L Arévalo, RC de Lamare, K Zu, R Sampaio-Neto, Multi-branch lattice reduction successive interference cancellation detection for multiuser MIMO systems. IEEE International Symposium on Wireless Communications Systems, (2014), pp. 219–223.
28. A Paulraj, R Nabar, D Gore, Introduction to space-time wireless communications (Cambridge University Press, 2003).
29. CK Wen, S Jin, KK Wong, On the Sum-Rate of Multiuser MIMO Uplink Channels with Jointly-Correlated Rician Fading. IEEE Trans. Commun. **59**(10), 2883–2895 (2011).
30. DNC Tse, P Viswanath, Fundamentals of wireless communications (Cambridge University Press, 2005).
31. HQ Ngo, EG Larsson, TL Marzetta, Energy and spectral efficiency of very large multiuser MIMO systems. IEEE Trans. Commun. **61**(4), 1436–1449 (2013).
32. G Golub, CV Loan, Matrix computations (The Johns Hopkins University Press, 1996).
33. T Minka, The Lightspeed Matlab toolbox, efficient operations for Matlab programming, version 2.2 (Microsoft Corp, 2007).