
EM-based parameter iterative approach for sparse Bayesian channel estimation of massive MIMO system

EURASIP Journal on Wireless Communications and Networking 2017, 2017:185

Received: 28 March 2017

Accepted: 14 October 2017

Published: 6 November 2017


One of the main challenges for a massive multi-input multi-output (MIMO) system is to obtain accurate channel state information as the number of antennas at the base station grows. Sparse Bayesian learning methods have been developed to reconstruct the sparse channel, but the existing methods depend heavily on knowledge of the channel distribution. In this paper, an expectation maximization (EM)-based parameter iterative approach, built on the sparse Bayesian method, is proposed to estimate the massive MIMO channel when the channel distribution is unknown. Exploiting the approximate sparsity of the channel, the distribution of its non-zero entries is modeled as a Gaussian mixture, and sparse Bayesian channel estimation is introduced. The channel marginal probability density function is computed with the generalized approximate message-passing algorithm, and all of the required channel parameters are iteratively estimated by the EM method. Simulation results show that the proposed scheme achieves high channel estimation accuracy with low complexity when the channel distribution is unknown.


  • Massive MIMO
  • Expectation maximization
  • Sparse Bayesian learning
  • Generalized approximate message passing

1 Introduction

Massive multi-input multi-output (MIMO) [1], which equips a large number of antennas at the base station (BS) to simultaneously serve tens of users in the same time-frequency channel, is widely considered one of the key techniques for future communication networks. Such systems can greatly improve system capacity and energy efficiency by exploiting the increased degrees of spatial freedom. Accurate downlink channel state information (CSI) is essential for massive MIMO systems to realize high-speed communication, and it is also vital for signal detection, beamforming, and other operations. However, it is difficult for the BS to obtain accurate downlink CSI, because the pilot overhead for downlink channel estimation becomes prohibitively high as the number of antennas increases. Most research exploits the time division duplex (TDD) protocol to overcome this challenge: in TDD, the downlink CSI can be obtained from the uplink channel estimate via channel reciprocity, and the uplink CSI is more easily acquired at the BS because the number of users is limited. However, the CSI acquired in the uplink is not always accurate for the downlink. In addition, current wireless networks are mainly dominated by the frequency division duplex (FDD) protocol. Thus, estimating the downlink CSI directly is of great importance. Compressed sensing (CS), which can reconstruct a sparse channel from few pilots, is viewed as a promising way to solve this problem.

Experimental results in [2, 3] show that the massive MIMO channel is approximately sparse due to the limited scattering at the BS. By exploiting the temporal correlation and the time-domain sparsity of massive MIMO channels, [4] applied a sparse channel estimation scheme that sharply reduces the pilot overhead. However, the time-domain sparsity may disappear as the number of scatterers at the user side increases [5]. Gao et al. [6] exploited the spatial correlation of massive MIMO channels and estimated them through CS. In [7], both spatial and temporal correlation were considered, and a structured compressive sensing (SCS)-based spatio-temporal joint channel estimation scheme was proposed to further reduce the pilot overhead and improve the estimation accuracy. These schemes mostly estimate channels with greedy algorithms, such as the orthogonal matching pursuit (OMP) and subspace pursuit (SP) algorithms. Recently, sparse Bayesian learning [8] has been developed to estimate the sparse channel. It takes the channel distribution into account, resulting in a highly accurate estimator. However, the parameters of the channel distribution are hard to obtain in practice.

In this paper, an expectation maximization (EM)-based parameter iterative approach is proposed to estimate the massive MIMO channel based on the sparse Bayesian method. With a large antenna array, the massive MIMO channel matrix is approximately sparse in the beam domain. We model the distribution of the non-zero channel entries as a Gaussian mixture (GM) and introduce the EM algorithm to learn all the required channel parameters. All of the quantities needed for the EM update are computed by the generalized approximate message passing (GAMP) [9, 10] algorithm, which greatly reduces the computational complexity. Hence, our approach does not require any prior channel parameters; it adaptively adjusts them through the EM algorithm. Numerical results show that the proposed scheme can accurately estimate the massive MIMO channel with much reduced complexity when the channel distribution is unknown. Compared with the OMP algorithm, the proposed EM-GAMP algorithm performs much better. Moreover, the accuracy of the EM-GAMP algorithm is not affected by the number of antennas at the BS.

The rest of this paper is organized as follows. In Section 2, we describe the system model and introduce the sparse Bayesian channel estimation. In Section 3, we propose an EM-based parameter iterative approach to estimate the massive MIMO channel based on sparse Bayesian method. The performance evaluation is described in Section 4. Finally, we conclude the paper in Section 5.

Notations. We use boldface capital letters like A to denote matrices. For any matrix \(\mathbf {A}\in \mathbb {C}^{N\times M}\), \(A_{nm}\) denotes the (n,m)th element of A. \(\mathbf{A}^{\mathsf{T}}\) and \(\mathbf{A}^{\mathsf{H}}\) are the transpose and the conjugate transpose of A, respectively. \(\mathbf{I}_{n}\) denotes the identity matrix of size n×n. For a Gaussian random vector x with mean μ and covariance Ω, we denote the pdf by \(\mathcal {N}(x;\mathbf {\mu },\mathbf {\Omega })\), and we use \(\mathcal {N}_{\mathbb {C}}(x;\mathbf {\mu },\mathbf {\Omega })\) for its circular complex Gaussian counterpart. E{·} and δ(·) denote the expectation operator and the Dirac delta, respectively.

2 Sparse Bayesian channel estimation

2.1 Sparse channel model

Consider a multi-user massive MIMO system with one cell. There are one BS and K users in the network, where the BS is equipped with M antennas and each user has a single antenna. During the downlink training phase, the BS simultaneously transmits a sequence of N training pilot symbols to all users. Denote the pilot sequences as a matrix \(\mathbf {S}\in \mathbb {C}^{N\times M}\). The channel matrix between the users and the BS is denoted by \(\mathbf {G}\triangleq \,[\mathbf {g}_{1},\mathbf {g}_{2},\dots,\mathbf {g}_{K}]\in \mathbb {C}^{M\times K}\), where \(\mathbf{g}_{k}\) is the channel vector of the k-th user. The received signal Y at the user side can be expressed as
$$\begin{array}{@{}rcl@{}} \mathbf{Y} \triangleq \mathbf{SG} + \mathbf{W}, \end{array} \qquad (1) $$

where \(\mathbf {W}\in \mathbb {C}^{N\times K}\) is the white Gaussian noise.

Accurate CSI is the basis of resource allocation, precoding, and other operations in massive MIMO systems. Common channel estimation algorithms, such as the least squares (LS) and minimum mean square error (MMSE) methods, exploit the pilot sequence S and the received signal Y to reconstruct the channel. However, with massive antennas at the BS, these traditional methods cannot be applied directly, because the pilot overhead, which is proportional to the number of BS antennas, becomes prohibitively high. Current research shows that massive MIMO channels exhibit sparsity in the beam domain. Wen et al. [9] point out that the channel vector \(\mathbf{g}_{k}\) can be written as
$$\begin{array}{@{}rcl@{}} \mathbf{g}_{k}^{\mathsf{T}} \triangleq \mathbf{v}_{k}^{\mathsf{T}}\mathbf{R}_{k}^{\frac{1}{2}}, \end{array} \qquad (2) $$
where R k is the channel covariance matrix and \(\mathbf {v}_{k}\sim \mathcal {N}_{\mathbb {C}}(\mathbf {v}_{k};\mathbf {0},\mathbf {I}_{M})\). After taking the discrete Fourier transform (DFT) [11] of g k , we have
$$\begin{array}{@{}rcl@{}} \mathbf{h}_{k}^{\mathsf{T}} \triangleq \mathbf{g}_{k}^{\mathsf{T}}\mathbf{F} = \mathbf{v}_{k}^{\mathsf{T}}\mathbf{R}_{k}^{\frac{1}{2}}\mathbf{F}, \end{array} \qquad (3) $$
where \(\mathbf {h}_{k}^{\mathsf {T}}\) is the beam-domain representation of \(\mathbf {g}_{k}^{\mathsf {T}}\) and F is the DFT matrix. The beam-domain channel matrix is proved to be approximately sparse [12]. As the number of antennas at the BS increases, the channel covariance matrix \(\mathbf{R}_{k}\) can be represented by
$$\begin{array}{@{}rcl@{}} \mathbf{R}_{k} \triangleq \mathbf{F}\boldsymbol{\Lambda}_{k}\mathbf{F}^{\mathsf{H}}, \end{array} \qquad (4) $$

where Λ k is a diagonal matrix whose diagonal elements are the eigenvalues of R k .

Plugging (4) into (3) and simplifying, we can obtain the sparse channel matrix
$$\begin{array}{@{}rcl@{}} \mathbf{h}_{k}^{\mathsf{T}} = \mathbf{v}_{k}^{\mathsf{T}}\mathbf{R}_{k}^{\frac{1}{2}}\mathbf{F} = \mathbf{v}_{k}^{\mathsf{T}}\mathbf{F}\boldsymbol{\Lambda}_{k}^{\frac{1}{2}}. \end{array} \qquad (5) $$

We obtain the received signal in the beam domain by taking the DFT of both sides of (1): \(\mathbf {Y}^{B} \triangleq \mathbf {SH} + \mathbf {W}^{B}\), where \(\mathbf{Y}^{B} = \mathbf{YF}\), \(\mathbf{H} = \mathbf{GF}\), and \(\mathbf{W}^{B} = \mathbf{WF}\). Simulation results in [12] show that over 99% of the channel power is located within only 16% of the beam indices, so H is approximately sparse. CS techniques can reconstruct a sparse signal from a small set of linear measurements; thus, they are feasible for massive MIMO channel estimation.
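The beam-domain sparsity argument above can be illustrated numerically: a spatially correlated channel whose covariance has only a few strong eigen-beams is dense in the antenna domain but sparse after the DFT. The dimensions and the number of active beams below are toy assumptions for the sketch, not the paper's simulation setup.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 64                                    # BS antennas (assumption)
S = 8                                     # active eigen-beams (assumption)

F = np.fft.fft(np.eye(M)) / np.sqrt(M)    # unitary DFT matrix F (symmetric)
lam = np.zeros(M)
lam[:S] = 1.0                             # eigenvalues of R_k: only S beams carry power

# v_k ~ CN(0, I);  h_k^T = v_k^T F Λ^{1/2}  =>  h_k = Λ^{1/2} F^T v_k
v = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
h = np.sqrt(lam) * (F.T @ v)              # beam-domain channel: sparse
g = np.conj(F) @ h                        # antenna-domain channel g_k (g^T = h^T F^H): dense

print(np.count_nonzero(np.abs(h) > 1e-10))   # number of non-zero beam entries (= S here)
```

Almost all of the channel power sits in the S active beam indices, while the antenna-domain vector g has essentially all M entries non-zero.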

2.2 Sparse Bayesian channel estimation

The received signal vector at the kth user can be expressed as
$$\begin{array}{@{}rcl@{}} \mathbf{y}_{k}^{B} = \mathbf{S}\mathbf{h}_{k} + \mathbf{w}_{k}^{B}. \end{array} \qquad (6) $$
Each element of the sparse vector \(\mathbf{h}_{k}\) is a Gaussian random variable with a different variance, so we assume the elements of \(\mathbf{h}_{k}\) follow a Gaussian-mixture distribution:
$$ \begin{aligned} P_{\mathbf{h}_{k}}(h_{km};\mathbf{\lambda}_{k},\pmb{\rho}_{k},\pmb{\theta}_{k},\pmb{\sigma}_{k}) \triangleq& (1-\lambda_{k})\delta(h_{km})\\ &+\lambda_{k}\sum_{l=1}^{L}{\rho_{kl}\mathcal{N}(h_{km};\theta_{kl},\sigma_{kl})}, \end{aligned} \qquad (7) $$
where \(\lambda_{k}\) is the sparsity rate and L is the number of mixture components; \(\pmb {\rho }_{k}\triangleq [\rho _{k1},\rho _{k2},\dots,\rho _{kL}]^{\mathsf {T}}\), \(\pmb {\theta }_{k}\triangleq \left [\theta _{k1},\theta _{k2},\dots,\theta _{kL} \right ]^{\mathsf {T}}\), and \(\pmb {\sigma }_{k}\triangleq \left [\sigma _{k1},\sigma _{k2},\dots,\sigma _{kL}\right ]^{\mathsf {T}}\) are the weight, mean, and variance vectors of the GM components of the k-th user, respectively, with \(\sum _{l=1}^{L}{\rho _{kl}}=1\). We assume that the noise \(\mathbf {w}_{k}^{B}\) is independent of the channel \(\mathbf{h}_{k}\) and is independent and identically distributed (i.i.d.) Gaussian with mean zero and variance \(\varphi_{k}\), so we have
$$\begin{array}{@{}rcl@{}} P_{\mathbf{w}_{k}}(w_{k};\varphi_{k}) \triangleq \mathcal{N}(w_{k};0,\varphi_{k}). \end{array} \qquad (8) $$
The channel \(\mathbf{h}_{k}\) also has i.i.d. components, so we have
$$\begin{array}{@{}rcl@{}} P_{\mathbf{h}_{k}}(\mathbf{h}_{k};\mathbf{q}_{k}) = \prod_{m=1}^{M}{P_{\mathbf{h}_{k}}(h_{km};\mathbf{q}_{k})}, \end{array} \qquad (9) $$
where \(\mathbf {q}_{k} \triangleq \, [\lambda _{k},\pmb {\rho }_{k},\pmb {\theta }_{k},\pmb {\sigma }_{k},\varphi _{k}]\). By exploiting the sparse Bayesian learning method to estimate the channel h k , we can get
$$\begin{array}{@{}rcl@{}} \hat{h}_{km}=\int{h_{km}Q(h_{km}){dh}_{km}}, \end{array} \qquad (10) $$

where \(Q(h_{km})=\int {P_{{\mathbf {h}_{k}}|\mathbf {y}_{k}^{B}}\left ({\mathbf {h}_{k}}|\mathbf {y}_{k}^{B};\mathbf {q}_{k}\right)\prod _{i\neq m}{dh}_{ki}}\) is the marginal probability density function (PDF) of h km .
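As a quick numerical sanity check on the prior (7), its continuous part can be evaluated on a grid; the Dirac spike carries the remaining \(1-\lambda_k\) probability mass at zero and is not representable as a density value. The parameter values below are hypothetical.

```python
import numpy as np

def gm_prior_density(h, lam, rho, theta, sigma):
    """Continuous part of prior (7): lam * sum_l rho_l * N(h; theta_l, sigma_l).
    The point mass (1 - lam) * delta(h) is omitted (it lives only at h = 0)."""
    h = np.atleast_1d(np.asarray(h, dtype=float))
    rho, theta, sigma = (np.asarray(a, dtype=float) for a in (rho, theta, sigma))
    comps = rho / np.sqrt(2.0 * np.pi * sigma) * \
        np.exp(-(h[:, None] - theta) ** 2 / (2.0 * sigma))
    return lam * comps.sum(axis=1)

# Hypothetical parameters: sparsity 0.2 and three zero-mean components.
grid = np.linspace(-10.0, 10.0, 20001)
pdf = gm_prior_density(grid, 0.2, [1/3, 1/3, 1/3], [0.0, 0.0, 0.0], [0.01, 0.1, 1.0])
mass = pdf.sum() * (grid[1] - grid[0])   # Riemann sum of the continuous part
print(mass)                              # ≈ 0.2, i.e., lam
```

Each Gaussian component integrates to one and the weights sum to one, so the continuous part integrates to the sparsity rate λ, confirming that (7) is a proper mixture of a spike and a GM.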

As we can see from (10), the Bayesian algorithm is computationally intractable, and it requires the channel statistics to be known in advance, which is difficult in massive MIMO systems. New methods should therefore be developed. In this paper, we propose an EM-based parameter iterative approach that employs GAMP to reduce the computational complexity and the EM algorithm to learn all the required channel knowledge.

3 EM-based parameter iterative approach for sparse Bayesian channel estimation

In massive MIMO systems, the channel statistics become harder to obtain as the number of antennas at the BS increases. Unlike the traditional Bayesian method, this paper proposes an EM-based parameter iterative approach to learn all the required channel knowledge.

3.1 EM-GAMP algorithm

We introduce the EM algorithm to learn the channel parameters \(\mathbf{q}_{k}\). The EM algorithm repeatedly updates \(\mathbf{q}_{k}\) by increasing a lower bound on the likelihood \(\ln {P_{{\mathbf {h}_{k}}|\mathbf {y}_{k}^{B}}\left ({\mathbf {h}_{k}}|\mathbf {y}_{k}^{B};\mathbf {q}_{k}\right)}\) at each iteration [13]. The update rule is
$$\begin{array}{@{}rcl@{}} \hat{\mathbf{q}}_{k}^{t+1}=\arg\max_{\mathbf{q}_{k}}E\left\{\ln{P_{{\mathbf{h}_{k}^{t}}|\mathbf{y}_{k}^{B}}\left({\mathbf{h}_{k}^{t}}|\mathbf{y}_{k}^{B};\hat{\mathbf{q}}_{k}^{t}\right)}\right\}, \end{array} \qquad (11) $$

where t represents the iteration index and \(\ln {P_{{\mathbf {h}_{k}^{t}}|\mathbf {y}_{k}^{B}}} {\left ({\mathbf {h}_{k}^{t}}|\mathbf {y}_{k}^{B};\hat {\mathbf {q}}_{k}^{t}\right)}\) can be calculated by the GAMP algorithm.

Denote \(\mathbf {z}_{k}^{t} = \mathbf {S} \mathbf {h}_{k}^{t}\) and \(z_{kn}^{t} = \mathbf {s}_{n}^{\mathsf {T}} \mathbf {h}_{k}^{t}\), where \(\mathbf {s}_{n}^{\mathsf {T}}\) is the n-th row of S. The GAMP algorithm models the relationship between \(y_{kn}^{B}\) and \(z_{kn}^{t}\) by a conditional PDF. The true marginal posterior can be estimated by the law of total probability
$$ {\begin{aligned} P_{{\mathbf{z}_{k}^{t}}|\mathbf{y}_{k}^{B}}\left({z_{kn}^{t}}|\mathbf{y}_{k}^{B};\mathbf{q}_{k}^{t}\right) \,=\, \frac {P_{{\mathbf{y}_{k}^{B}}|\mathbf{z}_{k}^{t}}\left({y_{kn}^{B}}|z_{kn}^{t};\mathbf{q}_{k}^{t}\right)\mathcal{N}\left(z_{kn}^{t};\hat{p}_{kn}^{t},\mu_{kn}^{p,t}\right)} {\int_{\mathbf{z}_{k}^{t}}{P_{{\mathbf{y}_{k}^{B}}|\mathbf{z}_{k}^{t}}\left({y_{kn}^{B}}|\mathbf{z}_{k}^{t};\mathbf{q}_{k}^{t}\right)\mathcal{N}\left(\mathbf{z}_{k}^{t};\hat{p}_{kn}^{t},\mu_{kn}^{p,t}\right)}}, \end{aligned}} \qquad (12) $$
where \(\hat {p}_{kn}^{t}\) and \(\mu _{kn}^{p,t}\) are the mean and variance of \(\mathbf {z}_{k}^{t}\), respectively. The noise vector \(\mathbf {w}_{k}^{t} = \mathbf {y}_{k}^{B} - \mathbf {z}_{k}^{t}\) is assumed to be additive white Gaussian, so we can get \(P_{{\mathbf {y}_{k}^{B}}|\mathbf {z}_{k}^{t}}\left ({y_{kn}^{B}}|z_{kn}^{t};\mathbf {q}_{k}^{t}\right) = \mathcal {N}\left (y_{kn}^{B};z_{kn}^{t},\varphi _{k}^{t}\right)\). Plugging it into (12), we can obtain
$$ {\begin{aligned} P_{{\mathbf{z}_{k}^{t}}|\mathbf{y}_{k}^{B}}\left({z_{kn}^{t}}|\mathbf{y}_{k}^{B};\mathbf{q}_{k}^{t}\right) = \frac {\mathcal{N}\left(y_{kn}^{B};z_{kn}^{t},\varphi_{k}^{t}\right)\mathcal{N}\left(z_{kn}^{t};\hat{p}_{kn}^{t},\mu_{kn}^{p,t}\right)} {\int_{\mathbf{z}_{k}^{t}}{\mathcal{N}\left(y_{kn}^{B};\mathbf{z}_{k}^{t},\varphi_{k}^{t}\right)\mathcal{N}\left(\mathbf{z}_{k}^{t};\hat{p}_{kn}^{t},\mu_{kn}^{p,t}\right)}}\\ = \mathcal{N}\left(z_{kn}^{t};\frac {\mu_{kn}^{p,t}y_{kn}^{B} + \hat{p}_{kn}^{t}\varphi_{k}^{t}} {\mu_{kn}^{p,t} + \varphi_{k}^{t}}, \frac {\mu_{kn}^{p,t}\varphi_{k}^{t}} {\mu_{kn}^{p,t} + \varphi_{k}^{t}}\right). \end{aligned}} \qquad (13) $$
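The combining step above is just the product of two Gaussian densities in z, which is again Gaussian with the closed-form mean and variance of (13). A minimal sketch with hypothetical scalar inputs:

```python
import numpy as np

def combine_gaussians(y, phi, p_hat, mu_p):
    """Posterior mean/variance of z from N(y; z, phi) * N(z; p_hat, mu_p),
    i.e., the closed form in (13)."""
    m = (mu_p * y + p_hat * phi) / (mu_p + phi)
    v = (mu_p * phi) / (mu_p + phi)
    return m, v

# Hypothetical values: one noisy observation y and one GAMP prior (p_hat, mu_p).
m, v = combine_gaussians(y=1.0, phi=0.5, p_hat=-0.3, mu_p=0.2)
```

Note the limiting behavior: as the noise variance φ → 0 the posterior mean moves to the observation y, and as μ^p → 0 it moves to the GAMP prediction p̂; the posterior variance is always smaller than either input variance.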
The posterior PDF of \(h_{km}^{t}\) could be approximated by the GAMP algorithm
$$ {\begin{aligned} P_{{\mathbf{h}_{k}^{t}}|\mathbf{y}_{k}^{B}}\left({h_{km}^{t}}|\mathbf{y}_{k}^{B};\hat{r}_{km}^{t},u_{km}^{r,t},\mathbf{q}_{k}^{t}\right) = \frac {P_{\mathbf{h}_{k}^{t}}\left(h_{km}^{t};\mathbf{q}_{k}^{t}\right) \mathcal{N}\left(h_{km}^{t};\hat{r}_{km}^{t},u_{km}^{r,t}\right)} {\int_{\mathbf{h}_{k}^{t}}{P_{\mathbf{h}_{k}^{t}} \left(\mathbf{h}_{k}^{t};\mathbf{q}_{k}^{t}\right) \mathcal{N}\left(\mathbf{h}_{k}^{t};\hat{r}_{km}^{t},u_{km}^{r,t}\right)}}, \end{aligned}} \qquad (14) $$

where \(\hat {r}_{km}^{t}\) and \(u_{km}^{r,t}\) are the mean and variance of \(\mathbf {h}_{k}^{t}\), respectively.

Substituting the prior (7) into (14), we have
$$\begin{array}{@{}rcl@{}} P_{{\mathbf{h}_{k}^{t}}|\mathbf{y}_{k}^{B}}\left({h_{km}^{t}}|\mathbf{y}_{k}^{B};\hat{r}_{km}^{t},u_{km}^{r,t},\mathbf{q}_{k}^{t}\right) = \left(1-\pi_{km}^{t}\right)\delta\left(h_{km}^{t}\right)\\+ \pi_{km}^{t} \Sigma_{l=1}^{L}{{\bar{\beta}_{km,l}^{t} \mathcal{N}\left(h_{km}^{t};\gamma_{km,l}^{t},\upsilon_{km,l}^{t}\right)}}, \end{array} \qquad (15) $$
$$\begin{array}{@{}rcl@{}} {{\pi_{km}^{t}}} &=& \int_{\mathbf{h}_{k}^{t}}{P_{\mathbf{h}_{k}^{t}} \left(\mathbf{h}_{k}^{t};\mathbf{q}_{k}^{t}\right) \mathcal{N}\left(\mathbf{h}_{k}^{t};\hat{r}_{km}^{t},u_{km}^{r,t}\right)}, \end{array} $$
$$\begin{array}{@{}rcl@{}} \beta_{km,l}^{t} &=& \lambda_{k}^{t} \rho_{kl}^{t} \mathcal{N}\left(\hat{r}_{km}^{t};\theta_{kl}^{t},u_{km}^{r,t} + \sigma_{kl}^{t}\right), \end{array} $$
$$\begin{array}{@{}rcl@{}} \bar{\beta}_{km,l}^{t} &=& \frac {\beta_{km,l}^{t}} {\Sigma_{l=i}^{L}{\beta_{km,i}^{t}}}, \end{array} $$
$$\begin{array}{@{}rcl@{}} \pi_{km}^{t} &=& \left(1 + \left(\frac {\Sigma_{l=i}^{L}{\beta_{km,i}^{t}}} {\left(1-\lambda_{k}^{t}\right) \mathcal{N}\left(0;\hat{r}_{km}^{t},u_{km}^{r,t}\right)}\right)^{-1}\right)^{-1}\!\!, \end{array} $$
$$\begin{array}{@{}rcl@{}} \gamma_{km,l}^{t} &=& \frac {\hat{r}_{km}^{t} /u_{km}^{r,t} + \theta_{kl}^{t} / \sigma_{kl}^{t}} {1 /u_{km}^{r,t} + 1 / \sigma_{kl}^{t}}, \end{array} $$
$$\begin{array}{@{}rcl@{}} \upsilon_{km,l}^{t} &=& \frac{1} {1 /u_{km}^{r,t} + 1 / \sigma_{kl}^{t}}. \end{array} \qquad (16) $$

According to (15), \(P_{{\mathbf {h}_{k}^{t}}|\mathbf {y}_{k}^{B}}\left ({h_{km}^{t}}|\mathbf {y}_{k}^{B};\mathbf {q}_{k}^{t}\right)\) obeys a GM distribution. Plugging (15) into \(Q\left (h_{km}^{t}\right)\) and estimating with the GAMP algorithm, \(Q\left (h_{km}^{t}\right)\) can be approximated by a Gaussian distribution, written as \(Q\left (h_{km}^{t}\right) = \mathcal {N}\left (h_{km}^{t};a_{km}^{t},{var}_{km}^{t}\right)\), where \(a_{km}^{t}\) and \({var}_{km}^{t}\) are its mean and variance. Plugging it into (10), the posterior estimate of \(h_{km}^{t}\) is obtained as \(a_{km}^{t}\).
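The mean and variance of the spike-plus-GM posterior (15) follow from standard mixture moment formulas; these are exactly the \(a_{km}^{t}\) and \({var}_{km}^{t}\) used above. A sketch with hypothetical per-component quantities from (16):

```python
import numpy as np

def spike_gm_posterior_moments(pi, beta_bar, gamma, upsilon):
    """Mean a and variance var of the posterior in (15):
    (1 - pi) * delta(h) + pi * sum_l beta_bar_l * N(h; gamma_l, upsilon_l)."""
    beta_bar, gamma, upsilon = (np.asarray(x, dtype=float)
                                for x in (beta_bar, gamma, upsilon))
    a = pi * np.sum(beta_bar * gamma)                        # E[h]
    second = pi * np.sum(beta_bar * (upsilon + gamma ** 2))  # E[h^2]
    return a, second - a ** 2

# Hypothetical single-component example.
a, var = spike_gm_posterior_moments(pi=0.5, beta_bar=[1.0], gamma=[2.0], upsilon=[0.1])
```

When π = 0 the spike dominates and the estimate collapses to zero; when π = 1 the estimate is the plain GM mean, which matches the intuition that π acts as a per-entry support probability.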

Plugging (15) into (11) and following similar steps of manipulation in [13], we have the update for the parameters
$$\begin{array}{@{}rcl@{}} \lambda_{k}^{t+1} &=& \frac {1}{M} \Sigma_{m=1}^{M}{\pi_{km}^{t}}, \end{array} $$
$$\begin{array}{@{}rcl@{}} \rho_{kl}^{t+1} &=& \frac {\Sigma_{m=1}^{M}{\pi_{km}^{t}\bar{\beta}_{km,l}^{t}}} {\Sigma_{m=1}^{M}{\pi_{km}^{t}}}, \end{array} $$
$$\begin{array}{@{}rcl@{}} \theta_{kl}^{t+1} &=& \frac {\Sigma_{m=1}^{M}{\pi_{km}^{t}\bar{\beta}_{km,l}^{t}\gamma_{km,l}^{t}}} {\Sigma_{m=1}^{M}{\pi_{km}^{t}\bar{\beta}_{km,l}^{t}}}, \end{array} $$
$$\begin{array}{@{}rcl@{}} \sigma_{kl}^{t+1} &=& \frac {\Sigma_{m=1}^{M}{\pi_{km}^{t}\bar{\beta}_{km,l}^{t}}\left(|\theta_{kl}^{t}-\gamma_{km,l}^{t}|^{2}+\upsilon_{km,l}^{t}\right)} {\Sigma_{m=1}^{M}{\pi_{km}^{t}\bar{\beta}_{km,l}^{t}}}, \end{array} $$
$$\begin{array}{@{}rcl@{}} \varphi_{k}^{t+1} &=& \frac {1} {N} \Sigma_{n=1}^{N}\left(|y_{kn}^{B}-\hat{z}_{kn}^{t}|^{2}+\mu_{kn}^{z,t}\right), \end{array} \qquad (17) $$

where \({{\pi _{km}^{t}}}\), \(\bar {\beta }_{km,l}^{t}\), \(\gamma _{km,l}^{t}\), and \(\upsilon _{km,l}^{t}\) are calculated by (16), and \(\mu _{kn}^{z,t}\) is the variance of \(P_{{\mathbf {z}_{k}^{t}}|\mathbf {y}_{k}^{B}}\left ({z_{kn}^{t}}|\mathbf {y}_{k}^{B};\mathbf {q}_{k}^{t}\right)\), namely, \(\mu _{kn}^{z,t} = {\mu _{kn}^{p,t}\varphi _{k}^{t}} / \left (\mu _{kn}^{p,t} + \varphi _{k}^{t}\right)\). As (17) shows, accurate channel parameters can be obtained after several iterations.
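A vectorized sketch of the first four M-step updates in (17), assuming the per-entry quantities π, β̄, γ, υ of (16) are available as arrays; the noise-variance update additionally needs the GAMP outputs ẑ and μ^z, so it is omitted here. Array shapes and variable names are illustrative assumptions.

```python
import numpy as np

def em_m_step(pi, beta_bar, gamma, upsilon, theta_old):
    """One EM M-step for user k following (17). Shapes: pi is (M,);
    beta_bar, gamma, upsilon are (M, L); theta_old is the previous (L,) means."""
    lam = pi.mean()                                   # sparsity-rate update
    w = pi[:, None] * beta_bar                        # per-entry component responsibilities
    rho = w.sum(axis=0) / pi.sum()                    # mixture-weight update
    theta = (w * gamma).sum(axis=0) / w.sum(axis=0)   # component-mean update
    sigma = (w * (np.abs(theta_old - gamma) ** 2 + upsilon)).sum(axis=0) / w.sum(axis=0)
    return lam, rho, theta, sigma
```

By construction, the updated weights ρ still sum to one, so the learned prior remains a valid Gaussian mixture at every iteration.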

The statistical properties of adjacent sub-channels change slowly in the beam domain because of the limited scattering. Therefore, we can use the learned parameters to initialize the next sub-channel's parameters, namely, \(\hat {\mathbf {q}}_{k+1}^{0} = \hat {\mathbf {q}}_{k}^{t}\). The proposed EM-GAMP algorithm is summarized in Algorithm 1.

3.2 Performance analysis

The GAMP algorithm transforms a nonlinear problem involving multiple integrals into a linear one, so the computational complexity in each iteration is dominated by simple matrix multiplications and additions in formulas (10)–(16). In the E-step, the complexity is dominated by formula (15), namely, O(LNM). Similarly, the M-step costs O(4M+N). Therefore, the proposed EM-GAMP algorithm has a computational complexity on the order of O(T(LNM+4M+N)), where T and L denote the number of iterations and the number of Gaussian-mixture components, respectively.

The complexity of the classic LS and OMP algorithms is O(M³) and O(NM²), respectively. In massive MIMO systems, the number of antennas at the BS is much larger than the length of the pilot sequence, namely, N≪M. Simulation results show that the MSE stabilizes after five to six iterations, namely, T≪N. In this paper, L is fixed to 3, which yields good performance. Thus, the complexity of the EM-GAMP algorithm simplifies to O(TNM). The proposed EM-GAMP algorithm therefore greatly reduces the computational complexity.

Classic CS algorithms, such as OMP and SP, can also reconstruct the channel accurately. However, these schemes need to know the channel statistics in advance, which is impractical. In contrast to OMP and SP, our approach exploits the EM algorithm to update all the required channel parameters as part of the estimation procedure.

4 Simulation results

We consider a single-cell massive MIMO system with a BS equipped with M=200 antennas and K=40 users. The BS uses N=λM pilot symbols for channel estimation. We set λ=0.2 because over 99% of the channel power is located within only 16% of the beam indices, as mentioned in Section 2.1. It is essential for the EM algorithm to initialize the unknown parameters \(\hat {\mathbf {q}}_{k}\) properly. We initialize the parameters as \(\rho _{1l}^{0}=\frac {1}{3}\), \(\theta _{1l}^{0}=0\), \(\pmb {\sigma }_{1}^{0}=\,[0.01,0.1,1]\), \(\varphi _{1}^{0}=\frac {\|\mathbf {y}_{k}^{B}\|^{2}}{N(1+\mathrm{SNR})}\), where SNR is the signal-to-noise ratio. We define the mean square error (MSE) as
$$\begin{array}{@{}rcl@{}} \mathrm{MSE} = \frac {1}{KM} \sum_{k=1}^{K}{\sum_{m=1}^{M}{|\hat{h}_{km}-h_{km}|^{2}}}. \end{array} \qquad (18) $$
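The metric above is simply the squared estimation error averaged over all K users and M beam-domain entries; a direct sketch:

```python
import numpy as np

def mse(h_hat, h):
    """Mean square error of (18): average of |h_hat - h|^2 over all K x M entries.
    h_hat and h are (K, M) arrays (real or complex)."""
    h_hat, h = np.asarray(h_hat), np.asarray(h)
    return float(np.mean(np.abs(h_hat - h) ** 2))
```

Using the complex modulus makes the same function valid for both the Rayleigh and Gaussian-mixture channel realizations considered below.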
Figure 1 compares the MSE performance of the proposed algorithm and the traditional algorithms as a function of SNR. As the figure shows, the MSE of all algorithms decreases as SNR increases. Clearly, the proposed EM-GAMP performs much better than LS and OMP, and it approaches the perfect-CSI bound when SNR > 15 dB. In addition, we compare the MSE performance with different numbers of GM components: the MSE decreases as the number of GM components increases, because the channel model becomes more accurate with more GM components.
Fig. 1 MSE performance comparison of different algorithms as a function of SNR

Figure 2 shows the MSE performance of the EM-GAMP algorithm as a function of the number of iterations. The MSE stabilizes after five to six iterations, and the convergence rate is inversely proportional to L, the number of GM components. However, the MSE performance degrades when L is small. Therefore, L should be chosen to balance runtime against estimation error.
Fig. 2 MSE performance of the EM-GAMP algorithm as a function of the number of iterations

Figure 3 compares the MSE performance of EM-GAMP, LS, and OMP as a function of M with SNR = 15 dB, L=3, and \(\frac {N}{M}=0.2\). It is clear from Fig. 3 that the performance of LS and OMP degrades as M increases. By contrast, the proposed EM-GAMP algorithm is not affected by the number of antennas at the BS: the estimation error remains stable no matter how large M is. This is because the EM algorithm learns and updates the channel parameters adaptively, which allows our algorithm to adapt to different environments.
Fig. 3 MSE performance comparison of different algorithms as a function of M with SNR = 15 dB, L=3, \(\frac {N}{M}=0.2\)

Figure 4 compares the MSE performance for different channels and values of M as a function of \(\frac {N}{M}\) with SNR = 15 dB, λ=0.2, and L=3. As \(\frac {N}{M}\) increases, the MSE decreases and then stabilizes once N reaches λM, as expected. Different channel models, including the Rayleigh and Gaussian-mixture channel models, are considered. The proposed approach achieves much better performance in all cases, a substantial improvement over the LS estimator.
Fig. 4 MSE performance comparison of different channels and M as a function of \(\frac {N}{M}\) with SNR = 15 dB, L=3, λ=0.2

Figure 5 compares the MSE performance of the different algorithms as a function of the sparsity rate with M=400, SNR = 15 dB, and L=3. The MSE of all three algorithms increases as the sparsity rate grows. However, the EM-GAMP algorithm approaches the perfect-CSI bound when the sparsity rate λ≤0.2, and it always achieves much better performance than the others.
Fig. 5 MSE performance comparison of different algorithms as a function of sparsity rate with M=400, SNR = 15 dB, L=3

5 Conclusions

To obtain accurate CSI in massive MIMO systems, we propose an EM-based parameter iterative approach built on the sparse Bayesian method. Channel statistics are essential for sparse Bayesian methods to obtain accurate CSI; however, they are barely accessible in massive MIMO systems due to the large number of antennas at the BS. In this paper, we model the distribution of the non-zero channel entries as a Gaussian mixture and employ the EM algorithm to learn all the required channel parameters, with all quantities needed for the EM update computed by the GAMP algorithm. Simulation results show that our approach greatly reduces complexity and performs much better than the LS and OMP algorithms when the channel parameters are unknown. In addition, the proposed EM-GAMP algorithm is robust to the number of antennas at the BS.

Pilot contamination arises when the same orthogonal pilot sequences are reused in all cells of a multi-cell massive MIMO system. We have designed a class of non-orthogonal pilot sequences [14] that can efficiently reduce the impact of pilot contamination. Thus, the proposed method can be extended to multi-cell massive MIMO systems while retaining good performance and reduced complexity.



Acknowledgements

This work was supported by research grants from the National Natural Science Foundation of P.R. China (No. 61673253, 61271213) and the Ph.D. Programs Foundation of Ministry of Education of P.R. China (20133108110014).

Authors’ contributions

MS conceived and designed the study. MS performed the experiments and wrote the paper. FY reviewed and revised the manuscript. Both authors read and approved the manuscript.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

Key laboratory of Specialty Fiber Optics and Optical Access Networks, Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai, China


References

  1. EG Larsson, F Tufvesson, O Edfors, TL Marzetta, Massive MIMO for next generation wireless systems. IEEE Commun. Mag. 52(2), 186–195 (2014).
  2. F Kaltenberger, D Gesbert, R Knopp, M Kountouris, Correlation and capacity of measured multi-user MIMO channels. Proc. IEEE Int. Symp. Personal Indoor Mobile Radio Commun. (PIMRC), 1–5 (2008).
  3. J Hoydis, C Hoek, T Wild, S ten Brink, Channel measurements for large antenna arrays. Proc. IEEE Int. Symp. Wireless Commun. Systems (ISWCS), 811–815 (2012).
  4. L Dai, Z Wang, Z Yang, Spectrally efficient time-frequency training OFDM for mobile large-scale MIMO systems. IEEE J. Sel. Areas Commun. 31(2), 251–263 (2013).
  5. Z Gao, L Dai, Z Wang, S Chen, Spatially common sparsity based adaptive channel estimation and feedback for FDD massive MIMO. IEEE Trans. Signal Process. 63(23), 6169–6183 (2015).
  6. Z Gao, L Dai, Z Wang, Structured compressive sensing based superimposed pilot design for large-scale MIMO systems. Electron. Lett. 50(12), 896–898 (2014).
  7. Z Gao, L Dai, W Dai, B Shim, Z Wang, Structured compressive sensing-based spatio-temporal joint channel estimation for FDD massive MIMO. IEEE Trans. Commun. 64(2), 601–617 (2016).
  8. C Zhu, Z Zheng, B Jiang, W Zhong, X Gao, Bayesian channel estimation for massive MIMO communications. Proc. IEEE 83rd Veh. Technol. Conf. (VTC Spring), 1–5 (2016).
  9. C-K Wen, K-K Wong, J-C Chen, P Ting, Channel estimation for massive MIMO using Gaussian-mixture Bayesian learning. IEEE Trans. Wireless Commun. 14(3), 1356–1368 (2015).
  10. S Rangan, Generalized approximate message passing for estimation with random linear mixing. Proc. IEEE Int. Symp. Inf. Theory (ISIT), 2168–2172 (2011).
  11. J Zhang, B Zhang, S Chen, X Mu, M El-Hajjar, L Hanzo, Pilot contamination elimination for large-scale multiple-antenna aided OFDM systems. IEEE J. Sel. Topics Signal Process. 8(5), 759–772 (2014).
  12. E Björnson, J Hoydis, M Kountouris, M Debbah, Massive MIMO systems with non-ideal hardware: energy efficiency, estimation, and capacity limits. IEEE Trans. Inf. Theory 60(11), 7112–7139 (2014).
  13. JP Vila, P Schniter, Expectation-maximization Gaussian-mixture approximate message passing. IEEE Trans. Signal Process. 61(19), 4658–4672 (2013).
  14. F Yong, M Sulin, W Du, Pilot assignment of massive MIMO systems for pilot decontamination based on inter-cell cross gain. Proc. 2016 Int. Conf. Sci. Comput., 125–129 (2016).


© The Author(s) 2017