Channel estimation via gradient pursuit for mmWave massive MIMO systems with one-bit ADCs

Kim, In-soo; Choi, Junil

doi:10.1186/s13638-019-1623-x

Research
Open access
Published: 30 December 2019

EURASIP Journal on Wireless Communications and Networking volume 2019, Article number: 289 (2019)

Abstract

In millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems, 1 bit analog-to-digital converters (ADCs) are employed to reduce the impractically high power consumption, which is incurred by the wide bandwidth and large arrays. In practice, the mmWave band consists of a small number of paths, thereby rendering sparse virtual channels. Then, the resulting maximum a posteriori (MAP) channel estimation problem is a sparsity-constrained optimization problem, which is NP-hard to solve. In this paper, iterative approximate MAP channel estimators for mmWave massive MIMO systems with 1 bit ADCs are proposed, which are based on the gradient support pursuit (GraSP) and gradient hard thresholding pursuit (GraHTP) algorithms. The GraSP and GraHTP algorithms iteratively pursue the gradient of the objective function to approximately optimize convex objective functions with sparsity constraints, which are the generalizations of the compressive sampling matching pursuit (CoSaMP) and hard thresholding pursuit (HTP) algorithms, respectively, in compressive sensing (CS). However, the performance of the GraSP and GraHTP algorithms is not guaranteed when the objective function is ill-conditioned, which may be incurred by the highly coherent sensing matrix. In this paper, the band maximum selecting (BMS) hard thresholding technique is proposed to modify the GraSP and GraHTP algorithms, namely, the BMSGraSP and BMSGraHTP algorithms, respectively. The BMSGraSP and BMSGraHTP algorithms pursue the gradient of the objective function based on the band maximum criterion instead of the naive hard thresholding. In addition, a fast Fourier transform-based (FFT-based) fast implementation is developed to reduce the complexity. The BMSGraSP and BMSGraHTP algorithms are shown to be both accurate and efficient, whose performance is verified through extensive simulations.

1 Introduction

In millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems, the wide bandwidth of the mmWave band in the range of 30–300 GHz offers a high data rate, which guarantees a significant performance gain [1–4]. However, the power consumption of analog-to-digital converters (ADCs) is scaled quadratically with the sampling rate and exponentially with the ADC resolution, which renders high-resolution ADCs impractical for mmWave systems [5]. To reduce the power consumption, low-resolution ADCs were suggested as a possible solution, which recently gained popularity [6–9]. Coarsely quantizing the received signal using low-resolution ADCs results in an irreversible loss of information, which might cause a significant performance degradation. In this paper, we consider the extreme scenario of using 1 bit ADCs for mmWave systems.

In practice, the mmWave band consists of a small number of propagation paths, which results in sparse virtual channels. From the channel estimation point of view, sparse channels are favorable because the required complexity and number of measurements can be reduced. Sparsity-constrained channel distributions, however, cannot be described in closed form, which makes it difficult to exploit Bayesian channel estimation. In [10, 11], channel estimators for massive MIMO systems with 1 bit ADCs were proposed, which account for the effect of the coarse quantization. The near maximum likelihood (nML) channel estimator [10] selects the maximizer of the likelihood function subject to the L²-norm constraint as the estimate of the channel, which is solved using the projected gradient descent method [12]. However, the channel sparsity was not considered in [10]. In [11], the Bussgang linear minimum mean squared error (BLMMSE) channel estimator was derived by linearizing 1 bit ADCs based on the Bussgang decomposition [13]. The BLMMSE channel estimator is an LMMSE channel estimator for massive MIMO systems with 1 bit ADCs, which assumes that the channel is Gaussian. Therefore, the sparsity of the channel is not taken into account in [11] either.

To take the channel sparsity into account, iterative approximate MMSE estimators for mmWave massive MIMO systems with 1 bit ADCs were proposed in [14, 15]. The generalized expectation consistent signal recovery (GEC-SR) algorithm in [14] is an iterative approximate MMSE estimator based on the turbo principle [16], which can be applied to any nonlinear function of the linearly mapped signal to be estimated. Furthermore, the only constraint on the distribution of the signal to be estimated is that its elements must be independent and identically distributed (i.i.d.) random variables. Therefore, the GEC-SR algorithm can be used as an approximate MMSE channel estimator for any ADC resolution, from 1 bit to high resolution. However, the inverse of the sensing matrix is required at each iteration, which is impractical in massive MIMO systems from the complexity point of view.

The generalized approximate message passing-based (GAMP-based) channel estimator for mmWave massive MIMO systems with low-resolution ADCs was proposed in [15], which is another iterative approximate MMSE channel estimator. In contrast to the GEC-SR algorithm, only matrix-vector multiplications are required at each iteration, which is favorable from the complexity point of view. As with the GEC-SR algorithm, the GAMP-based algorithm can be applied to any ADC resolution and any channel distribution as long as the elements of the channel are i.i.d. random variables. The performance of the GEC-SR and GAMP algorithms, however, cannot be guaranteed when the sensing matrix is ill-conditioned, which frequently occurs in the mmWave band. To prevent the sensing matrix from becoming ill-conditioned, the GAMP-based channel estimator constructs the virtual channel representation using discrete Fourier transform (DFT) matrices, whose columns are orthogonal. However, such a virtual channel representation results in a large gridding error, which leads to performance degradation.

Our goal is to propose an iterative approximate maximum a posteriori (MAP) channel estimator for mmWave massive MIMO systems with 1 bit ADCs. Due to the sparse nature, the MAP channel estimation problem is converted to a sparsity-constrained optimization problem, which is NP-hard to solve [17]. To approximately solve such problems iteratively, the gradient support pursuit (GraSP) and gradient hard thresholding pursuit (GraHTP) algorithms were proposed in [17] and [18], respectively. The GraSP and GraHTP algorithms pursue the gradient of the objective function at each iteration by hard thresholding. These algorithms are the generalizations of the compressive sampling matching pursuit (CoSaMP) [19] and hard thresholding pursuit (HTP) [20] algorithms, respectively, in compressive sensing (CS).

With a highly coherent sensing matrix, however, the GraSP and GraHTP algorithms do not perform well because the objective function becomes ill-conditioned. To remedy this breakdown, we propose the band maximum selecting (BMS) hard thresholding technique, which is then applied to the GraSP and GraHTP algorithms to obtain the BMSGraSP and BMSGraHTP algorithms, respectively. The proposed BMS-based algorithms perform hard thresholding for the gradient of the objective function based on the proposed band maximum criterion, which tests whether an index is the ground truth index or the by-product of another index. To reduce the complexity of the BMS-based algorithms, a fast Fourier transform-based (FFT-based) fast implementation of the objective function and gradient is proposed. The BMS-based algorithms are shown to be both accurate and efficient, which is verified through extensive simulations.

The rest of this paper is organized as follows. In Section 2, mmWave massive MIMO systems with 1 bit ADCs are described. In Section 3, the MAP channel estimation framework is formulated. In Section 4, the BMS hard thresholding technique is proposed, which is applied to the GraSP and GraHTP algorithms. In addition, an FFT-based fast implementation is proposed. In Section 5, the results and discussion are presented, and the conclusions follow in Section 6.

Notation: $a$, $\mathbf{a}$, and $\mathbf{A}$ denote a scalar, vector, and matrix, respectively. $\|\mathbf{a}\|_{0}$, $\|\mathbf{a}\|_{1}$, and $\|\mathbf{a}\|$ represent the L⁰-, L¹-, and L²-norm of $\mathbf{a}$, respectively. $\|\mathbf{A}\|_{\mathrm{F}}$ is the Frobenius norm of $\mathbf{A}$. The transpose, conjugate transpose, and conjugate of $\mathbf{A}$ are denoted as $\mathbf{A}^{\mathrm{T}}$, $\mathbf{A}^{\mathrm{H}}$, and $\overline {\mathbf {A}}$, respectively. The element-wise vector multiplication and division of $\mathbf{a}$ and $\mathbf{b}$ are denoted as $\mathbf{a}\odot\mathbf{b}$ and $\mathbf{a}\oslash\mathbf{b}$, respectively. The sum of all of the elements of $\mathbf{a}$ is denoted as $\text{sum}(\mathbf{a})$. The vectorization of $\mathbf{A}$ is denoted as $\text{vec}(\mathbf{A})$, which is formed by stacking all of the columns of $\mathbf{A}$. The unvectorization of $\mathbf{a}$ is denoted as $\text{unvec}(\mathbf{a})$, which is the inverse of $\text{vec}(\mathbf{A})$. The Kronecker product of $\mathbf{A}$ and $\mathbf{B}$ is denoted as $\mathbf{A}\otimes\mathbf{B}$. The support of $\mathbf{a}$ is denoted as $\text{supp}(\mathbf{a})$, which is the set of indices of the non-zero elements of $\mathbf{a}$. The best $s$-term approximation of $\mathbf{a}$ is denoted as $\mathbf{a}|_{s}$, which is formed by leaving only the $s$ largest (in absolute value) elements of $\mathbf{a}$ and replacing the other elements with 0. Similarly, the vector obtained by leaving only the elements of $\mathbf{a}$ indexed by the set $\mathcal {A}$ and replacing the other elements with 0 is denoted as $\mathbf {a}|_{\mathcal {A}}$. The absolute value of a scalar $a$ and the cardinality of a set $\mathcal {A}$ are denoted as $|a|$ and $|\mathcal {A}|$, respectively. The set difference between the sets $\mathcal {A}$ and $\mathcal {B}$, namely, $\mathcal {A}\cap \mathcal {B}^{\mathrm {c}}$, is denoted as $\mathcal {A}\setminus \mathcal {B}$. $\phi(\mathbf{a})$ and $\Phi(\mathbf{a})$ are the element-wise standard normal PDF and CDF of $\mathbf{a}$, whose $i$-th elements are $\frac {1}{\sqrt {2\pi }}e^{-\frac {a_{i}^{2}}{2}}$ and $\int _{-\infty }^{a_{i}}\frac {1}{\sqrt {2\pi }}e^{-\frac {x^{2}}{2}}dx$, respectively. The $m\times 1$ zero vector and $m\times m$ identity matrix are denoted as $\mathbf{0}_{m}$ and $\mathbf{I}_{m}$, respectively.

2 mmWave massive MIMO systems with 1 bit ADCs

2.1 System model

As shown in Fig. 1, consider a mmWave massive MIMO system with uniform linear arrays (ULAs) at the transmitter and receiver. The N-antenna transmitter transmits a training sequence of length T to the M-antenna receiver. Therefore, the received signal $\mathbf {Y}=[\mathbf {y}[1]\quad \mathbf {y}[2]\quad \cdots \quad \mathbf {y}[T]]\in \mathbb {C}^{M\times T}$ is

$$ \mathbf{Y}=\sqrt{\rho}\mathbf{H}\mathbf{S}+\mathbf{N}, $$

(1)

which is the collection of the t-th received signal $\mathbf {y}[t]\in \mathbb {C}^{M}$ over t∈{1,…,T}. In the mmWave band, the channel $\mathbf {H}\in \mathbb {C}^{M\times N}$ consists of a small number of paths, whose parameters are the path gains, angles of arrival (AoAs), and angles of departure (AoDs) [21]. Therefore, H is

$$ \mathbf{H}=\sum_{\ell=1}^{L}\alpha_{\ell}\mathbf{a}_{\text{RX}}(\theta_{\text{RX}, \ell})\mathbf{a}_{\text{TX}}(\theta_{\text{TX}, \ell})^{\mathrm{H}} $$

(2)

where L is the number of paths, $\alpha _{\ell }\in \mathbb {C}$ is the path gain of the ℓ-th path, and θ_RX,ℓ∈[−π/2,π/2] and θ_TX,ℓ∈[−π/2,π/2] are the AoA and AoD of the ℓth path, respectively. The steering vectors $\mathbf {a}_{\text {RX}}(\theta _{\text {RX}, \ell })\in \mathbb {C}^{M}$ and $\mathbf {a}_{\text {TX}}(\theta _{\text {TX}, \ell })\in \mathbb {C}^{N}$ are

$$\begin{array}{*{20}l} \mathbf{a}_{\text{RX}}(\theta_{\text{RX}, \ell})&=\frac{1}{\sqrt{M}}\left[\begin{array}{lll}1&\cdots&e^{-j\pi(M-1)\sin(\theta_{\text{RX}, \ell})}\end{array}\right]^{\mathrm{T}}, \end{array} $$

(3)

$$\begin{array}{*{20}l} \mathbf{a}_{\text{TX}}(\theta_{\text{TX}, \ell})&=\frac{1}{\sqrt{N}}\left[\begin{array}{lll}1&\cdots&e^{-j\pi(N-1)\sin(\theta_{\text{TX}, \ell})}\end{array}\right]^{\mathrm{T}} \end{array} $$

(4)

where the inter-element spacing is half-wavelength. The training sequence $\mathbf {S}=[\mathbf {s}[1]\quad \mathbf {s}[2]\quad \cdots \quad \mathbf {s}[T]]\in \mathbb {C}^{N\times T}$ is the collection of the tth training sequence $\mathbf {s}[t]\in \mathbb {C}^{N}$ over t∈{1,…,T}, whose power constraint is ∥s[t]∥²=N. The additive white Gaussian noise (AWGN) $\mathbf {N}=[\mathbf {n}[1]\quad \mathbf {n}[2]\quad \cdots \quad \mathbf {n}[T]]\in \mathbb {C}^{M\times T}$ is the collection of the tth AWGN $\mathbf {n}[t]\in \mathbb {C}^{M}$ over t∈{1,…,T}, which is distributed as $\text {vec}(\mathbf {N})\sim \mathcal {C}\mathcal {N}(\mathbf {0}_{MT}, \mathbf {I}_{MT})$. The signal-to-noise ratio (SNR) is defined as ρ.
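The channel model in (2) to (4) can be sketched in a few lines. The following is a minimal, dependency-free illustration rather than the authors' implementation; the function names and the toy values of the path gains and angles are hypothetical.

```python
import cmath
import math

def steering_vector(theta, n):
    """ULA steering vector of (3)/(4): half-wavelength spacing, unit norm."""
    return [cmath.exp(-1j * math.pi * k * math.sin(theta)) / math.sqrt(n)
            for k in range(n)]

def channel(gains, aoas, aods, m, n):
    """H = sum_l alpha_l a_RX(theta_RX,l) a_TX(theta_TX,l)^H, as in (2)."""
    h = [[0j] * n for _ in range(m)]
    for alpha, th_rx, th_tx in zip(gains, aoas, aods):
        a_rx = steering_vector(th_rx, m)
        a_tx = steering_vector(th_tx, n)
        for r in range(m):
            for c in range(n):
                # Outer product a_RX a_TX^H: conjugate the transmit side.
                h[r][c] += alpha * a_rx[r] * a_tx[c].conjugate()
    return h

# Hypothetical toy setup: M = 4, N = 3, L = 2 paths (illustrative values only).
H = channel([1.0 + 0.5j, -0.3j], [0.2, -0.7], [1.0, 0.1], m=4, n=3)
```

Because H is a sum of L rank-one terms, its rank is at most L, which is the structure that the sparse virtual channel representation later exploits.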

At the receiver, the real and imaginary parts of the received signal are quantized by 1 bit ADCs. The quantized received signal $\hat {\mathbf {Y}}$ is

$$\begin{array}{*{20}l} \hat{\mathbf{Y}}&=\mathrm{Q}(\mathbf{Y})\\ &=\mathrm{Q}(\sqrt{\rho}\mathbf{H}\mathbf{S}+\mathbf{N}) \end{array} $$

(5)

where Q(·) is the 1 bit quantization function, whose threshold is 0. Therefore, Q(Y) is

$$ \mathrm{Q}(\mathbf{Y})=\text{sign}(\text{Re}(\mathbf{Y}))+j\text{sign}(\text{Im}(\mathbf{Y})) $$

(6)

where sign(·) is the element-wise sign function. The goal is to estimate H by estimating $\{\alpha _{\ell }\}_{\ell =1}^{L}, \{\theta _{\text {RX}, \ell }\}_{\ell =1}^{L}$, and $\{\theta _{\text {TX}, \ell }\}_{\ell =1}^{L}$ from $\hat {\mathbf {Y}}$.
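A minimal sketch of the quantizer in (6) is given below. Mapping an exact zero to +1 is an assumption made only for this illustration, since the text does not specify it and the event has probability zero under AWGN.

```python
def sign(v):
    # Threshold at 0; sending an exact 0 to +1 is an assumption of this sketch.
    return 1.0 if v >= 0 else -1.0

def quantize_1bit(y):
    """Q(Y) of (6), applied element-wise to a matrix given as a list of rows."""
    return [[complex(sign(z.real), sign(z.imag)) for z in row] for row in y]

# Illustrative unquantized samples (not taken from the paper).
Y = [[0.3 - 2.0j, -1.5 + 0.1j], [-0.2 - 0.4j, 4.0 + 7.0j]]
Y_hat = quantize_1bit(Y)
```

Scaling Y by any positive constant leaves Y_hat unchanged, which is one way to see that the amplitude information is lost after 1 bit quantization.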

2.2 Virtual channel representation

In the mmWave channel model in (2), $\{\theta _{\text {RX}, \ell }\}_{\ell =1}^{L}$ and $\{\theta _{\text {TX}, \ell }\}_{\ell =1}^{L}$ are hidden in $\{\mathbf {a}_{\text {RX}}(\theta _{\text {RX}, \ell })\}_{\ell =1}^{L}$ and $\{\mathbf {a}_{\text {TX}}(\theta _{\text {TX}, \ell })\}_{\ell =1}^{L}$, respectively. The non-linear mapping of $\{\theta _{\text {RX}, \ell }\}_{\ell =1}^{L}$ and $\{\theta _{\text {TX}, \ell }\}_{\ell =1}^{L}$ to Y renders a non-linear channel estimation problem. To convert the non-linear channel estimation problem to a linear channel estimation problem, we adopt the virtual channel representation [22].

The virtual channel representation of H is

$$ \mathbf{H}\approx\mathbf{A}_{\text{RX}}\mathbf{X}^{*}\mathbf{A}_{\text{TX}}^{\mathrm{H}} $$

(7)

where the dictionary pair $\mathbf {A}_{\text {RX}}\in \mathbb {C}^{M\times B_{\text {RX}}}$ and $\mathbf {A}_{\text {TX}}\in \mathbb {C}^{N\times B_{\text {TX}}}$ is the collection of B_RX≥M steering vectors and B_TX≥N steering vectors, respectively. Therefore, A_RX and A_TX are

$$\begin{array}{*{20}l} \mathbf{A}_{\text{RX}}&=\left[\begin{array}{lll}\mathbf{a}_{\text{RX}}(\omega_{\text{RX}, 1})&\cdots&\mathbf{a}_{\text{RX}}(\omega_{\text{RX}, B_{\text{RX}}})\end{array}\right], \end{array} $$

(8)

$$\begin{array}{*{20}l} \mathbf{A}_{\text{TX}}&=\left[\begin{array}{lll}\mathbf{a}_{\text{TX}}(\omega_{\text{TX}, 1})&\cdots&\mathbf{a}_{\text{TX}}(\omega_{\text{TX}, B_{\text{TX}}})\end{array}\right], \end{array} $$

(9)

whose gridding AoAs $\{\omega _{\text {RX}, i}\}_{i=1}^{B_{\text {RX}}}$ and AoDs $\{\omega _{\text {TX}, j}\}_{j=1}^{B_{\text {TX}}}$ are selected so as to form overcomplete DFT matrices. The gridding AoAs and AoDs are B_RX and B_TX points from [−π/2,π/2], respectively, which discretize the AoAs and AoDs because the ground truth AoAs and AoDs are unknown. To make a dictionary pair of the overcomplete DFT matrix form, the gridding AoAs and AoDs are given as $\omega_{\text{RX}, i}=\sin^{-1}(-1+2(i-1)/B_{\text{RX}})$ and $\omega_{\text{TX}, j}=\sin^{-1}(-1+2(j-1)/B_{\text{TX}})$, respectively. We prefer overcomplete DFT matrices because they are relatively well-conditioned, and DFT matrices are friendly to the FFT-based implementation, which will be discussed in Section 4. The virtual channel $\mathbf {X}^{*}\in \mathbb {C}^{B_{\text {RX}}\times B_{\text {TX}}}$ is the collection of $\{\alpha _{\ell }\}_{\ell =1}^{L}$, whose (i,j)th element is α_ℓ whenever (ω_RX,i,ω_TX,j) is the nearest to (θ_RX,ℓ,θ_TX,ℓ) but 0 otherwise. In general, the error between H and $\mathbf {A}_{\text {RX}}\mathbf {X}^{*}\mathbf {A}_{\text {TX}}^{\mathrm {H}}$ is inversely proportional to B_RX and B_TX. To approximate H using (7) with negligible error, the dictionary pair must be dense, namely, B_RX≫M and B_TX≫N.
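One way to see why a dictionary on this grid is well-conditioned is that its rows stay orthogonal: the product of the dictionary with its conjugate transpose is a scaled identity. The sketch below checks this numerically for a small receive dictionary. The helper names and the sizes M = 4 and B_RX = 12 are hypothetical, and the sines of the gridding angles are used directly, since only sin(ω) enters the steering vectors.

```python
import cmath
import math

def grid_sines(b):
    """sin(omega_i) = -1 + 2(i-1)/b for i = 1, ..., b."""
    return [-1.0 + 2.0 * i / b for i in range(b)]

def dictionary(m, b):
    """m x b dictionary (list of rows) whose i-th column is a(omega_i)."""
    return [[cmath.exp(-1j * math.pi * k * s) / math.sqrt(m) for s in grid_sines(b)]
            for k in range(m)]

def gram_rows(a):
    """A A^H for a matrix given as a list of rows."""
    return [[sum(p * q.conjugate() for p, q in zip(r1, r2)) for r2 in a] for r1 in a]

# Hypothetical toy sizes for illustration.
M, B_RX = 4, 12
G = gram_rows(dictionary(M, B_RX))  # numerically close to (B_RX / M) * I_M
```

The columns, in contrast, become increasingly correlated as the grid gets denser, which is the coherence issue addressed in Section 4.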

Before we proceed, we provide a supplementary explanation on the approximation in (7). In this work, we attempt to estimate the L-sparse X^∗ in (7) because the sparse assumption on X^∗ is favorable when the goal is to formulate the channel estimation problem as a sparsity-constrained problem. The cost of assuming that X^∗ is L-sparse is, as evident, the approximation error shown in (7). Alternatively, the approximation error can be perfectly removed by considering X^∗ satisfying $\mathbf {H}=\mathbf {A}_{\text {RX}}\mathbf {X}^{*}\mathbf {A}_{\text {TX}}^{\mathrm {H}}$, i.e., equality instead of approximation. One well-known X^∗ satisfying the equality is the minimum Frobenius norm solution, i.e., $\mathbf {X}^{*}=\mathbf {A}_{\text {RX}}^{\mathrm {H}}(\mathbf {A}_{\text {RX}}\mathbf {A}_{\text {RX}}^{\mathrm {H}})^{-1}\mathbf {H}(\mathbf {A}_{\text {TX}}\mathbf {A}_{\text {TX}}^{\mathrm {H}})^{-1}\mathbf {A}_{\text {TX}}$. Such X^∗, however, has no evident structure to exploit in channel estimation, which is the reason why we assume that X^∗ is L-sparse at the cost of the approximation error in (7).

In practice, the arrays at the transmitter and receiver are typically large to compensate for the path loss in the mmWave band, whereas the number of line-of-sight (LOS) and near LOS paths is small [23]. Therefore, X^∗ is sparse when the dictionary pair is dense because only L elements among B_RXB_TX elements are non-zero where L≪MN≪B_RXB_TX. In the sequel, we use the shorthand notation B=B_RXB_TX.

To facilitate the channel estimation framework, we vectorize (1) and (5) in conjunction with (7). First, note that

$$ \mathbf{Y}=\sqrt{\rho}\mathbf{A}_{\text{RX}}\mathbf{X}^{*}\mathbf{A}_{\text{TX}}^{\mathrm{H}} \mathbf{S}+\mathbf{N}+\mathbf{E} $$

(10)

where the gridding error $\mathbf {E}\in \mathbb {C}^{M\times T}$ represents the mismatch in (7).^{Footnote 1} Then, the vectorized received signal $\mathbf {y}=\text {vec}(\mathbf {Y})\in \mathbb {C}^{MT}$ is

$$ \mathbf{y}=\sqrt{\rho}\mathbf{A}\mathbf{x}^{*}+\mathbf{n}+\mathbf{e} $$

(11)

where

$$\begin{array}{*{20}l} \mathbf{A}&=\mathbf{S}^{\mathrm{T}}\overline{\mathbf{A}}_{\text{TX}}\otimes\mathbf{A}_{\text{RX}}\\ &=\left[\begin{array}{llll}\mathbf{a}_{1}&\mathbf{a}_{2}&\cdots&\mathbf{a}_{B}\end{array}\right], \end{array} $$

(12)

$$\begin{array}{*{20}l} \mathbf{x}^{*}&=\text{vec}(\mathbf{X}^{*})\\ &=\left[\begin{array}{llll}x_{1}^{*}&x_{2}^{*}&\cdots&x_{B}^{*}\end{array}\right]^{\mathrm{T}}, \end{array} $$

(13)

$$\begin{array}{*{20}l} \mathbf{n}&=\text{vec}(\mathbf{N}), \end{array} $$

(14)

$$\begin{array}{*{20}l} \mathbf{e}&=\text{vec}(\mathbf{E}). \end{array} $$

(15)
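For completeness, the form of A in (12) follows from the standard identity $\text{vec}(\mathbf{A}\mathbf{B}\mathbf{C})=(\mathbf{C}^{\mathrm{T}}\otimes\mathbf{A})\text{vec}(\mathbf{B})$ applied to the signal term of (10):

$$\begin{aligned} \text{vec}(\mathbf{A}_{\text{RX}}\mathbf{X}^{*}\mathbf{A}_{\text{TX}}^{\mathrm{H}}\mathbf{S})&=\left((\mathbf{A}_{\text{TX}}^{\mathrm{H}}\mathbf{S})^{\mathrm{T}}\otimes\mathbf{A}_{\text{RX}}\right)\text{vec}(\mathbf{X}^{*})\\ &=\left(\mathbf{S}^{\mathrm{T}}\overline{\mathbf{A}}_{\text{TX}}\otimes\mathbf{A}_{\text{RX}}\right)\mathbf{x}^{*}. \end{aligned}$$

Here $\mathbf{S}^{\mathrm{T}}\overline{\mathbf{A}}_{\text{TX}}$ is T×B_TX and $\mathbf{A}_{\text{RX}}$ is M×B_RX, so A is MT×B, which matches the dimensions of y and x^∗.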

The vectorized quantized received signal $\hat {\mathbf {y}}=\text {vec}(\hat {\mathbf {Y}})\in \mathbb {C}^{MT}$ is

$$\begin{array}{*{20}l} \hat{\mathbf{y}}&=\mathrm{Q}(\mathbf{y})\\ &=\mathrm{Q}(\sqrt{\rho}\mathbf{A}\mathbf{x}^{*}+\mathbf{n}+\mathbf{e}). \end{array} $$

(16)

The goal is to estimate the L-sparse x^∗ from $\hat {\mathbf {y}}$.

3 Problem formulation

In this section, we formulate the channel estimation problem using the MAP criterion. To facilitate the MAP channel estimation framework, the real counterparts of the complex forms in (16) are introduced. Then, the likelihood function of x^∗ is derived.

The real counterparts are the collections of the real and imaginary parts of the complex forms. Therefore, the real counterparts $\hat {\mathbf {y}}_{\mathrm {R}}\in \mathbb {R}^{2MT}, \mathbf {A}_{\mathrm {R}}\in \mathbb {R}^{2MT\times 2B}$, and $\mathbf {x}_{\mathrm {R}}^{*}\in \mathbb {R}^{2B}$ are

$$\begin{array}{*{20}l} \hat{\mathbf{y}}_{\mathrm{R}}&=\left[\begin{array}{ll}\text{Re}(\hat{\mathbf{y}})^{\mathrm{T}}&\text{Im} (\hat{\mathbf{y}})^{\mathrm{T}}\end{array}\right]^{\mathrm{T}}\\ &=\left[\begin{array}{llll}\hat{y}_{\mathrm{R}, 1}&\hat{y}_{\mathrm{R}, 2}&\cdots&\hat{y}_{\mathrm{R}, 2MT}\end{array}\right]^{\mathrm{T}}, \end{array} $$

(17)

$$\begin{array}{*{20}l} \mathbf{A}_{\mathrm{R}}&=\left[\begin{array}{cc}\text{Re}(\mathbf{A})&-\text{Im}(\mathbf{A})\\ \text{Im}(\mathbf{A})&\text{Re}(\mathbf{A})\end{array}\right]\\ &=\left[\begin{array}{llll}\mathbf{a}_{\mathrm{R}, 1}&\mathbf{a}_{\mathrm{R}, 2}&\cdots&\mathbf{a}_{\mathrm{R}, 2MT}\end{array}\right]^{\mathrm{T}}, \end{array} $$

(18)

$$\begin{array}{*{20}l} \mathbf{x}_{\mathrm{R}}^{*}&=\left[\begin{array}{ll}\text{Re}(\mathbf{x}^{*}) ^{\mathrm{T}}&\text{Im}(\mathbf{x}^{*})^{\mathrm{T}}\end{array}\right]^{\mathrm{T}}\\ &=\left[\begin{array}{llll}x_{\mathrm{R}, 1}^{*}&x_{\mathrm{R}, 2}^{*}&\cdots&x_{\mathrm{R}, 2B}^{*}\end{array}\right]^{\mathrm{T}}, \end{array} $$

(19)

which are the collections of the real and imaginary parts of $\hat {\mathbf {y}}, \mathbf {A}$, and x^∗, respectively. In the sequel, we use the complex forms and the real counterparts interchangeably. For example, x^∗ and $\mathbf {x}_{\mathrm {R}}^{*}$ refer to the same entity.
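The real counterparts are constructed so that the real matrix-vector product reproduces the stacked real and imaginary parts of the complex product. The sketch below checks this on arbitrary toy values; the helper names and the numbers are hypothetical.

```python
def real_counterpart_matrix(a):
    """[[Re(A), -Im(A)], [Im(A), Re(A)]] as in (18); a is a list of rows."""
    top = [[z.real for z in row] + [-z.imag for z in row] for row in a]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in a]
    return top + bottom

def real_counterpart_vector(x):
    """[Re(x)^T, Im(x)^T]^T as in (17) and (19)."""
    return [z.real for z in x] + [z.imag for z in x]

def matvec(a, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(e * v for e, v in zip(row, x)) for row in a]

# Arbitrary toy values (not from the paper).
A = [[1 + 2j, -1j], [0.5 - 1j, 3 + 0j], [2j, 1 - 1j]]
x = [0.3 - 0.7j, -1.2 + 0.4j]
lhs = matvec(real_counterpart_matrix(A), real_counterpart_vector(x))  # A_R x_R
rhs = real_counterpart_vector(matvec(A, x))                           # real form of A x
```

The two vectors lhs and rhs agree up to rounding, which is why the complex forms and the real counterparts can be used interchangeably.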

Before we formulate the likelihood function of x^∗, note that e is hard to analyze. However, e is negligible when the dictionary pair is dense. Therefore, we formulate the likelihood function of x^∗ without e. The price of such oversimplification is negligible when B_RX≫M and B_TX≫N, which is to be shown in Section 5 where e≠0_MT. To derive the likelihood function of x^∗, note that

$$ \sqrt{\rho}\mathbf{A}\mathbf{x}^{*}+\mathbf{n}\sim\mathcal{C}\mathcal{N}(\sqrt{\rho}\mathbf{A}\mathbf{x}^{*}, \mathbf{I}_{MT}) $$

(20)

given x^∗. Then, from (20) in conjunction with (16), the log-likelihood function f(x) is [10]

$$\begin{array}{*{20}l} f(\mathbf{x})&=\log\text{Pr}\left[\begin{array}{cc}\hat{\mathbf{y}}=\mathrm{Q}(\sqrt{\rho}\mathbf{A} \mathbf{x}+\mathbf{n})\mid\mathbf{x}\end{array}\right]\\ &=\sum_{i=1}^{2MT}\log\Phi\left(\sqrt{2\rho}\hat{y}_{\mathrm{R}, i}\mathbf{a}_{\mathrm{R}, i}^{\mathrm{T}}\mathbf{x}_{\mathrm{R}}\right). \end{array} $$

(21)
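A short justification of the $\sqrt{2\rho}$ factor in (21), given as a sketch: the real and imaginary parts of each noise element are independent $\mathcal{N}(0, 1/2)$ random variables. Writing $n_{\mathrm{R}, i}$ for the $i$-th real noise component (a symbol introduced here only for illustration) and using $\hat{y}_{\mathrm{R}, i}\in\{-1, 1\}$,

$$\begin{aligned} \text{Pr}\left[\hat{y}_{\mathrm{R}, i}=\text{sign}\left(\sqrt{\rho}\mathbf{a}_{\mathrm{R}, i}^{\mathrm{T}}\mathbf{x}_{\mathrm{R}}+n_{\mathrm{R}, i}\right)\right]&=\text{Pr}\left[\sqrt{2}\hat{y}_{\mathrm{R}, i}n_{\mathrm{R}, i}\geq-\sqrt{2\rho}\hat{y}_{\mathrm{R}, i}\mathbf{a}_{\mathrm{R}, i}^{\mathrm{T}}\mathbf{x}_{\mathrm{R}}\right]\\ &=\Phi\left(\sqrt{2\rho}\hat{y}_{\mathrm{R}, i}\mathbf{a}_{\mathrm{R}, i}^{\mathrm{T}}\mathbf{x}_{\mathrm{R}}\right), \end{aligned}$$

because $\sqrt{2}\hat{y}_{\mathrm{R}, i}n_{\mathrm{R}, i}\sim\mathcal{N}(0, 1)$ and $1-\Phi(-z)=\Phi(z)$. The sum over $i$ in (21) then follows from the independence of the 2MT real noise components.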

If the distribution of x^∗ is known, the MAP estimate of x^∗ is

$$ \underset{\mathbf{x}\in\mathbb{C}^{B}}{\text{argmax}}\ (f(\mathbf{x})+g_{\text{MAP}}(\mathbf{x})) $$

(22)

where g_MAP(x) is the logarithm of the PDF of x^∗. In practice, however, g_MAP(x) is unknown. Therefore, we formulate the MAP channel estimation framework based on $\{\alpha _{\ell }\}_{\ell =1}^{L}, \{\theta _{\text {RX}, \ell }\}_{\ell =1}^{L}$, and $\{\theta _{\text {TX}, \ell }\}_{\ell =1}^{L}$ where we assume the following:

1. $\alpha _{\ell }\sim \mathcal {C}\mathcal {N}(0, 1)$ for all ℓ
2. θ_RX,ℓ∼unif([−π/2,π/2]) for all ℓ
3. θ_TX,ℓ∼unif([−π/2,π/2]) for all ℓ
4. $\{\alpha _{\ell }\}_{\ell =1}^{L}, \{\theta _{\text {RX}, \ell }\}_{\ell =1}^{L}$, and $\{\theta _{\text {TX}, \ell }\}_{\ell =1}^{L}$ are independent.

Then, the MAP estimate of x^∗ considering the channel sparsity is

$$ \underset{\mathbf{x}\in\mathbb{C}^{B}}{\text{argmax}}\ (f(\mathbf{x})+g(\mathbf{x}))\enspace\text{s.t.}\enspace\|\mathbf{x}\|_{0}\leq L $$

(23)

where g(x)=−∥x_R∥² is the logarithm of the PDF of $\mathcal {C}\mathcal {N}(\mathbf {0}_{B}, \mathbf {I}_{B})$ ignoring the constant factor. However, note that only the optimization problems (22) and (23) are equivalent in the sense that their solutions are the same, not g_MAP(x) and g(x). In the ML channel estimation framework, the ML estimate of x^∗ is

$$ \underset{\mathbf{x}\in\mathbb{C}^{B}}{\text{argmax}}\ f(\mathbf{x})\enspace\text{s.t.}\enspace\|\mathbf{x}\|_{0}\leq L. $$

(24)

In the sequel, we focus on solving (23) because (23) reduces to (24) when g(x)=0. In addition, we denote the objective function and the gradient in (23) as h(x) and ∇h(x), respectively. Therefore,

$$\begin{array}{*{20}l} h(\mathbf{x})&=f(\mathbf{x})+g(\mathbf{x}), \end{array} $$

(25)

$$\begin{array}{*{20}l} \nabla h(\mathbf{x})&=\nabla f(\mathbf{x})+\nabla g(\mathbf{x})\\ &=\left[\begin{array}{llll}\nabla h(x_{1})&\nabla h(x_{2})&\cdots&\nabla h(x_{B})\end{array}\right]^{\mathrm{T}} \end{array} $$

(26)

where the differentiation is with respect to x.
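As a rough numerical illustration, h and its gradient can be evaluated in real form using the standard identity that the derivative of log Φ(z) is ϕ(z)/Φ(z); the resulting expression is the one stated as (30) in Section 4.1. The sketch below is not the authors' implementation. The function names and toy data are hypothetical, the rows of `a_r` are arbitrary real vectors standing in for the rows of A_R, and log Φ is computed naively, so strongly negative arguments would need a more careful evaluation.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def inverse_mills(z):
    """phi(z) / Phi(z); naive evaluation, adequate for moderate z only."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) / normal_cdf(z)

def objective(x_r, a_r, y_r, rho):
    """h(x_R) = sum_i log Phi(sqrt(2 rho) y_i a_i^T x_R) - ||x_R||^2."""
    c = math.sqrt(2.0 * rho)
    f = sum(math.log(normal_cdf(c * y * sum(a * x for a, x in zip(row, x_r))))
            for row, y in zip(a_r, y_r))
    return f - sum(x * x for x in x_r)

def gradient(x_r, a_r, y_r, rho):
    """Gradient of h with respect to x_R."""
    c = math.sqrt(2.0 * rho)
    g = [-2.0 * x for x in x_r]  # gradient of -||x_R||^2
    for row, y in zip(a_r, y_r):
        w = inverse_mills(c * y * sum(a * x for a, x in zip(row, x_r))) * c * y
        for k, a in enumerate(row):
            g[k] += w * a
    return g

# Arbitrary toy data (illustrative only).
A_R = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.7], [-0.4, 0.8, 1.1], [0.9, -0.2, 0.6]]
y_R = [1.0, -1.0, -1.0, 1.0]
x_R = [0.2, -0.1, 0.4]
rho = 2.0
grad = gradient(x_R, A_R, y_R, rho)
```

A central finite difference of `objective` around `x_R` is a convenient sanity check for `gradient`.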

4 Channel estimation via gradient pursuit

In this section, we propose the BMSGraSP and BMSGraHTP algorithms to solve (23), which are the variants of the GraSP [17] and GraHTP [18] algorithms, respectively. Then, an FFT-based fast implementation is proposed. In addition, we investigate the limit of the BMSGraSP and BMSGraHTP algorithms in the high SNR regime with 1 bit ADCs.

4.1 Proposed BMSGraSP and BMSGraHTP algorithms

Note that h(x) in (23) is concave because f(x) and g(x) are the sums of the logarithms of Φ(·) and ϕ(·), respectively, which are log-concave [24]. However, (23) is not a convex optimization problem because the sparsity constraint is not convex. Furthermore, solving (23) is NP-hard because of its combinatorial complexity. To approximately optimize convex objective functions with sparsity constraints iteratively by pursuing the gradient of the objective function, the GraSP and GraHTP algorithms were proposed in [17] and [18], respectively.

To solve (23), the GraSP and GraHTP algorithms roughly proceed as follows at each iteration when x is the current estimate of x^∗ where the iteration index is omitted for simplicity. First, the best L-term approximation of ∇h(x) is computed, which is

$$ T_{L}(\nabla h(\mathbf{x}))=\nabla h(\mathbf{x})|_{L} $$

(27)

where T_L(·) is the L-term hard thresholding function. Here, T_L(·) leaves only the L largest elements (in absolute value) of ∇h(x), and sets all the other remaining elements to 0. Then, after the estimate of supp(x^∗) is updated by selecting

$$ \mathcal{I}=\text{supp}(T_{L}(\nabla h(\mathbf{x}))), $$

(28)

i.e., $\mathcal {I}$ is the set of indices formed by collecting the L indices of ∇h(x) corresponding to its L largest elements (in absolute value), the estimate of x^∗ is updated by solving the following optimization problem

$$ \underset{\mathbf{x}\in\mathbb{C}^{B}}{\text{argmax}}\ h(\mathbf{x})\enspace\text{s.t.}\enspace\text{supp}(\mathbf{x})\subseteq\mathcal{I}, $$

(29)

which can be solved using convex optimization because the support constraint is convex [24]. The GraSP and GraHTP algorithms are the generalizations of the CoSaMP [19] and HTP [20] algorithms, respectively. This follows because the gradient of the squared error is the scaled proxy of the residual.
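The hard thresholding step in (27) and the support selection in (28) can be sketched as follows. The function name and the toy gradient are hypothetical, and indices are 0-based in the code.

```python
def hard_threshold(v, s):
    """T_s(v): keep the s largest-magnitude entries of v and zero the rest.

    Returns the thresholded vector and the sorted list of kept indices.
    """
    keep = sorted(range(len(v)), key=lambda k: abs(v[k]), reverse=True)[:s]
    support = set(keep)
    return [v[k] if k in support else 0 for k in range(len(v))], sorted(keep)

# Toy complex gradient (illustrative values only).
grad = [0.1 + 0.2j, -3.0j, 0.5, 2.0 - 2.0j, -0.05]
thresholded, support = hard_threshold(grad, 2)
```

Here the two entries with the largest magnitudes are kept, and their indices play the role of the index set in (28).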

To solve (23) using the GraSP and GraHTP algorithms, h(x) is required either to have a stable restricted Hessian [17] or to be strongly convex and smooth [18]. These conditions are the generalizations of the restricted isometry property (RIP) in CS [25], which means that h(x) is likely to satisfy these conditions when A is either a restricted isometry, well-conditioned, or incoherent. In practice, however, A is highly coherent because the dictionary pair is typically dense to reduce the mismatch in (7).

To illustrate how the GraSP and GraHTP algorithms fail to solve (23) when A is highly coherent, consider the real counterpart of ∇h(x). The real counterpart $\nabla h(\mathbf {x}_{\mathrm {R}})\in \mathbb {R}^{2B}$ is

$$\begin{array}{*{20}l} &\nabla h(\mathbf{x}_{\mathrm{R}})\\ =&\left[\begin{array}{ll}\text{Re}(\nabla h(\mathbf{x}))^{\mathrm{T}}&\text{Im}(\nabla h(\mathbf{x}))^{\mathrm{T}}\end{array}\right]^{\mathrm{T}}\\ =&\sum_{i=1}^{2MT}\lambda\left(\sqrt{2\rho}\hat{y}_{\mathrm{R}, i}\mathbf{a}_{\mathrm{R}, i}^{\mathrm{T}}\mathbf{x}_{\mathrm{R}}\right)\sqrt{2\rho}\hat{y}_{\mathrm{R}, i}\mathbf{a}_{\mathrm{R}, i}-2\mathbf{x}_{\mathrm{R}}, \end{array} $$

(30)

which follows from $\nabla \log \Phi (\mathbf {a}_{\mathrm {R}}^{\mathrm {T}}\mathbf {x}_{\mathrm {R}})=\lambda (\mathbf {a}_{\mathrm {R}}^{\mathrm {T}}\mathbf {x}_{\mathrm {R}})\mathbf {a}_{\mathrm {R}}$ and ∇∥x_R∥²=2x_R where λ(·)=ϕ(·)⊘Φ(·) is the inverse Mills ratio function^{Footnote 2}. Then, the following observation holds from directly computing ∇h(x_i), whose real and imaginary parts are the i-th and (i+B)-th elements of ∇h(x_R), respectively.

Observation 1

∇h(x_i)=∇h(x_j) if a_i=a_j and x_i=x_j.

However, Observation 1 is meaningless because a_i≠a_j unless i=j. To establish a meaningful observation, consider the coherence between a_i and a_j, which reflects the proximity between a_i and a_j according to [26,27]

$$ \mu(i, j)=\frac{|\mathbf{a}_{i}^{\mathrm{H}}\mathbf{a}_{j}|}{\|\mathbf{a}_{i}\|\|\mathbf{a}_{j}\|}. $$

(31)

Then, using the η-coherence band, which is [26]

$$ B_{\eta}(i)=\{j\mid\mu(i, j)\geq\eta\} $$

(32)

where η∈(0,1), we establish the following conjecture when η is sufficiently large.

Conjecture 1

∇h(x_i)≈∇h(x_j) if j∈B_η(i) and x_i=x_j.
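The coherence in (31) and the η-coherence band in (32) can be computed directly from the columns of a matrix. In the sketch below, steering vectors on a four times overcomplete grid serve as a hypothetical stand-in for the columns of A; the function names, the sizes, and the value η = 0.85 are illustrative assumptions, and indices are 0-based.

```python
import cmath
import math

def coherence(u, v):
    """mu of (31): |u^H v| / (||u|| ||v||)."""
    inner = sum(p.conjugate() * q for p, q in zip(u, v))
    norm_u = math.sqrt(sum(abs(p) ** 2 for p in u))
    norm_v = math.sqrt(sum(abs(q) ** 2 for q in v))
    return abs(inner) / (norm_u * norm_v)

def coherence_band(columns, i, eta):
    """B_eta(i) of (32) as a set of 0-based column indices."""
    return {j for j, col in enumerate(columns) if coherence(columns[i], col) >= eta}

# Hypothetical stand-in for the columns of A: an 8-element array on a 32-point grid.
M, B = 8, 32
columns = [[cmath.exp(-1j * math.pi * k * (-1.0 + 2.0 * b / B)) / math.sqrt(M)
            for k in range(M)] for b in range(B)]
band = coherence_band(columns, 10, eta=0.85)
```

In this toy setting the band of an index contains the index itself and its immediate grid neighbours, which illustrates why neighbouring gradient elements are hard to tell apart when the grid is dense.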

At this point, we use Conjecture 1 to illustrate how the GraSP and GraHTP algorithms fail to estimate supp(x^∗) from (28) by naive hard thresholding when A is highly coherent. To proceed, consider the following example, which assumes that x^∗ and $\hat {\mathbf {Y}}$ are realized with x representing the current estimate of x^∗ so as to satisfy

1) $i=\underset {k\in \{1, 2, \dots, B\}}{\text {argmax}}\ |x_{k}^{*}|$
2) $i=\underset {k\in \{1, 2, \dots, B\}}{\text {argmax}}\ |\nabla h(x_{k})|$
3) $\mathcal {J}\cap \text {supp}(\mathbf {x}^{*})=\emptyset $

where i is the index corresponding to the largest element of the ground truth^{Footnote 3} virtual channel x^∗, and

$$ \mathcal{J}=\{j\mid j\in B_{\eta}(i), x_{i}=x_{j}\}\setminus\{i\} $$

(33)

is the by-product of i. Here, $\mathcal {J}$ is called the by-product of i because

$$\begin{array}{*{20}l} |\nabla h(x_{j})|&\approx|\nabla h(x_{i})|\\ &=\underset{k\in\{1, 2, \dots, B\}}{\text{max}}\ |\nabla h(x_{k})|, \end{array} $$

(34)

which follows from Conjecture 1, holds despite $x_{j}^{*}=0$ for all $j\in \mathcal {J}$. In other words, the by-product of i refers to the fact that ∇h(x_i) and ∇h(x_j) are indistinguishable for all $j\in \mathcal {J}$ according to (34), but the elements of x^∗ indexed by $\mathcal {J}$ are 0 according to 3). Therefore, when we attempt to estimate supp(x^∗) by hard thresholding ∇h(x), the indices in $\mathcal {J}$ will likely be erroneously selected as the estimate of supp(x^∗) because ∇h(x_j) and the maximum element of ∇h(x), which is ∇h(x_i) according to 2), are indistinguishable for all $j\in \mathcal {J}$.

To illustrate how (28) cannot estimate supp(x^∗) when A is highly coherent, consider another example where ∇h(x) and T_L(∇h(x)) are shown in Figs. 2 and 3, respectively. In this example, supp(x^∗) is widely spread, whereas most of the indices in supp(T_L(∇h(x))) lie in the coherence band of the index of the maximum element of ∇h(x). This shows that hard thresholding ∇h(x) is not sufficient to distinguish whether an index is the ground truth index or the by-product of another index. To solve this problem, we propose the BMS hard thresholding technique.

The BMS hard thresholding function T_BMS,L(·) is an L-term hard thresholding function, which is proposed based on Conjecture 1. The BMS hard thresholding technique is presented in Algorithm 1. Line 3 selects the index of the maximum element of ∇h(x) from the unchecked index set as the current index. Line 4 constructs the by-product testing set. Line 5 checks whether the magnitude of the element at the current index is greater than that of every element indexed by the by-product testing set. In this paper, Line 5 is referred to as the band maximum criterion. If the band maximum criterion is satisfied, the current index is selected as the estimate of supp(x^∗) in Line 6. Otherwise, the current index is not selected as the estimate of supp(x^∗) because the current index is likely to be the by-product of another index rather than the ground truth index. Line 8 updates the unchecked index set.
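The steps described above can be sketched as follows. This is a reading of the prose description, not a transcription of Algorithm 1: the by-product testing set is assumed here to be the whole coherence band of the current index minus the index itself, the bands are assumed to be precomputed, and the loop also stops when no unchecked index remains. All names and toy values are hypothetical, and indices are 0-based.

```python
def bms_hard_threshold(v, s, bands):
    """BMS-style s-term hard thresholding of v.

    bands[i] is an assumed precomputed coherence band of index i (a set of indices).
    Returns the thresholded vector and the sorted list of selected indices.
    """
    selected = []
    unchecked = set(range(len(v)))
    while len(selected) < s and unchecked:
        # Current index: largest-magnitude element among the unchecked indices.
        i = max(sorted(unchecked), key=lambda k: abs(v[k]))
        # By-product testing set (assumed form: the band of i without i itself).
        testing = bands[i] - {i}
        # Band maximum criterion: i must dominate every index in its testing set.
        if all(abs(v[i]) > abs(v[j]) for j in testing):
            selected.append(i)
        unchecked.discard(i)
    support = set(selected)
    return [v[k] if k in support else 0 for k in range(len(v))], sorted(selected)

# Toy gradient with one strong peak (index 2), its by-products (1 and 3),
# and a weaker isolated peak (index 6). Bands are nearest neighbours only.
v = [0.1, -0.9, 1.0, 0.8, 0.1, -0.1, 0.6, 0.1]
bands = [{j for j in (i - 1, i, i + 1) if 0 <= j < len(v)} for i in range(len(v))]
thresholded, support = bms_hard_threshold(v, 2, bands)
```

In this toy example, naive 2-term hard thresholding would keep indices 1 and 2, whereas the band maximum criterion rejects indices 1 and 3 as by-products of index 2 and keeps the isolated peak at index 6 instead.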

Note that Algorithm 1 is a hard thresholding technique applied to ∇h(x). If the BMS hard thresholding technique is applied to x+κ∇h(x) where κ is the step size, ∇h(x) is replaced by x+κ∇h(x) in the input, output, and Lines 3, 5, and 10 of Algorithm 1. This can be derived using the same logic based on Conjecture 1. Now, we propose the BMSGraSP and BMSGraHTP algorithms to solve (23).

The BMSGraSP and BMSGraHTP algorithms are variants of the GraSP and GraHTP algorithms, respectively. The difference between the BMS-based and non-BMS-based algorithms is that the hard thresholding function is T_BMS,L(·) instead of T_L(·). The BMSGraSP and BMSGraHTP algorithms are presented in Algorithms 2 and 3, respectively. Lines 3, 4, and 5 of Algorithms 2 and 3 roughly proceed based on the same logic. Line 3 computes the gradient of the objective function. Line 4 selects $\mathcal {I}$ from the support of the hard thresholded gradient of the objective function. Line 5 maximizes the objective function subject to the support constraint. This can be solved using convex optimization because the objective function and support constraint are concave and convex, respectively. In addition, b is hard thresholded in Line 6 of Algorithm 2 because b is at most 3L-sparse. A natural halting condition of Algorithms 2 and 3 is to halt when the current and previous $\text {supp}(\tilde {\mathbf {x}})$ are the same. Readers interested in a more in-depth analysis of the GraSP and GraHTP algorithms are referred to [17] and [18], respectively.
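The overall structure of a GraSP-type iteration can be sketched as below. Several assumptions are made for illustration only: the function names are hypothetical, the support rule is the plain top-k thresholding (a BMS rule would be passed in through `support_of`), the merged support is assumed to be the 2L-term gradient support united with the current support, and the objective is the stand-in concave function h(x) = -||y - Ax||², not the one-bit log-likelihood of the paper, so that the restricted maximization reduces to least squares.

```python
import numpy as np


def top_k_support(v, k):
    """Support of the plain k-term hard thresholding T_k(v)."""
    return set(np.argsort(-np.abs(v))[:k].tolist())


def grasp(grad_h, argmax_on_support, B, L, support_of=top_k_support, max_iter=50):
    """GraSP-style loop; `support_of` is where a BMS rule would be plugged in."""
    x = np.zeros(B)
    prev = None
    for _ in range(max_iter):
        g = grad_h(x)                                    # gradient of the objective
        merged = support_of(g, 2 * L) | set(np.flatnonzero(x).tolist())
        b = argmax_on_support(sorted(merged))            # maximize h on the merged support
        keep = sorted(top_k_support(b, L))               # hard threshold b to L terms
        x = np.zeros(B)
        x[keep] = b[keep]
        if keep == prev:                                 # halt when the support repeats
            break
        prev = keep
    return x


# Stand-in concave objective h(x) = -||y - A x||^2 (an assumption for this sketch).
rng = np.random.default_rng(0)
m, B, L = 80, 100, 3
A = rng.standard_normal((m, B)) / np.sqrt(m)
x_true = np.zeros(B)
x_true[[7, 42, 90]] = [1.0, -1.0, 1.0]
y = A @ x_true


def grad_h(x):
    return 2.0 * A.T @ (y - A @ x)


def argmax_on_support(idx):
    b = np.zeros(B)
    b[idx] = np.linalg.lstsq(A[:, idx], y, rcond=None)[0]
    return b


x_hat = grasp(grad_h, argmax_on_support, B, L)
```

The loop is written so that the hard thresholding rule and the restricted solver are interchangeable, which mirrors how the BMS-based variants differ from GraSP only in the thresholding function.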

Remark 1

Instead of hard thresholding b, we can solve

$$ \tilde{\mathbf{x}}=\underset{\mathbf{x}\in\mathbb{C}^{B}}{\text{argmax}}\ h(\mathbf{x})\enspace\text{s.t.}\enspace\text{supp}(\mathbf{x})\subseteq\text{supp}(T_{L}(\mathbf{b})), $$

(35)

which is a convex optimization problem, to obtain $\tilde {\mathbf {x}}$ in Line 6 of Algorithm 2. This is the debiasing variant of Algorithm 2 [17]. The advantage of the debiasing variant of Algorithm 2 is a more accurate estimate of x^∗. However, the complexity is increased, which is incurred by solving (35).

Remark 2

In this paper, we assume that only h(x) and ∇h(x) are required at each iteration to solve (23) using Algorithms 2 and 3, which can be accomplished when the first order method is used to solve convex optimization problems in Line 5 of Algorithms 2 and 3. An example of such first order method is the gradient descent method with the backtracking line search [24].
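A first order method of the kind mentioned in Remark 2 can be sketched as follows. This is a generic gradient ascent with backtracking (Armijo) line search for a concave objective; the function name, the parameters `alpha` and `beta`, and the toy objective are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np


def gradient_ascent_backtracking(h, grad_h, x0, alpha=0.3, beta=0.5,
                                 tol=1e-8, max_iter=1000):
    """Maximize a concave, differentiable h using only h(x) and grad_h(x)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        g = grad_h(x)
        if np.linalg.norm(g) < tol:
            break
        t = 1.0
        # Shrink the step until the sufficient-increase condition holds.
        while h(x + t * g) < h(x) + alpha * t * (g @ g):
            t *= beta
        x = x + t * g
    return x


# Toy concave objective (an assumption for illustration): h(x) = -||x - c||^2.
c = np.array([1.0, -2.0, 0.5])
x_star = gradient_ascent_backtracking(lambda x: -np.sum((x - c) ** 2),
                                      lambda x: -2.0 * (x - c),
                                      np.zeros(3))
```

Only evaluations of h and ∇h appear inside the loop, which is the property Remark 2 relies on.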

4.2 Fast implementation via FFT

In practice, the complexity of Algorithms 2 and 3 is demanding because h(x) and ∇h(x) are required at each iteration, which are high-dimensional functions defined on $\mathbb {C}^{B}$ where B≫MN. In recent works on channel estimation and data detection in the mmWave band [14,15,28], the FFT-based implementation is widely used because H can be approximated by (7) using overcomplete DFT matrices. In this paper, an FFT-based fast implementation of h(x) and ∇h(x) is proposed, which is motivated by [14,15,28].

To facilitate the analysis, we convert the summations in h(x) and ∇h(x_R) to matrix-vector multiplications by algebraically manipulating (21) and (30). Then, we obtain

$$\begin{array}{*{20}l} &h(\mathbf{x})\\ =&\text{sum}(\log\Phi(\sqrt{2\rho}\hat{\mathbf{y}}_{\mathrm{R}}\odot\mathbf{A}_{\mathrm{R}}\mathbf{x}_{\mathrm{R}}))-\|\mathbf{x}_{\mathrm{R}}\|^{2}, \end{array} $$

(36)

$$\begin{array}{*{20}l} &\nabla h(\mathbf{x}_{\mathrm{R}})\\ =&\mathbf{A}_{\mathrm{R}}^{\mathrm{T}}(\lambda(\sqrt{2\rho}\hat{\mathbf{y}}_{\mathrm{R}}\odot\mathbf{A}_{\mathrm{R}}\mathbf{x}_{\mathrm{R}})\odot\sqrt{2\rho}\hat{\mathbf{y}}_{\mathrm{R}})-2\mathbf{x}_{\mathrm{R}} \end{array} $$

(37)

where we see that the bottlenecks of h(x) and ∇h(x) come from the matrix-vector multiplications involving A_R and $\mathbf {A}_{\mathrm {R}}^{\mathrm {T}}$ resulting from the large size of A. For example, the size of A is 5120×65536 in Section 5 where M=N=64,B_RX=B_TX=256, and T=80.
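As a sanity check of (36) and (37), the two expressions can be evaluated with dense matrices on a small instance and the gradient compared against finite differences. The sketch below assumes the real form A_R = [Re A, −Im A; Im A, Re A] with x_R = [Re x; Im x], uses a plain (not numerically hardened) formula for log Φ, and picks arbitrary sizes, seed, and ρ; all function names are hypothetical.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)


def log_Phi(z):
    """log of the standard normal CDF (plain formula, not tuned for extreme z)."""
    return np.log(0.5 * _erfc(-z / math.sqrt(2.0)))


def inv_mills(z):
    """Inverse Mills ratio lambda(z) = phi(z) / Phi(z), applied element-wise."""
    return np.exp(-0.5 * z ** 2 - 0.5 * math.log(2.0 * math.pi) - log_Phi(z))


def real_form(A):
    """Real matrix A_R with A_R [Re x; Im x] = [Re(Ax); Im(Ax)]."""
    return np.block([[A.real, -A.imag], [A.imag, A.real]])


def h(x_R, A_R, y_R, rho):
    z = math.sqrt(2.0 * rho) * y_R * (A_R @ x_R)
    return np.sum(log_Phi(z)) - x_R @ x_R                          # (36)


def grad_h(x_R, A_R, y_R, rho):
    s = math.sqrt(2.0 * rho) * y_R
    return A_R.T @ (inv_mills(s * (A_R @ x_R)) * s) - 2.0 * x_R    # (37)


# Small random instance (sizes, seed, and rho are arbitrary assumptions).
rng = np.random.default_rng(1)
A = (rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))) / np.sqrt(2)
A_R = real_form(A)
y_R = np.sign(rng.standard_normal(12))        # one-bit measurements in {-1, +1}
x_R = 0.3 * rng.standard_normal(8)
rho = 2.0

# Central finite differences of h, one coordinate at a time.
eps = 1e-6
fd = np.array([(h(x_R + eps * e, A_R, y_R, rho) - h(x_R - eps * e, A_R, y_R, rho))
               / (2 * eps) for e in np.eye(8)])
```

Both functions spend their time in the products with A_R and its transpose, which is the bottleneck the FFT-based implementation targets.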

To develop an FFT-based fast implementation of the matrix-vector multiplications involving A_R and $\mathbf {A}_{\mathrm {R}}^{\mathrm {T}}$, define $\mathbf {c}_{\mathrm {R}}\in \mathbb {R}^{2MT}$ as $\mathbf {c}_{\mathrm {R}}=\lambda (\sqrt {2\rho }\hat {\mathbf {y}}_{\mathrm {R}}\odot \mathbf {A}_{\mathrm {R}}\mathbf {x}_{\mathrm {R}})\odot \sqrt {2\rho }\hat {\mathbf {y}}_{\mathrm {R}}$ from (37) with $\mathbf {c}\in \mathbb {C}^{MT}$ being the complex form of c_R. From the fact that

$$\begin{array}{*{20}l} \mathbf{A}_{\mathrm{R}}\mathbf{x}_{\mathrm{R}}&=\left[\begin{array}{ll}\text{Re}(\mathbf{A}\mathbf{x})^{\mathrm{T}} &\text{Im}(\mathbf{A}\mathbf{x})^{\mathrm{T}}\end{array}\right]^{\mathrm{T}}, \end{array} $$

(38)

$$\begin{array}{*{20}l} \mathbf{A}_{\mathrm{R}}^{\mathrm{T}}\mathbf{c}_{\mathrm{R}}&=\left[\begin{array}{ll}\text{Re}(\mathbf{A} ^{\mathrm{H}}\mathbf{c})^{\mathrm{T}}&\text{Im}(\mathbf{A}^{\mathrm{H}}\mathbf{c})^{\mathrm{T}} \end{array}\right]^{\mathrm{T}}, \end{array} $$

(39)

we now attempt to compute Ax and A^Hc via the FFT. Then, Ax and A^Hc are unvectorized according to

$$\begin{array}{*{20}l} \text{unvec}(\mathbf{A}\mathbf{x})&=\mathbf{A}_{\text{RX}}\mathbf{X}\mathbf{A}_{\text{TX}}^{\mathrm{H}}\mathbf{S}\\ &=\underbrace{\mathbf{A}_{\text{RX}}(\underbrace{\mathbf{S}^{\mathrm{H}}(\underbrace{\mathbf{A}_{\text{TX}}\mathbf{X}^{\mathrm{H}}}_{\text{FFT}})}_{\text{IFFT}})^{\mathrm{H}}}_{\text{FFT}}, \end{array} $$

(40)

$$\begin{array}{*{20}l} \text{unvec}(\mathbf{A}^{\mathrm{H}}\mathbf{c})&=\mathbf{A}_{\text{RX}}^{\mathrm{H}}\mathbf{C}\mathbf{S}^{\mathrm{H}}\mathbf{A}_{\text{TX}}\\ &=\underbrace{\mathbf{A}_{\text{RX}}^{\mathrm{H}}(\underbrace{\mathbf{A}_{\text{TX}}^{\mathrm{H}}(\underbrace{\mathbf{S}\mathbf{C}^{\mathrm{H}}}_{\text{FFT}})}_{\text{IFFT}})^{\mathrm{H}}}_{\text{IFFT}} \end{array} $$

(41)

where $\mathbf {X}=\text {unvec}(\mathbf {x})\in \mathbb {C}^{B_{\text {RX}}\times B_{\text {TX}}}$ and $\mathbf {C}=\text {unvec}(\mathbf {c})\in \mathbb {C}^{M\times T}$. If the matrix multiplication involving S can be implemented using the FFT, e.g., Zadoff-Chu (ZC) [29] or DFT [11] training sequence, (40) and (41) can be implemented using the FFT because A_RX and A_TX are overcomplete DFT matrices. For example, each column of A_TXX^H in (40) can be computed using the B_TX-point FFT with pruned outputs, i.e., retaining only N outputs, because we constructed A_TX as an overcomplete DFT matrix.

In particular, the matrix multiplications involving A_TX,S^H, and A_RX in (40) can be implemented with B_TX-point FFT with pruned outputs repeated B_RX times, T-point IFFT with pruned inputs repeated B_RX times, and B_RX-point FFT with pruned outputs repeated T times, respectively.^{Footnote 4} Using the same logic, the matrix multiplications involving $\mathbf {S}, \mathbf {A}_{\text {TX}}^{\mathrm {H}}$, and $\mathbf {A}_{\text {RX}}^{\mathrm {H}}$ in (41) can be implemented using T-point FFT with pruned outputs repeated M times, B_TX-point IFFT with pruned inputs repeated M times, and B_RX-point IFFT with pruned inputs repeated B_TX times, respectively. Therefore, the complexity of the FFT-based implementation of (40) and (41) is O(B_RXB_TX logB_TX+B_RXT logT+TB_RX logB_RX) and O(MT logT+MB_TX logB_TX+B_TXB_RX logB_RX), respectively.
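The bracketing in (40) and (41) can be illustrated with NumPy as follows. The sketch makes several assumptions: the overcomplete DFT matrices are taken as [A]_{m,b} = exp(−j2πmb/B)/√M, which may differ from the paper's grid convention; the products with S are left as dense multiplications rather than FFTs; and `np.fft.fft` and `np.fft.ifft` compute full transforms that are then truncated or zero-padded, so this is not a true pruned FFT. All function names are hypothetical.

```python
import numpy as np


def overcomplete_dft(M, B):
    """M x B overcomplete DFT matrix, [A]_{m,b} = exp(-j 2 pi m b / B) / sqrt(M)."""
    m = np.arange(M)[:, None]
    b = np.arange(B)[None, :]
    return np.exp(-2j * np.pi * m * b / B) / np.sqrt(M)


def dft_apply(V, M):
    """A @ V for A = overcomplete_dft(M, B): B-point FFT, keep the first M outputs."""
    return np.fft.fft(V, axis=0)[:M] / np.sqrt(M)


def dft_adjoint(U, B):
    """A^H @ U for A = overcomplete_dft(M, B): zero-padded B-point IFFT."""
    M = U.shape[0]
    return np.fft.ifft(U, n=B, axis=0) * B / np.sqrt(M)


def forward(X, S, M, N):
    """unvec(Ax) = A_RX X A_TX^H S, following the bracketing in (40)."""
    P = dft_apply(X.conj().T, N)           # A_TX X^H,         N x B_RX
    Q = S.conj().T @ P                     # S^H (A_TX X^H),   T x B_RX
    return dft_apply(Q.conj().T, M)        # A_RX (...)^H,     M x T


def adjoint(C, S, B_RX, B_TX):
    """unvec(A^H c) = A_RX^H C S^H A_TX, following the bracketing in (41)."""
    P = S @ C.conj().T                     # S C^H,            N x M
    Q = dft_adjoint(P, B_TX)               # A_TX^H (S C^H),   B_TX x M
    return dft_adjoint(Q.conj().T, B_RX)   # A_RX^H (...)^H,   B_RX x B_TX


# Small random instance for comparison against dense products (arbitrary sizes).
rng = np.random.default_rng(2)
M, N, T, B_RX, B_TX = 4, 4, 6, 16, 16
A_RX, A_TX = overcomplete_dft(M, B_RX), overcomplete_dft(N, B_TX)
S = rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T))
X = rng.standard_normal((B_RX, B_TX)) + 1j * rng.standard_normal((B_RX, B_TX))
C = rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))
Y_fast = forward(X, S, M, N)
Z_fast = adjoint(C, S, B_RX, B_TX)
```

Under the assumed grid, `Y_fast` and `Z_fast` agree with the dense products A_RX X A_TX^H S and A_RX^H C S^H A_TX up to floating-point error.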

To illustrate the efficiency of the FFT-based implementation of (40) and (41), M/N,M/B_RX,M/B_TX, and M/T are assumed to be fixed. Then, the complexity of the FFT-based implementation of Ax and A^Hc is O(M² logM), whereas the complexity of directly computing Ax and A^Hc is O(M⁴). Therefore, the complexity of Algorithms 2 and 3 is reduced when h(x) and ∇h(x) are implemented using the FFT operations.
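To give a feel for the orders of magnitude, the operation counts above can be evaluated for the simulation parameters of Section 5. The figures below drop all constant factors and treat n log₂ n as a proxy for the cost of an n-point FFT, so they are rough indicators under that assumption rather than exact multiplication counts.

```python
import math

M = N = 64
B_RX = B_TX = 256
T = 80

# Dense product with A, whose size is MT x (B_RX * B_TX).
direct = (M * T) * (B_RX * B_TX)

# Proxies for the three FFT stages of (40) and (41), constants omitted.
fft_forward = (B_RX * B_TX * math.log2(B_TX) + B_RX * T * math.log2(T)
               + T * B_RX * math.log2(B_RX))
fft_adjoint = (M * T * math.log2(T) + M * B_TX * math.log2(B_TX)
               + B_TX * B_RX * math.log2(B_RX))

print(direct, round(fft_forward), round(fft_adjoint))
```

The dense count is 5120 × 65536 = 335,544,320, while both FFT proxies stay below 10⁶ for these parameters.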

Remark 3

Line 5 of Algorithms 2 and 3 is equivalent to solving

$$ \underset{\mathbf{x}_{\mathcal{I}}\in\mathbb{C}^{|\mathcal{I}|}}{\text{argmax}}\ h_{\mathcal{I}}(\mathbf{x}_{\mathcal{I}})=\underset{\mathbf{x}_{\mathcal{I}}\in\mathbb{C}^{|\mathcal{I}|}}{\text{argmax}}\ (f_{\mathcal{I}}(\mathbf{x}_{\mathcal{I}})+g_{\mathcal{I}}(\mathbf{x}_{\mathcal{I}})) $$

(42)

where

$$\begin{array}{*{20}l} f_{\mathcal{I}}(\mathbf{x}_{\mathcal{I}})&=\log\text{Pr}\left[\begin{array}{ll}\hat{\mathbf{y}}=\mathrm{Q}(\sqrt{\rho} \mathbf{A}_{\mathcal{I}}\mathbf{x}_{\mathcal{I}}+\mathbf{n})\mid\mathbf{x}_{\mathcal{I}}\end{array}\right], \end{array} $$

(43)

$$\begin{array}{*{20}l} g_{\mathcal{I}}(\mathbf{x}_{\mathcal{I}})&=-\|\mathbf{x}_{\mathcal{I}}\|^{2}, \end{array} $$

(44)

and $\mathbf {A}_{\mathcal {I}}\in \mathbb {C}^{MT\times |\mathcal {I}|}$ is the collection of a_i with $i\in \mathcal {I}$. Therefore, only $h_{\mathcal {I}}(\mathbf {x}_{\mathcal {I}})$ and $\nabla h_{\mathcal {I}}(\mathbf {x}_{\mathcal {I}})$ are required in Line 5 of Algorithms 2 and 3, which are low-dimensional functions defined on $\mathbb {C}^{|\mathcal {I}|}$ where $|\mathcal {I}|=O(L)$. If $h_{\mathcal {I}}(\mathbf {x}_{\mathcal {I}})$ and $\nabla h_{\mathcal {I}}(\mathbf {x}_{\mathcal {I}})$ are computed based on the same logic in (40) and (41) but A replaced by $\mathbf {A}_{\mathcal {I}}$, the complexity of Algorithms 2 and 3 is reduced further because the size of the FFT is reduced in Line 5.

5 Results and discussion

In this section, we evaluate the performance of Algorithms 2 and 3 in terms of the accuracy, achievable rate, and complexity. Throughout this section, we consider a mmWave massive MIMO system with 1 bit ADCs, whose parameters are M=N=64 and T=80. The remaining parameters, namely B_RX,B_TX, and L, vary from simulation to simulation. In addition, we consider S, whose rows are the circular shifts of the ZC training sequence of length T as in [15,33]. Furthermore, H is either random or deterministic. If H is random, $\alpha _{\ell }\sim \mathcal {C}\mathcal {N}(0, 1), \theta _{\text {RX}, \ell }\sim \text {unif}([-\pi /2, \pi /2])$, and θ_TX,ℓ∼unif([−π/2,π/2]) are independent. Otherwise, we consider different H from simulation to simulation.

The MSEs of $\{\alpha _{\ell }\}_{\ell =1}^{L}, \{\theta _{\text {RX}, \ell }\}_{\ell =1}^{L}$, and $\{\theta _{\text {TX}, \ell }\}_{\ell =1}^{L}$ are

$$\begin{array}{*{20}l} \text{MSE}(\{\alpha_{\ell}\}_{\ell=1}^{L})&=\mathbb{E}\left\{\frac{1}{L}\sum_{\ell=1}^{L}|\tilde{\alpha}_{\ell}-\alpha_{\ell}|^{2}\right\}, \end{array} $$

(45)

$$\begin{array}{*{20}l} \text{MSE}(\{\theta_{\text{RX}, \ell}\}_{\ell=1}^{L})&=\mathbb{E}\left\{\frac{1}{L}\sum_{\ell=1}^{L}(\tilde{\theta}_{\text{RX}, \ell}-\theta_{\text{RX}, \ell})^{2}\right\}, \end{array} $$

(46)

$$\begin{array}{*{20}l} \text{MSE}(\{\theta_{\text{TX}, \ell}\}_{\ell=1}^{L})&=\mathbb{E}\left\{\frac{1}{L}\sum_{\ell=1}^{L}(\tilde{\theta}_{\text{TX}, \ell}-\theta_{\text{TX}, \ell})^{2}\right\} \end{array} $$

(47)

where $(\tilde {\alpha }_{\ell }, \tilde {\theta }_{\text {RX}, \ell }, \tilde {\theta }_{\text {TX}, \ell })$ corresponds to some non-zero element of $\tilde {\mathbf {X}}=\text {unvec}(\tilde {\mathbf {x}})\in \mathbb {C}^{B_{\text {RX}}\times B_{\text {TX}}}$. The normalized MSE (NMSE) of H is

$$ \text{NMSE}(\mathbf{H})=\mathbb{E}\left\{\frac{\|\tilde{\mathbf{H}}-\mathbf{H}\|_{\mathrm{F}}^{2}}{\|\mathbf{H}\|_{\mathrm{F}}^{2}}\right\} $$

(48)

where $\tilde {\mathbf {H}}=\mathbf {A}_{\text {RX}}\tilde {\mathbf {X}}\mathbf {A}_{\text {TX}}^{\mathrm {H}}$. In (45)–(48), the symbol $\tilde {\hphantom {\mathbf {y}}}$ is used to emphasize that the quantity is an estimate.
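The NMSE in (48) can be estimated by averaging over Monte-Carlo trials, as in the sketch below. The helper names are hypothetical, the channel is rebuilt from the virtual channel assuming the model H ≈ A_RX X A_TX^H that appears in (40), and the expectation is replaced by a sample mean over the supplied trials.

```python
import numpy as np


def virtual_to_channel(X, A_RX, A_TX):
    """Channel from a virtual channel X, assuming H = A_RX X A_TX^H."""
    return A_RX @ X @ A_TX.conj().T


def nmse(H_true_trials, H_est_trials):
    """Sample-mean estimate of E{ ||H_est - H||_F^2 / ||H||_F^2 }."""
    ratios = [np.linalg.norm(H_est - H, 'fro') ** 2 / np.linalg.norm(H, 'fro') ** 2
              for H, H_est in zip(H_true_trials, H_est_trials)]
    return float(np.mean(ratios))


# Tiny illustrative check with made-up matrices.
rng = np.random.default_rng(4)
H_trials = [rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
            for _ in range(5)]
nmse_perfect = nmse(H_trials, H_trials)                       # exact estimates
nmse_zero = nmse(H_trials, [np.zeros_like(H) for H in H_trials])  # all-zero estimates
```

An exact estimate yields an NMSE of 0, and an all-zero estimate yields an NMSE of 1, which is a convenient reference level when reading NMSE curves.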

Throughout this section, we consider the debiasing variant of Algorithm 2. The halting condition of Algorithms 2 and 3 is to halt when the current and previous $\text {supp}(\tilde {\mathbf {x}})$ are the same. The gradient descent method is used to solve convex optimization problems, which consist of (35) and Line 5 of Algorithms 2 and 3. The backtracking line search is used to compute the step size in the gradient descent method and κ in Line 3 of Algorithm 3. In addition, η is selected so that Conjecture 1 is satisfied. In this paper, we select the maximum η satisfying

$$ \underset{i\in\{1, 2, \dots, B\}}{\text{min}}\ |B_{\eta}(i)|>1. $$

(49)

For example, the maximum η satisfying (49) is η=0.6367 when B_RX=2M and B_TX=2N. The channel estimation criterion of Algorithms 2 and 3 is either MAP or ML, which depends on whether H is random or deterministic. To compare the BMS-based and non-BMS-based algorithms, the performance of the GraSP and GraHTP algorithms is shown as a reference in Figs. 4, 5, 6, and 7. The GraSP and GraHTP algorithms forbid B_RX≫M and B_TX≫N because the GraSP and GraHTP algorithms diverge when A is highly coherent. Therefore, the parameters are selected as B_RX=M and B_TX=N when the GraSP and GraHTP algorithms are implemented. Such B_RX and B_TX, however, are problematic because the mismatch in (7) is inversely proportional to B_RX and B_TX.
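The selection rule (49) can be checked numerically on a simplified dictionary. The sketch below assumes that the coherence band is B_η(i) = {j : μ(i,j) ≥ η}, with μ the normalized correlation between columns, and uses a one-dimensional overcomplete DFT matrix with M=64 rows and B=2M columns as a stand-in for A; under these assumptions the largest admissible η is the smallest, over i, of the largest off-diagonal coherence. The function name is hypothetical.

```python
import numpy as np


def max_eta(A):
    """Largest eta for which every band {j : mu(i, j) >= eta} has more than one element."""
    An = A / np.linalg.norm(A, axis=0, keepdims=True)
    mu = np.abs(An.conj().T @ An)
    np.fill_diagonal(mu, -np.inf)          # exclude j = i
    return float(mu.max(axis=1).min())


# One-dimensional stand-in dictionary: M = 64, B = 2M (assumed uniform grid).
M, B = 64, 128
m = np.arange(M)[:, None]
b = np.arange(B)[None, :]
A = np.exp(-2j * np.pi * m * b / B) / np.sqrt(M)
eta = max_eta(A)
```

For this stand-in, `eta` equals 1/(M sin(π/(2M))) ≈ 0.6367 with M=64, which agrees with the value quoted for B_RX=2M and B_TX=2N.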

In Figs. 4 and 5, we compare the accuracy of the BMS-based and band excluding-based (BE-based) algorithms at different SNRs using $\text {MSE}(\{\alpha _{\ell }\}_{\ell =1}^{L}), \text {MSE}(\{\theta _{\text {RX}, \ell }\}_{\ell =1}^{L}), \text {MSE}(\{\theta _{\text {TX}, \ell }\}_{\ell =1}^{L})$, and NMSE(H). The BE hard thresholding technique was proposed in [26], which was applied to the orthogonal matching pursuit (OMP) algorithm [34]. In this paper, we apply the BE hard thresholding technique to the GraSP algorithm, which results in the BEGraSP algorithm. In this example, B_RX=B_TX=256 for the BMS-based and BE-based algorithms. We assume that L=8 and H is deterministic where $\alpha _{\ell }=(0.8+0.1(\ell -1))e^{j\frac {\pi }{4}(\ell -1)}$. However, $\{\theta _{\text {RX}, \ell }\}_{\ell =1}^{L}$ and $\{\theta _{\text {TX}, \ell }\}_{\ell =1}^{L}$ vary from simulation to simulation, which are either widely spread (Fig. 4) or closely spread (Fig. 5). In Figs. 4 and 5, the notion of widely and closely spread paths refers to whether the minimum 2-norm distance between the paths is relatively large or small, i.e., $\min _{i\neq j}\|(\theta _{\text {RX}, i}-\theta _{\text {RX}, j}, \theta _{\text {TX}, i}-\theta _{\text {TX}, j})\|_{2}$ of Fig. 4, which is $\sqrt {(\pi /18)^{2}+(\pi /18)^{2}}$, is greater than that of Fig. 5, which is $\sqrt {(\pi /36)^{2}+(\pi /36)^{2}}$. The path gains, AoAs, and AoDs are assumed to be deterministic because the CRB is defined for deterministic parameters only [35]. A variant of the CRB for random parameters is known as the Bayesian CRB, but adding the Bayesian CRB to our work is left as future work because applying the Bayesian CRB to non-linear measurements, e.g., 1 bit ADCs, is not straightforward.

According to Figs. 4 and 5, the BMS-based algorithms succeed in estimating both widely spread and closely spread paths, whereas the BE-based algorithms fail to estimate closely spread paths. This follows because the BE hard thresholding technique was derived based on the assumption that supp(x^∗) is widely spread. In contrast, the BMS hard thresholding technique is proposed based on Conjecture 1 without any assumption on supp(x^∗). This means that when supp(x^∗) is closely spread, the BE hard thresholding technique cannot properly estimate supp(x^∗) because the BE hard thresholding technique, by its nature, excludes the elements near the maximum element of x^∗ from its set of potential candidates. The BMS hard thresholding technique, in contrast, uses the elements near the maximum element of x^∗ only to construct the by-product testing set, i.e., Line 4 of Algorithm 1. Therefore, the BMS-based algorithms are superior to the BE-based algorithms when the paths are closely spread. The Cramér-Rao bounds (CRBs) of $\text {MSE}(\{\alpha _{\ell }\}_{\ell =1}^{L}), \text {MSE}(\{\theta _{\text {RX}, \ell }\}_{\ell =1}^{L})$, and $\text {MSE}(\{\theta _{\text {TX}, \ell }\}_{\ell =1}^{L})$ are provided, which were derived in [36]. The gaps between the MSEs and their corresponding CRBs can be interpreted as a performance limit incurred by the discretized AoAs and AoDs. To overcome such a limit, the AoAs and AoDs must be estimated based on the off-grid method, which is beyond the scope of this paper.

In addition, note that $\text {MSE}(\{\alpha _{\ell }\}_{\ell =1}^{L})$ and NMSE(H) worsen as the SNR enters the high SNR regime. To illustrate why x^∗ cannot be estimated in the high SNR regime in 1 bit ADCs, note that

$$\begin{array}{*{20}l} \mathrm{Q}(\sqrt{\rho}\mathbf{A}\mathbf{x}^{*}+\mathbf{n})&\approx\mathrm{Q}(\sqrt{\rho}\mathbf{A}\mathbf{x}^{*})\\ &=\mathrm{Q}(\sqrt{\rho}\mathbf{A}c\mathbf{x}^{*}) \end{array} $$

(50)

in the high SNR regime with c>0, which means that x^∗ and cx^∗ are indistinguishable because the magnitude information is lost by 1 bit ADCs. The degradation of the recovery accuracy in the high SNR regime with 1 bit ADCs is an inevitable phenomenon, as observed from other previous works on low-resolution ADCs [11,14,15,33,37].
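The scale ambiguity in (50) is easy to reproduce numerically. In the sketch below, the quantizer Q is assumed to keep only the signs of the real and imaginary parts, the noise is omitted to mimic the high SNR regime, and the matrix, vector, seed, and ρ are arbitrary; the scaling factors are powers of two so that the scaling is exact in floating point.

```python
import numpy as np


def one_bit(v):
    """One-bit quantizer: keep only the signs of the real and imaginary parts."""
    return np.sign(v.real) + 1j * np.sign(v.imag)


rng = np.random.default_rng(3)
A = (rng.standard_normal((20, 5)) + 1j * rng.standard_normal((20, 5))) / np.sqrt(2)
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
rho = 100.0

# Noise-free measurements of x and of scaled copies c * x with c > 0.
reference = one_bit(np.sqrt(rho) * A @ x)
same = [np.array_equal(reference, one_bit(np.sqrt(rho) * A @ (c * x)))
        for c in (0.5, 2.0, 4.0)]
```

Every scaled copy produces the same quantized output as x, so the magnitude of x^∗ cannot be recovered from the signs alone once the noise becomes negligible.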

In Figs. 6 and 7, we compare the performance of Algorithms 2 and 3, and other estimators when H is random. The Bernoulli Gaussian-GAMP (BG-GAMP) algorithm [15] is an iterative approximate MMSE estimator of x^∗, which was derived based on the assumption that $x_{i}^{*}$ is distributed as $\mathcal {C}\mathcal {N}(0, 1)$ with probability L/B but 0 otherwise, namely, the BG distribution. The fast iterative shrinkage-thresholding algorithm (FISTA) [38] is an iterative MAP estimator of x^∗, which was derived based on the assumption that the logarithm of the PDF of x^∗ is g_FISTA(x)=−γ∥x∥₁ ignoring the constant factor, namely, the Laplace distribution. Therefore, the estimate of x^∗ is

$$ \underset{\mathbf{x}\in\mathbb{C}^{B}}{\text{argmax}}\ (f(\mathbf{x})+g_{\text{FISTA}}(\mathbf{x})), $$

(51)

which is solved using the accelerated proximal gradient descent method [38]. The regularization parameter γ is selected so that the expected sparsity of (51) is 3L for a fair comparison, which was suggested in [17]. In this example, L=4, whereas B_RX and B_TX vary from algorithm to algorithm. In particular, we select B_RX=B_TX=256 for Algorithms 2, 3, and the FISTA, whereas B_RX=M and B_TX=N for the BG-GAMP algorithm.

In Fig. 6, we compare the accuracy of Algorithms 2, 3, and other estimators at different SNRs using NMSE(H). According to Fig. 6, Algorithms 2 and 3 outperform the BG-GAMP algorithm and FISTA as the SNR enters the medium SNR regime. The accuracy of the BG-GAMP algorithm is disappointing because the mismatch in (7) is inversely proportional to B_RX and B_TX. However, increasing B_RX and B_TX is forbidden because the BG-GAMP algorithm diverges when A is highly coherent. The accuracy of the FISTA is disappointing because the Laplace distribution does not match the distribution of x^∗. Note that (23), which is the basis of Algorithms 2 and 3, is indeed the MAP estimate of x^∗, which is in contrast to the FISTA. According to Fig. 6, NMSE(H) worsens as the SNR enters the high SNR regime, which follows from the same reason as in Figs. 4 and 5.

In Fig. 7, we compare the achievable rate lower bound of Algorithms 2, 3, and other estimators at different SNRs when the precoders and combiners are selected based on $\tilde {\mathbf {H}}$. The achievable rate lower bound shown in Fig. 7 is presented in [15], which was derived based on the Bussgang decomposition [13] in conjunction with the fact that the worst-case noise is Gaussian. According to Fig. 7, Algorithms 2 and 3 outperform the BG-GAMP algorithm and FISTA, which is consistent with the result in Fig. 6.

In Fig. 8, we compare the complexity of Algorithms 2, 3, and other estimators at different B_RX and B_TX when H is random. To analyze the complexity, note that Algorithms 2, 3, and the FISTA require h(x) and ∇h(x) at each iteration, whose bottlenecks are Ax and A^Hc, respectively, while the BG-GAMP algorithm requires Ax and A^Hc at each iteration. Therefore, the complexity is measured based on the number of complex multiplications performed to compute Ax and A^Hc, which are implemented based on the FFT. In this example, L=4, whereas SNR is either 0 dB or 10 dB.

In this paper, the complexity of the BG-GAMP algorithm is used as a baseline because the BG-GAMP algorithm is widely used. The normalized complexity is defined as the number of complex multiplications performed divided by the per-iteration complexity of the BG-GAMP algorithm. For example, the normalized complexity of the FISTA with B_RX=B_TX=256 is 160 when the complexity of the FISTA with B_RX=B_TX=256 is equivalent to the complexity of the 160-iteration BG-GAMP algorithm with B_RX=B_TX=256. In practice, the BG-GAMP algorithm converges in 15 iterations when A is incoherent [39]. In this paper, an algorithm is said to be as efficient as the BG-GAMP algorithm when the normalized complexity is below the target threshold, which is 15. As a side note, our algorithms, namely, the BMSGraSP and BMSGraHTP algorithms, require 2.1710 and 2.0043 iterations on average, respectively, across the entire SNR range.

According to Fig. 8, the complexity of the FISTA is impractical because the objective function of (51) is a high-dimensional function defined on $\mathbb {C}^{B}$ where B≫MN. In contrast, the complexity of Algorithms 2 and 3 is dominated by (42), whose objective function is a low-dimensional function defined on $\mathbb {C}^{|\mathcal {I}|}$ where $|\mathcal {I}|=O(L)$. The normalized complexity of Algorithms 2 and 3 is below the target threshold when B_RX≥192 and B_TX≥192. Therefore, we conclude that Algorithms 2 and 3 are as efficient as the BG-GAMP algorithm when B_RX≫M and B_TX≫N.

6 Conclusions

In the mmWave band, the channel estimation problem is converted to a sparsity-constrained optimization problem, which is NP-hard to solve. To approximately solve sparsity-constrained optimization problems, the GraSP and GraHTP algorithms were proposed in CS, which pursue the gradient of the objective function. The GraSP and GraHTP algorithms, however, break down when the objective function is ill-conditioned, which is incurred by the highly coherent sensing matrix. To remedy such a breakdown, we proposed the BMS hard thresholding technique, which is applied to the GraSP and GraHTP algorithms, namely, the BMSGraSP and BMSGraHTP algorithms, respectively. Instead of directly hard thresholding the gradient of the objective function, the BMS-based algorithms test whether an index is the ground truth index or the by-product of another index. We also proposed an FFT-based fast implementation of the BMS-based algorithms, whose complexity is reduced from O(M⁴) to O(M² logM). In the simulation, we compared the performance of the BMS-based, BE-based, BG-GAMP, and FISTA algorithms in terms of the accuracy, achievable rate, and complexity. The BMS-based algorithms were shown to outperform the other estimators and proved to be both accurate and efficient. Our algorithms, however, addressed only the flat fading scenario, so an interesting direction for future work would be to propose a low-complexity channel estimator capable of dealing with the wideband scenario.

7 Methods/experimental

The aim of this study is to propose an accurate yet efficient channel estimator for mmWave massive MIMO systems with 1 bit ADCs. Our channel estimator was proposed based on theoretical analysis. To be specific, we adopted and modified CS algorithms to exploit the sparse nature of the mmWave virtual channels. In addition, we carefully analyzed the proposed channel estimator to reduce the complexity. To verify the accuracy and complexity of the proposed channel estimator, we conducted extensive (Monte-Carlo) simulations.

Notes

In practice, X^∗ may be defined as either approximately sparse or exactly sparse when formulating (10). If X^∗ is approximately sparse, the leakage effect is taken into account so the mismatch in (7) becomes 0, namely, vec(E)=0_MT. In contrast, the mismatch in (7) must be taken into account with a non-zero E when X^∗ is exactly sparse. Fortunately, E is inversely proportional to B_RX and B_TX. Therefore, we adopt the latter definition of X^∗ and propose our algorithm ignoring E assuming that B_RX≫M and B_TX≫N. The performance degradation from E diminishes as B_RX and B_TX become sufficiently large.
The element-wise vector division in the inverse Mills ratio function is meaningless because the arguments of the inverse Mills ratio function are scalars in (30). The reason we use the element-wise vector division in the inverse Mills ratio function will become clear in (37), whose arguments are vectors.
We use the term “ground truth” to emphasize that the ground truth x^∗ is the true virtual channel which actually gives the quantized received signal $\hat {\mathbf {Y}}$ from (16), whereas x merely represents the point where ∇h(x) is computed to estimate supp(x^∗) via hard thresholding.
The inputs and outputs are pruned because A_RX,A_TX, and S are rectangular, not square. The details of the pruned FFT are presented in [30–32].

Abbreviations

ADC:: Analog-to-digital converter
AoA:: Angle-of-arrival
AoD:: Angle-of-departure
AWGN:: Additive white Gaussian noise
BE:: Band excluding
BG-GAMP:: Bernoulli Gaussian-generalized approximate message passing
BLMMSE:: Bussgang linear minimum mean squared error
BMS:: Band maximum selecting
CoSaMP:: Compressive sampling matching pursuit
CRB:: Cramér-Rao bound
CS:: Compressive sensing
DFT:: Discrete Fourier transform
FFT:: Fast Fourier transform
FISTA:: Fast iterative shrinkage-thresholding algorithm
GAMP:: Generalized approximate message passing
GEC-SR:: Generalized expectation consistent signal recovery
GraHTP:: Gradient hard thresholding pursuit
GraSP:: Gradient support pursuit
HTP:: Hard thresholding pursuit
i.i.d.:: Independent and identically distributed
LOS:: Line-of-sight
MAP:: Maximum a posteriori
MIMO:: Multiple-input multiple-output
mmWave:: Millimeter wave
nML:: Near maximum likelihood
NMSE:: Normalized mean squared error
OMP:: Orthogonal matching pursuit
RIP:: Restricted isometry property
SNR:: Signal-to-noise ratio
ULA:: Uniform linear array
ZC:: Zadoff-Chu

References

A. L. Swindlehurst, E. Ayanoglu, P. Heydari, F. Capolino, Millimeter-wave massive MIMO: the next wireless revolution? IEEE Commun. Mag.52(9), 56–62 (2014). https://doi.org/10.1109/MCOM.2014.6894453.
Z. Gao, L. Dai, D. Mi, Z. Wang, M. A. Imran, M. Z. Shakir, mmWave massive-MIMO-based wireless backhaul for the 5G Ultra-dense network. IEEE Wirel. Commun.22(5), 13–21 (2015). https://doi.org/10.1109/MWC.2015.7306533.
T. E. Bogale, L. B. Le, Massive MIMO and mmWave for 5G Wireless HetNet: potential benefits and challenges. IEEE Veh. Technol. Mag.11(1), 64–75 (2016). https://doi.org/10.1109/MVT.2015.2496240.
F. Boccardi, R. W. Heath, A. Lozano, T. L. Marzetta, P. Popovski, Five disruptive technology directions for 5G. IEEE Commun. Mag.52(2), 74–80 (2014). https://doi.org/10.1109/MCOM.2014.6736746.
Bin Le, T. W. Rondeau, J. H. Reed, C. W. Bostian, Analog-to-digital converters. IEEE Sig. Proc. Mag.22(6), 69–77 (2005). https://doi.org/10.1109/MSP.2005.1550190.
C. Mollén, J. Choi, E. G. Larsson, R. W. Heath, Uplink performance of wideband massive MIMO with one-bit ADCs. IEEE Trans. Wirel. Commun.16(1), 87–100 (2017). https://doi.org/10.1109/TWC.2016.2619343.
L. Fan, S. Jin, C. Wen, H. Zhang, Uplink achievable rate for massive MIMO systems with low-resolution ADC. IEEE Commun. Lett.19(12), 2186–2189 (2015). https://doi.org/10.1109/LCOMM.2015.2494600.
J. Zhang, L. Dai, S. Sun, Z. Wang, On the spectral efficiency of massive MIMO systems with low-resolution ADCs. IEEE Commun. Lett.20(5), 842–845 (2016). https://doi.org/10.1109/LCOMM.2016.2535132.
S. Jacobsson, G. Durisi, M. Coldrey, U. Gustavsson, C. Studer, Throughput analysis of massive MIMO uplink with low-resolution ADCs. IEEE Trans. Wirel. Commun.16(6), 4038–4051 (2017). https://doi.org/10.1109/TWC.2017.2691318.
J. Choi, J. Mo, R. W. Heath, Near maximum-likelihood detector and channel estimator for uplink multiuser massive MIMO systems with one-bit ADCs. IEEE Trans. Commun.64(5), 2005–2018 (2016). https://doi.org/10.1109/TCOMM.2016.2545666.
Y. Li, C. Tao, G. Seco-Granados, A. Mezghani, A. L. Swindlehurst, L. Liu, Channel estimation and performance analysis of one-bit massive MIMO systems. IEEE Trans. Sign. Proc.65(15), 4075–4089 (2017). https://doi.org/10.1109/TSP.2017.2706179.
D. P. Bertsekas, Nonlinear Programming. Journal of the Operational Research Society. 48(3), 334–334 (1997). https://doi.org/10.1057/palgrave.jors.2600425.
J. J. Bussgang, Research Laboratory of Electronics, Massachusetts Institute of Technology. Technical report 216, 1–14 (1952).
H. He, C. Wen, S. Jin, Bayesian optimal data detector for hybrid mmWave MIMO-OFDM systems with low-resolution ADCs. IEEE J. Sel. Top. Sig. Proc.12(3), 469–483 (2018). https://doi.org/10.1109/JSTSP.2018.2818063.
J. Mo, P. Schniter, R. W. Heath, Channel estimation in broadband millimeter wave MIMO systems with Few-Bit ADCs. IEEE Trans. Sign. Proc.66(5), 1141–1154 (2018). https://doi.org/10.1109/TSP.2017.2781644.
T. Liu, C. Wen, S. Jin, X. You, in 2016 IEEE International Symposium on Information Theory (ISIT). Generalized turbo signal recovery for nonlinear measurements and orthogonal sensing matrices, (2016), pp. 2883–2887. https://doi.org/10.1109/ISIT.2016.7541826.
S. Bahmani, B. Raj, P. T. Boufounos, Greedy Sparsity-Constrained Optimization. J. Mach. Learn. Res.14(Mar), 807–841 (2013).
X. -T. Yuan, P. Li, T. Zhang, Gradient Hard Thresholding Pursuit. J. Mach. Learn. Res.18(166), 1–43 (2018).
D. Needell, J. A. Tropp, CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal.26(3), 301–321 (2009). https://doi.org/10.1016/j.acha.2008.07.002.
S. Foucart, Hard Thresholding Pursuit: An Algorithm for Compressive Sensing. SIAM J. Numer. Anal.49(6), 2543–2563 (2011). https://doi.org/10.1137/100806278.
M. R. Akdeniz, Y. Liu, M. K. Samimi, S. Sun, S. Rangan, T. S. Rappaport, E. Erkip, Millimeter wave channel modeling and cellular capacity evaluation. IEEE J. Sel. Areas Commun.32(6), 1164–1179 (2014). https://doi.org/10.1109/JSAC.2014.2328154.
A. M. Sayeed, Deconstructing Multiantenna Fading Channels. IEEE Trans. Sign. Proc.50(10), 2563–2579 (2002). https://doi.org/10.1109/TSP.2002.803324.
W. Hong, K. Baek, Y. Lee, Y. Kim, S. Ko, Study and Prototyping of Practically Large-Scale mmWave Antenna Systems for 5G Cellular Devices. IEEE Commun. Mag.52(9), 63–69 (2014). https://doi.org/10.1109/MCOM.2014.6894454.
S. Boyd, L. Vandenberghe, Convex Optimization (Cambridge University Press, Cambridge, 2004).
Y. C. Eldar, G. Kutyniok, Compressed Sensing: Theory and Applications (Cambridge University Press, Cambridge, 2012).
A. Fannjiang, W. Liao, Coherence Pattern–Guided Compressive Sensing with Unresolved Grids. SIAM J. Imaging Sci.5(1), 179–202 (2012). https://doi.org/10.1137/110838509.
N. Jindal, MIMO broadcast channels with finite-rate feedback. IEEE Trans. Inf. Theory. 52(11), 5045–5060 (2006). https://doi.org/10.1109/TIT.2006.883550.
Z. Marzi, D. Ramasamy, U. Madhow, Compressive channel estimation and tracking for large arrays in mm-wave Picocells. IEEE J. Sel. Top. Sig. Proc.10(3), 514–527 (2016). https://doi.org/10.1109/JSTSP.2016.2520899.
D. Chu, Polyphase Codes with Good Periodic Correlation Properties (Corresp.). IEEE Trans. Inf. Theory. 18(4), 531–532 (1972). https://doi.org/10.1109/TIT.1972.1054840.
J. Markel, FFT Pruning. IEEE Trans. Audio Electroacoustics. 19(4), 305–311 (1971). https://doi.org/10.1109/TAU.1971.1162205.
Article Google Scholar
D. Skinner, Pruning the Decimation In-Time FFT Algorithm. IEEE Trans. Acoust. Speech Sig. Proc.24(2), 193–194 (1976). https://doi.org/10.1109/TASSP.1976.1162782.
Article Google Scholar
T. Sreenivas, P. Rao, FFT Algorithm for both input and output pruning. IEEE Trans. Acoust. Speech Sig. Proc.27(3), 291–292 (1979). https://doi.org/10.1109/TASSP.1979.1163246.
Article Google Scholar
Y. Ding, S. Chiu, B. D. Rao, Bayesian Channel estimation algorithms for massive MIMO systems with hybrid analog-digital processing and low-resolution ADCs. IEEE J. Sel. Top. Sig. Proc.12(3), 499–513 (2018). https://doi.org/10.1109/JSTSP.2018.2814008.
Article Google Scholar
J. A. Tropp, A. C. Gilbert, Signal Recovery from Random Measurements via Orthogonal Matching Pursuit. IEEE Trans. Inf. Theory. 53(12), 4655–4666 (2007). https://doi.org/10.1109/TIT.2007.909108.
Article MathSciNet Google Scholar
H. V. Poor, An Introduction to Signal Detection and Estimation (Springer, Berlin, 2013).
Google Scholar
P. Wang, J. Li, M. Pajovic, P. T. Boufounos, P. V. Orlik, in 2017 51st Asilomar Conference on Signals, Systems, and Computers. On Angular-Domain Channel Estimation for One-Bit Massive MIMO Systems with Fixed and Time-Varying Thresholds, (2017), pp. 1056–1060. https://doi.org/10.1109/ACSSC.2017.8335511.
R. P. David, J. Cal-Braz, in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Feedback-Controlled Channel Estimation with Low-Resolution ADCs in Multiuser MIMO Systems, (2019), pp. 4674–4678. https://doi.org/10.1109/ICASSP.2019.8683652.
A. Beck, M. Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J. Imaging Sci.2(1), 183–202 (2009). https://doi.org/10.1137/080716542.
Article MathSciNet Google Scholar
J. P. Vila, P. Schniter, Expectation-maximization Gaussian-mixture approximate message passing. IEEE Trans. Sign. Proc.61(19), 4658–4672 (2013). https://doi.org/10.1109/TSP.2013.2272287.
Article MathSciNet Google Scholar

Funding

This work was partly supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2016-0-00123, Development of Integer-Forcing MIMO Transceivers for 5G & Beyond Mobile Communication Systems) and by the National Research Foundation (NRF) grant funded by the MSIT of the Korea government (2019R1C1C1003638).

Author information

Authors and Affiliations

School of Electrical Engineering, KAIST, Daejeon, South Korea
In-soo Kim & Junil Choi

Contributions

IK and JC led the research and wrote the manuscript. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Junil Choi.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Kim, Is., Choi, J. Channel estimation via gradient pursuit for mmWave massive MIMO systems with one-bit ADCs. J Wireless Com Network 2019, 289 (2019). https://doi.org/10.1186/s13638-019-1623-x

Received: 24 June 2019
Accepted: 12 December 2019
Published: 30 December 2019
DOI: https://doi.org/10.1186/s13638-019-1623-x
