mmWave channel estimation with accelerated gradient descent algorithms
EURASIP Journal on Wireless Communications and Networking volume 2018, Article number: 272 (2018)
Abstract
The availability of the millimeter wave (mmWave) band in conjunction with massive multiple-input multiple-output (MIMO) technology is expected to boost the data rates of fifth-generation (5G) cellular systems. However, in order to achieve high spectral efficiencies, an accurate channel estimate is required, which is a challenging task in massive MIMO. By exploiting the small number of paths that characterize the mmWave channel, the estimation problem can be solved by compressed-sensing (CS) techniques. In this paper, we propose a novel CS channel estimation method based on the accelerated gradient descent with adaptive restart (AGDAR) algorithm exploiting an ℓ_{1}-norm approximation of the sparsity constraint. Moreover, a modified reweighted compressed-sensing (RCS) technique is considered that iterates AGDAR using a weighted version of the ℓ_{1}-norm term, where weights are adapted at each iteration. We also discuss the impact of cell sectorization and tracking on the channel estimation algorithm. We compare the proposed solutions with existing channel estimation methods through an extensive simulation campaign on downlink third-generation partnership project (3GPP) channel models.
Introduction
Due to its huge spectrum availability, the millimeter wave (mmWave) band is currently considered for the fifth generation (5G) of cellular networks [1–3]. The high attenuation incurred at those frequencies imposes the use of multiple antennas at each device, typically resulting in massive multiple-input multiple-output (MIMO) systems, which give rise to various challenges. We focus here on channel estimation, which is needed for proper transmit beamforming. In fact, the least square (LS) estimate using short training sequences and limited transmit power (to reduce overhead in massive MIMO systems) is not accurate enough for capacity-achieving beamforming. However, the mmWave MIMO channel comprises a small number of dominant clusters of paths, and even with many antennas a small set of parameters characterizes the entire channel. This induces a sparsity of the mmWave channel matrix when transformed by a Fourier transform into the so-called virtual channel, and compressed-sensing (CS) techniques can be used for channel estimation.
Various solutions have been proposed for channel estimation in mmWave communication systems, and the reader may refer to [4] for a survey. Part of the literature has considered transceivers with hybrid beamformers (a cascade of beamformers before and after the digital-to-analog converters): the joint optimization of both training and estimation has been pursued in [5] for these structures using a feedback channel. Orthogonal matching pursuit (OMP) solutions have been considered with both single-path [6, 7] and multiple-path cancelation [8, 9]. In [10], an enhanced approach for generating the beamforming codebook has been proposed, using the continuous basis pursuit (CBP) method, while [11] considers a fast iterative shrinkage-thresholding algorithm (FISTA) approach.
For fully digital beamformers, [12] considers the sparse channel estimation as a least absolute shrinkage and selection operator (LASSO) problem. In [13], OMP is used to estimate the channel by iteratively detecting and canceling paths from the virtual channel estimate. In [14], a basis pursuit denoise (BPDN) approach is suggested where a weighted version of the ℓ_{1}-norm term is considered in the LASSO problem and weights are iteratively adapted. A sparsity adaptive matching pursuit (SSAMP) approach is instead used in [15], while in [16] the LASSO problem is solved by applying a generalized approximate message passing (GAMP) algorithm exploiting the Bernoulli-Gaussian distribution of paths in the virtual channel.
In this paper, we propose a novel sparse channel estimation method based on the accelerated gradient descent with adaptive restart (AGDAR) algorithm [17]. Focusing on a scenario where the receiver first obtains the LS estimate of the narrowband mmWave MIMO channel, we relax the sparse optimization problem using LASSO, wherein the ℓ_{0}-norm is replaced by the ℓ_{1}-norm. We then apply the AGDAR algorithm [17] to solve the sparse channel estimation problem. In order to further enhance the channel estimation procedure, a reweighted ℓ_{1}-norm problem is considered, leading to the reweighted compressed-sensing (RCS) algorithm [18], which iterates AGDAR with different weights of the ℓ_{1}-norm term. We also discuss the impact of cell sectorization and channel tracking on the channel estimation algorithm. We compare the proposed solutions with OMP solutions [6, 8, 13]. Compared with the rest of the literature, we reduce the complexity with respect to the random search of ALASSO in [12], and we swap the objective function and the constraints with respect to [14], effectively minimizing the mean square error (MSE) and providing details on the implementation of the optimization algorithm. Compared to [15], we use different algorithms (AGDAR and RCS instead of SSAMP) that trade off between sparsity and noise reduction. Lastly, we consider a single user and a static pilot transmission for the initial estimate, while [16] considers the adaptation of the transmit and receive beamformers to allow simultaneous channel estimation for multiple users. An extensive simulation campaign on third-generation partnership project (3GPP) channel models [3] for a downlink scenario has been conducted to show the merits of the proposed approach in terms of both estimate MSE and computational complexity.
The rest of the paper is organized as follows. We introduce the system model in Section 2, providing the description of both the mmWave channel model and the existing OMP solutions. The sparse channel estimation problem is introduced in Section 3, together with a discussion on sectorization and channel tracking. The proposed AGDAR technique is described in Section 4, together with the refined RCS approach. Numerical results are presented in Section 5 to assess the performance of the considered techniques in a 5G scenario, before conclusions are drawn in Section 6.
System model
We consider a massive MIMO narrowband communication system with N_{t} antennas at the transmitter and N_{r} antennas at the receiver. This models either the uplink or the downlink of a cellular communication system. Let \(\mathbf {H} \in \mathbb {C}^{N_{\mathrm {r}} \times N_{\mathrm {t}} }\) be the channel matrix with complex entries. Antennas are organized into either uniform linear arrays (ULAs) [19] or uniform planar arrays (UPAs) [20] at both the transmitter and receiver: ULA antennas are uniformly spaced along the z axis while UPA antennas are uniformly tiled over the yz-plane^{Footnote 1}. For the sake of a clearer explanation in the main body of the paper, we only provide derivations for ULA, while we report in Appendix A the results for a UPA with D_{2}×D_{3}=N_{t} transmit antennas and D_{0}×D_{1}=N_{r} receive antennas.
We indicate with L the number of paths for the signal from the transmitter to the receiver, so that the channel matrix entries can be written as
where i_{1}=0,…,N_{r}−1, i_{2}=0,…,N_{t}−1, \(\left |\eta _{d}^{l}\right |\leq \frac {\delta }{\lambda }\), d=1,2, λ is the carrier wavelength, δ is the antenna spacing, and α_{l} is the amplitude of the lth path, including path loss, shadowing, and fading. Note that parameters \(\eta _{d}^{l}\) are related to the angles of departure and arrival of the lth path. By assuming δ≤λ/2, we have \(\eta _{d}^{l} \in \left [-\frac 12, \frac 12\right ]\). The statistics of each parameter depend on the considered propagation scenario, and various relevant cases can be found for example in the 3GPP mmWave channel model [3] including channel models with clustered subpaths [3], where L becomes the total number of (sub)paths from all clusters. Typically, in mmWave systems, the number of paths (or subpaths) L is small [21].
Figure 1 shows an example of a receiver with N_{r}=3 receive antennas and a single path arriving at the antennas with an angle 𝜗 from a distance D: in this case \(\eta _{1}^{1} = \frac {\delta }{\lambda } \cos \vartheta \) and \(\alpha _{1} = \frac {1}{D^{2}} e^{j2\pi \frac {D}{\lambda }}\).
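The sum-of-paths structure of the ULA channel can be sketched as follows. This is only an illustration: the function name `ula_channel`, the positive sign of the complex exponent, and the numerical values are assumptions made for the example, not taken from the paper.

```python
import cmath
import math

def ula_channel(n_r, n_t, alphas, eta1, eta2):
    """Sum-of-paths ULA channel with entries
    H[i1][i2] = sum_l alpha_l * exp(j*2*pi*(eta1_l*i1 + eta2_l*i2)).
    The sign of the exponent is an assumed convention."""
    H = [[0j] * n_t for _ in range(n_r)]
    for a, e1, e2 in zip(alphas, eta1, eta2):
        for i1 in range(n_r):
            for i2 in range(n_t):
                H[i1][i2] += a * cmath.exp(2j * cmath.pi * (e1 * i1 + e2 * i2))
    return H

# Single path seen by a three-antenna receiver, eta_1 = (delta/lambda) * cos(theta)
delta_over_lambda = 0.5            # half-wavelength spacing (assumed)
theta = math.pi / 3                # arrival angle (assumed)
H = ula_channel(3, 1, [1.0], [delta_over_lambda * math.cos(theta)], [0.0])
```

With these values the phase advances by a quarter turn from one receive antenna to the next, so a single pair (amplitude, η) describes the whole column of H.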
LS estimate
The channel estimation techniques considered in this paper are all based on the LS channel estimate, briefly summarized here.
The set of N_{t} training symbols^{Footnote 2} transmitted with the N_{t} antennas are collected into the N_{t}×N_{t} matrix S, assumed here to be unitary. The corresponding received N_{r}×N_{t} matrix signal is
where \(\boldsymbol {N} \in \mathbb {C}^{N_{\mathrm {r}} \times N_{\mathrm {t}} }\) is the noise matrix with independent and identically distributed (iid) zero-mean complex Gaussian entries, each with power σ^{2}. The LS channel estimate at the receiver is obtained as [22]
where N^{′} is a matrix with iid zero-mean complex Gaussian entries having power σ^{2} (thanks to the unitary property of S). Note that this estimation procedure may yield a significant overhead for the transmission of training sequences only when the number of transmit antennas grows large [23], since the number of transmitted symbols (the columns of S) is N_{t}. We will further address this problem in Section 3.
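As a minimal sketch of the LS step, the example below assumes the received signal has the form Y = HS + N and that the LS estimate is obtained by right-multiplying Y with the Hermitian of the unitary training matrix. The helper names, the choice of a unitary DFT matrix for S, and the numerical values are illustrative assumptions.

```python
import cmath
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def hermitian(A):
    # Conjugate transpose of a matrix stored as a list of rows
    return [[x.conjugate() for x in col] for col in zip(*A)]

def unitary_dft(n):
    scale = n ** -0.5
    return [[scale * cmath.exp(-2j * cmath.pi * r * c / n) for c in range(n)]
            for r in range(n)]

def ls_estimate(Y, S):
    # For unitary S, Y S^H = H + N S^H, and N S^H has the same statistics as N
    return matmul(Y, hermitian(S))

rng = random.Random(0)
n_r, n_t, sigma2 = 2, 4, 0.01
H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_t)] for _ in range(n_r)]
S = unitary_dft(n_t)
std = (sigma2 / 2) ** 0.5          # per real dimension, so the complex power is sigma2
N = [[complex(rng.gauss(0, std), rng.gauss(0, std)) for _ in range(n_t)] for _ in range(n_r)]
Y = [[hs + w for hs, w in zip(row_hs, row_n)] for row_hs, row_n in zip(matmul(H, S), N)]
H_ls = ls_estimate(Y, S)                       # noisy LS estimate
H_noiseless = ls_estimate(matmul(H, S), S)     # equals H when there is no noise
```

Without noise the estimate reproduces H exactly, which is why the only error left in the LS estimate is a noise term of power σ^{2} per entry.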
OMP methods
We will compare our channel estimation algorithm with two OMP techniques: single peak cancelation (SPC) [6] and joint peak cancelation (JPC) [8]. Both methods use the Fourier transform of the channel matrix, in what is usually denoted as virtual channel or angular domain representation [24, Sec. 7.3.3]. With reference to ULA, let \(I_{n}(x)=\frac {\sin (\pi x)}{n \sin \left ({\pi \frac {x}{n}}\right)}\) be the 1D periodic sinc function and let
be the two-dimensional (2D) sampled periodic sinc function, where M_{1} and M_{2} are the number of samples per period in the 2D virtual channel domain, f=(f_{1},f_{2}), f_{d}=0,…,M_{d}−1, d=1,2, are the indices of the samples, and \(\boldsymbol {\Omega }^{l}=\left (\Omega _{1}^{l}, \Omega _{2}^{l}\right)=\left (M_{1}\eta _{1}^{l},M_{2}\eta _{2}^{l}\right)\).
The virtual channel matrix \(\boldsymbol {V} \in \mathbb C^{M_{1} \times M_{2}}\) is the 2D discrete Fourier transform (DFT) of H with entries [6]
where f=(f_{1},f_{2}) are the indices of the channel sample in the virtual domain.
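The 1D periodic sinc function I_{n}(x) defined above can be evaluated as in the following sketch. The function name and the explicit handling of the removable singularities at multiples of n are choices made for this example.

```python
import math

def periodic_sinc(x, n):
    """I_n(x) = sin(pi*x) / (n * sin(pi*x/n)), using the limit value
    where the denominator vanishes (x a multiple of n)."""
    den = n * math.sin(math.pi * x / n)
    if abs(den) < 1e-12:
        # Limit for x -> k*n: cos(pi*k*n) / cos(pi*k) = (-1)^(k*(n-1))
        k = round(x / n)
        return float((-1) ** (k * (n - 1)))
    return math.sin(math.pi * x) / den

peak = periodic_sinc(0.0, 4)        # main lobe maximum
null = periodic_sinc(1.0, 4)        # integer offsets that are not multiples of n are nulls
half = periodic_sinc(0.5, 4)        # off-grid offsets leak energy into neighboring samples
```

The nonzero value at half-integer offsets is the leakage that the peak-cancelation methods have to remove when a path does not fall exactly on the sampling grid.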
The SPC method [6] reported in Algorithm 1 (for algo = SPC) iteratively estimates the amplitude α_{l} and the discrete positions Ω^{l} of I paths in the virtual channel and cancels their corresponding periodic sinc functions in the virtual channel. After I iterations, the channel estimate \(\boldsymbol {\widehat {H}}\) is obtained by taking the 2D inverse discrete Fourier transform (IDFT) of the estimated virtual channel \(\boldsymbol {\widehat {V}}\) reconstructed by summing the contributions of all the detected paths.
The number of iterations (i.e., the number of detected paths) is a trade-off between L (the number of paths) and the noise level. On the one hand, it is advisable to estimate all L paths; on the other hand, noise can make low-power paths undetectable, so it is better not to estimate all of them and to use I<L. In Section 5, we determine by simulations the optimal I that minimizes the MSE of the channel estimate. Note that SPC provides an intrinsically approximated solution even in the absence of noise, since the peak positions Ω^{l} are estimated on a fixed discrete grid.
The JPC algorithm of [8] reported in Algorithm 1 (for algo = JPC) is a modification of SPC that at each iteration jointly estimates the amplitudes of all previously detected peaks by the LS approach and cancels the corresponding periodic sinc functions from the virtual channel. In particular, x=vec(X) stacks the columns of matrix X into the column vector x; at iteration l, one peak is detected (line 3), then the amplitudes of all previously detected peaks are jointly estimated (line 8), and the new virtual channel with removed peaks is obtained (lines 9–11). This is achieved by building matrix w that contains in column l the vector version of \(\boldsymbol {W}\left (\widehat {\boldsymbol {\Omega }}^{l}\right)\) (line 6).
This algorithm has the advantage over SPC that each amplitude estimate is refined at each iteration, thus also taking advantage of the peaks detected in later iterations.
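The detect-and-cancel loop of SPC can be illustrated with a simplified one-dimensional sketch. It is not a transcription of Algorithm 1: the function `spc_1d` is hypothetical, it works on a single array dimension, and it cancels each detected path in the antenna domain, which by linearity of the DFT corresponds to removing the associated periodic sinc from the virtual channel.

```python
import cmath

def spc_1d(h, m, n_iter):
    """Sequential peak detection and cancelation on a 1D array response h,
    using an m-point DFT grid (simplified, single-dimension illustration)."""
    n = len(h)
    residual = list(h)
    estimate = [0j] * n
    for _ in range(n_iter):
        # m-point DFT of the residual (n samples, zero padded to m)
        V = [sum(residual[i] * cmath.exp(-2j * cmath.pi * i * f / m) for i in range(n))
             for f in range(m)]
        f_hat = max(range(m), key=lambda f: abs(V[f]))   # strongest peak on the grid
        a_hat = V[f_hat] / n                             # amplitude of that peak
        path = [a_hat * cmath.exp(2j * cmath.pi * i * f_hat / m) for i in range(n)]
        residual = [r - p for r, p in zip(residual, path)]
        estimate = [e + p for e, p in zip(estimate, path)]
    return estimate

n, m = 8, 16
h = [(2 - 1j) * cmath.exp(2j * cmath.pi * i * 5 / m) for i in range(n)]   # one on-grid path
h_hat = spc_1d(h, m, 1)
err = max(abs(a - b) for a, b in zip(h, h_hat))
```

A single on-grid path is recovered exactly in one iteration. With several paths or off-grid positions each amplitude is estimated in the presence of the others, which is the limitation that the joint LS re-estimation of JPC addresses.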
Sparse dual channel estimation
In order to obtain an efficient and simple channel estimator, we exploit the specific channel structure described in the previous section. In particular, we use the fact that the channel is composed of a small number of paths with respect to the typically large number of transmit and receive antennas.
In this paper, we directly refer to the representation (1) and interpret it as the 2D IDFT of a sparse matrix having only L nonzero entries. First, the channel H is rearranged into the channel column vector \( {\boldsymbol {h}}=\text {vec}(\boldsymbol {H}) \in \mathbb {C}^{N_{\mathrm {r}}N_{\mathrm {t}} \times 1}\) with entries
where i_{1}=0,…,N_{r}−1, i_{2}=0,…,N_{t}−1, while the 2D IDFT matrix is \(\boldsymbol {F} \in \mathbb {C}^{N_{\mathrm {t}} N_{\mathrm {r}} \times M_{2} M_{1}} \) with entries
where f_{d}=0,…,M_{d}−1, d=1,2. Lastly, we define the column vector v of length M_{1}M_{2} with L nonzero entries at position \(\bar {\Omega }_{1}^{l} + M_{1} \bar {\Omega }_{2}^{l}\), for l=1,…,L, i.e.,
with
and 〈x〉 denotes the integer part of x. From (1), we can approximate the channel vector as
We will refer to v as the dual channel in the M_{1}×M_{2} domain. Note that the dual channel is sparse as it contains only L nonzero entries.
Remark 1
The approximation (10) stems from the rounding of (9), i.e., from approximating \(\Omega _{d}^{l}\) with \(\bar {\Omega }_{d}^{l}\). As M_{d}→∞, the approximation becomes more accurate. Moreover, we have used DFTs with M_{d} points along dimension d, as for the dual channel representation, in order to make a simpler comparison among various channel estimation schemes. Lastly, note that v is not the vectorial representation of the virtual channel, since the DFT used to obtain the virtual channel does not invert the IDFT of (10): in fact, the DFT is taken on the reduced set of N_{r}×N_{t} samples, thus yielding the periodic sinc’s of (4).
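The relation between the channel vector and the dual channel can be sketched as below. The entries of F are assumed here to be unnormalized complex exponentials with a positive exponent, path positions are quantized by rounding M_{d}η_{d}^{l} and wrapping modulo M_{d}, and the function names are hypothetical.

```python
import cmath

def idft_matrix(n_r, n_t, m1, m2):
    """Assumed 2D IDFT matrix: row i1 + n_r*i2, column f1 + m1*f2."""
    F = [[0j] * (m1 * m2) for _ in range(n_r * n_t)]
    for i2 in range(n_t):
        for i1 in range(n_r):
            for f2 in range(m2):
                for f1 in range(m1):
                    F[i1 + n_r * i2][f1 + m1 * f2] = cmath.exp(
                        2j * cmath.pi * (i1 * f1 / m1 + i2 * f2 / m2))
    return F

def dual_channel(alphas, eta1, eta2, m1, m2):
    """Sparse vector with alpha_l at the quantized position of path l."""
    v = [0j] * (m1 * m2)
    for a, e1, e2 in zip(alphas, eta1, eta2):
        o1 = round(m1 * e1) % m1      # negative positions wrap around the grid
        o2 = round(m2 * e2) % m2
        v[o1 + m1 * o2] += a
    return v

n_r, n_t, m1, m2 = 4, 2, 8, 8
alphas, eta1, eta2 = [1.0 + 0.5j], [3 / 8], [-1 / 8]     # one path lying on the grid
F = idft_matrix(n_r, n_t, m1, m2)
v = dual_channel(alphas, eta1, eta2, m1, m2)
h_approx = [sum(row[k] * v[k] for k in range(m1 * m2)) for row in F]
h_exact = [sum(a * cmath.exp(2j * cmath.pi * (e1 * i1 + e2 * i2))
               for a, e1, e2 in zip(alphas, eta1, eta2))
           for i2 in range(n_t) for i1 in range(n_r)]
grid_error = max(abs(a - b) for a, b in zip(h_exact, h_approx))
```

For a path that lies on the grid the product Fv matches the channel vector up to rounding errors; for off-grid paths the mismatch shrinks as M_{d} grows, in line with Remark 1.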
Similarly to the vectorial representation of the channel, we define
where indices of H^{′} to obtain h^{′} are selected similarly to (6). From (11), we observe that the LS estimate is a noisy version of a linear transformation of the dual channel v.
We propose an algorithm that improves the LS channel estimate by exploiting the sparsity of v. In particular, we define by \(\hat {\boldsymbol {v}}\) the new estimate of the dual channel v and write the sparse channel estimation problem as
where ∥v^{′}∥_{0} is the ℓ_{0}-norm that counts the nonzero elements in v^{′} and ρ is a parameter that controls the sparsity of the solution. This problem formulation aims at minimizing the MSE between the estimated channel and the LS channel estimate, under a constraint on the sparsity of vector \(\hat {\boldsymbol {v}}\), imposed by the ℓ_{0}-norm term.
Unfortunately, problem (12) is non-convex and NP-hard [25]. Thus, we relax the problem by replacing the ℓ_{0}-norm with the ℓ_{1}-norm, obtaining the LASSO problem
with \(\|\boldsymbol {v}\|_{1} = {\sum \nolimits }_{i=0}^{M_{1}M_{2}-1} |v_{i}|\), which is now convex.
Remark 2
Note that estimating the dual channel opens the possibility of reducing the training overhead. We observe that systems with different numbers of antennas (placed at the same position) share the same dual channel. Thus, once we have an estimate of v, we can change F to obtain the channel estimate for a different antenna setting. Indeed, we can use fewer transmit antennas to transmit the training sequence, then obtain an estimate of the dual channel, and finally project the estimate onto a larger number of antennas by modifying the size of the IDFT matrix in (10). Typically, paths are concentrated in clusters on part of the dual channel; thus, by an iterative channel estimation procedure, we can beamform training signals in the part of the dual channel covered by the clusters.
Solution of the sparse channel estimation problem
A vast literature is available for the solution of the sparse channel estimation problem (13), see for example the survey [26]. We propose here to use two recent and efficient methods based on the gradient descent algorithm with improved convergence speed, namely the AGDAR algorithm (also named FISTA with adaptive restart [27]) and the RCS algorithm [18].
Accelerated gradient descent with adaptive restart
The AGDAR algorithm [27] has been developed to solve problems where the objective function is the sum of a differentiable function and a general but simple closed convex function.
Here, we briefly summarize the motivation of the AGDAR algorithm. We first observe that the minimization problem \(\min _{\boldsymbol {x} \in \mathbb R^{N}} f(\boldsymbol {x})\) when f(·) is convex and smooth can be solved by the gradient descent algorithm that iteratively updates the solution, computing at iteration p
\[\boldsymbol {x}_{p} = \boldsymbol {x}_{p-1} - t \nabla f(\boldsymbol {x}_{p-1}), \qquad (14)\]
where t is the step size and ∇f(x) is the gradient of f(·) computed in x. An alternative formulation of (14) is provided by the proximal form [28]
Now, in order to minimize f(x)+g(x) with f(·) still convex and smooth but g(·) convex, nondifferentiable, and lower semicontinuous, the proximal form must be modified as follows [28]
In general, this optimization problem may be hard to solve; however, when g(x)=ρ∥x∥_{1}, problem (16) is efficiently solved by splitting it into N separate one-dimensional problems for each entry of \(\boldsymbol {x} \in \mathbb R^{N}\), i.e.,
with z_{n}=[x_{p−1}−t∇f(x_{p−1})]_{n}. Problem (17) can be solved in closed form [29] as \([\boldsymbol {x}_{p}]_{n} = \mathcal T_{\rho t}(z_{n})\), with the shrinkage operator defined as
obtaining the iterative shrinkage-thresholding algorithm (ISTA). This solution can be made faster by applying the Nesterov acceleration principle [30]: instead of using the gradient descent (14), x_{p} is updated as a linear combination of the gradient descent terms (14) at the current and previous iterations, i.e.,
Combining this approach with ISTA, we obtain the FISTA algorithm where \([\boldsymbol {y}_{p}]_{n} = \mathcal T_{\rho t}(z_{n})\) and x_{p} is updated using (19b). It turns out that this approach speeds up the convergence of the algorithm, for example by choosing as linear combination coefficients [30]
The explanation of the Nesterov iteration is not very intuitive, and the interested reader can find more details in [30].
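The entry-wise shrinkage step and the resulting ISTA iteration can be sketched as follows. The example assumes the common soft-thresholding form of the shrinkage operator, a real-valued toy problem with objective 0.5∥Ax−b∥² + ρ∥x∥_{1}, and hypothetical function names; it is not the paper's Algorithm 2.

```python
def soft_threshold(z, tau):
    """Shrink z toward zero by tau (works for real or complex z)."""
    mag = abs(z)
    return 0.0 * z if mag <= tau else z * (1.0 - tau / mag)

def ista(A, b, rho, t, iters):
    """Proximal gradient iterations for 0.5*||A x - b||^2 + rho*||x||_1 (real data)."""
    n = len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
        grad = [sum(A[m][k] * r[m] for m in range(len(A))) for k in range(n)]
        # Gradient step followed by entry-wise shrinkage with threshold rho*t
        x = [soft_threshold(xk - t * gk, rho * t) for xk, gk in zip(x, grad)]
    return x

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x_ista = ista(identity, [3.0, -0.5, 1.5], rho=1.0, t=1.0, iters=5)
```

With an identity matrix the minimizer is simply the shrunk observation vector, so entries below the threshold are set to zero and the others are reduced in magnitude by ρ.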
The parameter choice (20) is not in general optimal while its optimization is a difficult task. An alternative approach is the adaptive restart technique [27], in which γ_{p} is set according to (20) (thus in a suboptimal way) but the FISTA algorithm is restarted whenever the objective function is locally increasing (thus the iterative solution is moving in the wrong direction), i.e., when
From (19a) we obtain that the restarting condition (22) can be written as
The restart consists in resetting θ_{p}=1 and using as initial point the last point produced by the algorithm.
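A minimal sketch of accelerated proximal gradient descent with a function-value restart is given below. It uses the standard FISTA momentum recursion, which may differ from the coefficients in (20), and it restarts when the objective increases by resetting the momentum and continuing from the last accepted point. The function names, the toy problem, and the step size are assumptions.

```python
import math

def soft_threshold(z, tau):
    mag = abs(z)
    return 0.0 * z if mag <= tau else z * (1.0 - tau / mag)

def lasso_objective(A, b, x, rho):
    r = [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
    return 0.5 * sum(ri * ri for ri in r) + rho * sum(abs(xi) for xi in x)

def fista_restart(A, b, rho, t, iters):
    n = len(A[0])
    x = [0.0] * n          # last accepted iterate
    w = list(x)            # extrapolated point where the gradient is evaluated
    theta = 1.0
    obj = lasso_objective(A, b, x, rho)
    restarts = 0
    for _ in range(iters):
        r = [sum(a * wi for a, wi in zip(row, w)) - bi for row, bi in zip(A, b)]
        grad = [sum(A[m][k] * r[m] for m in range(len(A))) for k in range(n)]
        x_new = [soft_threshold(wk - t * gk, rho * t) for wk, gk in zip(w, grad)]
        obj_new = lasso_objective(A, b, x_new, rho)
        if obj_new > obj:
            # Objective increased: drop the momentum and restart from x
            theta, w = 1.0, list(x)
            restarts += 1
            continue
        theta_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * theta * theta))
        beta = (theta - 1.0) / theta_new
        w = [xn + beta * (xn - xo) for xn, xo in zip(x_new, x)]
        x, obj, theta = x_new, obj_new, theta_new
    return x, restarts

A = [[1.0, 1.0], [0.0, 1.0]]
b = [3.0, 1.0]
x_hat, n_restarts = fista_restart(A, b, rho=0.5, t=0.3, iters=500)
```

For this toy problem the stationarity conditions give the minimizer (1.5, 1.0), which the iterations approach. The step size 0.3 is below the reciprocal of the largest eigenvalue of AᵀA, so a restarted step never increases the objective.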
Reweighted compressed sensing
The RCS method was proposed in [18] to improve the sparsity of the gradient descent solution of (13). The algorithm weights the entries of x_{p} in the ℓ_{1}-norm in (13) in order to better approximate the ℓ_{0}-norm term in (12). Therefore, instead of solving (12), the RCS method aims at solving problem
where the diagonal matrix D contains the weights. This problem can be seen as an ℓ_{1}-norm relaxation of a weighted version of the ℓ_{0}-norm problem (12), i.e.,
As the ℓ_{0}-norm counts the nonzero entries, regardless of their amplitude, for nonzero weights, the two problems (12) and (25) have the same solutions.
The weights are meant to provide a good approximation of the ℓ_{0}-norm using the (weighted) ℓ_{1}-norm. Therefore, imposing that at the solution
we obtain the optimum weights (diagonal entries of matrix D)
where ∗ denotes any nonzero value. However, this choice requires the knowledge of the problem solution \(\hat {\boldsymbol {v}}\), which is not available while solving the problem.
In [18], an iterative approach has been proposed, where the weights are adapted to converge to (27) without knowing the optimal solution. In particular, RCS runs the AGDAR algorithm q_{2} times, using at each iteration a different set of weights chosen according to (28). It has been shown by extensive simulations over a variety of examples that the following weight adaptation strategy performs well: start from D_{0}=diag{[1,…,1]}, for which (24) corresponds to (13), and then at iteration i+1 update the weights as
where x_{p} is the solution of (24) for weights D_{i} and ζ is a small number. Note that at convergence (for ζ≈0) we obtain (27).
The implementation of RCS is obtained by running q_{2} times the AGDAR algorithm and computing the shrinkage function with a weighted parameter, i.e., \(\mathcal T_{w_{n} \rho t}(z_{n})\). The resulting procedure is reported in Algorithm 3.
It has been shown [18] that the RCS algorithm is a majorization-minimization algorithm that iteratively minimizes a simple surrogate function majorizing the objective function, and indeed provides in general a better approximation to the original ℓ_{0}-norm problem.
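The reweighting loop can be illustrated on a toy denoising problem where the inner weighted ℓ_{1} problem has a closed-form solution, so no inner AGDAR run is needed. The weight update w_{n}=1/(|x_{n}|+ζ) is the usual rule of reweighted ℓ_{1} minimization and is assumed here to correspond to (28); the function names and numerical values are illustrative.

```python
def soft_threshold(z, tau):
    mag = abs(z)
    return 0.0 * z if mag <= tau else z * (1.0 - tau / mag)

def reweighted_denoise(b, rho, zeta, outer_iters):
    """Reweighted l1 on min_x 0.5*||x - b||^2 + rho*sum_n w_n*|x_n|.
    Each inner problem is solved exactly by weighted shrinkage."""
    w = [1.0] * len(b)                 # unit weights: the first pass is plain LASSO
    x = list(b)
    for _ in range(outer_iters):
        x = [soft_threshold(bn, wn * rho) for bn, wn in zip(b, w)]
        w = [1.0 / (abs(xn) + zeta) for xn in x]   # small entries get large weights
    return x, w

x_rcs, w_rcs = reweighted_denoise([3.0, 0.6, -0.2], rho=0.5, zeta=0.1, outer_iters=3)
```

In this example the first pass shrinks the large entry from 3.0 to 2.5 and keeps a small residual in the second entry. After reweighting, the small entries are driven to zero while the large entry is shrunk less, which is the behavior expected from a closer approximation of the ℓ_{0}-norm.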
Sectorization and channel tracking
When the antennas at either or both the transmitter and the receiver are transmitting/receiving in a focused direction (in what is known as cell sectorization), the departure and arrival angles are within subintervals of [0,2π); therefore, \(\eta _{d}^{l}\) will also take values in subintervals of [−1/2,1/2) and the rounding of \(\eta _{d}^{l} M_{d}\) will be in a subinterval of \(\left (-\frac {M_{d}}{2}, \frac {M_{d}}{2}\right)\). Therefore, vector v can be reduced by eliminating the entries corresponding to values of \(\eta _{d}^{l}\) that are never taken by the channel realization. Correspondingly, the columns of F are removed and the AGDAR algorithm is run over a reduced space, thus increasing its accuracy.
As for channel tracking, once the channel has been estimated, it may slowly change due to the variations of the propagation environment. In this case, we can reduce the complexity of the channel estimation and make it more effective by simply tracking its changes rather than re-estimating it from scratch. We propose to focus the search of the paths in the dual channel within intervals around the initial estimates of arrival and departure angles. Therefore, for both AGDAR and RCS approaches, we have a reduction of the dual channel vector v. Indeed, this is similar to sectorization; however, for channel tracking we must consider multiple angle intervals, one for each initially estimated path.
Sectorization or tracking are also possible for SPC and JPC, wherein the search of the peak positions Ω^{l} will be done on a subgrid of the M_{1}×M_{2} grid, according to the intervals of \(\eta _{1}^{l}\) and \(\eta _{2}^{l}\). Also in this case, the benefits for the channel estimation process will be a lower probability of periodic sinc misplacement, as the search space is smaller.
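The reduction of the search grid under sectorization can be sketched as follows. The mapping η=(δ/λ)cos ϑ is taken from the single-path example of Fig. 1, while the function name, the half-wavelength spacing, and the restriction to angles in [0, π] are assumptions of this example.

```python
import math

def sector_indices(theta_min, theta_max, m, delta_over_lambda=0.5):
    """Grid indices (modulo m) that round(m*eta) can take for angles in
    [theta_min, theta_max], with 0 <= theta_min <= theta_max <= pi so that
    the cosine is monotone over the sector."""
    eta_lo = delta_over_lambda * math.cos(theta_max)
    eta_hi = delta_over_lambda * math.cos(theta_min)
    k_lo, k_hi = round(m * eta_lo), round(m * eta_hi)
    return sorted({k % m for k in range(k_lo, k_hi + 1)})

m = 32
full = sector_indices(0.0, math.pi, m)                       # no sectorization
sector_60 = sector_indices(math.pi / 3, 2 * math.pi / 3, m)  # 60 degree sector at broadside
```

For a 60° sector centered at broadside only 17 of the 32 grid positions along that dimension can be occupied, so the corresponding entries of v and columns of F are the only ones to keep. For tracking, the same computation would be repeated for a narrow interval around each previously estimated angle.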
Computational complexity
The computational complexity of the considered algorithms is evaluated in terms of complex multiplications (CMUX), complex additions (CADD), and comparisons (COM). Let \(M_{tot}=\prod _{d} M_{d}\), where the product is along all dimensions, depending on the use of either UPA or ULA. Hence, using the fast Fourier transform algorithm, a (I)DFT of M_{tot} samples requires M_{tot} log_{2} M_{tot} CMUX, M_{tot} log_{2} M_{tot} CADD, and no comparison. A summary of the computational complexity of the various algorithms is reported in Table 1, where q_{3} denotes the effective number of iterations over the variable p of the AGDAR and RCS algorithms. In the following section, we will also present numerical results for a complexity comparison among the various techniques.
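As a worked example of the (I)DFT cost stated above, the following sketch evaluates the operation counts for a given grid. The function name and the dictionary layout are choices made for this illustration.

```python
import math

def fft_cost(m_dims):
    """Operation counts of one (I)DFT over a grid with the given sizes,
    using the M_tot * log2(M_tot) figure for both CMUX and CADD."""
    m_tot = math.prod(m_dims)
    ops = m_tot * math.log2(m_tot)
    return {"CMUX": ops, "CADD": ops, "COM": 0}

ula_cost = fft_cost([32, 32])        # ULA grid with M_1 = M_2 = 32
upa_cost = fft_cost([8, 8, 8, 8])    # UPA grid with M_0 = M_1 = M_2 = M_3 = 8
```

A ULA grid with M_{1}=M_{2}=32 has M_{tot}=1024 samples and therefore costs 10240 CMUX and 10240 CADD per transform, while the UPA grid with all M_{d}=8 has M_{tot}=4096 samples.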
Numerical results
In this section, we compare the proposed channel estimation techniques by evaluating the MSE in decibels (dB)
where \(\mathbb E[\cdot ]\) denotes the expectation operator and \(\hat {\boldsymbol {H}}\) is the estimated channel matrix.
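The MSE metric can be computed for one channel realization as in the sketch below. The per-entry normalization is an assumption, chosen so that with unit average channel gain the LS estimate yields an MSE equal to the noise power; the expectation would be approximated by averaging the linear-scale error over many realizations before converting to dB.

```python
import math

def mse_db(H, H_hat):
    """Per-entry squared error between H and H_hat, in dB, for one realization."""
    errs = [abs(a - b) ** 2 for ra, rb in zip(H, H_hat) for a, b in zip(ra, rb)]
    return 10.0 * math.log10(sum(errs) / len(errs))

H_true = [[1.0 + 0j, 1.0 + 0j], [1.0 + 0j, 1.0 + 0j]]
H_est = [[1.1 + 0j, 1.1 + 0j], [1.1 + 0j, 1.1 + 0j]]    # every entry off by 0.1
value = mse_db(H_true, H_est)
```

An error of magnitude 0.1 on every entry corresponds to a squared error of 0.01 per entry, i.e., about −20 dB.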
We consider the urban macro cell (UMa), urban micro cell (UMi), rural macro cell (RMa), and Indoor Hotspot (InH) 3GPP channel models [3], with both line-of-sight (LOS) and non-line-of-sight (NLOS). In these scenarios, the number of clusters (typically from 4 to 20) depends on the channel model and the number of subpaths is 20 per cluster, thus totaling L in the range of tens to hundreds. Note that although the number of subpaths is large, only three or four subpaths have a notable power. Therefore, de facto, we recover the sparse channel model described in this paper and reported in much of the literature and in measurement campaigns. Channels are obtained for a downlink, where the base station (BS) and user equipment (UE) are on the same plane, with parameters defined in Table 2. The average channel gain is unitary, so we assume that transmit power has been adapted to compensate for the path loss; therefore, the average SNR is the reciprocal of the noise power. This also implies that the MSE for the LS estimate is simply the reciprocal of the SNR, which is then not reported in the figures.
For the proposed AGDAR and RCS algorithms, we use parameters as in Table 2. In the following, we will always consider the same antenna geometry (either ULA or UPA) at both BS and UE, with a different number of antennas at the two ends.
Parameters setting
We first evaluate the impact of the parameters on the channel estimate MSE. The performance of both AGDAR and RCS is determined by the parameter ρ that weights the ℓ_{1}-norm term in the objective function and should be chosen according to the channel sparsity and the operating signal-to-noise ratio (SNR). For each value of SNR, we have assessed the optimum value of ρ that minimizes the average MSE of the channel estimate. The results are reported in Fig. 2 for both UPAs (at both devices) with N_{r}=16=4×4 (D_{0}=D_{1}=4) and N_{t}=4=2×2 (D_{2}=D_{3}=2) and M_{0}=M_{1}=M_{2}=M_{3}=8, and ULAs (at both devices) with N_{r}=16, N_{t}=4 and M_{1}=M_{2}=32. In this case the channel model is UMi LOS, for which the number of clusters is random, between 3 and 12, while the number of subpaths per cluster is 20, for a maximum total of L=240 paths. We can see a smooth behavior of ρ with respect to the average SNR, which can be described with simple functions for its adaptation to operating conditions. Moreover, as the average SNR increases, the optimal ρ decreases, since the LS estimate is less noisy and only a limited sparsification of the channel is required. We have optimized the value of this parameter also for other conditions (e.g., different number of antennas), and forthcoming results are obtained with the optimized ρ.
A second relevant parameter for both AGDAR and RCS is the maximum number of iterations q_{1}. Figure 3 shows the average MSE as a function of the maximum number of allowed iterations q_{1} for ULAs with N_{r}=16, N_{t}=4, and various DFT sizes. We note that for all DFT sizes the MSE floors as q_{1} increases: in particular, with log_{10} q_{1}=2.5, all algorithms converge to the minimum MSE. We also observe that the required number of iterations grows with (M_{1},M_{2}), and RCS achieves a lower asymptotic MSE and converges faster.
For a comparison with SPC and JPC, we have also optimized parameter I, i.e., the number of detected paths. Note that the existing literature typically assumes the knowledge of L and sets I=L. We instead observe that I does not necessarily correspond to the number of paths, since weak paths can be neglected as they may be easily confused with noise artifacts. This is particularly true in the 3GPP channel model, where many clusters and subpaths are present, most of which however have a very limited power. Figure 4 shows the value of I that minimizes the average MSE versus the average SNR for various values of M=M_{1}=M_{2}, using ULAs with N_{r}=16, N_{t}=4 and SPC. The channel model is UMi LOS. We observe that we need a large value of I when the SNR is high, as the considered channel model has many (sub)paths, and at high SNRs, they can be distinguished from the noise. Also, note that the optimal value of I decreases as M increases: indeed, for higher M the approximation between positions Ω_{l} and \(\hat {\boldsymbol {\Omega }}_{l}\) becomes more accurate, thus fewer sinc functions (closer to the actual number of paths) are enough (and better) for channel estimation. The results reported in the rest of this section are obtained with the optimized value of I.
Lastly, we consider the optimization of the DFT size, i.e., the number of points used to approximate \(\eta _{d}^{l}\) in all considered channel estimation methods. In Appendix B, we provide the analytical derivation of the MSE as a function of M, for ULA and a Rayleigh fading channel with a small number of taps. Figure 5 shows the MSE as a function of M=M_{1}=M_{2} for ULA with N_{t}=8, N_{r}=2 and the channel model described in Appendix B. As expected, the MSE decreases for a higher value of M, as the quantization error is reduced, flooring for high values of M. We then have considered a ULA system with N_{r}=32 and N_{t}=4 and a DFT size multiple μ of the number of antennas, i.e., M_{1}=μN_{r} and M_{2}=μN_{t}. For a UMi LOS model, Fig. 6 shows the average MSE for an SNR of 0, 10, and 20 dB, and different channel estimation techniques. We observe that in all cases by increasing μ we obtain a better channel estimate, thanks to a better quantization precision of either the virtual channel domain (for SPC and JPC) or the values of \(\eta _{d}^{l}\) (for both AGDAR and RCS). Moreover, both AGDAR and RCS methods achieve a lower MSE than SPC and JPC techniques at both low and high SNRs, thanks to their better exploitation of the compact channel representation in the dual domain. RCS has an almost negligible performance improvement with respect to AGDAR, while both have a gain from 3 to 5 dB with respect to interference cancelation techniques. Note that the gain is more remarkable at a lower SNR, showing that the compressed-sensing techniques are able to better reject the noise. Lastly, note that JPC has an almost negligible improvement over SPC; thus, we can conclude that the detection of the peaks is already accurate when performed sequentially rather than in parallel. Overall, we conclude that μ=2 already provides close-to-optimal results for all methods. Similar observations can be drawn from Fig. 7, where we report the MSE for a UPA configuration with 2×2 antennas at the UE and 6×6 antennas at BS in a UMa LOS channel model.
In order to show the performance of the proposed solution in a scenario with a large number of antennas, Fig. 8 shows the MSE as a function of μ for ULA with N_{r}=128 and N_{t}=16 and the UMi LOS model. Also in this case, we can appreciate the advantage of all techniques with respect to the LS method, as we recall that for LS the MSE is the reciprocal of the SNR, thus 0 dB in correspondence of the dotted lines, −10 dB for the solid lines and −20 dB for the dashed lines. Indeed, a higher number of antennas with respect to Fig. 6 increases the gain of the other channel estimation techniques with respect to LS. About the comparison among the various methods, we can derive similar conclusions as those of Fig. 6, confirming also that other results are representative of a massive MIMO scenario.
Sectorization and channel tracking
As we already discussed, sectorization provides a faster and more accurate search of the channel paths. Here, we consider a system where channel angles are uniformly distributed in intervals of 6, 60, and 180 degrees. We have L=14 paths, with independent Gaussian-distributed amplitudes α_{l}: with this simple channel model, we better capture the effects of sectorization and channel tracking.
Figure 9 shows the resulting MSE as a function of M=M_{1}=M_{2} for the various systems, when the average SNR is 10 dB. We observe that sectorization indeed reduces the MSE of all channel estimates, and sectors of 6° provide an MSE that is 10 to 16 dB smaller than that of 180° sectors. Comparing the various techniques, we observe that with large (180°) and small (6°) sectors all techniques take advantage of the sectorization in a similar way, while for intermediate values (60°) the compressed-sensing methods have higher gains than the interference cancelation methods.
We also consider channel tracking for time-invariant channels: after an initial channel estimation performed with each of the considered techniques and an angle span of 360°, the estimators are run again using an angle span of 6° around the angles of each path. Figure 10 shows the MSE of the channel estimates at an average SNR of 10 dB, and for various values of M_{1} and M_{2}. We observe that, thanks to the search over a smaller angle span, both SPC and JPC achieve performance similar to that of the proposed approaches, a phenomenon that we already observed with sectorization. Also, the difference between RCS and AGDAR is further reduced, again because channel estimation is an easier task in this case. We still note, however, a high sensitivity to the DFT size, which determines the accuracy of the estimates of the angles of arrival and departure. Lastly, both sectorization and channel tracking reduce the complexity of the proposed solutions, as path search operations can be performed on a reduced space.
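The reduction of the search space with sectorization or tracking can be sketched as a selection of DFT grid points. The example assumes a half-wavelength ULA with η=(1/2)cos ϑ and a sector that lies within [0°,180°], so that the cosine is monotone; the function name and the way the sector is specified are hypothetical.

```python
import numpy as np

def sector_indices(m_points, theta_lo_deg, theta_hi_deg):
    """Indices of the M-point DFT grid whose spatial frequency lies in the sector.

    Assumes 0 <= theta_lo_deg <= theta_hi_deg <= 180 and eta = 0.5 * cos(theta).
    """
    k = np.arange(m_points)
    eta = (k / m_points + 0.5) % 1.0 - 0.5          # wrap grid to [-1/2, 1/2)
    eta_hi = 0.5 * np.cos(np.deg2rad(theta_lo_deg))  # cos is decreasing on [0, 180]
    eta_lo = 0.5 * np.cos(np.deg2rad(theta_hi_deg))
    return k[(eta >= eta_lo) & (eta <= eta_hi)]

full = sector_indices(64, 0.0, 180.0)     # no sectorization: all 64 grid points
narrow = sector_indices(64, 87.0, 93.0)   # 6-degree span around broadside
```

With M=64, the 6° span around broadside keeps only three grid points, which is why both the path search and the dictionary used by the estimators shrink considerably. For tracking, the same selection would be repeated around the previously estimated angle of each path.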
3GPP channel scenarios comparison
Until now, we have considered only the UMi LOS channel model; in this section, we also consider the other 3GPP channel models. Figure 11 shows the average MSE for various algorithms and UPAs with N_{t}=8×8 at the BS and N_{r}=2×2 at the UE, and M_{0}=M_{1}=10 and M_{2}=M_{3}=4. We compare various channel estimation techniques for an average SNR of 10 dB. At this intermediate SNR value, we observe that the proposed AGDAR and RCS significantly outperform both SPC and JPC for all the considered channel models, by 6 to 7 dB. The indoor-office model (InH Mixed) has a few significant taps with reduced dispersion, a favorable condition for SPC and JPC, which exhibit a reduced gap with respect to the proposed (still better performing) techniques. On the other hand, dispersive channels with many low-power taps (UMi Street Canyon NLOS) make channel estimation more problematic for SPC and JPC, while they can be handled very efficiently by the compressed-sensing techniques, thanks to their ability to better distinguish between noise and channel components. This provides a gain of 7 dB of AGDAR over SPC. We also note that AGDAR and RCS have comparable performance across all channel models. Similar results are obtained at low and high SNR (results are not reported here for the sake of conciseness).
Complexity comparison
In order to assess the complexity of the various channel estimation methods, we first report in Fig. 12 the effective number of iterations q_{3} of AGDAR as a function of the maximum allowed number of iterations q_{1}, for ULAs with N_{r}=32, N_{t}=4, and an average SNR of 10 dB on a UMi LOS channel model. As expected, when the number of allowed iterations increases, the number of effective iterations also increases, until it reaches a floor. Moreover, higher values of M_{1} and M_{2} require a higher number of iterations. We also observe that RCS requires fewer iterations (while achieving a better performance in terms of MSE), as the reweighting speeds up the convergence.
In Fig. 13, we compare the complexity of the various scenarios, by considering the number of complex multiplications, as derived in Section 4.4, as a function of the number of canceled paths I. Parameters are those of Fig. 3, so that we can compare the achieved MSE of the various schemes. We observe that the number of multiplications grows exponentially for the SPC and JPC techniques. We also report the number of complex multiplications for the AGDAR and RCS methods that do not depend on I. We note that the AGDAR has a remarkably lower complexity than other methods. However, we also notice that for M_{1}=64 and M_{2}=8, RCS has a significantly higher complexity with respect to the other methods. When comparing the MSE performance (Fig. 3), we see that AGDAR achieves a much lower MSE than SPC and JPC methods for a lower complexity (in terms of CMUX and CADD).
Conclusions
In this paper, we have proposed channel estimation techniques for mmWave massive MIMO systems, based on a CS approach, where we have exploited the sparse nature of the channel, considering in particular the small number of channel paths at those frequencies. Efficient innovative solutions based on the adaptive restart of the Nesterov accelerated gradient algorithm have been explored. Numerical results have shown the superiority of the proposed approaches with respect to existing procedures, with similar or lower computational complexity. We have also considered the effects of sectorization and proposed a channel tracking technique that exploits slow channel variations.
Appendix A: CS channel estimation for UPA
For a UPA system, we denote as D_{2} and D_{3} the numbers of transmit antennas along the y- and z-axes, respectively; therefore, N_{t}=D_{2}×D_{3}. Similarly, D_{0} and D_{1} are the numbers of antennas at the receiver along the two axes; therefore, N_{r}=D_{0}×D_{1}. The channel matrix has entries
The channel H^{UPA} is transformed into the channel column vector \( {\boldsymbol {h}^{\text {UPA}}}=\text {vec}\left (\boldsymbol {H}^{\text {UPA}}\right) \in \mathbb {C}^{N_{r}N_{\mathrm {t}} \times 1}\) with entries
We also define the 4D-DFT matrix as \(\boldsymbol {F}^{\text {4D}} \in \mathbb {C}^{N_{r}N_{\mathrm {t}} \times (M_{0} M_{1} M_{2}M_{3})}\) with entries
Lastly, we define the column vector v^{UPA} of length M_{0}M_{1}M_{2}M_{3} with L nonzero entries, namely
where
From (30), we can approximate the channel as
Similarly, we can define \(\boldsymbol {h}^{'\text {UPA}}\) and \(\boldsymbol {v}^{'\text {UPA}}\) for the LS estimate of the channel and its dual representation. The AGDAR and RCS algorithms for UPA can be obtained as described in Section 4, with F, h^{′}, and v^{′} replaced by F^{4D}, \(\boldsymbol {h}^{'\text {UPA}}\), and \(\boldsymbol {v}^{'\text {UPA}}\), respectively.
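As an illustration of how a 4D dictionary and an accelerated gradient method with adaptive restart fit together, the following sketch builds a Kronecker-structured dictionary and solves an ℓ_{1}-regularized least-squares problem. It is not the algorithm of Section 4: the ordering of the Kronecker factors, the sign and normalization of the DFT entries, the regularization weight `lam`, and the gradient-based restart test (in the spirit of [27]) are all assumptions, and every function name is hypothetical.

```python
import numpy as np
from functools import reduce

def dft_dictionary(n_ant, m_points):
    """n_ant x m_points oversampled DFT matrix (sign/normalization assumed)."""
    n = np.arange(n_ant)[:, None]
    k = np.arange(m_points)[None, :]
    return np.exp(2j * np.pi * n * k / m_points) / np.sqrt(n_ant)

def kron_dictionary(dims, grids):
    """Kronecker product of per-axis dictionaries (one plausible ordering)."""
    return reduce(np.kron, [dft_dictionary(d, m) for d, m in zip(dims, grids)])

def soft_threshold(x, tau):
    """Complex soft-thresholding: proximal operator of tau * ||x||_1."""
    mag = np.abs(x)
    return x * np.maximum(1.0 - tau / np.maximum(mag, 1e-12), 0.0)

def objective(F, h, v, lam):
    return 0.5 * np.linalg.norm(h - F @ v) ** 2 + lam * np.sum(np.abs(v))

def restarted_fista(F, h, lam, n_iter=500):
    """Accelerated proximal gradient with gradient-based adaptive restart."""
    step = 1.0 / np.linalg.norm(F, 2) ** 2      # 1 / Lipschitz constant
    v = np.zeros(F.shape[1], dtype=complex)
    y, t = v.copy(), 1.0
    for _ in range(n_iter):
        grad = F.conj().T @ (F @ y - h)
        v_new = soft_threshold(y - step * grad, step * lam)
        if np.real(np.vdot(y - v_new, v_new - v)) > 0:
            y, t = v_new.copy(), 1.0            # restart: drop the momentum
        else:
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            y = v_new + ((t - 1.0) / t_new) * (v_new - v)
            t = t_new
        v = v_new
    return v

# Toy UPA: D_0 = D_1 = D_2 = D_3 = 2 and M_0 = ... = M_3 = 4 (illustrative sizes).
F4d = kron_dictionary((2, 2, 2, 2), (4, 4, 4, 4))
v_true = np.zeros(F4d.shape[1], dtype=complex)
v_true[5], v_true[130] = 1.0, 0.7j              # two paths on the grid
rng = np.random.default_rng(0)
noise = 0.01 * (rng.standard_normal(16) + 1j * rng.standard_normal(16))
h_ls = F4d @ v_true + noise                     # stand-in for the LS estimate
v_hat = restarted_fista(F4d, h_ls, lam=0.01)
```

The Kronecker form makes the size relation explicit: the dictionary has N_{r}N_{t}=16 rows and M_{0}M_{1}M_{2}M_{3}=256 columns in this toy case. For realistic array sizes, the explicit matrix would normally be replaced by multidimensional FFTs.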
Appendix B: On the choice of M _{d}
The choice of the number of DFT points per dimension M_{d} is important to determine the performance of the channel estimation algorithm. From (34), we have that the AGDAR solution approximates \(\eta _{d}^{l}\) with a quantized value taken over M_{d} possible points. Therefore, we can write the quantization error on the lth path as
Focusing now on a scenario wherein ULAs are used at both transmitter and receiver, and assuming that all other estimates (i.e., the amplitude and angle estimates of each path) are correct and the only imperfection is the quantization error, from (36), the estimated channel can be written as (compare with (1))
and using the definition of H of (1), we have that the MSE of the channel estimate is
Now, assuming that the amplitudes and angles are independent random variables, we have
where \(\sigma _{\alpha }^{2}(l) = {\mathbb E}[\alpha _{l}^{2}]\). Expanding the expectation we have
Let us assume that arrival and departure angles (\(\vartheta ^{r}_{l}\) and \(\vartheta ^{t}_{l}\)) are uniformly distributed in the interval [0,2π) and \(\eta _{1}^{l}={\frac {\delta }{\lambda }\cos \vartheta ^{r}_{l}}, \eta _{2}^{l}={\frac {\delta }{\lambda }\cos \vartheta ^{t}_{l}}\). Then, the probability density function of \(\eta _{1}^{l}\) and \(\eta _{2}^{l}\) for \(\delta =\frac {\lambda }{2}\) becomes
and (40) becomes
This MSE provides a guideline for the choice of M_{d}: we must have at least γ^{(q)}<σ^{2}, so that quantization does not introduce more error power than the noise already present.
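For reference, the density of \(\eta _{1}^{l}\) and \(\eta _{2}^{l}\) used in this appendix can be recovered by a change of variables. The following is a sketch under the stated assumptions only (angles uniform in [0,2π) and \(\delta =\frac {\lambda }{2}\)); the symbol \(p_{\eta }\) is introduced here for illustration and the notation may differ from that of the original expression.

```latex
% \vartheta uniform on [0, 2\pi), \eta = \tfrac{1}{2}\cos\vartheta
\begin{align}
\Pr\left[\cos\vartheta \le x\right] &= 1 - \frac{\arccos x}{\pi},
  & -1 \le x \le 1, \\
p_{\cos\vartheta}(x) &= \frac{1}{\pi\sqrt{1-x^{2}}},
  & |x| < 1, \\
p_{\eta}(a) &= 2\, p_{\cos\vartheta}(2a) = \frac{2}{\pi\sqrt{1-4a^{2}}},
  & |a| < \tfrac{1}{2}.
\end{align}
```

The density integrates to one over (−1/2, 1/2) and concentrates near ±1/2, i.e., near the end-fire directions of the array.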
Notes
 1.
Note that other configurations (e.g., UPA on one side and ULA on the other side) can be obtained with similar derivations.
 2.
In general, the number of symbols can be larger than N_{t}, but we consider here this simpler case for the sake of conciseness.
Abbreviations
 3GPP:

Third-generation partnership project
 AGDAR:

Accelerated gradient descent with adaptive restart
 AWGN:

Additive white Gaussian noise
 BPDN:

Basis pursuit denoise
 BS:

Base station
 CBP:

Continuous basis pursuit
 CS:

Compressed sensing
 DFT:

Discrete Fourier transform
 FISTA:

Fast iterative shrinkage-thresholding algorithm
 IDFT:

Inverse discrete Fourier transform
 InH:

Indoor Hotspot
 ISTA:

Iterative shrinkage-thresholding algorithm
 JPC:

Joint peak cancelation
 LASSO:

Least absolute shrinkage and selection operator
 LOS:

Line-of-sight
 LS:

Least square
 MIMO:

Multiple-input multiple-output
 ML:

Maximum likelihood
 MSE:

Mean square error
 NLOS:

Non-line-of-sight
 OMP:

Orthogonal matching pursuit
 RCS:

Reweighted compressed sensing
 RF:

Radio frequency
 RMa:

Rural macro cell
 SNR:

Signal-to-noise ratio
 SPC:

Single peak cancelation
 SSAMP:

Sparsity adaptive matching pursuit
 UE:

User equipment
 ULA:

Uniform linear array
 UMa:

Urban macro cell
 UMi:

Urban micro cell
 UPA:

Uniform planar array
References
 1
T. S. Rappaport, J. N. Murdock, F. Gutierrez, State of the art in 60-GHz integrated circuits and systems for wireless communications. Proc. IEEE. 99(8), 1390–1436 (2011).
 2
S. Rangan, T. S. Rappaport, E. Erkip, Millimeter-wave cellular wireless networks: Potentials and challenges. Proc. IEEE. 102(3), 366–385 (2014).
 3
Technical report, 5G; study on channel model for frequencies from 0.5 to 100 GHz (3GPP TR 38.901 version 14.3.0 release 14) (2018).
 4
R. W. Heath, N. González-Prelcic, S. Rangan, W. Roh, A. M. Sayeed, An overview of signal processing techniques for millimeter wave MIMO systems. IEEE J. Sel. Top. Sig. Proc. 10(3), 436–453 (2016). https://doi.org/10.1109/JSTSP.2016.2523924.
 5
A. Alkhateeb, O. E. Ayach, G. Leus, R. W. Heath, Channel estimation and hybrid precoding for millimeter wave cellular systems. IEEE J. Sel. Top. Sig. Proc. 8(5), 831–846 (2014). https://doi.org/10.1109/JSTSP.2014.2334278.
 6
S. Montagner, N. Benvenuto, S. Tomasin, in Proc 2015 IEEE Int. Conf. on Communication Workshop (ICCW). Taming the complexity of mmwave massive MIMO systems: Efficient channel estimation and beamforming, (2015), pp. 1251–1256. https://doi.org/10.1109/ICCW.2015.7247349.
 7
D. De Donno, J. P. Beltrán, D. Giustiniano, J. Widmer, in 2016 IEEE International Conference on Communications Workshops (ICC), Kuala Lumpur. Hybrid analog-digital beam training for mmWave systems with low-resolution RF phase shifters, (2016), pp. 700–705. https://doi.org/10.1109/ICCW.2016.7503869.
 8
J. Lee, G. T. Gil, Y. H. Lee, Channel estimation via orthogonal matching pursuit for hybrid MIMO systems in millimeter wave communications. IEEE Trans. Commun. 64(6), 2370–2386 (2016). https://doi.org/10.1109/TCOMM.2016.2557791.
 9
J. Palacios, D. De Donno, D. Giustiniano, J. Widmer, in 2016 IEEE 27th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), Valencia. Speeding up mmWave beam training through low-complexity hybrid transceivers, (2016), pp. 1–7. https://doi.org/10.1109/PIMRC.2016.7794709.
 10
S. Sun, T. S. Rappaport, in 2017 IEEE International Conference on Communications Workshops (ICC Workshops), Paris. Millimeter Wave MIMO channel estimation based on adaptive compressed sensing, (2017), pp. 47–53. https://doi.org/10.1109/ICCW.2017.7962632.
 11
X. Li, J. Fang, H. Li, P. Wang, Millimeter wave channel estimation via exploiting joint sparse and low-rank structures. IEEE Trans. Wirel. Commun. 17(2), 1123–1133 (2018). https://doi.org/10.1109/TWC.2017.2776108.
 12
G. Destino, M. Juntti, S. Nagaraj, in Proc 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP). Leveraging sparsity into massive MIMO channel estimation with the adaptive-LASSO, (2015), pp. 166–170. https://doi.org/10.1109/GlobalSIP.2015.7418178.
 13
Z. Marzi, D. Ramasamy, U. Madhow, Compressive channel estimation and tracking for large arrays in mmwave picocells. IEEE J. Sel. Top. Sig. Proc. 10(3), 514–527 (2016). https://doi.org/10.1109/JSTSP.2016.2520899.
 14
S. Malla, G. Abreu, in Proc 2016 Int. Symposium on Wireless Comm. Systems (ISWCS). Channel estimation in millimeter wave MIMO systems: Sparsity enhancement via reweighting, (2016), pp. 230–234. https://doi.org/10.1109/ISWCS.2016.7600906.
 15
Z. Gao, L. Dai, Z. Wang, in Proc 2016 IEEE Int. Conf. on Commun. (ICC). Channel estimation for mmwave massive MIMO based access and backhaul in ultradense network, (2016), pp. 1–6. https://doi.org/10.1109/ICC.2016.7511578.
 16
M. Kokshoorn, H. Chen, Y. Li, B. Vucetic, Beam-on-graph: Simultaneous channel estimation for mmwave MIMO systems with multiple users. IEEE Trans. Commun. PP(99), 1–1 (2018). https://doi.org/10.1109/TCOMM.2018.2791540.
 17
Y. Nesterov, Gradient methods for minimizing composite objective function. Math. Program. Ser. B. 140:, 125–161 (2007).
 18
E. J. Candès, M. B. Wakin, S. P. Boyd, Enhancing sparsity by reweighted ℓ_{1} minimization. J. Fourier Anal. Appl. 14(5), 877–905 (2008). https://doi.org/10.1007/s000410089045x.
 19
A. M. Sayeed, Deconstructing multiantenna fading channels. IEEE Trans. Sig. Process. 50(10), 2563–2579 (2002). https://doi.org/10.1109/TSP.2002.803324.
 20
J. Mo, P. Schniter, N. G. Prelcic, R. W. Heath, in Proc 2014 48th Asilomar Conference on Signals, Systems and Computers. Channel estimation in millimeter wave MIMO systems with onebit quantization, (2014), pp. 957–961. https://doi.org/10.1109/ACSSC.2014.7094595.
 21
C. A. Balanis, Antenna Theory: Analysis and Design (Wiley-Interscience, Hoboken, New Jersey, 2005).
 22
Y. S. Cho, J. Kim, W. Y. Yang, C. G. Kang, MIMO-OFDM Wireless Communications with MATLAB (Wiley, Hoboken, New Jersey, 2010).
 23
E. Björnson, E. G. Larsson, T. L. Marzetta, Massive MIMO: ten myths and one critical question. IEEE Commun. Mag. 54(2), 114–123 (2016). https://doi.org/10.1109/MCOM.2016.7402270.
 24
D. Tse, P. Viswanath, Fundamentals of Wireless Communication (Cambridge University Press, New York, 2005).
 25
S. Boyd, L. Vandenberghe, Convex Optimization (Cambridge University Press, Cambridge, 2004).
 26
M. A. T. Figueiredo, R. D. Nowak, S. J. Wright, Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Sig. Proc. 1(4), 586–597 (2007). https://doi.org/10.1109/JSTSP.2007.910281.
 27
B. O’Donoghue, E. Candès, Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715–732 (2015). https://doi.org/10.1007/s1020801391503.
 28
N. Parikh, S. Boyd, Proximal algorithms. Found. Trends Optim. 1(3), 127–239 (2014). https://doi.org/10.1561/2400000003.
 29
A. Chambolle, R. A. DeVore, N. Y. Lee, B. J. Lucier, Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage. IEEE Trans. Image Process. 7(3), 319–335 (1998). https://doi.org/10.1109/83.661182.
 30
Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k^{2}). Sov. Math. Dokl. 27:, 372–376 (1983).
Acknowledgements
No acknowledgements.
Funding
This work has been supported by Huawei Technology, Italy.
Availability of data and materials
No data is available.
Author information
Contributions
The main contributions of this paper are as follows: the proposal of two new algorithms for channel estimation in mmWave systems and the performance evaluation of the proposed algorithms in a 5G scenario. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Soleimani, H., De Donno, D. & Tomasin, S. mmWave channel estimation with accelerated gradient descent algorithms. J Wireless Com Network 2018, 272 (2018). https://doi.org/10.1186/s1363801812823
Keywords
 Compressed sensing
 Estimation
 mmWave