
Multi-user hybrid precoding for mmWave massive MIMO systems with sub-connected structure


Hybrid precoding achieves a compromise between the sum rate and hardware complexity of millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems. However, most prior works on multi-user hybrid precoding only consider the full-connected structure. In this paper, a novel multi-user hybrid precoding algorithm is proposed for the sub-connected structure. Based on an improved successive interference cancellation (SIC) method, the analog precoding matrix optimization problem is decomposed into multiple analog precoding sub-matrix optimization problems. A near-optimal analog precoder is then designed by factorizing the precoding sub-matrix for each sub-array. The digital precoder is designed using the block diagonalization (BD) technique. Finally, the water-filling power allocation method is used to further improve the communication quality. Extensive simulation results demonstrate that the proposed algorithm achieves a higher sum rate than existing hybrid precoding methods with the sub-connected structure and higher energy efficiency than existing approaches. Moreover, its sum rate is close to that of the state-of-the-art optimization approach with the full-connected structure. The simulation results also verify the effectiveness of the proposed hybrid precoding design for the uniform planar array (UPA).


The high-speed and low-latency characteristics of the fifth generation (5G) are its biggest differences from previous communication systems. Many emerging technologies, such as physical layer technology and network densification technology [1,2,3,4], have contributed to the progress made so far. However, a key problem in the development of communication systems today is the shortage of spectrum. Millimeter-wave (mmWave) communication provides new spectrum resources for wireless communication systems and satisfies the bandwidth requirements of 5G services [5, 6]. Because of its shorter wavelengths, mmWave large-scale multiple-input multiple-output (MIMO) technology can pack large-scale antenna arrays into a small size. Hybrid beamforming technology improves link reliability by compensating for the severe mmWave path loss.

Traditional analog beamforming has also been considered in mmWave systems. The idea is to use low-cost phase shifters (PSs) to control the phase of the signal transmitted by each antenna [7,8,9]. The disadvantage is that it cannot transmit parallel data streams to provide multiplexing gain. Traditional full-digital beamforming, in contrast, achieves the best multiplexing gain [10, 11], but each antenna requires a dedicated radio frequency (RF) chain, which is expensive and consumes a lot of power. Therefore, a hybrid precoding structure [12,13,14,15] that saves RF resources while ensuring good performance is extremely important.

For hybrid precoding schemes in single-user MIMO (SU-MIMO) systems, [16] and [17] give solutions from different perspectives. Based on compressed sensing, [16] solves the problem in [18] by alternately iterating between a locally optimal analog precoder and a baseband digital precoder. In contrast, [17] decomposes the matrix optimization problem into multiple optimization sub-problems by using the iterative algorithm in [19]. In addition, [20] shows that if the number of RF chains is greater than or equal to twice the number of data streams, hybrid beamforming can achieve the same performance as full-digital beamforming.

Inspired by [16], the work [21] proposes an orthogonal matching pursuit (OMP) algorithm for multi-user MIMO (MU-MIMO) systems. A hybrid block diagonalization (Hy-BD) algorithm, in which the analog precoder is designed by exhaustive search and equal gain transmission (EGT), is proposed in [22]. Similarly, two hybrid BD algorithms that maximize the analog beamforming gain by iteratively updating the analog precoder and combiner are proposed in [9, 10]. Moreover, a series of hybrid zero-forcing (Hy-ZF) and hybrid minimum-mean-squared-error (Hy-MMSE) schemes are proposed in [24,25,26]. However, the works [21,22,23,24,25,26,27,28] focus on hybrid beamforming based on the full-connected structure, which consumes a lot of power and is not efficient to implement.

In the full-connected structure, each antenna element is connected to all RF chains. In the sub-connected structure, each RF chain is connected to only a subset of the transmitting antennas, which can improve the system’s energy efficiency. For the sub-connected structure, different solutions are given in [20, 23, 24, 27, 29,30,31,32,33,34]. However, the hybrid beamforming schemes in [23, 29,30,31] are designed for MU-MISO systems with single-antenna receivers, and only the scheme in [9] is designed for MU-MIMO systems. The works [33, 34] decompose the total achievable rate optimization problem with non-convex constraints into a series of simple sub-rate optimization problems for the sub-connected structure, but they are not directly applicable to mmWave massive MU-MIMO systems. Although hybrid beamforming has been studied extensively, there is still much room for improvement in sub-connected hybrid precoding design, especially for MU-MIMO systems.

In this paper, we focus on the sub-connected structure design of hybrid precoding in mmWave massive MU-MIMO systems, where a single base station (BS) with multiple sub-arrays serves several multi-antenna users simultaneously. Assuming that perfect channel state information (CSI) is available at both the BS and the users, we propose a near-optimal hybrid precoding scheme by jointly designing the analog and digital beamformer/combiner. The contributions of this work are summarized as follows:

  1. The proposed hybrid precoding scheme targets the sub-connected structure in mmWave massive MIMO systems, which has lower hardware complexity than the full-connected structure. To reduce the computational complexity, we reformulate the original optimization problem as two mmWave sum-rate maximization subproblems according to the idea of hierarchical optimization.

  2. To solve the sum-rate maximization problem, we propose an improved successive interference cancellation (SIC) method, which designs the analog precoder while trying to avoid the loss of information at each stage. The baseband BD scheme and the water-filling power allocation method are then used to obtain the digital precoding matrix and the power allocation matrix, respectively. The proposed algorithm yields a closed-form solution, and its result is stable.

  3. The theoretical analysis and simulation results of the proposed hybrid precoding scheme are given in detail. We study the influence of various parameters on the performance of our algorithm. Simulation results show that the proposed algorithm achieves a higher sum rate than the existing hybrid precoding approaches under the sub-connected structure, and is close to the state-of-the-art optimization approach under the full-connected structure. Furthermore, the proposed algorithm has higher power efficiency than the optimization design algorithm under the full-connected structure.

Notation: In this paper, bold upper-case and lower-case letters denote matrices and vectors, respectively. \(E[ \cdot ]\) represents the expectation. \({( \cdot )^T}\), \({( \cdot )^{ - 1}}\), \({( \cdot )^H}\) and \({\left\| \cdot \right\| _F}\) denote the transpose, inverse, conjugate transpose, and Frobenius norm of a matrix, respectively. \({{\mathbf{I}}_N}\) is the \(N \times N\) identity matrix and \({{\mathbf{0}}_{M \times N}}\) is the \(M \times N\) all-zero matrix. \({{\mathbb{C}}^{{m} \times n}}\) represents the space of \(m \times n\) complex matrices. Finally, \(\angle {\mathbf{X}}\) denotes a matrix having elements of the form \({e^{j{\varphi_{i,j}}}}\), where \({\varphi_{i,j}}\) is the phase of the \((i,j)\)th element of \({\mathbf{X}}\).


In this paper, we first introduce the existing hybrid precoding methods for mmWave massive MIMO systems. They are almost all based on a full-connected structure and only consider the case of the uniform linear array (ULA). The research background and related methods are presented in Sect. 1. There are many factors that affect communication and rate performance. This paper considers improving system performance from the perspective of algorithm improvement and structure selection.

On the one hand, hybrid precoding can effectively improve the sum rate performance of the system. On the other hand, with the rapid development of mmWave communication, it also addresses the high energy consumption of traditional precoding. Hybrid precoding can be implemented with a full-connected structure or a sub-connected structure. Compared with the full-connected structure, the sub-connected structure links each RF chain to an antenna subset, which greatly reduces the RF hardware required, is more practical, and makes the hybrid precoding design more energy-efficient. Compared with the ULA, the uniform planar array (UPA) can use fewer array elements to achieve higher space utilization, which can reduce system cost.

The goal of this paper is to maximize the sum rate of the system by designing a hybrid precoding scheme for multiple users. Under the power constraint of the BS, the problem is solved in two steps: analog precoding and digital precoding. Since the CSI of all users is fully available at the BS, we propose a new analog precoding scheme inspired by SIC and based on the sub-connected structure considered here. The optimization order is selected according to the differences between the sub-channels, and an approximately optimal analog precoder is then obtained by successively optimizing each sub-matrix. For the digital precoder, BD is applied to the equivalent channels to eliminate the inter-user interference. Finally, the water-filling method is used to achieve better power allocation.

To verify the effectiveness of the algorithm, we conduct a variety of experiments and compare the results. First, we introduce several advanced hybrid precoding schemes. Then, the complexity is calculated, and the superiority of the proposed algorithm is demonstrated in simulation. Finally, we compare the proposed algorithm with other algorithms in the same environment. The specific analysis can be found in Sect. 5.

System model and problem formulation

In this paper, we consider a sub-connected structure for hybrid precoding in mmWave massive MU-MIMO systems, as shown in Fig. 1. The BS is equipped with \({N_t}\) antennas and N independent RF chains. Each RF chain is connected to one sub-array, and each sub-array includes M antennas, so that \(NM = {N_t}\). The BS communicates with K users. Each user is equipped with \({N_r}\) antennas to support \({N_s}\) (\({N_s} \ge 1\)) data streams, which means that a total of \(K{N_s}\) data streams are processed by the BS.

Fig. 1

Sub-connected structure in mmWave massive MU-MIMO systems. The figure illustrates the application of a mmWave communication system equipped with hybrid analog/digital precoding in this paper

At the BS, the signals are first processed by a power allocation matrix \({\mathbf{P}} \in {{\mathbb{C}}^{K{N_s} \times K{N_s}}}\), then by the baseband digital precoder \({{\mathbf{F}}_{\mathrm{BB}}} \in {{\mathbb{C}}^{N \times K{N_s}}}\), and then by the analog RF precoder \({{\mathbf{F}}_{\mathrm{RF}}} \in {{\mathbb{C}}^{{N_t} \times N}}\). Finally, the precoded signal is sent over the wireless channel. Note that the baseband precoder \({{\mathbf{F}}_{\mathrm{BB}}}\) can modify both amplitude and phase, whereas \({{\mathbf{F}}_{\mathrm{RF}}}\) can only change the phase (phase-only control) because it is implemented with analog phase shifters. Each nonzero entry of \({{\mathbf{F}}_{\mathrm{RF}}}\) is normalized to satisfy \({\left| {{\mathbf{F}}_{\mathrm{RF}}^{i,j}} \right| ^2} = \frac{1}{{{N_t}}}\). Moreover, to satisfy the total transmit power constraint, \({{\mathbf{F}}_{\mathrm{BB}}}\) is normalized to satisfy \(\left\| {{{\mathbf{F}}_{\mathrm{RF}}}{{\mathbf{F}}_{\mathrm{BB}}}} \right\| _F^2 = K{N_s}\). The structure of \({{\mathbf{F}}_{\mathrm{RF}}} \in {{\mathbb{C}}^{{N_t} \times N}}\) is given as

$$\begin{aligned} {{\mathbf{F}}_{\mathrm{RF}}} = {{\mathrm{blk(}}}{{{{\bar{\mathbf{a }}}}}_1},{{{{\bar{\mathbf{a }}}}}_2}, \ldots ,{{{{\bar{\mathbf{a }}}}}_N}{{\mathrm{)}}}, \end{aligned}$$

where \({{{{\bar{\mathbf{a }}}}}_n} \in {{\mathbb{C}}^{{M} \times 1}}, n \in \{ 1,2, \ldots ,N\}\).
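The block-diagonal structure and the two normalizations described above can be illustrated with a minimal NumPy sketch. The helper names `build_f_rf` and `normalize_f_bb`, the array sizes, and the random phases are assumptions chosen for illustration only; they are not taken from the paper.

```python
import numpy as np

def build_f_rf(sub_vectors):
    """Place N length-M vectors on the block diagonal of an (N*M) x N matrix."""
    n_sub = len(sub_vectors)
    m = sub_vectors[0].shape[0]
    f_rf = np.zeros((n_sub * m, n_sub), dtype=complex)
    for n, a_n in enumerate(sub_vectors):
        f_rf[n * m:(n + 1) * m, n] = a_n  # sub-array n feeds only RF chain n
    return f_rf

def normalize_f_bb(f_rf, f_bb, n_streams_total):
    """Scale F_BB so that ||F_RF F_BB||_F^2 equals K * N_s."""
    scale = np.sqrt(n_streams_total) / np.linalg.norm(f_rf @ f_bb, "fro")
    return scale * f_bb

rng = np.random.default_rng(0)
N, M, K, Ns = 4, 8, 2, 2  # illustrative sizes (assumed)
Nt = N * M

# Constant-modulus nonzero entries with |entry|^2 = 1/Nt, as stated in the text
phases = rng.uniform(0.0, 2.0 * np.pi, size=(N, M))
sub_vectors = [np.exp(1j * phases[n]) / np.sqrt(Nt) for n in range(N)]
F_RF = build_f_rf(sub_vectors)

# Random digital precoder (placeholder), rescaled to meet the power constraint
F_BB = rng.standard_normal((N, K * Ns)) + 1j * rng.standard_normal((N, K * Ns))
F_BB = normalize_f_bb(F_RF, F_BB, K * Ns)
```

Each column of `F_RF` has exactly M nonzero entries, which is what distinguishes the sub-connected structure from the full-connected one.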

Therefore, the received signal vector \({{{{\hat{\mathbf{y }}}}}_k} \in {{\mathbb{C}}^{{N_r} \times 1}}\) at the kth user can be written as

$$\begin{aligned} {{{{\hat{\mathbf{y }}}}}_k} = {{\mathbf{H}}_k}{{\mathbf{F}}_{\mathrm{RF}}}{\mathbf{F}}_{\mathrm{BB}}^k{{\mathbf{s}}_k} + \sum \limits _{i = 1,i \ne k}^K {{{\mathbf{H}}_k}{{\mathbf{F}}_{\mathrm{RF}}}{\mathbf{F}}_{\mathrm{BB}}^i} {{\mathbf{s}}_i} + {{\mathbf{n}}_k}, \end{aligned}$$

where \({{\mathbf{s}}_k} \in {{\mathbb{C}}^{{N_s} \times 1}},k \in \{ 1,2, \ldots ,K\}\) denotes the signal vector of the \({N_s}\) data streams of the kth user. \({\mathbf{F}}_{\mathrm{BB}}^k\) consists of the \(((k - 1){N_s} + 1)\)-th to the \(k{N_s}\)-th columns of \({{\mathbf{F}}_{\mathrm{BB}}}\), corresponding to the precoding for \({{\mathbf{s}}_k}\). \({\mathbf{s}} = {\left[ {{\mathbf{s}}_1^T,{\mathbf{s}}_2^T, \ldots ,{\mathbf{s}}_K^T} \right] ^T} \in {{\mathbb{C}}^{K{N_s} \times 1}}\) represents the total vector of transmitted signals of the K users and is assumed to satisfy \({\mathrm{E}}\left[ {{\mathbf{s}}{{\mathbf{s}}^H}} \right] = \frac{1}{{K{N_s}}}{{\mathbf{I}}_{K{N_s}}}\). \({{\mathbf{H}}_k} \in {{\mathbb{C}}^{{N_r} \times {N_t}}}\) denotes the channel matrix between the BS and the kth user, based on the Saleh–Valenzuela model. \({{\mathbf{n}}_k} \in {{\mathbb{C}}^{{N_r} \times 1}}\) is an additive white Gaussian noise vector with independent and identically distributed (i.i.d.) entries.

When Gaussian symbols are used by the BS, the achievable sum rate is given by

$$\begin{aligned} R&= \sum \limits _{k = 1}^K {\log _2}\left( 1 + {\mathrm{SINR}}_k\right) \\&= \sum \limits _{k = 1}^K {\log _2}\left( 1 + \frac{\frac{P_N}{K N_s}\left\| {\mathbf{H}}_k{\mathbf{F}}_{\mathrm{RF}}{\mathbf{F}}_{\mathrm{BB}}^k \right\| _F^2}{\frac{P_N}{K N_s}\sum \nolimits _{i = 1,i \ne k}^K \left\| {\mathbf{H}}_k{\mathbf{F}}_{\mathrm{RF}}{\mathbf{F}}_{\mathrm{BB}}^i \right\| _F^2 + \sigma^2}\right) ,\end{aligned}$$

where \({P_N}\) is the transmit power, and the noise variance at each user is \({\sigma^2} = 1\). \({\mathrm{SINR}}_k\) denotes the signal-to-interference-plus-noise ratio (SINR) of the signal \({{\mathbf{s}}_k}\). It is the ratio of the energy of the useful signal in (3) to the energy of the interference from the remaining terms plus the noise.
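The sum rate expression above translates directly into code. The following is a minimal sketch, assuming the per-user channels are given as a list of NumPy arrays and the columns of the digital precoder are grouped per user; the function name `sum_rate` and its argument names are hypothetical.

```python
import numpy as np

def sum_rate(h_list, f_rf, f_bb, n_s, p_n, sigma2=1.0):
    """Sum over users of log2(1 + SINR_k), following the expression above.

    h_list : list of K channel matrices H_k (each N_r x N_t)
    f_rf   : analog precoder (N_t x N)
    f_bb   : digital precoder (N x K*N_s), columns grouped per user
    """
    k_users = len(h_list)
    scale = p_n / (k_users * n_s)
    rate = 0.0
    for k, h_k in enumerate(h_list):
        eff = h_k @ f_rf  # effective channel seen by user k
        # Energy that the precoder block of each user i delivers to user k
        gains = [
            np.linalg.norm(eff @ f_bb[:, i * n_s:(i + 1) * n_s], "fro") ** 2
            for i in range(k_users)
        ]
        signal = scale * gains[k]
        interference = scale * (sum(gains) - gains[k])
        rate += np.log2(1.0 + signal / (interference + sigma2))
    return rate

# Toy check: two single-antenna users with orthogonal channels, no interference
H1 = np.array([[1.0, 0.0]])
H2 = np.array([[0.0, 1.0]])
toy_rate = sum_rate([H1, H2], np.eye(2), np.eye(2), n_s=1, p_n=2.0)
```

In the toy case each user sees unit signal power, zero interference, and unit noise, so each contributes one bit per channel use.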

In this paper, we use the geometric Saleh–Valenzuela channel model, which is well suited to mmWave communication [35, 36]. The normalized mmWave downlink channel for the kth user \({{\mathbf{H}}_k}\) is assumed to be the sum of \({N_{\mathrm{c}}}{N_{\mathrm{p}}}\) propagation paths, where \({N_{\mathrm{c}}}\) is the number of scattering clusters and \({N_{\mathrm{p}}}\) is the number of paths in each cluster. Therefore, the channel of the kth user can be expressed as [37]

$$\begin{aligned} {{\mathbf{H}}_k} = \sqrt{\frac{{{N_r}{N_t}}}{{{N_{\mathrm{c}}}{N_{\mathrm{p}}}}}} \sum \limits _{i = 1}^{{N_{\mathrm{c}}}} {\sum \limits _{l = 1}^{{N_{\mathrm{p}}}} {\alpha _{i,l}^k} } {\mathbf{a}}_r^k(\theta _{i,l}^k,\varphi_{i,l}^k){\mathbf{a}}_t^k{(\theta _{i,l}^k,\varphi_{i,l}^k)^H}, \end{aligned}$$

where \(\alpha _{i,l}^k\) is the complex gain of the lth path in the ith cluster, which follows \({{{\mathcal{C}}}}{{{\mathcal{N}}}}({\mathbf{0 }},{\sigma^2}{\mathbf{I }})\). \(\theta _{i,l}^k\) and \(\varphi_{i,l}^k\) denote the horizontal and elevation angles in (4), respectively. \({\mathbf{a}}_r^k(\theta _{i,l}^k,\varphi_{i,l}^k)\) and \({\mathbf{a}}_t^k(\theta _{i,l}^k,\varphi_{i,l}^k)\) represent the array response vectors of the kth user and the BS, respectively.

For the ULA with U elements, the array response vector can be presented as [34]

$$\begin{aligned} {{\mathbf{a}}_{ULA}}(\theta ) = \sqrt{\frac{1}{U}} {\left[ {1,{{\mathop {\mathrm{e}}\nolimits } ^{j\frac{{2\pi }}{\lambda }d\sin (\theta )}},\ldots ,{{\mathop {\mathrm{e}}\nolimits } ^{j(U - 1)\frac{{2\pi }}{\lambda }d\sin (\theta )}}} \right] ^T}, \end{aligned}$$

where d is the spacing between two neighboring antenna elements, and \(\lambda\) is the wavelength of the transmission. The elevation angle \(\varphi\) is omitted because the ULA response vector is independent of it.
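As a rough illustration of the ULA response vector and the clustered channel sum above, the sketch below generates one channel realization. Several choices are assumptions made here and are not specified in the text: half-wavelength spacing, angles drawn uniformly from \([0, 2\pi)\), independent angles at the receive and transmit sides, and the function names themselves.

```python
import numpy as np

def ula_response(theta, n_elems, d_over_lambda=0.5):
    """Unit-norm ULA response vector; d_over_lambda = d / lambda (assumed 0.5)."""
    idx = np.arange(n_elems)
    phase = 2.0 * np.pi * d_over_lambda * idx * np.sin(theta)
    return np.exp(1j * phase) / np.sqrt(n_elems)

def sv_channel_ula(n_r, n_t, n_c, n_p, rng, sigma2=1.0):
    """One Saleh-Valenzuela channel realization with ULAs at both ends."""
    h = np.zeros((n_r, n_t), dtype=complex)
    for _ in range(n_c):          # scattering clusters
        for _ in range(n_p):      # paths inside each cluster
            # Complex Gaussian path gain with variance sigma2
            alpha = np.sqrt(sigma2 / 2.0) * (
                rng.standard_normal() + 1j * rng.standard_normal()
            )
            theta_r = rng.uniform(0.0, 2.0 * np.pi)  # assumed angle distribution
            theta_t = rng.uniform(0.0, 2.0 * np.pi)
            a_r = ula_response(theta_r, n_r)
            a_t = ula_response(theta_t, n_t)
            h += alpha * np.outer(a_r, a_t.conj())   # a_r a_t^H
    return np.sqrt(n_r * n_t / (n_c * n_p)) * h

rng = np.random.default_rng(0)
H_k = sv_channel_ula(n_r=4, n_t=32, n_c=3, n_p=5, rng=rng)
```

The leading factor \(\sqrt{N_r N_t / (N_c N_p)}\) is the same normalization that appears in the channel expression.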

Furthermore, for the UPA with \({W_1}\) and \({W_2}\) elements (\({W_1}{W_2} = U\)) in the horizontal and vertical directions, respectively, the array response vector is given by [34]

$$\begin{aligned} {{\mathbf{a}}_{{\mathrm{UPA}}}}(\theta ,\varphi )&= \sqrt{\frac{1}{U}} \left[ {1,\ldots ,{{\mathop {\mathrm{e}}\nolimits } ^{j\frac{{2\pi }}{\lambda }d(x\sin (\theta )\sin (\varphi ) + y\cos (\varphi ))}},} \right. \\&\quad {\left. {\ldots ,{{\mathop {\mathrm{e}}\nolimits } ^{j\frac{{2\pi }}{\lambda }d(({W_1} - 1)\sin (\theta )\sin (\varphi ) + ({W_2} - 1)\cos (\varphi ))}}} \right] ^T},\end{aligned}$$

where \(0 \le x \le ({W_1} - 1)\) and \(0 \le y \le ({W_2} - 1).\)
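A corresponding sketch for the UPA response is given below. The element ordering (x as the outer index, y as the inner index), the half-wavelength spacing, and the function name `upa_response` are assumptions for illustration; the text does not fix an ordering.

```python
import numpy as np

def upa_response(theta, phi, w1, w2, d_over_lambda=0.5):
    """Unit-norm UPA response vector of length U = w1 * w2.

    Elements are ordered with x as the outer index and y as the inner
    index (an assumed convention).
    """
    x = np.arange(w1)[:, None]  # horizontal element index
    y = np.arange(w2)[None, :]  # vertical element index
    phase = 2.0 * np.pi * d_over_lambda * (
        x * np.sin(theta) * np.sin(phi) + y * np.cos(phi)
    )
    return (np.exp(1j * phase) / np.sqrt(w1 * w2)).reshape(-1)

a_upa = upa_response(theta=0.4, phi=1.1, w1=4, w2=8)
```

With `w2 = 1` and `phi = np.pi / 2`, the vertical term vanishes and the expression reduces to the ULA response.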

Proposed near-optimal hybrid precoding design

We aim to design the analog precoder \({{\mathbf{F}}_{\mathrm{RF}}}\) and digital precoder \({{\mathbf{F}}_{\mathrm{BB}}}\), so as to maximize the total sum rate R, which can be written as

$$\begin{aligned} \left( {\mathbf{F}}_{\mathrm{RF}},{\mathbf{F}}_{\mathrm{BB}},{\mathbf{P}} \right)&= \mathop {\arg \max }\limits _{\left( {\mathbf{F}}_{\mathrm{RF}},{\mathbf{F}}_{\mathrm{BB}},{\mathbf{P}} \right) } R\\&\quad {\mathrm{s.t.}}\; \left| {\mathbf{F}}_{\mathrm{RF}}^{i,j} \right| ^2 = \frac{1}{N_t},\\&\quad \left\| {\mathbf{F}}_{\mathrm{RF}}{\mathbf{F}}_{\mathrm{BB}} \right\| _F^2 = K{N_s},\\&\quad {\mathbf{F}}_{\mathrm{RF}} = {\mathrm{blk}}({\bar{\mathbf{a}}}_1,{\bar{\mathbf{a}}}_2, \ldots ,{\bar{\mathbf{a}}}_N),\\&\quad \left\| {\mathbf{P}} \right\| _F^2 = {P_N}.\end{aligned}$$

Since the nonzero elements of the analog precoding matrix are usually realized by phase shifters [34], the nonzero elements in \({{\mathbf{F}}_{\mathrm{RF}}}\) must satisfy constant-modulus constraints. Unfortunately, these constant-modulus constraints are non-convex, which makes the optimization problem non-convex. In other words, it is difficult to find the globally optimal solution of problem (7).

Analog precoding design

In the case of multiple users, the inter-user interference can be effectively eliminated by using the baseband BD technology. After removing the interference between users, R in (7) can be rewritten as

$$\begin{aligned} R = {\log _2}\left( \left| {{{\mathbf{I}}_{K{N_r}}} + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{\mathbf{H }}{{\mathbf{F}}_{\mathrm{RF}}} {{\mathbf{F}}_{\mathrm{BB}}}{\mathbf{F}}_{\mathrm{BB}}^H{\mathbf{F}}_{\mathrm{RF}}^H{{\mathbf{H }}^H}} \right| \right) . \end{aligned}$$

This means that we should find an \({{\mathbf{F}}_{\mathrm{RF}}}\) that maximizes R as far as possible. Based on (1), the constraints on the analog precoding matrix are constant amplitude and block-diagonal structure. However, these non-convex constraints make it difficult to maximize the capacity in (8). Owing to the special block-diagonal structure of the analog precoding matrix \({{\mathbf{F}}_{\mathrm{RF}}}\), we observe that the precoding on different antenna sub-arrays is independent. Inspired by [33, 34], we can decompose the complicated optimization problem (8) into a series of sub-rate optimization problems, which are much easier to solve.

In other words, we consider the antenna sub-arrays connected to the RF chains one by one. We first optimize the rate of the first selected antenna sub-array by turning off all other antenna sub-arrays. After that, we select the second antenna sub-array and optimize its rate in the same way.

The traditional SIC method optimizes the sub-arrays in a fixed recursive order, but the channel state of each antenna sub-array is different. We therefore sort the N antenna sub-arrays according to their channel capacity before optimization, and the sub-arrays are then optimized in the sorted order.

\({C_n}\) is defined as the capacity of the nth antenna sub-array in the millimeter wave massive MIMO systems, where \(n = 1,2, \ldots ,N\). After the optimization sequence is determined, we will perform the above-mentioned SIC process until the last antenna sub-array is optimized. During the calculation, we assume that the digital precoding matrix is fixed. Then the objective in (8) can be expressed as follows

$$\begin{aligned} {\mathbf{F}}_{\mathrm{RF}}&= \arg \mathop {\mathrm{max}}\limits _{{\mathbf{F}}_{\mathrm{RF}}} C_{\mathrm{max}}\\&\quad {\mathrm{s.t.}}\; \left| {\mathbf{F}}_{\mathrm{RF}}^{i,j} \right| ^2 = \frac{1}{N_t},\\&\quad {\mathbf{F}}_{\mathrm{RF}} = {\mathrm{blk}}({\bar{\mathbf{a}}}_1,{\bar{\mathbf{a}}}_2, \ldots ,{\bar{\mathbf{a}}}_N),\end{aligned}$$

where \(C_{\mathrm{max}} = \sum \nolimits _{n = 1}^N {\log _2}\left( 1 + \frac{P_N}{\sigma^2 K N_s}{\mathbf{H}}{\mathbf{F}}_{\mathrm{RF}}{\mathbf{F}}_{\mathrm{RF}}^H{\mathbf{H}}^H\right) = {C_1} + {C_2} + \cdots + {C_N}\). After the analog precoder is obtained, the optimal digital precoding matrix is obtained by the baseband BD technology.

We can partition the analog precoding matrix \({{\mathbf{F}}_{\mathrm{RF}}}\) at the BS as \({{\mathbf{F}}_{\mathrm{RF}}} = ({\mathbf{F}}_{\mathrm{RF}}^{N - 1}\;{\mathbf{F}}_{\mathrm{RF}}^N)\). \({\mathbf{F}}_{\mathrm{RF}}^N\) is the Nth column of \({{\mathbf{F}}_{\mathrm{RF}}}\), and \({\mathbf{F}}_{\mathrm{RF}}^{N - 1}\) is an \(NM \times (N - 1)\) matrix containing the first \(N - 1\) columns of \({{\mathbf{F}}_{\mathrm{RF}}}\). Then the sum rate in (9) can be rewritten as

$$\begin{aligned} C_{\mathrm{max}}&= {\log _2}\left( \left| {\mathbf{I}}_{K{N_r}} + \frac{P_N}{\sigma^2 K N_s}{\mathbf{H}}\left[ {\mathbf{F}}_{\mathrm{RF}}^{N - 1}\;{\mathbf{F}}_{\mathrm{RF}}^N \right] \left[ {\mathbf{F}}_{\mathrm{RF}}^{N - 1}\;{\mathbf{F}}_{\mathrm{RF}}^N \right] ^H{\mathbf{H}}^H \right| \right) \\&= {\log _2}\left( \left| {\mathbf{I}}_{K{N_r}} + \frac{P_N}{\sigma^2 K N_s}{\mathbf{H}}{\mathbf{F}}_{\mathrm{RF}}^{N - 1}\left( {\mathbf{F}}_{\mathrm{RF}}^{N - 1}\right) ^H{\mathbf{H}}^H + \frac{P_N}{\sigma^2 K N_s}{\mathbf{H}}{\mathbf{F}}_{\mathrm{RF}}^N\left( {\mathbf{F}}_{\mathrm{RF}}^N\right) ^H{\mathbf{H}}^H \right| \right) .\end{aligned}$$

Define auxiliary matrix

$$\begin{aligned} {\mathbf{S}}_{N - 1} = {\mathbf{I}}_{K{N_r}} + \frac{P_N}{\sigma^2 K N_s}{\mathbf{H}}{\mathbf{F}}_{\mathrm{RF}}^{N - 1}\left( {\mathbf{F}}_{\mathrm{RF}}^{N - 1}\right) ^H{\mathbf{H}}^H. \end{aligned}$$

Using the identity \(\left| {\mathbf{I}} + {\mathbf{X}}{\mathbf{Y}} \right| = \left| {\mathbf{I}} + {\mathbf{Y}}{\mathbf{X}} \right|\) with \({\mathbf{X}} = {\mathbf{S}}_{N - 1}^{ - 1}{\mathbf{H}}{\mathbf{F}}_{\mathrm{RF}}^N\) and \({\mathbf{Y}} = \left( {\mathbf{F}}_{\mathrm{RF}}^N\right) ^H{\mathbf{H}}^H\), (10) can be simplified as

$$\begin{aligned} C_{\mathrm{max}}&\mathop {=}\limits ^{(a)} {\log _2}\left( \left| {\mathbf{S}}_{N - 1} \right| \right) + {\log _2}\left( \left| {\mathbf{I}}_{K{N_r}} + \frac{P_N}{\sigma^2 K N_s}{\mathbf{S}}_{N - 1}^{ - 1}{\mathbf{H}}{\mathbf{F}}_{\mathrm{RF}}^N\left( {\mathbf{F}}_{\mathrm{RF}}^N\right) ^H{\mathbf{H}}^H \right| \right) \\&\mathop {=}\limits ^{(b)} {\log _2}\left( \left| {\mathbf{S}}_{N - 1} \right| \right) + {\log _2}\left( \left| 1 + \frac{P_N}{\sigma^2 K N_s}\left( {\mathbf{F}}_{\mathrm{RF}}^N\right) ^H{\mathbf{H}}^H{\mathbf{S}}_{N - 1}^{ - 1}{\mathbf{H}}{\mathbf{F}}_{\mathrm{RF}}^N \right| \right) .\end{aligned}$$

The second term on the right side of (b) in (12), \({\log _2}\left( 1 + \frac{P_N}{\sigma^2 K N_s}\left( {\mathbf{F}}_{\mathrm{RF}}^N\right) ^H{\mathbf{H}}^H{\mathbf{S}}_{N - 1}^{ - 1}{\mathbf{H}}{\mathbf{F}}_{\mathrm{RF}}^N\right)\), is the achievable sub-rate of the Nth antenna sub-array, and the first term has the same form as (8). Further, we can decompose \({\log _2}(\left| {\mathbf{S}}_{N - 1} \right| )\) using a method similar to that in (12) as

$$\begin{aligned} {\log _2}\left( \left| {\mathbf{S}}_{N - 2} \right| \right) + {\log _2}\left( \left| 1 + \frac{P_N}{\sigma^2 K N_s}\left( {\mathbf{F}}_{\mathrm{RF}}^{N - 1}\right) ^H{\mathbf{H}}^H{\mathbf{S}}_{N - 2}^{ - 1}{\mathbf{H}}{\mathbf{F}}_{\mathrm{RF}}^{N - 1} \right| \right) . \end{aligned}$$

Then, after N such decompositions, the total sum rate in (9) can be shown as

$$\begin{aligned} C_{\mathrm{max}} = \sum \limits _{n = 1}^N {\log _2}\left( 1 + \frac{P_N}{\sigma^2 K N_s}\left( {\mathbf{F}}_{\mathrm{RF}}^n\right) ^H{\mathbf{H}}^H{\mathbf{S}}_{n - 1}^{ - 1}{\mathbf{H}}{\mathbf{F}}_{\mathrm{RF}}^n\right) , \end{aligned}$$

where \({\mathbf{S}}_n = {\mathbf{I}}_{K{N_r}} + \frac{P_N}{\sigma^2 K N_s}{\mathbf{H}}{\mathbf{F}}_{\mathrm{RF}}^n\left( {\mathbf{F}}_{\mathrm{RF}}^n\right) ^H{\mathbf{H}}^H\) and \({\mathbf{S}}_0 = {\mathbf{I}}_{K{N_r}}\). In the sum, \({\mathbf{F}}_{\mathrm{RF}}^n\) denotes the nth column of \({{\mathbf{F}}_{\mathrm{RF}}}\), whereas in the definition of \({\mathbf{S}}_n\) it denotes the matrix containing the first n columns.
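The decomposition of the log-determinant into N scalar sub-rates can be checked numerically. The sketch below uses a random channel and a random block-diagonal analog precoder; the sizes, the seed, and the shorthand `rho` for \(P_N/(\sigma^2 K N_s)\) are assumptions for illustration. The recursion starts from the identity matrix and adds one rank-one term per column.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, KNr = 4, 8, 6   # illustrative sizes (assumed)
rho = 2.0             # shorthand for P_N / (sigma^2 K N_s), assumed value

# Random stacked channel of size (K*N_r) x (N*M)
H = (rng.standard_normal((KNr, N * M))
     + 1j * rng.standard_normal((KNr, N * M))) / np.sqrt(2.0)

# Random block-diagonal analog precoder with constant-modulus entries
F = np.zeros((N * M, N), dtype=complex)
for n in range(N):
    F[n * M:(n + 1) * M, n] = np.exp(1j * rng.uniform(0, 2 * np.pi, M)) / np.sqrt(M)

# Left-hand side: log2 det(I + rho * H F F^H H^H)
_, logabsdet = np.linalg.slogdet(np.eye(KNr) + rho * H @ F @ F.conj().T @ H.conj().T)
lhs = logabsdet / np.log(2.0)

# Right-hand side: sum of N scalar sub-rates with the recursively updated S
S = np.eye(KNr, dtype=complex)   # S before any column is added
rhs = 0.0
for n in range(N):
    hf = H @ F[:, n:n + 1]                                   # H times n-th column
    q = (hf.conj().T @ np.linalg.solve(S, hf)).real.item()   # f^H H^H S^{-1} H f
    rhs += np.log2(1.0 + rho * q)
    S = S + rho * hf @ hf.conj().T                           # rank-one update of S
```

The two values `lhs` and `rhs` agree up to floating-point error, for any order in which the columns are processed.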

According to the analysis above, the capacity of the antenna sub-array that is optimized first can be expressed as

$$\begin{aligned} C_{n,{\mathrm{max}}} = {\log _2}\left( \left| 1 + \frac{P_N}{\sigma^2 K N_s}\left( {\mathbf{F}}_{\mathrm{RF}}^n\right) ^H{\mathbf{T}}_{n - 1}{\mathbf{F}}_{\mathrm{RF}}^n \right| \right) , \end{aligned}$$

where \(C_{n,{\mathrm{max}}} = {\mathrm{max}}\left\{ {C_1},{C_2}, \ldots ,{C_N} \right\}\) corresponds to the first antenna sub-array that needs to be optimized, and \({\mathbf{T}}_{n - 1} = {\mathbf{H}}^H{\mathbf{S}}_{n - 1}^{ - 1}{\mathbf{H}}\). Because only the M entries of \({\mathbf{F}}_{\mathrm{RF}}^n\) that belong to the nth sub-array are nonzero, (15) can be rewritten as

$$\begin{aligned} C_{n,{\mathrm{max}}} = {\log _2}\left( \left| 1 + \frac{P_N}{\sigma^2 K N_s}\left( {\mathbf{F}}_{\mathrm{RF}}^n\right) ^H{\mathbf{G}}_{n - 1}{\mathbf{F}}_{\mathrm{RF}}^n \right| \right) , \end{aligned}$$

where \({\mathbf{G}}_{n - 1} \in {\mathbb{C}}^{M \times M}\) is the sub-matrix of \({\mathbf{T}}_{n - 1}\) obtained by keeping only the rows and columns of \({\mathbf{T}}_{n - 1}\) from the \((M(n - 1) + 1)\)th to the (Mn)th. It can be written as

$$\begin{aligned} {\mathbf{G}}_{n - 1} = {\mathbf{R}}{\mathbf{T}}_{n - 1}{\mathbf{R}}^H = {\mathbf{R}}{\mathbf{H}}^H{\mathbf{S}}_{n - 1}^{ - 1}{\mathbf{H}}{\mathbf{R}}^H, \end{aligned}$$

where \({\mathbf{R}} = \left[ {\mathbf{0}}_{M \times M(n - 1)}\;\;{\mathbf{I}}_M\;\;{\mathbf{0}}_{M \times M(N - n)} \right]\) is the corresponding selection matrix. Define the singular value decomposition (SVD) of \({\mathbf{G}}_{n - 1}\) as \({\mathbf{G}}_{n - 1} = {\mathbf{V}}{\varvec{\Sigma }}{\mathbf{V}}^H\), where \({\varvec{\Sigma }} \in {\mathbb{C}}^{M \times M}\) is the diagonal matrix containing the singular values of \({\mathbf{G}}_{n - 1}\) in decreasing order, and \({\mathbf{V}} \in {\mathbb{C}}^{M \times M}\) contains the corresponding right singular vectors of \({\mathbf{G}}_{n - 1}\).
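The role of the selection matrix and the SVD can be illustrated numerically. In the sketch below, the helper `selection_matrix`, the sizes, and the random stand-in for \({\mathbf{S}}_{n-1}\) are assumptions made for illustration. It checks that multiplying by the selection matrix is the same as slicing an \(M \times M\) diagonal block, and extracts the dominant singular vector.

```python
import numpy as np

def selection_matrix(n, n_sub, m):
    """R = [0, I_M, 0] of size M x (N*M); n is 1-based as in the text."""
    r = np.zeros((m, n_sub * m))
    r[:, (n - 1) * m:n * m] = np.eye(m)
    return r

rng = np.random.default_rng(2)
N, M, KNr = 3, 4, 5   # illustrative sizes (assumed)
H = rng.standard_normal((KNr, N * M)) + 1j * rng.standard_normal((KNr, N * M))

# Stand-in for S_{n-1}: any Hermitian positive definite matrix works here
B = rng.standard_normal((KNr, 2)) + 1j * rng.standard_normal((KNr, 2))
S_prev = np.eye(KNr) + B @ B.conj().T

T = H.conj().T @ np.linalg.solve(S_prev, H)   # T = H^H S^{-1} H

n = 2
R = selection_matrix(n, N, M)
G = R @ T @ R.T                               # R is real, so R^H = R^T
G_slice = T[(n - 1) * M:n * M, (n - 1) * M:n * M]

# G is Hermitian positive semidefinite, so its SVD coincides with its
# eigendecomposition and the left and right singular vectors agree.
U, sigma, _ = np.linalg.svd(G)
v1 = U[:, 0]                                  # dominant singular vector
```

In practice the slicing form is cheaper than forming the full product with `R`, since only one diagonal block of `T` is needed.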

The optimal solution of (17) can be obtained as

$$\begin{aligned} {\mathbf{F}}_{\mathrm{RF}}^{n_{\mathrm{opt}}} = \left[ \begin{array}{l} {\mathbf{0}}\\ {\bar{\mathbf{a}}}_{n,{\mathrm{opt}}}\\ {\mathbf{0}} \end{array} \right] _{NM \times 1}, \end{aligned}$$

where \({\bar{\mathbf{a}}}_{n,{\mathrm{opt}}} \in {\mathbb{C}}^{M \times 1}\) is the first column of \({\mathbf{V}}\). Since the elements of \({\bar{\mathbf{a}}}_{n,{\mathrm{opt}}}\) do not obey the constraint in Sect. 3, the analog precoding vector \({\mathbf{F}}_{\mathrm{RF}}^n\) cannot be directly chosen as \({\mathbf{F}}_{\mathrm{RF}}^{n_{\mathrm{opt}}}\). Instead, by applying the minimum mean squared error (MMSE) criterion to \({\mathbf{F}}_{\mathrm{RF}}^{n_{\mathrm{opt}}}\) and the constrained solution \({\mathbf{F}}_{\mathrm{RF}}^n\), we can conclude that each element of \({\mathbf{F}}_{\mathrm{RF}}^n\) shares the phase of the corresponding element of \({\mathbf{F}}_{\mathrm{RF}}^{n_{\mathrm{opt}}}\).

Matrices \({\varvec{\Sigma }}\) and \({\mathbf{V}}\) are each partitioned into two parts, where \({\varvec{\Sigma }}_1\) is the largest singular value and \({\mathbf{v}}_1\) is the first column of \({\mathbf{V}}\):

$$\begin{aligned} {\varvec{\Sigma }} = \left[ \begin{array}{ll} {\varvec{\Sigma }}_1 & {\mathbf{0}}\\ {\mathbf{0}} & {\varvec{\Sigma }}_2 \end{array} \right] ,\quad {\mathbf{V}} = \left[ \begin{array}{ll} {\mathbf{v}}_1 & {\mathbf{v}}_2 \end{array} \right] . \end{aligned}$$

Further, the \({C_{n,{\mathrm{max}} }}\) given by (16) can also be rewritten as

$$\begin{aligned} C_{n,{\mathrm{max}}}&= {\log _2}\left( \left| 1 + \frac{P_N}{\sigma^2 K N_s}\left( {\mathbf{F}}_{\mathrm{RF}}^n\right) ^H{\mathbf{V}}{\varvec{\Sigma }}{\mathbf{V}}^H{\mathbf{F}}_{\mathrm{RF}}^n \right| \right) \\&= {\log _2}\left( \left| 1 + \frac{P_N}{\sigma^2 K N_s}\left( {\mathbf{F}}_{\mathrm{RF}}^n\right) ^H{\mathbf{v}}_1{\varvec{\Sigma }}_1{\mathbf{v}}_1^H{\mathbf{F}}_{\mathrm{RF}}^n + \frac{P_N}{\sigma^2 K N_s}\left( {\mathbf{F}}_{\mathrm{RF}}^n\right) ^H{\mathbf{v}}_2{\varvec{\Sigma }}_2{\mathbf{v}}_2^H{\mathbf{F}}_{\mathrm{RF}}^n \right| \right) .\end{aligned}$$

To find the \({\mathbf{F}}_{\mathrm{RF}}^n\) closest to \({\mathbf{F}}_{\mathrm{RF}}^{n_{\mathrm{opt}}}\), we reasonably assume that \({\mathbf{F}}_{\mathrm{RF}}^n\) is approximately orthogonal to \({\mathbf{v}}_2\), that is, \(\left( {\mathbf{F}}_{\mathrm{RF}}^n\right) ^H{\mathbf{v}}_2 \approx 0\). In addition, we use the identity \(\left| {\mathbf{I}} + {\mathbf{X}}{\mathbf{Y}} \right| = \left| {\mathbf{I}} + {\mathbf{Y}}{\mathbf{X}} \right|\) and the high signal-to-noise-ratio (\({\mathrm{SNR}}\)) approximation, i.e.,

$$\begin{aligned} {\left( 1 + \frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{{\varvec{\Sigma}}_1}\right) ^{ - 1}}\frac{{{P_N}}}{{{\sigma^2}K{N_s}}}{{\varvec{\Sigma}}_1} \approx 1. \end{aligned}$$

Thus, (20) can be expressed as

$$\begin{aligned} C_{n,{\mathrm{max}}}&\approx {\log _2}\left( \left| 1 + \frac{P_N{\varvec{\Sigma }}_1}{\sigma^2 K N_s}\left( {\mathbf{F}}_{\mathrm{RF}}^n\right) ^H{\mathbf{v}}_1{\mathbf{v}}_1^H{\mathbf{F}}_{\mathrm{RF}}^n \right| \right) \\&\approx {\log _2}\left( \left| 1 + \frac{P_N}{\sigma^2 K N_s}{\varvec{\Sigma }}_1 \right| \right) + {\log _2}\left( \left| \left( {\mathbf{F}}_{\mathrm{RF}}^n\right) ^H{\mathbf{v}}_1{\mathbf{v}}_1^H{\mathbf{F}}_{\mathrm{RF}}^n \right| \right) .\end{aligned}$$
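The step from the first to the second approximation can be made explicit with a short worked derivation. As a sketch, write \(\rho = P_N/(\sigma^2 K N_s)\) and \(x = \left( {\mathbf{F}}_{\mathrm{RF}}^n\right) ^H{\mathbf{v}}_1{\mathbf{v}}_1^H{\mathbf{F}}_{\mathrm{RF}}^n\) (both are shorthand introduced here for illustration), and treat \({\varvec{\Sigma }}_1\) as a scalar:

$$\begin{aligned} 1 + \rho {\varvec{\Sigma }}_1 x&= \left( 1 + \rho {\varvec{\Sigma }}_1\right) \left[ 1 - \left( 1 + \rho {\varvec{\Sigma }}_1\right) ^{ - 1}\rho {\varvec{\Sigma }}_1\left( 1 - x\right) \right] \\&\approx \left( 1 + \rho {\varvec{\Sigma }}_1\right) \left[ 1 - \left( 1 - x\right) \right] = \left( 1 + \rho {\varvec{\Sigma }}_1\right) x. \end{aligned}$$

The first line is an exact factorization, and the second line applies the high-SNR approximation \(\left( 1 + \rho {\varvec{\Sigma }}_1\right) ^{ - 1}\rho {\varvec{\Sigma }}_1 \approx 1\). Taking \({\log _2}\) of the result gives the two-term form above.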

From (22), we observe that maximizing \(C_{n,{\mathrm{max}}}\) is equivalent to maximizing the square of the inner product between the two vectors \({\mathbf{F}}_{\mathrm{RF}}^{n_{\mathrm{opt}}}\) and \({\mathbf{F}}_{\mathrm{RF}}^n\). Based on this fact, the optimization problem (15) is equivalent to the following

$$\begin{aligned} \mathop {\arg \min }\limits _{{\mathbf{F}}_{\mathrm{RF}}^n \in \zeta } \left\| {\mathbf{F}}_{\mathrm{RF}}^{n_{\mathrm{opt}}} - {\mathbf{F}}_{\mathrm{RF}}^n \right\| _2^2. \end{aligned}$$

The MMSE objective function over all antenna sub-arrays can be expressed as

$$\begin{aligned} &{\mathrm{E}}\left\{ {\left\| {{\mathbf{F}}_{\mathrm{RF}}^{{\mathrm{opt}}} - {{\mathbf{F}}_{\mathrm{RF}}}} \right\| _F^2} \right\} \\&\quad = {{\mathrm{tr}}}\left\{ {{{({\mathbf{F}}_{\mathrm{RF}}^{{\mathrm{opt}}} - {{\mathbf{F}}_{\mathrm{RF}}})}^H}({\mathbf{F}}_{\mathrm{RF}}^{{\mathrm{opt}}} - {{\mathbf{F}}_{\mathrm{RF}}})} \right\} \\&\quad = 2N - {{\mathrm{tr}}}\left\{ {2{\mathop {\mathrm{Re}}\nolimits } {{({{\mathbf{F}}_{\mathrm{RF}}})}^H}{\mathbf{F}}_{\mathrm{RF}}^{{\mathrm{opt}}}} \right\} \\&\quad = 2N - 2\sum \limits _{n = 1}^N {\sum \limits _{m = 1}^{{N_t}} {{\mathop {\mathrm{Re}}\nolimits } } } \left\{ {\left| {{{\mathbf{F}}_{\mathrm{RF}}}(m,n)} \right| \left| {{\mathbf{F}}_{\mathrm{RF}}^{{\mathrm{opt}}}(m,n)} \right| {e^{j\varphi (m,n)}}} \right\} ,\end{aligned}$$

where \(\varphi (m,n) = \angle {{\mathbf{F}}_{\mathrm{RF}}}(m,n) - \angle {\mathbf{F}}_{\mathrm{RF}}^{{\mathrm{opt}}}(m,n)\). It is clear that when \(\varphi (m,n) = 0\), the objective function is minimized.
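The step from the trace to the constant \(2N\) can be spelled out as follows. This is only a sketch, and it assumes (as the constant \(2N\) suggests) that each of the \(N\) columns of \({\mathbf{F}}_{\mathrm{RF}}^{{\mathrm{opt}}}\) and \({{\mathbf{F}}_{\mathrm{RF}}}\) has unit norm:

$$\begin{aligned} \left\| {{\mathbf{F}}_{\mathrm{RF}}^{{\mathrm{opt}}} - {{\mathbf{F}}_{\mathrm{RF}}}} \right\| _F^2&= \left\| {{\mathbf{F}}_{\mathrm{RF}}^{{\mathrm{opt}}}} \right\| _F^2 + \left\| {{{\mathbf{F}}_{\mathrm{RF}}}} \right\| _F^2 - 2{\mathop {\mathrm{Re}}\nolimits } \,{{\mathrm{tr}}}\left\{ {{\mathbf{F}}_{\mathrm{RF}}^H{\mathbf{F}}_{\mathrm{RF}}^{{\mathrm{opt}}}} \right\} \\&= N + N - 2\sum \limits _{n = 1}^N {\sum \limits _{m = 1}^{{N_t}} {\left| {{{\mathbf{F}}_{\mathrm{RF}}}(m,n)} \right| \left| {{\mathbf{F}}_{\mathrm{RF}}^{{\mathrm{opt}}}(m,n)} \right| \cos \varphi (m,n)} } .\end{aligned}$$

Since every modulus is nonnegative and \(\cos \varphi (m,n) \le 1\), each term of the double sum is largest at \(\varphi (m,n) = 0\), which is why matching the phases entry by entry minimizes the distance.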

Therefore, the analog precoding matrix can be chosen as

$$\begin{aligned} {{{{\bar{\mathbf{a }}}}}_n} = \frac{1}{{\sqrt{M} }}{e^{j\angle {{{{{\bar{\mathbf{a }}}}}}_{n,opt}}}}, \end{aligned}$$

where \(\angle {{{{\bar{\mathbf{a }}}}}_{n,opt}}\) represents the phase vector of \({{{{\bar{\mathbf{a }}}}}_{n,opt}}\).
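The phase-extraction rule above can be illustrated with a minimal Python sketch. The sub-array size `M = 8`, the random target vector `a_opt`, and the helper names are assumptions made for illustration only; they are not taken from the paper. The sketch keeps the phase of each entry of an unconstrained target, fixes the modulus to \(1/\sqrt{M}\), and compares the resulting distance with that of randomly chosen phases.

```python
import cmath
import math
import random


def constant_modulus_projection(a_opt):
    """Keep only the phase of each entry and fix the modulus to 1/sqrt(M)."""
    m = len(a_opt)
    return [cmath.exp(1j * cmath.phase(x)) / math.sqrt(m) for x in a_opt]


def squared_distance(u, v):
    """Squared Euclidean distance between two complex vectors."""
    return sum(abs(x - y) ** 2 for x, y in zip(u, v))


random.seed(0)
M = 8  # antennas in one sub-array (illustrative value)

# Hypothetical unconstrained target vector, normalized to unit norm.
a_opt = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(M)]
norm = math.sqrt(sum(abs(x) ** 2 for x in a_opt))
a_opt = [x / norm for x in a_opt]

a_bar = constant_modulus_projection(a_opt)
best = squared_distance(a_opt, a_bar)

# Distance reached by the best of 1000 random phase choices with the same modulus.
trial_best = min(
    squared_distance(
        a_opt,
        [cmath.exp(1j * random.uniform(0, 2 * math.pi)) / math.sqrt(M) for _ in range(M)],
    )
    for _ in range(1000)
)
```

For a fixed modulus, each entry's contribution to the distance depends only on the phase difference, so no random phase choice should beat the phase-matched vector.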

Therefore, the sum rate optimization problem can be transformed into a series of sub-rate optimization problems, which can be optimized one by one. After that, following the idea of SIC after sorting, we only need to continuously update \({{\mathbf{S}}_N}\), and the process is shown in Fig. 2.

Fig. 2

The structure diagram of the analog precoding solution process. The optimization order is determined first, and the sub-matrices are then optimized one by one. The whole process only needs to update \({{\mathbf{S}}_N}\)

According to the capacity \({C_{n,{\mathrm{max}} }} \in {\mathrm{max}} \left\{ {{C_1},{C_2}, \cdots ,{C_N}} \right\}\), \({\mathbf{F}}_{\mathrm{RF}}^{1,{\mathrm{max}} }\) indicates the analog precoding corresponding to the first optimized antenna array. \({\mathbf{F}}_{\mathrm{RF}}^{2,{\mathrm{max}} }\) is the second analog precoding that needs to be optimized. This process is repeated until the last antenna sub-array is optimized.

Digital precoding design

Based on the above solution process, the analog precoding matrix \({{\mathbf{F}}_{\mathrm{RF}}}\) can be obtained. To obtain the digital precoder, BD technology is adopted. The main idea of BD is to divide the MU-MIMO channel into multiple SU-MIMO channels. If it can be guaranteed that the signal intended for the kth user lies in the null space of the channels of the other users, then the inter-user interference is eliminated. First, the transit matrix \({{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,k}}\) can be expressed as

$$\begin{aligned} {{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,k}} = {{\mathbf{H}}_k}{{\mathbf{F}}_{\mathrm{RF}}},k \in \{ 1,2, \ldots ,K\} . \end{aligned}$$

In order to eliminate interference, the constraint can be expressed as

$$\begin{aligned} {{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,j}}{\mathbf{F}}_{\mathrm{BB}}^k = 0,\forall j \ne k. \end{aligned}$$

To get the digital precoder, \({{{{\tilde{\mathbf{H }}}}}_k}\) can be defined as

$$\begin{aligned} {{{{\tilde{\mathbf{H }}}}}_k} = {\left[ {{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,1}^T, \cdots ,{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,k - 1}^T,{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,k + 1}^T, \cdots ,{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,K}^T} \right] ^T}. \end{aligned}$$

Then, the digital precoding \({\mathbf{F}}_{\mathrm{BB}}^k\) should lie in the null space of \({{{{\tilde{\mathbf{H }}}}}_k}\). The SVD of \({{{{\tilde{\mathbf{H }}}}}_k}\) gives

$$\begin{aligned} {{{{\tilde{\mathbf{H }}}}}_k} = {{{{\tilde{\mathbf{U }}}}}_k}{{{{\tilde{\varvec{\Sigma }}}}}_k}{\left[ {{{{\tilde{\mathbf{V }}}}}_k^{(1)},{{{\tilde{\mathbf{V }}}}}_k^{(0)}} \right] ^H}, \end{aligned}$$

where \({{{{\tilde{\mathbf{U }}}}}_k}\) and \({{{{\tilde{\varvec{\Sigma }}}}}_k}\) represent the matrix of left singular vectors of \({{{{\tilde{\mathbf{H }}}}}_k}\) and the diagonal matrix of singular values of \({{{{\tilde{\mathbf{H }}}}}_k}\), respectively. \({{{\tilde{\mathbf{V }}}}}_k^{(1)} = {{{{\tilde{\mathbf{V }}}}}_k}(:,1:(K - 1){N_s})\) and \({{{\tilde{\mathbf{V }}}}}_k^{(0)} = {{{{\tilde{\mathbf{V }}}}}_k}(:,(K - 1){N_s} + 1:{\mathrm{end}})\) represent an orthogonal basis of the row space of \({{{{\tilde{\mathbf{H }}}}}_k}\) and an orthogonal basis of the null space of \({{{{\tilde{\mathbf{H }}}}}_k}\), respectively. It follows that

$$\begin{aligned} {{{{{\tilde{\mathbf{H }}}}}}_k}{{{\tilde{\mathbf{V }}}}}_k^{(0)}&= {{{{{\tilde{\mathbf{U }}}}}}_k}{{{{{\tilde{\varvec{\Sigma }}} }}}_k}{\left[ {{{{\tilde{\mathbf{V }}}}}_k^{(1)},{{{\tilde{\mathbf{V }}}}}_k^{(0)}} \right] ^H}{{{\tilde{\mathbf{V }}}}}_k^{(0)}\\&= {{{{{\tilde{\mathbf{U }}}}}}_k}{{{{{\tilde{\varvec{\Sigma }}} }}}_k}{({{{\tilde{\mathbf{V }}}}}_k^{(1)})^H}{{{\tilde{\mathbf{V }}}}}_k^{(0)}\\&= 0.\end{aligned}$$

The resulting channel \({{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,k}}{{{\tilde{\mathbf{V }}}}}_k^{(0)}\) is called the equivalent channel. Its SVD is

$$\begin{aligned} {{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,k}}{{{\tilde{\mathbf{V }}}}}_k^{(0)} = {{\mathbf{U}}_k}{{\mathbf{S}}_k}{\left[ {{\mathbf{V}}_k^{(1)},{\mathbf{V}}_k^{(0)}} \right] ^H}, \end{aligned}$$

where \({{\mathbf{S}}_k}\) represents the diagonal matrix of singular values of the equivalent channel \({{\mathbf{H}}_{{\mathop {\mathrm{int}}} ,k}}{{{\tilde{\mathbf{V }}}}}_k^{(0)}\). To eliminate inter-user interference, \({\mathbf{V}}_k^{(1)}\), which corresponds to the nonzero singular values, is taken as the precoding matrix, and the final digital precoding matrix is given by

$$\begin{aligned} {\mathbf{F}}_{\mathrm{BB}}^k = {{{\tilde{\mathbf{V }}}}}_k^{(0)}{\mathbf{V}}_k^{(1)}. \end{aligned}$$
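The BD steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the effective channels `H_int[k]` are drawn at random instead of being computed as \({{\mathbf{H}}_k}{{\mathbf{F}}_{\mathrm{RF}}}\), each is assumed to have \(N_s\) rows so that the stacked matrix has rank \((K - 1){N_s}\) as in the column split above, and the null space is selected by a numerical rank test. The function name `bd_precoder` and the sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
K, Ns, N = 4, 2, 16  # users, streams per user, RF chains (illustrative sizes)

# Hypothetical effective channels H_int,k (Ns x N), drawn at random here.
H_int = [
    (rng.standard_normal((Ns, N)) + 1j * rng.standard_normal((Ns, N))) / np.sqrt(2)
    for _ in range(K)
]


def bd_precoder(H_int, k, Ns):
    """Digital precoder of user k via block diagonalization."""
    # Stack the effective channels of all users except user k.
    H_tilde = np.vstack([H for j, H in enumerate(H_int) if j != k])
    # Full SVD: rows of Vh beyond the numerical rank span the null space.
    _, s, Vh = np.linalg.svd(H_tilde, full_matrices=True)
    rank = int(np.sum(s > 1e-10 * s[0]))
    V0 = Vh[rank:].conj().T  # null-space basis, plays the role of V~_k^(0)
    # SVD of the equivalent channel; keep the Ns dominant right singular vectors.
    _, _, Vh_eq = np.linalg.svd(H_int[k] @ V0)
    V1 = Vh_eq[:Ns].conj().T  # plays the role of V_k^(1)
    return V0 @ V1            # F_BB^k


F_BB = [bd_precoder(H_int, k, Ns) for k in range(K)]

# Largest leakage of any user's precoder into another user's effective channel.
leakage = max(
    np.linalg.norm(H_int[j] @ F_BB[k])
    for k in range(K) for j in range(K) if j != k
)
```

In this sketch `leakage` should be at the level of floating-point round-off, which is the numerical counterpart of the zero-interference constraint.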

BD can be combined with two power allocation strategies: average power allocation and water-filling power allocation. Since the transmission capacity of each channel is usually different, average power allocation wastes communication resources and can even reduce the communication capacity. The principle of the water-filling method is that each user's channel is divided into N independent sub-channels, each of which can be treated as a channel of bandwidth B. According to the Shannon formula, the capacity of the kth subchannel is:

$$\begin{aligned} C(k) = B{\log _2}\left( 1 + {\left| {{f_k}} \right| ^2}\frac{{{p_k}}}{{{n_0}}}\right) , \end{aligned}$$

where \({{p_k}}\), \(\left| {{f_k}} \right|\), and \({{n_0}}\) are the transmission power, frequency response, and noise component of the kth subchannel, respectively. When N is large enough, the SNR of each subchannel can be regarded as a constant. When the subchannel SNRs are known, different powers can be assigned to the different subchannels to achieve the maximum sum rate. Therefore, the maximum sum capacity can be expressed as:

$$\begin{aligned} &{\mathrm{max}}\; C = \sum \limits _{k = 1}^N {B{\log _2}\left( 1 + {{\left| {{f_k}} \right| }^2}\frac{{{p_k}}}{{{n_0}}}\right) } \\ & {\mathrm{s.t.}}\left\{ {\begin{array}{*{20}{l}} {\sum \limits _{k = 1}^N {{p_k} = {P_N}} }\\ {{p_k} \ge 0\;(k = 1,2, \ldots ,N)} \end{array},} \right.\end{aligned}$$

where \({{P_N}}\) is the total power. According to the Lagrange multiplier method, the power \({{p_k}}\) is:

$$\begin{aligned} {p_k} = \frac{B}{\lambda } - \frac{{{n_0}}}{{{{\left| {{f_k}} \right| }^2}}}, \end{aligned}$$

where \(\lambda\) is the Lagrange multiplier (with the constant factor \(\ln 2\) absorbed into it), and \(\frac{B}{\lambda }\) is called the water-filling line of the water-filling principle. Subchannels for which this expression would be negative are assigned \({p_k} = 0\).
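A minimal Python sketch of the water-filling allocation is given below. It assumes that the per-subchannel floor is \({n_0}/{\left| {{f_k}} \right| ^2}\), which is what the Lagrangian condition for the capacity expression above yields, and it finds the water level by bisection rather than by solving for \(\lambda\) in closed form. The function name `water_filling` and the example gains are illustrative choices, not values from the paper.

```python
def water_filling(gains, total_power, noise=1.0, tol=1e-12):
    """Return p_k = max(0, mu - noise / gains[k]) with sum(p_k) = total_power.

    gains[k] stands for |f_k|^2 and mu plays the role of the water line B / lambda.
    """
    floors = [noise / g for g in gains]        # n_0 / |f_k|^2
    lo, hi = 0.0, max(floors) + total_power    # the water level lies in [lo, hi]
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        used = sum(max(0.0, mu - f) for f in floors)
        if used > total_power:
            hi = mu    # too much power poured, lower the level
        else:
            lo = mu    # power left over, raise the level
    mu = 0.5 * (lo + hi)
    return [max(0.0, mu - f) for f in floors]


gains = [2.0, 1.0, 0.1]  # illustrative |f_k|^2 values
powers = water_filling(gains, total_power=1.0)
```

For these example gains the floors are 0.5, 1 and 10, so the water level settles at 1.25: the two stronger subchannels receive about 0.75 and 0.25 of the power, and the weakest one is switched off.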

The water-filling principle attains the theoretical maximum of the sum rate for a given total power and yields better communication quality, so it is widely used. The whole process of the algorithm in this paper is shown in Table 1.

Table 1 Algorithm 1

Results and discussion

In this section, we evaluate the performance of the proposed hybrid beamforming schemes with the sub-connected structure in MU-MIMO systems; the corresponding simulation results are described below [38]. All simulation results are averaged over 1000 channel realizations and were obtained in MATLAB on a 64-bit Windows 10 system with an Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz and 8.00 GB of RAM. For simplicity, the propagation environment is modeled with \({N_{\mathrm{c}}} = 8\) clusters and \({N_{\mathrm{p}}} = 10\) rays per cluster, and the inter-element spacing d is assumed to be half a wavelength. The AoA and AoD of each element are uniformly distributed in \(\left[ {0,2\pi } \right]\). Typical mmWave massive MIMO configurations with \({N_t} = 128\), \(N = 16\) and \({N_r} = 16\) are considered. The number of users is \(K{{\mathrm{= 4}}}\). The noise variance at each user is \({\sigma^2} = 1\), and the SNR is defined as \({{\mathrm{SNR}}} = \frac{{{P_N}}}{{{\sigma^2}}}\). (Note: Unless otherwise specified, the above parameters are the default parameters.) Although this paper focuses on the hybrid beamforming design of massive MIMO systems with the sub-connected architecture, we also compare the proposed method with state-of-the-art designs for the full-connected architecture: the HyEB scheme [28], which uses the least number of RF chains (equal to the number of transmitted streams), and the full-digital dirty paper coding (DPC) method [39]. Since DPC realized with the iterative water-filling algorithm has been shown to be capacity-achieving in the broadcast channel, it is used as the performance upper bound of the hybrid schemes. For the comparison among sub-connected methods, the analog precoder is found by the SIC method [33] and the digital precoder is obtained by BD technology; this method is named the SIC-BD algorithm. In addition, we choose the Full-Analog precoding algorithm for comparison. In this scheme, we consider the same parameters as for the other algorithms but do not consider inter-user interference; that is, the Full-Analog result in this case is an upper limit for the multi-user setting. For more convenient comparison and analysis, we denote the full-connected structure as FC and the sub-connected structure as SC in the following.
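For readers who want to reproduce the setup, the default parameters listed above can be collected in a small configuration object. The following Python sketch is only an illustration: the class name `SimConfig` and its field names are hypothetical, \(N\) is interpreted as the number of RF chains (sub-arrays), and the conversion from SNR in dB to \({P_N}\) simply inverts \({{\mathrm{SNR}}} = {P_N}/{\sigma^2}\).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimConfig:
    """Default simulation parameters as listed in the text (names are hypothetical)."""
    n_clusters: int = 8         # N_c
    n_rays: int = 10            # N_p, rays per cluster
    n_tx: int = 128             # N_t, BS antennas
    n_rf: int = 16              # N, interpreted here as RF chains / sub-arrays
    n_rx: int = 16              # N_r
    n_users: int = 4            # K
    noise_var: float = 1.0      # sigma^2
    n_realizations: int = 1000  # channel realizations per averaged point

    def transmit_power(self, snr_db: float) -> float:
        """P_N implied by SNR = P_N / sigma^2, with the SNR given in dB."""
        return self.noise_var * 10.0 ** (snr_db / 10.0)


cfg = SimConfig()
p_at_0_db = cfg.transmit_power(0.0)
```

With unit noise variance, an SNR of 0 dB corresponds to \({P_N} = 1\) and every additional 10 dB multiplies \({P_N}\) by ten.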

A. Performance for the sum rate

We first evaluate the sum rate of the different methods versus SNR for the ULA; the corresponding simulation results are shown in Fig. 3, where the SNR increases from − 20 to 20 dB. The result for a massive MIMO system with \({N_t} = 128\) is shown in (a), and the result for a MIMO system with \({N_t} = 32\) is shown in (b). The simulation results demonstrate that, with increasing SNR, the performance of the proposed hybrid precoding with the SC structure approaches that of HyEB [28] with the FC structure, and it is much higher than that of Full-Analog. To further investigate the performance of the proposed design with small antenna arrays, Fig. 3b shows the sum rate comparison for different beamforming schemes versus SNR when the number of BS antennas is small (\({N_t} = 32\)). In this case, the sum rate of the proposed algorithm is still slightly higher than that of SIC-BD and Full-Analog.

Fig. 3

Sum rate comparison for different beamforming schemes versus SNR in ULA. It illustrates the simulation results of the different precoding algorithms when the SNR increases from − 20 to 20 dB. The result for a massive MIMO system with \({N_t} = 128\) is shown in (a), and the result for a MIMO system with \({N_t} = 32\) is shown in (b)

Fig. 4

Sum rate comparison for different beamforming schemes versus SNR in UPA. a \({N_t} = 128\) and b \({N_t} = 32\)

The sum rate versus SNR for different precoding algorithms in UPA is displayed in Fig. 4, where (a) represents \({N_t} = 128\) and (b) represents \({N_t} = 32\). It can be seen from Fig. 4 that the sum rate of each algorithm under UPA decreases slightly compared with that under ULA. The performance of the proposed algorithm in Fig. 4a is significantly better than that of SIC-BD. In Fig. 4b, the proposed algorithm is closer to HyEB [28]. The sum rate of the Full-Analog algorithm is much lower than that of the other algorithms. Although the use of UPA in the MU-MIMO channel slightly decreases the overall performance of the proposed algorithm, the trend is still consistent with that for ULA. Furthermore, when the antenna deployment is changed from a linear array to a planar array, the area occupied by the antenna array at the base station is greatly reduced, and the space utilization of the base station and user devices is effectively improved.

B. Performance for the number of BS antennas

The sum rate versus the number of BS antennas for different precoding algorithms is displayed in Fig. 5, where SNR = 0 dB. We note that the performance of all algorithms improves as the number of BS antennas increases. When the number of BS antennas is large, the performance gap between the SC beamforming schemes and the FC hybrid beamforming scheme becomes larger, but the proposed design is still better than SIC-BD. Moreover, when the number of BS antennas is small, the performance gap between the proposed beamforming scheme and the HyEB [28] scheme is small. The sum rate of the Full-Analog method is far lower than that of the proposed algorithm.

Fig. 5

Sum rate comparison for different hybrid precoding schemes versus the number of BS antennas. The graph shows the result of the number of antennas from 64 to 256 when SNR = 0 dB

C. Performance for the number of users

Figure 6 compares the sum rate performance of different precoding schemes versus the number of users with SNR = 5 dB, where the number of users changes from 2 to 12. We can see that the proposed method is very close to SIC-BD, but its overall performance is still better than that of SIC-BD. As the number of users increases, the sum rate of the different design methods increases. This also indicates that, as the scale of the system increases, the proposed design effectively eliminates inter-user interference and thereby improves the performance of the system. The sum rate of the Full-Analog algorithm does not change significantly with the number of users, and its growth rate is the smallest among the algorithms.

Fig. 6

The sum rate comparison for different hybrid precoding schemes versus the numbers of users. It compares the sum rate performance of different precoding schemes versus the number of users with SNR = 5 dB, where the number of users changes from 2 to 12

Table 2 The running time of five schemes (unit, second)

To compare the computational complexity of the schemes, we list the running time of the five schemes in Table 2, averaged over 100 random channel realizations. Regardless of the computer hardware, we find that the running time of the full-connected structure schemes is considerably larger. Although the full-connected structure has better performance, it has the disadvantages of complicated layout, high cost, and excessive power consumption. For the sub-connected structure, the hardware complexity and energy consumption are reduced, and the performance is not significantly different from that of the full-connected structure. The proposed algorithm is slightly slower than SIC-BD in running time, but its performance is ahead of SIC-BD due to its screening optimization. The Full-Analog algorithm runs faster, but its performance gap is obvious.

D. Performance for data streams per user

Figure 7 shows the sum rates achieved by different hybrid precoding schemes for different numbers of data streams per user, where \({N_s} = 2\), 4. Considering the costs and power consumption, we find that the performance of the different hybrid precoding schemes with SC is similar, but the proposed method is closer to the HyEB [28] scheme when the number of data streams per user is small, i.e., \({N_s}= 2\). When the number of data streams provided by the system increases, the gaps between the sum rates of the different schemes become correspondingly larger. However, the proposed hybrid precoding scheme still performs better than SIC-BD for both numbers of data streams.

Fig. 7

Sum rate comparison for hybrid precoding schemes versus \({N_s}\). The dotted line indicates sum rate when the number of data streams per user is \({N_s} = 2\). The solid line expresses sum rate when the number of data streams per user is \({N_s} = 4\)

E. Performance for the power efficiency

As mentioned in Sect. 1, the power consumption is an important issue which should be considered for both the SC and FC hybrid precoding. In this subsection, we aim to compare the power efficiency performance of different hybrid precoding design schemes.

To better compare the performance of the two hybrid precoding structures, the power efficiency \(\eta\) is defined as the ratio between the achievable rate R and the total power consumption \({P_{{{\mathrm{total }}}}}\), which is expressed as follows:

$$\begin{aligned} \eta = \frac{R}{{{P_{{{\mathrm{total }}}}}}}({{\mathrm{bps}}}/{{{\mathrm{Hz}}}}/J), \end{aligned}$$

where the unit of \(\eta\) is bps/Hz/J and \({P_{{{\mathrm{total }}}}}\) is the total power consumption of the system.

In the hybrid precoding architecture, the power is consumed by five blocks [40]: (a) the phase shifters (PS) on the transmitter side; (b) the RF chains on the transmitter side; (c) the digital-to-analog converters (DAC) on the transmitter side; (d) the base-band (BB) processor; (e) the power amplifiers (PA) on the transmitter side.

Considering the full-digital precoding for MIMO, the amounts of power consumed by BS and users in full-digital MIMO architecture are written as

$$\begin{aligned} {P_{{{\mathrm{total,(DPC) }}}}} = {N_t}({P_{{{\mathrm{RF}}}}} + {P_{{\mathrm{DAC}}}} + {P_{{\mathrm{PA}}}}) + {P_{\mathrm{BB}}}, \end{aligned}$$

where \({P_{\mathrm{BB}}}\), \({P_{{{\mathrm{RF}}}}}\), \({P_{{\mathrm{PA}}}}\), \({P_{{\mathrm{PS}}}}\), and \({P_{{\mathrm{DAC}}}}\) are the power of BB, the power of each RF chain, the power of each PA, the power of each PS, and the power of each DAC, respectively.

Different from the full-digital precoding for MIMO, the total power consumption \({P_{{{\mathrm{total }}}}}\) in the hybrid precoding architecture can be written as

$$\begin{aligned} {P_{{{\mathrm{total,(Hybrid)}}}}} = N({P_{{{\mathrm{RF}}}}} + {P_{{\mathrm{DAC}}}} + {N_{{\mathrm{PS}}}}{P_{{\mathrm{PS}}}}) + {N_t}{P_{{\mathrm{PA}}}} + {P_{\mathrm{BB}}}. \end{aligned}$$

The simulation parameters according to [41,42,43] are set as follows: \({P_{\mathrm{BB}}}=243\,\mathrm{mW}\), \({P_{{{\mathrm{RF}}}}}=40\,\mathrm{mW}\), \({P_{{\mathrm{PA}}}}=16\,\mathrm{mW}\), \({P_{{\mathrm{DAC}}}}=110\,\mathrm{mW}\) and \({P_{{\mathrm{PS}}}}=10\,\mathrm{mW}\).

Here we note that for the FC and the SC structures, the number of phase shifters \({N_{{\mathrm{PS}}}}\) can be written as

$$\begin{aligned} {N_{{\mathrm{PS}}}} = \left\{ {\begin{array}{*{20}{l}} {NM,\quad FC}\\ {M,\quad SC} \end{array}} \right. \end{aligned}$$
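The two power models and the efficiency ratio can be evaluated directly with the component values listed above. The short Python sketch below does this for the default configuration \({N_t} = 128\), \(N = 16\); the sub-array size \(M = {N_t}/N = 8\) is an assumption made here, the function names are hypothetical, and the rate passed to `power_efficiency` has to come from a simulation (none is invented here).

```python
# Component powers in mW, as listed in the text.
P_BB, P_RF, P_PA, P_DAC, P_PS = 243, 40, 16, 110, 10


def total_power_full_digital(n_t):
    """Total power (mW) of the full-digital architecture."""
    return n_t * (P_RF + P_DAC + P_PA) + P_BB


def total_power_hybrid(n_t, n, n_ps):
    """Total power (mW) of a hybrid architecture with n_ps phase shifters per RF chain."""
    return n * (P_RF + P_DAC + n_ps * P_PS) + n_t * P_PA + P_BB


def power_efficiency(rate, p_total_mw):
    """Ratio of achievable rate to total power, with the power converted to watts."""
    return rate / (p_total_mw / 1000.0)


N_t, N = 128, 16
M = N_t // N  # assumed number of antennas per sub-array

p_dpc = total_power_full_digital(N_t)
p_fc = total_power_hybrid(N_t, N, n_ps=N * M)  # FC: N_PS = NM
p_sc = total_power_hybrid(N_t, N, n_ps=M)      # SC: N_PS = M
```

Under these assumptions the sketch evaluates to 21491 mW for the full-digital model, 25171 mW for the FC hybrid model, and 5971 mW for the SC hybrid model.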

Figure 8 compares the power efficiency of different hybrid precoding schemes versus SNR. It is observed from Fig. 8 that the power efficiency of the different hybrid precoding methods with the SC structure is similar, and it is higher than that of the hybrid precoding schemes with the FC structure. The proposed algorithm is superior to SIC-BD over the whole SNR range: it transmits the signal more efficiently than SIC-BD at the same SNR and power consumption, which means it has higher power efficiency. Moreover, since the full-digital MIMO architecture requires more hardware and produces higher power consumption, its power efficiency is relatively low compared with the hybrid architecture. Therefore, the full-digital MIMO architecture is rarely used for signal propagation in practical applications.

Fig. 8

Power efficiency comparison for different hybrid precoding schemes versus SNR. The figure shows that the proposed SC method performs better than the SIC-BD SC method. The energy consumption of the full-connected structure is higher than that of the sub-connected structure

F. Performance for sensitivity of channel estimation errors

Finally, we evaluate the impact of imperfect CSI on the proposed hybrid precoding. Let \({{{\tilde{\mathbf{H }}}}}\) represent the estimated channel; then it can be modeled as [44]

$$\begin{aligned} {{{\tilde{\mathbf{H }}}}} = \xi {\mathbf{H }} + \sqrt{1 - {\xi ^2}} {\mathbf{E }}, \end{aligned}$$

where \(\xi \in [0,1]\) expresses the accuracy of the estimated CSI, and \({\mathbf{E }}\) is the error matrix with i.i.d. entries following the distribution \({{{\mathcal{C}}}}{{{\mathcal{N}}}}(0,1)\).
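The estimation-error model can be simulated in a few lines. The Python sketch below is a minimal illustration using only the standard library; the helper names, the matrix size, and the random stand-in for the true channel \({\mathbf{H }}\) are assumptions, since the channel model itself is defined elsewhere in the paper.

```python
import math
import random


def complex_gaussian_matrix(rows, cols, rng):
    """Matrix with i.i.d. CN(0, 1) entries (real and imaginary parts have variance 1/2)."""
    s = math.sqrt(0.5)
    return [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(cols)]
            for _ in range(rows)]


def imperfect_csi(H, xi, rng):
    """Return xi * H + sqrt(1 - xi^2) * E, where E has i.i.d. CN(0, 1) entries."""
    if not 0.0 <= xi <= 1.0:
        raise ValueError("xi must lie in [0, 1]")
    E = complex_gaussian_matrix(len(H), len(H[0]), rng)
    w = math.sqrt(1.0 - xi * xi)
    return [[xi * h + w * e for h, e in zip(h_row, e_row)]
            for h_row, e_row in zip(H, E)]


rng = random.Random(0)
H = complex_gaussian_matrix(4, 16, rng)  # random stand-in for a true channel
H_est = imperfect_csi(H, xi=0.9, rng=rng)
H_perfect = imperfect_csi(H, xi=1.0, rng=rng)
```

Setting `xi=1.0` returns the true channel unchanged, while `xi=0.0` replaces it entirely by the error matrix; the weights \(\xi\) and \(\sqrt{1 - {\xi ^2}}\) keep the average entry power unchanged when \({\mathbf{H }}\) itself has unit-variance entries.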

Fig. 9

Impact of imperfect CSI on the proposed scheme. It compares the sum rate performance of proposed algorithm based on different CSI

It can be seen from Fig. 9 that the proposed hybrid precoding method is not very sensitive to the CSI accuracy over the considered SNR range. Even when the channel estimation accuracy is not high, the proposed method can obtain a considerable sum rate, which is particularly noticeable at low SNR. When SNR = 15 dB and \(\xi = 0.9\), the performance of the proposed method is quite close to that in the perfect CSI condition, achieving about 96.9% of the perfect-CSI sum rate. Even when \(\xi = 0.6\), the proposed method can still achieve about 84.1% of the rate in the perfect CSI condition. In this case, only 19.16 bps/Hz is lost compared with the case where the CSI is perfectly known at the transmitter. Therefore, the proposed method has strong robustness and certain practical value.


This paper has proposed a hybrid precoding scheme for MU-MIMO systems. According to the structure of the optimal hybrid precoding matrix, we decompose the maximum achievable rate optimization problem into a series of sub-rate optimization problems. First, we design the analog precoder and optimize it to maximize the overall analog beamforming gain, and then apply BD technology to the equivalent baseband channel. Finally, the sum rate is further improved by water-filling power allocation. The simulation results agree with the theoretical analysis and show that the proposed multi-user scheme can achieve an appropriate compromise between hardware complexity and system performance. Both the sum rate and the energy efficiency are improved, and the algorithm has strong robustness. One perspective of this work is an extension to mmWave MIMO systems relying on lens antenna arrays [45], which have a small number of radio-frequency chains. In such an extension, the impact of pilot-data transmission [46, 47] on the overall system performance should also be considered for practical applications. Future work may also consider the performance change when the receiver is imperfect, which will be more practical in future applications.

Data Availability

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.



the fifth generation

multiple-input multiple-output

phase shifter

radio frequency

single-user MIMO

multi-user MIMO

orthogonal matching pursuit

hybrid block diagonalization

equal gain transmission

hybrid zero-forcing

hybrid minimum-mean-squared-error

base station

channel state information

successive interference cancellation

independent and identically distributed

signal-to-interference-plus-noise ratio

uniform linear array

uniform planar array

singular value decomposition

dirty paper coding

  1. 1.

    J. Hoydis, S. Ten Brink, M. Debbah, Massive MIMO in the UL/DL of cellular networks: how many antennas do we need? IEEE J. Sel. Areas Commun. 31(2), 160–171 (2013)

    Article  Google Scholar 

  2. 2.

    E.G. Larsson, O. Edfors, F. Tufvesson, T.L. Marzetta, Massive MIMO for next generation wireless systems. IEEE Commun. Mag. 52(2), 186–195 (2014)

    Article  Google Scholar 

  3. 3.

    E. Björnson, L. Sanguinetti, M. Kountouris, Deploying dense networks for maximal energy efficiency: small cells meet massive MIMO. IEEE J. Sel. Areas Commun. 34(4), 832–847 (2016)

    Article  Google Scholar 

  4. 4.

    C. Li, J. Zhang, K.B. Letaief, Throughput and energy efficiency analysis of small cell networks with multi-antenna base stations. IEEE Trans. Wirel. Commun. 13(5), 2505–2517 (2014)

    Article  Google Scholar 

  5. 5.

    A.L. Swindlehurst, E. Ayanoglu, P. Heydari, F. Capolino, Millimeter-wave massive MIMO: the next wireless revolution? IEEE Commun. Mag. 52(9), 56–62 (2014)

    Article  Google Scholar 

  6. 6.

    W. Roh, J.-Y. Seol, J. Park, B. Lee, J. Lee, Y. Kim, J. Cho, K. Cheun, F. Aryanfar, Millimeter-wave beamforming as an enabling technology for 5G cellular communications: theoretical feasibility and prototype results. IEEE Commun. Mag. 52(2), 106–113 (2014)

    Article  Google Scholar 

  7. 7.

    V. Venkateswaran, A.-J. van der Veen, Analog beamforming in MIMO communications with phase shift networks and online channel estimation. IEEE Trans. Signal Process. 58(8), 4131–4143 (2010)

    MathSciNet  MATH  Article  Google Scholar 

  8. 8.

    S. Kutty, D. Sen, Beamforming for millimeter wave communications: an inclusive survey. IEEE Commun. Surv. Tutor. 18(2), 949–973 (2015)

    Article  Google Scholar 

  9. 9.

    S. Hur, T. Kim, D.J. Love, J.V. Krogmeier, T.A. Thomas, A. Ghosh, Millimeter wave beamforming for wireless backhaul and access in small cell networks. IEEE Trans. Commun. 61(10), 4391–4403 (2013)

    Article  Google Scholar 

  10. 10.

    J. Joung, A.H. Sayed, Multiuser two-way amplify-and-forward relay processing and power control methods for beamforming systems. IEEE Trans. Signal Process. 58(3), 1833–1846 (2009)

    MathSciNet  MATH  Article  Google Scholar 

  11. 11.

    A. Azizzadeh, R. Mohammadkhani, S.V.A.-D. Makki, E. Björnson, BER performance analysis of coarsely quantized uplink massive MIMO. Signal Process. 161, 259–267 (2019)

    Article  Google Scholar 

  12. 12.

    J. Zhang, Y. Huang, T. Yu, J. Wang, M. Xiao, Hybrid precoding for multi-subarray millimeter-wave communication systems. IEEE Wirel. Commun. Lett. 7(3), 440–443 (2017)

    Article  Google Scholar 

  13. 13.

    L. Dai, B. Wang, M. Peng, S. Chen, Hybrid precoding-based millimeter-wave massive MIMO-NOMA with simultaneous wireless information and power transfer. IEEE J. Sel. Areas Commun. 37(1), 131–141 (2018)

    Article  Google Scholar 

  14. 14.

    K. Song, B. Ji, Y. Huang, M. Xiao, L. Yang, Performance analysis of heterogeneous networks with interference cancellation. IEEE Trans. Veh. Technol. 66(8), 6969–6981 (2017)

    Article  Google Scholar 

  15. 15.

    C. Zhang, Y. Huang, Y. Jing, S. Jin, L. Yang, Sum-rate analysis for massive MIMO downlink with joint statistical beamforming and user scheduling. IEEE Trans. Wirel. Commun. 16(4), 2181–2194 (2017)

    Article  Google Scholar 

  16. 16.

    O. El Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, R.W. Heath, Spatially sparse precoding in millimeter wave MIMO systems. IEEE Trans. Wirel. Commun. 13(3), 1499–1513 (2014)

    Article  Google Scholar 

  17. 17.

    M. Majidzadeh, A. Moilanen, N. Tervo, H. Pennanen, A. Tölli, M., Latva-aho, Hybrid beamforming for single-user MIMO with partially connected RF architecture, in 2017 European Conference on Networks and Communications (EuCNC) (IEEE, 2017), pp. 1–6

  18. 18.

    F. Sohrabi, W. Yu, Hybrid digital and analog beamforming design for large-scale antenna arrays. IEEE J. Sel. Top. Signal Process. 10(3), 501–513 (2016)

    Article  Google Scholar 

  19. 19.

    Z. Pi, Optimal transmitter beamforming with per-antenna power constraints, in 2012 IEEE International Conference on Communications (ICC) (IEEE, 2012), pp. 3779–3784

  20. 20.

    F. Sohrabi, W. Yu, Hybrid beamforming with finite-resolution phase shifters for large-scale MIMO systems, in 2015 IEEE 16th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC) (IEEE, 2015), pp. 136–140

  21. 21.

    T.E. Bogale, L.B. Le, Beamforming for multiuser massive MIMO systems: digital versus hybrid analog–digital, in 2014 IEEE Global Communications Conference (IEEE, 2014), pp. 4066–4071

  22. 22.

    W. Ni, X. Dong, Hybrid block diagonalization for massive multiuser MIMO systems. IEEE Trans. Commun. 64(1), 201–211 (2015)

    Article  Google Scholar 

  23. 23.

    A. Li, C. Masouros, Hybrid analog–digital millimeter-wave MU-MIMO transmission with virtual path selection. IEEE Commun. Lett. 21(2), 438–441 (2016)

    Article  Google Scholar 

  24. 24.

    A. Li, C. Masouros, Hybrid precoding and combining design for millimeter-wave multi-user MIMO based on SVD, in 2017 IEEE International Conference on Communications (ICC) (IEEE, 2017), pp. 1–6

  25. 25.

    J. Jiang, Y. Yuan, L. Zhen, Multi-user hybrid precoding for dynamic subarrays in mmWave massive MIMO systems. IEEE Access 7, 101718–101728 (2019)

    Article  Google Scholar 

  26. 26.

    Z. Wang, M. Li, Q. Liu, A.L. Swindlehurst, Hybrid precoder and combiner design with low-resolution phase shifters in mmWave MIMO systems. IEEE J. Sel. Top. Signal Process. 12(2), 256–269 (2018)

    Article  Google Scholar 

  27. 27.

    N. Song, T. Yang, H. Sun, Overlapped subarray based hybrid beamforming for millimeter wave multiuser massive MIMO. IEEE Signal Process. Lett. 24(5), 550–554 (2017)

    Article  Google Scholar 

  28. 28.

    C. Hu, J. Liu, X. Liao, Y. Liu, J. Wang, A novel equivalent baseband channel of hybrid beamforming in massive multiuser MIMO systems. IEEE Commun. Lett. 22(4), 764–767 (2017)

    Article  Google Scholar 

  29. 29.

    S. Payami, M. Ghoraishi, M. Dianati, M. Sellathurai, Hybrid beamforming with a reduced number of phase shifters for massive MIMO systems. IEEE Trans. Veh. Technol. 67(6), 4843–4851 (2018)

    Article  Google Scholar 

  30. 30.

    A. Li, C. Masouros, Energy-efficient SWIPT: from fully digital to hybrid analog–digital beamforming. IEEE Trans. Veh. Technol. 67(4), 3390–3405 (2017)

    Article  Google Scholar 

  31. 31.

    F. Sohrabi, W. Yu, Hybrid analog and digital beamforming for mmWave OFDM large-scale antenna arrays. IEEE J. Sel. Areas Commun. 35(7), 1432–1443 (2017)

    Article  Google Scholar 

  32. 32.

    J.-C. Chen, Hybrid beamforming with discrete phase shifters for millimeter-wave massive MIMO systems. IEEE Trans. Veh. Technol. 66(8), 7604–7608 (2017)

    Article  Google Scholar 

  33. 33.

    L. Dai, X. Gao, J. Quan, S. Han, I. Chih-Lin, Near-optimal hybrid analog and digital precoding for downlink mmWave massive MIMO systems, in 2015 IEEE International Conference on Communications (ICC) (IEEE, 2015), pp. 1334–1339

  34. 34.

    X. Gao, L. Dai, S. Han, I. Chih-Lin, R.W. Heath, Energy-efficient hybrid analog and digital precoding for mmWave MIMO systems with large antenna arrays. IEEE J. Sel. Areas Commun. 34(4), 998–1009 (2016)

  35.

    Z. Pi, F. Khan, An introduction to millimeter-wave mobile broadband systems. IEEE Commun. Mag. 49(6), 101–107 (2011)

  36.

    R.W. Heath, N. Gonzalez-Prelcic, S. Rangan, W. Roh, A.M. Sayeed, An overview of signal processing techniques for millimeter wave MIMO systems. IEEE J. Sel. Top. Signal Process. 10(3), 436–453 (2016)

  37.

    M.R. Akdeniz, Y. Liu, M.K. Samimi, S. Sun, S. Rangan, T.S. Rappaport, E. Erkip, Millimeter wave channel modeling and cellular capacity evaluation. IEEE J. Sel. Areas Commun. 32(6), 1164–1179 (2014)

  38.

    Y. Zhang, J. Du, Y. Chen, M. Han, X. Li, Optimal hybrid beamforming design for millimeter-wave massive multi-user MIMO relay systems. IEEE Access 7, 157212–157225 (2019)

  39.

    N. Jindal, W. Rhee, S. Vishwanath, S.A. Jafar, A. Goldsmith, Sum power iterative water-filling for multi-antenna Gaussian broadcast channels. IEEE Trans. Inf. Theory 51(4), 1570–1580 (2005)

  40.

    R. Méndez-Rial, C. Rusu, N. González-Prelcic, A. Alkhateeb, R.W. Heath, Hybrid MIMO architectures for millimeter wave communications: phase shifters or switches? IEEE Access 4, 247–267 (2016)

  41.

    C.-E. Chen, An iterative hybrid transceiver design algorithm for millimeter wave MIMO systems. IEEE Wirel. Commun. Lett. 4(3), 285–288 (2015)

  42.

    T.S. Rappaport, J.N. Murdock, F. Gutierrez, State of the art in 60-GHz integrated circuits and systems for wireless communications. Proc. IEEE 99(8), 1390–1436 (2011)

  43.

    C.A. Balanis, Antenna Theory: Analysis and Design (Wiley, Hoboken, 2016)

  44.

    R.A. Horn, C.R. Johnson, Topics in Matrix Analysis (Cambridge University Press, Cambridge, 1991)

  45.

    X. Gao, L. Dai, S. Zhou, A.M. Sayeed, L. Hanzo, Wideband beamspace channel estimation for millimeter-wave MIMO systems relying on lens antenna arrays. IEEE Trans. Signal Process. 67(18), 4809–4824 (2019)

  46.

    J. Du, M. Han, L. Jin, Y. Hua, X. Li, Semi-blind receivers for multi-user massive MIMO relay systems based on block Tucker2-PARAFAC tensor model. IEEE Access 8, 32170–32186 (2020)

  47.

    Z. Zhou, L. Liu, J. Zhang, FD-MIMO via pilot-data superposition: tensor-based DOA estimation and system performance. IEEE J. Sel. Top. Signal Process. 13(5), 931–946 (2019)



Acknowledgements

The authors thank the anonymous reviewers and editors for their constructive and generous feedback.


Funding

This research was supported by grants from the National Natural Science Foundation of China (Nos. 61601414, 61702466), the National Key Research and Development Program of China (No. 2016YFB0502001), and the Fundamental Research Funds for the Central Universities (No. 2018CUCTJ082).

Author information

Z.W. and L.J. developed the software. Z.W. and Y.Z. performed validation. J.D. and Z.W. prepared the original draft. Y.Z., Y.G. and L.J. reviewed, proofread and edited the manuscript. J.D. and Y.G. supervised the work. J.D. was responsible for conceptualization and methodology. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Jianhe Du.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Du, J., Wang, Z., Zhang, Y. et al. Multi-user hybrid precoding for mmWave massive MIMO systems with sub-connected structure. J Wireless Com Network 2021, 157 (2021).

Keywords


  • Beamforming
  • Massive MIMO
  • Near-optimal
  • Hybrid precoding
  • Sub-connected structure