Spectral-efficient hybrid precoding for multi-antenna multi-user mmWave massive MIMO systems with low complexity

Millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) systems support data rates of gigabits per second owing to the large bandwidth available in the mmWave spectrum and the antenna gains provided by massive MIMO. However, existing hybrid precoding schemes with high complexity and low spectral efficiency cannot adequately address the high cost and power consumption of the RF chains in multi-user systems. In this paper, we propose a low-complexity hybrid precoding scheme for downlink multi-antenna multi-user mmWave massive MIMO systems that aims to enhance the sum spectral efficiency (SSE). We first extend the analog precoding matrix to a square matrix and find the optimal analog combiner by selecting a subset of discrete Fourier transform (DFT) bases, which enhances the gain of the equivalent baseband channel matrix. Then, we directly aggregate the channel gain through the equal gain transmission (EGT) method to ensure the spectral efficiency performance. Finally, we propose an improved block diagonalization (BD) scheme to design the digital precoder and combiner and reduce the inter-user interference. We evaluate the proposed algorithm over both the mmWave channel and the Rayleigh channel. The simulation results verify that the proposed scheme achieves near-optimal sum spectral efficiency and bit-error-rate (BER) performance in both channels, and that it performs even better in the Rayleigh channel than in the mmWave channel.

communication systems [3]. A massive MIMO antenna array can overcome the high path loss of the mmWave channel. Furthermore, the small wavelength and narrow beams of millimeter waves allow systems to employ large-scale antenna arrays [4], which can be packed into small form factors and achieve reasonable array gains. Therefore, combining mmWave and massive MIMO effectively realizes the complementary advantages of the two technologies [5][6][7][8].
Implementing high-quality communication links in mmWave massive MIMO systems requires the deployment of large antenna arrays at the base station (BS). Each BS needs to serve multiple mobile users (MUs) simultaneously for efficient system performance. Therefore, precoding is applied at the BS to generate the transmitted signal, multiplexing various data streams to different users and mitigating noise, inter-user interference and fast fading. Traditional fully digital precoding reduces the interference between data streams and users as well as the computational complexity at the receiver [9,10]. However, fully digital precoding requires every antenna element to be driven by its own dedicated, energy-intensive radio frequency (RF) chain, which imposes extremely high cost and energy consumption [11]. In contrast, in fully analog precoding the antenna array is connected to only one RF chain through phase shifters [12,13]. Since a phase shifter cannot control the amplitude, the performance of fully analog precoding is restricted. To solve these problems, hybrid precoding schemes have been proposed that implement cost-effective variable phase shifters in the RF domain and carry out reduced-dimensional signal processing digitally in the baseband [14]. Hybrid precoding can reduce the number of RF chains without obvious performance loss [15], thus achieving a tradeoff between system performance and hardware complexity. Because of its resistance to multipath fading [16] and interference, its high spectral efficiency [17], and its high energy efficiency [18], hybrid precoding has become an important signal processing technology in mmWave massive MIMO systems [19][20][21][22].
Hybrid precoding algorithms for mmWave massive MIMO systems can be divided into three categories according to the number of users and the number of antennas per user. The first is the single-user system, in which the user carries multiple antennas. [23] proposed a hybrid precoding scheme that selects the analog and digital precoders from a discrete Fourier transform (DFT)-based codebook to maximize the spectral efficiency of a single-user system. The theoretical basis of this algorithm is to minimize the chordal distance between the optimal unconstrained precoder, obtained from the dominant right singular vector of the channel, and the hybrid beamformers selected from the statistical DFT codebook. However, it aims to minimize the performance loss under the assumption of limited-feedback CSI at the transmitter. [24] proposed a double-pilot-based hybrid precoding system, which predicts the analog precoder with a deep learning method and updates the equivalent channel frequently for the digital precoder by increasing the frequency of equivalent channel estimation. This method is flexible but consumes many pilots and much computation time because of the time-varying influence. [25] proposed a switch-based hybrid precoding algorithm that selects a subset of antennas to feed through the RF switching network and delivers a lower-dimensional array to the digital domain. Although this algorithm permits a low-complexity hardware implementation, devising such a hybrid beamformer can be computationally expensive for large-dimensional systems due to the binary nature of the selection problem. Since the single-user system cannot meet the practical need to serve multiple user devices simultaneously, multi-user systems have attracted more attention. There are two categories of multi-user systems: in one, each user carries a single antenna; in the other, each user adopts multiple antennas to transmit signals.
In single-antenna multi-user systems, it is difficult to jointly consider the changes in user-specific signal quality and multi-user interference. [26] proposed a low-cost hybrid precoding algorithm based on quantized CSI feedback for a single-antenna scenario and revealed that pure analog precoding outperforms hybrid precoding in terms of the ergodic achievable rate under certain conditions, which were derived in closed form with respect to the SNR, the number of users and the number of feedback bits. However, directly quantized feedback still incurs a large overhead to maintain usable accuracy. In [27], a deep learning quantized-phase hybrid precoding algorithm was proposed. This algorithm estimates the channel vectors with a deep compressed sensing algorithm and trains the hybrid precoding neural network offline with the estimated channel vectors. Then, in the online state, the analog precoding matrix is obtained by ideal phase quantization and the digital precoding matrix is obtained via the zero-forcing (ZF) algorithm. However, the performance of the deep learning-based CSI feedback architecture declines significantly when the dataset is insufficient. Furthermore, the compressed sensing method used to obtain the estimated channel matrix is iterative, which also leads to poor real-time performance.
Since single-antenna multi-user systems cannot meet the requirement of multiple antennas at the user terminals, multi-antenna multi-user systems with larger antenna arrays and stronger interference have attracted more attention in recent years. Hybrid precoding in multi-antenna multi-user mmWave massive MIMO systems can be divided, according to the design steps, into joint optimization and two-stage optimization of the analog and digital precoders. In [28], a convolutional neural network (CNN) framework was first proposed to optimize hybrid precoding for mmWave multi-user MIMO (MU-MIMO) systems, in which the network takes the imperfect channel matrix as input and jointly produces the analog and digital precoders as outputs. This method can effectively enhance the performance and efficiency of the system, but its multiple large-dimensional layers may consume tremendous computation time in the training phase, which is impractical under the hardware constraints (e.g., limited computational capability and memory resources) of mobile terminals. Generally, joint optimization should be an essential part of designing a precoding scheme because of the comprehensiveness of its problem formulation. Nevertheless, the signaling overhead is very heavy because joint optimization requires CSI for the whole system, which is equipped with large-dimensional antenna arrays. Additionally, the optimal solution is hardly tractable because the joint optimization is a non-convex mixed-integer problem. Moreover, digital precoding and analog beamforming are asynchronous processes in practical applications. Therefore, to ease the hardware restrictions and account for the asynchronism of the analog and digital precoding processes, two-stage algorithms are used in practical designs [29][30][31][32].
Although the two-stage method may not achieve the optimal precoding design that joint optimization offers through its comprehensive problem formulation, it requires less feedback and fewer training signals because the two stages are treated as asynchronous processes. Therefore, the two-stage method has been widely appreciated in recent years. [29] proposed an alternating optimization scheme to design the analog precoder and combiner, and [30] sorted the antenna sub-arrays according to the capacity of the channel before optimization. Both algorithms rely on iterative optimization, which increases the computational complexity of implementation. In [31], the authors developed an adaptive two-stage reduced-dimensionality multi-user hybrid precoding algorithm with limited feedback. The considered model assumes that each user supports only a single data stream. The problem becomes more complicated if the transmission of multiple data streams per user and a hybrid structure at the user are also taken into account. [32] proposed a two-stage hybrid precoding algorithm based on equal gain transmission (EGT) that aims to maximize the end-to-end mutual information (EEMI). Apart from the low complexity of its implementation, the developed scheme is a more generic hybrid block diagonalization (HBD) solution, as it not only takes the frequency selectivity into account but also removes the reliance on a high-resolution analog network. However, the traditional EGT method treats the search for the optimal analog precoding/combining matrix as an optimization over multiple coupled variables, which is non-convex and nonlinear. Moreover, the spectral efficiency of this algorithm degrades under serious inter-user interference.
To achieve a tradeoff between system performance and complexity, two-stage hybrid precoding with low complexity and high spectral efficiency becomes a crucial solution to combat path loss and interference in multi-antenna multi-user mmWave massive MIMO systems.
In this paper, we propose a spectral-efficient hybrid precoding algorithm for a downlink multi-antenna multi-user mmWave massive MIMO system. To ease the hardware restrictions and avoid iterative processing while ensuring the performance, we split the design into the analog precoder/combiner and their digital counterparts. Because a large spectral efficiency can be obtained by increasing the equivalent baseband channel gain, we first propose to extend the analog precoding matrix to a full-rank square matrix and find the optimal analog combiner in the analog precoder/combiner stage. Then, we design the analog combiner with an EGT-based method to aggregate the channel gain. Different from traditional EGT algorithms [32], we propose to represent the codebook by a discrete Fourier transform (DFT) matrix, because the column vectors of a DFT matrix are mutually orthogonal and can be combined linearly to synthesize the array response vector in any direction [33]. This turns the analog combiner design into a sparse optimization problem in a single variable. Moreover, because the number of receive antennas per user in mmWave massive MIMO is much smaller than the number of BS antennas, and because of the hardware restrictions in practical applications, an exhaustive search over the DFT bases is acceptable. Finally, BD precoding is performed on the equivalent channel matrix to eliminate the inter-user interference. The contributions of the paper are summarized as follows.
(1) We divide the calculation of hybrid precoding into analog and digital stages to reduce the signaling overhead, and we formulate the spectral efficiency optimization as the maximization of the equivalent channel gain in a multi-antenna multi-user mmWave massive MIMO system.
(2) We propose to define the analog precoder as a square matrix to obtain a large equivalent channel gain and use the EGT method to reap the diversity benefit of an analog phased array. Then, we construct a DFT basis by discretizing the codebook extracted from the equivalent channel gain, so that the vectors of the analog combiner can be selected from the DFT basis. The proposed design obtains a large array gain by enlarging the sum of the squares of the diagonal terms with low computational complexity.
(3) We use low-dimensional block diagonalization for digital precoding and combining, which reduces the complexity and improves the spectral efficiency by reducing the inter-user interference of the multi-antenna multi-user system.
(4) The theoretical analysis shows that the proposed hybrid precoding scheme can improve the spectral efficiency with low complexity. We also provide numerical results demonstrating that the proposed scheme achieves a better spectral efficiency and bit-error-rate than some existing schemes under both Saleh-Valenzuela mmWave and Rayleigh channels.
The rest of this paper is organized as follows. In Sect. 3, the system model for a multi-user massive MIMO system is described. In Sect. 4, a low-complexity hybrid precoding algorithm is proposed for downlink multi-antenna multi-user mmWave massive MIMO systems, aiming to enhance the sum spectral efficiency (SSE). Moreover, the computational complexity of the proposed algorithm is compared with that of other traditional algorithms in a theoretical analysis. The simulation results and the conclusion are presented in Sects. 5 and 6, respectively.
Notation: We use bold-faced lower-case and upper-case letters to denote column vectors and matrices, respectively. $(\cdot)^{-1}$ and $(\cdot)^{H}$ represent the inverse and the conjugate transpose of a matrix, respectively. $E[\cdot]$ denotes the expectation. $\|\cdot\|$ and $\|\cdot\|_F$ denote the norm of a vector and the Frobenius norm of a matrix, respectively. Finally, $\mathcal{CN}(0, 1)$ represents the complex Gaussian distribution with zero mean and unit variance.

Methodology
In this paper, we first introduce the research background and related hybrid precoding methods for mmWave massive MIMO systems in Sect. 1. Compared with the single-user scenario, the multi-user scenario satisfies the practical demand for multi-user service in modern communication systems. Compared with the multi-user single-antenna scenario, the multi-user multi-antenna scenario meets the requirement on the number of users that must be served in an actual mmWave communication system. However, existing hybrid precoding algorithms cannot maximize the spectral efficiency with low computational complexity, because the antenna array is more complex and there is more interference in multi-user multi-antenna scenarios. Therefore, hybrid precoding with low complexity and high spectral efficiency becomes the key solution to combat interference and path loss in mmWave massive MIMO systems. We propose a spectral-efficient hybrid precoding algorithm for a downlink multi-user multi-antenna mmWave massive MIMO system to maximize the sum spectral efficiency and reduce the complexity of the system. To avoid iterative processing while ensuring the performance, we split the precoding design into the RF precoding design and the digital precoding design. In RF precoding, we extend the analog precoding matrix to a square matrix and find the optimal analog combiner by selecting a subset of the discrete Fourier transform bases, which enhances the gain of the equivalent baseband channel matrix. Then, to ensure the spectral efficiency performance, we directly aggregate the channel gain through the equal gain transmission method. For the digital precoder, BD is applied to the equivalent channels to eliminate the inter-user interference.
To verify the effectiveness of the proposed algorithm, we compare it with existing schemes in terms of computational complexity and simulate all schemes in the same environment over both mmWave and Rayleigh channels. The specific analysis can be found in Sect. 5.

Multi-user massive MIMO system model
A downlink transmission of a TDD-based multi-antenna massive MU-MIMO system is depicted in Fig. 1. The base station (BS) is equipped with $N_t$ transmit antennas and $N_t^{RF}$ RF chains to communicate with $K$ independent users. Each user is equipped with $N_r$ receive antennas and $N_r^{RF}$ RF chains to support $N_s$ data streams. To ensure effective transmission, the number of transmitted streams is constrained by $K N_s \le N_t^{RF} \le N_t$ at the BS and $N_s \le N_r^{RF} \le N_r$ at each user. The transmitted symbols are processed by an $N_t^{RF} \times K N_s$ digital precoder $\mathbf{F}_{BB} = [\mathbf{F}_{BB_1}, \ldots, \mathbf{F}_{BB_K}]$, where $\mathbf{F}_{BB_k}$ is the $N_t^{RF} \times N_s$ block of the $k$-th user, and mapped onto the $N_t^{RF}$ RF chains. Then, the symbols are processed by an $N_t \times N_t^{RF}$ analog precoder $\mathbf{F}_{RF}$. Since the analog precoder is composed of analog phase shifters, it can only control the phase of the signal, and each entry of $\mathbf{F}_{RF}$ is normalized to satisfy $F_{i,j} = 1$, where $F_{i,j}$ denotes the amplitude of the $(i,j)$-th element of $\mathbf{F}_{RF}$. The digital precoder $\mathbf{F}_{BB_k}$ enables both amplitude and phase modifications. We denote by $\mathbf{H}_k \in \mathbb{C}^{N_r \times N_t}$ the downlink channel matrix of the $k$-th user. The received signal of the $k$-th user can be expressed as
$$\mathbf{r}_k = \mathbf{H}_k \mathbf{F}_{RF} \mathbf{F}_{BB} \mathbf{S} + \mathbf{n}_k, \tag{1}$$
where $\mathbf{S} = [\mathbf{s}_1^T, \mathbf{s}_2^T, \cdots, \mathbf{s}_K^T]^T$ is the signal vector of the $K$ users, $\mathbf{S} \in \mathbb{C}^{K N_s \times 1}$ satisfies $E[\mathbf{S}\mathbf{S}^H] = \frac{P}{K N_s}\mathbf{I}_{K N_s}$, $P$ represents the average transmit power of the BS, and $\mathbf{I}_{K N_s}$ is the $K N_s \times K N_s$ identity matrix. $\mathbf{n}_k \in \mathbb{C}^{N_r \times 1}$ is the vector of i.i.d. $\mathcal{CN}(0, 1)$ additive white Gaussian noise (AWGN). The received signal $\mathbf{y}_k$ after combining at the $k$-th user can be represented as
$$\mathbf{y}_k = \mathbf{W}_{BB_k}^H \mathbf{W}_{RF_k}^H \mathbf{H}_k \mathbf{F}_{RF} \mathbf{F}_{BB} \mathbf{S} + \mathbf{W}_{BB_k}^H \mathbf{W}_{RF_k}^H \mathbf{n}_k, \tag{2}$$
where $\mathbf{W}_{RF_k}$ is the $N_r \times N_r^{RF}$ analog combiner matrix and $\mathbf{W}_{BB_k}$ is the $N_r^{RF} \times N_s$ baseband combiner of the $k$-th user.
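If $\tilde{\mathbf{H}}_k = \mathbf{W}_{RF_k}^H \mathbf{H}_k \mathbf{F}_{RF}$, $k = 1, 2, \cdots, K$, is denoted as the equivalent baseband channel matrix, then (2) can be re-expressed as
$$\mathbf{y}_k = \mathbf{W}_{BB_k}^H \tilde{\mathbf{H}}_k \mathbf{F}_{BB_k} \mathbf{s}_k + \mathbf{W}_{BB_k}^H \tilde{\mathbf{H}}_k \sum_{i \ne k} \mathbf{F}_{BB_i} \mathbf{s}_i + \mathbf{W}_{BB_k}^H \mathbf{W}_{RF_k}^H \mathbf{n}_k.$$
Therefore, the sum spectral efficiency of the $K$ users can be expressed as
$$R = \sum_{k=1}^{K} \log_2 \det\left(\mathbf{I}_{N_s} + \frac{P}{K N_s} \mathbf{R}_k^{-1} \mathbf{W}_{BB_k}^H \tilde{\mathbf{H}}_k \mathbf{F}_{BB_k} \mathbf{F}_{BB_k}^H \tilde{\mathbf{H}}_k^H \mathbf{W}_{BB_k}\right),$$
where $\mathbf{R}_k = \frac{P}{K N_s} \sum_{i \ne k} \mathbf{W}_{BB_k}^H \tilde{\mathbf{H}}_k \mathbf{F}_{BB_i} \mathbf{F}_{BB_i}^H \tilde{\mathbf{H}}_k^H \mathbf{W}_{BB_k} + \mathbf{W}_{BB_k}^H \mathbf{W}_{RF_k}^H \mathbf{W}_{RF_k} \mathbf{W}_{BB_k}$ is the covariance matrix of interference and noise.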
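The sum spectral efficiency described here can be evaluated numerically once the equivalent channels and the baseband matrices are known. The following is a minimal NumPy sketch, not the authors' implementation. It assumes unit-variance noise, equal power $P/(K N_s)$ per stream, and analog combiners with orthonormal columns so that the combined noise covariance reduces to $\mathbf{W}_{BB_k}^H \mathbf{W}_{BB_k}$. The function name `sum_spectral_efficiency` and its argument layout are hypothetical.

```python
import numpy as np

def sum_spectral_efficiency(H_eq, F_bb, W_bb, P=1.0):
    """Sum spectral efficiency in bits/s/Hz (illustrative sketch).

    H_eq[k]: (Nr_rf x Nt_rf) equivalent baseband channel of user k
    F_bb[k]: (Nt_rf x Ns) digital precoder block of user k
    W_bb[k]: (Nr_rf x Ns) baseband combiner of user k
    Assumes unit-variance noise and W_RF^H W_RF = I (labeled assumptions).
    """
    K = len(H_eq)
    n_s = F_bb[0].shape[1]
    rho = P / (K * n_s)  # transmit power per data stream
    total = 0.0
    for k in range(K):
        Wh = W_bb[k].conj().T
        # Ns x Ns effective links from each user's precoder to user k's output
        G = [Wh @ H_eq[k] @ F_bb[i] for i in range(K)]
        # interference-plus-noise covariance seen by user k
        R = Wh @ W_bb[k]
        for i in range(K):
            if i != k:
                R = R + rho * G[i] @ G[i].conj().T
        M = np.eye(n_s) + rho * np.linalg.solve(R, G[k] @ G[k].conj().T)
        # det(M) is real and positive here, so log|det| equals log det
        total += np.linalg.slogdet(M)[1] / np.log(2)
    return total
```

With a single user the interference sum is empty and the result reduces to the familiar $\log_2 \det(\mathbf{I} + \rho\, \mathbf{G}\mathbf{G}^H)$ expression, which is a convenient sanity check.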

Channel model
In this paper, the general channel matrix accounts for large-scale path fading, and $\dot{\mathbf{H}}_k$ denotes the normalized channel matrix of the $k$-th user, satisfying $E[\|\dot{\mathbf{H}}_k\|_F^2] = N_t N_r$. The main feature of the mmWave channel is the limited number of scattering clusters in the propagation path [34][35][36]. To characterize the limited scattering property of mmWave channels, the Saleh-Valenzuela geometric model is adopted in this paper [37,38]. Specifically, the channel $\mathbf{H}_k \in \mathbb{C}^{N_r \times N_t}$ from the BS to the $k$-th user can be modeled as
$$\mathbf{H}_k = \sqrt{\frac{N_t N_r}{N_{cl} N_{ray}}} \sum_{u=1}^{N_{cl}} \sum_{l=1}^{N_{ray}} \alpha_{ul}^{k}\, \mathbf{a}_r^{k}(\theta_{ul}^{r})\, \mathbf{a}_t^{k}(\varphi_{ul}^{t})^{H},$$
where $N_{cl}$ is the number of clusters and each cluster comprises $N_{ray}$ propagation paths. $\alpha_{ul}^{k}$ is the complex amplitude associated with the $l$-th path in the $u$-th scattering cluster. $\mathbf{a}_r^{k}(\theta_{ul}^{r})$ and $\mathbf{a}_t^{k}(\varphi_{ul}^{t})$ represent the receive and transmit array response vectors for the $k$-th user, respectively, whereas $\theta_{ul}^{r}$ and $\varphi_{ul}^{t}$ are the azimuth angles of arrival and departure (AoA and AoD) of the $(u, l)$-th path. The truncated Laplacian distribution is employed to generate $\theta_{ul}^{r}$ and $\varphi_{ul}^{t}$ [39]. Although the proposed algorithms can be implemented on any antenna array, we use a uniform linear array (ULA) as the array model. For a ULA with $N$ elements, the array response vector can be defined as
$$\mathbf{a}(\theta) = \frac{1}{\sqrt{N}} \left[1, e^{j\frac{2\pi}{\lambda} d \sin\theta}, \ldots, e^{j(N-1)\frac{2\pi}{\lambda} d \sin\theta}\right]^{T},$$
where $\lambda$ is the wavelength of the signal and $d$ is the distance between adjacent antenna elements. It is assumed throughout that $d = \lambda/2$.
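To make the clustered channel model concrete, the sketch below draws one channel realization for ULAs with half-wavelength spacing. It is an illustration under stated assumptions rather than the simulation code of the paper: path gains are taken as $\mathcal{CN}(0,1)$, the scaling $\sqrt{N_t N_r/(N_{cl} N_{ray})}$ is the usual normalization for this model, the channel is returned with shape $N_r \times N_t$, and the angles are drawn uniformly for simplicity, whereas the text uses a truncated Laplacian distribution. The names `ula_response` and `sv_channel` are hypothetical.

```python
import numpy as np

def ula_response(n_ant, angle, d_over_lambda=0.5):
    """Unit-norm ULA array response for an azimuth angle in radians."""
    n = np.arange(n_ant)
    phase = 2 * np.pi * d_over_lambda * n * np.sin(angle)
    return np.exp(1j * phase) / np.sqrt(n_ant)

def sv_channel(n_t, n_r, n_cl, n_ray, rng,
               aoa_range=(-np.pi / 2, np.pi / 2),
               aod_range=(-np.pi / 2, np.pi / 2)):
    """One (n_r x n_t) realization of a clustered geometric channel.

    Simplification (assumption): every path draws its own uniform angle;
    the per-cluster truncated Laplacian spread is not modeled.
    """
    H = np.zeros((n_r, n_t), dtype=complex)
    for _ in range(n_cl * n_ray):
        alpha = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
        theta = rng.uniform(*aoa_range)  # angle of arrival
        phi = rng.uniform(*aod_range)    # angle of departure
        a_r = ula_response(n_r, theta)
        a_t = ula_response(n_t, phi)
        H += alpha * np.outer(a_r, a_t.conj())  # a_r a_t^H
    return np.sqrt(n_t * n_r / (n_cl * n_ray)) * H

rng = np.random.default_rng(0)
H_demo = sv_channel(n_t=16, n_r=4, n_cl=1, n_ray=2, rng=rng)
```

Because each path contributes a rank-one term, the rank of the resulting matrix is at most $N_{cl} N_{ray}$, which is the sparsity that the analog stage exploits.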

Proposed hybrid precoding algorithm
Generally, the joint optimization of the analog and digital precoders and combiners should be an essential part of designing a processing scheme for optimal sum spectral efficiency [28,40]. However, the optimal solution is hardly tractable because the joint optimization is a non-convex mixed-integer problem. Moreover, digital precoding and analog precoding are asynchronous processes in practical applications. Therefore, to ease the hardware restrictions and account for the asynchronism of the analog and digital precoding processes, we divide the calculation of the hybrid precoder into an analog step and a digital step [29][30][31][32]. In the first step, the BS and each user design the analog precoding and combining vectors to maximize the desired array gain of the equivalent channel matrix.
In the second step, the digital precoder is designed based on the equivalent channel matrix after the application of the proposed analog precoder in the first stage, to manage the multi-user interference. In general, the two-stage method is an asynchronous calculation of analog precoding and digital precoding based on the equivalent channel matrix.

The analog precoder/combiner design
Owing to the large number of antennas in massive MU-MIMO systems, the gains of the equivalent channel $\mathbf{H}_{eq}$ can be scaled up through appropriate phase-only control in the RF domain. In traditional multi-user hybrid precoding algorithms [41,42], because the dimension of the analog precoding matrix $\mathbf{F}_{RF}$ is $N_t \times N_t^{RF}$, the rank of the baseband equivalent channel matrix $\mathbf{H}_{eq}$ is reduced after passing through the channel. The result is a reduction in the gain of the baseband equivalent channel matrix and hence in the sum spectral efficiency. To obtain a large array gain, it is necessary to increase the rank of $\mathbf{F}_{RF}$. We therefore first define the analog precoding matrix as an $N_t \times N_t$ full-rank matrix. The improved equivalent baseband multi-user channel $\mathbf{H}_{eq}$ can then be expressed as
$$\mathbf{H}_{eq} = \mathbf{W}_{RF}^{H} \mathbf{H} \mathbf{F}_{RF},$$
where $\mathbf{W}_{RF} = \mathrm{blkdiag}(\mathbf{W}_{RF_1}, \ldots, \mathbf{W}_{RF_K})$ collects the analog combiners of the $K$ users and $\mathbf{H} = [\mathbf{H}_1^T, \ldots, \mathbf{H}_K^T]^T$ stacks their channels. To reap the diversity benefit of an analog phased array, we design the analog precoder with the EGT method. The analog precoding matrix can be expressed as
$$[\mathbf{F}_{RF}]_{i,j} = e^{j\phi_{i,j}},$$
where $\phi_{i,j}$ is the phase of the $(i,j)$-th element of $(\mathbf{W}_{RF}^{H}\mathbf{H})^{H}$.
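The EGT rule described in this paragraph, in which each precoder entry copies the phase of the corresponding element of the conjugate-transposed combined channel, can be sketched in a few lines of NumPy. This is a hedged illustration: the entries are taken as unit-modulus, and only the columns matched to the rows of the combined channel are built, so the handling of any remaining columns of the square $N_t \times N_t$ matrix is left open. The helper name `egt_precoder` and the variable `A`, standing for the product of the stacked analog combiners and channels, are hypothetical.

```python
import numpy as np

def egt_precoder(A):
    """Phase-only precoder from the combined channel A (M x Nt).

    Returns an (Nt x M) unit-modulus matrix whose (i, j) entry has the
    phase of the (i, j) entry of A^H.
    """
    return np.exp(1j * np.angle(A.conj().T))

rng = np.random.default_rng(1)
A_demo = rng.standard_normal((3, 8)) + 1j * rng.standard_normal((3, 8))
F_rf_demo = egt_precoder(A_demo)
H_eq_demo = A_demo @ F_rf_demo

# Each diagonal entry of A @ F_rf adds the magnitudes of one row of A
# coherently, i.e. it equals the 1-norm of that row.
diag_gain = np.diag(H_eq_demo)
row_l1 = np.abs(A_demo).sum(axis=1)
```

The coherent summation on the diagonal is what "aggregating the channel gain" refers to, while the off-diagonal entries add with unaligned phases.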
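The diagonal elements of $\mathbf{H}_{eq}$ represent the equivalent channel gains, whereas the off-diagonal elements indicate the inter-chain interference. Therefore, to obtain a large array gain, we propose to increase the sum of the squares of the diagonal elements of $\mathbf{H}_{eq}$ by finding the analog combiners $\mathbf{W}_{RF_k}$ $(k = 1, 2, \cdots, K)$ of the $K$ users. The optimization problem is described as
$$\max_{\mathbf{W}_{RF_1}, \ldots, \mathbf{W}_{RF_K}} \sum_{k=1}^{K} \sum_{m=1}^{N_r^{RF}} \left\| (\mathbf{w}_{RF_k}^{(m)})^{H} \mathbf{H}_k \right\|_1^2,$$
where $\mathbf{w}_{RF_k}^{(m)}$ is the $m$-th column of $\mathbf{W}_{RF_k}$, $\|(\mathbf{w}_{RF_k}^{(m)})^{H}\mathbf{H}_k\|_1$ is the diagonal element of $\mathbf{H}_{eq}$ corresponding to the $m$-th RF chain of the $k$-th user, and $\|\cdot\|_1$ denotes the 1-norm of a vector. Because all $\mathbf{W}_{RF_k}$ $(k = 1, 2, \cdots, K)$ are independent of each other, the above problem is equivalent to maximizing the per-user objective for all $K$ users. Then, the analog combiner can be obtained from the following optimization,
$$\mathbf{W}_{RF_k}^{opt} = \arg\max_{\mathbf{W}_{RF_k}} \sum_{m=1}^{N_r^{RF}} \left\| (\mathbf{w}_{RF_k}^{(m)})^{H} \mathbf{H}_k \right\|_1^2 \quad \text{s.t.} \quad \left|[\mathbf{W}_{RF_k}]_{i,j}\right| = \frac{1}{\sqrt{N_r}}. \tag{9}$$
Due to the non-convex constraints on $\mathbf{W}_{RF_k}$, we cannot directly solve the problem in (9). Therefore, we modify the constraints of the analog combining matrix by
$$\mathbf{w}_{RF_k}^{(m)} = \frac{1}{\sqrt{N_r}} \left[1, e^{jw}, \ldots, e^{j(N_r-1)w}\right]^{T},$$
where $w = \frac{2\pi}{\lambda} d \sin\theta$ represents the corresponding spatial frequency.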
To obtain a large array gain, we require that the rank of the baseband equivalent channel matrix $\mathbf{H}_{eq}$ does not decrease after the analog combining matrix $\mathbf{W}_{RF_k}$ is multiplied by the channel matrix $\mathbf{H}_k$. Therefore, we require the columns of $\mathbf{W}_{RF_k}$ to be pairwise orthogonal so that the rank of $\mathbf{H}_{eq}$ is lower bounded by $N_r^{RF} \ge N_s$. Traditional EGT algorithms, such as [43,44], find the optimal analog combiner via iterative methods, which are non-convex, nonlinear, time-consuming and computationally complex. Considering that the columns of a DFT codebook (each representing the antenna response in one incidence direction) are mutually orthogonal, we propose a DFT codebook-based analog combiner whose orthogonal beam columns are specified by DFT codewords. The DFT basis is a sparse basis, so the problem is transformed from an optimization over multiple coupled variables into a sparse optimization over a single variable, which reduces the computational complexity. According to the form of $\mathbf{w}_{RF_k}^{(m)}$, we discretize $w$ into $N_r$ levels over $[0, 2\pi)$ and construct the discrete Fourier transform (DFT) bases. The DFT basis is a sparse basis with orthogonal properties. It can be expressed as
$$\mathcal{C} = \left\{ \frac{1}{\sqrt{N_r}} \left[1, e^{j\frac{2\pi n}{N_r}}, \ldots, e^{j\frac{2\pi n (N_r-1)}{N_r}}\right]^{T} : n = 0, 1, \ldots, N_r - 1 \right\},$$
where $\mathcal{C}$ is the candidate set from which the $N_r^{RF}$ DFT bases $\mathbf{w}_{RF_k}^{(m)}$ are selected. Therefore, the design of the modified analog combiner can be re-expressed as
$$\mathbf{W}_{RF_k}^{opt} = \arg\max_{\mathbf{W}_{RF_k}} \sum_{m=1}^{N_r^{RF}} \left\| (\mathbf{w}_{RF_k}^{(m)})^{H} \mathbf{H}_k \right\|_1^2 \quad \text{s.t.} \quad \mathbf{w}_{RF_k}^{(m)} \in \mathcal{C}. \tag{14}$$
To solve the problem in (14), we sort all the candidates $\mathbf{w} \in \mathcal{C}$ by $\|\mathbf{w}^{H}\mathbf{H}_k\|_1$ in descending order and select the first $N_r^{RF}$ of them as the columns of $\mathbf{W}_{RF_k}$. Although various cross-domain collaborative filtering (CDCF) algorithms have been presented to address the sparsity problem [45,46], in our method each user needs to solve (14) only once to obtain the corresponding analog combining matrix $\mathbf{W}_{RF_k}$. In addition, the number of receive antennas per user is much smaller than the number of BS antennas, which makes the exhaustive search over the DFT bases acceptable.
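The DFT-based combiner selection can be illustrated with a short exhaustive search. The sketch below assumes a candidate set of $N_r$ unit-norm DFT columns and scores each candidate by the squared 1-norm of its combined channel row, which is one plausible reading of the objective in (14). The function names `dft_codebook` and `select_analog_combiner` are hypothetical, and the channel is taken with shape $N_r \times N_t$.

```python
import numpy as np

def dft_codebook(n_r):
    """N_r x N_r DFT matrix with unit-norm, mutually orthogonal columns."""
    n = np.arange(n_r)
    return np.exp(1j * 2 * np.pi * np.outer(n, n) / n_r) / np.sqrt(n_r)

def select_analog_combiner(H_k, n_rf):
    """Pick the n_rf DFT columns with the largest squared 1-norm gain.

    H_k: (N_r x N_t) channel of one user. Returns the (N_r x n_rf)
    combiner and the indices of the chosen codewords, best first.
    """
    C = dft_codebook(H_k.shape[0])
    # row n of C^H H_k is the channel seen through candidate n
    scores = np.abs(C.conj().T @ H_k).sum(axis=1) ** 2
    best = np.argsort(scores)[::-1][:n_rf]
    return C[:, best], best

# Toy channel that is perfectly aligned with codeword 3
C_demo = dft_codebook(8)
H_toy = np.outer(C_demo[:, 3], np.ones(5))
W_rf_demo, idx_demo = select_analog_combiner(H_toy, n_rf=2)
```

Since the search evaluates only $N_r$ candidates per user and involves no iteration, its cost stays modest when $N_r$ is small, and the selected columns are orthonormal by construction.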

The digital precoder design
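After obtaining the analog precoder $\mathbf{F}_{RF}$ and combiner $\mathbf{W}_{RF_k}$, in this section we apply a digital BD precoding scheme to eliminate the interference among users and maximize the spectral efficiency of the system. The BD scheme is an extension of the ZF scheme to multi-user MIMO systems. Each user's linear precoder and combiner can be obtained by two singular value decomposition (SVD) operations [47]. In our hybrid case, the BD scheme is designed on the effective channel matrix $\tilde{\mathbf{H}}_k = \mathbf{W}_{RF_k}^{H}\mathbf{H}_k\mathbf{F}_{RF}$, $\forall k$. To eliminate interference among multiple users, the following constraints need to be imposed on the baseband equivalent channel matrix:
$$\tilde{\mathbf{H}}_k \mathbf{F}_{BB_i} = \mathbf{0}, \quad \forall i \ne k. \tag{15}$$
It can be concluded from (15) that $\mathbf{F}_{BB_i}$ must lie in the null space of the other users' channel matrices. Thus, $\bar{\mathbf{H}}_k$ is defined as
$$\bar{\mathbf{H}}_k = \left[\tilde{\mathbf{H}}_1^{T}, \ldots, \tilde{\mathbf{H}}_{k-1}^{T}, \tilde{\mathbf{H}}_{k+1}^{T}, \ldots, \tilde{\mathbf{H}}_K^{T}\right]^{T}.$$
We compute the SVD of $\bar{\mathbf{H}}_k$ and obtain
$$\bar{\mathbf{H}}_k = \bar{\mathbf{U}}_k \bar{\boldsymbol{\Sigma}}_k \left[\bar{\mathbf{V}}_k^{(1)}, \bar{\mathbf{V}}_k^{(0)}\right]^{H},$$
where the columns of $\bar{\mathbf{V}}_k^{(0)}$ span the null space of $\bar{\mathbf{H}}_k$. The SVD of the projected channel $\tilde{\mathbf{H}}_k \bar{\mathbf{V}}_k^{(0)}$ is defined as
$$\tilde{\mathbf{H}}_k \bar{\mathbf{V}}_k^{(0)} = \mathbf{U}_k \begin{bmatrix} \boldsymbol{\Sigma}_k^{N_s} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \left[\mathbf{V}_k^{N_s}, \mathbf{V}_k^{(N_r^{RF} - N_s)}\right]^{H},$$
where $\mathbf{V}_k^{N_s}$ represents the right singular matrix corresponding to the nonzero singular values, and $\mathbf{V}_k^{(N_r^{RF} - N_s)}$ represents the right singular matrix corresponding to the zero singular values. Thus, we define the digital precoder as
$$\mathbf{F}_{BB_k} = \bar{\mathbf{V}}_k^{(0)} \mathbf{V}_k^{N_s}.$$
The baseband combiner of the $k$-th user is chosen as the first $N_s$ columns of $\mathbf{U}_k$,
$$\mathbf{W}_{BB_k} = \mathbf{U}_k^{N_s}.$$
Therefore, the sum spectral efficiency achieved by the proposed hybrid precoding scheme follows by substituting these precoders and combiners into the sum spectral efficiency expression of Sect. 3, in which the inter-user interference terms vanish because of (15). The DFT orthogonal basis is used to construct the analog combining matrix $\mathbf{W}_{RF_k}$, where $\mathbf{W}_{RF_k}^{H}\mathbf{W}_{RF_k} = \mathbf{I}$. Thus, we obtain
$$\mathbf{W}_{BB_k}^{H} \mathbf{W}_{RF_k}^{H} \mathbf{W}_{RF_k} \mathbf{W}_{BB_k} = \mathbf{I}_{N_s}.$$
The sum spectral efficiency of the $K$ users can finally be expressed as
$$R = \log_2 \det\left(\mathbf{I} + \boldsymbol{\Sigma}^2 \boldsymbol{\Lambda}\right), \quad \boldsymbol{\Sigma} = \mathrm{blkdiag}\left(\boldsymbol{\Sigma}_1^{N_s}, \ldots, \boldsymbol{\Sigma}_K^{N_s}\right),$$
where $\boldsymbol{\Lambda}$ is a diagonal matrix whose elements are the power loading coefficients, which can be found by water-filling on the singular values $\boldsymbol{\Sigma}_k^{N_s}$ from all users collected together, assuming a total power constraint.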
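The two-SVD block diagonalization procedure can be sketched as follows. This is a generic BD illustration in NumPy under stated assumptions, not the authors' exact low-dimensional variant: it takes the users' effective channels as given, uses equal power on all streams (no water-filling), and requires the null space of the other users' stacked channels to have at least $N_s$ dimensions. The function name `bd_precoder` is hypothetical.

```python
import numpy as np

def bd_precoder(H_eq, n_s):
    """Block diagonalization on effective channels (illustrative sketch).

    H_eq: list of K effective channels, each (Nr_rf x Nt_rf).
    Returns lists F (Nt_rf x n_s precoders) and W (Nr_rf x n_s combiners).
    """
    K = len(H_eq)
    F, W = [], []
    for k in range(K):
        others = [H_eq[i] for i in range(K) if i != k]
        if others:
            # first SVD: null space of the other users' stacked channels
            H_bar = np.vstack(others)
            _, s, Vh = np.linalg.svd(H_bar)
            rank = int(np.sum(s > 1e-10 * s[0]))
            V0 = Vh[rank:].conj().T
        else:
            V0 = np.eye(H_eq[k].shape[1])  # single user: nothing to null
        # second SVD: user k's channel projected onto that null space
        U, _, Vh_k = np.linalg.svd(H_eq[k] @ V0)
        F.append(V0 @ Vh_k[:n_s].conj().T)
        W.append(U[:, :n_s])
    return F, W

rng = np.random.default_rng(2)
H_users = [rng.standard_normal((2, 8)) + 1j * rng.standard_normal((2, 8))
           for _ in range(3)]
F_demo, W_demo = bd_precoder(H_users, n_s=2)
# leakage of user 0's precoder into user 1's effective channel
leak = np.linalg.norm(H_users[1] @ F_demo[0])
```

After this step each user's effective link $\mathbf{W}_{BB_k}^H \tilde{\mathbf{H}}_k \mathbf{F}_{BB_k}$ is diagonal with the retained singular values on its diagonal, which is the quantity the water-filling power allocation operates on.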

Computational complexity analysis
In this subsection, the computational complexity of the proposed hybrid precoding algorithm is discussed. The complexity of the proposed algorithm comes from the assignment of the analog precoders and the calculation of the digital precoders. Table 1 provides the complexity comparison between the proposed hybrid precoding scheme and some other schemes. The full-digital BD precoding scheme has the highest computational complexity. Considering a typical mmWave MIMO system with $N_t = 256$, $N_r = 16$, $K = 8$, $N_t^{RF} = 8$ and $N_r^{RF} = 1$, we observe that the complexity of the proposed hybrid precoding scheme is approximately 67,108,864 multiplications, which is only approximately 0.45% of the complexity of the full-digital BD precoding scheme.
Compared with the existing hybrid precoding algorithms for single-user systems [23][24][25] and single-antenna multi-user systems [26,27], our work can be applied in multi-antenna multi-user systems with low complexity and high spectral efficiency.
To avoid the high computational complexity caused by the non-convex mixed-integer problem of joint optimization [28], the proposed precoding and combining schemes use a two-stage design. Compared with the two-stage hybrid precoding algorithms in [29][30][31][32], the proposed algorithm achieves a tradeoff between system performance and complexity, because the multi-user equivalent channel matrix enabled by the proposed analog precoder can directly aggregate the channel gains in a massive MIMO system. In addition, compared with the traditional EGT algorithms in [30,41,42], a DFT basis is constructed to significantly reduce the computational complexity. Moreover, the proposed algorithm makes full use of the multi-antenna array gain to achieve a higher sum rate, so that a higher spectral efficiency can be guaranteed. Compared with [31,48,49] and the conventional BD precoding schemes, the proposed algorithm achieves the best spectral efficiency and BER performance with the lowest computational complexity.
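As a quick check on the scale of these figures, the quoted multiplication count is an exact power of two, and the stated ratio implies a full-digital count of roughly $1.5 \times 10^{10}$ multiplications. This back-of-the-envelope value is inferred only from the two numbers quoted in the text and is approximate because the percentage itself is rounded.

```latex
67{,}108{,}864 = 2^{26},
\qquad
\frac{67{,}108{,}864}{0.0045} \approx 1.49 \times 10^{10}.
```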

Simulation results and discussion
In this section, we compare the performance of the proposed algorithm with the spatially sparse hybrid precoding scheme [48], the MMSE combiner hybrid precoding scheme [49], the limited feedback hybrid precoding scheme [31], and the conventional BD precoding scheme. In addition, we provide simulation results of the spectral efficiency of the proposed hybrid precoding scheme in the Rayleigh fading channel to verify the practicability of the algorithm in different channels, because the Rayleigh fading channel effectively describes wireless environments in which obstacles scatter many signals. For the mmWave channel, the clustered mmWave channel model is employed to characterize its limited scattering. Both the transmit and receive antenna arrays are ULAs with antenna spacing $d = \lambda/2$. Since the BS usually employs directional antennas to eliminate interference and increase antenna gains [20], the AoDs are assumed to follow a uniform distribution within $[-\pi/6, \pi/6]$. Moreover, due to the random positions of the users, we assume that the AoAs follow a uniform distribution within $[-\pi, \pi]$, which means that omnidirectional antennas are adopted by the users. For the Rayleigh fading channel, the variable settings are consistent with those of the mmWave channel.

Spectral efficiency
To verify the spectral efficiency performance of the proposed algorithm, Fig. 2 compares the sum spectral efficiency against the SNR in the mmWave channel and the Rayleigh channel, with $N_t = 128$ and $N_r = 16$. It can be seen that the proposed scheme is significantly superior to the other schemes in both channels. The EGT enabled by the RF precoder directly aggregates the channel gains, so that the spectral efficiency performance can be guaranteed. When SNR = 0 dB, the sum spectral efficiency of the proposed scheme is 47.7 bits/s/Hz in the mmWave channel and 47.26 bits/s/Hz in the Rayleigh channel. Comparing Fig. 2a and b, we find that when the number of BS antennas is 128, the performance of the proposed scheme in the Rayleigh channel is worse than that in the mmWave channel. This is probably because the DFT basis selection in the proposed scheme essentially captures the dominant paths of the mmWave channel.
Figure 3 compares the sum spectral efficiency versus the number of BS antennas in both channels, given $K = 8$ users, 16 antennas per user, and SNR = 0 dB. The number of scattering clusters is 8, and each cluster has 10 paths, so that the Rayleigh channel is complex with multiple scattering clusters and the mmWave channel is sparse. We find that the performance improves with the number of BS antennas for all hybrid structures. Moreover, because the useful information exists in only a few paths of the sparse mmWave channel, it is more difficult to obtain the exact channel state information. In this case, the proposed algorithm has a better spectral efficiency than the traditional BD method, so it can achieve better performance when used in massive MIMO systems with different channels.

Robustness evaluation
Simulation results for different numbers of users are provided in Fig. 4, with N t = 256, N r = 16 and SNR = 0 dB. The performance in both channels improves with the number of users for all hybrid structures, and the proposed scheme performs best owing to its improved low-dimensional BD precoder and combiner. When the number of users is 8, the sum spectral efficiency gap between the proposed scheme and the hybrid BD precoding scheme is 2.738 bits/s/Hz in the mmWave channel and 2.136 bits/s/Hz in the Rayleigh channel. Therefore, the proposed algorithm can significantly cancel inter-user interference in multi-user systems. Simulation results for different numbers of BS RF chains are provided in Fig. 5, with N t = 256 and N r = 16. The performance improves as the number of RF chains increases for all hybrid structures, and the proposed scheme is significantly superior to the other schemes. The proposed scheme can be safely recommended for systems with a large number of RF chains, because in this case it is less vulnerable to inter-user interference than the other schemes. However, as the number of RF chains increases to 13, the performance of all the algorithms tends to stabilize. This is because when the number of RF chains grows beyond an optimal value, inter-user interference becomes substantially more severe and gradually affects the sum spectral efficiency.
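To make the interference-cancellation argument concrete, the following is a minimal sketch of conventional (textbook) block diagonalization, not the improved low-dimensional variant of the proposed scheme. The function name `bd_precoders` is hypothetical. In a hybrid system the inputs would be the equivalent baseband channels of each user, which is an assumption of this illustration.

```python
import numpy as np

def bd_precoders(H_users, n_streams):
    """Conventional BD precoders for at least two users.

    H_users: list of per-user channels, each of shape (n_r, n_t).
    Returns one (n_t, n_streams) precoder per user that lies in the null
    space of all other users' channels, so H_j @ F_k = 0 for j != k.
    Requires n_streams <= n_t - rank(stacked channel of the other users).
    """
    precoders = []
    for k, H_k in enumerate(H_users):
        # Stack the channels of every user except user k.
        others = np.vstack([H for j, H in enumerate(H_users) if j != k])
        _, s, Vh = np.linalg.svd(others)
        rank = int(np.sum(s > 1e-10 * s[0]))
        # Right singular vectors beyond the rank span the null space.
        V0 = Vh[rank:].conj().T
        # Within that null space, keep the strongest directions of user k.
        _, _, Vh_eff = np.linalg.svd(H_k @ V0)
        precoders.append(V0 @ Vh_eff[:n_streams].conj().T)
    return precoders
```

The null-space constraint consumes transmit dimensions, which is one reason BD is often applied to a reduced-dimension equivalent channel rather than to the full antenna-domain channel.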
Simulation results for different numbers of data streams are provided in Fig. 6, with N t = 256 and N r = 16. Figure 6 shows that when the number of data streams is 8, the spectral efficiency achieved by the proposed scheme is 130.9 bits/s/Hz in the mmWave channel and 134.3 bits/s/Hz in the Rayleigh channel. The performance improves as the number of data streams increases for all hybrid structures, and the proposed scheme is significantly superior to the other schemes. However, as the number of data streams increases to 5, the performance of the proposed algorithm tends to decrease in both channels. This is because the pursuit of a large array gain introduces slight inter-stream interference in the RF domain, which degrades the system spectral efficiency after the baseband BD processing. Figure 7 illustrates the bit-error-rate (BER) performance achieved by the proposed scheme for the 256 × 16 massive MIMO system. The proposed hybrid precoding method achieves better BER performance than the other hybrid precoding methods. This is because the proposed scheme generates multiple subchannels with equal gain transmission for each user, which improves the overall BER performance. From Fig. 7, we can also see that the BER achieved by each algorithm decreases as the SNR increases. Furthermore, comparing the robustness results for the two channels, we find that the proposed scheme is more robust in the Rayleigh channel than in the mmWave channel, owing to the poor scattering nature of the mmWave channel.

Conclusion
In this paper, we propose a low-complexity hybrid precoding algorithm with high spectral efficiency for downlink multi-user mmWave massive MIMO systems. We extend the dimensions of the analog precoding matrix into a square matrix and find the optimal analog combiner to increase the gain of the equivalent baseband channel matrix. Then, block diagonalization (BD) precoding is performed on the equivalent channel seen from the baseband to eliminate inter-user interference. The proposed scheme, with its lower implementation and computational complexity, achieves a capacity performance that is close to, and sometimes even higher than, that of conventional BD processing. The simulation results indicate that the proposed scheme achieves better spectral efficiency and bit-error-rate (BER) performance than the other precoding schemes in both the Rayleigh fading channel and the mmWave channel.