Multi-user hybrid precoding for mmWave massive MIMO systems with sub-connected structure

Hybrid precoding achieves a compromise between the sum rate and the hardware complexity of millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) systems. However, most prior works on multi-user hybrid precoding consider only the full-connected structure. In this paper, a novel multi-user hybrid precoding algorithm is proposed for the sub-connected structure. Based on an improved successive interference cancellation (SIC) method, the analog precoding matrix optimization problem is decomposed into multiple analog precoding sub-matrix optimization problems. A near-optimal analog precoder is then designed by factorizing the precoding sub-matrix for each sub-array. Next, the digital precoder is designed using the block diagonalization (BD) technique. Finally, the water-filling power allocation method is used to further improve the communication quality. Extensive simulation results demonstrate that the proposed algorithm achieves a higher sum rate than existing hybrid precoding methods with the sub-connected structure and higher energy efficiency than existing approaches. Moreover, its sum rate comes close to that of the state-of-the-art optimization approach with the full-connected structure. In addition, the simulation results verify the effectiveness of the proposed hybrid precoding design for the uniform planar array (UPA).

wavelengths can package large-scale antennas into small sizes. Hybrid beamforming technology improves link reliability by compensating for severe mmWave path losses.
Traditional analog beamforming has also been considered in mmWave systems. The idea is to use low-cost phase shifters (PSs) to control the phase of the signal transmitted by each antenna [7][8][9]. Its disadvantage is that it cannot transmit parallel data streams and therefore provides no multiplexing gain. Traditional full-digital beamforming, in contrast, achieves the best multiplexing gain [10,11], but each antenna requires a dedicated radio frequency (RF) chain, which is expensive and consumes a lot of power. Therefore, a hybrid precoding structure [12][13][14][15] that reduces RF hardware and power consumption while ensuring good performance is extremely important.
For hybrid precoding in single-user MIMO (SU-MIMO) systems, [16] and [17] give solutions from different perspectives. Based on compressed sensing, [16] solves the problem in [18] by alternating between a locally optimal analog precoder and a baseband digital precoder. In contrast, [17] decomposes the matrix optimization problem into multiple subproblems by using the iterative algorithm in [19]. In addition, [20] shows that if the number of RF chains is greater than or equal to twice the number of data streams, hybrid beamforming can achieve the same performance as full-digital beamforming.
Inspired by [16], the work [21] proposes an orthogonal matching pursuit (OMP) algorithm for multi-user MIMO (MU-MIMO) systems. A hybrid block diagonalization (Hy-BD) algorithm, in which the analog precoding is designed by exhaustive search and equal gain transmission (EGT), is proposed in [22]. In a similar way, two hybrid BD algorithms that maximize the analog beamforming gain by iteratively updating the analog precoder and combiner are proposed in [9,10]. Moreover, a series of hybrid zero-forcing (Hy-ZF) and hybrid minimum-mean-squared-error (Hy-MMSE) schemes are proposed in [24][25][26]. However, the works [21][22][23][24][25][26][27][28] focus on hybrid beamforming based on the full-connected structure, which consumes a lot of power and is not efficient to implement.
In the full-connected structure, each antenna element is connected to all RF chains. In the sub-connected structure, by contrast, each RF chain is connected to only a subset of the transmit antennas, which can maximize the system's energy efficiency. For the sub-connected structure, different solutions are given in [20,23,24,27][29][30][31][32][33][34]. However, the hybrid beamforming schemes in [23][29][30][31] are designed for MU-MISO systems with single-antenna receivers, and only the scheme in [9] is designed for MU-MIMO systems. In [33,34], the total achievable rate optimization problem with non-convex constraints is decomposed into a series of simple sub-rate optimization problems for the sub-connected structure, but these methods are not directly applicable to mmWave massive MU-MIMO systems. Although hybrid beamforming has been studied extensively, there is still much room for improvement in sub-connected hybrid precoding design, especially for MU-MIMO systems.
In this paper, we focus on sub-connected hybrid precoding design in mmWave massive MU-MIMO systems, where a single base station (BS) with multiple sub-arrays serves several multi-antenna users simultaneously. Assuming that perfect channel state information (CSI) is available at both the BS and the users, we propose a near-optimal hybrid precoding scheme by jointly designing the analog and digital beamformer/combiner. The contributions of this work are summarized as follows:
(1) The proposed hybrid precoding design targets the sub-connected structure in the mmWave massive MIMO system. Compared with the full-connected structure, it has lower hardware complexity. To reduce the computational complexity, we reformulate the original optimization problem as two mmWave sum-rate maximization subproblems according to the idea of hierarchical optimization.
(2) To solve the sum-rate maximization problem, we propose an improved successive interference cancellation (SIC) method, which designs the analog precoder while trying to avoid the loss of information at each stage. Then the baseband BD scheme and the water-filling power allocation method are used to obtain the digital precoding matrix and the power allocation matrix, respectively. The proposed algorithm yields a closed-form solution, and its result is stable.
(3) The theoretical analysis and simulation results of the proposed hybrid precoding scheme are given in detail. We study the influence of various parameters on the performance of our algorithm. Simulation results show that the proposed algorithm has a higher sum rate than the existing hybrid precoding approaches under the sub-connected structure and comes close to the state-of-the-art optimization approach under the full-connected structure. Furthermore, the proposed algorithm has higher power efficiency than the optimization design algorithm under the full-connected structure.
Notation: In this paper, bold upper-case and lower-case letters denote matrices and vectors, respectively. $\mathbb{E}[\cdot]$ represents the expectation. $(\cdot)^T$, $(\cdot)^{-1}$, $(\cdot)^H$ and $\|\cdot\|_F$ denote the transpose, inverse, conjugate transpose, and Frobenius norm of a matrix, respectively. $\mathbf{I}_N$ is the $N \times N$ identity matrix and $\mathbf{0}_{M \times N}$ is the $M \times N$ all-zero matrix. $\mathbb{C}^{m \times n}$ represents the $m \times n$ dimensional complex space. Finally, $\angle\mathbf{X}$ denotes a matrix with elements of the form $e^{j\phi_{i,j}}$, where $\phi_{i,j}$ is the phase of the $(i,j)$th element of $\mathbf{X}$.

Methodology
In this paper, we first introduce the existing hybrid precoding methods for mmWave massive MIMO systems. They are almost all based on the full-connected structure and consider only the uniform linear array (ULA). The research background and related methods are presented in Sect. 1. Many factors affect communication quality and rate performance. This paper improves system performance from two perspectives: algorithm improvement and structure selection. On the one hand, hybrid precoding can effectively improve the sum rate performance of the system. On the other hand, with the rapid development of mmWave communication, it also addresses the high energy consumption of traditional precoding. Hybrid precoding can use either a full-connected structure or a sub-connected structure. Compared with the full-connected structure, the sub-connected structure links each RF chain to only a subset of the antennas, which greatly simplifies the RF layout, is more practical, and makes the hybrid precoding design more energy-efficient. Compared with the ULA, the uniform planar array (UPA) can use fewer array elements to achieve higher space utilization, which can reduce system cost.
The goal of this paper is to maximize the sum rate of the system by designing a hybrid precoding scheme for multiple users. Under the power constraint of the BS, the problem is solved in two steps: analog precoding and digital precoding. Since the CSI of all users is fully available at the BS, we propose a new analog precoding design that is inspired by SIC and tailored to the sub-connected structure. The optimization order is selected according to the differences among the sub-channels, and the sub-matrices are then optimized successively to obtain an approximately optimal analog precoder. For the digital precoding, BD technology is applied to the equivalent channels to eliminate the inter-user interference. Finally, the water-filling method is used to achieve better power allocation.
To verify the effectiveness of the algorithm, we conduct a variety of experiments and compare the results. First, we introduce several advanced hybrid precoding schemes. Then, the complexity is evaluated, and the superiority of the proposed algorithm is demonstrated by simulation. Finally, we compare the proposed algorithm with other algorithms in the same environment. The specific analysis can be found in Sect. 5.

System model and problem formulation
In this paper, we consider a sub-connected structure for hybrid precoding in mmWave massive MU-MIMO systems, as shown in Fig. 1. The BS is equipped with $N_t$ antennas and $N$ independent RF chains. Each RF chain is connected to one sub-array of $M$ antennas, so that $NM = N_t$. The BS communicates with $K$ users. Each user is equipped with $N_r$ antennas to support $N_s$ ($N_s \geq 1$) data streams, which means that a total of $KN_s$ data streams are processed by the BS.
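At the BS, the signals are first processed by a power allocation matrix $\mathbf{P} \in \mathbb{C}^{KN_s \times KN_s}$, then by the baseband digital precoder $\mathbf{F}_{BB} \in \mathbb{C}^{N \times KN_s}$, and finally by the analog RF precoder $\mathbf{F}_{RF} \in \mathbb{C}^{N_t \times N}$, after which the precoded signal is sent over the wireless channel. The baseband precoder $\mathbf{F}_{BB}$ can modify both amplitude and phase, whereas $\mathbf{F}_{RF}$ allows only phase changes (phase-only control) because it is implemented with analog phase shifters. Each nonzero entry of $\mathbf{F}_{RF}$ is normalized to satisfy $|\mathbf{F}^{i,j}_{RF}|^2 = \frac{1}{N_t}$. Moreover, to satisfy the total transmit power constraint, $\mathbf{F}_{BB}$ is normalized to satisfy $\|\mathbf{F}_{RF}\mathbf{F}_{BB}\|_F^2 = KN_s$. Because each RF chain drives only its own sub-array, $\mathbf{F}_{RF}$ has the block-diagonal structure
$$\mathbf{F}_{RF} = \begin{bmatrix} \mathbf{a}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{a}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{a}_N \end{bmatrix}, \tag{1}$$
where $\mathbf{a}_n \in \mathbb{C}^{M \times 1}$ is the analog precoding vector of the $n$th sub-array. The received signal vector $\hat{\mathbf{y}}_k \in \mathbb{C}^{N_s \times 1}$ at the $k$th user can then be written as in (3), where $\mathbf{s}_k \in \mathbb{C}^{N_s \times 1}$, $k \in \{1, 2, \ldots, K\}$, is the signal vector of the $N_s$ data streams. $\mathbf{F}^k_{BB}$ consists of the $((k-1)N_s + 1)$th to the $kN_s$th columns of $\mathbf{F}_{BB}$ and is the precoder for $\mathbf{s}_k$. The vector $\mathbf{s} = [\mathbf{s}_1^T, \mathbf{s}_2^T, \ldots, \mathbf{s}_K^T]^T \in \mathbb{C}^{KN_s \times 1}$ collects the transmitted signals of the $K$ users and is assumed to satisfy $\mathbb{E}[\mathbf{s}\mathbf{s}^H] = \frac{1}{KN_s}\mathbf{I}_{KN_s}$. $\mathbf{H}_k \in \mathbb{C}^{N_r \times N_t}$ denotes the channel matrix between the BS and the $k$th user, based on the Saleh-Valenzuela model. $\mathbf{n}_k \in \mathbb{C}^{N_r \times 1}$ is an additive white Gaussian noise vector with independent and identically distributed (i.i.d.) entries.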
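As an illustration of the sub-connected constraint described above, the following minimal sketch builds a block-diagonal, constant-modulus analog precoder from a set of phases. The function name `subconnected_rf_precoder` and the default amplitude of $1/\sqrt{M}$ are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def subconnected_rf_precoder(phases, amplitude=None):
    """Build an (N*M) x N block-diagonal analog precoder.

    phases: N x M array, one row of phase-shifter angles per RF chain.
    amplitude: common modulus of the nonzero entries (assumed 1/sqrt(M)
    when not given, so that every column has unit norm).
    """
    phases = np.asarray(phases, dtype=float)
    n_chains, m = phases.shape
    if amplitude is None:
        amplitude = 1.0 / np.sqrt(m)  # assumption for this sketch
    f_rf = np.zeros((n_chains * m, n_chains), dtype=complex)
    for n in range(n_chains):
        # RF chain n only drives antennas n*M ... (n+1)*M - 1
        f_rf[n * m:(n + 1) * m, n] = amplitude * np.exp(1j * phases[n])
    return f_rf

rng = np.random.default_rng(0)
F_RF = subconnected_rf_precoder(rng.uniform(0, 2 * np.pi, size=(4, 8)))
```

With this scaling the columns of `F_RF` are orthonormal, because they have disjoint supports and unit norm.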
When Gaussian symbols are used by the BS, the achievable sum rate is expressed in terms of the per-user SINRs, where $P_N$ is the transmit power and the noise variance at each user is $\sigma^2 = 1$. $\mathrm{SINR}_k$ is the signal-to-interference-plus-noise ratio (SINR) of the signal $\mathbf{s}_k$. It is calculated as the ratio of the energy of the useful signal in (3) to the energy of the remaining interference terms plus noise.
In this paper, we use the geometric Saleh-Valenzuela channel model, which is more appropriate for mmWave communication [35,36]. The normalized mmWave downlink channel $\mathbf{H}_k$ for the $k$th user is assumed to be the sum of $N_c N_p$ propagation paths, where $N_c$ is the number of scattering clusters and $N_p$ is the number of paths in each cluster. The channel of the $k$th user can therefore be expressed as in (4) [37], where $\alpha^k_{i,l}$ is the complex gain of the $i$th path in the $l$th cluster, which follows $\mathcal{CN}(0, \sigma^2\mathbf{I})$. $\theta^k_{i,l}$ and $\phi^k_{i,l}$ denote the horizontal and elevation angles in (4), respectively. $\mathbf{a}^k_r(\theta^k_{i,l}, \phi^k_{i,l})$ and $\mathbf{a}^k_t(\theta^k_{i,l}, \phi^k_{i,l})$ represent the array response vectors of the $k$th user and the BS, respectively. For the ULA with $U$ elements, the array response vector follows the form given in [34], in which $d$ is the spacing between two neighboring antenna elements and $\lambda$ is the wavelength of the transmission. The angle $\phi$ does not appear because the ULA response vector is independent of the elevation angle. Furthermore, for the UPA with $W_1$ and $W_2$ elements ($W_1 W_2 = U$) in the horizontal and vertical directions, respectively, the array response vector is given in [34], where $0 \leq x \leq (W_1 - 1)$ and $0 \leq y \leq (W_2 - 1)$.
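The channel model can be sketched numerically as follows. Since the displayed equations are not reproduced here, this example assumes the commonly used forms of the ULA and UPA steering vectors and of the normalized clustered channel, with unit-variance path gains; the function names and the `d_over_lambda` parameter are hypothetical.

```python
import numpy as np

def ula_response(num_elems, theta, d_over_lambda=0.5):
    # Assumed standard ULA steering vector, normalized to unit norm.
    idx = np.arange(num_elems)
    phase = 2 * np.pi * d_over_lambda * idx * np.sin(theta)
    return np.exp(1j * phase) / np.sqrt(num_elems)

def upa_response(w1, w2, theta, phi, d_over_lambda=0.5):
    # Assumed standard UPA steering vector with element indices (x, y),
    # 0 <= x <= w1 - 1 and 0 <= y <= w2 - 1.
    x = np.repeat(np.arange(w1), w2)
    y = np.tile(np.arange(w2), w1)
    phase = 2 * np.pi * d_over_lambda * (
        x * np.sin(theta) * np.sin(phi) + y * np.cos(phi))
    return np.exp(1j * phase) / np.sqrt(w1 * w2)

def sv_channel_ula(n_r, n_t, n_clusters, n_paths, rng):
    # Sum of n_clusters * n_paths rank-one paths with CN(0, 1) gains
    # (unit variance is an assumption) and angles uniform in [0, 2*pi].
    h = np.zeros((n_r, n_t), dtype=complex)
    for _ in range(n_clusters * n_paths):
        alpha = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
        aoa, aod = rng.uniform(0, 2 * np.pi, size=2)
        h += alpha * np.outer(ula_response(n_r, aoa),
                              ula_response(n_t, aod).conj())
    return np.sqrt(n_t * n_r / (n_clusters * n_paths)) * h

rng = np.random.default_rng(1)
H_k = sv_channel_ula(16, 128, 8, 10, rng)  # N_r = 16, N_t = 128, N_c = 8, N_p = 10
```

The same generator can be adapted to the UPA by replacing `ula_response` with `upa_response` and drawing an elevation angle per path.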

Proposed near-optimal hybrid precoding design
We aim to design the analog precoder $\mathbf{F}_{RF}$ and the digital precoder $\mathbf{F}_{BB}$ so as to maximize the total sum rate $R$, as formulated in problem (7). Since the nonzero elements of the analog precoding matrix are usually realized by phase shifters [34], the nonzero elements of $\mathbf{F}_{RF}$ must satisfy constant-modulus constraints. These constraints are non-convex and therefore make the optimization non-convex. In other words, it is difficult to find the globally optimal solution of problem (7).

Analog precoding design
In the multi-user case, the inter-user interference can be effectively eliminated by the baseband BD technology. After removing the interference between users, $R$ in (7) can be rewritten as (8), which means that we should find the best possible $\mathbf{F}_{RF}$ for $R$. According to (1), the analog precoding matrix is constrained to have constant-amplitude entries and a block-diagonal structure. These non-convex constraints make it difficult to maximize the capacity in (8). However, because of the special block-diagonal structure of $\mathbf{F}_{RF}$, the precoding on different antenna sub-arrays is independent. Inspired by [33,34], we can decompose the complicated optimization problem (8) into a series of sub-rate optimization problems, each of which is much easier to solve. In other words, considering the antenna sub-arrays connected to the RF chains one by one, we first optimize the sum rate of the first selected sub-array with all other sub-arrays turned off. After that, we optimize the sum rate of the second selected sub-array, and so on.
The traditional SIC method optimizes the sub-arrays in a fixed recursive order, although the channel state of each antenna sub-array is different. Instead, we sort the $N$ antenna sub-arrays according to their channel capacity before optimization. The optimization order is determined by the capacities, starting from the sub-array with the largest capacity; that is, the sub-arrays are optimized in the screened order.
$C_n$ is defined as the capacity of the $n$th antenna sub-array in the millimeter wave massive MIMO system, where $n = 1, 2, \ldots, N$. After the optimization order is determined, we perform the above-mentioned SIC process until the last antenna sub-array is optimized. During the calculation, we assume that the digital precoding matrix is fixed, so that the objective in (8) can be expressed as in (9). After the analog precoding is obtained, the optimal digital precoding matrix is solved by the baseband BD technology.
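The screening step can be illustrated with a short sketch. The capacity metric used below, $\log_2\det(\mathbf{I} + \rho\,\mathbf{H}_n\mathbf{H}_n^H)$ for the channel columns $\mathbf{H}_n$ of the $n$th sub-array, is an assumption chosen for illustration; the function name `subarray_order` and the parameter `rho` are likewise hypothetical.

```python
import numpy as np

def subarray_order(H, num_subarrays, rho=1.0):
    """Rank sub-arrays by an assumed per-sub-array capacity proxy."""
    n_rx, n_t = H.shape
    m = n_t // num_subarrays
    caps = np.empty(num_subarrays)
    for n in range(num_subarrays):
        H_n = H[:, n * m:(n + 1) * m]          # columns of sub-array n
        gram = np.eye(n_rx) + rho * (H_n @ H_n.conj().T)
        caps[n] = np.linalg.slogdet(gram)[1] / np.log(2)  # log2 det
    return np.argsort(-caps), caps             # largest capacity first

rng = np.random.default_rng(2)
H = (rng.standard_normal((8, 16)) + 1j * rng.standard_normal((8, 16))) / np.sqrt(2)
H[:, 4:8] *= 3.0                               # make sub-array 1 the strongest
order, caps = subarray_order(H, num_subarrays=4)
```

Here `order[0]` is the index of the sub-array that would be optimized first.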
We can divide the hybrid precoding matrix $\mathbf{F}_{RF}$ and decompose the objective accordingly, as in (12). Obviously, the second term in (12) is the achievable sub-rate of the $N$th antenna sub-array, and the first term has the same form as (8). Further, we can decompose $\log_2(|\mathbf{S}_{N-1}|)$ using a method similar to (12). Then, after $N$ such decompositions, the total sum rate in (9) can be expressed as a sum of sub-rates, where $\mathbf{S}_n = \mathbf{I}_{KN_r} + \frac{P_N}{\sigma^2 KN_s}\mathbf{H}\mathbf{F}^n_{RF}(\mathbf{F}^n_{RF})^H\mathbf{H}^H$ and $\mathbf{S}_1 = \mathbf{I}_N$. According to the analysis above, the capacity of the first optimized antenna sub-array can be expressed as in (16), where $C_{n,\max} \in \max\{C_1, C_2, \ldots, C_N\}$ represents the first antenna sub-array that needs to be optimized. The optimal solution of (17) is denoted by $\mathbf{F}^{n,opt}_{RF}$. Further, the $C_{n,\max}$ given by (16) can also be rewritten as in (20). To find the $\mathbf{F}^n_{RF}$ closest to $\mathbf{F}^{n,opt}_{RF}$, we reasonably assume that $\mathbf{F}^n_{RF}$ is orthogonal to $\mathbf{v}_2$, that is, $(\mathbf{F}^n_{RF})^H\mathbf{v}_2 \approx 0$. Using $|\mathbf{I} + \mathbf{X}\mathbf{Y}| = |\mathbf{I} + \mathbf{Y}\mathbf{X}|$ and the high signal-to-noise-ratio (SNR) approximation, (20) can be expressed as (22). From (22), we observe that maximizing $C_{n,\max}$ is equivalent to maximizing the square of the inner product between two vectors. Under the constant-modulus constraint, the resulting analog precoding vector is $\mathbf{a}_n = \frac{1}{\sqrt{M}} e^{j\angle\bar{\mathbf{a}}_{n,opt}}$. Therefore, the sum rate optimization problem can be transformed into a series of sub-rate optimization problems that can be optimized one by one. After that, following the idea of SIC after sorting, we only need to continuously update $\mathbf{S}_N$; the process is shown in Fig. 2.
According to the capacity $C_{n,\max} \in \max\{C_1, C_2, \ldots, C_N\}$, $\mathbf{F}^{1,\max}_{RF}$ denotes the analog precoder of the first optimized antenna sub-array, and $\mathbf{F}^{2,\max}_{RF}$ is the second analog precoder that needs to be optimized. This process is repeated until the last antenna sub-array is optimized.
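The successive procedure can be sketched as below. This is a simplified illustration in the spirit of the description, not the authors' exact algorithm: it relies on the identity $\log_2|\mathbf{T} + \rho\,\mathbf{h}\mathbf{h}^H| = \log_2|\mathbf{T}| + \log_2(1 + \rho\,\mathbf{h}^H\mathbf{T}^{-1}\mathbf{h})$, takes the dominant eigenvector of an $M \times M$ matrix as the unconstrained optimum, and keeps only its phases. The names `T` and `G` echo Table 1, but their update rules here are assumptions, as are the function name and the scalar `rho` (standing for the per-stream SNR factor).

```python
import numpy as np

def sic_analog_precoder(H, num_subarrays, rho, order=None):
    """Greedy sub-array-by-sub-array design of a block-diagonal F_RF."""
    n_rx, n_t = H.shape
    m = n_t // num_subarrays
    if order is None:
        order = range(num_subarrays)
    F_rf = np.zeros((n_t, num_subarrays), dtype=complex)
    T = np.eye(n_rx, dtype=complex)            # accumulates earlier sub-arrays
    sub_rates = []
    for n in order:
        H_n = H[:, n * m:(n + 1) * m]
        G = H_n.conj().T @ np.linalg.solve(T, H_n)
        G = 0.5 * (G + G.conj().T)             # enforce Hermitian symmetry
        _, vecs = np.linalg.eigh(G)
        a_bar = vecs[:, -1]                    # unconstrained optimum
        a = np.exp(1j * np.angle(a_bar)) / np.sqrt(m)  # phase-only projection
        F_rf[n * m:(n + 1) * m, n] = a
        sub_rates.append(np.log2(1.0 + rho * np.real(a.conj() @ G @ a)))
        h = H_n @ a
        T = T + rho * np.outer(h, h.conj())    # update for the next sub-array
    return F_rf, np.array(sub_rates)

rng = np.random.default_rng(3)
H = (rng.standard_normal((8, 16)) + 1j * rng.standard_normal((8, 16))) / np.sqrt(2)
F_rf, sub_rates = sic_analog_precoder(H, num_subarrays=4, rho=0.5)
```

The sub-rates returned by this sketch add up to $\log_2|\mathbf{I} + \rho\,\mathbf{H}\mathbf{F}_{RF}\mathbf{F}_{RF}^H\mathbf{H}^H|$ for any processing order, which is what makes the one-by-one optimization possible; the order only affects which sub-array is optimized against which accumulated matrix.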

Digital precoding design
Based on the above solution process, the analog precoding matrix $\mathbf{F}_{RF}$ is obtained. To obtain the best digital precoding, BD technology is adopted. The main idea of BD is to divide the MU-MIMO channel into multiple SU-MIMO channels: if the signal intended for the $k$th user lies in the null space of the channels of all other users, the inter-user interference is eliminated. First, a transit matrix $\mathbf{H}_{int,k}$ is defined for the $k$th user. To eliminate interference, the digital precoder must satisfy the constraint
$$\mathbf{H}_{int,j}\mathbf{F}^k_{BB} = \mathbf{0}, \quad \forall j \neq k.$$
To obtain the digital precoder, $\tilde{\mathbf{H}}_k$ is defined as
$$\tilde{\mathbf{H}}_k = [\mathbf{H}_{int,1}^T, \ldots, \mathbf{H}_{int,k-1}^T, \mathbf{H}_{int,k+1}^T, \ldots, \mathbf{H}_{int,K}^T]^T.$$
The digital precoder $\mathbf{F}^k_{BB}$ should then lie in the null space of $\tilde{\mathbf{H}}_k$. The SVD of $\tilde{\mathbf{H}}_k$ gives
$$\tilde{\mathbf{H}}_k = \tilde{\mathbf{U}}_k\tilde{\boldsymbol{\Sigma}}_k[\tilde{\mathbf{V}}^{(1)}_k \ \tilde{\mathbf{V}}^{(0)}_k]^H,$$
where $\tilde{\mathbf{U}}_k$ and $\tilde{\boldsymbol{\Sigma}}_k$ are the matrix of left singular vectors and the diagonal matrix of singular values of $\tilde{\mathbf{H}}_k$, respectively. $\tilde{\mathbf{V}}^{(1)}_k = \tilde{\mathbf{V}}_k(:, 1:(K-1)N_s)$ and $\tilde{\mathbf{V}}^{(0)}_k = \tilde{\mathbf{V}}_k(:, (K-1)N_s+1:\mathrm{end})$ are orthogonal bases of the signal subspace and the null space of $\tilde{\mathbf{H}}_k$, respectively, so that $\tilde{\mathbf{H}}_k\tilde{\mathbf{V}}^{(0)}_k = \mathbf{0}$. The channel then becomes $\mathbf{H}_{int,k}\tilde{\mathbf{V}}^{(0)}_k$, which is called the equivalent channel. Its SVD is
$$\mathbf{H}_{int,k}\tilde{\mathbf{V}}^{(0)}_k = \mathbf{U}_k\mathbf{S}_k[\mathbf{V}^{(1)}_k \ \mathbf{V}^{(0)}_k]^H,$$
where $\mathbf{S}_k$ is the diagonal matrix of singular values of the equivalent channel. To eliminate inter-user interference, $\mathbf{V}^{(1)}_k$, which corresponds to the nonzero singular values, is taken as the precoding matrix, and the final digital precoding matrix of the $k$th user is built from the product $\tilde{\mathbf{V}}^{(0)}_k\mathbf{V}^{(1)}_k$.

Fig. 2 The structure diagram of the analog precoding solution process. The optimization order is determined first, and then the sub-matrices are optimized one by one. The whole process only needs to update $\mathbf{S}_N$.

There are two types of BD algorithms: average power allocation and water-filling power allocation. Since the transmission capacity of each channel is usually different, average power allocation wastes communication resources and can even reduce the communication capacity. The principle of the water-filling method is that, after each user's channel is divided into $N$ independent sub-channels, each sub-channel can be treated as a channel of bandwidth $B$. According to the Shannon formula, the capacity of the $k$th sub-channel depends on $p_k$, $f_k$, and $n_0$, which are the transmission power, frequency response, and noise component of the $k$th sub-channel, respectively. When $N$ is large enough, the SNR of each channel can be regarded as a constant. With the channel SNRs known, we can assign different powers to the different channels to achieve the maximum sum rate. The maximum sum capacity is therefore obtained by maximizing the total sub-channel capacity subject to the total power $P_N$. According to the Lagrangian multiplier method, the power $p_k$ takes the water-filling form, in which the threshold determined by the Lagrangian multiplier and the bandwidth $B$ is called the water-filling line.
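The block diagonalization step can be sketched as follows. The example assumes that each user's transit matrix is available as an $N_s \times N$ array (for instance an effective channel after analog processing); that interpretation, together with the function name `bd_precoders`, is an assumption of this illustration. Power normalization is omitted.

```python
import numpy as np

def bd_precoders(H_eff, n_streams):
    """Block-diagonalization precoders for a list of K effective channels."""
    K = len(H_eff)
    precoders = []
    for k in range(K):
        # Stack the channels of all users except user k.
        H_tilde = np.vstack([H_eff[j] for j in range(K) if j != k])
        _, _, Vh = np.linalg.svd(H_tilde)       # full right singular basis
        rank = np.linalg.matrix_rank(H_tilde)
        V0 = Vh[rank:].conj().T                 # null-space basis of H_tilde
        # SVD of the equivalent channel of user k inside that null space.
        _, _, Vh_eq = np.linalg.svd(H_eff[k] @ V0)
        V1 = Vh_eq[:n_streams].conj().T         # dominant right singular vectors
        precoders.append(V0 @ V1)
    return precoders

rng = np.random.default_rng(4)
H_eff = [(rng.standard_normal((2, 8)) + 1j * rng.standard_normal((2, 8))) / np.sqrt(2)
         for _ in range(3)]                     # K = 3 users, N_s = 2, N = 8
F_bb = bd_precoders(H_eff, n_streams=2)
```

Each `F_bb[k]` is invisible to every other user, since `H_eff[j] @ F_bb[k]` is numerically zero for `j != k`.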
The water-filling principle attains the theoretical maximum sum rate and yields better communication quality, so it is widely used. The whole process of the algorithm in this paper is shown in Table 1.
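A minimal water-filling routine is sketched below. It assumes the textbook formulation, maximizing $\sum_k \log_2(1 + g_k p_k)$ subject to $\sum_k p_k = P$ with $p_k \geq 0$, where $g_k$ is the gain-to-noise ratio of the $k$th sub-channel; the function name and the use of a single gain per sub-channel are assumptions of this example.

```python
import numpy as np

def water_filling(gains, total_power):
    """Return powers p_k = max(level - 1/g_k, 0) that sum to total_power."""
    gains = np.asarray(gains, dtype=float)
    order = np.argsort(-gains)                 # strongest sub-channel first
    inv = 1.0 / gains[order]
    power = np.zeros_like(gains)
    # Drop the weakest sub-channels until all active powers are positive.
    for n_active in range(len(gains), 0, -1):
        level = (total_power + inv[:n_active].sum()) / n_active
        if level > inv[n_active - 1]:
            power[order[:n_active]] = level - inv[:n_active]
            break
    return power

p = water_filling([2.0, 1.0, 0.1], total_power=1.0)
```

In this example the weakest sub-channel lies above the water level and receives no power, while the other two share the budget as 0.75 and 0.25.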

Results and discussion
In this section, we evaluate the performance of the proposed hybrid beamforming scheme with the sub-connected structure in MU-MIMO systems; the corresponding simulation results are described below [38]. All simulation results are averaged over 1000 channel realizations and were obtained in MATLAB on a 64-bit Windows 10 system with an Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz and 8.00 GB of RAM. For simplicity, the propagation environment is modeled with $N_c = 8$ clusters and $N_p = 10$ rays per cluster, and the inter-element spacing $d$ is assumed to be half a wavelength. The AoA and AoD of each ray are uniformly distributed in $[0, 2\pi]$. Typical mmWave massive MIMO configurations with $N_t = 128$, $N = 16$ and $N_r = 16$ are considered. The number of users is $K = 4$. The noise variance at each user is $\sigma^2 = 1$, and the SNR is defined as $P_N/\sigma^2$. (Note: Unless otherwise specified, the above parameters are the default parameters.) This paper focuses on hybrid beamforming design for massive MIMO systems with the sub-connected architecture. Nevertheless, we also compare the proposed method with state-of-the-art methods for the full-connected architecture, namely the HyEB scheme [28], which uses the least number of RF chains (equal to the number of transmitted streams), and the full-digital dirty paper coding (DPC) method [39]. Since DPC realized with the iterative water-filling algorithm has been proved to be capacity-achieving in the broadcast channel, it is used as the performance upper bound of the hybrid schemes. For the comparison among sub-connected methods, the analog precoder is found by the SIC method [33] and the digital precoding is obtained by the BD technology. This method is named the SIC-BD algorithm. In addition, we include the Full-Analog precoding algorithm in the comparison. This scheme uses the same parameters as the other algorithms but does not consider inter-user interference. That is, the Full-Analog scheme in this case represents the upper limit for the multi-user case. For more convenient comparison and analysis, we abbreviate full-connected as FC and sub-connected as SC in the following.

Table 1 The whole process of the proposed algorithm
Analog stage:
  According to $C_{n,\max} \in \max\{C_1, C_2, \ldots, C_N\}$, determine the optimization order.
  for $1 \leq n \leq N$ do
    1) Update matrix $\mathbf{T}_{n-1}$, obtain the submatrix $\mathbf{G}_{n-1}$.
    2) Obtain the value of $\bar{\mathbf{a}}_{n,opt}$.
  End for
End stage
Digital stage:
End stage
Obtain the total equivalent baseband channel $\mathbf{H}_{total} = \mathbf{H}\mathbf{F}_{RF}\mathbf{F}_{BB}$.
Compute $\mathbf{P}$ by using water-filling power allocation of the total equivalent channel $\mathbf{H}_{total}$.
End stage
Output: $\mathbf{F}_{RF}$, $\mathbf{F}_{BB}$, $\mathbf{P}$.

A. Performance for the sum rate
We first evaluate the sum rate of the different methods versus SNR for the ULA; the corresponding simulation results are shown in Fig. 3. Fig. 3 confirms the validity of the proposed precoding algorithm as the SNR increases from −20 to 20 dB. Panel (a) shows the result for a massive MIMO system with $N_t = 128$, and panel (b) shows the result for a MIMO system with $N_t = 32$. The simulation results also demonstrate that, as the SNR increases, the proposed hybrid precoding with the SC structure approaches the performance of HyEB [28] with the FC structure and is much higher than Full-Analog. To further investigate the proposed design with small antenna arrays, Fig. 3b compares the sum rate of the different beamforming schemes versus SNR when the number of BS antennas is small ($N_t = 32$). In this case, the proposed algorithm still achieves a considerable capacity, slightly higher than SIC-BD and Full-Analog.
The sum rate versus SNR for the different precoding algorithms with the UPA is shown in Fig. 4, where (a) corresponds to $N_t = 128$ and (b) to $N_t = 32$. Fig. 4 shows that the sum rate of each algorithm with the UPA decreases slightly compared with the ULA. In Fig. 4a, the proposed algorithm performs significantly better than SIC-BD. In Fig. 4b, the proposed algorithm is closer to HyEB [28]. The Full-Analog algorithm is much lower than the other algorithms. Although using the UPA in the MU-MIMO channel slightly decreases the overall performance of the proposed algorithm, the trend remains consistent with the ULA. Furthermore, when the antenna deployment is changed from a linear array to a planar array, the area occupied by the antenna array at the base station is greatly reduced, and the space utilization of the base station and user devices is effectively improved.

B. Performance for the number of BS antennas
The sum rate versus the number of BS antennas for the different precoding algorithms is shown in Fig. 5, where SNR = 0 dB. The performance of all algorithms improves as the number of BS antennas increases. When the number of BS antennas is large, the performance gap between the SC and FC hybrid beamforming schemes becomes larger, but the proposed design is still better than SIC-BD. Moreover, when the number of BS antennas is small, the performance gap between the proposed beamforming scheme and the HyEB [28] scheme is small. The Full-Analog method is far lower than the proposed algorithm.

Figure 6 compares the sum rate of the different precoding schemes versus the number of users with SNR = 5 dB, where the number of users changes from 2 to 12. The proposed method is very close to SIC-BD, but its overall performance is still better. As the number of users increases, the sum rate of each design method increases. This also indicates that, as the scale of the system grows, the proposed design effectively eliminates inter-user interference and thereby improves the performance of the system. The sum rate of the Full-Analog algorithm does not change significantly with the number of users, and its growth rate is the smallest among the algorithms.

To compare the computational complexity of the schemes, we list the running time of five schemes in Table 2, averaged over 100 random channel realizations. Regardless of the computer hardware, the running time of the full-connected schemes is very large. Although the full-connected structure has better performance, it has the disadvantages of a complicated layout, high cost, and excessive power consumption.
For the sub-connected structure, the hardware complexity and energy consumption are reduced, and the performance is not significantly different from that of the full-connected structure. The proposed algorithm is slightly slower than SIC-BD in running time, but it outperforms SIC-BD because of its screening optimization. The Full-Analog algorithm runs faster, but its performance gap is obvious.

Figure 7 shows the sum rates achieved by the different hybrid precoding schemes for different numbers of data streams per user, where $N_s = 2, 4$. Considering cost and power consumption, we find that the different SC hybrid precoding schemes perform similarly, but the proposed method is closer to the HyEB [28] scheme when the number of data streams per user is small, i.e., $N_s = 2$. When the number of data streams increases, the gaps between the sum rates of the different schemes become larger correspondingly. However, the proposed hybrid precoding scheme still performs better than SIC-BD for both numbers of data streams.

E. Performance for the power efficiency
As mentioned in Sect. 1, the power consumption is an important issue which should be considered for both the SC and FC hybrid precoding. In this subsection, we aim to compare the power efficiency performance of different hybrid precoding design schemes.
To better compare the performance of the two hybrid precoding structures, the power efficiency $\eta$ is defined as the ratio between the achievable rate $R$ and the total power consumption $P_{total}$:
$$\eta = \frac{R}{P_{total}},$$
where the unit of $\eta$ is bps/Hz/J and $P_{total}$ is the total power consumption of the system. In the hybrid precoding architecture, power is consumed by five blocks [40]: (a) the phase shifters (PS) on the transmitter side; (b) the RF chains on the transmitter side; (c) the digital-to-analog converters (DAC) on the transmitter side; (d) the base-band (BB) processor; (e) the power amplifiers (PA) on the transmitter side.
For full-digital precoding in MIMO, the power consumed by the BS and the users is expressed in terms of $P_{BB}$, $P_{RF}$, $P_{PA}$, $P_{PS}$, and $P_{DAC}$, which are the power of the BB processor, of each RF chain, of each PA, of each PS, and of each DAC, respectively.
The total power consumption $P_{total}$ of the hybrid precoding architecture differs from that of full-digital precoding because it additionally depends on the number of phase shifters. Following [41][42][43], the simulation parameters are set as follows: $P_{BB} = 243$ mW, $P_{RF} = 40$ mW, $P_{PA} = 16$ mW, $P_{DAC} = 110$ mW and $P_{PS} = 10$ mW.
For the FC and SC structures, the number of phase shifters $N_{PS}$ can be written as
$$N_{PS} = \begin{cases} N_t N, & \text{FC structure}, \\ N_t, & \text{SC structure}. \end{cases}$$
Figure 8 compares the power efficiency of the different hybrid precoding schemes versus SNR. Fig. 8 shows that the SC hybrid precoding methods have similar power efficiency, which is higher than that of the hybrid precoding schemes with the FC structure. The proposed algorithm is superior to SIC-BD over the whole SNR range. In other words, the proposed method transmits the signal more efficiently than SIC-BD at the same SNR and power consumption, which means it has higher power efficiency. Moreover, because the full-digital MIMO architecture requires more hardware and consumes more power, its power efficiency is relatively low compared with the hybrid architecture. Therefore, the full-digital MIMO architecture is rarely used in practical applications.
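The power-efficiency comparison can be illustrated with a small calculation. The total-power model used below, $P_{total} = P_{BB} + N(P_{RF} + P_{DAC}) + N_{PS}P_{PS} + N_t P_{PA}$, is an assumption made for this sketch and may differ from the paper's expression; the component powers are the values quoted above, and the function names are hypothetical.

```python
def num_phase_shifters(structure, n_t, n_rf):
    # FC: every RF chain reaches every antenna; SC: one shifter per antenna.
    if structure == "FC":
        return n_t * n_rf
    if structure == "SC":
        return n_t
    raise ValueError("structure must be 'FC' or 'SC'")

def hybrid_total_power_mw(structure, n_t, n_rf, p_bb=243.0, p_rf=40.0,
                          p_pa=16.0, p_dac=110.0, p_ps=10.0):
    # Assumed power model (all values in mW).
    n_ps = num_phase_shifters(structure, n_t, n_rf)
    return p_bb + n_rf * (p_rf + p_dac) + n_ps * p_ps + n_t * p_pa

def power_efficiency(rate_bps_hz, total_power_mw):
    # Ratio of achievable rate to total power (power converted to watts).
    return rate_bps_hz / (total_power_mw / 1000.0)

p_sc = hybrid_total_power_mw("SC", n_t=128, n_rf=16)
p_fc = hybrid_total_power_mw("FC", n_t=128, n_rf=16)
```

Under this assumed model the phase shifters dominate the FC power budget, which is consistent with the higher power efficiency reported for the SC structure.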

F. Performance for sensitivity of channel estimation errors
Finally, we evaluate the impact of imperfect CSI on the proposed hybrid precoding. Let $\hat{\mathbf{H}}$ represent the estimated channel, which is modeled as in [44], where $\xi \in [0, 1]$ expresses the accuracy of the estimated CSI and $\mathbf{E}$ is the error matrix with i.i.d. entries following the distribution $\mathcal{CN}(0, 1)$.
It can be seen from Fig. 9 that the proposed hybrid precoding method is not sensitive to the CSI accuracy over the considered SNR range. Even when the channel estimation accuracy is not high, the proposed method obtains a considerable sum rate. This is particularly noticeable at low SNR. When SNR = 15 dB and $\xi = 0.9$, the performance of the proposed method is quite close to that in the perfect CSI condition: it still achieves about 96.9% of the sum rate obtained with perfect CSI. Even when $\xi = 0.6$, the proposed method still achieves about 84.1% of the rate in the perfect CSI condition. In this case, only 19.16 bps/Hz is lost compared to the case where the CSI is completely known.

Fig. 9 Impact of imperfect CSI on the proposed scheme. It compares the sum rate (bits/s/Hz) of the proposed algorithm under perfect CSI and under imperfect CSI with $\xi = 0.9$, $\xi = 0.7$ and $\xi = 0.6$.
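The estimation-error model can be sketched as follows. The specific form $\hat{\mathbf{H}} = \xi\mathbf{H} + \sqrt{1 - \xi^2}\,\mathbf{E}$ is the form commonly used in the literature and is assumed here, since the displayed equation is not reproduced; the function name `imperfect_csi` is hypothetical.

```python
import numpy as np

def imperfect_csi(H, xi, rng):
    """Return an estimated channel with accuracy xi in [0, 1] (assumed model)."""
    # Error matrix with i.i.d. CN(0, 1) entries.
    E = (rng.standard_normal(H.shape)
         + 1j * rng.standard_normal(H.shape)) / np.sqrt(2)
    return xi * H + np.sqrt(1.0 - xi ** 2) * E

rng = np.random.default_rng(5)
H = (rng.standard_normal((16, 128)) + 1j * rng.standard_normal((16, 128))) / np.sqrt(2)
H_hat_perfect = imperfect_csi(H, 1.0, rng)   # xi = 1 reproduces the true channel
H_hat = imperfect_csi(H, 0.9, rng)
```

The precoders would then be designed from `H_hat` while the sum rate is evaluated on the true channel `H`.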