Joint hybrid-precoding design for MU-MISO systems with a subconnected architecture

In this study, we propose a joint hybrid-precoding algorithm for multiuser multiple-input single-output (MU-MISO) downlink systems. Specifically, we consider a base station (BS) that employs an energy-efficient subconnected (SC) hybrid-precoding architecture with fixed equal subarrays (FESA), referred to as SC-FESA. Optimizing the analog precoding matrix of an SC-FESA architecture is challenging because of its unique constraint structure. To maximize the system sum rate, we propose an efficient method that transforms the sum-rate optimization problem into a continuous and differentiable objective function in which only the nonzero elements of the analog precoding matrix are optimized. For the formulated problem, we develop an alternating optimization (AO) approach that optimizes the digital and analog precoders in succession to maximize the system sum rate. Specifically, when the digital precoder is fixed, we employ the Riemannian conjugate gradient algorithm to obtain the analog precoder; when the analog precoder is fixed, we use the minimum mean squared error method to obtain the digital precoder. Numerical simulation results show that the proposed AO algorithm improves the sum rate and energy efficiency of the SC-FESA architecture compared with existing algorithms.


Introduction
A large multiple-input multiple-output (MIMO) system is one of the crucial technologies for fifth-generation (5G) communication systems because of its ability to provide ultrahigh data rates and massive device connectivity. Despite the considerable potential of large MIMO systems, implementing a conventional fully digital architecture, which requires a dedicated radio-frequency chain (RFC) per antenna, is typically prohibitive because of its high cost, complexity, and circuit power consumption [1]. To address this issue, various hybrid analog-digital (HAD) architectures, which can achieve data rates close to those of the fully digital architecture at a much lower cost and power consumption, have been proposed in the literature [2–8].
HAD architectures can be grouped into two categories: fully connected (FC) and subconnected (SC), depending on how the antennas and RFCs are connected [5,9]. In an FC architecture, each RFC is connected to all antennas through phase shifters.
In contrast, in an SC architecture, each RFC is connected to a unique subset of antennas, forming an array of subarrays [5]. FC schemes offer the maximum precoding gain and therefore achieve higher rates than SC schemes. On the other hand, SC architectures have lower power consumption and hardware complexity than FC architectures because they require fewer RF components. Consequently, SC architectures are regarded as energy-efficient, low-complexity solutions for massive MIMO systems. Unfortunately, SC architectures have received less attention than FC architectures because 5G systems demand high data rates [10].
However, several studies have been devoted to developing algorithms that improve the performance of SC architectures [11,12,14–16]. The works in [11,12] presented hybrid-precoding (HP) algorithms for single-user (SU)-MIMO systems. Specifically, [11] proposed a successive interference cancellation (SIC) approach that optimizes the columns of the precoding matrix via singular value decomposition, and [12] proposed a semidefinite relaxation-based method for jointly designing the digital and analog precoders.
For SC architectures in the multiuser (MU)-MIMO scenario, [14] proposed the MU-SIC approach, in which the analog precoder and combiner are alternately updated via the SIC method and a block diagonalization (BD) scheme is employed to generate the digital precoding and combining matrices. In [15], a coordinate ascent (CA)-based HP and combining algorithm was proposed for MU-MIMO systems. More recently, [16] proposed a nonlinear Tomlinson-Harashima (TH)-based HP and combining algorithm, in which the columns of the analog precoding and combining matrices are successively optimized using the Schur complement, whereas the digital precoding and combining matrices are generated via QR decomposition. Although the algorithms in [15] and [16] have low complexity and achieve improved performance, they are restricted to the special case in which the number of transmitted data streams equals the number of RF chains at the BS. Additionally, the algorithm in [14] achieves only suboptimal performance. Furthermore, dynamic SC schemes have been proposed [17–20], in which each subarray is connected to an arbitrary number of consecutive antennas. Although dynamic SC schemes can achieve higher beamforming gains than conventional fixed SC schemes, they require additional hardware, particularly switches, which increases complexity and power consumption. Moreover, the switches may suffer from slow switching speeds and poor port isolation [19,21], which makes fixed SC schemes more practical than dynamic ones.
In this study, we aim to further improve the performance of SC architectures. We propose an alternating optimization (AO) approach that jointly designs the digital and analog precoders by directly solving the sum-rate maximization problem. Specifically, we consider an MU multiple-input single-output (MISO) downlink system with an SC architecture with fixed equal subarrays (FESA), referred to as SC-FESA. For a given analog precoder, an effective digital precoder for MU-MISO systems can be obtained via conventional linear precoding schemes such as zero-forcing (ZF), BD, or minimum mean squared error (MMSE) [14,22]. The primary goal of this study is therefore to find an analog precoder that improves the sum rate. Moreover, the proposed algorithm applies whenever the number of transmitted data streams is less than or equal to the number of RF chains at the BS, which makes it more practical.
The SC-FESA architecture imposes a unique constraint structure on its analog precoding matrix, which is very challenging to handle [12]. This paper proposes a novel technique that transforms the system's sum-rate maximization problem into a tractable, differentiable objective function of the analog precoder. The resulting problem involves only the nonzero elements of the analog precoding matrix. Consequently, we apply a manifold optimization (MO)-based algorithm that treats the analog precoder as a single vector of all nonzero elements and iteratively searches on the complex circle manifold for a local optimum, i.e., a point at which the Riemannian gradient of the objective function vanishes.
Our method is motivated by the near-optimal performance achieved by the MO-based algorithms in [12,13]. Reference [12] considers an SU-MIMO system with an FC architecture and solves the spectral-efficiency maximization problem by minimizing the Frobenius norm of the difference between the optimal fully digital precoder and the hybrid precoder. Alluhaibi et al. [13] also consider an SU-MIMO system, but with an SC architecture. In contrast, this study focuses on an MU-MISO system with the SC-FESA architecture and directly solves the sum-rate maximization problem to generate the digital and analog precoders, which has not been addressed in [12,13].

Main contributions
In this study, our main contributions are as follows:
1. Considering the SC-FESA architecture in MU-MISO downlink systems, we propose a novel method that transforms the original sum-rate maximization problem into a more tractable problem over the nonzero elements of the analog precoding matrix. The resulting analog precoding problem is subject to a unit-modulus constraint and therefore belongs to the class of problems that can be solved with MO-based algorithms.
2. To solve the formulated analog precoding problem, we develop an AO-based algorithm that jointly designs the digital and analog precoders to maximize the system's sum rate. Specifically, the analog precoder is generated via the Riemannian conjugate gradient (RCG) algorithm when the digital precoder is fixed, and the digital precoder is generated via the MMSE algorithm when the analog precoder is fixed.

Notations
A boldface capital letter, $\mathbf{X}$, denotes a matrix, and a boldface lowercase letter, $\mathbf{x}$, denotes a vector. The entry in the $n$th row and $m$th column of $\mathbf{X}$ is denoted by $x_{n,m}$. We use $\mathbf{X}^H$, $\mathbf{X}^T$, and $\mathbf{X}^{-1}$ to denote the Hermitian transpose, transpose, and inverse of $\mathbf{X}$, respectively. $\mathrm{diag}(a_1, \cdots, a_N)$ is a diagonal matrix with $a_1, \cdots, a_N$ as its diagonal elements, $\mathrm{blkdiag}(\mathbf{x}_1, \cdots, \mathbf{x}_N)$ is a block-diagonal matrix formed by the vectors $\mathbf{x}_1, \cdots, \mathbf{x}_N$, and $\mathbf{I}_N$ is the $N \times N$ identity matrix. $|x|$ denotes the magnitude of a complex number $x$, and $\mathrm{Re}\{x\}$ is its real part. The Frobenius norm of $\mathbf{X}$ and the Euclidean norm of $\mathbf{x}$ are denoted by $\|\mathbf{X}\|_F$ and $\|\mathbf{x}\|$, respectively. Finally, $\odot$ denotes the Hadamard (elementwise) product, and a calligraphic letter, $\mathcal{X}$, denotes a set.

System model and problem formulation
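We consider a conventional SC-FESA architecture for an MU-MISO downlink system, as illustrated in Fig. 1. The BS is equipped with $N_t = NM$ transmit antennas and $M$ ($< N_t$) RFCs, and it transmits $K$ data streams to $K$ single-antenna users, where $K \le M$. Each RFC is connected to a fixed subarray of $N$ antennas. Let $\mathbf{s} \in \mathbb{C}^{K\times 1}$ denote the transmitted symbol vector, which satisfies $\mathbb{E}\{\mathbf{s}\mathbf{s}^H\} = \mathbf{I}_K$. The received vector for all $K$ users, $\mathbf{y} = [y_1, \cdots, y_K]^T$, where $y_k$ denotes the signal received by the $k$th user, can be written as

$$\mathbf{y} = \mathbf{H}\mathbf{A}\mathbf{D}\mathbf{s} + \mathbf{z}, \qquad (1)$$

where $\mathbf{H} = [\mathbf{h}_1, \cdots, \mathbf{h}_K]^H \in \mathbb{C}^{K\times N_t}$, in which $\mathbf{h}_k \in \mathbb{C}^{N_t\times 1}$ represents the channel column vector between the BS and the $k$th user, and $\mathbf{A} = \mathrm{blkdiag}(\mathbf{a}_1, \cdots, \mathbf{a}_M) \in \mathbb{C}^{N_t\times M}$ denotes the analog precoding matrix. Here, $\mathbf{a}_m = [a_{1,m}, \cdots, a_{N,m}]^T \in \mathbb{C}^{N\times 1}$, $m = 1, 2, \cdots, M$, is the analog weighting vector of the $m$th subarray, whose elements have unit magnitude, i.e., $|a_{n,m}| = 1$. Moreover, $\mathbf{D} = [\mathbf{d}_1, \cdots, \mathbf{d}_K] \in \mathbb{C}^{M\times K}$ is the digital precoding matrix, where $\mathbf{d}_k \in \mathbb{C}^{M\times 1}$ is the digital precoding vector of the $k$th user, and $\mathbf{z} \in \mathbb{C}^{K\times 1}$ is an additive white Gaussian noise vector with independent and identically distributed entries $z_i \sim \mathcal{CN}(0, \sigma^2)$. For the signal model in (1), the sum rate of all $K$ users is given in [14], where $P_t$ is the transmit power at the BS and $\mathcal{A}$ is the set of feasible RF precoders, which can be defined as $\mathcal{A} = \{\mathbf{A} = \mathrm{blkdiag}(\mathbf{a}_1, \cdots, \mathbf{a}_M) : |a_{n,m}| = 1,\ \forall n, m\}$.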
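The following Python (NumPy) sketch builds the block-diagonal analog precoder of the SC-FESA architecture and evaluates a sum rate for given $\mathbf{H}$, $\mathbf{A}$, and $\mathbf{D}$. It assumes the common per-user SINR form of the sum rate, with $\mathbf{D}$ scaled so that $\|\mathbf{A}\mathbf{D}\|_F^2 = P_t$; the exact expression in [14] may differ, and the names `analog_precoder` and `sum_rate` are hypothetical.

```python
import numpy as np

def analog_precoder(phases: np.ndarray) -> np.ndarray:
    """A = blkdiag(a_1, ..., a_M) for an SC-FESA array.

    phases: (N, M) array; column m holds the phase shifts of subarray m.
    Returns an (N*M, M) matrix whose m-th column is nonzero (unit modulus)
    only on the rows belonging to the m-th subarray.
    """
    N, M = phases.shape
    A = np.zeros((N * M, M), dtype=complex)
    for m in range(M):
        A[m * N:(m + 1) * N, m] = np.exp(1j * phases[:, m])
    return A

def sum_rate(H: np.ndarray, A: np.ndarray, D: np.ndarray, sigma2: float) -> float:
    """Sum rate (bit/s/Hz) under an ASSUMED per-user SINR form.

    H: (K, N_t), row k is h_k^H; A: (N_t, M); D: (M, K), column k is d_k.
    """
    gains = np.abs(H @ A @ D) ** 2               # gains[k, j] = |h_k^H A d_j|^2
    signal = np.diag(gains)                      # desired-signal power of each user
    interference = gains.sum(axis=1) - signal    # interuser interference power
    return float(np.sum(np.log2(1.0 + signal / (interference + sigma2))))

# Usage on a small random instance: N = 4 antennas per subarray, M = K = 3
rng = np.random.default_rng(0)
N, M, K, P_t, sigma2 = 4, 3, 3, 1.0, 1.0
A = analog_precoder(rng.uniform(0, 2 * np.pi, (N, M)))
H = (rng.standard_normal((K, N * M)) + 1j * rng.standard_normal((K, N * M))) / np.sqrt(2)
D = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
D *= np.sqrt(P_t) / np.linalg.norm(A @ D, "fro")  # enforce ||A D||_F^2 = P_t
R = sum_rate(H, A, D, sigma2)
```

Because the subarrays are disjoint and every nonzero entry has unit modulus, $\mathbf{A}^H\mathbf{A} = N\mathbf{I}_M$, so the total transmit power equals $N\|\mathbf{D}\|_F^2$ for any feasible $\mathbf{A}$.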

Channel model
In this study, we adopt the widely used geometric channel model for millimeter-wave communication systems [23,25] and assume that the BS is equipped with a uniform linear array (ULA) [23]. Hence, the channel between the BS and the $k$th user is given by

$$\mathbf{h}_k = \gamma \sum_{l=1}^{L_k} \alpha_{kl}\, \mathbf{a}_t(\phi_{kl}),$$

where $\gamma$ is the normalization factor, $L_k$ denotes the number of propagation paths of the $k$th user, and $\alpha_{kl} \sim \mathcal{CN}(0, \rho_{kl})$ denotes the complex gain of the $l$th path of the $k$th user. The variance $\rho_{kl}$ accounts for the path loss of the $k$th user; without loss of generality, we set $\rho_{kl} = \rho = 1$ [12,24,25]. In addition, $\phi_{kl} \in [0, 2\pi)$ denotes the angle of departure (AoD) of the $l$th path from the BS to the $k$th user, and $\mathbf{a}_t(\cdot)$ denotes the array response vector at the BS. For a ULA, the normalized array response vector is given by [23, Eq. (34)]

$$\mathbf{a}_t(\phi) = \frac{1}{\sqrt{N_t}}\left[1, e^{j\frac{2\pi}{\lambda}d\sin\phi}, \cdots, e^{j(N_t-1)\frac{2\pi}{\lambda}d\sin\phi}\right]^T,$$

where $\lambda$ is the wavelength and $d$ is the antenna spacing.
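A minimal sketch of drawing one channel realization under this model is given below. It assumes half-wavelength spacing ($d = \lambda/2$), $\rho = 1$, AoDs drawn uniformly from $[0, 2\pi)$, and the hypothetical normalization $\gamma = \sqrt{N_t/L_k}$, since the value of $\gamma$ is not stated here; the function names are illustrative.

```python
import numpy as np

def ula_response(phi: float, n_t: int, d_over_lambda: float = 0.5) -> np.ndarray:
    """Normalized ULA response a_t(phi): unit-norm vector of length n_t."""
    n = np.arange(n_t)
    return np.exp(1j * 2 * np.pi * d_over_lambda * n * np.sin(phi)) / np.sqrt(n_t)

def geometric_channel(n_t: int, n_paths: int, rng: np.random.Generator) -> np.ndarray:
    """h_k = gamma * sum_l alpha_l a_t(phi_l) for one user.

    Assumptions: alpha_l ~ CN(0, 1) (rho = 1), phi_l ~ U[0, 2*pi),
    and the hypothetical normalization gamma = sqrt(n_t / n_paths).
    """
    alpha = (rng.standard_normal(n_paths) + 1j * rng.standard_normal(n_paths)) / np.sqrt(2)
    phis = rng.uniform(0.0, 2 * np.pi, n_paths)
    gamma = np.sqrt(n_t / n_paths)
    return gamma * sum(a * ula_response(p, n_t) for a, p in zip(alpha, phis))

rng = np.random.default_rng(1)
N_t, K, L = 60, 12, 10
# Stack the users' channels as rows h_k^H to form H (K x N_t), as in (1)
H = np.stack([geometric_channel(N_t, L, rng).conj() for _ in range(K)])
```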

Problem formulation
Our goal is to maximize the sum rate by jointly optimizing the digital and analog precoding matrices under the total transmit power constraint at the BS and the unit-modulus constraints on the analog phase shifters. The resulting optimization problem, denoted P1, maximizes the sum rate over $\mathbf{A} \in \mathcal{A}$ and $\mathbf{D}$ subject to the total power constraint in (4c). Solving P1 is difficult because of its non-convexity in $\mathbf{A}$, which makes a globally optimal solution unlikely to be obtained. In the next section, we present the proposed AO algorithm, which provides an efficient solution to P1.

Proposed AO method
In this section, we describe the proposed AO method for solving P1. We decompose P1 into two subproblems: one for optimizing the digital precoder $\mathbf{D}$ and the other for optimizing the analog precoder $\mathbf{A}$. The two subproblems are solved alternately: for a fixed $\mathbf{A}$, $\mathbf{D}$ is optimized; then, for the resulting $\mathbf{D}$, $\mathbf{A}$ is optimized; and these steps are repeated until convergence.
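The alternating structure can be sketched as a generic loop. The callables `update_D`, `update_A`, and `objective` are hypothetical placeholders for the digital step, the analog step, and the sum rate, and the stopping rule (relative change below `tol`, or `max_iter` outer iterations) is an assumption. The toy usage replaces the precoders with scalars so that the loop can be checked by hand.

```python
from typing import Callable, List, Tuple, TypeVar

TA = TypeVar("TA")  # type of the analog variable (e.g., an N_t x M matrix)
TD = TypeVar("TD")  # type of the digital variable (e.g., an M x K matrix)

def alternating_optimization(
    A0: TA,
    update_D: Callable[[TA], TD],           # digital step: best D for a fixed A
    update_A: Callable[[TA, TD], TA],       # analog step: improved A for a fixed D
    objective: Callable[[TA, TD], float],   # value to maximize, e.g. the sum rate
    tol: float = 1e-4,
    max_iter: int = 10,
) -> Tuple[TA, TD, List[float]]:
    A = A0
    D = update_D(A)
    history = [objective(A, D)]
    for _ in range(max_iter):
        A = update_A(A, D)   # optimize A with D fixed
        D = update_D(A)      # optimize D with A fixed
        history.append(objective(A, D))
        if abs(history[-1] - history[-2]) <= tol * max(1.0, abs(history[-2])):
            break            # negligible improvement: treat as converged
    return A, D, history

# Toy check: maximize f(a, d) = -(a - d)^2 - (d - 1)^2 over scalars a and d
A_opt, D_opt, hist = alternating_optimization(
    0.0,
    update_D=lambda a: (a + 1.0) / 2.0,   # exact maximizer over d for fixed a
    update_A=lambda a, d: d,              # exact maximizer over a for fixed d
    objective=lambda a, d: -(a - d) ** 2 - (d - 1.0) ** 2,
    max_iter=50,
)
```

Because each step exactly maximizes over one block, the objective history is non-decreasing, which is the property that makes the AO iterations well behaved.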

Digital precoding design
Given a fixed analog precoding matrix $\mathbf{A}$, we adopt the MMSE algorithm for the digital precoding design. Using the MMSE method, the $k$th user's digital precoder $\mathbf{d}_k$, which satisfies the total power constraint in (4c), is obtained in two steps [22].
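One common way to realize two such MMSE steps is sketched below: (i) a regularized inversion of the effective channel $\mathbf{H}\mathbf{A}$ and (ii) scaling so that $\|\mathbf{A}\mathbf{D}\|_F^2 = P_t$. This sketch assumes the transmit-Wiener form with regularization $(K\sigma^2/P_t)\mathbf{A}^H\mathbf{A}$; the exact expression in [22] may differ, and `mmse_digital_precoder` is a hypothetical name.

```python
import numpy as np

def mmse_digital_precoder(H: np.ndarray, A: np.ndarray, P_t: float, sigma2: float) -> np.ndarray:
    """Digital precoder D (M x K) for a fixed analog precoder A (N_t x M).

    H: (K, N_t) with row k equal to h_k^H.
    Step 1: regularized inversion of the effective channel H A.
    Step 2: scaling so that the total power ||A D||_F^2 equals P_t.
    """
    K = H.shape[0]
    H_eff = H @ A                                   # (K, M) effective channel
    reg = (K * sigma2 / P_t) * (A.conj().T @ A)     # regularization (assumed form)
    D_tilde = np.linalg.solve(H_eff.conj().T @ H_eff + reg, H_eff.conj().T)
    return D_tilde * np.sqrt(P_t) / np.linalg.norm(A @ D_tilde, "fro")

# Usage: block-diagonal unit-modulus A with N = 4 antennas per subarray, M = K = 3
rng = np.random.default_rng(3)
N, M, K = 4, 3, 3
A = np.kron(np.eye(M), np.ones((N, 1))) * np.exp(1j * rng.uniform(0, 2 * np.pi, (N * M, 1)))
H = (rng.standard_normal((K, N * M)) + 1j * rng.standard_normal((K, N * M))) / np.sqrt(2)
D = mmse_digital_precoder(H, A, P_t=10.0, sigma2=1.0)
```

At high SNR the regularization vanishes and, for $M = K$, the precoder approaches zero-forcing on the effective channel, so $\mathbf{H}\mathbf{A}\mathbf{D}$ becomes nearly diagonal.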

Analog precoding design
We now focus on optimizing the analog precoder $\mathbf{A}$ for a fixed digital precoder $\mathbf{D}$. Because of the unique structure of the constraint on $\mathbf{A}$, we first transform P1 into an equivalent form in which only the nonzero elements of $\mathbf{A}$ are optimized. Let $\mathbf{x} = [a_{1,1}, \cdots, a_{n,m}, \cdots, a_{NM,M}]^H \in \mathbb{C}^{N_t\times 1}$ denote the vector of all nonzero elements of $\mathbf{A}$, let $\mathbf{X} = \mathrm{diag}(\mathbf{x}^H)$ be an $N_t \times N_t$ diagonal matrix, and let $\mathbf{G} = \mathrm{blkdiag}(\mathbf{g}_1, \cdots, \mathbf{g}_M)$ be an $N_t \times M$ block-diagonal matrix, where $\mathbf{g}_m$ is the $N$-dimensional all-ones vector. The resulting problem of optimizing $\mathbf{x}$ is denoted P2, in which the analog precoding matrix is $\mathbf{A} = \mathbf{X}\mathbf{G}$. Moreover, by defining $\bar{\mathbf{h}}_k = \mathrm{diag}(\mathbf{h}_k^H)\mathbf{G}\mathbf{d}_k \in \mathbb{C}^{N_t\times 1}$, problem P2 can be rewritten as P3.
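The reformulation rests on the identity $\mathbf{h}_k^H\mathbf{A}\mathbf{d}_k = \mathbf{h}_k^H\mathbf{X}\mathbf{G}\mathbf{d}_k = \sum_\ell (h_{k,\ell})^* x_\ell^* (\mathbf{G}\mathbf{d}_k)_\ell = \mathbf{x}^H\bar{\mathbf{h}}_k$. The sketch below checks it numerically on random data; the small sizes and the subarray-by-subarray ordering of the nonzero entries in $\mathbf{x}$ are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 4, 3
N_t = N * M

# Nonzero entries of A (unit modulus), stacked subarray by subarray
a = np.exp(1j * rng.uniform(0, 2 * np.pi, N_t))
x = a.conj()                                  # x = [a_{1,1}, ..., a_{NM,M}]^H
X = np.diag(x.conj())                         # X = diag(x^H)
G = np.kron(np.eye(M), np.ones((N, 1)))       # G = blkdiag(g_1, ..., g_M), N_t x M
A = X @ G                                     # analog precoder A = X G

h = rng.standard_normal(N_t) + 1j * rng.standard_normal(N_t)   # channel h_k
d = rng.standard_normal(M) + 1j * rng.standard_normal(M)       # digital precoder d_k

h_bar = np.diag(h.conj()) @ G @ d             # bar{h}_k = diag(h_k^H) G d_k
lhs = h.conj() @ A @ d                        # h_k^H A d_k
rhs = x.conj() @ h_bar                        # x^H bar{h}_k
```

The identity holds for any digital vector, so replacing $\mathbf{d}_k$ with $\mathbf{d}_j$ expresses the interference term $|\mathbf{h}_k^H\mathbf{A}\mathbf{d}_j|^2$ in the same way; every term then depends on $\mathbf{A}$ only through $\mathbf{x}$.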

P3:

$$\max_{\mathbf{x}}\ f(\mathbf{x}) \quad \mathrm{s.t.}\ \ |x_\ell| = 1,\ \forall \ell = 1, 2, \cdots, N_t,$$

where $f(\mathbf{x})$ denotes the objective function of P3.

The primary obstacle to solving P3 is the non-convex unit-modulus constraint $|x_\ell| = 1$, for which no standard solution method exists. However, the objective of P3 is continuous and differentiable, and the constraint set of $\mathbf{x}$ forms a complex circle manifold. Hence, we adopt an MO-based approach to obtain a locally optimal solution of P3. Specifically, we employ a type of MO approach called the RCG algorithm [26], which has shown good performance in hybrid-precoding design problems [11,13]. Each iteration of the RCG algorithm consists of three main steps.

The first step computes the Riemannian gradient, defined as the orthogonal projection of the Euclidean gradient of $f(\mathbf{x})$ onto the tangent space at $\mathbf{x}$:

$$\mathrm{grad}\, f(\mathbf{x}) = \nabla f(\mathbf{x}) - \mathrm{Re}\{\nabla f(\mathbf{x}) \odot \mathbf{x}^*\} \odot \mathbf{x},$$

where $\nabla f(\mathbf{x})$ denotes the Euclidean gradient of $f$ at $\mathbf{x}$.

The second step computes the conjugate search direction via the update rule

$$\boldsymbol{\eta}_{i} = \mathrm{grad}\, f(\mathbf{x}_{i}) + \varpi_{i}\, \mathcal{T}(\boldsymbol{\eta}_{i-1}),$$

where $\varpi_i$ is chosen as the Polak-Ribière parameter [27, p. 2], $\boldsymbol{\eta}_{i-1}$ is the previous search direction, and $\mathcal{T}(\cdot)$ is the vector transport that maps the previous direction onto the tangent space at $\mathbf{x}_i$:

$$\mathcal{T}(\boldsymbol{\eta}_{i-1}) = \boldsymbol{\eta}_{i-1} - \mathrm{Re}\{\boldsymbol{\eta}_{i-1} \odot \mathbf{x}_i^*\} \odot \mathbf{x}_i.$$

The third step performs a retraction, which maps a point in the tangent space back onto the complex circle manifold:

$$\mathbf{x}_{i+1} = \left[\frac{(\mathbf{x}_i + \tau\boldsymbol{\eta}_i)_\ell}{|(\mathbf{x}_i + \tau\boldsymbol{\eta}_i)_\ell|}\right]_{\ell=1}^{N_t},$$

where $\tau$ is the Armijo step size obtained according to [28, Definition 4.2.2].

The overall RCG procedure for solving P3 is summarized in Algorithm 1; its input vector $\mathbf{x}$ is obtained from the initialization and update procedures of Algorithm 2. With the digital precoder obtained by the MMSE algorithm and the analog precoder generated by the RCG-based algorithm, their joint optimization via the AO approach is summarized in Algorithm 2.
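The three RCG steps can be sketched in Python as follows. The manifold operations are the standard ones for the complex circle manifold (projection onto the tangent space, vector transport by projection, and entrywise normalization as the retraction). Beyond that, this is a sketch under stated assumptions: the Polak-Ribière parameter is clipped at zero (the PR+ variant), the Armijo search halves the step starting from $\tau = 1$, all function names are illustrative, and the objective is a hypothetical toy $f(\mathbf{x}) = \|\mathbf{B}^H\mathbf{x}\|^2$ with Euclidean gradient $2\mathbf{B}\mathbf{B}^H\mathbf{x}$, used only to exercise the algorithm. It is not the objective of P3.

```python
import numpy as np

def riemannian_grad(x, egrad):
    """Orthogonal projection of the Euclidean gradient onto the tangent space at x."""
    return egrad - np.real(egrad * x.conj()) * x

def transport(eta, x):
    """Vector transport: project a direction onto the tangent space at x."""
    return eta - np.real(eta * x.conj()) * x

def retract(v):
    """Retraction onto the complex circle manifold: normalize each entry."""
    return v / np.abs(v)

def inner(u, v):
    """Real inner product Re{u^H v} used on the manifold."""
    return float(np.real(np.vdot(u, v)))

def rcg_maximize(f, egrad, x, max_iter=100, c=1e-4, tol=1e-9):
    """Riemannian conjugate gradient ascent on {x : |x_l| = 1}."""
    g = riemannian_grad(x, egrad(x))
    eta = g.copy()
    history = [f(x)]
    for _ in range(max_iter):
        if inner(g, g) < tol:
            break                                  # (near-)stationary point
        slope = inner(g, eta)
        if slope <= 0:                             # not an ascent direction: restart
            eta, slope = g.copy(), inner(g, g)
        tau, fx = 1.0, f(x)
        while f(retract(x + tau * eta)) < fx + c * tau * slope and tau > 1e-12:
            tau *= 0.5                             # Armijo backtracking
        x_new = retract(x + tau * eta)
        g_new = riemannian_grad(x_new, egrad(x_new))
        # Polak-Ribiere parameter (clipped at zero), using the transported old gradient
        varpi = max(0.0, inner(g_new, g_new - transport(g, x_new)) / inner(g, g))
        eta = g_new + varpi * transport(eta, x_new)
        x, g = x_new, g_new
        history.append(f(x))
        if abs(history[-1] - history[-2]) < tol * max(1.0, abs(history[-2])):
            break
    return x, history

# Toy usage: maximize f(x) = ||B^H x||^2 (NOT the objective of P3)
rng = np.random.default_rng(4)
N_t, K = 12, 3
B = rng.standard_normal((N_t, K)) + 1j * rng.standard_normal((N_t, K))
R = B @ B.conj().T
f = lambda x: float(np.real(x.conj() @ R @ x))
egrad = lambda x: 2.0 * R @ x
x0 = np.exp(1j * rng.uniform(0, 2 * np.pi, N_t))
x_opt, hist = rcg_maximize(f, egrad, x0)
```

The iterate stays on the manifold because every update passes through `retract`, and the Armijo condition keeps the objective from decreasing between iterations.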

Complexity analysis
In this subsection, we analyze the complexity of the proposed AO algorithm and compare it with that of existing algorithms by considering the dominant computational load of each. In the proposed algorithm, the digital precoding matrix is generated via the MMSE method, which requires a complexity of $\mathcal{O}(K^3)$. Algorithm 1 is then used to generate the analog precoder in step 6 of Algorithm 2; its cost is dominated by the $\mathcal{O}(N_t^2K^2)$ complexity of computing the Euclidean gradient. Therefore, the dominant complexity of the proposed AO algorithm is $\mathcal{O}(I_O I_I N_t^2 K^2)$, where $I_I$ and $I_O$ denote the numbers of iterations required for Algorithm 1 (inner loop) and Algorithm 2 (outer loop), respectively, to converge. In comparison, the dominant complexities of SIC-HP [14], CA-HP [15], TH-HP [16], and phase extraction-based HP (PE-HP) [25] are $\mathcal{O}(\max\{K^4, KM^3\})$, $\mathcal{O}(IKN_t^3)$, $\mathcal{O}(IK^2N_t)$, and $\mathcal{O}(K^3)$, respectively, where $I$ is the number of iterations required for convergence. In the FD architecture, the MMSE algorithm is applied directly to the channel matrix $\mathbf{H}$, so its complexity becomes $\mathcal{O}(N_t^3)$. Because the proposed AO algorithm requires $I_I$ inner and $I_O$ outer iterations, its overall complexity is higher than those of the TH-HP and CA-HP algorithms, which require only $I$ iterations to generate their analog precoding matrices. However, as observed in Sect. 4, $I_I$ is only slightly higher than $I$, and $I_O$ amounts to only a few tens of iterations. Hence, the complexity of the proposed algorithm remains within the range of existing algorithms, with a slight increase. This increase is justified by the gains in sum rate and energy efficiency (EE), which we verify through the numerical simulation results in Sect. 4.
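To make these orders concrete, the snippet below evaluates each dominant term for the setting used in the simulations ($N_t = 60$, $K = M = 12$, $I = I_O = I_I = 10$). These are dominant-term counts with constants ignored, so they indicate scaling only, not actual run times.

```python
# Dominant-term counts (constants ignored) for N_t = 60, K = M = 12
N_t, K, M = 60, 12, 12
I = I_O = I_I = 10

dominant_terms = {
    "Proposed AO": I_O * I_I * N_t**2 * K**2,
    "SIC-HP [14]": max(K**4, K * M**3),
    "CA-HP [15]": I * K * N_t**3,
    "TH-HP [16]": I * K**2 * N_t,
    "PE-HP [25]": K**3,
    "FD (MMSE)": N_t**3,
}

# Print from cheapest to most expensive dominant term
for name, count in sorted(dominant_terms.items(), key=lambda kv: kv[1]):
    print(f"{name:<12} {count:>12,}")
```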

Energy efficiency
The energy efficiency (EE) is defined as the ratio of the achievable sum rate to the total power consumption at the BS, i.e., $\mathrm{EE} = R/P$ [5]. The total power consumption of the FD, FC, and SC architectures is given in [5]; for the FD architecture,

$$P_{\mathrm{FD}} = N_t(P_{PA} + P_{RFC} + P_{DAC}) + P_{BB}, \qquad (12\mathrm{a})$$

where $P_{PA}$, $P_{PS}$, $P_{RFC}$, $P_{DAC}$, and $P_{BB}$ denote the power consumption of the power amplifier, phase shifter, RFC, digital-to-analog converter, and baseband processing, respectively.

Simulation results
In this section, we present simulation results that evaluate the effectiveness of the proposed AO algorithm by comparing it with existing algorithms. For a fair comparison, the MMSE algorithm is adopted for the digital precoding design of all benchmark schemes. The compared algorithms and their abbreviations are as follows:
• FD: Fully digital architecture with the MMSE digital precoding algorithm [22].
• PE-HP (FC) and PE-HP (SC): Phase extraction-based HP [25] with the FC and SC architectures, respectively.
• SIC-HP (SC): SIC-based HP [14].
• CA-HP (SC): CA-based HP [15].
• TH-HP (SC): TH-based HP [16].
• Proposed AO (SC): The proposed AO algorithm.

In the simulation, we assume a millimeter-wave channel with $L = 10$ and $d = \lambda/2$ [23]. Furthermore, we set $I = I_O = I_I = 10$ unless otherwise stated. For the EE comparison, we set $P_{PA} = 20$ mW, $P_{PS} = 30$ mW, $P_{RFC} = 40$ mW, and $P_{DAC} = P_{BB} = 200$ mW [5]. Finally, all simulation results are averaged over $10^3$ channel realizations, and the signal-to-noise ratio (SNR) in the plots is defined as $P_t/\sigma^2$, where the noise variance at each user, $\sigma^2$, is set to 1. Figure 2 compares the sum rates of the different algorithms versus SNR for a system with $N_t = 60$ and $K = M = 12$. Fig. 2 shows a large performance gap between the FC and SC architectures owing to the reduced precoding gain of the SC architecture. Among the schemes based on the SC architecture, however, the proposed AO scheme achieves a higher sum rate than the conventional algorithms over the entire SNR range. Moreover, when the analog precoder is not properly optimized, as in the PE-HP scheme, which extracts only the phases of the channel coefficients to generate the analog precoding matrix, the performance is poor compared with the other schemes. This suggests that both the digital and analog precoding matrices significantly affect the achievable sum rate of the SC architecture.
Figure 3 shows the effect of the number of transmit antennas $N_t$ at the BS on the achievable sum rate. We assume that the BS employs as many RFCs as users, i.e., $K = M = 12$, and SNR = 0 dB. Figure 3 shows that the sum rates of all algorithms improve as $N_t$ increases. Figure 3 also indicates that the proposed AO scheme maintains a performance gap over the TH-HP and CA-HP schemes, whereas its gains over the PE-HP and SIC-HP schemes grow as $N_t$ increases, which further indicates the effectiveness of the proposed method for the SC architecture. Figure 4 compares the achievable sum rates of the different algorithms as a function of the number of users, assuming that the BS employs as many RFCs as users. Figure 4 shows that the proposed AO scheme outperforms the conventional algorithms based on the SC architecture and that the performance gap slightly widens as $K$ increases. This suggests that the proposed AO scheme suppresses interuser interference more effectively, which improves system performance. Figure 5 illustrates the sum rate versus the number of users for a fixed number of RF chains, $M = 12$, with $N_t = 60$ and SNR = 0 dB. Here, we consider only the algorithms that can be generalized to $K \le M$. The results in Fig. 5 show that the proposed algorithm achieves substantial performance gains over the SIC-HP scheme for $K \le M$.

Next, we examine the convergence behavior of the proposed AO, TH-HP, and CA-HP algorithms for $N_t = 60$, $K = M = 12$, and SNR = 0 dB. Figure 6 illustrates the convergence of the proposed MO-based algorithm for optimizing the analog precoder in P3; it terminates after approximately 13 iterations, once the increase in the objective value $f(\mathbf{x})$ of P3 becomes negligible. Figure 7 plots the achievable sum rates versus the number of iterations for the proposed AO, TH-HP, and CA-HP algorithms. The proposed AO scheme converges slightly more slowly than the TH-HP and CA-HP schemes, which increases its complexity; however, it attains substantially higher sum rates than the existing schemes. Figure 8 plots the EE versus the number of transmit antennas for SNR = 0 dB and $K = M = 12$, with $N_t$ varying from 24 to 240. In Fig. 8, the EE of the HP schemes based on the SC architecture is higher than that of the PE-HP (FC) and FD schemes, and the EE gap between the SC and FC schemes increases with $N_t$. The proposed AO scheme achieves a higher EE than the conventional algorithms over the entire range of $N_t$. Fig. 8 also shows that the EE of all the compared SC schemes decreases as $N_t$ increases. However, the EE gains of the proposed AO scheme over the SIC-HP and PE-HP schemes increase substantially with $N_t$, while its gains over the CA-HP and TH-HP schemes remain nearly constant. Finally, Fig. 9 shows the EE versus the number of users for $N_t = 60$ and SNR = 0 dB. For $K = M$, the power consumption of the system increases with $K$ because the number of RFCs $M$ increases. Consequently, in Fig. 9, the EE tends to decrease with $K$ for all schemes except FD. The EE of the FD scheme increases with $K$ because its sum rate increases with $K$, whereas its power consumption remains constant.
More importantly, the proposed AO scheme always achieves better EE performance than the existing schemes for different K.
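The EE values above follow from EE = R/P with the power model of [5]. The sketch below evaluates (12a) for the FD architecture using the power values listed in the simulation setup. The FC and SC expressions are ASSUMED forms (one PA per antenna, one RFC and DAC per RF chain, and one phase shifter per antenna-RFC connection), since the corresponding equations are not reproduced here; all function names are hypothetical.

```python
# Power values from the simulation setup, in mW
P_PA, P_PS, P_RFC, P_DAC, P_BB = 20.0, 30.0, 40.0, 200.0, 200.0

def power_fd(n_t: int) -> float:
    """(12a): every antenna has its own PA, RFC, and DAC."""
    return n_t * (P_PA + P_RFC + P_DAC) + P_BB

def power_fc(n_t: int, m: int) -> float:
    """ASSUMED FC model: n_t PAs, m RFC/DAC pairs, n_t * m phase shifters."""
    return n_t * P_PA + m * (P_RFC + P_DAC) + n_t * m * P_PS + P_BB

def power_sc(n_t: int, m: int) -> float:
    """ASSUMED SC model: n_t PAs, m RFC/DAC pairs, n_t phase shifters."""
    return n_t * P_PA + m * (P_RFC + P_DAC) + n_t * P_PS + P_BB

def energy_efficiency(rate: float, power_mw: float) -> float:
    """EE = R / P, with R in bit/s/Hz and P converted from mW to W."""
    return rate / (power_mw / 1000.0)

for name, p in [("FD", power_fd(60)), ("FC", power_fc(60, 12)), ("SC", power_sc(60, 12))]:
    print(f"{name}: {p:.0f} mW")
```

With $N_t = 60$ and $M = 12$, (12a) gives 15 800 mW for FD; under the assumed models, FC and SC consume 25 880 mW and 6 080 mW, respectively. Note that $P_{\mathrm{FD}}$ does not depend on $K$.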

Conclusion
In this study, we proposed a joint HP design algorithm for MU-MISO downlink systems in which the BS employs an SC-FESA architecture. We proposed a novel technique that transforms the system's sum-rate optimization problem into a tractable objective function, which allows an MO-based algorithm to efficiently optimize the nonzero elements of the analog precoding matrix. The proposed algorithm enhances the achievable sum rate by jointly optimizing the digital and analog precoding matrices using an AO approach, in which the MMSE algorithm generates the digital precoder and an MO-based technique generates the analog precoder. Numerical simulations showed that the proposed AO algorithm attains higher sum rates than existing algorithms for the SC architecture. In addition, the proposed AO algorithm achieved higher EE than conventional algorithms for both SC and FC architectures. This study considered only narrowband channels and MU-MISO systems; extending the proposed algorithm to wideband channels and MU-MIMO systems is an interesting direction for future work. Moreover, the convergence of the proposed AO algorithm requires further analysis.