Optimal Multiuser Zero-Forcing with Per-Antenna Power Constraints for Network MIMO Coordination

We consider a multi-cell multiple-input multiple-output (MIMO) coordinated downlink transmission, also known as network MIMO, under per-antenna power constraints. We investigate a simple multiuser zero-forcing (ZF) linear precoding technique known as block diagonalization (BD) for network MIMO. The optimal form of BD with per-antenna power constraints is proposed. It involves a novel approach of optimizing the precoding matrices over the entire null space of other users' transmissions. An iterative gradient descent method is derived by solving the dual of the throughput maximization problem, which finds the optimal precoding matrices globally and efficiently. The comprehensive simulations illustrate several network MIMO coordination advantages when the optimal BD scheme is used. Its achievable throughput is compared with the capacity region obtained through the recently established duality concept under per-antenna power constraints.


Introduction
While the potential capacity gains in point-to-point [1,2] and multiuser [3] multiple-input multiple-output (MIMO) wireless systems are significant, in cellular networks this increase is very limited due to intra- and intercell interference. Indeed, the capacity gains promised by MIMO are severely degraded in cellular environments [4,5]. To mitigate this limitation and achieve the spectral efficiency increase due to MIMO spatial multiplexing in future broadband cellular systems, network-level interference management is necessary. Consequently, there has been a growing interest in network MIMO coordination [6][7][8][9][10][11]. Network MIMO coordination is a very promising approach to increase the signal to interference plus noise ratio (SINR) on the downlinks of cellular networks without reducing the frequency reuse factor or traffic load. It is based on cooperative transmission by base stations in multiuser, multicell MIMO systems. The network MIMO coordinated transmission is often analyzed using a large virtual MIMO broadcast channel (BC) model with one base station and more antennas [12][13][14]. This approach increases the number of transmit antennas serving each user, and hence the capacity increases dramatically compared to conventional MIMO networks without coordination [7][8][9]. Moreover, intercell scheduled transmission benefits from the increased multiuser diversity gain [15]. The capacity region of network MIMO coordination as a MIMO BC has been previously established under a sum power constraint using uplink-downlink duality [16][17][18][19][20]. However, coordination between multiple base stations requires per-base-station or, more realistically in practice, per-antenna power constraints. A more general case is the extension to any linear power constraints.
Under per-antenna power constraints, uplink-downlink duality for the multiantenna downlink channel has been established in [21,22] using Lagrangian duality framework in convex optimization [23] to explore the capacity region. It is known that the capacity region is achievable with dirty paper coding (DPC). However, DPC is too complex for practical implementation. Consequently, due to their simplicity, linear precoding schemes such as multiuser zero forcing (ZF) or block diagonalization (BD) are considered [24,25].
The key idea of BD is linear precoding of data in such a way that transmission for each user lies within the null space of other users' transmissions. Therefore, the interference to other users is eliminated. Multicell BD has been employed explicitly for network MIMO coordinated systems in [26][27][28][29] with the diagonal structure of the precoders and the sum power constraint [24]. Although there were attempts in these papers to optimize the precoders to satisfy per-base-station and per-antenna power constraints, this structure of the precoders is no longer optimal for such power constraints and must be revised [27,30,31]. In [32], the ZF matrix is confined to the pseudoinverse of the channel for single-receive-antenna users with per-antenna power constraints. The suboptimality of pseudoinverse ZF beamforming subject to per-antenna power constraints was first shown in [27] and received further attention in [30,31,33,34]. Reference [30] presented the optimal precoder's structure using the concept of generalized inverses, which leads to a nonconvex optimization problem, the relaxed form of which required semidefinite programming (SDP) [33]. This was investigated only for single-antenna mobile users. Reference [31] also used generalized inverses for single-antenna mobile users, employing multistage optimization algorithms.
In this paper, we aim to maximize the throughput of network MIMO coordination employing multiple antennas both at the base stations and the mobile users through optimization of precoding. We employ BD for precoding due to its simplicity. An optimal form of BD is proposed by extending the search domain of precoding matrices to the entire null space of other users' transmissions [34]. The dual of the throughput maximization problem is used to obtain a simple iterative gradient descent method [23] to find the optimal linear precoding matrices efficiently and globally. The gradient descent method applied to the dual problem requires fewer optimization variables and less computation than comparable algorithms that have already been proposed in [26,28,30,31]. Reference [35] has employed the idea presented in [34], which is optimizing over the entire null space of other users' channels, but it developed an algorithm based on the subgradient method. Unlike the gradient method, the subgradient method is not a descent method and does not use a line search for the step sizes [36]. Furthermore, our approach is also applicable to the case of nonsquare channel matrices, single-antenna mobile users, and per-base-station power constraints. In contrast to previous numerical results on network MIMO coordination [26,37,38], which assume the sum power or per-base-station power constraints, in this paper the proposed optimal BD is examined with per-antenna power constraints enforced. To make network MIMO coordination feasible in practice, local coordination of base stations is used through clustering [26,38,39]. The results show that the proposed optimal BD scheme outperforms the earlier BD schemes used in network MIMO coordination. For the sake of comparison, the capacity limits are determined employing the uplink-downlink duality idea in MIMO BC under per-antenna power constraints introduced in [21,22].
The remainder of this paper is organized as follows. In Section 2 the system model is introduced, and the network MIMO coordination structure, the transmission strategy, and the corresponding capacity region are discussed. In Section 3 the multicell BD scheme is studied, and its comparison with the conventional BD is presented, which motivates research on optimal multicell BD under per-antenna power constraints. The optimal multicell BD scheme is proposed in Section 3.2, and its further extensions and generalizations are considered. Comprehensive numerical results are presented in Section 5 following the discussion of the simulation setup in Section 4. Conclusions are given in Section 6.

System Model
We consider a downlink cellular MIMO network with multiple antennas at both base stations and mobile users. Each user is equipped with $n_r$ receive antennas, and each base station is equipped with $n_t$ transmit antennas. The base stations across the network are assumed to be coordinated via high-speed backhaul links. For a large cellular network of several cells, this coordination is difficult in practice and requires a large amount of channel state information and user data to be available at each base station. Hence, clustering of the network is applied, where each group of $B$ cells is clustered together and benefits from intracluster coordinated transmission [26,38,39]. Within each cluster, each user's receive antennas may therefore receive signals from all $N_t = n_t B$ transmit antennas. The cellular network contains $C$ clusters. The base stations within each cluster are connected and capable of cooperatively transmitting data to mobile users within the cluster. Hence, there are two types of interference in the network: intracluster and inter-cluster interference. If we define $H_{c,k,b} \in \mathbb{C}^{n_r \times n_t}$ to be the downlink channel matrix of user $k$ from base station $b$ within cluster $c$, then the aggregate downlink channel matrix of user $k$ within cluster $c$ is an $n_r \times N_t$ matrix defined as $H_{c,k} = [H_{c,k,1}\ H_{c,k,2}\ \cdots\ H_{c,k,B}]$. The aggregate downlink channel matrix for all $K$ users scheduled within cluster $c$, $H_c \in \mathbb{C}^{K n_r \times N_t}$, is defined as $H_c = [H_{c,1}^T\ \cdots\ H_{c,K}^T]^T$, where $(\cdot)^T$ denotes the matrix transpose. The multiuser downlink channel is also called the broadcast channel (BC) in the information theory literature [40]. Assuming that the same channel is used on the uplink and downlink, the aggregate uplink channel matrix is $H_c^H$, where $(\cdot)^H$ denotes the conjugate (Hermitian) matrix transpose [13]. The multiuser uplink channel is also called the multiple-access channel (MAC).
In the BC, let $x_c \in \mathbb{C}^{N_t \times 1}$ denote the transmitted signal vector (from the $N_t$ base station antennas of the $c$th cluster), and let $y_{c,k} \in \mathbb{C}^{n_r \times 1}$ be the received signal at the receiver of mobile user $k$. The noise at receiver $k$ is represented by $n_{c,k} \in \mathbb{C}^{n_r \times 1}$ containing $n_r$ circularly symmetric complex Gaussian components ($n_{c,k} \sim \mathcal{CN}(0, \sigma^2 I_{n_r})$). The received signal at the $k$th user in cluster $c$ is then
$$y_{c,k} = H_{c,k} x_c + \sum_{\hat{c} \neq c} \hat{H}_{\hat{c},k}\, x_{\hat{c}} + n_{c,k}, \tag{1}$$
where $\hat{H}_{\hat{c},k}$ represents the channel coefficients from the surrounding clusters $\hat{c}$ to the $k$th user of cluster $c$. The transmit covariance matrix can be defined as $S_{x_c} = E[x_c x_c^H]$. The base stations are subject to the per-antenna power constraints $p_1, \ldots, p_{N_t}$, which imply
$$[S_{x_c}]_{ii} \le p_i, \quad i = 1, \ldots, N_t, \tag{2}$$
where $[\cdot]_{ii}$ is the $i$th diagonal element of a matrix. The cancelation of intracluster multiuser interference is done by applying BD, which is discussed in Section 3. The remaining inter-cluster interference plus noise covariance matrix at the $k$th user of cluster $c$ is given by $R_{c,k} = I_{n_r} + \sum_{\hat{c} \neq c} \hat{H}_{\hat{c},k}\, S_{x_{\hat{c}}}\, \hat{H}_{\hat{c},k}^H$. To simplify the analysis, we have normalized the vectors in (1) by dividing each by the standard deviation of the additive noise component, $\sigma$. Completely removing the inter-cluster interference requires universal coordination between all surrounding clusters. The worst-case scenario for interference is when all surrounding clusters transmit at full allowed power ([41, Theorem 1]). Although this result is for the case with the total sum power constraint on the transmit antennas, it is used in our numerical results, and it gives a pessimistic performance of the network MIMO coordination [38]. Then, a prewhitening filter can be applied to the system, and as a result the inter-cluster interference in this case can be assumed spatially white [42]. The received signal for the $k$th user in the $c$th cluster after postprocessing can be simplified as $y_k = H_k x + z_k$, where $z_k$ is the noise vector. For ease of notation, we dropped the cluster index $c$.
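As a small numerical illustration of the per-antenna constraints in (2), the sketch below reads the per-antenna powers off the diagonal of a transmit covariance matrix and compares them with the limits. The helper names, the random precoder `W`, and the unit power limits are assumptions made only for this example.

```python
import numpy as np

def per_antenna_powers(S_x):
    """Power radiated by each antenna: the (real) diagonal of the transmit covariance."""
    return np.real(np.diag(S_x))

def satisfies_per_antenna(S_x, p, tol=1e-9):
    """True if [S_x]_ii <= p_i holds for every transmit antenna i."""
    return bool(np.all(per_antenna_powers(S_x) <= np.asarray(p) + tol))

rng = np.random.default_rng(0)
N_t = 4
# Hypothetical precoder: with unit-power symbols, S_x = W W^H.
W = (rng.standard_normal((N_t, 2)) + 1j * rng.standard_normal((N_t, 2))) / np.sqrt(2)
S_x = W @ W.conj().T
p = np.full(N_t, 1.0)  # assumed per-antenna limits

# Uniform scaling so that the most heavily loaded antenna exactly meets its limit.
scale = np.min(p / per_antenna_powers(S_x))
S_scaled = scale * S_x
print(per_antenna_powers(S_scaled), satisfies_per_antenna(S_scaled, p))
```

After this uniform scaling only one antenna is at its limit while the others have unused headroom, which is the slack that a precoder designed directly for per-antenna constraints can try to exploit.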

Capacity Region for Network MIMO Coordination.
The capacity region of a MIMO BC with sum power constraint has been previously discussed in [16][17][18]. The sum capacity of a Gaussian vector broadcast channel under per-antenna power constraint is the saddle point of a minimax problem, and it is shown to be equivalent to a dual MAC with linearly constrained noise [22]. The dual minimax problem is convex-concave, and consequently the original downlink optimization problem can be solved globally in the dual domain. An efficient algorithm using Newton's method [23] is used in [22] to solve the dual minimax problem; it finds an efficient search direction for the simultaneous maximization and minimization. This capacity result is used to determine the sum capacity of the multibase coordinated network, and it constitutes the performance limit for the proposed transmission schemes.

Transmission Strategy.
A block diagram of the transmission strategy for network MIMO coordination is shown in Figure 1. The transmitted symbol to user $k$ is an $n_r$-dimensional vector $u_k$, which is multiplied by an $N_t \times n_r$ precoding matrix $W_k$ and passed on to the base station's antenna array. Since all base station antennas are coordinated, the complex antenna output vector $x$ is composed of signals for all $K$ users. Therefore, $x$ can be written as $x = \sum_{k=1}^{K} W_k u_k$. The received signal $y_k$ at user $k$ can be represented as $y_k = H_k W_k u_k + H_k \sum_{j \neq k} W_j u_j + z_k$, where $z_k \sim \mathcal{CN}(0, I_{n_r})$ denotes the normalized AWGN vector at user $k$. The random characteristics of the channel matrix entries of $H_k$ are discussed in Section 4. They encompass three factors: path loss, Rayleigh fading, and lognormal shadowing. The random structure of the channel coefficients ensures $\mathrm{rank}(H_k) = \min(n_r, N_t) = n_r$ for user $k$ with probability one. The per-antenna power constraints (2) impose a power constraint on each transmit antenna. The sum power constraint can also be expressed as $\mathrm{tr}\{S_x\} \le P$. Due to the structure of the multiuser zero forcing scheme, the number of users that can be served simultaneously in each time slot is limited. Hence, a user selection algorithm is necessary. We consider two main criteria for the user selection scheme: maximum sum rate (MSR) and proportional fairness (PF). We employ the greedy user selection algorithm discussed in [43,44]. The proportionally fair user selection algorithm is based on the greedy weighted user selection algorithm with an update of the weights discussed in [45][46][47].

Multicell Multiuser Block Diagonalization
To remove the intracluster interference, practical linear zero forcing can be employed. Applying multiuser zero forcing to multiple-antenna users requires block diagonalization (BD) rather than channel inversion [24]. Assuming the transmission strategy in Section 2.3, each user's data $u_k$ is precoded with the matrix $W_k$, such that
$$H_j W_k = 0, \quad \forall j \neq k,\ 1 \le j, k \le K. \tag{9}$$
Hence the received signal for user $k$ can be simplified to $y_k = H_k W_k u_k + z_k$. Let $\tilde{H}_k = [H_1^T\ \cdots\ H_{k-1}^T\ H_{k+1}^T\ \cdots\ H_K^T]^T$ denote the aggregate channel matrix of all users other than user $k$. Condition (9) forces $W_k$ to lie in the null space of $\tilde{H}_k$, which requires the dimension condition $B n_t \ge K n_r$ to be satisfied. This directly comes from the definition of null space in linear algebra [48]. Hence, the maximum number of users that can be served in a time slot is $K = \lfloor N_t / n_r \rfloor$. We focus on the $K$ users which are selected through a scheduling algorithm and assigned to one subband. The remaining unserved users are referred to other subbands or will be scheduled in other time slots. Recall that the vectors in (5) are normalized with respect to the standard deviation of the additive noise component, $\sigma$, resulting in $z_k$ having components with unit variance. Assume that $\tilde{H}_k$ is a full rank matrix, $\mathrm{rank}(\tilde{H}_k) = (K - 1) n_r$, which holds with probability one due to the randomness of the entries of the channel matrices. We perform the singular value decomposition (SVD) $\tilde{H}_k = \tilde{U}_k \tilde{\Sigma}_k [\Upsilon_k\ V_k]^H$, where $\Upsilon_k$ holds the first $(K - 1) n_r$ right singular vectors corresponding to nonzero singular values and $V_k \in \mathbb{C}^{N_t \times m_r}$ contains the last $m_r = N_t - (K - 1) n_r$ right singular vectors corresponding to zero singular values of $\tilde{H}_k$. If the number of scheduled users is $K = N_t / n_r$, then $m_r = n_r$; otherwise $m_r > n_r$ when $K < N_t / n_r$. The orthonormality of $V_k$ means that $V_k^H V_k = I_{m_r}$. The columns of $V_k$ form a basis set in the null space of $\tilde{H}_k$, and hence $W_k$ can be any linear combination of the columns of $V_k$, that is,
$$W_k = V_k \Psi_k, \quad k = 1, \ldots, K, \tag{12}$$
where $\Psi_k \in \mathbb{C}^{m_r \times n_r}$ can be any arbitrary matrix subject to the per-antenna power constraints [34].
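The null-space construction described above can be sketched numerically as follows. This is a minimal illustration under assumed dimensions ($K = 3$, $n_r = 2$, $N_t = 8$) with i.i.d. complex Gaussian channels; the function names and the random choice of $\Psi_k$ are hypothetical and are not an optimized design.

```python
import numpy as np

def complex_gaussian(rng, shape):
    """i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def null_space_basis(H_tilde, tol=1e-10):
    """Orthonormal basis (as columns) of the null space of H_tilde via a full SVD."""
    _, s, Vh = np.linalg.svd(H_tilde, full_matrices=True)
    rank = int(np.sum(s > tol * s[0]))
    # Rows of Vh beyond the rank pair with zero singular values.
    return Vh[rank:].conj().T

def bd_null_bases(H_list):
    """V_k for each user k: null space of the stacked channels of all other users (K >= 2)."""
    bases = []
    for k in range(len(H_list)):
        H_tilde = np.vstack([H for j, H in enumerate(H_list) if j != k])
        bases.append(null_space_basis(H_tilde))
    return bases

rng = np.random.default_rng(1)
K, n_r, N_t = 3, 2, 8
H_list = [complex_gaussian(rng, (n_r, N_t)) for _ in range(K)]
V_list = bd_null_bases(H_list)

# Any Psi_k gives a zero-forcing precoder W_k = V_k Psi_k; here Psi_k is random.
W_list = [V @ complex_gaussian(rng, (V.shape[1], n_r)) for V in V_list]
leak = max(np.linalg.norm(H_list[j] @ W_list[k])
           for k in range(K) for j in range(K) if j != k)
print(V_list[0].shape, leak)
```

Because $K < N_t / n_r$ in this example, each $V_k$ has $m_r = 4 > n_r$ columns, so $\Psi_k$ has more freedom than a square matrix would offer.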
Conventional BD scheme proposed in [24] assumes only linear combinations of a diagonal form to simplify it to a power allocation algorithm through water-filling. The conventional BD is optimal only when sum power constraint is applied [49], and it is not optimal under per-antenna power constraints [27,30,31].

Conventional BD.
In conventional BD [24], the sum power constraint is applied to the throughput maximization problem, which is further relaxed to a simple water-filling power allocation algorithm. In this scheme, the linear combination introduced in (12) is confined to have a form given by $\Psi_k = \bar{V}_k \Theta_k^{1/2}$, where $\bar{V}_k \in \mathbb{C}^{m_r \times n_r}$ holds the right singular vectors of the matrix $H_k V_k$ corresponding to its nonzero singular values. Hence, the aggregate precoding matrix of the conventional scheme, $W_{BD}$, is defined as $W_{BD} = [V_1 \bar{V}_1\ \cdots\ V_K \bar{V}_K]\, \Theta^{1/2}$, where $\Theta = \mathrm{bdiag}[\Theta_1, \ldots, \Theta_K]$ is a diagonal matrix whose elements scale the power transmitted into each of the columns of $W_{BD}$. The sum power constraint implies that $\mathrm{tr}\{W_{BD} W_{BD}^H\} = \mathrm{tr}\{\Theta\} \le P$. This relaxes the problem to optimization over the diagonal terms of $\Theta$, which can be interpreted as a power allocation problem and solved through the well-known water-filling algorithm over the diagonal terms of $\Theta$. However, this form of BD cannot be extended as an optimal precoder to the case of per-antenna power constraints, because these constraints require
$$[W_{BD} W_{BD}^H]_{ii} \le p_i, \quad i = 1, \ldots, N_t. \tag{16}$$
Note that the $i$th diagonal term of the left side of (16) is a linear combination of all entries of matrix $\Theta$ and not only the diagonal terms. The selection of $\Theta$ as a diagonal matrix reduces the search domain size of the optimization and hence does not necessarily lead to the optimal solution. Furthermore, $\bar{V}_k$ impacts the diagonal terms of $W_{BD} W_{BD}^H$ (i.e., the transmission covariance matrix), and therefore the insertion of $\bar{V}_k$ does not necessarily reduce the required power allocated to each antenna. In addition, it adds $K$ SVD operations to the precoding computation procedure (one for each served user) to find $\bar{V}_k$. Additionally, the per-antenna power constraints do not allow the optimization to lead to a simple water-filling algorithm. Previous work on BD with per-antenna (similarly with per-base-station) power constraints for the case of multiple receive antennas employs this conventional BD and optimizes the diagonal terms of $\Theta$ [26][27][28]. Hence, it is not optimal.
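For reference, the water-filling power allocation that conventional BD reduces to under a sum power constraint can be sketched as below. This is the textbook procedure, not code from the paper; the function name and the example gains are assumptions, and in the BD setting the gains would be the squared nonzero singular values of the matrices $H_k V_k$ (with unit noise variance).

```python
import numpy as np

def water_filling(gains, P):
    """Maximize sum_i log(1 + g_i * p_i) subject to sum_i p_i <= P and p_i >= 0.

    gains: strictly positive effective channel gains g_i.
    Returns the allocation p_i = max(mu - 1/g_i, 0) with water level mu.
    """
    g = np.asarray(gains, dtype=float)
    order = np.argsort(-g)            # strongest channel first
    inv = 1.0 / g[order]
    for n in range(len(g), 0, -1):    # try the n strongest channels as active
        mu = (P + inv[:n].sum()) / n  # water level that spends exactly P on them
        if mu > inv[n - 1]:           # weakest active channel still gets power
            break
    p = np.zeros_like(g)
    p[order[:n]] = mu - inv[:n]
    return p

p_alloc = water_filling([2.0, 1.0, 0.1], P=1.0)
print(p_alloc)
```

With per-antenna constraints there is one limit per row of the precoder rather than a single budget, so a single water level of this kind is no longer sufficient.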
The optimal form of BD proposed in this paper includes the optimization over the entire null space of other users' channel matrices resulting in optimal precoders under per-antenna power constraints, easily extendable to per-base station power constraints.
The numerical results in Figure 2 compare the maximized sum rate of a MIMO BC system using conventional BD [24] with the optimal scheme proposed later in this paper. There are 12 transmit antennas at the base station and 2 receive antennas at each mobile user. $B = 1$ is considered to specifically show the difference between the two BD schemes. Note that the conventional BD has a domain of $\mathbb{R}_{+}^{N_t}$, while the optimal BD searches over all possible $K$ symmetric matrices and therefore has a larger domain of $\mathbb{C}_{++}^{K n_r (n_r - 1)/2}$. Its size also grows with the number of users per cell. Consequently, the difference between these two schemes increases with the number of users per cell. Details of the simulation setup are given in Section 4. In the following section the optimal BD scheme is introduced and discussed in detail, and the algorithm to find the precoders is presented.

Optimal Multicell BD.
The focus of this section is on the design of optimal multicell BD precoder matrices $W_k$ to maximize the throughput while enforcing per-antenna power constraints. In this scheme, we search over the entire null space of other users' channel matrices ($\tilde{H}_k$), that is, $\Psi_k$ can be any arbitrary matrix in $\mathbb{C}^{m_r \times n_r}$ satisfying the per-antenna power constraints.
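Following the design of precoders according to (12), the received signal for user $k$ can be expressed as $y_k = H_k V_k \Psi_k u_k + z_k$. Denote $\Phi_k = \Psi_k \Psi_k^H \in \mathbb{C}^{m_r \times m_r}$, $k = 1, \ldots, K$, which are positive semidefinite matrices. The rate of the $k$th user is given by $R_k = \log\left|I + H_k V_k \Phi_k V_k^H H_k^H\right|$. Therefore, the sum rate maximization problem can be expressed as
$$\max_{\Phi_1, \ldots, \Phi_K}\ \sum_{k=1}^{K} \log\left|I + H_k V_k \Phi_k V_k^H H_k^H\right| \quad \text{subject to}\quad \Big[\sum_{k=1}^{K} V_k \Phi_k V_k^H\Big]_{ii} \le p_i,\ i = 1, \ldots, N_t,\quad \Phi_k \succeq 0, \tag{19}$$
where the maximization is over all positive semidefinite matrices $\Phi_1, \ldots, \Phi_K$ with a rank constraint of $\mathrm{rank}(\Phi_k) \le n_r$. Notice that the objective function in (19) is concave ([48, p. 466]), and the constraints are also affine functions [23]. Thus, the problem is categorized as a convex optimization problem. We propose a gradient descent algorithm to find the optimal BD precoders. We define $G_k = H_k V_k$ and correspondingly its right pseudoinverse matrix as $G_k^{\dagger} = G_k^H (G_k G_k^H)^{-1}$, and let $Q_k = V_k G_k^{\dagger}$, which is an $N_t \times n_r$ matrix, and we perform the SVD $Q_k^H \Lambda Q_k = U_k \Sigma_k U_k^H$. We introduce the positive semidefinite matrices $\Omega_k$, defined in (20), and obtain the optimal precoders by solving the dual problem (21) along a gradient descent direction.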

EURASIP Journal on Wireless Communications and Networking
The optimal BD precoders follow from the optimal value of $\Lambda$. Proof. The proof is given in the appendix.
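The KKT conditions for the dual problem are given as $\Lambda \succeq 0$, $[\sum_k W_k W_k^H]_{ii} \le p_i$, and $\lambda_i \big(p_i - [\sum_k W_k W_k^H]_{ii}\big) = 0$ for $i = 1, \ldots, N_t$, with the last condition being the complementarity ([23, p. 142]). Thus, the stopping criterion for the gradient descent method can be established using a small tolerance $\epsilon > 0$ in place of the zero values.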
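The sketch below illustrates one way a single evaluation of the dual variables could be implemented for the square case $N_t = K n_r$. It assumes the Lagrangian $\sum_k \log|I + S_k| - \mathrm{tr}\{\Lambda(\sum_k Q_k S_k Q_k^H - P)\} + \sum_k \mathrm{tr}\{\Omega_k S_k\}$ with $S_k = G_k \Phi_k G_k^H$, from which the closed forms used for $S_k$ and $\Omega_k$ are derived here; they are not copied from the paper. All function names, the dimensions, and the fixed step size are hypothetical, and the line search of the actual algorithm is not reproduced.

```python
import numpy as np

def complex_gaussian(rng, shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def null_space_basis(H_tilde, tol=1e-10):
    _, s, Vh = np.linalg.svd(H_tilde, full_matrices=True)
    rank = int(np.sum(s > tol * s[0]))
    return Vh[rank:].conj().T

def primal_from_dual(Q_list, lam):
    """For strictly positive dual variables lam, return S_k, Omega_k, and per-antenna powers."""
    Lam = np.diag(lam)
    S_list, Om_list = [], []
    power = np.zeros(len(lam))
    for Q in Q_list:
        A = Q.conj().T @ Lam @ Q                      # Hermitian, positive definite
        sig, U = np.linalg.eigh(A)                    # A = U diag(sig) U^H
        S = (U * np.maximum(1.0 / sig - 1.0, 0.0)) @ U.conj().T
        Om = (U * np.maximum(sig - 1.0, 0.0)) @ U.conj().T
        S_list.append(S)
        Om_list.append(Om)
        power += np.real(np.diag(Q @ S @ Q.conj().T))
    return S_list, Om_list, power

rng = np.random.default_rng(2)
K, n_r, N_t = 2, 2, 4                                 # square case: N_t = K * n_r
H = [complex_gaussian(rng, (n_r, N_t)) for _ in range(K)]
Q_list = []
for k in range(K):
    V = null_space_basis(np.vstack([H[j] for j in range(K) if j != k]))
    G = H[k] @ V                                      # n_r x n_r, invertible w.p. 1
    Q_list.append(V @ np.linalg.inv(G))

p = np.ones(N_t)                                      # assumed per-antenna limits
lam = np.full(N_t, 0.5)                               # assumed starting point
S_list, Om_list, power = primal_from_dual(Q_list, lam)
grad = p - power                                      # gradient of the dual function
lam_next = np.maximum(lam - 0.05 * grad, 1e-6)        # one projected step, fixed step size
print(power, lam_next)
```

Antennas that exceed their limit have a negative gradient entry, so their dual variable is raised, which in turn lowers the power allocated through them at the next evaluation.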
More interestingly, the sum rate maximization in (19) through the dual problem in (21) facilitates the extension to any linear power constraints on the transmit antennas. The dual problem has $N_t$ variables $\lambda_i$, $i = 1, \ldots, N_t$, one for each transmit antenna power constraint. More general power constraints than those given in (19) can be defined as [31] $\mathrm{tr}\{T_l \sum_k W_k W_k^H\} \le p_l$, $l = 1, \ldots, L$, where $T_l$ are positive semidefinite symmetric matrices and $p_l$ are nonnegative values corresponding to each of the $L$ linear constraints. Special cases of this structure of power constraints have been discussed frequently in the literature: for $L = 1$, $p_1 = P$, and $T_1 = I$, the conventional sum power constraint results [24]; when $L = N_t$ and $T_l$ is a matrix with its $l$th diagonal term equal to one and all other elements zero, we get the per-antenna power constraints studied in this section. Another scenario is the per-base-station power constraint, which is obtained with $L = B$, $p_l = P_l$ (the $l$th per-base-station power limit), and $T_l$ all zero except for ones on the $n_t$ diagonal terms that correspond to the $l$th base station's transmit antennas. When the sum power constraint is applied, only one dual variable is needed in the dual optimization problem (21) (i.e., $\Lambda = \lambda I_{N_t}$), where $\lambda$ determines the water level in the water-filling algorithm [24]. For per-base-station power constraints, the dual optimization variable can be defined as $\Lambda = \Lambda_{bs} \otimes I_{n_t}$, where $\Lambda_{bs} = \mathrm{diag}[\lambda_1, \ldots, \lambda_B]$ consists of $B$ dual variables (one for each base station) and the operator $\otimes$ is the Kronecker product [48]. The details of the optimization steps in the per-base-station power constraints scenario are discussed in Section 3.3, and the study of general linear constraints is left for further work.
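The three special cases of the linear constraints can be written down explicitly. The sketch below builds the matrices $T_l$ for each case and evaluates $\mathrm{tr}\{T_l S_x\}$ for a given transmit covariance; the function names and the assumption that the antennas of base station $b$ occupy a contiguous block of indices are illustrative choices.

```python
import numpy as np

def constraint_matrices(kind, B, n_t):
    """Return the list of T_l matrices for 'sum', 'per_antenna', or 'per_bs' constraints."""
    N_t = B * n_t
    if kind == "sum":
        return [np.eye(N_t)]
    if kind == "per_antenna":
        # T_l has a single one at position (l, l).
        return [np.diag(np.eye(N_t)[l]) for l in range(N_t)]
    if kind == "per_bs":
        # T_b selects the n_t antennas of base station b (assumed contiguous).
        return [np.kron(np.diag(np.eye(B)[b]), np.eye(n_t)) for b in range(B)]
    raise ValueError(f"unknown constraint type: {kind}")

def linear_powers(S_x, T_list):
    """Left-hand sides tr{T_l S_x} of the linear power constraints."""
    return np.array([np.real(np.trace(T @ S_x)) for T in T_list])

B, n_t = 2, 3
S_x = np.diag(np.arange(1.0, 7.0))  # toy covariance with antenna powers 1..6
print(linear_powers(S_x, constraint_matrices("per_bs", B, n_t)))
```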

Per-Base-Station Power Constraints.
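In this section, the extension of the ZF beamforming optimization to the system with per-base-station power constraints is considered. The optimization problem in (19) can be rewritten considering the per-base-station power constraints as the maximization of $\sum_{k=1}^{K} \log\left|I + H_k V_k \Phi_k V_k^H H_k^H\right|$ subject to $\mathrm{tr}\{\Delta_b \sum_{k=1}^{K} V_k \Phi_k V_k^H\} \le P_b$, $b = 1, \ldots, B$, and $\Phi_k \succeq 0$, where $P_1, \ldots, P_B$ are the per-base-station maximum powers and $\Delta_b$ is a diagonal matrix with its entries equal to one for the corresponding antennas within base station $b$ and the rest equal to zero. For simplicity, only the $b$th block of $n_t$ diagonal entries of $\Delta_b$ is equal to one. Following similar steps as in Appendix A.1, the Lagrange dual function is obtained as in the per-antenna case, with $\Lambda$ replaced by $\Lambda_{bs} \otimes I_{n_t}$ and the power term by $\mathrm{tr}\{\Lambda_{bs} P_{bs}\}$, where $P_{bs} = \mathrm{diag}[P_1, \ldots, P_B]$ and $\otimes$ is the Kronecker product [48]. The KKT conditions take the same form as before with this substitution, where $\Lambda_{bs} = \mathrm{diag}[\lambda_1, \ldots, \lambda_B]$ and $\Omega_k$ can be defined in a similar way as (20). The dual problem can be expressed similarly to (21). Following the steps in Section 3.2, the gradient descent search direction is given by $\Delta\Lambda_{bs} = \mathrm{diag}_{b=1,\ldots,B}\big[\mathrm{tr}_b\{\sum_k W_k W_k^H\}\big] - P_{bs}$, where $\mathrm{tr}_b$ is a partial matrix trace over the $b$th block of $n_t$ diagonal entries of a matrix, and $\mathrm{diag}_{b=1,\ldots,B}[\cdot]$ gives a diagonal matrix with $B$ elements computed for each $b = 1, \ldots, B$.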
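The Kronecker structure of the per-base-station dual variable and the partial trace $\mathrm{tr}_b$ are straightforward to express numerically. The following sketch assumes that the $n_t$ antennas of each base station occupy a contiguous block of indices; the function names and the toy numbers are hypothetical, and the descent direction is written as the excess of each base station's transmit power over its limit.

```python
import numpy as np

def per_bs_dual_matrix(lam_bs, n_t):
    """Lambda = Lambda_bs kron I_{n_t}: each base station's dual variable repeated n_t times."""
    return np.kron(np.diag(lam_bs), np.eye(n_t))

def partial_traces(S_x, B, n_t):
    """tr_b{S_x} for b = 1..B: sum of the diagonal entries belonging to base station b."""
    return np.real(np.diag(S_x)).reshape(B, n_t).sum(axis=1)

def per_bs_descent_direction(S_x, P_bs, n_t):
    """Diagonal matrix holding, for each base station, transmit power minus its limit."""
    B = len(P_bs)
    return np.diag(partial_traces(S_x, B, n_t) - np.asarray(P_bs, dtype=float))

B, n_t = 2, 3
lam_bs = np.array([0.5, 2.0])
S_x = np.diag(np.arange(1.0, 7.0))       # toy covariance with antenna powers 1..6
Lam = per_bs_dual_matrix(lam_bs, n_t)
direction = per_bs_descent_direction(S_x, [5.0, 20.0], n_t)
print(np.diag(Lam), np.diag(direction))
```

Only $B$ dual variables are updated per iteration in this setting, instead of $N_t$ in the per-antenna case.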

Single-Antenna Receivers.
Although this paper studies a network MIMO system with multiple-receive-antenna users, the results can be applied to a system with single-receive-antenna users. In this case each user's transmission must be orthogonal to a vector (rather than a matrix), which is the basis vector for other users' transmissions. The optimization is over all real vectors with positive elements ($\mathbb{R}_{+}^{N_t}$) satisfying the power constraints. This approach facilitates the optimization presented in [30,31] using generalized inverses and multistep optimizations.

Simulation Setup
The propagation model between each base station's transmit antenna and mobile user's receive antenna includes three factors: a path loss component proportional to $d_{kb}^{-\beta}$ (where $d_{kb}$ denotes the distance from base station $b$ to mobile user $k$ and $\beta = 3.8$ is the path loss exponent) and two random components representing lognormal shadow fading and Rayleigh fading. The channel gain between transmit antenna $t$ of base station $b$ and receive antenna $r$ of the $k$th user is given by $[H_{k,b}]_{(r,t)} = \alpha_{k,b}^{(r,t)} \sqrt{\Gamma\, \rho_{k,b}\, (d_{kb}/d_0)^{-\beta}}$, where $[H_{k,b}]_{(r,t)}$ is the $(r, t)$th element of the channel matrix $H_{k,b} \in \mathbb{C}^{n_r \times n_t}$ from base station $b$ to mobile user $k$, $\alpha_{k,b}^{(r,t)} \sim \mathcal{CN}(0, 1)$ represents independent Rayleigh fading, $d_0 = 1$ km is the cell radius, and $\rho_{k,b} = 10^{\rho_{k,b}^{(\mathrm{dBm})}/10}$ is the lognormal shadow fading variable between the $b$th base station and the $k$th user, where $\rho_{k,b}^{(\mathrm{dBm})} \sim \mathcal{N}(0, \sigma_\rho^2)$ is a real Gaussian variable and $\sigma_\rho = 8$ dB is its standard deviation. A reference SNR, $\Gamma = 20$ dB, is a typical value of the interference-free SNR at the cell boundary (as in [7,38]).
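A channel generator along these lines might look as follows. It assumes that the three factors combine multiplicatively as $\alpha \sqrt{\Gamma \rho\, (d_{kb}/d_0)^{-\beta}}$ with unit noise variance, and that one shadowing value is shared by all antenna pairs of a given user and base station; the function name and its signature are hypothetical.

```python
import numpy as np

def channel_matrix(d_kb, n_r, n_t, rng, beta=3.8, d0=1.0,
                   sigma_rho_db=8.0, gamma_db=20.0):
    """Draw an n_r x n_t channel matrix for a user at distance d_kb (km) from a base station."""
    shadow_db = rng.normal(0.0, sigma_rho_db)          # real Gaussian shadowing in dB
    rho = 10.0 ** (shadow_db / 10.0)                   # lognormal shadow fading factor
    gamma = 10.0 ** (gamma_db / 10.0)                  # reference SNR at the cell boundary
    alpha = (rng.standard_normal((n_r, n_t))
             + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)  # Rayleigh fading, CN(0, 1)
    return alpha * np.sqrt(gamma * rho * (d_kb / d0) ** (-beta))

H_edge = channel_matrix(1.0, 2, 4, np.random.default_rng(3))   # user on the cell boundary
H_far = channel_matrix(2.0, 2, 4, np.random.default_rng(3))    # same draws, twice the distance
print(np.abs(H_far[0, 0]) / np.abs(H_edge[0, 0]))
```

Reusing the same seed isolates the path loss: doubling the distance scales every amplitude by $2^{-\beta/2}$.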
Our cellular network setup involves clustering. Since global coordination is not feasible, clustering with cluster sizes of up to B = 7 is considered. The cellular network layout is shown in Figure 3. A base station is located at the center of each hexagonal cell. Each base station is equipped with n t transmit antennas. There are n r receive antennas on each user's receiver, and there are K users per cell per subband. All N t = Bn t base stations' transmit antennas in each cluster are coordinated. In Figure 3 the clusters of sizes 3 and 7 are shown. For cluster size 7, one wrap-around layer of clusters is considered to contribute inter-cluster interference, while for B = 3 two tiers of interfering cells are accounted for. User locations are generated randomly, uniformly, and independently in each cell. For each drop of users, the distance of users from base stations in the network is computed, and path loss, lognormal, and Rayleigh fading are included in the channel gain calculations. User scheduling is performed employing a greedy algorithm with maximum sum rate (MSR) and proportionally fair (PF) criteria with the updated weights for the rate of each user as in [45][46][47]. To compare the results all the sum rates achieved through network MIMO coordination are normalized by the size of clusters B. Base stations causing inter-cluster interference are assumed to transmit at full power, which is the worst case as discussed in Section 2.

Numerical Results
In this section, the performance results (obtained via Monte Carlo simulations) of the proposed optimal BD scheme in a network MIMO coordinated system are discussed. The network MIMO coordination exhibits several system advantages, which are exposed in the following.

Network MIMO Gains.
While universal network MIMO coordination is practically impossible, clustering is a practical scheme that retains much of the network MIMO coordination gain and reduces the amount of feedback required at the base stations [26,38]. The size of clusters, $B$, is a parameter in network MIMO coordination. $B = 1$ means no coordination with the optimal BD scheme applied. Figure 4 shows that the throughput of the system increases with increasing cluster size. System throughput is computed using MSR scheduling and averaged over several channel realizations for a large number of user locations generated randomly. The normalized throughput for different cluster sizes is compared, which means that the total throughput in each cluster is divided by the number of cells in each cluster, $B$. The normalized sum rate has lower variance in larger clusters, which shows that the performance of the system is less dependent on the position of users and that network MIMO coordination brings more stability to the system.

Multiple-Antenna Gains.
The intercell interference mitigation through coordination of base stations enables the cellular network to enjoy the great spectral efficiency improvement associated with employing multiple antennas. Figure 5 shows the linear growth of the maximum throughput achievable through the proposed optimal multicell BD and the capacity limits of DPC [22]. The number of receive antennas at each mobile user is fixed to n r = 2, and the number of transmit antennas n t at each base station is increasing. When the cluster size grows, the slope of spectral efficiency also increases. The maximum power on each transmit antenna is normalized such that the total power at each base station for different n t is constant.

Multiuser Diversity.
Multicell coordination benefits from increased multiuser diversity, since the number of users scheduled at each time interval is B times of that without coordination. In Figure 6, the multiuser diversity gain of network MIMO is shown with up to 10 users per cell. The MSR scheduling is applied for each drop of users and averaged over several channel realizations.

Fairness Advantages.
One of the main purposes of network MIMO coordination is that the cell-edge users gain from neighboring base stations' signals. In Figure 7, the cumulative distribution functions (CDFs) of the mean rates for the users are shown and compared for $B = 1$ (i.e., beamforming without coordination) and $B = 3, 7$ using the proposed optimal BD scheme. There are 10 users per cell randomly and uniformly dropped in the network for each simulation. For each drop of users, the proportionally fair scheduling algorithm is applied over hundreds of scheduling time intervals using a sliding window width of $\tau = 10$ time slots (see [17]). Each user's rates achieved in all time intervals are averaged to find the mean rate per user, and the corresponding CDF for several user locations is plotted. As shown by the plots, for $B = 3$ and $B = 7$ network MIMO coordination nearly 70% and 80% of users have a mean rate larger than 1 bps/Hz, respectively, while for the scheme without coordination only 45% of the users do. However, fairness among users does not seem to improve when cluster sizes increase. This is perhaps due to the larger number of cell-edge users when the cluster size increases.
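The sliding-window averaging behind proportionally fair scheduling is commonly implemented as an exponentially weighted update of each user's average rate, with scheduling weights equal to the inverse of that average. The sketch below shows this common form as an assumption about the weight update used here; the function names and the small floor on the average rate are illustrative.

```python
import numpy as np

def update_pf_average(avg_rate, inst_rate, tau=10.0):
    """One time slot of the update: avg <- (1 - 1/tau) * avg + (1/tau) * inst.

    inst_rate is zero for users that were not scheduled in this slot.
    """
    return (1.0 - 1.0 / tau) * np.asarray(avg_rate) + np.asarray(inst_rate) / tau

def pf_weights(avg_rate, floor=1e-6):
    """Scheduling weights: users with low average rate get high priority."""
    return 1.0 / np.maximum(np.asarray(avg_rate, dtype=float), floor)

avg = np.array([1.0, 2.0])
avg_next = update_pf_average(avg, inst_rate=[0.0, 4.0], tau=10.0)
print(avg_next, pf_weights(avg_next))
```

A greedy weighted user selection would then maximize the weighted sum rate with these weights in each slot.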

Convergence.
Convergence of the gradient descent method proposed in Section 3.2 is illustrated in Figure 8. The sum rates obtained after each iteration, normalized by the optimal target values, are plotted against the number of iterations. The convergence behavior is shown for 20 independent, randomly generated sets of user locations: for each set, the channel realizations are processed by the proposed iterative algorithm, and the sum rate after each iteration divided by the target value is monitored. For nearly all system realizations, the optimization converges to within 1% of the target value in the first 10 iterations.

Conclusions
In this paper, a multicell coordinated downlink MIMO transmission has been considered under per-antenna power constraints. The suboptimality of the conventional BD considered in earlier research has been shown, and this has motivated the search for the optimal BD scheme. The optimal block diagonalization (BD) scheme for a network MIMO coordinated system under per-antenna power constraints has been proposed, and it has been shown that it can be generalized to the case of per-base-station power constraints. A simple iterative gradient descent algorithm has also been proposed, which determines the optimal precoders for multicell BD. The comprehensive simulation results have demonstrated the advantages achieved by using multicell coordinated transmission under the more practical per-antenna power constraints.

A. Proof of Theorem 1
We consider the optimization problem (19). For the ease of further analysis, let us substitute $G_k = H_k V_k$. Note that the rank constraint on $\Phi_k$ must be inserted into the optimization when $m_r > n_r$, and hence it makes the problem nonconvex. Thus, to analyze this problem two cases are considered based on the value of $m_r$ with respect to $n_r$. In the first case $m_r = n_r$, which occurs when the total number of transmit antennas at all base stations, $N_t$, is equal to the total number of receive antennas at all $K$ served users, $N_r$. In the second case $N_t > N_r$.
A.1. ($N_t = N_r$).
This happens when exactly $K = N_t/n_r$ users are scheduled. In this case, the rank constraint on $\Phi_k$ can be dropped because $m_r = n_r$, and therefore the optimization problem in (19) is convex. The matrices $G_k$ are also square and invertible. Therefore, we can define $Q_k = V_k G_k^{-1}$, which is an $N_t \times n_r$ matrix. Thus, the throughput maximization problem can be expressed as (A.1), since $V_k \Phi_k V_k^H = Q_k S_k Q_k^H$ with $S_k = G_k \Phi_k G_k^H$, where $S_k \in \mathbb{C}^{n_r \times n_r}$.

Although one possibility is to solve this convex optimization with $K n_r (n_r - 1)/2$ variables by introducing logarithmic barrier functions for the inequality power constraints and the set of positive semidefinite constraints, we approach the problem by establishing the dual problem and solving it through a simple and efficient gradient descent method [23]. To this end, the Lagrangian function is formed with the dual variable $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_{N_t})$, a diagonal matrix with nonnegative elements $\lambda_i \geq 0$ associated with the per-antenna power constraints, and with the positive semidefinite dual variables $\Omega_k$, which ensure the positive semidefiniteness of $S_k$. The optimal values of the primal and dual variables must satisfy the Karush-Kuhn-Tucker (KKT) conditions [23]. Since the constraint functions are affine, strong duality holds, and thus the dual objective reaches its minimum at the optimal value of the primal problem [23]. As a result, the Lagrange dual problem can be stated as

$$\text{minimize}\ g(\Lambda) \quad \text{subject to}\ \Lambda \succeq 0,\ \Lambda\ \text{diagonal}, \tag{A.7}$$

where $g(\Lambda)$ denotes the Lagrange dual function. The gradient of $g$ can be obtained from (A.6) and is given in (A.8). This gives a descent search direction, $\Delta\Lambda = -\nabla_\Lambda g$, for the gradient algorithm for the Lagrange dual problem [23].
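The dual approach described above can be sketched numerically. The following is only an illustrative sketch under stated assumptions, not the algorithm of Section 3.2: it assumes that the noise is normalized so that the objective is $\sum_k \log\det(I + S_k)$, that $G_k = H_k V_k$, and that the step size is chosen by simple backtracking. Under the assumed Lagrangian, the inner maximization has a water-filling form on the eigenvalues of $Q_k^H \Lambda Q_k$, and the gradient of the dual function is the vector of per-antenna power slacks. All function names, the random channels, and the unit power budgets are hypothetical.

```python
import numpy as np

def bd_matrices(H_list):
    """Q_k = V_k G_k^{-1}, with V_k a null-space basis of the other users' channels."""
    Q = []
    for k, Hk in enumerate(H_list):
        H_others = np.vstack([H for j, H in enumerate(H_list) if j != k])
        # Rows of Vh beyond the rank of H_others span its null space.
        _, _, Vh = np.linalg.svd(H_others)
        Vk = Vh[H_others.shape[0]:].conj().T       # N_t x m_r
        Gk = Hk @ Vk                               # assumed G_k = H_k V_k (square here)
        Q.append(Vk @ np.linalg.inv(Gk))
    return Q

def dual_function(lam, Q, P):
    """Return g(lam) and the maximizing S_k for the assumed Lagrangian."""
    g, S = float(lam @ P), []
    for Qk in Q:
        A = Qk.conj().T @ (lam[:, None] * Qk)      # Q_k^H Lambda Q_k
        d, U = np.linalg.eigh(A)
        s = np.maximum(1.0 / d - 1.0, 0.0)         # water-filling on the eigenvalues
        S.append((U * s) @ U.conj().T)
        g += np.sum(np.log1p(s) - d * s)           # log det(I + S_k) - tr(A S_k)
    return g, S

def dual_gradient_descent(Q, P, iters=300, floor=1e-9):
    """Projected gradient descent on the dual variables with backtracking."""
    lam = np.ones(len(P))
    g, S = dual_function(lam, Q, P)
    history = [g]
    for _ in range(iters):
        # Per-antenna transmit power: diagonal of sum_k Q_k S_k Q_k^H.
        used = sum(np.einsum('ij,jk,ik->i', Qk, Sk, Qk.conj()).real
                   for Qk, Sk in zip(Q, S))
        grad = P - used
        t = 1.0
        while t > 1e-12:
            cand = np.maximum(lam - t * grad, floor)   # keep lambda_i positive
            g_new, S_new = dual_function(cand, Q, P)
            if g_new < g:                              # accept only descent steps
                lam, g, S = cand, g_new, S_new
                break
            t *= 0.5
        else:
            break                                      # no descent step found
        history.append(g)
    return lam, S, history

# Hypothetical example with N_t = N_r = 4 (K = 2 users, n_r = 2 antennas each).
rng = np.random.default_rng(0)
K, n_r, N_t = 2, 2, 4
H_list = [(rng.standard_normal((n_r, N_t)) + 1j * rng.standard_normal((n_r, N_t)))
          / np.sqrt(2) for _ in range(K)]
P = np.ones(N_t)
Q = bd_matrices(H_list)
lam, S, history = dual_gradient_descent(Q, P)
print("dual value:", history[0], "->", history[-1])
```

Backtracking is used here only to keep the sketch free of a hand-tuned step size; the stopping criterion (25) is not reproduced.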

A.2. ($N_t > N_r$).
When the total number of transmit antennas is strictly larger than the total number of receive antennas in the network (i.e., $N_t > N_r$), the optimization problem in (A.1) is no longer convex due to the rank constraints. We relax the problem and show that the relaxation leads to an optimal solution that also satisfies the rank constraints of the original problem. A gradient algorithm similar to the one for $N_t = N_r$ can then be used to find the optimal BD precoders.
Recall that $m_r = N_t - (K - 1)n_r$. Since $N_r = K n_r$, we have $m_r - n_r = N_t - N_r$, and thus $m_r > n_r$ when the total number of transmit antennas is strictly larger than the total number of receive antennas, $N_t > N_r$. From Section 3, note that $V_k$ is an $N_t \times m_r$ matrix and, correspondingly, $\Psi_k$ is of size $m_r \times n_r$, which imposes a rank constraint on $\Phi_k = \Psi_k \Psi_k^H$, namely $\operatorname{rank}(\Phi_k) \leq n_r$. Adding the rank constraints to the optimization in (19) yields problem (A.9), which is not convex because of the rank constraint. Consider the convex relaxation (A.10) obtained by removing the rank constraint. Since this problem is convex and its constraints are affine, any solution satisfying the KKT conditions is optimal [23]. Let us introduce the optimization problem (A.11) over the matrices $S_k \succeq 0$, $k = 1, \ldots, K$, and let $S_k$ denote its optimal solutions. Defining $\Phi_k = G_k^{\dagger} S_k (G_k^{\dagger})^H$ satisfies all the KKT conditions for (A.10), since $G_k \Phi_k G_k^H = S_k$. Furthermore, $\operatorname{rank}(\Phi_k) = \operatorname{rank}(S_k) \leq n_r$, which also satisfies the rank constraint in the original optimization problem (A.9). Note also that $\Phi_k \succeq 0 \Leftrightarrow S_k \succeq 0$ (see [48, p. 399]).
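The pseudo-inverse construction above can be checked numerically. The sketch below assumes a randomly generated full-row-rank $G_k$ of size $n_r \times m_r$ and a rank-one positive semidefinite $S_k$; the sizes and variable names are hypothetical and serve only to illustrate that $G_k \Phi_k G_k^H = S_k$, that the rank is preserved, and that $\Phi_k$ remains positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(1)
n_r, m_r = 2, 5                       # hypothetical sizes with m_r > n_r

# Hypothetical full-row-rank G_k (n_r x m_r) and rank-one PSD S_k (n_r x n_r).
G = rng.standard_normal((n_r, m_r)) + 1j * rng.standard_normal((n_r, m_r))
b = rng.standard_normal((n_r, 1)) + 1j * rng.standard_normal((n_r, 1))
S = b @ b.conj().T

G_pinv = np.linalg.pinv(G)            # m_r x n_r; G @ G_pinv = I for full row rank
Phi = G_pinv @ S @ G_pinv.conj().T    # m_r x m_r lifted matrix

# G Phi G^H recovers S, and the rank of Phi equals the rank of S.
recovers_S = np.allclose(G @ Phi @ G.conj().T, S)
rank_Phi = np.linalg.matrix_rank(Phi, tol=1e-9)
rank_S = np.linalg.matrix_rank(S, tol=1e-9)
min_eig_Phi = np.linalg.eigvalsh(Phi).min()
print(recovers_S, rank_Phi, rank_S, min_eig_Phi >= -1e-10)
```

An explicit tolerance is passed to `matrix_rank` because the rank-one matrices are formed in floating point and their trailing singular values are only approximately zero.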
The optimization in (A.11) is equivalent to the convex optimization problem in (A.1) with $Q_k = V_k G_k^{\dagger}$. Recall that when $m_r = n_r$ the matrix $G_k$ is square and invertible, so that $Q_k = V_k G_k^{-1}$, as defined in Section A.1. Thus, this problem can be solved through the gradient descent method applied to the dual problem (A.7) with the search direction (A.8). The stopping criterion is also the same as (25), except that $Q_k$ has a different definition.
Note that (24) can be simply concluded from the first equation of the KKT conditions (A.3) and the definition of for the optimal value of dual variables Λ .