Constrained power allocation schemes for coordinated base station transmission using block diagonalization

In this study, we propose several power allocation schemes in a coordinated base station downlink transmission with per antenna and per base station power constraints. Block Diagonalization is employed to remove interference among users. For each set of power constraints, two schemes based on the waterfilling distribution are proposed and compared to the optimal solution, which can only be obtained numerically by using convex optimization. We show that the proposed schemes achieve a performance, in terms of weighted sum rate, very close to the optimal, without the heavy computational complexity required by the latter. The sum rates are compared first in a simplified two-user two-cell case where we also compare our approach to the previous solutions available in the literature. Then, we examine the performance in a multi-cell scenario where we also evaluate the degradation of the performance caused by imperfect channel state information.


Introduction
Space-division multiplexing (SDM) based on multiple-input multiple-output (MIMO) techniques emerged as a means of achieving high-capacity communications [1]. However, the introduction of MIMO processing in cellular networks does not offer the expected benefits, mainly because of the interference that characterizes these environments. SDM requires high signal-to-interference-plus-noise ratios (SINR) to leverage its capacity-achieving potential. Unfortunately, the interference in cellular systems lowers the operating point toward low SINR, which makes MIMO processing less advantageous. Recently, some work has been devoted to managing interference in cellular systems with reuse one, where all cells are allowed to use the same frequencies, also known as universal frequency reuse [2]. In [3], the other-cell interference (OCI) is considered when designing the transmission for a multiuser MIMO downlink. In [4], the authors analyze several approaches for overcoming interference in SDM MIMO cellular networks. If the interference is known by the transmitters, then cooperative encoding among base stations using dirty paper coding (DPC) can suppress OCI [5]. In [6], several strategies are proposed to perform coordinated base station transmission (CBST). Interference is eliminated by jointly and coherently coordinating the transmission from the base stations in the network, assuming that base stations know all downlink signals. Besides DPC, they propose a zero-forcing (ZF) scheme that, although suboptimal, does not involve the complexity of DPC. The capacity of MIMO benefits from CBST not only because of the rise of the operating SINR point, but also because of the better rank condition of the joint channel matrix resulting from non-collocated base stations [7].
Similar to multi-user MIMO, block diagonalization (BD) [8,9] may be applied for CBST as a good compromise between complexity and performance. In [10], BD is applied in a multicell scenario in combination with the OCI reduction scheme of [3]. Alternatively, in [11], a singular value decomposition (SVD) approach is proposed that simplifies the channel estimation requirements at the expense of a performance degradation.
In this article, we focus on BD-based CBST with different power constraints at the transmission side, with the aim of maximizing the weighted sum rate (WSR) of the users in a cellular network. A first reasonable assumption for power constraints is that each base station (BS) has a restricted total transmission power; this was used, for example, in [6,10,12]. Alternatively, per antenna constraints may be more realistic, since each transmission antenna is usually driven by its own high-power amplifier [13].
In this article, we consider both per base station and per antenna restrictions. For each of them, we will formulate the optimization problem and derive two power allocation schemes that resemble the well-known waterfilling (WF) distribution. While WF is known to achieve capacity in single-user frequency-selective transmission [14], modified versions of WF also give the capacity-achieving power allocation in multi-user communications [15,16]. In [10], a scaled WF (SWF) scheme is heuristically proposed for the case of per base station power constraints to avoid a lengthy numerical optimization. However, its performance is neither discussed nor compared to optimal approaches. In [7], a BD scheme denoted as JT-decomp is proposed where the powers are assigned to the users' transmissions with the only aim of ensuring that per base station power constraints are fulfilled. No optimization is performed on the transmit powers to maximize the achievable rates. Consequently, the obtained rates are lower. Also, some partial results of the study shown here, again only for the case of per base station power constraints, have been presented in [17]. We will show that the schemes that we are proposing, although suboptimal, perform very close to the optimum power allocation (obtained by numerical convex optimization) with reduced complexity.
In brief, the innovative contributions of this article are the following. We develop closed-form and implementable solutions for the power allocation in a BD-based CBST system with realistic power constraints at the transmission side. These solutions are not empirical: they are obtained, starting from the optimal allocation, using only a few approximations that allow us to understand why they perform close to the optimum. In the case of per base station power constraints, one of our proposals gives the same result as SWF [10], while the others are new. We also show that our approaches dramatically reduce the complexity with respect to the optimal search. Moreover, we consider the effect of errors in the channel estimation and of a time-varying channel, in which the use of outdated channel state information due to the feed-back delay reduces the achievable rates.
The remainder of this article is structured as follows. In the next section, the system model is presented; in the "Constrained optimization and optimal power allocation" section, the optimization problem is described; and in the "Waterfilling distributions for suboptimal power allocation schemes" section, the proposed power allocation schemes with per base station and per antenna constraints are developed. The "Numerical results" section discusses some performance results and the "Complexity" section explores the complexity of the proposed solutions. The article closes with some concluding remarks.
Notations: In this article, the following notations will be used. Boldface symbols will be used for matrices and vectors, while italic letters will be used for scalars. Superscripts $T$ and $H$ denote the transpose and the Hermitian transpose of a matrix, respectively; superscript $*$ refers to an optimal solution; $[\cdot]^+$ denotes the maximum between zero and the argument; and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.

System model
The system model assumes a coordinated transmission downlink scenario, where M base stations serve N users. Each base station has t transmit antennas, and each user has r receive antennas. Although our analysis is general, the performance will be illustrated for BS-user pairs; therefore the case M = N will be considered in the "Numerical results" section.
Assuming narrowband transmission (if the channel is frequency selective, it can be decomposed into a number of parallel non-interfering subchannels, each experiencing approximately frequency-flat fading), the channel may be modeled by a Nr × Mt channel matrix H where each matrix coefficient represents the gain from each transmit antenna in the BS to each receive antenna at the user side.
The received signal model is as follows:
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$$
where $\mathbf{y}$ is the received Nr × 1 signal vector, $\mathbf{x}$ is the Mt × 1 signal vector transmitted from all the BSs, and $\mathbf{n}$ is an Nr × 1 vector of i.i.d. complex Gaussian entries with zero mean and unit variance. If we define $\mathbf{H}_i$ with i = 1, ..., N as the r × Mt channel matrix seen by user i, then
$$\mathbf{H} = [\mathbf{H}_1^T, \mathbf{H}_2^T, \ldots, \mathbf{H}_N^T]^T$$
Figure 1 illustrates the reference scenario for the particular case of M = N = 3, t = 3 and r = 1.
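For the CBST scenario, we define $\mathbf{x}$ as follows:
$$\mathbf{x} = \mathbf{W}\mathbf{b}$$
where $\mathbf{b} = [b_{11}, \ldots, b_{1r}, \ldots, b_{Nr}]^T$, $b_{ij}$ represents the jth symbol for user i transmitted with power $P_{ij}$, the precoding matrix is defined as $\mathbf{W} = [\mathbf{w}_{11}, \ldots, \mathbf{w}_{1r}, \ldots, \mathbf{w}_{Nr}]$, and $\mathbf{w}_{ij} = [w_{ij}^{11}, \ldots, w_{ij}^{1t}, \ldots, w_{ij}^{Mt}]^T$ is the precoding vector of that symbol, with $w_{ij}^{kl}$ the weight applied at antenna l of BS k. The precoding sub-matrices $\mathbf{W}_i = [\mathbf{w}_{i1}, \ldots, \mathbf{w}_{ir}]$ will be obtained through BD as in [6,8,10], to guarantee that there is no inter-user interference, that is
$$\mathbf{H}_k \mathbf{W}_i = \mathbf{0}, \quad \forall k \neq i$$
where $\mathbf{0}$ is an all-zero matrix of dimensions r × r. If we define
$$\tilde{\mathbf{H}}_i = [\mathbf{H}_1^T, \ldots, \mathbf{H}_{i-1}^T, \mathbf{H}_{i+1}^T, \ldots, \mathbf{H}_N^T]^T$$
then $\mathbf{W}_i$ must lie in the null space of $\tilde{\mathbf{H}}_i$. Let $l_i$ be the rank of $\tilde{\mathbf{H}}_i$ and consider its SVD:
$$\tilde{\mathbf{H}}_i = \tilde{\mathbf{U}}_i \tilde{\boldsymbol{\Sigma}}_i \left[\tilde{\mathbf{V}}_i^{(1)} \; \tilde{\mathbf{V}}_i^{(0)}\right]^H$$
where $\tilde{\mathbf{V}}_i^{(0)}$ holds the last $Mt - l_i$ right singular vectors. We consider another SVD:
$$\mathbf{H}_i \tilde{\mathbf{V}}_i^{(0)} = \mathbf{U}_i \left[\boldsymbol{\Sigma}_i \; \mathbf{0}\right] \left[\mathbf{V}_i^{(1)} \; \mathbf{V}_i^{(0)}\right]^H$$
where $\mathbf{W}_i = \tilde{\mathbf{V}}_i^{(0)} \mathbf{V}_i^{(1)}$ represents the transmission vectors that maximize the information rate for user i subject to the condition of canceling interference. Therefore, $\mathbf{U}_i$ is an r × r unitary matrix and $\boldsymbol{\Sigma}_i = \mathrm{diag}(\lambda_{i1}^{1/2}, \ldots, \lambda_{ir}^{1/2})$. Then, the received signal can be expressed as
$$\mathbf{y} = \mathrm{diag}(\mathbf{U}_1 \boldsymbol{\Sigma}_1, \ldots, \mathbf{U}_N \boldsymbol{\Sigma}_N)\, \mathbf{b} + \mathbf{n}$$
Each user independently rotates the received signal and decouples the different streams:
$$\tilde{\mathbf{y}} = \mathrm{diag}(\mathbf{U}_1^H, \ldots, \mathbf{U}_N^H)\, \mathbf{y} = \mathrm{diag}(\boldsymbol{\Sigma}_1, \ldots, \boldsymbol{\Sigma}_N)\, \mathbf{b} + \tilde{\mathbf{n}} \qquad (9)$$
where the noise $\tilde{\mathbf{n}}$ remains white with the same covariance because of the unitary transformation. BD is possible in this scenario if the condition Mt ≥ Nr is satisfied [6,8,10].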
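The two-SVD construction described above can be sketched numerically as follows. This is a minimal illustration, assuming NumPy is available; the function name `bd_precoders`, the rank tolerance `tol`, and the list-of-matrices interface are hypothetical choices made for this example and do not come from the article.

```python
import numpy as np

def bd_precoders(H_list, tol=1e-10):
    """Block-diagonalization precoders for a list of per-user channels.

    H_list[i] is the r x Mt channel matrix of user i (at least two users).
    Returns (W_list, lam_list): W_list[i] is the Mt x r precoder of user i,
    and lam_list[i] holds its r stream gains (squared singular values of
    the interference-free effective channel).
    """
    W_list, lam_list = [], []
    for i, H_i in enumerate(H_list):
        r = H_i.shape[0]
        # Stack the channels of all the other users.
        H_tilde = np.vstack([H for k, H in enumerate(H_list) if k != i])
        # Right singular vectors beyond the rank span the null space.
        _, s, Vh = np.linalg.svd(H_tilde, full_matrices=True)
        rank = int(np.sum(s > tol * s[0]))
        V0 = Vh[rank:].conj().T                  # Mt x (Mt - rank)
        # Second SVD on the effective channel seen through the null space.
        _, s_i, Vh_i = np.linalg.svd(H_i @ V0, full_matrices=False)
        V1 = Vh_i[:r].conj().T                   # (Mt - rank) x r
        W_list.append(V0 @ V1)
        lam_list.append(s_i[:r] ** 2)
    return W_list, lam_list
```

Each returned precoder has orthonormal columns, and the product of any other user's channel with it is numerically zero, which is the zero inter-user interference condition stated in the text.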

Constrained optimization and optimal power allocation
Under the BD-CBST strategy, it can be observed from (9) that the overall system is turned into a set of parallel non-interfering channels. Therefore, the achievable rate of user i is $R_i = \sum_{j=1}^{r} \log_2(1 + \lambda_{ij} P_{ij})$. We would like to maximize a weighted sum of the rates $R_i$ for the set of users, which requires solving the following optimization problem in terms of the power $P_{ij}$ allocated to the jth stream of user i:
$$\max_{\{P_{ij}\}} \; \sum_{i=1}^{N} \alpha_i \sum_{j=1}^{r} \log_2(1 + \lambda_{ij} P_{ij}) \quad \text{subject to} \quad P_{ij} \geq 0 \qquad (10)$$
where the weights $\alpha_i \in [0, 1]$, with $\sum_{i=1}^{N} \alpha_i = 1$, can be seen as indicating the priorities of the users: the closer $\alpha_i$ is to 1, the higher the priority given to user i. In the particular case of $\alpha_i = 1/N$, for all i, the solution of the above problem maximizes the sum rate.
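As a small illustration of the objective, the weighted sum rate can be evaluated directly from the priorities, the stream gains, and the powers. The function name and the nested-list layout (one inner list per user) are assumptions made for this sketch.

```python
import math

def weighted_sum_rate(alpha, lam, P):
    """Weighted sum of per-user rates in bit/s/Hz.

    alpha[i]  : priority of user i
    lam[i][j] : gain of stream j of user i
    P[i][j]   : power allocated to stream j of user i
    """
    return sum(
        a * sum(math.log2(1.0 + l * p) for l, p in zip(lam_i, P_i))
        for a, lam_i, P_i in zip(alpha, lam, P)
    )

# Two single-stream users with equal priority:
# 0.5 * log2(1 + 1) + 0.5 * log2(1 + 3) = 0.5 + 1.0
print(weighted_sum_rate([0.5, 0.5], [[1.0], [3.0]], [[1.0], [1.0]]))  # → 1.5
```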
In this context, two different constraints on the power available at the transmitter side may be considered. The first one deals with per base station restrictions, where each base station k has a maximum available power $P_{\mathrm{maxBS}}$ to transmit (see endnote a). Then, the power allocation $P_{ij}$ should fulfill the following constraints:
$$\sum_{i=1}^{N} \sum_{j=1}^{r} P_{ij} \sum_{l=1}^{t} |w_{ij}^{kl}|^2 \leq P_{\mathrm{maxBS}} \qquad (11)$$
for each BS k = 1, ..., M, where $w_{ij}^{kl}$ denotes the entry of the precoding vector $\mathbf{w}_{ij}$ applied at antenna l of BS k.
The second set of constraints that may be considered on the power available at the transmitters is given by a restriction of the maximum power $P_{\mathrm{maxAn}}$ transmitted by each antenna of each BS (per antenna constraints; see endnote b). The restriction of the maximum power transmitted by each antenna l = 1, ..., t of each BS k = 1, ..., M conditions the power allocation $P_{ij}$ as follows:
$$\sum_{i=1}^{N} \sum_{j=1}^{r} P_{ij} \, |w_{ij}^{kl}|^2 \leq P_{\mathrm{maxAn}} \qquad (12)$$
Maximizing a weighted sum of the rates $R_i$ under either of the two proposed constraints is a convex problem: the logarithmic function is concave in the power assignments, the weighted addition preserves concavity, and the constraints (11) and (12) are linear. Therefore, the optimal solution can be obtained by numerical convex optimization techniques [18,19]. However, closed-form solutions, even if suboptimal, are highly preferable, to reduce the computational time and resources required by the CBST for the power allocation. Thus, we approach a closed-form solution of the problem by applying the Lagrange duality theory.
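Both sets of constraints can be checked numerically from the precoding matrix and a candidate power vector. The sketch below assumes independent symbols, so that the power radiated by an antenna is the sum of the stream powers weighted by the squared magnitudes of the corresponding precoder entries, and it assumes the rows of W are ordered BS by BS. The function names and argument layout are illustrative only.

```python
import numpy as np

def antenna_powers(W, P, M, t):
    """Return an M x t array with the power radiated by each antenna.

    W : (M*t) x S precoding matrix, rows ordered BS by BS
    P : length-S vector of stream powers (S = N*r streams)
    """
    per_antenna = (np.abs(W) ** 2) @ np.asarray(P, dtype=float)
    return per_antenna.reshape(M, t)

def satisfies_constraints(W, P, M, t, p_max_bs=None, p_max_an=None, tol=1e-9):
    """Check the per base station and/or per antenna power limits."""
    pw = antenna_powers(W, P, M, t)
    ok = True
    if p_max_bs is not None:
        # Per base station: sum over the t antennas of each BS.
        ok = ok and bool(np.all(pw.sum(axis=1) <= p_max_bs + tol))
    if p_max_an is not None:
        # Per antenna: every individual antenna must respect the limit.
        ok = ok and bool(np.all(pw <= p_max_an + tol))
    return ok
```

A feasibility test of this kind is all that a one-dimensional search over a waterlevel constant needs, since the radiated powers grow monotonically with the stream powers.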
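When we have the per base station constraints (11), the Lagrangian $\Lambda(\mathbf{P}, \boldsymbol{\mu})$ is given by
$$\Lambda(\mathbf{P}, \boldsymbol{\mu}) = \sum_{i=1}^{N} \alpha_i \sum_{j=1}^{r} \log_2(1 + \lambda_{ij} P_{ij}) - \sum_{k=1}^{M} \mu_k \left( \sum_{i=1}^{N} \sum_{j=1}^{r} P_{ij} \sum_{l=1}^{t} |w_{ij}^{kl}|^2 - P_{\mathrm{maxBS}} \right)$$
and
$$\boldsymbol{\mu} = [\mu_1, \ldots, \mu_M]^T, \quad \mu_k \geq 0$$
Similarly, when the per antenna constraints (12) are applied, $\Lambda(\mathbf{P}, \boldsymbol{\mu})$ is given by
$$\Lambda(\mathbf{P}, \boldsymbol{\mu}) = \sum_{i=1}^{N} \alpha_i \sum_{j=1}^{r} \log_2(1 + \lambda_{ij} P_{ij}) - \sum_{k=1}^{M} \sum_{l=1}^{t} \mu_{kl} \left( \sum_{i=1}^{N} \sum_{j=1}^{r} P_{ij} \, |w_{ij}^{kl}|^2 - P_{\mathrm{maxAn}} \right)$$
and
$$\boldsymbol{\mu} = [\mu_{11}, \ldots, \mu_{1t}, \ldots, \mu_{Mt}]^T, \quad \mu_{kl} \geq 0$$
With the per base station constraints, the solution of the problem is given by a point $[\mathbf{P}^*, \boldsymbol{\mu}^*]$ that satisfies the set of Nr + M equations:
$$\frac{\partial \Lambda}{\partial P_{ij}} = 0, \;\; i = 1, \ldots, N, \; j = 1, \ldots, r; \qquad \mu_k \left( \sum_{i=1}^{N} \sum_{j=1}^{r} P_{ij} \sum_{l=1}^{t} |w_{ij}^{kl}|^2 - P_{\mathrm{maxBS}} \right) = 0, \;\; k = 1, \ldots, M \qquad (18)$$
Similarly, with the per antenna constraints, the solution should satisfy the set of Nr + Mt equations:
$$\frac{\partial \Lambda}{\partial P_{ij}} = 0, \;\; i = 1, \ldots, N, \; j = 1, \ldots, r; \qquad \mu_{kl} \left( \sum_{i=1}^{N} \sum_{j=1}^{r} P_{ij} \, |w_{ij}^{kl}|^2 - P_{\mathrm{maxAn}} \right) = 0, \;\; k = 1, \ldots, M, \; l = 1, \ldots, t \qquad (19)$$
The solution of (18) and (19) is given by the values of $\mathbf{P}^*$ and $\boldsymbol{\mu}^*$ such that
$$P_{ij}^* = \left[ \frac{\alpha_i}{\ln(2) \, L_{ij}^{\mathrm{BS/An}}} - \frac{1}{\lambda_{ij}} \right]^+ \qquad (20)$$
with $L_{ij}^{\mathrm{BS}} = \sum_{k=1}^{M} \mu_k^* \sum_{l=1}^{t} |w_{ij}^{kl}|^2$ and $L_{ij}^{\mathrm{An}} = \sum_{k=1}^{M} \sum_{l=1}^{t} \mu_{kl}^* \, |w_{ij}^{kl}|^2$, where the vector of Lagrange multipliers $\boldsymbol{\mu}^*$, which defines $L_{ij}^{\mathrm{BS/An}}$, should be chosen so that each set of power constraints is satisfied. It can be observed that in both cases, the solution resembles the well-known WF distribution. However, here the waterlevel is given by $\alpha_i / (\ln(2) \, L_{ij}^{\mathrm{BS/An}})$, that is, the waterlevel is different for each symbol j to be transmitted to each user i.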
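For the per base station case, the step from the stationarity condition to the waterfilling-like form can be written out explicitly. The derivation below is a sketch under two assumptions that are not confirmed by the surviving text: the Lagrangian is taken as the objective minus the multiplier-weighted constraints, and $w_{ij}^{kl}$ denotes the precoding weight of stream j of user i at antenna l of BS k.

```latex
\begin{align}
\frac{\partial \Lambda}{\partial P_{ij}}
  &= \frac{\alpha_i \lambda_{ij}}{\ln(2)\,(1 + \lambda_{ij} P_{ij})} - L_{ij} = 0,
  \qquad L_{ij} = \sum_{k=1}^{M} \mu_k \sum_{l=1}^{t} |w_{ij}^{kl}|^2 \\
\Rightarrow \quad 1 + \lambda_{ij} P_{ij}
  &= \frac{\alpha_i \lambda_{ij}}{\ln(2)\, L_{ij}} \\
\Rightarrow \quad P_{ij}
  &= \frac{\alpha_i}{\ln(2)\, L_{ij}} - \frac{1}{\lambda_{ij}}
\end{align}
```

Enforcing $P_{ij} \geq 0$ then clips negative values to zero, which is the role of the $[\cdot]^+$ operator. The per antenna case follows the same steps with $L_{ij} = \sum_{k}\sum_{l} \mu_{kl} |w_{ij}^{kl}|^2$.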
We have obtained an expression for the power allocation that is still highly complex. However, this procedure gives us insight into how to build alternative simplified schemes based on the same idea as the well-known WF. Although suboptimal, they may perform close to the optimal solution, with the advantage of a much lower optimization burden.

Modified waterfilling
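By analyzing the set of constraints in either (11) or (12), the solution that we propose is to reduce the problem by considering an equivalent virtual BS (antenna) that leads to a single constraint equation. The underlying idea is that, instead of all the BSs (antennas) giving a constraint on the powers $P_{ij}$ allocated to each stream j of user i, we choose the BS (antenna) that needs the most power to transmit this user information stream and, hence, will be the first to violate the constraint if we increase $P_{ij}$. Defining the new quantities
$$\Omega_{ij}^{\mathrm{BS}} = \max_{k} \sum_{l=1}^{t} |w_{ij}^{kl}|^2, \qquad \Omega_{ij}^{\mathrm{An}} = \max_{k,l} |w_{ij}^{kl}|^2$$
the simplified optimization problem reduces the constraints to just one, becoming
$$\max_{\{P_{ij}\}} \; \sum_{i=1}^{N} \alpha_i \sum_{j=1}^{r} \log_2(1 + \lambda_{ij} P_{ij}) \quad \text{subject to} \quad \sum_{i=1}^{N} \sum_{j=1}^{r} \Omega_{ij}^{\mathrm{BS/An}} P_{ij} \leq P_{\mathrm{maxBS/An}}, \quad P_{ij} \geq 0$$
It should be noted that this new constraint is more restrictive than all the previous ones. Therefore, if we satisfy this restriction, then we also fulfill the restrictions in (11) or (12). Application of the Lagrange multiplier technique gives the new function:
$$\Lambda(\mathbf{P}, \mu) = \sum_{i=1}^{N} \alpha_i \sum_{j=1}^{r} \log_2(1 + \lambda_{ij} P_{ij}) - \mu \left( \sum_{i=1}^{N} \sum_{j=1}^{r} \Omega_{ij}^{\mathrm{BS/An}} P_{ij} - P_{\mathrm{maxBS/An}} \right)$$
the partial derivatives of which, with respect to the powers $P_{ij}$, give the set of equations:
$$\frac{\alpha_i \lambda_{ij}}{\ln(2) \, (1 + \lambda_{ij} P_{ij})} - \mu \, \Omega_{ij}^{\mathrm{BS/An}} = 0$$
Hence,
$$P_{ij} = \frac{\alpha_i}{\ln(2) \, \mu \, \Omega_{ij}^{\mathrm{BS/An}}} - \frac{1}{\lambda_{ij}}$$
Therefore, the problem is equivalent to finding the Lagrange multiplier (or constant $K_{\mathrm{MWF}} = \frac{1}{\ln(2)\mu}$) such that, for all the power levels $P_{ij}$, the following equation holds:
$$P_{ij} = \left[ K_{\mathrm{MWF}} \frac{\alpha_i}{\Omega_{ij}^{\mathrm{BS/An}}} - \frac{1}{\lambda_{ij}} \right]^+ \qquad (26)$$
where $K_{\mathrm{MWF}}$ must be found to fulfill the constraints (11) or (12). This corresponds again to a WF distribution with variable waterlevel. However, unlike the optimal solution in (20), here we have decoupled the term containing the Lagrange multiplier, $K_{\mathrm{MWF}}$, from $\Omega_{ij}^{\mathrm{BS/An}}$ and $\alpha_i$ in the variable waterlevel. That is, the problem reduces to finding the only unknown value $K_{\mathrm{MWF}}$ in (26), while the variability in the waterlevel is confined to the known parameters $\Omega_{ij}^{\mathrm{BS/An}}$ and $\alpha_i$. This can be solved with the same type of algorithms that solve standard WF [20].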
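A possible numerical realization of this scheme is sketched below. It takes the allocation to have the form max(0, K·α/Ω − 1/λ) per stream and searches for the largest constant K for which the original constraints still hold. The bisection search is an assumption chosen for brevity (the article only states that standard WF algorithms [20] apply), and the names `modified_waterfilling`, `G`, and `p_max` are hypothetical. Streams are flattened into a single index, and G[c][s] is the power drawn from constrained resource c (a BS or an antenna) per unit power of stream s, so that each column of G must have a positive maximum.

```python
def modified_waterfilling(alpha, lam, G, p_max, iters=60):
    """Waterfilling with a per-stream waterlevel K * alpha[s] / omega[s].

    alpha[s], lam[s] : priority and gain of stream s
    G[c][s]          : power taken from resource c per unit power of stream s
    p_max            : power limit of every resource
    Returns (powers, K).
    """
    n = len(lam)
    # Worst-case resource for each stream (the "virtual" BS or antenna).
    omega = [max(row[s] for row in G) for s in range(n)]

    def powers(K):
        return [max(0.0, K * alpha[s] / omega[s] - 1.0 / lam[s]) for s in range(n)]

    def feasible(P):
        return all(sum(g * p for g, p in zip(row, P)) <= p_max for row in G)

    # Grow the upper bracket until a constraint is violated (bounded loop).
    lo, hi = 0.0, 1.0
    for _ in range(200):
        if not feasible(powers(hi)):
            break
        lo, hi = hi, 2.0 * hi
    # Bisection: powers are nondecreasing in K, so feasibility is monotone.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if feasible(powers(mid)):
            lo = mid
        else:
            hi = mid
    return powers(lo), lo
```

Because the search tests feasibility against every row of G, the returned allocation respects the original constraints rather than only the more restrictive single constraint.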

Waterfilling
In order to further simplify the solution to the optimization problem, we may consider the fact that, in a practical realization, the values of $\Omega_{ij}^{\mathrm{BS/An}}$ are close to each other for all i, j. Then, we assume them to be constant and include that constant into the waterlevel to simplify the solution in (26), giving
$$P_{ij} = \left[ K_{\mathrm{WF}} \, \alpha_i - \frac{1}{\lambda_{ij}} \right]^+ \qquad (27)$$
where again $K_{\mathrm{WF}}$ must be found to fulfill the constraints (11) or (12). This corresponds to a WF distribution with the waterlevel modified only by the user priorities. In particular, for equal priorities, $\alpha_i = 1/N$, it reduces to a standard WF.
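The simplified scheme can be sketched in the same spirit, with the waterlevel depending only on the user priority. As before, the bisection on the constant and the names `waterfilling`, `G`, and `p_max` are assumptions made for illustration; G[c][s] is the power drawn from constrained resource c per unit power of stream s.

```python
def waterfilling(alpha, lam, G, p_max, iters=60):
    """Waterfilling with waterlevel K * alpha[s]; returns (powers, K)."""

    def powers(K):
        return [max(0.0, K * a - 1.0 / l) for a, l in zip(alpha, lam)]

    def feasible(P):
        return all(sum(g * p for g, p in zip(row, P)) <= p_max for row in G)

    # Bracket the largest feasible K, then refine it by bisection.
    lo, hi = 0.0, 1.0
    for _ in range(200):
        if not feasible(powers(hi)):
            break
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if feasible(powers(mid)):
            lo = mid
        else:
            hi = mid
    return powers(lo), lo

# Two equal-priority streams sharing one unit-power resource.
P, K = waterfilling([0.5, 0.5], [2.0, 1.0], [[1.0, 1.0]], 1.0)
```

In this small example both streams are active, so the constant solves (0.5K − 0.5) + (0.5K − 1) = 1, and the stronger stream receives the larger share of the power.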

Numerical results
In this section we compare the performance of the proposed modified waterfilling (MWF) of (26), waterfilling (WF) of (27), and the optimum solution found by numerical convex optimization (CVX) [21]. For the sake of comparison, we also include the rates achieved when using the scaled WF (SWF) proposed in [10] for per base station constraints, the results of [7], also for per base station constraints, and a uniform power distribution (UP). In the case of UP, the power allocated to each user stream is the same and corresponds to the maximum value that fulfills either constraints (11) or (12).
In the following subsection, we analyze the achievable rates for each scheme in a simple scenario to understand how close they perform without the influence of the fading model. Then, in the subsequent subsections, we analyze a more realistic scenario with the effects of imperfect channel estimation and of the feed-back delay in a time-varying channel, which outdates the current channel with respect to the one used for precoding and power assignment.

Achievable rates
In a simple two-BS, two-user case (M = N = 2), we consider a simplified channel model where the channel matrix entries are independent identically distributed complex Gaussian random variables with zero mean and unit variance. We set $P_{\mathrm{maxBS}} = 1$ and $P_{\mathrm{maxAn}} = 1/t$. We find the boundary B of the region of achievable rates for each proposed scheme as $B(\alpha) = \alpha R_1 + (1 - \alpha) R_2$, for $\alpha \in [0, 1]$, with $(R_1, R_2)$ being the pairs in the achievable region. Figures 2 and 3 show the regions of mean achievable rates, averaged over 1,000 channel realizations, with the per base station and per antenna power constraints, comparing the three different approaches with the uniform power allocation as a reference. SWF is also shown when the per base station constraints are used. Different values of the number of transmit and receive antennas are considered. It can be seen that the gap between the achievable rates obtained with WF and MWF and the optimal solution CVX is very narrow for the case of per base station constraints, while for the per antenna constraints, the difference between CVX and the waterfilling distributions becomes more noticeable. In both cases, these rates are considerably higher than what is achieved by UP. When the power is constrained per base station, WF and SWF perform identically; however, the use of MWF can give a small improvement. The fact that MWF performs better than WF is more visible when the power is constrained per antenna, because the per base station quantities $\Omega_{ij}^{\mathrm{BS}}$ exhibit less variability and can be better approximated by a constant. In any case, the increase of mean achievable rates with higher values of t and/or r is substantial, meaning that the capabilities of multiple antennas are leveraged.
It is also interesting to analyze in more detail the behavior of UP in Figures 2 and 3 for t = r = 2 and t = 2, r = 1. CBST is transmitting as many data streams per user as the number of receive antennas r (10), each multiplied by the elements of the diagonal matrix $\boldsymbol{\Sigma}_i$ (the values $\lambda_{ij}^{1/2}$) after the compound effect of transmit, channel, and receive processing. This means that one stream for r = 1 and two streams for r = 2 are transmitted, using therefore 1 or 2 values of $\lambda_{ij}^{1/2}$ per user i. For each user, in these channel conditions, one of these values is generally considerably higher than the other, and so sharing the transmission power between two streams (r = 2) in the case of UP results in a waste of power that renders a lower rate than just using the entire available power in one stream (r = 1). One illustrative example from a particular channel realization: for r = 1 we have 6878 . This is a well-known effect leading to the dominant eigenmode transmission concept described in [22].
Figure 4 shows the average achievable rates when users' transmissions have the same priority (α = 0.5). Average rates (over 10,000 channel realizations) are plotted for different values of t and r and the two considered types of power restrictions. We can see that when the number of antennas is increased, the advantage of using the WF schemes over a uniform power distribution is more evident. The advantage of MWF over WF is relatively small, and they both perform close to CVX. Finally, we can observe that the per antenna power constraints, even though they may be more realistic, reduce the degrees of freedom in the power assignment, and therefore lead to worse performance compared to the per base station power constraints. The effect is more noticeable for high antenna dimensions. In this figure, the mean achievable rates obtained with the JT-decomp precoding proposed in [7] for per base station power constraints are also shown. Since no optimization is performed on the transmit powers to maximize the achievable rates, the obtained mean rates are lower with this scheme.

Effect of an erroneous or outdated channel estimation
In the results of the previous subsection, we assumed that the channel was perfectly estimated at each receiver and instantaneously fed back to the base stations so that BD ensured that perfect cancelation of the interference was achieved. However, the channel is usually estimated at the receivers using the information conveyed by pilot symbols, and this estimation will normally be corrupted by additive white Gaussian noise (AWGN). Moreover, sending the estimated channel state information (CSI) to the base stations will require some time, and therefore a delayed version of the estimated CSI will be available there. If BD is performed with erroneous or outdated CSI, then the diagonalization will not be perfect and some interference will remain. The power will be subsequently allocated using the wrong estimates. With the results shown in the subsequent figures, we discuss these two effects.
The effects of imperfect channel estimation are evaluated using a noisy estimate of the channel matrix $\mathbf{H}$ instead of the real one:
$$\hat{\mathbf{H}} = \mathbf{H} + \mathbf{H}_\sigma$$
where $\mathbf{H}_\sigma$ is a matrix of i.i.d. complex Gaussian entries with zero mean and variance $\sigma_H^2$. BD is performed with the imperfectly estimated $\hat{\mathbf{H}}$, and therefore the power allocation is determined using the singular values obtained with this estimation error. The mean squared error (MSE) of the channel estimation is defined as
$$\mathrm{MSE} = \frac{E\left[\|\mathbf{H} - \hat{\mathbf{H}}\|_F^2\right]}{E\left[\|\mathbf{H}\|_F^2\right]}$$
which coincides with the normalized MSE of [23], where we can see [23, Figure 4] that values of MSE lower than $10^{-1}$ can be achieved for operational numbers of antennas and signal-to-noise ratio (SNR) values.
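The noisy-estimate model can be reproduced in a few lines. The sketch below assumes NumPy, circularly symmetric complex Gaussian noise, and a normalized MSE computed as the ratio of squared Frobenius norms for a single realization; the function names are hypothetical.

```python
import numpy as np

def noisy_estimate(H, sigma2, rng):
    """Add i.i.d. complex Gaussian noise of variance sigma2 to H."""
    noise = np.sqrt(sigma2 / 2.0) * (
        rng.standard_normal(H.shape) + 1j * rng.standard_normal(H.shape)
    )
    return H + noise

def normalized_mse(H, H_hat):
    """Squared Frobenius error normalized by the channel energy."""
    return np.linalg.norm(H - H_hat, "fro") ** 2 / np.linalg.norm(H, "fro") ** 2

rng = np.random.default_rng(0)
# Unit-variance channel entries, as in the simplified channel model.
H = (rng.standard_normal((200, 200)) + 1j * rng.standard_normal((200, 200))) / np.sqrt(2)
H_hat = noisy_estimate(H, 0.1, rng)
nmse = normalized_mse(H, H_hat)
```

With unit-variance channel entries, the measured value concentrates around the noise variance, so the horizontal axis of an MSE plot can be swept simply by varying `sigma2`.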
The effects of the imperfect channel estimation are examined in Figure 5 for the case of per base station power constraints. If the mean rates that may be achieved with perfect channel estimation are denoted as $R_{\mathrm{per}}$ and the mean rates achieved with imperfect channel estimation are denoted as $R_{\mathrm{imp}}$, then the relative loss $(R_{\mathrm{per}} - R_{\mathrm{imp}})/R_{\mathrm{per}}$ is plotted. We can observe that for reasonable values of the MSE obtained in the channel estimation (up to $10^{-1}$), the values of the relative loss are small, and so the degradation caused by imperfect channel estimation is not significant. If the MSE increases above $10^{-1}$, then a degradation of the achievable rates can be observed, which increases with the number of antennas. We can see that the degradation obtained when using WF and UP is in general quite similar. Although not included, the behavior of MWF is the same as that of WF.
The effects of outdated CSI are examined in Figures 6, 7, and 8 for the case of per base station power constraints. Here, we have evaluated the performance in a more realistic scenario with M = N = 64 as described in [6]. Cells of radius $d_0$ = 1.6 km are arranged to form a torus, which avoids the boundary effect that causes cells at the border of the cellular deployment to receive less interference. Each cell has a BS in its center and a single user allocated in the shared frequency, time-slot, or code resource of interest. The position of each user is randomly varied according to a uniform distribution over the area of each cell.
The channel-fading coefficients account for path loss with decay exponent 3.8, lognormal shadow fading with mean of 0 dB and standard deviation of 8 dB, and complex Gaussian fading with zero mean, unit variance, and a Doppler spectrum modeled by a Jakes filter [24] with maximum Doppler spread $f_D$. Given that in the system model we assume unit variance noise, we normalize the path loss (PL) accordingly to account for different SNRs, which are specified at the cell boundary (at distance $d_0$ from the center). In this definition of the SNR, only the effect of path loss is included, not that of the shadowing or fading, according to [6]. Also, it should be noted that the receivers placed closer to the BS will experience a higher value of SNR.
Figures 6, 7, and 8 show the CDF of the rates achieved in this scenario when t = r = 1, t = r = 2, and t = r = 4, respectively. The SNR at the cell boundary is 18 dB. The parameter D indicates the delay between the actual CSI and the CSI being used for the BD. That is, the CSI is outdated by a delay D with respect to ideal CSI. In the figures, we show the performance for different values of D conveniently normalized with respect to the channel coherence time ($T_c = 1/f_D$). We can observe that the delay must be very small compared to the coherence time of the channel to cancel the interference effectively. When $D = 0.001 T_c$, the degradation of the rates is already substantial. The degradation is more accentuated when transmitter and receiver have a higher number of antennas. Actually, with $D = 0.005 T_c$, the advantages of increasing the number of antennas are lost, and the performance is basically the same for all the numbers of antennas considered in these figures.
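To relate the normalized delay to how much the channel has changed, one can evaluate the temporal correlation between the actual and the outdated fading coefficient. The sketch below assumes the classical Clarke/Jakes autocorrelation, J0(2π f_D D), which the article does not state explicitly; with T_c = 1/f_D the argument becomes 2π D/T_c. The Bessel function is computed from its power series so that no external library is needed, and both function names are hypothetical.

```python
import math

def bessel_j0(x, terms=30):
    """Zeroth-order Bessel function of the first kind via its power series."""
    half_sq = (x / 2.0) ** 2
    term, total = 1.0, 1.0
    for m in range(1, terms):
        # Ratio of consecutive series terms: -(x/2)^2 / m^2.
        term *= -half_sq / (m * m)
        total += term
    return total

def csi_correlation(delay_over_tc):
    """Correlation between current and delayed fading for D / T_c."""
    return bessel_j0(2.0 * math.pi * delay_over_tc)

for d in (0.001, 0.005, 0.01):
    print(d, csi_correlation(d))
```

The correlation decreases monotonically over this range of delays. Any departure from 1 means that the precoders are matched to a slightly different channel, so some inter-user interference from the other base stations remains after BD.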
A value $D = 0.001 T_c$ is in line with the feed-back delay used in [25] to evaluate the performance of closed-loop MIMO systems (D = 0.1 ms with $T_c$ = 167 ms, so $D = 0.0006 T_c$), while smaller values of delay seem infeasible in practice. Therefore, these results confirm that the use of outdated CSI can seriously degrade the performance of BD, and therefore efficient feed-back mechanisms must be designed, which are beyond the scope of this article. We can note that UP and WF suffer approximately the same degradation (as does MWF, not shown), while CVX is more prone to the effects of the outdated CSI, which makes sense, since it relies strongly on the channel information to optimize the power allocation.

Complexity
The optimum power distribution can be obtained through a convex optimization procedure, while WF approaches allow a much reduced complexity at the expense of some performance degradation. In this section, we examine the difference in terms of complexity between both approaches for the power optimization procedure.
Since the power is distributed over Nr user transmissions, the complexity does not depend on the number of transmit antennas or base stations (as long as Mt ≥ Nr as required for BD). Therefore, the complexity of WF, MWF, and CVX does not increase with the number of antennas per BS, which is a preferable characteristic, since often t > r in practice.
A thorough comparison of the complexity of the methods is not easy, since the optimization procedures are adaptive, with a number of operations that can vary according to the channel realization. In general, convex optimization using interior-point methods implements a Newton search with a number of iterations that is slightly dependent on the problem size and in most cases can be considered limited to a few tens, while inside each Newton iteration, the complexity is dominated by the determination of the so-called Newton step, which has a complexity order of about $(Nr)^3/3$ [19]. For the WF, again, the number of iterations varies with the channel conditions and the required accuracy; however, a theoretical number of operations for each iteration is on the order of $Nr \log(Nr)$. In the specific case of modified WF, the search procedure cannot be optimized as in the WF, because of the variable waterlevel, and the complexity saving with respect to the convex optimization is lower.
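The two per-iteration orders quoted above can be compared for a few problem sizes. This is only an order-of-magnitude sketch: it assumes a natural logarithm, ignores constant factors and the number of iterations of each method, and uses hypothetical function names.

```python
import math

def newton_step_ops(n_streams):
    """Approximate cost of one Newton step, (Nr)^3 / 3."""
    return n_streams ** 3 / 3.0

def wf_iteration_ops(n_streams):
    """Approximate cost of one waterfilling iteration, Nr * log(Nr)."""
    return n_streams * math.log(n_streams)

for n in (8, 64, 256):
    ratio = newton_step_ops(n) / wf_iteration_ops(n)
    print(n, round(ratio))
```

The ratio grows roughly quadratically with the number of streams Nr, which illustrates why the gap between the two approaches widens as more base station-user pairs cooperate.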
To get a practical idea, we denote by $T_{\mathrm{CVX}}$, $T_{\mathrm{WF}}$, and $T_{\mathrm{MWF}}$ the mean execution times of CVX, WF, and MWF, respectively, all averaged over 1,000 channel realizations, and we plot in Figure 9 the ratios $T_{\mathrm{CVX}}/T_{\mathrm{WF}}$ and $T_{\mathrm{CVX}}/T_{\mathrm{MWF}}$, varying the number of cooperating base station-user pairs with per base station power constraints. For illustration purposes, we have set the number of antennas per base station to t = 4; however, the results do not depend on the value of this parameter. The execution times obtained for r = 1, 2, and 4 have been averaged and plotted in this figure. The channel conditions are the same as in the "Achievable rates" section. The simulations were run on an Intel Core 2 Duo CPU at 2.53 GHz with 2.00 GB RAM; however, since we are dealing with time ratios, similar values can be expected on other processors.
We have to keep in mind that the specific code implementation of the convex optimization and WF will have an impact on the measured times, and so what we give here is just an idea of their relative execution times. Having said that, we can observe that both WF and MWF are always at least three orders of magnitude faster than CVX. More specifically, WF is between 6,000 and 12,000 times faster, while MWF is between 1,000 and 2,000 times faster. Therefore, even if they are suboptimal, WF and MWF are a good choice in terms of the balance between complexity and achievable rates.

Conclusions
We have proposed two power optimization schemes (WF and MWF) for the CBST downlink based on BD with different transmit power constraints. Both are derived with a technique similar to the WF distribution: the first (WF) has the lowest complexity and reduces to the standard WF if the user priorities are the same. In the case of per base station constraints, it achieves the same performance as SWF of [10]. The second (MWF) shows a better performance, more noticeable for the case of per antenna constraints, with a small increase in complexity. They both perform close to the optimal solution. However, the optimum can be derived only by resorting to the numerical solution of the convex optimization problem, with a computational complexity much higher than that of the proposed WF and MWF schemes. Also, the degradation in terms of mean rates caused by imperfect channel estimation is small for reasonable values of the MSE of the channel estimation. However, our simulation results confirm the need for a fast feed-back of the estimated CSI to the base stations to avoid a severe degradation of the rates.
We have observed that the rates achieved with the more realistic per antenna constraints are lower compared to a per base station one. In general, the proposed schemes allow us to obtain the capacity improvements of MIMO, canceling the high amount of interference which characterizes cellular environments.
In [26], it is shown that, in the context of the Broadcast Channel, the performance of BD gets very close to DPC with a proper selection of the user scheduling. In future work, we will address the joint optimization of the power allocation, the precoding scheme, and the user scheduling.
Endnotes
a We assume, without loss of generality, that all the base stations have the same maximum available power.
b Again, we assume, without loss of generality, that all the antennas in all the base stations have the same maximum available power.