A novel distributed power allocation scheme for coordinated multicell systems

Coordination between base stations (BSs) is a promising solution for cellular wireless systems to mitigate intercell interference, improve system fairness, and increase capacity in the years to come. The aim of this manuscript is to propose a new distributed power allocation scheme for the downlink of distributed precoded multicell MISO-OFDM systems. By treating the multicell system as a superposition of single-cell systems, we define the average virtual bit error rate (BER) of one single-cell system, which allows us to compute the power allocation in a distributed manner at each BS. The precoders are designed in two phases: first, the precoder vectors are computed in a distributed manner at each BS considering two criteria, distributed zero-forcing and virtual signal-to-interference-plus-noise ratio; then, the system is optimized through distributed power allocation with a per-BS power constraint. The proposed power allocation scheme minimizes the average virtual BER over all user terminals and the available subcarriers. Both the precoder vectors and the power allocation are computed by assuming that the BSs have knowledge of local channel state information only. The performance of the proposed scheme is compared against other power allocation schemes that have recently been proposed for precoded multicell systems based on LTE specifications. The results also show that, although our power allocation scheme is based on the minimization of the virtual uncoded BER, it also achieves significant gains in coded systems.


Introduction
The rapid growth of wireless traffic and of the number of devices continuously increases the interference level, which significantly degrades the capacity gains promised by single-cell MIMO-based techniques [1]. An attractive option to improve the system capacity is the cell-size reduction concept. However, the deployment of a large number of small cells is not without new technical challenges [2]. Most of the interference mitigation challenges originate from the edge users/devices, whose number grows as the number of cells increases. Multicell cooperation or coordination is a promising solution for cellular wireless systems to mitigate intercell interference, improve system fairness, and increase capacity [3,4], and is thus already under study in LTE-Advanced under the coordinated multipoint (CoMP) concept [5].
There are several CoMP approaches, depending on the amount of information shared by the transmitters through the backhaul network and on where the processing takes place, i.e., centralized if the processing takes place at the central unit (CU) or distributed if it takes place at the different transmitters. Coordinated centralized beamforming approaches, where transmitters exchange both data and channel state information (CSI) for joint signal processing at the CU, promise larger spectral efficiency gains than distributed interference coordination techniques, but typically at the price of larger backhaul requirements and more severe synchronization requirements. Some sub-optimal centralized precoding schemes have been discussed in [6]. The interference is eliminated by joint and coherent coordination of the transmission from the base stations (BSs) in the network, assuming that they share all downlink signals. In [7], the inner bounds on capacity regions for downlink transmission were derived with or without BS cooperation and under per-antenna power or sum-power constraints. Two centralized multicell precoding schemes based on the waterfilling technique have been proposed in [8]. It was shown that these techniques achieve a close-to-optimal weighted sum-rate performance. Based on the statistical knowledge of the channels, a CU that performs a centralized power allocation that jointly minimizes the outage probability of the user terminals (UTs) was proposed in [9]. In [10], a clustered BS coordination is enabled through a multicell block diagonalization (BD) strategy to mitigate the effects of interference in multicell MIMO systems. A BD cooperative multicell scheme was proposed in [11], where the weighted sum-rate achievable for all the UTs is maximized.
Distributed precoding approaches, where the precoder vectors are computed at each BS in a distributed fashion, have been proposed in [12] for the particular case of two UTs and generalized to K UTs in [13]. It is assumed that each BS knows only local CSI and, based on that, a parameterization of the beamforming vectors used to achieve the outer boundary of the achievable rate region was derived. In [12,13], some distributed power allocation algorithms for the derived precoder vectors were proposed to further improve the sum-rate. In [12], a very simple channel power splitting was considered and no optimization metric was assumed. In [13], a heuristic power allocation based on the maximization of a metric related to the sum-rate was derived. A promising distributed precoding scheme based on the zero-forcing criterion with several centralized power allocation approaches, which minimize the average bit error rate (BER) and the sum of the inverses of the signal-to-interference-plus-noise ratio (SINR), was proposed in [14]. These distributed schemes were evaluated and compared with some fully centralized multicell schemes in [15]. In [16], two interference mitigation techniques have been investigated and compared, namely interference alignment and resource division multiple access.
The aim of this study is to propose a distributed power allocation scheme for the downlink of distributed precoded multicell MISO-OFDM systems. By considering the multicell system as a superposition of single-cell systems, we define the average virtual BER of one single-cell system. This allows us to compute the power allocation in a distributed manner at each BS. The precoder is designed in two phases: first, the precoder vectors are computed based on the recently proposed distributed zero-forcing (DZF) and distributed virtual SINR (DVSINR) criteria. Then, the system is further optimized by a new distributed power allocation algorithm that minimizes the average virtual BER (VBER) under a per-BS power constraint. With the proposed strategy, both the precoder vectors and the power allocation are computed at each BS in a distributed manner. The considered criterion for power allocation essentially leads to a redistribution of powers among users and subcarriers, and therefore provides user fairness mainly at the cell edges, which in practical cellular systems may be a goal as important for the operators as throughput maximization. To the best of the authors' knowledge, distributed power allocation solutions for distributed precoded multicell systems based on the minimization of the average VBER have not been addressed in the literature. The major contributions are the following. First, we define the average VBER by treating the multicell system as a superposition of individual single-cell systems. Second, we develop a new distributed power allocation scheme for precoded multicell systems, which minimizes the average VBER. The solution is based on Lambert's W function of index 0, $W_0(x)$. Preliminary uncoded numerical results have been presented in [17]. Third, we derive upper and lower bounds for Lambert's $W_0(x)$ function for $x \ge 0$. These bounds are used to reduce the search space for the optimum solution and therefore to perform the power allocation procedure efficiently.
The remainder of the article is organized as follows: Section 2 presents the multicell MISO-OFDM system model. Section 3 briefly describes the considered distributed precoder vectors, namely the DZF and DVSINR. The new distributed power allocation scheme is derived in Section 4. Section 5 presents the main numerical results. The conclusions will be drawn in Section 6.
Notation: Throughout this article, we use the following notation. Lowercase letters, boldface lowercase letters, and boldface uppercase letters are used for scalars, vectors, and matrices, respectively. $(\cdot)^H$ represents the conjugate transpose operator, $E[\cdot]$ represents the expectation operator, $\mathbf{I}_N$ is the identity matrix of size $N \times N$, and $\mathcal{CN}(\cdot,\cdot)$ denotes a circularly symmetric complex Gaussian vector.

System model
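We consider $B$ BSs, each equipped with $N_{t_b}$ antennas, transmitting to $K$ single-antenna UTs, as shown in Figure 1. Also, we assume an OFDM-based system with $N_c$ available subcarriers. Under the assumption of linear precoding, the signal in the frequency domain transmitted by BS $b$ on subcarrier $l$ is given by
$$\mathbf{x}_{b,l} = \sum_{k=1}^{K} \sqrt{p_{b,k,l}}\, \mathbf{w}_{b,k,l}\, s_{k,l}, \qquad (1)$$
where $p_{b,k,l}$ represents the power allocated to UT $k$ on subcarrier $l$ at BS $b$, and $\mathbf{w}_{b,k,l} \in \mathbb{C}^{N_{t_b} \times 1}$ is the precoder of user $k$ at BS $b$ on subcarrier $l$ with unit norm, i.e., $\|\mathbf{w}_{b,k,l}\| = 1$, $b = 1, \ldots, B$, $k = 1, \ldots, K$, $l = 1, \ldots, N_c$. The data symbol $s_{k,l}$, with $E[|s_{k,l}|^2] = 1$, is intended for UT $k$ and is assumed to be available at all BSs. The average power transmitted by BS $b$ is then given by
$$E\left[\|\mathbf{x}_b\|^2\right] = \sum_{l=1}^{N_c} \sum_{k=1}^{K} p_{b,k,l}, \qquad (2)$$
where $\mathbf{x}_b$ is the signal transmitted over the $N_c$ subcarriers. The received signal in the frequency domain at UT $k$ on subcarrier $l$, $y_{k,l} \in \mathbb{C}^{1 \times 1}$, can be expressed as
$$y_{k,l} = \sum_{b=1}^{B} \mathbf{h}_{b,k,l}^H \mathbf{x}_{b,l} + n_{k,l}, \qquad (3)$$
where $\mathbf{h}_{b,k,l} \sim \mathcal{CN}(\mathbf{0}, \rho_{b,k} \mathbf{I}_{N_{t_b}})$, of size $N_{t_b} \times 1$, represents the channel between user $k$ and BS $b$ on subcarrier $l$, $\rho_{b,k}$ is the long-term channel power gain between BS $b$ and UT $k$, and $n_{k,l} \sim \mathcal{CN}(0, \sigma^2)$ is the noise. From (1) and (3), the received signal in the frequency domain at UT $k$ on subcarrier $l$ can be decomposed into a desired signal, a multiuser interference term, and noise,
$$y_{k,l} = \sum_{b=1}^{B} \sqrt{p_{b,k,l}}\, \mathbf{h}_{b,k,l}^H \mathbf{w}_{b,k,l}\, s_{k,l} + \sum_{b=1}^{B} \mathbf{h}_{b,k,l}^H \sum_{j=1, j \neq k}^{K} \sqrt{p_{b,j,l}}\, \mathbf{w}_{b,j,l}\, s_{j,l} + n_{k,l}, \qquad (4)$$
assuming that the cyclic prefix is long enough to account for the different overall channel impulse responses between the BSs and the UTs. From (4), the instantaneous SINR of user $k$ on subcarrier $l$ can be written as
$$\mathrm{SINR}_{k,l}^{(type)} = \frac{\left| \sum_{b=1}^{B} \sqrt{p_{b,k,l}}\, \mathbf{h}_{b,k,l}^H \mathbf{w}_{b,k,l}^{(type)} \right|^2}{\sum_{j=1, j \neq k}^{K} \left| \sum_{b=1}^{B} \sqrt{p_{b,j,l}}\, \mathbf{h}_{b,k,l}^H \mathbf{w}_{b,j,l}^{(type)} \right|^2 + \sigma^2}, \qquad (5)$$
where $type \in \{DZF, DVSINR\}$. Assuming an M-ary QAM constellation and a Gaussian approximation of the overall interference plus noise, the instantaneous probability of error of user $k$ for the data symbol transmitted on subcarrier $l$ is given in [18] as a function of $\mathrm{SINR}_{k,l}^{(type)}$.
Figure 1. The considered scenario with $K$ UTs (illustrated for $B = 4$ BSs equipped with $N_{t_b}$ antennas); the subcarrier index is omitted for simplicity.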
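The SINR described above can be illustrated numerically. The following is a minimal sketch, assuming the usual coherent-sum structure (desired power over interference plus noise, with every BS contributing to each user's signal); the function name `sinr` and the nested-list data layout are hypothetical choices made for this illustration only.

```python
import math

def sinr(k, p, h, w, noise_var):
    """SINR of user k on one subcarrier (subcarrier index omitted).

    p[b][j] : power that BS b allocates to user j
    h[b][j] : channel vector (list of complex) between BS b and user j
    w[b][j] : unit-norm precoder (list of complex) of user j at BS b
    """
    num_bs, num_users = len(p), len(p[0])

    def eq_channel(b, j):
        # h_{b,k}^H w_{b,j}: what user k receives from the beam aimed at user j
        return sum(hc.conjugate() * wc for hc, wc in zip(h[b][k], w[b][j]))

    def amplitude(j):
        # Coherent sum over all BSs of the signal intended for user j, seen at user k
        return sum(math.sqrt(p[b][j]) * eq_channel(b, j) for b in range(num_bs))

    signal = abs(amplitude(k)) ** 2
    interference = sum(abs(amplitude(j)) ** 2 for j in range(num_users) if j != k)
    return signal / (interference + noise_var)

# Toy example: 2 BSs, 2 users, 2 antennas, orthogonal channels and matched beams.
h = [[[1 + 0j, 0j], [0j, 1 + 0j]] for _ in range(2)]
w = [[[1 + 0j, 0j], [0j, 1 + 0j]] for _ in range(2)]
p = [[0.5, 0.5], [0.5, 0.5]]
print(sinr(0, p, h, w, noise_var=0.1))  # approximately 20, since no interference leaks
```

Because the two BSs add coherently, the received signal power in this toy case is twice the total power spent on the user, which is the array gain that coordination is meant to provide.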

Distributed precoder vectors
In this section, we briefly describe the recently proposed distributed precoding vectors, namely DZF and DVSINR. To design the distributed precoder vectors we assume that the BSs have knowledge of local CSI only, i.e., BS $b$ knows the instantaneous channel vectors $\mathbf{h}_{b,k,l}$, $\forall k, l$, which reduces the feedback load over the backhaul network as compared with the fully centralized precoding approach. Hence, there is no exchange of CSI between BSs, which allows multicell cooperation to scale to large and dense networks. Each BS has CSI for its links to all receivers, which is non-scalable when the resources for CSI acquisition are limited. However, it is still a good model for large networks, as most terminals will be far away from any given transmitter and thus have negligibly weak channel gains, as discussed in [13]. Recently, a simple and versatile limited-CSI feedback scheme from the UTs to the BSs has been proposed in the context of multipoint coordination based systems [19].

DZF
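Zero-forcing is a classic beamforming strategy which removes the co-terminal interference. In this case, $\mathbf{w}_{b,k,l}^{(DZF)}$ in (5) is a unit-norm zero-forcing vector orthogonal to the channels of the other $K - 1$ UTs. Let $\tilde{\mathbf{H}}_{b,k,l}$ contain the channels of all users except the $k$th. The SVD of $\tilde{\mathbf{H}}_{b,k,l}$ can be partitioned as follows
$$\tilde{\mathbf{H}}_{b,k,l} = \mathbf{U}_{b,k,l} \boldsymbol{\Sigma}_{b,k,l} \left[ \mathbf{V}_{b,k,l} \;\; \mathbf{W}_{b,k,l} \right]^H, \qquad (7)$$
where the columns of $\mathbf{W}_{b,k,l}$ span the null space of $\tilde{\mathbf{H}}_{b,k,l}$. The columns of $\mathbf{W}_{b,k,l}$ are candidates for user $k$'s precoding vector since they will produce zero interference at the other UTs. An optimal linear combination of these vectors can be given by [14]
$$\mathbf{w}_{b,k,l}^{(DZF)} = \mathbf{W}_{b,k,l} \frac{\left( \mathbf{h}_{b,k,l}^H \mathbf{W}_{b,k,l} \right)^H}{\left\| \mathbf{h}_{b,k,l}^H \mathbf{W}_{b,k,l} \right\|}. \qquad (8)$$
It can be shown that the solution given by (8) is equivalent to the one based on the orthogonal matrix projection onto the column space of $\tilde{\mathbf{H}}_{b,k,l}$ discussed in some works (e.g., [13,20]). The equivalent channel, $\mathbf{h}_{b,k,l}^H \mathbf{w}_{b,k,l}^{(DZF)} = \| \mathbf{h}_{b,k,l}^H \mathbf{W}_{b,k,l} \|$, is a positive real number, which means that the signals arriving at a given UT from different BSs will add coherently. It should be emphasized that the precoder vectors given by (8) hold only for $N_{t_b} \ge K$.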
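As an illustration of the zero-forcing construction, the sketch below uses the orthogonal-projection form that the text states is equivalent to (8), rather than the SVD itself: the user's own channel is projected onto the orthogonal complement of the other users' channels via Gram-Schmidt and then normalized. The helper names `inner` and `dzf_precoder` are hypothetical, and the code handles a single BS and subcarrier.

```python
import math

def inner(a, b):
    """Hermitian inner product a^H b of two complex vectors given as lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def dzf_precoder(h_k, h_others, tol=1e-12):
    """Unit-norm w with h_j^H w = 0 for every h_j in h_others and h_k^H w > 0."""
    basis = []
    for h in h_others:
        # Gram-Schmidt: orthonormal basis of the span of the other users' channels
        v = list(h)
        for q in basis:
            c = inner(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        n = math.sqrt(inner(v, v).real)
        if n > tol:
            basis.append([vi / n for vi in v])
    w = list(h_k)
    for q in basis:
        # Remove from h_k every component that would leak towards another UT
        c = inner(q, w)
        w = [wi - c * qi for wi, qi in zip(w, q)]
    n = math.sqrt(inner(w, w).real)
    if n <= tol:
        raise ValueError("no zero-forcing direction: need N_t >= K and independent channels")
    return [wi / n for wi in w]

# Toy example with N_t = 3 antennas and K = 3 users (two interfered users).
h_k = [1 + 1j, 2 + 0j, 0.5j]
h_others = [[1 + 0j, 1j, 0j], [0j, 1 + 0j, 1 + 0j]]
w = dzf_precoder(h_k, h_others)
```

The resulting equivalent channel `inner(h_k, w)` equals the norm of the projected channel, so it is real and positive, which is the property that lets the contributions of different BSs add coherently at the UT.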

DVSINR
Intuitively, maximal ratio transmission is the asymptotically optimal strategy at low SNR, while ZF has good performance at high SNR or as the number of antennas increases. As discussed in [13], the optimal strategy lies in between these two precoders and cannot be determined without global CSI. However, inspired by the uplink-downlink duality for broadcast channels, Bjornson et al. [13] have derived a novel DVSINR precoder. The precoder vectors are obtained by maximizing the SINR-like expression in (9), where the signal power that BS $b$ generates at UT $k$ is balanced against the noise and interference power generated at all other UTs. It should be mentioned that the DVSINR in (9) is similar to the signal-to-leakage-plus-noise ratio expression discussed in [21] for the single-cell MIMO scenario. In this maximization, $T_{p_b}$ is the per-BS power constraint, and one possible solution to (9) is given in [13]. As for the DZF, this solution was selected to make $\mathbf{h}_{b,k,l}^H \mathbf{w}_{b,k,l}^{(DVSINR)}$ positive and real valued, which means that the signals arriving at a given terminal from different BSs will also add constructively.
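The trade-off between maximal ratio transmission and zero-forcing can be illustrated with a leakage-regularized precoder. The sketch below is not the exact solution of (9): it assumes the generic regularized-inverse form commonly used for signal-to-leakage-plus-noise criteria, with a hypothetical regularization constant `reg` standing in for the noise-related term whose exact value is not recoverable here. The names `solve` and `dvsinr_like_precoder` are likewise illustrative.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (complex entries)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for r in reversed(range(n)):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def dvsinr_like_precoder(h_k, h_others, reg):
    """Unit-norm w proportional to (reg*I + sum_j h_j h_j^H)^{-1} h_k, with reg > 0 (assumed form)."""
    n = len(h_k)
    # Leakage-plus-regularization matrix: Hermitian and positive definite for reg > 0
    C = [[(reg if r == c else 0) + sum(h[r] * h[c].conjugate() for h in h_others)
          for c in range(n)] for r in range(n)]
    v = solve(C, h_k)
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]

h_k = [1 + 1j, 2 + 0j, 0.5j]
h_others = [[1 + 0j, 1j, 0j], [0j, 1 + 0j, 1 + 0j]]
w_mid = dvsinr_like_precoder(h_k, h_others, reg=1.0)
```

With a very large `reg` the vector approaches the matched (maximal ratio) beam, and with a very small `reg` it approaches a zero-forcing beam, which mirrors the low-SNR and high-SNR behaviour described in the text. Because the matrix is Hermitian positive definite, the equivalent channel is real and positive without any extra phase correction.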

Distributed power allocation strategy
A centralized power allocation approach based on the minimization of the average BER was proposed in [14]. Essentially, the average BER is minimized over the powers $\{p_{b,k,l}\}$ under a per-BS power constraint, for the precoder vectors given by (8). However, this strategy requires the knowledge of all equivalent channels, $\mathbf{h}_{b,k,l}^H \mathbf{w}_{b,k,l}^{(DZF)}$, $\forall b, k, l$, at the CU, to jointly compute the powers $p_{b,k,l}$, $\forall b, k, l$. In this article, we derive a new distributed power allocation algorithm, computed locally at each BS and using only the knowledge of local CSI, that minimizes the average VBER over the available subcarriers. Note that by minimizing the VBER over the available subcarriers we have more degrees of freedom (DoF) to improve the system's performance, as discussed in [22] for point-to-point communications.
To derive the distributed power allocation, we assume that the interference is negligible for both precoders. This is based on the observation that with the DVSINR precoder, the interference is negligible at both low and high SNR. Thus, the same strategy can be used to deduce the power allocation for both precoders. This approach has been followed by some other works, where the power allocation strategy used for the ZF-based precoders can also be employed for the non-ZF-based ones [23]. Assuming an interference-free system, for both precoders, (5) can be simplified as
$$\mathrm{SNR}_{k,l} = \frac{\left| \sum_{b=1}^{B} \sqrt{p_{b,k,l}}\, \mathbf{h}_{b,k,l}^H \mathbf{w}_{b,k,l}^{(type)} \right|^2}{\sigma^2}. \qquad (12)$$
The above expression cannot be used to derive a distributed power allocation because it would imply the knowledge of non-local channel gains, i.e., the equivalent channel gains between all BSs and user $k$, at BS $b$. Therefore, we define in (13) a virtual SNR, $\mathrm{VSNR}_{b,k,l}$, as the power of the equivalent channel between the $b$th BS and the $k$th UT on the $l$th subcarrier plus a parameter $d_{b,k,l}$ (which accounts for the non-local contribution) over the noise. For an appropriate choice of $d_{b,k,l}$, this expression corresponds to the $\mathrm{SNR}_{k,l}$ given by (12). To avoid the exchange of instantaneous CSI between the BSs, two strategies can be considered to compute $d_{b,k,l}$: it can be set to zero, $d_{b,k,l} = 0$, or it can be computed from long-term values of the equivalent channels. When $d_{b,k,l} = 0$, the powers at each BS are computed ignoring the contributions of the other BSs to the desired received signal, i.e., the powers are computed at each BS using only local information. This strategy can be seen as the worst case (WC). When $d_{b,k,l} \neq 0$, the powers are computed taking into account some channel information from the other BSs, i.e., there is some cooperation between BSs to compute the powers. Based on (13), we define the average VBER in (14). Note that (14) does not represent any real average BER. Considering the multicell system as a superposition of $B$ single-cell systems, as shown in Figure 2, (14) can be seen as the average VBER of the $b$th single-cell system.
The motivation to use (14) is that the minimization of the average VBER reduces the dynamic range of the VSNRs between the different UTs and subcarriers, i.e., it leads to an equalization of the VSNRs over all UTs and subcarriers (more power is allocated to the weaker links and less to the stronger ones as compared to the equal power allocation approach), which implicitly leads to an equalization of the SINRs and therefore provides user fairness at the cell edges. The power allocation problem at each BS $b$, with a per-BS power constraint, can be formulated as in (15). The Lagrangian associated with this problem can be written as in (16), where $\mu_b \ge 0$ and $\lambda_{b,k,l} \ge 0$ are the Lagrange multipliers [24]. Since the objective function is convex in $p_{b,k,l}$ and the constraint functions are linear, this is a convex optimization problem. It is therefore necessary and sufficient to solve the Karush-Kuhn-Tucker conditions, given in (17), with $h_{b,k,l}^{eq} = \mathbf{h}_{b,k,l}^H \mathbf{w}_{b,k,l}^{(type)}$. Let us assume that $\mu_b = 0$. Then, from the first equation of (17), we see that $\lambda_{b,k,l} < 0$. However, by the third line of (17) we know that $\lambda_{b,k,l} \ge 0$, a contradiction. Consequently, $\mu_b$ is always positive ($\mu_b > 0$) and the power constraint at each BS $b$ is always active, i.e., $\sum_{l=1}^{N_c} \sum_{k=1}^{K} p_{b,k,l} = T_{p_b}$. Additionally, by removing the positivity constraint on $p_{b,k,l}$ and solving optimization problem (15), we get an optimal solution with all $p_{b,k,l} \ge 0$. Hence, the optimal solution of problem (15) is independent of the constraints $p_{b,k,l} \ge 0$, and $\lambda_{b,k,l} = 0$. As shown in the Appendix, assuming $d_{b,k,l} = 0$ and $\lambda_{b,k,l} = 0$, i.e., for the WC, the powers $p_{b,k,l}$ are obtained in closed form as a function of the Lagrange multiplier $\mu_b$ in terms of $W_0$, where $W_0$ stands for Lambert's W function of index 0 [25]. The function $W_0(x)$ is increasing, with $W_0(0) = 0$ and $W_0(x) > 0$ for $x > 0$. Therefore, $\mu_b$ can be efficiently determined iteratively, using the bisection method, so as to satisfy $\sum_{l=1}^{N_c} \sum_{k=1}^{K} p_{b,k,l} = T_{p_b}$.
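To make the role of Lambert's W function concrete, the following worked derivation sketches how a closed form of this kind arises. It rests on an assumption that is not recoverable from the text here: that each term of the average VBER has the common Q-function form $\psi\, Q(\sqrt{a\,p})$, with $a = \beta (h^{eq}_{b,k,l})^2 / \sigma^2$ and constellation-dependent constants $\psi$ and $\beta$. The symbols $a$, $\psi$ and $\beta$ are introduced for this illustration only, and the result should be read as a plausible instance rather than as the expression derived in the Appendix.

```latex
% Assumption: per-link virtual BER = \psi Q(\sqrt{a p}), with d_{b,k,l} = 0,
% a = \beta (h^{eq}_{b,k,l})^2 / \sigma^2, and p = p_{b,k,l}.
\begin{align*}
\frac{\partial}{\partial p}\left[\frac{\psi}{K N_c}\, Q\!\left(\sqrt{a p}\right)\right]
  &= -\frac{\psi}{K N_c}\,\frac{1}{\sqrt{2\pi}}\; e^{-a p/2}\,\frac{\sqrt{a}}{2\sqrt{p}}
  && \text{(chain rule, } Q'(x) = -e^{-x^2/2}/\sqrt{2\pi}\text{)} \\
\frac{\psi \sqrt{a}}{2\sqrt{2\pi}\, K N_c}\,\frac{e^{-a p/2}}{\sqrt{p}} &= \mu_b
  && \text{(stationarity with } \lambda_{b,k,l} = 0\text{)} \\
(a p)\, e^{a p} &= \frac{\psi^2 a^2}{8\pi K^2 N_c^2 \mu_b^2}
  && \text{(square, then multiply by } a\, p\, e^{a p}/\mu_b^2\text{)} \\
p_{b,k,l} &= \frac{1}{a}\; W_0\!\left(\frac{\psi^2 a^2}{8\pi K^2 N_c^2 \mu_b^2}\right)
  && \text{(definition of } W_0\text{: } W_0(x)\, e^{W_0(x)} = x\text{)}
\end{align*}
```

Under this assumption each power is a decreasing function of $\mu_b$, because $W_0$ is increasing and its argument scales as $1/\mu_b^2$. The total power is therefore monotone in $\mu_b$, so the active power constraint has a single root, which is what makes a bisection search applicable.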
For that, a sub-interval in which the root $\mu_b$ must lie should be provided. It can be shown that Lambert's $W_0(x)$ function is bounded from above and below for $x \ge 0$, with the lower bound depending on a parameter $\alpha$ (see the Appendix). Thus, we can derive a lower bound $\mu_b^{LB}$ for the root $\mu_b$. For faster convergence of the algorithm, the upper bound should be as close as possible to the lower bound; thus $\alpha$ should be chosen as $e/(1+e)$, which gives the upper bound $\mu_b^{UB}$, and thus the root $\mu_b \in [\mu_b^{LB}, \mu_b^{UB}]$. This scheme is referred to as minimum VBER WC power allocation (MVBER WC). The corresponding algorithm is a bisection search for $\mu_b$ over this interval. For the case where $d_{b,k,l} \neq 0$, to the best of the authors' knowledge no solution based on Lambert's W function can be derived, but the powers can be computed by solving (15) directly using, for example, the interior-point method [26]. However, as discussed in Section 5, for this case the complexity to compute the powers is much higher than for $d_{b,k,l} = 0$. One possible selection for $d_{b,k,l}$ is based on the average power of the equivalent channels, $\mathbf{h}_{j,k,l}^H \mathbf{w}_{j,k,l}^{(type)}$, which for the DZF precoder can be obtained in closed form. In this case, the long-term channel powers, $\rho_{j,k,l}$, $j \neq b$, should be either fed back from the UTs to BS $b$ or shared through the backhaul network. This scheme is referred to as minimum VBER long-term channel power allocation (MVBER LTC). Note that for the DVSINR precoder it is difficult to obtain a closed-form expression for the average power of the equivalent channels, $E[|\mathbf{h}_{j,k,l}^H \mathbf{w}_{j,k,l}^{(DVSINR)}|^2]$.
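The bisection search for the multiplier can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes the per-link power has the form $W_0(\psi^2 a^2 / (8\pi n^2 \mu^2))/a$, where `a` lists the hypothetical effective gains $\beta (h^{eq})^2/\sigma^2$ of the $n = K N_c$ links of one BS, and it brackets the root by simple interval expansion instead of the analytical bounds $\mu_b^{LB}$ and $\mu_b^{UB}$. The search runs on $\ln \mu$ and evaluates $W_0(e^y)$ directly to avoid overflow at high SNR.

```python
import math

def lambert_w0_exp(y, tol=1e-13):
    """Return W_0(exp(y)): the w > 0 solving w + ln(w) = y, by Newton's method."""
    w = y if y > 1.0 else math.exp(y - 1.0)  # starting point keeping iterates positive
    for _ in range(100):
        w_next = w * (1.0 + y - math.log(w)) / (1.0 + w)
        if abs(w_next - w) <= tol * w_next:
            return w_next
        w = w_next
    return w

def powers(log_mu, a, psi=1.0):
    """Per-link powers for a given ln(mu), under the assumed closed form."""
    n = len(a)
    const = 2.0 * math.log(psi) - math.log(8.0 * math.pi) - 2.0 * math.log(n)
    return [lambert_w0_exp(const + 2.0 * math.log(ai) - 2.0 * log_mu) / ai for ai in a]

def allocate(a, total_power, psi=1.0, eps=1e-10):
    """Bisection on ln(mu) so that the powers sum to total_power."""
    def total(t):
        return sum(powers(t, a, psi))
    lo, hi, step = 0.0, 0.0, 1.0
    while total(lo) < total_power:   # smaller mu gives larger powers
        lo -= step
        step *= 2.0
    step = 1.0
    while total(hi) > total_power:   # larger mu gives smaller powers
        hi += step
        step *= 2.0
    while hi - lo > eps:
        mid = 0.5 * (lo + hi)
        if total(mid) > total_power:
            lo = mid
        else:
            hi = mid
    return powers(0.5 * (lo + hi), a, psi)

gains = [0.5, 2.0, 8.0, 1.0]         # hypothetical effective gains of four links
p = allocate(gains, total_power=4.0)
```

A tighter initial bracket, such as the one given by the bounds on $W_0$ discussed above, reduces the number of bisection steps; the generic expansion used here trades that efficiency for not having to assume the exact form of those bounds.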

Numerical results
In this section, the performance of the different distributed power allocation strategies is evaluated numerically. Also, some insights regarding the complexity of the different approaches are given. The scenario consists of $K$ uniformly distributed single-antenna UTs in a square with BSs in each of the corners. The power decay is proportional to $1/r^4$, where $r$ is the distance from a transmitter. The SNR at the cell edge is defined in terms of $\rho_c$, which represents the long-term channel power at the center of the square. This represents a scenario where terminals are moving around in the area covered by four BSs, each equipped with four antennas. The main parameters used in the simulations are based on the LTE standard [27]: FFT size of 1024; number of available subcarriers set to 128; sampling frequency set to 15.36 MHz; useful symbol duration of 66.6 μs; cyclic prefix duration of 5.21 μs; overall OFDM symbol duration of 71.86 μs; subcarrier separation of 15 kHz; frame size set to 12 OFDM symbols; QPSK modulation; and convolutional turbo code (CTC) channel coding with block size of (6144, 3072). The code rate was set to 1/2 and a Max Log MAP algorithm with eight iterations was used. Also, we used the LTE extended typical urban channel model with nine taps [28].
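Some of the listed OFDM parameters are tied together by simple arithmetic, which the short sketch below checks. It only uses values quoted above; the variable names are illustrative.

```python
fft_size = 1024
sampling_rate_hz = 15.36e6
num_subcarriers = 128

# Subcarrier separation is the sampling frequency divided by the FFT size.
subcarrier_spacing_hz = sampling_rate_hz / fft_size

# The useful symbol duration is the inverse of the subcarrier separation.
useful_symbol_us = 1e6 / subcarrier_spacing_hz

# Bandwidth spanned by the available subcarriers.
used_bandwidth_mhz = num_subcarriers * subcarrier_spacing_hz / 1e6

print(subcarrier_spacing_hz, round(useful_symbol_us, 2), used_bandwidth_mhz)
```

The computed separation of 15 kHz and useful symbol duration of about 66.7 μs agree with the listed values, and the 128 available subcarriers span 1.92 MHz.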

Complexity analyses
In this section, the complexity of the different approaches is evaluated numerically. We compare the average running time of the MVBER WC algorithm ($d_{b,k,l} = 0$) for the case where the search interval is restricted to the derived interval $[\mu_b^{LB}, \mu_b^{UB}]$ and for the case where there is no a priori bounding of the interval, i.e., the search is over $[0, \mathrm{Inf}]$, where Inf is the maximum number representable in software. We also evaluate the average running time for MVBER WC ($d_{b,k,l} = 0$) when (15) is solved directly using the interior-point method (IPM), here referred to as MVBER WC IPM. For this latter case, the complexity is approximately the same as that of the algorithm using $d_{b,k,l} \neq 0$. The stop criterion for the algorithms using the bisection method (MVBER WC $[\mu_b^{LB}, \mu_b^{UB}]$ and MVBER WC $[0, \mathrm{Inf}]$) is expressed in terms of the iteration index $i$ and the chosen convergence threshold $\varepsilon$. For the one using the IPM, the stop criterion is $|p_{b,k,l}^{(i-1)} - p_{b,k,l}^{(i)}| \le \varepsilon$, $\forall b, k, l$. The results of Figures 3 and 4 were obtained setting $\varepsilon = 10^{-8}$. This parameter was also used to obtain the curves presented in Figures 5, 6, 7, and 8.
The results of Figure 3 are presented in terms of the ratio between the average running time of MVBER WC IPM and that of MVBER WC $[\mu_b^{LB}, \mu_b^{UB}]$ (curves A in Figure 3) and of MVBER WC $[0, \mathrm{Inf}]$ (curves B in Figure 3), as a function of the number of users. The average running times of the different algorithms have been measured over $10^3$ trials, and we obtained results for two operation points: cell-edge SNR = 0 and 12 dB. As can be observed from Figure 3, the average running time of MVBER WC IPM is approximately 120 and 500 times longer than that of the proposed MVBER WC $[\mu_b^{LB}, \mu_b^{UB}]$ for $K = 2$ and $K = 4$, respectively. Also, we can see that the gain of MVBER WC $[0, \mathrm{Inf}]$ over MVBER WC IPM is modest. This means that if the interval for the bisection method is not efficiently computed, the gain relative to MVBER WC IPM is low.
In Figure 4, we present results in terms of the ratio between the average number of iterations required by MVBER WC $[0, \mathrm{Inf}]$ and the one required by MVBER WC $[\mu_b^{LB}, \mu_b^{UB}]$. The curves are shown as a function of the number of users, and the SNRs considered are the same as those used for Figure 3. As can be seen from the figure, the average number of iterations required for MVBER WC $[0, \mathrm{Inf}]$ to achieve the solution is approximately 9.5 and 6 times larger than the number required by the proposed algorithm MVBER WC $[\mu_b^{LB}, \mu_b^{UB}]$ for the cases of 2 and 4 users, respectively (for cell-edge SNR = 12 dB). In the low SNR regime the gains are slightly lower: we can observe a gain (in terms of the number of required iterations) of approximately 7.5 and 5 times for 2 and 4 users, respectively. Also, we can see that the gain decreases as the number of users increases for both SNR regimes.
Figure 6. Performance evaluation of the distributed power allocation schemes for K = 3 and uncoded data.

Performance evaluation
We compare the performance results of the proposed distributed power allocation schemes, MVBER WC for both precoders and MVBER LTC for the DZF one. Also, these schemes are compared with two different power allocation strategies: the equal power allocation approach, i.e., the power available at each BS is equally divided among the users and subcarriers, $p_{b,k,l} = T_{p_b}/(K N_c)$, $\forall b, k, l$, referred to as EPA; and DZF with joint centralized power allocation as proposed in [14], referred to here as centralized MBER power allocation (CMBER). We also present the curve for the DVSINR with joint centralized power allocation using the same strategy as for the DZF, also referred to as CMBER. Figure 5 shows the performance results considering $K = 4$ and uncoded data. The results are presented in terms of the average BER as a function of the cell-edge SNR defined above. From the figure, we can see that the proposed distributed power allocation schemes outperform their equal-power counterparts for both precoders, i.e., DZF EPA and DVSINR EPA, because they redistribute the powers across the different users and subchannels more efficiently. As can be seen in Figure 5, the gain of the MVBER WC power allocation scheme is approximately 1 dB for both precoders (BER = $10^{-3}$) when compared with the equal power strategy. The results show that, by knowing the non-local LTC powers at each BS, the performance can be improved, namely in the high SNR regime: we can observe a gain of approximately 0.5 dB of MVBER LTC over MVBER WC for BER = $10^{-3}$. Also, the performance can be improved if the powers are computed jointly at the CU to minimize the real average BER (a gain of approximately 3 dB of CMBER over MVBER WC for the DZF precoder at BER = $10^{-3}$). However, this strategy requires more feedback load over the backhaul network as compared with the fully distributed approaches. Figure 6 shows the performance results when the number of UTs is reduced to 3.
In this scenario, we have more DoF since $N_{t_b} > K$. It can be observed that, as the DoF increase, the performance of the DZF tends to that of the DVSINR. This behavior is similar to that of single-cell systems, where the precoders based on the ZF criterion tend to the ones based on MMSE as the number of transmit antennas (or DoF) increases or at high SNR. From these results it is clear that the gains of the power allocation schemes relative to the EPA case are lower than in the previous scenario. Also, the gain obtained with the centralized power allocation over the fully distributed approaches is lower. In this plot, the curve for the MVBER LTC approach is omitted for clarity, since its performance is approximately the same as that of MVBER WC. This means that only for full-load scenarios, i.e., $N_{t_b} = K$, does the knowledge of long-term equivalent channel variables bring some improvement over the MVBER WC approach.
Although our power allocation scheme is based on the minimization of the virtual uncoded BER, we also assess the impact of our scheme on a coded system. In Figures 7 and 8, we depict the performance results for the same scenarios as in Figures 5 and 6, respectively, but now considering the CTC specified above. From these figures, we can draw basically the same conclusions as from the results in Figures 5 and 6. The gain of the MVBER WC power allocation scheme for both precoders is approximately 1 dB (BER = $10^{-3}$) when compared with the equal power strategy. The penalty relative to the joint centralized approach is approximately 1.2 dB at a BER of $10^{-3}$. In these plots, the curve for the MVBER LTC approach is also omitted for clarity, since its performance is approximately the same as that of MVBER WC. This means that for practical scenarios the knowledge of long-term equivalent channel variables does not bring significant improvements over the MVBER WC approach.

Conclusions
In this article, we proposed a new distributed power allocation scheme for distributed precoding approaches, namely DZF and DVSINR, for the downlink of multicell MISO-OFDM-based systems. Both the precoders and the power allocation were computed at each BS by assuming only the knowledge of local CSI or of long-term non-local equivalent channel statistics. We defined the VBER by treating the multicell system as a superposition of single-cell systems. The metric used to derive the power allocation scheme, the minimization of the VBER, implicitly provides user fairness at the cell edges. We also obtained upper and lower bounds for Lambert's W function of index zero that allow an efficient computation of the power allocation coefficients.
The results have shown that the proposed distributed power allocation scheme outperforms the equal power ones with moderate complexity. When the number of DoF of the equivalent channel variables increases, the DZF-based approaches tend to the DVSINR ones, and the performance of the distributed power allocation schemes also tends to that of the joint centralized strategies. Furthermore, the minimization of the virtual uncoded BER produces an effective improvement in the performance of coded data.
It is clear from the presented results that the proposed distributed precoding scheme can be of significant interest for the design of next generation wireless networks, which are expected to employ cooperation between BSs.
From (31), $f'(x)$ is a strictly increasing function since $W_0(x)$ is also strictly increasing. Therefore, $f'(x)$ has at most one zero ($x_0$) and, due to its monotonic properties, $f(x_0)$ is the global minimum of $f(x)$. Solving the inequality $f(x_0) \ge 0$ yields a condition on $\alpha$. Hence, since $f(x_0)$ is the global minimum of $f(x)$,
$$f(x) \ge 0, \quad x \ge 0 \wedge \alpha \in \left[0, \frac{e}{1+e}\right].$$
As a consequence of (41), we obtain a lower bound for the Lambert function.

Competing interests
The authors declare that they have no competing interests.