Design and measurement-based evaluations of coherent JT CoMP: a study of precoding, user grouping and resource allocation using predicted CSI

Coordinated multipoint (CoMP) transmission provides high theoretical gains in spectral efficiency with coherent joint transmission (JT) to multiple users. However, this requires accurate channel state information at the transmitter (CSIT) and user groups with spatially compatible users. The aim of this paper is to use measured channels to investigate whether significant CoMP gains can still be obtained with channel estimation errors. This turns out to be the case, but it requires the combination of several techniques. We focus on coherent downlink JT CoMP to multiple users within a cluster of cooperating base stations. The use of Kalman predictors to estimate the complex channel gains at the moment of transmission is investigated. It is shown that this can provide sufficient CSIT quality for JT CoMP even for long (>20 ms) system delays at 2.66 GHz at pedestrian velocities or, for lower delays, at 500 MHz at vehicular velocities. A user grouping and resource allocation scheme that provides appropriate groups for CoMP is also suggested. It provides performance close to that obtained by exhaustive search at very low complexity, low feedback cost and very low backhaul cost. Finally, a robust linear precoder that takes channel uncertainties into account when designing the precoding matrix is considered. We show that, in challenging scenarios, this provides large gains compared with zero-forcing precoding. Evaluations of these design elements are based on measured channels with realistic noise and intercluster interference assumptions. They show that high JT CoMP gains can be expected, on average over large sets of user positions, when the above techniques are combined, especially in scenarios that are severely limited by intracluster interference.


Introduction
Shadowed areas and interference at cell borders pose challenges for future wireless broadband systems. A potentially powerful remedy would be coordinated multipoint (CoMP) transmission, using remote radio heads or coordination between cellular base station sites. It can overcome interference limitations in cellular radio networks and also provide coverage gains. The first steps towards support for CoMP have recently been added to the 3GPP LTE standard in Release 11 [1].
CoMP techniques for downlink transmission are often categorized into two groups [2,3]. With joint transmission (JT), sometimes referred to as joint processing, user data is transmitted via several access points. The second group uses coordination for interference avoidance without sharing user data, using, e.g. joint scheduling (JS) and/or joint beamforming (JB) (see, e.g. [4]). The latter techniques are often considered to require less backhaul capacity and to be more robust to inaccurate channel state information at transmitters (CSIT). Joint transmission can provide higher potential gains in spectral efficiency at full load (see, e.g. [3,5]), by converting harmful interference power into useful signal power. For example, coherent JT CoMP was in [6] found to have the theoretical potential to multiply the spectral efficiency at 10% outage by a factor of 5 for terminals and base stations with single antennas. These gains are especially important for users at cell edges [7].
*Correspondence: rikke.apelfrojd@signal.uu.se, Signals and Systems, Uppsala University, Box 534, Uppsala 751 21, Sweden
However, much less spectacular results are provided by recent system level simulations. Evaluations of coherent JT CoMP within 3GPP have resulted in gains in average spectral efficiency of below 27% for homogeneous deployments using 4 × 2 MIMO transmission [8].
These large discrepancies raise questions that have motivated our research: What reduces the large potential gains of JT CoMP? Can large improvements be obtained for most users, or only for a small subset of users, e.g. those close to cell edges? What combinations of scheduling strategies and beamforming algorithms are efficient for realistic coordination topologies, propagation conditions and CSIT quality?
Answering such questions requires a joint study of multiple aspects of the problem and their interactions, in particular the assumed propagation environment, the cooperation architecture, the CSIT quality, physical layer techniques, scheduling and the grouping of users who participate in cooperation. We here investigate an important subset of these issues for downlinks of orthogonal frequency-division multiplexing (OFDM) systems, mainly considering frequency-division duplexing (FDD). One focus is the effect of imperfect CSIT due to mobility. To obtain results for realistic propagation conditions, we mainly use measured channels from channel sounding signals in an urban environment for 20-MHz OFDM downlinks. The measurements use simultaneous transmissions from three single antenna sites to a moving terminal. Large numbers of combinations of user positions are investigated and CSIT is obtained by Kalman channel predictors. These provide the best attainable quality of imperfect channel estimates.
Preliminary results obtained under these conditions were reported in [9]. A robust linear precoder performed joint coherent transmission from the three single antenna base stations to three single antenna terminals. These moved along randomly selected segments along the measured route at pedestrian velocities. The performance was here improved greatly for a minority of user sets by using JT CoMP, as compared to using conventional cellular transmission. However, the average spectral efficiency over all investigated sets of user positions was reduced. Such rather pessimistic results (obtained with imperfect CSIT) would be consistent with those recently reported in [8] that assumed perfect CSIT.
New results presented here are significantly more positive for the potential of JT CoMP: Large gains are obtained for a large majority of investigated user positions.

Contributions
We investigate and develop a transmit strategy for coherent JT CoMP by a step-by-step evaluation of its various components and interactions, leading to the following main conclusions and results.
First, one issue with CoMP is that significant coordination delays over backhaul links might eliminate the potential for CoMP gains. We show that channel prediction enables large average performance gains when using linear coherent joint transmission at pedestrian velocities for total delays of over 20 ms at 2.66 GHz. For lower delays, the same conclusion holds for higher-mobility users. CoMP would, e.g. remain possible at 500 MHz carrier frequencies for velocities up to 120 km/h, if the total delays are 5 ms.
Second, two parts of a JT CoMP design that are crucial for the average performance gains are the means for resource allocation over frequency-selective OFDM downlinks and the user grouping, i.e. the formation of groups of users who will share a particular time-frequency resource block.
We here introduce and evaluate a user grouping scheme with very low complexity, 'User groups provided by cellular scheduling'. This user grouping strategy is based on local scheduling in the base stations, and it can (but does not have to) utilize already existing scheduling algorithms. In many papers with two to three base stations and single-carrier transmission, the authors have intuitively used a user grouping scheme similar to this, often with all users placed at the same distance to their nearest base station site. However, to the best of our knowledge, this scheme has never been compared with other schemes, nor is its use usually motivated. At much lower complexity than, e.g. greedy user selection, this strategy provides spatially good (although not optimal) user groups that improve the sum rate performance when using linear precoding. It preserves multiuser diversity gains and also requires less feedback and less backhaul capacity than alternative strategies proposed previously. For systems with many users, the backhaul demand for transmission control can even be significantly lower than that for JS/JB CoMP. Using this scheme, JT CoMP can improve the sum capacity for essentially all investigated combinations of user positions. On average over random sets of user positions, the sum capacity is increased by up to 54% as compared to cellular transmission, with imperfect CSIT at full system load.
Third, a main mechanism behind the sometimes disappointing performance of JT CoMP is highlighted: the different distances between the transmitters and the different receivers will often generate hard-to-invert joint channel matrices. This results in precoders with large differences in the scaling of their elements. A joint linear precoding design under a per-antenna power constraint is then forced to reduce the transmit power of the base station closest to a user far below the allowed power to obtain a balanced solution. This effect reduces the total transmit power of a cluster of transmitters that participate in joint transmission, often with the result that out-of-cluster interference and noise reduce performance below that of single-cell transmission. The proposed user grouping strategy alleviates this problem.
http://jwcn.eurasipjournals.com/content/2014/1/100
Finally, since the CSIT is uncertain, robust techniques for joint precoder design are of interest. The robust linear precoder (RLP) design introduced in [9] is here investigated further and developed into a versatile tool for the design of linear joint precoders. Robust design is most easily performed for mean square error (MSE) criteria. The RLP is here designed to optimize more general criteria by using a low-dimensional iteration over weighting matrices in a closed-form robust precoder design. We provide sufficient conditions for the closed-form robust design to minimize a weighted sum of intracluster interference and transmit powers under imperfect CSIT, for known second-order moments of the statistical uncertainties. We also show that imperfect CSIT due to quantization can be straightforwardly included in the design.
We investigate under what conditions a robust JT design provides benefits by comparing to a simple zero-forcing (ZF) design. Also, we observe that the interplay between channel prediction errors, opportunistic scheduling and precoder design increases the multiuser scheduling gain when using CoMP, relative to single-cell transmission.
These results, taken together, in our opinion indicate that large performance gains are indeed possible by using linear JT CoMP techniques that can be designed with reasonable computational complexity.

Assumptions, design choices and related work
The potential for coherent JT CoMP was shown in [10] to be highest for low-mobility users, as compared to joint scheduling and to the use of noncoherent JT CoMP. We therefore here focus on coherent JT CoMP, also referred to as network multiple-input multiple-output (MIMO) or multi-cell MIMO (see, e.g. [5,6,11,12]), for low-mobility users.
Although the largest gains are achieved with nonlinear precoding techniques such as dirty paper coding [6], their complexity currently makes nonlinear precoding infeasible for most realistic systems. We here focus on a low-complexity linear precoding solution. Zero-forcing linear precoders [13] are a frequently studied alternative.
Coordination over a very wide area would provide the highest performance, but would be unrealistic due to computational complexity, delay constraints and capacity constraints in the fixed network. Therefore, we consider the use of CoMP within limited coordinated sets (clusters) of N transmitters distributed over N_B cells. In cellular transmission, the transmitters belonging to each cell are coordinated, but they are uncoordinated with the transmission in other cells. In CoMP that uses clustered joint transmission, the aim is to suppress the intracluster interference when jointly transmitting to M_g users. With perfect CSIT, the intracluster interference can then be eliminated by phase cancellation when N ≥ M_g.
The cluster size, i.e. the number of cooperating cells per cluster, involves a trade-off. A larger size ideally provides larger gains relative to cellular transmission, since a lower fraction of users is then located at cluster edges, but it introduces a higher computational burden. Investigations in [11,14] show that a cluster size above 7 to 9 cells will not provide large additional gains for systems with MIMO links. In [15], for few base station antennas, a cluster that used transmitters at three separate sites was adequate to attain most of the achievable CoMP gains (see also [16]). Our evaluations in Sections 6 and 7 focus on a cluster size of three sites, partially motivated by the results of [15] and partially due to the limitations of our measurements. An important aspect is to limit the remaining intercluster interference. An interesting scheme proposed in [14] and further evaluated in [17] uses cluster-specific antenna tilting and power control for this purpose. We have in our investigations adjusted the interference statistics to approximate those that would be generated by the scheme of [14].
Nearly accurate CSIT is important for multi-user MIMO [18] and for coherent JT CoMP [19]. We here evaluate schemes under the imperfect CSIT caused by the main unavoidable error sources: noisy estimates and CSIT that is outdated due to signaling delays. Users are assumed to move at pedestrian velocities at 2.66 GHz. This setting results in large channel estimation errors due to outdating when channel prediction is not used. It has previously not been clear whether the use of channel prediction helps CoMP performance in a significant way. Promising results based on simulations were reported in [19], using adaptive recursive least squares prediction. A preliminary simulation study in [20] investigated a two-user, two-cell scenario. The recent paper [21] investigated this question theoretically, in the limit of large numbers of antennas per base station, but did not use a per-base station transmit power constraint, so it is hard to draw conclusions from these results.
Channel predictors are here assumed to be located in the user terminals. They report the predictions to their strongest base station. The base stations then transmit the reports over a backhaul link to a central control unit (CU) for the cluster which jointly designs the beamformers.
Kalman prediction of MIMO OFDM channels, outlined in Section 3 and Appendix 1, has been investigated in, e.g. [22,23]. We here investigate its use in a CoMP setting, focusing on two requirements that are peculiar to this setting: (1) Transmit antennas located at different sites will be at different distances, while their channels, with differing signal-to-interference-and-noise ratios (SINRs), have to be estimated jointly. The weakest signals will in general be estimated with the lowest accuracy. The effects of this on the choice of pilots, the resulting precoder matrices and capacity performance need to be understood. (2) Channels may need to be predicted over long prediction horizons, due to the coordination delays.
Since significant model errors will be present, the precoder (the set of joint beamformers) should furthermore be designed to be robust with respect to (w.r.t.) the expected errors. Implementation without unrealistic computational complexity is here in focus, so we will restrict attention to linear precoders. We mainly use a versatile scheme with reasonable design complexity, the iteratively adjusted RLP introduced in [9] and further developed in Section 5 and in Appendix 2. This averaged robust design is used since it is less conservative than the minimax schemes in, e.g. [24,25]. A useful property of the RLP is that the channel uncertainty in the form of covariance matrices that are provided by Kalman predictors can be directly used in its adjustment.
In the optimization of a criterion such as the weighted sum capacity for the involved terminals, the RLP design utilizes the analytical solution to an MSE-optimal linear robust precoder and iteratively optimizes over criterion weights used by this design. This MSE-optimal analytical solution constitutes a special case of robust feedforward control filters for dynamic (frequency-selective) systems, previously developed in [26][27][28]. Robust linear precoders that minimize MSE by averaging over CSIT uncertainty have more recently been highlighted for multiple-input single-output (MISO) transmit schemes by [29,30] and for multiuser and MIMO downlinks in [24,31]. Very few solutions have been proposed for robust linear precoder design for more general performance criteria.
Many proposals form user groups for CoMP, as, e.g. [32,33], by first forming the user group and then allocating it to a transmission resource. This can provide groups with spatially compatible users, but may sacrifice some of the potential multiuser scheduling gain, since the frequency-domain variability of channels to users is not taken into account. Another approach is to use a greedy algorithm as in, e.g. [34][35][36] that assigns one user at a time to a given resource, forming a near-optimal solution both in terms of spatially compatible users and exploiting multiuser diversity. This, however, requires repeated pre-evaluation of beamformers, resulting in a high complexity. Greedy user grouping will in Section 7 be compared to the user grouping scheme we propose, but due to high complexity, we use a block-fading model rather than the whole measured channel statistics for this particular comparison.

Notations
In the following, Ē[·] averages over the distribution of channel model errors, E[·] averages over the statistics of noise and message symbols, ‖·‖ denotes the 2-norm of a vector, tr(·) is the trace of a matrix, and Re(·), (·)^T and (·)^* denote the real part, the transpose and the Hermitian transpose of a matrix, respectively. The unit matrix is denoted I. For simplicity, we shall enumerate the users such that users 1, …, M_g are in the selected user group for the subcarriers considered. The Kronecker delta function is denoted δ_ij. Unless otherwise explicitly stated, (·)_jn denotes element j, n and (·)_j denotes column j of a matrix or the jth element of a vector. The indices i and m are user indices, j and n are base station indices, t and τ are time indices and k and q are subcarrier indices. We shall denote the base station that, on average over all subcarriers and over the small-scale fading, has the strongest channel gain to a user as that user's master base station.

Channel model
We assume an OFDM downlink with K subcarriers, over which M single antenna users are served by a coordinated cluster of N transmitters controlled by N_B base stations, where each base station may control several transmit antennas. If M_g ≤ M users are selected to be served jointly on the kth subcarrier at OFDM symbol τ, then their received signals y_k(τ) ∈ C^{M_g×1}, after OFDM receiver processing, are

$$y_k(\tau) = H_k(\tau)\, u_k(\tau) + n_k(\tau). \tag{1}$$

Here, n_k(τ) ∈ C^{M_g×1} is the sum of noise and out-of-cluster interference (we will henceforth call it noise), modeled as independent and identically distributed (i.i.d.) white noise with zero mean and known variance, u_k(τ) ∈ C^{N×1} is the vector of transmitted signals and H_k(τ) ∈ C^{M_g×N} is the channel matrix, where H^k_ij(τ) is the complex channel gain from transmitter j to user i. The assumption that n_k(τ) can be modeled as i.i.d. white noise with known variance is a simplification. It is relatively reasonable in the downlink considered here, since the intercluster interference consists of contributions from many base stations, each of which transmits to many users. The resulting averaging of contributions would tend to stabilize the variance of n_k(τ) and make it predictable. (The assumption of a knowable noise variance would be more problematic in the uplink, where intercluster interference could be dominated by bursty transmission from a few user terminals.) Methods for noise floor estimation exist [37].
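The averaging argument for a stable interference variance can be illustrated with a toy Monte Carlo sketch. It assumes a hypothetical model, not derived from the measurements: each of `num_interferers` transmitters is independently active in a block with probability `activity_prob` and then contributes unit-power complex Gaussian symbols. The function reports the coefficient of variation (standard deviation over mean) of the per-block interference power. The function name and all parameters are illustrative only.

```python
import numpy as np

def interference_power_cv(num_interferers, num_blocks=2000, symbols_per_block=200,
                          activity_prob=0.5, seed=0):
    """Relative fluctuation of per-block interference power in a toy bursty-interferer model."""
    rng = np.random.default_rng(seed)
    # Number of interferers that happen to be active in each block
    n_active = rng.binomial(num_interferers, activity_prob, size=num_blocks)
    # A sum of n independent unit-power CN(0, 1) terms is CN(0, n), so draw it directly
    cn = (rng.standard_normal((num_blocks, symbols_per_block))
          + 1j * rng.standard_normal((num_blocks, symbols_per_block))) / np.sqrt(2)
    interference = np.sqrt(n_active)[:, None] * cn
    # Interference power measured over each block
    block_power = np.mean(np.abs(interference) ** 2, axis=1)
    return block_power.std() / block_power.mean()

for k in (1, 5, 40):
    print(k, round(interference_power_cv(k), 2))
```

In this toy model, a single bursty interferer makes the block power fluctuate by about 100%. With many interferers, the relative fluctuation shrinks roughly as one over the square root of their number.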
Time and frequency synchronization with respect to all N transmitters is assumed to be adequate, in the sense that any intersymbol and intercarrier interference can be modeled as part of the noise n_k(τ). It is also assumed that any frequency errors, which cause the elements of H_k(τ) to rotate over time, can be handled by the tracking ability of the (Kalman) channel estimation.

Channel predictions
For mobile users, the delays created by link adaptation and CoMP processing will cause the CSIT to be outdated. This can partially be compensated by using channel predictions. To investigate the effectiveness of channel prediction in a CoMP setting, we utilize Kalman predictors, which provide minimum mean square error (MMSE)-optimal predictions if the channel fading statistics are known. The predictions Ĥ_k(τ) are therefore unbiased: the prediction errors H̃_k(τ) = H_k(τ) − Ĥ_k(τ) satisfy Ē[H̃_k(τ)] = 0, i.e. Ē[H_k(τ)] = Ĥ_k(τ). Kalman prediction can be performed either in the time domain (for channel impulse response components) or in the frequency domain for the complex channel gains H^k_ij(τ). These provide comparable accuracy [22], and we have chosen the frequency domain approach.
We consider FDD system downlinks, so predictions are based on downlink measurements of known antenna-specific reference symbols (RS), or pilots. We assume that the RS have regular time and frequency spacings Δτ and Δf. The predictors are here assumed to be located in the user terminals. For every RS-bearing subcarrier, the ith terminal predicts its channels from several base stations within the cluster. Depending on the choice of user grouping strategy, described in Section 4, all M users that might potentially use a resource then report the full CSIT and/or some channel quality indicator (CQI), such as the SINR, to their master base station.

Short-term fading models
The Kalman predictor requires statistical models of the correlation properties of the channels over time and frequency to adjust the channel estimate according to the short-term fading. For this, we use autoregressive (AR) models of order n_a. The AR models at w RS-bearing subcarriers of the channels from the N transmitters to the M users can then be realized in state space form, in which the dynamics of each complex channel gain is modeled by n_a state variables. At user i,

$$x(t+1) = A\, x(t) + B\, e(t), \qquad h(t) = C\, x(t). \tag{3}$$

Here, the integer t represents time steps spaced by Δτ, x(t) ∈ C^{(w·n_a·N)×1} is the vector of state variables, e(t) ∈ C^{(w·N)×1} is the zero mean process noise with covariance matrix Q, and

$$h(t) = \left[H^{qw}_{i1}(t), \ldots, H^{qw}_{iN}(t),\; H^{qw+1}_{i1}(t), \ldots, H^{qw+w-1}_{iN}(t)\right]^{T} \in \mathbb{C}^{(w\cdot N)\times 1} \tag{4}$$

is the vector of channel components for Kalman predictor number q = 0, …, K_CRS/w − 1, where K_CRS is the number of RS-bearing subcarriers. Note that the superscript index qw, qw + 1, … in (4) represents a frequency spacing of Δf, while k in (1) represents a frequency spacing of Δf/n_CRS, where n_CRS is the RS spacing in number of subcarriers. The prediction accuracy can be improved by increasing the number w of subcarriers that are predicted jointly, since this averages out more of the noise. However, this comes at the cost of a higher computational complexity, which grows as O(w^3) [22].
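The following sketch shows one way such a state-space realization could be assembled. It rests on two assumptions made only for illustration: every complex gain follows the same AR(n_a) recursion in companion form, and the w·N gains are coupled only through the process noise covariance Q. The paper's A, B and C may be structured differently, and the helper names are hypothetical.

```python
import numpy as np

def ar_companion(a):
    """Companion form (A, B, C) for one complex gain h following an AR(n_a) recursion."""
    a = np.asarray(a, dtype=complex)
    n_a = a.size
    A = np.zeros((n_a, n_a), dtype=complex)
    A[0, :] = a                     # h(t+1) = a_1 h(t) + ... + a_{n_a} h(t-n_a+1) + e
    A[1:, :-1] = np.eye(n_a - 1)    # shift the older samples down one position
    B = np.zeros((n_a, 1), dtype=complex)
    B[0, 0] = 1.0                   # noise enters only the newest sample
    C = np.zeros((1, n_a), dtype=complex)
    C[0, 0] = 1.0                   # the channel gain is the newest sample
    return A, B, C

def stacked_model(a, w, N):
    """Block-diagonal model for w subcarriers x N transmitters sharing one AR model (assumption)."""
    A1, B1, C1 = ar_companion(a)
    I = np.eye(w * N)
    return np.kron(I, A1), np.kron(I, B1), np.kron(I, C1)

A, B, C = stacked_model([1.6, -0.8], w=2, N=3)
print(A.shape, B.shape, C.shape)    # (12, 12) (12, 6) (6, 12)
```

The dimensions match x(t) ∈ C^{(w·n_a·N)×1} and e(t) ∈ C^{(w·N)×1}. The example coefficients place both AR poles at magnitude √0.8, so the model is stable.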
The matrices A, B, C and the covariance matrix Q can be updated based on past channel estimates at an interval that is related to the time constant of the shadow fading (see [23] and chapter 4 of [22]).

Kalman predictor
Based on the AR fading models (3), each user is assumed to have a set of Kalman filters that provide filter estimates x̂(t|t) of the state vector in (3), and also the covariance matrices P(t|t) of the corresponding estimation errors x(t) − x̂(t|t). Please see Appendix 1 for further aspects of the filter design.
MMSE-optimal predictions of the states x(t) and of the channel component vector (4) can then be calculated from the filter estimates. The required prediction horizon is ϑΔτ, where ϑ ∈ ℕ. It corresponds to the delay of the entire transmission control loop, including channel predictions, feedback, scheduling, joint precoding and any additional delays. The vector of channel predictions ĥ(t+ϑ|t) for a horizon of ϑ RS intervals ahead at the ith user is obtained from the filter estimate x̂(t|t) by extrapolation in time: (3) is iterated ϑ steps and the future noise terms e(t), …, e(t+ϑ−1) are set to their average values of zero. This gives

$$\hat{x}(t+\vartheta|t) = A^{\vartheta}\, \hat{x}(t|t), \qquad \hat{h}(t+\vartheta|t) = C A^{\vartheta}\, \hat{x}(t|t).$$

The state prediction error covariance matrix is computed recursively, starting with the covariance matrix P(t|t) of the filter estimate:

$$P(t+s+1|t) = A\, P(t+s|t)\, A^{*} + B\, Q\, B^{*}, \qquad s = 0, \ldots, \vartheta-1.$$

Covariances of the prediction error h̃(t+ϑ) = h(t+ϑ) − ĥ(t+ϑ|t) of the channels to one user can then be described by the matrix

$$\bar{E}\left[\tilde{h}(t+\vartheta)\, \tilde{h}^{*}(t+\vartheta)\right] = C\, P(t+\vartheta|t)\, C^{*}. \tag{7}$$

As mentioned above, there is a trade-off in the choice of the number w of subcarriers estimated by each Kalman filter. We here keep this parameter low and, in a second step, reduce prediction errors further by Wiener smoothing over the estimates for all subcarriers. The true prediction error covariances then differ from those of (7) due to two effects. First, the AR models (3) are imperfect, which increases the errors. Second, Wiener smoothing over frequency decreases the errors. In our studies, these two effects leave the variance of the prediction error slightly below that given by (7). Using the accurate covariance instead of (7) would make only a minor difference in precoder performance, noticeable only for systems with very low noise power. We shall therefore use (7) in the precoder design in Section 5.
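Given x̂(t|t) and P(t|t) from the Kalman filter, the extrapolation step can be sketched as follows. The sketch assumes the state-space form x(t+1) = A x(t) + B e(t), h(t) = C x(t) with Cov(e) = Q. `predict_ahead` is a hypothetical helper, and the filter update itself (Appendix 1) is not shown.

```python
import numpy as np

def predict_ahead(A, B, C, Q, x_filt, P_filt, horizon):
    """Horizon-step channel prediction and its error covariance from x(t|t), P(t|t)."""
    x_pred = x_filt.copy()
    P_pred = P_filt.copy()
    BQB = B @ Q @ B.conj().T
    for _ in range(horizon):
        x_pred = A @ x_pred                         # future process noise replaced by its mean, zero
        P_pred = A @ P_pred @ A.conj().T + BQB      # error covariance grows at each step
    h_pred = C @ x_pred                             # predicted channel components
    err_cov = C @ P_pred @ C.conj().T               # covariance of the channel prediction error
    return h_pred, err_cov

# Scalar AR(1) example: a = 0.95, Q = 0.1, perfectly known current state
A1, B1, C1, Q1 = (np.array([[0.95]]), np.array([[1.0]]),
                  np.array([[1.0]]), np.array([[0.1]]))
for horizon in (1, 5, 20):
    _, err_cov = predict_ahead(A1, B1, C1, Q1, np.array([1.0]), np.zeros((1, 1)), horizon)
    print(horizon, float(err_cov[0, 0].real))
```

In this scalar example, the error variance approaches the stationary channel variance Q/(1 − a²) as the horizon grows. At that point the prediction carries no information beyond the prior.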

UE allocation and scheduling
Appropriate user grouping is important if CoMP is to improve the rates for all participating users. Out of M users, M_g ≤ N users will be selected for JT within a resource block. In [9], a preliminary investigation was performed where groups of three users were formed by random placement along a route for which measured channels from three sites were available. Figure 1 illustrates the received powers from the three sites along the measurement route. It then became evident that single-cell (SC) transmission in many situations outperformed coherent JT CoMP, since JT might help some users but not all users within the group simultaneously.
A subsequent analysis showed that for most of the CoMP groups that led to SC transmission outperforming CoMP, all three users had poor channels to the same base station. This led to a poorly conditioned channel matrix H, which forced the precoder design to reduce the total transmit power to fulfill a per-base station power constraint. This reduced the SNR as compared to SC transmission.
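The power-reduction mechanism can be reproduced with a small numerical sketch. The sketch makes several assumptions:

- perfect CSIT and a square channel matrix (M_g = N = 3);
- a zero-forcing precoder R = H^{-1}D, not the robust precoder of Section 5;
- targets D_ii = max_j |H_ij|;
- a common scaling u = (1/c)Rs, chosen so that the most heavily loaded antenna transmits at its limit.

The channel amplitudes are invented for illustration and are not measured values.

```python
import numpy as np

def zf_power_usage(H, p_max=1.0):
    """Fraction of the cluster's total power budget N*p_max used by a scaled ZF precoder."""
    D = np.diag(np.max(np.abs(H), axis=1))          # target: strongest channel amplitude per user
    R = np.linalg.solve(H, D)                       # R = H^{-1} D, so that H R = D
    per_antenna = np.sum(np.abs(R) ** 2, axis=1)    # (R R^*)_jj: antenna j's power before scaling
    c2 = np.max(per_antenna) / p_max                # c^2: the most loaded antenna hits p_max
    return float(np.sum(per_antenna / c2) / (p_max * H.shape[1]))

# Each user has a different strongest base station (diagonally dominant H).
H_good = np.array([[1.0, 0.1, 0.1],
                   [0.1, 1.0, 0.1],
                   [0.1, 0.1, 1.0]])
# All three users have weak channels to base station 3 (third column).
H_bad = np.array([[1.0, 0.3, 0.05],
                  [0.3, 1.0, 0.05],
                  [0.6, 0.5, 0.10]])

for name, H in (("good", H_good), ("bad", H_bad)):
    print(name, round(np.linalg.cond(H), 1), round(zf_power_usage(H), 2))
```

In the second case, base station 3 needs large precoder weights to separate the users. It therefore reaches its limit while base stations 1 and 2 transmit far below theirs. In this toy example the cluster uses roughly a third of its power budget, compared with all of it in the diagonally dominant case.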
To solve this problem, we here propose to perform scheduling decisions locally at each base station and will show that this automatically creates good (although not optimal) CoMP groups. This scheme has the benefits that it has very low complexity and would be easy to implement in existing systems. It can furthermore utilize already existing scheduling algorithms. It generates no extra control signaling backhaul load since all decisions can be made locally at every base station. The proposed solution will in Section 7 be compared to the use of random user groups, to a Greedy user grouping (GUG) algorithm described below and to the optimal solution.

User groups provided by cellular scheduling (CUG)
This is our main proposed strategy to create diagonally dominant channel matrices that are relatively easy to invert in the CoMP precoder design. We first present this scheme, denoted cellular user grouping (CUG), for single antenna base stations. All users with the same master base station are then locally scheduled on orthogonal subcarriers by a scheduler connected to their master base station, as shown in the example in Figure 2. This scheduling is based on a CQI metric. For the schedulers explored in this paper, the CQI for user i at resource block b, CQI_b,i, is given by the average of the estimated channel gains from all antennas at that user's master base station.
On each resource block, the scheduled M_g ≤ N users within the cooperation cluster (with equality if each base station is the master base station of at least one user) then form a CoMP group. These users, which all belong to different cells, are served jointly by all base stations in the cluster, including base stations that are not the master base station of any of these users. The full CSIT used in the precoder design then only needs to be fed back, and transmitted over the backhaul, by the users that have been scheduled, and only for their scheduled resources. Two-step feedback approaches such as this have been investigated in [39] for multiuser MIMO and in [40] for CoMP.
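The grouping step of CUG amounts to reading off, for each resource block, which user each base station has scheduled locally. The following minimal sketch assumes a hypothetical data layout: `local_schedule[j][b]` holds the index of the user scheduled by base station j on block b, or `None` if that cell leaves the block unused. The master base station is taken as the one with the largest average channel gain, following the definition in the notation section.

```python
import numpy as np

def master_bs(avg_gain):
    """Index of each user's master base station (rows: users, columns: base stations)."""
    return np.argmax(avg_gain, axis=1)

def cug_groups(local_schedule):
    """CoMP group per resource block: the users scheduled locally by each base station."""
    num_blocks = len(local_schedule[0])
    groups = []
    for b in range(num_blocks):
        # At most one user per single-antenna base station, all from different cells;
        # only these users need to feed back full CSIT for block b.
        groups.append([sched[b] for sched in local_schedule if sched[b] is not None])
    return groups

avg_gain = np.array([[0.9, 0.2, 0.1],    # user 0 -> base station 0
                     [0.1, 0.8, 0.3],    # user 1 -> base station 1
                     [0.2, 0.1, 0.7]])   # user 2 -> base station 2
print(master_bs(avg_gain))
# Hypothetical local schedules: three base stations, two resource blocks
local_schedule = [[0, 3], [1, 1], [2, None]]
print(cug_groups(local_schedule))
```

No joint search over user combinations is needed. The groups follow directly from the per-cell decisions, which is why CUG adds essentially no complexity beyond local scheduling.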
The score-based (SB) scheduler proposed in [41] will be used in evaluations. It is a fair scheduler in the sense that all users belonging to the same master base station are given approximately the same amount of resources. For each user, a score is computed for each resource block that indicates the ranking of its CQI relative to those of the other resource blocks. Assuming scheduling over b = 1, …, B resource blocks, block l will for user i have the score

$$S_{l,i} = \sum_{b=1}^{B} \left(\mathrm{CQI}_{l,i} > \mathrm{CQI}_{b,i}\right),$$

where > denotes a logical comparison resulting in 1 if true and 0 otherwise. The user with the highest score will be allocated to resource block l. The use of score-based scheduling to create the user grouping will be denoted SB-CUG.
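Within one cell, the score-based rule can be sketched with array comparisons. The sketch assumes that `cqi[i, b]` holds CQI_b,i for the users sharing one master base station. It breaks ties between equal scores in favour of the lowest user index, a choice the description above does not specify.

```python
import numpy as np

def sb_schedule(cqi):
    """Score-based scheduling of B resource blocks among the users of one cell.

    scores[i, l] = number of blocks b with cqi[i, l] > cqi[i, b].
    """
    scores = (cqi[:, :, None] > cqi[:, None, :]).sum(axis=2)
    allocation = np.argmax(scores, axis=0)   # per block: user with the highest score
    return allocation, scores

cqi = np.array([[5.0, 1.0, 3.0, 2.0],
                [10.0, 9.0, 8.0, 20.0]])
allocation, scores = sb_schedule(cqi)
print(scores)
print(allocation)
```

User 1 has the higher CQI on every block, yet each user receives two of the four blocks, namely those that are relatively best for it. A max-CQI rule would instead give all blocks to user 1.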
A second scheduler to be used is a close-to-optimal sum rate scheduler that, for every frequency resource, chooses the user with the highest estimated rate. This rate estimate is the rate a user would have in a cellular system in which no other users within the cluster are served on the same resource, i.e. with the master base station of user i transmitting at its maximal power P_{j_mast:i,max}, where P_{j_mast:i,max} is the power constraint for the antennas of the master base station of user i. This scheduler is denoted best rate CUG (BR-CUG). The use of this metric to compare attainable rates presupposes that a well-functioning CoMP scheme will suppress intracluster interference. For multiantenna base stations with N_A antennas, cellular scheduling proceeds similarly but may allocate up to N_A users per frequency resource and base station, using cell-specific beamforming.
Figure 2 (caption, partially recovered): … (table to the right). Users UE1, UE2 and UE3 will then be served jointly on subcarrier 1, users UE2, UE4 and UE5 will be served jointly on subcarrier 2, and so on.
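A sketch of the best-rate rule within one cell follows. The exact rate expression is not reproduced here, so the code assumes a hypothetical interference-free rate log2(1 + P_max·g/σ²). In that expression, g is the estimated channel power gain from the master base station and σ² is the noise variance. The function name and data layout are illustrative.

```python
import numpy as np

def br_cug_schedule(gain, p_max, noise_var):
    """Best-rate scheduling within one cell.

    gain[i, b]: estimated power gain from the master base station to user i on block b.
    Assumed rate: log2(1 + p_max * gain / noise_var), i.e. no intracluster interference.
    """
    rate = np.log2(1.0 + p_max * gain / noise_var)
    allocation = np.argmax(rate, axis=0)    # per block: user with the highest estimated rate
    return allocation, rate

gain = np.array([[5.0, 1.0, 3.0, 2.0],
                 [10.0, 9.0, 8.0, 20.0]])
allocation, rate = br_cug_schedule(gain, p_max=1.0, noise_var=0.5)
print(allocation)
```

All users in the cell share the master base station's power constraint and, in this sketch, the same noise level. The rule therefore reduces to picking the strongest user on each block. Here user 1, which is stronger on every block, receives all four blocks.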

Greedy user grouping (GUG)
Here, for every frequency resource used by the CU, an algorithm first searches for the user that, given a specific criterion, has the most to gain from entering the group. It then searches among the remaining users for the user that would provide the largest increase of the criterion value and adds that user to the group. It continues until none of the remaining users can increase the criterion value or until M_g = N. We here use the specific criterion function

$$f = \sum_{i=1}^{M_g} \alpha_i \log_2\left(1 + \frac{P_{S,i}}{P_{I,i} + P_{N,i}}\right).$$

Here, P_S,i, P_I,i and P_N,i are the signal power, the interference power and the scalar noise power at receiver antenna i = 1, …, M_g, respectively. The calculation of the expected values of these powers based on the prediction error statistics is discussed in Appendix 3. If α_i = 1 for all i, the sum rate is maximized; we denote this GUG with best rate (GUG-BR). If instead α_i = 1/r̄_i, with r̄_i being the average throughput of user i over already scheduled resources, we get a proportional fair scheduler [42], which will be denoted GUG with proportional fair scheduling (GUG-PF).
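The greedy procedure can be sketched as below. To keep the sketch self-contained, it evaluates the criterion with a zero-forcing precoder under perfect CSIT, with targets D_ii = max_j |H_ij| and per-antenna scaling. It does not use the robust precoder or the expected powers of Appendix 3. `group_utility` and `greedy_grouping` are hypothetical names. Setting all weights `alpha` to one corresponds to GUG-BR; `alpha = 1 / r_bar` would correspond to GUG-PF.

```python
import numpy as np

def group_utility(H, group, alpha, p_max=1.0, noise_var=0.1):
    """Criterion sum_i alpha_i log2(1 + P_S,i / (P_I,i + P_N,i)) for one candidate group.

    Sketch assumptions: perfect CSIT, ZF precoder R = pinv(H_g) D with D_ii = max_j |H_ij|,
    and a common scaling so that the most loaded antenna transmits at p_max.
    """
    Hg = H[group, :]
    D = np.diag(np.max(np.abs(Hg), axis=1))
    R = np.linalg.pinv(Hg) @ D
    c = np.sqrt(np.max(np.sum(np.abs(R) ** 2, axis=1)) / p_max)
    G = Hg @ R / c                                  # effective channel from s to the receivers
    p_s = np.abs(np.diag(G)) ** 2                   # desired signal powers
    p_i = np.sum(np.abs(G) ** 2, axis=1) - p_s      # residual intracluster interference
    return float(np.sum(alpha[group] * np.log2(1.0 + p_s / (p_i + noise_var))))

def greedy_grouping(H, alpha, p_max=1.0, noise_var=0.1):
    """Add users one at a time while the criterion increases, up to N users."""
    M, N = H.shape
    group, best = [], 0.0
    while len(group) < N:
        candidates = [m for m in range(M) if m not in group]
        if not candidates:
            break
        values = [group_utility(H, group + [m], alpha, p_max, noise_var) for m in candidates]
        k = int(np.argmax(values))
        if values[k] <= best:
            break                                   # no remaining user increases the criterion
        group.append(candidates[k])
        best = values[k]
    return group, best

H = np.array([[1.0, 0.1],
              [0.9, 0.2],
              [0.1, 1.0],
              [0.5, 0.5]])                          # 4 candidate users, N = 2 transmitters
print(greedy_grouping(H, alpha=np.ones(4)))
```

Each call to `group_utility` designs a precoder for one candidate group. A single resource block therefore requires on the order of M·N precoder evaluations, which illustrates the complexity cost of GUG.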
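GUG should provide better system performance than CUG, which generates its user grouping without explicitly taking the resulting performance into account. However, this comes at several costs. The repeated pre-evaluation of beamformers for candidate groups results in a high computational complexity. In addition, the CU needs CSIT from all users that might use a resource, not only from the scheduled ones, which increases the feedback and backhaul load.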

Precoding
A CU for the cluster is assumed to have full information of the reported predicted channels and of the covariances of the prediction and quantization errors of the scheduled users. It designs precoding matrices R ∈ C^{N×M_g} for all utilized time-frequency resource blocks. The blocks consist of adjacent OFDM symbols and subcarriers, with at least one resource slot dedicated to a reference symbol. All transmitted symbols within such a resource block will normally be exposed to channels that are close to identical to the channel at the RS position and can therefore use the same precoder. In the following, time and subcarrier indices within a block are omitted: H_ij ≜ H^k_ij(t), Ĥ_ij ≜ Ĥ^k_ij(t+ϑ), n ≜ n_k(t), u ≜ u_k(t) and y ≜ y_k(t).
On each subcarrier and for each OFDM symbol within the resource block, the transmitted signal vector $u \in \mathbb{C}^{N \times 1}$ is generated by a linear precoder

$$ u = \frac{1}{c} R s, \qquad (10) $$

where $c$ is a scalar scaling factor and $s \in \mathbb{C}^{M_g \times 1}$ is the message symbol vector, assumed to be white with zero mean and covariance matrix $I$, and uncorrelated with the noise $n$. We assume that per-antenna transmit power constraints $P_{j,\max}$ apply to each subcarrier individually. The scaling factor $c$ in (10) is selected to ensure that the transmit powers at the $N$ antennas satisfy

$$ E\left[|u_j|^2\right] \le P_{j,\max}, \quad j = 1, \dots, N, \qquad (11) $$

where $u_j$ is the $j$th element of the transmit vector $u$. (A reasonable modification would be to impose a sum power constraint over all subcarriers. With a sum rate criterion, this would lead to a water-filling power allocation as described in [17], which slightly increases the sum rate.)
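With white, unit-power symbols in (10), the expected power at antenna $j$ is the squared norm of row $j$ of $R$ divided by $c^2$, so the smallest $c$ that satisfies (11) follows directly from the most heavily loaded antenna. A minimal sketch (function names and the example precoder are hypothetical):

```python
import numpy as np


def power_scaling_factor(R, p_max):
    """Smallest c with ||row_j(R)||^2 / c^2 <= P_j,max for every antenna j.

    For u = R s / c with E[s s^*] = I, the expected power at antenna j is
    the squared norm of row j of R divided by c^2, as required by (11).
    """
    row_power = np.sum(np.abs(R) ** 2, axis=1)
    return float(np.sqrt(np.max(row_power / np.asarray(p_max, dtype=float))))


def antenna_powers(R, c):
    """Expected transmit power per antenna for u = R s / c with white s."""
    return np.sum(np.abs(R) ** 2, axis=1) / c ** 2


# Hypothetical 3-antenna, 2-user precoder with unit per-antenna limits.
R = np.array([[1.0, 2.0j], [0.5, 0.5], [0.0, 1.0]])
c = power_scaling_factor(R, p_max=[1.0, 1.0, 1.0])
powers = antenna_powers(R, c)   # antenna 1 at its limit, the others below it
```

Only the most loaded antenna transmits at its limit; the others are left below it, which is the mechanism behind the power normalization loss discussed for ZF.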

Target system
The system model used for precoder design is shown in Figure 3. Here, $u \in \mathbb{C}^{N \times 1}$ is the transmit signal vector, and $z = \frac{1}{c} D s \in \mathbb{C}^{M_g \times 1}$ is the desired received vector. Its desired properties are modeled by a diagonal target matrix $D$, representing the ideal of complete interference suppression. In a generalization to multiple receiver antennas, $D$ would be block-diagonal. The distances between terminals and transmitters differ substantially in a CoMP setting. It would therefore be unrealistic to demand equal received power at all users by setting $D = I$. Instead, the targeted received signal magnitudes (the diagonal elements of $D$) should be set to realistically attainable levels. This can be done in different ways. We here set the targeted received signal magnitude of each user to the amplitude of its strongest channel:

$$ D = \operatorname{diag}(d_1, \dots, d_{M_g}), \qquad d_i = \max_{j} |\hat{H}_{ij}|. \qquad (12) $$

This is a very simple way of choosing $D$. For channel matrices with a dominant diagonal, which often occur, e.g. if all users in a CoMP group have different master base stations, (12) provides a sum rate close to that obtained with an optimized $D$. Alternatively, in [43] all users are given the same fraction of the transmit power in combination with zero-forcing precoding, which corresponds to another strategy for adjusting the diagonal elements of $D$. We have investigated both that alternative and numerical optimization of $D$ with respect to the sum rate, and found only small differences in the end result compared with the use of (12). (However, the use of $D = I$, which is common in zero-forcing precoders for single-cell multiuser MIMO, would cause a large loss in system performance in CoMP settings.)
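A minimal sketch of the target choice (12), assuming it is applied to the predicted channel matrix available at the CU (rows are users, columns are base stations). The function name and channel values are hypothetical:

```python
import numpy as np


def target_matrix(H_hat):
    """Diagonal target matrix D with d_i = max_j |H_hat[i, j]|, as in (12)."""
    return np.diag(np.abs(H_hat).max(axis=1))


# Hypothetical predicted channel: rows are users, columns are base stations.
H_hat = np.array([
    [1.0 + 0.0j, 0.1, 0.05],
    [0.2j, 0.03, 0.01],      # user 2 is weak towards all three sites
    [0.01, 0.05, 0.6],
])
D = target_matrix(H_hat)
print(np.diag(D))  # targeted received amplitudes differ widely between users
```

Setting $D = I$ instead would ask the weak second user for the same received amplitude as the strong first user, which the per-antenna constraints can only meet by scaling down every stream.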

Robust linear precoder (RLP)
The RLP scheme uses as its basic element the closed-form solution to a robust linear quadratic (LQ) optimal feedforward control problem presented in [26,27]. It minimizes general robust performance criteria by iterating over elements in the penalty matrices of the robust LQ design. The robust LQ design generates a precoder matrix $R$ that minimizes a scalar criterion $J$. In our case, the criterion includes a weighted difference between the target and the noise-free received signals, $\varepsilon = \frac{1}{c}(HR - D)s$ (describing the remaining intracluster interference), and a weighted transmit power term. These terms are averaged over all uncertainties and over the transmit symbol statistics:

$$ J = \bar{E}\, E_s\left[ \varepsilon^* V^* V \varepsilon + u^* S^* S u \right]. \qquad (13) $$

Here, $V$ is a diagonal positive definite matrix and $S$ is a positive semidefinite matrix, both real-valued. The use of these weighting matrices in the design is discussed in Sections 5.2.1 to 5.2.3 below.

Theorem 1. For a transmission system (1), model (2) and linear precoder (10), assume that $\bar{E}[\tilde{H}] = \bar{E}[\tilde{H}_{\text{quant}}] = 0$, that $\bar{E}[\tilde{H}^* V^* V \tilde{H}_{\text{quant}}] = 0$, that $S \in \mathbb{R}^{N \times N}$ has full rank and that $s$ in (10) is white. Then, the precoding matrix $R$ minimizing $J$ in (13) exists and is uniquely given by

$$ R_{\text{RLP}} = \left( \hat{H}^* V^* V \hat{H} + S^* S + \bar{E}[\tilde{H}^* V^* V \tilde{H}] + \bar{E}[\tilde{H}_{\text{quant}}^* V^* V \tilde{H}_{\text{quant}}] \right)^{-1} \hat{H}^* V^* V D. \qquad (14) $$

For a proof, see Appendix 4.
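After obtaining the precoder matrix $R_{\text{RLP}}$ by (14), the scale factor $c$ is adjusted to fulfill the transmit power constraint (11). This scales the criterion (13) but does not affect the minimizing precoder matrix.
http://jwcn.eurasipjournals.com/content/2014/1/100
The third and fourth terms in the inverse in (14) can be evaluated from the channel error statistics:

$$ \left[\bar{E}(\tilde{H}^* V^* V \tilde{H})\right]_{jn} = \operatorname{tr}\left( V^* V\, \bar{E}[\tilde{H}_n \tilde{H}_j^*] \right). \qquad (15) $$

Here, $\tilde{H}_n$ refers to column $n$ of either the prediction error $\tilde{H}$ (for the third term) or the quantization error $\tilde{H}_{\text{quant}}$ (for the fourth term). For prediction errors, $\bar{E}[\tilde{H}_n \tilde{H}_j^*]$ is obtained from the covariance matrices $CP(t+\vartheta|t)C^*$ provided by the Kalman predictors of each of the $M_g$ users. Since the terminals are assumed to predict their channels independently,

$$ \bar{E}[\tilde{H}_n \tilde{H}_j^*] = \operatorname{diag}\left( \bar{E}[\tilde{H}_{1n} \tilde{H}_{1j}^*], \dots, \bar{E}[\tilde{H}_{M_g n} \tilde{H}_{M_g j}^*] \right), \qquad (16) $$

where the $i$th diagonal element is read from $(CP(t+\vartheta|t)C^*)_k$ of user $i$. Here, $(\cdot)_k$ denotes the submatrix of $CP(t+\vartheta|t)C^*$ from (3), (6) and (7) for the relevant subcarrier $k$.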
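The closed form (14), together with the prediction error term of (15)-(16), can be sketched in NumPy as below. Assumptions: each user's $N \times N$ prediction error covariance is passed in directly (in the paper it is read from the Kalman predictor's $CP(t+\vartheta|t)C^*$), errors of different users are independent, $V$ is diagonal, and the function names and numbers are hypothetical.

```python
import numpy as np


def prediction_error_term(V, covs):
    """E[Ht^* V^* V Ht] for independent users, cf. (15)-(16).

    covs[i][a, b] = E[Ht_ia conj(Ht_ib)] is user i's N x N prediction error
    covariance (a hypothetical input here). Element (j, n) then equals
    sum_i |v_i|^2 covs[i][n, j], i.e. a weighted sum of transposed covariances.
    """
    v2 = np.abs(np.diag(V)) ** 2
    return sum(v2[i] * covs[i].T for i in range(len(covs)))


def rlp_precoder(H_hat, D, V, S, E_pred, E_quant=None):
    """Closed-form robust linear precoder of (14)."""
    N = H_hat.shape[1]
    if E_quant is None:
        E_quant = np.zeros((N, N))
    VV = V.conj().T @ V
    A = H_hat.conj().T @ VV @ H_hat + S.conj().T @ S + E_pred + E_quant
    return np.linalg.solve(A, H_hat.conj().T @ VV @ D)


# Usage: three users, three sites, equal error covariances (hypothetical numbers).
rng = np.random.default_rng(0)
H_hat = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
D = np.diag(np.abs(H_hat).max(axis=1))
V, S = np.eye(3), 0.1 * np.eye(3)
covs = [0.05 * np.eye(3) for _ in range(3)]
R = rlp_precoder(H_hat, D, V, S, prediction_error_term(V, covs))
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse in (14) explicitly; the scaling factor $c$ is applied afterwards, as described above.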
Element $(j, n)$ of the fourth term, describing the quantization error covariance of the reported predictions, is by (15) determined by $\bar{E}[\tilde{H}_{\text{quant},n} \tilde{H}_{\text{quant},j}^*]$. This matrix will be diagonal if all channel components are quantized independently. The design works for any CSI quantization and feedback scheme, as long as the errors it introduces can be modeled and quantified. For example, assuming individual linear quantization with a properly set maximum power, the diagonal elements of this matrix are given by $\delta_{\text{step}}^2/12$, where $\delta_{\text{step}}$ is the step size of the quantizer, which may be adjusted individually for each channel component. If the quantization granularity (step size) is controlled individually by the standard deviation of the prediction error, then the quantization error term in (2) can be kept small relative to the prediction error term in an efficient way. The quantization errors would then have a negligible impact on the performance metric.
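The $\delta_{\text{step}}^2/12$ figure is the variance of a uniformly distributed quantization error. A quick numerical check, assuming a mid-rise uniform quantizer whose range is wide enough that clipping never occurs, applied to one real-valued component (the step size and function name are hypothetical):

```python
import numpy as np


def uniform_quantize(x, step):
    """Mid-rise uniform quantizer without clipping (range assumed wide enough)."""
    return step * (np.floor(x / step) + 0.5)


rng = np.random.default_rng(0)
step = 0.05
x = rng.standard_normal(1_000_000)          # one real channel component
err = uniform_quantize(x, step) - x         # error lies in (-step/2, step/2]
print(err.var(), step ** 2 / 12)            # both close to 2.08e-4
```

If the real and imaginary parts of a complex channel component are quantized separately with the same step, their error variances add.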
As a comparison to the RLP, we have also investigated the zero-forcing (ZF) precoder with gain control. When $M_g \le N$, the minimum-norm pseudo-inverse generates the ZF precoder matrix

$$ R_{\text{ZF}} = \hat{H}^* (\hat{H} \hat{H}^*)^{-1} D \qquad (17) $$

to be used in (10). (When $M_g < N$, other generalized inverses exist that provide better performance than (17) under per-antenna power constraints; see [44].) The ZF solution is commonly used and is simple to compute, but model errors are not taken into account. Furthermore, ill-conditioned matrices $\hat{H}$ generate precoders $R_{\text{ZF}}$ with large elements. This results in the use of a large scaling factor $c$ in (10) to fulfill the power constraint (11). The resulting reduction of transmit power decreases the SNR. This is referred to as the power normalization loss problem.

Three ways of using the weighting matrices $V$ and $S$ in (13) are outlined below.
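The power normalization loss can be made concrete with a small sketch of (17) combined with the per-antenna scaling of (11): a well-conditioned and an ill-conditioned toy channel (both hypothetical, real-valued for readability) are precoded with ZF, and the received desired power after scaling is compared. Function names are hypothetical.

```python
import numpy as np


def zf_precoder(H_hat, D):
    """Minimum-norm pseudo-inverse ZF precoder R = H^* (H H^*)^{-1} D, cf. (17)."""
    Hh = H_hat.conj().T
    return Hh @ np.linalg.solve(H_hat @ Hh, D)


def desired_power_after_scaling(H_hat, p_max=1.0):
    """Received desired power |d_i|^2 / c^2 per user after per-antenna scaling."""
    D = np.diag(np.abs(H_hat).max(axis=1))           # target matrix as in (12)
    R = zf_precoder(H_hat, D)
    c2 = np.max(np.sum(np.abs(R) ** 2, axis=1)) / p_max
    return np.diag(D) ** 2 / c2


# Hypothetical 3 x 3 channels: well-conditioned vs. two nearly collinear users.
well = np.array([[1.0, 0.1, 0.1], [0.1, 1.0, 0.1], [0.1, 0.1, 1.0]])
ill = np.array([[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]])
print(desired_power_after_scaling(well))   # close to the unscaled targets
print(desired_power_after_scaling(ill))    # much smaller: power normalization loss
```

Interference is nulled in both cases, but for the ill-conditioned group the large precoder entries force a large $c$, so every user's received signal, and hence its SNR, drops.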

Minimizing intracluster interference
Consider $V = I$ and $S = \epsilon I$ in (13), i.e. a very small real-valued regularization term $S^* S = \epsilon^2 I$ in (14), with $\epsilon \neq 0$ to preserve full rank in the inverse. Then, the transmit powers are hardly penalized and the errors at all receivers are considered equally important. This setup minimizes the sum of the intracluster interference powers. It is related to ZF but takes the channel uncertainty into account. Note that when $M_g = N$, $\hat{H}^{-1}$ exists, $V = I$, $\epsilon \to 0$ and $\tilde{H} = \tilde{H}_{\text{quant}} = 0$, then (14) and (17) reduce to the same solution, $R = \hat{H}^{-1} D$.
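The limiting case can be checked numerically. A minimal sketch, assuming a hypothetical diagonally dominant $3 \times 3$ predicted channel, zero error terms, $V = I$ and $S = \epsilon I$: as $\epsilon$ shrinks, the solution of (14) approaches $\hat{H}^{-1} D$, with a gap that shrinks roughly as $\epsilon^2$.

```python
import numpy as np


def rlp_without_errors(H_hat, D, eps):
    """(14) with V = I, S = eps * I and zero prediction/quantization terms."""
    Hh = H_hat.conj().T
    return np.linalg.solve(Hh @ H_hat + eps ** 2 * np.eye(H_hat.shape[1]), Hh @ D)


# Hypothetical, diagonally dominant 3 x 3 predicted channel (M_g = N = 3).
H_hat = np.array([[1.0, 0.3j, 0.1], [0.2, 0.8, 0.1j], [0.05, 0.1, 0.9]])
D = np.diag(np.abs(H_hat).max(axis=1))
R_zf = np.linalg.solve(H_hat, D)          # H_hat^{-1} D

gaps = []
for eps in (1e-1, 1e-2, 1e-3):
    gaps.append(np.linalg.norm(rlp_without_errors(H_hat, D, eps) - R_zf))
    print(f"eps = {eps:g}: ||R_RLP - R_ZF|| = {gaps[-1]:.1e}")
```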

Optimization w.r.t. an arbitrary criterion
The robust MSE solution of Theorem 1 can be used as a tool for adjusting the precoder matrix $R$ with respect to a general criterion

$$ J_{\text{gen}} = f\left( \bar{E}[P_{S,i}], \bar{E}[P_{I,i}], P_{N,i};\ i = 1, \dots, M_g \right). \qquad (18) $$

Here, $P_{S,i}$, $P_{I,i}$ and $P_{N,i}$ are the signal power, the interference power and the scalar noise power, respectively, at receiver antenna $i = 1, \dots, M_g$. The calculation of the expected powers from the prediction error statistics is discussed in Appendix 3.
Diagonal penalty matrices V and S in (13) provide significant flexibility, and optimization of their elements w.r.t. (18) provides a flexible tool for adjusting the precoder matrix by a low-dimensional numerical search. Here, the elements of V mainly affect the weighting and fairness between users, while the elements of S affect the power balance between transmit antennas.
One particular case is when (18) is set to approximate an unweighted sum rate criterion. Then, the use of a fixed $V = I$ is appropriate. The use of $S = \epsilon I$, with $\epsilon$ being a very small scalar, would then approximately minimize the intracluster interference, but would not maximize the sum rate. This is because the noise in (1) is not taken into account in (13), and its impact may be amplified by the scaling through (10) that is needed to meet the power constraint. The performance with respect to (18) is then, in most cases, improved significantly by iteratively adjusting a few real-valued diagonal elements of the transmit power penalty matrix $S$ to rebalance the received signal, interference and noise powers. This procedure is outlined in Appendix 2.
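Appendix 2 is not reproduced here, so the sketch below uses a simplified stand-in: $S$ is restricted to a scalar multiple $sI$, and a one-dimensional grid search over $s$ scores each RLP solution by the sum over users of $\log_2(1 + \bar{E}[P_{S,i}]/(\bar{E}[P_{I,i}] + P_{N,i}))$. The expected powers are obtained by averaging over a zero-mean prediction error that is uncorrelated with the prediction; Appendix 3 may compute them differently. All names, the grid and the numbers are hypothetical.

```python
import numpy as np


def rlp(H_hat, D, E_pred, s):
    """(14) with V = I, S = s * I and no quantization error term."""
    Hh = H_hat.conj().T
    A = Hh @ H_hat + s ** 2 * np.eye(H_hat.shape[1]) + E_pred
    return np.linalg.solve(A, Hh @ D)


def approx_sum_rate(H_hat, covs, R, noise, p_max=1.0):
    """Sum of log2(1 + E[P_S] / (E[P_I] + P_N)) after per-antenna scaling (11).

    Uses E|h_i r_l|^2 = |hat h_i r_l|^2 + r_l^H covs[i]^T r_l, i.e. a zero-mean
    prediction error that is uncorrelated with the prediction.
    """
    c2 = np.max(np.sum(np.abs(R) ** 2, axis=1)) / p_max
    G = np.abs(H_hat @ R) ** 2                     # G[i, l]: power of stream l at user i
    for i, C in enumerate(covs):
        G[i] += np.real(np.einsum("al,ab,bl->l", R.conj(), C.T, R))
    G = G / c2
    signal = np.diag(G)
    interference = G.sum(axis=1) - signal
    return float(np.sum(np.log2(1.0 + signal / (interference + noise))))


def search_penalty(H_hat, D, covs, noise, grid):
    """One-dimensional search over s in S = s * I (hypothetical stand-in for Appendix 2)."""
    E_pred = sum(C.T for C in covs)               # (15)-(16) with V = I
    rates = [approx_sum_rate(H_hat, covs, rlp(H_hat, D, E_pred, s), noise) for s in grid]
    k = int(np.argmax(rates))
    return grid[k], rates[k]


rng = np.random.default_rng(7)
H_hat = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
D = np.diag(np.abs(H_hat).max(axis=1))
covs = [0.05 * np.eye(3) for _ in range(3)]
s_best, rate_best = search_penalty(H_hat, D, covs, noise=0.01, grid=np.logspace(-3, 1, 41))
```

A larger $s$ trades residual interference for a smaller scaling factor $c$, which is exactly the rebalancing between interference and noise described above.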
The solution will be suboptimal but, in a comparative study in [17], we showed that the precoder of (14) performed close to a near optimal linear precoder [45] found through a high-dimensional search of all the complex elements of R.
In the evaluations, the RLP is designed iteratively to maximize an approximation (19) of the sum rate attained with the precoder R. This iterative scheme has been found to perform well compared with the alternatives we investigated.

Addressing user fairness by utilizing the penalty matrix V
User fairness can be incorporated in (18), e.g. by using a weighted sum rate. In a low-complexity optimization that iteratively uses (13), the weighting matrix V can then be used to place a high weight on the interference at some users. These users will then be allocated a larger fraction of the transmit power and experience a higher SIR which directly affects the per-user performance. However, user fairness is also affected by the choice of scheduling criterion as well as the scaling of the target matrix D. The balancing of user fairness by these tools is an interesting topic but has been left out of the scope of the present work.

Channel measurements
All simulations in this section are based on channel sounding measurements carried out by Ericsson Research. Three omnidirectional single-antenna base stations, located at different sites 350 to 600 m apart, were used to transmit orthogonal channel-sounding RS to a measurement van in an outdoor urban environment in central Kista, Stockholm. The measurement parameters are presented in Table 1, and the received signal powers from the base stations are plotted in Figure 1. The measurements are of high quality and can hence be assumed to represent the true complex channel gains in space. For a detailed description of the measurements and channels, see [46,47].

Simulation method and assumptions
To simulate pedestrian user velocities, and to make the model more similar to 3GPP LTE, the data have been upsampled by a factor of 25 in time, resulting in the parameters presented in the right-hand column of Table 1. The upsampling is done using the fast Fourier transform to ensure that no extra frequency components are added.
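FFT-based upsampling can be sketched as zero-padding the spectrum between the positive- and negative-frequency bins, so that the upsampled sequence contains no frequency components beyond those of the original record. A minimal sketch, assuming the record is treated as one period of a periodic signal (the function name and data are hypothetical):

```python
import numpy as np


def fft_upsample(x, factor):
    """Upsample by zero-padding the DFT of x between its two frequency halves.

    The original DFT bins are kept unchanged and only zeros are inserted,
    so no new frequency components appear; original samples are preserved.
    """
    x = np.asarray(x, dtype=complex)
    n = x.size
    X = np.fft.fft(x)
    Y = np.zeros(n * factor, dtype=complex)
    half = (n + 1) // 2                     # bins 0 .. half-1: non-negative frequencies
    Y[:half] = X[:half]
    Y[n * factor - (n - half):] = X[half:]  # negative frequencies go to the end
    return np.fft.ifft(Y) * factor          # rescale for the longer inverse DFT


# Usage: upsample 30 hypothetical complex channel samples by a factor of 25.
rng = np.random.default_rng(0)
x = rng.standard_normal(30) + 1j * rng.standard_normal(30)
y = fft_upsample(x, 25)
```

SciPy's `scipy.signal.resample` implements the same Fourier-domain approach.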
In the present investigation, we have focused only on the prediction error part in the error model (2).

Prediction assumptions
The downlink channels from the $N_B = 3$ single-antenna base stations are predicted for the entire measurement route in Figure 1. For this, the fading statistics in time and frequency, represented by fourth-order AR models, are re-estimated every 1 s. Using an AR order higher than 4 would not significantly improve the prediction performance for this data set. The AR models are based on noise-free channel data, i.e. on perfect CSIT, from the past 1 s. From studying the measured data, we have found that this time interval is appropriate with respect to the long-term fading. It is short enough to ensure that the statistics of the Doppler spectrum stay fairly constant within the interval, and long enough to provide appropriate prediction performance statistics and CoMP performance statistics for each interval. For high-mobility users, the interval might need to be shorter.
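The estimation method for the AR models is not stated here. As one common choice, the sketch below fits a scalar fourth-order AR model by least squares and tests it on a synthetic AR(4) process with the poles used in the Section 7 simulations. The joint time-frequency structure and the Kalman predictor itself are not modeled, and all names are hypothetical.

```python
import numpy as np


def fit_ar(h, order=4):
    """Least-squares AR fit: h[t] + a_1 h[t-1] + ... + a_p h[t-p] = e[t].

    Returns the AR polynomial [1, a_1, ..., a_p].
    """
    h = np.asarray(h, dtype=complex)
    # Column k-1 holds the lag-k samples h[t-k] for t = p, ..., len(h) - 1.
    X = np.column_stack([h[order - k:len(h) - k] for k in range(1, order + 1)])
    coef, *_ = np.linalg.lstsq(X, h[order:], rcond=None)
    return np.concatenate(([1.0], -coef))


def simulate_ar(a, n, rng):
    """n samples of a complex AR process with polynomial a (a[0] = 1), unit innovations."""
    e = ((rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)).tolist()
    a = [complex(v) for v in a]
    h = [0j] * n
    for t in range(n):
        acc = e[t]
        for k in range(1, len(a)):
            if t >= k:
                acc -= a[k] * h[t - k]
        h[t] = acc
    return np.array(h)


# Synthetic test data using the AR(4) poles of the Section 7 simulations.
rng = np.random.default_rng(0)
a_true = np.poly([0.96 + 0.09j, 0.96 - 0.09j, 0.91 + 0.04j, 0.91 - 0.04j])
h = simulate_ar(a_true, 50_000, rng)
a_hat = fit_ar(h, order=4)
```

Because the poles lie close to the unit circle, the lagged columns are strongly correlated, so a reasonably long data record is needed for stable coefficient estimates.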
Signal measurements with an appropriate range of SNRs are created by using (21) in Appendix 1 with a transmit power of $P = 1$ and additive white Gaussian noise at three different power levels $\sigma^2$ (see Figure 1). On average over all three noise levels, the median SNR is 24 dB at the investigated positions. The SNR CDF is similar to that obtained when applying the intercluster interference mitigation framework of [14,17,48]. That proposal forms overlapping static clusters that use different time-frequency allocations and further controls interference by using different antenna downtilts and transmit powers towards the outside and the inside of each cluster. The noise is i.i.d. over subcarriers for all users.
The channel correlation over frequency determines the covariance matrix $Q = E[ee^*]$ for each user in (3). It is estimated as the sample mean of $h_k h_{k+\kappa}^*$ for $k = 1, \dots, K_{\text{CRS}} - w$, $\kappa = 1, \dots, w - 1$ and $i = 1, \dots, M$. Since the computational complexity increases with $w$, we use a low value, $w = 4$. The channels are predicted for 144 RS-bearing subcarriers using prediction horizons of $\vartheta = 0, 4, 8, 12$ and $18$ RS. These correspond to distances of $d_\lambda = 0, 0.06, 0.13, 0.19$ and $0.28$ wavelengths, or time horizons of 0, 5, 10, 15 and 23 ms, for the system defined in Table 1. The results for prediction distances $d_\lambda$ are scalable and can be interpreted as predictions over time horizons of $d_\lambda \lambda_c / v$ at a carrier wavelength $\lambda_c$ for a user moving at velocity $v$. In these simulations, the Kalman filters are updated in each RS-bearing symbol, with $\Delta t = 1.3$ ms. However, after approximately ten iterations (i.e. after 13 ms), the Kalman filters converge to constant-gain filters for each AR model. This could be utilized in a commercial system to keep complexity low.
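The scaling rule, time horizon $= d_\lambda \lambda_c / v$, can be checked with a few lines of arithmetic. Assumptions: RS-bearing symbols every 1.3 ms (Table 1 after upsampling) and a speed of light rounded to $3 \times 10^8$ m/s; the constant and function names are hypothetical. The numbers reproduce the 5, 10, 15 and 23 ms horizons, and the roughly 60 and 120 km/h figures quoted for 500 MHz in the precoding results.

```python
C_LIGHT = 3.0e8          # speed of light [m/s], rounded
RS_PERIOD_S = 1.3e-3     # time between RS-bearing symbols after upsampling (Table 1)


def horizon_seconds(theta_rs, rs_period_s=RS_PERIOD_S):
    """Prediction horizon in seconds for a horizon of theta RS periods."""
    return theta_rs * rs_period_s


def velocity_kmh(d_lambda, carrier_hz, horizon_s):
    """Speed at which a distance of d_lambda wavelengths is covered in horizon_s."""
    wavelength = C_LIGHT / carrier_hz
    return d_lambda * wavelength / horizon_s * 3.6


for theta in (4, 8, 12, 18):
    print(theta, round(horizon_seconds(theta) * 1e3, 1), "ms")   # 5.2, 10.4, 15.6, 23.4 ms

# 0.28 wavelengths (theta = 18) covered within 10 ms and 5 ms at 500 MHz:
print(round(velocity_kmh(0.28, 500e6, 10e-3)), round(velocity_kmh(0.28, 500e6, 5e-3)))  # 60 121
```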
Orthogonal RS are used in all results below. The noise powers at the RS-bearing resources might in general differ from those on the payload-bearing resources. In evaluations, we will here use the same power for both cases.
The prediction performance is evaluated using the normalized mean squared error (NMSE) for the channel from the $j$th transmitter to the $i$th user,

$$ \mathrm{NMSE}_{ij} = \frac{\sum_{t \in T} |H_{ij}(t) - \hat{H}_{ij}(t)|^2}{\sum_{t \in T} |H_{ij}(t)|^2}, \qquad (20) $$

where $T$ is an appropriate averaging interval. The NMSE (20) is averaged in decibels over each 1-s interval, separately for every subcarrier.
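A minimal sketch of (20) and of the averaging in decibels over subcarriers, assuming the true and predicted channels are arrays with time along the last axis. The data are synthetic, with a prediction error 20 dB below the channel power, and the function name is hypothetical:

```python
import numpy as np


def nmse_db(H_true, H_pred):
    """NMSE of (20) in dB; the last axis is time within the averaging interval T."""
    H_true = np.asarray(H_true)
    err = np.sum(np.abs(H_true - np.asarray(H_pred)) ** 2, axis=-1)
    ref = np.sum(np.abs(H_true) ** 2, axis=-1)
    return 10.0 * np.log10(err / ref)


# Hypothetical data: 144 subcarriers x 770 RS-bearing symbols (about 1 s at 1.3 ms).
rng = np.random.default_rng(0)
H = rng.standard_normal((144, 770)) + 1j * rng.standard_normal((144, 770))
H_pred = H + 0.1 * (rng.standard_normal(H.shape) + 1j * rng.standard_normal(H.shape))
per_subcarrier = nmse_db(H, H_pred)        # one value per subcarrier
average_db = per_subcarrier.mean()         # averaged in decibels, about -20 dB
```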

Scheduling and precoding assumptions
It is assumed that all active users within a cluster have data to receive. The scheduling and precoding methods are evaluated at full system load for two cases: first with M = N = 3 users and second with M = 9 users. The single-antenna users are randomly scattered over the measurement route. In every time slot of length 1.3 ms, the users are grouped and scheduled over the resource blocks, represented by the 144 subcarriers, based on the predicted CSIT. Precoding is then carried out in each time slot as the users move along the route for 0.5 s. The RLP scheme uses a one-dimensional search in the penalty matrix S by (23) in Appendix 2 to optimize the approximated sum rate (19). The resulting sum rate, i.e. the sum over users of log2(1 + SINR), is then averaged over the 0.5 s for each subcarrier. This is repeated for 1,000 different sets of user starting positions along the measurement route. The same noise power levels as for the predictions are used. The power constraint is P max = 1 for each transmitter and each subcarrier.
User grouping results are compared to a random user grouping with round robin scheduling denoted RUG-RR. In that scheme, all M users are randomly subdivided into user groups of size M g ≤ N, with equality (M g = N = 3) in these simulations. Groups are scheduled in a round robin (RR) fashion over frequency, so all M users are served within a time slot.
Precoding results are compared to SC transmission with frequency reuse one. Then, each of the three base stations serves its own users on orthogonal resources, transmitting at full power with no base station cooperation. When SC transmission is compared to RUG-RR, users within a cell are scheduled with RR and when it is compared to SB-CUG, SB scheduling is used.

Prediction performance
The average NMSE values of the predictions obtained in the experiments outlined above are presented in Table 2. For comparison, the NMSE obtained when the outdated estimate is used as a predictor is presented in the last (fifth) column. As the prediction horizon increases, so does the benefit of using predicted rather than outdated CSIT. Due to high transmission delays (>5 ms), current systems would need ϑ > 4 for JT CoMP under the assumptions of Table 1. The use of predictions instead of outdated estimates is therefore very important.
For JT CoMP, assume that an interfering scalar complex-valued channel is given by $g = \hat{g} + \tilde{g}$, with $\hat{g}$ known, $\bar{E}[\tilde{g}] = 0$, $\bar{E}[\hat{g} \tilde{g}^*] = 0$ and an NMSE of $\bar{E}|\tilde{g}|^2 / \bar{E}|g|^2$. If this interference is to be canceled by receiving another channel component $h$ from another base station, then the resulting interference power $\bar{E}|g + h|^2$ is minimized by setting $h = -\hat{g}$, resulting in $\bar{E}|g + h|^2 = \bar{E}|\tilde{g}|^2$. Therefore, the maximum attainable relative dampening factor becomes

$$ \frac{\bar{E}|g + h|^2}{\bar{E}|g|^2} = \frac{\bar{E}|\tilde{g}|^2}{\bar{E}|g|^2} = \mathrm{NMSE}, $$

i.e. an NMSE of $-x$ dB indicates that we can reduce the interference from that base station by at most $x$ dB. For example, at a prediction horizon of ϑ = 18, the interference from the weakest base station at a given user can on average only be suppressed by 3 to 5 dB. The prediction performance for the weakest base station is far below the average performance over all base stations. These poor predictions might become 'bad apples' that degrade the quality of the total precoding solution. A closer study of the effect of different noise floors and RS SNRs is shown in Figures 4 and 5. As expected, a low noise floor improves the prediction performance. The impact of the RS SNR is largest at short prediction horizons, because at long prediction horizons the fading statistics, rather than the noise, are the main factor limiting the prediction performance, as also discussed in [22].
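The argument that the residual interference after cancellation equals the NMSE can be checked by Monte Carlo simulation. A sketch under the stated assumptions (a zero-mean error uncorrelated with the known part), with a hypothetical NMSE of −10 dB:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
nmse_g_db = -10.0                            # hypothetical prediction NMSE of g
var_g = 1.0
var_err = var_g * 10.0 ** (nmse_g_db / 10.0)


def cn(var, size):
    """Zero-mean circularly symmetric complex Gaussian samples with variance var."""
    return np.sqrt(var / 2.0) * (rng.standard_normal(size) + 1j * rng.standard_normal(size))


g_hat = cn(var_g - var_err, n)               # known (predicted) part
g_err = cn(var_err, n)                       # zero-mean error, uncorrelated with g_hat
g = g_hat + g_err
h = -g_hat                                   # best cancelling component from the other site
damping_db = 10.0 * np.log10(np.mean(np.abs(g + h) ** 2) / np.mean(np.abs(g) ** 2))
print(f"{damping_db:.1f} dB")                # approximately the NMSE, -10 dB
```

Any other choice of $h$, for instance $-0.9\hat{g}$, leaves a larger residual, since the known part is then only partly cancelled.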

Precoding performance
In Table 3, the per-cell sum rates are presented for the precoding schemes when M = 3 and the channels for 1,000 sets of user starting positions are predicted with a prediction horizon of ϑ = 8. With random user grouping and round robin scheduling (RUG-RR), the two JT CoMP schemes, RLP and ZF, provide only small gains compared with SC transmission. In fact, ZF transmission performs much worse than SC transmission for the most difficult user groups (the 5% percentiles). For these user groups, which can be regarded as the toughest CoMP groups, RLP outperforms ZF by almost a factor of 3. There are two reasons for this: RLP considers the CSIT inaccuracy in the design process, and RLP performs power adjustments through the iterative process described in Section 5.2.2. As discussed in [9], both are important, but the most significant factor is that RLP takes the CSIT inaccuracy into account. RLP avoids transmitting power over poorly predicted channels, which usually coincide with the weak channels. Therefore, RLP requires a lower scaling constant c than ZF, even without the iterative power adjustment.
With RUG-RR, SC transmission outperforms RLP for 34% of the groups, and for 17% of the groups the per-cell sum rate is more than 1 bps/Hz/cell higher with SC transmission. With cellular user grouping combined with score-based scheduling (SB-CUG), these numbers decrease to 7% and 0.6%, respectively. The improvement is due to better-conditioned 3 × 3 channel matrices H, which on average require smaller power scaling factors c in (10). These results indicate that even with few users to choose from in the system, local scheduling provides good user groups for CoMP. This phenomenon is further validated in Section 7.
A clear benefit of using local scheduling algorithms such as score-based scheduling is that the benefits of multiuser diversity are obtained at low complexity. This is evident when we compare the average sum rates for M = 3 in Table 3 with those for M = 9 in Table 4. The results for RUG-RR remain almost unchanged, as expected. However, SB-CUG provides a multiuser diversity gain of about 30% for the CoMP schemes and 15% for SC transmission. For SB-CUG with M = 9, the fraction of situations where SC outperforms CoMP with RLP is only 1%, and the advantage of SC in sum rate exceeds 1 bps/Hz in less than 0.01% of the situations. Interestingly, both of these observations indicate that the multiuser diversity gain is higher for JT CoMP than for SC transmission when using SB-CUG. This is because the score-based scheduler selects users when they have their best channel quality, so their prediction errors will also be the lowest. This increases the accuracy of the CoMP precoder.

Table 4 Sum rate for M = 9 users evaluated at a prediction horizon of ϑ = 8.
With SB-CUG for M = 9 users, CoMP improves the average sum rate by 54% as compared to SC transmission. For the worst combinations of positions of scheduled users (the 5% percentile), the sum rate improves by 47%.
It is seen from Figure 6 that the highest sum rate gains from using CoMP are achieved when the noise floor is low. The system is then intracluster interference limited. The performance for ZF with perfect CSIT has been added for comparison. As the noise floor decreases, the gap between ZF with perfect CSIT and ZF with predicted CSIT increases. For low noise floors, RLP does not outperform ZF since RLP can only compensate for inaccurate CSIT by allocating transmit power over the more reliable channels, but it cannot compensate for the actual phase errors in the CSIT. As the noise floor decreases, and the channels become more accurate as a result (see Table 2), it therefore cannot perform better than ZF, even for the tough user groups.
In Figure 6, we now compare ZF, RLP and ZF with perfect CSIT for a noise floor of −110 dBm using RUG-RR. ZF with perfect CSIT then performs worse than RLP with predicted channels, which may seem surprising. However, as mentioned, the regularizing third term in the inverse in (14) affects the power allocation such that more power is transmitted over accurate channels than over very inaccurate channels. Since the most accurate channels are generally also the strongest, the power allocation is automatically better than that of the ZF solution, even when ZF uses perfect CSIT.

Table 5 shows the results as the prediction horizon increases to ϑ = 18 (23 ms at 2.66 GHz). The decrease in CSIT quality reduces the performance of CoMP, as coherent transmission is sensitive to phase errors. Interestingly, with SB-CUG there is still a clear gain from using CoMP rather than SC transmission. This is not the case with RUG-RR. The CoMP schemes combined with SB-CUG are hence more robust to channel prediction errors than when combined with RUG-RR. Even for these fairly long delays of 23 ms, we still obtain significant CoMP gains: a 38% increase in average sum rate for users at pedestrian velocities in the 2.66 GHz band. Moreover, if the system could guarantee maximum delays of 10 or 5 ms, we could equivalently obtain significant CoMP gains for users at vehicular velocities of about 60 and 120 km/h, respectively, at a carrier frequency of 500 MHz.

All investigated scenarios above suggest that using SB-CUG instead of RUG-RR is especially important for ZF precoding. User grouping based on cellular scheduling increases the average sum rate of ZF precoding so that it becomes equal to that of RLP, and it increases the 5% percentile sum rate by up to a factor of 6.7. This is because SB-CUG generates well-conditioned matrices, so the channel errors from the weak base stations have less effect on the final solution. This is most evident in the lowest percentiles, since these include the user groups with the largest channel errors.
It can be noticed from Table 3 and Figure 6 that, with SB-CUG, ZF sometimes outperforms RLP. In our studies, we have seen that this is due to the approximations made when calculating $\bar{E}[\tilde{H}^* V^* V \tilde{H}]$ in (14) by using (7), (15) and (16). These approximations overestimate the variance of the prediction error, as discussed in Section 3.2. RLP then becomes overly cautious, yielding a slightly worse solution. However, these effects are small and only noticeable at the lowest noise floor.
In all the above, we have assumed that the quantization error is small compared with the prediction error and therefore negligible. As the prediction errors are mostly above −20 dB, a feedback cost of 8 to 10 bits per complex-valued scalar channel would ensure this. With an adaptive quantization scheme, the poor channels might need only 4 to 6 bits per complex-valued scalar channel for the quantization error to be negligible compared with the prediction error, so the feedback cost can then be lowered. The overhead required to inform the base station of how many bits each channel requires is low, since this is determined by the shadow fading and therefore only needs to be fed back on a slowly varying time scale.
An idea of how a nonnegligible adaptive quantization error would affect the results can be gained by studying the performance differences between different noise floors. The higher noise floors lead to less accurate predictions, and quantization errors would amplify this effect. However, with a fixed quantization granularity, the size of the quantization error would be independent of the channel prediction quality. Then, in the presence of nonnegligible quantization errors, other effects might occur that are not present in the results presented here. This is an important topic, which is left for future studies.

Investigation of user grouping strategies
Due to the high computational complexity of some of the user grouping schemes, not all of them have been evaluated on the extensive channel data of Section 6; instead, they have been evaluated in a simulation environment. Three cells supported by $N = 3$ omnidirectional single-antenna base stations at a distance $R = 500$ m serve $M = 3, 6, \dots, 27$ single-antenna users with independently block-fading channels. The simulations use 140 block-fading resource blocks. The channel gains $H_{ij}$ for each pair of user $i$ and base station $j$ are modeled as zero-mean circularly symmetric complex Gaussian variables. Their variance $\sigma^2_{h_{ij}}$ is given by the path loss model $128.1 + 37.6 \log_{10}(d)$ and lognormal shadow fading with 8-dB standard deviation. The channels are generated in two steps. First, the channel prediction error variances $\sigma^2_{\tilde{h}_{ij}}$ are calculated through (6), assuming that $w = 4$ flat-fading subcarriers are predicted jointly and that the fading statistics of all channels $H_{ij}$ are perfectly represented by a known fourth-order AR model with poles at $0.96 \pm 0.09i$ and $0.91 \pm 0.04i$, yielding a flat Doppler spectrum. Such a spectrum generally results in channels that are harder to predict than those in the measurements, where there is a mixture of different Doppler spectra. Second, to ensure that the prediction and the prediction error are uncorrelated, each $H_{ij}$ is calculated through (2) with $\tilde{H}_{\text{quant}} = 0$ and with $\tilde{H}_{ij}$ and $\hat{H}_{ij}$ modeled as uncorrelated circularly symmetric complex Gaussian variables with variances $\sigma^2_{\tilde{h}_{ij}}$ and $\sigma^2_{h_{ij}} - \sigma^2_{\tilde{h}_{ij}}$, respectively. The parameters in the right-hand column of Table 1 and a prediction horizon of ϑ = 8 are assumed.
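The two-step channel generation can be sketched as below. Assumptions: the distance in the path loss formula is in km, as in the 3GPP macro-cell model from which $128.1 + 37.6 \log_{10}(d)$ originates, and the per-link prediction NMSE is given directly instead of being computed through (6). The helper names and distances are hypothetical.

```python
import numpy as np


def channel_variance(d_m, rng, shadow_std_db=8.0):
    """Linear channel gain 10^(-L/10) with L = 128.1 + 37.6 log10(d) + shadowing [dB].

    Assumption: d is in km inside the formula (as in the 3GPP macro-cell model),
    so the input distance in metres is divided by 1000.
    """
    d_m = np.asarray(d_m, dtype=float)
    shadow_db = shadow_std_db * rng.standard_normal(d_m.shape)
    loss_db = 128.1 + 37.6 * np.log10(d_m / 1000.0) + shadow_db
    return 10.0 ** (-loss_db / 10.0)


def draw_channels(var_h, nmse, n_blocks, rng):
    """Draw H = H_hat + H_err with uncorrelated CN parts, cf. (2) with H_quant = 0.

    nmse = var_err / var_h per link; the paper computes var_err from (6), here
    it is a hypothetical input. Returns arrays of shape (n_blocks, M, N).
    """
    var_h = np.asarray(var_h, dtype=float)
    var_err = np.asarray(nmse, dtype=float) * var_h
    shape = (n_blocks,) + var_h.shape

    def cn(var):
        return np.sqrt(var / 2.0) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

    H_hat = cn(var_h - var_err)
    return H_hat, H_hat + cn(var_err)


rng = np.random.default_rng(0)
var_h = channel_variance([[150.0, 420.0, 610.0]], rng)   # one user, three sites
H_hat, H = draw_channels(var_h, nmse=np.full((1, 3), 0.1), n_blocks=140, rng=rng)
ar_poly = np.poly([0.96 + 0.09j, 0.96 - 0.09j, 0.91 + 0.04j, 0.91 - 0.04j])
```

The AR(4) polynomial built from the stated poles has real coefficients because the poles come in complex-conjugate pairs.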
Users are dropped randomly with equal probability within a circle of 360-m radius around the cluster center. This area corresponds well to the area within which a user would be allocated to the cluster when overlapping network-centric cooperation clusters are formed as described in [14,17,48]. Performance is evaluated in terms of sum rate and individual user rate using ZF JT CoMP over 1,000 sets of user positions. The results of an exhaustive search for the user groups that give the best sum rate on each resource have also been added; this is denoted optimal best rate (opt. BR).
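Dropping users "with equal probability" within the circle is interpreted here as a uniform density over the disc area, which requires taking the square root of a uniform variable for the radius. The site layout in the usage lines, an equilateral triangle with 500-m sides centered on the cluster, is a hypothetical reading of "a distance R = 500 m", and the function names are hypothetical.

```python
import numpy as np


def drop_users(m, radius_m=360.0, rng=None):
    """Drop m users uniformly over the area of a disc around the cluster centre."""
    rng = np.random.default_rng() if rng is None else rng
    r = radius_m * np.sqrt(rng.uniform(size=m))     # sqrt: uniform density per unit area
    phi = rng.uniform(0.0, 2.0 * np.pi, size=m)
    return np.column_stack((r * np.cos(phi), r * np.sin(phi)))


def distances_to_sites(users, sites):
    """Euclidean user-to-site distances in metres, shape (users, sites)."""
    return np.linalg.norm(users[:, None, :] - sites[None, :, :], axis=-1)


rng = np.random.default_rng(0)
users = drop_users(9, rng=rng)
angles = np.deg2rad([90.0, 210.0, 330.0])
sites = (500.0 / np.sqrt(3.0)) * np.column_stack((np.cos(angles), np.sin(angles)))
d = distances_to_sites(users, sites)            # input to a path loss model
```

Drawing the radius itself uniformly would instead concentrate users near the cluster center, which favors cooperation more than a uniform area density does.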

Results
Comparisons between all the user grouping and scheduling schemes described in Section 4, as well as RUG-RR, are presented in terms of sum rate (Figure 7) and average user rate (Figure 8) for M = 9 users. Note that the CUG scheme performs close to the much more complex GUG algorithm, both for the near sum-rate-optimal groups (comparing GUG-BR with BR-CUG) and for the 'fair' user groups (comparing GUG-PF with SB-CUG). Both GUG-BR and BR-CUG also perform close to the sum-rate-optimal user grouping obtained by exhaustive search. In terms of the lowest percentiles of the average user rates for the fair algorithms, GUG-PF is fairer than SB-CUG. This can be explained by SB-CUG being restricted to allocating resources fairly among users in the same cell. Therefore, when the users are unevenly distributed, e.g. when 80% of the users belong to the same master base station, each of these users is allocated fewer resources than each of the other 20% of the users. The low percentiles of SB-CUG are still much better than those obtained with RUG-RR and with the sum-rate-optimal user grouping algorithms. In Figure 9, we see that the multiuser scheduling gain of the BR-CUG algorithm is on par with that of the sum-rate-optimal algorithm. For the fairer SB-CUG, the gain in terms of sum rate is smaller.

Discussions and conclusions
The paper has investigated the sum rate performance gains by coordinated joint linear transmission (JT CoMP) from several sites, relative to conventional cellular transmission with frequency reuse 1.
We have taken several types of constraints into account to obtain a reasonably realistic setting. Measured channel sounding data were used to obtain fading channels from multiple transmitter sites for a large set of terminal positions. We focused on cooperation between three single-antenna (macro) sites, to model a scenario with reasonable demands on feedback and on backhaul in a small cooperation cluster. All users furthermore had pedestrian velocities and we predicted their channels by Kalman algorithms. This setting produced significant CSIT errors and allowed us to investigate the limits of performance due to channel outdating. To obtain reasonable computational complexity, we furthermore restricted focus to linear precoders that were designed jointly for the whole cluster based on the inaccurate CSIT.
Our results take delays over the backhaul links into account, via the required prediction horizon, but backhaul capacity within clusters is not constrained. Such constraints would reduce performance markedly [43]. Furthermore, quantization errors of the channel prediction feedback over uplinks in FDD systems have been assumed small, relative to the prediction errors. This assumption would, e.g. be fulfilled by using 10-bit quantization of complex channel components. (For the considered case of three base station antennas per cooperating cluster, the resulting feedback load over the uplink would then be 30 bits per scheduled user for each scheduled block. This assumes feedback of predictions only by the scheduled users and only for scheduled resource blocks, as proposed in Section 4. Methods that further reduce the feedback overhead are under current investigation).
The first main conclusion that stands out from these results is the crucial importance of a good user grouping. Joint transmission to a group of users with a badly conditioned channel matrix would lead to scaling problems in a linear precoder that is designed under per-antenna power constraints. With random user positions, such problems occur frequently, with the result that the advantages of CoMP relative to cellular transmission are lost.
A second main conclusion is that with reasonably good user grouping, JT CoMP combined with fair opportunistic scheduling provides significant performance gains for practically all of the investigated sets of user positions. This holds also at quite large CSIT error levels, e.g. at an NMSE of −9 dB on average over all positions at a prediction horizon of 0.28 wavelengths or 23 ms (Tables 2 and 5). However, for still longer prediction distances in space, the performance starts to deteriorate and the gains of coherent joint transmission vanish [10]. A specialized 'predictor antenna' system for vehicles has recently been proposed to obtain accurate CSI also at very high velocities [49].
A third highlight is that these gains can be obtained with a simple user grouping scheme that we have proposed and evaluated here. Its essence is: 'Perform multiuser scheduling with respect to frequency locally for each cell. Then, for each frequency resource block, design joint transmission precoders for the terminals that have thereby been allocated to use that resource block.' The first step can be executed locally in the base stations rather than in the central control unit, which reduces the load on the backhaul links. Multiuser scheduling gains over frequency-selective channels are then preserved and even amplified (compare Tables 3 and 4) by using JT CoMP relative to single-cell transmission with the same schedulers. By enabling a two-stage feedback approach, the proposed user grouping scheme also drastically reduces the CSI feedback overhead in FDD systems.
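The two-step grouping rule quoted above can be sketched as follows. This is a minimal illustration under stated assumptions: each cell runs a proportional-fair scheduler (a stand-in for the fair opportunistic scheduler used in the paper), every cell has a single antenna, and the joint precoder is plain zero-forcing without power scaling. All function and variable names are hypothetical.

```python
import numpy as np

def local_schedule(pred_rate, avg_throughput):
    """Stage 1, run locally in one cell: choose one user per resource block.
    Assumption: a proportional-fair metric stands in for the actual scheduler.
    pred_rate: (n_users, n_rb) predicted single-cell rates; avg_throughput: (n_users,)."""
    return np.argmax(pred_rate / avg_throughput[:, None], axis=0)

def two_stage_grouping(cluster_channels, pred_rates, avg_throughputs):
    """Stage 2, in the central unit: per resource block, group the users chosen
    by the cells and design a joint ZF precoder (power scaling omitted).
    cluster_channels[c]: (n_users_c, n_rb, n_cluster_antennas) predicted channels.
    Only the chosen users need to report these, and only for their chosen blocks."""
    choices = [local_schedule(r, a) for r, a in zip(pred_rates, avg_throughputs)]
    n_rb = pred_rates[0].shape[1]
    groups, precoders = [], []
    for rb in range(n_rb):
        users = [int(ch[rb]) for ch in choices]        # one user per cell
        H = np.stack([cluster_channels[c][u, rb, :] for c, u in enumerate(users)])
        groups.append(users)
        precoders.append(np.linalg.pinv(H))           # H @ W = I if H has full row rank
    return groups, precoders

# Hypothetical cluster: 3 single-antenna cells, 4 users per cell, 6 resource blocks.
rng = np.random.default_rng(0)
n_cells, n_users, n_rb = 3, 4, 6
channels = [rng.standard_normal((n_users, n_rb, n_cells))
            + 1j * rng.standard_normal((n_users, n_rb, n_cells)) for _ in range(n_cells)]
rates = [rng.random((n_users, n_rb)) + 0.1 for _ in range(n_cells)]
avg_thr = [rng.random(n_users) + 0.5 for _ in range(n_cells)]
groups, precoders = two_stage_grouping(channels, rates, avg_thr)
print(groups[0])   # user index chosen by each cell on resource block 0
```

Because stage 1 uses only locally available rate predictions, cluster-wide channel predictions are needed only for the users that survive it, which is what enables the two-stage feedback.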
The simulations in Section 7 have shown that this extremely simple algorithm performs very close to the much more complex greedy user grouping algorithm, which also places higher demands on feedback and backhaul. It also performs close to the rate-optimal grouping obtained by exhaustive search. Its effectiveness in avoiding bad user groups is illustrated most strikingly by the resulting increase of the 5th percentile sum rate, relative to random user grouping, when zero-forcing precoding is used (Tables 3, 4 and 5). The user grouping scheme could be improved further by introducing a second scheduling round that eliminates the few remaining cases of channel matrices with a large singular value spread. That would, however, increase both the delay and the computational complexity.
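One possible test for such a second scheduling round is to compute the singular value spread of each group's channel matrix and flag groups above a threshold. The sketch below is hypothetical: the 20 dB threshold, the example matrix and the function names are illustrative choices, not values from the paper.

```python
import numpy as np

def singular_value_spread_db(H):
    """Ratio of the largest to the smallest singular value of H, in dB."""
    s = np.linalg.svd(H, compute_uv=False)        # returned in descending order
    return 20 * np.log10(s[0] / s[-1])

def groups_to_reschedule(channel_matrices, max_spread_db=20.0):
    """Indices of resource blocks whose user group exceeds the spread threshold.
    The 20 dB default is a hypothetical choice."""
    return [k for k, H in enumerate(channel_matrices)
            if singular_value_spread_db(H) > max_spread_db]

H_ok = np.eye(3)
H_bad = np.array([[1.0, 0.9, 0.1],    # two users with nearly identical channels
                  [0.9, 1.0, 0.1],
                  [0.1, 0.1, 1.0]])
print(round(singular_value_spread_db(H_bad), 1))   # 25.7
print(groups_to_reschedule([H_ok, H_bad]))          # [1]
```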
A similar user grouping scheme can also be used with multiantenna base stations, where (multiuser) MIMO beamformers are first designed for each cell. Joint CoMP precoders (beamformers for the whole cluster) are then designed in a second step and are added to the signal chains before the cellular beamformers [17,50].
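The two-step structure for multiantenna base stations can be sketched as the product of a block-diagonal matrix of cellular beamformers and a cluster-wide precoder designed on the resulting effective channel. The sketch assumes one user per cell, maximum-ratio cellular beamformers and a zero-forcing CoMP precoder; these choices and all names are illustrative assumptions, not the designs of [17,50].

```python
import numpy as np

def block_diag(blocks):
    """Place 2-D blocks along the diagonal of a complex zero matrix."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols), dtype=complex)
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

def two_step_precoder(H, n_tx_per_bs):
    """H: (n_users, n_bs * n_tx_per_bs); user k is served by cell k in the cellular step."""
    n_users = H.shape[0]
    cell_bf = []
    for k in range(n_users):
        h_own = H[k, k * n_tx_per_bs:(k + 1) * n_tx_per_bs]   # user k's channel from its own BS
        cell_bf.append((h_own.conj() / np.linalg.norm(h_own))[:, None])  # MRT, n_tx x 1
    W_cell = block_diag(cell_bf)          # step 1: cellular beamformers
    H_eff = H @ W_cell                    # effective channel seen by the CoMP precoder
    W_comp = np.linalg.pinv(H_eff)        # step 2: joint precoder for the whole cluster
    return W_cell @ W_comp                # CoMP precoder sits before the cellular beamformers

# Hypothetical cluster: 3 base stations with 4 antennas each, one user per cell.
rng = np.random.default_rng(1)
n_bs, n_tx = 3, 4
H = rng.standard_normal((n_bs, n_bs * n_tx)) + 1j * rng.standard_normal((n_bs, n_bs * n_tx))
W = two_step_precoder(H, n_tx)
print(W.shape)   # (12, 3)
```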
Robust precoding that takes the channel inaccuracy into account is an important safeguard against the remaining cases of problematic channel matrices. We have studied the use of the iterative RLP design of linear precoders for this purpose. When given a 'tough' user group with a badly conditioned channel matrix, the robust precoder designed with the RLP scheme outperforms standard zero-forcing by a factor of 3 in terms of the 5th percentile sum rate (Tables 3, 4 and 5, for RUG-RR). However, when user groups are chosen that mostly ensure diagonally dominant channel matrices, RLP has no great advantage over ZF.
We have furthermore found interesting interactions between channel estimation and the properties of RLP precoders. A question posed in the introduction concerned the effects of large differences in estimation accuracy between strong and weak channels: would the larger inaccuracy of the estimates of weak channels spoil the precoder design? When the RLP design is used, the opposite happens. Large inaccuracies of weak channels lead to the corresponding transmitters being used less by the precoder. This reduces the need to rescale the solution to satisfy the transmit power constraint.
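The mechanism can be illustrated with a generic MMSE-type robust precoder, swapped in here for illustration; it is not the paper's RLP design. Assuming a zero-mean prediction error with independent entries, the expected error energy per transmit antenna enters the regularization, so antennas whose channels are poorly predicted are penalized. The matrices and names are hypothetical.

```python
import numpy as np

def robust_precoder(H_hat, err_var, alpha=0.1):
    """Generic MMSE-type robust linear precoder (illustrative stand-in, not the RLP).

    Minimizes E||(H_hat + E) W - I||_F^2 + alpha * ||W||_F^2, where the prediction
    error E is zero-mean with independent entries of variance err_var[k, j]
    (user k, transmit antenna j). The closed form is
    W = (H_hat^H H_hat + R + alpha I)^{-1} H_hat^H with R = E[E^H E].
    """
    n_tx = H_hat.shape[1]
    R = np.diag(err_var.sum(axis=0))          # error energy reaching each antenna
    A = H_hat.conj().T @ H_hat + R + alpha * np.eye(n_tx)
    return np.linalg.solve(A, H_hat.conj().T)

def antenna_powers(W):
    """Transmit power per antenna for unit-power data symbols."""
    return np.sum(np.abs(W) ** 2, axis=1)

# Hypothetical estimates: antenna 2 reaches all users through weak channels.
H_hat = np.array([[1.0, 0.3, 0.05],
                  [0.2, 1.0, 0.05],
                  [0.1, 0.2, 0.08]])
uniform_err = np.full((3, 3), 0.01)
weak_err = uniform_err.copy()
weak_err[:, 2] = 0.05                          # weak channels predicted less accurately

print(antenna_powers(robust_precoder(H_hat, uniform_err)))
print(antenna_powers(robust_precoder(H_hat, weak_err)))   # antenna 2 carries less power
```

Raising only the error variance seen by one antenna scales that antenna's row of the precoder by a factor below one (a Sherman-Morrison argument), so its transmit power always decreases.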
With good precoder design and user grouping schemes, the limits of performance for linear downlink JT CoMP will mainly be due to the CSIT quality and the out-of-cluster interference and noise level (see Figure 6, SB-CUG). Cooperation cluster design is therefore crucial.

Then, the transmit powers at base stations 1, 2 and 3 are $[1\ \ 13\ \ 1.25]$, so $j_{\max} = 2$ and $\mathbf{1}_{j_{\max}} = [0\ \ 1\ \ 0]$. The parameter $\rho_1$ is iteratively optimized w.r.t. (18) over the interval $[0, \rho_{1,\max}]$, where $\rho_{1,\max}$ is the smallest value that will cause $j_{\max}$ to change. This procedure can be repeated for the second strongest base station, denoted $j_{\max 2}$, with

$$\mathbf{S}_{\rho,2} = \mathrm{diag}(\rho_1 \cdot \mathbf{1}_{j_{\max}}) + \mathrm{diag}(\rho_2 \cdot \mathbf{1}_{j_{\max 2}}).$$
http://jwcn.eurasipjournals.com/content/2014/1/100

Similarly to $\rho_1$, the parameter $\rho_2$ is now optimized over $[0, \rho_{2,\max}]$ while $\rho_1$ is held fixed, where $\rho_{2,\max}$ is the smallest value that will cause the value of $j_{\max}$ or $j_{\max 2}$ to change. In the above example, $j_{\max 2} = 3$ and $\mathbf{1}_{j_{\max 2}} = [0\ \ 0\ \ 1]$. Again, the procedure can be repeated for all remaining base stations in order of decreasing transmit power until the final penalty matrix has been formed. For clusters with few base stations, it is often sufficient to adjust only the one scalar parameter in $\mathbf{S}$ that is related to the strongest base station, as for (23). For clusters with many base stations, further improvements are obtained by adjusting additional diagonal elements in $\mathbf{S}$, starting with the one associated with the second strongest base station.
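The sequential penalty tuning described above can be sketched for its simplest case: adjusting only $\rho_1$ on the strongest base station. Criterion (18) and the exact precoder design are not reproduced here, so the sketch substitutes two hypothetical stand-ins: a regularized least-squares precoder in which the diagonal penalty matrix enters additively, and a sum rate evaluated on the estimated channel after per-antenna power scaling. $\rho_{1,\max}$ is approximated by stopping a grid search at the first value where $j_{\max}$ changes.

```python
import numpy as np

def design_precoder(H_hat, S, alpha=0.1):
    """Hypothetical stand-in for the precoder design with diagonal penalty matrix S."""
    n_tx = H_hat.shape[1]
    A = H_hat.conj().T @ H_hat + S + alpha * np.eye(n_tx)
    return np.linalg.solve(A, H_hat.conj().T)

def tx_powers(W):
    """Transmit power per base station antenna for unit-power symbols."""
    return np.sum(np.abs(W) ** 2, axis=1)

def sum_rate(H, W, noise_var=0.1, p_max=1.0):
    """Hypothetical stand-in for criterion (18): sum rate after per-antenna scaling."""
    W = W * np.sqrt(p_max / tx_powers(W).max())
    G = np.abs(H @ W) ** 2                       # G[k, m]: power of stream m at user k
    signal = np.diag(G)
    interference = G.sum(axis=1) - signal
    return float(np.sum(np.log2(1 + signal / (interference + noise_var))))

def tune_strongest_penalty(H_hat, rho_grid):
    """Grid search for rho_1 on the strongest base station j_max. The search stops
    at the first rho for which j_max changes (an approximation of rho_1,max)."""
    n_tx = H_hat.shape[1]
    W0 = design_precoder(H_hat, np.zeros((n_tx, n_tx)))
    j_max = int(np.argmax(tx_powers(W0)))
    indicator = (np.arange(n_tx) == j_max).astype(float)   # the vector 1_{j_max}
    best_rho, best_val = 0.0, sum_rate(H_hat, W0)
    for rho in rho_grid:
        W = design_precoder(H_hat, np.diag(rho * indicator))
        if int(np.argmax(tx_powers(W))) != j_max:
            break                                  # strongest base station changed
        val = sum_rate(H_hat, W)
        if val > best_val:
            best_rho, best_val = float(rho), val
    return j_max, best_rho, best_val

rng = np.random.default_rng(2)
H_hat = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
j_max, rho_1, rate = tune_strongest_penalty(H_hat, np.linspace(0.05, 5.0, 100))
print(j_max, rho_1, round(rate, 2))
```

Tuning $\rho_2$ for the second strongest base station would repeat the same loop with $\rho_1$ held fixed and stop when either $j_{\max}$ or $j_{\max 2}$ changes.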
For multiantenna base stations, all the co-located transmit antennas of one cell have average channel gains of the same order of magnitude. They should therefore be given penalties of the same order of magnitude. One penalty parameter value $\rho_j$ can then be adjusted simultaneously for all antennas of one base station $j$ at a time, as in the single-antenna base station example above.

Appendix 3
Assuming no quantization errors, $\mathrm{E}[s_i s_j^*] = \delta_{ij}$, $\mathrm{E}[\bar{\mathbf{H}}] = \mathbf{0}$ and $\mathrm{E}[\mathbf{n}\mathbf{n}^*] = \sigma^2 \mathbf{I}$ in (1), (2) and (10), the expected values of the power of the received message, $P_{S,i}$, the intracluster interference, $P_{I,i}$, and the noise, $P_{N,i}$, at the $i$th user are given by