A Dynamic Clustering and Resource Allocation Algorithm for Downlink CoMP Systems with Multiple Antenna UEs

Coordinated multi-point (CoMP) schemes have been widely studied in recent years to tackle inter-cell interference. In practice, latency and throughput constraints on the backhaul allow the organization of only small clusters of base stations (BSs) where joint processing (JP) can be implemented. In this work we focus on downlink CoMP-JP with multiple antenna user equipments (UEs) and propose a novel dynamic clustering algorithm. The additional degrees of freedom at the UE can be used to suppress the residual interference by using an interference rejection combiner (IRC) and to allow multi-stream transmission. In our proposal we first define a set of candidate clusters depending on long-term channel conditions. Then, in each time block, we develop a resource allocation scheme by jointly optimizing transmitter and receiver where: a) within each candidate cluster a weighted sum rate is estimated and then b) a set of clusters is scheduled in order to maximize the system weighted sum rate. Numerical results show that much higher rates are achieved when UEs are equipped with multiple antennas. Moreover, as this performance improvement is mainly due to the IRC, the gain achieved by the proposed approach with respect to the non-cooperative scheme decreases as the number of UE antennas increases.


I. INTRODUCTION
Coordination among base stations (BSs) has been widely studied in recent years to tackle inter-cell interference, which strongly limits the rates achieved in cellular systems, in particular by the user equipments (UEs) at the cell edge [1]. Supported by the first results promising huge gains with respect to the baseline non-cooperative system [2], a lot of attention has been paid to the topic both in academia [3], [4] and in industry [5], [6]. These techniques, known in the industry as coordinated multi-point (CoMP), are classified into a) coordinated scheduling/beamforming (CS/CB), which requires channel state information (CSI) but no data sharing among the BSs, and b) joint processing (JP), which requires both CSI and data sharing among the BSs. This paper focuses on downlink CoMP-JP, where BSs jointly serve the scheduled UEs by sharing the data to be sent. Although CoMP-JP is a very promising technique, many issues make its implementation still challenging. First, CSI at the transmitter may be unreliable because of noise on channel estimation in time division duplex (TDD) systems and limited bandwidth available for feedback in frequency division duplex (FDD) systems. Second, sharing UE data among all the BSs is generally limited by throughput and delay constraints in the backhaul infrastructure. A possible approach to deal with backhaul throughput constraints relies on partial sharing of UE data among the BSs, i.e., a BS serving a certain UE may have only a partial knowledge of the data to be sent toward that UE [7], [8]. Despite the promising results achieved under idealistic assumptions, partial UE data sharing has not found application in a real system, mainly because of its complexity.
Hence, most of the works in the literature focus on the simpler clustering approach to deal with limited throughput backhaul, where the BSs are organized in clusters and joint processing is applied within each cluster by sharing the whole data to be sent among all the BSs of the cluster. However, even if intra-cluster interference is mitigated by using CoMP schemes within the cluster, UEs at the cluster border suffer strong inter-cluster interference (ICI). Many clustering schemes have been developed in the literature to deal with ICI. In [9] static clustering with block diagonalization is considered and precoders are designed in each cluster by nullifying the interference towards UEs of neighboring clusters close to the border. A more flexible solution is obtained with dynamic clustering [10], [11] where the set of clusters changes over time by adapting to the network conditions. In [10] a greedy algorithm is developed where, for each cluster, the first BS is selected randomly to guarantee fairness, while the remaining BSs of the clusters are selected by maximizing the cluster sum rate.
In [11] a joint clustering and scheduling scheme is developed by generating a set of candidate clusterings based on the long-term channel conditions and then by selecting, in each time block, the best one. In [12] the set of clusters is optimized by maximizing the increase in the achievable UE rate, whereas in [12], [13] by minimizing the interference power. In [14] a BS negotiation algorithm is designed for cluster formation by considering a fixed cluster size. In [15] active clusters are selected by minimizing an overall cost function among a set of candidates which depend on UE average received power. A framework for feedback and backhaul reduction is developed in [16] where each UE feeds back CSI only to a subset of BSs, and UEs associated to the same subset are grouped together. In [17] a greedy scheduling algorithm with overlapping clusters is proposed where precoders are designed by considering the layered virtual signal to interference plus noise ratio (SINR) criterion [18].
However, most of the works on dynamic clustering ([10], [11], [12], [13], [16], [17]) assume that UEs are equipped with only one antenna, although the Long Term Evolution (LTE) Advanced standard developed by the 3rd Generation Partnership Project (3GPP) considers that UEs may be equipped with up to eight antennas [19]. Although this number seems a bit optimistic for current mobile devices, technological innovation may in the near future allow the manufacturing of smartphones or tablets with numerous antennas, and hence much more attention should be paid to the study of CoMP schemes with multiple antenna UEs [20], [21]. Therefore, in this work we consider downlink CoMP-JP with a constraint on the maximum cluster size and propose a novel dynamic clustering algorithm by explicitly considering that UEs are equipped with multiple antennas. In our proposal, UEs can exploit these additional degrees of freedom by implementing an interference rejection combiner (IRC) [22] to partially suppress the ICI and by being served by means of a multi-stream transmission. Moreover, differently from many works on dynamic clustering where UE selection is not considered and a simple round robin scheduler is implemented ([10], [12], [13], [14], [16]), here we assume UE scheduling as a part of the optimization. In our approach we first define a set of candidate clusters depending on long-term properties of the channels. Then, given this set of candidate clusters, in each time block the proposed algorithm follows a two-step procedure: a) a weighted sum rate is estimated within each candidate cluster by performing UE selection, precoding design, power allocation and transmission rank selection, and then b) the central unit (CU) coordinating all the BSs schedules a set of non-overlapping candidate clusters by maximizing the system weighted sum rate.
We emphasize that the developed resource allocation is performed by jointly optimizing transmitter and receiver, i.e., by taking into account the benefits of IRC and multi-stream transmission. Moreover, the proposed scheme can be implemented in a semi-distributed way, as the cluster weighted sum rate optimization can be performed within the cluster without additional information from the out-of-cluster BSs.
We evaluate the proposed solution in an LTE-TDD scenario, where channels are estimated at the BSs thanks to pilot sequences sent by the UEs, and compare it against a baseline single-cell processing (SCP) scheme and two static clustering schemes, where the clusters do not dynamically adapt to the network conditions. Numerical results with perfect CSI at the BSs show that the achievable rates strongly increase with the number of UE antennas. Then, as with CoMP part of the interference is managed at the transmit side, multi-stream transmission is more effective with the proposed scheme than with SCP. However, as most of the gain is due to the interference suppression capability of the IRC, the relative gain achieved by the proposed scheme with respect to SCP decreases as the number of UE antennas increases. Finally, a further decrease of this gain is observed when imperfect CSI is considered at the BSs.
We assume a block fading channel model and denote with $H_{k,j}(t)$, $t = 0, 1, \ldots, T-1$, the multiple-input multiple-output (MIMO) channel matrix of size $N \times M$ between BS $j$ and UE $k$ in block $t$. We consider that the entries of matrix $H_{k,j}(t)$ are identically distributed zero-mean complex Gaussian random variables, i.e., $[H_{k,j}(t)]_{n,m} \sim \mathcal{CN}(0, \sigma^2_{k,j})$, for $n = 0, 1, \ldots, N-1$ and $m = 0, 1, \ldots, M-1$, where $\sigma^2_{k,j}$ represents the large scale fading between BS $j$ and UE $k$, which depends on path-loss and shadowing. We assume that the statistical description of the channels does not change for all the $T$ blocks, whereas fast fading realizations are independent among different blocks. Then, we denote with $\Sigma_{k,j} = \mathrm{E}\left[\mathrm{vec}(H_{k,j}(t))\,\mathrm{vec}(H_{k,j}(t))^H\right]$ the covariance matrix of the channel matrix $H_{k,j}(t)$. We indicate with $N_E$ the number of resource elements, i.e., time slots, forming a block.
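As a minimal sketch of the block fading model described above, the following Python snippet draws independent channel realizations with i.i.d. $\mathcal{CN}(0, \sigma^2_{k,j})$ entries. The function names, the seed and the numeric values (N = 2, M = 4, $\sigma^2_{k,j}$ = 0.5) are illustrative assumptions and are not taken from the paper; spatial correlation is ignored here.

```python
import random

def draw_channel(n_rx, n_tx, sigma2, rng):
    """Draw one N x M realization with i.i.d. CN(0, sigma2) entries."""
    # Real and imaginary parts each carry half of the power sigma2.
    std = (sigma2 / 2.0) ** 0.5
    return [[complex(rng.gauss(0.0, std), rng.gauss(0.0, std))
             for _ in range(n_tx)] for _ in range(n_rx)]

def draw_blocks(n_rx, n_tx, sigma2, n_blocks, seed=0):
    """Independent fast-fading realizations, same statistics in every block."""
    rng = random.Random(seed)
    return [draw_channel(n_rx, n_tx, sigma2, rng) for _ in range(n_blocks)]

# Hypothetical example: one BS-UE link observed over 2000 blocks.
blocks = draw_blocks(n_rx=2, n_tx=4, sigma2=0.5, n_blocks=2000)
avg_power = sum(abs(h) ** 2 for H in blocks for row in H for h in row) / (2000 * 2 * 4)
```

With this many samples, `avg_power` should lie close to the chosen large scale fading value, which is a quick sanity check on the generator.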
Note that the block fading model considered in this work can be adapted to represent a more realistic channel which changes continuously both in time and in frequency domains by suitably selecting the number of resource elements in each block. In fact, by denoting with W C and T C the coherence bandwidth and time of the channel, respectively, We assume that the BSs are coordinated by a CU and the backhaul links have zero latency and are error free. Each block is organized in three phases: a) in the first phase all the UEs send pilot sequences to allow channel estimation at the BSs, b) in the second phase clustering, UE scheduling, beamforming design, transmission rank selection and power optimization are performed by the CU and finally c) in the third phase the BSs perform data transmission toward the set of scheduled UEs.

A. First Phase: Uplink Pilot Transmission
The first $N_T$ resource elements of each block are allocated to the uplink pilot transmission performed by the UEs. We assume that orthogonal sequences are employed by the UEs, thus interference on channel estimation is avoided at the BSs: in the considered scenario this is achieved by imposing a minimum length of the training sequence of $N_T \geq NK$. By denoting with $P^{(UE)}$ the maximum power available at each UE and $\sigma_n^2$ the thermal noise power, under the assumption of perfect reciprocity BS $j$ estimates the channel matrix $H_{k,j}(t)$ connecting UE $k$ to itself from the noisy observation (1), whose entry $(n, m)$ is $[H_{k,j}(t)]_{n,m} + \eta_{k,j,n,m}(t)$, where $\eta_{k,j,n,m}(t) \sim \mathcal{CN}\left(0, \frac{N\sigma_n^2}{N_T P^{(UE)}}\right)$. By assuming that BS $j$ knows the covariance matrix $\Sigma_{k,j}$, the minimum mean square error (MMSE) estimate $\hat{H}_{k,j}(t)$ of $H_{k,j}(t)$ given the observation (1) can be written as [23, Ch. 10]
$$\mathrm{vec}\left(\hat{H}_{k,j}(t)\right) = \Sigma_{k,j}\left(\Sigma_{k,j} + \frac{N\sigma_n^2}{N_T P^{(UE)}} I_{MN}\right)^{-1} \mathrm{vec}\left(H_{k,j}(t) + \eta_{k,j}(t)\right), \quad (2)$$
where $[\eta_{k,j}(t)]_{n,m} = \eta_{k,j,n,m}(t)$.
Note that in the case of uncorrelated channels, i.e., when Σ k,j = I M N , the expression in (2) turns out to beĤ
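To illustrate how the pilot length affects the quality of the estimate, the sketch below implements the MMSE estimator for the uncorrelated case only. It assumes that the observation is the true channel plus additive noise of variance $N\sigma_n^2/(N_T P^{(UE)})$ and that $\Sigma_{k,j} = \sigma^2_{k,j} I_{MN}$ (setting $\sigma^2_{k,j} = 1$ gives the identity covariance mentioned above), in which case the estimator reduces to a scalar shrinkage of the observation. The function name and the numeric values are hypothetical.

```python
def mmse_estimate_uncorrelated(observation, sigma2, n_rx, n_pilot, p_ue, noise_power):
    """Scalar-shrinkage MMSE estimate, valid when the covariance is sigma2 * I.

    observation: N x M list of lists, assumed equal to channel + estimation noise.
    Returns the estimate, the shrinkage factor and the per-entry error variance.
    """
    # Variance of the estimation noise on each channel entry.
    noise_var = n_rx * noise_power / (n_pilot * p_ue)
    # Sigma (Sigma + noise_var I)^-1 collapses to a scalar for Sigma = sigma2 * I.
    shrink = sigma2 / (sigma2 + noise_var)
    h_hat = [[shrink * y for y in row] for row in observation]
    # Per-entry mean square error of the MMSE estimate.
    error_var = sigma2 * noise_var / (sigma2 + noise_var)
    return h_hat, shrink, error_var

# Hypothetical numbers in linear scale: a longer pilot gives a more reliable estimate.
obs = [[1.0 + 1.0j, 0.5j], [-0.3 + 0.2j, 0.8 + 0.0j]]
_, shrink_short, err_short = mmse_estimate_uncorrelated(obs, 1.0, 2, 8, 1.0, 2.0)
_, shrink_long, err_long = mmse_estimate_uncorrelated(obs, 1.0, 2, 64, 1.0, 2.0)
```

As $N_T$ grows the shrinkage factor approaches one and the error variance approaches zero, at the cost of fewer resource elements left for data.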

B. Second Phase: Resource Allocation at the CU
After uplink pilot transmission, the CU organizes BSs in clusters and schedules in each block t a subset S(t) ⊆ K of UEs. In this work we consider dynamic multi-stream transmission and we denote with l k (t) the transmission rank allocated to UE k in block t, i.e., the number of for UE k, and with P (BS) the power available at each BS. Then, as the BSs do not have perfect CSI because of noise on channel estimation (2), we denote withR k (t) the estimate at BSs of the spectral efficiency achieved by UE k. We consider a constraint on the cluster size by assuming that each UE can be served by up to J MAX BSs. Let us define the step function Therefore, the weighted sum rate maximization can be written as where scaling factor α k (t) in (5a) is the quality of service (QoS) for UE k which depends on the employed scheduler, (5b) is the power constraint at each BS and (5c) imposes that each UE cannot be served by more than J MAX BSs. Note that optimization (5) is very general and includes a) BS clustering, b) UE scheduling, c) beamforming design, d) transmission rank selection and e) power allocation. A practical solution to solve (5) is described in Section III.

C. Third Phase: Downlink Data Transmission
After computing the solution to problem (5), BS clusters serve the scheduled UEs by using the N E − N T resource elements still available in block t. For the sake of clarity, in the rest of the paper we drop the block index t. By defining matrix , the signal received by UE k can be written as where s k ∼ CN (0 l k ×1 , diag (P k )) is the data symbol vector transmitted toward UE k and is the thermal noise at the UE antennas. We assume perfect CSI at the UE side, which employs IRC with successive interference cancellation (SIC) [24, Ch. 10]. We assume perfect detection, hence there is no error propagation. Note that IRC both minimizes the mean square error and maximizes the SINR at the detection point [22].
From (6), by defining the interference plus noise covariance matrix at the UE as the effective spectral efficiency achieved by UE k can be written as where in (8) the overhead due to the UE pilot transmission is taken into account in the scaling factor before the logarithm.
Remark: Note that the assumption of perfect CSI at the UE in the considered system does not limit the value of the developed analysis. Indeed, it has already been shown [25], [26]  Hence, in this work, we propose a practical solution to (5) the cluster of u BSs with the strongest average channel toward UE k, i.e., Hence, f k (1) is the anchor BS for UE k, i.e., the BS characterized by the highest average signal to noise ratio (SNR).
In a network with J BSs and a maximum cluster size of J MAX , the number of possible BS clusters that can be constructed to serve a given UE is which rapidly increases with J. However, as most of the interference at each UE comes from the closest BSs, we limit the number of candidate BS clusters that can be organized by the CU. We order the BS clusters (9) by an integer index c ∈ C and denote by J c cluster c. By imposing a constraint on the maximum cluster size, we assume that C includes all and only the sets J (u) k whose size is not bigger than J MAX . Note that the considered assumption yields an important saving in terms of computational complexity by strongly limiting the number of candidate clusters with respect to (10): this complexity saving is evaluated in Section IV for a typical LTE scenario.
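To give a feeling for how quickly an unrestricted search grows, the sketch below counts the BS subsets of size at most $J_{MAX}$. This assumes that the full search considers every non-empty subset of the $J$ BSs up to that size, which may differ in detail from the count used in the paper; the function name is hypothetical.

```python
from math import comb

def count_clusters(n_bs, max_size):
    """Number of non-empty BS subsets with at most max_size elements."""
    return sum(comb(n_bs, u) for u in range(1, max_size + 1))

# Scenario size used later in the paper: J = 21 BSs, J_MAX = 3.
full_search = count_clusters(21, 3)  # 21 + 210 + 1330 = 1561
```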
For each cluster J c , we define the corresponding set U c of UEs that can be scheduled for reception, which is formed by the UEs whose anchor BS belongs to J c , i.e., Note that (11) allows BSs in cluster J c to serve all the UEs in its coverage area, even UEs close to the border. Although a different choice could be taken for instance by forcing the cluster to serve only the UEs far away from the border, it has been shown in [27] that this alternative choice provides worse performance than (11) when a huge network is considered and fairness among the UEs is taken into account.
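The candidate construction described above can be sketched as follows: for every UE, the BSs are ranked by average channel strength, the top-$u$ sets for $u = 1, \ldots, J_{MAX}$ become candidate clusters, and each cluster may serve the UEs whose anchor BS it contains. The data layout (`gains[k][j]` for the large scale fading from BS $j$ to UE $k$) and all names are assumptions made for illustration.

```python
def candidate_clusters(gains, max_size):
    """Build candidate BS clusters and their schedulable UE sets.

    gains[k][j]: long-term gain from BS j to UE k (hypothetical layout).
    Returns (clusters, ue_sets) where clusters is a set of frozensets of BS
    indices and ue_sets maps each cluster to the UEs it may serve.
    """
    clusters = set()
    anchor = {}
    for k, row in enumerate(gains):
        # Rank BSs from strongest to weakest average channel toward UE k.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        anchor[k] = ranked[0]
        for u in range(1, max_size + 1):
            clusters.add(frozenset(ranked[:u]))
    # A cluster may schedule every UE whose anchor BS belongs to it.
    ue_sets = {c: {k for k, a in anchor.items() if a in c} for c in clusters}
    return clusters, ue_sets

# Toy example with 3 UEs and 3 BSs, clusters of at most 2 BSs.
gains = [[1.0, 0.5, 0.1],
         [0.2, 0.9, 0.8],
         [0.1, 0.3, 0.7]]
clusters, ue_sets = candidate_clusters(gains, 2)
```

In this toy example the pair {0, 2} never appears among the candidates because no UE sees BSs 0 and 2 as its two strongest, which is the mechanism that keeps the candidate set small.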
Then, a solution to problem (5) is obtained by applying the following two-step algorithm.
1) For each candidate BS cluster J c , we estimate the weighted sum rateR (c) by selecting a suitable subset of UEs S c ⊆ U c , designing precoders, selecting transmission ranks and allocating powers. With the aim of allowing a distributed implementation of the algorithm, we perform this optimization without requiring any information from the other candidate BS clusters.
2) After collecting the weighted sum rateR (c) from all the candidate BS clusters in C, the CU schedules a set of non-overlapping BS clusters, where each BS belongs to at most one cluster.
First, although a solution with overlapping clusters would provide higher rates, it would be much more challenging in terms of computational complexity, because of the higher number of available cluster combinations. Moreover, it would require that everything is optimized at the CU [17]. Hence, we focus here on the easier and more practical non-overlapping cluster solution.
Second, note that the proposed method can also be implemented in a fully centralized system, for instance by a star backhaul network topology [11], where the CU is directly connected by a low latency link to each BS. However, this situation is unlikely when the number of BSs J is high, and a more realistic scenario considers a backhaul network with direct links only among neighboring BSs [28]. In such a case, the developed scheme properly adapts to the backhaul infrastructure and, for example, set C could be partitioned into subsets, each with a BS responsible for estimating the weighted sum rate achieved by all the candidate BS clusters of that subset. Then, this BS would forward only the estimated weighted sum rates to the CU which is managing the whole network.
Moreover, based on (11) we observe that UE k can be selected only by clusters that include its anchor BS f k (1). Hence, if we enforce a non-overlapping solution, each UE is never scheduled by two different non-overlapping clusters in the same block. However, we highlight that the proposed dynamic solution allows the flexibility of scheduling a given UE in different clusters across successive blocks.
In the rest of this section we describe in more detail the two main steps of the algorithm.
We stress that the candidate cluster selection, i.e., the construction of set C depends on the large scale fading: hence, in our model, it should be performed only every T blocks. On the other hand, the two-step algorithm proposed to solve (5) follows a fast fading time-scale and therefore must be implemented in each block.

A. Cluster Weighted Sum Rate Estimation
whereî (c) k is the estimate of the inter-cluster interference (ICI) suffered by UE k. Note that the exact value ofî Note that (13) represents the average ICI power at the UE k when all the BSs outside cluster c are transmitting at full power [29, (2)].
Similarly to (7), from (12) we define the interference plus noise covariance matrix Last, BSs in J c yield an estimate of the weighted sum rateR (c) by solving the optimization problem (15), at the top of the page.
Maximization (15) is a well studied multi-user MIMO problem [30] involving a) UE selection, b) transmission rank selection, c) precoding design and d) power allocation. Note that the estimate of the rate achieved by UE k in (15a) is computed by assuming that the multiple receive antennas are exploited by performing SIC with IRC: the rank l k allocated to UE k is given by the number of columns of precoder G (c) k , whereas the remaining N − l k degrees of freedom are used to partially suppress the residual ICI. We propose to solve (15) under the following assumptions.
• Equal power is allocated to the streams sent toward the scheduled UEs, i.e., P k,l = P (c) , k ∈ S c , l = 0, 1, . . . , l k − 1, where P (c) can be analytically computed from (15b) as • Beamformers are designed by using the multiuser eigenmode transmission (MET) scheme [31], where the precoding matrix used to serve UE k is optimized with the aim of nullifying the interference toward the eigenmodes selected for the co-scheduled UEs m ∈ S c \ {k}.
In detail, letĤ k are arranged so that the ones selected for transmission toward UE k appear in the leftmost columns. By defining matrix , . . . , precoding matrix used to serve UE k satisfies constraintŝ

B. Clustering Optimization
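After collecting all the weighted sum rates $\hat{R}^{(c)}$, $c \in \mathcal{C}$, the CU schedules a set of non-overlapping clusters by maximizing the system weighted sum rate. In detail, by defining
$$x_c = \begin{cases} 1, & \text{CU schedules candidate cluster } \mathcal{J}_c, \\ 0, & \text{otherwise}, \end{cases}$$
and letting $a_{j,c} = 1$ if BS $j$ belongs to cluster $\mathcal{J}_c$ and $a_{j,c} = 0$ otherwise, we consider that each BS belongs to at most one cluster, i.e., we impose
$$\sum_{c \in \mathcal{C}} a_{j,c}\, x_c \leq 1, \quad j \in \mathcal{J}. \quad (19)$$
Therefore, at the CU the clustering optimization is performed by solving the following linear integer optimization problem
$$\max_{x_c,\, c \in \mathcal{C}} \; \sum_{c \in \mathcal{C}} \hat{R}^{(c)} x_c \quad \text{s.t. (19)}. \quad (20)$$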
Note that (20) differs from the optimization carried out in [15] where the objective function simply depends on the received power measured by the UEs.
Maximization (20) is the optimization version of the set packing problem, which is known to be NP-hard [32]. Hence, as exhaustive search is not a viable method to solve (20), we propose a greedy iterative algorithm, which is reported in Tab. I: the proposed solution selects at each iteration the best cluster (in terms of system weighted sum rate) and ends when each BS has been assigned to a cluster. In detail, let $\mathcal{C}^{(A)}(n)$ be the set of candidate clusters considered at iteration $n$. The algorithm starts by imposing $\mathcal{C}^{(A)}(1) \leftarrow \mathcal{C}$ and ends when $\mathcal{C}^{(A)}(n) = \emptyset$. Note that $\mathcal{C}^{(A)}(n)$ includes all the candidate clusters that do not overlap with the clusters scheduled in the previous iterations. At iteration $n$, we select the cluster $\mathcal{J}_w \in \mathcal{C}^{(A)}(n)$ that maximizes the per-BS weighted sum rate, i.e.,
$$w = \arg\max_{c:\, \mathcal{J}_c \in \mathcal{C}^{(A)}(n)} \frac{\hat{R}^{(c)}}{|\mathcal{J}_c|}, \quad (21)$$
and we remove from $\mathcal{C}^{(A)}(n)$ all the clusters that partially overlap with $\mathcal{J}_w$. Note that in criterion (21) we normalize the cluster weighted sum rate $\hat{R}^{(c)}$ by the number of BSs $|\mathcal{J}_c|$ included in the cluster, with the aim of scheduling big clusters only if this really provides a system performance improvement. Let us consider as an example the basic scenario with $J = 2$: by using (21), we schedule the cluster of 2 BSs only if the weighted sum rate achieved within this cluster is higher than the system weighted sum rate achieved by the SCP scheme, i.e., when the 2 BSs are uncoordinated.
Given the greedy solution to (20) obtained by applying the proposed algorithm, the set $\mathcal{S}$ of UEs scheduled in the current block is the union of the sets $\mathcal{S}_c$ selected within the scheduled clusters. Finally, the precoding matrix and power used to serve UE $k \in \mathcal{S}$ are the ones computed within the cluster as described in Section III-A.
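The greedy selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: candidate clusters are represented as non-empty `frozenset`s of BS indices, `rates` maps each candidate to its estimated weighted sum rate, and the function name and the numeric values are hypothetical.

```python
def greedy_clustering(rates):
    """Greedy heuristic for the set packing problem.

    rates: dict mapping a non-empty frozenset of BS indices to the estimated
    weighted sum rate of that candidate cluster.
    Returns the list of scheduled, mutually non-overlapping clusters.
    """
    available = dict(rates)
    scheduled = []
    while available:
        # Pick the candidate with the highest per-BS weighted sum rate.
        best = max(available, key=lambda c: available[c] / len(c))
        scheduled.append(best)
        # Drop every candidate sharing at least one BS with the selected one.
        available = {c: r for c, r in available.items() if not (c & best)}
    return scheduled

# Two-BS example: the pair is chosen only if its per-BS rate beats both single BSs.
single_0, single_1, pair = frozenset({0}), frozenset({1}), frozenset({0, 1})
weak_pair = greedy_clustering({single_0: 3.0, single_1: 2.0, pair: 5.5})
strong_pair = greedy_clustering({single_0: 3.0, single_1: 2.0, pair: 6.5})
```

In the first call the pair offers 2.75 per BS, below the 3.0 of BS 0 alone, so the two BSs are scheduled separately even though 5.5 exceeds the uncoordinated total of 5.0; this shows that the normalization makes the heuristic conservative about large clusters and that the greedy result is not guaranteed to be optimal. In the second call the pair offers 3.25 per BS and is selected.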

IV. NUMERICAL RESULTS
We consider a hexagonal cellular scenario where $J = 21$ BSs, each equipped with $M = 4$ antennas, are organized in 7 sites, each with 3 co-located BSs (see also Fig. 1). We consider 10 UEs randomly dropped in the coverage area of each BS, with $K = 210$ UEs overall. The power available at each BS is $P^{(BS)} = 46$ dBm, the power available at each UE is $P^{(UE)} = 23$ dBm and the thermal noise power is $\sigma_n^2 = -101$ dBm. The large scale fading $\sigma^2_{k,j}$ between BS $j$ and UE $k$ is modeled in terms of the distance $d_{k,j}$ between BS $j$ and UE $k$, the path-loss coefficient $\nu = 3.5$, the average SNR $\Gamma^{(CE)}_{dB} = 10$ dB of a UE at the cell edge, the lognormal shadowing $e^{\zeta_{k,j}}$ with 8 dB standard deviation, and the antenna gain $A(\theta_{k,j})$ as a function of the direction $\theta_{k,j}$ of UE $k$ with respect to the antennas of BS $j$, with $\theta_{3dB} = (70/180)\pi$ and $A_{s,dB} = 20$ dB [33, (21.3)]. We consider an inter-site distance of 500 m and a minimum distance $d_{min} = 35$ m between BSs and UEs. Wraparound is used to deal with boundary effects [34].
We also assume that channels are correlated by considering the popular Kronecker model [35]. By denoting with $R_{BS}$ the square correlation matrix of size $M$ at the BS, with $\mathrm{tr}(R_{BS}) = M$, and with $R_{UE}$ the square correlation matrix of size $N$ at the UE, with $\mathrm{tr}(R_{UE}) = N$, we can write
$$H_{k,j}(t) = R_{UE}^{1/2}\, \tilde{H}_{k,j}(t)\, R_{BS}^{1/2}, \quad (25)$$
where $\tilde{H}_{k,j}(t)$ is a matrix of size $N \times M$ whose entries are independent and identically distributed zero-mean complex Gaussian random variables with $\sigma^2_{k,j}$ as statistical power.
Results are obtained by simulating 100 UE drops and $T = 200$ block channel realizations for each UE drop. We assume that proportional fair scheduling [36] is implemented to provide fairness among UEs, i.e., $\alpha_k(t) = 1/\bar{R}_k(t)$, with $\bar{R}_k(t+1) = (1-\gamma)\bar{R}_k(t) + \gamma R_k(t)$, $t = 0, 1, \ldots, T-1$, where $\gamma = 0.1$ is the forgetting factor and we initialize $\bar{R}_k(1) = \log_2\left(1 + P\,\sigma^2_{k,j_k}/\sigma_n^2\right)$.
However, to allow the scheduler to reach a steady state, only the last T /2 channels of each UE drop are considered for system performance evaluation.
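The proportional fair weighting can be illustrated with the short sketch below, which computes the weights as the inverse of the average rates and applies the exponential averaging with forgetting factor $\gamma$. The function names and the example rates are hypothetical, and the average rates are assumed to be strictly positive.

```python
def pf_weights(avg_rates):
    """Proportional fair weights: inverse of each UE's average rate."""
    return [1.0 / r for r in avg_rates]

def pf_update(avg_rates, served_rates, gamma=0.1):
    """Exponential averaging of the rates with forgetting factor gamma.

    served_rates[k] is the rate obtained by UE k in the current block
    (0.0 if the UE was not scheduled).
    """
    return [(1.0 - gamma) * a + gamma * r for a, r in zip(avg_rates, served_rates)]

# Hypothetical example: UE 0 is served with rate 4.0, UE 1 is not scheduled.
avg = [2.0, 1.0]
weights_before = pf_weights(avg)       # [0.5, 1.0]
avg = pf_update(avg, [4.0, 0.0])       # approximately [2.2, 0.9]
weights_after = pf_weights(avg)
```

The weight of the UE that was not served grows from one block to the next, which is how the scheduler steers resources toward UEs that have recently been starved.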
We compare the developed scheme based on dynamic clustering (DC) against:
• single cell processing (SCP), where no cooperation is allowed among the BSs and each UE is served by its anchor BS (baseline scheme);
• intra-site cooperation (ISC), where we statically create 7 clusters, each one composed of 3 co-located BSs (see Fig. 2(a)), and each UE is served by the best site in terms of average SNR;
• static clustering (SC), where we still consider 7 static clusters, but with cooperation allowed among three BSs of three different sites as shown in Fig. 2(b).
Note that with ISC and SC there is no dynamic clustering, i.e., the clusters do not change over time adapting to the channel conditions. Moreover, we assume that UE scheduling, beamforming design, transmission rank selection and power allocation are performed by enforcing the assumptions described in Section III-A also with the static schemes: hence, UEs are served by using MET [31] with equal power allocation among the eigenmodes and greedy user selection within each cluster.
The proposed schemes are compared in terms of: • average cell rate, defined asR First, to evaluate the complexity saving achieved by the candidate cluster selection described at the beginning of Section III, we show in Tab. II the 5th, the 50th, and the 95th percentiles of the number |C| of candidate clusters considered with DC by assuming J MAX = 3. By adapting the candidate clusters to the long-term channel conditions with working assumption described at the beginning of Section III, we have a saving of about 80% in terms of |C| with respect to the full search (10): in fact, with our approach we ignore candidate clusters that include far apart BSs.

A. Effect of Multiple Antennas at the UEs
In this section we consider perfect CSI at the BSs, i.e.,Ĥ k,j (t) = H k,j (t) in (2), uncorrelated antennas, i.e., in (25) R BS = I M and R UE = I N , and we assume J MAX = 3 with DC for a fair comparison against ISC and SC in terms of maximum cluster size. In Fig.s 3 and 4 we report the average cell rate and the 5th percentile of the UE rate for three values of the number N of UE antennas, respectively. First, we observe an important performance improvement by adding This shows that in general most of the gain is due to the IRC and multi-stream transmission plays a non-negligible role only with DC. Then, we observe that ISC provides a moderate gain with respect to SCP in terms of cell rate, but almost no gain in terms of the 5th percentile of the UE rate, whereas the opposite happens with SC. Indeed ISC of Fig. 2(a), by only allowing cooperation among the sectors of the same site, partially helps the UEs close to the site border, which however get better performance with SC of Fig. 2(b). Moreover, we also observe that the performance gain achieved by DC over SCP decreases by adding more antennas at the UE side.
In fact, as the gain of using multiple antenna UEs is mainly due to the IRC which cancels ICI, the benefits of increasing N are seen more in a non-cooperative scenario, where the residual ICI is higher with respect to DC. In detail, from

B. Effect of Antenna Correlation
In Fig. 5 we consider N = 4, l (MAX) = 1, and perfect CSI at BSs, and we introduce correlation among UE antennas by assuming that R UE is a symmetric Toeplitz matrix whose first column is [R UE ] ·,0 = 1, β, . . . , β N −1 T , and plot the 5th percentile of the UE rate in terms of β. As  Moreover, we also observe that the gain achieved by DC over SCP decreases by decreasing β: in detail, this gain drops from 25% with β = 0.9 to 18% with β = 0.1. In fact, by decreasing the correlation among UE antennas, we improve the interference suppression capability of IRC.
These results confirm that it is not worthwhile to add more antennas at the UE when they are strongly correlated.
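As an illustration of the receive-side correlation considered in this subsection, the sketch below builds the symmetric Toeplitz matrix with entries $\beta^{|i-j|}$ and uses it to correlate the rows of an i.i.d. channel matrix. Two assumptions are made: the BS antennas are uncorrelated, and a Cholesky factor is used as the matrix square root, which yields the same second-order statistics as any other square root but may not be the choice made in the paper. All names are hypothetical and $0 \leq \beta < 1$ is assumed.

```python
def exp_correlation(n, beta):
    """Symmetric Toeplitz matrix whose first column is [1, beta, ..., beta^(n-1)]."""
    return [[beta ** abs(i - j) for j in range(n)] for i in range(n)]

def cholesky(a):
    """Lower-triangular factor L of a real symmetric positive definite matrix (L L^T = a)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = (a[i][i] - s) ** 0.5
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def colour_rows(low, h_iid):
    """Return low @ h_iid, which correlates the receive (row) dimension."""
    n, m = len(h_iid), len(h_iid[0])
    return [[sum(low[i][k] * h_iid[k][c] for k in range(n)) for c in range(m)]
            for i in range(n)]

# N = 4 strongly correlated UE antennas; the trace stays equal to N.
r_ue = exp_correlation(4, 0.9)
low = cholesky(r_ue)
h_iid = [[1.0 + 0.0j, 0.0 + 1.0j]] * 4   # placeholder i.i.d. draw, M = 2 columns
h_corr = colour_rows(low, h_iid)
```

For $\beta$ close to one the rows of the correlated channel become nearly proportional, so the extra receive antennas add few usable degrees of freedom for interference rejection.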

C. Effect of Cluster Size
In Fig. 6, by assuming $N = 1, 4$, $l^{(MAX)} = 1$, uncorrelated antennas and perfect CSI at BSs, we compare SCP and DC in terms of the 5th percentile of the UE rate for four values of the maximum cluster size $J_{MAX}$. An important gain is observed with CoMP by increasing $J_{MAX}$: for instance, the gain achieved by DC over SCP increases from 43% (18%) with $J_{MAX} = 3$ to about 84% (40%) with $J_{MAX} = 6$ when $N = 1$ ($N = 4$). These results show that although the strongest interferers are managed by CoMP with $J_{MAX} = 3$, the ICI suffered by UEs is still very high and strongly limits system performance. Hence, BS clusters of larger size should be employed whenever the backhaul infrastructure allows it.

D. Effect of Imperfect CSI at the BSs
In this section we assume that the CSI at the BSs is affected by noise (2) due to the finite number of resource elements N T allocated to the pilot transmissions in each block. After denoting with f d the maximum Doppler frequency and withτ rms the root-mean square delay spread of the channel, we define, respectively, the coherence bandwidth [37,Ch. 4] and coherence time [38,Ch. 4] of the channel as Note that above expressions are only used to determine the block size N E such that the channel can be modeled as uncorrelated between adjacent blocks. Indeed, if f d orτ rms increase, N E is reduced and this lowers the rate of each UE as given by (8). Due to the issues in obtaining a reliable CSI at the BSs in a high mobility scenario, in the following we consider f d = 5 Hz, which roughly corresponds to a mobile velocity of 2 km/h [33,Ch. 20]. In this section we also assume N = 4, l (MAX) = 1, uncorrelated antennas and J MAX = 3 with DC.
In Fig. 7 we consider the extended pedestrian A (EPA) model, which is a very low frequency selective channel with $\tau_{rms} = 43$ ns [33, Tab. 20.2]. In detail, we show the 5th percentile of the UE rate in terms of the ratio $N_T/N_E$, which represents the fraction of resources used for pilot transmission. We observe that in this case, rate performance close to the perfect CSI case can be achieved by properly increasing the value of $N_T$. Moreover, note that SCP approaches the best performance faster than CoMP schemes. In fact, while with SCP only the channels between a BS and its anchored UEs are used for precoding design, with CoMP precoders are optimized on the basis also of the channels between some other auxiliary BSs and these UEs. As these channels are generally characterized by a lower SNR with respect to the channel between a BS and its anchored UEs, more pilots are necessary to collect a reliable CSI at the transmit side.
In Fig. 8 we plot the 5th percentile of the UE rate with respect to the ratio $N_T/N_E$ for the very frequency selective extended typical urban (ETU) channel model, characterized by $\tau_{rms} = 991$ ns [33, Tab. 20.2]. In this case we observe that the rates increase with $N_T$ up to a maximum and then start decreasing. In fact, increasing the value of $N_T$ has two conflicting effects: a) from (3) a more reliable CSI is collected at the BSs, thus improving performance, and b) a lower number of resource elements is allocated to data transmission, thus reducing the achievable rate. For lower values of $N_T$ the effect of a better CSI dominates, whereas for higher values of $N_T$ the CSI is reliable enough for the SINR level of the UEs, and a further increase of the number of pilots represents only a waste of resources. Even if we are still considering a low mobility scenario, due to the higher frequency selectivity of the ETU channel model, no scheme reaches the rates achieved with perfect CSI. Then, as observed for the EPA model, the fraction of resources allocated to pilots necessary to reach the peak is lower for SCP ($N_T/N_E \approx 0.02$) than for DC ($N_T/N_E \approx 0.03$). As a consequence, by choosing for each scheme the value of $N_T$ which provides the best rate, the gain achieved by DC over SCP decreases with respect to the perfect CSI case to about 16%.
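The two conflicting effects of the pilot length can be reproduced qualitatively with a toy single-link model, sketched below. This is not the simulation used in the paper: it assumes a scalar channel, treats the channel estimation error as additional noise in the effective SINR, and uses hypothetical values for the block size and the SNRs. Here `pilot_snr_per_re` stands for $\sigma^2_{k,j} P^{(UE)}/(N\sigma_n^2)$, so that the estimation quality improves linearly with $N_T$.

```python
from math import log2

def toy_rate(n_pilot, n_block, pilot_snr_per_re, data_snr):
    """Toy spectral efficiency with pilot overhead and imperfect CSI."""
    # Fraction of the channel power captured by the MMSE estimate.
    rho = n_pilot * pilot_snr_per_re / (1.0 + n_pilot * pilot_snr_per_re)
    # Estimation error acts as extra noise that scales with the data power.
    sinr = rho * data_snr / (1.0 + (1.0 - rho) * data_snr)
    # Pre-log factor: only n_block - n_pilot resource elements carry data.
    return (1.0 - n_pilot / n_block) * log2(1.0 + sinr)

def best_pilot_length(n_block, pilot_snr_per_re, data_snr):
    """Pilot length maximizing the toy rate, searched exhaustively."""
    return max(range(1, n_block),
               key=lambda n: toy_rate(n, n_block, pilot_snr_per_re, data_snr))

# Hypothetical block of 1000 resource elements, weak pilots, 10 dB data SNR.
n_best = best_pilot_length(1000, 0.1, 10.0)
```

In this toy setting the rate first grows with the pilot length and then falls as the overhead dominates, so the best pilot length lies strictly inside the block, consistent in shape with the behaviour described above.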

V. CONCLUSIONS
In this paper we have considered a downlink CoMP-JP system and, by assuming a maximum cluster size, we have developed a dynamic clustering and resource allocation algorithm where the clusters change over time adapting to the channel conditions. We consider that UEs are equipped with multiple antennas, so that they can implement IRC and be served by means of a multi-stream transmission. The proposed algorithm first defines a set of candidate BS clusters depending on the large scale fading. Then, a two-step procedure is applied following a fast fading time scale: a) first, a weighted sum rate is estimated within each candidate cluster by performing UE selection and precoding, power and transmission rank selection, and then b) the CU schedules only a subset of these candidates by maximizing the system weighted sum rate. We highlight that joint optimization of transmitter and receiver is performed in the developed resource allocation scheme. Numerical results show that much higher rates can be achieved when UEs are equipped with multiple antennas. In fact, by reducing the level of interference suffered by UEs, the proposed approach exploits multi-stream transmission more than SCP. However, as most of the gain is due to the IRC, the gain achieved by the proposed approach with respect to SCP decreases as the number of UE antennas increases. Finally, when channel estimation is considered at the BSs, the gain promised in the perfect CSI scenario may be achieved only in part: in fact, a better estimate requires a longer training sequence and this lowers the system rate.