We consider the downlink of a coordinated multicell MU-MIMO HetNet. Several macro BSs are co-located at each macro site, whose coverage is partitioned into cells, each covered by an antenna array installed on a macro BS. We consider two network layouts, the first with six macrocells per site and the second with three macrocells per site. (For brevity, we refer to these as the “six-cell” and “three-cell” layouts, respectively.) The system model characteristics differ between the two and are described in Sections 2.1 and 2.2 for the six-cell and three-cell layouts, respectively. In both system models, omnidirectional pico BSs surround each macro site and overlay the macro coverage area. Each macro BS transmits with power \(P_{t}\), and the inter-site distance (ISD) between macro sites is fixed and denoted by D. Each macro BS is equipped with \(N_{\text{macro}}\) transmit antennas, while each pico BS has \(N_{\text{pico}}\) antennas. For simplicity, we assume \(N_{\text{macro}} = N_{\text{pico}} = N\). Since all macro and pico BSs in a cluster jointly transmit their signals, effectively forming one large virtual antenna array, this assumption is reasonable and does not affect the proposed methods or the analysis in this work. Making \(N_{\text{macro}}\) larger than \(N_{\text{pico}}\) would of course increase the system sum rate and the number of users that could be scheduled simultaneously, but it would also significantly (and likely needlessly) increase the complexity and running time of the simulations. Additionally, we assume perfect and instantaneous CSI, with both CSI and data shared across the network backhaul. A centralized processor is assumed to collect the CSI, e.g., via (idealized) feedback from the mobile users of the CSI they obtain from pilot reference signals; it also performs scheduling and coordinates the transmissions between BSs. The specifics of CSI gathering and coordination over the backhaul are outside the scope of this paper.
Layout 1: six-cell layout (hexagonal-shaped cooperating area)
In our first HetNet model, the six cells of each site together cover a hexagonal area. Each macro BS covers a 60° sector with a directional antenna. The macro site is surrounded by 12 low-power pico BSs that form picocells overlaying the macro coverage area (see Fig. 1). Of the six cells per site, two adjacent cells are coordinated at any given time, forming an effectively larger cell area. Each picocell coordinates within the same cluster as the macrocell it overlays. Without loss of generality, we may consider an arbitrary macro site (whose coverage area is shown in green) and the clusters it participates in (shown by the red dashed lines). Since each cluster contains two of the site's six cells, the BSs of any macro site contribute to three different clusters.
As depicted in Fig. 1, two clustering patterns are possible, in which different adjacent cells cooperate with each other. All cells within each thick red dashed hexagon coordinate the signals from their BSs to form a cluster; one example cluster in each pattern is emphasized in the figure for clarity. As Fig. 1a depicts, users located near the border of a cluster (for example, at location “1”) experience the poorest channel conditions from the BSs in the cooperating set. By rotating the clustering pattern by 60° around any macro site (see Fig. 1b), those previously poorly covered users are now in the middle of a cluster, i.e., they have better channel gains and higher achievable rates. Therefore, most users have a higher chance of being scheduled and of achieving reasonably good data rates for a fraction of the overall transmission time. Since users are scheduled primarily during their most favorable clustering pattern, their rates are higher than they would be otherwise. Averaging the throughput over all transmission periods and clustering patterns, the overall achievable sum rate of the users is thus improved.
There are K users uniformly distributed over the coverage area of each macro site, each equipped with M receive antennas. \(K_{c(i)}\) is the number of users assigned to cluster c(i), of which \(U_{c(i)}\) users are served, where i indexes the clustering pattern. Each cluster transmits coordinated data signals from all its BSs to its scheduled users.
Layout 2: three-cell layout (clover-leaf-shaped cooperating area)
The second HetNet model, which is more commonly used in LTE-Advanced design [32], is called the clover-leaf model. Each cell of a macro site is covered by a high-power BS located at a corner of the cell. The directional antenna of a macro BS covers a hexagonal cell within a 120° angle. Each macrocell is overlaid by four low-power omnidirectional pico BSs, located near the four edges of the macrocell that are most distant from the macro BS, as depicted in Fig. 2. Any three adjacent macrocells, together with their constituent picocells, may form a cluster if the macrocells share a corner that is not a macro site. Therefore, considering an arbitrary macro site and its three macrocells (shown in green in Fig. 2), the macro BSs may belong to two or three independent clusters (shown by the red dashed lines).
As depicted in Fig. 2, five different clustering patterns are possible. We again highlight one example cluster in each pattern for clarity. Users located near the edge of a cluster experience poor channel conditions from the BSs in the cooperating set, so their achievable rates are smaller than those of users in the middle of the cluster. By rotating the clustering pattern (see Fig. 2b), some of those previously poorly covered users are now in the middle of a cluster, while some users previously in the middle are now near the cluster edge. Placing the remaining cluster-edge users of Fig. 2a near the middle of a cluster requires further rotations, depicted in Fig. 2c–e. Therefore, in this layout, after five rotation intervals all users have had the opportunity to be in the middle of the cooperating area at least once; they thus have a higher chance of being scheduled and of achieving reasonably good data rates.
There are K users, each equipped with M receive antennas, uniformly distributed over each cell. This contrasts with the six-cell layout, in which K users are distributed over the coverage area of the entire macro site. Thus, in the three-cell layout there are 3K users per site; since each cluster likewise comprises three macrocells, \(K_{c(i)}=3K\).
For both layouts, other clustering patterns could in theory be used, e.g., by coordinating more macrocells in each cluster. Consider, however, the corners of each macrocell that do not contain a macro site. These locations have the worst SINR when no coordination occurs. Our patterns cluster the smallest possible number of macro BSs that still allows each of those corners to lie at the center of a cluster in one of the patterns. At the same time, the time between successive occasions on which a given corner lies at a cluster center (as the scheme rotates through the patterns) is also the smallest possible for the number of macrocells per cluster being used. (Note that there are two such corners per macrocell in the six-cell layout and five in the three-cell layout, leading to two and five patterns, respectively.)
Cluster rotation in general network layouts
Cluster rotation in general has two “rotation” aspects. The first can be viewed as a physical rotation. Consider the highlighted cluster in Figs. 1 and 2 (denoted by dashed and solid vertical lines, respectively). As the cluster patterns change, that cluster can be imagined as rotating around some location in the network. In the two cases depicted in Figs. 1 and 2, that location is the macro site in the middle of each subfigure, though this need not be the case in general. The second aspect is the periodic, round-robin rotation through a set of cluster patterns. This latter aspect applies to any arbitrary cell layout, whereas the first may not apply, or at least may not be as readily visible. For example, the five patterns in Fig. 2 could be ordered arbitrarily; the physical rotation would then be less apparent, but the rotation through the set of (reordered) patterns would still occur.
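The round-robin aspect can be captured by mapping each scheduling interval to a pattern index. The Python sketch below is illustrative only: it assumes 0-based interval and pattern indices, a hypothetical function name `active_pattern`, and a pattern duration `T_cl` in scheduling intervals (the quantity the text later denotes \(T_{\text{cl}}\)); the `order` list makes explicit that any fixed ordering of the patterns works.

```python
def active_pattern(t, T_cl, order):
    """Return the clustering pattern used in scheduling interval t (0-based).

    Patterns are visited round-robin in the sequence given by `order`,
    each held for T_cl consecutive scheduling intervals.
    """
    return order[(t // T_cl) % len(order)]

# Three-cell layout: five patterns, visited in "physical rotation" order ...
rotating = [active_pattern(t, T_cl=2, order=[0, 1, 2, 3, 4]) for t in range(12)]
# ... or in an arbitrary fixed order; the round-robin rotation still occurs.
reordered = [active_pattern(t, T_cl=1, order=[0, 3, 1, 4, 2]) for t in range(7)]
```

Here `rotating` is `[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 0, 0]`; the choice `T_cl=2` is arbitrary and only for illustration.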
While we investigate two regular grid-like cell layouts herein, cluster rotation can also be applied to more general, irregular layouts. For such layouts, it would first be necessary to determine sets of BSs for coordination and then assign them to different clustering patterns. This may not be as simple as with a regular layout, but it remains feasible given the BS locations and coverage areas and/or knowledge of where interference arises without coordination. Voronoi diagrams of order n [36], which identify the n nearest BSs at any given location, could help locate regions of coordinated BSs; the distances should also be weighted according to the type/tier of each transmitting node. The system can then rotate through those patterns just as in this work.
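As a rough illustration of the weighted order-n idea, the Python sketch below groups sample locations by the set of their n "closest" BSs. The weighting rule (each distance divided by a per-BS tier weight, so a larger weight makes a node look closer) and the function name are assumptions for illustration, not part of the scheme studied here; locations that share the same BS set approximate one cell of a weighted order-n Voronoi diagram, i.e., one candidate coordination region.

```python
import math
from collections import defaultdict

def weighted_order_n_regions(points, bs_positions, bs_weights, n):
    """Group sample locations by the set of their n 'closest' BSs.

    Hypothetical tier weighting: distance / weight, so a high-power macro BS
    with a larger weight appears closer than a pico BS at the same distance.
    Returns {frozenset of BS indices: [locations in that region]}.
    """
    regions = defaultdict(list)
    for p in points:
        ranked = sorted(
            range(len(bs_positions)),
            key=lambda b: math.dist(p, bs_positions[b]) / bs_weights[b],
        )
        regions[frozenset(ranked[:n])].append(p)
    return dict(regions)

# Three BSs; with equal weights, each location is labeled by its two nearest BSs.
bs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
regions = weighted_order_n_regions([(1.0, 2.0), (8.0, 1.0)], bs, [1.0, 1.0, 1.0], 2)
```

In practice, the points would be a dense grid over the service area, and the resulting BS sets would then be distributed among the clustering patterns.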
Complexity comparison of dynamic clustering and rotating clustering
Fully dynamic clustering (whether the scheme in [30] or otherwise) incurs significantly higher computational and signaling overhead. The system must determine and exchange the candidate BSs for each user, run some optimization or other routine to decide which BSs serve which user, and finally communicate these choices across the network and to the users. This could occur as often as every scheduling interval, though the system could also perform these operations less frequently. In contrast, rotating clustering requires almost none of these computations, since the sets of clusters are predetermined and known at all transmitting nodes. Its only additional overhead relative to static clustering is the same as the last stage of fully dynamic clustering, i.e., periodically informing the users of their new cluster.
Furthermore, rotating clustering reduces the complexity of cell association. With rotating clustering, the association of a user with a specific anchor BS has much less impact on the network’s operation (disregarding high user mobility and/or handoff, which are outside the scope of this work). Note that a user receives data from a macrocell and all picocells overlaying that macrocell. Borders between macrocells (where the received powers from the BSs of those cells are equal) are statistically identical; at times, a given cell border is also a cluster border, while at other times, it is not. Thus, a complicated cell association scheme is not required: whether a user chooses an anchor BS by closest distance, by highest average received power, with a tier-dependent association bias, etc., the performance of the scheme is by and large unchanged. Essentially, users can be considered associated with a cluster rather than with an individual cell; in terms of performance, associating with any one of the cells in the cluster is largely equivalent. There may still be considerations such as traffic offloading, but these would now be between clusters rather than between cells. In any event, such factors are beyond the scope of this paper.
Achievable weighted sum rate maximization and user scheduling
For both layouts, as stated earlier, averaging the throughput over all transmission periods and clustering patterns improves the overall achievable sum rate of the users, since users are scheduled primarily during their most favorable pattern. Defining \(T_{\text{cl}}\) as the duration of a clustering pattern in units of scheduling intervals, rotation to the next pattern occurs every \(T_{\text{cl}}\) scheduling intervals. Denoting by \(B_{c(i)}\) the number of BSs in the c(i)th cluster of the ith pattern, the aggregate downlink channel \(\mathbf {H}_{c(i),k} \in \mathcal {C}^{M\times B_{c(i)} N}\) of the kth user from all these \(B_{c(i)}\) BSs is defined by \(\mathbf{H}_{c(i),k}=\left[\mathbf{H}_{c(i),k}(1),\cdots,\mathbf{H}_{c(i),k}(B_{c(i)})\right]\), where \(\mathbf {H}_{c(i),k} (b) \in \mathcal {C}^{M\times N}\) denotes the downlink channel matrix between the kth user and the bth BS of the cluster. Each element of \(\mathbf{H}_{c(i),k}(b)\), denoted by \(h_{c(i),k}(b,m,n)\), is the complex downlink channel coefficient between the mth receive antenna of the kth user and the nth transmit antenna of the bth BS in the c(i)th cluster. This coefficient includes path loss, log-normal shadowing, and Rayleigh fading, and is modeled by
$$ {\begin{aligned} h_{c(i),k} (b,m,n)&=z_{c(i),k} (b,m,n)\\ &\quad\times \sqrt{\Gamma_{0} P_{t} (b)\! \left(\frac{R_{m}}{d_{c(i),k} (b)}\right)^{\alpha (b)} \! \rho_{c(i),k} (b)A(\theta,\! b)}. \end{aligned}} $$
(1)
Here, \(z_{c(i),k}(b,m,n)\) represents small-scale frequency-flat Rayleigh fading, modeled as an i.i.d. complex Gaussian random variable distributed as \(\mathcal {CN}(0,1)\). \(R_{m}\) is the reference distance (Footnote 2), and \(\Gamma_{0}\) is a scaling factor controlling the reference signal-to-noise ratio (SNR) at a distance of \(R_{m}\) in the boresight direction of the directional antenna. The distance between user k and BS b in cluster c(i) is \(d_{c(i),k}(b)\), and \(\alpha(b)\) is the path loss exponent for BS b. \(P_{t}(b)\) is the transmit power of BS b, and \(\rho_{c(i),k}(b)\) denotes the log-normal shadow fading coefficient with standard deviation \(\sigma_{\rho}\). The antenna pattern \(A(\theta,b)\) of a macro BS, where θ is the angle between the direction of interest and the boresight of the antenna at BS b, is defined as described in [32, 37]; \(A(\theta,b)\) equals unity for pico BSs, which have omnidirectional antennas.
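The following Python sketch draws one block \(\mathbf{H}_{c(i),k}(b)\) according to (1) and concatenates blocks into the aggregate channel \(\mathbf{H}_{c(i),k}\). Because the text defers the antenna pattern to [32, 37], the macro gain here assumes the widely used 3GPP-style horizontal pattern \(A(\theta) = -\min\left(12(\theta/\theta_{3\text{dB}})^{2}, A_{m}\right)\) dB with \(\theta_{3\text{dB}} = 70^\circ\) and \(A_{m} = 20\) dB; the shadowing is assumed to be \(\rho = 10^{X/10}\) with \(X \sim \mathcal{N}(0,\sigma_{\rho}^{2})\) in dB, drawn once per user-BS link. All function names and default values are illustrative assumptions.

```python
import numpy as np

def macro_antenna_gain(theta_deg, theta_3db=70.0, a_m=20.0):
    """Assumed 3GPP-style horizontal pattern as a linear power gain A(theta)."""
    att_db = min(12.0 * (theta_deg / theta_3db) ** 2, a_m)
    return 10.0 ** (-att_db / 10.0)

def channel_block(rng, M, N, gamma0, p_t, r_m, d, alpha, sigma_rho_db, a_gain):
    """Draw the M x N block H_{c(i),k}(b) following (1).

    Large-scale part: sqrt(Gamma0 * P_t * (R_m / d)^alpha * rho * A), with one
    log-normal shadowing draw shared by all antenna pairs of this link (assumed).
    Small-scale part: i.i.d. CN(0, 1) Rayleigh fading per antenna pair.
    """
    rho = 10.0 ** (rng.normal(0.0, sigma_rho_db) / 10.0)
    large_scale = np.sqrt(gamma0 * p_t * (r_m / d) ** alpha * rho * a_gain)
    z = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
    return z * large_scale

def aggregate_channel(blocks):
    """H_{c(i),k} = [H(1), ..., H(B)]: concatenate the M x N blocks column-wise."""
    return np.hstack(blocks)

rng = np.random.default_rng(1)
blocks = [channel_block(rng, 2, 4, 1.0, 1.0, 1.0, 1.5, 3.5, 8.0, 1.0) for _ in range(3)]
H = aggregate_channel(blocks)  # shape (M, B N) = (2, 12)
```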
All \(B_{c(i)}\) BSs of cluster c(i) cooperatively transmit the data vector \(\mathbf {s}_{c(i),k} \in \mathcal {C}^{M\times 1}\) for user k using the aggregate precoding matrix \(\mathbf {W}_{c(i),k} \in \mathcal {C}^{B_{c(i)} N\times M}\). The received signal \(\mathbf {y}_{k} \in \mathcal {C}^{M\times 1}\) of user k is given by
$$ {{\begin{aligned} {}\mathbf{y}_{k} = \mathbf{H}_{c(i),k}\sum_{j=1}^{U_{c(i)}}\mathbf{W}_{c(i),j}\mathbf{s}_{c(i),j} +\! \underbrace{\sum_{\check{c}(i)\neq c(i)}\mathbf{H}_{\check{c}(i),k}\sum_{\forall j}\mathbf{W}_{\check{c}(i),j}\mathbf{s}_{\check{c}(i),j}+\mathbf{n}_{k}}_{\mathbf{Z}_{c(i),k}}. \end{aligned}}} $$
(2)
The first term in (2) is the received signal from cluster c(i), to which the user belongs, while the second term is the interference from other clusters. Invoking the central limit theorem, the total interference from all clusters other than c(i), denoted by \(\check {c}(i)\neq c(i)\), is approximated by an M×1 complex Gaussian random vector with zero mean and standard deviation \(\sigma_{I}\). To estimate this standard deviation, all BSs outside cluster c(i) are assumed to transmit with full power, which represents the worst case for ICI. The interference from these BSs experienced at different locations within cluster c(i) is determined and averaged via Monte Carlo simulation over many channel realizations, and the standard deviation of these realizations is used as the value of \(\sigma_{I}\). The last term \(\mathbf {n}_{k} \in \mathcal {C}^{M\times 1}\) is a complex additive white Gaussian noise vector whose elements have zero mean and unit variance. The sum of interference and noise is denoted by \(\mathbf{Z}_{c(i),k}\); under the Gaussian interference approximation, it is a complex Gaussian random vector with zero mean and variance \(\sigma _{I}^{2} + 1\). For convenience, the interference-plus-noise power is normalized at the receiver, which is equivalent to applying the receive filter \(\mathbf {Q}_{r} = \left (\sigma _{I}^{2} + 1\right)^{-1/2}\mathbf {I}_{M}\). Hence, defining \(\tilde {\mathbf {H}}_{c(i),k}= \mathbf {Q}_{r}\mathbf {H}_{c(i),k}\) as the post-processed equivalent channel matrix and \(\tilde {\mathbf {Z}}_{c(i),k}= \mathbf {Q}_{r}\mathbf {Z}_{c(i),k}\) as the normalized interference plus noise, (2) becomes
$$ \tilde{\mathbf{y}}_{k} = \tilde{\mathbf{H}}_{c(i),k}\sum_{j=1}^{U_{c(i)}}\mathbf{W}_{c(i),j}\mathbf{s}_{c(i),j} + \tilde{\mathbf{Z}}_{c(i),k}. $$
(3)
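A minimal NumPy sketch of the \(\sigma_{I}\) estimate and the normalization leading to (3) follows. It assumes the out-of-cluster interference samples (generated with all outside BSs at full power) are already available as a complex array of shape (realizations, M); since the interference is modeled as zero mean, its per-element standard deviation is estimated as the root-mean-square magnitude. The function names are hypothetical.

```python
import numpy as np

def estimate_sigma_i(interference_samples):
    """Per-element standard deviation of zero-mean complex interference.

    interference_samples: complex array of shape (num_realizations, M).
    """
    return float(np.sqrt(np.mean(np.abs(interference_samples) ** 2)))

def normalize_channel(H, sigma_i):
    """Return Q_r H with Q_r = (sigma_I^2 + 1)^(-1/2) I_M.

    The same scaling applied to Z = interference + unit-variance noise
    yields a unit-variance normalized interference-plus-noise term.
    """
    M = H.shape[0]
    Q_r = np.eye(M) / np.sqrt(sigma_i ** 2 + 1.0)
    return Q_r @ H

# Synthetic interference with per-element variance 4 (sigma_I = 2), M = 2 antennas.
rng = np.random.default_rng(0)
shape = (200_000, 2)
interf = 2.0 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
noise = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
sigma_i = estimate_sigma_i(interf)
z_tilde = (interf + noise) / np.sqrt(sigma_i ** 2 + 1.0)  # Q_r applied to Z
```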
We use the SZF-DPC precoding technique, in which the encoding order of the users is crucial to maximizing the achievable weighted sum rate. Given a set of users with order \(\pi _{c(i)}^{j}\), and denoting the user encoded at position k by \(\pi _{c(i),k}^{j}\), the post-processed received signal can be expanded as
$$ \begin{aligned} \tilde{\mathbf{y}}_{\pi_{c(i),k}^{j}} & = \tilde{\mathbf{H}}_{c(i),\pi_{c(i),k}^{j}}\mathbf{W}_{c(i),\pi_{c(i),k}^{j}}\mathbf{s}_{c(i),\pi_{c(i),k}^{j}} \\ & \quad+ \tilde{\mathbf{H}}_{c(i),\pi_{c(i),k}^{j}}\sum\limits_{l<k}\mathbf{W}_{c(i),\pi_{c(i),l}^{j}}\mathbf{s}_{c(i),\pi_{c(i),l}^{j}} \\ & \quad+ \tilde{\mathbf{H}}_{c(i),\pi_{c(i),k}^{j}}\sum\limits_{l>k}\mathbf{W}_{c(i),\pi_{c(i),l}^{j}}\mathbf{s}_{c(i),\pi_{c(i),l}^{j}} \\ & \quad+ \tilde{\mathbf{Z}}_{c(i),\pi_{c(i),k}^{j}}. \end{aligned} $$
(4)
The two summations in the second and third lines of (4) represent the intra-cluster interference for user k. In SZF-DPC, the precoding matrix \(\mathbf {W}_{c(i),\pi _{c(i),k}^{j}}\) is constrained to lie in the null space of the channel matrices of all users encoded before \(\pi _{c(i),k}^{j}\), whose aggregate channel matrix is defined as \(\mathbf {H}_{k-1}=\left [\tilde {\mathbf {H}}_{c(i),\pi _{c(i),1}^{j}}^{T},\ldots,\tilde {\mathbf {H}}_{c(i),\pi _{c(i),k-1}^{j}}^{T} \right ]^{T}\). Because every user encoded after \(\pi _{c(i),k}^{j}\) is precoded within a null space that includes this user's channel, the precoding matrices cancel the intra-cluster interference in the third line of (4), while the remaining intra-cluster interference in the second line of (4) is removed by DPC. Using the singular value decomposition of \(\mathbf{H}_{k-1}\), the achievable rate \(R_{c(i),\pi _{c(i),k}^{j}}\) of a given ordered user \(\pi _{c(i),k}^{j}\) is given by
$$ \begin{aligned} {} R_{c(i),\pi_{c(i),k}^{j}} &= \log_{2}\left|\mathbf{I}_{M} + \left(\tilde{\mathbf{H}}_{c(i),\pi_{c(i),k}^{j}}\mathbf{V}_{k-1}^{0}\right) \right.\\ &\qquad \qquad\quad \times \left. \mathbf{Q}_{c(i),\pi_{c(i),k}^{j}}(\tilde{\mathbf{H}}_{c(i),\pi_{c(i),k}^{j}}\mathbf{V}_{k-1}^{0})^{H}\right|. \end{aligned} $$
(5)
\(\mathbf {Q}_{c(i),\pi _{c(i),k}^{j}}\) is the transmit covariance matrix for user \(\pi _{c(i),k}^{j}\) in cluster c(i), and the columns of \(\mathbf {V}_{k-1}^{0}\) form an orthonormal basis for the joint null space of \(\mathbf{H}_{k-1}\), i.e., of the channels of the users preceding \(\pi _{c(i),k}^{j}\) in the encoding order; \(\mathbf {V}_{0}^{0} \triangleq \mathbf {I}_{B_{c(i)}N}\).
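The null-space projection and the rate in (5) can be sketched with NumPy's SVD as follows. This is an illustration under the assumption that \(\mathbf{Q}\) is expressed in the reduced coordinates spanned by the columns of \(\mathbf{V}_{k-1}^{0}\), as in (5); the rank tolerance and the function names are our own choices.

```python
import numpy as np

def null_space_basis(H_prev, n_tx, tol=1e-10):
    """Orthonormal basis V^0_{k-1} for the null space of the stacked channel H_{k-1}.

    Returns the identity (V^0_0 = I) when no user has been encoded yet.
    """
    if H_prev is None or H_prev.shape[0] == 0:
        return np.eye(n_tx, dtype=complex)
    _, s, Vh = np.linalg.svd(H_prev)  # full_matrices=True: Vh is n_tx x n_tx
    rank = int(np.sum(s > tol * s.max()))
    return Vh[rank:].conj().T         # right singular vectors of zero singular values

def szf_dpc_rate(H_k, V0, Q):
    """Rate (5): log2 det(I_M + G Q G^H) with effective channel G = H_k V0."""
    G = H_k @ V0
    M = H_k.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(M) + G @ Q @ G.conj().T)
    return logdet / np.log(2)

# Two users with M = 2 antennas each, B_{c(i)} N = 6 cluster transmit antennas.
rng = np.random.default_rng(0)
H1 = rng.standard_normal((2, 6)) + 1j * rng.standard_normal((2, 6))
H2 = rng.standard_normal((2, 6)) + 1j * rng.standard_normal((2, 6))
V0 = null_space_basis(H1, 6)               # user 2 must not interfere with user 1
rate2 = szf_dpc_rate(H2, V0, np.eye(4) / 4)  # equal power over 4 null-space dimensions
```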
The throughput-maximizing criterion selects the scheduled vector of users that achieves the largest sum rate among all possible user vectors. Users with better channel gains are more likely to be selected by an MT scheduler, so users with poorer channel gains may be selected very infrequently (and potentially never), which is unfair. In PF scheduling, each user has a weight reflecting its priority for selection by the scheduler, and the scheduler adjusts each weight based on the user's history of average achievable rates. A PF scheduler favors users whose instantaneous rates are high relative to their average rates and uses a weighted sum rate as its scheduling metric, i.e., the combination of users with the maximum weighted sum rate is scheduled. If a user has been selected often, its weight for the next interval decreases (as its average rate increases), so its chance of being chosen in the next scheduling interval diminishes. Meanwhile, another user with a worse channel matrix may have a better opportunity to be scheduled in the next interval simply by having a higher weight. This method thus provides more fairness among all users in the network.
In each cluster, the maximum achievable weighted sum rate \(WSR_{c(i)}\) (Footnote 3) is given by
$$ {\begin{aligned} &\qquad {WSR}_{c(i)} = \max_{\pi_{c(i)}^j:j \in \{1,2,\cdots,U_{c(i)}!\}} \\ &\max_{\left\{\mathbf{Q}_{c(i),\pi_{c(i),k}^{j}}\right\}_{k\in \{1,\cdots,U_{c(i)}\}} : \mathbf{Q}_{c(i),\pi_{c(i),k}^{j}}\succeq \mathbf{0}, \ \sum\limits_{\forall k} Tr(\mathbf{Q}_{c(i),\pi_{c(i),k}^{j}})\leq 1} \\ &\qquad\qquad\qquad \sum_{k=1}^{U_{c(i)}} \mu_{c(i),\pi_{c(i),k}^{j}}(t)R_{c(i),\pi_{c(i),k}^{j}}(t) \end{aligned}} $$
(6)
where \(\mu _{c(i),\pi _{c(i),k}^{j}}(t)\) is the priority weight of the kth user during the tth scheduling interval in cluster c(i). In PF scheduling, for the lth user out of \(K_{c(i)}\), \(\mu _{c(i),l} (t) = 1/\bar {R}_{c(i),l} (t)\), where \(\bar {R}_{c(i),l} (t)\) is the average achievable data rate of the lth user at time t, averaged over a window of the past \(t_{c}\) intervals. In each interval, \(\bar {R}_{c(i),l}(t)\) (and thus \(\mu_{c(i),l}(t)\)) is updated by an exponential filter as
$$ \bar{R}_{c(i),l} (t+1) = \begin{cases} \left(1-\frac{1}{t_{c}}\right)\bar{R}_{c(i),l} (t) + \frac{R_{c(i),l}(t)}{t_{c}}, & \text{if the } l\text{th user is scheduled in interval } t,\\ \left(1-\frac{1}{t_{c}}\right)\bar{R}_{c(i),l} (t), & \text{otherwise.} \end{cases} $$
(7)
\(R_{c(i),l}(t)\) is the instantaneous rate of the lth user, obtained from (5) assuming the lth user is scheduled in position k of the ordered scheduling vector \(\pi _{c(i)}^{j}\). An important special case of achievable weighted sum rate maximization is MT, obtained by setting \(\mu_{c(i),l}\) to a constant 1 for all users. Let \(\pi^{*}\) denote the best ordered user vector. Then, for any clustering pattern i, the maximum average achievable weighted sum rate over the area of an arbitrary macrocell, averaged over the intervals t in which pattern i is used, is given by
$$ \begin{aligned} {}\mathbb{E}_{t} \!\left(WSR (t,i) \right) \,=\, \mathbb{E}_{t} \!\left(\sum\limits_{k=1}^{U_{c(i)}} \mu_{c(i),\pi_{c(i),k}^{*}}(t)R_{c(i),\pi_{c(i),k}^{*}}(t)\!\right)\!/ w_{c(i)} \end{aligned} $$
(8)
where \(w_{c(i)}\) is the number of macrocells in cluster c(i).
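The filter in (7) and the resulting PF weights can be sketched in Python as below. The function name is hypothetical, and the averages are assumed to be initialized to positive values (the text does not state the initialization) so that \(\mu = 1/\bar{R}\) is always defined. Replacing the returned weights with a constant 1 for every user would recover the MT scheduler.

```python
def pf_update(avg_rates, inst_rates, scheduled, t_c):
    """One interval of the exponential filter (7) for all users of a cluster.

    avg_rates:  current averages R_bar_l(t) (assumed > 0)
    inst_rates: instantaneous rates R_l(t) from (5)
    scheduled:  set of indices l scheduled in interval t
    Returns (R_bar_l(t+1) for all l, PF weights mu_l(t+1) = 1 / R_bar_l(t+1)).
    """
    new_avg = []
    for l, r_bar in enumerate(avg_rates):
        r = inst_rates[l] if l in scheduled else 0.0  # unscheduled users only decay
        new_avg.append((1.0 - 1.0 / t_c) * r_bar + r / t_c)
    weights = [1.0 / r for r in new_avg]
    return new_avg, weights

# User 0 is scheduled, user 1 is not: user 0's weight drops, user 1's rises.
avg, mu = pf_update([1.0, 1.0], [3.0, 5.0], scheduled={0}, t_c=10)
```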
To solve the optimization problem in (6) with \(R_{c(i),\pi _{c(i),k}^{j}}\) given by (5), the weights \(\mu _{c(i),\pi _{c(i),k}^{j}}(t)\) must be taken into account when calculating \(\mathbf {Q}_{c(i),\pi _{c(i),k}^{j}}\) with the water-filling algorithm, which allocates power over the eigenmodes of the block-diagonal matrix formed from the effective channel matrices \(\mathbf {G}_{c(i),\pi _{c(i),k}^{j}} = \tilde {\mathbf {H}}_{c(i),\pi _{c(i),k}^{j}}\mathbf {V}_{k-1}^{0}\) (Footnote 4). User selection within a cluster is performed by an SAS algorithm similar to the one we proposed in [35] (Algorithm 3 therein). For ease of reference, the pseudocode of the SAS algorithm is given in Algorithm 1 here. The main difference from [35] is that the solution values \(s_{\mathbf{x}}\) and \(s_{\hat {\mathbf {x}}}\) are now achievable weighted sum rates as per (6). The ordered user vectors \(\mathbf{x}\) and \(\hat {\mathbf {x}}\) are used as \(\pi _{c(i)}^{j}\) in (6); the rest of the algorithm is unchanged. The SAS algorithm runs separately, in parallel, for each cluster.
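For a fixed encoding order, each user's rate in (5) depends only on its own covariance matrix, and the users are coupled only through the total power constraint in (6); the weighted problem therefore reduces to water-filling over all eigenmodes with a common water level. A minimal sketch, assuming the eigenmode gains \(g_i\) are the nonzero eigenvalues of \(\mathbf{G}^{H}\mathbf{G}\) for each user's effective channel and that each gain carries its owner's weight \(\mu\): the KKT conditions give \(p_i = \max(0, \mu_i\eta - 1/g_i)\), and the level η is found here by bisection. Each user's covariance would then be \(\mathbf{U}\,\mathrm{diag}(\mathbf{p})\,\mathbf{U}^{H}\) with \(\mathbf{U}\) the corresponding eigenvectors (not shown). The function name is hypothetical.

```python
import numpy as np

def weighted_water_filling(gains, weights, total_power=1.0, iters=100):
    """Maximize sum_i w_i log2(1 + g_i p_i) s.t. sum_i p_i <= total_power, p_i >= 0.

    gains:   positive eigenmode gains g_i
    weights: PF weight of the user that owns each eigenmode
    """
    g = np.asarray(gains, dtype=float)
    w = np.asarray(weights, dtype=float)

    def power(eta):
        # KKT solution for a given water level eta
        return np.maximum(0.0, w * eta - 1.0 / g)

    lo, hi = 0.0, 1.0
    while power(hi).sum() < total_power:  # expand until the level overshoots the budget
        hi *= 2.0
    for _ in range(iters):                # bisection on the common water level
        mid = 0.5 * (lo + hi)
        if power(mid).sum() > total_power:
            hi = mid
        else:
            lo = mid
    return power(lo)                      # never exceeds the power budget

p_equal = weighted_water_filling([1.0, 1.0], [1.0, 1.0])     # splits power evenly
p_weighted = weighted_water_filling([1.0, 1.0], [3.0, 1.0])  # favors the heavier weight
```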
Two positive parameters, \(B_{1}\) and \(B_{2}\), limit the number of iterations and control how closely the algorithm approaches the optimal solution, trading off performance against complexity. The larger \(B_{1}\) and \(B_{2}\), the closer the algorithm comes to the optimal solution, but the longer it iterates and the higher its computational complexity. The SAS algorithm starts with the variable \(\tau_{t}\) (analogous to the temperature in metallurgical annealing) equal to \(\tau_{\text{hot}}\) and continues until \(\tau_{t}\) “cools” to \(\tau_{f} = \tau_{\text{hot}} \phi^{B_{1}}\), where ϕ is a uniformly distributed random variable in the interval (0,1). The neighborhood function \(N(\mathbf{x}(n))\) randomly either deletes a random user from the vector \(\mathbf{x}(n)\), adds a random unscheduled user (if possible), replaces one scheduled user with a random unscheduled one, or swaps the encoding-order positions of two random scheduled users; each of these actions is chosen with equal probability.
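Algorithm 1 itself is not reproduced in this section, so the following Python sketch only illustrates a generic simulated-annealing search of the kind described, under assumptions that are ours rather than the paper's: \(B_{1}\) is the number of cooling stages and \(B_{2}\) the number of candidate moves per stage; the temperature is multiplied by a factor ϕ ∈ (0,1) after each stage, so it ends at \(\tau_{\text{hot}}\phi^{B_{1}}\); worse moves are accepted with the Metropolis probability \(\exp((s_{\hat{\mathbf{x}}} - s_{\mathbf{x}})/\tau_{t})\); added users are appended to the end of the encoding order; and a deletion never empties the vector. The `score` callable is a hypothetical stand-in for evaluating (6) on an ordered user vector.

```python
import math
import random

def neighbor(x, num_users, max_users, rng):
    """N(x): delete, add, replace, or swap, each chosen with probability 1/4.

    An infeasible move returns an unchanged copy of x.
    """
    x = list(x)
    unscheduled = [u for u in range(num_users) if u not in x]
    move = rng.randrange(4)
    if move == 0 and len(x) > 1:                              # delete a random user
        x.pop(rng.randrange(len(x)))
    elif move == 1 and unscheduled and len(x) < max_users:    # add (appended, assumed)
        x.append(rng.choice(unscheduled))
    elif move == 2 and unscheduled:                           # replace with an unscheduled user
        x[rng.randrange(len(x))] = rng.choice(unscheduled)
    elif move == 3 and len(x) > 1:                            # swap two encoding positions
        i, j = rng.sample(range(len(x)), 2)
        x[i], x[j] = x[j], x[i]
    return x

def sas_schedule(num_users, max_users, score, tau_hot, phi, B1, B2, seed=0):
    """Simulated-annealing search for the ordered user vector maximizing score(x)."""
    rng = random.Random(seed)
    x = [rng.randrange(num_users)]
    s_x = score(x)
    best, s_best = list(x), s_x
    tau = tau_hot
    for _ in range(B1):
        for _ in range(B2):
            x_hat = neighbor(x, num_users, max_users, rng)
            s_hat = score(x_hat)
            # Accept improvements; accept worse moves with Metropolis probability.
            if s_hat >= s_x or rng.random() < math.exp((s_hat - s_x) / tau):
                x, s_x = x_hat, s_hat
                if s_x > s_best:
                    best, s_best = list(x), s_x
        tau *= phi  # cool down; after B1 stages tau = tau_hot * phi**B1
    return best, s_best

# Toy additive score: scheduling users 0 and 2 (and nobody else) is optimal.
values = [3.0, -1.0, 2.0, -2.0]
best, s_best = sas_schedule(4, 4, lambda x: sum(values[u] for u in x),
                            tau_hot=1.0, phi=0.9, B1=50, B2=50)
```

With an actual scheduler, `score` would run the weighted water-filling for the ordered vector and return its weighted sum rate; the toy score above only exercises the search mechanics.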
We refer the reader to [35] for more details on our SAS algorithm. Note that no change to the scheduling algorithm is required for rotating clustering; better cluster patterns for users are automatically detected by the algorithm through the corresponding more favorable channel gains and/or achievable rates during that pattern, making the users more likely to be scheduled during those better patterns.