Massive MIMO with Multi-cell MMSE Processing: Exploiting All Pilots for Interference Suppression

In this paper, a new state-of-the-art multi-cell MMSE scheme is proposed for massive MIMO networks, which includes an uplink MMSE detector and a downlink MMSE precoder. The main novelty is that it exploits all available pilots for interference suppression. Specifically, let $K$ and $B$ denote the number of users per cell and the number of orthogonal pilot sequences in the network, respectively, where $\beta = B/K$ is the pilot reuse factor. Then our multi-cell MMSE scheme utilizes all $B$ channel directions, which can be estimated locally at each base station, to actively suppress both intra-cell and inter-cell interference. The proposed scheme is particularly practical and general, since power control for the pilot and payload, imperfect channel estimation and arbitrary pilot allocation are all accounted for. Simulations show that significant spectral efficiency (SE) gains are obtained over the single-cell MMSE scheme and the multi-cell ZF scheme, particularly for large $\beta$ and/or $K$. Furthermore, large-scale approximations of the uplink and downlink SINRs are derived, which are asymptotically tight in the large-system limit. The approximations are easy to compute and very accurate even for small system dimensions. Using these SINR approximations, a low-complexity power control algorithm is also proposed to maximize the sum SE.


I. INTRODUCTION
Multi-user multiple-input multiple-output (MU-MIMO) communication has drawn considerable interest in recent years. By scheduling multiple users to share the spatial channel simultaneously, the spatial degrees of freedom offered by multiple antennas can be exploited to focus signals on intended receivers, reduce interference, and thereby increase the system data rate [1]-[6]. These features have led to MU-MIMO being incorporated into recent and evolving wireless standards such as 4G long-term evolution (LTE) and LTE-Advanced [7].
Massive MU-MIMO, or very large MU-MIMO, is an emerging technology that scales up MU-MIMO by orders of magnitude [8], [9]. The idea is to employ an array of, say, a hundred or more antennas at the base station (BS) and to serve tens of users simultaneously per cell. Compared to contemporary cellular systems, the system SE can be drastically increased without consuming extra bandwidth [7]-[9]. Uplink and downlink transmit power can also be reduced by an order of magnitude, since phase-coherent processing provides a comparable array gain [10]. In the limit of an infinite number of antennas, intra-cell interference and uncorrelated noise are averaged out by simple coherent precoders and detectors, and the only remaining performance limitations are pilot contamination and distortion noise from hardware impairments [8], [11]. Furthermore, in time division duplex (TDD) mode, the channel training overhead scales linearly with the number of users instead of the number of BS antennas, which allows antenna elements to be added without affecting the training overhead [12]. These features make massive MIMO one of the key technologies for next-generation wireless communication systems.
In uplink reception and downlink transmission, the most common linear processing schemes are matched filtering (MF), zero forcing (ZF) and minimum mean square error (MMSE).¹ Let $B$ denote the number of orthogonal pilot sequences that are available in the network, and $K$ the number of users in each cell. We can then define $\beta = B/K \geq 1$ as the pilot reuse factor, since only $1/\beta$ of the cells use the same set of pilots. In conventional massive MIMO systems, the BS first listens to the uplink pilot signalling from its own cell, estimates the $K$ intra-cell channels, and then constructs its transceiver processing based on the channel estimates to mitigate the intra-cell interference [13]-[16]. However, part of the inter-cell interference can also be suppressed when $\beta > 1$. If the BS is aware of all pilot sequences, it can locally estimate $B$ channel directions by listening to the pilot signalling from all cells instead of only its own cell. Since its $K$ users occupy only $K$ of the $B$ channel directions, the BS can select its user-specific detectors in the uplink to suppress interference from other cells, and design its precoders in the downlink to mitigate interference leakage to other cells. Based on similar observations, several multi-cell detection and precoding schemes have been proposed in [16]-[19].
In [17], a multi-cell ZF detector (referred to as full-pilot ZF detector in [17]) is proposed, which exploits and orthogonalizes all available directions to mitigate parts of the inter-cell interference.
It achieves a higher SE than conventional ZF when the interfering users are near the edges of the surrounding cells. In general cellular networks, however, the gain is less pronounced, partly because the array gain of multi-cell ZF is reduced by $B$ instead of by $K$ as with conventional ZF. Uplink multi-cell MMSE detectors are proposed in [16] and [19], but the former is limited to $\beta = 1$ and equal power allocation, and the latter relies on the unrealistic assumption that perfect channel state information (CSI) is available at the BS. The multi-cell MMSE precoder proposed in [18] brings a notable gain over single-cell processing. However, like [16], this scheme does not account for arbitrary pilot allocation, which, as shown in [17], is an important way to suppress pilot contamination and achieve high system SE in massive MIMO deployments. Moreover, no closed-form performance expressions are provided in [18].
In this paper, a new state-of-the-art multi-cell MMSE transceiver scheme is proposed, which includes an uplink MMSE detector and a downlink MMSE precoder. The novelty of the multi-cell MMSE scheme is that all $B$ pilots are exploited at each BS to actively suppress both intra-cell and inter-cell interference. Power control for the pilot and payload, imperfect channel estimation and arbitrary pilot allocation are all accounted for in our scheme. Numerical results show that significant SE gains can be obtained by the proposed scheme over conventional single-cell schemes and the multi-cell ZF from [17], and that the gains become more significant as $\beta$ and/or $K$ increase. Furthermore, large-scale approximations of the uplink and downlink SINRs are derived for the proposed multi-cell MMSE scheme, which are asymptotically tight in the large-system limit. The approximations are easy to compute, since they only depend on large-scale fading, power control and pilot allocation, and are shown to be very accurate even for small system dimensions. Based on the SINR approximations, a low-complexity iterative power control algorithm for sum SE maximization is proposed for the multi-cell MMSE scheme. Compared to the equal power allocation policy, our proposed algorithm significantly improves the system sum SE and also provides good user fairness.

¹ A special case of the downlink MMSE precoder is the regularized ZF (RZF) precoder, which is obtained when all the users in a cell have equal pathlosses [20]. Since this is generally not the case in cellular networks, RZF provides lower performance than the MMSE precoder and is not considered in this paper.
The paper is organized as follows: In Section II, we describe the system model and the construction of the multi-cell MMSE transceiver. Large-scale approximations of the uplink and downlink SINRs are derived in Section III. Based on the SINR approximations, a low complexity iterative power control algorithm is proposed in Section IV. Simulation results are provided in Section V before we conclude the paper in Section VI. All proofs are deferred to the appendix.
Notations: Boldface lower and upper case symbols represent vectors and matrices, respectively.

II. SYSTEM MODEL AND TRANSCEIVER DESIGN
We consider a synchronous massive MIMO cellular network with multiple cells. Each cell is assigned an index in the cell set $\mathcal{L}$, and the cardinality $|\mathcal{L}|$ is the number of cells. The BS in each cell is equipped with an array of $M$ antennas and serves $K$ single-antenna users within each coherence block. Assume that this time-frequency block consists of $T_c$ seconds and $W_c$ Hz, such that $T_c$ is smaller than the coherence time of all users and $W_c$ is smaller than the coherence bandwidth of all users. This leaves room for $S = T_c W_c$ transmission symbols per block, and the channels of all users remain constant within each block. Let $\mathbf{h}_{jlk} \in \mathbb{C}^{M \times 1}$ denote the channel response from user $k$ in cell $l$ to BS $j$ within a block, and assume that it is a realization from a zero-mean circularly symmetric complex Gaussian distribution:
$$\mathbf{h}_{jlk} \sim \mathcal{CN}\left(\mathbf{0}, d_j(\mathbf{z}_{lk})\, \mathbf{I}_M\right). \tag{1}$$
The vector $\mathbf{z}_{lk} \in \mathbb{R}^2$ is the geographical position of user $k$ in cell $l$, and $d_j(\mathbf{z})$ is an arbitrary function that accounts for the channel attenuation (e.g., path loss and shadowing) between BS $j$ and any user position $\mathbf{z}$. Since user positions change relatively slowly, $d_j(\mathbf{z}_{lk})$ is assumed to be known at BS $j$ for all $l$ and $k$.
We consider a TDD protocol, in which the downlink channels are estimated from uplink pilot signalling by exploiting channel reciprocity. Each transmission block is divided into two phases: 1) an uplink channel estimation phase, in which each BS estimates the CSI from uplink pilot signalling that occupies $B$ of the $S$ symbols in each block; and 2) an uplink and downlink payload data transmission phase, in which each BS processes the received uplink signal and the to-be-transmitted downlink signals using the estimated CSI. Let $\zeta^{\mathrm{ul}}$ and $\zeta^{\mathrm{dl}}$ denote the fixed fractions of the remaining $S - B$ symbols allocated to uplink and downlink payload data transmission, respectively. These fractions can be selected arbitrarily under the conditions that $\zeta^{\mathrm{ul}} + \zeta^{\mathrm{dl}} = 1$ and that $\zeta^{\mathrm{ul}}(S - B)$ and $\zeta^{\mathrm{dl}}(S - B)$ are positive integers. In what follows, the uplink channel estimation is discussed first, to lay a foundation for the transceiver design.
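The frame bookkeeping above can be sketched in a few lines of Python. The helper `payload_split` is a hypothetical name, not part of the paper; it assumes $\zeta^{\mathrm{dl}} = 1 - \zeta^{\mathrm{ul}}$, checks the two integrality conditions with a small floating-point tolerance, and also returns the prelog factor $1 - B/S$ that multiplies the SE expressions later on.

```python
def payload_split(S: int, B: int, zeta_ul: float):
    """Split one TDD block of S symbols into B pilots plus uplink/downlink payload.

    Returns (n_ul, n_dl, prelog) with n_ul = zeta_ul*(S-B), n_dl = zeta_dl*(S-B)
    and prelog = 1 - B/S. Raises ValueError if the conditions of Section II fail.
    """
    if not 0 < B < S:
        raise ValueError("need 0 < B < S so that payload symbols remain")
    zeta_dl = 1.0 - zeta_ul                      # zeta_ul + zeta_dl = 1
    n_ul = zeta_ul * (S - B)
    n_dl = zeta_dl * (S - B)
    for n in (n_ul, n_dl):
        # Both payload lengths must be positive integers (tolerance for float error).
        if n <= 0 or abs(n - round(n)) > 1e-9:
            raise ValueError("zeta * (S - B) must be a positive integer")
    return round(n_ul), round(n_dl), 1.0 - B / S


# S = 1000 as in Section V; B = beta * K with beta = 4, K = 10 (example values).
print(payload_split(1000, 40, 0.5))
```

With these example values, 960 payload symbols remain and are split equally between uplink and downlink.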

A. Uplink Channel Estimation
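In the uplink channel estimation phase, the collective received signal at BS $j$ is denoted as $\mathbf{Y}_j \in \mathbb{C}^{M \times B}$, where $B$ is the length of the pilot sequences (which also equals the number of orthogonal pilot sequences available in the network). Then $\mathbf{Y}_j$ can be expressed as
$$\mathbf{Y}_j = \sum_{l \in \mathcal{L}} \sum_{k=1}^{K} \sqrt{p_{lk}}\, \mathbf{h}_{jlk} \mathbf{v}_{i_{lk}}^T + \mathbf{N}_j, \tag{2}$$
where $\mathbf{h}_{jlk}$ is the channel response defined in (1), $p_{lk} \geq 0$ is the power control coefficient for the pilot of user $k$ in cell $l$, and $\mathbf{N}_j \in \mathbb{C}^{M \times B}$ contains independent and identically distributed (i.i.d.) elements that follow $\mathcal{CN}(0, \sigma^2)$. We assume that all pilot sequences originate from a predefined orthogonal pilot book, defined as
$$\mathcal{V} = \left\{ \mathbf{v}_1, \ldots, \mathbf{v}_B \in \mathbb{C}^{B \times 1} : \mathbf{v}_{b_1}^H \mathbf{v}_{b_2} = \begin{cases} B, & b_1 = b_2, \\ 0, & b_1 \neq b_2, \end{cases} \right\} \tag{3}$$
and let $i_{lk} \in \{1, \ldots, B\}$ denote the index of the pilot sequence used by user $k$ in cell $l$.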
Arbitrary pilot reuse is supported in our work by writing the relation between $B$ and $K$ as $B = \beta K$, where $\beta \geq 1$ is the pilot reuse factor. If the pilots are allocated wisely in the network, a larger $\beta$ results in a lower level of interference during pilot transmission, which is known as pilot contamination.
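Based on the received signal in (2), the MMSE estimate of the uplink channel $\mathbf{h}_{jlk}$ is [17]
$$\hat{\mathbf{h}}_{jlk} = \sqrt{p_{lk}}\, d_j(\mathbf{z}_{lk})\, \mathbf{Y}_j \left(\boldsymbol{\Psi}_j^*\right)^{-1} \mathbf{v}_{i_{lk}}^*, \tag{4}$$
where $\boldsymbol{\Psi}_j \in \mathbb{C}^{B \times B}$ specifies the covariance matrix of the vectorized received signal, $\mathbb{E}\{\mathrm{vec}(\mathbf{Y}_j)\,\mathrm{vec}(\mathbf{Y}_j)^H\} = \boldsymbol{\Psi}_j \otimes \mathbf{I}_M$, and is given by
$$\boldsymbol{\Psi}_j = \sum_{l \in \mathcal{L}} \sum_{k=1}^{K} p_{lk} d_j(\mathbf{z}_{lk})\, \mathbf{v}_{i_{lk}} \mathbf{v}_{i_{lk}}^H + \sigma^2 \mathbf{I}_B. \tag{5}$$
According to the orthogonality principle of MMSE estimation, the covariance matrix of the estimation error $\tilde{\mathbf{h}}_{jlk} = \mathbf{h}_{jlk} - \hat{\mathbf{h}}_{jlk}$ is given by
$$\mathbf{C}_{jlk} = \mathbb{E}\{\tilde{\mathbf{h}}_{jlk}\tilde{\mathbf{h}}_{jlk}^H\} = d_j(\mathbf{z}_{lk}) \left(1 - p_{lk} d_j(\mathbf{z}_{lk})\, \mathbf{v}_{i_{lk}}^H \boldsymbol{\Psi}_j^{-1} \mathbf{v}_{i_{lk}}\right) \mathbf{I}_M. \tag{6}$$
By defining
$$\alpha_{ji} = \frac{1}{B \sum_{(l,k):\, i_{lk} = i} p_{lk} d_j(\mathbf{z}_{lk}) + \sigma^2}, \tag{7}$$
which is a scalar satisfying $\mathbf{v}_{i_{lk}}^H \boldsymbol{\Psi}_j^{-1} \mathbf{v}_{i_{lk}} = B \alpha_{ji_{lk}}$ due to the orthogonality of the pilots, the estimation error covariance matrix in (6) can be expressed as
$$\mathbf{C}_{jlk} = d_j(\mathbf{z}_{lk}) \left(1 - p_{lk} d_j(\mathbf{z}_{lk})\, B \alpha_{ji_{lk}}\right) \mathbf{I}_M. \tag{8}$$
As pointed out in [17], the part $\mathbf{Y}_j (\boldsymbol{\Psi}_j^*)^{-1} \mathbf{v}_{i_{lk}}^*$ of the MMSE channel estimate in (4) depends only on which pilot sequence user $k$ in cell $l$ uses. Consequently, users who use the same pilot sequence have parallel estimated channels at each BS; only the amplitudes of their estimates differ. To show this explicitly, define the $M \times B$ matrix
$$\hat{\mathbf{H}}_{\mathcal{V},j} = \mathbf{Y}_j \left(\boldsymbol{\Psi}_j^*\right)^{-1} \left[\mathbf{v}_1^*, \ldots, \mathbf{v}_B^*\right], \tag{9}$$
which allows the channel estimate in (4) to be reformulated as
$$\hat{\mathbf{h}}_{jlk} = \sqrt{p_{lk}}\, d_j(\mathbf{z}_{lk})\, \hat{\mathbf{H}}_{\mathcal{V},j} \mathbf{e}_{i_{lk}} = \sqrt{p_{lk}}\, d_j(\mathbf{z}_{lk})\, \hat{\mathbf{h}}_{\mathcal{V},ji_{lk}}, \tag{10}$$
where $\mathbf{e}_i$ denotes the $i$th column of the identity matrix $\mathbf{I}_B$ and $\hat{\mathbf{h}}_{\mathcal{V},ji}$ is the $i$th column of $\hat{\mathbf{H}}_{\mathcal{V},j}$. The property that users with the same pilot have parallel estimated channels is utilized to derive new SE expressions in the sequel.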
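Notice that the estimated channel $\hat{\mathbf{h}}_{jlk}$ is also a zero-mean complex Gaussian vector, with distribution
$$\hat{\mathbf{h}}_{jlk} \sim \mathcal{CN}\left(\mathbf{0}, p_{lk} d_j^2(\mathbf{z}_{lk})\, B \alpha_{ji_{lk}} \mathbf{I}_M\right). \tag{11}$$
Define the covariance matrix of $\hat{\mathbf{h}}_{\mathcal{V},ji}$ as $\boldsymbol{\Phi}_{\mathcal{V},ji}$. Then, according to (10) and (11), $\boldsymbol{\Phi}_{\mathcal{V},ji} = \alpha_{ji} B \mathbf{I}_M$.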
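To make the estimator concrete, here is a minimal NumPy sketch of (2), (5), (7), (9) and (10) at one BS. It assumes a toy network with 3 cells, $K = 2$, $\beta = 2$, unit pilot powers, randomly drawn values of $d_j(\mathbf{z}_{lk})$, and a DFT pilot book, which is one possible choice satisfying (3); the names `estimate_at_bs` and `crandn` are illustrative. The sketch also exposes a step that is implicit in the text: because the pilots are orthogonal, $(\boldsymbol{\Psi}_j^*)^{-1}\mathbf{v}_i^* = \alpha_{ji}\mathbf{v}_i^*$, so each column of $\hat{\mathbf{H}}_{\mathcal{V},j}$ is simply $\alpha_{ji}\mathbf{Y}_j\mathbf{v}_i^*$.

```python
import numpy as np

rng = np.random.default_rng(0)


def crandn(*shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


# Toy setup (assumed values): 3 cells, K = 2 users per cell, beta = 2, so B = 4.
L, K, M, beta, sigma2 = 3, 2, 16, 2, 1.0
B = beta * K
pilot = np.array([[0, 1], [2, 3], [0, 1]])    # i_lk: cells 0 and 2 reuse pilots 0 and 1
p = np.ones((L, K))                            # pilot power coefficients p_lk
d = rng.uniform(0.1, 1.0, size=(L, L, K))      # d[j, l, k] = d_j(z_lk)
V = np.fft.fft(np.eye(B))                      # DFT pilot book: v_b^H v_c = B if b == c, else 0


def estimate_at_bs(j):
    """Return (Y_j, H_V_j, alpha_j, hhat) for BS j following (2)-(10)."""
    # Channels (1): h_jlk ~ CN(0, d_j(z_lk) I_M), stored row-wise as h[l, k, :].
    h = np.sqrt(d[j])[:, :, None] * crandn(L, K, M)
    # Received pilot signal (2): sum of sqrt(p_lk) h_jlk v_{i_lk}^T plus noise.
    Y = np.sqrt(sigma2) * crandn(M, B)
    for l in range(L):
        for k in range(K):
            Y = Y + np.sqrt(p[l, k]) * np.outer(h[l, k], V[:, pilot[l, k]])
    # Psi_j from (5).
    Psi = sigma2 * np.eye(B, dtype=complex)
    for l in range(L):
        for k in range(K):
            v = V[:, pilot[l, k]]
            Psi = Psi + p[l, k] * d[j, l, k] * np.outer(v, v.conj())
    # H_V,j from (9): column i is the estimated channel direction of pilot i.
    HV = Y @ np.linalg.solve(Psi.conj(), V.conj())
    # alpha_ji from (7).
    alpha = np.array([
        1.0 / (B * sum(p[l, k] * d[j, l, k]
                       for l in range(L) for k in range(K) if pilot[l, k] == i) + sigma2)
        for i in range(B)
    ])
    # Channel estimates (10): hhat[l, k] = sqrt(p_lk) d_j(z_lk) H_V,j e_{i_lk}.
    hhat = np.array([[np.sqrt(p[l, k]) * d[j, l, k] * HV[:, pilot[l, k]]
                      for k in range(K)] for l in range(L)])
    return Y, HV, alpha, hhat


Y, HV, alpha, hhat = estimate_at_bs(0)
# Users (0, 0) and (2, 0) share pilot 0, so their estimates are parallel.
ratio = hhat[2, 0] / hhat[0, 0]
print(np.allclose(ratio, ratio[0]))  # True
```

Since users $(0, 0)$ and $(2, 0)$ share pilot 0, their estimates differ only by the real scalar $d_0(\mathbf{z}_{20})/d_0(\mathbf{z}_{00})$.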

B. Uplink Multi-cell MMSE detector
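After the uplink channel estimation, during the uplink payload data transmission phase, the received signal $\mathbf{y}_j \in \mathbb{C}^{M \times 1}$ at BS $j$ is
$$\mathbf{y}_j = \sum_{l \in \mathcal{L}} \sum_{k=1}^{K} \sqrt{\tau_{lk}}\, \mathbf{h}_{jlk} x_{lk} + \mathbf{n}_j, \tag{12}$$
where $\tau_{lk}$ is the transmit power of the payload data from user $k$ in cell $l$, $x_{lk} \sim \mathcal{CN}(0, 1)$ is the transmitted signal from a Gaussian codebook, and $\mathbf{n}_j \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_M)$ is additive white Gaussian noise (AWGN). Different symbols are used for pilot power and payload power to allow for different power control policies. Let $\mathbf{g}_{jk}$ denote the linear detector used by BS $j$ for an arbitrary user $k$ in its cell; the detected signal $\hat{x}_{jk}$ is then
$$\hat{x}_{jk} = \mathbf{g}_{jk}^H \mathbf{y}_j. \tag{13}$$
By using (13), the following ergodic SE is achievable for this user [13]:
$$R^{\mathrm{ul}}_{jk} = \left(1 - \frac{B}{S}\right) \mathbb{E}_{\{\hat{\mathbf{h}}^{(j)}\}}\left\{\log_2\left(1 + \eta^{\mathrm{ul}}_{jk}\right)\right\}, \tag{14}$$
where $\mathbb{E}_{\{\hat{\mathbf{h}}^{(j)}\}}\{\cdot\}$ denotes the expectation with respect to all the channel estimates obtained at BS $j$, and the SINR $\eta^{\mathrm{ul}}_{jk}$ is given by
$$\begin{aligned} \eta^{\mathrm{ul}}_{jk} &= \frac{\tau_{jk} |\mathbf{g}_{jk}^H \hat{\mathbf{h}}_{jjk}|^2}{\mathbb{E}\{|\mathbf{g}_{jk}^H \mathbf{y}_j|^2 \,|\, \hat{\mathbf{h}}^{(j)}\} - \tau_{jk} |\mathbf{g}_{jk}^H \hat{\mathbf{h}}_{jjk}|^2} \\ &= \frac{\tau_{jk} |\mathbf{g}_{jk}^H \hat{\mathbf{h}}_{jjk}|^2}{\mathbf{g}_{jk}^H \Big( \sum\limits_{(l,m) \neq (j,k)} \tau_{lm} \hat{\mathbf{h}}_{jlm} \hat{\mathbf{h}}_{jlm}^H + \sum\limits_{l \in \mathcal{L}} \sum\limits_{m=1}^{K} \tau_{lm} \mathbf{C}_{jlm} + \sigma^2 \mathbf{I}_M \Big) \mathbf{g}_{jk}}, \end{aligned} \tag{15}$$
where $\mathbb{E}\{\cdot \,|\, \hat{\mathbf{h}}^{(j)}\}$ denotes the conditional expectation given all the estimated channels at BS $j$. Since only imperfect channel estimates are available, the SE in (14) is achieved by treating $\mathbf{g}_{jk}^H \hat{\mathbf{h}}_{jjk}$ as the true channel, and treating uncorrelated interference and channel uncertainty as worst-case Gaussian noise [13]. Thus, $R^{\mathrm{ul}}_{jk}$ is a lower bound on the uplink ergodic capacity. The second line of (15) shows that the uplink SINR takes the form of a generalized Rayleigh quotient. Therefore, a new multi-cell MMSE (M-MMSE) detector that maximizes this SINR for given channel estimates can be derived:
$$\mathbf{g}^{\mathrm{M\text{-}MMSE}}_{jk} = \Big( \sum_{l \in \mathcal{L}} \sum_{m=1}^{K} \tau_{lm} \left( \hat{\mathbf{h}}_{jlm} \hat{\mathbf{h}}_{jlm}^H + \mathbf{C}_{jlm} \right) + \sigma^2 \mathbf{I}_M \Big)^{-1} \hat{\mathbf{h}}_{jjk}. \tag{16}$$
As the name suggests, this detector (with an appropriate scaling) also minimizes the mean square error (MSE) in estimating $x_{jk}$ [21]:
$$\sqrt{\tau_{jk}}\, \mathbf{g}^{\mathrm{M\text{-}MMSE}}_{jk} = \arg\min_{\mathbf{g}} \mathbb{E}\left\{ |x_{jk} - \mathbf{g}^H \mathbf{y}_j|^2 \,\big|\, \hat{\mathbf{h}}^{(j)} \right\}. \tag{17}$$
By plugging (8) and (10) into (16), the M-MMSE detector can also be expressed as
$$\mathbf{g}^{\mathrm{M\text{-}MMSE}}_{jk} = \sqrt{p_{jk}}\, d_j(\mathbf{z}_{jk}) \Big( \hat{\mathbf{H}}_{\mathcal{V},j} \boldsymbol{\Lambda}_j \hat{\mathbf{H}}_{\mathcal{V},j}^H + \Big( \sum_{l \in \mathcal{L}} \sum_{m=1}^{K} \tau_{lm} d_j(\mathbf{z}_{lm}) \left(1 - p_{lm} d_j(\mathbf{z}_{lm}) B \alpha_{ji_{lm}}\right) + \sigma^2 \Big) \mathbf{I}_M \Big)^{-1} \hat{\mathbf{h}}_{\mathcal{V},ji_{jk}}, \tag{18}$$
where $\boldsymbol{\Lambda}_j = \sum_{l \in \mathcal{L}} \sum_{m=1}^{K} \tau_{lm} p_{lm} d_j^2(\mathbf{z}_{lm})\, \mathbf{e}_{i_{lm}} \mathbf{e}_{i_{lm}}^H$ is a diagonal matrix whose $i$th diagonal element $\lambda_{ji}$ depends on the large-scale fading and on the pilot and payload powers of the users that use the $i$th pilot:
$$\lambda_{ji} = \sum_{(l,m):\, i_{lm} = i} \tau_{lm} p_{lm} d_j^2(\mathbf{z}_{lm}).$$
To elaborate on the advantages of our M-MMSE scheme, we compare it with related work.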
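The following sketch evaluates the SINR (15) for an arbitrary detector and builds the M-MMSE detector (16) from a set of estimates. It assumes, as in (8), that every error covariance is a scaled identity $\mathbf{C}_{jlm} = c_{lm}\mathbf{I}_M$, flattens the user index $(l, m)$ into a single row index, and uses random stand-ins for the estimates; the function names are illustrative. Any positive scaling of the detector (such as the $\sqrt{\tau_{jk}}$ in (17)) leaves (15) unchanged.

```python
import numpy as np


def total_cov(hhat, tau, c, sigma2):
    """Q = sum_n tau_n (hhat_n hhat_n^H + c_n I_M) + sigma^2 I_M.

    This is E{y_j y_j^H | estimates} for (12) when every error covariance is
    a scaled identity C_n = c_n I_M, as in (8). hhat has one row per user.
    """
    M = hhat.shape[1]
    return (hhat.T * tau) @ hhat.conj() + (np.sum(tau * c) + sigma2) * np.eye(M)


def sinr_ul(g, hhat, tau, c, sigma2, target):
    """Uplink SINR (15) of detector g for the user in row `target`."""
    Q = total_cov(hhat, tau, c, sigma2)
    signal = tau[target] * np.abs(np.vdot(g, hhat[target])) ** 2
    # Denominator of (15): E{|g^H y|^2 | estimates} minus the signal term.
    return signal / (np.real(np.vdot(g, Q @ g)) - signal)


def m_mmse(hhat, tau, c, sigma2, target):
    """M-MMSE detector (16): Q^{-1} hhat_target, using all users' estimates."""
    return np.linalg.solve(total_cov(hhat, tau, c, sigma2), hhat[target])


rng = np.random.default_rng(1)
N, M, sigma2 = 6, 8, 1.0          # N = all users (l, m) seen by BS j, flattened
hhat = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
tau = rng.uniform(0.5, 2.0, N)    # payload powers tau_lm (assumed)
c = rng.uniform(0.0, 0.3, N)      # error-covariance scalars C_n = c_n I_M (assumed)
g = m_mmse(hhat, tau, c, sigma2, target=0)
# Compare with the MR detector g = hhat_target.
print(sinr_ul(g, hhat, tau, c, sigma2, 0), sinr_ul(hhat[0], hhat, tau, c, sigma2, 0))
```

Because (15) is a generalized Rayleigh quotient, the SINR achieved by (16) equals $\tau_{jk}\hat{\mathbf{h}}_{jjk}^H(\mathbf{Q} - \tau_{jk}\hat{\mathbf{h}}_{jjk}\hat{\mathbf{h}}_{jjk}^H)^{-1}\hat{\mathbf{h}}_{jjk}$, where $\mathbf{Q}$ is the matrix returned by `total_cov`; this gives a convenient numerical check of an implementation.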
First, the conventional single-cell MMSE (S-MMSE) detector from [13]-[15] is
$$\mathbf{g}^{\mathrm{S\text{-}MMSE}}_{jk} = \Big( \sum_{m=1}^{K} \tau_{jm} \left( \hat{\mathbf{h}}_{jjm} \hat{\mathbf{h}}_{jjm}^H + \mathbf{C}_{jjm} \right) + \mathbf{Z}_j + \sigma^2 \mathbf{I}_M \Big)^{-1} \hat{\mathbf{h}}_{jjk}, \tag{19}$$
where inter-cell interference is either ignored by setting $\mathbf{Z}_j = \mathbf{0}$ or only considered statistically as
$$\mathbf{Z}_j = \sum_{l \in \mathcal{L} \setminus \{j\}} \sum_{m=1}^{K} \tau_{lm} d_j(\mathbf{z}_{lm})\, \mathbf{I}_M. \tag{20}$$
Notice that the S-MMSE detector in (19) is not a pure single-cell detector if $\mathbf{Z}_j$ in (20) is used, since statistical information about the multi-cell interfering channels is utilized in $\mathbf{Z}_j$. We refer to it as a "single-cell" detector because it only utilizes the $K$ estimated channel directions from within the serving cell and treats the directions from other cells as uncorrelated noise. In comparison, all the $B$ available estimated directions in $\hat{\mathbf{H}}_{\mathcal{V},j}$ are utilized in our M-MMSE detector, so that BS $j$ can also actively suppress part of the inter-cell interference when $B > K$. Therefore, our detector maximizes the SINR in (15), while the S-MMSE detector does so only in single-cell scenarios. The M-MMSE scheme can be seen as a coordinated beamforming scheme, but since there is no signalling between the BSs (BS $j$ estimates $\hat{\mathbf{H}}_{\mathcal{V},j}$ from the uplink pilots), the M-MMSE scheme is fully scalable.
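A small numerical comparison of (16) with (19) and (20) illustrates the claim that the M-MMSE detector maximizes (15) while S-MMSE does not. The sketch assumes random stand-ins for the estimates (it does not model pilot sharing), error covariances $0.2\, d_j(\mathbf{z}_{lm})\mathbf{I}_M$, unit payload powers, and the statistical form (20) of $\mathbf{Z}_j$; all numbers and names are illustrative.

```python
import numpy as np


def cov(hhat, tau, extra, sigma2):
    """sum_n tau_n hhat_n hhat_n^H + (extra + sigma^2) I_M, with `extra` a scalar."""
    M = hhat.shape[1]
    return (hhat.T * tau) @ hhat.conj() + (extra + sigma2) * np.eye(M)


def sinr_ul(g, hhat, tau, c, sigma2, target):
    """Uplink SINR (15) for detector g, with error covariances C_n = c_n I_M."""
    Q = cov(hhat, tau, np.sum(tau * c), sigma2)
    s = tau[target] * np.abs(np.vdot(g, hhat[target])) ** 2
    return s / (np.real(np.vdot(g, Q @ g)) - s)


rng = np.random.default_rng(2)
K, L, M, sigma2 = 3, 3, 8, 1.0
N = K * L                                           # rows 0..K-1: users in the serving cell j
d = np.concatenate([np.ones(K), rng.uniform(0.05, 0.5, N - K)])   # d_j(z_lm), assumed
noise = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
hhat = np.sqrt(d)[:, None] * noise / np.sqrt(2)     # random stand-ins for the estimates
tau = np.ones(N)                                    # payload powers tau_lm
c = 0.2 * d                                         # assumed error-covariance scalars

# M-MMSE (16): uses the estimates of all N users (the multi-cell view).
g_m = np.linalg.solve(cov(hhat, tau, np.sum(tau * c), sigma2), hhat[0])

# S-MMSE (19)-(20): intra-cell estimates only; other cells enter through Z_j.
Z = np.sum(tau[K:] * d[K:])                         # Z_j = sum_{l != j} sum_m tau_lm d_j(z_lm) I_M
Q_s = cov(hhat[:K], tau[:K], np.sum(tau[:K] * c[:K]) + Z, sigma2)
g_s = np.linalg.solve(Q_s, hhat[0])

print(sinr_ul(g_m, hhat, tau, c, sigma2, 0), sinr_ul(g_s, hhat, tau, c, sigma2, 0))
```

Whatever the random draw, the first printed value is at least as large as the second, since (16) maximizes (15) for the given estimates.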
Compared with the multi-cell MMSE schemes proposed in [16] and [19], our detector is more practical and general. To begin with, power control and any fractional pilot reuse policy are supported in our scheme, which allows for an analysis based on a more flexible and practical network deployment. It is shown in [17] that fractional pilot reuse is an important way to suppress pilot contamination and achieve high system SE in massive MIMO systems. Furthermore, the uplink detector in [19] relies on the unrealistic assumption that perfect CSI is known at each BS, while imperfect channel estimation is accounted for in our detector. Thus, the performance gains provided by our detector are obtained with the kind of CSI that is available in practical systems. This makes our new M-MMSE detector the state-of-the-art method for massive MIMO detection. In Section III, an explicit large-scale approximation of the SINR in (15) is provided, which allows for simple performance analysis and the design of resource allocation schemes without time-consuming Monte Carlo simulations.

C. Downlink Multi-cell MMSE Precoder
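During the downlink payload data transmission, the received signal at user $k$ in cell $j$ is
$$y_{jk} = \sum_{l \in \mathcal{L}} \sum_{m=1}^{K} \sqrt{\varrho_{lm}}\, \mathbf{h}_{ljk}^H \mathbf{w}_{lm} s_{lm} + n_{jk}, \tag{21}$$
where $\mathbf{w}_{lm} \in \mathbb{C}^{M \times 1}$ is the precoder used by BS $l$ for user $m$ in its cell, $s_{lm} \sim \mathcal{CN}(0, 1)$ is the payload data symbol for user $m$ in cell $l$, $\varrho_{lm}$ is the corresponding downlink transmit power coefficient, and $n_{jk} \sim \mathcal{CN}(0, 1)$ is AWGN.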
Recently, an uplink-downlink duality for massive MIMO systems was established in [17], which proves that, with proper downlink power control, the uplink SEs can also be achieved in the downlink if each downlink precoder is a scaled version of the corresponding uplink detector. Since the M-MMSE detector proposed in Subsection II-B is the state-of-the-art uplink method, we apply the same methodology for downlink precoding. The downlink M-MMSE precoder is constructed as
$$\mathbf{w}^{\mathrm{M\text{-}MMSE}}_{jk} = \frac{\mathbf{g}^{\mathrm{M\text{-}MMSE}}_{jk}}{\sqrt{\gamma_{jk}}}, \tag{22}$$
where $\gamma_{jk} = \mathbb{E}\{\|\mathbf{g}^{\mathrm{M\text{-}MMSE}}_{jk}\|^2\}$ normalizes the average transmit power for user $k$ in cell $j$. Since there are no downlink pilots in the TDD protocol, the users do not know their current channels but can learn their statistical equivalent channels, $\sqrt{\varrho_{jk}}\, \mathbb{E}_{\{\mathbf{h}\}}\{\mathbf{h}_{jjk}^H \mathbf{w}_{jk}\}$, and the total interference variance. Consequently, the downlink SE
$$R^{\mathrm{dl}}_{jk} = \left(1 - \frac{B}{S}\right) \log_2\left(1 + \eta^{\mathrm{dl}}_{jk}\right) \tag{23}$$
can be achieved for user $k$ in cell $j$ [13], [17], where $\eta^{\mathrm{dl}}_{jk}$ is
$$\eta^{\mathrm{dl}}_{jk} = \frac{\varrho_{jk} \left|\mathbb{E}_{\{\mathbf{h}\}}\{\mathbf{h}_{jjk}^H \mathbf{w}_{jk}\}\right|^2}{\sum_{l \in \mathcal{L}} \sum_{m=1}^{K} \varrho_{lm} \mathbb{E}_{\{\mathbf{h}\}}\{|\mathbf{h}_{ljk}^H \mathbf{w}_{lm}|^2\} - \varrho_{jk} \left|\mathbb{E}_{\{\mathbf{h}\}}\{\mathbf{h}_{jjk}^H \mathbf{w}_{jk}\}\right|^2 + 1}. \tag{24}$$
This downlink SINR holds for any linear precoding scheme, and we omit the superscript "M-MMSE" of $\mathbf{w}_{jk}$ for brevity. The SE in (23) is achieved by treating $\mathbb{E}_{\{\mathbf{h}\}}\{\mathbf{h}_{jjk}^H \mathbf{w}_{jk}\}$ as the true channel, and treating interference and channel variations as worst-case uncorrelated Gaussian noise. Thus, $R^{\mathrm{dl}}_{jk}$ is a lower bound on the downlink ergodic capacity. By utilizing all the available estimated directions, the M-MMSE precoder can suppress intra-cell interference and also reduce the interference caused to other cells, so a higher SINR can be expected with our precoder than with conventional single-cell precoders, at least with appropriate power control [17]. In the next section, a large-scale approximation of the downlink SINR in (24) is derived. In [18], the authors also proposed a multi-cell MMSE precoder, which brings a notable gain over single-cell processing, but it does not account for arbitrary or optimized pilot allocation. Moreover, no closed-form performance expression is provided in [18].
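Since the users only know statistical quantities, (22) and (24) are naturally evaluated by Monte Carlo averaging. The sketch below assumes that samples of $\mathbf{g}_{jk}$ and of the effective gains $\mathbf{h}_{ljk}^H\mathbf{w}_{lm}$ have already been generated; `normalize_precoders` and `sinr_dl` are illustrative names, and sample means stand in for the expectations.

```python
import numpy as np


def normalize_precoders(g_samples):
    """w = g / sqrt(gamma) as in (22), with gamma = E{||g||^2} estimated by a sample mean.

    g_samples: (n_realizations, M) detector vectors g_jk over channel realizations.
    """
    gamma = np.mean(np.sum(np.abs(g_samples) ** 2, axis=1))
    return g_samples / np.sqrt(gamma)


def sinr_dl(gains, rho, target):
    """Downlink SINR (24) from Monte Carlo samples.

    gains: (n_realizations, U) samples of h_ljk^H w_lm, one column per stream
    (l, m) received by user (j, k); rho: (U,) power coefficients varrho_lm;
    target: column of the user's own stream (l, m) = (j, k). Noise variance is 1.
    """
    mean_own = np.mean(gains[:, target])                 # E{h_jjk^H w_jk}
    second = np.mean(np.abs(gains) ** 2, axis=0)         # E{|h_ljk^H w_lm|^2}
    signal = rho[target] * np.abs(mean_own) ** 2
    return signal / (np.sum(rho * second) - signal + 1.0)
```

As a sanity check, if the own gain is deterministic and equal to 2 and the only other stream has zero gain, (24) reduces to $4\varrho_{jk}$; for example, `sinr_dl(np.tile([2.0, 0.0], (10, 1)), np.array([0.5, 1.0]), 0)` returns 2.0.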
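Looking jointly at the uplink and downlink, the ergodic achievable SE for user $k$ in cell $j$ is
$$R_{jk} = \zeta^{\mathrm{ul}} R^{\mathrm{ul}}_{jk} + \zeta^{\mathrm{dl}} R^{\mathrm{dl}}_{jk}. \tag{25}$$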

III. ASYMPTOTIC ANALYSIS
In this section, performance analysis is conducted for the proposed multi-cell MMSE scheme.
Since the uplink SINR in (15) depends on the stochastic channel estimates in each block, the uplink SE in (14) cannot be computed in closed form. Therefore, we instead compute a deterministic equivalent of the SINR, which is tight in the large-system limit. A large-scale approximation of the downlink SINR is also provided. We consider the large-system limit in which $M$ and $K$ go to infinity while $K/M$ remains finite. In what follows, the notation $M \to \infty$

$1, \ldots, B$), have uniformly bounded spectral norms (with respect to $M$). Then, for any $\rho > 0$,

where $\mathbf{T}(\rho) \in \mathbb{C}^{M \times M}$ is defined as

and the elements of $\boldsymbol{\delta}(\rho)$ for $t = 1, 2, \ldots$, with initial values $\delta$

where $\mathbf{T}'(\rho) \in \mathbb{C}^{M \times M}$ is defined as

$\mathbf{T}(\rho)$ and $\boldsymbol{\delta}(\rho)$ are defined in Theorem 1, and $\boldsymbol{\delta}'(\rho) = [\delta'_1(\rho), \ldots, \delta'_B(\rho)]^T$ is calculated as

where $\mathbf{J}(\rho)$ and $\mathbf{v}(\rho)$ are defined as

B. Large-scale Approximations of the SINRs with the M-MMSE scheme
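In what follows, we derive the deterministic equivalent $\bar{\eta}^{\mathrm{ul}}_{jk}$ of $\eta^{\mathrm{ul}}_{jk}$ with the M-MMSE detector, and the large-scale approximation $\bar{\eta}^{\mathrm{dl}}_{jk}$ of $\eta^{\mathrm{dl}}_{jk}$ with the M-MMSE precoder, such that the differences $\eta^{\mathrm{ul}}_{jk} - \bar{\eta}^{\mathrm{ul}}_{jk}$ and $\eta^{\mathrm{dl}}_{jk} - \bar{\eta}^{\mathrm{dl}}_{jk}$ vanish in the large-system limit. In these approximations, $\delta_{lm}$, $\mu_{ljkm}$ and $\vartheta''_{lm}$ are given in Theorem 3.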
Proof: See Appendix C.
By utilizing Theorems 3 and 4, the ergodic SEs $R^{\mathrm{ul}}_{jk}$ in (14) and $R^{\mathrm{dl}}_{jk}$ in (23), after dropping the prelog factor $(1 - B/S)$, converge to $\bar{R}^{\mathrm{ul}}_{jk} = \log_2(1 + \bar{\eta}^{\mathrm{ul}}_{jk})$ and $\bar{R}^{\mathrm{dl}}_{jk} = \log_2(1 + \bar{\eta}^{\mathrm{dl}}_{jk})$, respectively, in the large-system limit. Therefore, a large-scale approximation of the joint ergodic SE in (25) is given by $(1 - B/S)(\zeta^{\mathrm{ul}} \bar{R}^{\mathrm{ul}}_{jk} + \zeta^{\mathrm{dl}} \bar{R}^{\mathrm{dl}}_{jk})$. This approximation is easy to compute and only depends on the large-scale fading, power control and pilot allocation. As shown in Section V, this large-scale approximation is very accurate also for small system dimensions.
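The large-scale approximation of (25) is a one-line formula; the sketch below (the name `joint_se_approx` is illustrative) evaluates it per user with NumPy, assuming the approximate SINRs $\bar{\eta}^{\mathrm{ul}}_{jk}$ and $\bar{\eta}^{\mathrm{dl}}_{jk}$ have already been obtained from Theorems 3 and 4.

```python
import numpy as np


def joint_se_approx(eta_ul, eta_dl, B, S, zeta_ul=0.5):
    """Approximate (25): (1 - B/S) * (zeta_ul log2(1 + eta_ul) + zeta_dl log2(1 + eta_dl)).

    eta_ul, eta_dl: approximate SINRs (scalars or arrays with one entry per user).
    """
    zeta_dl = 1.0 - zeta_ul
    eta_ul = np.asarray(eta_ul, dtype=float)
    eta_dl = np.asarray(eta_dl, dtype=float)
    return (1.0 - B / S) * (zeta_ul * np.log2(1.0 + eta_ul) + zeta_dl * np.log2(1.0 + eta_dl))


# Two example users with equal uplink and downlink SINRs; B = 40, S = 1000.
print(joint_se_approx([3.0, 7.0], [3.0, 7.0], B=40, S=1000))
```

Summing the returned array over the users of a cell gives the approximate sum SE per cell. When $\bar{\eta}^{\mathrm{dl}}_{jk} = \bar{\eta}^{\mathrm{ul}}_{jk}$, the result no longer depends on how the payload is split between uplink and downlink.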

C. The Uplink and Downlink Duality for the M-MMSE scheme
It is pointed out in [17] that when the precoder is a scaled version of the detector, as in (22), the same per-user SEs as in the uplink can be achieved in the downlink by properly selecting the downlink payload powers. We establish this uplink-downlink duality for our M-MMSE scheme using the large-scale SINR approximations given by Theorem 3 and Theorem 4. Theorem 5 specifies a downlink payload power allocation $\varrho_{jk}$, using the same total transmit power as in the uplink, such that the same SE is achieved in the downlink, i.e., $\bar{\eta}^{\mathrm{dl}}_{jk} = \bar{\eta}^{\mathrm{ul}}_{jk}$.
Proof: See Appendix D.
Note that Theorem 5 establishes the duality for the large-scale SINR approximations, instead of the real SINRs. However, since the approximations are very accurate even for small system dimensions, Theorem 5 provides a powerful tool to obtain a judicious downlink power allocation whenever the same SEs are desired in both the uplink and downlink.

IV. ITERATIVE POWER CONTROL
The large-scale approximations of the uplink and downlink SINRs given in Theorem 3 and Theorem 4 not only enable us to evaluate the system performance without time-consuming Monte Carlo simulation, but they also enable us to improve the system performance by optimizing key system parameters based on only large-scale fading. In this section, we consider optimizing the uplink payload transmit power jointly for the multi-cell network to maximize the weighted uplink sum SE. Since the downlink payload power can be obtained according to Theorem 5, the optimized uplink SEs can be achieved also in the downlink using the same total transmit power.
The effectiveness of our proposed power control algorithm is verified in Section V.

A. Joint Uplink Power Control for Weighted Uplink Sum SE Maximization
Power control for sum SE maximization has been widely studied in cellular networks [23]-[30], and here we consider this sum SE metric for the proposed M-MMSE detector. Using the notation $\mathbf{D}$, $\mathbf{F}$ and $\boldsymbol{\tau}$ defined in Appendix D, and defining the vector $\mathbf{r} = [\bar{\eta}^{\mathrm{ul}}_{11}, \ldots, \bar{\eta}^{\mathrm{ul}}_{LK}]^T \in \mathbb{R}^{LK \times 1}$, the uplink SINR approximation in (35) can be expressed as in (37), where $(\cdot)_l$ denotes the $l$th element of the corresponding vector and $l = k + (j - 1)K$. Using the notation in (37), we seek the power control that maximizes the weighted sum SE, formulated as problem $\mathcal{P}$, where $P_{\max}$ is the maximum radiated transmit power of each user and $\xi_l > 0$ is the weight of the corresponding user. Setting all $\xi_l = 1$ corresponds to conventional sum SE maximization, while other values can be used to enforce some fairness. However, as proved in [31], power control problems for sum SE maximization are strongly NP-hard. Thus, the lower bound $\log_2(1 + r_l) \geq \log_2(r_l)$ is often used to approximate $\mathcal{P}$ by $\mathcal{P}_1$ [32], [33]. For fixed $\mathbf{F}$ and $\mathbf{D}$, by introducing the auxiliary vector $\mathbf{q}$ with $l$th element $q_l \leq r_l^{\xi_l}$, problem $\mathcal{P}_1$ can be turned into the geometric programming (GP) problem $\mathcal{P}_2$. The optimal solution of $\mathcal{P}_2$ can be obtained numerically, for example, using a convex optimization toolbox in MATLAB. A low-complexity fixed-point iteration method is also proposed in [33] to solve problems of the same type as $\mathcal{P}_2$; with our notation, it updates the power coefficient $\tau_l(t)$ in each iteration $t = 0, 1, \ldots$ of the fixed-point algorithm. It is proved in [33] that, starting from the initial point $\tau_l(0) = P_{\max}$ for all $l$, this algorithm converges at a geometric rate to the optimal solution of $\mathcal{P}_1$ (for fixed $\mathbf{F}$ and $\mathbf{D}$).
In our case, however, $\mathbf{F}$ and $\mathbf{D}$ are not fixed, since $\delta_{jk}$, $\mu_{jlmk}$ and $\vartheta''_{jk}$ change as $\tau_l$ changes. Hence, $\mathcal{P}_2$ in our work is not a pure GP. Therefore, Algorithm 1 is proposed to iterate between solving $\mathcal{P}_2$ for fixed $\mathbf{F}$ and $\mathbf{D}$, and updating $\mathbf{F}$ and $\mathbf{D}$ using the current $\boldsymbol{\tau}$.
In step 3, the matrices $\mathbf{F}$ and $\mathbf{D}$, the current powers $\tau_j$ and the SINRs $r_j$ of all users in the network are needed at each BS. Thus, Algorithm 1 involves some information exchange among the BSs. However, since the asymptotic approximation only depends on long-term parameters, the information exchange overhead is much smaller than if the sum SE were maximized in every coherence block based on the current small-scale fading. Moreover, the proposed algorithm only involves simple calculations and converges quickly, so it has low complexity. Since convergence has been proved in [33] for fixed $\mathbf{F}$ and $\mathbf{D}$, and we improve them in each iteration, our algorithm converges to a locally optimal solution of $\mathcal{P}_1$.

4: Update the iteration index $t$ with $t + 1$.
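The alternating structure of Algorithm 1 can be sketched as follows. This is only a skeleton under stated assumptions: `update_FD` (recomputing $\mathbf{F}$ and $\mathbf{D}$ via Theorem 3) and `solve_P2` (the GP solver or the fixed-point method of [33]) are hypothetical callables supplied by the user, the stopping rule on $\max_l |\tau_l(t+1) - \tau_l(t)|$ is an assumed choice, and the toy callables in the usage lines are not the paper's model.

```python
import numpy as np


def alternating_power_control(update_FD, solve_P2, tau0, p_max, tol=1e-6, max_iter=100):
    """Skeleton of Algorithm 1: alternate between (i) recomputing F and D from the
    current powers and (ii) solving P2 for fixed F and D.

    update_FD(tau) -> (F, D) and solve_P2(F, D, tau) -> tau_new are hypothetical
    user-supplied callables; neither Theorem 3 nor the solver of [33] is implemented here.
    Returns (tau, number_of_iterations).
    """
    tau = np.clip(np.asarray(tau0, dtype=float), 0.0, p_max)
    for t in range(max_iter):
        F, D = update_FD(tau)                              # F, D depend on tau
        tau_new = np.clip(solve_P2(F, D, tau), 0.0, p_max)  # keep 0 <= tau_l <= P_max
        if np.max(np.abs(tau_new - tau)) < tol:
            return tau_new, t + 1
        tau = tau_new                                      # next pass, i.e. t <- t + 1
    return tau, max_iter


# Toy usage with stand-in callables (not the paper's F, D or P2): tau <- 0.5 tau + 1.
n_users, p_max = 4, 10.0
tau, iters = alternating_power_control(
    update_FD=lambda tau: (0.5 * np.eye(len(tau)), np.ones(len(tau))),
    solve_P2=lambda F, D, tau: F @ tau + D,
    tau0=np.full(n_users, p_max),    # tau_l(0) = P_max, the initial point used in [33]
    p_max=p_max,
)
print(tau, iters)
```

The toy update is a contraction towards $\tau_l = 2$, so the loop stops after a few tens of iterations; with real $\mathcal{P}_2$ solvers, the overall convergence argument is the one given in the text above.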

V. SIMULATION RESULTS
In this section, we illustrate the analytical contributions by simulation results for a symmetric hexagonal network topology. We apply the classic 19-cell wrap-around structure to avoid edge effects and to guarantee consistent simulated performance for all cells. The user locations are generated independently and uniformly at random in the cells, but the distance between each user and its serving BS is at least $0.14r$. For each user location $\mathbf{z} \in \mathbb{R}^2$, a classic pathloss model is considered, where the variance of the channel attenuation is
$$d_j(\mathbf{z}) = \frac{C(\mathbf{z})}{\|\mathbf{z} - \mathbf{b}_j\|^{\kappa}}.$$
The vector $\mathbf{b}_j \in \mathbb{R}^2$ is the location of the BS in cell $j$, $\kappa$ is the pathloss exponent, and $\|\cdot\|$ denotes the Euclidean norm. $C(\mathbf{z}) > 0$ is independent shadow fading for user location $\mathbf{z}$, with $10 \log_{10}(C(\mathbf{z})) \sim \mathcal{N}(0, \sigma_{\mathrm{sf}}^2)$. In the simulations, we assume $\kappa = 3.7$, $\sigma_{\mathrm{sf}}^2 = 5$ and the coherence block length $S = 1000$.³
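A minimal sketch of one cell's user drop and of the pathloss model above follows. It assumes that $r$ is the hexagon circumradius (its definition is not given in this text), places the BS at the origin of a hexagon whose vertices lie at angles $0^\circ, 60^\circ, \ldots$, draws users by rejection sampling, and reads $\sigma_{\mathrm{sf}}^2 = 5$ as a variance in dB². The 19-cell wrap-around and the dropping of the weakest users are not implemented, and the function names are illustrative.

```python
import numpy as np


def in_hexagon(x, y, r):
    """True if (x, y) lies in a hexagon of circumradius r centred at the origin,
    with vertices at angles 0, 60, ..., 300 degrees (orientation assumed)."""
    ax, ay = np.abs(x), np.abs(y)
    return (ay <= np.sqrt(3) / 2 * r) & (np.sqrt(3) * ax + ay <= np.sqrt(3) * r)


def drop_users(K, r, rng, min_dist_frac=0.14):
    """K i.i.d. uniform positions in the cell, at least min_dist_frac * r from the BS."""
    pts = []
    while len(pts) < K:
        x, y = rng.uniform(-r, r, size=2)        # candidate in the bounding square
        if in_hexagon(x, y, r) and np.hypot(x, y) >= min_dist_frac * r:
            pts.append((x, y))
    return np.array(pts)


def pathloss(z, b, rng, kappa=3.7, sigma2_sf=5.0):
    """d_j(z) = C(z) / ||z - b_j||^kappa, with 10 log10 C(z) ~ N(0, sigma2_sf)."""
    dist = np.linalg.norm(z - b, axis=-1)
    shadow_db = rng.normal(0.0, np.sqrt(sigma2_sf), size=dist.shape)
    return 10.0 ** (shadow_db / 10.0) / dist ** kappa


rng = np.random.default_rng(0)
z = drop_users(10, 1.0, rng)                     # K = 10 users, r = 1 (normalized)
d_serving = pathloss(z, np.zeros(2), rng)        # d_j(z_jk) towards the serving BS at the origin
```

Rejection sampling from the bounding square keeps the positions exactly uniform over the allowed region, at the cost of discarding roughly a third of the candidates.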

A. Benefits of the proposed M-MMSE scheme
In this subsection, we show the benefits of our M-MMSE scheme over conventional alternatives. Statistical channel inversion power control is applied to both the pilot and the uplink payload data, i.e., $p_{lk} = \tau_{lk} = \rho / d_l(\mathbf{z}_{lk})$ [17]. Thus, during the uplink phase, the average effective channel gain between users and their serving BSs is constant:
$$\frac{\mathbb{E}\{p_{lk}\|\mathbf{h}_{llk}\|^2\}}{M} = p_{lk} d_l(\mathbf{z}_{lk}) = \tau_{lk} d_l(\mathbf{z}_{lk}) = \rho.$$
The average uplink SNR per antenna and user at its serving BS is then $\rho/\sigma^2$. This is a simple but effective policy to avoid near-far blockage and, to some extent, guarantee uniform user performance in the uplink. For downlink payload data transmission, the transmit power $\varrho_{lk}$ is selected according to Theorem 5 to achieve the same downlink SE at each user as in the uplink.
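The statistical channel inversion rule and its effect can be checked numerically. The sketch assumes three example values of $d_l(\mathbf{z}_{lk})$ spanning six orders of magnitude and uses the channel model (1); `channel_inversion` is an illustrative name. With many antennas, $p_{lk}\|\mathbf{h}_{llk}\|^2/M$ concentrates around $\rho$ regardless of the user's pathloss, which is why near-far effects disappear in the uplink.

```python
import numpy as np


def channel_inversion(d_serving, rho):
    """Statistical channel inversion: p_lk = tau_lk = rho / d_l(z_lk)."""
    return rho / np.asarray(d_serving, dtype=float)


rng = np.random.default_rng(3)
M, rho = 100_000, 1.0
d_serving = np.array([1e-3, 1e-6, 1e-9])         # example large-scale gains (assumed)
p = channel_inversion(d_serving, rho)
# Monte Carlo check of E{p ||h||^2} / M = p d = rho for h ~ CN(0, d I_M), as in (1).
noise = rng.standard_normal((3, M)) + 1j * rng.standard_normal((3, M))
h = np.sqrt(d_serving)[:, None] * noise / np.sqrt(2)
print(p * np.sum(np.abs(h) ** 2, axis=1) / M)    # each entry close to rho
```

The relative deviation of each printed value from $\rho$ shrinks as $1/\sqrt{M}$.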
In our simulations, $\rho/\sigma^2$ is set to 0 dB to allow for decent channel estimation accuracy, and the time proportions for the uplink and downlink are set to $\zeta^{\mathrm{ul}} = \zeta^{\mathrm{dl}} = 1/2$. To verify the accuracy of the large-scale approximations from Section III, 10000 independent Monte Carlo channel realizations are generated to numerically calculate the joint achievable SE in (25). The numerical results and their large-scale approximations from Theorem 3 and Theorem 4 are shown in Fig. 2. As seen from Fig. 2, the achievable sum SE per cell increases monotonically with $\beta$ over the considered range of values. This is due to two properties. First, a larger $\beta$ results in a lower level of pilot contamination, which yields a higher channel estimation accuracy and thereby increases the achievable SE. Second, a larger $\beta$ provides more estimated channel directions for the construction of the M-MMSE detector and precoder, so more inter-cell interference can be suppressed.
Moreover, Fig. 2 shows that the numerical results and the large-scale approximations match very well, even for small M and small K.

B. Effectiveness of the joint power control scheme
In this subsection, the effectiveness of the power control scheme proposed in Section IV is verified. Since the previous subsection showed that the proposed M-MMSE scheme performs better than the conventional techniques, especially for large $\beta$, we focus on the M-MMSE scheme in this subsection. Statistical channel inversion power control $p_{lk} = \rho / d_l(\mathbf{z}_{lk})$ is still applied to the pilots, while the uplink payload power $\tau_{jk}$ is optimized. $\rho/\sigma^2$ is still set to 0 dB, and the maximal transmit power $P_{\max}$ in $\mathcal{P}$ is selected to make the cell-edge SNR (without shadowing) equal to $-3$ dB. Results for equal power allocation (i.e., $\tau_{lk} = P_{\max}$) are provided as a baseline. For comparison, we also apply Algorithm 1 to the instantaneous SINR in (15).
The following results are obtained for M = 100 and K = 10. After generating user locations and shadow fading, 9 users with the worst channel conditions in the whole network are dropped to provide 95% coverage.
We first consider the average user SE, calculated as the network sum SE divided by the number of served users. The cumulative distribution functions (CDFs) over user locations are shown in Fig. 6 and Fig. 7 for $\beta = 4$ and $\beta = 7$, respectively. As seen from the figures, the CDF curves with long-term power control based on Algorithm 1 coincide with those of short-term power control optimized for the instantaneous SINR in every coherence block, which validates our power control based on the large-scale SINR approximation. Since the approximation only depends on long-term statistics, the optimization complexity can be spread over time. Furthermore, compared with the equal power allocation policy, the average user SEs are significantly improved by our power control scheme: at the 50th percentile, a 17% increase is achieved for both $\beta = 4$ and $\beta = 7$.
Next, we analyze how the per-user SE in different parts of the cells is affected by our power control.
Results are also provided for the power control proposed in [34], which tries to provide equal SE for users in the same cell so that, to some extent, intra-cell user fairness is guaranteed.
CDFs of the per-user SE are shown in Fig. 8 for $\beta = 4$ and in Fig. 9 for $\beta = 7$. Equal power allocation leads to the largest SE variations, while the power control from [34] gives relatively small variations. Interestingly, the proposed power control from Algorithm 1 provides essentially the same SE for the weakest users, while pushing the SE of the majority of the users to higher values. Despite the larger SE variations, we conclude that the proposed power control brings a better type of user fairness than the scheme from [34], since the strong users get higher SEs without degrading the SEs of the weakest ones.

VI. CONCLUSIONS
In this paper, a new state-of-the-art multi-cell MMSE scheme is proposed, which includes an uplink M-MMSE detector and a downlink M-MMSE precoder. Compared with the conventional single-cell MMSE scheme, which only makes use of the intra-cell channel directions, the novelty of our multi-cell MMSE scheme is that it utilizes all channel directions that can be estimated locally at each BS, so that both intra-cell and inter-cell interference can be actively suppressed.
The proposed scheme brings very promising sum SE gains over the conventional single-cell MMSE and the multi-cell ZF from [17], particularly for large $\beta$ and $K$. Since imperfect estimated CSI is accounted for in our scheme, the gains obtained by our scheme are likely to be achievable in practical systems. Furthermore, large-scale approximations of the uplink and downlink SINRs are derived for the proposed multi-cell MMSE scheme, and these are tight in the large-system limit. The approximations are easy to compute, since they only depend on large-scale fading, power control and pilot allocation, and they are shown to be very accurate even for small system dimensions. Based on the SINR approximations, an uplink-downlink duality is established and a low-complexity power control algorithm for sum SE maximization is proposed for the multi-cell MMSE scheme. The proposed power control brings a notable sum SE gain and also provides good user fairness compared to the equal power allocation policy. Since the SINR approximations depend only on long-term statistics, the complexity of the power control algorithm can be spread over a long time period.

Then, for any vector $\mathbf{x} \in \mathbb{C}^{M \times 1}$ and any scalar $\tau \in \mathbb{C}$ such that $\mathbf{A} + \tau \mathbf{x}\mathbf{x}^H$ is invertible,
$$\mathbf{x}^H \left(\mathbf{A} + \tau \mathbf{x}\mathbf{x}^H\right)^{-1} = \frac{\mathbf{x}^H \mathbf{A}^{-1}}{1 + \tau \mathbf{x}^H \mathbf{A}^{-1} \mathbf{x}}.$$

Lemma 2 (Matrix inversion lemma (II), [13]): Let $\mathbf{A} \in \mathbb{C}^{M \times M}$ be a Hermitian invertible matrix.
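Then, for any vector $\mathbf{x} \in \mathbb{C}^{M \times 1}$ and any scalar $\tau \in \mathbb{C}$ such that $\mathbf{A} + \tau \mathbf{x}\mathbf{x}^H$ is invertible,
$$\left(\mathbf{A} + \tau \mathbf{x}\mathbf{x}^H\right)^{-1} = \mathbf{A}^{-1} - \frac{\tau \mathbf{A}^{-1}\mathbf{x}\mathbf{x}^H\mathbf{A}^{-1}}{1 + \tau \mathbf{x}^H \mathbf{A}^{-1} \mathbf{x}}.$$

Assume that $\mathbf{A}$ has uniformly bounded spectral norm (with respect to $M$) and that $\mathbf{x}$, $\mathbf{y}$ and $\mathbf{A}$ are mutually independent. Then, for all $p \geq 1$,

where (a) follows from Lemma 1 and the fact that $\hat{\mathbf{h}}_{jjk} = \sqrt{p_{jk}}\, d_j(\mathbf{z}_{jk})\, \hat{\mathbf{h}}_{\mathcal{V},ji_{jk}}$, and (b) follows from Lemma 4, part 2). Notice that Lemma 4, part 2) can be applied since $\boldsymbol{\Sigma}$

where steps (a) and (b) follow from Lemma 1 and Lemma 4, part 3), respectively, which completes the proof.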
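Lemmas 1 and 2 are standard rank-one update identities (Lemma 2 is the Sherman-Morrison formula). The sketch below checks both numerically for a random Hermitian positive definite $\mathbf{A}$ and a complex $\tau$; `lemma_gaps` is an illustrative name and the instance is arbitrary.

```python
import numpy as np


def lemma_gaps(M=6, tau=0.7 - 0.2j, seed=0):
    """Return the largest absolute deviations in Lemma 1 and Lemma 2 for a random instance."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
    A = G @ G.conj().T + np.eye(M)               # Hermitian and invertible
    x = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    A_inv = np.linalg.inv(A)
    B_inv = np.linalg.inv(A + tau * np.outer(x, x.conj()))
    denom = 1 + tau * (x.conj() @ A_inv @ x)     # 1 + tau x^H A^{-1} x
    # Lemma 1: x^H (A + tau x x^H)^{-1} = x^H A^{-1} / denom
    gap1 = np.max(np.abs(x.conj() @ B_inv - (x.conj() @ A_inv) / denom))
    # Lemma 2: (A + tau x x^H)^{-1} = A^{-1} - tau A^{-1} x x^H A^{-1} / denom
    rhs2 = A_inv - tau * np.outer(A_inv @ x, x.conj() @ A_inv) / denom
    gap2 = np.max(np.abs(B_inv - rhs2))
    return gap1, gap2


print(lemma_gaps())
```

Both gaps are at the level of floating-point round-off. Because $\mathbf{A}$ is positive definite, $\mathbf{x}^H\mathbf{A}^{-1}\mathbf{x} > 0$, so with this choice of $\tau$ the denominator never vanishes.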
We use this lemma in the following to determine the asymptotic behaviour of each term in the uplink SINR of (15).

A. Signal power
Since $\mathbf{g}_{jk}^H \hat{\mathbf{h}}_{jjk} = \hat{\mathbf{h}}_{jjk}^H \boldsymbol{\Sigma}_j \hat{\mathbf{h}}_{jjk}$, it follows from Lemma 5 that