
Distributed MMSE-based uplink receive combining, downlink transmit precoding and optimal power allocation in cell-free massive MIMO systems


In cell-free massive multiple-input multiple-output systems, a large number of distributed wireless access points (APs) simultaneously serve a number of user equipments (UEs). This setup has recently been introduced as a promising alternative to current 5G cellular networks. It can offer a good quality of service, especially for UEs at the cell edges, although there is still a need for low-complexity signal processing algorithms. In this paper, the problem of optimal power allocation combined with uplink receive combining and downlink transmit precoding is tackled by providing efficient distributed MMSE-based algorithms. The necessary fronthaul communications to estimate the combining/precoding vectors and the necessary large-scale channel statistics are reduced to a minimum and rely on in-network summation that can be accomplished whenever the APs can be arranged into a tree-topology. Non-weighted max-sum and max–min are used as utilities for the power allocation, but the algorithms are not limited to these cases. Simulations show that the proposed algorithms outperform heuristic power allocation methods, both in uplink and downlink.

1 Introduction

Multiple-input multiple-output (MIMO) technology plays an important role in present-day wireless communication standards and networks. It uses multiple antennas to transmit and receive multiple data signals simultaneously over the same radio channel and hence increases spectral efficiency. Massive MIMO (mMIMO) systems use a larger number of antennas to reap additional benefits in rich scattering environments. While mMIMO was initially viewed as a promising concept for future cellular wireless communication networks, it already became a reality in 2018 with commercial deployment in several countries [1]. The signal processing methods required to achieve large spectral efficiency gains compared to conventional multi-user MIMO systems [2] are available and well understood [3]. The spectral efficiency gains associated with mMIMO emerge from the higher spatial resolution, its robustness against small-scale fading due to the so-called ‘channel hardening’ effect, and its ability to suppress interference even with imperfect channel state information (CSI).

One of the major shortcomings of current cellular wireless communication networks, where a user equipment (UE) is served only by the access point (AP) of the cell it resides in, is that UEs on the cell edges experience a low channel gain to their serving AP and a high interference power from nearby cells. Current uplink (UL) receive combining and downlink (DL) transmit precoding algorithms do not effectively mitigate such interference. To address this issue, the concept of cell-free massive MIMO (CFmMIMO) [4,5,6] has been proposed as a promising alternative for the current 5G cellular networks. In CFmMIMO systems a large number of APs are connected to a network center (NC) and cooperate via a fronthaul network to serve a large number of UEs in the network simultaneously as shown in Fig. 1. The joint operation of the APs then allows for implementing interference-rejecting combining and precoding algorithms effective for all UEs in the network. CFmMIMO thus effectively eliminates cell borders while still reusing the network layout as rolled out previously, which offers the potential to spectacularly increase performance as compared to current cellular mMIMO systems.

Fig. 1

Comparison of a cellular massive MIMO system (left) with a CFmMIMO system (right). In a CFmMIMO system, a large number of APs with multiple antennas are connected to a NC, serving a large number of UEs in the coverage area

CFmMIMO has its roots in works on distributed MIMO [7, 8] and coordinated multipoint [9, 10], but with the distinction that a UE can be served by several APs, instead of only by the one AP serving the cell in which the UE is located and that the transmission is coordinated by a NC. To allow for a scalable and low-complexity implementation, CFmMIMO has to operate under the following constraints [6]: (a) the time division duplex (TDD) protocol is used, exploiting channel reciprocity between UL and DL; (b) UL channel estimates are computed locally at each AP and will be used only locally, which means that they are not directly communicated to the NC over the fronthaul links; (c) combining/precoding vectors are computed locally at the APs and not at the NC; (d) the fronthaul network is only used to send data symbols and the necessary channel statistics to perform centralized data decisions.

1.1 Related works

The authors in [4, 11] proposed to use maximum ratio combining and conjugate beamforming for UL and DL data transmission in CFmMIMO systems, presented closed-form expressions for the spectral efficiency (SE) in UL and DL for each UE and formulated max–min power control algorithms, which were shown to outperform a cellular setup. However, more complicated combining and precoding algorithms combined with heuristic power allocation methods [5, 6, 10, 12, 13] have been shown to lead to higher throughputs. Optimal UL power allocation and receive combining was investigated in [14, 15] using minimum-mean-squared error (MMSE) processing by formulating the original max–min signal-to-interference-plus-noise ratio problem for the optimization of combining vectors at a central point (like the NC). Unfortunately, centralized computations increase the fronthaul communication and do not comply with the scalability requirements of CFmMIMO systems as defined before. Optimal DL power allocation was considered in [5] combined with zero-forcing transmit precoding: although the gains were significant, the zero-forcing transmit precoding again requires a fully centralized computation with significant fronthaul communications. In [16], optimal power allocation algorithms for local partial zero-forcing were proposed as a scalable counterpart and shown to achieve a performance close to that of zero-forcing transmit precoding and of centralized regularized zero-forcing transmit precoding, which is a benchmark in DL transmit precoding and can be related to MMSE combining in the UL via UL/DL duality [17, 18]. In [19], a max-sum SE problem in an UL CFmMIMO system was also investigated, where artificial neural networks were used with the UE positions as input and the power control policy as output. An UL CFmMIMO system with limited fronthaul was taken into consideration in [20], where zero-forcing was employed locally to heuristically estimate the max-sum SE power control scheme. For single-antenna APs, a deep neural network-based power allocation method was proposed in [21].

1.2 Contributions

What is still missing in the literature is a scalable implementation of (optimal) MMSE-based combining/precoding: in its original form, it can only be performed at the NC if all antenna signals are transmitted to it, which implies a large fronthaul communication overhead. Alternatively, heuristic combining/precoding schemes can be used at the APs, which strongly reduces the fronthaul communication overhead, but optimal performance can then no longer be guaranteed. The same can be stated for the UL and DL power allocation. Therefore, in this paper an MMSE-based distributed UL receive combining and power allocation algorithm (D-UL-RCPA) and a DL transmit precoding and power allocation algorithm (D-DL-TPPA) are presented that are scalable and yet attain this optimal performance. The combining/precoding vectors are computed only at the APs, such that the computations are distributed over the network and are not only performed by the NC. The necessary fronthaul communications to estimate the combining/precoding vectors and the necessary large-scale channel statistics are reduced to a minimum and rely on in-network summation that can be accomplished whenever the APs can be arranged into a tree-topology. The reduction in fronthaul communications is especially large when the total number of antennas in the network is larger than the total number of UEs.

The algorithms are derived based on general lower bounds of the system achievable rate, taking into account the effect of channel estimation errors. Non-weighted max-sum and max–min are used as utilities for the power allocation, but the algorithms are not limited to these cases. The former power allocation strategy maximizes the proposed lower bound for the total system sum rate, while the latter, targeting fairness across users, maximizes the minimum spectral efficiency lower bound across all the UEs. Simulations show that the proposed algorithms outperform heuristic power allocation methods, both in UL and DL.

1.3 Paper outline and notation

The paper is organized as follows. The signal model is presented in Sect. 2. In Sect. 3 an optimal UL receive combining and power allocation strategy is developed and an efficient distributed algorithm for a CFmMIMO system is presented. A similar algorithm is developed in Sect. 4 for DL transmit precoding and power allocation, using UL/DL duality. Some further considerations about the power allocation strategies and network topologies are presented in Sect. 5. The performance of the proposed algorithms is numerically evaluated and compared in Sect. 6. Conclusions and further research directions are provided in Sect. 7.

Superscripts \(.^{\mathrm{T}}\) and \(.^{H}\) are used to denote the transpose and conjugate transpose operations. Bold lower case letters \({\textbf{b}} = [b_{1} \ \ldots \ b_{K}]^{\mathrm{T}}\) are used to denote complex vectors and bold upper case letters are used to represent complex matrices \({\textbf{A}} = [{\textbf{a}}_{1} \ \ldots \ {\textbf{a}}_{K}]\). \({\textbf{I}}_{N}\) is the \(N \times N\) identity matrix and \({\textbf{0}}\) denotes an all-zeros vector or matrix whose dimensions are clear from the context. The multi-variate circularly symmetric complex Gaussian distribution with correlation matrix \({\textbf{R}}\) is denoted by \({\mathcal{N}}{\mathcal{C}}({\textbf{0}},{\textbf{R}})\), diag\(\{a_{1}, \ldots , a_{K}\}\) is used to denote a diagonal matrix with \(a_{1}, \ldots , a_{K}\) on its diagonal and Blkdiag\(\{{\textbf{A}}_{1}, \ldots , {\textbf{A}}_{K}\}\) is used for a block-diagonal matrix with the square matrices \({\textbf{A}}_{1}, \ldots ,{\textbf{A}}_{K}\) on the diagonal. Finally, ||.|| and \(E\{.\}\) denote the Euclidean norm and the expected value operator respectively.

2 Signal model

Consider a CFmMIMO system consisting of K single-antenna UEs and L APs, each having N antennas and local processing capabilities, that are randomly deployed over the coverage area. The APs are connected to a NC via a fronthaul network. This setup allows for coherent transmission and reception of data to and from the UEs. In the CFmMIMO literature [5, 22] it is often assumed that the number of antennas NL is much larger than the number of UEs K and that both L and K are large.

The time-frequency resources are divided into coherence blocks of \(\tau _{\mathrm{c}}\) samples during which the channels are assumed to remain constant [18]. APs and UEs operate using a TDD protocol and a channel coherence interval is divided into three phases: UL channel estimation, UL data transmission and DL data transmission. These phases consist of \(\tau _{\mathrm{p}}\), \(\tau _{\mathrm{u}}\) and \(\tau _{\mathrm{d}}\) samples respectively, such that \(\tau _{\mathrm{c}} = \tau _{\mathrm{p}} + \tau _{\mathrm{u}} + \tau _{\mathrm{d}}\). Since the TDD protocol is used, the UL channel and the DL channel are assumed to be each other’s conjugate due to channel reciprocity, so that indeed only UL channel estimation has to be performed. The channel from UE k to AP l is given by \({\textbf{h}}_{kl} \in {\mathbb {C}}^{N}\) and the channel from UE k to all the APs is denoted by \({\textbf{h}}_{k} = [{\textbf{h}}_{k1}^{\mathrm{T}} \ \ldots \ {\textbf{h}}_{kL}^{\mathrm{T}}]^{\mathrm{T}} \in {\mathbb {C}}^{NL}\). The channels \({\textbf{h}}_{kl}\) are assumed to remain constant during a coherence block and drawn from an independent correlated Rayleigh fading realization \({\mathcal{N}}{\mathcal{C}}({\textbf{0}},{\textbf{R}}_{kl})\) in each coherence block (Footnote 1). \({\textbf{R}}_{kl} \in {\mathbb {C}}^{N \times N}\) is the positive semi-definite spatial correlation matrix describing the large-scale fading, including geometric pathloss, shadowing, antenna gains, and spatial channel correlation [24]. The large-scale fading is assumed to remain constant over different coherence blocks. The Gaussian distribution models the small-scale fading. Due to the spatial distribution of the APs in the network, the channel vectors of different APs are independently distributed, i.e. \(E\{{\textbf{h}}_{kl} {\textbf{h}}_{kn}^{H} \} = {\textbf{0}}_{N \times N}\) for \(l \ne n\), such that channel estimation can be performed independently at each AP. A schematic overview of the signal model is provided in Fig. 2.

Fig. 2

Schematic overview of the signal model with the used symbols

2.1 UL channel estimation

In the UL channel estimation phase, the UEs send UL pilots to allow for simultaneous channel estimation at the APs. There are \(\tau _{\mathrm{p}}\) mutually orthogonal pilot signals \(\{\phi _{1}, \ldots ,\phi _{\tau _{\mathrm{p}}}\} \in {\mathbb {C}}^{\tau _{\mathrm{p}}}\) of \(\tau _{\mathrm{p}}\) samples, normalized to have unit power. This implies that \(\phi _{i}^{H} \phi _{j}\) is equal to \(\tau _{\mathrm{p}}\) if \(i = j\) and 0 otherwise. Each UE is assigned to a pilot signal when it gains access to the network (see [4, 17] for pilot assignment protocols). As K is often much larger than \(\tau _{\mathrm{p}}\), the same pilot signal will be assigned to multiple UEs, leading to pilot contamination and the associated negative effects on the channel estimation [3, 25, 26]. The received signal \({\textbf{y}}^{\mathrm{pilot}}_{l} \in {\mathbb {C}}^{N \times \tau _{\mathrm{p}}}\) at AP l is given as:

$$\begin{aligned} {\textbf{y}}^{\mathrm{pilot}}_{l} = \sum _{k=1}^{K} \sqrt{p^{\mathrm{pilot}}_{k}} {\textbf{h}}_{kl} \phi ^{H}_{k} + {\textbf{n}}^{\mathrm{pilot}}_{l} \end{aligned}$$

where \(p^{\mathrm{pilot}}_{k}\) is the pilot transmit power of UE k and \({\textbf{n}}^{\mathrm{pilot}}_{l} \in {\mathbb {C}}^{N \times \tau _{\mathrm{p}}}\) is thermal noise. Let \(\mathcal{S}_{t} \subset \{1, \ldots , K\}\) denote the subset of UEs assigned to the same pilot signal \(\phi _{t}\) for \(t \in \{1, \ldots , \tau _{\mathrm{p}} \}\). After despreading, i.e. multiplying the signal coherently with \(\phi _{t}/ ||\phi _{t}||\), the received signal \({\textbf{y}}^{\mathrm{pilot}}_{tl} \in {\mathbb {C}}^{N}\) at AP l is given as

$$\begin{aligned} {\textbf{y}}^{\mathrm{pilot}}_{tl} = \sum _{i \in \mathcal{S}_{t}} \sqrt{\tau _{\mathrm{p}} p^{\mathrm{pilot}}_{i}} {\textbf{h}}_{il} + {\textbf{n}}^{\mathrm{pilot}}_{tl} \end{aligned}$$

where the thermal noise \({\textbf{n}}^{\mathrm{pilot}}_{tl}\) for each pilot signal t is distributed as \({\mathcal{N}}{\mathcal{C}}({\textbf{0}},{\textbf{R}}_{{\textbf{n}}_{l} {\textbf{n}}_{l}}^{\mathrm{pilot}})\). Assuming that AP l is aware of the local large-scale fading correlation matrix \({\textbf{R}}_{kl}\) (see e.g., [27,28,29,30] for correlation matrix estimation methods) of all the UEs, the MMSE estimate of \({\textbf{h}}_{kl}\) for \(k \in \mathcal{S}_{t}\) is

$$\begin{aligned} \hat{{\textbf{h}}}_{kl} = \sqrt{\tau _{\mathrm{p}} p^{\mathrm{pilot}}_{k} } {\textbf{R}}_{kl} \varvec{\Psi }_{tl}^{-1} {\textbf{y}}^{\mathrm{pilot}}_{tl} \end{aligned}$$

with

$$\begin{aligned} \varvec{\Psi }_{tl} = E \left\{ {\textbf{y}}^{\mathrm{pilot}}_{tl} {\textbf{y}}^{\mathrm{pilot},H}_{tl}\right\} = \sum _{i \in \mathcal{S}_{t}} \tau _{\mathrm{p}} p^{\mathrm{pilot}}_{i} {\textbf{R}}_{il} + {\textbf{R}}_{{\textbf{n}}_{l} {\textbf{n}}_{l}}^{\mathrm{pilot}}. \end{aligned}$$

The MMSE estimate \(\hat{{\textbf{h}}}_{kl}\) of \({\textbf{h}}_{kl}\) is an unbiased estimate with estimation error \(\tilde{{\textbf{h}}}_{kl} = {\textbf{h}}_{kl} - \hat{{\textbf{h}}}_{kl}\). The distribution of both random variables is given by

$$\begin{aligned} {\hat{\textbf{h}}}_{kl}&\sim {\mathcal{N}}{\mathcal{C}}\left( {\textbf{0}},\tau _{\mathrm{p}} p^{\mathrm{pilot}}_{k} {\textbf{R}}_{kl} \varvec{\Psi }_{tl}^{-1} {\textbf{R}}_{kl} \right) , \\ {\tilde{\textbf{h}}}_{kl}&\sim {\mathcal{N}}{\mathcal{C}}\left( {\textbf{0}},{\textbf{R}}_{kl} - \tau _{\mathrm{p}} p^{\mathrm{pilot}}_{k} {\textbf{R}}_{kl} \varvec{\Psi }_{tl}^{-1} {\textbf{R}}_{kl} \right) \triangleq {\mathcal{N}}{\mathcal{C}}\left( {\textbf{0}},{\textbf{C}}_{kl}\right) . \end{aligned}$$

It is noted that the \(N \times N\) matrices \(\sqrt{\tau _{\mathrm{p}} p^{\mathrm{pilot}}_{k} } {\textbf{R}}_{kl} \varvec{\Psi }_{tl}^{-1}\) and \({\textbf{C}}_{kl}\) can be precomputed at AP l to reduce complexity, since they only depend on the channel statistics, which change slowly throughout the communication (Footnote 2).
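The estimator above can be illustrated with a minimal NumPy sketch for a single AP and a single pilot shared by two UEs. All numerical values, the random correlation matrices, and the white-noise assumption \({\textbf{R}}_{{\textbf{n}}_{l} {\textbf{n}}_{l}}^{\mathrm{pilot}} = \sigma ^{2} {\textbf{I}}_{N}\) are hypothetical choices made only for illustration; the helper names `crandn` and `sqrtm_psd` are not from the paper. The sketch precomputes the estimator matrix and \({\textbf{C}}_{kl}\), and then compares \({\textbf{C}}_{kl}\) with the empirical error covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(*shape):
    """i.i.d. circularly symmetric complex Gaussian samples, CN(0, 1)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def sqrtm_psd(R):
    """Hermitian square root of a PSD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(R)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T

N, tau_p = 4, 10
p_pilot = np.array([1.0, 0.5])   # two UEs sharing pilot t (hypothetical values)
sigma2 = 0.1                     # assumed white pilot noise: R_nn = sigma2 * I

# Hypothetical spatial correlation matrices R_il (random PSD matrices).
R = []
for _ in p_pilot:
    A = crandn(N, N)
    R.append(A @ A.conj().T / N)

# Statistics that depend only on large-scale quantities (precomputable).
Psi = sum(tau_p * p * Ri for p, Ri in zip(p_pilot, R)) + sigma2 * np.eye(N)
Psi_inv = np.linalg.inv(Psi)
k = 0                                                   # UE of interest
G = np.sqrt(tau_p * p_pilot[k]) * R[k] @ Psi_inv        # estimator matrix
C = R[k] - tau_p * p_pilot[k] * R[k] @ Psi_inv @ R[k]   # error covariance C_kl

# Monte Carlo check: one column per coherence block.
n_trials = 20000
H = [sqrtm_psd(Ri) @ crandn(N, n_trials) for Ri in R]   # h_il ~ CN(0, R_il)
noise = np.sqrt(sigma2) * crandn(N, n_trials)
Y = sum(np.sqrt(tau_p * p) * Hi for p, Hi in zip(p_pilot, H)) + noise  # despread pilots
H_hat = G @ Y                                           # MMSE estimates of h_kl
E = H[k] - H_hat                                        # estimation errors
C_emp = E @ E.conj().T / n_trials
rel_err = np.linalg.norm(C_emp - C) / np.linalg.norm(C)
print(f"relative mismatch between empirical and analytical C_kl: {rel_err:.3f}")
```

Because the second UE reuses the same pilot, its channel contributes to \(\varvec{\Psi }_{tl}\) and therefore inflates \({\textbf{C}}_{kl}\), which is the pilot contamination effect mentioned above.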

2.2 UL data transmission

In the UL data transmission phase, the received signal \({\textbf{y}}^{\mathrm{UL}}_{l} \in {\mathbb {C}}^{N}\) at AP l is given by

$$\begin{aligned} {\textbf{y}}^{\mathrm{UL}}_{l} = \sum _{k=1}^{K} {\textbf{h}}_{kl} s_{k} + {\textbf{n}}^{\mathrm{UL}}_{l} = {\textbf{H}}_{l} {\textbf{s}} + {\textbf{n}}^{\mathrm{UL}}_{l} \end{aligned}$$

where \(s_{k} \in {\mathbb {C}}\) is the signal transmitted by UE k with UL transmit power \(p_{k} = E \{s_{k} s_{k}^{H}\}\) and \({\textbf{n}}^{\mathrm{UL}}_{l} \in {\mathbb {C}}^{N} \sim {\mathcal{N}}{\mathcal{C}}({\textbf{0}},{\textbf{R}}_{{\textbf{n}}_{l} {\textbf{n}}_{l}}^{\mathrm{UL}})\) is an additive Gaussian noise component, including thermal antenna noise and quantization noise during UL transmission. Furthermore, \({\textbf{H}}_{l} = [{\textbf{h}}_{1l} \ \ldots \ {\textbf{h}}_{Kl}] \in {\mathbb {C}}^{N \times K}\), \({\textbf{s}} = [s_{1} \ \ldots \ s_{K}]^{\mathrm{T}} \in {\mathbb {C}}^{K}\) and \({\textbf{p}} = [p_{1} \ \ldots \ p_{K}]^{\mathrm{T}} \in {\mathbb {R}}^{K}\) are the concatenation of the channels from all the UEs to AP l, the signals transmitted by all the UEs and the UL transmit powers respectively. Stacking the received signals of all APs in \({\textbf{y}}^{\mathrm{UL}} = [{\textbf{y}}_{1}^{\mathrm{UL},{\mathrm{T}}} \ \ldots \ {\textbf{y}}_{L}^{\mathrm{UL},{\mathrm{T}}}]^{\mathrm{T}} \in {\mathbb {C}}^{NL}\) as well as the noise components in \({\textbf{n}}^{\mathrm{UL}} = [{\textbf{n}}^{\mathrm{UL},{\mathrm{T}}}_{1} \ \ldots \ {\textbf{n}}^{\mathrm{UL},{\mathrm{T}}}_{L}]^{\mathrm{T}} \in {\mathbb {C}}^{NL} \sim {\mathcal{N}}{\mathcal{C}}({\textbf{0}}, {\textbf{R}}_{{\textbf{n}} {\textbf{n}}}^{\mathrm{UL}})\) where \({\textbf{R}}_{{\textbf{n}} {\textbf{n}}}^{\mathrm{UL}} = {\text{Blkdiag}}\{{\textbf{R}}_{{\textbf{n}}_{1} {\textbf{n}}_{1}}^{\mathrm{UL}}, \ldots , {\textbf{R}}_{{\textbf{n}}_{L} {\textbf{n}}_{L}}^{\mathrm{UL}}\}\), results in the network-wide signal model:

$$\begin{aligned} {\textbf{y}}^{\mathrm{UL}} = {\textbf{H}} {\textbf{s}} + {\textbf{n}}^{\mathrm{UL}} \end{aligned}$$

with \({\textbf{H}} = [{\textbf{h}}_{1} \ \ldots \ {\textbf{h}}_{K}] = [{\textbf{H}}_{1}^{\mathrm{T}} \ \ldots \ {\textbf{H}}_{L}^{\mathrm{T}}]^{\mathrm{T}} \in {\mathbb {C}}^{NL \times K}\).

In network-wide UL receive combining the signal \(s_{k}\) is estimated by linearly combining the received signal \({\textbf{y}}^{\mathrm{UL}}\) with a combining vector \({\textbf{v}}_{k} \in {\mathbb {C}}^{NL}\), i.e. \({\hat{s}}_{k} = {\textbf{v}}_{k}^{H} {\textbf{y}}^{\mathrm{UL}}\). Note that this linear combining can be performed in the network if AP l selects the local combining vector \({\textbf{v}}_{kl} \in {\mathbb {C}}^{N}\) in \({\textbf{v}}_{k} = [{\textbf{v}}_{k1}^{\mathrm{T}} \ \ldots \ {\textbf{v}}_{kL}^{\mathrm{T}}]^{\mathrm{T}}\) and computes the local estimate \({\textbf{v}}^{H}_{kl} {\textbf{y}}^{\mathrm{UL}}_{l}\). The NC then estimates \(s_{k}\) by combining the local estimates as:

$$\begin{aligned} \begin{aligned} {\hat{s}}_{k} = \sum _{l=1}^{L} {\textbf{v}}^{H}_{kl} {\textbf{y}}^{\mathrm{UL}}_{l} = {\textbf{v}}^{H}_{k} {\textbf{h}}_{k} s_{k} + \sum _{i \ne k} {\textbf{v}}^{H}_{k} {\textbf{h}}_{i} s_{i} + {\textbf{v}}^{H}_{k} {\textbf{n}}^{\mathrm{UL}}. \end{aligned} \end{aligned}$$

The goal is then to choose the combining vector \({\textbf{v}}_{k}\) that provides a good estimate \({\hat{s}}_{k}\), but where the APs use only local CSI. A popular choice in CFmMIMO literature is maximum-ratio (MR) combining with \({\textbf{v}}_{kl} = \hat{{\textbf{h}}}_{kl}\) [4, 31, 32]. An alternative heuristic scheme, which generally performs better but requires more processing power at the AP, is local MMSE combining [6]. Network-wide UL power allocation algorithms for MR combining can be found in [14, 15, 22]. In this paper, network-wide MMSE combining [17] and associated power allocation algorithms will be considered, which typically require network-wide CSI. However, this paper shows that if a limited number of parameters can be exchanged between the NC and the APs, this network-wide MMSE combining and associated power allocation can still be obtained locally at each AP.

2.3 DL data transmission

In the DL data transmission phase, the received signal \(y_{k}^{\mathrm{DL}}\) at UE k is given by

$$\begin{aligned} \begin{aligned} y_{k}^{\mathrm{DL}}&= \sum _{l=1}^{L} {\textbf{h}}_{kl}^{H} \sum _{i=1}^{K} {\textbf{w}}_{il} \sqrt{\rho _{i}} \zeta _{i} + n_{k}^{\mathrm{DL}} \\&= {\textbf{h}}_{k}^{H} {\textbf{w}}_{k} \sqrt{\rho _{k}} \zeta _{k} + \sum _{i \ne k} {\textbf{h}}_{k}^{H} {\textbf{w}}_{i} \sqrt{\rho _{i}} \zeta _{i} + n^{\mathrm{DL}}_{k} \end{aligned} \end{aligned}$$

where \(\zeta _{k} \in {\mathbb {C}}\) is the signal transmitted to UE k with unit DL transmit power, \(\rho _{k}\) is the network-wide DL transmit power allocated to UE k, \(n_{k}^{\mathrm{DL}} \sim {\mathcal{N}}{\mathcal{C}}\left( 0, \sigma _{k}^{2,\mathrm{DL}} \right)\) is an additive Gaussian noise component, and \({\textbf{w}}_{k} = [{\textbf{w}}^{\mathrm{T}}_{k1} \ldots {\textbf{w}}^{\mathrm{T}}_{kL}]^{\mathrm{T}} \in {\mathbb {C}}^{NL}\) is the concatenation of the precoding vectors used for UE k. Also denote the DL transmit powers as \(\varvec{\rho } = [\rho _{1} \ \ldots \ \rho _{K}]^{\mathrm{T}} \in {\mathbb {R}}^{K}\) and the precoding matrix as \({\textbf{W}} = [{\textbf{w}}_{1} \ \ldots \ {\textbf{w}}_{K}] \in {\mathbb {C}}^{NL \times K}\). Again the selection of the precoding vector should only depend on local CSI for CFmMIMO. The most popular choice is MR precoding with

$$\begin{aligned} {\textbf{w}}_{kl} = \frac{\hat{{\textbf{h}}}_{kl}}{\sqrt{E\{||\hat{{\textbf{h}}}_{kl}||^{2}\}}}. \end{aligned}$$
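As a short worked step, and assuming the distribution of \(\hat{{\textbf{h}}}_{kl}\) stated in Sect. 2.1, the normalization constant of the MR precoder can be expressed through precomputable statistics only:

$$\begin{aligned} E\{||\hat{{\textbf{h}}}_{kl}||^{2}\} = {\text{tr}}\left( E\{\hat{{\textbf{h}}}_{kl} \hat{{\textbf{h}}}_{kl}^{H}\} \right) = \tau _{\mathrm{p}} p^{\mathrm{pilot}}_{k} \, {\text{tr}}\left( {\textbf{R}}_{kl} \varvec{\Psi }_{tl}^{-1} {\textbf{R}}_{kl} \right) \end{aligned}$$

where tr(.) denotes the trace (notation introduced here only for this illustration) and t is the index of the pilot assigned to UE k. The constant therefore changes only with the large-scale fading and does not need to be recomputed in every coherence block.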

DL power allocation algorithms for different local precoding schemes can be found in [4, 5, 33]. In this paper, network-wide MMSE precoding [34] and associated power allocation algorithms will be considered, which typically require network-wide CSI. However, this paper shows that, using a similar approach as for UL data transmission, this network-wide MMSE precoding and associated power allocation can still be obtained locally at each AP.

3 UL receive combining and power allocation

3.1 UL receive combining

A standard lower bound for the ergodic capacity of the UL data transmission for UE k can be obtained by rewriting (9) using the estimated channels as

$$\begin{aligned} \begin{aligned} {\hat{s}}_{k} = {\textbf{v}}^{H}_{k} \hat{{\textbf{h}}}_{k} s_{k} + {\textbf{v}}^{H}_{k} \tilde{{\textbf{h}}}_{k} s_{k} + \sum _{i \ne k} {\textbf{v}}^{H}_{k} (\hat{{\textbf{h}}}_{i} + \tilde{{\textbf{h}}}_{i}) s_{i} + {\textbf{v}}^{H}_{k} {\textbf{n}}^{\mathrm{UL}} \end{aligned} \end{aligned}$$

and is given by the achievable UL spectral efficiency (SE) [24, 35]

$$\begin{aligned} {\text{SE}}^{\mathrm{UL},1}_{k} = \frac{\tau _{\mathrm{u}}}{\tau _{\mathrm{c}}} E \left\{ \log _{2} \left( 1+ {\text{SINR}}^{\mathrm{UL},{\text{inst}}}_{k}\right) \right\} \ {\text{[bits/s/Hz]}} \end{aligned}$$

with the instantaneous effective signal-to-interference-and-noise ratio \({\text{SINR}}^{\mathrm{UL},{\text{inst}}}_{k}\) given as

$$\begin{aligned} \frac{p_{k} |{\textbf{v}}^{H}_{k} \hat{{\textbf{h}}}_{k}|^{2} }{\sum _{i \ne k} p_{i} |{\textbf{v}}^{H}_{k} \hat{{\textbf{h}}}_{i}|^{2} + \sum _{i=1}^{K} p_{i} {\textbf{v}}^{H}_{k} {\textbf{C}}_{i} {\textbf{v}}_{k} + {\textbf{v}}_{k}^{H} {\textbf{R}}_{{\textbf{n}} {\textbf{n}}}^{\mathrm{UL}} {\textbf{v}}_{k} } \end{aligned}$$

where \({\textbf{C}}_{i} = {\text{Blkdiag}}\{{\textbf{C}}_{i1}, \ldots , {\textbf{C}}_{iL}\}\). The expectation is with respect to the different channel realizations.

The UL SE is valid for any combining vector \({\textbf{v}}_{k}\). In this paper optimal receive combining is considered and it can be seen from expression (13) that the optimal instantaneous combining vector can be obtained by maximizing the generalized Rayleigh quotient in the instantaneous SINR in (14). The optimal combining vector is unique up to a scalar multiplication, and so to make the solution unique, the solution that minimizes the mean-squared error \(E \{ |s_{k} - {\hat{s}}_{k}|^{2} | \hat{{\textbf{H}}}\}\) is chosen. The MMSE combining vector for UE k is then given by

$$\begin{aligned} {\textbf{v}}^{\mathrm{MMSE}}_{k} = p_{k} \left( \hat{{\textbf{H}}} {\textbf{P}} \hat{{\textbf{H}}}^{H} + \sum _{i=1}^{K} p_{i} {\textbf{C}}_{i} + {\textbf{R}}_{{\textbf{n}} {\textbf{n}}}^{\mathrm{UL}} \right) ^{-1} \hat{{\textbf{h}}}_{k} \end{aligned}$$

with \({\textbf{P}} = {\text{diag}}\{p_{1}, \ldots , p_{K}\}\) as shown in [27].

The following theorem shows that the MMSE combining vectors for UEs \(k=1 \ldots K\), i.e., the combining matrix \({\textbf{V}}^{\mathrm{MMSE}} = [{\textbf{v}}^{\mathrm{MMSE}}_{1}, \ldots , {\textbf{v}}^{\mathrm{MMSE}}_{K}]\), can be computed in a way that allows for an efficient distributed implementation.

Theorem 3.1

The MMSE combining matrix \({\textbf{V}}^{\mathrm{MMSE}}\) can be written as

$$\begin{aligned} {\textbf{V}}^{\mathrm{MMSE}} = \begin{bmatrix} {\textbf{X}}_{1}\\ \vdots \\ {\textbf{X}}_L \end{bmatrix} {\textbf{U}} = {\textbf{X}} {\textbf{U}} \end{aligned}$$

with

$$\begin{aligned} {\textbf{U}}&= \left( {\textbf{P}}^{-1} + {\textbf{X}}^{H} \hat{{\textbf{H}}} \right) ^{-1} = \left( {\textbf{P}}^{-1} + \sum _{l=1}^{L} {\textbf{X}}^{H}_{l} \hat{{\textbf{H}}}_{l} \right) ^{-1}, \\ {\textbf{X}}_{l}&= \left( \sum _{k=1}^{K} p_{k} {\textbf{C}}_{kl} + {\textbf{R}}^{\mathrm{UL}}_{{\textbf{n}}_{l} {\textbf{n}}_{l}} \right) ^{-1} \hat{{\textbf{H}}}_{l}. \end{aligned}$$

Proof

See “Appendix 1”. \(\square\)
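The factorization in Theorem 3.1 can be checked numerically. The following NumPy sketch is only an illustration under stated assumptions: the local matrices \(\sum _{k} p_{k} {\textbf{C}}_{kl} + {\textbf{R}}^{\mathrm{UL}}_{{\textbf{n}}_{l} {\textbf{n}}_{l}}\) are replaced by random positive definite stand-ins (called `Z` here, a name not used in the paper), and the channel estimates are random. It compares the centralized expression (15), stacked for all UEs, with the product \({\textbf{X}} {\textbf{U}}\).

```python
import numpy as np

rng = np.random.default_rng(1)
L, N, K = 3, 4, 5   # APs, antennas per AP, UEs (hypothetical sizes)

def crandn(*shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def rand_pd(n):
    """Random Hermitian positive definite matrix (stand-in for local statistics)."""
    A = crandn(n, n)
    return A @ A.conj().T / n + 0.1 * np.eye(n)

p = rng.uniform(0.5, 2.0, K)                 # UL transmit powers
P = np.diag(p)
H_hat = [crandn(N, K) for _ in range(L)]     # local channel estimates \hat{H}_l
Z = [rand_pd(N) for _ in range(L)]           # stand-in for sum_k p_k C_kl + R_nl

# Centralized computation: V = (H P H^H + Blkdiag(Z))^{-1} H P, an NL x NL solve.
H_all = np.vstack(H_hat)
Z_all = np.zeros((N * L, N * L), dtype=complex)
for l in range(L):
    Z_all[l * N:(l + 1) * N, l * N:(l + 1) * N] = Z[l]
V_central = np.linalg.solve(H_all @ P @ H_all.conj().T + Z_all, H_all @ P)

# Distributed form of Theorem 3.1: L local N x N solves and one K x K inverse.
X = [np.linalg.solve(Z[l], H_hat[l]) for l in range(L)]      # X_l
S = sum(X[l].conj().T @ H_hat[l] for l in range(L))          # sum_l X_l^H \hat{H}_l
U = np.linalg.inv(np.diag(1.0 / p) + S)
V_dist = np.vstack(X) @ U

max_diff = np.max(np.abs(V_central - V_dist))
print(f"max |V_central - V_dist| = {max_diff:.2e}")
```

The two results agree up to numerical precision, which reflects that the theorem is an application of the matrix inversion lemma to the block-diagonal structure of the error and noise covariance.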

In [27], the network-wide distributed MMSE receive combining (N-DRC) algorithm is proposed based on the computation of the MMSE combining vectors in Theorem 3.1. The algorithm can be summarized in the following steps for each coherence block:

  1. Each AP l estimates \(\hat{{\textbf{H}}}_{l}\) and computes \({\textbf{X}}_{l}\) as in (18) and \({\textbf{X}}_{l}^{H} \hat{{\textbf{H}}}_{l}\). It transmits the combined signals \({\textbf{x}}_{l} = {\textbf{X}}_{l}^{H} {\textbf{y}}_{l}^{\mathrm{UL}}\) and the parameters \({\textbf{X}}_{l}^{H} \hat{{\textbf{H}}}_{l}\) to the NC.

  2. The available links in the network are used efficiently such that the NC obtains the in-network sums \({\textbf{x}}= \sum _{l=1}^{L} {\textbf{x}}_{l}\) and \({\textbf{X}}^{H} \hat{{\textbf{H}}} = \sum _{l=1}^{L} {\textbf{X}}_{l}^{H} \hat{{\textbf{H}}}_{l}\).

  3. The NC uses \({\textbf{X}}^{H} \hat{{\textbf{H}}}\) to compute \({\textbf{U}} = [{\textbf{u}}_{1} \ \ldots \ {\textbf{u}}_{K}]\) as in (17) and computes the estimates \({\hat{s}}_{k}\) as \({\textbf{u}}_{k}^{H} {\textbf{x}}\) for all UEs.
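The three steps above can be sketched for one coherence block as follows. This is a toy illustration, not the implementation of [27]: it assumes perfect channel estimates (so that \({\textbf{C}}_{kl} = {\textbf{0}}\)), white noise with a hypothetical variance, and a hypothetical tree described by a `parent` array in which every AP has a smaller index than its children and AP 0 is attached to the NC. Each AP adds the partial sums received from its children to its own contribution before forwarding, so every link carries only K signal samples and a \(K \times K\) matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
L, N, K = 5, 2, 3
parent = [-1, 0, 0, 1, 1]   # hypothetical tree; parent[l] < l, AP 0 feeds the NC
sigma2 = 0.2                # assumed white UL noise variance

def crandn(*shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

p = np.array([1.0, 0.8, 0.5])                      # UL transmit powers
H_hat = [crandn(N, K) for _ in range(L)]           # perfect CSI assumed: H_l = \hat{H}_l
Z = [sigma2 * np.eye(N) for _ in range(L)]         # sum_k p_k C_kl + R_nl with C_kl = 0
s = np.sqrt(p) * crandn(K)                         # transmitted symbols, E|s_k|^2 = p_k
y = [H_hat[l] @ s + np.sqrt(sigma2) * crandn(N) for l in range(L)]

# Step 1: local processing at each AP.
X = [np.linalg.solve(Z[l], H_hat[l]) for l in range(L)]
x_acc = [X[l].conj().T @ y[l] for l in range(L)]          # x_l = X_l^H y_l
G_acc = [X[l].conj().T @ H_hat[l] for l in range(L)]      # X_l^H \hat{H}_l

# Step 2: in-network summation from the leaves towards the NC.
for l in range(L - 1, 0, -1):        # children are processed before their parents
    x_acc[parent[l]] = x_acc[parent[l]] + x_acc[l]
    G_acc[parent[l]] = G_acc[parent[l]] + G_acc[l]
x_sum, G_sum = x_acc[0], G_acc[0]    # what the NC receives from AP 0

# Step 3: the NC inverts a K x K matrix and forms the symbol estimates.
U = np.linalg.inv(np.diag(1.0 / p) + G_sum)
s_hat = U.conj().T @ x_sum

# Reference: centralized MMSE combining on the stacked NL-dimensional signal.
H_all, y_all = np.vstack(H_hat), np.concatenate(y)
V = np.linalg.solve(H_all @ np.diag(p) @ H_all.conj().T + sigma2 * np.eye(N * L),
                    H_all @ np.diag(p))
s_ref = V.conj().T @ y_all
print("distributed and centralized estimates agree:", np.allclose(s_hat, s_ref))
```

Any other order of summation gives the same result, so the tree can be chosen to match the available fronthaul links.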

Some important advantages of this distributed algorithm compared to a centralized algorithm, i.e. when each AP serves as a receiver that directly transmits its received signals to the NC, are repeated here:

  • The NC only has to invert a \(K \times K\) matrix in (17) instead of the \(LN \times LN\) matrix in (15). However, this requires that each AP inverts an \(N \times N\) matrix to compute \({\textbf{X}}_{l}\), where the necessary local matrices can be precomputed and kept fixed as long as the transmit powers \({\textbf{p}}\) remain unchanged.

  • If the in-network sums in step 2 are computed efficiently exploiting the available network topology, this will reduce the network signaling strongly compared to the transmission of all NL signals.

  • The method is also robust against link failures: if the data (\({\textbf{x}}_{l}\) and \({\textbf{X}}_{l}^{H} \hat{{\textbf{H}}}_{l}\)) from a certain AP l is not received as a term in the in-network sums, the obtained estimate will still be optimal for a network setup with AP l removed.

It is important to note that the MMSE combining vectors depend on the UL transmit powers, the noise statistics and the channel estimation \(\hat{{\textbf{H}}}\), which in turn depends on the pilot transmit powers, the pilot assignment strategy and the large-scale statistics. The UL SE in (13) is as a consequence also influenced by these parameters. In the following subsection a way to improve this achievable bound is provided, based on power allocation.

3.2 UL power allocation preliminaries

With UL power allocation (Footnote 3), the UL transmit powers \({\textbf{p}}\) are set to maximize some utility function for a given UL receive combining strategy. The problem can be formulated mathematically as:

$$\begin{aligned} \begin{aligned} \max _{{\textbf{p}} \in \mathcal{P}}&\quad U({\text{SE}}_{1}, \ldots , {\text{SE}}_{K})\\ {\text{s.t.}}&\quad {\text{SE}}_{k} = {\text{SE}}({\textbf{p}}, {\textbf{v}}_{k}({\textbf{p}})) \quad \forall k \end{aligned} \end{aligned}$$

where, for instance, (13) is used for SE\(_{k}\) and \(\mathcal{P}\) is a convex region of allowed UL transmit powers \({\textbf{p}}\) for the UEs, e.g. corresponding to a total power constraint or a per UE power constraint. The utility function is assumed to be (quasi-)concave and nondecreasing. In this paper, two specific utility functions are considered, but the proposed algorithm is not limited to these cases. The channel hardening makes power allocation different in CFmMIMO systems compared to single-antenna systems. While the combining vectors \({\textbf{v}}_{k}\) will be adapted to small-scale variations of the channels, \({\textbf{p}}\) will only be adapted to large-scale variations, so that the expectations in e.g. (13) can be computed effectively. Since the expected value operation in the expression for the SE in (13) contains the logarithm, finding the optimal power allocation is difficult due to the non-trivial relation between \({\textbf{p}}\) and the \({\text{SE}}^{\mathrm{UL},1}_{k}({\textbf{p}}, {\textbf{v}}_{k})\). Therefore a different SE formulation will be considered for the UL power allocation, which is obtained under the assumption of channel hardening (Footnote 4). For this, only the part of the desired signal received over the average channel \(E\{{\textbf{v}}^{H}_{k} {\textbf{h}}_{k}\}\) is treated as the true desired signal, i.e.

$$\begin{aligned} \begin{aligned} {\hat{s}}_{k} =&\ E\left\{ {\textbf{v}}^{H}_{k} {\textbf{h}}_{k}\right\} s_{k} + \left( {\textbf{v}}^{H}_{k} {\textbf{h}}_{k} - E\left\{ {\textbf{v}}^{H}_{k} {\textbf{h}}_{k}\right\} \right) s_{k} \\&+ \sum _{i \ne k} {\textbf{v}}^{H}_{k} {\textbf{h}}_{i} s_{i} + {\textbf{v}}^{H}_{k} {\textbf{n}}. \end{aligned} \end{aligned}$$

This results in a deterministic channel under uncorrelated interference and noise for which the following SE (also referred to as the use-and-then-forget bound [24]) can be derived:

$$\begin{aligned} {\text{SE}}^{\mathrm{UL},2}_{k} = \frac{\tau _{\mathrm{u}}}{\tau _{\mathrm{c}}} \log _{2} \left( 1+ {\text{SINR}}^{\mathrm{UL},2}_{k}\right) \ {\text{[bits/s/Hz]}} \end{aligned}$$


$$\begin{aligned} \frac{p_{k} \left| E\left\{ {\textbf{v}}^{H}_{k} {\textbf{h}}_{k}\right\} \right| ^{2} }{\sum _{i =1}^{K} p_{i} E\left\{ \left| {\textbf{v}}^{H}_{k} {\textbf{h}}_{i}\right| ^{2}\right\} - p_{k} \left| E\left\{ {\textbf{v}}^{H}_{k} {\textbf{h}}_{k}\right\} \right| ^{2} + E\left\{ \left| {\textbf{v}}^{H}_{k} {\textbf{n}}\right| ^{2} \right\} } \end{aligned}$$

defines the signal-to-interference-and-noise ratio \({\text{SINR}}^{\mathrm{UL},2}_{k}\). The expectations are taken with respect to the channel realizations. In theory, the expression depends only on the large-scale fading.
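The SINR above can be evaluated numerically by replacing each expectation with a sample mean over coherence blocks. The following is a minimal sketch under that assumption; the function names, the array layout `g[b, k, i]`, and the value \(\tau_{\mathrm{c}} = 200\) are illustrative choices and are not taken from the paper.

```python
import numpy as np

def uatf_sinr(g, p, noise):
    """Sample-mean estimate of SINR^{UL,2}_k for every UE k.

    g     : complex array (B, K, K), g[b, k, i] = v_k^H h_i in block b
    p     : real array (K,), UL transmit powers
    noise : real array (B, K), |v_k^H n|^2 in block b
    """
    mean_gain = g.mean(axis=0)                       # E{v_k^H h_i}
    second_moment = (np.abs(g) ** 2).mean(axis=0)    # E{|v_k^H h_i|^2}
    signal = p * np.abs(np.diag(mean_gain)) ** 2     # p_k |E{v_k^H h_k}|^2
    total = second_moment @ p                        # sum_i p_i E{|v_k^H h_i|^2}
    return signal / (total - signal + noise.mean(axis=0))

def spectral_efficiency(sinr, tau_u, tau_c):
    """Pre-log factor times log2(1 + SINR), in bits/s/Hz."""
    return tau_u / tau_c * np.log2(1.0 + sinr)

# Toy deterministic example with K = 2 UEs and B = 4 identical blocks.
g = np.tile(np.array([[1.0, 0.5], [0.5, 1.0]], dtype=complex), (4, 1, 1))
p = np.ones(2)
noise = np.full((4, 2), 0.25)
sinr = uatf_sinr(g, p, noise)
se = spectral_efficiency(sinr, tau_u=190, tau_c=200)  # tau_c = 200 is assumed
```

Because the toy channel is deterministic, the self-interference term \(E\{|{\textbf{v}}^{H}_{k}{\textbf{h}}_{k}|^{2}\} - |E\{{\textbf{v}}^{H}_{k}{\textbf{h}}_{k}\}|^{2}\) vanishes and only the cross-UE interference and noise remain in the denominator.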

It is intuitively clear that \({\text{SE}}^{\mathrm{UL},2}_{k}\) is a looser lower bound on the ergodic capacity than the achievable lower bound given by \({\text{SE}}^{\mathrm{UL},1}_{k}\), since the channel estimates are not utilized in the signal estimation, i.e.

$$\begin{aligned} {\text{SE}}^{\mathrm{UL},1}_{k} \ge {\text{SE}}^{\mathrm{UL},2}_{k}. \end{aligned}$$

However, simulations in [24, Sect. 4.2] show that the bound is still tight for \(LN \gg 1\), i.e. when channel hardening takes effect. The bound is also tight for a deterministic channel, i.e. when only one channel realization is used to compute the expectations over the channel realizations. The simulations also show that the bound is tight for any combining scheme, but that using the MMSE combining vector leads to a tighter bound than any other multiple of this MMSE combining vector, e.g. when a constraint \({\textbf{v}}_{k}^{H} \hat{{\textbf{h}}}_{k} = 1\) is used in each coherence block.

Since closed-form expressions for the expectations in (22) are difficult to derive for the optimal MMSE combining vector (15) (Footnote 5), the next theorem provides a way to estimate the expectations without having access to the true channels \({\textbf{H}}\).

Theorem 3.2

If the combining vector \({\textbf{v}}_{k}\) is independent of the instantaneous channel estimation error \(\{\tilde{{\textbf{h}}}_{k}\}_{\forall k}\) and also independent of the instantaneous UL noise \({\textbf{n}}^{\mathrm{UL}}\), then the expectations can be computed without knowledge of the true channel \({\textbf{H}}\):

$$\begin{aligned} E\left\{ {\textbf{v}}^{H}_{k} {\textbf{h}}_{k} \right\}= & {} \gamma _{k}, \end{aligned}$$
$$\begin{aligned} E\left\{ \left| {\textbf{v}}^{H}_{k} {\textbf{h}}_{i}\right| ^{2} \right\}= & {} \upsilon _{ki} + \xi _{ki} = \upsilon _{ki} + \sum _{l=1}^{L} \xi ^{l}_{ki}, \end{aligned}$$
$$\begin{aligned} E\left\{ \left| {\textbf{v}}^{H}_{k} {\textbf{n}}^{\mathrm{UL}}\right| ^{2} \right\}= & {} \nu _{k} = \sum _{l=1}^{L} \nu ^{l}_{k}. \end{aligned}$$

The right-hand side symbols are defined in Table 1 and these only depend on the channel estimates and the large-scale statistics \(\{{\textbf{C}}_{k}\}_{\forall k}\) and \({\textbf{R}}^{\mathrm{UL}}_{{\textbf{n}} {\textbf{n}}}\).

Table 1 Overview of used symbols for power allocation


Proof: See “Appendix 2”. \(\square\)

The requirements of Theorem 3.2 are satisfied by the MMSE combining vector (15), as \({\textbf{v}}^{\mathrm{MMSE}}_{k}\) only depends on the instantaneous channel estimates \(\hat{{\textbf{H}}}\) and the fixed matrices \(\{{\textbf{C}}_{k}\}_{\forall k}\) and \({\textbf{R}}_{{\textbf{n}} {\textbf{n}}}^{\mathrm{UL}}\).

The (ergodic) statistics in Table 1 can be estimated using (recursive) time-averaging at the NC or in a distributed way using in-network sums for \(\xi _{ki}\) and \(\nu _{k}\), if each AP l has access to the transformation matrix \({\textbf{U}}\), since \({\textbf{v}}_{kl} = {\textbf{X}}_{l} {\textbf{u}}_{k}\).
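As an illustration of the recursive time-averaging and in-network summation mentioned above, the sketch below keeps a running mean at each AP and adds the local means together. The class name `RunningMean` and the per-block sample values are hypothetical; the exact per-block quantities are those of Table 1, which is not reproduced here.

```python
import numpy as np

class RunningMean:
    """Recursive sample mean: m_b = m_{b-1} + (x_b - m_{b-1}) / b."""

    def __init__(self):
        self.count = 0
        self.mean = None

    def update(self, x):
        x = np.asarray(x, dtype=complex)
        self.count += 1
        if self.mean is None:
            self.mean = x.copy()
        else:
            self.mean = self.mean + (x - self.mean) / self.count
        return self.mean

# Hypothetical local per-block terms nu_k^l at L = 2 APs for K = 2 UEs.
samples_ap1 = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
samples_ap2 = [np.array([0.5, 0.5]), np.array([1.5, 2.5])]

ap_means = [RunningMean(), RunningMean()]
for x1, x2 in zip(samples_ap1, samples_ap2):
    ap_means[0].update(x1)   # done locally at AP 1
    ap_means[1].update(x2)   # done locally at AP 2

# In-network sum over the APs: nu_k = sum_l nu_k^l.
nu = sum(m.mean for m in ap_means)
```

Only the \(K\) local averages need to travel over the fronthaul, not the per-block samples themselves.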

3.3 UL receive combining and power allocation

To solve the power allocation problem (19) with SE\(({\textbf{p}},{\textbf{v}}_{k}({\textbf{p}}))\) defined as \({\text{SE}}^{\mathrm{UL},2}_{k}\) in (21), the following alternating optimization method is proposed.

  1.

    Set \(i \leftarrow 0\) and initialize \({\textbf{p}}^{0}\) randomly such that \({\textbf{p}}^{0} \in \mathcal{P}\).

  2.

    Determine the combining matrix \({\textbf{V}}^{i+1}\) for each coherence block that maximizes:

    $$\begin{aligned} \begin{aligned} \max _{{\textbf{V}}}&\quad U({\text{SE}}_{1}, \ldots , {\text{SE}}_{K})\\ {\text{s.t.}}&\quad {\text{SE}}_{k} = {\text{SE}}^{\mathrm{UL},2}_{k}({\textbf{p}}^{i}, {\textbf{v}}_{k}) \quad \forall k \end{aligned} \end{aligned}$$

    for fixed transmit powers \({\textbf{p}}^{i}\). Since the relation between \({\text{SE}}^{\mathrm{UL},2}_{k}({\textbf{p}}^{i}, {\textbf{v}}_{k})\) and \({\textbf{v}}_{k}\) is non-trivial, (23) is used to obtain a tractable optimization problem:

    $$\begin{aligned} \begin{aligned} \max _{{\textbf{V}}}&\quad U({\text{SE}}_{1}, \ldots , {{\text{SE}}}_{K})\\ {\text{s.t.}}&\quad {\text{SE}}_{k} = {\text{SE}}^{\mathrm{UL},1}_{k}\big ({\textbf{p}}^{i}, {\textbf{v}}_{k}\big ) \left( \ge {\text{SE}}^{\mathrm{UL},2}_{k}\big ({\textbf{p}}^{i}, {\textbf{v}}_{k}\big ) \right) \ \forall k. \end{aligned} \end{aligned}$$

    Since \(U({\text{SE}}_{1}, \ldots , {\text{SE}}_{K})\) is nondecreasing, this is equivalent to maximizing \(\text{SINR}^{\mathrm{UL},\text{inst}}_{k}\) in (14) for each k in each coherence block. The solution is thus given by (15) and (16).

  3.

    Determine the optimal transmit powers \({\textbf{p}}^{i+1}\) that maximize:

    $$\begin{aligned} \begin{aligned} \max _{{\textbf{p}} \in \mathcal{P}}&\quad U\left( {\text{SE}}_{1}, \ldots , {\text{SE}}_{K}\right) \\ {\text{s.t.}}&\quad {\text{SE}}_{k} = {\text{SE}}^{\mathrm{UL},2}_{k}\left( {\textbf{p}}, {\textbf{v}}^{i+1}_{k}\right) \quad \forall k. \end{aligned} \end{aligned}$$

  4.

    Set \(i \leftarrow i + 1\) and return to step 2.

The following theorem is relevant for establishing the convergence of the proposed alternating optimization method.

Theorem 3.3

Suppose that \(f({\textbf{x}}_{1}, {\textbf{x}}_{2})\) is continuously differentiable over a closed convex set \(\mathcal{X} = \mathcal{X}_{1} \times \mathcal{X}_{2}\). Furthermore, suppose that for each \({\textbf{x}}^{i} \in \mathcal{X}\), the maximizations

$$\begin{aligned} {\textbf{x}}^{i+1}_{1}= \mathop {\mathrm{arg\,max}}\limits _{{\textbf{z}}_{1} \in \mathcal{X}_{1}} f\left( {\textbf{z}}_{1}, {\textbf{x}}^{i}_{2}\right) \end{aligned}$$


$$\begin{aligned} {\textbf{x}}^{i+1}_{2}= \mathop {\mathrm{arg\,max}}\limits _{{\textbf{z}}_{2} \in \mathcal{X}_{2}} f\left( {\textbf{x}}^{i+1}_{1}, {\textbf{z}}_{2}\right) \end{aligned}$$

are uniquely attained. Let \(\{{\textbf{x}}^{i}\}\) be the sequence generated from a random initial point \({\textbf{x}}^{0} \in \mathcal{X}\) using the above equations. Then every limit point of \(\{{\textbf{x}}^{i}\}\) is a stationary point of \(\max _{{\textbf{x}} \in \mathcal{X}} f({\textbf{x}})\).


The proof is a simplification of the proof in [39, Sect. 2.7] for block coordinate descent methods. Only two blocks are considered here and the maximizations in the theorem need to be replaced with a minimization of \(-f({\textbf{x}})\) to obtain the same result. \(\square\)
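To make the statement of Theorem 3.3 concrete, the sketch below runs two-block alternating maximization on a toy problem. The strictly concave quadratic \(f(x_{1}, x_{2}) = -x_{1}^{2} - x_{2}^{2} + x_{1}x_{2} + x_{1} + x_{2}\) and the box \([0,2]^{2}\) are illustrative assumptions, unrelated to the SE expressions of the paper. Each block maximization has a unique closed-form solution here.

```python
def f(x1, x2):
    """Toy strictly concave objective with its maximum at (1, 1)."""
    return -x1**2 - x2**2 + x1 * x2 + x1 + x2

def clip(z, lo=0.0, hi=2.0):
    """Project a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, z))

def alternating_maximization(x1, x2, iterations=60):
    """Alternate exact block maximizations and record the objective."""
    history = [f(x1, x2)]
    for _ in range(iterations):
        # argmax over x1 for fixed x2: df/dx1 = -2 x1 + x2 + 1 = 0
        x1 = clip((x2 + 1.0) / 2.0)
        # argmax over x2 for the updated x1: df/dx2 = -2 x2 + x1 + 1 = 0
        x2 = clip((x1 + 1.0) / 2.0)
        history.append(f(x1, x2))
    return x1, x2, history

x1_opt, x2_opt, history = alternating_maximization(0.0, 2.0)
```

The objective value never decreases from one iteration to the next, and the iterates approach the stationary point \((1, 1)\), which is also the global maximizer because this toy objective is concave.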

Although not fully applicable to the proposed alternating optimization method, this theorem can still be used to show that the method converges to a stationary point of (19) when the number of antennas NL goes to infinity. In that case, the bound (23) is tight, i.e. \({\text{SE}}^{\mathrm{UL},1}_{k} \approx {\text{SE}}^{\mathrm{UL},2}_{k}\), which implies that step 2 of the alternating optimization method optimizes (19) for a fixed \({\textbf{p}}^{i}\). The maximum in optimization problem (27) is uniquely attained whenever U is nondecreasing, so it still needs to be checked whether a unique solution to (29) can be found for the given utility function.

Based on this, a distributed UL receive combining and power allocation algorithm (D-UL-RCPA) is proposed in Algorithm 1, where (16)–(18) are used in each coherence block and the statistics to determine \({\text{SE}}^{\mathrm{UL},2}_{k}({\textbf{p}}, {\textbf{v}}^{i+1}_{k})\) are gathered over B coherence blocks before a better power allocation is computed. If the channel statistics \(\{{\textbf{R}}_{kl}\}_{\forall k,l}\) remain constant and the necessary large-scale statistics are estimated without estimation error, i.e. \(B \rightarrow \infty\), the proposed algorithm essentially performs the alternating optimization method with (27)–(28) and exhibits the same convergence behavior. For a finite window B, the large-scale statistics for the power allocation step are only obtained approximately, but the algorithm still performs well compared to other methods, as will be shown in the simulation section. Simulations also show that \({\textbf{p}}\) does not change much after 3 iterations if the large-scale statistics remain constant, demonstrating the good tracking performance of the proposed algorithm.

Algorithm 1 also has the advantages listed in Sect. 3.1: the NC only has to deal with \(K \times K\) matrices, the communications over the fronthaul network can be done efficiently using in-network sums, and the algorithm is robust against link failures. The extra communication of \({\textbf{U}}\) and the parameters \(\{\xi _{ki}, \nu _{k}\}_{\forall k,i}\) is only necessary for the power allocation step, so whenever the performance of the estimation task at hand is sufficient, these steps can be removed to save network resources. Furthermore, it can be seen as an adaptive algorithm, since it adapts to slow changes in the channel statistics over time.

[Algorithm 1: distributed UL receive combining and power allocation (D-UL-RCPA)]

4 DL transmit precoding and power allocation

4.1 DL power allocation preliminaries

When a precoding vector \({\textbf{w}}_{k}\) is used to encode data for UE k, the desired signal received at UE k propagates with a gain of \({\textbf{h}}_{k}^{H} {\textbf{w}}_{k}\). UE k does not know this gain a priori, but it can estimate its mean value \(E\{{\textbf{h}}_{k}^{H} {\textbf{w}}_{k}\}\), which is a common approach in massive MIMO systems. The received DL signal can then be expressed as

$$\begin{aligned} \begin{aligned} y^{\mathrm{DL}}_{k} =&\ E\left\{ {\textbf{h}}^{H}_{k} {\textbf{w}}_{k}\right\} \sqrt{\rho _{k}} \zeta _{k} + \left( {\textbf{h}}^{H}_{k} {\textbf{w}}_{k} - E\left\{ {\textbf{h}}^{H}_{k} {\textbf{w}}_{k}\right\} \right) \sqrt{\rho _{k}} \zeta _{k} \\&+ \sum _{i \ne k} {\textbf{h}}^{H}_{k} {\textbf{w}}_{i} \sqrt{\rho _{i}} \zeta _{i} + n_{k}^{\mathrm{DL}}. \end{aligned} \end{aligned}$$

Here \(E\{{\textbf{h}}^{H}_{k} {\textbf{w}}_{k}\}\) represents a deterministic channel and the other terms represent uncorrelated interference and noise. The following achievable SE can then be obtained using the hardening bound [4, 5, 24]:

$$\begin{aligned} {\text{SE}}^{\mathrm{DL},2}_{k} = \frac{\tau _{\mathrm{d}}}{\tau _{\mathrm{c}}} \log _{2} \left( 1+ \text{SINR}^{\mathrm{DL}}_{k}\right) \ \text{[bits/s/Hz]} \end{aligned}$$

with \(\text{SINR}^{\mathrm{DL}}_{k}\) given as

$$\begin{aligned} \frac{\rho _{k} |E\{{\textbf{w}}^{H}_{k} {\textbf{h}}_{k}\}|^{2} }{\sum _{i=1}^{K} \rho _{i} E\{|{\textbf{w}}^{H}_{i} {\textbf{h}}_{k}|^{2}\} - \rho _{k} |E\{{\textbf{w}}^{H}_{k} {\textbf{h}}_{k}\}|^{2} + \sigma ^{2,\mathrm{DL}}_{k} }. \end{aligned}$$

Here the assumption \(E\{||{\textbf{w}}_{k}||^{2}\} = 1\) is not imposed, whereas it is in [4, 5, 24]. The total transmit power allocated to UE k is still given by \(\rho _{k} E\{||{\textbf{w}}_{k}||^{2}\}\), which is no longer determined by \(\rho _{k}\) alone but also depends on \(E\{||{\textbf{w}}_{k}||^{2}\}\).

The DL power allocation problem with a per AP total power constraint is finally formulated as

$$\begin{aligned} \begin{aligned} \max _{\varvec{\rho } \ge {\textbf{0}}}&\quad U(\overline{\text{ SE }}_{1}, \ldots , \overline{\text{ SE }}_{K})\\ {\text{s.t.}}&\quad \overline{\text{ SE }}_{k} = {\text{SE}}^{\mathrm{DL},2}_{k}(\varvec{\rho }, {\textbf{W}}(\varvec{\rho })) \quad \forall k,\\&\quad \sum _{k=1}^{K} \rho _{k} \mu ^{l}_{k} \le P^{l}_{t} \quad \forall l\\ \end{aligned} \end{aligned}$$

with \(P^{l}_{t}\) the maximum transmit power of AP l and

$$\begin{aligned} \mu ^{l}_{k} = E\left\{ {\textbf{w}}_{kl}^{H} {\textbf{w}}_{kl}\right\} . \end{aligned}$$

4.2 UL–DL duality

Instead of solving the primal problem (42) directly, it is proposed to solve the dual problem using a dual subgradient descent procedure. The dual problem to (42) can be found by introducing L dual variables \(\varvec{\sigma }^{v} = [\sigma ^{v}_{1} \ \ldots \ \sigma ^{v}_{L}]^{\mathrm{T}}\) for the L power constraints and an extra common dual variable \(\lambda\), resulting in

$$\begin{aligned} \min _{\varvec{\sigma }^{v} \ge {\textbf{0}},\lambda \ge 0} \max _{\varvec{\rho } \ge {\textbf{0}}, {\textbf{W}}} \mathcal{L}^{\mathrm{DL}} (\varvec{\rho }, {\textbf{W}}, \varvec{\sigma }^{v}, \lambda ) \end{aligned}$$

where the Lagrangian \(\mathcal{L}^{\mathrm{DL}} (\varvec{\rho }, {\textbf{W}}, \varvec{\sigma }^{v}, \lambda )\) is given as

$$\begin{aligned} U\left( {\text{SE}}^{\mathrm{DL},2}_{1}, \ldots ,{\text{SE}}^{\mathrm{DL},2}_{K}\right) - \lambda \sum _{l=1}^{L} \sigma ^{v}_{l} \left( \sum _{k=1}^{K} \rho _{k} \mu ^{l}_{k} - P^{l}_{t} \right) . \end{aligned}$$

For some specific cases with deterministic channels (e.g. [40, 41]), it is possible to show that the duality gap between the primal and dual problem is zero, so that solving the dual problem will result in a primal optimal solution. Unfortunately, this is not the case for the general CFmMIMO DL power allocation problem in (42), so convergence to a global optimum cannot be guaranteed. However, the proposed algorithm provides an efficient way of obtaining a set of locally optimal DL power allocation and precoding vectors.

To be able to find the optimal precoding strategy for \({\textbf{W}}\), a change of variables is introduced. SE\(_{k}^{\mathrm{DL},2}\) in (42) is replaced with the expression for the SE of a virtual UL network SE\(_{k}^{\mathrm{UL},v} = \frac{\tau _{\mathrm{d}}}{\tau _{\mathrm{c}}} \log _{2} (1+ \text{SINR}^{\mathrm{UL},v}_{k})\), depending on the newly introduced variables \({\textbf{p}}^{v}\), which will replace the DL transmit powers \(\varvec{\rho }\). The virtual UL network \(\text{SINR}^{\mathrm{UL},v}_{k}\) depends on \({\textbf{p}}^{v}, {\textbf{W}}\) and \(\varvec{\sigma }^{v}\) and is defined as:

$$\begin{aligned} \frac{p^{v}_{k} \left| E\left\{ {\textbf{w}}^{H}_{k} {\textbf{h}}_{k}\right\} \right| ^{2} }{\sum _{i =1}^{K} p^{v}_{i} E\left\{ \left| {\textbf{w}}^{H}_{k} {\textbf{h}}_{i}\right| ^{2}\right\} - p^{v}_{k} \left| E\left\{ {\textbf{w}}^{H}_{k} {\textbf{h}}_{k}\right\} \right| ^{2} + \underbrace{ \sum \nolimits _{l=1}^{L} \sigma ^{v}_{l} \mu ^{l}_{k}}_{\nu _{k}}}. \end{aligned}$$

In order to show that this results in the same allowed region for \(\overline{\text{ SE }}_{k}\), it can be shown that any set of DL SEs \(\{{\text{SE}}_{k}^{\mathrm{DL},2}\}_{\forall k}\) obtained with specific DL transmit powers \(\varvec{\rho } \ge {\textbf{0}}\) can also be obtained as UL SEs \(\{{\text{SE}}_{k}^{\mathrm{UL},v}\}_{\forall k}\) with specific virtual transmit powers \({\textbf{p}}^{v} \ge {\textbf{0}}\) and vice versa. Equating the virtual UL SE expression \({\text{SE}}_{k}^{\mathrm{UL},v}\) with the DL SE expression \({\text{SE}}_{k}^{\mathrm{DL},2}\) for all UEs, as done in expression (47), results in the following relation between the UL and DL transmit powers:

$$\begin{aligned} \varvec{\rho } = \left( {\textbf{M}} ({\textbf{p}}^{v}, {\textbf{W}}, \varvec{\sigma }^{v}) \right) ^{-1} {\text{diag}}\left\{ \sigma ^{2,{\mathrm{DL}}}_{1}, \ldots , \sigma ^{2,{\mathrm{DL}}}_{K}\right\} {\textbf{p}}^{v} \end{aligned}$$


$$\begin{aligned} {[}{\textbf{M}}]_{ki} = {\left\{ \begin{array}{ll} \sum _{j\ne k} p^{v}_{j} E\left\{ \left| {\textbf{w}}^{H}_{k} {\textbf{h}}_{j}\right| ^{2}\right\} + \nu _{k} &{}\quad {\text{if}} \; k = i,\\ -p^{v}_{k} E\left\{ \left| {\textbf{w}}^{H}_{i} {\textbf{h}}_{k}\right| ^{2}\right\} &{}\quad {\text{if}} \; k \ne i. \end{array}\right. } \end{aligned}$$

The matrix \({\textbf{M}}\) is real-valued with positive diagonal elements and non-positive off-diagonal elements, and is column diagonally dominant. Therefore, \({\textbf{M}}\) is an M-matrix [42]. As M-matrices are inverse positive, it is clear that for each \({\textbf{p}}^{v} \ge {\textbf{0}}\), solving (48) will yield DL transmit powers \(\varvec{\rho } \ge {\textbf{0}}\). Similar results can be shown for the inverse relation, where \({\textbf{p}}^{v}\) is computed from \(\varvec{\rho }\).

Summing the left- and right-hand sides of (47) over all UEs results in the following relation between the virtual UL transmit powers and DL transmit powers:

$$\begin{aligned} \sum _{k=1}^{K} \rho _{k} \sum _{l=1}^{L} \sigma ^{v}_{l} \mu ^{l}_{k} = \sum _{k=1}^{K} p^{v}_{k} \sigma ^{2,{\mathrm{DL}}}_{k}. \end{aligned}$$
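The mapping from virtual UL powers to DL powers, and the power relation above, can be checked numerically. The sketch below builds \({\textbf{M}}\) from its definition, solves the linear system for \(\varvec{\rho }\), and compares the two SINR expressions. The array names (`G`, `a`, `nu`, `noise_dl`) and the randomly drawn statistics are illustrative assumptions; in the paper these quantities come from the channel statistics and dual variables.

```python
import numpy as np

def build_m(p_v, G, nu):
    """M[k,k] = sum_{j!=k} p_v[j] G[k,j] + nu[k];  M[k,i] = -p_v[k] G[i,k]."""
    M = -p_v[:, None] * G.T
    np.fill_diagonal(M, G @ p_v - np.diag(G) * p_v + nu)
    return M

def sinr_virtual_ul(p_v, a, G, nu):
    signal = p_v * a
    return signal / (G @ p_v - signal + nu)

def sinr_dl(rho, a, G, noise_dl):
    signal = rho * a
    return signal / (G.T @ rho - signal + noise_dl)

# Hypothetical statistics for K = 3 UEs:
#   G[k, i] = E{|w_k^H h_i|^2},  a[k] = |E{w_k^H h_k}|^2 <= G[k, k]
rng = np.random.default_rng(0)
K = 3
G = rng.uniform(0.05, 0.3, size=(K, K))
np.fill_diagonal(G, rng.uniform(1.0, 2.0, size=K))
a = 0.9 * np.diag(G)
nu = rng.uniform(0.1, 1.0, size=K)         # nu_k = sum_l sigma_l^v mu_k^l
noise_dl = rng.uniform(0.1, 1.0, size=K)   # DL noise variances
p_v = rng.uniform(0.1, 1.0, size=K)        # virtual UL powers

M = build_m(p_v, G, nu)
rho = np.linalg.solve(M, noise_dl * p_v)   # DL powers from the linear system
```

Each column of \({\textbf{M}}\) sums to \(\nu _{k}\), which is why multiplying the linear system from the left by the all-ones vector reproduces the power relation.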

As such the dual problem (44) can be written as

$$\begin{aligned} \min _{\varvec{\sigma }^{v} \ge {\textbf{0}}} \min _{\lambda \ge 0} \max _{{\textbf{p}}^{v} \ge {\textbf{0}}, {\textbf{W}}} \mathcal{L}^{\mathrm{UL},v} ({\textbf{p}}^{v}, {\textbf{W}}, \varvec{\sigma }^{v},\lambda ) \end{aligned}$$

where the Lagrangian \(\mathcal{L}^{\mathrm{UL},v} ({\textbf{p}}^{v}, {\textbf{W}}, \varvec{\sigma }^{v}, \lambda )\) is given as

$$\begin{aligned} U({\text{SE}}_{1}^{\mathrm{UL},v}, \ldots ,{\text{SE}}_{K}^{\mathrm{UL},v}) - \lambda \left( \sum _{k=1}^{K} p_{k}^{v} \sigma ^{2,{\mathrm{DL}}}_{k} - \sum _{l=1}^{L}\sigma ^{v}_{l} P^{l}_{t} \right) . \end{aligned}$$

If the expressions for estimating the statistics in Theorem 3.2 are used in (46) and the expectations are replaced by instantaneous channel estimates, an expression for the virtual instantaneous \(\text{SINR}^{\mathrm{UL},v,\text{inst}}_{k}\) is obtained as

$$\begin{aligned} \frac{p^{v}_{k} |{\textbf{w}}^{H}_{k} \hat{{\textbf{h}}}_{k}|^{2} }{\sum _{i \ne k} p^{v}_{i} |{\textbf{w}}^{H}_{k} \hat{{\textbf{h}}}_{i}|^{2} + \sum _{i=1}^K p^{v}_{i} {\textbf{w}}^{H}_{k} {\textbf{C}}_{i} {\textbf{w}}_{k} + {\textbf{w}}_{k}^{H} {\textbf{R}}_{{\textbf{n}} {\textbf{n}}}^{\mathrm{UL},v} {\textbf{w}}_{k} } \end{aligned}$$

with \({\textbf{R}}_{{\textbf{n}} {\textbf{n}}}^{\mathrm{UL},v} = \text{Blkdiag}\{\sigma _{1}^{v} {\textbf{I}}_{N}, \ldots , \sigma _L^{v} {\textbf{I}}_{N}\}\). Again, the value of \({\textbf{w}}_{k}\) that maximizes this Rayleigh quotient is given by a closed-form expression similar to (15), but with \({\textbf{p}}\) and \({\textbf{R}}_{{\textbf{n}} {\textbf{n}}}^{\mathrm{UL}}\) replaced by \({\textbf{p}}^{v}\) and \({\textbf{R}}_{{\textbf{n}} {\textbf{n}}}^{\mathrm{UL},v}\).
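The Rayleigh-quotient structure can be illustrated numerically. Expression (15) is not reproduced in this section, so the sketch below assumes the standard MMSE form \({\textbf{w}}_{k} = p^{v}_{k} (\sum _{i} p^{v}_{i} (\hat{{\textbf{h}}}_{i} \hat{{\textbf{h}}}_{i}^{H} + {\textbf{C}}_{i}) + {\textbf{R}}_{{\textbf{n}} {\textbf{n}}}^{\mathrm{UL},v})^{-1} \hat{{\textbf{h}}}_{k}\), which is a maximizer of a quotient of this type. The dimensions, channel estimates, and error covariances are randomly generated placeholders.

```python
import numpy as np

def sinr_inst(w, k, H_hat, C, R, p):
    """Virtual instantaneous UL SINR of UE k for a given vector w."""
    gains = np.abs(w.conj() @ H_hat) ** 2            # |w^H h_hat_i|^2 for all i
    err = sum(p[i] * np.real(w.conj() @ C[i] @ w) for i in range(len(p)))
    noise = np.real(w.conj() @ R @ w)
    interference = p @ gains - p[k] * gains[k]
    return p[k] * gains[k] / (interference + err + noise)

def mmse_vector(k, H_hat, C, R, p):
    """Assumed MMSE form: p_k (sum_i p_i (h_i h_i^H + C_i) + R)^{-1} h_k."""
    A = (H_hat * p) @ H_hat.conj().T + np.tensordot(p, C, axes=1) + R
    return p[k] * np.linalg.solve(A, H_hat[:, k])

rng = np.random.default_rng(0)
L, N, K = 2, 2, 3                                    # hypothetical sizes
M = L * N
H_hat = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
C = np.stack([0.1 * np.eye(M) for _ in range(K)])    # estimation error covariances
sigma_v = np.array([0.5, 2.0])                       # dual variables, one per AP
R = np.kron(np.diag(sigma_v), np.eye(N))             # Blkdiag{sigma_l^v I_N}
p_v = np.array([1.0, 0.5, 2.0])

k = 0
w_mmse = mmse_vector(k, H_hat, C, R, p_v)
best = sinr_inst(w_mmse, k, H_hat, C, R, p_v)
trials = [
    sinr_inst(rng.standard_normal(M) + 1j * rng.standard_normal(M),
              k, H_hat, C, R, p_v)
    for _ in range(200)
]
```

In this sketch the dual variables enter only through the virtual noise covariance `R`, so a larger \(\sigma ^{v}_{l}\) penalizes the entries of \({\textbf{w}}_{k}\) that belong to AP l.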

This instantaneous expression for \(\text{SINR}^{\mathrm{UL},v,\text{inst}}_{k}\) can again be used to define a \({\text{SE}}^{\mathrm{UL},v,1}_{k}\) as

$$\begin{aligned} {\text{SE}}^{\mathrm{UL},v,1}_{k} = \frac{\tau _{\mathrm{d}}}{\tau _{\mathrm{c}}} E \left\{ \log _{2} \left( 1+ \text{SINR}^{\mathrm{UL},v,\text{inst}}_{k}\right) \right\} \ \text{[bits/s/Hz]} \end{aligned}$$

which, as before, provides an upper bound for the virtual \({\text{SE}}^{\mathrm{UL},v}_{k}\), i.e.:

$$\begin{aligned} {\text{SE}}^{\mathrm{UL},v}_{k} \le {\text{SE}}^{\mathrm{UL},v,1}_{k} \end{aligned}$$

and this bound is tight for \(LN \gg 1\).

Taking the equivalence defined above into account, the combined DL transmit precoding and power allocation problem can be formulated as

$$\begin{aligned} \begin{aligned} \min _{\begin{array}{c} \varvec{\sigma }^{v} \ge {\textbf{0}} \\ \lambda \ge 0 \end{array}}&\max _{\begin{array}{c} {\textbf{p}}^{v} \ge {\textbf{0}} \\ {\textbf{W}} \end{array}} U({\text{SE}}_{1}, \ldots , {\text{SE}}_{K}) - \lambda \left( \sum _{k=1}^{K} p_{k}^{v} \sigma ^{2,DL}_{k} - \sum _{l=1}^{L}\sigma ^{v}_{l} P^{l}_{t} \right) \\ {\text{s.t.}}&\ {\text{SE}}_{k} = {\text{SE}}^{\mathrm{UL},v}_{k}({\textbf{p}}^{v}, {\textbf{w}}_{k}, \varvec{\sigma }^{v}) \le {\text{SE}}^{\mathrm{UL},v,1}_{k}({\textbf{p}}^{v}, {\textbf{w}}_{k}, \varvec{\sigma }^{v}) \ \forall k\\ \end{aligned} \end{aligned}$$

with the optimal value for \(\varvec{\rho }\) defined using (48).

4.3 DL transmit precoding and power allocation

Since the dual problem (56) is convex in \(\varvec{\sigma }^{v}\), it can be solved via a subgradient projection algorithm, where the subgradient for \(\varvec{\sigma }^{v}\) can be found using the proposed UL-DL duality. A subgradient can be derived from (45) as

$$\begin{aligned} \lambda ^{\star }(\varvec{\sigma }^{v}) \left( P^{l}_{t} - \sum _{k=1}^{K} \rho ^{\star }_{k}(\varvec{\sigma }^{v}) E\left\{ {\textbf{w}}^{\star ,H}_{kl}(\varvec{\sigma }^{v}) {\textbf{w}}^{\star }_{kl}(\varvec{\sigma }^{v})\right\} \right) \end{aligned}$$

where \(\varvec{\rho }^{\star }(\varvec{\sigma }^{v}), \lambda ^{\star }(\varvec{\sigma }^{v})\) and \({\textbf{W}}^{\star }(\varvec{\sigma }^{v})\) are the optimal solutions of \(\min _{\lambda \ge 0} \max _{\varvec{\rho } \ge {\textbf{0}}, {\textbf{W}}} \mathcal{L}^{\mathrm{DL}} (\varvec{\rho }, {\textbf{W}}, \varvec{\sigma }^{v}, \lambda )\). They can also be obtained from the optimal solutions of \(\min _{\lambda \ge 0}\max _{{\textbf{p}}^{v} \ge {\textbf{0}}, {\textbf{W}}} \mathcal{L}^{\mathrm{UL},v} ({\textbf{p}}^{v}, {\textbf{W}}, \varvec{\sigma }^{v}, \lambda )\), using (48) to obtain \(\varvec{\rho }^{\star }(\varvec{\sigma }^{v})\). By inspecting \(\min _{\lambda \ge 0}\max _{{\textbf{p}}^{v} \ge {\textbf{0}}, {\textbf{W}}} \mathcal{L}^{\mathrm{UL},v} ({\textbf{p}}^{v}, {\textbf{W}}, \varvec{\sigma }^{v}, \lambda )\), it can be shown that this is the dual problem of the following virtual UL problem:

$$\begin{aligned} \begin{aligned} \max _{{\textbf{p}}^{v}\ge {\textbf{0}}, {\textbf{W}}}&\quad U\left( {\text{SE}}_{1}^{\mathrm{UL},v}, \ldots , {\text{SE}}_{K}^{\mathrm{UL},v}\right) \\ {\text{s.t.}}&\quad \sum _{k=1}^{K} p^{v}_{k} \sigma ^{2,\mathrm{DL}}_{k} \le \sum _{l=1}^{L}\sigma ^{v}_{l} P^{l}_{t}.\\ \end{aligned} \end{aligned}$$

The optimal solutions for \(\varvec{\rho }^{\star }(\varvec{\sigma }^{v})\) and \({\textbf{W}}^{\star }(\varvec{\sigma }^{v})\) can alternatively be found from this virtual UL problem, which is a particular instance of problem (19) and can thus be solved using Algorithm 1.

However, as computing the required maximization for (57) in the CFmMIMO setup will require multiple iterations of Algorithm 1 over different coherence blocks, this will result in a slow tracking speed. Therefore, it is proposed to use the solutions after one iteration of the alternating optimization method to update the dual variables. Also all the subgradients (57) are scaled with a common factor \(\lambda ^{\star }(\varvec{\sigma }^{v})\), which can be compensated by the stepsize, so its computation can be avoided. The simulations in Sect. 6 will show that the convergence behavior is not affected.

The proposed algorithm to solve the DL transmit precoding and power allocation problem can thus be summarized as follows:

  1.

    Set \(i \leftarrow 0\) and initialize \({\textbf{p}}^{v,0} \ge {\textbf{0}}, \varvec{\sigma }^{v,0} \ge {\textbf{0}}\) randomly.

  2.

    Determine the precoding matrix \({\textbf{W}}^{i+1}\) for each coherence block that maximizes:

    $$\begin{aligned} \begin{aligned} \max _{{\textbf{W}}}&\quad U({\text{SE}}_{1}, \ldots , {\text{SE}}_{K})\\ {\text{s.t.}}&\quad {\text{SE}}_{k} = {\text{SE}}^{\mathrm{UL},v,1}_{k}\left( {\textbf{p}}^{v,i}, {\textbf{w}}_{k}, \varvec{\sigma }^{v,i}\right) \quad \forall k \end{aligned} \end{aligned}$$

    for fixed virtual transmit powers \({\textbf{p}}^{v,i}\) and dual variables \(\varvec{\sigma }^{v,i}\), by using the virtual quantities in the MMSE expression (15) or (16).

  3.

    Determine the optimal transmit powers \({\textbf{p}}^{v,i+1}\) that maximize:

    $$\begin{aligned} \begin{aligned} \max _{{\textbf{p}}^{v} \ge {\textbf{0}}}&\quad U({\text{SE}}_{1}, \ldots , {\text{SE}}_{K}) \\ {\text{s.t.}}&\quad {\text{SE}}_{k} = {\text{SE}}^{\mathrm{UL},v}_{k}\left( {\textbf{p}}^{v}, {\textbf{w}}^{i+1}_{k}, \varvec{\sigma }^{v,i}\right) \quad \forall k \\&\quad \sum _{k=1}^{K} p_{k}^{v} \sigma ^{2,\mathrm{DL}}_{k} \le \sum _{l=1}^{L} \sigma ^{v,i}_{l} P^{l}_{t} \end{aligned} \end{aligned}$$

    and compute \(\varvec{\rho }^{i+1}\) using (48).

  4.

    Update the dual variables with stepsize \(t_{i}\) using the current estimate for the subgradient:

    $$\begin{aligned} \sigma ^{v,i+1}_{l} = \max \left( \sigma ^{v,i}_{l} - t_{i} \left( P^{l}_{t} - \sum _{k=1}^{K} \rho ^{i+1}_{k}{\mu ^{l,i+1}_{k}} \right) , 0\right) \ \forall l. \end{aligned}$$

  5.

    Set \(i \leftarrow i + 1\) and return to step 2.
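The dual update in step 4 is a projected subgradient step that each AP can carry out with local quantities only. A minimal sketch follows; the function name `update_dual` and the numerical values are illustrative assumptions.

```python
import numpy as np

def update_dual(sigma, rho, mu, p_max, step):
    """One projected subgradient step on the dual variables.

    sigma : (L,) current dual variables
    rho   : (K,) DL transmit powers
    mu    : (L, K), mu[l, k] = E{||w_kl||^2}
    p_max : (L,) per-AP power budgets P_t^l
    step  : stepsize t_i
    """
    used = mu @ rho              # power actually radiated by each AP
    slack = p_max - used         # positive if AP l stays below its budget
    return np.maximum(sigma - step * slack, 0.0)

# Toy example with L = 2 APs and K = 2 UEs.
sigma = np.array([1.0, 0.1])
rho = np.array([2.0, 0.5])
mu = np.eye(2)
p_max = np.array([1.0, 1.0])
sigma_next = update_dual(sigma, rho, mu, p_max, step=0.5)
```

In the toy example, AP 1 exceeds its budget, so its dual variable (and hence its virtual noise level) grows, which discourages transmit power at that AP in the next iteration. AP 2 has slack, so its dual variable shrinks and is clipped at zero.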

Algorithm 2 presents the corresponding distributed DL transmit precoding and power allocation algorithm (D-DL-TPPA). Algorithm 2 also has the advantages of Algorithm 1, since the communications over the fronthaul network and the local computations are similar. The introduction of the dual variables \(\varvec{\sigma }^{v}\) and the virtual uplink network makes it possible to abstract away the DL transmit power constraints in (44) and to transform them into a virtual noise source using \({\textbf{R}}_{{\textbf{n}} {\textbf{n}}}^{\mathrm{UL},v}\). The virtual uplink problem has only one constraint, related to the virtual uplink powers, and can then be efficiently solved using Algorithm 1. The dual subgradient method is used to update the dual variables, which can be done efficiently and locally at each AP, without requiring the transmission of the transmit precoder \({\textbf{W}}_{l} = {\textbf{X}}_{l} {\textbf{U}}\) to the other APs or NC.

It is possible for an AP to include additional constraints on its antenna powers. Examples are per-antenna power constraints or per-cluster constraints when an AP is a concatenation of different antenna arrays with local power amplifiers. The proposed algorithm can be extended by introducing an additional dual variable \(\sigma ^{v}_{l,m}\) for each additional constraint locally at AP l and redefining \(\nu ^{l}_{k}\) as \(\sum _{m} \sigma ^{v}_{l,m} \nu ^{l,m}_{k}\) where the sum is over all the local constraints and \(\nu ^{l,m}_{k}\) is the measured power relevant for the m-th constraint. The dual variable update can be extended in a similar way. The virtual noise correlation matrix \({\textbf{R}}^{\mathrm{UL},v}_{{\textbf{n}}_{l} {\textbf{n}}_{l}}\) in (62) should then include the extra dual variables on the corresponding positions. The introduction of the additional constraints at an AP increases the computational load of an AP, but not of the NC (Footnote 6).

[Algorithm 2: distributed DL transmit precoding and power allocation (D-DL-TPPA)]

5 Further considerations

5.1 Examples of power allocation strategies

In this subsection, two different utility functions are introduced, which will be used for the simulations in Sect. 6. Recall that the utility function must be monotonically nondecreasing in each argument SE\(_{k}\) and that the power allocation problems in (29) and (60) must have a unique maximum that can be obtained. The details of the algorithms used to obtain the optimal solution for (29) and (60) can be found in the provided references.

The first utility function is the non-weighted max-sum SE, which represents the total number of bits per second that are transmitted without considering how these bits are assigned between the UEs. The max-sum SE problem can be expressed as

$$\begin{aligned} U({\text{SE}}_{1}, \ldots , {\text{SE}}_{K}) = \sum _{k=1}^{K} {\text{SE}}_{k}. \end{aligned}$$

Note that the above problem is usually not convex in \({\textbf{p}}\) or \(\varvec{\rho }\) for given combining/precoding matrices \({\textbf{V}}^{i+1},{\textbf{W}}^{i+1}\) and, hence, it is hard to obtain the optimal solution [43]. There exist global optimization methods [44, 45], but their computational complexity is unsuitable for real-time applications. A pragmatic solution is to instead settle for a local optimum. Common approaches for finding a local optimum to the max-sum SE problem are the weighted MMSE method [46] or successive convex optimization [47], which result in iterative algorithms. Schemes using machine learning can potentially find better solutions [20, 21], but require some off-line training and will not provide guaranteed convergence to the optimal solution. Therefore, successive convex optimization [47] will be used for the simulations in Sect. 6.

The max-sum SE intrinsically provides a higher SE for users with good channel conditions, while leaving users with worse channel conditions at a disadvantage. Therefore the goal of the second utility function is to provide uniform fairness for all users. The aim of the non-weighted max–min SE fairness is to maximize the minimum SE among all the UEs in the network:

$$\begin{aligned} U({\text{SE}}_{1}, \ldots , {\text{SE}}_{K}) = \min _{k} {\text{SE}}_{k}. \end{aligned}$$

Since the SE of UE k is an increasing function of the effective SINR\(^{\mathrm{UL},2}\) or SINR\(^{\mathrm{UL},v}\) respectively, maximizing the minimum SE is the same as maximizing the minimum effective SINR among all the UEs. This quasi-concave problem can be solved using the epigraph trick [48] combined with a bisection method [4, 5, 47, 49] or using fixed-point algorithms [50]. Since all these algorithms converge to a global maximum, the method in [5] is chosen for the simulations in Sect. 6.
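The bisection idea can be sketched for a per UE power constraint as follows. This is a generic illustration and not necessarily the specific method of [5]. For a target SINR \(t\), the smallest powers that reach \(t\) for every UE solve a linear system, and the target is feasible when that solution exists and respects the power limits. The names `a`, `b`, `c` (desired-signal gain, second moments, and noise term of the effective SINR) are assumptions of this sketch.

```python
import numpy as np

def min_power_for_target(t, a, b, c):
    """Smallest p >= 0 with SINR_k(p) = t for all k, or None if unreachable.

    SINR_k(p) = p_k a_k / (sum_i p_i b[k, i] - p_k a_k + c_k)
    """
    F = b - np.diag(a)                          # interference coupling matrix
    T = t * F / a[:, None]                      # p = T p + t c / a
    if np.max(np.abs(np.linalg.eigvals(T))) >= 1.0:
        return None                             # no finite nonnegative solution
    return np.linalg.solve(np.diag(a) - t * F, t * c)

def max_min_sinr(a, b, c, p_max, tol=1e-9):
    """Bisection on the common SINR target under per-UE power limits."""
    def feasible(t):
        p = min_power_for_target(t, a, b, c)
        return p is not None and np.all(p <= p_max)

    lo, hi = 0.0, 1.0
    while feasible(hi):                         # grow bracket until infeasible
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo, min_power_for_target(lo, a, b, c)

# Symmetric toy example with K = 2 UEs.
a = np.array([1.0, 1.0])
b = np.array([[1.0, 0.5], [0.5, 1.0]])
c = np.array([0.1, 0.1])
p_max = np.array([1.0, 1.0])
t_star, p_star = max_min_sinr(a, b, c, p_max)
```

In the symmetric example both UEs end up at full power and the common SINR is \(1/(0.5 + 0.1) = 5/3\).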

Other utility functions that result in convex optimization problems for fixed combining/precoding matrices \({\textbf{V}}^{i+1},{\textbf{W}}^{i+1}\) [44], are for example the weighted geometric mean \(\prod _{k} {\text{SE}}^{w_{k}}_{k}\) and the weighted harmonic mean \((\sum _{k} w_{k}/{\text{SE}}_{k})^{-1}\) for some positive weights \(\{w_{k}\}_{\forall k}\).

5.2 Network topologies

The proposed algorithms have been described as if the network has a tree topology, where all the APs are connected and the NC is the root node. However, they are applicable to any network topology since selecting a tree topology is always possible whenever all the APs are connected [51]. The tree topology allows for an efficient way of computing the required in-network sums in D-UL-RCPA and D-DL-TPPA by starting from the leaf nodes and summing up all the required data when moving towards the root node. Therefore, many current technologies like radio-stripes [52], very large aperture mMIMO [53], distributed mMIMO [54], etc. can be used to realize the proposed algorithms.
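The leaf-to-root summation can be sketched with a small recursive routine. The tree layout, node names, and local values below are hypothetical; in the algorithms the local values would be the per-AP contributions to quantities such as \(\xi _{ki}\) and \(\nu _{k}\).

```python
import numpy as np

def in_network_sum(children, local, node):
    """Sum of the `local` values over the subtree rooted at `node`.

    children : dict mapping a node to the list of its child nodes
    local    : dict mapping a node to its local contribution (array-like)
    """
    total = np.array(local[node], dtype=float)
    for child in children.get(node, []):
        # Each link carries one partial sum, regardless of subtree size.
        total = total + in_network_sum(children, local, child)
    return total

# Hypothetical tree: the NC is the root, AP3 is attached behind AP1.
children = {"NC": ["AP1", "AP2"], "AP1": ["AP3"]}
local = {
    "NC": [0.0, 0.0],
    "AP1": [1.0, 2.0],
    "AP2": [3.0, 4.0],
    "AP3": [5.0, 6.0],
}
network_sum = in_network_sum(children, local, "NC")
```

The amount of data on each fronthaul link is fixed by the size of the summed quantity and does not grow with the number of APs further down the tree.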

Table 2 provides an overview of the required communications in one iteration i of the proposed algorithms compared to a network-wide setup, where each AP merely transmits its received signals to the NC. Note that quantities like \({\textbf{X}}^{H}_{l} \hat{{\textbf{H}}}_{l}\) are Hermitian symmetric, so only transmitting the upper blockdiagonal part suffices. The big advantage of the proposed algorithms is that the training signals do not need to be transmitted to the NC and that only K signals are broadcast over the different links instead of NL. Therefore the gain in fronthaul communication can be expressed as \({\mathcal{O}}(\frac{NL}{K})\). Another advantage of the proposed algorithms is that the required large-scale statistics \({\textbf{R}}_{kl}\) have to be available only at AP l and that the processing requirement of the NC is relaxed, i.e. it only has to invert \(K \times K\) matrices instead of \(NL \times NL\) matrices.

Table 2 Number of complex scalars communicated over each fronthaul link (either from AP to NC or from NC to AP) during B coherence blocks

6 Numerical simulations

Numerical simulations are used in this section to demonstrate the performance and convergence properties of the proposed algorithms. The parameters used for the setup are presented in Table 3. The propagation model in [24, Sec. 4.1.3] with spatially correlated fading is used. \(\tau _{\mathrm{u}} = 190\) and \(\tau _{\mathrm{d}} = 190\) are used when evaluating the SE expressions for UL and DL, respectively. A per UE transmit power constraint is used for UL power allocation and a per AP transmit power constraint for DL power allocation. The presented figures and tables show results averaged over 50 Monte Carlo simulations. The APs and UEs are distributed uniformly at random over the area, with a wrap-around technique to determine the shortest distance between a UE and the APs in each Monte Carlo simulation. \(B = 100\) channel realizations are used to estimate the expectations and the large-scale statistics of Table 1.

Table 3 Key parameters of considered simulation setup, based on [17]

6.1 Numerical simulations UL

6.1.1 Estimation method

The first UL simulations are conducted to investigate the validity of Theorem 3.2 and the tightness of (23). This is done for a scenario where all UEs are transmitting with maximum power \(p_{k} = 100\) mW\(\ \forall k\) and MMSE-based combining is used. The results for different numbers of antennas per AP are shown in Fig. 3.

Fig. 3 Different UL SE for MMSE combining with fixed UE transmit power

SE\(_{k}^{\mathrm{UL},2}\) and the SE\(_{k}^{\mathrm{UL},2}\) (true), which is the result obtained when the true channel is used in the expectation, are almost perfectly overlapping for all the considered setups. This shows that when a sufficiently large number of samples are used to estimate the quantities of Theorem 3.2, the required assumptions hold for the MMSE-based combining vector. Here \(B = 100\) samples are used for the expectations, so this provides already a good estimate of the average channel statistics.

The tightness of the bound in (23) can be assessed by comparing SE\(_{k}^{\mathrm{UL},1}\) and SE\(_{k}^{\mathrm{UL},2}\). SE\(_{k}^{\mathrm{UL},2}\) is always a lower bound for SE\(_{k}^{\mathrm{UL},1}\), and this bound becomes tighter as the number of antennas per AP increases. For the setup with \(N = 16\), there is almost no difference between the different SEs, which validates the assumption of equality in the proposed algorithms.

6.1.2 Convergence behavior

To investigate the convergence behavior of D-UL-RCPA, the utility function over different iterations is plotted in Fig. 4, both for max-sum and max–min power allocation. The iterations are started from a random feasible power allocation \({\textbf{p}}^{0} \ge {\textbf{0}}\). The results are plotted for both SE\(_{k}^{\mathrm{UL},1}\) and SE\(_{k}^{\mathrm{UL},2}\) and for different numbers of antennas.

Fig. 4 Convergence behavior of \(\sum _{k=1}^{K}\) SE\(_{k}\) and \(\min _{k}\) SE\(_{k}\) for D-UL-RCPA over different iterations with max-sum (upper figure) and max–min (lower figure) power allocation and a per UE transmit power constraint

As expected, both \(\sum _{k=1}^{K}\)SE\(_{k}^{\mathrm{UL},1}\) and \(\sum _{k=1}^{K}\) SE\(_{k}^{\mathrm{UL},2}\) for max-sum power allocation and \(\min _{k}\)SE\(_{k}^{\mathrm{UL},1}\) and \(\min _{k}\) SE\(_{k}^{\mathrm{UL},2}\) for max–min power allocation are monotonically increasing and reach an equilibrium point after 3 iterations for each setup. This is even the case for the max-sum power allocation, although the power allocation problem is here only solved to local optimality. The fast convergence and significant performance improvement are desirable properties of D-UL-RCPA.

6.1.3 Performance

The performance of the combining vectors and power allocation obtained after convergence of D-UL-RCPA is compared with heuristic distributed combining methods, namely MR combining and Local MMSE (LMMSE) combining. In MR combining, the local combining vector is chosen as \({\textbf{v}}_{kl} = \hat{{\textbf{h}}}_{kl}\) for UE k at AP l. In LMMSE, the combining vector is chosen to minimize the local MMSE criterion \(E\{ || s_{k} - {\textbf{v}}^{H}_{kl} {\textbf{y}}_{l} ||^{2} | \hat{{\textbf{H}}}_{l} \}\) and is given as \({\textbf{v}}_{kl} = p_{k} \left( \hat{{\textbf{H}}}_{l} {\textbf{P}} \hat{{\textbf{H}}}^{H}_{l} + \sum _{i=1}^{K} p_{i} {\textbf{C}}_{il} + {\textbf{R}}_{{\textbf{n}}_{l} {\textbf{n}}_{l}}^{\mathrm{UL}} \right) ^{-1} \hat{{\textbf{h}}}_{kl}\). The optimal power allocation for both combining schemes is chosen in the same way as in D-UL-RCPA, but with \({\textbf{U}}\) fixed to an identity matrix \({\textbf{I}}_{K}\). The optimal power allocation for MR combining is then found in one iteration, since \({\textbf{v}}_{kl}\) is independent of \({\textbf{p}}\). For LMMSE combining this results in an iterative algorithm, but convergence is here also observed after 3 iterations. The cumulative distribution function (CDF) of the UL SE per UE is provided in Fig. 5 for 3 kinds of power allocation (uniform power allocation, where each UE transmits with maximal power, max-sum and max–min power allocation) and for setups with either \(N = 4\) or \(N = 16\) antennas per AP. The values of the utility function compared to uniform power allocation are given in Table 4 for max-sum and Table 5 for max–min power allocation.
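The two heuristic combiners can be sketched for a single AP as follows. This is a minimal sketch of the two formulas above, assuming NumPy arrays for all quantities; the function names, array layouts and numerical values are illustrative and not part of the original simulation code.

```python
import numpy as np

def mr_combining(H_hat):
    """MR combining: column k of the result is v_kl = h_hat_kl."""
    return H_hat.copy()

def lmmse_combining(H_hat, p, C, R_nn):
    """LMMSE combining at one AP; column k of the result is v_kl.

    H_hat : (N, K)    channel estimates, column k is h_hat_kl
    p     : (K,)      UL transmit powers
    C     : (K, N, N) estimation error covariances C_il
    R_nn  : (N, N)    UL noise covariance at this AP
    """
    P = np.diag(p)
    A = H_hat @ P @ H_hat.conj().T + np.tensordot(p, C, axes=1) + R_nn
    # Solve A V = H_hat P instead of forming the N x N inverse explicitly.
    return np.linalg.solve(A, H_hat @ P)

rng = np.random.default_rng(1)
N, K = 4, 3
H_hat = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
p = np.array([100.0, 50.0, 10.0])      # mW, illustrative values
C = np.stack([0.1 * np.eye(N)] * K)    # assumed error covariances
R_nn = np.eye(N)                       # assumed noise covariance

V_mr = mr_combining(H_hat)
V_lmmse = lmmse_combining(H_hat, p, C, R_nn)
print(V_mr.shape, V_lmmse.shape)
```

The MR combiner does not depend on the powers `p`, whereas the LMMSE combiner has to be recomputed whenever `p` changes, which is why the latter leads to an iterative power allocation.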

Fig. 5 UL SE of SE\(^{\mathrm{UL},1}\) (–) and SE\(^{\mathrm{UL},2}\) (- -) for different combining schemes with uniform, max-sum and max–min power allocation

Table 4 \(\sum _{k=1}^{K}\)SE\(_{k}\) after convergence of D-UL-RCPA (uniform power allocation | max-sum power allocation)
Table 5 \(\min _{k}\)SE\(_{k}\) after convergence of D-UL-RCPA (uniform power allocation | max–min power allocation)

It is clear that the proposed power allocation algorithm improves the utility function for all the presented receive combining methods. MR combining requires no inversion, but its performance is always inferior to LMMSE combining. Even though LMMSE combining requires an inversion of an \(N \times N\) matrix in each coherence block at each AP, while for MMSE-based combining this inversion only has to be performed after B coherence blocks at an AP l, its performance is always inferior to MMSE-based combining.

6.2 Numerical simulations DL

6.2.1 Convergence behavior

D-DL-TPPA is run for 40 iterations using a stepsize of \(t_{i} = \min (1,10/i)\) and initialized with virtual UL powers \(p^{\mathrm{v},0}_{k} = 100\) mW \(\forall k\) and a random feasible DL power allocation \(\varvec{\rho }^0 \ge {\textbf{0}}\). The utility function over different iterations is plotted in Fig. 6, both for max-sum and max–min power allocation. To verify whether the constraints in all the APs are satisfied, the maximal constraint violation \(\max _{l} \sum _{k=1}^{K} \rho _{k} E\{{\textbf{w}}^{H}_{kl} {\textbf{w}}_{kl}\}\) is shown in Fig. 7 over the different iterations. The per AP transmit power constraints are satisfied when this is below 1000 mW.
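The stepsize schedule and the constraint check described above can be sketched as follows. This is a minimal sketch assuming that the iteration index starts at \(i = 1\) and that estimates of the precoder powers \(E\{{\textbf{w}}^{H}_{kl} {\textbf{w}}_{kl}\}\) are available as a \(K \times L\) array; the function names and numerical values are illustrative.

```python
import numpy as np

def stepsize(i):
    """Diminishing stepsize t_i = min(1, 10 / i) for iteration i >= 1."""
    return min(1.0, 10.0 / i)

def max_ap_power(rho, w_power):
    """Largest per-AP transmit power, max_l sum_k rho_k * E{w_kl^H w_kl}.

    rho     : (K,)   DL power coefficients
    w_power : (K, L) estimates of E{w_kl^H w_kl}
    """
    return float(np.max(rho @ w_power))

steps = [stepsize(i) for i in range(1, 41)]   # 40 iterations
rho = np.array([200.0, 300.0])                # illustrative DL powers
w_power = np.array([[1.0, 0.5],
                    [1.0, 2.0]])              # illustrative precoder powers
P_max = 1000.0                                # per AP power budget in mW

print(steps[0], steps[-1], max_ap_power(rho, w_power) <= P_max)
```

The stepsize stays at 1 for the first ten iterations and then decays as \(10/i\), which is a common choice for subgradient methods.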

Fig. 6 Convergence behavior of \(\sum _{k=1}^{K}\) SE\(^{\mathrm{DL},2}_{k}\) and \(\min _{k}\) SE\(^{\mathrm{DL},2}_{k}\) for D-DL-TPPA over different iterations with max-sum (upper figure) and max–min (lower figure) power allocation and a per AP transmit power constraint

Fig. 7 Convergence behavior of maximal constraint violation \(\max _{l} \sum _{k=1}^{K} \rho _{k} E\{{\textbf{w}}^{H}_{kl} {\textbf{w}}_{kl}\}\) of D-DL-TPPA over different iterations with max-sum and max–min power allocation and a per AP transmit power constraint

While difficult to observe in Fig. 6, the utility function is no longer monotonically increasing since the proposed method is based on a dual subgradient projection algorithm. However, the objective remains approximately constant after 4 iterations. Looking at the constraint violation in Fig. 7, it can be observed that convergence is reached after 40 iterations only for \(N = 4,8,16\) with max-sum power allocation. For \(N = 2\), the constraint value keeps fluctuating around 1100 mW, probably because the APs have fewer degrees of freedom (fewer antennas) and therefore need more iterations with smaller stepsizes to converge. The constraint violations for max–min power allocation are converging very slowly, which is typical for dual subgradient methods. However, as will be shown in the next subsection, the performance after 40 iterations is still superior even when the DL transmit powers are normalized to satisfy the constraints.

6.2.2 Performance

Finally, the performance of the precoding vectors and power allocation obtained after 40 iterations of D-DL-TPPA is again compared to MR precoding and LMMSE precoding. To make sure that the per AP transmit power constraint is not violated during the evaluation, the DL transmit powers are normalized at the APs where this constraint is violated. The heuristic power allocation method proposed in [17] is used to determine the DL transmit powers for full power transmission. The cumulative distribution function (CDF) of the DL SE per UE is provided in Fig. 8 for 3 different power allocations (full power transmission, max-sum and max–min power allocation) and for setups with either \(N = 4\) or \(N = 16\) antennas per AP. The values of the utility function compared to full power transmission are given in Table 6 for max-sum and Table 7 for max–min power allocation.
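The normalization step at the violating APs can be sketched as follows. This is a minimal sketch under the assumption that each AP scales all its DL powers by a common factor when its budget is exceeded; the function name `normalize_per_ap`, the array layout and the numerical values are illustrative and not taken from the paper.

```python
import numpy as np

def normalize_per_ap(rho, w_power, P_max):
    """Scale down the DL powers only at APs that exceed their power budget.

    rho     : (K,)   network-wide DL power coefficients
    w_power : (K, L) estimates of E{w_kl^H w_kl}
    P_max   : per AP transmit power budget
    Returns a (K, L) array of per-AP powers rho_kl.
    """
    used = rho @ w_power                               # power used at each AP
    safe_used = np.maximum(used, np.finfo(float).tiny) # avoid division by zero
    scale = np.where(used > P_max, P_max / safe_used, 1.0)
    return rho[:, None] * scale[None, :]

rho = np.array([600.0, 600.0])          # illustrative DL powers
w_power = np.array([[1.0, 0.5],
                    [1.0, 0.5]])        # illustrative precoder powers
P_max = 1000.0                          # per AP power budget in mW

rho_kl = normalize_per_ap(rho, w_power, P_max)
per_ap = (rho_kl * w_power).sum(axis=0)
print(per_ap)
```

Only the first AP in this example exceeds its budget, so only its column of `rho_kl` is scaled; the second AP keeps the network-wide powers unchanged.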

Fig. 8 a Full power transmission. b Max-sum power allocation. c Max–min power allocation

Table 6 \(\sum _{k=1}^{K}\)SE\(^{\mathrm{DL},2}\) after convergence of D-DL-TPPA (full power transmission | max-sum power allocation)
Table 7 \(\min _{k}\)SE\(^{\mathrm{DL},2}\) after convergence of D-DL-TPPA (full power transmission | max–min power allocation)

Similar observations as for the UL simulations can be made concerning the performance of D-DL-TPPA. The proposed power allocation algorithm improves the utility function for all the presented precoding methods. MR precoding requires less computational effort, but its performance is inferior, and MMSE-based precoding outperforms LMMSE precoding.

7 Conclusion

This paper has presented distributed MMSE-based receive combining, transmit beamforming and power allocation strategies for both UL as well as DL data transmission in CFmMIMO systems. The necessary fronthaul communications to estimate the combining/precoding vectors and the necessary large-scale channel statistics are reduced to a minimum and rely on in-network summation that can be accomplished whenever the APs can be arranged into a tree-topology. The computations are also distributed over the different APs and the NC. It is shown that the MMSE-based algorithms reach optimal performance.

In this paper it is assumed that all APs serve all the UEs in the network and that the data of all the UEs is transmitted to one central NC. This is not scalable when the network size and the number of UEs grow larger. Therefore, in future work, the current algorithms will be used as benchmark algorithms when designing scalable algorithms in which not all APs serve all users and certain APs take over some of the functionality of the NC. To not revert to cellular systems, a user-centric approach [17, 32] will be followed, in which each UE can determine its serving APs.

Availability of data and materials

Not applicable.


  1. One can consider the more general Rician fading channel [23], but this is not considered here for simplicity.

  2. Changes are due to UE mobility or new scheduling decisions, but these channel statistics changes are typically 50 times slower than the small scale fading statistics changes, so are assumed negligible for several coherence blocks.

  3. The pilot transmit powers are assumed to remain fixed. The estimation of the quantities \({\textbf{C}}_{k}\) and \(\hat{{\textbf{h}}}_{k}\) is thus not depending on the assigned UL transmit powers. Joint pilot and UL transmit power allocation methods can be found in [36, 37], but are not considered here.

  4. When the number of antennas per AP is large, the effective channels to the desired UEs are almost deterministic after combining/precoding, although the channel responses are random. This phenomenon is called channel hardening and can be expressed as \(\frac{{\textbf{v}}_{k}^{H} {\textbf{h}}_{k}}{NL} \rightarrow \frac{E\{{\textbf{v}}_{k}^{H} {\textbf{h}}_{k}\}}{NL}\) if \(NL \rightarrow \infty\).

  5. There are closed form expressions available in [15, 38] when the number of antennas in the network grows to infinity, but these do not allow for an efficient distributed computation.

  6. Note that the per-AP power constraint will only hold upon convergence of the subgradient algorithm. Therefore, it may be necessary to rescale the DL transmit powers locally at each AP l such that \(\sum _{k=1}^{K} \rho _{kl} \mu ^{l}_{k} \le P^{l}_{t}\) before transmitting the DL signal. This will not influence the convergence of the algorithm.


  1. E. Björnson, L. Sanguinetti, H. Wymeersch, J. Hoydis, T.L. Marzetta, Massive MIMO is a reality-What is next?: Five promising research directions for antenna arrays. Digit. Signal Process. 94, 3–20 (2019)

  2. D. Gesbert, S. Hanly, H. Huang, S. Shamai Shitz, O. Simeone, W. Yu, Multi-cell MIMO cooperative networks: a new look at interference. IEEE J. Sel. Areas Commun. 28(9), 1380–1408 (2010)

  3. T.L. Marzetta, Noncooperative cellular wireless with unlimited numbers of base station antennas. IEEE Trans. Wirel. Commun. 9(11), 3590–3600 (2010)

  4. H.Q. Ngo, A. Ashikhmin, H. Yang, E.G. Larsson, T.L. Marzetta, Cell-free massive MIMO versus small cells. IEEE Trans. Wirel. Commun. 16(3), 1834–1850 (2017)

  5. E. Nayebi, A. Ashikhmin, T.L. Marzetta, H. Yang, B.D. Rao, Precoding and power optimization in cell-free massive MIMO systems. IEEE Trans. Wirel. Commun. 16(7), 4445–4459 (2017)

  6. E. Björnson, L. Sanguinetti, Making cell-free massive MIMO competitive with MMSE processing and centralized implementation. IEEE Trans. Wirel. Commun. 19(1), 77–90 (2020)

  7. S. Shamai, B.M. Zaidel, Enhancing the cellular downlink capacity via co-processing at the transmitting end. IEEE Veh. Technol. Conf. 3(53ND), 1745–1749 (2001)

  8. S. Zhou, M. Zhao, X. Xu, J. Wang, Y. Yao, Distributed wireless communication system: a new architecture for future public wireless access. IEEE Commun. Mag. 41(3), 108–113 (2003)

  9. M. Boldi, A. Tölli, M. Olsson, E. Hardouin, T. Svensson, F. Boccardi, L. Thiele, V. Jungnickel, Coordinated multipoint (comp) systems, in Mobile and Wireless Communications for IMT-Advanced and Beyond (Wiley, 2011), pp. 121–155

  10. E. Björnson, R. Zakhour, D. Gesbert, B. Ottersten, Cooperative multicell precoding: rate region characterization and distributed strategies with instantaneous and statistical CSI. IEEE Trans. Signal Process. 58(8), 4298–4310 (2010)

  11. S. Chen, J. Zhang, E. Bjornson, J. Zhang, B. Ai, Structured massive access for scalable cell-free massive MIMO systems. IEEE J. Sel. Areas Commun. 39(4), 1086–1100 (2021)

  12. G. Interdonato, P. Frenger, E.G. Larsson, Scalability aspects of cell-free massive MIMO, in IEEE International Conference on Communications (2019), pp. 1–5

  13. S. Chen, J. Zhang, J. Zhang, E. Björnson, B. Ai, A survey on user-centric cell-free massive MIMO systems. Digit. Commun. Netw. 8, 695–719 (2021)

  14. M. Bashar, K. Cumanan, A.G. Burr, M. Debbah, H.Q. Ngo, On the uplink max–min SINR of cell-free massive MIMO systems. IEEE Trans. Wirel. Commun. 18(4), 2021–2036 (2019)

  15. E. Nayebi, A. Ashikhmin, T.L. Marzetta, B.D. Rao, Performance of cell-free massive MIMO systems with MMSE and LSFD receivers, in 2016 59th Asilomar Conference on Signals, Systems and Computers (IEEE, 2017), pp. 203–207

  16. G. Interdonato, M. Karlsson, E. Björnson, E.G. Larsson, Local partial zero-forcing precoding for cell-free massive MIMO. IEEE Trans. Wirel. Commun. 19(7), 4758–4774 (2020)

  17. E. Björnson, L. Sanguinetti, Scalable cell-free massive MIMO systems. IEEE Trans. Commun. 68(7), 4247–4261 (2020)

  18. D. Tse, P. Viswanath, Fundamentals of Wireless Communication (Cambridge University Press, New York, 2005)

  19. C. D’Andrea, A. Zappone, S. Buzzi, M. Debbah, Uplink power control in cell-free massive MIMO via deep learning, in 2019 IEEE 8th International Workshop on Computational Advances in Multi-sensor Adaptive Processing (CAMSAP) (2019), pp. 554–558

  20. M. Bashar, A. Akbari, K. Cumanan, H.Q. Ngo, A.G. Burr, P. Xiao, M. Debbah, J. Kittler, Exploiting deep learning in limited-fronthaul cell-free massive MIMO uplink. IEEE J. Sel. Areas Commun. 38(8), 1678–1697 (2020)

  21. F. Liang, C. Shen, W. Yu, F. Wu, Towards optimal power control via ensembling deep neural networks. IEEE Trans. Commun. 68(3), 1760–1776 (2020)

  22. H.Q. Ngo, A. Ashikhmin, H. Yang, E.G. Larsson, T.L. Marzetta, Cell-free massive MIMO: uniformly great service for everyone, in IEEE Workshop on Signal Processing Advances in Wireless Communications, SPAWC (2015), pp. 201–205

  23. Ö. Özdogan, E. Björnson, E.G. Larsson, Massive MIMO with spatially correlated Rician fading channels. IEEE Trans. Commun. 67(5), 3234–3250 (2019)

  24. E. Björnson, J. Hoydis, L. Sanguinetti, Massive MIMO networks: spectral, energy, and hardware efficiency. Found. Trends® Signal Process. 11(3–4), 154–655 (2017)

  25. H. Yin, L. Cottatellucci, D. Gesbert, R.R. Müller, G. He, Robust pilot decontamination based on joint angle and power domain discrimination. IEEE Trans. Signal Process. 64(11), 2990–3003 (2016)

  26. E. Björnson, J. Hoydis, L. Sanguinetti, Massive MIMO has unlimited capacity. IEEE Trans. Wirel. Commun. 17(1), 574–590 (2018)

  27. R. Van Rompaey, M. Moonen, Scalable and distributed MMSE algorithms for uplink receive combining in cell-free massive MIMO systems, in ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2021), pp. 4445–4449

  28. D. Neumann, M. Joham, W. Utschick, Covariance matrix estimation in massive MIMO. IEEE Signal Process. Lett. 25(6), 863–867 (2018)

  29. H. Xie, F. Gao, S. Jin, An overview of low-rank channel estimation for massive MIMO systems. IEEE Access 4, 7313–7321 (2016)

  30. E. Björnson, L. Sanguinetti, M. Debbah, Massive MIMO with imperfect channel covariance information, in 2016 50th Asilomar Conference on Signals, Systems and Computers (2016), pp. 974–978

  31. J. Zhang, S. Chen, Y. Lin, J. Zheng, B. Ai, L. Hanzo, Cell-free massive MIMO: a new next-generation paradigm. IEEE Access 7, 99 878-99 888 (2019)

  32. S. Buzzi, C. D’Andrea, Cell-free massive MIMO: user-centric approach. IEEE Wirel. Commun. Lett. 6(6), 706–709 (2017)

  33. J. Zhang, Y. Wei, E. Björnson, S. Member, Performance analysis and power control of cell-free massive MIMO systems with hardware impairments. IEEE Access 6, 55302–55314 (2018)

  34. C.B. Peel, B.M. Hochwald, A.L. Swindlehurst, A vector-perturbation technique for near-capacity multiantenna multiuser communication—Part I : channel inversion and regularization. IEEE Trans. Commun. 53(1), 195–202 (2005)

  35. E. Biglieri, J. Proakis, S. Shlomo, Fading channels : information-theoretic and communications aspects. IEEE Trans. Inf. Theory 44(6), 2619–2692 (1998)

  36. T.V. Chien, E. Björnson, E.G. Larsson, Joint pilot design and uplink power allocation in multi-cell massive MIMO systems. IEEE Trans. Wirel. Commun. 17(3), 2000–2015 (2018)

  37. T.C. Mai, H.Q. Ngo, M. Egan, T.Q. Duong, S. Member, Pilot power control for cell-free massive MIMO. IEEE Trans. Veh. Technol. 67(11), 11 264-11 268 (2018)

  38. J. Hoydis, S. Ten Brink, M. Debbah, Massive MIMO in the UL/DL of cellular networks: How many antennas do we need? IEEE J. Sel. Areas Commun. 31(2), 160–171 (2013)

  39. D.P. Bertsekas, W.W. Hager, O.L. Mangasarian, Nonlinear Programming (Athena Scientific, Belmont, 1998)

  40. W. Yu, T. Lan, Transmitter optimization for the multi-antenna downlink with per-antenna power constraints. IEEE Trans. Signal Process. 55(6 I), 2646–2660 (2007)

  41. V. Le Nir, M. Moonen, M. Guenach, J. Verlinden, Broadcast channel optimal spectrum balancing (BC-OSB) with per-modem total power constraints for downstream DSL, in European Signal Processing Conference (2007), pp. 2189–2193

  42. R.J. Plemmons, M-matrix characterizations. I-nonsingular M-matrices. Linear Algebra Appl. 18(2), 175–188 (1977)

  43. Z.Q. Luo, S. Zhang, Dynamic spectrum management: complexity and duality. IEEE J. Sel. Topics Signal Process. 2(1), 57–73 (2008)

  44. E. Björnson, E. Jorswieck, Optimal resource allocation in coordinated multi-cell systems. Found. Trends Inf. Theory 9(2–3), 113–381 (2012)

  45. H. Tuy, Monotonic optimization: problems and solution approaches. SIAM J. Optim. 11(2), 464–494 (2000)

  46. Q. Shi, M. Razaviyayn, Z.Q. Luo, C. He, An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel. IEEE Trans. Signal Process. 59(9), 4331–4340 (2011)

  47. S. Buzzi, C. D’Andrea, A. Zappone, User-centric 5G cellular networks: resource allocation and comparison with the cell-free massive MIMO approach. IEEE Trans. Wirel. Commun. 19(2), 1250–1264 (2020)

  48. S. Boyd, L. Vandenberghe, Convex Optimization (Cambridge University Press, Cambridge, 2004)

  49. M. Bashar, K. Cumanan, A.G. Burr, M. Debbah, H.Q. Ngo, Enhanced max–min SINR for uplink cell-free massive MIMO systems, in IEEE International Conference on Communications (2018)

  50. Y.W. Hong, C.W. Tan, L. Zheng, C.L. Hsieh, C.H. Lee, A unified framework for wireless max–min utility optimization with general monotonic constraints, in IEEE INFOCOM 2014—IEEE Conference on Computer Communications (IEEE, 2014), pp. 2076–2084

  51. H. Chen, A. Campbell, B. Thomas, A. Tamir, Minimax flow tree problems. Networks 54, 117–129 (2009)

  52. G. Interdonato, E. Björnson, H.Q. Ngo, P. Frenger, E.G. Larsson, Ubiquitous cell-free massive MIMO communications. EURASIP J. Wirel. Commun. Netw. 197, 2019 (2019)

  53. O. Mart, E.D. Carvalho, J.Ø. Nielsen, Towards Very Large Aperture Massive MIMO: A Measurement Based Study (2014), 281–286

  54. U. Madhow, D.R. Brown, S. Dasgupta, R. Mudumbai, Distributed massive MIMO: algorithms, architectures and concept systems, in 2014 Information Theory and Applications Workshop, ITA 2014—Conference Proceedings (2014)


Not applicable.


The work of R. Van Rompaey was supported by a PhD Fellowship of the Research Foundation Flanders (FWO-Vlaanderen No. 40062). This research work was carried out at the ESAT Laboratory of KU Leuven, in the frame of Research Council KU Leuven Project C3-19-00221 “Cooperative Signal Processing Solutions for IoT-based Multi-User Speech Communication Systems", Research Council KU Leuven Project C24/16/019 “Distributed Digital Signal Processing for Ad-hoc Wireless Local Area Audio Networking" and Fonds de la Recherche Scientifique - FNRS and the Fonds Wetenschappelijk Onderzoek - Vlaanderen under EOS Project no 30452698 ‘(MUSE-WINET) MUlti-SErvice WIreless NETwork’. The scientific responsibility is assumed by its authors.

Author information

Authors and Affiliations



RVR and MM conceived of the presented idea. RVR developed the theoretical formalism, performed the analytic calculations and performed the numerical simulations. All authors discussed the results and contributed to the final manuscript.

Corresponding author

Correspondence to Robbe Van Rompaey.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Appendix 1: Proof of Theorem 3.1


The combining matrix \({\textbf{V}}^{\mathrm{MMSE}} = [{\textbf{v}}^{\mathrm{MMSE}}_{1}... {\textbf{v}}^{\mathrm{MMSE}}_{K} ]\) can be written as

$$\begin{aligned} {\textbf{V}}^{\mathrm{MMSE}} = \left( \hat{{\textbf{H}}} {\textbf{P}} \hat{{\textbf{H}}}^{H} + \underbrace{ \sum _{i=1}^{K} p_{i} {\textbf{C}}_{i} + {\textbf{R}}_{{\textbf{n}} {\textbf{n}}}^{\mathrm{UL}}}_{{\textbf{T}}} \right) ^{-1} \hat{{\textbf{H}}} {\textbf{P}}. \end{aligned}$$

The matrix \({\textbf{V}}^{\mathrm{MMSE}}\) can further be written as

$$\begin{aligned} {\textbf{V}}^{\mathrm{MMSE}}&= \left( {\hat{\textbf{H}}} {\textbf{P}} {\hat{\textbf{H}}}^{H} + {\textbf{T}} \right) ^{-1} {\hat{\textbf{H}}} {\textbf{P}} \nonumber \\&= \left[ {\textbf{T}}^{-1} - {\textbf{T}}^{-1} {\hat{\textbf{H}}} \left( {\textbf{P}}^{-1} + {\hat{\textbf{H}}}^{H} {\textbf{T}}^{-1} {\hat{\textbf{H}}} \right) ^{-1} {\hat{\textbf{H}}}^{H} {\textbf{T}}^{-1} \right] {\hat{\textbf{H}}} {\textbf{P}} \nonumber \\&= {\textbf{T}}^{-1} {\hat{\textbf{H}}} \left( {\textbf{P}}^{-1} + {\hat{\textbf{H}}}^{H} {\textbf{T}}^{-1} {\hat{\textbf{H}}} \right) ^{-1} \underbrace{\left( \left( {\textbf{P}}^{-1} + {\hat{\textbf{H}}}^{H} {\textbf{T}}^{-1} {\hat{\textbf{H}}} \right) {\textbf{P}} - {\hat{\textbf{H}}}^{H} {\textbf{T}}^{-1} {\hat{\textbf{H}}} {\textbf{P}} \right) }_{{\textbf{I}}} \nonumber \\&= \left[\begin{matrix} {\textbf{X}}_{1}\\ \vdots \\ {\textbf{X}}_L \end{matrix}\right] \left( {\textbf{P}}^{-1} + {\textbf{X}}^{H} {\hat{\textbf{H}}} \right) ^{-1} \end{aligned}$$

with \({\textbf{X}}_{l}\) and \({\textbf{X}}^{H} \hat{{\textbf{H}}}\) defined in (18) and (17) respectively. In (73) the Sherman-Morrison-Woodbury formula is used together with the fact that \({\textbf{T}}\) is a block-diagonal matrix. \(\square\)
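The identity proven above can be checked numerically. The following is a minimal sketch, assuming \({\textbf{X}}_{l} = {\textbf{T}}_{l}^{-1} \hat{{\textbf{H}}}_{l}\) with \({\textbf{T}}_{l}\) the l-th diagonal block of \({\textbf{T}}\) (inferred from the derivation, since (17) and (18) are not repeated here); the sizes and random matrices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N, L, K = 2, 3, 4           # illustrative sizes
M = N * L

H_hat = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
P = np.diag(rng.uniform(1.0, 100.0, K))

# Block-diagonal T: one Hermitian positive definite N x N block per AP.
T = np.zeros((M, M), dtype=complex)
for l in range(L):
    A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    T[l * N:(l + 1) * N, l * N:(l + 1) * N] = A @ A.conj().T + np.eye(N)

# Centralized form: one NL x NL linear system.
V_central = np.linalg.solve(H_hat @ P @ H_hat.conj().T + T, H_hat @ P)

# Distributed form: each AP solves with its own N x N block,
# followed by a single K x K inverse.
X = np.vstack([
    np.linalg.solve(T[l * N:(l + 1) * N, l * N:(l + 1) * N],
                    H_hat[l * N:(l + 1) * N, :])
    for l in range(L)
])
XhH = X.conj().T @ H_hat    # equals the sum over APs of X_l^H H_hat_l
V_dist = X @ np.linalg.inv(np.linalg.inv(P) + XhH)

print(np.allclose(V_central, V_dist))
```

Both forms give the same combining matrix, but the distributed form only needs \(N \times N\) solves at the APs and one \(K \times K\) inverse.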

Appendix 2: Proof of Theorem 3.2


By using \({\textbf{h}}_{k} = \hat{{\textbf{h}}}_{k} + \tilde{{\textbf{h}}}_{k}\) with \(\hat{{\textbf{h}}}_{k}\) and \({\tilde{\textbf{h}}}_{k}\) independent random variables, the required expectations for Theorem 3.2 can be written as:

$$\begin{aligned} E\left\{ {\textbf{v}}^{H}_{k} {\textbf{h}}_{k} \right\}= & {} E\left\{ {\textbf{v}}^{H}_{k} {\hat{\textbf{h}}}_{k} \right\} + E\left\{ {\textbf{v}}^{H}_{k} {\tilde{\textbf{h}}}_{k} \right\} \nonumber \\= & {} E\left\{ {\textbf{v}}^{H}_{k} \hat{{\textbf{h}}}_{k} \right\} , \end{aligned}$$
$$\begin{aligned} E\left\{ \left| {\textbf{v}}^{H}_{k} {\textbf{h}}_{i}\right| ^{2} \right\}= & {} E\left\{ {\textbf{v}}^{H}_{k} \left( \hat{{\textbf{h}}}_{i} + \tilde{{\textbf{h}}}_{i}\right) \left( \hat{{\textbf{h}}}_{i} + \tilde{{\textbf{h}}}_{i}\right) ^{H} {\textbf{v}}_{k} \right\} \nonumber \\= & {} E\left\{ \left| {\textbf{v}}^{H}_{k} \hat{{\textbf{h}}}_{i} \right| ^{2} \right\} + E\left\{ {\textbf{v}}^{H}_{k} \tilde{{\textbf{h}}}_{i} \tilde{{\textbf{h}}}_{i}^{H} {\textbf{v}}_{k} \right\} \nonumber \\= & {} E\left\{ \left| {\textbf{v}}^{H}_{k} \hat{{\textbf{h}}}_{i} \right| ^{2} \right\} + E\left\{ {\textbf{v}}^{H}_{k} E\left\{ \tilde{{\textbf{h}}}_{i} \tilde{{\textbf{h}}}_{i}^{H} \right\} {\textbf{v}}_{k} \right\} \nonumber \\= & {} E\left\{ \left| {\textbf{v}}^{H}_{k} \hat{{\textbf{h}}}_{i} \right| ^{2} \right\} + E\left\{ {\textbf{v}}^{H}_{k} {\textbf{C}}_{i} {\textbf{v}}_{k} \right\} \nonumber \\= & {} E\left\{ \left| {\textbf{v}}^{H}_{k} \hat{{\textbf{h}}}_{i}\right| ^{2} \right\} + E\left\{ \sum _{l=1}^{L} {\textbf{v}}^{H}_{kl} {\textbf{C}}_{il} {\textbf{v}}_{kl} \right\} \nonumber \\= & {} E\left\{ \left| {\textbf{v}}^{H}_{k} \hat{{\textbf{h}}}_{i}\right| ^{2} \right\} + \sum _{l=1}^{L} E\left\{ {\textbf{v}}^{H}_{kl} {\textbf{C}}_{il} {\textbf{v}}_{kl} \right\} \end{aligned}$$

where in (74) and (75) independence between \({\textbf{v}}_{k}\) and the zero-mean channel estimation errors \(\tilde{{\textbf{h}}}_{k}\) and \(\tilde{{\textbf{h}}}_{i}\) is used. Similarly, it is possible to rewrite the last expectation as

$$\begin{aligned} \begin{aligned} E\left\{ \left| {\textbf{v}}^{H}_{k} {\textbf{n}}^{\mathrm{UL}}\right| ^{2} \right\}&= E\left\{ {\textbf{v}}^{H}_{k} {\textbf{n}}^{\mathrm{UL}} {\textbf{n}}^{\mathrm{UL},H} {\textbf{v}}_{k} \right\} \\&= E\left\{ {\textbf{v}}^{H}_{k} E \left\{ {\textbf{n}}^{\mathrm{UL}} {\textbf{n}}^{\mathrm{UL},H} \right\} {\textbf{v}}_{k} \right\} \\&= E\left\{ {\textbf{v}}_{k}^{H} {\textbf{R}}_{{\textbf{n}} {\textbf{n}}}^{\mathrm{UL}} {\textbf{v}}_{k} \right\} \\&= E\left\{ \sum _{l=1}^{L} {\textbf{v}}_{kl}^{H} {\textbf{R}}_{{\textbf{n}}_{l} {\textbf{n}}_{l}}^{\mathrm{UL}} {\textbf{v}}_{kl} \right\} \\&= \sum _{l=1}^{L} E\left\{ {\textbf{v}}_{kl}^{H} {\textbf{R}}_{{\textbf{n}}_{l} {\textbf{n}}_{l}}^{\mathrm{UL}} {\textbf{v}}_{kl} \right\} \end{aligned} \end{aligned}$$

where this time independence between \({\textbf{v}}_{k}\) and the additive uplink noise vector \({\textbf{n}}^{\mathrm{UL}}\) is used. \(\square\)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Van Rompaey, R., Moonen, M. Distributed MMSE-based uplink receive combining, downlink transmit precoding and optimal power allocation in cell-free massive MIMO systems. J Wireless Com Network 2023, 48 (2023).
