Ubiquitous Cell-Free Massive MIMO Communications

Since the first cellular networks were trialled in the 1970s, we have witnessed an incredible wireless revolution. From 1G to 4G, the massive traffic growth has been managed by a combination of wider bandwidths, refined radio interfaces, and network densification, namely increasing the number of antennas per site [1]. Due to its cost-efficiency, the latter has contributed the most. Massive MIMO (multiple-input multiple-output) is a key 5G technology that uses massive antenna arrays to provide a very high beamforming gain and spatial multiplexing of users, and hence increases the spectral and energy efficiency (see [2] and references therein). It constitutes a centralized solution to densify a network, and its performance is limited by the inter-cell interference inherent in its cell-centric design. Conversely, ubiquitous cell-free Massive MIMO [3] refers to a distributed Massive MIMO system implementing coherent user-centric transmission to overcome the inter-cell interference limitation in cellular networks and provide additional macro-diversity. These features, combined with the system scalability inherent in the Massive MIMO design, distinguish ubiquitous cell-free Massive MIMO from prior coordinated distributed wireless systems. In this article, we investigate the enormous potential of this promising technology while addressing practical deployment issues to deal with the increased back/front-hauling overhead deriving from the signal co-processing.


I. INTRODUCTION
One of the primary ways to provide high per-user data rates, a requirement for the creation of a 5G network, is through network densification, namely increasing the number of antennas per site and deploying smaller and smaller cells [1]. A communication technology that involves base stations (BSs) with a very large number of transmitting/receiving antennas is Massive MIMO [2], where MIMO stands for multiple-input multiple-output. This key 5G technology leverages aggressive spatial multiplexing. In the uplink, all the users transmit data to the BS in the same time-frequency resources. The BS exploits the massive number of channel observations to apply linear receive combining, which discriminates the desired signal from the interfering signals. In the downlink, the users are coherently served by all the antennas, in the same time-frequency resources but separated in the spatial domain by receiving very directive signals. By supporting such highly spatially focused transmission (precoding), Massive MIMO provides higher spectral efficiency and reduces the inter-cell interference compared to existing mobile systems.
Inter-cell interference is, however, becoming the major bottleneck as we densify the networks. It cannot be removed as long as we rely on a network-centric (cell-centric) implementation, since inter-cell interference is inherent to the cellular paradigm. In a conventional cellular network, each user equipment (UE) is connected to the access point (AP) in only one [...] boundaries, resulting in no inter-cell interference, hence the terminology "cell-free".
Ubiquitous cell-free Massive MIMO enhances the conventional CoMP-JT by implementing the user-centric approach and leveraging the benefits of using Massive MIMO, i.e., high spectral efficiency, system scalability, and close-to-optimal linear processing. In such a system, the APs are connected via front-haul connections to central processing units (CPUs), which are responsible for the coordination. The CPUs are interconnected by back-haul (Fig. 1d).
Note that it is convenient to have AP-specific synchronization and reference signals, which can be viewed as cell identifiers, also in cell-free networks. Hence, "cell-free" refers to the lack of cell boundaries in the data transmission.

II. SYSTEM OPERATION AND RESOURCE ALLOCATION
To give a first sense of the paradigm shift that cell-free Massive MIMO constitutes, Fig. 2 shows the user performance at different locations in an area with nine APs: Fig. 2a shows that the spectral efficiency (SE) in a cellular network is poor at the cell edges due to strong inter-cell interference, while Fig. 2b shows that a cell-free network can avoid interference by co-processing over the APs and provide more uniform performance among the users. The SE is only limited by signal propagation losses.
A. Ubiquitous Cell-Free Massive MIMO: The Scalable Way to Implement CoMP-JT
The first challenge in implementing a cell-free Massive MIMO network is to obtain sufficiently accurate channel state information (CSI) so that the APs can simultaneously transmit (receive) signals to (from) all UEs and cancel interference in the spatial domain. The conventional approach of sending DL pilots and letting the UEs feed back channel estimates is unscalable since the feedback load is proportional to the number of APs. To circumvent this issue, we note that each AP only requires local CSI to perform its tasks [9], which refers to the channel between the AP and each of the UEs. Local CSI is conveniently acquired in time-division duplex (TDD) operation since, when a UE sends a pilot, each AP can simultaneously estimate its channel to the UE. Hence, the overhead is independent of the number of APs. By exploiting channel reciprocity, the UL channel estimates can also be utilized as DL channel estimates, as in cellular Massive MIMO [2]. Just like Massive MIMO is the scalable way to implement multi-user MIMO [2], ubiquitous cell-free Massive MIMO is the scalable way to implement CoMP-JT [3], [10], [11]. In cell-free networks there are $L$ geographically distributed APs that jointly serve a relatively smaller number $K$ of users: $L \gg K$. Cell-free Massive MIMO can provide ten-fold improvements in 95%-likely SE for the UEs over a corresponding cellular network with small cells [3], [10]. There are two key properties that explain this result.
The first property is the increased macro-diversity. [...] a large ISD, the UEs with the best channel conditions have almost identical channel gains in both cases, but the most unfortunate UEs gain 5 dB from cell-free processing. With a small ISD of 5 m, which is reasonable for connected factory applications, all UEs obtain 5-20 dB higher channel gain with the cell-free network.
The second property is favorable propagation, which means that the channel vectors $h_1, h_2$ of any pair of UEs are nearly orthogonal, leading to little inter-user interference. The level of orthogonality can be measured by the normalized squared inner product $|h_1^H h_2|^2 / (\|h_1\|^2 \|h_2\|^2)$. A smaller value represents greater orthogonality. In a cellular network with single-antenna APs, $h_1$ and $h_2$ are scalars and thus the measure is one. Favorable propagation will, however, appear in cell-free Massive MIMO where $h_1, h_2 \in \mathbb{C}^L$, since the combination of small-scale and large-scale fading makes the large-dimensional channel vectors pairwise nearly orthogonal [11]. This is illustrated in Fig. 3b, which shows the CDF of the orthogonality measure for two randomly located UEs.
The inner product is very small for all the considered ISDs.
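As a rough numerical illustration of the orthogonality measure, the following sketch evaluates it for a pair of scalars and for a pair of random length-$L$ vectors. It is a simplified sketch only: the entries are drawn i.i.d. from a unit-variance complex Gaussian distribution (an assumption; the large-scale fading and AP geometry behind Fig. 3b are not modeled), and the function names `orthogonality` and `rayleigh_vector` are hypothetical.

```python
import random

def orthogonality(h1, h2):
    """Normalized squared inner product |h1^H h2|^2 / (||h1||^2 ||h2||^2)."""
    inner = sum(a.conjugate() * b for a, b in zip(h1, h2))
    n1 = sum(abs(a) ** 2 for a in h1)
    n2 = sum(abs(b) ** 2 for b in h2)
    return abs(inner) ** 2 / (n1 * n2)

def rayleigh_vector(L, rng):
    """L i.i.d. unit-variance complex Gaussian entries (assumed model)."""
    s = 0.5 ** 0.5
    return [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(L)]

rng = random.Random(0)

# Single-antenna case: the "vectors" are scalars, so the measure equals one.
scalar_measure = orthogonality([1 + 2j], [0.3 - 0.1j])

# Distributed case with L = 400 single-antenna APs.
large_measure = orthogonality(rayleigh_vector(400, rng), rayleigh_vector(400, rng))
```

Under this assumed model the measure for long vectors concentrates around $1/L$, which is consistent with the qualitative statement that large-dimensional channel vectors are pairwise nearly orthogonal.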

B. TDD Protocol
The TDD protocol recommended for cell-free Massive MIMO is illustrated in Fig. 4. Each AP estimates the uplink (UL) channel from each UE by measurements on UL pilots. By virtue of reciprocity, these estimates are also valid for the DL channels. Hence, the pilot resource requirement is independent of the number of AP antennas and no UL feedback is needed.
After applying precoding, each UE sees an effective scalar channel. The UE needs to estimate the gain of this channel to decode its data. Note that in cellular Massive MIMO, owing to channel hardening, the UE may rely on knowledge of the average channel gain for decoding [2]. In cell-free Massive MIMO, in contrast, there is less hardening and DL effective gain estimation is desirable at the terminal [11], [12]. This estimate can either be obtained from DL pilots sent by the AP during a DL training phase [12] (Fig. 4a) or, potentially, blindly from the DL data transmission if there are no DL pilots (Fig. 4b). The configuration including the pilot-based DL training, depicted in Fig. 4a [...] The channel coherence interval is defined as the time-frequency interval during which the channel can be approximately considered as static. It is determined by the propagation environment, UE mobility, and carrier frequency [2]. The TDD frame should be equal to or shorter than the smallest coherence time among the active UEs. For simplicity, we herein assume it is equal.
Let $\tau = T_c B_c$ be the length of the TDD frame in symbols, where $B_c$ is the coherence bandwidth and [...] $\tau_{d,d}$ denote the total number of symbols per frame spent on transmission of UL pilots, UL data, DL pilots and DL data, respectively. Importantly, $\tau$ as well as $\tau_p$, $\tau_{d,d}$, $\tau_{u,d}$ can be adjusted over time to accommodate the coherence interval variation and the traffic load change. However, such frame reconfiguration should occur slowly to limit the amount of control signaling required by the resource re-allocation.
The maximum number of mutually orthogonal pilots is upper-bounded by $\tau$. Hence, allocating orthogonal pilots is physically impossible in networks with $K \geq \tau$, and non-orthogonal pilots are necessary. UEs that send non-orthogonal pilots cause mutual interference that makes the respective channel estimates correlated, a phenomenon known as pilot contamination. This results in a degradation of the channel estimation quality and in coherent interference.
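The effect of pilot reuse can be illustrated with a small noise-free sketch for one single-antenna AP. The DFT-based pilot construction, the channel values, and the helper names (`dft_pilots`, `received_pilot`, `despread`) are assumptions made for illustration; the article does not prescribe a particular pilot design or estimator.

```python
import cmath

def dft_pilots(tau_p):
    """tau_p mutually orthogonal pilots of length tau_p (rows of a DFT matrix)."""
    return [[cmath.exp(-2j * cmath.pi * i * n / tau_p) for n in range(tau_p)]
            for i in range(tau_p)]

def received_pilot(channels, pilot_index, pilots):
    """Noise-free pilot signal at one AP: sum over UEs of g_k * pilot_k."""
    tau_p = len(pilots)
    return [sum(g * pilots[pilot_index[k]][n] for k, g in enumerate(channels))
            for n in range(tau_p)]

def despread(y, pilot):
    """Correlate the received signal with one pilot and normalize by its length."""
    return sum(yn * pn.conjugate() for yn, pn in zip(y, pilot)) / len(pilot)

pilots = dft_pilots(2)                  # only two orthogonal pilots available
g = [1 + 1j, 0.5 - 2j, -0.3 + 0.4j]     # made-up channels of K = 3 UEs to the AP
assignment = [0, 1, 0]                  # UE 0 and UE 2 must share pilot 0

y = received_pilot(g, assignment, pilots)
est_ue0 = despread(y, pilots[0])        # equals g[0] + g[2]: contaminated
est_ue1 = despread(y, pilots[1])        # equals g[1]: no contamination
```

The estimate for UE 0 contains the channel of UE 2 in full, so any precoder built from it also beamforms coherently towards UE 2, which is the coherent interference mentioned above.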

C. Power Control
Power control is important to handle the near-far effect, and protect UEs from strong interference. The power control can be governed by the CPU, which tells the APs and UEs which power-control coefficients to use. By using closed-form capacity bounds that only depend on the large-scale fading, the power control can be well optimized and infrequently updated, e.g., a few times per second.
When maximum-ratio (MR) precoding is used, at AP $l$, the symbol intended for UE $k$, $q_k$, is first weighted by $\hat{g}_{lk}^*$ and $\sqrt{\rho_{lk}}$, where $\hat{g}_{lk}$ is the estimate of the channel from AP $l$ to UE $k$ and $\rho_{lk}$ is the power-control coefficient. The weighted symbols of all $K$ UEs will then be combined and transmitted to the UEs. In the UL, at UE $k$, the corresponding symbol $q_k$ is weighted by a power-control coefficient $\sqrt{\rho_k}$ before transmission to the APs.
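The DL weighting described above can be sketched in a few lines. The numerical values and the names `mr_precode`, `g_hat`, `rho` and `q` are hypothetical, and perfect channel estimates without noise are assumed for the final check.

```python
import math

def mr_precode(g_hat_l, rho_l, q):
    """Transmit sample of AP l: sum_k sqrt(rho_lk) * conj(g_hat_lk) * q_k."""
    return sum(math.sqrt(r) * g.conjugate() * s
               for g, r, s in zip(g_hat_l, rho_l, q))

# Made-up example with L = 2 APs and K = 2 UEs; g_hat[l][k], rho[l][k].
g_hat = [[1 + 1j, 0.5j],
         [2 - 1j, -1 + 0j]]
rho = [[0.25, 0.25],
       [0.25, 0.25]]
q = [1 + 0j, -1 + 0j]                   # data symbols of UE 0 and UE 1

x = [mr_precode(g_hat[l], rho[l], q) for l in range(2)]

# With perfect estimates, the desired-signal gain seen by UE 0 is the real,
# non-negative sum over APs of sqrt(rho_l0) * |g_l0|^2 (coherent combining).
gain_ue0 = sum(g_hat[l][0] * math.sqrt(rho[l][0]) * g_hat[l][0].conjugate()
               for l in range(2))
```

Each AP only needs its own row of `g_hat` and `rho`, which reflects the local-CSI property of MR precoding.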
In general, the power-control coefficients should be selected to maximize a given performance objective. This objective may, for example, be the max-min rate or sum rate:
• Max-Min Fairness Power Control: The goal of this power-control policy is to deliver the same rate to all UEs and to maximize that rate. In a large network, some UEs may have very bad channels to all APs; it is thus necessary to drop them from service before applying this policy, otherwise the service will be bad for everyone. As in cellular Massive MIMO, the max-min fairness power-control coefficients can be obtained efficiently by means of linear and second-order cone optimization [3, Section IV-B].
• Power Control with User Prioritization: The rate requirements are typically different among the UEs, which can be taken into account in the power-control policy. For instance, UEs that use real-time services or have more expensive subscriptions have higher priority. The max-min fairness power control can be extended to consider weighted rates, where the individual weights represent the priorities. Minimum rate constraints can be also included.
• Power Control with AP Selection: Due to the path-loss, APs far away from a given UE contribute only modestly to its performance. AP selection is implemented by assigning non-zero power-control coefficients only to the APs designated to serve that UE.

D. Pilot Assignment
To limit pilot contamination, efficient pilot assignment is important. Pilot assignment is determined at the CPU, and communicated to all the APs, which forward it to the UEs. This message would map the UE identifier to the pilot index. This UE-to-pilot mapping can be transmitted either in the broadcast control channel within the system information acquisition process or in the random access channel during the random access procedure. Pilot assignment can be done in several ways:
• Random pilot assignment: Each UE is randomly assigned one of the $\tau_p$ mutually orthogonal pilots. This method requires no coordination, but there is a substantial probability that closely located UEs use the same pilot, leading to bad performance.
• Brute-force optimal assignment: A search over all possible pilot sequences can be performed to maximize a utility of choice, such as the max-min rate or sum rate. This method is optimal but its complexity grows exponentially with K.
• Greedy pilot assignment: The K UEs are first assigned pilot sequences at random. Then this assignment is iteratively improved by performing small changes that increase the utility.
To achieve good network performance, pilot assignment and power control can be performed jointly.
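The greedy strategy listed above (random start, then small improving changes) can be sketched as follows. The utility used here is a purely hypothetical proxy, the sum of inverse squared distances between UEs that share a pilot, chosen only to keep the example self-contained; the article instead mentions utilities such as the max-min rate or sum rate. All function names are illustrative.

```python
import random

def cost(assign, pos):
    """Hypothetical contamination proxy: sum of 1/d^2 over co-pilot UE pairs."""
    total = 0.0
    for i in range(len(assign)):
        for j in range(i + 1, len(assign)):
            if assign[i] == assign[j]:
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                total += 1.0 / (dx * dx + dy * dy + 1e-9)
    return total

def random_assignment(K, tau_p, rng):
    """Random pilot assignment: each UE gets one of tau_p pilots."""
    return [rng.randrange(tau_p) for _ in range(K)]

def greedy_improve(assign, pos, tau_p):
    """Change one UE's pilot at a time, keeping only strict improvements."""
    assign = list(assign)
    best = cost(assign, pos)
    improved = True
    while improved:
        improved = False
        for k in range(len(assign)):
            for p in range(tau_p):
                old = assign[k]
                if p == old:
                    continue
                assign[k] = p
                c = cost(assign, pos)
                if c < best:
                    best, improved = c, True   # keep the change
                else:
                    assign[k] = old            # revert
    return assign, best

rng = random.Random(1)
pos = [(0, 0), (1, 0), (50, 50), (51, 50), (0, 1), (50, 51)]  # made-up UE positions (m)
init = random_assignment(len(pos), 3, rng)
init_cost = cost(init, pos)
final, final_cost = greedy_improve(init, pos, 3)
```

Since every accepted change strictly lowers the cost, the loop terminates, although in general only at a local optimum rather than at the brute-force optimum.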

III. PRACTICAL DEPLOYMENT ISSUES
The cost and complexity of deployment, the limited capacity of back/front-haul connections, and network synchronization are three major issues that need to be solved in a practical deployment.

A. Radio Stripes System
The cabling and internal communication between APs is challenging in practical cell-free massive MIMO deployments. An appropriate, cost-efficient architecture is the radio stripe system (patent WO/2018/103897), presented next.
In a radio stripe system, the antennas and the associated antenna processing units (APUs) are serially located inside the same cable, which also provides synchronization, data transfer, and power supply via a shared bus; see [...]. Each radio stripe sends/receives data to/from one or multiple CPUs through a shared bus (or internal connector), which also provides synchronization and power supply to each APU. [...] In each antenna, the input data streams are scaled with the pre-calculated precoding vector and the sum-signal is transmitted over the radio channel to the receiver(s). By exploiting channel reciprocity, the precoding vector may be a function of the estimated uplink channels.
For example, if the conjugate of the estimated uplink channel is used, MR precoding is obtained.
This precoding requires no CSI sharing between the antennas.
On the receiver side, the received radio signal is multiplied with the combining vector previously calculated in the uplink pilot phase. The output gives $K$ data streams. The processed streams are then combined with the data streams received from the shared bus and sent again on the shared bus to the next APU. More specifically, the $m$th APU adds its processed data streams to the input streams from APU $m - 1$, which consist of the combined signals from APUs $1, \ldots, m - 1$, for one or more UEs. This cumulative signal is then output to APU $m + 1$. The combination of signals is a simple per-stream addition operation. The radio stripe system facilitates a flexible and cheap cell-free Massive MIMO deployment.
The low cost comes from several aspects: (i) deployment does not require highly qualified personnel. Theoretically, a radio stripe needs only one (plug and play) connection either to the front-haul network or directly to the CPU; (ii) a conventional distributed Massive MIMO deployment requires a star topology, i.e., a separate cable between each AP and a CPU, which may be economically infeasible. Conversely, radio stripe installation complexity is unaffected by the number of antenna elements, thanks to its compute-and-forward architecture. Hence, cabling becomes much cheaper; (iii) maintenance costs are cut down as a radio stripe system offers increased robustness and resilience: highly distributed functionality limits the overall impact on the network when a few stripes are defective; (iv) low heat-dissipation makes cooling systems simpler and cheaper. While cellular APs are bulky, radio stripes enable invisible installation in existing construction elements as exemplified in Fig. 5b. Moreover, a radio stripe deployment may integrate, for example, temperature sensors, microphones/speakers, or vibration sensors, and provide additional features such as fire alarms, burglar alarms, earthquake warning, indoor positioning, and climate monitoring and control.
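The sequential uplink processing along a stripe, where each APU adds its locally combined streams to the cumulative streams arriving on the shared bus, can be sketched as below. This is a minimal sketch with one received sample per APU and made-up combining weights; the names `apu_step` and `stripe_uplink` are hypothetical, and the weights `W[m][k]` are assumed to be the already-computed combining coefficients of APU m for stream k (for MR combining, the conjugate of the channel estimate).

```python
def apu_step(incoming, y_m, w_m):
    """APU m: weight the local sample y_m per stream and add it to the incoming streams."""
    return [s + w * y_m for s, w in zip(incoming, w_m)]

def stripe_uplink(y, W):
    """Pass the K cumulative streams through all APUs of a stripe in order."""
    streams = [0j] * len(W[0])          # nothing has been accumulated before APU 1
    for y_m, w_m in zip(y, W):
        streams = apu_step(streams, y_m, w_m)
    return streams                      # what the last APU forwards towards the CPU

# Made-up example: 3 APUs, K = 2 streams.
y = [1 + 1j, 2j, -1 + 0j]               # received sample at each APU
W = [[1 + 0j, 1j],
     [0.5 + 0j, -1 + 0j],
     [2j, 1 + 0j]]

out = stripe_uplink(y, W)

# The same result computed in one place, as a star topology would do it.
central = [sum(W[m][k] * y[m] for m in range(3)) for k in range(2)]
```

The bus carries K streams regardless of how many APUs are on the stripe, which is the reason the cabling effort does not grow with the number of antenna elements.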

B. Front-haul and Back-haul Capacity
While there is no need to share CSI between antennas, the CPUs must provide each APU with the data streams. The data is delivered from the core network via the back-haul and then forwarded to the APU over the front-haul; see Fig. 5a. Similarly, the CPU receives the cumulative signals from its radio stripes over the front-haul and decodes them. The data will then be delivered to the core network over the back-haul.
The required front-haul capacity of a radio stripe is proportional to the number of simultaneous data streams that it supports at maximum network load. The required back-haul capacity of a CPU corresponds to the sum rate of the data streams that its radio stripes will transmit/receive at maximum network load. The way to limit these capacity requirements is to constrain the number of UEs that can be served per AP (e.g., radio stripe) and CPU. To avoid creating cell boundaries, a user-centric perspective must be used when selecting which subset of APs serves a particular UE [3], [13], as illustrated in Fig. 1c.
Suppose a UE is alone in the network and all APs transmit to it with full power. Since the path-loss decays rapidly with the propagation distance, 95% of the received power will originate from a subset of the APs, called the 95%-subset. When the ISD is large, as in a conventional cellular network, the 95%-subset might only contain a handful of APs. As the ISD reduces (i.e., the number of APs per km² grows), the 95%-subset becomes larger. This property can be used to limit the back-haul signaling. For example, it is shown in [3] that only 10-20% of the APs in the 1 km² area surrounding a UE belong to the 95%-subset.
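One way to compute such a subset is to sort the APs by the power they deliver to the UE and keep the strongest ones until the target fraction is reached. The sketch below assumes the per-AP received powers are already known; the values and the name `power_subset` are hypothetical.

```python
def power_subset(gains, fraction=0.95):
    """Smallest set of AP indices whose gains sum to at least `fraction` of the total."""
    order = sorted(range(len(gains)), key=lambda l: gains[l], reverse=True)
    target = fraction * sum(gains)
    chosen, acc = [], 0.0
    for l in order:
        if acc >= target:
            break
        chosen.append(l)                # add the next-strongest AP
        acc += gains[l]
    return sorted(chosen)

# Made-up received powers from 7 APs (arbitrary linear units, total 100).
gains = [2, 50, 1, 30, 6, 10, 1]
subset = power_subset(gains)            # APs 1, 3, 5 and 4 deliver 96% of the power
```

Taking the strongest APs first yields the smallest subset that meets the target, so the remaining APs can be excluded from serving this UE with little loss.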

C. Synchronization
To serve a UE by coherent joint transmission from multiple APs, the network infrastructure needs to be synchronized. The network might have an absolute time (phase) reference, but the APs are unsynchronized. This means that, effectively, the transmitter and receiver circuits of each AP have their own time references. The difference in time reference between the transmitter and receiver in a given AP represents the reciprocity calibration error. The difference in, say, transmitter time reference, between any pair of APs represents the synchronization error between these two APs. To limit the reciprocity and synchronization errors, a synchronization process needs to be applied at regular intervals.
Suppose the transmitter of AP $i$ has a clock bias of $t_i$ (i.e., its local time reference clock shows zero at absolute time $t_i$) and the receiver has a clock bias of $r_i$ (i.e., its clock shows zero at absolute time $r_i$). A simple synchronization protocol may then work as follows:
1) At local time zero (absolute time $t_1$), AP 1 transmits a known pulse. AP 2 receives this pulse at time $t_1 - r_2$, according to its clock, and timestamps it with $\delta_{12} = t_1 - r_2$. Similarly, AP 3 timestamps the pulse with $\delta_{13} = t_1 - r_3$.
2) At its local time zero, AP 2 transmits a known pulse. AP 1 timestamps the received pulse with its local reception time $\delta_{21} = t_2 - r_1$. AP 3 timestamps it with $\delta_{23} = t_2 - r_3$.
The quantities $\delta_{ij}$ are known from the measurements, but $t_1, r_1, t_2, r_2, t_3, r_3$ cannot be obtained from $\delta_{ij}$ since the corresponding linear equation system is singular. However, the reciprocity and synchronization errors are easily recovered: [...] This enables synchronization between the three APs.
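To make the recovery step concrete, the sketch below simulates the timestamps and recovers the errors using expressions derived directly from the relation $\delta_{ij} = t_i - r_j$. These expressions are an illustrative derivation and not necessarily the form used by the authors. Two assumptions are made: propagation delays are ignored, as in the steps above, and every AP takes a turn transmitting, so that $\delta_{31}$ and $\delta_{32}$ are also available (the third transmission is hypothetical and is needed here for the reciprocity errors). APs are indexed from 0 in the code.

```python
def measure(t, r):
    """delta[i][j] = t[i] - r[j]: timestamp at AP j of the pulse sent by AP i."""
    n = len(t)
    return [[t[i] - r[j] if i != j else None for j in range(n)]
            for i in range(n)]

def sync_error(delta, i, j):
    """t_i - t_j = delta_ik - delta_jk, with the third AP k as common receiver."""
    k = 3 - i - j
    return delta[i][k] - delta[j][k]

def reciprocity_error(delta, i):
    """t_i - r_i = delta_ij + delta_ki - delta_kj, with j and k the other two APs."""
    j, k = [m for m in range(3) if m != i]
    return delta[i][j] + delta[k][i] - delta[k][j]

# Made-up clock biases of three APs (arbitrary time units).
t = [0.3, -1.2, 2.5]
r = [0.1, 0.7, -0.4]
delta = measure(t, r)

s01 = sync_error(delta, 0, 1)           # recovers t[0] - t[1]
c0 = reciprocity_error(delta, 0)        # recovers t[0] - r[0]
```

Only differences of biases are recovered, in line with the observation that the individual biases cannot be obtained from the measurements.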
This synchronization method can be applied in a differential manner. Consider measurements $\delta_{ij}$ taken at a first point in time at which the biases are $t_1, r_1, t_2, r_2, t_3, r_3$, and then measurements $\delta'_{ij}$ taken at a second point in time at which the biases are $t'_1, r'_1, t'_2, r'_2, t'_3, r'_3$. The application of the above method to $\delta'_{ij} - \delta_{ij}$ yields the evolution of clock biases, up to a drift that is common to the whole group.
Extension to synchronization between two groups is straightforward. Consider two groups A and B, each group comprising three APs. The reciprocity and synchronization errors within each group may be calibrated through the above-described procedure. Each group will, however, have an unknown remaining clock bias. Let $\delta^{A,B}_{ij} = t^A_i - r^B_j$ be the time discrepancy measured at AP $j$ in group B, following the known pulse transmission by AP $i$ in group A. The inter-group synchronization error can be easily obtained by [...]. Extensions to synchronization between more than two groups follow the same methodology as above. Note that, in a radio stripe system, groups of APs are sequential. Hence, synchronization is only required between a group and its neighbor.

IV. PERFORMANCE OF UBIQUITOUS CELL-FREE MASSIVE MIMO
We will analyze the anticipated performance in two case studies of practical interest: (i) an industrial indoor scenario, and (ii) an outdoor piazza scenario.

A. Industrial Indoor Scenario
Ubiquitous coverage, low latency, ultra-reliable communication, and resilience are key for wireless communications in a factory environment. The flexible distributed cell-free architecture, with its macro-diversity gain and inherent ability to suppress interference, is suitable to cope with the requirements of this scenario.
We consider the industrial environment described in [14]: a 7-8 m high building with metal ceiling and concrete floors and walls. The industrial inventory mainly consists of metal machinery.
The radio stripes are deployed in an area of 100×100 meters in such a way that 400 APs shape a 20×20 regular grid, as shown in Fig. 6a (left). The end-most antennas are 5 m apart. They are placed at 6 m above ground level, while the UE antenna height is 2 m.
The performance, in terms of per-user SE, and the impact of power control are shown in Fig. 6a (right). We consider $K = 20$ uniformly distributed UEs, mutually orthogonal pilots, and four [...]
3) MMF-CQB AP selection: Max-min fairness power control with channel-quality-based AP selection. Each UE is served by a UE-specific subset of APs. This subset consists of the APs that contribute to at least 95% of the channel gain to a given UE [3].
4) MMF-RPB AP selection: Max-min fairness power control with received-power-based AP selection. The subset consists of the APs that contribute to at least 95% of the power assigned to a given UE [3].
Max-min fairness power control doubles the 95%-likely SE compared to the baseline CD-FPT case. Thanks to optimal power control, the radio-stripe system can guarantee each UE almost 4.5 bit/s/Hz. The performance with AP selection is also explored (dashed and dash-dotted lines).
We can see that the SE reduction is minor if the RPB AP selection strategy is used, while the CQB criterion leads to a 20% reduction. The performance gap is attributable to the cardinality of the corresponding AP subsets; on average, CQB uses 17% of the APs and RPB uses 42% of the APs.

B. Piazza Scenario
Installations causing a big visual impact on the environment can be prohibited in areas like piazzas and historic places. In such a scenario, a radio-stripe system can provide all the advantages previously described with an unobtrusive deployment. We consider a radio-stripe system that covers a 300×300 m square. The radio stripes are placed along the perimeter of the square at 9 m height, for example, on building facades. There are 400 APs in total, as shown in Fig. 6b (left). We consider $K = 20$ uniformly distributed UEs, mutually orthogonal pilots, and the same power-control policies as before. The maximum per-AP radiated power is set to 400 mW to deal with the large coverage area.
The numerical results are shown in Fig. 6b [...]
While this article has outlined the basic processing and implementation concepts, many open issues remain, ranging from communication theory to measurements and engineering efforts:
• Power control: While (weighted) max-min power-control is computationally tractable and provides uniform quality of service, it does not take actual traffic patterns into account. New power control algorithms are needed to balance fairness, latency, and network throughput, while permitting a distributed implementation.
• Distributed signal processing: MR precoding/detection and synchronization can be distributed, as described earlier, but the data encoding/decoding must be carried out at one or multiple CPUs. The distribution of such signal processing tasks over the network is non-trivial, when looking for a good tradeoff between high rates and limited back-haul signaling.
• Resource allocation and broadcasting: Scheduling, paging, pilot allocation, system information broadcast, and random access are basic functionalities that traditionally rely on a cellular architecture. New algorithms and protocols are needed for these tasks in cell-free networks.
• Channel modeling: The performance analysis of cell-free networks has primarily considered Rayleigh fading channels. Practical channels are likely to contain a mix of line-of-sight and non-line-of-sight paths, and will likely differ substantially depending on the carrier frequency. Dedicated channel measurements followed by refined channel modeling are necessary to better understand the channel characteristics and fine-tune resource allocation algorithms.
• DL channel estimation: Recent works [11], [12] show that cell-free networks provide a low degree of channel hardening. DL channel estimates, needed for data decoding, can either be obtained from DL pilots, which increases the pilot overhead, or by blind estimation techniques that use the DL data. Dedicated algorithms for this estimation are needed.
• Compliance with existing standards: The 5G standard is intended to be forward-compatible and only relies on cell-identities for the basic functionalities. It is likely that cell-free data transmission can be implemented in 5G, but further work in standardization and conceptual development is needed.
• Prototype development: The step from a promising communication concept to a practical network requires substantial prototyping. The first working cell-free prototype may be pCell, for which [15] describes a setup with 32 APs serving 16 UEs. Since every AP in a cell-free network has low cost and a small footprint, prototyping can be carried out using rather simple components. One can begin by demonstrating the synchronization and joint processing capabilities with a small number of APs in a limited area, and then continue with more APs and a larger coverage area.