The road to 6G: a comprehensive survey of deep learning applications in cell-free massive MIMO communications systems

The fifth generation (5G) of telecommunications networks is currently being deployed commercially. One of its core enabling technologies is cellular Massive Multiple-Input-Multiple-Output (M-MIMO). However, future wireless networks are expected to serve a very large number of devices, and current MIMO networks are not scalable, which highlights the need for novel solutions. At present, Cell-free Massive MIMO (CF M-MIMO) technology appears to be the most promising idea in this direction. Despite their appealing characteristics, CF M-MIMO systems face their own challenges, such as power allocation and channel estimation. Deep Learning (DL) has been successfully applied to a wide range of problems in many different research areas, including wireless communications. In this paper, a review of the state-of-the-art DL methods applied to CF M-MIMO communications systems is provided. In addition, the basic characteristics of cell-free networks are introduced, along with the most commonly used DL models. Finally, future research directions are highlighted.

that L ≫ K distributed over space. In such a configuration, neither cells nor cell boundaries exist. A central processing unit (CPU) is connected to the APs via the backhaul network, and all users are served through cooperation among the APs, which operate in time-division duplexing (TDD) mode. The main benefits over classical cellular technology are: (i) smaller SNR variations, (ii) better interference management and (iii) increased SNR values [2][3][4][5][6].
Deep Learning (DL) is a subset of the Machine Learning (ML) class of computational methods and has recently achieved impressive results in many different research areas. DL is based on neural network architectures that use multiple layers ("deep") of artificial neurons [7]. DL has also been used in wireless communications, introducing a data-driven approach and offering new insights, such as new system modeling approaches [8,9] and distributed computation [10]. In this context, there is ongoing research on the applications of DL in CF M-MIMO. Fig. 1 depicts the number of papers referring to DL applications in CF M-MIMO systems.

Related work
The applicability of CF M-MIMO to the 6G vision is thoroughly discussed in [3]. An extensive presentation of the foundations of user-centric CF M-MIMO is provided in [2]. In [4], the authors provide a survey of the state-of-the-art literature on CF M-MIMO along with the characteristics of such systems. The applications of DL in wireless communications are presented in [8].
To the best of our knowledge, this is the first study that explicitly discusses the applications of DL methods to CF M-MIMO, while also providing an introduction to both CF M-MIMO systems and current DL architectures.

Paper motivation and contributions
The motivation behind this work is the need to study in depth the ideas behind the future 6G vision. In particular, both CF M-MIMO and DL methods are considered viable solutions for many design and optimization problems, such as resource allocation, energy efficiency and interference management. This paper considers the application of DL models to CF M-MIMO systems. The main contributions of this work are summarized as follows:
• An introduction to CF M-MIMO and user-centric CF M-MIMO networks is presented.
• An extended review of the work on DL methods for CF M-MIMO systems is provided. Among the DL applications, special focus is given to resource management, channel estimation and federated learning.
• Future research directions are discussed, including research challenges and the incorporation of DL methods into the 6G vision.
The rest of this paper is structured as follows. DL models are briefly described in Sect. 2. In Sect. 3, the details of CF M-MIMO configurations are provided and the user-centric approach is presented. Section 4 considers the application of DL in CF M-MIMO networks, and Sect. 5 discusses future research challenges and concludes this work.

DL models
Three main DL models are mostly employed in the current CF M-MIMO literature: feed-forward neural networks (FFNNs), recurrent neural networks (RNNs) and convolutional neural networks (CNNs). In this section, these models are briefly discussed, with further references for the interested reader. In addition, Reinforcement Learning (RL), one of the three major ML paradigms (the other two being Supervised Learning and Unsupervised Learning), is briefly presented.

Feed-forward neural networks
The key component of every DL architecture is the artificial neuron. Artificial neurons are computational units (functions) that try to mimic, in mathematical terms, the behavior of biological neurons. The functionality of such a unit is very simple: the neuron takes some input and produces an output (or activation, by analogy with the action potential of a biological neuron). For vector inputs, each input component is weighted separately, the weighted inputs are summed, and the sum is passed through a nonlinear function known as the activation function [7]. The FFNN was the first proposed architecture comprising multiple artificial neurons. In such a model, the connections between the nodes do not form a cycle or loop, and information flows only forward. Training is typically performed with the stochastic gradient descent (SGD) algorithm, whose pseudo-code is given in Alg. 1. Its core step is the update $w \leftarrow w - \eta\, \partial C / \partial w$, where $\eta$ is the learning rate, $w$ a weight and $C$ the loss function (cost), which measures the distance between the current output of the network and the expected output.
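Alg. 1 itself is not reproduced here. The following is a minimal sketch of the SGD update, assuming a single linear neuron (identity activation) and the squared loss $C = (wx + b - y)^2/2$. The function name `sgd_linear_neuron` and the toy data are hypothetical.

```python
import random

def sgd_linear_neuron(data, eta=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b with SGD on the per-sample loss C = (w*x + b - y)**2 / 2."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)            # stochastic: visit samples in random order
        for x, y in samples:
            err = (w * x + b) - y       # dC/d(output)
            w -= eta * err * x          # w <- w - eta * dC/dw
            b -= eta * err              # b <- b - eta * dC/db
    return w, b

# Hypothetical noise-free samples of y = 2x + 1 on [-1, 1]
data = [(i / 10, 2 * (i / 10) + 1) for i in range(-10, 11)]
w, b = sgd_linear_neuron(data)
print(w, b)
```

Because the samples are noise-free, every per-sample gradient vanishes at $(w, b) = (2, 1)$, so the iterates settle there instead of fluctuating around it.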
The gradients needed by SGD are most frequently computed with the back-propagation algorithm [7]. Back-propagation applies the chain rule of calculus to the loss function, computing the gradient one layer at a time and iterating backward from the last layer. Fig. 2 shows the basic architecture of an FFNN with one hidden layer.
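The mathematical formulation of FFNNs (which can be extended to other DL architectures) considers an input vector $\mathbf{x}$, a set of weights $\mathbf{w}$, a bias $b$ and an activation function $f$. A single neuron computes $f(\mathbf{w}^T\mathbf{x} + b)$. Stacking $J$ layers, where layer $j$ has weight matrix $\mathbf{W}^{(j)}$ and bias vector $\mathbf{b}^{(j)}$, the output of the last layer is
$$\hat{\mathbf{y}} = f\left(\mathbf{W}^{(J)} f\left(\cdots f\left(\mathbf{W}^{(1)}\mathbf{x} + \mathbf{b}^{(1)}\right)\cdots\right) + \mathbf{b}^{(J)}\right)$$
As mentioned before, this output is compared with the expected output vector.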
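The chain-rule computation performed by back-propagation can be sketched for an FFNN with one hidden layer. The sketch assumes sigmoid hidden units, a linear output unit and the squared-error loss. It checks one back-propagated derivative against a central finite difference. All names and values are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Hidden layer h = sigmoid(W1 x + b1); linear output y_hat = w2 . h + b2."""
    W1, b1, w2, b2 = params
    h = [sigmoid(sum(wij * xj for wij, xj in zip(row, x)) + bi)
         for row, bi in zip(W1, b1)]
    y_hat = sum(wk * hk for wk, hk in zip(w2, h)) + b2
    return h, y_hat

def loss(params, x, y):
    """Squared-error cost C = (y_hat - y)^2 / 2."""
    return 0.5 * (forward(params, x)[1] - y) ** 2

def backprop(params, x, y):
    """Chain rule, applied from the output layer back to the hidden layer."""
    W1, b1, w2, b2 = params
    h, y_hat = forward(params, x)
    d_out = y_hat - y                                   # dC/dy_hat
    g_w2 = [d_out * hk for hk in h]                     # dC/dw2
    g_b2 = d_out                                        # dC/db2
    # sigmoid'(z) = h * (1 - h): propagate the error one layer back
    d_hid = [d_out * wk * hk * (1.0 - hk) for wk, hk in zip(w2, h)]
    g_W1 = [[dk * xj for xj in x] for dk in d_hid]      # dC/dW1
    g_b1 = d_hid                                        # dC/db1
    return g_W1, g_b1, g_w2, g_b2

# Gradient check: compare one back-propagated entry with a finite difference.
rng = random.Random(1)
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
b1 = [rng.uniform(-1, 1) for _ in range(4)]
w2 = [rng.uniform(-1, 1) for _ in range(4)]
params = (W1, b1, w2, 0.1)
x, y = [0.5, -0.2, 0.8], 1.0

g_W1, g_b1, g_w2, g_b2 = backprop(params, x, y)
eps = 1e-6
W1[2][1] += eps                      # params holds a reference to W1
c_plus = loss(params, x, y)
W1[2][1] -= 2 * eps
c_minus = loss(params, x, y)
W1[2][1] += eps                      # restore the weight
numeric = (c_plus - c_minus) / (2 * eps)
print(numeric, g_W1[2][1])
```

The two printed numbers should agree to roughly six or more decimal places. This kind of check is a common way to validate a hand-written back-propagation.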

Recurrent neural networks
As stated before, the FFNN architecture contains no cycles or loops (it is an acyclic graph). Recurrent Neural Networks (RNNs) introduce a different approach: they adopt a notion of memory by using their internal state to process variable-length sequences of inputs [7,11]. This characteristic makes them suitable for tasks such as time series forecasting, handwriting recognition and speech recognition [11]. There are many RNN architectures, such as Gated Recurrent Units (GRUs), bidirectional RNNs and Hopfield networks.
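A very successful variant of RNNs is the long short-term memory (LSTM) network [12,13]. The building blocks of LSTMs are cells with an input gate, an output gate and a forget gate. The main advantage over classical RNNs is that this type of cell can remember values over arbitrary time intervals, while the flow of information is regulated by the three aforementioned gates [12]. In mathematical terms, an LSTM network can be formulated as follows. For an input vector $\mathbf{x}_t \in \mathbb{R}^N$ at time step $t$ and $M$ hidden units, the forget gate's activation vector $\mathbf{F}_t \in (0,1)^M$ is given by
$$\mathbf{F}_t = \sigma\left(\mathbf{W}_F \mathbf{x}_t + \mathbf{U}_F \mathbf{q}_{t-1} + \mathbf{b}_F\right)$$
where $\sigma$ is the element-wise sigmoid function, $\mathbf{W}_F$ and $\mathbf{U}_F$ are weight matrices, $\mathbf{q}_t \in (-1,1)^M$ is the hidden state vector and $\mathbf{b}_F$ is the bias vector.
In addition, the input/update gate's activation vector $\mathbf{I}_t \in (0,1)^M$ and the output gate's activation vector $\mathbf{O}_t \in (0,1)^M$ are expressed in a similar way,
$$\mathbf{I}_t = \sigma\left(\mathbf{W}_I \mathbf{x}_t + \mathbf{U}_I \mathbf{q}_{t-1} + \mathbf{b}_I\right)$$
$$\mathbf{O}_t = \sigma\left(\mathbf{W}_O \mathbf{x}_t + \mathbf{U}_O \mathbf{q}_{t-1} + \mathbf{b}_O\right)$$
where the subscripts I and O denote input and output, respectively, and the other symbols have the same meaning as before.
An LSTM unit also has a cell input activation vector $\mathbf{C}_t \in (-1,1)^M$, given by
$$\mathbf{C}_t = \tanh\left(\mathbf{W}_C \mathbf{x}_t + \mathbf{U}_C \mathbf{q}_{t-1} + \mathbf{b}_C\right)$$
Combining the above equations, the cell state vector $\mathbf{S}_t$ and the hidden state vector are updated according to
$$\mathbf{S}_t = \mathbf{F}_t \circ \mathbf{S}_{t-1} + \mathbf{I}_t \circ \mathbf{C}_t$$
$$\mathbf{q}_t = \mathbf{O}_t \circ \tanh\left(\mathbf{S}_t\right)$$
where $\circ$ denotes the Hadamard product, $\mathbf{S}_0 = \mathbf{0}$ and $\mathbf{q}_0 = \mathbf{0}$. Although the DL research field seems to be moving towards replacing LSTMs with Transformers for many tasks [14], LSTMs remain one of the most commonly used architectures.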
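A minimal pure-Python sketch of the LSTM forward pass is given below. It uses forget, input and output gates with sigmoid activations, a tanh cell input activation, and the updates $\mathbf{S}_t = \mathbf{F}_t \circ \mathbf{S}_{t-1} + \mathbf{I}_t \circ \mathbf{C}_t$ and $\mathbf{q}_t = \mathbf{O}_t \circ \tanh(\mathbf{S}_t)$. The weights are random placeholders, the helper names are hypothetical, and no training is performed.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def lstm_step(params, x_t, q_prev, s_prev):
    """One time step; params maps a gate name to its (W, U, b) triple."""
    def preact(gate):
        W, U, b = params[gate]
        return [u + v + c for u, v, c in zip(matvec(W, x_t), matvec(U, q_prev), b)]
    F = [sigmoid(z) for z in preact("F")]      # forget gate, in (0, 1)
    I = [sigmoid(z) for z in preact("I")]      # input/update gate, in (0, 1)
    O = [sigmoid(z) for z in preact("O")]      # output gate, in (0, 1)
    C = [math.tanh(z) for z in preact("C")]    # cell input activation, in (-1, 1)
    S = [f * s + i * c for f, s, i, c in zip(F, s_prev, I, C)]   # element-wise products
    q = [o * math.tanh(s) for o, s in zip(O, S)]                 # new hidden state
    return q, S

def run_lstm(params, xs, M):
    q, S = [0.0] * M, [0.0] * M                # q_0 = 0 and S_0 = 0
    for x_t in xs:
        q, S = lstm_step(params, x_t, q, S)
    return q, S

def random_params(N, M, seed=0):
    """Hypothetical small random initialisation with zero biases."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {gate: (mat(M, N), mat(M, M), [0.0] * M) for gate in ("F", "I", "O", "C")}

params = random_params(N=2, M=3)
q, S = run_lstm(params, [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]], M=3)
print(q)
```

Because $\mathbf{q}_t$ is an output gate in $(0,1)$ times a tanh, every entry of the hidden state stays in $(-1, 1)$.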

Convolutional neural networks
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has proved to be a driving force for many advancements in computer vision (CV) [15]. The dominant paradigm in this field is the application of convolutional neural networks (CNNs), which use the convolution operation in place of general matrix multiplication in at least one of their layers [7,16]. CNNs have found success not only in CV tasks but also in time series forecasting, video processing, natural language processing, etc. [7]. CNNs are composed of at least one convolutional layer and often of fully connected and pooling layers, as shown in Fig. 3. Pooling layers reduce the size of the incoming data. In contrast to a fully connected layer, a convolutional layer has a so-called receptive field: every neuron receives input only from a restricted area of the previous layer. Most CNNs use, at some point, the Rectified Linear Unit (ReLU) function, or one of its variants, as an activation function. ReLU is simply defined as [7]
$$f(x) = \max(0, x)$$
Despite its mathematical simplicity, ReLU has proved valuable, mainly because it mitigates the vanishing-gradient problem and thus eases the training of deep networks.
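These building blocks can be sketched in one dimension. The sketch uses a "valid" convolution (implemented, as in most DL libraries, as a cross-correlation), followed by ReLU and non-overlapping max pooling. Each convolution output depends only on `len(kernel)` neighbouring inputs, which is the receptive field in this toy setting. The kernel and input values are hypothetical.

```python
def conv1d(x, kernel, bias=0.0):
    """'Valid' 1-D convolution (cross-correlation, as in most DL libraries)."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]

def relu(v):
    return [max(0.0, z) for z in v]

def max_pool(v, size=2):
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, size)]

x = [0.0, 1.0, 3.0, 2.0, -1.0, -2.0, 0.5, 4.0]
kernel = [1.0, -1.0]                 # hypothetical difference ("edge") kernel
features = max_pool(relu(conv1d(x, kernel)))
print(features)                      # [0.0, 3.0, 1.0]
```

The 8 inputs produce 7 convolution outputs, and pooling reduces them to 3 features. This shows how pooling shrinks the data passed to later layers.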

Reinforcement learning
Reinforcement Learning (RL) is one of the three major ML paradigms [7]. In this framework, a learning agent acts within its environment, taking actions and learning through trial and error. By balancing the exploration of an "unknown space" against the exploitation of its "current knowledge", the agent maximizes a cumulative reward [17]. Recently, the combination of DL architectures with RL (DRL) in a unified setup has provided solutions to many difficult problems [18].
RL has many similarities with the fields of dynamic programming and optimal control [19]. In this context, the environment is often modeled as a Markov decision process (MDP), although RL methods do not always assume knowledge of an exact model of the environment. Formally, an MDP is defined as a 4-tuple (S, A, P, R), where S is the state space, A is the action space, R is the immediate reward and P gives the probability that action $a_1$ in state $s_1$ at time $t$ will lead to state $s_2$ at time $t+1$:
$$P_{a_1}(s_1, s_2) = \Pr\left(s_{t+1} = s_2 \mid s_t = s_1, a_t = a_1\right)$$
Deep Q-Learning (DQL) is a branch of DRL that is used for many tasks in different areas, including wireless communications. It builds on Q-learning, where a Q-value $Q(s, a)$ estimates how good it is to take action $a$ in state $s$ at time $t$, i.e., the expected cumulative reward obtained by taking that action and acting optimally afterwards. In classical Q-learning, these values are stored in a matrix (table) indexed by states and actions, which the agent consults in order to maximize its cumulative reward [17].
When the state space is large, storing and updating every entry of this matrix becomes impractical. DQL therefore approximates the matrix values with a deep neural network, the deep Q-network (DQN).
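The tabular case can be sketched with the standard Q-learning update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$ on a hypothetical four-state chain. The actions are left and right, and the agent receives reward 1 on reaching the last state. The environment, the learning rate $\alpha$, the discount factor $\gamma$ and the purely random behaviour policy are illustrative assumptions. Q-learning is off-policy, so the greedy policy still converges to the optimal one. In DQL, the list `Q` would be replaced by a neural network.

```python
import random

def q_learning(n_states=4, episodes=300, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a toy chain: action 0 = left, action 1 = right.
    Reaching the last state gives reward 1 and ends the episode."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]   # the Q "matrix": states x actions
    for _ in range(episodes):
        s = 0
        while s != goal:
            a = rng.randrange(2)                # purely exploratory behaviour policy
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # no bootstrapping from the terminal state
            target = r if s_next == goal else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])   # Q-learning update
            s = s_next
    return Q

Q = q_learning()
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(len(Q) - 1)]
print(policy)                                   # greedy action per non-terminal state
```

With these settings, the learned values approach $Q(2,\text{right}) = 1$, $Q(1,\text{right}) = 0.9$ and $Q(0,\text{right}) = 0.81$, so the greedy policy moves right everywhere.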
Another frequently used DRL method that builds on DQL ideas is the deep deterministic policy gradient (DDPG) algorithm. DDPG is an off-policy method consisting of two entities: the actor and the critic. The actor is modeled as a policy network whose input is the state and whose output is the exact continuous action (instead of a probability distribution over actions). The critic is a Q-value network that takes the state and the action as input and outputs the Q-value [20].
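Two ingredients commonly used in DDPG implementations are sketched below. The first is the critic's temporal-difference target, computed with slowly updated target copies of the actor and critic. The second is the soft update of those target networks. Only these two steps are shown. The toy actor and critic functions and the values of $\gamma$ and $\tau$ are hypothetical, and no gradient updates are included.

```python
def td_target(r, s_next, done, target_actor, target_critic, gamma=0.99):
    """Critic regression target y = r + gamma * Q'(s', mu'(s')); no bootstrap at episode end."""
    if done:
        return r
    a_next = target_actor(s_next)          # deterministic, continuous action
    return r + gamma * target_critic(s_next, a_next)

def soft_update(target_params, online_params, tau=0.005):
    """Targets slowly track the online networks: theta' <- tau*theta + (1 - tau)*theta'."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, online_params)]

# Hypothetical toy stand-ins for the target actor and target critic networks.
def toy_actor(s):
    return 0.5 * s

def toy_critic(s, a):
    return -(a - s) ** 2

y = td_target(r=1.0, s_next=2.0, done=False,
              target_actor=toy_actor, target_critic=toy_critic, gamma=0.9)
print(y)                                   # 1 + 0.9 * (-(1 - 2)**2), i.e. 0.1 up to rounding
print(soft_update([0.0, 1.0], [1.0, 3.0], tau=0.1))
```

In a full implementation, the critic would be trained to regress $Q(s,a)$ onto `y`, and the actor would be updated to increase $Q(s, \mu(s))$.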

CF M-MIMO
In this section, the basic characteristics of CF M-MIMO networks and user-centric CF M-MIMO networks are discussed. For a detailed analysis of the fundamentals of cellular M-MIMO networks, the interested reader may refer to [1], and for CF M-MIMO to [2], [3] and [4].

Conventional CF M-MIMO
Current wireless communications systems are based on the cellular architecture, where the coverage area is divided into cells. Cell-free technology has been proposed as a paradigm shift in communications engineering, aiming to meet the demands of beyond-5G telecommunications [21]. Fig. 4 shows the number of papers referring to CF M-MIMO technology. Following [2], CF M-MIMO can be described as an ultra-dense wireless network in which joint transmission and reception are achieved through cooperating APs that serve the user equipment (UE). A benefit of CF M-MIMO is that the system as a whole makes use of physical-layer concepts from the cellular M-MIMO area. More specifically, a CF M-MIMO network consists of numerous distributed APs connected to a central processing unit (CPU). In this way, there is no need for cells, and the users are served simultaneously by all APs, as illustrated in Fig. 5. The motivation behind this idea is to provide as uniform a quality of service as possible across the coverage area.
CF M-MIMO offers many advantages over classical cellular telecommunications systems. In particular, CF M-MIMO technology offers (i) smaller SNR variations, (ii) better interference management and (iii) increased SNR values [2][3][4][5][6]. Smaller SNR variations mean that the SNR is nearly uniform across the coverage area. Joint transmission from multiple APs helps suppress inter-cell interference. In the cellular paradigm, only the AP with the best channel is used. In CF M-MIMO, APs with weaker channels also take part in the transmission, which increases the signal-to-interference-and-noise ratio (SINR). In addition, having a much larger number of antennas than users creates many spatial degrees of freedom for separating the UEs in space. As a result, the transmitted and received signals can be processed using linear methods [2].

Conventional TDD-based CF M-MIMO
Most CF M-MIMO scenarios consider time-division duplexing (TDD) operation. Here, the basic system model is introduced. For a discussion of frequency-division duplexing (FDD) operation, one can refer to [3].
Consider a CF M-MIMO network consisting of L distributed APs, each equipped with N antennas, serving K single-antenna users. The scenario with multi-antenna users can be modeled in the same way, by simply adding more single-antenna users.
Each AP acquires channel state information (CSI) between itself and all users via uplink channel estimation. The channel between the l-th AP and the k-th user is denoted by $\mathbf{h}_{kl} \in \mathbb{C}^N$. The channel is considered approximately constant during the coherence time $\tau_c$ and follows a correlated Rayleigh fading distribution,
$$\mathbf{h}_{kl} \sim \mathcal{N}_{\mathbb{C}}\left(\mathbf{0}, \mathbf{Cor}_{kl}\right)$$
In the above equation, $\mathbf{Cor}_{kl}$ is the spatial correlation matrix, which captures both small-scale and large-scale fading. For channel estimation, $\tau_p$ mutually orthogonal pilots $\mathbf{q}_1, \ldots, \mathbf{q}_{\tau_p}$ are used, such that $\|\mathbf{q}_j\|_2^2 = \tau_p$, $j = 1, \ldots, \tau_p$, where $\tau_p$ is the pilot length. Let $t_k \in \{1, \ldots, \tau_p\}$ denote the index of the pilot assigned to the k-th user. The uplink received pilot signal at the l-th AP is then [2]
$$\mathbf{Y}^{p}_{l} = \sum_{k=1}^{K} \sqrt{p_k}\, \mathbf{h}_{kl}\, \mathbf{q}_{t_k}^{T} + \mathbf{n}^{p}_{l}$$
where $p_k \ge 0$ is the transmit power of the k-th user and $\mathbf{n}^{p}_{l} \in \mathbb{C}^{N \times \tau_p}$ is the additive noise, whose elements are i.i.d. zero-mean complex Gaussian.
However, despite their appealing characteristics, conventional CF M-MIMO networks face serious challenges regarding their practical implementation. The main one is that this configuration does not scale as the number of users increases [2]. As a result, in this form they cannot be deployed in future 6G wireless networks.
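The following sketch shows how the received pilot signal can be used for channel estimation. It assumes a single antenna per AP ($N = 1$), uncorrelated fading so that $\mathbf{Cor}_{kl}$ reduces to a scalar $\beta_k$, and DFT pilots, and it writes $t_k$ for the index of the pilot assigned to user $k$. Correlating the received row with user $k$'s pilot gives $y = \sum_{i: t_i = t_k} \sqrt{\tau_p p_i}\, h_i + n$ with $n \sim \mathcal{N}_{\mathbb{C}}(0,\sigma^2)$. The scalar MMSE estimate is then $\hat h_k = \sqrt{\tau_p p_k}\,\beta_k\, y / \Psi_k$, with $\Psi_k = \sum_{i: t_i = t_k} \tau_p p_i \beta_i + \sigma^2$. The code compares the empirical estimation error with $\beta_k - \tau_p p_k \beta_k^2/\Psi_k$. The pilot assignment, powers, gains and function names are hypothetical.

```python
import cmath
import math
import random

def cn(rng, var):
    """Circularly symmetric complex Gaussian sample with variance var."""
    sd = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd))

def dft_pilots(tau_p):
    """tau_p mutually orthogonal pilots with ||q_m||^2 = tau_p."""
    return [[cmath.exp(2j * math.pi * m * n / tau_p) for n in range(tau_p)]
            for m in range(tau_p)]

def mmse_estimates(Y, pilots, t, p, beta, sigma2):
    """Per-user scalar MMSE estimates at one single-antenna AP (N = 1)."""
    tau_p = len(pilots)
    estimates = []
    for k in range(len(p)):
        q = pilots[t[k]]
        # Correlate with user k's pilot: y = sum over co-pilot users of sqrt(tau_p p_i) h_i + noise
        y = sum(Yn * qn.conjugate() for Yn, qn in zip(Y, q)) / math.sqrt(tau_p)
        psi = sigma2 + sum(tau_p * p[i] * beta[i]
                           for i in range(len(p)) if t[i] == t[k])
        estimates.append(math.sqrt(tau_p * p[k]) * beta[k] / psi * y)
    return estimates

def empirical_mse(trials=3000, tau_p=4, seed=0):
    rng = random.Random(seed)
    pilots = dft_pilots(tau_p)
    t = [0, 1, 2]                  # hypothetical: every user has its own pilot
    p = [1.0, 1.0, 1.0]
    beta = [1.0, 0.5, 2.0]
    sigma2 = 1.0
    mse = [0.0] * len(p)
    for _ in range(trials):
        h = [cn(rng, b) for b in beta]
        # Received pilot row: sum_k sqrt(p_k) h_k q_{t_k}^T + noise
        Y = [sum(math.sqrt(p[k]) * h[k] * pilots[t[k]][n] for k in range(len(p)))
             + cn(rng, sigma2) for n in range(tau_p)]
        h_hat = mmse_estimates(Y, pilots, t, p, beta, sigma2)
        for k in range(len(p)):
            mse[k] += abs(h[k] - h_hat[k]) ** 2 / trials
    return mse

mse = empirical_mse()
theory = [b - 4 * b ** 2 / (4 * b + 1.0) for b in (1.0, 0.5, 2.0)]   # tau_p = 4, p = 1, sigma2 = 1
print([round(v, 3) for v in mse], [round(v, 3) for v in theory])
```

Assigning the same pilot index to two users in `t` introduces pilot contamination. $\Psi_k$ then grows, and the estimation error increases accordingly.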

User-centric CF M-MIMO
As previously stated, the main challenge that CF M-MIMO systems face is that they are not scalable, which makes them impractical for 6G applications. To overcome this problem, user-centric CF M-MIMO has been proposed. In this setup, each UE is served only by a subset of the APs. As a result, the fronthaul signaling is reduced, while the performance loss is negligible. Stated differently, user-centric CF M-MIMO makes use of dynamic cooperation clustering, where a subset of APs serves user k. In [2], it is shown that this configuration is scalable.
Cellular M-MIMO communication systems rely on two phenomena, (i) channel hardening and (ii) favorable propagation, which explain the resulting performance gain. Channel hardening refers to the situation where a fading channel behaves almost like a non-fading channel, because with many antennas the channel gain fluctuates very little around its mean. Favorable propagation is defined for vector-valued channels and means that the channel vectors of different users are nearly orthogonal [22,23]. These two properties carry over to the user-centric CF M-MIMO framework, since the above characteristics remain; however, a proper mathematical analysis is needed.
The mathematical formulation of the user-centric setup extends the results obtained in the previous section, but also uses the concept of dynamic cooperation clustering (DCC) [24]. DCC refers to the idea that every user has its own selected subset of antennas that serve it.
Let us consider the scenario discussed previously and a set of diagonal matrices $\mathbf{D}_{kl} \in \mathbb{C}^{N \times N}$, which establish the connection between users and antennas. The diagonal entries of $\mathbf{D}_{kl}$ indicate which antennas of the l-th AP may serve the k-th user. If all of them are allowed, $\mathbf{D}_{kl} = \mathbf{I}_N$; if none is allowed, $\mathbf{D}_{kl} = \mathbf{0}$. The DCC framework allows only a subset of the APs to participate [3], so the received downlink signal at the k-th user is given by
$$y_k = \sum_{l=1}^{L} \mathbf{h}_{kl}^{H} \sum_{i=1}^{K} \mathbf{D}_{il}\, \mathbf{w}_{il}\, s_i + n_k$$
where $\mathbf{w}_{il}$ is the precoding vector that the l-th AP uses for the i-th user, $s_i$ is the signal transmitted to the i-th user and $n_k \sim \mathcal{N}_{\mathbb{C}}(0, \sigma^2)$ is the additive Gaussian noise. It is evident that if $\mathbf{D}_{kl} = \mathbf{0}$, the k-th user is not served by the l-th AP.
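To make the two properties concrete, here is a small Monte Carlo sketch. It assumes i.i.d. Rayleigh fading with unit variance, which simplifies the correlated model used earlier. Under this assumption, the normalized gain $\|\mathbf{h}\|^2/N$ should have variance close to $1/N$ (channel hardening), and $|\mathbf{h}_1^H \mathbf{h}_2|^2/N^2$ should have mean close to $1/N$ (favorable propagation). The function names are hypothetical.

```python
import math
import random

def cn(rng):
    """CN(0, 1) sample."""
    sd = math.sqrt(0.5)
    return complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd))

def hardening_stats(N, trials=2000, seed=0):
    """Return (Var[||h||^2 / N], E[|h1^H h2|^2 / N^2]) for i.i.d. CN(0, 1) channels."""
    rng = random.Random(seed)
    gains, cross = [], []
    for _ in range(trials):
        h1 = [cn(rng) for _ in range(N)]
        h2 = [cn(rng) for _ in range(N)]
        gains.append(sum(abs(x) ** 2 for x in h1) / N)          # normalized channel gain
        inner = sum(a.conjugate() * b for a, b in zip(h1, h2))  # h1^H h2
        cross.append(abs(inner) ** 2 / N ** 2)
    mean = sum(gains) / trials
    var = sum((g - mean) ** 2 for g in gains) / trials
    return var, sum(cross) / trials

for N in (1, 10, 100):
    print(N, hardening_stats(N))   # both values should be close to 1/N
```

As $N$ grows, both statistics shrink like $1/N$. The effective channel becomes nearly deterministic, and different users' channels become nearly orthogonal.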
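The downlink model can be sketched numerically with $N = 1$ antenna per AP, so that each $\mathbf{D}_{kl}$ reduces to a scalar 1 or 0. The clustering rule used here, where each user is served by its `c` strongest APs according to hypothetical large-scale gains, is only an illustrative assumption. It is not the DCC rule of [2] or [24]. All channel and precoder values are made up.

```python
def dcc_masks(beta, c):
    """Hypothetical clustering rule: user k is served by its c strongest APs.
    D[k][l] = 1 plays the role of D_kl = I_N and 0 of D_kl = 0 (here N = 1)."""
    K, L = len(beta), len(beta[0])
    D = [[0] * L for _ in range(K)]
    for k in range(K):
        strongest = sorted(range(L), key=lambda ap: beta[k][ap], reverse=True)[:c]
        for ap in strongest:
            D[k][ap] = 1
    return D

def downlink_signal(k, h, D, w, s, noise=0.0):
    """y_k = sum_l conj(h_kl) * sum_i D_il * w_il * s_i + n_k, one antenna per AP."""
    L, K = len(h[k]), len(s)
    return sum(h[k][l].conjugate() * sum(D[i][l] * w[i][l] * s[i] for i in range(K))
               for l in range(L)) + noise

beta = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.3]]          # hypothetical large-scale gains
D = dcc_masks(beta, c=2)                           # [[1, 0, 1], [0, 1, 1]]
h = [[1 + 0j, 0.5 + 0j, 1j], [0.2 + 0j, 1 + 0j, 0.5 + 0j]]
w = h                                              # unnormalised MR precoding, w_il = h_il
y0 = downlink_signal(0, h, D, w, s=[1, -1])        # y_0 = 1.5 + 0.5j
print(D, y0)
```

AP 1 does not serve user 0, yet it still reaches user 0 as interference through $\mathbf{D}_{1l}$, because it serves user 1. Setting every mask to zero leaves only the noise term.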

DL for CF M-MIMO systems
DL has recently been applied to many tasks in the field of wireless communications [8].
In addition, DL is considered a main component of beyond-5G communications systems. In this section, the application of DL models to CF M-MIMO systems is reviewed. Table 1 summarizes the works presented in this section.

Resource allocation and power efficiency
In wireless networks, power allocation is a very important factor for the overall efficiency of the system [45,46]. When power allocation is performed properly, communication can take place in a near-optimal way. The allocation of power to the individual users should take two things into consideration: (i) the maximization of the minimum capacity guaranteed to each of them and (ii) the channel's dynamics [1]. DL techniques can be used to approximate the solution of the power allocation and control problem. Maximizing the sum rate indirectly maximizes the spectral efficiency of the system, and a solution to this problem is discussed in [25]. The sum rate is the sum of the achievable rates of multiple concurrent transmissions, and maximizing it is a non-convex problem. In [25], the power allocation problem is converted into a standard geometric program (GP), and the channel statistics are exploited to design the respective power elements. A CNN is then used to learn the mapping from the large-scale fading (LSF) coefficients to the optimal power obtained by solving the sum-rate maximization problem.
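The non-convexity of sum-rate maximization can be seen on a toy example. The sketch assumes a generic two-user single-antenna interference model, $\mathrm{SINR}_k = p_k g_{kk} / (\sum_{j \ne k} p_j g_{kj} + \sigma^2)$. This is not the CF M-MIMO SINR expression of [25], and the sketch uses a brute-force grid search rather than a GP or a CNN. All gains are hypothetical.

```python
import math

def sum_rate(p, g, sigma2):
    """Sum of log2(1 + SINR_k), where g[k][j] is the gain from transmitter j to receiver k."""
    total = 0.0
    for k in range(len(p)):
        interference = sum(p[j] * g[k][j] for j in range(len(p)) if j != k)
        total += math.log2(1.0 + p[k] * g[k][k] / (interference + sigma2))
    return total

g = [[1.0, 0.9], [0.9, 1.0]]          # hypothetical strong cross-interference
sigma2 = 0.1
grid = [i / 10 for i in range(11)]    # transmit powers in [0, 1]
best_p = max(((p1, p2) for p1 in grid for p2 in grid),
             key=lambda pw: sum_rate(pw, g, sigma2))
print(best_p, sum_rate(best_p, g, sigma2), sum_rate((0.5, 0.5), g, sigma2))
```

The best grid point switches one user off completely and achieves $\log_2 11 \approx 3.46$. Both corner points achieve this value, yet their midpoint $(0.5, 0.5)$ gives only about $1.87$. The objective is therefore not concave in the powers, which is why such problems call for reformulations or learned approximations.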
The uplink power control problem is studied in [26]. In a supervised learning framework, an FFNN is trained on input-output pairs whose outputs are optimal solutions of the power allocation problem, so that it learns to reproduce the optimal power allocation strategy. In a similar manner, the same problem is tackled in [27], where the authors train an LSTM considering different scenarios.
A different approach is followed in [28], where an unsupervised learning setting is established: an FFNN is designed to learn the user power allocations that maximize the minimum user rate. In this way, the optimal power allocations do not need to be known in advance. An alternative research direction on the problem of downlink power allocation is provided in [29]. The authors first proposed a generalization of maximum ratio precoding and then trained an NN for every AP, with the goal of mimicking system-wide max-min fairness power allocation. A major advantage of that approach over other candidate solutions is that it uses only local information while outperforming state-of-the-art power allocation algorithms for CF M-MIMO scenarios. Maximum ratio transmission (MRT) as a concept for multiple-antenna systems was introduced in [47]. Combined with a CF M-MIMO network, MRT results in a smaller fronthaul overhead.
The work in [30] considers the task of finding practical near-optimal power control using DL methods. The procedure is based on a CNN whose input is the matrix of large-scale fading coefficients and whose outputs are the total transmit powers of the APs. This information is then used to compute the downlink power control for each user with a low-complexity convex program.
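The following is a minimal illustration of MRT, assuming a received signal of the form $\mathbf{h}^H \mathbf{w} s$. The precoder $\mathbf{w} = \mathbf{h}/\|\mathbf{h}\|$ attains the largest possible gain $|\mathbf{h}^H\mathbf{w}| = \|\mathbf{h}\|$ among unit-norm precoders, by the Cauchy-Schwarz inequality. The channel values are hypothetical, and the generalized MR scheme of [29] is not reproduced.

```python
import math

def mrt_precoder(h):
    """Maximum ratio transmission: w = h / ||h|| (unit transmit power)."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in h))
    return [x / norm for x in h]

def effective_gain(h, w):
    """|h^H w|: amplitude gain of the resulting scalar channel."""
    return abs(sum(a.conjugate() * b for a, b in zip(h, w)))

h = [1 + 1j, 0.5 - 0.2j, -0.3 + 0.8j]      # hypothetical channel of a 3-antenna AP
w = mrt_precoder(h)
norm_h = math.sqrt(sum(abs(x) ** 2 for x in h))
print(effective_gain(h, w), norm_h)        # equal up to rounding
print(effective_gain(h, [1, 0, 0]))        # any other unit-norm precoder does no better
```

MRT only needs the AP's own channel estimate. This locality is one reason it pairs well with distributed CF M-MIMO processing.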
RL has also been employed in resource allocation problems. More specifically, DQL is used in [31], where the downlink transmission powers in a CF M-MIMO configuration are allocated by a DQN in order to optimize the sum spectral efficiency. Spectral efficiency refers to the maximum number of bits of data that can be transmitted to a specified number of users per second per Hz while maintaining an acceptable quality of service. Exploiting the RL framework of trial-and-error interactions with the environment over time, the DQN is trained to take the long-term fading information as input and output the downlink transmission power values.
A similar approach is used in [32], where DQN and DDPG methods are employed for dynamic power allocation, with the goal of maximizing the sum spectral efficiency. The numerical results showed performance competitive with the state-of-the-art weighted minimum mean square error (WMMSE) algorithm.
Other important factors in the operation of wireless networks are power efficiency and long-term energy efficiency. The long-term energy efficiency of uplink beamforming is addressed in [33]. Using the information obtained from the MMSE algorithm, an estimate of the SINR is given. The long-term energy efficiency is then defined as a function of the beamforming matrix. As a final step, the authors use a DRL algorithm based on DDPG to model the dynamic beamforming design.
In the current literature, model-based approaches to power control have prevailed. However, in [34] the authors propose a model-free solution for downlink power control, also using the DDPG algorithm with FFNNs.
The Industrial Internet of Things (IIoT) is being developed in parallel with other IoT networks. The architecture and optimization of such networks are considered difficult problems, hence DL techniques are often applied. Cell-free architectures have been proposed for IIoT networks. In [35], a systematic study of DRL in CF M-MIMO IIoT is presented. In particular, considering a cross-layer optimization scenario (power allocation in the physical layer and random access in the medium access control layer), a dual DDPG algorithm is designed for resource management tasks.

Channel estimation
Channel estimation is a fundamental concept in wireless communications and refers to the process of characterizing the dynamics of the wireless channel [23]. The authors of [36] formulate the concept of channel mapping in space and frequency. In a scenario with two sets of antennas operating in different frequency bands, the channels of the first set in its frequency band are mapped to the channels of the second set in the other band. Leveraging the results of their analysis, an FFNN was used for channel mapping in a CF M-MIMO model. This DL method managed to reduce both the downlink training/feedback overhead and the fronthaul signaling overhead.
In [37], the authors employ a fast and flexible denoising convolutional neural network (FFDNet) for channel estimation in a CF M-MIMO framework. The results showed that the time needed to train the FFDNet is much less than that needed by state-of-the-art channel estimators, such as CNNs, while achieving similar performance. To remove the need for relative reciprocity calibration based on the cooperation of antennas, a cascade of two FFNNs is proposed in [38].
Unsupervised learning is employed for decentralized beamforming in [39]. The authors propose two deep neural network models, one fully distributed and the other partially distributed. Training is performed in an unsupervised framework, and each model can perform decentralized coordinated beamforming with zero or limited communication overhead between the APs and the network controller, for both fully digital and hybrid precoding. The proposed methods achieve a near-optimal sum rate while significantly reducing complexity.
For enhanced normalized conjugate beamforming, the authors of [40] derived an exact closed-form expression for an achievable downlink spectral efficiency, assuming independent Rayleigh fading channels. DL could prove useful in other cases, where such modeling would not be sufficient.

Federated learning
Federated Learning (FL) is a sub-field of ML in which a model is trained across multiple decentralized devices that each hold only local data and do not exchange it, which improves privacy and security. This is an alternative to traditional centralized ML techniques, where all local datasets are uploaded to one server [48].
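A common way to realize the FL idea is federated averaging (FedAvg). In FedAvg, devices run a few local SGD steps, and a server averages the resulting model weights, weighted by local dataset size. The sketch below uses a one-parameter linear model and hypothetical client datasets. It is not the specific FL framework considered in [41] or [42].

```python
def local_sgd(w, data, eta=0.1, epochs=5):
    """Client update: a few SGD epochs on y ~ w * x using only local data."""
    for _ in range(epochs):
        for x, y in data:
            w -= eta * (w * x - y) * x
    return w

def fedavg(clients, rounds=20, w0=0.0):
    """FedAvg-style training: clients share model weights, never their raw data."""
    w = w0
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        local_models = [local_sgd(w, d) for d in clients]   # would run on the devices
        w = sum(len(d) * wk for d, wk in zip(clients, local_models)) / total
    return w

clients = [                               # hypothetical local datasets, all from y = 3x
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5)],
    [(1.5, 4.5), (-1.0, -3.0), (0.2, 0.6)],
]
w_global = fedavg(clients)
print(w_global)
```

Only the scalar model `w` travels between the server and the clients in each round. In a wireless setting, this exchange is the uplink/downlink traffic that CF M-MIMO schemes such as those in [41] and [42] aim to support.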
Wireless FL faces many challenges. In [41], a novel scheme for CF M-MIMO networks is proposed, which aims to guarantee stable operation of any FL framework by letting each iteration of the FL process take place within a large-scale coherence time. The authors consider an existing FL framework as an example and target the minimization of the FL training time for this framework.
The authors of [42] address the following question: how can a wireless network support multiple FL groups? They propose a CF M-MIMO network to guarantee stable operation of multiple FL processes. A novel scheme is then developed, which asynchronously executes the iterations of the FL processes under multicasting downlink and conventional uplink transmission protocols. As a final step, an optimal low-complexity resource allocation algorithm is used.

Other applications
Apart from the aforementioned DL applications in CF M-MIMO, some works explore other use cases. For a large-scale CF M-MIMO network, an FFNN is used in [43] for pilot assignment, with the goal of maximizing the sum spectral efficiency. Examining the advantages of joint cooperation clustering and content caching in CF M-MIMO, a DRL approach is discussed in [44], demonstrating good energy efficiency without requiring prior information.

Future directions and conclusions
There is a consensus among researchers in communications engineering that CF M-MIMO will play a very important role in the deployment of beyond-5G networks. As a result, a growing number of papers on these topics are published every year. Although DL methods offer a data-driven approach that can improve overall performance, standalone DL techniques are unlikely to prove sufficient. Instead, combining advanced signal processing techniques, compressed sensing and DL methods in a way that leverages the individual strengths of each is more likely to offer solutions in the near future.
The main obstacle to the practical implementation of DL models in wireless communications is their computational cost. Reducing the required memory and exploiting results from "classical" analysis will lead to better resource usage. In addition, power allocation and energy efficiency will be central to near-future research, and DL applications focusing specifically on user-centric CF M-MIMO are also expected.
Another factor that will enable future research in this field is the publication of the code and other computational tools used in published works. Publicly available datasets and simulation code have helped other fields grow rapidly, and such practices are likely to boost research activity in wireless communications in general.
In this article, an extensive review of the work on DL methods for CF M-MIMO systems was provided. More specifically, DL architectures applied to CF M-MIMO technology were reviewed, and the basic information about CF M-MIMO systems was discussed, including the user-centric variant. The system-model equations were presented for both conventional systems and the user-centric approach. The survey focused on resource allocation, channel estimation and federated learning problems, the three areas where DL methods appear to achieve the most promising results. Finally, future research directions were highlighted, offering insights for further study.

5G: Fifth generation
6G: Sixth generation
MIMO: Multiple-input-multiple-output
CF M-MIMO: Cell-free massive MIMO
DL: Deep learning