This section addresses NN applications in the channel estimation process. The NN schemes are grouped into subsections according to their common features or training methods. Surveys on this subject show an effort to design DNNs and to optimize the size of the training dataset. At the same time, recent approaches are emerging that circumvent the training issue by building different networks and applying ML dimensionality reduction strategies.
Neural network concepts
Before surveying the subject of this paper, it is essential to define some NN concepts that support the discussion in the following sections. First, in an artificial NN, the basic computation units are called neurons, and they are arranged in layers to form the network [227]. The neurons are connected through weights, which scale each neuron input and thereby alter the function computed at the neuron. Hence, these functions use the weights as parameters to propagate the inputs to the outputs [227, 228]. Second, NN learning results from changing the weights at each iteration in response to external stimuli, referred to as training sequences or datasets. Here, the learning process is classified as supervised, unsupervised, or reinforcement learning, as defined in Sect. 3 [227,228,229]. During training, the prediction errors at the output are fed back to adjust the NN weights according to the learning process, so that the prediction improves in the next iteration [227, 230].
The weighted sum of the inputs at each neuron is applied to an activation (or transfer) function, which introduces nonlinearity into the prediction process [229]. This function is essential for NNs to learn complex tasks: without it, a network of any depth reduces to a composition of linear operations, that is, a linear regression model. The activation function may be linear or nonlinear. Common nonlinear activation functions include the sigmoid (logistic), hyperbolic tangent, rectified linear unit (ReLU), Gaussian error linear unit (GELU), softmax, and so forth. Each has advantages and limitations, which are beyond the scope of this work and are discussed in [227,228,229].
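As a minimal sketch, assuming NumPy and the standard textbook definitions (the exact GELU through the error function rather than its tanh approximation), the activation functions listed above can be written as follows. The final lines numerically check the linear-collapse argument for two hypothetical weight matrices `W1` and `W2`.

```python
import numpy as np
from math import erf, sqrt

def sigmoid(x):
    """Logistic function: maps any real input to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Hyperbolic tangent: maps any real input to (-1, 1)."""
    return np.tanh(x)

def relu(x):
    """Rectified linear unit: keeps positive inputs, zeroes negative ones."""
    return np.maximum(0.0, x)

def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    x = np.asarray(x, dtype=float)
    return x * 0.5 * (1.0 + np.vectorize(erf)(x / sqrt(2.0)))

def softmax(z):
    """Softmax over the last axis; subtracting the max avoids overflow."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

# Without a nonlinearity, two stacked layers collapse into one linear map W2 @ W1.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)
collapses = np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)
```

Inserting any of the nonlinear functions between `W1` and `W2` breaks this identity in general, which is what allows deeper networks to represent more than a linear regression.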
The architecture of a NN is determined by how its layers are designed. Based on this principle, NN architectures are primarily classified as single-layer or multilayer. The single-layer NN comprises a set of inputs (\(x_1, x_2, \dots , x_n\)) weighted by (\(w_1, w_2, \dots , w_n\)) and mapped directly to the output through an activation function, as shown in Fig. 9. This structure is commonly referred to as the perceptron [227]. In addition, the perceptron may have an input-independent term, the bias, which sets the activation threshold. The multilayer NN architecture arranges neurons in an input layer and an output layer connected by one or more intermediate layers, called hidden layers. For instance, Fig. 10 shows a multilayer structure. Again, the neurons in the hidden layers may also have a bias weight.
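The single-layer and multilayer structures can be sketched in a few lines, assuming NumPy, a Heaviside step for the classic perceptron, and tanh hidden layers with a linear output for the multilayer case (all illustrative choices). The AND example shows how the bias acts as the firing threshold.

```python
import numpy as np

def step(v):
    """Heaviside step: the classic perceptron activation."""
    return np.where(v >= 0.0, 1.0, 0.0)

def perceptron(x, w, b):
    """Single-layer NN: weighted input sum plus bias, then the activation."""
    return step(np.dot(w, x) + b)

def mlp_forward(x, layers, act=np.tanh):
    """Multilayer NN: each (W, b) pair is one layer; hidden layers apply `act`."""
    h = np.asarray(x, dtype=float)
    for i, (W, b) in enumerate(layers):
        h = W @ h + b
        if i < len(layers) - 1:   # output layer left linear in this sketch
            h = act(h)
    return h

# Perceptron realising logical AND: the bias -1.5 sets the firing threshold.
w_and, b_and = np.array([1.0, 1.0]), -1.5
truth_table = {(a, c): float(perceptron(np.array([a, c]), w_and, b_and))
               for a in (0, 1) for c in (0, 1)}

# 3 inputs -> 4 hidden neurons -> 2 outputs, with hypothetical random weights
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),
          (rng.normal(size=(2, 4)), np.zeros(2))]
y = mlp_forward([0.5, -1.0, 2.0], layers)
```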
Finally, another essential aspect is the algorithm used for NN training. Training algorithms define how the weights between the network layers are updated to improve learning at each iteration [227, 229]. Two approaches are possible: incremental or batch training. The former updates the weights immediately after each training input is presented, while batch training updates them only after all the inputs have been presented to the NN [230]. Since the error is computed from the network output prediction and the expected output, training algorithms rely on the error back-propagation mechanism. In other words, the algorithm follows a set of steps that update the weights starting from the output layer and moving toward the input layer.
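The difference between the two update schedules can be sketched for the simplest case, a single linear neuron with squared-error loss, for which back-propagation reduces to the delta rule. The synthetic data, learning rates, and epoch count below are arbitrary assumptions chosen only to make both schedules converge.

```python
import numpy as np

def incremental_epoch(w, X, y, lr):
    """Incremental (online) training: update after every training sample."""
    for xi, yi in zip(X, y):
        err = yi - w @ xi          # prediction error for this sample
        w = w + lr * err * xi      # steepest-descent step on 0.5 * err**2
    return w

def batch_epoch(w, X, y, lr):
    """Batch training: average the gradient over all samples, then update once."""
    err = y - X @ w
    return w + lr * (X.T @ err) / len(y)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([0.5, -1.0, 2.0])
y = X @ w_true                     # noiseless targets for clarity

w_inc = np.zeros(3)
w_bat = np.zeros(3)
for _ in range(50):
    w_inc = incremental_epoch(w_inc, X, y, lr=0.05)
    w_bat = batch_epoch(w_bat, X, y, lr=0.5)
```

Per epoch, the incremental schedule performs 200 small updates, whereas the batch schedule performs a single averaged update.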
Notably, NN families are classified according to their structural aspects, such as the number of hidden layers and the neuron connections. Thus, this survey groups the proposed approaches according to this NN classification and introduces each type accordingly.
Back-propagation neural network
When back-propagation algorithms are used in NN training, the resulting networks are called back-propagation neural networks (BPNNs) [231]. The basic concept of the algorithm is to propagate the network error backward from the output to the input layer and to adjust the weights to reduce this error through the steepest descent approach. Standard BPNNs operate on real-valued data. However, since the CIR is a complex-valued signal, channel estimation is also a complex-valued process.
Complex-valued BPNNs with three layers (input, hidden, and output) have been designed for channel estimation, as shown in Fig. 11 [231, 232]. The complex signal is decomposed into its real and imaginary parts, which are fed forward through the network. Finally, the outputs are recombined to form the estimated complex channel sample.
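The split-complex idea can be sketched as a three-layer network trained by steepest-descent back-propagation, assuming a hypothetical flat-fading channel \(h\) per block, a fixed QPSK pilot sequence, and arbitrary sizes and learning rate. This is an illustration of the principle, not the exact configuration of [231, 232]: the received pilots are split into real and imaginary parts at the input, and the two real outputs are recombined into a complex estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
P, H, N = 4, 8, 2000                       # pilots, hidden neurons, training blocks (assumed)
pilots = (rng.choice([-1, 1], P) + 1j * rng.choice([-1, 1], P)) / np.sqrt(2)

# Hypothetical training set: flat-fading gain h per block, y = h * pilots + noise
h = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)
noise = 0.1 * (rng.normal(size=(N, P)) + 1j * rng.normal(size=(N, P)))
y = h[:, None] * pilots[None, :] + noise

X = np.hstack([y.real, y.imag])            # split complex input into real/imag parts
T = np.column_stack([h.real, h.imag])      # targets: real and imaginary parts of h

W1 = rng.normal(scale=0.3, size=(2 * P, H))
b1 = np.zeros(H)
W2 = rng.normal(scale=0.3, size=(H, 2))
b2 = np.zeros(2)

def forward(X):
    A1 = np.tanh(X @ W1 + b1)              # hidden layer
    return A1, A1 @ W2 + b2                # linear output layer

def mse(out, T):
    return np.mean(np.sum((out - T) ** 2, axis=1))

mse_start = mse(forward(X)[1], T)
lr = 0.1
for _ in range(2000):
    A1, out = forward(X)
    d_out = (out - T) / N                  # output error (gradient of 0.5 * MSE)
    dW2, db2 = A1.T @ d_out, d_out.sum(axis=0)
    d_z1 = (d_out @ W2.T) * (1.0 - A1 ** 2)  # back-propagate through tanh
    dW1, db1 = X.T @ d_z1, d_z1.sum(axis=0)
    W2 -= lr * dW2                         # steepest-descent updates
    b2 -= lr * db2
    W1 -= lr * dW1
    b1 -= lr * db1
mse_end = mse(forward(X)[1], T)

# Recombine the two real outputs into complex channel estimates
h_hat = forward(X)[1] @ np.array([1.0, 1.0j])
```

The hidden-layer size `H` is a free parameter here; the works surveyed below report the best results when it is close to the channel length.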
The BPNN in Fig. 11 has been used for channel estimation and equalization in FBMC [233] and OFDM systems [231, 232, 234,235,236,237,238], employing supervised learning with training sequences. The number of perceptrons varies from one proposal to another. BPNN performance has been assessed in terms of BER and MSE against conventional channel estimation approaches. Compared with the MMSE, LS, and LMS methods, the BPNN underperformed the MMSE estimator but outperformed the LS and LMS estimators [231, 232, 236].
Complexity-wise, BPNN is less complex than the MMSE algorithm, although it underperforms it [232]: BPNN exhibited a loss of about 2 dB relative to MMSE in the 0 dB SNR scenario. BPNN was also tested against semi-blind channel estimation and achieved a \(96\%\)–\(97\%\) BER improvement at the cost of an \(86\%\)–\(87\%\) increase in complexity [236]. In general, BPNN estimation approaches do not require complicated matrix computations, and the best results are obtained when the number of hidden neurons is approximately equal to the channel length [231, 233,234,235, 237, 238].
Feed-forward neural network
In an FFNN, the connections between neurons do not form cycles, and neurons in the same layer are not connected to each other, so data flows in one direction from the input layer, through one or more hidden layers, to the output layer. An FFNN with one or more hidden layers of perceptrons is commonly known as an MLP. Each perceptron computes a linear operation on its inputs, and the result is passed through an activation function before being propagated to the next layer. Using a radial basis function as the activation defines the RBFN subgroup.
FFNNs have been applied to FBMC, OFDM, and MIMO-OFDM systems. These networks are data-driven and follow ABCE and ABCEx approaches. Training is supervised, using pilot sequences either online or offline. Regarding the MLP, channel estimation has been implemented for a preamble-based FBMC system using a complex-valued NN with two hidden layers, trained offline with simulated datasets [239]. Figure 12 shows the proposed network, where the ReLU and tanh nonlinear activation functions are used in the hidden and output layers, respectively.
Furthermore, the initial MLP network was modified by introducing an MSE loss function to update the network. Both proposals were evaluated in terms of BER and achieved lower error rates than the traditional LS estimator. The Levenberg–Marquardt training algorithm was used to design a two-layer complex-valued MLP for OFDM systems [240]. This design was then extended to a MIMO-OFDM system, where training also considers the one-step secant strategy [241]. The performance analysis showed that the MLP achieves more diversity gain than conventional channel estimation approaches.
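A forward pass in the spirit of the Fig. 12 network can be sketched as follows, assuming the complex inputs are handled by splitting them into real and imaginary parts (a real-valued stand-in for the complex-valued network of [239]) and using hypothetical layer sizes and random, untrained weights. Two ReLU hidden layers feed a tanh output layer, and the MSE loss is the quantity that training would minimize.

```python
import numpy as np

def relu(v):
    return np.maximum(0.0, v)

def split_complex(z):
    """Represent complex samples as [real parts | imaginary parts] along the last axis."""
    return np.concatenate([z.real, z.imag], axis=-1)

def mlp_two_hidden(x, params):
    """Forward pass: two ReLU hidden layers and a tanh output layer."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(x @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return np.tanh(h2 @ W3 + b3)          # tanh bounds every output to (-1, 1)

def mse_loss(pred, target):
    """Mean squared error over samples and output units."""
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(1)
K, batch, hidden = 8, 16, 32               # subcarriers, batch size, width (assumed)
H_ls = 0.3 * (rng.normal(size=(batch, K)) + 1j * rng.normal(size=(batch, K)))  # hypothetical LS inputs
x = split_complex(H_ls)                    # shape (batch, 2K)
params = [(0.2 * rng.normal(size=(2 * K, hidden)), np.zeros(hidden)),
          (0.2 * rng.normal(size=(hidden, hidden)), np.zeros(hidden)),
          (0.2 * rng.normal(size=(hidden, 2 * K)), np.zeros(2 * K))]
pred = mlp_two_hidden(x, params)
loss = mse_loss(pred, x)                   # untrained network: only a baseline value
```

Because the tanh output is bounded, the training targets must be scaled into \((-1, 1)\) when this output layer is used.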
Initially, RBFN estimators were implemented with a single hidden layer and compared with the LMS, MMSE, and zero-forcing (ZF) estimators, outperforming all of them [242]. In this context, the network structure resembles the one shown in Fig. 11. RBFNs were tested in OFDM systems by exploiting the channel correlation in the time and time–frequency domains [243, 244]. The former estimates the channel at each subcarrier independently through the network, whereas the latter estimates the channels at different subcarriers cooperatively. Both strategies have shown similar BER performance. Meanwhile, the one-dimensional RBFN has been compared with an interpolation RBFN that uses fewer pilot subcarriers as training inputs, and the latter achieved a lower BER.
RBFNs have been shown to track channel fluctuations in pilot-aided OFDM systems operating in noisy environments better than traditional interpolation approaches [245]. In parallel, Gaussian radial basis function interpolation was applied to fast-fading channel estimation: the LS method provides the initial estimate, and the channel response is then estimated with the aid of a Gaussian RBFN with one hidden layer [246, 247]. The proposed scheme was analyzed in comb-type OFDM systems and achieved lower MSE than the LS and other RBFN estimators. More recently, an RBFN has been applied to a coherent optical OFDM system to implement a nonlinear equalizer [248]. The network weights are updated in a two-step process. First, a K-means clustering algorithm adjusts the hidden layer weights. Then, the least mean square algorithm updates the output layer weights. Finally, a Q-factor assessment compared the proposal with other works, showing a 4-dB performance improvement.
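The LS-then-Gaussian-RBF scheme can be sketched for a comb-type pilot pattern, assuming a hypothetical 3-tap channel, noiseless pilots, Gaussian centres placed at the pilot positions, and output weights solved exactly so that the network reproduces the LS values at the pilots. The kernel width and pilot spacing are illustrative choices; [246, 247] may place centres and set widths differently.

```python
import numpy as np

def ls_pilot_estimate(y_pilot, x_pilot):
    """LS estimate at pilot subcarriers: H_p = Y_p / X_p."""
    return y_pilot / x_pilot

def gaussian_rbf_interpolate(k_pilot, h_pilot, k_all, width):
    """One-hidden-layer Gaussian RBFN centred on the pilot positions.

    The (complex) output weights are solved so that the network reproduces
    the LS values at the pilots; the network is then evaluated on every subcarrier.
    """
    def phi(a, b):
        return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * width ** 2))
    weights = np.linalg.solve(phi(k_pilot, k_pilot), h_pilot)
    return phi(k_all, k_pilot) @ weights

K, spacing = 64, 4
k_all = np.arange(K, dtype=float)
k_pilot = k_all[::spacing]                        # comb-type pilot positions
taps = np.array([1.0, 0.5, 0.2])                  # hypothetical 3-tap channel
H_true = np.exp(-2j * np.pi * np.outer(k_all, np.arange(3)) / K) @ taps
x_pilot = np.ones(k_pilot.size, dtype=complex)    # known pilot symbols
y_pilot = H_true[::spacing] * x_pilot             # noiseless received pilots
H_hat = gaussian_rbf_interpolate(k_pilot, ls_pilot_estimate(y_pilot, x_pilot), k_all, width=4.0)
mse = np.mean(np.abs(H_hat - H_true) ** 2)
```

With noisy pilots, an exact fit would reproduce the noise; a regularized or least-squares fit of the output weights is a common alternative.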
MIMO-OFDM channel estimation based on RBFNs was evaluated in [245, 247,248,249,250,251,252,253,254]. The RBFN structure is replicated for each antenna branch connected to the input layer, so N inputs are fed forward to the next layer to demodulate the signals. A semi-blind technique has been improved by updating the function iterations with an RBFN [249], and evolutionary algorithms (PSO and GA) were further employed to tune the network parameters. Despite this combination of techniques, the work included no comparison with conventional estimators. In [250], the RBFN estimates the initial values of the MIMO channel to support a particle filter method, which removes the need for additional training pilots because it tracks the channel variation.
Furthermore, joint channel estimation and signal detection has been performed using an RBFN optimized by a genetic algorithm [251], achieving a BER close to that of the MMSE estimator. RBFNs have also been applied to cyclic delay diversity OFDM systems to solve interpolation problems in an uneven-pilot-based system [252]. Meanwhile, the Gaussian radial basis function has been extended to MIMO-OFDM scenarios to leverage RBFN solutions [253, 254]. These solutions have performed better than the LS and LMS estimators, with BER close to that of the MLP network.
Regarding complexity, FFNNs add computational latency while improving the BER. For example, the proposal in [240] provides a gain of 1.2 dB at a BER of \(10^{-3}\). Also, [241] concludes that a training data length of 16 symbols or more yields remarkable results and better performance than the conventional LS, which means that a compromise between performance and computational complexity must be reached [242]. Interpolation RBFN-based techniques also exhibit complexity and performance trade-offs [244, 246]. The most complex estimation methods are proposed in [250, 251], which achieve optimal performance in terms of BER and spectral efficiency at the cost of higher computational complexity.
Extreme learning machine
An ELM is an FFNN based on fast learning and one-shot training, which reduces the training time with low computational complexity. The input weights are assigned randomly, and the output weights are computed through the Moore–Penrose generalized inverse matrix. This learning technique has been applied to channel estimation in OFDM and MIMO-OFDM systems. The evaluated ELM networks have a single hidden layer and are implemented following the AMBCE [255], ABCE [256,257,258,259,260,261,262,263,264], and ABCEx [265, 266] approaches. These works employ a network comprising p input and m output neurons, as shown in Fig. 13. The meaning of these variables depends on the system design. For instance, in MIMO-OFDM systems, p equals the number of receiving antennas, while m is related to the number of transmitting antennas. The number of hidden layer neurons (l) defines the dimension of the Moore–Penrose generalized inverse matrix.
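The one-shot ELM training rule can be sketched with NumPy's `np.linalg.pinv`, assuming tanh hidden neurons, Gaussian random input weights, and a hypothetical 2x2 MIMO detection task in which p = 4 inputs carry the real and imaginary parts of two received signals and m = 4 outputs carry those of two transmitted QPSK symbols. The hidden-layer output matrix has l = 64 columns, which sets the size of the pseudoinverse.

```python
import numpy as np

class ELM:
    """Single-hidden-layer ELM: random input weights, output weights via pinv."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_in, n_hidden))   # random input weights, never trained
        self.b = rng.normal(size=n_hidden)           # random hidden biases
        self.beta = None                             # output weights (l x m)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)          # hidden-layer output matrix, l columns

    def fit(self, X, T):
        # One-shot training: beta = pinv(H) @ T, no iterative back-propagation
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

rng = np.random.default_rng(3)
H_ch = np.array([[1.0 + 0.2j, 0.3 - 0.1j], [0.2 + 0.1j, 0.9 - 0.3j]])  # hypothetical channel

def split(z):
    return np.hstack([z.real, z.imag])

def make_data(n):
    x = (rng.choice([-1, 1], (n, 2)) + 1j * rng.choice([-1, 1], (n, 2))) / np.sqrt(2)
    noise = 0.05 * (rng.normal(size=(n, 2)) + 1j * rng.normal(size=(n, 2)))
    y = x @ H_ch.T + noise                           # y = H x + n for each row
    return split(y), split(x)

X_tr, T_tr = make_data(500)
X_te, T_te = make_data(200)
elm = ELM(n_in=4, n_hidden=64).fit(X_tr, T_tr)
accuracy = np.mean(np.sign(elm.predict(X_te)) == np.sign(T_te))
```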
Real-valued ELM networks have been applied to joint channel equalization and symbol detection [265, 266]. This scheme has two input layer neurons corresponding to the real and imaginary parts of the received symbol. In [265], the training process uses an LS solution, while the ELM algorithm in [266] employs pilot blocks. Complex-valued ELM schemes have also been investigated for channel estimation, with p equal to the training sequence length [256]. The online-trained network was evaluated under nonlinear channel conditions and outperformed the LS and MMSE estimators in BER. Furthermore, its performance was similar to that of the scheme without nonlinearities. Nonlinear distortion was also addressed in [258] to enhance the performance of OFDM systems with insufficient CP. The offline-trained network was deployed online, using an initial LS estimator to obtain the features of the CFR.
A technique to reduce the number of training pilots was developed based on ensemble learning theory [257]. This method generates and combines different models to find an optimal predictive model. The ensemble approach combined the ELM model predictions through weighted averaging and the median, based on the training error, and pruned the generated models, including combinations thereof. The BER results demonstrated the effectiveness of the proposal, with lower error rates than conventional ELM schemes and performance similar to that of the MMSE estimator.
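The combination step of such an ensemble can be sketched as follows. The inverse-training-error weighting and the keep-the-best-k pruning rule are assumptions made for illustration; [257] may define its weights and pruning criterion differently. Each member's predictions are assumed to be stacked as an array of shape (models, samples, outputs).

```python
import numpy as np

def ensemble_predict(preds, train_errors, keep=None, method="weighted"):
    """Combine member predictions of shape (n_models, n_samples, n_out)."""
    preds = np.asarray(preds, dtype=float)
    errs = np.asarray(train_errors, dtype=float)
    order = np.argsort(errs)
    if keep is not None:
        order = order[:keep]            # pruning: keep the models with lowest training error
    preds, errs = preds[order], errs[order]
    if method == "median":
        return np.median(preds, axis=0)
    w = 1.0 / errs                      # assumed weighting: inverse training error
    return np.tensordot(w / w.sum(), preds, axes=1)

# Three hypothetical ELM members predicting constant outputs
preds = [np.full((2, 1), 1.0), np.full((2, 1), 2.0), np.full((2, 1), 10.0)]
errs = [0.1, 0.1, 1.0]
pruned = ensemble_predict(preds, errs, keep=2)       # averages only the two best models
robust = ensemble_predict(preds, errs, method="median")
weighted = ensemble_predict(preds, errs)             # poor model gets a small weight
```

The median is insensitive to a single badly trained member, while pruning removes such members before weighting.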
A semi-supervised ELM has been proposed for channel estimation and equalization in vehicle-to-vehicle communications [264]. In the training phase, the labeled training data length was set equal to that of the unlabeled dataset. In the implementation, LS pre-equalization is applied after the FFT, and its output is delivered to the semi-supervised ELM. The evaluation showed BER performance close to that of the LS and other ELM-based estimators. However, the algorithm had the longest execution time among the compared methods. On the other hand, an ELM-based equalizer for OFDM-based radio-over-fiber systems was evaluated in [263]. The authors proposed a multilayer generalized complex-valued ELM, built by extending the ELM algorithm into an ELM-autoencoder. The network outperformed other ELMs from the literature, although the authors acknowledged that the proposal increased the computational cost.
Regarding MIMO systems, a semi-blind channel estimation process based on ELM networks has outperformed the BPNN, MLP, and RBFN. The scheme estimates the CFR at the pilot subcarriers and uses it to train the real-valued network. In addition, an ELM scheme whose training is based on symbol construction is proposed in [259]. The approach reduced the training sequence length while maintaining performance, providing a better estimate than the MMSE. Another attempt to reduce the training time has combined manifold learning with ELM. Manifold learning is a nonlinear dimensionality reduction technique, in the same family as the PCA and ICA schemes presented in Sect. 4. This approach has also outperformed the MMSE estimator.
Recently, an ELM-based detector relying on online training has been proposed for pilot-assisted mMIMO-OFDM systems at millimeter-wave frequencies [262]. The network resembles that shown in Fig. 13, with the pilots used for online training to enable subsequent symbol detection. The BER assessment highlighted the advantage of the ELM network over the MMSE estimator. However, the work lacks a comparison with other ELM network solutions.
Complexity appraisal shows that complex-valued ELMs can use only one hidden layer, outperform offline DNNs in both complexity and performance, and reduce the training time [256,257,258, 260, 266]. Furthermore, [261] found that an ELM requires as many hidden neurons as antennas at the base station (BS) to achieve higher spectral efficiency than linear mMIMO receivers. An attempt to apply unsupervised learning to an ELM increased the computational time with no performance improvement [264]. Besides, an ELM-autoencoder solution significantly improved performance at a high computational cost. Finally, real-valued ELMs demand less computation than both FFNNs and complex-valued ELMs because they operate on real-valued rather than complex-valued data [255].
Recurrent neural network
The RNN is a network structure with one-step temporal dependence among the input data [267, 268]. The hidden layers receive the incoming information together with their own previous output through a feedback loop, as shown in Fig. 14. Consequently, the network learns over time in a cumulative process. In the unfolded representation, the output at time \(t-1\) is fed back as an input at time \(t\), and the output at time \(t\) is in turn provided as an input at time \(t+1\). Thus, this NN learns not only from the current input but also from past information.
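The unfolded recurrence can be sketched as a simple (Elman-style) RNN, assuming tanh hidden units, hypothetical random weights, and zero initial state. Perturbing only the first input changes the final hidden state when the feedback matrix `W_h` is present, and leaves it unchanged when the feedback is removed, which isolates the memory effect described above.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b, h0):
    """Unfolded simple RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    h, states = h0, []
    for x_t in xs:                                # one step per time instant t
        h = np.tanh(W_x @ x_t + W_h @ h + b)      # previous state fed back via W_h
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(0)
n_in, n_hid, T = 2, 5, 6
W_x = rng.normal(size=(n_hid, n_in))
W_h = 0.5 * rng.normal(size=(n_hid, n_hid))
b, h0 = np.zeros(n_hid), np.zeros(n_hid)
xs = rng.normal(size=(T, n_in))
xs_alt = xs.copy()
xs_alt[0] += 1.0                                  # perturb only the first input

states = rnn_forward(xs, W_x, W_h, b, h0)
with_memory = rnn_forward(xs_alt, W_x, W_h, b, h0)[-1] - states[-1]
no_W_h = np.zeros_like(W_h)
no_feedback = (rnn_forward(xs_alt, W_x, no_W_h, b, h0)[-1]
               - rnn_forward(xs, W_x, no_W_h, b, h0)[-1])
```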
The RNN features are suitable for tackling time variations in channel estimation. This NN has been used to estimate the channel response in OFDM, FBMC, and MIMO-OFDM systems [48, 267,268,269,270,271,272,273]. It has been deployed in an ABCE approach with supervised learning. The RNN was designed as a mapping function to assist pilot-aided OFDM systems [268]: it was trained with the pilot subcarriers and then used to estimate the channel at the data positions. More recently, a bidirectional RNN has been proposed to enhance system performance. A similar approach has been used to train an RNN for signal recovery in an OFDM system operating in an interference environment. For instance, the network in [269] could predict 50 lost subcarriers based on channel estimation under severe interference, with a root-mean-square error (RMSE) of 0.37065 and 0.24596 after 100 iterations and training epochs.
Moreover, the RNN was applied to track channel variations in MIMO-OFDM systems [267]. The proposal designed an RNN to estimate the channel response from signals whose real and imaginary parts are tightly coupled. A split-complex activation RNN was therefore developed, in which the network learns to estimate the real and imaginary parts separately and combines them through a time average of the input information over a time window. This work was improved by adding a self-organizing map-based optimization, yielding a complex time-delay fully recurrent NN block for MIMO-OFDM systems [270]. The BER assessment showed that the performance of the proposed network is close to that with perfect CSI and surpasses the MMSE estimator.
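The two ingredients named above, a split-complex activation and a time average over a window, can be sketched in isolation. The use of tanh as the real activation and the window width of 2 are assumptions for illustration, not the exact choices of [267].

```python
import numpy as np

def split_complex_tanh(z):
    """Split-complex activation: tanh applied separately to real and imaginary parts."""
    return np.tanh(z.real) + 1j * np.tanh(z.imag)

def window_average(z, width):
    """Moving time average over `width` samples (only fully overlapping windows)."""
    return np.convolve(z, np.ones(width) / width, mode="valid")

z = np.array([0.5 + 2.0j, -1.0 + 0.1j, 0.3 - 0.7j, 1.2 + 0.0j])
a = split_complex_tanh(z)
smoothed = window_average(a, width=2)
```

Treating the two parts separately keeps the activation bounded and differentiable in each component, which avoids the singularities that fully complex analytic activations can have.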
Besides, a softmax RNN using frequency index modulation was proposed to perform channel estimation in MIMO-OFDM systems [271]. The network achieved lower BER than the LS estimator and the ELM algorithm of [256]. However, the comparison lacks an evaluation of the complexity involved. ISI reduction in MIMO-OFDM systems has been addressed by an Elman RNN for channel estimation [272]. The evaluation demonstrated that the network provides low PAPR and BER, with high capacity and throughput. The comparison included a convolutional neural network (CNN) and a DNN, both of which the Elman RNN outperformed. The RNN has also been used to design DNNs such as ChanEstNet, discussed later [273]. More recently, RNN performance has also been evaluated in MIMO-OFDM systems [48].
The channel estimation field has also investigated a variant of the RNN called long short-term memory (LSTM). The LSTM is designed to perform well on long sequences and to mitigate the vanishing and exploding gradient problems of conventional RNNs [274, 275]. This network can capture long-term dependencies by learning from information in long past sequences. Figure 15 shows an LSTM unit cell composed of forget, input, and output gates, which regulate the data flow inside the cell. The forget gate decides which information is discarded from or kept in the cell state, based on the previous state and the current input. Therefore, \(\sigma _f\) takes values between 0 (discard the information) and 1 (keep the information). The candidate cell proposes new information to be stored in the current cell state, scaled by the \(\sigma _c\) value, and this scaled contribution is added to the current state. Finally, the output gate controls what is computed as the output value, after the cell state is squashed into the range from \(-1\) to \(1\).
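One step of a standard LSTM cell can be sketched as follows, assuming NumPy, the common formulation with sigmoid gates and tanh squashing, and a hypothetical stacking order (forget, input, candidate, output) for the gate weights. In this reading, `f` plays the role of \(\sigma_f\) and `i` the role of the scaling \(\sigma_c\) applied to the candidate values `g`. Saturating the forget gate open and the input gate closed shows the cell state being carried over unchanged.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W maps [h_prev, x_t] to four stacked gate pre-activations
    (assumed order: forget, input, candidate, output)."""
    n = h_prev.size
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[:n])                 # forget gate, values in (0, 1)
    i = sigmoid(z[n:2 * n])            # input gate, values in (0, 1)
    g = np.tanh(z[2 * n:3 * n])        # candidate values, in (-1, 1)
    o = sigmoid(z[3 * n:])             # output gate, values in (0, 1)
    c = f * c_prev + i * g             # keep part of the old state, add new information
    h = o * np.tanh(c)                 # cell state squashed to (-1, 1), then gated
    return h, c

n_hid, n_in = 3, 2
rng = np.random.default_rng(0)
x_t = rng.normal(size=n_in)
h_prev, c_prev = rng.normal(size=n_hid), rng.normal(size=n_hid)

# Ordinary step with hypothetical random weights
W_rand = rng.normal(size=(4 * n_hid, n_hid + n_in))
h, c = lstm_cell(x_t, h_prev, c_prev, W_rand, np.zeros(4 * n_hid))

# Forget gate saturated open (f ~ 1) and input gate closed (i ~ 0): state carried over
W0 = np.zeros((4 * n_hid, n_hid + n_in))
b_keep = np.concatenate([np.full(n_hid, 50.0), np.full(n_hid, -50.0), np.zeros(2 * n_hid)])
h_keep, c_keep = lstm_cell(x_t, h_prev, c_prev, W0, b_keep)
```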
The LSTM network has been combined with conventional RNN, CNN, and MLP networks [274,275,276,277]. The inherent imaginary interference in FBMC channel estimation was addressed by combining a bidirectional LSTM and an RNN [274]. The network worked well in fast time-varying scenarios and outperformed a DNN algorithm. Meanwhile, the LSTM was combined with a CNN to support channel estimation in time-varying scenarios for OFDM systems [275]. The resulting CBR-Net (CNN batch normalization RNN) provided lower BER than the conventional estimator and other DNN architectures. A similar hybrid solution, the CNN-LSTM algorithm, achieved lower BER than other NNs [276]. An MLP-LSTM network is presented in [277], and this joint solution works well in high-mobility scenarios with velocities of up to 150 km/h. Recently, bidirectional LSTM architectures have been proposed for MIMO-OFDM systems [278,279,280]. The evaluations confirmed their superiority over conventional estimators. In addition, the researchers claimed low complexity owing to a DNN architecture that combines massive LSTM units in a bidirectional arrangement.
Furthermore, an extension of the LSTM concept is the gated recurrent unit (GRU). Its cell merges the forget and input gates into a single update gate, which controls how much information is retained or updated, and adds a reset gate, without a separate output gate. This network type has been used to design a data-driven channel estimation model for an OFDM system in a fog radio scenario [281]. It was compared with the orthogonal matching pursuit channel estimation strategy, showing promising results. GRU performance was also investigated in FBMC systems [282] to address the inherent imaginary interference in channel estimation. Resembling the bidirectional LSTM architecture, a GRU network called BiGRU has been proposed for a MIMO FBMC-OQAM system [283]. Its training consists of an offline stage followed by online prediction. The BER assessment used different time-varying channel models to compare the BiGRU against the interference approximation channel estimation method, with the BiGRU yielding an improvement in the FBMC system.
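A single GRU step can be sketched in the same style, assuming the common formulation with an update gate `z` and a reset gate `r`. Note that some references swap the roles of `z` and `1 - z` in the final blend. Pushing the update gate towards zero passes the previous state through unchanged, and the parameter count shows the three gate matrices versus the LSTM's four.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_cell(x_t, h_prev, Wz, bz, Wr, br, Wh, bh):
    """One GRU step with update gate z and reset gate r (no separate output gate)."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ hx + bz)                                  # update gate
    r = sigmoid(Wr @ hx + br)                                  # reset gate
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]) + bh)
    return (1.0 - z) * h_prev + z * h_cand                     # blend old and candidate state

n_hid, n_in = 3, 2
rng = np.random.default_rng(0)
x_t, h_prev = rng.normal(size=n_in), rng.normal(size=n_hid)
shape = (n_hid, n_hid + n_in)
Wz, Wr, Wh = (rng.normal(size=shape) for _ in range(3))        # hypothetical weights
zeros = np.zeros(n_hid)
h_new = gru_cell(x_t, h_prev, Wz, zeros, Wr, zeros, Wh, zeros)

# Update gate pushed to ~0: the previous state passes through unchanged
h_same = gru_cell(x_t, h_prev, np.zeros(shape), np.full(n_hid, -50.0), Wr, zeros, Wh, zeros)

# Parameters per cell: three gate blocks instead of the LSTM's four
gru_params = 3 * (n_hid * (n_hid + n_in) + n_hid)
lstm_params = 4 * (n_hid * (n_hid + n_in) + n_hid)
```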
An RNN with random connections among the neurons of its hidden layer is called an echo state network (ESN), whose architecture is shown in Fig. 16. This network is typically designed with a single hidden layer, called a reservoir. The reservoir weights remain fixed after random initialization and only the output weights are trained, so the ESN does not require the back-propagation mechanism for training. The ESN has recently been investigated to improve the channel estimation process in OFDM and MIMO-OFDM systems [284,285,286,287,288,289,290,291].
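The reservoir principle can be sketched as follows, assuming NumPy, a 50-neuron tanh reservoir rescaled to spectral radius 0.9, and a ridge-regularized least-squares readout. As a toy task (an assumption, not taken from the cited works), the ESN learns to reproduce the output of a hypothetical 2-tap channel \(d_t = u_t + 0.5\,u_{t-1}\) from its input. Only `w_out` is fitted; the reservoir weights are never updated.

```python
import numpy as np

def esn_states(u, W_in, W_res):
    """Run the fixed random reservoir: x_t = tanh(W_in u_t + W_res x_{t-1})."""
    x = np.zeros(W_res.shape[0])
    states = np.empty((len(u), W_res.shape[0]))
    for t, u_t in enumerate(u):
        x = np.tanh(W_in * u_t + W_res @ x)
        states[t] = x
    return states

def train_readout(states, target, ridge=1e-6):
    """Only the linear readout is trained (ridge-regularised least squares)."""
    S = np.hstack([states, np.ones((len(states), 1))])   # add bias column
    A = S.T @ S + ridge * np.eye(S.shape[1])
    return np.linalg.solve(A, S.T @ target)

rng = np.random.default_rng(4)
n_res = 50
W_in = rng.uniform(-0.5, 0.5, n_res)
W_res = rng.normal(size=(n_res, n_res))
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))  # spectral radius 0.9

u = rng.uniform(-1, 1, 3000)
d = u + 0.5 * np.concatenate([[0.0], u[:-1]])           # hypothetical 2-tap channel output
S = esn_states(u, W_in, W_res)
washout, split = 100, 2000                              # discard initial transient states
w_out = train_readout(S[washout:split], d[washout:split])
S_test = np.hstack([S[split:], np.ones((len(u) - split, 1))])
nmse = np.mean((S_test @ w_out - d[split:]) ** 2) / np.var(d[split:])
```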
An ESN was used for channel estimation in [284]. First, the real and imaginary parts of the OFDM symbol are separated and delivered to two ESNs, and the network outputs are then combined. The ESN was trained in a supervised manner and analyzed by comparing the desired and estimated results, although the work lacks a performance analysis at the system implementation level. Moreover, an adaptive elastic ESN has been designed for channel estimation in IEEE 802.11ah systems employing OFDM modulation [285]. The hybrid architecture comprises an ESN and an adaptive elastic network. The latter was added to handle the ill-conditioned LS solutions used to obtain the frequency-domain CSI. This ill-conditioning arises from collinearity in the input of the basic ESN model [285]. Therefore, the adaptive elastic network replaces the LS method in calculating the frequency-domain CSI. The results compared the RMSE of the adaptive elastic network against auto-regression and support vector machine algorithms, highlighting the superior performance of the network.
A three-layer estimator for the MIMO-OFDM system was designed with feature, enhancement, and output layers [286]. The feature layer comprises a pool of parallel ESNs connected to the enhancement layer by weights and biases. These layers extract feature information that feeds the output layer, supporting the channel estimation process. Besides, a supervised learning ESN has been proposed for joint channel estimation and symbol detection in nonlinear MIMO-OFDM systems, with BER results close to but worse than those of the LMMSE estimator [288, 289]. Thereafter, symbol detection based on a deep ESN surpassed the LMMSE estimator and showed results close to those of a shallow ESN [290]. Meanwhile, an ESN was designed to detect symbols using comb and scattered pilot patterns in a standard LTE system with MIMO, and its evaluation demonstrated superior performance with fewer pilots [291].
Complexity-wise, RNNs exploit the training dataset to achieve a better trade-off between accuracy and complexity than other NNs. For example, they have been shown to require 218 epochs to achieve an average precision of \(96\%\), whereas an MLP requires 326 epochs to achieve an average precision of \(94\%\) [267]. They also demand less computation owing to their low overhead, as their layers consist of simple matrix–vector multiplications and nonlinear activation functions [268]. However, DL-based RNNs still face challenging complexity, although they are robust enough to estimate even fast time-varying channels [274, 275, 277]. To reduce RNN complexity, reservoir computing (RC) has been used to generate random synaptic weights [284,285,286,287,288,289,290,291].
Deep neural network
DNNs consist of multiple layers between the input and output layers, as shown in Fig. 17 [23, 292, 293]. These hidden layers may contain the same number of neurons or progressively fewer neurons toward the output layer. The layers are fully connected, since each neuron is connected to all the neurons of the subsequent layer. The input to a given neuron is the sum of the weighted outputs of the previous layer's neurons plus a bias. The output of a given neuron is the value of a nonlinear activation function, such as the ReLU or sigmoid function. Hence, the DNN output sequence is a cascade of nonlinear transformations of its input sequence.
Generic DNNs have been used for channel estimation in multicarrier systems [292, 294, 295]. For instance, a general DNN has been proposed to estimate CSI, enabling joint channel estimation and symbol detection in an OFDM system with performance close to that of the MMSE estimator [292]. In [294], a DNN is applied to the received signal to produce a less noisy signal, and the channel is estimated from the generated signal. The proposed DNN channel estimator has been shown to perform within 1 dB of MMSE estimation. The authors in [295] combined a conventional channel estimation technique for an OFDM receiver with a DNN to surpass MMSE estimation in terms of normalized MSE.
Researchers have also proposed DNN variants for estimating the channel in multicarrier systems [293, 296,297,298]. A deep residual learning framework (ResNet) consisting of two shortcut-connected layers and two fully connected hidden layers was used for channel estimation and equalization in FBMC/OQAM systems [293]. The ResNet is trained on a long real-valued sequence derived from the filtered frequency-domain complex sequence of the received signal, and its channel estimation performance is better than that of the general DNN. Meanwhile, Cascade-Net, a DNN cascaded with a zero-forcing preprocessor, was proposed for detecting OFDM symbols and outperformed the zero-forcing method [296]. ComNet, composed of model-driven DNN subnets, replaced the usual OFDM channel estimation and symbol detection receiver blocks and surpassed the general DNN by refining its inputs [297]. A variant of the ComNet receiver includes a compensating network, SwitchNet, which outperforms ComNet [298].
A DNN hidden layer in which each neuron is connected to only a small portion of the previous layer's neurons is called a convolutional layer [299]. In addition, the neurons of a convolutional layer share the same parameters. Consequently, CNNs significantly reduce the total number of trainable parameters. A general CNN architecture comprises an input layer and a convolutional layer, followed by a set of pooling layers and fully connected layers up to the output layer, as shown in Fig. 18 [227, 228]. The convolutional layer captures local patterns in the input data, while the pooling layers summarize this information, reducing the dimensionality of the data while retaining the essential information. Finally, the fully connected layers carry out the classification stage.
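The two properties above, local connectivity and weight sharing, together with pooling, can be sketched for a 1-D input, assuming NumPy, "valid" padding, a length-3 kernel, and non-overlapping max pooling (all illustrative choices). The final lines compare the parameter count with a fully connected layer that has the same number of outputs.

```python
import numpy as np

def conv1d_valid(x, kernel, bias=0.0):
    """1-D convolutional layer (cross-correlation, 'valid' padding): each output
    neuron sees only len(kernel) inputs, and all neurons share the same weights."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel + bias for i in range(len(x) - k + 1)])

def max_pool1d(x, size):
    """Non-overlapping max pooling: summarises each window by its largest value."""
    n = len(x) // size
    return x[: n * size].reshape(n, size).max(axis=1)

n_in, k = 64, 3
dense_params = n_in * (n_in - k + 1) + (n_in - k + 1)   # fully connected, same output size
conv_params = k + 1                                      # one shared kernel plus one bias

rng = np.random.default_rng(0)
x = rng.normal(size=n_in)
feature_map = conv1d_valid(x, np.array([1.0, 0.0, -1.0]))  # hypothetical edge-like kernel
pooled = max_pool1d(feature_map, 2)                        # halves the dimension
```

Here the convolutional layer needs 4 parameters where the fully connected layer needs 4030.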
A CNN has been exploited to recover information from OFDM signals without explicit DFT or IDFT computations, performing better than channel estimators based on linear MMSE [300]. In [299], the authors inserted a CNN between preprocessing modules to develop a CNN-based detector that adapts to large systems or wide bands. The authors in [301] combined CNN and image super-resolution to create a channel estimation method that, after offline training, outperforms the MMSE estimator and can potentially save spectrum.
Combining CNN and DNN can boost channel estimation. The authors of [302] proposed intelligent signal detection comprising a DNN and a CNN for OFDM with index modulation. The signal detector uses pilots to achieve semi-blind channel estimation and reconstructs the transmitted symbols based on the CSI. In [303], a hybrid NN-based fading channel predictor was designed by connecting CNN and DNN layers. This hybrid channel predictor adds robustness for systems operating over frequency-selective channels, such as MIMO-OFDM. The authors in [273] developed a channel estimation method for high-speed scenarios using a combination of CNN and RNN. The resulting network, ChanEstNet, extracts channel response feature vectors for channel estimation and exhibits low computational complexity compared with traditional channel estimation methods.
Regarding complexity, DNNs depend on extensive training datasets and apply matrix multiplications between successive layers. For example, the complexity of the adaptive DNN investigated in [295] is comparable to that of the exact LMMSE channel estimation scheme, while its performance is much better. To reduce DNN complexity, the authors in [294] incorporated the deep image prior (DIP) model, which reduces the training overhead and requires only pilot symbols during channel estimation. Also, a sliding structure based on the signal-to-interference power has been designed to reduce computational complexity compared with a single deep detection network [296]. Furthermore, by splitting the receiver into different subnets, DNNs demand less memory and computation than LMMSE-MMSE methods [297,298,299]. Instead of reducing the complexity of DNN-aided detectors, some researchers have traded it for better capabilities. For instance, complexity has been traded for the ability to replace the DFT with a linear transformation [300]. Finally, merging LSTM and CNN yields a hybrid network that has been shown to predict channel characteristics [273].
Autoencoder-aided end-to-end systems
Autoencoders apply unsupervised learning to replace an end-to-end communication system. Hence, from the block-structure point of view, an autoencoder substitutes for the whole chain composed of the serial-to-parallel converter, lookup table, modulator, detector, symbol estimator, parallel-to-serial converter, and so forth. Autoencoders exploit the statistics of the input data to send as little data as possible through the channel while still allowing the receiver to recover the input completely [304]. They reconstruct the input data through a series of latent representations, typically using an MMSE objective and a stochastic gradient descent (SGD) solver to find the network weights, which turns training into a practical regression problem [305]. Figure 19 depicts a general autoencoder architecture, which is taken as the basis for the autoencoder systems discussed below.
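As a minimal sketch of the reconstruction principle (input, latent code, reconstruction, MSE loss, SGD), the toy example below trains a linear autoencoder with a 1-D latent code on 2-D inputs that lie on a line. The data, initial weights, learning rate, and absence of nonlinearities are illustrative assumptions; this is not the communication autoencoder of [304, 305], which additionally places a channel model between encoder and decoder.

```python
"""Toy linear autoencoder trained with an MSE loss and plain SGD.

Illustrative assumptions: 2-D inputs lying on a line, a 1-D latent code,
no biases or nonlinearities, and hand-derived gradients.
"""
import random

# Hypothetical training data: points t * (0.6, 0.8) for t in [-1, 1].
DATA = [(0.6 * t, 0.8 * t) for t in (i / 10 for i in range(-10, 11))]
INIT_A, INIT_B = (0.5, 0.1), (0.3, 0.2)  # hypothetical initial weights


def reconstruct(x, a, b):
    z = a[0] * x[0] + a[1] * x[1]          # encoder: 2-D input -> 1-D latent
    return z, (b[0] * z, b[1] * z)         # decoder: latent -> 2-D output


def mse(data, a, b):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        _, x_hat = reconstruct(x, a, b)
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)


def train(data, epochs=300, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = list(INIT_A), list(INIT_B)
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)                         # one sample per update
        for x in samples:
            z, x_hat = reconstruct(x, a, b)
            r = (x_hat[0] - x[0], x_hat[1] - x[1])   # reconstruction residual
            # Gradients of ||r||^2: d/db = 2 r z, d/da = 2 (b . r) x.
            b_dot_r = b[0] * r[0] + b[1] * r[1]
            grad_b = (2 * r[0] * z, 2 * r[1] * z)
            grad_a = (2 * b_dot_r * x[0], 2 * b_dot_r * x[1])
            for i in range(2):
                a[i] -= lr * grad_a[i]
                b[i] -= lr * grad_b[i]
    return a, b


if __name__ == "__main__":
    a, b = train(DATA)
    print(f"MSE before: {mse(DATA, INIT_A, INIT_B):.4f}, after: {mse(DATA, a, b):.2e}")
```

Because the inputs span a single direction, a 1-D code suffices and the reconstruction error can be driven essentially to zero; with real data, the latent size sets how much information must pass through the bottleneck.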
DNNs and CNNs are used to construct autoencoders. On the transmitter side, they learn the mapping from bits to waveforms. On the receiver side, they learn synchronization, parameter estimation, and demapping from waveforms to bits. Several channel impairments are considered when training the autoencoder: noise, time and rate of signal arrival, carrier frequency and phase offsets, and the delay spread of the received signal [305]. Although it may seem that an extensive dataset is required for training, autoencoders usually need to see only a tiny portion of the code space, with the ratio being as low as \(2.9387359 \times 10^{-34}\). Thus, autoencoders can economize on the resources used [306]. The trained autoencoder produces transmit and receive signals that resemble those of MCM communication systems.
The end-to-end autoencoder-based communication system can compete with mature systems such as OFDM, FBMC, GFDM, and UFMC without any prior mathematical modeling or analysis [307, 308]. In [307], the DNN- and CNN-based autoencoder of [305] has been enhanced to deal with synchronization and ISI. For synchronization, an additional NN splits the unbounded sequence of received samples into candidate block groups and estimates the probability of each group. For ISI, the received messages are assumed to be affected by ISI during training, so that the autoencoder learns to mitigate this impairment. The enhanced autoencoder has been tested over real channels, performing 2 dB worse than the MMSE method. In [308], the proposed DNN-based autoencoder exhibited fast convergence when operating over a severe Rayleigh fading channel. The transmitter and receiver parts of the autoencoder were trained alternately until the loss stopped decreasing. The authors claimed that the autoencoder could be applied to any channel without prior analysis.
Instead of competing with well-established MCM systems, autoencoders can be combined with them to improve reliability [309, 310]. DNN-based autoencoders have been proposed to mitigate synchronization errors and simplify equalization over multipath channels [309]. The proposed model has also shown robustness to imprecise channel knowledge and lower complexity than conventional OFDM systems. The authors in [310] have combined autoencoders with OFDM under single-bit quantization. Using an unsupervised autoencoder, the OFDM data detection loss under that constraint was reduced, performing comparably to unquantized OFDM at SNR values below 6 dB.
Autoencoders have also been compared with MIMO systems [311, 312]. The authors in [311] have obtained an autoencoder that outperforms the Alamouti space-time block code (STBC) [313] over the Rayleigh fading channel for SNR values greater than 15 dB. Scenarios with perfectly known, quantized, and no CSI were considered. The optimum autoencoder was obtained using NN-based regression, with channel estimation on both the transmitter and receiver sides. In [312], the authors combined autoencoders and ELM and proposed a novel detection scheme for MIMO-OFDM. In this approach, the autoencoders refine the input data before transmission, and the ELM classifies the received signal based on regular features. The BER performance of this MIMO-OFDM detector is similar to that of maximum-likelihood detection (MLD).
Autoencoders have also been applied to mMIMO, the large-scale extension of MIMO. The network proposed in [314] employs a CNN to learn the channel structure effectively from training samples and recovers CSI even in low compression regions. This autoencoder is mainly investigated for multicarrier systems in which the BS receives the CSI from the users. It transforms the channel matrix into a lower-dimensional vector and vice versa. Although this new sensing and recovery mechanism outperforms existing compressive sensing-based methods, the authors noted that it could be further enhanced by applying advanced DL strategies.
In terms of complexity, autoencoders require a large training dataset to reach the optimum solution, which results in a trade-off between performance and computation. Some works highlight reduced power demand as an attractive feature of their proposed method. For example, tensor-based processing can reduce power requirements by lowering clock rates, increasing algorithm concurrency, and adapting, as pointed out in [305]. The PAPR can also be reduced using a DL network based on an autoencoder architecture [306, 309]. Other works adopt different training strategies to ease the intrinsic trade-off between the performance and computation of autoencoders [308, 310]. For example, in [307], the authors have used a two-phase training: the architecture is trained with simulated channels in the first phase, and the receiver is fine-tuned over realistic channels in the second phase.
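Since PAPR reduction is one of the benefits claimed for autoencoder-based designs, the short sketch below shows how the metric itself is computed for one OFDM symbol: the ratio between the peak and the mean instantaneous power of the IDFT output. The direct IDFT, the absence of oversampling and cyclic prefix, and the QPSK example are simplifying assumptions; practical PAPR measurements usually oversample to capture continuous-time peaks.

```python
"""PAPR of one OFDM symbol computed from its IDFT samples.

Illustrative assumptions: a direct N-point IDFT (no oversampling, no cyclic
prefix) written out so that only the standard library is needed.
"""
import cmath
import math
import random


def idft(X):
    """Direct O(N^2) inverse DFT with 1/N normalization."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def papr_db(x):
    """Peak-to-average power ratio of the time samples x, in dB."""
    powers = [abs(s) ** 2 for s in x]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


if __name__ == "__main__":
    rng = random.Random(1)
    qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
            for _ in range(64)]
    print(f"random QPSK symbol: {papr_db(idft(qpsk)):.2f} dB")
    # Worst case: identical symbols on all subcarriers add coherently at n = 0,
    # giving PAPR = N (about 18.06 dB for N = 64).
    print(f"all-ones symbol:    {papr_db(idft([1] * 64)):.2f} dB")
```

With equal-power subcarriers, the PAPR always lies between 0 dB (constant envelope) and 10 log10(N) dB, which is the worst case printed above.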
Other neural networks
The generative adversarial network (GAN) [315,316,317,318], general regression neural network (GRNN) [319, 320], and fuzzy neural network (FNN) [321, 322] have also been investigated for channel estimation. Likewise, least mean error [323], meta-learning [324], k-means clustering [325], and LS [326] techniques have been applied to support NN training. These training techniques suggest that ML might also be an interesting approach to overcoming the problem of voluminous training datasets in DNNs.
Generative adversarial network
A GAN comprises two networks, a generative network and an adversarial network, which operate competitively, as shown in Fig. 20. The generative network learns through training to reproduce the original information, i.e., to generate samples that resemble the real data. The adversarial network (the discriminator), in turn, receives the samples produced by the first network, labeled as fake, and learns to distinguish them from real data. In other words, the adversarial network must learn to recognize false and true patterns, while the generative network learns to deceive it. In this way, the generative network is trained to fool the adversarial network into accepting its samples as true [316].
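For reference, the competition described above is usually formalized, in the standard GAN formulation, as a two-player minimax game. Here \(D(\mathbf{x})\) is the discriminator's estimate of the probability that \(\mathbf{x}\) is real, and \(G(\mathbf{z})\) is the generator output for a random input \(\mathbf{z} \sim p_{\mathbf{z}}\); the notation is ours, and the cited channel estimators may use variants of this objective (for example, generators conditioned on received pilots).

```latex
\min_{G}\,\max_{D}\; V(D,G)
  = \mathbb{E}_{\mathbf{x}\sim p_{\mathrm{data}}}\bigl[\log D(\mathbf{x})\bigr]
  + \mathbb{E}_{\mathbf{z}\sim p_{\mathbf{z}}}\bigl[\log\bigl(1 - D(G(\mathbf{z}))\bigr)\bigr]
```

The discriminator maximizes \(V\) by scoring real samples high and generated ones low, while the generator minimizes it by driving \(D(G(\mathbf{z}))\) toward one; in a channel estimation setting, \(\mathbf{x}\) would correspond to true channel realizations and \(G(\mathbf{z})\) to generated ones.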
This concept was applied to redesign ResEsNet [315, 327] by treating the channel response at the known pilot positions as a low-resolution image. Thus, the GAN estimates the CSI in a super-resolution fashion. The generator comprises convolution layers and residual blocks with pre-residual activation units, and batch normalization is applied at the beginning and at the end to map the data to, and back from, the model scale. The generated (fake) samples then feed the discriminator, which is also formed by convolution, batch normalization, and Leaky ReLU layers [315]. The super-resolution GAN outperformed both ResEsNet and the LMMSE estimator. Furthermore, a GAN-based channel estimation approach was proposed for high-speed mobile scenarios [317]. Its goal was to reduce the complexity of channel estimation by training a discriminator to learn and extract the time-varying features of the channel. The generator then acts on these samples to generate and restore the channel information.
GANs have also been used to reduce the number of pilots in MIMO-OFDM and OFDM systems [316, 318]. The first proposal trained the generative network on real data to learn how to produce channel samples [316]. The trained model was then used to obtain current channel samples from the received signal. It outperformed a supervised-learning ResNet model but could not beat the LMMSE estimator. In [318], a GAN maps a low-dimensional channel space into a high-dimensional one, reducing the number of pilots in an OFDM system. After training, the designed network could track the CIR over different channels, outperforming the LMMSE and ChannelNet estimators.
General regression neural network
The GRNN has been proposed as an enhanced version of the RBFN founded on nonparametric regression [319, 320, 328]. The network falls into the probabilistic NN category. The GRNN architecture comprises four layers, known as the input, pattern, summation, and output layers, as shown in Fig. 21. The input and output layers are classical NN structures. The pattern layer is the only learning layer of the network, and it is fully connected to the neurons of the input layer [328]. The pattern-layer outputs are fully connected to the s-summation and d-summation neurons of the summation layer: the former computes the weighted sum of the pattern outputs and the latter their unweighted sum. Finally, the output layer divides the s-summation result by the d-summation result.
This approach has been applied to channel estimation using partial CSI obtained from data-aided decision-feedback channel estimation, yielding more accurate interpolation [319, 320, 329]. The pattern layer includes the radius of the radial basis function, which controls the smoothness of the regression results. The summation layer forms two sums of the pattern outputs, one weighted by the desired (target) values and one unweighted, which the output layer then combines. This network was first applied to time-domain smoothing [319] and then extended to a frequency-domain strategy [320]. The latter outperformed both the former and conventional pilot-aided channel estimation.
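The GRNN computation described above reduces to a ratio of kernel sums, \(\hat{y}(x) = \sum_i y_i \exp\!\left(-\lVert x - x_i\rVert^2/(2\sigma^2)\right) / \sum_i \exp\!\left(-\lVert x - x_i\rVert^2/(2\sigma^2)\right)\), where \(\sigma\) is the radius (spread) of the pattern-layer units. The pure-Python sketch below applies it to interpolating channel estimates across subcarriers; the scalar subcarrier input, complex targets, Gaussian kernel, and pilot values are assumptions for illustration, not the exact configuration of [319, 320].

```python
"""Minimal GRNN interpolation of pilot channel estimates (pure Python).

Illustrative assumptions: the input is a scalar subcarrier index, targets are
complex channel estimates at pilot subcarriers, and every pattern unit uses a
Gaussian kernel with the same spread sigma.
"""
import math


def grnn_predict(x, train_x, train_y, sigma):
    # Pattern layer: one Gaussian unit per stored sample (squared distances).
    d2 = [(x - xi) ** 2 for xi in train_x]
    # Subtracting the smallest distance rescales every unit by the same factor,
    # so the final ratio is unchanged but tiny sigmas cannot underflow to 0/0.
    d2_min = min(d2)
    pattern = [math.exp(-(d - d2_min) / (2 * sigma ** 2)) for d in d2]
    # Summation layer: s-summation weights each unit by its target value,
    # d-summation adds the unit outputs without weights.
    s_sum = sum(p * y for p, y in zip(pattern, train_y))
    d_sum = sum(pattern)
    # Output layer: divide the s-summation by the d-summation.
    return s_sum / d_sum


if __name__ == "__main__":
    pilots = [0, 4, 8, 12]                                       # hypothetical pilot subcarriers
    h_pilots = [1.0 + 0.0j, 0.8 + 0.2j, 0.5 + 0.4j, 0.3 + 0.5j]  # hypothetical LS estimates
    for sigma in (0.5, 2.0, 100.0):
        est = [grnn_predict(k, pilots, h_pilots, sigma) for k in range(16)]
        print(sigma, [f"{h:.2f}" for h in est[:6]])
```

A small \(\sigma\) makes each subcarrier copy its nearest pilot, whereas a very large \(\sigma\) flattens the output toward the mean of all pilot estimates; this is the smoothness control attributed to the pattern-layer radius in the text.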
Fuzzy neural network
Fuzzy logic has been used to build a fuzzy controller that periodically adjusts the step size of an LMS algorithm for OFDM systems [321]. Under different channel conditions, the results showed faster convergence and more robust tracking of channel variations than the standard LMS. Furthermore, a functional-link FNN estimator was developed [322]. The network integrates a functional-link NN with fuzzy rules, each of which corresponds to a sub-functional-link NN with a functional expansion of the input variables. Its performance was close to that of the MMSE estimator.
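The idea of letting a fuzzy controller set the LMS step size can be sketched as follows: large a-priori errors (e.g., right after the channel changes) push the step size up for fast tracking, and small errors pull it down to reduce steady-state noise. The two membership functions, the rule outputs `MU_SMALL` and `MU_LARGE`, the reference error `E_REF`, and the single-tap QPSK setup are hypothetical choices, not the controller of [321]; here the step size is also updated every symbol rather than periodically.

```python
"""LMS tracking of a single-tap channel with a fuzzy step-size controller.

Illustrative assumptions (not the controller of [321]): unit-modulus QPSK
training symbols, one complex channel tap, two fuzzy sets on the error
magnitude, zero-order Sugeno rules, and weighted-average defuzzification.
"""
import math
import random

MU_SMALL, MU_LARGE, E_REF = 0.01, 0.3, 0.5   # hypothetical tuning constants
QPSK = [complex(i, q) / math.sqrt(2) for i in (-1, 1) for q in (-1, 1)]


def fuzzy_step_size(err_mag):
    """Rules: 'error small -> MU_SMALL', 'error large -> MU_LARGE'."""
    large = min(1.0, err_mag / E_REF)     # ramp membership of "large"
    small = 1.0 - large                   # complementary membership of "small"
    return (small * MU_SMALL + large * MU_LARGE) / (small + large)


def track_channel(h_of_n, n_symbols, noise_std, seed=0):
    """Run LMS over n_symbols training symbols; h_of_n(n) gives the true tap."""
    rng = random.Random(seed)
    h_hat = 0j
    for n in range(n_symbols):
        x = rng.choice(QPSK)
        noise = complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
        y = h_of_n(n) * x + noise
        e = y - h_hat * x                                      # a-priori error
        h_hat += fuzzy_step_size(abs(e)) * e * x.conjugate()   # complex LMS update
    return h_hat


if __name__ == "__main__":
    h1, h2 = 0.8 + 0.3j, -0.5 + 0.6j      # hypothetical taps before/after a jump

    def channel(n):
        return h1 if n < 1000 else h2

    print("final estimate:", track_channel(channel, 2000, noise_std=0.05), "true:", h2)
```

With a fixed small step, the estimate would follow the channel jump slowly; with a fixed large step, it would stay noisier in steady state. The fuzzy rules blend the two regimes according to the current error magnitude.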
Training reduction techniques for neural networks
Regarding the training approach, the least mean error algorithm was applied to an NN with two subnetworks to identify amplitude gain and phase variation [323]. Moreover, the LS algorithm was integrated into a black-box NN [326]: the LS estimator first estimates the channel at the pilot subcarriers, and the network then uses these estimates to predict the channel response at the data subcarriers. This approach can be seen as an ML interpolation strategy, with results similar to those of the MMSE estimator and some of the other NNs discussed. A similar channel estimation method is found in [330], which uses a multiple-variable regression approach to design an ML algorithm that requires no initial information or statistics about the channel. It uses the SGD algorithm for parameter optimization. Compared with the LS and MMSE estimators, this proposal outperformed the conventional estimators while approaching the performance of perfect channel estimation.
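For reference, the conventional two-step baseline described above (LS at the pilots, then interpolation to the data subcarriers) can be sketched as follows; in [326], the second step is performed by an NN instead of the linear interpolation used here. The single OFDM symbol, sorted pilot positions, and edge handling (holding the nearest pilot estimate) are assumptions of this sketch.

```python
"""LS channel estimates at pilot subcarriers followed by linear interpolation.

Illustrative assumptions: one OFDM symbol, known pilot symbols, pilot indices
sorted in ascending order, and subcarriers outside the pilot span held at the
nearest pilot estimate.
"""


def ls_at_pilots(y, x, pilot_idx):
    """LS estimate H_p = Y_p / X_p on each pilot subcarrier."""
    return [y[k] / x[k] for k in pilot_idx]


def interpolate_linear(pilot_idx, h_pilots, n_subcarriers):
    """Linearly interpolate pilot estimates onto all subcarriers."""
    h = []
    for k in range(n_subcarriers):
        if k <= pilot_idx[0]:
            h.append(h_pilots[0])
        elif k >= pilot_idx[-1]:
            h.append(h_pilots[-1])
        else:
            # Index j of the last pilot at or below k: pilot_idx[j] <= k < pilot_idx[j + 1].
            j = max(i for i, p in enumerate(pilot_idx) if p <= k)
            p0, p1 = pilot_idx[j], pilot_idx[j + 1]
            w = (k - p0) / (p1 - p0)
            h.append((1 - w) * h_pilots[j] + w * h_pilots[j + 1])
    return h


if __name__ == "__main__":
    n_sub, pilots = 16, [0, 3, 6, 9, 12, 15]
    h_true = [complex(1.0, 0.5) + complex(-0.02, 0.03) * k for k in range(n_sub)]
    x = [complex(1, -1) / 2 ** 0.5] * n_sub          # hypothetical pilot symbols
    y = [h_true[k] * x[k] for k in range(n_sub)]      # noiseless received samples
    h_est = interpolate_linear(pilots, ls_at_pilots(y, x, pilots), n_sub)
    print(max(abs(a - b) for a, b in zip(h_est, h_true)))
```

For a channel that varies linearly across the band, the interpolated estimate is exact in the noiseless case; an NN-based interpolator aims to do better when the frequency response is strongly nonlinear between pilots.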
The k-means clustering algorithm was proposed to support a semi-blind channel estimator for cell-free mMIMO [325]. The algorithm clusters the received signal to optimize the channel estimation process. Meanwhile, meta-learning has been exploited in a two-stage method named robust channel estimation with meta-learning neural networks (RoemNet) for OFDM symbols [324]. The proposed network can learn general characteristics from multiple channels, gathering meta-knowledge for training purposes. Furthermore, this approach allows RoemNet to be applied to different unknown channels and its weights to be refined quickly with a few pilot symbols through the meta-update process. RoemNet has proved able to learn and estimate the channel better with a few pilots, outperforming the MMSE estimator, although increasing the number of pilots leads to similar results. It was also shown that RoemNet trained with 8-pilot sequences yields a lower BER than the LS estimator with 128-pilot sequences.
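A minimal sketch of how clustering can support semi-blind estimation: with a flat channel \(h\) and QPSK data, the received samples form four clusters centered at \(h s_m\), so k-means centroids reveal \(h\) from unlabeled data, while a single known pilot seeds the centroids and fixes their labeling. The single-antenna flat channel, the pilot-seeded initialization, and all numeric values are illustrative assumptions, not the cell-free mMIMO scheme of [325].

```python
"""Semi-blind single-tap channel estimation with k-means on received QPSK samples.

Illustrative assumptions (not the scheme of [325]): one receive antenna, a
flat channel h, QPSK data, and one known pilot that seeds the centroids and
resolves the phase/label ambiguity of the clusters.
"""
import math
import random

QPSK = [complex(i, q) / math.sqrt(2) for i in (-1, 1) for q in (-1, 1)]


def kmeans_channel_estimate(y_data, y_pilot, s_pilot, iters=20):
    h0 = y_pilot / s_pilot                       # rough pilot-only LS estimate
    centroids = [h0 * s for s in QPSK]           # seed one centroid per symbol
    for _ in range(iters):
        clusters = [[] for _ in QPSK]
        for y in y_data:                         # assignment step
            m = min(range(len(QPSK)), key=lambda i: abs(y - centroids[i]))
            clusters[m].append(y)
        for m, pts in enumerate(clusters):       # update step
            if pts:                              # keep old centroid if empty
                centroids[m] = sum(pts) / len(pts)
    # Each centroid approximates h * s_m, so average the per-cluster ratios.
    return sum(c / s for c, s in zip(centroids, QPSK)) / len(QPSK)


if __name__ == "__main__":
    rng = random.Random(3)
    h = complex(0.8 * math.cos(0.6), 0.8 * math.sin(0.6))   # hypothetical channel

    def noise():
        return complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1))

    y_data = [h * rng.choice(QPSK) + noise() for _ in range(400)]
    y_p = h * QPSK[0] + noise()
    print("pilot-only:", y_p / QPSK[0], "k-means:", kmeans_channel_estimate(y_data, y_p, QPSK[0]), "true:", h)
```

Because each centroid averages many data samples, the k-means estimate is typically much less noisy than the pilot-only estimate used for seeding.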
Complexity discussion
Regarding complexity, GANs can reduce it during training while improving performance compared with residual NNs [315, 316]. Additionally, the GAN-based estimator proposed in [316] does not require retraining even if the number of clusters and rays changes considerably, and it lowers the number of necessary pilot tones. Complexity-wise, the networks in [316, 318] have the lowest values when compared with the LS and LMMSE estimators. Meanwhile, the complexity of the network algorithm in [318] grows linearly with the number of pilots, whereas that of the MMSE estimator grows cubically. The FNN in [322] improved performance but could not reduce complexity relative to well-known estimators. In [321], the FNN showed a steeper learning curve than MSE but slightly increased the computational load. As an example of its computational cost, the GRNN in [319] demanded only 0.0534 ms of processing time for channel estimation at an SNR of 30 dB to achieve a BER of \(1.2 \times 10^{-4}\). However, the trade-off between performance and complexity remained: reaching a BER of \(1.8 \times 10^{-5}\) required 0.4206 ms. GRNN can also ease this trade-off in other NN-based estimation methods. For example, GRNN replaces an ANN in [320], eliminating the iterative training process and reducing the computational complexity as the BER decreases. Other techniques, such as least mean error, meta-learning, k-means clustering, and LS, focus on reducing the training overhead so that less computation is required.