Modified state activation functions of deep learning-based SC-FDMA channel equalization system
EURASIP Journal on Wireless Communications and Networking, volume 2023, Article number: 115 (2023)
Abstract
The most important function of deep learning (DL) channel equalization and symbol detection systems is the ability to predict the user's original transmitted data. Generally, the behavior and performance of deep artificial neural networks (DANNs) rely on three main aspects: the network structure, the learning algorithm, and the activation functions (AFs) used in each node of the network. Long short-term memory (LSTM) recurrent neural networks have shown some success in channel equalization and symbol detection. The AFs used in a DANN play a significant role in how the learning algorithm converges. This article shows how modifying the AFs used in the tanh units (block input and output) of the LSTM units can significantly boost the DL equalizer's performance. Additionally, the learning process of the DL model was optimized with two distinct error-measuring functions: the default (cross-entropy) and the sum of squared errors (SSE). The DL model's performance with different AFs is compared using three distinct learning algorithms: Adam, RMSProp, and SGdm. The findings demonstrate that the most frequently used AFs (the sigmoid and hyperbolic tangent functions) do not necessarily yield the best network behavior in channel equalization, whereas several less common AFs can outperform them. Furthermore, the outcomes demonstrate that the recommended loss function (SSE) performs better on the channel equalization problem than the default loss function (cross-entropy).
1 Introduction
Over the past few years, providing customers with access to broadband wireless communication services has become the top priority for businesses. As a result, researchers have focused on developing new wireless technologies that can handle high data rates while remaining unaffected by radio frequency (RF) impairments. In recent years, multicarrier orthogonal frequency division multiple access (OFDMA) schemes have emerged as the dominant principle for broadband wireless applications due to their high spectral efficiency obtained by selecting a special set of overlapping orthogonal subcarriers [1].
It is challenging to perfectly recover the transmitted data at the receiver because of the significant intersymbol interference (ISI) that arises between successive high-rate symbols in the multipath environment of wireless channels. Solving the ISI problem is therefore crucial for wireless communication systems, and robust channel equalization techniques are indispensable for reducing its harmful effects.
The objective of channel equalization is to make the cascade of the channel and the equalizer have a nearly flat frequency-domain (FD) response, thereby minimizing or eliminating the negative effects of ISI in multipath fading channels. Various types of equalizers, including linear and nonlinear equalizers, are used in digital broadband wireless communication receivers [2].
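As a minimal illustration of this objective, the sketch below computes per-subcarrier zero-forcing (ZF) and minimum mean squared error (MMSE) equalizer taps for a hypothetical three-tap channel and checks the cascade response. It assumes unit-power symbols and white noise of variance 1/SNR; the channel taps, the 16-subcarrier grid, and the 10 dB SNR are illustrative values, not the paper's settings.

```python
import numpy as np

def zf_taps(H):
    """Zero-forcing: invert the channel on each subcarrier, W = 1/H."""
    return 1.0 / H

def mmse_taps(H, snr_linear):
    """MMSE: W = H* / (|H|^2 + 1/SNR), assuming unit-power symbols and
    white noise of variance 1/SNR on every subcarrier."""
    return np.conj(H) / (np.abs(H) ** 2 + 1.0 / snr_linear)

# Hypothetical 3-tap channel impulse response evaluated on 16 subcarriers.
h = np.array([1.0, 0.5, 0.2])
H = np.fft.fft(h, 16)

W_zf = zf_taps(H)
W_mmse = mmse_taps(H, snr_linear=10 ** (10 / 10))  # 10 dB

# Cascade (channel times equalizer) on each subcarrier.
cascade_zf = H * W_zf        # exactly flat: all ones
cascade_mmse = H * W_mmse    # real and slightly below one: |H|^2 / (|H|^2 + 1/SNR)

print(np.allclose(cascade_zf, 1.0))
print(np.round(cascade_mmse.real, 3))
```

ZF makes the cascade exactly flat, while MMSE deliberately stops short of full inversion on weak subcarriers to limit noise amplification.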
Furthermore, channel equalization can be viewed as a classification problem in which the equalizer is built as a decision-making device that reconstructs the symbol sequence with the highest possible accuracy [3]. Artificial neural networks (ANNs) can handle complex classification tasks because they can form arbitrary nonlinear decision boundaries [3, 4]. In general, ANN equalizers outperform conventional linear and nonlinear equalizers in terms of equalization performance and symbol error rate (SER) [5,6,7,8].
Machine learning (ML) [9, 10] techniques, especially deep learning (DL) ANN-based methods, have developed significantly and help solve numerous challenging problems, including face recognition [11, 12], image synthesis and semantic manipulation [13], sentiment classification [14], image recovery [15], digital image augmentation [16], and many others. DL uses different kinds of neural networks, such as convolutional neural networks (CNNs) [17, 18], multilayer perceptrons (MLPs) [19], and recurrent neural networks (RNNs) [20], to learn abstract features from data. Additionally, the availability of high-speed computational power and the effectiveness of DL in different fields have prompted its use in developing robust broadband wireless communication systems [21, 22]. Numerous researchers have proposed using DL in the design of broadband wireless communication systems and have reported improved bit error rate (BER) results. In this regard, deep ANNs have recently received considerable attention in channel equalization because of their ability to learn nonlinear mappings between the input and output domains [3, 23, 24].
In this case, the deep ANN approach is a good choice among the available channel equalization options. However, there are still some concerns and questions that require answers, such as the following:

1.
Is it possible to improve the equalization performance of the DL model, in terms of BER, by changing the activation functions (AFs)?

2.
Is it possible to improve the learning process by varying the loss function, and how does this affect the robustness and efficiency of the proposed DL model?
1.1 Motivations and contributions
Hochreiter and Schmidhuber [25] proposed long short-term memory (LSTM), an RNN architecture that has been shown to work efficiently on various learning problems, particularly those involving sequential data [26]. The LSTM structure contains blocks, which are sets of recurrently interconnected nodes. In RNNs, the gradient of the error function can grow or decay exponentially over time; the decaying case is known as the vanishing gradient problem. LSTMs reconfigure their network units to address this issue. Each LSTM block consists of one or more self-connected memory cells together with multiplicative input, forget, and output gates. The gates improve performance by allowing the memory cells to store and retrieve information over longer periods [26].
LSTMs and bidirectional LSTMs have had a considerable impact in a wide range of applications, particularly classification. For example, these networks are used in online mode detection [27], sound classification [28, 29], and handwriting recognition [30, 31]. LSTMs are also used for speech synthesis [32], acoustic modeling [33], emotion identification [34], and speech translation [35]. Moreover, these networks are used for protein structure prediction [36, 37], language modeling [38], human activity analysis [39], and video and audio data processing [40], and they have been successfully applied in 5G wireless communication systems [41,42,43].
In general, a neural network's performance depends on several aspects, including the network's structure, the learning algorithm, and the AFs used in each node. AFs have received less attention in neural network research than learning algorithms and architectures [44,45,46], although they are very important because they help the network learn abstract features through nonlinear transformations [46]. The choice of AF determines the decision boundaries as well as the total input and output signal strength of each node. Choosing the right AFs affects how well networks perform, how complex they are, and how well the learning algorithms converge [45, 47].
Throughout this work, we formulate the channel equalization problem as a DL task for single-carrier FDMA (SC-FDMA), a modified version of orthogonal frequency division multiple access (OFDMA) that has a lower peak-to-average power ratio (PAPR) than OFDMA and is used for uplink (UL) transmission in the long-term evolution (LTE) standard. In the DL model, the channel equalization and signal detection processes are treated as a black box whose function is approximated by a DNN model based on the recurrent LSTM neural network (LSTM-NN). This model performs equalization and symbol decoding simultaneously, without any knowledge of the channel state information (CSI). The DL model takes features from the received SC-FDMA signals and labels them according to the constellation map used at the transmitter.
In this study, we evaluate several AFs intended to improve the learning process of the DL model by mitigating the vanishing gradient problem, leading to more accurate classification than the conventional choice. These AFs replace the "tanh" AF currently used at the LSTM block's input and output, which is known as the state activation function (SAF). Using the modified LSTM DNNs, we build a reliable SC-FDMA wireless communication system. Simulation results demonstrate that the proposed scheme outperforms other widely employed equalization schemes in terms of bit error rate (BER), illustrating the value of DL in SC-FDMA systems.
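The PAPR advantage of SC-FDMA over OFDMA can be checked numerically. The sketch below compares the mean PAPR of the two schemes; the only difference between them is the N-point DFT spreading applied before subcarrier mapping. It assumes QPSK symbols, localized mapping of N = 16 user subcarriers onto an M = 64-point IFFT, and no pulse shaping; all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, trials = 16, 64, 2000        # hypothetical: N user subcarriers, M-point IFFT

def qpsk(n):
    """n random unit-power QPSK symbols."""
    bits = rng.integers(0, 2, size=(n, 2))
    return ((1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])) / np.sqrt(2)

def papr_db(x):
    """Peak-to-average power ratio of one time-domain block, in dB."""
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

papr_ofdma, papr_scfdma = [], []
for _ in range(trials):
    s = qpsk(N)
    # OFDMA: data symbols placed directly on N contiguous subcarriers.
    X = np.zeros(M, dtype=complex)
    X[:N] = s
    papr_ofdma.append(papr_db(np.fft.ifft(X)))
    # SC-FDMA (localized): N-point DFT spreading before the same mapping.
    X = np.zeros(M, dtype=complex)
    X[:N] = np.fft.fft(s)
    papr_scfdma.append(papr_db(np.fft.ifft(X)))

print(f"mean PAPR, OFDMA  : {np.mean(papr_ofdma):.2f} dB")
print(f"mean PAPR, SC-FDMA: {np.mean(papr_scfdma):.2f} dB")
```

Because the DFT spreading makes the transmitted waveform an interpolated version of the single-carrier QPSK sequence, its envelope fluctuates less than the sum of independently modulated subcarriers.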
In summary, our contributions are:

1.
We construct a novel LSTM network that uses different SAFs in the equalization and symbol detection process as alternatives to the conventional hyperbolic tangent (tanh) function.

2.
We construct a reliable and efficient SC-FDMA receiver that performs channel equalization and symbol detection jointly and implicitly.

3.
We evaluate the influence of alternative optimization algorithms, namely Adam, RMSProp, and SGdm, on the learning stage of the proposed network, and consequently on the equalization and symbol detection performance of the deep network, in order to produce the most efficient and reliable model.

4.
We assess the effects of different loss functions, namely cross-entropy and the sum of squared errors, on the learning process, and how they affect the robustness and efficiency of the proposed model.

5.
We compare the performance of the proposed framework with that of linear equalizers (LEs) such as zero-forcing (ZF) and minimum mean squared error (MMSE).

6.
To evaluate how well the proposed DL model works, we compare its BER performance with that of existing NN-based blind equalization algorithms, namely the convolutional neural network-based (CNN-based) blind equalization algorithm described in [48] and the BiLSTM-based equalization algorithm described in [24].
The remainder of the paper is organized as follows. Sect. 2 describes the methods, including the system model, the DL model, and the activation functions. Sect. 3 introduces the offline training of the suggested scheme. The results and discussion are presented in Sect. 4. Finally, Sect. 5 concludes the study.
2 Methods
2.1 System model
Figure 1 shows the proposed SC-FDMA system according to [49]. The system has \(M\) subcarriers in total. Each of the \(N_u\) users is assigned \(N\) subcarriers, so that \(M = N_u \times N\); this mapping is performed right after the \(N\)-point FFT. Following the \(M\)-point IFFT, a cyclic prefix (CP) of length \(L_{cp}\), equal to or greater than the length \(L_{ch}\) of the channel impulse response, is inserted. The time domain (TD) transmitted signal of the \(k\)th user, without the CP, is given in vector form by \(g_k = F_M^H T_k F_N s_k\), where \(s_k\) is the \(k\)th user's \(N \times 1\) symbol vector, \(T_k\) is the \(M \times N\) subcarrier mapping matrix, and \(F_N\) and \(F_M^H\) are the \(N \times N\) FFT and \(M \times M\) IFFT matrices, respectively. Assume that \(h_k\) is the \(L_{ch} \times 1\) impulse response of the channel between the \(k\)th user and the base station, with maximum delay spread \(L_{ch}\) no longer than \(L_{cp}\), so that the ISI between SC-FDMA blocks is completely eliminated. At the receiving side, the process is reversed. The CP is first removed, after which the SC-FDMA symbols are transformed into the FD by an \(M\)-point FFT, followed by subcarrier demapping to extract the FD received signal of the \(k\)th user. The FD received signal is then equalized using any conventional technique, such as those in [49], to mitigate the effects of the ISI. Finally, after the \(N\)-point IFFT back to the TD, the symbols are demodulated to recover the \(k\)th user's original transmitted data.
Instead of traditional channel equalization techniques, the proposed method uses a DNN model. This yields an end-to-end approach that retrieves the original information directly from the received signal, without explicitly handling the intricacies of channel equalization and symbol detection.
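To make the transmit and receive chain concrete, the following NumPy sketch builds \(g_k = F_M^H T_k F_N s_k\) with explicit unitary DFT matrices and a localized mapping matrix, adds a cyclic prefix, passes the signal through a short multipath channel, and reverses the process at the receiver. All sizes, the user index, and the channel taps are hypothetical. The channel is noiseless so the identity can be checked exactly, and a per-subcarrier ZF division stands in for the conventional FD equalizer.

```python
import numpy as np

def dft_matrix(n):
    """Unitary n-point DFT matrix: F[k, m] = exp(-2j*pi*k*m/n) / sqrt(n)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)

def mapping_matrix(M, N, user):
    """Localized M x N mapping T_k: user k occupies subcarriers k*N ... k*N+N-1."""
    T = np.zeros((M, N))
    T[user * N + np.arange(N), np.arange(N)] = 1.0
    return T

rng = np.random.default_rng(1)
N, Nu, Lcp = 4, 4, 3                 # hypothetical sizes; M = Nu * N
M, k = Nu * N, 1                     # hypothetical user index k
s = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)  # QPSK

# Transmitter: g_k = F_M^H T_k F_N s_k
F_N, F_M = dft_matrix(N), dft_matrix(M)
T = mapping_matrix(M, N, k)
g = F_M.conj().T @ T @ F_N @ s

# Add the CP, pass through an L_ch-tap channel (L_ch <= L_cp), then drop the CP.
h = np.array([0.8, 0.5, 0.3])        # hypothetical channel, L_ch = 3
tx = np.concatenate([g[-Lcp:], g])
rx = np.convolve(tx, h)[Lcp:Lcp + M]  # equals the circular convolution of g and h

# Receiver: M-point FFT, then subcarrier demapping.
Y = T.T @ (F_M @ rx)
H = np.fft.fft(h, M)[k * N:(k + 1) * N]   # channel gains seen by user k

# The CP turns the channel into one complex gain per subcarrier, so
# ZF equalization followed by the N-point IDFT recovers s exactly.
s_hat = F_N.conj().T @ (Y / H)
print(np.allclose(s_hat, s))          # True
```

The matrix form is convenient for checking the algebra; in practice the same chain is implemented with FFTs, since \(F_N s\) equals `np.fft.fft(s) / np.sqrt(N)` under this normalization.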
2.2 DL model
This subsection describes the LSTM NN structure used as the DL model for joint channel equalization and symbol detection. The proposed DL LSTM-based channel equalizer is trained offline using simulated data.
The LSTM network is a type of recurrent neural network that can learn long-term dependencies between the time steps of sequence data [25]. Various LSTM-based systems have been designed for tasks such as speech recognition and handwriting recognition [50,51,52,53]. Figure 2 shows the single-cell LSTM block, which is a collection of recurrently interconnected nodes.
At time \(t\), the input vector \(x_t\) is fed into the network, and the LSTM-NN is described by the following six equations, as in [54]:
\[ i_t = \sigma_g\left(w_i x_t + R_i h_{t-1} + b_i\right) \tag{1} \]
\[ f_t = \sigma_g\left(w_f x_t + R_f h_{t-1} + b_f\right) \tag{2} \]
\[ g_t = \sigma_c\left(w_g x_t + R_g h_{t-1} + b_g\right) \tag{3} \]
\[ o_t = \sigma_g\left(w_o x_t + R_o h_{t-1} + b_o\right) \tag{4} \]
\[ c_t = f_t \odot c_{t-1} + i_t \odot g_t \tag{5} \]
\[ h_t = o_t \odot \sigma_c\left(c_t\right) \tag{6} \]
where \(i\), \(o\), and \(f\) denote the input, output, and forget gates, respectively. The forget and input gates enable the LSTM NN to store long-term memory effectively. The input gate determines which information, based on the current input \(x_t\) and the previous cell output \(h_{t-1}\), is combined with the previous cell state \(c_{t-1}\) to obtain the new cell state \(c_t\). The output gate determines the current cell output \(h_t\) from the current cell state \(c_t\), the current input \(x_t\), and the previous cell output \(h_{t-1}\). The forget gate decides which information to discard, based on the current input \(x_t\) and the previous cell output \(h_{t-1}\). Using the forget and input gates, the LSTM decides which information is discarded and which is retained. \(g_t\), defined in Eq. 3, is the block input (cell candidate) at time \(t\); it is a tanh layer, and together with the input gate in Eq. 5 it determines the new information stored in the cell state. \(c_t\) is the cell state at time \(t\), updated from the old cell state \(c_{t-1}\) in Eq. 5. Finally, \(h_t\) is the cell output (block output) at time \(t\).
The block output \(h_t\) is recurrently connected back to the block input \(g_t\) and to all of the gates (\(i\), \(o\), and \(f\)). \(\sigma_g\) and \(\sigma_c\) denote the gate activation function (sigmoid) and the state activation function (tanh), respectively, and \(\odot\) denotes the Hadamard product (element-wise multiplication). \(W = [w_i\; w_f\; w_g\; w_o]^T\), \(b = [b_i\; b_f\; b_g\; b_o]^T\), and \(R = [R_i\; R_f\; R_g\; R_o]^T\) are the input weights, the biases, and the recurrent weights, respectively.
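The cell described above can be written as a single forward step in which the state activation \(\sigma_c\) is a parameter, which is exactly the substitution studied in this paper. The NumPy sketch below is illustrative only: the gate blocks of W, R, and b are stacked in the [i, f, g, o] order used above, the weights are random, the sizes are hypothetical, and softsign (assumed form \(x/(1+|x|)\)) is shown as one example replacement for tanh. It is not the authors' trained model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softsign(x):
    # Example alternative SAF (assumed form x / (1 + |x|)).
    return x / (1.0 + np.abs(x))

def lstm_step(x_t, h_prev, c_prev, W, R, b, state_act=np.tanh, gate_act=sigmoid):
    """One LSTM time step.

    W: (4H, D) input weights, R: (4H, H) recurrent weights, b: (4H,) biases,
    with gate blocks stacked in the order [i, f, g, o].
    state_act plays the role of sigma_c; gate_act plays the role of sigma_g.
    """
    z = W @ x_t + R @ h_prev + b
    zi, zf, zg, zo = np.split(z, 4)
    i_t = gate_act(zi)                 # input gate
    f_t = gate_act(zf)                 # forget gate
    g_t = state_act(zg)                # block input / cell candidate
    o_t = gate_act(zo)                 # output gate
    c_t = f_t * c_prev + i_t * g_t     # cell-state update (Hadamard products)
    h_t = o_t * state_act(c_t)         # block output
    return h_t, c_t

rng = np.random.default_rng(0)
D, Hn, T = 8, 5, 10                    # hypothetical input size, hidden units, time steps
W = rng.normal(0.0, 0.3, (4 * Hn, D))
R = rng.normal(0.0, 0.3, (4 * Hn, Hn))
b = np.zeros(4 * Hn)
xs = rng.normal(size=(T, D))

for act in (np.tanh, softsign):
    h, c = np.zeros(Hn), np.zeros(Hn)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, R, b, state_act=act)
    print(act.__name__, np.round(h, 3))
```

Note that \(\sigma_c\) appears twice, once on the cell candidate and once on the cell state before the output gate, so swapping the SAF changes both the block input and the block output, as described in the text.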
2.3 Activation functions
The sigmoid and hyperbolic tangent functions are the most frequently used activation functions in neural networks. However, a number of separate studies have looked into other activation functions [44,45,46].
In this article, we examine how well the DNN LSTM performs when these alternative activation functions replace the state activation function (the hyperbolic tangent, tanh) of the basic LSTM block for joint channel equalization and symbol detection in SC-FDMA wireless communication systems. Table 1 lists the activation functions considered: tanh, Gaussian, GELU, Cloglogm, Modified Elliott, Elliott, Bitanh1, Bitanh2, Rootsig, Softsign, Wave, and Aranda [44,45,46,47, 54,55,56,57,58,59].
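For reference, the sketch below implements several of the listed functions in NumPy using forms commonly given in the literature: softsign \(x/(1+|x|)\), Gaussian \(e^{-x^2}\), Wave \((1-x^2)e^{-x^2}\), Rootsig \(x/(1+\sqrt{1+x^2})\), and the exact GELU \(x\,\Phi(x)\). These definitions are our assumptions; Table 1 is authoritative and may use different scalings. The remaining functions (Elliott, Modified Elliott, Bitanh1, Bitanh2, Cloglogm, Aranda) are omitted because their exact forms are not reproduced here.

```python
import numpy as np
from math import erf

_erf = np.vectorize(erf)  # element-wise error function without requiring SciPy

def tanh(x):
    return np.tanh(x)

def softsign(x):
    return x / (1.0 + np.abs(x))

def gaussian(x):
    return np.exp(-x ** 2)

def wave(x):
    return (1.0 - x ** 2) * np.exp(-x ** 2)

def rootsig(x):
    return x / (1.0 + np.sqrt(1.0 + x ** 2))

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return x * 0.5 * (1.0 + _erf(np.asarray(x) / np.sqrt(2.0)))

SAFS = {"tanh": tanh, "softsign": softsign, "gaussian": gaussian,
        "wave": wave, "rootsig": rootsig, "gelu": gelu}

x = np.linspace(-3, 3, 7)
for name, f in SAFS.items():
    print(f"{name:9s}", np.round(f(x), 3))
```

Each function is element-wise, so it can replace tanh at both the block input and the block output of an LSTM cell. Unlike tanh, the Gaussian and Wave functions are non-monotonic, which is one reason their behavior as SAFs is hard to predict without experiments.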
3 Offline training of the suggested DL model
Because the proposed model requires a lengthy training period and a large number of parameters (e.g., weights and biases) must be tuned during training, training is conducted offline. The trained model is then used to extract the transmitted data during online operation.
For most machine learning tasks, obtaining a large amount of labeled training data is a challenge. In contrast, training data for channel equalization can easily be obtained by simulation: once the channel model and its parameters are known, generating the training data is straightforward.
The neural networks are trained offline using simulated data. In each simulation, a random message s is generated, and the SC-FDMA frames are sent to the receiver through a simulated channel model. Each frame contains one SC-FDMA symbol. SC-FDMA frames with varying channel impairments are used to obtain the received SC-FDMA signals. After channel distortion and CP removal, the received signals y are collected as training samples. As shown in Fig. 1, the received signals y form the network's input data, and the actual information messages s act as the supervision labels.
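A minimal sketch of this data-generation loop is given below. Everything specific in it is an assumption for illustration: the sizes (N = 4 user subcarriers on an M = 64-point IFFT with a 16-sample CP), the random three-tap Rayleigh channel drawn for every frame, the 15 dB SNR, and the feature layout (real and imaginary parts of y stacked as a 2 × M sequence). The paper's actual settings are those in Table 2. The label is the base-4 index of the four transmitted QPSK symbols, giving 256 classes.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, Lcp, Ms = 4, 64, 16, 4       # hypothetical: user subcarriers, IFFT size, CP, QPSK order
QPSK = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)

def make_sample(snr_db):
    """Simulate one SC-FDMA frame and return (features, label)."""
    idx = rng.integers(0, Ms, N)                    # random message s as symbol indices
    X = np.zeros(M, dtype=complex)
    X[:N] = np.fft.fft(QPSK[idx])                   # DFT spreading + localized mapping
    x = np.fft.ifft(X)
    tx = np.concatenate([x[-Lcp:], x])              # add cyclic prefix
    h = (rng.normal(size=3) + 1j * rng.normal(size=3)) / np.sqrt(6)  # random channel
    rx = np.convolve(tx, h)[:tx.size]
    noise_var = np.mean(np.abs(rx) ** 2) / 10 ** (snr_db / 10)
    noise = rng.normal(size=rx.size) + 1j * rng.normal(size=rx.size)
    rx = rx + np.sqrt(noise_var / 2) * noise        # complex AWGN
    y = rx[Lcp:Lcp + M]                             # remove cyclic prefix
    features = np.stack([y.real, y.imag])           # 2 x M real-valued input
    label = int(np.dot(idx, Ms ** np.arange(N)))    # one of Ms**N = 256 classes
    return features, label

X_train, y_train = zip(*(make_sample(snr_db=15) for _ in range(1000)))
X_train, y_train = np.array(X_train), np.array(y_train)
print(X_train.shape, y_train.min(), y_train.max())
```

Because the channel is redrawn for every frame, the network must learn to equalize implicitly rather than memorize a single channel response.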
The same dataset is used to train and test all equalizers, whether CNN-based, BiLSTM-based, or LSTM-based with modified loss and SAFs.
Once the proposed LSTM-based channel equalizer and symbol detector with modified loss and SAFs has been constructed, as shown in Fig. 3, its weights and biases are tuned before deployment using an appropriate optimization algorithm.
Several optimization algorithms are used to obtain the best possible DL channel equalization and symbol detection model for the SC-FDMA wireless communication system, namely adaptive moment estimation (Adam), root mean square propagation (RMSProp), and stochastic gradient descent with momentum (SGdm).
To find the best parameters (weights and biases), a loss function measures how far the network output is from the desired output. The optimization algorithms train the model by minimizing this loss and updating the weights and biases until the optimal network parameters are reached.
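For readers unfamiliar with the three optimizers, their textbook update rules are sketched below and applied to a toy quadratic loss with a known minimizer \(w^* = [1, -2]\). The learning rates, step counts, and the toy problem are hypothetical, and exact gradients are used instead of minibatch estimates. Deep learning frameworks implement the same ideas but may differ in details such as where \(\epsilon\) enters or how momentum is scaled.

```python
import numpy as np

def sgdm(grad, w, lr=0.05, momentum=0.9, steps=500):
    """Gradient descent with momentum (heavy-ball form)."""
    v = np.zeros_like(w)
    for _ in range(steps):
        v = momentum * v - lr * grad(w)
        w = w + v
    return w

def rmsprop(grad, w, lr=0.01, rho=0.9, eps=1e-8, steps=500):
    """RMSProp: scale each step by a running RMS of past gradients."""
    s = np.zeros_like(w)
    for _ in range(steps):
        g = grad(w)
        s = rho * s + (1 - rho) * g ** 2
        w = w - lr * g / (np.sqrt(s) + eps)
    return w

def adam(grad, w, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam: momentum on the gradient plus RMSProp-style scaling, bias-corrected."""
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Toy loss L(w) = 0.5 * ||A w - y||^2, minimised at w* = [1, -2].
A = np.array([[3.0, 0.0], [0.0, 1.0]])
y = np.array([3.0, -2.0])

def grad(w):
    return A.T @ (A @ w - y)

for name, opt in [("SGdm", sgdm), ("RMSProp", rmsprop), ("Adam", adam)]:
    print(name, np.round(opt(grad, np.zeros(2)), 3))
```

SGdm accumulates a velocity from raw gradients, whereas RMSProp and Adam normalize each coordinate by its recent gradient magnitude, which is why the latter two take similar-sized steps along the steep and the shallow direction of the toy loss.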
In its simplest form, the loss function measures the difference between the network's output and the original messages, which can be expressed in a variety of ways. The loss functions used in our experiments are the cross-entropy and the sum of squared errors (SSE), which can be expressed as follows:
\[ \mathrm{Loss}_{\mathrm{CE}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{c} s_{ij}\,\ln \widehat{s}_{ij} \]
\[ \mathrm{Loss}_{\mathrm{SSE}} = \sum_{i=1}^{N}\sum_{j=1}^{c} \left(s_{ij}-\widehat{s}_{ij}\right)^{2} \]
where \(c\) is the number of classes, \(N\) is the number of samples, \(s_{ij}\) is the \(i\)th transmitted data sample for the \(j\)th class, and \(\widehat{s}_{ij}\) is the response of the modified DL SAF LSTM-based model for sample \(i\) and class \(j\).
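Both losses can be computed directly from one-hot targets \(s_{ij}\) and network outputs \(\widehat{s}_{ij}\), as in the short sketch below. It assumes that the cross-entropy is averaged over the N samples while the SSE is a plain sum, and it adds a small \(\epsilon\) to guard the logarithm; the two-sample example is made up for illustration.

```python
import numpy as np

def cross_entropy(S, S_hat, eps=1e-12):
    """-(1/N) * sum_i sum_j s_ij * ln(s_hat_ij); S is one-hot with shape (N, c)."""
    N = S.shape[0]
    return -np.sum(S * np.log(S_hat + eps)) / N

def sse(S, S_hat):
    """sum_i sum_j (s_ij - s_hat_ij)^2."""
    return np.sum((S - S_hat) ** 2)

S = np.eye(4)[[0, 2]]                      # N = 2 samples, c = 4 classes (one-hot)
S_hat = np.array([[0.7, 0.1, 0.1, 0.1],    # softmax-style network outputs
                  [0.2, 0.2, 0.5, 0.1]])
print(round(cross_entropy(S, S_hat), 4))   # 0.5249
print(round(sse(S, S_hat), 4))             # 0.46
```

The cross-entropy only looks at the probability assigned to the correct class, while the SSE also penalizes probability mass spread over the wrong classes, which is one intuitive reason the two can steer training differently.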
During offline training, we replace the SAF (the hyperbolic tangent, tanh) with each function in Table 1 to see how it affects the performance of our DL model during online operation.
After offline training, the model recovers the data automatically, without explicit channel estimation and symbol detection stages; both are accomplished jointly. Figure 4 shows the offline training procedure used to obtain the learned LSTM-NN-based DL model.
The most important limitation of the proposed system is the number of classes. Each user is allocated four subcarriers, and each subcarrier carries one of the four QPSK constellation points. The number of training labels is \(M_s^N\), where \(M_s\) is the constellation (modulation) order and \(N\) is the number of subcarriers allocated exclusively to a single user. Consequently, the training set has \(4^4 = 256\) labels, i.e., 256 classes, so the fully connected layer of the LSTM-NN must have size 256 to match the number of classes. The number of labels grows if higher-order modulations are used or if more subcarriers are allocated to each user, which in turn increases the number of classes and the size of the fully connected layer. Such a configuration requires a very large amount of data for effective training, increases the training time, and reduces usability, ultimately rendering the system impractical. We therefore recommend QPSK.
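The label arithmetic above can be sketched as follows: each user's N per-subcarrier QPSK indices are read as the digits of a base-\(M_s\) number, giving one class index in \(\{0, \dots, M_s^N - 1\}\). The digit ordering is an arbitrary choice for illustration. The loop at the end shows how quickly the class count grows with the modulation order or the number of subcarriers.

```python
import itertools

def to_label(symbol_indices, Ms=4):
    """Read the N per-subcarrier constellation indices as base-Ms digits
    (first subcarrier = least significant digit) and return one class index."""
    label = 0
    for d in reversed(symbol_indices):
        label = label * Ms + d
    return label

def from_label(label, N=4, Ms=4):
    """Inverse of to_label: recover the N constellation indices."""
    digits = []
    for _ in range(N):
        label, d = divmod(label, Ms)
        digits.append(d)
    return digits

Ms, N = 4, 4   # QPSK on four subcarriers per user, as in the text
labels = {to_label(c, Ms) for c in itertools.product(range(Ms), repeat=N)}
print(len(labels))   # 256 classes -> fully connected layer of size 256

for Ms_, N_ in [(4, 4), (16, 4), (4, 8)]:
    print(f"Ms={Ms_:2d}, N={N_}: {Ms_ ** N_} classes")
```

Moving to 16-QAM or to eight subcarriers per user each raises the class count from 256 to 65,536, which illustrates why the text recommends QPSK for this classification formulation.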
4 Results and discussions
Several experiments were carried out to demonstrate the efficiency of the proposed LSTM-based configurations with modified loss and state activation functions (SAFs, Table 1) for channel equalization and symbol detection in the SC-FDMA wireless communication system. The proposed DL-NN-based equalizer was trained offline with several learning optimizers, namely SGdm, RMSProp, and Adam [60], and compared with the conventional zero-forcing (ZF) and minimum mean square error (MMSE) linear equalizers and with the DL CNN-based and BiLSTM-based equalization algorithms [24, 48] in terms of bit error rate (BER) at different signal-to-noise ratios (SNRs), using the collected datasets. The training dataset is gathered for four subcarriers. The transmitter sends SC-FDMA packets to the receiver, each containing one SC-FDMA data symbol. The SC-FDMA system and channel specifications are listed in Table 2, and the DL LSTM NN architecture parameters and training settings are summarized in Table 3.
In these simulations, we also examined how well the proposed equalizer works with two different loss functions: the default (cross-entropy) and the sum of squared errors (SSE).
Instead of curves, which give a muddled picture because they overlap, we use heatmap visualizations, as shown in Fig. 5. A heatmap is a graphical representation of data that uses colors to represent values, so even a large amount of data can be visualized and understood quickly. Heatmaps make it easier to combine quantitative and qualitative data for analysis, give a quick overview of a model's performance, and support informed, data-based decisions; for example, the authors in [42] use heatmap charts in their published work.
First, we discuss the default (cross-entropy) loss function. For deep-fading channels, it is well known that linear equalization may amplify the noise at spectral nulls, which degrades the performance of the SC-FDMA system. Figure 5 shows that all the proposed modified DL SAF LSTM-based equalizers trained with the Adam learning algorithm and the cross-entropy loss function outperform both the ZF and the MMSE equalizers at SNRs from 10 to 20 dB. At 8 dB, all the proposed SAF LSTM-based equalizers outperform both the ZF and the MMSE equalizers except the GELU SAF, which outperforms only the ZF equalizer.
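The noise-enhancement effect can be quantified per subcarrier: after ZF the noise variance becomes \(\sigma^2/|H|^2\), whereas MMSE scales it by \(|W|^2 = |H|^2/(|H|^2+\sigma^2)^2\) at the cost of a residual bias. The sketch below evaluates both on a hypothetical two-tap channel \(h = [1, 0.95]\) whose frequency response has a deep null at the Nyquist subcarrier, assuming unit-power symbols, 16 subcarriers, and a 10 dB SNR.

```python
import numpy as np

snr_db = 10.0
noise_var = 10 ** (-snr_db / 10)      # unit-power symbols, so sigma^2 = 1/SNR
h = np.array([1.0, 0.95])             # hypothetical channel with a deep null at the Nyquist bin
H = np.fft.fft(h, 16)

# Post-equalization noise variance on each subcarrier.
zf_noise = noise_var / np.abs(H) ** 2                 # ZF:   W = 1/H
W_mmse = np.conj(H) / (np.abs(H) ** 2 + noise_var)    # MMSE: W = H* / (|H|^2 + sigma^2)
mmse_noise = noise_var * np.abs(W_mmse) ** 2

k_null = int(np.argmin(np.abs(H)))
print(f"|H| at subcarrier {k_null}: {np.abs(H[k_null]):.3f}")
print(f"noise gain there, ZF  : {zf_noise[k_null] / noise_var:.1f}x")
print(f"noise gain there, MMSE: {mmse_noise[k_null] / noise_var:.2f}x")
```

At the null, ZF multiplies the noise power by \(1/|H|^2 = 400\), while MMSE actually attenuates it; the price MMSE pays is that the symbol on that subcarrier is no longer fully restored, which still limits the BER of both linear equalizers on deep-fading channels.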
Figure 5 also shows that most of the proposed modified DL SAF LSTM-based equalizers give promising results compared with the one using the default (tanh) SAF. Furthermore, most of the proposed models demonstrate exceptional signal detection capability when the SNR exceeds 12 dB, where the measured BER is zero, which indicates the capability of the model.
Compared with alternative DL-based channel equalization systems, such as those based on CNN and BiLSTM [24, 48], the proposed modified DL SAF LSTM-based equalizers exhibit encouraging performance across most SNR levels, as shown in Fig. 5.
Figure 6 shows that the proposed modified DL Aranda, Gaussian, and Wave SAF LSTM-based equalizers trained with the RMSProp learning algorithm and the default (cross-entropy) loss function outperform both linear equalizers (ZF and MMSE) and the DL CNN-based equalizer at SNRs from 8 to 20 dB, outperform the DL LSTM-based model with the default SAF (tanh) at SNRs from 4 to 20 dB, and outperform the DL BiLSTM-based equalizer at low SNRs from 0 to 10 dB. Furthermore, Fig. 5 demonstrates that the proposed modified Aranda, Gaussian, Wave, Elliott, Modified Elliott, and Softsign SAF LSTM-based equalizers outperform the state-of-the-art CNN approach [48] over the entire SNR range.
In addition, Fig. 7 shows that the proposed modified DL SAF LSTM-based equalizers (Bitanh1, Cloglogm, Bitanh2, Rootsig, Softsign, Gaussian, Wave, and Elliott SAFs) trained with the SGdm learning algorithm and the default (cross-entropy) loss function outperform the linear equalizers (ZF and MMSE) and the DL model with the default SAF (tanh) at SNRs from 10 to 20 dB, and outperform the DL CNN-based equalizer over the entire SNR range. On the other hand, the DL BiLSTM-based equalizer performs approximately as well as the proposed DL Bitanh2 SAF LSTM-based equalizer. The proposed Aranda SAF gives the worst BER at all SNRs from 10 to 20 dB.
Second, for the sum of squared errors loss function, Fig. 8 shows that all of the proposed modified DL Cloglogm, Bitanh2, Modified Elliott, Wave, Softsign, Rootsig, Bitanh1, Elliott, and Aranda SAF LSTM-based equalizers trained with the Adam learning algorithm outperform both the ZF and the MMSE equalizers at SNRs from 10 to 20 dB. At an SNR of 8 dB, the proposed Cloglogm, Modified Elliott, Bitanh2, Softsign, Bitanh1, Rootsig, and Elliott SAFs perform better than the other proposed SAFs and the linear equalizers. Moreover, the proposed Modified Elliott, Bitanh2, Softsign, and Rootsig SAF LSTM-based equalizers outperform the DL LSTM-based model that uses the default SAF (tanh) over the entire SNR range.
In contrast, in this case the other DL-based channel equalization systems, the CNN-based and BiLSTM-based approaches [24, 48], have the worst BER over the entire SNR range, as shown in Fig. 8.
In addition, as shown in Fig. 9, the proposed modified DL Rootsig, Elliott, Cloglogm, Bitanh2, Softsign, Bitanh1, Gaussian, and Modified Elliott SAF LSTM-based equalizers trained with the RMSProp learning algorithm and the sum of squared errors loss function outperform the linear equalizers (ZF and MMSE) and the DL LSTM-based model with the default SAF (tanh) at SNRs from 8 to 20 dB, and outperform the CNN-based and BiLSTM-based DL equalizers [24, 48] over the entire SNR range.
Figure 10 shows that all the proposed modified DL SAF LSTM-based equalizers trained with the SGdm learning algorithm and the sum of squared errors loss function perform better than the traditional ZF and MMSE linear equalizers at SNRs from 10 to 20 dB, and better than the CNN-based equalizer over the entire SNR range. The proposed Rootsig, Bitanh2, Softsign, Gaussian, Wave, and Cloglogm SAF LSTM-based equalizers also outperform the DL LSTM-based model with the default SAF (tanh) at SNRs from 6 to 20 dB, and the proposed Gaussian and Cloglogm SAF LSTM-based equalizers outperform the BiLSTM-based equalizer at SNRs from 8 to 20 dB.
The default choice for the LSTM-NN SAF is the hyperbolic tangent function (tanh) because it is a smooth AF that is symmetric about the origin, which keeps the output values centered around zero. This aids backpropagation and reduces the likelihood of vanishing gradients, which can be challenging for deep learning networks [61]. In addition, tanh squashes its output values into the range (−1, 1), which is beneficial in applications such as normalizing the output of a linear layer [62].
The tanh function also has several drawbacks: it does not completely eliminate the vanishing gradient problem, it is computationally expensive, and its gradient reaches its maximum value of 1 only at x = 0; as a result, the function can produce some dead neurons during computation [62, 63]. These limitations motivated research into alternative AFs capable of addressing them. Similarly, the loss function, which computes the error between the actual and desired outputs, governs the convergence and the attainable performance of the model [64].
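The saturation behavior referred to here is easy to see numerically. Below, the derivative of tanh, \(1-\tanh^2(x)\), is compared with that of softsign, \(1/(1+|x|)^2\), used here only as one example of an alternative SAF (softsign assumed to be \(x/(1+|x|)\)). Both reach their maximum slope of 1 at x = 0, but the tanh gradient decays exponentially with |x| while the softsign gradient decays only polynomially.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
d_tanh = 1.0 - np.tanh(x) ** 2              # derivative of tanh
d_softsign = 1.0 / (1.0 + np.abs(x)) ** 2   # derivative of softsign x / (1 + |x|)

for xi, a, b in zip(x, d_tanh, d_softsign):
    print(f"x = {xi:3.0f}   tanh' = {a:.1e}   softsign' = {b:.1e}")
```

For |x| below roughly 1.6 tanh is actually the steeper function, but for units driven further into saturation (at x = 8, about 5e-7 for tanh versus about 1e-2 for softsign) the slower-saturating alternative keeps a usable gradient, which is the kind of difference that motivates testing other SAFs.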
In the scientific community, there is a significant interest in identifying and defining AFs and loss functions that can enhance the performance of neural networks [47, 54, 56, 64, 65].
Figures 5, 6, 7, 8, 9, and 10 show that the LSTM-based equalizer performs better when other SAFs replace the default tanh SAF and when SSE replaces the default (cross-entropy) loss function. Using SSE instead of cross-entropy, and some less well-known AFs instead of tanh, therefore has a positive effect on the LSTM network, as reflected in the better performance of the DL LSTM-based equalizers.
From Figs. 5, 6, 7, 8, 9, and 10 we conclude that the best state activation functions for the modified loss and SAF LSTM-based equalizers and symbol detectors under the above system settings are those listed in Table 4.
Optimization techniques are critical for improving DL systems. DNN training can be viewed as an optimization problem whose objective is to reach a global optimum through a reliable training trajectory and fast convergence using gradient descent techniques [60]. The goal of the DL method is to develop a model that produces accurate results quickly by adjusting the weights and biases to minimize the loss function. Selecting the best optimizer for a given scientific problem is difficult: with an unsuitable optimizer, the network may get stuck in a local minimum during training, resulting in little progress in the learning process. It is therefore necessary to investigate how different optimizers perform for the model and dataset at hand in order to build the best DL model.
This section experimentally compares the performance of the three optimization algorithms: Adam, RMSProp, and SGdm. Table 4 can be used to select the SAF that gives the best performance for each optimization algorithm.
For the cross-entropy loss function, Fig. 11 clearly shows that the proposed modified DL Softsign SAF LSTM-based equalizer trained with the Adam learning algorithm outperforms all the other proposed modified SAF LSTM-based equalizers at all SNRs.
On the other hand, for the sum of squared errors loss function, Fig. 12 shows that the proposed modified DL Elliott SAF LSTM-based equalizer trained with the RMSProp learning algorithm gives the best performance over the entire SNR range.
Figure 13 further shows that the best proposed modified DL SAF LSTM-based equalizer overall is the one using the Elliott SAF, trained with the RMSProp learning algorithm and the sum of squared errors loss function.
It is useful to monitor the training of the DL equalizers through their loss and accuracy curves. These curves show how the training process is progressing, so the user can decide whether to continue training or stop.
The Adam, RMSProp, and SGdm loss and accuracy curves for our best proposed modified loss and SAF LSTM-based equalizers in Figs. 14, 15, 17, and 18 support the outcomes shown in Figs. 11 and 12. Furthermore, the corresponding curves for the CNN-based and BiLSTM-based approaches in Figs. 14, 15, 16, 17, and 18 support the findings in Figs. 5, 6, 7, 8, 9, and 10: the CNN and BiLSTM equalizers improve on the linear equalizers with the cross-entropy loss function under any of the learning algorithms (Adam, RMSProp, and SGdm), whereas little or no improvement is achieved with the sum of squared errors.
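One simple way to automate this decision is a patience rule on the validation loss, sketched below. The function name, the patience of five epochs, the improvement threshold, and the loss history are hypothetical choices made up for illustration, not settings or results reported in the paper.

```python
def should_stop(val_losses, patience=5, min_delta=1e-4):
    """Return True when the last `patience` epochs brought no improvement of at
    least `min_delta` over the best validation loss seen before them."""
    if len(val_losses) <= patience:
        return False                      # not enough history yet
    best_before = min(val_losses[:-patience])
    best_recent = min(val_losses[-patience:])
    return best_recent > best_before - min_delta

# Made-up validation-loss history that plateaus after a few epochs.
history = [0.90, 0.62, 0.41, 0.33, 0.31, 0.31, 0.32, 0.31, 0.31, 0.32]
for epoch in range(1, len(history) + 1):
    if should_stop(history[:epoch]):
        print(f"stop after epoch {epoch}")
        break
```

Larger patience values tolerate noisier curves at the cost of extra training time, which matters here because training time is the complexity measure used in the next subsection.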
4.1 Computational complexity of the proposed modified DL loss and SAFs LSTM-based equalizers
The computational complexity of the proposed modified-loss and modified-SAF LSTM-based channel equalization and symbol detection (CESD) DL models in the SC-FDMA system is evaluated empirically in terms of training time, with training performed offline. Training time is the time spent finding the NN parameters (e.g., weights and biases) that minimize the error on a training dataset. Training is computationally demanding because it repeatedly evaluates the loss function for many different parameter values.
Table 5 lists the training time consumed by the modified-SAF LSTM-based channel equalization and symbol detection DL models. The computer used runs the Windows 10 operating system and is equipped with an Intel(R) Core(TM) i5-2450M CPU @ 2.50 GHz and 8 GB of RAM.
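Training time of this kind is wall-clock time measured around the offline training run. A minimal Python sketch of such a measurement, assuming a hypothetical `dummy_train` stand-in for an equalizer training loop; in practice each optimizer, SAF and loss configuration would be timed on the same machine, ideally averaging over repeated runs.

```python
import time

def timed_training(train_fn, *args, **kwargs):
    """Run a training routine and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()          # monotonic, high-resolution clock
    result = train_fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

def dummy_train(epochs):
    # Hypothetical stand-in for an equalizer training loop
    total = 0.0
    for _ in range(epochs):
        for k in range(10_000):
            total += (k % 7) * 1e-6
    return total

result, seconds = timed_training(dummy_train, epochs=3)
print(f"training took {seconds:.3f} s")
```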
Table 5 shows that, with the Adam optimizer, the best proposed SAF equalizer for the cross-entropy loss function (DL SAF Softsign LSTM-based CESD) requires more training time than the best proposed SAF equalizer for the sum of squared errors loss function (DL SAF Cloglogm LSTM-based CESD). Similarly, with the RMSProp optimizer, the best SAF equalizer for the cross-entropy loss (DL SAF Gaussian LSTM-based CESD) requires more training time than the best SAF equalizer for the sum of squared errors loss (DL SAF Elliott LSTM-based CESD). With the SGdm optimizer, on the other hand, the best SAF equalizer for the cross-entropy loss (DL SAF Bitanh2 LSTM-based CESD) requires less training time than the best SAF equalizer for the sum of squared errors loss (DL SAF Gaussian LSTM-based CESD). Taking Table 5 and Fig. 13 together, the proposed SAF equalizer that gives the best performance while consuming the least training time is the DL SAF Elliott LSTM-based CESD trained with the RMSProp optimizer and the sum of squared errors loss function. Its short training time indicates the lowest computational complexity among its peers.
Table 6 shows that the BiLSTM-based approach requires more training time than the proposed modified-loss and modified-SAF LSTM-based equalizers in all training scenarios (Adam, SGdm, and RMSProp). This reflects its higher computational complexity: a BiLSTM network processes the data with two separate hidden layers, one running from the past to the future and the other from the future to the past, before feeding their outputs into a single output layer [24].
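The cost difference can be made concrete by counting trainable parameters. For a standard LSTM layer with input size d and h hidden units (no peephole connections, an assumption), each of the four blocks (input, forget and output gates, plus the block input) has an h x d input-weight matrix, an h x h recurrent-weight matrix and an h-element bias, giving 4(hd + h^2 + h) parameters. A BiLSTM layer runs two such LSTMs, one per direction, so it doubles this count and roughly doubles the recurrent computation per time step. The layer sizes below are hypothetical, not those of the models in this study.

```python
def lstm_params(d, h):
    # 4 blocks x (input weights h*d + recurrent weights h*h + bias h)
    return 4 * (h * d + h * h + h)

def bilstm_params(d, h):
    # Forward and backward directions are independent LSTMs on the same input
    return 2 * lstm_params(d, h)

d, h = 256, 16  # hypothetical input size and number of hidden units
print(lstm_params(d, h), bilstm_params(d, h))
```

These counts cover only the recurrent layer; the following fully connected layer also grows for a BiLSTM, since it receives 2h features per time step instead of h.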
Similarly, Table 7 shows that the CNN-based approach requires the longest training time in all training scenarios (Adam, SGdm, and RMSProp), indicating a higher computational complexity than our proposed modified-SAF LSTM-based equalizers.
4.2 Generalization ability and robustness of the proposed models
To further compare the efficacy of the proposed models and AFs, we also evaluate them under additional practical channel models. These channel models, such as the indoor and vehicular models released by ITU [66, 67], were established from extensive measurements.
Figures 5, 6, 7, 19, 20, and 21 show the BERs of the proposed modified-loss and modified-SAF LSTM-based equalizers, the conventional linear equalizers, the BiLSTM-based equalizer, and the CNN-based equalizer under two different ITU channel models. In all of the investigated channel models, the proposed modified-SAF LSTM-based model outperforms the other equalizers in terms of stability and performance. The model was trained with the ITU Vehicular channel model and then tested under both the Vehicular and the Indoor ITU channel models. Because the evaluation datasets (corrupted by the two ITU channel models) were not used in the training process, these results highlight the generalization ability and robustness of the proposed equalizer.
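Each point on the BER curves is the fraction of transmitted bits that the detector recovers incorrectly at a given SNR. A minimal sketch of that count follows; the bit-flipping stand-in for the equalizer output is hypothetical and does not model the ITU Vehicular or Indoor channels. In a train-on-one-channel, test-on-another evaluation, the same function would simply be applied to bits detected from test datasets generated with a channel model not seen during training.

```python
import random

def ber(tx_bits, rx_bits):
    """Bit error rate: fraction of received bits that differ from those sent."""
    if len(tx_bits) != len(rx_bits) or not tx_bits:
        raise ValueError("bit sequences must be non-empty and of equal length")
    errors = sum(a != b for a, b in zip(tx_bits, rx_bits))
    return errors / len(tx_bits)

rng = random.Random(0)                          # fixed seed for repeatability
tx = [rng.randint(0, 1) for _ in range(10_000)]
# Hypothetical detector output: each bit flipped with probability 0.01
rx = [b ^ (rng.random() < 0.01) for b in tx]
print(ber(tx, rx))
```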
5 Conclusion
In conclusion, this study investigated a modified DL LSTM-based channel equalization and symbol detection method in which the default state activation function (the hyperbolic tangent function, tanh) and the default loss function (cross-entropy) are replaced. The performance of the proposed modified DL model was examined and compared with that of common linear equalizers such as ZF and MMSE and of other DL models such as the CNN-based and BiLSTM-based equalizers. The internal weights and biases of the proposed model were adjusted during training with two loss functions (the default cross-entropy and the sum of squared errors, SSE) and three optimization algorithms (Adam, RMSProp, and SGdm). Our results show that the proposed modified-loss and modified-SAF LSTM-based channel equalizer and symbol detector achieves a lower BER than the conventional non-DL linear (ZF and MMSE) equalizers and the other DL (CNN-based and BiLSTM-based) equalizers in SC-FDMA wireless communication systems. In addition, under various DL model settings (i.e., training algorithm, initial learning rate, learning rate drop factor, etc.), several lesser-known activation functions, including GELU, Wave, Bitanh1, Bitanh2, Modified Elliott, Elliott, Gaussian, Cloglogm, Aranda, Softsign, and Rootsig, can outperform the commonly used tanh state activation function in terms of channel equalization accuracy. Among the proposed activation functions, those summarized in Table 4 (Softsign, Gaussian, Bitanh2, Cloglogm, and Elliott) performed best. Furthermore, replacing the default cross-entropy loss with the SSE loss function greatly improved the accuracy of the modified DL LSTM-based channel equalizer and symbol detector.
Finally, the computational complexity of the proposed modified-loss and modified-SAF LSTM-based equalizers was investigated, and the proposed model was found to have moderate computational complexity compared with the existing BiLSTM-based and CNN-based approaches. Given the rapid advances in the design and production of high-speed GPUs, this moderate complexity should be manageable in practice. Owing to its strong learning and generalization properties, the proposed equalizer appears promising for channel equalization, particularly under poor channel conditions.
The following directions are suggested for future research:
- Exploring new activation functions and modifying other parts of the LSTM, such as its gate activation functions (GAFs).
- Studying the performance of the proposed modified-SAF LSTM-based channel equalizer and symbol detector with other loss functions.
Availability of data and materials
Not applicable.
Abbreviations
ML: Machine learning
ANN: Artificial neural network
MLP: Multilayer perceptron
DL: Deep learning
DNN: Deep neural network
CNN: Convolutional neural network
RNN: Recurrent neural network
LSTM: Long short-term memory
AF: Activation function
SAF: State activation function
Adam: Adaptive moment estimation
RMSProp: Root mean square propagation
SGdm: Stochastic gradient descent with momentum
SSE: Sum of squared errors
OFDM: Orthogonal frequency division multiplexing
SC-FDMA: Single-carrier frequency division multiple access
LTE: Long-term evolution
IFFT: Inverse fast Fourier transform
FFT: Fast Fourier transform
CP: Cyclic prefix
TD: Time domain
FD: Frequency domain
SNR: Signal-to-noise ratio
BER: Bit error rate
References
R. Prasad, OFDM for Wireless Communications Systems (Artech House, Norwood, 2004)
S. Hassan et al., Performance evaluation of machine learning-based channel equalization techniques: new trends and challenges. J. Sens. 2022, 1–14 (2022)
K. Burse, R.N. Yadav, S. Shrivastava, Channel equalization using neural networks: a review. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 40(3), 352–357 (2010)
L. Sun, Y. Wang, CTBRNN: a novel deep-learning based signal sequence detector for communications systems. IEEE Signal Process. Lett. 27, 21–25 (2020)
A. Zerguine, A. Shafi, M. Bettayeb, Multilayer perceptron-based DFE with lattice structure. IEEE Trans. Neural Networks 12(3), 532–545 (2001)
P. Mohapatra et al., Shuffled frog-leaping algorithm trained RBFNN equalizer. Int. J. Comput. Inform. Syst. Ind. Manag. Appl. 9, 249–256 (2017)
P.K. Mohapatra et al., Training strategy of fuzzy-firefly based ANN in nonlinear channel equalization. IEEE Access 10, 51229–51241 (2022)
P. Kumar Mohapatra et al., Application of Bat algorithm and its modified form trained with ANN in channel equalization. Symmetry 14(10), 2078 (2022)
S. Iqbal et al., Automised flow rule formation by using machine learning in software defined networks based edge computing. Egypt. Inform. J. 23(1), 149–157 (2022)
H.O. Alanazi, A.H. Abdullah, K.N. Qureshi, A critical review for developing accurate and dynamic predictive models using machine learning methods in medicine and health care. J. Med. Syst. 41, 1–10 (2017)
O. Agbo-Ajala, S. Viriri, Deep learning approach for facial age classification: a survey of the state-of-the-art. Artif. Intell. Rev. 54(1), 179–213 (2021)
P. Punyani, R. Gupta, A. Kumar, Neural networks for facial age estimation: a survey on recent advances. Artif. Intell. Rev. 53(5), 3299–3347 (2020)
M. Abdolahnejad, P.X. Liu, Deep learning for face image synthesis and semantic manipulations: a review and future perspectives. Artif. Intell. Rev. 53(8), 5847–5880 (2020)
R. Wadawadagi, V. Pagi, Sentiment analysis with deep neural networks: comparative study and performance assessment. Artif. Intell. Rev. 53(8), 6155–6195 (2020)
S.R. Dubey, A decade survey of content based image retrieval using deep learning. IEEE Trans. Circuits Syst. Video Technol. 32(5), 2687–2704 (2021)
N.E. Khalifa, M. Loey, S. Mirjalili, A comprehensive survey of recent trends in deep learning for digital images augmentation. Artif. Intell. Rev. 55(3), 2351–2377 (2022)
A. Khan et al., A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. 53(8), 5455–5516 (2020)
D. Das, R. Naskar, Image splicing detection based on deep convolutional neural network and transfer learning, in 2022 IEEE 19th India Council International Conference (INDICON) (2022)
F.K. Oduro-Gyimah et al., Prediction of telecommunication network outage time using multilayer perceptron modelling approach, in 2021 International Conference on Computing, Computational Modelling and Applications (ICCMA) (2021)
J. Oruh, S. Viriri, A. Adegun, Long short-term memory recurrent neural network for automatic speech recognition. IEEE Access 10, 30069–30079 (2022)
H.A. Hassan et al., Effective deep learning-based channel state estimation and signal detection for OFDM wireless systems. J. Electr. Eng. 74(3), 167–176 (2023)
H.A. Hassan et al., An efficient and reliable OFDM channel state estimator using deep learning convolutional neural networks. J. Electr. Eng. 74(3), 167–176 (2023)
Z. Wang et al., Long short-term memory neural equalizer. IEEE Trans. Signal Power Integr. 2, 13–22 (2023)
M.A. Mohamed et al., Modified gate activation functions of BiLSTM-based SC-FDMA channel equalization. J. Electr. Eng. 74(4), 256–266 (2023)
S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
A. Graves, Supervised sequence labelling, in Supervised Sequence Labelling with Recurrent Neural Networks (Springer, 2012), pp. 5–13
E.F.D.S. Soares et al., Recurrent neural networks for online travel mode detection, in 2019 IEEE Global Communications Conference (GLOBECOM) (2019)
T. Fernando et al., Heart sound segmentation using bidirectional LSTMs with attention. IEEE J. Biomed. Health Inform. 24(6), 1601–1609 (2020)
S. Kamepalli, B.S. Rao, K.V.K. Kishore, Multiclass classification and prediction of heart sounds using stacked LSTM to detect heart sound abnormalities, in 2022 3rd International Conference for Emerging Technology (INCET) (2022)
H. Nisa et al., A deep learning approach to handwritten text recognition in the presence of struck-out text, in 2019 International Conference on Image and Vision Computing New Zealand (IVCNZ) (2019)
N.D. Cilia et al., From online handwriting to synthetic images for Alzheimer's disease detection using a deep transfer learning approach. IEEE J. Biomed. Health Inform. 25(12), 4243–4254 (2021)
A.S. GS et al., Synthetic speech classification using bidirectional LSTM networks, in 2022 IEEE 3rd Global Conference for Advancement in Technology (GCAT) (IEEE, 2022)
W. Zhang et al., Underwater acoustic source separation with deep BiLSTM networks, in 2021 4th International Conference on Information Communication and Signal Processing (ICICSP) (2021)
J.L. Wu et al., Identifying emotion labels from psychiatric social texts using a bidirectional LSTM-CNN model. IEEE Access 8, 66638–66646 (2020)
L. Arya et al., Analysis of layer-wise training in direct speech to speech translation using BiLSTM, in 2022 25th Conference of the Oriental COCOSDA International Committee for the Coordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) (2022)
H. Jin et al., Combining GCN and BiLSTM for protein secondary structure prediction, in 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2021)
W.W. Zeng, N.X. Jia, J. Hu, Improved protein secondary structure prediction using bidirectional long short-term memory neural network and bootstrap aggregating, in 2022 10th International Conference on Bioinformatics and Computational Biology (ICBCB) (2022)
J. Jorge et al., Live streaming speech recognition using deep bidirectional LSTM acoustic models and interpolated language models. IEEE/ACM Trans. Audio Speech Lang. Process. 30, 148–161 (2022)
A. Shrestha et al., Continuous human activity classification from FMCW radar with BiLSTM networks. IEEE Sens. J. 20(22), 13607–13619 (2020)
S. Jung, J. Park, S. Lee, Polyphonic sound event detection using convolutional bidirectional LSTM and synthetic data-based transfer learning, in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2019)
H. Huang et al., Deep learning for physical-layer 5G wireless techniques: opportunities, challenges and solutions. IEEE Wirel. Commun. 27(1), 214–222 (2020)
M.H.E. Ali, I.B. Taha, Channel state information estimation for 5G wireless communication systems: recurrent neural networks approach. PeerJ Comput. Sci. 7, e682 (2021)
M.H.E. Ali et al., Machine learning-based channel state estimators for 5G wireless communication systems (2022)
G.S.D.S. Gomes, T.B. Ludermir, Optimization of the weights and asymmetric activation function family of neural network for time series forecasting. Expert Syst. Appl. 40(16), 6438–6446 (2013)
Y. Singh, P. Chandra, A class +1 sigmoidal activation functions for FFANNs. J. Econ. Dyn. Control 28(1), 183–187 (2003)
W. Duch, N. Jankowski, Survey of neural transfer functions. Neural Comput. Surv. 2(1), 163–212 (1999)
G.S. da S. Gomes et al., Comparison of new activation functions in neural network for forecasting financial time series. Neural Comput. Appl. 20(3), 417–439 (2011)
W. Xu et al., Joint neural network equalizer and decoder, in 2018 15th International Symposium on Wireless Communication Systems (ISWCS) (IEEE, 2018)
M. Anbar et al., Iterative SC-FDMA frequency domain equalization and phase noise mitigation, in 2018 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS) (2018)
T. Zia, U. Zahid, Long short-term memory recurrent neural network architectures for Urdu acoustic modeling. Int. J. Speech Technol. 22(1), 21–30 (2019)
A. Graves et al., A novel connectionist system for unconstrained handwriting recognition. IEEE Trans. Pattern Anal. Mach. Intell. 31(5), 855–868 (2008)
T. Ong, Facebook's translations are now powered completely by AI. The Verge. https://www.theverge.com/2017/8/4/16093872/facebook-ai-translations-artificial-intelligence (2017)
Y. Wu et al., Google's neural machine translation system: bridging the gap between human and machine translation (2016)
M.H.E. Ali, A.B. Abdel-Raman, E.A. Badry, Developing novel activation functions based deep learning LSTM for classification. IEEE Access 10, 97259–97275 (2022)
D. Hendrycks, K. Gimpel, Gaussian error linear units (GELUs) (2016)
A. Farzad, H. Mashayekhi, H. Hassanpour, A comparative performance analysis of different activation functions in LSTM networks for classification. Neural Comput. Appl. 31(7), 2507–2521 (2019)
S.S. Sodhi, P. Chandra, Bimodal derivative activation function for sigmoidal feedforward networks. Neurocomputing 143, 182–196 (2014)
D.L. Elliott, A better activation function for artificial neural networks (1993)
K. Hara, K. Nakayama, Comparison of activation functions in multilayer neural network for pattern classification, in Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94) (IEEE, 1994)
E. Dogo et al., A comparative analysis of gradient descent-based optimization algorithms on convolutional neural networks, in 2018 International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS) (IEEE, 2018)
L. Wang et al., Optimal parameters selection of back propagation algorithm in the feedforward neural network. Eng. Anal. Bound. Elem. 151, 575–596 (2023)
C. Nwankpa et al., Activation functions: comparison of trends in practice and research for deep learning (2018)
S.R. Dubey, S.K. Singh, B.B. Chaudhuri, A comprehensive survey and performance analysis of activation functions in deep learning (2021)
M. Abou Houran et al., Developing novel robust loss functions-based classification layers for DL-LSTM neural networks (2023)
A. Apicella et al., A survey on modern trainable activation functions. Neural Netw. 138, 14–32 (2021)
ITU-R, Guidelines for evaluation of radio transmission technologies for IMT-2000 (1997)
X. Cheng et al., Channel estimation and equalization based on deep BLSTM for FBMC-OQAM systems, in ICC 2019 - 2019 IEEE International Conference on Communications (ICC) (IEEE, 2019)
Funding
Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).
Author information
Contributions
All authors took part in planning, conducting, and analyzing the study, and all authors read and approved the final submitted manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
The contents of this manuscript have not been copyrighted or published previously; the contents of this manuscript are not now under consideration for publication elsewhere; the contents of this manuscript will not be copyrighted, submitted, or published elsewhere while acceptance by the journal is under consideration; and there are no directly related manuscripts or abstracts, published or unpublished, by any authors of this paper.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mohamed, M.A., Hassan, H.A., Essai, M.H. et al. Modified state activation functions of deep learning-based SC-FDMA channel equalization system. J Wireless Com Network 2023, 115 (2023). https://doi.org/10.1186/s13638-023-02326-4
DOI: https://doi.org/10.1186/s13638-023-02326-4