Autoencoder-bank based design for adaptive channel-blind robust transmission

The idea of employing deep autoencoders (AEs) has been recently proposed to capture the end-to-end performance in the physical layer of communication systems. However, most of the current methods for applying AEs are developed based on the assumption that there exists an explicit channel model for training that matches the actual channel model in the online transmission. The variation of the actual channel indeed imposes a major limitation on employing AE-based systems. In this paper, without relying on an explicit channel model, we propose an adaptive scheme to increase the reliability of an AE-based communication system over different channel conditions. Specifically, we partition channel coefficient values into sub-intervals, train an AE for each partition in the offline phase, and constitute a bank of AEs. Then, based on the actual channel condition in the online phase and the average block error rate (BLER), the optimal pair of encoder and decoder is selected for data transmission. To gain knowledge about the actual channel conditions, we assume a realistic scenario in which the instantaneous channel is not known, and propose to blindly estimate it at the Rx, i.e., without any pilot symbols. Our simulation results confirm the superiority of the proposed adaptive scheme over existing methods in terms of the average power consumption. For instance, when the target average BLER is equal to $10^{-4}$, our proposed algorithm with 5 pairs of AE can achieve a performance gain over 1.2 dB compared with a non-adaptive scheme.

end-to-end modeling of the physical layer of communication systems [5]. In particular, such a setup makes it possible to optimize the Tx and Rx without being limited to conventional component-wise optimization methods, thereby moving away from carefully optimized sub-blocks toward adaptive and flexible NNs [6]. Using an offline data set, AEs can be trained and optimized for a practical communication system, and this architecture can outperform the conventional separable design of the physical layer of such systems.
Owing to these benefits, a number of studies on employing AEs in the physical layer of communication systems have been reported [6][7][8][9][10]. In particular, the idea of end-to-end learning of communication systems through deep NN-based AEs has been applied to an orthogonal frequency division multiplexing (OFDM)-based communication system in [6]. Moreover, the authors in [7] investigated the problem of joint source and channel coding of structured data (i.e., natural language) over a noisy channel and attained lower word error rates by developing an AE-based system. A new AE-based peak-to-average power ratio reduction scheme has been proposed in [8]. The authors in [9] developed an AE-based deep learning architecture to model a multiuser single-input multiple-output communication system. The work in [10] employs an AE to find proper constellations and the corresponding receiver when a radar system coexists with other interfering wireless systems.
Furthermore, there exist some studies on DL-based wireless transmission systems in which different functionalities of the physical layer have been modeled and investigated as a deep NN (DNN). In this regard, Ref. [11] presents an overview of physical layer DL and the state of the art for fifth-generation wireless systems and beyond. Meanwhile, the potential of DL approaches to address problems in the physical layer has been shown in several recent studies. More precisely, a novel method for synthesizing new physical layer modulation and coding schemes for communication systems using a learning-based approach is proposed in [12]. The dynamic interference channel in a communication system has been investigated in [13], modulation recognition has been studied in [14], radio fingerprinting has been evaluated in [15], and medium access control mechanisms have been studied in [16]. Furthermore, the authors in [17] investigated mobile edge computing networks for the intelligent internet of things (IoT), where multiple users have computational tasks assisted by multiple computational access points. Accordingly, a system is devised by proposing an intelligent offloading strategy in which a deep reinforcement learning algorithm is used to automatically learn the optimal offloading strategy. An NN is also trained to predict the offloading action, where the training data is provided by the environment. Also, in [18], a DL-based ultra-reliable multi-user multiple-input multiple-output (MIMO) detector for 5G-enabled IoT is proposed, where the system operates in interfering environments correlated over the time or frequency domain. To this end, an iterative detection framework including a conventional symbol-by-symbol detector and a deep convolutional neural network (DCNN) is utilized, where the DCNN is used to suppress the interfering signals by capturing their characteristics through deep learning.
Nevertheless, most of these prior works assume an exact mathematical channel model to perform training. More precisely, in an end-to-end communication system, the channel is considered as a layer in the NN. Thus, to backpropagate the error during the training phase, the AE needs to know the gradient of the channel transfer function. Moreover, to capture the maximum end-to-end performance, the channel model considered in the training phase must match the actual channel model in the online transmission phase. This imposes a major limitation on employing AE-based approaches to achieve the maximum end-to-end performance of a communication system when the actual channel varies over time. To cope with this problem, prior works have proposed different online training methods based on data measured during online transmission. For instance, the work in [2] considers training the DL-based system using a channel model, and then fine-tuning the Rx with measured data. However, the end-to-end performance cannot be fully captured in this approach since no fine-tuning is done at the Tx side. Moreover, the authors in [19] approximate the loss function gradient with respect to the Tx parameters and develop an alternating algorithm for end-to-end training without channel model knowledge. This algorithm iterates between two phases: (i) training of the Rx using the true gradient of the loss, and (ii) training of the Tx based on an approximation of the loss function gradient. However, this method takes more samples to converge and relies on a time-consuming two-phase training paradigm during online transmission, thus decreasing the link availability.
To mitigate the need for complex online training over actual channels as well as to obtain the maximum end-to-end performance of a communication system, we propose a robust adaptive scheme for data transmission over a random channel with no specific mathematical model using an AE bank. For online transmission, we assume a realistic scenario where the instantaneous channel gain is not known to the Tx/Rx. Thus, we need to estimate the channel gain at the Rx and feed it back to the Tx. Then, based on the actual channel conditions in the online transmission phase, the pair of encoder and decoder that satisfies the system average block error rate (BLER) constraint is selected for data transmission. To increase bandwidth efficiency, as well as to avoid data framing at the Tx, we propose a method to estimate the channel blindly, i.e., without using any pilot symbols. We then compare the proposed blind method for channel estimation with existing methods in terms of the average BLER of the system. It is worth mentioning that, in this paper, the term "robust" indicates that, by using the proposed adaptive scheme, the performance of the system will not be affected by channel variations over time, and hence, the communication system can deliver solid performance during transmission.
Our major contributions can be summarized as follows:
• A robust AE-based transmission scheme is proposed that consists of n pairs of AE, each of which corresponds to a specific sub-interval of possible values of channel coefficients. Considering the instantaneous channel state, the AE that achieves the best BLER is selected for data transmission in the online transmission phase.
• In a realistic scenario where the instantaneous channel gain is not known to the Tx/Rx, a bandwidth-efficient blind channel estimation method is proposed which avoids any pilot transmission. Therefore, the proposed scheme does not impose the additional overhead that arises in prevalent pilot-based schemes.
• We evaluate the performance of the proposed adaptive scheme in terms of the required number of encoders and decoders and also the average power consumption to satisfy a BLER constraint for data transmission. In this regard, we seek to balance an inherent tradeoff between the deployment cost (represented by the required number of encoders and decoders) and the system performance (represented by the average power consumption and target BLER).
The rest of this paper is organized as follows. Section 2 briefly presents our method for modeling and evaluating the considered DL-based system. Section 3 describes our system model including the end-to-end communication system and the channel estimation methods. In Sect. 4, we present our adaptive transmission scheme. In Sect. 5, we present the numerical results of the proposed adaptive scheme and the system performance. Finally, we conclude the paper in Sect. 6.

Methods
The method proposed in this study builds on the end-to-end deep learning model, also known as an AE. More precisely, Fig. 1 represents the layered structure of the end-to-end deep learning-based communication system modeled as an AE. Accordingly, without relying on an explicit channel model, we propose an adaptive scheme to increase the reliability of an AE-based communication system over different channel conditions. During the training process, the input symbols, modeled as one-hot vectors, go through the AE, where the weights of the neural nodes are initialized with random values. After that, the weight vectors of the nodes are tuned. The main training goal is to minimize the loss function and maximize the accuracy of the whole process. Using the TensorFlow framework [20], the performance of the proposed DL-based method is numerically evaluated. TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive and flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in learning and lets developers easily build and deploy ML-powered applications [20]. Accordingly, one can build and train machine learning models easily using intuitive high-level APIs like Keras with eager execution, which makes for immediate model iteration and easy debugging. As a result, to model and simulate the proposed design as well as to build and train the deep neural network, we use Keras [21] with TensorFlow as its back-end. Moreover, the set of parameters for simulation is provided in Tables 1 and 2.

Fig. 1 The layered structure of an end-to-end deep learning-based communications system modeled as an autoencoder
3 System model

End-to-end communication system (autoencoder)
As thoroughly expressed in [2] and depicted in Fig. 1, an AE describes an NN that is trained to reconstruct the input at the output. As the information must pass through each layer, the AE has to find a robust representation of the input message at every layer. Here, we assume a DL-based communication system modeled as an AE. In particular, the AE includes three main blocks, namely, the Tx, the Rx, and the channel, as shown in Fig. 1. The Tx receives the input message $s \in \mathcal{M} = \{1, 2, \dots, M\}$. Here, we have $M = 2^b$, where $b$ is the number of bits per message. Before entering the dense layers, the input message is transformed into a one-hot vector $s_m$ of dimension $M$, which consists of a single element equal to "1" in position $m$, whereas all other elements are equal to "0". The one-hot vector is then sent to the hidden dense layers, where at each layer an activation function is applied individually to each element of the input vector. Table 3 shows some commonly used activation functions in the dense layers of an AE. After passing the one-hot vector through the multiple dense layers at the Tx, the transmitted signal $x = [x[1], \dots, x[L]]$ is formed for $L$ discrete channel uses. Accordingly, for the considered setup, the data rate is defined as $r = b/L$ (bit/channel use). It is worth noting that, given a certain power constraint and for a feasible rate $r$, an AE-based system that is trained to minimize a loss function can automatically build zero-error codes (which is the case in our considered model). However, due to problems such as vanishing and exploding gradients [22], there may be no guarantee of finding an optimal capacity-achieving code, especially in deep NNs. Therefore, the problem of designing capacity-approaching codes is an interesting future research topic in this domain and is beyond the scope of this work.
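As a small illustration of the message representation described above, the following sketch builds a one-hot vector and computes the rate $r = b/L$. It is only a sketch: the function name `one_hot` is hypothetical, and the values $b = 4$ and $L = 7$ are taken from the $M = 16$, $L = 7$ setup used later in the numerical results.

```python
from fractions import Fraction


def one_hot(m, M):
    """Return the one-hot vector s_m of dimension M for a message m in {1, ..., M}."""
    if not 1 <= m <= M:
        raise ValueError("message index must lie in {1, ..., M}")
    return [1.0 if i == m - 1 else 0.0 for i in range(M)]


b, L = 4, 7            # bits per message and number of channel uses
M = 2 ** b             # number of possible messages, M = 2^b
rate = Fraction(b, L)  # data rate r = b / L in bit per channel use
s_3 = one_hot(3, M)    # one-hot vector for message m = 3
```

In an actual AE, this vector would be the input of the first dense layer of the Tx.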
Furthermore, the last layer of the Tx normalizes the transmit vectors to guarantee that the average energy per symbol is equal to a predefined value. The channel is implemented by including both fading and noise layers whose output $y = [y[1], \dots, y[L]]$, i.e., a noisy and distorted version of $x$, is given by

$$y = hx + w, \tag{1}$$

where $h$ is the channel gain, and $w \sim \mathcal{N}(0, \sigma_L^2 I_L)$ is a zero-mean additive white Gaussian noise (AWGN) vector with noise variance $\sigma_L^2$. At the Rx side, the received signal $y$ is passed through multiple dense layers to reach the last layer. At this layer, a softmax activation function is utilized, whose output is an estimate of the corresponding posterior probability vector $p \in \mathbb{R}^M$ over all possible messages. The index of the element of $p$ with the largest value is returned as the estimate $\hat{s}$ of the transmitted message, based on the maximum a posteriori (MAP) criterion. Moreover, we utilize the mean square error (MSE) as the loss function of the training process. This way, the loss function is obtained as

$$L(s_m, p) = \|s_m - p\|_2^2. \tag{2}$$

Also, the BLER of the considered setup is obtained as

$$P_e = \sum_{s \in \mathcal{M}} p_s \Pr(\hat{s} \neq s \mid s), \tag{3}$$

where $p_s$ is the a priori probability of transmitting message $s$. Since the message probability distribution is commonly assumed to be uniform, we have $p_s = 1/M$.
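The channel layer, the softmax output, the squared-error loss, and the MAP decision described above can be sketched in a few lines. This is a minimal illustration, not the Keras model used for the simulations: all function names are hypothetical, the signals are assumed to be real-valued, and `sigma` denotes the noise standard deviation per channel use.

```python
import math
import random


def channel(x, h, sigma, rng):
    """Apply y = h * x + w, with i.i.d. zero-mean Gaussian noise of standard deviation sigma."""
    return [h * xi + rng.gauss(0.0, sigma) for xi in x]


def softmax(z):
    """Map the last-layer outputs z to a probability vector p."""
    z_max = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - z_max) for v in z]
    total = sum(e)
    return [v / total for v in e]


def squared_error_loss(s_m, p):
    """Squared Euclidean distance between the one-hot vector s_m and the output p."""
    return sum((a - b) ** 2 for a, b in zip(s_m, p))


def map_decision(p):
    """Return the 1-based index of the largest element of p."""
    return max(range(len(p)), key=p.__getitem__) + 1


p = softmax([0.1, 2.0, -1.0, 0.3])           # hypothetical last-layer outputs, M = 4
loss = squared_error_loss([0.0, 1.0, 0.0, 0.0], p)
s_hat = map_decision(p)                       # estimated message index
```

Averaging the event `s_hat != s` over many transmitted messages and noise realizations would give a Monte Carlo estimate of the BLER.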

Challenge in training over physical channel
To fully exploit the end-to-end performance of an AE-based communication system, the channel model in the training phase must match the actual channel over which the system is supposed to communicate. Nevertheless, for an actual system, the channel is not perfectly known and varies over time. Therefore, an AE that is trained over a specific realization of the channel may not deliver the expected performance when the channel conditions change. Thus, the AE would have to be re-trained from scratch for each new channel condition in order to minimize the loss function and the BLER, which is a time-consuming process that greatly restricts the link availability and reliability. In this paper, we propose to obviate this practical limitation by employing an AE bank consisting of multiple pairs of trained encoder and decoder for different channel conditions. Subsequently, depending on the actual channel state in the online transmission phase, one pair of trained encoder and decoder is selected for data transmission. To this aim, we estimate the channel gain at the Rx and feed it back to the Tx. The details of the proposed adaptive scheme are provided in Sect. 4. To monitor the actual channel conditions for the adaptive scheme, two methods for estimating the channel gain of the considered AE-based communication system are presented in the sequel, depending on whether pilot symbols are used or not.

Channel estimation using pilot symbols
We assume that the pilot symbol $s'$ is transmitted. Therefore, the channel output $y'$ corresponding to the encoded signal $x'$ is given by

$$y' = hx' + w. \tag{4}$$

Given $x'$, one can obtain an estimate of the channel gain, $h$, by applying the maximum-likelihood (ML) criterion as

$$\hat{h} = \frac{\langle y', x' \rangle}{\langle x', x' \rangle}, \tag{5}$$

where $\langle \cdot, \cdot \rangle$ denotes the dot product. Although estimating the communication channel via pilot symbols results in an accurate estimate, adopting this approach requires data framing at the Tx side, which reduces the data rate and increases the required channel bandwidth. In the sequel, we propose a near-optimal method for channel estimation without using pilot symbols.
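The pilot-based estimate can be illustrated with a short sketch. It assumes a real-valued scalar gain and white Gaussian noise, for which the ML estimate is the projection of the received pilot onto the known encoded pilot. The pilot vector, the gain, and the noise level below are hypothetical values chosen only for the example.

```python
import random


def dot(a, b):
    """Dot product of two equal-length real vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def pilot_estimate(y_pilot, x_pilot):
    """ML estimate of a real scalar gain h from y' = h * x' + w with white Gaussian w."""
    return dot(y_pilot, x_pilot) / dot(x_pilot, x_pilot)


rng = random.Random(1)
h_true, sigma = 0.8, 0.05                          # hypothetical gain and noise std
x_pilot = [1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]   # hypothetical encoded pilot, L = 7
y_pilot = [h_true * x + rng.gauss(0.0, sigma) for x in x_pilot]
h_hat = pilot_estimate(y_pilot, x_pilot)
```

Without noise the estimate equals the true gain; with noise, the estimation error shrinks as the pilot energy grows.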

Blind channel estimation
To increase the bandwidth efficiency by avoiding pilot symbols and, at the same time, to decrease the system complexity by avoiding data framing at the Tx, we estimate the channel blindly, i.e., without using any pilot symbols. For this purpose, we assume that the encoded symbol is accumulated during an observation window composed of $K$ intervals. As a result, at each interval $k$, $k \in \{1, \dots, K\}$, the received signal (or equally the channel output) is obtained as

$$y_k = hx_k + w_k. \tag{6}$$

Squaring both sides of (6) and accumulating over the observation window yields a statistic $\tau_y$ that has a Gaussian distribution with a mean equal to $h^2$ and variance $\sigma^2$. By using the ML criterion, a blind estimate of the channel is attained as

$$\hat{h} = \sqrt{\tau_y}.$$

Moreover, by employing a buffer at the Rx, the communication system can support a real-time decision-feedback channel estimation process.
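The idea of estimating the gain from squared received samples can be illustrated with a simple moment-based sketch. It is not necessarily the exact statistic used in this paper: it assumes real-valued unit-energy symbols ($x_k^2 = 1$), a known noise variance, and a non-negative gain, since the sign of $h$ cannot be recovered from squared samples. All names and numerical values are hypothetical.

```python
import math
import random


def blind_estimate(y, sigma2):
    """Estimate a non-negative gain h from received samples only.

    Under the unit-energy assumption, E[y_k^2] = h^2 + sigma2, so the sample
    mean of y_k^2 minus sigma2 is a statistic whose mean is h^2.
    """
    tau = sum(v * v for v in y) / len(y) - sigma2
    return math.sqrt(max(tau, 0.0))  # clip at zero in case noise makes tau negative


rng = random.Random(7)
h_true, sigma, K = 0.8, 0.1, 2000                 # hypothetical gain, noise std, window length
x = [rng.choice((-1.0, 1.0)) for _ in range(K)]   # unknown data symbols, no pilots
y = [h_true * xk + rng.gauss(0.0, sigma) for xk in x]
h_hat = blind_estimate(y, sigma ** 2)
```

A longer observation window $K$ reduces the variance of the estimate at the cost of a longer delay before the feedback is available.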

Adaptive transmission scheme
In this section, an adaptive transmission scheme is proposed for the AE-based communication system to increase the system reliability over different channel conditions. Meanwhile, we aim to minimize the number of trained encoder-decoder pairs required for the communication link under the average BLER and transmit power constraints and under different channel conditions. Figure 2 depicts the four-step structure of our proposed adaptive DL-based transmission scheme. Each step is described as follows.
The first step is offline training. Different pairs of encoder and decoder in the AE bank are trained offline over different channel conditions. Since the AE-represented communication system should be applicable to any type of channel without a tractable mathematical model, in this paper, we assume there is no channel model information. As a result, the channel fading block acts as a random gain block whose input is multiplied by a random number (i.e., the instantaneous channel gain) to produce the output.$^{3}$ We divide the interval of channel gains, $h = [h_{\min}, h_{\max}]$, into $n$ sub-intervals. Therefore, the channel gain interval is obtained as

$$h = \underbrace{\big[[h_{\min}, h_1], (h_1, h_2], \dots, (h_{n-1}, h_{\max}]\big]}_{n \text{ sub-intervals}}. \tag{11}$$

Accordingly, in the training phase, we have $n$ fading blocks, and the gain of each block is randomly selected from its associated sub-interval. Then, AE $i$ (the $i$th pair of encoder and decoder, one for each sub-interval, $i \in \{1, \dots, n\}$) is trained under the assumption that the channel gain in the channel layer of AE $i$ lies in $(h_{i-1}, h_i]$. After training, the trained encoders and decoders are employed with fixed parameters (i.e., the input and output weights and biases of the neurons remain constant during online transmission). As a result, each trained pair is optimized for a specific channel condition and is used when the practical channel conditions are within the same interval as the one used in training. Note that, by training the system for each sub-interval, the associated encoder learns a robust representation $x$ of each symbol $s$ with respect to the possible distortions created by the channel in that interval. Therefore, the whole system is expected to deliver a robust performance over a wide range of channel conditions.

Fig. 2 Four-step structure of the proposed adaptive transmission scheme for the DL-based communication system

$^{3}$ Although modeling the end-to-end system as a whole learning system is indeed an appealing theoretical idea, its biggest drawback impeding practical implementation is that the gradient of the instantaneous channel transfer function should be known [2]. Generally, there is no tractable mathematical model in a real-world communication system. More precisely, in the context of AEs, the actual channel is generally considered as a black box for which only the inputs and outputs can be observed. Here, we just need to perform some simple measurements at different time intervals to determine the range of the channel gain variations.
The second step is channel estimation during online transmission. As mentioned in Sect. 3.2, the Rx can perform channel estimation by employing two methods, i.e., using pilot symbols and blind estimation. To estimate the channel via pilot symbols, the Tx should enclose the transmit data in frames and insert pilot symbols into the frames, which reduces the bandwidth efficiency. To avoid this, the Rx can instead perform blind channel estimation over an observation window, as thoroughly discussed in Sect. 3.2.2.
In the third and fourth steps, the estimated channel gain at the Rx, $\hat{h}$, is first fed back to the Tx. Next, if $\hat{h}$ lies in the $i$th sub-interval, the $i$th pair of encoder and decoder is selected for sending and receiving data. We summarize the main steps of the proposed adaptive transmission scheme in Algorithm 1.
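The selection step can be sketched as a lookup over the sub-interval boundaries. The sketch below is not a reproduction of Algorithm 1: it assumes equal-width sub-intervals, clamps estimates that fall outside $[h_{\min}, h_{\max}]$ to the nearest sub-interval, and uses hypothetical function names and boundary values.

```python
from bisect import bisect_left


def make_edges(h_min, h_max, n):
    """Return the boundaries [h_0, ..., h_n] of n equal-width sub-intervals (an assumption)."""
    step = (h_max - h_min) / n
    return [h_min + i * step for i in range(n)] + [h_max]


def select_pair(h_hat, edges):
    """Return the 1-based index i such that h_hat lies in (h_{i-1}, h_i].

    Estimates outside the overall interval are clamped to the first or last pair.
    """
    n = len(edges) - 1
    i = bisect_left(edges, h_hat)  # smallest index i with edges[i] >= h_hat
    return min(max(i, 1), n)


edges = make_edges(0.0, 1.0, 5)   # hypothetical range of channel gains, n = 5
pair = select_pair(0.25, edges)   # index of the encoder/decoder pair to use
```

In a deployment, the returned index would address the stored weights of the corresponding trained encoder at the Tx and decoder at the Rx.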
It should be noted that the performance of the proposed adaptive scheme comes at the expense of using multiple pairs of encoder and decoder instead of one pair, which increases the deployment cost. Clearly, this price should be paid for being agnostic to the actual channel during the training phase. This gives rise to a natural question: what is the minimum number of pairs of encoder and decoder that satisfies a target level of performance for the considered system? In the sequel, we answer this question by evaluating the performance of the proposed adaptive scheme in terms of average power consumption and deployment cost to fulfill a predetermined average BLER. More precisely, we impose a target average BLER, $P_e^t$, as a constraint for the system, and minimize the number of encoder and decoder pairs (or equally the number of sub-intervals $n$) to satisfy the BLER constraint under different channel conditions; this is the optimization problem referred to as (12). Note that, in the optimization problem (12), the minimum number of encoder and decoder pairs should be found under different channel conditions and average transmit power. From the system performance perspective, the more encoders and decoders are used, the better the system is tailored to different channel conditions. Therefore, finding the optimum number of encoders and decoders amounts to balancing an inherent tradeoff between the cost of the system and the desired performance. This tradeoff is thoroughly studied in Sect. 5.

Fig. 3 Average BLER versus SNR when the instantaneous channel gain, $\hat{h}$, is estimated by using two methods, i.e., the pilot symbol method and the blind estimation method. The parameters for the AE are set as follows: $M = 16$, $L = 7$, and $n = 5$
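The search for the smallest number of pairs that meets a target average BLER can be sketched as a simple loop. This is only an illustration under stated assumptions: `average_bler` stands for a hypothetical routine that trains a bank with $n$ pairs and measures its average BLER at a fixed transmit power, and `toy_bler` is a placeholder curve with no physical meaning, used solely so that the example runs.

```python
def min_pairs(average_bler, target, n_max):
    """Return the smallest n in {1, ..., n_max} with average_bler(n) <= target, else None."""
    for n in range(1, n_max + 1):
        if average_bler(n) <= target:
            return n
    return None


def toy_bler(n):
    """Placeholder curve (not simulation data): decreases with n, then saturates."""
    return max(1e-3 / n ** 3, 5e-6)


n_star = min_pairs(toy_bler, target=1e-4, n_max=10)
```

Because the BLER saturates once the sub-intervals become short, the loop can return `None` when the target cannot be met at the given transmit power, in which case the power has to be increased.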

Numerical results and
In this section, we evaluate the numerical results of the proposed adaptive transmission scheme in the DL-based communication system. We use Keras [21] with TensorFlow [20] as its back-end in order to build and train our deep NN. For training, we use a variant of stochastic gradient descent (SGD) known as Adam, with the widely accepted rule-of-thumb parameter values: learning rate $\eta = 0.001$, $\beta_1 = 0.9$, and $\beta_2 = 0.99$ [2]. Also, the set of parameters for each pair of encoder and decoder (the AE) is provided in Table 1.
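To make the role of these parameters concrete, the following sketch performs one Adam update for a single scalar weight using the stated values of $\eta$, $\beta_1$, and $\beta_2$. It is a plain-Python illustration of the standard Adam rule rather than the Keras optimizer itself, and the stabilizing constant `eps` is an assumed value that is not given in the text.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.99, eps=1e-8):
    """One Adam update for a scalar parameter theta at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of the gradient
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of the squared gradient
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# First step for a hypothetical weight 1.0 with gradient 2.0.
theta, m, v = adam_step(1.0, 2.0, 0.0, 0.0, t=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the weight moves by approximately the learning rate regardless of the gradient magnitude.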
We first compare the proposed blind method for channel estimation with pilot estimation in terms of average BLER. To this aim, Fig. 3 plots the average BLER of the considered system when the different methods for channel estimation are employed. The results of this figure are obtained by assuming $M = 16$ and $L = 7$, thus $r = 4/7$. Moreover, as a baseline for comparison with the BLER achieved by the trained AEs, Fig. 3 includes a communication system employing binary phase-shift keying (BPSK) modulation and a Hamming (7,4) code with either binary hard-decision decoding or soft decoding. From the results of this figure, first, we can observe that, given the same information transmission rate, $r = 4/7$, the performance of the DL-based system is better than that of the communication system employing the Hamming code. It is worth mentioning that the DL-based system does not employ any error control coding approach for the noisy channel, and it still outperforms a classical communication system that utilizes error control schemes. Second, for the considered rate in our setup, the proposed blind method for channel estimation achieves an acceptable level of accuracy (an SNR gap of less than 0.3 dB for $L = 7$ in our setup) compared with the pilot estimation. Thus, the proposed blind estimation method can be applied in our considered adaptive scheme.
To investigate the tradeoff in finding the optimum number of encoders and decoders over different channel conditions, Fig. 4 presents curves of the average BLER as a function of transmit power for different numbers of sub-intervals, $n$, for the case of an AE with $M = 16$ and $L = 7$. First, as expected, our considered system gives its worst performance in terms of average BLER when one pair of encoder and decoder is used for all channel conditions, i.e., when $n = 1$ (non-adaptive scheme). Indeed, this worst performance highlights the necessity of employing a robust transmission scheme over different channel conditions. Subsequently, by applying our adaptive scheme, one can observe that the performance of the considered system improves as $n$ increases. On the other hand, from a specific value onwards (e.g., $n = 8$ in our setup), increasing the number of sub-intervals (or equally the number of pairs of encoders and decoders) does not necessarily improve the system performance. This can be justified by the fact that, as the number of sub-intervals increases, the $n$ pairs of encoder and decoder are trained over close sub-intervals (i.e., nearly the same channel conditions). Hence, when the sub-intervals are short, the pairs of encoder and decoder trained over close sub-intervals deliver nearly the same performance when they are used in actual channel conditions, and the improvement in performance is negligible. Finally, as depicted in Fig. 4, the minimum value of the average transmit power, as well as the minimum number of pairs of encoder and decoder, can be obtained to satisfy the average BLER constraint in Eq. (12). For instance, as we can observe from Fig. 4, for a target average BLER equal to $10^{-4}$, our proposed algorithm with $n = 5$ can achieve a performance gain of over 1.2 dB in terms of average power consumption compared with a non-adaptive scheme.
Finally, to better evaluate the performance and effectiveness of the proposed scheme, we have carried out another evaluation by contrasting the performance of our proposed scheme with the scheme proposed in [2], referred to as fine tuning, under a practical optical wireless scenario. More precisely, we consider a free space optical channel under the moderate atmospheric turbulence regime, which has been experimentally shown to have a log-normal distribution [23], and compare the performance of our proposed scheme with the fine tuning method. We note that, since the optical wireless channel is quasi-static [24], the channel can be estimated with good accuracy and fed back to the transmitter, thus making it an appealing practical case for employing our proposed AE-based communication system. From the result of Fig. 5, one can readily observe that our proposed scheme outperforms the fine tuning approach. The reason for the enhancement is that, compared to [2], which uses an unchangeable strategy at the Tx over different channel conditions (or equally, it fine tunes only the Rx, resulting in sub-optimal performance), our proposed scheme is able to use different encoders to generate a robust representation of messages for different channel conditions (messages that will be least affected by random channel variations). Hence, the effect of channel uncertainties on the performance of the learning-based system is decreased, and the proposed scheme outperforms the fine tuning method by a margin of more than 2 dB in power consumption.

Conclusion
In this paper, we proposed an adaptive scheme to increase the reliability of an AE-based communication system over different channel conditions. Accordingly, we divided the interval of random channel gains into $n$ sub-intervals and assigned one pair of encoder and decoder to each sub-interval. The encoders and decoders are trained offline, and, depending on the actual channel state in the online transmission phase, one pair of trained encoder and decoder is selected for data transmission. To this aim, we estimated the channel gain at the Rx without using any pilot symbols and fed it back to the Tx. We showed that, compared with a non-adaptive scheme, by using the proposed adaptive method, the DL-based system can deliver a robust performance in terms of average BLER over different channel conditions. Also, we observed that our proposed adaptive scheme achieves a performance gain in terms of average power consumption when achieving the same average BLER as existing methods.

Fig. 4 Average BLER versus $p_t$ for different numbers of sub-intervals $n$. The parameters for the AE are set as follows: $M = 16$, $L = 7$