A deep learning-aided temporal spectral ChannelNet for IEEE 802.11p-based channel estimation in vehicular communications

In vehicular communications using IEEE 802.11p, estimating the channel frequency response (CFR) is a remarkably challenging task. The challenge for channel estimation (CE) lies in tracking variations of the CFR caused by the extremely fast time-varying channel and the low pilot density. To tackle this problem, a deep learning-based temporal spectral channel network (TS-ChannelNet), inspired by image super-resolution (ISR) techniques, is proposed. Following the ISR process, an average decision-directed estimation with time truncation (ADD-TT) is first presented to extend pilot values into a tentative CFR, thus coarsely tracking the variations. Then, to refine the tentative CFR values, a super-resolution convolutional long short-term memory network (SR-ConvLSTM) is used to track extreme channel variations by extracting the temporal spectral correlation of data symbols. Three representative vehicular environments are investigated to demonstrate the performance of the proposed TS-ChannelNet in terms of normalized mean square error (NMSE) and bit error rate (BER). The proposed method shows an evident performance gain over existing methods, reaching about 84.5% improvement in some high signal-to-noise ratio (SNR) regions.


Introduction
Vehicular communications, which form a network supporting vehicle-to-vehicle (VTV) and vehicle-to-infrastructure (VTI) links, are essential techniques for intelligent transportation systems (ITS). In recent years, much attention has been devoted to developing applications for vehicular communications, such as automatic selection of routing protocols [1]. To realize such high-speed mobile communications, the IEEE 802.11p standard [2], which defines the physical layer (PHY) and the medium access control layer (MAC), was officially adopted in 2010. IEEE 802.11p is a modified version of 802.11a [3]. The main difference between them is that 802.11p uses half the frequency bandwidth of 802.11a, thus making signals more robust against fading and
• The CFR is modeled as an image. The pilot is considered a low-resolution (LR) version of the image, and the estimated CFR is viewed as a high-resolution (HR) version. TS-ChannelNet, which comprises pilot-based interpolation and DL-based restoration, is then presented to obtain the HR version of the CFR.
• An improved interpolation based on DD-TT, called ADD-TT, is used to extend the pilots into a reasonable initial coarse CFR. ADD-TT mitigates the impact of error propagation through time truncation based on decision feedback, and thereby improves the performance of the subsequent SR-ConvLSTM.
• A new super-resolution-based architecture named SR-ConvLSTM is designed. It restores the HR version of the CFR by capturing the high variability of the channel.
• An extensive ablation experiment is conducted to verify that SR-ConvLSTM effectively extracts the temporal spectral correlation of the signal to track channel variations.
The rest of this paper is organized as follows. Section 2 reviews related work in detail. Section 3 introduces the system model, the channel model, and the benchmark algorithm. Section 4 presents our temporal spectral deep learning-based channel estimation scheme. Section 5 verifies the advantages of TS-ChannelNet through simulation results. Section 6 concludes the paper.

Related work
In this section, the existing work on CE for vehicular communications using the 802.11p standard is first reviewed. The drawbacks of existing work are then discussed. Furthermore, DL applied to the communications field, as a promising direction, is surveyed.
In recent years, mobile ad hoc networks (MANETs) have been successfully applied in many fields, such as health care [15,16], broadcast encryption [17], vehicular streaming services [18], and urban management [19]. CE has been actively investigated because it determines the performance of the system at the PHY layer [7]. Current CE research focuses on DPAS methods, such as STA [6] and CDP [7]. The key idea of these algorithms is to treat the demapped data signals as additional pilots. The estimated channel response (CR) is then used iteratively to construct data pilots in the subsequent orthogonal frequency division multiplexing (OFDM) symbol. Mehrabi [9] introduced decoded data bits into DPAS to suppress the noise caused by demodulation, but the performance gain remains marginal in the high-SNR region. To further improve the accuracy of the CFR, Awad [8] transformed the CFR into the time domain and performed a truncation operation, thus suppressing demodulation errors. However, because the iteratively accumulated noise is not eliminated completely, these schemes still suffer from error propagation, especially in rapidly time-varying vehicular channels.
Compared with conventional schemes, DL has been shown to powerfully extract the inherent characteristics of signals [20] and has therefore proven suitable for many problems in wireless communications [21-25]. FCNN was applied to channel estimation and pilot design [11,12]. This work provided an initial demonstration of the ability of DL to improve CE accuracy. However, the scheme is not suited to vehicular communications using 802.11p, because the unlimited packet length in 802.11p causes the number of neurons to grow rapidly, and the FCNN therefore tends to overfit. Neumann [13] modeled the channel as conditionally Gaussian given a set of random hyperparameters, which are learned via a convolutional neural network (CNN). Soltani [14] viewed channel estimation as an image super-resolution problem in which the pilots form a low-resolution sampled version of the channel and the time-frequency CR is the image to be recovered. However, the performance of this method still degrades in fast time-varying environments. The goal of this paper is to integrate DPAS with DL to track channel variations and thus estimate the CFR precisely.

System model
In this section, the structure of IEEE 802.11p for vehicular communications is first presented. Then, the channel model for the vehicular wireless environment employed in this paper is briefly introduced. Subsequently, ChannelNet, which serves as the benchmark algorithm, is described.

Structure of IEEE 802.11p
The IEEE 802.11p physical layer is based on OFDM, which boosts spectrum utilization by converting a high-rate serial data stream into parallel streams on orthogonal subcarriers. In 802.11p, the received signal is converted into parallel data and fed to a fast Fourier transform (FFT), whose output in the frequency domain is
$$Y(t, k) = H(t, k)\,X(t, k) + Z(t, k) \qquad (1)$$
where Y(t, k) and X(t, k) represent the received and transmitted OFDM data symbols in the frequency domain, respectively, H(t, k) represents the CFR of the wireless channel, and Z(t, k) is additive white Gaussian noise (AWGN). t is the OFDM symbol index within a frame, with 1 ≤ t ≤ T, where T is the number of OFDM symbols per frame. k is the subcarrier index, with 1 ≤ k ≤ K, where K is the number of subcarriers. The goal of this paper is to estimate H more accurately. IEEE 802.11p operates in the 75 MHz band at 5.9 GHz, which is divided into seven channels: one control channel (CCH) and six service channels. Safety messages are transmitted through the CCH when emergency events occur [26]. The IEEE 802.11p standard specifies a comb-type pilot structure for channel estimation, with pilots located on subcarriers -21, -7, 7, and 21, as shown in Fig. 1. The initial channel estimate is obtained from the known training symbols transmitted in the preamble. Because of the highly time-varying channel in vehicular environments and the fact that the frame length is unlimited in the IEEE 802.11p standard, the channel estimate for each packet easily becomes outdated over the packet duration. Therefore, designing a channel estimation scheme that tracks variations of the vehicular channel is a challenging problem.
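As an illustration of the per-subcarrier model Y(t, k) = H(t, k) X(t, k) + Z(t, k) and of the comb pilot layout, the following minimal Python sketch builds one OFDM symbol and shows that only the four pilot subcarriers give a direct, data-free observation of the channel. Several details are assumptions made for illustration and are not taken from this paper: the 52 used subcarriers (-26 to 26 without DC) follow the usual 802.11a/p layout, the data are QPSK, every pilot is set to +1, and the channel values are random placeholders.

```python
import random

PILOT_SUBCARRIERS = (-21, -7, 7, 21)                        # comb pilots, as stated in the text
USED_SUBCARRIERS = [k for k in range(-26, 27) if k != 0]    # assumed 802.11a/p layout (52 tones)
DATA_SUBCARRIERS = [k for k in USED_SUBCARRIERS if k not in PILOT_SUBCARRIERS]

QPSK = (1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j)

def transmit_symbol(rng):
    """Return {k: X(k)}: random unit-energy QPSK on data tones, +1 on pilots (assumed)."""
    x = {k: rng.choice(QPSK) / 2 ** 0.5 for k in DATA_SUBCARRIERS}
    x.update({k: 1 + 0j for k in PILOT_SUBCARRIERS})
    return x

def receive_symbol(x, h, noise_std, rng):
    """Apply Y(k) = H(k) X(k) + Z(k) on every subcarrier with complex Gaussian noise."""
    return {k: h[k] * x[k] + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
            for k in x}

rng = random.Random(0)
x = transmit_symbol(rng)
# Hypothetical CFR: independent complex Gaussian values, one per used subcarrier.
h = {k: complex(rng.gauss(0, 1), rng.gauss(0, 1)) for k in USED_SUBCARRIERS}
y = receive_symbol(x, h, noise_std=0.0, rng=rng)            # noise switched off for clarity

# The receiver knows X(k) only at the pilots, so only there is Y(k) / X(k) available.
pilot_estimates = {k: y[k] / x[k] for k in PILOT_SUBCARRIERS}
```

With 4 known tones out of 52, the remaining 48 values of H must be inferred, which is the low pilot density problem that motivates the interpolation and restoration stages.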

Channel model for vehicular communications
Due to the relative motion of the transmitter and receiver, Doppler spectral spread (broadening) appears in vehicular communications. The relatively high velocity causes a fast time-varying CR. To capture the joint Doppler-delay characteristics of vehicular channels, the tapped-delay line (TDL) model is adopted with the parameters of [27]. In [27], the taps are characterized by the Doppler power spectral density arising from Rayleigh fading. The channel impulse response is calculated as
$$h(t, \tau) = \sum_{l=1}^{L} \varphi_l(t)\,\delta(\tau - \tau_l(t)) \qquad (2)$$
where φ_l(t) represents the fading coefficient of the lth path, L denotes the number of resolvable multipath components, δ is the impulse function, and τ_l(t) denotes the time delay of the lth path. In this paper, three representative models from [27] are considered: VTV Expressway Oncoming (VTVEO), VTV Urban Canyon Oncoming (VTVUC), and RTV Expressway (RTVE). In the VTVEO scenario, the speeds of the receiver and the transmitter are the highest among the scenarios: the speed is 100 km/h and the Doppler shift is about 1200 Hz. The VTVUC scenario is a moderately challenging environment for channel estimation, with a Doppler shift of 400-500 Hz at a velocity of about 32 km/h. In summary, the models considered are typical standard vehicular environments covering low and high velocities, with Doppler shifts ranging from 400 to 1200 Hz.
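To connect the TDL description to the CFR that the estimator has to recover, the sketch below evaluates the Fourier transform of a sum of delayed impulses at the subcarrier frequencies, H(k) = Σ_l φ_l exp(-j2π k Δf τ_l), for a single time instant. It is a minimal illustration under stated assumptions: the subcarrier spacing Δf = 10 MHz / 64 assumes a 10 MHz channel with a 64-point FFT, and the two-path gains and delays are hypothetical values, not the tap parameters of [27].

```python
import cmath

SUBCARRIER_SPACING_HZ = 10e6 / 64   # 156.25 kHz, assuming 10 MHz bandwidth and a 64-point FFT

def cfr_from_taps(gains, delays_s, subcarriers):
    """H(k) = sum_l gain_l * exp(-j 2 pi k df tau_l) for one snapshot of the taps."""
    return {
        k: sum(g * cmath.exp(-2j * cmath.pi * k * SUBCARRIER_SPACING_HZ * tau)
               for g, tau in zip(gains, delays_s))
        for k in subcarriers
    }

# Hypothetical two-path snapshot (illustrative values, not taken from [27]).
gains = [1.0 + 0.0j, 0.5 + 0.0j]
delays_s = [0.0, 100e-9]
H = cfr_from_taps(gains, delays_s, range(-26, 27))
```

A single path yields a flat magnitude across subcarriers, whereas two or more paths make |H(k)| frequency selective. In a time-varying channel the gains change from one OFDM symbol to the next, so this evaluation would be repeated per symbol.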

Benchmark algorithm: ChannelNet
In [14], a deep learning-based channel estimation scheme named ChannelNet was proposed for short frames in slow time-varying environments. By viewing the CR as an image, the pilot values were used, via image super-resolution techniques, to restore (estimate) the CR.
ChannelNet consists of two phases. First, the isolated pilot values are extended to an initial CR via Gaussian interpolation. Second, the initial CR values are fed into a super-resolution convolutional neural network (SRCNN) [28] followed by a denoising convolutional neural network (DnCNN) [29], which generates the estimated CR. The authors investigated the performance of ChannelNet in a relatively slow time-varying environment. In our experiments, the performance of ChannelNet degrades further in high-velocity mobile communications. This is owing to the unreliability of the initial interpolation method, coupled with the fact that the CNN does not have enough capacity to uncover the temporal spectral correlation of the CR, so the estimated CR becomes outdated over the frame duration.

Proposed method
In this section, we first describe the pre-processing stage of TS-ChannelNet, which applies an interpolation scheme based on ADD-TT to the pilot values. Then, the NN architecture named SR-ConvLSTM is presented to track variations of the vehicular channel. Afterwards, the training process of TS-ChannelNet, which consists of ADD-TT followed by SR-ConvLSTM, is described.

Interpolation based on ADD-TT
In this subsection, pilot-based interpolation via ADD-TT is implemented to obtain a coarse CR. It extends the few pilot values to initial CR values, which are treated as low-resolution (LR) images.
Usually, least squares (LS) estimation uses the two identical preambles sent at the beginning of each packet in IEEE 802.11p to obtain a tentative CR. Let Y(1, k) and Y(2, k) be the first two received long training symbols, and let X(1, k) and X(2, k) be the two identical, predefined transmitted long training symbols in the frequency domain. To obtain the CFR for all subcarriers, the received Y(1, k) and Y(2, k) are divided by X(1, k) as
$$\hat{H}_{LS}(1, k) = \frac{Y(1, k) + Y(2, k)}{2\,X(1, k)} \qquad (3)$$
where Ĥ_LS(1, k) represents the LS channel estimate at the first time slot on the kth subcarrier. LS estimation assumes that the channel is stationary. However, the vehicular channel varies quickly, and the performance of LS estimation degrades significantly. Decision-directed channel estimation, which exploits the correlation of adjacent symbols, is therefore used. The symbols are equalized by the previous channel estimate as follows:
$$\hat{S}(t, k) = \frac{Y(t, k)}{\hat{H}(t-1, k)} \qquad (4)$$
where Ŝ(t, k) denotes the equalized symbol at the tth time slot on the kth subcarrier and Ĥ(t − 1, k) is the previous channel estimate. Based on the high correlation between adjacent data symbols, the CFR at the tth symbol is assumed to be unchanged with respect to the previous one, so the previous Ĥ(t − 1, k) is used for equalization. The errors caused by this assumption are alleviated by the subsequent demodulation. Note that the first channel estimate is the LS estimate in (3). Then, decision feedback is used to update the channel estimate according to
$$\hat{H}(t, k) = \frac{Y(t, k)}{\hat{X}(t, k)} \qquad (5)$$
where X̂(t, k) represents the demodulated OFDM data symbol obtained from Ŝ(t, k). The errors of the estimated CFR are alleviated by demapping Ŝ(t, k) to the corresponding constellation point X̂(t, k). Thus, data symbols provide useful channel information for constructing data pilots. However, Ĥ(t, k) still contains residual noise and accumulated error from the iterative process, especially in the low-SNR region. This error propagation occurs because data symbols may be demapped incorrectly, so the error gradually accumulates over the iterations.
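The LS initialisation and the decision-directed loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the LS step averages the two received training symbols before dividing by the known training symbol, each data symbol is equalised with the previous estimate, demapped to the nearest constellation point, and then used as a data pilot. QPSK hard decisions, an all-ones training symbol, and the slowly rotating flat test channel are assumptions chosen so that the example is easy to follow.

```python
import cmath

QPSK = tuple(complex(a, b) / 2 ** 0.5 for a in (1, -1) for b in (1, -1))

def ls_estimate(y1, y2, x_preamble):
    """LS estimate per subcarrier from two identical long training symbols."""
    return [(a + b) / (2 * x) for a, b, x in zip(y1, y2, x_preamble)]

def demap(symbol):
    """Hard decision: nearest QPSK constellation point."""
    return min(QPSK, key=lambda p: abs(symbol - p))

def decision_directed(y_data, h_init):
    """Track the channel symbol by symbol.

    y_data: received OFDM data symbols, each a list over subcarriers.
    h_init: LS estimate, used as the 'previous' estimate for the first data symbol.
    """
    h_prev, estimates = list(h_init), []
    for y_t in y_data:
        s_hat = [y / h for y, h in zip(y_t, h_prev)]    # equalise with previous estimate
        x_hat = [demap(s) for s in s_hat]               # decision feedback
        h_prev = [y / x for y, x in zip(y_t, x_hat)]    # data-pilot channel update
        estimates.append(h_prev)
    return estimates

# Hypothetical noiseless test: a flat channel rotating by 0.1 rad per symbol.
h_true = [cmath.exp(0.1j * t) for t in range(4)]
x_data = [[QPSK[(t + k) % 4] for k in range(4)] for t in range(1, 4)]
y_data = [[h_true[t] * x for x in x_t] for t, x_t in zip(range(1, 4), x_data)]
h_ls = ls_estimate([h_true[0]] * 4, [h_true[0]] * 4, [1 + 0j] * 4)
tracked = decision_directed(y_data, h_ls)
```

In this noiseless case the rotation between symbols stays well inside the QPSK decision region, so every decision is correct and the tracked estimate follows the channel. With noise or faster variation, a wrong decision in `demap` corrupts `h_prev` and is carried into the next symbol, which is the error propagation discussed in the text.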
To reduce this negative impact on decision-directed channel estimation, an averaging method based on a time-domain truncation loop is applied. First, a scaled version of the FFT matrix V is used to transform the estimate into the time domain, where it is truncated, as in (6). The estimate is then averaged in the frequency domain over 2β + 1 adjacent subcarriers at offsets λ, Ĥ_f(t, k + λ), whose high correlation can be exploited to further improve the accuracy of the estimates. Finally, averaging in the time domain yields Ĥ_f(t, k), the output of the ADD-TT scheme at the tth time slot on the kth subcarrier, where α is the coefficient used to update the CR. Based on the high correlation across successive OFDM symbols, the weighted sum of the previous and current estimated CFR can improve the performance. The parameters α and β depend on knowledge of the vehicular environment, which is impossible to obtain in practice. It is observed in [6] that the best performance of averaging in the time and frequency domains is achieved with α = 0.5 and β = 2. Thus, α is fixed to 0.5 and β is set to 2 in this paper.
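The three post-processing steps named above (time-domain truncation, averaging over 2β + 1 adjacent subcarriers, and a weighted sum with the previous estimate) can be sketched as follows. This is a hedged illustration only: a plain DFT stands in for the scaled FFT matrix V, the truncation length `num_taps` is a hypothetical parameter, band-edge subcarriers are averaged over fewer neighbours, and the time average is assumed to take the form (1 − α)·previous + α·current (with α = 0.5 both weights coincide).

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) DFT; the inverse includes the 1/n scaling."""
    n, sign = len(x), (1 if inverse else -1)
    out = [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def time_truncate(h_freq, num_taps):
    """Zero every time-domain tap beyond the first num_taps, then return to frequency."""
    taps = dft(h_freq, inverse=True)
    taps = [tap if i < num_taps else 0j for i, tap in enumerate(taps)]
    return dft(taps)

def frequency_average(h_freq, beta=2):
    """Average each subcarrier with up to beta neighbours per side (fewer at band edges)."""
    out = []
    for k in range(len(h_freq)):
        window = h_freq[max(0, k - beta):k + beta + 1]
        out.append(sum(window) / len(window))
    return out

def time_average(h_prev, h_curr, alpha=0.5):
    """Weighted sum of the previous and current estimates (assumed weighting)."""
    return [(1 - alpha) * p + alpha * c for p, c in zip(h_prev, h_curr)]

def add_tt_update(h_dd, h_prev_out, num_taps, alpha=0.5, beta=2):
    """Post-process one decision-directed estimate h_dd given the previous output."""
    h_tt = time_truncate(h_dd, num_taps)
    return time_average(h_prev_out, frequency_average(h_tt, beta), alpha)
```

Truncation leaves a channel whose energy sits in the first `num_taps` taps unchanged while discarding whatever falls in the later taps, which is where demapping errors and noise that are spread across all taps get removed.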

The architecture of SR-ConvLSTM
ChannelNet, which is based on CNNs, is poorly suited to uncovering the inherent temporal spectral correlation; thus, an NN architecture based on ConvLSTM, named SR-ConvLSTM, is proposed. It models the temporal spectral correlation of adjacent symbols to estimate the CR and is suitable for non-stationary scenarios. Channel estimation for vehicular communications using IEEE 802.11p is viewed as a super-resolution problem. Considering the time-variant channel, long short-term memory (LSTM), which can extract the temporal correlation of a series, is introduced to tackle the super-resolution problem. In [30], the authors show that LSTM successfully handles channel state information (CSI) feedback for time-varying communications. ConvLSTM [31] originates from LSTM by adding a convolution operation, which makes it possible not only to capture temporal relationships but also to extract features as convolutional layers do. ConvLSTM is therefore more effective for feature extraction when the time-series data are images. In this way, the temporal spectral characteristics are obtained via SR-ConvLSTM.
The details of the proposed SR-ConvLSTM are as follows. SR-ConvLSTM is composed of five layers, including ConvLSTM and batch normalization (BN) layers. Since this paper views channel estimation as an image super-resolution problem, the structure of ConvLSTM followed by BN is chosen, inspired by the architecture of [28], and this structure is repeated to track the high variability of the CFR. ConvLSTM captures the temporal spectral correlation between adjacent data symbols, and BN enables SR-ConvLSTM to converge. The specific structure is given in Table 1. The first layer is a ConvLSTM with 64 filters of size 9 × 9, followed by a rectified linear unit (ReLU) activation,
$$\mathrm{ReLU}(x) = \max(0, x)$$
where x is the input to the activation. When the input of a neuron falls in the negative half region, the gradient is 0, which keeps the trained network sparse. The second layer is BN. BN addresses slow convergence and exploding gradients during training. In fact, we find that if BN is removed from SR-ConvLSTM, the network does not converge. The reason may lie in the complex distribution of the channel, which requires the BN operation. In addition, BN speeds up training and improves the accuracy of the model. The third layer is a ConvLSTM with 32 filters of size 1 × 1, followed by ReLU activation. The fourth layer is BN. The last layer uses 1 filter of size 5 × 5 × 5 to reconstruct the output. Notably, to strike a balance between performance and complexity, TS-ChannelNet omits the DnCNN used in ChannelNet. The relationship between the input and output of the proposed SR-ConvLSTM is represented as
$$\hat{H} = f(\hat{H}_f; \theta)$$
where Ĥ_f is the coarse CR produced by ADD-TT, θ denotes the parameters of SR-ConvLSTM, Ĥ is the final estimated CR, and f is the nonlinear function determined by θ.
The architecture of ChannelNet must be revised if the frame length changes. Worse, the whole ChannelNet must then be retrained from scratch, which is non-trivial in practice. In SR-ConvLSTM, the CR is divided into blocks that each contain n data symbols. Hence, SR-ConvLSTM accommodates arbitrary frame lengths without changing the input shape of the NN. In conclusion, SR-ConvLSTM is more robust than SRCNN, the building block of ChannelNet.

Training of TS-ChannelNet
In this paper, estimating the CFR at the receiver is viewed as a super-resolution problem comprising pilot-based interpolation and DL-based restoration [14]. Thus, the proposed TS-ChannelNet is composed of ADD-TT and SR-ConvLSTM. In the first phase, the pilot values h_p are extended into a coarse CFR whose dimension is identical to that of the estimated Ĥ. In the second phase, SR-ConvLSTM, parameterized by θ, is used to turn the coarse CFR into its HR version via DL. The relationship between the input and output of TS-ChannelNet can be represented as
$$\hat{H} = f_\theta\big(f_{\text{ADD-TT}}(h_p)\big)$$
where f_θ and f_ADD−TT are the network and interpolation functions, respectively. ADD-TT in the first phase comprises decision direction, time truncation, and weighted averaging. First, decision direction assumes that the CFR at the tth symbol is highly correlated with the previous one, and thus Ĥ(t − 1, k) is used as a pseudo Ĥ(t, k) to calculate the data pilot. The errors caused by this iterative operation are alleviated by demapping the data pilot to a constellation point. Second, the errors accumulated through wrong demodulation are equivalent to added noise. When the estimate is transformed from the frequency domain to the time domain, this noise is spread uniformly across the taps [8], so truncation is applied to curb it. Third, to make use of the pilots, averaging of Ĥ(t, k) in the frequency and time domains is applied. Overall, ADD-TT uses averaged decision-directed estimation with time truncation to turn the pilots into a coarse Ĥ.
SR-ConvLSTM in the second phase is introduced to restore the HR version of Ĥ. First, to train SR-ConvLSTM, the real and imaginary parts of Ĥ are extracted and stacked. Then, the stacked Ĥ is divided into several blocks so that SR-ConvLSTM can reveal the temporal spectral correlation. SR-ConvLSTM has impressive power to capture the intrinsic correlation of the signal in an end-to-end manner. Finally, the outputs of SR-ConvLSTM are concatenated to obtain the final estimated CFR. In addition, the Adam optimization algorithm [11] is chosen to make SR-ConvLSTM converge. To measure the accuracy of the estimates, the normalized mean square error (NMSE) between Ĥ and H is used.
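The data handling around the network (stacking real and imaginary parts, dividing the frame into blocks along time, and concatenating the outputs) can be sketched in a few lines. This is an illustrative sketch under assumptions: the function names and the block length `n` are hypothetical, a final block shorter than `n` is simply kept as is, and the network itself is omitted, so here the blocks are merged back unchanged.

```python
def stack_real_imag(h):
    """h[t][k] complex -> list over t of [real row, imaginary row]."""
    return [[[v.real for v in row], [v.imag for v in row]] for row in h]

def split_blocks(frames, n):
    """Split along time into consecutive blocks of n symbols (the last may be shorter)."""
    return [frames[i:i + n] for i in range(0, len(frames), n)]

def merge_blocks(blocks):
    """Concatenate blocks along time and rebuild complex values."""
    return [[complex(re, im) for re, im in zip(real, imag)]
            for block in blocks for real, imag in block]

# Hypothetical coarse estimate: 5 OFDM symbols by 3 subcarriers.
h_coarse = [[complex(t, k) for k in range(3)] for t in range(5)]
blocks = split_blocks(stack_real_imag(h_coarse), n=2)
# A trained network would map each block to a refined block before merging.
h_out = merge_blocks(blocks)
```

Because every block has the same layout regardless of how many blocks a frame produces, the input shape seen by the network does not depend on the frame length.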
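The NMSE is computed as
$$\mathrm{NMSE} = \frac{\sum_{t=1}^{N} \sum_{k=1}^{K} |\hat{H}(t, k) - H(t, k)|^2}{\sum_{t=1}^{N} \sum_{k=1}^{K} |H(t, k)|^2}$$
where N is the frame length. In addition, the bit error rate (BER) is used to demonstrate the performance of TS-ChannelNet. TS-ChannelNet is summarized in Algorithm 1.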
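For concreteness, the NMSE can be computed as in the sketch below. The normalisation used here, total squared error divided by total channel energy over all symbols and subcarriers of a frame, is a common definition and is an assumption of this sketch; the helper names are hypothetical.

```python
import math

def nmse(h_est, h_true):
    """Total squared error divided by total channel energy over a frame."""
    err = sum(abs(e - t) ** 2
              for row_e, row_t in zip(h_est, h_true)
              for e, t in zip(row_e, row_t))
    ref = sum(abs(t) ** 2 for row in h_true for t in row)
    return err / ref

def to_db(value):
    """Convert a linear power ratio to decibels."""
    return 10 * math.log10(value)
```

NMSE curves are usually plotted on a logarithmic scale, so `to_db(nmse(...))` gives the value in dB; an NMSE of 0.01 corresponds to -20 dB.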

Simulation and results
In this section, we first introduce the simulation settings, including the parameters of IEEE 802.11p and of the DL-based model. Then, the simulation results demonstrate the strength of the proposed TS-ChannelNet.

Simulation setup and parameters
The IEEE 802.11p end-to-end PHY is implemented for the simulation. NMSE and BER are taken as the performance measures. The SNR ranges from 0 to 30 dB, with 4-ary quadrature amplitude modulation (4-QAM). The velocities range from 32 to 104 km/h. A frame length of 60 blocks is chosen.

Algorithm 1 TS-ChannelNet
Initialize (Ĥ_train(0, k), Ĥ_test(0, k)) as described in (3)
for t = 1 : T do
    Obtain (Ĥ_train(t, k), Ĥ_test(t, k)) via decision-direction by (4), (5)
    Update (Ĥ_train(t, k), Ĥ_test(t, k)) via time truncation as in (6)

TensorFlow running on a graphics processing unit (GPU) is employed for our approach. The learning rate is 0.001 and the dropout rate is 0.2. The batch size is 128 and the number of epochs is 60. The training, validation, and test set sizes are 32000, 8000, and 4000, respectively. The two models are trained at an SNR of 22 dB with the above hyperparameters for each of the three environments. The specific simulation parameters are listed in Table 2.

Results and discussion
Figures 2, 3, and 4 compare the performance of TS-ChannelNet with that of other schemes for maximum Doppler shifts ranging from 300 to 1200 Hz. DD-TT is seen to outperform ChannelNet in the high-SNR region. The proposed scheme consistently outperforms the other approaches. This is because it estimates the CR by integrating pilot knowledge, data knowledge, and the correlation of adjacent symbols. TS-ChannelNet remains effective in high-velocity communication, which is challenging for real vehicular communications.
In Fig. 5, the ideal BER is illustrated. The ideal BER is obtained with perfect knowledge of the noise-free CR. The performance of our method approaches the ideal case, which means that TS-ChannelNet recovers the CR almost exactly. TS-ChannelNet shows a larger advantage as deep fading in vehicular communications becomes more severe. Through the performance under the representative vehicular models, we demonstrate that TS-ChannelNet is robust and achieves an evident performance gain in terms of both BER and NMSE.
To further investigate the proposed method, an ablation analysis for the fast time-varying environment is conducted. Because Gaussian interpolation (GI) is used in ChannelNet, we take GI, DD-TT, and ADD-TT in turn as the interpolation method in the first phase of TS-ChannelNet while keeping SR-ConvLSTM unchanged. We refer to these approaches as GI-(SR-ConvLSTM), DD-TT-(SR-ConvLSTM), and ADD-TT-(SR-ConvLSTM). ChannelNet is taken as the benchmark algorithm. Figure 6 plots the NMSE of TS-ChannelNet with the different interpolation methods in the high-mobility scenario, with ChannelNet as a reference. TS-ChannelNet with GI clearly outperforms ChannelNet, which also uses GI. This suggests that the proposed SR-ConvLSTM has a better capacity to extract the temporal spectral correlation of data symbols than the NN structure of ChannelNet. It is also observed that the choice of interpolation method affects the performance of the subsequent SR-ConvLSTM. The results show that the proposed ADD-TT outperforms DD-TT, especially at high SNR values. The improvement of our method over the compared methods, expressed as a percentage, is presented in Table 3. This percentage is obtained in terms of NMSE at SNR = 30 dB under the three representative channel models mentioned before: RTV Expressway (RTVE), VTV Expressway Oncoming (VTVEO), and VTV Urban Canyon (VTVUC). The proposed method delivers a considerable performance gain. Moreover, the gain increases as the maximum Doppler shift grows. This demonstrates that the proposed method tracks variations of the CFR more adequately than the compared methods.

Conclusions
Because the CFR in vehicular communications varies rapidly, it is difficult to track channel variations. Current DPAS methods suffer from error propagation caused by accumulated noise. In this paper, a TS-ChannelNet-based channel estimation method for fast time-varying scenarios using IEEE 802.11p is proposed. In this scheme, the CR is treated as an image, and TS-ChannelNet is applied to estimate the CR from the pilots. TS-ChannelNet consists of two phases. Pilot values are first extended to a coarse tentative CR via interpolation based on ADD-TT. Note that the estimated CR is divided into sequences that contain n adjacent symbols. Afterwards, SR-ConvLSTM takes the divided CR as input and generates the recovered CR. Simulation results demonstrate that the proposed method achieves a prominent performance gain over previous schemes in high-speed scenarios. Further experiments verify that both building blocks of TS-ChannelNet contribute evidently to channel estimation accuracy. The proposed TS-ChannelNet sheds light on how DL can be successfully applied to CE in high-velocity environments.
In this paper, the NN is trained separately for each of the corresponding representative environments. Hence, the generalization ability of the network needs to be further improved. How to use transfer learning to overcome this problem will be our future work.