Doubly selective channel estimation for OFDM modulated amplify-and-forward relay networks using superimposed training

This article is concerned with the problem of superimposed training (ST)-aided channel estimation for orthogonal frequency division multiplexing (OFDM)-modulated amplify-and-forward relay networks in doubly selective environments. A ‘subblockwise’ linear channel model is proposed to represent the mobile-to-mobile time- and frequency-selective channels. We then propose a novel ST strategy that allows the destination node to separately obtain the channel information of the source → relay link and the relay → destination link, from which the optimal ST signals are derived by minimizing the channel mean-square error. To enhance the performance of channel estimation, a subblock tracking-based low-complexity decision feedback approach is introduced to iteratively mitigate the unknown data interference. Finally, extensive numerical results are provided to corroborate the proposed studies.


Introduction
Cooperative communication systems have attracted much attention due to their ability to exploit spatial diversity by utilizing relays to assist transmission between a source and a destination node [1-3]. As in any other wireless communication system, channel state information (CSI) at both the relay nodes and the destination nodes is required to optimize certain criteria. For example, in relay beamforming schemes [4,5] as well as subcarrier pairing schemes [6,7], the destination needs the channel knowledge of both the source → relay and relay → destination links in order to know the relay's operation.
To obtain the separate CSI from the source node (S) to the relay node (R) and from the relay node to the destination node (D), time- and/or frequency-multiplexed pilots are employed in amplify-and-forward (AF) relay networks [8-10]. For orthogonal frequency division multiplexing (OFDM)-modulated AF relay networks, the authors of [11] proposed a two-phase training prototype in which the relay superimposes its own training onto the received training signal so that the separate channels can be estimated at the destination. The optimal training and the optimal power allocation factor between R and S are then derived based on the Bayesian Cramér-Rao bound.
Previous studies on AF relay systems [8-11] mainly focused on block-fading or slow-fading scenarios (e.g., the normalized Doppler spread over one OFDM block is less than 0.1). In practical broadband relay networks, however, the source and the relay can both be moving nodes, e.g., mobile terminals in moving cars or high-speed trains. In such a transmission environment, one must assume that the wireless channels of S → R and R → D are time- and frequency-selective. To reduce the number of unknown channel parameters, doubly selective channels are typically represented in two ways: by the basis expansion model (BEM) [12-14], which decomposes the channel into a superposition of time-varying orthogonal basis functions (e.g., Fourier bases) weighted by time-invariant coefficients, or by a blockwise linear model [15], which tracks the channel variation in a linear fashion over specific block periods. Previous contributions on channel estimation involving either BEM or blockwise linear channel models have been reported in [12,13,15-18]. Although such channel modeling methods are generally reliable at relatively high Doppler frequencies, more than 30% of the transmission resource is spent on known pilots, which significantly reduces the transmission efficiency.
To improve transmission efficiency without entailing unrealistic assumptions or high complexity, an alternative approach, referred to as superimposed training (ST), has been studied in [17,19]. In such schemes, channel estimation can be performed without a loss of rate and with bearable data interference, since the training signals are arithmetically added onto the unknown data.
Motivated by the advantages of ST, this article presents a novel ST-based doubly selective channel estimation scheme for OFDM-modulated AF relay networks. By modeling the doubly selective channel with a 'subblockwise' linear model, the separate channels of S → R and R → D are estimated by a two-step approach. First, we adopt a time-domain 'subblock' tracking scheme that models the time-selective channel within one OFDM block as multiple subblock-fading structures, so that channel estimation over each subblock can be performed with a linear time-invariant structure. Second, we smooth the initial channel estimates over the multiple subblocks of one OFDM block by polynomial fitting. The optimal ST design criteria for both S and R are derived by minimizing the mean square error (MSE) of channel estimation. Furthermore, a subblock tracking-based low-complexity decision feedback (DF) approach is provided to enhance the performance of channel estimation by iteratively mitigating the data interference. Finally, simulation results are provided to corroborate our studies.
The advantages of this article are summarized as follows: 1. ST is adopted for channel estimation, and thus offers higher transmission efficiency in comparison with the existing pilot-assisted schemes [10,13,16,20] tracking is provided to iteratively enhance the performance of channel estimation.
The rest of the article is organized as follows. The following section presents the system model of OFDM-modulated AF relay networks with the ST strategy. The ST-based channel estimation algorithm and optimal training design are then provided in Section "ST-based channel estimation". Using the analyzed MSE derived in the same section, we optimize the power ratio between ST and data sequence w.r.t. channel capacity in Section "Channel estimation enhancement". Section "Simulation results and discussion" reports simulation experiments that corroborate the validity of our theoretical analysis, and the final section concludes the article.

Notations
Vectors and matrices in the time and frequency domains are denoted by boldface lowercase and capital letters. The transpose, conjugate transpose, inverse, and pseudo-inverse of the matrix $A$ are denoted by $A^T$, $A^H$, $A^{-1}$, and $A^{\dagger}$, respectively. diag{A} denotes the diagonal matrix whose diagonal elements are constructed from $A$, and tr{A} is the trace of $A$. A dedicated operator symbol represents linear convolution.

Problem formulation
Relay transmission model
Figure 1 illustrates a typical one-way relay network with one source node (S), one relay node (R), and one destination node (D). The baseband channels between S and R and between R and D are denoted by $h_{SR,l_1}(t)$ and $h_{RD,l_2}(t)$, respectively, where $t$ is the discrete time index and $l_i = 0, 1, \ldots, L_i - 1$, $\forall i \in \{SR, RD\}$, with $L_{SR}$ and $L_{RD}$ being the numbers of resolvable paths of the S → R and R → D channels, respectively. The paths $h_{SR,l}(t)$, $l = 0, \ldots, L_{SR} - 1$, and $h_{RD,l}(t)$, $l = 0, \ldots, L_{RD} - 1$, are assumed to be statistically independent, with the power of the $l$th path being $\sigma^2_{h_{SR,l}}$ and $\sigma^2_{h_{RD,l}}$, respectively. Unlike the block-fading scenarios [10,11], the present analysis assumes that every node is a mobile terminal. Hence, the wireless channels between each node pair, i.e., $h_{SR,l}(t)$ and $h_{RD,l}(t)$, are assumed to be mobile-to-mobile channels whose coefficients are time- and frequency-selective [19,21]. Denote by $f_S$, $f_R$, and $f_D$ the maximum Doppler shifts due to the motion of S, R, and D, respectively. The discrete autocorrelation functions of $h_{i,l}(t)$, $i \in \{SR, RD\}$, are given by (1) [21],
where $J_0(\cdot)$ is the zeroth-order Bessel function of the first kind, and $T_S$ is the symbol sampling time (sample interval). These correlation functions have been widely adopted to describe the mobile-to-mobile link (see, e.g., [21]). Note that (1) reveals that the power spectra of $h_{SR,l}(t)$ and $h_{RD,l}(t)$ span the bandwidths $f_{SR} = f_S + f_R$ and $f_{RD} = f_R + f_D$, respectively, which indicates an increased Doppler effect for mobile-to-mobile communications. Without loss of generality, perfect synchronization is assumed in this article, as in [7-11,13,15-19,22].
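As a rough numerical illustration of the correlation model described above, the following sketch assumes that the autocorrelation takes the common double-ring form $\sigma^2 J_0(2\pi f_a \tau T_S) J_0(2\pi f_b \tau T_S)$, where $\tau$ is the lag in samples and $f_a$, $f_b$ are the maximum Doppler shifts of the two moving ends. This form, the function names, and the parameter values are assumptions made for illustration only.

```python
import math


def bessel_j0(x, steps=2000):
    """Zeroth-order Bessel function of the first kind, J0(x).

    Uses the integral form J0(x) = (1/pi) * int_0^pi cos(x sin(theta)) dtheta,
    evaluated with the midpoint rule.
    """
    width = math.pi / steps
    total = sum(math.cos(x * math.sin((i + 0.5) * width)) for i in range(steps))
    return total * width / math.pi


def m2m_autocorr(lag, f_a, f_b, t_s, path_power=1.0):
    """Assumed mobile-to-mobile autocorrelation at an integer sample lag."""
    arg_a = 2.0 * math.pi * f_a * lag * t_s
    arg_b = 2.0 * math.pi * f_b * lag * t_s
    return path_power * bessel_j0(arg_a) * bessel_j0(arg_b)


# Example: S -> R link with f_S = f_R = 1000 Hz and T_S = 1 / (5 MHz).
T_S = 1.0 / 5.0e6
for lag in (0, 100, 576):
    print(lag, m2m_autocorr(lag, 1000.0, 1000.0, T_S))
```

Because $|J_0(\cdot)| \le 1$, the product of two Bessel factors never exceeds a single factor in magnitude, which is one way to see the faster decorrelation of a mobile-to-mobile link.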

ST strategy at the source and relay
After performing the inverse fast Fourier transform and inserting the cyclic prefix (CP), the transmitted complex baseband data samples can be written as $s(t) = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} S(k) e^{j 2\pi k t / N}$, where $t = -L, \ldots, 0, \ldots, N-1$, $L$ is the CP length, $S(k)$ is the modulated data symbol on the $k$th subcarrier, and $N$ is the total number of subcarriers. In the proposed ST strategy, known training sequences $p_S(t)$, $t = 0, \ldots, N-1$, from S are superimposed onto the data samples $s(t)$, i.e., $x(t) = s(t) + p_S(t)$. Here, the average powers of the data $s(t)$ and the training $p_S(t)$ are normalized, which determines the average transmission power at S. In the considered AF relay transmission, $x(t)$ is transmitted over a time- and frequency-selective channel between S and R, and the received signal at R is amplified by a fixed gain $\alpha$. Meanwhile, R superimposes its own training $p_R(t)$ onto the received signal. The structure of the training model is shown in Figure 2. After discarding the CP, the signal formed at R contains the additive white Gaussian noise (AWGN) $n_R(t)$ observed at R, which has zero mean and variance $\sigma_n^2$. Suppose R has the average transmission power $P_R$; the average power assigned to $p_R(t)$ is then governed by the parameters $r$ and $\alpha$, which control the power allocation between the received signal from S and the ST signal of R. Relay R then adds a new CP and forwards the resultant signal to D. The received signal at D, after removing the CP, is given by (7), where the first term on the right-hand side is equivalent to a single-hop doubly selective channel model with cascaded fading gains $h_{SRD,l''}(t)$ determined by the S → R and R → D channel gains, and $n_D(t)$ is the AWGN observed at D. For notational simplicity, we assume the variance of both $n_R(t)$ and $n_D(t)$ to be $\sigma_n^2$ in the rest of the article. Nevertheless, the extension to the general case is straightforward.
To facilitate channel estimation, the channel taps and the signal samples are collected into vectors, so that formula (7) can be rewritten in the form (9), where $\eta(t)$ is the undesired part that comprises the extra data interference, channel gain, and AWGN, and $n_R(t)$ is the corresponding $1 \times L_{RD}$ AWGN vector at R. Unlike the block-fading scenarios [9-12], $h_{SR}(t)$ and $h_{RD}(t)$ considered in this article are time selective over an OFDM block period. The aim of this article is to find the separate $h_{SR}(t)$ and $h_{RD}(t)$ from (9).
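The two-hop signal structure described above can be sketched numerically under strong simplifications: the channel taps are held constant over the block (the situation later assumed within one subblock), the noise terms are omitted, and CP insertion and removal are modeled by circular convolution. All function names and numerical values below are hypothetical and chosen only for illustration.

```python
import cmath


def idft(spec):
    """Unitary inverse DFT: frequency-domain symbols S(k) to time samples s(t)."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n ** 0.5
            for t in range(n)]


def circ_conv(h, x):
    """Circular convolution, modeling a time-invariant channel plus CP removal."""
    n = len(x)
    return [sum(h[l] * x[(t - l) % n] for l in range(len(h))) for t in range(n)]


def lin_conv(a, b):
    """Linear convolution of two tap vectors (used for the cascaded channel)."""
    out = [0j] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out


def relay_chain(spec, p_s, p_r, h_sr, h_rd, alpha):
    """Noise-free S -> R -> D chain with superimposed training at S and at R."""
    s = idft(spec)
    x = [a + b for a, b in zip(s, p_s)]               # ST at the source
    y_r = circ_conv(h_sr, x)                          # signal received at R
    z_r = [alpha * a + b for a, b in zip(y_r, p_r)]   # amplify, then add relay ST
    y_d = circ_conv(h_rd, z_r)                        # signal received at D
    return x, y_d
```

In this simplified setting the output at D equals the input $x(t)$ filtered by the cascaded channel $\alpha\,(h_{SR} * h_{RD})$ plus the relay training filtered by $h_{RD}$, which is the two-term structure that the estimator exploits.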

ST-based channel estimation
In this section, we propose a 'subblockwise' linear channel model to facilitate the separate channel estimation of S → R and R → D. The key idea is to treat the channel as time-invariant over each subblock of one OFDM block, so that the separate CSI of S → R and R → D over each subblock can be obtained with a time-invariant structure. Using these initial channel estimates, we then recover the CSI of S → R and R → D over the whole OFDM block by polynomial fitting. First, we split one OFDM block into several equispaced subblocks (time slots). Let $M$ be the subblock size and $G$ be the number of subblocks within each OFDM block, i.e., $N = GM$. We then assume that the time variation of both $h_{SR}(t)$ and $h_{RD}(t)$ is negligible within one subblock period, so that the channel response within the $g$th subblock can be approximated by a constant vector. Let us specify the time index within the $g$th subblock as $t = gM + m$, $g = 0, \ldots, G-1$, $m = 0, \ldots, M-1$. The received signal $y_D(t)$ in (9) can then be re-expressed in matrix form, in which the training matrices $p_S^g$ and $p_R^g$ (the latter of size $M \times L_{RD}$) are column-wise circulant.

Channel estimation over subblocks of one OFDM block
Case 1: Suppose there are sufficient observations within one subblock to estimate the unknowns $h_{SRD}$ and $h_{RD}$ of that subblock, for example, $M > L_{SR} + 2L_{RD} - 1$. It is then possible to use linear estimators, e.g., the least squares (LS) or the linear minimum mean square error (LMMSE) estimator, to obtain the initial channel estimates.
In this article, we consider LS estimation in order to cover more practical scenarios in which the channel statistics are not available. Let $p_g$ and $h_g$ denote the stacked training matrix and the stacked channel vector of the $g$th subblock. The LS estimator of $h_g$ in (11) is obtained by applying the pseudo-inverse of $p_g$ to the received subblock vector [23], and its error covariance $\mathrm{Cov}_h$ is determined by $R_{\eta_g}$, the covariance matrix of $\eta_g$. In accordance with the central limit theorem, the data sequence $s(t)$ can be regarded as a Gaussian-distributed random vector. Assuming that $s(t)$ and the AWGN are mutually independent [17,19], $R_{\eta_g}$ can be modeled as in (13). Note that the first term on the right-hand side of (13) is the interference due to the unknown data symbols. In ST-aided schemes where the CSI is time invariant [19,22], a large number of OFDM blocks can be averaged to reduce this data effect. For the doubly selective fading channel assumed in this article, however, such long-term averaging becomes impractical. This problem is viewed as a major drawback of the existing ST-based schemes [11,15,17,19,22].
By (11), the corresponding MSE is given in (14). To obtain the minimum MSE of the LS channel estimator subject to a fixed power dedicated to the ST signals, the ST optimization under a training power constraint is formulated as the minimization of $\mathrm{tr}\{\mathrm{Cov}_h\}$ [11,13]. Following majorization theory, the minimization of $\mathrm{tr}\{\mathrm{Cov}_h\}$ requires the matrix $(p_g)^H p_g$ to be diagonal, i.e., $(p_g)^H p_g = C I$. Writing $(p_g)^H p_g$ in terms of the blocks $p_S^g$ and $p_R^g$, the optimal ST should satisfy the following three conditions: (C1) $(p_S^g)^H p_S^g = M \rho_{P_S} I$, (C2) $(p_R^g)^H p_R^g = M \rho_{P_R} I$, and (C3) $(p_S^g)^H p_R^g = 0$. If all columns of $p_S^g$ are mutually orthogonal, and likewise all columns of $p_R^g$, then (C1) and (C2) are satisfied. Moreover, (C3) is an additional constraint that requires orthogonality between $p_S^g$ and $p_R^g$. From (10), the corresponding initial estimate of S → R, i.e., $\hat{h}_{SR}^g$, can be computed straightforwardly by time-domain deconvolution.

Case 2: Initial channel estimation when $M < L_{SR} + 2L_{RD} - 1$. In this case, one cannot directly estimate both $h_{SRD}^g$ and $h_{RD}^g$, since $p_g$ is rank deficient. Bearing in mind that the minimum subblock size $M$ is expected to be greater than or equal to $L_{SR} + L_{RD} - 1$, since there are in total $L_{SR} + L_{RD} - 1$ unknowns to estimate, we employ two consecutive subblocks to jointly estimate the channel coefficients.
Stacking the observations of the $g$th and $(g+1)$th subblocks, the signal vector over two consecutive subblocks can be written in the same matrix form as in Case 1, with the stacked training matrix $p_{g,g+1}$. Assuming that the block-fading approximation is still valid over two consecutive subblocks, the channel can be modeled by a single coefficient vector $h_{g,g+1}$ over both subblocks. Correspondingly, the LS estimator of $h_{g,g+1}$ and its error covariance can be obtained in the same way [18]. From the specific property of $p_g$ in (19), it can be verified that the optimal ST design criteria (C1)-(C3) are also optimal for (25), i.e., $(p_S^{g,g+1})^H p_S^{g,g+1} = 2M\rho_{P_S} I$ and $(p_R^{g,g+1})^H p_R^{g,g+1} = 2M\rho_{P_R} I$. That is, we can obtain the separate CSI of S → R and R → D over each pair of consecutive subblocks of one OFDM block, i.e., at the equispaced time samples $t = gM + M$, $g = 0, \ldots, G-2$. Detailed procedures are omitted here since the derivations are similar to those of Case 1.
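To make the role of the orthogonality conditions concrete, the sketch below builds one hypothetical training pair that satisfies them and shows that the LS estimator then reduces to a scaled correlation. It assumes that both trainings are periodic with the subblock length $M$, that the source training is an even-length Chu sequence, and that the relay training is a cyclic delay of the same sequence. These choices are illustrative assumptions and are not the training example of the article.

```python
import cmath


def chu_sequence(length, power=1.0):
    """Constant-amplitude sequence with ideal periodic autocorrelation (even length)."""
    if length % 2:
        raise ValueError("this sketch uses the even-length Chu form")
    amp = power ** 0.5
    return [amp * cmath.exp(1j * cmath.pi * m * m / length) for m in range(length)]


def delayed_columns(seq, n_cols, offset=0):
    """Columns of a column-wise circulant matrix: column c is seq delayed by offset + c."""
    size = len(seq)
    return [[seq[(m - offset - c) % size] for m in range(size)] for c in range(n_cols)]


def received(columns, taps):
    """Training-only observation: sum of training columns weighted by channel taps."""
    size = len(columns[0])
    return [sum(col[m] * h for col, h in zip(columns, taps)) for m in range(size)]


def ls_estimate(y, columns, power):
    """LS estimate when the Gram matrix equals (M * power) * I: a scaled correlation."""
    size = len(y)
    scale = size * power
    return [sum(col[m].conjugate() * y[m] for m in range(size)) / scale for col in columns]


# Toy subblock: M = 16, cascaded channel of 3 taps, R -> D channel of 2 taps.
M, RHO = 16, 0.5
p = chu_sequence(M, RHO)
cols = delayed_columns(p, 3) + delayed_columns(p, 2, offset=3)   # [p_S | p_R]
h_true = [0.7 + 0.1j, 0.3 - 0.2j, 0.1j, 0.9, -0.4 + 0.3j]        # [h_SRD ; h_RD]
y = received(cols, h_true)            # no data interference, no noise
h_hat = ls_estimate(y, cols, RHO)
print(max(abs(a - b) for a, b in zip(h_hat, h_true)))
```

Delaying the relay training by at least the length of the cascaded channel keeps the two sets of columns mutually orthogonal, so no matrix inversion is needed. With data interference and noise present, the same correlation yields the initial estimate whose error the later DF stage tries to reduce.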

Channel smoothing over one OFDM block
Once the channel estimates over the $G$ subblocks, i.e., $\hat{h}_{SR}^g$ and $\hat{h}_{RD}^g$, $g = 0, \ldots, G-1$, have been obtained, an intuitive idea is to recover the channel impulse response (CIR) over one OFDM block by linear interpolation, with the gradient between two subblocks given in [15,18]. However, in high-mobility environments, the extrapolation at the edges of the subblocks generates unreliable channel estimates, which results in severe performance degradation.
To address this issue, we propose to recover the time-selective channel coefficients by polynomial fitting, which can be summarized in the following steps: 1. We use a polynomial of order $\Upsilon$ to model the estimated CIR in each subblock [20] as $\hat{h}_{i,l}(t_g) = \sum_{\gamma=0}^{\Upsilon} a_{i,l,\gamma} t_g^{\gamma}$, where $a_{i,l,\gamma}$ is the polynomial coefficient and $t_g = gM + \frac{G}{2}$, $g = 0, \ldots, G-1$.
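The smoothing step can be sketched as an ordinary least-squares polynomial fit through the per-subblock estimates of one channel tap, followed by evaluation of the fitted polynomial at every sample of the OFDM block. The sketch below assumes that the fitting instants are normalized by $N$ for numerical conditioning and, in the usage example, that they sit at the subblock centers. Both are assumptions of this illustration, as are the function names.

```python
def polyfit_ls(times, values, order):
    """Least-squares polynomial coefficients a_0..a_order (lowest power first)."""
    n = order + 1
    # Normal equations A a = b with A[r][c] = sum t^(r+c) and b[r] = sum t^r * v.
    a_mat = [[sum(t ** (r + c) for t in times) for c in range(n)] for r in range(n)]
    b_vec = [sum((t ** r) * v for t, v in zip(times, values)) for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a_mat[r][col]))
        a_mat[col], a_mat[piv] = a_mat[piv], a_mat[col]
        b_vec[col], b_vec[piv] = b_vec[piv], b_vec[col]
        for r in range(col + 1, n):
            factor = a_mat[r][col] / a_mat[col][col]
            for c in range(col, n):
                a_mat[r][c] -= factor * a_mat[col][c]
            b_vec[r] -= factor * b_vec[col]
    # Back substitution.
    coeffs = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = b_vec[r] - sum(a_mat[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = acc / a_mat[r][r]
    return coeffs


def polyval(coeffs, t):
    """Evaluate the fitted polynomial at (normalized) time t."""
    return sum(a * t ** k for k, a in enumerate(coeffs))


def smooth_tap(subblock_estimates, n_block, order):
    """Fit one tap over G subblock estimates and evaluate it at all N samples."""
    g_count = len(subblock_estimates)
    m_size = n_block // g_count
    centers = [(g * m_size + m_size / 2) / n_block for g in range(g_count)]
    coeffs = polyfit_ls(centers, subblock_estimates, order)
    return [polyval(coeffs, t / n_block) for t in range(n_block)]
```

For instance, with $N = 512$, $G = 4$, and a second-order polynomial, `smooth_tap` takes four complex estimates of a tap and returns 512 interpolated values. The order must stay below $G$ for the fit to be well posed.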

Channel estimation enhancement
Observing (14) and (24), we note that the channel estimation of S → R and R → D is affected by extra interference induced by the data. To overcome this problem, a data detection-based DF process has been employed to mitigate such data interference at the receiver by using the recovered data symbols [17,22,24]. However, in doubly selective channel environments, data detection suffers from severe inter-carrier interference (ICI) due to the channel time variation. To combat ICI, the existing symbol detectors adopted in [17,22,24] require a computational complexity of approximately $O(N^2)$, which makes the DF process unsuitable for practical applications with complexity constraints.
In this section, we introduce a novel subblock tracking scheme to alleviate the computational burden of data detection in doubly selective environments.

Proposed DF technique with subblock tracking detector
Denote the initial estimates of the S → R and R → D channels as $\hat{h}_{SR}(t)$ and $\hat{h}_{RD}(t)$, respectively. Removing the ST sequences, the received time-domain signal observed at D, $\tilde{y}_D(t)$, is given by (29), where $\varepsilon(t)$ denotes the residual ST interference and $w(t) = \alpha \sum_{l'=0}^{L_{RD}-1} h_{RD,l'}(t) n_R(t - l') + n_D(t)$. Motivated by the 'subblockwise' linear channel model [15,25], we split the resulting signal block in (29), i.e., $\tilde{y}_D = [\tilde{y}_D(0), \ldots, \tilde{y}_D(N)]^T_{N \times 1}$, into $P$ equispaced subblocks of $Q$ samples ($Q \ge L_{SR} + L_{RD} - 1$), and then neglect the channel time variation over each subblock. Accordingly, the time-domain signal within the $p$th subblock can be expressed as in (30), where $h^p_{SRD}$ is the approximated CIR during the $p$th subblock period. For the sake of simplicity, we omit the noise term, i.e., $\varepsilon(t) + w(t)$, in the following derivation. Nevertheless, the extension to the general case is straightforward.

Collecting the signal (30) within a subblock to form a vector $\tilde{y}^p_D$, we obtain a matrix-form model in which $s_p = [s_p(pQ), \ldots, s_p(pQ + Q)]^T$, of size $Q \times (L_{SR} + L_{RD} - 1)$, is a column-wise circulant matrix. We apply an $N$-point FFT to $\tilde{y}^p_D$ zero-padded to length $N$, as in (32). Therefore, the initial detection of $S$ can be obtained by a time-invariant structure with a zero-forcing (ZF) criterion, as in (34).

Remark 1: One can also adopt the MMSE detector, in which $\gamma$ denotes the signal-to-noise ratio (SNR). However, we herein choose the ZF detector due to the constraint on computational complexity. Note that for the special case of a block-fading channel, the proposed subblock tracking method is equivalent to the conventional one-tap ZF or MMSE detectors [14,20,22].
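For the block-fading special case mentioned in Remark 1, the conventional one-tap detectors can be sketched as follows. The sketch assumes a time-invariant channel over the whole block, unit-power data symbols, a unitary IFFT at the transmitter, and the textbook per-subcarrier forms $Y(k)/H(k)$ for ZF and $H^{*}(k) Y(k) / (|H(k)|^2 + 1/\gamma)$ for MMSE. It illustrates the baseline only, not the subblock tracking detector itself, and all names are hypothetical.

```python
import cmath


def dft(x, unitary=False):
    """DFT of x; unitary=True applies the 1/sqrt(N) scaling."""
    n = len(x)
    scale = n ** -0.5 if unitary else 1.0
    return [scale * sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_unitary(spec):
    """Unitary inverse DFT used at the transmitter."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n ** 0.5
            for t in range(n)]


def circ_conv(h, x):
    """Circular convolution: time-invariant channel after CP removal."""
    n = len(x)
    return [sum(h[l] * x[(t - l) % n] for l in range(len(h))) for t in range(n)]


def one_tap_detect(y_time, h_taps, snr=None):
    """Per-subcarrier ZF (snr=None) or MMSE detection for a block-fading channel."""
    n = len(y_time)
    y_freq = dft(y_time, unitary=True)
    h_freq = dft(list(h_taps) + [0j] * (n - len(h_taps)))   # channel frequency response
    if snr is None:
        return [y / h for y, h in zip(y_freq, h_freq)]
    return [h.conjugate() * y / (abs(h) ** 2 + 1.0 / snr) for y, h in zip(y_freq, h_freq)]
```

The ZF branch inverts the channel exactly and therefore amplifies noise on weak subcarriers, whereas the MMSE branch shrinks the estimate by $|H(k)|^2 / (|H(k)|^2 + 1/\gamma)$ at the cost of needing the SNR.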
Using the initially recovered data symbols (34), the data interference can be mitigated by removing $\hat{S}^{(0)}$ from the received signals, as in (35). The data interference is effectively cancelled provided that the associated inequality holds, where $\beta^{(0)}$ is the symbol error rate (SER) of the initial detection in (34). Replacing $y_D(t)$ by $y_D^{(1)}(t)$, we re-estimate the CSI using the channel estimator proposed in Section "ST-based channel estimation" and obtain the channel estimates of iteration 1. With these improved estimates, a lower SER is expected. Accordingly, the corresponding recovered data are then utilized to mitigate the data interference, similar to (35), thereby achieving an enhanced channel estimation performance in the forthcoming iteration. The iteration continues until a certain stopping criterion is satisfied.
To sum up, the DF scheme proceeds as follows.
Step 1. Using the estimated CSI, detect the data symbols with the proposed subblock tracking scheme, (29)-(34).
Step 2. Using the detected data symbols from Step 1, mitigate the extra data interference via (35).
Step 3. Re-estimate the CSI with the channel estimator proposed in Section "ST-based channel estimation".
Step 4. Update the channel estimates of the current iteration, and go back to Step 1.

Complexity analysis of the proposed subblock tracking detector
We now discuss the computational complexity of the proposed subblock tracking-based detector. In a doubly selective fading environment, most of the complexity of the DF algorithm comes from Step 1, i.e., the data detection process. As can be observed from (30)-(34), the proposed subblock tracking-based data detector requires $P$ $N$-point FFT operations from (32) and $3NP$ complex operations from the linear process of (34), resulting in an overall complexity of $NP\left(\frac{\log N}{2} + 3\right)$. Apart from the logarithmic FFT factor, the complexity of the proposed subblock tracking scheme is therefore linear in the OFDM block size, i.e., $O(N)$. In comparison, the existing methods [15,22] require a total complexity of approximately $O(N^2)$, so the proposed data detector is considerably less complex.
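The operation counts above can be compared numerically with a short sketch. It assumes a base-2 logarithm (i.e., $\frac{N}{2}\log_2 N$ complex multiplications per FFT) and uses $N^2$ with unit constant as a stand-in for the $O(N^2)$ detectors. Both are simplifying assumptions, so the ratio is indicative only.

```python
import math


def subblock_detector_ops(n_sub, p_blocks):
    """Approximate complex operations: P FFTs of size N plus 3NP linear operations."""
    return n_sub * p_blocks * (math.log2(n_sub) / 2 + 3)


def quadratic_detector_ops(n_sub):
    """Stand-in count for detectors with O(N^2) complexity (unit constant assumed)."""
    return float(n_sub ** 2)


N, P = 512, 8
print(subblock_detector_ops(N, P))                              # 30720.0
print(quadratic_detector_ops(N) / subblock_detector_ops(N, P))
```

For the simulation setting $N = 512$ and $P = 8$, this gives 30720 operations against 262144, i.e., a reduction by a factor of roughly 8.5 under the stated assumptions.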

Simulation results and discussion
In this section, we present various numerical examples to verify the validity of the proposed studies. The performance of the channel estimation schemes developed in Sections "ST-based channel estimation" and "Channel estimation enhancement" is evaluated by simulations in accordance with the OFDM system setting of [17], i.e., an OFDM block length of $N = 512$ with a symbol rate of $f = 5$ MHz, a CP length of 64, and 4PSK modulation. We take $L_{SR} = L_{RD} = 4$, and the coefficients of S → R and R → D are generated as low-pass, zero-mean Gaussian random processes that are correlated in time with correlation functions according to Jakes' model [26]. For estimating the separate CSI of S → R and R → D, the powers of the ST and data sequences are assumed to be $\rho_{P_S} = \rho_{P_R} = \rho_S$.

Test case 1. ST-based channel estimation
We first test the proposed channel estimator with different tracking parameters, i.e., the number of subblocks and the polynomial order. We set the Doppler frequencies to $f_S = f_R = 1000$ Hz, which corresponds to a mobile speed of 216 km/h when S, R, and D operate at a carrier frequency of 5 GHz. In this case, the corresponding normalized Doppler spreads during an OFDM block period of S → R and R → D are $(N + CP) f_D T \approx 0.23$ and $(N + CP) f_R T \approx 0.115$, respectively. As shown in Figure 3, the channel MSEs of S → R and R → D are almost independent of the AWGN, especially in high-SNR regions, e.g., SNR > 15 dB. This result is not unexpected, since the estimation error is dominated by the extra data interference. Even in this demanding situation, $G = 4$ subblocks and a polynomial order of $\Upsilon = 2$ are sufficient for the proposed channel estimator.
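The Doppler figures quoted above can be checked with a few lines of arithmetic. The sketch assumes the usual relation $f = v f_c / c$ with $c = 3 \times 10^8$ m/s and a sample interval $T = 1/(5\ \text{MHz})$. Reading the value 0.23 as the sum of two 1000 Hz Doppler shifts is an interpretation made here, not a statement of the article.

```python
def max_doppler_hz(speed_kmh, carrier_hz, light_speed=3.0e8):
    """Maximum Doppler shift f = v * f_c / c for a terminal moving at speed_kmh."""
    return speed_kmh / 3.6 * carrier_hz / light_speed


def normalized_doppler(doppler_hz, n_sub, cp_len, sample_rate_hz):
    """Doppler spread normalized to one OFDM block including the CP."""
    return (n_sub + cp_len) * doppler_hz / sample_rate_hz


f_max = max_doppler_hz(216.0, 5.0e9)                        # about 1000 Hz
print(f_max)
print(normalized_doppler(f_max, 512, 64, 5.0e6))            # about 0.115
print(normalized_doppler(2.0 * f_max, 512, 64, 5.0e6))      # about 0.23
```

With $N = 512$ and a CP of 64 samples, one block lasts 576 sample intervals, so a 1000 Hz shift gives 0.1152 and a combined 2000 Hz spread gives 0.2304.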
We then test the optimal ST design. From our previous discussion, the optimal training sequences $p_S$ and $p_R$ should have equal power and satisfy certain phase constraints, e.g., (C1)-(C3). In Figure 4, we compare the optimal ST with two types of non-optimal ST: Type-1 non-optimal training has equal power but random phase, and Type-2 non-optimal training has random power and random phase. Clearly, a significant performance improvement is achieved by the optimal design. Moreover, the farther the training sequence is from the optimal one, the worse the performance.
To gain insight into the estimator proposed in Section "ST-based channel estimation", Figure 5 illustrates the performance of the proposed channel estimator for various normalized Doppler spreads. In this example, we set $G = 4$, with a subblock size of $M = 128$, and a second-order polynomial is used in the simulation. For fairness of comparison, we also simulate the ST-based methods of [14,15], in which the channel is modeled in a linear fashion or by a generalized BEM. As can be observed, the present channel estimator outperforms the methods of [14,15]. Although the estimator of [14] is more robust in extremely high Doppler regions (e.g., $f_d T \ge 0.3$), the proposed subblock tracking-based estimator offers an advantage over the Doppler range considered in most studies of doubly selective channels (e.g., $f_d T \in [0.1, 0.3]$). The result is not unexpected, since the ST signals cannot be optimized under the structure of the generalized BEM, where the matrix of basis functions is non-orthogonal.
As observed from the previous simulations, channel estimation is severely degraded by the extra data interference. To mitigate this effect, we resort to the low-complexity DF process presented in Section "Channel estimation enhancement". As shown in Figure 6, the enhanced channel estimator with the DF process, in comparison with the initial channel estimation obtained in Section "ST-based channel estimation", achieves a significant performance improvement after only three iterations (a steady-state MSE is reached after iteration 3). This result confirms that the presented DF method can effectively mitigate the data interference, and thus provides a feasible solution for ST-based doubly selective channel estimation. It should be emphasized that when a higher computational burden is allowed, more advanced estimators such as MMSE or maximum likelihood (ML), applied after power allocation and decision feedback, can be adopted to achieve better estimation performance. This topic is left for future research.

Test case 2. Symbol detection
Herein we carry out several experiments to assess the effectiveness of the subblock tracking-based symbol detector. Figure 7 illustrates the initial SER performance for different numbers of subblocks $P$ at $f_d T \approx 0.23$. It is observed that even in this demanding situation, $P = 8$ is sufficient, since the time variation over each subblock is less than 3%. For such a small $P$, the computational complexity of the proposed subblock tracking-based symbol detection scheme is linear in the number of subcarriers and is acceptable for practical application scenarios.
To gain insight into the symbol detector proposed in Section "Channel estimation enhancement", Figure 8 shows the SER performance of various data detection methods for various Doppler shifts at SNR = 20 dB. $P$ is set to 8 to ensure an accurate subblock-fading channel model. Clearly, the proposed subblock tracking detector outperforms the detection method of [27], which uses a banded structure of the frequency-domain channel matrix (we set the bandwidth to 2 for comparison), while also reducing the complexity. Figure 9 compares the initial symbol detection and the enhanced DF process in terms of SER versus SNR. As shown in Figure 9, with the aid of the DF process, the demodulator achieves a considerable gain over the initial subblock tracking-based symbol detection method. This result is consistent with the previous result in Figure 6, since a better SER performance can be expected in the forthcoming iteration due to the improved estimation accuracy of the current iteration. The result in Figure 9 further confirms the effectiveness of the proposed subblock tracking method.

Conclusion
In this article, we studied the problem of ST-based channel estimation for OFDM-modulated AF relay networks in doubly selective fading environments. By modeling the channel with a 'subblockwise' linear model, we estimate the separate CSI of S → R and R → D, from which we derive the optimal ST signals with regard to minimizing the MSE of channel estimation. To further enhance the performance of channel estimation while keeping the computational complexity low, we provide a low-complexity DF scheme that cancels the extra data interference affecting channel estimation. Various numerical examples are provided to evaluate the proposed algorithms.

Methods
The analysis in this paper is conducted by using the MATLAB software environment to verify the theoretical expressions.