Superimposed Training-Based Joint CFO and Channel Estimation for CP-OFDM Modulated Two-Way Relay Networks

,


Introduction
Two-way relay networks (TWRNs) [1] have received much attention due to their improved spectral efficiency over a one-way relay network (OWRN) [2].Unlike the OWRN where the data flow is unidirectional from the source to the relay and then to the destination, in a TWRN, two source terminals exchange information via the relay similar to network coding [3].The overall communication rate of this setup is approximately twice as that achieved in a OWRN [4], making the TWRN concept particularly attractive to bidirectional systems.The achievable rate regions for amplify-and-forward-(AF-) and decode-and-forward-(DF-) based TWRNs were derived in [5,6].In [7], the optimal mapping function at the relay that minimizes the bit-error rate (BER) was proposed, and in [8], distributed spacetime codes (STCs) that achieve full diversity were developed for both AF and DF TWRN.The optimal beamforming at the multiantenna relay that maximizes the capacity of AFbased TWRN was designed in [9], and suboptimal resource allocation for orthogonal frequency division multiplexing-(OFDM-) based TWRN was derived in [10].
Most TWRN works [4][5][6][7][8][9][10] assume perfect synchronization and channel state information (CSI) at the relay node and/or the source terminals.However, since these assumptions are not realistic, for the first time, [11,12] designed the joint carrier frequency offset (CFO) and channel estimation algorithms for an OFDM-based TWRN.As the first effort to tackle this problem, [11,12] only proposed to estimate the cascaded channels and cascaded CFO of the uplink and the downlink phases, but not the individual frequency and channel parameters.However, individual parameters are also important for TWRN systems.For example, they are required for the optimal beamforming and the carrier permutations [9,10].Moreover, if detailed information about CFOs is available at both source nodes, the two nodes can cooperate to eliminate CFOs, which may yield a better detection performance (this fact will be seen from the special case in Section 3 and from the corresponding simulation results Figures 4 and 5).
In this paper, we introduce the superimposed pilots at the relay to enable the estimation of all the individual channel and CFO parameters.The idea of superimposed pilots was first proposed in [13] for analog communication systems and was later extended to digital communication systems for both synchronization and channel estimation [14][15][16].Different from the traditional concept that the pilots are superimposed on the data symbols, we propose that the relay node superimposes its own pilots over the pilots received from the two terminals.Depending on the number of pilots, three different estimators are then developed for the initial parameter estimation.An iterative estimator is also proposed to improve the estimation accuracy.Moreover, the Cramér-Rao Bound (CRB) is derived and compared with the estimation mean square error (MSE).We find that, when the CFO between the two source terminals is small, the MSE approaches CRB in the high SNR region, indicating the optimality of the third proposed estimator.However, for more general cases, the MSE deviates from the CRB, which indicates the room for improving the estimation accuracy.
This forms an open problem for the future research.
The remainder of this paper is organized as follows.In Section 2, the two-way communication systems and our joint CFO and channel estimation model are introduced.In Section 3, three estimators are proposed for initial CFO and channel estimation.The estimators vary depending on the number of pilots.An iterative estimator is also designed to improve the estimation accuracy.The CRBs of all parameters are derived in Section 4. Simulation results are presented in Section 5, and conclusions are made in Section 6.
1.1.Notations.Vectors and matrices are boldface small and capital letters; the transpose, complex conjugate, Hermitian, inverse, and pseudoinverse of the matrix A are denoted by A T , A * , A H , A −1 , and A † , respectively; tr(A) and A F are the trace and the Frobenius norm of A; [A] i j is the (i, j)th entry of A, and diag{a} denotes a diagonal matrix with the diagonal element constructed from a; ⊗ represents the linear convolution between two vectors.MATLAB notations for rows and columns of a matrix are adopted; A [:,i: j] denotes the ith column to the jth columns of A, and A H [:,i: j] denotes the ith column to the jth columns of A H ; the entry index of vector and matrix starts from 0; I P is the P × P identity matrix for any positive integer P; 0 n×m represents the n × m matrix with all zero entries; E{•} is the average; j is the imaginary unit √ −1 and e i is the basis vector of all-zeros except for the ith element of 1.

Problem Formulation
2.1.System Model.Consider a classical two-way relay network (TWRN) with two terminal nodes T j , j = 1, 2 and one relay node R (Figure 1).Each node has only one half-duplex antenna.The baseband channel from T j to R is denoted as h j = [h j,0 , . . ., h j,L ] T , whose elements are independent with zero means and variances σ 2 j,l .The channel from R to T j is also h j .(This is true for reciprocal channels.There may be a phase difference but is nevertheless ignored in this paper.) The training block length is set as N, which may or may not be the same as the data block length.The average powers of T j and R are denoted as P j and P r , respectively.Furthermore, we denote the carrier frequency of T j as f j and that of R as f r .

Superimposed pilots p
Figure 1: System configuration for two-way relay network.
In real applications, Doppler shifts and oscillator instabilities may result in CFOs such as f 2 − f 1 and f r − f j .The target of this work is to separately estimate CFOs and channels h 1 and h 2 .It is important to achieve this goal within two transmission phases such that the training is compatible with the two-phase data transmission and can be embedded in the data frame.To do so, we modify the OFDM transmission scheme and introduce superimposed training at the relay node.In the following discussion, perfect time synchronization is assumed.

CP-Based OFDM Modulation at Terminals. Denote one OFDM block from
T .The corresponding time-domain signal block is obtained from the normalized inverse discrete Fourier transformation (IDFT) as where F is the N × N normalized DFT matrix with the (p, q)th entry given by (1/ √ N)e − j2π(p−1)(q−1)/N .To maintain the subcarrier orthogonality during the overall transmission, we propose to add the cyclic prefix of length 2L as did in [11].This implicitly requires N ≥ 2L which is nevertheless satisfied by most OFDM systems.
Define the matrix for any P ≤ N. The baseband signal from T i including the CP can now be mathematically expressed as T (2L) cp s i of length N + 2L.The signal is up-converted to the passband signal by the carrier e j2π fit .Note that the oscillator may have an initial phase but it is omitted for brevity since the constant phase can be absorbed into the channel effects.

Relay Processing.
With the assumption of the perfect time synchronization, the signals from T 1 and T 2 arrive at R simultaneously.Relay then down-converts the received signal by the carrier e − j2π fr t .It is important to mention that R removes only the first L symbols in each block.
Denote T s as the symbol duration and define for x = [x 0 , x 1 , . . ., x P ] T .The resultant baseband signal block at R is of length N + L and can be expressed as where ] T , and we use the property that Moreover, each element in the noise vector n r is assumed to be independent and identically distributed (i.i.d) zero-mean complex white Gaussian with variance σ 2 n .The estimation of the CFO and channels h 1 and h 2 at R can be easily done as in the 2 × 1 multi-input single-output (MISO) system [17,18].
The relay then superimposes a time-domain training signal p over r and obtain where the scaling factor α and the superimposed training should satisfy the following power constraint: Note that ( 7) is a constraint on the average power because the instant channel values are unknown before the estimation.
It can be easily shown that the range of α is (0, P r /( 2 j=1 L l=0 σ 2 j,l P j + σ 2 n )) that balances the power between the training from T j and the superimposed training from R. Note that the training signal p is generated from N training signals p 0 and a cyclic prefix of length L. By using our definition (2), the training signal can be expressed as p = T (L)  cp p 0 .The pilots p 0 , s 1 , and s 2 are predesigned at both source terminals for channel estimation.
Finally, R up-converts the resultant signal t to passband by the carrier e j2π fr t .

Signal Reformulation at
Terminals.Due to symmetry, we only illustrate the process at T 1 .After down-converting the passband signal by e − j2π f1t , T 1 obtains the baseband block of length N + L. It then removes the first L elements and the remaining signal is written as where the property is used and, without loss of generality, the noise vector n 1 is assumed to have the same statistics as n r .The equivalent Gaussian noise n e has the covariance Define Ω (K) [ f ] = diag{e j2π f (K−1)Ts , . . ., e j2π f Ts , 1}.The following properties hold where denotes the N-point circular convolution but reduces to linear convolution if N is greater than or equal to the length of x 1 x 2 .
Assuming N > 2L and using the above properties, we can further simplify y as Define EURASIP Journal on Wireless Communications and Networking Then ( 12) can be expressed as where S j is the N × (2L + 1) circulant matrix with the first column s i , and P is the N × (L + 1) circulant matrix with the first column p 0 .

Joint Estimation Algorithms
Based on the new signal model ( 14), the task is to estimate the individual channels h 1 , and h 2 and the CFOs v and w.We omit all redundant superscripts and subscripts for notation simplicity and rewrite ( 14) as Note that the number of parameters to be estimated is 2L +4.Furthermore, a is a function of w and h 1 , and b is a function of v, w, h 1 and h 2 .Depending on the number of pilots, we can thus develop three different estimation methods.
where C and d are defined in (16).From the least-squares (LS) method, the two CFO estimates are where v and w can be obtained either from a twodimensional search or from the alternating projection that converts the 2-dimensional maximization into a series of 1D maximization problems.Details on the implementation of the alternating projection method can be found from [19].
Then values of d are estimated as where the value of C is obtained by using the estimates v and w.
We next explore the relationships among a, b, and h 1 to improve the quality of the estimates.From (13), we note that where By subtracting the estimate of the second item in (15) from y and by using (19), an improved estimate of h 1 is obtained as Similarly, from (13) we find that Thus h 2 can be estimated as zp In summary, ( 17), ( 20) and ( 22) provide estimates of all the parameters.These initial estimates can be further improved by the iterative estimator developed in Section 3.4.

Estimation with Not-So-Large N.
In order to reduce the overhead, we will use fewer pilots than before.Define K 1 , K 2 , and K r as the frequency domain pilot index sets from T 1 , T 2 , and R, with cardinality K 1 , K 2 , and K r , respectively.We require Here, we do not assume the disjoint sets, and so 15), we know that the frequency domain pilots are s j = Fs j , and p 0 = Fp 0 .
Remark 1.As will be seen later that pilots in K r are used to estimate h 1 , and those in K 2 are used to estimate h 2 at T 1 .Due to symmetry, pilots in K 1 are used to estimate h 1 at T 2 .This gives the above requirements on cardinality.
Let us collect nonzero pilots from T j into a K j × 1 vector sj and nonzero pilots from R into a K r × 1 vector p0 .Since S j and P are column-wise circulant matrices, they can be represented as Define K 1 as the complement set of K 1 .Multiplying both sides of (15) with F [K1,:] yields where b = F [K2,1:2L+1] b is the DFT response of b on the subcarrier set K 2 , and C 1 is an (N −K 1 )×(K 2 +L+1) matrix.
As long as N − K 1 − K 2 − (L + 1) ≥ 2, namely, when there is sufficient degree of freedom to estimate the two unknown CFOs, they can be estimated as Considering the range of K j and K r , the minimum number of N is 3L + 5, when sets are disjoint, and Then, h 2 can be estimated as Remark 2. Though the number of unknown parameters is 2L + 3, we should allow certain redundancy to provide low complexity estimation, for example, linear estimation.In our work, the minimum training length is set as N = 3L + 5.

Joint Estimation with Minimum Training
Length: A Special Case.In practical applications, the relay terminal is often a simple device while the two source terminals may employ high-precision synchronization circuits.Thus, it is reasonable to expect the CFO between the two source terminals to be negligible.In this case, v ≈ 0 or v 1/N, that is, one subcarrier spacing.This is also true at the CFO tracking stage when the frequency difference between two terminals is quite small.By taking advantage of the negligible CFO between the two source terminals, parameter estimation is achieved with the minimum training length N = 2L + 3, that is, the same number of the unknowns variables.
Let us choose the same frequency pilot sets for T 1 and T 2 , that is, K 1 = K 2 .Left multiplying the received signal y by F [K1,:] gives where C 2 is an (N − K 1 ) × (L + 1) matrix, and the first term is negligible because v ≈ 0. As long as (N − K 1 − L − 1) ≥ 1, the CFO between the relay and the first source terminal can be obtained as Since a can be estimated from w and h 1 , then h 2 can be estimated as where . The estimates given by ( 30) and (31) need not be improved by iterations because they already achieve CRB under high SNR conditions.

Iterative Algorithm to Improve the Performance.
With the initial estimates of all the parameters, an iterative approach can be applied to improve the estimation accuracy.Re-denote the initial estimate as v {0} , w {0} , a {0} , b {0} , h {0}  1 , h {0} 2 , respectively, with the superscript representing the number of the iterations.We will estimate v {1} , w {1} simultaneously from the ML estimation process as where R −1 n is always obtained by using the newest estimates of w and h 1 in (10) expressed as The complexity here is not significant even if the 2dimensional search is applied, since the search region for fine estimation is around the initial estimation and is thus very small.Then we can obtain h {1} 2 and h {1} 1 from where , and . The interactive processing could gain the improvement from the fact that the initial estimation does not fully exploit the correlation between a, b, and h 1 .
Remark 3. The superimposed training used in our paper is different from the traditional ones in two aspects.First, traditional superimposed pilots [14] are added on the data symbols while our superimposed pilots are added on the EURASIP Journal on Wireless Communications and Networking pilots from the source by the relay.On the other hand, traditional superimposed pilots [15] select the first order statistics for channel estimation while we use the nullingbased LS method for joint CFO and channel estimation.

Cramér-Rao Bound
In this section, we derive the CRB that defines the theoretical bound of the estimation accuracy.The CRB is an important tool to study the performance of the estimation algorithms. Define According to [20], the (i, j)th entry of the Fisher Information Matrix (FIM) can be calculated as After some tedious simplifications, we derive where and e i is an (L + 1) × 1 vector whose ith element equals 1 and others 0.
The CRB of η is then obtained by inverting the FIM F.
the CRB of v, w, h 1 , and h 2 can be expressed as

Simulation Results
The performance of the proposed three estimation algorithms along with the iterative estimator is investigated.A four-tap model for both h 1 and h 2 is assumed, and each tap is assumed complex Gaussian with unit variance as did in [21].The variance of the noise is taken as σ 2 n = 1.The normalized frequencies f 1 , f r , and f 2 are set as 0.94, 1, and 1.06, respectively.The MSE is chosen as the figure of merit, defined by where x represents h 1 or h 2 , and 10000 is the number of the Monte-Carlo trials used for average.In all the following simulations, α is set as half the maximum value, that is, α = 0.5 P r /( 2 j=1 L l=0 σ 2 j,l P j + σ 2 n ).

Sufficiently Large N.
In this case, we chose N = 24, which is greater than 5L + 5 = 20.The received signal y at T 1 is generated according to (9).Initial CFO and channel estimates are obtained from y through (17) and (18).The estimate h 1 of h 1 is updated as (20).Finally, the iterative estimator in Section 3.4 is applied and is found to converge in three iterations.The MSEs and CRBs of CFO estimation as a function of SNR are shown in Figure 2. The iterative algorithm improves the estimation accuracy significantly.Specifically,   the improvement of w, which is the CFO between R and T 1 , is much more significant than that of v, the CFO between T 2 and T 1 .The reason is due to the fact that the received signals contain more information about w than that about v.In (14), all components of y contain information about w while only the second term of y does so of v.
The channel estimation MSEs and CRBs versus SNR are shown in Figure 3.It is observed that the gaps between the MSE and the CRB are smaller compared to those in the CFO estimation.The reason is that phase errors have less effect on the channel estimation than on the CFO estimation.Similarly to CFO estimation, iteration improves the estimation accuracy, and h 1 improves more than h 2 since most components of y contain the information of h 1 .

Not-So-Large N.
Next we choose N = 3L + 5 = 14, and K 1 = 4, K 2 = 4, K r = 6.The initial CFO and channel estimates are obtained from (25), (26), and (28).The estimates are iteratively updated.We find that ten iterations reach convergence.The MSEs and CRBs versus SNRs for both CFO and channel estimation are displayed in Figures 4 and 5, respectively.
Since smaller training length is applied and the average symbol power is kept the same, the performance here is a little worse than that in Figures 2 and 3.It is observed that in the high SNR region, the MSEs approach CRBs after ten iterations.The iterative estimator improves the estimation of h 1 for all SNRs while, for h 2 , it only works at the high SNR region.Conversely, the iterative estimator degrades the estimation of h 2 at low SNRs.A possible reason is as follows.Since only the second item of y in (14) contains h 2 , and iterations require reconstruction of a from the initial     estimate w and h 1 , the ambiguity of h 2 increased at the low SNR region from errors of all factors.However, in the case of large training length, for example, N = 24, a is directly estimated, which avoids the error propagation from erroneous w and h 1 .Therefore similar phenomenon is not observed in Figure 3.   From Figure 2 to Figure 5, one key conclusion is that there is room for new algorithms to improve the performance, due to the gap between MSEs and CRBs.This open question is an interesting topic for the future research.

Minimum N:
A Special Case.In the last example, we set f 2 as 0.95 and 0.9401, such that the CFOs between the two terminals are v = 0.01 and v = 0.0001, respectively.The minimum training length is chosen as N = 2L + 3 = 9.The CFO and channel estimation results can be obtained from (30) and (31).As mentioned before, these cases do not require iterations because v is nearly zero.The estimation MSEs of w and channels, as well as their corresponding CRBs are shown in Figures 6 and 7, respectively.
These figures show that the estimation accuracy is quite close to CRB.The reason is that the interference due to the pilots is negligible in this case as the CFO between the two source terminals is negligible.For v = 0.01, a relatively larger value, there exists an error floor for both CFO and channel estimation at the high SNR region.When v is as small as 0.0001, the best estimation performance can be achieved since the MSE attaches the CRB.

Conclusions
In this paper, superimposed pilot-based CFO and channel estimation was investigated for CP-OFDM modulated TWRN.Three direct estimation algorithms as well as the iteration algorithm to improve the performance were developed.We also derived the analytical CRBs as the benchmark for the designed algorithms.With superimposed training, all the individual parameters can be estimated at all three nodes.From the simulations, it is found that although the iterative estimator improves the performance, gaps remain between the MSE and CRB, indicating room for further improving the performance.

3. 1 .
Estimation for Sufficiently Large N. When N ≥ 5L + 5, there are sufficient degrees of freedom in the training signals, and v, w, a, b and h 1 can be simply treated as individual variables.That is, the above-mentioned relationships among the variables are ignored.Rewrite y as y = [αS 1 αΓ[v]S 2 Γ[w]P]