Exploiting a priori information for iterative channel estimation in block-fading amplify-and-forward cooperative networks

In an amplify-and-forward cooperative network, a closed-form expression of the a priori distribution of the complex-valued gain of the global relay channel is intractable, so that a priori information is often not exploited for estimating this gain. Here, we present two iterative channel gain and noise variance estimation algorithms that make use of a priori channel information and exploit the presence of not only pilot symbols but also unknown data symbols. These algorithms are approximations of maximum a posteriori estimation and linear minimum mean-square error estimation, respectively. A substantially reduced frame error rate is achieved as compared to the case where only pilot symbols are used in the estimation.


Introduction
As wireless channels suffer from multipath propagation, several methods to combat the detrimental effect of fading have been proposed [1]. Cooperative communication [2] is a relatively new method in which spatial diversity is achieved by exploiting the presence of other terminals in the network. The source time-shares its allocated time frame with other terminals that act as relays. During the first slot of the frame, the source broadcasts information to the destination and the relays; the remaining slots are used by the relays to transmit to the destination information that is related to the message sent by the source. In this paper, we consider the amplify-and-forward protocol [3]; hence, each relay simply amplifies and retransmits the signal received from the source.
In real-world situations, the channels between the different terminals are unknown and must be estimated before detection at the destination can start. The overall noise in the signal received from the relay has a variance that depends on the realization of the relay-destination channel, and the overall channel gain is the product of the source-relay and relay-destination channel gains. It has been proposed (e.g., in the linear minimum mean-square error (LMMSE) cascaded channel estimation from [4,5]) that the destination estimate the overall channel gain but set the overall noise variance equal to the variance obtained by averaging over the statistics of the relay-destination channel, whereas in [6] the relay-destination channel gain is estimated separately (and the noise variance computed accordingly) at the expense of a more sophisticated relay that adds pilot symbols of its own. The LMMSE disintegrated estimation from [5] involves the estimation of the source-relay channel at the relay (which significantly increases relay complexity) and of the relay-destination channel at the destination, whereas [7] considers maximum-likelihood (ML) estimation of both these channels at the destination. In [8], ML estimation of the overall channel gain and noise variance is performed at the destination; these results are extended in [9] to the case of a multi-antenna destination.

*Correspondence: nico.aerts@telin.ugent.be TELIN, UGent, Gent 9000, Belgium
In this contribution, we present two pilot-based and two space-alternating generalized expectation-maximization (SAGE) [10] algorithms for estimating at the destination both the overall channel gain and (unlike the cascaded channel estimation from [5]) the overall noise variance. In contrast with [6-9], the proposed algorithms also take a priori channel knowledge into account. We restrict our attention to a low-complexity amplify-and-forward relay, which neither adds pilot symbols of its own nor performs channel estimation. We derive in closed form the joint a priori distribution of the parameters to be estimated and use this distribution to derive an approximate maximum a posteriori (MAP) estimate and an approximation of the LMMSE estimate. We investigate the mean-square estimation error (MSEE) and frame error rate (FER) performance resulting from these estimates and compare it with the performance of the joint ML channel gain and noise variance estimates from [8].

http://jwcn.eurasipjournals.com/content/2014/1/222

Notations
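All vectors are row vectors and boldface; the Hermitian transpose, statistical expectation, and estimate of the row vector $\mathbf{x}$ are denoted by $\mathbf{x}^H$, $E[\mathbf{x}]$, and $\hat{\mathbf{x}}$, respectively, and $|\mathbf{x}|^2 = \mathbf{x}\mathbf{x}^H$ denotes its squared norm.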

System description
We consider a system consisting of a source S, a destination D, and a relay R. During a given frame, the channels between the terminals are characterized by a noise vector $\mathbf{n}_m$ with variance $N_m$ per component and a constant channel gain $h_m$ with variance $H_m$, where $m \in \{SD, SR, RD\}$. Both $h_m$ and the elements of $\mathbf{n}_m$ are zero-mean circular symmetric complex Gaussian (ZMCSCG) distributed, and SD, SR, and RD refer to the source-destination, source-relay, and relay-destination paths, respectively.
During the first time slot, the source transmits a sequence of $K = K_p + K_d$ complex symbols $\mathbf{c}$, consisting of $K_p$ pilot symbols $\mathbf{c}_p$ and $K_d$ data symbols $\mathbf{c}_d$; the latter are obtained by coding and mapping a packet of information bits. We assume $E[|\mathbf{c}|^2] = K E_s$. Throughout the paper, we use the notation $\mathbf{x}_p$ and $\mathbf{x}_d$ to indicate the parts of any vector $\mathbf{x}$ that correspond to the pilot and data symbols, respectively. The signals received by the relay and the destination during the first slot are

$$\mathbf{r}_{SR} = h_{SR}\mathbf{c} + \mathbf{n}_{SR}, \qquad \mathbf{r}_{SD} = h_{SD}\mathbf{c} + \mathbf{n}_{SD}. \tag{1}$$

The relay amplifies $\mathbf{r}_{SR}$ using a fixed gain $\gamma$; hence, the signal received by the destination during the second time slot is

$$\mathbf{r}_{RD} = \gamma h_{RD}\mathbf{r}_{SR} + \mathbf{n}_{RD} = \gamma h_{SRD}\mathbf{c} + \mathbf{n}_{SRD}, \tag{2}$$

where $h_{SRD} = h_{SR}h_{RD}$ and $\mathbf{n}_{SRD} = \gamma h_{RD}\mathbf{n}_{SR} + \mathbf{n}_{RD}$; when conditioned on $h_{RD}$, the elements of $\mathbf{n}_{SRD}$ are ZMCSCG distributed with variance

$$N_{SRD} = \gamma^2|h_{RD}|^2 N_{SR} + N_{RD}. \tag{3}$$

If the power constraint at the relay imposes the average energy per transmitted frame $E[|\gamma\mathbf{r}_{SR}|^2]$ to be $K E_r$, with $E_r$ denoting the energy per symbol transmitted by the relay, the fixed gain follows from the relation $(H_{SR}E_s + N_{SR})\gamma^2 = E_r$. Taking into account the energies spent by both the source and the relay, the received energy per information bit at the destination, $E_b$, is obtained as in (4), with $r_c$ and $M$ denoting the code rate and the constellation size, respectively. The spectral efficiency, expressed as the ratio between the information bit rate $R_b$ and the symbol rate $R_s$, is given by

$$\frac{R_b}{R_s} = \frac{1}{2}\,\frac{K_d}{K}\,r_c\log_2 M, \tag{5}$$

where the factor 1/2 accounts for the fact that two time slots are used. If the destination knows the channel gains and noise variances, maximum ratio combining [11] of the received signals yields the sufficient statistic $\mathbf{z}$ to be used by the decoder:

$$\mathbf{z} = \frac{h_{SD}^*}{N_{SD}}\,\mathbf{r}_{SD} + \frac{\gamma h_{SRD}^*}{N_{SRD}}\,\mathbf{r}_{RD}. \tag{6}$$

The noise variances $N_m$ ($m \in \{SD, SR, RD\}$) are long-term properties of the channels, and we consider them to be known. The three components of $\boldsymbol{\nu} = (h_{SD}, h_{SRD}, N_{SRD})$ depend on the channel gain realizations during the considered frame; therefore, the destination must estimate $\boldsymbol{\nu}$ and calculate the sufficient statistic (6) with $\boldsymbol{\nu}$ replaced by its estimate.
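The signal model (1)-(3) and the combiner (6) can be illustrated numerically. The following is a minimal sketch assuming numpy and unit-energy symbols; the helper names `crandn`, `simulate_frame`, and `mrc` are hypothetical and not part of the paper. With vanishing noise, dividing the combiner output by the total path gain $|h_{SD}|^2/N_{SD} + \gamma^2|h_{SRD}|^2/N_{SRD}$ should recover the transmitted symbols.

```python
import numpy as np

def crandn(rng, var, size=None):
    """Zero-mean circular symmetric complex Gaussian samples with variance `var`."""
    return np.sqrt(var / 2) * (rng.standard_normal(size) + 1j * rng.standard_normal(size))

def simulate_frame(c, Es, Er, H, N, rng):
    """Draw one block-fading frame of the S-R-D model; H and N are dicts keyed by 'SD', 'SR', 'RD'."""
    K = c.size
    h_SD, h_SR, h_RD = (crandn(rng, H[m]) for m in ("SD", "SR", "RD"))
    gamma = np.sqrt(Er / (H["SR"] * Es + N["SR"]))        # fixed relay gain from the power constraint
    r_SD = h_SD * c + crandn(rng, N["SD"], K)              # (1), direct link
    r_SR = h_SR * c + crandn(rng, N["SR"], K)              # (1), source-relay link
    r_RD = gamma * h_RD * r_SR + crandn(rng, N["RD"], K)   # (2), amplified and forwarded
    h_SRD = h_SR * h_RD
    N_SRD = gamma**2 * abs(h_RD)**2 * N["SR"] + N["RD"]    # (3), variance given h_RD
    return r_SD, r_RD, gamma, h_SD, h_SRD, N_SRD

def mrc(r_SD, r_RD, gamma, h_SD, h_SRD, N_SD, N_SRD):
    """Maximum ratio combining of the direct and relayed signals, as in (6)."""
    return np.conj(h_SD) * r_SD / N_SD + gamma * np.conj(h_SRD) * r_RD / N_SRD
```

In a receiver, the true `h_SD`, `h_SRD`, and `N_SRD` passed to `mrc` would be replaced by their estimates, as stated above.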

Joint a priori distribution
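The SD channel gain $h_{SD}$ is ZMCSCG distributed with variance $H_{SD}$ and is independent of $(h_{SRD}, N_{SRD})$. When conditioned on $h_{RD}$, (a) the variables $h_{SRD}$ and $N_{SRD}$ are statistically independent, (b) $h_{SRD}$ is ZMCSCG distributed with variance $H_{SR}|h_{RD}|^2$, and (c) the probability density function (pdf) of $N_{SRD}$ is a Dirac delta function located at $\gamma^2|h_{RD}|^2N_{SR} + N_{RD}$. Then, $p(h_{SRD}, N_{SRD})$ is obtained by averaging $p(h_{SRD}|h_{RD})\,p(N_{SRD}|h_{RD})$ over $h_{RD}$, which is ZMCSCG distributed with variance $H_{RD}$. As outlined in Appendix 1, this yields

$$p(h_{SRD}, N_{SRD}) = \begin{cases} \dfrac{C}{y}\exp\left(-Ay - \dfrac{B|h_{SRD}|^2}{y}\right), & y > 0,\\[1ex] 0, & y \le 0, \end{cases} \tag{7}$$

where $A = (\gamma^2N_{SR}H_{RD})^{-1}$, $B = \gamma^2N_{SR}/H_{SR}$, $C = (\pi H_{SR}H_{RD})^{-1}$, and $y$ is a short-hand notation for $N_{SRD} - N_{RD}$.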
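As a sanity check on (7), integrating it over the complex plane of $h_{SRD}$ should return the exponential density $A\exp(-Ay)$ of $x$ evaluated at $x = y$, which is the step carried out in Appendix 1. The sketch below, a minimal numpy illustration with hypothetical function names, evaluates the logarithm of (7) and performs that integration numerically in polar coordinates.

```python
import numpy as np

def log_prior_srd(h_SRD, N_SRD, gamma, H_SR, H_RD, N_SR, N_RD):
    """Log of the joint a priori density (7); -inf outside the support N_SRD > N_RD."""
    A = 1.0 / (gamma**2 * N_SR * H_RD)
    B = gamma**2 * N_SR / H_SR
    C = 1.0 / (np.pi * H_SR * H_RD)
    y = N_SRD - N_RD
    if y <= 0:
        return -np.inf
    return np.log(C / y) - A * y - B * np.abs(h_SRD)**2 / y

def h_marginal_numeric(N_SRD, gamma, H_SR, H_RD, N_SR, N_RD, n=4001):
    """Integrate (7) over h_SRD using polar coordinates and the trapezoidal rule."""
    y = N_SRD - N_RD
    B = gamma**2 * N_SR / H_SR
    rho = np.linspace(0.0, 8.0 * np.sqrt(y / B), n)   # |h_SRD| up to 8 conditional std devs
    f = np.exp(log_prior_srd(rho, N_SRD, gamma, H_SR, H_RD, N_SR, N_RD)) * 2 * np.pi * rho
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(rho))
```

Since $h_{SRD}$ given $y$ is ZMCSCG with variance $y/B$, the grid is scaled by that standard deviation; the remaining tail beyond eight standard deviations is negligible.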

Estimation
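In order to obtain the MAP estimate of $\boldsymbol{\nu}$, which is defined by

$$\hat{\boldsymbol{\nu}}_{MAP} = \arg\max_{\boldsymbol{\nu}} \ln p(\mathbf{r}, \boldsymbol{\nu}), \tag{8}$$

with $\mathbf{r} = (\mathbf{r}_{SD}, \mathbf{r}_{RD})$ and $p(\mathbf{r}, \boldsymbol{\nu}) = p(\mathbf{r}|\boldsymbol{\nu})p(\boldsymbol{\nu})$, one has to compute $p(\mathbf{r}|\boldsymbol{\nu})$ by averaging over the unknown data symbols, i.e.,

$$p(\mathbf{r}|\boldsymbol{\nu}) = \sum_{\tilde{\mathbf{c}}} p(\mathbf{r}|\tilde{\mathbf{c}}, \boldsymbol{\nu})\Pr[\tilde{\mathbf{c}}], \tag{9}$$

with $\Pr[\tilde{\mathbf{c}}]$ the a priori probability of $\tilde{\mathbf{c}}$. As the summation in (9) runs over all possible codewords, the computation of $p(\mathbf{r}|\boldsymbol{\nu})$ is not feasible, because its complexity increases exponentially with the data sequence length. The expectation-maximization (EM) algorithm [12], which is tailored to deal with a nuisance parameter such as $\mathbf{c}$, produces a sequence of estimates

$$\hat{\boldsymbol{\nu}}^{(i)} = \arg\max_{\boldsymbol{\nu}}\left\{E\left[\ln p(\mathbf{r}|\mathbf{c}, \boldsymbol{\nu}) \,\middle|\, \mathbf{r}, \hat{\boldsymbol{\nu}}^{(i-1)}\right] + \ln p(\boldsymbol{\nu})\right\} \tag{10}$$

that, when properly initialized, converges to the MAP estimate (8); the conditional expectation in (10) is with respect to the nuisance parameter $\mathbf{c}$. We will use the SAGE algorithm [10] instead of the EM algorithm in order to avoid the complexity of the multidimensional maximization associated with (10). The SAGE algorithm replaces the multidimensional maximization in (10) by several lower-dimensional maximizations over mutually exclusive subsets of $\boldsymbol{\nu}$. It needs an initial estimate $\hat{\boldsymbol{\nu}}^{(0)}$ to start the iterations; this initial estimate is derived from the pilot symbols. Using (1) and (2), it follows from (10) that $h_{SD}$ and $(h_{SRD}, N_{SRD})$ can be estimated independently:

$$\hat{h}_{SD}^{(i)} = \arg\max_{h_{SD}}\left\{E\left[\ln p(\mathbf{r}_{SD}|\mathbf{c}, h_{SD}) \,\middle|\, \mathbf{r}, \hat{\boldsymbol{\nu}}^{(i-1)}\right] + \ln p(h_{SD})\right\} \tag{11}$$

and

$$\left(\hat{h}_{SRD}^{(i)}, \hat{N}_{SRD}^{(i)}\right) = \arg\max_{(h_{SRD}, N_{SRD})}\left\{E\left[\ln p(\mathbf{r}_{RD}|\mathbf{c}, h_{SRD}, N_{SRD}) \,\middle|\, \mathbf{r}, \hat{\boldsymbol{\nu}}^{(i-1)}\right] + \ln p(h_{SRD}, N_{SRD})\right\}. \tag{12}$$

As the estimation of $h_{SD}$ is well documented [13], we present the estimates of $(h_{SRD}, N_{SRD})$ only.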
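For the relay link, the conditional expectation over $\mathbf{c}$ in (10) enters only through the a posteriori expectations (18) and (19) of $\mathbf{c}$ and $|\mathbf{c}|^2$. The paper obtains the underlying symbol marginals by message passing on the factor graph of the code [14]; the sketch below swaps in a much simpler stand-in that ignores the code and treats the data symbols as independent and equiprobable, so it only shows how $\boldsymbol{\mu}$ and $\sigma$ are formed from marginal posteriors. The function name `symbol_apes` and the convention that the pilots occupy the first $K_p$ positions are assumptions.

```python
import numpy as np

def symbol_apes(r_SD, r_RD, h_SD, h_SRD, N_SD, N_SRD, gamma, const, c_p):
    """Return (mu, sigma): APEs of c and |c|^2 given the current estimates.

    Simplification (not the paper's decoder): data symbols are treated as
    independent and equiprobable, so each marginal is computed per symbol.
    Pilots are assumed to occupy the first len(c_p) positions.
    """
    Kp = len(c_p)
    rs = r_SD[Kp:, None]                    # data part of r_SD, shape (Kd, 1)
    rr = r_RD[Kp:, None]                    # data part of r_RD, shape (Kd, 1)
    a = const[None, :]                      # candidate symbols, shape (1, M)
    metric = (-np.abs(rs - h_SD * a)**2 / N_SD
              - np.abs(rr - gamma * h_SRD * a)**2 / N_SRD)
    metric -= metric.max(axis=1, keepdims=True)      # avoid overflow in exp
    post = np.exp(metric)
    post /= post.sum(axis=1, keepdims=True)           # marginal posteriors per data symbol
    mu = np.concatenate([c_p, post @ const])          # pilots are known exactly
    sigma = np.sum(np.abs(c_p)**2) + np.sum(post @ np.abs(const)**2)
    return mu, sigma
```

At high SNR the posteriors concentrate on the transmitted symbols and $\boldsymbol{\mu}^{(i)} \to \mathbf{c}$; at low SNR they spread out, which is the mechanism behind the MSEE gap to the MCRB reported in the numerical results.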

Iterative approximate MAP estimation
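Let us first derive a pilot-based initial MAP estimate $(\hat{h}_{SRD}^{(0)}, \hat{N}_{SRD}^{(0)})$ from $\mathbf{r}_{RD,p}$. As shown in Appendix 2, the joint maximization of $\ln p(\mathbf{r}_{RD,p}, h_{SRD}, N_{SRD})$ cannot be carried out in closed form. To circumvent this problem, we first compute from $\mathbf{r}_{RD,p}$ the ML estimate of $h_{SRD}$, which is given by [8]

$$\hat{h}_{SRD,ML}^{(0)} = \frac{\mathbf{r}_{RD,p}\mathbf{c}_p^H}{\gamma|\mathbf{c}_p|^2}. \tag{13}$$

Next, exploiting (7), a numerical search algorithm is used to obtain an initial estimate of $N_{SRD}$ according to

$$\hat{N}_{SRD}^{(0)} = \arg\max_{N_{SRD} > N_{RD}} \ln p\left(\mathbf{r}_{RD,p}, \hat{h}_{SRD,ML}^{(0)}, N_{SRD}\right). \tag{14}$$

Finally, we maximize $\ln p(\mathbf{r}_{RD,p}, h_{SRD}, \hat{N}_{SRD}^{(0)})$ over $h_{SRD}$; according to (31) in Appendix 2, this yields the closed-form expression

$$\hat{h}_{SRD}^{(0)} = \frac{\gamma^2|\mathbf{c}_p|^2}{\gamma^2|\mathbf{c}_p|^2 + B\hat{N}_{SRD}^{(0)}/(\hat{N}_{SRD}^{(0)} - N_{RD})}\,\hat{h}_{SRD,ML}^{(0)}, \tag{15}$$

with $B$ as defined in Appendix 1. During the $i$th SAGE iteration, we use (12) to first estimate $h_{SRD}$ with $N_{SRD}$ fixed to $\hat{N}_{SRD}^{(i-1)}$; according to Appendix 3, this yields the closed-form expression

$$\hat{h}_{SRD}^{(i)} = \frac{\gamma^2\sigma^{(i)}}{\gamma^2\sigma^{(i)} + B\hat{N}_{SRD}^{(i-1)}/(\hat{N}_{SRD}^{(i-1)} - N_{RD})}\,\hat{h}_{SRD,ML}^{(i)}, \tag{16}$$

where

$$\hat{h}_{SRD,ML}^{(i)} = \frac{\mathbf{r}_{RD}\left(\boldsymbol{\mu}^{(i)}\right)^H}{\gamma\sigma^{(i)}} \tag{17}$$

is the SAGE-based ML estimate of $h_{SRD}$ [8]. The quantities

$$\boldsymbol{\mu}^{(i)} = E\left[\mathbf{c} \,\middle|\, \mathbf{r}, \hat{\boldsymbol{\nu}}^{(i-1)}\right], \tag{18}$$

$$\sigma^{(i)} = E\left[|\mathbf{c}|^2 \,\middle|\, \mathbf{r}, \hat{\boldsymbol{\nu}}^{(i-1)}\right] \tag{19}$$

are the a posteriori expectations (APEs) of $\mathbf{c}$ and $|\mathbf{c}|^2$; they are easily derived from the marginal a posteriori probabilities of the coded symbols, which are obtainable by message passing on a factor graph [14]. Next, we look for the estimate $\hat{N}_{SRD}^{(i)}$ by using (12) with $h_{SRD}$ fixed to $\hat{h}_{SRD}^{(i)}$. As was the case with the pilot-based estimate, no closed-form expression for $\hat{N}_{SRD}^{(i)}$ can be obtained; hence, a numerical search is used:

$$\hat{N}_{SRD}^{(i)} = \arg\max_{N_{SRD} > N_{RD}}\left\{E\left[\ln p\left(\mathbf{r}_{RD}|\mathbf{c}, \hat{h}_{SRD}^{(i)}, N_{SRD}\right) \,\middle|\, \mathbf{r}, \hat{\boldsymbol{\nu}}^{(i-1)}\right] + \ln p\left(\hat{h}_{SRD}^{(i)}, N_{SRD}\right)\right\}. \tag{20}$$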
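The approximate MAP recursion (13)-(17) and (20) can be sketched as follows, assuming numpy and a plain grid search as the numerical search method (the paper does not say which search algorithm it uses). The function names and the grid are hypothetical; `mu` and `sigma` stand for the APEs (18)-(19), which would come from the decoder's marginal posteriors. Terms that do not depend on $(h_{SRD}, N_{SRD})$, such as $\ln C$ and $-K\ln\pi$, are dropped from the objectives.

```python
import numpy as np

def ml_gain(r, mu, sigma, gamma):
    """ML estimate of h_SRD: (13) with (mu, sigma) = (c_p, |c_p|^2), or (17) with the APEs."""
    return np.vdot(mu, r) / (gamma * sigma)             # vdot(mu, r) = r mu^H

def map_shrink(h_ml, sigma, gamma, B, N, N_RD):
    """Closed-form MAP gain for a fixed N_SRD, as in (15) and (16)."""
    e = gamma**2 * sigma
    return e / (e + B * N / (N - N_RD)) * h_ml

def map_objective(h, N_grid, r, mu, sigma, gamma, A, B, N_RD):
    """Expected log-likelihood plus log prior (7) on a grid of N_SRD > N_RD (constants dropped)."""
    q = (np.sum(np.abs(r)**2) - 2 * gamma * np.real(np.conj(h) * np.vdot(mu, r))
         + gamma**2 * np.abs(h)**2 * sigma)              # E[|r - gamma h c|^2 | r, nu]
    y = N_grid - N_RD
    return -len(r) * np.log(N_grid) - q / N_grid - np.log(y) - A * y - B * np.abs(h)**2 / y

def map_pilot(r_p, c_p, gamma, A, B, N_RD, N_grid):
    """Pilot-based initialization (13)-(15)."""
    s_p = np.sum(np.abs(c_p)**2)
    h_ml = ml_gain(r_p, c_p, s_p, gamma)                                   # (13)
    obj = map_objective(h_ml, N_grid, r_p, c_p, s_p, gamma, A, B, N_RD)
    N0 = N_grid[np.argmax(obj)]                                            # (14), grid search
    return map_shrink(h_ml, s_p, gamma, B, N0, N_RD), N0                   # (15)

def map_sage_step(r_RD, mu, sigma, gamma, A, B, N_prev, N_RD, N_grid):
    """One SAGE iteration: (16)-(17) for the gain, then (20) by grid search."""
    h = map_shrink(ml_gain(r_RD, mu, sigma, gamma), sigma, gamma, B, N_prev, N_RD)
    obj = map_objective(h, N_grid, r_RD, mu, sigma, gamma, A, B, N_RD)
    return h, N_grid[np.argmax(obj)]
```

The gain update is the ML estimate scaled by a factor in (0, 1), so the prior (7) acts as a shrinkage toward zero that vanishes as $\gamma^2\sigma^{(i)}$ grows; only the noise variance step needs the search.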

Iterative approximate LMMSE estimation
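A drawback of the approximate MAP estimation in the previous section is the numerical search required to obtain the noise variance estimate. This can be avoided by considering an LMMSE estimate for $h_{SRD}$, which is obtained independently of the noise variance estimate. If $\mathbf{c}$ were known at the destination, we could take a linear estimate of the form $\hat{h}_{SRD} = \mathbf{r}_{RD}\mathbf{u}^H$ and determine $\mathbf{u}$ such that $E[|\hat{h}_{SRD} - h_{SRD}|^2]$ is minimum, with $E[\cdot]$ denoting the expectation over the a priori statistics of the noise and the fading. As shown in Appendix 4, this yields $\mathbf{u} = \alpha\mathbf{c}/(\gamma|\mathbf{c}|^2)$ with $\alpha = \kappa/(1+\kappa)$, $\kappa = \gamma^2|\mathbf{c}|^2H_{SR}H_{RD}/N_{SRD,avg}$, and $N_{SRD,avg} = \gamma^2H_{RD}N_{SR} + N_{RD}$. In order to (approximately) apply LMMSE estimation in the case where $\mathbf{c}$ is not known to the destination, we simply replace $\mathbf{c}$ and $|\mathbf{c}|^2$ by their a posteriori expectations $\boldsymbol{\mu}^{(i)}$ and $\sigma^{(i)}$, respectively:

$$\hat{h}_{SRD}^{(i)} = \frac{\kappa^{(i)}}{1+\kappa^{(i)}}\,\hat{h}_{SRD,ML}^{(i)}, \qquad \kappa^{(i)} = \frac{\gamma^2\sigma^{(i)}H_{SR}H_{RD}}{N_{SRD,avg}}, \tag{21}$$

with $\hat{h}_{SRD,ML}^{(i)}$ given by (17). This iterative estimate is initialized by means of the pilot-based LMMSE estimate

$$\hat{h}_{SRD}^{(0)} = \frac{\kappa_p}{1+\kappa_p}\,\hat{h}_{SRD,ML}^{(0)}, \qquad \kappa_p = \frac{\gamma^2|\mathbf{c}_p|^2H_{SR}H_{RD}}{N_{SRD,avg}}, \tag{22}$$

with $\hat{h}_{SRD,ML}^{(0)}$ given by (13). These are the estimates (15) and (16) from Section 4.1, but with $\hat{N}_{SRD}^{(0)}$ and $\hat{N}_{SRD}^{(i-1)}$ replaced by $N_{SRD,avg}$.

In order to obtain a closed-form expression for the noise variance estimate, we ignore the a priori distribution (7) of $(h_{SRD}, N_{SRD})$. The pilot-based initial estimate is given by

$$\hat{N}_{SRD}^{(0)} = \max\left(\frac{\left|\mathbf{r}_{RD,p} - \gamma\hat{h}_{SRD,ML}^{(0)}\mathbf{c}_p\right|^2}{K_p - 1},\; N_{RD}\right). \tag{23}$$

Note that the first argument of the max(·) function in (23) is an unbiased estimate of $N_{SRD}$; the (biased) ML estimate of $N_{SRD}$ is obtained when replacing $K_p - 1$ by $K_p$. The restriction that $\hat{N}_{SRD}^{(0)}$ should not be smaller than $N_{RD}$ arises naturally from (3). Similarly, during the SAGE iterations, we ignore in (12) the a priori distribution of $(h_{SRD}, N_{SRD})$ and perform the maximization over $N_{SRD}$ with $h_{SRD}$ fixed to $\hat{h}_{SRD}^{(i)}$, which yields

$$\hat{N}_{SRD}^{(i)} = \frac{1}{K}\left(|\mathbf{r}_{RD}|^2 - 2\gamma\,\mathrm{Re}\left\{\left(\hat{h}_{SRD}^{(i)}\right)^*\mathbf{r}_{RD}\left(\boldsymbol{\mu}^{(i)}\right)^H\right\} + \gamma^2\left|\hat{h}_{SRD}^{(i)}\right|^2\sigma^{(i)}\right); \tag{24}$$

this has the form of the ML estimate from [8].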
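The LMMSE variant (21)-(24) needs no numerical search at all. A minimal numpy sketch with hypothetical function names follows; as in (21), `mu` and `sigma` are the APEs, and passing $(\mathbf{c}_p, |\mathbf{c}_p|^2)$ together with the pilot part of the received signal gives the initialization (22).

```python
import numpy as np

def lmmse_gain(r, mu, sigma, gamma, H_SR, H_RD, N_SRD_avg):
    """(21)/(22): the ML estimate (17)/(13) scaled by alpha = kappa / (1 + kappa)."""
    h_ml = np.vdot(mu, r) / (gamma * sigma)              # r mu^H / (gamma sigma)
    kappa = gamma**2 * sigma * H_SR * H_RD / N_SRD_avg
    return kappa / (1 + kappa) * h_ml

def noise_var_pilot(r_p, c_p, gamma, N_RD):
    """(23): unbiased residual variance after the pilot ML fit, clipped at N_RD."""
    h_ml = np.vdot(c_p, r_p) / (gamma * np.sum(np.abs(c_p)**2))
    resid = np.sum(np.abs(r_p - gamma * h_ml * c_p)**2)
    return max(resid / (len(r_p) - 1), N_RD)

def noise_var_sage(r, mu, sigma, gamma, h):
    """(24): maximizer over N_SRD of the expected log-likelihood with h_SRD = h (prior ignored)."""
    q = (np.sum(np.abs(r)**2) - 2 * gamma * np.real(np.conj(h) * np.vdot(mu, r))
         + gamma**2 * np.abs(h)**2 * sigma)              # E[|r - gamma h c|^2 | r, nu]
    return q / len(r)
```

Because $N_{SRD,avg}$ is a known long-term quantity, the shrinkage factor in `lmmse_gain` does not depend on the current noise variance estimate, which is what decouples the two updates.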

MSEE performance analysis
Here, we present closed-form expressions for the MSEE resulting from some of the considered algorithms. We compare them with the Cramer-Rao bound (CRB) and the modified CRB (MCRB), which are fundamental lower bounds on the MSEE of unbiased estimates. Note that these bounds do not take into account the a priori distribution of the parameter to be estimated. For the derivation of the (M)CRBs in the context of an amplify-and-forward relaying system, we refer to [15]. Let us first consider the pilot-based estimation of $h_{SRD}$, which uses only $\mathbf{r}_p$. The corresponding CRB and MCRB both equal $N_{SRD,avg}/|\gamma\mathbf{c}_p|^2$, which was shown in [8] to coincide with the MSEE of the pilot-based ML estimate (13). A closed-form expression for the MSEE of the pilot-based MAP channel gain estimate (15) is hard to find, because this estimate is coupled with the (numerically obtained) noise variance estimate. In contrast, the MSEE of the pilot-based LMMSE channel gain estimate (22) is easily found, as illustrated in Appendix 4:

$$E\left[\left|\hat{h}_{SRD}^{(0)} - h_{SRD}\right|^2\right] = \frac{\kappa_p}{1+\kappa_p}\,\frac{N_{SRD,avg}}{|\gamma\mathbf{c}_p|^2} = \frac{H_{SR}H_{RD}}{1+\kappa_p}, \tag{25}$$

where $\kappa_p = \gamma^2|\mathbf{c}_p|^2H_{SR}H_{RD}/N_{SRD,avg}$. Note that this MSEE is always below the corresponding (M)CRB, because the (biased) LMMSE channel gain estimate takes a priori information into account. This effect is more pronounced in the low-SNR region, whereas at high SNR the MSEE (25) converges to the (M)CRB.
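The closed form (25) and the (M)CRB $N_{SRD,avg}/|\gamma\mathbf{c}_p|^2$ can be checked by Monte Carlo simulation of the pilot part of (2). The sketch below assumes numpy, $E_r = E_s$, an all-ones pilot sequence with energy $E_s$ per symbol, and illustrative noise levels; none of these choices come from the paper, and the function name `pilot_msee` is hypothetical. It returns the simulated MSEEs of the pilot-based LMMSE (22) and ML (13) estimates next to (25) and the bound.

```python
import numpy as np

def crandn(rng, var, size):
    """ZMCSCG samples with variance `var`."""
    return np.sqrt(var / 2) * (rng.standard_normal(size) + 1j * rng.standard_normal(size))

def pilot_msee(Kp=2, Es=1.0, H_SR=1.0, H_RD=1.0, N_SR=0.5, N_RD=0.5, trials=200_000, seed=0):
    rng = np.random.default_rng(seed)
    gamma = np.sqrt(Es / (H_SR * Es + N_SR))        # relay gain with E_r = E_s
    c_p = np.sqrt(Es) * np.ones(Kp)                 # hypothetical pilot sequence
    e = gamma**2 * np.sum(np.abs(c_p)**2)           # |gamma c_p|^2
    N_avg = gamma**2 * H_RD * N_SR + N_RD
    kappa = e * H_SR * H_RD / N_avg
    h_SR, h_RD = crandn(rng, H_SR, trials), crandn(rng, H_RD, trials)
    n_SR, n_RD = crandn(rng, N_SR, (trials, Kp)), crandn(rng, N_RD, (trials, Kp))
    r_p = gamma * h_RD[:, None] * (h_SR[:, None] * c_p + n_SR) + n_RD   # pilot part of (2)
    h_ml = r_p @ c_p.conj() / (gamma * np.sum(np.abs(c_p)**2))          # (13)
    h_lmmse = kappa / (1 + kappa) * h_ml                                 # (22)
    h = h_SR * h_RD
    return {"lmmse_mc": np.mean(np.abs(h_lmmse - h)**2),
            "ml_mc": np.mean(np.abs(h_ml - h)**2),
            "lmmse_25": H_SR * H_RD / (1 + kappa),
            "crb": N_avg / e}
```

With the default arguments, $\gamma^2 = 2/3$, $\kappa_p = 1.6$, so (25) gives $1/2.6 \approx 0.385$ against a bound of $0.625$; the simulated values should agree with these to within Monte Carlo accuracy.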
When $\mathbf{r}_d$ is also incorporated, it is not possible to obtain expressions for the MSEEs resulting from the considered estimation algorithms nor for the CRB, because of the presence of the unknown data symbols; however, the corresponding MCRB is easily evaluated and is given by $N_{SRD,avg}/E[|\gamma\mathbf{c}|^2]$.

Numerical performance results
In this section, we study the MSEE and FER performance of the presented MAP and LMMSE algorithms by means of Monte Carlo simulations. We use a non-recursive rate-1/2 convolutional code [16] with generator polynomials $(15, 17)_8$ to encode 180 information bits per frame at the source; the resulting 360 coded bits are mapped to 4-QAM (quadrature amplitude modulation) symbols, yielding $K_d = 180$. We select $K_p = 2$, so that $K = 182$.
We take $E_s = E_r$ and assume $N_{SR} = N_{SD} = N_{RD} = N_0$, $H_{SR} = H_{RD} = 1$, and $H_{SD} = 0.5$. Defining the signal-to-noise ratios on the SD, SR, and RD channels as $\mathrm{SNR}_{SD} = E_sH_{SD}/N_{SD}$, $\mathrm{SNR}_{SR} = E_sH_{SR}/N_{SR}$, and $\mathrm{SNR}_{RD} = E_rH_{RD}/N_{RD}$, we obtain $2\,\mathrm{SNR}_{SD} = \mathrm{SNR}_{SR} = \mathrm{SNR}_{RD}$. A comparison is made with both the pilot-based and the SAGE-based ML estimates (derived in [8]), which do not exploit a priori information; as far as the FER performance is concerned, a reference system is also considered, in which the destination knows all channel parameters and no pilot symbols are transmitted. For the MAP, LMMSE, and ML algorithms considered, the SAGE algorithm converges after only two iterations. The MSEE related to estimating $h_{SRD}$ is depicted as a function of $\mathrm{SNR}_{SD}$ in Figure 1. Considering estimation based only on pilot symbols, we observe that the MAP and LMMSE estimates slightly outperform the ML estimate at low SNR, and that with increasing SNR their MSEE converges to the MSEE of the ML estimate (which equals $N_{SRD,avg}/|\gamma\mathbf{c}_p|^2$). As far as the iterative SAGE-based estimates that exploit the entire received signal are concerned, the MAP, LMMSE, and ML estimates yield essentially the same MSEE, which is considerably lower than when only the pilot symbols are exploited; at high SNR, the MSEE converges to the MCRB (which equals $N_{SRD,avg}/E[|\gamma\mathbf{c}|^2]$), whereas at low SNR, the MSEE is considerably larger than the MCRB because the a posteriori expectation $\boldsymbol{\mu}^{(i)}$ deviates significantly from $\mathbf{c}$.
For the same strategies, Figure 2 depicts the corresponding FER as a function of $E_b/N_0$, where $E_b$ denotes the received energy per information bit at the destination (see (4)). When only the pilot symbols are exploited for channel estimation, the MAP approach slightly (by about 0.2 dB) outperforms the LMMSE and ML approaches in terms of FER; compared with the reference system, the FER resulting from MAP estimation is about 2 dB worse. The iterative SAGE-based strategies (MAP, LMMSE, and ML) that exploit the entire received signal yield essentially the same FER, which is only about 0.3 dB worse than that of the reference system. Figure 3 depicts the FER performance of the LMMSE estimates for several iterations. We observe that the performance improvement beyond two iterations becomes negligibly small. We have verified that the FER performance of the ML and MAP estimates has also essentially converged after two iterations (results not displayed to avoid overloading the figure).
The considered algorithms make use of pilot symbols to assist the estimation. However, the transmission of pilot symbols reduces both the power efficiency (part of the transmit energy is devoted to pilot symbols, which carry no information) and the spectral efficiency (see (5)). Figure 4 shows the effect of the number of pilot symbols $K_p$ on the FER for $E_b/N_0 = 8$ dB. The general trend is that (a) for small $K_p$, the FER decreases with increasing $K_p$, because the improvement in estimation quality outweighs the reduction of the power efficiency; and (b) for large $K_p$, the FER increases with increasing $K_p$, because the reduction of the power efficiency outweighs the (slight) improvement in estimation quality. For a given $K_p$, the SAGE-based algorithms outperform the pilot-based algorithms in terms of FER, because the former also exploit the data-bearing part of the received signal during the estimation process. We observe that the FER for the SAGE-based algorithms depends only weakly on $K_p$ in the interval $2 \le K_p \le 10$ and reaches a broad minimum at $K_p = 5$; for the pilot-based algorithms, the dependence of the FER on $K_p$ is stronger, because only the pilot symbols contribute to the estimation.
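The overhead side of this trade-off can be quantified with (5) and with the pilot energy fraction $K_p/K$, which holds when all symbols have the same energy. The short sketch below, with hypothetical helper names, also reproduces the frame arithmetic of the simulation setup: 180 information bits at rate 1/2 on 4-QAM give $K_d = 180$, and the same $K_d$ with 16-QAM carries 360 information bits.

```python
import math

def info_bits(K_d, r_c, M):
    """Information bits carried by K_d coded M-ary symbols at code rate r_c."""
    return K_d * r_c * math.log2(M)

def spectral_efficiency(K_p, K_d, r_c, M):
    """R_b / R_s from (5); the factor 1/2 reflects the two time slots."""
    return 0.5 * K_d / (K_p + K_d) * r_c * math.log2(M)

def pilot_energy_fraction(K_p, K_d):
    """Fraction of the transmit energy spent on pilots when all symbols have energy E_s."""
    return K_p / (K_p + K_d)

for K_p in (2, 5, 10):
    print(K_p, round(spectral_efficiency(K_p, 180, 0.5, 4), 4),
          round(pilot_energy_fraction(K_p, 180), 4))
```

Over the range $2 \le K_p \le 10$ both penalties stay within a few percent, which is consistent with the weak dependence of the SAGE-based FER on $K_p$ observed in Figure 4.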
Finally, we display in Figure 5 the FER performance for a 16-QAM constellation; we still have $K_d = 180$ and $K_p = 2$ and use the same rate-1/2 code as for 4-QAM, implying that each frame now contains 360 information bits. In order to achieve a given FER, 16-QAM requires a larger value of $E_b/N_0$ than 4-QAM, because for a given energy per bit the distance between constellation points is smaller for the former. For pilot-based estimation, the MAP algorithm is slightly better than the LMMSE algorithm in terms of FER and outperforms the ML algorithm by about 0.8 dB; compared with the reference system, the FER resulting from MAP estimation is about 2.2 dB worse. The iterative SAGE-based strategies (MAP, LMMSE, and ML) that exploit the entire received signal yield essentially the same FER, which is about 1.2 dB worse than that of the reference system.

Complexity considerations
Let us consider the computational complexity of the receiver as the sum of the complexities of the estimation algorithm and the decoding algorithm. As we have considered convolutional encoding, the decoder makes use of the Viterbi algorithm, which efficiently determines the shortest path in a trellis [16]. The complexity of the Viterbi algorithm is proportional to the number $K_d$ of data symbols and to the number of trellis states.
In the case of pilot-based estimation, the complexity of the considered estimation algorithms is roughly proportional to the number $K_p$ of pilot symbols. This complexity is negligible compared with that of the Viterbi decoder because $K_p \ll K_d$. Hence, for pilot-based estimation, the receiver complexity is dominated by that of the Viterbi decoder.
In the case of SAGE-based estimation, at each iteration, the a posteriori probabilities of all K d data symbols are needed in order to obtain the a posteriori expectations (18) and (19). For a convolutional code, these probabilities are efficiently computed by means of the Bahl, Cocke, Jelinek, and Raviv (BCJR) algorithm [16], whose computational complexity is about three times as large as that of the Viterbi algorithm. The complexity of the considered estimation algorithms is dominated by that of the BCJR algorithm. Hence, when I iterations are executed, the receiver complexity is about 3I + 1 times the complexity of the Viterbi algorithm.
In [17], an approximate computation of the a posteriori symbol probabilities based on the Viterbi decoder metrics has been presented, with a complexity similar to that of the Viterbi algorithm. When using the algorithm from [17] during I SAGE iterations, the receiver complexity is about I + 1 times the complexity of the Viterbi algorithm.
In summary, the performance improvement resulting from the application of SAGE-based estimation comes at the cost of an increase in receiver complexity compared with the use of pilot-based estimation only. This increase is $(3I+1)$-fold for the BCJR algorithm and $(I+1)$-fold for the algorithm from [17]; in the case at hand, we have $I = 2$, which corresponds to a complexity increase by a factor of 7 and 3, respectively.
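The bookkeeping above reduces to a one-line formula with one Viterbi decoding pass as the unit of cost. In this hypothetical helper, the per-iteration cost of the APE computation is about 3 units for the BCJR algorithm and about 1 unit for the approximation from [17], and the final decoding pass adds one more unit.

```python
def relative_receiver_complexity(iterations, ape_cost):
    """Receiver cost in units of one Viterbi decoding pass.

    ape_cost: approximate cost of one APE computation relative to Viterbi
    (about 3 for BCJR, about 1 for the Viterbi-metric approximation of [17]).
    """
    return ape_cost * iterations + 1    # I APE passes plus the final Viterbi decoding

for name, cost in (("BCJR", 3), ("approximation [17]", 1)):
    print(name, relative_receiver_complexity(2, cost))
```

For $I = 2$ this prints 7 for BCJR and 3 for the approximation, the factors quoted above; setting `iterations=0` recovers the pilot-only receiver with unit cost.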

Conclusions
In the context of the cooperative amplify-and-forward protocol, we proposed approximations of MAP and LMMSE channel estimation that exploit the a priori distribution of the channel parameters, and compared these algorithms with ML channel estimation, which does not make use of a priori information. When using pilot symbols only, both the MAP and LMMSE algorithms outperform the ML algorithm in terms of FER; the MAP algorithm yields the best FER performance, providing a gain of about 0.2 dB (0.8 dB) over the ML algorithm for a 4-QAM (16-QAM) constellation with frame parameters $K_p = 2$ and $K_d = 180$. The performance of the MAP, LMMSE, and ML algorithms is substantially improved by also exploiting the presence of the unknown data symbols; the resulting iterative SAGE-based algorithms converge after only two iterations and yield nearly identical FER performance for the MAP, LMMSE, and ML algorithms, which for a 4-QAM (16-QAM) constellation is only about 0.3 dB (1.2 dB) worse than in the case where the destination knows all channel parameters.

Appendix 1
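The density $p(h_{SRD}, N_{SRD})$ is obtained by averaging over $h_{RD}$:

$$p(h_{SRD}, N_{SRD}) = E_{h_{RD}}\left[p(h_{SRD}|h_{RD})\,p(N_{SRD}|h_{RD})\right].$$

Noting that $p(N_{SRD}|h_{RD})$ and $p(h_{SRD}|h_{RD})$ depend on $|h_{RD}|^2$ but not on $\arg(h_{RD})$, we introduce $x = |h_{RD}|^2\gamma^2N_{SR}$, which has an exponential density: $p(x) = A\exp(-Ax)$ for $x \ge 0$ and $p(x) = 0$ otherwise, with $A = (\gamma^2N_{SR}H_{RD})^{-1}$.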
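Expressing $p(h_{SRD}|h_{RD})$ and $p(N_{SRD}|h_{RD})$ as functions of $x$ yields $p(N_{SRD}|h_{RD}) = \delta(y - x)$ and

$$p(h_{SRD}|h_{RD}) = \frac{B}{\pi x}\exp\left(-\frac{B|h_{SRD}|^2}{x}\right),$$

where $y = N_{SRD} - N_{RD}$ and $B = \gamma^2N_{SR}/H_{SR}$. The density $p(h_{SRD}, N_{SRD})$ is then obtained as the expectation of $p(h_{SRD}|h_{RD})\,p(N_{SRD}|h_{RD})$ with respect to $x$, i.e.,

$$p(h_{SRD}, N_{SRD}) = \int_0^{\infty} A e^{-Ax}\,\frac{B}{\pi x}\exp\left(-\frac{B|h_{SRD}|^2}{x}\right)\delta(y - x)\,dx = \frac{C}{y}\exp\left(-Ay - \frac{B|h_{SRD}|^2}{y}\right) \quad (y > 0),$$

and $p(h_{SRD}, N_{SRD}) = 0$ for $y \le 0$, where $C = AB/\pi = (\pi H_{SR}H_{RD})^{-1}$.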

Appendix 2
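In order to obtain the pilot-based MAP estimate of $(h_{SRD}, N_{SRD})$ based on the observation $\mathbf{r}_{RD,p}$ only, we have to jointly maximize $\ln p(\mathbf{r}_{RD,p}, h_{SRD}, N_{SRD}) = \ln p(\mathbf{r}_{RD,p}|h_{SRD}, N_{SRD}) + \ln p(h_{SRD}, N_{SRD})$. Taking into account (7) and (2), we obtain, for $N_{SRD} > N_{RD}$,

$$\ln p(\mathbf{r}_{RD,p}, h_{SRD}, N_{SRD}) = -K_p\ln(\pi N_{SRD}) - \frac{|\mathbf{r}_{RD,p} - \gamma h_{SRD}\mathbf{c}_p|^2}{N_{SRD}} + \ln C - \ln y - Ay - \frac{B|h_{SRD}|^2}{y}, \tag{30}$$

with $y = N_{SRD} - N_{RD}$. For a given $N_{SRD}$, (30) is a concave quadratic function of $h_{SRD}$, which is maximized by

$$\hat{h}_{SRD}(N_{SRD}) = \frac{\gamma^2|\mathbf{c}_p|^2}{\gamma^2|\mathbf{c}_p|^2 + BN_{SRD}/(N_{SRD} - N_{RD})}\,\hat{h}_{SRD,ML}^{(0)}, \tag{31}$$

with $\hat{h}_{SRD,ML}^{(0)}$ given by (13). Substituting (31) back into (30) yields the function $\ln p(\mathbf{r}_{RD,p}, \hat{h}_{SRD}(N_{SRD}), N_{SRD})$; the value of $N_{SRD}$ that maximizes this function cannot be obtained in closed form.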

Appendix 3
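The function to be maximized in (12) is

$$E\left[\ln p(\mathbf{r}_{RD}|\mathbf{c}, h_{SRD}, N_{SRD}) \,\middle|\, \mathbf{r}, \hat{\boldsymbol{\nu}}^{(i-1)}\right] + \ln p(h_{SRD}, N_{SRD}), \tag{32}$$

where $\ln p(h_{SRD}, N_{SRD})$ follows from (7). Within terms not depending on $(h_{SRD}, N_{SRD})$, (32) equals

$$-K\ln N_{SRD} - \frac{|\mathbf{r}_{RD}|^2 - 2\gamma\,\mathrm{Re}\left\{h_{SRD}^*\mathbf{r}_{RD}\left(\boldsymbol{\mu}^{(i)}\right)^H\right\} + \gamma^2|h_{SRD}|^2\sigma^{(i)}}{N_{SRD}} - \ln y - Ay - \frac{B|h_{SRD}|^2}{y}, \tag{33}$$

with $y = N_{SRD} - N_{RD}$. In a similar way as for the pilot-based estimate, it can be verified that (16) maximizes (33) for $N_{SRD} = \hat{N}_{SRD}^{(i-1)}$. As the maximization of (33) with respect to $N_{SRD}$ for $h_{SRD} = \hat{h}_{SRD}^{(i)}$ is analytically intractable, we resort to a numerical search.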