EURASIP Journal on Wireless Communications and Networking 2005:2, 92–99 © 2005 Hindawi Publishing Corporation

The Extended-Window Channel Estimator for Iterative Channel-and-Symbol Estimation

The application of the expectation-maximization (EM) algorithm to channel estimation results in a well-known iterative channel-and-symbol estimator (ICSE). The EM-ICSE iterates between a symbol estimator based on the forward-backward recursion (BCJR equalizer) and a channel estimator, and may provide approximate maximum-likelihood blind or semiblind channel estimates. Nevertheless, the EM-ICSE has high complexity, and it is prone to misconvergence. In this paper, we propose the extended-window (EW) estimator, a novel channel estimator for ICSE that can be used with any soft-output symbol estimator. Therefore, the symbol estimator may be chosen according to performance or complexity specifications. We show that the EW-ICSE, an ICSE that uses the EW estimator and the BCJR equalizer, is less complex and less susceptible to misconvergence than the EM-ICSE. Simulation results reveal that the EW-ICSE may converge faster than the EM-ICSE.


INTRODUCTION
Channel estimation is an important part of communication systems. Channel estimates are required by equalizers that minimize the bit error rate (BER), and can be used to compute the coefficients of suboptimal but lower-complexity equalizers such as the minimum mean-squared error (MMSE) linear equalizer (LE) [1] or the decision-feedback equalizer (DFE) [1]. Traditionally, a sequence of known bits, called a training sequence, is transmitted for the purpose of channel estimation [1]. These known symbols and their corresponding received samples are used to estimate the channel. However, this approach, known as trained estimation, ignores the received samples corresponding to the information bits, and thus does not use all the information available at the receiver. Alternatively, semiblind estimators [2] use every available channel output for channel estimation. Thus, they outperform estimators based solely on the channel outputs corresponding to training symbols, and require a shorter training sequence. Channel estimation is still possible even if no training sequence is available, using a technique known as blind channel estimation.
An important class of algorithms for blind and semiblind channel estimation is based on the iterative strategy depicted in Figure 1 [3,4,5,6,7,8,9,10,11,12,13,14], which we call iterative channel-and-symbol estimation (ICSE). In these algorithms, an initial channel estimate is used by a symbol estimator to provide initial estimates of the first-order (and possibly also the second-order) statistics of the transmitted symbol sequence. These estimates are used by a channel estimator to improve the initial channel estimates. The process is then repeated. The hope is that several iterations between these two low-complexity estimators will lead to estimates that nearly maximize the joint likelihood function.
The application of the expectation-maximization (EM) algorithm, also known as the Baum-Welch algorithm [15,16], to the blind channel estimation problem results in the canonical ICSE that fits the framework of Figure 1. An EM iterative channel-and-symbol estimator (EM-ICSE) was first reported in [4], and it has some useful properties. First, it generates a sequence of estimates with nondecreasing likelihood, so that the channel estimates are capable of approaching the maximum-likelihood (ML) estimates. Second, its symbol estimator is based on the forward-backward recursion of Bahl et al. (BCJR) [17], which minimizes the probability of decision error. Third, the EM-ICSE may be easily modified to exploit, in a natural and nearly optimal way, any a priori information the receiver may have about the transmitted symbols. This a priori information may arise because of pilot symbols (e.g., in semiblind estimation) or error-control coding (e.g., in the context of turbo equalization [6,7,8,9]).
The application of iterative channel estimation to turbo equalization is particularly important, since it leads to channel estimates that benefit from the presence of channel coding, thus performing well at low signal-to-noise ratios [6,7,8,9]. This is particularly important because powerful codes such as turbo codes [18,19] allow reliable communication at extremely low signal-to-noise ratios, which only exacerbates the estimation problem for traditional channel estimators that ignore the existence of coding, as is the case with most blind channel estimation techniques.
The EM-ICSE has two main drawbacks that we address in this paper: its tendency to converge to inaccurate channel estimates, and its high computational complexity. The problem of convergence to inaccurate estimates arises because the EM-ICSE necessarily generates a sequence of estimates with nondecreasing likelihood. This property makes the EM-ICSE susceptible to getting trapped in a local maximum of the likelihood function. Also, the EM-ICSE has two sources of complexity. First, the EM channel estimator involves the computation and inversion of a square matrix whose order is equal to the channel length. Second, and more important, the complexity of the EM symbol estimator is exponential in the channel length. In [11,12], ICSEs are proposed that reduce the complexity of the EM-ICSE by introducing a low-complexity symbol estimator. However, these works focus only on the symbol estimator, and use the same channel estimator as the EM-ICSE, resulting in a computational complexity that grows with the square of the channel memory.
In this work, we focus on the channel estimator of Figure 1. We will propose the simplified EM channel estimator (SEM), a channel estimator for ICSE that avoids the matrix inversion of the EM channel estimator. More importantly, an ICSE based on the SEM channel estimator does not require the BCJR equalizer, and thus may be implemented with any number of low-complexity alternatives to the BCJR algorithm, such as those proposed in [20,21]. Since the complexity of the SEM channel estimator is linear in the channel memory, the overall complexity of an ICSE based on the SEM channel estimator is also linear if a linear-complexity equalizer is used. We will also investigate the convergence of an ICSE based on the SEM estimator. We will see that, after misconvergence, the SEM channel estimates may have a structure that can be exploited to escape the local maximum of the likelihood function. We then propose the extended-window (EW) channel estimator, a simple modification to the SEM channel estimator that exploits this structure and greatly decreases the probability of misconvergence, without significantly affecting the computational complexity.
This paper is organized as follows. In Section 2, we present the channel model and describe the problem we will investigate. In Section 3, we briefly review the EM-ICSE. In Section 4, we propose the SEM estimator, a linear-complexity channel estimator for ICSE that is not intrinsically linked to a symbol estimator. In Section 5, we propose the EW estimator, an extension to the SEM estimator of Section 4 that is less likely than EM to get trapped in a local maximum of the joint likelihood function. In Section 6, we present some simulation results, and we draw some conclusions in Section 7.

CHANNEL MODEL AND PROBLEM STATEMENT
Consider the transmission of K zero-mean, uncorrelated symbols a_k belonging to some alphabet A, with unit energy E[|a_k|²] = 1, across a dispersive channel with memory µ and additive white Gaussian noise. The received signal at time k can be written as

r_k = h^T a_k + n_k, (1)

where h = (h_0, h_1, . . . , h_µ)^T represents the channel impulse response, a_k = (a_k, a_{k−1}, . . . , a_{k−µ})^T, and n_k represents white Gaussian noise with variance σ². Let a = (a_0, a_1, . . . , a_{K−1}) and r = (r_0, r_1, . . . , r_{N−1}) denote the input and output sequences, respectively, where N = K + µ. The resulting channel model is depicted in Figure 2.
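The model (1) can be simulated in a few lines. The sketch below generates a block of BPSK symbols and passes it through a dispersive channel with additive white Gaussian noise; the channel taps, block length, and noise level are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

h = np.array([0.5, 1.0, 0.3])    # illustrative impulse response, memory mu = 2
mu = len(h) - 1
K, sigma = 1000, 0.1

# K uncorrelated, unit-energy BPSK symbols a_k in {-1, +1}
a = rng.choice([-1.0, 1.0], size=K)

# r_k = h^T a_k + n_k: the full convolution yields N = K + mu output samples
n = sigma * rng.standard_normal(K + mu)
r = np.convolve(h, a) + n
```

Note that the output block is longer than the input by the channel memory, matching N = K + µ in the text.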
Notice that, as far as channel estimation is concerned, the assumption that the transmitted symbols are uncorrelated is not too restrictive. Indeed, most training sequences are chosen so as to satisfy this assumption (thus minimizing the Cramér-Rao bound [22]) and the presence of an interleaver in most coded systems also ensures that the transmitted sequence is approximately uncorrelated. In other words, for channel estimation purposes, assuming that the transmitted symbols are uncorrelated does not exclude the presence of a training sequence or of a channel code. As we will see, it is the symbol estimator in Figure 1 that exploits the presence of a training sequence or of a channel code.
This paper concerns the joint estimation of a, h, and σ, relying solely on the received signal r. Ideally, we would like to solve the joint-ML channel estimation and symbol detection problem, that is, find

(a_ML, h_ML, σ_ML) = arg max_{a,h,σ} log p_{h,σ}(r | a), (2)

where log p_{h,σ}(r | a) is the log-likelihood function, defined as the logarithm of the pdf of the received signal r conditioned on the channel input a and parameterized by h and σ. Intuitively, the ML estimates are those that best explain the received sequence, in the sense that we are less likely to observe the channel output if we assume any other set of parameters to be correct, that is, p_{h_ML,σ_ML}(r | a_ML) ≥ p_{h,σ}(r | a) for all h, σ, a.
Besides this intuitive interpretation, ML estimates have many interesting theoretical properties [22]. It is noteworthy that the maximization in (2) should be performed over the set of valid transmitted sequences. Thus, the joint-ML channel-and-symbol estimation problem in (2) incorporates all possible scenarios: fully trained estimation (all of a is known); semiblind estimation without coding (parts of a are known, unknown parts of a can be any sequence of symbols); semiblind estimation with coding (parts of a are known, a must be a valid codeword); blind estimation without coding (none of a is known, a can be any sequence of symbols); and blind estimation with coding (none of a is known, a must be a valid codeword).
Unfortunately, a direct solution to the problem in (2) is too complex. Therefore, this paper focuses on iterative techniques that provide an approximate solution to (2) with reasonable computational complexity. In the sequel, we review the EM-ICSE, an ICSE that computes a sequence of estimates with nondecreasing likelihood and that, with proper initialization or if the likelihood function is well-behaved, will converge to the ML estimates.

THE EM-ICSE
The EM algorithm [15,16] provides an iterative solution to the blind identification problem in (2) that fits the paradigm of Figure 1, as first reported in [4]. The EM channel estimator (see Figure 1) for the (i+1)th iteration of the EM-ICSE is defined by

h^(i+1) = R_i^{-1} p_i, (3)

σ²(i+1) = (1/N) Σ_k |r_k|² − 2 h^(i+1)T p_i + h^(i+1)T R_i h^(i+1), (4)

where

R_i = (1/N) Σ_k E[a_k a_k^T | r; h^(i), σ²(i)], (5)

p_i = (1/N) Σ_k r_k E[a_k | r; h^(i), σ²(i)]. (6)

The EM symbol estimator (see Figure 1) provides the values of ã_k^(i) = E[a_k | r; h^(i), σ²(i)] and E[a_k a_k^T | r; h^(i), σ²(i)] that are required by (5) and (6). The a posteriori expected values in (5) and (6) are computed assuming that h^(i) and σ²(i) are the actual channel parameters. Notice that ã_k^(i) = E[a_k | r; h^(i), σ²(i)] is the a posteriori MMSE estimate of a_k, and we refer to ã_k^(i) as a soft symbol estimate.
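One iteration of the channel-estimator half of the EM-ICSE can be sketched as below, assuming the soft statistics E[a_k | r] and E[a_k a_k^T | r] are supplied by a BCJR-style symbol estimator. The sanity check feeds in perfect statistics, in which case the update reduces to the trained MMSE estimator discussed later; the channel and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def em_channel_update(r, A_mean, A_corr):
    """One M-step of the EM channel estimator, following eqs. (3)-(6).
    A_mean[k] plays the role of E[a_k | r] and A_corr[k] the role of
    E[a_k a_k^T | r]."""
    R = A_corr.mean(axis=0)                 # (5): a posteriori autocorrelation
    p = (r[:, None] * A_mean).mean(axis=0)  # (6): cross-correlation with r
    h = np.linalg.solve(R, p)               # (3): h = R^{-1} p
    # (4): block-averaged residual noise power E|r_k - h^T a_k|^2, expanded
    sigma2 = np.mean(r ** 2) - 2 * h @ p + h @ R @ h
    return h, sigma2

# Sanity check with perfect soft statistics (illustrative values).
h_true = np.array([0.8, 0.5, 0.2])
mu, K, sigma = 2, 2000, 0.1
a = rng.choice([-1.0, 1.0], size=K)
r_full = np.convolve(h_true, a) + sigma * rng.standard_normal(K + mu)
Ak = np.stack([a[k - mu:k + 1][::-1] for k in range(mu, K)])  # a_k vectors
h_est, s2 = em_channel_update(r_full[mu:K], Ak, Ak[:, :, None] * Ak[:, None, :])
```

With perfect statistics, h_est approaches h_true and s2 approaches σ²; in the actual EM-ICSE the statistics are only as good as the current channel estimate.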
Also, note that R i and p i of (5) and (6) can be viewed as estimates of the a posteriori autocorrelation matrix of the transmitted sequence and the cross-correlation vector between the transmitted and received sequences, respectively. Thus, (3) and (4) are similar to the MMSE-trained channel estimator [22], in which R i and p i are computed with the actual transmitted sequence.
The computation of the expected values in (5) and (6) requires knowledge of the a posteriori probabilities of the transmitted symbols. For an uncoded system, these can be computed exactly with the forward-backward recursion or BCJR algorithm [17]. Because the computational complexity of this algorithm grows exponentially with the channel length, some authors [11,12] have proposed lower-complexity alternatives that compute approximations to these a posteriori probabilities. In other words, the algorithms of [11,12] are approximations to the EM-ICSE that also fit the framework of Figure 1, and that are also based on the channel estimator of (3), (4), (5), and (6).
Unfortunately, in the presence of a channel code, an exact computation of R_i and p_i is prohibitively complex. The most common solution in this case is to modify the EM-ICSE, using a turbo equalizer as the symbol estimator [6]. In other words, for coded systems, E[a_k | r; h^(i), σ²(i)] and E[a_k a_k^T | r; h^(i), σ²(i)] are based on the decoder output. Similarly, the presence of training symbols is easily handled by the symbol estimator, which only has to set the training symbols as deterministic constants when computing R_i and p_i. Based on these two observations, we see that the channel estimator of the EM-ICSE always ignores the presence of a training sequence or of a channel code. It is the symbol estimator that exploits the structure of the transmitted symbols to improve their estimates.

A SIMPLIFIED EM CHANNEL ESTIMATOR
In this section, we propose the simplified EM estimator (SEM), an alternative to the EM channel estimator in (3), (4), (5), and (6) that avoids the computation of R_i and the matrix inversion of (3). To derive the SEM estimator, we note that, from the channel model (1) and the uncorrelatedness assumption, we get h_n = E[r_k a_{k−n}]. This expected value may be computed by conditioning on r, yielding

E[r_k a_{k−n}] = E[E[r_k a_{k−n} | r]] = E[r_k E[a_{k−n} | r]], (7)

where the last equality follows from the fact that r_k is a constant given r. Note that the channel estimator has no access to E[a_{k−n} | r], which requires exact channel knowledge. However, based on the iterative paradigm of Figure 1, at the ith iteration the channel estimator does have access to ã_k^(i) = E[a_k | r; h^(i), σ²(i)]. Replacing this value in (7), and also substituting a time average for the ensemble average, leads to the following channel estimator:

ĥ_n^(i+1) = (1/N) Σ_k r_k ã_{k−n}^(i), n = 0, . . . , µ. (8)

Notice that in (8) the channel is estimated by correlating the received signal with the soft symbol estimates ã_k^(i). This is similar to the fully trained channel estimator of [23,24], known as channel probing, except that the training symbols have been replaced by their soft estimates. As for estimating the noise variance, let â_k^(i) be a hard decision on the kth transmitted symbol, chosen as the element of A closest to ã_k^(i). Also, define the vector â_k^(i) = (â_k^(i), â_{k−1}^(i), . . . , â_{k−µ}^(i))^T. We then estimate the noise variance as

σ²(i+1) = (1/N) Σ_k |r_k − ĥ^(i+1)T â_k^(i)|². (9)

Notice that in (9) we use hard instead of soft symbol estimates. In our simulations, we found that doing so improved convergence speed.
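The SEM estimator amounts to a correlation and a residual-energy computation, with no matrix inversion. The sketch below implements (8) and (9); feeding the true symbols as soft estimates reduces (8) to channel probing. Channel and sizes are illustrative.

```python
import numpy as np

def sem_estimate(r, a_soft, mu):
    """SEM channel estimate, eq. (8): correlate the received samples with
    delayed soft symbol estimates a~_{k-n}."""
    N, K = len(r), len(a_soft)
    h_hat = np.zeros(mu + 1)
    for n in range(mu + 1):
        ks = np.arange(n, min(N, K + n))      # indices k where a~_{k-n} exists
        h_hat[n] = np.sum(r[ks] * a_soft[ks - n]) / N
    return h_hat

def sem_noise_var(r, a_hard, h_hat):
    """Noise variance estimate, eq. (9), using hard decisions a_hard."""
    return np.mean((r - np.convolve(h_hat, a_hard)) ** 2)

# Channel-probing special case: soft estimates equal the true symbols.
rng = np.random.default_rng(2)
h = np.array([0.7, 0.4, -0.2])
mu, K, sigma = 2, 4000, 0.05
a = rng.choice([-1.0, 1.0], size=K)
r = np.convolve(h, a) + sigma * rng.standard_normal(K + mu)
h_hat = sem_estimate(r, a, mu)
s2 = sem_noise_var(r, np.sign(a), h_hat)
```

In the ICSE, a_soft would be the equalizer output ã_k^(i) from the previous iteration rather than the true symbols.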
Remark 1. Combining the estimates (8) into a single vector, we find that ĥ^(i+1) = p_i. Thus, we may view (8) as a simplification of the EM estimate R_i^{-1} p_i that avoids matrix inversion by approximating R_i by the identity matrix I. This approximation is reasonable, since R_i is an a posteriori estimate of the autocorrelation matrix of the transmitted vector, which, due to the uncorrelatedness assumption, is close to the identity for large N. Furthermore, since this approximation results in a channel estimator that is less complex than the EM channel estimator defined in (3) and (4), we refer to the channel estimator defined by (8) and (9) as the simplified EM estimator (SEM).
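The approximation R_i ≈ I is easy to check empirically: for uncorrelated unit-energy symbols, the sample autocorrelation of the vectors a_k = (a_k, . . . , a_{k−µ})^T approaches the identity as the block grows. Sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, K = 4, 5000
a = rng.choice([-1.0, 1.0], size=K)

# Sample autocorrelation of the transmitted vectors a_k
Ak = np.stack([a[k - mu:k + 1] for k in range(mu, K)])
R = Ak.T @ Ak / len(Ak)

# Diagonal is exactly 1 (unit-energy symbols); off-diagonal terms are
# O(1/sqrt(K)) sample-averaging noise.
print(np.abs(R - np.eye(mu + 1)).max())
```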
Remark 2. The SEM channel estimator requires only the soft symbol estimatesã (i) k , so that an ICSE based on the SEM estimator may be represented as in Figure 3. Note that any equalizer that produces soft symbol estimates can be used, which allows for an even lower-complexity implementation of an SEM-based ICSE, using equalizers such as those proposed in [20,21].
Remark 3. It is interesting to notice that, while substituting the actual values of h or a for their estimates will always improve the performance of the iterative algorithm, the same is not true for σ. Indeed, substituting σ for σ̂ will often result in performance degradation. Intuitively, one can think of σ̂ as playing two roles: in addition to estimating σ, it also acts as a measure of the reliability of the channel estimate ĥ. In fact, consider a decomposition of the channel output:

r_k = ĥ^T a_k + (h − ĥ)^T a_k + n_k. (10)

The term (h − ĥ)^T a_k represents the contribution to r_k from the estimation error. By using ĥ to model the channel in the BCJR algorithm, we are in effect lumping the estimation error with the noise, resulting in an effective noise sequence with variance larger than σ². It is thus appropriate that σ̂ should exceed σ whenever ĥ differs from h. Alternatively, it stands to reason that an unreliable channel estimate should translate to unreliable symbol estimates, regardless of how well ĥ^T a_k matches r_k. Using a large value of σ̂ in the BCJR equalizer ensures that its output will have a small reliability. Fortunately, the noise variance estimate produced by (9) measures the energy of both the second and the third terms in (10). If ĥ is a poor channel estimate, ã will also be a poor estimate of a, and convolving ã and ĥ will produce a poor match for r, so that (9) will produce a large estimated noise variance.
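The decomposition (10) can be illustrated numerically: with a mismatched channel estimate, the residual energy measured by (9) absorbs the estimation-error term (h − ĥ)^T a_k on top of the true noise. In this sketch the symbol estimates are taken as perfect so as to isolate the channel-error effect; all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
h = np.array([0.8, 0.6])
sigma, K = 0.1, 5000
a = rng.choice([-1.0, 1.0], size=K)
r = np.convolve(h, a) + sigma * rng.standard_normal(K + 1)

def residual_var(h_hat):
    # Energy of r_k - h_hat^T a_k, as in eq. (9) with perfect symbol estimates
    return np.mean((r - np.convolve(h_hat, a)) ** 2)

print(residual_var(h))                      # close to sigma^2 = 0.01
print(residual_var(np.array([0.8, 0.0])))   # inflated by the missing tap
```

With the correct channel, the residual variance is essentially σ²; zeroing one tap inflates it by roughly that tap's energy, exactly the "lumping the estimation error with the noise" effect described above.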

THE EXTENDED-WINDOW CHANNEL ESTIMATOR
Misconvergence is a common characteristic of ICSEs, especially in blind systems. To illustrate this problem, consider estimating the channel h = (1, 2, 3, 4, 5)^T with a BPSK constellation and SNR = ‖h‖²/σ² = 20 dB. An ICSE based on the BCJR symbol estimator and the SEM channel estimator converges to ĥ^(20) = (2.1785, 3.0727, 4.1076, 5.0919, 0.1197)^T after 20 iterations, with K = 1000 bits, with initialization h^(0) = (1, 0, 0, 0, 0)^T and σ²(0) = 1. Although the algorithm fails, ĥ^(20) is seen to roughly approximate a shifted (or delayed) and truncated version of the actual channel. A possible explanation for this behavior is that the channel is maximum phase, while we used a minimum-phase initialization. This phase mismatch between h and the initialization h^(0) introduces a delay that cannot be compensated for by the iterative scheme. In fact, after convergence, a_k is approximately sign(ã_{k+1}), and h_0 could be accurately estimated by correlating r_k with ã_{k+1}. However, because the delay n in (8) is limited to the narrow window 0, . . . , µ, this correlation is never computed. This observation leads us to propose the extended-window (EW) channel estimator, in which (8) is computed for a broader range of n:

g_n^(i+1) = (1/N) Σ_k r_k ã_{k−n}^(i), n = −µ, . . . , 2µ. (11)

The channel estimate is then taken as the length-(µ+1) segment of g^(i+1) with the largest energy, that is,

ĥ_n^(i+1) = g_{n+δ}^(i+1), where δ = arg max_{d ∈ {−µ,...,µ}} Σ_{n=0}^{µ} |g_{n+d}^(i+1)|². (12)
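The extended window can be sketched directly from (8): compute the correlations over the wider range n = −µ, . . . , 2µ and keep the length-(µ+1) segment with the most energy. The example below recreates the misconvergence scenario, with soft estimates lagging one symbol behind the truth (ã_k ≈ a_{k−1}); channel and sizes are illustrative.

```python
import numpy as np

def ew_estimate(r, a_soft, mu):
    """Extended-window estimate: the correlations of eq. (8) over the wider
    window n = -mu..2*mu, then the length-(mu+1) segment with the most
    energy and its shift delta."""
    N, K = len(r), len(a_soft)
    g = np.zeros(3 * mu + 1)
    for i, n in enumerate(range(-mu, 2 * mu + 1)):
        ks = np.arange(max(0, n), min(N, K + n))   # k where a~_{k-n} exists
        g[i] = np.sum(r[ks] * a_soft[ks - n]) / N
    energies = [np.sum(g[d:d + mu + 1] ** 2) for d in range(2 * mu + 1)]
    d_best = int(np.argmax(energies))
    return g[d_best:d_best + mu + 1], d_best - mu  # estimate and delta

# Misconvergence scenario: soft estimates one symbol late (a~_k ~= a_{k-1}).
rng = np.random.default_rng(5)
h = np.array([0.6, 1.0, 0.6])
mu, K, sigma = 2, 4000, 0.05
a = rng.choice([-1.0, 1.0], size=K)
r = np.convolve(h, a) + sigma * rng.standard_normal(K + mu)
a_soft = np.concatenate([[0.0], a[:-1]])
h_ew, delta = ew_estimate(r, a_soft, mu)
```

A plain SEM estimator would return a shifted, truncated channel here; the extended window instead reports the shift (delta = −1 in this scenario) and recovers all the taps.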
Notice that after convergence we expect that g_δ ≈ h_0. Comparing (7) and (11), we note that this is equivalent to saying that a_k ≈ ã_{k−δ}^(i). This delay must be taken into account in the estimation of the noise variance. With that in mind, we propose to estimate σ² using a modified version of (9), namely

σ²(i+1) = (1/N) Σ_k |r_k − ĥ^(i+1)T â_{k−δ}^(i)|². (13)

Computational complexity
We now compare the computational complexity of the EW channel estimator of (11), (12), and (13) to that of the EM channel estimator of (3) and (4). We ignore the cost of computingã k , and we consider the complexity in terms of sums and multiplications per received symbol. For each received symbol, the EW algorithm performs 3µ + 1 multiplications and 3µ + 1 additions to compute the vector g in (11). The division by N, as well as the computation of δ, is done only once per block of N received symbols, and thus can be ignored. The computation of each term in the summation in (13) involves µ + 2 multiplications and the same number of sums. Hence, the total computational cost of the EW channel estimator is 4µ + 4 multiplications and 4µ + 4 sums.
For the EM channel estimator, we consider that E[a_k a_k^T | r; h^(i), σ²(i)] ≈ E[a_k | r; h^(i), σ²(i)] E[a_k | r; h^(i), σ²(i)]^T. This approximation is used in [11,12], and allows for a simpler complexity comparison. With this simplification, and noting that E[a_k a_k^T | r; h^(i), σ²(i)] is a symmetric matrix, we see that the computation of R_i in (5) requires (µ + 1)µ/2 multiplications and an equal number of sums per received symbol. On the other hand, the computation of p_i in (6) requires µ + 1 multiplications and sums per received symbol. The linear system in (3) is solved only once, so that its cost can be ignored. The same can be said about most of the operations in (4), except for its first term, which requires 1 multiplication and sum per received symbol. Thus, the total cost of this approximate EM channel estimator is µ²/2 + 3µ/2 + 2 multiplications and sums per received symbol.
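The two per-symbol operation counts derived above can be transcribed directly and tabulated against the channel memory:

```python
# Per-received-symbol multiplication counts from the analysis above
# (the sum counts are identical).
def ew_ops(mu):
    return 4 * mu + 4                   # EW estimator: linear in mu

def em_ops(mu):
    return (mu * mu + 3 * mu) // 2 + 2  # approximate EM estimator: quadratic

for mu in (2, 4, 10, 20):
    print(mu, ew_ops(mu), em_ops(mu))
```

For very short memories the approximate EM count is comparable (the formulas cross near µ = 6), but the linear growth of the EW count dominates as the memory increases, e.g., 44 versus 67 operations at µ = 10.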

SIMULATION RESULTS
In this section, we use simulations to compare the performance of the fully blind EM-ICSE and the fully blind EW-ICSE, assuming both ICSEs use the BCJR symbol estimator. The results presented in this section all correct for the aforementioned shifts in the estimates. In other words, when computing channel estimation error or BER, the channel and symbol estimates were shifted to best match the actual channel or the transmitted sequence. Note that this shift was done only for the purpose of computing the errors, and hence did not affect the estimates in the iterative procedure.
For comparison purposes, we also consider fully trained channel estimators, in which all the transmitted bits are assumed known by the channel estimator. We consider the fully trained MMSE estimator which, as discussed in Section 3, can be seen as a trained version of the EM channel estimator. We also consider channel probing which, as discussed in Section 4, can be seen as the trained counterpart of the EW channel estimator. In the simulations of the trained estimators, we use the same block of received samples to estimate the channel (assuming that all transmitted symbols are known) and to estimate the transmitted symbols (with the BCJR equalizer, using the trained channel estimates).
As a first test of the EW-ICSE, we simulated the transmission of K = 600 BPSK symbols over the channel h = (−0.2287, 0.3964, 0.7623, 0.3964, −0.2287)^T from [12]. To stress the fact that the EW-ICSE is not sensitive to initial conditions, we initialized the channel randomly using h^(0) = σ^(0) u/‖u‖, where u ∼ N(0, I), and σ²(0) = (1/2N) Σ_{k=0}^{N−1} |r_k|². By assigning half of the received energy to the signal and half to the noise, we are essentially initializing the SNR estimate to 0 dB. In Figure 4, we show the convergence behavior of the EW-ICSE estimates, averaged over 100 independent runs of this experiment using SNR = ‖h‖²/σ² = 9 dB. Only the convergence of h_0, h_1, and h_2 is shown; the behavior of h_3 and h_4 is similar to that of h_2 and h_0, respectively, so we show only the coefficients with the worst convergence. The shaded regions around the channel estimates correspond to plus and minus one standard deviation. For comparison, we show the average behavior of the EM channel estimates in Figure 5. In contrast to the good performance of the EW-ICSE, the EM estimates fail to converge even in the mean to the correct values, especially h_0. This happens because the EM-ICSE often gets trapped in local maxima of the likelihood function [16], while the EW-ICSE avoids many of these local maxima. The better convergence behavior of the EW-ICSE is even clearer in Figure 6, where we show the noise variance estimates. Also, Figures 4, 5, and 6 suggest that the EW-ICSE converges faster than the EM-ICSE. In Figure 7, we show the channel estimation error for the EW-ICSE and the EM-ICSE estimates as a function of SNR, after 20 iterations. This number of iterations is enough for both the EM-ICSE and the EW-ICSE to converge in this case. We also show the estimation errors of the trained MMSE estimates and the trained channel probing estimates. The results are averaged over 100 independent runs of this experiment. In Figure 8, we show the average BER.
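The blind initialization used in this experiment can be sketched as follows: split the received energy equally between signal and noise (a 0 dB initial SNR estimate) and draw a random channel direction scaled to σ^(0). The received block here is a stand-in; only the scaling logic is the point.

```python
import numpy as np

rng = np.random.default_rng(6)
mu, N = 4, 1000
r = rng.standard_normal(N)                       # placeholder received samples

# sigma^2(0) = sum |r_k|^2 / 2N: half the received energy assigned to noise
sigma2_0 = np.sum(r ** 2) / (2 * N)

# h^(0) = sigma^(0) u / ||u||: random direction, norm sigma^(0)
u = rng.standard_normal(mu + 1)
h0 = np.sqrt(sigma2_0) * u / np.linalg.norm(u)

# By construction ||h0||^2 / sigma2_0 = 1, i.e., a 0 dB initial SNR estimate.
```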
Again, as we can see in Figures 7 and 8, the EW-ICSE performs better than the EM-ICSE.
It is interesting to notice in Figures 7 and 8 that for high enough SNR the performance of the EW-ICSE approaches that of its trained counterpart, the channel probing estimator. One might also expect the performance of the EM-ICSE to approach that of its trained counterpart, the MMSE algorithm. However, as we can see from Figures 7 and 8, the EM-ICSE performs worse than channel probing, which is in turn worse than the MMSE estimator. The difference between the EM and MMSE estimates may be explained by the misconvergence of the EM-ICSE.
It should be pointed out that even though the channel estimates provided by the MMSE algorithm are better than the channel probing estimates, the BER of both estimates is similar. In other words, the channel probing estimates are "good enough," and the added complexity of the MMSE estimator does not have much impact on the BER performance in the SNR range considered here. Finally, we observed that the BER performance of a BCJR equalizer with channel knowledge cannot be distinguished from that of a BCJR equalizer using the MMSE estimates.
To further support the claim that the EW-ICSE avoids most of the local maxima of the likelihood function that trap the EM-ICSE, we ran both the EM-ICSE and the EW-ICSE on 1000 random channels of memory µ = 4, generated as h = u/‖u‖, where u ∼ N(0, I). The estimates were initialized to σ²(0) = (1/2N) Σ_{k=0}^{N−1} |r_k|² and h^(0) = (0, . . . , 0, σ^(0), 0, . . . , 0)^T; that is, the center tap of h^(0) is initialized to σ^(0). We used SNR = 18 dB and blocks of K = 1000 BPSK symbols. In Figure 9, we show the word error rate (WER) (the fraction of blocks detected with errors) of the EW-ICSE and the EM-ICSE versus iteration. It is again clear that the EW-ICSE outperforms the EM-ICSE. It should be noted that in this example the equalizer based on the channel probing estimates was able to detect all transmitted sequences correctly. The better performance of the EW estimates can also be seen in Figure 10, where we show histograms of the estimation errors (in dB) for the channel probing, EW, and EM estimates, computed after 50 iterations. We see that while only 3% of the EW estimates have an error larger than −16 dB, 35% of the EM estimates do. In fact, the histogram for the EW estimates is very similar to that of the channel probing estimates, which again shows the good convergence properties of the EW-ICSE. It is also interesting to note in Figure 10 that the EM estimates have a bimodal behavior: the estimation errors produced by the EM-ICSE are grouped around −11 dB and −43 dB, respectively worse than and better than the channel probing estimates. This bimodal behavior can be explained by the fact that the EM algorithm often converges to inaccurate estimates, leading to large estimation errors. On the other hand, when the EM algorithm does work, it works very well.

CONCLUSIONS
We presented the EW channel estimator, a linear-complexity channel estimator for ICSE. We have shown that this technique can be seen as a modification of the EM channel estimator. A key feature of the EW estimator is its extended window, which greatly improves the convergence behavior of ICSEs based on the EW estimator, avoiding most of the local maxima of the likelihood function that trap the EM-ICSE. Furthermore, the computational complexity of the EW estimator grows linearly with the channel memory, as opposed to the quadratic complexity of the EM channel estimator. Additionally, the EW estimator may be used with any soft-output equalizer. This allows for even further complexity reduction when compared to the EM-ICSE, which requires a BCJR equalizer. However, simulations show that, despite its good convergence properties, the EW-ICSE is not globally convergent. The problem of devising an iterative strategy that is guaranteed to always avoid misconvergence, regardless of initialization, remains open.