# Low-complexity *a posteriori* probability approximation in EM-based channel estimation for trellis-coded systems

- Nico Aerts^{1} (email author)
- Iancu Avram^{1}
- Marc Moeneclaey^{1}

**2014**:153

https://doi.org/10.1186/1687-1499-2014-153

© Aerts et al.; licensee Springer. 2014

**Received: **21 May 2014

**Accepted: **2 September 2014

**Published: **17 September 2014

## Abstract

When estimating channel parameters in linearly modulated communication systems, the iterative expectation-maximization (EM) algorithm can be used to exploit the signal energy associated with the unknown data symbols. It turns out that the channel estimation requires at each EM iteration the *a posteriori* probabilities (APPs) of these data symbols, resulting in a high computational complexity when channel coding is present. In this paper, we present a new approximation of the APPs of trellis-coded symbols, which is less complex and requires less memory than alternatives from the literature. By means of computer simulations, we show that the Viterbi decoder that uses the EM channel estimate resulting from this APP approximation experiences a negligible degradation in frame error rate (FER) performance, as compared to using the exact APPs in the channel estimation process.

### Keywords

*A posteriori* symbol probability · ML estimation · EM algorithm · Trellis-coded modulation · Viterbi decoder

## 1 Introduction

When the channel between a source and destination node is not known, it is essential for the destination to estimate this channel in order to decode the transmitted information. Typically, the source assists the destination with this task by transmitting known pilot symbols along with the unknown data symbols. Making use of only these pilot symbols, the destination is able to estimate the channel. The drawback of this pilot-aided method is that the channel information contained in the data part of the signal is not harvested during the estimation. Hence, in order to obtain an accurate channel estimate, a large number of pilot symbols must be present, yielding a substantial reduction of both power and bandwidth efficiency.

To accommodate these problems, the iterative expectation-maximization (EM) algorithm [1, 2] can be used to also exploit the signal energy associated with the unknown data symbols during the channel estimation; this way, far fewer pilot symbols are needed to achieve a given estimation accuracy. Application of the EM algorithm requires that at each iteration the *a posteriori* probabilities (APPs) of these data symbols be calculated. When using a trellis code to map the information bits on the data symbols, the Viterbi algorithm [3] minimizes the frame error rate (FER) by performing maximum likelihood (ML) sequence detection. The exact APPs of the trellis-coded data symbols are obtained by means of the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm [4], which, however, is roughly three times as complex as the Viterbi algorithm [5].

Several low-complexity approximations of the BCJR algorithm have been proposed in the literature, mainly in the context of iterative soft decoding of concatenated codes, referred to as turbo decoding. Among them are the max-log maximum *a posteriori* probability (MAP) algorithm [5] and the soft-output Viterbi algorithm (SOVA) [6], which are roughly twice as complex as the Viterbi algorithm [7], and the soft-output M-algorithm (SOMA) [8], which reduces complexity by considering only the M most likely states at each trellis section. Some improvements of the SOVA algorithm in the context of turbo decoding have been presented in [9–13]. Whereas these referenced papers make use of the approximate APPs inside the iterative decoder, we focus on using the approximate APPs only in the iterative estimation algorithm and use the standard Viterbi algorithm (which does not need symbol APPs) for decoding. Because an accurate approximation of the true APPs is less critical for the proper operation of the EM algorithm, we propose a simpler approximation of the APP computation with roughly half the complexity of max-log MAP and with substantially lower memory requirements. We compare the resulting EM algorithm, in terms of estimation accuracy and FER of the Viterbi decoder, with the cases where the EM estimator uses either the true APPs, the APPs resulting from SOMA, or the APPs that are computed under the simplifying assumption of uncoded transmission.

### 1.1 Notations

All vectors are row vectors and in boldface. The Hermitian transpose, the statistical expectation, the *m*-th element, the first *m* elements, and the estimate of the row vector **x** are denoted by **x**^{H}, *E*[**x**], *x*_{m}, **x**_{1:m}, and $\widehat{\mathbf{x}}$, respectively.

## 2 System description

The source transmits a frame consisting of *K*_{p} pilot symbols **c**_{p} and a sequence of *K* data symbols **c**; the latter is obtained by applying *K*_{b} information bits **b** to a trellis encoder [5]. We assume that the pilot symbols have a constant magnitude $\sqrt{E_{\mathrm{s}}}$ and that the data symbols satisfy $E\left[|c_{m}|^{2}\right]=E_{\mathrm{s}}$, with *E*_{s} denoting the average symbol energy at the transmitter. Considering a channel that is characterized by a channel gain *h* that is constant over the frame and a noise contribution **n**, the signal received by the destination, **r**_{t} = (**r**_{p}, **r**), is given by

$$\mathbf{r}_{\mathrm{t}} = h\,(\mathbf{c}_{\mathrm{p}}, \mathbf{c}) + (\mathbf{n}_{\mathrm{p}}, \mathbf{n}) \qquad (1)$$

The elements of **n**_{p} and **n** are independent zero-mean circularly symmetric complex Gaussian random variables with variance *N*_{0}. The destination produces a channel gain estimate *ĥ* and uses a Viterbi decoder to obtain the ML information bit sequence decision $\widehat{\mathbf{b}}=\arg\max_{\tilde{\mathbf{b}}} p\left(\mathbf{r}\,\middle|\,\mathbf{b}=\tilde{\mathbf{b}}, h=\widehat{h}\right)$, with $\tilde{\mathbf{b}}$ belonging to the set of all $2^{K_{\mathrm{b}}}$ information bit sequences of length *K*_{b}. If the estimate *ĥ* were equal to the actual channel gain *h*, the Viterbi decoder would minimize the FER, given by $\Pr\left[\widehat{\mathbf{b}}\ne\mathbf{b}\right]$.

During the *m*-th symbol interval, the encoder accepts a vector **u**_{m} of *N*_{b} information bits (the information bit sequence **b** is partitioned as **b** = (**u**_{1},⋯,**u**_{K})) and outputs a coded data symbol *c*_{m}, which is given by the output equation *c*_{m} = *g*(*S*_{m}, **u**_{m}); here, *S*_{m} denotes the encoder state at the start of the *m*-th symbol interval. At the end of the *m*-th symbol interval, the encoder has reached the state *S*_{m+1} given by the state equation *S*_{m+1} = *f*(*S*_{m}, **u**_{m}). For any *m*, *S*_{m} belongs to the set *Σ* = {*σ*_{1},⋯,*σ*_{L}}, with *L* denoting the number of encoder states. The resulting trellis code has a rate of *N*_{b} information bits per coded symbol. The Viterbi algorithm recursively computes the sequences $\widehat{\mathbf{c}}_{1:m}(\sigma_{l})$ and their log-likelihoods for all *σ*_{l} ∈ *Σ* and *m* = 1,⋯,*K*; here, $\widehat{\mathbf{c}}_{1:m}(\sigma_{l})$ is the ML sequence consisting of *m* data symbols that yields *S*_{m+1} = *σ*_{l}. The log-likelihood of $\widehat{\mathbf{c}}_{1:m}(\sigma_{l})$ is given by −*Λ*_{m}(*σ*_{l})/*N*_{0}, where the path metric *Λ*_{m}(*σ*_{l}) satisfies the following recursion:

$$\Lambda_{m}(\sigma_{l}) = \min_{(\sigma_{l'},\,\tilde{c})\,\rightarrow\,\sigma_{l}} \left[\Lambda_{m-1}(\sigma_{l'}) + \lambda_{m}(\tilde{c})\right] \qquad (3)$$

where the minimization is over the state-symbol pairs $(\sigma_{l'},\tilde{c})$ that are consistent with a transition to the state $\sigma_{l}$, and the branch metric $\lambda_{m}(\tilde{c})$ is given by

$$\lambda_{m}(\tilde{c}) = \left|r_{m} - \widehat{h}\,\tilde{c}\right|^{2} \qquad (4)$$

The value of $\tilde{c}$ that results from (3) is $\widehat{c}_{m}(\sigma_{l})$, the last element of $\widehat{\mathbf{c}}_{1:m}(\sigma_{l})$. The recursion starts from *Λ*_{0}(*σ*_{l}), which is determined by the *a priori* distribution of the initial state *S*_{0}. The ML data sequence decision is given by $\widehat{\mathbf{c}}=\widehat{\mathbf{c}}_{1:K}(\widehat{S}_{K+1})$, where $\widehat{S}_{K+1}=\arg\min_{\sigma_{l}\in\Sigma}\Lambda_{K}(\sigma_{l})$. The ML decision $\widehat{\mathbf{b}}$ is the information bit sequence consistent with $\widehat{\mathbf{c}}$, $\widehat{S}_{K+1}$, and the encoder operation. The Viterbi decoder requires the storage of *L* data symbol sequences of length *K*. The above is straightforwardly extended to (i) multidimensional trellis coding, where a transition from state *S*_{m} to state *S*_{m+1} gives rise to multiple data symbols, and (ii) the presence of termination bits at the encoder input to impose a known final state *S*_{K+1}.
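The path-metric recursion (3) with the branch metric (4) can be sketched in a few lines. The 2-state toy trellis below (state = previous input bit, BPSK output $c = 1 - 2(u \oplus S)$) and all numeric values are illustrative assumptions, not the code used in the paper:

```python
import numpy as np

def f(S, u):            # state equation S_{m+1} = f(S_m, u_m) (toy trellis)
    return u

def g(S, u):            # output equation c_m = g(S_m, u_m) (BPSK, toy trellis)
    return 1 - 2 * (S ^ u)

def viterbi(r, h_hat, L=2):
    """Recursion (3): keep, per state, the best path metric and its survivor."""
    Lam = np.zeros(L)                      # Lambda_0: uniform over initial states
    paths = [[] for _ in range(L)]         # surviving symbol sequences c_{1:m}(sigma_l)
    for rm in r:
        new_Lam = np.full(L, np.inf)
        new_paths = [None] * L
        for S in range(L):
            for u in (0, 1):
                c = g(S, u)
                # branch metric (4): |r_m - h_hat * c|^2
                metric = Lam[S] + abs(rm - h_hat * c) ** 2
                Sn = f(S, u)
                if metric < new_Lam[Sn]:   # keep only the best path into Sn
                    new_Lam[Sn] = metric
                    new_paths[Sn] = paths[S] + [c]
        Lam, paths = new_Lam, new_paths
    best = int(np.argmin(Lam))             # \hat{S}_{K+1}
    return paths[best], Lam

# Noiseless demo: the decoder recovers the transmitted valid symbol sequence
seq, Lam = viterbi(np.array([1.0, -1.0, -1.0, 1.0]), h_hat=1.0)
```

With a noiseless observation the winning path accumulates a zero metric, which is why the final `min(Lam)` vanishes in this demo.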

## 3 Estimation strategy

The source transmits the pilot symbols **c**_{p} to assist the destination with the channel estimation process. The ML pilot-based estimate of *h* that uses only **r**_{p} is given by [14]

$$\widehat{h} = \frac{\mathbf{r}_{\mathrm{p}}\,\mathbf{c}_{\mathrm{p}}^{H}}{K_{\mathrm{p}} E_{\mathrm{s}}} \qquad (5)$$

The corresponding mean square error (MSE) equals

$$E\left[\,\left|\widehat{h}-h\right|^{2}\,\right] = \frac{N_{0}}{K_{\mathrm{p}} E_{\mathrm{s}}} \qquad (6)$$

Hence, for given *E*_{s}, the estimation accuracy is improved by increasing *K*_{p}.
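The pilot-based estimate amounts to correlating the received pilot part with the known pilots and normalizing, as in (5). A minimal sketch, where the channel gain, noise level, and pilot count are illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
Es, N0, Kp = 1.0, 0.1, 5
h = 0.8 - 0.3j                                       # illustrative channel gain
cp = np.sqrt(Es) * np.ones(Kp)                       # constant-magnitude pilots
noise = np.sqrt(N0 / 2) * (rng.standard_normal(Kp)
                           + 1j * rng.standard_normal(Kp))
rp = h * cp + noise                                  # received pilot part

# Pilot-based ML estimate (5): correlate with pilots, normalize by Kp*Es
h_hat0 = rp @ cp.conj() / (Kp * Es)
```

The estimation error has variance *N*_{0}/(*K*_{p}*E*_{s}), matching (6), so the estimate tightens as more pilots are spent.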

The ML estimate of *h* that exploits the entire observation **r**_{t} is given by

$$\widehat{h} = \arg\max_{\tilde{h}}\; p\left(\mathbf{r}_{\mathrm{t}}\,\middle|\,\tilde{h}\right) = \arg\max_{\tilde{h}}\; \sum_{\mathbf{c}} p\left(\mathbf{r}_{\mathrm{t}}\,\middle|\,\mathbf{c},\tilde{h}\right)\Pr[\mathbf{c}]$$

where the summation over the data sequences corresponding to the $2^{K_{\mathrm{b}}}$ information bit sequences makes a direct maximization computationally complex. Fortunately, the EM algorithm [1] allows the ML estimate to be computed iteratively. For the problem at hand, the EM channel estimate $\widehat{h}^{(i)}$ during the *i*-th iteration is obtained as

$$\widehat{h}^{(i)} = \frac{\mathbf{r}_{\mathrm{p}}\,\mathbf{c}_{\mathrm{p}}^{H} + \sum_{m=1}^{K} r_{m}\left(\eta_{m}^{(i-1)}\right)^{*}}{K_{\mathrm{p}} E_{\mathrm{s}} + \sum_{m=1}^{K} \mu_{m}^{(i-1)}} \qquad (9)$$

where $\eta_{m}^{(i-1)}$ and $\mu_{m}^{(i-1)}$ denote the *a posteriori* mean of *c*_{m} and |*c*_{m}|^{2}, respectively:

$$\eta_{m}^{(i-1)} = \sum_{\alpha} \alpha \,\Pr\left[c_{m}=\alpha \,\middle|\, \mathbf{r}, \widehat{h}^{(i-1)}\right], \qquad \mu_{m}^{(i-1)} = \sum_{\alpha} |\alpha|^{2} \,\Pr\left[c_{m}=\alpha \,\middle|\, \mathbf{r}, \widehat{h}^{(i-1)}\right]$$

When the data symbols have a constant magnitude, the denominator of (9) reduces to (*K*_{p} + *K*)*E*_{s}. The iterations are initialized with the pilot-based estimate from (5), which we denote as $\widehat{h}^{(0)}$.
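One EM update combines the pilot correlation with the *a posteriori* means of the data symbols. In the sketch below, `app[m, j]` is a placeholder for Pr[*c*_{m} = `alphabet[j]` | **r**, *ĥ*^{(i−1)}] produced by whichever APP algorithm is in use (BCJR, SOMA, *A*1, or *A*2); the BPSK alphabet and the degenerate one-hot APPs in the demo are illustrative assumptions:

```python
import numpy as np

def em_update(rp, cp, r, app, alphabet, Es):
    """One EM re-estimate of h from pilot and data observations."""
    eta = app @ alphabet                   # a posteriori mean E[c_m | r, h_hat]
    mu = app @ (np.abs(alphabet) ** 2)     # a posteriori mean E[|c_m|^2 | r, h_hat]
    num = rp @ cp.conj() + r @ eta.conj()  # pilot part + soft data part
    den = len(cp) * Es + mu.sum()          # reduces to (Kp + K)*Es for PSK
    return num / den

# Noiseless demo with one-hot APPs (i.e., perfectly known data symbols):
Es = 1.0
alphabet = np.array([1.0, -1.0])                   # BPSK (illustrative)
h = 0.5 + 0.5j
cp = np.ones(2)
c = np.array([1.0, -1.0, 1.0])
app = np.array([[1, 0], [0, 1], [1, 0]], float)    # Pr[c_m = alpha] one-hot
h_hat = em_update(h * cp, cp, h * c, app, alphabet, Es)
```

With exact one-hot APPs and no noise, the update returns the true gain; in practice the softness of the APPs weighs unreliable symbols down.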

The APP of the data symbol *c*_{m} can be expressed as

$$\Pr\left[c_{m}=\alpha \,\middle|\, \mathbf{r}, \widehat{h}\right] \propto \sum_{\mathbf{c}:\,c_{m}=\alpha} p\left(\mathbf{r}\,\middle|\,\mathbf{c},\widehat{h}\right)\Pr[\mathbf{c}] \qquad (12)$$

where ∝ means equal within a normalization factor, and the summation is over all valid codewords with *c*_{m} equal to *α*. Making use of the finite-state description of the encoder, the APPs (12) can be computed efficiently for *m* = 1,⋯,*K* by means of the BCJR algorithm [4]. However, its complexity is still about three times that of the Viterbi algorithm [5]. Hence, assuming that the EM algorithm converges after *I* iterations, the BCJR algorithm must be applied *I* times, after which the Viterbi algorithm (with $\widehat{h}=\widehat{h}^{(I)}$) is used to detect the information bit sequence. The resulting complexity is 3*I* + 1 times that of a single use of the Viterbi decoder.

When all transmitted symbols (pilot and data) were known to the receiver, the resulting MSE would equal *N*_{0}/((*K*_{p} + *K*)*E*_{s}) (13) [15]. Comparison of (6) with (13) indicates the possibility of substantially reducing the MSE by also including the data portion **r** of the observation in the estimation process, especially when *K* ≫ *K*_{p}.

## 4 Complexity reduction

In order to avoid the computational complexity associated with the BCJR algorithm (or the max-log MAP or SOMA approximations), we consider two reduced-complexity approximations for computing the APPs. In the first algorithm (*A*1), we do not exploit the code properties and compute the APPs as if the transmission were uncoded. The second algorithm (*A*2), which exploits the path metrics from the Viterbi decoder to approximate the APPs, represents our main contribution.

### 4.1 Algorithm *A* 1

If the transmission were uncoded, the observations *r*_{k} with *k* = 1,…,*K* and *k* ≠ *m* would not contain any information about *c*_{m}. In (12), we can thus replace **r** by *r*_{m}. This yields

$$\Pr\left[c_{m}=\alpha \,\middle|\, \mathbf{r}, \widehat{h}\right] \approx \Pr\left[c_{m}=\alpha \,\middle|\, r_{m}, \widehat{h}\right] \propto \exp\left(-\lambda_{m}(\alpha)/N_{0}\right) \qquad (15)$$

where *λ*_{m}(*α*) follows from (4).
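Algorithm *A*1 is thus a per-symbol softmax of the branch metrics over the alphabet. A minimal sketch (the numerically stabilized exponent, the BPSK alphabet, and the demo values are illustrative choices):

```python
import numpy as np

def app_A1(r, h_hat, alphabet, N0):
    """Per-symbol APPs (15), ignoring the code constraints."""
    # branch metrics (4): lambda_m(alpha) = |r_m - h_hat*alpha|^2, shape (K, |A|)
    lam = np.abs(r[:, None] - h_hat * alphabet[None, :]) ** 2
    # softmax of -lambda/N0 per row, shifted by the row minimum for stability
    w = np.exp(-(lam - lam.min(axis=1, keepdims=True)) / N0)
    return w / w.sum(axis=1, keepdims=True)

# Demo: a single observation lying exactly on the symbol alpha = +1
probs = app_A1(np.array([1.0 + 0j]), 1.0, np.array([1.0, -1.0]), N0=0.2)
```

Each row of `probs` sums to one; at low noise levels the APPs become nearly hard decisions.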

### 4.2 Algorithm *A* 2

Algorithm *A*2 exploits the path metrics {*Λ*_{m}(*σ*_{l})} of the surviving paths. The APP approximation makes use of the following simplifications:

- (i) We ignore future observations. More specifically, we approximate the APP $\Pr\left[c_{m}=\alpha \,\middle|\, \mathbf{r},\widehat{h}\right]$ of a symbol *c*_{m} by conditioning on only the past and present observations **r**_{1:m}. This APP is obtained by simply replacing in the right-hand side of (12) the vectors **r** and **c** by **r**_{1:m} and **c**_{1:m}, respectively:
$$\Pr\left[c_{m}=\alpha \,\middle|\, \mathbf{r},\widehat{h}\right] \approx \Pr\left[c_{m}=\alpha \,\middle|\, \mathbf{r}_{1:m},\widehat{h}\right] \qquad (16)$$
- (ii) From all paths yielding *S*_{m+1} = *σ*_{l} (*l* = 1, 2, …, *L*), we only keep the most likely path, which corresponds to the symbol sequence $\widehat{\mathbf{c}}_{1:m}(\sigma_{l})$; the likelihood of the other, non-surviving, paths is assumed to be zero. This yields the approximation
$$\Pr\left[c_{m}=\alpha \,\middle|\, \mathbf{r},\widehat{h}\right] \propto \sum_{\substack{\sigma_{l}\in\Sigma \\ \widehat{c}_{m}(\sigma_{l})=\alpha}} p\left(\mathbf{r}_{1:m}\,\middle|\,\widehat{\mathbf{c}}_{1:m}(\sigma_{l}),\widehat{h}\right)\Pr[\mathbf{c}] \qquad (18)$$
- (iii) We replace in (18) the summation over the valid symbol sequences by a maximization and finally obtain the approximation
$$\Pr\left[c_{m}=\alpha \,\middle|\, \mathbf{r},\widehat{h}\right] \propto \max_{\substack{\sigma_{l}\in\Sigma \\ \widehat{c}_{m}(\sigma_{l})=\alpha}} p\left(\mathbf{r}_{1:m}\,\middle|\,\widehat{\mathbf{c}}_{1:m}(\sigma_{l}),\widehat{h}\right) \qquad (19)$$

Expressing the likelihood of $\widehat{\mathbf{c}}_{1:m}(\sigma_{l})$ in terms of the path metric *Λ*_{m}(*σ*_{l}) from (3), we obtain from (19) the approximation

$$\Pr\left[c_{m}=\alpha \,\middle|\, \mathbf{r},\widehat{h}\right] \propto \max_{\substack{\sigma_{l}\in\Sigma \\ \widehat{c}_{m}(\sigma_{l})=\alpha}} \exp\left(-\Lambda_{m}(\sigma_{l})/N_{0}\right) \qquad (21)$$

Hence, when the surviving path with the largest likelihood at the end of the *m*-th trellis section has *c*_{m} = *α*, our APP approximation for the symbol *c*_{m} is largest for *c*_{m} = *α*. Approximating the APPs $\Pr\left[c_{m}=\alpha \,\middle|\, \mathbf{r},\widehat{h}\right]$ for *m* = 1,⋯,*K* using (21) yields a complexity similar to that of the Viterbi algorithm. Hence, assuming that the EM algorithm converges after *I* iterations, the complexity as compared to a single use of the Viterbi decoder is *I* + 1 times for algorithm *A*2, whereas it is 3*I* + 1 times when the APP computation is according to the BCJR algorithm. Note that unlike the Viterbi algorithm, the computation of the APP (21) of *c*_{m} does not require storing the data symbol decisions $\widehat{c}_{n}(\sigma_{l})$ for *n* < *m*, so that algorithm *A*2 uses considerably less memory than the Viterbi algorithm does.
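Approximation (21) can be evaluated inside the Viterbi recursion itself: at the end of trellis section *m*, each state carries its metric *Λ*_{m}(*σ*_{l}) and only the *last* symbol of its survivor, from which the APPs follow by a softmax over the best metric per symbol value. The 2-state toy trellis (state = previous bit, BPSK output) and the demo values are illustrative assumptions, not the code from the paper:

```python
import numpy as np

def f(S, u): return u                    # state equation (toy trellis)
def g(S, u): return 1 - 2 * (S ^ u)      # output equation (BPSK, toy trellis)

def app_A2(r, h_hat, N0, alphabet=(1, -1), L=2):
    """APP approximation (21): softmax of -Lambda_m(sigma_l)/N0 per last symbol."""
    Lam = np.zeros(L)
    apps = []
    for rm in r:
        new_Lam = np.full(L, np.inf)
        last_c = np.zeros(L)             # last symbol of each survivor only
        for S in range(L):
            for u in (0, 1):
                c = g(S, u)
                metric = Lam[S] + abs(rm - h_hat * c) ** 2   # (3) with (4)
                Sn = f(S, u)
                if metric < new_Lam[Sn]:
                    new_Lam[Sn], last_c[Sn] = metric, c
        # best metric among states whose survivor ends in each alphabet symbol
        best = np.array([min((new_Lam[l] for l in range(L) if last_c[l] == a),
                             default=np.inf) for a in alphabet])
        w = np.exp(-(best - best.min()) / N0)
        apps.append(w / w.sum())
        Lam = new_Lam
    return np.array(apps)

# Noiseless demo: observations on the symbols (+1, -1) yield near-hard APPs
apps = app_A2(np.array([1.0, -1.0]), h_hat=1.0, N0=0.5)
```

Unlike the full Viterbi decoder, no survivor sequences are stored here, which is the memory saving noted above.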

Whereas simplifications similar to (ii) and (iii) have also been applied to APP algorithms from the literature (e.g., max-log MAP), this is not the case for simplification (i). As the APP algorithms from the literature also make use of future observations, the APP of *c*_{m} requires updating each time future observations *r*_{m+1}, *r*_{m+2}, … become available, yielding a higher computational complexity and larger memory requirements. Hence, approximation (i) is crucial for obtaining a very simple APP computation.

## 5 Numerical results

We consider a trellis encoder consisting of an eight-state rate-1/2 (15,17)_{8} convolutional encoder with known initial and final states, followed by Gray mapping of the convolutional encoder output bits to 4-QAM symbols. Each frame contains *K*_{p} = 5 pilot symbols and *K* = 200 data symbols (including four termination symbols). We consider both an additive white Gaussian noise (AWGN) channel with *h* = 1 and a Rayleigh fading channel with $E\left[|h|^{2}\right]=1$. We investigate the performance of the estimator and the Viterbi decoder by means of Monte Carlo simulations, in terms of MSE and FER, respectively. The EM algorithm has essentially converged after only one iteration (*I* = 1); in that case, the complexity reduction obtained by computing the APPs using the new algorithm *A*2 instead of the BCJR algorithm is about a factor of 2. In the following, we consider the APP computation according to the BCJR algorithm, the SOMA (M = 4) version of the exact APP algorithm from [16], and the above *A*1 and *A*2 algorithms.

We first consider the MSE of the estimate of *h*. Ranking the APP algorithms according to the resulting MSE, we see that BCJR (which computes exact APPs) performs best, SOMA (which takes past, present, and future observations into account) is a very close second, and *A*2 (which ignores future observations) is only slightly worse than BCJR and SOMA; *A*1 (which uses only the present observation) is considerably worse than BCJR, SOMA, and *A*2 at low and medium SNRs.

Next, we consider the FER as a function of *E*_{b}/*N*_{0}, with *E*_{b} denoting the energy per information bit. Hence,

$$E_{\mathrm{b}} = \frac{(K_{\mathrm{p}} + K)\,E_{\mathrm{s}}}{N_{\mathrm{b}}\,(K - K_{\mathrm{ter}})} \qquad (22)$$

where *K*_{ter} is the number of termination symbols. As a benchmark, we use the FER of a reference system with (*K*, *K*_{p}) = (200, 0) where the channel coefficient *h* is known to the receiver. Compared to this reference system, the system with (*K*, *K*_{p}) = (200, 5) suffers from an irreducible power efficiency loss of $10\log_{10}\frac{205}{200}=0.11$ dB because of the presence of pilot symbols; the actual degradation will exceed 0.11 dB because of channel estimation errors. We observe the following:

- (a) The *A*2, SOMA, and BCJR algorithms yield essentially the same FER performance and require, for a given FER, about 0.11 dB more *E*_{b}/*N*_{0} than the reference system: for these algorithms, the channel estimation is sufficiently accurate, so the degradation is mainly determined by the power efficiency loss caused by the pilot symbol insertion.
- (b) The *A*1 algorithm performs worse than the *A*2, SOMA, and BCJR algorithms because ignoring the code constraints when computing the APPs yields less accurate channel estimates.
- (c) The FER performance is worst when only pilot symbols are used to estimate *h*.

Hence, from a computational complexity and memory requirement point of view, it is advantageous to compute the trellis-coded symbol APPs in the EM algorithm by means of the new algorithm *A*2 (which ignores future observations) rather than the considered APP algorithms from the literature (which also take future observations into account).

## 6 Conclusions

EM-based channel estimation in the presence of trellis-coded modulation requires the BCJR algorithm to efficiently compute the exact symbol APPs. As the computational complexity of the BCJR algorithm is about three times that of the Viterbi algorithm, which we use for decoding, we have proposed a new approximation of the APP computation that outperforms the main APP algorithms from the literature in terms of computational complexity and memory requirements. By means of computer simulations, we have shown that when using the new APP computation instead of the exact APPs from the BCJR algorithm, the resulting Viterbi decoder FER performance is essentially the same. This motivates the use of the new APP approximation in the context of EM channel estimation for trellis-coded modulation.

In this paper, we have assumed a channel gain *h* that is constant during a frame. For time-varying channels, a similar approximation of the symbol APPs can be derived. Denoting by *h*_{m} the channel gain associated with the data symbol *c*_{m}, the branch metric corresponding to ${c}_{m}=\tilde{c}$ becomes

$$\lambda_{m}(\tilde{c}) = \left|r_{m} - \widehat{h}_{m}\,\tilde{c}\right|^{2} \qquad (23)$$

which is obtained by replacing in (4) the estimate *ĥ* by $\widehat{h}_{m}$, the estimate of *h*_{m}. Collecting the channel gain estimates at the data symbol positions into the vector $\widehat{\mathbf{h}}$, the likelihood $p\left(\mathbf{r}\,\middle|\,\mathbf{c},\widehat{\mathbf{h}}\right)$ can be decomposed as

$$p\left(\mathbf{r}\,\middle|\,\mathbf{c},\widehat{\mathbf{h}}\right) = \prod_{m=1}^{K} p\left(r_{m}\,\middle|\,c_{m},\widehat{h}_{m}\right)$$

where $p\left(r_{m}\,\middle|\,c_{m},\widehat{h}_{m}\right)\propto \exp\left(-\lambda_{m}(c_{m})/N_{0}\right)$, with $\lambda_{m}(\tilde{c})$ given by (23). Hence, the symbol APP approximations for time-varying channels are simply obtained by replacing, in the approximations *A*1 and *A*2 for time-invariant channels, the quantity $\lambda_{m}(\tilde{c})$ from (4) by the right-hand side of (23). The resulting approximate symbol APPs can be used in, for instance, a MAP EM channel estimation algorithm that exploits the correlation between the time-varying channel gains [17].

## Declarations

### Acknowledgements

This research has been funded by the Interuniversity Attraction Poles Programme initiated by the Belgian Science Policy Office and is also supported by the FWO project G.0022.11 ‘Advanced multi-antenna systems, multi-band propagation models and MIMO/cooperative receiver algorithms for high data rate V2X communications’.

## References

1. Dempster A, Laird N, Rubin DB: Maximum likelihood from incomplete data via the EM algorithm. *J. Roy. Stat. Soc.* 1977, 39(1):1-38.
2. Choi J: *Adaptive and Iterative Signal Processing in Communications*. Cambridge University Press, New York; 2006.
3. Viterbi AJ: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. *IEEE Trans. Inform. Theor.* 1967, 13:260-269.
4. Bahl L, Cocke J, Jelinek F, Raviv J: Optimal decoding of linear codes for minimizing symbol error rate. *IEEE Trans. Inform. Theor.* 1974, IT-20(2):284-287.
5. Lin S, Costello D: *Error Control Coding*. Pearson Education Inc., Upper Saddle River; 2004.
6. Hagenauer J: A Viterbi algorithm with soft-decision output and its applications. *GLOBECOM* 1989, 3:1680-1686.
7. Mansour M, Bahai A: Complexity based design for iterative joint equalization and decoding. *Proc. IEEE VTC* 2002, 4:1699-1704.
8. Mehlan R, Meyr H: Soft output M-algorithm equalizer and trellis-coded modulation for mobile radio communication. *Proc. IEEE VTC* 1992, 2:586-591.
9. Papaharalabos S, Sweeney P, Evans BG, Mathiopoulos PT: Improved performance SOVA turbo decoder. *IEE Proc. Commun.* 2006, 153(5):586-590. doi:10.1049/ip-com:20050247
10. Hamad AA: Performance enhancement of SOVA based decoder in SCCC and PCCC schemes. *SciRes. Wireless Engineering and Technology* 2013, 4:40-45. doi:10.4236/wet.2013.41006
11. Motwani R, Souvignier T: Reduced-complexity soft-output Viterbi algorithm for channels characterized by dominant error events. In *IEEE Global Telecommunications Conference*, Miami, FL, USA; 6–10 Dec 2010:1-5.
12. Yue DW, Nguyen HH: Unified scaling factor approach for turbo decoding algorithms. *IET Commun.* 2010, 4(8):905-914. doi:10.1049/iet-com.2009.0125
13. Huang CX, Ghrayeb A: A simple remedy for the exaggerated extrinsic information produced by the SOVA algorithm. *IEEE Trans. Wireless Commun.* 2006, 5(5):996-1002.
14. Proakis JG: *Digital Communications*. McGraw-Hill, New York; 1995.
15. D'Andrea AN, Mengali U, Reggiannini R: The modified Cramer-Rao bound and its applications to synchronization problems. *IEEE Trans. Commun.* 1994, 42(2/3/4):1391-1399.
16. Lee L: Real-time minimum-bit-error probability decoding of convolutional codes. *IEEE Trans. Commun.* 1974, COM-22:146-151.
17. Aerts N, Moeneclaey M: SAGE-based estimation algorithms for time-varying channels in amplify-and-forward cooperative networks. In *International Symposium on Information Theory and its Applications (ISITA)*, Honolulu, HI, USA; 28–31 Oct 2012:672-676.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.