 Research
 Open Access
On the use of the channel second-order statistics in MMSE receivers for time- and frequency-selective MIMO transmission systems
EURASIP Journal on Wireless Communications and Networking, volume 2016, Article number 276 (2016)
Abstract
Equalization of unknown frequency- and time-selective multiple input multiple output (MIMO) channels is often carried out by means of decision feedback receivers. These consist of a channel estimator and a linear filter (for the estimation of the transmitted symbols), interconnected by a feedback loop through a symbol-wise threshold detector. The linear filter is often a minimum mean square error (MMSE) filter, and its mathematical expression involves second-order statistics (SOS) of the channel, which are usually ignored by simply assuming that the channel is a known (deterministic) parameter given by an estimate thereof. This appears to be suboptimal, and in this work we investigate the kind of performance gains that can be expected when the MMSE equalizer is obtained using the SOS of the channel process. As a result, we demonstrate that improvements of several dB in the signal-to-noise ratio needed to achieve a prescribed symbol error rate are possible.
Introduction
The main appeal of a multiple input multiple output (MIMO) wireless communication system stems from the fact that the channel capacity increases linearly with the minimum of the number of transmitting antennas and the number of receiving antennas [1]. Unfortunately, the complexity of optimal MIMO detectors (which minimize the probability of either symbol or sequence detection errors) grows exponentially with the number of input streams and, if the channel is frequency-selective, with its order [2]. Therefore, suboptimal equalization algorithms that avoid this computational burden are needed in order to take advantage, in a practical setup, of the increase in capacity that a MIMO channel can offer. Additionally, in most real-world scenarios, the channel is unknown and must be estimated prior to data detection. A decision feedback equalizer (DFE) type of receiver [3–7] is then an appealing choice due to its ease of implementation and the good tradeoff between computational complexity and performance that it achieves.
Figure 1 shows a simple DFE scheme. The main blocks are an adaptive channel estimation algorithm and a linear filter. The latter takes a channel estimate and the observations at the receiver front-end to produce linear estimates of the transmitted symbols. These estimates are either real or complex numbers, depending on the modulation format. A threshold detector converts them into hard symbol decisions, i.e., discrete estimates chosen from the symbol alphabet based on a minimum distance rule. We assume that the detector operates symbol-wise in order to keep the computational effort limited. The decisions at the output of the detector are fed back to the channel estimation block, so that they can be used to improve the subsequent channel estimates. Usually, the detected symbols are also employed to cancel intersymbol interference (see, e.g., [8] and Section 3 in this article), but this is omitted in the figure for simplicity. We remark that the receiver is nonlinear, due to the thresholding operation, yet its computational complexity is similar to that of a linear receiver combined with an adaptive channel estimation algorithm [9], ([7], Chapter 16).
Any linear filter can be used in a DFE structure. However, the minimum mean square error (MMSE) filter has become widespread because it offers an attractive tradeoff between noise amplification and interference cancelation [10]. Many joint data detection and channel estimation algorithms (see, e.g., [11–14]) rely on the linear MMSE filter to carry out the equalization of an unknown MIMO channel. In every case, and to the best of our knowledge, the point estimates of the channel impulse response (CIR) provided by the corresponding estimator are used by the MMSE filter as if they were the true CIR. However, the estimates are actually statistics of the true channel and, as such, have a mean and a covariance matrix, the latter measuring the uncertainty we have about their accuracy. When the Kalman filter (KF) [15, 16] is used to estimate the channel, both statistics, the mean and the covariance matrix of the channel, become available, but even then the usual approach (e.g., in [11, 17]) consists in taking the mean of the estimate as if it were the true channel and ignoring the information provided by the covariance matrix. In this paper, we argue that significant performance gains can be expected by taking advantage of the second-order statistics (SOS) of the channel, with a low impact on the computational complexity of the receiver. To be specific, we show that reductions of several dB in the signal-to-noise ratio (SNR) needed to attain a prescribed symbol error rate (SER) can be achieved using the proposed scheme. The main contributions of this work are the design and implementation of a new linear MMSE equalizer that exploits the second-order statistics of the channel, as well as extensive computer simulations showing the gains that can be expected from the proposed method as compared to the conventional one.
The remainder of this paper is organized as follows. In Section 2, the discrete-time baseband equivalent signal model of a MIMO transmission system with a frequency- and time-selective channel is described. The standard linear MMSE equalizer is briefly reviewed in Section 3. Our extension thereof using the channel SOS is introduced in Section 4. Section 5 outlines the DFE schemes resulting from the investigated equalizers. In Section 6, we show and discuss the results of extensive computer simulations comparing the performance of the conventional DFE MMSE receiver (which ignores the channel SOS) and the proposed DFE SOS-MMSE scheme. Finally, Section 7 is devoted to the conclusions.
Summary of notation
Given a time-indexed sequence of (column) vectors, x _{ a },x _{ a+1},⋯,x _{ b }, we denote by \(\mathbf{x}^{a}_{b}\) the column vector constructed by stacking, in order, all the vectors between x _{ a } and x _{ b } (including both), i.e.,
An identity matrix of order k is denoted by I _{ k }, whereas 0 _{ N,M } is an N×M all-zeros matrix. If N=M, we simply write 0 _{ N }. For a vector x, x(i) represents its ith element, and given a matrix A, a(i) refers to its ith column.
Signal model
Time and frequencyselective MIMO channel
We consider a MIMO communication system with N _{ t } transmitting antennas and N _{ r } receiving antennas separated by a time- and frequency-selective (MIMO) channel. The discrete-time baseband-equivalent model describing the transmission can be written as (see, e.g., [18])
where y _{ t } is an N _{ r }×1 vector containing the observations collected at time t, m is the number of taps (usually referred to as the order) of the frequency-selective MIMO channel, H _{ t }(i) is the (time-varying) N _{ r }×N _{ t } channel matrix associated with the ith tap, s _{ t } is a vector of size N _{ t }×1 comprising the symbols transmitted at time t, and g _{ t } is an N _{ r }×1 vector of independent additive white Gaussian noise (AWGN) components with zero mean and variance \(\sigma ^{2}_{g}\).
Grouping the matrices associated with the different taps of the channel, H _{ t }(i),i=0,…,m−1, in a single overall channel matrix,
of size N _{ r }×N _{ t } m, allows Eq. (1) to be written in a more compact form as
where \(\mathbf{s}^{t-m+1}_{t}\) is an N _{ t } m×1 vector that stacks all the symbol vectors involved in the tth observation,
While nonstandard, the notation in (4) shows explicitly that the vector \(\mathbf{s}^{t-m+1}_{t}\) is constructed by stacking simpler vectors in order and indicates the time indexes of the first (t−m+1 in this case) and last (t here) elements to be stacked. These features should ease the understanding of some formulas in the sequel.
The evolution of the channel is modeled by means of an autoregressive (AR) process driven by white Gaussian noise^{1} [11]. For the sake of generality, we consider an AR process of order R, whose analytical description is given by
where a _{ r },r=1,…,R, are the coefficients of the process and V _{ t } is an N _{ r }×N _{ t } m matrix with independent and identically distributed (i.i.d.) Gaussian random variables (r.v.’s) with zero mean and variance \(\sigma ^{2}_{v}\).
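As an illustration, the AR(R) channel evolution of Eq. (5), H _{ t }=∑ _{ r } a _{ r } H _{ t−r }+V _{ t }, can be simulated with a few lines of NumPy. This is a minimal sketch (the function name and the unit-variance initialization, borrowed from the simulation setup of Section 6, are our choices, not part of the original formulation):

```python
import numpy as np

def simulate_ar_channel(a, Nr, Nt, m, T, sigma_v, rng=None):
    """Simulate the AR(R) evolution of Eq. (5):
    H_t = sum_{r=1}^{R} a_r H_{t-r} + V_t, with V_t i.i.d. Gaussian.

    a       : list [a_1, ..., a_R] of AR coefficients
    sigma_v : standard deviation of the entries of V_t
    Returns the list of T generated Nr x (Nt*m) channel matrices.
    """
    rng = np.random.default_rng(rng)
    R = len(a)
    # initial channel matrices (unit-variance coefficients)
    H = [rng.standard_normal((Nr, Nt * m)) for _ in range(R)]
    out = []
    for _ in range(T):
        V = sigma_v * rng.standard_normal((Nr, Nt * m))
        # H[-1-r] holds H_{t-1-r}, so this implements sum_r a_r H_{t-r}
        H_new = sum(a[r] * H[-1 - r] for r in range(R)) + V
        H.append(H_new)
        out.append(H_new)
    return out
```

With a _{1} close to 1 and a small σ _{ v }, this reproduces the slowly fading behavior considered in the simulations.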
Equations (3) and (5) can be seen, respectively, as the observation and state equations of a random dynamic system in statespace form. Since both equations are linear and the corresponding noise processes are Gaussian, the Kalman filter (KF) can be applied to exactly compute the posterior probability distribution of the timevarying MIMO channel when the symbols are available.
In order to do so while using the standard KF equations, we first need to gather the whole state of the system (here, the channel at the last R time instants) in a single vector and rewrite the state and observation equations in terms of it. Matrix H _{ t } can be represented as a vector in a straightforward manner by, e.g., stacking all its columns one upon another. In particular, if we let h _{ t }(j) denote the jth column of matrix H _{ t }, then the N _{ r } N _{ t } m×1 vector
contains the same coefficients as matrix H _{ t }. Using this vectorial notation and taking into account that, according to Eq. (5), the channel at a certain time instant depends on the channel at the R previous time instants, the state of the system at time t can be represented by the vector
The state equation of the system can then be written in terms of this augmented channel vector as
where v _{ t } is an N _{ r } N _{ t } mR×1 vector with i.i.d. Gaussian r.v.’s of zero mean and variance \(\sigma ^{2}_{v}\) in the last N _{ r } N _{ t } m positions and zeros in the rest, and the state transition matrix, Q, is defined as
with \(\boldsymbol {0}_{N_{r}N_{t}m}\) denoting an N _{ r } N _{ t } m×N _{ r } N _{ t } m all-zeros matrix, and \(\mathbf {I}_{N_{r}N_{t}m}\) an identity matrix of order N _{ r } N _{ t } m.
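One plausible construction of the transition matrix Q, assuming (as Eq. (8) and the placement of the noise suggest) that the augmented state stacks h _{ t−R+1} first and h _{ t } last, is the block companion form sketched below (function and variable names are ours):

```python
import numpy as np

def build_transition_matrix(a, n):
    """Block companion-form transition matrix for the AR(R) channel;
    n = Nr*Nt*m is the size of one channel block.  The rows above the
    last block row shift the stacked state one block up; the last block
    row applies the AR coefficients of Eq. (5)."""
    R = len(a)
    Q = np.zeros((R * n, R * n))
    I = np.eye(n)
    # shift blocks: new block i is old block i+1 (drop the oldest state)
    for i in range(R - 1):
        Q[i * n:(i + 1) * n, (i + 1) * n:(i + 2) * n] = I
    # AR row: h_{t+1} = a_1 h_t + a_2 h_{t-1} + ... + a_R h_{t-R+1},
    # where block j of the old state holds h_{t-R+1+j}
    for r in range(1, R + 1):
        col = (R - r) * n
        Q[(R - 1) * n:, col:col + n] += a[r - 1] * I
    return Q
```

For R=1 this degenerates to Q=a _{1} I, consistent with the scalar AR(1) model used later in the simulations.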
The observation (Eq. (3)) can also be easily rewritten in terms of the augmented channel vector, \(\mathbf{h}^{t-R+1}_{t}\), as
where
We use the dynamic system in statespace form specified by Eqs. (10) and (8) (which is equivalent to that given by Eqs. (3) and (5)) to track the unknown timevarying MIMO channel by means of a KF.
Stacked model
When a channel is time-dispersive, reliable detection of the transmitted symbols usually requires smoothing. This entails taking into account the observations y _{ t },⋯,y _{ t+d } (the parameter d≥1 being the smoothing lag) in order to detect the vector s _{ t } containing the symbols transmitted at time t. In such a case, it is useful to consider an equation that relates a tall vector of stacked observations with the transmitted symbols, namely,
where \(\mathbf{y}^{t}_{t+d} = \left[ \mathbf{y}_{t}^{\top},\mathbf{y}_{t+1}^{\top},\cdots,\mathbf{y}_{t+d}^{\top} \right]^{\top}\), and the N _{ r }(d+1)×N _{ t }(m+d) composite channel matrix is defined as
Equation (12) involves the symbol vectors s _{ t−m+1},⋯,s _{ t−1}, which, at time t, have already been detected. It is convenient to identify their contribution to the stacked observations vector, \(\mathbf{y}^{t}_{t+d}\). Let us decompose the overall channel matrix \(\overline {\mathbf {H}}_{t,d}\) as
where the submatrices \(\mathbf {H}^{\ddag }_{t,d}\) and H _{ t,d } encompass, respectively, the first N _{ t }(m−1) and last N _{ t }(d+1) columns of \(\overline {\mathbf {H}}_{t,d}\). Then, the vector of stacked observations can be rewritten as
where the term \(\mathbf{H}^{\ddag}_{t,d} \mathbf{s}^{t-m+1}_{t-1}\) contains the contribution of the symbols transmitted up to time t−1, and can be treated as causal intersymbol interference.
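The composite matrix \(\overline{\mathbf{H}}_{t,d}\) and its split into a causal part \(\mathbf{H}^{\ddag}_{t,d}\) and a current part H _{ t,d } can be sketched as follows, assuming the usual banded block structure implied by Eq. (1), i.e., that each observation y _{ t+k } mixes the taps H _{ t+k }(i) with the symbol vectors s _{ t+k−i } (names are ours):

```python
import numpy as np

def composite_channel(H_taps, Nt, d):
    """Build the Nr(d+1) x Nt(m+d) composite matrix of Eq. (13) and
    split it as in Eq. (14).

    H_taps[k][i] is the Nr x Nt matrix H_{t+k}(i), k = 0..d, i = 0..m-1,
    so that y_{t+k} = sum_i H_{t+k}(i) s_{t+k-i} + g_{t+k}.
    """
    m = len(H_taps[0])
    Nr = H_taps[0][0].shape[0]
    Hbar = np.zeros((Nr * (d + 1), Nt * (m + d)))
    for k in range(d + 1):
        for i in range(m):
            j = k + m - 1 - i          # column block multiplying s_{t+k-i}
            Hbar[k*Nr:(k+1)*Nr, j*Nt:(j+1)*Nt] = H_taps[k][i]
    # first Nt(m-1) columns multiply the already-detected symbols
    H_causal, H_current = Hbar[:, :Nt*(m-1)], Hbar[:, Nt*(m-1):]
    return Hbar, H_causal, H_current
```

The product of `H_causal` with the stacked past symbols then gives the causal intersymbol interference term of Eq. (15).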
Kalman filtering
The KF [15] provides the optimal solution to the problem of estimating the state of a dynamic system in statespace form when its state and observation equations are linear and their corresponding noises are Gaussian.
In the problem at hand, the state of the system at time t is given by the augmented channel vector, \(\mathbf{h}^{t-R+1}_{t}\), and Eqs. (8) and (10) can be seen as, respectively, the state and observation equations of a dynamic system in state-space form. Since the above constraints of linearity and Gaussianity are met, the KF can be used to compute the probability density function of the state conditional on the available observations, \(p(\mathbf{h}^{t-R+1}_{t} \mid \mathbf{y}_{0},\cdots,\mathbf{y}_{t})\). However, the observation equation involves knowing, at time t, the matrix S _{ t }, which includes all the symbols transmitted between time instants t−m+1 and t. In practice, only estimates of the symbols transmitted up to time t−1 are available at time t, and hence we aim at the (predictive) distribution of the state conditional on all the past observations and previously detected symbols, i.e., \(p(\mathbf{h}^{t-R+1}_{t} \mid \mathbf{y}_{0},\cdots,\mathbf{y}_{t-1},\tilde{\mathbf{s}}_{0},\cdots,\tilde{\mathbf{s}}_{t-1})\), with \(\tilde {\mathbf {s}}_{t}\) denoting the vector containing the hard estimates of the symbols in s _{ t }. Every expectation in the remainder of the paper is also (implicitly) conditional on the same information, and we denote it as \({\mathbb E}_{t-1}[\cdot ]\). For example, the posterior mean of the CIR at time t+k (k≥0) conditional on y _{0},⋯,y _{ t−1} and \(\tilde {\mathbf {s}}_{0},\cdots,\tilde {\mathbf {s}}_{t-1}\) is written as \({\mathbb E}_{t-1}[\mathbf {h}_{t+k}]\), and the posterior cross-covariance between \(\mathbf{h}_{t+k}\) and \(\mathbf{h}_{t+k'}\), k,k ^{′}≥0, is denoted \({\mathbb E}_{t-1}[\mathbf {h}_{t+k} \mathbf {h}_{t+k'}^{H}]\).
Notice that here the KF only yields an approximate solution, insofar as it depends on the correctness of the previously detected symbols fed to it.
Linear MMSE smoothing
In order to detect the symbols transmitted at time t over a frequency-selective channel, it is usually a good approach to first remove the contribution of the already detected symbols from the observations vector [8]. In view of Eq. (15), we can obtain causal-interference-free observations as
Computing \(\mathbf{z}^{t}_{t+d}\) from \(\mathbf{y}^{t}_{t+d}\) entails knowing the vector \(\mathbf{s}^{t-m+1}_{t-1}\), which encompasses the symbol vectors s _{ t−m+1},⋯,s _{ t−1}. These are unknown, but previous estimates thereof are available at time t and can be used as a surrogate. Hence, in practice, the stacked symbols vector \(\mathbf{s}^{t-m+1}_{t-1}\) in Eq. (16) is replaced with the vector \(\tilde{\mathbf{s}}^{t-m+1}_{t-1}\), which contains hard estimates of the same symbols. This is a common approximation in the design of DFEs, and it usually makes sense under the assumption that the receiver is operating with a sufficiently low symbol error probability. Throughout the paper, we rely on this approximation, which amounts to taking the previously detected symbols as if they were the truly transmitted symbols, i.e.,
Assuming the causal interference is properly canceled, the linear MMSE estimate of the symbols transmitted at time t considering the observations up to time t+d can be easily derived from Eq. (17) (see, e.g., [19]). In particular, let the N _{ r }(d+1)×N _{ t }(d+1) matrix F _{ t } represent the response of a linear system. Then, estimates of the transmitted symbols are computed as ^{2}
and, in order to minimize the mean square error of these estimates, the response matrix can be computed by solving the optimization problem
Since the ultimate aim is to estimate s _{ t } but we are using observations up to time t+d, we refer to the linear system whose response is given by F _{ t } in (20) as an MMSE smoother.
Equation (20) poses a quadratic optimization problem and it is straightforward to obtain the closedform solution (see, e.g., [2])
Again, if we assume that the causal intersymbol interference has been completely removed from the observations, so that these are given by Eq. (17), then the expectations on the right-hand side of Eq. (21) can be shown to be
where \(\sigma ^{2}_{s}\) denotes the variance of the symbols and it has been taken into account that the noise at time t is white and independent of the channel process and the symbols transmitted up to t−1, and that the channel and the symbols are a priori independent.
The expectation on the right-hand side of Eq. (23) is usually approximated, to the best of our knowledge, by treating the channel matrix H _{ t,d } as if it were a known (deterministic) parameter, and hence
Substituting (24) into Eq. (23) yields
and combining Eqs. (22) and (25) in Eq. (21) yields the final expression for the response of the conventional linear MMSE smoother,
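The resulting conventional smoother can be sketched compactly. The snippet below applies the standard MMSE solution with the channel estimate \(\mathbb{E}_{t-1}[\mathbf{H}_{t,d}]\) plugged in as if it were the true channel, in the spirit of Eqs. (26) and (28); the function name and the explicit estimator form \(\hat{\mathbf{s}} = \sigma_s^2 \mathbf{H}^H(\sigma_s^2 \mathbf{H}\mathbf{H}^H + \sigma_g^2 \mathbf{I})^{-1}\mathbf{z}\) are our rendering, not a verbatim transcription of the paper's equations:

```python
import numpy as np

def mmse_smoother_estimate(z, H_hat, sigma_s2, sigma_g2):
    """Conventional linear MMSE smoother: treat the channel estimate
    H_hat (standing in for E_{t-1}[H_{t,d}]) as the true composite
    channel and estimate the symbols from the (approximately)
    causal-interference-free observations z of Eq. (17):

        s_hat = sigma_s^2 H^H (sigma_s^2 H H^H + sigma_g^2 I)^{-1} z
    """
    n_obs = H_hat.shape[0]
    R_zz = sigma_s2 * H_hat @ H_hat.conj().T + sigma_g2 * np.eye(n_obs)
    # solve instead of explicit inversion for numerical stability
    return sigma_s2 * H_hat.conj().T @ np.linalg.solve(R_zz, z)
```

Note that only the mean of the channel enters this computation; the covariance information produced by the KF is discarded, which is precisely the shortcoming addressed in Section 4.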
So far, we have assumed the channel is a known (deterministic) parameter. However, this is not usually the case in practice, and the common approach to tackle this problem consists in replacing, whenever necessary, the (true) channel matrix with its expectation. Notice that here this entails a twofold approximation. On the one hand, even when assuming that the symbols up to time t−1 have been detected exactly, at best one can only obtain approximate causal-interference-free observations as
On the other hand, taking the true channel matrix in Eq. (26) to be equal to its expectation results in the following approximation for the linear MMSE filter
In view of Eqs. (13) and (14), in order to obtain \(\mathbb {E}_{t-1}\left [\mathbf {H}^{\ddag }_{t,d}\right ]\) and \(\mathbb {E}_{t-1}\left [\mathbf {H}_{t,d}^{H}\right ]\) on the right-hand side of Eqs. (27) and (28), respectively, we need the expectations of the matrices H _{ t },H _{ t+1},…,H _{ t+d }. At time t, the expectation of the channel matrix H _{ t } is given by the predictive distribution of the KF, which takes into account the observations and symbol vectors up to time t−1. However, the expectations of the matrices H _{ t+1},…,H _{ t+d } have to be computed as well. In order to do so, we simply use Eq. (5) (the state equation of the system) to expand the expected channel matrix at time t+k, i.e.,
MMSE smoothing using the channel SOS
The proposed MMSE detector treats the channel as an unknown (multidimensional) random variable (as opposed to a deterministic known parameter) and takes advantage of its second-order statistics rather than just its expectation. Additionally, it avoids performing explicit interference cancelation, since this cannot be done exactly. To that end, it aims to detect the transmitted symbols by solving the optimization problem
which is exactly the same as that posed by Eq. (20) with \(\mathbf{z}^{t}_{t+d}\) replaced by \(\mathbf{y}^{t}_{t+d}\). Hence, the solution is
Through straightforward algebraic manipulation, one can show
where we have used that the vector \(\mathbf{s}^{t-m+1}_{t-1}\) is known because the expectations are conditional on the previously detected symbols and we are assuming these match the truly transmitted ones (see Eq. (18) and the surrounding discussion).
The observation autocorrelation matrix
From Eq. (33), computing \(\mathbb{E}_{t-1}\left[ \mathbf{y}^{t}_{t+d} (\mathbf{y}^{t}_{t+d})^{H} \right]\) actually amounts to the calculation of \(\mathbb{E}_{t-1}\left[ \mathbf{H}_{t,d} \mathbf{s}^{t}_{t+d} (\mathbf{s}^{t}_{t+d})^{H} \mathbf{H}_{t,d}^{H}\right]\) and \(\mathbb{E}_{t-1}\left[ \mathbf{H}^{\ddag}_{t,d} \mathbf{s}^{t-m+1}_{t-1} (\mathbf{s}^{t-m+1}_{t-1})^{H} \mathbf{H}^{\ddag\,H}_{t,d}\right]\). Regarding the first expectation, if we let h _{ t,d }(j) denote the jth column of matrix H _{ t,d } and \(s^{t}_{t+d}(j)\) denote the jth element of the vector \(\mathbf{s}^{t}_{t+d}\), then the expectation \(\mathbb{E}_{t-1}\left[\mathbf{H}_{t,d}\mathbf{s}^{t}_{t+d}(\mathbf{s}^{t}_{t+d})^{H}\mathbf{H}_{t,d}^{H} \right]\) can be rewritten as
where the third equality follows because the symbols from time t onwards are a priori independent of the channel at time t and subsequent time instants, while the fourth equality holds because of the (also a priori) independence between different symbols,
Similarly, if \(\mathbf {h}^{\ddag }_{t,d}(i)\) refers to the ith column of matrix \(\mathbf {H}^{\ddag }_{t,d}\), and \(s^{t-m+1}_{t-1}(i)\) to the ith symbol within the vector \(\mathbf{s}^{t-m+1}_{t-1}\) (assumed known), we have
where, once again, we have used that the expectation is conditional on all the previously detected symbols and hence, assuming these were exactly detected, the vector \(\mathbf{s}^{t-m+1}_{t-1}\) is known.
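The first of the two expectations above reduces, for zero-mean i.i.d. symbols, to a sum over the columns of the channel matrix, since the symbol cross terms vanish. This can be sketched as follows (the function is ours; it takes the per-column means and covariances that the KF machinery of the next subsection provides):

```python
import numpy as np

def data_autocorrelation_term(col_means, col_covs, sigma_s2):
    """E_{t-1}[H s s^H H^H] for zero-mean i.i.d. symbols of variance
    sigma_s2: the cross terms between different symbols vanish and the
    expectation reduces to

        sigma_s2 * sum_j E[h(j) h(j)^H],

    with E[h(j) h(j)^H] = mu_j mu_j^H + Sigma_j built from the mean and
    (cross-)covariance of each column (cf. Eqs. (34) and (41)).

    col_means: list of n x 1 column-mean vectors mu_j
    col_covs : list of n x n covariance matrices Sigma_j
    """
    return sigma_s2 * sum(mu @ mu.conj().T + S
                          for mu, S in zip(col_means, col_covs))
```

The second expectation (the causal term) is handled analogously, except that the known past symbols weight the column cross-correlations instead of σ _{ s }^{2}.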
Channel crosscorrelation matrices
Equations (34) and (36) involve computing the cross-correlation between different columns of matrices H _{ t,d } and \(\mathbf {H}^{\ddag }_{t,d}\), respectively. These are submatrices of \(\overline {\mathbf {H}}_{t,d}\) (see Eq. (14)), and hence their columns are ultimately columns of \(\overline {\mathbf {H}}_{t,d}\). In particular, if we let \(\overline {\mathbf {h}}_{t,d}(j)\) denote the jth column of matrix \(\overline {\mathbf {H}}_{t,d}\), then
and every required crosscorrelation is ultimately between columns of \(\overline {\mathbf {H}}_{t,d}\). The structure of a column from the latter can be inferred from Eq. (13). Specifically, the jth column of \(\overline {\mathbf {H}}_{t,d}\) is given by
where
is an N _{ r }×1 column vector.
We compute the cross-correlation between any pair of columns of \(\overline {\mathbf {H}}_{t,d}\), by way of their means and cross-covariance matrix, as
where \(\Sigma _{\overline {\mathbf {h}}_{t,d}(i), \overline {\mathbf {h}}_{t,d}(j)}\) stands for the cross-covariance matrix between the N _{ r }(d+1)×1 (column) vectors \(\overline {\mathbf {h}}_{t,d}(i)\) and \(\overline {\mathbf {h}}_{t,d}(j)\).
The expectation of every column, \(\overline {\mathbf {h}}_{t,d}(i), i=1,\ldots,N_{t}(m + d)\), of matrix \(\overline {\mathbf {H}}_{t,d}\) is readily available from the expectation of the entire matrix, which can be obtained in a straightforward manner as explained at the end of Section 3. As for the cross-covariance \(\Sigma _{\overline {\mathbf {h}}_{t,d}(i), \overline {\mathbf {h}}_{t,d}(j)}\), it is given, in view of Eq. (39), by
where \(\Sigma _{\breve {\mathbf {h}}_{t+k}(i),\breve {\mathbf {h}}_{t+l}(j)}, k,l=0,\ldots,d,\) is the cross-covariance between the vectors \(\breve {\mathbf {h}}_{t+k}(i)\) and \(\breve {\mathbf {h}}_{t+l}(j)\). In particular, the entry at the (u,w) position of \(\Sigma _{\breve {\mathbf {h}}_{t+k}(i),\breve {\mathbf {h}}_{t+l}(j)}\) is the covariance between the uth element of \(\breve {\mathbf {h}}_{t+k}(i)\) and the wth element of \(\breve {\mathbf {h}}_{t+l}(j)\). Notice that when both vectors in the subscript are the same, this yields the (self-)covariance matrix \(\Sigma _{\breve {\mathbf {h}}_{t+k}(i),\breve {\mathbf {h}}_{t+k}(i)} = \Sigma _{\breve {\mathbf {h}}_{t+k}(i)}\).
Recall from Eq. (40) that \(\breve {\mathbf {h}}_{t+k}(i)\), for k=0,⋯,d and i=1,⋯,N _{ t }(m+d), is either an all-zeros (column) vector or a column of matrix H _{ t+k }. Thus, when computing the cross-covariance between the vectors \(\breve {\mathbf {h}}_{t+k}(i)\) and \(\breve {\mathbf {h}}_{t+l}(j)\), if one of the vectors (or both) is all-zeros, then \(\Sigma _{\breve {\mathbf {h}}_{t+k}(i),\breve {\mathbf {h}}_{t+l}(j)} = \boldsymbol {0}_{N_{r}}\). Conversely, if both vectors are different from the all-zeros column vector, then \(\breve {\mathbf {h}}_{t+k}(i)\) must be a column of matrix H _{ t+k }, and \(\breve {\mathbf {h}}_{t+l}(j)\) one of matrix H _{ t+l }. In that case, let us assume \(\breve {\mathbf {h}}_{t+k}(i) = \mathbf {h}_{t+k}(n)\) and \(\breve {\mathbf {h}}_{t+l}(j) = \mathbf {h}_{t+l}(q)\) for some n,q∈{1,⋯,N _{ t } m}, so that \(\Sigma _{\breve {\mathbf {h}}_{t+k}(i),\breve {\mathbf {h}}_{t+l}(j)} = \Sigma _{\mathbf {h}_{t+k}(n),\mathbf {h}_{t+l}(q)}\).
In order to compute \(\Sigma _{\mathbf {h}_{t+k}(n),\mathbf {h}_{t+l}(q)}\) for k,l∈{0,…,d} and n,q∈{1,⋯,N _{ t } m}, we consider two different cases. On the one hand, if k=l, then \(\Sigma _{\mathbf {h}_{t+k}(n),\mathbf {h}_{t+k}(q)}\) is ultimately the cross-covariance between columns n and q of matrix H _{ t+k } and can be obtained from the KF. Indeed, we can make the KF evolve from time t up to time t+k when no new information is available by taking k predictive steps. This yields predictive statistics for h _{ t+k } (a vectorial representation of H _{ t+k }), and from its covariance matrix it is straightforward to obtain the cross-covariance matrix \(\Sigma _{\mathbf {h}_{t+k}(n),\mathbf {h}_{t+k}(q)}\) between any pair of columns h _{ t+k }(n) and h _{ t+k }(q) of H _{ t+k }. On the other hand, if k≠l, then
and from Eqs. (43) and (44), it can be shown (see the Appendix for details) that, for k<l,
which allows for the recursive computation of the cross-covariance between any two (different) columns of the matrices H _{ t },H _{ t+1},…,H _{ t+d }. Notice that when k>l, we can still use the above formula since
It is interesting to note that, when R=1, Eq. (45) becomes
Since a KF estimating the augmented channel vector defined in Eq. (7) yields, at time t, the cross-covariance matrices \(\Sigma _{\mathbf {h}_{k}(i),\mathbf {h}_{l}(j)}\) with k,l=t−R+1,…,t and i,j=1,…,N _{ t } m, Eq. (45) allows the recursive computation of all the cross-covariance matrices needed to obtain, according to Eq. (42), the cross-covariance matrix \(\Sigma _{\overline {\mathbf {h}}_{t,d}(i),\overline {\mathbf {h}}_{t,d}(j)}\) for any pair of columns of \(\overline {\mathbf {H}}_{t,d}\). To summarize:

– Equations (45), (42), and (41) together yield the channel cross-correlation matrices \(\mathbb {E}_{t-1}\left [\overline {\mathbf {h}}_{t,d}(i)\overline {\mathbf {h}}_{t,d}^{H}(j)\right ]\) in closed form;

– Equations (41), (34), (37), (36), (38), and (33) yield the observation autocorrelation matrix \(\mathbb {E}_{t-1}\left [ \mathbf {y}^{t}_{t+d} (\mathbf {y}^{t}_{t+d})^{H} \right ]\).
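For the AR(1) case (R=1) the machinery above becomes particularly simple: the predictive covariance of the vectorized channel propagates as \(\Sigma_{t+k} = a_1^2\,\Sigma_{t+k-1} + \sigma_v^2 \mathbf{I}\), and cross-covariances between different lags are scaled copies of the covariance at the earlier lag, since the later channel equals a power of a _{1} times the earlier one plus independent noise. A minimal sketch under these AR(1) assumptions (real a _{1}; the function name is ours):

```python
import numpy as np

def predictive_cross_covariances(Sigma_t, a1, sigma_v2, d):
    """For an AR(1) channel, propagate the KF posterior covariance
    Sigma_t of the vectorised channel h_t through the prediction
    recursion Sigma_{t+k} = a1^2 Sigma_{t+k-1} + sigma_v^2 I, and
    collect the cross-covariances Sigma_{h_{t+k}, h_{t+l}}:

        Sigma_{h_{t+k}, h_{t+l}} = a1^(l-k) Sigma_{t+k},   k <= l,

    with the k > l case obtained by Hermitian transposition."""
    n = Sigma_t.shape[0]
    Sigma = {0: Sigma_t}
    for k in range(1, d + 1):
        Sigma[k] = a1**2 * Sigma[k - 1] + sigma_v2 * np.eye(n)
    cross = {}
    for k in range(d + 1):
        for l in range(k, d + 1):
            cross[(k, l)] = a1**(l - k) * Sigma[k]
            cross[(l, k)] = cross[(k, l)].conj().T
    return cross
```

The column cross-covariances \(\Sigma_{\mathbf{h}_{t+k}(n),\mathbf{h}_{t+l}(q)}\) required by Eq. (42) are then simply the corresponding sub-blocks of these matrices.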
The SOS-MMSE smoother
Having computed the stacked observations autocorrelation matrix given by Eq. (33), it is straightforward to plug it, along with the right-hand side of Eq. (32), into Eq. (31) to obtain the final expression for the proposed MMSE smoother that exploits the channel SOS explicitly,
Notice that, in view of the right-hand side of Eq. (47), the proposed MMSE detector takes advantage of the previously detected symbols. Hence, causal-interference cancelation is still being performed, although in an implicit manner (as opposed to the explicit causal-interference cancelation carried out by the conventional MMSE detector).
In order to ease the implementation of the proposed scheme, Pseudocode ?? gives an overview of the steps needed to obtain the MMSE smoother at time t when the order of the AR process used to model the channel dynamics is R=1. The extension to higher-order AR processes is straightforward (the procedure is essentially the same), though some care is needed to assemble the involved cross-covariance matrices in the proper order.
Assuming the smoothing lag, d, is approximately equal to the memory of the channel^{3}, m, the complexity of the proposed receiver when R=1 is \(\mathcal {O}({N_{r}^{3}N_{t}^{3}m^{4}})\). This is ultimately the complexity of running m KFs, each one estimating a state vector of length N _{ r } N _{ t } m (see lines ??–?? of Pseudocode ??). Therefore, the computational complexities of the SOS-MMSE smoother and the conventional MMSE smoother of Section 3 are of the same order.
DFE schemes
The two MMSE smoothers described in Sections 3 and 4 can be readily used in a DFE scheme that relies on a Kalman filter for channel tracking. In particular, we aim at comparing the performance of

– a conventional MMSE DFE that neglects the SOS generated by the KF, termed “MMSE + KF” in the sequel, and

– the proposed SOS-based MMSE DFE, termed “SOS-MMSE + KF”.
Figure 2 illustrates schematically the fundamental differences between the MMSE + KF and SOS-MMSE + KF receivers. The KF yields the mean and the covariance matrix of the channel impulse response, and both receivers make use of the former (along with the observations) to obtain estimates of the transmitted symbols. However, the proposed SOS-MMSE filter also takes advantage of the covariance matrix, whereas the conventional MMSE neglects the information contained within this statistic. Also notice that the proposed receiver does not perform explicit interference cancelation (since this cannot be carried out exactly), as opposed to the conventional MMSE.
Simulation results
In order to assess the performance of the proposed algorithm, we have carried out computer simulations considering a system with N _{ t }=4 transmitting antennas and N _{ r }=7 receiving antennas. The modulation format is BPSK, and transmission is carried out in frames of K=300 symbol vectors (i.e., 1200 binary symbols overall), including a training sequence of length T=30 comprising symbols known to the receiver. This last parameter has been selected empirically, after observing that increasing it does not yield any noticeable performance improvement, while decreasing it has a negative impact. The training sequence is used at the beginning of each data frame to obtain a rough estimate of the channel impulse response. However, extending the method to use pilot symbols instead of, or in addition to, a training preamble is straightforward.
A flat power profile is assumed for the channel, and every coefficient is initially (and independently) drawn from a Gaussian distribution with zero mean and unit variance. As for the channel model, an AR process of order 1 has been considered, i.e., R=1. The coefficient of the AR process is a _{1}=1−10^{−5}, and we evaluate the performance of the MMSE + KF and SOSMMSE + KF receivers in terms of the symbol error rate (SER) considering two different values for the variance of the channel noise, \(\sigma ^{2}_{v}\), each one studied in a section of its own^{4}. Furthermore, different values for the channel order are explored in every case.
In all the simulations, each data frame is generated independently of all others (including the transmitted data, the MIMO channel realization, and the noise terms), and the lag for the MMSE smoothers is set to d=m−1. The latter condition guarantees that every symbol is detected using all the related observations, and values of the smoothing lag above m−1 do not seem to yield a noticeable performance gain. The results are averaged over 60,000 data frames.
Slow fading channel (\(\sigma ^{2}_{v} = 5 \times 10^{-3}\))
In this scenario, we consider a value of \(\sigma ^{2}_{v} = 0.005\) for the variance of the noise in the AR process governing the evolution of the channel (see Eq. (5)).
Figure 3 compares the SER achieved by the MMSE + KF and SOS-MMSE + KF algorithms for different values of the SNR when the channel is flat (m=1). In order to reach a SER of 10^{−2}, the method using the proposed MMSE DFE requires roughly 0.4 dB less SNR than the method using the conventional MMSE DFE. This gap widens considerably as the SNR increases: for a SER of 5×10^{−3}, the curve for the MMSE + KF is more than 1 dB away from that of the SOS-MMSE + KF. Notice that both methods exhibit an error floor, but the one associated with the DFE scheme introduced in this paper is lower.
When the channel is flat, only the present state of the channel is needed (along with the observations) in order to detect the transmitted symbols. On the other hand, when the channel is dispersive, i.e., m>1, not only does the number of channel coefficients to be estimated increase, but predictions of future channel states are also necessary in order to perform both causal-interference cancelation and smoothing. Overall, this results in less reliable channel estimates being employed, and hence accounting for their uncertainty becomes important. This is illustrated in Fig. 4, which shows the performance of the algorithms when m=3. Compared to the results obtained for m=1, the SER achieved by the MMSE + KF degrades over the whole range of SNRs on account of this algorithm neglecting the channel SOS. On the other hand, the SOS-MMSE + KF successfully copes with the uncertainty in the channel estimates and even sees a performance boost due to the increase in diversity given by a higher channel order. As a consequence, the proposed receiver now exhibits a clearer advantage over the conventional one. Indeed, in order to attain a SER of 10^{−2}, the latter requires around 2.4 dB more SNR than the former. Figure 5 shows that when the channel order is m=5, the gap between the curves of the SOS-MMSE + KF and MMSE + KF algorithms widens even more. Moreover, the performance of the SOS-based MMSE DFE improves slightly, whereas that of the conventional MMSE DFE further deteriorates.
Fast-fading channel (\(\sigma ^{2}_{v} = 0.01\))
Increasing the variance of the channel noise has a twofold effect. On the one hand, since the channel now changes more rapidly and is therefore harder to track, the performance of any receiver is expected to worsen. On the other hand, a faster-evolving channel means that predictions about its future state are less reliable, and hence accounting for their uncertainty becomes even more important (which should benefit the SOS-based MMSE DFE). From a mathematical point of view, if the predicted channel estimates are not accurate, then the elements of the covariance matrices that enter Eq. (41) are non-negligible, and so is their contribution to the computation of the proposed MMSE DFE in Eq. (47).
Figure 6 shows the performance of the algorithms in a flat channel (m=1). The SER of both DFEs degrades in the medium-high SNR region, as compared to the previous scenario, but this penalty is larger in the case of the MMSE + KF. Thus, the proposed MMSE DFE now exhibits a more pronounced advantage over the conventional one (about 0.65 dB for a SER of 10^{−2} and more than 3 dB for a SER of 5×10^{−3}).
Increasing the channel order in a fast-fading channel has a negligible effect on the performance of the SOS-MMSE + KF^{5} but seriously harms that of the MMSE + KF. In Fig. 7, it can be seen that, when the channel order is m=3, reaching a SER of 10^{−2} requires ≈8.3 dB less SNR with the former than with the latter (recall that in the previous scenario this gap was approximately 2.4 dB).
Results for the case in which m=5 are shown in Fig. 8. Again, the advantage of the proposed method over the conventional one becomes much clearer as the channel order increases.
Conclusions
In this work, we have introduced an enhanced version of the conventional MMSE equalizer for time-selective MIMO channels that takes advantage of the posterior second-order statistics of the channel provided by the KF. Computer simulations show that the proposed SOS-MMSE DFE yields significant performance gains (in terms of SER) over the conventional MMSE DFE in the medium-high SNR region. In highly dispersive channels, the SNR required for the proposed SOS-MMSE receiver to achieve a given SER can be several dB lower than that required by the conventional MMSE receiver. This is especially true for fast-varying channels, in which the uncertainty about the channel estimates becomes important. Indeed, a measure of this uncertainty is given by the second-order statistics of the channel, which are dismissed by the conventional MMSE equalizer but exploited by the one introduced in this paper.
It is important to point out that the application of the SOS-aided MMSE filter introduced in this work is not restricted to DFE receivers. On the contrary, the key idea is very general and can be integrated into any MMSE-based scheme as long as SOS of an unknown random variable that is relevant to the filter (here, the channel) are available.
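As a concrete illustration of this idea, consider estimating s from y = Hs + n when only the posterior mean and per-column covariances of H are available. The following is a minimal real-valued sketch, not the receiver of the paper: the dimensions, statistics, and variable names are ours, the symbols are assumed zero-mean, unit-variance, and mutually independent, and the channel columns are assumed independent.

```python
import numpy as np

rng = np.random.default_rng(0)
Nr, Nt, sigma2 = 4, 2, 0.1  # receive/transmit dimensions and noise variance (toy values)

# Posterior channel statistics, as would be provided by a Kalman filter:
H_mean = rng.standard_normal((Nr, Nt))     # posterior mean of the channel matrix
C = [0.5 * np.eye(Nr) for _ in range(Nt)]  # posterior covariance of each channel column

# Conventional MMSE filter: treats the channel estimate as if it were exact.
W_conv = H_mean.T @ np.linalg.inv(H_mean @ H_mean.T + sigma2 * np.eye(Nr))

# SOS-aware MMSE filter: the output covariance E[y y^T] picks up an extra
# term, sum_j C_j, that accounts for the uncertainty in the channel estimate.
W_sos = H_mean.T @ np.linalg.inv(H_mean @ H_mean.T + sum(C) + sigma2 * np.eye(Nr))
```

Under these assumptions the SOS-aware filter is the exact linear MMSE estimator, while the conventional one is a mismatched special case that coincides with it only when the posterior covariances vanish.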
One last remark is that the computational complexity of the proposed SOSMMSE DFE is of the same order as that of the conventional scheme that neglects the channel SOS.
Endnotes
^{1} For all practical purposes, any model that is linear and affected by Gaussian noise can be used here.
^{2} Notice that (for the sake of mathematical convenience) estimates of the symbol vectors \(\mathbf {s}_{t+1}, \mathbf {s}_{t+2}, \dots, \mathbf {s}_{t+d}\) are obtained at time t (using observations up to time t+d), since they are also included in \(\mathbf {s}_{t}^{t+d}\). However, these estimates are discarded at that time, and the actual estimates of the symbols in \(\mathbf {s}_{t+i}\) are computed at time t+i using the observations up to time t+i+d.
^{3} This is a reasonable hypothesis since the smoothing lag should be selected to account for, at least, the observations containing all the energy of the symbols transmitted at time t. This obviously depends on the length of the CIR, m, and it is common to set the smoothing lag to d=m−1. This is also the case in the experiments whose results are presented in Section 6.
^{4} Notice that the higher the variance of the channel noise, the more rapidly the channel coefficients fluctuate. A faster varying channel can also be obtained by decreasing the coefficient of the AR process, a _{1}, which determines the correlation between a channel coefficient and itself at a different time instant.
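This behaviour is easy to check numerically. The sketch below simulates a single channel coefficient as a first-order AR process and measures its stationary statistics; the values of a_1 and the sequence length are illustrative choices of ours, not taken from the simulations in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
a1, sigma2_v, T = 0.95, 0.01, 200_000  # AR coefficient, channel-noise variance, samples

# First-order AR channel coefficient: h_t = a1 * h_{t-1} + v_t, v_t ~ N(0, sigma2_v).
h = np.empty(T)
h[0] = rng.normal(0.0, np.sqrt(sigma2_v / (1 - a1**2)))  # stationary initialization
for t in range(1, T):
    h[t] = a1 * h[t - 1] + rng.normal(0.0, np.sqrt(sigma2_v))

var_h = np.var(h)                       # approaches sigma2_v / (1 - a1**2)
rho1 = np.mean(h[:-1] * h[1:]) / var_h  # lag-1 correlation, approaches a1
```

Raising \(\sigma^2_v\) increases the size of the fluctuations, while lowering a_1 weakens the correlation between consecutive channel states; both yield a faster-varying channel.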
^{5} Here, the increase in diversity due to a higher channel order is not enough to compensate for the rapid variation of the channel coefficients.
Appendix
Computation of the cross-covariance \(\Sigma _{\mathbf {h}_{t+k}(n),\mathbf {h}_{t+l}(q)}\) with k<l
The qth column in matrix H _{ t+l }, h _{ t+l }(q), can be expressed, according to Eq. (5), as
where \(\mathbf {v}_{t+l}\) is an \(N_{r}\times 1\) vector of i.i.d. Gaussian random variables (r.v.) with zero mean and variance \(\sigma ^{2}_{v}\). Its expectation is then given by
Notice that this last equation is just Eq. (29) restated column-wise.
Substituting Eqs. (48) and (49) into (43), we obtain
Since we are assuming k<l, it is clear from Eq. (5) that the vectors \(\mathbf {h}_{t+k}(n)\) and \(\mathbf {v}_{t+l}^{H}\) are independent (the channel at time t+k<t+l is independent of the noise affecting the channel in the future), and hence \(\mathbb {E}_{t-1}\left [ \mathbf {h}_{t+k}(n) \mathbf {v}_{t+l}^{H} \right ] = \mathbb {E}_{t-1}\left [ \mathbf {h}_{t+k}(n) \right ] \mathbb {E}_{t-1}\left [ \mathbf {v}_{t+l}^{H} \right ] = 0\) due to the channel noise, \(\mathbf {v}_{t+l}\), having zero mean. For the same reason, the other expectation involving the channel noise, i.e., \( \mathbb {E}_{t-1}\left [ \mathbb {E}_{t-1}\left [\mathbf {h}_{t+k}(n)\right ] \mathbf {v}_{t+l}^{H}\right ],\) is also zero, since \(\mathbb {E}_{t-1}\left [\mathbf {h}_{t+k}(n)\right ]\) is a constant. Therefore, Eq. (50) becomes
(and through straightforward algebraic manipulation)
Since all the terms in the second summation of (51) are zero, only the first summation is left and the equation can be rewritten as
where Eq. (44) has been used in the last step of the derivation.
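Under the scalar AR(1) channel model of endnote 4 (coefficient a_1, noise variance \(\sigma^2_v\)), the derivation above specializes to \(\mathrm{Cov}(h_{t+k}, h_{t+l}) = a_1^{l-k}\,\mathrm{Var}(h_{t+k})\): the future channel-noise terms drop out of the expectation and only powers of a_1 survive. A Monte Carlo sanity check of this scalar relationship (parameter values are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
a1, sigma2_v, T, lag = 0.95, 0.01, 400_000, 3  # illustrative values; lag plays l - k

# Scalar AR(1) channel coefficient: h_t = a1 * h_{t-1} + v_t.
h = np.empty(T)
h[0] = rng.normal(0.0, np.sqrt(sigma2_v / (1 - a1**2)))  # stationary initialization
for t in range(1, T):
    h[t] = a1 * h[t - 1] + rng.normal(0.0, np.sqrt(sigma2_v))

# The channel noise injected between times t+k and t+l is uncorrelated with
# h_{t+k}, so the cross-covariance at lag l-k is a1^(l-k) times the variance.
cross_cov = np.mean(h[:-lag] * h[lag:])
predicted = a1**lag * np.var(h)
```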
Abbreviations
AWGN: Additive white Gaussian noise
AR: Autoregressive
BPSK: Binary phase-shift keying
CIR: Channel impulse response
DFE: Decision feedback equalizer
i.i.d.: Independent and identically distributed
KF: Kalman filter
MIMO: Multiple input multiple output
MMSE: Minimum mean square error
r.v.: Random variable
SOS: Second-order statistics
SER: Symbol error rate
References
1. E Telatar, Capacity of multi-antenna Gaussian channels. Eur. Trans. Telecommun. 10, 585–595 (1999).
2. S Verdú, Multiuser Detection (Cambridge University Press, Cambridge, 1998).
3. TS Rappaport, Wireless Communications: Principles and Practice, 2nd edn. (Prentice-Hall, Upper Saddle River, 2001).
4. C Tidestav, A Ahlen, M Sternad, Realizable MIMO decision feedback equalizers: structure and design. IEEE Trans. Signal Process. 49(1), 121–133 (2001).
5. CA Belfiore, JH Park Jr., Decision feedback equalization. Proc. IEEE 67(8), 1143–1156 (1979).
6. AH Sayed, Adaptive Filters (Wiley-IEEE Press, Hoboken, 2008).
7. AF Molisch, Wireless Communications, vol. 15 (John Wiley & Sons, West Sussex, 2010).
8. JG Andrews, Interference cancellation for cellular systems: a contemporary overview. IEEE Wireless Commun. 12(2), 19–29 (2005).
9. AM Chan, GW Wornell, A new class of efficient block-iterative interference cancellation techniques for digital communication receivers. J. VLSI Signal Process. Syst. Signal Image Video Technol. 30(1–3), 197–215 (2002).
10. E Biglieri, AR Calderbank, AG Constantinides, A Goldsmith, A Paulraj, MIMO Wireless Communications (Cambridge University Press, Cambridge, 2010).
11. C Komninakis, C Fragouli, AH Sayed, RD Wesel, Multi-input multi-output fading channel tracking and equalization using Kalman estimation. IEEE Trans. Signal Process. 50(5), 1065–1076 (2002).
12. J Choi, Equalization and semi-blind channel estimation for space-time block coded signals over a frequency-selective fading channel. IEEE Trans. Signal Process. 52(3), 774–785 (2004).
13. MA Vázquez, MF Bugallo, J Míguez, Sequential Monte Carlo methods for complexity-constrained MAP equalization of dispersive MIMO channels. Signal Process. 88, 1017–1034 (2008).
14. G Yanfei, H Zishu, MIMO channel tracking based on Kalman filter and MMSE-DFE, in Proceedings of the 2005 International Conference on Communications, Circuits and Systems, vol. 1 (IEEE, 2005), pp. 223–226.
15. RE Kalman, A new approach to linear filtering and prediction problems. J. Basic Eng. 82, 35–45 (1960).
16. BDO Anderson, JB Moore, Optimal Filtering (Prentice-Hall, Englewood Cliffs, 1979).
17. N Al-Dhahir, AH Sayed, The finite-length multi-input multi-output MMSE-DFE. IEEE Trans. Signal Process. 48(10), 2921–2936 (2000).
18. D Guo, X Wang, Blind detection in MIMO systems via sequential Monte Carlo. IEEE J. Sel. Areas Commun. 21(3), 464–473 (2003).
19. J Míguez, L Castedo, Space-time channel estimation and soft detection in time-varying multiaccess channels. Signal Process. 83(2), 389–411 (2003).
Acknowledgements
This work was supported by Ministerio de Economía y Competitividad of Spain (project COMPREHENSION TEC2012-38883-C02-01), Comunidad de Madrid (project CASI-CAM-CM S2013/ICE-2845), and the Office of Naval Research Global (award no. N62909-15-1-2011).
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords
 MIMO
 MMSE
 Joint channel and data estimation
 Second-order statistics