Training sequence design for MIMO channels: an application-oriented approach

Katselis, Dimitrios; Rojas, Cristian R; Bengtsson, Mats; Björnson, Emil; Bombois, Xavier; Shariati, Nafiseh; Jansson, Magnus; Hjalmarsson, Håkan

doi:10.1186/1687-1499-2013-245

Research
Open access
Published: 17 October 2013

Dimitrios Katselis¹,
Cristian R Rojas¹,
Mats Bengtsson¹,
Emil Björnson¹,
Xavier Bombois²,
Nafiseh Shariati¹,
Magnus Jansson¹ &
Håkan Hjalmarsson¹

EURASIP Journal on Wireless Communications and Networking volume 2013, Article number: 245 (2013)

Abstract

In this paper, the problem of training optimization for estimating a multiple-input multiple-output (MIMO) flat fading channel in the presence of spatially and temporally correlated Gaussian noise is studied in an application-oriented setup. So far, the problem of MIMO channel estimation has mostly been treated within the context of minimizing the mean square error (MSE) of the channel estimate subject to various constraints, such as an upper bound on the available training energy. We introduce a more general framework for the task of training sequence design in MIMO systems, which can treat not only the minimization of the channel estimator’s MSE but also the optimization of a final performance metric of interest related to the use of the channel estimate in the communication system. First, we show that the proposed framework can be used to minimize the training energy budget subject to a quality constraint on the MSE of the channel estimator. A deterministic version of the ‘dual’ problem is also provided. We then focus on four specific applications, where the training sequence can be optimized with respect to the classical channel estimation MSE, a weighted channel estimation MSE, and the MSE of the equalization error due to the use of an equalizer at the receiver or an appropriate linear precoder at the transmitter. In this way, the intended use of the channel estimate is explicitly accounted for. The superiority of the proposed designs over existing methods is demonstrated via numerical simulations.

1 Introduction

An important factor in the performance of multiple antenna systems is the accuracy of the channel state information (CSI) [1]. CSI is primarily used at the receiver side for purposes of coherent or semicoherent detection, but it can also be used at the transmitter side, e.g., for precoding and adaptive modulation. Since in communication systems the maximization of spectral efficiency is an objective of interest, the training duration and energy should be minimized. Most current systems use training signals that are white, both spatially and temporally, which is known to be a good choice according to several criteria [2, 3]. However, when some prior knowledge of the channel or noise statistics is available, it is possible to tailor the training signal and to obtain a significantly improved performance. In particular, several authors have studied scenarios where long-term CSI in the form of a covariance matrix over the short-term fading is available. So far, most proposed algorithms have been designed to minimize the squared error of the channel estimate, e.g., [4–9]. Alternative design criteria are used in [5] and [10], where the channel entropy is minimized given the received training signal. In [11], the resulting capacity in the case of a single-input single-output (SISO) channel is considered, while [12] focuses on the pairwise error probability.

Herein, a generic context is described, drawing from similar techniques that have been recently proposed for training signal design in system identification [13–15]. This context aims at providing a unified theoretical framework that can be used to treat the MIMO training optimization problem in various scenarios. Furthermore, it provides a different way of looking at the aforementioned problem that could be adjusted to a wide variety of estimation-related problems in communication systems. First, we show how the problem of minimizing the training energy subject to a quality constraint can be solved, while a ‘dual’ deterministic (average design) problem is considered^a. In the sequel, we show that by a suitable definition of the performance measure, the problem of optimizing the training for minimizing the channel MSE can be treated as a special case. We also consider a weighted version of the channel MSE, which relates to the well-known L-optimality criterion [16]. Moreover, we explicitly consider how the channel estimate will be used and attempt to optimize the end performance of the data transmission, which is not necessarily equivalent to minimizing the mean square error (MSE) of the channel estimate. Specifically, we study two uses of the channel estimate: channel equalization at the receiver using a minimum mean square error (MMSE) equalizer and channel inversion (zero-forcing precoding) at the transmitter, and derive the corresponding optimal training signals for each case. In the case of MMSE equalization, separate approximations are provided for the high and low SNR regimes. Finally, the resulting performance is illustrated based on numerical simulations. Compared to related results in the control literature, here, we directly design a finite length training signal and consider not only deterministic channel parameters but also a Bayesian channel estimation framework.
A related pilot design strategy has been proposed in [17] for the problem of jointly estimating the frequency offset and the channel impulse response in single-antenna transmissions.

Implementing an adaptive choice of pilot signals in a practical system would require a feedback signalling overhead, since both the transmitter and the receiver have to agree on the choice of the pilots. Just as the previous studies in the area, the current paper is primarily intended to provide a theoretical benchmark on the resulting performance of such a scheme. Directly considering the end performance in the pilot design is a step into making the results more relevant. The data model used in [4–10] is based on the assumption that the channel is frequency flat but the noise is allowed to be frequency selective. Such a generalized assumption is relevant in systems that share spectrum with other radio interfaces using a narrower bandwidth and possibly in situations where channel coding introduces a temporal correlation in interfering signals. In order to focus on the main principles of our proposed strategy, we maintain this research line by using the same model in the current paper.

As a final comment, the novelty of this paper lies in introducing the application-oriented framework as the appropriate context for training sequence design in communication systems. To this end, Hermitian form-like approximations of performance metrics are addressed here because they are usually good approximations of many performance metrics of interest, as well as for simplicity and comprehensiveness of presentation. Although the ultimate performance metric in communication systems, namely the bit error rate (BER), would be of interest, its handling seems to be a challenging task and is reserved for future study. In this paper, we make an effort to introduce the application-oriented training design framework in the most illustrative and straightforward way.

This paper is organized as follows: Section 2 introduces the basic MIMO received signal model and specific assumptions on the structure of channel and noise covariance matrices. Section 3 presents the optimal channel estimators, when the channel is considered to be either a deterministic or a random matrix. Section 4 presents the application-oriented optimal training designs in a guaranteed performance context, based on confidence ellipsoids and Markov bound relaxations. Moreover, Section 5 focuses on four specific applications, namely that of MSE channel estimation, channel estimation based on the L-optimality criterion, and finally channel estimation for MMSE equalization and ZF precoding. Numerical simulations are provided in Section 6, while Section 7 concludes this paper.

1.1 Notations

Boldface (lowercase) is used for column vectors, x, and (uppercase) for matrices, X. Moreover, X^T, X^H, X^∗, and X^† denote the transpose, the conjugate transpose, the conjugate, and the Moore-Penrose pseudoinverse of X, respectively. The trace of X is denoted as tr(X) and A ≽ B means that A - B is positive semidefinite. vec(X) is the vector produced by stacking the columns of X, and (X)_i,j is the (i, j)-th element of X. [X]₊ means that all negative eigenvalues of X are replaced by zeros (i.e., [X]₊ ≽ 0). $C N (\bar{x}, Q)$ stands for circularly symmetric complex Gaussian random vectors, where $\bar{x}$ is the mean and Q the covariance matrix. Finally, α! denotes the factorial of the non-negative integer α and mod (a, b) the modulo operation between the integers a, b.

2 System model

We consider a MIMO communication system with n_T antennas at the transmitter and n_R antennas at the receiver. The received signal at time t is modelled as

y (t) = H x (t) + n (t),

where $x (t) \in C^{n_{T}}$ and $y (t) \in C^{n_{R}}$ are the baseband representations of the transmitted and received signals, respectively. The impact of background noise and interference from adjacent communication links is represented by the additive term $n (t) \in C^{n_{R}}$ . We will further assume that x(t) and n(t) are independent (weakly) stationary signals. The channel response is modeled by $H \in C^{n_{R} \times n_{T}}$ , which is assumed constant during the transmission of one block of data and independent between blocks, that is, we are assuming frequency flat block fading. Two different models of the channel will be considered:

(i) A deterministic model

(ii) A stochastic Rayleigh fading model^b, i.e., $vec (H) \in C N (0, R)$ , where, for mathematical tractability, we will assume that the known covariance matrix R possesses the Kronecker model used, e.g., in [7, 10]:

\begin{align} R = R_{T}^{T} \otimes R_{R} \end{align}

(1)

where $R_{T} \in C^{n_{T} \times n_{T}}$ and $R_{R} \in C^{n_{R} \times n_{R}}$ are the spatial covariance matrices at the transmitter and receiver side, respectively. This model has been experimentally verified in [18, 19] and further motivated in [20, 21].

We consider training signals of arbitrary length B, represented by $P \in C^{n_{T} \times B}$ , whose columns are the transmitted signal vectors during training. Placing the received vectors in $Y = [y (1) \dots y (B)] \in C^{n_{R} \times B}$ , we have

Y = H P + N,

where $N = [n (1) \dots n (B)] \in C^{n_{R} \times B}$ is the combined noise and interference matrix.

Defining $\tilde{P} = P^{T} \otimes I$ , we can then write

\begin{align} vec (Y) = \tilde{P} vec (H) + vec (N) . \end{align}

(2)
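The vectorized model (2) rests on the identity vec(HP) = (P^T ⊗ I) vec(H). The following minimal NumPy sketch checks it numerically; the dimensions, the random seed, and the helper names `crandn` and `vec` are illustrative assumptions, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_T, n_R, B = 3, 2, 5  # hypothetical sizes, chosen only for illustration


def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return X.reshape(-1, order="F")


H = crandn(n_R, n_T)                  # channel matrix
P = crandn(n_T, B)                    # training matrix, one column per time instant
P_tilde = np.kron(P.T, np.eye(n_R))   # P~ = P^T (x) I

# Largest entry-wise mismatch between vec(H P) and P~ vec(H).
vec_identity_error = np.max(np.abs(vec(H @ P) - P_tilde @ vec(H)))
```

Note that `P.T` is the plain transpose (no conjugation), matching the definition of $\tilde{P}$ above.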

As, for example, in [7, 10], we assume that $vec (N) \in C N (0, S)$ , where the covariance matrix S also possesses a Kronecker structure:

\begin{align} S = S_{Q}^{T} \otimes S_{R} . \end{align}

(3)

Here, $S_{Q} \in C^{B \times B}$ represents the temporal covariance matrix^c that is used to model the effect of temporal correlations in interfering signals, when the noise incorporates multiuser interference. Moreover, $S_{R} \in C^{n_{R} \times n_{R}}$ represents the received spatial covariance matrix that is mostly related to the characteristics of the receive array. The Kronecker structure (3) corresponds to an assumption that the spatial and temporal properties of N are uncorrelated.

The channel and noise statistics will be assumed known to the receiver during estimation. Such statistics can often be obtained by long-term estimation and tracking [22]. For the data transmission phase, we will assume that the transmit signal {x(t)} is a zero-mean, weakly stationary process, which is both temporally and spatially white, i.e., its spectrum is Φ_x(ω) = λ_xI.

3 Channel matrix estimation

3.1 Deterministic channel estimation

The minimum variance unbiased (MVU) channel estimator for the signal model (2), subject to a deterministic channel (Assumption (i) in Section 2), is given by [23]:

vec ({\hat{H}}_{MVU}) = {({\tilde{P}}^{H} S^{- 1} \tilde{P})}^{- 1} {\tilde{P}}^{H} S^{- 1} vec (Y) .

(4)

This estimate has the distribution

\begin{align} vec ({\hat{H}}_{MVU}) \in C N (vec (H), I_{F,MVU}^{- 1}), \end{align}

(5)

where $I_{F,MVU}$ is the inverse covariance matrix

I_{F,MVU} = {\tilde{P}}^{H} S^{- 1} \tilde{P} .

(6)

From this, it follows that the estimation error $\tilde{H} ≜ {\hat{H}}_{MVU} - H$ will, with probability α, belong to the uncertainty set

D_{D} = \{\tilde{H} : {vec}^{H} (\tilde{H}) I_{F,MVU} vec (\tilde{H}) \leq \frac{1}{2} χ_{α}^{2} (2 n_{T} n_{R})\},

(7)

where $χ_{α}^{2} (n)$ is the α percentile of the χ²(n) distribution [15].
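As an illustration of (4) and (6), the sketch below builds the MVU estimator for randomly drawn Kronecker-structured noise statistics and confirms that it returns the true channel when the observation is noise-free (unbiasedness). All sizes, the seed, and the helper names (`crandn`, `random_hpd`, `mvu_estimate`) are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_T, n_R, B = 2, 2, 4  # hypothetical sizes


def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def random_hpd(n):
    """Random Hermitian positive definite matrix (illustrative test data)."""
    G = crandn(n, n)
    return G @ G.conj().T + n * np.eye(n)


def vec(X):
    return X.reshape(-1, order="F")  # column stacking


H = crandn(n_R, n_T)
P = crandn(n_T, B)
S = np.kron(random_hpd(B).T, random_hpd(n_R))   # S = S_Q^T (x) S_R, cf. (3)
P_tilde = np.kron(P.T, np.eye(n_R))

# Inverse covariance (6): I_F = P~^H S^{-1} P~
I_F = P_tilde.conj().T @ np.linalg.solve(S, P_tilde)


def mvu_estimate(Y):
    """MVU estimate (4), returned as an n_R x n_T matrix."""
    rhs = P_tilde.conj().T @ np.linalg.solve(S, vec(Y))
    return np.linalg.solve(I_F, rhs).reshape(n_R, n_T, order="F")


H_hat_noise_free = mvu_estimate(H @ P)  # observation without noise
```

The linear systems are solved with `np.linalg.solve` rather than forming explicit inverses, which is a numerical convenience and not something prescribed by the paper.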

3.2 Bayesian channel estimation

For the case of a stochastic channel model (Assumption (ii) in Section 2), the posterior channel distribution becomes (see [23])

vec (H) | Y, P \in C N (vec ({\hat{H}}_{MMSE}), C_{MMSE}),

(8)

where the first and second moments are

\begin{array}{l} vec ({\hat{H}}_{MMSE}) = {(R^{- 1} + {\tilde{P}}^{H} S^{- 1} \tilde{P})}^{- 1} {\tilde{P}}^{H} S^{- 1} vec (Y), \\ C_{MMSE} = {(R^{- 1} + {\tilde{P}}^{H} S^{- 1} \tilde{P})}^{- 1} . \end{array}

(9)

Thus, the estimation error $\tilde{H} ≜ {\hat{H}}_{MMSE} - H$ will, with probability α, belong to the uncertainty set

D_{B} = \{\tilde{H} : {vec}^{H} (\tilde{H}) I_{F,MMSE} vec (\tilde{H}) \leq \frac{1}{2} χ_{α}^{2} (2 n_{T} n_{R})\},

(10)

where $I_{F,MMSE} ≜ C_{MMSE}^{- 1}$ is the inverse covariance matrix in the MMSE case [15].
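The moments in (9) can be cross-checked numerically. The sketch below, under assumed random Kronecker-structured statistics, compares the information form of the MMSE estimate in (9) with the standard covariance form R P̃^H (P̃ R P̃^H + S)^{-1} vec(Y), which is equivalent by the matrix inversion lemma, and verifies that the MMSE error covariance never exceeds the MVU covariance. Sizes, seed, and helper names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_T, n_R, B = 2, 2, 3  # hypothetical sizes


def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def random_hpd(n):
    G = crandn(n, n)
    return G @ G.conj().T + n * np.eye(n)


def vec(X):
    return X.reshape(-1, order="F")


P = crandn(n_T, B)
R = np.kron(random_hpd(n_T).T, random_hpd(n_R))   # R = R_T^T (x) R_R, cf. (1)
S = np.kron(random_hpd(B).T, random_hpd(n_R))     # S = S_Q^T (x) S_R, cf. (3)
P_tilde = np.kron(P.T, np.eye(n_R))

I_F_mvu = P_tilde.conj().T @ np.linalg.solve(S, P_tilde)
C_mmse = np.linalg.inv(np.linalg.inv(R) + I_F_mvu)        # second moment in (9)

Y = crandn(n_R, B)  # any observation works for the algebraic check
h_info = C_mmse @ (P_tilde.conj().T @ np.linalg.solve(S, vec(Y)))   # first moment in (9)
h_cov = R @ P_tilde.conj().T @ np.linalg.solve(
    P_tilde @ R @ P_tilde.conj().T + S, vec(Y))                      # covariance form

# The prior can only reduce uncertainty: I_F_mvu^{-1} - C_mmse is PSD.
diff = np.linalg.inv(I_F_mvu) - C_mmse
min_eig_diff = np.linalg.eigvalsh((diff + diff.conj().T) / 2).min()
```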

4 Application-oriented optimal training design

In a communication system, an estimate of the channel, say $\hat{H}$ , is needed at the receiver to detect the data symbols and may also be used at the transmitter to improve the performance. Let $J (\tilde{H}, H)$ be a scalar measure of the performance degradation at the receiver due to the estimation error $\tilde{H}$ for a channel H. The objective of the training signal design is then to ensure that the resulting channel estimation error $\tilde{H}$ is such that

J (\tilde{H}, H) \leq \frac{1}{γ}

(11)

for some parameter γ > 0, which we call accuracy. In our settings, (11) cannot be typically ensured, since the channel estimation error is Gaussian-distributed (see (5) and (8)) and, therefore, can be arbitrarily large. However, for the MVU estimator (4), we know that, with probability α, $\tilde{H}$ will belong to the set $D_{D}$ defined in (7). Thus, we are led to training signal designs which guarantee (11) for all channel estimation errors $\tilde{H} \in D_{D}$ . One training design problem that is based on this concept is to minimize the required transmit energy budget subject to this constraint

\begin{array}{c} DGPP : \underset{P \in C^{n_{T} \times B}}{minimize} tr (P P^{H}) \\ s.t. J (\tilde{H}, H) \leq \frac{1}{γ} \forall \tilde{H} \in D_{D} . \end{array}

(12)

Similarly, for the MMSE estimator in Subsection 3.2, the corresponding optimization problem is given as follows:

\begin{array}{l} SGPP : \underset{P \in C^{n_{T} \times B}}{minimize} tr (P P^{H}) \\ s.t. J (\tilde{H}, H) \leq \frac{1}{γ} \forall \tilde{H} \in D_{B}, \end{array}

(13)

where $D_{B}$ is defined in (10). We will call (12) and (13) the deterministic guaranteed performance problem (DGPP) and the stochastic guaranteed performance problem (SGPP), respectively. An alternative dual problem is to maximize the accuracy γ subject to a constraint $P > 0$ on the transmit energy budget (here, the scalar P on the right-hand side of the energy constraint denotes the budget and should not be confused with the training matrix P). For the MVU estimator, this can be written as

\begin{array}{l} DMPP : \underset{P \in C^{n_{T} \times B}}{maximize} γ \\ s.t. J (\tilde{H}, H) \leq \frac{1}{γ} \forall \tilde{H} \in D_{D}, \\ tr (P P^{H}) \leq P . \end{array}

(14)

We will call this problem the deterministic maximized performance problem (DMPP). The corresponding Bayesian problem will be denoted as the stochastic maximized performance problem (SMPP). We will study the DGPP/SGPP in detail in this contribution, but the DMPP/SMPP can be treated in similar ways. In fact, Theorem 3 in [24] suggests that the solutions to the DMPP/SMPP are the same as for DGPP/SGPP, save for a scaling factor.

The existing work on optimal training design for MIMO channels is, to the best of the authors’ knowledge, based upon standard measures of the quality of the channel estimate, rather than on the quality of the end use of the channel. The framework presented in this section can be used to treat the existing results as special cases. Additionally, if an end performance metric is optimized, the DGPP/SGPP and DMPP/SMPP formulations better reflect the ultimate objective of the training design. This type of optimal training design formulation has already been used in the control literature, but mainly for large sample sizes [13, 14, 25, 26], yielding an enhanced performance with respect to conventional estimation-theoretic approaches. A reasonable question is whether such a performance gain can be achieved in the case of training sequence design for MIMO channel estimation, where the sample sizes would be very small.

Remark.

Ensuring (11) can be translated into a chance constraint of the form

Pr \{J (\tilde{H}, H) \leq \frac{1}{γ}\} \geq 1 - ε

(15)

for some ε ∈ [0, 1]. Problems (12), (13), and (14) correspond to a convex relaxation of this chance constraint based on confidence ellipsoids [27], as we show in the next subsection.

4.1 Approximating the training design problems

A key issue regarding the above training signal design problems is their computational tractability. In general, they are highly non-linear and non-convex. In the sequel, we will nevertheless show that using some approximations, the corresponding optimization problems for certain applications of interest can be convexified. In addition, these approximations will show that DGPP and SGPP are very closely related. In particular, we will show that the performance metric for these applications can be approximated by

\begin{align} J (\tilde{H}, H) \approx {vec}^{H} (\tilde{H}) I_{adm} vec (\tilde{H}), \end{align}

(16)

where the Hermitian positive definite matrix $I_{adm}$ can be written in Kronecker product form as $I_{T}^{T} \otimes I_{R}$ for some matrices $I_{T}$ and $I_{R}$ . This means that we can approximate the set $\{\tilde{H} : J (\tilde{H}, H) \leq 1 / γ\}$ of all admissible estimation errors $\tilde{H}$ by a (complex) ellipsoid in the parameter space [15]:

\begin{align} D_{adm} = \{\tilde{H} : {vec}^{H} (\tilde{H}) γ I_{adm} vec (\tilde{H}) \leq 1\} . \end{align}

(17)

Consequently, the DGPP (12) can be approximated by

\begin{array}{l} ADGPP : \underset{P \in C^{n_{T} \times B}}{minimize} tr (P P^{H}) \\ s.t. D_{D} \subseteq D_{adm} . \end{array}

(18)

We call this problem the approximative DGPP (ADGPP). Both $D_{D}$ and $D_{adm}$ are level sets of quadratic functions of the channel estimation error. Rewriting (7) so that we have the same level as in (17), we obtain

D_{D} = \{\tilde{H} : {vec}^{H} (\tilde{H}) \frac{2 I_{F,MVU}}{χ_{α}^{2} (2 n_{T} n_{R})} vec (\tilde{H}) \leq 1\} .

Comparing this expression with (17) gives that $D_{D} \subseteq D_{adm}$ if and only if

\frac{2 I_{F,MVU}}{χ_{α}^{2} (2 n_{T} n_{R})} ≽ γ I_{adm}

(for a more general result, see [15], Theorem 3.1).
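The containment condition above reduces to an eigenvalue test. The following sketch checks it for a hypothetical inverse covariance matrix and I_adm = I; the function name `ellipsoid_contained`, the test matrices, and the value used for the χ² level (roughly the 95% point of χ²(8), supplied here as a given input) are assumptions for illustration.

```python
import numpy as np


def ellipsoid_contained(I_F, I_adm, gamma, chi2_level, tol=1e-10):
    """True if 2 I_F / chi2_level - gamma I_adm is positive semidefinite."""
    M = 2.0 * I_F / chi2_level - gamma * I_adm
    M = (M + M.conj().T) / 2            # enforce exact Hermitian symmetry
    return bool(np.linalg.eigvalsh(M).min() >= -tol)


rng = np.random.default_rng(3)
n = 4                                   # n = n_T * n_R, hypothetical
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
I_F = G @ G.conj().T + n * np.eye(n)    # stand-in for I_F,MVU
I_adm = np.eye(n)                       # plain channel-MSE metric
chi2_level = 15.5                       # assumed input, approx. 95% point of chi2(2n) for n = 4

# For I_adm = I, the largest accuracy compatible with the condition is
# 2 * lambda_min(I_F) / chi2_level.
gamma_max = 2.0 * np.linalg.eigvalsh(I_F).min() / chi2_level
inside = ellipsoid_contained(I_F, I_adm, 0.99 * gamma_max, chi2_level)
outside = ellipsoid_contained(I_F, I_adm, 1.01 * gamma_max, chi2_level)
```

In this example, `inside` evaluates to True and `outside` to False, since the test matrix loses positive semidefiniteness exactly when γ exceeds `gamma_max`.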

When $I_{adm}$ has the form $I_{adm} = I_{T}^{T} \otimes I_{R}$ , with $I_{T} \in C^{n_{T} \times n_{T}}$ and $I_{R} \in C^{n_{R} \times n_{R}}$ , the ADGPP (18) can then be written as

\begin{align} \begin{array}{l} \underset{P \in C^{n_{T} \times B}}{minimize} tr (P P^{H}) \\ s.t. \underbrace{{\tilde{P}}^{H} S^{- 1} \tilde{P}}_{I_{F,MVU}} ≽ \frac{γ χ_{α}^{2} (2 n_{T} n_{R})}{2} I_{T}^{T} \otimes I_{R} . \end{array} \end{align}

(19)

Similarly, by observing that $D_{adm}$ only depends on the channel estimation error, and following the derivations above, the SGPP can be approximated by the following formulation

\begin{align} \begin{array}{l} \underset{P \in C^{n_{T} \times B}}{minimize} tr (P P^{H}) \\ s.t. \underbrace{R^{- 1} + {\tilde{P}}^{H} S^{- 1} \tilde{P}}_{I_{F,MMSE}} ≽ \frac{γ χ_{α}^{2} (2 n_{T} n_{R})}{2} I_{T}^{T} \otimes I_{R} . \end{array} \end{align}

(20)

We call the last problem approximative SGPP (ASGPP).

Remarks.

1. The approximation (16) is not possible for the performance metric of every application. Several examples where this is possible are presented in Section 5. Therefore, in some applications, different convex approximations of the corresponding performance metrics may have to be found.
2. The quality of the approximation (16) is characterized by its tightness to the true performance metric. For our purposes, when the tightness of the aforementioned approximation is acceptable, such an approximation will be desirable for two reasons. First, it corresponds to a Hermitian form, therefore offering nice mathematical properties and tractability. Additionally, the constraint $D_{D} \subseteq D_{adm}$ can be efficiently handled.
3. The sizes of $D_{D}$ and $D_{adm}$ critically depend on the parameter α. In practice, requiring α to have a value close to 1 corresponds to adequately representing the uncertainty set in which (approximately) all possible channel estimation errors lie.

4.2 The deterministic guaranteed performance problem

The problem formulations for ADGPP and ASGPP in (19) and (20), respectively, are similar in structure. The solutions to these problems (and to other approximative guaranteed performance problems) can be obtained from the following general theorem, which has not previously been available in the literature, to the best of our knowledge:

Theorem 1.

Consider the optimization problem

\begin{align} \begin{array}{l} \underset{P \in C^{n \times N}}{minimize} tr (P P^{H}) \\ s.t. P A^{- 1} P^{H} ≽ B \end{array} \end{align}

(21)

where $A \in C^{N \times N}$ is Hermitian positive definite, $B \in C^{n \times n}$ is Hermitian positive semidefinite, and N ≥ rank (B). An optimal solution to (21) is

\begin{align} P^{opt} = U_{B} D_{P} U_{A}^{H}, \end{align}

(22)

where $D_{P} \in C^{n \times N}$ is a rectangular diagonal matrix with $\sqrt{{(D_{A})}_{1, 1} {(D_{B})}_{1, 1}}, \dots, \sqrt{{(D_{A})}_{m, m} {(D_{B})}_{m, m}}$ on the main diagonal. Here, m = min(n, N), while U_A and U_B are unitary matrices that originate from the eigendecompositions of A and B, respectively, i.e.,

\begin{array}{rl} A & = U_{A} D_{A} U_{A}^{H} \\ B & = U_{B} D_{B} U_{B}^{H} \end{array}

(23)

and D_A, D_B are real-valued diagonal matrices, with their diagonal elements sorted in ascending and descending order, respectively, that is, 0 < (D_A)_1,1 ≤ … ≤ (D_A)_N,N and (D_B)_1,1 ≥ … ≥ (D_B)_n,n ≥ 0.

If the eigenvalues of A and B are distinct and strictly positive, then the solution (22) is unique up to the multiplication of the columns of U_A and U_B by complex unit-norm scalars.

Proof.

The proof is given in Appendix 7. □
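A minimal numerical sketch of the closed form (22) is given below. It builds P^opt for randomly drawn A and B (an assumption made purely for testing), checks that the constraint in (21) holds, and compares the training energy with that of a simple feasible but non-optimized choice, $\sqrt{λ_{max}(A)} [B^{1/2}\ 0]$, which is introduced here only as a baseline and does not appear in the paper.

```python
import numpy as np


def theorem1_training(A, B):
    """P = U_B D_P U_A^H with (D_P)_ii = sqrt((D_A)_ii (D_B)_ii), cf. (22)."""
    n, N = B.shape[0], A.shape[0]
    d_A, U_A = np.linalg.eigh(A)                  # eigenvalues of A, ascending
    d_B, U_B = np.linalg.eigh(B)
    d_B, U_B = d_B[::-1], U_B[:, ::-1]            # eigenvalues of B, descending
    d_B = np.clip(d_B, 0.0, None)                 # remove tiny negative round-off
    m = min(n, N)
    D_P = np.zeros((n, N))
    D_P[np.arange(m), np.arange(m)] = np.sqrt(d_A[:m] * d_B[:m])
    return U_B @ D_P @ U_A.conj().T


rng = np.random.default_rng(4)
n, N = 3, 5                                        # hypothetical sizes, N >= rank(B)
G_A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
G_B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = G_A @ G_A.conj().T + np.eye(N)                 # Hermitian positive definite
B = G_B @ G_B.conj().T                             # Hermitian positive semidefinite

P_opt = theorem1_training(A, B)
slack = P_opt @ np.linalg.solve(A, P_opt.conj().T) - B     # P A^{-1} P^H - B
min_slack_eig = np.linalg.eigvalsh((slack + slack.conj().T) / 2).min()
energy_opt = np.trace(P_opt @ P_opt.conj().T).real

# Baseline: feasible because every principal block of A^{-1} dominates I / lambda_max(A).
w, V = np.linalg.eigh(B)
B_sqrt = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T
P_naive = np.sqrt(np.linalg.eigvalsh(A).max()) * np.hstack([B_sqrt, np.zeros((n, N - n))])
energy_naive = np.trace(P_naive @ P_naive.conj().T).real
```

With this construction the constraint is met with equality (up to round-off), and the energy equals the sum of the paired eigenvalue products, which pairs the largest eigenvalues of B with the smallest eigenvalues of A.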

By the right choice of A and B, Theorem 1 will solve the ADGPP in (19). This is shown by the next theorem (recall that we have assumed that $S = S_{Q}^{T} \otimes S_{R}$ ).

Theorem 2.

Consider the optimization problem

\begin{align} \begin{array}{l} \underset{P \in C^{n_{T} \times B}}{minimize} tr (P P^{H}) \\ s.t. {\tilde{P}}^{H} {(S_{Q}^{T} \otimes S_{R})}^{- 1} \tilde{P} ≽ c I_{T}^{T} \otimes I_{R}, \end{array} \end{align}

(24)

where $\tilde{P} = P^{T} \otimes I$ , $S_{Q} \in C^{B \times B}$ , $S_{R} \in C^{n_{R} \times n_{R}}$ are Hermitian positive definite, $I_{T} \in C^{n_{T} \times n_{T}}$ , $I_{R} \in C^{n_{R} \times n_{R}}$ are Hermitian positive semidefinite, and c is a positive constant.

If $B \geq rank (I_{T})$ , this problem is equivalent to (21) in Theorem 1 for A = S_Q and $B = c λ_{max} (S_{R} I_{R}) I_{T}$ , where λ_max(·) denotes the maximum eigenvalue.

Proof.

The proof is given in Appendix 7. □
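To make the reduction in Theorem 2 concrete, the sketch below forms A = S_Q and B = c λ_max(S_R I_R) I_T for randomly drawn matrices, applies the closed form (22), and then verifies the original Kronecker-structured constraint in (24) directly. The matrices, the constant c, and the helper names are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(5)


def random_hpd(n):
    """Random Hermitian positive definite matrix (illustrative test data)."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T + n * np.eye(n)


def solve_theorem1(A, B):
    """Closed form (22): P = U_B D_P U_A^H."""
    d_A, U_A = np.linalg.eigh(A)                              # ascending
    d_B, U_B = np.linalg.eigh(B)
    d_B, U_B = np.clip(d_B[::-1], 0.0, None), U_B[:, ::-1]    # descending
    m = min(B.shape[0], A.shape[0])
    D_P = np.zeros((B.shape[0], A.shape[0]))
    D_P[np.arange(m), np.arange(m)] = np.sqrt(d_A[:m] * d_B[:m])
    return U_B @ D_P @ U_A.conj().T


n_T, n_R, B_len, c = 2, 2, 3, 1.5        # hypothetical sizes and constant
S_Q, S_R = random_hpd(B_len), random_hpd(n_R)
I_T, I_R = random_hpd(n_T), random_hpd(n_R)

# S_R I_R is similar to a Hermitian PSD matrix, so its eigenvalues are real.
lam_max = np.linalg.eigvals(S_R @ I_R).real.max()
P = solve_theorem1(S_Q, c * lam_max * I_T)

# Check the constraint of (24) in its original Kronecker form.
P_tilde = np.kron(P.T, np.eye(n_R))
S = np.kron(S_Q.T, S_R)
gap = P_tilde.conj().T @ np.linalg.solve(S, P_tilde) - c * np.kron(I_T.T, I_R)
min_gap_eig = np.linalg.eigvalsh((gap + gap.conj().T) / 2).min()
training_energy = np.trace(P @ P.conj().T).real
```

The smallest eigenvalue of `gap` is expected to be zero up to round-off, because the constraint is tight along the eigendirection of S_R I_R associated with λ_max.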

4.3 The stochastic guaranteed performance problem

We will see that Theorem 1 can be also used to solve the ASGPP in (20). In order to obtain closed-form solutions, we need some equality relation between the Kronecker blocks of $R = R_{T}^{T} \otimes R_{R}$ and of either $S = S_{Q}^{T} \otimes S_{R}$ or $I_{adm} = I_{T}^{T} \otimes I_{R}$ . For instance, it can be R_R = S_R, which may be satisfied if the receive antennas are spatially uncorrelated or if the signal and interference are received from the same main direction (see [7] for details on the interpretations of these assumptions).

The solution to ASGPP in (20) is given by the next theorem.

Theorem 3.

Consider the optimization problem

\begin{align} \begin{array}{l} \underset{P \in C^{n_{T} \times B}}{minimize} tr (P P^{H}) \\ s.t. R^{- 1} + {\tilde{P}}^{H} S^{- 1} \tilde{P} ≽ c I_{T}^{T} \otimes I_{R}, \end{array} \end{align}

(25)

where $\tilde{P} = P^{T} \otimes I$ , $R = R_{T}^{T} \otimes R_{R}$ , and $S = S_{Q}^{T} \otimes S_{R}$ . Here, $R_{T} \in C^{n_{T} \times n_{T}}$ , $R_{R} \in C^{n_{R} \times n_{R}}$ , $S_{Q} \in C^{B \times B}$ , $S_{R} \in C^{n_{R} \times n_{R}}$ are Hermitian positive definite, $I_{T} \in C^{n_{T} \times n_{T}}$ , $I_{R} \in C^{n_{R} \times n_{R}}$ are Hermitian positive semidefinite, and c is a positive constant.

If R_R = S_R and $B \geq rank ({[c λ_{max} (S_{R} I_{R}) I_{T} - R_{T}^{- 1}]}_{+})$ , then the problem is equivalent to (21) in Theorem 1 for A = S_Q and $B = {[c λ_{max} (S_{R} I_{R}) I_{T} - R_{T}^{- 1}]}_{+}$ .
If $R_{R}^{- 1} = I_{R}$ and $B \geq rank ({[c I_{T} - R_{T}^{- 1}]}_{+})$ , then the problem is equivalent to (21) in Theorem 1 for A = S_Q and $B = λ_{max} (S_{R} I_{R}) {[c I_{T} - R_{T}^{- 1}]}_{+}$ .
If $R_{T}^{- 1} = I_{T}$ and $B \geq rank (I_{T})$ , then the problem is equivalent to (21) in Theorem 1 for A = S_Q and $B = λ_{max} (S_{R} {[c I_{R} - R_{R}^{- 1}]}_{+}) I_{T}$ .

Proof.

The proof is given in Appendix 3. □

The mathematical difference between ADGPP and ASGPP is the R^-1 term that appears in the constraint of the latter. This term has a clear impact on the structure of the optimal ASGPP training matrix.

It is also worth noting that the solution for R_R = S_R requires $B \geq rank ({[c λ_{max} (S_{R} I_{R}) I_{T} - R_{T}^{- 1}]}_{+})$ , which means that solutions can be achieved also for B < n_T (i.e., when only the B < n_T strongest eigendirections of the channel are excited by training). In certain cases, e.g., when the interference is temporally white (S_Q = I), it is optimal to have $B = rank ({[c λ_{max} (S_{R} I_{R}) I_{T} - R_{T}^{- 1}]}_{+})$ as larger B will not decrease the training energy usage, cf. [9].
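The case R_R = S_R of Theorem 3 can be illustrated with a small hand-picked example in which the positive-part operator removes one eigendirection, so that a training length B = 1 < n_T = 2 suffices. All numerical values below are hypothetical and chosen so that the truncation is visible; the helper names are likewise assumptions.

```python
import numpy as np


def positive_part(X):
    """[X]_+ : replace negative eigenvalues of a Hermitian matrix by zeros."""
    w, V = np.linalg.eigh(X)
    return (V * np.clip(w, 0.0, None)) @ V.conj().T


def solve_theorem1(A, B):
    """Closed form (22): P = U_B D_P U_A^H."""
    d_A, U_A = np.linalg.eigh(A)                              # ascending
    d_B, U_B = np.linalg.eigh(B)
    d_B, U_B = np.clip(d_B[::-1], 0.0, None), U_B[:, ::-1]    # descending
    m = min(B.shape[0], A.shape[0])
    D_P = np.zeros((B.shape[0], A.shape[0]))
    D_P[np.arange(m), np.arange(m)] = np.sqrt(d_A[:m] * d_B[:m])
    return U_B @ D_P @ U_A.conj().T


n_T, n_R, c = 2, 2, 1.0
R_T = np.diag([5.0, 0.05])                 # one strong and one weak transmit eigendirection
S_R = np.array([[2.0, 0.5], [0.5, 1.0]])
R_R = S_R                                  # assumption of this case of Theorem 3
S_Q = np.array([[1.0]])                    # training length B = 1
I_T, I_R = np.eye(n_T), np.eye(n_R)

lam_max = np.linalg.eigvals(S_R @ I_R).real.max()
B_plus = positive_part(c * lam_max * I_T - np.linalg.inv(R_T))
rank_B_plus = np.linalg.matrix_rank(B_plus)
P = solve_theorem1(S_Q, B_plus)            # n_T x 1 training matrix

# Verify the constraint of (25) directly.
P_tilde = np.kron(P.T, np.eye(n_R))
R = np.kron(R_T.T, R_R)
S = np.kron(S_Q.T, S_R)
lhs = np.linalg.inv(R) + P_tilde.conj().T @ np.linalg.solve(S, P_tilde)
gap = lhs - c * np.kron(I_T.T, I_R)
min_gap_eig = np.linalg.eigvalsh((gap + gap.conj().T) / 2).min()
```

Here only the eigendirection with the large prior variance receives training energy; for the other direction the prior term R^{-1} alone already satisfies the constraint.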

4.4 Optimizing the average performance

Apart from the previously presented training designs, the application-oriented design can be alternatively given in the following deterministic dual context. If H is considered to be deterministic, then we can set up the following optimization problem

\begin{align} \begin{array}{ll} \underset{P \in C^{n_{T} \times B}}{minimize} & E_{\tilde{H}} \{J (\tilde{H}, H)\} \\ s.t. & tr (P P^{H}) \leq P . \end{array} \end{align}

(26)

Clearly, for the MVU estimator

E_{\tilde{H}} \{J (\tilde{H}, H)\} = tr \{I_{adm} {({\tilde{P}}^{H} S^{- 1} \tilde{P})}^{- 1}\},

so problem (26) is solved by the following theorem.

Theorem 4.

Consider the optimization problem

\begin{align} \begin{array}{ll} \underset{P \in C^{n_{T} \times B}}{minimize} & tr \{I_{adm} {({\tilde{P}}^{H} S^{- 1} \tilde{P})}^{- 1}\} \\ s.t. & tr (P P^{H}) \leq P, \end{array} \end{align}

(27)

where $I_{adm} = I_{T}^{T} \otimes I_{R}$ as before. Set $I_{T}^{'} = I_{T}^{T} = U_{T} D_{T} U_{T}^{H}$ and $S_{Q}^{'} = S_{Q}^{T} = U_{Q} D_{Q} U_{Q}^{H}$ . Here, $U_{T} \in C^{n_{T} \times n_{T}}$ , $U_{Q} \in C^{B \times B}$ are unitary matrices and D_T, D_Q are diagonal n_T × n_T and B × B matrices containing the eigenvalues of $I_{T}^{'}$ and $S_{Q}^{'}$ in descending and ascending order, respectively. Then, the optimal training matrix P equals ${(U_{T} D_{P} U_{Q}^{H})}^{*}$ , where D_P is an n_T × B diagonal matrix with main diagonal entries equal to ${(D_{P})}_{i, i} = \sqrt{P \sqrt{α_{i}} / \sum_{j = 1}^{n_{T}} \sqrt{α_{j}}}, i = 1, 2, \dots, n_{T}$ (assuming $B \geq n_{T}$ ), and α_i = (D_T)_i,i(D_Q)_i,i, i = 1, 2, …, n_T, with the aforementioned ordering.

Proof.

The proof is given in Appendix 7. □
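The construction in Theorem 4 can be exercised numerically as follows. The sketch builds the training matrix from the stated eigendecompositions, evaluates the objective of (27) in its full Kronecker form, and compares it with the value tr(I_R S_R)(Σ_i √α_i)²/P that follows from substituting the stated power allocation, as well as with a random training matrix of equal energy. The test matrices, the budget value, and the helper names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n_T, n_R, B, budget = 2, 2, 3, 10.0     # hypothetical sizes and energy budget


def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def random_hpd(n):
    G = crandn(n, n)
    return G @ G.conj().T + n * np.eye(n)


I_T, I_R = random_hpd(n_T), random_hpd(n_R)
S_Q, S_R = random_hpd(B), random_hpd(n_R)

d_T, U_T = np.linalg.eigh(I_T.T)
d_T, U_T = d_T[::-1], U_T[:, ::-1]      # eigenvalues of I_T^T, descending
d_Q, U_Q = np.linalg.eigh(S_Q.T)        # eigenvalues of S_Q^T, ascending

alpha = d_T * d_Q[:n_T]
D_P = np.zeros((n_T, B))
D_P[np.arange(n_T), np.arange(n_T)] = np.sqrt(budget * np.sqrt(alpha) / np.sqrt(alpha).sum())
P_opt = (U_T @ D_P @ U_Q.conj().T).conj()


def objective(P):
    """tr{ I_adm (P~^H S^{-1} P~)^{-1} } evaluated in full Kronecker form."""
    P_tilde = np.kron(P.T, np.eye(n_R))
    S = np.kron(S_Q.T, S_R)
    I_F = P_tilde.conj().T @ np.linalg.solve(S, P_tilde)
    return np.trace(np.kron(I_T.T, I_R) @ np.linalg.inv(I_F)).real


closed_form = np.trace(I_R @ S_R).real * np.sqrt(alpha).sum() ** 2 / budget

# Random competitor scaled to use exactly the same energy.
P_rand = crandn(n_T, B)
P_rand *= np.sqrt(budget / np.trace(P_rand @ P_rand.conj().T).real)
obj_opt, obj_rand = objective(P_opt), objective(P_rand)
```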

Remarks.

1. In the general case of a non-Kronecker-structured $I_{adm}$ , the training can be obtained using numerical methods like the semidefinite relaxation approach described in [28].
2. If $I_{adm}$ depends on H, then in order to implement this design, the embedded H in $I_{adm}$ may be replaced by a previous channel estimate. This implies that this approach is possible whenever the channel variations allow for such a design. This observation also applies to the designs in the previous subsections (see also [24, 29], where the same issue is discussed for other system identification applications).

The corresponding performance criterion for the case of the MMSE estimator is given by

E_{\tilde{H}, H} \{J (\tilde{H}, H)\} = tr \{I_{adm} {(R^{- 1} + {\tilde{P}}^{H} S^{- 1} \tilde{P})}^{- 1}\} .

In this case, we can derive closed form expressions for the optimal training under assumptions similar to those made in Theorem 3. We therefore have the following result:

Theorem 5.

Consider the optimization problem

\begin{align} \begin{array}{ll} \underset{P \in C^{n_{T} \times B}}{minimize} & tr \{I_{adm} {(R^{- 1} + {\tilde{P}}^{H} S^{- 1} \tilde{P})}^{- 1}\} \\ s.t. & tr (P P^{H}) \leq P \end{array} \end{align}

(28)

where $I_{adm} = I_{T}^{T} \otimes I_{R}$ as before. Set $S_{Q}^{'} = S_{Q}^{T} = V_{Q} Λ_{Q} V_{Q}^{H}$ . Here, we assume that $V_{Q} \in C^{B \times B}$ is a unitary matrix and Λ_Q a diagonal B × B matrix containing the eigenvalues of $S_{Q}^{'}$ in arbitrary order. Assume also that $R_{T}^{'} = R_{T}^{T}$ with eigenvalue decomposition $U_{T}^{'} Λ_{T}^{'} U_{T}^{'H}$ . The diagonal elements of $Λ_{T}^{'}$ are assumed to be arbitrarily ordered. Then, we have the following cases:

R_R = S_R: We further distinguish two cases:
- $I_{T} = I$ : Then, the optimal training is given by a straightforward adaptation of Proposition 2 in [8].
- $R_{T}^{- 1} = I_{T}$ : Then, the optimal training matrix P equals ${(U_{T}^{'} (π_{opt}) D_{P} V_{Q}^{H} (ϖ_{opt}))}^{*}$ , where π_opt and ϖ_opt stand for the optimal orderings of the eigenvalues of $R_{T}^{'}$ and $S_{Q}^{'}$ , respectively. These optimal orderings are determined by Algorithm 1 in Appendix 5. Additionally, define the parameter m_∗ as in Equation 69 (see Appendix 5). Assuming in the following that, for simplicity of notation, the ${(Λ_{T}^{'})}_{i, i}$ ’s and (Λ_Q)_i,i’s have the optimal ordering, the optimal (D_P)_j,j, j = 1, 2, …, m_∗, are given by the expression
  $\begin{align} \sqrt{\frac{P + \sum_{i = 1}^{m_{*}} \frac{{(Λ_{Q})}_{i, i}}{{(Λ_{T}^{'})}_{i, i}}}{\sum_{i = 1}^{m_{*}} \sqrt{\frac{{(Λ_{Q})}_{i, i}}{{(Λ_{T}^{'})}_{i, i}}}} \sqrt{\frac{{(Λ_{Q})}_{j, j}}{{(Λ_{T}^{'})}_{j, j}}} - \frac{{(Λ_{Q})}_{j, j}}{{(Λ_{T}^{'})}_{j, j}}}, \end{align}$
  
  while (D_P)_j,j = 0 for j = m_∗ + 1, …, n_T.

Proof.

The proof is given in Appendix 5. □
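The power-loading expression of the case $R_{T}^{- 1} = I_{T}$ can be evaluated as in the sketch below. It returns the squared diagonal entries (D_P)_j,j², so that their sum can be compared with the energy budget. The optimal eigenvalue ordering and the number of active modes m_∗ are determined by Algorithm 1 and Equation 69 in Appendix 5, which are not reproduced here; in this sketch they are therefore treated as inputs supplied by the caller, and the function name and the numerical values are hypothetical.

```python
import numpy as np


def theorem5_power_levels(lam_Q, lam_T, budget, m_star):
    """Squared entries (D_P)_jj^2 for j = 1..n_T, given an ordering and m_star.

    lam_Q, lam_T : eigenvalues of S_Q^T and R_T^T, assumed to be already in the
                   optimal ordering (the ordering is NOT computed here).
    m_star       : number of active modes, assumed to be supplied by the caller.
    """
    lam_Q = np.asarray(lam_Q, dtype=float)
    lam_T = np.asarray(lam_T, dtype=float)
    a = lam_Q[:m_star] / lam_T[:m_star]                 # ratios (Lambda_Q)_ii / (Lambda_T')_ii
    level = (budget + a.sum()) / np.sqrt(a).sum()       # common factor of the expression
    power = np.zeros(len(lam_T))
    power[:m_star] = level * np.sqrt(a) - a             # remaining modes stay at zero
    return power


# Hypothetical eigenvalues and budget, for illustration only.
lam_Q_example = [1.0, 2.0, 4.0]
lam_T_example = [4.0, 2.0, 1.0]
power_all = theorem5_power_levels(lam_Q_example, lam_T_example, budget=20.0, m_star=3)
power_two = theorem5_power_levels(lam_Q_example, lam_T_example, budget=20.0, m_star=2)
```

In both calls the returned values sum to the budget, which follows directly from the form of the expression. The entries (D_P)_j,j are the square roots of the returned values; a negative value would indicate that the supplied m_∗ is too large for the given budget.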

Remarks. Two interesting additional cases complementing the last theorem are the following:

1. If the modal matrices of R_R and S_R are the same, $I_{T} = I$ and $I_{R} = I$ , then the optimal training is given by [9].
2. In any other case (e.g., if R_R ≠ S_R), the training can be found using numerical methods like the semidefinite relaxation approach described in [28]. Note again that this approach can also handle general $I_{adm}$ , not necessarily expressed as $I_{T}^{T} \otimes I_{R}$ .

As a general conclusion, the objective function of the dual deterministic problems presented in this subsection can be shown to correspond to Markov bound approximations of the chance constraint (15), as these approximations have been described in [27], namely

\begin{align} Pr \{J (\tilde{H}, H) \geq \frac{1}{γ}\} \leq γ E \{J (\tilde{H}, H)\} \leq ε \end{align}

According to the analysis in [27], these approximations should be tighter than the approximations based on confidence ellipsoids presented in Subsections 4.1, 4.2, and 4.3 for practically relevant values of ε.
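The Markov bound above can be illustrated with a small Monte Carlo experiment. The sketch below assumes the simplest metric J = vec^H(H̃)vec(H̃) with i.i.d. circularly symmetric Gaussian estimation errors; the error variance `sigma2`, the value of `gamma`, and the array sizes are hypothetical choices and not taken from the simulations in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, n_t, trials = 2, 2, 20000
sigma2 = 0.05   # hypothetical per-entry variance of the estimation error
gamma = 2.0     # hypothetical accuracy parameter

# Each row is one realization of vec(H~) with i.i.d. CN(0, sigma2) entries.
err = np.sqrt(sigma2 / 2) * (rng.standard_normal((trials, n_r * n_t))
                             + 1j * rng.standard_normal((trials, n_r * n_t)))
J = np.sum(np.abs(err) ** 2, axis=1)       # J = vec^H(H~) vec(H~)

outage = np.mean(J >= 1.0 / gamma)         # empirical Pr{J >= 1/gamma}
markov_bound = gamma * np.mean(J)          # empirical gamma * E{J}
```

Requiring `markov_bound` to stay below ε therefore guarantees the chance constraint, at the price of some conservatism that depends on the distribution of J.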

5 Applications

5.1 Optimal training for channel estimation

We now consider the channel estimation problem in its standard context, where the performance metric of interest is the MSE of the corresponding channel estimator. Optimal linear estimators for this task are given by (4) and (9). The performance metric of interest is

J (\tilde{H}, H) = {vec}^{H} (\tilde{H}) vec (\tilde{H}),

which corresponds to $I_{adm} = I$ , i.e., to $I_{T} = I$ and $I_{R} = I$ . The ADGPP and ASGPP are given by (19) and (20), respectively, with the corresponding substitutions. Their solutions follow directly from Theorems 2 and 3, respectively. To the best of the authors’ knowledge, such formulations for the classical MIMO training design problem are presented here for the first time. Furthermore, solutions to the standard approach of minimizing the channel MSE subject to a constraint on the training energy budget are provided by Theorems 4 and 5 as special cases.

Remark.

Although the confidence ellipsoid and Markov bound approximations are generally different [27], in the simulation section, we show that their performance is almost identical for reasonable operating γ-regimes in the specific case of standard channel estimation.

5.2 Optimal training for the L-optimality criterion

Consider now a performance metric of the form

J_{W} (\tilde{H}, H) = {vec}^{H} (\tilde{H}) W vec (\tilde{H}),

for some positive semidefinite weighting matrix W. Assume also that W = W₁ ⊗ W₂ for some positive semidefinite matrices W₁, W₂. Taking the expected value of this performance metric with respect to either $\tilde{H}$ or both $\tilde{H}$ and H leads to the well-known L-optimality criterion for optimal experiment design in statistics [16]. In this case, $I_{T} = W_{1}^{T}$ and $I_{R} = W_{2}$ . In the context of MIMO communication systems, such a performance metric may arise, e.g., if we want to estimate the MIMO channel when the transmit and/or the receive antenna arrays have some deficiencies. The simplest case would be both W₁ and W₂ being diagonal with non-zero entries in the interval [0,1], W₁ representing the deficiencies in the transmit antenna array and W₂ in the receive array. More general matrices can be considered if we assume cross-couplings between the transmit and/or receive antenna elements.
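To make the Kronecker-structured weighting concrete, the following sketch evaluates J_W for one error realization in two ways: directly through W₁ ⊗ W₂, and through the equivalent trace form tr(H̃^H W₂ H̃ W₁^T), which follows from (W₁ ⊗ W₂)vec(H̃) = vec(W₂ H̃ W₁^T). The diagonal weights are hypothetical values for two partially deficient arrays.

```python
import numpy as np

rng = np.random.default_rng(1)
n_r, n_t = 3, 2
H_err = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))

W1 = np.diag([1.0, 0.4])        # hypothetical transmit-side weights (n_T x n_T)
W2 = np.diag([1.0, 0.7, 0.2])   # hypothetical receive-side weights (n_R x n_R)

vec_err = H_err.reshape(-1, order="F")          # column-stacking vec(.)
J_kron = np.real(vec_err.conj() @ np.kron(W1, W2) @ vec_err)
J_trace = np.real(np.trace(H_err.conj().T @ W2 @ H_err @ W1.T))
```

The trace form avoids building the n_T n_R × n_T n_R matrix W and makes the roles of $I_{T} = W_{1}^{T}$ and $I_{R} = W_{2}$ explicit.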

Remark.

The numerical approach of [28] mentioned after Theorems 4 and 5 can handle general weighting matrices W, not necessarily Kronecker-structured.

5.3 Optimal training for channel equalization

In this subsection, we consider the problem of estimating a transmitted signal sequence {x(t)} from the corresponding received signal sequence {y(t)}. Among a wide range of methods that are available [30, 31], we will consider the MMSE equalizer, and for mathematical tractability, we will approximate it by the non-causal Wiener filter. Note that for reasonably long block lengths, the MMSE estimate becomes similar to the non-causal Wiener filter [32]. Thus, the optimal training design based on the non-causal Wiener filter should also provide good performance when using an MMSE equalizer.

5.3.1 Equalization using exact channel state information

Let us first assume that H is available. In this ideal case and with the transmitted signal being weakly stationary with spectrum Φ_x, the optimal estimate of the transmitted signal x(t) from the received observations of y(t) can be obtained according to

\hat{x} (t; H) = F (q; H) y (t),

(29)

where q is the unit time shift operator, [q x(t) = x(t + 1)], and the non-causal Wiener filter F(e^jω;H) is given by

\begin{array}{l} F (e^{jω}; H) = Φ_{xy} (ω) Φ_{y}^{- 1} (ω) \\ = Φ_{x} (ω) H^{H} {(H Φ_{x} (ω) H^{H} + Φ_{n} (ω))}^{- 1} . \end{array}

(30)

Here, Φ_xy(ω) = Φ_x(ω)H^H denotes the cross-spectrum between x(t) and y(t), and

Φ_{y} (ω) = H Φ_{x} (ω) H^{H} + Φ_{n} (ω)

(31)

is the spectral density of y(t). Using our assumption that Φ_x(ω) = λ_xI, we obtain the simplified expression

\begin{align} F (e^{jω}; H) & = H^{H} {(H H^{H} + Φ_{n} (ω) / λ_{x})}^{- 1} . \end{align}

(32)

Remark.

Assuming non-singularity of Φ_n(ω) for every ω, the MMSE equalizer is applicable for all values of the pair (n_T, n_R).
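The simplification from (30) to (32) can be checked numerically at a single frequency. The sketch below assumes a hypothetical non-singular noise spectrum value `phi_n` and a wide channel (n_T > n_R), which also illustrates the remark that the filter is well defined for any pair (n_T, n_R). The function names are illustrative only.

```python
import numpy as np

def wiener_filter_white(H, phi_n, lam_x):
    """Evaluate (32) at one frequency: H^H (H H^H + Phi_n / lam_x)^{-1}."""
    return H.conj().T @ np.linalg.inv(H @ H.conj().T + phi_n / lam_x)

def wiener_filter_general(H, phi_x, phi_n):
    """Evaluate (30) at one frequency: Phi_x H^H (H Phi_x H^H + Phi_n)^{-1}."""
    return phi_x @ H.conj().T @ np.linalg.inv(H @ phi_x @ H.conj().T + phi_n)

rng = np.random.default_rng(8)
n_r, n_t, lam_x = 2, 4, 2.0
H = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))

# Hypothetical Hermitian positive definite noise spectrum at this frequency.
N = rng.standard_normal((n_r, n_r)) + 1j * rng.standard_normal((n_r, n_r))
phi_n = N @ N.conj().T + 0.1 * np.eye(n_r)

F_white = wiener_filter_white(H, phi_n, lam_x)
F_general = wiener_filter_general(H, lam_x * np.eye(n_t), phi_n)
```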

5.3.2 Equalization using a channel estimate

Consider now the situation where the exact channel H is unavailable, but we only have an estimate $\hat{H}$ . When we replace H by its estimate in the expressions above, the estimation error for the equalizer will increase. While the increase in the bit error rate would be a natural measure of the quality of the channel estimate $\hat{H}$ , for simplicity, we consider the total MSE of the difference, $\hat{x} (t; H + \tilde{H}) - \hat{x} (t; H) = Δ (q; \tilde{H}, H) y (t)$ (note that $\hat{H} = H + \tilde{H}$ ), using the notation $Δ (q; \tilde{H}, H) ≜ F (q; H + \tilde{H}) - F (q; H)$ . In view of this, we will use the channel equalization (CE) performance metric

\begin{align} J_{CE} (\tilde{H}, H) & = E \{{[Δ (q; \tilde{H}, H) y (t)]}^{H} [Δ (q; \tilde{H}, H) y (t)]\} \\ = E \{tr ([Δ (q; \tilde{H}, H) y (t)] {[Δ (q; \tilde{H}, H) y (t)]}^{H})\} \\ = \frac{1}{2 π} \int_{- π}^{π} tr (Δ (e^{jω}; \tilde{H}, H) Φ_{y} (ω) Δ^{H} (e^{jω}; \tilde{H}, H)) d ω . \end{align}

(33)

We see that the poorer the accuracy of the estimate, the larger the performance metric $J_{CE} (\tilde{H}, H)$ and, thus, the larger the performance loss of the equalizer. Therefore, this performance metric is a reasonable candidate to use when formulating our training sequence design problem. Indeed, the Wiener equalizer based on the estimate $\hat{H} = H + \tilde{H}$ of H can be deemed to have a satisfactory performance if $J_{CE} (\tilde{H}, H)$ remains below some user-chosen threshold. Thus, we will use J_CE as J in problems (12) and (13). Though these problems are not convex, we show in Appendix 1 how they can be convexified, provided some approximations are made.

Remarks.

1.
The excess MSE $J_{CE} (\tilde{H}, H)$ quantifies the distance of the MMSE equalizer using the channel estimate $\hat{H}$ from the clairvoyant MMSE equalizer, i.e., the one using the true channel. This performance metric is not the same as the classical MSE in the equalization context, where the difference $\hat{x} (t; H + \tilde{H}) - x (t)$ is considered instead of $\hat{x} (t; H + \tilde{H}) - \hat{x} (t; H)$ . However, since in practice the best transmit vector estimate that can be attained is the clairvoyant one, the choice of $J_{CE} (\tilde{H}, H)$ is justified. This selection allows for a performance metric approximation given by (16).
2.
There are certain cases of interest, where $J_{CE} (\tilde{H}, H)$ approximately coincides with the classical equalization MSE. Such a case occurs when n_R ≥ n_T, H is full column rank and the SNR is high during data transmission.

5.4 Optimal training for zero-forcing precoding

Apart from receiver side channel equalization, as another example of how to apply the channel estimate we consider point-to-point zero-forcing (ZF) precoding, also known as channel inversion [33]. Here, the channel estimate is fed back to the transmitter, and its (pseudo-)inverse is used as a linear precoder. The data transmission is described by

y (t) = H Ψ x (t) + v (t),

where the precoder is $Ψ = {\hat{H}}^{†}$ , i.e., $Ψ = {\hat{H}}^{H} {(\hat{H} {\hat{H}}^{H})}^{- 1}$ if we limit ourselves to the practically relevant case n_T ≥ n_R and assume that $\hat{H}$ is full rank. Note that x(t) is an n_R × 1 vector in this case, but the transmitted vector is Ψ x(t), which is n_T × 1.

Under these assumptions and following the same strategy and notation as in Appendix 1, we get

\begin{array}{l} y (t; \hat{H}) - y (t; H) = H {\hat{H}}^{†} x (t) + v - (H H^{†} x (t) + v) \\ = (\hat{H} {\hat{H}}^{†} - \tilde{H} {\hat{H}}^{†} - I) x (t) ≃ - \tilde{H} H^{†} x (t) . \end{array}

(34)

Consequently, a quadratic approximation of the cost function is given by

\begin{align} J_{ZF} (\tilde{H}, H) & = E \{{[y (t; \hat{H}) - y (t; H)]}^{H} [y (t; \hat{H}) - y (t; H)]\} \\ ≃ λ_{x} {vec}^{H} (\tilde{H}) ({(H^{†} {(H^{†})}^{H})}^{T} \otimes I) vec (\tilde{H}) \\ = {vec}^{H} (\tilde{H}) (I_{T}^{T} \otimes I_{R}) vec (\tilde{H}), \end{align}

(35)

if we define $I_{T} = λ_{x} H^{†} {(H^{†})}^{H} = λ_{x} H^{H} {(H H^{H})}^{- 2} H$ and $I_{R} = I$ .
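The quadratic form in (35) and the two expressions for $I_{T}$ can be verified numerically. The sketch below uses a hypothetical 2 × 3 channel (so that n_T ≥ n_R and H has full row rank with probability one) and a small hypothetical error H̃; it compares the Kronecker form with λ_x‖H̃H^†‖²_F, which is the expected squared norm of −H̃H^†x(t) for a white input with power λ_x.

```python
import numpy as np

rng = np.random.default_rng(2)
n_r, n_t, lam_x = 2, 3, 1.5
H = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
H_err = 1e-2 * (rng.standard_normal((n_r, n_t))
                + 1j * rng.standard_normal((n_r, n_t)))

H_pinv = np.linalg.pinv(H)                       # H^dagger, size n_T x n_R
I_T = lam_x * H_pinv @ H_pinv.conj().T           # lam_x H^dagger (H^dagger)^H
gram_inv = np.linalg.inv(H @ H.conj().T)
I_T_alt = lam_x * H.conj().T @ gram_inv @ gram_inv @ H   # lam_x H^H (H H^H)^{-2} H

vec_err = H_err.reshape(-1, order="F")           # column-stacking vec(.)
J_quad = np.real(vec_err.conj() @ np.kron(I_T.T, np.eye(n_r)) @ vec_err)
J_norm = lam_x * np.linalg.norm(H_err @ H_pinv, "fro") ** 2
```

Since $I_{T}$ is built from H^†, its rank cannot exceed n_R, which is the rank deficiency discussed in the remark that follows.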

Remark.

The cost functions of (27) and (28) reveal the fact that any performance-oriented training design is a compromise between the strict channel estimation accuracy and the desired accuracy related to the end performance metric at hand. Caution is needed to identify cases where the performance-oriented design may severely degrade the channel estimation accuracy, annihilating all gains from such a design. In the case of ZF precoding, if n_T > n_R, $I_{T}$ will have rank at most n_R yielding a training matrix P with only n_R active eigendirections. This is in contrast to the secondary target, which is the channel estimation accuracy. Therefore, we expect ADGPP, ASGPP, and the approaches in Subsection 4.4 to behave abnormally in this case. Thus, we propose the performance-oriented design only when n_T = n_R in the context of the ZF precoding.

6 Numerical examples

The purpose of this section is to examine the performance of optimal training sequence designs and compare them with existing methods. For the channel estimation MSE figure, we plot the normalized MSE (NMSE), i.e., $E (∥ H - \hat{H} ∥^{2} / ∥ H ∥^{2})$ , versus the accuracy parameter γ. In all figures, fair comparison among the presented schemes is ensured via training energy equalization. Additionally, the matrices R_T, R_R, S_Q, S_R follow the exponential model, that is, they are built according to

{(R)}_{i, j} = r^{j - i}, j \geq i,

(36)

where r is the (complex) normalized correlation coefficient with magnitude ρ = |r| < 1. We choose to examine the high correlation scenario for all the presented schemes. Therefore, in all plots, |r| = 0.9 for all matrices R_T, R_R, S_Q, S_R. Additionally, the transmit SNR during data transmission is chosen to be 15 dB, when channel equalization and ZF precoding are considered. High SNR expressions are therefore used for optimal training sequence designs. Since the optimal pilot sequences depend on the true channel, we have for these two applications additionally assumed that the channel changes from block to block according to the relationship $H_{i} = H_{i - 1} + μ E_{i}$ , where E_i has the same Kronecker structure as H, and it is completely independent from $H_{i - 1}$ . The estimated $H_{i - 1}$ is used in the pilot design. The value of μ is 0.01.
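A minimal sketch of the exponential model (36) is given below. Equation 36 specifies only the entries with j ≥ i; the sketch assumes that the remaining entries follow from Hermitian symmetry, which is the usual convention for covariance matrices. The function name and the phase of r are hypothetical.

```python
import numpy as np

def exponential_correlation(n, r):
    """n x n matrix with (R)_{i,j} = r^(j-i) for j >= i, completed by
    Hermitian symmetry for j < i (an assumption; (36) lists only j >= i)."""
    idx = np.arange(n)
    diff = idx[None, :] - idx[:, None]            # entry (i, j) holds j - i
    upper = np.power(r, np.abs(diff))             # r^{|j - i|}
    return np.where(diff >= 0, upper, np.conj(upper))

r = 0.9 * np.exp(1j * 0.3)    # |r| = 0.9 as in the plots; the phase is hypothetical
R = exponential_correlation(4, r)
min_eig = np.linalg.eigvalsh(R).min()
```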

In Figure 1, the channel estimation NMSE performance versus the accuracy γ is presented for three different schemes. The scheme 'ASGPP’ is the optimal Wiener filter together with the optimal guaranteed performance training matrix described in Subsection 5.1. 'Optimal MMSE’ is the scheme presented in [9], which solves the optimal training problem for the vectorized MMSE, operating on vec(Y). This solution is a special case in the statement of Theorem 5 for $I_{adm} = I$ , i.e., $I_{T} = I$ and $I_{R} = I$ . Finally, the scheme 'White training’ corresponds to the use of the vectorized MMSE filter at the receiver, with a white training matrix, i.e., one having equal singular values and arbitrary left and right singular matrices. This scheme is justified when the receiver knows the involved channel and noise statistics but does not want to sacrifice bandwidth to feed back the optimal training matrix to the transmitter. This scheme is also justifiable in fast fading environments. In Figure 1, we assume that R_R = S_R, and we implement the corresponding optimal training design for each scheme. ASGPP is implemented first for a certain value of γ, and the rest of the schemes are forced to have the same training energy. The Optimal MMSE in [9] and ASGPP schemes have the best and almost identical MSE performance. This indicates that for the problem of training design with the classical channel estimation MSE, the confidence ellipsoid relaxation of the chance constraint and the relaxation based on the Markov bound in Subsection 4.4 deliver almost identical performances. This verifies the validity of the approximations in this paper for the classical channel estimation problem.

Figures 2 and 3 demonstrate the L-optimality average performance metric E{J_W} versus γ. Figure 2 corresponds to the L-optimality criterion based on MVU estimators and Figure 3 is based on MMSE estimators. In Figure 2, the scheme 'MVU’ corresponds to the optimal training for channel estimation when the MVU estimator is used. This training is given by Theorem 4 for $I_{adm} = I$ , i.e., $I_{T} = I$ and $I_{R} = I$ . 'MVU in Subsection 4.4’ is again the MVU estimator based on the same theorem but for the correct $I_{adm}$ . The scheme 'MMSE in Subsection 4.4’ is given by the numerical solution mentioned below Theorem 5, since W₁ is different than the cases where a closed form solution is possible. Figures 2 and 3 clearly show that both the confidence ellipsoid and Markov bound approximations are better than the optimal training for standard channel estimation. Therefore, for this problem, the application-oriented training design is superior compared to training designs with respect to the quality of the channel estimate.

Figure 4 demonstrates the performance of optimal training designs for the MMSE estimator in the context of MMSE channel equalization. We assume that R_R ≠ S_R, since the high SNR expressions for $I_{adm}$ in the context of MMSE channel equalization in Appendix 1 indicate that $I_{T} = I$ for this application and according to Theorem 5 the optimal training corresponds to the optimal training for channel estimation in [8]. We observe that the curves almost coincide. Moreover, it can be easily verified that for MMSE channel equalization with the MVU estimator, the optimal training designs given by Theorems 2 and 4 differ slightly only in the optimal power loading. These observations essentially show that the optimal training designs for the MVU and MMSE estimators in the classical channel estimation setup are nearly optimal for the application of MMSE channel equalization. This relies on the fact that for this particular application, $I_{T} = I$ in the high data transmission SNR regime.

Figures 5 and 6 present the corresponding performances in the case of the ZF precoding. The descriptions of the schemes are as before. In Figure 6, we assume that R_R = S_R. The superiority of the application-oriented designs for the ZF precoding application is apparent in these plots. Here, $I_{T} \neq I$ and this is why the optimal training for the channel estimate works less well in this application. Moreover, the ASGPP is plotted for γ ≥ 0 dB, since for γ ≤ -5 dB all the eigenvalues of $B = {[c λ_{max} (S_{R} I_{R}) I_{T} - R_{T}^{- 1}]}_{+}$ are equal to zero for this particular set of parameters defining Figure 6.

Figure 7 presents an outage plot in the context of the L-optimality criterion for the MVU estimator. We assume that γ = 1. We plot Pr{ J_W > 1 / γ} versus the training power. This plot indirectly verifies that the confidence ellipsoid relaxation of the chance constraint given by the scheme ASGPP is not as tight as the Markov bound approximation given by the scheme MVU in Subsection 4.4.

Finally, Figures 8 and 9 present the BER performance of the nearest neighbor rule applied to the signal estimates produced by the corresponding schemes in Figure 6. The used modulation is quadrature phase-shift keying (QPSK). The 'Clairvoyant’ scheme corresponds to the ZF precoder with perfect channel knowledge. The channel estimates have been obtained for γ = -10 and 0 dB, respectively. Even if the application-oriented estimates are not optimized for the BER performance metric, they lead to better performance than the Optimal MMSE scheme in [9] as is apparent in Figure 8. In Figure 9, the performances of all schemes approximately coincide. This is due to the fact that for γ = 5 dB, all channel estimates are very good, thus leading to symbol MSE performance differences that do not translate to the corresponding BER performances for the nearest neighbor decision rule.

7 Conclusions

In this contribution, we have presented a quite general framework for MIMO training sequence design subject to flat and block fading, as well as spatially and temporally correlated Gaussian noise. The main contribution has been to incorporate the objective of the channel estimation into the design. We have shown that by a suitable approximation of $J (\tilde{H}, H)$ , it is possible to solve this type of problem for several interesting applications such as standard MIMO channel estimation, L-optimality criterion, MMSE channel equalization, and ZF precoding. For these problems, we have numerically demonstrated the superiority of the schemes derived in this paper. Additionally, the proposed framework is valuable since it provides a universal way of posing different estimation-related problems in communication systems. We have seen that it shows interesting promise for, e.g., ZF precoding, and it may yield even greater end performance gains in estimation problems related to communication systems, when approximations can be avoided, depending on the end performance metric at hand.

Endnotes

^a The word 'dual’ in this paper differs from the Lagrangian duality studied in the context of convex optimization theory (see [24] for more details on this type of duality).

^b For simplicity, we have assumed a zero-mean channel, but it is straightforward to extend the results to Rician fading channels, similar to [9].

^c We set the subscript Q to S_Q to highlight its temporal nature and the fact that its size is B × B. The matrices with subscript T in this paper share the common characteristic that they are n_T × n_T, while those with subscript R are n_R × n_R.

^d For a Hermitian positive semidefinite matrix A, we consider here that A^1/2 is the matrix with the same eigenvectors as A and eigenvalues as the square roots of the corresponding eigenvalues of A. With this definition of the square root of a Hermitian positive semidefinite matrix, it is clear that A^1/2 = A^H/2, leading to A = A^1/2A^H/2 = A^H/2A^1/2.

^e For convenience, we use the MATLAB notation in this table.

Appendix 1

Approximating the performance measure for MMSE channel equalization

In order to obtain the approximating set $D_{adm}$ , let us first denote the integrand in the performance metric (33) by

J^{'} (ω; \tilde{H}, H) = tr (Δ (e^{jω}; \tilde{H}, H) Φ_{y} (ω) Δ^{H} (e^{jω}; \tilde{H}, H)) .

(37)

In addition, let ≃ denote an equality in which only dominating terms with respect to $| | \tilde{H} | |$ are retained. Then, using (32), we observe that

\begin{array}{l} Δ (e^{jω}; \tilde{H}, H) = F (e^{jω}; H + \tilde{H}) - F (e^{jω}; H) \\ ≃ λ_{x} {\tilde{H}}^{H} Φ_{y}^{- 1} - λ_{x}^{2} H^{H} Φ_{y}^{- 1} (H {\tilde{H}}^{H} + \tilde{H} H^{H}) Φ_{y}^{- 1} \\ = λ_{x} (\underbrace{(I - λ_{x} H^{H} Φ_{y}^{- 1} H)}_{= Q} {\tilde{H}}^{H} Φ_{y}^{- 1} - λ_{x} H^{H} Φ_{y}^{- 1} \tilde{H} H^{H} Φ_{y}^{- 1}), \end{array}

(38)

where we omitted the argument ω for simplicity. Inserting (38) in (37) results in the approximation

\begin{align} J^{'} (ω; \tilde{H}, H) & ≃ λ_{x}^{2} tr (Q {\tilde{H}}^{H} Φ_{y}^{- 1} \tilde{H} Q \\ + λ_{x}^{2} (H^{H} Φ_{y}^{- 1} \tilde{H} H^{H} Φ_{y}^{- 1} H {\tilde{H}}^{H} Φ_{y}^{- 1} H) \\ - λ_{x} Q {\tilde{H}}^{H} Φ_{y}^{- 1} H {\tilde{H}}^{H} Φ_{y}^{- 1} H \\ - λ_{x} H^{H} Φ_{y}^{- 1} \tilde{H} H^{H} Φ_{y}^{- 1} \tilde{H} Q) . \end{align}

(39)

To rewrite this into a quadratic form in terms of $vec (\tilde{H})$ , we use the facts that tr(A B) = tr(B A) = vec^T(A^T)vec(B) = vec^H(A^H)vec(B) and vec(A B C) = (C^T ⊗ A)vec(B) for matrices A, B, and C of compatible dimensions. Hence, we can rewrite (39) as

\begin{array}{l} J^{'} (ω; \tilde{H}, H) ≃ {vec}^{H} (\tilde{H}) [λ_{x}^{2} {Q^{2}}^{T} \otimes Φ_{y}^{- 1}] vec (\tilde{H}) \\ + {vec}^{H} (\tilde{H}) [λ_{x}^{4} {(H^{H} Φ_{y}^{- 1} H)}^{T} \otimes Φ_{y}^{- 1} H H^{H} Φ_{y}^{- 1}] vec (\tilde{H}) \\ - {vec}^{H} (\tilde{H}) [λ_{x}^{3} {(Φ_{y}^{- 1} H Q)}^{T} \otimes Φ_{y}^{- 1} H] vec ({\tilde{H}}^{H}) \\ - {vec}^{H} ({\tilde{H}}^{H}) [λ_{x}^{3} {(Q H^{H} Φ_{y}^{- 1})}^{T} \otimes H^{H} Φ_{y}^{- 1}] vec (\tilde{H}) . \end{array}

(40)
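The two identities used to pass from (39) to (40) can be checked on random complex matrices. The sketch below uses the column-stacking convention for vec(·); the helper names `crandn` and `vec` and the matrix sizes are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

def crandn(*shape):
    """Random complex Gaussian array of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def vec(M):
    """Column-stacking vectorization."""
    return M.reshape(-1, order="F")

A, B, C = crandn(3, 4), crandn(4, 2), crandn(2, 3)
B_sq = crandn(4, 3)                      # so that A @ B_sq is square

# tr(A B) = vec^H(A^H) vec(B)
lhs_trace = np.trace(A @ B_sq)
rhs_trace = vec(A.conj().T).conj() @ vec(B_sq)

# vec(A B C) = (C^T kron A) vec(B)
lhs_vec = vec(A @ B @ C)
rhs_vec = np.kron(C.T, A) @ vec(B)
```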

In the next step, we introduce the permutation matrix Π defined such that $vec ({\tilde{H}}^{T}) = Π vec (\tilde{H})$ for every $\tilde{H}$ to rewrite (40) as

\begin{align} J^{'} (ω; \tilde{H}, H) ≃ {vec}^{H} (\tilde{H}) [λ_{x}^{2} {Q^{2}}^{T} \otimes Φ_{y}^{- 1}] vec (\tilde{H}) \\ + {vec}^{H} (\tilde{H}) [λ_{x}^{4} {(H^{H} Φ_{y}^{- 1} H)}^{T} \otimes Φ_{y}^{- 1} H H^{H} Φ_{y}^{- 1}] vec (\tilde{H}) \\ - {vec}^{H} (\tilde{H}) [λ_{x}^{3} {(Φ_{y}^{- 1} H Q)}^{T} \otimes Φ_{y}^{- 1} H] Π vec ({\tilde{H}}^{*}) \\ - {vec}^{H} ({\tilde{H}}^{*}) Π^{T} [λ_{x}^{3} {(Q H^{H} Φ_{y}^{- 1})}^{T} \otimes H^{H} Φ_{y}^{- 1}] vec (\tilde{H}) . \end{align}

(41)

We have now obtained a quadratic form. Note indeed that the last two terms are just complex conjugates of each other and thus we can write them as two times their real part.
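The permutation matrix Π used in (41) is the standard commutation matrix. A minimal construction, under the column-stacking convention for vec(·), is sketched below; the function name is hypothetical.

```python
import numpy as np

def commutation_matrix(n_rows, n_cols):
    """Permutation Pi with vec(M^T) = Pi vec(M) for M of size n_rows x n_cols."""
    Pi = np.zeros((n_rows * n_cols, n_rows * n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            # M[i, j] is entry i + j*n_rows of vec(M)
            # and entry j + i*n_cols of vec(M^T).
            Pi[j + i * n_cols, i + j * n_rows] = 1.0
    return Pi

rng = np.random.default_rng(9)
n_r, n_t = 3, 2
M = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
Pi = commutation_matrix(n_r, n_t)

vec_M = M.reshape(-1, order="F")
vec_MT = M.T.reshape(-1, order="F")
vec_MH = M.conj().T.reshape(-1, order="F")
```

Because vec(H̃^H) = Π vec(H̃^*), the last two terms of (40) can be expressed through vec(H̃) and its conjugate, as done in (41).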

High SNR analysis

In order to obtain a simpler expression for $I_{adm}$ , we will assume high SNR in the data transmission phase. We consider the practically relevant case where rank(H) = min(n_T, n_R). Depending on the rank of the channel matrix H, we will have three different cases:

Case 1.

rank(H) = n_R < n_T. Under this assumption, it can be shown that both the first and the second terms on the right hand side of (41) contribute to $I_{adm}$ . We have $Q \to Π_{H^{H}}^{⊥}$ and $λ_{x} Φ_{y}^{- 1} \to {(H H^{H})}^{- 1}$ for high SNR. Here, and in what follows, we use Π_X = X X^† to denote the orthogonal projection matrix on the range space of X and $Π_{X}^{⊥} = I - Π_{X}$ to denote the projection on the nullspace of X^H. Moreover, $λ_{x} H^{H} Φ_{y}^{- 1} H \to Π_{H^{H}}$ and $λ_{x}^{2} Φ_{y}^{- 1} H H^{H} Φ_{y}^{- 1} \to {(H H^{H})}^{- 1}$ for high SNR. As $Π_{H^{H}}^{⊥} + Π_{H^{H}} = I$ , summing the contributions from the first two terms in (41) finally gives the high SNR approximation

I_{adm} = λ_{x} I \otimes {(H H^{H})}^{- 1} .

(42)
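The limit (42) can be checked numerically by evaluating the first two bracketed matrices of (41) at a very low noise level. The sketch below assumes temporally white noise, Φ_n = σ²I with a hypothetical σ² = 10⁻⁸, so that the frequency average is trivial, and a hypothetical 2 × 4 channel.

```python
import numpy as np

rng = np.random.default_rng(4)
n_r, n_t, lam_x = 2, 4, 1.0
noise_var = 1e-8   # hypothetical noise level, Phi_n = noise_var * I
H = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))

gram = H @ H.conj().T
phi_y_inv = np.linalg.inv(lam_x * gram + noise_var * np.eye(n_r))
M = H.conj().T @ phi_y_inv @ H                 # H^H Phi_y^{-1} H
Q = np.eye(n_t) - lam_x * M

# First two bracketed matrices on the right hand side of (41).
term1 = lam_x ** 2 * np.kron((Q @ Q).T, phi_y_inv)
term2 = lam_x ** 4 * np.kron(M.T, phi_y_inv @ gram @ phi_y_inv)

limit = lam_x * np.kron(np.eye(n_t), np.linalg.inv(gram))   # right hand side of (42)
rel_err = np.linalg.norm(term1 + term2 - limit) / np.linalg.norm(limit)
```

In this regime `term1` carries the component on the nullspace of H and `term2` the component on the range of H^H, which is why both are needed when n_R < n_T.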

Case 2.

rank(H) = n_R = n_T. For the non-singular channel case, the second term on the right hand side of (41) dominates. Here, we have $λ_{x} H^{H} Φ_{y}^{- 1} H \to I$ and $λ_{x}^{2} Φ_{y}^{- 1} H H^{H} Φ_{y}^{- 1} \to {(H H^{H})}^{- 1}$ for high SNR. Clearly, this results in the same expression for $I_{adm}$ as in Case 1, namely

I_{adm} = λ_{x} I \otimes {(H H^{H})}^{- 1} .

(43)

Case 3.

rank(H) = n_T < n_R. In this case, the second term on the right hand side of (41) dominates. When rank(H) = n_T, we get $λ_{x} H^{H} Φ_{y}^{- 1} H \to I$ and $λ_{x}^{2} Φ_{y}^{- 1} H H^{H} Φ_{y}^{- 1} \to Φ_{n}^{- 1 / 2} {[Φ_{n}^{- 1 / 2} H H^{H} Φ_{n}^{- 1 / 2}]}^{†} Φ_{n}^{- 1 / 2}$ for high SNR. Using these approximations finally gives the high SNR approximation

\begin{align} I_{adm} = λ_{x} I \otimes (\frac{1}{2 π} \int_{- π}^{π} Φ_{n}^{- 1 / 2} {[Φ_{n}^{- 1 / 2} H H^{H} Φ_{n}^{- 1 / 2}]}^{†} Φ_{n}^{- 1 / 2} d ω) . \end{align}

Low SNR analysis

For the low SNR regime, we do not need to differentiate our analysis for the cases n_T ≥ n_R and n_T < n_R because now Φ_y → Φ_n. It can be shown that the first term on the right hand side of (41) dominates, that is, the term involving

λ_{x}^{2} ({(Q^{2})}^{T} \otimes Φ_{y}^{- 1}) .

Moreover, Q → I and $Φ_{y}^{- 1} \to Φ_{n}^{- 1}$ . This yields

I_{adm} = I \otimes (\frac{λ_{x}^{2}}{2 π} \int_{- π}^{π} Φ_{n}^{- 1} d ω) .

(44)

Appendix 2

Proof of Theorem 1

For the proof of Theorem 1, we require some preliminary results. Lemmas 1 and 2 will be used to establish the uniqueness part of Theorem 1, and Lemma 3 is an extension of a standard result in majorization theory, which is used in the main part of the proof.

Lemma 1.

Let $D \in R^{n \times n}$ be a diagonal matrix with elements d_1,1 > ⋯ > d_n,n > 0. If $U \in C^{n \times n}$ is a unitary matrix such that UDU^H has diagonal (d_1,1, …, d_n,n), then U is of the form U = diag(u_1,1, …, u_n,n), where |u_i,i| = 1 for i = 1, …, n. This also implies that UDU^H = D.

Proof.

Let V = UDU^H. The equation for (V)_i,i is

\begin{align} \sum_{k = 1}^{n} d_{k, k} | u_{i, k} |^{2} = d_{i, i} \end{align}

from which we have, by the orthonormality of the columns of U, that

\begin{align} \sum_{k = 1}^{n} \frac{d_{k, k}}{d_{i, i}} | u_{i, k} |^{2} = 1 = \sum_{k = 1}^{n} | u_{i, k} |^{2} . \end{align}

(45)

We now proceed by induction on i = 1, …, n to show that the i th column of U is [0 ⋯ 0 u_i,i 0 ⋯ 0]^T with |u_i,i| = 1. For i = 1, it follows from (45) and the fact that U is unitary that

\begin{align} | u_{1, 1} |^{2} + {|\frac{d_{2, 2}}{d_{1, 1}} u_{2, 1}|}^{2} + \dots + {|\frac{d_{n, n}}{d_{1, 1}} u_{n, 1}|}^{2} \\ = | u_{1, 1} |^{2} + \dots + | u_{n, 1} |^{2} = 1 . \end{align}

However, since d_1,1 > ⋯ > d_n,n > 0, the only way to satisfy this equation is to have |u_1,1| = 1 and u_i,1 = 0 for i = 2, …, n. Now, if the assertion holds for i = 1, …, k, the orthogonality of the columns of U implies that u_i,k+1 = 0 for i = 1, …, k, and by following a similar reasoning as for the case i = 1, we deduce that |u_k+1,k+1| = 1 and u_i,k+1 = 0 for i = k + 2, …, n. □

Lemma 2.

Let $D \in R^{N \times N}$ be a diagonal matrix with elements d_1,1 > ⋯ > d_N,N > 0. If $U \in C^{N \times n}$ , with n ≤ N, is such that U^HU = I and $V = D U {\tilde{D}}^{- 1}$ (where $\tilde{D} = diag (d_{1, 1}, \dots, d_{n, n})$ ) also satisfies V^HV = I, then U is of the form U = [diag(u_1,1, …, u_n,n) 0_N-m,n]^T, where |u_i,i| = 1 for i = 1, …, n.

Proof.

The idea is similar to the proof of Lemma 1. We proceed by induction on the i th column of V. For the first column of V we have, by the orthonormality of the columns of U and V, that

\begin{align} | u_{1, 1} |^{2} + {|\frac{d_{2, 2}}{d_{1, 1}} u_{2, 1}|}^{2} & + \dots + {|\frac{d_{N, N}}{d_{1, 1}} u_{N, 1}|}^{2} \\ = 1 \\ = | u_{1, 1} |^{2} + \dots + | u_{N, 1} |^{2} . \end{align}

Since d_1,1 > ⋯ > d_N,N > 0, the only way to satisfy this equation is to have |u_1,1| = 1 and u_i,1 = 0 for i = 2, …, N. If now the assertion holds for columns 1 to k, the orthogonality of the columns of U implies that u_i,k+1 = 0 for i = 1, …, k, and by following a similar reasoning as for the first column of U we have that |u_k+1,k+1| = 1 and u_i,k+1 = 0 for i = k + 2, …, N. □

Lemma 3.

Let $A, B \in C^{n \times n}$ be Hermitian matrices. Arrange the eigenvalues a₁, …, a_n of A in a descending order and the eigenvalues b₁, …, b_n of B in an ascending order. Then, $tr (A B) \geq \sum_{i = 1}^{n} a_{i} b_{i}$ . Furthermore, if B = diag(b₁, …, b_n) and both matrices have distinct eigenvalues, then $tr (A B) = \sum_{i = 1}^{n} a_{i} b_{i}$ if and only if A = diag(a₁, …, a_n).

Proof.

See ([34], Theorem 9.H.1.h) for the proof of the first assertion. For the second part, notice that if B = diag(b₁, …, b_n), then by ([34], Theorem 6.A.3)

\begin{align} tr (A B) = \sum_{i = 1}^{n} {(A)}_{i, i} b_{i} \geq \sum_{i = 1}^{n} {(A)}_{[i, i]} b_{i}, \end{align}

where {(A)_{[i, i]}}_{i = 1, …, n} denotes the ordered set {(A)_1,1, …, (A)_n,n} sorted in descending order. Since {(A)_{[i, i]}}_{i = 1, …, n} is majorized by {a₁, …, a_n} and the b_i’s are distinct, we can use ([34], Theorem 3.A.2) to show that

\begin{align} \sum_{i = 1}^{n} {(A)}_{[i, i]} b_{i} > \sum_{i = 1}^{n} a_{i} b_{i} \end{align}

unless (A)_{[i, i]} = a_i for every i = 1, …, n. Therefore, $tr (A B) = \sum_{i = 1}^{n} a_{i} b_{i}$ if and only if the diagonal of A is (a₁, …, a_n). Now, we have to prove that A is actually diagonal, but this follows from Lemma 1. □
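The trace inequality of Lemma 3 is easy to probe numerically. The sketch below draws two random Hermitian matrices (hypothetical test data), compares tr(AB) with the oppositely ordered eigenvalue pairing, and also evaluates the diagonal case in which the bound is attained.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4

def random_hermitian(size):
    X = rng.standard_normal((size, size)) + 1j * rng.standard_normal((size, size))
    return (X + X.conj().T) / 2

A, B = random_hermitian(n), random_hermitian(n)
a_desc = np.sort(np.linalg.eigvalsh(A))[::-1]    # eigenvalues of A, descending
b_asc = np.sort(np.linalg.eigvalsh(B))           # eigenvalues of B, ascending

trace_ab = np.real(np.trace(A @ B))
lower_bound = a_desc @ b_asc                     # sum_i a_i b_i

# Equality case: both matrices diagonal with the stated orderings.
trace_diag = np.trace(np.diag(a_desc) @ np.diag(b_asc))
```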

Proof of Theorem 1

First, we simplify the expressions in (21). Using the eigendecompositions in (23) of A and B, we see that

\begin{align} P A^{- 1} P^{H} ≽ B & \Leftrightarrow P U_{A} D_{A}^{- 1} U_{A}^{H} P^{H} ≽ U_{B} D_{B} U_{B}^{H} \\ \Leftrightarrow U_{B}^{H} P U_{A} D_{A}^{- 1} U_{A}^{H} P^{H} U_{B} ≽ D_{B} . \end{align}

Now, define $\bar{P} = U_{B}^{H} P U_{A} D_{A}^{- 1 / 2}$ and observe that

\begin{align} tr (P P^{H}) & = tr ((U_{B} \bar{P} D_{A}^{1 / 2} U_{A}^{H}) {(U_{B} \bar{P} D_{A}^{1 / 2} U_{A}^{H})}^{H}) \\ = tr (U_{B} \bar{P} D_{A} {\bar{P}}^{H} U_{B}^{H}) = tr ({\bar{P}}^{H} \bar{P} D_{A}) . \end{align}

Therefore, (21) is equivalent to

\begin{align} \begin{array}{ll} \underset{\bar{P} \in C^{n \times N}}{minimize} & tr ({\bar{P}}^{H} \bar{P} D_{A}) \\ s.t. & \bar{P} {\bar{P}}^{H} ≽ D_{B} . \end{array} \end{align}

(46)

To further simplify our problem, consider the singular value decomposition $\bar{P} = U Σ V^{H}$ , where $U \in C^{n \times n}$ and $V \in C^{N \times N}$ are unitary matrices and Σ has the structure

\begin{align} Σ = \left[\begin{array}{cccccc} σ_{1} & & 0 & 0 & \dots & 0 \\ & ⋱ & & ⋮ & & ⋮ \\ 0 & & σ_{m} & 0 & \dots & 0 \end{array}\right] \quad \text{or} \quad Σ = \left[\begin{array}{ccc} σ_{1} & & 0 \\ & ⋱ & \\ 0 & & σ_{m} \\ 0 & \dots & 0 \\ ⋮ & & ⋮ \\ 0 & \dots & 0 \end{array}\right] \end{align}

depending on whether N ≥ n or N < n. The singular values are ordered such that σ₁ ≥ ⋯ ≥ σ_m > 0. Now, observe that (46) is equivalent to

\begin{align} \begin{array}{ll} \underset{U, Σ, V}{minimize} & tr (V Σ^{H} Σ V^{H} D_{A}) \\ s.t. & U Σ Σ^{H} U^{H} ≽ D_{B} . \end{array} \end{align}

(47)

With this problem formulation, it follows (from Sylvester’s law of inertia [35]) that we need m ≥ rank(D_B) to achieve feasibility in the constraint (i.e., having at least as many non-zero singular values of Σ as non-zero eigenvalues in D_B). This corresponds to the condition N ≥ rank(B) in the theorem.

Now, we will show that U and V can be taken to be the identity matrices. Using Lemma 3, the cost function can be lower bounded as

\begin{array}{l} tr (V Σ^{H} Σ V^{H} D_{A}) \geq \sum_{j = 1}^{n} λ_{n - j + 1} (D_{A}) λ_{j} (V Σ^{H} Σ V^{H}) \\ = \sum_{j = 1}^{m} {(D_{A})}_{jj} σ_{j}^{2}, \end{array}

(48)

where λ_j(·) denotes the j th largest eigenvalue. The equality is achieved if V = I, and observe that we can select V in this manner without affecting the constraint.

To show that U can also be taken as the identity matrix, notice that the cost function in (47) does not depend on U, while the constraint implies (by looking at the diagonal elements of the inequality and recalling that U is unitary) that

\begin{align} σ_{i}^{2} \geq {(D_{B})}_{i, i}, i = 1, \dots, m, \end{align}

(49)

requiring m ≥ rank(D_B). Suppose that $\bar{U}$ and $\bar{Σ}$ minimize the cost. Then, we can replace $\bar{U}$ by I and satisfy the constraint, without affecting the cost in (48). This means that there exists an optimal solution with U = I.

With U = I and V = I, the problem (47) is equivalent (in terms of Σ) to

\begin{align} \begin{array}{l} \underset{σ_{1} \geq 0, \dots, σ_{m} \geq 0}{minimize} \sum_{i = 1}^{m} σ_{i}^{2} {(D_{A})}_{i, i} \\ s.t. σ_{i}^{2} \geq {(D_{B})}_{i, i}, i = 1, \dots, m. \end{array} \end{align}

It is easy to see that the optimal solution for this problem is $σ_{i}^{opt} = \sqrt{{(D_{B})}_{i, i}}, i = 1, \dots, m.$ By creating an optimal Σ, denoted as Σ^opt, with the singular values $σ_{1}^{opt}, \dots, σ_{m}^{opt}$ , we achieve an optimal solution

\begin{align} P^{opt} = U_{B} \bar{P} D_{A}^{1 / 2} U_{A}^{H} = U_{B} Σ^{opt} D_{A}^{1 / 2} U_{A}^{H} = U_{B} D_{P} U_{A}^{H} \end{align}

with D_P as stated in the theorem.
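The structure of the optimal solution can be illustrated in the square case n = N with positive definite A and B. The sketch below assumes the eigenvalue pairing suggested by the lower bound (48), namely the eigenvalues of B in descending order matched with those of A in ascending order; the precise ordering convention of (23) is not restated here, so this pairing should be read as an assumption. The matrices are hypothetical test data.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3   # square case n = N, chosen for simplicity

def random_pd(size):
    X = rng.standard_normal((size, size)) + 1j * rng.standard_normal((size, size))
    return X @ X.conj().T + 0.1 * np.eye(size)

A, B = random_pd(n), random_pd(n)
d_a, U_a = np.linalg.eigh(A)               # eigenvalues of A, ascending
d_b, U_b = np.linalg.eigh(B)
d_b, U_b = d_b[::-1], U_b[:, ::-1]         # eigenvalues of B, descending

# P_opt = U_B Sigma_opt D_A^{1/2} U_A^H with sigma_i^2 = (D_B)_{i,i}.
P_opt = U_b @ np.diag(np.sqrt(d_b * d_a)) @ U_a.conj().T
constraint_gap = P_opt @ np.linalg.inv(A) @ P_opt.conj().T - B
cost_opt = np.real(np.trace(P_opt @ P_opt.conj().T))

# A feasible alternative that pairs both eigenvalue sets in descending order.
P_alt = U_b @ np.diag(np.sqrt(d_b * d_a[::-1])) @ U_a[:, ::-1].conj().T
alt_gap = P_alt @ np.linalg.inv(A) @ P_alt.conj().T - B
cost_alt = np.real(np.trace(P_alt @ P_alt.conj().T))
```

Both candidates meet the constraint with equality, but the opposite pairing gives the smaller training energy, in line with Lemma 3.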

Finally, we will show how to characterize all optimal solutions for the case when A and B have distinct non-zero eigenvalues (thus, m = n). The optimal solutions need to give equality in (48) and thus Lemma 3 gives that V Σ Σ^HV^H is diagonal and equal to Σ Σ^H. Lemma 1 then implies that V = diag(v_1,1, …, v_n,n) with |v_i,i| = 1 for i = 1, …, n.

For the optimal Σ, we have that $σ_{i}^{2} = {(D_{B})}_{i, i}$ for i = 1, …, n, so the diagonal elements of U Σ Σ^H U^H - D_B are zero. Since U Σ Σ^H U^H - D_B ≽ 0 for every feasible solution of (47), and a positive semidefinite matrix with zero diagonal is the zero matrix, U has to satisfy U Σ Σ^H U^H = D_B. Lemma 2 then establishes that the first n columns of U are of the form

{[diag (u_{1, 1}, \dots, u_{n, n}) 0_{N - m, n}]}^{T},

where |u_i,i| = 1 for i = 1, …, n. Since U has to be unitary and its last N - n + 1 columns play no role in $\bar{P}$ (due to the form of Σ), we can take them as [0_n,N-m+1 I_N-m+1]^T without loss of generality.

Summarizing, an optimal solution is given by (23). When A and B have distinct eigenvalues, V and U can only multiply the columns of U_A and U_B, respectively, by complex scalars of unit magnitude.

Appendix 3

Proof of Theorems 2 and 3

Before proving Theorems 2 and 3, a lemma will be given that characterizes equivalences between different sets of feasible training matrices P.

Lemma 4.

Let $B \in C^{n \times n}$ and $C \in C^{m \times m}$ be Hermitian matrices and $f : C^{n \times N} \to C^{n \times n}$ be such that f(P) = f(P)^H. Then, the following sets are equivalent

{P | f (P) \otimes I ≽ B \otimes C} = {P | f (P) ≽ λ_{max} (C) B} .

(50)

Proof.

The equivalence will be proved by showing that the left-hand side (LHS) is a subset of the right-hand side (RHS) and vice versa. First, assume that f(P) ≽ λ_max(C)B. Then,

\begin{array}{l} f (P) \otimes I ≽ λ_{max} (C) B \otimes I \\ = (B \otimes λ_{max} (C) I) ≽ (B \otimes C) . \end{array}

(51)

Hence, RHS⊆LHS.

Next, assume that f(P) ⊗ I ≽ B ⊗ C but, for the purpose of contradiction, that f(P) ≽ λ_max(C)B does not hold. Then, there exists a vector x such that x^H(f(P) - λ_max(C)B)x < 0. Let v be an eigenvector of C that corresponds to λ_max(C) and define y = x ⊗ v. Then,

\begin{array}{l} y^{H} (f (P) \otimes I - B \otimes C) y \\ = (x^{H} f (P) x) ∥ v ∥^{2} - (x^{H} B x) (v^{H} C v) \\ = x^{H} (f (P) - λ_{max} (C) B) x ∥ v ∥^{2} < 0 \end{array}

(52)

which is a contradiction. Hence, LHS⊆RHS.

□
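The set equivalence of Lemma 4 can be checked numerically on small random matrices. The following is a minimal sketch, not part of the original proof: the helper names (`is_psd`, `lhs_holds`, `rhs_holds`, `random_hermitian`) are hypothetical, `F` stands in for f(P), and B is generated as a positive semidefinite matrix, which is what the step B ⊗ λ_max(C)I ≽ B ⊗ C in (51) relies on.

```python
import numpy as np

def is_psd(M, tol=1e-9):
    """True if the Hermitian matrix M is positive semidefinite up to tol."""
    return np.linalg.eigvalsh(M).min() >= -tol

def lhs_holds(F, B, C):
    """Membership test for the left set in (50): F (x) I >= B (x) C."""
    m = C.shape[0]
    return is_psd(np.kron(F, np.eye(m)) - np.kron(B, C))

def rhs_holds(F, B, C):
    """Membership test for the right set in (50): F >= lambda_max(C) B."""
    lam_max = np.linalg.eigvalsh(C).max()
    return is_psd(F - lam_max * B)

def random_hermitian(rng, k):
    """Random k x k Hermitian matrix (illustrative helper)."""
    G = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
    return (G + G.conj().T) / 2

rng = np.random.default_rng(0)
n, m = 3, 2
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = G @ G.conj().T                 # positive semidefinite by construction (assumption)
C = random_hermitian(rng, m)
lam_max = np.linalg.eigvalsh(C).max()

# One matrix inside both sets and one outside both sets.
F_feasible = lam_max * B + 0.5 * np.eye(n)
F_infeasible = lam_max * B - 0.1 * np.eye(n)

print(lhs_holds(F_feasible, B, C), rhs_holds(F_feasible, B, C))
print(lhs_holds(F_infeasible, B, C), rhs_holds(F_infeasible, B, C))
```

Both tests should agree for any Hermitian `F`, apart from borderline cases where the smallest eigenvalue falls within the numerical tolerance.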

Proof of Theorem 2

Rewrite the constraint as

\begin{array}{l} {\tilde{P}}^{H} {(S_{Q}^{T} \otimes S_{R})}^{- 1} \tilde{P} ≽ c I_{T}^{T} \otimes I_{R} \\ \Leftrightarrow & {(P S_{Q}^{- 1} P^{H})}^{T} \otimes S_{R}^{- 1} ≽ c I_{T}^{T} \otimes I_{R} \\ \Leftrightarrow & (P S_{Q}^{- 1} P^{H}) \otimes I ≽ c I_{T} \otimes S_{R} I_{R} . \end{array}

(53)

Let $f (P) = P S_{Q}^{- 1} P^{H}$ . Then, Lemma 4 gives that the set of feasible P is equivalent to the set of feasible P with the constraint

(P S_{Q}^{- 1} P^{H}) ≽ c λ_{max} (S_{R} I_{R}) I_{T} .

(54)

Proof of Theorem 3

In the case that R_R = S_R, the constraint can be rewritten as

{(P S_{Q}^{- 1} P^{H} + R_{T}^{- 1})}^{T} \otimes I ≽ c I_{T}^{T} \otimes S_{R} I_{R} .

(55)

With $f (P) = P S_{Q}^{- 1} P^{H} + R_{T}^{- 1}$ , Lemma 4 can be applied to achieve the equivalent constraint

\begin{array}{l} P S_{Q}^{- 1} P^{H} + R_{T}^{- 1} ≽ c λ_{max} (S_{R} I_{R}) I_{T} \\ \Leftrightarrow P S_{Q}^{- 1} P^{H} ≽ c λ_{max} (S_{R} I_{R}) I_{T} - R_{T}^{- 1} \\ \Leftrightarrow P S_{Q}^{- 1} P^{H} ≽ {[c λ_{max} (S_{R} I_{R}) I_{T} - R_{T}^{- 1}]}_{+} \end{array}

(56)

where the last equivalence follows from the fact that the left-hand side is positive semidefinite.

In the case that $R_{R}^{- 1} = I_{R}$ , the constraint can be rewritten as

\begin{array}{l} {(P S_{Q}^{- 1} P^{H})}^{T} \otimes S_{R}^{- 1} ≽ {(c I_{T} - R_{T})}^{T} \otimes I_{R} \\ \Leftrightarrow {(P S_{Q}^{- 1} P^{H})}^{T} \otimes S_{R}^{- 1} ≽ {[c I_{T} - R_{T}]}_{+}^{T} \otimes I_{R} . \end{array}

(57)

Observe that this expression is identical to the constraint in (24), except that the positive semidefinite $I_{T}$ has been replaced by ${[c I_{T} - R_{T}]}_{+}$ . Thus, the equivalence follows directly from Theorem 2.

In the case $R_{T}^{- 1} = I_{T}$ , the constraint can be rewritten as

\begin{array}{l} {(P S_{Q}^{- 1} P^{H})}^{T} \otimes S_{R}^{- 1} ≽ I_{T}^{T} \otimes (c I_{R} - R_{R}) \\ \Leftrightarrow {(P S_{Q}^{- 1} P^{H})}^{T} \otimes S_{R}^{- 1} ≽ I_{T}^{T} \otimes {[c I_{R} - R_{R}]}_{+} . \end{array}

(58)

As in the previous case, the equivalence follows directly from Theorem 2.

Appendix 4

Proof of Theorem 4

Our basic assumption is that $I_{T}, I_{R}$ are both Hermitian matrices, which is the case in the applications presented in this paper. Denoting by P^′ the matrix P^T and using the fact that^f $I_{adm} = {(I_{T}^{'} \otimes I_{R})}^{1 / 2} {(I_{T}^{'} \otimes I_{R})}^{1 / 2}$ , it can be seen that our optimization problem takes the following form

\begin{align} \begin{array}{l} \underset{P^{'} \in C^{B \times n_{T}}}{minimize} & J (H) \\ s.t. & tr (P^{'} {P^{'}}^{H}) \leq P, \end{array} \end{align}

(59)

where $J (H) = E_{\tilde{H}} \{J (\tilde{H}, H)\}$ is given by the expression

\begin{array}{l} tr \{{[I_{T^{'}}^{- 1 / 2} {P^{'}}^{H} {S^{'}}_{Q}^{- 1} P^{'} I_{T^{'}}^{- 1 / 2} \otimes I_{R}^{- 1 / 2} S_{R}^{- 1} I_{R}^{- 1 / 2}]}^{- 1}\} \\ = tr \{{[I_{T^{'}}^{- 1 / 2} {P^{'}}^{H} {S^{'}}_{Q}^{- 1} P^{'} I_{T^{'}}^{- 1 / 2}]}^{- 1} \otimes I_{R}^{1 / 2} S_{R} I_{R}^{1 / 2}\} . \end{array}

Using the fact that tr(A ⊗ B) = tr(A)tr(B) for square matrices A and B, it is clear from the last expression that the optimal training matrix can be found by minimizing

tr \{{[V_{T}^{H} I_{T^{'}}^{- 1 / 2} {P^{'}}^{H} {S^{'}}_{Q}^{- 1} P^{'} I_{T^{'}}^{- 1 / 2} V_{T}]}^{- 1}\},

(60)

where V_T denotes the modal matrix of $I_{T^{'}}$ corresponding to an arbitrary ordering of its eigenvalues. Here, we have used the invariance of the trace operator under unitary transformations.

First, note that for an arbitrary Hermitian positive definite matrix A, $tr (A^{- 1}) = \sum_{i} 1 / λ_{i} (A)$ , where λ_i(A) is the ith eigenvalue of A. Since the function 1/x is strictly convex for x > 0, tr(A^-1) is a Schur-convex function with respect to the eigenvalues of A [34]. Additionally, for any Hermitian matrix A, the vector of its diagonal entries is majorized by the vector of its eigenvalues [34]. Combining the last two results, it follows that tr(A^-1) is minimized when A is diagonal.

Therefore, we may choose the modal matrices of P^′ in such a way that $V_{T}^{H} I_{T}^{' - 1 / 2} P^{′H} S_{Q}^{' - 1} P^{'} I_{T}^{' - 1 / 2} V_{T}$ is diagonalized. Suppose that the singular value decomposition (SVD) of $P^{′H}$ is $U D_{P^{'}} V^{H}$ and that the modal matrix of $S_{Q}^{'}$ , corresponding to an arbitrary ordering of its eigenvalues, is V_Q. Setting U = V_T and V = V_Q, $V_{T}^{H} I_{T}^{' - 1 / 2} P^{′H} S_{Q}^{' - 1} P^{'} I_{T}^{' - 1 / 2} V_{T}$ is diagonalized and is given by the expression

Λ_{T}^{- 1 / 2} D_{P^{'}} Λ_{Q}^{- 1} D_{P^{'}} Λ_{T}^{- 1 / 2} .

Here, Λ_T and Λ_Q are the diagonal eigenvalue matrices containing the eigenvalues of $I_{T}^{'}$ and $S_{Q}^{'}$ , respectively, in their main diagonals. The ordering of the eigenvalues corresponds to V_T and V_Q. Clearly, by reordering the columns of V_T and V_Q, we can reorder the eigenvalues in Λ_T and Λ_Q. Assume that there are two different permutations π, ϖ such that $π ({(Λ_{T})}_{1, 1}), \dots, π ({(Λ_{T})}_{n_{T}, n_{T}})$ and ϖ((Λ_Q)_1,1), …, ϖ((Λ_Q)_B,B) minimize J(H) subject to our training energy constraint. Then, the entries of the corresponding eigenvalue matrix of $V_{T}^{H} I_{T}^{' - 1 / 2} P^{′H} S_{Q}^{' - 1} P^{'} I_{T}^{' - 1 / 2} V_{T}$ are

{(D_{P^{'}})}_{i, i}^{2} / (π ({(Λ_{T})}_{i, i}) ϖ ({(Λ_{Q})}_{i, i})), i = 1, 2, \dots, n_{T} (B \geq n_{T}) .

Setting ${(D_{P^{'}})}_{i, i}^{2} = κ_{i}, i = 1, 2, \dots, n_{T}$ , the optimization problem (59) results in

\begin{align} \begin{array}{l} \underset{π, ϖ, κ_{i}, i = 1, 2, \dots, n_{T}}{minimize} \sum_{i = 1}^{n_{T}} \frac{1}{\frac{κ_{i}}{π ({(Λ_{T})}_{i, i}) ϖ ({(Λ_{Q})}_{i, i})}} \\ s.t. \sum_{i = 1}^{n_{T}} κ_{i} \leq P \end{array} \end{align}

(61)

which leads to

\begin{align} \begin{array}{l} \underset{π, ϖ, κ_{i}, i = 1, 2, \dots, n_{T}}{minimize} \sum_{i = 1}^{n_{T}} \frac{α_{i}}{κ_{i}} \\ s.t. \sum_{i = 1}^{n_{T}} κ_{i} \leq P, \end{array} \end{align}

(62)

where α_i = π((Λ_T)_i,i) ϖ ((Λ_Q)_i,i), i = 1, 2, …, n_T. Forming the Lagrangian of the last problem, it can be seen that

{(D_{P^{'}})}_{i, i} = \sqrt{\frac{P \sqrt{α_{i}}}{\sum_{j = 1}^{n_{T}} \sqrt{α_{j}}}}, i = 1, 2, \dots, n_{T},

while the objective value equals ${(\sum_{i = 1}^{n_{T}} \sqrt{α_{i}})}^{2} / P$ . Using Lemma 3, it can be seen that π and ϖ should correspond to opposite orderings of (Λ_T)_i,i and (Λ_Q)_j,j, i = 1, 2, …, n_T, j = 1, 2, …, B, respectively. Since B can be greater than n_T, the eigenvalues of $I_{T}^{'}$ must be set in decreasing order and those of $S_{Q}^{'}$ in increasing order.
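The closed-form allocation above can be illustrated with a short sketch. It is only a worked instance under stated assumptions: the function name `theorem4_allocation` and its arguments are hypothetical, all eigenvalues are assumed strictly positive, and the decreasing/increasing pairing is read as matching the n_T eigenvalues of $I_{T}^{'}$ with the n_T smallest eigenvalues of $S_{Q}^{'}$.

```python
import math

def theorem4_allocation(lam_T, lam_Q, P):
    """Closed-form solution of (62) with the opposite orderings described above.

    lam_T: eigenvalues of I_T' (n_T values, assumed positive).
    lam_Q: eigenvalues of S_Q' (B >= n_T values, assumed positive).
    P:     training energy budget.
    Returns (kappa, objective, alpha) where kappa[i] = (D_P')_{i,i}^2.
    """
    n_T = len(lam_T)
    t = sorted(lam_T, reverse=True)   # decreasing order
    q = sorted(lam_Q)[:n_T]           # increasing order, n_T smallest values
    alpha = [ti * qi for ti, qi in zip(t, q)]
    s = sum(math.sqrt(a) for a in alpha)
    # Stationarity of the Lagrangian gives kappa_i proportional to sqrt(alpha_i).
    kappa = [P * math.sqrt(a) / s for a in alpha]
    objective = s * s / P             # equals sum(alpha_i / kappa_i)
    return kappa, objective, alpha

kappa, objective, alpha = theorem4_allocation([4.0, 1.0], [1.0, 9.0, 16.0], 10.0)
print(kappa)      # [4.0, 6.0]
print(objective)  # 2.5
```

In this toy instance, pairing the eigenvalues in the same order instead (4 with 9, 1 with 1) would give an objective of (6 + 1)^2 / 10 = 4.9, so the opposite ordering is indeed the better of the two.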

Appendix 5

Proof of Theorem 5

Using the factorization $I_{adm} = {(I_{T}^{'} \otimes I_{R})}^{1 / 2} {(I_{T}^{'} \otimes I_{R})}^{1 / 2}$ , we can see that $E \{J (\tilde{H}, H)\}$ is given by the expression

\begin{array}{l} tr \{[(I_{T}^{' - 1 / 2} R_{T}^{' - 1} I_{T}^{' - 1 / 2} \otimes I_{R}^{- 1 / 2} R_{R}^{- 1} I_{R}^{- 1 / 2}) \\ + {(I_{T}^{' - 1 / 2} P^{′H} S_{Q}^{' - 1} P^{'} I_{T}^{' - 1 / 2} \otimes I_{R}^{- 1 / 2} S_{R}^{- 1} I_{R}^{- 1 / 2})]}^{- 1}\}, \end{array}

(63)

where $R_{T}^{'} = R_{T}^{T}$ with eigenvalue decomposition $U_{T}^{'} Λ_{T}^{'} U_{T}^{′H}$ . This objective function subject to the training energy constraint $tr (P^{'} P^{′H}) \leq P$ seems very difficult to minimize analytically unless special assumptions are made.

R_R = S_R: Then, (63) becomes
$\begin{array}{l} tr \{{(I_{T}^{' - 1 / 2} R_{T}^{' - 1} I_{T}^{' - 1 / 2} + I_{T}^{' - 1 / 2} P^{′H} S_{Q}^{' - 1} P^{'} I_{T}^{' - 1 / 2})}^{- 1} \\ \otimes I_{R}^{1 / 2} R_{R} I_{R}^{1 / 2}\} . \end{array}$
(64)
Using once more the fact that tr(A ⊗ B) = tr(A) tr(B) for square matrices A and B, it is clear from (64) that the optimal training matrix can be found by minimizing
$tr \{{(R_{T}^{' - 1} + P^{′H} S_{Q}^{' - 1} P^{'})}^{- 1} I_{T}^{'}\} .$
(65)
Again, here some special assumptions may be of interest.

– $I_{T} = I$ : Then, the optimal training matrix can be found by straightforward adjustment of Proposition 2 in [8].
– $R_{T}^{- 1} = I_{T}$ : Then, (65) takes the form
$tr \{{(I + R_{T}^{' 1 / 2} P^{′H} S_{Q}^{' - 1} P^{'} R_{T}^{' 1 / 2})}^{- 1}\} .$
(66)

Using the same majorization argument as in the previous appendix for $tr (A^{- 1}) = \sum_{i} 1 / λ_{i} (A)$ and adopting the notation therein, we should select $U = U_{T}^{'}$ and V = V_Q. With these choices, the optimal power allocation problem becomes

\begin{align} \begin{array}{l} \underset{π, ϖ, κ_{i}, i = 1, 2, \dots, n_{T}}{minimize} & \sum_{i = 1}^{n_{T}} \frac{1}{1 + \frac{π ({(Λ_{T}^{'})}_{i, i}) κ_{i}}{ϖ ({(Λ_{Q})}_{i, i})}} \\ s.t. \sum_{i = 1}^{n_{T}} κ_{i} \leq P, \end{array} \end{align}

(67)

where ${(Λ_{T}^{'})}_{i, i}$ , i = 1, 2, …, n_T are the eigenvalues of $R_{T}^{'}$ . Fixing the permutations π(·) and ϖ(·), we set $γ_{i} = π ({(Λ_{T}^{'})}_{i, i}) / ϖ ({(Λ_{Q})}_{i, i})$ , i = 1, 2, …, n_T. With this notation, the problem of selecting the optimal κ_i’s becomes

\begin{align} \begin{array}{l} \underset{κ_{i}, i = 1, 2, \dots, n_{T}}{minimize} \sum_{i = 1}^{n_{T}} \frac{1}{1 + γ_{i} κ_{i}} \\ s.t. \sum_{i = 1}^{n_{T}} κ_{i} \leq P . \end{array} \end{align}

(68)

Following similar steps as in the proof of Proposition 2 in [8], we define the following parameter

\begin{array}{l} m_{*} = max \{m \in \{1, 2, \dots, n_{T}\} : \sqrt{\frac{1}{γ_{k}}} \cdot \\ \sum_{i = 1}^{m} \sqrt{\frac{1}{γ_{i}}} - \sum_{i = 1}^{m} \frac{1}{γ_{i}} < P, k = 1, 2, \dots, m\} . \end{array}

(69)

Then, it can be easily seen that for j = 1, 2, …, m_∗ the optimal (D_P′)_j,j is given by the expression

\sqrt{\frac{P + \sum_{i = 1}^{m_{*}} \frac{1}{γ_{i}}}{\sum_{i = 1}^{m_{*}} \sqrt{\frac{1}{γ_{i}}}} \sqrt{\frac{1}{γ_{j}}} - \frac{1}{γ_{j}}},

while (D_P′)_j,j = 0 for j = m_∗ + 1, …, n_T.
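For a fixed ordering, the definition of m_∗ in (69) and the allocation above can be evaluated directly. The sketch below is illustrative only: the function names `active_count` and `allocate` are hypothetical, the γ_i are assumed strictly positive, and the returned values are κ_j = (D_P′)_j,j^2 rather than the singular values themselves.

```python
import math

def active_count(gamma, P):
    """m_* as in (69): the largest m for which every k <= m satisfies the inequality."""
    m_star = 0
    for m in range(1, len(gamma) + 1):
        s_sqrt = sum(1 / math.sqrt(g) for g in gamma[:m])
        s_inv = sum(1 / g for g in gamma[:m])
        if all(s_sqrt / math.sqrt(g) - s_inv < P for g in gamma[:m]):
            m_star = m
    return m_star

def allocate(gamma, P):
    """Return (kappa, m_star) solving (68) for the given ordering of gamma."""
    m = active_count(gamma, P)
    s_sqrt = sum(1 / math.sqrt(g) for g in gamma[:m])
    s_inv = sum(1 / g for g in gamma[:m])
    level = (P + s_inv) / s_sqrt
    active = [level / math.sqrt(g) - 1 / g for g in gamma[:m]]
    return active + [0.0] * (len(gamma) - m), m

gamma = [4.0, 1.0, 0.01]
kappa, m_star = allocate(gamma, 1.0)
print(m_star)  # 2
print(kappa)   # [0.5, 0.5, 0.0]
mse = sum(1 / (1 + g * k) for g, k in zip(gamma, kappa))
print(mse)
```

In this example the third mode is too weak to receive any energy, so it contributes 1 to the objective, while the two active modes contribute 1/3 and 2/3.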

With these expressions for the optimal power allocation, the objective of (67) equals
$n_{T} - m_{*} + \frac{{(\sum_{i = 1}^{m_{*}} \frac{1}{\sqrt{γ_{i}}})}^{2}}{P + \sum_{i = 1}^{m_{*}} \frac{1}{γ_{i}}},$

and therefore, the problem of determining the optimal orderings π(·), ϖ(·) becomes

\begin{align} \begin{array}{l} \underset{π, ϖ}{minimize} & n_{T} - m_{*} + \frac{{(\sum_{i = 1}^{m_{*}} \frac{1}{\sqrt{γ_{i}}})}^{2}}{P + \sum_{i = 1}^{m_{*}} \frac{1}{γ_{i}}} . \end{array} \end{align}

(70)

The last problem seems to be difficult to solve analytically. Nevertheless, a simple numerical exhaustive search algorithm, namely Algorithm 1, can solve this problem^g.
Algorithm 1 Optimal ordering for the eigenvalues of $R_{T}^{'}$ and $S_{Q}^{'}$ , when R_R = S_R and $R_{T}^{- 1} = I_{T}$
Note that since n_T and B are small in practice, the complexity and memory requirements of the above algorithm are not critical. However, as n_T and B increase, complexity and memory become important. In this case, a good solution may be to order the eigenvalues of $R_{T}^{'}$ in decreasing order and those of $S_{Q}^{'}$ in increasing order. This can be analytically justified based on the fact that for a fixed m_∗, the objective function of problem (70), say $MSE (γ_{1}, \dots, γ_{m_{*}})$ , has negative partial derivatives with respect to γ_i, i = 1, 2, …, m_∗, and it is also symmetric, since any permutation of its arguments does not change its value. This essentially shows that a good solution should keep the largest possible γ’s active, through the selection of m_∗. Additionally, the structure of $MSE (γ_{1}, \dots, γ_{m_{*}})$ reveals that every new active γ adds something less than 1 to the MSE, while an inactive one adds 1 to the MSE. This is intuitively consistent with the spatial diversity of MIMO systems and the usual properties that optimal training matrices possess in such systems (i.e., that they tend to fully exploit the available spatial diversity). The largest possible γ’s can be achieved with a decreasing order of the eigenvalues of $R_{T}^{'}$ and an increasing order of the eigenvalues of $S_{Q}^{'}$ . In this case, it can be checked that m_∗ can be found as follows:
$\begin{array}{l} m_{*} = max \{m \in \{1, 2, \dots, n_{T}\} : \sqrt{\frac{1}{γ_{m}}} \cdot \\ \sum_{i = 1}^{m} \sqrt{\frac{1}{γ_{i}}} - \sum_{i = 1}^{m} \frac{1}{γ_{i}} < P\} . \end{array}$
If the modal matrices of R_R and S_R are the same, $I_{T} = I$ and $I_{R} = I$ , then the optimal training is given by [9], as these assumptions correspond to the problem solved therein.
In any other case (e.g., if R_R ≠ S_R), the (optimal) training can be found using numerical methods like the semidefinite relaxation approach described in [28]. Note that this approach can also handle a general $I_{adm}$ , not necessarily Kronecker-structured.
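For the case R_R = S_R and $R_{T}^{- 1} = I_{T}$ discussed above, the steps of Algorithm 1 are not reproduced in this text. The following is therefore only a hedged sketch of what an exhaustive search over the orderings π and ϖ in (70) could look like, alongside the decreasing/increasing heuristic. All function names are hypothetical, the eigenvalues are assumed strictly positive, and when B > n_T the search is assumed to consider every ordered selection of n_T eigenvalues of $S_{Q}^{'}$ .

```python
import itertools
import math

def mse_for_ordering(gamma, P):
    """Objective of (70) for a fixed ordering of gamma, with m_* taken from (69)."""
    n_T = len(gamma)
    m_star = 0
    for m in range(1, n_T + 1):
        s_sqrt = sum(1 / math.sqrt(g) for g in gamma[:m])
        s_inv = sum(1 / g for g in gamma[:m])
        if all(s_sqrt / math.sqrt(g) - s_inv < P for g in gamma[:m]):
            m_star = m
    s_sqrt = sum(1 / math.sqrt(g) for g in gamma[:m_star])
    s_inv = sum(1 / g for g in gamma[:m_star])
    return n_T - m_star + s_sqrt ** 2 / (P + s_inv)

def exhaustive_search(lam_T, lam_Q, P):
    """Try every ordering of the R_T' and S_Q' eigenvalues; return (mse, t, q)."""
    n_T = len(lam_T)
    best = None
    for t in itertools.permutations(lam_T):
        for q in itertools.permutations(lam_Q, n_T):
            gamma = [ti / qi for ti, qi in zip(t, q)]
            value = mse_for_ordering(gamma, P)
            if best is None or value < best[0]:
                best = (value, t, q)
    return best

def heuristic_ordering(lam_T, lam_Q, P):
    """Decreasing eigenvalues of R_T' paired with increasing eigenvalues of S_Q'."""
    t = sorted(lam_T, reverse=True)
    q = sorted(lam_Q)[:len(lam_T)]
    gamma = [ti / qi for ti, qi in zip(t, q)]
    return mse_for_ordering(gamma, P), tuple(t), tuple(q)

best = exhaustive_search([2.0, 0.5], [1.0, 4.0], 1.0)
heur = heuristic_ordering([2.0, 0.5], [1.0, 4.0], 1.0)
print(best[0], heur[0])
```

The number of candidates grows factorially with n_T and B, which matches the remark above that complexity matters for larger systems. In this small instance the heuristic attains the same value as the exhaustive search; the text only argues that the heuristic is a good solution, not that it is optimal in general.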

References

[1] Tarokh V, Naguib A, Seshadri N, Calderbank AR: Space-time codes for high data rate wireless communication: performance criteria in the presence of channel estimation errors, mobility, and multiple paths. IEEE Trans. Commun 1999, 47(2):199-207. 10.1109/26.752125
[2] Stoica P, Besson O: Training sequence design for frequency offset and frequency-selective channel estimation. IEEE Trans. Commun 2003, 51(11):1910-1917. 10.1109/TCOMM.2003.819199
[3] Hassibi B, Hochwald B: How much training is needed in multiple-antenna wireless links? IEEE Trans. Inf. Theory 2003, 49(4):951-963. 10.1109/TIT.2003.809594
[4] Kotecha J, Sayeed A: Transmit signal design for optimal estimation of correlated MIMO channels. IEEE Trans. Signal Process 2004, 52(2):546-557. 10.1109/TSP.2003.821104
[5] Wong T, Park B: Training sequence optimization in MIMO systems with colored interference. IEEE Trans. Commun 2004, 52(11):1939-1947. 10.1109/TCOMM.2004.836558
[6] Biguesh M, Gershman A: Training-based MIMO channel estimation: a study of estimator tradeoffs and optimal training signals. IEEE Trans. Signal Process 2006, 54(3):884-893.
[7] Liu Y, Wong T, Hager W: Training signal design for estimation of correlated MIMO channels with colored interference. IEEE Trans. Signal Process 2007, 55(4):1486-1497.
[8] Katselis D, Kofidis E, Theodoridis S: Training-based estimation of correlated MIMO fading channels in the presence of colored interference. Signal Process 2007, 87(9):2177-2187. 10.1016/j.sigpro.2007.02.016
[9] Björnson E, Ottersten B: A framework for training-based estimation in arbitrarily correlated Rician MIMO channels with Rician disturbance. IEEE Trans. Signal Process 2010, 58(3).
[10] Biguesh M, Gazor S, Shariat M: Optimal training sequence for MIMO wireless systems in colored environments. IEEE Trans. Signal Process 2009, 57(8):3144-3153.
[11] Vikalo H, Hassibi B, Hochwald B, Kailath T: On the capacity of frequency-selective channels in training-based transmission schemes. IEEE Trans. Signal Process 2004, 52(9):2572-2583. 10.1109/TSP.2004.832020
[12] Ahmed K, Tepedelenlioglu C, Spanias A: PEP-based optimal training for MIMO systems in wireless channels. Proc. IEEE ICASSP 2005, 3: 793-796.
[13] Jansson H, Hjalmarsson H: Input design via LMIs admitting frequency-wise model specifications in confidence regions. IEEE Trans. Aut. Control 2005, 50(10):1534-1549.
[14] Bombois X, Scorletti G, Gevers M, Van den Hof PMJ, Hildebrand R: Least costly identification experiment for control. Automatica 2006, 42(10):1651-1662. 10.1016/j.automatica.2006.05.016
[15] Hjalmarsson H: System identification of complex and structured systems. Plenary address European Control Conference/European Journal of Control 2009, 15(4):275-310.
[16] Kiefer J: General equivalence theory for optimum designs (approximate theory). Ann. Stat 1974, 2(5):849-879. 10.1214/aos/1176342810
[17] Ciblat P, Bianchi P, Ghogho M: Training sequence optimization for joint channel and frequency offset estimation. IEEE Trans. Signal Process 2008, 56(8):3424-3436.
[18] Kermoal J, Schumacher L, Pedersen K, Mogensen P, Fredriksen F: A stochastic MIMO radio channel model with experimental validation. IEEE J. Sel. Areas Commun 2002, 20(6):1211-1226. 10.1109/JSAC.2002.801223
[19] Yu K, Bengtsson M, Ottersten B, McNamara D, Karlsson P, Beach M: Modeling of wideband MIMO radio channels based on NLOS indoor measurements. IEEE Trans. Veh. Technol 2004, 53(3):655-665. 10.1109/TVT.2004.827164
[20] Gazor S, Rad H: Space-time frequency characterization of MIMO wireless channels. IEEE Trans. Wireless Commun 2006, 5(9):2369-2376.
[21] Rad H, Gazor S: The impact of non-isotropic scattering and directional antennas on MIMO multicarrier mobile communication channels. IEEE Trans. Commun 2008, 56(4):642-652.
[22] Werner K, Jansson M: Estimating MIMO channel covariances from training data under the Kronecker model. Signal Process 2009, 89: 1-13. 10.1016/j.sigpro.2008.06.014
[23] Kay S: Fundamentals of Statistical Signal Processing: Estimation Theory. Englewood Cliffs, New Jersey: Prentice-Hall; 1993.
[24] Rojas CR, Agüero JC, Welsh JS, Goodwin GC: On the equivalence of least costly and traditional experiment design for control. Automatica 2008, 44(11):2706-2715. 10.1016/j.automatica.2008.03.023
[25] Barenthin M, Hjalmarsson H: Identification and control: joint input design and H∞ state feedback with ellipsoidal parametric uncertainty via LMIs. Automatica 2008, 44(2):543-551. 10.1016/j.automatica.2007.06.025
[26] Bombois X, Hjalmarsson H: Optimal input design for robust H2 deconvolution filtering. In 15th IFAC Symposium on System Identification, Saint-Malo, July 2009. IEEE, Piscataway; 2009.
[27] Rojas CR, Katselis D, Hjalmarsson H, Hildebrand R, Bengtsson M: Chance constrained input design. In Proceedings of CDC-ECC, Orlando, December 2011. IEEE, Piscataway; 2011.
[28] Katselis D, Rojas CR, Hjalmarsson H, Bengtsson M: Application-oriented finite sample experiment design: a semidefinite relaxation approach. In SYSID 2012, Brussels, July 2012. IEEE, Piscataway; 2012.
[29] Gerencsér L, Hjalmarsson H, Mårtensson J: Adaptive input design for ARX systems. In European Control Conference, Kos, July 2007. IEEE, Piscataway; 2007.
[30] Paulraj A, Nabar R, Gore D: Introduction to Space-Time Wireless Communications. Cambridge: Cambridge University Press; 2003.
[31] Verdú S: Multiuser Detection. Cambridge: Cambridge University Press; 1998.
[32] Haykin S: Adaptive Filter Theory. Englewood Cliffs: Prentice-Hall; 2001.
[33] Hochwald B, Peel CB, Swindlehurst AL: A vector-perturbation technique for near-capacity multiantenna multiuser communication, part I: channel inversion and regularization. IEEE Trans. Commun 2005, 53: 195-202. 10.1109/TCOMM.2004.840638
[34] Marshall AW, Olkin I New York: Academic Press; 1979.
[35] Ostrowski A: A quantitative formulation of Sylvester’s law of inertia, II. Proc. Nat. Acad. Sci. USA 1960, 46(6):859-862. 10.1073/pnas.46.6.859

Author information

Authors and Affiliations

ACCESS Linnaeus Center, School of Electrical Engineering, KTH Royal Institute of Technology, Stockholm, SE-100 44, Sweden
Dimitrios Katselis, Cristian R Rojas, Mats Bengtsson, Emil Björnson, Nafiseh Shariati, Magnus Jansson & Håkan Hjalmarsson
Delft Center for Systems and Control, Delft University of Technology, Mekelweg 2, 2628 CD, Delft, The Netherlands
Xavier Bombois

Corresponding author

Correspondence to Dimitrios Katselis.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Katselis, D., Rojas, C.R., Bengtsson, M. et al. Training sequence design for MIMO channels: an application-oriented approach. J Wireless Com Network 2013, 245 (2013). https://doi.org/10.1186/1687-1499-2013-245

Received: 29 April 2013
Accepted: 01 October 2013
Published: 17 October 2013
DOI: https://doi.org/10.1186/1687-1499-2013-245
