Performance analysis and optimal power allocation for linear receivers based on superimposed training

Kammoun, Abla; Abed-Meraim, Karim

doi:10.1186/1687-1499-2013-227

Research
Open access
Published: 13 September 2013

Performance analysis and optimal power allocation for linear receivers based on superimposed training

Abla Kammoun¹ &
Karim Abed-Meraim²

EURASIP Journal on Wireless Communications and Networking volume 2013, Article number: 227 (2013) Cite this article

1921 Accesses
2 Citations
Metrics details

Abstract

In this paper, we derive a performance comparison between two training-based schemes for multiple-input multiple-output systems. The two schemes are the time-division multiplexing scheme and the recently proposed data-dependent superimposed pilot scheme. For both schemes, a closed-form expression for the bit error rate (BER) is provided. We also determine, for both schemes, the optimal allocation of power between the pilot and data that minimizes the BER.

1 Introduction

The use of multiple-input multiple-output (MIMO) antenna systems enables high data rates without any increase in bandwidth or power consumption. However, the good performance of the MIMO systems requires a priori knowledge of the channel at the receiver. In many practical systems, the receiver estimates the channel by time division, multiplexing pilot symbols with the data. Although high quality of the channel estimation could be achieved especially when using a large number of pilot symbols[1], this method may entail a waste of the available channel resources. An alternative method is the conventional superimposed training. It consists in transmitting pilots and data at the same time. However, since during channel estimation, the data symbols act as a source of noise, the channel estimation is affected. In the literature, the impact of channel estimation error upon the performance indexes has been investigated. In[2] and[3], a comparison between the performance of the conventional superimposed training scheme and the time-multiplexing-based scheme has been carried out. The optimal power allocation between pilot and data that maximizes a lower bound of the maximum mutual information criterion has been provided. It has been shown that the use of the optimal conventional superimposed training scheme entails a gain in terms of channel capacity only in special scenarios (many receive antennas and/or short coherence time). In other scenarios, the superimposed training scheme suffers from high channel estimation errors, and its gain over the time-multiplexing-based scheme is often lost. For this reason, many alternatives to the conventional superimposed training scheme have been proposed in recent works.

In[4], Ghogho and Swami proposed to introduce a distortion to the data symbols, prior to adding the known pilot in such a way to guarantee the orthogonality between pilot and data sequences. It is shown that the channel estimation performance is by far enhanced as compared to the standard superimposed scheme. This technique is referred to as the data-dependent superimposed training (DDST). While the DDST scheme exhibits the same channel performance as its time-division multiplexed training (TDMT) counterpart, the effect of the introduced distortion may considerably affect the detection performance. The aim of this paper is thus to study the BER performance of the DDST and TDMT schemes, and to evaluate to which extent the performance of the DDST scheme is altered.

In the literature, the few works focusing on BER performance have been based on unrealistic assumptions like the uncorrelation between the noise and channel estimation error[5, 6]. These assumptions make calculations feasible for fixed size dimensions but are far away from being realistic. To make derivations possible while keeping realistic conditions, we will relax the assumption of finite size dimensions by allowing the space and time dimensions to grow to infinity at the same rate. Working with the asymptotic regime allows us to simplify the derivations, and at the same time, we observe that the obtained results apply as well to usual sample and antenna array sizes. We show also that the obtained expressions can be used to determine the optimal power allocation that minimizes the BER.

The remainder of this paper is as follows: in the next section, we introduce the system model. After that, we review in section 3 the channel estimation and data detection processes for the TDMT and DDST schemes. Section 4 is dedicated to the derivation of the asymptotic BER expressions. Based on these results, we determine the optimal allocation of power between data and training for both schemes. Finally, simulation results are provided in section 7 to validate the analytical derivation.

The following notations are used in this paper: Superscripts H, #, and Tr(.) denote Hermitian, pseudo-inverse, and trace operators, respectively. The statistical expectation and the Kronecker product are denoted by $E$ and ⊗. The (K×K) identity matrix is denoted by I_K, and the (Q×Q) matrix of all ones by 1_Q. The (i,j)th entry of a matrix A is denoted by A_i,j.

2 System model and problem setting

2.1 Time-division multiplexing scheme

We consider a M×K MIMO system operating over a flat fading channel. Two phases are considered:

First phase: In the first phase, each transmitting antenna sends N₁ pilot symbols. The received symbol Y₁ writes as:

Y_{1} = {HP}_{t} + V_{1},

where P_t is the K×N₁ pilot matrix, and

Assumption 1

H is the M×K channel matrix with independent and identically distributed (i.i.d.) Gaussian variables with zero mean and variance $\frac{1}{K}$ .

Assumption 2

V₁ is the M×N₁ matrix whose entries are i.i.d. with variance $σ_{v}^{2}$ .

Second phase: In the second phase, N₂ data symbols with power $σ_{w_{t}}^{2}$ are sent by each antenna so that the received signal matrix Y₂ writes as:

Y_{2} = H W_{t} + V_{2},

where

Assumption 3

W_t is the K×N₂ data matrix with i.i.d. bounded data symbols of power $σ_{w_{t}}^{2}$ , and V₂ is the M×N₂ additive Gaussian noise matrix with entries of zero mean and variance $σ_{v}^{2}$ . Moreover, W_t is independent of V₁ and V₂.

2.2 Data-dependent superimposed training scheme

Another alternative to the TDMT-based schemes is to send the training and data sequences at the same time. Since the data is transmitted all the time, these schemes allow efficient bandwidth efficiency but may suffer from the interference caused by the training sequence. Ghogho and Swami[4] proposed thus to distort the data so that is becomes orthogonal to the training sequence. The proposed distortion matrix D is defined as:

D = I_{N} - J,

where $J = \frac{K}{N} 1_{\frac{N}{K}} \otimes I_{K}$ , (we assume that $\frac{N}{K}$ is the integer valued, N being the sample size). This distortion matrix was shown to be optimal in the sense that it minimizes the averaged Euclidean distance between the distorted and non-distorted data[7]. The received signal matrix at each block is therefore given by:

Y = H W_{d} (I_{N} - J) + H P_{d} + V,

where

Assumption 4

W_d is the data matrix with i.i.d. bounded data symbols of power $σ_{w_{d}}^{2}$ , and V is the M×N matrix whose entries are i.i.d. zero mean with variance $σ_{v}^{2}$ .

Moreover, P_d is the K×N training matrix. The chosen pilot matrix P_d should fulfill two requirements. It should be orthogonal to the distortion matrix D, thus satisfying D P d H=0, and also verify the orthogonality relation $P_{d} P_{d}^{H} = N σ_{P_{d}}^{2} I_{K}$ in order to minimize the channel estimation error subject to a fixed training power. A possible pilot matrix that meets these requirements is

Assumption 5

\begin{array}{l} P_{d} (k, n) = σ_{P_{d}} exp (j 2 πkn / K) with k = 0, \dots, K - 1 \\ and n = 0, \dots, N - 1 . \end{array}

(1)

3 Channel estimation and data detection

3.1 TDMT scheme

In the first phase, we assume that the receiver estimates the channel in the least-square sense. Hence, the channel estimate is given by

\begin{align} {\hat{H}}_{t} & = Y_{1} P_{t}^{H} {(P_{t} P_{t}^{H})}^{- 1} \\ = H + V_{1} P_{t}^{H} {(P_{t} P_{t}^{H})}^{- 1} \\ = H + Δ H_{t}, \end{align}

where Δ H_t=V₁P t H(P_tP t H)⁻¹. Thus, the mean square error (MSE) is written as

{MSE}_{t} = M σ_{v}^{2} tr {(P_{t} P_{t}^{H})}^{- 1} .

As it has been shown in[1], the optimal training matrix that minimizes the MSE under a constant training energy $N_{1} σ_{P_{t}}^{2}$ should satisfy

Assumption 6

P_{t} P_{t}^{H} = N_{1} σ_{P_{t}}^{2} I_{K},

where $σ_{P_{t}}^{2}$ denotes the amount of power devoted to the transmission of a pilot symbol. The optimal minimum value for the MS E_t is then given by

{MSE}_{t} = \frac{KM σ_{v}^{2}}{N_{1} σ_{P_{t}}^{2}} .

In the data transmission phase, the linear receiver uses the channel estimate in order to retrieve the transmitted data. After channel inversion, the estimated data matrix is given by

{\hat{W}}_{t} = {({\hat{H}}_{t})}^{#} Y_{2},

where ${({\hat{H}}_{t})}^{#}$ denotes the pseudo-inverse matrix of ${\hat{H}}_{t}$ . Assuming that the channel estimation error is small, the pseudo-inverse of the estimated matrix can be approximated by the linear part of the Taylor expansion as[8]:

{({\hat{H}}_{t})}^{#} = H^{#} - H^{#} Δ H_{t} H^{#} + H^{#} {(H^{#})}^{H} Δ H_{t} (I_{M} - {HH}^{#}) .

(2)

Substituting H^# by (H^HH)⁻¹H^H in (2), we obtain

{({\hat{H}}_{t})}^{#} = H^{#} - H^{#} Δ H_{t} H^{#} + {(H^{H} H)}^{- 1} Δ H_{t}^{H} Π,

where Π=I_M−H(H^HH)⁻¹H^H is the orthogonal projector on the null space of H. Hence, the zero-forcing estimate of the transmitted matrix can be expressed as

\begin{array}{l} {\hat{W}}_{t} = W_{t} - H^{#} Δ H_{t} W_{t} + (H^{#} - H^{#} Δ H_{t} H^{#}) V_{2} \\ + {(H^{H} H)}^{- 1} {(Δ H_{t})}^{H} Π V_{2} . \end{array}

Consequently, the effective post-processing noise $Δ W_{t} = {\hat{W}}_{t} - W_{t}$ could be written as

\begin{array}{l} Δ W_{t} = - H^{#} Δ H_{t} W_{t} + (H^{#} - H^{#} Δ H_{t} H^{#} \\ + {(H^{H} H)}^{- 1} {(Δ H_{t})}^{H} Π) V_{2} . \end{array}

3.2 DDST scheme

The LS channel estimate is obtained by multiplying Y by P d H(P_dP d H)⁻¹, thus giving

{\hat{H}}_{d} = Y P_{d}^{H} {(P_{d} P_{d}^{H})}^{- 1} = H + V P_{d}^{H} {(P_{d} P_{d}^{H})}^{- 1} = H + Δ H_{d},

where Δ H_d=V P d H(P_dP d H)⁻¹ denotes the channel estimation error matrix for the DDST scheme. As aforementioned above in Assumption 1, the optimal training matrix that minimizes the MSE should satisfy

P_{d} P_{d}^{H} = N σ_{P_{d}}^{2} I_{K} .

The MSE is thus given by:

{MSE}_{d} = M σ_{v}^{2} tr {(P_{d} P_{d}^{H})}^{- 1} = \frac{KM σ_{v}^{2}}{N σ_{P_{d}}^{2}} .

For the DDST scheme, we consider a zero-forcing receiver which, prior to inverting the channel matrix, cancels the contribution of the training symbols by right multiplying Y by (I−J), where

Y = H W_{d} (I_{N} - J),

the matrix W_d being the sent data matrix. Thus, the zero-forcing estimate of W_d is given by

\begin{array}{l} {\hat{W}}_{d} = {({\hat{H}}_{d})}^{#} Y (I - J) \\ = (H^{#} - H^{#} Δ H_{d} H^{#} + {(H^{H} H)}^{- 1} Δ H_{d}^{H} Π) \\ \times ({HW}_{d} (I - J) + V (I - J)) \\ = (I - H^{#} Δ H_{d}) W_{d} (I - J) \\ + (H^{#} - H^{#} Δ H_{d} H^{#}) V (I - J) \\ + {(H^{H} H)}^{- 1} Δ H_{d}^{H} Π V (I - J) \\ = W_{d} (I - J) - H^{#} Δ H_{d} W_{d} (I - J) \\ + (H^{#} - H^{#} Δ H_{d} H^{#}) V (I - J) \\ + {(H^{H} H)}^{- 1} Δ H_{d}^{H} Π V (I - J) \\ = W_{d} + (- W_{d} J - H^{#} Δ H_{d} W (I - J) \\ + (H^{#} - H^{#} Δ H_{d} H^{#}) V (I - J)) \\ + {(H^{H} H)}^{- 1} Δ H_{d}^{H} Π V (I - J) . \end{array}

Hence,

\begin{array}{l} Δ W_{d} = - W_{d} J - H^{#} Δ H_{d} W_{d} (I - J) \\ + (H^{#} - H^{#} Δ H_{d} H^{#}) V (I - J) \\ + {(H^{H} H)}^{- 1} Δ H_{d}^{H} Π V (I - J) . \end{array}

4 Bit error rate performance

4.1 TDMT scheme

In order to evaluate the bit error rate performance, we need to evaluate the asymptotic behavior of the post-processing noise observed at each entry of matrix Δ W_t. Using the ‘characteristic function’ approach, we can prove that conditioned on the channel matrix, the noise behaves asymptotically like a Gaussian random variable. This result is stated in the following theorem, but its proof is shown in Appendix 1.

Theorem 1

Under Assumptions 1, 2, 3, and 6, and under the asymptotic regime defined as

\begin{array}{l} M, K, N_{1}, N_{2} \to + \infty with \\ \frac{K}{N_{1} + N_{2}} \to c_{1}, 0 < c_{1} < 1 \frac{M}{K} \to c_{2} > 1 and \frac{N_{2}}{N_{1}} \to r, \end{array}

the post-processing noise experienced by the i th antenna at each time k, Δ W_t(i,k), for the TDMT scheme behaves in the asymptotic regime as a Gaussian random variable.

E [e^{j ℜ (z^{*} Δ W_{t} (i, k))}] - e^{- \frac{σ_{w_{t}}^{2} δ_{t} {[{(H^{H} H)}^{- 1}]}_{i, i} | z |^{2}}{4}} \to_{}^{K \to + \infty} 0,

where

δ_{t} = c_{1} (1 + r) \frac{σ_{v}^{2}}{σ_{P_{t}}^{2}} + \frac{σ_{v}^{2}}{σ_{w_{t}}^{2}} + \frac{c_{1} (1 + r) (c_{2} + 1) σ_{v}^{4}}{σ_{w_{t}}^{2} σ_{P_{t}}^{2} (c_{2} - 1)},

and K→+ ∞ refers to this asymptotic regime.

Remark 1

Note that as compared to the results in[9], our results make appear a new additive term of order $σ_{v}^{4}$ .

With the Gaussianity of the post-processing noise being verified in the asymptotic case, we can derive the bit error rate for QPSK constellation and Gray encoding as[10]

BER = E Q (\sqrt{x}),

(3)

where the expectation is taken with respect to the probability density function of the post processing SNR at the i th branch defined as

γ_{t} = \frac{1}{δ_{t} {[{(H^{H} H)}^{- 1}]}_{i, i}} .

From[11] and[12], we know that $\frac{1}{{[{(H^{H} H)}^{- 1}]}_{i, i}}$ is a weighted chi-square distributed random variable with 2(M−K+1) degrees of freedom, whose density function is given by

f (x) = \frac{K^{M - K + 1} x^{M - K} e^{- Kx}}{(M - K)!} 1_{[0, + \infty[},

where 1_[0,+∞[ is the indicator function corresponding to the interval [0,+∞[. Hence, the probability density function of γ_t is given by

f_{γ_{t}} (x) = \frac{{(K δ_{t})}^{M - K + 1} x^{M - K} exp (- K δ_{t} x)}{(M - K)!} 1_{[0, + \infty[} .

(4)

Plugging (4) into (3), we get:

{BER}_{t} = \frac{{(K δ_{t})}^{M - K + 1}}{(M - K)!} \int_{0}^{+ \infty} x^{M - K} exp (- K δ_{t} x) Q (\sqrt{x}) dx.

(5)

To compute (5), we use the following integral function:

J (m, a, b) = \frac{a^{m}}{Γ (m)} \int_{0}^{+ \infty} exp (- ax) x^{m - 1} Q (\sqrt{bx}) dx.

(6)

The BER is, therefore, equal to

BER = J (M - K + 1, K δ_{t}, 1) .

(7)

The integral in (6) has been shown to have, for c>0, the following closed-form expression[13]:

\begin{array}{l} J (m, a, b) = {\frac{\sqrt{c / π} Γ (m + \frac{1}{2})}{2 {(1 + c)}^{m + \frac{1}{2}} Γ (m + 1)}}_{2} \\ \times F_{1} (1, m + \frac{1}{2}; m + 1; \frac{1}{1 + c}), c = \frac{b}{2 a}, \end{array}

where ₂F₁(p,q;n,z) is the Gauss hypergeometric function[14]. If c=0 equivalently b=0, it is easy to note that J(m,a,0) is equal to $\frac{1}{2}$ . When m is restricted to positive integer values, the above equation can be further simplified to[15]

J (m, a, b) = \frac{1}{2} [1 - μ \sum_{k = 0}^{m - 1} (\binom{2 k}{k}) {(\frac{1 - μ^{2}}{4})}^{k}],

(8)

where $μ = \sqrt{\frac{c}{1 + c}}$ .

Plugging (8) into (7), we get

{BER}_{t} = \frac{1}{2} [1 - μ_{t} \sum_{k = 0}^{M - K} (\binom{2 k}{k}) {(\frac{1 - μ_{t}^{2}}{4})}^{k}],

(9)

where $μ_{t} = \sqrt{\frac{1}{2 K δ_{t} + 1}}$ .

4.2 DDST scheme

Unlike the TDMT scheme, the asymptotic distribution of entries of the post-processing noise matrix is not Gaussian. Actually, we prove that

Theorem 2

Under assumptions 4, 5, and under the asymptotic regime defined as

\frac{K}{N} \to c_{1}, 0 < c_{1} < 1 with \frac{M}{K} \to c_{2} > 1,

the post-processing noise experienced by the i th antenna at each time k behaves asymptotically as a Gaussian mixture random variable, i.e.,

\begin{array}{l} E [exp (ȷ ℜ (z^{*} {[Δ W_{d}]}_{i, k}))] \\ - \sum_{i = 1}^{Q} p_{i} exp (ȷ ℜ (z^{*} α_{i})) \\ \times exp (- \frac{| z |^{2} δ_{d} σ_{w_{d}}^{2} {[{(H^{H} H)}^{- 1}]}_{i, i}}{4}) \to_{}^{K \to \infty} 0, \end{array}

(10)

where

\begin{array}{l} δ_{d} = (1 - c_{1}) (\frac{c_{1} σ_{v}^{2}}{σ_{P_{d}}^{2}} + \frac{σ_{v}^{2}}{σ_{w_{d}}^{2}} + \frac{c_{1} σ_{v}^{4} (c_{2} + 1)}{(c_{2} - 1) σ_{P_{d}}^{2} σ_{w_{d}}^{2}}), \end{array}

(11)

and $Q$ is the cardinal of the set of all possible values of ${[\bar{W}]}_{i, k} = c_{1} \sum_{k = 1}^{\frac{1}{c_{1}}} {[W_{d}]}_{i, k}$ , and p_i is the probability that ${[\bar{W}]}_{i, k}$ takes the value α_i.

We can also prove that conditioning on the fact that ${[W]}_{i, k} = ε_{1} \sqrt{\frac{σ_{w_{d}}^{2}}{2}} + ȷ ε_{2} \sqrt{\frac{σ_{w_{d}}^{2}}{2}}$ , where ε₁=±1 and ε₂=±1, the post-processing noise satisfies

\begin{array}{l} E [exp (ȷ ℜ (z^{*} {[Δ W_{d}]}_{i, k})) | {[W]}_{i, k} = (ε_{1} + ȷ ε_{2}) \sqrt{\frac{σ_{w_{d}}^{2}}{2}}] \\ - \sum_{i = 1}^{Q^{^{'}}} p_{i}^{^{'}} exp (ȷ ℜ (z^{*} (- c_{1} (ε_{1} + ȷ ε_{2}) \sqrt{\frac{σ_{w_{d}}^{2}}{2}} + α_{i}^{^{'}}))) \\ \times exp (- \frac{| z |^{2} δ_{d} σ_{w_{d}}^{2} {[{(H^{H} H)}^{- 1}]}_{i, i}}{4}) \to_{}^{K \to \infty} 0, \end{array}

(12)

where $Q^{^{'}}$ is the cardinal of the set of all possible values ${\bar{\bar{W}}}_{i} = c_{1} \sum_{l = 1}^{\frac{1}{c_{1}} - 1} {[W]}_{i, l}$ , and $p_{i}^{^{'}}$ is the probability that ${\bar{\bar{W}}}_{i}$ takes the value $α_{i}^{^{'}}$ .

Proof

See Appendix 2. □

The assumption of the Gaussianity of the post-processing noise has been always assumed. For time-division multiplexed training, this assumption is well founded, since the post-processing noise converges to a Gaussian distribution in the asymptotic regime (see Theorem 1).

In the superimposed training case, the distortion caused by the presence of data symbols affects the distribution of the post-processing noise which becomes asymptotically Gaussian mixture distributed. To assess the system performance in this particular case, we will start from the elementary definition of the bit error rate. Let Δ W_i,k denote the post-processing noise experienced at the i th antenna at time k (we omit the subscript d for ease of notations). As it has been previously shown that Δ W_i,k behaves as a Gaussian mixture random variable. Let $σ_{d}^{2}$ be the asymptotic variance of Δ W_i,k, i.e., $σ_{d}^{2} = σ_{w_{d}}^{2} δ_{d} {[{(H^{H} H)}^{- 1}]}_{i, i}$ .

Using the symmetry of the transmitted data, the BER expression at the i th branch under QPSK constellation and for a given channel realization is given by

\begin{array}{l} {BER}_{i} = \frac{1}{2} P [ℜ ({\hat{W}}_{i, k}) > 0 | ℜ (W_{i, k}) = - \sqrt{\frac{σ_{w}^{2}}{2}}] \\ + \frac{1}{2} P [ℜ ({\hat{W}}_{i, k}) < 0 | ℜ (W_{i, k}) = \sqrt{\frac{σ_{w}^{2}}{2}}] \\ = \frac{1}{2} P [ℜ (Δ W_{i, k}) > \sqrt{\frac{σ_{w}^{2}}{2}}] \\ + \frac{1}{2} P [ℜ (Δ W_{i, k}) < - \sqrt{\frac{σ_{w}^{2}}{2}}] \end{array}

In the asymptotic regime, (Δ W_i,k) converges to a mixed Gaussian distribution with the probability density function

\begin{matrix} f (x) = \frac{1}{\sqrt{π σ_{d}^{2}}} \sum_{s = 1}^{\sqrt{Q^{^{'}}}} p_{s} exp (- \frac{{(x + c_{1} ε \sqrt{\frac{σ_{w_{d}}^{2}}{2}} - ℜ (α_{s}))}^{2}}{σ_{d}^{2}}) . \end{matrix}

Hence, conditioned on the channel, the asymptotic bit error rate can be approximated by

\begin{array}{l} {BER}_{i, d} = \frac{1}{2} \frac{1}{\sqrt{π σ_{w_{d}}^{2}}} \int_{\sqrt{\frac{σ_{w_{d}}^{2}}{2}}}^{+ \infty} \sum_{s = 1}^{\sqrt{Q^{^{'}}}} p_{s}^{^{'}} \\ \times exp (- \frac{{(x - c_{1} \sqrt{\frac{σ_{w_{d}}^{2}}{2}} - ℜ (α_{s}))}^{2}}{σ_{w_{d}}^{2}}) dx \\ + \frac{1}{2} \frac{1}{\sqrt{π σ_{w_{d}}^{2}}} \int_{- \infty}^{- \sqrt{\frac{σ_{w_{d}}^{2}}{2}}} \sum_{s = 1}^{\sqrt{Q^{^{'}}}} p_{s}^{^{'}} \\ \times exp (- \frac{{(x + c_{1} \sqrt{\frac{σ_{w_{d}}^{2}}{2}} - ℜ (α_{s}))}^{2}}{σ_{w_{d}}^{2}}) dx. \end{array}

Finally, the proposed approximation of the BER can be obtained by averaging with respect to the channel realization H, thus giving

\begin{array}{l} {BER}_{d} = E \frac{1}{2} \sum_{s = 1}^{\sqrt{Q^{^{'}}}} p_{s}^{^{'}} Q (\sqrt{\frac{σ_{w}^{2}}{σ_{w_{d}}^{2}}} (1 - c_{1}) - \frac{ℜ (α_{s})}{\sqrt{\frac{σ_{w_{d}}^{2}}{2}}}) \\ + \frac{1}{2} \sum_{s = 1}^{\sqrt{Q^{^{'}}}} p_{s}^{^{'}} Q (\sqrt{\frac{σ_{w_{d}}^{2}}{σ_{w_{d}}^{2}}} (1 - c_{1}) + \frac{ℜ (α_{s})}{\sqrt{\frac{σ_{w_{d}}^{2}}{2}}}) . \end{array}

For QPSK constellations, it can be shown that $\sqrt{Q^{^{'}}} = \frac{1}{c_{1}}$ , where $\frac{1}{c_{1}} = \frac{N}{K}$ is assumed to be integer. Moreover, the set $S$ of the values taken by ℜ(α_s) can be given by

\begin{matrix} S = \{ℜ (α_{s}) = c_{1} \sqrt{\frac{σ_{w_{d}}^{2}}{2}} (\frac{1}{c_{1}} - 2 s - 1), s \in \{0, \dots, \frac{1}{c_{1}} - 1\}\} . \end{matrix}

with probability $p_{s} = \frac{(\binom{\frac{1}{c_{1}} - 1}{s})}{2^{\frac{1}{c_{1}} - 1}}$ .

Let $γ_{d} = \frac{σ_{w_{d}}^{2}}{σ_{d}}$ , then the BER expression becomes

\begin{align} {BER}_{d} & = E \sum_{s = 0}^{\frac{1}{c_{1}} - 1} \frac{(\binom{\frac{1}{c_{1}} - 1}{s})}{2^{\frac{1}{c_{1}} - 1}} Q (2 {sc}_{1} \sqrt{γ_{d}}), \end{align}

(13)

where the expectation is taken over the distribution of γ_d given by

f_{γ_{d}} (x) = \frac{{(K δ_{d})}^{M - K + 1} x^{M - K}}{(M - K)!} exp (- K δ_{d} x) .

The computation of the BER can be treated similarly to the TDMT scheme, thus leading to

{BER}_{d} = \frac{1}{2^{\frac{1}{c_{1}} - 1}} \sum_{s = 0}^{\frac{1}{c_{1}} - 1} (\binom{\frac{1}{c_{1}} - 1}{s}) J (M - K + 1, K δ_{d}, 4 s^{2} c_{1}^{2}) .

(14)

5 Optimal power allocation

So far, we have provided the approximations of the BER for the TDMT and DDST schemes. As it has been previously shown, these expressions depend on the power allocated to data and training, in addition to other parameters. While the system has no control over the noise power or the number of transmitting and receiving antennas, it still can optimize the power allocation in such a way to minimize this performance index. Next, we provide for the TDMT and DDST schemes the optimal data and training power amounts that minimize the BER under the constraint of a constant total power.

5.1 Optimal power allocation for the TDMT scheme

Referring to the expressions of BER, we can easily see that the optimal amount of power allocated to data and pilot for the TDMT scheme is the one that minimizes δ_t. Let ${\tilde{c}}_{1} = (1 + r) c_{1}$ , then minimizing δ_t with respect to $σ_{w_{t}}^{2}$ and $σ_{P_{t}}^{2}$ under the constraint that $N_{1} σ_{P_{t}}^{2} + N_{2} σ_{w_{t}}^{2} = (N_{1} + N_{2}) σ_{T}^{2}$ ( $σ_{T}^{2}$ being the mean energy per symbol) results in the following lemma:

Lemma 1

The optimal power allocation minimizing the BER under

\begin{align} σ_{w_{t}}^{2} & = \frac{(1 + r) σ_{T}^{2} \sqrt{r ((1 + r) σ_{T}^{2} + \frac{{\tilde{c}}_{1} σ_{v}^{2} (c_{2} + 1)}{c_{2} - 1})}}{r (\sqrt{r ((1 + r) σ_{T}^{2} + \frac{{\tilde{c}}_{1} σ_{v}^{2} (c_{2} + 1)}{c_{2} - 1})} + \sqrt{{\tilde{c}}_{1} (((1 + r) σ_{T}^{2} + \frac{r σ_{v}^{2} (c_{2} + 1)}{c_{2} - 1})})}, \end{align}

(15)

\begin{align} σ_{P_{t}}^{2} & = \frac{r (1 + r) σ_{T}^{2} \sqrt{{\tilde{c}}_{1} ((1 + r) σ_{T}^{2} + \frac{r σ_{v}^{2} (c_{2} + 1)}{c_{2} - 1})}}{r (\sqrt{r ((1 + r) σ_{T}^{2} + \frac{{\tilde{c}}_{1} σ_{v}^{2} (c_{2} + 1)}{c_{2} - 1})} + \sqrt{{\tilde{c}}_{1} ((1 + r) σ_{T}^{2} + \frac{r σ_{v}^{2} (c_{2} + 1)}{c_{2} - 1})})} . \end{align}

(16)

5.2 Optimal power allocation for the DDST scheme

For the DDST scheme, we can deduce from (13) that maximizing γ_d leads to minimize the BER. To maximize γ_d, we need to optimize δ_d as a function of $σ_{w_{d}}^{2}$ and under the constraint that $σ_{P_{d}}^{2} + (1 - c_{1}) σ_{w_{d}}^{2} = σ_{T}^{2}$ . After straightforward calculations, we can find that the optimal values for $σ_{w_{d}}^{2}$ and $σ_{P_{d}}^{2}$ are given by

Lemma 2

Under the data model, the optimal power allocation minimizing the BER under a total power constraint $σ_{T}^{2}$ is given by

\begin{align} σ_{w_{d}}^{2} & = \frac{\sqrt{(1 - c_{1}) (σ_{T}^{2} + \frac{c_{1} (c_{2} + 1) σ_{v}^{2}}{c_{2} - 1})} σ_{T}^{2}}{(1 - c_{1}) (\sqrt{(1 - c_{1}) (σ_{T}^{2} + \frac{c_{1} (c_{2} + 1) σ_{v}^{2}}{c_{2} - 1})} + \sqrt{c_{1} σ_{T}^{2} + \frac{c_{1} (c_{2} + 1) (1 - c_{1}) σ_{v}^{2}}{c_{2} - 1}})}, \end{align}

(17)

\begin{align} σ_{P_{d}}^{2} & = \frac{\sqrt{c_{1} σ_{T}^{2} + \frac{c_{1} (c_{2} + 1) (1 - c_{1}) σ_{v}^{2}}{c_{2} - 1}} σ_{T}^{2}}{\sqrt{(1 - c_{1}) (σ_{T}^{2} + \frac{c_{1} (c_{2} + 1) σ_{v}^{2}}{c_{2} - 1})} + \sqrt{c_{1} σ_{T}^{2} + \frac{c_{1} (c_{2} + 1) (1 - c_{1}) σ_{v}^{2}}{c_{2} - 1}}} . \end{align}

(18)

6 Discussion

To get more insight into the proposed analysis, we provide here some comments and workouts on the theoretical results derived in the previous sections.

6.1 High SNR behavior of the BER

At high SNRs, the error variance parameters δ_t and δ_d are close to zero, and hence, using a first-order Taylor expansion of the BER expressions in (9) and (14), we obtain

\begin{align} {BER}_{t} & \approx \frac{1}{2^{M - K + 1}} {(K δ_{t})}^{M - K + 1} (\binom{2 (M - K) + 1}{M - K + 1}) \end{align}

(19)

\begin{align} {BER}_{d} & \approx \frac{1}{2^{\frac{1}{c_{1}}}} + O ({(K δ_{d})}^{M - K + 1}), \end{align}

(20)

where O(x) denotes a real value of the same order of magnitude as x. From these approximated expressions, one can observe that the BER at the TDMT scheme is a monomial function of the estimation error variance parameter δ and the number of transmitters K. For example, if the noise power is decreased by a factor 2, then the BER will decrease by 2^M−K+1. The diversity gain is thus equal to M−K+1, which is in accordance with the works in[16] and[5]. Also, we observe that for the DDST case, we have a floor effect on the BER (i.e., the BER is lower bounded by $\frac{1}{2^{\frac{1}{c_{1}}}}$ ) due to the data distortion inherent to this transmission scheme.

6.2 Gaussian vs. Gaussian mixture model

In our derivations, we have found that the post-processing noise in the DDST case behaves asymptotically as a Gaussian mixture process, while in most of the existing works, the noise is assumed to be asymptotically Gaussian distributed. In fact, one can show that for large sample sizes (i.e., when c₁→0), the Gaussian mixture converges to a Gaussian distribution, allowing us to retrieve the standard Gaussian noise assumption. However, for small or moderate sample sizes, the considered Gaussian mixture model leads to a much better approximation of the BER analytical expression than the one we would obtain with a post-processing Gaussian noise model. In other words, Theorem 2 results allow us to derive closed-form expressions for the BER that are valid for relatively small sample sizes.

6.3 Workouts on the optimal power allocation expressions of the TDMT scheme

We consider here two limit cases: (1) The high SNR case, where $σ_{v}^{2} ≪ σ_{T}^{2}$ and (2) the case of high-dimensional system (the number of transmit antennae is of the same order of magnitude as the number of receiver antennae), where c₂−1≪1. From (16), the data-to-pilot power ratio can then be approximated by

\begin{align} case (1) \frac{σ_{w_{t}}^{2}}{σ_{P_{t}}^{2}} & \approx \frac{N_{1}}{\sqrt{N_{2} K}} \end{align}

(21)

\begin{align} case (2) \frac{σ_{w_{t}}^{2}}{σ_{P_{t}}^{2}} & \approx \frac{N_{1}}{N_{2}} . \end{align}

(22)

Equation (21) shows that the optimal power allocation in the high SNR case realizes a kind of trade-off between the pilot size and its power, such that the total energy $N_{1} σ_{P_{t}}^{2}$ is kept constant. This suggests us to use the smallest possible pilot size that meets the technical constraint of limited transmit power, to increase the effective channel throughput without loss of performance.

Equation (21) shows that in the difficult case of large dimensional system, one needs to allocate the same total energy to pilots and to data symbols, i.e., $N_{1} σ_{P_{t}}^{2} \approx N_{2} σ_{w_{t}}^{2}$ . In other words, we should give similar importance (in terms of power allocation) to the channel estimation and to the data detection.

6.4 Workouts on the optimal power allocation expressions of the DDST scheme

A similar workout is considered here for the DDST scheme. We consider the two previous limit cases, and we assume that the sample size is much larger than the number of transmitters, i.e., N≫K. In this context, we obtain the following approximations for the data-to-pilot power ratio:

\begin{align} case (1) \frac{σ_{w_{d}}^{2}}{σ_{P_{d}}^{2}} & \approx \sqrt{\frac{N}{K}} \end{align}

(23)

\begin{align} case (2) \frac{σ_{w_{t}}^{2}}{σ_{P_{t}}^{2}} & \approx 1 . \end{align}

(24)

Again, we observe that for the large-dimensional system case, one needs to allocate the same total energy to pilot and to the data. For high SNRs, one observes a kind of trade-off between the pilot power and size, but in a different way than the TDMT case. In fact, if we increase by a factor of 4 the sample size, one can increase the data-to-pilot power ratio by a factor of 2 without affecting the BER performance.

6.5 High SNR BER comparison of the two pilot design schemes

For the DDST scheme, the BER expression can be lower bounded as follows (using the convexity of $Q (\sqrt{bx})$ as a function of b):

\begin{align} {BER}_{d} & = \frac{1}{2^{\frac{1}{c_{1}} - 1}} \sum_{s = 0}^{\frac{1}{c_{1}} - 1} (\binom{\frac{1}{c_{1}} - 1}{s}) J (M - K + 1, K δ_{d}, 4 s^{2} c_{1}^{2}) \\ \geq J (M - K + 1, K δ_{d}, \frac{1}{2^{\frac{1}{c_{1}} - 1}} \sum_{s = 0}^{\frac{1}{c_{1}} - 1} (\binom{\frac{1}{c_{1}} - 1}{s}) 4 s^{2} c_{1}^{2}) \\ = J (M - K + 1, K δ_{d}, 1 - c_{1}) \\ \geq J (M - K + 1, K δ_{d}, 1), \end{align}

the latter inequality comes from the fact that J(m,a,b) is a decreasing function of its last argument. Now, in the high SNR and large sample size scenario (i.e., for $σ_{v}^{2} / σ_{T}^{2} ≪ 1$ and N≫N₁,K), we have δ_t≈δ_d and by continuity J(M−K+1,K δ_d,1)≈J(M−K+1,K δ_t,1)=BER_t. Consequently, in this context, the TDMT scheme is better than the DDST in terms of BER, i.e.,

{BER}_{d} \geq {BER}_{t} .

7 Simulations

Despite being valid only for the asymptotic regime, our results are found to yield a good accuracy even for very small system dimensions. In this section, we present the simulation results that compare between the TDMT and DDST schemes.

7.1 Performance comparison between DDST- and TDMT-based schemes

In this section, except when mentioning, we consider a 2×4 MIMO system (K=2, M=4) with a data block size N=32.

7.1.1 Bit error rate performance

Figure1 plots the empirical and theoretical BER under QPSK constellation for N=32, K=2, and M=4 for the TDMT- and DDST-based schemes. All comparisons are conducted in the context when both schemes have the same total energy. The number of training symbols is set to N₁=2 (N₂=30) for the TDMT scheme.

For low SNR values (SNR below 6 dB), both schemes achieve approximatively the same BER performance, and therefore, the DDST scheme outperforms its TDMT counterpart in terms of data rate, since it has a better bandwidth efficiency. For high SNR values, the noise caused by the data distortion is higher than the additive Gaussian noise, thus affecting the performance of the DDST scheme.

7.1.2 Applications

To compare the efficiency of the TDMT and DDST schemes, we consider applications in which the BER should be below a certain threshold, say 10⁻². This may be the case for instance of circuit-switched voice applications. Note that for non-coded systems, a target BER of 10⁻² is commonly used.

Application 1. In this scenario, we set the $SNR ≜ \frac{σ_{T}^{2}}{σ_{v}^{2}}$ to 15 dB. We then vary the ratio $c_{1} = \frac{K}{N}$ from 0.01 to 0.5. Since we consider K=2 and M=4, N=K/c₁ varies also with c₁. For each value of N, we compute the BER using (9) and (14). Figure2 illustrates the obtained results. We also superposed in the same plot the empirical results for the TDMT and the DDST scheme. The results show a good match, thereby supporting the usefulness of the derived results.

We note that the DDST scheme may be interesting for long enough frames (N≥16). For small frames (high distortion ratio c₁), the distortion of the data becomes too high, thus reducing the interest of the DDST scheme.

Application 2. In this experiment, we propose to determine for the TDMT scheme (K=2,M=4,N=32) the optimal ratio $\frac{N_{2}}{N_{1}}$ that has to be used to meet a certain quality of service. For that, we consider a scenario where the BER should be below 10⁻². Using (15), (16), and (9), we determine the minimum number of required training symbols to meet the BER lower bound requirement. We, then, plot the corresponding ratio $r = \frac{N_{2}}{N_{1}}$ with respect to the SNR. We note that if the SNR is below 2 dB, the BER requirement could not be achieved. This is to be compared with the DDST scheme where the SNR should be set at least to 10.5 dB so as to meet the BER lower bound requirement as it can be shown in Figure3. Moreover, for a SNR more than 8.5 dB, the minimum number of pilot symbols for channel identification (equal to K) is sufficient to meet the BER requirement.

8 Conclusion

In this paper, we have carried out theoretical studies on BER for two training-based schemes, namely, the basic time-division multiplexed training (TDMT) scheme and the data-dependent superimposed training (DDST)-based scheme. To make derivations possible, the asymptotic regime, where all the system dimensions grow to infinity with a constant pace, has been considered. For each scheme, we have derived closed-form approximations for the BER. We have also determined optimal power allocations of power between data and training that minimize the asymptotic BER.

Appendices

Appendix 1

Proof of Theorem 1

In the sequel, we propose to determine the asymptotic distribution of the post-processing noise of each entry of the matrix Δ W_t. Actually, the (i,j) entry of Δ W_t is given by

\begin{array}{l} {(Δ W_{t})}_{i, j} = - h_{i}^{#} Δ H_{t} w_{j} + h_{i}^{#} (I_{K} - Δ H_{t} H^{#}) v_{2, j} \\ + {\tilde{h}}_{i} {(Δ H_{t})}^{H} Π v_{2, j}, \end{array}

where $h_{i}^{#}$ and ${\tilde{h}}_{i}$ denote the i th row of H^# and (H^HH)⁻¹, respectively, and w_j and v_2,j denote the j th columns of W_t and V₂, respectively. Conditioned on H, V₁, and W_t, (Δ W_t)_{i, j} is a Gaussian random variable with mean equal to $- h_{i}^{#} Δ H_{t} w_{j}$ and variance

\begin{array}{l} σ_{w, K}^{2} = σ_{v}^{2} (h_{i}^{#} - h_{i}^{#} Δ H_{t} H^{#} + {\tilde{h}}_{i} {(Δ H_{t})}^{H} Π) \\ \times ({(h_{i}^{#})}^{H} - {(H^{#})}^{H} {Δ H}_{t}^{H} {(h_{i}^{#})}^{H} + Π Δ H_{t} {({\tilde{h}}_{i})}^{H}) . \end{array}

Since our proof will be based on the ‘characteristic function’ approach, we shall first recall the expression of the characteristic function for complex random variables:

Theorem 3

Let X_n be a complex Gaussian random variable with mean m_X,n and variance $σ_{X, n}^{2}$ , such that $E {(X_{n} - m_{X, n})}^{2} = 0$ . Then, X_n can be seen as a two-dimensional random variable corresponding to its real and imaginary parts. The characteristic function of X_n is, therefore, given by

\begin{array}{l} E [exp (ȷ ℜ (z^{*} X_{n}))] = exp (ȷ ℜ (z^{*} m_{X, n})) \\ \times exp (- \frac{1}{4} | z |^{2} σ_{X, n}^{2}) . \end{array}

Applying Theorem 3, the conditional characteristic function of (Δ W)_i,j can be written as

\begin{array}{l} E [exp (ȷ ℜ (z^{*} {(Δ W_{t})}_{i, j})) | V_{1}, H, W_{t}] \\ = exp (- ȷ ℜ (z^{*} h_{i}^{#} Δ H_{t} w_{j})) exp (- \frac{1}{4} | z |^{2} σ_{w, K}^{2}) . \end{array}

(25)

To remove the condition expectation on V₁ and W_t, one should prove that $σ_{w, K}^{2}$ converges almost surely to a deterministic quantity. Actually, $σ_{w, K}^{2}$ can be expanded as follows:

\begin{array}{l} σ_{w, K}^{2} = σ_{v}^{2} h_{i}^{#} {(h_{i}^{#})}^{H} + σ_{v}^{2} h_{i}^{#} Δ H_{t} {(H^{H} H)}^{- 1} (Δ H_{t}) (h_{i}^{#}) \\ - 2 σ_{v}^{2} ℜ (h_{i}^{#} Δ H_{t} {(H H)}^{- 1} (h_{i}^{#})) \\ + σ_{v}^{2} {\tilde{h}}_{i} {Δ H}^{H}_{t} Π Δ H_{t} {({\tilde{h}}_{i})}^{H} . \end{array}

Let

\begin{align} A_{σ, K} & = σ_{v}^{2} h_{i}^{#} Δ H_{t} {(H^{H} H)}^{- 1} {(Δ H_{t})}^{H} {(h_{i}^{#})}^{H} \\ B_{σ, K} & = σ_{v}^{2} {\tilde{h}}_{i} {Δ H}^{H}_{t} Π Δ H_{t} {({\tilde{h}}_{i})}^{H} \\ ε_{σ, K} & = h_{i}^{#} Δ H_{t} {(H^{H} H)}^{- 1} {(h_{i}^{#})}^{H} . \end{align}

The limiting behavior of A_σ,K can be derived using the following known results describing the asymptotic behavior of an important class of quadratic forms:

Lemma 3

[17, Lemma 2.7] Let x=[X₁,⋯,X_N]^T be a N×1 vector, where the X_n are centered i.i.d. complex random variables with unit variance. Let A be a deterministic N×N complex matrix. Then, for any p≥2, there exists a constant C_p depending on p only, such that

\begin{matrix} E {|\frac{1}{N} x^{H} Ax - \frac{1}{N} Tr (A)|}^{p} \leq \frac{C_{p}}{N^{p}} ({(E | X_{1} |^{4} Tr (A A^{H}))}^{p / 2} \\ + E | X_{1} |^{2 p} Tr ({(A A^{H})}^{p / 2})) . \end{matrix}

(26)

Noticing that Tr(A A^H)≤N∥A∥² and that Tr((A A^H)^p/2)≤N∥A∥^p, we obtain the simpler inequality

\begin{array}{l} E {|\frac{1}{N} x^{H} Ax - \frac{1}{N} Tr (A)|}^{p} \leq \frac{C_{p}}{N^{p / 2}} ∥ A ∥^{p} ({(E | X_{1} |^{2})}^{p / 2} \\ + E | X_{1} |^{2 p}) . \end{array}

(27)

Hence, if A and x have finite spectral norm and finite eight moment, respectively, we can conclude, using Borel-Cantelli lemma, about the almost convergence of the quadratic form $\frac{1}{N} x^{H} Ax$ , thus yielding the following corollary:

Corollary 1

Let x=[X₁,⋯,X_N]^T be a N×1 vector, where the entries x_i are centered i.i.d. complex random variables with unit variance and finite eight order. Let A be a deterministic N×N complex matrix with bounded spectral norm. Then,

\frac{1}{N} x^{H} Ax - \frac{1}{N} Tr (A) \to 0 almost surely .

By Corollary 1, the asymptotic behavior of A_σ,K is then given by

\begin{matrix} A_{σ, K} - \frac{σ_{v}^{2} {[{(H^{H} H)}^{- 1}]}_{i, i}}{N_{1} σ_{P}^{2}} Tr {(H^{H} H)}^{- 1} & \to 0 \\ almost surely . \end{matrix}

Since $\frac{1}{K} Tr {(H^{H} H)}^{- 1}$ converges asymptotically to $\frac{1}{c_{2} - 1}$ as the dimensions go to infinity[18], we get

A_{σ, K} - \frac{c_{1} (1 + r) σ_{v}^{4}}{(c_{2} - 1) σ_{P_{t}}^{2}} {[{(H^{H} H)}^{- 1}]}_{i, i} \to 0 .

Note that Theorem 1 can be applied since the smallest eigenvalue of the Wishart matrix (H H) are almost surely uniformly bounded away from zero by ${(1 - \sqrt{c_{2}})}^{2} > 0$ [19].

Before determining the limiting behavior of B_σ,K, we shall need the following lemma:

Lemma 4

Let $Y = {(\frac{1}{\sqrt{K}} y_{i, j})}_{i = 1, j = 1}^{M, K}$ be a M×K with Gaussian i.i.d entries. Then, in the asymptotic regime given by

M, K \to \infty such that \frac{M}{K} \to c_{2} > 1,

we have

{[{(Y^{H} Y)}^{- 2}]}_{i, i} - \frac{c_{2}}{c_{2} - 1} {({[{(Y^{H} Y)}^{- 1}]}_{i, i})}^{2} \to 0 .

Proof

Without loss of generality, we can restrict our proof to the case, where i=1. Let y₁,⋯,y_K denote the columns of Y. Matrix Y^HY is then given by

Y^{H} Y = [\begin{matrix} y_{1}^{H} y_{1} & y_{1}^{H} y_{2} & \dots & y_{1}^{H} y_{K} \\ ⋮ & ⋮ \\ y_{K}^{H} y_{1} & y_{K}^{H} y_{2} & \dots & y_{K}^{H} y_{K} . \end{matrix}]

Let v_y=[ [ (Y^HY)⁻¹]_1,2,⋯,[ (Y^HY)⁻¹]_1,K]. Then, using the formula of the inverse of block matrices, we get

v_{y} = - {[{(Y^{H} Y)}^{- 1}]}_{1, 1} y_{1}^{H} \tilde{Y} {(\tilde{Y} \tilde{Y})}^{- 1},

where $\tilde{Y} = [y_{2}, \dots, y_{K}]$ .

On the other hand,

\begin{array}{l} {[{(Y^{H} Y)}^{- 2}]}_{1, 1} = {({[{(Y^{H} Y)}^{- 1}]}_{1, 1})}^{2} + v_{y} v_{y}^{H} \\ = {({[{(Y^{H} Y)}^{- 1}]}_{1, 1})}^{2} \\ \times (1 + y_{1}^{H} \tilde{Y} {({\tilde{Y}}^{H} \tilde{Y})}^{- 2} {\tilde{Y}}^{H} y_{1}) . \end{array}

Using Corollary 1, we have

y_{1}^{H} \tilde{Y} {({\tilde{Y}}^{H} \tilde{Y})}^{- 2} {\tilde{Y}}^{H} y_{1} - \frac{1}{K} Tr {({\tilde{Y}}^{H} \tilde{Y})}^{- 1} \to 0 almost surely .

Since $\frac{1}{K} Tr {({\tilde{Y}}^{H} \tilde{Y})}^{- 1}$ tends to $\frac{1}{c_{2} - 1}$ almost surely, we get the desired result. □

We are now in position to deal with the term B_σ,K. Using Corollary 1, we get

B_{σ, K} - \frac{σ_{v}^{4} (M - K)}{N_{1} σ_{P}^{2}} {[{(H^{H} H)}^{- 2}]}_{i, i} \to 0 almost surely .

Hence,

\begin{matrix} B_{σ, K} - \frac{σ_{v}^{4} c_{1} (c_{2} - 1) (1 + r)}{σ_{P}^{2}} & {[{(H^{H} H)}^{- 2}]}_{i, i} \to 0 \\ almost surely . \end{matrix}

Using Lemma 4, we get that

\begin{matrix} B_{σ, K} - \frac{σ_{v}^{4} c_{1} c_{2} (1 + r)}{σ_{P}^{2}} {({[{(H^{H} H)}^{- 1}]}_{i, i})}^{2} & \to 0 \\ almost surely . \end{matrix}

It can be shown that [ (H^HH)⁻¹]_i,i converges almost surely to $\frac{1}{c_{2} - 1}$ (its inverse is the mean of independent random variables[12]). Then,

B_{σ, K} - \frac{σ_{v}^{4} c_{1} c_{2} (1 + r)}{σ_{P}^{2} (c_{2} - 1)} {[{(H^{H} H)}^{- 1}]}_{i, i} \to 0 almost surely .

To prove the almost sure convergence to zero of ε_σ,K, we will be basing on the following result, about the asymptotic behavior of weighted averages:

Theorem 4

Almost sure convergence of weighted averages[20] Let a=[a₁,⋯,a_N]^T be a sequence of N×1 deterministic real vectors with $sup_{N} \frac{1}{N} a_{N}^{T} a_{N} < + \infty$ . Let x_N=[x₁,⋯,x_N] be a N×1 real random vector with i.i.d. entries, such that $E x_{1} = 0$ and $E | x_{1} | < + \infty$ . Therefore, $\frac{1}{N} a_{N}^{T} x_{N}$ converges almost surely to zero as N tends to infinity.

This theorem was proven in[20] for real variables. Since we are interested in the asymptotic convergence of the real part of ε_σ,K, it can be possible to transpose our problem into the real case. Indeed, let $x = V_{1}^{H} h_{i}^{#}$ and $a = P_{t}^{H} {(H^{H} H)}^{- 1} h_{i}^{#}$ , then ℜ(ε_σ,K) is given by

ℜ (ε_{σ, K}) = \frac{1}{N_{1} σ_{P}^{2}} ℜ (x^{H} a) .

Let a_r and x_r (a_i and x_i, respectively) denote, respectively, the real parts (respectively, imaginary parts) of A and x. Then,

ℜ (ε_{σ, K}) = \frac{1}{N_{1} σ_{P}^{2}} a_{r}^{T} x_{r} - a_{i}^{T} x_{i} .

Referring to Theorem 4, the convergence to zero of ℜ(ε_σ,K) is ensured if $\frac{1}{2 N_{1}} (a_{r}^{T} a_{r} + a_{i}^{T} a_{i}) = \frac{1}{2 N_{1}} ∥ a ∥_{2}^{2}$ is finite. This is almost surely true, since

\begin{array}{l} \frac{1}{N_{1} σ_{P}^{2}} ∥ a ∥_{2}^{2} = \frac{1}{N_{1} σ_{P}^{2}} Tr (P_{t}^{H} {(H^{H} H)}^{- 1} h_{i}^{#} {(h_{i}^{#})}^{H} {(H^{H} H)}^{- 1} h_{i}^{#}) \\ = h_{i}^{#} {(H^{H} H)}^{- 2} {(h_{i}^{#})}^{H} < ∥ {(H^{H} H)}^{- 2} ∥_{2} {[{(H^{H} H)}^{- 1}]}_{i, i} . \end{array}

This leads to

σ_{w, K}^{2} - {\tilde{σ}}_{w, K}^{2} \to 0 almost surely,

where ${\tilde{σ}}_{w, K}^{2}$ is given by

\begin{matrix} {\tilde{σ}}_{w, K}^{2} = σ_{v}^{2} {[{(H^{H} H)}^{- 1}]}_{i, i} + \frac{c_{1} (c_{2} + 1) (1 + r) σ_{v}^{4}}{(c_{2} - 1) σ_{P}^{2}} {[{(H^{H} H)}^{- 1}]}_{i, i} . \end{matrix}

Substituting $σ_{w, K}^{2}$ by its asymptotic equivalent in (25), we get

\begin{array}{l} E [exp (ȷ ℜ (z^{*} {(Δ W_{t})}_{i, j})) | H, W_{t}] \\ - E [exp (- ȷ ℜ (z^{*} h_{i}^{#} Δ H_{t} w_{j})) | W, H] \\ \times exp (- \frac{1}{4} | z |^{2} {\tilde{σ}}_{w, K}^{2}) \to 0 almost surely . \end{array}

Also conditioning on W_t and H, $h_{i}^{#} Δ H_{t} w_{j}$ is a Gaussian random variable with zero mean and variance

σ_{m, K}^{2} = \frac{σ_{v}^{2}}{N_{1} σ_{P}^{2}} h_{i}^{#} {w_{j}}^{H} w_{j} {(h_{i})}^{#} .

Since $\frac{1}{K} {w_{j}}^{H} w_{j} \to σ_{w_{t}}^{2}$ almost surely, we get that $σ_{m, K}^{2}$ converges almost surely to ${\tilde{σ}}_{m, K}^{2}$ , where

{\tilde{σ}}_{m, K}^{2} = \frac{c_{1} (1 + r) σ_{v}^{2} σ_{w_{t}}^{2}}{σ_{P_{t}}^{2}} {[{(H^{H} H)}^{- 1}]}_{i, i} .

Using the fact that the characteristic function of $h_{i}^{#} Δ H_{t} w_{j}$ is

\begin{matrix} E [exp (- ȷ ℜ (z^{*} h_{i}^{#} Δ H_{t} w_{j})) | W, H] = exp (- \frac{1}{4} | z |^{2} σ_{m, K}^{2}), \end{matrix}

we obtain that conditionally on the channel

\begin{array}{l} E [exp (ȷ ℜ (z^{*} {(Δ W_{t})}_{i, j}))] \\ - exp (- \frac{1}{4} | z |^{2} ({\tilde{σ}}_{m, K}^{2} + {\tilde{σ}}_{w, K}^{2})) \to 0 almost surely . \end{array}

We end up the proof by noticing that ${\tilde{σ}}_{m, K}^{2} + {\tilde{σ}}_{w, K}^{2} = σ_{w_{t}}^{2} δ_{t} {[{(H^{H} H)}^{- 1}]}_{i, i}$ .

Appendix 2

Proof of Theorem 2

For the DDST scheme, the post-processing noise matrix Δ W_d is given by

\begin{array}{l} Δ W_{d} = - WJ - H^{#} Δ H_{d} W (I_{N} - J) \\ + (H^{#} - H^{#} Δ H_{d} H^{#}) V (I_{N} - J) \\ + {(H^{H} H)}^{- 1} {Δ H}_{d}^{H} Π V (I_{N} - J) \\ = - WJ - H^{#} Δ H_{d} W (I_{N} - J) + H^{#} V (I_{N} - J) \\ - H^{#} Δ H_{d} H^{#} V (I_{N} - J) \\ + {(H^{H} H)}^{- 1} {Δ H}_{d}^{H} Π V (I_{N} - J) . \end{array}

Hence,

\begin{array}{l} {(Δ W_{d})}_{i, j} = - {\tilde{w}}_{i} J_{j} - h_{i}^{#} V P^{H} {({PP}^{H})}^{- 1} W (e_{j} - J_{j}) \\ + h_{i}^{#} V (e_{j} - J_{j}) - h_{i}^{#} V P^{H} {(P P^{H})}^{- 1} H^{#} V (e_{j} - J_{j}) \\ + {\tilde{h}}_{i} {(P P^{H})}^{- 1} P V^{H} Π V (e_{j} - J_{j}), \end{array}

where e_j and J_j denote the j th columns of I_N and J, respectively, and ${\tilde{w}}_{i}$ denotes the i th row of the matrix W.

Let v₁=V(e_j−J_j), and v₂=vec(V(P P^H)⁻¹P).

The vector ${[v_{1}^{T}, v_{2}^{T}]}^{T}$ is a Gaussian vector. Since $E [v_{1} v_{2}^{H}] = 0$ , we conclude that V₁ and V₂ are independent. Then, V₁ and V₂=V(P P^H)⁻¹P^H are also independent. Moreover, $E [v_{1} v_{1}] = σ_{v}^{2} (1 - \frac{K}{N}) I_{N}$ .

Conditioning on V₂, H, and W, (Δ W_d)_i,j is a Gaussian random variable with a mean equal to $- {\tilde{w}}_{i} J_{j} - h_{i}^{#} V_{2} W (e_{j} - J_{j})$ and a variance $σ_{w_{d}, N}^{2}$ equal to

\begin{array}{l} σ_{w_{d}, N}^{2} = E [(h_{i}^{#} - h_{i}^{#} V_{2} H^{#} + {\tilde{h}}_{i} V_{2}^{H} Π) \\ \times v_{1} v_{1}^{H} ({(h_{i}^{#})}^{H} - {(H^{#})}^{H} V_{2}^{H} {(h_{i}^{#})}^{H} + Π V_{2} {\tilde{h}}_{i}^{H}) | V_{2}] \\ = E [h_{i}^{#} v_{1} v_{1} {(h_{i}^{#})}^{H}] + E [h_{i}^{#} V_{2} H^{#} v_{1} v_{1} {(H^{#})}^{H} V_{2}^{H} {(h_{i}^{#})}^{H}] \\ - 2 E [ℜ (h_{i}^{#} V_{2} H^{#} v_{1} v_{1} {(h_{i}^{#})}^{H})] \\ + σ_{v}^{2} (1 - \frac{K}{N}) {\tilde{h}}_{i} V_{2} Π V_{2} {({\tilde{h}}_{i})}^{H} \\ = (1 - \frac{K}{N}) σ_{v}^{2} {[{(H^{H} H)}^{- 1}]}_{i, i} \\ + σ_{v}^{2} (1 - \frac{K}{N}) h_{i}^{#} V_{2} {(H^{H} H)}^{- 1} V_{2} (h_{i}^{#}) \\ - 2 (1 - \frac{K}{N}) σ_{v}^{2} ℜ (h_{i}^{#} V_{2} H^{#} {(h_{i}^{#})}^{H}) \\ + σ_{v}^{2} (1 - \frac{K}{N}) {\tilde{h}}_{i} V_{2} Π V_{2} {({\tilde{h}}_{i})}^{H} . \end{array}

Using the same techniques as before, it can be proved that

\begin{matrix} (1 - \frac{K}{N}) σ_{v}^{2} h_{i}^{#} V_{2} {(H^{H} H)}^{- 1} V_{2} {(h_{i}^{#})}^{H} \\ - \frac{c_{1} (1 - c_{1}) σ_{v}^{4}}{(c_{2} - 1) σ_{P}^{2}} {[{(H^{H} H)}^{- 1}]}_{i, i} \to 0 \\ almost surely . \end{matrix}

and also that

ℜ (h_{i}^{#} V_{2} H^{#} {(h_{i}^{#})}^{H}) \to 0 almost surely .

On the other hand, we have

\begin{array}{l} σ_{v}^{2} (1 - c_{1}) {\tilde{h}}_{i} V_{2}^{H} Π V_{2} {({\tilde{h}}_{i})}^{H} \\ - \frac{c_{1} σ_{v}^{4} (1 - c_{1}) (M - K)}{N σ_{P}^{2}} {[{(H^{H} H)}^{- 2}]}_{i, i} \to 0 \\ almost surely . \end{array}

Since $[{(H^{H} H)}^{- 2}] - \frac{c_{2}}{c_{2} - 1} {[{(H^{H} H)}^{- 1}]}_{i, i}^{2} \to 0$ by Lemma 4, we get that

\begin{array}{l} σ_{v}^{2} (1 - c_{1}) {\tilde{h}}_{i} V_{2}^{H} Π V_{2} {({\tilde{h}}_{i})}^{H} \\ - \frac{σ_{v}^{4} (1 - c_{1}) c_{1} c_{2}}{(c_{2} - 1)} {[{(H^{H} H)}^{- 1}]}_{i, i} \to 0 . \end{array}

Therefore,

σ_{w_{d}, N}^{2} - {\tilde{σ}}_{w_{d}, N}^{2} \to 0 almost surely,

where,

\begin{array}{l} {\tilde{σ}}_{w_{d}, N}^{2} = (σ_{v}^{2} (1 - c_{1}) + \frac{c_{1} (c_{2} + 1) (1 - c_{1}) σ_{v}^{4}}{(c_{2} - 1) σ_{P_{d}}^{2}}) \\ \times {[{(H^{H} H)}^{- 1}]}_{i, i} . \end{array}

Consequently,

\begin{array}{l} E [exp (ȷ ℜ (z^{*} {(Δ W)}_{i, j})) | H, W, V_{2}] \\ = E [exp (- ȷ ℜ (z^{*} {\tilde{w}}_{i} J_{j} + z^{*} h_{i}^{#} V_{2} W (e_{j} - J_{j}))) | W, v_{2}] \\ \times exp (- \frac{1}{4} | z |^{2} {\tilde{σ}}_{w_{d}, N}^{2}) . \end{array}

Conditioning on W and H, ${\tilde{w}}_{i} J_{j} + h_{i}^{#} V_{2} W (e_{j} - J_{j})$ is a Gaussian random variable with a mean equal to ${\tilde{w}}_{i} J_{j}$ and a variance $σ_{w_{m}, N}^{2}$ given by

\begin{array}{l} σ_{m_{d}, N}^{2} = E [h_{i}^{#} V_{2} W (e_{j} - J_{j}) (e_{j}^{H} - J_{j}^{H}) W^{H} V_{2}^{H} {(h_{i}^{#})}^{H} | W, H] \\ = \frac{σ_{v}^{2}}{N σ_{P_{d}}^{2}} {[{(H^{H} H)}^{- 1}]}_{i, i} (e_{j}^{H} - J_{j}^{H}) W W^{H} (e_{j} - J_{j}) . \end{array}

Using Corollary 1, we can easily prove that

σ_{m_{d}, N}^{2} - {\tilde{σ}}_{m_{d}, N}^{2} \to 0 almost surely,

where

{\tilde{σ}}_{m_{d}, N}^{2} = \frac{(1 - c_{1}) σ_{w_{d}}^{2} σ_{v}^{2}}{σ_{P_{d}}^{2}} {[{(H^{H} H)}^{- 1}]}_{i, i} .

Conditioning only on H, the conditional characteristic function satisfies:

\begin{array}{l} E [exp (ȷ ℜ (z^{*} {(Δ W_{d})}_{i, j})) | H] - E [exp (- j ℜ (z^{*} {\tilde{w}}_{i} J_{j}))] \\ \times exp (- \frac{1}{4} | z |^{2} ({\tilde{σ}}_{w_{d}, N}^{2} + {\tilde{σ}}_{m_{d}, N}^{2})) \to 0 . \end{array}

Giving the structure of the matrix J, ${\tilde{w}}_{i} J_{j}$ involves the average of $\frac{1}{c_{1}}$ symmetric-independent and identically distributed discrete random variables, and therefore,

E [exp (- j ℜ (z^{*} {\tilde{w}}_{i}))] = \sum_{i = 1}^{Q} p_{i} exp (ȷ ℜ (z^{*} α_{i})),

where $Q$ is the set of all possible values of ${\bar{W}}_{i, k} = c_{1} \sum_{i = 1}^{\frac{1}{c_{1}}} W_{i, k}$ , and p_i is the probability that ${\bar{W}}_{i, k}$ takes the value α_i.

Consequently,

\begin{array}{l} E [exp (ȷ ℜ (z^{*} {(Δ W_{d})}_{i, j})) | H] = \sum_{i = 1}^{Q} p_{i} exp (ȷ ℜ (z^{*} α_{i})) \\ \times exp (- \frac{1}{4} | z |^{2} ({\tilde{σ}}_{m_{d}, N}^{2} + σ_{w_{d}, N}^{2})) . \end{array}

We conclude the proof by noting that

{\tilde{σ}}_{m_{d}, N}^{2} + σ_{w_{d}, N}^{2} = σ_{w_{d}}^{2} {[{(H H)}^{- 1}]}_{i, i} δ_{d} .

References

Hassibi B, Hochwald B: How much training is needed in multiple-antenna wireless links? IEEE Trans. Inform. Theory 2003, 49(4):951-963. 10.1109/TIT.2003.809594
Article MATH Google Scholar
Bohlin P, Tapio M: Performance evaluation of MIMO communication systems based on superimposed pilots. ICASSP 2004, 4: 425-428.
Google Scholar
Codray M, Bohlin P: Training-based MIMO systems—part I: performance comparison. IEEE Trans. Signal Process 2007, 55(11):5464-5476.
Article MathSciNet Google Scholar
Ghogho M, Swami A: Channel estimation for, MIMO systems using data-dependent superimposed training. Paper presented at the 42nd Allerton conference on communication and computing. Monticello, IL, USA; 29 September–1 October 2004.
Google Scholar
Wang C, Au EKS, Murch RD, Mow WH, Cheng RS, Lau V: On the performance of the MIMO zero-forcing receiver in the presence of channel estimation error. IEEE Trans. Wireless Commun 2007, 6(3):805-810.
Article Google Scholar
Au EKS, Wang C, Sfar S, Murch RD, Mow WH, Lau VKN, Cheng RS, Letaief KB: Error probability for MIMO zero-forcing receiver with adaptive power allocation in the presence of imperfect channel state information. IEEE Trans. Wireless Commun 2007, 6(4):1523-1529.
Article Google Scholar
Ghogho M, Swami A: Optimal training for affine-precoded and cyclic-prefixed block transmissions. Paper presented at the 2005 IEEE/SP 13th workshop on statistical signal processing. Novosibirsk, Russia; 17–20 July 2005.
Google Scholar
Magnus JR, Neudecker H: Matrix Differential Calculus with Applications in Statistics and Econometrics. Chichester: Wiley; 2007.
MATH Google Scholar
Wang C, Au EKS: Closed-form outage probability and BER of MIMO zero-forcing receiver in the presence of imperfect CSI. Paper presented at SPAWC ’06 IEEE 7th workshop on signal processing advances in wireless communications. Cannes, France; 2006.
Google Scholar
Proakis JG: Digital Communications. New York: McGraw-Hill; 1995.
MATH Google Scholar
Gore DA, Heath RW, Paulraj AJ: Transmit selection in spatial multiplexing systems. IEEE Commun. Lett 2002, 6(11):491-493.
Article Google Scholar
Winters J, Salz J, Gitlin R: The impact of antenna diversity on the capacity of wireless communication systems. IEEE Trans. Commun 1994, 42(234):1740-1751.
Article Google Scholar
Eng T, Milstein L: Coherent DS-CDMA performance in Nakagami multipath fading. IEEE Trans. Commun 1995, 43: 1134-1143. 10.1109/26.380145
Article MATH Google Scholar
Gradshteyn IS, Ryzhik IM: Table of Integrals, Series and Products. Amsterdam: Academic; 2007.
MATH Google Scholar
Xu R, Lau FCM: Performance analysis for MIMO systems using zero forcing detector over Rice fading channel. Proc. IEEE Int. Symp. Circ. Syst 2005, 5: 4955-4958.
Google Scholar
Hedayat A, Nosratinia A: Outage and diversity of linear receivers in flat-fading MIMO channels. IEEE Trans. Signal Process 2007, 55(12):5868-5873.
Article MathSciNet Google Scholar
Bai Z, Silverstein J: No eigenvalues outside the support of the limiting spectral distribution of large-dimensional sample covariance matrices. Ann. Probab 1998, 26: 316-345.
Article MathSciNet MATH Google Scholar
Tulino AM, Verdú S: Random Matrix Theory and Wireless Communications,. New Jersey: Now Publishers; 2004.
MATH Google Scholar
Silverstein JW: The smallest eigenvalue of a large dimensional Wishart matrix. Ann. Probab 1985, 13(4):1364-1368. 10.1214/aop/1176992819
Article MathSciNet MATH Google Scholar
Baxter J, Jones R, Lin M, Olsen J: SLLN for weighted independent identically distributed random variables. J. Theor. Probab 2004, 1: 165-181.
Article MathSciNet MATH Google Scholar

Download references

Acknowledgements

This research has been supported by the ERC Starting Grant 305123 MORE (Advanced Mathematical Tools for Complex Network Engineering).

Author information

Authors and Affiliations

Alcatel-Lucent, Supélec, Plateau de Moulon, 3 rue Joliot Curie, 91192, Gif-sur-Yvette Cedex, Paris, France
Abla Kammoun
PRISME Lab., Ecole Polytechnique de l’Université d’Orléans, 12 rue de Blois, BP 6744, 45067, Orléans Cedex 2, France
Karim Abed-Meraim

Authors

Abla Kammoun
View author publications
You can also search for this author in PubMed Google Scholar
Karim Abed-Meraim
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Abla Kammoun.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Kammoun, A., Abed-Meraim, K. Performance analysis and optimal power allocation for linear receivers based on superimposed training. J Wireless Com Network 2013, 227 (2013). https://doi.org/10.1186/1687-1499-2013-227

Download citation

Received: 04 January 2013
Accepted: 19 August 2013
Published: 13 September 2013
DOI: https://doi.org/10.1186/1687-1499-2013-227

Performance analysis and optimal power allocation for linear receivers based on superimposed training

Abstract

1 Introduction

2 System model and problem setting

2.1 Time-division multiplexing scheme

Assumption 1

Assumption 2

Assumption 3

2.2 Data-dependent superimposed training scheme

Assumption 4

Assumption 5

3 Channel estimation and data detection

3.1 TDMT scheme

Assumption 6

3.2 DDST scheme

4 Bit error rate performance

4.1 TDMT scheme

Theorem 1

Remark 1

4.2 DDST scheme

Theorem 2

Proof

5 Optimal power allocation

5.1 Optimal power allocation for the TDMT scheme

Lemma 1

5.2 Optimal power allocation for the DDST scheme

Lemma 2

6 Discussion

6.1 High SNR behavior of the BER

6.2 Gaussian vs. Gaussian mixture model

6.3 Workouts on the optimal power allocation expressions of the TDMT scheme

6.4 Workouts on the optimal power allocation expressions of the DDST scheme

6.5 High SNR BER comparison of the two pilot design schemes

7 Simulations

7.1 Performance comparison between DDST- and TDMT-based schemes

7.1.1 Bit error rate performance

7.1.2 Applications

8 Conclusion

Appendices

Appendix 1

Proof of Theorem 1

Theorem 3

Lemma 3

Corollary 1

Lemma 4

Proof

Theorem 4

Appendix 2

Proof of Theorem 2

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Competing interests

Authors’ original submitted files for images

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Rights and permissions

About this article

Cite this article

Share this article

Keywords