Performance analysis and optimal power allocation for linear receivers based on superimposed training
© Kammoun and Abed-Meraim; licensee Springer. 2013
Received: 4 January 2013
Accepted: 19 August 2013
Published: 13 September 2013
In this paper, we derive a performance comparison between two training-based schemes for multiple-input multiple-output systems. The two schemes are the time-division multiplexing scheme and the recently proposed data-dependent superimposed pilot scheme. For both schemes, a closed-form expression for the bit error rate (BER) is provided. We also determine, for both schemes, the optimal allocation of power between the pilot and data that minimizes the BER.
Keywords: Superimposed training sequence; MIMO systems performance; Linear receiver
1 Introduction
The use of multiple-input multiple-output (MIMO) antenna systems enables high data rates without any increase in bandwidth or power consumption. However, the good performance of MIMO systems requires a priori knowledge of the channel at the receiver. In many practical systems, the receiver estimates the channel by time-division multiplexing pilot symbols with the data. Although a high-quality channel estimate can be achieved, especially when a large number of pilot symbols is used, this method may waste the available channel resources. An alternative method is the conventional superimposed training, which consists of transmitting pilots and data at the same time. However, since the data symbols act as a source of noise during channel estimation, the quality of the channel estimate is degraded. In the literature, the impact of channel estimation error upon the performance indexes has been investigated. In particular, the performance of the conventional superimposed training scheme has been compared with that of the time-multiplexing-based scheme, and the optimal power allocation between pilot and data that maximizes a lower bound on the maximum mutual information has been provided. It has been shown that the optimal conventional superimposed training scheme yields a gain in terms of channel capacity only in special scenarios (many receive antennas and/or short coherence time). In other scenarios, the superimposed training scheme suffers from high channel estimation errors, and its gain over the time-multiplexing-based scheme is often lost. For this reason, many alternatives to the conventional superimposed training scheme have been proposed in recent works.
Ghogho and Swami proposed to introduce a distortion to the data symbols prior to adding the known pilot, in such a way as to guarantee the orthogonality between the pilot and data sequences. It is shown that the channel estimation performance is greatly enhanced as compared to the standard superimposed scheme. This technique is referred to as data-dependent superimposed training (DDST). While the DDST scheme exhibits the same channel estimation performance as its time-division multiplexed training (TDMT) counterpart, the introduced distortion may considerably affect the detection performance. The aim of this paper is thus to study the BER performance of the DDST and TDMT schemes and to evaluate to what extent the performance of the DDST scheme is altered.
In the literature, the few works focusing on BER performance have been based on unrealistic assumptions, such as the absence of correlation between the noise and the channel estimation error [5, 6]. These assumptions make calculations feasible for fixed dimensions but are far from realistic. To make derivations possible while keeping realistic conditions, we relax the assumption of finite dimensions by allowing the space and time dimensions to grow to infinity at the same rate. Working in the asymptotic regime simplifies the derivations, and we observe that the obtained results also apply to usual sample and antenna array sizes. We also show that the obtained expressions can be used to determine the optimal power allocation that minimizes the BER.
The remainder of this paper is organized as follows. Section 2 introduces the system model. Section 3 reviews the channel estimation and data detection processes for the TDMT and DDST schemes. Section 4 is dedicated to the derivation of the asymptotic BER expressions. Based on these results, we determine the optimal allocation of power between data and training for both schemes. Finally, simulation results are provided in Section 7 to validate the analytical derivation.
The following notations are used in this paper. Superscripts H and # denote the Hermitian transpose and the pseudo-inverse, respectively, and Tr(.) denotes the trace operator. The statistical expectation and the Kronecker product are denoted by 𝔼 and ⊗, respectively. The (K×K) identity matrix is denoted by I K , and the (Q×Q) matrix of all ones by 1 Q . The (i,j)th entry of a matrix A is denoted by Ai,j.
2 System model and problem setting
2.1 Time-division multiplexing scheme
We consider an M×K MIMO system operating over a flat fading channel. Two phases are considered:
where P t is the K×N1 pilot matrix, and
H is the M×K channel matrix with independent and identically distributed (i.i.d.) Gaussian variables with zero mean and variance.
V1 is the M×N1 matrix whose entries are i.i.d. with variance.
W t is the K×N2 data matrix with i.i.d. bounded data symbols of power, and V2 is the M×N2 additive Gaussian noise matrix with entries of zero mean and variance. Moreover, W t is independent of V1 and V2.
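The two-phase model described above can be sketched numerically. The following is a minimal illustration only: it assumes the usual flat-fading relations Y1 = H P t + V1 (training phase) and Y2 = H W t + V2 (data phase), where the names Y1 and Y2, the real-valued channel, the BPSK symbols, the particular pilot matrix, and the least-squares estimator are all assumptions made for the example and are not taken from the text.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def add(A, B):
    """Entrywise sum of two matrices of equal size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def gaussian_matrix(rows, cols, sigma, rng):
    """Matrix of i.i.d. zero-mean Gaussian entries with standard deviation sigma."""
    return [[rng.gauss(0.0, sigma) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
M, K, N1, N2 = 4, 2, 4, 28   # receive antennas, transmit antennas, pilot and data lengths
sigma_v = 0.1                # noise standard deviation (illustrative value)

H = gaussian_matrix(M, K, 1.0, rng)       # real-valued channel, for simplicity
P_t = [[1, 1, 1, 1], [1, -1, 1, -1]]      # assumed pilot: rows orthogonal, P_t P_t^T = N1 * I_K
W_t = [[rng.choice((-1, 1)) for _ in range(N2)] for _ in range(K)]  # BPSK data symbols

# Phase 1 (training) and phase 2 (data transmission), assumed model.
Y1 = add(matmul(H, P_t), gaussian_matrix(M, N1, sigma_v, rng))
Y2 = add(matmul(H, W_t), gaussian_matrix(M, N2, sigma_v, rng))

def ls_estimate(Y, P):
    """Least-squares channel estimate Y P^T / N1, valid when P P^T = N1 * I."""
    n1 = len(P[0])
    return [[x / n1 for x in row] for row in matmul(Y, transpose(P))]

H_hat = ls_estimate(Y1, P_t)
max_err = max(abs(h - g) for rh, rg in zip(H, H_hat) for h, g in zip(rh, rg))
```

With this orthogonal pilot, each entry of the estimation error has variance sigma_v**2 / N1 in this toy setting, so `max_err` shrinks as the pilot length or pilot power grows.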
2.2 Data-dependent superimposed training scheme
W d is the data matrix with i.i.d. bounded data symbols of power, and V is the M×N matrix whose entries are i.i.d. zero mean with variance.
Moreover, P d is the K×N training matrix. The chosen pilot matrix P d should fulfill two requirements: it should be orthogonal to the distortion matrix D, i.e., D (P d)^H = 0, and it should satisfy the orthogonality relation that minimizes the channel estimation error subject to a fixed training power. A possible pilot matrix that meets these requirements is
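As a hedged illustration of the two pilot requirements, the sketch below checks them numerically for one candidate pilot. It assumes the common DDST construction in which N = QK, J = (1/Q) 1 Q ⊗ I K , and D = I N − J. It also assumes a pilot made of Q repetitions of a K×K orthogonal block. These specific choices are assumptions made for the example, and the pilot matrix used in the paper may differ.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

K, Q = 2, 4
N = K * Q

# Assumed J = (1/Q) * (1_Q kron I_K): entry (m, n) equals 1/Q when m = n (mod K).
J = [[(1.0 / Q) if (m - n) % K == 0 else 0.0 for n in range(N)] for m in range(N)]
# Assumed distortion matrix D = I_N - J.
D = [[(1.0 if m == n else 0.0) - J[m][n] for n in range(N)] for m in range(N)]

# Hypothetical pilot: a K x K orthogonal block repeated Q times (period K in time).
F = [[1.0, 1.0], [1.0, -1.0]]
P_d = [[F[k][n % K] for n in range(N)] for k in range(K)]

DP = matmul(D, transpose(P_d))     # first requirement: should be the N x K zero matrix
PP = matmul(P_d, transpose(P_d))   # second requirement: should equal N * I_K here

first_ok = all(abs(x) < 1e-12 for row in DP for x in row)
second_ok = all(abs(PP[i][j] - (N if i == j else 0.0)) < 1e-12
                for i in range(K) for j in range(K))
```

The first check works because a pilot with period K is left unchanged by the cyclic averaging that J performs, so D removes it entirely. The second holds because the repeated block has orthogonal rows of equal energy.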
3 Channel estimation and data detection
3.1 TDMT scheme
As has been shown in the literature, the optimal training matrix that minimizes the MSE under a constant training energy should satisfy
3.2 DDST scheme
4 Bit error rate performance
4.1 TDMT scheme
In order to evaluate the bit error rate performance, we need to evaluate the asymptotic behavior of the post-processing noise observed at each entry of matrix Δ W t . Using the ‘characteristic function’ approach, we can prove that, conditioned on the channel matrix, the noise behaves asymptotically like a Gaussian random variable. This result is stated in the following theorem, and its proof is given in Appendix 1.
and K→+ ∞ refers to this asymptotic regime.
Note that, compared with previously reported results, our results exhibit a new additive term of order.
4.2 DDST scheme
Unlike the TDMT scheme, the asymptotic distribution of entries of the post-processing noise matrix is not Gaussian. Actually, we prove that
and is the cardinality of the set of all possible values of, and p i is the probability that takes the value α i .
where is the cardinality of the set of all possible values, and is the probability that takes the value.
See Appendix 2. □
The Gaussianity of the post-processing noise has always been assumed. For time-division multiplexed training, this assumption is well founded, since the post-processing noise converges to a Gaussian distribution in the asymptotic regime (see Theorem 1).
In the superimposed training case, the distortion caused by the presence of data symbols affects the distribution of the post-processing noise, which asymptotically follows a Gaussian mixture distribution. To assess the system performance in this particular case, we start from the elementary definition of the bit error rate. Let Δ Wi,k denote the post-processing noise experienced at the i th antenna at time k (we omit the subscript d for ease of notation). As previously shown, Δ Wi,k behaves as a Gaussian mixture random variable. Let be the asymptotic variance of Δ Wi,k, i.e.,.
5 Optimal power allocation
So far, we have provided approximations of the BER for the TDMT and DDST schemes. As previously shown, these expressions depend on the power allocated to data and training, in addition to other parameters. While the system has no control over the noise power or the number of transmitting and receiving antennas, it can still optimize the power allocation so as to minimize the BER. Next, we provide, for the TDMT and DDST schemes, the optimal data and training powers that minimize the BER under the constraint of a constant total power.
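The kind of constrained optimization described here can be sketched with a one-dimensional search over the pilot energy. The objective `delta_toy` below is a hypothetical stand-in, not the paper's δ t or δ d : it is a first-order effective-noise proxy in which the total energy is split between pilot and data, and the channel estimation error variance is taken to be inversely proportional to the pilot energy. All function and parameter names are assumptions made for this example.

```python
def delta_toy(e_pilot, e_total, K, N2, sigma_v2):
    """Hypothetical effective-noise proxy for a given pilot energy.

    First term: noise relative to per-symbol data power (data energy / N2).
    Second term: K times an estimation error variance sigma_v2 / e_pilot.
    """
    e_data = e_total - e_pilot
    return sigma_v2 * (N2 / e_data + K / e_pilot)

def best_pilot_energy(e_total, K, N2, sigma_v2, steps=10000):
    """Grid search for the pilot energy minimizing delta_toy at fixed total energy."""
    candidates = (e_total * i / steps for i in range(1, steps))
    return min(candidates, key=lambda e: delta_toy(e, e_total, K, N2, sigma_v2))

K, N1, N2 = 2, 4, 28
e_total = float(N1 + N2)      # total energy for unit mean energy per symbol
e_pilot = best_pilot_energy(e_total, K, N2, sigma_v2=0.01)

# For this toy objective only, setting the derivative to zero gives a closed form.
closed_form = e_total * K ** 0.5 / (K ** 0.5 + N2 ** 0.5)
```

For this proxy, the grid-search result agrees with the closed-form minimizer up to the grid resolution. The same search pattern could be applied to the actual BER expressions once they are available in numerical form.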
5.1 Optimal power allocation for the TDMT scheme
Referring to the expressions of BER, we can easily see that the optimal amount of power allocated to data and pilot for the TDMT scheme is the one that minimizes δ t . Let, then minimizing δ t with respect to and under the constraint that ( being the mean energy per symbol) results in the following lemma:
5.2 Optimal power allocation for the DDST scheme
For the DDST scheme, we can deduce from (13) that maximizing γ d minimizes the BER. To maximize γ d , we need to optimize δ d as a function of and under the constraint that. After straightforward calculations, we find that the optimal values for and are given by
6 Discussion
To get more insight into the proposed analysis, we provide here some comments and workouts on the theoretical results derived in the previous sections.
6.1 High SNR behavior of the BER
where O(x) denotes a real value of the same order of magnitude as x. From these approximate expressions, one can observe that the BER of the TDMT scheme is a monomial function of the estimation error variance parameter δ and the number of transmitters K. For example, if the noise power is decreased by a factor of 2, then the BER decreases by a factor of 2^(M−K+1). For instance, with K=2 and M=4, this factor is 2^3=8. The diversity gain is thus equal to M−K+1, which is in accordance with previously reported results. Also, we observe that for the DDST case, there is a floor effect on the BER (i.e., the BER is bounded below by a nonzero value) due to the data distortion inherent to this transmission scheme.
6.2 Gaussian vs. Gaussian mixture model
In our derivations, we have found that the post-processing noise in the DDST case behaves asymptotically as a Gaussian mixture process, while in most of the existing works, the noise is assumed to be asymptotically Gaussian distributed. In fact, one can show that for large sample sizes (i.e., when c1→0), the Gaussian mixture converges to a Gaussian distribution, allowing us to retrieve the standard Gaussian noise assumption. However, for small or moderate sample sizes, the considered Gaussian mixture model leads to a much better approximation of the BER analytical expression than the one we would obtain with a post-processing Gaussian noise model. In other words, Theorem 2 results allow us to derive closed-form expressions for the BER that are valid for relatively small sample sizes.
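The gap between a Gaussian mixture model and a single Gaussian with the same total variance can be illustrated with a toy computation. The mixture below (a few discrete offsets α i with probabilities p i , plus Gaussian noise) and the BPSK-like decision rule are assumptions chosen for illustration. They do not reproduce the exact mixture of Theorem 2 or the paper's BER expressions.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_gaussian(sigma2):
    """Error probability for a unit-amplitude symbol in N(0, sigma2) noise."""
    return q_function(1.0 / math.sqrt(sigma2))

def ber_mixture(alphas, probs, sigma2):
    """Error probability when the noise is alpha_i with probability p_i plus N(0, sigma2)."""
    return sum(p * q_function((1.0 + a) / math.sqrt(sigma2))
               for a, p in zip(alphas, probs))

# Hypothetical mixture parameters (illustrative values only).
alphas = [-0.5, 0.0, 0.5]
probs = [0.25, 0.5, 0.25]
sigma2 = 0.05

# Total variance of the mixture noise: Gaussian part plus spread of the offsets.
total_var = sigma2 + sum(p * a * a for a, p in zip(alphas, probs))

ber_mix = ber_mixture(alphas, probs, sigma2)
ber_matched_gaussian = ber_gaussian(total_var)
```

In this toy setting, the two values differ noticeably even though both noise models have the same variance, which is the kind of mismatch a Gaussian approximation can introduce at moderate sample sizes. When all offsets are zero, `ber_mixture` reduces to `ber_gaussian`.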
6.3 Workouts on the optimal power allocation expressions of the TDMT scheme
Equation (21) shows that the optimal power allocation in the high SNR case realizes a kind of trade-off between the pilot size and its power, such that the total energy is kept constant. This suggests using the smallest possible pilot size that meets the technical constraint of limited transmit power, in order to increase the effective channel throughput without loss of performance.
Equation (21) shows that in the difficult case of a large-dimensional system, one needs to allocate the same total energy to pilots and to data symbols. In other words, we should give similar importance (in terms of power allocation) to the channel estimation and to the data detection.
6.4 Workouts on the optimal power allocation expressions of the DDST scheme
Again, we observe that for the large-dimensional system case, one needs to allocate the same total energy to the pilot and to the data. For high SNRs, one observes a kind of trade-off between the pilot power and size, but in a different way than in the TDMT case. In fact, if the sample size is increased by a factor of 4, the data-to-pilot power ratio can be increased by a factor of 2 without affecting the BER performance.
6.5 High SNR BER comparison of the two pilot design schemes
7 Simulation results
Despite being valid only for the asymptotic regime, our results are found to yield good accuracy even for very small system dimensions. In this section, we present simulation results that compare the TDMT and DDST schemes.
7.1 Performance comparison between DDST- and TDMT-based schemes
In this section, unless otherwise mentioned, we consider a 2×4 MIMO system (K=2, M=4) with a data block size N=32.
7.1.1 Bit error rate performance
For low SNR values (SNR below 6 dB), both schemes achieve approximately the same BER performance, and therefore, the DDST scheme outperforms its TDMT counterpart in terms of data rate, since it has a better bandwidth efficiency. For high SNR values, the noise caused by the data distortion is higher than the additive Gaussian noise, thus affecting the performance of the DDST scheme.
To compare the efficiency of the TDMT and DDST schemes, we consider applications in which the BER should be below a certain threshold, say 10⁻². This may be the case, for instance, for circuit-switched voice applications. Note that for non-coded systems, a target BER of 10⁻² is commonly used.
We note that the DDST scheme may be interesting for long enough frames (N≥16). For small frames (high distortion ratio c1), the distortion of the data becomes too high, thus reducing the interest of the DDST scheme.
8 Conclusion
In this paper, we have carried out theoretical studies on the BER for two training-based schemes, namely, the basic time-division multiplexed training (TDMT) scheme and the data-dependent superimposed training (DDST)-based scheme. To make derivations possible, the asymptotic regime, where all the system dimensions grow to infinity at the same rate, has been considered. For each scheme, we have derived closed-form approximations for the BER. We have also determined the optimal allocations of power between data and training that minimize the asymptotic BER.
Appendix 1
Proof of Theorem 1
Since our proof will be based on the ‘characteristic function’ approach, we shall first recall the expression of the characteristic function for complex random variables:
The limiting behavior of Aσ,K can be derived using the following known results describing the asymptotic behavior of an important class of quadratic forms:
Hence, if A and x have finite spectral norm and finite eighth moment, respectively, we can conclude, using the Borel-Cantelli lemma, the almost sure convergence of the quadratic form, thus yielding the following corollary:
Note that Theorem 1 can be applied since the smallest eigenvalue of the Wishart matrix H^H H is almost surely uniformly bounded away from zero.
Before determining the limiting behavior of Bσ,K, we shall need the following lemma:
Since tends to almost surely, we get the desired result. □
To prove the almost sure convergence to zero of εσ,K, we rely on the following result about the asymptotic behavior of weighted averages:
Almost sure convergence of weighted averages. Let a=[a1,⋯,a N ]T be a sequence of N×1 deterministic real vectors with. Let x N =[x1,⋯,x N ] be an N×1 real random vector with i.i.d. entries, such that and. Then, converges almost surely to zero as N tends to infinity.
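The behavior stated in this result can be observed in a small simulation. The sketch below assumes the quantity of interest is the normalized weighted sum (1/N) Σ a i x i with uniformly bounded deterministic weights and zero-mean i.i.d. entries. These conditions, the cosine weights, and the ±1 entries are hypothetical choices made for the example, since the exact hypotheses are not reproduced here.

```python
import math
import random

def weighted_average(n, rng):
    """Compute (1/n) * sum_i a_i * x_i for bounded weights and zero-mean i.i.d. x_i."""
    total = 0.0
    for i in range(1, n + 1):
        a_i = math.cos(i)                 # deterministic bounded weight (hypothetical choice)
        x_i = rng.choice((-1.0, 1.0))     # i.i.d., zero mean, all moments finite
        total += a_i * x_i
    return total / n

rng = random.Random(1)                    # fixed seed for reproducibility
averages = {n: weighted_average(n, rng) for n in (100, 10_000, 200_000)}
```

In this setting the standard deviation of the average is of order 1/sqrt(N), so the value at N = 200,000 is expected to be much closer to zero than typical values at N = 100.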
We conclude the proof by noticing that.
Appendix 2
Proof of Theorem 2
where e j and J j denote the j th columns of I N and J, respectively, and denotes the i th row of the matrix W.
Let v1=V(e j −J j ), and v2=vec(V(P PH)−1P).
The vector is a Gaussian vector. Since, we conclude that V1 and V2 are independent. Then, V1 and V2=V(P PH)−1PH are also independent. Moreover,.
where is the set of all possible values of, and p i is the probability that takes the value α i .