A simple importance sampling technique for orthogonal space-time block codes on Nakagami fading channels

In this contribution, we present a simple importance sampling technique to considerably speed up Monte Carlo simulations for bit error rate estimation of orthogonal space-time block coded systems on spatially correlated Nakagami fading channels.


Introduction
In digital communications, the bit error rate (BER) serves as a fundamental performance measure. For complicated systems, analytical assessment of the BER becomes very difficult, if not impossible, and Monte Carlo (MC) simulations must be applied to estimate the BER. In the case of low BER, however, the computation time associated with MC simulations can be very long, since many bits must be sent to generate a sufficient number of bit errors. In order to speed up these simulations without losing precision, importance sampling (IS) [1] can be employed. The latter technique aims to reduce the variance of the BER estimator by using a biased distribution for the input variables that increases the bit error probability during simulation. The use of the biased distribution is corrected for by weighting the results with the ratios of the actual to the biased probability density functions (PDFs), yielding an unbiased BER estimate with lower variance.
Although the application of IS is well documented, the search for a convenient biased distribution remains a challenging task. In conventional importance sampling (CIS) [2], the variance of the additive Gaussian noise is increased, resulting in a scaled noise distribution and more errors. The improved importance sampling (IIS) technique proposed in [3], on the other hand, shifts the mean of the actual PDF of the input random variables (RVs) directly towards the error region. Both techniques are combined in [4] for efficient simulation in Rayleigh fading. In order to speed up the simulation of orthogonal space-time block codes (OSTBCs) on Rayleigh-faded multiple-input multiple-output (MIMO) channels, it is shown in [5] that scaling the channel distribution is more efficient than scaling the noise distribution. However, an optimal biased channel distribution was not derived. An IS methodology based on the stochastic gradient descent (SGD) algorithm is presented in [6] for simulation of adaptive systems in frequency-flat Rayleigh fading channels.
In spite of the numerous papers on IS, an easy-to-use biased distribution for simulation over non-Rayleigh fading channels is not yet available from the literature to the best of our knowledge. In this contribution, we propose an IS technique for simulation over a frequency-flat MIMO fading channel, where we maintain the actual distribution for the channel noise and the data symbols, but use a biased channel distribution. Furthermore, we illustrate how the presented technique enables to derive a practical biased channel distribution for the case of OSTBCs transmitted over a spatially correlated Nakagami-m fading channel. Although perfect channel knowledge (PCK) is assumed for the derivation of the biased distribution, this distribution allows accurate BER estimation in a wide range of scenarios, such as pilot-based [7] or blind channel estimation, and in the presence of residual frequency offset, IQ imbalance, and phase noise. Moreover, the technique can easily be extended to the case of multicarrier communication on dispersive MIMO channels, using for instance OSTBCs or orthogonal space-frequency block codes.

Importance sampling
Let us assume that a vector s consisting of N s information symbols is coded and transmitted over a frequency-flat MIMO channel, the coefficients of which are comprised in the vector h. Collecting both s and h and the vector w consisting of the additive Gaussian noise samples into a single input vector x = [s, w, h], the average BER can be written as where F(x) is the fraction of bit errors corresponding to given x and E [.] denotes expectation over the PDF p(x) of the vector x. The fraction F(x) is given by where N b is the number of bits contained in the symbol vector s and I n (x) equals 1 when the decision about the nth bit is wrong and zero otherwise. When analytical averaging over x is too complex, a closed-form BER expression cannot be obtained. Using the MC method, however, the average BER can be easily estimated by independently generating a set of N realizations {x i } of the input vector x according to the actual PDF p(x) and simulating for each realization the system operations that yield the bit decisions at the receiver. An estimate of the BER is obtained as the ratio of the number of counted bit errors to the total number of bits transmitted Note that E P b = P b . As the vectors x i in (3) are independently generated, the variance of the BER estimator is given by which can be reduced by increasing the number of simulation runs N. Nevertheless, when differences in the vectors {x i } have a great impact on F(x i ), N must be prohibitively large (especially for small P b ) in order to obtain a sufficient estimation accuracy. When using IS, N * vectors {x i } are generated independently according to a biased distribution q(x), and the BER estimate is computed asP where the correction factors p(x i )/q(x i ) guarantee an unbiased BER estimate. Note that the biased distribution q(x) provides us with an additional degree of freedom which can be used to reduce the variance σ * 2 of the BER estimate, which is given by where E * [·] denotes expectation over the biased distribution q(x). Using a proper biased distribution, the simulation time to estimate the BER with a given precision can be reduced substantially as compared to conventional MC simulation, i.e., N * N. The variance σ * 2 is minimized when the expectation in (6) is minimized; clearly, this occurs for which yields σ * 2 = 0. However, the biased distribution from (7) is impractical, as it depends on the unknown bit error rate P b that is to be estimated by simulation. Nevertheless, (7) indicates that an efficient biased distribution should be proportional to an approximation of F(x)p(x).
In this contribution, we propose an IS approach where we keep the actual PDFs for the data symbols s and the additive channel noise w unchanged and search for a convenient biased distribution q(h) for the channel h. Hence, we have Using (8) and (9), it can be shown that (6) reduces to where E * [·] reduces to averaging over the biased channel distribution q(h) andF(h) is defined as with E s,w|h [·] denoting expectation over the conditional PDF p(s, w|h). Considering the similarity of (6) and (10), it follows that σ * 2 from (10) is minimum for where ∝ denotes proportionality. However, a closed-form expression forF(h) from (11) is usually not available or too complex to yield a practical biased distribution.
In order to find a suitable biased distribution, we rearrange (5) aŝ wherê is the IS estimate of the probability that a detection error for the nth bit occurs. As the variance ofP * b is hard to compute because of the correlation of the quantitiesP * b,n , we look for the biased distribution of the form (9) that minimizes the variance of the individual termsP * b,n rather than the variance ofP * b . Using the same reasoning that led to (12), this biased distribution that corresponds to the bit index n is wherẽ Note that E s,w|h [I n (s, w, h)] represents the conditional error probability of the nth bit, conditioned on h. By introducing P b,n (h) E s,w|h [I n (s, w, h)], the biased distribution (15) reduces to The exact expression of the conditional bit error probability P b,n (h) depends on the observation model and the type of receiver considered and is often unknown. Hence, a suitable approximation of P b,n (h) is needed to obtain a biased distribution (17) that adequately reduces the variance of the estimateP * b,n .

Application to OSTBCs over Nakagami-m fading channels
In this section, we derive a proper biased channel distribution for OSTBC systems operating over spatially correlated Nakagami-m fading channels. As an approximation of the actual conditional bit error probability P b,n (h) of the system affected by channel estimation errors, we take the conditional bit error probability P b,n (h) of a maximumlikelihood (ML) receiver with PCK, which in turn is well approximated by [8] where Q(·) is the Gaussian Q-function, β = 2λ in the case of binary phase-shift keying (BPSK), β = 3λ M−1 in the case of M-quadrature amplitude modulation (QAM) transmission with Gray mapping, and the parameter λ depends on the specific OSTBC [7]. A further approximation involves replacing Q(x) by the Chernoff upper bound (1/2) exp(−x 2 /2) [8]. It follows from (17) that the resulting biased distribution is given by which is independent of the bit index n, so that the bit index can be dropped. Hence, the same biased distribution can be used to reduce the variance of the individual bit error probability estimatesP * b,n for n = 1, . . . , N b , and therefore, it is expected to efficiently reduce the variance of the BER estimateP * b as well. The Nakagami-m distribution is a versatile statistical distribution able to accurately model a variety of fading environments by selecting a proper value for the fading parameter m ≥ 1/2 [8]. It also includes the Rayleigh distribution for m = 1. Assuming a MIMO channel vector h = [h 1 , . . . , h L ] with L elements, the correlation between the channel coefficients is represented by the L × L power correlation matrix , the entries of which are defined as ( [8] Eq. 9.195) where α = |h | and i, n = 1, 2, . . . , L. For integer and identical fading parameters, i.e., m = m, ∀ , it is shown in [9] that L correlated Nakagami-m RVs α can be obtained from 2m i.i.d. real-valued zero-mean (ZM) Gaussian random vectors {y 1 , . . . , y 2m }, where y k = y k,1 , y k,2 , . . . , y k,L T and y k,l with k = 1, 2, . . . , 2m and l = 1, 2, . . . , L are called auxiliary variables. In particular, by defining it is readily verified that α 's are correlated Nakagami-m RVs with E α 2 = and power correlation matrix , if the covariance matrix Q = E y k y T k of the column vectors y k is given by where the L × L diagonal matrix is given by = diag { 1 , 2 , . . . , L } and G = • 1 2 , with X • 1 2 denoting the element-wise square root of a matrix X. http://jwcn.eurasipjournals.com/content/2014/1/103 Since the channel coefficients are obtained from auxiliary variables, we derive the biased distribution of the auxiliary variables rather than of the channel coefficients. Introducing the vector y = y T 1 , y T 2 , . . . , y T 2m T and taking into account that ||h|| 2 = ||y|| 2 = 2m k=1 y T k y k and the vectors {y k } are i.i.d., the biased distribution q(y) of the auxiliary random variables reduces to q(y) = 2m k=1 q 0 (y k ), where and the PDF p 0 (y k ) of the random vector y k is ZM Gaussian with covariance matrix Q, i.e., It follows from (23) and (24) that q(y k ) is the joint PDF of L correlated ZM Gaussian RVs with a covariance matrix Q given by Taking the PDFs p 0 (y k ) and q 0 (y k ) into account, it follows that the correction factor p(x)/q(x) in (5) depends only on y and is given by It is important to note that, although (18) is obtained under the assumption of PCK, the resulting biased distribution is convenient also in the case of imperfect channel estimation (ICE), as demonstrated in Section 4, or in the presence of impairments, such as residual frequency offset, IQ imbalance, and phase noise, in which case P (ML) b,n (h) in (18) is regarded as an approximation of the actual P b,n (h). In addition, the proposed IS technique can easily be extended to the case of multicarrier communication on dispersive MIMO channels, using for instance OSTBCs or orthogonal space-frequency block codes. In multicarrier systems, the relevant channel distribution is the joint distribution of the channel transfer function values at the subcarrier frequencies, which is to be derived from the joint distribution of the channel impulse response samples. In the important case of jointly Gaussian channel impulse response samples (i.e., Rayleigh or Rice fading), the channel transfer function values are also jointly Gaussian.

Numerical results
In this section, we illustrate the computation time savings that can be achieved by using the biased sampling distribution proposed in Section 3. In order to guarantee a certain accuracy for the simulated BER results, we require that the ratio of the variance of the simulated BER to the square of its expectation does not exceed a prescribed value 2 whereP b is given by (3) or (5). Since the N realizations {x i } of the vector x are independently generated, we have , so that for a given accuracy 2 , the number of simulation runs N needs to satisfy In the following, the lower bound (28) will be computed from the simulations by substituting P b and var [F(x)] by the sample mean and variance resulting from {F(x i )}, respectively.
First, we consider BPSK transmission over a 1 × 2 SIMO channel affected by Nakagami-m fading with fading parameter m = 2 and power correlation matrix The receiver has perfect channel knowledge and performs ML detection. In the simulations, for each BPSK symbol, a new realization of x is generated. For MC simulations with and without IS, Figure 1 shows the minimal number of realizations N required to ensure a given accuracy associated with 2 = 10 −4 as a function of E b /N 0 , where E b = E s / log 2 (M) denotes the energy per information bit (for BPSK, M = 2). Although in both cases the required simulation time increases with the signal-to-noise ratio (SNR), it is observed that huge run-time savings of several orders of magnitude can be achieved by using the proposed biased sampling distribution, especially for high SNR values where the BER is low. Now we consider a 2 × 2 MIMO system employing Alamouti's code [10]. The receiver performs pilot-based linear minimum mean-square error (LMMSE) channel estimation, followed by mismatched ML detection [11]. To this end, the transmission is organized in frames containing 100 coded data symbols and 14 pilot symbols. The channel coefficients, which are assumed to remain constant during a frame, are distributed according to the http://jwcn.eurasipjournals.com/content/2014/1/103 In the simulations, a fixed number of N = 10 4 data frames is used, and for each data frame, a new realization of x is generated. Figure 2  deterioration of the accuracy can be observed for BER values below 10 −4 ; for BER ∈ (10 −6 , 10 −4 ), the deviation w.r.t. IS is up to 37.8%, and for BER values below 10 −6 , the results without IS are highly unreliable. The number of frames required to obtain an accuracy of 2 = 0.025 with and without using IS is shown in Figure 3 for 4-QAM and 16-QAM transmissions. Note that for given E b /N 0 , 16-QAM requires fewer frames than does 4-QAM to achieve a given estimation accuracy, because the formed constellation yields a larger BER than the latter. Again, it is observed from the figure that substantial run-time savings can be achieved using the proposed IS technique.
Let us now define the efficiency gain γ IS that is attained by applying IS as the ratio of the number of frames required to achieve a certain accuracy 2 by using straightforward MC simulations to the number of frames required to achieve the same accuracy when using IS. Obviously, this efficiency gain is directly related to the run-time savings achieved by IS. From (28), it follows that where the variances var [F(x)] | IS and var[F(x)] | MC for MC simulations with and without IS, respectively, are obtained from the MC simulation results {F(x i )} conducted with and without using IS. In order to investigate the impact of spatial correlation and the number of receive antennas on the efficiency gain, we consider Alamouti's code transmitted on a Nakagami-m MIMO channel with fading parameter m = 2 and captured by an ML receiver equipped with 1, 2, or 3 antennas. Taking into account that the power correlation matrix of the MIMO channel is given by = t ⊗ r , the diagonal elements of t and r are assumed to be given by 1, whereas the non-diagonal elements are given by ρ. Furthermore, we consider 4-QAM transmission, LMMSE channel estimation, and data frames consisting of 100 coded data symbols and 14 pilot symbols per transmit antenna. In Figure 4, the simulated BER is shown for a fixed number of N = 10 6 data frames transmitted. Obviously, straightforward MC simulation allows accurate BER estimation down to about 10 −7 , whereas employing IS enables accurate BER values down to 10 −17 with the same number of data frames. In addition, it is observed from the figure that spatial correlation with ρ = 0.85 results in a substantial degradation of the BER as compared to the case where all spatial channels are uncorrelated (ρ = 0). In Figure 5, we display the efficiency gain γ IS achieved by IS corresponding to the BER curves from values below 10 dB, efficiency gains between 2 and 50 can be observed. For SNR values above 10 dB, however, rapidly increasing efficiency gains up to 5 * 10 3 can be appreciated. Moreover, even higher efficiency gains are to be expected for the receiver with three antennas and, in the case of uncorrelated channels, the dual-antenna receiver. In general, it follows from the figure that for high SNR, the efficiency gain grows when the BER decreases, i.e., for increasing number of receive antennas and decreasing level of spatial correlation. As a result, the proposed IS sampling distribution allows practical BER simulation for OSTBC systems on spatially uncorrelated and correlated Nakagami fading channels, even when very low BERs are considered.

Conclusions
In this contribution, we presented a simple but efficient IS technique to speed up by orders of magnitude the BER simulations of OSTBC systems operating on spatially correlated frequency-flat Nakagami-m fading channels. While maintaining the actual distributions for the channel noise and the data symbols, we proposed a suitable biased distribution for the fading channel. This IS technique can be extended to the case of multicarrier communication on dispersive MIMO channels, using for instance OSTBCs or orthogonal spacefrequency block codes (OSFBCs). In multicarrier systems, the relevant channel distribution is the joint distribution of the channel transfer function values at the subcarrier frequencies, which is to be derived from the joint distribution of the channel impulse reponse samples. In the important case of jointly Gaussian channel impulse response samples, i.e., Rayleigh or Rice fading, the channel transfer function values are also jointly Gaussian.