Let us assume that a vector **s** consisting of *N*_{s} information symbols is coded and transmitted over a frequency-flat MIMO channel, whose coefficients are collected in the vector **h**. Stacking **s**, **h**, and the vector **w** of additive Gaussian noise samples into a single input vector **x**=[**s**,**w**,**h**], the average BER can be written as

{P}_{\mathrm{b}}=\mathbb{E}\left[F\left(\mathbf{x}\right)\right],

(1)

where *F*(**x**) is the fraction of bit errors corresponding to a given **x** and \mathbb{E}\left[\cdot\right] denotes expectation over the PDF *p*(**x**) of the vector **x**. The fraction *F*(**x**) is given by

F\left(\mathbf{x}\right)=\frac{1}{{N}_{\mathrm{b}}}\sum _{n=1}^{{N}_{\mathrm{b}}}{I}_{n}\left(\mathbf{x}\right),

(2)

where *N*_{b} is the number of bits contained in the symbol vector **s** and *I*_{n}(**x**) equals 1 when the decision about the *n*th bit is wrong and zero otherwise. When analytical averaging over **x** is too complex, a closed-form BER expression cannot be obtained. Using the MC method, however, the average BER can easily be estimated by independently generating a set of *N* realizations {**x**_{i}} of the input vector **x** according to the actual PDF *p*(**x**) and simulating for each realization the system operations that yield the bit decisions at the receiver. An estimate of the BER is obtained as the ratio of the number of counted bit errors to the total number of transmitted bits:

{\widehat{P}}_{\mathrm{b}}\triangleq \frac{1}{N}\sum _{i=1}^{N}F\left({\mathbf{x}}_{i}\right).

(3)
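
As a concrete illustration of the plain MC estimator (3), the following sketch estimates the BER of coherent BPSK over a scalar Rayleigh-fading channel with AWGN. This toy model, the symbol alphabet, the Rayleigh scale, and the noise variance are illustrative assumptions, not the MIMO system considered here.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_ber(snr_db, n_runs=200_000):
    """Plain MC estimate (3): draw (s, w, h) from their true
    distributions and count bit errors at the receiver."""
    snr = 10.0 ** (snr_db / 10.0)
    s = rng.choice([-1.0, 1.0], size=n_runs)               # BPSK symbols (1 bit each)
    h = rng.rayleigh(scale=np.sqrt(0.5), size=n_runs)      # channel magnitude, E[h^2] = 1
    w = rng.normal(scale=np.sqrt(0.5 / snr), size=n_runs)  # AWGN samples
    y = h * s + w                                          # flat-fading observation
    s_hat = np.where(y >= 0.0, 1.0, -1.0)                  # coherent BPSK decision
    return np.mean(s_hat != s)                             # fraction of bit errors
```

For this toy model the exact average BER is 0.5(1 − sqrt(γ̄/(1+γ̄))), with γ̄ the average SNR, and the estimate approaches this value as the number of runs grows.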

Note that \mathbb{E}\left[{\widehat{P}}_{\mathrm{b}}\right]={P}_{\mathrm{b}}, i.e., the estimator (3) is unbiased. As the vectors **x**_{i} in (3) are independently generated, the variance of the BER estimator is given by

\mathbb{E}\left[{\left({\widehat{P}}_{\mathrm{b}}-{P}_{\mathrm{b}}\right)}^{2}\right]=\frac{1}{N}\left(\mathbb{E}\left[{F}^{2}\left(\mathbf{x}\right)\right]-{P}_{\mathrm{b}}^{2}\right),

(4)

which can be reduced by increasing the number of simulation runs *N*. Nevertheless, when differences among the vectors {**x**_{i}} have a great impact on *F*(**x**_{i}), *N* must be prohibitively large (especially for small *P*_{b}) in order to obtain sufficient estimation accuracy. When using IS, *N*^{∗} vectors {**x**_{i}} are generated independently according to a biased distribution *q*(**x**), and the BER estimate is computed as

{\widehat{P}}_{\mathrm{b}}^{\ast}\triangleq \frac{1}{{N}^{\ast}}\sum _{i=1}^{{N}^{\ast}}F\left({\mathbf{x}}_{i}\right)\frac{p\left({\mathbf{x}}_{i}\right)}{q\left({\mathbf{x}}_{i}\right)},

(5)

where the correction factors *p*(**x**_{i})/*q*(**x**_{i}) guarantee an unbiased BER estimate. Note that the biased distribution *q*(**x**) provides an additional degree of freedom, which can be used to reduce the variance *σ*^{∗2} of the BER estimate, given by

{\sigma}^{\ast 2}=\frac{1}{{N}^{\ast}}\left({\mathbb{E}}^{\ast}\left[{\left(\frac{F\left(\mathbf{x}\right)p\left(\mathbf{x}\right)}{q\left(\mathbf{x}\right)}\right)}^{2}\right]-{P}_{\mathrm{b}}^{2}\right),

(6)

where {\mathbb{E}}^{\ast}\left[\cdot\right] denotes expectation over the biased distribution *q*(**x**). Using a properly chosen biased distribution, the simulation time needed to estimate the BER with a given precision can be reduced substantially compared to conventional MC simulation, i.e., *N*^{∗}≪*N*. The variance *σ*^{∗2} is minimized when the expectation in (6) is minimized; clearly, this occurs for

q\left(\mathbf{x}\right)=\frac{F\left(\mathbf{x}\right)p\left(\mathbf{x}\right)}{{P}_{\mathrm{b}}},

(7)

which yields *σ*^{∗2}=0. However, the biased distribution from (7) is impractical, as it depends on the unknown bit error rate *P*_{b} that is to be estimated by simulation. Nevertheless, (7) indicates that an efficient biased distribution should be proportional to an approximation of *F*(**x**)*p*(**x**).
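
The effect of the correction factor in (5) can be seen on a minimal scalar example, an illustrative stand-in rather than the system above: estimating the rare-event probability P(X > 4) for X ~ N(0,1). Sampling from a Gaussian shifted into the "error" region, in the spirit of (7), makes almost every draw informative, whereas plain MC with the same sample count would typically observe no events at all. The shift value and sample size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def is_estimate(n=10_000, shift=4.0):
    """IS estimator (5) for P = P(X > 4), X ~ N(0,1).
    F(x) is the indicator 1{x > 4}; the biased density q is N(shift, 1)."""
    x = rng.normal(loc=shift, size=n)       # draws from the biased q(x)
    f = (x > 4.0).astype(float)             # F(x): rare-event indicator
    # exact likelihood ratio p(x)/q(x) of the two Gaussian densities
    w = np.exp(-0.5 * x**2 + 0.5 * (x - shift) ** 2)
    return np.mean(f * w)                   # unbiased estimate of P
```

The true value is Q(4) ≈ 3.17·10^{−5}; with only 10^4 biased samples the estimate is typically within a few percent, while plain MC would need on the order of 10^7 samples for comparable accuracy.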

In this contribution, we propose an IS approach where we keep the actual PDFs for the data symbols **s** and the additive channel noise **w** unchanged and search for a convenient biased distribution *q*(**h**) for the channel **h**. Hence, we have

p\left(\mathbf{x}\right)=p\left(\mathbf{s},\mathbf{w},\mathbf{h}\right)=p\left(\mathbf{s},\mathbf{w}|\mathbf{h}\right)p\left(\mathbf{h}\right),

(8)

q\left(\mathbf{x}\right)=q\left(\mathbf{s},\mathbf{w},\mathbf{h}\right)=p\left(\mathbf{s},\mathbf{w}|\mathbf{h}\right)q\left(\mathbf{h}\right).

(9)

Using (8) and (9), it can be shown that (6) reduces to

{\sigma}^{\ast 2}=\frac{1}{{N}^{\ast}}\left({\mathbb{E}}^{\ast}\left[{\left(\stackrel{~}{F}\left(\mathbf{h}\right)\frac{p\left(\mathbf{h}\right)}{q\left(\mathbf{h}\right)}\right)}^{2}\right]-{P}_{\mathrm{b}}^{2}\right),

(10)

where {\mathbb{E}}^{\ast}\left[\cdot\right] reduces to averaging over the biased channel distribution *q*(**h**) and \stackrel{~}{F}\left(\mathbf{h}\right) is defined as

\stackrel{~}{F}\left(\mathbf{h}\right)=\sqrt{{\mathbb{E}}_{\mathbf{s},\mathbf{w}|\mathbf{h}}\left[{F}^{2}(\mathbf{s},\mathbf{w},\mathbf{h})\right]},

(11)

with {\mathbb{E}}_{\mathbf{s},\mathbf{w}|\mathbf{h}}\left[\cdot\right] denoting expectation over the conditional PDF *p*(**s**,**w**|**h**). Considering the similarity of (6) and (10), it follows that *σ*^{∗2} from (10) is minimized for

q\left(\mathbf{h}\right)\propto \stackrel{~}{F}\left(\mathbf{h}\right)p\left(\mathbf{h}\right),

(12)

where ∝ denotes proportionality. However, a closed-form expression for \stackrel{~}{F}\left(\mathbf{h}\right) from (11) is usually not available or too complex to yield a practical biased distribution.
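
Even without a closed form for \stackrel{~}{F}\left(\mathbf{h}\right), the factorization (9) can be tried directly: keep the true distributions of the symbols and noise and draw only the channel from a biased density. The sketch below uses a toy scalar Rayleigh/BPSK model purely for illustration; it biases the Rayleigh magnitude toward deep fades, where most errors occur, and weights each run by the exact density ratio *p*(**h**)/*q*(**h**). The shrink factor `bias` is a hypothetical tuning knob, not a quantity from the text.

```python
import numpy as np

rng = np.random.default_rng(2)

def is_ber_channel_bias(snr_db, n_runs=50_000, bias=0.5):
    """IS estimate of the BER with channel-only biasing as in (9):
    s and w keep their true distributions; h is drawn from a Rayleigh
    q(h) with a shrunken scale, so deep fades (hence errors) are frequent."""
    snr = 10.0 ** (snr_db / 10.0)
    sig_p = np.sqrt(0.5)                   # true Rayleigh scale, E[h^2] = 1
    sig_q = bias * sig_p                   # biased scale: more deep fades
    s = rng.choice([-1.0, 1.0], size=n_runs)               # BPSK symbols
    h = rng.rayleigh(scale=sig_q, size=n_runs)             # h drawn from q(h)
    w = rng.normal(scale=np.sqrt(0.5 / snr), size=n_runs)  # AWGN samples
    errors = (np.sign(h * s + w) != s).astype(float)       # error indicators
    # exact ratio of the two Rayleigh densities p(h)/q(h)
    ratio = (sig_q**2 / sig_p**2) * np.exp(
        h**2 * (0.5 / sig_q**2 - 0.5 / sig_p**2)
    )
    return np.mean(errors * ratio)
```

Too aggressive a bias makes the weights explode for large h, and the estimator variance can even become infinite, so the biasing strength must be chosen with the variance expression (10) in mind.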

In order to find a suitable biased distribution, we rearrange (5) as

{\widehat{P}}_{\mathrm{b}}^{\ast}=\frac{1}{{N}_{\mathrm{b}}}\sum _{n=1}^{{N}_{\mathrm{b}}}{\widehat{P}}_{\mathrm{b},n}^{\ast},

(13)

where

{\widehat{P}}_{\mathrm{b},n}^{\ast}=\frac{1}{{N}^{\ast}}\sum _{i=1}^{{N}^{\ast}}{I}_{n}\left({\mathbf{x}}_{i}\right)\frac{p\left({\mathbf{x}}_{i}\right)}{q\left({\mathbf{x}}_{i}\right)}

(14)

is the IS estimate of the probability that a detection error occurs for the *n*th bit. As the variance of {\widehat{P}}_{\mathrm{b}}^{\ast} is hard to compute because of the correlation between the quantities {\widehat{P}}_{\mathrm{b},n}^{\ast}, we look for a biased distribution of the form (9) that minimizes the variance of the individual terms {\widehat{P}}_{\mathrm{b},n}^{\ast} rather than the variance of {\widehat{P}}_{\mathrm{b}}^{\ast}. Using the same reasoning that led to (12), the biased distribution corresponding to the bit index *n* is

{q}_{n}\left(\mathbf{h}\right)\propto {\stackrel{~}{I}}_{n}\left(\mathbf{h}\right)p\left(\mathbf{h}\right),

(15)

where

{\stackrel{~}{I}}_{n}\left(\mathbf{h}\right)=\sqrt{{\mathbb{E}}_{\mathbf{s},\mathbf{w}|\mathbf{h}}\left[{I}_{n}^{2}(\mathbf{s},\mathbf{w},\mathbf{h})\right]}=\sqrt{{\mathbb{E}}_{\mathbf{s},\mathbf{w}|\mathbf{h}}\left[{I}_{n}(\mathbf{s},\mathbf{w},\mathbf{h})\right]}.

(16)

Note that the second equality in (16) holds because *I*_{n} takes only the values 0 and 1, so that {I}_{n}^{2}={I}_{n}; moreover, {\mathbb{E}}_{\mathbf{s},\mathbf{w}|\mathbf{h}}\left[{I}_{n}(\mathbf{s},\mathbf{w},\mathbf{h})\right] represents the conditional error probability of the *n*th bit, conditioned on **h**. By introducing {P}_{\mathrm{b},n}\left(\mathbf{h}\right)\triangleq {\mathbb{E}}_{\mathbf{s},\mathbf{w}|\mathbf{h}}\left[{I}_{n}(\mathbf{s},\mathbf{w},\mathbf{h})\right], the biased distribution (15) reduces to

{q}_{n}\left(\mathbf{h}\right)\propto \sqrt{{P}_{\mathrm{b},n}\left(\mathbf{h}\right)}p\left(\mathbf{h}\right).

(17)

The exact expression of the conditional bit error probability *P*_{b,n}(**h**) depends on the observation model and the type of receiver considered and is often unknown. Hence, a suitable approximation of *P*_{b,n}(**h**) is needed to obtain a biased distribution (17) that adequately reduces the variance of the estimate {\widehat{P}}_{\mathrm{b},n}^{\ast}.
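
To make (17) concrete, the sketch below constructs the biased channel density numerically for a toy scalar Rayleigh channel, using the closed-form conditional BER of coherent BPSK, P_b(h) = Q(h·sqrt(2·SNR)) with Q(·) the Gaussian tail function, as an assumed stand-in for *P*_{b,n}(**h**). The SNR, grid limits, and sample count are illustrative choices.

```python
import math
import numpy as np

rng = np.random.default_rng(3)

snr = 10.0                                  # Eb/N0 (linear), illustrative
hs = np.linspace(1e-4, 4.0, 4000)           # grid over the channel magnitude
dh = hs[1] - hs[0]
Q = lambda x: 0.5 * np.vectorize(math.erfc)(x / np.sqrt(2.0))  # Gaussian tail
p_h = (hs / 0.5) * np.exp(-hs**2)           # Rayleigh pdf p(h), E[h^2] = 1
Pb_h = Q(hs * np.sqrt(2.0 * snr))           # stand-in for P_{b,n}(h)
q_h = np.sqrt(Pb_h) * p_h                   # biased density (17), unnormalized
q_h /= np.sum(q_h) * dh                     # normalize on the grid

# draw channel realizations from q(h) by sampling grid cells
cells = q_h * dh / np.sum(q_h * dh)
idx = rng.choice(len(hs), size=20_000, p=cells)
weight = p_h[idx] / q_h[idx]                # correction factor p(h)/q(h)
# the conditional BER is known in this toy model, so average it under q
ber = np.mean(Pb_h[idx] * weight)
```

Because q(h) ∝ sqrt(P_b(h)) p(h), the realizations concentrate where errors actually contribute to the average, and the weighted mean converges to the true average BER, here 0.5(1 − sqrt(γ̄/(1+γ̄))) ≈ 0.0233 for γ̄ = 10, with far fewer samples than plain MC would need.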