A Fast Soft Bit Error Rate Estimation Method



Introduction
To study the performance of a digital communications system, the bit error rate (BER) is generally estimated with the Monte Carlo (MC) method. A tutorial exposition of alternative techniques is provided in [1], with particular reference to four specific methods: modified Monte Carlo simulation (importance sampling), extreme value theory, tail extrapolation, and the quasianalytical method. Modified Monte Carlo simulation relies on importance sampling: important events, and hence errors, are artificially generated by biasing the noise process. At the end of the simulation, the error count must be properly unbiased (see also [2]). Extreme value theory (see [3]) assumes that the pdf can be approximated by an exponential function. The tail extrapolation method, which is a subset of the previous one, assumes that only the tail region of the pdf can be described by a generalized exponential class. The quasianalytical method combines noiseless simulation with an analytical representation of the noise. In [4], High-Order Statistics (HOS) of the bit Log-Likelihood Ratio (LLR) are used to evaluate the performance of turbo-like codes. In this case, the characteristic function of the bit LLR is estimated from its first cumulants (or moments), and the pdf is then computed with the inverse Fourier transform. The reader can find other recent papers on this topic in [5][6][7][8].
In [9], we suggested a soft BER estimation based on the nonparametric computation of the pdf of the received data. We showed that, in this case, no hard decision is needed to compute the BER and that the total number of transmitted data required is very small compared with classical MC simulation. This allows a significant reduction in run time for computer simulations and may also serve as the basis for rapid real-time BER estimation in real communication systems.
In this paper, we suggest using a Gaussian Mixture (GM) model to estimate the pdf of the observed samples. Two conditional pdfs are computed, corresponding to the transmitted bits equal to ±1. The Expectation Maximisation (EM) algorithm is used to estimate, in an iterative fashion, the parameters of this mixture, that is, the means, the variances, and the a priori probabilities. These parameters are estimated by maximizing the conditional expectation of the joint likelihood. The analytical expression of the BER then follows directly from the estimated parameters of the Gaussian Mixture. The choice of the number of Gaussians for each pdf is very important. In [10], a method based on Mutual Information Theory was presented to find the optimal number of Gaussians that gives an accurate estimation of the pdf. Our suggested analytical expression of the BER, based on the Gaussian Mixture model whose parameters are jointly estimated by the EM algorithm and Mutual Information theory, leads to an efficient and fast way to estimate the performance of a digital communications system. Simulations are carried out to compare the three mentioned methods: Monte Carlo, Kernel, and Gaussian Mixture. We analyze the performance of the proposed BER estimator in the framework of a multiuser code division multiple access (CDMA) system with single user detection and show that attractive performance is achieved compared with conventional Monte Carlo (MC) or Kernel-aided techniques. The results show that the GM method can drastically reduce the number of samples needed to estimate the BER, and hence the simulation run time, even at very low BER.
The main idea of this paper is the iterative use of the Expectation Maximisation (EM) algorithm, for the pdf estimation with a Gaussian Mixture model, jointly with Mutual Information Theory, for the computation of the optimal number of Gaussians. The analytical expression of the BER is then given by the estimated parameters of the Gaussian Mixture.
The EM algorithm was first introduced by Dempster et al. [11]. It is an iterative method for computing maximum likelihood estimates when part of the data is missing and only some variables are observable. In this paper, the observable variables are simply the soft output values at the receiver of a digital communications system. The missing data is the unknown true component (Gaussian) from which each observation comes. Two conditional pdfs, according to the transmitted bits ±1, must be estimated. For each pdf, a Gaussian Mixture model with a large enough initial number of components is used. Then, the EM algorithm iteratively estimates the parameters of each component, that is, the means, the variances, and the a priori probabilities. The Mutual Information (MI), according to Shannon theory, is then computed. A component with positive MI is assumed to be dependent on other components and can be removed without damaging the pdf estimation. The EM algorithm can then be run again with a decreased number of Gaussians. The algorithm stops when the maximum of the computed MI, over all the components, is nonpositive, which means that all the remaining components are likely independent and therefore give an optimal structure for the Gaussian Mixture model. The two conditional pdfs are thus estimated, in a parallel fashion, by using Mutual Information Theory to compute iteratively the optimal number of components, with a subiteration of the EM algorithm to estimate the parameters of each component. An analytical expression of the BER is finally obtained from all the parameters of the Gaussian model at the last iteration.
Let us recall that, over the last twenty years, the EM algorithm has mainly been used in image processing, and more precisely in image segmentation for applications such as image or video compression. The reader can find in [12] an example of an application of the SEM (stochastic version of EM) algorithm to SPOT satellite image segmentation, where a Gaussian distribution is assumed for each class. In [13], a hybrid version of SEM is used, assuming a generalized Dirichlet distribution.
Nonparametric pdf estimation has also been used in different applications such as speech coding and pattern recognition [14, 15]. The Gaussian Mixture model has also been used in speaker identification [16].
Let us consider a general communications system (see Figure 1) where information bits are transmitted using any kind of transmission scheme, such as CDMA, FDMA, TDMA, or MC-CDMA techniques, with or without channel coding or space-time coding, using single or multiple transmit antennas, over a Gaussian, fading, or multipath channel, fixed or time variant. At the receiver, any kind of detection, such as MIMO equalization, multiuser detection, turbo detection, or simply a Rake receiver, may be implemented. The only assumption we make in this paper is that the receiver is able to provide the soft decision ($X_i$ in Figure 1) that is taken right before the hard decision $\hat{b}_i$ on each transmitted bit. In this paper, only binary phase-shift keying (BPSK) modulation is used. Other kinds of modulation are left for future work.
Let $(b_i)_{1 \le i \le N} \in \{+1, -1\}$ be a set of $N$ transmitted bits. The $(b_i)_{1 \le i \le N}$ are assumed to be independent and identically distributed with $P[b_i = +1] = \pi_+$ and $P[b_i = -1] = \pi_-$, where $\pi_+ + \pi_- = 1$. Let $(X_i)_{1 \le i \le N}$ denote the corresponding soft outputs at the receiver, so that the hard decision is given by the sign of the soft output: $\hat{b}_i = \mathrm{sgn}(X_i)$. All the received soft outputs $(X_i)_{1 \le i \le N}$ are random variables having the same pdf, $f_X(x)$.
Throughout the paper, the following notation is used. The cardinality $N$ of a set $C$ is denoted $N = |C|$. When $X$ is a random variable, $E[X]$ and $\mathrm{Var}[X]$ denote the mathematical expectation and variance of $X$, respectively. When $f$ is a twice-differentiable function, $f'(x)$ and $f''(x)$ denote its first and second derivatives at point $x$, respectively. $\mathrm{sgn}(\cdot)$ denotes the sign of the argument, and $\ln(\cdot)$ is the natural logarithm. $P[\cdot]$ is the probability of a given event, and the superscript $^{T}$ denotes the transpose.
The paper is organized as follows. Section 2 briefly shows how the probability density function (pdf) of the soft output signal at the receiver is estimated in a nonparametric way with the Kernel method, using the optimal smoothing parameter. The BER is then computed from all the soft observations and the smoothing parameter value. Section 3 shows how a Gaussian Mixture model can be fitted to each conditional pdf by using the Expectation Maximization (EM) algorithm. Mutual Information is used to compute iteratively the optimal number of Gaussians. The BER is then simply computed from all the parameters (means, variances, and a priori probabilities) of each conditional pdf given by the EM algorithm at the last iteration. Simulation results are presented in Section 4. Finally, a brief conclusion is given in Section 5. Proofs of all theoretical results are given in the appendices.

Pdf Estimation Based on Kernel Method.
A brief description of the kernel method is given in this section. The reader can find more details in our previous work [9]. Let us note that $f_X(x)$, the pdf of the output observations, is a mixture of the two conditional pdfs and can be written as

$$f_X(x) = \pi_+ f_X^{b_+}(x) + \pi_- f_X^{b_-}(x), \tag{1}$$

where $f_X^{b_+}$ (resp., $f_X^{b_-}$) is the conditional pdf of $X_i$ given that $b_i = +1$ (resp., $b_i = -1$). We assume that we know the exact partition of the observations $(X_i)_{1 \le i \le N}$ into two classes $C_+$ and $C_-$, which contain the observed soft outputs $X_i$ whose corresponding transmitted bit is $b_i = +1$ (resp., $b_i = -1$). Let $N_+$ (resp., $N_-$) be the cardinality of $C_+$ (resp., $C_-$). The kernel method ([17][18][19]) is used to estimate the different pdfs. In this case, the conditional pdf $f_X^{b_+}$ can be estimated by

$$\hat{f}_{X,N_+}^{b_+}(x) = \frac{1}{N_+ h_{N_+}} \sum_{X_i \in C_+} K\!\left(\frac{x - X_i}{h_{N_+}}\right), \tag{2}$$

where $h_{N_+}$ is the smoothing parameter, which depends on the number of observed samples $N_+$, and $K(\cdot)$ is any pdf (called the kernel) assumed to be an even and regular (i.e., square-integrable) function with zero mean and unit variance.
For simplicity, we do not give the corresponding equations for the conditional pdf $f_X^{b_-}$. The reader can easily obtain them by replacing "+" with "−".
The choice of the smoothing parameter $h_{N_+}$ is very important ([14, 15]). For a Gaussian conditional distribution $f_X^{b_+} \sim \mathcal{N}(m, \sigma^2)$ and a Gaussian kernel, the optimal smoothing parameter is given by

$$h_{N_+} = \left(\frac{4}{3 N_+}\right)^{1/5} \sigma. \tag{3}$$

BER Estimation Based on Kernel Method.
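The BER is given by

$$p_e = \pi_+ \int_{-\infty}^{0} f_X^{b_+}(x)\,dx + \pi_- \int_{0}^{+\infty} f_X^{b_-}(x)\,dx. \tag{4}$$

To estimate the BER of our system, we must evaluate (4). We can show that, for the chosen Gaussian kernel, a soft BER estimate is given by the following expression (see proof in [20]):

$$\hat{p}_{e,N} = \frac{\pi_+}{N_+} \sum_{X_i \in C_+} Q\!\left(\frac{X_i}{h_{N_+}}\right) + \frac{\pi_-}{N_-} \sum_{X_i \in C_-} Q\!\left(-\frac{X_i}{h_{N_-}}\right), \tag{5}$$

where $Q(\cdot)$ denotes the complementary unit cumulative Gaussian distribution, that is, $Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{+\infty} e^{-t^2/2}\,dt$. Theoretical results on the convergence of this BER estimator are given in [20], where we showed that the estimator is asymptotically unbiased.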
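As an illustration of (3) and (5), the following is a minimal Python sketch of the kernel-based soft BER estimate. It is not the authors' implementation: the function names, the toy Gaussian data, and the use of the sample standard deviation in place of $\sigma$ in (3) are assumptions made for this example.

```python
import math
import random
import statistics


def q_function(x):
    """Gaussian tail probability Q(x) = P[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def gaussian_bandwidth(samples):
    """Smoothing parameter h = (4 / (3 N))^(1/5) * sigma, as in (3).

    Assumption: sigma is replaced by the sample standard deviation.
    """
    n = len(samples)
    return (4.0 / (3.0 * n)) ** 0.2 * statistics.stdev(samples)


def kernel_soft_ber(c_plus, c_minus, pi_plus=0.5):
    """Soft BER estimate (5) from the two classes of soft outputs."""
    h_plus = gaussian_bandwidth(c_plus)
    h_minus = gaussian_bandwidth(c_minus)
    err_plus = sum(q_function(x / h_plus) for x in c_plus) / len(c_plus)
    err_minus = sum(q_function(-x / h_minus) for x in c_minus) / len(c_minus)
    return pi_plus * err_plus + (1.0 - pi_plus) * err_minus


# Toy data (hypothetical): BPSK over AWGN, soft output X = b + noise.
rng = random.Random(0)
sigma = 0.5
c_plus = [1.0 + rng.gauss(0.0, sigma) for _ in range(1000)]
c_minus = [-1.0 + rng.gauss(0.0, sigma) for _ in range(1000)]

ber_kernel = kernel_soft_ber(c_plus, c_minus)
ber_true = q_function(1.0 / sigma)  # exact BER for this toy channel
print(ber_kernel, ber_true)
```

Every observation contributes a smooth term $Q(\pm X_i / h)$ rather than a 0/1 error count, which is why no hard decision is needed. Because the kernel adds a variance of about $h^2$ to each class, the estimate in this toy case tends to sit slightly above the exact value.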

Gaussian Mixture for BER Estimation
In this section, instead of using the Kernel method (given by (2)), a Gaussian Mixture (GM) model is used. Mixture models are generally used for their mathematical flexibility. For example, a mixture of two Gaussian distributions with different means and different variances can result in a density with two modes, which is not a standard parametric distribution model. Mixture distributions can model extreme events better than a single Gaussian. More details about mixture distributions can be found in [21].
The following subsection shows how to estimate the two conditional pdfs using a Gaussian Mixture model. The Expectation Maximization (EM) algorithm is used to compute the mean, the variance, and the a priori probability of each component of this mixture. Section 3.2 then shows how the BER can be simply computed from these estimated parameters. For simplicity, equations are developed only for one conditional pdf, $f_X^{b_+}(\cdot)$. The reader can easily obtain the corresponding equations for the estimation of $f_X^{b_-}(\cdot)$.

Pdf Estimation Based on Gaussian Mixture Method.
In this section, we assume that the conditional pdf $f_X^{b_+}$ is a mixture of $K_+$ Gaussians (see [21]):

$$f_X^{b_+}(x) = \sum_{k=1}^{K_+} \alpha_k^+ f_k^+(x), \qquad f_k^+(x) = \frac{1}{\sqrt{2\pi}\,\sigma_k^+} \exp\!\left(-\frac{(x - \mu_k^+)^2}{2\sigma_k^{+2}}\right), \tag{6}$$

where $\alpha_k^+$ is the a priori probability of the $k$th component of the Gaussian mixture, and $f_k^+$ is a Gaussian pdf with mean $\mu_k^+$ and variance $\sigma_k^{+2}$. We have the constraint $\sum_{k=1}^{K_+} \alpha_k^+ = 1$. Let $X = (X_i)_{1 \le i \le N_+}$ be the soft observed samples corresponding to the transmitted bits equal to +1. As the pdf of the observed samples is a mixture of $K_+$ Gaussians, each $X_i$ is produced by one component $k$ of this mixture ($1 \le k \le K_+$) (see [21]). We have to find this component: this is the missing data that we will try to compute. Let $Z = (Z_i)_{1 \le i \le N_+}$ be the missing data, a sequence of variables that determines the component from which each observation originates.
In order to estimate the conditional pdf from (6), we have to estimate the unknown parameters $\theta_+ = (\alpha_k^+, \mu_k^+, \sigma_k^{+2})_{1 \le k \le K_+}$. The criterion we use is the maximization of the conditional expectation of the joint likelihood of both the observed samples, $X$, and the missing data, $Z$ (see [11] for details about this criterion). The likelihood function is given by

$$L(\theta_+, X, Z) = \prod_{i=1}^{N_+} \sum_{k=1}^{K_+} \alpha_k^+ f_k^+(X_i)\, 1_k(Z_i), \tag{7}$$

where $1_k(\cdot)$ is the indicator function given by $1_k(Z_i) = 1$ if $Z_i = k$ and $1_k(Z_i) = 0$ otherwise. In this section, we use the Expectation Maximization algorithm to estimate, in an iterative way, the unknown parameter $\theta_+$. At each new iteration $t$, and for a given estimate $\theta_+^{(t-1)}$ of the parameter computed at the previous iteration, two steps are performed. In the first one, the Estimation step, we compute the a posteriori probabilities (APP) $\rho_{ik}^{(t)} = P[Z_i = k \mid X_i, \theta_+^{(t-1)}]$. In the second one, the Maximization step, we compute the new parameter $\theta_+^{(t)}$ by maximizing the conditional expectation of the joint log-likelihood of the observed samples, $X$, and the missing data, $Z$.

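Estimation Step.
In this step, at iteration $t$, we estimate the unobserved component in the Gaussian mixture for each observed sample $(X_i)_{1 \le i \le N_+}$ using the parameter value $\theta_+^{(t-1)}$ computed at the Maximization step of the previous iteration $(t - 1)$. Using Bayes' rule, we have

$$P[Z_i = k \mid X_i, \theta_+^{(t-1)}] = \frac{P[X_i \mid Z_i = k, \theta_+^{(t-1)}]\; P[Z_i = k \mid \theta_+^{(t-1)}]}{P[X_i \mid \theta_+^{(t-1)}]}.$$

Therefore, for $i = 1, \ldots, N_+$ and $k = 1, \ldots, K_+$, the a posteriori probability $\rho_{ik}^{(t)} = P[Z_i = k \mid X_i, \theta_+^{(t-1)}]$ is

$$\rho_{ik}^{(t)} = \frac{\alpha_k^{+(t-1)} f_k^{+(t-1)}(X_i)}{\sum_{l=1}^{K_+} \alpha_l^{+(t-1)} f_l^{+(t-1)}(X_i)},$$

where $f_k^{+(t-1)}$ is the Gaussian pdf with mean $\mu_k^{+(t-1)}$ and variance $\sigma_k^{+2(t-1)}$.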

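Maximization Step.
At the current iteration $t$, we maximize the conditional expectation of the log-likelihood of the joint event, assuming independent observations $X_i$:

$$\theta_+^{(t)} = \arg\max_{\theta_+} E\!\left[\ln L(\theta_+, X, Z) \mid X, \theta_+^{(t-1)}\right],$$

where $L(\theta_+, X, Z)$ is the joint likelihood given by (7).
We can show that, for $k = 1, \ldots, K_+$, the new parameters are given by (see Appendices A-D for proofs)

$$\alpha_k^{+(t)} = \frac{1}{N_+} \sum_{i=1}^{N_+} \rho_{ik}^{(t)}, \tag{12}$$

$$\mu_k^{+(t)} = \frac{\sum_{i=1}^{N_+} \rho_{ik}^{(t)} X_i}{\sum_{i=1}^{N_+} \rho_{ik}^{(t)}}, \tag{13}$$

$$\sigma_k^{+2(t)} = \frac{\sum_{i=1}^{N_+} \rho_{ik}^{(t)} \left(X_i - \mu_k^{+(t)}\right)^2}{\sum_{i=1}^{N_+} \rho_{ik}^{(t)}}, \tag{14}$$

where $\rho_{ik}^{(t)} = P[Z_i = k \mid X_i, \theta_+^{(t-1)}]$ is the a posteriori probability computed in the Estimation step.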
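The two EM steps can be sketched in a few lines of Python. This is only an illustrative sketch of the standard EM iteration for a one-dimensional Gaussian mixture, not the authors' code. The function names, the toy data (two equiprobable Gaussians centred at 4/7 and 10/7), the initial values, and the small variance floor are assumptions made for this example.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_iteration(samples, alphas, means, variances):
    """One EM iteration for a 1-D Gaussian mixture; returns new parameters and APPs."""
    n, n_comp = len(samples), len(alphas)
    # Estimation step: a posteriori probability of each component (Bayes' rule).
    app = []
    for x in samples:
        joint = [alphas[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(n_comp)]
        total = sum(joint)
        app.append([j / total for j in joint])
    # Maximization step: APP-weighted proportions, means, and variances.
    new_alphas, new_means, new_variances = [], [], []
    for k in range(n_comp):
        mass = sum(app[i][k] for i in range(n))
        mean_k = sum(app[i][k] * samples[i] for i in range(n)) / mass
        var_k = sum(app[i][k] * (samples[i] - mean_k) ** 2 for i in range(n)) / mass
        new_alphas.append(mass / n)
        new_means.append(mean_k)
        new_variances.append(max(var_k, 1e-12))  # hypothetical floor against collapse
    return new_alphas, new_means, new_variances, app


# Toy data (hypothetical): soft outputs of class C+ drawn from two Gaussians.
rng = random.Random(1)
true_means, true_sigma = (4.0 / 7.0, 10.0 / 7.0), 0.2
samples = [rng.choice(true_means) + rng.gauss(0.0, true_sigma) for _ in range(2000)]

# Hypothetical initial parameters, then a fixed number of EM iterations.
alphas, means, variances = [0.5, 0.5], [0.4, 1.6], [0.1, 0.1]
for _ in range(30):
    alphas, means, variances, app = em_iteration(samples, alphas, means, variances)
print(alphas, means, variances)
```

The sketch keeps the number of components fixed. A sample lying very far from every component could make the normalizing sum underflow to zero, so a more careful version would work in the log domain.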

BER Estimation Based on Gaussian Mixture Method.
In this section, we derive the expression of the BER estimate assuming a Gaussian Mixture based pdf estimator. Let $\theta_+^{(T)} = (\alpha_k^{+(T)}, \mu_k^{+(T)}, \sigma_k^{+2(T)})_{1 \le k \le K_+}$ and $\theta_-^{(T)} = (\alpha_k^{-(T)}, \mu_k^{-(T)}, \sigma_k^{-2(T)})_{1 \le k \le K_-}$ be the values reached at the last iteration $T$ of the EM algorithm described in Section 3.1. The parameter $\theta_+^{(T)}$ (resp., $\theta_-^{(T)}$) allows the estimation of the conditional pdf $f_X^{b_+}(\cdot)$ (resp., $f_X^{b_-}(\cdot)$). Let us underline that the EM algorithm must be run twice, independently. Each run uses a different data set, $C_+$ (resp., $C_-$), of soft observations corresponding to the transmitted bits equal to +1 (resp., −1), to estimate $\theta_+^{(T)}$ (resp., $\theta_-^{(T)}$). At the last iteration $T$ of the EM algorithm, reliable estimates of $\theta_+^{(T)}$ and $\theta_-^{(T)}$ are reached, and the BER is computed from them. Let us recall the expression of the BER given by (4). Replacing the two conditional pdfs by their Gaussian Mixture based estimates (6) with the parameters $\theta_+^{(T)}$ and $\theta_-^{(T)}$, the BER estimate is simply computed as

$$\hat{p}_e = \pi_+ \sum_{k=1}^{K_+} \alpha_k^{+(T)}\, Q\!\left(\frac{\mu_k^{+(T)}}{\sigma_k^{+(T)}}\right) + \pi_- \sum_{k=1}^{K_-} \alpha_k^{-(T)}\, Q\!\left(-\frac{\mu_k^{-(T)}}{\sigma_k^{-(T)}}\right), \tag{15}$$

where $Q(\cdot)$ denotes the classical complementary unit cumulative Gaussian distribution. Details regarding the derivation of (15) are provided in Appendix D.
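The closed form (15) is straightforward to evaluate once the mixture parameters are available. The following Python sketch shows one way to do it. The function names and the representation of each $\theta$ as a list of (a priori probability, mean, variance) tuples are assumptions made for this example, and the numerical parameters are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def gm_ber(theta_plus, theta_minus, pi_plus=0.5):
    """BER estimate (15); each theta is a list of (alpha, mean, variance) tuples."""
    # Class +1: an error occurs when the soft output is negative.
    err_plus = sum(a * q_function(m / math.sqrt(v)) for a, m, v in theta_plus)
    # Class -1: an error occurs when the soft output is positive.
    err_minus = sum(a * q_function(-m / math.sqrt(v)) for a, m, v in theta_minus)
    return pi_plus * err_plus + (1.0 - pi_plus) * err_minus


# Hypothetical mixture parameters: two components per class, variance 0.05.
sigma2 = 0.05
theta_plus = [(0.5, 1.0 - 3.0 / 7.0, sigma2), (0.5, 1.0 + 3.0 / 7.0, sigma2)]
theta_minus = [(0.5, -(1.0 - 3.0 / 7.0), sigma2), (0.5, -(1.0 + 3.0 / 7.0), sigma2)]
ber = gm_ber(theta_plus, theta_minus)
print(ber)
```

Writing $Q$ with the complementary error function, rather than as one minus a cumulative distribution, keeps very small tail probabilities representable in double precision instead of rounding them to zero.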

Optimal Choice of the Number of Components.
The choice of $K_-$ and $K_+$ is very important. If the number of components is too low, the corresponding pdf estimate is too smooth and the BER estimate is less reliable. If it is too high, the same class of observed samples is shared between several components, which are then correlated; this is not useful for simulation since all the observed data are assumed to be independent. Consequently, the optimal number of components is the largest one such that all the components are independent. A similar method was suggested in [16] for speaker identification, by increasing the number of classes in the k-means algorithm.
More mathematical details can be found in [10]. Here, we suggest initializing the algorithm with a large enough number of components, running the EM algorithm to estimate the components, and testing their independence after the last iteration. If the components are not independent, the number of components is decreased iteratively until independence is reached.
To test the independence of two components $k_1$ and $k_2$, mutual information theory, as proposed by Shannon [22], can be used. For speaker identification, in [16], the mutual relationship of the two components has been defined as

$$MI_+(k_1, k_2) = p_+(k_1, k_2) \log \frac{p_+(k_1, k_2)}{p_+(k_1)\, p_+(k_2)}, \tag{16}$$

where $p_+(k_1) = \alpha_{k_1}^+$ is the probability of the mixture component $k_1$ (see (6)), and $p_+(k_1, k_2)$ is the joint probability of these two components, computed from the a posteriori probabilities as

$$p_+(k_1, k_2) = \frac{1}{N_+} \sum_{i=1}^{N_+} P[Z_i = k_1 \mid X_i, \theta_+]\; P[Z_i = k_2 \mid X_i, \theta_+].$$

The sign of (16) tells us whether the two components are statistically independent. If $\mathrm{sgn}(MI_+(k_1, k_2)) = 0$, the two components are independent ($p_+(k_1, k_2) = p_+(k_1)\,p_+(k_2)$). If $\mathrm{sgn}(MI_+(k_1, k_2)) > 0$, the two components are statistically dependent, and one of them can be removed without damaging the estimation of the pdf. If $\mathrm{sgn}(MI_+(k_1, k_2)) < 0$, the two components are much less correlated. So, the quantity

$$I_+ = \max_{k_1 \ne k_2} MI_+(k_1, k_2)$$

tells us whether or not we have to reduce the number of components. To find the optimal numbers of components $K_-$ and $K_+$, we just have to choose a large enough initial number; at the end of the EM algorithm, the sign of $I_+$ (or $I_-$) tells us whether we have to reduce $K_+$ (or $K_-$). If $\mathrm{sgn}(I_+) > 0$, we decrease the number of components by one; otherwise, we stop the algorithm. The computation of the optimal values $K_+$ and $K_-$, which may of course be different, can be performed in a parallel fashion.
For a new decreased value $(K_+ - 1)$, the initial GM parameters of the EM algorithm can be taken from the output parameters at the last iteration of the previous EM run, from which we remove the component $k^*$ given by

$$k^* = \arg\max_{1 \le k_1 \le K_+} \sum_{k_2 = 1}^{K_+} MI_+(k_1, k_2).$$

The quantity $\sum_{k_2=1}^{K_+} MI_+(k_1, k_2)$ represents the mutual information for the component $k_1$ and indicates whether this component makes a significant and independent contribution to the pdf estimation. The component with the largest positive value makes the least independent contribution to the GM estimation and should therefore be removed. The proposed Gaussian Mixture based BER estimation using the EM algorithm and Mutual Information theory is summarized in Algorithm 1. Figure 2 gives the flow chart of the suggested algorithm.
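The independence test and the choice of the component to remove can be sketched as follows in Python. This is a hedged illustration, not the authors' implementation. It assumes that the joint probability of two components is the sample average of the product of their a posteriori probabilities, that the marginal probability is the average a posteriori probability, that pairs with $k_1 = k_2$ are excluded, and that a zero joint probability contributes zero. All function names are hypothetical.

```python
import math


def mutual_information_matrix(app):
    """Pairwise MI between components; app[i][k] is the APP of component k for sample i."""
    n, n_comp = len(app), len(app[0])
    p_single = [sum(row[k] for row in app) / n for k in range(n_comp)]
    mi = [[0.0] * n_comp for _ in range(n_comp)]
    for k1 in range(n_comp):
        for k2 in range(n_comp):
            if k1 == k2:
                continue  # assumption: a component is not compared with itself
            p_joint = sum(row[k1] * row[k2] for row in app) / n
            if p_joint > 0.0:
                mi[k1][k2] = p_joint * math.log(p_joint / (p_single[k1] * p_single[k2]))
    return mi


def component_to_remove(mi):
    """Index of the component to drop, or None if no pairwise MI is positive."""
    n_comp = len(mi)
    max_pair = max(mi[a][b] for a in range(n_comp) for b in range(n_comp) if a != b)
    if max_pair <= 0.0:
        return None  # components look independent: keep the current model order
    totals = [sum(row) for row in mi]
    return max(range(n_comp), key=lambda k: totals[k])


# Hypothetical APPs: components 0 and 1 share one cluster, component 2 owns another.
app_redundant = [[0.5, 0.5, 0.0]] * 50 + [[0.0, 0.0, 1.0]] * 50
# Hypothetical APPs: two well-separated components.
app_separated = [[0.99, 0.01]] * 50 + [[0.01, 0.99]] * 50

print(component_to_remove(mutual_information_matrix(app_redundant)))
print(component_to_remove(mutual_information_matrix(app_separated)))
```

In an outer loop, one would rerun EM with the flagged component removed, reusing the remaining parameters as the new initial values, and stop as soon as the function returns `None`.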

Performance Evaluation
To evaluate the performance of the three methods, we consider the framework of a synchronous CDMA system with two users using binary phase-shift keying (BPSK) and operating over an additive white Gaussian noise (AWGN) channel. We restrict ourselves to the conventional single user CDMA detector. Performance assessment in the case of advanced signaling or receivers is not reported in this paper due to space limitations and is left for future contributions.
With respect to the considered framework, the received $L_{SF} \times 1$ chip-level signal vector at discrete time instant $i$ can be expressed as

$$\mathbf{r}_i = A_1 b_i^{(1)} \mathbf{s}_1 + A_2 b_i^{(2)} \mathbf{s}_2 + \mathbf{n}_i,$$

where $L_{SF}$ denotes the spreading factor, $\mathbf{s}_k \in \{\pm 1/\sqrt{L_{SF}}\}^{L_{SF}}$ is the spreading code of user $k$, $A_k$ is the amplitude of user $k = 1, 2$, $b_i^{(k)} \in \{\pm 1\}$ is the information bit of user $k$ at time instant $i$, and $\mathbf{n}_i \in \mathbb{R}^{L_{SF}}$ is temporally and spatially white Gaussian noise, that is, $\mathbf{n}_i \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_{L_{SF}})$. The a priori probabilities of the information bits are supposed to be identical and uniform for both users, that is, $\pi_+ = \pi_- = 1/2$.
The decision statistic that serves for detecting user 1 at time instant $i$ is $X_i^{(1)} = \mathbf{s}_1^{T} \mathbf{r}_i$ [23] and is given by

$$X_i^{(1)} = A_1 b_i^{(1)} + \rho A_2 b_i^{(2)} + n_i^{(1)}, \tag{22}$$

where $\rho$ is the normalized cross-correlation between the two spreading codes $\mathbf{s}_1$ and $\mathbf{s}_2$, and $n_i^{(1)}$ is the Gaussian noise at the output of the single user detector, that is, $n_i^{(1)} \sim \mathcal{N}(0, \sigma^2)$. The decision about the information bit $b_i^{(1)}$ corresponds to the sign of the decision statistic, that is, $\hat{b}_i^{(1)} = \mathrm{sgn}(X_i^{(1)})$. Note that, for a given value of $b_i^{(1)}$, the soft output $X_i^{(1)}$ in (22) follows a mixture of two Gaussians, one for each value of the interfering bit $b_i^{(2)}$. Using (22), we can easily show that the bit error probability (BEP) for user 1 is

$$p_e = \frac{1}{2}\, Q\!\left(\frac{A_1 - \rho A_2}{\sigma}\right) + \frac{1}{2}\, Q\!\left(\frac{A_1 + \rho A_2}{\sigma}\right).$$

In the following, we use two spreading codes whose cross-correlation is $\rho = 3/7$. We consider the case where the two users have equal powers, $A_1 = A_2 = 1$. The SNR at the output of the matched filter (MF) of each user is therefore $\mathrm{SNR} = 1/(2\sigma^2)$.
Algorithm 1 (outline): compute the two a priori probabilities; for each class, compute the GM parameters $\theta_+^{(t)}$ and $\theta_-^{(t)}$ with the EM algorithm by using (12), (13), and (14); test whether the components are independent; verify the optimality of $K_+$ and $K_-$.
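The two-user model can be simulated directly at the output of the single user detector. The Python sketch below is only illustrative: the function names, the sample size, and the 4 dB operating point are assumptions made for this example. It draws soft outputs from the model $X = A_1 b^{(1)} + \rho A_2 b^{(2)} + n$ and compares a plain error count with the closed-form BEP for user 1.

```python
import math
import random


def q_function(x):
    """Gaussian tail probability Q(x) = P[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def sigma_from_snr_db(snr_db):
    """Noise standard deviation from SNR = 1 / (2 sigma^2), valid for A1 = A2 = 1."""
    snr = 10.0 ** (snr_db / 10.0)
    return math.sqrt(1.0 / (2.0 * snr))


def theoretical_bep(a1, a2, rho, sigma):
    """Closed-form BEP of user 1 for equiprobable bits."""
    return 0.5 * q_function((a1 - a2 * rho) / sigma) + 0.5 * q_function((a1 + a2 * rho) / sigma)


def simulate_soft_outputs(n, a1, a2, rho, sigma, rng):
    """Draw n pairs (b1, X) with X = A1 b1 + rho A2 b2 + noise."""
    pairs = []
    for _ in range(n):
        b1 = rng.choice((-1, 1))
        b2 = rng.choice((-1, 1))
        pairs.append((b1, a1 * b1 + a2 * rho * b2 + rng.gauss(0.0, sigma)))
    return pairs


rng = random.Random(2)
rho, sigma = 3.0 / 7.0, sigma_from_snr_db(4.0)  # hypothetical operating point
pairs = simulate_soft_outputs(20000, 1.0, 1.0, rho, sigma, rng)

# Hard-decision Monte Carlo count: an error occurs when sgn(X) differs from b1.
mc_ber = sum(1 for b1, x in pairs if b1 * x < 0.0) / len(pairs)
bep = theoretical_bep(1.0, 1.0, rho, sigma)
print(mc_ber, bep)
```

Splitting `pairs` by the sign of `b1` gives the two classes $C_+$ and $C_-$ on which the kernel and Gaussian Mixture estimators operate.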

Output Pdf Estimation Comparison.
First of all, in this simulation, we compare three pdf estimation methods: the Histogram method (which leads to the MC method), the Kernel method, and the Gaussian Mixture method. In order to make a fair comparison, the three methods are used in optimal conditions. In particular, the bin width of the Histogram is chosen equal to the smoothing parameter computed for the kernel method, so that the convergence of the histogram in the MSE and IMSE criteria can be guaranteed. For this first simulation, we have chosen a pdf that is a mixture of three different pdfs, following Gaussian, Rayleigh, and Beta (first kind) laws with fixed parameters. The true pdf is therefore

$$f(x) = a_1 f_G(x) + a_2 f_R(x) + a_3 f_B(x),$$

where $f_G$, $f_R$, and $f_B$ denote the Gaussian, Rayleigh, and Beta pdfs, respectively, with $a_1 = 0.50$, $a_2 = 0.40$, and $a_3 = 0.10$. Figure 3 plots the true pdf together with the estimated pdf for the three cases. $N = 2{,}000$ samples have been generated for this first simulation. The GM method has found that 4 components are sufficient to estimate the pdf, as $f_{GM}(x) = 0.27\,\mathcal{N}(x; -1.64, 0.51) + 0.20\,\mathcal{N}(x; -0.38, 0.39) + 0.24\,\mathcal{N}(x; 0.67, 0.11) + 0.29\,\mathcal{N}(x; 1.53, 0.50)$.
The Integrated Square Error (ISE), which is defined as $\mathrm{ISE} = \int \left(\hat{f}(x) - f(x)\right)^2 dx$, has been computed for this simulation. We have carried out one hundred different trials and computed the Mean ISE (MISE) and the variance of the ISE for the three methods. Table 1 summarizes these results and shows that the Kernel method gives the best estimation of the pdf in the sense of minimum MISE. The Gaussian Mixture method seems to be the worst one under the MISE criterion. So, for general applications such as pattern recognition or speech coding, the Kernel method seems to be the best choice. For our application, that is, BER estimation, we do not need an accurate estimate of the whole pdf. We only need an accurate estimate of the tail of the pdf. That is why, in another simulation, we are interested in estimating the area of the tail between, for example, +3 and +∞. In this region, the tail is a mixture of Gaussian and Rayleigh laws, and we can easily show that the area $A = \int_{3}^{+\infty} f(x)\,dx$ can be computed exactly. Let us now estimate the quantity $A$ using the three pdf estimates with one hundred different trials. Table 2 shows the mean and the standard deviation for the three methods. The GM method clearly gives the best estimate of the area. This result will be confirmed in the following subsection for BER estimation in the framework of the CDMA system described at the beginning of Section 4.

Performance Comparison for the Three Methods.
In order to compare the three methods (MC, Kernel, and GM), $N = 2{,}000$ soft outputs were generated for each SNR to estimate the BER. For each method, 1,000 different trials were simulated to compute the mean, minimum, and maximum of the BER values. The number of EM iterations is fixed to 20. All these results are given in Figure 4.
In addition, the mean and the standard deviation of the BER estimates, together with the theoretical value of the BER, are given in Table 3 for different values of SNR. This table shows that the GM method provides the best performance in the sense of minimum Mean Squared Error of the BER estimation, even if a small bias is observed for medium values of SNR. This bias completely disappears when the number of EM iterations is increased, as will be seen in Section 4.3. At the same time, the MC technique fails to do so and stops between 8 and 14 dB because of the very limited number of transmitted information bits and the lack of errors. The Kernel method leads to a smaller bias but has the greatest standard deviation and is therefore less reliable than the GM method.
On the other hand, for the GM method and for a fixed SNR value (10 dB), we have computed the mean BER and the standard deviation for different numbers of soft outputs (between 1,000 and 100,000) with 100 different trials. The number of EM iterations is fixed to 20. All these results are given in Table 4. The precision of the estimation, defined as the ratio of the standard deviation to the mean of the BER, does not decrease with the number of samples. This is linked to the bias mentioned before.
Figure 5 shows the performance of the receiver, for the GM method, with different numbers of EM iterations and using $N = 2{,}000$ samples. This figure clearly shows that the performance of the GM method improves with the number of iterations.

Performance of GM Method in the High SNR Region.
We would like to test our suggested algorithm in severe conditions: high values of SNR combined with as few as $N = 1{,}000$ output observations. Figure 6 shows the performance of the receiver (one random simulation), using 50 EM iterations. The result is remarkable: a BER estimate down to $10^{-200}$ has been measured. This result is only limited by the computer precision. The simulation run time to obtain this figure is 31 seconds on a conventional PC. For each SNR point, the simulation takes less than 2 seconds. This time is measured with the CPU-time command of Matlab. The run time does not depend on the value of SNR, which is a huge advantage of our suggested method. For 20 EM iterations, the run time is less than one second for each SNR value.
It is, of course, impossible today to plot this kind of figure using the Monte Carlo method and waiting for 100 errors to obtain an accurate estimate. Indeed, using our computer with Matlab, we need 43 milliseconds to generate 1,000 output observations. Let us now assume that there are 10 billion inhabitants in the world, that each one has 10 PCs, and that each PC is 10,000 times more powerful than the one we used for our simulation. Let us also assume that we can use all these PCs in a parallel and optimized structure. A simple calculation shows that we would need to wait for more than one year to estimate a BER of only $10^{-25}$, hoping that there would be no failure or power cut. The CPU time for the MC method at different values of SNR is given in Table 5.
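The order of magnitude quoted above can be checked with a few lines of Python. The function name and its default arguments are hypothetical. The figures (100 errors, 43 milliseconds per 1,000 observations, $10^{15}$ equivalent computers) are taken from the text, and a year is assumed to last 365.25 days.

```python
def monte_carlo_years(target_ber, errors_needed=100,
                      seconds_per_1000=0.043, n_computers=10 ** 15):
    """Wall-clock years needed to observe `errors_needed` errors at `target_ber`."""
    bits = errors_needed / target_ber                # bits to simulate
    single_pc_seconds = bits / 1000.0 * seconds_per_1000
    seconds = single_pc_seconds / n_computers        # ideal parallel speed-up
    return seconds / (365.25 * 24 * 3600)


# 10 billion inhabitants x 10 PCs x 10,000 times faster = 10^15 equivalent PCs.
years = monte_carlo_years(1e-25)
print(years)
```

Under these assumptions the result is about 1.36 years, in line with the "more than one year" stated above. The required time scales as the inverse of the target BER, so each additional decade of BER multiplies it by ten.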

Comparison with Importance Sampling Technique.
The importance sampling technique, also known as the Modified Monte Carlo method, was introduced by Shanmugam and Balaban (see [2]) to estimate error probabilities in digital communications systems. Importance sampling modifies the probability density function of the noise process in a way that makes simulation possible. An estimate of the pdf of the soft decision is constructed in histogram form. The idea is to modify this pdf, that is, the statistical properties of the soft decision sequence, in such a way that a higher rate of errors occurs in the simulation process. The error count then has to be corrected appropriately to obtain an unbiased estimate of the true error probability. The main drawback of the importance sampling technique is the difficulty, for complex systems, of determining which regions of the pdf to bias and how to bias them.
Compared with the classical Monte Carlo method, the importance sampling technique reduces the sample size requirement by a factor ranging from 10 up to 1,000. Let us assume that this factor is equal to 1,000 in order to compare the importance sampling technique with our suggested GM method. In this case, the CPU time given in Table 5 for Monte Carlo simulation has to be divided by 1,000 to obtain the CPU time for importance sampling simulation. Our suggested GM method still has a huge advantage, as its run time does not depend on the value of SNR. In fact, for each SNR point, the simulation takes less than 2 seconds (see Section 4.2). Another advantage of our suggested GM method is the possibility of estimating the performance of a system at a very low BER (down to $10^{-200}$ for the CDMA system considered in this paper, see Figure 6). The only limitation is the precision of the computer used.
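To make the biasing and unbiasing steps concrete, here is a minimal Python sketch of importance sampling for a Gaussian tail probability. It uses mean translation, which is one common biasing choice and an assumption of this example; it is not necessarily the scheme of [2]. The function names, the threshold, and the sample size are hypothetical.

```python
import math
import random


def q_function(x):
    """Gaussian tail probability Q(x) = P[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def importance_sampling_tail(threshold, n, rng):
    """Estimate P[N(0, 1) > threshold] by sampling from the biased law N(threshold, 1)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)  # biased noise: errors become frequent
        if x > threshold:
            # Weight = true density / biased density, which unbiases the count.
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n


rng = random.Random(3)
threshold, n = 5.0, 20000
estimate = importance_sampling_tail(threshold, n, rng)
exact = q_function(threshold)
print(estimate, exact)
```

With the same 20,000 samples, a plain Monte Carlo count would almost surely observe no error at this threshold. The sketch also shows the drawback mentioned above: the biased density has to be chosen by hand, which is easy for a single Gaussian but much harder for a complex receiver.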

Conclusions
In this paper, we considered the problem of BER estimation for a digital communications system using any transmission technology or channel coding. BPSK modulation is used. The receiver is assumed to be able to compute soft decisions. We proposed a BER estimation algorithm that uses only the soft observations from which the hard decisions on the information bits are computed. First of all, we provided a formulation of the problem in which we showed that BER estimation is equivalent to the estimation of the conditional pdfs of the soft observations corresponding to the transmitted bits equal to ±1. We then proposed a BER computation technique using Gaussian Mixture based pdf estimation. The Expectation Maximisation (EM) algorithm was used to estimate, in an iterative way, the parameters of this mixture, that is, the means, the variances, and the a priori probabilities. The analytical expression of the BER was then simply given by the estimated parameters of the Gaussian Mixture. The optimal number of Gaussians was computed using Mutual Information Theory. Finally, we evaluated the performance of the proposed BER estimation technique in the framework of CDMA systems.
The performance was compared by simulation with the MC technique and the Kernel method. Interestingly, we showed that, while the classical MC method fails to perform BER estimation in the region of high SNR, the proposed GM estimator provides reliable estimates that are better than those of the Kernel method, in the sense of minimum Mean Squared Error, using only a few soft observations. A BER estimate down to $10^{-200}$ has been reached in less than 2 seconds using only 1,000 soft output samples.

Appendices
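A. Proof of (12)
Proof. Using (7), the conditional expectation of the log-likelihood function can be written as

$$E\!\left[\ln L(\theta_+, X, Z) \mid X, \theta_+^{(t-1)}\right] = \sum_{i=1}^{N_+} \sum_{k=1}^{K_+} P[Z_i = k \mid X_i, \theta_+^{(t-1)}]\, \ln\!\left(\alpha_k^+ f_k^+(X_i)\right).$$

D. Derivation of (15)
The BER estimate given by (15) is simply obtained by combining (D.1), (D.2), and (D.3).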

Figure 1: General transmission scheme for any transmitter and receiver with soft outputs $X_1, \ldots, X_N$ and hard decisions $\hat{b}_1, \ldots, \hat{b}_N$.

Figure 2: Flow chart of the proposed fast iterative BER estimation based on the EM-Gaussian Mixture and Mutual Information measures.

Figure 3: Comparison between the real and the estimated pdfs. Three different methods are used. From top to bottom: Histogram, Kernel, and Gaussian Mixture methods. $N = 2{,}000$ samples are used. The same samples are used for the three methods.

Figure 4: BER comparison. Three different methods are used. From top to bottom: MC, Kernel, and Gaussian Mixture methods. $N = 2{,}000$ samples are used for each simulation. 1,000 different trials are performed to compute the mean, minimum, and maximum.

Figure 6: BER estimated by the Gaussian Mixture method for very high SNR values, using only $N = 1{,}000$ samples with 50 iterations. See Figure 7 for a zoom of this figure at high BER showing the Monte Carlo simulation.

Figure 7: A zoom of Figure 6 at high BER showing the Monte Carlo simulation (horizontal axis: $E_{b1}/N_0$).

Table 1: Mean and standard deviation of the Integrated Square Error of the pdf estimation for the three methods, using 1,000 different trials. $N = 2{,}000$ samples are used for each trial.

Table 2: Mean and standard deviation of the estimation error of the tail area $A$.

Table 4: Mean, standard deviation, and precision of the BER estimation with the GM method, for SNR = 10 dB, for different numbers of samples used in each simulation. 100 different trials are performed to compute the mean, the standard deviation, and the precision.

Table 5: CPU time for Monte Carlo simulation at different BER values, assuming the use of $10^{15}$ computers with power equivalent to the one we used for the Gaussian Mixture simulation.