All simulations were performed in MATLAB (The MathWorks, Natick, MA). Information bits were generated with equal probability by a pseudorandom number generator before coding and modulation. We implemented the coding strategy defined by the Consultative Committee for Space Data Systems (CCSDS), which consists of serially concatenated convolutional codes (SCCC) at the transmitter and a turbo decoder at the receiver [31]. Different code rates are defined by different puncturing patterns. A symbol rate of 250 Msym/s was used for all simulations. After modulation, the signal was synthesized with fourfold oversampling and a root-raised-cosine transmit filter with a rolloff factor of 0.35. Complex Gaussian noise was added to simulate thermal noise before the signal was processed by the receive modules.
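To make the pulse-shaping step concrete, the following is a minimal standard-library Python sketch of a root-raised-cosine filter with the stated rolloff of 0.35 and four samples per symbol. The 32-symbol filter span, the unit-energy normalization, and the helper names `rrc_taps`, `convolve`, and `upsample_and_filter` are our own illustrative assumptions, not details of the MATLAB implementation. Passing the shaped signal through the same filter at the receiver (a matched filter) and sampling at the symbol instants approximately recovers the transmitted symbols.

```python
import math


def rrc_taps(beta: float, sps: int, span: int) -> list[float]:
    """Root-raised-cosine taps with the symbol period normalized to 1.

    beta: rolloff factor, sps: samples per symbol, span: length in symbols.
    The taps are scaled to unit energy (an assumption of this sketch).
    """
    n = span * sps
    taps = []
    for i in range(n + 1):
        t = (i - n / 2) / sps  # time in symbol periods, centered at zero
        if t == 0.0:
            h = 1.0 + beta * (4.0 / math.pi - 1.0)
        elif abs(abs(t) - 1.0 / (4.0 * beta)) < 1e-12:
            # Limit of the general expression at t = +-1/(4*beta)
            h = (beta / math.sqrt(2.0)) * (
                (1.0 + 2.0 / math.pi) * math.sin(math.pi / (4.0 * beta))
                + (1.0 - 2.0 / math.pi) * math.cos(math.pi / (4.0 * beta))
            )
        else:
            num = (math.sin(math.pi * t * (1.0 - beta))
                   + 4.0 * beta * t * math.cos(math.pi * t * (1.0 + beta)))
            den = math.pi * t * (1.0 - (4.0 * beta * t) ** 2)
            h = num / den
        taps.append(h)
    norm = math.sqrt(sum(h * h for h in taps))
    return [h / norm for h in taps]


def convolve(x, h):
    """Full linear convolution of two sequences (complex-valued x allowed)."""
    y = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0:
            continue  # skip the zeros inserted by upsampling
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def upsample_and_filter(symbols, taps, sps):
    """Insert sps - 1 zeros after each symbol, then apply the pulse-shaping filter."""
    up = []
    for s in symbols:
        up.append(s)
        up.extend([0.0] * (sps - 1))
    return convolve(up, taps)


# Transmit with rolloff 0.35 and 4x oversampling, then matched-filter and sample.
taps = rrc_taps(beta=0.35, sps=4, span=32)
symbols = [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j, 1 + 1j, 1 - 1j]
rx = convolve(upsample_and_filter(symbols, taps, 4), taps)
delay = len(taps) - 1  # combined delay of the two filters in samples
recovered = [rx[delay + 4 * k] for k in range(len(symbols))]
```

Because the cascade of two RRC filters is a raised-cosine pulse with zero crossings at multiples of the symbol period, the residual intersymbol interference in `recovered` comes only from truncating the filter.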

We compare two demappers: (i) a standard demapper, which assumes circularly symmetric noise with a constant covariance across constellation points, and (ii) the proposed covariance-based demapper. Both demappers account for the mean of the symbol-dependent noise distribution using estimates of the cluster centroids; the proposed demapper additionally uses the covariance of each symbol cluster. Since the proposed demapper is a generalization of the standard APSK demapping algorithm, this comparison quantifies the improvement obtained by using the covariance information. The means and covariances are estimated using 120 known pilot symbols per constellation point. With CCSDS frames, this corresponds to a 6% reduction in data rate for a modulation order of 64. The simulations below processed 50 frames (∼10^{7} bits) through the full system model, including distortions, for each demapping algorithm.
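The pilot-based estimation step can be sketched as follows: the received pilot samples are grouped by their known transmitted constellation index, and for each group the sample mean (cluster centroid) and the unbiased 2×2 sample covariance of the real and imaginary parts are computed. This is a minimal standard-library Python sketch under our own assumptions (the function name `estimate_cluster_stats` and the nested-list covariance layout are hypothetical); it is not the authors' MATLAB code.

```python
import random
from collections import defaultdict


def estimate_cluster_stats(pilot_idx, received):
    """Estimate per-point means and 2x2 covariances from known pilots.

    pilot_idx: transmitted constellation index of each pilot (known at the receiver)
    received:  corresponding received complex samples
    Returns {index: (mean, [[s_ii, s_iq], [s_iq, s_qq]])}, where i/q denote
    the real/imaginary parts.
    """
    groups = defaultdict(list)
    for k, y in zip(pilot_idx, received):
        groups[k].append(y)
    stats = {}
    for k, ys in groups.items():
        n = len(ys)
        mean = sum(ys) / n  # cluster centroid
        di = [(y - mean).real for y in ys]
        dq = [(y - mean).imag for y in ys]
        s_ii = sum(a * a for a in di) / (n - 1)  # unbiased estimates
        s_qq = sum(b * b for b in dq) / (n - 1)
        s_iq = sum(a * b for a, b in zip(di, dq)) / (n - 1)
        stats[k] = (mean, [[s_ii, s_iq], [s_iq, s_qq]])
    return stats


# Illustrative check: one cluster with anisotropic noise (std 0.1 in I, 0.3 in Q).
rng = random.Random(0)
center = 1.0 + 0.5j
idx = [0] * 20000
rx = [center + complex(rng.gauss(0, 0.1), rng.gauss(0, 0.3)) for _ in idx]
mean0, cov0 = estimate_cluster_stats(idx, rx)[0]
```

The check uses many more samples than the 120 pilots per point in the simulations, so the estimates here are much less noisy than in the actual setup.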

Details specific to the two distortion scenarios are elaborated below.

**Scenario 1: nonlinear amplifier**

A memoryless model fitted to data measured from an X-band TWTA was used to simulate a nonlinear power amplifier [22]. The level of distortion is controlled by the amplifier backoff, defined as the average power of the input signal relative to the amplifier saturation point. A complex-valued polynomial of order 9, extracted using an indirect learning strategy with least squares [32], was used for predistortion. We consider 64-APSK and 128-APSK constellations. Since the CCSDS standard is limited to modulation orders up to 64, we extended the CCSDS framework to include a 128-APSK constellation from the DVB-S2X standard [33]. The symbols are distributed over four or five constellation rings, depending on the constellation. There is no phase noise in this scenario.
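As an illustration of how the input power relative to saturation drives a memoryless amplifier into compression, the sketch below uses the classic Saleh TWTA model with its widely quoted parameters. This is a stand-in of our own choosing: the model in [22] was fitted to measured X-band TWTA data, and its form and parameters may differ. Here the saturation point is taken to be the input power at which the Saleh AM/AM curve peaks, 1/β_a; the function names are hypothetical.

```python
import cmath
import math

# Classic Saleh TWTA parameters (illustrative; not the model fitted in [22])
ALPHA_A, BETA_A = 2.1587, 1.1517   # AM/AM
ALPHA_P, BETA_P = 4.0033, 9.1040   # AM/PM (radians)


def saleh_amplifier(x: complex) -> complex:
    """Memoryless AM/AM and AM/PM distortion of one complex baseband sample."""
    r = abs(x)
    gain = ALPHA_A * r / (1.0 + BETA_A * r * r)          # output amplitude
    rotation = ALPHA_P * r * r / (1.0 + BETA_P * r * r)  # phase shift
    return gain * cmath.exp(1j * (cmath.phase(x) + rotation))


def scale_to_input_power(samples, input_power_db):
    """Scale samples so their average power is input_power_db relative to saturation."""
    p_avg = sum(abs(s) ** 2 for s in samples) / len(samples)
    p_sat = 1.0 / BETA_A  # input power at the peak of the AM/AM curve
    g = math.sqrt(p_sat * 10 ** (input_power_db / 10) / p_avg)
    return [g * s for s in samples]


# Drive a few symbols at an input power of -3 dB relative to saturation.
symbols = [1 + 0j, 0.5 + 0.5j, -0.3 + 0.9j, -1 - 1j]
driven = scale_to_input_power(symbols, -3.0)
amplified = [saleh_amplifier(s) for s in driven]
```

Outer-ring symbols land closer to saturation and are compressed and rotated more than inner-ring symbols, which is the symbol-dependent distortion that the per-cluster covariances capture.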

**Scenario 2: phase noise**

A linear transmission channel is used, but phase noise is introduced at the receiver. Phase noise was simulated at baseband using the phase noise mask for the receiver link in the DVB guidelines for the professional service scenario [34]. Different levels of phase noise were simulated by shifting the entire mask vertically so that it reaches a given level (in dBc/Hz) at a frequency offset of 100 Hz from the carrier [24]. Rectangular 64-QAM and 128-QAM schemes were simulated.
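Shifting the mask amounts to adding the same dB offset to every mask point. The sketch below assumes a mask given as (offset in Hz, level in dBc/Hz) pairs with linear interpolation in dB versus log-frequency between points; the mask values and function names are hypothetical placeholders, not the DVB professional-service mask from [34].

```python
import math

# Hypothetical mask: (frequency offset in Hz, level in dBc/Hz); NOT the DVB values
MASK = [(10.0, -25.0), (100.0, -50.0), (1e3, -73.0),
        (1e4, -85.0), (1e5, -95.0), (1e6, -105.0)]


def mask_level(mask, f_hz):
    """Mask level at f_hz, interpolated linearly in dB versus log10(frequency)."""
    pts = sorted(mask)
    if f_hz <= pts[0][0]:
        return pts[0][1]
    if f_hz >= pts[-1][0]:
        return pts[-1][1]
    for (f0, l0), (f1, l1) in zip(pts, pts[1:]):
        if f0 <= f_hz <= f1:
            w = (math.log10(f_hz) - math.log10(f0)) / (math.log10(f1) - math.log10(f0))
            return l0 + w * (l1 - l0)
    raise ValueError("frequency not bracketed by the mask")  # not reached for sorted input


def shift_mask(mask, target_dbc_hz, ref_offset_hz=100.0):
    """Shift the whole mask so that it reads target_dbc_hz at ref_offset_hz."""
    delta = target_dbc_hz - mask_level(mask, ref_offset_hz)
    return [(f, level + delta) for f, level in mask]


# Example: a level of -43 dBc/Hz at 100 Hz, as in the iterative-decoding experiment.
shifted = shift_mask(MASK, -43.0)
```

The shape of the spectrum is preserved; only its overall level changes, which gives a single scalar knob for the amount of phase noise.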

### Maximum achievable rate

Independent of the demapping algorithm, the maximum achievable rate for the reduced channel model in Section 2.3 is given by the mutual information between the received and transmitted symbols. For a uniform input distribution, the mutual information is

$$ I(Y;X) = \frac{1}{M}\sum_{x\in\mathcal S}\int p(y|x)\,\log\frac{p(y|x)}{p(y)}\,dy. \tag{16} $$

The mutual information could be computed efficiently using an approximation to the entropy of a Gaussian mixture, e.g., [35], although a loss in accuracy is expected for high modulation orders. Instead, we calculate the maximum rate by numerical integration of (16) over \(\mathbb R^{2}\) on a grid of 1000×1000 points, where \(p(y)=\frac{1}{M}\sum_{x'\in\mathcal S}p(y|x')\) for the uniform input. The integrand in (16) is easy to evaluate since the PDFs are known from (10). This maximum rate is independent of the demapper and serves as a benchmark for the different demapping algorithms.
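A reduced-scale version of this computation is sketched below in standard-library Python. For a uniform input over M points with Gaussian clusters (complex mean and 2×2 covariance per point), it evaluates the integrand of (16) with the midpoint rule on a square grid and uses base-2 logarithms, so the result is in bits per channel use. The grid size, the padding of six standard deviations, and the function names are our own assumptions; the paper uses a 1000×1000 grid.

```python
import math


def gauss2(yr, yi, mean, cov):
    """Bivariate Gaussian PDF at (yr, yi) for a complex mean and 2x2 covariance."""
    (s_ii, s_iq), (_, s_qq) = cov
    det = s_ii * s_qq - s_iq * s_iq
    di, dq = yr - mean.real, yi - mean.imag
    q = (s_qq * di * di - 2.0 * s_iq * di * dq + s_ii * dq * dq) / det  # squared Mahalanobis
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def mutual_information_grid(means, covs, n=200, pad=6.0):
    """Approximate I(Y;X) in bits for equiprobable Gaussian clusters by grid integration."""
    m = len(means)
    sigma = max(math.sqrt(max(c[0][0], c[1][1])) for c in covs)
    lo_r = min(mu.real for mu in means) - pad * sigma
    hi_r = max(mu.real for mu in means) + pad * sigma
    lo_i = min(mu.imag for mu in means) - pad * sigma
    hi_i = max(mu.imag for mu in means) + pad * sigma
    step_r, step_i = (hi_r - lo_r) / n, (hi_i - lo_i) / n
    total = 0.0
    for a in range(n):
        yr = lo_r + (a + 0.5) * step_r  # midpoint of the grid cell
        for b in range(n):
            yi = lo_i + (b + 0.5) * step_i
            p = [gauss2(yr, yi, mu, c) for mu, c in zip(means, covs)]
            p_y = sum(p) / m  # mixture density p(y) for a uniform input
            for p_x in p:
                if p_x > 0.0:  # 0 * log(0) is taken as 0
                    total += p_x * math.log2(p_x / p_y)
    return total * step_r * step_i / m


qpsk = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
low_noise = [[[0.01, 0.0], [0.0, 0.01]]] * 4
high_noise = [[[100.0, 0.0], [0.0, 100.0]]] * 4
i_low = mutual_information_grid(qpsk, low_noise)    # close to 2 bits
i_high = mutual_information_grid(qpsk, high_noise)  # close to 0 bits
```

Because each cluster keeps its own covariance, anisotropic clusters enter the benchmark directly; an isotropic model would simply pass identical diagonal covariances.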

The achievable rate of a specific demapper can be computed from the mutual information between the extrinsic *L* values in (9) and the transmitted bits of the equivalent bit channels depicted in Fig. 3. Here, the rate was calculated using the area property of the extrinsic information transfer (EXIT) chart [36]. Specifically, a binary erasure channel was simulated as the extrinsic channel to provide varying amounts of a priori information to the demapper. For a given SNR and distortion level, the reduced channel model was simulated with ∼10^{7} bits to compute the demapper’s EXIT function from the extrinsic *L* values in (9). The function was evaluated at 100 levels of a priori information between 0 and 1, and the achievable rate of the demapper was computed from the mean (area) of the resulting EXIT function [36].
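The ingredients of this procedure can be sketched in standard-library Python: a binary-erasure a priori channel, whose mutual information equals the fraction of unerased bits, and the common estimate of mutual information from L values, I ≈ 1 − E[log2(1 + e^{−sL})] with s = +1 for bit 0 and s = −1 for bit 1. We assume L = ln(P(b=0)/P(b=1)) and consistent L values; the sign convention of (9) may differ. We also assume the EXIT function is normalized per bit, so its area over a priori information from 0 to 1 is multiplied by the number of bits per symbol m to give bits per channel use; on a uniform grid the trapezoidal area is close to the sample mean. All function names are hypothetical.

```python
import math
import random


def bec_apriori_llrs(bits, i_a, rng, known_llr=1e3):
    """A priori L values from a binary erasure channel with capacity i_a.

    Each bit is revealed (large-magnitude L value with the correct sign) with
    probability i_a and erased (L = 0) otherwise.
    """
    return [(known_llr if b == 0 else -known_llr) if rng.random() < i_a else 0.0
            for b in bits]


def mi_from_llrs(bits, llrs):
    """Estimate I(B;L) = 1 - E[log2(1 + exp(-s*L))], s = +1 for bit 0, -1 for bit 1."""
    total = 0.0
    for b, llr in zip(bits, llrs):
        z = llr if b == 0 else -llr
        # numerically stable ln(1 + exp(-z))
        softplus = math.log1p(math.exp(-z)) if z > 0 else -z + math.log1p(math.exp(z))
        total += softplus / math.log(2.0)
    return 1.0 - total / len(bits)


def rate_from_exit(i_a, i_e, bits_per_symbol):
    """Trapezoidal area under EXIT samples (i_a ascending from 0 to 1), scaled by m."""
    area = sum(0.5 * (i_e[k] + i_e[k + 1]) * (i_a[k + 1] - i_a[k])
               for k in range(len(i_a) - 1))
    return bits_per_symbol * area


rng = random.Random(1)
bits = [rng.randrange(2) for _ in range(20000)]
i_a_meas = mi_from_llrs(bits, bec_apriori_llrs(bits, 0.3, rng))  # close to 0.3
grid = [k / 99 for k in range(100)]  # 100 a priori levels between 0 and 1
rate = rate_from_exit(grid, [0.5] * 100, bits_per_symbol=7)  # flat EXIT curve
```

In a full evaluation, each a priori level would be fed to the demapper together with the received symbols, and `mi_from_llrs` would be applied to the resulting extrinsic L values to obtain one point of the EXIT function.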

Figure 5 illustrates the achievable rates of the proposed covariance-based demapper and the standard demapper, together with the maximum achievable rate. The nonlinear channel in scenario 1 was simulated with an amplifier backoff of 3 dB for two modulation schemes at various SNR levels. The achievable rate of the proposed demapper matches the theoretical maximum derived from the symbol-level channel model, demonstrating the optimality of the demapper. In contrast, the achievable rate of the standard demapper is reduced at high SNR, where the symbol scattering is directional and mainly due to the nonlinear distortion, so knowledge of the covariance matrices is crucial to achieve maximum performance. The gain is also larger for the higher-order 128-APSK modulation. Note that the maximum rate for both modulation schemes remains below the corresponding number of bits per symbol, even for infinite SNR, due to the fixed level of distortion.

To examine the effect of the distortion level on demapper performance, we calculated the mutual information for SNR values between 5 and 30 dB and distortion levels defined by input powers between −10 and −2 dB. We define the gain as the difference in achievable rate between the proposed demapper and the standard demapper. Figure 6 plots the achievable rate gain for 128-APSK modulation for different pairs of SNR and distortion level. There is a gain in regions with high distortion and high SNR. This is expected, since the symbol distributions there are less isotropic, which is captured by the covariance of each cluster. We refer to this region as “distortion dominated.” Conversely, in regions with low SNR or low distortion the symbol covariances are nearly circular and constant, so the gain is minimal. We refer to this region as (thermal) “noise dominated.” The maximum gain is 0.16 bits per channel use, obtained with severe distortion and minimal noise.

Very similar plots were generated for phase noise in scenario 2 (not shown due to space limitations), where we consider the AWGN as thermal noise and the phase noise as distortion. Analogous to the nonlinear channel, a large amount of phase noise creates highly asymmetric covariances, which become the dominant source of error at sufficiently high SNR.

We remark that this analysis is ideal in the sense that the reduced channel model perfectly represents the received symbols. In a practical system, the bivariate Gaussian is an approximation of the effects of several communication components. We examine this approximation in more detail in the Appendix. In the next sections, we demonstrate that the superior performance of the proposed demapper is maintained for practical systems with non-Gaussian PDFs.

### Bit error rates

We have shown a substantial gain in the maximum achievable rate when operating in a distortion dominated regime, where the symbol noise varies across the constellation. In this section, we examine how this gain translates to system performance by computing the bit error rate (BER) for different coding schemes.

As an initial experiment, we compute the BER obtained when the proposed and standard demappers make hard decisions on the transmitted bits. The BER in Fig. 7 is for an uncoded system under scenario 1, with a nonlinear amplifier at input powers of −5 and −3 dB, for two constellations. The proposed demapper, which incorporates the symbol covariances, lowers the uncoded BER in distortion dominated regions, i.e., when the SNR is high. The BER increases for a denser constellation (e.g., 128-APSK) or a higher level of distortion (input power of −3 dB). An “error floor” is also evident for each scheme, due to the fixed level of distortion.
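The difference between the two hard-decision rules can be illustrated with a small two-cluster example: the covariance-based rule picks the point with the largest bivariate Gaussian log-likelihood, whereas the standard rule picks the nearest centroid. The cluster parameters below are hypothetical, chosen so that one cluster is elongated along the real axis, as it might be under directional distortion; a received sample that is closer to the other centroid can still be more likely under the elongated cluster. The sketch makes symbol decisions; bit decisions would follow from the bit labeling.

```python
import math
import random


def log_likelihood(y, mean, cov):
    """Bivariate Gaussian log-likelihood of complex y, up to an additive constant."""
    (s_ii, s_iq), (_, s_qq) = cov
    det = s_ii * s_qq - s_iq * s_iq
    di, dq = y.real - mean.real, y.imag - mean.imag
    q = (s_qq * di * di - 2.0 * s_iq * di * dq + s_ii * dq * dq) / det
    return -0.5 * q - 0.5 * math.log(det)


def decide_covariance(y, means, covs):
    """Maximum-likelihood symbol decision using per-cluster covariances."""
    return max(range(len(means)), key=lambda k: log_likelihood(y, means[k], covs[k]))


def decide_euclidean(y, means):
    """Nearest-centroid decision (equivalent to a common isotropic covariance)."""
    return min(range(len(means)), key=lambda k: abs(y - means[k]))


# Hypothetical clusters: point 0 is stretched along the real axis.
means = [0j, 1.5 + 0j]
covs = [[[1.0, 0.0], [0.0, 0.01]], [[0.01, 0.0], [0.0, 0.01]]]
stds = [(1.0, 0.1), (0.1, 0.1)]

rng = random.Random(7)
errors_cov = errors_euc = 0
for _ in range(2000):
    k = rng.randrange(2)
    y = means[k] + complex(rng.gauss(0, stds[k][0]), rng.gauss(0, stds[k][1]))
    errors_cov += decide_covariance(y, means, covs) != k
    errors_euc += decide_euclidean(y, means) != k
```

For the sample y = 1.0, the nearest centroid is point 1, but the covariance-aware rule selects point 0, whose elongated cluster makes that sample far more likely.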

The uncoded simulations reinforce the common theme that demapping with the symbol covariance is advantageous at high SNR when the system is dominated by distortion. However, when error correction is used, the threshold of the code also plays a significant role. Figure 8 illustrates the BER of a coded system for the same setup as Fig. 7. Notice that there is minimal improvement in BER for 64-APSK modulation with coding, despite a large reduction in BER in the uncoded case. To explain this, consider decoding with a hard-decision demapper. Although the proposed demapper reduces the errors at high SNR, the code can already correct this number of errors (on the order of 10^{−2}), so a further reduction is unnecessary. In other words, the information gain from the proposed demapper occurs at SNR values above the threshold of the code. Conversely, higher-order schemes such as 128-APSK inherently require a higher SNR to achieve the same error level, so the proposed demapper becomes beneficial. Thus, covariance-based demapping substantially improves the performance of the coded system for 128-APSK modulation, as shown in Fig. 8.

The performance of the entire system with the proposed demapper depends on the modulation scheme, the level of distortion, and the SNR threshold of the code. Figure 9 presents the SNR required to achieve a BER of 10^{−6} with error correction for the nonlinear amplifier in scenario 1. As the nonlinear distortion increases, the covariance-based demapper provides more accurate *L* values than the standard demapper, which improves the decoder performance so that it can operate at a lower SNR. The reduction in SNR becomes more pronounced as the spectral efficiency increases, e.g., through a higher code rate or modulation order. This highlights the interplay between the quality of the demapper output and the errors after decoding. It is also worth noting that the required SNR increases smoothly with the distortion level: a higher SNR reduces the uncertainty due to thermal noise, which offsets the increased uncertainty due to nonlinear distortion.

Figure 10 plots the required SNR to achieve a BER of 10^{−6} for a coded system with phase noise, described in scenario 2. When we consider the phase noise as a distortion, separate from thermal noise, all the trends above are maintained. This demonstrates the utility of the proposed demapper for any system that can be modeled with data-dependent noise.

### Effect of the number of pilot symbols

The robustness of the demapper to imperfect mean and covariance estimates is investigated by varying the number of pilot symbols. We calculated the SNR required to achieve a BER of 10^{−6} with a varying number of pilots for the coded system with the nonlinear amplifier in scenario 1. Figure 11 shows that increasing the number of pilots improves performance until the required SNR reaches a plateau, beyond which further improvements in the mean and covariance estimates do not improve the BER. For 128-APSK modulation, more pilots are required to reach optimal performance, due to the larger number of parameters to be estimated. Apart from very low numbers of pilots, the performance is relatively stable and degrades gracefully as the number of pilots decreases, demonstrating reasonable robustness to inaccurate covariance estimation.

### Iterative demapping and decoding

In this section, we demonstrate that the proposed demapper can also benefit applications with iterative demapping and decoding [7]. Iterative demapping methods depend strongly on the chosen bit labeling; e.g., Gray mapping exhibits little performance gain with joint demapping and decoding [37]. To provide a suitable example, we consider a coded system with phase noise as described in scenario 2, with a 128-QAM constellation and a randomly generated bit labeling. With this bit labeling, the demapper benefits from the a priori information passed back from the decoder at each iteration. We use the EXIT chart [36] to visualize the iterations between the demapper and decoder in Fig. 12. The simulation was conducted close to the threshold of the turbo code, with an *E*_{s}/*N*_{0} of 25 dB and phase noise of −43 dBc/Hz at a frequency offset of 100 Hz. The standard demapper generates less accurate *L* values, and consequently the mutual information captured in its EXIT function is lower. As a result, the iterative receiver stalls where the demapper’s EXIT function intersects that of the decoder (Fig. 12a). The proposed demapper, on the other hand, has a higher EXIT function, which allows for successful decoding using the a priori information generated in previous iterations (Fig. 12b). Successful decoding is achieved after two iterations. Note that the turbo code is suboptimal in this example and operates below the maximum achievable rate, which could be approached by designing a code/decoder matched to the EXIT function of the demapper [38]. Nonetheless, this example demonstrates a clear advantage of the proposed demapper for iterative demapping and decoding. As with all joint demapping approaches, however, the performance gain depends on the bit labeling and the particular coding scheme.

### Computational complexity

The proposed demapper based on a bivariate Gaussian model requires more computation than the standard demapper based on a circularly symmetric Gaussian model. The likelihood computation in (10) requires a squared Mahalanobis distance of the form (*y*−*μ*)^{T}*Σ*^{−1}(*y*−*μ*). Assuming precomputed constants, the likelihood calculation requires seven multiplications. For a circularly symmetric Gaussian, this reduces to a Euclidean distance, and the likelihood requires four multiplications. However, the extra computational cost must be placed in the context of the overall demapper and the receiver in general.
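One way to arrive at multiplication counts of seven and four is to fold the factor −1/2 and the inverse covariance into three precomputed constants, and the PDF normalization into a fourth, as in the following Python sketch. The constant names and this particular factorization are our own assumptions rather than the authors' implementation; the comments count only the multiplications performed per likelihood evaluation, with the exponential counted separately.

```python
import math


def precompute_bivariate(cov):
    """Fold -1/2 * inverse covariance and the PDF normalization into constants."""
    (s_ii, s_iq), (_, s_qq) = cov
    det = s_ii * s_qq - s_iq * s_iq
    a = -0.5 * s_qq / det   # coefficient of di^2
    b = s_iq / det          # coefficient of di*dq (cross term already doubled)
    c = -0.5 * s_ii / det   # coefficient of dq^2
    norm = 1.0 / (2.0 * math.pi * math.sqrt(det))
    return a, b, c, norm


def likelihood_bivariate(di, dq, consts):
    a, b, c, norm = consts
    # 6 multiplications in the exponent + 1 for the normalization = 7
    return norm * math.exp(a * di * di + b * di * dq + c * dq * dq)


def precompute_circular(var):
    """Constants for a circularly symmetric Gaussian with variance var per dimension."""
    return -0.5 / var, 1.0 / (2.0 * math.pi * var)


def likelihood_circular(di, dq, consts):
    g, norm = consts
    # di*di, dq*dq, g*(...) and the normalization: 4 multiplications
    return norm * math.exp(g * (di * di + dq * dq))
```

With a diagonal covariance of equal variances, the two functions return the same value; the extra cost of the bivariate model is three multiplications per constellation point and received sample.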

We conducted performance tests of our MATLAB implementation with both models and observed less than a 1% increase in execution time for demapping 100 frames with the bivariate Gaussian model compared with the circularly symmetric Gaussian model. The cost of the extra multiplications is small compared with the other calculations required for demapping, e.g., the products and exponentials in (9). The demapper also accounts for only a small fraction (< 2%) of the computational cost of the whole receiver module, where the turbo decoder consumes much more of the execution time. Therefore, the improved performance of the proposed demapper comes at the expense of only a mild increase in computation.