Low complexity soft demapping for nonbinary LDPC codes
 Alain Mourad^{1},
 Ottavio Picchi^{2},
 Ismael Gutierrez^{1} and
 Marco Luise^{2}
https://doi.org/10.1186/16871499201255
© Mourad et al; licensee Springer. 2012
Received: 27 April 2011
Accepted: 20 February 2012
Published: 20 February 2012
Abstract
This article focuses on nonbinary wireless transmission, where "nonbinary" refers to the use of nonbinary Low Density Parity Check (LDPC) codes for Forward Error Correction. The complexity of the nonbinary soft demapper is addressed, in particular when one nonbinary Galois Field (GF) symbol spreads across multiple Quadrature Amplitude Modulation (QAM) symbols and Space-Time Block Code (STBC) codewords. A strategy is devised to guarantee an efficient mapping at the transmitter, together with an algorithm at the receiver for low complexity soft Maximum Likelihood demapping. The proposed solution targets a trade-off between performance and complexity, and removes any restriction on the setting of the GF order, QAM constellation order, and STBC scheme. This makes nonbinary LDPC codes even more appealing for potential use in practical wireless communication systems.
1. Introduction
Nonbinary channel codes (i.e., codes defined over a high-order Galois Field GF(q) with q > 2) have been studied in the literature as a means to achieve higher error protection than conventional binary codes for transmission over different noisy channels [1–3]. More recently, the European FP7 DAVINCI project [4] has explored the design of innovative nonbinary Low Density Parity Check (LDPC) codes with tailored link-level technologies over wireless fading channels, whilst aiming at small added complexity with respect to conventional binary receivers.
The DAVINCI project considers LDPC codes defined over a GF of order q = 64 (denoted as GF(64)). The proposed nonbinary LDPC codes were shown to outperform their binary counterparts, e.g., binary LDPC and (duo) binary Turbo Codes, with higher gains for higher constellation orders and higher coding rates [5]. Moreover, these nonbinary codes were shown to boost the system spectral efficiency when combined with high-order Quadrature Amplitude Modulation (QAM) constellations and MIMO spatial multiplexing [6]. This boosting effect comes from the inherently higher capacity of the single-input single-output (SISO) equivalent channel as seen by the nonbinary code with high-order constellations and multiple antennas [6, 7].
Complexity-wise, for a high GF order, e.g., q = 64, some relatively low complexity LDPC decoding algorithms have been proposed in [8]. However, once the encoded symbols are mapped onto QAM constellation symbols and Space-Time Block Code (STBC) codewords, the complexity of the soft demapper at the receiver becomes a real challenge, especially when one GF symbol spreads across multiple QAM constellation symbols and STBC codewords. This can be seen, for example, in the simple case of GF order q = 64 with a 16QAM constellation in SISO (single antenna) transmission, where two GF(64) coded symbols (total of 2 × 6 = 12 bits) jointly map onto three 16QAM symbols (total of 3 × 4 = 12 bits). Thus, one of the three 16QAM symbols has to contain coded bits from two GF symbols. This spreading of the GF coded symbols across more than one QAM symbol drastically increases the complexity of the soft demapper, which is already more complex than in the binary case (q = 2). The complexity issue may become even more problematic in the mapping of GF coded symbols to STBC codewords, in particular when one GF coded symbol does not fit into exactly one STBC codeword. In order to avoid such complexity, most recent studies have been restricted to configurations where each GF symbol can be individually processed in its mapping onto QAM symbols and STBC codewords [5, 6]. This led in certain cases to impractical assumptions, such as a 3 × 3 antenna configuration for GF(64) with Quadrature Phase Shift Keying (QPSK) and MIMO spatial multiplexing [6].
This article tackles the challenging complexity of the nonbinary soft demapper when the GF symbol spreads across multiple QAM symbols and STBC codewords. The mapping at the transmitter and the soft demapping at the receiver are both considered, with the aim of achieving the best trade-off between performance and complexity. A strategy is devised to guarantee an efficient mapping at the transmitter, together with an algorithm for low complexity soft demapping at the receiver. The proposed algorithm borrows a key finding from [8]: by feeding the nonbinary LDPC decoder with only a limited number of the highest A Posteriori Probability (APP) values for each GF symbol (this number being much smaller than the GF order q), one can still achieve close to optimal performance whilst significantly reducing the nonbinary LDPC decoding complexity and memory requirements.
The rest of the article is structured as follows. Section 2 describes the system model, and Section 3 follows with the problem statement. Section 4 presents the mapping and demapping solutions proposed. Section 5 shows numerical results to illustrate the performance and complexity of the proposed solutions. Finally, Section 6 draws our conclusions and suggests some perspectives for future work.
2. System model: nonbinary wireless transmission
Table 1. Key notations used throughout the article
Notation  Definition 

q  GF order (default value q = 64) 
Ω  Alphabet of q GF symbols 
M  QAM constellation order (e.g., QPSK → M = 4; 16QAM → M = 16) 
A  Alphabet of M QAM constellation symbols 
q_{m}  Number of APP values per GF symbol fed to the decoder (q_{m} ≪ q) 
n_{t}  Number of transmitter antennas 
n_{r}  Number of receiver antennas 
Q  Number of QAM symbols mapped onto one STBC codeword 
T  STBC block length (as expressed in MIMO channel uses) 
m_{1}  Minimum integer number of GF(q) symbols which map onto m_{2} M-QAM symbols and m_{3} STBC codewords 
n_{1}  Number of GF(q) symbols multiplexing within n_{2} (≤ m_{2}) M-QAM symbols and n_{3} (≤ m_{3}) STBC codewords (n_{1} ≤ m_{1}) 
m_{2}  Minimum integer number of M-QAM symbols which map to m_{1} GF(q) symbols and m_{3} STBC codewords 
n_{2}  Number of M-QAM symbols carrying one GF(q) symbol (n_{2} ≤ m_{2}) 
m_{3}  Minimum integer number of STBC codewords which map to m_{1} GF(q) symbols and m_{2} M-QAM symbols 
n_{3}  Number of STBC codewords carrying one GF(q) symbol (n_{3} ≤ m_{3}) 
2.1 Nonbinary LDPC codes
The nonbinary LDPC codes used are taken directly from the DAVINCI project [4]. These codes have been designed with a very sparse parity check matrix. The nonzero elements of the matrix are defined over a GF of order q = 64, denoted by Ω. The variable node degree is fixed to d_{v} = 2 (optimal when q → ∞ and codeword length → ∞), whereas the check node degree d_{c} is variable and adapted to the coding rate (i.e., d_{c} = {4, 6, 8, 12} for rate = {1/2, 2/3, 3/4, 5/6}, respectively). The DAVINCI codes are obtained as regular LDPC codes over the GF Ω following the optimization process described in [9]. At the receiver side, we use a reduced complexity nonbinary decoder based on the Extended Min-Sum algorithm proposed in [8] for practical hardware implementation of the DAVINCI codes. This low complexity decoder takes only the q_{m} (q_{m} < q) highest APP values out of the q − 1 values available at the output of the soft demapper. This truncation of the APP values at the input of the decoder significantly reduces the decoder complexity at the cost of a slight performance degradation.
2.2 Nonbinary wireless transmission chain
2.2.1 Transmitter operations
Blocks of K GF(q) symbols are then passed to the nonbinary LDPC encoder, which generates the nonbinary codeword of length N GF(q) symbols. In order to map the nonbinary Forward Error Correction (FEC) codeword onto the M-QAM constellation symbols, each of the GF(q) symbols in the FEC codeword is first converted back to its binary image of log_{2}(q) bits (using the same primitive polynomial in (1)). The resulting binary stream is then passed to the Mapping module, which is in charge of mapping the GF(q) symbols onto the M-QAM constellation symbols and STBC codewords. As highlighted in Figure 1, the mapping function features a novel module referred to as intra-block permutation, which permutes/rearranges the bits (per block of m_{1} GF symbols) in the binary stream in accordance with three design rules devised in this article to achieve the trade-off between performance and complexity. After the intra-block permutation, each group of log_{2}(M) adjacent bits of the permuted output stream is mapped onto one QAM constellation symbol. A conventional Gray mapping is used to produce the stream of complex-valued QAM symbols. In the context of a single antenna transmission, the QAM symbols are then directly sent for transmission over the wireless multipath fading channel.
In the context of multi-antenna transmission, the stream of QAM symbols undergoes a further step of spatial encoding represented by the STBC encoder depicted in Figure 1. The QAM symbols are arranged in groups of Q symbols, and each group is encoded by the STBC encoder, resulting in an STBC codeword V_{j} of n_{t} × T complex symbols, with n_{t} being the number of transmitter antennas and T the STBC block length. The spatial rate is then given by R_{S} = Q/T. The output stream of the STBC encoder is then transmitted across the multiple antennas through the multipath fading channel.
2.2.2 Receiver operations
For single-antenna transmission, the received symbol per channel use reads $y = hx + \nu$ (2), where h is the channel fading, x the transmitted symbol per channel use, and ν the background noise, assumed to be complex-valued Gaussian distributed with zero mean and single-sided variance N_{0}.
The received symbols are deinterleaved at the QAM level, and next fed into the soft demapper, which computes the APP values of all GF(q) symbols in the codeword. The computation of the APP values in the nonbinary case is much heavier than in the case of binary transmission for two reasons: first, each GF(q) coded symbol calls for the computation of (q − 1) APP values, and second, the computation of each APP value turns out to be particularly complicated whenever one GF(q) symbol is spread across different QAM symbols (see Section 3 for more details). The APP values are fed into the nonbinary FEC decoder, and the decoded GF symbols are finally converted into bits to represent the received binary message.
The received signal model is slightly more complicated with n_{t} × n_{r} MIMO transmission.
where H_{j} is an n_{r} × n_{t} complex matrix representing the MIMO frequency-flat channel coefficients for the j-th STBC codeword V_{j}.
The received STBC codewords are fed into the so-called Soft Maximum Likelihood (ML) demapper, which combines the STBC ML detection and the APP computation. It is noteworthy here that suboptimal linear equalizers may be considered for the STBC detection, which would then be followed in a second step by the APP computation for the nonbinary LDPC decoder. Such a linear approach was compared to the soft ML demapper in [4, 10], where the latter was shown to significantly outperform the former. It is not in the scope of this article to reproduce such a comparison, but rather to focus on the complexity reduction of the soft ML demapper with the aim of making it practical for wireless communication systems.
3. Problem statement: mapping and Soft ML demapping
3.1 Mapping of GF(q) symbols onto MQAM symbols
Table 2. Values of m_{1} and m_{2} for GF(64) to QAM mapping
Constellation  QPSK  16QAM  64QAM 

(m_{1}, m_{2})  (1, 3)  (2, 3)  (1, 1) 
For example, for QPSK, one GF(64) symbol maps onto three QPSK symbols. For 16QAM, however, two GF(64) symbols map onto three 16QAM symbols, and consequently at least one 16QAM symbol carries bits from both GF(64) symbols. For 64QAM, the mapping is obviously one-to-one since both GF(64) and 64QAM symbols are represented by the same number of bits (log_{2}(64) = 6).
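The values of m_{1} and m_{2} follow from a least-common-multiple argument on the number of bits per GF symbol and per QAM symbol. The sketch below is an illustration only: the function name `min_mapping` and its optional parameter `Q` (number of QAM symbols per STBC codeword, with Q = 1 standing for SISO) are assumptions introduced here, not notation from the article beyond q, M, and Q.

```python
from math import gcd

def min_mapping(q, M, Q=1):
    """Return (m1, m2, m3): the smallest numbers of GF(q) symbols, M-QAM
    symbols and STBC codewords that carry the same number of bits.
    Q is the number of QAM symbols per STBC codeword (Q = 1 models SISO).
    q and M are assumed to be powers of two."""
    bits_gf = q.bit_length() - 1      # log2(q)
    bits_qam = M.bit_length() - 1     # log2(M)
    bits_cw = Q * bits_qam            # bits carried by one STBC codeword
    block = bits_gf * bits_cw // gcd(bits_gf, bits_cw)  # least common multiple
    return block // bits_gf, block // bits_qam, block // bits_cw

for name, M in [("QPSK", 4), ("16QAM", 16), ("64QAM", 64)]:
    # SISO pair (m1, m2), then the MIMO triple for Q = 2
    print(name, min_mapping(64, M)[:2], min_mapping(64, M, Q=2))
```

With Q = 2 the same function returns the triples (m_{1}, m_{2}, m_{3}) used for the MIMO case in Section 3.2.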
In the sequel, we consider only one vector d^{(n)} and the corresponding vector x^{(n)}, and omit the superscript index n for the sake of simplicity. Thus, d and x refer now to two vectors of lengths m_{1} and m_{2}, respectively, associated by the mapping function μ(.). As illustrated in Figures 1 and 2, the mapping function μ(.) features a novel component introduced in this article, the so-called "intra-block permutation". In this component, the bits in the binary image of the vector of m_{1} GF(q) symbols are permuted (rearranged) in accordance with the design rules proposed in Section 4.1, with the aim of achieving the best trade-off between performance and complexity.
where y_{j} = h_{j}x_{j} + ν_{j} is the j-th received symbol given in (2), h_{j} is the corresponding equivalent channel coefficient (that we assume perfectly known), x_{j} = μ_{j}(d) is the j-th entry in the vector x of m_{2} QAM symbols mapping onto the vector d of m_{1} GF(q) symbols, and ν_{j} is the noise term. The set ${\mathrm{\Delta}}_{i}^{k}$ includes all configurations of vector d with i-th component d_{i} = α_{k}, where α_{k} denotes the k-th entry in the GF Ω. The cardinality of ${\mathrm{\Delta}}_{i}^{k}$ is clearly equal to q^{m_{1} − 1}.
The computational complexity of the soft demapper is of the order O((q − 1) × q^{m_{1} − 1} × m_{2}) per GF symbol. This reflects an exponential growth with the GF order q whenever the minimum number m_{1} of GF(q) symbols that are spread onto M-QAM symbols is strictly greater than 1 (m_{1} > 1). This is the case with 16QAM, as given in Table 2.
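To make the exhaustive marginalization concrete, the following sketch computes normalized APPs by brute force for a deliberately tiny toy case. Everything specific in it is an assumption chosen for illustration and not taken from the article: a GF(8) alphabet with QPSK (so that m_{1} = 2, m_{2} = 3), a plain binary image without any primitive polynomial, equiprobable symbols, and the helper names `to_bits`, `mu`, and `exhaustive_app`. It normalizes the likelihood sums to probabilities rather than reproducing the exact form of Equation (6).

```python
import itertools
import math

GF_BITS = 3   # toy field GF(8), chosen only to keep the search small
QAM_BITS = 2  # QPSK

def to_bits(symbol, width):
    """Binary image of a symbol, most significant bit first."""
    return [(symbol >> (width - 1 - i)) & 1 for i in range(width)]

def mu(d):
    """Toy mapping: concatenate binary images, then map bit pairs to QPSK."""
    bits = [b for s in d for b in to_bits(s, GF_BITS)]
    scale = 1 / math.sqrt(2)
    return [complex(1 - 2 * bits[i], 1 - 2 * bits[i + 1]) * scale
            for i in range(0, len(bits), QAM_BITS)]

def exhaustive_app(y, h, n0, q=8, m1=2):
    """Normalized APPs app[i][k] = P(d_i = alpha_k | y), obtained by summing
    the likelihood of every configuration d of the m1 GF symbols."""
    app = [[0.0] * q for _ in range(m1)]
    for d in itertools.product(range(q), repeat=m1):
        x = mu(d)
        dist = sum(abs(yj - hj * xj) ** 2 for yj, hj, xj in zip(y, h, x))
        weight = math.exp(-dist / n0)
        for i, k in enumerate(d):
            app[i][k] += weight  # each (i, k) collects q**(m1 - 1) terms
    return [[v / sum(row) for v in row] for row in app]

h = [1.0, 1.0, 1.0]
y = [hj * xj for hj, xj in zip(h, mu((5, 2)))]  # noiseless reception of d = (5, 2)
app = exhaustive_app(y, h, n0=0.1)
print(max(range(8), key=app[0].__getitem__), max(range(8), key=app[1].__getitem__))
```

The loop visits all q^{m_{1}} configurations, which is what makes the exhaustive approach impractical for GF(64) as soon as m_{1} > 1.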
3.2 Mapping of GF(q) symbols onto STBC codewords
Table 3. Values of m_{1}, m_{2}, and m_{3} for GF(64) to MIMO (Q = 2) mapping
Constellation  QPSK  16QAM  64QAM 

(m_{1}, m_{2}, m_{3})  (2, 6, 3)  (4, 6, 3)  (2, 2, 1) 
where SM(.) denotes the MIMO encoder operation, which encodes the stream of QAM symbols into STBC codewords, and $\|\cdot\|_{F}$ is the Frobenius norm. The parameter n_{3} is defined in Table 1 as the number of STBC codewords carrying one GF(q) symbol (n_{3} ≤ m_{3}). The value of n_{3} may then vary from one GF symbol to another in the vector of m_{1} GF symbols and therefore depends on the index i of the GF(q) symbol. The vector d may thus have different bit lengths depending on whether n_{3} is equal to or greater than 1. Furthermore, in order to minimize the computational weight of the APP extraction, we can exploit the max-log-MAP approximation, so that (8) becomes the difference between the maximum sum at the numerator and the maximum sum at the denominator.
Taking into account the inherent matrix multiplication required to compute the distance between STBC codewords V_{j} and W_{j} (cf. (3)), the computational complexity of the soft demapper becomes of the order O((q − 1) × q^{m_{1} − 1} × m_{3} × n_{r} × Q × T). When m_{1} > 1, different GF(q) symbols are spread into different STBC codewords, and this occurs here for any constellation, unlike the SISO case where it only occurs for the 16QAM constellation (cf. Table 2). This emphasizes how problematic the complexity of the soft demapper may become with MIMO transmission, even with simple practical configurations (e.g., Q = 2). The main problem tackled in this article is the reduction of the complexity of the soft demapper when one GF(q) symbol spreads across different QAM symbols and STBC codewords (i.e., m_{1} > 1), without sacrificing the error protection performance.
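The two complexity orders can be evaluated directly to get a feeling for the numbers involved. In the sketch below, the function names `siso_ops` and `mimo_ops` are hypothetical, and the MIMO evaluation assumes 2 × 2 spatial multiplexing with n_{r} = 2, Q = 2, and T = 1; the value T = 1 is an assumption made here for illustration.

```python
def siso_ops(q, m1, m2):
    """Order of operations per GF symbol: (q - 1) * q**(m1 - 1) * m2."""
    return (q - 1) * q ** (m1 - 1) * m2

def mimo_ops(q, m1, m3, n_r, Q, T):
    """Order of operations per GF symbol with STBC codewords:
    (q - 1) * q**(m1 - 1) * m3 * n_r * Q * T."""
    return (q - 1) * q ** (m1 - 1) * m3 * n_r * Q * T

# (m1, m2) from the SISO table and (m1, m2, m3) from the MIMO (Q = 2) table
siso = {"QPSK": (1, 3), "16QAM": (2, 3), "64QAM": (1, 1)}
mimo = {"QPSK": (2, 6, 3), "16QAM": (4, 6, 3), "64QAM": (2, 2, 1)}

for name in siso:
    m1, m2 = siso[name]
    n1, _, n3 = mimo[name]
    print(name, siso_ops(64, m1, m2), mimo_ops(64, n1, n3, n_r=2, Q=2, T=1))
```

Under these assumptions the 16QAM MIMO case is roughly four orders of magnitude heavier than its SISO counterpart, which illustrates why m_{1} > 1 is the critical condition.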
4. Novel mapping strategy and low complexity soft demapping
Our solution to the problem stated above consists of a mapping strategy at the transmitter side together with an algorithm for low complexity soft demapping when one GF(q) symbol spreads across different QAM symbols and STBC codewords (i.e. m_{1} > 1).
4.1 Mapping strategy at the transmitter
Three rules are introduced hereafter with the aim to achieve the best tradeoff between error protection performance and soft demapper complexity.
First rule: The I or Q component of an MQAM symbol should carry (in part or in full) the binary image of only one GF(q) symbol
This rule naturally applies to the particular case of m_{1} = 1, and can always be met whenever the number of bits per GF(q) symbol, log_{2}(q), is an integer multiple of the number of bits per I or Q component, log_{2}(M)/2. Otherwise, the rule requires mapping as many I and Q components as possible to binary parts issued from the binary image of one single GF symbol. This ensures better performance compared to schemes that do not obey this rule, as will be shown in Section 5.
Table 4. Example of four patterns for mapping GF(64) symbols to 16QAM symbols (m_{1} = 2, m_{2} = 3)
Pattern  I0  Q0  I1  Q1  I2  Q2 

P1  a_{0}a_{1}  a_{2}a_{3}  a_{4}a_{5}  b_{0}b_{1}  b_{2}b_{3}  b_{4}b_{5} 
P2  a_{0}b_{0}  a_{1}b_{1}  a_{2}b_{2}  a_{3}b_{3}  a_{4}b_{4}  a_{5}b_{5} 
P3  a_{0}a_{1}  b_{0}b_{1}  a_{2}a_{3}  b_{2}b_{3}  a_{4}a_{5}  b_{4}b_{5} 
P4  a_{0}b_{0}  b_{1}a_{1}  a_{2}b_{2}  b_{3}a_{3}  a_{4}b_{4}  b_{5}a_{5} 
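Whether a pattern satisfies the first rule can be checked mechanically: every I or Q component must contain bits of a single GF symbol. The sketch below encodes the four patterns of Table 4 as strings; the encoding and the function name `obeys_first_rule` are illustrative choices made here.

```python
import re

# I0, Q0, I1, Q1, I2, Q2 components of each pattern, copied from Table 4
TABLE4 = {
    "P1": ["a0a1", "a2a3", "a4a5", "b0b1", "b2b3", "b4b5"],
    "P2": ["a0b0", "a1b1", "a2b2", "a3b3", "a4b4", "a5b5"],
    "P3": ["a0a1", "b0b1", "a2a3", "b2b3", "a4a5", "b4b5"],
    "P4": ["a0b0", "b1a1", "a2b2", "b3a3", "a4b4", "b5a5"],
}

def obeys_first_rule(components):
    """True if each I/Q component carries bits from exactly one GF symbol."""
    return all(len(set(re.findall(r"[a-z]", c))) == 1 for c in components)

for name, comps in TABLE4.items():
    print(name, obeys_first_rule(comps))
```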
Second rule: Map as many I/Q components as possible issued from the same GF(q) symbol onto the same STBC codeword
This ensures that a minimum number (n_{3} ≤ m_{3}) of STBC codewords has to be considered by the soft demapper for the computation of the APP values of each GF(q) symbol, and so contributes to the reduction of the complexity of soft ML demapping as proposed in Section 4.2. The price is a limit on the maximum channel selectivity that can be achieved within one GF(q) symbol: ideally, by letting each I or Q component issued from one GF(q) symbol map onto a different STBC codeword, we would create a higher chance for these parts of the same GF(q) symbol to experience uncorrelated channel fading. This rule therefore restricts the freedom to let the GF(q) symbol enjoy higher channel selectivity, but has the advantage of drastically reducing the complexity of the soft ML demapper. This is where the complexity of the soft ML demapper is traded off against the error protection performance of the GF(q) symbols.
Third rule: Under the constraint of the second rule, map the I/Q components issued from one GF(q) symbol onto the transmission units ideally of independent channel fading within the STBC codeword carrying this GF(q) symbol
This rule targets the maximum achievable channel selectivity (i.e., the number of independent channel fading coefficients) within each GF(q) symbol under the constraint of the second rule. The higher the channel selectivity within one GF(q) symbol (i.e., the number of independent channel fading coefficients affecting the different parts of the GF(q) symbol), the better the error protection performance is expected to be. The margin for this rule to achieve a higher channel selectivity order is clearly bounded by the second rule.
Table 5. Example of three patterns for mapping GF(64) symbols to STBC codewords (m_{1} = 4, m_{2} = 6, m_{3} = 3)
Pattern  Antenna  I0  Q0  I1  Q1  I2  Q2 

P1  A#1  a_{0}a_{1}  a_{2}a_{3}  b_{2}b_{3}  b_{4}b_{5}  c_{4}c_{5}  d_{0}d_{1} 
P1  A#2  a_{4}a_{5}  b_{0}b_{1}  c_{0}c_{1}  c_{2}c_{3}  d_{2}d_{3}  d_{4}d_{5} 
P2  A#1  a_{0}a_{1}  b_{0}b_{1}  a_{2}a_{3}  b_{2}b_{3}  a_{4}a_{5}  b_{4}b_{5} 
P2  A#2  c_{0}c_{1}  d_{0}d_{1}  c_{2}c_{3}  d_{2}d_{3}  c_{4}c_{5}  d_{4}d_{5} 
P3  A#1  a_{0}a_{1}  a_{2}a_{3}  b_{2}b_{3}  c_{0}c_{1}  c_{4}c_{5}  d_{0}d_{1} 
P3  A#2  a_{4}a_{5}  b_{0}b_{1}  b_{4}b_{5}  c_{2}c_{3}  d_{2}d_{3}  d_{4}d_{5} 
All three patterns in Table 5 follow the first rule by not mixing bits from different GF symbols into the same I or Q component.
Patterns P1 and P3 further obey the second rule by mapping as many I/Q components from the same GF symbol as possible into the same STBC codeword, whilst Pattern P2 does not. For patterns P1 and P3, GF(64) symbols a and d are carried within one single STBC codeword, and GF(64) symbols b and c are mapped onto two STBC codewords. However, for pattern P2, each GF(64) symbol is spread out over all of the m_{3} = 3 STBC codewords. In terms of complexity of the soft demapper, patterns P1 and P3 will enable reduced complexity, whereas the complexity with pattern P2 will be drastically higher, as shown later in Section 4.2.
Now with regard to the third rule, the channel selectivity order (i.e., the maximum number of independent channel fading coefficients) for pattern P1 is equal to 2 for all GF symbols a, b, c, and d. Indeed, each of these GF symbols is mapped onto exactly two QAM symbols (within one STBC codeword for a and d, and across two STBC codewords for b and c), one transmitted on the first antenna port and the other on the second antenna port. For pattern P2, however, the channel selectivity order is higher and equal to 3, since each GF symbol is mapped onto exactly three QAM symbols transmitted within three different STBC codewords, hence ideally subject to three independent channel fading coefficients. The last pattern P3 has a channel selectivity order equal to 2 for the edge symbols a and d (which are carried in two QAM symbols within one single STBC codeword), whereas it is equal to 3 for the middle symbols b and c (which are carried in three QAM symbols within two STBC codewords). Amongst the three patterns, only P1 and P3 respect the second rule, and only P3 further respects the third rule, as it attempts to achieve the highest possible channel selectivity order under the constraint of the second rule.
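The spreading of each GF symbol in the patterns of Table 5 can be tabulated with a few lines of code. The sketch below rests on one reading of the table that is an assumption made here: the column pairs (I0, Q0), (I1, Q1), (I2, Q2) are taken as the three STBC codewords, and each antenna row within a pair as one QAM symbol. The number of distinct QAM symbols per GF symbol is then used as a stand-in for the channel selectivity order; the function name `spread` is hypothetical.

```python
import re

# pattern -> 3 STBC codewords -> 2 antennas -> [I, Q] components (from Table 5)
TABLE5 = {
    "P1": [[["a0a1", "a2a3"], ["a4a5", "b0b1"]],
           [["b2b3", "b4b5"], ["c0c1", "c2c3"]],
           [["c4c5", "d0d1"], ["d2d3", "d4d5"]]],
    "P2": [[["a0a1", "b0b1"], ["c0c1", "d0d1"]],
           [["a2a3", "b2b3"], ["c2c3", "d2d3"]],
           [["a4a5", "b4b5"], ["c4c5", "d4d5"]]],
    "P3": [[["a0a1", "a2a3"], ["a4a5", "b0b1"]],
           [["b2b3", "c0c1"], ["b4b5", "c2c3"]],
           [["c4c5", "d0d1"], ["d2d3", "d4d5"]]],
}

def spread(pattern):
    """Per GF symbol: (number of STBC codewords, number of QAM symbols)."""
    codewords, qam = {}, {}
    for j, cw in enumerate(pattern):
        for ant, comps in enumerate(cw):
            for sym in set(re.findall(r"[a-z]", "".join(comps))):
                codewords.setdefault(sym, set()).add(j)
                qam.setdefault(sym, set()).add((j, ant))
    return {s: (len(codewords[s]), len(qam[s])) for s in sorted(codewords)}

for name, pattern in TABLE5.items():
    print(name, spread(pattern))
```

The first entry of each pair corresponds to n_{3}; patterns keeping it small for every symbol are the ones that satisfy the second rule.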
In summary, by obeying all the three rules introduced above, we aim to obtain mapping patterns which ensure the best tradeoff between performance and complexity. This will be further detailed and proven in Sections 4.2 and 5.
4.2 Low complexity soft demapping at the receiver
As highlighted in Equation (6), the soft demapper at the receiver requires two major steps for the computation of the APP values of the GF(q) coded symbols: (i) Euclidean distances computation, and (ii) Marginalization across all possible combinations. The Euclidean distances computation is typically required for ML hard detector. In our case, since soft values are required, the MIMO ML detection and nonbinary soft demapping are combined together into one single function, referred to as soft ML demapping.
First step: Computation of the Euclidean distances
In the decoding of STBC, each received STBC codeword W_{ j } is processed individually in order to obtain its distance to all possible transmitted STBC codewords V_{ j }. In our nonbinary case (q > 2), one GF(q) coded symbol may span more than one STBC codeword. Thus, for the computation of the APP values of one GF(q) symbol, there is a need to store the Euclidean distances of all of the STBC codewords which carry the binary image of the given GF(q) symbol. Thanks to our second rule in the design of the mapper at the transmitter (which limits the number of STBC codewords carrying the binary image of one GF(q) symbol to the minimum possible), only the Euclidean distances of n_{3} ≤ m_{3} STBC codewords are needed. This clearly reduces the memory requirements at the receiver.
Second step: Marginalization across all possible combinations
The marginalization takes the form of a summation in the general case (i.e., log-MAP) reflected in Equation (6). Should the max-log approximation be used, it takes instead the form of a comparison. The marginalization (or summation) involves the Euclidean distances of n_{3} ≤ m_{3} STBC codewords and the binary subparts of the n_{1} − 1 (n_{1} ≤ m_{1}) GF(q) symbols multiplexing with the binary image of the desired GF(q) symbol in their mapping to the n_{2} ≤ m_{2} M-QAM symbols and n_{3} ≤ m_{3} STBC codewords.
For the sake of simplicity, let us consider first the case where n_{3} = 1, i.e., the desired GF(q) symbol is mapped onto a single STBC codeword. This is the case of SISO transmission, but it also applies for instance to MIMO transmission for the edge GF(q) symbols a and d in patterns P1 and P3 in Table 5. Let us focus first on the simple case of SISO transmission with 16QAM as in Table 4, with the straightforward mapping P1 = [a_{0}a_{1}a_{2}a_{3}]; [a_{4}a_{5}b_{0}b_{1}]; [b_{2}b_{3}b_{4}b_{5}] for m_{1} = 2 and m_{2} = 3. In order to compute the APP values for the first GF(64) symbol a, the Euclidean distances involving the first n_{2} = 2 ≤ 3 QAM symbols are required. For the second GF(64) symbol b, those involving the second and the third QAM symbols are required. For the computation of the APP values of a, a marginalization is required across all of the possible combinations of the subpart b_{0}b_{1} from GF(64) symbol b, since it is mixed with the subpart a_{4}a_{5} in the second QAM symbol (i.e., [a_{4}a_{5}b_{0}b_{1}]). The number of all possible combinations is clearly equal to 2^{2} = 4. The number of operations per received GF symbol is (q − 1) × 2^{2} × 3, a factor 2^{2}/q^{m_{1} − 1} = 4/64 = 1/16 smaller than the value O((q − 1) × q^{m_{1} − 1} × m_{2}) indicated earlier in Section 3.1. This is thanks to the specific mapping, where the two edge 16QAM symbols carry information from only one single GF(64) symbol. A similar marginalization is required in the second case of MIMO transmission for the edge GF(q) symbols a and d in patterns P1 and P3 in Table 5.
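As a quick numerical check of the factor quoted above, the counts for q = 64, m_{1} = 2, and m_{2} = 3 can be written out explicitly. The display only restates quantities already given in the text.

```latex
\begin{align*}
\text{exhaustive search:}\quad & (q-1)\, q^{m_1-1}\, m_2 = 63 \times 64 \times 3 = 12096,\\
\text{mapping P1:}\quad & (q-1)\, 2^{2}\, m_2 = 63 \times 4 \times 3 = 756,\\
\text{ratio:}\quad & \frac{756}{12096} = \frac{2^{2}}{q^{m_1-1}} = \frac{4}{64} = \frac{1}{16}.
\end{align*}
```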
Table 6. Example of number of combinations to be considered for APP marginalization
Number of combinations  P2  P3 

GF(64) symbol a  q^{m_{1} − 1} = 262144  2^{2} = 4 
GF(64) symbol b  q^{m_{1} − 1} = 262144  2^{6} × 2^{4} = 1024 
GF(64) symbol c  q^{m_{1} − 1} = 262144  2^{4} × 2^{6} = 1024 
GF(64) symbol d  q^{m_{1} − 1} = 262144  2^{2} = 4 
Table 6 reflects the huge complexity incurred with mapping pattern P2 (although, as said earlier, this pattern achieves the maximum transmit diversity order 3 for all the GF(64) symbols). This confirms the tremendous complexity advantage of the mapping patterns respecting the second rule devised previously. Yet, whilst only 4 combinations are required for the edge symbols a and d, 1024 combinations are required for the middle symbols b and c, which is still a relatively high number.
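The entries of Table 6 can be reproduced by counting, for each GF symbol, the bits of the other symbols that share its STBC codewords. The sketch below is an illustration: the per-codeword bit allocations are read off Table 5 under the assumption that each pair of I/Q columns forms one STBC codeword, and the function name `combinations` is hypothetical.

```python
# Bits contributed by each GF(64) symbol to each of the m3 = 3 STBC codewords
P2 = [{"a": 2, "b": 2, "c": 2, "d": 2}] * 3
P3 = [{"a": 6, "b": 2}, {"b": 4, "c": 4}, {"c": 2, "d": 6}]

def combinations(pattern, symbol):
    """Number of configurations of the other symbols' bits that share an
    STBC codeword with `symbol`, i.e., the marginalization size."""
    other_bits = sum(bits
                     for cw in pattern if symbol in cw
                     for s, bits in cw.items() if s != symbol)
    return 2 ** other_bits

for sym in "abcd":
    print(sym, combinations(P2, sym), combinations(P3, sym))
```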
To further reduce the number of combinations to a relatively low level, we propose the following algorithm, which exploits the correlation existing between GF(q) symbols produced by the code. The algorithm introduces a threshold parameter called N_{m}. The algorithm proceeds with the following steps:

Step 1: Set the value of N_{m}. For example, N_{m} is set to the value 8.

Step 2: For any GF(q) symbol entailing a number N_{e} of combinations required for marginalization lower than the threshold N_{m}, obtain the corresponding APP values using an exhaustive search over all N_{e} required combinations.
○ Example: This applies to the edge symbols a and d in P1 and P3 in Table 5, where the number of combinations required is N_{ e } = 4 < N_{ m } = 8.

Step 3: For GF(q) symbols that multiplex only with symbols falling under step 2, compute the APPs by limiting the combinations associated with the GF(q) symbol from step 2 only to the ones yielding the N_{ m } largest APPs for this symbol.
○ Example: Assume we are transmitting three consecutive GF(256)* symbols α, β, and γ mapped onto two consecutive 64QAM STBC codewords. Then symbols α and γ fall under Step 2, while the APPs for β have to be computed as above. Assuming N_{m} = 16, the marginalization over α and γ will be carried out by considering only N_{m} · N_{m} = 256 terms instead of the 256 · 256 = 65536 terms in the exhaustive search.
*NB: We switch to GF(256) in this example simply because no such case occurs with our default GF(64).

Step 4: For each remaining GF(q) symbol, not falling under steps 2 and 3, proceed with the following substeps:
○ Step 4.1: Limit the combinations associated with the multiplexing GF(q) symbol from step 2 to the ones yielding the N_{ m } maximum APP values for this multiplexing symbol.
○ Step 4.2: Complete the marginalization of the APPs with respect to the adjacent GF(q) symbol whose APPs are still unavailable, using an iterative procedure with a number r of iterations and depending on a parameter N_{q}. At the i-th iteration, the marginalization runs across the N_{q} combinations of the interleaved symbol with the highest N_{q} APP values, as computed in iteration i − 1 of the algorithm. At the initialization stage, the N_{q} combinations are chosen randomly.
○ Example: Step 4.1 applies to the middle GF(q) symbol b in P1 and P3 in Table 5, where marginalization is required across the interleaved edge GF(q) symbol a. The APP values of the edge symbol a are obtained from step 2. Thus, instead of searching over all the 2^{6} = 64 possible values of symbol a, we limit the search to the N_{m} = 8 values of symbol a yielding the highest APP values (i.e., the 8 highest likelihood values). Step 4.2 applies to the middle GF(q) symbol b in P1 and P3 in Table 5, where marginalization is required across the other interleaved middle GF(q) symbol c, whose APP values are not available from step 2. We start by considering N_{q} = 8 randomly selected values for symbol c (out of the 2^{4} = 16 values theoretically needed) to obtain the (marginalized) APP values of symbol b. Then, we compute the APP values for symbols b and c, with the marginalization limited to the N_{q} random values of each. We then refine the choice of the N_{q} combinations used for marginalization to the ones yielding the highest N_{q} APP values for symbols b and c. This is repeated for r iterations.
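The steps above can be sketched in a few lines. The code below is a simplified illustration under several assumptions that are not taken from the article: marginalization is done with the max-log rule, candidates are whole GF symbols rather than the bit subparts actually shared in a codeword, and the joint log-metric is supplied by a caller-provided function. The names `top_candidates`, `restricted_app`, and `iterative_pair` are hypothetical.

```python
import heapq
import random

def top_candidates(app, n):
    """Indices of the n largest values in app (Steps 3 and 4.1)."""
    return heapq.nlargest(n, range(len(app)), key=app.__getitem__)

def restricted_app(metric, q, cand_other):
    """Max-log APP of one symbol, marginalizing only over cand_other."""
    return [max(metric(k, o) for o in cand_other) for k in range(q)]

def iterative_pair(metric_bc, q, n_q, r, rng):
    """Sketch of Step 4.2 for two adjacent middle symbols b and c."""
    cand_b = rng.sample(range(q), n_q)  # random initialization
    cand_c = rng.sample(range(q), n_q)
    swapped = lambda c, b: metric_bc(b, c)
    for _ in range(r):
        app_b = restricted_app(metric_bc, q, cand_c)
        app_c = restricted_app(swapped, q, cand_b)
        # refine the candidate lists to the N_q most likely values
        cand_b = top_candidates(app_b, n_q)
        cand_c = top_candidates(app_c, n_q)
    return (restricted_app(metric_bc, q, cand_c),
            restricted_app(swapped, q, cand_b))

# Toy joint log-metric peaking at b = 5, c = 1 (purely synthetic)
metric = lambda b, c: -((b - 5) ** 2 + (c - 1) ** 2 + 0.1 * abs(b - c))
app_b, app_c = iterative_pair(metric, q=64, n_q=8, r=3, rng=random.Random(0))
print(max(range(64), key=app_b.__getitem__), max(range(64), key=app_c.__getitem__))
```

Each call to `restricted_app` evaluates q × N_{q} terms instead of q × q, which is the source of the complexity saving; the quality of the result depends on how quickly the candidate lists converge to the truly dominant values.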
The introduction of the variables N_{m} and N_{q}, taking values much lower than the total number q of APP values, is mainly inspired by the work of the authors of [8, 11] in their contribution to the FP7 DAVINCI project [4]. The authors of [8, 11] have conducted a thorough analysis of the behavior of the nonbinary LDPC decoder, specifically in terms of the APP distribution of the GF symbols at the input of the LDPC decoder. The main motivation in [8] is the proposal of a suboptimal LDPC decoding algorithm with reduced complexity and lower memory requirements for real implementation. A key result found in [8, 11] is that feeding the nonbinary LDPC decoder with only a limited number N_{m} of the highest APP values for each GF symbol (with N_{m} much smaller than the GF order q) achieves close to optimal performance whilst significantly reducing the nonbinary LDPC decoding complexity and APP memory requirements. This finding is borrowed in our algorithm above to reduce the complexity of the soft ML demapper.
Table 7. Example of reduction of the number of combinations for P1 and P3 from Table 6
Number of combinations  Without the algorithm  Proposed algorithm with r = 0, N_{m} = 8  Proposed algorithm with r = 3, N_{m} = 8, N_{q} = 8 

GF(64) symbol a  64 × 2^{2} = 256  64 × 2^{2} = 256  64 × 2^{2} = 256 
GF(64) symbol b  64 × 2^{6} × 2^{4} = 65536  64 × 6 + 64 × N_{m} × 2^{4} = 8576  64 × 3 × N_{m} × N_{q} + 2 × 64 × 6 = 13056 (symbols b and c together) 
GF(64) symbol c  64 × 2^{6} × 2^{4} = 65536  64 × 6 + 64 × N_{m} × 2^{4} = 8576  (counted with symbol b) 
GF(64) symbol d  64 × 2^{2} = 256  64 × 2^{2} = 256  64 × 2^{2} = 256 
Block of m_{1} GF symbols  64 × 2 × (2^{10} + 2^{2}) = 131584  2 × 64 × (2^{2} + 6 + N_{m} × 2^{4}) = 17664  2 × 64 × (2^{2} + 6) + 64 × 3 × N_{m} × N_{q} = 13568 
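The totals in the table can be re-derived with a short script. The expressions are copied from the table rows; reading the factor 3 in the last column as the number of iterations r is an assumption made here, and the variable names are illustrative.

```python
q, N_m, N_q, r = 64, 8, 8, 3

edge = q * 2 ** 2                              # symbol a or d
exhaustive_mid = q * 2 ** 6 * 2 ** 4           # symbol b or c, full search
no_iter_mid = q * 6 + q * N_m * 2 ** 4         # symbol b or c, r = 0
iter_mid_pair = q * r * N_m * N_q + 2 * q * 6  # symbols b and c together, r = 3

block_exhaustive = 2 * (edge + exhaustive_mid)
block_no_iter = 2 * (edge + no_iter_mid)
block_iter = 2 * edge + iter_mid_pair

print(block_exhaustive, block_no_iter, block_iter)
print(round(block_exhaustive / block_no_iter, 1), round(block_exhaustive / block_iter, 1))
```

Per block of m_{1} = 4 GF symbols, the proposed algorithm thus needs between seven and ten times fewer combinations than the exhaustive search for these parameter values.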
5. Numerical results
Table 8. Simulations set up
Module  Set up 

FEC encoder  DAVINCI NB-LDPC codes; GF order = 64; codeword length = 96 symbols = 576 bits 
FEC decoder  Extended Min-Sum algorithm; number of soft values per symbol fed to the decoder q_{m} = 16 (highest values); maximum number of decoding iterations = 30 
Constellation  QPSK, 16QAM, 64QAM 
MIMO encoder  2 × 2 antenna configuration; uncoded spatial multiplexing; STBC codeword length Q = 2 
Channel model  AWGN and Rayleigh channels 
Soft demapper  Soft ML demapping; proposed low complexity algorithm with N_{m} = 8, N_{q} = 4 to 16, number of demapping iterations r = 0 to 3 
As illustrated in Figure 4, for QPSK and 64QAM, where m_{1} = 1 (cf. Table 2), there is no significant difference between the arbitrary and the proposed mapping patterns, since here one GF(64) symbol inherently maps onto three QPSK symbols or one 64QAM symbol. However, for 16QAM, where m_{1} = 2 and m_{2} = 3 (cf. Table 2), two GF(64) symbols map together onto three 16QAM symbols, and the results show a clear SNR gain of 0.5 dB for the mapping respecting the first design rule compared to a pattern not respecting it, which validates the merit of this rule. Note that at this stage there is no tradeoff between performance and complexity; this arises later, with the second and third design rules.

The first curve (black circular markers) gives the number of operations when an exhaustive search with pattern P2 is performed.

The second curve (red circular markers) gives the number of operations when an exhaustive search with pattern P1 or P3 is performed.

The third curve (blue downward triangular markers) gives the number of operations using the proposed algorithm without the iterative step 4 (i.e., substep 4.2 is simply replaced by an exhaustive search).

The fourth curve (green diamond markers) gives the number of operations using the iterative step 4 of the proposed algorithm with r = 3 iterations and N_{ q } = 10.
From Figure 5, we can first appreciate the huge reduction in complexity (cf. the gap between the first curve, using P2, and the other curves, using P1 and P3). This validates the merit of our second rule from the complexity perspective, since patterns P1 and P3 respect this rule whereas pattern P2 does not. Figure 5 also shows the significant reduction in complexity (cf. the gap between the second curve and the third and fourth curves) brought by the proposed algorithm, with and without iterations, compared to the exhaustive search. This reduction shrinks as the threshold N_{ m } increases. For a typical value of N_{ m } = 8, the proposed algorithm reduces the complexity by nearly one decade (i.e., a factor of 10).
From Figure 6, we first compare the performance of patterns P1 and P3 when the exhaustive search is used for both, in order to appreciate the performance cost of the second rule and the merit of the third rule. The gap between P1 and P3 is almost 0.25 dB, where P1 has a constant channel selectivity order equal to 2 and P3 has an average channel selectivity order equal to 2.5 (2 at the edge symbols and 3 at the middle symbols). As mentioned previously, both patterns respect the second rule, but only P3 respects the third rule. This comparison therefore shows the merit of the third rule: an SNR gain of approximately 0.25 dB at the same level of complexity. The same performance gap is expected between patterns P2 and P3, although, as said before, simulations with pattern P2 are not feasible since it breaches the second rule. This expectation is motivated by the fact that the gap in channel selectivity order between P2 and P3 is equal to 0.5, the same as the gap between P3 and P1 (the average channel selectivity order is equal to 3, 2.5, and 2 for patterns P2, P3, and P1, respectively). Hence, relative to P2, the performance penalty of the second design rule is expected to be around 0.25 dB for pattern P3, which respects the second and third design rules, and around 0.5 dB for pattern P1, which respects the second rule but not the third.
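The extrapolation in the paragraph above can be retraced numerically. The sketch assumes, as the text does implicitly, that the SNR gap scales linearly with the average channel selectivity order; the variable names are illustrative, and the four-symbol block (a, b, c, d) is taken from the combination table for P1 and P3.

```python
from statistics import mean

# Channel selectivity order seen by each GF(64) symbol in a block (a, b, c, d).
orders_p1 = [2, 2, 2, 2]   # constant order 2
orders_p3 = [2, 3, 3, 2]   # 2 at the edge symbols (a, d), 3 at the middle symbols (b, c)
avg_p2 = 3.0               # only the average is stated in the text

avg_p1 = mean(orders_p1)
avg_p3 = mean(orders_p3)

measured_gap_p3_p1 = 0.25  # dB, read from Figure 6 according to the text
# Assumption: linear relation between SNR gap and average selectivity order.
db_per_order = measured_gap_p3_p1 / (avg_p3 - avg_p1)

gap_p2_p3 = db_per_order * (avg_p2 - avg_p3)   # expected penalty of P3 relative to P2
gap_p2_p1 = db_per_order * (avg_p2 - avg_p1)   # expected penalty of P1 relative to P2
print(avg_p1, avg_p3, gap_p2_p3, gap_p2_p1)
```

The linear model is only a rule of thumb used to motivate the expected penalties; the article does not simulate P2.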
Now let us compare the performance of patterns P1 and P3 when using the proposed soft demapping algorithm. From Figure 6, for both patterns, we do not notice any appreciable degradation when using the proposed algorithm with threshold N_{ m } = 8 and without the iterative process, compared to the exhaustive search. This is an important result, as it shows the potential of the proposed algorithm to reduce the complexity tenfold without practical degradation in FER performance. Further reduction of the complexity, for example by means of the iterative process, does degrade the FER performance. The degradation due to the iterative process appears tolerable in the waterfall region at a target FER of 10^{-2} (up to 0.5 dB), whilst the degradation in the error floor region appears significant. This reflects the tradeoff one can obtain between FER performance and further complexity reduction with the iterative process.
Patterns used for mapping GF(64) symbols to STBC codewords (64QAM), with m_{1} = 2, m_{2} = 2, m_{3} = 1
Pattern  Antenna  I0  Q0

P1  A#1  a_{0} a_{1} a_{2}  a_{3} a_{4} a_{5}
P1  A#2  b_{0} b_{1} b_{2}  b_{3} b_{4} b_{5}
P2  A#1  a_{0} a_{1} a_{2}  b_{0} b_{1} b_{2}
P2  A#2  a_{3} a_{4} a_{5}  b_{3} b_{4} b_{5}
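The two patterns in the table above can be encoded and checked mechanically. The sketch below is illustrative only: the dictionary layout and the function names are assumptions, and counting the distinct antennas reached by each GF symbol is used as a crude proxy for the diversity targeted by the third rule, not as the article's definition of channel selectivity order.

```python
from collections import defaultdict

# (antenna, component) -> bit labels, transcribed from the pattern table.
P1 = {("A#1", "I0"): ["a0", "a1", "a2"], ("A#1", "Q0"): ["a3", "a4", "a5"],
      ("A#2", "I0"): ["b0", "b1", "b2"], ("A#2", "Q0"): ["b3", "b4", "b5"]}
P2 = {("A#1", "I0"): ["a0", "a1", "a2"], ("A#1", "Q0"): ["b0", "b1", "b2"],
      ("A#2", "I0"): ["a3", "a4", "a5"], ("A#2", "Q0"): ["b3", "b4", "b5"]}

def respects_first_rule(pattern):
    """True if every I or Q component carries bits of exactly one GF symbol."""
    return all(len({bit[0] for bit in bits}) == 1 for bits in pattern.values())

def antennas_per_symbol(pattern):
    """Number of distinct antennas carrying bits of each GF symbol."""
    seen = defaultdict(set)
    for (antenna, _component), bits in pattern.items():
        for bit in bits:
            seen[bit[0]].add(antenna)   # bit[0] is the GF symbol letter
    return {symbol: len(antennas) for symbol, antennas in sorted(seen.items())}

for name, pattern in (("P1", P1), ("P2", P2)):
    print(name, respects_first_rule(pattern), antennas_per_symbol(pattern))
```

Both patterns satisfy the first rule; they differ in how many antennas each GF(64) symbol is spread over.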
Example of reduction of the number of combinations for 64QAM
Number of combinations  Without the algorithm  Proposed algorithm with r = 3, N_{ q } = 20  Proposed algorithm with r = 3, N_{ q } = 24

GF(64) symbol a  64 × 2^{6} = 4096  64 × 3 × N_{ q } + 2 × 64 × 6 = 4608 (symbols a and b jointly)  64 × 3 × N_{ q } + 2 × 64 × 6 = 5376 (symbols a and b jointly)
GF(64) symbol b  64 × 2^{6} = 4096  (counted with symbol a)  (counted with symbol a)
Block of m_{1} GF symbols  2 × 64 × (2^{6}) = 8192  64 × 3 × N_{ q } + 2 × 64 × 6 = 4608  64 × 3 × N_{ q } + 2 × 64 × 6 = 5376
From Figure 7, we can first appreciate a gain of nearly 0.8 dB for pattern P2 compared to P1. This further confirms the potential of the third rule to achieve much higher diversity. Second, with pattern P2, we observe a slight performance degradation of nearly 0.2 dB when using the proposed low complexity iterative demapping algorithm with N_{ q } = 24 (35% complexity reduction). The degradation rises to 0.5 dB for N_{ q } = 20 (44% complexity reduction). There is thus a clear tradeoff between the tolerable FER performance degradation and the target complexity reduction, and the proposed mapping strategy and low complexity demapping algorithm provide the tools to achieve the desired tradeoff.
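The complexity reductions quoted above follow from the 64QAM combination counts. The short sketch below recomputes them from the table formulas; the function names are hypothetical, and the factor 3 is kept as a named parameter because its interpretation as the iteration count r is an assumption.

```python
Q = 64  # GF order

def combos_exhaustive_64qam(m1=2):
    """Exhaustive search: each of the m1 GF(64) symbols needs 64 x 2^6 combinations."""
    return m1 * Q * 2**6

def combos_iterative_64qam(n_q, factor=3):
    """Proposed iterative algorithm, block total (symbols a and b jointly)."""
    return Q * factor * n_q + 2 * Q * 6

baseline = combos_exhaustive_64qam()
reductions = {}
for n_q in (20, 24):
    count = combos_iterative_64qam(n_q)
    reductions[n_q] = 1 - count / baseline
    print(f"N_q = {n_q}: {count} combinations, {100 * reductions[n_q]:.1f}% reduction")
```

The computed reductions round to the 44% and 35% figures given in the text for N_{ q } = 20 and N_{ q } = 24.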
6. Conclusions
In this article, we have addressed the particular complexity challenge of the soft ML demapping faced with nonbinary LDPC codes when one GF(q) symbol spreads across multiple QAM symbols and STBC codewords. A solution is proposed combining a mapping strategy based on three design rules at the transmitter, and a low complexity soft ML demapping algorithm at the receiver.
At the transmitter side, the mapping strategy introduced three design rules to achieve the best tradeoff between performance and complexity. In the first rule, the I or Q component of an MQAM symbol should carry (in part or in full) the binary image of only one GF(q) symbol. This rule was shown to bring an SNR performance gain of approximately 0.5 dB compared to mapping patterns not respecting this rule. In the second rule, the I/Q components issued from one GF(q) symbol are carried into the minimum possible number of STBC codewords. This second rule clearly restricts the freedom to let the GF(q) symbol enjoy higher channel selectivity, but fortunately has the advantage of reducing drastically the complexity of the soft ML demapper. In the third rule, the I/Q components issued from one GF(q) symbol are mapped onto the transmission units which ideally can experience independent channel fading within the STBC codeword carrying this GF(q) symbol. This third rule aims at exploiting the last degree of freedom left by the binding second rule to achieve high channel selectivity within the GF(q) symbol. With mapping patterns respecting the second rule, it was shown that a tenfold complexity reduction can be achieved compared to patterns not respecting this second rule. The tradeoff in performance was shown to be small, 0.25 and 0.5 dB performance degradation for the patterns respecting the second rule with and without the third rule, respectively.
At the receiver side, an algorithm was proposed to reduce the complexity of the soft ML demapper. The algorithm exploits the correlation existing between GF(q) symbols, as well as any knowledge available on the APP values of the GF(q) symbols in the vector of m_{1} GF(q) symbols, which map together onto the vector of m_{2} MQAM symbols and further onto the vector of m_{3} STBC codewords. The algorithm also considers only a limited number of potential combinations for each GF(q) symbol, namely those associated with the same limited number of highest APP values for this symbol. This latter consideration was inspired by the original work of the authors of [8, 11] on reducing the complexity of the nonbinary LDPC decoder. Our proposed algorithm was shown to further reduce the complexity of the soft ML demapper by up to 85%.
The proposed solution mitigates the receiver complexity challenge faced with nonbinary LDPC codes when one GF(q) symbol spreads across multiple QAM constellation symbols and STBC codewords, at the expense of a slight performance degradation. This removes any restriction on the GF order, QAM constellation order, and MIMO scheme, whilst preserving the merits of nonbinary LDPC codes at very reasonable receiver complexity. Future work will focus on assessing which MIMO schemes are best suited for combination with GF(64) nonbinary LDPC codes, with the ultimate goal of proposing nonbinary LDPC codes for beyond 4G wireless communication systems.
Declarations
Acknowledgements
This study was supported by INFSCOICT216203 DAVINCI "Design And Versatile Implementation of Nonbinary wireless Communications based on Innovative LDPC Codes" http://www.ictdavincicodes.eu funded by the European Commission under the Seventh Framework Programme (FP7) [4].
References
1. Davey M, MacKay DJC: Low density parity check codes over GF(q). IEEE Commun Lett 1998, 2(6):165-167. 10.1109/4234.681360
2. Huang J, Zhu J: Linear time encoding of cycle GF(2^{p}) codes through graph analysis. IEEE Commun Lett 2006, 10(5):369-371. 10.1109/LCOMM.2006.1633326
3. Declercq D, Fossorier M: Decoding algorithms for nonbinary LDPC codes over GF(q). IEEE Trans Commun 2007, 55(4):633-643.
4. INFSCOICT216203 FP7 DAVINCI project [http://www.ictdavincicodes.eu]
5. Gutierrez I: Final proposal for IMT-Advanced systems. FP7 DAVINCI deliverable D2.1.4 2009.
6. Pfletschinger S, Declercq D: Getting closer to MIMO capacity with nonbinary codes and spatial multiplexing. In Proceedings of IEEE Globecom. Miami (USA); 2010:1-5.
7. Guo F, Hanzo L: Low complexity nonbinary LDPC and modulation schemes communicating over MIMO channels. In Proceedings of IEEE VTC. Los Angeles (USA); 2004:1294-1298.
8. Boutillon E, Conde-Canencia L: Bubble check: a simplified algorithm for elementary check node processing in extended min-sum nonbinary LDPC decoders. IET Electron Lett 2010, 46(9):633-634. 10.1049/el.2010.0566
9. Poulliat C, Fossorier M, Declercq D: Design of regular (2, dc)-LDPC codes over GF(q) using their binary images. IEEE Trans Commun 2008, 56(10):1626-1635.
10. Picchi O, Mourad A, Gutierrez I, Luise M: On the performance of nonbinary LDPC with MIMO in practical systems. In IEEE International Symposium on Wireless Communication Systems (ISWCS). Aachen (Germany); 2011:452-456.
11. Singh A, Al-Ghouwayel A, Masera G, Boutillon E: A new performance evaluation metric for suboptimal iterative decoders. IEEE Commun Lett 2009, 13(7):513-515.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.