Low-complexity decoding of LDPC codes using reduced-set WBF-based algorithms

We propose a method to substantially reduce the computational complexity of iterative decoders of low-density parity-check (LDPC) codes that are based on the weighted bit-flipping (WBF) algorithm. In this method, the WBF-based decoders are modified so that the flipping function is calculated only over a reduced set of variable nodes. An explicit expression for the achieved complexity gain is provided, and it is shown that for a code of block length N, the decoding complexity is reduced from O(N²) to O(N). Moreover, we derive an upper bound for the difference in the frame error rates of the reduced-set decoders and the original WBF-based decoders, and it is shown that the error performances of the two decoders are essentially the same.

In this paper, we propose a method to significantly reduce the computational complexity of WBF-based decoders with a negligible loss in the error performance. Our proposed method, named reduced-set (RS) WBF-based decoding, greatly reduces the complexity of obtaining the flipping function and can be applied to all WBF-based decoders. While simulation results do not show any loss in the error performance, we also present an upper bound for the difference between the frame error rate (FER) of WBF-based decoders and that of their RS counterparts.
The rest of this paper is organized as follows. In the next section, some preliminaries about LDPC codes and WBF-based decoding are reviewed. In Section 3, we present the proposed algorithm to reduce the decoding complexity, followed by complexity and error performance analyses. Simulation results are presented in Section 4, and Section 5 concludes the paper.

Methods/experimental
The content of this paper is mainly theoretical derivation and analysis; specific experimental verification will be carried out in future research.

WBF-based algorithms
In this section, we briefly review some preliminaries about LDPC codes and WBF-based decoders.

Preliminaries
A (d_v, d_c)-regular LDPC code has a sparse parity-check matrix whose column and row weights are exactly d_v and d_c, respectively. An LDPC code is irregular if its rows and/or columns have different weights. An LDPC code can be represented by a bipartite (Tanner) graph which consists of two subsets of nodes, namely, variable nodes (or bit nodes) and check nodes. Let c = (c_1, c_2, . . . , c_N) be a codeword of a binary LDPC code C of block length N. After BPSK modulation, the transmitted sequence is x = (x_1, x_2, . . . , x_N), with x_i = 2c_i − 1, i = 1, 2, . . . , N. Assuming an additive white Gaussian noise (AWGN) channel, y = (y_1, y_2, . . . , y_N) is the real-valued sequence at the output of the receiver matched filter, where y_i = x_i + n_i, with the n_i's being independent zero-mean Gaussian random variables with variance σ². Let z = (z_1, z_2, . . . , z_N) be the binary hard-decision sequence obtained from y (i.e., z_i = 1 if y_i > 0 and z_i = 0 if y_i ≤ 0). Let N(m) denote the set of bits that participate in the mth check, and let M(n) denote the set of checks in which the nth bit participates. The syndrome vector s = (s_1, s_2, . . . , s_M) is then given by s = zH^T, i.e., the syndrome component s_m is computed by the check-sum

s_m = Σ_{n∈N(m)} z_n (mod 2), m = 1, 2, . . . , M.   (1)

Vector s is zero if and only if all parity-check equations are satisfied and z is a codeword in C.
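As a concrete illustration of the hard decision and the check-sum in (1), the following sketch computes z and s for a small parity-check matrix. The 3×6 matrix H and the received vector y are made-up toy values, not a code from this paper:

```python
# Toy illustration of hard decision and syndrome computation, s = z H^T (mod 2).
# The 3x6 parity-check matrix H below is a made-up example, not a code from the paper.

def hard_decision(y):
    """z_i = 1 if y_i > 0, else 0."""
    return [1 if yi > 0 else 0 for yi in y]

def syndrome(z, H):
    """s_m = sum of z_n over the bits n checked by row m, modulo 2."""
    return [sum(z[n] for n in range(len(z)) if H[m][n]) % 2 for m in range(len(H))]

H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
y = [0.8, -1.1, 0.3, -0.9, 1.2, -0.7]  # noisy matched-filter outputs
z = hard_decision(y)
s = syndrome(z, H)  # s is all-zero iff every parity-check equation is satisfied
```

Here the first check-sum is unsatisfied, signalling that z is not a codeword of this toy code.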

WBF-based decoding algorithms
The bit-flipping (BF) algorithm is an iterative hard-decision decoding algorithm that computes all the parity-check equations and then, in each iteration, flips the bits that are contained in more than a preset number of unsatisfied check-sums. The weighted bit-flipping (WBF) algorithm improves the performance of BF decoding by including reliability measures of the received symbols in the decoding decisions [2]. The reliability of each parity-check equation is computed as

w_m = min_{n∈N(m)} |y_n|, m = 1, 2, . . . , M,   (2)

and the flipping function is defined as

E_n = Σ_{m∈M(n)} (2s_m − 1) w_m, n = 1, 2, . . . , N.   (3)

The WBF decoder first computes the reliability of all the parity-check equations from (2). Next, the decoding algorithm is carried out as follows.
Step 1) For m = 1, 2, . . . , M, compute the syndrome components from (1). Terminate the algorithm if all the parity-check equations are satisfied (s = 0) or a preset maximum number of iterations is reached. Otherwise, continue.
Step 2) For n = 1, 2, . . . , N, compute the flipping function E_n from (3).
Step 3) Flip the bit z_n for n = argmax_{1≤n≤N} E_n, and go to Step 1.
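The three steps above can be sketched compactly in code. The sketch below assumes the standard reliability w_m = min_{n∈N(m)} |y_n| and flipping function E_n = Σ_{m∈M(n)} (2s_m − 1) w_m; the matrix H and received vector y are made-up toy values:

```python
# Sketch of the single-bit WBF decoder (Steps 1-3 above), assuming the standard
# reliability w_m = min_{n in N(m)} |y_n| and flipping function
# E_n = sum_{m in M(n)} (2 s_m - 1) w_m. H is a small made-up example.

def wbf_decode(y, H, max_iter=50):
    M, N = len(H), len(H[0])
    nbh = [[n for n in range(N) if H[m][n]] for m in range(M)]  # N(m)
    chk = [[m for m in range(M) if H[m][n]] for n in range(N)]  # M(n)
    w = [min(abs(y[n]) for n in nbh[m]) for m in range(M)]      # check reliabilities
    z = [1 if yn > 0 else 0 for yn in y]                        # hard decisions
    for _ in range(max_iter):
        s = [sum(z[n] for n in nbh[m]) % 2 for m in range(M)]   # Step 1: syndromes
        if not any(s):
            break                                               # all checks satisfied
        E = [sum((2 * s[m] - 1) * w[m] for m in chk[n]) for n in range(N)]  # Step 2
        z[max(range(N), key=lambda n: E[n])] ^= 1               # Step 3: flip one bit
    return z

H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]
# All-zero codeword sent as (-1, ..., -1); the first symbol is received with low
# magnitude on the wrong side of the threshold, and the decoder flips it back.
y = [0.2, -1.0, -1.0, -1.0, -1.0, -1.0]
decoded = wbf_decode(y, H)
```

Note how the unreliable symbol y_1 = 0.2 drags down the reliabilities of both checks it participates in, which is exactly what makes its flipping function the largest.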
In what follows, we review several WBF-based methods that improve the standard algorithm. In [12], the modified WBF (MWBF) is proposed, which considers not only the reliability of the syndrome sequence in computing the flipping function, but also the reliability information of the received symbol itself. The flipping function in the MWBF is modified as

E_n = Σ_{m∈M(n)} (2s_m − 1) w_m − a|y_n|,   (4)

where the weighting factor a can be determined via Monte-Carlo simulation at different SNRs. The reliability-ratio based WBF (RRWBF) proposed in [13] introduces a new quantity called the reliability ratio R_{m,n} and modifies the flipping function accordingly. Lee et al. [14] proposed a new version of the RRWBF algorithm which simplifies the calculation. The flipping function in the improved RRWBF (IRRWBF) is given by

E_n = (1/|y_n|) Σ_{m∈M(n)} (2s_m − 1) T_m,

where T_m = Σ_{n∈N(m)} |y_n|. In [15], Jiang et al. proposed the improved MWBF (IMWBF) algorithm, in which the reliability of the check-sums involving a given bit excludes that bit; the reliability computation in (2) is revised as w_{n,m} = min_{i∈N(m)\n} |y_i|, n ∈ N(m), and the flipping function as

E_n = Σ_{m∈M(n)} (2s_m − 1) w_{n,m} − a|y_n|.

For a special class of high-rate quasi-cyclic LDPC codes, the Liu-Pados WBF (LP-WBF) [16] and its improved version, the Shan-Zhao-Jiang LP-WBF (SZJLP-WBF) [17], improve the computation of the syndrome reliability and perform even better than the IMWBF algorithm in the high SNR regime.

The standard WBF algorithm selects and flips one bit in each iteration. However, to increase the speed of decoding, multiple bits can be selected and flipped in each iteration. In [20], a threshold adaptation scheme is applied to a multi-bit flipping decoding algorithm, where in each iteration, variable nodes with flipping function greater than a pre-defined threshold are selected and flipped. If no flipping occurs, the threshold is reduced and the algorithm continues. A parallel version of the IMWBF (PIMWBF) algorithm is proposed in [21] that converges significantly faster and often performs better than IMWBF.
The threshold for PIMWBF must be optimized by simulation in each iteration. The multi-bit algorithm proposed in [22] flips multiple bits in each iteration based on a certain threshold that must be optimized by simulation, but the maximum number of bits that may be flipped in an iteration is restricted. The adaptive-weighted multi-bit-flipping (AWMBF) algorithm proposed in [23] adjusts the threshold in each iteration as a closed-form function of w_H(s) and E_max, where w_H(s) denotes the Hamming weight of the syndrome vector s and E_max = max_{1≤n≤N} E_n. The flipping function used in AWMBF is the same as the flipping function proposed for MWBF (i.e., Eq. (4)). In AWMBF, the threshold in each iteration has a closed-form expression and there is no need for time-consuming simulations to determine the optimum thresholds. In this paper, we will use the AWMBF algorithm in simulations for multi-bit flipping decoders.
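The common core of these multi-bit schemes, flipping every bit whose flipping function exceeds a threshold, can be sketched as follows. Since the AWMBF closed-form threshold is not reproduced here, the threshold rule is left as a parameter, and the fixed-fraction threshold in the example is our own illustrative choice:

```python
# Generic multi-bit flipping step: flip every bit whose flipping function
# exceeds a threshold. The threshold rule is a parameter, since the AWMBF
# closed-form expression is not reproduced here; the example threshold below
# (a fixed fraction of max E_n) is an illustrative choice, not from the paper.

def multi_bit_flip(z, E, threshold):
    """Flip all bits with E[n] > threshold; return the flipped indices."""
    flipped = [n for n in range(len(z)) if E[n] > threshold]
    for n in flipped:
        z[n] ^= 1
    return flipped

E = [0.4, -0.8, -0.8, 0.2, -1.0, 0.2]   # example flipping-function values
z = [1, 0, 0, 1, 0, 0]
flipped = multi_bit_flip(z, E, threshold=0.5 * max(E))
```

With a well-chosen threshold, several erroneous bits can be corrected per iteration, which is the source of the faster convergence noted above.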
Recently, a two-bit WBF (TBWBF) decoder was proposed in [24] for the binary symmetric channel (BSC) that produces reliability bits for both the bit-decision results at variable nodes and the syndrome values at check nodes and exchanges the reliability bits between variable and check nodes as the decoding proceeds.

Reduced-set low-complexity decoders
In this section, we propose a method to significantly reduce the computational complexity of all WBF-based algorithms. The complexity of the decoder is also analyzed and an upper bound for its FER is presented.

Proposed algorithm
All WBF-based decoders use a flipping function E_n to select the bits to be flipped. These decoders compute the flipping function for all variable nodes in each iteration to detect the erroneous bits in the received sequence. As the flipping function calculation requires real-number arithmetic, the computational complexity of WBF-based algorithms is essentially due to this part. The main idea behind our proposed algorithm is to reduce the number of flipping function calculations in each iteration by considering only those variable nodes which are likely to be in error. Denote this set of variable nodes in the lth iteration by A_l. In the first iteration, A_1 contains only the variable nodes that are connected to the unsatisfied check nodes. In the subsequent iterations, A_l contains the variable nodes that participate in the parity-check equations involving the bits flipped in the previous iteration. A_l can thus be written as

A_l = {n : n ∈ N(m), m ∈ B_l},

where B_1 = {m : s_m ≠ 0} and B_l = {m : m ∈ M(n_{l−1})} for l ≥ 2, with n_{l−1} the index of the bit flipped in the (l − 1)th iteration. Note that a variable node might appear in several iterations of the decoding process, and variable nodes selected in the (l − 1)th iteration are not excluded in the lth iteration. A reduced-set (RS) WBF-based algorithm is summarized below.
Step 1) For m = 1, 2, . . . , M, compute the syndrome components from (1). Terminate the algorithm if all the parity-check equations are satisfied (s = 0) or a preset maximum number of iterations is reached. Otherwise, continue.
Step 2) For n ∈ A_l, compute the flipping function E_n.
Step 3) Flip the bit z_{n_l} for n_l = argmax_{n∈A_l} E_n. Increase the iteration number l by one and go to Step 1.
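The bookkeeping for the sets A_l can be sketched as follows; the same routine serves the single-bit case (one flipped index) and the multi-bit case (several indices). The toy matrix H and all names are our own:

```python
# Sketch of the reduced-set bookkeeping. A_1 gathers the neighbors of the
# unsatisfied checks (B_1 = {m : s_m != 0}); for l >= 2, A_l gathers the
# neighbors of every check touched by a bit flipped in iteration l-1.
# H is a made-up toy matrix; names are our own.

H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]
M, N = len(H), len(H[0])
check_neighbors = [[n for n in range(N) if H[m][n]] for m in range(M)]  # N(m)
bit_checks = [[m for m in range(M) if H[m][n]] for n in range(N)]       # M(n)

def initial_candidate_set(s):
    """A_1: variable nodes connected to at least one unsatisfied check."""
    return {n for m, sm in enumerate(s) if sm for n in check_neighbors[m]}

def next_candidate_set(flipped_bits):
    """A_l, l >= 2: neighbors of the checks involving the just-flipped bits."""
    return {n for b in flipped_bits for m in bit_checks[b]
            for n in check_neighbors[m]}

A1 = initial_candidate_set([1, 0, 0])   # only check 0 unsatisfied
A2 = next_candidate_set([0])            # bit 0 sits in checks 0 and 2
```

Only the bits in the current set need their flipping functions recomputed, which is where the complexity saving of the RS decoders comes from.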
The standard WBF algorithm flips one bit in each iteration. In the following remark, the reduced-set single-bit WBF-based algorithm is extended to a reduced-set multi-bit WBF-based algorithm.

Remark 1
In multi-bit WBF-based algorithms, the decoder selects and flips multiple bits in each iteration. In the first iteration, the set of variable nodes which are likely to be in error, i.e., the set of variable nodes that are connected to the unsatisfied check nodes, is the same for single-bit and multi-bit WBF-based decoders. Let γ_l denote the number of flipped bits in the lth iteration and n_{i,l}, i = 1, . . . , γ_l, denote the indices of the bits flipped in the lth iteration. In multi-bit WBF-based algorithms, for l ≥ 2 the set B_l is modified as

B_l = ∪_{i=1}^{γ_{l−1}} M(n_{i,l−1}).

Due to the sparsity of the LDPC parity-check matrix H, the number of bits that participate in each check is small compared to N. Hence, each erroneous bit causes a small number of unsatisfied check-sums, and for each unsatisfied check-sum, there is a small number of bits that the decoder must decide whether or not to flip. Therefore, even for moderate values of SNR, the set of candidate variable nodes in each iteration constitutes a very small subset of all variable nodes, which in turn leads to a substantial reduction in the computational complexity of Step 2 of the WBF-based decoding algorithms. In the following subsections, we derive explicit expressions for this reduction in complexity and show that the incurred loss in performance is indeed negligible.

Computational complexity analysis
In this subsection, we obtain the average number of flipping function calculations as a complexity measure of the RS decoding algorithms and show how the computational complexity of any of the WBF-based decoders is substantially reduced using the proposed algorithm. We now present the following theorem.
Theorem 1 For any of the single-bit and multi-bit RS decoders, the average number of flipping function calculations in the first iteration (i.e., the average cardinality of A_1) is

L_1 = N [1 − p_0 β^{d_v} − (1 − p_0)(1 − β)^{d_v}],

where p_0 is the probability that a bit is received in error and β = [1 − (1 − 2p_0)^{d_c−1}]/2. For the subsequent iterations, i.e., l ≥ 2, the average number of flipping function calculations for the single-bit RS decoders is given by

L_l = d_v(d_c − 1) + 1,

and for the multi-bit RS decoders it is upper bounded as

L_l ≤ γ_{l−1} [d_v(d_c − 1) + 1],

where γ_l is the number of flipped bits in the lth iteration.
Proof We first obtain the cardinality of A_1, the selected set in the first iteration. As noted in Remark 1, the set of variable nodes that are connected to the unsatisfied check nodes in the first iteration is the same for both single-bit and multi-bit RS decoders. Hence, the cardinality of set A_1, and thus its average L_1, is the same for both single-bit and multi-bit RS decoders.
We define the indicator function of the ith variable node as I_i = 1 if i ∈ A_1 and I_i = 0 otherwise, for 1 ≤ i ≤ N. The cardinality of A_1, denoted by l_1, is a random variable and can be written as l_1 = Σ_{i=1}^{N} I_i. The average number of variable nodes in set A_1 is therefore

L_1 = E[l_1] = Σ_{i=1}^{N} Pr{i ∈ A_1} = Σ_{i=1}^{N} (1 − Pr{i ∉ A_1}).

The event i ∉ A_1 occurs when all checks involving the ith bit are satisfied. Let μ_m be the event that the mth check involving the ith bit is satisfied, and let E denote the set of all erroneous bits in the received sequence. The ith bit participates in d_v checks. We assume that the code is 4-cycle free, i.e., no two code bits are checked by the same two parity constraints. This structural property is imposed on almost all LDPC code constructions and is very important for achieving good error performance with iterative decoding [5,25,26]. If there are no cycles of length 4 in the Tanner graph, no two checks share more than one variable node; in other words, if more than one variable node appeared in two different check-sums, there would be at least one cycle of length 4 in the Tanner graph. On the other hand, in the first iteration, the values of the variable nodes are received directly from the channel output, so all variable nodes are independent (as the noise was assumed to be white). Therefore, assuming a 4-cycle-free graph, the checks involving the ith bit do not share any other bits, and conditioned on the ith bit, all these checks are independent in the first iteration. If the ith bit is in error, the mth check is satisfied if and only if the number of erroneous bits participating in it (except the ith bit) is odd; hence,

Pr{μ_m | i ∈ E} = β = [1 − (1 − 2p_0)^{d_c−1}]/2.

Similarly, Pr{μ_m | i ∉ E} = 1 − β. Therefore,

Pr{i ∉ A_1} = p_0 β^{d_v} + (1 − p_0)(1 − β)^{d_v},

and we obtain

L_1 = N [1 − p_0 β^{d_v} − (1 − p_0)(1 − β)^{d_v}].   (20)

For l ≥ 2, A_l contains all the variable nodes that participate in the parity-check equations involving the bits flipped in the previous iteration.
The number of variable nodes that participate in the parity-check equations involving a given variable node is d_v(d_c − 1) (see Fig. 1). Single-bit RS decoders flip only one bit in each iteration; therefore, in this case, the cardinality of set A_l for l ≥ 2 is d_v(d_c − 1) + 1. In multi-bit RS decoders, γ_l bits are flipped in the lth iteration, and for each bit flipped in the previous iteration, the RS decoder must update d_v(d_c − 1) + 1 flipping functions. In general, the parity-check equations involving the bits flipped in the previous iteration may have some bits in common, so the cardinality of the set A_l, l ≥ 2, in multi-bit RS decoders is upper bounded by γ_{l−1}[d_v(d_c − 1) + 1]. End of Proof.

Plotted in Fig. 2 is L_1 versus SNR for the (3,6)- and (4,32)-regular codes. It is seen that the result of (20) matches the average number of variable nodes in A_1 obtained from Monte-Carlo simulation. If k is the number of iterations required in the decoding process, then by using (22), the total number of flipping function calculations L in multi-bit RS decoders can be upper bounded as

L ≤ L_1 + Σ_{l=2}^{k} γ_{l−1} [d_v(d_c − 1) + 1].

Assume that the decoder is in the waterfall region and is able to detect and correct some erroneous bits in each iteration, eventually correcting all of them. Then Σ_{l=2}^{k} γ_{l−1} is equal to the number of erroneous bits in the received sequence. For large block sizes, the number of erroneous bits is approximately Np_0, and it can be easily verified that for p_0 ≪ 1 and large N,

L ≤ 2Np_0 [d_v(d_c − 1) + 1].   (24)

For the single-bit RS decoder, the inequality in Eq. (24) becomes an equality (cf. (21) and (22)). From Eq. (24), it can be seen that the computational complexity is linear in the codeword length. This fact was checked by simulation and the results are presented in Table 1, where the simulation results for several (3,6)-regular LDPC codes of different codeword lengths are tabulated along with the theoretical results. The parity-check matrices of the codes are given in [27], and the SNR is taken to be 6 dB.
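The linear growth of the first-iteration work can be checked numerically by evaluating L_1 = N[1 − p_0 β^{d_v} − (1 − p_0)(1 − β)^{d_v}] with β = (1 − (1 − 2p_0)^{d_c−1})/2; the crossover probability used below is an illustrative high-SNR value, not a figure from the paper:

```python
# Numerical evaluation of L_1 = N * (1 - p0 * beta**dv - (1 - p0) * (1 - beta)**dv),
# with beta = (1 - (1 - 2*p0)**(dc - 1)) / 2. The crossover probability p0 is
# an illustrative high-SNR value, not a figure from the paper.

def avg_first_set_size(N, p0, dv, dc):
    beta = (1.0 - (1.0 - 2.0 * p0) ** (dc - 1)) / 2.0
    return N * (1.0 - p0 * beta ** dv - (1.0 - p0) * (1.0 - beta) ** dv)

p0, dv, dc = 2.4e-3, 3, 6
sizes = [avg_first_set_size(N, p0, dv, dc) for N in (10_000, 20_000, 40_000)]
# L_1 grows linearly in N: doubling N doubles the first-iteration work,
# and the candidate set is a small fraction of the N variable nodes.
```

For these parameters the first selected set holds only a few percent of the variable nodes, consistent with the O(N) complexity claim.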
We observe that both the single-bit and multi-bit RS decoders need essentially the same average number of flipping function calculations, and the derived upper bound for L in (24) is quite tight. As expected, as N increases, the upper bound obtained from Eq. (24) gets closer to the simulation results. On the other hand, the original WBF-based decoders compute the flipping function for all N variable nodes in each iteration, so the number of flipping function calculations for WBF-based decoders is approximately kN. Therefore, the ratio of the average number of flipping function calculations for the WBF-based and RS decoders, which can be considered as the complexity gain, is lower bounded as

G_c = kN/L ≥ k / (2p_0 [d_v(d_c − 1) + 1]).   (25)

Fig. 2 L_1 versus SNR for the (3,6)- and (4,32)-regular LDPC codes
By assuming that the decoder is able to detect and correct one erroneous bit in each iteration, in single-bit decoders the average number of iterations required to obtain the correct codeword equals the number of erroneous bits in the received sequence, i.e., k = Np_0, and the inequality in Eq. (25) becomes an equality (cf. (21) and (22)). It should also be noted that the complexity gain is higher for a sparser parity-check matrix.
For example, for a (3,6)-regular code with N = 10^5 and at SNR = 6 dB, G_c for the single-bit and multi-bit RS decoders is obtained as 3125 and 1279, respectively. Although the complexity gain is smaller for multi-bit RS decoders, it is still significant.

Performance analysis
To evaluate the performance of the proposed RS algorithm and compare it with the original WBF-based decoders, we first note that if the selected set A = ∪_l A_l used by the RS decoder contains all erroneous bits, both decoders will have the same performance. However, in general, some erroneous bits may happen not to be in the selected set, and thus the RS decoders can never detect and correct them. Specifically, an erroneous bit will not be included in A_1 if all parity-checks in which this bit participates are satisfied (i.e., if these checks involve an even number of errors). This bit may never enter A_l in the subsequent iterations, and so the RS decoder will totally miss it. Therefore, the performance of RS-based decoders will generally be inferior to that of the original decoders. However, in the following theorem, we show that the difference between the FER of the original WBF-based decoders, P_O, and that of the RS decoders, P_RS, is indeed negligible.

Theorem 2 The difference ΔP = P_RS − P_O is upper bounded as in (26), where the X_i's are non-negative integers and the sets Θ_θ are defined in the proof below.

Proof Let b and b̂_RS be the transmitted message and the message estimated by the RS decoder, respectively. The FER of the RS decoder can then be written as

P_RS = Pr{b̂_RS ≠ b} = Pr{b̂_RS ≠ b, E ⊆ A} + Pr{b̂_RS ≠ b, E ⊄ A},

where E = {j_i, i = 1, 2, . . . , ε} is the set of indices of erroneous bits in the received sequence and A = ∪_i A_i is the selected set of variable nodes in the decoding process. Defining b̂_O as the sequence estimated by the original WBF-based decoder, using the Bayes rule, and defining ΔP = P_RS − P_O, we obtain

ΔP ≤ Pr{E ⊄ A_1}.   (29)

The event E ⊄ A_1 is the event that some erroneous variable nodes are not in the selected set A_1.
The number of erroneous bits ε in the received sequence (i.e., the cardinality of set E) is a binomially distributed random variable with parameters N and p_0, i.e.,

Pr{ε = ε_0} = C(N, ε_0) p_0^{ε_0} (1 − p_0)^{N−ε_0}.

Therefore, we have

Pr{E ⊄ A_1} = Σ_{ε_0≥1} Pr{ε = ε_0} Pr{E ⊄ A_1 | ε = ε_0}.

By defining Θ as the number of erroneous bits participating in the checks that involve bit j_1, Pr{E ⊄ A_1 | ε = ε_0} can be expressed in terms of Pr{j_1 ∉ A_1 | Θ = θ, ε = ε_0}. To compute this probability, we define X_i as the number of erroneous bits participating in the ith check that involves bit j_1. Figure 1 shows an example in which d_v = 3, d_c = 6, θ = 3 and the erroneous variable nodes are painted gray; there, X_1 = 1, X_2 = 0 and X_3 = 2. A check is satisfied if an even number of erroneous bits are involved in it. Defining P_1(θ) = Pr{j_1 ∉ A_1 | Θ = θ, j_1 ∈ E} and the set Θ_θ of d_v-tuples (X_1, . . . , X_{d_v}) of non-negative integers summing to θ, it follows from the definition of Θ_θ that if d_v is an even (odd) number, then P_1(θ) = 0 when θ is odd (even); moreover, P_1(θ) = 0 for θ < d_v. Therefore, using (29)-(34), the upper bound on ΔP is obtained as (26), and from (28) the FER of the RS decoders can be upper bounded as

P_RS ≤ P_O + ΔP_max,   (35)

where ΔP_max denotes the upper bound in (26). End of Proof.
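The key quantity in this proof, the probability that an erroneous bit escapes A_1 because all of its d_v checks happen to be satisfied, can be checked with a small Monte-Carlo sketch. Under the 4-cycle-free independence assumption this probability is β^{d_v}; the simulation below draws the other d_c − 1 bits of each check independently, and the parameter values are illustrative:

```python
import random

# Monte-Carlo check of the miss probability used in the proof: a bit in error
# escapes A_1 iff each of its dv checks is satisfied, i.e., iff each check
# contains an odd number of further errors among its other dc-1 bits. Under
# the 4-cycle-free independence assumption this equals beta**dv. The values
# of p0, dv, dc below are illustrative, not taken from the paper.

def missed_probability_mc(p0, dv, dc, trials, seed=1):
    rng = random.Random(seed)
    missed = 0
    for _ in range(trials):
        # each of the dv checks must see an odd number of other errors
        if all(sum(rng.random() < p0 for _ in range(dc - 1)) % 2 == 1
               for _ in range(dv)):
            missed += 1
    return missed / trials

p0, dv, dc = 0.1, 3, 6
beta = (1 - (1 - 2 * p0) ** (dc - 1)) / 2
estimate = missed_probability_mc(p0, dv, dc, trials=200_000)
# estimate is close to beta**dv (about 0.038 for these parameters)
```

Even at this unrealistically high crossover probability, fewer than 4% of erroneous bits are missed, and the probability drops sharply as p_0 decreases.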
The upper bound presented in Theorem 2 is general and applicable to both single-bit and multi-bit WBF-based decoders. Indeed, as shown above, the difference ΔP between the FER of the original WBF-based decoders and that of their RS counterparts is upper bounded by the probability that some erroneous variable nodes are not in the selected set A_1 in the first iteration (see Eq. (29)), and the set A_1 is the same in single-bit and multi-bit WBF-based decoders.

Remark 2 By changing the order of the summations in the upper bound and modifying their limits, making the substitution ε_0′ = ε_0 − θ − 1, and using Σ_{k=0}^{n} C(n, k) p^k (1 − p)^{n−k} = 1, the bound can be simplified. From the resulting inequality, it is clear that in the high SNR regime ΔP tends to zero at least as fast as p_0^{d_v+1}, and the upper bound is tighter for a code with a larger variable-node degree.
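The claimed high-SNR scaling can be illustrated numerically: the dominant term of the bound behaves like p_0 β^{d_v}, and since β ≈ (d_c − 1)p_0 for small p_0, the log-log slope over a decade of p_0 approaches d_v + 1 (a sketch under these stated approximations):

```python
import math

# High-SNR scaling check: the dominant term of the bound behaves like
# p0 * beta**dv, and since beta ~ (dc - 1) * p0 for small p0, this is
# proportional to p0**(dv + 1). We verify the log-log slope numerically.

def dominant_term(p0, dv, dc):
    beta = (1 - (1 - 2 * p0) ** (dc - 1)) / 2
    return p0 * beta ** dv

dv, dc = 3, 6
slope = math.log10(dominant_term(1e-3, dv, dc) / dominant_term(1e-4, dv, dc))
# log-log slope over one decade of p0; approaches dv + 1 = 4
```

For the (3,6)-regular parameters the computed slope is already within a fraction of a percent of d_v + 1 = 4.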

Results and discussion
In this section, we compare the WBF-based and reduced-set (RS) decoders in terms of computational complexity and probability of error. In the simulations, we use (3,6)- and (4,32)-regular LDPC codes with rates 1/2 and 7/8, respectively. The parity-check matrix of the (3,6)-regular code is constructed with the progressive edge growth (PEG) method [28]. For the (4,32)-regular code, we use the quasi-cyclic LDPC code considered in [29] for near-earth applications. The maximum number of iterations is set to 100 in all simulations. First, an analysis of the computational complexity of the decoders based on the average number of flipping function calculations (L) is presented. Plotted in Fig. 3 is L in the RS decoder versus SNR for the (3,6)- and (4,32)-regular LDPC codes with codeword lengths 10000 and 8176, respectively. The average number of flipping function calculations obtained by Monte-Carlo simulation for the single-bit and multi-bit WBF-based decoders, along with the upper bound of (24), is shown in this figure. As expected, in the high SNR regime, the upper bound becomes quite tight for both single-bit and multi-bit decoders.
In Fig. 4, the average number of flipping function calculations is plotted versus SNR for the RS and original WBF-based decoders. Both single-bit and multi-bit decoders are considered in this figure. As discussed in Section 3.2, the average numbers of flipping function calculations in single-bit RS and multi-bit RS decoders are almost the same, and this is confirmed by the simulation results in Fig. 4. It is clearly seen that using the RS algorithm results in about three orders of magnitude decrease in the decoding complexity of single-bit WBF-based decoders and at least two orders of magnitude decrease in that of multi-bit WBF-based decoders. Moreover, this reduction in complexity is higher for sparser codes (cf. (25)). It should also be noted that the number of flipping function calculations required in the original (non-RS) multi-bit decoders at the medium SNR regime is less than that required in the single-bit decoders, while in the low and high SNR regimes the numbers of flipping function calculations required by the two decoding algorithms are the same. This behavior can be explained as follows. At low SNRs, neither decoding algorithm is able to correct the errors, so the decoding process continues until the predefined maximum number of iterations is reached, and thus the average number of flipping function calculations is the same for single-bit and multi-bit WBF decoders. At intermediate SNRs, the convergence speed of the multi-bit decoding algorithm is higher (i.e., the average number of required iterations is smaller), and therefore, the average number of flipping function calculations for the multi-bit decoder is lower. At the high SNR regime, either the received sequence is error-free or the number of erroneous bits is very small.

Fig. 3 L versus SNR for the (3,6)- and (4,32)-regular LDPC codes with N = 10000 and 8176, respectively
In this case, the numbers of iterations required in the decoding process by the single-bit and multi-bit decoders are almost equal. These results are shown in Fig. 5, where the average number of required iterations versus SNR is plotted to evaluate the convergence of the original and the proposed RS single-bit and multi-bit decoders. As expected, the average numbers of iterations of the original and RS decoders are nearly identical, i.e., both decoders have similar convergence speeds.
To evaluate the possible performance loss incurred by using the RS decoders (compared to their original WBF-based counterparts), the FER and BER of both the RS and the original WBF-based decoders are plotted in Figs. 6, 7, 8, and 9. In these figures, the (3,6)- and (4,32)-regular LDPC codes with codeword lengths 10000 and 8176 are employed. In Fig. 6, the simulation results for the FER of the (3,6)- and (4,32)-regular codes for both the RS and the original WBF-based decoders, along with an upper bound for the FER of the RS decoder, are plotted. In this figure, P_O is obtained by Monte-Carlo simulations for both the single-bit standard WBF decoder [2] and the multi-bit AWMBF decoder [23], and the upper bound is given by Eq. (35). We observe that both the RS and the original WBF-based decoders have essentially the same performance, and the derived upper bound for the RS decoders is quite tight for both single-bit and multi-bit decoders. As can be seen in Fig. 6, the upper bound of ΔP for the (4,32)-regular LDPC code is tighter than that for the (3,6)-regular LDPC code because, as discussed in Section 3.3, the upper bound is tighter for a code with a larger variable-node degree (recall that ΔP tends to zero at least as fast as p_0^{d_v+1}). In Figs. 7, 8, and 9, the error performance of the proposed RS and the original WBF-based decoders is shown. Figures 7 and 8 show the results over the AWGN channel and Fig. 9 over the BSC. In these simulations, we have employed the single-bit WBF, MWBF, IRRWBF, and TBWBF decoders and the multi-bit AWMBF decoder, together with their RS counterparts. As expected, the error performances in terms of BER and FER of the original decoders and the RS decoders are very close.

Conclusion
We proposed a method to reduce the computational complexity of iterative LDPC decoders based on the WBF algorithm. In the proposed method, instead of all variable nodes, the decoder considers only a subset of variable nodes that are potentially erroneous, and thus the complexity of the flipping function calculation is significantly reduced, especially when the code length is large. Our method performs just as well as the existing WBF-based iterative decoding algorithms, and the FER and BER of the two decoders are essentially the same.

Fig. 8 Performance of the (4,32)-regular LDPC code with rate 7/8 over the AWGN channel

Fig. 9 Performance of regular LDPC codes with rates 1/2 and 7/8 over the BSC