Reduced-complexity decoding implementation of QC-LDPC codes with modified shuffling


Layered decoding (LD) facilitates a partially parallel architecture for performing the belief propagation (BP) algorithm for decoding low-density parity-check (LDPC) codes. Such a schedule generally has lower implementation complexity than a fully parallel architecture and a higher convergence rate than both serial and parallel architectures, regardless of the codeword length or code rate. In this paper, we introduce a modified shuffling method which shuffles the rows of the parity-check matrix (PCM) of a quasi-cyclic LDPC (QC-LDPC) code, yielding a PCM in which each layer can be produced by cyclically shifting the layer above it one symbol to the right. The proposed shuffling scheme additionally guarantees that every column of a layer of the shuffled PCM has weight zero or one. This condition plays a key role in further decreasing LD complexity. We show that, owing to these two properties, the number of occupied look-up tables (LUTs) on a field programmable gate array (FPGA) is reduced by about 93% and the consumed on-chip power by nearly 80%, while the bit error rate (BER) performance is maintained. The only drawback of the shuffling is a degradation of decoding throughput, which is negligible at low values of \(E_b/N_0\), i.e., until the BER reaches 1e−6.


Forward error correction (FEC) methods are among the vital elements of next-generation wireless networks, tasked with providing the required level of reliability. Nevertheless, powerful capacity-achieving FEC techniques such as low-density parity-check (LDPC), turbo, or polar codes have the downside of higher complexity and power consumption compared with traditional coding techniques. Among these codes, LDPC codes have been incorporated into several earlier technologies and are seen as a potential candidate for new standards such as Fifth Generation (5G) and IEEE 802.11ax.

The belief propagation (BP) algorithm is commonly used as the decoding method for LDPC codes, as the parity-check matrix (PCM) of these codes is sparse and their Tanner graph representation lacks short cycles of length 4. One important issue in BP decoding is the schedule of the algorithm, i.e., the order in which the reliability messages are exchanged between the nodes of the Tanner graph. The BP schedule is directly associated with the implementation architecture of the decoding method, and it falls into three main categories:

  1. Fully parallel architecture, which is realized by the flood schedule [1], in which all the variable nodes (VNs) and check nodes (CNs) in the Tanner graph pass messages concurrently to their neighbors in every iteration of the algorithm. Although it yields a high throughput, this schedule requires a large silicon area with high interconnect complexity [2]. This architecture is made possible by the inherently parallelizable nature of the BP algorithm, in contrast to turbo codes, whose decoding algorithms are inherently serial. Nevertheless, several works have devised fully parallel architectures for decoding turbo codes [3,4,5,6].

  2. Serial architecture, in which a smaller number of functional units are re-used several times to perform each decoder iteration. In this way, decoding complexity is lowered, although at the price of reduced decoding throughput.

  3. Partially parallel architecture, which is a good trade-off between hardware complexity and decoding throughput and is best accomplished by the layered decoding (LD) schedule [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33].

In the LD schedule, the rows of the PCM are divided into a number of layers, and each iteration of the BP algorithm is likewise split into the same number of sub-iterations. Each sub-iteration runs over one layer of the PCM, during which the CNs of that layer exchange reliability messages with their neighboring VNs. At the end of a sub-iteration, the updated reliability messages are delivered to the next layer. Accordingly, in each sub-iteration, only a subset of CNs, i.e., as many as the number of rows in each layer, participates in the decoding process, so LD requires fewer hardware resources than the flood schedule. Furthermore, the LD schedule achieves better convergence than the flood schedule because the latest variable-to-check (VTC) messages are always used to update the check-to-variable (CTV) messages during a sub-iteration.

To keep LD complexity low, it is highly desirable that the number of ones in each column of a layer be either one or zero. Quasi-cyclic LDPC (QC-LDPC) codes inherently have a layered structure with this property, which makes them an appropriate candidate for LD. QC-LDPC codes are a special type of LDPC codes possessing a cyclic property that simplifies their encoding and decoding, while preserving performance comparable to that of random (or unstructured) LDPC codes [34, 35].

Related work and contributions

The shuffling idea proposed in [15] shuffles the rows of the PCM of a QC-LDPC code prior to decoding, in the sense that the order of the rows of the PCM is completely changed. After shuffling, each layer can be produced by cyclically shifting the layer above it one symbol to the right, leading to a simplified LD and a faster convergence rate. In particular, due to the cyclic property, it is enough to realize only the first layer of the PCM in hardware rather than the whole PCM. The downside of this shuffling is that it may spoil the primary property of single weight columns in each layer of the PCM.

To work around this shortcoming, we outlined a modified shuffling idea in our previous work [36], which results in a shuffled PCM that retains the desired property of single weight columns and possesses the cyclic property too. This was accomplished by introducing a set of offset values prior to performing the shuffling. In this paper, this modified shuffling idea is further investigated. Specifically:

  1. The logic behind the offset values applied for shuffling is clarified, aiming to elaborate how the offset values take effect. The procedure for determining the offset values is also outlined.

  2. Since [36] lacks implementation results to verify the improvements promised by the modified shuffling method, we provide in this work the implementation results for LD of several QC-LDPC codes shuffled with the proposed technique. Improvements in terms of the number of occupied look-up tables (LUTs) on a field programmable gate array (FPGA) and of power consumption are observed when compared with non-shuffled LD. These improvements are achieved without sacrificing bit error rate (BER) performance. Although the decoding throughput deteriorates as \(E_b/N_0\) rises, our analysis shows that if a BER of 1e−6 is chosen as the target, the throughput degradation is insubstantial.

The organization of the paper is as follows. Section 3 presents necessary fundamentals of QC-LDPC codes and LD. Section 4 is devoted to assessment of the novel shuffling method and its attributes. Implementation and simulation results together with necessary analysis come in Sect. 5. Final conclusions are made in Sect. 6.

Fig. 1

Base matrices of IEEE 802.15.3c QC-LDPC codes [39]: a (672,336)-LDPC code; b (672,420)-LDPC code; c (672,504)-LDPC code; d (672,588)-LDPC code

Fig. 2

Base matrix of a 1/2-rate (2304,1152)-QC-LDPC code utilized in IEEE 802.16e

Fig. 3

a PCM of a (12, 4)-QC-LDPC code; b its shuffled version


QC-LDPC codes

The PCM of a QC-LDPC code is comprised of Circulant Permutation Matrices (CPMs) and zero matrices, wherein a CPM is a shifted identity matrix. Such a PCM could be represented as

$$\begin{aligned} {\varvec{H}}_{qc}= \begin{bmatrix} {\varvec{A}}_1 \\ {\varvec{A}}_2 \\ \vdots \\ {\varvec{A}}_c \\ \end{bmatrix} = \begin{bmatrix} {\varvec{A}}_{1,1} &{} {\varvec{A}}_{1,2} &{} \ldots &{} {\varvec{A}}_{1,t} \\ {\varvec{A}}_{2,1} &{} {\varvec{A}}_{2,2} &{} \ldots &{} {\varvec{A}}_{2,t} \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ {\varvec{A}}_{c,1} &{} {\varvec{A}}_{c,2} &{} \ldots &{} {\varvec{A}}_{c,t} \end{bmatrix}, \end{aligned}\tag{1}$$

in which \(c\le t\) and each \({\varvec{A}}_{i,j}\) is either a \(b\times b\) CPM or a \(b\times b\) zero matrix. A codeword \({\varvec{v}}\) has length \(t \cdot b\) and comprises t sections \({\varvec{v}}=(\varvec{v_1},\varvec{v_2},\cdots ,\varvec{v_t} )\), with each section \(\varvec{v_i},1\le i\le t\) of length b. Codewords of a QC-LDPC code have a sectionized cyclic structure, in the sense that cyclically shifting the t sections of a codeword yields another valid codeword [37]. A compact way of representing the PCM of a QC-LDPC code is known as the base matrix, denoted by \(\varvec{{\mathcal {W}}}\). In \(\varvec{{\mathcal {W}}}\), non-negative integers specify the shift value of the corresponding CPM with respect to an identity matrix, and the other entries, usually chosen to be −1, represent zero matrices in the PCM. Fig. 1 shows the base matrices for the QC-LDPC codes utilized in the IEEE 802.15.3c standard, and Fig. 2 shows the 1/2-rate (2304,1152)-QC-LDPC code used in IEEE 802.16e. In these two figures, empty places are the locations of zero matrices.
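The expansion of a base matrix into a binary PCM can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `expand_base_matrix`, the toy base matrix `W_toy`, and the convention that a shift value s moves the ones of the identity matrix s columns to the right are all assumptions made here.

```python
import numpy as np

def expand_base_matrix(W, b):
    """Expand a base matrix W into a binary PCM made of b x b blocks.

    Entries of W equal to -1 become zero blocks; an entry s >= 0 becomes an
    identity matrix whose ones are cyclically shifted s columns to the right
    (assumed shift convention).
    """
    W = np.asarray(W)
    c, t = W.shape
    H = np.zeros((c * b, t * b), dtype=np.uint8)
    identity = np.eye(b, dtype=np.uint8)
    for i in range(c):
        for j in range(t):
            if W[i, j] >= 0:
                H[i * b:(i + 1) * b, j * b:(j + 1) * b] = np.roll(
                    identity, int(W[i, j]), axis=1)
    return H

# Hypothetical toy base matrix (not taken from any standard): c = 2, t = 3, b = 4.
W_toy = [[0, 1, -1],
         [2, -1, 3]]
H_toy = expand_base_matrix(W_toy, b=4)
print(H_toy)
```

With this toy example, the resulting PCM has \(c \cdot b = 8\) rows and \(t \cdot b = 12\) columns, and every row contains exactly two ones, one per non-empty block in its block-row.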

Tanner graph representation of a PCM is an important means to comprehend BP algorithm. It consists of two sets of nodes, where one set represents CNs, i.e., parity-check sums (equivalent to rows of \({\varvec{H}}\)), and the other set represents VNs (equivalent to columns of \({\varvec{H}}\)). A CN in a Tanner graph is connected to a VN if and only if the corresponding element of \({\varvec{H}}\) is one.
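As a small illustration of this correspondence, the neighbor lists of the Tanner graph can be read directly from the rows and columns of a PCM. The helper name `tanner_neighbors` and the toy matrix below are hypothetical and serve only as a sketch.

```python
import numpy as np

def tanner_neighbors(H):
    """Return (cn_to_vn, vn_to_cn) adjacency lists of the Tanner graph of H."""
    H = np.asarray(H)
    # CN j is connected to every VN l with H[j, l] == 1.
    cn_to_vn = [np.flatnonzero(row).tolist() for row in H]
    # VN l is connected to every CN j with H[j, l] == 1.
    vn_to_cn = [np.flatnonzero(col).tolist() for col in H.T]
    return cn_to_vn, vn_to_cn

# Hypothetical 2 x 4 parity-check matrix used only for illustration.
H_small = np.array([[1, 1, 0, 1],
                    [0, 1, 1, 1]])
cn_to_vn, vn_to_cn = tanner_neighbors(H_small)
print(cn_to_vn)
print(vn_to_cn)
```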

LD schedule

One appropriate schedule which facilitates a partially parallel architecture for executing the BP algorithm is LD. In this schedule, each iteration is split into several sub-iterations, each corresponding to a layer of the PCM. In each sub-iteration, the CNs of the corresponding layer exchange reliability messages with the VNs of that layer, and at the end, the updated reliability messages are provided to the next layer. Accordingly, fewer processing units for CNs and VNs are realized in hardware, and they are re-utilized in successive sub-iterations for successive layers.

Let \({\varvec{y}} = (y_0 , \ldots , y_{n - 1})\) be the soft-decision sequence at the output of the channel that is to be decoded. The J rows of the PCM \({\varvec{H}}_{qc}\) are divided into L layers, each containing E consecutive rows. Hence, the i-th layer \({\varvec{H}}^{(i)}_{qc} , 1 \le i \le L\) contains rows \((i - 1)E + 1 , \ldots , iE\) of \({\varvec{H}}_{qc}\). The i-th sub-iteration of the LD algorithm, associated with the i-th layer of the PCM, is split into three steps:

  1. Vertical step: VTC messages are updated as

    $$\begin{aligned} Z^{(i)}_{j , l} = Y_l - L^{(i)}_{j,l} , 1 \le j \le E , 0 \le l < n, \end{aligned}\tag{2}$$

    where \(Y_l , \, 0 \le l < n\) are the A Posteriori Probability (APP) values of the VNs, initially set to \({\varvec{y}}\), and \(L^{(i)}_{j,l} , 1 \le j \le E , 0 \le l < n , 1 \le i \le L\) are the CTV messages corresponding to the i-th layer, initially set to zero for all the layers.

  2. Horizontal step: CTV messages are updated as

    $$\begin{aligned} L^{(i)}_{j,l} = \prod _{l' \in {\mathcal {B}} ({\varvec{h}}^{(i)}_j) \setminus l} sgn(Z^{(i)}_{j , l'}) \times \min _{l' \in {\mathcal {B}} ({\varvec{h}}^{(i)}_j) \setminus l} |Z^{(i)}_{j , l'} |, \end{aligned}\tag{3}$$

    where

    $$\begin{aligned} {\mathcal {B}} ({\varvec{h}}^{(i)}_j) = \{ l : h^{(i)}_{j,l} = 1, 0 \le l < n\} \end{aligned}$$

    denotes the set of VNs neighboring the CN \({\varvec{h}}^{(i)}_j\).

  3. Hard decision and early termination criterion: APP values are updated according to

    $$\begin{aligned} Y_l = y_l + \sum _{j \in {\mathcal {A}}_l^{(i)}} L^{(i)}_{j,l} , 0 \le l < n, \end{aligned}\tag{4}$$

    where

    $$\begin{aligned} {\mathcal {A}}_l^{(i)} = \{j : h_{j , l}^{(i)} = 1 , 1 \le j \le E \} \end{aligned}$$

    is, analogously to \({\mathcal {B}} ({\varvec{h}}^{(i)}_j)\), the set of CNs in layer i connected to \(v_l\).

In (3), the min-sum scheme [38] is used for computing the CTV messages, where sgn(x) is the sign function, equal to 1 when \(x \ge 0\) and −1 otherwise.
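The min-sum rule of the horizontal step can be sketched for a single CN as follows. The function names `sgn` and `min_sum_ctv` are illustrative choices, the input is assumed to hold the VTC messages of all VNs neighboring that CN, and at least two neighbors are assumed.

```python
import numpy as np

def sgn(x):
    """Sign function as defined in the text: 1 for x >= 0, -1 otherwise."""
    return np.where(np.asarray(x) >= 0, 1, -1)

def min_sum_ctv(z):
    """Min-sum CTV messages of one CN.

    z holds the VTC messages from the VNs neighboring this CN. The message
    returned to neighbor k uses every incoming message except the one from
    neighbor k itself (extrinsic principle).
    """
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    for k in range(len(z)):
        others = np.delete(z, k)  # exclude the target VN
        out[k] = np.prod(sgn(others)) * np.min(np.abs(others))
    return out

# Hypothetical VTC messages of a degree-3 CN.
print(min_sum_ctv([2.0, -3.0, 0.5]))
```

For the three example messages, each output takes the product of the signs and the minimum magnitude of the other two inputs, which gives −0.5, 0.5 and −2.0.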

At the end of each sub-iteration, some codeword bits are estimated based on the updated APP values. In particular, \({\hat{v}}_l = 1\) if \(Y_l > 0\) and \({\hat{v}}_l = 0\) otherwise. If the check-sum condition \({\varvec{H}}_{qc} \hat{{\varvec{v}}}^T = 0\) is satisfied by the estimated codeword \(\hat{{\varvec{v}}} = ({\hat{v}}_0 , \ldots , {\hat{v}}_{n - 1})\), it is declared a valid codeword and the algorithm terminates. Otherwise, the algorithm continues with the vertical step of the next layer. The maximum number of iterations is, however, limited by a threshold \(I_{\mathrm{max}}\). The algorithm declares a failure if decoding has not converged to a valid codeword within \(I_{\mathrm{max}}\) iterations.
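The early termination test amounts to a syndrome computation over GF(2). A minimal sketch is given below; the function name `is_codeword` and the small matrix are hypothetical, and the hard-decision vector is taken as an input so that no particular bit-mapping convention is assumed.

```python
import numpy as np

def is_codeword(H, v_hat):
    """Return True if the hard-decision vector v_hat satisfies H v^T = 0 over GF(2)."""
    H = np.asarray(H, dtype=np.int64)
    v_hat = np.asarray(v_hat, dtype=np.int64)
    syndrome = (H @ v_hat) % 2
    return not np.any(syndrome)

# Hypothetical 2 x 3 parity-check matrix used only for illustration.
H_small = [[1, 1, 0],
           [0, 1, 1]]
print(is_codeword(H_small, [1, 1, 1]))  # both checks involve an even number of ones
print(is_codeword(H_small, [1, 0, 0]))  # the first check fails
```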

Modified shuffling of QC-LDPC codes

Modified shuffling

The LD schedule is well suited to decoding QC-LDPC codes, since each block-row \({\varvec{A}}_i , \, 1 \le i \le c\) in (1) can be regarded as a layer. Accordingly, the columns in each layer are either single weight or zero weight. This property is useful for simplifying the decoding implementation. First and foremost, the summation in (4) is simplified, as there is only a single CTV message to be added to \(y_l\). Moreover, the implementation of the operations in (2) and (3) is also simplified.

Shuffling is the act of reordering the rows of the PCM in such a manner that the complexity of the decoding algorithm is reduced while the error correction performance is preserved. The method in [15] partitions the \(J=c \cdot b\) rows of \({\varvec{H}}_{qc}\) between matrices \({\varvec{H}}^{sh , (i)}_{qc}, \, i=1,\ldots ,b\), each containing the rows \(i,i+b,i+2b,\ldots ,i+(c-1)b\) of \({\varvec{H}}_{qc}\). Accordingly, the shuffled PCM, \({\varvec{H}}^{sh}_{qc}\), has the matrices \({\varvec{H}}^{sh , (i)}_{qc}, \, i=1,\ldots ,b\) as its row-blocks:

$$\begin{aligned} {\varvec{H}}_{qc}^{sh}= \begin{bmatrix} {\varvec{H}}_{qc}^{sh,(1)} \\ \vdots \\ {\varvec{H}}_{qc}^{sh ,(b)} \\ \end{bmatrix} \end{aligned}$$

An example of such a shuffling, performed on a (12, 4)-QC-LDPC code, is depicted in Fig. 3. In \({\varvec{H}}^{sh}_{qc},\) each layer \({\varvec{H}}^{sh , (i)}_{qc}\) is obtained by cyclically shifting the layer above it, \({\varvec{H}}^{sh , (i - 1)}_{qc}\), one symbol to the right, noting that the circulation must be performed separately on the t individual sections of a layer. Due to this cyclic property provided by shuffling, it is no longer necessary to implement the whole PCM in the targeted hardware (for example, an FPGA) and define all the “1”-entries as connections in it. Instead, it suffices to implement only \({\varvec{H}}^{sh , (1)}_{qc}\), the first layer of \({\varvec{H}}^{sh}_{qc}\), and then perform a circulation on the memory blocks containing the updated APP values at the end of each sub-iteration. This implementation benefit is further elaborated on in Sect. 4.2.
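The basic shuffling and its cyclic property can be checked numerically on a toy code. The sketch below is illustrative only: the helper names, the toy base matrix `W_toy` (hypothetical, not from any standard), and the convention that a shift value s moves the ones of the identity s columns to the right are assumptions. Indices are 0-based in the code.

```python
import numpy as np

def expand_base_matrix(W, b):
    """Expand base matrix W (-1 = zero block, s >= 0 = identity shifted right by s)."""
    W = np.asarray(W)
    c, t = W.shape
    H = np.zeros((c * b, t * b), dtype=np.uint8)
    for m in range(c):
        for j in range(t):
            if W[m, j] >= 0:
                H[m * b:(m + 1) * b, j * b:(j + 1) * b] = np.roll(
                    np.eye(b, dtype=np.uint8), int(W[m, j]), axis=1)
    return H

def basic_shuffle(H, b):
    """Layer i (0-indexed) collects rows i, i+b, ..., i+(c-1)b of H."""
    c = H.shape[0] // b
    order = [m * b + i for i in range(b) for m in range(c)]
    return H[order, :]

def shift_sections_right(layer, b):
    """Cyclically shift every length-b section of each row one position to the right."""
    rows, n = layer.shape
    return np.roll(layer.reshape(rows, n // b, b), 1, axis=2).reshape(rows, n)

W_toy = [[0, 1, -1],
         [2, -1, 3]]
b, c = 4, 2
H_sh = basic_shuffle(expand_base_matrix(W_toy, b), b)
layers = [H_sh[i * c:(i + 1) * c] for i in range(b)]

# Every layer should equal the previous one with each section shifted right by one.
cyclic = all(np.array_equal(layers[i], shift_sections_right(layers[i - 1], b))
             for i in range(1, b))
print("cyclic property holds:", cyclic)
```

Under these assumptions, the check passes for any base matrix, since row i of a CPM with shift s has its one at column \((i + s) \bmod b\) of the section.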

The shortcoming of this shuffling method is that layers in the shuffled matrix may no longer consist of single weight columns. In particular, if the base matrix of a QC-LDPC code has repeated numbers in a column, the corresponding block-column in the shuffled matrix will have columns of weight greater than one in each layer. For instance, in Figs. 1 and 2, the shaded columns have repeated numbers. Consequently, after shuffling, they bring about columns of weight 2 or 3 in each layer of their shuffled matrix.

Fig. 4

Base matrices of IEEE 802.15.3c QC-LDPC codes with offset values: a (672,336)-LDPC code; b (672,420)-LDPC code; c (672,504)-LDPC code

Fig. 5

Base matrix of the 1/2-rate (2304,1152) IEEE 802.16e QC-LDPC code with offset values

Fig. 6

LD architecture for the shuffled PCM of Fig. 3

Fig. 7

Flowchart of the LD algorithm

As a solution to this weakness of shuffling, we propose to employ a set of integer values \(0 \le o_m < b, \, m = 1 , \ldots , c\) serving as offsets that modify which rows of \({\varvec{H}}_{qc}\) are put in the same layer of \({\varvec{H}}_{qc}^{sh}\). Accordingly, the i-th layer of \({\varvec{H}}_{qc}^{sh}\) is made up of rows \(\{i + o_1 , b + i + o_2 , \ldots , (c - 1)b + i + o_c\}\) of \({\varvec{H}}_{qc}\) for \(1 \le i \le b\), where each sum \(i + o_m\) is understood to wrap around within its block-row (i.e., it is taken modulo b), so that every row of \({\varvec{H}}_{qc}\) is used exactly once. The offsets \(o_m\) are carefully selected so that no column in any layer of \({\varvec{H}}_{qc}^{sh}\) has weight greater than one. The offset values can be regarded as a means of eliminating repeated values in a column. In other words, the modified shuffling technique with offset values is equivalent to the basic shuffling method performed on an \({\varvec{H}}_{qc}\) whose base matrix, because of the offset values, no longer contains repeated integers in a column. A possible offset set for each of our example codes, together with the resulting modified base matrices, is shown in Figs. 4 and 5. As observed, with the introduced offset values in these examples, the shaded columns no longer have repeated integers.
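The effect of the offsets can be illustrated on a toy code whose base matrix has a repeated entry in one column. The sketch below is not the authors' implementation: the helper names and the toy matrices are hypothetical, a right-shift CPM convention is assumed, and the in-block row index \(i + o_m\) is assumed to wrap around modulo b. Indices are 0-based in the code.

```python
import numpy as np

def expand_base_matrix(W, b):
    """Expand base matrix W (-1 = zero block, s >= 0 = identity shifted right by s)."""
    W = np.asarray(W)
    c, t = W.shape
    H = np.zeros((c * b, t * b), dtype=np.uint8)
    for m in range(c):
        for j in range(t):
            if W[m, j] >= 0:
                H[m * b:(m + 1) * b, j * b:(j + 1) * b] = np.roll(
                    np.eye(b, dtype=np.uint8), int(W[m, j]), axis=1)
    return H

def modified_shuffle(H, b, offsets):
    """Layer i takes row (i + o_m) mod b from block-row m; zero offsets give basic shuffling."""
    c = H.shape[0] // b
    order = [m * b + (i + offsets[m]) % b for i in range(b) for m in range(c)]
    return H[order, :]

def max_layer_column_weight(H_sh, c):
    """Largest column weight found in any layer of c rows."""
    n_layers = H_sh.shape[0] // c
    return max(int(H_sh[i * c:(i + 1) * c].sum(axis=0).max()) for i in range(n_layers))

# Hypothetical base matrix with the repeated entry 1 in its first column.
W_toy = [[1, 0],
         [1, 2]]
b, c = 3, 2
H = expand_base_matrix(W_toy, b)
w_basic = max_layer_column_weight(modified_shuffle(H, b, (0, 0)), c)
w_offset = max_layer_column_weight(modified_shuffle(H, b, (0, 2)), c)

# Offsets (0, 2) act like adding 2 (mod 3) to the second row of the base matrix.
W_mod = [[1, 0],
         [0, 1]]
equivalent = np.array_equal(
    modified_shuffle(H, b, (0, 2)),
    modified_shuffle(expand_base_matrix(W_mod, b), b, (0, 0)))
print(w_basic, w_offset, equivalent)
```

In this toy case the basic shuffling produces layers with a column of weight 2, while the offset set (0, 2) restores single weight columns and gives the same matrix as basic shuffling applied to the modified base matrix.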

This perspective on the modified shuffling suggests a way of finding appropriate offset values for a specific QC-LDPC code. The straightforward approach is an exhaustive search: for a QC-LDPC code there are \(b ^ c\) possible offset sets, which are examined one by one until one is found that yields a base matrix in which all the integers in any column are distinct. Once such an offset set is found, the search is halted. It should be noted that in some cases no valid offset set exists, as for the (672, 588)-QC-LDPC code of Fig. 1d. This indicates that the modified shuffling is not necessarily applicable to all QC-LDPC codes. It should also be emphasized that the desired cyclic property of \({\varvec{H}}_{qc}^{sh}\) holds for the modified shuffling just as it does for the basic shuffling method, and hence each layer of \({\varvec{H}}_{qc}^{sh}\) can be produced from its previous layer by a circular shift.
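The exhaustive search described above can be sketched in a few lines. The function names `columns_distinct` and `find_offsets` and the toy base matrices are hypothetical; entries equal to −1 are treated as zero blocks and ignored, and offsets are added modulo b.

```python
import itertools
import numpy as np

def columns_distinct(W, offsets, b):
    """Check that, after adding the row offsets modulo b, no column of W repeats a value."""
    W = np.asarray(W)
    for col in W.T:
        shifted = [(int(s) + o) % b for s, o in zip(col, offsets) if s >= 0]
        if len(shifted) != len(set(shifted)):
            return False
    return True

def find_offsets(W, b):
    """Try all b**c offset sets in turn; return the first valid one, or None if none exists."""
    c = len(W)
    for offsets in itertools.product(range(b), repeat=c):
        if columns_distinct(W, offsets, b):
            return offsets
    return None

# Hypothetical base matrices used only for illustration.
print(find_offsets([[1, 0], [1, 2]], b=3))  # a valid offset set exists
print(find_offsets([[0, 0], [0, 1]], b=2))  # no offset set removes all repetitions
```

The search cost grows as \(b ^ c\), so this brute-force form is only meant to make the procedure concrete; for the small block counts of the example codes it halts at the first valid set.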

LD implementation

The advantage of shuffling can be highlighted by examining the implementation methodology. Specifically, Fig. 6 illustrates the LD architecture for \({\varvec{H}}_{qc}^{sh}\) of Fig. 3. In this figure, the VN Processing Unit (VNPU) and CN Processing Unit (CNPU) are the processing units responsible for computing the VTC and CTV messages in (2) and (3), respectively. As illustrated, instead of \(J = 12\) CNPUs, \(c = 4\) CNPUs representing the CNs of the first layer are sufficient for decoding, and the connections between the VNPUs and CNPUs are determined by the 1-entries of the first layer. At the end of each sub-iteration, the computed APP values are cyclically shifted as shown in the figure. This circulation serves as an alternative to redefining the connections between VNPUs and CNPUs: the current connections remain valid, and the next sub-iteration can start immediately.
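Why a circulation of the APP memory can replace rewiring may be seen from the following sketch, which only models the read side of a sub-iteration. It is an illustration under assumptions: the helper names and the toy base matrix are hypothetical, a right-shift CPM convention is assumed, and under that convention the memory has to be rotated one position to the left within each section (the source does not state the direction).

```python
import numpy as np

def expand_base_matrix(W, b):
    """Expand base matrix W (-1 = zero block, s >= 0 = identity shifted right by s)."""
    W = np.asarray(W)
    c, t = W.shape
    H = np.zeros((c * b, t * b), dtype=np.uint8)
    for m in range(c):
        for j in range(t):
            if W[m, j] >= 0:
                H[m * b:(m + 1) * b, j * b:(j + 1) * b] = np.roll(
                    np.eye(b, dtype=np.uint8), int(W[m, j]), axis=1)
    return H

def basic_shuffle(H, b):
    """Layer i (0-indexed) collects rows i, i+b, ..., i+(c-1)b of H."""
    c = H.shape[0] // b
    return H[[m * b + i for i in range(b) for m in range(c)], :]

def circulate_left(y, b):
    """Cyclically shift each length-b section of the APP vector one position to the left."""
    return np.roll(np.asarray(y).reshape(-1, b), -1, axis=1).reshape(-1)

W_toy = [[0, 1, -1],
         [2, -1, 3]]
b, c = 4, 2
H_sh = basic_shuffle(expand_base_matrix(W_toy, b), b)
first_layer = H_sh[:c]                       # the only connections kept in "hardware"
Y = np.arange(H_sh.shape[1], dtype=float)    # stand-in APP values
Y_mem = Y.copy()                             # APP memory that gets circulated

matches = []
for i in range(b):
    layer_i = H_sh[i * c:(i + 1) * c]
    for row_i, row_0 in zip(layer_i, first_layer):
        # Values a CN of layer i needs vs. values read through first-layer wiring.
        matches.append(np.array_equal(Y[row_i == 1], Y_mem[row_0 == 1]))
    Y_mem = circulate_left(Y_mem, b)         # prepare the memory for the next layer

print("first-layer wiring suffices:", all(matches))
```

After b circulations the memory is back in its original order, so the same wiring serves every layer of every iteration in this model.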

Fig. 7 shows the decoding flowchart for the algorithm outlined in Sect. 3.2. As shown, the last block in the decoding loop is a “circulate” operation performed on the updated APP values in order to prepare them for the next sub-iteration. The other blocks in the decoding loop are responsible for performing operations (2)–(4).

Fig. 8

Xilinx VC709 evaluation board with a Virtex-7 FPGA

Fig. 9

BER performance for the example QC-LDPC codes

Fig. 10

Average number of sub-iterations required for achieving the BER performance of Fig. 9

Fig. 11

Average throughput for the example QC-LDPC codes

Implementation and experimental results

We implemented both LD with shuffled PCM and LD with non-shuffled PCM for the example codes of the IEEE 802.16e and IEEE 802.15.3c standards. The hardware used was a Xilinx VC709 evaluation board, shown in Fig. 8, which carries a Virtex-7 XC7VX690T-2FFG1761C FPGA. The implementation used 6-bit quantized messages, and the synthesis tool was Vivado 2018.3.

Table 1 Implementation results of the example codes on a Xilinx Virtex-7 FPGA

The results in terms of utilized LUTs, on-chip power and maximum clock frequency are shown in Table 1. The first two parameters are reported directly by the synthesis tool, and the maximum clock frequency is estimated from the worst negative slack, which is also reported by the synthesis tool. As can be seen from the figures in the table, the design with shuffled PCM is considerably smaller, and the consumed power is notably lower. For instance, the number of occupied LUTs drops from 262122 to only 15299 for the (672,336) code, equivalent to a \((1 - \frac{15299}{262122}) \times 100 \cong 94\%\) reduction in the number of LUTs on the FPGA. Similarly, a \((1 - \frac{0.84}{6.693}) \times 100 \cong 87\%\) reduction in consumed power is obtained. In summary, the superiority of the shuffling method in terms of hardware area and consumed power is apparent from the implementation results. Note that the design of the non-shuffled IEEE 802.16e code is too big to fit in the FPGA, and hence its results are not available.

Figure 9 depicts the simulated performance of the three IEEE 802.15.3c codes, showing that shuffling does not degrade the BER performance. Indeed, LD with (modified) shuffled PCM performs as well as LD with non-shuffled PCM. Furthermore, the average number of iterations needed to achieve a specific BER performance, depicted in Fig. 10, is nearly the same in the two cases, further confirming the similar performance of the two modes. This is because shuffling only changes the order of the rows of the PCM, while the overall characteristics of the PCM remain intact. In particular, the determining attributes of the PCM, such as the distance property, the cycle distribution and the row and column weights, do not undergo any change.

Comparing the two cases in terms of throughput is also of interest. The average throughput for the different codes is plotted in Fig. 11. Given that \(f_{\mathrm{clk}}\) is the clock frequency specified in Table 1, \(N_{{\mathrm{ave}}\_{\mathrm{ite}}}\) is the average number of sub-iterations and \(N_{\mathrm{clk}}\) is the number of clock cycles each sub-iteration needs, the average duration for decoding a sequence is \(\frac{N_{{\mathrm{ave}}\_{\mathrm{ite}}} . N_{\mathrm{clk}}}{f_{\mathrm{clk}}}\), which leads to an average throughput of

$$\begin{aligned} \tau = \frac{k . f_{\mathrm{clk}}}{N_{{\mathrm{ave}}\_{\mathrm{ite}}} . N_{\mathrm{clk}}}. \end{aligned}$$

In our implementation, \(N_{\mathrm{clk}} = 8\) and 7 for the shuffled and non-shuffled PCM, respectively, noting that the one extra clock cycle in the former case is needed for the cyclic shifting of the computed APP values. The average throughput in the two cases overlaps for low values of \(E_b/N_0\), indicating that the additional number of sub-iterations is fully compensated by the higher clock frequency. This is, however, no longer true as \(E_b/N_0\) grows. If \(\mathrm{BER} = 1e\)-6 is chosen as the target BER performance, the throughput degradation is 0.1, 0.3, and 1.3 Gbps for the three codes, respectively. The degradation in throughput stems from the fact that the layering is different in the two cases. With the non-shuffled PCM, the J rows are divided into c layers of b rows each, while with the shuffled PCM, they are divided into b layers of c rows each. Since c is usually much smaller than b, a larger number of VNs is processed per sub-iteration in the first case, and hence fewer sub-iterations are needed in total.
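The throughput expression can be evaluated with a few lines of code. The function name `average_throughput` and the numerical inputs below are hypothetical placeholders chosen for easy arithmetic; they are not values from Table 1 or Fig. 10.

```python
def average_throughput(k, f_clk_hz, n_ave_sub_iterations, n_clk):
    """Average throughput in bit/s: k information bits per decoded sequence,
    divided by the average decoding time N_ave_ite * N_clk / f_clk."""
    return k * f_clk_hz / (n_ave_sub_iterations * n_clk)

# Hypothetical inputs: k = 336 information bits, a 100 MHz clock,
# 42 sub-iterations on average and 8 clock cycles per sub-iteration.
tau = average_throughput(k=336, f_clk_hz=100e6, n_ave_sub_iterations=42, n_clk=8)
print(f"{tau / 1e6:.1f} Mbit/s")
```

With these placeholder numbers, one sequence takes 42 × 8 = 336 clock cycles, so the decoder delivers one information bit per clock cycle on average, i.e., 100 Mbit/s.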


The novel shuffling method proposed in this paper is basically a swapping of the rows of the PCM of a QC-LDPC code with two objectives in mind. First, the columns in each layer of the shuffled PCM must remain single weight or zero weight. Second, each layer must be producible from the upper layer by a one-symbol circular shifting. Though simple, this shuffling brings about considerable complexity reduction in the decoding implementation, while preserving the error-correcting capability of the code and its decoding throughput for BER values of up to 1e−6.

Availability of data and materials

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.



  • APP: A Posteriori Probability
  • BER: Bit Error Rate
  • BP: Belief Propagation
  • CN: Check Node
  • CNPU: CN Processing Unit
  • CPM: Circulant Permutation Matrix
  • FEC: Forward Error Correction
  • FPGA: Field Programmable Gate Array
  • LD: Layered Decoding
  • LDPC: Low-Density Parity-Check
  • LUT: Look-Up Table
  • PCM: Parity-Check Matrix
  • QC-LDPC: Quasi-Cyclic LDPC
  • VN: Variable Node
  • VNPU: VN Processing Unit
  • 5G: Fifth Generation


  1. F.R. Kschischang, B.J. Frey, Iterative decoding of compound codes by probability propagation in graphical models. IEEE J. Sel. Areas Commun. 16(2), 219–230 (1998)

  2. L. Liu, C.-J.R. Shi, Sliced message passing: High throughput overlapped decoding of high-rate low-density parity-check codes. IEEE Trans. Circuits Syst. I Regul. Pap. 55(11), 3697–3710 (2008)

  3. R.G. Maunder, A fully-parallel turbo decoding algorithm. IEEE Trans. Commun. 63(8), 2762–2775 (2015)

  4. Y. Sun, J.R. Cavallaro, Efficient hardware implementation of a highly-parallel 3GPP LTE/LTE-advance turbo decoder. Integration 44(4), 305–315 (2011)

  5. S.K. Chronopoulos, V. Christofilakis, G. Tatsis, P. Kostarakis, Preliminary BER study of a TC-OFDM system operating under noisy conditions. J. Eng. Sci. Technol. Rev. 9(4), 13–16 (2016)

  6. S.K. Chronopoulos, V. Christofilakis, G. Tatsis, P. Kostarakis, Performance of turbo coded OFDM under the presence of various noise types. Wirel. Pers. Commun. 87(4), 1319–1336 (2016)

  7. M.M. Mansour, N.R. Shanbhag. Turbo decoder architectures for low-density parity-check codes, in Global Telecommunications Conference. GLOBECOM'02. IEEE, vol. 2 (IEEE, 2002), pp. 1383–1388

  8. M.M. Mansour, N.R. Shanbhag, High-throughput LDPC decoders. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 11(6), 976–996 (2003)

  9. H. Sankar, K.R. Narayanan, Memory-efficient sum-product decoding of LDPC codes. IEEE Trans. Commun. 52(8), 1225–1230 (2004)

  10. D.E. Hocevar. A reduced complexity decoder architecture via layered decoding of LDPC codes, in IEEE Workshop on Signal Processing Systems, 2004. SIPS 2004 (IEEE, 2004), pp. 107–112

  11. M.M. Mansour, N.R. Shanbhag, A 640-Mb/s 2048-bit programmable LDPC decoder chip. IEEE J. Solid-State Circuits 41(3), 684–698 (2006)

  12. K.K. Gunnam, G.S. Choi, W. Wang, E. Kim, M.B. Yeary. Decoding of quasi-cyclic LDPC codes using an on-the-fly computation, in 2006 Fortieth Asilomar Conference on Signals, Systems and Computers (IEEE, 2006), pp. 1192–1199

  13. K. Gunnam, G. Choi, W. Wang, M. Yeary. Multi-rate layered decoder architecture for block LDPC codes of the IEEE 802.11n wireless standard, in 2007 IEEE International Symposium on Circuits and Systems (IEEE, 2007), pp. 1645–1648

  14. M. Rovini, F. Rossi, P. Ciao, N. L'Insalata, L. Fanucci. Layered decoding of non-layered LDPC codes, in 9th EUROMICRO Conference on Digital System Design (DSD'06) (IEEE, 2006), pp. 537–544

  15. Y.-L. Ueng, C.-C. Cheng. A fast-convergence decoding method and memory-efficient VLSI decoder architecture for irregular LDPC codes in the IEEE 802.16e standards, in 2007 IEEE 66th Vehicular Technology Conference (IEEE, 2007), pp. 1255–1259

  16. P. Radosavljevic, A. de Baynast, J.R. Cavallaro. Optimized message passing schedules for LDPC decoding, in Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers, 2005 (IEEE, 2005), pp. 591–595

  17. D. Yang, G. Yu, X. Zou, Y. Deng, J. Zhong. The design and verification of a novel LDPC decoder with high-efficiency, in 2014 International Symposium on Integrated Circuits (ISIC) (IEEE, 2014), pp. 256–259

  18. A. de Baynast, P. Radosavljevic, A. Sabharwal, J.R. Cavallaro. On turbo-schedules for LDPC decoding. IEEE Commun. Lett. (2006)

  19. P. Radosavljevic, M. Karkooti, A. de Baynast, J.R. Cavallaro, Tradeoff analysis and architecture design of high throughput irregular LDPC decoders. IEEE Trans. Circuits Syst. I: Regul. Pap. 1(1), 1 (2006)

  20. T. Brack, M. Alles, F. Kienle, N. Wehn. A synthesizable IP core for WIMAX 802.16e LDPC code decoding, in 2006 IEEE 17th International Symposium on Personal, Indoor and Mobile Radio Communications (IEEE, 2006), pp. 1–5

  21. G. Gentile, M. Rovini, L. Fanucci. Low-complexity architectures of a decoder for IEEE 802.16e LDPC codes, in 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD 2007) (IEEE, 2007), pp. 369–375

  22. K.K. Gunnam, G.S. Choi, M.B. Yeary, M. Atiquzzaman. VLSI architectures for layered decoding for irregular LDPC codes of WIMAX, in 2007 IEEE International Conference on Communications (IEEE, 2007), pp. 4542–4547

  23. K. Zhang, X. Huang, Z. Wang, High-throughput layered decoder implementation for quasi-cyclic LDPC codes. IEEE J. Sel. Areas Commun. 27(6), 985–994 (2009)

  24. J. Goldberger, H. Kfir, Serial schedules for belief-propagation: analysis of convergence time. IEEE Trans. Inf. Theory 54(3), 1316–1319 (2008)

  25. Y. Cui, X. Peng, Z. Chen, X. Zhao, Y. Lu, D. Zhou, S. Goto. Ultra low power QC-LDPC decoder with high parallelism, in 2011 IEEE International SOC Conference (IEEE, 2011), pp. 142–145

  26. J. Zhang, M.P. Fossorier, Shuffled iterative decoding. IEEE Trans. Commun. 53(2), 209–213 (2005)

  27. Y.-L. Ueng, C.-J. Yang, C.-J. Chen. A shuffled message-passing decoding method for memory-based LDPC decoders, in 2009 IEEE International Symposium on Circuits and Systems (IEEE, 2009), pp. 892–895

  28. J. Zhang, M. Fossorier. Shuffled belief propagation decoding, in Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, 2002, vol. 1 (IEEE, 2002), pp. 8–15

  29. J. Zhang, Y. Wang, M.P. Fossorier, J.S. Yedidia, Iterative decoding with replicas. IEEE Trans. Inf. Theory 53(5), 1644–1663 (2007)

  30. Z. Cui, Z. Wang, X. Zhang, Q. Jia. Efficient decoder design for high-throughput LDPC decoding, in APCCAS 2008–2008 IEEE Asia Pacific Conference on Circuits and Systems (IEEE, 2008), pp. 1640–1643

  31. Y.-L. Ueng, C.-J. Yang, K.-C. Wang, C.-J. Chen, A multimode shuffled iterative decoder architecture for high-rate RS-LDPC codes. IEEE Trans. Circuits Syst. I Regul. Pap. 57(10), 2790–2803 (2010)

  32. F. Guilloud, E. Boutillon, J. Tousch, J.-L. Danger, Generic description and synthesis of LDPC decoders. IEEE Trans. Commun. 55(11), 2084–2091 (2007)

  33. Y.-L. Ueng, B.-J. Yang, C.-J. Yang, H.-C. Lee, J.-D. Yang, An efficient multi-standard LDPC decoder design using hardware-friendly shuffled decoding. IEEE Trans. Circuits Syst. I Regul. Pap. 60(3), 743–756 (2013)

  34. Z. Wang, Z. Cui, Low-complexity high-speed decoder design for quasi-cyclic LDPC codes. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 15(1), 104–114 (2007)

  35. M.P. Fossorier, Quasicyclic low-density parity-check codes from circulant permutation matrices. IEEE Trans. Inf. Theory 50(8), 1788–1793 (2004)

  36. A. Hasani, L. Lopacinski, S. Büchner, J. Nolte, R. Kraemer. A modified shuffling method to split the critical path delay in layered decoding of QC-LDPC codes, in 2019 IEEE 30th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC) (IEEE, 2019), pp. 1–6

  37. Z. Li, L. Chen, L. Zeng, S. Lin, W.H. Fong, Efficient encoding of quasi-cyclic low-density parity-check codes. IEEE Trans. Commun. 54(1), 71–81 (2006)

  38. J. Chen, M.P. Fossorier, Near optimum universal belief propagation based decoding of low-density parity check codes. IEEE Trans. Commun. 50(3), 406–414 (2002)

  39. S.-W. Yen, S.-Y. Hung, C.-L. Chen, H.-C. Chang, S.-J. Jou, C.-Y. Lee, A 5.79-Gb/s energy-efficient multirate LDPC codec chip for IEEE 802.15.3c applications. IEEE J. Solid-State Circuits 47(9), 2246–2257 (2012)


This work was supported by the German Research Foundation (DFG) and conducted at IHP-microelectronics GmbH. The authors are also thankful for the support of Brandenburg University of Technology (BTU) Cottbus-Senftenberg, Prof. Dr. Jörg Nolte and Dr. Steffen Büchner.


Open Access funding enabled and organized by Projekt DEAL. This work was supported by the German Research Foundation (DFG) project PSSS-FEC, project no. 442607813.

Author information




AH proposed and developed the new idea of the paper and drafted it. LL and RK have substantially revised it. All authors approved the submitted version. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Alireza Hasani.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Hasani, A., Lopacinski, L. & Kraemer, R. Reduced-complexity decoding implementation of QC-LDPC codes with modified shuffling. J Wireless Com Network 2021, 183 (2021).


  • Quasi-cyclic low-density parity-check code
  • Layered decoding
  • Decoding complexity