Reduced-complexity decoding implementation of QC-LDPC codes with modified shuffling
EURASIP Journal on Wireless Communications and Networking volume 2021, Article number: 183 (2021)
Abstract
Layered decoding (LD) facilitates a partially parallel architecture for performing the belief propagation (BP) algorithm for decoding low-density parity-check (LDPC) codes. Such a schedule for LDPC codes has, in general, reduced implementation complexity compared to a fully parallel architecture and a higher convergence rate compared to both serial and parallel architectures, regardless of the codeword length or code rate. In this paper, we introduce a modified shuffling method which shuffles the rows of the parity-check matrix (PCM) of a quasi-cyclic LDPC (QC-LDPC) code, yielding a PCM in which each layer can be produced by cyclically shifting the layer above it one symbol to the right. The proposed shuffling scheme additionally guarantees that the columns of each layer of the shuffled PCM are either zero weight or single weight. This condition plays a key role in further decreasing LD complexity. We show that, due to these two properties, the number of occupied look-up tables (LUTs) on a field programmable gate array (FPGA) is reduced by about 93% and the consumed on-chip power by nearly 80%, while the bit error rate (BER) performance is maintained. The only drawback of the shuffling is a degradation of decoding throughput, which is negligible for low values of \(E_b/N_0\), down to a BER of 1e−6.
Introduction
Forward error correction (FEC) methods are a vital element of next-generation wireless networks, tasked with providing the required level of reliability. Nevertheless, powerful capacity-achieving FEC techniques like low-density parity-check (LDPC), turbo, or polar codes have the downside of higher complexity and power consumption compared with traditional coding techniques. Among these codes, LDPC codes have been incorporated into several earlier technologies and are seen as a potential candidate for new standards like Fifth Generation (5G) and IEEE 802.11ax.
The belief propagation (BP) algorithm is commonly used as the decoding method for LDPC codes, as the parity-check matrix (PCM) of these codes is sparse and their Tanner graph representation lacks short cycles of length 4. One important issue with regard to BP decoding is the schedule of the algorithm, i.e., the order in which the reliability messages are exchanged between the nodes of the Tanner graph. The BP schedule is directly associated with the implementation architecture of the decoding method, and it falls into three main categories:

1. Fully parallel architecture, which is realized by the flood schedule [1], in which all the variable nodes (VNs) and check nodes (CNs) in the Tanner graph pass messages concurrently to their neighbors in every iteration of the algorithm. Although it yields a high throughput, this schedule requires a large silicon area with high interconnect complexity [2]. This architecture is facilitated by the inherently parallelizable nature of the BP algorithm, in contrast to turbo codes, whose decoding algorithms are inherently serial. Nevertheless, several works have devised fully parallel architectures for decoding turbo codes [3,4,5,6].

2. Serial architecture, in which a smaller number of functional units are reused several times to perform each decoder iteration. In this way, decoding complexity is lowered, although at the price of reduced decoding throughput.

3. Partially parallel architecture, which is a good trade-off between hardware complexity and decoding throughput and is best accomplished by the layered decoding (LD) schedule [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33].
In the LD schedule, the rows of the PCM are divided into a number of layers, and each iteration of the BP algorithm is likewise split into the same number of sub-iterations. Each sub-iteration runs over one layer of the PCM, during which the CNs of that layer exchange reliability messages with their neighboring VNs. At the end of a sub-iteration, the updated reliability messages are delivered to the next layer. Accordingly, in each sub-iteration, only a subset of the CNs, i.e., as many as the number of rows in each layer, participate in the decoding process, so LD needs fewer hardware resources than the flood schedule. Furthermore, the LD schedule achieves better convergence than the flood schedule because the latest variable-to-check (VTC) messages are always used to update the check-to-variable (CTV) messages during a sub-iteration.
For the sake of LD complexity, it is highly desirable that the number of ones in each column of a layer be either one or zero. Quasi-cyclic LDPC (QC-LDPC) codes inherently have a layered structure with this property, making them an appropriate candidate for LD. QC-LDPC codes are a special type of LDPC codes possessing a cyclic property that simplifies their encoding and decoding while preserving performance comparable to random (or unstructured) LDPC codes [34, 35].
Related work and contributions
The shuffling idea proposed in [15] shuffles the rows of the PCM of a QC-LDPC code prior to decoding, in the sense that the order of the rows of the PCM is completely changed. After shuffling, each layer can be produced by cyclically shifting the layer above it one symbol to the right, leading to a simplified LD and a sped-up convergence rate. In particular, due to the cyclic property, it is enough to realize only the first layer of a PCM in hardware rather than the whole PCM. The downside of this shuffling is that it may spoil the primary property of single-weight columns in the PCM.
To work around this shortcoming, we outlined a modified shuffling idea in our previous work [36], which results in a shuffled PCM that retains the desired property of single-weight columns and possesses the cyclic property too. This was accomplished by introducing a set of offset values prior to performing the shuffling. In this paper, this modified shuffling idea is investigated further. To be specific:

1. The logic behind the offset values applied for shuffling is clarified, explaining how the offset values take effect. The procedure for determining the offset values is also outlined.

2. Since [36] lacks implementation results to verify the improvements promised by the modified shuffling method, we provide in this work the implementation results for LD of several QC-LDPC codes when shuffled with the proposed technique. Improvements in terms of the number of occupied look-up tables (LUTs) on a field programmable gate array (FPGA) and of power consumption are observed when compared with non-shuffled LD. These improvements are achieved without sacrificing bit error rate (BER) performance. Although the decoding throughput deteriorates as \(E_b/N_0\) rises, our analysis shows that if a BER of 1e−6 is chosen as the target, the throughput degradation is insubstantial.
The organization of the paper is as follows. Section 3 presents the necessary fundamentals of QC-LDPC codes and LD. Section 4 is devoted to the assessment of the novel shuffling method and its attributes. Implementation and simulation results, together with the necessary analysis, are given in Sect. 5. Final conclusions are drawn in Sect. 6.
Preliminaries
QC-LDPC codes
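The PCM of a QC-LDPC code is composed of circulant permutation matrices (CPMs) and zero matrices, where a CPM is a cyclically shifted identity matrix. Such a PCM can be represented as
$$\begin{aligned} {\varvec{H}}_{qc} = \begin{bmatrix} {\varvec{A}}_{1,1} & {\varvec{A}}_{1,2} & \cdots & {\varvec{A}}_{1,t} \\ \vdots & \vdots & \ddots & \vdots \\ {\varvec{A}}_{c,1} & {\varvec{A}}_{c,2} & \cdots & {\varvec{A}}_{c,t} \end{bmatrix} \end{aligned}$$(1)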
in which \(c\le t\) and each \({\varvec{A}}_{i,j}\) is either a \(b\times b\) CPM or a \(b\times b\) zero matrix. A codeword \({\varvec{v}}\) has length \(tb\) and comprises t sections \({\varvec{v}}=(\varvec{v_1},\varvec{v_2},\cdots ,\varvec{v_t} )\), with each section \(\varvec{v_i},1\le i\le t\) of length b. Codewords of a QC-LDPC code have a sectionized cyclic structure, in the sense that cyclically shifting each of the t sections of a codeword by the same number of positions yields another valid codeword [37]. A compact way of representing the PCM of a QC-LDPC code is the base matrix, denoted by \(\varvec{{\mathcal {W}}}\). In \(\varvec{{\mathcal {W}}}\), non-negative integers specify the shift value, with respect to an identity matrix, of the corresponding CPM, and the other entries, usually chosen to be −1, represent zero matrices in the PCM. Fig. 1 shows the base matrices of the QC-LDPC codes used in the IEEE 802.15.3c standard, and Fig. 2 shows the base matrix of the rate-1/2 (2304,1152) QC-LDPC code used in IEEE 802.16e. In these two figures, empty places are the locations of zero matrices.
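The relation between a base matrix and its PCM can be illustrated with a minimal Python sketch. The function names `cpm` and `expand_base_matrix`, the toy base matrix `W`, and the convention that a shift value moves the ones of the identity matrix to the right are assumptions made for illustration; they are not taken from the standards discussed here.

```python
def cpm(b, shift):
    """b x b identity matrix whose ones are cyclically moved `shift` places
    to the right (assumed shift direction)."""
    return [[1 if col == (row + shift) % b else 0 for col in range(b)]
            for row in range(b)]


def expand_base_matrix(base, b):
    """Expand a c x t base matrix into the (c*b) x (t*b) binary PCM.
    Entries >= 0 are CPM shift values; -1 marks a b x b zero matrix."""
    H = []
    for base_row in base:
        blocks = [cpm(b, w) if w >= 0 else [[0] * b for _ in range(b)]
                  for w in base_row]
        for r in range(b):
            # Row r of the block-row is the concatenation of row r of every block.
            H.append([bit for blk in blocks for bit in blk[r]])
    return H


# Hypothetical 2 x 3 base matrix with b = 3 (not one of the standard codes).
W = [[0, 1, -1],
     [2, -1, 0]]
H = expand_base_matrix(W, 3)
for row in H:
    print("".join(map(str, row)))
```

Each block-row of the resulting matrix has columns of weight one or zero, which is the property exploited by LD.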
The Tanner graph representation of a PCM is an important means of understanding the BP algorithm. It consists of two sets of nodes, where one set represents CNs, i.e., parity-check sums (equivalent to rows of \({\varvec{H}}\)), and the other set represents VNs (equivalent to columns of \({\varvec{H}}\)). A CN in a Tanner graph is connected to a VN if and only if the corresponding element of \({\varvec{H}}\) is one.
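As a small illustration, the Tanner graph can be stored as two adjacency lists built directly from the PCM. The sketch below also tests for cycles of length 4, which arise exactly when two CNs share two or more VNs. The function names and the toy matrices are hypothetical.

```python
def tanner_graph(H):
    """Return (cn_neighbors, vn_neighbors) for a binary PCM given as a list of rows."""
    cn_neighbors = [[l for l, bit in enumerate(row) if bit] for row in H]
    vn_neighbors = [[j for j, row in enumerate(H) if row[l]]
                    for l in range(len(H[0]))]
    return cn_neighbors, vn_neighbors


def has_length4_cycle(cn_neighbors):
    """Two CNs that share at least two VNs close a cycle of length 4."""
    for a in range(len(cn_neighbors)):
        for c in range(a + 1, len(cn_neighbors)):
            if len(set(cn_neighbors[a]) & set(cn_neighbors[c])) >= 2:
                return True
    return False


# Toy matrices chosen for illustration only.
H_good = [[1, 1, 0, 0],
          [0, 1, 1, 0],
          [0, 0, 1, 1]]
H_bad = [[1, 1, 0],
         [1, 1, 1]]

cn_good, vn_good = tanner_graph(H_good)
cn_bad, _ = tanner_graph(H_bad)
print(cn_good, vn_good)
print(has_length4_cycle(cn_good), has_length4_cycle(cn_bad))
```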
LD schedule
One schedule that facilitates a partially parallel architecture for executing the BP algorithm is LD. In this schedule, each iteration is split into several sub-iterations, each corresponding to a layer of the PCM. In each sub-iteration, the CNs of the corresponding layer exchange reliability messages with the VNs of that layer, and at the end, the updated reliability messages are provided to the next layer. Accordingly, fewer processing units for CNs and VNs are realized in hardware, and they are reused in successive sub-iterations for successive layers.
Let \({\varvec{y}} = (y_0 , \ldots , y_{n - 1})\) be the soft-decision sequence at the output of the channel that is to be decoded. The J rows of the PCM \({\varvec{H}}_{qc}\) are divided into L layers, each containing E consecutive rows. Hence, the ith layer \({\varvec{H}}^{(i)}_{qc} , 1 \le i \le L\) contains rows \((i - 1)E + 1 , \ldots , iE\) of \({\varvec{H}}_{qc}\). The ith sub-iteration of the LD algorithm, associated with the ith layer of the PCM, is split into three steps:

1. Vertical step: VTC messages are updated as
$$\begin{aligned} Z^{(i)}_{j , l} = Y_l - L^{(i)}_{j,l} , \quad 1 \le j \le E , \; 0 \le l < n, \end{aligned}$$(2)
where \(Y_l , \, 0 \le l < n\) are the A Posteriori Probability (APP) values of the VNs, initially set to \({\varvec{y}}\), and \(L^{(i)}_{j,l} , 1 \le j \le E , 0 \le l < n , 1 \le i \le L\) are the CTV messages corresponding to the ith layer, initially set to zero for all the layers.

2. Horizontal step: CTV messages are updated as
$$\begin{aligned} L^{(i)}_{j,l} = \prod _{l' \in {\mathcal {B}} ({\varvec{h}}^{(i)}_j) \setminus l} sgn(Z^{(i)}_{j , l'}) \times \min _{l' \in {\mathcal {B}} ({\varvec{h}}^{(i)}_j) \setminus l} \left| Z^{(i)}_{j , l'} \right| , \end{aligned}$$(3)
where
$$\begin{aligned} {\mathcal {B}} ({\varvec{h}}^{(i)}_j) = \{ l : h^{(i)}_{j,l} = 1, 0 \le l < n\} \end{aligned}$$
denotes the set of VNs neighboring the CN \({\varvec{h}}^{(i)}_j\).

3. Hard decision and early termination criterion: APP values are updated according to
$$\begin{aligned} Y_l = y_l + \sum _{j \in {\mathcal {A}}_l^{(i)}} L^{(i)}_{j,l} , \quad 0 \le l < n, \end{aligned}$$(4)
with
$$\begin{aligned} {\mathcal {A}}_l^{(i)} = \{j : h_{j , l}^{(i)} = 1 , 1 \le j \le E \} \end{aligned}$$
which, analogous to \({\mathcal {B}} ({\varvec{h}}^{(i)}_j)\), represents the set of CNs in layer i connected to \(v_l\).
In (3), the min-sum scheme [38] has been used for computing the CTV messages, wherein sgn(x) is the sign function, which is equal to 1 when \(x \ge 0\) and −1 otherwise.
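The horizontal step for a single CN can be sketched in a few lines of Python. This is a minimal illustration rather than the hardware implementation: the function name `min_sum_ctv` is hypothetical, the input is assumed to hold the VTC messages of all neighbors of one CN (at least two), and the minimum is taken over magnitudes, as is usual for the min-sum scheme.

```python
def sgn(x):
    """Sign function as defined in the text: +1 for x >= 0, -1 otherwise."""
    return 1 if x >= 0 else -1


def min_sum_ctv(z):
    """Min-sum CTV messages of one CN.

    z holds the VTC messages from the neighbors of the CN. The message
    returned to neighbor k excludes that neighbor's own input.
    """
    out = []
    for k in range(len(z)):
        others = z[:k] + z[k + 1:]
        sign = 1
        for v in others:
            sign *= sgn(v)
        out.append(sign * min(abs(v) for v in others))
    return out


print(min_sum_ctv([2.0, -1.0, 3.0]))  # prints [-1.0, 2.0, -1.0]
```

A hardware CNPU would typically track only the two smallest magnitudes and the overall sign instead of recomputing the minimum for every neighbor; the loop above favors readability.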
At the end of each sub-iteration, some codeword bits are estimated based on the updated APP values. In particular, \({\hat{v}}_l = 1\) if \(Y_l > 0\) and \({\hat{v}}_l = 0\) otherwise. If the check-sum condition \({\varvec{H}}_{qc} \hat{{\varvec{v}}}^T = 0\) is satisfied by the estimated codeword \(\hat{{\varvec{v}}} = ({\hat{v}}_0 , \ldots , {\hat{v}}_{n - 1})\), it is declared a valid codeword and the algorithm terminates. Otherwise, the algorithm continues with the vertical step of the next layer. The maximum number of iterations is, however, limited by a threshold \(I_{\mathrm{max}}\). The algorithm declares a failure if decoding has not converged to a valid codeword within \(I_{\mathrm{max}}\) iterations.
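The hard decision and the early termination test can be sketched as follows. The sign convention follows the text (a positive APP value maps to bit 1); the function names and the toy PCM are assumptions made for illustration.

```python
def hard_decision(Y):
    """Estimate bits from APP values: 1 if Y_l > 0, 0 otherwise."""
    return [1 if y > 0 else 0 for y in Y]


def syndrome_is_zero(H, v):
    """True when every parity check (row of H) is satisfied modulo 2."""
    return all(sum(h * x for h, x in zip(row, v)) % 2 == 0 for row in H)


# Toy PCM, for illustration only.
H_toy = [[1, 1, 0],
         [0, 1, 1]]

v_ok = hard_decision([0.7, 1.2, 0.4])    # -> [1, 1, 1]
v_bad = hard_decision([0.7, -1.2, 0.4])  # -> [1, 0, 1]
print(syndrome_is_zero(H_toy, v_ok), syndrome_is_zero(H_toy, v_bad))
```

A decoder would run this test after each sub-iteration and stop as soon as it succeeds or the iteration limit is reached.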
Modified shuffling of QC-LDPC codes
Modified shuffling
The LD schedule is well suited to decoding QC-LDPC codes, since each block-row \({\varvec{A}}_i , \, 1 \le i \le c\) in (1) can be regarded as a layer. Accordingly, the columns in each layer are either single weight or zero weight. This property is useful for simplifying the decoding implementation. First and foremost, the summation in (4) is simplified, as there is only a single CTV message to be added to \(y_l\). Moreover, the implementation of the operations in (2) and (3) is also simplified.
Shuffling is the act of reordering the rows of the PCM in such a manner that the complexity of the decoding algorithm is reduced while the error correction performance is preserved. The method of [15] partitions the \(J=cb\) rows of \({\varvec{H}}_{qc}\) among matrices \({\varvec{H}}^{sh , (i)}_{qc}, \, i=1,\ldots ,b\), each containing the rows \(i,i+b,i+2b,\ldots ,i+(c-1)b\) of \({\varvec{H}}_{qc}\). Accordingly, the shuffled PCM, \({\varvec{H}}^{sh}_{qc}\), has the matrices \({\varvec{H}}^{sh , (i)}_{qc}, \, i=1,\ldots ,b\) as its row-blocks:
$$\begin{aligned} {\varvec{H}}^{sh}_{qc} = \begin{bmatrix} {\varvec{H}}^{sh , (1)}_{qc} \\ {\varvec{H}}^{sh , (2)}_{qc} \\ \vdots \\ {\varvec{H}}^{sh , (b)}_{qc} \end{bmatrix} \end{aligned}$$(5)
An example of such a shuffling is depicted in Fig. 3, which has been performed on a (12, 4) QC-LDPC code. In \({\varvec{H}}^{sh}_{qc},\) each layer \({\varvec{H}}^{sh , (i)}_{qc}\) is obtained by cyclically shifting the layer above it, \({\varvec{H}}^{sh , (i - 1)}_{qc}\), one symbol to the right, noting that the circulation must be performed separately on the t individual sections of a layer. Due to this cyclic property provided by shuffling, it is no longer necessary to implement the whole PCM in the targeted hardware (for example, an FPGA) and to define all of its “1” entries as connections. Instead, it suffices to implement only \({\varvec{H}}^{sh , (1)}_{qc}\), the first layer of \({\varvec{H}}^{sh}_{qc}\), and then perform a circulation on the memory blocks containing the updated APP values at the end of each sub-iteration. This implementation benefit is elaborated further in Sect. 4.2.
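The basic shuffling and its cyclic property can be checked numerically with a short Python sketch. The helper names, the toy base matrix `W`, and the right-shift convention for CPMs are assumptions made for illustration; the code does not reproduce the code of Fig. 3.

```python
def cpm_row(b, shift, r):
    """Row r of a b x b CPM whose ones are shifted `shift` places to the right."""
    return [1 if col == (r + shift) % b else 0 for col in range(b)]


def expand(base, b):
    """Binary PCM of a base matrix; -1 entries become b x b zero blocks."""
    return [[bit for w in base_row
             for bit in (cpm_row(b, w, r) if w >= 0 else [0] * b)]
            for base_row in base for r in range(b)]


def shuffle_rows(H, b, c):
    """Layer i (0-based) collects rows i, i+b, ..., i+(c-1)b of H."""
    return [[H[m * b + i] for m in range(c)] for i in range(b)]


def shift_sections_right(row, b):
    """Cyclically shift every length-b section of a row one place to the right."""
    out = []
    for s in range(0, len(row), b):
        sec = row[s:s + b]
        out.extend(sec[-1:] + sec[:-1])
    return out


# Hypothetical base matrix with c = 2 block-rows and b = 3.
W = [[0, 1, -1],
     [2, -1, 0]]
b, c = 3, 2
layers = shuffle_rows(expand(W, b), b, c)

# Each layer should equal the previous one shifted section-wise by one symbol
# (index -1 wraps around, so the first layer is compared with the last).
is_cyclic = all(
    layers[i] == [shift_sections_right(r, b) for r in layers[i - 1]]
    for i in range(b)
)
print(is_cyclic)  # prints True
```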
The shortcoming of this shuffling method is that the layers of the shuffled matrix may no longer consist of single-weight columns. In particular, if the base matrix of a QC-LDPC code has repeated numbers in a column, the corresponding block-column in the shuffled matrix will have columns of weight greater than one in each layer. For instance, in Figs. 1 and 2, the shaded columns have repeated numbers. Consequently, after shuffling, they bring about columns of weight 2 or 3 in each layer of their shuffled matrix.
As a solution to this weakness of shuffling, we propose to employ a set of integer values \(0 \le o_m < b, \, m = 1 , \ldots , c\) serving as offsets that modify which rows of \({\varvec{H}}_{qc}\) are put in the same layer of \({\varvec{H}}_{qc}^{sh}\). Accordingly, the ith layer of \({\varvec{H}}_{qc}^{sh}\) is made up of rows \(\{i + o_1 , b + i + o_2 , \ldots , (c - 1)b + i + o_c\}\) of \({\varvec{H}}_{qc}\) for \(1 \le i \le b\). The offsets \(o_m\) are carefully selected so that no column in any layer of \({\varvec{H}}_{qc}^{sh}\) has weight greater than one. The offset values can be regarded as a means of eliminating repeated values in a column. In other words, the modified shuffling technique with offset values is equivalent to the basic shuffling method performed on an \({\varvec{H}}_{qc}\) whose base matrix, because of the offset values, no longer contains repeated integers in a column. A possible offset set for each of our example codes, together with the resulting modified base matrices, is shown in Figs. 4 and 5. As observed, with the offset values introduced in these examples, the shaded columns no longer have repeated integers.
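The effect of the offsets on the column weights can be reproduced on a toy example. In the sketch below, the row index \(i + o_m\) is taken modulo b so that it stays inside block-row m; this wrap-around, the helper names, and the base matrix `W` are assumptions made for illustration.

```python
def cpm_row(b, shift, r):
    """Row r of a b x b CPM whose ones are shifted `shift` places to the right."""
    return [1 if col == (r + shift) % b else 0 for col in range(b)]


def expand(base, b):
    """Binary PCM of a base matrix; -1 entries become b x b zero blocks."""
    return [[bit for w in base_row
             for bit in (cpm_row(b, w, r) if w >= 0 else [0] * b)]
            for base_row in base for r in range(b)]


def shuffle_with_offsets(H, b, c, offsets):
    """Layer i (0-based) takes row (i + o_m) mod b of block-row m.
    The modulo wrap is an assumption; all-zero offsets give the basic shuffling."""
    return [[H[m * b + (i + offsets[m]) % b] for m in range(c)]
            for i in range(b)]


def max_layer_column_weight(layers):
    """Largest column weight found inside any single layer."""
    return max(sum(col) for layer in layers for col in zip(*layer))


# Hypothetical base matrix whose first column repeats the shift value 0.
W = [[0, 1],
     [0, 2]]
b, c = 3, 2
H = expand(W, b)

basic = max_layer_column_weight(shuffle_with_offsets(H, b, c, (0, 0)))
modified = max_layer_column_weight(shuffle_with_offsets(H, b, c, (0, 1)))
print(basic, modified)  # prints 2 1
```

With all offsets equal to zero, the repeated entry produces a column of weight two in every layer, whereas the offset set (0, 1) restores single-weight columns for this toy matrix.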
This perspective on the modified shuffling suggests a way of finding appropriate offset values for a specific QC-LDPC code. The straightforward way is an exhaustive search: in general, for a QC-LDPC code, there are \(b^c\) possible offset sets, which are examined one by one until one is found that results in a base matrix in which all the integers in any column are distinct. Once such an offset set is found, the search is halted. It should be noted that in some cases no suitable offset set exists, as for the (672, 588) QC-LDPC code of Fig. 1d. This indicates that the modified shuffling is not necessarily applicable to all QC-LDPC codes. It should also be emphasized that the desired cyclic property of \({\varvec{H}}_{qc}^{sh}\) holds for the modified shuffling just as it does for the basic shuffling method, and hence, each layer of \({\varvec{H}}_{qc}^{sh}\) can be produced from its previous layer by a circular shift.
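A minimal sketch of the exhaustive search is given below. It assumes that applying offset \(o_m\) to block-row m turns each non-negative base-matrix entry w of that row into \((w + o_m) \bmod b\), which is our reading of how the offsets act on the base matrix; the function names and the toy matrices are hypothetical.

```python
from itertools import product


def apply_offsets(base, b, offsets):
    """Modified base matrix: add o_m (mod b) to the shift values of row m."""
    return [[(w + o) % b if w >= 0 else -1 for w in row]
            for row, o in zip(base, offsets)]


def columns_distinct(base):
    """True if no column contains the same non-negative shift value twice."""
    for col in zip(*base):
        shifts = [w for w in col if w >= 0]
        if len(shifts) != len(set(shifts)):
            return False
    return True


def find_offsets(base, b):
    """Try the b**c offset sets in order and stop at the first valid one.
    Returns None when no offset set works."""
    for offsets in product(range(b), repeat=len(base)):
        if columns_distinct(apply_offsets(base, b, offsets)):
            return offsets
    return None


print(find_offsets([[0, 1], [0, 2]], 3))  # prints (0, 1)
print(find_offsets([[0, 0], [0, 1]], 2))  # prints None
```

The second call illustrates that a valid offset set need not exist. Since the search space grows as \(b^c\), a practical search for large codes may need pruning, which is not attempted here.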
LD implementation
The advantage of shuffling can be highlighted by examining the implementation methodology. Specifically, Fig. 6 illustrates the LD architecture for the \({\varvec{H}}_{qc}^{sh}\) of Fig. 3. In this figure, VN Processing Unit (VNPU) and CN Processing Unit (CNPU) stand for the processing units responsible for computing the VTC and CTV messages in (2) and (3), respectively. As illustrated, instead of \(J = 12\) CNPUs, \(c = 4\) CNPUs representing the CNs of the first layer are sufficient for decoding, and their connections are determined by the “1” entries of the first layer. At the end of each sub-iteration, the computed APP values are cyclically shifted as shown in the figure. This circulation serves as an alternative to redefining the connections between VNPUs and CNPUs, allowing the current connections to remain valid and the next sub-iteration to start immediately.
Fig. 7 shows the decoding flowchart for the algorithm outlined in Sect. 3.2. As shown, the last block in the decoding loop is a “circulate” operation performed on the updated APP values in order to prepare them for the next sub-iteration. The other blocks in the decoding loop are responsible for performing operations (2)–(4).
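Why circulating the APP values can replace rewiring is easy to verify on a toy example. In the following sketch, the connections of the first layer are given as column indices per CN, the next layer is obtained by moving every connection one place to the right within its section, and the APP memory is rotated one place to the left per section. The rotation direction, the function names, and the numbers are assumptions made for illustration and may differ from Fig. 6.

```python
def rotate_sections_left(Y, b):
    """Cyclically rotate every length-b section of the APP vector one place left."""
    out = []
    for s in range(0, len(Y), b):
        sec = Y[s:s + b]
        out.extend(sec[1:] + sec[:1])
    return out


def shift_connections(layer_cols, b):
    """Connections of the next layer: each column index moves one place to the
    right inside its own section."""
    return [[(l // b) * b + (l % b + 1) % b for l in cols]
            for cols in layer_cols]


def gather(layer_cols, Y):
    """APP values seen by each CN of a layer."""
    return [[Y[l] for l in cols] for cols in layer_cols]


b = 3
first_layer = [[0, 4], [2, 3]]        # hypothetical CN-to-VN connections
Y = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]    # hypothetical APP values

rewired = gather(shift_connections(first_layer, b), Y)
circulated = gather(first_layer, rotate_sections_left(Y, b))
print(rewired == circulated)  # prints True
```

Both views deliver the same values to the CNPUs, so the fixed wiring of the first layer can be reused for every layer.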
Implementation and experimental results
We implemented both LD with shuffled PCM and LD with non-shuffled PCM for the example codes of the IEEE 802.16e and IEEE 802.15.3c standards. The hardware used was a Xilinx VC709 evaluation board, shown in Fig. 8, which carries a Virtex-7 XC7VX690T-2FFG1761C FPGA. The implementation used 6-bit quantized messages, and the synthesis tool was Vivado 2018.3.
The results in terms of utilized LUTs, on-chip power, and maximum clock frequency are shown in Table 1. The first two parameters are reported directly by the synthesis tool, and the maximum clock frequency is estimated from the worst negative slack, also reported by the synthesis tool. As the figures in the table show, the design with shuffled PCM is considerably smaller, and the consumed power is notably lower. For instance, the number of occupied LUTs is reduced from 262122 to only 15299 in the case of the (672,336) code, equivalent to a \((1 - \frac{15299}{262122}) \times 100 \cong 94\%\) reduction in the number of LUTs on the FPGA. Similarly, a \((1 - \frac{0.84}{6.693}) \times 100 \cong 87\%\) reduction in consumed power is obtained. In summary, the superiority of the shuffling method in terms of hardware area and consumed power is apparent from the implementation results. Note that the design of the non-shuffled IEEE 802.16e code is too big to fit in the FPGA, and hence its results are not available.
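The two percentages quoted above can be recomputed with a few lines of Python; the helper name `reduction_percent` is hypothetical, and the input figures are the ones given in the text for the (672,336) code.

```python
def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (1 - after / before) * 100


lut_reduction = reduction_percent(262122, 15299)   # occupied LUTs
power_reduction = reduction_percent(6.693, 0.84)   # on-chip power figures
print(f"LUTs: {lut_reduction:.1f}%  power: {power_reduction:.1f}%")
```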
Figure 9 depicts the performance simulation of the three IEEE 802.15.3c codes, showing that shuffling does not degrade the BER performance. Indeed, LD with (modified) shuffled PCM performs as well as LD with non-shuffled PCM. Furthermore, the average number of iterations needed to achieve a specific BER performance, depicted in Fig. 10, is practically the same in the two cases, further confirming the similar performance of the two modes. This is because shuffling only changes the order of the rows of the PCM, while the overall characteristics of the PCM remain intact. In particular, the determining attributes of the PCM, such as the distance property, the cycle distribution, and the row and column weights, do not undergo any change.
Comparing the two cases in terms of throughput is also of interest. The average throughput for the different codes is plotted in Fig. 11. Given that \(f_{\mathrm{clk}}\) is the clock frequency specified in Table 1, \(N_{{\mathrm{ave}}\_{\mathrm{ite}}}\) is the average number of sub-iterations, and \(N_{\mathrm{clk}}\) is the number of clock cycles each sub-iteration needs, the average duration for decoding a sequence is \(\frac{N_{{\mathrm{ave}}\_{\mathrm{ite}}} \, N_{\mathrm{clk}}}{f_{\mathrm{clk}}}\), and the average throughput is the number of bits decoded per sequence divided by this duration.
In our implementation, \(N_{\mathrm{clk}} = 8\) and 7 for the shuffled and non-shuffled PCM, respectively, where the one extra clock cycle in the former case is needed for the cyclic shifting of the computed APP values. The average throughput curves of the two cases overlap for low values of \(E_b/N_0\), indicating that the additional sub-iterations are fully compensated by the higher clock frequency. This is, however, no longer true as \(E_b/N_0\) grows. If a BER of 1e−6 is chosen as the target, the throughput degradation is 0.1, 0.3, and 1.3 Gbps for the three codes, respectively. The degradation in throughput stems from the fact that the layering differs in the two cases. With the non-shuffled PCM, the J rows are divided into c layers of b rows each, while with the shuffled PCM, they are divided into b layers of c rows each. Since c is usually much smaller than b, a larger number of VNs are processed per sub-iteration in the first case, and hence fewer sub-iterations are needed in total.
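The throughput relation can be evaluated with a short Python sketch. The number of bits counted per decoded sequence (`bits_per_frame`) is left as an explicit parameter because it depends on whether coded or information bits are counted, and all numeric values in the example are hypothetical rather than taken from Table 1.

```python
def average_throughput(bits_per_frame, f_clk_hz, n_ave_ite, n_clk):
    """Average throughput in bit/s.

    One frame takes n_ave_ite * n_clk clock cycles on average, i.e.
    n_ave_ite * n_clk / f_clk_hz seconds.
    """
    duration_s = n_ave_ite * n_clk / f_clk_hz
    return bits_per_frame / duration_s


# Hypothetical operating point: 672 bits per frame, 100 MHz clock,
# 42 sub-iterations on average, 8 clock cycles per sub-iteration.
tp = average_throughput(672, 100e6, 42, 8)
print(f"{tp / 1e6:.1f} Mbit/s")
```

For a fixed clock frequency, the sketch makes the trade-off explicit: throughput falls in proportion to the product of the average number of sub-iterations and the clock cycles per sub-iteration.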
Conclusion
The novel shuffling method proposed in this paper is basically a reordering of the rows of the PCM of a QC-LDPC code with two objectives in mind. First, the columns in each layer of the shuffled PCM must remain single weight or zero weight. Second, each layer must be producible from the layer above it by a one-symbol circular shift. Though simple, this shuffling brings about a considerable complexity reduction in the decoding implementation, while preserving the error-correcting capability of the code and its decoding throughput for BER values down to 1e−6.
Availability of data and materials
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
Abbreviations
APP: A Posteriori Probability
BER: Bit Error Rate
BP: Belief Propagation
CN: Check Node
CNPU: CN Processing Unit
CPM: Circulant Permutation Matrix
CTV: Check-to-Variable
FEC: Forward Error Correction
FPGA: Field Programmable Gate Array
LD: Layered Decoding
LDPC: Low-Density Parity-Check
LUT: Look-Up Table
PCM: Parity-Check Matrix
QC-LDPC: Quasi-Cyclic LDPC
VN: Variable Node
VNPU: VN Processing Unit
VTC: Variable-to-Check
5G: Fifth Generation
References
1. F.R. Kschischang, B.J. Frey, Iterative decoding of compound codes by probability propagation in graphical models. IEEE J. Sel. Areas Commun. 16(2), 219–230 (1998)
2. L. Liu, C.J.R. Shi, Sliced message passing: high throughput overlapped decoding of high-rate low-density parity-check codes. IEEE Trans. Circuits Syst. I Regul. Pap. 55(11), 3697–3710 (2008)
3. R.G. Maunder, A fully-parallel turbo decoding algorithm. IEEE Trans. Commun. 63(8), 2762–2775 (2015)
4. Y. Sun, J.R. Cavallaro, Efficient hardware implementation of a highly-parallel 3GPP LTE/LTE-advance turbo decoder. Integration 44(4), 305–315 (2011)
5. S.K. Chronopoulos, V. Christofilakis, G. Tatsis, P. Kostarakis, Preliminary BER study of a TC-OFDM system operating under noisy conditions. J. Eng. Sci. Technol. Rev. 9(4), 13–16 (2016)
6. S.K. Chronopoulos, V. Christofilakis, G. Tatsis, P. Kostarakis, Performance of turbo coded OFDM under the presence of various noise types. Wirel. Pers. Commun. 87(4), 1319–1336 (2016)
7. M.M. Mansour, N.R. Shanbhag, Turbo decoder architectures for low-density parity-check codes, in Global Telecommunications Conference. GLOBECOM'02. IEEE, vol. 2 (IEEE, 2002), pp. 1383–1388
8. M.M. Mansour, N.R. Shanbhag, High-throughput LDPC decoders. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 11(6), 976–996 (2003)
9. H. Sankar, K.R. Narayanan, Memory-efficient sum-product decoding of LDPC codes. IEEE Trans. Commun. 52(8), 1225–1230 (2004)
10. D.E. Hocevar, A reduced complexity decoder architecture via layered decoding of LDPC codes, in IEEE Workshop on Signal Processing Systems, 2004. SIPS 2004 (IEEE, 2004), pp. 107–112
11. M.M. Mansour, N.R. Shanbhag, A 640-Mb/s 2048-bit programmable LDPC decoder chip. IEEE J. Solid-State Circuits 41(3), 684–698 (2006)
12. K.K. Gunnam, G.S. Choi, W. Wang, E. Kim, M.B. Yeary, Decoding of quasi-cyclic LDPC codes using an on-the-fly computation, in 2006 Fortieth Asilomar Conference on Signals, Systems and Computers (IEEE, 2006), pp. 1192–1199
13. K. Gunnam, G. Choi, W. Wang, M. Yeary, Multi-rate layered decoder architecture for block LDPC codes of the IEEE 802.11n wireless standard, in 2007 IEEE International Symposium on Circuits and Systems (IEEE, 2007), pp. 1645–1648
14. M. Rovini, F. Rossi, P. Ciao, N. L'Insalata, L. Fanucci, Layered decoding of non-layered LDPC codes, in 9th EUROMICRO Conference on Digital System Design (DSD'06) (IEEE, 2006), pp. 537–544
15. Y.L. Ueng, C.C. Cheng, A fast-convergence decoding method and memory-efficient VLSI decoder architecture for irregular LDPC codes in the IEEE 802.16e standards, in 2007 IEEE 66th Vehicular Technology Conference (IEEE, 2007), pp. 1255–1259
16. P. Radosavljevic, A. de Baynast, J.R. Cavallaro, Optimized message passing schedules for LDPC decoding, in Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers, 2005 (IEEE, 2005), pp. 591–595
17. D. Yang, G. Yu, X. Zou, Y. Deng, J. Zhong, The design and verification of a novel LDPC decoder with high-efficiency, in 2014 International Symposium on Integrated Circuits (ISIC) (IEEE, 2014), pp. 256–259
18. A. de Baynast, P. Radosavljevic, A. Sabharwal, J.R. Cavallaro, On turbo-schedules for LDPC decoding. IEEE Commun. Lett. (2006)
19. P. Radosavljevic, M. Karkooti, A. de Baynast, J.R. Cavallaro, Tradeoff analysis and architecture design of high throughput irregular LDPC decoders. IEEE Trans. Circuits Syst. I: Regul. Pap. 1(1), 1 (2006)
20. T. Brack, M. Alles, F. Kienle, N. Wehn, A synthesizable IP core for WIMAX 802.16e LDPC code decoding, in 2006 IEEE 17th International Symposium on Personal, Indoor and Mobile Radio Communications (IEEE, 2006), pp. 1–5
21. G. Gentile, M. Rovini, L. Fanucci, Low-complexity architectures of a decoder for IEEE 802.16e LDPC codes, in 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD 2007) (IEEE, 2007), pp. 369–375
22. K.K. Gunnam, G.S. Choi, M.B. Yeary, M. Atiquzzaman, VLSI architectures for layered decoding for irregular LDPC codes of WIMAX, in 2007 IEEE International Conference on Communications (IEEE, 2007), pp. 4542–4547
23. K. Zhang, X. Huang, Z. Wang, High-throughput layered decoder implementation for quasi-cyclic LDPC codes. IEEE J. Sel. Areas Commun. 27(6), 985–994 (2009)
24. J. Goldberger, H. Kfir, Serial schedules for belief-propagation: analysis of convergence time. IEEE Trans. Inf. Theory 54(3), 1316–1319 (2008)
25. Y. Cui, X. Peng, Z. Chen, X. Zhao, Y. Lu, D. Zhou, S. Goto, Ultra low power QC-LDPC decoder with high parallelism, in 2011 IEEE International SOC Conference (IEEE, 2011), pp. 142–145
26. J. Zhang, M.P. Fossorier, Shuffled iterative decoding. IEEE Trans. Commun. 53(2), 209–213 (2005)
27. Y.L. Ueng, C.J. Yang, C.J. Chen, A shuffled message-passing decoding method for memory-based LDPC decoders, in 2009 IEEE International Symposium on Circuits and Systems (IEEE, 2009), pp. 892–895
28. J. Zhang, M. Fossorier, Shuffled belief propagation decoding, in Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, 2002, vol. 1 (IEEE, 2002), pp. 8–15
29. J. Zhang, Y. Wang, M.P. Fossorier, J.S. Yedidia, Iterative decoding with replicas. IEEE Trans. Inf. Theory 53(5), 1644–1663 (2007)
30. Z. Cui, Z. Wang, X. Zhang, Q. Jia, Efficient decoder design for high-throughput LDPC decoding, in APCCAS 2008–2008 IEEE Asia Pacific Conference on Circuits and Systems (IEEE, 2008), pp. 1640–1643
31. Y.L. Ueng, C.J. Yang, K.C. Wang, C.J. Chen, A multimode shuffled iterative decoder architecture for high-rate RS-LDPC codes. IEEE Trans. Circuits Syst. I Regul. Pap. 57(10), 2790–2803 (2010)
32. F. Guilloud, E. Boutillon, J. Tousch, J.L. Danger, Generic description and synthesis of LDPC decoders. IEEE Trans. Commun. 55(11), 2084–2091 (2007)
33. Y.L. Ueng, B.J. Yang, C.J. Yang, H.C. Lee, J.D. Yang, An efficient multi-standard LDPC decoder design using hardware-friendly shuffled decoding. IEEE Trans. Circuits Syst. I Regul. Pap. 60(3), 743–756 (2013)
34. Z. Wang, Z. Cui, Low-complexity high-speed decoder design for quasi-cyclic LDPC codes. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 15(1), 104–114 (2007)
35. M.P. Fossorier, Quasi-cyclic low-density parity-check codes from circulant permutation matrices. IEEE Trans. Inf. Theory 50(8), 1788–1793 (2004)
36. A. Hasani, L. Lopacinski, S. Büchner, J. Nolte, R. Kraemer, A modified shuffling method to split the critical path delay in layered decoding of QC-LDPC codes, in 2019 IEEE 30th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC) (IEEE, 2019), pp. 1–6
37. Z. Li, L. Chen, L. Zeng, S. Lin, W.H. Fong, Efficient encoding of quasi-cyclic low-density parity-check codes. IEEE Trans. Commun. 54(1), 71–81 (2006)
38. J. Chen, M.P. Fossorier, Near optimum universal belief propagation based decoding of low-density parity check codes. IEEE Trans. Commun. 50(3), 406–414 (2002)
39. S.W. Yen, S.Y. Hung, C.L. Chen, H.C. Chang, S.J. Jou, C.Y. Lee, A 5.79-Gb/s energy-efficient multirate LDPC codec chip for IEEE 802.15.3c applications. IEEE J. Solid-State Circuits 47(9), 2246–2257 (2012)
Acknowledgements
This work was supported by the German Research Foundation (DFG) and conducted at IHPmicroelectronics GmbH. The authors are also thankful for the support of Brandenburg University of Technology (BTU) Cottbus-Senftenberg, Prof. Dr. Jörg Nolte and Dr. Steffen Büchner.
Funding
Open Access funding enabled and organized by Projekt DEAL. This work was supported by the German Research Foundation (DFG) project PSSSFEC, project no. 442607813.
Author information
Contributions
AH proposed and developed the new idea of the paper and drafted it. LL and RK have substantially revised it. All authors approved the submitted version. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hasani, A., Lopacinski, L. & Kraemer, R. Reduced-complexity decoding implementation of QC-LDPC codes with modified shuffling. J Wireless Com Network 2021, 183 (2021). https://doi.org/10.1186/s13638021020565
Keywords
 Quasi-cyclic low-density parity-check code
 Layered decoding
 Decoding complexity