Non-volatile memory reduction based on 1-D memory space mapping of a specific set of QC-LDPC codes
- Chung-Ping Young^{1},
- Chung-Chu Chia^{1},
- Chao-Chin Yang^{2} and
- Chung-Ming Huang^{3}
https://doi.org/10.1186/1687-1499-2012-191
© Young et al; licensee Springer. 2012
- Received: 8 February 2012
- Accepted: 8 June 2012
- Published: 8 June 2012
Abstract
Supporting a great diversity of multi-rate H-matrices for multiple communication protocols requires a large amount of non-volatile memory, which may consume a large silicon area or many logic elements and constrain the implementation of the overall decoder. Memory reduction schemes are therefore needed to make parity-check storage more compact. This study proposes a specific set of quasi-cyclic low-density parity-check (QC-LDPC) codes whose traditional two-dimensional (2-D) parity-check matrix (H-matrix) can be mapped into a one-dimensional (1-D) memory space. Compared to existing schemes, the proposed codes and memory reduction scheme achieve significant reduction rates. Within a fixed memory space, many more H-matrices for diverse communication protocols can be stored via the proposed QC-LDPC codes, which are constructed from modified Welch-Costas sequences. Furthermore, our simulation results show relatively good error performance, which outperforms computer-generated random LDPC codes and Sridhara-Fuja-Tanner codes. Consequently, we conclude that the proposed QC-LDPC codes enlarge the capacity for storing many more low-BER (bit error rate) H-matrices within a fixed memory space.
Keywords
- Actual Address
- Memory Space
- LDPC Code
- Gate Count
- Similar Code
1. Introduction
Low-density parity-check (LDPC) codes were first introduced by Gallager in 1962, but they were rarely used since implementing them in hardware was impractical in the 1960s. The value of LDPC codes was rediscovered by MacKay and Neal in 1996 [1]. Since then, LDPC codes have gained a lot of attention due to their excellent error correction capability. A binary (j, k)-regular LDPC code is defined as the null space of a sparse parity-check matrix H over GF(2) that satisfies the following properties: (1) each column has weight j; (2) each row has weight k; (3) no two rows (or two columns) have more than one 1-component in common; (4) both j and k are much smaller than the code length.
Most methods for designing good LDPC codes are based on random constructions, but the lack of structure makes the encoding process complicated. Furthermore, the non-volatile memory required to store the parity-check matrices may be prohibitive in practical applications.
Nowadays, some wireless devices are designed to be both tiny and capable of supporting multiple communication functions, such as WLAN [2, 3], 3G, DVB-S2 [4], CMMB [5], etc. Therefore, a great diversity of LDPC codes is employed to meet the demand for error correction. Compared to the overall decoder, the storage for multiple LDPC H-matrices is very area-consuming, so memory reduction schemes are needed to reduce the memory requirements as much as possible. Quasi-cyclic LDPC (QC-LDPC) codes are often employed for this purpose since even trivial approaches can achieve huge gains in memory reduction.
There are two primary types of parity-check matrices of LDPC codes: the pseudorandom matrix [6] and the quasi-cyclic matrix [7, 8]. The latter, whose encoding complexity is directly proportional to the code length, is widely applied in consumer electronics. Several classes of QC-LDPC codes [8–10] have been proposed. Such codes can achieve good error performance comparable with computer-generated random LDPC codes. However, in terms of implementation aspects, QC-LDPC codes for multi-rate communication sessions and diverse communication protocols need to be stored concurrently. As indicated in [6], directly employing the lookup tables for saving multiple H-matrices is always prohibitive.
The motivation of this study is to propose a specific set of QC-LDPC codes with extremely low memory requirements. To achieve this goal, we introduce properly constructed QC-LDPC codes which can be classified as a specific set of formerly proposed modified Welch-Costas (MWC-OCS) codes [11]. These LDPC codes are constructed by multilevel sequences [8, 12, 13] with the property that any two different rows have at most one element in common. Based on our proposed 2-D to 1-D memory space mapping, each code in the specific set can achieve a huge reduction rate due to its particular structure.
- (1) As indicated in [14], the unreduced H_{ROM} is area-consuming; it may occupy 60% of the area of the overall decoder. Reduction schemes are therefore necessary to lower the memory demand.
- (2) The reduction impact is still not evident for some LDPC codes with irregular structures. Possible approaches have been applied in [15].
- (3) From [5], the storage for some code structures can be reduced to merely 8.5% of the overall area. This achievement is meaningful, especially when the target device is specified. For example, after optimization, this decoder can easily fit an ALTERA EP2S15F484C3 FPGA (250 K gates), and hence the search for a larger or more expensive device is no longer necessary.
- (4) The study [2] indicates that supporting multiple H-matrices increases the memory demand. Although the required non-volatile memory has been optimized as a [324 × 48] low-power ROM (60 K gates), it still consumes a large portion of the hardware resources.
To address the critical issues mentioned above, we propose extremely compact codes that significantly reduce the storage of a single H-matrix. Moreover, a reduction scheme based on data mapping and the proposed codes is demonstrated to further reduce the memory demand for multi-rate H-matrices storage. Both the proposed codes and the further reduction scheme achieve huge reduction rates. Compared to existing approaches, this study can fit many more H-matrices within a fixed memory space.
This article is organized as follows. In the introduction, we highlighted the need for H_{ROM} reduction and clarified the motivation of our approach. In Section 2, we explain how a QC-LDPC parity-check matrix is characterized by merely storing the shift values of its identity sub-matrices for memory efficiency. Our similar codes with extremely low memory requirements for CMMB, WLAN, and WIMAX are introduced in Section 3. The proposed MWC-OCS LDPC block codes and their 1-D memory space mapping are introduced in Sections 4 and 5. In Section 6, a design example demonstrates how multi-rate H-matrices storage can be further reduced. For a tiny device designed to support more H-matrices for diverse communication protocols, the gain shown in this example is not negligible. Finally, we offer conclusions in Section 7.
2. QC-LDPC block codes storage
For QC-LDPC codes, storing H [2] involves saving the shift values of identity sub-matrices and their column positions. The column positions can easily be generated by an address generator unit (AGU) [5]. The shift values, however, form a 2-D matrix that must be held in non-volatile memory, which consumes a considerable number of logic elements. Properly constructed LDPC codes require less non-volatile memory and free hardware resources for decoder improvements or for storing more H-matrices.
The content stored in the non-volatile memory is used for generating actual addresses which point to soft messages stored in the volatile memory (RAM). An actual address can easily be determined by adding an offset address to the base address. A decoder retrieves a soft message by accessing the RAM via an actual memory address. The number of addresses required to retrieve all the content of the message RAM equals the number of 1-components in the H-matrix, since each 1-component in H represents a RAM address. All the RAM addresses needed for the decoder to obtain the soft messages are stored in the non-volatile memory. Without optimization, the size of this non-volatile memory is equal to Z × U bits, where Z is the address width (Z ≥ log_{2}N) [5], and U is the total number of 1-components in an H-matrix with code length N. In QC-LDPC block codes, the non-volatile memory for recording the RAM addresses can be replaced by a reduced 2-D Y-matrix, in which only shift values are stored. With the content of Y and an AGU [5], the original memory space can effectively be reduced. This section shows how an H-matrix of a QC-LDPC code can be compactly stored and how actual addresses can be determined by an AGU.
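The Z × U estimate above can be checked with a few lines of Python. This is only a sketch: the function names are illustrative, and a regular H-matrix with M rows of weight k is assumed so that U = M × k. The input figures are the CMMB rows of the comparison table in Section 3.

```python
def address_width(code_length: int) -> int:
    """Smallest integer Z with Z >= log2(code_length)."""
    return (code_length - 1).bit_length()


def unreduced_rom_bits(m_rows: int, row_weight: int, code_length: int) -> int:
    """Z x U bits, assuming a regular H so that U = M x k 1-components."""
    u = m_rows * row_weight
    return address_width(code_length) * u


# CMMB codes of length 9216 (values taken from the comparison table).
print(address_width(9216))                # 14
print(unreduced_rom_bits(4608, 6, 9216))  # 387072 (rate 1/2)
print(unreduced_rom_bits(2304, 12, 9216)) # 387072 (rate 3/4)
```

Both rates give the same unreduced size because halving the number of rows doubles the row weight.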
2.1. Non-zero elements in hardware implementation
In terms of hardware implementations, the position of '1' in the same column of a parity-check matrix actually represents an offset address. An actual address, which points to a soft message stored in the volatile memory, can be generated by the addition of an offset address and a base address. For an N-bit codeword, the offset addresses range between 0 and N - 1. The actual addresses can be generated on the fly by an AGU which spans all the required memory addresses for message retrieval. The AGU requires only a small amount of data which characterizes the structure of H. This required data varies with different code structures. Choose the smallest integer Z such that Z ≥ log_{2}N; then each non-zero element in H denotes a Z-bit offset address which can be used to retrieve a soft message stored in the corresponding message memory.
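As a rough illustration of on-the-fly address generation, the sketch below derives the offset addresses of one check row from a row of shift values. It rests on assumptions that the text does not fix: I(s) is taken to be the p × p identity matrix with its columns cyclically shifted by s, so that row r has its 1 in column (r + s) mod p, and the offset address is taken to be the bit index in 0..N-1. The function names and the numeric values are hypothetical.

```python
def offset_addresses(shifts_row, p, r):
    """Offset addresses (bit indices in 0..N-1) of the 1s in row r of one block row.

    shifts_row[v] is the shift value of the v-th p x p sub-matrix in that block row.
    Assumed convention: row r of I(s) has its 1 in column (r + s) mod p.
    """
    return [v * p + (r + s) % p for v, s in enumerate(shifts_row)]


def actual_addresses(base, shifts_row, p, r):
    """Actual RAM addresses: base address plus each offset address."""
    return [base + off for off in offset_addresses(shifts_row, p, r)]


# Toy block row with three sub-matrices of size p = 5 (illustrative values only).
print(offset_addresses([1, 2, 4], p=5, r=4))            # [0, 6, 13]
print(actual_addresses(0x100, [1, 2, 4], p=5, r=4))     # [256, 262, 269]
```

Only the shift values need to be stored; every address in the block row follows from them by modular arithmetic.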
2.2. Message address determination
3. Reduction factors in recent memory reduction works
List of works related to non-volatile memory reduction for LDPC codes
Work | Memory requirement | Reduction factor with reference to M × k × Z | Example: code length | Example: code rate | Example: memory requirement | Example: reduction factor
---|---|---|---|---|---|---
LDPC for CMMB (all elements) [5] | M × k × Z | 1 | 9216 | 1/2 | 4608 × 6 × 14 = 387072 bits | 1
 | | | 9216 | 3/4 | 2304 × 12 × 14 = 387072 bits | 1
LDPC for CMMB with AGU [5] | H': Z × N_{p}; I: 2k × R_{p} | (Z × N_{p} + 2k × R_{p})/(M × k × Z) | 9216 | 1/2 | 1512 + 216 = 1728 bits | 4.46 × 10^{-3}
 | | | 9216 | 3/4 | 1512 + 216 = 1728 bits | 4.46 × 10^{-3}
Regular HC-LDPC [17] | (N_{e}/N_{p}) × log_{2}N_{e} | [log_{2}(M × k)/Z] × (1/N_{p}) | 4128 | 1/2 | (2064 × 6/18) × 14 = 9632 bits | 5.98 × 10^{-2}
TS-LDPC [18] | (j-1) × (k-1) × z | [(j-1) × (k-1)/(M × k)] × (z/Z) | 6084 | 3/4 | 3 × 15 × 9 = 405 bits | 1.7 × 10^{-3}
LDPC for WLAN [2] | N.A. | N.A. | 1944 | 1/2 | 24 × 12 × 7 = 2151 bits | 2.78 × 10^{-2}
LDPC for WIMAX [2] | N.A. | N.A. | 2304 | 1/2 | 24 × 12 × 7 = 2151 bits | 2.34 × 10^{-2}
Proposed 1-D MWC-OCS LDPC (similar codes for CMMB) | (j + k-1) × z | [(j + k-1)/(M × k)] × (z/Z) | 9186 (p = 1531) | 1/2 | (6 + 3-1) × 11 = 88 bits | 2.3 × 10^{-4}
 | | | 9228 (p = 769) | 3/4 | (12 + 3-1) × 10 = 140 bits | 3.6 × 10^{-4}
Proposed 1-D MWC-OCS LDPC (similar codes for WLAN/WIMAX) | | | 1992 (p = 83) | 1/2 | (24 + 12) × 7 = 252 bits | 9.6 × 10^{-4}
 | | | 2328 (p = 97) | 1/2 | (24 + 12) × 7 = 252 bits | 7.5 × 10^{-4}
Synthesis results of the code storage for the similar codes of CMMB, WLAN, and WIMAX
Compared to the existing approaches, many more H-matrices constructed by our approach for diverse communication protocols can be supported within a fixed non-volatile memory space. The extremely low memory requirements of the proposed codes are achieved by a 2-D to 1-D memory space mapping. Consequently, the j × k × Z bits required by a traditional 2-D storage for QC-LDPC codes can be reduced to (j + k-1) × z bits, where z is the smallest integer satisfying z ≥ log_{2}p.
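The two storage formulas can be compared numerically. The sketch below uses hypothetical function names and takes Z and z as the smallest integers satisfying Z ≥ log_{2}N and z ≥ log_{2}p; the parameters are those of the CMMB-similar codes (j = 3, with N = kp).

```python
def bits_for(n: int) -> int:
    """Smallest integer b with b >= log2(n)."""
    return (n - 1).bit_length()


def storage_2d(j: int, k: int, code_length: int) -> int:
    """Traditional 2-D shift-value storage: j x k x Z bits."""
    return j * k * bits_for(code_length)


def storage_1d(j: int, k: int, p: int) -> int:
    """1-D storage after mapping: (j + k - 1) x z bits."""
    return (j + k - 1) * bits_for(p)


for j, k, p in [(3, 6, 1531), (3, 12, 769)]:
    n = k * p  # code length N = kp
    print(n, storage_2d(j, k, n), storage_1d(j, k, p))
# 9186 252 88
# 9228 504 140
```

The saving grows with j and k, because the 2-D cost scales with the product j × k while the 1-D cost scales with the sum j + k - 1.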
4. Methods to construct memory-efficient MWC-OCS LDPC codes
A cycle in a Tanner graph is a sequence of connected vertices that starts and ends at the same vertex and contains every other vertex at most once. The girth of an LDPC code is the length of the shortest cycle in its Tanner graph. Since short cycles may degrade the performance of LDPC codes, it is necessary to avoid 4-cycles, the shortest possible cycles in a Tanner graph, so that the code has a girth of at least 6 [13]. In Section 4.1, we introduce how to construct the proposed memory-efficient QC-LDPC codes and prove that no 4-cycles are present. The construction examples and the simulation results are shown in the last two subsections.
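For a QC-LDPC code whose H-matrix is an array of p × p circulant permutation matrices I(y_{u,v}), the absence of 4-cycles can be tested directly on the shift values using the standard condition from [8]: a 4-cycle exists exactly when y_{u1,v1} - y_{u1,v2} + y_{u2,v2} - y_{u2,v1} ≡ 0 (mod p) for some pair of distinct block rows and distinct block columns. The sketch below applies that test; the function name and the two toy shift tables are illustrative and are not codes from this article.

```python
from itertools import combinations


def has_4_cycle(shifts, p):
    """Return True if the shift table of a circulant-permutation QC-LDPC code
    yields a 4-cycle in the Tanner graph (condition from [8])."""
    j, k = len(shifts), len(shifts[0])
    for u1, u2 in combinations(range(j), 2):
        for v1, v2 in combinations(range(k), 2):
            delta = (shifts[u1][v1] - shifts[u1][v2]
                     + shifts[u2][v2] - shifts[u2][v1])
            if delta % p == 0:
                return True
    return False


# All-zero shifts: every 2 x 2 arrangement of blocks closes a 4-cycle.
print(has_4_cycle([[0, 0], [0, 0]], p=5))  # True

# Toy table y[u][v] = u*v mod p with p prime: the alternating sum is
# (u1 - u2)(v1 - v2), which is non-zero mod p, so no 4-cycle exists.
toy = [[(u * v) % 7 for v in range(5)] for u in range(3)]
print(has_4_cycle(toy, p=7))  # False
```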
4.1. Construction procedure
In this section, we propose a method for constructing the QC-LDPC codes with memory reduction. For clarity of exposition, the MWC-OCS LDPC codes [11] are used as an example. However, for the application of the proposed memory-efficient scheme, only the cyclic parts of the MWC sequences are adopted in the construction procedure.
where 0 ≤ u ≤ j-1, 0 ≤ v ≤ k-1.
For memory efficiency, two specific sequences {a_{0}, a_{1},..., a_{j-1}} and {b_{0}, b_{1},..., b_{k-1}} are constructed by the following procedure.
First, choose the basic parameters j, k, and p, where j and k are integers and p is a prime.
Second, choose initial values a_{0}, b_{0}, and grid size f where 0 ≤ a_{ m } (= a_{0} + mf) ≤ p-2 for m = 0, 1,..., j-1, 1 ≤ b_{ n } (= b_{0} + nf) ≤ p-1 for n = 0, 1,..., k-1, and f ∈ {1, 2,..., p-1}.
Hence, the resulting H, which has j ones in each column and k ones in each row, represents a (j, k)-regular LDPC code (this LDPC code is also an [N, K] regular LDPC code, where N (= kp) is the block length of the MWC-OCS LDPC code and K is the number of message bits).
The size of the parity-check matrix H is jp × kp. Due to the linear dependence among the rows of H, it has a code rate of r = K/N ≥ 1-(j/k). Specifically, since the sum of the p rows of the J-th row of sub-matrices [I(y_{J,0}) I(y_{J,1})...I(y_{J, k-1})] (0 ≤ J ≤ j-1) in (4) equals the all-1 vector for every J, there are at least j-1 dependent rows in H. Therefore, the Tanner graph of the resulting LDPC codes is free of cycles with lengths of 4 and hence has a girth of at least 6.
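The two-step choice of sequences can be sketched as follows. The function name is hypothetical, and the initial values a_0 = 0, b_0 = 1 and grid size f = 2 are illustrative choices, not the values used in the article's examples. Only the range constraints stated in the procedure are checked.

```python
def build_sequences(j, k, p, a0, b0, f):
    """Build a_m = a0 + m*f (m = 0..j-1) and b_n = b0 + n*f (n = 0..k-1),
    enforcing 0 <= a_m <= p-2, 1 <= b_n <= p-1 and 1 <= f <= p-1."""
    if not 1 <= f <= p - 1:
        raise ValueError("grid size f must lie in {1, ..., p-1}")
    a = [a0 + m * f for m in range(j)]
    b = [b0 + n * f for n in range(k)]
    if not all(0 <= x <= p - 2 for x in a):
        raise ValueError("some a_m falls outside 0..p-2")
    if not all(1 <= x <= p - 1 for x in b):
        raise ValueError("some b_n falls outside 1..p-1")
    return a, b


j, k, p = 3, 5, 31
a, b = build_sequences(j, k, p, a0=0, b0=1, f=2)
print(a)             # [0, 2, 4]
print(b)             # [1, 3, 5, 7, 9]
print(j * p, k * p)  # 93 155, the jp x kp size of H
```

Because both sequences are arithmetic progressions with the same step f, they are fully described by (a_0, b_0, f), which is the regularity the later 1-D mapping relies on.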
4.2. Construction examples
According to (3) and (6), two construction examples are shown as follows. The initial values chosen for the required sequences can be different.
Example A: A [155, 64] MWC-OCS LDPC code (p = 31)
In Example A, two sequences are both ordered increasingly, and the resulting H-matrix shows the same shift values in the identity sub-matrices from the lower left to the upper right.
Example B shows that memory-efficient LDPC codes can also be constructed from two sequences ordered in opposite directions. The H-matrix is constructed by a_{ m } = {a_{0}, a_{0}+f, a_{0}+2f,..., a_{0}+(j-1)f} and b_{ n } = {b_{0}, b_{0}-f, b_{0}-2f,..., b_{0}-(k-1)f}. The resulting H-matrix shows the same shift values in the identity sub-matrices from the upper left to the lower right. Each resulting H-matrix shown in Examples A and B is a 93 × 155 matrix and describes a (3, 5)-regular LDPC code with rate = 64/155 ≈ 0.4129 (by using Gaussian elimination, we know that H has a rank of 91).
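The figures quoted for Examples A and B can be cross-checked against the rate bound of Section 4.1. The short derivation below uses only the stated values j = 3, k = 5, p = 31 and rank(H) = 91.

```latex
\begin{align*}
N &= kp = 5 \times 31 = 155, \qquad jp = 3 \times 31 = 93 \ \text{rows}, \\
K &= N - \operatorname{rank}(H) = 155 - 91 = 64, \\
r &= \frac{K}{N} = \frac{64}{155} \approx 0.4129 \;\ge\; 1 - \frac{j}{k} = 1 - \frac{3}{5} = 0.4 .
\end{align*}
```

The rank deficiency is 93 - 91 = 2 = j - 1, which is consistent with the statement that H contains at least j - 1 dependent rows.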
5. 2-D to 1-D mapping and simulation results of memory-efficient MWC-OCS LDPC codes
We construct two specific sequences {a_{0}, a_{1},..., a_{j-1}} and {b_{0}, b_{1},..., b_{k-1}}, which satisfy the conditions that 0 ≤ a_{ m } (= a_{0} ± mf) ≤ p-2 for m = 0, 1,..., j-1, 1 ≤ b_{ n } (= b_{0} ± nf) ≤ p-1 for n = 0, 1,..., k-1, f ∈{1, 2,..., p-1} and p is an odd prime. Note that a_{ i } ≠ a_{ j } and b_{ i } ≠ b_{ j } if i ≠ j. Then the following two cases are able to construct memory-efficient MWC-OCS LDPC codes, which can be mapped from 2-D H-matrices into 1-D memory spaces.
Case A: When the two sequences are ordered in the same direction, the 2-D matrix y_{ u, v } is mapped into a 1-D memory space y_{ w }, where w = u + v.
Case B: When the two sequences are ordered in opposite directions, the 2-D matrix y_{ u, v } is mapped into a 1-D memory space y_{ w }, where w = v - u + (j - 1).
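The index mappings of Cases A and B can be illustrated with a short sketch. One assumption is made loudly here: the shift value y_{u,v} is taken to depend only on the sum a_u + b_v, and `placeholder_shift` is a hypothetical stand-in for the actual shift expression, which is not restated in this section. All function names and the numeric parameters are illustrative.

```python
def placeholder_shift(t, p, alpha=3):
    # Hypothetical stand-in: any function of the sum t = a_u + b_v works for
    # demonstrating the mapping; this is NOT the article's shift formula.
    return pow(alpha, t, p)


def shift_table(a, b, p):
    """2-D table y[u][v] built from the two sequences."""
    return [[placeholder_shift(au + bv, p) for bv in b] for au in a]


def map_to_1d(y2d, same_direction):
    """Case A: w = u + v.  Case B: w = v - u + (j - 1)."""
    j, k = len(y2d), len(y2d[0])
    y1d = [None] * (j + k - 1)
    for u in range(j):
        for v in range(k):
            w = u + v if same_direction else v - u + (j - 1)
            if y1d[w] is not None and y1d[w] != y2d[u][v]:
                raise ValueError("table is not constant along the mapped diagonals")
            y1d[w] = y2d[u][v]
    return y1d


p, j, k, f = 31, 3, 5, 2
a = [0 + m * f for m in range(j)]         # a_m = a0 + m*f
b_same = [1 + n * f for n in range(k)]    # Case A: b_n = b0 + n*f
b_opp = [9 - n * f for n in range(k)]     # Case B: b_n = b0 - n*f

y_a = map_to_1d(shift_table(a, b_same, p), same_direction=True)
y_b = map_to_1d(shift_table(a, b_opp, p), same_direction=False)
print(len(y_a), len(y_b))  # 7 7, i.e. j + k - 1 entries instead of j * k = 15
```

Under the stated assumption, a_u + b_v equals a_0 + b_0 + (u + v)f in Case A and a_0 + b_0 + (u - v)f in Case B, so every entry on the same diagonal shares one stored value and a lookup reduces to computing w.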
6. A further reduction for multi-rate H-matrices storage
In this section, a design example is demonstrated to show how a further reduction can be achieved after 1-D memory space mapping. As we have mentioned, all possible schemes for further reduction are meaningful as long as they can achieve significant gains, especially when the hardware resource is constrained and the demand for diverse H-matrices increases.
6.1. Preprocess of the elements in 1-D memory spaces
6.2. Design architecture
Table 4. 1-D memory mapping and synthesis result of design example
H-matrix | Type | Non-zero elements storage (M × k × Z) | Multiple constructions: (N, j, k) | Multiple constructions: rate | Multiple constructions: p | 1-D memory mapping: (j + k-1) × z | 1-D memory mapping: reduction factor
---|---|---|---|---|---|---|---
H1 | MWC-OCS | 724 × 8 × 11 = 63712 bits | (1448, 4, 8) | 0.5A | 181 | 11 × 8 = 88 bits | 1.38 × 10^{-3}
H2 | MWC-OCS | 3508 × 8 × 13 = 364832 bits | (7016, 4, 8) | 0.5B | 877 | 11 × 10 = 110 bits | 3.01 × 10^{-4}
H3 | MWC-OCS | 412 × 16 × 11 = 72512 bits | (1648, 4, 16) | 0.75A | 103 | 19 × 7 = 133 bits | 1.83 × 10^{-3}
H4 | MWC-OCS | 1004 × 16 × 12 = 192768 bits | (4016, 4, 16) | 0.75B | 251 | 19 × 8 = 152 bits | 7.88 × 10^{-4}
H-library | MWC-OCS | 693824 bits | | | | 483 bits | 6.96 × 10^{-4}
Preprocess | MWC-OCS | 693824 bits | | | | 348 bits | 5.02 × 10^{-4}
Synthesis results | MWC-OCS H-library | 152 output pads (memory-less and gate-free); 13 gates (non-volatile); 16 DFFs (volatile) | | | | |
6.3. Synthesis results
Before synthesizing the design example, applying the proposed 2-D to 1-D memory space mapping scheme reduces the total non-volatile memory demand to 483 bits, which corresponds to a reduction factor of only 6.96 × 10^{-4}. Preprocessing the overlapped elements before synthesis gives a further optimization, with a reduction factor of 5.02 × 10^{-4}. The synthesis report in the last line of Table 4 shows that addressing the memories and routing the data require 13 gates, 152 memory-less and gate-free output pads, and a 16-DFF H-register which spans all the shift values. The schematic view of the synthesis result is also shown in Figure 12. During a communication session, all the shift values of a specific H can be retrieved through the 152-bit extended output pads. These pads send out fixed logic levels mixed with the output of the H-register. Therefore, within a communication session, only the 152-bit output data (the maximum size of the four 1-D memory spaces) spanned by the H-register is accessed for fast retrieval of the shift values. The non-volatile part is not accessed again until a new communication session using another H-matrix begins.
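The totals quoted for the design example can be reproduced from the (N, j, k) and p values of H1 to H4 in Table 4. The sketch below uses hypothetical names, takes M = jp, and treats the 348-bit figure after preprocessing as an input from the table because it is not derivable from the formulas alone.

```python
def bits_for(n: int) -> int:
    """Smallest integer b with b >= log2(n)."""
    return (n - 1).bit_length()


# (N, j, k, p) for H1..H4, taken from Table 4.
designs = {
    "H1": (1448, 4, 8, 181),
    "H2": (7016, 4, 8, 877),
    "H3": (1648, 4, 16, 103),
    "H4": (4016, 4, 16, 251),
}

full = {name: (j * p) * k * bits_for(n) for name, (n, j, k, p) in designs.items()}
mapped = {name: (j + k - 1) * bits_for(p) for name, (n, j, k, p) in designs.items()}

total_full = sum(full.values())
total_mapped = sum(mapped.values())
preprocessed_bits = 348  # reported in Table 4, not computed here

print(total_full, total_mapped)                  # 693824 483
print(f"{total_mapped / total_full:.2e}")        # 6.96e-04
print(f"{preprocessed_bits / total_full:.2e}")   # 5.02e-04
```

The largest single 1-D space is the 152 bits of H4, which matches the width of the output pads in the synthesis report.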
7. Conclusion
Memory-efficient MWC-OCS LDPC codes were proposed to reduce the non-volatile memory demand for H-matrix storage. The described similar codes outperform other recent approaches, with large memory-reduction rates. Compared to the codes for CMMB and WIMAX/WLAN, our similar codes achieve 92% and 88% reductions, respectively, in the required memory bits. In the synthesis result of our similar code storage for CMMB, the gain evaluated by gate count (99.6%) is even larger than the gain estimated by memory bits (92%). In addition, the proposed codes show relatively good error performance, comparable with competitive codes in column-weight-4 constructions. Furthermore, a design example was synthesized and shown to be very compact for multi-rate H-matrices storage. Our approach is valuable for implementing wireless applications that support multiple communication protocols within a fixed memory space. As mentioned, even when the memory saved is only a small fraction of the overall decoder, it cannot be considered negligible as the demand for storing diverse H-matrices of multiple communication protocols increases. In particular, for a tiny device with very limited memory space, applying the proposed approach can enhance the capability to support many more error correction functions.
Declarations
Acknowledgements
This research was partially supported by the National Science Council in Taiwan (Grant No.NSC98-2221-E-168-004-MY3).
References
- MacKay D, Neal R: Near Shannon limit performance of low-density parity-check codes. Electron Lett 1996, 32: 1645-1646. 10.1049/el:19961141
- Amador E, Pacalet R, Rezard V: Optimum LDPC decoder: a memory architecture problem. In Proceedings of the Design Automation Conference. San Francisco, CA, USA; 2009:891-896.
- Tian Y, Zhang X, Lai Z: A LDPC decoder with all single port memories. Proceedings of the Intelligent Computing and Intelligent Systems, Shanghai, China 2009, 3: 547-550.
- Kienle F, Brack T, Wehn N: A synthesizable IP Core for DVB-S2 LDPC code decoding. In Proceedings of the Design Automation and Test in Europe 2005. Volume 3. Munich, Germany; 2005:100-105.
- Lee S, Park J, Chung K: Memory efficient multi-rate regular LDPC decoder for CMMB. IEEE Trans Consum Electron 2008, 55(4):1866-1874.
- Prabhakar A, Narayanan K: Pseudorandom construction of low density parity check codes using linear congruential sequences. IEEE Trans Commun 2002, 50(9):1389-1396. 10.1109/TCOMM.2002.802537
- Tanner R, Sridhara D, Sridharan A, Fuja T, Costello D: LDPC block and convolutional codes based on circulant matrices. IEEE Trans Inf Theory 2004, 50(12):2966-2984. 10.1109/TIT.2004.838370
- Fossorier M: Quasi-cyclic low density parity check codes from circulant permutation matrices. IEEE Trans Inf Theory 2004, 50: 1788-1794. 10.1109/TIT.2004.831841
- Kostic Z, Titlebaum E: The design and performance analysis for several new classes of codes for optical synchronous CDMA and for arbitrary-medium time-hopping synchronous CDMA communication systems. IEEE Trans Commun 1994, 42: 2608-2617. 10.1109/26.310621
- Chen L, Xu J, Djurdjevic I, Lin S: Near-Shannon-limit quasi-cyclic low-density parity-check codes. IEEE Trans Commun 2004, 52(7):1038-1042. 10.1109/TCOMM.2004.831353
- Huang J, Yang C, Huang C: On analyzing quasi-cyclic LDPC codes over modified Welch-Costas-coded optical CDMA system. IEEE J Lightw Technol 2009, 27(12):2150-2158.
- Yang C: Optical CDMA passive optical network using prime code with interference elimination. IEEE Photon Technol Lett 2007, 19: 516-518.
- Huang J, Huang C, Yang C: Construction of one-coincidence sequence quasi-cyclic LDPC codes of large girth. IEEE Trans Inf Theory 2012, 58(3):1825-1836.
- LDPC Decoder implementation. Accessed 9 April 2012 [http://cwe.ccsds.org]
- High throughput low power decoder architectures for low density parity check codes [http://repository.tamu.edu]
- Sandberg S: Improved design of unequal error protection LDPC codes. EURASIP J Wirel Commun Netw 2010: doi:10.1155/2010/423989
- Verdier F, Declercq D: A low-cost parallel scalable FPGA architecture for regular and irregular LDPC decoding. IEEE Trans Commun 2006, 54(9):1215-1223.
- Moura J, Lu J, Zhang H: Structured low density parity check decoding. IEEE Signal Process Mag 2004, 21: 42-55. 10.1109/MSP.2004.1267048
- MacKay D, Davey M: Evaluation of Gallager codes for short block length and high rate applications. In Proceedings of the IMA Workshop Codes, Systems and Graphical Models. Minneapolis, MN; 1999.
- Dai Y, Chen N, Yan Z: Memory efficient decoder architecture for quasi-cyclic LDPC codes. IEEE Trans Circ Syst 2008, 55(9):2898-2911.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.