Efficient detection and decoding of q-ary LDPC coded signals over partial response channels

In this study, we consider ways of concatenating the intersymbol interference (ISI) detector with a q-ary low-density parity-check (LDPC) decoder for transmissions over partial response (PR) channels. LDPC codes achieve performance close to the channel capacity in additive white Gaussian noise channels, while the design of receivers employing these codes for transmissions over channels affected by ISI is still an open issue. Turbo equalization schemes are compared with a novel joint message-passing-based receiver, which is derived from a recently proposed joint algorithm for binary LDPC codes. Simulation results provide a performance evaluation of these systems over three different PR channels, together with an analysis of the trade-off between error-rate performance and complexity.


Introduction
High data rate modern communication systems are affected by intersymbol interference (ISI) effects, due to frequency selective fading for microwave wireless transmissions or to increased bit density for magnetic recording systems. The ISI channel may efficiently be represented by partial response (PR) discrete models, approximating the frequency selective behavior.
An optimal receiver for PR channels may be realized by a maximum likelihood sequence detector, using the estimated channel impulse response in a Viterbi algorithm [1,2]. The Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm [3] can replace a Viterbi processor if the soft output information is employed in a concatenated scheme, feeding an outer channel decoder [4]. Recently, turbo codes [5] and low-density parity-check (LDPC) codes [6] have been used to achieve bit error rate (BER) values lower than those provided by other typical error-correcting codes, like Reed-Solomon codes, in applications where ISI has to be very efficiently counteracted.
Specifically, LDPC codes [6] are linear codes characterized by a sparse parity-check (PC) matrix $H$, having $M$ rows and $N$ columns. LDPC codes can be classified as either regular or irregular depending on their row and column degree distributions. Regular LDPC codes have a parity-check matrix in which all rows (and all columns) have equal weight, while irregular LDPC codes do not exhibit this property. Non-binary (or q-ary) LDPC codes have codewords (and also a PC matrix) whose symbols are elements of the finite field GF($q$), with $q > 2$. These non-binary LDPC codes can allow significantly enhanced performance with respect to the binary case, by increasing the finite field dimension [7]. However, the decoding complexity is $O(Ntq^2)$, where $N$ is the block length, $t$ is the average column weight, and $q$ is the alphabet size [7,8]. Recent papers reveal that q-ary LDPC codes may be applied to magnetic recording channels, allowing reduced complexity schemes [9] and robustness to burst errors [10], making the q-ary solution comparable with the binary case.
*Correspondence: pietro.savazzi@unipv.it. Dipartimento di Ingegneria Industriale e dell'Informazione, University of Pavia, Pavia, Italy
Historically, the most popular scheme to improve the error-correction performance of an LDPC-coded system over channels affected by ISI has been to serially concatenate a soft-output detection algorithm and the binary LDPC decoder. However, a greater performance improvement can be achieved by incorporating the channel detector in the iterative decoder: this implies a turbo concatenation of the two system blocks, and several papers in the literature call that configuration turbo equalization (TE). Further, in [11-13], an LDPC-coded detection-and-decoding system implemented by a joint algorithm based on the message-passing (MP) algorithm is addressed.
In this article, we extend the joint detection-and-decoding scheme to q-ary LDPC codes. Furthermore, we compare the proposed approach to TE, paying particular attention to the properties of the detector and the decoder selected for each one. Some preliminary results of this study are presented in [14].
http://jwcn.eurasipjournals.com/content/2013/1/18
The article is organized as follows. The first sections introduce the system model for the different architectures that are discussed in the article, namely TE, first turbo equalization iteration (FTEI), and the proposed joint MP-based architecture. For the FTEI scheme, no extrinsic information is exchanged, since only the first iteration is performed, with separate soft detection and decoding.
Further, the computational complexity of each receiver scheme considered is discussed in a dedicated section. Finally, simulation results are given, and some remarks about future research development conclude the article.

System model
In this section, we analyze the performance of a novel receiver algorithm for q-ary LDPC-coded signals over PR channels, comparing its performance with those obtained by TE and FTEI schemes.
The basic system model is shown in Figure 1. The q-ary LDPC encoder output is a length-$N$ codeword $\xi = \{\xi_n\}_{n=1,\ldots,N}$, with $\xi_n \in GF(q)$, that satisfies

$$H \xi^T = 0, \tag{1}$$

where $H$ is the parity-check matrix. The binary representation of each codeword $\{\xi_n\}$ is transmitted by an antipodal binary pulse-amplitude modulator through a PR channel having a memory of length $\nu$ bits. Each receiver architecture that has been taken into account employs a q-ary LDPC decoder and a BCJR detector that can be either bit-based (BB) or symbol-based (SB).

BB detector
The BB detector is represented by the standard BCJR algorithm [3]. The symbol-wise a posteriori probabilities (APPs) are approximated applying the BCJR algorithm to the PR channel and then multiplying the APPs that form a symbol [15,16]. The symbol-wise APPs are passed to the q-ary LDPC decoder to initialize the a priori probabilities.
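The product approximation used by the BB detector can be illustrated with a minimal sketch. It assumes that the bit-wise APPs are available as probabilities $P(b_i = 1 \mid r)$ and that the symbol label is read MSB first; both conventions, as well as the function name, are illustrative assumptions and not taken from [15,16].

```python
from itertools import product


def symbol_apps_from_bit_apps(p1):
    """Approximate symbol-wise APPs as products of bit-wise APPs.

    p1[i] is P(b_i = 1 | r) for the p bits of one symbol (MSB first,
    an assumed convention). Returns a list of 2**p probabilities indexed
    by the integer label of the bit pattern.
    """
    apps = []
    for bits in product((0, 1), repeat=len(p1)):
        prob = 1.0
        for b, q1 in zip(bits, p1):
            prob *= q1 if b else (1.0 - q1)
        apps.append(prob)
    return apps


# Two bits (a 4-ary symbol): the first bit is likely 1, the second likely 0.
apps = symbol_apps_from_bit_apps([0.9, 0.2])
print(apps)
```

Because each factor is a valid bit distribution, the resulting symbol distribution is already normalized; what the approximation discards is the dependence among the bits of one symbol.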

SB detector
The SB detector [15,17] modifies the way the probability functions are updated when compared to the original BCJR algorithm. Hoeher [17] develops a method, called optimal subblock-by-subblock detector, in order to calculate the APP of a block of p consecutive bits. Cheng et al. [15] show that simplifications of the algorithm can be made for the case of the binary-input ISI channels, specifically for p ≥ ν.
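Let $a \in GF(q = 2^p)$ be an information symbol. It is possible to map each symbol in $GF(q = 2^p)$ to a distinct sequence of $p$ bits; in other terms, the binary representation of $a$ is

$$b_0^{p-1} = (b_0, b_1, \ldots, b_{p-1}). \tag{2}$$

Let $\zeta_\tau$ be the state at time $\tau$. The a posteriori probability that the information symbol $\xi$ equals $a$, conditioned on the received sequence $\mathbf{r}$, is

$$P(\xi = a \mid \mathbf{r}) = \sum_{(\zeta', \zeta)} \frac{P(\zeta_{\tau-p} = \zeta', \zeta_\tau = \zeta, \xi = a, \mathbf{r})}{P(\mathbf{r})}. \tag{3}$$

Equation (3) is obtained using Bayes' rule and the principle of total probability. By applying Bayes' rule and the Markov property that events after time $\tau$ only depend on the current state $\zeta_\tau$ and are independent of past observations [15], it is possible to write (3) as follows:

$$P(\xi = a \mid \mathbf{r}) = \frac{1}{P(\mathbf{r})} \sum_{(\zeta', \zeta)} \alpha_{\tau-p}(\zeta')\, \gamma^a_{(\tau-(p-1),\tau)}(\zeta', \zeta)\, \beta_\tau(\zeta),$$

where $\beta_\tau(\zeta)$ is the backward state metric, $\gamma^a_{(\tau-(p-1),\tau)}(\zeta', \zeta)$ is the branch transition probability, and $\alpha_{\tau-p}(\zeta')$ is the forward state metric. Using Bayes' rule and the fact that the symbol a priori probabilities are state-independent, the branch transition metric can be written as follows:

$$\gamma^a_{(\tau-(p-1),\tau)}(\zeta', \zeta) = P(r^\tau_{\tau-(p-1)} \mid \zeta_\tau = \zeta, \xi = a, \zeta_{\tau-p} = \zeta')\, P(\zeta_\tau = \zeta \mid \xi = a, \zeta_{\tau-p} = \zeta')\, P(\xi = a).$$

If state $\zeta$ at time $\tau$ is connected to state $\zeta'$ at time $\tau - p$ via the input sequence $\xi = a$, then $P(\zeta_\tau = \zeta \mid \xi = a, \zeta_{\tau-p} = \zeta') = 1$; otherwise $P(\zeta_\tau = \zeta \mid \xi = a, \zeta_{\tau-p} = \zeta') = 0$. The term $P(r^\tau_{\tau-(p-1)} \mid \zeta_\tau = \zeta, \xi = a, \zeta_{\tau-p} = \zeta')$ is a function of the channel characteristic; in a PR channel with additive white Gaussian noise the probability density function can be calculated as

$$P(r^\tau_{\tau-(p-1)} \mid \zeta_\tau = \zeta, \xi = a, \zeta_{\tau-p} = \zeta') = \frac{1}{(\pi N_0)^{p/2}} \exp\left(-\frac{1}{N_0} \sum_{i=0}^{p-1} \left(r_{\tau-(p-1)+i} - y_i\right)^2\right). \tag{5}$$

In (5), $N_0/2$ is the noise variance and $y_0^{p-1}$ is the PR channel output sequence that corresponds to the input sequence $a = b_0^{p-1}$. The forward state metric $\alpha_\tau(\zeta)$ and the backward state metric $\beta_\tau(\zeta)$ can be updated as in the original BCJR algorithm [3].

Figure 1 Block diagram of the basic system model for q-ary LDPC coded signals over binary-input PR channels.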
In the case of binary-input ISI channels with memory of length $\nu$, since the states represent subsequences of the input sequence [15], Equation (3) can be expressed as follows:

$$P(\xi = a \mid \mathbf{r}) = \frac{\beta_\tau(\zeta)}{P(\mathbf{r})} \sum_{\zeta'} \alpha_{\tau-p}(\zeta')\, \gamma^a_{(\tau-(p-1),\tau)}(\zeta', \zeta). \tag{6}$$

In (6), $\zeta$ is the state that corresponds to the shift register configuration after an input of $\nu$ bits. This simplification is valid only when $p \geq \nu$. Therefore, for the sake of generality, in this article we refer to the SB detecting structure described in [17].
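As a small numerical illustration of the Gaussian term that enters the branch transition metric, the sketch below evaluates the likelihood of a block of $p$ received samples given the noiseless PR outputs of one candidate symbol, for noise variance $N_0/2$ per sample. The function name and the sample values are hypothetical.

```python
import math


def block_likelihood(r, y, n0):
    """Gaussian likelihood of p received samples r given noiseless outputs y.

    Each sample is assumed to carry independent AWGN of variance n0 / 2,
    so the per-sample density is exp(-(r - y)**2 / n0) / sqrt(pi * n0).
    """
    p = len(r)
    dist2 = sum((ri - yi) ** 2 for ri, yi in zip(r, y))
    return (math.pi * n0) ** (-p / 2) * math.exp(-dist2 / n0)


# Hypothetical received block compared against two candidate output sequences.
received = [0.9, 2.1, 0.8, -0.1]
close = block_likelihood(received, [1.0, 2.0, 1.0, 0.0], n0=0.5)
far = block_likelihood(received, [-1.0, 0.0, 1.0, 2.0], n0=0.5)
print(close, far)
```

The candidate whose noiseless outputs lie closer to the received block obtains the larger likelihood, which is what lets the forward and backward recursions discriminate among symbols.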

Receiver architectures
The next subsections introduce the three detection-and-decoding architectures considered in this article: the FTEI, TE, and joint MP-based architectures, respectively. In the remainder of this section, we discuss the computational complexity of each proposed scheme.

TE
When TE and FTEI architectures are considered, a BCJR-based receiver defines the a priori probabilities that have to be provided to the q-ary LDPC decoder. These probabilities are computed following the BB and SB algorithms described in the previous section. The decoding procedure employs the MP algorithm, which is described in detail in [7,8].
Further, when TE is performed, extrinsic information is exchanged iteratively between the detector and the decoder until an LDPC codeword is found or a maximum number of iterations is reached [4,8,18,19]. It is worth recalling that, in the case of LDPC codes, convergence to a codeword is easily detected by the receiver, since it occurs when the parity-check equations are satisfied.
On the other hand, if only the FTEI is considered, no extrinsic information is exchanged. In other terms, detection and decoding are performed separately. Analyzing the error-rate performance of the FTEI architecture can provide an interesting insight into the role of the extrinsic information in decoding for BB and SB detection.
Finally, it is interesting to investigate how the error-rate performance of the TE architecture that employs BB detection can be affected by the correlation among the likelihoods input to the q-ary LDPC decoder. Thus, a BB-TE architecture using an interleaver has been analyzed as well. Specifically, the depth of the interleaver has been set to the value of the channel response length. It is worth noting that the correlation among consecutive bits in the case of a TE architecture using SB detection should be mitigated by the intrinsic interleaving function provided by the q-ary LDPC code.
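The control flow shared by the TE and FTEI schemes can be sketched as follows. The callables `detect` and `decode` are hypothetical placeholders for a BCJR detector and a q-ary LDPC decoder, and their signatures are assumptions made only for this illustration; setting `max_outer = 1` corresponds to the FTEI scheme, in which no extrinsic information is fed back.

```python
def turbo_equalize(detect, decode, received, max_outer):
    """Alternate detection and decoding for at most max_outer rounds.

    detect(received, prior) returns soft symbol information;
    decode(soft) returns (word, converged, extrinsic).
    Both are hypothetical stand-ins, not a specific library API.
    """
    prior = None  # no decoder feedback before the first round
    word = None
    for outer in range(1, max_outer + 1):
        soft = detect(received, prior)
        word, converged, extrinsic = decode(soft)
        if converged:  # all parity checks are satisfied
            return word, outer
        prior = extrinsic  # feed extrinsic information back to the detector
    return word, max_outer


# Toy stand-ins: the "decoder" converges in the third round.
def toy_detect(received, prior):
    return 0 if prior is None else prior + 1


def toy_decode(soft):
    return [soft], soft >= 2, soft


print(turbo_equalize(toy_detect, toy_decode, None, 10))
print(turbo_equalize(toy_detect, toy_decode, None, 1))
```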

Joint MP-based architecture
In this section, we consider an architecture that employs a joint detection-and-decoding system based on a q-ary MP algorithm.
In the joint MP-based architecture, the channel constraints are represented on the graph by a set of nodes called channel nodes [13]. The graph obtained is tripartite, as shown in Figure 2, where circles correspond to the variable (or symbol) nodes $\xi_\kappa$, squares to the parity-check nodes $z_\kappa$, and triangles to the channel nodes.

Figure 2
A graph that represents the channel constraints and the parity checks of the LDPC code. Circles represent variable nodes, squares represent parity-check nodes, and triangles represent channel nodes. For instance, a 4-ary code over the EPR4 channel is shown.
A parity-check node $i$ and a variable node $j$ are connected if the value of $H_{ij}$ is non-zero. A channel node $k$ is connected to the variable node $j$ if the channel response involves bits from the binary representation of $\xi_j$.
Following the notation in [7], let $N(i) = \{j : H_{ij} \neq 0\}$ be the set of symbols linked to check node $i$, and let $M(j) = \{i : H_{ij} \neq 0\}$ be the set of checks linked to symbol $j$. Moreover, let $L(k)$ be the set of the channel nodes connected to the $k$th symbol node and let $I(l)$ be the set of the symbol nodes connected to the channel node $l$.
Specifically, the edges between the channel nodes and the variable nodes in the tripartite graph can be represented by a square $N \times N$ adjacency matrix: if its entry in row $i$ and column $j$ is non-zero, an edge between the $i$th channel node and the $j$th variable node is drawn. Thus, $L(k)$ is the set of row indexes $l$ whose entry in column $k$ is non-zero, and $I(l)$ is the set of column indexes $k$ whose entry in row $l$ is non-zero. The expression of the adjacency matrix depends on the length $\nu$ of the channel memory and on the alphabet size $q$ of the code; an example for a 16-ary LDPC coded transmission over the EPR4 channel (i.e., $\nu = 3$) is given in (7).
Each element of the adjacency matrix has its own binary representation, depending on the channel response $h(D)$ and the alphabet size $q$. Specifically, the binary representation of each entry is a square $p \times p$ matrix, where $p = \log_2 q$. For example, since the EPR4 channel response is $h_{EPR4}(D) = 1 + D - D^2 - D^3$, the binary representation of each entry of the matrix in (7) can be written as in (8), $\forall i = 1, \ldots, N$.
At this point, for every $a \in GF(q)$ we set up two quantities, $Q^a_{ij}$ and $R^a_{ij}$, for each non-zero element of the parity-check matrix $H$. The first is defined as the probability that the $j$th symbol is $a$, given the information flowing from all the checks except the $i$th one and from all the involved channel nodes. On the other hand, $R^a_{ij}$ is the probability of check $z_i$ being satisfied if $\xi_j$ is equal to $a$.
Analogously, let $S^a_{kl}$ be the probability that the symbol $l$ is $a$, given the information obtained from the channel nodes other than the $k$th one and from all the involved check nodes. Further, $T^a_{kl}$ is the probability of channel node $k$ being satisfied if symbol $l$ is considered fixed at $a$. Figure 3 shows these quantities on the tripartite graph. Finally, let $X[n]$ be the value of $X$ at the $n$th iteration of the iterative algorithm.
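To make the connectivity between channel nodes and symbol nodes concrete, the following sketch lists the set $I(l)$ for each channel node. It rests on an assumption that is not stated in the text: channel node $l$ is taken to collect the $p$ channel outputs aligned with the bits of symbol $l$, so it touches every symbol that contributes at least one of the bits within the channel memory. The function name is hypothetical.

```python
def channel_node_neighbors(n_symbols, p, nu):
    """Return I(l) for every channel node l under the grouping assumption.

    Symbol j carries bits p*j ... p*j + p - 1. The outputs grouped in channel
    node l depend on bits p*l - nu ... p*l + p - 1 (memory of nu bits), so the
    node is linked to every symbol that owns one of those bits.
    """
    neighbors = []
    for l in range(n_symbols):
        first_bit = max(p * l - nu, 0)  # the sequence starts at bit 0
        first_symbol = first_bit // p
        neighbors.append(list(range(first_symbol, l + 1)))
    return neighbors


# 16-ary symbols (p = 4) over EPR4 (nu = 3): each node sees two symbols.
print(channel_node_neighbors(4, 4, 3))
# 4-ary symbols (p = 2) over EPR4: the memory spans more than one symbol.
print(channel_node_neighbors(4, 2, 3))
```

Under this assumption the non-zero pattern of the adjacency matrix is banded, and the band widens as the channel memory grows relative to the symbol length.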
The joint MP algorithm works as follows.

Initialization
The channel inputs are i.i.d., so all state transitions are initially equally likely [13]. The a priori probabilities are then initialized as

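Updating $S^a_{ij}$
For every iteration of the MP algorithm, the message $S^a_{ij}$ that symbol $j$ sends to channel node $i$ should be the parent's belief that it is in state $a$, according to all other children. Therefore, $S^a_{ij}[n]$ can be expressed as

$$S^a_{ij}[n] = \mu_{ij} \prod_{k \in M(j)} R^a_{kj} \prod_{l \in L(j) \setminus i} T^a_{lj},$$

where $R^a_{kj}$ and $T^a_{lj}$ are the most recently computed messages and $\mu_{ij}$ is a normalization factor chosen so that $\sum_a S^a_{ij}[n] = 1$.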

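Updating $Q^a_{ij}$
The message $Q^a_{ij}$ that symbol node $j$ sends to the parity-check node $i$ should be the parent's belief that it is in state $a$, according to all other children. Thus, $Q^a_{ij}$ at the $n$th iteration is updated as follows:

$$Q^a_{ij}[n] = \mu'_{ij} \prod_{k \in M(j) \setminus i} R^a_{kj} \prod_{l \in L(j)} T^a_{lj},$$

where $R^a_{kj}$ and $T^a_{lj}$ are the most recently computed messages and $\mu'_{ij}$ is a normalization factor chosen so that $\sum_a Q^a_{ij}[n] = 1$.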
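Both symbol-node updates follow the same pattern: multiply the messages arriving from every neighbor except the recipient, then normalize. A minimal sketch is given below; the dictionary-based message store and the node labels are hypothetical choices for illustration only.

```python
def variable_to_node_message(incoming, exclude):
    """Normalized product of all messages reaching a symbol node except one.

    incoming maps a neighbor identifier (check or channel node) to a list of
    q probabilities; exclude is the neighbor that will receive the result.
    Assumes the product is not identically zero.
    """
    q = len(next(iter(incoming.values())))
    out = [1.0] * q
    for node, msg in incoming.items():
        if node == exclude:
            continue
        for a in range(q):
            out[a] *= msg[a]
    total = sum(out)
    return [x / total for x in out]


# One 4-ary symbol node with two check neighbors and one channel neighbor.
incoming = {
    "z0": [0.7, 0.1, 0.1, 0.1],
    "z1": [0.25, 0.25, 0.25, 0.25],
    "c0": [0.4, 0.4, 0.1, 0.1],
}
to_check_z0 = variable_to_node_message(incoming, exclude="z0")
to_channel_c0 = variable_to_node_message(incoming, exclude="c0")
print(to_check_z0, to_channel_c0)
```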

Updating $T^a_{ij}$
The message that channel node $i$ sends to symbol node $j$ should be the probability of channel node $i$ being satisfied if $\xi_j$ were in state $a$. Thus, it is necessary to sum, over all the configurations $\xi$ for which the channel constraint is satisfied and the $j$th symbol is in state $a$, the probability of the configuration, as given in (13), where $\{r_\mu\}_{\mu \in I(i)}$ and $\{\xi_\mu\}_{\mu \in I(i)}$ represent the received symbols and the symbol nodes that are connected to the channel node $i$. The probability of the $i$th channel constraint being satisfied is either 0 or 1 for any given configuration $\xi$. A channel node is satisfied if the configuration of the variable nodes is such that the given variable node $\xi_j$ is set to the given value $a \in GF(q)$.
Moreover, since the channel nodes are part of a tripartite graph, their contribution to the detection process is determined by the information provided by the other nodes in the graph in the previous iterations as well. The channel nodes are only connected to variable nodes. Therefore, the information provided by the previous iterations of the joint scheme comes from the messages $S_{ij}$. Specifically, this contribution is provided by the term $\prod_{k \in I(i) \setminus j} S^{\xi_k}_{ik}[n-1]$ in (13), according to the MP algorithm [20].
The way the value of $T^a_{ij}$ has to be computed depends on the considered detection scheme. Specifically, when an SB detector is employed, $T^a_{ij}$ can be expressed as in (14), and the values of $\alpha$ and $\beta$ can be calculated as in (15). The expressions in (15) are directly derived from the comments and observations in [15] and from the graph connections. In fact, the term $\beta_j(\zeta_R)$ is the backward state metric, whereas the term $\alpha_{j-1}(\zeta_L)$ is the forward state metric. $\theta^L_\kappa$ and $\theta^R_\kappa$ represent the possible configurations of the input that lead to the states $\zeta_L$ and $\zeta_R$, respectively. $\zeta_L$ and $\zeta_R$ are linked by convolving the binary representation of the q-ary symbol $\xi_j$ that has been set to $a$ with the given channel response. It is worth noting that, in the case of BB detection, $\alpha_{j-1}(\zeta_L)$, $\beta_j(\zeta_R)$, and $\gamma^a_{(j-1,j)}$ can be obtained by properly multiplying the binary APPs resulting from the binary BCJR algorithm. Each configuration $\theta^L_\kappa$ or $\theta^R_\kappa$ can be constituted by one or more symbols, depending on the length $\nu$ of the channel memory.
It is easy to notice that the $S$ contributions related to $\alpha$ represent the causal part of the ISI effect, while the $S$ contributions related to $\beta$ represent the anti-causal part of it. Since the channel constraints for a given PR channel are well defined, it is natural to wonder whether a more efficient way to compute the probability expressed in (13) exists.
$T^a_{ij}$ can efficiently be calculated by treating the partial sums of a parity check as the states in a Markov chain [8]; therefore, $T^a_{ij}$ can be written in terms of a forward contribution $F_T$ and a backward contribution $B_T$. In order to define $F_T$ and $B_T$, let $\varphi^v_u(X)$ be the law that transforms $X$ (living in GF($u$)) into its $v$-ary counterpart. In the corresponding probabilities, each value $t$ is mapped through $\varphi^q_2$ applied to the product between the $p \times p$ binary representation of the adjacency-matrix entry with indexes $(\kappa, j)$ and $\varphi^2_q(t)$. That is, $s$ and $t$ have to be chosen such that the convolution of those values and the channel response leads the system to the state $a$. Here $i, j$ are successive indexes living in $I(\kappa)$, with $j > i$ for the $F_T$ contribution and $i > j$ for the $B_T$ part.
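The transformation between a q-ary symbol and its binary counterpart, combined with a multiplication by a $p \times p$ binary matrix, can be sketched as follows. The MSB-first bit ordering, the integer labeling of the GF($q$) elements, and the example matrix are assumptions made for illustration; the matrix is not the one associated with any specific channel in this article.

```python
def to_bits(t, p):
    """Map the integer label of a GF(2**p) symbol to p bits, MSB first."""
    return [(t // 2 ** (p - 1 - i)) % 2 for i in range(p)]


def from_bits(bits):
    """Map a list of bits, MSB first, back to the integer symbol label."""
    value = 0
    for b in bits:
        value = value * 2 + b
    return value


def apply_binary_matrix(mat, t, p):
    """Binary matrix times the bit vector of t over GF(2), mapped back to q-ary."""
    bits = to_bits(t, p)
    out = [sum(m * b for m, b in zip(row, bits)) % 2 for row in mat]
    return from_bits(out)


# Hypothetical 2 x 2 binary matrix that swaps the two bits of a 4-ary symbol.
swap = [[0, 1], [1, 0]]
print([apply_binary_matrix(swap, t, 2) for t in range(4)])
```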

Updating $R^a_{ij}$
The message that check node $i$ sends to symbol node $j$ should be the probability of check node $i$ being satisfied if $\xi_j$ were in state $a$. As in Section 3.2.4, using the laws of probability, $R^a_{ij}$ can be expressed as

$$R^a_{ij} = \sum_{\xi:\, \xi_j = a} P(z_i \mid \xi) \prod_{k \in N(i) \setminus j} Q^{\xi_k}_{ik}.$$

The probability $P(z_i \mid \xi)$ of the check being satisfied is either 0 or 1 for any given configuration $\xi$, as in the previous step. $R^a_{ij}$ can be efficiently calculated by treating the partial sums of a parity check as the states in a Markov chain [8]; therefore, $R^a_{ij}$ can be written in terms of a forward contribution $F_R$ and a backward contribution $B_R$, where $i, j$ are successive indexes living in $N(\kappa)$, with $j > i$ for the $F_R$ contribution and $i > j$ for the $B_R$ part.
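The idea of treating partial sums of a parity check as states of a Markov chain can be illustrated with a small forward-backward sketch over GF(4). Everything below is an assumption-laden illustration: the field is built with the polynomial $x^2 + x + 1$, symbols are labeled by integers 0 to 3, and the function names are hypothetical; it is not the exact formulation of [8].

```python
def gf_mul(a, b, p=2, poly=0b111):
    """Multiply two elements of GF(2**p), reduced modulo poly (x^2 + x + 1)."""
    result = 0
    for _ in range(p):
        if b % 2:
            result ^= a
        b //= 2
        a *= 2
        if a >= 2 ** p:
            a ^= poly
    return result


def extend(dist, h, msg, p, poly):
    """Distribution of (partial sum + h * symbol), with symbol drawn from msg."""
    q = 2 ** p
    out = [0.0] * q
    for s in range(q):
        for a in range(q):
            out[s ^ gf_mul(h, a, p, poly)] += dist[s] * msg[a]
    return out


def check_node_update(coeffs, msgs, p=2, poly=0b111):
    """Messages from one check to each neighbor via forward/backward sums."""
    q, d = 2 ** p, len(coeffs)
    delta = [1.0] + [0.0] * (q - 1)  # an empty partial sum equals zero
    fwd = [delta]  # fwd[k]: distribution of the sum over neighbors before k
    for k in range(d):
        fwd.append(extend(fwd[k], coeffs[k], msgs[k], p, poly))
    bwd = [None] * d + [delta]  # bwd[k + 1]: sum over neighbors after k
    for k in range(d - 1, -1, -1):
        bwd[k] = extend(bwd[k + 1], coeffs[k], msgs[k], p, poly)
    out = []
    for k in range(d):
        r = [0.0] * q
        for a in range(q):
            ha = gf_mul(coeffs[k], a, p, poly)
            # The check holds when left + h*a + right = 0, i.e. right = left ^ ha.
            r[a] = sum(fwd[k][s] * bwd[k + 1][s ^ ha] for s in range(q))
        out.append(r)
    return out


# Two neighbors are known exactly (symbols 1 and 2); the third is unknown.
msgs = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.25] * 4]
print(check_node_update([1, 1, 1], msgs)[2])
```

Each forward or backward step costs on the order of $q^2$ operations, so a check of degree $d$ costs on the order of $d q^2$ rather than growing exponentially with $d$, which is the motivation for the Markov-chain view.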

Tentative decoding
A tentative decoded codeword $\hat{\xi}$ can be derived using the following expression:

$$\hat{\xi}_j = \arg\max_{a \in GF(q)} \prod_{k \in M(j)} R^a_{kj} \prod_{l \in L(j)} T^a_{lj}.$$

If $\hat{\xi} = \{\hat{\xi}_j\}_{j=1,\ldots,N}$ satisfies (1), then the decoding process is stopped, declaring a success; otherwise the algorithm iterates from Section 3.2.2. A failure is declared if the codeword is not found after reaching a fixed maximum number of iterations.
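The stopping rule can be sketched as a hard decision followed by a parity check over GF($q$). The example assumes GF(4) built with the polynomial $x^2 + x + 1$, integer symbol labels, and a small hypothetical parity-check matrix; none of these come from the simulated code.

```python
def gf_mul(a, b, p=2, poly=0b111):
    """Multiply two elements of GF(2**p), reduced modulo poly (x^2 + x + 1)."""
    result = 0
    for _ in range(p):
        if b % 2:
            result ^= a
        b //= 2
        a *= 2
        if a >= 2 ** p:
            a ^= poly
    return result


def tentative_decision(beliefs):
    """Pick, for each symbol, the value with the largest belief."""
    return [max(range(len(b)), key=b.__getitem__) for b in beliefs]


def satisfies_checks(H, word, p=2, poly=0b111):
    """Return True when every row of H gives a zero syndrome for word."""
    for row in H:
        acc = 0
        for h, x in zip(row, word):
            if h:
                acc ^= gf_mul(h, x, p, poly)  # addition in GF(2**p) is XOR
        if acc:
            return False
    return True


# Hypothetical 2 x 3 parity-check matrix over GF(4) and per-symbol beliefs.
H = [[1, 2, 0], [0, 1, 1]]
beliefs = [[0.1, 0.1, 0.7, 0.1], [0.1, 0.6, 0.2, 0.1], [0.2, 0.5, 0.2, 0.1]]
word = tentative_decision(beliefs)
print(word, satisfies_checks(H, word))
```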

Computational complexity
In this section, we discuss the computational complexity of each aforementioned receiver architecture.
The number of operations required by each system is given by the complexity of the pair of detector and decoder used. Table 1 shows the computational complexity of each receiver for a length-$N$ codeword of an LDPC code over GF($q$). Specifically, $p = \log_2 q$ and $\nu$ represents the length of the memory of the considered ISI channel.
Since in this article we are taking into account a classic "flooding" decoding scheme [7], the computational complexity of a q-ary LDPC decoder is $O(Ntq^2)$, as already stated in this article; $t$ is the average column weight. It is proper to point out that there exist specific decoding schemes, such as those based on the layered belief propagation (BP) algorithm [21], that can lower the aforesaid computational cost.
However, the overall complexity of a given receiver depends on the number of operations required by the detector and the decoder separately and on the way detector and decoder are linked in the architecture that is taken into account.
The systems employing an SB detector typically show lower computational complexity than the corresponding architectures employing a BB detecting scheme. To illustrate this point, consider a trellis of $p$ stages, which corresponds to an input sequence of $p$ bits. In these conditions, the number of paths that have to be set while employing a BB detector is $p \cdot 2^{\nu+1}$. On the other hand, an SB detector in the same conditions needs only $\sum_{i=0}^{p-1} 2^{\chi(\nu-i)}$ patterns to work, where $\chi(t) = t$ if $t \geq 0$ and $\chi(t) = 0$ otherwise. Therefore, the BB detecting scheme requires a larger number of operations than the SB detecting scheme.
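The two counts can be evaluated numerically for the memory lengths considered in this article. The sketch reads the SB expression as a sum over $i = 0, \ldots, p-1$ of $2^{\chi(\nu - i)}$, which is an interpretation of the formula; the function names are hypothetical.

```python
def chi(t):
    """chi(t) = t for t >= 0 and 0 otherwise."""
    return max(t, 0)


def bb_paths(p, nu):
    """Paths handled by the BB detector over a trellis of p stages."""
    return p * 2 ** (nu + 1)


def sb_patterns(p, nu):
    """Patterns needed by the SB detector, read as a sum over the p stages."""
    return sum(2 ** chi(nu - i) for i in range(p))


# 16-ary symbols (p = 4) over channels with memory 2, 3, and 4.
for name, nu in (("PR4", 2), ("EPR4", 3), ("EEPR4", 4)):
    print(name, bb_paths(4, nu), sb_patterns(4, nu))
```

Under this reading, the BB count exceeds the SB count on all three channels, in agreement with the statement above.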
The FTEI architecture (that is, TE without feedback) requires a number of operations that is simply the sum of those needed by the q-ary LDPC decoder and the employed detector.
On the other hand, the TE scheme shows a computational complexity that is proportional to $N_{TI}$, that is, the number of iterations between detector and decoder. Therefore, the FTEI architecture shows a computational complexity $N_{TI}$ times lower than that of the TE.
Finally, the joint MP architecture requires a number of operations that is related to the sum of those necessary for the detection and decoding steps. Specifically, the computational complexity of the joint MP-based scheme depends on the convergence rate of the system. That is, the number of operations required by the architecture decreases when the detection-and-decoding scheme finds a codeword before reaching the maximum number of iterations of the MP algorithm.
In other terms, for a given detection scheme and a given q-ary decoder, the joint MP-based architecture shows a computational complexity that is typically lower than that of TE and higher than that required by an FTEI system.

Simulation results
For all the considered schemes, a 16-ary LDPC code with coding rate $R = 1 - M/N$ equal to 8/9 is used. The codeword blocklength is set to 1,152 symbols (that is, 4,608 bits) and the average column weight is set to 2.88. The variable-node degree distribution, following the notation introduced in [22,23], was $\lambda_2 = 0.112$ and $\lambda_3 = 0.8889$, where $\lambda(x) = \sum_{i=2}^{d_v} \lambda_i x^{i-1}$, and $d_v$ is the maximum symbol-node degree. The q-ary LDPC code has been constructed using quasi-regular PC matrices [23] generated by a modified progressive edge-growth algorithm [24] that maximizes the minimum space distance [9,10].
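The code parameters above can be cross-checked with a few lines of arithmetic. The sketch assumes a full-rank parity-check matrix, so that the number of information symbols is $N - M$; the variable names are illustrative.

```python
from fractions import Fraction

N = 1152              # codeword length in 16-ary symbols
p = 4                 # bits per symbol, since q = 16 = 2**4
rate = Fraction(8, 9)

M = int(N * (1 - rate))   # parity-check rows implied by R = 1 - M/N
K = N - M                 # information symbols, assuming H has full rank
bits = N * p              # codeword length in bits

print(M, K, bits)
```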
The maximum number of iterations for the q-ary LDPC decoder and for the joint MP algorithm of subsection 3.2 has been set to 25. Moreover, the maximum number of iterations between the detector and the q-ary LDPC decoder for the TE scheme has been set to 10.
We consider three different PR channels, namely PR4, EPR4, and EEPR4, having different memory lengths $\nu$ and channel responses $h(D)$. The channel memory length $\nu$ is set to 2, 3, and 4 for PR4, EPR4, and EEPR4, respectively. The PR4 channel used in the simulations is not the standard one, $1 - D^2$, usually used for magnetic recording channel characterization.
Figures 3, 4, 5, and 6 show the performance comparison in terms of BER. The FTEI, TE, and joint MP BER results are plotted using, respectively, the square, circle, and triangle markers. Solid lines refer to SB detection, while dashed ones represent BB soft metrics. The error-rate performance of the BB-TE architecture using an interleaver of depth set to the value of the channel response length $\nu + 1$ is shown as a dotted line with a circle marker. Figures 7 and 8 show how the performance gap between the joint MP and the TE schemes may be reduced by increasing the maximum number of allowed iterations for the joint MP algorithm described in Section 3.2.
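A binary-input PR channel of this kind can be simulated with a short sketch. It uses the EPR4 response $1 + D - D^2 - D^3$ quoted earlier in the article; the bit-to-amplitude mapping (0 to +1, 1 to -1), the all-zero initial channel memory, and the function name are assumptions made for illustration.

```python
import random


def pr_channel(bits, h, n0, rng):
    """Pass antipodal symbols through a PR channel and add white Gaussian noise.

    h holds the taps of h(D); n0 / 2 is the noise variance per sample.
    Samples before the start of the sequence are taken as absent (assumption).
    """
    x = [1.0 - 2.0 * b for b in bits]  # assumed mapping: 0 -> +1, 1 -> -1
    sigma = (n0 / 2.0) ** 0.5
    out = []
    for t in range(len(x)):
        y = sum(h[k] * x[t - k] for k in range(len(h)) if t - k >= 0)
        out.append(y + rng.gauss(0.0, sigma))
    return out


EPR4 = [1, 1, -1, -1]  # taps of 1 + D - D^2 - D^3
rng = random.Random(0)
noiseless = pr_channel([0, 0, 0, 0], EPR4, 0.0, rng)
noisy = pr_channel([0, 1, 1, 0, 1, 0, 0, 1], EPR4, 0.5, rng)
print(noiseless)
print(noisy)
```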
The systems employing an SB detector largely outperform the architectures employing a BB one. A reason for this might be that the BB detection approach may include invalid trellis paths in the calculation [15]. It has been proved [25] that error propagation in turbo decoding is intensified by the feedback injection. This effect is highlighted in the error floor region, where the error resilience of turbo codes is typically weak [4].
Feedbacks in q-ary environments are even more harmful in case of MP decoding. In fact, a misleading computation of the input probability distribution affects groups of bits at once [8]. Therefore, a decoder failure occurs faster than in the binary case. Thus, it is hardly surprising that the FTEI employing BB detection shows better error-rate performance than TE employing BB detection, since BB-TE iterates the aforesaid information between detector and decoder.
Specifically, the minimum performance gain of the systems employing an SB detector over their respective BB counterparts goes from about 0.3 to about 0.75 dB in terms of signal-to-noise ratio (SNR), depending on the PR channel that is taken into account. This behavior highlights how the ratio $p/(\nu+1)$ plays an important role in the error-rate performance of the described algorithms.
On the other hand, the BB-TE architecture that employs a $(\nu + 1)$-depth interleaver strongly outperforms the other architectures using BB detection. Specifically, its error-rate performance can be compared to those provided by the SB FTEI and joint MP-based architectures. Apparently, the aforesaid architecture takes advantage of the low correlation among the likelihoods of consecutive bits at the input of the q-ary decoder provided by the interleaver. However, the SB-TE architecture still outperforms the BB-TE architecture with interleaver on every channel that has been taken into account. It appears that the correlation introduced by the PR channels can be well counteracted by q-ary decoding when it is fed by SB detection. This effect is consistent with the results obtained in different fields that investigate the role of interleaving in BB or SB synchronization [26,27].
Let us take a closer look at the error-rate performance of the considered receiver architectures. TE employing SB detection outperforms the SB-FTEI and SB joint MP-based architectures, as well as every architecture employing BB detection. Specifically, the gain of SB-TE with respect to the architectures employing a BB detection scheme is at least 0.55 dB. This result further highlights the influence of the computation of the soft input to the q-ary LDPC decoder on the extrinsic information required in the turbo architecture. Moreover, the performance gap between the SB-TE architecture and every BB receiver is maximum for $p/(\nu+1) = 1$.
Indeed, the ratio $p/(\nu+1)$ influences the error-rate performance of the joint MP-based architecture. In fact, the joint MP-based architecture employing BB detection outperforms both BB-FTEI and BB-TE without interleaver. However, on the PR4 and EEPR4 channels, where $p/(\nu+1) \neq 1$, the gains achieved in BER are very small. On the other hand, it appears that as $p/(\nu+1)$ grows, the performance gap between the SB-TE and the SB joint MP-based architectures shrinks. This could be due to a better match between the channel memory length and the q-ary symbol length in bits, for $p/(\nu+1) \geq 1$. Finally, an interesting question is raised by the behavior of SB-FTEI with respect to SB-TE. In fact, SB-TE outperforms SB-FTEI by 0.6 dB on each PR channel that has been taken into account. This is a result that might require a deeper investigation.

Convergence analysis
In Figure 9, a simulation-based convergence analysis of the joint MP-based architecture is shown. The maximum number of iterations is set to 25, and the average number of iterations required to converge is plotted versus the corresponding SNR. This does not represent a general proof or an exhaustive analysis of convergence, but it suffices to highlight how such a scheme needs fewer iterations as the SNR increases. Applying density evolution would need a deeper investigation, since its application to non-binary codes could be very difficult and computationally cumbersome [28,29].

Conclusions
The joint MP algorithm for ISI channels has been extended to q-ary LDPC codes and has been compared to TE architectures over some PR channels.
Simulation results for a 16-ary LDPC code over three different PR channels showed that the best performance can be achieved by using a turbo equalizer whose detection is symbol-based. Consequently, such an architecture appears to offer the best performance in an environment affected by ISI where good error-correction capability is desirable.
We observed that, in general, symbol-based detection provides better performance than bit-based detection. Furthermore, the proposed extension to q-ary codes of the joint detection algorithm could represent a good trade-off between performance and complexity with respect to TE and FTEI solutions. Simulation results of a TE scheme that employs BB detection and takes advantage of an interleaver provide an interesting insight into the role of q-ary decoding in counteracting the correlation induced by the PR channel. This effect can be a subject for deeper investigation in future works.
Ongoing research that promises efficiency at the receiver end includes the analysis of different structures for the joint MP-based architecture employing an SB BCJR algorithm, operating on the channel constraints. Future directions for research could focus on the behavior of the analyzed schemes over different channels (like the magnetic recording channel) and with different LDPC codes, having different codeword length or degree-distribution profile as in [30].
In order to complete the analysis of efficient receivers over PR channels, a study of the influence of the ratio $p/(\nu+1)$ has to be performed. Further, the detection method best suited to other decoders (such as those based on layered BP algorithms [21]) remains to be investigated. Finally, an optimization of the construction of q-ary LDPC codes has to be taken into account, starting from the results provided in [31,32], trying to minimize the cycle effects induced in the tripartite graph by the channel nodes. Following this direction, another interesting line of research could be the optimization of decoding for higher-order modulation over finite-length memory channels, starting from [33,34].