A low-complexity maximum likelihood decoder for tail-biting trellis

Tail-biting trellises are defined on a circular time axis and can have a smaller number of states than equivalent conventional trellises. Existing circular Viterbi algorithms (CVAs) on the tail-biting trellis are non-convergent and sub-optimal. In this study, we show that the net path metric of each tail-biting path is lower-bounded during the decoding process of the CVA. This property can be applied to remove unnecessary iterations of the CVA and results in a convergent maximum likelihood (ML) decoder. Simulation results show that the proposed algorithm exhibits higher decoding efficiency compared with other existing ML decoders.


Introduction
For linear block codes, conventional trellis and tail-biting trellis representations have attracted a great deal of attention over the past decades [1][2][3]. Trellis representations not only reveal the code structure but also lead to efficient trellis-based decoding algorithms [4][5][6][7][8]. For a given linear block code, the number of states in its tail-biting trellis can be as low as the square root of the number of states in its minimal conventional trellis [1,8]. For example, for the (24, 12) extended Golay code, the maximum number of states in its conventional trellis is 512 [8], while the maximum number of states in its time-varying tail-biting trellis is only 16 [1].
In addition, the tail-biting technique has been widely used in convolutional encoders to eliminate the rate loss caused by known tail bits. For example, both the Worldwide Interoperability for Microwave Access (WiMAX) [9] and Long Term Evolution [10] standards adopt tail-biting convolutional codes in their control or broadcast channels. Consequently, a maximum likelihood (ML) decoder with high decoding efficiency on tail-biting trellises is important and desirable for studying tail-biting codes.
Both the Viterbi algorithm and the bidirectional efficient algorithm for searching code trees (BEAST) can achieve ML decoding on conventional trellises [8]. For a tail-biting trellis, because there is no a priori knowledge of the starting state, the Viterbi and BEAST decoders have to perform an exhaustive search on the tail-biting trellis to find the ML codeword. BEAST can be more efficient if it is applied to the conventional trellis obtained by reducing the tail-biting code generator matrix to the minimum span form [8].
Another kind of decoder, the two-phase ML decoder, has been proposed to reduce the decoding complexity on tail-biting trellises [5,6]. This kind of algorithm performs Viterbi searches on the tail-biting trellis in the first phase and records the accumulated path metric of each path at every section for use in the second phase. In the second phase, heuristic searches are performed based on the results obtained in the first phase. Because the two-phase decoder combines two distinct algorithms, it is difficult to use in practical applications.
The circular Viterbi algorithm (CVA)-based decoder greatly reduces the implementation complexity of a decoder for tail-biting trellises and provides near-optimal block error rate performance. However, the decoding process of the CVA is non-convergent and sub-optimal [4,7]. In this paper, we introduce a CVA-based ML decoder for tail-biting trellises. In this algorithm, a lower bound on the net path metric of each tail-biting path is obtained and used to exclude impossible starting state candidates, which makes the CVA convergent. In addition, the net path metrics of survivor paths are used to terminate redundant searches without performing a full Viterbi iteration.
http://jwcn.eurasipjournals.com/content/2013/1/130
The remainder of this article is organized as follows. Section 2 presents a detailed description of the algorithm, Section 3 demonstrates its performance through simulations, and Section 4 concludes the paper.

Variable definition
An example of a tail-biting trellis is shown in Figure 1; it has eight sections with four states at each section. For a tail-biting trellis with $L$ sections, denote by $S_l$ the set of states at location $l$, where $0 \le l \le L$. By the definition of tail-biting trellises, $S_0 \equiv S_L$. Any path that starts from and terminates at the same state forms a tail-biting path. All tail-biting paths that start from the same state constitute the sub-tail-biting trellis of this state. In Figure 1, the branches drawn with thick solid lines form a tail-biting path of state '01', and all solid lines compose the sub-tail-biting trellis of state '01'.
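To make the notion of a sub-tail-biting trellis concrete, the Python sketch below enumerates, for each starting state, the input sequences whose paths return to that state after $L$ sections. It assumes a hypothetical 4-state rate-1/2 feedforward convolutional code with octal generators {7, 5} (not necessarily the code behind Figure 1) and $L = 8$; the function names are illustrative only.

```python
from itertools import product


def build_trellis(gens, m):
    """Next-state and output tables of a feedforward rate-1/len(gens) code.

    The state holds the last m input bits (most recent bit in the MSB) and the
    MSB of each octal generator taps the current input (assumed convention).
    """
    n_states = 1 << m
    nxt = [[0, 0] for _ in range(n_states)]
    out = [[(), ()] for _ in range(n_states)]
    for s in range(n_states):
        for u in (0, 1):
            reg = (u << m) | s                 # current input followed by m past inputs
            nxt[s][u] = reg >> 1               # drop the oldest input
            out[s][u] = tuple(bin(reg & g).count("1") & 1 for g in gens)
    return nxt, out


def sub_tail_biting_paths(start, L, nxt):
    """Input sequences whose path leaves `start` at location 0 and returns to it at location L."""
    paths = []
    for bits in product((0, 1), repeat=L):
        s = start
        for u in bits:
            s = nxt[s][u]
        if s == start:                         # S_0 and S_L are the same set of states
            paths.append(bits)
    return paths


nxt, out = build_trellis((0o7, 0o5), m=2)      # hypothetical 4-state example code
counts = {s: len(sub_tail_biting_paths(s, 8, nxt)) for s in range(4)}
print(counts)   # {0: 64, 1: 64, 2: 64, 3: 64}
```

For this feedforward code the last two inputs fix the ending state, so every state owns $2^{L-2} = 64$ tail-biting paths, and the four sub-tail-biting trellises together contain all $2^8$ codewords.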
A conventional CVA-based decoder performs Viterbi iterations around the tail-biting trellis to find the optimal tail-biting path. This algorithm exploits the circular property of tail-biting trellises: the path metrics accumulated in the ending states of the trellis are used to initialize the path metrics in the starting states for a new iteration, until a predefined termination condition is fulfilled. In the following, we describe the decoding process of the CVA for tail-biting convolutional codes; the decoding process for general tail-biting trellises can be obtained similarly.
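A minimal Python sketch of the conventional CVA loop described above follows. It assumes a hypothetical 4-state rate-1/2 code with octal generators {7, 5}, BPSK mapping 0 → +1 and 1 → −1, the weighted Hamming distance of [8] as the branch metric, and a WAVA-style stopping rule (stop once the survivor of the best ending state is a tail-biting path, or after `max_iter` iterations). The stopping rule and all names are assumptions for illustration, not the exact decoders of [4,7].

```python
import math


def build_trellis(gens, m):
    """Next-state/output tables; state = last m inputs, MSB = most recent (assumed)."""
    n = 1 << m
    nxt = [[((u << m) | s) >> 1 for u in (0, 1)] for s in range(n)]
    out = [[tuple(bin(((u << m) | s) & g).count("1") & 1 for g in gens) for u in (0, 1)]
           for s in range(n)]
    return nxt, out


def branch_metric(r_pair, bits):
    """Weighted Hamming distance of one section; BPSK maps 0 -> +1, 1 -> -1 (assumed)."""
    return sum(0.5 * (abs(r) - r * (1 - 2 * b)) for r, b in zip(r_pair, bits))


def viterbi_pass(r, init, nxt, out):
    """One Viterbi iteration over all L sections, starting from the metrics `init`.

    Returns the ending metrics, the survivor starting states beta(s), and the survivor
    input sequences; states whose metric is math.inf are treated as unreachable.
    """
    n = len(init)
    metric, beta, inputs = list(init), list(range(n)), [[] for _ in range(n)]
    for r_l in r:
        new_metric, new_beta, new_inputs = [math.inf] * n, [None] * n, [None] * n
        for s in range(n):
            if metric[s] == math.inf:
                continue
            for u in (0, 1):
                t = nxt[s][u]
                cand = metric[s] + branch_metric(r_l, out[s][u])
                if cand < new_metric[t]:       # keep only the survivor entering t
                    new_metric[t], new_beta[t], new_inputs[t] = cand, beta[s], inputs[s] + [u]
        metric, beta, inputs = new_metric, new_beta, new_inputs
    return metric, beta, inputs


def cva(r, nxt, out, max_iter=20):
    """Conventional CVA: wrap the ending metrics around until the best survivor is tail-biting."""
    init = [0.0] * len(nxt)
    for _ in range(max_iter):
        end, beta, inputs = viterbi_pass(r, init, nxt, out)
        best = min(range(len(end)), key=end.__getitem__)
        if beta[best] == best:                 # survivor starts and ends in the same state
            return inputs[best], True
        init = end                             # M_0^{i+1}(s) <- M_L^i(s)
    return inputs[best], False                 # gave up: result is not a tail-biting path


def tb_encode(u, m, nxt, out):
    """Tail-biting encoding: preload the register with the last m information bits."""
    s = 0
    for b in u[-m:]:
        s = nxt[s][b]
    code = []
    for b in u:
        code.append(out[s][b])
        s = nxt[s][b]
    return code


nxt, out = build_trellis((0o7, 0o5), m=2)      # hypothetical 4-state example code
info = [1, 0, 1, 1, 0, 0, 1, 0]
r = [tuple(1.0 - 2 * b for b in c) for c in tb_encode(info, 2, nxt, out)]   # noiseless
decoded, ok = cva(r, nxt, out)
print(decoded == info, ok)   # True True
```

With noiseless input the first iteration already yields a tail-biting survivor. With noise the loop may exhaust `max_iter` without one, which is the non-convergence the text refers to.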
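For tail-biting convolutional codes of rate $b/c$, the information sequence has length $bL$ and the corresponding codeword has length $cL$. The binary code bits $v^{(j)}_{(l)_L}$, $1 \le j \le c$, are mapped to BPSK symbols $x^{(j)}_{(l)_L} \in \{+1, -1\}$, and the corresponding received symbols are $r^{(j)}_{(l)_L}$, where $(l)_L = l \bmod L$. During the decoding process of the CVA, the accumulated path metric of the survivor path entering state $s$ at location $l$ in the $i$th iteration is

$$M_l^i(s) = M_0^i\big(\beta_i(s)\big) + \sum_{k=0}^{l-1} \sum_{j=1}^{c} d\big(r_k^{(j)}, x_k^{(j)}\big), \tag{1}$$

where $\beta_i(s) \in S_0$ is the starting state of that survivor path, $x_k^{(j)}$ are the symbols on its branches, and $M_0^{i+1}(s) = M_L^i(s)$ for $s \in S_0$. The weighted Hamming distance between $r_l^{(j)}$ and $x_l^{(j)}$ can be defined as in [8]:

$$d\big(r_l^{(j)}, x_l^{(j)}\big) = \begin{cases} |r_l^{(j)}|, & \operatorname{sgn}\big(r_l^{(j)}\big) \ne x_l^{(j)}, \\ 0, & \text{otherwise}, \end{cases} \tag{2}$$

where $\operatorname{sgn}(r_l^{(j)})$ denotes the sign of $r_l^{(j)}$. Based on (1) and (2), ML decoding on the tail-biting trellis is equivalent to solving the following equation:

$$\begin{aligned} \hat{\mathbf{x}} &= \arg\max_{\mathbf{x}} \, p(\mathbf{r} \mid \mathbf{x}) \\ &= \arg\min_{\mathbf{x}} \sum_{k=0}^{L-1} \sum_{j=1}^{c} \big(r_k^{(j)} - x_k^{(j)}\big)^2 \\ &= \arg\min_{\mathbf{x}} \sum_{k=0}^{L-1} \sum_{j=1}^{c} \tfrac{1}{2}\big(|r_k^{(j)}| - r_k^{(j)} x_k^{(j)}\big) \\ &= \arg\min_{\mathbf{x}} \sum_{k=0}^{L-1} \sum_{j=1}^{c} d\big(r_k^{(j)}, x_k^{(j)}\big), \end{aligned} \tag{3}$$

where $\mathbf{x}$ ranges over the codewords, i.e., the tail-biting paths. The term $|r_k^{(j)}|$ can be ignored in the third line of (3), since it is independent of the specific codeword $\mathbf{x}$ and consequently is a constant for all paths on the tail-biting trellis.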
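The equivalence between the weighted Hamming distance and Euclidean-distance ML decoding can be checked numerically. The Python sketch below writes the distance for $x \in \{+1, -1\}$ as $\tfrac{1}{2}(|r| - r x)$, which equals $|r|$ when the signs differ and 0 otherwise, and compares both decision rules on random ±1 codebooks; the random codebooks are a hypothetical stand-in for the set of tail-biting codewords, since the argument does not depend on the code structure.

```python
import random


def weighted_hamming(r, x):
    """d(r, x): |r| if sgn(r) != x, else 0 (x is a BPSK symbol, +1 or -1)."""
    return 0.5 * (abs(r) - r * x)


def ml_weighted_hamming(r, codebook):
    """Index of the codeword minimizing the summed weighted Hamming distance."""
    return min(range(len(codebook)),
               key=lambda i: sum(weighted_hamming(rk, xk) for rk, xk in zip(r, codebook[i])))


def ml_euclidean(r, codebook):
    """Index of the codeword minimizing the squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((rk - xk) ** 2 for rk, xk in zip(r, codebook[i])))


def random_codebook(rng, n, size):
    """`size` distinct +/-1 vectors of length n (hypothetical stand-in for the codewords)."""
    words = rng.sample(range(1 << n), size)
    return [[1 - 2 * ((w >> k) & 1) for k in range(n)] for w in words]


rng = random.Random(2013)
agree = 0
for _ in range(200):
    book = random_codebook(rng, n=10, size=16)
    sent = book[rng.randrange(len(book))]
    r = [x + rng.gauss(0.0, 1.0) for x in sent]          # BPSK over AWGN
    agree += ml_weighted_hamming(r, book) == ml_euclidean(r, book)
print(f"{agree}/200 trials agree")
```

Both rules reduce to maximizing the correlation $\sum r x$, so they select the same codeword; the $|r|$ terms only shift every candidate's score by the same constant.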
Denote by $P_i(\beta_i(s), s)$ the survivor path that connects state $\beta_i(s)$ of $S_0$ with state $s$ of $S_l$ in the $i$th iteration. The corresponding net path metric of $P_i(\beta_i(s), s)$ can be derived from (1) and (3):

$$M_i\big(\beta_i(s), s\big) = M_l^i(s) - M_0^i\big(\beta_i(s)\big) = \sum_{k=0}^{l-1} \sum_{j=1}^{c} d\big(r_k^{(j)}, x_k^{(j)}\big). \tag{4}$$

Since the initial path metrics $M_0^{i+1}(s')$ differ from $M_0^i(s')$ for each state $s' \in S_0$, different survivor paths can be obtained in each iteration. Denote by $P^i$ the ML path obtained in the $i$th iteration; the ML path obtained in the first iteration has the least net path metric among all possible survivor paths [4]. Similarly, the ML tail-biting path obtained in the $i$th iteration is denoted by $\hat{P}^i$, with net path metric $\hat{M}^i$. Among the set of tail-biting paths $\{(P_i(s, s), M_i(s, s)) \mid s \in S_0,\ i \ge 1\}$, the optimal tail-biting path and its net path metric are denoted by $P_O$ and $M_O$, respectively.
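The net path metric of a survivor is its accumulated metric minus the initial metric of its own starting state. The Python sketch below (hypothetical 4-state {7, 5} code, BPSK 0 → +1, arbitrary initial metrics imitating a later CVA iteration, illustrative names) tracks $\beta_i(s)$ during one Viterbi pass and checks that $M_L^i(s) - M_0^i(\beta_i(s))$ equals the branch-metric sum recomputed along the traced survivor.

```python
import math
import random


def build_trellis(gens, m):
    """Next-state/output tables; state = last m inputs, MSB = most recent (assumed)."""
    n = 1 << m
    nxt = [[((u << m) | s) >> 1 for u in (0, 1)] for s in range(n)]
    out = [[tuple(bin(((u << m) | s) & g).count("1") & 1 for g in gens) for u in (0, 1)]
           for s in range(n)]
    return nxt, out


def branch_metric(r_pair, bits):
    """Weighted Hamming distance of one section; BPSK maps 0 -> +1, 1 -> -1 (assumed)."""
    return sum(0.5 * (abs(r) - r * (1 - 2 * b)) for r, b in zip(r_pair, bits))


def viterbi_pass(r, init, nxt, out):
    """One Viterbi iteration that also records each survivor's starting state beta(s)."""
    n = len(init)
    metric, beta, inputs = list(init), list(range(n)), [[] for _ in range(n)]
    for r_l in r:
        new_metric, new_beta, new_inputs = [math.inf] * n, [None] * n, [None] * n
        for s in range(n):
            if metric[s] == math.inf:
                continue
            for u in (0, 1):
                t = nxt[s][u]
                cand = metric[s] + branch_metric(r_l, out[s][u])
                if cand < new_metric[t]:
                    new_metric[t], new_beta[t], new_inputs[t] = cand, beta[s], inputs[s] + [u]
        metric, beta, inputs = new_metric, new_beta, new_inputs
    return metric, beta, inputs


def path_metric(r, start, inputs, nxt, out):
    """Sum of branch metrics along the path that leaves `start` driven by `inputs`."""
    s, total = start, 0.0
    for r_l, u in zip(r, inputs):
        total += branch_metric(r_l, out[s][u])
        s = nxt[s][u]
    return total


rng = random.Random(7)
nxt, out = build_trellis((0o7, 0o5), m=2)               # hypothetical 4-state example code
r = [(rng.gauss(1.0, 1.0), rng.gauss(-1.0, 1.0)) for _ in range(8)]
init = [rng.uniform(0.0, 3.0) for _ in range(4)]        # arbitrary M_0^i(s)
end, beta, inputs = viterbi_pass(r, init, nxt, out)
checks = []
for s in range(4):
    net = end[s] - init[beta[s]]                        # M_L^i(s) - M_0^i(beta_i(s))
    direct = path_metric(r, beta[s], inputs[s], nxt, out)
    checks.append(math.isclose(net, direct, abs_tol=1e-9))
    print(f"state {s}: starts in {beta[s]}, net metric {net:.3f}, tail-biting: {beta[s] == s}")
```

A survivor with `beta[s] == s` is a tail-biting path $P_i(s, s)$, and its net metric is a candidate for $M_O$.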

Lower bounds of the net path metrics
A conventional CVA-based decoder is non-convergent; consequently, it cannot guarantee that the tail-biting path obtained when the decoding process terminates is optimal [4,7]. To design a convergent ML decoder on tail-biting trellises, further information needs to be extracted from the decoding process of the CVA. Based on the characteristics of the CVA, we can derive a lower bound on the net path metric of each tail-biting path. This observation is summarized in Lemma 1.

Lemma 1. Let $P(s, s)$ denote the ML tail-biting path on the sub-tail-biting trellis of state $s$, and let $M(s, s)$ be its net path metric, where $s \in S_0$. After $i$ iterations of the CVA, define $B(s)$ as

$$B(s) = \max_{1 \le k \le i} \big\{ M_L^k(s) - M_0^k(s) \big\}. \tag{5}$$

Then $M(s, s) \ge B(s)$; that is, $B(s)$ is a lower bound on the net path metrics of all paths on the sub-tail-biting trellis of state $s$.

Proof. The tail-biting trellis defined on a circular time axis can be split at section $l = 0$ and duplicated head-to-tail along the time axis. The conventional CVA then becomes a general Viterbi decoder composed of several decoding sections of length $L$, in which the Viterbi algorithm searches the duplicated trellis by recording and repeating the received symbols. Consequently, combining (1) and (3), the survivor path $P_i(\beta_i(s), s)$ has the minimum accumulated path metric $M_l^i(s)$ among all possible paths $P_i(s^*, s)$, where $s^*, \beta_i(s) \in S_0$, $s \in S_l$, and $0 \le l \le L - 1$. Denoting by $M_i(s^*, s)$ the net path metric of $P_i(s^*, s)$, we have

$$M_l^i(s) \le M_0^i(s^*) + M_i(s^*, s). \tag{6}$$

Since (6) holds for $0 \le l \le L - 1$, it also holds for any $s \in S_L$. Then, for the ML tail-biting path $P(s, s)$ on the sub-tail-biting trellis of state $s$, setting $s^* = s$ and $l = L$ in (6) gives

$$M(s, s) \ge M_L^i(s) - M_0^i(s). \tag{7}$$

Since (7) holds in every iteration $k = 1, \dots, i$, it follows from (5) that

$$M(s, s) \ge B(s). \tag{8}$$

Since $P(s, s)$ is the ML tail-biting path on the sub-tail-biting trellis of state $s$, from (3) and (8) we conclude that $B(s)$ is a lower bound on the metrics of all paths on the sub-tail-biting trellis of state $s$.
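Lemma 1 can be sanity-checked by brute force on a small trellis. The Python sketch below assumes that $B(s)$ is the running maximum of $M_L^k(s) - M_0^k(s)$ over the CVA iterations performed so far (all starting states kept), uses a hypothetical 4-state {7, 5} code with $L = 8$ and arbitrary received symbols, and compares $B(s)$ with $M(s, s)$ obtained by enumerating each sub-tail-biting trellis. All names are illustrative.

```python
import math
import random
from itertools import product


def build_trellis(gens, m):
    """Next-state/output tables; state = last m inputs, MSB = most recent (assumed)."""
    n = 1 << m
    nxt = [[((u << m) | s) >> 1 for u in (0, 1)] for s in range(n)]
    out = [[tuple(bin(((u << m) | s) & g).count("1") & 1 for g in gens) for u in (0, 1)]
           for s in range(n)]
    return nxt, out


def branch_metric(r_pair, bits):
    """Weighted Hamming distance of one section; BPSK maps 0 -> +1, 1 -> -1 (assumed)."""
    return sum(0.5 * (abs(r) - r * (1 - 2 * b)) for r, b in zip(r_pair, bits))


def viterbi_pass(r, init, nxt, out):
    """One Viterbi iteration returning ending metrics and survivor starting states."""
    n = len(init)
    metric, beta = list(init), list(range(n))
    for r_l in r:
        new_metric, new_beta = [math.inf] * n, [None] * n
        for s in range(n):
            for u in (0, 1):
                t = nxt[s][u]
                cand = metric[s] + branch_metric(r_l, out[s][u])
                if cand < new_metric[t]:
                    new_metric[t], new_beta[t] = cand, beta[s]
        metric, beta = new_metric, new_beta
    return metric, beta


def sub_trellis_optimum(r, s, nxt, out):
    """Brute-force M(s, s): best net metric on the sub-tail-biting trellis of state s."""
    best = math.inf
    for bits in product((0, 1), repeat=len(r)):
        t, total = s, 0.0
        for r_l, u in zip(r, bits):
            total += branch_metric(r_l, out[t][u])
            t = nxt[t][u]
        if t == s:
            best = min(best, total)
    return best


def cva_lower_bounds(r, nxt, out, iterations):
    """Running max of M_L^k(s) - M_0^k(s), plus the tail-biting survivors seen."""
    n = len(nxt)
    init, bound, tail_biting = [0.0] * n, [-math.inf] * n, []
    for _ in range(iterations):
        end, beta = viterbi_pass(r, init, nxt, out)
        for s in range(n):
            bound[s] = max(bound[s], end[s] - init[s])
            if beta[s] == s:
                tail_biting.append((s, end[s] - init[s]))
        init = end                              # M_0^{k+1}(s) <- M_L^k(s)
    return bound, tail_biting


rng = random.Random(11)
nxt, out = build_trellis((0o7, 0o5), m=2)       # hypothetical 4-state example code
violations = mismatches = 0
for _ in range(30):
    r = [tuple(1 - 2 * rng.randint(0, 1) + rng.gauss(0.0, 0.9) for _ in range(2))
         for _ in range(8)]                     # arbitrary received symbols
    bound, tail_biting = cva_lower_bounds(r, nxt, out, iterations=5)
    exact = [sub_trellis_optimum(r, s, nxt, out) for s in range(4)]
    violations += sum(bound[s] > exact[s] + 1e-9 for s in range(4))
    mismatches += sum(abs(net - exact[s]) > 1e-9 for s, net in tail_biting)
print(violations, mismatches)
```

The second counter checks a consequence used later in the text: when the survivor entering $s$ at location $L$ starts from $s$ itself, its net metric is squeezed between the bound and the optimality of $P(s, s)$, so it equals $M(s, s)$.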
The lower bound $B(s)$ defined in Lemma 1 is updated as iterations continue; according to (8), a tighter estimate of $B(s)$ is obtained as more iterations are performed on the tail-biting trellis. Based on Lemma 1, we can reduce the decoding complexity of the CVA on the tail-biting trellis by removing redundant computations and iterations from the decoding process, and we can control the convergence of the CVA. The improvements of the proposed decoder can be summarized in the following two aspects.
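Firstly, during the decoding process, if the net path metric of the survivor path $P_i(\beta_i(s), s)$ satisfies

$$M_i\big(\beta_i(s), s\big) = M_l^i(s) - M_0^i\big(\beta_i(s)\big) \ge M_O,$$

where $M_O$ is the net path metric of the optimal tail-biting path obtained in the first $i - 1$ iterations and $s \in S_l$, all searches that follow state $s$ can be terminated (refer to Figure 2). In this case, because branch metrics are non-negative, the net path metric of any survivor path that starts from state $\beta_i(s)$ and passes through state $s$ is not less than $M_O$.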
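The first improvement can be illustrated on a single sub-tail-biting trellis, where the accumulated metric starts at zero and therefore equals the net path metric. The Python sketch below (hypothetical 4-state {7, 5} code, illustrative names) stops extending any survivor whose metric has reached a threshold standing in for $M_O$. Because branch metrics are non-negative, the search either returns the exact $M(s, s)$ when it lies below the threshold or reports that the sub-trellis cannot beat $P_O$. It does not model the multi-start case of the CVA, where survivors from different starting states compete.

```python
import math
import random


def build_trellis(gens, m):
    """Next-state/output tables; state = last m inputs, MSB = most recent (assumed)."""
    n = 1 << m
    nxt = [[((u << m) | s) >> 1 for u in (0, 1)] for s in range(n)]
    out = [[tuple(bin(((u << m) | s) & g).count("1") & 1 for g in gens) for u in (0, 1)]
           for s in range(n)]
    return nxt, out


def branch_metric(r_pair, bits):
    """Weighted Hamming distance of one section; BPSK maps 0 -> +1, 1 -> -1 (assumed)."""
    return sum(0.5 * (abs(r) - r * (1 - 2 * b)) for r, b in zip(r_pair, bits))


def sub_trellis_search(r, start, nxt, out, threshold=math.inf):
    """Viterbi search on the sub-tail-biting trellis of `start` with early termination.

    Survivors whose net metric has reached `threshold` are not extended.
    Returns (M(start, start) or inf, input sequence or None, branch extensions done).
    """
    n = len(nxt)
    metric, inputs = [math.inf] * n, [None] * n
    metric[start], inputs[start] = 0.0, []      # single start: accumulated == net metric
    extensions = 0
    for r_l in r:
        new_metric, new_inputs = [math.inf] * n, [None] * n
        for s in range(n):
            if metric[s] >= threshold:          # pruned (or unreachable) state
                continue
            for u in (0, 1):
                extensions += 1
                t = nxt[s][u]
                cand = metric[s] + branch_metric(r_l, out[s][u])
                if cand < new_metric[t]:
                    new_metric[t], new_inputs[t] = cand, inputs[s] + [u]
        metric, inputs = new_metric, new_inputs
    if metric[start] >= threshold:
        return math.inf, None, extensions
    return metric[start], inputs[start], extensions


rng = random.Random(5)
nxt, out = build_trellis((0o7, 0o5), m=2)       # hypothetical 4-state example code
consistent, saved = True, 0
for _ in range(40):
    r = [tuple(1 - 2 * rng.randint(0, 1) + rng.gauss(0.0, 0.8) for _ in range(2))
         for _ in range(8)]
    for start in range(4):
        exact, path, full = sub_trellis_search(r, start, nxt, out)
        for m_o in (exact - 0.5, exact + 0.5):  # hypothetical values of M_O
            metric, p, used = sub_trellis_search(r, start, nxt, out, threshold=m_o)
            if exact < m_o:
                consistent &= metric == exact and p == path
            else:
                consistent &= metric == math.inf
            saved += full - used
print(consistent, saved)
```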
Secondly, denote by $S_C^i$ the set of survivor starting state candidates in the $i$th iteration of the CVA. If a state $s \in S_C^i$ satisfies $B(s) \ge M_O$, then by Lemma 1, $M(s, s) \ge B(s) \ge M_O$, which indicates that the ML tail-biting path on the sub-tail-biting trellis of state $s$ is not better than $P_O$. State $s$ can therefore be excluded from the set $S_C^{i+1}$. As iterations continue, the search space on the tail-biting trellis shrinks, and the decoder eventually converges to the globally optimal solution.

CVA-based ML decoder on tail-biting trellis
We summarize the above decoding process as follows. In the algorithm description, the operator '←' denotes assignment of the right-hand side to the left-hand side, and the operator '=' denotes a logical comparison between two operands.

Algorithm description
…
2.1 Perform the Viterbi algorithm on the tail-biting trellis with the set of starting state candidates $S_C^i$; during the decoding process, …
…
3. Output the codeword associated with $P_O$.

From the description above, the decoding process terminates only when $S_C^{i+1} = \emptyset$ in step 2.4. The number of entries in $S_C^1$ is finite, and as iterations continue, the size of $S_C^i$ reduces to zero. Firstly, a state $s$ with bound $B(s) \ge M_O$ is deleted from $S_C^i$ in step 2.3, since the ML tail-biting path on the sub-tail-biting trellis of state $s$ is not better than $P_O$.
Secondly, after the $(i+1)$th iteration, if no state has been deleted from $S_C^{i+1}$, i.e., $B(s) < M_O$ for all $s \in S_C^{i+1}$, then $S_C^{i+1} = S_C^{i+2}$ holds and the Viterbi algorithm will be performed on the sub-tail-biting trellis of … Then state $s^\dagger$ can be deleted from $S_C^{i+1}$, since the ML tail-biting path on its sub-tail-biting trellis has been found.
Thirdly, … $B(s')$ will be updated with $M(s', s')$ in step 2.3, and then state $s'$ will be deleted from $S_C^i$, since $B(s') \ge M_O$ then holds. All survivor starting state candidates in $S_C^1$ will eventually be deleted, and $S_C^i$ will become empty in a finite number of iterations. When $S_C^i$ is empty, the decoder has converged to the globally optimal solution recorded in $P_O$. In summary, the algorithm presented above is a convergent ML decoder on tail-biting trellises. We call it the bounded CVA (B-CVA) ML decoder.
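The overall B-CVA loop can be sketched in Python under several stated assumptions: $S_C^1 = S_0$; $B(s)$ is the running maximum of $M_L^k(s) - M_0^k(s)$; states outside $S_C^i$ get an infinite initial metric; tail-biting survivors update $P_O$ and $M_O$; and every candidate with $B(s) \ge M_O$ is deleted. The early-termination rule of the first improvement is left out for brevity. Because the step that handles an iteration without any deletion is only partially described, the sketch uses a hypothetical fallback: an exact single-start Viterbi search on the sub-tail-biting trellis of the candidate with the smallest bound (playing the role of $s^\dagger$). This is an illustration on a hypothetical 4-state {7, 5} code, checked against brute-force ML, not the authors' implementation.

```python
import math
import random
from itertools import product


def build_trellis(gens, m):
    """Next-state/output tables; state = last m inputs, MSB = most recent (assumed)."""
    n = 1 << m
    nxt = [[((u << m) | s) >> 1 for u in (0, 1)] for s in range(n)]
    out = [[tuple(bin(((u << m) | s) & g).count("1") & 1 for g in gens) for u in (0, 1)]
           for s in range(n)]
    return nxt, out


def branch_metric(r_pair, bits):
    """Weighted Hamming distance of one section; BPSK maps 0 -> +1, 1 -> -1 (assumed)."""
    return sum(0.5 * (abs(r) - r * (1 - 2 * b)) for r, b in zip(r_pair, bits))


def viterbi_pass(r, init, nxt, out):
    """One Viterbi iteration; init[s] = inf removes s from the starting candidates."""
    n = len(init)
    metric, beta, inputs = list(init), list(range(n)), [[] for _ in range(n)]
    for r_l in r:
        new_metric, new_beta, new_inputs = [math.inf] * n, [None] * n, [None] * n
        for s in range(n):
            if metric[s] == math.inf:
                continue
            for u in (0, 1):
                t, cand = nxt[s][u], metric[s] + branch_metric(r_l, out[s][u])
                if cand < new_metric[t]:
                    new_metric[t], new_beta[t], new_inputs[t] = cand, beta[s], inputs[s] + [u]
        metric, beta, inputs = new_metric, new_beta, new_inputs
    return metric, beta, inputs


def b_cva(r, nxt, out, max_iter=100):
    """Bounded-CVA sketch; returns (P_O as input bits, M_O)."""
    n = len(nxt)
    cand = set(range(n))                       # S_C^1 = S_0 (assumed)
    bound = [-math.inf] * n                    # B(s): running max of M_L^k(s) - M_0^k(s)
    m_opt, p_opt = math.inf, None              # M_O and P_O
    init = [0.0] * n
    for _ in range(max_iter):
        if not cand:                           # S_C empty: P_O is the ML path
            break
        start = [init[s] if s in cand else math.inf for s in range(n)]
        end, beta, inputs = viterbi_pass(r, start, nxt, out)
        for s in cand:
            net = end[s] - start[s]
            bound[s] = max(bound[s], net)
            if beta[s] == s and net < m_opt:   # tail-biting survivor improves P_O
                m_opt, p_opt = net, inputs[s]
        dead = {s for s in cand if bound[s] >= m_opt}   # Lemma 1: cannot beat P_O
        if not dead:
            # Hypothetical fallback: exact search on one sub-tail-biting trellis.
            s_dag = min(cand, key=bound.__getitem__)
            single = [0.0 if s == s_dag else math.inf for s in range(n)]
            end1, _, inputs1 = viterbi_pass(r, single, nxt, out)
            if end1[s_dag] < m_opt:
                m_opt, p_opt = end1[s_dag], inputs1[s_dag]
            dead = {s_dag}
        cand -= dead
        init = end                             # M_0^{i+1}(s) <- M_L^i(s)
    return p_opt, m_opt


def brute_force_ml(r, m, nxt, out):
    """Reference ML decoder: tail-biting encode every information sequence."""
    best = (math.inf, None)
    for u in product((0, 1), repeat=len(r)):
        s = 0
        for b in u[-m:]:                       # preload with the last m information bits
            s = nxt[s][b]
        metric = 0.0
        for r_l, b in zip(r, u):
            metric += branch_metric(r_l, out[s][b])
            s = nxt[s][b]
        best = min(best, (metric, list(u)))
    return best


rng = random.Random(130)
nxt, out = build_trellis((0o7, 0o5), m=2)      # hypothetical 4-state example code
agree = 0
for _ in range(30):
    r = [tuple(1 - 2 * rng.randint(0, 1) + rng.gauss(0.0, 0.8) for _ in range(2))
         for _ in range(8)]
    path, metric = b_cva(r, nxt, out)
    ref_metric, ref_path = brute_force_ml(r, 2, nxt, out)
    agree += path == ref_path and math.isclose(metric, ref_metric, abs_tol=1e-9)
print(f"{agree}/30 blocks match brute-force ML")
```

Since every iteration of this sketch deletes at least one candidate, it stops after at most $|S_0|$ iterations. Each deleted state satisfies $M(s, s) \ge M_O$ at deletion time and $M_O$ never increases, so the returned path is ML.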
The following example is used to explain the decoding process of the B-CVA.

Simulation results
To show the decoding efficiency, we compare the proposed B-CVA ML decoder with other widely cited tail-biting ML decoders in three aspects: the number of real additions, the number of logical comparisons, and the average memory space consumed during the decoding process. The codewords are modulated to BPSK symbols and then passed through an AWGN channel. Each result shown in the tables and figures is obtained by observing at least 100 block error events.
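The simulation setup (BPSK over AWGN, stopping after at least 100 block errors) can be sketched in Python as follows. The conversion $\sigma^2 = 1/(2R \cdot E_b/N_0)$ for unit-energy BPSK is a common convention that the article does not spell out, so treat it as an assumption. The uncoded 4-bit "code" with hard decisions in the usage lines is a hypothetical stand-in that only exercises the loop; a real run would plug in a tail-biting encoder and one of the ML decoders.

```python
import math
import random


def awgn_sigma(ebn0_db, rate):
    """Noise standard deviation for unit-energy BPSK (assumed: sigma^2 = 1 / (2 R Eb/N0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return math.sqrt(1.0 / (2.0 * rate * ebn0))


def transmit(code_bits, ebn0_db, rate, rng):
    """BPSK-modulate (0 -> +1, 1 -> -1, assumed) and add white Gaussian noise."""
    sigma = awgn_sigma(ebn0_db, rate)
    return [(1.0 - 2.0 * b) + rng.gauss(0.0, sigma) for b in code_bits]


def estimate_bler(encode, decode, k, ebn0_db, rate, rng, min_errors=100, max_blocks=10**6):
    """Simulate random blocks until `min_errors` block errors have been observed."""
    errors = blocks = 0
    while errors < min_errors and blocks < max_blocks:
        info = [rng.randint(0, 1) for _ in range(k)]
        r = transmit(encode(info), ebn0_db, rate, rng)
        errors += decode(r) != info
        blocks += 1
    return errors / blocks, blocks


# Toy usage: uncoded 4-bit blocks with hard decisions (hypothetical stand-ins).
rng = random.Random(1)
bler, blocks = estimate_bler(lambda u: u, lambda r: [int(x < 0) for x in r],
                             k=4, ebn0_db=-5.0, rate=1.0, rng=rng)
print(awgn_sigma(0.0, 0.5), round(bler, 2), blocks)
```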
The conventional BEAST ML decoder has to decode each sub-tail-biting trellis independently and consecutively to find the ML tail-biting path [8]. To improve its efficiency, we apply an upper-bounding technique to the thresholds used by BEAST: during the consecutive decoding process, the metric of the optimal tail-biting path obtained on the first $i$ sub-tail-biting trellises is used to upper-bound the thresholds used for decoding the remaining $|S_0| - i$ sub-tail-biting trellises, where $|S_0|$ denotes the size of the set $S_0$. With this upper-bounding technique, redundant searches can be terminated early or reduced throughout the decoding process. For convenience, we call the BEAST decoder with this upper-bounding technique the advanced BEAST ML decoder, which should be more efficient than the original BEAST decoder in [8]. Results presented in Appendix 4 demonstrate this point.
In the first example, the different decoders were applied to the (64, 32) tail-biting convolutional code with octal generator polynomials {345, 237}, which can be represented by a 128-state tail-biting trellis [7]. To demonstrate the block error rate (BLER) performance of the B-CVA, we choose for comparison the most cited sub-optimal decoder, proposed in [7], which is called the wrap-around Viterbi algorithm (WAVA). Because the WAVA is non-convergent, the maximum allowed number of iterations is set to 20 during the decoding process. Table 1 shows the BLER performance of the different decoders, where the ML decoders are the advanced BEAST ML decoder, the two-phase ML decoder, or the B-CVA ML decoder. We find that the sub-optimal WAVA achieves performance close to the optimal results. Figure 3 shows the average memory space consumption of the different ML decoders during the decoding process. The advanced BEAST decoder and the B-CVA decoder require almost constant memory space from the low- to the high-SNR region. However, the WAVA decoder requires less memory space than the B-CVA, because the WAVA does not need space for storing $B(s)$ for every state in $S_0$. The memory space required by the two-phase decoder is about 2 to 12 times that required by the B-CVA or BEAST decoders, because the two-phase decoder has to store the path metrics of all survivor paths obtained in the first phase and has to maintain the open stack and close table during the second phase. Figure 4 shows that the two-phase ML decoder, which is based on depth-first searches, requires slightly fewer real additions than the B-CVA ML decoder.
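The upper-bounding idea for consecutive sub-trellis decoding can be sketched in Python as follows. BEAST itself is not implemented here: a single-start Viterbi search with early termination stands in for it, and the 4-state {7, 5} code and all names are hypothetical. The metric of the best tail-biting path found on the sub-trellises decoded so far serves as the threshold for the remaining ones, and the count of branch extensions indicates how much work the bound saves.

```python
import math
import random


def build_trellis(gens, m):
    """Next-state/output tables; state = last m inputs, MSB = most recent (assumed)."""
    n = 1 << m
    nxt = [[((u << m) | s) >> 1 for u in (0, 1)] for s in range(n)]
    out = [[tuple(bin(((u << m) | s) & g).count("1") & 1 for g in gens) for u in (0, 1)]
           for s in range(n)]
    return nxt, out


def branch_metric(r_pair, bits):
    """Weighted Hamming distance of one section; BPSK maps 0 -> +1, 1 -> -1 (assumed)."""
    return sum(0.5 * (abs(r) - r * (1 - 2 * b)) for r, b in zip(r_pair, bits))


def sub_trellis_search(r, start, nxt, out, threshold=math.inf):
    """Single-start Viterbi search; survivors with metric >= threshold are not extended."""
    n = len(nxt)
    metric, inputs = [math.inf] * n, [None] * n
    metric[start], inputs[start] = 0.0, []
    extensions = 0
    for r_l in r:
        new_metric, new_inputs = [math.inf] * n, [None] * n
        for s in range(n):
            if metric[s] >= threshold:
                continue
            for u in (0, 1):
                extensions += 1
                t = nxt[s][u]
                cand = metric[s] + branch_metric(r_l, out[s][u])
                if cand < new_metric[t]:
                    new_metric[t], new_inputs[t] = cand, inputs[s] + [u]
        metric, inputs = new_metric, new_inputs
    if metric[start] >= threshold:
        return math.inf, None, extensions
    return metric[start], inputs[start], extensions


def consecutive_ml(r, nxt, out, upper_bound=True):
    """Decode every sub-tail-biting trellis in turn, optionally capped by the best so far."""
    best, path, work = math.inf, None, 0
    for s in range(len(nxt)):
        metric, p, used = sub_trellis_search(r, s, nxt, out, best if upper_bound else math.inf)
        work += used
        if metric < best:
            best, path = metric, p
    return best, path, work


rng = random.Random(39)
nxt, out = build_trellis((0o7, 0o5), m=2)       # hypothetical 4-state example code
same, plain_work, bounded_work = True, 0, 0
for _ in range(30):
    r = [tuple(1 - 2 * rng.randint(0, 1) + rng.gauss(0.0, 0.8) for _ in range(2))
         for _ in range(8)]
    b0, p0, w0 = consecutive_ml(r, nxt, out, upper_bound=False)
    b1, p1, w1 = consecutive_ml(r, nxt, out, upper_bound=True)
    same &= b0 == b1 and p0 == p1
    plain_work += w0
    bounded_work += w1
print(same, bounded_work, "<=", plain_work)
```

The bound never changes the decision, because a sub-trellis is cut short only when none of its paths can beat the best tail-biting path already found.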
The BEAST ML decoder, which performs exhaustive searches on the tail-biting trellis, shows the highest decoding complexity. Figure 5 shows that the two-phase decoder performs more logical comparisons than the B-CVA decoder, because in the low-SNR region many logical comparisons are needed to sort the open stack and to search the close table; in fact, the length of the close table grows linearly as the searches continue. Figures 4 and 5 also show that the advanced BEAST decoder, which has to be run on each sub-tail-biting trellis, exhibits high decoding complexity from the low- to the high-SNR region. Meanwhile, the WAVA exhibits high decoding complexity from the low- to the medium-SNR region, which is caused by circular traps in the CVA decoding process [4].
The second example adopts the tail-biting convolutional code used in WiMAX, whose generator polynomials are {133, 171} in octal form [9]. The information sequence is 40 bits long, and the corresponding code is a long tail-biting convolutional code [7], denoted as the (80, 40) tail-biting convolutional code. The BLERs of the WAVA and the ML decoders are presented in Table 2, where 'ML decoder' refers to any of the ML decoders considered in this paper (the advanced BEAST ML decoder, the two-phase ML decoder, or the B-CVA decoder), since all of them exhibit exactly the same block error rate performance. For long tail-biting codes, the BLER performance of the WAVA is close to that of the ML decoders. Figures 6, 7, and 8 lead to almost the same conclusions as in the first example. In Figure 6, the memory space required by the advanced BEAST decoder decreases slowly as the SNR grows. Figure 8 shows that, in the low-SNR region, many comparisons are performed to maintain the open stack and close table in the second phase of the two-phase ML decoder.
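For the second example, the Python sketch below shows tail-biting encoding with the octal generators {133, 171}: the encoder register is preloaded with the last 6 information bits, so the trellis path starts and ends in the same of its $2^6 = 64$ states. The tap convention (MSB of each octal generator on the current input) and the output order follow a common convention and are assumptions here, not a bit-exact restatement of the WiMAX specification; `tail_biting_encode` is an illustrative name.

```python
import random


def tail_biting_encode(info, gens=(0o133, 0o171), m=6):
    """Rate-1/2 tail-biting convolutional encoder.

    Assumed conventions: the MSB of each octal generator taps the current input, the
    state keeps the last m inputs (most recent in the MSB), and the outputs of each
    section are emitted in the order of `gens`.
    """
    state = 0
    for b in info[-m:]:                        # preload the register with the last m bits
        state = ((b << m) | state) >> 1
    start = state
    code = []
    for b in info:
        reg = (b << m) | state                 # current input followed by m past inputs
        code.extend(bin(reg & g).count("1") & 1 for g in gens)
        state = reg >> 1
    return code, start, state


rng = random.Random(40)
info = [rng.randint(0, 1) for _ in range(40)]  # 40 information bits -> (80, 40) code
code, start, end = tail_biting_encode(info)
print(len(code), start == end, 1 << 6)          # 80 True 64
```

No tail bits are appended, which is the rate-loss saving mentioned in the introduction; the price is that the decoder does not know the common starting and ending state.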
In summary, the B-CVA decoder is an efficient ML decoder on tail-biting trellises both in memory space saving and decoding complexity reduction.

Conclusion
We propose a convergent CVA-based ML decoder for tail-biting trellises. The proposed algorithm uses a lower bound on the net path metric of each tail-biting path to control the decoding process of the CVA. Simulation results show that, on tail-biting trellises, the B-CVA decoder achieves high decoding efficiency while requiring relatively little memory during the decoding process. These advantages make it attractive for practical applications.

Appendix 1: decoding process of the two-phase ML decoder
In Example 1, the decoding process of the two-phase ML decoder can be described as follows [6]: (1) During the first phase, the decoder stores the path metric $M_l^1(s)$ for all $s \in S_l$ and $0 \le l \le L$; the decoding complexity of the first phase is the same as that of the B-CVA, which involves 64 real additions and 32 logical comparisons; (2)