Minimum decoding trellis length and truncation depth of wrap-around Viterbi algorithm for TBCC in mobile WiMAX

The performance of the wrap-around Viterbi decoding algorithm with finite truncation depth and fixed decoding trellis length is investigated for tail-biting convolutional codes in the mobile WiMAX standard. Upper bounds on the error probabilities induced by finite truncation depth and the uncertainty of the initial state are derived for the AWGN channel. The truncation depth and the decoding trellis length that yield negligible performance loss are obtained for all transmission rates over the Rayleigh channel using computer simulations. The results show that the circular decoding algorithm with an appropriately chosen truncation depth and a decoding trellis just a fraction longer than the original received code words can achieve almost the same performance as the optimal maximum likelihood decoding algorithm in mobile WiMAX. A rule of thumb for the values of the truncation depth and the trellis tail length is also proposed.


Introduction
IEEE 802.16 defines the wireless metropolitan area network (MAN) technology that is commonly referred to as WiMAX. It includes two sets of standards, IEEE 802.16-2004 (802.16d) [1] for fixed WiMAX and IEEE 802.16-2005 (802.16e) [2] for mobile WiMAX. In mobile WiMAX, tail-biting convolutional codes (TBCCs) [3] are designated as the mandatory error-correcting codes. In WiMAX transmitters, data bursts are divided into data blocks, and each data block is separately encoded by a TBCC encoder. The circular decoding algorithm [4][5][6], in which the wrap-around Viterbi algorithm traverses the circular code trellis, has been shown to be a simple and effective decoding method for TBCCs. Its performance depends on both the truncation depth of the Viterbi algorithm [7] and the length of the circular decoding trellis [8]. A larger truncation depth or a longer decoding trellis improves performance, but at the cost of more computation and longer delay.
The goal of this paper is to investigate how to choose the truncation depth and the decoding trellis length in mobile WiMAX. Rules of thumb for the truncation depth have been studied in the literature [9,10], but never for higher-order modulations on the Rayleigh channel. Several circular decoding algorithms with adaptive decoding trellis length were proposed in [11][12][13][14]. These methods do not guarantee a fixed number of computations. For DSP/ASIC implementation, however, a fixed decoding trellis length with a fixed number of computations and a fixed delay is preferable. In this paper, we examine the performance of the circular decoding algorithm with finite truncation depth and fixed trellis length for all transmission rates in mobile WiMAX. We first derive upper bounds on the error probabilities induced by finite truncation depth and finite trellis length. We show that the circular decoding algorithm with an appropriately chosen truncation depth and a fixed-length decoding trellis just a fraction longer than the original one can achieve almost the same performance as the maximum likelihood (ML) decoding algorithm in mobile WiMAX. Moreover, the truncation depths and trellis lengths that yield losses within 0.05 dB of the ML decoding algorithm are obtained for all transmission rates on the Rayleigh channel.
Finally, we also obtain a rule of thumb for the relative values of truncation depth and trellis tail length.

Circular decoding algorithm
In mobile WiMAX systems, data bursts are divided into data blocks. Each data block is separately encoded by the binary (171, 133) convolutional encoder with memory m = 6. Before encoding, the convolutional encoder memory is initialized with the last 6 bits of the data block being encoded. Thus, the initial state of the code trellis is the same as the end state. After encoding, the TBCC codeword is punctured to realize the designated code rate r, where r is one of the three possible code rates 1/2, 2/3, or 3/4. Let L denote the length of a data block, and let (d_0, d_1, ..., d_{L-1}) denote the data block. It follows that the resulting TBCC codeword of length n = L/r can be viewed as one period of the periodic convolutional codeword generated by periodic data bits with period L. The circular decoding algorithm (similar to the one in [6]) with truncation depth W and trellis tail length U discussed in this paper is described as follows:
Step 1: For each received codeword metric sequence v = (v_0, v_1, ..., v_{n-1}), lengthen the sequence by copying the first U/r entries of the sequence and appending them to the end of the sequence.
Step 2: Data bits are decoded by using the soft-decision Viterbi algorithm with truncation depth W [9] and decoding trellis length L + U. It is convenient to explain the Viterbi decoding algorithm by means of a trellis diagram. Figure 1 illustrates an example of a decoding trellis for a convolutional code with m = 2. The Viterbi algorithm is initialized by assigning the same metric value to all possible initial states. At each decoding depth t, t ≥ W - 1, the information bit on the branch at depth t - W + 1 is decoded by selecting the best survivor path at state S_{t+1} and tracing back the path to find the information bit d_{t-W+1}. Thus, a total of L + U - W + 1 data bits are decoded by the Viterbi algorithm. Note that the last U - W + 1 decoded bits are obtained in the second round of traversing the circular decoding trellis.
Step 3: Replace the first U - W + 1 decoded data bits by the last U - W + 1 decoded bits to obtain the final data sequence of length L. Since the initial state of the TBCC encoder is unknown to the Viterbi decoder, the bit error rates (BERs) of the first few decoded data bits are much larger than those of the rest. Thus, the first few unreliable decoded bits are replaced by the decoded bits obtained in the second traverse of the circular decoding trellis.
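Steps 1 to 3 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation evaluated in this paper: it covers the unpunctured rate-1/2 code only, the bit ordering of the shift register and the mapping of coded bits to metrics (0 to +1, 1 to -1, correlation metric) are assumptions, the function names tb_encode and circular_viterbi are hypothetical, and the tail is restricted to W - 1 ≤ U ≤ L for simplicity.

```python
M = 6                          # encoder memory m
G = (0o171, 0o133)             # generator polynomials in octal
NSTATES = 1 << M

def step(state, bit):
    """One encoder transition: returns (next_state, (c0, c1))."""
    reg = (bit << M) | state                       # newest bit in the MSB (assumed ordering)
    out = tuple(bin(reg & g).count("1") & 1 for g in G)
    return reg >> 1, out

# BRANCH[state][bit] = coded bits emitted on that branch
BRANCH = [[step(s, b)[1] for b in (0, 1)] for s in range(NSTATES)]

def tb_encode(data):
    """Rate-1/2 tail-biting encoding: preload the memory with the last M data bits."""
    state = 0
    for b in data[-M:]:
        state, _ = step(state, b)
    code = []
    for b in data:
        state, out = step(state, b)
        code.extend(out)
    return code

def circular_viterbi(v, L, W, U):
    """Wrap-around Viterbi decoding with truncation depth W and tail length U."""
    assert W >= 1 and W - 1 <= U <= L
    v = list(v) + list(v[:2 * U])                  # Step 1: append first U/r = 2U metrics
    metric = [0.0] * NSTATES                       # Step 2: all initial states equally likely
    decisions, out = [], []
    for t in range(L + U):
        r0, r1 = v[2 * t], v[2 * t + 1]
        new_metric, dec = [0.0] * NSTATES, [0] * NSTATES
        for ns in range(NSTATES):
            bit = ns >> (M - 1)                    # input bit is the MSB of the next state
            best = None
            for lsb in (0, 1):                     # the two possible predecessor states
                ps = ((ns << 1) & (NSTATES - 1)) | lsb
                c0, c1 = BRANCH[ps][bit]
                cand = metric[ps] + r0 * (1 - 2 * c0) + r1 * (1 - 2 * c1)
                if best is None or cand > best:
                    best, dec[ns] = cand, lsb
            new_metric[ns] = best
        metric = new_metric
        decisions.append(dec)
        if t >= W - 1:                             # decode the bit on the branch at depth t-W+1
            s = max(range(NSTATES), key=metric.__getitem__)
            for tau in range(t, t - W + 1, -1):    # trace back W-1 branches
                s = ((s << 1) & (NSTATES - 1)) | decisions[tau][s]
            out.append(s >> (M - 1))
    n_rep = U - W + 1                              # Step 3: replace the first U-W+1 bits
    return out[L:L + n_rep] + out[n_rep:L]

# Noise-free example with hypothetical parameters (L = 96, W = 36, U = 64)
data = [(3 * i // 2 + i // 7) % 2 for i in range(96)]
soft = [1.0 - 2 * c for c in tb_encode(data)]
decoded = circular_viterbi(soft, L=len(data), W=36, U=64)
```

In the noise-free case the decoder should return the data block unchanged. A practical decoder would store only the last W decision vectors and normalize the path metrics; both are omitted here to keep the sketch short.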

Upper bounds on error probabilities
In this section, we derive a theoretical upper bound on the bit error probability of the rate-1/2 TBCC for QPSK modulation over the AWGN channel. Note that the bit error probability is independent of the correct codeword. Thus, from now on we assume that the correct path is the all-zero path. Let 0 denote the all-zero state. Let S*_i and d*_i denote the state and the data bit on the chosen path (corresponding to state S_i and data bit d_i in the example in Figure 1), respectively. The bit error probability of the kth decoded data bit in Step 2, k = 0, 1, ..., L + U - W, is upper bounded by the sum of the probabilities of the following four error events.
1. The chosen path at decoding depth k + W - 1 diverges from the correct path at state S_{t_1}, 0 ≤ t_1 ≤ k, merges into the correct path for the first time at state S_{t_2}, k < t_2 ≤ k + W, and the decoded data bit d*_k ≠ 0.
2. The chosen path at decoding depth k + W - 1 diverges from the correct path at state S_{t_1}, 0 ≤ t_1 ≤ k, never merges with the correct path, and reaches state S*_{k+W} ≠ 0.
3. The chosen path has an initial state S*_0 ≠ 0 and merges into the correct path for the first time at state S_{t_2} with k + m < t_2 ≤ k + W. (The lower limit arises because, if a path merges into the correct path at state S_{t_2}, the last m data bits before the merge must be correct.)
4. The chosen path has an initial state S*_0 ≠ 0, never merges with the correct path, and reaches state S*_{k+W} ≠ 0.
From the definition of the first error event, we observe that the probability of this event, P_1, is upper bounded by the bit error probability of ML decoding for zero-tail convolutional codes. Let
    T(D, N) = \sum_{i = d_{free}}^{\infty} \sum_{j = 1}^{\infty} a_{ij} D^i N^j
be the transfer function of the zero-tail convolutional code [15], where d_free is the free distance of the convolutional code, a_ij is the number of paths with Hamming weight i that are generated by data sequences containing j non-zero bits, and the exponents of D and N describe the Hamming weights of the coded sequences and data sequences of the paths, respectively. From [15], we get the upper bound (2) on P_1, in which E_s is the energy of a QPSK signal, N_0/2 is the power spectral density of the AWGN, and D_0 is the value substituted for D in the bound.
The second error event is induced by finite truncation depth. Let \sum_i b_i D^i denote the transfer function for paths of length ℓ that start from a state S_B, S_B ∈ B, end in a state S_E, S_E ∈ E, and never merge with the all-zero path in between, where b_i is the number of such paths with Hamming weight i, and the exponent of D describes the Hamming weight of such paths. The probability of the second error event, P_2, is upper bounded by the sum of the error probabilities caused by each possible error path in the second error event. Thus, by following an argument similar to the ones in [10,15], P_2 satisfies the bound (7), where d_2 is the minimum weight of error paths in the second error event, and A is the set of all 64 states.
The third error event is caused by the uncertainty of the encoder's initial state. Similarly, the error probability P_3 of the third error event satisfies the bound (8), where d_3 is the minimum weight of error paths in the third error event. Finally, the error probability of the last error event, P_4, satisfies the bound (9), where d_4 is the minimum weight of error paths in the fourth error event. This error probability is caused by both finite truncation depth and the uncertainty of the encoder's initial state.
The bit error probability of the kth decoded bit is upper bounded by the sum of four upper bounds in (2), (7), (8), and (9).
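To give a feel for the size of the first term, the following Python sketch evaluates a truncated union bound of the transfer-function type for ML decoding of the rate-1/2 code. It is an illustration under assumptions and not necessarily the exact form of bound (2): it uses the Q-function form of the pairwise error probability with coherent detection on the AWGN channel, and the bit-weight coefficients in BIT_WEIGHTS are the commonly tabulated first terms for the (171, 133) code, quoted here as an assumption that should be checked against a reference table. The function names are hypothetical.

```python
import math

# Assumed leading bit-weight coefficients c_d = sum_j j * a_dj of the
# (171, 133) code, indexed by Hamming weight d (d_free = 10).
BIT_WEIGHTS = {10: 36, 12: 211, 14: 1404, 16: 11633, 18: 77433}

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def p1_union_bound(ebn0_db, rate=0.5):
    """Truncated union bound on the bit error probability of ML decoding."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return sum(c * q_func(math.sqrt(2.0 * d * rate * ebn0))
               for d, c in BIT_WEIGHTS.items())

def leading_term_share(ebn0_db, rate=0.5):
    """Fraction of the truncated bound contributed by the d_free term."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    d, c = min(BIT_WEIGHTS.items())
    return c * q_func(math.sqrt(2.0 * d * rate * ebn0)) / p1_union_bound(ebn0_db, rate)
```

As the signal-to-noise ratio grows, leading_term_share approaches one, which is the numerical counterpart of keeping only the smallest power of D in the bounds.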
We first analyze these error probabilities for high signal-to-noise ratio. In this case, the value of D_0 is very small, and only the term with the smallest power of D is significant. In the upper bounds of P_1, P_2, P_3, and P_4, the smallest powers of D are d_free, d_2, d_3, and d_4, respectively. Let W* and k* be the least values of W and k such that d_2 > d_free and d_3 > d_free, respectively. It follows that, if the truncation depth is W* and the first k* - 1 decoded bits in Step 2 are replaced in Step 3 (equivalently, the trellis tail length is U* = W* + k* - 2), the error probabilities P_2 and P_3 of each bit in the final data sequence will be small compared to the error probability of ML decoding for zero-tail convolutional codes. The values of W*, k*, and U* for the three code rates are obtained using a method similar to the one in [10] and are listed in Table 1. Note that for the rate-2/3 and rate-3/4 TBCCs, W* may be different for different bits in a puncturing period. Thus, the values of W* in Table 1 are the maximum values of W* over a puncturing period. The table also lists d*_4, the value of d_4 for the case W = W* and k = k*. Observe that d*_4 > d_free for every code rate. Therefore, we conclude that the bit error rate of the circular decoding algorithm with W = W* and U = U* asymptotically approaches that of the ML decoding algorithm for zero-tail convolutional codes at high signal-to-noise ratio. From (7) and (8), it follows that if the generator polynomials of the convolutional code are symmetric [10], then k* = W* - m - 1. From Table 1 we observe that even though the three codes in mobile WiMAX do not have symmetric generator polynomials, W* - m - 1 is still a good estimate of k*.
We now examine how the bit error rate is affected by the finite truncation depth and the uncertainty of the initial state at a moderate signal-to-noise ratio. We first consider the circular decoding algorithm with a very long tail length. With a long tail length, the upper bounds of P_3 and P_4 for each data bit (after replacement in Step 3) are much smaller than those of P_1 and P_2. Thus, the average bit error rate is upper bounded by P_1 + P_2. In other words, the degradation of decoder performance is mainly caused by finite truncation depth. Note that even if the trellis tail length is only 60 + W (equivalently, the first 61 decoded bits are replaced in Step 3), the value of P_3 + P_4 is many orders of magnitude smaller than the upper bound of P_1 + P_2. Figure 2 plots the upper bounds of P_1 and P_2 and their sum versus the truncation depth W for E_b/N_0 = 4 dB. At this signal-to-noise ratio, the BER of the optimal ML decoding algorithm (without memory truncation) is approximately 10^-5. For comparison, simulation results of the BER with tail length U = 120 are also plotted in the figure. We observe that the upper bound of P_2 decreases exponentially with the truncation depth W, so that the BER is dominated by the error probability P_1 for truncation depth W ≥ W' = 35.
Next, we examine how the bit error rate of each decoded bit (in Step 2) is affected by the uncertainty of the initial state. We consider the Viterbi algorithm with a very long truncation depth. With such a truncation depth, the upper bounds of P_2 and P_4 are much smaller than those of P_1 and P_3 for each decoded bit. Thus, the BER is upper bounded by P_1 + P_3. Note that even if the truncation depth is only 60, the value of P_2 + P_4 is many orders of magnitude smaller than the upper bound of P_1 + P_3. Figure 3 plots the upper bounds of P_1 and P_3 and their sum for each decoded bit with E_b/N_0 = 4 dB. For comparison, the simulated BER for each decoded bit with truncation depth W = 100 is also plotted in the figure. We observe that the upper bound of P_3 decreases exponentially as the bit index k increases. In other words, the performance degradation caused by the uncertainty of the initial state abates rapidly as the decoder traverses the trellis. From the figure, we see that the BER is dominated by the error probability P_1 for k ≥ k' = 27. Thus, if the first k' - 1 decoded bits are replaced in Step 3, all the resulting data bits will have almost the same bit error probability. It is noteworthy that W' - m - 1 is still a good estimate of k'. [...] values of W* and k* in Table 1 as E_b/N_0 approaches 5 dB and BER ≈ 5 × 10^-7.
Finally, we examine the error probability P_4. Figure 4 plots the upper bounds of P_4 and P_1 + P_2 + P_3 for each [...] bounds on all the other three error probabilities is much larger than the upper bound of P_4. As the truncation depth W increases, the contribution of P_4 to the BER becomes even more insignificant.
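The bookkeeping that links the truncation depth, the first reliable bit, and the tail length can be checked with a few lines of Python. This is a minimal sketch using only relations stated in this section (the estimate W - m - 1 for the first reliable bit and the tail length W + k - 2); the function names are hypothetical.

```python
def estimate_first_reliable_bit(W, m):
    """Estimate of the first reliable bit index, exact for symmetric generators."""
    return W - m - 1

def tail_length(W, k):
    """Tail length U when the first k - 1 decoded bits are replaced in Step 3."""
    return W + k - 2

def replaced_bits(W, U):
    """Number of decoded bits replaced in Step 3."""
    return U - W + 1

m = 6
W_prime, k_prime = 35, 27          # values read from Figures 2 and 3 at 4 dB
k_estimate = estimate_first_reliable_bit(W_prime, m)
U_prime = tail_length(W_prime, k_prime)
```

For W' = 35 the estimate is 28, one more than the observed k' = 27, and the corresponding tail length is 35 + 27 - 2 = 60, so that 26 decoded bits are replaced in Step 3.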

Simulation results
In this section, the performance of the circular decoding algorithm is evaluated for the fully interleaved Rayleigh fading channel. All simulation results are obtained with no repetition coding and with data length L equal to the maximum data block length in the mobile WiMAX standard. Figures 5, 6, and 7 plot the average BERs of the circular decoding algorithm versus truncation depth for QPSK rate-1/2, 64QAM rate-2/3, and QPSK rate-3/4, respectively, with a very long tail length (U = 120). As a benchmark for comparison, the average BER of the optimal ML decoding algorithm (without memory truncation) is also plotted in the figures. These figures show that the circular decoding algorithm with a sufficiently large truncation depth can achieve almost the same error performance as optimal ML decoding. We also observe that all TBCCs require a smaller truncation depth as E_b/N_0 increases, which agrees with the observation in the previous section. Table 2 lists the least value of the truncation depth W that yields losses within 0.05 dB of optimal ML decoding for BER ≈ 10^-5. The E_b/N_0 values in Table 2 are the required bit signal-to-noise ratios for BER ≈ 10^-5. From the table, we obtain a rule of thumb for the truncation depth W. The rate-1/2 code requires a truncation depth of six to seven times the memory of the convolutional code, and the rate-2/3 and rate-3/4 codes require a truncation depth of ten to eleven times the memory. From the table, we also observe that high-order modulations require larger truncation depths than low-order ones.
Figures 8, 9, and 10 plot the BER of each decoded bit in Step 2 for QPSK rate-1/2 (with E_b/N_0 = 8 dB), 64QAM rate-2/3 (with E_b/N_0 = 17 dB), and QPSK rate-3/4 (with E_b/N_0 = 12.5 dB), respectively, with a large truncation depth (W = 100). We observe that even though the BER tends to decrease in general as the Viterbi decoder traverses the trellis, the BER is not a monotonically decreasing function of the index k.
In Figure 8, this is caused by the short block length L and the small interleaving depth. When pairs of coded bits for QPSK signals are deinterleaved to form a codeword trellis in the receiver, the coded bits in some pairs end up very close to each other on the trellis, while others are further apart. In Figures 9 and 10, this problem is further complicated by code puncturing in the rate-2/3 and rate-3/4 convolutional codes and by the unequal protection of each coded bit in high-order modulation. Define k as the index of the first decoded bit that attains losses within 0.05 dB of optimal ML decoding. Table 2 lists the values of k for BER ≈ 10^-5. Note that for the rate-2/3 and rate-3/4 TBCCs, each data bit in a puncturing period has a different protection level, and the values of k in the table are obtained by using the average BERs over a puncturing period. We observe that only a small fraction (less than 1/3) of the decoded bits in the first decoding round are unreliable and thus should be replaced. We conclude that if the truncation depth is W and the first k - 1 decoded bits in Step 2 are replaced in Step 3 (equivalently, the trellis tail length is U = W + k - 2), the losses caused by truncation and by the uncertainty of the initial state will both be within 0.05 dB. Note that if the tail length is instead chosen so that the average BER over the whole TBCC codeword attains a loss of less than 0.05 dB, the tail length will be substantially less than U. Finally, Figure 11 plots the average BERs of the circular decoding algorithm with truncation depth W and trellis tail length U for all transmission rates in mobile WiMAX.

Conclusions
We have investigated the error probabilities of TBCCs caused by memory truncation and the uncertainty of the initial state. From the upper bounds on the error probabilities, we found that if the same criterion is used to choose the truncation depth W and the first reliable decoded bit k, then k = W - m - 1 for symmetric convolutional codes. The truncation depth, the index of the first reliable bit, and the trellis tail length with 0.05 dB losses on the Rayleigh channel were obtained by simulation for each transmission rate in the mobile WiMAX standard. From the results, we obtain a rule of thumb for the truncation depth W and the trellis tail length U. The rate-1/2 code requires a truncation depth of six to seven times the memory m, and the rate-2/3 and rate-3/4 codes require a truncation depth of ten to eleven times m. Moreover, W - m - 1 is an appropriate rule of thumb for the first reliable decoded bit k. Thus, the rule of thumb for the trellis tail length is U = 2W - m - 3. The results show that the circular decoding algorithm with an appropriately chosen truncation depth and a circular trellis just a fraction longer than the original trellis can achieve almost the same performance as the optimal ML decoding algorithm in mobile WiMAX. Moreover, it is observed that high-rate TBCCs require larger truncation depths and longer trellis lengths than low-rate ones, and high-order modulations require larger truncation depths and longer trellis lengths than low-order ones.
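As an illustration, the rule of thumb can be tabulated with a short Python helper. This is a sketch only: the function names and the string labels for the code rates are hypothetical, and the numbers it produces are consequences of the rule of thumb, not the simulated values of Table 2.

```python
def rule_of_thumb(rate, m=6):
    """Return (W, k, U) triples for the low and high ends of the rule of thumb.

    rate is one of the strings "1/2", "2/3", "3/4" (hypothetical labels).
    """
    if rate not in ("1/2", "2/3", "3/4"):
        raise ValueError("unsupported code rate: " + rate)
    low, high = (6, 7) if rate == "1/2" else (10, 11)
    rows = []
    for factor in (low, high):
        W = factor * m                 # truncation depth as a multiple of m
        k = W - m - 1                  # first reliable decoded bit
        U = 2 * W - m - 3              # trellis tail length, U = W + k - 2
        rows.append((W, k, U))
    return rows

def trellis_overhead(U, L):
    """Relative lengthening of the decoding trellis, U / L, for block length L."""
    return U / L
```

For example, rule_of_thumb("1/2") gives truncation depths of 36 to 42 and tail lengths of 63 to 75, while the rate-2/3 and rate-3/4 codes give truncation depths of 60 to 66 and tail lengths of 111 to 123.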