Flexible soft-output decoding of polar codes

In this research, we study soft-output decoding of polar codes. Two representative soft-output decoding algorithms are belief propagation (BP) and soft cancellation (SCAN). The BP algorithm has low latency but suffers from high computational complexity. The SCAN algorithm, which was proposed to reduce the complexity of soft-output decoding, achieves good decoding performance but suffers from long latency. These two algorithms thus suit only the two extreme cases of very low latency (at the cost of high complexity) and very low complexity (at the cost of high latency). Many practical systems, however, operate in moderate regimes (i.e., neither extreme latency nor extreme complexity) rather than at either extreme. To adapt to the varying needs of such systems, we propose a highly flexible soft-output decoding framework for polar codes. Depending on which system requirement is most crucial, the proposed scheme adapts to the system by controlling the level of parallelism. Numerical results demonstrate that the proposed scheme effectively adapts to various system requirements by changing the level of parallelism.

Although the BP algorithm suffers from high computational complexity [10], it has been widely researched thanks to its high potential for parallel implementation [11][12][13][14]. However, the BP decoding requires a number of redundant computations, which keeps its computational complexity high.
To address the high complexity of the BP algorithm, many complexity-reduction methods have been studied [15], and the soft cancellation (SCAN) decoding [16] is one of them. Following the serial message-update schedule of the SC decoding, the SCAN algorithm has much lower computational complexity than the BP algorithm. However, at the expense of this reduced complexity, the SCAN algorithm suffers from long decoding latency and low throughput.
In soft-output decoding of polar codes, the SCAN and BP algorithms can be considered two extreme cases. With these two algorithms, the system can work well only in two situations: (i) when very low latency is demanded at the expense of very high complexity, or (ii) when very low complexity is demanded at the expense of very high latency. In practice, however, some soft-output decoders may require moderate latency and moderate computational complexity. Furthermore, such requirements might be time-varying, depending on the changing demands of the system. The SCAN and BP algorithms cannot dynamically cope with such changes in system requirements, because each algorithm is permanently tailored to one of the two extremes.
To the best of our knowledge, a soft-output decoding algorithm adaptable to system requirements has not yet been studied. In this work, we propose a decoding algorithm that works effectively for any system requirement, including the two extreme cases and any moderate (in-between) case. The behavior of the proposed algorithm can be flexibly adjusted by controlling the level of parallelism.
The remainder of this paper is organized as follows. We first present soft-output decoding and perform the extrinsic information transfer (EXIT) analysis for measuring the convergence latency. We then construct the proposed decoding scheme by representing the polar code as a concatenated code in which the outer codes are processed by the SCAN decoding in parallel and the inner code is processed by the BP decoding. The convergence behavior obtained with the PEXIT analysis shows how the latency of the proposed scheme varies with the level of parallelism. It is also shown that the decreased latency comes at the expense of increased complexity.

Background
With the construction method in [1], a polar code is defined by a triple: the codeword length N = 2^n, the message length K, and an information set A of size K.

Fundamentals of soft-output decoding
The soft-output decoding is performed over the factor graph, which is a graphical representation of the generator matrix interconnecting variable nodes (VNs) and check nodes (CNs). The factor graph is constructed from a protograph, which serves as a blueprint. Figure 1 shows an instance of the factor graph for a rate-1/2 polar code with N = 8, together with its protograph. In Fig. 1a, the rightmost gray-boxed (resp. white-boxed) VNs represent frozen bits (resp. information bits).
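As a concrete illustration of the butterfly structure underlying such a factor graph, the sketch below enumerates the VN index pairs joined at each layer for a length-2^n code. The indexing convention here is an assumption chosen for illustration; the paper's Fig. 1 fixes its own layer ordering.

```python
def butterfly_pairs(n, layer):
    """VN index pairs joined by one butterfly at the given layer of the
    factor graph of a length-2**n polar code. Convention (assumed, not
    from the paper): layer t pairs index j with j + 2**t when bit t of
    j is zero."""
    N = 1 << n
    return [(j, j | (1 << layer)) for j in range(N) if not (j >> layer) & 1]
```

For N = 8 (n = 3), layer 0 yields the adjacent pairs (0,1), (2,3), (4,5), (6,7), while layer 2 pairs each index in the first half with its counterpart in the second half.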

Existing scheduling for soft-output decoding
There are many types of scheduling for soft-output decoding. First, we discuss the BP-based scheduling. In the flooding BP algorithm [13], messages are updated in parallel from layer ℓ = n to ℓ = 0; for each layer, the L-messages and B-messages are updated simultaneously. Another scheduling is the round-trip BP [10,17], which computes L-messages and B-messages separately. The updates are separated into two phases: in the first phase, B-messages are updated from ℓ = n − 1 to ℓ = 0; in the second phase, L-messages are updated from ℓ = 1 to ℓ = n. In the BP algorithm, the latency is very low; however, the computational complexity is high.
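The two update orders described above can be written out as explicit schedules. This is a minimal structural sketch; the function names and tuple encoding are our own, not from the paper.

```python
def flooding_schedule(n, iters):
    """Flooding BP: in each iteration, L- and B-messages of every layer
    are updated simultaneously (here encoded as one step covering all
    layers from n down to 0)."""
    return [("L+B", list(range(n, -1, -1))) for _ in range(iters)]

def round_trip_schedule(n, iters):
    """Round-trip BP: per iteration, a B-message sweep from layer n-1
    down to 0, then an L-message sweep from layer 1 up to n."""
    steps = []
    for _ in range(iters):
        steps += [("B", l) for l in range(n - 1, -1, -1)]  # first phase
        steps += [("L", l) for l in range(1, n + 1)]       # second phase
    return steps
```

For n = 3 and one iteration, the round-trip schedule visits B-layers 2, 1, 0 and then L-layers 1, 2, 3, matching the two phases described above.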
The other well-known scheduling is the SCAN algorithm, where messages are updated by the serial message-update schedule used in the SC decoding. The SCAN algorithm is made iterative by omitting hard decisions on the data sequence. In the SCAN algorithm, the computational complexity is low; however, the latency is high.

EXIT analysis
For the convergence analysis of iterative decoders, the EXIT chart was introduced as a tool valued for its simplicity and accuracy [18,19]. The EXIT chart can also be used for code design [20]. The EXIT chart analyzes the exchange of the average extrinsic mutual information between VNs and CNs and tells when the decoding converges. The protograph-based EXIT (PEXIT) analysis [21] is a modified version of the EXIT analysis: in contrast to the EXIT analysis, which treats only average values, the PEXIT analysis considers the mutual information of every node individually.
Let J(σ) denote the mutual information between a binary random variable X with Pr(X = +µ) = Pr(X = −µ) = 1/2 and a continuous Gaussian random variable Y with mean X and variance σ^2, where σ^2 = 2µ. J(σ) is given by [18]

J(σ) = 1 − E[ log2(1 + e^{−Y}) ],   Y ~ N(σ^2/2, σ^2).

Consider a VN with degree d_v and a CN with degree d_c. Let I_{Ev|g} (resp. I_{Ec|g}) be the mutual information between the g-th output message of the VN (resp. CN) and the associated codeword bit. For an additive white Gaussian noise (AWGN) channel, the EXIT functions of the PEXIT analysis for the g-th message are given by [21]

I_{Ev|g} = J( sqrt( Σ_{k≠g} [J^{-1}(I_{Av|k})]^2 + [J^{-1}(I_{ch})]^2 ) ),
I_{Ec|g} = 1 − J( sqrt( Σ_{k≠g} [J^{-1}(1 − I_{Ac|k})]^2 ) ),   (4)

where I_{Av|k} (resp. I_{Ac|k}) is the a priori mutual information of the message received by the VN (resp. CN) on its k-th edge, and I_{ch} is the channel mutual information. Convergence is declared if each I_{APP}(j), which is the mutual information between the a posteriori LLR evaluated by a VN and the associated codeword bit x_j, reaches 1 as the iteration number increases.
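A numerical sketch of J(σ) may help make the analysis concrete. It assumes the standard expectation form J(σ) = 1 − E[log2(1 + e^{−Y})] with Y ~ N(σ^2/2, σ^2) from the EXIT literature; the Monte Carlo estimator and its parameters are our own choices, not the paper's.

```python
import math
import random

def J(sigma, samples=200_000, seed=0):
    """Monte Carlo estimate of J(sigma). By symmetry we condition on
    X = +mu with mu = sigma**2 / 2 (the consistency condition
    sigma**2 = 2*mu) and average log2(1 + exp(-Y)) over Y ~ N(mu, sigma**2)."""
    if sigma == 0.0:
        return 0.0  # no channel observation carries no information
    rng = random.Random(seed)
    mu = sigma * sigma / 2.0
    acc = 0.0
    for _ in range(samples):
        y = rng.gauss(mu, sigma)
        acc += math.log2(1.0 + math.exp(-y))
    return 1.0 - acc / samples
```

J(σ) is monotonically increasing from 0 (σ = 0) toward 1 (σ → ∞), which is what makes it usable as an invertible bookkeeping device in the PEXIT recursions.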

Proposed scheme
The recursive structure allows polar codes to be viewed as generalized concatenated codes, and the SC decoding can be interpreted as an instance of multistage decoding [22]. We consider a length-N polar code as a concatenation of S length-(N/S) outer codes with a length-N inner code, where S = 2^s denotes the number of outer codes. Figure 2 shows two different concatenated-code views of a length-8 polar code. Each outer code is processed by a SCAN decoder in parallel, and the inner code is decoded in the reverse order of the round-trip BP.
The proposed decoding consists of three phases. The first phase updates the L-messages of the inner code based on the channel LLR inputs. In the second phase, the messages in the outer codes are updated according to the SCAN schedule, using the L-messages from the first phase as inputs; each outer code updates its messages in parallel. The last phase updates the B-messages of the inner code using the output LLRs of the outer codes.
In the framework, the SCAN algorithm corresponds to the proposed scheme with s = 0, the serialized extreme, and the round-trip BP algorithm corresponds to the proposed scheme with s = n − 1, the parallelized extreme. Thus, the framework provides an explicit scheduling of soft-output polar decoding containing both algorithms as the two extremes. Within the framework, we can gradually change the updating schedule from the most serial to the most parallel; by controlling s, we determine how much the decoding is parallelized. Therefore, we call s the level of parallelism. The proposed scheme is described in Algorithm 1. For example, in Fig. 2a, the output LLRs of C_0 are updated first and serve as the input LLRs of C_1. After the decoding of C_1, the decoder of C_0 uses the output LLRs of C_1 as input, and the same applies to C_2. Finally, the LLRs for deciding the codeword bits are updated in C_0, which concludes one iteration. Figure 2b shows a different concatenation of a length-8 polar code. The input LLRs of the outer-code decoders, {L_s(φ, ω)}, are updated in lines 5 and 6 of Algorithm 1. The decoding of the outer codes is performed in parallel in lines 7 and 8 and is identical to the SCAN decoding process. The output LLRs of the outer codes are updated in lines 9 and 10.
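The three-phase schedule can be sketched structurally as follows. The stub classes are hypothetical placeholders (they merely route LLRs and perform no actual message computation); they are not the decoders of Algorithm 1.

```python
class InnerStub:
    """Hypothetical stand-in for the inner-code BP decoder (illustration only)."""
    def __init__(self, S):
        self.S = S
    def update_L(self, llrs):
        # Phase 1: a real decoder propagates L-messages through the s inner
        # layers; here we only split the channel LLRs among the S outer codes.
        step = len(llrs) // self.S
        return [llrs[i * step:(i + 1) * step] for i in range(self.S)]
    def update_B(self, outer_outputs):
        # Phase 3: a real decoder propagates B-messages back through the
        # inner layers; here we only gather the outer-code outputs.
        self.B = [x for block in outer_outputs for x in block]

class OuterStub:
    """Hypothetical stand-in for one length-(N/S) SCAN decoder."""
    def scan_update(self, llrs):
        return llrs  # a real SCAN decoder would refine these LLRs

def decode_iteration(channel_llrs, inner, outers):
    outer_in = inner.update_L(channel_llrs)                            # Phase 1
    outer_out = [d.scan_update(x) for d, x in zip(outers, outer_in)]   # Phase 2, parallel
    inner.update_B(outer_out)                                          # Phase 3
    return outer_out
```

The point of the sketch is the data flow: the inner code mediates between the channel and the S independent outer decoders, which is what makes the second phase parallelizable.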
We also study the convergence behavior obtained from the PEXIT analysis to measure latency. Convergence is declared if each I_{APP}(j), the mutual information between the a posteriori LLR evaluated by a VN and the associated codeword bit x_j, reaches 1 as the iteration number increases. To analyze the convergence behavior, we track min_{j∈A} I_{APP}(j), where I_{APP}(j) is evaluated by combining the left- and right-passed mutual information at the VN associated with x_j:

I_{APP}(j) = J( sqrt( [J^{-1}(I_L)]^2 + [J^{-1}(I_B)]^2 ) ).   (1)

Here, the mutual information passed to the right and to the left is denoted by I_L(φ, ω) and I_B(φ, ω), respectively. Mutual information, paired with the LLR messages denoted by the same (ℓ, φ, ω) on the protograph in Fig. 1b, can be calculated from the EXIT functions [21]; the resulting update rules (10)-(13) follow by applying the VN and CN EXIT functions to each node of the protograph. The process of tracking the minimum I_{APP}(j) is described in Algorithm 2 and closely parallels Algorithm 1. First, each parameter is initialized in lines 1 to 3. The input mutual information of the outer-code decoders is updated in lines 5 and 6 of Algorithm 2. The decoding of the outer codes is performed in parallel in lines 7 and 8, and the output mutual information of the outer codes is updated in lines 9 and 10. Then I_{APP}(j) is calculated using (1). If I_{APP}(j) equals 1 for all j, the decoding is considered finished, and the current iteration number l is recorded as the convergence iteration number l_s. In this way, we obtain l_s, the number of iterations after which the proposed scheme with parallelism level s converges.

Numerical results
Simulations were performed over an AWGN channel with binary phase-shift keying (BPSK) modulation. Polar codes with codeword lengths N = 512, 1024, and 32,768 and code rate R = 1/2, constructed using the Gaussian approximation [22] optimized for E_b/N_0 = 2.0 dB (N = 512 and 1024) and E_b/N_0 = 1.0 dB (N = 32,768), were used to evaluate the performance.
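The convergence-tracking loop of Algorithm 2 can be sketched generically. Here `update_fn` is a hypothetical stand-in for one full PEXIT iteration returning the current I_APP values; it is not the paper's update rule, and the tolerance parameter is our own addition for numerical robustness.

```python
def convergence_iterations(update_fn, info_set, max_iters=200, tol=1e-6):
    """Run PEXIT-style iterations until min_{j in info_set} I_APP(j)
    reaches 1 (within tol), and return the iteration count l_s.
    `update_fn` performs one iteration and returns indexable I_APP values."""
    for l in range(1, max_iters + 1):
        i_app = update_fn()
        if min(i_app[j] for j in info_set) >= 1.0 - tol:
            return l  # this is l_s for the current level of parallelism
    return None  # did not converge within max_iters
```

With a toy `update_fn` whose mutual information grows by 0.4 per iteration, the loop reports convergence after three iterations, mirroring how l_s is read off in Algorithm 2.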

Error correction performance
Fig. 4 FER comparison between the proposed algorithm, SCAN, BP, and SC for N = 512 and R = 1/2. The polar code is constructed using the Gaussian approximation [22] optimized for E_b/N_0 = 2.0 dB.

The FER performance of the proposed scheme is compared with the two extreme cases of the SCAN and BP algorithms over 10^6 frames. The number of iterations l_max for each simulation is chosen to be 4 l_s.
The proposed scheme with s = 1 and s = 3 performs similarly to the SCAN (i.e., s = 0) and BP (i.e., s = 9) decoding, respectively. The proposed scheme with s = 5 and s = 7 performs almost the same as the BP decoding. Generally speaking, the performance of the proposed algorithm spans approximately from that of the SCAN decoding to that of the BP decoding.

Latency
We assume that decoders for length-N polar codes have N processing units, each capable of implementing (10)-(13) in one stage, where a stage denotes one step of the required serial message updates. The SCAN decoder requires (2N − 3) stages to complete the message updates of each iteration.
The stages for the inner code of the proposed scheme are twice the number of layers in the inner code. The stages for the outer codes are the same as those of the SCAN decoding of a length-(N/S) polar code. Thus, the total number of stages of the proposed scheme, T_s, is given by

T_s = (2s + 2N/S − 3) l_s.   (14)

The convergence behavior is obtained over the same environment except that the codeword length is now 32,768. The convergence trajectories for I_APP are plotted in Fig. 5. Furthermore, in Fig. 6, mutual information is tracked by the PEXIT analysis, and FER is tracked by Monte Carlo simulation. The two curves do not match exactly, but the ordering of the latency (in stages, as defined above) across different values of s is the same for both the PEXIT and simulation results. In these curves, minimum I_APP = 1 and FER = 10^−1 at E_b/N_0 = 2 dB with N = 1024 were used as the respective criteria for successful decoding. From the convergence behavior, we confirm that the proposed scheme with a higher level of parallelism has lower latency.

Complexity
Each iteration of the proposed scheme and the SCAN algorithm has the same computational complexity because each message is updated only once. Therefore, the computational complexity of the proposed decoding is dominated solely by the number of iterations. Figure 7 shows the latency and computational complexity of the proposed scheme with N = 32,768, each normalized by the corresponding value of the SCAN algorithm (i.e., s = 0). Each point shows the normalized value at convergence as the level of parallelism s increases. The proposed
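The stage count T_s = (2s + 2N/S − 3) l_s from (14) can be checked numerically. A minimal sketch, assuming only that formula with N = 2^n and S = 2^s:

```python
def stages_per_iteration(n, s):
    """Serial message-update stages per iteration of the proposed scheme:
    2s stages for the inner code plus 2(N/S) - 3 stages for the outer
    SCAN decoders, with N = 2**n and S = 2**s."""
    N, S = 1 << n, 1 << s
    return 2 * s + 2 * (N // S) - 3

def total_stages(n, s, l_s):
    """Total latency in stages, T_s, after l_s iterations (eq. (14))."""
    return stages_per_iteration(n, s) * l_s
```

As a sanity check, s = 0 recovers the SCAN count of 2N − 3 stages per iteration, while larger s shrinks the outer-code term at the cost of more inner-code stages (and, as the paper notes, more iterations and complexity).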