Polar codes for nonidentically distributed channels
EURASIP Journal on Wireless Communications and Networking volume 2016, Article number: 287 (2016)
Abstract
We introduce new polar coding schemes for independent nonidentically distributed parallel binary discrete memoryless channels. The first scheme is developed for the case where the underlying channels are time invariant (the case of deterministic channel parameters), while the other schemes deal with a scenario where the underlying channels change according to a distribution (the case of random channel parameters). For the former case, we also discuss the importance of using an interleaver Q to enhance system reliability, and for the latter case, we model the channel behavior of binary erasure channels. It is shown that the proposed polar coding schemes achieve the symmetric capacity.
Introduction
The polar codes previously introduced in [1] are a coding scheme that achieves the symmetric capacity of any discrete memoryless channel (DMC) by exploiting channel polarization phenomena. The block error rate converges to zero as the code length N goes to infinity.
One disadvantage of polar codes is their dependency on the underlying channel. If a channel is unknown to a transceiver, communication becomes unreliable and erroneous decoding can result, since the code design is based on a different channel. The performance gap due to imperfect channel knowledge was analyzed in [2], and a robust approximation technique has been proposed for this mismatch. In [3], it was demonstrated that polarization is possible even under a nonstationary channel.
In this paper, we propose an approach to the design of polar codes in the parallel channel model based on a martingale process and extend the model to the case in which the channel parameters are no longer fixed but exhibit some random behaviors. In addition, we briefly discuss the importance of the usage of the proper interleaver Q.
The communication scenario in which the transmitter and the receiver do not know the channel parameter but know the domain (set) to which it belongs is known as the compound channel scenario [4]. The authors of [4] defined the compound channel capacity as the rate at which one can reliably transmit data without knowing the underlying channel parameter. In [5] and [6], polar codes that achieve the compound channel capacity were proposed. In [5], the unknown channel is deterministic during a codeword transmission, and in [6], the authors dealt with a deterministic compound parallel channel model.
Parallel channels are used to model the statistical behavior of communication that conveys multiple data streams simultaneously between nodes connected by multiple links. The internal structure of storage systems and singular-value-decomposed multiple-input multiple-output (MIMO) channels are examples of parallel channel models. In these channels, the links do not need to be the same, and their statistical behaviors can differ. The question then arises about whether the parallel channel capacity is achievable via polar codes. In [7, 8], it is demonstrated, under a nonidentically distributed channel assumption, that when all channel parameters (CP) are given to the transceiver, polar codes can achieve the capacity in both degraded and nondegraded channel settings.
In this paper, we show that the polarization phenomenon occurs in the parallel channel model and provide proof of achievability of polar codes for general binary discrete memoryless channels (BDMC), under a nonidentically distributed assumption, where deterministic CPs are given. Second, in contrast to previous literature, we also consider a case where CPs that describe binary erasure channels (BECs) are realizations of some random variables, and only their probability distributions, and not their exact values, are known to the transmitter and the receiver.
This is similar to the compound channel scenario in that the transmitter does not know the CPs, but it differs in that the CPs, being realizations of random variables, may vary from channel to channel. The system model is depicted in Fig. 1. The role of the interleaver in this figure is discussed in the last part of the second section.
Developing polar codes for the randomly varying channel scenario is one of the noticeable issues. For the ith transmit BEC at time instant η, assume that an erasure probability ε _{ i }(η) is a realization of a random variable ε that follows a distribution f _{ ε }. In this case, conventional polar coding schemes fail to achieve the capacity since the encoder and the decoder do not know the exact value, ε _{ i }(η), of the underlying channel. For example, in flash memories, the statistical responses of the voltage threshold may change asymmetrically with time and with the number of accesses to the cells. As the storage capacity increases, it is inefficient for a storage controller to trace and probe the exact states of every block or cell.
The remainder of this paper is organized as follows. In the second section, we analyze the polarization process on a time-invariant nonidentical parallel channel. This corresponds to a parallel channel with deterministic CPs, which may differ from each other. In the third and fourth sections, an independently nonidentically distributed channel model with random CPs is discussed. In these sections, we assume that only the distributions of the CPs of each channel, W _{(i)}, (∀i∈ [ 1,N]), are known to the encoder and the decoder. In the third section, the CPs may change in a block-by-block manner; however, their states are maintained within a block. In the fourth section, in contrast to the third section, CPs may change even within a block. In the fifth section, we discuss a special case of partially dependent channels and show that polar codes based on the q-ary input can achieve the capacity. In the final section, concluding remarks are provided.
Nonidentical channels with deterministic CP
Let \(W:\mathcal {X}\rightarrow \mathcal {Y}\) denote a general symmetric binary-input discrete memoryless channel (BDMC) and \(W_{N}:\mathcal {X}^{N}\rightarrow \mathcal {Y}^{N}\) denote a vector channel. If channels are independent but not identical, then \(W_{N}({y_{1}^{N}}|{x_{1}^{N}})=\prod _{i=1}^{N}W_{(i)}(y_{i}|x_{i})\) where \(W_{(i)}: \mathcal {X}_{i} \rightarrow \mathcal {Y}_{i}\), such that their transition probabilities p _{(i)}(y|x) and p _{(j)}(y|x) may differ if i≠j. In parallel channels, a nonidentical channel may correspond to a transmission scenario through links with different qualities. In terms of the channel model, this can be interpreted as a fast-fading channel in the time or frequency domain. This corresponds to the case when the time duration or the frequency gap between adjacent channels is larger than the coherence time or coherence frequency, respectively.
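As in [1], given any BDMC W _{(i)}, the same definitions of the symmetric capacity and the Bhattacharyya parameter are adopted as performance measures:

Symmetric capacity
$$ I\left(W_{(i)}\right) \triangleq \sum_{y\in \mathcal{Y}}\sum_{x\in \mathcal{X}} \frac{1}{2} W_{(i)}(y|x) \log \frac{W_{(i)}(y|x)}{\frac{1}{2}W_{(i)}(y|0)+\frac{1}{2}W_{(i)}(y|1)} $$(1)

Bhattacharyya parameter
$$ Z\left(W_{(i)}\right) \triangleq \sum_{y\in \mathcal{Y}} \sqrt{W_{(i)}(y|0)\,W_{(i)}(y|1)} $$(2)
According to Proposition 1 in [1], the two parameters satisfy
$$ \log \frac{2}{1+Z\left(W_{(i)}\right)} \leq I\left(W_{(i)}\right) \leq \sqrt{1-Z\left(W_{(i)}\right)^{2}} $$(3)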
For later use, we denote the bit channel \(W_{N}^{(i)}\) by
where N=2^{n} is the code length.
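As a quick numerical illustration of the two performance measures, the following minimal sketch evaluates the standard definitions from [1], \(I(W)=\sum_{y}\sum_{x}\frac{1}{2}W(y|x)\log_{2}\frac{W(y|x)}{\frac{1}{2}W(y|0)+\frac{1}{2}W(y|1)}\) and \(Z(W)=\sum_{y}\sqrt{W(y|0)W(y|1)}\), and checks the bounds of Proposition 1 in [1]. The helper names `bec` and `bsc`, the list-of-lists channel representation, and the chosen parameters 0.3 and 0.11 are assumptions made for this example only.

```python
import math

def symmetric_capacity(W):
    """I(W) in bits; W[x][y] is the transition probability W(y|x), x in {0, 1}."""
    total = 0.0
    for y in range(len(W[0])):
        q = 0.5 * W[0][y] + 0.5 * W[1][y]  # output probability under uniform input
        for x in (0, 1):
            if W[x][y] > 0.0:
                total += 0.5 * W[x][y] * math.log2(W[x][y] / q)
    return total

def bhattacharyya(W):
    """Z(W) = sum over y of sqrt(W(y|0) * W(y|1))."""
    return sum(math.sqrt(W[0][y] * W[1][y]) for y in range(len(W[0])))

def bec(eps):
    # Output alphabet ordered as (0, erasure, 1); illustrative helper.
    return [[1.0 - eps, eps, 0.0], [0.0, eps, 1.0 - eps]]

def bsc(p):
    # Output alphabet ordered as (0, 1); illustrative helper.
    return [[1.0 - p, p], [p, 1.0 - p]]

for name, W in (("BEC(0.3)", bec(0.3)), ("BSC(0.11)", bsc(0.11))):
    I, Z = symmetric_capacity(W), bhattacharyya(W)
    lower = math.log2(2.0 / (1.0 + Z))   # lower bound of Proposition 1 in [1]
    upper = math.sqrt(1.0 - Z * Z)       # upper bound of Proposition 1 in [1]
    print(f"{name}: I={I:.4f} Z={Z:.4f} bounds=[{lower:.4f}, {upper:.4f}]")
```

For the BEC the sketch returns I = 1 − ε and Z = ε, the values used in the BEC examples later in the paper.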
In this section, channels are assumed to be independent and not identically distributed, and the CPs are known to the transmitter and the receiver. For BEC with \({\epsilon _{1}^{N}}\), these erasure probabilities may differ and are known in advance to the encoder and the decoder. Let us denote the sum symmetric capacity by \(I_{s}={\sum _{1}^{N}}{I_{i}}\), where I _{ i }=I(X _{ i };Y _{ i }), and the sample mean by \(\mathbb {E}[I_{i}]= \frac {I_{s}}{N}\). Then, the following theorem holds.
Theorem 1
For any set of BDMCs {W _{(i)}},i∈ [ 1,N], and for arbitrarily small δ>0, there exist polar codes that achieve the sum capacity I _{ s }, in the sense that as N, a power of 2, goes to infinity, the fraction of indices i∈ [ 1,N] satisfies
For simple notations, denote \(Z_{N}^{(i)}\triangleq Z\left (W_{N}^{(i)} \right)\) and \(I_{N}^{(i)}\triangleq I\left (W_{N}^{(i)} \right)\). To prove the above theorem, we first need to clarify the recursive structures of \(W_{N}^{(i)}\), \(Z_{N}^{(i)}\), and \(I_{N}^{(i)}\). Second, we need to prove that the values of \(I_{N}^{(i)}\) and \(Z_{N}^{(i)}\) converge to {0,1} as the code length N increases. For the second proof, the martingale convergence theorem and (3) will be used. These statements are proved via Lemma 1, Lemma 2, and Proposition 1. After proving Lemma 2, we summarize the proof of the theorem.
For a kernel \(F = \left [ \begin {array}{cc} 1&0\\ 1&1 \end {array} \right ]\) of the generator matrix G _{ N }=B _{ N } F ^{⊗n}, the recursively evolving structure of \(\left (W_{N}^{(i)},W_{N}^{(i)} \right) \mapsto \left (W_{2N}^{(2i-1)},W_{2N}^{(2i)} \right)\) is similar to that of the recursive equation in [1], except for the last recursion at length 2.
Lemma 1
Suppose \(\left (W_{N}^{(i)},W_{N}^{(i)} \right) \mapsto \left (W_{2N}^{(2i-1)},W_{2N}^{(2i)} \right)\) for some set of binary input channels. Then,
where the last recursive channel relation is a mapping such that \(\left (W_{(j)},W_{(j+1)} \right)\mapsto \left (W_{2}^{(1)},W_{2}^{(2)} \right)\)
The evolution of the parallel channel transition probabilities is depicted in Fig. 2. Observe that the structure is still recursive, but the number of inputs and the outputs of each evolution is identical.
The proof of Lemma 1 directly follows from [1]. The last recursion derives from the independently and nonidentically distributed parallel channel environment.
We now extend Propositions 4–7 in [1], which were proved under the i.i.d. condition, to the nonidentically distributed DMC case. This extension was performed in the previous literature under the names of “parallel channels” [7, 8] and “nonstationary channels” [3], with the symmetric capacity as the measure.
Evolution of symmetric capacities
In this subsection, we consider the symmetric capacities of the recursively achieved bit channels. By inserting (6) and (7) into the definition of the symmetric capacities (1), the following proposition holds.
Proposition 1
Suppose \(\left (W_{(1)},W_{(2)} \right)\mapsto (W_{2}^{(1)},W_{2}^{(2)})\) for any binary input discrete channels. Then,
Proof of (8)
Equation (8) can be proved as follows:
Since the mapping from a message vector \({U_{1}^{2}}\) to an encoded codeword is deterministic through a generator matrix G _{2}, no information loss occurs between the third equation and the fourth equation. Also, the fifth equation follows from the independence of the transmit channels. □
To prove (9), we focus on the second bit channel \(W_{2}^{(2)}\)
Since the value of mutual information is always nonnegative, I(·)≥0. By inserting (11) into (8), we obtain \(I\left (W_{2}^{(1)}\right) \leq I\left (W_{(1)}\right)\)
However, it remains unclear whether (9) and (10) follow from this relation and (11), since there are no additional conditions on the qualities of the transmit channels W _{(i)},i∈ [ 1,N] that would order them in terms of the symmetric capacity I(·) or the Bhattacharyya parameter Z(·).
Now, we can consider three necessary conditions for (9) and (10) to be true: (1) \(I\left (W_{(1)} \right) \leq I\left (W_{2}^{(2)} \right)\), (2) \( I\left (W_{(2)} \right) \geq I\left (W_{2}^{(1)} \right)\), and (3) I(W _{(1)})≤I(W _{(2)}).
Conditions 1 and 2 can be easily verified and hold for any set of BECs, binary symmetric channels (BSCs), and binary-input additive white Gaussian noise (BIAWGN) channels, which therefore satisfy (9) and (10). However, the first and the second inequalities do not hold for general BDMCs. In that case, the third condition, I(W _{(1)})≤I(W _{(2)}), should be satisfied to achieve (9) and (10). For channels of N=2, where I(W _{(1)})≥I(W _{(2)}), the third condition can be met simply by swapping (W _{(1)},W _{(2)})↦(W _{(2)},W _{(1)}) without degrading the achievable rate. This operation is available since the channel parameters are known to the transceiver in advance. Therefore, we can conclude that (9), (10), and Proposition 1 are true for any BDMCs.
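The single polarization step for N=2 can be checked numerically. The sketch below builds the two bit channels for a pair of different channels using the standard length-2 construction of [1] with \(x_{1}=u_{1}\oplus u_{2}\), \(x_{2}=u_{2}\), and then compares mutual informations. The dictionary representation, the helper names, and the two BSC parameters are assumptions of this example; it illustrates rate conservation and the ordering for one BSC pair only and is not a proof for general BDMCs.

```python
import math
from itertools import product

def mutual_information(W):
    """I(W) in bits; W[x] maps each output symbol y to W(y|x), x in {0, 1}."""
    total = 0.0
    for y in set(W[0]) | set(W[1]):
        p0, p1 = W[0].get(y, 0.0), W[1].get(y, 0.0)
        q = 0.5 * (p0 + p1)
        for p in (p0, p1):
            if p > 0.0:
                total += 0.5 * p * math.log2(p / q)
    return total

def combine(W1, W2):
    """One step (W1, W2) -> (W_2^(1), W_2^(2)) with x1 = u1 xor u2 and x2 = u2."""
    minus = {0: {}, 1: {}}  # W_2^(1): input u1, output (y1, y2)
    plus = {0: {}, 1: {}}   # W_2^(2): input u2, output (y1, y2, u1)
    for u1, u2 in product((0, 1), repeat=2):
        for y1, p1 in W1[u1 ^ u2].items():
            for y2, p2 in W2[u2].items():
                p = 0.5 * p1 * p2
                minus[u1][(y1, y2)] = minus[u1].get((y1, y2), 0.0) + p
                plus[u2][(y1, y2, u1)] = p
    return minus, plus

def bsc(p):
    # Illustrative helper: binary symmetric channel with crossover probability p.
    return {0: {0: 1.0 - p, 1: p}, 1: {0: p, 1: 1.0 - p}}

W1, W2 = bsc(0.05), bsc(0.2)  # two links of different quality (hypothetical values)
Wm, Wp = combine(W1, W2)
I1, I2 = mutual_information(W1), mutual_information(W2)
Im, Ip = mutual_information(Wm), mutual_information(Wp)
print(f"I1+I2={I1 + I2:.6f}  I-+I+={Im + Ip:.6f}  I-={Im:.4f}  I+={Ip:.4f}")
```

The sum of the two bit-channel capacities equals the sum of the two transmit-channel capacities, while the individual values move apart.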
In general, we can extend these relations to the case of N=2^{n}(n≥1). The following relations are thus true for all given channel parameters.
The equality holds between \( I_{2N}^{(2i)}\) and \( I_{2N}^{(2i-1)}\) if and only if the underlying channels are either perfect or completely noisy.
According to (12)–(14), we can observe that the gap between these two evolved symmetric channel capacities \(I_{2N}^{(2i-1)}\) and \(I_{2N}^{(2i)}\) increases as recursions are repeated (or as the code length N is doubled). In addition, their values are lower and higher than those of the previous values, respectively. Recalling that I(·)∈ [ 0,1], a conjecture can be formed suggesting that the evolved \(I_{2N}^{(2i-1)}\) and \(I_{2N}^{(2i)}\) converge to one of the extremal values, 0 or 1, as the process is repeated. We can now prove this conjecture with the aid of the concept of the martingale process and the bounded martingale convergence theorem in the following subsection.
Martingale process I _{ n }
Following the notation of the random tree process in [1], we first define the random sequence {I _{ n }} such that \(I_{N}^{(i)} \mapsto I_{n}\left ({b_{1}^{n}} \right)\), where \(I_{n}\left ({b_{1}^{n}} \right)\in \{ I_{n} \}\) is represented as \(I_{{b_{1}^{n}}}\). Let us introduce an average sequence E _{ n } to replace \(I_{{b_{1}^{n}}}\) such that \(E_{n}\left ({b_{1}^{n}} \right)= \mathbb {E}[ I_{{b_{1}^{n}}} ]\). Based on the channel transitions in Fig. 2 and the definition of the average sequence, we can draw the alternative graphical transitions for each stage n as depicted in Fig. 3. The following lemma then shows that {I _{ n }} is a martingale process.
Lemma 2
{I _{ n }} is a martingale under average sequence {E _{ n }} in the sense that \(\mathbb {E}[ I_{n+1}|E_{n} ]=E_{n}\), where \(I_{{b_{1}^{n}}} \in \{ I_{n} \}\) and \(E_{n}\left ({b_{1}^{n}} \right)=\mathbb {E}[ I_{{b_{1}^{n}}} ]\)
Proof
Lemma 2 can be proved from the chain rule of mutual information that preserves rate.
By taking an average on both sides, one can get
which is equivalent to \(E_{n}({b_{1}^{n}})\).
We can now exploit the ergodicity in BDMCs. Let us denote the sum symmetric capacity as \(I_{s}={\sum _{1}^{N}}{I_{i}}\), where I _{ i }=I(X _{ i };Y _{ i }), and the sample mean as \(\mathbb {E}[I_{i}]\). Then, from the ergodicity, \(\mathbb {E}[I_{i}]= \frac {I_{s}}{N}\). Next, we borrow the tree process representation and the corresponding notations from Section 4 in [1].
A bit channel \(W_{N=2^{n}}^{(i)}\) can be uniquely represented via a binary sequence \({b_{1}^{n}}\): \(W_{{b_{1}^{n}}}\). For example, \(W_{8}^{(3)} \mapsto W_{011}\). Using this, we abbreviate \( I(W_{{b_{1}^{n}}}) = I\left (W_{2^{n}}^{(i)}\right)\) into \(I_{{b_{1}^{n}}}\), where \(I_{{b_{1}^{n}}}\) is an element of a set I _{ n }. We use the same notation in the definition of the average sequence E _{ n }; \(E_{n}\left ({b_{1}^{n}}\right)=\mathbb {E}[I_{{b_{1}^{n}}}] =\mathbb {E}[I_{n}]\), which is the average over all the elements of the set I _{ n }. By exploiting E _{ n }, we can draw a new tree process as depicted in Fig. 3, and we have shown that this process satisfies the property of a martingale process via (16).
Recalling the following properties, we can conclude that E _{ n } converges to 0 or 1: (i) the symmetric capacity is bounded by 0 and 1, such that \(0\leq I_{{b_{1}^{n}}}\leq 1\), and (ii) a bounded martingale process converges to extremal values, which is the same technique as that used in the proof of [1].
Once E _{ n } converges to one of the extremal values, we can also conclude that I _{ n } converges to the same extremal, since E _{ n } is defined as the average of I _{ n }. From (16), the two newly evolving terms are equidistant from \(E_{n}({b_{1}^{n}})\), so we denote them using distances α: \(E_{n}({b_{1}^{n}})+\alpha \) and \(E_{n}({b_{1}^{n}})-\alpha \). When the martingale \(E_{n}({b_{1}^{n}})\) goes to 1, we have α→0 since it is bounded above by 1. Similarly, when the martingale \(E_{n}({b_{1}^{n}})\) goes to 0, we have α→0 since it is bounded below by 0. Therefore, we conclude that I _{ n } also converges to 0 or 1.
Consequently, it is concluded that sequence {I _{ n }} is a martingale sequence. □
Since {I _{ n }} is a bounded martingale sequence, it converges to a random variable I _{ ∞ } with probability 1. Furthermore, from the martingale property, \(\mathbb {E}[I_{\infty }]= \mathbb {E}[I_{0}]\), which is equivalent to \(\frac {I_{s}}{N}\). By combining this relation with the martingale property of the random sequence {I _{ n }}, we have
which concludes the proof of Theorem 1.
In summary, to prove Theorem 1, we showed first that polarization takes place, by using the martingale process, and second that the fraction of noiseless bit channels converges to the symmetric capacity as N→∞. As in [1], this implies that the block error probability converges to zero as the code length goes to infinity.
Example on BECs
Consider BECs with \({\epsilon _{1}^{N}}\), where ε _{ i } can differ and N=2^{10}, and we assume that they are chosen in the range of [0.4,0.5]. As is known, the BEC has a convenient property such that I _{ N } can be described in a recursive form as follows:
where
Here, f(α,β)=α+β−α β,g(α,β)=α β, and \(W_{[1:N]} = \{ W_{(j)} \}_{j=1}^{N}\).
Figure 4 depicts the polarization phenomenon in \(I_{N}^{(i)}\). Observe that even under the nonidentically distributed channel condition, symmetric capacities are polarized into {0,1}. It should be verified that the rate approaches the channel capacity as it does in the i.i.d. scenario. In Table 1, the differences between the symmetric capacity and the rate are shown as a function of the exponent n. Here, the difference is defined as \( \frac {I_{s}-NR}{I_{s}}\). As can be seen, the size of the differences decreases as n increases, which means that the rate R approaches the average symmetric capacity \(\frac {I_{s}}{N}\). Equivalently, it can be said that I _{ s } is achievable in terms of the sum rate.
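The BEC recursion with f(α,β)=α+β−αβ and g(α,β)=αβ is easy to reproduce. The following sketch draws N=2^{10} erasure probabilities uniformly from [0.4,0.5] and evolves the capacities; the uniform draw, the fixed seed, the function name `polarize_bec`, and the thresholds 0.01 and 0.99 are assumptions of this example and are not taken from the experiment behind Fig. 4. The recursion polarizes each half of the channel list first and then combines the i-th outputs of the two halves, so that the last recursion pairs adjacent transmit channels as in Lemma 1.

```python
import random

def polarize_bec(caps):
    """Bit-channel capacities I_N^(i) for parallel BECs with caps[j] = 1 - eps_j."""
    n = len(caps)
    if n == 1:
        return list(caps)
    left = polarize_bec(caps[: n // 2])    # channels W_[1:N/2]
    right = polarize_bec(caps[n // 2:])    # channels W_[N/2+1:N]
    out = []
    for a, b in zip(left, right):
        out.append(a * b)          # g(a, b): degraded bit channel, index 2i-1
        out.append(a + b - a * b)  # f(a, b): upgraded bit channel, index 2i
    return out

rng = random.Random(0)  # fixed seed so the run is repeatable (hypothetical draw)
N = 2 ** 10
caps = [1.0 - rng.uniform(0.4, 0.5) for _ in range(N)]
bit_caps = polarize_bec(caps)

I_s = sum(caps)
near_one = sum(c > 0.99 for c in bit_caps) / N
near_zero = sum(c < 0.01 for c in bit_caps) / N
print(f"I_s/N={I_s / N:.4f}  fraction near 1: {near_one:.4f}  near 0: {near_zero:.4f}")
```

Because f(α,β)+g(α,β)=α+β, the sum of the bit-channel capacities stays equal to I _{ s } at every recursion level, which is the rate-preserving property used in Lemma 2.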
Achievable scheme based on the symmetric capacity
Encoder
Given a set of BDMCs \(\{ W_{N}^{(i)} \}\), the encoder calculates \(\{ I_{N}^{(i)} \}\) according to its definition. Only for BECs does the \(I_{N}^{(i)}\) calculation follow (19) with equality. For general BDMCs, the error performance of the bit channels should be tracked with appropriate measures such as \(Z_{N}^{(i)}\) by using the density evolution method [9]. Then, define an information index set \(A=\left \{ i \,|\, I_{N}^{(i)} \geq I_{N}^{(j)}, i\in A, j\notin A \right \}\) for all i,j∈[1,N] such that |A|=⌊I _{ s }⌋. We apply the interleaver Q to map the indices of the codeword to the ordered set of transmit channels: \( Q: \mathcal {X}^{N}\mapsto \mathcal {W}^{N}\), where \({x_{1}^{N}}\in \mathcal {X}^{N}\), and \(W_{[1:N]}\in \mathcal {W}^{N}\). This is deployed to help the polarization process. The problem of finding the proper Q for given channels is discussed at the end of this section. G _{ N } is an N×N generator matrix, and its kernel is \(G_{2} = \left [ \begin {array}{cc} 1&0\\ 1&1 \end {array} \right ]\). The encoder outputs a codeword \({x_{1}^{N}} ={u_{1}^{N}}\cdot G_{N}\cdot Q\).
Decoder
First, the receiver demaps an output sequence by a deinterleaver Q ^{−1}, which produces \({y_{1}^{N}}\). Next, \({y_{1}^{N}}\) is processed by a successive cancellation (SC) decoder in order to obtain an estimated message sequence \(\hat {u}_{1}^{N}\). One can easily check, by considering the evolutions of the bit channels, that the same recursive relations hold under the nonidentical condition as under the i.i.d. channel condition. The only change is the last recursion of the likelihood ratio (LR) equation: L(y _{ i }) changes to \(L_{i}(y_{i})= \frac {W_{(i)}(y_{i}|0)}{W_{(i)}(y_{i}|1)}\).
Thus, the decoding complexity remains O(N log N), and the error probability P _{ e } vanishes as N→∞. The encoding and decoding process is summarized in Algorithm 1.
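The encoder side of the scheme can be sketched in a few lines. The code below selects the information set from given bit-channel capacities and computes \(x_{1}^{N}=u_{1}^{N}\cdot G_{N}\cdot Q\) with G _{ N }=B _{ N } F ^{⊗n}. It is a minimal sketch and not a transcription of Algorithm 1: setting frozen bits to 0, using 0-based indices, representing Q as an index list, and all function names are assumptions of this example, and the SC decoder is not shown.

```python
def bit_reverse(j, n):
    """Reverse the n-bit binary representation of the integer j."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (j & 1)
        j >>= 1
    return r

def polar_transform(u):
    """Compute u * F^(kron n) over GF(2) for the kernel F = [[1, 0], [1, 1]]."""
    x = list(u)
    N = len(x)
    step = 1
    while step < N:
        for start in range(0, N, 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]
        step *= 2
    return x

def select_information_set(bit_caps, size):
    """0-based indices of the `size` bit channels with the largest capacity."""
    ranked = sorted(range(len(bit_caps)), key=lambda i: bit_caps[i], reverse=True)
    return set(ranked[:size])

def encode(message, info_set, N, interleaver=None):
    """Return x = u * G_N * Q; frozen positions carry 0 (assumption)."""
    n = N.bit_length() - 1
    u = [0] * N
    for bit, idx in zip(message, sorted(info_set)):
        u[idx] = bit
    permuted = [u[bit_reverse(j, n)] for j in range(N)]  # u * B_N
    x = polar_transform(permuted)                        # ... * F^(kron n)
    if interleaver is not None:                          # Q given as an index list
        x = [x[k] for k in interleaver]
    return x

# Illustrative bit-channel capacities for N = 4 and a rate-1/2 code.
A = select_information_set([0.02, 0.56, 0.44, 0.98], 2)
codeword = encode([1, 0], A, 4)
print(sorted(A), codeword)
```

With the bit-reversal permutation applied first, the transform reproduces the rows of G _{4} from [1]; for instance, the unit vector on the second input position maps to (1,0,1,0).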
Evolution of Bhattacharyya parameters
We now show that relationships similar to those in Proposition 1 hold for the Bhattacharyya functional Z(·).
Proposition 2
Suppose \(\left (W_{(1)},W_{(2)} \right)\mapsto \left (W_{2}^{(1)},W_{2}^{(2)} \right)\) for some binary input nonidentically distributed discrete channels. Then, the following relations hold
The proofs of (22)–(24) are presented in the Appendix.
Corollary 1
Z _{ N } can be described in a recursive form as follows:
where f(α,β)=α+β−α β,g(α,β)=α β, and \(W_{[1:N]} = \left \{ W_{(j)} \right \}_{j=1}^{N}\). The equality holds for BEC.
By applying the recursive channel structure of Lemma 1 to the definition of the Bhattacharyya parameter, the above corollary can be derived. With the aid of Lemma 1 and (22)–(24), we can derive Properties 1, 2, and 3 as follows:

1. $$\begin{array}{*{20}l}{} Z_{2N}^{(2i-1)}+Z_{2N}^{(2i)} \leq Z_{N}^{(i)}\left(W_{[1:N]} \right) + Z_{N}^{(i)}\left(W_{[N+1:2N]} \right). \end{array} $$(26)

2. $$\begin{array}{*{20}l} Z_{2N}^{(2i)} \leq \text{min}\left\{ Z_{N}^{(i)}\left(W_{[1:N]} \right),Z_{N}^{(i)}\left(W_{[N+1:2N]} \right) \right\}.\end{array} $$

3. (for some BDMC W)
$$\begin{array}{*{20}l} Z_{2N}^{(2i-1)} \geq \text{max}\left\{ Z_{N}^{(i)}\left(W_{[1:N]} \right),Z_{N}^{(i)}\left(W_{[N+1:2N]} \right) \right\}.\end{array} $$(27)
To prove the above properties, let us start from the first. Property 1 can be derived from the definition of the Z-parameter, using the same proof technique as for (24), which exploits the inequality between the arithmetic mean and the geometric mean. Using the f(α,β) and g(α,β) notations of Corollary 1, Property 2 holds if and only if the joint event {α β≤α}∩{α β≤β} holds, which is true because α,β∈ [ 0,1]. Property 3 is true for BEC, BSC, and BIAWGN channels, the last of which can be transformed to an equivalent BSC model.
BEC case
As previously mentioned, the BEC has received attention for its special property of maximizing the Z-parameter value of the f(·) operation. For the BEC W _{(i)} with erasure probability ε _{ i }, Z(W _{(i)})=ε _{ i }. For N=2 with ε _{1} and ε _{2}, \(Z_{2}^{(1)}= \epsilon _{1} + \epsilon _{2} - \epsilon _{1}\epsilon _{2}\), which is no smaller than either ε _{1} or ε _{2}. Therefore, without loss of generality, the following relation holds by repeating recursions:
Hence, \(Z_{2N}^{(2i-1)} \geq Z_{N}^{(i)}(W_{[1:N]})\) and \(Z_{2N}^{(2i-1)} \geq Z_{N}^{(i)}(W_{[N+1:2N]})\); thus, Property 3 holds.
BSC case
For N=2 BSCs with crossover probabilities ε _{1} and ε _{2}, \(Z\left (W_{(i)} \right)= 2\sqrt {\epsilon _{i}(1-\epsilon _{i})}\). Then, \(Z_{2}^{(1)}= 2\sqrt {(1-\epsilon _{s})\cdot \epsilon _{s}}\), where ε _{ s }=ε _{1}+ε _{2}−2ε _{1} ε _{2}. Without loss of generality, assume that ε _{1}≥ε _{2}; if not, we swap these two channels. The numerical results in Fig. 5 show that the differences \(Z_{2}^{(1)}-Z\left (W_{(1)} \right)\) are nonnegative for all CPs.
Hence, we can conclude that Property 3 holds for the BSC case.
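A numerical check of this kind is straightforward to reproduce. The sketch below evaluates \(Z_{2}^{(1)}-\max \{Z(W_{(1)}),Z(W_{(2)})\}\) on a grid of crossover probabilities using the closed forms stated above. The grid resolution and the restriction to [0,0.5] are assumptions of this example, and the code is not the script used to produce Fig. 5.

```python
import math

def z_bsc(p):
    """Bhattacharyya parameter of a BSC with crossover probability p."""
    return 2.0 * math.sqrt(p * (1.0 - p))

def z_minus_bsc(p1, p2):
    """Z_2^(1) for two BSCs: a BSC with crossover p_s = p1 + p2 - 2*p1*p2."""
    p_s = p1 + p2 - 2.0 * p1 * p2
    return z_bsc(p_s)

# Grid over crossover probabilities in [0, 0.5] with step 0.01 (assumed range).
grid = [k / 100.0 for k in range(0, 51)]
min_gap = min(
    z_minus_bsc(p1, p2) - max(z_bsc(p1), z_bsc(p2))
    for p1 in grid
    for p2 in grid
)
print(f"smallest gap on the grid: {min_gap:.3e}")
```

The gap never drops below zero on this grid (up to floating-point rounding), in line with Property 3; this follows because ε _{ s } lies between max{ε _{1},ε _{2}} and 0.5 whenever both crossover probabilities are at most 0.5.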
BIAWGN case
Note that a binary-input AWGN channel can be transformed to the equivalent BSC by applying a line coding that maps the set of binary alphabets {0,1}↦{−1,+1}. Therefore, Property 3 holds.
Supermartingale Z _{ n }
Let us define the random sequence {Z _{ n }} such that \(Z_{N}^{(i)} \mapsto Z_{n}({b_{1}^{n}})\), where \(Z_{n}({b_{1}^{n}})\in \{ Z_{n} \}\) is represented as \(Z_{{b_{1}^{n}}}\). Then, for some BDMCs that satisfy four properties, we can draw a graphical evolving recursive structure of \(Z_{N}^{(i)}\) (or Z _{ n }) similar to Fig. 3. The graphical structure of Z _{ n } has a common feature to that of the identically distributed channel case; however, elements in the rightmost column are under a nonidentically distributed scenario, where CPs may not be the same.
{Z _{ n }} can be considered as a supermartingale if this random tree process satisfies the relation \(Z_{n} \geq \mathbb {E}\left [Z_{n+1}|{b_{1}^{n}}\right ]\). Under the i.i.d. channel assumption, it is easily verified, since the one step transition from \(Z_{n}({b_{1}^{n}})\) to {Z _{ n+1}} is a single variable to single variable mapping: \(\mathcal {K}:\mathcal {Z} \rightarrow \mathcal {Z}\) such that \(\mathcal {Z}=\{ Z \,|\, Z\in \Re, Z\in [0,1] \}\). In contrast, with the nonidentically distributed assumption, the transition becomes a two to one mapping: \(\mathcal {K}':\mathcal {Z}^{2} \rightarrow \mathcal {Z}\). Each \(Z_{{b_{1}^{n}}}\) is not a scalar but a 2×1 vector. In this case, the format of the condition is not appropriate due to dimension mismatch.
Using a similar process as that for I _{ n } in proving the martingale property, we apply the average sequence {E _{ n }} to replace \(Z_{{b_{1}^{n}}}\) such that \(E_{n}\left ({b_{1}^{n}}\right)= \mathbb {E}\left [Z_{{b_{1}^{n}}}\right ]\). Then, from (25), we can verify that \(\mathbb {E}\left [Z_{n+1}|E_{n}({b_{1}^{n}})\right ] \leq E_{n}\), which means that Z _{ n } is a supermartingale under E _{ n }.
Convergence of {Z _{ n } }
Since {Z _{ n }} is bounded within [ 0,1] and is a supermartingale, by the martingale convergence theorem there exists a random variable Z _{ ∞ } to which the sequence {Z _{ n }} converges with probability 1 as n→∞. It is equivalent to the statement that in L ^{1}, \({\lim }_{n\rightarrow \infty }\mathbb {E}[|Z_{n+1}-E_{n}|] = 0\), which implies \(\mathbb {E}[|Z_{\infty }-E_{\infty }|]=0\). Given that \(E_{n}({b_{1}^{n}})= \frac {\alpha +\beta }{2}\) and
where \(\alpha \triangleq Z_{N}^{(i)}(W_{[1:N]})\) and \(\beta \triangleq Z_{N}^{(i)}(W_{[N+1:2N]})\), it becomes an indeterminate equation with the condition of α,β∈[ 0,1]. The solution pairs (α _{ ∞ },β _{ ∞ })∈{(0,0),(1,1)} can be obtained. Therefore, since Z _{ ∞ }, which corresponds to α _{ ∞ }, converges to either 0 or 1 almost everywhere, we can conclude that all \(Z_{N}^{(i)}\) are polarized to either near perfect or completely noisy as n→∞.
Channel mapping via the interleaver Q
In this subsection, we discuss the role of an interleaver Q in polar coding systems under nonidentically distributed BDMCs and propose an algorithm that explains how to construct such an operation. The transceiver structure including an interleaver Q and a deinterleaver Q ^{−1} is depicted in Fig. 1. To understand the importance of Q, let us consider the following example.
Example 1
For N=4, let the set of erasure probabilities of parallel BECs be \(\{ {\epsilon _{1}^{N}} \}=\{ 0.1, 0.4, 0.6, 0.9 \}\), of which the average is ε _{ m }=0.5. Then, the evolved bit channel capacities are \(\left \{ I\left (W_{N}^{(i)} \right) \right \} = \{ 0.02, 0.56, 0.44, 0.98 \}\) and the encoder selects the information set A={2,4}. The resulting sum of the capacities of the selected bit channels is 1.54 [bits] for four channel uses.
We now apply the interleaver Q over the same set, which results in \(\left \{ {\epsilon _{1}^{N}} \right \}=\{ 0.6, 0.4, 0.9, 0.1 \}\). The evolved set of symmetric capacities is \(\left \{ I\left (W_{N}^{(i)} \right) \right \} = \{ 0.02, 0.31, 0.69, 0.98 \}\).
The encoder chooses the two indices with the highest \(I\left (W_{N}^{(i)} \right)\) values, A={3,4}, and the achievable sum is 1.67 [bits], which is an enhancement of approximately 8.7% compared to the previous mapping (1.6700 versus 1.5368 before rounding).
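The numbers of Example 1 can be recomputed with the BEC recursion. The sketch below evaluates both orderings and reports the selected indices (1-based, as in the text) and the sum of the selected capacities; the function names are assumptions of this example, and the printed values agree with the example up to rounding in the second decimal.

```python
def polarize_bec(caps):
    """Bit-channel capacities for parallel BECs with caps[j] = 1 - eps_j."""
    n = len(caps)
    if n == 1:
        return list(caps)
    left = polarize_bec(caps[: n // 2])
    right = polarize_bec(caps[n // 2:])
    out = []
    for a, b in zip(left, right):
        out.append(a * b)          # g(a, b), index 2i-1
        out.append(a + b - a * b)  # f(a, b), index 2i
    return out

def best_sum(erasures, k):
    """Pick the k best bit channels; return their 1-based indices and capacity sum."""
    caps = polarize_bec([1.0 - e for e in erasures])
    chosen = sorted(range(len(caps)), key=lambda i: caps[i], reverse=True)[:k]
    return sorted(i + 1 for i in chosen), sum(caps[i] for i in chosen)

A_plain, sum_plain = best_sum([0.1, 0.4, 0.6, 0.9], 2)  # original ordering
A_perm, sum_perm = best_sum([0.6, 0.4, 0.9, 0.1], 2)    # ordering after Q
gain = (sum_perm - sum_plain) / sum_plain
print(A_plain, round(sum_plain, 4), A_perm, round(sum_perm, 4), round(100 * gain, 1))
```

The best bit channel has the same capacity under both orderings, since it equals one minus the product of all four erasure probabilities; the gain comes entirely from the second selected index.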
Note that as \(\sum _{i\in A}I\left (W_{N}^{(i)} \right)\) increases, the system becomes more reliable. The result of this example indicates that for a given set of channels {W _{(i)} | i∈ [ 1,N]}, the optimal mapping \(Q: {x_{1}^{N}}\mapsto {s_{1}^{N}}\) exists in the sense that the reliability (or the rate) is maximized by boosting polarizations among the symmetric capacities of bit channels. Finding the appropriate Q can be viewed as an optimization problem:
Equivalently, the mapping can be interpreted as a channel permutation such that Q:{W _{(i)}}↦{W _{(i)}′}. We discuss two methods that search for such a mapping Q.
Exhaustive search method with grouping
The simplest and most naive approach involves testing every possible ordering of the N channels and selecting the best one. Obviously, N! cases need to be checked. However, we can categorize the orderings into equivalent groups, since within each group all orderings produce the same bit channel qualities. The size of each group is 2^{N−1}. Hence, owing to the recursive channel evolving structure, the required number of tests is \(\frac {N!}{2^{N-1}}\). The detailed proof is shown in the Appendix. This grouping technique considerably reduces the computational burden: for N=8, we need to test 315 representative orderings instead of N!=40,320. However, even the reduced test set grows beyond practical computational capability for practical code lengths N.
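The grouping argument can be checked by brute force for a small case. The sketch below enumerates all orderings of four BECs, counts how many distinct sets of bit-channel capacities arise, and compares the count with N!/2^{N−1}; the four capacities, the rounding to 12 decimals used to compare floating-point results, and the function names are assumptions of this example. Only N=4 is enumerated; the N=8 count is evaluated from the formula.

```python
import math
from itertools import permutations

def polarize_bec(caps):
    """Bit-channel capacities for parallel BECs with caps[j] = 1 - eps_j."""
    n = len(caps)
    if n == 1:
        return list(caps)
    left = polarize_bec(caps[: n // 2])
    right = polarize_bec(caps[n // 2:])
    out = []
    for a, b in zip(left, right):
        out.append(a * b)
        out.append(a + b - a * b)
    return out

def distinct_outcomes(caps):
    """Number of different bit-channel capacity vectors over all orderings."""
    seen = set()
    for perm in permutations(caps):
        seen.add(tuple(round(c, 12) for c in polarize_bec(list(perm))))
    return len(seen)

def representatives(N):
    """Number of orderings to test after grouping: N! / 2^(N-1)."""
    return math.factorial(N) // 2 ** (N - 1)

print(distinct_outcomes([0.9, 0.6, 0.4, 0.1]), representatives(4), representatives(8))
```

For N=4 the three classes correspond to the three ways of splitting the four channels into two unordered pairs.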
Heuristic method
The purpose of the channel combining and splitting operations is to build virtual channels that are as close to the extremal channels as possible as the recursions are repeated. We already know that for any nonidentical parallel BDMCs, there exist polar codes that achieve the symmetric capacity as N goes to infinity. However, in a practical system that uses finite code lengths, as we observed via Example 1, the convergence speed differs for different mappings through the interleaver Q. A well-designed Q polarizes the bit channel qualities quickly. Therefore, Q should order the given set of transmit channels so as to create, simultaneously, as many enhanced and degraded channels close to the extremals as possible. In this section, we propose an algorithm for Q that achieves this objective.

1. Sort transmit channels W _{(i)},i∈ [ 1,N] in an ascending order of the capacity I(W _{(i)}).

2. Make \(\frac {N}{2}\) pairs: the i ^{th} pair includes the i ^{th} smallest transmit channel and the i ^{th} largest transmit channel.

3. Using the indices of \(\frac {N}{2}\) pairs, \(\left [1: \frac {N}{2}\right ]\), repeat the second procedure until the size of the index set becomes 4.
As can be seen, this algorithm has a recursive structure, and we can represent it in a matrix form. Let P _{ n } represent the interleaver operation Q for length N=2^{n} and S _{ n } indicate the second operation of the above algorithm. Then,
where \(\mathcal {I}_{2}\) is the 2×2 identity matrix and ⊗ means the Kronecker product.
In Fig. 6, we depict the achievable rates \(\frac {1}{N}\sum _{i\in A}I\left (W_{N}^{(i)}\right)\) of polar codes using different interleaving methods under parallel BECs whose erasure probabilities are chosen uniformly within the range (0,1) (hence, with an average of 0.5). The black curve refers to the capacity, the red curve refers to the result of the proposed algorithm, and the blue curve refers to the performance when using random shuffling. We obtain the green curve when we apply the proposed algorithm only once without recursion: that is, P _{ n }=S _{ n }. Finally, the dark red curve refers to the performance when the transmit channels are sorted by capacity; the result is the same whether the order is ascending or descending.
From the figure, the proposed interleaving algorithm achieves a higher rate than the other algorithms. It is also observed that when the W _{(i)} form an ordered set, the rate converges to the capacity more slowly than with the other algorithms. Hence, if the symmetric capacities of the given channels are ordered in either direction, they should be rearranged through Q.
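The three-step heuristic of this subsection can be sketched as follows. The code is one possible literal reading of the steps and is an assumption of this example, not a transcription of the matrix form P _{ n }: it pairs the i-th smallest with the i-th largest entry once, and then repeats the same pairing on the list of groups while more than four groups remain. The function name, the `stop_size` parameter, and the 0-based index output are hypothetical, and N is assumed to be a power of 2 with N≥2.

```python
def heuristic_order(caps, stop_size=4):
    """Return a channel ordering (0-based indices) for capacities `caps`.

    Step 1: sort indices by ascending capacity.
    Step 2: pair the i-th smallest with the i-th largest.
    Step 3: repeat the pairing on the groups while more than `stop_size` remain.
    """
    groups = [[i] for i in sorted(range(len(caps)), key=lambda i: caps[i])]
    first_pass = True
    while first_pass or len(groups) > stop_size:
        half = len(groups) // 2
        groups = [groups[i] + groups[-1 - i] for i in range(half)]
        first_pass = False
    return [i for group in groups for i in group]

erasures = [0.1, 0.4, 0.6, 0.9]                 # the channels of Example 1
order = heuristic_order([1.0 - e for e in erasures])
arranged = [erasures[i] for i in order]
print(order, arranged)
```

For the four channels of Example 1, this reading yields the pairs {0.9, 0.1} and {0.6, 0.4}, the same pairing as the interleaved arrangement {0.6, 0.4, 0.9, 0.1} used there; the order of the two pairs does not affect the bit-channel capacities for BECs.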
Nonidentical binary erasure channels with random erasure probabilities with single distribution
In previous studies, [8] and [3], it is assumed that the characteristics of underlying discrete memoryless channels are fully exposed to the transceiver; thus, the encoder and the decoder exploit this information. Under this condition, it is proved that polar codes can achieve the symmetric capacity.
In this section, we assume that channel parameters are not deterministic, but are realizations of a random variable. For BEC, the channel transition probability of a transmit channel \(W_{(i)}:\mathcal {X}_{i}\mapsto \mathcal {Y}_{i}\) is fully described by the erasure probability ε _{ i }. Hence, the channel features of the nonidentically distributed parallel channels model are perfectly represented via the set of erasure probabilities \(\left \{ {\epsilon _{1}^{N}} \right \}\).
The existence of random erasure probabilities means that each ε _{ i } is the realization of the random variable θ such that ε _{ i }∼f _{ θ }(ε _{ i }),∀i∈ [ 1,N], where f _{ θ } is a stationary probability distribution function.
We now assume that the realized set of erasure probabilities \(\left \{ {\epsilon _{1}^{N}} \right \}\) is exposed to neither the encoder nor the decoder. In this case, the only information available to the encoder and the decoder is the set of moments of the given distribution f _{ θ }. We prove that under this scenario, polar codes can achieve the symmetric capacity.
Theorem 2
For a set of BECs {W _{(i)}} whose erasure probabilities {ε _{ i }},i∈ [ 1,N], are unknown but whose distribution f _{ θ } is given to the transceiver, there exist polar codes that achieve the symmetric capacity I _{ s } for an arbitrarily small δ≥0, in the sense that as N→∞, the fraction of indices i satisfies
where the symmetric capacity I _{ s } is defined as an average of the individual transmit channel’s capacities: \(I_{s}(W_{N}) =\frac {1}{N}\sum _{i=1}^{N} I\left (W_{(i)} \right)\), where \(W_{N}: {X_{1}^{N}} \mapsto {Y_{1}^{N}}\).
Proof of Theorem 2
By the law of large numbers, the empirical channel behavior for a codeword can be described by the first moment ε _{ m } of the distribution f _{ θ }. Also, as previously mentioned, since the transceiver is oblivious to the exact set of erasure probabilities of the underlying parallel channels, it has no choice but to use ε _{ m } when constructing codewords.
Now, let us consider the meaning of I _{ s }. Since the capacity of a BEC with erasure probability ε _{ i } is 1−ε _{ i }, the definition of I _{ s } gives \(I_{s}(W_{N})=\frac {1}{N}\sum _{i=1}^{N} (1-\epsilon _{i}) = 1-\epsilon '\),
where \(\epsilon '=\frac {1}{N}\sum _{i=1}^{N} \epsilon _{i}\). Hence, we can conclude that for a large N the model of N nonidentically distributed parallel BECs is equivalent to the model of N identically distributed BECs with erasure probability ε ^{′}, and we can achieve the symmetric capacity. Next, we need to verify whether (33) is reliably achievable based on the information given to the transceiver. To that end, we need to demonstrate that the following two statements are true:

The existence of polar codes over BEC with erasure probability ε _{ m }

ε ^{′}≥ε _{ m } for all distributions f _{ θ }
The first statement is proved in Theorem 2 of [1]: for any BDMC W, there exists a sequence of information bit sets A _{ N } such that |A _{ N }|≥NR with arbitrarily small error probability. In this scenario, |A _{ N }|=⌊N(1−ε ^{′})⌋. The second statement follows from the linearity of I _{ s } in the erasure probability. In Fig. 7, I _{ s } is depicted. Note that I _{ s } is an affine function of the erasure probability ε. Therefore, ε ^{′}=ε _{ m } for any distribution f _{ θ } since \({\lim }_{N\rightarrow \infty }\frac {1}{N}\sum _{i=1}^{N} \epsilon _{i} =\epsilon _{m}\). Note that under erasure channels, the equality in Jensen’s inequality holds
This completes the proof of Theorem 2.
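The law-of-large-numbers step in the proof can be illustrated numerically. The sketch below is only an illustration under an assumed distribution: it draws ε _{ i } uniformly on (0,1), so that ε _{ m }=0.5, and compares the empirical symmetric capacity with 1−ε _{ m }. The function names are hypothetical.

```python
import random

def empirical_symmetric_capacity(erasure_probs):
    """I_s = (1/N) * sum_i (1 - eps_i) for a realized set of BECs."""
    return sum(1.0 - e for e in erasure_probs) / len(erasure_probs)

def draw_erasure_probs(n_channels, sampler):
    """Draw one realization of the erasure probabilities from f_theta.

    sampler: zero-argument callable returning one sample in [0, 1].
    """
    return [sampler() for _ in range(n_channels)]

if __name__ == "__main__":
    rng = random.Random(2016)
    eps_m = 0.5  # first moment of the assumed uniform distribution
    for n in (4, 8, 12, 16):
        eps = draw_erasure_probs(2 ** n, rng.random)
        gap = abs(empirical_symmetric_capacity(eps) - (1.0 - eps_m))
        print(f"N = 2^{n:<2d}  |I_s - (1 - eps_m)| = {gap:.5f}")
```

Any other sampler with support in [0,1] can be passed in; because the BEC capacity is affine in ε, only the first moment of the distribution affects the limit.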
Achievable polar coding scheme
According to Theorem 2, the construction of the capacity achieving polar coding scheme is straightforward. First, given the distribution f _{ θ }, the encoder calculates the first moment \(\mathbb {E}[\epsilon ]=\epsilon _{m}\). It then constructs the message vector \({u_{1}^{N}}\) by determining the information index set A _{ N } with the predefined frozen bits u _{ F }. This message sequence is encoded through the generator matrix G _{ N } and is transmitted through the nonidentically distributed parallel BECs of \(\left \{ {\epsilon _{1}^{N}} \right \}\). The procedure is summarized in Algorithm 2.
It should be noted that to achieve the symmetric capacity under nonidentically distributed parallel BECs, with unknown channel parameters, the only constraint required is the code length N→∞.
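The construction steps described above can be sketched as follows. This is an illustrative sketch rather than a transcription of Algorithm 2: it assumes the design channel is a BEC with erasure probability ε _{ m }, takes G _{ N } to be the n-fold Kronecker power of the kernel [[1,0],[1,1]] without bit reversal, and fixes the frozen bits to zero. All function names are hypothetical.

```python
import math

def bec_bit_channel_erasures(eps, n):
    """Erasure probabilities of the N = 2^n bit channels of a BEC(eps)."""
    z = [eps]
    for _ in range(n):
        # Each channel splits into a degraded and an upgraded channel.
        z = [v for e in z for v in (2 * e - e * e, e * e)]
    return z

def information_set(eps_m, n):
    """Indices (0-based) of the floor(N * (1 - eps_m)) most reliable bit channels."""
    N = 2 ** n
    k = math.floor(N * (1.0 - eps_m))
    z = bec_bit_channel_erasures(eps_m, n)
    return sorted(sorted(range(N), key=lambda i: z[i])[:k])

def polar_encode(u):
    """Compute x = u * F^{(x)n} over GF(2) using the recursive block structure."""
    N = len(u)
    if N == 1:
        return list(u)
    half = N // 2
    left = [a ^ b for a, b in zip(u[:half], u[half:])]
    return polar_encode(left) + polar_encode(u[half:])

def build_codeword(message_bits, eps_m, n, frozen_value=0):
    """Place message bits on the information set, freeze the rest, and encode."""
    A = information_set(eps_m, n)
    if len(message_bits) != len(A):
        raise ValueError("message length must equal the information set size")
    u = [frozen_value] * (2 ** n)
    for idx, bit in zip(A, message_bits):
        u[idx] = bit
    return polar_encode(u)

if __name__ == "__main__":
    print(information_set(0.5, 3))
    print(build_codeword([1, 0, 1, 1], 0.5, 3))
```

The encoder never looks at the realized erasure probabilities; only ε _{ m } enters the design, which matches the assumption that the transceiver knows the distribution but not its realization.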
Random erasure probabilities with nonidentical distributions
In this subsection, we consider the case of N nonidentically distributed BECs W _{(i)}, i∈ [ 1,N], in which the distribution \(f_{\theta _{i}}\) of the erasure probability ε _{ i } of each transmit channel W _{(i)} may differ across channel indices. Thus, this scenario includes the previous scenarios as special cases.
For example, consider the multihop communications between the two nodes in Fig. 8. In this figure, each data element in S is delivered to D through different paths in a multihop fashion. If the number of hops is increased, it becomes more difficult for S and D to track the exact parameters that model each path. Furthermore, these paths could have statistical variation due to unstable media or interference from different noise sources such as the node V in the figure. In these cases, the proposed scenario is logical.
A simple coding scheme that is able to achieve the symmetric capacity is the conveyance of multiple codewords as a group for each decoding stage. In each stage, we exploit the set of parallel nonidentical channels L=2^{l} times. The reason for this power of 2 format is to match with the length of polar codewords.
Let W _{(i)}(j) denote the jth access to the channel W _{(i)}, let ε _{ i }(j) denote the instantaneous erasure probability of the channel W _{(i)}(j), which follows the distribution \(f_{\theta _{i}}\), and let I _{ s,i }(W _{(i)}) denote the symmetric capacity of W _{(i)} when it is accessed L times.
The instantaneous capacities for L blocks are listed in Table 2. The distributions of the erasure probabilities can differ, and each transmit channel (row) shows ergodic behavior, as shown in the last column of the table. Here, \(\bar {I}(W_{(i)}) = \frac {1}{L}\sum _{j=1}^{L} I(W_{(i)}(j)) \).
The symmetric capacity in this case becomes
where \(\epsilon _{m_{i}}\) is the first moment of \(\epsilon _{i} \sim f_{\theta _{i}}\). Note that \({\lim }_{L\rightarrow \infty }I_{s}\left (W^{L}_{[N]} \right)=\frac {1}{N}\sum _{i=1}^{N} (1-\epsilon _{m_{i}})\). The equality holds because the symmetric capacity is an affine function of the erasure probability. We now consider two cases. In the first case, we assume that the encoder can be adapted to various code lengths. This means that the encoder can construct generator matrices G _{ N } for any exponent n (N=2^{n}). In the second case, the coding structure is fixed; thus, the parameter N (and hence G _{ N }) cannot be changed. In the first case, let the encoder choose any exponent l and construct the L×L generator matrix G _{ L }, where L=2^{l}. The following proposition then holds:
Proposition 3
For a set of nonidentically distributed BECs {W _{(i)}} with a set of random erasure parameters {ε _{ i }},i∈[1,N], where each ε _{ i } follows its own distribution \(f_{\theta _{i}}\), the symmetric capacity \(I_{s}\left (W^{L}_{[N]} \right)\) is achievable by exploiting multiple streams of polar codewords.
The proof of Proposition 3 is the same as that of the existence of polar codes that achieve the set of individual capacities \(\{ \bar {I}(W_{(i)}) \}\), since the symmetric capacity \(I_{s}\left (W^{L}_{[N]}\right)\) is their sum. Also, in Theorem 3, we proved that there exist polar codes that achieve each capacity \(\left \{ \bar {I}\left (W_{(i)} \right) \right \}\), in the sense that as L→∞ through the power of 2, the fraction of indices j∈ [ 1,L] of the ith message block is satisfied:
for an arbitrary small δ≥0, and for all channel index i∈ [ 1,N]. Therefore, Proposition 3 is true for any set of distributions \(\left \{ f_{\theta _{i}} \right \}\).
Achievable coding scheme
In Fig. 9, the encoding and the decoding procedures are depicted for all transmit channel indices. Given all distributions \(\{ f_{\theta _{i}}, \forall i\in \,[1,N] \}\), the encoder calculates the set of first moments \(\{ \epsilon _{m_{i}} \}\) and the evolved bit channel capacities \(\left \{ I\left (W_{N}^{(i)} \right) \right \}\) from \(\{ \epsilon _{m_{i}} \}\) according to the recursive equations (19) and (20). With these moments and symmetric capacities, the encoder then constructs N streams of polar codewords \(\{ {x_{1}^{L}}(i) \}\) of length L=2^{l}. To achieve this, the encoder first defines an information set \(A_{L}(i)=\left \{ j : I\left (W_{N}^{(j)} \right) \geq I\left (W_{N}^{(k)} \right), \forall j\in A_{L}(i), k\in {A_{L}^{c}}(i) \right \}\), the size of which is \(|A_{L}(i)|= \lfloor L(1-\epsilon _{m_{i}})\rfloor \). With fixed frozen bits u _{ F }, where F=[ 1,L]∖A _{ L }(i), the message vectors \(\{ {u_{1}^{L}}(i) \}\) are stacked in the N×L message matrix U, which is then encoded with the L×L generator matrix G _{ L } to produce the codeword matrix X. Each column of X propagates sequentially through the nonidentically distributed parallel BECs.
The receiver saves L output vectors in the matrix Y and produces estimates by applying the SC decoder row by row. This procedure is summarized in Algorithm 3. Note that there are no constraints on N. In fact, since the proposed coding scheme is not affected by N, it is robust to the deletion of some transmit channels. Assume that a set of channels W _{ J }, where J is a subset of [1,N], is lost because the corresponding symmetric capacities are all zero. The encoder and the decoder will then simply decrease N to N−|J|; the data is transmitted through W _{[1,N]∖J }. Note that the symmetric capacity \(I_{s}\left (W^{L}_{[N]\setminus J} \right)= \sum _{\forall i\in [1,N]\setminus J} I(W_{(i)})\) is still achievable.
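The encoding side of the multi-stream scheme can be sketched as follows. This is an illustrative sketch rather than a transcription of Algorithm 3: it assumes each row is designed for a BEC with erasure probability \(\epsilon _{m_{i}}\), uses a generator matrix without bit reversal, fixes the frozen bits to zero, and omits the SC decoder. All function names are hypothetical.

```python
import math

def bec_bit_channel_erasures(eps, l):
    """Erasure probabilities of the L = 2^l bit channels of a BEC(eps)."""
    z = [eps]
    for _ in range(l):
        z = [v for e in z for v in (2 * e - e * e, e * e)]
    return z

def information_set(eps_m, l):
    """Indices of the floor(L * (1 - eps_m)) most reliable bit channels."""
    L = 2 ** l
    k = math.floor(L * (1.0 - eps_m))
    z = bec_bit_channel_erasures(eps_m, l)
    return sorted(sorted(range(L), key=lambda j: z[j])[:k])

def polar_encode(u):
    """x = u * G_L over GF(2), with G_L the l-fold Kronecker power of the kernel."""
    if len(u) == 1:
        return list(u)
    half = len(u) // 2
    left = [a ^ b for a, b in zip(u[:half], u[half:])]
    return polar_encode(left) + polar_encode(u[half:])

def encode_streams(messages, eps_means, l):
    """Build the N x L codeword matrix X, one polar codeword per transmit channel.

    messages[i] holds the information bits of stream i; its length must equal
    the size of the information set designed for eps_means[i].
    """
    X = []
    for msg, eps_m in zip(messages, eps_means):
        A = information_set(eps_m, l)
        if len(msg) != len(A):
            raise ValueError("message length does not match information set")
        u = [0] * (2 ** l)          # frozen bits fixed to zero
        for idx, bit in zip(A, msg):
            u[idx] = bit
        X.append(polar_encode(u))
    return X

if __name__ == "__main__":
    eps_means = [0.25, 0.5, 0.75]
    sizes = [len(information_set(e, 3)) for e in eps_means]
    X = encode_streams([[1] * k for k in sizes], eps_means, 3)
    for column in zip(*X):          # each column is one use of the N channels
        print(column)
```

Transmitting X column by column means that row i sees L independent accesses to W _{(i)}, which is what lets each row rely on the ergodic average of its own channel.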
Proof of achievability. For the set of BECs {W _{(i)}}, ∀i∈ [ 1,N], the size of each information set is \(|A_{L}(i)|= \lfloor L(1-\epsilon _{m_{i}})\rfloor \), the unit of which is bits per L channel uses. It is known that the SC decoder recovers each message \(u_{A_{i}}\) with vanishing probability of error as L→∞ for ∀i. Now, define an individual rate R _{ i } as \(\frac {|A_{i}|}{L}\); it then converges to \(\bar {I}(W_{(i)})\) under the same condition. That is, for any δ _{ i }∈[0,1)
The symmetric capacity \(I_{s}\left (W^{L}_{[N]} \right)\) is the arithmetic mean of its parts \(\{ \bar {I}(W_{(i)}), \forall i\in [1,N] \}\): \(I_{s}\left (W^{L}_{[N]}\right)=\frac {1}{N} \sum _{i=1}^{N} \bar {I}(W_{(i)})\). We can conclude that it is achievable by the proposed scheme. The complexity of this polar coding scheme is O(NL logL), since it is a concatenation of N SC decoders of length L.
In addition, consider a transmitter in which the encoding structure cannot be changed. Then, the only choice is to use the fixed N×N generator matrix G _{ N }, where N=2^{n}, so that the produced codewords all have the same length N. We can treat this problem by setting L equal to N. The encoder defines the collection of information index sets {A _{ i }} from \(\{ \epsilon _{m_{i}} \}\), where \(|A_{i}|=\lfloor N\cdot I(\epsilon _{m_{i}})\rfloor \). Using the common generator matrix G _{ N }, the encoder sequentially produces N polar codewords \({x_{1}^{N}}(i)={u_{1}^{N}}(i)\cdot G_{N}\) for i∈ [ 1,N]. These codewords \(\left \{ {x_{1}^{N}}(i) \right \}\) are stacked as the rows of the matrix X. The complexity of this polar coding scheme is O(N ^{2} logN), since it is a concatenation of N SC decoders of length N.
Polarizations on nonindependent channels
In the previous sections, it was assumed that all transmit channels are independent from each other from
where \(W_{(i)}: \mathcal {X}_{i} \rightarrow \mathcal {Y}_{i}\). Now, consider the case where these channels are correlated; thus, the equality in (39) does not hold. Then, the previous polar coding scheme for independent BDMCs might not achieve the capacity. For example, we can check the polarization phenomenon in a measure of the symmetric capacity for N=2, where the transition probability of the channel \(W_{2}:\mathcal {X}_{1} \times \mathcal {X}_{2} \rightarrow \mathcal {Y}_{1} \times \mathcal {Y}_{2}\), where \(\mathcal {X}_{1}\) and \(\mathcal {X}_{2}\) are GF(2), and \(\mathcal {Y}_{1} = \mathcal {Y}_{2} = \{ 0,1,e \}\) is defined as follows:
The element e denotes the erasure symbol. Without losing information, W _{2} can be modeled as a single quaternary erasure channel (QEC) W ^{′} with CP ε, of which the capacity is I(W ^{′})=2(1−ε). We now apply the same generator matrix G _{2} to the encoder and calculate the symmetric capacities of the split bit channels \(I\left (W_{2}^{(1)} \right)\) and \(I\left (W_{2}^{(2)} \right)\) via (1). We then obtain \(I\left (W_{2}^{(1)} \right) = I\left (W_{2}^{(2)} \right) = 1-\epsilon \), which means that no polarization occurs with the coding strategy for independent channels. By relabeling, the set of binary input vectors \({X_{1}^{2}}=\{ 00,01,10,11 \}\) is mapped to a quaternary symbol set S={0,1,2,3}, as depicted in Fig. 10, where r=2, and the channel notation changes from the correlated vector channel W _{2} to the single independent q-ary DMC W ^{′}. Nonpolarization is proved to occur in some q-ary DMCs when the cardinality of the set S is a composite number [10]. However, in [11], the authors proved the existence of polar codes for the composite cardinality q=2^{r}.
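The absence of polarization for N=2 can be checked by brute force. The sketch below assumes the joint-erasure model implied by the QEC description (both symbols are delivered with probability 1−ε, or both are erased with probability ε) and contrasts it with two independent BECs with the same ε. It computes I(U _{1};Y _{1} Y _{2}) and I(U _{2};Y _{1} Y _{2},U _{1}) for uniform inputs under x _{1}=u _{1}⊕u _{2}, x _{2}=u _{2}. The helper names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import log2

E = "e"  # erasure symbol

def joint_erasure(eps):
    """Assumed correlated model: both outputs pass, or both are erased."""
    def W(x):
        return {x: 1.0 - eps, (E, E): eps}
    return W

def independent_erasure(eps):
    """Two independent BEC(eps) uses, for comparison."""
    def W(x):
        out = {}
        for keep1, keep2 in product((True, False), repeat=2):
            y = (x[0] if keep1 else E, x[1] if keep2 else E)
            p1 = (1.0 - eps) if keep1 else eps
            p2 = (1.0 - eps) if keep2 else eps
            out[y] = p1 * p2
        return out
    return W

def mutual_information(joint):
    """I(A;B) in bits from a joint distribution {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

def split_capacities(W):
    """Return (I(W_2^(1)), I(W_2^(2))) for the 2x2 polar transform."""
    j1, j2 = defaultdict(float), defaultdict(float)
    for u1, u2 in product((0, 1), repeat=2):
        x = (u1 ^ u2, u2)
        for y, p in W(x).items():
            j1[(u1, y)] += 0.25 * p          # U1 versus (Y1, Y2)
            j2[(u2, (y, u1))] += 0.25 * p    # U2 versus (Y1, Y2, U1)
    return mutual_information(j1), mutual_information(j2)

if __name__ == "__main__":
    print("correlated :", split_capacities(joint_erasure(0.3)))
    print("independent:", split_capacities(independent_erasure(0.3)))
```

In the correlated case both split capacities stay at 1−ε, whereas the independent case separates them into (1−ε)^{2} and 1−ε^{2}.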
Under the parallel channel model, when N=2^{n}, the transmit channels are pairwise correlated from
The transmit channels can be represented as N ^{′} length of independent quaternary input DMCs
where \(N'=\frac {N}{2}\).
For generalization, assume that r transmit channels are correlated in a manner similar to (41). If we assume r=2^{α} (1≤α≪n−1) for simplicity, then the new code length becomes N ^{′}=2^{n−α}, and we have
In a q-ary representation, (42) still holds, where the input alphabet cardinality is q=2^{r}. If these relabeled independent q-ary DMCs are identically distributed, such that \(W^{\prime }_{(i)} = W'_{(j)}\) for ∀i,j∈ [ 1,N ^{′}], it is proved in [11] that polar codes for q-ary input DMCs achieve the symmetric channel capacity when q=2^{r} by exploiting the same kernel as in [1]. Therefore, the following proposition holds for a general q-ary DMC W ^{′}.
Proposition 4
There exist polar codes for nonindependent DMCs W ^{′} that achieve the symmetric capacity I(W ^{′}) by relabeling each r-bit binary sequence as a q-ary (q=2^{r}) symbol, as N ^{′}→∞ through powers of 2.
Proof
The proof follows the proof of the existence of polar codes in [11]. The authors proved the existence of polar codes that achieve the symmetric capacity for any q-ary input alphabet discrete memoryless channel when q is a power of 2. In our model, by grouping and relabeling alphabets of the same length, the model can be mapped to an equivalent q-ary alphabet model without any information loss because these operations are deterministic. We can then exploit the same proof as that in [11]; hence, the achievability still holds in this partially nonindependent channels model. □
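The grouping and relabeling step used in the proof is a deterministic, invertible map. A minimal sketch is given below; it assumes the natural binary labeling (for r=2: 00→0, 01→1, 10→2, 11→3), which may differ from the labeling in Fig. 10, and the function names are hypothetical.

```python
def bits_to_symbols(bits, r):
    """Group a binary sequence into r-bit blocks and relabel each as a q-ary symbol."""
    if len(bits) % r != 0:
        raise ValueError("sequence length must be a multiple of r")
    symbols = []
    for start in range(0, len(bits), r):
        value = 0
        for b in bits[start:start + r]:
            value = 2 * value + b   # most significant bit first
        symbols.append(value)
    return symbols

def symbols_to_bits(symbols, r):
    """Inverse relabeling: expand each q-ary symbol back into r bits."""
    bits = []
    for s in symbols:
        bits.extend((s // 2 ** shift) % 2 for shift in range(r - 1, -1, -1))
    return bits

if __name__ == "__main__":
    x = [0, 0, 0, 1, 1, 0, 1, 1]
    s = bits_to_symbols(x, 2)
    print(s, symbols_to_bits(s, 2) == x)
```

Because the map is a bijection between length-N binary vectors and length-N/r sequences over an alphabet of size 2^{r}, no information is lost by working in the q-ary domain.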
Conclusions
In Section 2, we proved that for deterministic CPs in nonidentical channel models, polar codes can achieve the sample mean of bit channel capacities. In addition, we provided an example that demonstrates the importance of the use of the proper channel interleaver Q for achievable rate enhancement and proposed the heuristic mapping algorithm.
In Sections 3 and 4, the key contribution is a new system model where the transmitter and the receiver know only the channel parameter distribution and not the channel parameter itself. If the underlying channel type is BEC, the coding scheme can become simpler. Note that for a BEC with erasure probability ε, its symmetric capacity I is an affine function of ε. Then, we have the relation \(E[I(\epsilon)]=I(\overline {\epsilon })\), where \(\overline {\epsilon }\) is the expectation of the random variable ε∼f _{ ε }(ε).
By applying multiple streams of polar codewords, we prove that the average capacity of BECs under our scenarios is achievable. However, this is obtained by sacrificing the latency and complexity, since they stack multiple blocks during the encoding and decoding process. Hence, these schemes might not be suitable in systems where low latency or low complexity is required. Rather, it is more practical in storage systems such as flash memory devices where throughput is more important than latency. Especially, for flash memories, statistical responses such as a voltage threshold would change with time and with the number of accesses to a cell block. Hence, as the storage capacity increases, it is inefficient for a storage controller to determine the exact states of every block or cell. If statistics on their changes are given instead, we can manage cells more efficiently using the proposed polar coding scheme.
In addition, in the case of parallel channels, where statistically different random disturbances exist across channels, it is difficult to track all the channel parameters. However, if their statistics are known to the transmitter and the receiver, we can deliver data up to the average capacity through polar codes by sacrificing latency. In such cases, polar codes are a promising option that maximizes the throughput.
Under the nonindependent channel scenario, we assume that the N transmit channels are grouped into sets of size r, which is a power of 2, so that we can treat the scenario as a nonbinary system. If N is not divisible by r (N mod r≠0), puncturing may be used to fit the system into a q-ary system [12]. The proposed polar codes appear to be promising for applications where only the knowledge of the channel parameter distribution is available and can be practical for storage applications such as flash memory devices.
Appendix
Proof of (22)
Proof
after calculations, it becomes
Therefore, \(Z\left (W_{2}^{(1)}\right) \leq Z\left (W_{(1)} \right)+Z(W_{(2)})-Z(W_{(1)})Z(W_{(2)})\) is satisfied for any binary input channel parameters. □
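The bound can be spot-checked numerically for small channels. The sketch below is a sanity check under assumptions, not part of the proof: it represents a binary-input channel as a hypothetical nested dictionary `W[x][y]`, builds the first split channel with uniform u _{2}, and compares its Bhattacharyya parameter with Z(W _{(1)})+Z(W _{(2)})−Z(W _{(1)})Z(W _{(2)}).

```python
from math import sqrt

def bsc(p):
    """Binary symmetric channel as W[x][y]."""
    return {0: {0: 1.0 - p, 1: p}, 1: {0: p, 1: 1.0 - p}}

def bec(eps):
    """Binary erasure channel as W[x][y], with 'e' the erasure symbol."""
    return {0: {0: 1.0 - eps, "e": eps}, 1: {1: 1.0 - eps, "e": eps}}

def bhattacharyya(W):
    """Z(W) = sum over y of sqrt(W(y|0) * W(y|1))."""
    outputs = set(W[0]) | set(W[1])
    return sum(sqrt(W[0].get(y, 0.0) * W[1].get(y, 0.0)) for y in outputs)

def minus_channel(W1, W2):
    """W_2^(1)(y1, y2 | u1) = sum over u2 of 0.5 * W1(y1 | u1 xor u2) * W2(y2 | u2)."""
    out = {0: {}, 1: {}}
    for u1 in (0, 1):
        for u2 in (0, 1):
            for y1, p1 in W1[u1 ^ u2].items():
                for y2, p2 in W2[u2].items():
                    key = (y1, y2)
                    out[u1][key] = out[u1].get(key, 0.0) + 0.5 * p1 * p2
    return out

def upper_bound(W1, W2):
    """Right-hand side of the bound: Z1 + Z2 - Z1 * Z2."""
    z1, z2 = bhattacharyya(W1), bhattacharyya(W2)
    return z1 + z2 - z1 * z2

if __name__ == "__main__":
    for W1, W2 in ((bsc(0.1), bsc(0.2)), (bec(0.3), bec(0.6)), (bsc(0.05), bec(0.4))):
        print(bhattacharyya(minus_channel(W1, W2)), upper_bound(W1, W2))
```

For two BECs the two printed values should coincide, since the bound is tight for erasure channels; for the BSC pairs the first value should be strictly smaller.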
Proof of (23)
Proof
□
The fourth equation comes from the fact that the summation over u _{1} is a sum of two similar terms. In addition, the following relation holds with the aid of (23):
The relation can be verified simply by subtracting either of the right-hand terms from the left-hand term.
Proof of (24)
Proof
We can prove (24) simply by applying the arithmetic-geometric mean inequality to \(Z\left (W_{2}^{(1)}\right)\). Let us review the development process.
Defining the shorthand notations A=W _{(1)}(y _{1}|0), B=W _{(1)}(y _{1}|1), C=W _{(2)}(y _{2}|0), and D=W _{(2)}(y _{2}|1), we can rewrite the above equation as follows:
where the inequality follows from the relation between the arithmetic and geometric means. □
Proof of the number of equivalent channel combinations
According to the basic theory of polar codes, channels are recursively evolved such that \((W_{N}^{(i)},W_{N}^{(i)}) \mapsto (W_{2N}^{(2i-1)},W_{2N}^{(2i)})\) for some set of binary input discrete memoryless channels. We denote each mapping as follows:
and it was proved that polarizations occur in nonidentically distributed channels.
Recall that the functionals \(\mathcal {F}_{n}\) and \(\mathcal {G}_{n}\) share the same set of functions as an input. Furthermore, these input functions are also results of the functionals \(\mathcal {F}_{n-1}\) or \(\mathcal {G}_{n-1}\) due to the recursive structure of channel evolution. Let us define a swapping operation S:(α,β)↦(β,α). We can then easily check that \(\mathcal {F}_{n}\) and \(\mathcal {G}_{n}\) are invariant to S. We also define a functional \(\mathcal {H}_{n}\) which includes both \(\mathcal {F}_{n}\) and \(\mathcal {G}_{n}\). \(\mathcal {H}_{n}\) can represent its members since \(\mathcal {F}_{n}\) and \(\mathcal {G}_{n}\) are equivalent in the number of cases; hence, it is also invariant to S. In a similar way, we define \(\mathcal {W}_{n+1}\) which includes \(W_{2N}^{(2i-1)}\) and \(W_{2N}^{(2i)}\). We now rewrite (44) and (45) using \(\mathcal {H}\) and \(\mathcal {W}\), respectively.
Due to the recursive structure in channel evolutions, each \(W_{N}^{(i)}\) is calculated again through \(\mathcal {H}_{n}\). It is then generally in the form of
Defining the number of operations of \(\mathcal {H}\) that are required to recursively reach \(\mathcal {H}_{n}\) as χ _{ n } then
By solving this recurrence formula, we can obtain χ _{ n }=N−1. Recalling that \(\mathcal {H}\) is invariant to S, which indicates that there are two cases for each \(\mathcal {H}\), the number of combinations of the BDMCs that result in the same information set of polar codes is 2^{N−1}. Hence, the number of representative combinations that may have different information sets for length N parallel polar coding systems is \(\frac {N!}{2^{N-1}}\).
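The count can be checked by enumeration for a small N. The sketch below is a sanity check under assumptions: it uses BECs with distinct erasure probabilities, assumes the recursion that polarizes the two halves separately and then combines them index by index, and counts how many distinct bit-channel vectors arise over all N! orderings. Orderings that give the same vector necessarily give the same information set.

```python
from itertools import permutations
from math import factorial

def polarize_becs(eps):
    """Bit-channel erasure probabilities for parallel BECs (length a power of 2)."""
    n = len(eps)
    if n == 1:
        return list(eps)
    a = polarize_becs(eps[: n // 2])
    b = polarize_becs(eps[n // 2:])
    out = []
    for ea, eb in zip(a, b):
        out.append(ea + eb - ea * eb)  # symmetric in (ea, eb)
        out.append(ea * eb)            # symmetric in (ea, eb)
    return out

def count_distinct_evolutions(eps):
    """Number of distinct bit-channel vectors over all orderings of eps."""
    seen = set()
    for perm in permutations(eps):
        seen.add(tuple(round(z, 12) for z in polarize_becs(list(perm))))
    return len(seen)

def predicted_count(N):
    """N! / 2^(N-1), the number of representative combinations."""
    return factorial(N) // 2 ** (N - 1)

if __name__ == "__main__":
    print(count_distinct_evolutions([0.1, 0.2, 0.3, 0.4]), predicted_count(4))
    print(predicted_count(8))
```

Both combining operations are symmetric in their two arguments, so swapping the inputs at any of the N−1 combining nodes leaves every bit channel unchanged, which is the source of the factor 2^{N−1}.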
References
 1
E Arikan, Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Trans. Inform. Theory. 55(7), 3051–3073 (2009).
 2
M Alsan, Conditions for robustness of polar codes in the presence of channel mismatch (2013). eprint ArXiv:1303.2379v2.
 3
M Alsan, E Telatar, in IEEE International Symposium on Information Theory (ISIT) 2014. A simple proof of polarization and polarization for nonstationary channels (Honolulu, 2014), pp. 301–305.
 4
D Blackwell, L Breiman, AJ Thomasian, The capacity of a class of channels. Ann. Math. Stat. 30(4), 1229–1241 (1959).
 5
SH Hassani, R Urbanke, in IEEE International Symposium on Information Theory (ISIT) 2014. Universal polar codes (Honolulu, 2014), pp. 1451–1455.
 6
E Sasoglu, L Wang, in IEEE International Symposium on Information Theory (ISIT) 2014. Universal polarization (Honolulu, 2014), pp. 1456–1460.
 7
E Hof, I Sason, S Shamai, in IEEE 26th Convention of Electrical and Electronics Engineers in Israel (IEEEI). Polar coding for degraded and nondegraded parallel channels (Eilat, 2010), pp. 550–554.
 8
E Hof, I Sason, S Shamai, C Tian, Capacity-achieving codes for arbitrarily permuted parallel channels. IEEE Trans. Inform. Theory. 59(3), 1505–1516 (2013).
 9
R Mori, T Tanaka, Performance of Polar codes with the construction using density evolution. IEEE Commun. Lett. 13(7), 519–521 (2009).
 10
E Sasoglu, in IEEE International Symposium on Information Theory Proceedings (ISIT) 2012. Polar codes for discrete alphabets (Cambridge, 2012), pp. 2137–2141.
 11
W Park, A Barg, Polar codes for q-ary channels, q=2^{r}. IEEE Trans. Inform. Theory. 59(2), 955–969 (2013).
 12
A Eslami, H Pishro-Nik, in IEEE International Symposium on Information Theory Proceedings (ISIT) 2011. A practical approach to polar codes (St. Petersburg, 2011), pp. 16–20.
Acknowledgements
This research was supported in part by the Basic Science Research Programs (NRF2013R1A1A2008956 and NRF2015R1A2A1A15052493) through NRF funded by MOE and MSIP, the ICT R&D Program of MSIP/IITP (B0717160023), the Technology Innovation Program (10051928) funded by MOTIE, the BioMimetic Robot Research Center funded by DAPA (UD130070ID), INMC, and BK21plus.
Authors’ contributions
JK and JL prove the achievability of polar codes in nonidentically distributed channels and propose the channel interleaver algorithm to enhance the system reliability. Both authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords
 Polar codes
 Nonidentically distributed channels
 Channel parameters
 An interleaver scheme