Polar codes for nonidentically distributed channels
 Jangseob Kim^{1} and
 Jungwoo Lee^{1}
https://doi.org/10.1186/s1363801607822
© The Author(s) 2016
Received: 27 July 2015
Accepted: 27 November 2016
Published: 16 December 2016
Abstract
We introduce new polar coding schemes for independent, nonidentically distributed parallel binary discrete memoryless channels. The first scheme is developed for the case where the underlying channels are time invariant (the case of deterministic channel parameters), while the other schemes deal with a scenario where the underlying channels change according to a distribution (the case of random channel parameters). For the former case, we also discuss the importance of using an interleaver Q to enhance system reliability, and for the latter case, we model the channel behavior of binary erasure channels. It is shown that the proposed polar coding schemes achieve the symmetric capacity.
1 Introduction
The polar codes previously introduced in [1] are a coding scheme that achieves the symmetric capacity of any discrete memoryless channel (DMC) by exploiting channel polarization phenomena. The block error rate converges to zero as the code length N goes to infinity.
One disadvantage of polar codes is their dependency on the underlying channel. If a channel is unknown to a transceiver, communication becomes unreliable and erroneous decoding can result, since the code design is based on a different channel. The performance gap due to imperfect channel knowledge was analyzed in [2], and a robust approximation technique has been proposed for this mismatch. In [3], it was demonstrated that polarization is possible even under a nonstationary channel.
In this paper, we propose an approach to the design of polar codes in the parallel channel model based on a martingale process and extend the model to the case in which the channel parameters are no longer fixed but exhibit some random behaviors. In addition, we briefly discuss the importance of the usage of the proper interleaver Q.
The communication scenario in which the transmitter and the receiver do not know the channel parameter but know the domain (set) to which it belongs is known as the compound channel scenario [4]. The authors of [4] defined the compound channel capacity as the rate at which one can reliably transmit data without knowing the underlying channel parameter. In [5] and [6], polar codes that achieve the compound channel capacity were proposed. In [5], the unknown channel is deterministic during a codeword transmission, and in [6], the authors dealt with a deterministic compound parallel channels model.
Parallel channels are used to model the statistical behavior of communications that convey multiple data streams simultaneously between nodes connected by multiple links. The internal structure of storage systems and singular-value-decomposed multiple-input multiple-output (MIMO) channels are examples of parallel channel models. In these channels, the links do not need to be the same, and their statistical behaviors can differ. The question then arises whether the parallel channel capacity is achievable via polar codes. In [7, 8], it is demonstrated, under a nonidentically distributed channel assumption, that when all channel parameters (CPs) are given to the transceiver, polar codes can achieve the capacity in both degraded and nondegraded channel settings.
In this paper, we show that the polarization phenomenon occurs in the parallel channel model and provide proof of achievability of polar codes for general binary discrete memoryless channels (BDMC), under a nonidentically distributed assumption, where deterministic CPs are given. Second, in contrast to previous literature, we also consider a case where CPs that describe binary erasure channels (BECs) are realizations of some random variables, and only their probability distributions, and not their exact values, are known to the transmitter and the receiver.
Developing polar codes for the randomly varying channel scenario is one of the notable issues. For the ith transmit BEC at time instant η, assume that the erasure probability ε _{ i }(η) is a realization of a random variable ε that follows a distribution f _{ ε }. In this case, conventional polar coding schemes fail to achieve the capacity since the encoder and the decoder do not know the exact value, ε _{ i }(η), of the underlying channel. For example, in flash memories, the statistical responses of the voltage threshold may change asymmetrically with time and with the number of accesses to the cells. As the storage capacity increases, it becomes inefficient for a storage controller to trace and probe the exact states of every block or cell.
The remainder of this paper is organized as follows. In the second section, we analyze the polarization process on a time-invariant nonidentical parallel channel. This corresponds to a parallel channel with deterministic CPs that may differ from each other. In the third and fourth sections, an independent, nonidentically distributed channel model with random CPs is discussed. In these sections, we assume that only the distributions of the CPs of each channel W _{(i)}, ∀i∈ [ 1,N], are known to the encoder and the decoder. In the third section, the CPs may change in a block-by-block manner; however, their states are maintained within a block. In the fourth section, in contrast to the third section, the CPs may change even within a block. In the fifth section, we discuss a special case of partially dependent channels and show that polar codes based on q-ary inputs can achieve the capacity. In the final section, concluding remarks are provided.
2 Nonidentical channels with deterministic CP
Let \(W:\mathcal {X}\rightarrow \mathcal {Y}\) denote a general symmetric binary-input discrete memoryless channel (BDMC) and \(W_{N}:\mathcal {X}^{N}\rightarrow \mathcal {Y}^{N}\) denote a vector channel. If the channels are independent but not identical, then \(W_{N}({y_{1}^{N}}|{x_{1}^{N}})=\prod _{i=1}^{N}W_{(i)}(y_{i}|x_{i})\) where \(W_{(i)}: \mathcal {X}_{i} \rightarrow \mathcal {Y}_{i}\), such that their transition probabilities p _{(i)}(y|x) and p _{(j)}(y|x) may differ if i≠j. In parallel channels, a nonidentical channel may correspond to a transmission scenario through links with different qualities. In terms of the channel model, this can be interpreted as a fast-fading channel in the time or frequency domain. This corresponds to the case when the time duration or the frequency gap between adjacent channels is larger than the coherence time or coherence frequency, respectively.

Symmetric capacity

Bhattacharyya parameter
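For reference, the two quantities named above are the standard ones introduced in [1]. A sketch of their definitions for a binary-input channel W with output alphabet \(\mathcal {Y}\) is given below; the labels (1) and (2) are an assumption inferred from how later passages refer to these definitions.

```latex
\begin{align}
I(W) &\triangleq \sum_{y\in\mathcal{Y}}\sum_{x\in\{0,1\}} \frac{1}{2}\, W(y|x)
        \log_{2}\frac{W(y|x)}{\frac{1}{2}W(y|0)+\frac{1}{2}W(y|1)}, \tag{1}\\
Z(W) &\triangleq \sum_{y\in\mathcal{Y}} \sqrt{W(y|0)\,W(y|1)}. \tag{2}
\end{align}
```

For a BEC with erasure probability ε, these reduce to I(W)=1−ε and Z(W)=ε.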
In this section, the channels are assumed to be independent and not identically distributed, and the CPs are known to the transmitter and the receiver. For BECs with erasure probabilities \({\epsilon _{1}^{N}}\), these erasure probabilities may differ and are known in advance to the encoder and the decoder. Let us denote the sum symmetric capacity by \(I_{s}=\sum _{i=1}^{N}{I_{i}}\), where I _{ i }=I(X _{ i };Y _{ i }), and the sample mean by \(\mathbb {E}[I_{i}]= \frac {I_{s}}{N}\). Then, the following theorem holds.
Theorem 1
For any set of BDMCs {W _{(i)}},i∈ [ 1,N], and for arbitrarily small δ>0, there exist polar codes that achieve the sum capacity I _{ s }, in the sense that as N, a power of 2, goes to infinity, the fraction of indices i∈ [ 1,N] satisfies
For simple notations, denote \(Z_{N}^{(i)}\triangleq Z\left (W_{N}^{(i)} \right)\) and \(I_{N}^{(i)}\triangleq I\left (W_{N}^{(i)} \right)\). To prove the above theorem, we first need to clarify the recursive structures of \(W_{N}^{(i)}\), \(Z_{N}^{(i)}\), and \(I_{N}^{(i)}\). Second, we need to prove that the values of \(I_{N}^{(i)}\) and \(Z_{N}^{(i)}\) converge to {0,1} as the code length N increases. For the second proof, the martingale convergence theorem and (3) will be used. These statements are proved via Lemma 1, Lemma 2, and Proposition 1. After proving Lemma 2, we summarize the proof of the theorem.
For a kernel \(F = \left [ \begin {array}{cc} 1&0\\ 1&1 \end {array} \right ]\) of the generator matrix G _{ N }=B _{ N } F ^{⊗n }, the recursively evolving structure of \(\left (W_{N}^{(i)},W_{N}^{(i)} \right) \mapsto \left (W_{2N}^{(2i-1)},W_{2N}^{(2i)} \right)\) is similar to that of the recursive equation in [1], except for the last recursions for the length 2.
Lemma 1
The proof of Lemma 1 follows directly from [1]. The last recursion derives from the independent and nonidentically distributed parallel channel environment.
We now extend Propositions 4–7 in [1], which were proved under the i.i.d. condition, to the nonidentically distributed DMC case. This extension has appeared in the previous literature, under the names “parallel channels” [7, 8] and “nonstationary channels” [3], with the symmetric capacity as the measure.
2.1 Evolution of symmetric capacities
In this subsection, we consider the symmetric capacities of the recursively achieved bit channels. By inserting (6) and (7) into the definition of the symmetric capacities (1), the following proposition holds.
Proposition 1
Suppose \(\left (W_{(1)},W_{(2)} \right)\mapsto (W_{2}^{(1)},W_{2}^{(2)})\) for any binary input discrete channels. Then,
Proof
Since the mapping from a message vector \({U_{1}^{2}}\) to an encoded codeword is deterministic through the generator matrix G _{2}, no information is lost between the third equation and the fourth equation. Also, the fifth equation follows from the independence of the transmit channels. □
To prove (9), we focus on the second bit channel \(W_{2}^{(2)}\)
Since the value of mutual information is always nonnegative, I(·)≥0. By inserting (11) into (8), we obtain \(I\left (W_{2}^{(1)}\right) \leq I\left (W_{(1)}\right)\).
However, it remains unclear whether (9) and (10) follow from this relation and (11), since no additional conditions are imposed on the qualities of the transmit channels W _{(i)},i∈ [ 1,N] that would order them in terms of the symmetric capacity I(·) or the Bhattacharyya parameter Z(·).
Now, we can consider three necessary conditions for (9) and (10) to be true: (1) \(I\left (W_{(1)} \right) \leq I\left (W_{2}^{(2)} \right)\), (2) \( I\left (W_{(2)} \right) \geq I\left (W_{2}^{(1)} \right)\), and (3) I(W _{(1)})≤I(W _{(2)}).
Conditions 1 and 2 can be easily verified and hold for any set of BECs, binary symmetric channels (BSCs), and binary-input additive white Gaussian noise (BIAWGN) channels, and thus (9) and (10) are satisfied for these channels. However, the first and the second inequalities do not hold for general BDMCs. In that case, the third condition, I(W _{(1)})≤I(W _{(2)}), should be satisfied to achieve (9) and (10). For channels of N=2 where I(W _{(1)})≥I(W _{(2)}), the third condition can be achieved simply by swapping (W _{(1)},W _{(2)})↦(W _{(2)},W _{(1)}) without degrading the achievable rate. This operation is available since the channel parameters are exposed to the transceiver in advance. Therefore, we can conclude that (9), (10), and Proposition 1 are true for any BDMCs.
The equality holds between \( I_{2N}^{(2i)}\) and \( I_{2N}^{(2i-1)}\) if and only if the underlying channels are either perfect or completely noisy.
According to (12)–(14), we can observe that the gap between the two evolved symmetric channel capacities \(I_{2N}^{(2i-1)}\) and \(I_{2N}^{(2i)}\) increases as recursions are repeated (or as the code length N is doubled). In addition, their values are lower and higher than the previous values, respectively. Recalling that I(·)∈ [ 0,1], a conjecture can be formed suggesting that the evolved \(I_{2N}^{(2i-1)}\) and \(I_{2N}^{(2i)}\) converge to one of the extremal values, 0 or 1, as the process is repeated. We can now prove this conjecture with the aid of the martingale process and the bounded martingale convergence theorem in the following subsection.
2.1.1 Martingale process I _{ n }
Lemma 2
{I _{ n }} is a martingale under the average sequence {E _{ n }} in the sense that \(\mathbb {E}[ I_{n+1} \mid E_{n} ]=E_{n}\), where \(I_{{b_{1}^{n}}} \in \{ I_{n} \}\) and \(E_{n}\left ({b_{1}^{n}} \right)=\mathbb {E}[ I_{{b_{1}^{n}}} ]\).
Proof
which is equivalent to \(E_{n}({b_{1}^{n}})\).
We can now exploit the ergodicity in BDMCs. Let us denote the sum symmetric capacity as \(I_{s}=\sum _{i=1}^{N}{I_{i}}\), where I _{ i }=I(X _{ i };Y _{ i }), and the sample mean as \(\mathbb {E}[I_{i}]\). Then, from the ergodicity, \(\mathbb {E}[I_{i}]= \frac {I_{s}}{N}\). Next, we borrow the tree process representation and the corresponding notations from Section 4 in [1].
A bit channel \(W_{N=2^{n}}^{(i)}\) can be uniquely represented via a binary sequence \({b_{1}^{n}}\): \(W_{{b_{1}^{n}}}\). For example, \(W_{8}^{(3)} \mapsto W_{011}\). Using this, we abbreviate \( I(W_{{b_{1}^{n}}}) = I\left (W_{2^{n}}^{(i)}\right)\) as \(I_{{b_{1}^{n}}}\), where \(I_{{b_{1}^{n}}}\) is an element of the set I _{ n }. We use the same notation in the definition of the average sequence E _{ n }: \(E_{n}\left ({b_{1}^{n}}\right)=\mathbb {E}[I_{{b_{1}^{n}}}] =\mathbb {E}[I_{n}]\), which is the average over all the elements of the set I _{ n }. By exploiting E _{ n }, we can draw a new tree process as depicted in Fig. 3, and we have shown via (16) that this process satisfies the martingale property.
Recalling the following properties, we can conclude that E _{ n } converges to 0 or 1: (i) the symmetric capacity is bounded by 0 and 1, such that \(0\leq I_{{b_{1}^{n}}}\leq 1\), and (ii) a bounded martingale process converges to extremal values, which is the same technique as that used in the proof of [1].
Once E _{ n } converges to one of the extremal values, we can also conclude that I _{ n } converges to the same extremal value, since E _{ n } is defined as the average of I _{ n }. From (16), the two newly evolving terms are equidistant from \(E_{n}({b_{1}^{n}})\), so we denote them using a distance α: \(E_{n}({b_{1}^{n}})+\alpha \) and \(E_{n}({b_{1}^{n}})-\alpha \). When the martingale \(E_{n}({b_{1}^{n}})\) goes to 1, we have α→0 since it is bounded above by 1. Similarly, when the martingale \(E_{n}({b_{1}^{n}})\) goes to 0, we have α→0 since it is bounded below by 0. Therefore, we conclude that I _{ n } also converges to 0 or 1.
Consequently, it is concluded that sequence {I _{ n }} is a martingale sequence. □
which concludes the proof of Theorem 1.
In summary, to prove Theorem 1, we showed first that polarization takes place, by using the martingale process, and second that the fraction of noiseless bit channels converges to the symmetric capacity as N→∞. As in [1], this is equivalent to showing that the block error probability converges to zero as the code length goes to infinity.
2.1.2 Example on BECs
Here, f(α,β)=α+β−α β,g(α,β)=α β, and \(W_{[1:N]} = \{ W_{(j)} \}_{j=1}^{N}\).
Difference between capacities and achievable rates of polar codes under nonidentical parallel channels
Number | 8 | 10 | 12 | 14 | 16
Diff. [%] | 0.6 | 0.044 | 0.038 | 0.003 | 0.002
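The BEC recursion with f and g can be sketched in a few lines of Python. This is a minimal illustration, assuming the block structure in which the first half of the channels feeds one length-N sub-block and the second half feeds the other; the function names, the linearly spaced erasure probabilities, and the threshold δ=0.05 are hypothetical choices made only for the demonstration.

```python
def f(a, b):
    """Erasure probability of the degraded bit channel (index 2i-1)."""
    return a + b - a * b


def g(a, b):
    """Erasure probability of the upgraded bit channel (index 2i)."""
    return a * b


def bit_channel_erasures(eps):
    """Erasure probabilities of the N bit channels built from parallel BECs.

    eps holds the erasure probabilities of the transmit channels in
    transmission order; its length must be a power of 2.
    """
    n = len(eps)
    if n == 1:
        return list(eps)
    half = n // 2
    first = bit_channel_erasures(eps[:half])    # sub-block on W_[1:N]
    second = bit_channel_erasures(eps[half:])   # sub-block on W_[N+1:2N]
    out = []
    for a, b in zip(first, second):
        out.append(f(a, b))
        out.append(g(a, b))
    return out


def polarized_fractions(eps, delta):
    """Fractions of bit channels with capacity above 1-delta and below delta."""
    z = bit_channel_erasures(eps)
    good = sum(1 for v in z if 1.0 - v > 1.0 - delta) / len(z)
    bad = sum(1 for v in z if 1.0 - v < delta) / len(z)
    return good, bad


# Hypothetical deterministic channel parameters, spread between 0.2 and 0.8.
N = 2 ** 10
eps = [0.2 + 0.6 * i / (N - 1) for i in range(N)]
mean_capacity = sum(1.0 - e for e in eps) / N
good, bad = polarized_fractions(eps, delta=0.05)
print("mean capacity:", mean_capacity)
print("fraction near-perfect:", good, "fraction near-useless:", bad)
```

Because f(a,b)+g(a,b)=a+b, the sum of the bit-channel erasure probabilities equals the sum of the transmit-channel erasure probabilities at every recursion level, which is the BEC form of capacity conservation.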
2.2 Achievable scheme based on the symmetric capacity
2.2.1 Encoder
Given a set of BDMCs \(\{ W_{N}^{(i)} \}\), the encoder calculates \(\{ I_{N}^{(i)} \}\) according to its definition. Only for BECs does the \(I_{N}^{(i)}\) calculation follow (19) with equality. For general BDMCs, the error performance of the bit channels should be tracked with appropriate measures such as \(Z_{N}^{(i)}\) by using the density evolution method [9]. Then, define an information index set \(A=\left \{ i : I_{N}^{(i)} \geq I_{N}^{(j)}, i\in A, j\notin A \right \}\) for all i,j∈[1,N] such that |A|=⌊I _{ s }⌋. We apply the interleaver Q to map the indices of the codeword to the ordered set of transmit channels: \( Q: \mathcal {X^{N}}\mapsto \mathcal {W^{N}}\), where \({x_{1}^{N}}\in \mathcal {X^{N}}\), and \(W_{[1:N]}\in \mathcal {W^{N}}\). This is deployed to help the polarization process. The problem of finding a proper Q for given channels is discussed at the end of this section. G _{ N } is an N×N generator matrix, and its kernel is \(G_{2} = \left [ \begin {array}{cc} 1&0\\ 1&1 \end {array} \right ]\). The encoder outputs a codeword \({x_{1}^{N}} ={u_{1}^{N}}\cdot G_{N}\cdot Q\).
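The encoder steps above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: frozen bits are set to zero, G_N is taken as B_N F^{⊗n} with B_N the bit-reversal permutation, indices are 0-based, and the interleaver Q is represented by a hypothetical list `perm` in which coded bit i is sent over transmit channel perm[i].

```python
def bit_reverse(i, n_bits):
    """Reverse the n_bits-bit binary representation of i."""
    r = 0
    for _ in range(n_bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def polar_transform(u):
    """Compute u * G_N over GF(2), with G_N = B_N F^{(x)n}."""
    N = len(u)
    n_bits = N.bit_length() - 1
    # Apply the bit-reversal permutation B_N.
    x = [u[bit_reverse(i, n_bits)] for i in range(N)]
    # Apply the Kronecker power of F = [[1, 0], [1, 1]] as a butterfly.
    step = 1
    while step < N:
        for start in range(0, N, 2 * step):
            for k in range(start, start + step):
                x[k] ^= x[k + step]
        step *= 2
    return x


def select_information_set(capacities, k):
    """Indices (0-based) of the k bit channels with the largest capacity."""
    order = sorted(range(len(capacities)),
                   key=lambda i: capacities[i], reverse=True)
    return sorted(order[:k])


def encode(message_bits, info_set, N, perm):
    """Place message bits on info_set, freeze the rest to 0, encode, interleave."""
    u = [0] * N                      # assumption: frozen bits are all zero
    for idx, bit in zip(info_set, message_bits):
        u[idx] = bit
    x = polar_transform(u)
    sent = [0] * N
    for i, channel in enumerate(perm):
        sent[channel] = x[i]         # coded bit i goes to channel perm[i]
    return sent


# Bit-channel capacities of a length-4 code (values as in Example 1 below).
caps = [0.02, 0.56, 0.44, 0.98]
A = select_information_set(caps, 2)
print("information set (0-based):", A)
print("codeword:", encode([1, 1], A, 4, perm=[0, 1, 2, 3]))
```

Since G_N is its own inverse over GF(2), applying `polar_transform` twice returns the original vector, which is a convenient sanity check for the transform.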
2.2.2 Decoder
Thus, the decoding complexity remains O(N logN), and the error probability P _{ e } vanishes as N→∞. The encoding and decoding process is summarized in Algorithm 1.
2.3 Evolution of Bhattacharyya parameters
We can now show that relationships similar to those in Proposition 1 hold for the Bhattacharyya functional Z(·).
Proposition 2
Suppose \(\left (W_{(1)},W_{(2)} \right)\mapsto \left (W_{2}^{(1)},W_{2}^{(2)} \right)\) for some binary input nonidentically distributed discrete channels. Then, the following relations hold
The proofs of (22)–(24) are presented in the Appendix.
Corollary 1
where f(α,β)=α+β−α β,g(α,β)=α β, and \(W_{[1:N]} = \left \{ W_{(j)} \right \}_{j=1}^{N}\). The equality holds for BEC.
By applying the recursive channel structure of Lemma 1 to the definition of the Bhattacharyya parameter, the above corollary can be derived. With the aid of Lemma 1 and (22)–(24), we can derive Properties 1, 2, and 3 as follows:
 1. $$Z_{2N}^{(2i-1)}+Z_{2N}^{(2i)} \leq Z_{N}^{(i)}\left(W_{[1:N]} \right) + Z_{N}^{(i)}\left(W_{[N+1:2N]} \right).$$ (26)
 2. (for some BDMC W) $$Z_{2N}^{(2i-1)} \geq \max\left\{ Z_{N}^{(i)}\left(W_{[1:N]} \right), Z_{N}^{(i)}\left(W_{[N+1:2N]} \right) \right\}.$$ (27)
To prove the above properties, let us start from the first one. Property 1 can be derived from the definition of the Z-parameter, using the same argument as in the proof of (24), which exploits the inequality between the arithmetic mean and the geometric mean. By using the f(α,β) and g(α,β) notations as in Corollary 1, Property 2 holds if and only if the joint event {α β≤α}∩{α β≤β} holds, which is true because α,β∈ [ 0,1]. Property 3 is true for the BEC, the BSC, and the BIAWGN channel, which can be transformed into an equivalent BSC model.
2.3.1 BEC case
Hence, \(Z_{2N}^{(2i-1)} \geq Z_{N}^{(i)}(W_{[1:N]})\) and \(Z_{2N}^{(2i-1)} \geq Z_{N}^{(i)}(W_{[N+1:2N]})\); thus, Property 3 holds.
2.3.2 BSC case
Hence, we can conclude that Property 3 holds for the BSC case.
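Property 3 for the BSC case can be spot-checked numerically. The sketch below relies on two standard facts that are assumptions here, since they are not spelled out in this text: the Bhattacharyya parameter of a BSC with crossover probability p is 2√(p(1−p)), and the degraded bit channel formed from two BSCs with crossover probabilities p1 and p2 is equivalent to a BSC with crossover probability p1(1−p2)+p2(1−p1). The function names are hypothetical.

```python
import math


def z_bsc(p):
    """Bhattacharyya parameter of a BSC with crossover probability p."""
    return 2.0 * math.sqrt(max(0.0, p * (1.0 - p)))


def combine_crossover(p1, p2):
    """Crossover probability of the XOR of two independent BSC noise bits."""
    return p1 * (1.0 - p2) + p2 * (1.0 - p1)


def property3_holds_for_bsc(p1, p2, tol=1e-12):
    """Check Z of the degraded bit channel >= max of the two input Z values."""
    z_minus = z_bsc(combine_crossover(p1, p2))
    return z_minus >= max(z_bsc(p1), z_bsc(p2)) - tol


# Check the inequality on a grid of nonidentical crossover probabilities.
grid = [i / 20.0 for i in range(21)]
violations = [(p1, p2) for p1 in grid for p2 in grid
              if not property3_holds_for_bsc(p1, p2)]
print("violations found:", len(violations))
```

A grid check of this kind is not a proof; it only confirms that the inequality is consistent with the closed-form BSC expressions at the sampled points.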
2.3.3 BIAWGN case
Note that a binaryinput AWGN channel can be transformed to the equivalent BSC by applying a line coding that maps the set of binary alphabets {0,1}↦{−1,+1}. Therefore, Property 3 holds.
2.4 Supermartingale Z _{ n }
Let us define the random sequence {Z _{ n }} such that \(Z_{N}^{(i)} \mapsto Z_{n}({b_{1}^{n}})\), where \(Z_{n}({b_{1}^{n}})\in \{ Z_{n} \}\) is represented as \(Z_{{b_{1}^{n}}}\). Then, for some BDMCs that satisfy four properties, we can draw a graphical evolving recursive structure of \(Z_{N}^{(i)}\) (or Z _{ n }) similar to Fig. 3. The graphical structure of Z _{ n } shares its features with that of the identically distributed channel case; however, the elements in the rightmost column are under a nonidentically distributed scenario, where the CPs may not be the same.
{Z _{ n }} can be considered a supermartingale if this random tree process satisfies the relation \(Z_{n} \geq \mathbb {E}\left [Z_{n+1} \mid {b_{1}^{n}}\right ]\). Under the i.i.d. channel assumption, this is easily verified, since the one-step transition from \(Z_{n}({b_{1}^{n}})\) to {Z _{ n+1}} is a single-variable to single-variable mapping: \(\mathcal {K}:\mathcal {Z} \rightarrow \mathcal {Z}\) such that \(\mathcal {Z}=\{ Z \mid Z\in \Re, Z\in [0,1] \}\). In contrast, under the nonidentically distributed assumption, the transition becomes a two-to-one mapping: \(\mathcal {K}':\mathcal {Z}^{2} \rightarrow \mathcal {Z}\). Each \(Z_{{b_{1}^{n}}}\) is not a scalar but a 2×1 vector. In this case, the form of the condition is not appropriate because of the dimension mismatch.
Using a process similar to that used for I _{ n } in proving the martingale property, we apply the average sequence {E _{ n }} to replace \(Z_{{b_{1}^{n}}}\) such that \(E_{n}\left ({b_{1}^{n}}\right)= \mathbb {E}\left [Z_{{b_{1}^{n}}}\right ]\). Then, from (25), we can verify that \(\mathbb {E}\left [Z_{n+1} \mid E_{n}({b_{1}^{n}})\right ] \leq E_{n}\), which means that Z _{ n } is a supermartingale under E _{ n }.
2.5 Convergence of {Z _{ n } }
where \(\alpha \triangleq Z_{N}^{(i)}(W_{[1:N]})\) and \(\beta \triangleq Z_{N}^{(i)}(W_{[N+1:2N]})\), it becomes an indeterminate equation under the condition α,β∈[ 0,1]. The solution pairs (α _{ ∞ },β _{ ∞ })∈{(0,0),(1,1)} can be obtained. Therefore, since Z _{ ∞ }, which corresponds to α _{ ∞ }, converges to either 0 or 1 almost everywhere, we can conclude that all \(Z_{N}^{(i)}\) are polarized to become either nearly perfect or totally random as n→∞.
2.6 Channel mapping via the interleaver Q
In this subsection, we discuss the role of an interleaver Q in polar coding systems under nonidentically distributed BDMCs and propose an algorithm that explains how to construct such an operation. The transceiver structure including an interleaver Q and a deinterleaver Q ^{−1} is depicted in Fig. 1. To understand the importance of Q, let us consider the following example.
Example 1
For N=4, let the set of erasure probabilities of parallel BECs be \(\{ {\epsilon _{1}^{N}} \}=\{ 0.1, 0.4, 0.6, 0.9 \}\), of which the average is ε _{ m }=0.5. Then, the evolved bit channel capacities are \(\left \{ I\left (W_{N}^{(i)} \right) \right \} = \{ 0.02, 0.56, 0.44, 0.98 \}\) and the encoder selects the information set A={2,4}. The resulting sum of the capacities of the selected bit channels is 1.54 [bits] for four channel uses.
We now apply the interleaver Q over the same set, which results in \(\left \{ {\epsilon _{1}^{N}} \right \}=\{ 0.6, 0.4, 0.9, 0.1 \}\). The evolved set of symmetric capacities is \(\left \{ I\left (W_{N}^{(i)} \right) \right \} = \{ 0.02, 0.31, 0.69, 0.98 \}\); the last value equals one minus the product of all four erasure probabilities, 1−0.0216=0.9784, and therefore does not depend on the ordering.
The encoder chooses the two indices with the highest \(I\left (W_{N}^{(i)} \right)\) values: A={3,4}, and the achievable sum is 1.67 [bits], which is an enhancement of about 8.7% compared to the previous ordering (1.5368 versus 1.6700 before rounding).
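The numbers in Example 1 can be recomputed directly from f(α,β)=α+β−αβ and g(α,β)=αβ. The following sketch writes out the two recursion levels for N=4 explicitly; the function names are hypothetical.

```python
def f(a, b):
    """Erasure probability of the degraded combination of two BECs."""
    return a + b - a * b


def g(a, b):
    """Erasure probability of the upgraded combination of two BECs."""
    return a * b


def bit_capacities_n4(eps):
    """Capacities of the four bit channels for BECs eps = (e1, e2, e3, e4)."""
    e1, e2, e3, e4 = eps
    # First level: channels (1, 2) and channels (3, 4) are combined in pairs.
    a_minus, a_plus = f(e1, e2), g(e1, e2)
    b_minus, b_plus = f(e3, e4), g(e3, e4)
    # Second level: like-indexed outputs of the two pairs are combined.
    z = [f(a_minus, b_minus), g(a_minus, b_minus),
         f(a_plus, b_plus), g(a_plus, b_plus)]
    return [1.0 - v for v in z]


def best_two_sum(capacities):
    """Sum of the two largest bit-channel capacities."""
    return sum(sorted(capacities)[-2:])


original = bit_capacities_n4([0.1, 0.4, 0.6, 0.9])
interleaved = bit_capacities_n4([0.6, 0.4, 0.9, 0.1])
print("original order:   ", [round(c, 4) for c in original])
print("interleaved order:", [round(c, 4) for c in interleaved])
print("best-two sums:", best_two_sum(original), best_two_sum(interleaved))
```

The last bit channel has erasure probability equal to the product of all four transmit erasure probabilities, so its capacity is the same for every ordering; the gain from interleaving comes from the second-best bit channel.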
Equivalently, the mapping can be interpreted as a channel permutation such that \(Q:\{W_{(i)}\}\mapsto \{W'_{(i)}\}\). We discuss two methods that search for such a mapping Q.
2.6.1 Exhaustive search method with grouping
The simplest and most naive approach involves testing every possible ordering of the N channels and selecting the best one, which requires checking N! cases. However, the orderings can be partitioned into equivalence groups, since all orderings in a group yield the same bit-channel qualities. Owing to the recursive channel-evolution structure, the size of each group is 2^{ N−1}; hence, the required number of tests is \(\frac {N!}{2^{N-1}}\). The detailed proof is shown in the Appendix. This grouping technique considerably reduces the computational burden: for N=8, we need to test 315 representative orderings instead of N!=40,320. However, even the reduced test set grows beyond what can be computed for practical code lengths N.
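The grouping argument can be illustrated by brute force for small N. The sketch below, restricted to BECs with distinct erasure probabilities, enumerates all orderings, groups those that produce identical bit-channel erasure probabilities, and picks the group with the largest sum of the k best capacities. The helper names are hypothetical, and the enumeration is only feasible for very short codes.

```python
from itertools import permutations
from math import factorial


def bit_channel_erasures(eps):
    """Bit-channel erasure probabilities for parallel BECs in the given order."""
    n = len(eps)
    if n == 1:
        return list(eps)
    first = bit_channel_erasures(eps[:n // 2])
    second = bit_channel_erasures(eps[n // 2:])
    out = []
    for a, b in zip(first, second):
        out.extend((a + b - a * b, a * b))
    return out


def group_orderings(eps):
    """Map each distinct bit-channel profile to the orderings that produce it."""
    groups = {}
    for order in permutations(eps):
        # Rounding guards against floating-point noise in the grouping key.
        key = tuple(round(z, 12) for z in bit_channel_erasures(order))
        groups.setdefault(key, []).append(order)
    return groups


def best_ordering(eps, k):
    """Representative ordering maximizing the sum of the k best capacities."""
    groups = group_orderings(eps)

    def rate(key):
        return sum(sorted(1.0 - z for z in key)[-k:])

    best_key = max(groups, key=rate)
    return groups[best_key][0], rate(best_key)


eps = (0.1, 0.4, 0.6, 0.9)          # erasure probabilities from Example 1
groups = group_orderings(eps)
print("orderings:", factorial(len(eps)), "groups:", len(groups))
print("best ordering and rate:", best_ordering(eps, 2))
```

For N=4 each group corresponds to one way of splitting the four channels into two pairs, which is why only three representatives need to be tested.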
2.6.2 Heuristic method
 1. Sort the transmit channels W _{(i)},i∈ [ 1,N] in ascending order of the capacity I(W _{(i)}).
 2. Make \(\frac {N}{2}\) pairs: the ith pair includes the ith smallest transmit channel and the ith largest transmit channel.
 3. Using the indices of the \(\frac {N}{2}\) pairs, \(\left [1: \frac {N}{2}\right ]\), repeat the second procedure until the size of the index set becomes 4.
where \(\mathcal {I}_{2}\) is the 2×2 identity matrix and ⊗ means the Kronecker product.
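The first two steps of the heuristic can be sketched as below. Only the sorting and the weakest-with-strongest pairing are shown; the repeated pairing of step 3 and the resulting permutation matrix are not reproduced, since their exact form cannot be recovered from this text. The function names and the flattening into a channel order are hypothetical.

```python
def pair_weak_with_strong(capacities):
    """Pair the i-th smallest-capacity channel with the i-th largest one.

    Returns a list of (weak_index, strong_index) tuples with 0-based
    channel indices.
    """
    order = sorted(range(len(capacities)), key=lambda i: capacities[i])
    half = len(order) // 2
    return [(order[i], order[-1 - i]) for i in range(half)]


def flatten_pairs(pairs):
    """Place each pair on adjacent positions of the interleaved order."""
    return [index for pair in pairs for index in pair]


# Capacities 1 - eps for the BECs of Example 1 (eps = 0.1, 0.4, 0.6, 0.9).
capacities = [0.9, 0.6, 0.4, 0.1]
pairs = pair_weak_with_strong(capacities)
print("pairs:", pairs)
print("interleaved channel order:", flatten_pairs(pairs))
```

For the four channels of Example 1, this pairing groups the erasure probabilities as {0.9, 0.1} and {0.6, 0.4}, which is the same pairing as the interleaved ordering used in that example.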
From the figure, the proposed interleaving algorithm achieves a better rate than the other algorithms. It is also observed that when the W _{(i)} form an ordered set, the rate converges to the capacity more slowly than with the other algorithms. Hence, if the symmetric capacities of the exposed channels are ordered in either direction, they should be rearranged through Q.
3 Nonidentical binary erasure channels with random erasure probabilities with single distribution
In previous studies, [8] and [3], it is assumed that the characteristics of underlying discrete memoryless channels are fully exposed to the transceiver; thus, the encoder and the decoder exploit this information. Under this condition, it is proved that polar codes can achieve the symmetric capacity.
In this section, we assume that channel parameters are not deterministic, but are realizations of a random variable. For BEC, the channel transition probability of a transmit channel \(W_{(i)}:\mathcal {X}_{i}\mapsto \mathcal {Y}_{i}\) is fully described by the erasure probability ε _{ i }. Hence, the channel features of the nonidentically distributed parallel channels model are perfectly represented via the set of erasure probabilities \(\left \{ {\epsilon _{1}^{N}} \right \}\).
The existence of random erasure probabilities means that each ε _{ i } is the realization of the random variable θ such that ε _{ i }∼f _{ θ }(ε _{ i }),∀i∈ [ 1,N], where f _{ θ } is a stationary probability distribution function.
We now assume that the realized set of erasure probabilities \(\left \{ {\epsilon _{1}^{N}} \right \}\) is exposed to neither the encoder nor the decoder. In this case, the only information available to the encoder and the decoder is the set of moments of the given distribution f _{ θ }. We prove that under this scenario, polar codes can achieve the symmetric capacity.
Theorem 2
where the symmetric capacity I _{ s } is defined as an average of the individual transmit channel’s capacities: \(I_{s}(W_{N}) =\frac {1}{N}\sum _{i=1}^{N} I\left (W_{(i)} \right)\), where \(W_{N}: {X_{1}^{N}} \mapsto {Y_{1}^{N}}\).
3.1 Proof of Theorem 2
By the law of large numbers, the empirical channel behavior for a codeword can be described by the first moment ε _{ m } of the distribution f _{ θ }. Also, as previously mentioned, since the transceiver is oblivious to the exact set of erasure probabilities of the underlying parallel channels, it has no choice but to use ε _{ m } to construct codewords.

The existence of polar codes over BEC with erasure probability ε _{ m }

ε ^{′}≥ε _{ m } for all distributions f _{ θ }
This completes the proof of Theorem 2.
3.2 Achievable polar coding scheme
According to Theorem 2, the construction of the capacity achieving polar coding scheme is straightforward. First, given the distribution f _{ θ }, the encoder calculates the first moment \(\mathbb {E}[\epsilon ]=\epsilon _{m}\). It then constructs the message vector \({u_{1}^{N}}\) by determining the information index set A _{ N } with the predefined frozen bits u _{ F }. This message sequence is encoded through the generator matrix G _{ N } and is transmitted through the nonidentically distributed parallel BECs of \(\left \{ {\epsilon _{1}^{N}} \right \}\). The procedure is summarized in Algorithm 2.
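The construction described above can be sketched as follows: the design treats every transmit channel as a BEC with erasure probability ε_m and selects the information set from the usual identical-BEC recursion. This is a minimal sketch under assumptions: the distribution f_θ is taken to be discrete (a hypothetical list of values and probabilities), the information set size is chosen as ⌊N(1−ε_m)⌋, and indices are 0-based.

```python
import math


def first_moment(values, probs):
    """First moment of a discrete erasure-probability distribution."""
    return sum(v * p for v, p in zip(values, probs))


def bec_bit_channel_erasures(eps, n):
    """Erasure probabilities of the 2^n bit channels of identical BEC(eps)."""
    z = [eps]
    for _ in range(n):
        # Each bit channel splits into a degraded and an upgraded child.
        z = [child for a in z for child in (2.0 * a - a * a, a * a)]
    return z


def design_information_set(eps_m, n):
    """Information index set A_N designed for a BEC with erasure prob eps_m."""
    N = 2 ** n
    z = bec_bit_channel_erasures(eps_m, n)
    k = math.floor(N * (1.0 - eps_m) + 1e-9)   # assumed size of A_N
    order = sorted(range(N), key=lambda i: z[i])
    return sorted(order[:k])


# Hypothetical distribution f_theta over three erasure probabilities.
values = [0.2, 0.5, 0.8]
probs = [0.25, 0.5, 0.25]
eps_m = first_moment(values, probs)
print("first moment:", eps_m)
print("information set for N = 8:", design_information_set(0.5, 3))
```

At finite N a practical design would back the rate off from 1−ε_m; the floor rule above is only meant to mirror the asymptotic statement of the theorem.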
It should be noted that to achieve the symmetric capacity under nonidentically distributed parallel BECs, with unknown channel parameters, the only constraint required is the code length N→∞.
4 Random erasure probabilities with nonidentical distributions
In this section, we consider the case of N nonidentically distributed BECs W _{(i)}, i∈ [ 1,N], in which the distribution \(f_{\theta _{i}}\) of the erasure probability ε _{ i } of each transmit channel W _{(i)} may differ across channel indices. Thus, this scenario includes the previous scenarios as special cases.
A simple coding scheme that is able to achieve the symmetric capacity is the conveyance of multiple codewords as a group for each decoding stage. In each stage, we exploit the set of parallel nonidentical channels L=2^{ l } times. The reason for this power of 2 format is to match with the length of polar codewords.
Let W _{(i)}(j) denote the jth access to the channel W _{(i)}, let ε _{ i }(j) denote the instantaneous erasure probability of the channel W _{(i)}(j), which follows the distribution \(f_{\theta _{i}}\), and let I _{ s,i }(W _{(i)}) denote the symmetric capacity of W _{(i)} obtained by accessing it L times.
Ergodic behaviors of instantaneous capacities
Channel | Access 1 | Access 2 | ⋯ | Access L | Average
1 | I(W _{(1)}(1)) | I(W _{(1)}(2)) | ⋯ | I(W _{(1)}(L)) | \(\bar {I}(W_{(1)})\)
2 | I(W _{(2)}(1)) | I(W _{(2)}(2)) | ⋯ | I(W _{(2)}(L)) | \(\bar {I}(W_{(2)})\)
i | I(W _{(i)}(1)) | I(W _{(i)}(2)) | ⋯ | I(W _{(i)}(L)) | \(\bar {I}(W_{(i)})\)
N | I(W _{(N)}(1)) | I(W _{(N)}(2)) | ⋯ | I(W _{(N)}(L)) | \(\bar {I}(W_{(N)})\)
where \(\epsilon _{m_{i}}\) is the first moment of \(\epsilon _{i} \sim f_{\theta _{i}}\). Note that \({\lim }_{L\rightarrow \infty }I_{s}\left (W^{L}_{[N]} \right)=\frac {1}{N}\sum _{i=1}^{N} (1-\epsilon _{m_{i}})\). The equality holds because the symmetric capacity of a BEC is an affine function of the erasure probability. We now consider two cases. In the first case, we assume that the encoder can adapt to various code lengths. This means that the encoder can construct generator matrices G _{ N } for any exponent n (N=2^{ n }). In the second case, the coding structure is fixed; thus, the parameter N (and the corresponding G _{ N }) cannot be changed. In the first case, let the encoder use any exponent l and construct an L×L generator matrix G _{ L }, where L=2^{ l }. The following proposition is then satisfied:
Proposition 3
For a set of nonidentically distributed BECs {W _{(i)}} with a set of random erasure parameters {ε _{ i }},i∈[1,N], where each ε _{ i } follows its own distribution \(f_{\theta _{i}}\), the symmetric capacity \(I_{s}\left (W^{L}_{[N]} \right)\) is achievable by exploiting multiple streams of polar codewords.
for an arbitrarily small δ>0 and for all channel indices i∈ [ 1,N]. Therefore, Proposition 3 is true for any set of distributions \(\left \{ f_{\theta _{i}} \right \}\).
Achievable coding scheme
The receiver saves L output vectors in a matrix Y and produces estimates by applying the SC decoder row by row. This procedure is summarized in Algorithm 3. Note that there are no constraints on N. Since the proposed coding scheme is not affected by N, it is robust to the deletion of some transmit channels. Assume that a set of channels W _{ J }, where J is a subset of [1,N], is lost, in the sense that the corresponding symmetric capacities are all zero. The encoder and the decoder then simply decrease N to N−|J|, and the data is transmitted through W _{[1,N]∖J }. Note that the symmetric capacity \(I_{s}\left (W^{L}_{[N]\setminus J} \right)= \sum _{\forall i\in [1,N]\setminus J} I(W_{(i)})\) is still achievable.
For the symmetric capacity, \(I_{s}\left (W^{L}_{[N]} \right)\) is the same as the arithmetic mean of its parts \(\{ \bar {I}(W_{(i)}) \mid \forall i\in [1,N] \}\): \(I_{s}\left (W^{L}_{[N]}\right)=\frac {1}{N} \sum _{i=1}^{N} \bar {I}(W_{\Lambda _{i}})\). We can conclude that it is achievable with the proposed scheme. The complexity of this polar coding scheme is O(NL logL), since it is a concatenation of N SC decoders of length L.
In addition, consider a transmitter in which the encoding structure cannot be changed. Then, the only choice is to use the fixed-size N×N generator matrix G _{ N }, where N=2^{ n }, so that all produced codewords have the same length N. We can treat this problem by setting L to be identical to N. The encoder defines the collection of information index sets {A _{ i }} from \(\{ \epsilon _{m_{i}} \}\), where \(|A_{i}|=\lfloor N\cdot I(\epsilon _{m_{i}})\rfloor \). Using a common generator matrix G _{ N }, the encoder sequentially produces N polar codewords \({x_{1}^{N}}(i)={u_{1}^{N}}(i)\cdot G_{N}\) for i∈ [ 1,N]. These codewords \(\left \{ {x_{1}^{N}}(i) \right \}\) are stacked as the rows of the matrix X. The complexity of this polar coding scheme is O(N ^{2} logN), since it is a concatenation of N SC decoders of length N.
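The rate bookkeeping of the multi-stream scheme can be sketched as follows. Each transmit channel i carries its own length-L polar codeword whose information set size is taken as ⌊L(1−ε_{m_i})⌋, and the aggregate rate is compared with the average capacity. The function names and the example first moments are hypothetical, and the sketch only counts information bits; it does not model finite-length decoding errors.

```python
import math


def stream_information_sizes(eps_means, L):
    """Information set size |A_i| for each of the N per-channel codewords."""
    return [math.floor(L * (1.0 - e) + 1e-9) for e in eps_means]


def aggregate_rate(eps_means, L):
    """Information bits per channel use over all N streams of length L."""
    sizes = stream_information_sizes(eps_means, L)
    return sum(sizes) / (len(eps_means) * L)


def average_capacity(eps_means):
    """Average of the BEC capacities 1 - eps_m_i over the N channels."""
    return sum(1.0 - e for e in eps_means) / len(eps_means)


# Hypothetical first moments of four nonidentical distributions f_theta_i.
eps_means = [0.1, 0.5, 0.75, 0.25]
for L in (8, 64, 1024):
    print(L, stream_information_sizes(eps_means, L), aggregate_rate(eps_means, L))
print("average capacity:", average_capacity(eps_means))
```

The flooring loses less than one information bit per stream, so the gap between the aggregate rate and the average capacity is below 1/L; setting L=N gives the fixed-structure variant.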
5 Polarizations on nonindependent channels
where \(N'=\frac {N}{2}\).
In a q-ary representation, (42) still holds, where the input alphabet cardinality is q=2^{ r }. If these relabeled independent q-ary DMCs are identically distributed, such that \(W^{\prime }_{(i)} = W'_{(j)}\) for ∀i,j∈ [ 1,N ^{′}], it is proved in [11] that polar codes for q-ary input DMCs achieve the symmetric channel capacity when q=2^{ r } by exploiting the same kernel as in [1]. Therefore, the following proposition holds for a general q-ary DMC W ^{′}.
Proposition 4
There exist polar codes for nonindependent DMCs W ^{′} that achieve the symmetric capacity I(W ^{′}) as N ^{′}→∞ through powers of 2, by relabeling each r-bit binary sequence as a q-ary (q=2^{ r }) symbol.
Proof
The proof follows the proof of the existence of polar codes in [11], where the authors proved that polar codes achieve the symmetric capacity of any discrete memoryless channel with a q-ary input alphabet when q is a power of 2. In our model, grouping the binary inputs into blocks of the same length and relabeling each block as a single symbol maps the model to an equivalent q-ary alphabet model. No information is lost, because these operations are deterministic and invertible. We can then apply the same proof as in [11]; hence, achievability still holds in this partially nonindependent channel model. □
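The relabeling step used in the proof can be sketched in Python as follows. The function names and the most-significant-bit-first convention are assumptions made for illustration; the paper does not fix a particular labeling. The point of the sketch is only that the map between r-bit blocks and q-ary symbols is a bijection, so it can be undone without loss.

```python
def bits_to_symbols(bits, r):
    """Group a binary sequence into r-bit blocks and relabel each block
    as one symbol in {0, ..., 2**r - 1} (most significant bit first)."""
    if len(bits) % r != 0:
        raise ValueError("sequence length must be a multiple of r")
    symbols = []
    for start in range(0, len(bits), r):
        value = 0
        for b in bits[start:start + r]:
            value = (value << 1) | b
        symbols.append(value)
    return symbols


def symbols_to_bits(symbols, r):
    """Inverse relabeling: expand each q-ary symbol back into r bits."""
    bits = []
    for s in symbols:
        bits.extend((s >> shift) & 1 for shift in range(r - 1, -1, -1))
    return bits


bits = [1, 0, 1, 1, 0, 0, 0, 1]
symbols = bits_to_symbols(bits, r=2)      # q = 4
recovered = symbols_to_bits(symbols, r=2)
```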
6 Conclusions
In Section 2, we proved that for deterministic CPs in nonidentical channel models, polar codes can achieve the sample mean of the bit channel capacities. In addition, we provided an example that demonstrates the importance of a proper channel interleaver Q for enhancing the achievable rate, and we proposed a heuristic mapping algorithm.
In Sections 3 and 4, the key contribution is a new system model in which the transmitter and the receiver know only the distribution of the channel parameter and not the channel parameter itself. If the underlying channel type is BEC, the coding scheme becomes simpler. For a BEC with erasure probability ε, the symmetric capacity I(ε)=1−ε is an affine function of ε. Hence, we have the relation \(E[I(\varepsilon)]=I(\overline {\varepsilon })\), where \(\overline {\varepsilon }\) is the expectation of the random variable ε∼f _{ ε }(ε).
By applying multiple streams of polar codewords, we prove that the average capacity of BECs under our scenarios is achievable. However, this is obtained at the cost of latency and complexity, since the schemes stack multiple blocks during encoding and decoding. Hence, these schemes might not be suitable for systems that require low latency or low complexity. They are more practical in storage systems such as flash memory devices, where throughput is more important than latency. In particular, for flash memories, statistical responses such as the threshold voltage change with time and with the number of accesses to a cell block. Hence, as the storage capacity increases, it becomes inefficient for a storage controller to determine the exact state of every block or cell. If statistics on these changes are given instead, the cells can be managed more efficiently using the proposed polar coding scheme.
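The affine relation can be checked numerically with a small Python sketch. The discrete distribution below is a hypothetical example chosen only for illustration, not one used in the paper; exact rational arithmetic is used so that the two sides can be compared without rounding.

```python
from fractions import Fraction


def bec_capacity(eps):
    """Symmetric capacity of a BEC with erasure probability eps."""
    return 1 - eps


# Hypothetical distribution of the erasure probability: value -> probability.
dist = {
    Fraction(1, 10): Fraction(1, 2),
    Fraction(3, 10): Fraction(1, 4),
    Fraction(1, 2): Fraction(1, 4),
}

mean_eps = sum(eps * p for eps, p in dist.items())
mean_capacity = sum(bec_capacity(eps) * p for eps, p in dist.items())

# Because the capacity is affine in eps, averaging the capacity and
# evaluating the capacity at the average erasure probability agree.
same = (mean_capacity == bec_capacity(mean_eps))
```

The equality relies only on the capacity being affine in ε, so it would not carry over in this form to channel families whose capacity is a nonlinear function of the parameter.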
In addition, in the case of parallel channels, where statistically different random disturbances exist across channels, it is difficult to track all the channel parameters. However, if their statistics are known to the transmitter and the receiver, we can deliver data up to the average capacity through polar codes by sacrificing latency. In such cases, polar codes are a promising option that maximizes the throughput.
Under the nonindependent channel scenario, we assume that the N transmit channels are grouped into channels with size r which is a power of 2, so that we can treat the scenario as a nonbinary system. If N is not divisible by r (N mod r≠0), puncturing may be used to fit the system into a q-ary system [12]. The proposed polar codes appear to be promising for applications where only the knowledge of the channel parameter distribution is available and can be practical for storage applications such as flash memory devices.
7 Appendix
7.1 Proof of (22)
Proof
Therefore, \(Z\left (W_{2}^{(1)}\right) \leq Z\left (W_{(1)} \right)+Z\left (W_{(2)}\right)-Z\left (W_{(1)}\right)Z\left (W_{(2)}\right)\) is satisfied for any binary input channel parameters. □
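As an illustration of this bound, the following Python sketch evaluates one polarization step for two nonidentical BECs. It relies on the standard facts, stated here as assumptions rather than taken from the proof above, that the Bhattacharyya parameter of a BEC equals its erasure probability, that the bound holds with equality for BECs, and that the second synthesized channel has parameter Z(W_(1))Z(W_(2)). The function name `polarize_bec_pair` is hypothetical.

```python
from fractions import Fraction


def polarize_bec_pair(z1, z2):
    """One polarization step for two BECs with erasure probabilities z1, z2.

    Returns (z_minus, z_plus): the parameters of the degraded and the
    upgraded synthesized channels.
    """
    z_minus = z1 + z2 - z1 * z2   # upper bound of the appendix, tight for BECs
    z_plus = z1 * z2              # assumed standard BEC relation
    return z_minus, z_plus


z1, z2 = Fraction(1, 4), Fraction(1, 2)
z_minus, z_plus = polarize_bec_pair(z1, z2)

# Capacities before and after the step (BEC capacity is 1 - erasure probability).
capacity_before = (1 - z1) + (1 - z2)
capacity_after = (1 - z_minus) + (1 - z_plus)
```

In this example the sum of capacities is unchanged by the step, while the two synthesized parameters move apart from the original pair.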
7.2 Proof of (23)
Proof
The relation can be verified simply by subtracting each of the right-hand terms from the left-hand term.
7.3 Proof of (24)
Proof
7.4 Proof of the number of equivalent channel combinations
and it was proved that polarizations occur in nonidentically distributed channels.
By solving this recurrence formula, we can obtain χ _{ n }=N−1. Recalling that \(\mathcal {H}\) is invariant to S, which indicates that there are two cases for each \(\mathcal {H}\), the number of combinations of the BDMCs that result in the same information set of polar codes is 2^{ N−1}. Hence, the number of representative combinations that may have different information sets for length N parallel polar coding systems is \(\frac {N!}{2^{N-1}}\).
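To give a sense of how this count grows, the following Python sketch evaluates N!/2^{N−1} for small block lengths N=2^{ n }. The function name is hypothetical, and the sketch only evaluates the closed-form expression above; it does not enumerate the channel combinations themselves.

```python
from math import factorial


def representative_combinations(n):
    """Evaluate N! / 2**(N - 1) for N = 2**n."""
    N = 2 ** n
    total = factorial(N)          # all orderings of the N channels
    equivalent = 2 ** (N - 1)     # orderings sharing one information set
    # For N a power of 2, the largest power of 2 dividing N! is 2**(N - 1),
    # so the division is exact.
    assert total % equivalent == 0
    return total // equivalent


counts = {2 ** n: representative_combinations(n) for n in range(1, 5)}
```

Even with the reduction by 2^{ N−1}, the count grows quickly with N, which is consistent with the use of a heuristic rather than an exhaustive search for the interleaver.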
Declarations
Acknowledgements
This research was supported in part by the Basic Science Research Programs (NRF2013R1A1A2008956 and NRF2015R1A2A1A15052493) through NRF funded by MOE and MSIP, the ICT R&D Program of MSIP/IITP (B0717160023), the Technology Innovation Program (10051928) funded by MOTIE, the BioMimetic Robot Research Center funded by DAPA (UD130070ID), INMC, and BK21plus.
Authors’ contributions
JK and JL prove the achievability of polar codes in nonidentically distributed channels and propose the channel interleaver algorithm to enhance the system reliability. Both authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 E Arikan, Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Trans. Inform. Theory. 55(7), 3051–3073 (2009).
 M Alsan, Conditions for robustness of polar codes in the presence of channel mismatch (2013). eprint arXiv:1303.2379v2.
 M Alsan, E Telatar, in IEEE International Symposium on Information Theory (ISIT) 2014. A simple proof of polarization and polarization for nonstationary channels (Honolulu, 2014), pp. 301–305.
 D Blackwell, L Breiman, AJ Thomasian, The capacity of a class of channels. Ann. Math. Stat. 3(4), 1229–1241 (1959).
 SH Hassani, R Urbanke, in IEEE International Symposium on Information Theory (ISIT) 2014. Universal polar codes (Honolulu, 2014), pp. 1451–1455.
 E Sasoglu, L Wang, in IEEE International Symposium on Information Theory (ISIT) 2014. Universal polarization (Honolulu, 2014), pp. 1456–1460.
 E Hof, I Sason, S Shamai, in IEEE 26th Convention of Electrical and Electronics Engineers in Israel (IEEEI). Polar coding for degraded and nondegraded parallel channels (Eilat, 2010), pp. 550–554.
 E Hof, I Sason, S Shamai, C Tian, Capacity-achieving codes for arbitrarily permuted parallel channels. IEEE Trans. Inform. Theory. 59(3), 1505–1516 (2013).
 R Mori, T Tanaka, Performance of polar codes with the construction using density evolution. IEEE Commun. Lett. 13(7), 519–521 (2009).
 E Sasoglu, in IEEE International Symposium on Information Theory Proceedings (ISIT) 2012. Polar codes for discrete alphabets (Cambridge, 2012), pp. 2137–2141.
 W Park, A Barg, Polar codes for q-ary channels, q=2^{ r }. IEEE Trans. Inform. Theory. 59(2), 955–969 (2013).
 A Eslami, H Pishro-Nik, in IEEE International Symposium on Information Theory Proceedings (ISIT) 2011. A practical approach to polar codes (St. Petersburg, 2011), pp. 16–20.