Blind identification of code word length for nonbinary error-correcting codes in noisy transmission
EURASIP Journal on Wireless Communications and Networking volume 2015, Article number: 43 (2015)
Abstract
In a cognitive radio context, the parameters of coding schemes are unknown at the receiver. The design of an intelligent receiver is then essential to blindly identify these parameters from the received data. The blind identification of code word length has already been extensively studied in the case of binary error-correcting codes. Here, we are interested in nonbinary codes where a noisy transmission environment is considered. To deal with the blind identification problem of code word length, we propose a technique based on the Gauss-Jordan elimination in GF(q) (Galois field), with q=2^{m}, where m is the number of bits per symbol. The elimination is applied to matrices built from the received symbols, and the proposed technique relies on the information provided by the arithmetic mean of the number of zeros in each column of the resulting matrices. The robustness of our technique is studied for different code parameters and over different Galois fields.
Introduction
Error-correcting codes are frequently used in modern digital transmission systems in order to improve the communication quality. These codes are designed to achieve a good immunity against channel impairments by introducing redundancy in the informative data. Due to the complexity of both the encoding and, especially, the decoding procedures, most research and practical implementations of real-time embedded systems were long restricted to encoders manipulating binary data, i.e., elements of the Galois field GF(2). Over the last decade, low-density parity-check (LDPC) codes and turbo codes over GF(2) have attracted considerable interest from many researchers due to their excellent error correction capability. They have been generalized to finite fields GF(q) [1,2], where q=2^{m}, and are among the most widely used error-correcting codes in wireless communication standards. It has been shown in [1] that nonbinary LDPC codes generally perform better than binary LDPC codes and turbo codes. However, the major drawback of these codes is their decoding complexity for a large Galois field order q [3,4]. Low-complexity decoding algorithms have recently been proposed [5,6], thus allowing the use of nonbinary LDPC codes in practical implementations.
Our main research interest is the blind identification of the parameters of nonbinary error-correcting codes. This topic arises in noncooperative contexts such as military interception or cognitive radio applications. In this case, the receiver has no knowledge of the parameters used to encode the information at the transmitter. The solution is to design an intelligent receiver that is able to blindly identify the encoder parameters from the received data stream alone. This blind identification function makes it possible to increase the transmission data rate, since it becomes unnecessary to transmit supplementary information about the encoder parameters along with the useful data. Such an intelligent receiver can adapt itself automatically to new high-performance coding schemes and to the fast evolution of communication standards without any change of equipment. In this work, we are only interested in blindly identifying the code word length of linear nonbinary block codes. In the case of interception, this parameter cannot be transmitted. Likewise, if the encoder is changed or chosen outside the list of possible encoders, the code word length is not transmitted.
In this context, the published research results have so far been restricted to the blind recognition of the code word length of binary codes. To the best of our knowledge, this paper introduces, for the first time, an approach to blindly identify the code word length of nonbinary codes in noisy conditions, using only the received data. The authors in [7] proposed a technique for identifying nonbinary LDPC parameters, but the identification is not blind because it relies on a predefined candidate set of encoders known by both the transmitter and the receiver. Furthermore, that technique only works with LDPC codes, unlike our proposed technique, which is general and suitable for all block codes. The blind identification technique proposed in our paper is a generalization of an existing method used for binary codes. The principle of this generalization is explained in this paper without detailing its detection performance. We therefore first present the state-of-the-art techniques used to identify the code word length of binary linear block codes. The idea of these techniques is to find a basis of the dual code, composed of parity check relations. For this purpose, an approach based on finding code words of small Hamming weight [8,9] was improved by Valembois [10] by using statistical hypothesis tests and, more recently, by Cluzeau [11,12] and Côte [13]. A second approach based on linear algebra was introduced in [14] for a noiseless channel. This approach recovers the length of the code words by studying the behavior of the rank of matrices composed of received bits. However, the rank criterion was exploited without providing an algebraic and theoretical justification of such behavior. In [15], the use of this criterion was justified.
In [16], the rank criterion approach was generalized to convolutional codes over GF(q), where q>2, assuming a noiseless transmission, and it was shown that this generalized technique can also be applied to nonbinary linear block codes. In noisy transmissions, a technique based on the Gauss elimination in GF(2) was applied in [17-19] to matrices composed of noisy received bits in order to find the number of almost dependent columns, which permits the identification of the code word length in the case of binary error-correcting codes. Indeed, an almost dependent column of a matrix composed of noisy received symbols is a column that would be a linear combination of some preceding columns in the absence of erroneous symbols and that therefore contains more zero elements after the Gauss elimination.
Compared to previous works, we demonstrate here that it is possible to generalize the blind identification technique proposed in [17,18] to nonbinary block codes, provided that the Galois field parameters (the cardinality and the primitive polynomial) are known by the receiver. To identify the primitive polynomial, an identification algorithm was proposed in [20]. To achieve our purpose, it is necessary to identify the number of almost dependent columns in the matrices composed of noisy symbols of GF(q) by studying the probability of detection of these columns, denoted as P _{ i }. In fact, the computation of P _{ i } is essential in order to determine an optimal detection threshold. Assuming a transmission over a q-ary symmetric channel with an error probability p _{ e }, the techniques based on finding a basis of the dual code [18,19] for binary codes require the knowledge of p _{ e }, where a hard decision demodulation is considered. For this reason, we propose here an approach which is more robust because it allows the blind identification of the code word length of nonbinary and binary block codes without using the error probability p _{ e }. This approach is based on analyzing the behavior of the arithmetic mean of the number of zeros in the columns of the matrices constructed by the Gauss elimination in GF(q). The proposed method is general and can be applied to all nonbinary block codes, even though most examples of codes given here are nonbinary LDPC codes. For this reason, the properties of LDPC codes are not exploited by our method.
This paper is organized as follows. In the ‘Technical background’ section, we present the encoding process of nonbinary errorcorrecting codes. Then, the principle of the blind identification of code parameters in the noiseless case is described. The channel model used in this study is also defined and justified in this section. In the ‘Blind identification of code word length in the noisy case’ section, the blind identification method of the code word length in noisy environment is described. A comparison in terms of error probability and detection performances is shown in the ‘Analysis and performances’ section. Finally, some conclusions are drawn in the ‘Conclusions’ section and planned future work is pointed out.
Technical background
Nonbinary error-correcting codes
The use of an efficient coding scheme at the transmitter, such as an error-correcting code, is essential in order to fight the disturbances present on the transmission channel. For a long time, cyclic codes such as BCH codes [21,22] and Reed-Solomon codes [23] have been the most commonly used codes based on finite fields, since they are characterized by large minimum distances for hard decision decoding. The nonbinary LDPC codes, described by a sparse parity check matrix with elements in GF(q), were developed by Davey and MacKay in 1998 [1]. Significant works on the design and the decoding complexity reduction of these codes have shown that they have a great potential to replace Reed-Solomon codes in some communication applications, such as space communications [24] and storage systems [25,26]. In this paper, we focus on the blind identification of code word length for nonbinary block codes, but the proposed method can also be applied to convolutional codes and concatenated codes.
Let us present the encoding process of these codes over GF(q). The principle of a transmission chain is to send digital information from a source to one or more receivers. The information yielded by the source is binary data {0,1}= GF(2). Each block of m information bits is combined to generate a symbol of GF(q). Then, the generated nonbinary information, denoted as d, is encoded by one of the block codes over GF(q) listed above. For most block error-correcting codes, a code word, denoted as c, composed of nonbinary symbols is obtained by the multiplication of the information d and a nonbinary generator matrix G:
$$ \mathbf{c}=\mathbf{d}\cdot \mathbf{G} \tag{1} $$
In the case of LDPC codes, the encoding process needs the use of the parity check matrix, which is always sparse compared to the other codes.
In most standards, such as the long-term evolution (LTE) standard [27], the encoding is performed in a systematic form in order to facilitate the decoding process without degrading the error correction performance. For this reason, in the case of block codes, the parameters required to perform the decoding operation are the number of inputs, denoted as k, the code word length, denoted as n, and a parity check matrix, denoted as H. Indeed, the matrix H is used by the decoder to detect and/or correct the errors. The recovered information is given by the first k symbols of the recovered code word, due to the systematic form used in the encoding. Our aim in this research work is to blindly identify the parameter n from nonbinary received symbols which are affected by noisy transmissions. In the noiseless context, we have already demonstrated in [16] that this parameter can be identified from the received data alone, provided that the Galois field parameters are known. The principle of blind identification of the code parameter n in the noiseless case is recalled in the following subsection.
Principle of blind identification method of code word length in the noiseless case
In this part, we assume that the channel introduces no error. In [16], we have adapted the method proposed in [28] to identify the parameters of convolutional codes over GF(q), where q=2^{m}. We have shown that our method for the noiseless case can be applied to block codes. This method reshapes the received symbols, denoted as r, row-wise into a matrix, denoted as R _{ l }, of size (M×l). Indeed, R _{ l } is filled with received symbols from the top left corner to the bottom right as illustrated in Figure 1.
The number of columns l varies between 1 and l _{max}, and the number of rows M, which depends on l, is given by the integer part $\left \lfloor \frac {L}{l} \right \rfloor $ where L is the length of the received symbol stream. Then, the rank over GF(q) is calculated for each matrix R _{ l }. When all matrices R _{ l } have full rank, it is impossible to detect the existence of a code. Nevertheless, the redundancy introduced by the code leads to rank deficiencies in some matrices R _{ l }. Hence, the rank behavior of R _{ l } allows us to detect the code and to identify its parameters, in particular the code word length. As demonstrated in [15] and studied in [16], there are two possible rank behaviors according to the number of columns l. If l is a multiple of n (i.e., l=α·n, $\alpha \in \mathbb {N}$ ), the ranks of the matrices R _{ l } are proportional to the code rate k/n (i.e., rank(R _{ l })=l·k/n). Otherwise (i.e., l≠α·n), R _{ l } has full rank (i.e., rank(R _{ l })=l). Thus, the value of the rank deficiency depends on the code parameters (k and n). Indeed, only two consecutive rank deficiencies are necessary to determine all code parameters. The code word length n can be determined by the difference between two values of l corresponding to two consecutive rank deficiencies of R _{ l }. As shown in [16], the rank method gives good results in a noiseless environment. A theoretical and algebraic study of the behavior of the rank criterion, as well as particular cases which can occur for specific code parameters, was presented in [15]. It was demonstrated that most matrices R _{ l } have full rank when l is not a multiple of n, except for some particular cases which depend on the code (generator matrix). In a noisy environment, the rank method cannot be used, since all the matrices R _{ l } have full rank in this case.
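The rank criterion described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: symbols of GF(2^m) are stored as integers in the polynomial basis, the toy code is a random systematic (6,3) code over GF(4) with primitive polynomial x^2+x+1, and the helper names (`gf_mul`, `gf_inv`, `gf_rank`, `encode`, `reshape`, `rank_profile`) are hypothetical.

```python
import random

def gf_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m); poly is the primitive polynomial as a bit mask."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m:              # degree reached m: reduce modulo poly
            a ^= poly
    return result

def gf_inv(a, m, poly):
    """Inverse of a nonzero element, computed as a^(q-2)."""
    result, exponent = 1, (1 << m) - 2
    while exponent:
        if exponent & 1:
            result = gf_mul(result, a, m, poly)
        a = gf_mul(a, a, m, poly)
        exponent >>= 1
    return result

def gf_rank(matrix, m, poly):
    """Rank over GF(2^m), by Gauss-Jordan elimination on the rows."""
    rows = [row[:] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = gf_inv(rows[rank][col], m, poly)
        rows[rank] = [gf_mul(inv, x, m, poly) for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [x ^ gf_mul(f, p, m, poly)
                           for x, p in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def encode(d, P, m, poly):
    """Systematic encoding c = d.G with G = [I_k | P]."""
    parity = [0] * len(P[0])
    for dj, row in zip(d, P):
        for t, pjt in enumerate(row):
            parity[t] ^= gf_mul(dj, pjt, m, poly)
    return d + parity

def reshape(stream, l):
    """Fill R_l row-wise: M = floor(L / l) rows of l symbols each."""
    return [stream[r * l:(r + 1) * l] for r in range(len(stream) // l)]

def rank_profile(stream, l_max, m, poly):
    """Map each l in 1..l_max to rank(R_l) over GF(2^m)."""
    return {l: gf_rank(reshape(stream, l), m, poly) for l in range(1, l_max + 1)}

# Toy setting (assumed): GF(4) with x^2 + x + 1 and a random (n, k) = (6, 3) code.
m, poly, n, k = 2, 0b111, 6, 3
rng = random.Random(0)
P = [[rng.randrange(4) for _ in range(n - k)] for _ in range(k)]
stream = []
for _ in range(200):
    stream += encode([rng.randrange(4) for _ in range(k)], P, m, poly)
profile = rank_profile(stream, 2 * n, m, poly)
deficient = [l for l, r in profile.items() if r < l]   # candidate multiples of n
print(deficient)
```

In this sketch, the rank at l=n and l=2n is bounded by k and 2k, so these values of l always appear in `deficient`; for other values of l the matrices are expected, but not guaranteed, to have full rank, in line with the particular cases mentioned above.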
Nonbinary channel
In order to evaluate our blind identification algorithm, we assume that the encoded sequences are transmitted through a q-ary (nonbinary, for q=2^{m}>2) symmetric channel (QSC), which is the simplest channel. However, our proposed algorithms can work for every type of channel provided that the error probability p _{ e } computed at the output of the demodulator is known. Indeed, we consider that the blocks of the transmission chain (the modulator, the transmission channel, and the demodulator) can be modeled by a nonbinary channel, where a hard decision demodulation is considered. In a cognitive radio context, a multipath fading channel is used. This realistic channel leads to burst errors, which can be corrected by using an interleaver and error-correcting codes. In this context, the errors at the output of the deinterleaver at the receiver side can be modeled by a QSC when a hard decision decoding process is used. The problem of the blind identification of the interleaver period, as well as the blind synchronization with the interleaver blocks, was handled in [14,18].
Let us define the q-ary symmetric channel, which is the generalization of the binary symmetric channel (BSC). It is a discrete memoryless channel with an error probability p _{ e }, whose nonbinary inputs and outputs belong to GF(q), where q=2^{m}. The symbols at the input of the channel are independent and uniformly distributed with a probability equal to 1/q. A symbol δ∈ GF(q) at the channel input is replaced at the receiver by a given different symbol β of GF(q) with a probability p _{ e }/(q−1) [29]. The probability of correctly receiving a symbol is equal to 1−p _{ e }. The QSC channel is characterized by the conditional probabilities:
$$ Pr\left[\tilde{r}_{i}=\beta \mid r_{i}=\delta\right]=\begin{cases} 1-p_{e} & \text{if } \beta=\delta \\ \dfrac{p_{e}}{q-1} & \text{if } \beta\neq\delta \end{cases} \tag{2} $$
where the transmitted symbol is denoted r _{ i }, i.e., r _{ i }=c _{ i }, for i∈{1,⋯,L}, and the noisy received symbol is denoted $\tilde {r}_{i}$ such that $\tilde {r}_{i}=r_{i}+e_{i}$ with e _{ i } the transmission error introduced in the symbol r _{ i }. An example of a nonbinary symmetric channel for q=2^{2} is depicted in Figure 2.
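As a minimal sketch of this channel model, the following function draws the error e _{ i } and adds it to the transmitted symbol. It assumes that symbols of GF(2^m) are stored as integers 0..q−1 in the polynomial basis, so that addition in the field is a bitwise XOR; the function name `qsc` and the parameter values are illustrative choices, not taken from the paper.

```python
import random

def qsc(symbols, q, p_e, rng):
    """Pass symbols of GF(q), q = 2^m, through a q-ary symmetric channel."""
    received = []
    for r in symbols:
        if rng.random() < p_e:
            e = rng.randrange(1, q)   # nonzero error, uniform over the q - 1 values
        else:
            e = 0                     # symbol received correctly
        received.append(r ^ e)        # addition in GF(2^m) is a bitwise XOR
    return received

rng = random.Random(1)
sent = [rng.randrange(4) for _ in range(10000)]
got = qsc(sent, 4, 0.1, rng)
error_rate = sum(s != g for s, g in zip(sent, got)) / len(sent)
print(error_rate)   # empirical symbol error rate, expected to be close to p_e
```

Because a nonzero e is drawn uniformly, each of the q−1 wrong symbols is obtained with probability p _{ e }/(q−1), which matches the definition of the QSC given above.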
In the following section, we present the blind identification method of the parameter n in a noisy framework.
Blind identification of code word length in the noisy case
In this part, we present the method which allows us to identify the code word length of a nonbinary code in a noisy environment. This method is based on finding the rank-deficient matrices among $\tilde {\mathbf {R}}_{l}, \forall l\in \lbrace 1,\ldots,l_{\text {max}}\rbrace $ , i.e., the matrices having at least one almost dependent column. Indeed, the matrices $\tilde {\mathbf {R}}_{l}$ are reshaped in the same way as R _{ l } using the noisy received symbols $\tilde {r}_{i}$ . In [19], a method devoted to determining these matrices in the case of binary codes was presented. However, this method requires the knowledge of the error probability p _{ e }. In order to avoid this constraint, we propose a method based on the arithmetic mean criterion, which detects the rank-deficient matrices having some almost dependent columns without the need for the error probability p _{ e }.
Principle
In the noiseless case, the rank criterion is used to find the maximum number of linearly independent columns in the matrices R _{ l }. This allows us to derive the number of linearly dependent columns in R _{ l } (columns which are linear combinations of other columns). The finite-field Gauss elimination method [30] is used to reduce those linearly dependent columns to zero. In noisy transmissions, all matrices $\tilde {\mathbf {R}}_{l}$ have full rank. A matrix $\tilde {\mathbf {R}}_{l}$ can be expressed according to R _{ l } by:
$$ \tilde{\mathbf{R}}_{l}=\mathbf{R}_{l}+\mathbf{E}_{l} \tag{3} $$
where E _{ l } is the error matrix of size (M×l) constructed in the same way as R _{ l } using the errors induced by the channel. Therefore, the dependence of the columns is disturbed by the presence of errors in some received symbols. In such a context, the authors in [17,18] proposed to look for the number of almost dependent columns in the matrices composed of noisy received bits by using the Gauss elimination over GF(2). Inspired by this idea, it is sufficient, in the case of nonbinary error-correcting codes, to apply the finite-field Gauss elimination in GF(q) to $\tilde {\mathbf {R}}_{l}$ in order to obtain a new matrix $\tilde {\mathbf {T}}_{l}$ of size (M×l). This algorithm also outputs a matrix of size (l×l), denoted $\tilde {\mathbf {A}}_{l}$ , that describes the combination operations performed on the columns of the matrix $\tilde {\mathbf {R}}_{l}$ in order to obtain the transformed matrix $\tilde {\mathbf {T}}_{l}$ . The finite-field Gauss elimination over GF(q) is recalled in Algorithm 1. To describe this algorithm, we denote by I _{ l } the identity matrix of size (l×l), by $\mathbf {x}_{i}^{(l)}$ the ith column of a given matrix X _{ l }, and by $x_{i}^{(l)}(j)$ the coefficient of a matrix X _{ l } placed in the ith column and in the jth row.
By means of this algorithm, the linearly dependent columns in the matrix are reduced to zero. In our proposed method, the whole matrix $\tilde {\mathbf {R}}_{l}$ is considered instead of only its lower part as in [17]. This is more accurate than assuming that no errors occur in the upper part of the matrix, an assumption which does not hold in practice.
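Since Algorithm 1 is not reproduced in this text, the following is only a sketch of a column-wise Gauss elimination over GF(2^m), written under stated assumptions rather than a transcription of the authors' algorithm. Symbols are integers in the polynomial basis, pivots are searched row by row, and the names `column_gauss` and `zeros_per_column` are hypothetical. The function returns both the transformed matrix T and the matrix A that records the column operations, so that T = R·A.

```python
import random

def gf_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m); poly is the primitive polynomial as a bit mask."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m:
            a ^= poly
    return result

def gf_inv(a, m, poly):
    """Inverse of a nonzero element, computed as a^(q-2)."""
    result, exponent = 1, (1 << m) - 2
    while exponent:
        if exponent & 1:
            result = gf_mul(result, a, m, poly)
        a = gf_mul(a, a, m, poly)
        exponent >>= 1
    return result

def column_gauss(R, m, poly):
    """Return (T, A) with T = R.A over GF(2^m) and T in column echelon form."""
    M, l = len(R), len(R[0])
    T = [row[:] for row in R]
    A = [[int(i == j) for j in range(l)] for i in range(l)]   # starts as I_l
    c = 0                                        # index of the next pivot column
    for r in range(M):
        if c == l:
            break
        src = next((j for j in range(c, l) if T[r][j]), None)
        if src is None:
            continue                             # row r provides no new pivot
        for X in (T, A):                         # same column swap on T and A
            for row in X:
                row[c], row[src] = row[src], row[c]
        inv = gf_inv(T[r][c], m, poly)
        for j in range(c + 1, l):
            f = gf_mul(T[r][j], inv, m, poly)
            if f:
                for X in (T, A):                 # column j += f * column c
                    for row in X:
                        row[j] ^= gf_mul(f, row[c], m, poly)
        c += 1
    return T, A

def zeros_per_column(T):
    """Number of zero coefficients in each column of T."""
    return [sum(1 for row in T if row[i] == 0) for i in range(len(T[0]))]

# Noiseless toy matrix over GF(4): the 4th column equals column 1 + 2 * column 2.
m, poly = 2, 0b111
rng = random.Random(2)
R = []
for _ in range(30):
    a, b, c3 = (rng.randrange(4) for _ in range(3))
    R.append([a, b, c3, a ^ gf_mul(2, b, m, poly)])
T, A = column_gauss(R, m, poly)
B = zeros_per_column(T)
print(B)
```

In this noiseless toy case, at least one column of T is entirely zero, and the corresponding column of A gives the coefficients of the linear relation between the columns of R. With noisy symbols, the same column would only be mostly zero, which is the behavior exploited in the rest of this section.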
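We can note that the finite-field Gauss elimination over GF(q) can be defined by a linear map given by:
$$ \tilde{\mathbf{R}}_{l}\longmapsto \tilde{\mathbf{T}}_{l}=\tilde{\mathbf{R}}_{l}\cdot\tilde{\mathbf{A}}_{l} \tag{4} $$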
In noiseless transmissions, the number of dependent columns in R _{ l }, for l=α·n, $\alpha \in \mathbb {N}$ , corresponds to the number of zero columns in the matrix T _{ l }, which is the result of the transformation of R _{ l } by the finite-field Gauss elimination in GF(q) (R _{ l }·A _{ l }=T _{ l }). The matrix form of T _{ l } is described in Figure 3.
In fact, identifying the dimension of the vector space generated by a code $\mathcal {C}$ is equivalent to finding the dimension of the vector space generated by its dual code $\mathcal {C}^{\perp }$ . For any vector h belonging to $\mathcal {C}^{\perp }$ and for any code word r of $\mathcal {C}$ , the relation between both is defined by r·h ^{T}=0. In noiseless conditions, the matrix R _{ n }, for l=n, which is composed of M code words of length n, should satisfy:
$$ \mathbf{R}_{n}\cdot \mathbf{h}^{T}=\mathbf{0} \tag{5} $$
We can note that h belongs to the kernel of R _{ n }, denoted as ker(R _{ n }). So, we have $\mathcal {C}^{\perp }\subset \ker \left (\mathbf {R}_{n}\right)$ . Since the matrix R _{ l } multiplied by the columns $\mathbf {a}^{(l)}_{i} $ associated with its dependent columns yields the zero columns of the matrix T _{ l }, the corresponding columns $\mathbf {a}^{(n)}_{i} $ belong to ker(R _{ n }), in which the dual code $\mathcal {C}^{\perp }$ is contained. Therefore, finding the dependent columns in R _{ l } is equivalent to finding the columns $\mathbf {a}^{(l)}_{i} $ which belong to the dual code $\mathcal {C}^{\perp }$ .
Due to the presence of errors induced by the channel in $\tilde {\mathbf {R}}_{l}$ , for l=α·n, the columns of $\tilde {\mathbf {T}}_{l}$ corresponding to the almost dependent columns in $\tilde {\mathbf {R}}_{l}$ will contain some nonzero symbols. Assuming that the first l rows and the pivots of the matrix $\tilde {\mathbf {T}}_{l}$ do not contain transmission errors (so that $\tilde {\mathbf {A}}_{l}=\mathbf {A}_{l}$ ), using (3) and (4) allows us to write the matrix $\tilde {\mathbf {T}}_{l}$ as:
$$ \tilde{\mathbf{T}}_{l}=\left(\mathbf{R}_{l}+\mathbf{E}_{l}\right)\cdot\mathbf{A}_{l}=\mathbf{T}_{l}+\mathbf{E}_{l}\cdot\mathbf{A}_{l} \tag{6} $$
In this case, a vector h is a parity check relation (i.e., $\mathbf {h}\in \mathcal {C}^{\perp }$ ) with high probability if the relation $\tilde {\mathbf {R}}_{l}\cdot \mathbf {h}^{T}$ has a low Hamming weight [11]. However, the opposite is not necessarily true. We can conclude that $\tilde {\mathbf {a}}^{(l)}_{i}$ belongs to $\mathcal {C}^{\perp }$ if the corresponding $\tilde {\mathbf {t}}_{i}^{(l)}=\tilde {\mathbf {R}}_{l}\cdot \tilde {\mathbf {a}}^{(l)}_{i}$ has a small Hamming weight. In GF(q), the Hamming weight of a vector is the number of nonzero elements in this vector. So, our aim is to determine the columns $\tilde {\mathbf {t}}_{i}^{(l)}$ which have a high number of zeros. The idea is to study the number of zeros in the columns of the $\tilde {\mathbf {T}}_{l}$ in order to detect the almost dependent columns in $\tilde {\mathbf {R}}_{l}$ .
Behaviors of the number of zeros in the columns of $\tilde {\mathbf {T}}_{l}$
Let B _{ l }(i) be the number of zeros in the ith column of $\tilde {\mathbf {T}}_{l}$ , $\tilde {\mathbf {t}}^{(l)}_{i}$ . Hence, the variable B _{ l }(i) has two behaviors depending on whether the column $ \tilde {\mathbf {a}}_{i}^{(l)} $ belongs to the dual code $\mathcal {C}^{\perp }$ or not. This variable will be studied as a function of $ \tilde {\mathbf {a}}_{i}^{(l)} $ assuming that the bits that represent an element of the GF(q), where q=2^{m}, are uniformly distributed and independent from each other.

If the column $ \tilde {\mathbf {a}}_{i}^{(l)} $ does not belong to the dual code $\mathcal {C}^{\perp }$ , the variable B _{ l }(i), for all $i\in [\![1,l]\!]$ , will follow a binomial distribution of parameters M and 1/q, denoted as $\mathcal {B}(M,1/q)$ , with a mean equal to M/q.

If the column $ \tilde {\mathbf {a}}_{i}^{(l)} $ belongs to the dual code $\mathcal {C}^{\perp }$ , the variable B _{ l }(i) will follow a binomial distribution with parameters M and P _{ i }, denoted as $\mathcal {B}(M,P_{i})$ . The parameter P _{ i } corresponds to the probability that a coefficient $\tilde {t}_{i}^{(l)}(j)$ of the column $\tilde {\mathbf {t}}^{(l)}_{i}$ is equal to 0 $\left (\text {i.e.},P_{i}=Pr\left [\tilde {t}_{i}^{(l)}(j)=0\mid \tilde {\mathbf {a}}_{i}^{(l)} \in \mathcal {C}^{\perp }\right ]\right)$ .
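It is possible to separate the two behaviors of the variable B _{ l }(i) by computing an optimal threshold $\hat {\eta }_{\text {opt}}$ such that:
$$ \begin{cases} B_{l}(i)>\hat{\eta}_{\text{opt}} & \Rightarrow\ \tilde{\mathbf{a}}_{i}^{(l)}\in\mathcal{C}^{\perp} \\ B_{l}(i)\leq\hat{\eta}_{\text{opt}} & \Rightarrow\ \tilde{\mathbf{a}}_{i}^{(l)}\notin\mathcal{C}^{\perp} \end{cases} \tag{7} $$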
where $\hat {\eta }_{\text {opt}}=\frac {M}{q}\cdot \eta _{\text {opt}}$ is a real number in the interval [0,M]. The optimal threshold η _{opt} minimizes the probability of wrong detection of a column $\tilde {\mathbf {a}}_{i}^{(l)}\in \mathcal {C}^{\perp }$ , denoted as P _{wd}, which corresponds to the sum of the false alarm probability, denoted as P _{fa}, and the probability of not detecting a theoretical dependent column, denoted as P _{nd}. The optimal threshold is determined by:
$$ \eta_{\text{opt}}=\arg\min_{\eta}\,P_{\text{wd}}=\arg\min_{\eta}\left(P_{\text{fa}}+P_{\text{nd}}\right) \tag{8} $$
The normal distribution can be used to approximate the binomial probabilities of B _{ l }(i) when M is large:

If $\tilde {\mathbf {a}}_{i}^{(l)} \in \mathcal {C}^{\perp }$ :
$$ B_{l}(i)\rightarrow \mathcal{N}\left(\mu_{0},{\sigma_{0}^{2}}\right) \tag{9} $$
If $\tilde {\mathbf {a}}_{i}^{(l)} \notin \mathcal {C}^{\perp }$ :
$$ B_{l}(i)\rightarrow \mathcal{N}\left(\mu_{1},{\sigma_{1}^{2}}\right) \tag{10} $$
where $\mathcal {N}\left (\mu _{0},{\sigma _{0}^{2}}\right)$ is the normal distribution of parameters μ _{0}=M·P _{ i } and ${\sigma _{0}^{2}}=M\cdot P_{i} \cdot (1-P_{i})$ and $\mathcal {N}\left (\mu _{1},{\sigma _{1}^{2}}\right)$ corresponds to the normal distribution of parameters μ _{1}=M/q and ${\sigma _{1}^{2}}=M\cdot (q-1)/q^{2}$ .
Henceforth, the optimal value of the threshold $\hat {\eta }$ minimizing the probability of wrong detection P _{wd} can be computed by:
where ϕ(x) is the cumulative distribution function of the standard normal distribution:
$$ \phi(x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-t^{2}/2}\,dt \tag{12} $$
We can note that the optimal threshold $\hat {\eta }_{\text {opt}}$ depends on the parameters: M, q, and P _{ i }. So, in order to delimit the two behaviors of the variable B _{ l }(i), it is necessary to compute the probability P _{ i }.
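To make the dependence on M, q, and P _{ i } concrete, the sketch below evaluates P _{wd}=P _{fa}+P _{nd} under the normal approximations (9) and (10) and searches for the minimizing threshold on an integer grid. This is a numerical illustration under assumptions, not the closed-form expression of the paper: the grid search, the function names, and the value P _{ i }=0.2 are chosen here only for the example.

```python
import math

def phi(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def wrong_detection(eta_hat, M, q, P_i):
    """P_wd = P_fa + P_nd for a threshold eta_hat on the number of zeros B_l(i)."""
    mu0, var0 = M * P_i, M * P_i * (1.0 - P_i)     # column in the dual code
    mu1, var1 = M / q, M * (q - 1) / q ** 2        # column not in the dual code
    p_fa = phi(-(eta_hat - mu1) / math.sqrt(var1)) # Pr[B > eta_hat | not in dual]
    p_nd = phi((eta_hat - mu0) / math.sqrt(var0))  # Pr[B <= eta_hat | in dual]
    return p_fa + p_nd

def best_threshold(M, q, P_i):
    """Integer threshold in [0, M] minimizing the wrong detection probability."""
    return min(range(M + 1), key=lambda eta: wrong_detection(eta, M, q, P_i))

M, q, P_i = 2000, 8, 0.2        # P_i is an assumed value, with P_i > 1/q
eta_hat = best_threshold(M, q, P_i)
print(eta_hat, wrong_detection(eta_hat, M, q, P_i))
```

The minimizing threshold lies between the two means M/q and M·P _{ i }; as P _{ i } approaches 1/q, the two distributions overlap and the minimum of P _{wd} increases.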
Computation of the probability P _{ i }
In the case of binary codes, the probability P _{ i } has been calculated in [11]. But, it has never been studied in the general case of codes over GF(2^{m}). In fact, the computation of the parameter P _{ i } is essential in order to detect the almost dependent columns in $\tilde {\mathbf {R}}_{l}$ by delimiting the two behaviors of the variable B _{ l }(i). Our aim is to investigate this probability in the case of nonbinary codes. In the following, the theoretical study of P _{ i } is presented.
For l=n and i a position of a column $\tilde {\mathbf {a}}_{i}^{(l)} $ contained in $\mathcal {C}^{\perp }$ , a coefficient $\tilde {t}_{i}^{(l)}(j)$ of the column $\tilde {\mathbf {t}}_{i}^{(l)}$ can be obtained, using (6), by:
$$ \tilde{t}_{i}^{(l)}(j)=t_{i}^{(l)}(j)+\sum_{k=1}^{n}a_{i}^{(l)}(k)\cdot e_{k}^{(l)}(j) \tag{13} $$
where $t_{i}^{(l)}(j)=0$ in the case of noiseless transmissions as explained previously. Indeed, the sum $\sum _{k=1}^{n}a_{i}^{(l)}(k)\cdot e_{k}^{(l)}(j)$ is null in this case because $e_{k}^{(l)}(j)=0,\,\forall k\in \lbrace 1,\cdots,n\rbrace $ , and ∀j∈{1,⋯,M}. However, in the case of noisy transmissions, the coefficients $e_{i}^{(l)}(j)\in $ GF(q) correspond to the errors introduced by the noisy channel in the symbols $r_{i}^{(l)}(j)\in $ GF(q), which generate the noisy symbols $\tilde {r}_{i}^{(l)}(j)\in $ GF(q). Our aim is to determine P _{ i }, the probability of observing a zero coefficient in the column $\tilde {\mathbf {t}}_{i}^{(l)}$ , which corresponds to having $\sum _{k=1}^{n}a_{i}^{(l)}(k)\cdot e_{k}^{(l)}(j)=0$ :
$$ P_{i}=Pr\left[\sum_{k=1}^{n}a_{i}^{(l)}(k)\cdot e_{k}^{(l)}(j)=0\right] \tag{14} $$
Let N _{ i }(l) be the minimum number of linear combinations of columns required to obtain $\tilde {\mathbf {t}}_{i}^{(l)}$ . This number corresponds also to the Hamming weight of the column $\tilde {\mathbf {a}}_{i}^{(l)} $ . Then, there could be positions among N _{ i }(l) where $e_{i}^{(l)}(j)=0$ . Thus, P _{ i } can be defined as the probability of having $\sum _{k=1}^{s}a_{i}^{(l)}(k)\cdot e_{k}^{(l)}(j)=0$ such that s is the number of positions among N _{ i }(l) where $e_{i}^{(l)}(j)\neq 0$ :
where X is a random variable of the erroneous positions number among N _{ i }(l). Indeed, we show in Appendix that the probability P _{ i } of having $\tilde {t}_{i}^{(l)}(j)=0$ can be determined by:
In the case of GF(2) (i.e., q=2), this probability can be written as:
This expression corresponds to that used in [11].
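As a numerical illustration of this probability, the sketch below compares a standard closed form for the QSC with an exact computation. Both the closed form and its presentation are assumptions of this example and may differ from the expression derived in the Appendix: with w the Hamming weight of $\tilde {\mathbf {a}}_{i}^{(l)}$ , it reads 1/q+((q−1)/q)·(1−q·p _{ e }/(q−1))^{w}, which reduces to (1+(1−2p _{ e })^{w})/2 for q=2. The exact value is obtained by convolving the error distribution w times under XOR, using the fact that multiplying a QSC error by a nonzero coefficient does not change its distribution. The function names are hypothetical.

```python
def p_zero_closed_form(q, p_e, w):
    """Assumed closed form for Pr[sum of w weighted QSC errors = 0] in GF(q)."""
    return 1.0 / q + (q - 1.0) / q * (1.0 - q * p_e / (q - 1.0)) ** w

def p_zero_exact(q, p_e, w):
    """Same probability by w-fold convolution of the error law (q = 2^m).

    A nonzero coefficient a permutes the nonzero elements of GF(q), so a * e
    has the same distribution as e; addition in GF(2^m) is a bitwise XOR.
    """
    err = [1.0 - p_e] + [p_e / (q - 1)] * (q - 1)   # law of one error symbol
    dist = [1.0] + [0.0] * (q - 1)                  # law of the empty sum
    for _ in range(w):
        new = [0.0] * q
        for s, ps in enumerate(dist):
            for e, pe in enumerate(err):
                new[s ^ e] += ps * pe
        dist = new
    return dist[0]

q, p_e, w = 8, 0.01, 20
print(p_zero_closed_form(q, p_e, w), p_zero_exact(q, p_e, w))
```

Both computations depend on the column $\tilde {\mathbf {a}}_{i}^{(l)}$ only through its Hamming weight, and the probability decreases toward 1/q as the weight or p _{ e } grows, which is why sparse parity check relations are easier to detect.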
In Figure 4, we represent the wrong detection probability P _{wd} as a function of $\hat {\eta }/M$ and p _{ e } assuming q=2^{3}, $w\left (\tilde {\mathbf {a}}_{i}^{(l)}\right)=20$ and M=2,000. For each value of p _{ e }, the optimal threshold $\hat {\eta }_{\text {opt}}$ corresponding to a root of (11) is computed. From Figure 4, we can deduce that the threshold interval satisfying P _{wd}≈0 decreases when the value of p _{ e } increases.
We can conclude that studying the behaviors of B _{ l }(i) in order to identify n is based on the calculation of the optimal threshold $\hat {\eta }_{\text {opt}}$ . However, this threshold depends on the value of the error probability p _{ e } which is unknown for the receiver. So, the need to estimate this parameter is a blocking step in the almost dependent columns method and also leads to a lack of robustness.
In order to address these problems, we propose a new iterative method based on the arithmetic mean of the variable B _{ l }(i), which does not depend on p _{ e } and in which the iterative process improves the detection probability.
New iterative method based on the arithmetic mean of the variable B _{ l }(i)
In this part, the proposed method based on the arithmetic mean of the number of zeros in the columns of the matrix $\tilde {\mathbf {T}}_{l}$ is described. We recall that the Gauss elimination described in Algorithm 1 is applied in order to obtain $\tilde {\mathbf {T}}_{l}$ . We show here that the identification of the parameter n by our proposed method does not depend on the error probability p _{ e }. In order to improve the detection probability of n, an iterative process is introduced, following the idea proposed in [18,19]. The principle of this process is to perform random permutations on the rows of the matrix $\tilde {\mathbf {R}}_{l}$ in order to obtain a new virtual realization of the received data. These permutations increase the probability of obtaining nonerroneous pivots during the Gauss elimination.
The arithmetic mean of the variables B _{ l }(i), $\forall i\in [\![1,l]\!]$ , denoted E _{ l }, is defined by:
$$ E_{l}=\frac{1}{l}\sum_{i=1}^{l}B_{l}(i) \tag{18} $$
Property 1.
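If X _{1},X _{2},⋯,X _{ m } are independent random variables respectively following:
$$ X_{i}\rightarrow \mathcal{N}\left(\mu_{i},\sigma_{i}^{2}\right),\quad i\in\lbrace 1,\ldots,m\rbrace $$
the mean defined by $\frac {(X_{1}+X_{2}+\cdots +X_{m})}{m}$ follows:
$$ \frac{X_{1}+X_{2}+\cdots+X_{m}}{m}\rightarrow \mathcal{N}\left(\frac{1}{m}\sum_{i=1}^{m}\mu_{i},\frac{1}{m^{2}}\sum_{i=1}^{m}\sigma_{i}^{2}\right) $$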
We recall that the variable B _{ l }(i) which is the number of zeros in the ith column of the matrix $\tilde {\mathbf {T}}_{l}$ has two possible behaviors depending on l:

If l≠α·n, for $\alpha \in \mathbb {N}$ , the variable B _{ l }(i) follows a normal distribution $\mathcal {N}\left (\mu _{1},{\sigma _{1}^{2}}\right)$ for all columns i of $\tilde {\mathbf {T}}_{l}$ . In this case, using the property 1, the mean E _{ l } will follow:
$$ E_{l}\rightarrow \mathcal{N}\left(\mu_{1},\frac{{\sigma_{1}^{2}}}{l}\right) \tag{20} $$
We can note that the mean E _{ l } will be close to M/q.

If l=α·n, for $\alpha \in \mathbb {N}$ :

If the ith column is an almost dependent column, the variable B _{ l }(i) will follow the normal distribution of parameters $\mathcal {N}\left (\mu _{0},{\sigma _{0}^{2}}\right)$ .

If the ith column is not an almost dependent column, the variable B _{ l }(i) will follow the normal distribution of parameters $\mathcal {N}\left (\mu _{1},{\sigma _{1}^{2}}\right)$ .
Thereby, the mean E _{ l } is given by:
$$ E_{l}\rightarrow \mathcal{N}\left(\frac{Q(l)\cdot\mu_{0}+k_{l}\cdot\mu_{1}}{l},\frac{Q(l)\cdot{\sigma_{0}^{2}}+k_{l}\cdot{\sigma_{1}^{2}}}{l^{2}}\right) \tag{21} $$
where Q(l) is the number of almost dependent columns in the matrix $\tilde {\mathbf {R}}_{l}$ such that:
$$ Q(l)=\text{Card}\left\{i\in[\!\![1,l]\!\!], B_{l}(i)>\hat{\eta}_{\text{opt}}\right\} \tag{22} $$
where Card(x) is the cardinality function, which returns the size of the set x, and k _{ l }=l−Q(l) is the number of independent columns in the same matrix. In the noiseless environment, for l=α·n, the mean E _{ l } is stable at:
$$ E_{l}=\frac{M\cdot\left(q\cdot\left(n-k\right)+k\right)}{q\cdot n} \tag{23} $$
We note two behaviors of E _{ l } with respect to l=α·n or l≠α·n:
The gap between these behaviors allows us to find the matrices which have the number of columns l=α·n.
Consider the set of values of l for which the gap $E_{l}-\frac {M}{q}>0$:
Thereby, the identified length of the code words will be such that:
where the functions diff(x) and mode(x) are defined by:

Function diff(x): the output of this function is a vector of size s−1 containing the differences between consecutive elements of the vector $\mathbf {x}=\begin {pmatrix} x(1)& x(2)& \cdots & x(s) \end {pmatrix}$ :
$$ \text{diff}(\mathbf{x})=\begin{pmatrix} x(2)-x(1)& \cdots& x(s)-x(s-1) \end{pmatrix} \tag{27} $$
Function mode(x): this operation provides the value which has the highest occurrence in the vector x.
The proposed iterative method for identifying the code word length is summarized in Algorithm 2.
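The final decision step, based on the functions diff(x) and mode(x), can be sketched as follows. This is a minimal illustration and not a reproduction of Algorithm 2: the function `estimate_length`, its `margin` parameter, and the values stored in `E` (including the spurious value at l=37) are hypothetical.

```python
from collections import Counter

def diff(x):
    """Differences between consecutive elements of x (output of size len(x) - 1)."""
    return [b - a for a, b in zip(x, x[1:])]

def mode(x):
    """Most frequent value in x (ties are broken by first occurrence)."""
    return Counter(x).most_common(1)[0][0]

def estimate_length(E, M, q, margin=0.0):
    """E maps each tested l to its mean E_l. Keep the values of l whose gap
    E_l - M/q exceeds `margin`, then return the most frequent spacing."""
    candidates = sorted(l for l, e in E.items() if e - M / q > margin)
    gaps = diff(candidates)
    return mode(gaps) if gaps else None

# Hypothetical means for l = 2..64: peaks at multiples of 15, M/q elsewhere.
M, q = 1000, 16
E = {l: M / q for l in range(2, 65)}
for l in (15, 30, 45, 60):
    E[l] = 312.5
E[37] = 70.0  # hypothetical spurious exceedance caused by noise

n_hat = estimate_length(E, M, q)
```

Taking the mode of the spacings rather than the smallest selected l makes the decision tolerant to isolated spurious values of l in the set.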
Example 1.
Let us consider the Reed-Solomon code, denoted RS(15,11), over GF(2^{4}), which is defined by n=15 and k=11. The mean E_{l} normalized by M, which is set to 1,000, is represented in Figures 5 and 6. In Figure 5, a zero probability of error (i.e., p_{e}=0) is considered. For l≠α·n, we can verify that the mean E_{l} normalized by M is stable at 1/q=0.0625. For l=α·n, the mean E_{l} meets (23):
$$ \frac{E_{l}}{M}=\frac{q\cdot(n-k)+k}{q\cdot n}=\frac{16\cdot 4+11}{16\cdot 15}=0.3125 $$
So, the matrices of size l=α·n exhibit peaks, with $\frac {E_{l}}{M}-\frac {1}{q}=0.25>0$. In Figure 6, the gap $\frac {E_{l}}{M}-\frac {1}{q}$ is represented with respect to l when p_{e}=0.01 for one iteration of our algorithm. According to (25), the resulting set of l values is shown in Table 1. Hence, using (26), the identified length of the code words is $\tilde {n}=15$ .
Analysis and performances
The aim of our proposed algorithm is to blindly identify the length of nonbinary code words in a noisy environment. This can be achieved with an average complexity equal to $\mathcal {O}(M\cdot l^{3}_{\text {max}}\cdot \text {it}_{\text {max}})$ . Indeed, the proposed algorithm performs $(l_{\text {max}}-1)\cdot \text {it}_{\text {max}}$ Gaussian eliminations, each with an average complexity equal to $\mathcal {O}(M\cdot l^{2})$ , where l=2,⋯,l_{max}. So, the average complexity is such that:
$$ \text{it}_{\text{max}}\cdot\sum_{l=2}^{l_{\text{max}}}\mathcal{O}\left(M\cdot l^{2}\right)=\mathcal{O}\left(M\cdot l_{\text{max}}^{3}\cdot \text{it}_{\text{max}}\right) $$
In order to analyze the performance of our blind identification method, the probability of correct detection of the code word length n is chosen as the performance criterion. In the simulations, our method is applied to nonbinary LDPC codes, which have become candidates for future communication systems. For each simulation, 2,000 Monte Carlo trials are run, with the data symbols randomly chosen at each trial. In this part, we focus on:

the gain of the iteration process on the detection probability of n

the performance comparison in the case of different channels

the impact of increasing the Galois field dimension q on the detection probabilities of n

the impact of increasing the code word length n on the detection probabilities for a given q
Gain of the iterative process
In our simulations, we consider an LDPC (n=6, k=3) over GF(4). Figure 7 shows the probability of detecting n according to p_{e} for one, three, five, and ten iterations. We can see that the gain between the first and the tenth iteration is significant. Indeed, for p_{e}=0.07, the detection probability is equal to 0.76 with one iteration and reaches 0.99 after 10 iterations. We can deduce that the iterative process significantly improves the detection performance of the blind identification method based on the mean calculation.
Performance comparison in the case of different channels
Let us illustrate the detection performance obtained by the proposed method for an LDPC (n=16, k=8) over GF(8) when an AWGN channel (the first channel) and a multipath Rayleigh channel combined with an AWGN channel (the second channel) are considered. In order to compensate for and reduce the intersymbol interference (ISI) caused by the multipath propagation, a linear mean square error (MSE) equalizer of length 20 was used.
We evaluate the performance of our method when QAM or PAM modulation of order 8 (8-QAM and 8-PAM) is used to transmit the symbols coded by LDPC (n=16, k=8) over GF(8). Figures 8 and 9 compare the performance of our blind identification method with 8-PAM and 8-QAM modulations, for an AWGN channel and for a multipath channel with L_{path}=4 paths, with it_{max}=1. Figure 8 depicts the detection performance of our method in the case of the AWGN channel. We can see that the proposed method performs better with 8-QAM than with 8-PAM when SNR < 18 dB, with a gain of 5 dB between the two. However, for SNR > 18 dB, the performances are similar and the detection probability is equal to 1. To obtain the detection probabilities presented in Figure 9, the symbols modulated by 8-PAM or 8-QAM are transmitted over a quasi-static Rayleigh fading multipath channel with L_{path}=4 paths, and the received symbols are then processed by the linear MSE equalizer of length 20. Here again, the proposed method performs better with 8-QAM than with 8-PAM, with a gain of 5 dB. In the remaining simulations, we chose to evaluate the proposed method in the worst case of 8-PAM modulation, in order to show that it still performs well with PAM modulation.
In the following, the performance study of the impact of n and q on the proposed method is presented.
Impact of increasing q
Let us consider an LDPC (n=6, k=3) constructed over the Galois field GF(q), where q=4, 8, 16. The matrices $\tilde {\mathbf {R}}_{l}$ are reshaped from L=30,000 received symbols with l=2,⋯,30 and M=1,000. For each value of q, the method based on the mean calculation is applied to blindly identify the code word length of LDPC (n=6, k=3) over GF(q) when it_{max}=1. Figure 10 depicts the probability of detecting the correct n by our blind identification method according to the error probability p_{e} in the cases of GF(4), GF(8), and GF(16). This figure shows that the curves behave almost identically for q=4, 8, 16. We can deduce that the method based on the mean calculation is only weakly sensitive to an increase of the Galois field dimension q.
Impact of increasing n
To complete the evaluation of our blind identification method, the impact of increasing the code word length is studied. In our simulations, we consider two LDPC codes over GF(8), an LDPC (n=6, k=3) and an LDPC (n=16, k=8). The matrices $\tilde {\mathbf {R}}_{l}$ are reshaped from L=64,000 received symbols with l=2,⋯,64 and M=1,000. For each code, the method based on the mean calculation is applied to blindly identify the code word length n when it_{max}=1. Figure 11 shows the resulting detection probabilities of n. We can note that increasing the code word length lowers the detection performance of the proposed method. Indeed, for p_{e}=0.01, the detection probability is equal to 1 for both codes, whereas for p_{e}=0.02, it decreases from 0.99 for n=6 to 0.94 for n=16.
In order to show that our method works for codes of a reasonable code word length, we computed the detection probability of the Reed-Solomon code RS (n=31, k=25) over GF(32), which corresponds to an equivalent code over GF(2) of length m·n=5·31=155. For an error probability p_{e}=0.01 and 1,000 Monte Carlo trials, we obtained a detection probability of 0.87 for it_{max}=50. This probability can be improved by increasing the number of iterations of our algorithm. For it_{max}=100, we obtained a detection probability of 0.95.
Conclusions
In this paper, we have introduced an algorithm devoted to the blind identification of the code word length of a nonbinary code in a noisy transmission environment. Using this algorithm, the code word length can be identified by calculating the arithmetic mean of the number of zeros that occur in the columns of the matrix obtained by the Gauss elimination. We have shown that the proposed algorithm is robust because it does not require an estimate of the error probability, is only weakly sensitive to the order of the Galois field, and performs well with the modulation types and channels considered. Furthermore, the detection performance improves when an iterative process is used to increase the probability of obtaining non-erroneous pivots during the Gauss elimination.
Our future work will focus on identifying the remaining parameters of the nonbinary code as well as a parity check matrix, which would make it possible to implement a generic decoder in a noisy environment. Furthermore, a method based on soft information, which improves the performance of the blind identification algorithms, will be published soon [31].
Appendix
Proof of Equation 8
We define $\mathcal {H}_{0}$ and $\mathcal {H}_{1}$ by:
The two behaviors of B_{l}(i) are summarized in (7). The aim of this appendix is to demonstrate (8). In order to determine the probabilities P_{fa} and P_{nd}, we study the behavior of the variable B_{l}(i) under the hypotheses $\mathcal {H}_{0}$ and $\mathcal {H}_{1}$ .
Under the hypothesis $\boldsymbol{\mathcal {H}}_{0}$ : the variable B_{l}(i) follows a binomial distribution $\mathcal {B}(M,P_{i})$ . So, the probability that B_{l}(i) is greater than $\frac {M}{q}\cdot \eta $ is as follows:
Under the hypothesis $\boldsymbol{\mathcal {H}}_{1}$ : the variable B _{ l }(i) follows a binomial distribution $\mathcal {B}(M,1/q)$ . So, the probability that B _{ l }(i) is less than or equal to $\frac {M}{q}\cdot \eta $ is as follows:
Using these two probabilities, we calculate the false alarm probability P_{fa}, the probability of not detecting a theoretical dependent column P_{nd}, and the probability of detection P_{det}.
Calculation of the false alarm probability P_{fa}: this probability corresponds to deciding that a column $\tilde {\mathbf {a}}_{i}^{(l)}$ belongs to the dual code $\mathcal {C}^{\perp }$ even though in reality it does not. This probability can be determined by:
Calculation of the probability of not detecting a theoretical dependent column P_{nd}: this probability corresponds to deciding that a column $\tilde {\mathbf {a}}_{i}^{(l)}$ does not belong to $\mathcal {C}^{\perp }$ even though in reality it does. This probability can be determined by:
Calculation of the probability of detection P _{ det } : this probability is defined by:
Using (32) and (34), the optimal threshold can be determined by:
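The two error probabilities discussed in this appendix can be evaluated numerically from the binomial models stated above. The sketch below is only an illustration: it assumes that P_{fa} is the probability that B_{l}(i) exceeds (M/q)·η under $\mathcal{H}_{1}$ and that P_{nd} is the probability that B_{l}(i) does not exceed it under $\mathcal{H}_{0}$, as described in the text. The function names and the value P_i=0.9 are hypothetical, and the criterion defining the optimal threshold is not reproduced here.

```python
import math

def binom_cdf(t, M, p):
    """P[B <= t] for B ~ Binomial(M, p) with 0 < p < 1, summed in log space."""
    k_max = min(M, math.floor(t))
    total = 0.0
    for k in range(0, k_max + 1):
        log_pmf = (math.lgamma(M + 1) - math.lgamma(k + 1) - math.lgamma(M - k + 1)
                   + k * math.log(p) + (M - k) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)

def false_alarm_and_miss(M, q, eta, P_i):
    """Return (P_fa, P_nd) for the decision threshold (M / q) * eta."""
    threshold = M / q * eta
    p_fa = 1.0 - binom_cdf(threshold, M, 1.0 / q)   # B > threshold under H1
    p_nd = binom_cdf(threshold, M, P_i)             # B <= threshold under H0
    return p_fa, p_nd

# Hypothetical setting: M = 1,000 rows, GF(16), P_i = 0.9 for a dependent column.
p_fa, p_nd = false_alarm_and_miss(M=1000, q=16, eta=8.0, P_i=0.9)
```

Sweeping η with such a function shows the usual trade-off: raising the threshold lowers P_{fa} and raises P_{nd}.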
Proof of the equation (16)
The probability P_{i} is initially expressed by (15). We denote P_{1}(s)=Pr[X=s] and $P_{2}(s)=Pr \left [\sum _{k=1}^{s}a_{i}^{(l)}(k)\cdot e_{k}^{(l)}(j)=0\right ]$ such that these two probabilities are independent. Henceforth, (15) becomes:
Assuming that the errors are independent from each other and uniformly distributed in GF (q)∖{0}, the variable X follows a binomial distribution with parameters N _{ i }(l) and p _{ e }. Thereby, the probability P _{1}(s) is determined by:
The probability P _{2}(s) is the probability of having $\sum _{k=1}^{s}a_{i}^{(l)}(k)\cdot e_{k}^{(l)}(j)=0$ where $e_{k}^{(l)}(j)\in $ GF (q)∖{0}.
We demonstrate by mathematical induction that the probability P_{2}(s) can be expressed by:
We have P_{2}(0)=1 because there are no erroneous positions. In the case of a single erroneous position, we have P_{2}(1)=0. For s=2, taking GF(2^{2}) as an example, the probability P_{2}(s=2) can be obtained from the matrix M whose row and column indexes correspond to the nonzero elements of this field. The coefficients of this matrix are the sums over GF(2^{2}) of the corresponding row and column indexes.
If we have $\left (a_{i}^{(l)}(1),e_{1}^{(l)}(j)\right) \in \left (GF(2^{2})^{*}\right)^{2}$ , and $\left (a_{i}^{(l)}(2), e_{2}^{(l)}(j)\right)\in \left (GF(2^{2})^{*}\right)^{2}$ , the probability of having $a_{i}^{(l)}(1)\cdot e_{1}^{(l)}(j)+a_{i}^{(l)}(2)\cdot e_{2}^{(l)}(j) =0$ will be P _{2}(2)=3/9=1/3. The computed probability verifies (38).
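The value P_{2}(2)=1/3 over GF(2^{2}) can be cross-checked by exhaustive enumeration. The sketch below relies on two assumptions that are not spelled out in the text: multiplying by a fixed nonzero coefficient permutes the nonzero field elements, so each product is itself uniform over GF(q)∖{0}, and addition in GF(2^{m}) is a bitwise XOR of the element labels. The closed form `p2_closed_form` is obtained here by solving the recursion P_{2}(s+1)=(1−P_{2}(s))/(q−1) with P_{2}(0)=1; it may be written differently from (38).

```python
from fractions import Fraction
from functools import reduce
from itertools import product
from operator import xor

def p2_bruteforce(m, s):
    """Probability that s uniform nonzero elements of GF(2^m) sum to zero,
    by enumeration (field addition is XOR on the integer labels)."""
    q = 2 ** m
    nonzero = range(1, q)
    hits = sum(1 for terms in product(nonzero, repeat=s)
               if reduce(xor, terms, 0) == 0)
    return Fraction(hits, (q - 1) ** s)

def p2_recursive(q, s):
    """P2(s) from P2(0) = 1 and P2(s + 1) = (1 - P2(s)) / (q - 1)."""
    p = Fraction(1)
    for _ in range(s):
        p = (1 - p) / (q - 1)
    return p

def p2_closed_form(q, s):
    """Solution of the recursion: (1 + (-1)^s / (q - 1)^(s - 1)) / q."""
    return (1 + Fraction((-1) ** s) / Fraction(q - 1) ** (s - 1)) / q

value_gf4 = p2_bruteforce(2, 2)   # two erroneous positions over GF(4)
```

For large s, P_{2}(s) tends to 1/q, which matches the intuition that a sum of many random nonzero symbols is almost uniform over the field.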
We assume that (38) holds for s, and we demonstrate it for s+1. If we have $\sum _{k=1}^{s+1}a_{i}^{(l)}(k)\cdot e_{k}^{(l)}(j)=0$ , we will have $e_{s+1}^{(l)}(j)=\frac {1}{a_{i}^{(l)}(s+1)}\cdot \sum _{k=1}^{s}a_{i}^{(l)}(k)\cdot e_{k}^{(l)}(j)$ (addition and subtraction coincide in GF(2^{m})). This value belongs to GF (q)^{∗} only if the partial sum over the first s positions is nonzero, and $e_{s+1}^{(l)}(j)$ then takes it with a probability equal to 1/(q−1). Therefore, the probability P_{2}(s+1) is determined by:
In order to simplify the expression of P _{2}(s), a change of variable is done by considering φ(s)=(q−1)^{s−1}·P _{2}(s). When P _{2}(s) is replaced by φ(s), the expression (38) becomes:
Denoting ρ(s)=(−1)^{s}·φ(s), the expression (41) can be written as:
However, the sum $\sum _{i=0}^{s-1}(1-q)^{i}$ is a geometric series of common ratio 1−q. So, it can be written as:
The computation of ρ(1) gives ρ(1)=0. Therefore, using (43) and (41), the simplified expression of P _{2}(s) is written as:
Using (37) and (44), the overall probability P _{ i } is given by:
In order to simplify this equation, Newton's binomial formula can be applied:
Thus, the probability of having an element of the ith column of $\tilde {\mathbf {T}}_{l}$ equal to 0 is determined by:
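As a numerical sanity check of this derivation, the binomial sum over the number of erroneous positions can be compared with a closed form. The sketch below is an illustration under assumptions: `N` stands for N_{i}(l), P_{2}(s) is computed from the recursion P_{2}(s+1)=(1−P_{2}(s))/(q−1) with P_{2}(0)=1, and the closed form (1+(q−1)·(1−q·p_{e}/(q−1))^{N})/q is the one obtained here by applying the binomial theorem to that sum, so it may be arranged differently from the expression given in the paper.

```python
from fractions import Fraction
from math import comb

def p2(q, s):
    """P2(s) from P2(0) = 1 and P2(s + 1) = (1 - P2(s)) / (q - 1)."""
    p = Fraction(1)
    for _ in range(s):
        p = (1 - p) / (q - 1)
    return p

def p_i_sum(q, N, p_e):
    """Sum over s of Pr[X = s] * P2(s), with X ~ Binomial(N, p_e)."""
    return sum(comb(N, s) * p_e ** s * (1 - p_e) ** (N - s) * p2(q, s)
               for s in range(N + 1))

def p_i_closed_form(q, N, p_e):
    """Closed form obtained with the binomial theorem (an assumption, see above)."""
    return (1 + (q - 1) * (1 - q * p_e / (q - 1)) ** N) / q

# Exact rational arithmetic: GF(16), N = 10 nonzero coefficients, p_e = 0.01.
q, N, p_e = 16, 10, Fraction(1, 100)
lhs = p_i_sum(q, N, p_e)
rhs = p_i_closed_form(q, N, p_e)
```

Two limiting cases are easy to read from the closed form: it equals 1 when p_{e}=0 and 1/q when p_{e}=(q−1)/q, that is, when the received symbols are uniformly random.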
References
[1] MC Davey, D MacKay, Low-density parity-check codes over GF(q). IEEE Commun. Lett. 2, 165–167 (1998).
[2] JA Briffa, HG Schaathun, in 5th International Symposium on Turbo Codes and Related Topics. Nonbinary turbo codes and applications (IEEE, Lausanne, 2008).
[3] D Declercq, M Fossorier, Decoding algorithms for nonbinary LDPC codes over GF(q). IEEE Trans. Commun. 55(4), 633–643 (2007).
[4] L Barnault, D Declercq, in Proceedings ITW. Fast decoding algorithm for LDPC over GF(2^{q}) (IEEE, Paris, France, 2003), pp. 70–73.
[5] A Voicila, D Declercq, F Verdier, M Fossorier, P Urard, Low-complexity decoding for nonbinary LDPC codes in high order fields. IEEE Trans. Commun. 58(5), 1365–1375 (2010).
[6] Yang Yu, W Chen, Design of low complexity nonbinary LDPC codes with an approximated performance-complexity tradeoff. IEEE Commun. Lett. 16(4), 514–517 (2012).
[7] T Xia, HC Wu, Identification of nonbinary LDPC codes using average LLR of syndrome a posteriori probability. IEEE Commun. Lett. 17(7), 1301–1304 (2013).
[8] J Stern, A method for finding code words of small weight. Coding Theory Appl. 388, 106–113 (1989).
[9] A Canteaut, F Chabaud, A new algorithm for finding minimum-weight words in a linear code: application to McEliece's cryptosystem and to narrow-sense BCH codes of length 511. IEEE Trans. Inf. Theory. 44, 367–378 (1998).
[10] A Valembois, Detection and recognition of a binary linear code. Discrete Appl. Math. 111(1–2), 199–218 (2001).
[11] M Cluzeau, in 2006 IEEE International Symposium on Information Theory. Block code reconstruction using iterative decoding techniques (IEEE, Seattle, WA, 2006), pp. 2269–2273.
[12] M Cluzeau, M Finiasz, in IEEE International Symposium on Information Theory 2009. Recovering a code’s length and synchronization from a noisy intercepted bitstream (IEEE, Seoul, 2009), pp. 2737–2741.
[13] M Côte, N Sendrier, in IEEE International Symposium on Information Theory (ISIT). Reconstruction of convolutional codes from noisy observation (IEEE, Seoul, 2009), pp. 546–550.
[14] G Burel, R Gautier, in IASTED International Conference on Communications, Internet and Information Technology. Blind estimation of encoder and interleaver characteristics in a non cooperative context (ACTA Press, Scottsdale, AZ, USA, 2003).
[15] Y Zrelli, R Gautier, M Marazin, E Rannou, E Radoi, Focus on theoretical properties of blind convolutional codes identification methods based on rank criterion. MTA Review. XXII(4), 213–234 (2012).
[16] Y Zrelli, M Marazin, R Gautier, E Rannou, in Proceedings of the International Conference on Computer Communication Networks. Blind identification of convolutional encoder parameters over GF(2^{m}) in the noiseless case (IEEE, Maui, Hawaii, 2011).
[17] G Sicot, S Houcke, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 3. Blind detection of interleaver parameters (IEEE, Philadelphia, Pennsylvania, 2005), pp. 829–832.
[18] G Sicot, S Houcke, J Barbier, Blind detection of interleaver parameters. Signal Process. 89(4), 450–462 (2009).
[19] M Marazin, R Gautier, G Burel, Blind recovery of k/n rate convolutional encoders in a noisy environment. EURASIP J. Wireless Commun. Netw. 2011(168), 1–9 (2011).
[20] Z Jing, H Zhiping, L Chunwu, S Shaojing, Z Yimeng, Information-dispersion-entropy-based blind recognition of binary BCH codes in soft decision situations. Entropy. 15(5), 1705–1725 (2013).
[21] A Hocquenghem, Codes correcteurs d’erreurs. Chiffres. 2, 147–156 (1959).
[22] RC Bose, DK Ray-Chaudhuri, On a class of error correcting binary group codes. Inf. Control. 3(3), 68–79 (1960).
[23] I Reed, G Solomon, Polynomial codes over certain finite fields. J. Soc. Ind. Appl. Math. 8(2), 300–304 (1960).
[24] M Baldi, M Bianchi, F Chiaraluce, R Garello, N Maturo, IA Sanchez, S Cioni, in 2013 IEEE Military Communications Conference. Advanced coding schemes against jamming in telecommand links (IEEE, San Diego, CA, 2013), pp. 1220–1226.
[25] C Junbin, W Lin, L Yong, in International Conference on Communications, Circuits and Systems, 1. Performance comparison between nonbinary LDPC codes and Reed-Solomon codes over noise bursts channels (IEEE, Hong Kong, China, 2005), pp. 1–4.
[26] B Zhou, L Zhang, J Kang, Q Huang, YY Tai, S Lin, M Xu, in Information Theory and Applications Workshop. Nonbinary LDPC codes vs. Reed-Solomon codes (IEEE, San Diego, CA, 2008), pp. 175–184.
[27] version 8.8.0 Release 8 GT, LTE; Evolved Universal Terrestrial Radio Access (E-UTRA); Multiplexing and Channel Coding. The 3rd Generation Partnership Project 2, Technical Specification Group Radio Access Network (2010). http://www.3gpp.org. (2013).
[28] M Marazin, R Gautier, G Burel, in IEEE GLOBECOM Workshops. Dual code method for blind identification of convolutional encoder for cognitive radio receiver design (IEEE, Honolulu, HI, 2009).
[29] EM Moro, Algebraic geometry modeling in information theory (World Scientific, Singapore, 2012).
[30] E Anderson, Z Bai, C Bischof, S Blackford, J Demmel, J Dongarra, JD Croz, A Greenbaum, S Hammarling, A McKenney, D Sorensen, LAPACK user’s guide (SIAM, Philadelphia, 1999).
[31] Y Zrelli, Identification aveugle de codes correcteurs d’erreurs basés sur des grands corps de Galois et recherche d’algorithmes de type décision souple pour les codes convolutifs (PhD thesis, Université de Brest, France, 2013).
Additional information
Competing interests
The authors declare that they have no competing interests.
Yasamine Zrelli, Roland Gautier, Eric Rannou, Mélanie Marazin and Emanuel Radoi contributed equally to this work.
Keywords
 Cognitive radio
 Blind identification
 Nonbinary errorcorrecting codes
 Galois field