Blind recovery of k/n rate convolutional encoders in a noisy environment

In order to enhance the reliability of digital transmissions, error correcting codes are used in every digital communication system. To meet new constraints on data rate and reliability, new coding schemes are constantly being developed. Digital communication systems are therefore in perpetual evolution, and it is becoming very difficult to remain compatible with all the standards in use. A cognitive radio system provides an interesting solution to this problem: the design of an intelligent receiver able to adapt itself to a specific transmission context. This article presents a new algorithm dedicated to the blind recognition of convolutional encoders in the general k/n rate case. After a brief review of convolutional code and dual code properties, a new iterative method dedicated to the blind estimation of convolutional encoders in a noisy context is developed. Finally, case studies are presented to illustrate the performance of our blind identification method.


Introduction
In a digital communication system, the use of an error correcting code is mandatory. This code provides good immunity against channel impairments. Nevertheless, the transmission rate is decreased by the redundancy introduced by the code. To enhance the correction capabilities and to reduce the impact of the amount of redundancy introduced, new correcting codes are constantly under development. This means that communication systems are in perpetual evolution. Indeed, it is becoming more and more difficult for users to follow all these changes, to stay up-to-date, and to own an electronic communication device compatible with every standard in use around the world. In such a context, cognitive radio systems provide an obvious solution to these problems. A cognitive radio receiver is an intelligent receiver able to adapt itself to a specific transmission context and to blindly estimate the transmitter parameters, for self-reconfiguration purposes, from knowledge of the received data stream only. As convolutional codes are among the most widely used error-correcting codes, it seemed to us worth gaining more insight into their blind recovery.
In this article, a complete method dedicated to the blind identification of the parameters and generator matrices of convolutional encoders in a noisy environment is presented. In a noiseless environment, the first approach to identify a rate 1/n convolutional encoder was proposed in [1]. In [2,3], this method was extended to the case of a rate k/n convolutional encoder. In [4], we developed a method for the blind recovery of a rate k/n convolutional encoder in a turbocode configuration. Few of the available methods are dedicated to the blind identification of convolutional encoders in a noisy environment. An approach allowing one to estimate a dual code basis was proposed in [5], and a comparison of this technique with the method proposed in [7] was then given in [6]. In [8], an iterative method for the blind recognition of a rate (n − 1)/n convolutional encoder in a noisy environment was proposed. This method identifies the parameters and the generator matrix of a convolutional encoder. It relies on algebraic properties of convolutional codes [9,10] and of the dual code [11], and is extended here to the case of rate k/n convolutional encoders.
This article is organized as follows. Section 2 presents some properties of convolutional encoders and dual codes. Then, an iterative method for the blind identification of convolutional encoders is described in Section 3. Finally, the performance of the method is discussed in Section 4. Some conclusions and prospects are drawn in Section 5.

Convolutional encoders and dual code
Before explaining our blind identification method, let us recall the properties of convolutional encoders on which it relies.

Principle and mathematical model
Let C be an (n, k, K) convolutional code, where n is the number of outputs, k the number of inputs, and K the constraint length, and let C⊥ be a dual code of C. Let us also denote by G(D) a polynomial generator matrix of rank k defined by:

G(D) = [g_{i,j}(D)], ∀i = 1, ..., k, ∀j = 1, ..., n,

where the g_{i,j}(D) are generator polynomials and D represents the delay operator. Let μ_i be the memory of the ith input:

μ_i = max_{j=1,...,n} deg g_{i,j}(D),

where deg is the degree of g_{i,j}(D). The overall memory of the convolutional code, denoted μ, is

μ = max_{i=1,...,k} μ_i.

If the input sequence is denoted by m(D) and the output sequence by c(D), the encoding process can be described by

c(D) = m(D) · G(D).

In practice, the encoder used is usually an optimal encoder. An encoder is optimal [10] if it has the maximum possible free distance among all codes with the same parameters (n, k, and K); the error-correction capability of such optimal codes is therefore the highest. Furthermore, their good algebraic properties [9,10] can be judiciously exploited for blind identification.
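As an illustration of the encoding relation c(D) = m(D) · G(D), the following sketch encodes a bit stream with the classical rate-1/2 encoder defined by g_{1,1}(D) = 1 + D + D² and g_{1,2}(D) = 1 + D². This particular encoder is our choice for illustration only; any set of generator polynomials could be substituted.

```python
def conv_encode(msg, gens):
    """Encode a bit list with a rate 1/n convolutional encoder.

    gens[j][d] is the coefficient of D^d in the generator polynomial g_{1,j}(D).
    """
    mu = max(len(g) for g in gens) - 1          # encoder memory
    state = [0] * mu                            # shift register, most recent bit first
    out = []
    for bit in msg:
        window = [bit] + state                  # [m(t), m(t-1), ..., m(t-mu)]
        for g in gens:                          # one output stream per polynomial
            out.append(sum(gi & w for gi, w in zip(g, window)) % 2)
        state = [bit] + state[:-1]              # shift the register
    return out

# g1(D) = 1 + D + D^2 -> [1, 1, 1];  g2(D) = 1 + D^2 -> [1, 0, 1]
G = [[1, 1, 1], [1, 0, 1]]
codeword = conv_encode([1, 0, 1, 1], G)         # interleaved output c1(0), c2(0), c1(1), ...
```

The output interleaves the n streams symbol by symbol, which is the codeword ordering assumed in the rest of this section.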
To model the errors generated by the transmission system, let us consider the binary symmetric channel (BSC) with error probability P_e, and denote by e(D) the error pattern and by y(D) the received sequence, so that:

y(D) = c(D) + e(D).

Let us also denote by e(i) the ith bit of e(D), so that Pr(e(i) = 1) = P_e and Pr(e(i) = 0) = 1 − P_e. The errors are assumed to be independent.
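A BSC with independent errors can be simulated by flipping each bit with probability P_e, as in this minimal sketch (the helper name and the fixed seed are ours, chosen for reproducibility):

```python
import random

def bsc(bits, p_e, rng=None):
    """Pass a bit list through a binary symmetric channel: each bit is
    flipped independently with probability p_e."""
    rng = rng or random.Random(0)       # fixed seed -> reproducible error pattern
    return [b ^ (rng.random() < p_e) for b in bits]

# y = c + e over GF(2): corrupt a codeword with crossover probability 0.02
y = bsc([1, 1, 1, 0, 0, 0, 0, 1], 0.02)
```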
In this article, the noise is modeled by a BSC. A BSC can be used to model an AWGN channel in the context of a hard-decision decoding algorithm. Indeed, the BSC can be seen as a model equivalent to the combination of the modulator, the true channel (e.g., AWGN), and the demodulator (matched filter or correlator followed by a decision rule). Furthermore, in mobile communications, channels are subject to multipath fading, which produces burst errors in the received bit stream. A convolutional encoder alone is not efficient against such errors, so an interleaver is generally used to limit their effect. In this context, after the deinterleaving process on the receiver side, the errors (and thus the equivalent channel including the deinterleaver) can also be modeled by a BSC.
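Under this equivalence, the BSC crossover probability follows from the modulation and the SNR. For example, for coherent BPSK over AWGN with hard decisions, P_e = Q(√(2·R·Eb/N0)), where R is the code rate. The helper below (its name is ours, not from the article) computes this value:

```python
import math

def bsc_crossover(ebn0_db, rate):
    """Crossover probability of the BSC equivalent to coherent BPSK over AWGN
    with hard decisions: P_e = Q(sqrt(2 * R * Eb/N0)),
    using Q(x) = 0.5 * erfc(x / sqrt(2))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)             # dB -> linear Eb/N0
    return 0.5 * math.erfc(math.sqrt(rate * ebn0))

# e.g., a rate-1/2 code at Eb/N0 = 0 dB sees a BSC with P_e ~ 0.159
p_e = bsc_crossover(0.0, 0.5)
```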

The dual code of convolutional encoders
The dual code generator matrix of a convolutional encoder, termed a parity check matrix, can also be used to describe a convolutional code. This ((n − k) × n) polynomial matrix verifies the following property:

Theorem 1. Let G(D) be a generator matrix of C. If an ((n − k) × n) polynomial matrix, H(D), is a parity check matrix of C, then:

G(D) · H(D)^T = 0,

where .^T is the transpose operator.
Let us denote by μ⊥ the memory of the dual code, which follows from the properties of the dual code and of convolutional encoders [9,11]. According to [12], if the polynomial h_0(D) is a delay-free polynomial, i.e., h_0(0) = 1, then the convolutional encoder is realizable. Let us denote by H the binary form of H(D), obtained from the expansion

H(D) = Σ_{i=0}^{μ⊥} H_i · D^i,

where the H_i, ∀i = 0, ..., μ⊥, are matrices of size ((n − k) × n). The parity check matrix (11) is composed of shifted versions of the same (n − k) vectors. These vectors, of size n·(μ⊥ + 1) and denoted by h_j (∀j = 1, ..., n − k), are built by concatenating the row vectors H_i^{(j)}, i = 0, ..., μ⊥, where H_i^{(j)}, a row vector of size n, is the jth row of H_i. In (14), 0_l is a zero vector of size l.
In the case of a rate k/n convolutional encoder, each vector h_j (13) contains (n − k − 1)·(μ⊥ + 1) zeros. In this configuration, the system given in (7) is split into (n − k) systems, ∀s = 1, ..., (n − k). Thus, in the (n − k) vectors (13), called parity checks, each component reduces to a row vector of size (k + 1). Let us denote by S the size of these parity checks of the code (16), such that

S = (k + 1) · (μ⊥ + 1).

It follows from (16) and (10) that the (n − k) parity checks, h_s, are vectors of degree (S − 1).
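To make the parity check structure concrete, consider again the illustrative rate-1/2 encoder G(D) = [1 + D + D², 1 + D²]. For this rate, a parity check matrix is H(D) = [g₂(D), g₁(D)], since g₁g₂ + g₂g₁ = 0 over GF(2), and in binary form the parity check h of size S = n·(μ⊥ + 1) = 6 is orthogonal to every window of 6 consecutive codeword bits. The encoder and check are our illustrative choice; a noiseless sketch:

```python
def encode(msg):
    """Rate-1/2 encoder with g1 = 1 + D + D^2 and g2 = 1 + D^2 (illustrative)."""
    g1, g2, s = [1, 1, 1], [1, 0, 1], [0, 0]
    out = []
    for b in msg:
        w = [b] + s                                    # [m(t), m(t-1), m(t-2)]
        out += [sum(a * x for a, x in zip(g1, w)) % 2,
                sum(a * x for a, x in zip(g2, w)) % 2]
        s = [b] + s[:-1]
    return out

# Binary parity check: interleaved coefficients [g2_0, g1_0, g2_1, g1_1, g2_2, g1_2]
h = [1, 1, 0, 1, 1, 1]
y = encode([1, 0, 1, 1, 0, 0, 1, 0])

# Every window of 6 consecutive codeword bits is orthogonal to h over GF(2)
checks = [sum(hi * yi for hi, yi in zip(h, y[2 * t:2 * t + 6])) % 2
          for t in range(len(y) // 2 - 2)]
```

All entries of `checks` are 0 on a clean codeword; on a noisy stream, some windows fail, which is precisely what the noisy identification method of Section 3 must cope with.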

Blind recovery of convolutional code
This section deals with the principle of the proposed blind identification method in the case where the intercepted sequence is corrupted. Only a few methods are available for blind identification in a noisy environment: for example, a Euclidean-algorithm-based approach was developed and applied to the case of a rate 1/2 convolutional encoder [13]. At nearly the same time, a probabilistic algorithm based on the Expectation Maximization (EM) algorithm was proposed in [14] to identify a rate 1/n convolutional encoder. Further to our earlier development of a blind recovery method for a convolutional encoder of rate (n − 1)/n [8], it appeared to us worth extending it, here, to the case of a rate k/n convolutional encoder. Prior to describing the iterative method in use, which is based on algebraic properties of an optimal convolutional encoder [9,10] and of the dual code [11], let us briefly recall the principle of our blind identification method when the intercepted sequence is corrupted.

Blind identification of a convolutional code: principle
This method allows one to identify the parameters (n, k, and K) of an encoder, the parity check matrix, and the generator matrix of an optimal encoder. Its principle is to reshape the intercepted data bit stream, y, under matrix form. This matrix, denoted R_l, is computed for different values of l, where l is the number of columns. If the received sequence length is L′, then the number of rows of R_l is L = ⌊L′/l⌋, where ⌊.⌋ stands for the integer part. This construction is illustrated in Figure 1.
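The reshaping step can be sketched as follows. We assume here that the stream fills the matrix row by row in blocks of l consecutive bits, so that each row of R_l is a window of l consecutive received symbols (the function name is ours):

```python
def reshape_stream(y, l):
    """Reshape the intercepted bit stream y into an (L x l) matrix R_l,
    with L = len(y) // l full rows of l consecutive bits; trailing bits
    that do not fill a complete row are dropped."""
    L = len(y) // l                     # L = floor(L' / l)
    return [y[r * l:(r + 1) * l] for r in range(L)]

# e.g., a 20,000-bit stream with l = 100 gives L = 200 rows
R_100 = reshape_stream([0, 1] * 10_000, 100)
```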
If the received sequence is not corrupted (y = c ⇒ e = 0), we have shown in [8] that the rank in the Galois field GF(2) of each matrix R_l has two possible values, where n_a is a key parameter which corresponds to the size of the first matrix R_l with a rank deficiency. Indeed, in [8], for a rate (n − 1)/n convolutional encoder, this parameter proved to be such that

n_a = n · (μ⊥ + 1).

In this configuration, n_a is equal to the size of the parity check (S). But what is its value in general for a rate k/n convolutional encoder?
For a rate k/n convolutional encoder, we show in Appendix A that the size of the first matrix which exhibits a rank deficiency, n_a, is given by (22). From (22), it is obvious that the parameter n_a is not equal to the size of the (n − k) parity checks (16) of the code. In Appendix B, the value of the rank deficiency of the matrix R_{n_a} is discussed.

Blind identification of convolutional code: method
A prerequisite to the extension of the method applied in [8] to the case of a rate k/n convolutional encoder is the identification of the parameter n. Then, a basis of the dual code has to be built to further deduce the value of n_a, which corresponds to the size of the parity check with the smallest degree. Using both this parameter and (22), one can assume different values for k and μ⊥. Then, the (n − k) parity checks (16) and a generator matrix of the code can be estimated.
To identify the number of outputs, n, let us evaluate the likely-dependent columns of R_l. The values of l at which the R_l matrices seem to be of degenerated rank are detected by converting each R_l matrix into a lower triangular matrix, G_l, through the Gauss-Jordan elimination through pivoting adapted to GF(2):

G_l = A_l · R_l · B_l,

where A_l is a row-permutation matrix of size (L × L) and B_l is a matrix of size (l × l) that describes the column combinations. Let N_l(i) be the number of 1s in the lower part of the ith column of the matrix G_l. In [15,16], this number was used to estimate an optimal threshold, g_opt, which allows one to decide whether the ith column of the matrix R_l is dependent on the other columns. This optimal threshold is such that the sum of the miss probabilities is as small as possible. The number of detected dependent columns, denoted Z(l), is

Z(l) = Card{i ∈ {1, ..., l} | N_l(i) ≤ g_opt},

where Card{x} is the cardinality of x. The gap between two non-zero cardinalities, Z(l), is equal to the estimated codeword size, n̂. Let I be the set of l-values for which the cardinality is non-zero. From the matrices B_i, ∀i ∈ I, one can build a dual code basis. Let R′_i be a ((L − i) × i) matrix composed of the last (L − i) rows of R_i. If b_j, ∀j = 1, ..., i, represents the jth column of B_i, then b_j is considered as a linear form close to the dual code on condition that the criterion (25) on its Hamming weight is satisfied, where d(x) is the Hamming weight of x. Let us denote the set of all detected linear forms by D. Within this set, the linear form with the smallest degree is taken and denoted, here, by ĥ, and its size by n̂_a. From (22), one can then make different hypotheses about the values of k and μ⊥. This procedure is summed up in Algorithm 1.
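A noiseless sketch of this rank test: compute the GF(2) rank of R_l for successive l (here by plain Gaussian elimination, rather than the thresholded Gauss-Jordan of (23), which is only needed in noise) and read n̂ off the spacing between rank-deficient sizes. The encoder is the illustrative rate-1/2 code used earlier, so the deficient sizes should be spaced by n = 2:

```python
import random

def gf2_rank(rows):
    """Rank of a binary matrix over GF(2); each row is an int bitmask."""
    pivots = {}                          # highest set bit -> reduced pivot row
    for row in rows:
        while row:
            hb = row.bit_length() - 1    # position of the leading 1
            if hb not in pivots:
                pivots[hb] = row
                break
            row ^= pivots[hb]            # eliminate the leading 1
    return len(pivots)

def encode(msg):
    """Rate-1/2 encoder, g1 = 1 + D + D^2, g2 = 1 + D^2 (illustrative)."""
    g1, g2, s, out = [1, 1, 1], [1, 0, 1], [0, 0], []
    for b in msg:
        w = [b] + s
        out += [sum(a * x for a, x in zip(g1, w)) % 2,
                sum(a * x for a, x in zip(g2, w)) % 2]
        s = [b] + s[:-1]
    return out

rng = random.Random(1)
y = encode([rng.randint(0, 1) for _ in range(1000)])   # 2000 clean codeword bits

deficient = []
for l in range(2, 13):
    rows = [int("".join(map(str, y[r * l:(r + 1) * l])), 2)
            for r in range(len(y) // l)]               # build R_l
    if gf2_rank(rows) < l:                             # rank-deficient size?
        deficient.append(l)
# deficient sizes appear every n positions; their spacing estimates n (= 2 here)
```

In a noisy stream the rank is full with high probability, which is why the thresholded criterion on N_l(i) replaces the exact rank computation.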
For a rate (n − 1)/n convolutional encoder with ĥ as parity check, solving the system described in Corollary 1 (see Section 2) enables one to identify the generator matrix. One should, however, note that for a rate k/n convolutional code, a prerequisite to the identification of the generator matrix, G(D), is the identification of the (n − k) parity checks h_j of size S (see (16) and (18)).
This is done by building (n̂ − k̂) row vectors, denoted x_s, ∀s = 1, ..., n̂ − k̂. For each vector x_s, a matrix R_l^s is built in the same way as R_l. Then, for each matrix R_l^s, a linear form of size S has to be estimated. This procedure is summed up in Algorithm 2, where ĥ_s refers to the sth identified parity check.
Identification of the generator matrix from both these (n̂ − k̂) parity checks and the whole set of code parameters can then be realized by solving the system described in Corollary 1.
In [15,17], a similar approach, based on a rank calculation, is used to identify the size of an interleaver. In this article, an iterative process is proposed to increase the probability of estimating the correct interleaver size. The principle of this iterative process is to permute the rows of the R_l matrix to obtain a new virtual realization of the received sequence. These permutations increase the probability of obtaining non-erroneous pivots during the Gaussian elimination process (23). Our earlier identification of a convolutional encoder relied on a similar approach [8]. Indeed, at the output of our algorithm, either (i) the true encoder, or an optimal encoder, is identified, or (ii) no optimal code is identified. In case (ii), the probability of detecting an optimal convolutional encoder is increased by a new iteration of the algorithm.
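The iterative principle can be sketched generically: each iteration re-draws a row permutation of R_l, which changes which (possibly erroneous) bits become pivots in the elimination. The `detect` callback below stands in for the full identification chain of Algorithms 1 and 2; both names are ours, not from the article:

```python
import random

def iterate_identification(rows, detect, max_iter=10, seed=0):
    """Run `detect` on randomly row-permuted copies of R_l until it returns
    a candidate encoder, or give up after max_iter iterations.

    `detect` maps a row list to a candidate (anything non-None) or None."""
    rng = random.Random(seed)
    for _ in range(max_iter):
        shuffled = rows[:]               # keep the original matrix intact
        rng.shuffle(shuffled)            # new virtual realization of the sequence
        candidate = detect(shuffled)
        if candidate is not None:
            return candidate             # case (i): an optimal encoder was found
    return None                          # case (ii): no optimal code identified
```

Each extra iteration only re-runs the elimination on permuted rows, so the per-iteration cost is unchanged while the detection probability grows.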
The average complexity of one iteration of the process dedicated to the blind identification of a convolutional encoder is O(l_max^4). Indeed, our blind identification method is divided into three steps: (i) identification of n, (ii) identification of a dual code basis, and (iii) identification of the parity checks and a generator matrix. Each step consists of at most (l_max − 1) Gaussian elimination processes on R_l matrices of size (L × l), where L = 2·l_max. Thereby, the average complexity of the iterative process is O(nb_iter · l_max^4), where nb_iter is the number of iterations performed. To identify all the parameters of an encoder, it is necessary to observe two consecutive rank-deficient matrices, so the minimum value of l_max is n_a + n. Furthermore, the parameters of the convolutional encoders used in practice typically take quite small values. A minimum value of l_max is given in Table 1 for the three optimal encoders used in the following section, which is dedicated to the analysis and performance study of our blind identification method.

Algorithm 2: Estimation of the (n̂ − k̂) parity checks. Input: y, n̂, k̂, and μ̂⊥. Output: the parity checks ĥ_s, ∀s = 1, ..., n̂ − k̂.
Let R_l be a matrix built from 20,000 received bits, with l = 2, ..., 100 and L = 200. It is important to take this amount of data into account to show that our algorithm is well suited to implementation in a realistic context. 20,000 bits is quite low compared with current standards. For example, in the case of mobile communications delivered by UMTS at a data rate of up to 2 Mbps, only 10 ms are needed to receive 20,000 bits. Furthermore, the rates reached by future standards will be higher.
For each simulation, 1000 Monte Carlo trials were run, and focus was on • the impact of the number of iterations upon the probability of detection; • the global performance in terms of probability of detection.
In this article, detection means the complete identification of the encoder (parameters and generator matrix).

The detection gain produced by the iterative process
The number of iterations to be made is a compromise between the detection performance and the processing delay introduced in the reception chain (see [8]). To evaluate this number of iterations, let P_det(i) be the probability of detecting the true encoder at the ith iteration.
The probability of detecting the true encoder, P_det, is called the probability of detection.
• C(3, 2, 3) convolutional encoder: Figure 2 shows the probability of detecting the true encoder (P_det) compared with P_e for 1, 10, and 50 iterations. It shows that, for the C(3, 2, 3) convolutional encoder, 10 iterations of the algorithm yield the best performance: there is no advantage in performing 50 iterations rather than 10, whereas the gain between 1 and 10 iterations is huge.
• C(3, 1, 4) convolutional encoder: Figure 3 illustrates the evolution of P_det compared with P_e for 1, 10, and 50 iterations in the case of the C(3, 1, 4) convolutional encoder. It shows that the gain between the 1st and the 50th iterations is nearly nil.
For a rate k/n convolutional code where k ≠ n − 1, Algorithm 2 estimates the (n − k) parity checks (16), and the gain provided by our iterative process on this estimation is not significant. Consequently, for such codes (k ≠ n − 1), there is no need to run the iterative process. But, for a rate (n − 1)/n convolutional encoder, it is clear that the algorithm performance is enhanced by iterations. Moreover, it is important to note that the detection of a convolutional code depends on the parameters of the code, the channel error probability, and the correction capability of the code. Thus, the number of iterations needed to reach the best performance is code dependent. For such a code, it is also worth assessing the impact of the required amount of data. To do so, for the C(2, 1, 7) convolutional encoder, a comparison of the detection gain produced by the iterative process for several values of L is proposed.
• C(2, 1, 7) convolutional encoder: Figure 5 illustrates the evolution of P_det compared with P_e for L = 500. It shows that, for L = 200, 5 iterations are enough to identify the true encoder, whereas, for L = 500, the identification of the true encoder requires 40 iterations. For L = 200, after 5 iterations, P_det is close to 1 for P_e ≤ 0.02, whereas, for L = 500 and after 40 iterations, P_det is close to 1 for P_e ≤ 0.03. Clearly, the number of received bits is an important parameter of our method. Indeed, increasing the size of the matrices R_l increases the probability of obtaining non-erroneous pivots during the iterative process. Thus, it is possible to run more iterations of our algorithm to improve the detection performance. But, for implementation in a realistic context, the required amount of data has to be taken into account. In the last section, we show that the algorithm performance is very good when L = 200.

Probability of detection
To analyze the performance of the method, three probabilities were defined: 1. the probability of detection (P_det), i.e., the probability of identifying the true encoder; 2. the probability of false alarm (P_fa), i.e., the probability of identifying an optimal encoder but not the true one; 3. the probability of miss (P_m), i.e., the probability of identifying no optimal encoder.
In order to assess the relevance of our results through a comparison of these probabilities with the code correction capability, let us denote by BER_r the theoretical residual bit error rate obtained after hard-decision decoding of the corrupted data stream [12]. Here, to be acceptable, BER_r must be close to 10^-5.
Figures 6, 7, and 8 show the different probabilities compared with P_e after 10 iterations, together with the limit of the acceptable BER_r of 10^-5, for the C(3, 2, 3), C(3, 1, 4), and C(2, 1, 7) convolutional encoders, respectively. One should note that the probability of identifying the true encoder is close to 1 for any P_e yielding a post-decoding BER_r of less than 10^-5. Indeed, the algorithm performance is excellent: P_det is close to 1 when P_e corresponds to BER_r < 2 × 10^-4 for the C(3, 2, 3) convolutional encoder and to BER_r < 0.67 × 10^-4 for the C(3, 1, 4) encoder.

Conclusion
This article dealt with the development of a new algorithm dedicated to the reconstruction of a convolutional code from received noisy data streams. The iterative method is based on algebraic properties of both optimal convolutional encoders and their dual codes. This algorithm allows the identification of the parameters and generator matrix of a rate k/n convolutional encoder. Its performance was analyzed and proved to be very good: the probability of detecting the true encoder is close to 1 for any channel error probability that yields a post-decoding BER_r of less than 10^-5. Moreover, this algorithm requires only a small amount of the received bit stream.
In most digital communication systems, a simple technique, called puncturing, is used to increase the code rate. The blind identification of a punctured code is divided into two parts: (i) identification of the equivalent encoder and (ii) identification of the mother code and the puncturing pattern. Our method, dedicated to the blind identification of k/n convolutional encoders, also allows the blind identification of the equivalent encoder of a punctured code. Thus, our future study will be to identify the mother code and the puncturing pattern from the knowledge of this equivalent encoder only.

A The key-parameter n a
According to (20), the rank of the matrix R_{a·n} can be expressed as a function of a. Let us seek n_a, with n_a = a·n, which corresponds to the first matrix, R_{n_a}, with a rank deficiency. This corresponds to seeking the minimum value of a.
So, the minimum value of a, denoted a_min, is given by (35), and, according to (35), the key parameter n_a is given by (36).

B The rank deficiency of R_{n_a}

According to (36), the rank of R_{n_a} can be deduced. Therefore, the rank deficiency of R_{n_a}, denoted Z(n_a) = n_a − rank(R_{n_a}), follows. The modulo operator can be rewritten in an equivalent form, with Z(n_a) ∈ ℕ, from which the rank deficiency of the matrix R_{n_a} is obtained.

Corollary 1
Let H(D) be a parity check matrix of C. The output sequence c(D) is a codeword sequence of C if and only if:

c(D) · H(D)^T = 0.

Figure 1
Figure 1: Example of matrix R_l. An example of the received data bit stream reshaped under matrix form.

Algorithm 1:
Estimation of k and μ⊥. Input: values of n̂ and n̂_a. Output: values of k̂ and μ̂⊥.

Figure 5 C
Figure 5 C(2,1,7): Probability of detection compared with P_e for L = 500. For the C(2,1,7) encoder and L = 500, the probability of detecting the true encoder is depicted as a function of the channel error probability for 1, 10, 40, and 50 iterations.

Figure 6 C
Figure 6 C(3,2,3): Probability of detection, probability of false alarm, and probability of miss compared with P_e. For the C(3, 2, 3), the probability of detection, the probability of false alarm, and the probability of miss are depicted as functions of the channel error probability.

Figure 7 C and Figure 8 C
Figure 7 C(3,1,4): Probability of detection, probability of false alarm, and probability of miss compared with P_e. For the C(3, 1, 4), the probability of detection, the probability of false alarm, and the probability of miss are depicted as functions of the channel error probability. Figure 8 C(2,1,7) shows the same three probabilities for the C(2, 1, 7) encoder.

Table 1
Different values of l max (the minimum value of l max is given for three optimal encoders)