### The tough problem about error propagation in network coding

Because network coding combines information at the intermediate nodes, it is by nature very susceptible to transmission errors. A single error is propagated by every node further downstream in the network, which prevents the sink from reconstructing the file. There are fruitful works on network error correction coding (NEC). However, none of the existing works has solved the error spread problem in random network coding. If network coding theory is to be applied in real commercial systems, a new method is needed to cope with error spread in random network coding. A deep understanding of how errors spread in random network coding is necessary to tackle this difficult issue.

To understand error propagation in network coding, we first sketch the transmission model. Let *κ* ∈ (*F*_{Q})^{k × 1} be the original uncompressible message to be sent, where *k* < *C*, *C* is the max flow min cut, and *F*_{Q} is the finite field used at the source. At the source of a multicast network, *κ* is encoded with an MDS (maximum distance separable) linear block code Ω of code length *C* and information length *k*, denoted (*C*, *k*), with minimum distance *d*_{min}. The message *κ* is encoded into *X* ∈ (*F*_{Q})^{C × 1} as *X* = *G* ⋅ *κ*, where *G* ∈ (*F*_{Q})^{C × k} is the generator matrix of Ω. *X* is then sent through the network under the network coding scheme. The transport procedure can be expressed as *Y* = *T* ⋅ *X* + *T*_{Z → Y} ⋅ *Z* [1]. Here *Z* ∈ (*F*_{Q})^{t × 1} is the error vector that actually occurs in the network, and its length *t* is the number of links on which errors occur. *T* and *T*_{Z → Y} are the network transfer matrices of *X* and *Z*, respectively, and *Y* ∈ (*F*_{Q})^{C × 1} is the vector of messages received at the sink.

To decode *κ*, we first run the decoding algorithm of the network coding scheme, which can be expressed as *X* = *T*^{−1} ⋅ *Y* − (*T*^{−1} ⋅ *T*_{Z → Y}) ⋅ *Z*. Second, the decoding algorithm of Ω is applied to obtain *κ*. If no errors occur in the network, the vector (*T*^{−1} ⋅ *T*_{Z → Y}) ⋅ *Z* ∈ (*F*_{Q})^{C × 1} is the zero vector, so *X* = *T*^{−1} ⋅ *Y* = *G* ⋅ *κ*. Since *κ* is mapped to *X* by the codebook of Ω, *κ* can be decoded successfully with the decoding algorithm of Ω.

Now consider the case in which (*T*^{−1} ⋅ *T*_{Z → Y}) ⋅ *Z* cannot be neglected. Let *t*' denote the number of nonzero components of (*T*^{−1} ⋅ *T*_{Z → Y}) ⋅ *Z* ∈ (*F*_{Q})^{C × 1}. If *t*' < *d*_{min}/2, block coding theory guarantees that *κ* can still be decoded successfully. However, because of the action of *T*^{−1} ⋅ *T*_{Z → Y}, the vector (*T*^{−1} ⋅ *T*_{Z → Y}) ⋅ *Z* usually has more than *t* nonzero components. That is, *t*' > *t*, and it may happen that *t*' ≥ *d*_{min}/2. In that case, *κ* can no longer be decoded with the decoding algorithm of Ω. In the sense of network coding, the error *Z* ∈ (*F*_{Q})^{t × 1} is propagated into (*T*^{−1} ⋅ *T*_{Z → Y}) ⋅ *Z* ∈ (*F*_{Q})^{C × 1} with *t*' nonzero components. If *t*' > *t*, we say that the error is propagated. We call *Z*, the error that actually occurs, the “original error”, and we call (*T*^{−1} ⋅ *T*_{Z → Y}) ⋅ *Z*, the result after the network coding has acted on *Z*, the “propagated error”. Usually, the number of nonzero components of (*T*^{−1} ⋅ *T*_{Z → Y}) ⋅ *Z* is far greater than *t*. This is the well-known error propagation problem in network coding.

We can also interpret this problem from another perspective, which helps to clarify it. We call the model above the first model and the following one the second model. This paper adopts the first model; the second is used only to explain the propagation problem. In terms of a transformation of the generator matrix *G*, the error-free transport procedure can be written in the simplified form *Y* = *T* ⋅ *X* = *T* ⋅ *G* ⋅ *κ*. We can regard *T* ⋅ *G* as the generator matrix of a new code Ω'. Because of the effect of *T*, the minimum distance *d*_{min}' of Ω' is usually smaller than *d*_{min}, i.e., *d*_{min}' ≤ *d*_{min}. The error *T*_{Z → Y} ⋅ *Z* can be regarded as a disturbance to the new code Ω'. Let *t*'' be the number of nonzero components of the vector *T*_{Z → Y} ⋅ *Z*. If *t*'' < *d*_{min}'/2, *κ* can be decoded successfully with the decoding algorithm of Ω' (not Ω).

We therefore conclude that the combination operation at the intermediate nodes makes it harder to decode *κ*. The problem is more complicated than in point-to-point communication: to recover the original uncompressible message *κ*, we must first perform network decoding and then the decoding procedure of Ω. The decoding difficulty comes not only from the combination performed by network coding but also from the complexity of Ω. The network coding scheme in the network and the code Ω at the source must be well coordinated, and the two codes must be constructed carefully.
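The first model can be sketched numerically. The following minimal Python example assumes a prime field of size 257, a random invertible *T*, a random *T*_{Z → Y}, and a single original error; the field size, dimensions, seed, and helper names (`mat_mul`, `mat_inv`, `propagated_error`) are illustrative choices for this sketch and are not taken from the cited works.

```python
import random

P = 257  # assumed prime field size Q (illustrative choice)

def mat_mul(A, B, p=P):
    """Multiply matrices A (m x n) and B (n x r) over GF(p)."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)]
            for row in A]

def mat_inv(M, p=P):
    """Invert a square matrix over GF(p) by Gauss-Jordan elimination."""
    n = len(M)
    aug = [list(row) + [int(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] % p), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(p)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)  # modular inverse (Python 3.8+)
        aug[col] = [(v * inv) % p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * w) % p for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def random_matrix(rows, cols, rng, p=P):
    return [[rng.randrange(p) for _ in range(cols)] for _ in range(rows)]

def propagated_error(T, T_zy, Z, p=P):
    """Return (T^-1 * T_zy) * Z, the error left after network decoding."""
    return mat_mul(mat_mul(mat_inv(T, p), T_zy, p), Z, p)

def hamming_weight(column):
    """Number of nonzero components of a column vector."""
    return sum(1 for (v,) in column if v)

rng = random.Random(0)
C, t = 8, 1  # min cut and number of erroneous links (illustrative)

# Draw a random transfer matrix T until it is invertible over GF(P).
while True:
    T = random_matrix(C, C, rng)
    try:
        mat_inv(T)
        break
    except ValueError:
        continue

T_zy = random_matrix(C, t, rng)  # transfer matrix of the error
X = random_matrix(C, 1, rng)     # encoded source vector
Z = [[5]]                        # one nonzero original error

# Y = T*X + T_zy*Z, then network decoding gives X_hat = T^-1 * Y.
Y = [[(a[0] + b[0]) % P] for a, b in zip(mat_mul(T, X), mat_mul(T_zy, Z))]
X_hat = mat_mul(mat_inv(T), Y)

E = propagated_error(T, T_zy, Z)
t_prime = hamming_weight(E)
print("original errors t =", t, "propagated errors t' =", t_prime)
```

In this sketch, `X_hat` differs from `X` exactly by the propagated error `E`, and with a random *T*^{−1} ⋅ *T*_{Z → Y} the printed `t'` is typically close to `C` even though only one link is in error.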

To describe the model conveniently, we define two terms: the *original error Z* and the *propagated error* (*T*^{−1} ⋅ *T*_{Z → Y}) ⋅ Z. The *original error Z* has many causes. It includes random errors from physical causes, errors caused by attacks from malicious nodes, and errors caused by rank deficiency of the transfer matrix of the source messages. The term “original error” captures the fact that the error is injected into a link from the outside world. It is equivalent to the “symbol error” defined in [2], the “corrupted packets” defined in [3], and the “erroneous packets” in [4]. The *propagated error* ((*T*^{−1} ⋅ *T*_{Z → Y}) ⋅ Z) is the result of the link error being propagated by the network coding: under the influence of the error transfer matrix, the link error is enlarged into the propagated error. Both terms are needed in what follows.

From this brief description, it is clear that the key is to reduce the number of propagated errors, that is, to make *t*' as small as possible. In (*T*^{−1} ⋅ *T*_{Z → Y}) ⋅ Z, Z is given and cannot be modified, so *T*^{−1} ⋅ *T*_{Z → Y} is the only factor we can change to decrease *t*'. *T*^{−1} ⋅ *T*_{Z → Y} reflects how the network coding scheme is constructed, so the scheme must be constructed carefully to avoid spreading the error Z. For a single sink, *t*' < *t* may occur, but for a multicast network the minimum of *t*' over all sinks is no smaller than *t*. Otherwise, we could always reach the upper bound of the multicast capacity no matter how many original errors occur, which contradicts common sense; no such perfect network coding scheme can be constructed. Yang et al. [5] also confirm this point. Therefore, the best case in network coding is *t*' = *t*, and the scheme must be constructed carefully to reach it. Even so, this best case arises only in coherent networks when the Hamming distance is adopted. A coherent network is a network whose topology is known and stable; a non-coherent network is a network whose topology is unknown. In a coherent network, the topology is prior knowledge, so *T*^{−1} ⋅ *T*_{Z → Y} can be constructed to restrain the spread of Z. In a non-coherent network, however, *T*^{−1} ⋅ *T*_{Z → Y} is completely random and we cannot influence its construction. The simulation experiment in the following section shows that, even if *t* = 1, *t*' always lies between *C*/2 and *C* when the max flow min cut is smaller than 7. If the max flow min cut is bigger than 7, *t*' is always equal to *C*, even if *t* = 1. In the latter situation, the propagated error (*T*^{−1} ⋅ *T*_{Z → Y}) ⋅ Z pollutes every component of the received message vector *Y*. This is not surprising, because *T*^{−1} ⋅ *T*_{Z → Y} is completely random. Although some works increase the information rate as far as possible, the effectiveness of such improvements is limited [6, 7]. *T*^{−1} ⋅ *T*_{Z → Y} has the potential to pollute the received messages completely, so the propagated error in network coding is dense: a propagated error pollutes 100% of the received packets almost all the time. To apply network coding in a real application, an efficient NEC scheme is required. Unfortunately, no existing method solves this problem.
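As a rough check on why the propagated error is dense, consider a single original error (*t* = 1) and assume, purely for illustration, that the corresponding column of *T*^{−1} ⋅ *T*_{Z → Y} is uniformly distributed over (*F*_{Q})^{C × 1}, where *Q* denotes the field size. This uniformity is an assumption of the sketch, not a result from the cited works. Each component of the propagated error is then nonzero with probability 1 − 1/*Q*, so

$$ \mathrm{E}\left[{t}^{\prime}\right]=C\left(1-\frac{1}{Q}\right),\qquad \Pr \left[{t}^{\prime }=C\right]={\left(1-\frac{1}{Q}\right)}^C $$

For example, with *Q* = 256 and *C* = 8 this gives Pr[*t*' = *C*] ≈ 0.97, so under this assumption a single original error already corrupts every received component in most transmissions.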

Although there are fruitful works on error correction for network coding, most of them share a fatal drawback: they cannot correct dense propagated errors. As far as we know, all existing NEC works make an unrealistic assumption: the number of errors in network coding is bounded by a constant less than *C*. Moreover, when traditional block codes are used, they assume that the number of errors is smaller than *C*/2. They cannot correct more than (*d*_{min} − 1)/2 = (*C* − *k*)/2 < *C*/2 errors, where (*C*, *k*) is an MDS linear block code with code length *C* and information length *k*. Only a few studies consider the case in which the number of errors exceeds *C*/2: list decoding in network coding [8, 9] and homomorphic signatures [10]. Homomorphic signatures, which are based on cryptographic approaches, can correct a propagated error that equals *C*. However, cryptographic approaches have great complexity and are impractical for NEC [10]. Although nonlinear network coding seems a promising way to correct more errors, it cannot yet solve the problem completely and needs further study [11]. Apart from homomorphic signatures [10], no approach can cope with a propagated error that reaches *C*.

Next, we review previous research. Every work is judged from one perspective: how many errors can the method correct at most, and can it correct a propagated error that reaches *C*? Because of limited space, we outline only the most relevant and typical works on NEC. Sanna and Izquierdo [12] provide a survey of NEC.

### Existing error-correcting methods in network coding

Existing solutions can be categorized into cryptographic approaches and information-theoretic approaches. Generally speaking, cryptographic approaches achieve a high rate but have high complexity, whereas information-theoretic approaches have low complexity but cannot cope with dense errors beyond *C*/2.

Cryptographic approaches include methods such as keys, signatures, null space, and authentication [13]. Homomorphic signatures [10] are particularly worth mentioning. With homomorphic signatures, the intermediate nodes can combine and encode the incoming hash packets and forward them without knowing the content of the native packets or the private key of the source node. The hash packet is the parity part of the common packet, and this parity part is the so-called homomorphic signature. The great advantage of this scheme is that it can correct all the errors occurring on every link in network coding; that is, it addresses the problem that network coding is susceptible to errors. However, almost all existing homomorphic signature schemes have high complexity and intolerable delay [14], because every intermediate node must perform time-consuming cryptographic computation. Therefore, homomorphic signatures are impractical.

The existing information-theoretic approaches to NEC fall into two categories: those that need a secret channel and those that do not. The representative work of the first category is [15]. After both parity symbols and hash values are sent over the secret channel, the original messages can be recovered by solving equations. Parity symbols and hash values are used to counteract the uncertainty of the transfer matrix in network coding. However, both the field size and the packet length must be sufficiently large, and a secret channel is needed, so this is not a promising practical scheme. The second category is *redundancy NEC*, which introduces redundancy in the space domain. It is also known as the FEC (forward error correction) method and is the mainstream in NEC. These techniques can decode when the number of erasures and errors is within the minimum distance of the code. They are developed from traditional coding methods that add redundancy in the time domain.

For coherent networks, the typical relevant works are as follows. Based on the concept of the *transfer matrix in network coding* defined in [1], Zhang presents a ground-breaking framework for error correction in coherent networks [16]. Important concepts such as the *error vector* and the *error pattern* are introduced. The transfer of messages and errors in network coding can be expressed by the following equation.

$$ Y=\left(\kappa A+z\right){\left(I-F\right)}^{-1}B $$

(1)

Here, *A*, *B*, and *F* denote the adjacency matrices of the source node, the sink node, and the whole graph, respectively. *κ* is the source message injected into the network, *Y* is the message received by the sink, and *I* is the identity matrix. The error is modeled as a 1 × ∣*E*∣ error vector *z* ∈ (*F*_{Q})^{1 × ∣E∣}. The error pattern of *z* is the 1 × ∣*E*∣ vector *ρ*_{z} whose components are 1 in the positions where *z* is nonzero and 0 elsewhere. Note that Z ∈ (*F*_{Q})^{t × 1} (in Section 1.1) and *z* ∈ (*F*_{Q})^{1 × ∣E∣} both represent the original error. Some components of *z* ∈ (*F*_{Q})^{1 × ∣E∣} may be zero, whereas no component of Z ∈ (*F*_{Q})^{t × 1} is zero; the two are simply defined with different dimensions to suit different algebraic formulations. rank(*ρ*_{z}) is the number of nonzero components of *z*. The work in [16] also generalizes the traditional coding concepts of minimum distance and MDS (maximum distance separable) codes to NEC. It shows that if the coding field is large enough, there exists a linear network MDS code with minimum distance *d*_{min} = *C* − *k* + 1, where *k* is the dimension of the information transmitted by the source. The *d*_{min} of the generalized NEC behaves as in traditional codes: the code can correct an error *z* with rank(*ρ*_{z}) ≤ (*d*_{min} − 1)/2, and it can correct an erasure *z* with rank(*ρ*_{z}) ≤ *d*_{min} − 1. An erasure is an error whose links, i.e., whose error pattern, are known to the sink. Clearly, the MDS codes in [16] cannot correct dense errors beyond *C*/2, because *d*_{min} = *C* − *k* + 1.
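The matrix model of Eq. (1) can be illustrated on a toy network. The sketch below assumes that the middle factor acts as the path-summing matrix (*I* − *F*)^{−1} = *I* + *F* + *F*^{2} + ⋯ of an acyclic network, which is the standard algebraic form and is an assumption of this example. The five-edge topology, the field size 7, all coding coefficients equal to 1, and the helper names are hypothetical and chosen only for illustration.

```python
P = 7  # assumed small prime field size Q (illustrative)

def mat_mul(A, B, p=P):
    """Multiply two matrices over GF(p)."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)]
            for row in A]

def mat_add(A, B, p=P):
    """Add two matrices of equal shape over GF(p)."""
    return [[(a + b) % p for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def path_sum(F, p=P):
    """I + F + F^2 + ... for a nilpotent F (acyclic network), i.e. (I - F)^-1."""
    n = len(F)
    total, power = identity(n), identity(n)
    for _ in range(n):  # F^n = 0 for an acyclic network with n edges
        power = mat_mul(power, F, p)
        total = mat_add(total, power, p)
    return total

# Hypothetical network with edges
#   e0: s->a, e1: s->b, e2: a->b, e3: a->t, e4: b->t
E_COUNT = 5
F = [[0] * E_COUNT for _ in range(E_COUNT)]
F[0][2] = F[0][3] = 1  # node a forwards e0 onto e2 and e3
F[1][4] = F[2][4] = 1  # node b sends e1 + e2 onto e4
A = [[1, 0, 0, 0, 0],  # source symbol 1 enters on e0
     [0, 1, 0, 0, 0]]  # source symbol 2 enters on e1
B = [[0, 0], [0, 0], [0, 0], [1, 0], [0, 1]]  # sink reads e3 and e4

def receive(kappa, z):
    """Y = (kappa*A + z) * (I - F)^-1 * B over GF(P)."""
    return mat_mul(mat_mul(mat_add(mat_mul(kappa, A), z), path_sum(F)), B)

def error_pattern(z):
    """rho_z: 1 where z is nonzero, 0 elsewhere."""
    return [[int(v != 0) for v in z[0]]]

kappa = [[3, 2]]
clean = receive(kappa, [[0] * E_COUNT])
z = [[2, 0, 0, 0, 0]]          # a single error injected on edge e0
dirty = receive(kappa, z)
rank_rho = sum(error_pattern(z)[0])
polluted = sum(1 for c, d in zip(clean[0], dirty[0]) if c != d)
print(clean, dirty, rank_rho, polluted)
```

In this toy case the error pattern has a single nonzero entry, yet both received symbols change, because the erroneous edge feeds both incoming edges of the sink.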

For the construction of MDS codes for NEC in coherent networks, [16] does not give an efficient algorithm; it only establishes the existence of MDS codes in coherent networks. Its brute-force decoding algorithm checks all error patterns *ρ* in non-decreasing order of cardinality up to rank(*ρ*) ≤ (*d*_{min} − 1)/2 and then solves equations. Subsequently, [17,18,19,20] each give a construction algorithm for MDS codes in NEC. Yang et al. [17] build on the model of [16] and consider only the case rank(*ρ*) ≤ *d*_{min} − 1 = *C* − *k*, where rank(*ρ*) is the number of nonzero components of *ρ*. Xuan et al. and Matsumoto [18, 19] both modify the Jaggi-Sanders algorithm [21] to obtain an efficient construction algorithm. When finding the global coding kernel for the link being processed, they select a vector from a set of candidate vectors; through an exhaustive search, the candidates are guaranteed to be valid by avoiding all error patterns *ρ* with rank(*ρ*) ≤ *d*_{min} − 1 = *C* − *k*. Bahramgiri and Lahouti [20] is very similar to [18]. As far as we know, there are no other important construction schemes for coherent networks with the Hamming metric besides [17,18,19,20]. Notably, these error-correcting codes for coherent networks are based on the Hamming distance and handle symbol errors rather than dimensional errors [2]. However, none of these schemes can correct dense errors beyond *C*/2, because rank(*ρ*) ≤ *C* − *k*.

For non-coherent networks, there are essentially two kinds of methods: subspace codes based on the subspace distance and rank codes based on the rank distance. Silva and Kschischang present the seminal idea of subspace codes. They observe that, in a completely random network channel resulting from random coefficient operations at the intermediate nodes, the only stable quantity is the vector space spanned by the source messages. The subspace distance captures this fact. Theorem 2 in [4] shows that a subspace code can correct up to an *error dimension* of \( \left\lfloor \frac{D\left(\Omega \right)-1}{4}\right\rfloor \), where *D*(Ω) is the minimum subspace distance of the subspace code Ω. Therefore, a subspace code Ω can correct at most \( \left\lfloor \frac{D\left(\Omega \right)-1}{4}\right\rfloor \) corrupted packets (that is, original corrupted packets). The reason is that, in the worst case, *t* original corrupted packets can cause 2*t* *error dimensions*. Hence the subspace code cannot correct more than \( \left\lfloor \frac{D\left(\Omega \right)-1}{4}\right\rfloor \) errors.

Another approach is the rank code [22]. It also can correct no more than *d*_{min}/2 corrupted packets, where *d*_{min} is the minimum rank distance of the rank code. The maximum value of *d*_{min} is *C*.

Recognizing the shortcomings of the subspace distance in Silva and Kschischang’s method, [2] combines the correction of symbol errors and dimension losses. Symbol errors are measured in the Hamming metric, and dimension losses are measured in the subspace metric. However, the effectiveness of this improvement is limited. Skachek et al. [2] point out that most NEC approaches have limited error-correcting ability, and they state directly, rather than evading the question, that the errors in network coding are very dense. Mahdavifar and Vardy [9] generalize the Silva and Kschischang code to list decoding. They show that, for any *L*, the list-*L* decoder can correct at most \( L-\frac{L^2\left(L+1\right)}{2}R \) errors, where *R* is the normalized rate of the code. However, the solution is not unique and additional redundancy must be sent, so this approach is also inefficient. Building on [9], [23] improves the list model and achieves a much larger decoding radius \( \frac{C}{k}-1 \), where *C* is the max flow min cut and *k* is the size of the information. Guruswami and Sudan [24] can correct \( C\cdot \left(1-\sqrt{1-\frac{{\mathrm{d}}_{\mathrm{min}}}{C}}\right) \) errors when the Hamming distance is adopted instead of the subspace distance.
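To compare the decoding radii quoted above against the code length *C*, the following small Python sketch evaluates the unique-decoding bound (*d*_{min} − 1)/2 and the list-decoding radius attributed to [24] for an MDS code with *d*_{min} = *C* − *k* + 1. The function names and the sample parameters *C* = 16, *k* = 4 are illustrative assumptions.

```python
import math

def mds_d_min(C, k):
    """Minimum distance of a (C, k) MDS code: C - k + 1."""
    return C - k + 1

def unique_radius(d_min):
    """Errors guaranteed correctable by unique decoding: floor((d_min - 1) / 2)."""
    return (d_min - 1) // 2

def list_radius_hamming(C, d_min):
    """List-decoding radius C * (1 - sqrt(1 - d_min / C)) quoted in the text for [24]."""
    return C * (1 - math.sqrt(1 - d_min / C))

C, k = 16, 4  # illustrative parameters
d = mds_d_min(C, k)
print("d_min =", d)
print("unique decoding radius =", unique_radius(d))
print("list decoding radius   = %.2f" % list_radius_hamming(C, d))
print("propagated error when dense =", C)
```

For these parameters both radii stay well below *C*, which is the weight a dense propagated error can reach.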

In conclusion, most approaches can correct only a few errors. However, it is unrealistic to assume that *t*, the number of original errors, is less than *C*/2 (as in [16]) or \( \left\lfloor \frac{D\left(\Omega \right)-1}{4}\right\rfloor \) (as in [4]). The number of original errors usually exceeds *C*/2. Under a fixed BER (bit error rate), the more links there are, the more corrupted packets there are. The situation is worse when no link-layer error correction is performed, as in sensor networks whose computational power is limited. Random errors are indeed common in practice. For NEC developed from traditional codes, the error-correcting ability cannot exceed *d*_{min}/2. Even list decoding is affected by this limit: although it can correct more than *d*_{min}/2 errors, it does so at the cost of a non-unique solution, and additional information is needed to make the solution unique. The essential reason is that decoding is performed within the framework of traditional block codes.

Let us look at this problem from a different perspective: how the original error is transformed into the propagated error under different distance metrics. Martínez-Peñas [25] compares the Hamming metric and the rank metric in network coding. *X* is transmitted with the network coding scheme, and the transport procedure can be expressed as

$$ Y=T\cdot X+{T}_{Z\to Y}\cdot \mathrm{Z} $$

(2)

The propagated error is (*T*^{−1} ⋅ *T*_{Z → Y}) ⋅ Z ∈ (*F*_{Q})^{C × 1}. If the metric is the Hamming distance and the Hamming weight of *Z* is *t*, the Hamming weight of (*T*^{−1} ⋅ *T*_{Z → Y}) ⋅ Z cannot be restricted to a small range below *C*. Theoretically, it may equal the dimension of *X* and *Y*, which are both *C* × 1 vectors. This means that *Z* pollutes all the received messages and that the errors are spread over the network. When the max flow min cut is bigger than 7, the coding field must be bigger than 7. If the size of the coding field is bigger than 7, the Hamming weight of (*T*^{−1} ⋅ *T*_{Z → Y}) ⋅ Z equals *C* with very high probability, even if *Z* has only one nonzero component. In other words, a single error may pollute all the received messages almost all the time if the Hamming distance metric is adopted in network coding. The propagated error is much greater than previously thought. For coherent networks, the constructions in [17,18,19,20] ensure, through an exhaustive search over all error patterns, that the Hamming weight of (*T*^{−1} ⋅ *T*_{Z → Y}) ⋅ Z equals that of *Z*. That is, the construction of MDS codes in coherent networks prevents the error *Z* from spreading and puts it into a “cage” [17,18,19,20]. This requires that the characteristics of the network be known a priori. The Hamming distance is therefore not suited to non-coherent NEC.

For subspace distance metric, Theorem 1 shows that

$$ {d}_s\left(\left\langle TX\right\rangle, \left\langle Y\right\rangle \right)\le 2\,\mathrm{rank}\left({T}_{Z\to Y}Z\right)\le 2\,\mathrm{rank}\,Z\le 2t $$

(3)

Here, rank *Z* is the rank of *Z*. This means that, in the subspace metric, *t* corrupted errors can change at most 2*t* dimensions even when the network is completely unknown. For the rank distance metric, *t* corrupted errors can change the rank of the received messages by at most *t*, because the rank of (*T*^{−1} ⋅ *T*_{Z → Y}) ⋅ Z is no more than that of *Z* even when *T*^{−1} ⋅ *T*_{Z → Y} is completely unknown. Therefore, from the perspective of the rank metric the error *Z* does not spread, whereas under the Hamming metric it does. However, although adopting the subspace metric or the rank metric avoids the spread of original errors in non-coherent networks, the number of original errors that can be corrected is very small. For coherent networks, the number of original errors that can be corrected is also very small. Constructing an MDS code based on the Hamming metric is also difficult, because there are too many combinations of error patterns for an exhaustive search [17,18,19,20]. As the network grows, this method fails completely.
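The contrast between the rank metric and the Hamming metric can be seen in a small example. The sketch below assumes that each symbol is a packet of *n* field elements, so that the original error becomes a *t* × *n* matrix and the propagated error a *C* × *n* matrix; this packet view, the field size 5, and the specific matrices are assumptions of the sketch. It counts corrupted packets (nonzero rows) and computes the rank over the field.

```python
P = 5  # assumed prime field size Q (illustrative)

def mat_mul(A, B, p=P):
    """Multiply two matrices over GF(p)."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)]
            for row in A]

def rank_mod_p(M, p=P):
    """Rank of a matrix over GF(p) by Gaussian elimination."""
    rows = [list(r) for r in M]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col] % p:
                f = rows[r][col]
                rows[r] = [(v - f * w) % p for v, w in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def corrupted_rows(E):
    """Number of received packets touched by the error (nonzero rows)."""
    return sum(1 for row in E if any(row))

# Stand-in for T^-1 * T_zy with C = 4 and t = 1: every entry is nonzero.
M = [[1], [2], [3], [4]]
# One original error packet of n = 3 symbols.
Z = [[1, 0, 2]]

E = mat_mul(M, Z)  # propagated error, C x n
print("corrupted packets:", corrupted_rows(E))
print("rank of propagated error:", rank_mod_p(E), "rank of Z:", rank_mod_p(Z))
```

Here every one of the four received packets is corrupted, yet the propagated error still has rank 1, the same as the original error.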

A possible idea for NEC is to find a method that corrects as many errors as possible, rather than fewer than *d*_{min}/2 errors. Avoiding the need to “correct more errors” is unrealistic. We need a new error-correcting framework that differs from block code theory.

### Correcting propagated errors in network coding via L1 minimization

Recently, there has been a trend of applying machine learning theory to communications [26]. Combining compressed sensing with network coding has been studied intensively; [27, 28] are representative. However, these works have drawbacks. They exploit the correlation among the information at different sources; if the messages at different sources are not correlated, compressed sensing is hard to apply. A common assumption in such studies is that the coding vectors of network coding satisfy the restricted isometry property (RIP) of compressed sensing. This assumption has no rigorous mathematical foundation and is not always met in experiments [29]. Moreover, the main purpose of these works is to decrease the sampling frequency rather than to communicate.

There are also works that apply convex optimization to error correction. The work in [31] is the first important one of this kind. It proposes a decoding algorithm that is completely different from the decoding methods of block codes. However, according to the analysis of its experimental data, its error-correcting ability is nearly equal to that of traditional codes, so it cannot be used to correct dense errors such as the propagated error in network coding. We noticed an interesting work, Wright and Ma [30], whose result is surprising and attractive. It is developed from [31] and suggests that accurate and efficient recovery of sparse signals is possible even when nearly 100% of the observations are corrupted. In traditional codes, the error fraction that can be corrected is at most 50%: from (*d*_{min}/2)/*n* ≈ ((*n* − *k*)/2)/*n* = (1 − *k*/*n*)/2, where *n* is the code length and *k* is the information length, we see that the correctable fraction is at most 1/2 even when *k*/*n* is 0. The ability of [30] to correct dense errors is inherently suited to the difficult task of correcting the dense “propagated error” in network coding. Ganesh et al. and Qiu and Vaswani [32, 33] are other works whose performance may be close to that of [30]. Apart from [32, 33], as far as we know, no other work in sparse recovery has studied other kinds of large noise. In this paper, however, we consider only applying [30] to network coding.
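To make the 50% limit concrete, take an MDS code of length *n* and information length *k*, so that *d*_{min} = *n* − *k* + 1 as in the MDS codes discussed earlier. Unique decoding then corrects at most ⌊(*n* − *k*)/2⌋ errors, and the correctable fraction satisfies

$$ \frac{\left\lfloor \left(n-k\right)/2\right\rfloor }{n}\le \frac{1}{2}\left(1-\frac{k}{n}\right)<\frac{1}{2} $$

As an illustrative instance chosen here (not taken from the cited works), *n* = 255 and *k* = 223 give 16 correctable symbol errors, a fraction of 16/255 ≈ 6.3%, far below the nearly 100% density of the propagated error.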

The information itself does not need to have any a priori characteristics. Redundant zeros are added to produce a sparse signal, which is then sent through a channel with dense errors. Finally, the received signal polluted by the dense error can be corrected accurately via L1 minimization. A network that adopts network coding can be seen as such a channel with dense errors, so it is natural to apply [30] to network coding.
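The procedure described above can be written as an optimization problem. The following is a sketch in the real-valued setting, assuming the received vector is *y* = *Ax* + *e* with a known matrix *A*, a sparse signal *x*, and a dense error *e*. The symbols *A*, *x*, *e*, and *y* are notation introduced here for illustration, and the program shown is the commonly used extended ℓ1 formulation, which may differ in detail from the exact program analyzed in [30]:

$$ \left(\hat{x},\hat{e}\right)=\arg \underset{x,e}{\min}\;{\left\Vert x\right\Vert}_1+{\left\Vert e\right\Vert}_1\quad \mathrm{subject}\ \mathrm{to}\quad y= Ax+e $$

Because the objective is a sum of ℓ1 norms and the constraint is linear, this problem can be recast as a linear program and solved with standard solvers. How it carries over to symbols from a finite field is not addressed by this sketch.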

Although nearly 100% of errors can be corrected in theory, in experiments [30] succeeds for error fractions from 0 to 0.95. When the fraction of errors is high, the information rate is low; a high information rate is still obtained when the corrupted fraction is 60%. Theoretically, [30] cannot correct an error fraction of exactly (rather than nearly) 100%, since doing so would contradict information theory. However, as mentioned above, when the size of the coding field is bigger than 7, the corrupted fraction is nearly 100%. Therefore, we should apply some a priori intervention to the network coding to decrease its corrupted fraction.

In keeping with the sparsity exploited by the L1 minimization in [30], we can make the transfer matrix sparse in random network coding. This makes the “propagated errors” fewer than the “original errors”, even when the fraction of the latter is 1 (each link in the network has an original error). After this operation, the error-correcting scheme of the network coding succeeds regardless of how many corrupted errors there are in the network.

However, this method is effective only when the max flow min cut is smaller than 7. In real-world communication, networks whose min cut is bigger than 7 are common. Therefore, we must improve on the L1 minimization in [30].

### The potential viable solutions

Two facts cannot be avoided: the propagated error pollutes 100% of the observations in random network coding, and the max flow min cut of most networks is bigger than 7. Applying network coding in the real world requires a new way of dealing with propagated errors in random network coding. Because the coherent network is a model with unrealistic assumptions about real processes, we do not consider error correction for coherent networks for the time being and consider only error correction for random network coding. This problem is pressing and urgent.

Based on the above description, the existing methods for error correction in network coding fall into six categories. The first is cryptographic approaches, represented by homomorphic signatures [10]. The second constructs an MDS (maximum distance separable) NEC (network error correction) code based on the Hamming distance. The third constructs an MDS NEC code based on the rank distance [3] or the subspace distance [4]. The fourth is list decoding, represented by [9]: Mahdavifar and Vardy [9] generalize the Silva and Kschischang code to list decoding. There are also list decoding methods based on the rank distance [8] and the Hamming distance [34, 35]. Unlike [9], which designs list decoding specifically for error correction in random network coding, [8, 23, 24, 34, 35] are not designed for that purpose. However, because they can correct errors beyond *d*_{min}/2, they may have the potential to handle the propagated errors in random network coding. The fifth is the approach based on a secret channel, represented by [15]. The sixth is John Wright’s method, introduced briefly in Section 1.3.

None of the above methods has solved the propagated error problem in network coding. Nevertheless, we want to select some of them as potential approaches to this difficult problem; even if they do not ultimately succeed, they can at least offer some insight. Because of their obvious lack of utility, we set aside the first three approaches: the first has high complexity, and the second and the third cannot correct errors beyond *d*_{min}/2. Although the fourth approach can correct more than *d*_{min}/2 errors, the normalized fraction of errors corrected by list decoding is still far less than 100%. The fifth approach seems to have the potential to solve the propagation problem in random network coding. The rate of the approach with a shared secret channel in [15] is *C* − *t* − *k*^{2}/*j*. The source encodes *k* uncompressible packets into one batch with *C* − *t* redundancy and then sends the batch into the network. A packet contains a sequence of *j* symbols from the finite field *F*_{Q}. If *j* is big enough, *k*^{2}/*j* is asymptotically negligible, so the rate of the approach in [15] is asymptotically *C* − *t*. If *t* is large, say *t* = *C* − 1, the error pollutes nearly all the observations. Even then, this scheme can correct the error by adding *t* redundancy packets and sending both parity symbols and hash values over the secret channel. The size of the parity symbols and hash values is *k*^{2}/*j*, which can be neglected. However, the redundancy (*t* redundancy packets) is too large, which ultimately results in a low information rate. The merit of this scheme is that it can correct errors beyond *d*_{min}/2, which traditional block codes cannot, although the redundancy cost is high. Nevertheless, the fifth method cannot correct a propagated error with *t* = *C*, although it offers some inspiration for error correction when *t* = *C*.

Clearly, none of the six methods can, on its own, correct the dense errors in random network coding. This suggests combining several methods to solve the problem. John Wright’s method [30] appears to have the greatest potential to solve the error spread problem in random network coding, but only if it is suitably improved.

### Combination of L1 minimization and list decoding

Although [30] can correct dense errors that corrupt nearly 100% of the observations, it cannot correct observations that are 100% corrupted. In terms of Hamming distance, the ratio of propagated errors in random network coding is 100%, and no method can correct errors that dense. Our idea is to adopt list decoding of subspace codes for error correction, because in terms of subspace distance the ratio of propagated errors in random network coding is below 100%. To overcome the drawback that list decoding does not yield a unique solution, we need an “inner code” that selects a unique solution from the multiple solutions of list decoding. A traditional block code is not a suitable “inner code”, because it cannot correct more than *d*_{min}/2 errors, and the errors seen by the “inner code” usually exceed *d*_{min}/2. We therefore adopt the L1 minimization method of John Wright as the “inner code”, which can correct more than *d*_{min}/2 errors. The list-decodable code can be a code such as that in [24], which can correct up to \( C\cdot \left(1-\sqrt{1-\frac{d_{\mathrm{min}}}{C}}\right) \) errors. Although this radius approaches *C* as *d*_{min} approaches *C*, the number of propagated errors is always equal to *C*, so [24] cannot correct 100% propagated errors either. Thus, we should adopt list-decodable codes based on subspace or rank distance. In terms of subspace distance, the propagated error usually does not pollute all the received messages, so a list-decodable subspace code can correct propagated errors in random network coding, although the solution is not unique. To make the solution unique, we combine the list-decodable subspace code with L1 optimization: after list decoding of the subspace code is performed, L1 optimization selects the unique solution.
A solution returned by list decoding may differ from the true solution, and the difference may exceed *C*/2 errors. With the L1 minimization method of John Wright as the “inner code”, such differences can be corrected even when they exceed *C*/2 errors, which completes the decoding procedure. In this way, the methods complement one another.
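To make the two Hamming-distance radii mentioned above concrete, the following minimal sketch evaluates the unique-decoding bound *d*_{min}/2 and the list-decoding radius \( C\cdot \left(1-\sqrt{1-d_{\mathrm{min}}/C}\right) \) attributed to [24]. The function names and the sample values of *C* and *d*_{min} are assumptions made only for illustration.

```python
import math


def unique_decoding_radius(d_min: int) -> float:
    """Bound d_min / 2 used in the text for traditional block codes."""
    return d_min / 2


def list_decoding_radius(C: int, d_min: int) -> float:
    """Radius C * (1 - sqrt(1 - d_min / C)) quoted in the text for [24]."""
    if C <= 0 or not 0 <= d_min <= C:
        raise ValueError("require C > 0 and 0 <= d_min <= C")
    return C * (1 - math.sqrt(1 - d_min / C))


if __name__ == "__main__":
    C = 100  # hypothetical code length
    for d_min in (25, 50, 75, 99, 100):
        u = unique_decoding_radius(d_min)
        r = list_decoding_radius(C, d_min)
        print(f"d_min={d_min:3d}  unique={u:5.1f}  list={r:6.2f}  C={C}")
```

For *C* = 100 and *d*_{min} = 50 the list-decoding radius is about 29.3, compared with 25 for unique decoding. The radius stays below *C* whenever *d*_{min} < *C*, which is consistent with the observation that *C* propagated errors are out of reach under Hamming distance.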

The remainder of this paper is organized as follows. Section 2 presents a brief review of [30] and gives some basic definitions. Section 3 formally presents our scheme. Section 4 reports the experiments. Finally, Section 5 presents our conclusions.