Efficient weakly secure network coding scheme against node conspiracy attack based on network segmentation

In this paper, we consider the problem of building a network that is secure against node conspiracy attacks by means of network segmentation. Network coding has demonstrated great application prospects in wireless sensor network (WSN) transmission, but it also faces a variety of security threats, especially conspiracy attacks. Existing research has focused far more on secure coding design strategies than on secure topological structures. Against this background, we propose a weakly secure scheme from the perspective of topology design and network segmentation, under which network coding transmission is weakly secure. Simulations show that the proposed scheme can efficiently prevent conspiracy attacks.


Introduction
In 2003, Li [1] demonstrated that, over a sufficiently large finite field, the maximum flow from a single source to the sinks can be achieved by linear network coding [2,3]. Building on this result, network coding has demonstrated great application prospects in both wired and wireless networks. As network coding is applied at large scale, it faces a growing number of security issues.
There are many studies on the security of network coding. By security model, previous work can be divided into two groups: Shannon security and weak security. Shannon security disallows any information leakage, whereas weak security disallows any meaningful information leakage. For example, given two data streams x and y, weak security allows an attacker to obtain the combination x ⊕ y but not x or y alone, while Shannon security allows the attacker to learn nothing about x and y, not even x ⊕ y. In this paper, we focus on weakly secure topological structures.
By attack model, there are two main classes: pollution attacks (active attacks) [4][5][6][7][8] and wiretapping attacks (passive attacks) [9][10][11][12][13][14][15][16][17][18]. In this paper, we focus on wiretapping attacks, which were defined by Cai and Yeung in [9], who also proposed multicast network coding against wiretapping attacks in [10,11]. Feldman et al. [12] proposed a coding scheme over a smaller finite field at the expense of a small amount of bandwidth. Chan [13] gave bounds on the multicast capacity of secure network coding. Bhattad and Narayanan [14] proposed a weakly secure network coding system, and on the basis of [14], Silva [15] proposed a general weakly secure network coding system. When the computational ability of eavesdroppers is limited, Jain [16] designed a weakly secure network coding system using one-way functions. In [17,18], the authors discussed security issues in wireless sensor networks (WSNs) under different conditions and security requirements. Fancsali et al. [19][20][21] carried out related research and gave the corresponding secure coding systems.
Most existing research takes a coding perspective with a given topology and proposes coding algorithms for different network environments; there is almost no research on secure topology design. Topology design strategies do not need complicated encoding and decoding processes, which saves a lot of memory and computing time. Topology design also has a certain failure rate, but when cooperative eavesdroppers are relatively few and a certain error rate is acceptable, the topology design scheme shows its advantage.
For these reasons, secure topology design is worth studying. In this paper, we propose a weakly secure network topology algorithm based on topology design and network segmentation. The rest of the paper is organized as follows. Section 2 discusses related work and the security goals we aim to achieve. Section 3 states the problem and presents an algorithm for secure network topology design and network segmentation. Section 4 presents the results and discussion, and Section 5 concludes the paper.

Problem statement
In this section, we first summarize network coding and then we introduce the system model. Finally, we introduce the threat model and security goals to be used in this paper.

Network coding
Unlike traditional store-and-forward routing, network coding allows intermediate nodes to encode multiple input messages together to form multiple output messages. The following example shows how network coding can increase the transmission rate. Figure 1 is the classic example of network coding, in which each link has unit capacity. Node W encodes its incoming messages into the linear combination b1 ⊕ b2 and transmits it from W to X. Thus, the two bit streams b1 and b2 sent by source S can be multicast to nodes Y and Z simultaneously, achieving a multicast rate of 2 bits per unit time.
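To make the decoding step concrete, the sketch below simulates one time slot of the butterfly example. Figure 1 is not reproduced here, so the link layout is an assumption based on the standard butterfly network: Y also receives b1 on a direct link, Z receives b2 on a direct link, and X copies W's coded bit to both sinks. The function name `butterfly_multicast` is hypothetical.

```python
def butterfly_multicast(b1: int, b2: int) -> dict:
    """Simulate one time slot of the butterfly network and return what each sink decodes."""
    # Node W receives b1 and b2 and forwards their XOR over its single unit-capacity link to X
    coded = b1 ^ b2
    # Assumed layout: X copies the coded bit to both sinks; Y gets b1 directly, Z gets b2 directly
    y_direct, z_direct = b1, b2
    # Each sink XORs the coded bit with its direct bit to recover the missing stream
    y_decoded = (y_direct, y_direct ^ coded)
    z_decoded = (z_direct ^ coded, z_direct)
    return {"Y": y_decoded, "Z": z_decoded}


# Both sinks obtain (b1, b2) for every input pair: 2 bits per unit time to each sink
for b1 in (0, 1):
    for b2 in (0, 1):
        assert butterfly_multicast(b1, b2) == {"Y": (b1, b2), "Z": (b1, b2)}
print(butterfly_multicast(1, 0))  # {'Y': (1, 0), 'Z': (1, 0)}
```

Without coding, the W to X link could carry only one of b1 or b2 per slot, so one of the sinks would fall short of rate 2.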

System model
In this paper, we consider a directed acyclic graph G = 〈V, E〉, where V and E are the node set and the edge set, respectively. $C_{\min}(G)$ denotes the minimum cut of G, and its capacity $C_{\min}(G)$ is the maximum possible information rate of network G. Each edge carries one unit of data stream per time slot. The source node s generates and sends out an n-symbol message vector $X = (x_1, x_2, \ldots, x_n)^T$ over a finite field $\mathbb{F}_q$. In linear network coding, the messages on the outgoing edges of a node v are linear combinations of the messages on its incoming edges, so each edge of the network carries one linear equation in the source symbols.
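The statement that each edge carries an equation of source symbols can be sketched with global coding vectors: every edge gets a vector in $\mathbb{F}_q^n$, and the symbol it carries is that vector's inner product with X. This is a minimal sketch under stated assumptions: q is prime (so integers mod q form $\mathbb{F}_q$), local coefficients are drawn at random and nonzero, and the source's k-th outgoing edge carries $x_{(k \bmod n) + 1}$. The function names and the intermediate nodes A and B in the usage example are hypothetical.

```python
import random
from collections import defaultdict


def global_coding_vectors(order, edges, source, n, q, seed=0):
    """Assign a global coding vector in F_q^n to every edge of a DAG.

    order  -- nodes listed in topological order
    edges  -- list of (tail, head) pairs; an edge is identified by its index
    source -- the source node s; its k-th outgoing edge carries x_(k mod n)+1
    q      -- assumed prime, so arithmetic mod q is field arithmetic
    """
    rng = random.Random(seed)
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        outgoing[u].append(idx)
        incoming[v].append(idx)
    vec = {}
    for node in order:
        for k, e in enumerate(outgoing[node]):
            if node == source:
                # unit vector: this source edge carries one raw symbol
                vec[e] = [1 if j == k % n else 0 for j in range(n)]
            else:
                # random nonzero local coefficients; the outgoing vector is a
                # linear combination of the incoming vectors over F_q
                out = [0] * n
                for f in incoming[node]:
                    c = rng.randrange(1, q)
                    out = [(a + c * b) % q for a, b in zip(out, vec[f])]
                vec[e] = out
    return vec


def edge_symbol(v, x, q):
    """Symbol carried on an edge: inner product of its coding vector with X over F_q."""
    return sum(a * b for a, b in zip(v, x)) % q


# Butterfly-shaped example with hypothetical intermediate nodes A and B
order = ["S", "A", "B", "W", "X", "Y", "Z"]
edges = [("S", "A"), ("S", "B"), ("A", "Y"), ("A", "W"), ("B", "W"),
         ("B", "Z"), ("W", "X"), ("X", "Y"), ("X", "Z")]
vecs = global_coding_vectors(order, edges, source="S", n=2, q=7)
X = [3, 5]  # source symbols x_1, x_2 in F_7
for i, (u, v) in enumerate(edges):
    print(f"{u}->{v}: vector {vecs[i]}, symbol {edge_symbol(vecs[i], X, 7)}")
```

Processing nodes in topological order guarantees that every incoming vector is known before a node's outgoing vectors are computed, which is why the model requires an acyclic graph.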

Threat model and security goals
In a wiretapping attack, the eavesdroppers can access the information transmitted through the nodes they control; we assume that the positions of the malicious nodes are known. The eavesdroppers can also cooperate with each other to decode the packets sent from the source S. Precisely, they wiretap a collection $M = \{M_1, M_2, \ldots, M_K\}$ of malicious nodes and thereby obtain the data streams carried by the incoming links of these nodes; let $E_i$ denote the incoming links of $M_i$ and $E = \{E_1, E_2, \ldots, E_K\}$. In this paper, we focus on weak security: no meaningful information transmitted from the source node to the sink node may leak to the eavesdroppers.

Related definitions
If the number of incoming links of an intermediate node $v_i$, $|In(v_i)|$, is less than the capacity $C_G(s, d)$ of the graph, then for a sufficiently large field size q, the generated network code is secure with high probability, because $v_i$ cannot recover any of the $k = C_G(s, d)$ symbols from k − 1 or fewer linear equations. On the other hand, if $C_G \le |In(v_i)|$, security is topology dependent, and the network is considered secure if and only if $\mathrm{rank}(In(v_i)) < C_G$.
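The rank condition above can be checked mechanically once the coding vectors of the wiretapped links are known. The sketch below computes ranks over $\mathbb{F}_q$ (q prime, an assumption) and, as one possible formalization of weak security assumed here, reports which individual symbols $x_i$ the colluding nodes can decode linearly: $x_i$ leaks exactly when the unit vector $e_i$ lies in the span of the observed vectors. The function names are hypothetical, and stricter definitions of weak security are not modeled.

```python
def rank_mod_q(rows, q):
    """Rank of a list of row vectors over F_q (q prime), via Gaussian elimination."""
    m = [[v % q for v in r] for r in rows]
    if not m:
        return 0
    rank = 0
    for col in range(len(m[0])):
        # find a row at or below `rank` with a nonzero entry in this column
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], q - 2, q)  # inverse by Fermat's little theorem (q prime)
        m[rank] = [(v * inv) % q for v in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col]
                m[i] = [(a - f * b) % q for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def leaked_symbols(observed, n, q):
    """Indices i for which x_i is linearly decodable from the observed coding vectors,
    i.e. the unit vector e_i already lies in their span over F_q."""
    base = rank_mod_q(observed, q)
    leaked = []
    for i in range(n):
        e_i = [1 if j == i else 0 for j in range(n)]
        if rank_mod_q(list(observed) + [e_i], q) == base:
            leaked.append(i)
    return leaked


def is_weakly_secure(observed, n, q):
    """Weak security in the sense assumed here: no individual symbol leaks."""
    return not leaked_symbols(observed, n, q)


# The x ⊕ y example from the introduction, over F_2 with X = (x, y)
print(is_weakly_secure([[1, 1]], n=2, q=2))        # True: only x ⊕ y is known
print(leaked_symbols([[1, 1], [0, 1]], n=2, q=2))  # [0, 1]: x and y both recoverable
print(rank_mod_q([[1, 1], [0, 1]], 2) < 2)         # False: the rank reaches C_G = 2
```

The second case illustrates the topology-dependent regime: once the colluding nodes collect $C_G$ independent equations, the rank condition fails and every symbol leaks.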
In [22], the sibling work of this paper, we analyzed how topology design influences network security and proposed a secure strategy against node conspiracy attacks based on topology design. That method is suitable for small networks; in a large network, the number of wiretapping nodes grows, which increases the number of links that must be removed. Therefore, we propose a network coding strategy against wiretapping attacks based on network segmentation. Figure 2 shows a directed acyclic graph in which each link has unit capacity. The source node s wants to transmit information to the destination node t without leaking meaningful information to the eavesdroppers. Suppose there are m malicious nodes, each with n incoming links, and $C_G(s, t) = k$. Then mn − k links must be removed to secure the network. If we divide the network into two sub-networks and the malicious nodes are uniformly distributed, only approximately (mn − k)/2 links need to be removed to secure the network.
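As a purely illustrative calculation, with values chosen here rather than taken from the simulations, take m = 10 malicious nodes, each with n = 3 incoming links, and a min-cut of k = 4. Comparing the estimate mn − k for the whole network with approximately (mn − k)/2 after splitting into two sub-networks (assuming the malicious nodes divide evenly between them) gives:

```latex
\begin{aligned}
\text{without segmentation:}\quad & mn - k = 10 \cdot 3 - 4 = 26 \text{ links},\\
\text{with two sub-networks:}\quad & \frac{mn - k}{2} = \frac{26}{2} = 13 \text{ links}.
\end{aligned}
```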

Secure network segmentation algorithm
In Figure 2, we randomly generate a 50-node network; after path enforcement, we obtain a directed graph G(V, E). The red line divides the entire network into two sub-networks $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$, and the dashed lines are the links to be removed. Here $C(G_1) = C(G_2) = 3$, and securing either one of the sub-networks secures the whole network.
Finding the best split routing is the main problem we face, and it leads to two objective functions.
Because the network is split, the sum of the min-cuts of the two sub-networks is no more than the min-cut of the whole network:
$$C(G_1) + C(G_2) \le C(G).$$
To preserve the throughput of the network, the maximum flow after the split should be as close as possible to the original maximum flow.
We find the split routing and remove the dashed links, denoted $E_{p1}$. Then we pick one sub-network and apply the algorithm of [22] to it; the links it removes are denoted $E_{p2}$, so the total set of removed links is
$$E_p = E_{p1} \cup E_{p2}.$$
The two objective functions are $\max\,[C(G_1) + C(G_2)]$ and $\min |E_p|$. We refer to this algorithm as the secure network segmentation (SNS) algorithm.
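The two objective functions can be evaluated for any candidate split. The sketch below, a minimal illustration rather than the SNS search itself, takes a DAG with unit-capacity links, assigns each intermediate node to sub-network 1 or 2 (s and t are shared), treats links crossing between the parts as $E_{p1}$, and computes $C(G_1) + C(G_2)$ with the Edmonds-Karp max-flow algorithm. It does not reproduce the algorithm of [22], so $E_{p2}$ is not computed; the names `max_flow` and `evaluate_split` and the toy graph are hypothetical.

```python
from collections import defaultdict, deque


def max_flow(edges, s, t):
    """Edmonds-Karp maximum flow; every (u, v) pair in `edges` is one unit-capacity link."""
    cap = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        cap[(u, v)] += 1
        adj[u].add(v)
        adj[v].add(u)  # allow traversal of residual (reverse) capacity
    flow = 0
    while True:
        # breadth-first search for a shortest augmenting path in the residual graph
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= push
            cap[(v, u)] += push
        flow += push


def evaluate_split(edges, s, t, part):
    """Return (C(G1) + C(G2), |E_p1|) for a split; part maps intermediate nodes to 1 or 2."""
    sub = {1: [], 2: []}
    removed = []
    for u, v in edges:
        sides = {part[x] for x in (u, v) if x not in (s, t)}
        if len(sides) == 2:
            removed.append((u, v))  # crosses the split routing: belongs to E_p1
        elif sides:
            sub[sides.pop()].append((u, v))
        else:
            sub[1].append((u, v))  # a direct s->t link is assigned to G1 arbitrarily
    total = max_flow(sub[1], s, t) + max_flow(sub[2], s, t)
    return total, len(removed)


edges = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "t"), ("a", "b")]
part = {"a": 1, "b": 2}
print(max_flow(edges, "s", "t"))              # 2, i.e. C(G)
print(evaluate_split(edges, "s", "t", part))  # (2, 1): C(G1) + C(G2) = 2, E_p1 = {a->b}
```

A search over candidate splits would keep the one that maximizes the first component while minimizing the second; the sum can never exceed `max_flow(edges, s, t)`, matching the inequality above.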

An improved scheme based on the network segmentation and topology design
In [22], we proposed a secure strategy against node conspiracy attacks based on topology design (STD). Here, we use Figure 3 to describe the method briefly.
We introduce the following classification of a node i:
Case 1. in(i) = 1 and out(i) = 1: the outgoing message is identical to the incoming message (Figure 3a).
Case 2. in(i) = 1 and out(i) > 1: the outgoing messages are linearly dependent on the single incoming message (Figure 3b).
Case 3. in(i) > 1 and out(i) = 1: the outgoing message is a combination of the incoming messages (Figure 3c).
Case 4. in(i) > 1 and out(i) > 1: the outgoing messages are linearly independent combinations of the incoming messages. We denote these linearly independent messages as $G_{i,j} = (x_1, x_2, \ldots, x_l)_N$ (Figure 3d).
In Figure 4a, suppose nodes 3, 5, and 8 are malicious. Then links {2 → 3, 9 → 3, 4 → 5, 10 → 5, 1 → 8, 7 → 8} are polluted links, carrying the messages $(x, y)_1$, z, $(x, y, z)_1$, z, y, and x, respectively. Each data stream x, y, and z appears three times in these six polluted links. To satisfy the security requirement, at least three polluted links must be removed. Using the STD algorithm, the best result is to remove {9 → 3, 4 → 5, 10 → 5} plus only one additional link, {3 → 4}. In Figure 4b, we do not remove any polluted link. Instead, we remove {6 → 7} and recompute the network topology, and the resulting topology is already weakly secure: the eavesdroppers cannot obtain data stream x, because the x carried by {2 → 3, 4 → 5, 7 → 8} all comes from {6 → 7}, while the sink node d receives data stream x along another path. This shows that if we can trace the source of every polluted link, we may find a better solution to the security problem.
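The four cases depend only on a node's in-degree and out-degree, and the Figure 4a count of how often each stream appears on the polluted links is a simple tally. The sketch below encodes both; `classify_node` is a hypothetical helper, and the polluted-link table is transcribed from the Figure 4a description, with the combination subscripts dropped so that each link is represented by the set of streams it contains.

```python
from collections import Counter


def classify_node(in_deg: int, out_deg: int) -> int:
    """Return the case number (1-4) of an intermediate node from its in/out degree."""
    if in_deg < 1 or out_deg < 1:
        raise ValueError("the four cases are defined for intermediate nodes only")
    if in_deg == 1:
        return 1 if out_deg == 1 else 2
    return 3 if out_deg == 1 else 4


# Polluted links of Figure 4a (malicious nodes 3, 5, 8) and the streams each one carries
polluted = {
    (2, 3): {"x", "y"}, (9, 3): {"z"},
    (4, 5): {"x", "y", "z"}, (10, 5): {"z"},
    (1, 8): {"y"}, (7, 8): {"x"},
}
occurrences = Counter(s for streams in polluted.values() for s in streams)
print(sorted(occurrences.items()))  # [('x', 3), ('y', 3), ('z', 3)]
```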
In Figure 4c, we label the original network topology differently. Node 6 receives data stream x from the source node s and sends it to nodes 7 and 11. We call the source node the parent node of node 6, and nodes 7 and 11 the child nodes of node 6. We label the data stream x carried by {6 → 7} and {6 → 11} as $x_1$ and $x_2$. Node 7 receives the data stream from node 6 and forwards it to nodes 8 and 2, so the data streams carried by {7 → 8} and {7 → 2} are labeled $x_{1,1}$ and $x_{1,2}$. In general, the subscript of $x_{a,b,c,d,\ldots}$ distinguishes the different sources of data x. Node 5 receives the data streams $(x_{1,2}, y_2, z_1)_1$ and $z_{2,1}$.
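The labeling rule amounts to appending a child index to the parent's label at every forwarding step. The sketch below assumes the edges forwarding one stream form a tree (each node reached once) and that children are listed in the order used for numbering; `propagate_labels`, `fmt`, and the `children` map (transcribed from the Figure 4c description, with node 11's successors omitted because they are not given) are hypothetical.

```python
def propagate_labels(children, root):
    """Assign provenance labels to the edges that forward one data stream.

    children maps a node to the ordered list of nodes it forwards the stream to.
    The stream entering `root` is unlabeled (plain x); the j-th outgoing edge of a
    node whose incoming label is L gets the label L + (j,). Assumes a tree.
    """
    labels = {}
    stack = [(root, ())]
    while stack:
        node, label = stack.pop()
        for j, child in enumerate(children.get(node, []), start=1):
            labels[(node, child)] = label + (j,)
            stack.append((child, label + (j,)))
    return labels


def fmt(label, stream="x"):
    """Render a label tuple as text, e.g. (1, 2) -> 'x_1,2'."""
    return stream + "_" + ",".join(map(str, label)) if label else stream


# Figure 4c: node 6 forwards x to 7 and 11; node 7 forwards it to 8 and 2
children = {6: [7, 11], 7: [8, 2]}
labels = propagate_labels(children, root=6)
for edge, label in sorted(labels.items()):
    print(edge, fmt(label))  # (6, 7) x_1, (6, 11) x_2, (7, 2) x_1,2, (7, 8) x_1,1
```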
From the above transmission matrix G, polluted links {2 → 3, 4 → 5} carry data stream $x_{1,2}$ and {7 → 8} carries $x_{1,1}$; both come from $x_1$, and the sink node d receives both $x_1$ and $x_2$. Once link {6 → 7} is removed, the eavesdroppers can no longer obtain data stream x, while the sink node d can still obtain complete information from the source node s.
Note that labels of the same data stream must be combined. For example, $x_{1,2,3,4}$ and $x_{1,2,3,5}$ are combined into $x_{1,2,3}$, and $x_{1,2,3,4}$ and $x_{1,1,2,3}$ are combined into $x_1$; that is, the combined label is the longest common prefix of the labels, which identifies the most specific source shared by all of them. In Figure 4, {5 → d} carries $(x_{1,2}, y_2, (z_1, z_{2,1}))_1$, where $(z_1, z_{2,1})$ means that data stream z comes from both {9 → 3} and {9 → 10}, so it can be grouped into z. The key feature of the ISTD algorithm is that it finds the source of each polluted edge and removes these relatively few source edges to make the network secure.
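Reading the label subscripts as tuples, the combination rule in the examples above is a longest-common-prefix operation, and the combined label points at the single source edge whose removal cuts the stream off from every polluted link. The sketch below encodes that reading; `combine` and the `source_edge` table (transcribed from Figure 4c) are hypothetical.

```python
def combine(*labels):
    """Combine provenance labels of one data stream into their longest common prefix."""
    prefix = []
    for parts in zip(*labels):
        if any(p != parts[0] for p in parts):
            break
        prefix.append(parts[0])
    return tuple(prefix)


print(combine((1, 2, 3, 4), (1, 2, 3, 5)))  # (1, 2, 3)  -> x_{1,2,3}
print(combine((1, 2, 3, 4), (1, 1, 2, 3)))  # (1,)       -> x_1
print(combine((1,), (2, 1)))                # ()         -> z_1 and z_{2,1} group into z

# Polluted links carry x_{1,2} and x_{1,1}; their common source x_1 travels on link 6 -> 7
source_edge = {(1,): (6, 7), (2,): (6, 11)}
print(source_edge[combine((1, 2), (1, 1))])  # (6, 7)
```

An empty result means the labels share no forwarding step below the root, so the whole stream, rather than one branch of it, is the common source.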
Combining the SNS and ISTD algorithms, we obtain an improved scheme based on topology design and network segmentation (ISNS). We summarize this method in the following steps.
Step 1: Given a directed acyclic graph G = 〈V, E〉 after path enforcement, assume that the positions of the malicious nodes are known. Let k be the minimum cut of G and let V′ denote the set of malicious nodes. The set of incoming edges of V′, $E' = (e'_1, e'_2, \ldots, e'_n)$, is called the set of polluted edges.
Step 2: Label all links of the network topology, fill the transmission matrix G with the data stream X, and combine the sources of each information flow.
Step 3: Remove the incoming messages that are linearly dependent on (duplicates of) other incoming messages. Let E″ denote the set of remaining polluted edges.
Step 4: Find a forward routing that contains the polluted edges; the number of links on this forward routing cannot exceed the total number of polluted edges. This forward routing is the split routing, and it divides the entire network into two sub-networks $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$. Let $E_p$ denote the links of the split routing.
Step 5: Count the polluted links $E''_1$ and $E''_2$ of the two sub-networks, and let $E''_1$ be the smaller of the two. To meet the security requirement, the links that must be removed are $E''_1 + E_p$, that is, $|E''_1| + |E_p|$ links in total.
Step 6: Obtain the solution of the topology design and network segmentation. The solution that removes the fewest links is optimal; if no topology satisfies the security condition, message leakage is unavoidable.
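Steps 5 and 6 reduce to a selection over candidate splits once Steps 1 to 4 have produced them. The sketch below assumes each candidate already records its split-routing links $E_p$, the polluted sets $E''_1$ and $E''_2$, and a feasibility flag saying whether the resulting topology meets the security condition; producing those inputs (labeling, source combination, route search) is not shown. `Candidate`, `links_to_remove`, and `choose_isns` are hypothetical names, and sets are unioned, which equals $|E''_1| + |E_p|$ whenever the two are disjoint.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set, Tuple

Edge = Tuple[int, int]


@dataclass
class Candidate:
    split_links: Set[Edge]   # E_p: links along the split routing
    polluted_g1: Set[Edge]   # E''_1 of sub-network G1
    polluted_g2: Set[Edge]   # E''_2 of sub-network G2
    feasible: bool = True    # assumed to be supplied by the security check


def links_to_remove(c: Candidate) -> Set[Edge]:
    """Step 5: secure the sub-network with fewer polluted links, plus the split links."""
    smaller = min(c.polluted_g1, c.polluted_g2, key=len)
    return smaller | c.split_links


def choose_isns(cands: Sequence[Candidate]) -> Optional[Set[Edge]]:
    """Step 6: pick the feasible candidate that removes the fewest links, or None."""
    options = [links_to_remove(c) for c in cands if c.feasible]
    if not options:
        return None  # no secure topology exists: message leakage is unavoidable
    return min(options, key=len)


c1 = Candidate({(1, 2)}, {(3, 4), (5, 6)}, {(7, 8)})
c2 = Candidate({(1, 2), (2, 3), (4, 9)}, {(3, 4)}, {(5, 6)})
print(sorted(choose_isns([c1, c2])))  # [(1, 2), (7, 8)]
```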

The performance and discussion of SNS
In this section, simulations are conducted with the ns-2 simulator and MATLAB to evaluate the effectiveness of the proposed algorithm. The network is defined by the following parameters: the number of nodes N (with $E_{all}$ edges), the probability p that an intermediate node is malicious, and the number of removed links $E_p$. The algorithm in [22] is the STD algorithm, and the SNS algorithm is the basic scheme of this paper. For each combination of parameters, we generate 50 instances.
Figure 6: Removed links $E_p$ vs. N.
In Figure 5, we set N = 50 and vary p from 0.1 to 0.3 to calculate $E_p$. In Figure 6, we set p = 0.2 and vary N from 30 to 70.
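The experimental setup (N nodes, each intermediate node malicious with probability p, 50 instances per setting) can be sketched as a random instance generator. This is not the ns-2/MATLAB setup used for the figures: the edge probability, the index-ordered DAG construction, and the names `random_instance` and `sweep` are assumptions, path enforcement is not modeled, and the sweep only averages the number of polluted links, not $E_p$.

```python
import random


def random_instance(n_nodes, p, edge_prob=0.2, seed=None):
    """Random DAG on nodes 0..n_nodes-1 (0 = source s, n_nodes-1 = sink d).

    Edges only go from lower to higher index, so the graph is acyclic; each
    intermediate node is malicious with probability p. Returns
    (edges, malicious nodes, polluted links = incoming links of malicious nodes).
    """
    rng = random.Random(seed)
    edges = [(u, v) for u in range(n_nodes) for v in range(u + 1, n_nodes)
             if rng.random() < edge_prob]
    malicious = {v for v in range(1, n_nodes - 1) if rng.random() < p}
    polluted = [(u, v) for (u, v) in edges if v in malicious]
    return edges, malicious, polluted


def sweep(ps, n_nodes=50, instances=50):
    """Average number of polluted links per value of p over `instances` seeds."""
    out = {}
    for p in ps:
        counts = [len(random_instance(n_nodes, p, seed=i)[2]) for i in range(instances)]
        out[p] = sum(counts) / instances
    return out


print(sweep([0.1, 0.2, 0.3]))
```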
Figures 5 and 6 show that, as p and N increase, the SNS algorithm removes fewer links than the STD algorithm. It improves link utilization and performs particularly well when faced with a large number of wiretapping nodes. Compared with the STD algorithm, the SNS algorithm is a substantial improvement and is especially suitable for larger networks.

Improved SNS algorithm (ISNS)
STD is the scheme proposed in [22], and ISTD is our improved version of STD with considerably higher efficiency. In this paper, the ISTD algorithm is integrated into SNS, yielding ISNS, the improved scheme based on network segmentation and topology design. Compared with ISTD, ISNS is more efficient: it removes fewer polluted links, and because the transmission topology changes less, the success rate of transmission improves.
The network is defined by four parameters: the number of nodes N, the probability p that an intermediate node is malicious, the maximum in-degree D of each node (the largest number of incoming links), and the success rate of transmission r. For each combination of parameters, we generate 50 instances.
In Figure 7, we set N = 50 and D = 7 and vary p from 0.1 to 0.35 to calculate r. In Figure 8, we set p = 0.2 and D = 7 and vary N from 50 to 100 to calculate r. As p and N increase, the ISNS algorithm performs better than ISTD. The simulation results show that ISNS can cope with conspiracy attacks in larger networks with more malicious nodes.

Performance comparison between coding design strategies and topology design strategies
Compared with secure coding design strategies, the topology design scheme has its own advantages and disadvantages. As the ISTD and ISNS algorithms show, topology design strategies do not need complicated encoding and decoding processes; they use plain linear network coding, which saves a lot of memory and computing time. The proposed algorithm also has a certain failure rate, but when cooperative eavesdroppers are relatively few (a proportion of malicious nodes of at most 0.2) and the network tolerates a certain error rate, the topology design scheme shows its advantage. Table 1 compares coding design strategies and topology design strategies.
The topology design strategy is especially suitable for networks with a low proportion of malicious nodes that can tolerate a certain error rate.

Conclusion
In this paper, we have investigated topology design and network segmentation for weak security against node conspiracy attacks. We analyzed how network segmentation and topology design influence the security of networks, and we proposed a secure strategy against node conspiracy attacks based on network segmentation and topology design. We compared the ISTD and ISNS strategies. Simulations showed that the proposed routing algorithm ISNS achieves good performance and can cope with larger networks containing more malicious nodes than ISTD. In future work, we will study secure topology design strategies for larger networks with a large number of malicious nodes.