3.1 Related definitions
Assume C_{G}(s, t) = k. For any intermediate node v_{i} in G, if the number of incoming links In(v_{i}) is less than the capacity of the graph C_{G}(s, t), then for a sufficiently large field size q the generated network code is said to be secure with high probability, because the intermediate node v_{i} cannot recover any of the k symbols from k − 1 or fewer linear equations. On the other hand, if C_{G} ≤ In(v_{i}), the security is said to be topology dependent, and the network is considered secure if and only if rank(in(v_{i})) < C_{G}.
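The two regimes above can be written as a small decision rule. The following is a minimal sketch, assuming the in-degree, the rank of the incoming coding vectors, and the min-cut are already known; the function name `classify_node` and its return strings are hypothetical and not taken from the paper.

```python
def classify_node(in_degree: int, rank_in: int, min_cut: int) -> str:
    """Classify an intermediate node v_i by the rule of Section 3.1.

    in_degree: In(v_i), the number of incoming links of the node
    rank_in:   rank(in(v_i)), the rank of the messages on those links
    min_cut:   C_G(s, t) = k
    """
    if in_degree < min_cut:
        # At most k - 1 linear equations reach the node, so it cannot
        # recover any of the k symbols (for a large enough field size q).
        return "secure with high probability"
    # C_G <= In(v_i): security depends on the topology via the rank test.
    if rank_in < min_cut:
        return "secure (topology dependent)"
    return "insecure"


print(classify_node(in_degree=2, rank_in=2, min_cut=3))
print(classify_node(in_degree=4, rank_in=2, min_cut=3))
print(classify_node(in_degree=4, rank_in=3, min_cut=3))
```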
In [22], the sibling work of this paper, we analyzed how topology design influences the security of networks and proposed a secure strategy against node conspiracy attacks based on topology design. That method suits small networks; in a large network, the number of wiretapping nodes grows, which increases the number of links that need to be removed. Therefore, we propose a network coding strategy against wiretapping attacks based on network segmentation.
3.2 Secure network segmentation algorithm
Figure 2 is a directed acyclic graph in which each link has unit capacity. The source node s wants to transmit information to the destination node t without leaking meaningful information to the eavesdropper. Suppose there are m malicious nodes, each with n incoming links, and C_{G}(s, t) = k. We need to remove mn − k links to ensure network security. If we divide the network into two subnetworks and assume that the malicious nodes are uniformly distributed, we only need to remove approximately (mn − k)/2 links to ensure network security.
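As a quick numeric check of the two counts above, the sketch below evaluates mn − k and (mn − k)/2 for illustrative values. The function names and the sample values of m, n, and k are assumptions chosen for the example, and the clamp at zero is added here only to avoid negative counts.

```python
def removal_count(m: int, n: int, k: int) -> int:
    """Links to remove in the unsplit network: mn - k (clamped at 0)."""
    return max(m * n - k, 0)


def removal_count_split(m: int, n: int, k: int) -> float:
    """Approximate count after splitting into two subnetworks: (mn - k) / 2."""
    return removal_count(m, n, k) / 2


# Hypothetical values: 4 malicious nodes, 3 incoming links each, min-cut 4.
print(removal_count(4, 3, 4))        # 8
print(removal_count_split(4, 3, 4))  # 4.0
```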
In Figure 2, we randomly generate a 50-node network diagram; after path enforcement, we get a directed graph G(V, E). The red line divides the entire network into two subnetworks G_{1}(V_{1}, E_{1}) and G_{2}(V_{2}, E_{2}); the dashed lines are the links to be removed. Here C(G_{1}) = C(G_{2}) = 3, and if either one of the subnetworks is secure, the whole network is secure.
The main problem is how to find the best split routing, which leads to two objective functions.
We know that the sum of the min-cuts of the two subnetworks is no more than the min-cut of the whole network:
{C}_{{G}_{1}}+{C}_{{G}_{2}}\le {C}_{G}
(1)
To preserve the throughput of the network, the total maximum flow after the division should be as close as possible to the original maximum flow.
We find the split routing and remove the dashed links; the set of removed links is {E}_{{p}_{1}}. Then, we pick one subnetwork and apply the algorithm of [22] to it; the set of links removed in this step is {E}_{{p}_{2}}:
{E}_{p}={E}_{{p}_{1}}+{E}_{{p}_{2}}.
(2)
The two objective functions are max \left({C}_{{G}_{1}}+{C}_{{G}_{2}}\right) and min E_{p}. Such an algorithm is referred to as the secure network segmentation (SNS) algorithm.
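Inequality (1) and the first objective can be checked on a toy graph. The sketch below is an illustration under stated assumptions, not the paper's procedure for finding the split routing: it represents a split by a hypothetical node set `side1` (the intermediate nodes of G_{1}), treats every link crossing the split as part of {E}_{{p}_{1}}, and computes min-cuts with a standard Edmonds-Karp max-flow on unit-capacity links. The graph and all function names are invented for the example.

```python
from collections import deque


def max_flow(edges, s, t):
    """Edmonds-Karp max flow (= min-cut) on unit-capacity directed links."""
    cap, adj = {}, {}
    for u, v in edges:
        cap[(u, v)] = cap.get((u, v), 0) + 1
        cap.setdefault((v, u), 0)          # residual (reverse) arc
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:   # BFS for an augmenting path
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:       # push one unit along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


def split(edges, s, t, side1):
    """Split by a node set; links crossing the split are removed (E_p1)."""
    keep1 = side1 | {s, t}
    g1, g2, removed = [], [], []
    for u, v in edges:
        if u in side1 or v in side1:
            (g1 if u in keep1 and v in keep1 else removed).append((u, v))
        else:
            g2.append((u, v))
    return g1, g2, removed


# Hypothetical 6-node network, not the one in Figure 2.
edges = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "d"),
         ("s", "c"), ("c", "t"), ("d", "t")]
g1, g2, e_p1 = split(edges, "s", "t", {"a", "b"})
c_g = max_flow(edges, "s", "t")
c_g1, c_g2 = max_flow(g1, "s", "t"), max_flow(g2, "s", "t")
print(c_g, c_g1, c_g2, e_p1)   # 3 1 1 [('b', 'd')]
assert c_g1 + c_g2 <= c_g      # inequality (1)
```

In this toy case the split costs one unit of throughput, which is exactly the loss the objective max (C_{G_1} + C_{G_2}) tries to keep small.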
3.3 An improved scheme based on the network segmentation and topology design
In [22], we proposed a secure strategy against node conspiracy attack by topology design (ISTD). Here, we use Figure 3 to describe the method generally.
We first introduce a classification of nodes. For a node i:
Case 1. in(i) = 1, out(i) = 1: the outgoing message is identical to the incoming message (shown in Figure 3a).
Case 2. in(i) = 1, out(i) > 1: the outgoing messages are linearly correlated with the incoming message (shown in Figure 3b).
Case 3. in(i) > 1, out(i) = 1: the outgoing message is a combination of the incoming messages (shown in Figure 3c).
Case 4. in(i) > 1, out(i) > 1: the outgoing messages are not linearly correlated with the incoming messages. We mark these non-linearly-correlated messages as G_{i,j} = (x_{1}, x_{2}, …, x_{l})^{N} (shown in Figure 3d).
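The four cases depend only on the in-degree and out-degree of a node, so they can be expressed as a small lookup. This is a minimal sketch; the function name `node_case` is hypothetical, and it assumes every intermediate node has at least one incoming and one outgoing link.

```python
def node_case(in_deg: int, out_deg: int) -> int:
    """Return the case number (1 to 4) for a node with the given degrees."""
    if in_deg < 1 or out_deg < 1:
        raise ValueError("an intermediate node needs in(i) >= 1 and out(i) >= 1")
    if in_deg == 1:
        # One incoming message: forwarded unchanged (1) or replicated (2).
        return 1 if out_deg == 1 else 2
    # Several incoming messages: combined into one (3) or into several (4).
    return 3 if out_deg == 1 else 4


print([node_case(1, 1), node_case(1, 3), node_case(2, 1), node_case(2, 2)])
# [1, 2, 3, 4]
```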
In Figure 4a, suppose nodes 3, 5, and 8 are malicious nodes. Links {2 → 3, 9 → 3, 4 → 5, 10 → 5, 1 → 8, 7 → 8} are polluted links. They carry the messages (x, y)^{1}, z, (x, y, z)^{1}, z, y, and x, respectively. Each of the data streams x, y, and z appears three times in these six polluted links. To satisfy the security requirements, we need to remove at least three polluted links. Using the STD algorithm, removing {9 → 3, 4 → 5, 10 → 5} is the best result, and only one additional link {3 → 4} needs to be removed. In Figure 4b, we do not remove any polluted links. Instead, we remove {6 → 7} and rebuild the network topology, and we find that the topology is already weakly secure. Eavesdroppers cannot get data stream x, because the data stream x on {2 → 3, 4 → 5, 7 → 8} comes from {6 → 7}, while the sink node d receives data stream x by another route. This tells us that if we can trace the data source of every polluted link, we may find a better scheme to solve the security problem.
In Figure 4c, we mark the original network topology in another way. Node 6 gets the data stream x from the source node s and sends it to node 7 and node 11. We define the source node as the parent node of node 6, and node 7 and node 11 as child nodes of node 6. We mark the data stream x carried by {6 → 7} and {6 → 11} as x^{1} and x^{2}. Node 7 gets the data stream from node 6 and points to nodes 2 and 8, so the data streams carried by {7 → 8} and {7 → 2} can be marked as x^{1,1} and x^{1,2}; here, x^{a,b,c,d⋯} represents the different sources of data x. Node 5 gets the data streams (x^{1,2}, y^{2}, z^{1})^{1} and z^{2,1}.
From the above transmission matrix G, the polluted links {2 → 3, 4 → 5} carry the data stream x^{1,2} and {7 → 8} carries the data stream x^{1,1}, both of which come from the data stream x^{1}, while the sink node d receives both x^{1} and x^{2}. Once link {6 → 7} is removed, the eavesdroppers cannot get the data stream x, while the sink node d can still get the complete information from the source node s.
Note that identical data streams need to be combined. For example, the data x^{1,2,3,4} and x^{1,2,3,5} can be combined into x^{1,2,3}, and the data x^{1,2,3,4} and x^{1,1,2,3} are combined into x^{1}; the combination keeps the longest part of the source label that the data have in common. In Figure 4, {5 → d} carries the data (x^{1,2}, y^{2}, (z^{1}, z^{2,1}))^{1}, where (z^{1}, z^{2,1}) means that the data stream z comes from both {9 → 3} and {9 → 10}, so it can be grouped into z. The main feature of the ISTD algorithm is that it finds the sources of the polluted edges and removes these relatively few source edges to make the network secure.
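The combination rule in the examples above amounts to taking the longest common prefix of the source labels. The sketch below illustrates this reading; representing a label such as x^{1,2,3,4} as the tuple (1, 2, 3, 4) and the function name `combine_sources` are assumptions made for the example. An empty result stands for the original stream with no source tag, as in the (z^{1}, z^{2,1}) case.

```python
def combine_sources(labels):
    """Combine source labels of one data stream into their common prefix."""
    prefix = []
    for parts in zip(*labels):          # walk the labels position by position
        if len(set(parts)) != 1:        # first position where they diverge
            break
        prefix.append(parts[0])
    return tuple(prefix)


print(combine_sources([(1, 2, 3, 4), (1, 2, 3, 5)]))  # (1, 2, 3)
print(combine_sources([(1, 2, 3, 4), (1, 1, 2, 3)]))  # (1,)
print(combine_sources([(1,), (2, 1)]))                # ()
```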
Combining the SNS and ISTD algorithms, we get an improved scheme based on topology design and network segmentation (ISNS). We summarize this method in the following steps:

Step 1: Given a directed acyclic graph G = 〈V, E〉 after path enforcement, suppose the positions of the malicious nodes are known and the minimum cut of G is C_{G}. {V}^{\prime}=\left({v}_{1}^{\prime},{v}_{2}^{\prime},\dots ,{v}_{m}^{\prime}\right) is the set of malicious nodes, and {E}^{\prime}=\left({e}_{1}^{\prime},{e}_{2}^{\prime},\dots ,{e}_{n}^{\prime}\right) is the set of incoming edges of V′, which we call polluted edges.

Step 2: Mark all links of the network topology, fill the transmission matrix G with the data stream X, and combine the sources of the information flows.

Step 3: Remove the incoming messages that have the same linear correlation. We define E″ as the set of the remaining polluted edges.

Step 4: Get a forward routing that contains the polluted edges. The total number of links in the forward routing cannot be more than the total number of polluted edges. This forward routing is the split routing, which divides the entire network into two subnetworks G_{1}(V_{1}, E_{1}) and G_{2}(V_{2}, E_{2}). Denote the split routing by E_{p}.

Step 5: Calculate the numbers of polluted links in the two subnetworks, {E}_{1}^{\prime\prime} and {E}_{2}^{\prime\prime}. Suppose the subnetwork with fewer polluted links is the one with {E}_{1}^{\prime\prime}. To meet the security requirement, the full set of links that need to be removed is {E}_{1}^{\prime\prime}+{E}_{p}.

Step 6: Get the solution of the topology design and network segmentation. The solution that removes the fewest links is the optimal one. If no topology satisfies the condition, message leakage is unavoidable.
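Steps 1, 5, and 6 above can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the full ISNS procedure: it skips the labeling of Steps 2 to 4 and takes candidate splits as given inputs. The polluted links are those of Figure 4a; the candidate splits, the function names, and the convention of returning `None` when no candidate exists are hypothetical.

```python
def polluted_edges(edges, malicious):
    """Step 1: E' is every link whose head is a malicious node."""
    return [(u, v) for (u, v) in edges if v in malicious]


def removal_set(e1, e2, split_routing):
    """Step 5: polluted links of the less-polluted subnetwork plus E_p."""
    fewer = e1 if len(e1) <= len(e2) else e2
    return set(fewer) | set(split_routing)


def best_solution(candidates):
    """Step 6: keep the candidate (E1'', E2'', E_p) that removes the fewest
    links; None means no candidate was found (leakage is unavoidable)."""
    solutions = [removal_set(*c) for c in candidates]
    return min(solutions, key=len) if solutions else None


# Links around the malicious nodes 3, 5, and 8 of Figure 4a.
edges = [(2, 3), (9, 3), (4, 5), (10, 5), (1, 8), (7, 8), (6, 7), (3, 4)]
e_prime = polluted_edges(edges, {3, 5, 8})
print(e_prime)  # [(2, 3), (9, 3), (4, 5), (10, 5), (1, 8), (7, 8)]

# Hypothetical candidate splits, not taken from the paper.
candidates = [
    ([(2, 3)], [(4, 5), (1, 8)], [(6, 7), (3, 4)]),  # removes 3 links
    ([(2, 3), (9, 3)], [(7, 8)], [(6, 7)]),          # removes 2 links
]
best = best_solution(candidates)
print(len(best))  # 2
```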