The gist of our proposed algorithm lies in converting the connection recovery problem into the iMTSP by introducing virtual segments and hierarchical chromosomes, and then adopting a multi-objective genetic algorithm to find the optimal solutions of the iMTSP. Firstly, it identifies the independent segments in a damaged network using a fuzzy C-means clustering algorithm (FCMA); the internal nodes in the same cluster can communicate with each other. Secondly, it uses MDCs to travel among all segments and carry the collected data back to the sink. The optimal path problem of the MDCs is transformed into our proposed iMTSP. Thirdly, the problem is solved by an improved multi-objective genetic algorithm based on NSGA-II [27], which introduces hierarchical chromosome structures, improved population initialization, self-defined genetic operators, and an optimized crowding distance calculation, allowing the optimal moving paths of the different MDCs to be obtained.
Segment identification by clustering
For an unknown deployment, we would use inspection equipment to detect the existing active sensor nodes together with their physical locations. Considering the limited resources of the nodes, the clustering algorithm is run at the base station or the client. In this paper, the FCMA is used to undertake clustering analysis of the sensor nodes. We assume that the coordinates of the sensor nodes in the network are \(X=\{x_{1},x_{2},\ldots,x_{n}\}\). \(U=[u_{ij}]\) is the fuzzy C-class membership matrix of the data set X, in which \(u_{ij}\) is the membership of the jth data point in the ith class. The memberships satisfy the following conditions:
$$ \begin{aligned} &\forall i,j,\; u_{ij} \in [0,1]; \\ &\forall i,\; \sum_{j=1}^{n} u_{ij}>0,\quad i=1,2,\ldots,c;\; j=1,2,\ldots,n; \\ &\forall j,\; \sum_{i=1}^{c} u_{ij}=1 \end{aligned} $$
(11)
The clustering criterion is the minimization of the clustering loss function \(J_{m}(U,V)\):
$$ J_{m}(U,V)=\sum_{j=1}^{n} \sum_{i=1}^{c} u_{ij}^{m}d_{ij}^{2}(x_{j},v_{i}) $$
(12)
where V is the set of clustering centres, m is the weighting exponent, and \(d_{ij}(x_{j},v_{i})=\|v_{i}-x_{j}\|\). After initialization, the algorithm iterates, updating U and V subject to the constraints in Formula (11), until the loss function in Formula (12) converges to a minimum, which yields the fuzzy clustering.
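As a minimal sketch of the iteration described above, the following uses the standard fuzzy C-means update rules (centres as membership-weighted means, memberships from inverse distance ratios). The update rules themselves, the function name `fuzzy_c_means`, the stopping tolerance, and the random initialization are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, tol=1e-5, max_iter=200, seed=0):
    """Cluster node coordinates X (shape n x d) into c fuzzy classes.

    Returns U (c x n membership matrix) and V (c x d cluster centres).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each column sums to 1 as in Formula (11).
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    V = None
    for _ in range(max_iter):
        Um = U ** m
        # Centres: membership-weighted means of the coordinates.
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # d[i, j] = ||v_i - x_j||, floored to avoid division by zero.
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)
        # Standard FCM membership update, renormalised per data point.
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

# Hypothetical coordinates: two well-separated groups of nodes.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
U, V = fuzzy_c_means(X, c=2)
segments = U.argmax(axis=0)  # hard segment label per node
```

Taking the class with the largest membership for each node, as in the last line, is one simple way to turn the fuzzy result into segments.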
Problem transfer, encoding, and decoding
A genetic algorithm (GA) is a parallel search algorithm that simulates biological evolution and is an effective method for solving multi-objective optimization problems. The solution process of a genetic algorithm generally follows this sequence: (1) conversion of the real problem into genetic sequences by encoding; (2) repeated execution of the basic genetic operations of selection, crossover, and mutation, producing new outstanding individuals until the requirements are met; and (3) arrival at an approximate solution of the practical problem according to the decoding rules. Compared with a single-objective GA, a multi-objective GA must adopt special rules to evaluate and sort the fitness of individuals.
To satisfy the requirements of problem transfer, we introduce virtual segments and hierarchical chromosomes into the iMTSP. The virtual segments transform the multi-travelling salesman problem into a single travelling salesman problem with N+m−1 segments (N is the number of segments and m is the number of MDCs). The virtual segment in the iMTSP differs from the “virtual city” in the MTSP: it is a set of virtual points covering the whole segment, rather than a single representative point. The introduced hierarchical chromosome consists of two levels: the segment gene and the node gene. The segment gene represents the order in which the segments are travelled, and the node gene indicates the internal sensor node visited in the corresponding segment. Both are coded using integer numbers, and the rules and limitations of encoding and decoding are as follows:
Rule 1 Each segment in the network is uniquely identified by SegmentID, which is an integer number from 1 to N (N is the number of segments). Similarly, each node in each segment is uniquely identified by NodeID from 1 to n (n is the number of nodes in the segment). Thus, each node in the network can be uniquely identified by “SegmentID+NodeID”.
Rule 2 The corresponding relationship between the segment gene and the node gene lies in the gene position of the node gene and is not a simple one-to-one mapping.
Limitation 1 Each path of the MDCs starts from the source segment. The first gene position of the segment gene is always the source segment, and it does not participate in the subsequent crossover and mutation operations.
Limitation 2 Set \(c_{S_{11}}=M\), where M is a positive infinite value and \(S_{1}\) is the source segment.
In decoding, the segment gene is visited one-by-one from the first gene position to the end position. The first gene position represents the source segment of the first MDC. From the second gene position onwards, whenever the decoded gene position maps to the source segment, it indicates the end of the moving path of the previous MDC and the beginning of the moving path of the next MDC. Different from the segment gene, the ith gene position of the node gene stores the NodeID that an MDC visits in the ith segment.
As an example, let us assume that the number of MDCs is 3, denoted as MDC-1, MDC-2, and MDC-3. There are seven segments, \(S_{1},S_{2},S_{3},S_{4},S_{5},S_{6}\), and \(S_{7}\), and the number of sensor nodes in each segment is 10 to 20. \(S_{1}\) is the source segment, and \(S_{8}\) and \(S_{9}\) are the introduced virtual segments, all of which have the same internal nodes. Using a hierarchical chromosome structure, the feasible solution can be described as shown in Fig. 2.
As indicated in Fig. 2, MDC-1 starts from \(S_{1}\), moves through \(S_{7}\) and \(S_{4}\), and returns to \(S_{1}\) (\(S_{1}\) and \(S_{8}\) are the same segment). The visited node NodeID=7 of \(S_{1}\) comes from the 1st gene position of the node gene, and NodeID=12 of \(S_{7}\) comes from the 7th gene position of the node gene. Correspondingly, the visited nodes of \(S_{4}\) and \(S_{8}\) are NodeID=17 and NodeID=18. The moving path of MDC-1 can be expressed as 1(7)-7(12)-4(17)-1(7). Similarly, the moving path of MDC-2 is 8(18)-6(5)-2(15)-8(18), and the moving path of MDC-3 is 9(3)-5(10)-3(9)-9(3). It is worth noting that Limitation 1 and Limitation 2 ensure that the encoding is effective and feasible. If we do not limit the chromosome gene encoding, the segment gene may cause the following two cases to arise:
1. The source segment and virtual segments could be distributed in arbitrary gene positions apart from the first gene position. As shown in Fig. 3, there are three MDCs and four closed-loop moving paths in the encoding structure, which would be difficult to decode.
2. Multiple virtual segments could appear consecutively in the same chromosome. As shown in Fig. 4, the possible paths of the three MDCs are \(S_{8}-S_{2}-S_{5}-S_{4}-S_{8}\), \(S_{9}-S_{9}\), and \(S_{1}-S_{2}-S_{7}-S_{3}-S_{1}\). The second path is invalid, starting from \(S_{9}\) and returning to \(S_{9}\). If we set \(c_{S_{11}}=M\) (M is a positive infinity), this kind of chromosome will be eliminated in the population selection of a GA.
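The decoding rule and the Fig. 2 example can be sketched as follows. The function names `decode`, `fmt`, and `has_empty_tour` are hypothetical, and the segment gene `[1, 7, 4, 8, 6, 2, 9, 5, 3]` is inferred from the three moving paths quoted in the text rather than copied from the figure. The sketch assumes Limitation 1 holds, so the first gene is the source segment.

```python
def decode(segment_gene, node_gene, n_real):
    """Split a segment gene into one closed tour per MDC.

    segment_gene: visiting order of SegmentIDs; position 0 is the source.
    node_gene:    node_gene[s - 1] is the NodeID visited in segment s.
    n_real:       number of real segments N; IDs above N are virtual
                  copies of the source segment.
    """
    source = segment_gene[0]
    tours, current = [], None
    for seg in segment_gene:
        stop = (seg, node_gene[seg - 1])
        if seg == source or seg > n_real:
            # A source or virtual segment starts the next MDC's path.
            if current is not None:
                tours.append(current)
            current = [stop]
        else:
            current.append(stop)
    tours.append(current)
    # Every MDC returns to the point it started from.
    return [t + [t[0]] for t in tours]

def fmt(tour):
    """Render a tour in the paper's notation, e.g. 1(7)-7(12)-4(17)-1(7)."""
    return "-".join(f"{s}({n})" for s, n in tour)

def has_empty_tour(tours):
    """True if some MDC leaves and returns without visiting any segment."""
    return any(len(t) == 2 for t in tours)

segment_gene = [1, 7, 4, 8, 6, 2, 9, 5, 3]
node_gene = [7, 15, 9, 17, 10, 5, 12, 18, 3]  # NodeID for S1..S9
for tour in decode(segment_gene, node_gene, n_real=7):
    print(fmt(tour))
```

A chromosome in which two virtual segments are adjacent, such as `[1, 7, 4, 8, 9, 6, 2, 5, 3]`, decodes to a tour that starts and ends at a virtual segment without visiting anything; `has_empty_tour` flags that case, which Limitation 2 penalizes through the cost M.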
Improved population initialization
Different from NSGA-II, a parallel selection method is applied to the random population to improve the convergence speed of the GA. The specific strategy is as follows: (1) a population of M individuals is randomly generated for the iMTSP, and (2) the M individuals are sorted by each objective in turn, and the best m individuals for each objective function are preserved. If the required number of individuals is not reached, the best m+1 individuals are preserved, and so on, until M individuals are obtained. Duplicate individuals are deleted when the best individuals are preserved. The resulting population is the initial population of the subsequent genetic algorithm.
Fast non-dominated sort
This method uses the same non-dominated sort as NSGA-II. Each individual I(i) has two parameters, \(n_{i}\) and \(S_{i}\), where the domination count \(n_{i}\) is the number of individuals that dominate I(i) and \(S_{i}\) contains all the individuals dominated by I(i). In outline, the sorting is as follows. Firstly, the individuals with \(n_{i}=0\) form the first non-dominated front \(F_{1}\); they are removed from the population, and their rank is set to \(I(i)_{rank}=1\). Then, we continue to look for the non-dominated solution set in the remaining population and denote it as the second non-dominated front \(F_{2}\); its individuals are ranked as \(I(i)_{rank}=2\). The above procedure is repeated on the remaining population to identify the third front, and so on until all fronts are identified. The computational complexity of the sorting is \(O(mN^{2})\), in which m is the number of objectives and N is the population size.
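The sorting outlined above can be sketched as follows, using the domination count \(n_{i}\) and the dominated set \(S_{i}\). The sketch assumes that every objective is minimized, which this section does not state explicitly; the function names are illustrative.

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def fast_non_dominated_sort(objs):
    """Return fronts as lists of indices into objs; fronts[0] is F1."""
    n = len(objs)
    S = [[] for _ in range(n)]   # S[i]: individuals dominated by i
    count = [0] * n              # count[i]: how many individuals dominate i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objs[i], objs[j]):
                S[i].append(j)
            elif dominates(objs[j], objs[i]):
                count[i] += 1
    fronts = [[i for i in range(n) if count[i] == 0]]
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in S[i]:
                # Removing front member i releases one dominator of j.
                count[j] -= 1
                if count[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]           # drop the trailing empty front

# Hypothetical two-objective values for five individuals.
objs = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
fronts = fast_non_dominated_sort(objs)   # [[0, 1, 2], [3], [4]]
```

The double loop over all pairs, each comparing m objectives, is where the \(O(mN^{2})\) cost comes from.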
Improved crowding distance
Once the non-dominated sort is complete, the crowding distance is assigned. Because the individuals are selected based on rank and crowding distance, all the individuals in the population are assigned a crowding distance value. The crowding distance of NSGA-II is calculated as:
$$ \begin{aligned} I(d_{i})=\sum_{m=1}^{M} \frac{I(i+1).m-I(i-1).m}{f_{m}^{max}-f_{m}^{min}},\\ i=2,3,\dots,n-1;I(d_{1})=I(d_{n})=\infty \end{aligned} $$
(13)
\(I(d_{i})\) is the crowding distance of individual I(i), and I(i).m is the value of the mth objective function of the ith individual. \(f_{m}^{max}\) and \(f_{m}^{min}\) are the maximum and minimum of the mth objective function of the individuals in the front. M is the number of objectives. The essence of the crowding distance is to measure how far apart the neighbours of each individual in a front are, across the M objectives of the M-dimensional objective space.
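Formula (13) can be sketched as below for a single front. The function name `crowding_distance` is illustrative, and skipping an objective whose range is zero is an assumption made here to avoid division by zero.

```python
import math

def crowding_distance(front_objs):
    """Crowding distance of each individual in one front (Formula 13).

    front_objs: list of objective tuples, one per individual.
    """
    n = len(front_objs)
    M = len(front_objs[0])
    dist = [0.0] * n
    for m in range(M):
        # Sort the front by the m-th objective.
        order = sorted(range(n), key=lambda i: front_objs[i][m])
        f_min = front_objs[order[0]][m]
        f_max = front_objs[order[-1]][m]
        # Boundary individuals always get an infinite distance.
        dist[order[0]] = dist[order[-1]] = math.inf
        if f_max == f_min:
            continue
        for k in range(1, n - 1):
            gap = front_objs[order[k + 1]][m] - front_objs[order[k - 1]][m]
            dist[order[k]] += gap / (f_max - f_min)
    return dist

# Hypothetical front with two objectives.
front = [(0, 4), (1, 3), (2, 2), (4, 0)]
d = crowding_distance(front)   # [inf, 1.0, 1.5, inf]
```

For the interior point (1, 3), the neighbours span 2/4 of the range in the first objective and 2/4 in the second, giving 1.0; for (2, 2) the spans are 3/4 and 3/4, giving 1.5.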
However, there are some limitations in calculating the crowding distance in NSGA-II, as shown in Fig. 5: (1) Individuals \(I(a_{1})\) and \(I(b_{1})\) are close to each other and far from the other individuals. By contrast, individuals \(I(d_{1})\) and \(I(e_{1})\) are more dispersed. Using the calculation in NSGA-II, the crowding distances of \(I(a_{1})\) and \(I(b_{1})\) are greater than those of \(I(d_{1})\) and \(I(e_{1})\). In gene selection, individuals \(I(a_{1})\) and \(I(b_{1})\) are retained simultaneously, and individuals \(I(d_{1})\) and \(I(e_{1})\) are eliminated. In fact, the ideal selection is that one of \(I(a_{1})\) and \(I(b_{1})\) is removed and both \(I(d_{1})\) and \(I(e_{1})\) are retained. (2) Individuals \(I(b_{2})\) and \(I(d_{2})\) have the same (or similar) crowding distances, and they would have the same (or similar) selection probabilities. However, the ideal situation is that the selection probability of \(I(d_{2})\) is greater than that of \(I(b_{2})\), because \(I(d_{2})\) is more uniformly spaced from its neighbours than \(I(b_{2})\).
To solve these problems, we propose an improved calculation strategy. Firstly, we introduce a crowding distance threshold for adjacent individuals in the same non-dominated front. If the distance between adjacent individuals is less than the threshold, some of the adjacent individuals are removed according to the elimination strategy. Secondly, the crowding distances are recalculated for the remaining individuals in the same front using Formula (13). A higher crowding distance indicates a better distribution of individuals.
Crowding distance threshold: let \(\theta _{m}^{k}\) be the threshold on the mth objective function in the non-dominated front \(F_{k}\). After sorting the individuals in front \(F_{k}\) by an arbitrary objective, the threshold is calculated as follows:
$$ \theta_{m}^{k}=\frac{f_{m}^{max}-f_{m}^{min}}{2(n-1)},m=1,2,\dots,M $$
(14)
The threshold is adjusted along with the evolution and the density of individuals. It is higher in the early stage of evolution and becomes smaller in the later stages. If \(|I(i+1).m-I(i).m|<\theta _{m}^{k}\) for all \(m \in \{1,2,\ldots,M\}\), then some similar individuals are removed by the following two strategies.
Removing strategy 1 If one of the solutions is a boundary solution in the front \(F_{k}\), then the non-boundary solution is removed. The main purpose is to keep the boundary solutions and expand the scope of the non-dominated solutions as soon as possible.
Removing strategy 2 If neither solution is a boundary solution, we introduce variables \(\varphi_{i}\) and \(\varphi_{j}\) for individuals I(i) and I(j) to judge which individual should be removed.
$$ \begin{aligned} \varphi_{i}&=\sum_{m=1}^{M} [I(i).m-I(i-1).m]\cdot[I(j+1).m-I(i).m],\\ \varphi_{j}&=\sum_{m=1}^{M} [I(j+1).m-I(j).m]\cdot[I(j).m-I(i-1).m],\\ j&=i+1 \end{aligned} $$
(15)
If \(\varphi_{i}>\varphi_{j}\), individual I(i) will remain and individual I(j) will be removed. In the opposite case, I(j) will remain and I(i) will be removed. As shown in Fig. 6, the distance variance \(\sigma _{i}^{2}\) between individuals and their centre is
$$ \begin{aligned} \sigma_{i}^{2}&=\frac{\sum_{m=1}^{M} \left[I(i).m-I(i-1).m-\frac{I(j+1).m-I(i-1).m}{2}\right]^{2}}{M} \\ &=\frac{\sum_{m=1}^{M} \left[\frac{(I(j+1).m-I(i).m)-(I(i).m-I(i-1).m)}{2}\right]^{2}}{M} \\ &=\frac{1}{4M}\sum_{m=1}^{M} [I(j+1).m-I(i-1).m]^{2}-\frac{1}{M}\varphi_{i} \end{aligned} $$
(16)
From Formula (16), I(j+1).m−I(i−1).m is a fixed value for individual I(i) and individual I(j). Therefore, the larger the \(\varphi_{i}\) of individual I(i) is, the smaller the \(\sigma _{i}^{2}\) is. In other words, the smaller the deviation of I(i) from the centre of I(i−1) and I(j+1), the better the preserved individual I(i) is for maintaining population diversity. It is worth noting that if I(i) and I(j) are repeated individuals, they would also be removed to avoid unnecessary genetic operations.
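The threshold of Formula (14) and the two removing strategies can be sketched together as follows. Several choices here are assumptions, since the text leaves them open: the front is sorted by its first objective, the threshold is computed once before any removal, the scan runs left to right and re-checks a pair after each removal, and repeated individuals are reduced to a single copy. The names `phi` and `thin_front` are illustrative.

```python
def phi(prev, mid, nxt):
    """Formula (15): sum over objectives of (mid - prev) * (nxt - mid)."""
    return sum((c - p) * (q - c) for p, c, q in zip(prev, mid, nxt))

def thin_front(front):
    """Remove over-crowded neighbours from one non-dominated front.

    front: list of objective tuples. Returns the retained tuples,
    sorted by the first objective.
    """
    pts = sorted(set(front))          # sorting also drops exact duplicates
    if len(pts) < 3:
        return pts
    n, M = len(pts), len(pts[0])
    # Formula (14): one threshold per objective.
    theta = [(max(p[m] for p in pts) - min(p[m] for p in pts)) / (2 * (n - 1))
             for m in range(M)]
    i = 0
    while len(pts) > 2 and i < len(pts) - 1:
        a, b = pts[i], pts[i + 1]
        if not all(abs(b[m] - a[m]) < theta[m] for m in range(M)):
            i += 1
            continue
        if i == 0:
            del pts[1]                # strategy 1: keep the left boundary
        elif i + 1 == len(pts) - 1:
            del pts[i]                # strategy 1: keep the right boundary
        elif phi(pts[i - 1], a, pts[i + 2]) > phi(pts[i - 1], b, pts[i + 2]):
            del pts[i + 1]            # strategy 2: a is better centred
        else:
            del pts[i]                # strategy 2: b is better centred
        i = max(i - 1, 0)             # re-check the pair formed by removal
    return pts

# Hypothetical front: two crowded pairs among five individuals.
front = [(0.0, 10.0), (0.2, 9.8), (4.0, 6.0), (4.4, 5.6), (10.0, 0.0)]
kept = thin_front(front)
# kept == [(0.0, 10.0), (4.4, 5.6), (10.0, 0.0)]
```

In this example the threshold is 1.25 on both objectives. The point (0.2, 9.8) is removed because its close neighbour is a boundary solution, and (4.0, 6.0) is removed because (4.4, 5.6) has the larger φ (49.28 against 48), meaning it lies closer to the midpoint of its outer neighbours. The crowding distances of the retained individuals would then be recalculated with Formula (13).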
Selection operator
The same tournament selection operator is used as in NSGA-II: (1) if two randomly chosen individuals belong to different non-dominated fronts, the individual in the lower-ranked (better) front is selected, so that the search proceeds towards the non-dominated solutions, and (2) if they belong to the same front, the individual with the greater crowding distance is selected to maintain the population diversity.
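The two comparison rules can be sketched as a binary tournament. The names `better` and `tournament_select` are illustrative, and breaking an exact tie in favour of the first individual is an assumption.

```python
import random

def better(i, j, rank, crowd):
    """Pick the winner of a binary tournament between individuals i and j.

    rank[k]:  non-dominated front number of individual k (1 is best).
    crowd[k]: crowding distance of individual k.
    """
    if rank[i] != rank[j]:
        return i if rank[i] < rank[j] else j      # rule (1): lower front wins
    return i if crowd[i] >= crowd[j] else j       # rule (2): less crowded wins

def tournament_select(rank, crowd, rng=random):
    """Draw two distinct individuals at random and return the winner."""
    i, j = rng.sample(range(len(rank)), 2)
    return better(i, j, rank, crowd)

# Hypothetical ranks and crowding distances for three individuals.
rank = [1, 2, 2]
crowd = [0.1, 0.9, 0.4]
parent = tournament_select(rank, crowd, random.Random(0))
```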
Improved crossover and mutation
Our proposed algorithm differs from NSGA-II in respect of the introduced hierarchical chromosome structure. It will use a different crossover and mutation operator based on the characteristics of the segment gene and the node gene. Because the starting segment of any path is the source segment, the first gene position of the segment gene does not participate in the crossover and mutation. However, all the gene positions of the node gene are involved in the genetic operation.
The segment gene uses “Order Crossover Operator (OX)”. In essence, a swath of consecutive alleles from parent A drops down, and the remaining values are placed in the child in the order in which they appear in parent B, as shown in Fig. 7.
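A minimal sketch of this crossover on the segment gene is given below, following the description above literally: the swath from parent A keeps its positions and the remaining values fill the other positions from left to right in parent B's order. The classic OX operator instead starts filling after the second cut point, so this is a simplified variant. The function name and the explicit cut points `lo` and `hi` are assumptions; the first gene is kept fixed, as required by Limitation 1.

```python
def order_crossover(parent_a, parent_b, lo, hi):
    """Order crossover on the segment gene, keeping position 0 fixed.

    lo, hi: cut points into the tail (the gene without position 0);
            the swath is tail_a[lo:hi].
    """
    assert parent_a[0] == parent_b[0], "both parents must share the source"
    tail_a, tail_b = parent_a[1:], parent_b[1:]
    swath = tail_a[lo:hi]
    # Remaining values, in the order in which they appear in parent B.
    rest = [g for g in tail_b if g not in swath]
    child_tail = rest[:lo] + swath + rest[lo:]
    return [parent_a[0]] + child_tail

# Hypothetical parents with source segment 1 and virtual segments 8 and 9.
a = [1, 7, 4, 8, 6, 2, 9, 5, 3]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9]
child = order_crossover(a, b, lo=2, hi=5)
# child == [1, 3, 4, 8, 6, 2, 5, 7, 9]
```

In a full GA the cut points would be drawn at random for each pair of parents.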
The node gene uses the “Partially Mapped Crossover Operator (PMX)”. In essence, parent A donates a swath of genetic material, and the corresponding swath from the other parent is sprinkled around in the child. Once this is done, the remaining alleles are copied directly from parent B, as shown in Fig. 8.
The segment gene is mutated by an exchange mutation operator, which exchanges two randomly selected genes. The node gene is mutated by an integer mutation operator, in which a gene is replaced by another integer value with a certain mutation probability. The mutation operators are shown in Fig. 9.
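Both mutation operators can be sketched as follows. The function names, the per-gene mutation probability `p_mut`, and the list `nodes_per_segment` (the number of nodes in each segment, used to keep a mutated NodeID valid) are assumptions for illustration.

```python
import random

def exchange_mutation(segment_gene, rng):
    """Swap two randomly chosen genes, never touching position 0."""
    g = list(segment_gene)
    i, j = rng.sample(range(1, len(g)), 2)
    g[i], g[j] = g[j], g[i]
    return g

def integer_mutation(node_gene, nodes_per_segment, p_mut, rng):
    """Replace each NodeID, with probability p_mut, by a random valid NodeID.

    nodes_per_segment[k] is the number of nodes in segment k + 1.
    """
    g = list(node_gene)
    for k in range(len(g)):
        if rng.random() < p_mut:
            g[k] = rng.randint(1, nodes_per_segment[k])
    return g

rng = random.Random(1)
segment_gene = [1, 7, 4, 8, 6, 2, 9, 5, 3]
node_gene = [7, 15, 9, 17, 10, 5, 12, 18, 3]
# Hypothetical segment sizes; S8 and S9 mirror the source segment S1.
nodes_per_segment = [20, 15, 12, 18, 10, 14, 16, 20, 20]
new_segments = exchange_mutation(segment_gene, rng)
new_nodes = integer_mutation(node_gene, nodes_per_segment, 0.1, rng)
```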
Elite strategy
The same recombination and selection strategy is used as in NSGA-II. The offspring population is combined with the current generation population, and selection is performed to set the individuals of the next generation. Because all the previous and current best individuals are included in the combined population, elitism is ensured. The combined population is sorted based on non-domination. The new generation is then filled front by front until the population size N is reached. If adding all the individuals in front \(F_{k}\) would make the population exceed N, then individuals in front \(F_{k}\) are selected in descending order of crowding distance until the population size is N.
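The front-by-front filling can be sketched as below, taking the fronts and crowding distances of the combined population as inputs. The function name `next_generation` and the input layout are illustrative.

```python
import math

def next_generation(fronts, crowd, N):
    """Select N individuals from the combined parent and offspring population.

    fronts: list of fronts, each a list of individual indices (F1 first).
    crowd:  crowd[i] is the crowding distance of individual i.
    """
    chosen = []
    for front in fronts:
        if len(chosen) + len(front) <= N:
            chosen.extend(front)              # the whole front fits
        else:
            # Fill the remaining slots with the least crowded individuals.
            by_spread = sorted(front, key=lambda i: crowd[i], reverse=True)
            chosen.extend(by_spread[:N - len(chosen)])
            break
    return chosen

# Hypothetical combined population of seven individuals, N = 5.
fronts = [[0, 1, 2], [3, 4, 5], [6]]
crowd = [math.inf, 1.0, math.inf, 0.5, math.inf, 1.2, math.inf]
survivors = next_generation(fronts, crowd, N=5)   # [0, 1, 2, 4, 5]
```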
Complexity analysis
The iMTSP is an improved multi-travelling salesman problem, which belongs to the class of non-deterministic polynomial (NP) problems. The computational complexity of our proposed algorithm is mainly contained in the clustering algorithm and the multi-objective genetic algorithm, with the latter playing the dominant role. The overall complexity of our algorithm is \(O(mN^{2})\), in which m is the number of objective functions and N is the number of individuals in the initial population.