A delay-tolerant routing algorithm for wireless sensor networks based on node selfishness

To address the intermittent connectivity of nodes in delay-tolerant sensor networks, and considering that nodes in the network exhibit social selfishness, we study how to reduce transmission delay and communication overhead while ensuring the message delivery success rate of the entire network. We present a delay-tolerant routing algorithm based on node selfishness (DTSNS). First, we divide the activity area into grids. Second, we predict the next location of each mobile node using a Markov process and estimate the credibility of nodes from their selfishness, in order to reduce communication overhead and improve the message delivery success rate. Third, we put forward a delay-tolerant algorithm based on delay constraints. Finally, simulation results show that DTSNS achieves a higher data delivery rate, lower message delay, and lower resource consumption than Spray and Focus, Spray and Wait, and Epidemic.


Introduction
In recent years, routing has been a core issue in research on wireless sensor networks. Network topologies have evolved from simple structures, in which no node can move, to more complex ones, such as heterogeneous structures in which the monitoring nodes are stationary and the relay nodes are mobile, and structures in which all nodes move. The intermittent connectivity caused by the random mobility of nodes poses great challenges to routing [1]. T. Spyropoulos et al. have presented routing for disruption-tolerant networks, giving routing modules that depend on the network characteristics exhibited [2]. A. Vasilakos et al. have presented protocols and applications of delay-tolerant networks (DTNs) [3].
A wireless sensor network is a typical DTN [1]. A. Dvir et al. have presented a backpressure-based routing protocol for DTNs [4]. To cope with the intermittent connectivity of nodes in a DTN, a routing strategy based on carrier transfer is generally adopted: a node carries the message until it encounters an appropriate relay node, and the message is then delivered from the initial node to the destination node over multiple hops. Consequently, the performance of the routing algorithm depends heavily on the intermediate nodes [5], and messages take much longer to reach the destination (that is, the network delay is large). Moreover, in a delay-tolerant wireless sensor network (DTSN), nodes may exhibit social selfishness and refuse to relay messages for other nodes in order to reduce their own overhead. Measures are therefore needed to reduce the effect of node selfishness on network performance [6,7]. When relay nodes are selfish, it is of great significance to predict the neighbor nodes within communication range for data delivery, to ensure the message delivery success rate of the entire network, and to reduce transmission delay and communication overhead. To this end, this paper puts forward a delay-tolerant routing algorithm for wireless sensor networks based on node selfishness.
The algorithm proceeds in three steps. In the first step, we divide the activity area into grids and predict the next location of each mobile node using a Markov process: we calculate the probability of the node reaching each grid and choose the grid with the largest probability as its next location, so as to build the best possible route and reduce transmission delay (sections 3.2 and 3.3). In the second step, we estimate the credibility of the mobile relay nodes from their selfishness, in order to reduce communication overhead and improve the message delivery success rate (section 3.4). In the third step, we put forward the delay-tolerant algorithm based on delay constraints (section 3.5). Section 4 presents the performance analysis and simulation experiments, and section 5 summarizes the paper.

Related research
Current research on delay-tolerant routing in sensor networks focuses mainly on message-copy routing and data-forwarding routing [1]. Copy-based routing strategies improve the message delivery rate and reduce message delay mainly by controlling the number of message copies. According to the number of copies in the network, they can be divided into single-copy routing [8,9] and multi-copy routing [10,11]. Each strategy has advantages and disadvantages. Single-copy routing keeps fewer copies than multi-copy routing, so it incurs less overhead and prolongs the lifetime of the sensor nodes and the network. Multi-copy routing incurs a larger overhead, but by increasing the number of copies it lets the message reach the destination node quickly, which improves the message delivery rate and reduces network delay. When the network structure changes dynamically, the worst case for single-copy routing is that the node holding the message never meets the destination node, resulting in a high message loss rate. Multi-copy routing is therefore more reliable than single-copy routing. The main representative algorithms are the direct transmission routing protocol [12,13], the first-time-connection routing protocol [8], the random routing protocol [9], the Spray and Wait routing protocol [14], the Epidemic routing protocol [10], and the RDAD routing protocol [15]. Routing protocols based on data forwarding mainly use relevant knowledge to build a model. The model predicts the probability that each surrounding node reaches the destination node, which gives the forwarding probability of that node; the node with the largest forwarding probability is then chosen to relay the message. The main representative algorithms are the PROPHET routing protocol [16], the Spray and Focus routing protocol [17], and FPAD routing [18].
When two nodes meet, they can communicate with each other and forward data. In a DTN, however, some nodes may refuse data delivery requests from other nodes out of concern for their own resource usage [19], and some nodes do not obey the rules of the relevant protocol: they exploit defects in the protocol rules to obtain more resources than other nodes in the network [6,7]. When there are selfish nodes in a delay-tolerant network, some nodes cannot deliver data successfully even when they meet. Node selfishness greatly hinders normal communication between nodes, degrades network performance, and fundamentally reduces network connectivity [20]. Related research [21] shows that when 10% to 40% of the nodes are selfish, network throughput may drop by 16% to 32%. Motivating nodes to deliver data and reducing their socially selfish behavior is therefore of great significance [22].

Problem description and model assumption
In a DTN, relay nodes mostly use the 'store-carry-forward' approach [23] to deliver data. To route data to its destination in time and achieve a higher transmission success rate, as many nodes as possible should take part in connection detection, and existing connections should be detected promptly. Meanwhile, the number of messages that relays carry and send out should be kept as small as possible to reduce delay and bandwidth cost.
In this paper, we assume that all nodes are distributed in a 2D plane. The entire wireless sensor network consists of M mobile sensor nodes $\{M_1, M_2, M_3, \ldots, M_m\}$ and N stationary sensor nodes $\{N_1, N_2, N_3, \ldots, N_n\}$, with $M \ll N$. The mobile nodes only forward data and do not collect it. The stationary nodes deliver collected data to the mobile nodes or directly to the sink node, while the mobile nodes can only forward data among themselves or deliver it to the sink node. Parts of the network may be disconnected, dividing the whole network into multiple connected subsets; such a network is called a 'fragmentation network' [24]. In this situation, mobile nodes are needed to forward data, as shown in Figure 1. A mobile node moves as follows: first, it randomly selects a target location; second, it moves toward the target at a certain speed; third, it stays for some time on reaching the target. It then randomly selects the next target location and continues, and so forth. The residence time of a node cannot exceed a certain threshold, and its moving speed cannot exceed a predetermined maximum. To avoid wasting energy on random movement, we set the rule that a mobile node reverses direction after forwarding data to the sink node or after arriving at the network boundary.

Divide the activity area into grids
For a given information collection area, the maximum length is L and the maximum width is W, and the average moving speed of the nodes is V. We divide the area into regular squares, that is, grids. Here, ⌈x⌉ denotes the smallest integer not less than x. We number the grids: the grid containing the sink node is (0, 0), the grid to its left is (−1, 0), and the grid to its right is (1, 0), as shown in Table 1.
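The grid numbering can be illustrated with a minimal Python sketch. The grid side length is not recoverable from the text above, so it is passed in as a hypothetical parameter `cell`; the function names and the convention that grid (0, 0) is centred on the sink are likewise assumptions made for illustration only.

```python
import math


def grid_index(x, y, sink_x, sink_y, cell):
    """Map a planar position to an integer grid number relative to the sink.

    Assumption: grids are axis-aligned squares of side `cell`, and the grid
    numbered (0, 0) is centred on the sink node.
    """
    col = math.floor((x - sink_x) / cell + 0.5)
    row = math.floor((y - sink_y) / cell + 0.5)
    return (col, row)


def grid_counts(length, width, cell):
    """Number of grid columns and rows needed to cover an L x W area.

    The ceiling mirrors the use of the ceiling operator in the text.
    """
    return (math.ceil(length / cell), math.ceil(width / cell))
```

With `cell = 10` and the sink at the origin, a node at (12, 0) falls in grid (1, 0) and a node at (−12, 0) falls in grid (−1, 0), which matches the left/right numbering described above.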

Predict the next mobile location of the nodes based on first-order Markov process
A Markov chain is a discrete stochastic process in which the transition of the system from one state to another depends only on the current state and not on the previous history. In a delay-tolerant network, node movement follows the random walk model: the location of a node at the next moment depends only on its current location and not on its earlier locations. This is precisely the Markov property. We therefore define the location of a node as a stochastic process, as shown in formula 1.
From the node motion model, we know that the process is a first-order Markov process, so we adopt a first-order Markov process to predict the location of the current node at the next moment. From the Markov property, we obtain the Markov probability equation, shown in formula 2.
Observing an active node, we record its location after each time interval ΔT. The resulting location sequence is shown in formula 3.
The symbols $w_1, w_2, w_3, \ldots, w_n$ denote the grid numbers in the location sequence H of the mobile node. To calculate the transition probability of an active node to each grid, we write $w_{i,j}$ for a move of the node from location i to location j. From formula 3, we calculate the probability of moving from i to each grid, as shown in formula 4.
Table 1 Numbering of the activity area (grids) of mobile nodes
The symbol $F_{sub}(H, ij)$ denotes the frequency of the subsequence ij in the history location sequence H of node A; it reflects how often, in past movement, the node moved to location j immediately after location i. The symbol $F_{sub}(H, i)$ denotes the frequency of the subsequence i in H; it reflects how often the node moved to location i in the past. The transition probabilities in the historical sequence therefore reflect the probability of the next location. The Markov location transition matrix can be built from these probability values, as shown in formula 5.
As a node runs in the network, its history location sequence grows, which makes its forecasts more accurate. Each row of the matrix should sum to 1, as shown in formula 6.
The practical meaning is that, when the node is at position i, the probabilities of arriving at all grids in the network sum to 1, since moving to some grid is a certain event. The predicted location of the node at the next moment is therefore the grid j that satisfies formula 7.
From formulas 3, 5 and 7, we obtain the most probable movement path of the node, as shown in formula 8.
The symbol $w_p[jx]$ denotes the most probable grid at the next moment when the node is now at location j.
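The prediction step can be sketched in Python as follows. The sketch assumes that the transition probability is the ratio of the frequency of the subsequence ij to the frequency of i in the history H, and that the predicted grid is the one with the largest transition probability from the current grid. The function names are hypothetical, and one detail is a deliberate simplification: the last entry of the history is not counted as a departure state, so that every row sums to 1.

```python
from collections import Counter, defaultdict


def transition_probabilities(history):
    """Estimate first-order transition probabilities from a location sequence.

    probs[i][j] = (times i was followed by j) / (times i was a departure state).
    Assumption: the final entry of `history` has no successor, so it is not
    counted as a departure; this keeps every row summing to 1.
    """
    pair_counts = Counter(zip(history, history[1:]))
    from_counts = Counter(history[:-1])
    probs = defaultdict(dict)
    for (i, j), count in pair_counts.items():
        probs[i][j] = count / from_counts[i]
    return dict(probs)


def predict_next(probs, current):
    """Return the grid with the largest transition probability from `current`,
    or None when `current` was never observed as a departure state."""
    row = probs.get(current)
    if not row:
        return None
    return max(row, key=row.get)
```

For example, for the history (0,0), (1,0), (0,0), (1,0), (0,0), (0,1), the node left grid (0,0) three times, twice toward (1,0), so the predicted next grid from (0,0) is (1,0) with probability 2/3.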

Estimate the credibility threshold $R_0$ based on the node selfishness
In a delay-tolerant sensor network with social selfishness, two nodes can communicate only when two conditions hold at the same time. First, the nodes are within each other's communication range and can set up a physical connection. Second, each node is willing to transfer data for the other. For a socially selfish node, the more nodes there are in its willing forwarding domain, the stronger its data relay ability. When choosing relay nodes, we should therefore prefer nodes whose willing forwarding domain contains more nodes, because they can more easily pass the message on in subsequent relaying; this reduces the effect of selfish nodes on network performance. A larger willing forwarding domain indicates that the node has higher credibility in the network: it is willing to deliver data for other nodes and has lower social selfishness. We therefore introduce credibility to evaluate nodes, as shown in formula 9.
The symbol $M_i$ denotes the number of nodes in the willing forwarding domain of node i, and N denotes the number of all nodes in the entire network other than node i. The symbol $R_i$ is the credibility of node i in the network. Nodes with higher credibility behave less selfishly; conversely, nodes with lower credibility behave more selfishly. The average credibility of the nodes reflects the overall relationship between nodes in the network, so we can use it to characterize the social selfishness of the network. This gives the mathematical description of the social selfishness of the network, shown in formula 10.
From the expression for network selfishness, the higher the average credibility of the nodes in the network, the lower the selfishness of the network.
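A small Python sketch may help make the two quantities concrete. Formulas 9 and 10 are not reproduced in the text, so both function bodies are assumptions: credibility is taken as the share of the other N nodes that lie in the willing forwarding domain, and network selfishness is taken as one minus the average credibility, which is one simple form consistent with the inverse relationship stated above. The function names are hypothetical.

```python
def credibility(willing_count, other_nodes):
    """Assumed reading of formula 9: credibility = willing_count / other_nodes.

    willing_count: number of nodes in the willing forwarding domain of node i
    other_nodes:   number of nodes in the network other than node i
    """
    if other_nodes <= 0:
        raise ValueError("other_nodes must be positive")
    return willing_count / other_nodes


def network_selfishness(credibilities):
    """Assumed reading of formula 10: one minus the average credibility."""
    if not credibilities:
        raise ValueError("at least one credibility value is required")
    return 1.0 - sum(credibilities) / len(credibilities)
```

Under these assumptions, a node willing to forward for 3 of 10 other nodes has credibility 0.3, and a network whose nodes all have credibility 0.5 has selfishness 0.5.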
In the injection stage, a node can choose where to inject message copies according to credibility. Injecting into nodes with higher credibility speeds up the later spread of the copies. If copies are injected into nodes with lower credibility, those nodes may not meet a node they can inject into for a long time in later stages, which makes the injection delay large. The worst case is injecting a copy into a node whose credibility is 0: the copy becomes invalid because it can never be passed on from that node. To improve the injection efficiency of message copies in the network, we set an injection threshold $R_0$: a node holding copies injects them only into nodes in its willing forwarding domain whose credibility exceeds $R_0$. The main steps of the algorithm are as follows:
Step 1: When a mobile node passes a static node, it receives the data from that static node.
Step 2: Look for the shortest grid link to the sink node, as described in section 3.5.2.
Step 3: Check whether any node within communication range on the grid link has a credibility value larger than $R_0$. If so, the carrier transfers the data to it and deletes its own copy, as described in section 3.4.
Step 4: If there is none, use the Markov prediction method in section 3.3 to calculate the most probable movement track of the carrier and compare it with the grid link to check for coincident points. If there is a coincident point, the node keeps carrying the message as it moves.
Step 5: If there is none, check whether any node within the communication range of the carrier has a credibility value larger than $R_0$. If so, copies of the message are injected according to the rules in section 3.5.3.
Step 6: If none of the above conditions is met, the node carrying the message moves on to the next round.
Step 7: The whole process ends when the information reaches the sink node.
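Steps 3 to 6 amount to a prioritised decision made by the message carrier at each encounter round. The following Python sketch shows one possible encoding; the function name, the tuple layout of `neighbours`, the returned action labels, and the choice of the most credible on-link neighbour in Step 3 are all assumptions made for illustration, and the comparison against the threshold uses "greater than or equal", following the injection rule stated in section 3.5.3.

```python
def choose_action(path_cells, neighbours, predicted_track, r0):
    """Pick the carrier's action for one encounter round (Steps 3 to 6).

    path_cells:      set of grid numbers on the shortest grid link to the sink
    neighbours:      list of (node_id, grid, credibility) within radio range
    predicted_track: grids the carrier is predicted to visit next
    r0:              credibility threshold R0
    """
    # Step 3: hand the message to a credible neighbour located on the grid link.
    on_link = [n for n in neighbours if n[1] in path_cells and n[2] >= r0]
    if on_link:
        best = max(on_link, key=lambda n: n[2])  # assumption: prefer highest credibility
        return ("transfer", best[0])

    # Step 4: keep carrying if the carrier's own predicted track meets the link.
    if any(cell in path_cells for cell in predicted_track):
        return ("carry", None)

    # Step 5: otherwise inject copies into every credible neighbour in range.
    credible = [n[0] for n in neighbours if n[2] >= r0]
    if credible:
        return ("inject", credible)

    # Step 6: no condition holds, so wait for the next round.
    return ("wait", None)
```

Ordering the checks this way reflects the steps as listed: a direct hand-over along the grid link is tried first, carrying is preferred when the carrier itself is heading toward the link, and copy injection is the fallback.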

Grid link
By elementary geometry, the straight line between two points is the shortest path; in a two-dimensional grid, moving along the diagonal is the shortest. The shortest grid link can therefore be expressed as a number of diagonal grid displacements plus a number of displacements along the x and y axes.
The number of grids from each grid to the diagonal is the minimum of the absolute values of that grid's coordinates, as shown in formula 11.
The symbol $Num_d$ is the number of grids from grid $W_i$ to grid $W_{00}$. The symbols $|w_{i,x}|$ and $|w_{i,y}|$ are the absolute values of the x and y coordinates of grid i. From formula 11, we obtain the displacements from grid $W_i$ to the sink grid when moving along the line y = x, as shown in formula 12.
The symbols $Num_{dx}$ and $Num_{dy}$ are defined as the numbers of grids along the x and y coordinates from grid $W_i$ to grid $W_{00}$.
From formulas 11 and 12, we obtain the shortest grid link, as shown in formula 13.
The symbol $path(W_i - W_0)$ is the number of grids from grid $W_i$ to grid $W_0$. The symbol d is short for $Num_d$. The grid link is shown as the shaded part in Figure 2.
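The grid link computation can be sketched in Python under one reading of formulas 11 to 13, which are not reproduced in the text: the diagonal count is the smaller absolute coordinate, the remaining axis displacements are what is left of each coordinate after the diagonal moves, and the link length is their sum. The function names are hypothetical.

```python
def grid_link_length(x, y):
    """Number of grid moves from grid (x, y) to the sink grid (0, 0).

    Assumed reading: Num_d = min(|x|, |y|), Num_dx = |x| - Num_d,
    Num_dy = |y| - Num_d, and the link length is Num_d + Num_dx + Num_dy.
    """
    num_d = min(abs(x), abs(y))
    num_dx = abs(x) - num_d
    num_dy = abs(y) - num_d
    return num_d + num_dx + num_dy


def _step_toward_zero(value):
    """Move one grid closer to zero along a single axis."""
    if value > 0:
        return value - 1
    if value < 0:
        return value + 1
    return 0


def grid_link(x, y):
    """List the grids visited from (x, y) to (0, 0): diagonal moves while both
    coordinates are non-zero, then moves along the remaining axis."""
    cells = [(x, y)]
    while (x, y) != (0, 0):
        x, y = _step_toward_zero(x), _step_toward_zero(y)
        cells.append((x, y))
    return cells
```

Under this reading the link length equals the larger of the two absolute coordinates, so a node in grid (2, −3) reaches the sink grid in three moves.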

The rules of copy injection
When the message carrier S meets the node set $Sum_i$ in its willing forwarding domain, it judges whether the credibility value $R_i$ of each node i reaches $R_0$. If $R_i \ge R_0$, the carrier injects a copy of the message into node i; otherwise, it does not. To quantify the delay conditions at injection, we introduce two time parameters for a message: $T_s$ and $T_{cur}$. $T_s$ is the effective time delay of message M, in other words, the prescribed target delay of message M, which depends on the delay requirement of the application. $T_{cur}$ is the time that has already elapsed. Message M is considered valid only if it is passed to the target node within $T_s$. When the time used by message M exceeds $T_s$, the message is considered invalid and nodes can automatically delete its copies.
Figure 2 The shortest grid area.
The value $T_s$ can be carried in a field added to the packet header of message M. When the message is produced, the application layer sets this field, which is then used to check the time constraint during subsequent transmission. The value $T_{cur}$ can be obtained directly from message timestamps.
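The delay constraint can be illustrated with a short Python sketch. The `Message` class, its field names, and the helper functions are hypothetical; the sketch only assumes that the elapsed time is the difference between the current time and the creation timestamp, and that a message stays valid while the elapsed time does not exceed the target delay.

```python
from dataclasses import dataclass


@dataclass
class Message:
    created_at: float  # timestamp written when the message is produced
    ts: float          # target delay T_s set by the application layer


def elapsed(msg, now):
    """Time already used by the message (T_cur), derived from its timestamp."""
    return now - msg.created_at


def is_valid(msg, now):
    """A message is valid while its elapsed time does not exceed T_s."""
    return elapsed(msg, now) <= msg.ts


def purge_expired(buffer, now):
    """Return the buffer without copies whose delay budget has run out."""
    return [m for m in buffer if is_valid(m, now)]
```

A node would call something like `purge_expired` before each injection round, so that expired copies are neither stored nor forwarded.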

The simulation experiment
In the simulation experiment, the first step is to use the SETDEST tool in NS-2 to simulate the motion model. We set the size of the movement area, the movement speed of the nodes, the number of nodes in the area, and the simulation running time. The second step is to use CBRGEN to produce the packets delivered between nodes. Finally, three metrics are reported: the message delivery rate, the message delay, and the network overhead.
The successful delivery rate of message - The theoretical value of the successful delivery rate is 100%. It is the ratio of the amount of information received by the destination node to the amount of information sent by the source node within a certain time limit.
The message delay - The message delay is the average time taken for a message to be delivered successfully from the source node to the destination node.
The overhead of message delivered - The number of message copies reflects the network resources consumed by a message during transfer. First, each copy occupies cache space and so consumes storage. Second, transferring a message consumes energy. Third, a message occupies a communication channel while it is transferred and so consumes network bandwidth. The cost cannot be estimated directly, but it is usually related to the number of message copies in the network: the more copies there are, the higher the link cost, and conversely. The number of copies can therefore reflect both the network resource overhead and the link cost of a message during transfer.
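The three metrics can be computed from a simulation trace along the following lines. This is only a sketch: the record layout (a send time paired with a delivery time, or `None` for undelivered messages), the function name, and the returned dictionary keys are assumptions, not the format produced by NS-2.

```python
def summarize(records, copies_created):
    """Compute delivery rate, mean delay and copy overhead from a trace.

    records:        list of (sent_at, delivered_at) pairs, with delivered_at
                    set to None when the message never reached its destination
    copies_created: total number of message copies generated in the network
    """
    delays = [done - sent for sent, done in records if done is not None]
    rate = len(delays) / len(records) if records else 0.0
    mean_delay = sum(delays) / len(delays) if delays else None
    return {
        "delivery_rate": rate,     # received messages / sent messages
        "mean_delay": mean_delay,  # averaged over delivered messages only
        "copies": copies_created,  # proxy for network overhead
    }
```

Note that the mean delay is averaged over successfully delivered messages only, in line with the definition above.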

Impact of node density on the performance of the algorithm
To examine how node density affects the performance of the algorithms, we keep the social selfishness of the nodes, the movement speed of the nodes, and the buffer size of the nodes unchanged, and observe how the message delivery rate and the message transmission delay change as the number of nodes in the network increases. The results are shown in Figures 3 and 4.
The simulation results show that, for all algorithms, the message delivery rate improves as the number of nodes increases, and that once the number of nodes reaches a certain value, the delivery rate no longer grows. For Epidemic, the delivery rate is the highest initially and grows fastest as nodes are added, but when the number of nodes reaches 60 it begins to fall. For the other routing protocols, the delivery rate keeps increasing as the number of nodes grows and then remains basically unchanged after reaching its peak. The DTSNS algorithm proposed in this paper has a lower delivery rate than Epidemic at the beginning, but a higher one when the number of nodes reaches 80, which indicates that DTSNS is suitable for large-scale wireless sensor networks. DTSNS also has a higher message delivery rate than the other two algorithms.
Figure 4 shows that the message transmission delay falls as the number of nodes increases and stops falling once the number of nodes exceeds a certain value. When the number of nodes exceeds 60, the transmission delay of Epidemic increases instead, while that of the other algorithms remains unchanged. The delay of DTSNS remains stable throughout, because the number of message copies is controlled: as the number of nodes in the network increases, the number of copies in DTSNS does not keep increasing. As a result, the algorithm has a higher message delivery rate and a lower message transmission delay.

Impact of node cache on the performance of the algorithm
The buffer size of a node affects the performance of the routing algorithm. To observe the algorithms under different buffer sizes, we keep the number of nodes, the selfishness of the network, and the movement speed of the nodes unchanged. The message delivery rate and message transmission delay as the buffer capacity of the nodes increases are shown in Figures 5 and 6.
Figure 5 shows that the message delivery rate of each algorithm increases with the buffer capacity of the nodes. Epidemic is the most sensitive to buffer capacity: when the buffer capacity is small, its delivery rate is relatively low, but as the buffer capacity grows, its delivery rate exceeds that of the three other protocols. For the DTSNS algorithm proposed in this paper, the delivery rate is the highest of all when the buffer capacity is small and second only to Epidemic when the buffer capacity is large. Figure 6 shows that the message transmission delay follows the opposite trend. Together, these results show how strongly each algorithm depends on network resources. Epidemic depends on network resources far more than the others: with sufficient resources its performance is the best; otherwise, it is the worst. DTSNS performs well whether the buffer capacity is large or small.

Impact of network selfishness on the performance of the algorithm
To observe how socially selfish nodes affect network performance, we apply the algorithms in networks with different degrees of social selfishness. We keep the buffer size of the nodes, the movement speed of the nodes, and the number of nodes unchanged, and examine how the message delivery rate and message transmission delay vary with the selfishness of the network, as shown in Figures 7 and 8.
Figure 7 shows that the message delivery rate of each algorithm drops sharply as the selfishness of the network increases. In comparison, the DTSNS routing protocol performs best and the Spray and Wait routing protocol performs worst. As the selfishness of the network increases, the average credibility of the nodes falls and fewer nodes are willing to forward data for other nodes, which reduces network connectivity. Reduced connectivity lowers the message delivery rate and increases the message transmission delay of all algorithms. For the Epidemic algorithm, which uses a flooding strategy, the message delivery rate is relatively low compared with the other routing algorithms. For the Spray and Wait and Spray and Focus algorithms, if copies are injected into nodes with low credibility during the spreading stage, the spread of copies in the network is seriously hindered; because the number of actually valid messages falls, the message delivery rate becomes very low. For the DTSNS routing protocol, the injection of message copies follows the scheme based on the node credibility threshold, so that copies are injected effectively.

Figure 5 The message delivery rate changes with the buffer capacity of nodes (x-axis: buffer size*100; y-axis: message delivery rate; series: Epidemic, Spray and Wait, Spray and Focus, DTSNS).

The number of copies of message produced by different algorithms
Network overhead is another indicator of network performance, usually described by the total number of message copies. The number of copies for each algorithm is shown in Figure 9. For all algorithms the number of copies increases over time, but the copies of Epidemic spread the fastest, growing exponentially. Spray and Wait and Spray and Focus limit the maximum number of copies in the network, so the number of copies becomes constant after growing to a certain level. For the DTSNS routing protocol, the number of copies is controlled adaptively by the delay constraints and is the smallest. More copies consume considerably more network resources, because copies occupy node cache and need a large amount of energy and network bandwidth during forwarding. For a DTSN, which has limited network resources, such overhead is unbearable. So, although the Epidemic algorithm performs well, it is rarely used in practical applications.

Conclusion
With very broad application prospects, DTSNs have attracted a large number of researchers. The limited resources and intermittent connectivity of a DTSN make routing difficult. It is of great significance to reduce transmission delay and communication overhead while ensuring the message delivery rate when network nodes are socially selfish. In this paper, we propose a delay-tolerant routing algorithm based on node selfishness. First, we divide the activity area into grids. Second, we predict the next location of each mobile node using a Markov process and estimate the credibility of nodes from their selfishness, in order to reduce communication overhead and improve the message delivery success rate. Third, we put forward the delay-tolerant algorithm based on delay constraints. Finally, the simulation results show that, compared with Spray and Focus, Spray and Wait, and Epidemic, the new algorithm achieves a higher data delivery rate, lower latency, and lower resource consumption in large-scale networks.