Data Dissemination in Wireless Sensor Networks with Network Coding

In wireless sensor networks (WSNs), it is often necessary to update the software running on sensors, which requires reliable dissemination of large data objects to each sensor with energy efficiency. During data dissemination, due to sleep scheduling designed for energy efficiency, some sensors may not receive some packets at some time slots. In the meantime, due to the unreliability of wireless communication, a sensor may not successfully receive a packet even when it is in the active mode. Thus, retransmission of such packets to those sensors is necessary, which consumes more energy and increases the delay of the data dissemination cycle. In this paper, we propose a network coding-based approach to data dissemination such that data dissemination can be accomplished at the earliest time. Thus, less energy is consumed and the delay can be decreased. The impact of the packet loss probability and the sleep probability of sensors on the network coding gain is analyzed. A threshold is also given to decide whether the current sleep scheduling is effective for energy saving in the data dissemination process. Simulation results demonstrate the effectiveness and scalability of the proposed work.


Introduction
Recently, more research attention has been directed towards wireless sensor networks. Once deployed, sensors are expected to operate for extended periods of time, and it is impractical to physically reach all sensors. However, it is quite often necessary to update the software running on those sensors or add new functionality to the sensors [1][2][3]. Reprogramming the network requires reliably disseminating large data objects (50-100 KB) to every sensor in the network with energy efficiency [2].
Protocols for reliably disseminating large data objects in WSNs have been developed over the years. Protocols in [1][2][3][4] achieve data dissemination reliability through different mechanisms such as hop-by-hop recovery and NACK or ACK mechanisms, while another requirement of disseminating large objects in WSNs, energy efficiency, has not been well studied.
In WSNs, energy consumption is a critical issue and sleep scheduling has been well studied as a conservative approach to minimize the energy consumption due to idle listening [5,6]. Though sleep scheduling can save energy, sensors in sleep mode cannot receive data packets. In addition, due to the unreliability of wireless communication, a sensor may not receive the packet successfully even when it is in active mode [7]. Hence, a data packet may be transmitted several times in order to be disseminated to all sensors, which wastes energy and increases the delay of the whole data dissemination process. In other words, the data dissemination process consists of sending native data packets and recovering "wanted" packets that each sensor has not received due to sleep scheduling and/or link unreliability. In order to complete the data dissemination process in a timely manner and achieve energy efficiency, it is crucial to ensure that the maximum number of "wanted" packets at all sensors can be recovered at each time slot.
Recently, network coding has become a promising approach to improve the system throughput in wireless networks. Network coding with XORs operation in wireless broadcast has been studied in [8], which shows the advantage of the proposed XORs coding scheme over traditional wireless broadcast in bandwidth efficiency through simulations and theoretical analysis. In XORs coding, a coded packet carries both the coding vector information and the encoded data. Thus, upon receiving a coded packet, the receiver knows which packets are encoded together and how to decode the packet with the packets available at the receiver. The work in [9] has proved that the optimal XORs encoding decision for wireless broadcast, which decides the coding vector of each coded packet, is an NP-hard problem. Heuristic algorithms for the encoding decision problem in wireless broadcast and multicast are proposed in [9,10]. However, the proposed encoding decision approach can only be applied to the scenario where all receivers remain active during the whole time period of recovery. Such an approach cannot be applied to WSNs with sleep scheduling because different sets of active sensors may be available at different time slots.
In this paper, given the sleep scheduling information at the sensors, we aim to determine an effective XORs encoding strategy such that the minimum number of transmissions is required in order for each sensor in the network to successfully receive the whole set of disseminated data packets. Thus, energy consumption can be reduced and the data dissemination process can be accomplished in a timely manner. To achieve such an objective, it is important to maximize the expected number of active sensors that can decode out one "wanted" data packet at each time slot in the recovery process, which is the focus of this paper. The contribution of the proposed work is summarized as follows.
(i) The proposed work takes both link unreliability and sleep scheduling into consideration and proposes an XORs encoding decision algorithm to maximize the expected number of active sensors that can decode out one native packet in their "wanted" data packet sets at each time slot in the recovery process.
(ii) We analyze the impact of each link's packet loss probability and each sensor's sleep probability at each time slot on the network coding gain, which is an extension of the analysis given in [8].
(iii) We also study the effectiveness of sleep scheduling on energy saving, which is offset by the total number of active time slots consumed in the data dissemination process. A threshold is derived to decide whether the current sleep scheduling is effective for energy saving. The simulation results also confirm the accuracy of our analysis.
The rest of the paper is organized as follows. Related work is reviewed in Section 2. Section 3 introduces the system architecture and data dissemination schemes. The problem description and its complexity are presented in Section 4. Section 5 describes the algorithm design. Theoretical analysis is given in Section 6. Section 7 gives the simulation results. Finally, we conclude the paper in Section 8.

Related Work
In this section, we review the related work on network coding in WSNs. Network coding was originally proposed in information theory [11] and has recently become a promising approach to improve the system throughput in wireless networks [11][12][13][14][15][16]. Adaptive network coding is proposed in [17] to reduce traffic in the process of software updates, where a linear network coding technique is used. As the computation ability and the memory at sensor nodes are very limited, the complexity of linear encoding and decoding introduces extra overhead. Thus, it is more appropriate to use XORs operation in WSNs since both encoding and decoding operations are much simpler. In fact, XORs coding has been widely used in wireless networks to reduce the complexity of linear network coding [8,10,18,19]. COPE, proposed in [18], improves the throughput of unicast with XORs coding. By exploiting the broadcast nature of the wireless medium, each node buffers overheard packets for a short time and notifies its neighbors which packets it has heard. When a node transmits a packet, it uses its knowledge of what its neighbors have heard to perform opportunistic coding and XORs multiple packets to transmit them as a single packet, while ensuring that each intended next hop has enough information to decode the encoded packet.
Network coding with XORs operation in wireless broadcast has also been studied in [8], which shows the advantage of the proposed network coding scheme over traditional wireless broadcast in bandwidth efficiency through simulations and theoretical analysis. However, no encoding decision algorithm is given in [8]. The work in [9] has proved that the optimal XORs encoding decision problem for wireless broadcast is NP-hard.
Several heuristic algorithms for encoding decision in wireless broadcast and multicast have been proposed in [9,10]. With the knowledge of the "wanted" packet set at each receiver, an auxiliary graph is constructed. The encoding decision during the recovery process is then converted to a clique partition problem in the auxiliary graph. However, the proposed encoding decision algorithms can only be applied to the scenario where all receivers remain active during the time period of recovery. Such an approach cannot be applied to WSNs since different sets of active sensors may be available at different time slots. Thus, encoding decision in WSNs with sleep scheduling cannot be converted into finding a minimum clique partition in the graph.
The work in [20] proposes a retransmission scheme, which only uses reception estimation to determine the coding set selection. However, the reception estimation at the source node may not be accurate enough; consequently, some receivers may not be able to decode useful information from the coded packet and more retransmissions will be needed. In addition, the coding decision based on reception estimation does not consider the impact of sleep scheduling, which affects the decoding probability at the receivers in low duty-cycled WSNs.

System Architecture and Data Dissemination with Network Coding
In this paper, we propose to use XORs coding in data dissemination in a large-scale WSN which is organized as a multihop cluster hierarchy [21]. A multihop cluster hierarchical architecture consists of multiple layers, as shown in Figure 1. In the lowest layer, all the nodes in the network are grouped into clusters. In addition, besides being a member of a cluster, a node may act as a cluster head in a lower-layer cluster, for example, $p_2$ in the figure. Within each cluster, the cluster head communicates with its member sensors in a one-hop fashion [22]. We also assume that each sensor is aware of its one-hop neighbors' sleep scheduling and the reliability of the wireless links between the sensor and its neighbors. This can be easily accomplished by one-hop information exchange and link loss inference [23].
Our data dissemination process is conducted at each cluster head so as to make sure that all the sensors finally obtain the updating packets. In a multihop cluster hierarchy, if a cluster head in an intermediate layer starts to transmit immediately after receiving one fresh packet, the gain of network coding cannot be fully utilized. On the other hand, if a cluster head waits and does not start to transmit packets until it has received all packets from the cluster head in the upper layer, it will waste bandwidth and introduce extra delay. In order to balance bandwidth efficiency and network coding gain, we propose to use a threshold $\alpha$ to determine when the current cluster head starts to transmit the packets to its member nodes. Specifically, each cluster head, after obtaining $\alpha M$ fresh native packets, where $0 < \alpha \le 1$ and $M$ is the number of native packets available at its upper-layer cluster head, conducts the XORs coding scheme to transmit the packets to its member nodes. In the simulation part, we will study the impact of the threshold $\alpha$ on the delay and energy consumption.
In the rest of the paper, we focus on how a cluster head encodes the packets and transmits them to its member sensors. The coding decision at other cluster heads can use the same approach.
As we mentioned earlier, the data dissemination process consists of sending native data packets and recovering "wanted" packets for each receiver. We now give an example to show that network coding can indeed recover "wanted" packets for all neighbors more efficiently.
Suppose that four packets $d_1$, $d_2$, $d_3$, and $d_4$ need to be transmitted to sensors $p_1$, $p_2$, $p_3$, and $p_4$ as shown in Figure 2.
The sleep scheduling at each receiver is given in Figure 2(a), where 1 denotes that the sensor is active at the current time slot; otherwise, it is in sleep mode. For the sake of simplicity, in this example, we assume that no packet is lost due to unreliable wireless communication, which means that a sensor can receive a packet successfully when it is in active mode. We also assume that an active sensor can only transmit or receive one packet at each time slot [5]. We show that different data dissemination approaches lead to different finishing times of data dissemination.
(i) Without network coding, 4 native packets are sent first, followed by native packets that recover the "wanted" packets at sensors. Figure 2(b) gives the "wanted" data packet set at each sensor after the 4 native packets are sent out. Without network coding, it takes 10 time slots to finish the data dissemination process, as shown in Figure 2(c).
(ii) With network coding, 4 native packets can be sent first, followed by encoded packets that recover the "wanted" packets at sensors. Assume that our coding strategy at each time slot is to maximize the number of active receivers that can decode the encoded packet. For example, at time $t_5$, if $d_1 \oplus d_2$ is sent, all four receivers can obtain a "wanted" packet by $d_1 \oplus (d_1 \oplus d_2)$ or $d_2 \oplus (d_1 \oplus d_2)$. Eventually, it takes 8 time slots to finish the data dissemination process, as shown in Figure 2(d). Under such a data dissemination approach, as all native packets are sent first, the available packets at sensors are most diversified. Thus, the best network coding gain can be achieved. This, however, means that each sensor needs to buffer all received native packets in order to decode out "wanted" packets, which might not be feasible in a WSN due to the limited memory at sensors.
(iii) An alternative approach is to divide the data dissemination process into several batches where, in each batch, $M$ native packets are sent, followed by the recovery process [24]. Once all $M$ native packets are received by all sensors in the cluster, the cluster head proceeds to transmit the following batch of packets. The data dissemination is accomplished when all batches of packets are obtained by all sensor nodes in the network. In Figure 2(e), we send two native packets first, followed by encoded packets to recover the "wanted" packets of the first batch at sensors, then send the last two native packets, followed by encoded packets to recover the "wanted" packets of the second batch at sensors. It takes 9 time slots to finish the data dissemination process.
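The decoding step used in approach (ii) can be illustrated with a minimal sketch. The payload bytes and the helper name `xor_bytes` below are hypothetical and only serve to show that a receiver holding one of the two native packets recovers the other from the coded packet.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length packets."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical payloads for native packets d1 and d2.
d1 = b"\x10\x20\x30\x40"
d2 = b"\x0f\xf0\xaa\x55"

# The cluster head broadcasts the coded packet d1 XOR d2.
coded = xor_bytes(d1, d2)

# A receiver that already holds d1 recovers d2, and vice versa.
recovered_d2 = xor_bytes(d1, coded)
recovered_d1 = xor_bytes(d2, coded)
```

A receiver that holds neither packet cannot decode the coded packet, which is why the choice of packets to combine matters.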
We now discuss how the cluster head can maintain the "wanted" packet set at each member sensor. After sending out a packet, the cluster head needs to collect the "wanted" packet set at each member sensor. In order to reduce ACK implosion, only the active receivers that have successfully received a packet at the current time slot and can obtain/decode one "wanted" packet from the received packet send an ACK message to the cluster head. Thus, according to the ACKs from receivers, the cluster head can derive the "wanted" packet set for each active receiver.
With the information of "wanted" packet set of each receiver at each time slot in the recovery process, an encoding decision which aims to maximize the expected number of active sensors that can decode out one "wanted" packet at current time slot will be introduced in the following section.

Problem Description and Complexity
In this section, we first describe the encoding decision problem that aims to decide which native packets should be encoded at each time slot $t$ in the recovery process such that the maximum expected number of active sensors at time slot $t$ can decode out one "wanted" native packet. We limit our discussion to the recovery process of one data dissemination batch in a cluster, which can also be applied to other batches in all other clusters.
Suppose that $D = \{d_1, d_2, \ldots, d_M\}$ is the set of data packets in a batch which need to be disseminated to all the sensors in a cluster. Let $P_t = \{p_{i_1}, p_{i_2}, \ldots, p_{i_l}\}$ be the set of active member sensors in the cluster at the $t$th time slot. At each time slot, the cluster head can obtain its neighbor sensors' "wanted" packet sets based on ACK feedback. Let $r_{i,j}$ be 1 if packet $d_j \in D$ is not available at active sensor $p_i$ at the current time slot; otherwise, let it be 0. Let $R(p_i) = \{d_j \mid r_{i,j} = 1 \text{ and } p_i \in P_t\}$ be the "wanted" data packet set of active sensor $p_i$ at the current time slot $t$, as shown in Figure 2(b). Assume that $l_i$ is the probability that sensor $p_i$ cannot successfully receive a packet from the cluster head when $p_i$ is in active mode.
Let $a_j$ be 1 if native packet $d_j \in D$ is combined in the current encoded packet; otherwise, let it be 0. Let $c_{i,j}$ be 1 if active sensor $p_i$ can decode out one "wanted" native packet $d_j \in R(p_i)$ from the current encoded packet; otherwise, let it be 0. Considering unreliable wireless communication, the probability that an active sensor $p_i$ can successfully obtain one "wanted" packet at the current time slot is $\sum_{j=1}^{M} c_{i,j}(1 - l_i)$. Thus, at the current time slot, the expected number of sensors that can decode out one "wanted" packet is $\sum_{i \in \{i \mid p_i \in P_t\}} \sum_{j=1}^{M} c_{i,j}(1 - l_i)$, which needs to be maximized in order to save energy.
Still taking Figure 2(d) as an example, after $t_4$, the cluster head starts to recover the "wanted" packets at its member sensors. At $t_5$, if the cluster head sends an encoded packet $d_1 \oplus d_2$, in an ideal condition where no packet is lost, active receivers $p_1$, $p_2$, $p_3$, $p_4$ can decode out one "wanted" packet by $d_1 \oplus (d_1 \oplus d_2)$ or $d_2 \oplus (d_1 \oplus d_2)$. Assume that $l_1 = 0.1$, $l_2 = 0.2$, $l_3 = 0.3$, $l_4 = 0.15$ in a practical wireless network, so that the probability of successfully receiving a packet at $p_1$, $p_2$, $p_3$, and $p_4$ is 0.9, 0.8, 0.7, and 0.85, respectively, due to unreliable wireless communication. Thus, the expected number of active receivers that can decode out one "wanted" packet after receiving the current encoded packet $d_1 \oplus d_2$ is $0.9 \times 1 + 0.8 \times 1 + 0.7 \times 1 + 0.85 \times 1 = 3.25$, which is the maximum at the current time slot. Thus, the cluster head will send out $d_1 \oplus d_2$ at the current time slot. In this paper, such an encoding decision problem using XORs coding is referred to as the network coding based data dissemination (NCDD) problem.
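The expected-value computation above can be checked with a small brute-force sketch. The "wanted" sets below are hypothetical (they are not taken from Figure 2(b)); they are chosen only so that $d_1 \oplus d_2$ is decodable by all four receivers. The function names are illustrative, and the sketch assumes that every packet of the batch that a sensor does not want is already held by it.

```python
from itertools import combinations


def expected_decoders(encoded, wanted, loss):
    """Expected number of active sensors that decode one wanted packet.

    A sensor can decode iff exactly one packet in `encoded` is still wanted
    by it (all other encoded packets are assumed to be held already).
    """
    total = 0.0
    for sensor, missing in wanted.items():
        if len(encoded & missing) == 1:
            total += 1.0 - loss[sensor]
    return total


def best_encoding(batch, wanted, loss):
    """Exhaustive search over all nonempty subsets of the batch."""
    best_set, best_val = None, -1.0
    for k in range(1, len(batch) + 1):
        for combo in combinations(sorted(batch), k):
            val = expected_decoders(set(combo), wanted, loss)
            if val > best_val:
                best_set, best_val = set(combo), val
    return best_set, best_val


batch = {1, 2, 3, 4}
# Hypothetical wanted sets of the four active sensors (assumption).
wanted = {1: {1, 3}, 2: {2, 3}, 3: {1, 3, 4}, 4: {1, 4}}
loss = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.15}   # l_1 .. l_4 from the text

best_set, best_val = best_encoding(batch, wanted, loss)
```

Exhaustive search is exponential in the batch size, so it is only usable for toy instances such as this one.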

Problem Formulation.
The NCDD problem at time slot $t$ in the recovery process can be formally formulated as an integer program with objective (1) and constraints (2)-(6). In this formulation, the objective represents the expected number of active receivers that can decode out one "wanted" data packet from the encoded packet at the current time slot. Equations (2) and (6) ensure that each receiver can decode out at most one "wanted" native packet from the encoded packet. Equations (3) and (4) give the two requirements for active receiver $p_i$ to decode out one "wanted" packet $d_j$: (1) packet $d_j$ is in $p_i$'s "wanted" packet set and $d_j$ participates in the encoded packet; (2) all other native packets combined in the encoded packet, except $d_j$, have already been successfully received by receiver $p_i$. Equation (5) guarantees that if packet $d_j$ is available at all active receivers at the current time slot $t$, $d_j$ must not be combined into the encoded packet.
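For reference, one integer program that matches the verbal description of the objective and constraints is sketched below. It is a reconstruction from the definitions of $a_j$, $c_{i,j}$, $r_{i,j}$, and $l_i$ given earlier, not a verbatim copy of the original equations; the exact algebraic form of constraints (3)-(5) and the assignment of equation numbers are assumptions.

```latex
\begin{align}
\max \quad & \sum_{i:\, p_i \in P_t} \sum_{j=1}^{M} c_{i,j}\,(1 - l_i) \tag{1}\\
\text{s.t.} \quad
& \sum_{j=1}^{M} c_{i,j} \le 1, && \forall\, p_i \in P_t, \tag{2}\\
& c_{i,j} \le a_j\, r_{i,j}, && \forall\, p_i \in P_t,\ 1 \le j \le M, \tag{3}\\
& c_{i,j} + a_k\, r_{i,k} \le 1, && \forall\, p_i \in P_t,\ 1 \le j \le M,\ k \ne j, \tag{4}\\
& a_j \le \sum_{i:\, p_i \in P_t} r_{i,j}, && 1 \le j \le M, \tag{5}\\
& a_j,\ c_{i,j} \in \{0, 1\}, && \forall\, p_i \in P_t,\ 1 \le j \le M. \tag{6}
\end{align}
```

In this sketch, (3) forces a decoded packet to be both wanted and encoded, and (4) forbids decoding $d_j$ at $p_i$ whenever some other encoded packet $d_k$ is also missing at $p_i$.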
Theorem 1. The NCDD problem is NP-hard.
Proof. We prove the theorem by a reduction from the MAXIMUM ONE-IN-THREE SAT problem, which is a well-known NP-hard problem in the strong sense.
MAXIMUM ONE-IN-THREE SAT: We are given a set $U = \{u_1, u_2, \ldots, u_M\}$ of $M$ boolean variables and a collection $C = \{c_1, c_2, \ldots, c_n\}$ of clauses with exactly three literals. Each of these clauses is a boolean formula, and it is true if and only if exactly one of its three literals is true. Without loss of generality, we assume that the three literals in $c_i$ are $\{u_{i_1}, u_{i_2}, u_{i_3}\}$. The objective of MAXIMUM ONE-IN-THREE SAT is to find a truth assignment such that the maximum number of clauses is true. We use $\mathrm{OPT}_s$ to denote the optimal solution of this problem.
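Given an instance of MAXIMUM ONE-IN-THREE SAT, we can construct an instance of the decision version of the NCDD problem in polynomial time as follows. Let there be $M$ data packets that need to be disseminated from the cluster head to $n$ receiver nodes. If $u_j = 1$, packet $d_j$ participates in the encoding; otherwise, $d_j$ does not. For each clause $c_i$, if $u_j$ is a literal of $c_i$, then $d_j$ is a "wanted" packet at $p_i$. In other words, each sensor $p_i$ has lost exactly three packets and has all other packets. Let the probability that an active sensor can successfully receive a packet be 100%. Then, our objective is to maximize $\sum_{i=1}^{n} \sum_{j=1}^{M} c_{i,j}$, the number of sensors that can decode out one "wanted" packet. We use $\mathrm{OPT}_p$ to denote the optimal solution of this instance. Sensor $p_i$ can decode out a "wanted" packet if and only if exactly one of its three "wanted" packets is encoded, that is, exactly one literal of $c_i$ is true. Hence, a truth assignment that makes $\mathrm{OPT}_s$ clauses true yields an encoded packet from which $\mathrm{OPT}_s$ sensors can decode, so $\mathrm{OPT}_s \le \mathrm{OPT}_p$. Conversely, given an encoded packet from which sensor $p_i$ can decode, set $u_j = 1$ exactly for the encoded packets $d_j$. In this assignment, $c_i$ also has a true value. So, we have $\mathrm{OPT}_p \le \mathrm{OPT}_s$.
The above analysis shows that $\mathrm{OPT}_p = \mathrm{OPT}_s$. Thus, the NCDD problem is NP-hard.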

Algorithm for NCDD Problem
In this section, we first introduce an auxiliary graph in which each vertex is assigned a weight. We then show that the proposed NCDD problem can be converted into the problem of finding a maximum weight clique in the auxiliary graph, based on which we develop a heuristic algorithm for the NCDD problem.

Model Design.
At any $t$th time slot, let $R(p_i) \subseteq D$ be the set of packets "wanted" by $p_i$ and $H(p_i) \subseteq D$ be the set of packets received by $p_i$. We can construct an auxiliary graph $G(V, E)$ similar to [9], where $V = \{v_{i,j} \mid d_j \in R(p_i) \text{ and } p_i \in P_t\}$, which means that every "wanted" packet of each active sensor has a vertex in $G$. Considering two receivers $p_{i_1}$ and $p_{i_2}$, if they have lost the same packet $d_j$, then they can both recover $d_j$ if only native packet $d_j$ is encoded at the current time slot. We use a link $e \in E$ between $v_{i_1,j}$ and $v_{i_2,j}$ to denote such recoverability. If $d_{j_1}$ is a "wanted" packet of $p_{i_1}$ and $d_{j_1} \in H(p_{i_2})$, while $d_{j_2}$ is a "wanted" packet of $p_{i_2}$ and $d_{j_2} \in H(p_{i_1})$, then $p_{i_1}$ can recover $d_{j_1}$ when it receives $d_{j_1} \oplus d_{j_2}$ and $p_{i_2}$ can recover $d_{j_2}$ when it receives $d_{j_1} \oplus d_{j_2}$. We use a link $e \in E$ between $v_{i_1,j_1}$ and $v_{i_2,j_2}$ to denote such recoverability. In other words,
$$E = \{(v_{i_1,j_1}, v_{i_2,j_2}) \mid j_1 = j_2, \text{ or } d_{j_1} \in H(p_{i_2}) \text{ and } d_{j_2} \in H(p_{i_1})\},$$
where $p_{i_1}, p_{i_2} \in P_t$ are two different receivers.
For a clique $Q = \{v_{i_1,j_1}, v_{i_2,j_2}, \ldots, v_{i_k,j_k}\}$ in the graph, let $P' = \{p_i \mid v_{i,j} \in Q, 1 \le j \le M\}$ be the sensors which have "wanted" packets in $Q$ and $D' = \{d_j \mid v_{i,j} \in Q, 1 \le i \le n\}$ be the set of "wanted" packets of those sensors in $Q$. Suppose that there are $m$ packets in $D'$. For any vertex $v_{i,j} \in Q$, according to the edge assignment of $G$, $p_i$ must have already successfully obtained the packets in $D' - \{d_j\}$ but still requires packet $d_j$. Thus, if $d_{j_1} \oplus d_{j_2} \oplus \cdots \oplus d_{j_m}$, where $d_{j_1}, d_{j_2}, \ldots, d_{j_m} \in D'$, is encoded and sent at the $t$th time slot, each sensor in $P'$ will be able to decode out one "wanted" packet if the encoded packet is successfully received by all sensors in $P'$. To consider the unreliability of wireless communication, we assign weight $w_{i,j} = 1 - l_i$ to the vertex $v_{i,j}$ for any $j \in \{j \mid v_{i,j} \in V\}$. Then the weight of clique $Q$, which is defined as
$$C(Q) = \sum_{v_{i,j} \in Q} w_{i,j},$$
is equivalent to the expected number of active sensors which can successfully decode out one "wanted" packet if all packets in $D'$ are encoded together. Thus, our NCDD problem, which aims to maximize the expected number of active sensors that can decode out one "wanted" packet, is converted into finding a maximum weight clique in graph $G$.
For example, after all 4 native packets are sent, the "wanted" packet sets in Figure 2(b) can be converted into the graph in Figure 3. Thus, the encoding decision for the recovery process at $t_5$ is converted into finding a maximum weight clique in such a graph. As shown in Figure 3, the clique that consists of $\{v_{1,1}, v_{2,2}, v_{3,1}, v_{4,1}\}$ is the clique with the maximum weight $0.9 + 0.8 + 0.7 + 0.85 = 3.25$. After the encoded packet
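The graph construction described in this subsection can be sketched as follows. The "wanted" sets are hypothetical (they are not read from Figure 2(b) or Figure 3) and the function names are illustrative; vertices are represented as pairs `(i, j)` standing for $v_{i,j}$, and the received set $H(p_i)$ is assumed to be the batch minus $R(p_i)$.

```python
from itertools import combinations


def build_auxiliary_graph(wanted, held, loss):
    """Return (weight, adj) for the auxiliary graph G(V, E).

    wanted[i] = R(p_i), held[i] = H(p_i), loss[i] = l_i for active sensors.
    """
    vertices = [(i, j) for i in sorted(wanted) for j in sorted(wanted[i])]
    weight = {(i, j): 1.0 - loss[i] for (i, j) in vertices}
    adj = {v: set() for v in vertices}
    for (i1, j1), (i2, j2) in combinations(vertices, 2):
        if i1 == i2:
            continue  # edges connect vertices of two different receivers
        same_packet = j1 == j2
        cross_held = j1 in held[i2] and j2 in held[i1]
        if same_packet or cross_held:
            adj[(i1, j1)].add((i2, j2))
            adj[(i2, j2)].add((i1, j1))
    return weight, adj


def is_clique(q, adj):
    """True if every pair of vertices in q is adjacent."""
    return all(b in adj[a] for a, b in combinations(q, 2))


def clique_weight(q, weight):
    """C(Q): sum of vertex weights, i.e. expected number of decoders."""
    return sum(weight[v] for v in q)


batch = {1, 2, 3, 4}
wanted = {1: {1, 3}, 2: {2, 3}, 3: {1, 3, 4}, 4: {1, 4}}  # assumption
held = {i: batch - r for i, r in wanted.items()}
loss = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.15}

weight, adj = build_auxiliary_graph(wanted, held, loss)
q = [(1, 1), (2, 2), (3, 1), (4, 1)]   # corresponds to sending d1 XOR d2
```

With these inputs the clique `q` covers all four receivers, mirroring the weight-3.25 clique quoted in the text.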

Algorithm Design.
Assume that the total number of vertices in $G(V, E)$ is $N$. We first sort all vertices into nonincreasing order according to $w_{i,j}$. For the example given in Figure 3, the vertices of $p_1$ (weight 0.9) come first, followed by those of $p_4$ (0.85), $p_2$ (0.8), and $p_3$ (0.7). For simplicity of presentation, we abuse the notation slightly and assign a unique id $v_k$ to each vertex in $G$, which uses a one-dimensional subscript instead of two-dimensional subscripts. Correspondingly, we use $w_k$ to denote the weight of $v_k$. Thus, for the example given in Figure 3, the vertices are relabeled $v_1, v_2, \ldots, v_9$. Without loss of generality, we assume that $w_1 \ge w_2 \ge \cdots \ge w_N$. Let $Q_i$ be the clique with maximum weight in the subgraph which only contains the vertices of $S_i = \{v_i, v_{i+1}, \ldots, v_N\}$ and let $C(Q_i)$ be the weight of clique $Q_i$. In other words, $Q_i$ represents the maximum weight clique the algorithm has found in the subgraph consisting of vertices $\{v_i, v_{i+1}, \ldots, v_N\}$. The algorithm starts with $i = N$ and iteratively considers more vertices until all vertices in $G$ are considered. The algorithm stops when $Q_1$ is found.
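When we consider vertex $v_{i-1}$, there are two cases.
(1) If $\{v_{i-1}\} \cup Q_i$ is a clique, then $Q_{i-1} = \{v_{i-1}\} \cup Q_i$.
(2) Otherwise, we search for a clique that includes $v_{i-1}$ in the subgraph consisting of $S_{i-1}$. We start with $Q_{i-1} = \{v_{i-1}\}$ and restrict $S_{i-1}$ to the neighbors of $v_{i-1}$, where $N(v)$ denotes the set of neighbors of vertex $v$. We then add the vertex $v_j$ whose index is the smallest in $S_{i-1}$ into the clique $Q_{i-1}$ and update $S_{i-1}$, that is, $S_{i-1} = S_{i-1} \cap N(v_j)$. If $S_{i-1}$ is still not $\emptyset$, we add another vertex whose index is the smallest in $S_{i-1}$ into the clique $Q_{i-1}$. We repeat this process until there is no vertex in $S_{i-1}$, that is, $S_{i-1} = \emptyset$. By comparing the weight of the clique $Q_i$, which does not include $v_{i-1}$, with the weight of the clique $Q_{i-1}$, which includes $v_{i-1}$, the clique $Q_{i-1}$ with maximum weight in the subgraph consisting of the vertices in $\{v_{i-1}, v_i, \ldots, v_N\}$ is set to be the one with the larger weight.
The details of the algorithm are given in Algorithm 1. After this algorithm, $Q_1$ gives all vertices in the found maximum weight clique. All native packets involved in $Q_1$ are encoded together and sent out at the current time slot.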
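The iterative procedure described above can be sketched in a few lines. This is an illustrative reading of the description, not a transcription of Algorithm 1, which is not reproduced here; in particular, the handling of the case in which the new vertex is adjacent to every vertex of the current clique is an assumption, and the small graph at the end is a hypothetical test instance.

```python
def greedy_max_weight_clique(weight, adj):
    """Heuristic maximum weight clique over vertices sorted by weight.

    weight: dict vertex -> w_k; adj: dict vertex -> set of neighbours.
    Vertices are processed from v_N down to v_1 (nonincreasing weight order).
    """
    order = sorted(weight, key=lambda v: -weight[v])  # v_1, ..., v_N

    def total(q):
        return sum(weight[v] for v in q)

    best = []  # Q_{i}: best clique found among the vertices seen so far
    for i in range(len(order) - 1, -1, -1):
        v = order[i]
        if all(u in adj[v] for u in best):
            # Assumed case: v extends the current clique directly.
            candidate = [v] + best
        else:
            # Grow a clique around v, always taking the smallest index.
            candidate = [v]
            remaining = [u for u in order[i + 1:] if u in adj[v]]
            while remaining:
                u = remaining[0]
                candidate.append(u)
                remaining = [x for x in remaining[1:] if x in adj[u]]
        if total(candidate) > total(best):
            best = candidate
    return best


# Hypothetical five-vertex instance (not the graph of Figure 3).
weight = {"A": 0.9, "B": 0.85, "C": 0.8, "D": 0.7, "E": 0.7}
edges = [("A", "C"), ("A", "D"), ("C", "D"), ("B", "C"), ("B", "E")]
adj = {v: set() for v in weight}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

clique = greedy_max_weight_clique(weight, adj)
```

Being a heuristic, the procedure returns a valid clique but is not guaranteed to return the maximum weight clique on every graph.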
We now show how to find the maximum weight clique of the graph shown in Figure 3. Assume that $Q_2$ has been found, which consists of $\{v_2, v_4, v_6\}$. Next, we consider $Q_1$. Since $\{v_1\} \cup Q_2$ is not a clique, we need to find $Q_1$, which includes vertex $v_1$, in the subgraph consisting of $S_1 = \{v_1, v_2, \ldots, v_9\}$. The corresponding steps for finding such a $Q_1$ are given in Algorithm 2, where $v_k(v_{i_1,j_1})$ in $V$ denotes that we use a unique id $v_k$ in the algorithm to replace the original vertex $v_{i_1,j_1}$. After $Q_1$ is found, we compare it with $Q_2$ which has the weight

Analysis
In this section, we first analyze the impact of packet loss probability and sleep probability on network coding gain. Then, we derive a threshold to decide whether the current sleep scheduling can save energy compared with no sleep scheduling. We limit the analysis to one cluster in the multihop cluster hierarchy.

Impact of Packet Loss Probability and Sleep Probability on Network Coding Gain.
Suppose that $N_a$ is the number of transmissions that the data dissemination process requires without coding and $N_b$ is the number of transmissions required with XORs coding. Assume that the probability that receiver $p_i$ is in sleep mode is $s_i$ at each time slot, and $l_i$ is the probability that receiver $p_i$ cannot successfully receive a packet even when it is in active mode due to unreliable wireless communication. We have the following two lemmas.

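Lemma 1. The total number of transmissions without coding required for transmitting sufficiently large $M$ packets to $n$ receivers is
$$N_a = M \sum_{i_1, i_2, \ldots, i_n} \frac{(-1)^{i_1 + i_2 + \cdots + i_n - 1}}{1 - \prod_{j=1}^{n} (s_j + l_j - s_j l_j)^{i_j}}, \quad (8)$$
where $i_1, i_2, \ldots, i_n \in \{0, 1\}$ and $\exists\, i_j \neq 0$.
Proof. See Appendix A.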

Lemma 2. The total number of transmissions with XORs coding for transmitting sufficiently large $M$ packets to $n$ receivers is
$$N_b = \frac{M}{1 - \max_{i \in \{1, 2, \ldots, n\}} \{s_i + l_i - s_i l_i\}}. \quad (9)$$
Proof. See Appendix B.
With the analytical results for $N_a$ and $N_b$, we can define the analytical network coding gain as
$$\gamma = \frac{N_a - N_b}{N_a}. \quad (10)$$
Take two receivers as an example, and assume that $l_1 = 0.1$, $l_2 = 0.25$, $s_1 = 0.15$, $s_2 = 0.05$ and $M$ is sufficiently large. According to (8) and (9), we can calculate that $N_a = 1.6382M$ and $N_b = 1.4035M$. Then, the analytical network coding gain is $\gamma = 0.1433$. From Lemmas 1 and 2, we can also obtain the following corollary.
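The two-receiver numbers quoted above can be reproduced with a short sketch. It assumes that receiver $p_i$ misses a given transmission with probability $s_i + l_i - s_i l_i$ (asleep, or awake but the packet is lost), independently across receivers and slots; under that assumption, the per-packet transmission counts take an inclusion-exclusion form without coding and a worst-receiver form with XORs coding. The function names are illustrative, and the closed forms are a reading that matches the quoted figures rather than a derivation taken from the appendices.

```python
from itertools import product
from math import prod


def miss_prob(s, l):
    """Probability that a receiver misses one transmission."""
    return s + l - s * l


def per_packet_without_coding(sleep, loss):
    """N_a / M: expected transmissions until every receiver has the packet."""
    p = [miss_prob(s, l) for s, l in zip(sleep, loss)]
    total = 0.0
    for bits in product((0, 1), repeat=len(p)):
        k = sum(bits)
        if k == 0:
            continue  # at least one index must be nonzero
        joint = prod(pi for pi, b in zip(p, bits) if b)
        total += (-1) ** (k - 1) / (1.0 - joint)
    return total


def per_packet_with_xor(sleep, loss):
    """N_b / M: governed by the receiver with the largest miss probability."""
    return 1.0 / (1.0 - max(miss_prob(s, l) for s, l in zip(sleep, loss)))


sleep = [0.15, 0.05]   # s_1, s_2 from the text
loss = [0.10, 0.25]    # l_1, l_2 from the text

na = per_packet_without_coding(sleep, loss)
nb = per_packet_with_xor(sleep, loss)
gain = (na - nb) / na
```

Rounded to four decimals, `na`, `nb`, and `gain` agree with the values $1.6382$, $1.4035$, and $0.1433$ given in the text.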

Corollary 1. With two receivers, the maximum network coding gain $\gamma$ can be achieved if $s_1 + l_1 - s_1 l_1 = s_2 + l_2 - s_2 l_2$.
Algorithm 2: The steps of finding Q 1 .

Impact of Sleep Probability on Energy Consumption.
Though sleep scheduling can reduce the energy consumed by idle listening, sensors in sleep mode cannot receive data packets, which forces retransmissions and may consume more energy. If sensor $p_i$ is active at the $t$th time slot, we say that the $t$th time slot is an active time slot for sensor $p_i$. A sensor consumes energy only in its active time slots. Thus, we can use the total number of active time slots consumed for the sensors to successfully receive the whole set of packets as the energy consumption for data dissemination. We define a threshold $\varepsilon$ as in (11), where $l_{\min} = 1 - \max_{i \in \{1, 2, \ldots, n\}} \{l_i\}$.
Then, we have the following lemma.
Lemma 3. In XORs coding, if ε < 0, the current sleep scheduling can save energy consumed by idle listening; otherwise, the current sleep scheduling has no contribution to energy saving.
Proof. See Appendix D.
Take two receivers with $l_1 = 0.23$, $s_1 = 0.15$, $l_2 = 0.27$, $s_2 = 0.18$ as an example. According to (11), we have $\varepsilon > 0$. Thus, the energy saving with sleep scheduling is offset by more retransmissions. In this case, the cluster head should wake up more sensors. An interesting problem is how to design an optimal sleep scheduling such that the energy saving of sleep scheduling is not offset by more retransmissions, which is out of the scope of this paper.
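The comparison behind the threshold can be illustrated with a rough sketch. The quantity computed below is not equation (11), which is not reproduced here; it is a hypothetical stand-in that compares expected active time slots per packet with and without sleep scheduling, assuming that XORs coding needs $1 / \min_i (1 - s_i)(1 - l_i)$ transmissions per packet with sleep scheduling and $1 / l_{\min}$ without, and that sensor $p_i$ is active in a fraction $1 - s_i$ of the slots. The function name and the second parameter set are illustrative.

```python
def active_slot_difference(sleep, loss):
    """Active slots per packet with sleep scheduling minus without it.

    Positive: sleeping costs more active slots than it saves.
    Negative: sleeping reduces the total number of active slots.
    (Hypothetical stand-in for the threshold, see the assumptions above.)
    """
    n = len(sleep)
    worst_success = min((1 - s) * (1 - l) for s, l in zip(sleep, loss))
    l_min = 1 - max(loss)
    slots_with_sleep = sum(1 - s for s in sleep) / worst_success
    slots_without_sleep = n / l_min
    return slots_with_sleep - slots_without_sleep


# Parameters of the two-receiver example in the text.
eps_example = active_slot_difference([0.15, 0.18], [0.23, 0.27])

# Hypothetical case: only the receiver with the better link sleeps.
eps_good_sleeper = active_slot_difference([0.0, 0.2], [0.3, 0.1])
```

Under these assumptions the first parameter set gives a positive value, in line with the conclusion that sleep scheduling does not pay off in that example, while the second gives a negative value.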

Simulation Results
In this section, we demonstrate the effectiveness of our dissemination schemes through simulations using a C++ simulator. In our simulations, a multihop cluster hierarchical WSN is randomly generated with a fixed number of sensors unless otherwise specified. We group the packets to be sent into batches, and each batch has $M$ packets. The recovery process with network coding starts after every $M$ native packets are transmitted. In a cluster, we randomly generate sensor $p_i$'s sleep scheduling according to its sleep probability $s_i$.
To demonstrate the advantage of our coding scheme, we introduce two baseline algorithms, namely, dissemination without coding and dissemination with random coding. Dissemination without coding randomly transmits a native "wanted" packet at each time slot until all receivers obtain their "wanted" data packets, while dissemination with random coding transmits a randomly generated XORs packet at each time slot until all receivers obtain their "wanted" packets.
In the simulation, we are interested in evaluating the performance of our coding schemes from the following perspectives.
(i) The number of active receivers that can obtain a new "wanted" packet at one time slot and the total number of transmissions required in one batch data dissemination within one cluster.
(ii) The impact of the number of receiver sensors $n$, batch size $M$, sleep probability, and packet loss probability on the network coding gain under different dissemination schemes within one cluster.
(iii) How close the performance of our proposed algorithms is to the derived analytical results within one cluster.
(iv) The impact of the threshold $\alpha$ on the delay and the total number of transmissions required in a multihop cluster hierarchy.
For each setting, we simulate 150 instances and report the average performance.

Comparison with Different Data Dissemination Schemes.
The effectiveness of our coding scheme for maximizing the expected number of sensors that can obtain one "wanted" packet at one time slot is demonstrated by comparing with dissemination without coding algorithm and dissemination with random coding algorithm.
We evaluate the performance of our algorithms by varying the number of active sensors within a cluster at one time slot in the range of [10, 40] for $M = 50$ and $l_i = 0.2$. As shown in Figure 4, the number of active sensors that can obtain one "wanted" packet with our coding scheme is much larger than that with dissemination without coding and dissemination with random coding.
For one batch data dissemination process within a cluster, to demonstrate the performance of our coding scheme, the total number of transmissions required is also compared with the two baseline algorithms: dissemination without coding and dissemination with random coding. We vary the number of packets to be sent in the range of [60, 100] for $n = 10$, $s_i = 0.3$, $l_i = 0.2$. As shown in Figure 5, the total number of transmissions required in one batch dissemination by our coding scheme is much smaller than that by dissemination without coding and dissemination with random coding. Hence, for data dissemination with a large set of packets, our XORs coding scheme can efficiently decrease the number of transmissions required. Thus, more energy can be saved.

Network Coding Gain Comparison with Analytical Results.
We demonstrate the effectiveness of the proposed network coding algorithm by comparing the network coding gain obtained through simulation with the analytical network coding gain. We start with a simple experiment in which a cluster has only two member sensors. We fix $l_1$ to 0.2 and vary $l_2$ over the range [0.1, 0.4], with $n = 2$, $s_i = 0.3$, and $M = 100$. As shown in Figure 6(a), the network coding gain obtained by simulation follows the same trend as the analytical results. In addition, writing $l'_i = 1 - (1 - l_i)(1 - s_i)$ for the probability that sensor $p_i$ misses a packet through loss or sleep, both the simulation and the analytical results reach the maximum network coding gain when $l'_1 = l'_2 = 1 - (1 - 0.2)(1 - 0.3) = 0.44$, which verifies Corollary 1. When $l'_1 = l'_2$, the "wanted" packets at one receiver are most likely available at the other receiver, so the coding opportunity is high and the network coding gain is maximized.
We also extend the simulation to 10 receivers in a cluster. The loss probability $l_1$ of $p_1$ is varied along the x-axis, with $M = 100$; $s_i = 0.2$ for $1 \le i \le 5$; $s_i = 0.3$ for $6 \le i \le 10$; and $l_2 = l_3 = l_1 + 0.02$, $l_4 = l_5 = l_1 + 0.04$, $l_6 = l_7 = l_1 + 0.06$, $l_8 = l_1 + 0.08$, $l_9 = l_{10} = l_1 + 0.1$. As shown in Figure 6(b), the simulation results are very close to the analytical results. In addition, Figure 6 verifies that network coding indeed reduces the number of transmissions required.
In Figure 7, we vary the sleep probability at the sensors. The results are similar to those in Figure 6, and the network coding gain obtained through simulation is again close to the analytical results.
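The comparison between simulated and analytical transmission counts can be illustrated for the no-coding baseline with two receivers. The following is a minimal Monte Carlo sketch, not the simulator used for the figures. It assumes that each packet is rebroadcast until every receiver holds it and that a receiver misses a slot independently with probability $1 - (1 - l_i)(1 - s_i)$; the function names and the random seed are illustrative.

```python
import random


def effective_loss(l, s):
    """Probability that a receiver misses a slot: channel loss or sleeping."""
    return 1 - (1 - l) * (1 - s)


def simulate_no_coding(M, eff_losses, rng):
    """Count transmissions when each packet is resent until all receivers have it."""
    total = 0
    for _ in range(M):
        pending = set(range(len(eff_losses)))  # receivers still missing the packet
        while pending:
            total += 1
            # A receiver stays pending if it misses this transmission.
            pending = {i for i in pending if rng.random() < eff_losses[i]}
    return total


def analytic_two_receivers(M, e1, e2):
    """Two-receiver closed form for dissemination without coding."""
    return M / (1 - e1) + M / (1 - e2) - M / (1 - e1 * e2)


rng = random.Random(1)
e = effective_loss(0.2, 0.3)  # l = 0.2, s = 0.3 as in the two-sensor experiment
M, runs = 100, 150
avg = sum(simulate_no_coding(M, [e, e], rng) for _ in range(runs)) / runs
expected = analytic_two_receivers(M, e, e)
print(f"simulated average: {avg:.1f}, analytical: {expected:.1f}")
```

Averaging over 150 runs mirrors the number of instances reported per setting; with fewer runs the simulated average fluctuates more around the analytical value.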

The Impact of Sleep Scheduling on Energy Saving.
We now study the impact of sleep scheduling on the energy consumption. Our simulation is conducted within one cluster. We use the total number of active time slots consumed to denote the energy consumption in the data dissemination process.
Suppose that XORs coding is applied. Let $\eta_s$ be the total number of active time slots consumed for data dissemination with sleep scheduling, and let $\eta_{ns}$ be the total number of transmissions for data dissemination without sleep scheduling. The energy saving of XORs coding with sleep scheduling over that without sleep scheduling is defined in terms of $\eta_s$ and $\eta_{ns}$. For data dissemination without coding, the energy saving with sleep scheduling over that without sleep scheduling is defined in a similar way.
As shown in Figure 8, the simulation results are very close to the analytical results.
For our XORs coding, Figure 8 shows that the energy consumption with sleep scheduling is lower than that without sleep scheduling when $s_2$ is below 0.15. When $s_2 = 0.15$, the two are equal. When $s_2$ exceeds 0.15, sleep scheduling no longer saves energy and in fact consumes more energy than operating without sleep scheduling. This result is plausible: as more sensors sleep, more retransmissions are required, which increases energy consumption. In this case, the energy saved by sleep scheduling is offset by the additional retransmissions, which means that the threshold ε > 0 and the cluster head should wake up more sensors to receive packets in order to save energy.

The Impact of Threshold α on the Delay and the Total Number of Transmissions Required.
We now study the impact of the threshold α on the delay of the data dissemination process in a multihop cluster hierarchical WSN. The threshold α is varied over the range [0.2, 1.0] for M = 30, 40, 50. Figure 9 gives the delay required for data dissemination when the number of layers is 5 and 6, respectively. The delay increases with the threshold α, because as α increases the cluster heads must wait longer before they can transmit their available packets to their members. As a result, cluster heads in lower layers remain idle for a long time. Specifically, when α = 1, each cluster head cannot transmit its available packets until it has received all M packets. In this case, no concurrent transmissions take place even when they would not collide, which increases the delay. Figure 9 also shows that the delay increases with the number of layers, because the number of receivers increases with the number of layers.
We further study the impact of the threshold α on the total number of transmissions required in a multihop cluster hierarchical WSN. The threshold α is again varied over the range [0.2, 1.0] for M = 30, 40, 50. As shown in Figure 10, the total number of transmissions required decreases as the threshold α increases. When α is small, the cluster heads forward packets to their members sooner, so few fresh packets are available at the cluster heads at any time and the network coding gain cannot be fully exploited. Hence, more transmissions are required in total than with a larger threshold α.

Conclusion
This paper studies data dissemination in wireless sensor networks with network coding to achieve energy efficiency. To complete the whole data dissemination process quickly, at each time slot in the recovery process we aim to transmit an encoded packet that maximizes the expected number of active sensors that can decode one "wanted" packet. A maximum weight clique model is proposed to achieve this objective. We further study the impact of packet loss probability and sleep probability on the network coding gain. We also analyze the impact of sleep probability on the energy saving gain and derive a threshold that can be used to decide whether the current sleep scheduling is effective in saving energy. The simulation results verify the work proposed in the paper.

A. Proof of Lemma 1
According to [8], the total number of transmissions without coding needed to successfully deliver a sufficiently large number $M$ of packets to $n$ receivers is

$$M \sum_{\substack{i_1, i_2, \ldots, i_n \in \{0,1\} \\ \exists\, i_j \neq 0}} \frac{(-1)^{\left(\sum_{j=1}^{n} i_j\right) - 1}}{1 - \prod_{j=1}^{n} l_j^{\,i_j}},$$

where each receiver stays in active mode during the data transmission process.
However, in the data dissemination process, receiver sensor $p_i$ may be in sleep mode and therefore unable to receive a packet. The probability that sensor $p_i$ successfully receives the packet at any time slot is thus $(1 - l_i)(1 - s_i)$. In other words, the probability that sensor $p_i$ loses the packet is $1 - (1 - l_i)(1 - s_i)$. Thus, considering sleep scheduling, the total number of transmissions required without coding is

$$N_a = M \sum_{\substack{i_1, i_2, \ldots, i_n \in \{0,1\} \\ \exists\, i_j \neq 0}} \frac{(-1)^{\left(\sum_{j=1}^{n} i_j\right) - 1}}{1 - \prod_{j=1}^{n} (l'_j)^{\,i_j}},$$

where $i_1, i_2, \ldots, i_n \in \{0, 1\}$ and $\exists\, i_j \neq 0$, and $l'_i = 1 - (1 - l_i)(1 - s_i)$.
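The sum over the indicator vectors $(i_1, \ldots, i_n)$ is an inclusion-exclusion over nonempty subsets of receivers, which makes it easy to evaluate numerically. The sketch below assumes the form that reduces to the two-receiver expression used in Appendix C, namely $M \sum_{S \neq \emptyset} (-1)^{|S|+1} / (1 - \prod_{i \in S} l'_i)$; the function name and parameters are illustrative and not part of the original analysis.

```python
from itertools import combinations
from math import prod


def transmissions_without_coding(M, losses, sleeps=None):
    """Expected transmissions to deliver M packets to all receivers, no coding.

    losses[i] is the channel loss probability of receiver i; sleeps[i] is its
    sleep probability (defaults to 0, i.e. always active).
    """
    n = len(losses)
    if sleeps is None:
        sleeps = [0.0] * n
    # Effective loss probability: the packet is missed through loss or sleep.
    eff = [1 - (1 - l) * (1 - s) for l, s in zip(losses, sleeps)]
    total = 0.0
    for k in range(1, n + 1):
        sign = 1 if k % 2 == 1 else -1  # alternating inclusion-exclusion sign
        for subset in combinations(eff, k):
            total += sign / (1 - prod(subset))
    return M * total


# Two receivers with l = 0.2 and s = 0.3, M = 100 packets.
print(transmissions_without_coding(100, [0.2, 0.2], [0.3, 0.3]))
```

The number of terms grows as $2^n - 1$, so this direct evaluation is only practical for small clusters.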

B. Proof of Lemma 2
From [8], the total number of transmissions with XORs coding needed to successfully deliver a sufficiently large number $M$ of packets to $n$ receivers is $M/(1 - \max_{i \in \{1,2,\ldots,n\}} \{l_i\})$, where each receiver stays in active mode during the data transmission process.
As in Appendix A, with sleep scheduling the probability that sensor $p_i$ cannot successfully receive the packet becomes $l'_i = 1 - (1 - l_i)(1 - s_i)$. Thus, the total number of transmissions required with XORs coding under sleep scheduling is

$$N_b = \frac{M}{1 - \max_{i \in \{1,2,\ldots,n\}} \{l'_i\}}.$$

C. Proof of Corollary 1
With two receivers, from Lemma 1, the total number of transmissions required for $M$ packets without coding is $N_a = M/(1 - l'_1) + M/(1 - l'_2) - M/(1 - l'_1 l'_2)$, and from Lemma 2, the total number of transmissions with XORs coding is $N_b = M/\min\{1 - l'_1, 1 - l'_2\}$, where $l'_i = 1 - (1 - l_i)(1 - s_i)$. Without loss of generality, suppose that $l'_1 \ge l'_2$ and $l'_2 = \beta l'_1$, $0 \le \beta \le 1$. Substituting gives $N_a = M/(1 - l'_1) + M/(1 - \beta l'_1) - M/(1 - \beta (l'_1)^2)$ and $N_b = M/(1 - l'_1)$, so the network coding gain $\gamma$ becomes a function of $\beta$. Define $f(\beta) = \gamma$ with $\beta$ as the variable. It can be shown that $f(\beta)$ is an increasing function, so $f(\beta)$ attains its maximum at $\beta = 1$. That is, the network coding gain $\gamma$ is maximum when $l'_1 = l'_2$, which proves Corollary 1.
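The monotonic behaviour claimed for $f(\beta)$ can be checked numerically from the two-receiver expressions for $N_a$ and $N_b$. In the sketch below, the ratio $N_a/N_b$ is used as a stand-in for the gain $\gamma$; that choice is an assumption made for illustration, since the exact definition of $\gamma$ is not restated in this appendix, and the value 0.44 for the larger effective loss probability is taken from the two-sensor experiment.

```python
def n_without_coding(M, e1, e2):
    """Two-receiver transmission count without coding (effective losses e1, e2)."""
    return M / (1 - e1) + M / (1 - e2) - M / (1 - e1 * e2)


def n_with_xor(M, e1, e2):
    """Two-receiver transmission count with XORs coding."""
    return M / min(1 - e1, 1 - e2)


M, e1 = 100, 0.44
betas = [k / 10 for k in range(11)]  # beta = 0.0, 0.1, ..., 1.0
# Assumed proxy for the coding gain: ratio of uncoded to coded transmissions.
ratios = [n_without_coding(M, e1, b * e1) / n_with_xor(M, e1, b * e1) for b in betas]

for b, r in zip(betas, ratios):
    print(f"beta = {b:.1f}  N_a / N_b = {r:.4f}")
```

The ratio starts at 1 for $\beta = 0$, where the second receiver never misses a packet and coding cannot help, and grows steadily up to $\beta = 1$. The difference $N_a - N_b$ behaves the same way, so the conclusion does not hinge on which of the two measures is used.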

D. Proof of Lemma 3
From the analysis in the previous section, the total number of active time slots consumed for data dissemination with XORs coding is $\eta_s$, which depends on the sleep probabilities, where $s_i$ is the probability that sensor $p_i$ is in sleep mode at each time slot. However, if there is no sleep scheduling at the sensors, that is, $s_i = 0$, the total number of transmissions for disseminating a sufficiently large number $M$ of packets to $n$ receivers with XORs coding is $M / l_{\min}$, where $l_{\min} = 1 - \max_{i \in \{1,2,\ldots,n\}} \{l_i\}$.
Thus, if (D.8) is satisfied, the current sleep scheduling saves energy compared with no sleep scheduling.

Figure 4: The number of receivers that can obtain a new "wanted" packet at a time slot versus the number of receiver nodes. Curves: dissemination with our coding scheme, dissemination with random coding, and dissemination without coding.

Figure 5: Total number of transmissions versus number of packets needed to be sent. Curves: dissemination with our coding scheme, dissemination with random coding, and dissemination without coding.

Figure 7: Network coding gain versus sleep probability of sensor.

Figure 8: Energy saving versus sleep probability of sensor p 2 .

Figure 9: The total delay (time units) versus the threshold α.

Figure 10: The total number of transmissions required versus the threshold α.
For a given encoded packet, $p_i$ can decode a new native packet if and only if exactly one native packet in $R_i$ is encoded into it. The problem is to find an encoding strategy that maximizes the number of receivers that can decode one "wanted" packet from the encoded packet. We use $OPT_p$ to refer to the optimum of this objective.
(i) Suppose that there is a truth assignment for MAXIMUM ONE-IN-THREE SAT that makes the maximum number of clauses true. If $c_i$ is true, exactly one of $\{u_{i1}, u_{i2}, u_{i3}\}$ must be true. Without loss of generality, assume that $u_{i2}$ is true while $u_{i1}$ and $u_{i3}$ are both false. According to the construction of the instance, only $d_{i2}$ participates in the encoding, while neither $d_{i1}$ nor $d_{i3}$ does. In other words, only one lost packet of $p_i$ participates in the encoding and $p_i$ has all other packets involved in the encoding; thus, $p_i$ can decode one "wanted" native packet, $d_{i2}$. Therefore, if a clause is true in the MAXIMUM ONE-IN-THREE SAT problem, there must be a receiver that can obtain a "wanted" native packet. Then, we have $OPT_s \le OPT_p$.
(ii) Suppose that there is an encoding strategy under which the maximum number of receivers can decode a new native packet. Assume that $p_i$ can decode a new native packet $d_{i2}$ from the encoded one. According to the decoding strategy, the other two "wanted" packets $d_{i1}$ and $d_{i3}$ must not be encoded into the new one; that is, $u_{i1}$ and $u_{i3}$ are both assigned false while $u_{i2}$ is true.
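The decodability condition used in both directions of the argument can be made concrete with a small exhaustive search. The sketch below assumes that $R_i$ denotes the set of native packets receiver $p_i$ still lacks; the helper names and the four-receiver instance are hypothetical. Because it enumerates every subset of packets, it is meant only to illustrate the objective on tiny instances and is not the maximum weight clique algorithm proposed in the paper.

```python
from itertools import combinations


def can_decode(wanted, encoded):
    """A receiver decodes a new packet iff exactly one encoded packet is one it lacks."""
    return len(wanted & encoded) == 1


def best_encoding(wanted_sets, packets):
    """Exhaustively find the packet subset that lets the most receivers decode."""
    best, best_count = frozenset(), 0
    for k in range(1, len(packets) + 1):
        for combo in combinations(sorted(packets), k):
            encoded = frozenset(combo)
            count = sum(can_decode(w, encoded) for w in wanted_sets)
            if count > best_count:  # keep the first subset reaching a new maximum
                best, best_count = encoded, count
    return best, best_count


# Hypothetical instance: the packets each of four receivers is still missing.
wanted_sets = [{"d1"}, {"d2"}, {"d1", "d3"}, {"d3"}]
packets = {"d1", "d2", "d3"}
best, count = best_encoding(wanted_sets, packets)
print(sorted(best), count)
```

In this instance the third receiver lacks both d1 and d3, so any encoding that mixes those two packets is useless to it, which is exactly the "exactly one" restriction at work.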
the maximum weight clique found in graph G. Vertices in Q 1 indicate that p 1 , p 3 and p 4 lost packet d 1 and p 2 lost packet d 2 .The encoding decision will be to send d 1 ⊕ d 2 .
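The encoding decision $d_1 \oplus d_2$ can be traced at the byte level. In this minimal sketch the payloads are made-up placeholders of equal length, and the helper name is illustrative; receivers $p_1$, $p_3$ and $p_4$ already hold $d_2$ and recover $d_1$, while $p_2$ holds $d_1$ and recovers $d_2$.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length payloads."""
    if len(a) != len(b):
        raise ValueError("payloads must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical packet payloads of equal length.
d1 = b"PKT-ONE!"
d2 = b"PKT-TWO!"

coded = xor_bytes(d1, d2)  # the single broadcast: d1 XOR d2

# p1, p3 and p4 lost d1 but hold d2, so XORing with d2 recovers d1.
recovered_d1 = xor_bytes(coded, d2)
# p2 lost d2 but holds d1, so XORing with d1 recovers d2.
recovered_d2 = xor_bytes(coded, d1)

print(recovered_d1 == d1, recovered_d2 == d2)
```

One coded broadcast therefore serves all four receivers, whereas sending the packets uncoded would take at least two transmissions.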