Our proposed protocol includes different components designed to carry out the following functions:
- Route discovery and weight assignment.
- Route selection and reservation.
- Cluster formation.
- Collection of cluster interference information.
- Assignment of activity windows to the clusters.
- Normal operation.
- Reconfiguration.
As for engineering decisions, we selected the mechanisms that compose this proposal based on the following criteria. The route discovery and weight assignment algorithm was borrowed from an existing protocol, as explained in the following section, because it is well suited to WSNs, in which information has to converge on a node with special responsibilities (the sink), and because that protocol assigns weights to the discovered paths. This notion of path weight, based on the nodes' anticipated traffic load and remaining energy, proved very helpful for our route selection and reservation phase.
The concept of a cluster, described in more detail in Section 3.2, was adopted because the end result of the route discovery phase is a route tree rooted at the sink. When several branches of this tree converge at a common node, that node is in charge of forwarding traffic on behalf of all the nodes included in those branches. We therefore decided to let the common node be the head of a cluster and to make the next node in each branch a member of the same cluster; scheduling the members of a cluster to be active at the same time allows these nodes to communicate.
Regarding the assignment of activity windows to the clusters, described in Section 3.4, we chose a staggered approach according to the depth of the cluster heads (CHs), so that information can move forward in every activity window to a node that has not yet had a chance to transmit in the current cycle. The end result is that all the information that is in the nodes' buffers at the beginning of a cycle will be able to reach the sink by the end of the same cycle, thus reducing delay.
We also selected polling as the method to send traffic from cluster members to the CH during the normal operation of the protocol (after the initial configuration) because it is a simple mechanism that incurs low and deterministic overhead. Moreover, it is well suited to the master-slave relationship that exists between the CH and the other cluster members.
The idea of having periodic reconfigurations has two goals. One of them is to make sure that routing information remains up to date, and the other is to distribute energy consumption more evenly among nodes. This idea is borrowed from the LEACH protocol [25].
These functions and the proposed methods to carry them out will be described in the following sections. The names of the different messages used within these methods are composed of a prefix that identifies the protocol phase to which they correspond and a suffix that indicates the specific role played by the message. The prefixes used are R (route discovery), W (weight assignment), RS (reservation), CI (cluster interference information), AW (activity window assignment), and RC (reconfiguration).
3.1. Route discovery and weight assignment
The first steps of our proposed protocol are the discovery of a set of non-overlapping routes from each sensing node to the currently active information sink and the assignment of a weight value to each one of these routes. This procedure is taken from [26], which in turn is a generalization of [27], and consists of four phases. Briefly explained, the algorithm works by having the sink periodically broadcast a route-update message, also called RPRI, whose goal is to allow each sensing node to discover its primary route to the sink. The format of this message is shown in Figure 1. Each node, when receiving the update for the first time, marks the node from which it received the message as its parent and rebroadcasts the message. This continues recursively until every node in the network has received a copy of the message and has identified its parent, which represents the next hop in the primary route to the sink.
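The first phase amounts to a flood in which each node adopts as its parent the neighbor from which it first hears the update. The following is a minimal sketch, assuming an idealized network in which messages propagate hop by hop without loss, so that first reception coincides with breadth-first order. The adjacency map and the names `flood_rpri` and `primary_route` are hypothetical and are not taken from [26].

```python
from collections import deque

def flood_rpri(neighbors, sink):
    """Return {node: parent} after an idealized RPRI flood started by the sink.

    neighbors: dict mapping each node ID to a list of one-hop neighbor IDs
    (hypothetical input; the real protocol learns this over the radio).
    """
    parent = {sink: None}          # the sink has no parent
    queue = deque([sink])          # nodes that still have to rebroadcast
    while queue:
        sender = queue.popleft()
        for node in neighbors[sender]:
            if node not in parent:         # first copy of the update heard
                parent[node] = sender      # next hop on the primary route
                queue.append(node)         # the node rebroadcasts the update
    return parent

def primary_route(parent, node):
    """Follow parent pointers from a node up to the sink."""
    route = [node]
    while parent[route[-1]] is not None:
        route.append(parent[route[-1]])
    return route

topology = {
    "S": ["A", "B"],
    "A": ["S", "C"],
    "B": ["S", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
parents = flood_rpri(topology, "S")
print(primary_route(parents, "D"))
```

In this toy topology node C hears the update from both A and B; whichever copy arrives first determines its parent, and the other neighbor becomes a candidate for an alternative route in the second phase.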
The second phase of the relevant routing protocol consists of the transmission of messages of type RALT, used by each sensing node to share with its neighbors the set of alternative routes discovered, either by hearing the route-update message from a node different from its parent or by receiving a RALT message from one of its neighbors. See Figure 1 for a list of fields included in these messages. When the second phase ends, the sensing nodes have discovered all the available minimum-cost disjoint routes to the sink. In our case, the cost is given by the number of hops in the route, but other metrics such as the received power, estimated BER, or level of node conglomeration (anticipated collision rate) could be used as well.
In phase three, each sensing node will send a WPRB (probe) message through each of the routes discovered, including the primary and the alternative ones. When a sensing node receives one of these messages, it will increase a counter of the number of routes that will potentially go through it (num_routes) and will forward the message. The message will eventually reach the sink, which will maintain a table of the paths contained in the received probe messages.
Finally, in phase four, the sink will send back a WRSP (probe response) message through each of the paths stored in its table. Figure 2 displays the format of WPRB and WRSP messages. When a sensing node receives a WRSP message, it will forward the message after updating the load bottleneck and energy bottleneck fields as follows. If the value already stored in the load bottleneck field is less than its expected load (num_routes), the node will set it to num_routes. Similarly, if the value stored in the energy bottleneck field is greater than its remaining energy reserve, it will set it to its remaining energy. The message will eventually reach the sensing node at the end of the reverse path. When all of its probe messages have been answered or a timer has expired, a sensing node will assign each of its N routes the weight p_i given by the following expression:
(1)
where ε_i is the energy bottleneck of route i received in the related probe response message (or zero if none was received), λ_i is the load bottleneck of route i received in the related probe response message (or 1 if none was received), h_i is the number of hops in route i, and β ∈ (0, 1) is the factor defining the desired impact of the number of hops on the weight.
Figure 3 graphically describes how this phase of the protocol works.
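The per-hop update of the two bottleneck fields is a running maximum (load) and a running minimum (energy) along the reverse path. The sketch below illustrates only that rule. The `Wrsp` class, the function name `tag_wrsp`, and the initial field values set by the sink are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Wrsp:
    """Probe response; only the two bottleneck fields are modeled here."""
    load_bottleneck: int
    energy_bottleneck: float

def tag_wrsp(msg, num_routes, remaining_energy):
    """Update the bottleneck fields at one forwarding node, then return msg."""
    if msg.load_bottleneck < num_routes:
        msg.load_bottleneck = num_routes           # heaviest load seen so far
    if msg.energy_bottleneck > remaining_energy:
        msg.energy_bottleneck = remaining_energy   # weakest node seen so far
    return msg

# Assumed initial values: no load and unlimited energy recorded at the sink.
msg = Wrsp(load_bottleneck=0, energy_bottleneck=float("inf"))
# (num_routes, remaining energy) of three hypothetical forwarding nodes.
for num_routes, energy in [(3, 0.9), (5, 0.4), (2, 0.7)]:
    tag_wrsp(msg, num_routes, energy)
print(msg)
```

After the last hop the message carries the load of the busiest node and the energy of the weakest node on the route, which are the quantities the originating node uses to weight that route.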
3.2. Route selection and reservation
The second step is the reservation of resources over one of the existing routes between each sensing node and the currently active sink. Every sensing node will become a traffic source during the normal operation of the system, and the selected route has to be capable of satisfying the QoS requirements of the traffic that will be generated. Routes are reserved in a link-by-link fashion. When a link reservation attempt is successful, both nodes become part of a cluster and the node that agrees to forward the requesting node's traffic becomes the CH. The cluster will grow as more nodes select the same CH to be the next node en route to the sink. As we can see, in this proposal a cluster is defined as a set of one-hop neighboring nodes that need to communicate directly, and the goal is to coordinate the sleeping and waking patterns of all the nodes in a cluster. The cluster ID will be the same as that of the CH. We assume that there is no data traffic flowing from the sink to the sensing nodes, as is typical in WSNs; there may be control traffic flowing in this direction (from sink to sensing node), but it will be in general less intense and non-real-time, thus no reservation is necessary for this traffic.
Notice that a node that agrees to forward other nodes' traffic will in fact be a member of two clusters, one in which it is the head (in charge of collecting data from its members) and another one in which it is just another member (its function in this case is to forward to the new CH the data collected previously plus that generated locally).
Before the actual reservation process begins, the sink will broadcast an RSINT (reservation intention) message, indicating that it is time to start the reservation phase. When one-hop neighbors hear this message, they will in turn send an RSINT message to the sink. As shown in Figure 4, RSINT messages contain the identifiers of the sender and of the intended receiver. An RSINT message means that the transmitter has selected the receiver as the potential next node in its path to the sink and that the actual reservation request will be transmitted shortly. When a node other than the sink and its one-hop neighbors hears an RSINT message transmitted in its neighborhood, it will store the ID of the transmitter and start or restart a timer. On expiry, the node will randomly select one of its routes, with probabilities proportional to their weights, and will send an RSINT message to the next node in that route. Notice that route weights are used as a guideline only, to increase the probability that a reservation attempt is successful; bandwidth availability will still be verified. This selection will exclude those routes for which the next node has not yet transmitted its RSINT message; the goal here is to avoid cyclic routes. At this point the node will start a new timer that will be restarted every time a new RSINT message is overheard. On expiry, this timer will indicate that the intention phase has ended and it is time to start the actual reservation of resources.
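The route choice made when the first timer expires can be sketched as a weighted random draw restricted to routes whose next hop has already announced its intention. This is an illustrative sketch only. The list-of-pairs representation and the name `pick_next_hop` are hypothetical, and skipping zero-weight routes is an added assumption (Python's `random.choices` rejects a weight list that sums to zero).

```python
import random

def pick_next_hop(routes, heard_rsint, rng=random):
    """Choose a route with probability proportional to its weight.

    routes: list of (next_hop_id, weight) pairs (hypothetical representation).
    heard_rsint: set of node IDs whose RSINT message has been overheard.
    Returns the chosen next hop, or None if no route is eligible yet.
    """
    # Routes whose next hop has not sent RSINT are excluded to avoid cycles.
    eligible = [(hop, w) for hop, w in routes if hop in heard_rsint and w > 0]
    if not eligible:
        return None
    hops, weights = zip(*eligible)
    return rng.choices(hops, weights=weights, k=1)[0]

routes = [("A", 0.6), ("B", 0.3), ("C", 0.1)]
choice = pick_next_hop(routes, heard_rsint={"A", "C"}, rng=random.Random(1))
print(choice)   # "A" or "C"; "B" is excluded because it has not sent RSINT
```

Passing a seeded `random.Random` instance keeps simulation runs repeatable without changing the selection rule.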
We argue that guaranteeing a certain amount of bandwidth is enough for QoS assurance, i.e., throughput, delay, jitter, and losses due to buffer overflow will be bounded as a consequence, as explained in [28]. This will be discussed in more detail later.
The bandwidth management procedure will be as follows. All nodes start with an available bandwidth equal to the wireless channel's achievable effective data rate R, which depends on the underlying physical layer and medium access policy. For instance, in a polling-based system such as the one assumed in this work, the achievable throughput is about 85% of the channel gross bit rate [28]. In addition, R can be reduced further to account for the expected channel errors, which are present in any transmission medium but are especially common in wireless environments.
The following inequality has to be satisfied at all times:
(2)
In the previous equation Bavail is the bandwidth still available for new reservations, Bcommitted is the total bandwidth that the relevant node has committed to forward on behalf of other nodes, Bown is the traffic that the node itself will generate, and Boverheard is the amount of bandwidth committed in reservations that the node has overheard. Bcommitted is counted twice since the same channel will be used by the node to receive and forward the traffic.
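Read together, these definitions suggest the following form for Equation 2. It is a reconstruction from the prose rather than a quotation, and the exact arrangement of terms is an assumption:

```latex
B_{\mathrm{avail}} = R - \left( 2\,B_{\mathrm{committed}} + B_{\mathrm{own}} + B_{\mathrm{overheard}} \right) \ge 0
```

For instance, under this reading a node with R = 100 kb/s, Bcommitted = 20 kb/s, Bown = 5 kb/s and Boverheard = 30 kb/s would have 100 - (40 + 5 + 30) = 25 kb/s left for new reservations.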
Let us denote by Breq the amount of bandwidth to be reserved by a node, which in general will be equal to Bcommitted + Bown. The formats of the reservation-associated messages are shown in Figure 4. If the reservation-requesting node is only one hop away from the sink, it needs no verification beyond Equation 2 before sending an RSRQ (reservation request) message to the sink asking whether it is able to reserve the necessary bandwidth, and it will then wait for a response. If, on the other hand, the node is more than one hop away from the sink, it will only send the reservation request to the next node in the selected route if its own available bandwidth is at least equal to Breq, in anticipation of the reservation that the next node in the path will have to make to further forward the traffic. All the nodes that can overhear the message will also check if their available bandwidth is at least equal to Breq; if not, they will send back a response message indicating that the new reservation is not possible. In turn, the node to which the reservation request message was addressed will check if its own available bandwidth is at least one, two or three times Breq, depending on whether it is the sink, a node one hop away from the sink, or a node at least two hops away from the sink, respectively, again anticipating the reservations that it itself and the next node in the path will have to make to further forward the traffic; if the check succeeds, it will send a positive response and will start over to reserve the next link in the route. If a node that had not heard the original reservation request hears the response, it will also verify its available bandwidth and, if it finds that the new reservation represents a problem, it will send a response message to indicate it. When a node sends a negative response, it will include in the Balt field the amount of bandwidth it has available as a way to help make an alternative reservation.
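The admission checks applied to a single reservation request can be summarized in a few lines. The sketch below only encodes the thresholds stated above; the function names and the use of a hop count as input are hypothetical.

```python
def required_multiplier(hops_to_sink):
    """Multiple of Breq that the addressee of a request must have available.

    hops_to_sink: 0 for the sink, 1 for a one-hop neighbor, 2 or more otherwise.
    """
    return min(hops_to_sink, 2) + 1   # 1x, 2x or 3x Breq

def addressee_accepts(b_avail, b_req, hops_to_sink):
    """True if the addressed node can send a positive response."""
    return b_avail >= required_multiplier(hops_to_sink) * b_req

def overhearer_objects(b_avail, b_req):
    """True if a node that overhears the exchange must respond negatively."""
    return b_avail < b_req

print(addressee_accepts(b_avail=30, b_req=10, hops_to_sink=2))   # True
print(addressee_accepts(b_avail=29, b_req=10, hops_to_sink=3))   # False
print(overhearer_objects(b_avail=5, b_req=10))                   # True
```

The extra margin demanded from nodes farther from the sink reflects the fact that the same traffic will occupy the shared channel again when it is received, forwarded, and forwarded once more by the next node.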
If a reservation fails, the requesting node will try the same reservation request on another route, selected again based on their weights. If all routes fail, it will reduce the amount of bandwidth needed by canceling the agreements previously made with some of its cluster members, if any, and try again on the route with the most available bandwidth. When a node has to select another route because a reservation attempt failed, it will send again an intention message before the actual reservation request. If the next node already sent its own reservation request, it will forward the intention message to the next node in the relevant route. The idea here is that, even if some reservations take longer than expected, the sink should be aware that it has to wait a little longer before passing to the next step in the setup process, which would be the collection of interference information.
The actual reservation procedure will be started at the sensor nodes when a timer expires indicating that a sufficiently long time has passed without overhearing new RSINT messages, as explained above. Those nodes that did not receive intention messages (so-called leaf nodes) will send their RSRQ message immediately and will set Breq to Bown since they do not need to forward any data on behalf of other nodes. If, on the other hand, a node did receive one or more intention messages, which turned it into a potential CH, it will start a new timer that bounds how long it should wait for the RSRQ messages announced by the intention messages previously received. In other words, it will not start the reservation of its own resources until it has received all of the expected RSRQ messages or the relevant timer expires. When that happens, it will send an RSRQ message to the next node in its selected route asking for enough resources to send its own traffic plus that of its cluster members; that is, it will set the field Breq to Bcommitted + Bown.
Figure 5 shows a graphic description of the reservation process. In summary, the following set of control messages should be included to reserve bandwidth: RSRQ, RSRP, and RSACK messages. The RSRQ message can be used to set up or tear down a reservation, as mentioned above. Similarly, the RSRP message can be used to send a positive or negative response. Notice that not only the intended receiver of an RSRQ message can send a response; the nodes that overhear the message will also check if their available bandwidth is enough for this new connection and, if not, they will send back a response message indicating that the reservation is not possible. Lastly, the RSACK message can be used by the node that requested to set up or tear down a reservation to acknowledge receipt of the relevant response. The messages exchanged to establish a new reservation have the goal of reaching the whole set of potentially interfering neighbors at each hop. Even if two nodes cannot communicate, they can sometimes still cause interference to each other. To take this into account, reservation-related messages can be transmitted with a higher power than all other messages, including those used in the other setup stages and even data frames transmitted during the normal operation of the system. This way, even those nodes that are too far away to be able to correctly receive data frames will be aware of the potential interference that can be caused by a reservation and will take that into consideration.
In addition to listening to these messages to keep track of the bandwidth usage in their neighborhood, nodes will also extract from RSRP and RSACK messages information as to the creation of new clusters that can potentially interfere with them, which is needed in the next protocol step.
3.3. Collection of cluster interference information
Information relative to the potential interference among neighboring clusters has to be collected to enable the scheduling of waking times of the different clusters without damaging overlaps. To do that, at the end of the route reservation phase, the sink will broadcast a CISTART (start of cluster-interference information collection) message that will be flooded over the network so that all nodes know that this new phase has started. These messages only have two fields, mtype set to CISTART and mid set to the current route update cycle number.
Nodes that are not CHs, referred to as leaf nodes since they are at the end of a routing branch, will be the ones starting the collection of cluster interference information. Each of these nodes will send a CIINFO message to its respective CH containing, in the local interf field, a list of the interfering clusters (not individual nodes) it detected, and a zero in the cluster depth field. In these initial messages, the fields Bcommitted, path and cluster interf, shown and described in Figure 6, will be empty, indicating to the recipient that the sender is not a CH and that the message does not have to be forwarded. When a CH receives from one of its cluster members a CIINFO message in which the path field is either empty or the originating node (first node in the path) is equal to the nid field, meaning that the message was generated by one of its children, it will update the list of interfering clusters detected by the members of its own cluster based on the local interf field just received, and it will update its own depth as the largest depth of its cluster members plus one; it will then forward the message towards the sink as long as the path field is not empty, and will add its own ID to the path. If, on the other hand, the received CIINFO message is such that the path field is not empty and the originating node ID is different from the nid field, meaning that the message was not generated by one of its children, then the node will simply forward the message towards the sink after adding its own ID to the path.
When a CH has received a CIINFO message generated by each of its cluster members or a timer expires, it will in turn generate a new CIINFO message. In this message it will set the path field to its own ID and the cluster depth field to its own calculated depth. It will include in the local interf field a list of the interfering clusters it detected directly and, in the cluster interf field, a list of the interfering clusters detected by its cluster members, except for those already included in the local interf field. Finally, it will set the Bcommitted field to the amount of bandwidth needed to collect the data that the node has committed to forward on behalf of its children.
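The way a CH assembles its own CIINFO message from the reports of its members can be sketched as follows. The dictionary layout and the name `build_ch_ciinfo` are hypothetical stand-ins for the message format of Figure 6, and the depth of 1 used when no member has reported before the timer expires is an assumption.

```python
def build_ch_ciinfo(ch_id, own_interf, member_msgs, b_committed):
    """Assemble the CIINFO message a CH generates for its own cluster.

    member_msgs: list of dicts with keys "depth" and "local_interf", one per
    cluster member that reported (hypothetical in-memory representation).
    """
    # Own depth is the largest member depth plus one.
    depth = max((m["depth"] for m in member_msgs), default=0) + 1
    local = set(own_interf)
    reported = set()
    for m in member_msgs:
        reported |= set(m["local_interf"])
    return {
        "path": [ch_id],
        "cluster_depth": depth,
        "local_interf": sorted(local),
        "cluster_interf": sorted(reported - local),  # omit IDs already listed
        "b_committed": b_committed,
    }

members = [
    {"depth": 0, "local_interf": [7]},       # a leaf node
    {"depth": 2, "local_interf": [7, 9]},    # a member that is itself a CH
]
print(build_ch_ciinfo(ch_id=3, own_interf=[9], member_msgs=members, b_committed=24))
```

In this example cluster 9 is reported both by the CH and by one of its members, so it appears only in the local interf list.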
As the sink receives the CIINFO messages, it will create a temporary array called temp_sched in which clusters will be sorted by their depth. Each array element will contain the cluster ID, its depth, the path to reach it, the bandwidth required by its members and a list of interfering clusters resulting from merging the local interf and cluster interf fields.
Figure 7 shows a summary of the process needed to collect cluster interference information.
3.4. Assignment of activity windows to the clusters
At the end of the process described in the previous section, the sink will have enough information to assign sufficiently long activity windows to the different clusters in such a way that they do not interfere with each other. The assignment of activity windows to the clusters is the last stage of what will be referred to as the setup process. It is important to mention that the setup process will be repeated periodically, with a relatively low frequency (possibly in the order of hours or days) that depends on the size and density of the network, on the residual energy of nodes, on the traffic intensity, and/or other factors, with the goal of reassigning the responsibility of frame forwarding, hence redistributing the energy consumption to extend the system lifetime.
Activity windows will repeat periodically, giving the members of each cluster enough time to forward to the CH all of the information accumulated during the time they were inactive. The time interval that includes one activity window for every cluster in the system is known as a cycle, as shown in Figures 8 and 9. Depending on the availability of bandwidth and on the application's delay tolerance, a cycle may include idle periods in which no clusters are active, to reduce the duty cycle and save more energy. The setup phase plus the sequence of cycles that precede the following setup phase is called a superframe. One or more non-interfering clusters may be active during each activity window, as exemplified in Figure 6. The algorithm arranges the activity windows according to the depth of the CHs, so that information can move forward in every activity window to a node that has not yet had a chance to transmit in the current cycle. With this approach, all the information that is in the nodes' buffers at the beginning of a cycle will be guaranteed to reach the sink by the end of the same cycle. This is the main reason why we maintain that guaranteeing enough bandwidth to empty all buffers during each cycle is enough to make sure that all other QoS metrics (throughput, delay, jitter, and losses due to buffer overflow) will also be bounded as a consequence.
The process to assign activity windows to the clusters starts when the sink receives interference information messages from all of the CHs in the network or a timer expires. It will then create a two-dimensional structure called cluster_sched in which elements in the same column are such that they have the same depth and do not interfere with one another, and for that reason they can be scheduled to be active simultaneously; similarly, the clusters in the leftmost column will be scheduled first in each cycle and those in the rightmost column will be scheduled last. To achieve this, the sink removes one by one the clusters from the temporary array temp_sched it created before. If there are no columns in cluster_sched containing clusters of the same depth as the one just removed from temp_sched, the sink will insert it into the next empty column. If, on the other hand, there is already at least one column in cluster_sched containing clusters of the same depth as the cluster just removed from temp_sched, the sink will try to add this new cluster into one of such columns verifying first if there is no potential interference; if the cluster can be accommodated in more than one column, it will be included in one in which the difference between its required bandwidth and the maximum required bandwidth of clusters already in the column is either negative (if any) or as small as possible (if all of them are positive); if the cluster cannot be accommodated in any of the columns already in use due to potential interference, it will be inserted into the next empty column.
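The column-filling procedure can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the dictionary layout and the name `build_cluster_sched` are hypothetical, interference is treated as symmetric, `temp_sched` is assumed to be sorted so that the clusters to be scheduled first come first, and when several columns would not have to grow the first one is taken (the text does not specify a tie-break).

```python
def build_cluster_sched(temp_sched):
    """Group clusters into columns of same-depth, non-interfering clusters.

    temp_sched: list of dicts with keys "id", "depth", "bw" and "interf"
    (a set of interfering cluster IDs), already sorted by depth.
    """
    columns = []                                   # each column is a list of clusters
    for cluster in temp_sched:
        candidates = []
        for col in columns:
            if col[0]["depth"] != cluster["depth"]:
                continue
            clash = any(c["id"] in cluster["interf"] or cluster["id"] in c["interf"]
                        for c in col)
            if not clash:
                # Positive diff means the window would grow by that amount.
                diff = cluster["bw"] - max(c["bw"] for c in col)
                candidates.append((diff, col))
        if not candidates:
            columns.append([cluster])              # use the next empty column
            continue
        fitting = [col for diff, col in candidates if diff <= 0]
        if fitting:
            fitting[0].append(cluster)             # window does not grow
        else:
            min(candidates, key=lambda dc: dc[0])[1].append(cluster)
    return columns

temp_sched = [
    {"id": 4, "depth": 1, "bw": 10, "interf": {5}},
    {"id": 5, "depth": 1, "bw": 8, "interf": {4}},
    {"id": 6, "depth": 1, "bw": 6, "interf": set()},
    {"id": 2, "depth": 2, "bw": 24, "interf": set()},
]
sched = build_cluster_sched(temp_sched)
print([[c["id"] for c in col] for col in sched])   # [[4, 6], [5], [2]]
```

Clusters 4 and 5 interfere and therefore occupy separate columns, while cluster 6 joins a column whose largest bandwidth requirement already covers its own, so no activity window has to be lengthened.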
Once all the clusters have been moved into cluster_sched, the sink has to calculate the duration of the M resulting activity windows, each represented by a non-empty column of the relevant data structure. Let us first calculate the activity time each cluster needs during every cycle:
(3)
where R is the wireless channel's achievable effective data rate, Bcommitted(j) is the bandwidth needed by CH j to collect data from its children, and Tcycle is the duration of each cycle. Tcycle should be small enough so that the maximum delay that a frame will experience from the time it is generated to the time it is received by the sink is within acceptable values, and large enough to allow as many frame transmissions as needed by the members of all clusters. Notice that the maximum delay that a frame can experience is upper-bounded by twice the duration of a cycle. The reason is that a frame can be generated after the activity window of the node's cluster has already passed within the current cycle, which means that it will reach the sink by the end of the next cycle. Consequently, Tcycle should be chosen so that 2 Tcycle does not exceed the maximum delay tolerated by the application being served.
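One reading of Equation 3 that is consistent with these definitions is that each cluster must be active for the fraction of the cycle that its committed bandwidth represents of the channel rate. The form below is inferred from the prose, and the symbol T_j for the activity time of cluster j is introduced here for illustration only:

```latex
T_j = \frac{B_{\mathrm{committed}}(j)}{R}\; T_{\mathrm{cycle}}
```

For example, with R = 200 kb/s, Bcommitted(j) = 20 kb/s and Tcycle = 1 s, the cluster would need 0.1 s of activity in every cycle under this reading.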
We can now calculate the duration of the i-th activity window as:
(4)
where S_i is the set of clusters to be scheduled within the i-th activity window. Window sizes should be increased by a small amount to account for traffic that will flow in the opposite direction. Notice that, if:
(5)
it means that there is not enough bandwidth in the network to schedule non-overlapping activity windows. A possible solution in this case is to try to start a new cycle before the end of the previous one, but maintaining the condition of avoiding destructive interference among clusters. To do this, the sink will examine the consequences of overlapping consecutive cycles by an amount of time Toverlap equal to the duration of the first activity window. If there is no potential interference, the sink will analyze the possibility of overlapping consecutive cycles by an amount of time equal to the duration of the two earliest activity windows, and so on. Notice that clusters that are closer to the sink will need more bandwidth, so that it is possible that the last activity windows will be much larger than the first ones; this has to be taken into account when analyzing the possibility of allowing overlaps. When this procedure is finished, if the following inequality still holds:
(6)
it means that the system does not have enough bandwidth to satisfy the QoS requirements of its nodes. When this happens, a topology-control mechanism may optionally be run [29] to decide if a reduced number of nodes (hence a smaller amount of traffic) would be enough to satisfy coverage and connectivity requirements. Topology-control mechanisms are not required for the proper operation of QUATTRO and are therefore out of the scope of this article.
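The sizing of activity windows and the feasibility test can be sketched numerically. The sketch rests on assumed readings of the equations in this section: a cluster's activity time is its committed bandwidth divided by R, scaled by Tcycle; a window lasts as long as the most demanding cluster in its column, because the clusters in a column are active in parallel; and the schedule is infeasible when the windows, less any permitted overlap, add up to more than Tcycle. All function names are hypothetical, and the interference check required before allowing an overlap is not modeled.

```python
def activity_time(b_committed, rate, t_cycle):
    """Assumed reading of Equation 3: share of the cycle a cluster needs."""
    return b_committed / rate * t_cycle

def window_durations(columns, rate, t_cycle):
    """Assumed reading of Equation 4: one duration per non-empty column."""
    return [max(activity_time(b, rate, t_cycle) for b in column)
            for column in columns]

def fits_in_cycle(windows, t_cycle, t_overlap=0.0):
    """False when the windows, less any allowed overlap, exceed the cycle."""
    return sum(windows) - t_overlap <= t_cycle

# Each column lists the committed bandwidth (kb/s) of its clusters.
columns = [[10, 6], [8], [24], [60]]
windows = window_durations(columns, rate=100, t_cycle=2.0)
print(windows)
print(fits_in_cycle(windows, 2.0))                         # False
print(fits_in_cycle(windows, 2.0, t_overlap=windows[0]))   # True
```

In this example the four windows add up to slightly more than one cycle, and overlapping consecutive cycles by the duration of the first window is enough to make the schedule fit.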
Notice that the reservation procedure is in charge of a first assessment as to whether the system has enough resources to satisfy QoS requirements. However, allocating different activity windows to the clusters means that, when a given node is transmitting, all members of every neighboring cluster have to be silent, and not only those that can experience a collision with the relevant transmitting node. In other words, allocating activity windows is more resource-demanding than local bandwidth reservation, meaning that the latter may be successful and still the former can fail.
When the sink has finished these calculations, it has to inform each CH of the activity window to which it has been associated and of its length. To that end, the sink will send an activity-window notification (AWN) message to each CH, as shown in Figure 10. The AWN message, shown in Figure 11, contains all the information that has to be exchanged in the AWN process. Upon reception, the CH will in turn broadcast an activity-window local notification (AWLN) message to inform its cluster members.
The leaf nodes will acknowledge receipt of the AWLN message by sending an AWACK message. A CH will wait to receive AWACK messages from all of its cluster members to generate its own AWACK message. This process continues until the sink eventually receives the corresponding acknowledgments from its own cluster members.
The system is now ready to start working in a QoS-aware, collision-free, safe-sleep fashion. To make that stage start, the sink sends out a GOAHEAD message that is flooded throughout the network to make every node aware of it. This message contains the time at which the first cycle will begin, as shown in Figure 11, which is what nodes still need to know in order to calculate their sleep and wake times. The duty cycle is included so that nodes can know what fraction of time during each cycle the whole system is idle; this information will be useful when the network has to be reconfigured, as explained in Section 3.7.
Up to this point, communication among nodes takes place using a contention-based MAC protocol, such as CSMA/CA in its IEEE 802.15.4 or 802.11 versions. If IEEE 802.11 (DCF) is used, it would be advisable to force nodes to back off before every transmission, even when the channel is initially idle, to reduce collisions.
3.5. Normal operation phase
During the normal operation phase, the only active nodes will be those corresponding to the clusters whose activity window includes the current time. Access to the channel is not through contention anymore, but cluster members will wait to receive a poll from the CH indicating that the receiving node is allowed to send. The polls may allow the transmission of one frame at a time, or all frames corresponding to a single upper-layer packet (MSDU), or as many frames as possible during a pre-specified amount of time (TXOP), proportional to the amount of bandwidth requested during the reservation phase. In our simulations, we adopted the TXOP approach.
A node that receives a poll will respond with a Data frame if it has information to send, indicating respectively with the More_Frag and More_Data bits if there are more frames still to send corresponding to the packet being transmitted and if there are more packets to forward in addition to the one currently in service. If the node does not have information to send, it will respond with a Null frame. Either way, the CH can detect when a node has finished sending the information it stored in its buffer and stop polling it during the current cycle. The member node can go to sleep at this time to further save energy. If all the member nodes go to sleep before the end of the activity window, the CH can follow suit. This approach is designed to save energy by avoiding collisions and idle listening as much as possible.
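The polling exchange within one activity window can be sketched as a simple loop. This is an illustrative simulation under several simplifying assumptions: frames are never lost, the window is long enough to empty all buffers, fragmentation (More_Frag) is not modeled, and the per-poll allowance is expressed as a number of frames rather than a TXOP duration. The name `run_activity_window` and the buffer representation are hypothetical.

```python
from collections import deque

def run_activity_window(buffers, frames_per_poll=1):
    """Simulate one activity window of CH polling.

    buffers: dict mapping member ID to a list of queued frames (hypothetical).
    Returns the list of (member, frame) pairs received by the CH, in order.
    """
    received = []
    to_poll = deque(buffers)                  # members that are still awake
    while to_poll:
        member = to_poll.popleft()
        queue = buffers[member]
        if not queue:
            continue                          # Null frame: member goes to sleep
        for _ in range(min(frames_per_poll, len(queue))):
            received.append((member, queue.pop(0)))
        more_data = bool(queue)               # More_Data bit of the last frame
        if more_data:
            to_poll.append(member)            # poll again later in the window
    return received                           # all members asleep: CH may sleep

buffers = {"n1": ["a", "b"], "n2": [], "n3": ["c"]}
print(run_activity_window(buffers))
# [('n1', 'a'), ('n3', 'c'), ('n1', 'b')]
```

The CH stops polling a member as soon as it answers with a Null frame or clears its More_Data bit, which is what lets members, and eventually the CH itself, go to sleep before the window ends.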
If a node fails repeatedly to respond to polls, an error message will be generated by the CH to alert the sink.
3.6. Network-wide synchronization
Synchronization is very important in time-shared access protocols, such as this one, to keep activity windows from overlapping because of clock drifts. To achieve synchronization, a mechanism such as the one described in [30, 31] can be used from the beginning of the initial setup process. As soon as a node identifies its parent node through the process described in Section 3.1, it can start exchanging synchronization messages with its parent and adopting its clock. This will eventually cause all nodes to be synchronized to the sink's clock. After a successful reservation procedure, the node can start exchanging synchronization messages with its selected CH and again adopting its clock. This process can continue during the data exchange within the normal operation phase to avoid considerable drifts. Poll, Data and Null frames can carry time stamps to achieve this task.
Notice that even though this is a time-shared access technique, there is no need for nodes to be as precisely synchronized as in TDMA-based systems. The reason is that, even during the polling-based data transmission, this remains a random-access technique in the sense that every transmission includes a preamble that indicates when a frame is about to begin. Dividing time into activity windows is only used as a reference to specify, with relatively low accuracy, when attempts to access the medium are allowed.
3.7. Reconfiguration
Being a CH can be energy consuming because a CH has to be awake during two activity windows and because, in addition to sending its own collected data, it has to forward those of its cluster members. That is why nodes will take turns acting as CHs by periodically rerunning the setup process. The redistribution of energy consumption relies on the fact that nodes select a route based, among other aspects, on the available energy of the weakest node in that path, as explained in Sections 3.1 and 3.2. If there is only one sink in the system, the redistribution of energy consumption may not be as effective as we would like, since nodes closer to the sink will always carry more traffic than those farther away. Hence, having several sinks and alternating their activity is advisable.
In preparation for the beginning of the new setup procedure, after some time of working in the normal operation phase, the current sink will broadcast an RCWK (wake-up) message that will be repeatedly rebroadcast by CHs, at the beginning of each subsequent cycle, with the goal of eventually reaching the sink that has to be active during the next superframe. When the sought-after sink receives the RCWK message, it will immediately respond with an RCWRP (wake-up response) message to confirm that it is ready to play its role. The CHs that hear the RCWRP message will forward it until it reaches the currently active sink; they will also stop rebroadcasting the RCWK message previously received. Sleeping sinks will therefore have to turn on their receivers periodically to listen for wake-up messages. It is also recommended that they use overheard synchronization-related messages to remain synchronized with the currently active sink. If no answer to the RCWK is received during a certain number of cycles, the current sink will try to wake up another sink or, if none responds, it will itself remain active for one more superframe.
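The fallback logic followed by the currently active sink can be summarized in a short Python sketch. The function name, the ordered list of candidate sinks, the max_cycles limit and the responds callback (which models whether an RCWRP has come back by a given cycle) are assumptions made for illustration; the text above does not fix any of them.

```python
from typing import Callable, Iterable


def select_next_sink(
    current: str,
    candidates: Iterable[str],
    responds: Callable[[str, int], bool],
    max_cycles: int,
) -> str:
    """Return the sink that will be active during the next superframe."""
    for sink in candidates:
        # The RCWK is rebroadcast once per cycle while waiting for an RCWRP.
        for cycle in range(max_cycles):
            if responds(sink, cycle):
                return sink
    # No candidate answered: the current sink stays active one more superframe.
    return current


# Usage with a hypothetical channel model: only S3 answers, after two cycles.
def answers(sink: str, cycle: int) -> bool:
    return sink == "S3" and cycle >= 2


chosen = select_next_sink("S1", ["S2", "S3"], answers, max_cycles=4)
```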
The next step is for the currently active sink to broadcast a reconfiguration notification (RCN) message indicating that the setup phase has to start again. Receiving an RCN message tells a CH to stop rebroadcasting the RCWK message if it has not already done so. If there is only one sink in the whole network, the RCWK and RCWRP messages will not be used and the sink will proceed immediately to send the RCN message.
Each CH will rebroadcast the RCN message during its corresponding activity window to make its members aware of the imminent reconfiguration phase. Those members that are at the same time heads of other clusters will retransmit the message to their respective cluster members, and so on until all nodes in the system are reached. The RCN message has a time-to-setup field, as shown in Figure 12, which indicates the number of cycles still pending until the beginning of the setup process. This field is initially set by the sink to a value equal to its depth. Each CH will reduce this value by 1 before retransmitting the message, so that setup messages will start to be transmitted one cycle after the farthest leaf receives it. By this time every node in the system is aware of the upcoming setup phase and ready to start it, either by remaining awake during the time in which the system would otherwise be idle or by temporarily stopping the data transfer phase, depending on whether the reconfig_type field has the value normal or exclusive. In a normal reconfiguration, to avoid having to stop data transmission during the setup process, nodes use the portion of each cycle in which all clusters would normally be sleeping, which requires the duty cycle to be sufficiently less than 1. In this case, nodes will continue to send information to the current sink until it is time to send the AWACK message to acknowledge receipt of the activity window they will use to send data to the new sink. If the duty cycle is too close to 1, on the other hand, an exclusive reconfiguration is needed, in which nodes have to stop data transmission when the time-to-setup variable counts down to zero so that they are ready to execute the new setup process.
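The countdown carried in the time-to-setup field can be checked with a small Python sketch. It rests on two assumptions that the text does not state explicitly: the initial value equals the depth of the farthest leaf in the tree rooted at the sink, and the RCN message advances one tree level per cycle. The function name and the dictionary-based tree representation are hypothetical.

```python
def propagate_rcn(children: dict, sink: str, time_to_setup: int) -> dict:
    """Return {node: (cycle_received, time_to_setup_received)} for every node."""
    heard = {}
    # Each entry: (sender, cycle in which it broadcasts, field value it sends).
    frontier = [(sink, 0, time_to_setup)]
    while frontier:
        sender, cycle, value = frontier.pop()
        for child in children.get(sender, []):
            heard[child] = (cycle, value)
            # A CH decrements the field and rebroadcasts in the following cycle.
            frontier.append((child, cycle + 1, value - 1))
    return heard


# Usage: a tree whose farthest leaf "d" is three hops from the sink.
tree = {"sink": ["a", "b"], "a": ["c"], "c": ["d"]}
heard = propagate_rcn(tree, "sink", time_to_setup=3)
start_cycles = {cycle + value for cycle, value in heard.values()}
```

Under these assumptions, the cycle of reception plus the received value is the same for every node, so all nodes agree on when setup begins, and the farthest leaf receives the value 1, that is, setup starts one cycle after it hears the message.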
The reconfiguration procedure can also be initiated if a severe problem is detected in the system, such as the loss of a CH, which would isolate a set of nodes from the sink. If this happens, an error message will be generated by the node detecting the problem and forwarded until it reaches the sink, as mentioned in Section 3.5. Depending on the extent of the damage (e.g., the number of nodes that have failed), the sink can decide whether a complete setup procedure is needed or a fast reconfiguration is enough. This will again be indicated in the reconfig_type field of the RCN message transmitted by the sink. In a fast reconfiguration, nodes do not go through the route discovery and weight assignment phases described in Section 3.1, but use the information collected during the most recent setup process and go straight into the reservation phase described in Section 3.2. In fact, nodes that have not been disconnected can send the RSRQ message to the same node currently acting as their CH, which has a high probability of resulting in a successful reservation, thus saving time and energy. Regardless of whether the reconfiguration is complete or fast, it can be carried out using the unused portion of time (normal) or by stopping data transmission to use the whole bandwidth (exclusive).
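The two independent choices conveyed by the reconfig_type field (complete or fast, normal or exclusive) can be pictured with the following Python sketch. The way the field is encoded, the names Scope, Mode, RCN and build_rcn, and in particular the thresholds fast_limit and normal_duty_limit are hypothetical; the text leaves the sink's decision rule open.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    COMPLETE = "complete"   # rerun route discovery and weight assignment (Section 3.1)
    FAST = "fast"           # reuse stored routes, start at reservation (Section 3.2)


class Mode(Enum):
    NORMAL = "normal"       # use the idle portion of each cycle
    EXCLUSIVE = "exclusive" # stop data transmission during setup


@dataclass(frozen=True)
class RCN:
    time_to_setup: int
    scope: Scope
    mode: Mode


def build_rcn(depth: int, failed_nodes: int, duty_cycle: float,
              fast_limit: int = 2, normal_duty_limit: float = 0.5) -> RCN:
    """Build an RCN message; both thresholds are illustrative assumptions."""
    # Limited damage: a fast reconfiguration may suffice. A periodic
    # reconfiguration (no failures) is treated here as a complete one.
    scope = Scope.FAST if 0 < failed_nodes <= fast_limit else Scope.COMPLETE
    mode = Mode.NORMAL if duty_cycle <= normal_duty_limit else Mode.EXCLUSIVE
    return RCN(depth, scope, mode)


def first_phase(rcn: RCN) -> str:
    """Phase in which nodes start once time_to_setup reaches zero."""
    return "reservation" if rcn.scope is Scope.FAST else "route discovery"
```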
A node will know that it has become isolated from the rest of the network if it receives no polls from its CH during several consecutive cycles. When that happens, if the node is itself a CH, it will stop polling its cluster members to signal to them that their data cannot be forwarded all the way up to the sink. To be reconnected to the rest of the network, isolated nodes will remain awake (during the time that the system would otherwise be idle, if the duty cycle is sufficiently less than 1, or constantly if not) waiting for the RCN message that will start a new reconfiguration process, allowing them to select a new CH.
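The isolation test amounts to counting consecutive cycles without a poll, as in the Python sketch below. The class name and the max_missed_cycles parameter are assumptions; the text only says "several consecutive cycles" and gives no specific value.

```python
class IsolationMonitor:
    def __init__(self, max_missed_cycles: int):
        self.max_missed_cycles = max_missed_cycles   # hypothetical threshold
        self.missed = 0                              # consecutive cycles without a poll

    def end_of_cycle(self, poll_received: bool) -> bool:
        """Update the counter at the end of a cycle; return True if isolated."""
        self.missed = 0 if poll_received else self.missed + 1
        return self.missed >= self.max_missed_cycles


# Usage: one received poll resets the count, three misses in a row trigger it.
monitor = IsolationMonitor(max_missed_cycles=3)
history = [True, False, False, True, False, False, False]
isolated = [monitor.end_of_cycle(p) for p in history]
```

A node for which the monitor returns True would stop polling its own members, if it is a CH, and stay awake waiting for an RCN message.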
Reconfiguration is the last ingredient of our proposal. Figure 13 shows the whole protocol at a glance, including all of its phases and the events that trigger the beginning and end of each of them.