In this section, we first formulate the problem and then provide a detailed solution by dynamic programming.

### 3.1. Problem formulation

In traditional routing algorithms, most effort has been devoted to alleviating the impact of path loss [19, 20]. The problem addressed in this paper can be summed up as deciding how far away the next-hop node should be. However, when energy- and QoS-aware packet forwarding is concerned, decisions should take into account both the transmission distance and the FER, i.e., the QoS issue. Meanwhile, power allocation and route planning can be performed jointly in the framework of dynamic programming, hop by hop during the packet forwarding process. The introduced forwarding protocol can be performed in a distributed fashion, because a distributed protocol is scalable and easily implementable in practice; every sensor has to make forwarding decisions based on its limited, localized knowledge without end-to-end feedback. Hence, it is natural to estimate energy consumption by a prediction technique such as dynamic programming. Here, we assume that every node knows its own position, which can be obtained by small-sized, low-power, low-cost GPS receivers or by position estimation techniques based on signal strength measurements. In addition, we assume that sensor nodes are deployed uniformly in a planar region. The data center (i.e., the destination node of the network) is fully aware of both the node density and the region size.

### 3.2. Solution

A WSN can be modeled as a graph *G*(*W*, *L*), where *W* and *L* represent the set of all nodes and the set of all directed links, respectively. A link (*w*_{i}, *w*_{j}) ∈ *L* if and only if *w*_{j} ∈ *A*(*w*_{i}), where *A*(*w*_{i}) denotes the neighboring region of *w*_{i}, i.e., the region that is directly reachable by *w*_{i} with a transmitting power level within its dynamic range. Consider a *U*-hop path {*w*_{0}, ..., *w*_{k}, ..., *w*_{U-1}}, where *w*_{0} and *w*_{U-1} denote the source and destination nodes, respectively. The end-to-end SER can be expressed as {\sum}_{k=0}^{U-1}{\mathsf{\text{SER}}}_{k}^{\prime}\left({p}_{k},{l}_{k}\right), where *p*_{k} represents the transmission power allocated to the *k*-th node and *l*_{k} = |(*w*_{k}, *w*_{k+1})| denotes the length of the link (*w*_{k}, *w*_{k+1}). Note that the transmission powers are subject to the total power constraint {\sum}_{k=0}^{U-1}{p}_{k}\le c. We now cast the packet forwarding problem into the framework of dynamic programming [17], which contains the following five ingredients [12].

#### 3.2.1. Stage

The process of packet forwarding can be naturally divided into a set of stages by hops on the path. Stages are indexed by positive integers (*i* = 1, 2, ...).

#### 3.2.2. State

At every stage *i*, the state *s*_{i} = (*w*_{i}, *e*_{i}) consists of two components: the position of the current node *w*_{i} and the remaining power *e*_{i}. Thus, the state space is a multi-dimensional continuous space *S* = *A* × [*e*_{min}, *e*_{max}], where *A* represents the region in which nodes are deployed, and *e*_{min} and *e*_{max} represent the minimum and maximum end-to-end power consumption thresholds, respectively. An alternative form of the state is *s*_{i} = (*d*_{i}, *e*_{i}), where *d*_{i} denotes the distance from *w*_{i} to the destination. Correspondingly, the state space becomes a two-dimensional continuous space *S* = [0, *D*] × [*e*_{min}, *e*_{max}], where *D* denotes the farthest distance from any sensor node to the data center.
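The two equivalent state forms can be sketched as follows; this is a minimal illustration with names of our own choosing, not part of the paper's formulation. The reduced state simply collapses the node position to its distance from the destination.

```python
import math
from dataclasses import dataclass

# Sketch of the two equivalent state forms: the full state (w_i, e_i)
# and the reduced state (d_i, e_i). Class and function names are ours.

@dataclass(frozen=True)
class FullState:
    w: tuple   # position of the current node w_i, e.g. (x, y)
    e: float   # remaining power e_i

@dataclass(frozen=True)
class ReducedState:
    d: float   # distance from the current node to the destination, d_i
    e: float   # remaining power e_i

def reduce_state(s: FullState, dest: tuple) -> ReducedState:
    """Collapse position to distance-to-destination: d_i = |w_i - dest|."""
    d = math.hypot(s.w[0] - dest[0], s.w[1] - dest[1])
    return ReducedState(d=d, e=s.e)
```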

#### 3.2.3. Decision

As mentioned earlier, a decision should settle two problems: how far away the next-hop node should be, and how much power should be allocated to forward the packet. Hence, we define the decision at stage *i* as {g}_{i}=\left({\widehat{w}}_{i+1},{e}_{i+1}\right), where {\widehat{w}}_{i+1} represents the target position of *w*_{i+1} (there may be no sensor at exactly this position). Note that {\widehat{w}}_{i+1} and *e*_{i+1} answer the two questions posed above. The decision space is a multi-dimensional continuous space *G* = *A* × [*e*_{min}, *e*_{max}], while its subset at state *s*_{i} = (*w*_{i}, *e*_{i}) is *G*(*s*_{i}) = *A*(*w*_{i}) × [*e*_{min}, *e*_{i}]. Since the target position {\widehat{w}}_{i+1} always lies on the line connecting *w*_{i} and the destination, a single variable *l*_{i} = |(*w*_{i}, *w*_{i+1})|, denoting the target advancement at state *s*_{i}, is enough to represent the position of *w*_{i+1}. Thus, a decision can alternatively be written as *g*_{i} = (*l*_{i}, *e*_{i}). With this definition, the decision space is a two-dimensional continuous space *G* = [0, *D*_{max}] × [*e*_{min}, *e*_{max}], where *D*_{max} is the maximum transmission radius of the nodes. Both forms of the decision will be used in the later discussion.

It is notable that the decision defined above might not correspond to the exact position of the relay node *w*_{i+1}; it only describes the ideal position {\widehat{w}}_{i+1}. Therefore, an additional procedure, called the *relay-selecting* algorithm, is carried out to determine the actual relay node. This procedure works as follows: given an optimal position {\widehat{w}}_{i+1}, the node within the region *A*(*w*_{i}) that is nearest to {\widehat{w}}_{i+1} is selected as *w*_{i+1}. Node *w*_{i} then forwards the packet to *w*_{i+1} with the allocated power *e*_{i}.
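The relay-selecting step above can be sketched directly; this is a minimal illustration (function name and the circular-region model of *A*(*w*_{i}) are our assumptions, the paper only requires reachability within the dynamic power range):

```python
import math

def select_relay(current, ideal, neighbors, d_max):
    """Relay-selecting sketch: among candidate nodes reachable from
    `current` (modeled here as a disk of radius d_max, standing in for
    A(w_i)), pick the one nearest to the ideal position w_hat_{i+1}.
    Returns None when no node lies in the reachable region."""
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    reachable = [n for n in neighbors if dist(current, n) <= d_max]
    if not reachable:
        return None
    return min(reachable, key=lambda n: dist(n, ideal))
```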

#### 3.2.4. Policy

A policy *g* : *S* → *G* is a mapping from the state space to the decision space. The decision at state *s*_{i} can then be written as *g*_{i} = *g*_{i}(*s*_{i}), where *g*_{i} is the policy at stage *i*. A policy is said to be stationary if the mapping does not change with stages, i.e., ∀*i*, *g*_{i} = *g*. That is, once the input state is given, the output decision is determined regardless of the stage. Here, we consider only stationary policies as candidates, and the feasible stationary policy space is represented by \mathcal{P}\left(S\right).

#### 3.2.5. Value function

The value function (or *cost-to-go* function) plays an important role in the framework of dynamic programming. Usually, a value function *J* : *S* → ℝ is a mapping from the state space to the set of real values. The value function *J*_{g}(*s*) we define here can be interpreted as the average end-to-end SER with respect to the initial state *s*_{x} and the stationary policy *g*, which is given by

{J}_{g}\left({s}_{x}\right)=E\left\{\sum _{k=0}^{U-1}{\mathsf{\text{SER}}}_{k}^{\prime}\left({s}_{k},g\left({s}_{k}\right),{\epsilon}_{k}\right)\phantom{\rule{2.77695pt}{0ex}}|\phantom{\rule{2.77695pt}{0ex}}{s}_{0}={s}_{x}\right\},

(13)

where {\mathsf{\text{SER}}}_{k}^{\prime}\left({s}_{k},g\left({s}_{k}\right),{\epsilon}_{k}\right) is the SER incurred in moving from state *s*_{k} to *s*_{k+1}, and *ϵ*_{k} represents the deviation between the actual position *w*_{k+1} and the ideal position {\widehat{w}}_{k+1}. Given *g*(*s*_{k}) and *ϵ*_{k}, the state *s*_{k+1} is determined. Hence, we may use SER'(*s*_{x}, *s*_{y}) to denote the SER from state *s*_{x} to state *s*_{y} in the later discussion.

An iteration form of (13) is [12]

{J}_{g}\left({s}_{x}\right)=E\left\{\mathsf{\text{SER'}}\left({s}_{x},{s}_{y}\right)+{J}_{g}\left({s}_{y}\right)\right\}={\phi}_{g}\left({s}_{x}\right)+\underset{S}{\int}{J}_{g}\left({s}_{y}\right){f}_{g}\left({s}_{x},{s}_{y}\right)d{s}_{y},

(14)

where *f*_{g}(·, ·) is the state transition probability density function (PDF) under policy *g*, and {\phi}_{g}\left({s}_{x}\right)=\underset{S}{\int}\mathsf{\text{SER'}}\left({s}_{x},{s}_{y}\right){f}_{g}\left({s}_{x},{s}_{y}\right)d{s}_{y} is the average one-hop SER under policy *g* given the current state *s*_{x}.
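Since the one-hop term *φ*_{g}(*s*_{x}) is an expectation over the random next state induced by the position deviation *ϵ*, it can be estimated numerically. The sketch below assumes generic callables for the SER model and the state transition, and a Gaussian deviation; all three are placeholders of ours, not the paper's channel model:

```python
import random

def phi_monte_carlo(ser, next_state, s_x, g, n=10000, rng=None):
    """Monte Carlo estimate of phi_g(s_x) = E[ SER'(s_x, s_y) ], where
    s_y is the next state reached from s_x under decision g(s_x) and a
    random position deviation eps. `ser`, `next_state`, `g`, and the
    Gaussian deviation are illustrative placeholders."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 0.5)           # deviation of actual relay from ideal
        s_y = next_state(s_x, g(s_x), eps)  # resulting state transition
        total += ser(s_x, s_y)
    return total / n
```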

In fact, (14) is the standard recursion in dynamic programming problems. The objective is to find the optimal policy. A policy *g** is called optimal if \forall s,\phantom{\rule{2.77695pt}{0ex}}{J}_{{g}^{*}}\left(s\right)\le {J}_{g}\left(s\right) for every other policy *g*. We use *J**(·) to denote the value function under the optimal policy. In principle, the optimal policy can be obtained by solving Bellman's equation [17]

{J}^{*}\left({s}_{x}\right)=\underset{g\in \mathcal{P}\left(S\right)}{\text{min}}\left\{{\phi}_{g}\left({s}_{x}\right)+\underset{S}{\int}{J}^{*}\left({s}_{y}\right){f}_{g}\left({s}_{x},{s}_{y}\right)d{s}_{y}\right\}.

(15)

Once the optimal value function *J** is available, the optimal forwarding policy is given by

{g}^{*}=\text{arg}\underset{g\in \mathcal{P}\left(S\right)}{\text{min}}\left\{{\phi}_{g}\left({s}_{x}\right)+\underset{S}{\int}{J}^{*}\left({s}_{y}\right){f}_{g}\left({s}_{x},{s}_{y}\right)d{s}_{y}\right\}

(16)

We note that the optimal policy consists of a series of optimal decisions, each made at state *s*_{x} to minimize the right-hand side of (16).
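To make the Bellman recursion concrete, the sketch below solves it by backward dynamic programming on a discretized reduced state *s* = (*d*, *e*). For brevity it treats transitions as deterministic (the deviation *ϵ* is ignored, so the integral over *f*_{g} collapses to a single next state), and the one-hop SER model is a placeholder of ours, not the paper's:

```python
import numpy as np

def hop_ser(l, p):
    """Placeholder one-hop SER model (not the paper's): error grows
    with hop distance l and shrinks with allocated power p."""
    return 1.0 - np.exp(-0.1 * l / p)

def solve_bellman(D=10.0, e_max=5.0, nd=11, ne=11):
    """Backward DP for the Bellman equation on a grid of states
    (d, e): distance to destination and remaining power. Since every
    hop strictly reduces d, one sweep over increasing d suffices."""
    d_grid = np.linspace(0.0, D, nd)
    e_grid = np.linspace(0.0, e_max, ne)
    J = np.full((nd, ne), np.inf)      # inf = destination unreachable
    J[0, :] = 0.0                      # at the destination, cost-to-go is 0
    for i in range(1, nd):
        for j in range(ne):
            for i2 in range(i):        # candidate advancement: d -> d_grid[i2]
                for j2 in range(j):    # candidate power spend: e - e_grid[j2] > 0
                    l = d_grid[i] - d_grid[i2]
                    p = e_grid[j] - e_grid[j2]
                    # min over decisions of { one-hop SER + cost-to-go }
                    cand = hop_ser(l, p) + J[i2, j2]
                    if cand < J[i, j]:
                        J[i, j] = cand
    return d_grid, e_grid, J
```

With more remaining power the feasible decision set grows and each hop can be made more reliable, so the computed cost-to-go is non-increasing in *e*, as expected.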