Dynamic power control for energy harvesting wireless multimedia sensor networks

Optimization of energy usage in wireless sensor networks (WSN) has been an active research field over the last decades, and various approaches have been explored. A well-designed energy consumption model is the foundation for developing and evaluating a power management scheme in networks of energy-constrained devices such as WSN. We are interested in developing optimal centralized power control policies for energy harvesting wireless multimedia sensor networks (WMSN) equipped with photovoltaic cells. We propose a new complete information Markov decision process model to characterize the sensor's battery discharge/recharge process and inspect the structural properties of optimal transmit policies.


Introduction
Recent technological advances in microelectronics and wireless communication, along with reduced production costs, have motivated the development of a novel generation of wireless networks. Wireless sensor networks (WSN) are built on a set of miniaturized battery-powered devices (sensors) with communication capabilities and are expected to become highly integrated into our daily activities. This class of networks is perceived as an evolution of ad hoc networks with specific energy and computation limitations. Moreover, the increasing availability of low-cost multimedia devices (cameras, microphones, ...) has triggered the emergence of wireless multimedia sensor networks (WMSN) [1,2]. Given the diversity of their application domains, ranging from healthcare and intelligent patient monitoring to disaster relief and industrial process supervision, including intrusion detection and border protection, WMSN hold a promising future [3]. Notably, the volume and nature of the carried multimedia content, mainly images and/or video streams, impose severe requirements on the sensors' residual energy and on the available bandwidth.
Energy scarcity is one of the major limitations of WMSN: post-deployment replacement of the sensors' batteries is generally impractical or even impossible. Therefore, proper management of the residual energy is a crucial prerequisite for any large-scale WMSN deployment. Preserving the sensors' energy requires an optimal choice of transmit powers, an efficient mechanism for scavenging energy from the deployment environment, and an adequate topological placement of the sensors.
A variety of topologies have been proposed for WMSN deployment, notably single-tier flat, single-tier clustered and multi-tier (see Figure 1). Introducing hierarchy in WMSN brings benefits at several levels. Since sensors forward only packets produced within their own cluster, the communication overhead is reduced and, consequently, the network lifetime is prolonged. In addition, resource-rich devices (multimedia processing hubs) perform heavy computations and aggregate the data reported by their cluster's sensors, thereby reducing the energy consumed in relaying redundant data.
Energy harvesting [4][5][6][7] in the context of WSN has received increasing attention from the research community. Enabling sensors to replenish their energy reserves extends the WMSN's deployment lifetime and enlarges their application domains. Energy harvesting sources are varied and include solar, wind and vibration sources. In this work, we focus on sensors equipped with photovoltaic cells, which convert solar energy into the electric energy needed to recharge their batteries.
Energy consumption optimization remains a crucial issue even for energy harvesting WMSN. The harvested energy should be exploited optimally to cope with the periodicity (day/night cycles) and unpredictability (periods of wind activity/inactivity) of energy sources. Since most of the energy is consumed at the transceiver level, a balance must be found between conflicting objectives: maximizing the achieved throughput while reducing energy consumption and thus extending the sensor's battery lifetime. Optimal energy management policies for energy harvesting sensors are considered in [8], where the discounted throughput is maximized over an infinite horizon and data queuing is also taken into account. In [9], the authors consider a binary power control problem: at each slot, the wireless device either transmits at a constant power or remains silent. Only the single-user case is considered, and the optimal transmission policy is shown to be of threshold type for both the soft and strict delay constraint cases. The authors of [10] present a decentralized power control scheme under stochastic channel variations.
The proposed scheme considers a cost function that accounts for the QoS of each user and the interference caused to other users. The single-user optimal policy is generalized to the multi-user scenario for the ergodic regime, where the spreading factor and the number of users grow without bound while their ratio remains constant. In [11], the authors consider a single transmitter-receiver scenario in which the transmitter has a finite buffer, and solve the problem of dynamically assigning rates/powers to packets in order to minimize the long-term average transmission energy subject to an upper bound on the buffer overflow probability. The problem is formulated as a constrained Markov decision process (CMDP), and an analytical solution is derived and proved to be monotone in the queue length. The authors of [12] use an evolutionary game theoretic formulation to characterize the equilibrium policy for power allocation under channel uncertainty and delayed imperfect payoffs; a heterogeneous learning framework that accounts for user and technology specificities is proposed. The authors of [13] address the problem of network resource allocation for energy harvesting sensor platforms with time-varying battery recharging rates. They propose a joint approach that combines QuickFix, which computes the optimal sampling rate, and SnapIt, which adapts the sampling rate with the objective of maintaining the battery at a target level. The considered networks have a special directed acyclic graph (DAG) structure, and the choice of a given rate implies a specific transmit power. In [14], an energy harvesting body sensor network formed by sensors with a two-state energy harvester is considered. The authors develop policies based on the energy-error probability tradeoff to maximize the successful transmission probability while minimizing the probability of running out of energy.
The developed strategies exploit knowledge of the current energy level and of the processes governing event generation and battery recharge to select the appropriate transmission mode. Throughput-optimal energy allocation for energy harvesting systems in a time-constrained slotted setting is studied in [15]; the structural properties of the optimal power allocation policy are obtained through dynamic programming and convex optimization. The optimal use of the harvested energy for different energy profiles and storage capabilities is discussed in [16], with the outage probability as the performance metric. The authors develop a discrete-time Markov model of the evolution of the battery and transmission states and provide the optimal transmission strategy that minimizes the outage probability.
Our objective is to define an optimal energy management policy for the centralized dynamic power control problem in energy harvesting WMSN. At each time slot, the base station fixes the transmission powers of all the sensors so as to maximize the expected system throughput while minimizing energy consumption.
The contributions of our work are the following:
• We formalize the centralized power control problem in the context of hierarchical energy harvesting WMSN as a complete information MDP.
• We provide a stochastic model of the battery discharge/recharge process for energy harvesting WMSN.
• We consider a novel utility function that balances the sojourn in each battery state against the throughput achieved by a given transmit policy.
• We inspect the structural properties of centrally computed optimal transmit policies.
This article is organized as follows. In Section 2, we give a mathematical formulation of the dynamic power control problem in the context of energy harvesting WMSN. Optimal transmission policies are treated in Sections 3 and 4 for the single- and multiple-sensor scenarios, respectively. We present numerical simulation results in Section 5. Finally, we conclude the article and outline future work in Section 6.

System model
We consider a WMSN formed by a set of sensors $N = \{1, \dots, n\}$ under the authority of a single base station or gateway. Each sensor is equipped with its own battery and uses power to communicate with the base station either directly or through multimedia processing hubs. We discretize each sensor's battery capacity into several intervals, which gives a coarse-grained description of the battery state, e.g., full, medium, discharged. The sensors are equipped with photovoltaic cells that allow them to harvest solar energy while undergoing the discharge process. The residual energy of the battery dictates the available transmission powers, and the achieved throughput is affected by the transmit powers chosen by the other sensors. Time is discrete, and at each time slot $t$ each sensor knows its own battery state, whereas the battery states and actions of the other sensors remain unknown to it.
The formulated problem fits within the MDP framework with full information and an infinite planning horizon: the base station computes and provides each sensor with its optimal transmission strategy, given that it has access to all the sensors' information, i.e., environment, battery and radio channel status. To keep the base station aware of each sensor's battery level, we assume that time is divided into virtual slots, each encompassing several physical slots. The initial physical slots are allocated to the sensors, in a TDMA fashion, to report their battery levels to the base station. The remaining slots serve for data exchange, where interference may occur. We use three bits to code the energy level of each battery and can thus represent up to eight battery states.
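To make the report phase concrete, here is a minimal sketch, assuming one physical report slot per sensor at the start of each virtual slot and a plain binary encoding of the battery state index. The names `encode_report` and `collect_reports` are hypothetical and not part of the article; the sketch only shows that three bits cover eight states and that the base station recovers every sensor's state from its own TDMA slot.

```python
# Hypothetical sketch: each sensor reports its battery state (0..7) as a
# 3-bit field in its own TDMA report slot at the start of a virtual slot.

BITS_PER_STATE = 3
MAX_STATES = 2 ** BITS_PER_STATE  # 8 representable battery states


def encode_report(state: int) -> str:
    """Return the 3-bit field a sensor sends in its report slot."""
    if not 0 <= state < MAX_STATES:
        raise ValueError(f"battery state must be in [0, {MAX_STATES - 1}]")
    return format(state, f"0{BITS_PER_STATE}b")


def collect_reports(states: list[int]) -> dict[int, int]:
    """Simulate the report phase: sensor j transmits alone in slot j, so
    there is no interference, and the base station decodes each field."""
    received = {}
    for slot, state in enumerate(states):  # one physical slot per sensor
        bits = encode_report(state)
        received[slot] = int(bits, 2)      # base-station side decoding
    return received


if __name__ == "__main__":
    battery_states = [7, 4, 0, 2]           # example states for 4 sensors
    print(collect_reports(battery_states))  # {0: 7, 1: 4, 2: 0, 3: 2}
```

With $n$ sensors, the first $n$ physical slots of each virtual slot are consumed by these reports under this assumption, and the rest are left for data.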

Mathematical formulation
Denote by $w = \{w_t\}_{t \geq t_0}$ the sequence of environment states from time slot $t_0$ onward. The dynamic power allocation problem for a sensor $j$ can then be modeled by the MDP $\Omega = \{X_j, (A_j(w_t, s_j))_{s_j \in X_j, w}, q_j, \lambda\}$, where:
• $X_j = \{0, 1, 2, \dots, m-1\}$ is the finite set of states of sensor $j$'s battery. A state of the battery represents an interval of percentages of the remaining energy, and the energy level of the battery increases (respectively decreases) sequentially. Initially, the sensor's battery is in its highest state $m-1$; as sensors perform their assigned tasks (event detection, packet forwarding, ...), they consume energy and their battery levels decrease sequentially. The harvested solar energy is converted into electrical energy and increases the sensors' residual battery level sequentially.
• $\forall s_{j,t} \in X_j$, $A_j(w_t, s_{j,t}) = \{p_0, \dots, p_{s_{j,t}}\}$ is the finite set of transmission powers available to sensor $j$. This set satisfies $A_j(w_t, s_{j,t} - 1) \subset A_j(w_t, s_{j,t})$: more powers are available at higher states. Sensor $j$ decides on its transmit power $p^j_t \in A_j(w_t, s_{j,t})$ based on its remaining energy $s_{j,t}$ at time $t$.
• $q_j$ is the state transition probability of sensor $j$'s battery. Given the environment state $w_t$, the state $s_{j,t}$ of $j$'s battery, the state $c_{j,t}$ of the radio channel in the vicinity of $j$, and the actions of all the sensors $p_t = (p^1_t, \dots, p^n_t)$, the new states are $(s_{j,t+1}, c_{j,t+1})$ with probability $q_j(s_{j,t+1}, c_{j,t+1} \mid w_t, s_{j,t}, c_{j,t}, p_t)$.
• The discount factor $\lambda \in (0, 1)$ expresses, for a user, the decay of the gain value over time.
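Two ingredients of $\Omega$, the coarse battery states of $X_j$ and the nested action sets $A_j(w_t, s) = \{p_0, \dots, p_s\}$, can be sketched as follows. This assumes uniform percentage intervals for the states and uses hypothetical power levels with $p_0 = 0$ standing for silence; neither choice comes from the article, and the function names are illustrative.

```python
# Hypothetical sketch of two MDP ingredients: mapping the remaining energy
# (in %) to one of m battery states, and the nested action sets
# A_j(s) = {p_0, ..., p_s}. Uniform intervals and the power values are
# illustrative assumptions, not values from the article.


def battery_state(remaining_pct: float, m: int) -> int:
    """Map remaining energy in [0, 100] to a state in {0, ..., m-1};
    state m-1 is the highest (full) state, state 0 the empty one."""
    if not 0.0 <= remaining_pct <= 100.0:
        raise ValueError("remaining energy must be a percentage")
    return min(int(remaining_pct / 100.0 * m), m - 1)


def action_set(state: int, powers: list[float]) -> list[float]:
    """A_j(s) = {p_0, ..., p_s}: higher states unlock higher powers."""
    return powers[: state + 1]


if __name__ == "__main__":
    m = 5
    powers = [0.0, 0.1, 0.2, 0.5, 1.0]  # hypothetical levels; p_0 = silence
    for pct in (100.0, 65.0, 10.0, 0.0):
        s = battery_state(pct, m)
        print(pct, s, action_set(s, powers))
    # The nesting A_j(s - 1) ⊂ A_j(s) holds for every s >= 1.
    assert all(set(action_set(s - 1, powers)) < set(action_set(s, powers))
               for s in range(1, m))
```

Keeping the powers in one sorted list makes the nesting property hold by construction, since each action set is a prefix of the next.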
Let $h_t = (w_t, s_{1,t}, c_{1,t}, \dots, s_{n,t}, c_{n,t})$ be the state profile of the whole system at time $t$. The action profile of the system is $p_t = (p^1_t, \dots, p^n_t)$, where $p^j_t \in A_j(w_t, s_{j,t})$ for all $j \in N$. When $w_t = 0$, the SINR of sensor $j$ is null. For $w_t \neq 0$, the SINR of sensor $j$ is given by:
$$\mathrm{SINR}_j(h_t, p_t) = \frac{h_j(w_t, s_{j,t}, c_{j,t})\, p^j_t}{N_0 + \sum_{k \in N,\, k \neq j} h_k(w_t, s_{k,t}, c_{k,t})\, p^k_t},$$
where $h_j(w_t, s_{j,t}, c_{j,t})\, p^j_t$ represents the power received at the base station or the multimedia processing hub given that the battery of sensor $j$ is in state $s_{j,t}$ and the radio channel in its vicinity is in state $c_{j,t}$. Here $p^j_t$ is the power level chosen by $j$, $h_j(w_t, s_{j,t}, c_{j,t})$ is a function of the channel state and other exogenous characteristics, and $N_0$ is the noise variance. The throughput of sensor $j$ at time $t$ is $f_j(\mathrm{SINR}_j(h_t, p_t))$, where $f_j$ is an increasing function.
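In the rest of the article, we take $f_j$ to be the Shannon capacity [17] (normalized per unit bandwidth), and thus:
$$f_j(\mathrm{SINR}_j(h_t, p_t)) = \log_2\left(1 + \mathrm{SINR}_j(h_t, p_t)\right).$$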
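A minimal sketch of the SINR and of the Shannon-capacity throughput described above. Here `gains[k]` stands for $h_k(w_t, s_{k,t}, c_{k,t})$, every other sensor is assumed to interfere at the same receiver, and the numeric values in the usage part are hypothetical.

```python
import math


def sinr(j: int, gains: list[float], powers: list[float],
         n0: float, w: int = 1) -> float:
    """SINR of sensor j at the receiver.

    gains[k] plays the role of h_k(w_t, s_k,t, c_k,t) and powers[k] of p^k_t.
    As in the text, the SINR is null when the environment state w is 0.
    """
    if w == 0:
        return 0.0
    signal = gains[j] * powers[j]
    interference = sum(g * p for k, (g, p) in enumerate(zip(gains, powers))
                       if k != j)
    return signal / (n0 + interference)


def throughput(j: int, gains: list[float], powers: list[float],
               n0: float, w: int = 1) -> float:
    """f_j as the Shannon capacity per unit bandwidth: log2(1 + SINR_j)."""
    return math.log2(1.0 + sinr(j, gains, powers, n0, w))


if __name__ == "__main__":
    gains = [0.8, 0.5, 0.3]    # hypothetical channel gains
    powers = [0.2, 0.1, 0.1]   # hypothetical transmit powers
    n0 = 0.01                  # hypothetical noise variance
    for j in range(len(gains)):
        print(j, sinr(j, gains, powers, n0), throughput(j, gains, powers, n0))
    # With a single sensor there are no interferers and the SINR is the SNR.
    print(sinr(0, [0.8], [0.2], n0))
```

Raising one sensor's power increases its own SINR but lowers everyone else's through the interference sum, which is the coupling that the centralized controller has to arbitrate.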

Stochastic battery model for sensors with energy harvesting capabilities
The sensors harvest energy through a photovoltaic cell and use the scavenged power to recharge their batteries: the battery of sensor $j$ moves from state $s_{j,t}$ to state $s_{j,t}+1$ with probability $q_j(s_{j,t}+1 \mid s_{j,t}, p^j_t) = p_{\text{harvest}}$. The transmit power chosen by $j$ determines the transitions to the next state: when sensor $j$ selects a transmit power $p^j_t \in A_j(w_t, s_{j,t})$, the new state of the battery is $s_{j,t+1}$ with probability $q_j(s_{j,t+1} \mid s_{j,t}, p^j_t)$, where $q_j(s_{j,t+1} \mid s_{j,t}, p^j_t) = 0$ if $s_{j,t+1} \notin \{s_{j,t}+1, s_{j,t}, s_{j,t}-1\}$. If the energy harvesting process is frozen for a long period of time (due to cloudy weather, for example), state 0 is reached (the battery is completely empty) and the sensor is considered to be out of service. Figure 2 shows the state transition probabilities of sensor $j$'s battery.
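The battery dynamics can be sketched as a birth-death transition matrix under a stationary policy: only the adjacent states and the current state are reachable. The linear discharge law `discharge_prob` (non-decreasing in the power and capped so that harvest and discharge probabilities never sum above 1) and the option of making state 0 absorbing are assumptions made for illustration only; the article does not specify these values.

```python
# Hypothetical sketch of the battery kernel q_j(. | s, p): the battery moves
# up with probability p_harvest, down with a power-dependent probability,
# and otherwise stays. The discharge law below is an assumption.


def discharge_prob(p: float) -> float:
    """Hypothetical q(s-1 | s, p): non-decreasing in the transmit power p."""
    return min(0.6, 0.1 + 0.5 * p)


def transition_row(s: int, p: float, m: int, p_harvest: float,
                   absorbing_empty: bool = False) -> list[float]:
    """Return the row q_j(. | s, p) over states {0, ..., m-1}.

    Only s-1, s and s+1 can have positive probability; state m-1 cannot
    move up and state 0 cannot move down."""
    row = [0.0] * m
    if s == 0 and absorbing_empty:
        row[0] = 1.0  # optional reading of "out of service" as absorbing
        return row
    up = p_harvest if s < m - 1 else 0.0
    down = discharge_prob(p) if s > 0 else 0.0
    if up + down > 1.0:
        raise ValueError("harvest and discharge probabilities exceed 1")
    if s < m - 1:
        row[s + 1] = up
    if s > 0:
        row[s - 1] = down
    row[s] = 1.0 - up - down
    return row


def transition_matrix(policy: list[float], m: int,
                      p_harvest: float) -> list[list[float]]:
    """Stack the rows for a stationary policy (policy[s] = chosen power)."""
    return [transition_row(s, policy[s], m, p_harvest) for s in range(m)]


if __name__ == "__main__":
    m, p_harvest = 5, 0.3
    policy = [0.0, 0.1, 0.2, 0.5, 1.0]  # hypothetical: highest power per state
    for s, row in enumerate(transition_matrix(policy, m, p_harvest)):
        print(s, [round(x, 3) for x in row])
```

Because the discharge probability grows with the power, a policy that transmits loudly in high states shortens the time spent there, which is the tension the reward function has to balance.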
The probability of moving to the lower adjacent state increases with the energy consumption, i.e., $q_j(s_{j,t}-1 \mid s_{j,t}, p^j_t)$ is non-decreasing in $p^j_t$. At each time slot $t$, depending on the remaining energy of the battery, sensor $j$ decides on its transmit power $p^j_t \in A_j(w_t, s_{j,t})$. The sojourn time $T_j(l, d_j^\infty)$ in state $l$ under transmit policy $d_j^\infty$ can be expressed as:
$$T_j(l, d_j^\infty) = \sum_{k=1}^{\infty} k \left(\prod_{t=1}^{k-1} q_j(l \mid l, p^j_t)\right)\left(1 - q_j(l \mid l, p^j_k)\right),$$
where $p^j_t$ denotes the power used during the $t$-th slot spent in state $l$. When $p^j_t$ depends only on the state of the battery and not on time (stationary policy $d_j^\infty$ with decision rule $d_j$), the sojourn time becomes:
$$T_j(l, d_j^\infty) = \frac{1}{1 - q_j(l \mid l, d_j(l))}.$$

Optimal policy for a single sensor

When a single sensor is considered, interference vanishes from the SINR expression, which reduces to the signal-to-noise ratio (SNR):
$$\mathrm{SNR}_j(h_t, p^j_t) = \frac{h_j(w_t, s_{j,t}, c_{j,t})\, p^j_t}{N_0}.$$
We define the immediate reward $r_t(s_{j,t}, p^j_t)$ for choosing transmit power $p^j_t$ at slot $t$. The immediate reward reflects a balance between maximizing the expected sojourn in a given battery state and the throughput achieved with the chosen transmit power:

Let $\mathrm{ind}: X_j \to \mathbb{R}$ be a non-decreasing function on $X_j$. We consider transition probabilities that adhere to the following expression:

In the discounted total reward problem, gains in the first stages are more important than future ones. In particular, a gain acquired at time $n$ is assumed to have present value $\lambda^n r(s_j, p_j)$, where $0 < \lambda < 1$ is the discount factor. Under an infinite planning horizon, the linear programming formulation [18] is well suited to solving our MDP, owing to its elegant theory, the ease with which it accommodates constraints, and its facility for sensitivity analysis. We randomly choose a set of real constants $\{\alpha(s_j)\}_{s_j \in X_j}$ forming a probability distribution over the states of sensor $j$'s battery, so that $\sum_{s_j \in X_j} \alpha(s_j) = 1$. For every state $s_j$ and available action $p_j \in A_j(s_j)$, we also introduce the linear program variable $x(s_j, p_j) = \sum_{n=0}^{\infty} \lambda^n \Pr(s_{j,n} = s_j, p^j_n = p_j)$, which gives the expected discounted time the sensor's battery spends in state $s_j$ while decision $p_j$ is made.
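The following sketch, assuming a geometric stay-or-leave model in which the battery leaves state $l$ with probability $1 - q_j(l \mid l, p)$ at each slot, computes the expected sojourn time for both a time-varying and a stationary choice of powers, together with the discounted present value $\sum_n \lambda^n r_n$ of a reward sequence. The function names are hypothetical, and the time-varying series is truncated to the length of its input list.

```python
# Hypothetical helpers for the sojourn time and the discounted reward.


def sojourn_stationary(q_stay: float) -> float:
    """Expected number of slots spent in a state that is left with
    probability 1 - q_stay at every slot (geometric law)."""
    if not 0.0 <= q_stay < 1.0:
        raise ValueError("q_stay must lie in [0, 1)")
    return 1.0 / (1.0 - q_stay)


def sojourn_nonstationary(stay_probs: list[float]) -> float:
    """Truncated E[T] = sum_k k * (prod_{t<k} q_t) * (1 - q_k), where q_t is
    the stay probability at the t-th slot, which may change with the power."""
    expected, survive = 0.0, 1.0
    for k, q_t in enumerate(stay_probs, start=1):
        expected += k * survive * (1.0 - q_t)
        survive *= q_t  # probability of still being in the state after slot k
    return expected


def discounted_return(rewards: list[float], lam: float) -> float:
    """Present value sum_n lam^n r_n of a reward sequence, 0 < lam < 1."""
    return sum(lam ** n * r for n, r in enumerate(rewards))


if __name__ == "__main__":
    print(sojourn_stationary(0.75))                # 4.0
    print(sojourn_nonstationary([0.75] * 500))     # approaches 4.0
    print(discounted_return([1.0, 1.0, 1.0], 0.6))
```

Feeding a constant stay probability to the time-varying version recovers the stationary value, which is a quick consistency check between the two expressions.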
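The linear program equivalent to the $\lambda$-discounted MDP $\Omega$ is:
$$\max_{x} \sum_{s_j \in X_j} \sum_{p_j \in A_j(s_j)} r(s_j, p_j)\, x(s_j, p_j)$$
subject to
$$\sum_{p_j \in A_j(s'_j)} x(s'_j, p_j) - \lambda \sum_{s_j \in X_j} \sum_{p_j \in A_j(s_j)} q_j(s'_j \mid s_j, p_j)\, x(s_j, p_j) = \alpha(s'_j), \quad \forall s'_j \in X_j,$$
$$x(s_j, p_j) \geq 0, \quad \forall s_j \in X_j,\ p_j \in A_j(s_j).$$
The solution of this linear program (LP) is obtained by applying the simplex method. After solving the LP, we recover the optimal decision rules of the associated MDP by applying the rule [19]:
$$\Pr\{d^*(s_j) = p_j\} = \frac{x^*(s_j, p_j)}{\sum_{p'_j \in A_j(s_j)} x^*(s_j, p'_j)}.$$
• A discounted MDP always has a deterministic optimal decision rule [20], which we select based on the following criterion:
$$d^*(s_j) \in \arg\max_{p_j \in A_j(s_j)} x^*(s_j, p_j).$$
We argue that structured decision rules exist for the MDP $\Omega$.
Proposition 1. There exist optimal monotone nondecreasing decision rules on $X_j$ for the MDP $\Omega$.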
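The article solves the LP with the simplex method. As a dependency-free cross-check, the sketch below swaps in a different standard technique, value iteration, which reaches the same optimal values for a discounted MDP. It then computes the discounted occupation measure $x(s, d(s)) = \sum_n \lambda^n \Pr(s_n = s)$ of the resulting deterministic policy, which must satisfy the LP equality constraints and attain the same objective as $\sum_s \alpha(s) v(s)$ (strong duality). The kernel, the reward (throughput plus a sojourn bonus) and all numbers except $\lambda = 0.6$ are hypothetical stand-ins, not the article's model.

```python
import math

# Hypothetical single-sensor instance (not the article's Table 1 values).
LAM = 0.6                    # discount factor, as in the numerical section
POWERS = [0.0, 0.5, 1.0]     # A(s) = POWERS[: s + 1]; p_0 = 0 means silence
M = len(POWERS)              # battery states {0, ..., M - 1}
P_HARVEST = 0.3              # probability of moving up one state
N0 = 0.1                     # noise variance, unit channel gain assumed


def q(s_next: int, s: int, p: float) -> float:
    """Hypothetical birth-death kernel q(s_next | s, p)."""
    up = P_HARVEST if s < M - 1 else 0.0
    down = min(0.6, 0.1 + 0.5 * p) if s > 0 else 0.0
    if s_next == s + 1:
        return up
    if s_next == s - 1:
        return down
    if s_next == s:
        return 1.0 - up - down
    return 0.0


def reward(s: int, p: float) -> float:
    """Hypothetical r(s, p): Shannon throughput plus a bonus for the
    expected sojourn 1 / (1 - q(s | s, p)) in the current state."""
    if s == 0:
        return 0.0  # empty battery: out of service
    return math.log2(1.0 + p / N0) + 0.5 / (1.0 - q(s, s, p))


def q_value(s: int, p: float, v: list[float]) -> float:
    return reward(s, p) + LAM * sum(q(t, s, p) * v[t] for t in range(M))


def value_iteration(tol: float = 1e-10) -> list[float]:
    v = [0.0] * M
    while True:
        new_v = [max(q_value(s, p, v) for p in POWERS[: s + 1]) for s in range(M)]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def greedy_policy(v: list[float]) -> list[float]:
    return [max(POWERS[: s + 1], key=lambda p: q_value(s, p, v)) for s in range(M)]


def occupation(policy: list[float], alpha: list[float],
               tol: float = 1e-12) -> list[float]:
    """x(s, d(s)) for a deterministic policy (x is zero for other actions):
    fixed point of y(t) = alpha(t) + LAM * sum_s q(t | s, d(s)) y(s)."""
    y = list(alpha)
    while True:
        new_y = [alpha[t] + LAM * sum(y[s] * q(t, s, policy[s]) for s in range(M))
                 for t in range(M)]
        if max(abs(a - b) for a, b in zip(new_y, y)) < tol:
            return new_y
        y = new_y


if __name__ == "__main__":
    v = value_iteration()
    d = greedy_policy(v)
    alpha = [1.0 / M] * M                     # initial distribution alpha(s)
    x = occupation(d, alpha)
    print("policy d(s):", d)
    print("monotone nondecreasing:", all(a <= b for a, b in zip(d, d[1:])))
    # LP equality constraints: x(s') - LAM * sum_s q(s' | s, d(s)) x(s) = alpha(s')
    residual = max(abs(x[t] - LAM * sum(q(t, s, d[s]) * x[s] for s in range(M))
                       - alpha[t]) for t in range(M))
    print("constraint residual:", residual)
    # Strong duality: sum_s r(s, d(s)) x(s) equals sum_s alpha(s) v(s)
    print(sum(reward(s, d[s]) * x[s] for s in range(M)),
          sum(a * b for a, b in zip(alpha, v)))
```

The monotonicity printout is only an observation on this toy instance; whether the optimal rule is nondecreasing depends on the reward and kernel satisfying the conditions behind Proposition 1.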
The detailed proof of Proposition 1 is given below.
Proof. Let Q(k, l, p
base station or the multimedia processing hubs, the sensors' transmissions do not interfere. Therefore, the overall policy $(d_1^{\infty *}, \dots, d_n^{\infty *})$, realized when every sensor adopts its optimal transmit power, is the optimal transmit policy for the overall system, denoted $d^{\infty *}$.
In the general case, non-orthogonal codes are used and the sensors' transmissions interfere. The WMSN is modeled by the MDP $\Omega^+ = \{X, (A(w_t, S_t))_{S_t \in X, w}, Q, \lambda\}$ with full information. We extend the previously formulated mathematical model to account for multiple sensors and denote the augmented state and action spaces by $X = \prod_{k=1}^{n} X_k$ and $A = \prod_{k=1}^{n} A_k$, respectively. The transition probability from state $S_t$ to $S_{t+1}$ under the power profile $P_t$ is given by the formula: With each sensor's utility expressed as follows: The immediate reward for the network becomes: We reconsider the LP in (9) for the augmented MDP:
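A sketch of the augmented model: joint battery states and power profiles are Cartesian products of the per-sensor spaces, which is why the LP grows quickly with the number of sensors. The joint transition below multiplies per-sensor battery kernels, which assumes battery transitions are conditionally independent across sensors given the power profile; this product form, the kernel `make_kernel` and its parameters are illustrative assumptions and not necessarily the article's formula.

```python
from itertools import product


def make_kernel(m: int, p_harvest: float, slope: float):
    """Hypothetical per-sensor birth-death kernel q_k(s_next | s, p)."""
    def kernel(s_next: int, s: int, p: float) -> float:
        up = p_harvest if s < m - 1 else 0.0
        down = min(1.0 - p_harvest, slope * p) if s > 0 else 0.0
        if s_next == s + 1:
            return up
        if s_next == s - 1:
            return down
        if s_next == s:
            return 1.0 - up - down
        return 0.0
    return kernel


def joint_states(sizes: list[int]) -> list[tuple[int, ...]]:
    """X = X_1 x ... x X_n."""
    return list(product(*(range(m) for m in sizes)))


def joint_actions(S, powers_per_sensor):
    """Power profiles available in joint state S: A_1(s_1) x ... x A_n(s_n)."""
    return list(product(*(powers[: s + 1]
                          for s, powers in zip(S, powers_per_sensor))))


def joint_transition(S_next, S, P, kernels) -> float:
    """Assumed product form: Q(S_next | S, P) = prod_k q_k(s'_k | s_k, p_k)."""
    prob = 1.0
    for k, (s_next, s, p) in enumerate(zip(S_next, S, P)):
        prob *= kernels[k](s_next, s, p)
    return prob


if __name__ == "__main__":
    sizes = [3, 3, 3]                        # three sensors, states {0, 1, 2}
    powers = [[0.0, 0.5, 1.0]] * 3           # hypothetical power levels
    kernels = [make_kernel(3, 0.3, 0.5) for _ in sizes]
    X = joint_states(sizes)
    print(len(X))                            # 27
    S, P = (2, 1, 0), (1.0, 0.5, 0.0)
    print(len(joint_actions(S, powers)))     # 6
    total = sum(joint_transition(T, S, P, kernels) for T in X)
    print(round(total, 12))                  # 1.0
```

Even this small example has 27 joint states, so the number of LP variables grows exponentially with the number of sensors; that scaling is worth keeping in mind when moving beyond the three-sensor network.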

Numerical investigations
We discretize the residual energy capacity of each battery into five states: near full, high, medium, low and discharged. The state set is $\{0, 1, 2, 3, 4\}$, and the transmit power panel for each state is detailed in Table 1. Our objective is to characterize the optimal transmit policy for a single sensor with discount factor $\lambda = 0.6$.
We solve the associated LP with the simplex algorithm to obtain the optimal policy summarized below:

Table 2 describes the optimal transmit policy for a network of three sensors, each with three states $\{0, 1, 2\}$, for the $\lambda = 0.6$ discounted MDP $\Omega^+$ under an infinite planning horizon. We notice that sensors tend to use their lowest available transmission powers, as using higher ones results in reduced throughput due to interference and in rapid depletion of their batteries.

Concluding remarks
In this article, we considered the problem of dynamic centralized power allocation for energy harvesting WMSN. We focused on solar-powered sensors and provided a stochastic model of the associated battery discharge/recharge process. The dynamic power control problem was formulated as an MDP, and the structural properties of optimal transmission policies were established. In the near future, we plan to generalize our approach to the decentralized case with partial channel information using stochastic game theory.
Table 1: Transmit power panel