Optimal harvest-use-store policy for energy-harvesting wireless systems in frequency-selective fading channels

Recent advances in energy-harvesting (EH) technology have enabled wireless systems composed of rechargeable devices. In this paper, we analyze the problem of maximizing data transmission for point-to-point (P2P) wireless communication systems in which the transmitter is able to harvest energy from the ambient environment. For generality, we consider the EH optimization problem under a quasi-static frequency-selective fading channel. Our optimization also accounts for the energy storage loss of the battery; therefore, we adopt an efficient harvesting architecture, harvest-use-store (HUS), in which the harvested energy is prioritized for use in data transmission. To balance the energy stored in or extracted from the battery so as to maximize throughput under randomly arriving harvested energy, we first characterize the properties of the optimal policy, which imply a double-threshold structure of the solution, and then develop a dynamic programming (DP)-based double-layer optimal allocation policy. We further analyze the online setting. First, the optimal online policy is obtained using continuous-time stochastic dynamic programming. Then, building on the intuition from the optimal offline policy (i.e., the double-threshold structure), a heuristic online policy that is simple to implement is proposed. Numerical results validate the theoretical analysis, demonstrate superior performance over existing counterparts in the literature, and show that the proposed online policies closely track the optimal solution.

constraints, i.e., the sporadic arrival of harvested energy in limited amounts produces a totally different energy availability profile in each block. Thus, it is critical to reoptimize the transmission policy to adapt to the causality constraint imposed on the use of the harvested energy.
Recent works on optimizing the transmission policy of an energy-harvesting transmitter have drawn great attention [2][3][4][5][6][7][8]. Ozel et al. [2] introduced two related optimization problems in single-link fading channels: a) maximization of the data transmitted (or throughput) within a deadline T and b) minimization of the transmission time (or delay) needed to deliver B bits of data. Gong et al. [3] considered joint energy-harvesting and grid power supply, formulating the problem of minimizing grid power consumption while completing the required data transmission before a given deadline. Other communication scenarios with EH capability include the broadcast channel [4,5], the multiple access channel [6], and two-hop networks [7,8]. Furthermore, battery imperfections are also key factors in energy harvesting and have attracted research attention. Devillers and Gunduz [9] incorporated the influence of a constant leakage rate and battery degradation over time into the battery model. Tutuncuoglu and Yener [10] studied the data maximization problem under finite battery capacity constraints.
All these contributions assume flat fading channels; however, variations of the transmitter location within a dense urban wireless environment lead to constantly changing scattering scenarios, which in turn result in a varying channel law. Thus, in this paper, we consider a more general case, i.e., quasi-static frequency-selective fading channels. Moreover, we focus on a battery imperfection that has not been addressed in the above papers, namely energy loss during storage. We therefore adopt an energy-efficient harvesting strategy, harvest-use-store (HUS) [11], which gives the highest priority to usage, followed by storage. This contrasts with the harvest-store-use (HSU) strategy, which suffers a severe energy loss because all harvested energy is stored first. Specifically, we study the problem of throughput optimization for a P2P system over a finite number of blocks under constraints on the EH profile, quasi-static frequency-selective fading channels, and storage loss, and we propose a dynamic programming-based double-layer allocation policy with non-causal energy and fading information. We show that the optimal offline solution has a double-threshold structure. With this structural property in mind, we further extend the results to the causal case and present an optimal online policy and a heuristic one.
The remainder of the paper is organized as follows. We provide the system model and formulate the problem in Section 2. The optimal offline policy operating in HUS mode is obtained through a dynamic programming (DP)-based double-layer optimal allocation policy in Section 3, followed by online policies in Section 4. Numerical results are presented in Section 5 to compare our optimal offline solution with various existing energy-harvesting architectures. We also provide a thorough numerical study of the proposed online policies under various algorithms and compare them to the offline policy. Section 6 concludes the paper.

System model and problem formulation
Consider a point-to-point wireless communication system with an energy-harvesting transmitter equipped with a rechargeable battery that suffers storage loss, as depicted in Figure 1. The energy comes from the ambient environment and is harvested by the energy unit. Based on the harvested energy pattern and the energy loss during storage, the transmitter operates in HUS mode, meaning that the decision device should optimize the scheduling of the energy that is stored in or drawn from the battery for data transmission.
In our system, we focus on an N-block transmission that starts from block 1, as shown in Figure 2. $E_n$ units of energy arrive at the beginning of block $n$, and the time interval between two consecutive energy arrivals is defined as the transmission time $\ell_n$. We assume that the harvested energy increments and their arrival times are exactly known at the transmitter prior to the transmission (similar to [12]). Further, by the lemmas in [12], the transmit power must remain constant within each block, because the rate function (i.e., $r = \frac{1}{2}\log(1 + p)$) is concave in power. In what follows, we assume that the L-tap quasi-static frequency-selective fading channel in block $n$ follows the Turin model:

$$h_n(t) = \sum_{i=1}^{L} h_{n,i}\,\delta(t - \tau_{n,i}) \tag{1}$$

where $h_{n,i}$ and $\tau_{n,i}$ respectively denote the channel gain and the delay of the $i$th tap in block $n$. Under the time-invariant assumption, its discrete form is given by a Toeplitz matrix (i.e., each descending diagonal from left to right is constant) [13], which has a special eigenstructure $H = U^{\dagger} \Lambda U$. The matrix $\Lambda$ is diagonal, with diagonal entries defined by the Fourier transform of $h_n(t)$. Based on this, the transmission over a frequency-selective channel can be simplified to an M-subcarrier system by adding a cyclic prefix of length L, as shown in [13]. The $m$th channel component of block $n$, denoted $\tilde{h}_{n,m}$, is the Fourier transform of $h_n(t)$ evaluated at subcarrier $m$ (Equation 2). The transmission channel can now be viewed as a collection of parallel AWGN sub-channels, one for each subcarrier $m$, with fading gains $\tilde{h}_{n,m}$, $n = 1, \ldots, N$, $m = 1, \ldots, M$. We assume that $\tilde{h}_{n,m}$ remains unchanged during a transmission block, i.e., the block-fading model.
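The reduction from an L-tap channel to M parallel sub-channels can be checked numerically. The sketch below is only an illustration under an added assumption that is not stated in the text: the taps are spaced one sample apart, so the subcarrier responses are the M-point DFT of the zero-padded tap vector. The function names and tap values are hypothetical.

```python
import numpy as np

def subcarrier_gains(taps, num_subcarriers):
    """Power gains H_m = |h~_m|^2 of M parallel sub-channels from L channel taps.

    Assumption: taps are spaced one sample apart, so the frequency response
    on the subcarrier grid is the M-point DFT of the zero-padded tap vector.
    """
    taps = np.asarray(taps, dtype=complex)
    if taps.size > num_subcarriers:
        raise ValueError("need at least as many subcarriers as taps")
    return np.abs(np.fft.fft(taps, n=num_subcarriers)) ** 2

def circulant_channel(taps, size):
    """Circulant channel matrix seen after cyclic-prefix removal."""
    first_col = np.zeros(size, dtype=complex)
    first_col[: len(taps)] = taps
    # Column k is the first column cyclically shifted down by k positions.
    return np.column_stack([np.roll(first_col, k) for k in range(size)])

taps = [0.8, 0.5, 0.3]  # illustrative values, not taken from the paper
M = 8
gains = subcarrier_gains(taps, M)

# The DFT matrix F diagonalises the circulant matrix:
# F C F^{-1} = diag(DFT(taps)), i.e. one scalar gain per subcarrier.
F = np.fft.fft(np.eye(M))
diagonalised = F @ circulant_channel(taps, M) @ np.linalg.inv(F)
print(np.round(gains, 3))
```

The off-diagonal entries of `diagonalised` vanish up to rounding, which is why each subcarrier can be treated as an independent AWGN sub-channel with its own gain.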

Problem statement
The transmission rate in block $n$ is then the sum rate of all the sub-channels, given by the mutual information $I_n = \sum_m I_{n,m}$ in bits per symbol. In general, we assume that $I_{n,m}$ is concave and increasing in $p_{n,m}$, the power allocated to subcarrier $m$ of block $n$. Consider a complex Gaussian channel with average signal power constraint $p_{n,m}$, channel gain $H_{n,m} = |\tilde{h}_{n,m}|^2$, and unit noise power. The information-theoretically optimal channel coding scheme, which employs randomly generated codes, achieves the channel rate (as is well known [14]):

$$I_{n,m} = \log_2\left(1 + H_{n,m}\,p_{n,m}\right) \tag{3}$$

Hence, the total data transmission (i.e., throughput) over N blocks for the P2P wireless communication system is:

$$\sum_{n=1}^{N} \ell_n \sum_{m=1}^{M} I_{n,m} = \sum_{n=1}^{N} \ell_n \sum_{m=1}^{M} \log_2\left(1 + H_{n,m}\,p_{n,m}\right) \tag{4}$$

To proceed, we characterize the HUS strategy by the battery modes as follows:
(a) Charging: when $E_n > \sum_{m=1}^{M} p_{n,m}\ell_n$, the transmitter uses $\sum_{m=1}^{M} p_{n,m}\ell_n$ of energy directly from the energy unit, and the battery stores the excess energy $E_n - \sum_{m=1}^{M} p_{n,m}\ell_n$, which is denoted as $D_n$ for simplicity.
(b) Discharging: when $E_n < \sum_{m=1}^{M} p_{n,m}\ell_n$, the transmitter uses all the energy harvested in the current block, and the battery supplies the missing part $\sum_{m=1}^{M} p_{n,m}\ell_n - E_n$, which is correspondingly denoted as $-D_n$.
(c) Neutral: when $E_n = \sum_{m=1}^{M} p_{n,m}\ell_n$, the transmitter uses up all the harvested energy for transmission without any battery operation.
Note that under HUS mode with storage efficiency $0 < \eta_B < 1$, only a fraction (or none) of the harvested energy suffers storage loss. This is more energy efficient than HSU, in which all the harvested energy suffers the loss because it is always stored in the battery before its subsequent use. With the definition $[D_i]^+ = \max(0, D_i)$, the battery level at the end of block $n$ (i.e., the residual battery level), denoted by $B_n$, is given by:

$$B_n = \eta_B \sum_{i=1}^{n} [D_i]^+ - \sum_{i=1}^{n} [-D_i]^+ \tag{5}$$

where $\eta_B \sum_{i=1}^{n} [D_i]^+$ and $\sum_{i=1}^{n} [-D_i]^+$ represent, respectively, the energy stored in and taken out of the battery by the end of block $n$. For simplicity and ease of analysis, we assume that the initial battery level is zero (i.e., $B_0 = 0$) and that the battery has infinite capacity.
Thus, our throughput maximization problem over N transmission blocks can be expressed as:

$$\text{(P1)}: \quad \max_{p_{n,m} \ge 0} \; \sum_{n=1}^{N} \ell_n \sum_{m=1}^{M} \log_2\left(1 + H_{n,m}\,p_{n,m}\right) \tag{6}$$

$$\text{s.t.} \quad B_n \ge 0, \quad n = 1, \ldots, N-1 \tag{7}$$

$$B_N = 0 \tag{8}$$

where the battery level must not be negative at the end of each block, so that sufficient energy is available for data transmission. The equality constraint on $B_N$ in Equation 8 is natural, since otherwise we could always increase the transmission data rate by increasing $p_{n,m}$ without violating any of the constraints in Equation 7. Note that (P1) is not only a power allocation problem across blocks in the time domain but also a power allocation problem across the subcarriers within each block in the frequency domain. The former affects the latter.
Moreover, the non-linearity and non-differentiability of the constraints make (P1) more difficult to solve. Thus, we start with the properties of the solution, and then propose a DP-based double-layer allocation algorithm to solve the problem.
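The battery recursion under HUS can be traced with a few lines of code. The following is a minimal sketch, assuming that the per-block sum power (the sum of $p_{n,m}$ over subcarriers) is already given; the function name and the numbers are illustrative only.

```python
import numpy as np

def battery_levels(harvested, sum_power, durations, eta):
    """Residual battery level B_n after each block under HUS, with B_0 = 0.

    harvested : E_n, energy arriving at the start of block n
    sum_power : total transmit power of block n (summed over subcarriers)
    durations : block lengths l_n
    eta       : storage efficiency eta_B in (0, 1]
    """
    harvested = np.asarray(harvested, dtype=float)
    used = np.asarray(sum_power, dtype=float) * np.asarray(durations, dtype=float)
    surplus = harvested - used                   # D_n
    stored = eta * np.maximum(surplus, 0.0)      # eta_B [D_n]^+, charging blocks
    drawn = np.maximum(-surplus, 0.0)            # [-D_n]^+, discharging blocks
    return np.cumsum(stored - drawn)

# Illustrative numbers: block 1 charges, block 2 discharges, block 3 is neutral.
levels = battery_levels([4, 1, 2], [2, 2, 2], [1, 1, 1], eta=0.5)
print(levels)                                    # a schedule is feasible if no entry is negative
```

In this example two units of surplus energy are stored in block 1, but only one unit reaches the battery because of the storage efficiency of 0.5; that unit is exactly what block 2 draws, so the battery ends empty.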

Optimal offline policy for frequency selective fading channel
We first employ the Lagrangian technique to investigate the properties of the transmit power and thereby obtain intuitive insight into our optimization problem, and then introduce a dynamic programming algorithm to compute the solution.

Solvability and properties of the solution
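Theorem 1. The optimal solution to Problem (P1) has a double-threshold structure of the following form:

$$p_{k,j} = \begin{cases} \left[\xi_k - 1/H_{k,j}\right]^+, & D_k > 0 \\ \left[\rho_k - 1/H_{k,j}\right]^+, & D_k < 0 \\ \left[\sigma_k - 1/H_{k,j}\right]^+, & D_k = 0 \end{cases} \tag{9}$$

where $\xi_k$, $\rho_k$, and $\sigma_k$ are the water levels of a charging, discharging, and neutral block, respectively. In particular, the water level of a neutral block can be obtained by maximizing the rate of block $k$ under the total power constraint $\sum_{m=1}^{M} p_{k,m} = E_k/\ell_k$ (i.e., the harvesting power, derived from $D_k = 0$) across the sub-channels, using the traditional water-filling method.
Proof. The objective function (Equation 6) is a sum of log functions and is thus concave with respect to the power sequence. We can further show the convexity of the constraint set defined by Equation 7 by induction. As such, our throughput maximization problem has a unique solution, according to the theory of convex optimization. Define the Lagrangian function for any $\lambda_n \ge 0$ as:

$$\mathcal{L} = \sum_{n=1}^{N} \ell_n \sum_{m=1}^{M} \log_2\left(1 + H_{n,m}\,p_{n,m}\right) + \sum_{n=1}^{N} \lambda_n \sum_{i=1}^{n} \left(\eta_B [D_i]^+ - [-D_i]^+\right) \tag{10}$$

The constraint term of the Lagrangian in Equation 10 is, in essence, the summation of all the non-zero entries of a lower triangular matrix. To handle the non-linearity of the rectifier function, we write $[x]^+ = x\,(1 + \mathrm{sgn}(x))/2$. Differentiating $\mathcal{L}$ with respect to $p_{k,j}$, and recognizing that $d\,\mathrm{sgn}(x)/dx = 2\delta(x)$ and $x\delta(x) = 0$, we obtain:

$$\frac{\partial \mathcal{L}}{\partial p_{k,j}} = \frac{\ell_k}{\ln 2} \cdot \frac{H_{k,j}}{1 + H_{k,j}\,p_{k,j}} - \frac{\ell_k}{2}\left[(1 + \eta_B) - (1 - \eta_B)\,\mathrm{sgn}(D_k)\right] \sum_{n=k}^{N} \lambda_n \tag{11}$$

The Kuhn-Tucker condition for the optimality of a power allocation is as follows [14]:

$$\frac{\partial \mathcal{L}}{\partial p_{k,j}} \begin{cases} = 0, & p_{k,j} > 0 \\ \le 0, & p_{k,j} = 0 \end{cases} \tag{12}$$

which guarantees that the constraint $p_{k,j} \ge 0$ is satisfied. The optimal power allocation can therefore be described as:

$$p_{k,j} = \left[\frac{2}{\ln 2\,\left[(1 + \eta_B) - (1 - \eta_B)\,\mathrm{sgn}(D_k)\right] \sum_{n=k}^{N} \lambda_n} - \frac{1}{H_{k,j}}\right]^+ \tag{13}$$

To solve this equation for the optimal power, we identify three cases for the signum function ($D_k > 0$, $D_k < 0$, and $D_k = 0$), which lead to Equation 9. In particular, $D_k = 0$ means that all the harvested energy of block $k$ has been allocated to the same block; in this case the water level is fixed by the energy constraint. Based on the water-filling method in [13,14], the maximum throughput is obtained with the water level $\sigma_k$ determined by:

$$\sum_{j=1}^{M} \left[\sigma_k - \frac{1}{H_{k,j}}\right]^+ = \frac{E_k}{\ell_k} \tag{14}$$

The main result in Equation 9 provides a basis for investigating the properties of the optimal power allocation policy for the new energy-harvesting system. This theorem reveals further interesting properties of the optimal power allocation pattern, as summarized below. For ease of description, we first define some terms for subsequent use.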
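For a neutral block, the water level is fixed by the total power constraint alone, which is the classical water-filling problem. A minimal sketch follows, assuming strictly positive channel gains; the function name and the example numbers are illustrative and not taken from the paper.

```python
import numpy as np

def water_fill(gains, total_power):
    """Water-filling over parallel sub-channels with gains H_m > 0.

    Returns (powers, level) such that powers[m] = max(level - 1/H_m, 0)
    and the powers sum to total_power.
    """
    gains = np.asarray(gains, dtype=float)
    if total_power <= 0:
        return np.zeros_like(gains), 0.0
    floors = np.sort(1.0 / gains)                # channel levels, best channel first
    # Drop the worst channels one by one until the water level covers
    # every channel that is still active.
    for active in range(len(floors), 0, -1):
        level = (total_power + floors[:active].sum()) / active
        if level > floors[active - 1]:
            break
    powers = np.maximum(level - 1.0 / gains, 0.0)
    return powers, level

# Neutral block with harvesting power E_k / l_k = 1 over three subcarriers.
powers, level = water_fill([2.0, 1.0, 0.1], total_power=1.0)
print(powers, level)
```

Here the third subcarrier receives no power because its channel level, the reciprocal of its gain, lies above the water level.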

Definition 1. A block that hits the zero battery level is called a valley block or simply a valley. The blocks between any two closest valleys constitute a hill segment.
A hill segment starting from block $a$ and ending at block $s > a$ is denoted as HS(a, s), which means $B_{a-1} = B_s = 0$. Now we can state the main properties of our problem.

Property 1. The subcarriers in the same block possess the same water level.

Proof. In Equation 9, the water level depends only on the block index $k$ and not on the subcarrier index $j$; thus, the water level is the same for all subcarriers of a block.

Property 2. Within a hill segment (e.g., HS(a, s)), all the energy-charging blocks have the same water level, equal to $w^+$, and all the energy-discharging blocks likewise share a lower water level $w^-$, where:

$$w^+ = \frac{1}{\eta_B \ln 2 \sum_{n=s}^{N} \lambda_n}, \qquad w^- = \frac{1}{\ln 2 \sum_{n=s}^{N} \lambda_n}$$

Proof. Within HS(a, s), the battery level is strictly positive at the end of blocks $a, \ldots, s-1$, so complementary slackness gives $\lambda_n = 0$ for $a \le n < s$, and hence $\sum_{n=k}^{N} \lambda_n = \sum_{n=s}^{N} \lambda_n$ for every block $k$ in the segment. Thus, for $D_k > 0$, which corresponds to the battery state of energy charging, the power allocation for $j = 1, 2, \ldots, M$ is:

$$p_{k,j} = \left[w^+ - \frac{1}{H_{k,j}}\right]^+$$

Similarly, for $D_k < 0$, which corresponds to the battery state of energy discharging, we obtain:

$$p_{k,j} = \left[w^- - \frac{1}{H_{k,j}}\right]^+$$

Then, consider block $s$ with $B_s = 0$. The only possibility for $B_s = 0$ is that block $s$ is a discharging period, and hence $D_s < 0$. It follows from Equation 9 that:

$$p_{s,j} = \left[w^- - \frac{1}{H_{s,j}}\right]^+$$

In particular, when $D_k = 0$, which corresponds to the battery state of energy neutral, we obtain:

$$p_{k,j} = \left[w_0 - \frac{1}{H_{k,j}}\right]^+$$

Since $D_k = 0$ means that all the harvested energy of block $k$ has been allocated to the same block, the water level $w_0$ is determined by Equation 14. Since $0 \le \eta_B \le 1$, it follows that $w^+ \ge w_0 \ge w^-$.

Property 3. The water levels of charging blocks, though equal within a hill segment, are monotonically non-decreasing from one hill segment to the next. The same assertion is true for the discharging blocks.
Proof. Assume there are $K$ valley blocks, denoted $V_1, V_2, \ldots, V_K$, and let $w^+_{V_i}$ and $w^-_{V_i}$ denote the optimal water levels of the charging and discharging blocks within the hill segment HS($V_{i-1} + 1$, $V_i$), yielding:

$$w^+_{V_i} = \frac{1}{\eta_B \ln 2 \sum_{n=V_i}^{N} \lambda_n}, \qquad w^-_{V_i} = \frac{1}{\ln 2 \sum_{n=V_i}^{N} \lambda_n}$$

Since $\lambda_n \ge 0$ and $V_i > V_{i-1}$, the sum $\sum_{n=V_i}^{N} \lambda_n$ is non-increasing in $i$; thus, $w^+$ and $w^-$ are monotonically non-decreasing from one hill segment to the next.
These properties imply that:
1. Within a block, the power allocation is equivalent to traditional water-filling.
2. The water level is closely related to the battery mode (i.e., charging $D_k > 0$, discharging $D_k < 0$, neutral $D_k = 0$).
3. In order to maximize the total throughput, the power management policy should balance the tradeoff over the whole transmission process. Thus, energy may be transferred from the current block to future blocks to ensure the optimal benefit. Also, on account of the causality constraint, energy cannot be used before its arrival. These two aspects together explain the properties.
4. The main insight is that the optimal solution has a double-threshold structure, as shown in Theorem 1.
The above gives an intuitive understanding of our optimal solution; based on this, we introduce a dynamic programming algorithm and formulate a double-layer allocation problem to obtain the optimal solution, as described below.

Dynamic programming
In this subsection, we develop a DP approach for our throughput maximization problem. Recall that the battery status of block $n$ depends on its charging and discharging history up to time $n$. Specifically, it follows from Equation 5 that:

$$B_n = B_{n-1} + \alpha_n, \qquad \alpha_n = \eta_B [D_n]^+ - [-D_n]^+ \tag{15}$$

where $\alpha_n$ can be regarded as an operation on the battery (i.e., the battery mode), which fits the dynamic programming model (i.e., the basic discrete dynamic system) [15]. Equation 15 is a basic discrete-time dynamic system, which can be regarded as a first-order Markov process or, more accurately, a random walk model. In what follows, we introduce the first-layer allocation, which characterizes how to allocate the power within block $n$ to achieve the maximum throughput in the frequency domain:

$$C^{P_n}_{n,M} = \max_{p_{n,m} \ge 0} \; \ell_n \sum_{m=1}^{M} \log_2\left(1 + H_{n,m}\,p_{n,m}\right) \quad \text{s.t.} \quad \sum_{m=1}^{M} p_{n,m} = P_n \tag{16}$$

$$p_{n,m} = \left[w - \frac{1}{H_{n,m}}\right]^+ \tag{17}$$

where the water level $w$ is determined by the argument $D_n$ based on Equation 9. $P_n$ is the sum power allocated to block $n$ and is related to $\alpha_n$ through the mapping:

$$P_n = \begin{cases} \left(E_n - \alpha_n/\eta_B\right)/\ell_n, & \alpha_n > 0 \\ \left(E_n - \alpha_n\right)/\ell_n, & \alpha_n \le 0 \end{cases} \tag{18}$$

Now the benefit function depends on $\alpha_n$, and we can rewrite $C^{P_n}_{n,M}$ as $C^{\alpha_n}_{n,M}$. Therefore, maximizing the total benefit amounts to finding a sequence of battery operations $\alpha_1, \alpha_2, \ldots, \alpha_N$ for the N blocks, which leads to our second-layer allocation in the time domain:

$$\max_{\alpha_1, \ldots, \alpha_N} \; \sum_{n=1}^{N} C^{\alpha_n}_{n,M}$$

The second-layer allocation problem can be solved by dynamic programming, by recursively computing $J_N, \ldots, J_1$ based on Bellman's equation [15]:

$$J_N(B_{N-1}) = C^{-B_{N-1}}_{N,M} \tag{21}$$

$$J_n(B_{n-1}) = \max_{-B_{n-1} \le \alpha_n \le \eta_B E_n} \left\{ C^{\alpha_n}_{n,M} + J_{n+1}(B_n) \right\}, \quad n = 1, \ldots, N-1 \tag{22}$$

where $B_n$ is updated by Equation 15. Equation 22 denotes the optimal benefit of the last $N - n + 1$ blocks, which describes the tradeoff between the current reward $C^{\alpha_n}_{n,M}$ and the future reward $J_{n+1}(B_n)$. A battery operation is feasible if the energy constraints $-B_{n-1} \le \alpha_n \le \eta_B E_n$ are satisfied for all possible $B_{n-1}$, because during block $n$ the system can store at most $\eta_B E_n$ of energy and draw at most $B_{n-1}$ of energy from the battery.
Equation 21 denotes the optimal benefit for the last block, where the constraint in Equation 8 implies that the corresponding battery operation must be $\alpha_N = -B_{N-1}$ for optimality from any previous state $B_{N-1}$. We compute $J_n(B_{n-1})$ as well as the optimal battery operation policy $\alpha_n = \mu_n(B_{n-1})$ for every $B_{n-1}$, $n \in \{1, 2, \ldots, N\}$, where $\mu_n(B_{n-1})$ is a mapping function that maps the given $B_{n-1}$ to the optimal $\alpha_n$. The search procedure for the optimal battery operations $\alpha_1, \alpha_2, \ldots, \alpha_N$ is a dynamic program that starts with the last period and proceeds backward in time [15]; thus, we can obtain the optimal battery operation policy set $\mu^* = \{\mu^*_1, \ldots, \mu^*_N\}$. Then, given the initial battery level $B_0 = 0$, the optimal battery operations $\alpha^* = \{\alpha^*_1, \ldots, \alpha^*_N\}$ can be obtained. Through the mapping relation (Equation 18), our optimal policy can be determined. Based on this analysis, we give Algorithm 1 to find the optimal power allocation and to make the process clearer. It is interesting to note that the DP algorithm is similar, in principle, to the Viterbi algorithm except that the former is a backward operation; thus, our algorithm enjoys the same computational efficiency as the Viterbi algorithm. The optimality can be proved by applying Bellman's equation [15].

Algorithm 1 Optimal offline power allocation algorithm in HUS mode for energy-harvesting wireless systems
Initialization:
Block size N, subcarrier number M, energy arrival durations and amounts $\ell_n$, $E_n$, and channel information $H_n$ (defined in the first paragraph of Section 4), for all $n \in \{1, 2, \ldots, N\}$. Set $B_0 = 0$, $n = N$, and discretize each $0 \le B_n \le \eta_B \sum_{i=1}^{n} E_i$ into a sufficient number of levels to make the transition from state $B_n$ to a future state $B_{n+1}$ possible, provided that $(B_{n+1} - B_n)$ is a feasible input.

Iteration:
1: while n ≥ 1 do
2: Compute $J_n$ with Equations 21 and 22 for all discretized $B_{n-1}$;
3: n ← n − 1;
4: end while
Output: Optimal battery operation policy set $\mu^*$
5: n ← 1;
6: while n ≤ N do
7: $\alpha^*_n = \mu^*_n(B_{n-1})$; $B_n = B_{n-1} + \alpha^*_n$;
8: n ← n + 1;
9: end while
Output: The determined value of the optimal battery operation for each block, $\alpha^*$
10: n ← N; while n ≥ 1 do
11: Calculate the optimal sum power $P^*_n$ allocated to block n using the mapping between optimal control variables and allocated powers in Equation 18;
12: Solve the first-layer problem using Equations 16 and 17;
13: n ← n − 1;
14: end while
Output: The optimal power allocation $p_{n,m}$ for subcarrier m in block n;
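The backward and forward passes of the offline algorithm can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the battery level is discretized on a uniform grid with a hypothetical `step` parameter, the per-block reward is computed by water-filling, and all function names are invented for the example. The block repeats a small water-filling helper so that it runs on its own.

```python
import numpy as np

def water_fill(gains, total_power):
    """Water-filling power split across sub-channels with gains > 0."""
    if total_power <= 0:
        return np.zeros(len(gains))
    floors = np.sort(1.0 / gains)
    for active in range(len(floors), 0, -1):
        level = (total_power + floors[:active].sum()) / active
        if level > floors[active - 1]:
            break
    return np.maximum(level - 1.0 / gains, 0.0)

def block_reward(gains, duration, power):
    """First layer: best throughput of one block for a given sum power."""
    powers = water_fill(gains, power)
    return duration * np.sum(np.log2(1.0 + gains * powers))

def sum_power(alpha, energy, duration, eta):
    """Map a battery operation alpha to the sum power of the block.

    alpha > 0 stores alpha, which costs alpha / eta of harvested energy;
    alpha <= 0 draws -alpha from the battery on top of the harvested energy.
    """
    surplus = alpha / eta if alpha > 0 else alpha
    return (energy - surplus) / duration

def offline_dp(energies, durations, gains, eta, step):
    """Second layer: backward recursion over a discretized battery grid."""
    gains = np.asarray(gains, dtype=float)
    n_blocks = len(energies)
    n_levels = int(np.floor(eta * np.sum(energies) / step)) + 1
    grid = step * np.arange(n_levels)             # candidate battery levels
    value = np.full((n_blocks + 1, n_levels), -np.inf)
    value[n_blocks, 0] = 0.0                      # the battery must end empty
    choice = np.zeros((n_blocks, n_levels), dtype=int)
    for n in range(n_blocks - 1, -1, -1):         # last block first
        for i in range(n_levels):                 # battery level before block n
            for k in range(n_levels):             # battery level after block n
                alpha = grid[k] - grid[i]
                if alpha > eta * energies[n] + 1e-12 or np.isinf(value[n + 1, k]):
                    continue                      # infeasible or dead-end transition
                power = sum_power(alpha, energies[n], durations[n], eta)
                total = block_reward(gains[n], durations[n], power) + value[n + 1, k]
                if total > value[n, i]:
                    value[n, i], choice[n, i] = total, k
    # Forward pass from an empty battery recovers the per-subcarrier powers.
    powers, i = [], 0
    for n in range(n_blocks):
        k = choice[n, i]
        power = sum_power(grid[k] - grid[i], energies[n], durations[n], eta)
        powers.append(water_fill(gains[n], power))
        i = k
    return value[0, 0], np.array(powers)

# Two blocks, one subcarrier, lossless storage: the power is equalized.
best, alloc = offline_dp([4.0, 1.0], [1.0, 1.0], [[1.0], [1.0]], eta=1.0, step=0.5)
print(best, alloc.ravel())
```

The triple loop costs on the order of N times the square of the number of grid levels, so the grid step trades accuracy against run time; a finer grid approaches the continuous optimum.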

Online policy
In the previous section, we solved the maximization problem non-causally, which means that the realizations of the harvested energy and the channel must be known in advance in order to determine the optimal transmission power. However, such information may not be available in all circumstances. Thus, in this section, based on a benchmark solution as well as the insights provided in the last section, we analyze online scheduling under the assumption that the transmitter only knows the energy amount of the current block and the probability density functions of the harvested energy and the channel gains. We say that only causal information of the current block (i.e., the state $s_n$) is available, as future states are not known a priori. This allows us to model and treat the unpredictable nature of the wireless channel and the harvesting environment. Let the accumulated channel states be $H_n \triangleq \{H_{n,1}, H_{n,2}, \ldots, H_{n,M}\}$ and denote the state by $s_n = (H_n, E_n, B_{n-1})$, $n \in \{1, 2, \ldots, N\}$. We assume that the initial state $s_1 = (H_1, E_1, B_0)$ is always known at the transmitter.

Optimal online policy
The optimal solution decides the optimal battery operation $\alpha_n$ for block $n$. Hence, the objective now becomes maximizing the expected mutual information summed over a finite horizon of N blocks, by choosing a deterministic battery operation policy from the set $\pi = \{\alpha_n = \mu(s_n), \forall s_n, n = 1, 2, \ldots, N\}$ based on the state $s_n$. Then, by applying Equation 18, we obtain the optimal power allocation for each subcarrier in each block. This can be solved by dynamic programming with knowledge of the current block state only. The details are as follows. Given the initial state $s_1 = (H_1, E_1, B_0)$, the maximum throughput is given by $J_1(s_1)$, which can be obtained by recursively computing $J_N(s_N), \ldots, J_1(s_1)$ based on Bellman's equation [15]:

$$J_N(s_N) = C^{-B_{N-1}}_{N,M}$$

$$J_n(s_n) = \max_{-B_{n-1} \le \alpha_n \le \eta_B E_n} \left\{ C^{\alpha_n}_{n,M} + \bar{J}_{n+1}(s_n, \alpha_n) \right\}$$

for $n = 1, 2, \ldots, N-1$, where $\bar{J}_{n+1}(s_n, \alpha_n) = \mathbb{E}\left[J_{n+1}(s_{n+1}) \mid s_n, \alpha_n\right]$ is a function that takes the expectation over the distribution of the harvesting process and the fading process. The optimal battery operation policy is denoted as $\pi^* = \{\alpha^*_n = \mu^*_n(s_n), \forall s_n, n = 1, 2, \ldots, N\}$ and can be solved iteratively. Moreover, it is possible to further decrease the dimension of the problem to make it more tractable. For example, if the arrival process is Markovian or i.i.d., the past states do not provide any additional information about the process. The optimality can be proved by applying Bellman's equation [15].

Structure online policy
In this subsection, based on the structural properties of the offline optimal solution, we give an intuitive understanding of the optimal solution. During the hill segment HS(a, s), if $E_k/\ell_k \ge \sum_{j=1}^{M} \beta_{k,j}$ (i.e., charging), block $k \in \{a, \ldots, s\}$ is allocated with the water level $w^+$; if $E_k/\ell_k \le \sum_{j=1}^{M} \gamma_{k,j}$ (i.e., discharging), block $k$ is allocated with the water level $w^-$. Otherwise, that is, if $\sum_{j=1}^{M} \gamma_{k,j} < E_k/\ell_k < \sum_{j=1}^{M} \beta_{k,j}$ (i.e., neutral), block $k$ is allocated with the water level $w_0$, where $\beta_{k,j} = \left[w^+ - (H_{k,j})^{-1}\right]^+$ and $\gamma_{k,j} = \left[w^- - (H_{k,j})^{-1}\right]^+$. Hence, we propose a heuristic online policy as follows, based on the determination of $w^+$ and $w^-$. Note that, without loss of generality, we assume unit energy arrival durations (i.e., $\ell_k = 1$, $\forall k \in \{1, 2, \ldots, N\}$).

$$p_{k,j} = \begin{cases} \beta_{k,j}, & E_k \ge \sum_{j=1}^{M} \beta_{k,j} \\ \gamma_{k,j}, & E_k \le \sum_{j=1}^{M} \gamma_{k,j} \\ \left[w_0 - (H_{k,j})^{-1}\right]^+, & \text{otherwise} \end{cases}$$

where $w_0$ can be obtained by the traditional water-filling method with the power constraint $E_k$. Assuming that the distribution of the harvesting process is known as $f(p_E)$, we propose finding fixed water levels $w^+$ and $w^-$ that simultaneously satisfy:

$$\eta_B\, \mathbb{E}\left[\Big[E_k - \sum_{j=1}^{M} \beta_{k,j}\Big]^+\right] = \mathbb{E}\left[\Big[\sum_{j=1}^{M} \gamma_{k,j} - E_k\Big]^+\right] \tag{26}$$

$$w^- = \eta_B\, w^+ \tag{27}$$

Equation 26 provides long-term energy stability by ensuring that the expected energy stored in the battery equals the expected energy drawn from it. Equation 27 follows from Property 2. A simpler way to approximately determine the water levels $w^+$ and $w^-$ is given by Equation 28, where only knowledge of the current state is needed; this simplifies the computation because the distribution of the energy-harvesting process does not need to be known. For completeness, an implementation of the proposed online policy is given in Algorithm 2. Its output is the power allocation $p_{n,m}$ for each subcarrier in each block.
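One block of the double-threshold online rule can be sketched as below, taking the two water levels as given inputs and assuming unit block duration. The handling of a battery that cannot cover the full discharging demand is an assumption added for this example (the available energy is simply water-filled), and the function names and numbers are hypothetical. The block repeats a small water-filling helper so that it runs on its own.

```python
import numpy as np

def water_fill(gains, total_power):
    """Water-filling power split across sub-channels with gains > 0."""
    if total_power <= 0:
        return np.zeros(len(gains))
    floors = np.sort(1.0 / gains)
    for active in range(len(floors), 0, -1):
        level = (total_power + floors[:active].sum()) / active
        if level > floors[active - 1]:
            break
    return np.maximum(level - 1.0 / gains, 0.0)

def heuristic_step(gains, energy, battery, w_plus, w_minus, eta):
    """Allocate one unit-length block; returns (powers, new battery level)."""
    gains = np.asarray(gains, dtype=float)
    beta = np.maximum(w_plus - 1.0 / gains, 0.0)    # allocation at level w+
    gamma = np.maximum(w_minus - 1.0 / gains, 0.0)  # allocation at level w-
    if energy >= beta.sum():
        # Charging: transmit at w+ and store the surplus with efficiency eta.
        powers = beta
        battery += eta * (energy - beta.sum())
    elif energy <= gamma.sum():
        # Discharging: aim for w-, limited by what the battery can supply
        # (assumption: the reachable energy is water-filled when short).
        budget = min(gamma.sum(), energy + battery)
        powers = water_fill(gains, budget)
        battery -= budget - energy
    else:
        # Neutral: use exactly the harvested energy, battery untouched.
        powers = water_fill(gains, energy)
    return powers, battery

battery = 0.0
for energy in [5.0, 0.5, 1.5]:                      # illustrative arrivals
    powers, battery = heuristic_step([1.0, 0.5], energy, battery,
                                     w_plus=2.5, w_minus=2.0, eta=0.8)
    print(energy, powers, round(battery, 3))
```

The three arrivals in the example exercise the charging, discharging, and neutral branches in turn. Only the current state (gains, arrival, battery level) is needed at each step, which is what makes the rule usable online.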

Numerical results
In this section, we present the numerical results in order to demonstrate the performance of our offline and online policies.
We first give a pictorial view of the optimal offline power allocation strategy for our HUS harvesting system in Figure 3. The channel level, defined as the reciprocal of the channel gain, serves as the bottom of a vessel. Note that there are two hill segments, i.e., HS(1, 4) and HS(5, 8).
We can see that the water levels of the charging blocks are equal within a hill segment, and the same holds for the discharging blocks. Moreover, the water levels of each mode are non-decreasing from one hill segment to the next. In the second hill segment, note that $w^+$, $w^-$, and $w_0$ satisfy the relationship $w^+ \ge w_0 \ge w^-$. In particular, no power is allocated to the first subcarrier of block 1, the third subcarrier of block 3, and the second subcarrier of block 5, because the corresponding channel gain is so poor that the reciprocal of the channel gain exceeds the water level.
We compare the maximum throughput of our policy to various harvesting architectures in Figure 4, where each throughput point is obtained by averaging over 1,000 random harvested energy realizations in Rayleigh fading of unit power. We assume that there are a total of N = 6 blocks, for which the harvested energy varies independently from one block to another following the uniform distribution over the range [1, 8], denoted as U(1, 8). Each block has a random duration uniformly distributed as U(1, 4). We determine the HSU results by using the optimal power policy developed in [12] and taking into account the storage efficiency. HU (i.e., harvest-use) is the greedy policy, which immediately uses up the harvested energy without storage. It is observed from the figure that HUS mode always outperforms its counterparts, regardless of the storage efficiency. For very low storage efficiency (less than 0.4), the performance of HUS coincides with that of HU, implying that no energy is stored, as shown in the left subfigure. In particular, when $\eta_B = 1$, HSU achieves the same performance as HUS, as shown in the right subfigure. Figure 5 shows the average throughput achieved with the optimal offline policy and the online policies. It is observed that the proposed online policy, with $w^+$ and $w^-$ determined by either Equation 26 or Equation 28, performs well in comparison with the optimal online policy, while all remain notably close to the optimal offline upper bound despite the absence of non-causal harvesting and fading information.

Conclusions
In this paper, we analyzed the problem of maximizing the data transmission for energy-harvesting wireless communication systems operating in HUS mode over frequency-selective fading channels. We proposed a DP-based double-layer policy and analyzed the properties of the solution. It was shown that the optimal policy has a double-threshold structure. Based on this, we further provided an optimal online policy and a heuristic online one. Numerical results show that the proposed offline policy outperforms other offline strategies with different energy-harvesting architectures and that the proposed heuristic online policy performs notably well, closely tracking the optimal online policy.