Optimal power allocation on discrete energy harvesting model

This paper studies the power allocation problem in energy harvesting systems with a finite battery. We adopt a discretized model of energy arrival and power allocation, so that the service process can be modeled as a finite-state Markov chain. Based on the discretized model, we analyze the stationary distribution of the Markov chain and formulate the utility maximization problem, which is then reformulated as a linear programming problem. By analyzing the linear program, we characterize the structure of the optimal power allocation policy and find the condition under which the greedy power allocation is optimal. Numerical simulations show the influence of the energy arrival process on the optimal power allocation policy, and the results are consistent with our analysis.


Introduction
With increasing CO₂ emissions from communication networks, realizing green communications has become an important research topic in academia. Besides designing energy-efficient protocols to reduce the energy consumption of conventional wireless systems, powering communication devices with renewable energy (e.g., solar or wind energy), namely, energy harvesting technology, opens a new path toward green communications by exploiting sustainable energy sources, and is hence a promising solution for environment-friendly communications. Recent developments in hardware have made energy harvesting feasible for modern communication systems. For instance, Ericsson has designed a wind-powered tower for wireless base stations [1]. However, due to the randomness of the energy arrival process, optimally allocating the harvested energy is a challenging issue.
In recent years, much research effort has focused on energy harvesting systems. For the additive white Gaussian noise (AWGN) channel, the problem of minimizing the transmission completion time with infinite battery capacity in a non-fading channel is studied in [2] for two scenarios: all packets are ready before transmission, and packets arrive during transmission. Tutuncuoglu [3] finds the optimal transmission policy that maximizes the short-term throughput with limited energy storage capacity and explores the relation between throughput maximization and transmission completion time minimization. For the fading channel, the authors of [4] propose the directional water-filling (WF) algorithm, which is proved to be throughput optimal for a greedy source. A similar result is obtained in [5], which further considers the optimal solution with causal information. The algorithm is then extended to the multiple-antenna scenario in [6], where spatial-temporal WF is proposed. Further, considering dynamic data arrivals with hybrid energy supply from harvesting and the power grid, [7] proposes the optimal reverse multi-stage WF policy. When the circuit power consumption is taken into account, a two-phase transmission policy is shown to be optimal [8]. In [9], the authors study the throughput maximization problem for the orthogonal relay channel with energy harvesting source and relay nodes under a deterministic model and show the structure of the optimal source and relay power allocation. Although these algorithms give insight into the optimal solution, they assume that the energy arrivals, the channel fading, and the data arrivals are all known before transmission, which is called the offline condition. Since offline solutions require accurate prediction of the system states, they are not always applicable in real communication systems.
Under the online condition, in which only the past and current system states are known, researchers have studied optimal and sub-optimal power allocation policies in some special scenarios. Sharma [10] identifies throughput-optimal and mean-delay-optimal energy management policies and shows that a greedy policy is optimal in the low-SNR regime with infinite battery capacity. A throughput maximization algorithm for point-to-point communications with causal information, based on the Markov decision process (MDP) [11] approach, is proposed in [12]. Recent work [13] studies the finite-horizon scheduling problem with discrete rates and proposes a low-complexity threshold-based policy. However, the properties of the optimal solution cannot be obtained directly via the MDP approach. In addition, the MDP approach suffers from very high computational complexity due to the curse of dimensionality and hence may not be applicable when the system state space grows large. From the information-theoretic perspective, [14] studies the channel capacity of energy harvesting links with finite battery capacity and proves that Markovian energy management policies are sufficient to achieve the capacity. Besides throughput maximization, other issues in energy harvesting systems, such as quality of service (QoS) and energy efficiency, have also been studied. Huang [15] studies the utility optimization problem in energy harvesting networks under a limited average network congestion constraint and develops a close-to-optimal algorithm based on Lyapunov optimization theory, which jointly manages power allocation and data access control. As renewable energy is usually distributed unevenly in space, some papers consider the energy cooperation problem of balancing the harvested energy across locations, including cellular network planning [16] and power grid energy saving [17], so that the overall system energy efficiency can be improved.
Nevertheless, given the dynamic nature of the energy harvesting process, how to allocate energy to achieve the optimal system performance in the general case remains an open question. It is desirable to find closed-form analytical solutions for the online condition that exploit the statistical characteristics of the energy harvesting process.
In this paper, we consider the power allocation problem in energy harvesting systems to achieve the optimal system utility. Specifically, we study a single link with a renewable-energy-powered transmitter that has only causal state information, including the distribution of the energy harvesting process and the past and current battery energy states. We model the energy arrival, storage, and usage with a discrete model and derive the optimal solution with closed-form expressions. The main contributions of this paper are as follows.
• We propose a discrete model for energy harvesting system analysis. On the one hand, digital equipment is widely used in modern communication systems, so a discrete model of the energy harvesting process is practical. On the other hand, the discrete model enables a Markovian analysis and yields interesting closed-form analytical solutions.
• For an independent and identically distributed (i.i.d.) energy arrival process, we show that the optimal solution can be obtained by solving a linear programming problem. Based on this formulation, we derive properties of the optimal power allocation policy and find the condition under which the greedy policy is optimal.
• Through extensive numerical simulations, we discuss the influence of the statistics of the energy arrival process on the optimal power allocation policy, and the results are consistent with our mathematical analysis.
The rest of the paper is organized as follows. Section 2 presents the system model. The problem is formulated in Section 3 and analyzed in Section 4. Numerical results are provided in Section 5 to validate the analysis. Finally, Section 6 concludes the paper.
Notations: Bold upper-case and lower-case letters denote matrices and vectors, respectively. $(\cdot)^T$ denotes the transpose of a matrix or a vector. $\mathbf{0}_{n\times m}$ and $\mathbf{1}_{n\times m}$ denote the $n \times m$ matrices with all elements equal to 0 and 1, respectively; if $n = m$, they are simplified to $\mathbf{0}_n$ and $\mathbf{1}_n$. $\mathbf{I}_n$ is the $n \times n$ identity matrix. $\mathbb{E}[\cdot]$ denotes the expectation operator.

System model
We consider a single-link, time-slotted wireless communication system with slot length $T_f$. The transmitter is powered by renewable energy, which is harvested from the environment and stored in a battery with finite capacity $B_{\max}$. The data source is assumed to be greedy, i.e., it always has data to transmit, so the design goal is to maximize the utility by using the harvested energy efficiently. The system model is illustrated in Figure 1. The utility is assumed to be a strictly concave and increasing function of the allocated transmit power. As the slot length is fixed, the utility can equivalently be viewed as a function of the energy $S_t$ used in slot $t$, denoted by $u(S_t)$. The transmit energy $S_t$ depends on the system state, i.e., the amount of energy stored in the battery in slot $t$, denoted by $E_t$.
The state transition between time slots is shown in Figure 2. At the beginning of slot $t$, the power allocation policy determines the amount of used energy $S_t$ based on the system state $E_t$ (red bar in the figure). Due to the energy causality constraint, the allocated energy cannot exceed the stored energy, i.e., $S_t \le E_t$. Denote by $\tilde{E}_t$ the transition state after power allocation and before energy harvesting. According to the power allocation result, the transition state is $\tilde{E}_t = E_t - S_t$. Then, the amount of energy $A_t$ harvested during slot $t$ is stored into the battery at the end of the slot. Since the battery energy cannot exceed the battery capacity, any surplus energy is wasted when the battery is full. As a result, the system state is updated according to

$$E_{t+1} = \min\{\tilde{E}_t + A_t,\ B_{\max}\} = \min\{E_t - S_t + A_t,\ B_{\max}\}. \tag{1}$$

In this paper, we consider a discrete system model, i.e., energy is discretized with unit $E$. In the discrete model, the battery capacity can be expressed as $B_{\max} = NE$, where $N$ is an integer. For ease of description, we omit $E$ and denote the system state as $E_t = n$, $n \in \{0, 1, \ldots, N\}$, which indicates that $nE$ amount of energy is stored in the battery. The energy arrival $A_t$ and the power allocation $S_t$ are also discretized with the unit energy $E$, as detailed in the following subsections.
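A minimal sketch of the discrete update (1), assuming every quantity is already expressed in units of $E$, so that $B_{\max}$ becomes $N$. The function name `next_state` is hypothetical and not taken from the paper.

```python
def next_state(e_t: int, s_t: int, a_t: int, n_max: int) -> int:
    """One slot of the discrete battery dynamics, everything in energy units.

    e_t   -- battery state E_t at the start of slot t (0..n_max)
    s_t   -- energy S_t used in slot t; causality requires 0 <= s_t <= e_t
    a_t   -- energy A_t harvested during slot t (non-negative integer)
    n_max -- battery capacity N = B_max / E
    """
    if not 0 <= s_t <= e_t:
        raise ValueError("energy causality violated: need 0 <= S_t <= E_t")
    e_tilde = e_t - s_t                # transition state after allocation
    return min(e_tilde + a_t, n_max)   # energy beyond capacity is wasted


# N = 10: using 2 of 5 units and harvesting 3 leaves 6 units in the battery;
# a large arrival is clipped at capacity, and the overflow is lost.
print(next_state(5, 2, 3, 10))   # 6
print(next_state(9, 1, 5, 10))   # 10
```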

Energy arrival model
The amount of energy arriving in each time slot is assumed to be i.i.d. and takes non-negative integer values. The distribution of the energy arrival is

$$\Pr\{A_t = k\} = h_k, \quad k = 0, 1, 2, \ldots, \tag{2}$$

where $h_k \ge 0$ and $\sum_k h_k = 1$. After harvesting the arrived energy, the system state transits from the transition state $\tilde{E}_t$ to the state of the next slot $E_{t+1}$. The state transition probability matrix due to the energy arrival and harvesting process can be expressed as an $(N+1)\times(N+1)$ matrix $\mathbf{H}$, namely, the harvesting matrix, whose element $H_{i,j}$ denotes the transition probability from $\tilde{E}_t = i$ to $E_{t+1} = j$ by harvesting $(j - i)E$ amount of energy. As the amount of harvested energy is non-negative, $\mathbf{H}$ is an upper triangular matrix. The elements of $\mathbf{H}$ are

$$H_{i,j} = \begin{cases} 0, & j < i, \\ h_{j-i}, & i \le j < N, \\ \sum_{k=N-i}^{\infty} h_k, & j = N. \end{cases} \tag{3}$$

Note that $j = N$ means that the battery is full. In this case, any arrival larger than $N - i$ leads to the same state $E_{t+1} = N$. Hence, the probability $H_{i,N}$ is the sum of the arrival probabilities over all $A_t \ge N - i$.

Figure 2 Discrete system state transition model. At the beginning of slot $t$, $S_t$ is used to transmit. At the end of the slot, $A_t$ arrives and is stored into the battery.
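The construction of $\mathbf{H}$ in (3) can be sketched as follows. This is an illustrative helper (the name `harvesting_matrix` is ours), assuming the arrival pmf is supplied as a finite list that sums to one.

```python
import numpy as np


def harvesting_matrix(h, N):
    """Harvesting matrix H of Eq. (3), indexed by (transition state, next state).

    h -- arrival pmf, h[k] = Pr{A_t = k}; assumed to sum to 1. Mass at
         indices >= N - i is lumped into the "battery full" column.
    N -- battery capacity in energy units (states 0..N).
    """
    h = np.asarray(h, dtype=float)
    H = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        for j in range(i, N):
            k = j - i                      # units harvested
            H[i, j] = h[k] if k < len(h) else 0.0
        H[i, N] = h[N - i:].sum()          # Pr{A_t >= N - i}: battery fills up
    return H


H = harvesting_matrix([0.5, 0.3, 0.2], N=3)
print(H)
# Each row sums to 1 and H is upper triangular; e.g. row 2 is [0, 0, 0.5, 0.5].
```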

Power allocation policies
Recall that the power allocation policies depend only on the current system state. Similar to the energy arrival model, we express the state transition probability matrix from $E_t$ to $\tilde{E}_t$ as a policy matrix $\mathbf{P}$, with elements $p_{i,j} \in \{0, 1\}$, $\forall i, j$, indicating the event of using $(i - j)$ units of energy in state $E_t = i$. The allocated power is then $(i - j)E/T_f$. Since the allocated energy is non-negative, all non-zero elements of the policy matrix $\mathbf{P}$ lie on or below the diagonal, i.e.,

$$p_{i,j} = 0, \quad \forall j > i. \tag{4}$$

Besides, a deterministic policy takes exactly one action in each state. Hence, there is exactly one non-zero element in each row:

$$\sum_{j=0}^{i} p_{i,j} = 1, \quad \forall i. \tag{5}$$

Note that $p_{i,j}$ is relaxed to take values between 0 and 1 for ease of mathematical formulation and theoretical analysis in the next section. However, our solution ultimately yields a deterministic optimal policy, which means that the relaxed problem is equivalent to the original problem. We discuss this issue in detail later.
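A deterministic policy is fully described by the number of units it uses in each state, so $\mathbf{P}$ can be built from an action vector. The helper below is a hypothetical sketch of (4) and (5); the example policies are illustrative only.

```python
import numpy as np


def policy_matrix(actions):
    """Deterministic policy matrix P (Eqs. (4)-(5)) from an action vector.

    actions[i] -- energy units used in state i, with 0 <= actions[i] <= i.
    Row i has a single 1 in column j = i - actions[i] (the transition state).
    """
    N = len(actions) - 1
    P = np.zeros((N + 1, N + 1))
    for i, a in enumerate(actions):
        if not 0 <= a <= i:
            raise ValueError(f"state {i}: action {a} violates 0 <= a <= {i}")
        P[i, i - a] = 1.0
    return P


greedy = policy_matrix(range(4))        # use everything: column 0 is all ones
lazy = policy_matrix([0, 0, 1, 1])      # a hypothetical non-greedy policy
print(greedy)
print(lazy)
```

Because each row holds a single one on or below the diagonal, both (4) and (5) hold by construction.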

Utility model
For any amount of allocated energy $S_t$, there is a corresponding utility $u(S_t)$. Since we consider the discrete energy model, the utility takes values from a finite set $\{u_0, u_1, \ldots, u_N\}$. Specifically, $u_k$ is the utility when $kE$ energy is allocated in a slot, i.e., $u_k = u(kE)$. As mentioned before, $u_k$ is assumed to be increasing and strictly concave in $k \ge 0$, with $u_0 = 0$. For instance, if the optimal channel coding scheme with randomly generated codes is adopted, the achievable rate is the channel capacity

$$u_k = \log_2\left(1 + \frac{kE}{T_f \sigma^2}\right), \tag{6}$$

where $\sigma^2$ is the noise power. This is Shannon's well-known capacity formula [18]. Other utility functions can also be used, as long as they satisfy the monotonicity and concavity properties.
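For the numerical examples, here is a sketch of the utility in (6) under the assumption that one energy unit yields the SNR $\gamma = E/(T_f\sigma^2)$. The code checks the monotonicity and strict concavity required above through first and second differences; the function name is ours.

```python
import numpy as np


def shannon_utility(N, gamma):
    """u_k = log2(1 + k * gamma) for k = 0..N.

    gamma is assumed to be the SNR delivered by one energy unit,
    gamma = E / (T_f * sigma^2), so k units give SNR k * gamma as in Eq. (6).
    """
    k = np.arange(N + 1)
    return np.log2(1.0 + k * gamma)


u = shannon_utility(10, 1.0)
gains = np.diff(u)                 # marginal utility u_{k+1} - u_k
print(u[:4])                       # log2(1), log2(2), log2(3), log2(4)
print(np.all(gains > 0), np.all(np.diff(gains) < 0))   # increasing, strictly concave
```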

Problem formulation
In this section, we formulate the utility maximization problem and transform it into a linear program based on the Markovian property. We also briefly discuss the linear programming formulation from the MDP point of view.

Utility maximization and Markov chain-based formulation
The objective of our problem is to maximize the average utility over a long time period, i.e.,

$$\max_{\{S_t\}} \ \lim_{T\to\infty} \frac{1}{T}\, \mathbb{E}\left[\sum_{t=1}^{T} u(S_t) \,\middle|\, E_1 = k_0\right], \tag{7}$$

where $k_0$ is the initial battery energy, and the allocated energy $S_t$ is determined by the battery energy state $E_t$. Note that the MDP approach [11] can be applied to this infinite-horizon average utility maximization with a finite number of states (there are $N+1$ battery energy states). However, the MDP approach usually suffers from the curse of dimensionality.
In addition, the MDP approach does not reveal the structure of the optimal policy, as it only outputs numerical results. To avoid these drawbacks and to analyze the optimal power allocation policy in detail, we address the problem with linear programming. Specifically, for a given power allocation policy, the battery energy state forms a Markov chain. We first have the following lemma.

Lemma 1. The optimal value of problem (7) does not depend on the initial state $k_0$.
Proof. The Markov chain with the battery energy as its state satisfies weak accessibility ([11], Def. 4.2.2), since for a given energy arrival distribution $\{h_0, h_1, \ldots\}$ and any states $i, j$, we can always find a stationary policy under which state $i$ is accessible from state $j$. According to ([11], Prop. 4.2.3), the optimal average utility is then the same for all initial states.
Lemma 1 is intuitive: since we consider the long-term average performance, the influence of the state at any specific time is negligible. Based on Lemma 1, we only need to consider the stationary behavior of the Markov chain. For a given power allocation policy $\mathbf{P}$, there always exists a stationary system state distribution $\boldsymbol{\pi} = [\pi_0, \pi_1, \ldots, \pi_N]^T$, with $\sum_{i=0}^{N}\pi_i = 1$ and $\pi_i \ge 0$, $\forall i$, that satisfies

$$\boldsymbol{\pi}^T \mathbf{P}\mathbf{H} = \boldsymbol{\pi}^T, \tag{8}$$

where $\pi_i$ is the probability that the battery energy is $i$, and $\mathbf{P}\mathbf{H}$ is the state transition probability matrix from state $E_t$ to state $E_{t+1}$. With the stationary distribution, the original problem (7) can be reformulated as

P-1:
$$\max_{\boldsymbol{\pi},\,\mathbf{P}} \ \sum_{i=0}^{N} \pi_i \sum_{j=0}^{i} p_{i,j}\, u_{i-j} \tag{9a}$$
$$\text{s.t.} \ \ \pi_j = \sum_{i=0}^{N}\sum_{k=0}^{i} \pi_i\, p_{i,k}\, H_{k,j}, \quad \forall j, \tag{9b}$$
$$\sum_{j=0}^{i} p_{i,j} = 1, \quad \forall i, \tag{9c}$$
$$\sum_{i=0}^{N} \pi_i = 1, \quad \pi_i \ge 0, \quad \forall i, \tag{9d}$$
$$p_{i,j} \in \{0, 1\}, \quad \forall i \ge j, \tag{9e}$$

where the harvesting matrix $\mathbf{H}$ and the utilities $u_k$, $k = 0, \ldots, N$, are given, and (9b) is the element-wise expansion of (8). Note that different power allocation policies lead to different stationary distributions; conversely, a change in the state distribution also changes the optimal power allocation policy. Hence, the unknown variables include both the power allocation policy $p_{i,j}$, $i \ge j$, and the stationary distribution $\{\pi_i\}$, which must be jointly optimized. As $p_{i,j} \in \{0, 1\}$, P-1 is a mixed-integer problem, which is difficult to solve. To make it tractable, we transform it into a linear programming problem in the rest of this section.
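To make (8) and the objective (9a) concrete for a fixed deterministic policy, the sketch below computes $\boldsymbol{\pi}$ from $\mathbf{PH}$ and the resulting average utility. It assumes the chain has a single recurrent class (otherwise (8) has several solutions); the function names are illustrative.

```python
import numpy as np


def harvesting_matrix(h, N):
    # Eq. (3): rows = transition state, columns = next state.
    h = np.asarray(h, dtype=float)
    H = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        for j in range(i, N):
            H[i, j] = h[j - i] if j - i < len(h) else 0.0
        H[i, N] = h[N - i:].sum()
    return H


def stationary_distribution(T):
    """Solve pi^T T = pi^T with sum(pi) = 1, i.e. Eq. (8) with T = P H.

    Uses least squares on the stacked system; the solution is unique when
    the chain has a single recurrent class.
    """
    n = T.shape[0]
    A = np.vstack([T.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def evaluate_policy(actions, h, u):
    """Stationary distribution and objective (9a) of a deterministic policy."""
    N = len(actions) - 1
    P = np.zeros((N + 1, N + 1))
    for i, a in enumerate(actions):
        P[i, i - a] = 1.0                  # state i -> transition state i - a
    pi = stationary_distribution(P @ harvesting_matrix(h, N))
    return pi, float(pi @ np.asarray(u)[np.asarray(actions)])


u = np.log2(1.0 + np.arange(4))            # gamma = 1, N = 3
pi, avg = evaluate_policy([0, 1, 2, 3], [0.5, 0.3, 0.2], u)   # greedy policy
print(pi, avg)
```

For the greedy policy every row of $\mathbf{PH}$ equals row 0 of $\mathbf{H}$, so here $\boldsymbol{\pi} = [0.5, 0.3, 0.2, 0]^T$ and the average utility is $0.3 + 0.2\log_2 3$.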

Problem reformulation with linear programming
We first relax the constraint (9e) and reformulate the problem as

P-1R: maximize the objective (9a) subject to (9b), (9c), (9d), and

$$0 \le p_{i,j} \le 1, \quad \forall i \ge j, \tag{10}$$

where $p_{i,j}$ becomes a continuous variable, which can be interpreted as a probabilistic power allocation policy. In the next section, we prove that the relaxation does not change the optimal solution. In other words, the optimal policy $p^*_{i,j}$ obtained by solving P-1R turns out to be integer-valued.
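Problem P-1R is not a convex optimization problem, since the constraint (9b) contains the products $\pi_i p_{i,j}$. To handle this, we multiply both sides of the constraints (9c) by $\pi_i$:

$$\pi_i \sum_{j=0}^{i} p_{i,j} = \pi_i, \quad \forall i. \tag{11}$$

When $\pi_i > 0$, (11) is equivalent to (9c). On the other hand, if $\pi_i = 0$, state $i$ does not appear in the stationary regime and has no influence on the total utility. Hence, by denoting $f_{i,j} = \pi_i p_{i,j}$, we obtain the following problem, equivalent to P-1R:

P-2:
$$\max_{\{f_{i,j}\},\,\{\pi_i\}} \ \sum_{i=0}^{N}\sum_{j=0}^{i} f_{i,j}\, u_{i-j} \tag{12a}$$
$$\text{s.t.} \ \ \pi_j = \sum_{i=0}^{N}\sum_{k=0}^{i} f_{i,k}\, H_{k,j}, \quad \forall j, \tag{12b}$$
$$\sum_{j=0}^{i} f_{i,j} = \pi_i, \quad \forall i, \tag{12c}$$
$$\sum_{i=0}^{N} \pi_i = 1, \tag{12d}$$
$$0 \le f_{i,j} \le \pi_i, \quad \forall i \ge j, \tag{12e}$$
$$\pi_i \ge 0, \quad \forall i. \tag{12f}$$

Problem P-2 is a linear program, since the objective function and all constraints are linear. After solving for the optimal $f_{i,j}$ and $\pi_i$, $p_{i,j}$ is obtained by

$$p_{i,j} = \frac{f_{i,j}}{\pi_i}, \quad \pi_i > 0. \tag{13}$$

If $\pi_i = 0$, any $p_{i,j}$ satisfying $\sum_{j=0}^{i} p_{i,j} = 1$ and $0 \le p_{i,j} \le 1$ is optimal, since state $i$ has no influence on the total utility. In fact, $f_{i,j}$ is the joint probability that the system state is $i$ and $i - j$ units of energy are used.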
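The variable $\pi_i$ does not appear in the objective function of P-2, and once $f_{i,j}$ is known, $\pi_i$ can be calculated via (12b) or (12c). Substituting $\pi_i = \sum_{k=0}^{i} f_{i,k}$ into P-2, the optimal $f_{i,j}$ can be obtained by solving

P-3:
$$\max_{\{f_{i,j}\}} \ \sum_{i=0}^{N}\sum_{j=0}^{i} f_{i,j}\, u_{i-j} \tag{14a}$$
$$\text{s.t.} \ \ \sum_{k=0}^{j} f_{j,k} = \sum_{i=0}^{N}\sum_{k=0}^{i} f_{i,k}\, H_{k,j}, \quad \forall j, \tag{14b}$$
$$\sum_{i=0}^{N}\sum_{j=0}^{i} f_{i,j} = 1, \tag{14c}$$
$$f_{i,j} \ge 0, \quad \forall i \ge j. \tag{14d}$$

The right-hand inequality of (12e) and the constraint (12f) are omitted in P-3, as they are guaranteed by (14c) and (14d). Further, problem P-3 can be expressed in matrix form, and it can be simplified further based on the following lemma.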
Based on Lemma 2, the number of equality constraints is reduced from $(N+2)$ to $(N+1)$. Note that the rank of $\hat{\mathbf{A}}$ is $(N+1)$, since its first $(N+1)$ columns form an $(N+1)\times(N+1)$ identity matrix, which means that the remaining equality constraints are linearly independent. We denote the resulting problem, which has the objective of P-3, the reduced equality constraints (23b), and the non-negativity constraints (23c), as P-4. In the rest of the paper, we focus on the solution of P-4. Once it is solved, the problem P-1R is also solved: the stationary distribution $\boldsymbol{\pi}$ can be calculated by (12b) or (12c), and the power allocation policy can be obtained by (13). In addition, the proof in the next section that a deterministic optimal solution exists guarantees the equivalence between P-4 and P-1. Hence, problem P-1 is also solved.
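As an illustration of the linear programming route, the sketch below solves P-3 with SciPy's `linprog` (HiGHS dual simplex, which returns a vertex, i.e., a basic feasible solution) and recovers a deterministic policy through (13). Since every row of $\mathbf{H}$ sums to one, the $N+1$ balance equations (14b) sum to zero, so one of them is dropped; we assume this is the redundancy that Lemma 2 removes, and the paper's matrix $\hat{\mathbf{A}}$ is not reproduced. The function name `solve_p3` and the example arrival pmf are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def harvesting_matrix(h, N):
    # Eq. (3): rows = transition state, columns = next state.
    h = np.asarray(h, dtype=float)
    H = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        for j in range(i, N):
            H[i, j] = h[j - i] if j - i < len(h) else 0.0
        H[i, N] = h[N - i:].sum()
    return H


def solve_p3(h, u, N):
    """Solve P-3 over f_{i,j} = Pr{state i, transition state j}, 0 <= j <= i <= N."""
    H = harvesting_matrix(h, N)
    pairs = [(i, j) for i in range(N + 1) for j in range(i + 1)]
    c = np.array([-u[i - j] for i, j in pairs])   # linprog minimizes, so negate (14a)
    A = np.zeros((N + 2, len(pairs)))
    for col, (i, j) in enumerate(pairs):
        A[i, col] += 1.0            # (14b) left side: mass in state i
        A[:N + 1, col] -= H[j]      # (14b) right side: mass flowing from j into each state
        A[N + 1, col] = 1.0         # (14c) normalization
    b = np.zeros(N + 2)
    b[N + 1] = 1.0
    # Because every row of H sums to 1, the N + 1 balance rows sum to zero;
    # drop one of them so that N + 1 equality constraints remain.
    A_eq, b_eq = np.delete(A, N, axis=0), np.delete(b, N)
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs-ds")
    if not res.success:
        raise RuntimeError(res.message)
    f = np.zeros((N + 1, N + 1))
    for col, (i, j) in enumerate(pairs):
        f[i, j] = res.x[col]
    pi = f.sum(axis=1)                               # (12c)
    # Eq. (13) for a vertex solution: one non-zero f_{i,j} per recurrent state.
    # States with pi_i = 0 are transient; greedy is used there (any action works).
    actions = [i - int(np.argmax(f[i, :i + 1])) if pi[i] > 1e-9 else i
               for i in range(N + 1)]
    return -res.fun, pi, actions, f


value, pi, actions, f = solve_p3([0.5, 0.3, 0.2], np.log2(1.0 + np.arange(4)), N=3)
print(value, actions)
```

Working in the $f_{i,j}$ variables keeps the problem linear; the policy is read off afterwards, exactly as (13) prescribes.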
Remark 3. The optimization problem P-4 can also be derived from MDP theory [11]. Specifically, for a given time index $t$, the system state $x_t \in \{0, \ldots, N\}$ is the battery energy state, and the action $a_t(x_t) \in \{0, \ldots, x_t\}$ is the allocated energy. For finite-state problems, there exists an optimal stationary policy, so we can omit the time index of $a_t$. The cost function $g(x_t, a(x_t)) = -u_{a(x_t)}$ is the negative utility, and the state transition probability is $p_{ij}(a(i)) = H_{i-a(i),j}$. We rewrite the average utility maximization problem as an average cost per slot minimization problem:

$$\min \ \lim_{T\to\infty} \frac{1}{T}\, \mathbb{E}\left[\sum_{t=1}^{T} g\big(x_t, a(x_t)\big)\right]. \tag{24}$$

The optimal policy satisfies Bellman's equation ([11], Prop. 4.2.1), which for problem (24) takes the form

$$\lambda + s(i) = \min_{a \in \{0, \ldots, i\}} \left[ g(i, a) + \sum_{j=0}^{N} p_{ij}(a)\, s(j) \right], \quad i = 0, \ldots, N, \tag{25}$$

with a scalar $\lambda$ and a vector $\mathbf{s}$. The optimal cost $\lambda^*$ can be determined by solving the following linear programming problem ([11], Sec. 4.5):

$$\max_{\lambda,\,\mathbf{s}} \ \lambda \quad \text{s.t.} \ \ \lambda + s(i) \le g(i, a) + \sum_{j=0}^{N} p_{ij}(a)\, s(j), \quad \forall i,\ a \in \{0, \ldots, i\}. \tag{26}$$

Applying linear programming duality ([19], Chap. 5), we obtain exactly the formulation P-3, and hence P-4. However, the Markov chain analysis, rather than MDP theory, gives the variable $f_{i,j}$ a clear physical meaning.
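As a cross-check of Remark 3, the Bellman equation (25) can also be solved numerically by relative value iteration, a standard MDP method that we swap in here for illustration; it is not part of the paper's method. The sketch works with utilities instead of costs and assumes $h_0 > 0$ (or a similarly well-behaved chain). Then every recurrent class contains a state that uses zero energy and therefore has a self-loop, which keeps the iteration from oscillating. The function name is hypothetical.

```python
import numpy as np


def harvesting_matrix(h, N):
    # Eq. (3): rows = transition state, columns = next state.
    h = np.asarray(h, dtype=float)
    H = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        for j in range(i, N):
            H[i, j] = h[j - i] if j - i < len(h) else 0.0
        H[i, N] = h[N - i:].sum()
    return H


def relative_value_iteration(h, u, N, tol=1e-10, max_iter=100_000):
    """Average-reward value iteration on Bellman's equation (25), in utility form.

    Maximizing utility is equivalent to minimizing g(i, a) = -u_a. State 0 is
    used as the reference state, so s(0) = 0 and lam estimates the optimal gain.
    """
    H = harvesting_matrix(h, N)

    def q_values(i, s):
        # utility of using a units in state i plus relative value of next state
        return [u[a] + H[i - a] @ s for a in range(i + 1)]

    s = np.zeros(N + 1)
    lam = 0.0
    for _ in range(max_iter):
        Ts = np.array([max(q_values(i, s)) for i in range(N + 1)])
        lam = Ts[0]
        s_new = Ts - lam
        if np.max(np.abs(s_new - s)) < tol:
            s = s_new
            break
        s = s_new
    actions = [int(np.argmax(q_values(i, s))) for i in range(N + 1)]
    return lam, actions


lam, actions = relative_value_iteration([0.5, 0.3, 0.2], np.log2(1.0 + np.arange(4)), N=3)
print(lam, actions)
```

The gain `lam` should match the optimal value of P-3 for the same inputs, which is one way to sanity-check a linear programming implementation.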
Remark 4. Solving the original problem P-1 directly requires an exhaustive search over all possible values of the parameters $p_{i,j} \in \{0, 1\}$, $\forall i \ge j$, which has exponential complexity. In contrast, the complexity of solving the linear programming problem P-4 depends on the numerical algorithm applied. The most popular algorithms for linear programming are the simplex algorithm and the interior point algorithm [19]. The simplex algorithm has exponential complexity in the worst case [20], although its running time on a specific problem varies from case to case; we show in the following section that in some cases the optimal solution can be found without iteration. On the other hand, the interior point algorithm has polynomial complexity. In summary, solving the proposed linear program is less complex than exhaustive search.
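The gap in Remark 4 can be quantified with a short count. State $i$ admits $i+1$ actions, so exhaustive search over P-1 examines $\prod_{i=0}^{N}(i+1) = (N+1)!$ deterministic policies, whereas P-4 has only $(N+1)(N+2)/2$ variables. For $N = 10$, as in the numerical section, this is $11! = 39\,916\,800$ policies versus 66 variables. A small sketch (function names are ours):

```python
from math import factorial


def num_deterministic_policies(N):
    # State i can use 0, 1, ..., i units: (i + 1) choices, independently per state,
    # so an exhaustive search over P-1 visits prod_{i=0}^{N} (i + 1) = (N + 1)! policies.
    return factorial(N + 1)


def num_lp_variables(N):
    # One variable f_{i,j} per pair 0 <= j <= i <= N, matching the all-ones
    # vector of length (N + 1)(N + 2)/2 in the normalization constraint.
    return (N + 1) * (N + 2) // 2


for N in (5, 10, 20):
    print(N, num_deterministic_policies(N), num_lp_variables(N))
```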

Optimal solution analysis
It is difficult to give an analytical solution for general $\mathbf{H}$ and $\mathbf{u}$. In this section, we derive some properties of the structure of the optimal power allocation policy and the condition under which a simple policy is optimal. We first present some general results about the optimal policy.

General properties
We first state the following property.

Proposition 1. (Feasibility)
The optimal solution f * for the problem P-4 exists.
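Proof. First, problem P-4 is feasible, as we can find at least one solution satisfying all the constraints. For instance, the point $\mathbf{f}_g$ given by

$$f_{i,0} = h_i,\ \ 0 \le i \le N-1; \qquad f_{N,0} = \sum_{k=N}^{\infty} h_k; \qquad f_{i,j} = 0,\ \ 1 \le j \le i, \tag{27}$$

is feasible for the problem. Second, the feasible region of the linear program is bounded, according to the constraints $\mathbf{f} \ge \mathbf{0}$ and $\mathbf{1}_{1\times \frac{1}{2}(N+1)(N+2)} \cdot \mathbf{f} = 1$. Since the objective is linear and the feasible region is non-empty and compact, the optimal solution exists.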
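The feasibility of the greedy point (27) can be checked numerically: under the greedy policy every transition state is 0, so the mass in state $i$ must equal entry $i$ of row 0 of $\mathbf{H}$. A sketch under that reading, with illustrative helper names and an example pmf:

```python
import numpy as np


def harvesting_matrix(h, N):
    # Eq. (3): rows = transition state, columns = next state.
    h = np.asarray(h, dtype=float)
    H = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        for j in range(i, N):
            H[i, j] = h[j - i] if j - i < len(h) else 0.0
        H[i, N] = h[N - i:].sum()
    return H


def greedy_point(h, N):
    """The point f_g of Eq. (27): all mass in column j = 0 (battery emptied)."""
    f = np.zeros((N + 1, N + 1))
    f[:, 0] = harvesting_matrix(h, N)[0]   # [h_0, ..., h_{N-1}, Pr{A_t >= N}]
    return f


h, N = [0.1, 0.2, 0.3, 0.4], 3
f = greedy_point(h, N)
H = harvesting_matrix(h, N)
state_mass = f.sum(axis=1)            # left side of (14b): pi_i
inflow = f.sum(axis=0) @ H            # right side of (14b)
print(np.allclose(state_mass, inflow), f.sum())   # balance check and total mass
```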
Proposition 1 tells us that the problem is feasible and that the existence of the optimal solution is guaranteed by the boundedness of the feasible region.

Lemma 3.
The optimal solution f * for the problem P-4 can be achieved by the vector with at most (N + 1) non-zero elements.
Proof. The optimal value of a linear program can be achieved by a basic feasible solution [19], whose number of non-zero elements is no more than the rank of $\hat{\mathbf{A}}$. Since the rank of $\hat{\mathbf{A}}$ is $(N+1)$, a basic feasible solution has at most $(N+1)$ non-zero elements.
The geometric interpretation of Lemma 3 is that the optimal solution of a linear program can always be found at a vertex (corresponding to a basic feasible solution) of the convex polyhedron defined by the constraints (23b) and (23c) [19]. In this sense, we only need to focus on basic feasible solutions, which have relatively few non-zero elements. However, it is not guaranteed that every optimal solution has at most $(N+1)$ non-zero elements. For instance, if two vertices are optimal, all convex combinations of the two vertices are also optimal.
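Based on Lemma 3, we can guarantee that a deterministic optimal policy can be obtained.

Proposition 2. There always exists an optimal power allocation policy with $p^*_{i,j} \in \{0, 1\}$, $\forall i, j$.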
Proof. First, consider the case $\pi^*_i > 0$, $\forall i$, i.e., all $(N+1)$ elements of $\boldsymbol{\pi}^*$ are non-zero. By Lemma 3, we can find an optimal solution $\mathbf{f}^*$ with at most $(N+1)$ non-zero elements. By (12c), $\pi^*_i = \sum_{j=0}^{i} f^*_{i,j} > 0$, so each of the $(N+1)$ groups $f^*_{i,0}, \ldots, f^*_{i,i}$ contains at least one non-zero element; hence, each group contains exactly one. Then, according to (13), $p^*_{i,j} \in \{0, 1\}$.
If $\pi^*_k = 0$ for some $k$, state $k$ is a transient state. We can remove all the elements related to $\pi^*_k$ from the problem formulation without changing the optimal solution. Specifically, in the original problem, removing the $k$-th row and $k$-th column of the matrix $\mathbf{PH}$ in (8), the $k$-th constraint in (9b), and the elements related to $\pi_k$ in (9a) and (9d) does not change the optimal solution. Following the same deduction as in the previous section, we obtain a lemma analogous to Lemma 3: the optimal solution can be achieved with at most $N$ non-zero elements. Similarly, we can prove that $p^*_{i,j} \in \{0, 1\}$, $\forall i \ne k$. Then, by setting $p^*_{k,0} = 1$, a deterministic optimal policy is obtained. The proof extends to the case where $\pi^*_k = 0$ for several states $k$.
Proposition 2 guarantees that a deterministic optimal power allocation policy can always be found. In fact, a deterministic optimal policy corresponds to an optimal basic feasible solution. Hence, in the sense of deterministic policies, the equivalence between P-4 and P-1, or in other words, between P-1R and P-1, is guaranteed. By finding an optimal basic feasible solution of P-4, we obtain a deterministic optimal power allocation policy for the original problem P-1. As discussed below Lemma 3, if two basic feasible solutions are both optimal, their convex combinations correspond to probabilistic optimal policies. In practice, a deterministic optimal policy is desirable and also sufficient.

Theorem 1. (Structure of optimal policy)
For the optimal power allocation policy $\mathbf{P}^*$, if $p^*_{i,j} > 0$ and $\pi^*_i > 0$, then for any $m$ that satisfies $\pi^*_m > 0$ and $m > i$, we have

$$p^*_{m,n} = 0, \quad \forall n < j. \tag{28}$$

Proof. See Appendix 1.
Theorem 1 shows that, for the optimal policy, if $k$ units of energy are allocated in state $i$, the amount of energy $k'$ allocated in the next state $i+1$ (when $\pi^*_{i+1} > 0$) increases by at most one unit, i.e., $k' - k \le 1$. Equivalently, the theorem can be stated as $p^*_{m,n} = 0$, $\forall n > j$, where $\pi^*_m > 0$ and $m < i$. Note that the condition $\pi^*_i > 0$ indicates that state $i$ is not a transient state. For a transient state, any power allocation is applicable, since it does not change the objective; hence, the conclusion does not hold for transient states.

Optimality of greedy policy
According to the feasible solution example (27), $p_{i,0} = 1$, $\forall i$, which is the greedy policy, i.e., all the available energy in the battery is used up in each slot. In general, the greedy policy is not optimal. However, the following theorem provides the condition under which it is optimal.

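Theorem 2. (Greedy optimal policy) If condition (29) holds, the optimal solution of P-4 is $\mathbf{f}^* = \mathbf{f}_g$, given by (27). The optimal power allocation is then the greedy policy, i.e., all the available energy is used up in every slot. The stationary distribution of the system state is $\boldsymbol{\pi} = [h_0, h_1, \ldots, h_{N-1}, \sum_{k=N}^{\infty} h_k]^T$, i.e., the first row of $\mathbf{H}$.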
To make the greedy optimality condition easier to interpret, we rewrite (29) as (30) in terms of the relative utility gain obtained by increasing the allocated energy by one unit in state $i$. Thus, for a given utility function, the condition requires that the sum of the energy arrival probabilities $h_0, \ldots, h_{N-1}$, weighted by the relative utility gains, is no more than 1. Since the relative utility gain is at least 1 and decreases in $i$, the condition holds when the tail probability of the energy arrival, $1 - \sum_{i=0}^{N-1} h_i$, is large enough and $h_i$ is relatively small for small $i$. Intuitively, such an energy arrival distribution means that the amount of arriving energy is large relative to the battery capacity. Hence, the optimal policy tends to use up all the available energy in the battery in every slot (i.e., the greedy policy). In this case, the emptied battery can store as much harvested energy as possible, which reduces the energy wasted due to battery overflow.
On the other hand, for a fixed energy arrival distribution $\{h_i\}$, we can examine the influence of the utility function on the optimality of the greedy policy. Specifically, we adopt Shannon's capacity (6) as the utility function. In the low signal-to-noise ratio (SNR) regime, i.e., when $kE/(T_f\sigma^2)$ is very small, we have the approximation

$$u_k = \log_2\left(1 + \frac{kE}{T_f\sigma^2}\right) \approx \frac{kE}{T_f\sigma^2 \ln 2},$$

which makes the relative utility gain approximately 1 for all $i$. Since $\{h_i\}$ is a probability distribution, $\sum_{i=0}^{N-1} h_i \le 1$, so (30) holds naturally in this low-SNR case. This can be explained as follows: when the channel condition is poor, the capacity grows linearly with the allocated power. Consequently, the greedy policy is optimal, as it obtains the same capacity from the same amount of available energy while also reducing the energy wasted due to battery overflow. A direct consequence of Theorem 2 is the following.

Corollary 1. If the energy arrival is uniformly distributed between 0 and $2\bar{h}$ with average arrival rate $\bar{h}$, the greedy policy is optimal when $\bar{h}$ satisfies the threshold condition obtained by substituting the uniform distribution into (29).

Proof. The energy arrival probability of the uniform distribution is

$$h_i = \frac{1}{2\bar{h}+1}, \quad i = 0, 1, \ldots, 2\bar{h}. \tag{34}$$

Substituting $h_i$ in (29) with (34), some algebra yields the condition on the average arrival rate.
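The low-SNR argument can be made explicit with a short derivation. It is a sketch under the first-order approximation above, not a proof taken from the paper, and it assumes a finite mean arrival $\bar{h}$; the overflow notation $O_t$ is introduced here for illustration only.

$$u_k \approx c\,k, \qquad c = \frac{E}{T_f\sigma^2 \ln 2}, \qquad\text{so}\qquad \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T} u(S_t) \approx c \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T} S_t = c\,\big(\bar{h} - \bar{O}\big),$$

where $O_t = (\tilde{E}_t + A_t - N)^+$ is the energy lost to overflow in slot $t$ and $\bar{O}$ is its long-run average. The last equality holds because $E_{t+1} = E_t - S_t + A_t - O_t$ and the battery level is bounded, so every harvested unit is either used or lost. Since $\tilde{E}_t \ge 0$ under any policy,

$$O_t = (\tilde{E}_t + A_t - N)^+ \ \ge\ (A_t - N)^+,$$

with equality in every slot under the greedy policy, for which $\tilde{E}_t = 0$. Hence the greedy policy minimizes the average overflow and, with it, maximizes the linearized average utility.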

Numerical results
We compute numerical examples of the optimal power allocation to demonstrate the structure and properties of the optimal policies. We define

$$\gamma = \frac{E}{T_f\sigma^2}$$

as the reference SNR with one unit of energy, and adopt Shannon's equation to calculate the utility, i.e.,

$$u_k = \log_2(1 + k\gamma).$$

Setting $N = 10$ and $\gamma = 1$ as an example, we examine the influence of the random energy arrival process on the optimal power allocation. Specifically, we consider several widely used distributions: the discrete uniform, geometric, Poisson, and binomial distributions.
For the geometric distribution, $h_k = p(1-p)^k$, $k = 0, 1, \ldots$, with mean $\bar{h} = (1-p)/p$. Denote by $\mathrm{Bin}(n, p)$ the binomial distribution with parameters $n$ and $p$. Its probability mass function is

$$h_k = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n,$$

and its mean is $\bar{h} = np$. Since $0 < p < 1$, we have $n > \bar{h}$. Note that, except for the binomial distribution, all the studied distributions have only one parameter. Tables 1, 2, and 3 provide the optimal power allocation policies for the uniform, Poisson, and geometric distributions with different mean values. The minimum average arrival rates for which the greedy policy is optimal differ across distributions ($\bar{h} = 13$ for the uniform distribution, $\bar{h} = 8$ for the Poisson distribution, and $\bar{h} = 23$ for the geometric distribution). This result is consistent with the discussion following Theorem 2. Specifically, if $h_i$ is an increasing function of $i$, the greedy policy is more easily optimal than when $h_i$ is decreasing. The Poisson probability mass function increases up to its mean, while the geometric one is strictly decreasing; consequently, a higher average energy arrival rate is needed for the greedy policy to be optimal under the geometric distribution.
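The four arrival models can be set up with `scipy.stats` as sketched below, parameterized by the mean $\bar{h}$ as in the tables. The parameter mapping (for example $p = 1/(1+\bar{h})$ for the geometric law, with the support shifted to start at 0 via `loc=-1`) is our assumption about how the distributions are matched; the helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def arrival_distributions(hbar, n_binom=None):
    """Frozen scipy.stats distributions for A_t (energy units) with mean hbar.

    uniform   -- {0, ..., 2*hbar}; hbar must be an integer
    poisson   -- mean hbar
    geometric -- h_k = p (1 - p)^k, k >= 0, with p = 1 / (1 + hbar)
    binomial  -- Bin(n, hbar / n); requires n_binom > hbar
    """
    dists = {
        "uniform": stats.randint(0, 2 * hbar + 1),
        "poisson": stats.poisson(hbar),
        # scipy's geom starts at 1; loc=-1 shifts the support to {0, 1, ...}
        "geometric": stats.geom(1.0 / (1.0 + hbar), loc=-1),
    }
    if n_binom is not None:
        dists["binomial"] = stats.binom(n_binom, hbar / n_binom)
    return dists


def first_row_of_H(dist, N):
    """[h_0, ..., h_{N-1}, Pr{A_t >= N}], i.e. row 0 of the harvesting matrix."""
    k = np.arange(N)
    return np.append(dist.pmf(k), dist.sf(N - 1))   # sf(N - 1) = Pr{A_t > N - 1}


for name, d in arrival_distributions(3, n_binom=10).items():
    print(name, d.mean(), np.round(first_row_of_H(d, 10), 3))
```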
Another observation from these results is that, unlike the infinite battery capacity case, where the optimal power does not exceed the average energy arrival rate ([10], Theorem 1), the optimal power exceeds the average energy arrival rate in some cases. Since the battery capacity is finite, harvested energy may be wasted when the battery is full. Hence, more energy is used when the battery energy is close to its limit, in order to reduce energy wastage.
As the binomial distribution has two parameters, we provide detailed numerical results in Tables 4, 5, and 6. When the average arrival rate $\bar{h} \le 7$, the greedy policy is never optimal, whereas for $\bar{h} = 8$, the greedy policy is optimal when $n \ge 11$. In addition, when $\bar{h} \ge 9$, the greedy policy is optimal for all feasible values $n > \bar{h}$. Since the binomial $h_i$ shares the feature of the Poisson distribution that it first increases and then decreases, the threshold on $\bar{h}$ for the greedy policy to be optimal is relatively low. Finally, Tables 7 and 8 compare the optimal policies for the considered distributions in the high and low SNR regimes, respectively. In the low SNR regime, the optimal policy for every distribution is the greedy policy, whereas this is not true in the high SNR regime. Besides, at high SNR, the policies for the Poisson and binomial distributions are closer to the greedy policy, which is consistent with the previous analysis.
Another interesting property can be seen from all the results: the optimal power allocation is a non-decreasing function of the battery energy state, and the increment between adjacent states is no more than 1, which is consistent with Theorem 1. The non-decreasing property can be explained as follows. The more energy is available, the more power is allocated, in order not only to achieve a higher data rate but also to reduce energy wastage due to battery overflow. As the battery capacity tends to infinity, the non-decreasing property still holds, as shown in [10].
We also run simulations to evaluate the performance of the optimal power allocation policy. For comparison, we consider the following two baseline policies:
(1) Greedy power allocation policy, which allocates all the available energy to data transmission.
(2) Constant power allocation policy, which allocates an amount of energy equal to the average energy arrival rate. When this amount of energy is not available, the transmitter allocates all the available energy.
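Both baselines are stationary policies, so their long-run average utility can also be computed exactly from the stationary distribution of the battery Markov chain. The sketch below is a minimal illustration under assumptions not stated in the paper: the transmitter spends $p(b)$ units, then a Poisson arrival with integer mean `LAM` is added and the battery is clipped at `N` (an integer mean lets the constant policy spend exactly `LAM` units); the utility is $u(j) = \log_2(1 + \mathrm{SNR}\, j)$; and `N`, `LAM`, `SNR` are hypothetical. Power iteration converges here because a large enough arrival fills the battery from any state.

```python
import math

# Hypothetical parameters; LAM is an integer so the constant policy spends exactly LAM units.
N, LAM, SNR = 10, 3, 1.0


def u(j):
    """Assumed concave, increasing utility for j energy units."""
    return math.log2(1.0 + SNR * j)


def arrival_pmf(lam, n_cap):
    """Poisson P(E = e) for e < n_cap; arrivals >= n_cap all fill the battery."""
    pmf = [math.exp(-lam) * lam**e / math.factorial(e) for e in range(n_cap)]
    return pmf + [max(0.0, 1.0 - sum(pmf))]


def stationary(policy, pmf, n_cap, tol=1e-12, max_iter=100_000):
    """Power iteration on the battery chain b' = min(b - policy[b] + E, n_cap)."""
    n = n_cap + 1
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [0.0] * n
        for b in range(n):
            rest = b - policy[b]  # energy left after transmitting in state b
            for e, prob in enumerate(pmf):
                nxt[min(rest + e, n_cap)] += pi[b] * prob
        if max(abs(x - y) for x, y in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def average_utility(policy, pmf, n_cap):
    pi = stationary(policy, pmf, n_cap)
    return sum(pi[b] * u(policy[b]) for b in range(n_cap + 1))


greedy = list(range(N + 1))                     # spend all stored energy
constant = [min(LAM, b) for b in range(N + 1)]  # spend LAM units, or everything if fewer
pmf = arrival_pmf(LAM, N)
for name, pol in (("greedy", greedy), ("constant", constant)):
    print(name, round(average_utility(pol, pmf, N), 4))
```

Under the greedy policy the battery is emptied every slot, so its stationary distribution is simply the arrival distribution truncated at `N`, which gives a quick sanity check of the solver.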
The simulation results are shown in Figures 3, 4, 5, and 6. Under uniformly distributed energy arrivals, Figure 3 shows that the constant power policy performs close to the optimal in two extreme cases, i.e., $\bar{h} = 1$ and $\bar{h} \ge 8$, whereas, as shown before, the greedy policy performs close to the optimal when $\bar{h} \ge 13$. In addition, the constant policy always performs better than the greedy policy. For Poisson distributed energy arrivals (Figure 4), the greedy policy performs worse than the constant policy in the low energy arrival rate regime ($\bar{h} < 6$) but better in the high energy arrival rate regime ($6 \le \bar{h} < 10$), and both converge to the optimal when $\bar{h} \ge 10$. In Figure 5, the behavior of these policies under geometrically distributed energy arrivals is similar to the uniform scenario; however, the two baseline policies converge to the optimal much more slowly than in the uniform scenario. Finally, the performance under binomially distributed energy arrivals, illustrated in Figure 6, is similar to the Poisson case. To sum up, all the simulations show that the constant policy outperforms the greedy policy in the low energy arrival rate regime. However, gaps to the optimal solution remain for both baseline policies.
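A Monte Carlo counterpart of such simulations can be sketched as follows. It illustrates the mechanics under assumed conditions and is not the setup behind Figures 3 to 6: Poisson arrivals with mean `LAM`, battery dynamics in which the allocated energy is spent before the new arrival is added and the battery is clipped at `N`, utility $u(j) = \log_2(1 + \mathrm{SNR}\, j)$, and hypothetical parameter values. Poisson samples are drawn with Knuth's multiplication method to stay within the standard library.

```python
import math
import random

# Hypothetical parameters: capacity N, Poisson mean LAM, utility log2(1 + SNR * j).
N, LAM, SNR, SLOTS = 8, 3, 1.0, 100_000


def u(j):
    return math.log2(1.0 + SNR * j)


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate(policy, slots, seed=1):
    """Time-average utility: spend policy[b], then harvest a Poisson arrival and clip at N."""
    rng = random.Random(seed)
    b, total = 0, 0.0
    for _ in range(slots):
        p = policy[b]
        total += u(p)
        b = min(b - p + poisson(rng, LAM), N)
    return total / slots


greedy = list(range(N + 1))
constant = [min(LAM, b) for b in range(N + 1)]
# Closed form for greedy: the battery is emptied each slot, so the next state is min(E, N).
pois = [math.exp(-LAM) * LAM**e / math.factorial(e) for e in range(N)]
greedy_exact = sum(p * u(e) for e, p in enumerate(pois)) + (1 - sum(pois)) * u(N)
print("greedy   sim", round(simulate(greedy, SLOTS), 4), "exact", round(greedy_exact, 4))
print("constant sim", round(simulate(constant, SLOTS), 4))
```

Comparing the greedy estimate with its closed form is a cheap way to validate the simulator before trusting its output for other policies.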

Conclusions
In this paper, we analyzed the optimal power allocation policy under a discrete system model using Markov chain analysis. We proved that the problem can be solved via a linear programming approach and analyzed the properties of the optimal policy. For the greedy power allocation policy, we found a condition that guarantees its optimality. Numerical results show that under a finite battery capacity, the optimal policy is quite different from that under an infinite battery capacity. Specifically, unlike the infinite battery case, the energy allocated in each slot might be larger than the average arrival rate. Moreover, extensive simulations yielded results consistent with our analysis of the greedy optimality condition. Building on the intuition provided by the analysis of the discrete model, future work will extend our results to the continuous energy model and to wireless systems with multiple antennas or subcarriers.

Appendix 1 Proof of Theorem 1
We prove the theorem by contradiction. Notice that $\pi^*_i > 0$ means that state $i$ is not a transient state. Suppose for  Then, the objective value can be written as: Denote $c_i = u_0^T \hat{A}_i + u_i^T$. To guarantee the optimality of $u_0^T h_0$, since $f_i \ge 0$, all elements of $c_i$, $i = 1, \ldots, N$, must be non-positive. The $k$-th element of $c_i$ is as follows: where $i = 1, \ldots, N$ and $k = 0, \ldots, N - i$. Due to the concavity of the utility function $u_j$, we have: As a result, $c_{i,N-i}$ is the largest element for a given $i$. In addition, we have: where inequality (a) holds due to the concavity of $u_j$, (b) is derived from (29), and (c) holds since $h_j \ge 0$ and $u_j$ is increasing. Combining (58) and (59), we conclude that $c_{1,N-1}$ is the largest element. Since
$$c_{1,N-1} = \sum_{i=0}^{N-1} h_i \,(u_{i+1} - u_i) - (u_N - u_{N-1}) \le 0,$$
all elements of $c_i$, $i = 1, \ldots, N$, are non-positive, and hence the optimality of the greedy solution is proved.
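The final condition can be read as a one-step deviation test. What follows is offered as intuition rather than as part of the proof, and assumes that unused energy stays in the battery while a new arrival $E$ with distribution $h_i$ is added, capped at capacity $N$. Because the greedy policy empties the battery every slot, keeping one unit back at a full battery changes only the next slot's rate:

```latex
\underbrace{u_N - u_{N-1}}_{\text{rate given up now at state } N}
\quad \text{vs.} \quad
\underbrace{\mathbb{E}\!\left[u_{\min(E+1,N)} - u_{\min(E,N)}\right]}_{\text{rate gained in the next slot}}
= \sum_{i=0}^{N-1} h_i \,(u_{i+1} - u_i)
```

Arrivals with $E \ge N$ contribute nothing to the gain because the battery fills either way. The greedy policy survives this deviation exactly when the gain does not exceed the loss, i.e., when $c_{1,N-1} \le 0$. Since $u_j$ is concave, $u_i - u_{i-1} \ge u_N - u_{N-1}$ for every $i \le N$, so the full-battery state is where holding back one unit costs least, in line with the proof's finding that $c_{1,N-1}$ is the largest element.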