We first use the Lagrangian technique to investigate the properties of the optimal transmit power and gain intuitive insight into our optimization problem, and then introduce a dynamic programming algorithm to compute the solution.
Solvability and properties of the solution
Theorem 1.
The optimal solution to Problem (P1) has a double-threshold structure of the following form:
$$ p_{k,j}^{*}=\left\{\begin{array}{ll} \left[\xi_{k} - (H_{k,j})^{-1}\right]^{+}, & D_{k}>0\\ \left[\rho_{k} - (H_{k,j})^{-1}\right]^{+}, & D_{k}<0\\ \left[\sigma_{k} - (H_{k,j})^{-1}\right]^{+}, & D_{k}=0 \end{array}\right. $$
((9))
where
$$\begin{array}{*{20}l} \xi_{k} = \left(\sum\limits_{n = k}^{N} {{\lambda_{n}}} {\eta_{B}}\right)^{-1},~\rho_{k} = \left(\sum\limits_{n = k}^{N} {{\lambda_{n}}}\right)^{-1} \end{array} $$
and \(\xi_{k}\), \(\rho_{k}\), and \(\sigma_{k}\) are the water levels of the charging, discharging, and neutral blocks, respectively. In particular, the water level of a neutral block is obtained by maximizing the rate of block k under the total power constraint \({\sum \nolimits }_{m = 1}^{{M}} {{p_{k,m}}}=E_{k}/\ell _{k}\) (i.e., the harvested power, which follows from \(D_{k}=0\)) across the subchannels, using the traditional water-filling method.
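As an illustration of Equation 9, the following minimal Python sketch evaluates the per-subchannel power of one block once the three water levels are known. The function name `block_power` and the numerical values are hypothetical and chosen only for illustration; the sketch assumes strictly positive gains \(H_{k,j}\) and does not compute the water levels themselves.

```python
import numpy as np

def block_power(H, D, xi, rho, sigma):
    """Per-subchannel power of one block, following Equation 9.

    H     : gains H_{k,j} of the M subchannels (assumed strictly positive)
    D     : D_k of the block (only its sign is used)
    xi    : water level if the block is charging    (D_k > 0)
    rho   : water level if the block is discharging (D_k < 0)
    sigma : water level if the block is neutral     (D_k = 0)
    """
    H = np.asarray(H, dtype=float)
    level = {1: xi, -1: rho, 0: sigma}[int(np.sign(D))]
    # [level - 1/H]^+ : subchannels whose inverse gain exceeds the level get no power
    return np.maximum(level - 1.0 / H, 0.0)

# Illustrative numbers only: three subchannels, charging block.
p = block_power([2.0, 1.0, 0.25], D=1.0, xi=1.5, rho=1.0, sigma=1.2)
print(p)  # [1.  0.5 0. ]
```

The weakest subchannel (inverse gain 4) lies above every water level used here and therefore receives no power in any battery mode.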
Proof.
The objective function (Equation 6) is a sum of log functions and is thus concave with respect to the power sequence. The convexity of the constraint set defined by Equation 7 can be shown by induction. As such, our throughput maximization problem has a unique solution, according to the theory of convex optimization. For notational simplicity, denote the Lagrangian function for any \(\lambda_{n} \ge 0\) by:
$$ {\cal L} = \mathbb{C} +\sum \limits_{n = 1}^{N} {\lambda_{n}}\left({{\eta_{B}}\sum \limits_{i = 1}^{n} {{\left[ {D_{i}} \right]}^ + } - \sum \limits_{i = 1}^{n} {{\left[ {-D_{i}} \right]}^ + }} \right) $$
((10))
The Lagrangian function in Equation 10 is, in essence, the summation of all the nonzero entries of a lower triangular matrix. Differentiating with respect to \(p_{k,j}\), we obtain:
$$\begin{array}{*{20}l} \frac{{\partial {\cal L}}}{{\partial {p_{k,j}}}} = \frac{{\ell_{k}}{H_{k,j}}}{{1 + {p_{k,j}}{H_{k,j}}}}+\sum\limits_{n = k}^{N} {{\lambda_{n}}} \frac{{d\left({{\eta_{B}}{{\left[ {{D_{k}}} \right]}^ + } - {{\left[ {-{D_{k}}} \right]}^ + }} \right)}}{{d{p_{k,j}}}} \end{array} $$
((11))
To handle the nonlinearity of the rectifier function \([D_{k}]^{+}\), we represent it in terms of the signum function as \({\left [ {{D_{k}}} \right ]^ + } = \left ({{D_{k}} + {D_{k}}{\text {sgn}} \left ({{D_{k}}} \right)} \right)/2\). The Kuhn-Tucker conditions for the optimality of a power allocation are as follows [14]:
$$ \frac{{\partial {\cal L}}}{{\partial {p_{k,j}}}} \left\{ \begin{array}{ll} = 0, & \text{if}~ {p_{k,j}} > 0 \\ \le 0, & \text{if}~ {p_{k,j}} = 0 \end{array} \right. $$
((12))
which guarantees that the constraint \(p_{k,j} \ge 0\) is satisfied. Recognizing that \({{d{\text {sgn}} \left (x \right)}}/{{dx}} = 2\delta \left (x \right)\) and \(x\delta(x)=0\), the optimal power allocation can be described as follows:
$$ {p_{k,j}} = \left[\left(\sum\limits_{n = k}^{N} \lambda_{n}\, \frac{\left({\eta_{B}} + 1\right) + \left({\eta_{B}} - 1\right){\text{sgn}}\left({D_{k}}\right)}{2}\right)^{-1} - \left(H_{k,j}\right)^{-1}\right]^{+} $$
((13))
To solve this equation for the optimal power, we identify three cases for the signum function, which lead to Equation 9. In particular, \(D_{k}=0\) means that all the harvested energy of block k is consumed within the same block. Based on the water-filling method in [13,14], the maximum throughput is then obtained with the water level \(\sigma_{k}\), which is determined by:
$$ \sum\limits_{j=1}^{M}\left[\sigma_{k} - (H_{k,j})^{-1}\right]^{+}=E_{k}/\ell_{k} $$
((14))
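Equation 14 defines \(\sigma_{k}\) only implicitly. One possible way to solve it numerically is bisection on the water level, sketched below in Python under the assumption that \(\sigma_{k}\) denotes the water level itself, as in Equation 9. The function name `water_level` and the fixed iteration count are illustrative choices, not part of the original algorithm.

```python
import numpy as np

def water_level(H, total_power, iters=200):
    """Bisection for the level w with sum_j max(w - 1/H_j, 0) = total_power.

    H           : subchannel gains of the block (assumed strictly positive)
    total_power : E_k / ell_k, the harvested power of a neutral block
    """
    inv = 1.0 / np.asarray(H, dtype=float)
    # At lo no subchannel is filled; at hi the best subchannel alone already
    # carries total_power, so the solution lies in [lo, hi].
    lo, hi = inv.min(), inv.min() + total_power
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.maximum(mid - inv, 0.0).sum() > total_power:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Illustrative numbers only: E_k = 3, ell_k = 2, so the power budget is 1.5.
H = np.array([2.0, 1.0, 0.25])
sigma = water_level(H, total_power=3.0 / 2.0)
p = np.maximum(sigma - 1.0 / H, 0.0)
print(round(sigma, 6), np.round(p, 6))  # 1.5 [1.  0.5 0. ]
```

Because the left-hand side of Equation 14 is continuous and nondecreasing in the water level, bisection converges to the level that meets the budget; a sort-based closed form would work equally well.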
In summary, the main result in Equation 9 provides a basis for investigating the properties of the optimal power allocation policy for the new energy-harvesting system. The theorem reveals further properties of the optimal power-allocation pattern, as summarized below. For ease of description, we first define some terms for subsequent use.
Definition 1.
A block that hits the zero battery level is called a valley block or simply a valley. The blocks between any two closest valleys constitute a hill segment.
A hill segment starting from block a and ending at block s>a is denoted as HS(a,s), which means \(B_{a-1}=B_{s}=0\). Now we can state the main properties of our problem.
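To make Definition 1 concrete, the following short Python sketch extracts the valleys and hill segments from a given battery trajectory. The helper name `hill_segments`, the tolerance, and the sample trajectory are hypothetical; the sketch assumes an initial level \(B_{0}=0\), so that the first segment also satisfies \(B_{a-1}=0\).

```python
def hill_segments(B, tol=1e-12):
    """Valleys and hill segments of a battery trajectory.

    B : battery levels [B_0, B_1, ..., B_N]; B[0] is the initial level
        (assumed to be zero here).
    Returns (valleys, segments) with 1-based block indices, where each
    segment (a, s) satisfies B[a-1] = B[s] = 0.
    """
    valleys = [n for n in range(1, len(B)) if abs(B[n]) <= tol]
    segments, start = [], 1
    for s in valleys:
        segments.append((start, s))  # a == s is the degenerate one-block case
        start = s + 1
    return valleys, segments

# Illustrative trajectory: the battery empties at blocks 3 and 5.
valleys, segments = hill_segments([0.0, 0.8, 0.3, 0.0, 0.5, 0.0])
print(valleys, segments)  # [3, 5] [(1, 3), (4, 5)]
```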
Property 1.
The subcarriers in the same block possess the same water level.
Proof.
In Equation 9, the water level depends only on the block index k and not on the subcarrier index j; hence, all subcarriers in the same block share the same water level.
Property 2.
Within a hill segment (e.g., HS(a,s)), all the energy-charging blocks have the same water level \(w^{+}\), and all the energy-discharging blocks have the same, lower water level \(w^{-}\), where:
$$\begin{array}{*{20}l} w^{+} & =\left(\sum\limits_{n = s}^{N} {{\lambda_{n}}} {\eta_{B}}\right)^{-1}=\xi_{s} \\ w^{-} & =\left(\sum\limits_{n = s}^{N} {{\lambda_{n}}}\right)^{-1}=\rho_{s} \end{array} $$
In particular, an energy-neutral block \(k\in[a,s]\) has the water level \(w^{0}\), which is determined by Equation 14. Here, we always have \(w^{+} \ge w^{0} \ge w^{-}\).
Proof.
Since \(B_{k} \neq 0\), we have \(\lambda_{k}=0\), \(\forall k\in[a,s-1]\), in accordance with the complementary slackness conditions \({\lambda _{k}}\left ({{\eta _{B}}\sum \nolimits _{i = 1}^{k} {{\left [ {D_{i}} \right ]}^ + } - \sum \nolimits _{i = 1}^{k} {{\left [ {-D_{i}} \right ]}^ + }} \right)=0\). Thus, for \(D_{k}>0\), which corresponds to the battery state of energy charging, the water level for \(j=1,2,\ldots,M\) is:
$$w_{k,j}=\left(\sum\limits_{n = k}^{N} {{\lambda_{n}}} {\eta_{B}}\right)^{-1}=\left(\sum\limits_{n = s}^{N} {{\lambda_{n}}} {\eta_{B}}\right)^{-1}=w^{+}. $$
Similarly, for \(D_{k}<0\), which corresponds to the battery state of energy discharging, we obtain:
$$w_{k,j}=\left(\sum\limits_{n = k}^{N} {{\lambda_{n}}}\right)^{-1}=\left(\sum\limits_{n = s}^{N} {{\lambda_{n}}}\right)^{-1}=w^{-}. $$
Then, consider block s with \(B_{s}=0\). The only possibility for \(B_{s}=0\) is that block s is a discharging period and hence \(D_{s}<0\). It follows from Equation 9 that:
$$w_{s,j}=\left(\sum\limits_{n = s}^{N} {{\lambda_{n}}}\right)^{-1}=w^{-}. $$
In particular, when \(D_{k}=0\), which corresponds to the energy-neutral battery state, we obtain:
$$ w_{k,j}=\left(\sum\limits_{n = k}^{N} {\lambda_{n}}\,\frac{\left({\eta_{B}} + 1\right) + \left({\eta_{B}} - 1\right){\text{sgn}}\left(0\right)}{2}\right)^{-1}=w^{0}=\sigma_{k} $$
Since \(D_{k}=0\) means that all the harvested energy of block k is consumed within the same block, the water level \(w^{0}\) is determined by Equation 14. Since \(1 \ge \eta_{B} \ge 0\), it follows that \(w^{+} \ge w^{0} \ge w^{-}\).
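The ordering of the three water levels can be made explicit by evaluating the bracket of Equation 13 for the three values of the signum function. The worked step below is a sketch that takes the expression for \(w^{0}\) above at face value and assumes the convention \(\text{sgn}(0)=0\), \(\eta_{B}>0\), and \(\Lambda := \sum\nolimits_{n=s}^{N}\lambda_{n}>0\), where \(\Lambda\) is shorthand introduced here only for this illustration:

$$\begin{aligned} c(D_{k}) &:= \frac{(\eta_{B}+1)+(\eta_{B}-1)\,\text{sgn}(D_{k})}{2} = \left\{\begin{array}{ll} \eta_{B}, & D_{k}>0\\ (\eta_{B}+1)/2, & D_{k}=0\\ 1, & D_{k}<0 \end{array}\right.\\ 0<\eta_{B}\le 1 \;&\Rightarrow\; \eta_{B}\le\frac{\eta_{B}+1}{2}\le 1 \;\Rightarrow\; \frac{1}{\eta_{B}\Lambda}\ge\frac{2}{(\eta_{B}+1)\Lambda}\ge\frac{1}{\Lambda} \end{aligned} $$

The three terms of the last chain are \(w^{+}\), \(w^{0}\), and \(w^{-}\), respectively, with equality throughout only when \(\eta_{B}=1\).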
Property 3.
The water levels of charging blocks, though equal within a hill segment, are monotonically nondecreasing from one hill segment to the next. The same assertion is true for the discharging blocks.
Proof.
Assuming there are M valley blocks, we denote these blocks as \(V_{1},V_{2},\ldots,V_{M}\), and denote by \(w^{+}_{V_{i}}\) and \(w^{-}_{V_{i}}\) the optimal water levels of the charging and discharging blocks, respectively, within the hill segment HS(\(V_{i-1}+1, V_{i}\)), yielding:
$$\begin{array}{*{20}l} w^{+}_{V_{i}} & =\left(\sum\limits_{n = V_{i}}^{N} {{\lambda_{n}}} {\eta_{B}}\right)^{-1}\\ w^{-}_{V_{i}} & = \left(\sum\limits_{n = V_{i}}^{N} {{\lambda_{n}}}\right)^{-1}\\ \end{array} $$
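Since \(\lambda_{n} \ge 0\) and \(V_{i}>V_{i-1}\), the sum \(\sum\nolimits_{n=V_{i}}^{N}\lambda_{n}\) is nonincreasing in i; thus, \(w^{+}\) and \(w^{-}\) are monotonically nondecreasing from one hill segment to the next.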
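The argument can be checked numerically with a small Python sketch that evaluates the two water levels of each hill segment from a given set of multipliers. The function name `segment_water_levels` and the multiplier values are hypothetical; the sketch assumes \(\eta_{B}>0\) and a strictly positive tail sum at every valley, so that the inverses exist.

```python
import numpy as np

def segment_water_levels(lam, valleys, eta_B):
    """Charging and discharging water levels of each hill segment.

    lam     : multipliers lambda_1..lambda_N (lam[0] is lambda_1), all >= 0
    valleys : 1-based valley indices V_1 < V_2 < ...
    eta_B   : battery efficiency, assumed in (0, 1]
    """
    lam = np.asarray(lam, dtype=float)
    tail = np.cumsum(lam[::-1])[::-1]      # tail[n-1] = sum_{i=n}^{N} lambda_i
    w_minus = np.array([1.0 / tail[v - 1] for v in valleys])
    w_plus = w_minus / eta_B               # (eta_B * tail)^(-1)
    return w_plus, w_minus

# Illustrative multipliers: nonzero only at the valleys (blocks 3 and 5).
w_plus, w_minus = segment_water_levels([0.0, 0.0, 0.5, 0.0, 0.5], [3, 5], eta_B=0.8)
print(w_plus, w_minus)  # [1.25 2.5 ] [1. 2.]
```

Both sequences are nondecreasing across segments, and \(w^{+} \ge w^{-}\) within each segment, in line with Properties 2 and 3.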
These properties imply that:
1. Within a block, the power allocation is equivalent to traditional water-filling.
2. The water level is closely related to the battery mode (i.e., charging \(D_{k}>0\), discharging \(D_{k}<0\), neutral \(D_{k}=0\)).
3. In order to maximize the total throughput, the power management policy should balance the tradeoff across the whole transmission process. Thus, energy may be transferred from the current block to future blocks to obtain the optimal benefit. Also, on account of the causality constraint, energy cannot be used before its arrival. These two aspects together explain the properties.
4. The main insight is that the optimal solution has a double-threshold structure, as shown in Theorem 1.
The above leads to an intuitive understanding of the optimal solution. Based on this, we introduce a dynamic programming algorithm and formulate a double-layer allocation problem to obtain the optimal solution, as shown below.
Dynamic programming
In this subsection, we develop a DP approach for our throughput maximization problem. Recall that the battery status of a block, say block n, depends on its charging and discharging history up to the time n. Specifically, it follows from Equation 5 that:
$$\begin{array}{*{20}l} {B_{n}} &= \sum \limits_{i = 1}^{n}\left({\eta_{B}} {\left[ {{D_{i}}} \right]^ + } - {\left[ {-{D_{i}}} \right]^{+}}\right) \\ &= \sum \limits_{i = 1}^{n-1}\left({\eta_{B}} {\left[ {{D_{i}}} \right]^ + } - {\left[ {-{D_{i}}} \right]^{+}}\right)+ \left({{\eta_{B}}{\left[ {{D_{n}}} \right]^ + } - {\left[ {-{D_{n}}} \right]^ +}}\right) \\ &= {B_{n-1}} + {\alpha_{n}} \end{array} $$
((15))
where \(\alpha_{n}\) can be regarded as an operation of the battery (i.e., the battery mode), which accords with the dynamic programming model (i.e., a basic discrete dynamic system) [15]. Equation 15 is a basic discrete-time dynamic system, which can be regarded as an order-one Markov process or, more precisely, a random walk model. In what follows, we introduce the first-layer allocation, which characterizes how to allocate the power within block n to achieve the maximum throughput in the frequency domain:
$$\begin{array}{*{20}l} \left(\mathrm{First~ layer}\right)~C_{n,M}^{P_{n}}:= \frac{{\ell_{n}}}{2}\sum\limits_{m = 1}^{{M} } {\log \left({1 + {{p_{n,m}}{{ {{H_{n,m}}} }}}} \right)} \end{array} $$
((16))
$$\begin{array}{*{20}l} s.t. ~~{p_{n,m}}=\left[w - \left(H_{n,m}\right)^{-1}\right]^{+},~~\sum\limits_{m = 1}^{{M}}{{p_{n,m}}}=P_{n} \end{array} $$
((17))
where the water level w is determined by the argument \(D_{n}\) based on Equation 9. \(P_{n}\) is the sum power allocated to block n and is determined by \(\alpha_{n}\) through the mapping:
$$\begin{array}{*{20}l} {P_{n}} = \left\{ {\begin{array}{*{20}c} \frac{{{E_{n}} - \frac{{{\alpha_{n}}}}{{{\eta_{B}}}}}}{{{\ell_{n}}}},& {\alpha_{n}} \ge 0, & {\alpha_{n}} = {\eta_{B}}{D_{n}}\\ \frac{{{E_{n}} - {\alpha_{n}}}}{{{\ell_{n}}}}, & {\alpha_{n}} < 0, & {\alpha_{n}}={D_{n}} \end{array}} \right. \; \end{array} $$
((18))
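The battery recursion of Equation 15 and the mapping of Equation 18 translate directly into code. The Python sketch below is illustrative only: the function names `block_power_budget` and `battery_trajectory` and the numerical values are assumptions made for this example.

```python
import numpy as np

def block_power_budget(alpha, E, ell, eta_B):
    """Equation 18: sum power P_n implied by the battery operation alpha_n."""
    if alpha >= 0:                      # charging: alpha_n = eta_B * D_n
        return (E - alpha / eta_B) / ell
    return (E - alpha) / ell            # discharging: alpha_n = D_n < 0

def battery_trajectory(alphas, B0=0.0):
    """Equation 15: B_n = B_{n-1} + alpha_n, returned for n = 1..N."""
    return B0 + np.cumsum(np.asarray(alphas, dtype=float))

# Illustrative numbers: E_n = 3, ell_n = 2, eta_B = 0.8.
print(block_power_budget(0.8, E=3.0, ell=2.0, eta_B=0.8))   # stores 0.8, spends D_n = 1 less: P_n = 1.0
print(block_power_budget(-1.0, E=3.0, ell=2.0, eta_B=0.8))  # draws 1 from the battery: P_n = 2.0
print(battery_trajectory([0.8, -0.3, -0.5]))                # ends at an (approximately) empty battery
```

A sequence of battery operations is admissible in the sense of Equation 15 only if every entry of the returned trajectory is nonnegative.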
Now, the benefit function depends on \(\alpha_{n}\), and we can rewrite \(C_{n,M}^{P_{n}}\) as \(C_{n,M}^{{\alpha _{n}}}\). Therefore, maximizing the total benefit amounts to finding a sequence of battery operations \(\alpha_{1},\alpha_{2},\cdots,\alpha_{N}\) for the N blocks, which leads to our second-layer allocation in the time domain, as shown by:
$$\begin{array}{*{20}l} (\mathrm{Second~ layer})~~ &{\underset{{\alpha_{1}},{\alpha_{2}} \cdots {\alpha_{N}}}{\max }} \sum\limits_{n = 1}^{N} C_{n,M}^{{\alpha_{n}}} \end{array} $$
((19))
$$\begin{array}{*{20}l} s.t. ~~~~ {B_{n}} = {B_{n-1}} &+ {\alpha_{n}} \ge 0, ~~\forall n \in \left\{ {1,2,\ldots,N} \right\} \end{array} $$
((20))
The second-layer allocation problem can be solved by dynamic programming, that is, by recursively computing \(J_{N},\cdots,J_{1}\) based on Bellman's equation [15]:
$$\begin{array}{*{20}l} {J_{N}}\left({{B_{N-1}}} \right) &= {\underset{{\alpha_{N}}= -{B_{N-1}}}{\max }}{C_{N,M}^{\alpha_{N}}} \end{array} $$
((21))
$$\begin{array}{*{20}l} {J_{n}}\left({{B_{n-1}}} \right) &= {\underset { -{B_{n-1}} \le {\alpha_{n}} \le {\eta_{B}}{E_{n}}}{\max }} \left\{ {{C_{n,M}^{\alpha_{n}}} + {J_{n + 1}}\left({{B_{n}}} \right)} \right\},\\ &~~~~~~n=1,~2,~\ldots,~N-1 \end{array} $$
((22))
where \(B_{n}\) is updated by Equation 15. Equation 22 denotes the optimal benefit of the last N−n+1 blocks and describes the tradeoff between the current reward \({C_{n,M}^{\alpha _{n}}}\) and the future rewards \(J_{n+1}(B_{n})\). A battery operation is feasible if the energy constraints \(-B_{n-1}\le\alpha_{n}\le\eta_{B}E_{n}\) are satisfied for all possible \(B_{n-1}\), since during block n the system can store at most \(\eta_{B}E_{n}\) of energy and draw at most \(B_{n-1}\) from the battery. Equation 21 denotes the optimal benefit of the last block; the constraint (Equation 8) implies that the corresponding battery operation should be \(\alpha_{N}=-B_{N-1}\) for optimality, whatever the previous state \(B_{N-1}\).

We compute \(J_{n}(B_{n-1})\) as well as the optimal battery operation policy \(\alpha_{n}=\mu_{n}(B_{n-1})\) for every \(B_{n-1}\), \(n\in\{1,2,\ldots,N\}\), where \(\mu_{n}(B_{n-1})\) is a mapping function from the given \(B_{n-1}\) to the optimal \(\alpha_{n}\). The search for the optimal battery operations \(\alpha_{1},\alpha_{2},\ldots,\alpha_{N}\) is thus a dynamic program that starts with the last period and proceeds backward in time [15], which yields the optimal battery operation policy set \(\left \{ {\mu _{1}^{*}\left ({{B_{0}}} \right),~\mu _{2}^{*}\left ({{B_{1}}} \right), ~\ldots,~\mu _{N}^{*}\left ({{B_{N - 1}}} \right)} \right \}\). Then, given the initial battery level \(B_{0}=0\), the optimal battery operations \({\alpha _{1}^{*}}={\mu _{1}^{*}\left ({0} \right)},{\alpha _{2}^{*}}=\mu _{2}^{*}\left ({{\alpha _{1}^{*}}} \right),~\ldots,~{\alpha _{N}^{*}}=\mu _{N}^{*}\left ({{\sum \nolimits }_{n=1}^{N - 1} {\alpha _{n}^{*}}} \right)\) can be obtained. Through the mapping relation (Equation 18), the optimal power allocation then follows. Based on this analysis, Algorithm 1 summarizes the procedure for finding the optimal power allocation. It is interesting to note that the DP algorithm is similar, in principle, to the Viterbi algorithm except that the former is a backward operation, and thus, our algorithm enjoys the same computational efficiency as the Viterbi algorithm. The optimality can be proved by applying Bellman's equation [15].
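The backward recursion of Equations 21 and 22 can be sketched in Python as follows. This is a minimal illustration and not a reproduction of Algorithm 1: it assumes that the battery level is discretized on a uniform grid (step `delta`, maximum `B_max`), that the rate uses the natural logarithm, and that the first layer is solved by closed-form water-filling. The names `waterfill_rate` and `solve_dp`, the grid parameters, and the sample data are all hypothetical.

```python
import numpy as np

def waterfill_rate(H, P, ell):
    """First layer (Equations 16-17): rate of one block with sum power P."""
    if P <= 0:
        return 0.0
    inv = np.sort(1.0 / np.asarray(H, dtype=float))   # strongest subchannel first
    for m in range(len(inv), 0, -1):                  # try the m strongest subchannels
        w = (P + inv[:m].sum()) / m                   # water level if exactly m are active
        if w > inv[m - 1]:
            break
    p = np.maximum(w - inv, 0.0)
    return 0.5 * ell * np.log1p(p / inv).sum()        # p / inv equals p * H

def solve_dp(E, H, ell, eta_B, delta, B_max):
    """Second layer (Equations 21-22) on the battery grid {0, delta, ..., B_max}."""
    N, K = len(E), int(round(B_max / delta)) + 1
    grid = delta * np.arange(K)

    def reward(n, alpha):                             # C_{n,M}^{alpha_n} via Equation 18
        D = alpha / eta_B if alpha >= 0 else alpha
        return waterfill_rate(H[n], (E[n] - D) / ell[n], ell[n])

    J = np.zeros((N + 1, K))                          # J[N] = 0: no reward after block N
    policy = np.zeros((N, K), dtype=int)              # index of the next battery state
    for n in range(N - 1, -1, -1):
        for b in range(K):
            # Last block empties the battery; otherwise alpha_n <= eta_B * E_n.
            # alpha_n >= -B_{n-1} holds automatically because grid levels are >= 0.
            nxt = [0] if n == N - 1 else [
                c for c in range(K) if grid[c] - grid[b] <= eta_B * E[n] + 1e-12]
            vals = [reward(n, grid[c] - grid[b]) + J[n + 1, c] for c in nxt]
            best = int(np.argmax(vals))
            J[n, b], policy[n, b] = vals[best], nxt[best]
    b, alphas = 0, []                                 # forward pass from B_0 = 0
    for n in range(N):
        c = policy[n, b]
        alphas.append(grid[c] - grid[b])
        b = c
    return J[0, 0], np.array(alphas)

# Illustrative data: 3 blocks, 2 subchannels, all energy arrives in block 1.
E, ell = [3.0, 0.0, 0.0], [1.0, 1.0, 1.0]
H = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
total, alphas = solve_dp(E, H, ell, eta_B=0.8, delta=0.1, B_max=3.0)
print(total, alphas)
```

The grid resolution trades accuracy for complexity: each of the N stages evaluates on the order of K² transitions, where K is the number of grid levels. With a lossless battery (\(\eta_{B}=1\)) and identical channels, the sketch spreads the harvested energy evenly over the blocks, as expected from the concavity of the rate.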