Throughput maximization-based optimal power allocation for energy-harvesting cognitive radio networks with multiusers

An optimal power allocation (OPA) policy is presented for orthogonal frequency division multiplexing (OFDM)-based cognitive radio networks (CRNs) that use the underlay spectrum access model and serve multiple secondary users (SUs) with energy harvesting (EH). The proposed algorithm allocates transmission power to each SU on each subcarrier with the objective of maximizing the average throughput of the secondary network over a finite time interval. We consider both the interference power constraint imposed by the primary user (PU) and the minimum throughput constraint of each SU, so as to improve the throughput of SUs while guaranteeing the communication quality of the PU. To balance the current throughput and the expected future throughput, a dynamic programming (DP) problem is defined and solved by the backward induction method. Moreover, for each time slot, a convex immediate optimization problem is formulated, and its optimal solution is obtained by the Lagrange dual method. Simulation results show that our policy achieves better performance than some traditional policies and ensures good quality of service (QoS) for the PU when SUs access the spectrum.

wireless communication scenarios based on different considerations addressed in [4][5][6][7]. An OPA policy for EH wireless communications with limited channel feedback from receiver is investigated in [4] where the receiver periodically sends only 1-b feedback by comparing channel power gain with a predetermined threshold. In [5], a power allocation policy for an access-controlled transmitter with EH capability based on causal observations of channel fading state is considered. In [6], the authors study the OPA for an outage probability minimization problem in point-to-point fading channels with the constraints of the EH and the channel distribution at transmitter. In [7], a Markov decision process (MDP) model is proposed for the energy allocation problem over a finite horizon to maximize the throughput under point-to-point wireless communications, and both channel conditions and time varying energy sources are taken into account. According to the above research results, EH technology can obviously improve the energy efficiency; however, it also brings some new challenges to the power allocation strategy, such as the problem of uncertainty in EH process.
In fact, the papers [4][5][6][7] consider only point-to-point wireless communications without CR technology. Recently, some studies have focused on the combination of EH technology and CRNs, and different OPA policies are discussed in [8][9][10][11][12][13][14]. The OPA for an EH CRN is studied in [8], where the authors consider system throughput maximization over a finite horizon rather than at a single time slot and adopt a rate loss constraint to protect the transmission of the primary user (PU). The access strategy for hybrid underlay-overlay CR networks with EH is analyzed in [9]: a partially observable Markov decision process (POMDP) framework is proposed to determine the action of the SU, and an energy threshold is used to determine the transmission mode of the SU. In [10], a power allocation policy with peak power constraints is proposed for an EH-CR system, and the target throughput maximization problem is solved by recursion machinery and a geometric water-filling algorithm. A novel saving-sensing-transmitting (SST) frame structure for EH CRNs is proposed in [11], where the authors aim to maximize the energy utilization efficiency of the SU by jointly optimizing the save ratio and the transmission power under both the energy causality constraint and the minimum throughput constraint. The SST structure makes full use of the residual battery energy and ensures enough time for spectrum sensing and data transmission. A generalized multislot spectrum sensing paradigm and two types of fusion rules (data fusion and decision fusion) are proposed in [12]. The authors focus on the "harvesting-sensing-throughput" trade-off and jointly optimize the save ratio, sensing duration, sensing threshold, and fusion rule to maximize the expected achievable throughput of the SU while protecting the PU.
A POMDP is proposed in [13] to trade off energy consumption and throughput gain in a hybrid CRN, where the SU dynamically determines its operation mode for each time slot (e.g., to be idle or to transmit), its sensing time, and its access mode. In [14], for an overlay EH CRN, the authors aim to find an optimal sensing time that maximizes the throughput of the SU and the harvested RF energy. Most of the existing works investigate the OPA problem only in CR systems with a single user, which is not practical for real communication systems.
Based on the above discussions, we propose an OPA policy for the EH CRN considering multiple users with the underlay spectrum access model. This paper only considers SUs that harvest energy from the ambient environment, such as solar energy [8,9]. The major contributions of this paper are as follows:
• We consider a system model with multiple SUs where each SU transmitter is equipped with an EH device. In our study, we focus on the maximization of the average throughput of the secondary system within K time slots under the maximum transmission power constraint, the minimum throughput constraint of SUs, and the interference power constraint defined by PU. The proposed algorithm can maximize the average throughput and satisfy all constraints simultaneously to achieve better performance for SUs and the QoS of PU.
• The orthogonal frequency division multiple access (OFDMA) scheme is used in the process of spectrum sharing, where the available spectrum is divided into a set of subcarriers. The optimal transmission power of each SU can be obtained by using a dynamic programming (DP) algorithm and an immediate optimization, which are solved by the backward induction method and the Lagrange dual method, respectively.
The rest of the paper is organized as follows. In Section 2, a system model of EH CRNs, a subcarrier allocation model of SUs, and a subcarrier occupation state are described. We introduce EH process in Section 3 and present the throughput optimization problem and the immediate OPA algorithm. Then, the system state and the DP-based scheme are formulated, and the backward induction method is given in Section 4. In Section 5, we present our simulation results and performance analysis through the comparison between our proposed policy and other policies. Finally, the conclusion of the whole paper is provided in Section 6.

System model
We consider an EH-CR network with one PU and M SUs, as shown in Fig. 1a. Each SU transmitter is equipped with an EH device. We let the set $\mathcal{M} = \{1, 2, \ldots, m, \ldots, M\}$ denote the SUs, with $m \in \mathcal{M}$. The PU band is divided into N subcarriers, and each subcarrier has the same bandwidth. We let the set $\mathcal{N} = \{1, 2, \ldots, n, \ldots, N\}$ denote the subcarriers, with $n \in \mathcal{N}$. The number of subcarriers occupied by PU is l. The SU can obtain the channel state information (CSI) and the working state of PU by a spectrum sensing algorithm [15], such as matched filter detection, energy detection, and multiple identification spectrum detection. We let PU-Tx and PU-Rx denote the primary transmitter and receiver, respectively. Similarly, SU-Tx and SU-Rx denote the secondary transmitter and receiver, respectively.
We consider Rayleigh fading channels modeled as a two-state Markov chain [16,17], as shown in Fig. 1b. The channel gains from PU-Tx to PU-Rx, from the m th SU-Tx to the m th SU-Rx, from the m th SU-Tx to PU-Rx, and from PU-Tx to the m th SU-Rx over the n th subcarrier are denoted by $w^n$, $h_m^n$, $a_m^n$, and $b_m^n$, respectively. Moreover, we select the underlay spectrum access model, in which SU has the opportunity to coexist with PU, while SU must keep the interference to PU under a certain threshold.

Subcarrier allocation model
According to the above system model, we determine a policy to allocate N subcarriers to M SUs. Both the PU and SU systems use the OFDMA scheme, and we assume that one subcarrier can only be used by one SU at each time slot, which means that the interference between SUs is not considered. Each SU can use multiple subcarriers at each time slot; the subcarriers occupied by the m th SU are denoted as the set $\mathcal{K}_m \subseteq \mathcal{N}$. We define the minimum rate requirement [18] of each SU as $R_{\min}^m$, $\forall m \in \mathcal{M}$. The allocation policy can be described as follows.
First, we determine the priority of each SU according to $R_{\min}^m$: the larger the value of $R_{\min}^m$, the higher the priority of the SU. Then, the subcarriers that have a good channel condition and are not occupied by PU are allocated to the SUs with high priority. According to this allocation policy, we can allocate the subcarriers to SUs.
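The allocation policy above is described only qualitatively, so the following Python sketch is one possible reading of it rather than the authors' procedure. The function name `allocate_subcarriers`, the round-robin order, and the tie-break rule (PU-free subcarriers first, then the largest gain, then the lowest index) are all assumptions made for illustration.

```python
def allocate_subcarriers(r_min, gains, occupied):
    """Assign every subcarrier to exactly one SU (hypothetical helper).

    r_min    : minimum-rate requirement of each SU; a larger value means higher priority
    gains    : gains[m][n] is the channel gain of SU m on subcarrier n
    occupied : set of subcarrier indices currently used by the PU
    Returns {su_index: sorted list of subcarrier indices}.
    """
    num_su = len(r_min)
    num_sc = len(gains[0])
    # SUs take turns in priority order (largest minimum rate first).
    order = sorted(range(num_su), key=lambda m: r_min[m], reverse=True)
    remaining = set(range(num_sc))
    alloc = {m: [] for m in range(num_su)}
    while remaining:
        for m in order:
            if not remaining:
                break
            # Assumed preference: PU-free first, then best gain, then lowest index.
            best = max(remaining,
                       key=lambda n: (n not in occupied, gains[m][n], -n))
            alloc[m].append(best)
            remaining.remove(best)
    return {m: sorted(v) for m, v in alloc.items()}


example = allocate_subcarriers(
    r_min=[1.0, 3.0],
    gains=[[0.9, 0.1, 0.5, 0.2], [0.8, 0.7, 0.6, 0.1]],
    occupied={0},
)
print(example)
```

In this toy call, SU 1 has the larger minimum rate and therefore picks first; the PU-occupied subcarrier 0 is handed out last, which matches the underlay assumption that it can still be shared subject to the interference constraint.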

Subcarrier occupation state
Since the subcarrier occupation state changes constantly over the time slots, SU should perform spectrum sensing to determine the behavior of PU at the beginning of each time slot. From Fig. 1b, the channel may be in one of two states: busy (B) or free (F). The state B denotes that PU is active; conversely, the state F denotes that PU is inactive. We consider K time slots in our study; the time slot set is defined as $\mathcal{K} = \{0, 1, \ldots, k, \ldots, K-1\}$, with $k \in \mathcal{K}$.
We define $x_k^n$ to indicate the state of the n th channel at the time slot k, which has two possible values:

$$x_k^n = \begin{cases} 0, & \text{the } n\text{th channel is in state F} \\ 1, & \text{the } n\text{th channel is in state B} \end{cases}$$

In addition, we define $O_k$ to indicate the state set of all channels at the time slot k. Let N denote the total number of subcarriers and l the number of subcarriers occupied by PU; then the number of possible subcarrier occupation states is $L = \binom{N}{l}$. Therefore, $O_k$ takes one of the values $\{o_1, o_2, \ldots, o_L\}$, where each $o_i$ is a vector $(x^1, x^2, \ldots, x^N)$ with exactly l entries equal to 1.

The transition matrix of the PU occupation state is defined as $P_o$. At first, the state transition probabilities of the n th channel can be given by

$$P^n = \begin{bmatrix} P_{FF}^n & P_{FB}^n \\ P_{BF}^n & P_{BB}^n \end{bmatrix}, \quad P_{FF}^n = 1 - P_{FB}^n, \quad P_{BB}^n = 1 - P_{BF}^n$$

The transition probability of $P_o$ is defined as

$$p_{ij}^o = \Pr\{O_{k+1} = o_j \mid O_k = o_i\}$$

and, according to $P_{BF}^n$ and $P_{FB}^n$, we can get the value of $p_{ij}^o$.
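As a quick check on the state count, the occupation states can be enumerated directly. The Python sketch below is illustrative only: the helper names are hypothetical, and the per-channel matrix simply restates the two-state Markov chain with rows and columns ordered (F, B).

```python
from itertools import combinations
from math import comb


def occupation_states(num_subcarriers, num_occupied):
    """List every vector (x^1, ..., x^N) with exactly l busy channels."""
    states = []
    for busy in combinations(range(num_subcarriers), num_occupied):
        states.append(tuple(1 if n in busy else 0
                            for n in range(num_subcarriers)))
    return states


def channel_transition_matrix(p_fb, p_bf):
    """Two-state Markov chain of one channel; rows and columns ordered (F, B)."""
    return [[1.0 - p_fb, p_fb],
            [p_bf, 1.0 - p_bf]]


states = occupation_states(8, 2)
print(len(states), comb(8, 2))  # 28 28 for N = 8, l = 2
```

With N = 8 and l = 2 the enumeration gives 28 states, which agrees with the value of L used in the simulation section.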

Energy-harvesting process
Here, we assume that an energy-harvesting battery (i.e., an energy queue) of finite capacity is attached to each SU transmitter and can be used for signal transmission. We also consider that the energy harvesters only harvest energy from the ambient environment, such as solar energy. The harvested energy packets are denoted as $E_k^m \in \mathbf{e} = \{e_1, e_2, \ldots, e_H\}$ and follow a Poisson process [9] with mean $e_\lambda$, $\forall k \in \mathcal{K}$, $\forall m \in \mathcal{M}$, which determines their probability distribution. Besides, we use $B_k^m$ to represent the remaining battery energy of the m th SU at the time slot k; therefore, the battery energy at the next time slot k + 1 is updated as

$$B_{k+1}^m = \min\left\{ B_k^m - T \sum_{n \in \mathcal{K}_m} p_k^{m,n} + E_k^m,\; B_{\max} \right\} \tag{8}$$

where $p_k^{m,n}$ denotes the transmission power allocated to the m th SU in the n th channel at the time slot k, T denotes the duration of one time slot, and $B_{\max}$ is the maximum battery capacity.
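The battery dynamics can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the update is taken to be "previous energy minus energy spent plus energy harvested, capped at the battery capacity", and `poisson_pmf` assumes that the number of harvested packets in a slot is Poisson distributed. Both function names are hypothetical.

```python
import math


def poisson_pmf(h, lam):
    """Probability of harvesting h packets, assuming a Poisson count with mean lam."""
    return math.exp(-lam) * lam ** h / math.factorial(h)


def battery_update(battery, powers, harvested, slot_duration, b_max):
    """Assumed update: B_next = min(B - T * sum(p) + E, B_max)."""
    used = slot_duration * sum(powers)
    if used > battery + 1e-12:
        # The power budget B / T forbids spending more than what is stored.
        raise ValueError("transmission energy exceeds the battery level")
    return min(battery - used + harvested, b_max)


# 3 J stored, 0.5 W + 1.0 W used for 1 s, 1 J harvested, 5 J capacity.
print(battery_update(3.0, [0.5, 1.0], 1.0, 1.0, 5.0))
```

The cap at `b_max` models energy overflow: any harvested energy beyond the battery capacity is lost.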

Problem formulation
In our study, the optimization objective is to maximize the average throughput of the secondary system within K time slots. We define $\mathrm{SINR}_k^{m,n}$ as the signal-to-interference-plus-noise ratio (SINR) of the m th SU in the n th channel at the time slot k:

$$\mathrm{SINR}_k^{m,n} = \frac{p_k^{m,n} h_m^n}{p_k^{p,n} b_m^n + \sigma^2}$$

where $p_k^{p,n}$ denotes the transmission power of PU in the n th channel at the time slot k, and $\sigma^2$ is the noise power. Therefore, the throughput of the m th SU at the time slot k can be defined as

$$R_k^m = \sum_{n \in \mathcal{K}_m} \log_2\left(1 + \mathrm{SINR}_k^{m,n}\right) \tag{11}$$

and the optimization problem can be formulated as

$$\text{OP1:} \quad \max_{\{p_k^{m,n}\}} \; E\left\{ \frac{1}{K} \sum_{k=0}^{K-1} \sum_{m=1}^{M} R_k^m \right\}$$

$$\text{s.t.} \quad \text{C1: } \sum_{n \in \mathcal{K}_m} p_k^{m,n} \le \frac{B_k^m}{T}, \quad \text{C2: } \sum_{m=1}^{M} \sum_{n \in \mathcal{K}_m} a_m^n p_k^{m,n} \le I_{th}, \quad \text{C3: } R_k^m \ge R_{\min}^m, \quad \text{C4: } p_k^{m,n} \ge 0$$

where $E\{\cdot\}$ denotes the expectation over the channel gain distribution and the subcarrier occupation state at each time slot. C1 denotes the maximum transmission power constraint, and $B_k^m/T$ is the total transmission power budget for the m th SU at the time slot k. This constraint ensures that the transmission power of each SU at each time slot does not exceed the energy budget. C2 denotes the interference power constraint, which guarantees that the interference to PU remains under $I_{th}$, where $I_{th}$ is the interference threshold prescribed by the PU receiver. C3 represents the minimum throughput constraint, which keeps the throughput of each SU above the minimum throughput requirement in the network. The constraint C4 makes the transmission power of SU conform to the actual situation, i.e., it cannot be negative.
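To make the quantities in the problem formulation concrete, the sketch below evaluates the SINR, the per-SU throughput in bit/s/Hz, and the aggregate interference at the PU receiver. It assumes that the channel gains are power gains and that the interference constraint is applied to the sum over all SUs and subcarriers; the function names are hypothetical.

```python
import math


def sinr(p_su, h, p_pu, b, noise):
    """SINR of one SU link: desired power over PU interference plus noise."""
    return p_su * h / (p_pu * b + noise)


def su_throughput(powers, h, p_pu, b, noise):
    """Sum of log2(1 + SINR) over the subcarriers assigned to one SU."""
    return sum(math.log2(1.0 + sinr(p, hn, pp, bn, noise))
               for p, hn, pp, bn in zip(powers, h, p_pu, b))


def interference_at_pu(powers, a):
    """Aggregate interference sum_m sum_n a[m][n] * p[m][n] at the PU receiver."""
    return sum(p * g
               for row_p, row_a in zip(powers, a)
               for p, g in zip(row_p, row_a))


# One SU on two PU-free subcarriers with unit gain and unit noise power.
print(su_throughput([1.0, 3.0], [1.0, 1.0], [0.0, 0.0], [0.5, 0.5], 1.0))
```

A candidate allocation is feasible for the interference constraint when `interference_at_pu(...)` does not exceed the threshold, and for the minimum throughput constraint when `su_throughput(...)` is at least the SU's minimum rate.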

Immediate optimization problem solution
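To solve the optimization problem OP1, we consider both the throughput at the current time slot and the future throughput. In order to simplify the problem, we first set K = 1, which means that we only consider the optimization problem at one time slot. Therefore, we can formulate the immediate optimization problem as

$$\text{OP2:} \quad \max_{\{p_k^{m,n}\}} \; \sum_{m=1}^{M} R_k^m \quad \text{s.t. C1, C2, C3, C4} \tag{13}$$

OP2 is a convex problem, which can be solved by the Lagrange dual method [19]. First, we define the Lagrange function

$$L\left(p_k^{m,n}, \lambda_m, \mu, \xi_m\right) = \sum_{m=1}^{M} R_k^m - \sum_{m=1}^{M} \lambda_m \left( \sum_{n \in \mathcal{K}_m} p_k^{m,n} - \frac{B_k^m}{T} \right) - \mu \left( \sum_{m=1}^{M} \sum_{n \in \mathcal{K}_m} a_m^n p_k^{m,n} - I_{th} \right) + \sum_{m=1}^{M} \xi_m \left( R_k^m - R_{\min}^m \right)$$

where $\lambda_m, \mu, \xi_m \ge 0$ are the Lagrange multipliers. The dual variable $\lambda_m$ is associated with the maximum transmission power constraint, the dual variable $\mu$ with the interference power constraint, and the dual variable $\xi_m$ with the minimum throughput constraint. Moreover, the dual function of the Lagrange function is

$$D(\lambda_m, \mu, \xi_m) = \max_{p_k^{m,n} \ge 0} L = \sum_{m=1}^{M} \max_{p_k^{m,n} \ge 0} L_m\left(p_k^{m,n}, \lambda_m, \mu, \xi_m\right) + \sum_{m=1}^{M} \lambda_m \frac{B_k^m}{T} + \mu I_{th} - \sum_{m=1}^{M} \xi_m R_{\min}^m$$

where

$$L_m\left(p_k^{m,n}, \lambda_m, \mu, \xi_m\right) = (1 + \xi_m) R_k^m - \sum_{n \in \mathcal{K}_m} \left( \lambda_m + \mu a_m^n \right) p_k^{m,n}$$

The dual optimization problem $d^*$ of (13) can be formulated as

$$d^* = \min_{\lambda_m, \mu, \xi_m \ge 0} D(\lambda_m, \mu, \xi_m)$$

Since $L_m$ is concave in $p_k^{m,n}$, according to the Karush-Kuhn-Tucker (KKT) conditions [20], the optimal transmission power $p_k^{m,n}$ at the m th SU transmitter can be calculated by $\partial L_m / \partial p_k^{m,n} = 0$. Thus, the optimal solution is

$$p_k^{m,n*} = \left[ \frac{1 + \xi_m}{\ln 2 \left( \lambda_m + \mu a_m^n \right)} - \frac{p_k^{p,n} b_m^n + \sigma^2}{h_m^n} \right]^+ \tag{18}$$

where $[\cdot]^+ = \max(0, \cdot)$.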
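The Lagrange multipliers $\lambda_m$, $\mu$, and $\xi_m$ should ensure a fast convergence rate. We can use the sub-gradient method to update these multipliers, and their recursive forms are

$$\lambda_m(i+1) = \left[ \lambda_m(i) + \alpha_1 \left( \sum_{n \in \mathcal{K}_m} p_k^{m,n} - \frac{B_k^m}{T} \right) \right]^+ \tag{19}$$

$$\mu(i+1) = \left[ \mu(i) + \alpha_2 \left( \sum_{m=1}^{M} \sum_{n \in \mathcal{K}_m} a_m^n p_k^{m,n} - I_{th} \right) \right]^+ \tag{20}$$

$$\xi_m(i+1) = \left[ \xi_m(i) + \alpha_3 \left( R_{\min}^m - R_k^m \right) \right]^+ \tag{21}$$

where i denotes the iteration number, and $\alpha_1, \alpha_2, \alpha_3 \ge 0$ are small step sizes. A proper selection of the step sizes ensures the stability and convergence of this dual algorithm [21]. Finally, we can get the optimal solution $p_k^{m,n*}$. Then, by substituting this solution into (11), the optimal throughput of each SU can be calculated.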
Thus, our proposed immediate OPA algorithm is summarized in Table 1.
However, this solution does not directly apply to the multiple time slots, namely, K > 1, in which we should balance the current throughput and the expected future throughput. According to our system model, the subcarrier occupation state, the harvested energy state, and the total energy budget are all time-varying state. In order to further develop this problem for K > 1, we use DP algorithm, which will be introduced in the next section.
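For the single-slot case (K = 1) discussed in this section, the interplay between the water-filling-type power expression and the sub-gradient multiplier updates can be sketched in Python. This is a minimal illustration, not the algorithm of Table 1: the function names, the initial multiplier values, the common step size, and the fixed iteration count are assumptions, and the clipping of each power to the SU's budget is an added numerical safeguard that is not part of the closed-form expression.

```python
import math


def optimal_power(lam, mu, xi, h, a, interf, cap):
    """[(1 + xi) / (ln2 * (lam + mu * a)) - interf / h]^+, clipped to cap (assumed guard)."""
    denom = math.log(2.0) * (lam + mu * a)
    if denom <= 0.0:
        return cap  # no active price on power: use the whole budget
    p = (1.0 + xi) / denom - interf / h
    return min(max(p, 0.0), cap)


def immediate_opa(h, a, interf, budget, i_th, r_min, step=0.05, iters=2000):
    """h[m][n], a[m][n]: gains; interf[m][n] = PU interference + noise; budget[m] = B_m / T."""
    num_su = len(h)
    lam, xi, mu = [1.0] * num_su, [0.0] * num_su, 1.0
    p, rate = [], []
    for _ in range(iters):
        p = [[optimal_power(lam[m], mu, xi[m], h[m][n], a[m][n],
                            interf[m][n], budget[m])
              for n in range(len(h[m]))] for m in range(num_su)]
        rate = [sum(math.log2(1.0 + p[m][n] * h[m][n] / interf[m][n])
                    for n in range(len(h[m]))) for m in range(num_su)]
        total_interf = sum(a[m][n] * p[m][n]
                           for m in range(num_su) for n in range(len(h[m])))
        # Projected sub-gradient steps on the three families of multipliers.
        for m in range(num_su):
            lam[m] = max(0.0, lam[m] + step * (sum(p[m]) - budget[m]))
            xi[m] = max(0.0, xi[m] + step * (r_min[m] - rate[m]))
        mu = max(0.0, mu + step * (total_interf - i_th))
    return p, rate


# One SU, one subcarrier, unit gains and noise; the power budget of 2 W is binding.
powers, rates = immediate_opa([[1.0]], [[1.0]], [[1.0]], [2.0], 10.0, [0.0])
print(powers, rates)
```

In this toy case the interference threshold is loose, so the multiplier of the interference constraint decays to zero and the allocated power approaches the full budget; tightening `i_th` instead makes the interference constraint the binding one.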

System state
Considering the future expected throughput, the transition probability of the system state should be determined.
Table 1 Immediate OPA algorithm (final steps): Solve the optimization problem (13) to obtain $p_k^{m,n}$, $\forall k$, and thus the throughput of the secondary system. End: The optimal transmission power $p_k^{m,n*}$ is calculated by (18); by substituting $p_k^{m,n*}$ into (11), the optimal throughput is obtained.
From the previous sections, we know that the system state includes the subcarrier occupation state and the harvested energy state, which are independent of each other. Since the EH process of each SU is independent, the harvested energy states of the SUs are also independent. We define the system state as $S_k$, $\forall k \in \mathcal{K}$, and the number of system states is $J = L \times H^M$. Each element of $S_k$ combines the subcarrier occupation state with the harvested energy state of every SU, i.e., $S_k = (O_k, E_k^1, \ldots, E_k^M) \in \{s_1, s_2, \ldots, s_J\}$. We use $P_s$ to denote the transition matrix of the system state, and its dimension is $D = J^2 = (L \times H^M)^2$. We assume $S_k = s_i$ at the time slot k and $S_{k+1} = s_j$ at the time slot k + 1. Therefore, the transition probability of $P_s$ can be given as $p_{ij}^s = \Pr\{S_{k+1} = s_j \mid S_k = s_i\}$. According to (5), (6), and (7), the transition probability can be calculated.
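The size of the state space grows quickly with the number of SUs, which a short calculation makes visible. In the Python sketch below, the value H = 3 is only an example, and `joint_transition_prob` assumes that the harvested energy in the next slot is independent of the current state, so that the joint transition probability factorizes into the occupation transition probability and the per-SU energy probabilities. Both helper names are hypothetical.

```python
from math import prod


def num_system_states(num_occupation_states, num_energy_levels, num_su):
    """J = L * H**M."""
    return num_occupation_states * num_energy_levels ** num_su


def joint_transition_prob(p_occupation, next_energy_probs):
    """Assumed factorization: occupation transition times the product of EH probabilities."""
    return p_occupation * prod(next_energy_probs)


# L = 28 occupation states and M = 4 SUs as in the simulations; H = 3 is an example value.
j = num_system_states(28, 3, 4)
print(j, j * j)
```

Because the transition matrix has J squared entries, even a small number of energy levels per SU leads to a large look-up table.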

Dynamic programming formulation
In this section, we define a DP formulation [22] to solve the optimization problem OP1. We define a reward function, which can be understood as the maximum of the sum of the throughput at the current time slot and the expected cumulative throughput at the future time slots from the current system state. We set $V_k(B_k^1, B_k^2, \ldots, B_k^M, S_k)$ to denote the reward function at the time slot k, which is a function of the current energy budget $B_k^m$ of each SU and the current system state $S_k$, and it can be expressed as

$$V_k\left(B_k^1, \ldots, B_k^M, S_k\right) = \max_{\{p_k^{m,n}\}} \left\{ R_k + E\left\{ V_{k+1}\left(B_{k+1}^1, \ldots, B_{k+1}^M, S_{k+1}\right) \mid S_k \right\} \right\}$$

where $R_k = \sum_{m=1}^{M} R_k^m$ is the immediate reward at the time slot k.

Backward induction method
The backward induction method [23] can be used to solve the DP formulation. We use $V_{k+1}(B_{k+1}^1, B_{k+1}^2, \ldots, B_{k+1}^M, S_{k+1})$ to denote the future reward function at the next time slot k + 1.
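Since we consider the time slot set $\mathcal{K} = \{0, 1, \ldots, K-1\}$, no throughput can be collected after the last slot, and the reward function at time slot K is

$$V_K\left(B_K^1, \ldots, B_K^M, S_K\right) = 0$$

For k = K − 1, K − 2, . . . , 0, with $S_k = s_i$, the reward functions can be expressed as

$$V_k\left(B_k^1, \ldots, B_k^M, s_i\right) = \max_{\{p_k^{m,n}\}} \left\{ R_k + \sum_{j=1}^{J} p_{ij}^s V_{k+1}\left(B_{k+1}^1, \ldots, B_{k+1}^M, s_j\right) \right\}$$

Thus, the reward function can be calculated in the time-reversal order.
1) Time slot k = K − 1. In this case, the future term vanishes, and we only need to consider the immediate reward function, which is maximized when the transmission power of each SU on each subcarrier satisfies the optimal solution (18).
2) Time slots from k = K − 2 to k = 1. The reward function follows the recursion above, where $B_{k+1}^m$ can be updated by (8), $\forall m \in \mathcal{M}$. Through the state transition probability $p_{ij}^s$, we can get the expected reward function. By considering the trade-off between the current reward and the potential reward at the next time slot, we can get the optimal transmission power.
3) Time slot k = 0. The reward function is $V_0(B_0^1, \ldots, B_0^M, S_0)$, where $B_0^m$ denotes the energy budget of the m th SU at the beginning of transmission, $\forall m \in \mathcal{M}$, and $S_0$ is the initial system state. At the initial time slot, we only need to satisfy the maximum of the expected reward at time slot k = 1.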
Therefore, our DP power allocation algorithm is summarized in Table 2.
As we can see, the results can be stored in a table indexed by the time slot; according to this table, SUs can determine the optimal transmission power.
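The table-filling idea behind backward induction can be sketched for a deliberately simplified setting: a single SU, a uniformly discretized battery, a finite set of power levels, and a harvested energy that is a fixed amount per system state. All names and these simplifications are assumptions for illustration; the sketch is not the algorithm of Table 2.

```python
import math


def backward_induction(num_slots, levels, powers, trans, reward, harvest, T=1.0):
    """Fill value and policy tables in time-reversal order.

    levels  : uniformly spaced battery levels, levels[-1] being the capacity
    powers  : candidate transmission powers (should include 0.0)
    trans   : trans[i][j] = probability of moving from state i to state j
    reward  : reward(p, s) = immediate throughput for power p in state s
    harvest : harvest[s] = energy harvested during a slot spent in state s
    Returns (value at k = 0, policy[k][battery_index][state]).
    """
    b_max, step = levels[-1], levels[1] - levels[0]
    num_states = len(trans)
    value = [[0.0] * num_states for _ in levels]  # terminal condition V_K = 0
    policy = []
    for _ in range(num_slots):
        new_value = [[0.0] * num_states for _ in levels]
        slot_policy = [[0.0] * num_states for _ in levels]
        for b_idx, b in enumerate(levels):
            for s in range(num_states):
                best, best_p = -math.inf, 0.0
                for p in powers:
                    if T * p > b + 1e-9:
                        continue  # cannot spend more energy than is stored
                    nxt = min(b - T * p + harvest[s], b_max)
                    n_idx = int(round((nxt - levels[0]) / step))
                    future = sum(trans[s][j] * value[n_idx][j]
                                 for j in range(num_states))
                    total = reward(p, s) + future
                    if total > best + 1e-12:
                        best, best_p = total, p
                new_value[b_idx][s] = best
                slot_policy[b_idx][s] = best_p
        value = new_value
        policy.insert(0, slot_policy)  # slots are processed from K - 1 down to 0
    return value, policy


v0, pol = backward_induction(
    num_slots=2, levels=[0.0, 1.0, 2.0], powers=[0.0, 1.0, 2.0],
    trans=[[1.0]], reward=lambda p, s: math.log2(1.0 + p), harvest=[0.0])
print(v0[2][0], pol[0][2][0])
```

In this two-slot example with 2 J stored and no harvesting, the table recommends spending 1 W in each slot for a total of 2 bit/s/Hz, whereas spending everything in the first slot would give only log2(3), about 1.585 bit/s/Hz. This illustrates why the current throughput must be balanced against the expected future throughput.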

Performance analysis
We first analyze the performance of the immediate power allocation algorithm. This algorithm keeps the interference to PU below a certain threshold and the throughput of each SU above a proper threshold at each time slot.
From (19) to (21), we can see that the Lagrange multipliers can be updated using only local information, which effectively improves the computation speed and reduces the algorithm complexity. If the transmission power $p_k^{m,n}$ is relatively high, the power and interference constraints are exceeded while the throughput lies above its minimum requirement, so that $\lambda_m$ and $\mu$ will increase and $\xi_m$ will decrease. Following (18), we find that $p_k^{m,n}$ will then be reduced. As a result, the transmission power is adjusted to satisfy the constraint conditions. However, this power does not decrease indefinitely: if $p_k^{m,n}$ becomes relatively small, $\lambda_m$ and $\mu$ will decrease and $\xi_m$ will increase. Therefore, $p_k^{m,n}$ will increase and go back to the appropriate range. This adaptive iterative process can ensure good QoS for both PU and SU.
Based on the Lipschitz continuity [24] of the dual function and proper step parameters for the Lagrange multipliers, this algorithm can converge quickly. According to the Lipschitz continuity, there exists a Lipschitz constant $\delta$ such that the function $d^*(\lambda_m, \mu, \xi_m)$ satisfies the following condition:

$$\left| d^*(\lambda_1, \mu_1, \xi_1) - d^*(\lambda_2, \mu_2, \xi_2) \right| \le \delta \left\| (\lambda_1, \mu_1, \xi_1) - (\lambda_2, \mu_2, \xi_2) \right\|_2$$

where $\lambda_1, \lambda_2 \in \lambda_m$, $\mu_1, \mu_2 \in \mu$, $\xi_1, \xi_2 \in \xi_m$, and $\|\cdot\|_2$ denotes the norm of a vector. Thus, we can determine that the dual function $d^*$ of (13) is uniformly continuous. When $p_k^{m,n}$ satisfies all constraints C1 to C4 with $\lambda_m$, $\mu$, and $\xi_m \ge 0$, the iterates $p_k^{m,n*}$, $\lambda_m^*$, $\mu^*$, and $\xi_m^*$ converge to a feasible region. Owing to the duality between the dual problem and the original problem, the immediate power allocation algorithm can converge to the optimal solution.
Using the immediate power allocation algorithm, we can realize the DP power allocation algorithm, which considers the throughput optimization problem over the whole K time slots. We store the system state $S_k$, the energy budget level $B_k^m$, the immediate reward function $R_k$, and the reward function $V_k$ in a look-up table indexed by the time slot, which contains all the possible situations. Therefore, each SU can determine its optimal power policy from this table, which greatly reduces the computational complexity.

Simulation results
In this section, we present some simulation results to evaluate our proposed algorithm by comparing it with two policies. The first policy is the conservative power policy, which allocates half of the available energy to each time slot for power allocation, i.e., $p_k^m = B_k^m/(2T)$, $\forall k \in \mathcal{K}$. The second policy is the greedy power policy, which uses the whole available energy for power allocation at each time slot, i.e., $p_k^m = B_k^m/T$, $\forall k \in \mathcal{K}$. In our simulations, we assume that there are four SUs, i.e., M = 4, one PU, and eight subcarriers, i.e., N = 8. In addition, the PU occupies two subcarriers in each time slot. As a result, the number of subcarrier occupation states is twenty-eight, i.e., L = 28. We assume that the total transmission power of the primary user is constant and is allocated equally among the occupied subcarriers. The throughput performance is compared over a total bandwidth of B = 1 MHz. The channel experiences frequency-selective fading, modeled by a six-ray Rayleigh model with an exponential profile and a maximal multipath delay of 5 μs [8]. Moreover, we set the maximum battery capacity of each SU to 5 J, i.e., $B_{\max}$ = 5 J. Thus, the total energy budget of the secondary system is 20 J. We set T = 1 s and discretize the energy budget and the transmission power with resolutions of 0.5 J and 0.5 W, respectively. The minimum throughput of each SU, i.e., $R_{\min}^m$, is a positive constant depending on the energy budget at the current time slot. We set $R_{\min}^1 > R_{\min}^2 > R_{\min}^3 > R_{\min}^4$, which means that the priority order of the SUs is SU 1, SU 2, SU 3, and SU 4. The interference temperature (IT) level at the PU receiver is $I_{th}$ = 0.1 W. The simulation results are presented in Figs. 2, 3, 4, 5, 6, and 7.
Figure 2 shows the convergence of the average throughput of each of the four SUs under the proposed algorithm. In this simulation, we set the energy budget of each SU to 5 J. From Fig. 2, we can clearly see that this scheme quickly converges to the equilibrium point.
Fig. 2 also shows that the average throughput of SU 1 is the highest of all SUs and that the average throughput of SU 4 is the lowest. The reason is that, according to the priority of SUs and the subcarrier allocation model, the SU with higher priority can transmit data under good channel conditions. However, SU 4 must deal with the interference power constraint to ensure the QoS of PU.
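The two baseline policies used in the comparisons are simple enough to state in code. The Python sketch below restates their power rules and adds a toy single-link evaluation loop; `run_policy`, its unit gain-to-noise default, and the fixed harvest sequence are assumptions for illustration and do not reproduce the simulation setup.

```python
import math


def conservative_power(battery, slot_duration):
    """Spend half of the available energy in the current slot: p = B / (2T)."""
    return battery / (2.0 * slot_duration)


def greedy_power(battery, slot_duration):
    """Spend all of the available energy in the current slot: p = B / T."""
    return battery / slot_duration


def run_policy(policy, battery, harvests, slot_duration, b_max,
               gain_over_noise=1.0):
    """Total throughput (bit/s/Hz) of a policy on one interference-free link."""
    total = 0.0
    for harvested in harvests:
        p = policy(battery, slot_duration)
        total += math.log2(1.0 + gain_over_noise * p)
        # Same battery dynamics as in the energy-harvesting model.
        battery = min(battery - slot_duration * p + harvested, b_max)
    return total


print(conservative_power(5.0, 1.0), greedy_power(5.0, 1.0))
```

Neither rule looks at the channel state or at the expected future harvest, which is the main difference from the DP-based policy.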
In Fig. 3, we compare the average throughput of our proposed power allocation policy, the conservative power policy, and the greedy power policy. The figure plots the average throughput of the secondary system (bit/s/Hz) against the total energy budget (J). From this figure, we see that the average throughput of our proposed policy is much higher than those of the other two policies over the examined range of total energy budget, since our policy considers not only the immediate OPA but also the transmission performance over all time slots. In addition, the average throughput of all three policies increases with the total energy budget, since a larger energy budget means that more energy is available for power allocation. However, the throughput increases much more rapidly under our policy.
Figure 4 compares the total throughput of the three policies as the number of time slots varies. In this case, we also set the energy budget of each SU to 5 J. The throughput of our policy increases significantly with the number of time slots. Moreover, our policy has the best performance among the three policies over the whole range of the number of time slots. From this simulation result, we can conclude that our policy can guarantee optimal performance for long-run operations.
Figure 5 illustrates the comparison of the interference power at the PU receiver for the three policies. We set the IT level as $I_{th}$ = 0.1 W. Figure 5a shows the convergence of the interference power at an arbitrary time slot. It is clear that the interference of all three algorithms quickly converges to a stable point and that the three policies keep the interference power at the PU receiver below the IT level. Figure 5b presents the average interference of the three policies as the total energy budget varies, for K = 5.
We find that, as the total energy budget increases, the three average interference levels increase gradually and always remain below the IT level, even if the energy budget reaches the battery capacity. From Fig. 5a, b, we can see that the interference from our policy is slightly higher than those of the other two policies, by about 0.01 W to 0.02 W, while still satisfying the interference power constraint. Moreover, from Figs. 3, 4, and 5, we conclude that the proposed policy provides better throughput performance for the secondary system at the cost of slightly more interference at the PU receiver, within the tolerance of PU.
To show the impact of different IT levels on the system performance, Fig. 6 presents the average throughput as the IT level varies, for different energy budgets. From Fig. 6, we find that the average throughput first increases and then tends to a constant value as the IT level increases. The reason for this phenomenon is that the higher the IT level, the more interference power can be tolerated by PU. Thus, SU can be allocated more transmission power and achieve more throughput. Meanwhile, due to the constraint of the energy budget in our policy, the transmission power eventually tends to a constant value. In addition, the interference power constraint can be interpreted in terms of distance: as the distance between SU and PU increases, more transmit power can be allocated to achieve higher throughput.
Finally, in Fig. 7, we present the average throughput of our proposed algorithm as the number of time slots varies, for different IT levels. In this simulation, we set the energy budget of each SU to 0.5 J. From Fig. 7, we find that a higher IT level yields a higher total throughput, because if PU can tolerate higher interference power, SUs can be allocated higher transmission power. From another perspective, the overall trend of the throughput goes up as the number of time slots increases. However, the throughputs at K = 6 and K = 9 decrease slightly, since the channel gain, the subcarrier occupation state, and the harvested energy state are random at each time slot. According to Figs. 6 and 7, we conclude that it is necessary to choose an appropriate IT level to balance the PU protection and the secondary system performance.

Conclusions
In this paper, we study the OPA problem in the EH CRN and propose an OPA policy to maximize the average throughput of the secondary system within K time slots, where the maximum transmission power constraint, the interference power constraint, and the minimum throughput constraint are considered. The optimal transmission power in each time slot is obtained by jointly using the immediate OPA algorithm and the DP method. The simulation results show that, compared with the conservative power policy and the greedy power policy without the DP approach, our policy achieves better throughput performance as the total energy budget and the number of time slots vary. Meanwhile, our algorithm provides good protection for the basic communication of PU through the introduction of the interference power constraint. However, this policy improves the performance of the secondary system at the expense of slightly more interference compared with the other two policies.