Combined Rate and Power Allocation with Link Scheduling in Wireless Data Packet Relay Networks with Fading Channels

We consider a joint rate and power control problem in a wireless data traffic relay network with fading channels. The optimization problem is formulated in terms of power and rate selection, and link transmission scheduling. The objective is to seek high aggregate utility of the relay node when taking into account buffer load management and power constraints. The optimal solution for a single transmitting source is computed by a two-layer dynamic programming algorithm which leads to optimal power, rate, and transmission time allocation at the wireless links. We further consider an optimal power allocation problem for multiple transmitting sources in the same framework. Performances of the resource allocation algorithms including the effect of buffer load control are illustrated via extensive simulation studies.


INTRODUCTION
Recently there has been a growing research interest in traffic relay in wireless networks [1][2][3][4][5][6][7]. Relaying is regarded as a promising means for supporting high data rate transmission in 4G systems, where users may be separated from the base station or an access point in a wireless local area network (WLAN) by a long distance. The implementation of multihop relaying can lead to accommodating more high data rate users, efficient interference control, and significant power savings via economical amplifier design. In addition, simultaneous transmission from the base station and the relay node may achieve capacity gains through cooperative diversity. See [6] for a summary on relay-based deployment ideas for wireless and mobile broadband radio. Among recently published works, traffic relay has been considered for cellular networks in [8,9], and for wireless data packet networks in [2].
In a practical relay deployment scenario, one naturally encounters random fluctuation of the channel gain along each involved link, which impairs the transmission of signals. Power control is effective for dealing with fading by maintaining an acceptable power level at the receiver end by responding to channel variations. On the other hand, in systems facilitating variable rate transmission, rate control is also useful in reducing the probability of error. The reader is referred to [10,11] on power control, [12,13] on rate control, and [14,15] on joint rate and power control. Notably, under dynamic channel conditions, dynamic programming techniques have provided useful tools for system performance optimization in the context of either rate or power control [12,16]. Specifically, in [2], the authors analyzed an optimal power control algorithm by using stochastic dynamic programming techniques for a two-hop relay problem where the source and relay each contains a buffer.
In this paper, we consider joint rate and power control in a wireless data packet relay model. Such relay-based packet data transmission systems can be useful in almost all wireless data networks: cellular networks, WLANs, mobile multihop ad hoc networks, or even emerging hybrid networks that combine different components to provide seamless integrated service for transmitting and receiving data at high rates over the wireless channel. In this setup, packets at the source nodes (SN) need to reach a destination node (DN) via a relay node (RN). Hence there are two sets of wireless channels connecting the sources and destination, with the relay node being located at an intermediate location; see Figure 1. For either a single or multiple sources, however, we restrict to a single destination, which is typical for modeling the access point to a wired infrastructure which receives data traffic from different users. For practical implementation, the significance of one relay node lies in the fact that it simplifies the routing task, avoids the formation of bottleneck links, and increases network reliability [17].
In our relay model, we assume that (i) at the wireless links, data packets are sent using a spread spectrum scheme, and furthermore, (ii) it is not allowed for the relay node to receive and transmit packets simultaneously (half-duplex model). The second assumption is made because, at the relay node of the network, the receiver and the transmitter are installed at the same unit and, if active simultaneously, will produce self-interference which is significantly more serious than the near-far effect in a code-division multiple-access (CDMA) model. This assumption is useful for interference management in a wireless data network which requires low bit error rate (BER) under much poorer channel quality compared to wired networks. Node transmission assumptions similar to (ii) can also be found in [2,18]. Notice that assumption (ii) naturally leads to the issue of transmission link scheduling and its associated optimization.
Indeed, under the above assumptions, we can essentially implement a joint CDMA/time-division multiple-access (TDMA) protocol, where the TDMA component is used to allocate the transmission time of the wireless links connecting the relay node. The CDMA component allows multiple sources to transmit simultaneously, where the receivers can be equipped with multiuser detection capability. For the joint rate and power control analysis, we will concentrate on the single user case although the optimization of the multiple source case can be formulated in a straightforward manner. This leads to useful notational simplifications in the underlying optimization problem, which is very rich in structure. The solution to this problem provides us with interesting insights into network resource allocation problems. We study the multisource case from the perspective of power control only, as considering variable rate CDMA transmission from multiple users in the context of relaying is beyond the scope of this paper. We also assume that all necessary resource allocation computations (for link scheduling, power and rate allocation) are carried out in a centralized manner. For the particular problem considered in this paper with one or more sources, relay and destination, the centralized entity carrying out these computations can be the destination. Note that this implies that the destination needs to have all channel information regarding the source-relay and relay-destination links available to itself in a dynamic manner, that is, this information is collected at the same time scale as the channel changes. Clearly, this requires additional communication overhead such as sending pilot tones to the relay and receiving additional information from the relay regarding the source-relay link. While these computations can be distributed at the source and the relay based on their locally available information (perhaps resulting in loss of optimality), in this paper we do not investigate such distributed resource allocation algorithms. A detailed investigation of channel estimation-related communication overhead issues is also beyond the scope of the current paper.
The main contributions of this paper are summarized as follows.
(i) A unified framework for power, rate control, and link scheduling with fading channels is proposed.
(ii) A two-layer dynamic programming scheme for link scheduling and rate/power selection is provided.
(iii) Algorithms for relay utility optimization and dynamic buffer load control are proposed, which lead to simple threshold rules for link scheduling according to the buffer level conditioned on channel quality.
Numerical studies are presented to illustrate the performance of all algorithms.
The rest of the paper is organized as follows. In Section 2 we state the channel model and variable rate packet transmission. Section 3 presents the model for transmission dynamics in terms of a finite state Markov chain. The system state transition resulting from channel variations and multiple retransmissions is described in Section 4, and then in Section 5, the performance measure is introduced which involves the objective of relay node utility, buffer management, and power savings. The dynamic programming equation is analyzed in Section 6. The role of buffer load control is analyzed in Section 7. Numerical examples are presented in Section 8 for optimal rate and power control. Section 9 illustrates power control with multiple sources. Some concluding remarks are included in Section 10.

SYSTEM MODEL
In this section, we consider the case of a single transmitting source node. Let $x(t)$ and $y(t)$ denote, respectively, the channel link gain between the source and relay, and that between the relay and destination, where $t$ takes values from a set of discrete times. We will term the wireless channels associated with $x(t)$ and $y(t)$ as the incoming and outgoing links, respectively. Transmission takes place across a channel if and only if the channel is active.
We model $x(t)$ and $y(t)$ by two independent finite state Markov chains with state spaces $S_x = \{a_1, \ldots, a_n\}$ and $S_y = \{b_1, \ldots, b_m\}$, which describe the random fluctuation of the channel gain. Note that the individual channel gains can be temporally correlated due to their Markovian property. For packet transmission, let us consider the incoming link. The transmission for the other link is formulated similarly. A packet transmitted by the source, if received correctly at the relay node, results in an acknowledgment (ACK) which is immediately sent by a feedback channel from the relay to the source; consequently, the source deletes that packet and continues with the transmission of the next one if its channel (i.e., the incoming link) is still in an active state. We assume that the feedback channel is error-free and does not interfere with data transmissions.
In the case of a packet loss (or a corrupted packet), the source will receive a negative acknowledgment (NACK) from the relay node, and it needs to go through multiple retransmissions until the packet is received successfully or until a maximum number of $M$ trials is reached, whichever happens earlier. See [19,20] for similar retransmission schemes. If a maximum number of retransmissions is reached without a packet being successfully received, the packet will be deleted and the source will turn to the next packet. We use the same maximum retransmission number $M$ for both the source and the relay.

System parameter specifications
The channel state is updated with a period of $T > 0$, and we specify the two channel gains by the discrete time Markov chains $x(kT)$ and $y(kT)$, $k = 0, 1, 2, \ldots$. Both $x(t)$ and $y(t)$, $t = kT$, are homogeneous with one-step transition matrices $P_x$ and $P_y$, respectively. During the period $[kT, (k+1)T)$, $k \ge 0$, the channel state remains constant until a possible jump at $(k+1)T$, and moreover, the transmitting node can choose a different packet rate $R_p$ (packets/second) for that interval; however, under our direct sequence spread spectrum (DSSS) scheme the chip rate for both links is assumed to be the same fixed constant $R_c$. Hereafter, we refer to $[kT, (k+1)T)$ as a transmission cycle, or simply a cycle, on which a packet rate is selected at $kT$. Obviously, with the given constant chip rate, the packet rate $R_p$ may be equivalently translated into a corresponding processing gain $G_p$ in order to maintain the constant chip rate. This is the so-called variable spreading gain technique [21]. We assume a constant packet size of $L$ bits. Then $R_c = R_p L G_p$, and a cycle contains $R_c T$ chips. In our subsequent analysis, the term "packet rate" refers to $R_p$ and the term "scaled rate" (or simply "rate") refers to the number of packets transmitted per cycle of duration $T$, given by $R = R_p T$.
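As a minimal numerical sketch of the relation $R_c = R_p L G_p$, the following computes the processing gain, the scaled rate, and the number of chips per cycle for a few packet rates. The function name and all numeric values (chip rate, packet size, cycle length) are illustrative assumptions and are not taken from the paper.

```python
def spreading_parameters(chip_rate, packet_rate, packet_bits, cycle_T):
    """Return (processing gain G_p, scaled rate R, chips per cycle).

    Uses R_c = R_p * L * G_p and R = R_p * T from the text.
    """
    gain = chip_rate / (packet_rate * packet_bits)   # G_p = R_c / (R_p * L)
    scaled_rate = packet_rate * cycle_T              # R = R_p * T (packets/cycle)
    chips_per_cycle = chip_rate * cycle_T            # R_c * T
    return gain, scaled_rate, chips_per_cycle


# Hypothetical parameter values, chosen only for illustration.
R_C = 1.024e6      # chip rate (chips/second)
L_BITS = 1000      # packet size (bits)
T_CYCLE = 0.01     # cycle length (seconds)

for packet_rate in (100.0, 200.0, 400.0):
    gain, scaled_rate, chips = spreading_parameters(R_C, packet_rate, L_BITS, T_CYCLE)
    print(f"R_p={packet_rate:6.1f}  G_p={gain:6.2f}  R={scaled_rate:4.1f}  chips/cycle={chips:.0f}")
```

Doubling the packet rate halves the processing gain while the number of chips per cycle stays fixed, which is the variable spreading gain trade-off described above.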

SYSTEM DYNAMICS FOR TRANSMISSION
In this section, we describe the packet transmission mechanism. We assume that the source buffer is always nonempty and that the relay buffer is sufficiently large such that the issue of buffer overflow may be neglected. The power control problem amounts to selecting the power level of individual packets in a transmission cycle during which the channel state does not change. The number of packets transmitted during a cycle (of duration $T$) is given by $R_p T$, which is integer-valued.

The bit-energy-to-interference ratio with a single source
We use the terminology "bit-energy-to-interference ratio" even though we are only analyzing a single user case. This is done with the intention that we can use the same terminology when multiple users are concerned. For the incoming link, at time $t$ we denote the power by $p_x(t)$ and the packet rate by $R_x(t)$. The background noise intensity at the relay receiver is $\eta_x > 0$. So the bit-energy-to-interference ratio ($E_b/I$) can be denoted as
$$e_x(t) = \frac{c_1\, x(t)\, p_x(t)}{R_x(t)}, \qquad (1)$$
where $c_1 = R_c/(L\eta_x)$. Similarly, for the outgoing link we introduce the bit-energy-to-interference ratio
$$e_y(t) = \frac{c_2\, y(t)\, p_y(t)}{R_y(t)}, \qquad (2)$$
where $c_2 = R_c/(L\eta_y)$ and $\eta_y > 0$ is the background noise intensity observed by the receiver at the destination.
For both links, we use the same function $P_s(r)$ to denote the success probability of a packet transmission when the bit-energy-to-interference ratio is $r \ge 0$. In practical systems, such a probability depends on the specific detection scheme at the receiver, and whether coding as well as packet combining is employed [20].
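The following minimal sketch evaluates a bit-energy-to-interference ratio and a packet success probability. It assumes the ratio takes the processing-gain form $e = c \cdot g \cdot p / R_p$ with $c = R_c/(L\eta)$, where $g$ is the channel gain. The function `success_prob` is a hypothetical placeholder for $P_s$ (a rational fraction with exponent 4), since the actual form depends on the receiver. All names and numeric values are illustrative assumptions.

```python
def eb_over_i(channel_gain, power, packet_rate, chip_rate, packet_bits, noise):
    """Bit-energy-to-interference ratio for a single user (assumed form)."""
    c = chip_rate / (packet_bits * noise)        # c_1 or c_2 in the text
    return c * channel_gain * power / packet_rate


def success_prob(r, exponent=4):
    """Hypothetical stand-in for P_s(r): increasing in r, 0 at r=0, tends to 1."""
    return r ** exponent / (1.0 + r ** exponent)


# Illustrative values only (not from the paper).
for packet_rate in (100.0, 200.0, 400.0):
    e = eb_over_i(channel_gain=1e-3, power=0.2, packet_rate=packet_rate,
                  chip_rate=1.024e6, packet_bits=1000, noise=1.0)
    print(f"R_p={packet_rate:6.1f}  E_b/I={e:.4f}  P_s={success_prob(e):.4f}")
```

Under these assumptions a higher packet rate lowers the ratio, and hence the success probability, for the same transmit power.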

A Markov chain model for retransmissions
We introduce the integer-valued random process $I_x$ (resp., $I_y$) for the incoming (resp., outgoing) link to index the number of trials of the current transmission. We call $I_x$ and $I_y$ the label processes with state space $S = \{1, 2, \ldots, M\}$ where $M$ is the maximum retransmission number.
We introduce the variable $a$ taking values in $\{1, 2\}$, where $a = 1$ and $a = 2$ mean, respectively, the incoming and outgoing links being active. We call $a$ the scheduling variable or simply the scheduler. Notice that under the operating assumption, the value of $a$ is chosen at $kT$ and it remains constant over $[kT, (k+1)T)$ until it is updated at $(k+1)T$.
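For the incoming link, suppose a scaled rate of $R = R_p T$ packets is selected at $kT$ for the cycle $[kT, (k+1)T)$. Denote
$$\Delta_i^R = \frac{iT}{R}, \qquad (3)$$
where $0 \le i \le R$. Consider the transmission of a packet on the subinterval $[kT + \Delta_i^R, kT + \Delta_{i+1}^R)$ with an associated bit-energy-to-interference ratio $e_x(kT + \Delta_i^R)$. We define the conditional probability
$$P\big(I_x(kT + \Delta_{i+1}^R) = l + 1 \mid I_x(kT + \Delta_i^R) = l,\ a = 1\big) = 1 - P_s\big(e_x(kT + \Delta_i^R)\big), \quad 1 \le l \le M - 1, \qquad (4)$$
where we recall that $a = 1$ means that the incoming link is active. The above gives the probability of transmitting the same packet at the next time instant resulting from a packet loss. Due to the maximal trial number constraint, we have
$$P\big(I_x(kT + \Delta_{i+1}^R) = 1 \mid I_x(kT + \Delta_i^R) = M,\ a = 1\big) = 1, \qquad (5)$$
which means that the channel must transmit a new packet no matter what the outcome of the previous transmission is, provided that the link continues to be active. We also set
$$P\big(I_x(kT + \Delta_{i+1}^R) = l \mid I_x(kT + \Delta_i^R) = l,\ a \ne 1\big) = 1, \qquad (6)$$
where $a \ne 1$ indicates that the link is inactive. In this case, we necessarily have $e_x(kT + \Delta_i^R) = 0$ since the power becomes zero. The interpretation is obvious: if that link is not active, the label process should be frozen.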
The transition of $I_x$ (and also $I_y$) is illustrated by the directed graph in Figure 2 where the probability $p = 1 - P_s(e_x)$. $I_x$ is incremented by 1 if $I_x < M$ and if there is a packet loss. In the case of a transmission success or when the maximum trial number has been reached, $I_x$ will transit to 1.
The analysis for $I_y$ is similar and will not be repeated here. However, if $I_y$ is introduced into the system state specification, there must be at least one packet in the buffer; otherwise, the index $I_y$ is automatically ignored.
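The label process dynamics described above can be sketched as a one-step transition matrix. The sketch assumes a constant loss probability `p_loss` for the subinterval; the function name and the list-of-lists representation are illustrative choices.

```python
def label_transition_matrix(M, p_loss, active=True):
    """One-step transition matrix of the label process on states 1..M.

    Row/column index l-1 corresponds to label l. When the link is
    inactive the label is frozen (identity matrix).
    """
    P = [[0.0] * M for _ in range(M)]
    for l in range(M):
        if not active:
            P[l][l] = 1.0                 # frozen label
        elif l == M - 1:
            P[l][0] = 1.0                 # label M: next packet regardless of outcome
        else:
            P[l][l + 1] = p_loss          # packet loss: retransmit, label + 1
            P[l][0] += 1.0 - p_loss       # success: new packet, label 1
    return P


for row in label_transition_matrix(M=3, p_loss=0.2):
    print(row)
```

Each row sums to one, and the last row always returns to label 1, reflecting the maximum trial number constraint.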
We note that in a data packet network, a packet discard is a rare event. However, it plays an important role in affecting the quality of service [19]. Now we examine the mechanism for a packet discard event in the outgoing link. We use $D_t$ with $t = kT + \Delta_i^R$ to denote a packet discard event for the outgoing link on the time interval $[kT + \Delta_i^R, kT + \Delta_{i+1}^R)$. Then a packet discard occurs on that interval if and only if $I_y(t) = M$ and a packet loss results at $kT + \Delta_{i+1}^R$. By use of Bayes' rule, we have
$$P(D_t) = P\big(I_y(t) = M\big)\,\big[1 - P_s(e_y(t))\big].$$
For a relevant analysis on packet discard rates, see [19]. It is shown that by increasing the number $M$, the packet discard rate can be effectively reduced at a modest expense of increased transmission delay. When $M$ continues to increase towards a high value, the resulting additional delay will rapidly saturate.
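To illustrate the trade-off between discard rate and delay, the following minimal sketch assumes a constant packet loss probability $q$ on every trial. This is a simplification: in the model above the loss probability may vary with the channel state and the selected power. Under this assumption a packet is discarded with probability $q^M$, and the expected number of trials per packet is $\sum_{k=0}^{M-1} q^k$, which saturates at $1/(1-q)$ as $M$ grows. The function names are illustrative.

```python
def discard_probability(q, M):
    """Probability that all M trials of a packet fail (constant loss prob q)."""
    return q ** M


def expected_trials(q, M):
    """Expected number of trials per packet: sum_{k=0}^{M-1} q^k."""
    return sum(q ** k for k in range(M))


q = 0.1  # assumed per-trial loss probability
for M in (1, 2, 3, 5, 10):
    print(f"M={M:2d}  discard={discard_probability(q, M):.2e}  "
          f"trials={expected_trials(q, M):.4f}")
```

The discard probability falls geometrically in $M$ while the expected number of trials quickly levels off, consistent with the saturation of the additional delay noted above.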

SYSTEM STATE TRANSITION IN A CYCLE
Once a link is activated, the system state may be described using a finite state transition model involving only the active link. Since for the two label processes, only $I_y$ will be involved in the optimization formulation as it affects the buffer state directly, below we give the details when the outgoing link is active. The case for the incoming link is only briefly sketched.

The outgoing link
We denote the channel state by $y \in S_y$, the labelling parameter $I_y$ by $l \in S = \{1, 2, \ldots, M\}$, and the relay buffer state $z$ by $i$. Here we require $i \ge 1$. For the cycle $[kT, (k+1)T)$, let $R = R_p T$ be the selected rate and consider the transmission on $[kT + \Delta_j^R, kT + \Delta_{j+1}^R)$, $0 \le j \le R - 1$. The state transition falls into one of the following three cases.
Case 1. Packet loss without discard:
$$(kT + \Delta_j^R,\ y,\ l,\ i) \to (kT + \Delta_{j+1}^R,\ y',\ l + 1,\ i),$$
where the first entry in the quadruple is time, $1 \le l \le M - 1$ and $i \ge 1$. We have $y' = y$ if $0 \le j \le R - 2$, and if $j = R - 1$, $y'$ can take a different value in $S_y$ if the channel gain has a jump. The same rule is applicable to all the following scenarios for the relation between $y$ and $y'$.
Case 2. Transmission success:
$$(kT + \Delta_j^R,\ y,\ l,\ i) \to (kT + \Delta_{j+1}^R,\ y',\ 1,\ i - 1),$$
where $1 \le l \le M$ and $i \ge 1$.
Case 3. Packet discard:
$$(kT + \Delta_j^R,\ y,\ M,\ i) \to (kT + \Delta_{j+1}^R,\ y',\ 1,\ i - 1),$$
where $i \ge 1$. Following a transmission failure, that packet is deleted and the system turns to the next packet which is labelled by 1.
We note that for both Cases 2 and 3, if $i = 1$, then the label process $I_y$ automatically vanishes at $kT + \Delta_{j+1}^R$, and it will be recreated only when a new packet enters the buffer. For the state transitions specified in the above three cases, the associated transition probability can be easily computed. For example, let us consider Case 1 for the outgoing link with $j \le R - 2$. Then we have $y' = y$ and the transition probability is $1 - P_s(e_y)$ where $e_y$ is easily determined by use of $y$, $R$, and the power on the interval $[kT + \Delta_j^R, kT + \Delta_{j+1}^R)$. If $j = R - 1$, we have the transition probability $P_y(y, y')[1 - P_s(e_y)]$ with its corresponding $e_y$, where $P_y$ is the one-step transition matrix for the channel state at the outgoing link and $y' \in S_y$.

The incoming link
We denote the channel state by $x$, the labelling parameter $I_x$ by $l$, and the buffer state by $i \ge 0$. For the cycle $[kT, (k+1)T)$, assume $R = R_p T$ is selected.
The analysis of the state transition is very similar to that of the outgoing link. The only notable difference is that after a transmission success, the buffer state will increase by 1; specifically, we have the following transition:
$$(kT + \Delta_j^R,\ x,\ l,\ i) \to (kT + \Delta_{j+1}^R,\ x',\ 1,\ i + 1),$$
where $1 \le l \le M$. We omit the details of the state transition for the other cases.

The partial idle period case
We need to consider a particular situation for the outgoing link. Assume $R > 1$ for the cycle $[kT, (k+1)T)$, and the buffer state decreases from a positive number to zero before the cycle ends. For such a scenario we stipulate that the transmission time is still reserved for the outgoing link and the incoming link can only be activated at $t = (k+1)T$. Then the system state transition can be easily determined by updating $y$ at $(k+1)T$, and the label index $I_y$ temporarily disappears.
Although this rule seemingly wastes part of the available transmission time, in reality this does not constitute a drawback. First, by choosing $kT$, $k = 0, 1, 2, \ldots$, as the activating time, we may reduce the implementation complexity. Second, for an optimized control policy, if the only choice is to activate the outgoing link when there is only a small number of buffered packets, the system will tend to minimize (if it cannot avoid) the idle time by using a small packet rate, which increases the effectiveness of each transmission and also the energy efficiency.

PERFORMANCE MEASURE
We begin by specifying a one-stage cost for the cycle $[kT, (k+1)T)$, $k = 0, 1, 2, \ldots$. Such an interval is used to describe the operation of the active link which can be either the incoming or the outgoing link. For notational convenience, we will optimize with respect to the scaled rate $R$ (packets/cycle) rather than $R_p$ (packets/second). Following the notation in (3), we divide the cycle into $R$ subintervals $[kT + \Delta_i^R, kT + \Delta_{i+1}^R)$, $0 \le i \le R - 1$. Depending on which link is active, we may have a positive constant power level, denoted as $p_x(kT + \Delta_i^R)$ or $p_y(kT + \Delta_i^R)$, on each subinterval. Following the success of a transmission at the incoming (outgoing, resp.) link, the buffer state will increase (decrease, resp.) by one, and in the event of a packet loss, the buffer state will remain the same unless a packet discard forces a decrease by one. Corresponding to $[kT, (k+1)T)$, we introduce the cost (12), where $x$ and $y$ denote the channel states at $t = kT$. The values of $I_y$ and $z$ at $kT$ are $l_y$ and $j$, respectively. The scheduler $a$ determines which set of powers is positive, and the constant $\lambda > 0$ is the coefficient for power penalty. $h$ is the reward rate for sending a packet into the relay buffer. The power is not explicitly indicated inside $J_c$. $J_c(kT, R, a, x, y, l_y, j)$ will be called the cycle cost on $[kT, (k+1)T)$. $I_x$ has no impact on the evolution of the buffer state. Hence, $J_c$ is independent of $I_x$, which is a useful feature for reducing the size of the state space in further numerical solutions.
In $J_c$, the first two terms in the summand indicate that if there is a change of buffer level in two successive time instances, that is, a packet is successfully transported into or out of the buffer, then a negative penalty (hence a reward) should be imposed on the system. Note that conditioned on $\{I_y = M\}$, the buffer state will necessarily decrease by one following one transmission; however, we only reward the favorable outcome when the packet is successfully transmitted. Such terms effectively capture the aggregate utility of the relay node in either receiving or forwarding traffic. However, in the calculation, there is an asymmetry for the one-stage reward in moving a packet into or out of the buffer. Such an asymmetry in the reward rate as adjusted by the weight function $h(z)$ is useful for buffer management. In fact, we can choose $h(z)$ as a monotonically decreasing function defined on the set of nonnegative integers. Then the marginal benefit in receiving packets will decrease when the buffer level $z$ is large and hence the priority of activating the incoming link will be lowered under such circumstances. Without buffer load control, under very general conditions, there may be an unbounded accumulation of packets in the buffer, and we will address this issue separately in Section 7.
We decompose $J_c$ into the form
$$J_c = J_c^{(1)} + J_c^{(2)},$$
where the two components are specified by (14) and (15), and the right-hand side of (14) or (15) simply reduces to zero if the corresponding link is inactive. Here $J_c^{(m)}$, $m = 1, 2$, is naturally understood as the cost incurred by the individual links.
Now we introduce the infinite horizon discounted cost function to be employed for the joint rate and power allocation:
$$J(R^\infty, a^\infty, x, y, l_y, j) = E \sum_{k=0}^{\infty} \rho^k J_c(kT, R, a, x, y, l_y, j), \qquad (16)$$
where we again omit the power entries, and inside the sum $(x, y, l_y, j)$ (denoting the set of values at time $kT$) is determined by the sample path of the channel states, label process $I_y$, and the buffer state. The parameter $\rho \in (0, 1)$ is the discount factor. $R^\infty$ and $a^\infty$ denote the sequences of rate allocation and scheduling actions. On the left-hand side, $(x, y, l_y, j)$ gives the values of channel states, label index $I_y$, and buffer state $z$ at time $t = 0$.
The optimal control problem amounts to finding a scheduling rule and associated rate/power allocation such that the cost $J$ is minimized. For notational brevity, in further analysis we may drop the time index $kT$ in $J_c$, $J_c^{(1)}$ or $J_c^{(2)}$ without causing confusion.
Remark 1. It should be noted that due to the half-duplex nature of the relay transmission scheme, only one link is active at a given time. Therefore the performance function only rewards the success of the individual links at any given cycle of duration $T$. This is captured by the individual link costs $J_c^{(1)}$ and $J_c^{(2)}$ given by (14) and (15), respectively. When an individual link is not active, the corresponding cost is zero. However, notice that the expected cost (defined by (16)) represents an infinite horizon expected discounted cost where both links are rewarded for successful transmissions in the long term.

A TWO-LAYER DYNAMIC PROGRAMMING EQUATION
In this optimal control framework, the control may be represented as a composite vector including the scaled rate R, the link scheduler a, and the power levels for the active link.
For both links, we assume the rate $R$ and power $p$ are selected from two finite sets $\mathcal{R} = \{R_1, R_2, \ldots, R_{N_1}\}$ and $\mathcal{P} = \{p_1, p_2, \ldots, p_{N_2}\}$, respectively. At time $t = 0$, if the system state is $(x, y, l_y, i)$, representing the two channel states, the label parameter $I_y$, and the buffer state $z$ in sequence, we write the optimal cost $v(x, y, l_y, i) = \inf_{R, p, a} J(R^\infty, a^\infty, x, y, l_y, i)$. $v$ is also called the value function of the underlying optimal control problem. Here the infimum is taken over all admissible controls using the available (channel and buffer) information, and the rate and power are then assigned to the active link.
The dynamic programming principle gives
$$v(x, y, l, i) = \min\Big\{ \min_{R,\, p_x} \big[ J_c^{(1)}(x, y, l, i) + \rho E v(x', y', l, i') \big],\ \min_{R,\, p_y} \big[ J_c^{(2)}(x, y, l, i) + \rho E v(x', y', l', i') \big] \Big\}, \qquad (17)$$
where we use $l$ or $l'$ to denote a value of $I_y$. The second term at the right-hand side of (17) is defined only for buffer level $i \ge 1$. We term (17) the intercycle dynamic programming equation, which determines which link should be active if both internal terms were known by some means. We give some interpretation for the two components at the right-hand side of (17). We consider the first component. When the scheduling action $a = 1$ is employed at the initial time $t = 0$, the label $I_y = l$ will remain at the same value on $[0, T)$, but all other quantities will change to new values $(x', y', i')$ at $t = T$. Hence in the expectation term, we have the set of entries $(x', y', l, i')$ within the value function. The leading term $J_c^{(1)}(x, y, l, i)$ corresponds to the cost on the interval $[0, T)$, and the term $\rho E v(x', y', l, i')$ is the discounted optimal cost-to-go from $T$ to $\infty$. The second component at the right-hand side of (17) is interpreted analogously. However, when $a = 2$, the index $l$ will transit to a new value $l'$ at $t = T$.

The intracycle dynamic programming
Notice that in (17) we need to carry out an internal minimization step which is used for rate selection and power allocation for the subintervals within a cycle at the active link.This internal minimization leads to an independent application of the dynamic programming principle.
For given $R$, we have the Bellman equation (18) for the intracycle value function, where $m = 1, 2$, $0 \le j \le R - 1$, and $x'$, $y'$, $l'$, $i'$ denote the two channel states, $I_y$, and the buffer state $z$ at time $kT + \Delta_{j+1}^R$, respectively. The cases $m = 1, 2$ correspond to the activation of the links by $a = 1, 2$, respectively. Since the outgoing link transmits only when there is at least one packet in the relay buffer, $v^{(2)}$ is defined for $i \ge 1$. For the case $m = 1$, we have $l' = l$ in (18). The variable $p$ in (18) stands for $p_x$ for $m = 1$, and $p_y$ for $m = 2$. The terminal condition for (18) is imposed at the end of the cycle, and the state transition within a cycle associated with (18) is determined in Section 4. Let $v^{(m)}(x, y, l, i)$, $m = 1, 2$, denote the resulting minimal cost when link $m$ is activated. Combining the intracycle and intercycle dynamic programming equations, we get
$$v(x, y, l, i) = \min\big\{ v^{(1)}(x, y, l, i),\ v^{(2)}(x, y, l, i) \big\},$$
where $v^{(2)}$ is defined only for $i \ge 1$. Finally, the optimal scheduler $a^*$ and rate $R^*$ for the system state $(x, y, l, i)$ are given by the link and the rate attaining this minimum, and $a^*$ should be set to 1 for $i = 0$. Once $a^*$ and $R^*$ are computed for a transmission cycle, the optimal power is easily determined using (18) by substituting $m = a^*$ and $R = R^*$.
Now a comment on the time scale of the implementation of optimal scheduling, rate, and power allocation is in order. Note that the channel state changes at the end of each cycle, which is the time scale for link scheduling and rate selection. In each cycle, more than one packet can be transmitted with different power levels depending on the channel quality (and buffer level). Therefore, power control is done on a faster time scale.
Computational complexity: the dynamic programming approach for optimal control problems (including control of Markov decision processes (MDP)) suffers from the "curse of dimensionality" in general. The application considered in this paper to optimal resource allocation in wireless relay networks is no exception. Indeed the computational complexity of the proposed algorithm increases exponentially with the number of users. However, by using an infinite time horizon discounted performance measure in this paper (which is reasonable when the time scale of individual packets is much smaller than the overall service time of users), the complexity can be partially reduced in that the control strategy only depends on the system operating states (buffer level, channel quality, etc.) and not on time, and such a control strategy can be computed offline, provided the channel statistics and so forth remain unaltered over the time scale of the application. Indeed, it is important to consider more practical approaches for multiple users. We argue that our analysis with the simple models provides some guidelines in developing reduced-complexity optimization strategies. For instance, we expect that the threshold-type scheduling rule observed in the case of the simple model may carry over to the case of many users. Thus, in the case of many users, it may be reasonable to select suboptimal strategies by restricting the solution space to threshold-type strategies. One can also resort to neuro-dynamic programming-based value function approximation techniques [22] to reduce computational complexity. However, these studies are beyond the scope of the current paper and will be carried out in future work.
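The final selection step, choosing the link and rate once the per-link costs are available, can be sketched as follows. The sketch assumes the intracycle minimization has already produced, for each link, a table mapping each candidate rate to its cost; the table layout, the function name, and the tie-breaking rule (prefer the incoming link) are illustrative assumptions rather than part of the original formulation.

```python
def select_action(v1_by_rate, v2_by_rate, buffer_level):
    """Return (a_star, R_star, cost) from per-rate costs of the two links.

    v1_by_rate / v2_by_rate: dict mapping rate R -> cost when the incoming
    (a = 1) / outgoing (a = 2) link is active with that rate.
    """
    r1 = min(v1_by_rate, key=v1_by_rate.get)          # best rate, incoming link
    if buffer_level == 0:
        return 1, r1, v1_by_rate[r1]                  # empty buffer forces a* = 1
    r2 = min(v2_by_rate, key=v2_by_rate.get)          # best rate, outgoing link
    if v1_by_rate[r1] <= v2_by_rate[r2]:              # ties go to a = 1 (assumption)
        return 1, r1, v1_by_rate[r1]
    return 2, r2, v2_by_rate[r2]


# Hypothetical cost tables for one system state.
v1 = {1: -0.5, 2: -0.7}
v2 = {1: -0.9, 2: -0.6}
print(select_action(v1, v2, buffer_level=3))
print(select_action(v1, v2, buffer_level=0))
```

With a nonempty buffer the lower-cost link is chosen, whereas an empty buffer always yields the incoming link, matching the rule that $a^*$ is set to 1 for $i = 0$.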

THE ROLE OF BUFFER LOAD CONTROL
Recall that in the cycle cost $J_c$, we have introduced the weight function $h(z)$ for buffer load control. Now we examine the effect of $h(z)$ on the scheduler and packet buffering. Since $h$ is mostly related to the preference of the buffer towards receiving over forwarding traffic or vice versa, we only consider the scheduling action, and both the power and rate are fixed for the purpose of this simplified analysis. We assume $R = 1$. Furthermore, we take the maximum retransmission number $M = \infty$, that is, a packet is always retransmitted until it is received by the next node.
The link quality of the two channels is specified as follows. Each channel has two states ("good" and "bad," represented by rows 1 and 2, resp.) with state transition probability matrices
$$P_x = \begin{pmatrix} 0.92 & 0.08 \\ 0.18 & 0.82 \end{pmatrix}, \qquad P_y = \begin{pmatrix} 0.90 & 0.10 \\ 0.16 & 0.84 \end{pmatrix}.$$
Such two-state Markov chain models are also called the Gilbert-Elliott (GE) model [23]. Obviously, under the previous fixed power and rate assumption, the quality of the channel as measured by the transmission success rate translates into a corresponding channel gain. For the incoming link, the success probabilities of packet transmission at the "good" and "bad" channel states are specified by (25), and for the outgoing link they are specified by (26). We adopt a cost function of the form (27), where we use the sequence of integers $0, 1, 2, \ldots$ to index the system states including the buffer level at different times.
Here one packet is transmitted between two successive time instants since $R = 1$. The cost (27) is based on the first two terms in (12). We take $\rho = 0.95$.
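As a small worked example, the long-run fraction of time each Gilbert-Elliott channel spends in the "good" state follows from the stationary distribution of its two-state chain. For a transition matrix with off-diagonal entries $\alpha$ (good to bad) and $\beta$ (bad to good), the stationary distribution is $(\beta/(\alpha+\beta),\ \alpha/(\alpha+\beta))$. The function name below is an illustrative choice.

```python
def stationary_two_state(P):
    """Stationary distribution (good, bad) of a two-state Markov chain."""
    alpha = P[0][1]   # good -> bad
    beta = P[1][0]    # bad -> good
    total = alpha + beta
    return beta / total, alpha / total


P_X = [[0.92, 0.08], [0.18, 0.82]]   # incoming link, from the text
P_Y = [[0.90, 0.10], [0.16, 0.84]]   # outgoing link, from the text

for name, P in (("incoming", P_X), ("outgoing", P_Y)):
    good, bad = stationary_two_state(P)
    print(f"{name}: good={good:.4f}  bad={bad:.4f}")
```

With these matrices the incoming link is in the "good" state about 69% of the time and the outgoing link about 62% of the time.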

The case without buffer load control
We first examine the case h(z) ≡ 1.Since a closed form expression of the scheduling action as a function of the buffer and channel states is not available, we adopt a numerical method to examine the control actions for different buffer levels.We can easily solve the associated dynamic programming equation by value iteration.It is seen from Figure 3 that the optimal solution is very close to opportunistic scheduling which we define here as the scheduling rule which maximizes the one step reward.For relevant literature on opportunistic scheduling, see [24][25][26].In [24], the notion of opportunistic scheduling in a multiuser multiaccess channel is based on the principle that the user with the best channel transmits.The seminal work on opportunistic beamforming [26] is also based on this idea which forms the basis of the notion of "multiuser diversity" when a sufficiently large number of users are present to increase the sum capacity in a multiaccess channel.In [25], opportunistic scheduling is defined as a policy where the user with the largest performance value transmits, where the performance measure is defined for each user based on some desirable criteria such as high throughput and/or low power consumption and so forth.For our example with the given parameters, the opportunistic scheduling policy is given as 1 if incoming link is "good," 2 if incoming link is "bad" and z > 0.
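The value iteration for this simplified scheduling problem can be sketched as follows, under several stated assumptions. The success probabilities `S_X` and `S_Y` are hypothetical values chosen only so that the one-step reward ordering agrees with the opportunistic rule. The one-step cost is taken as $-h(z)$ for a successful incoming transmission and $-1$ for a successful outgoing transmission. The buffer is truncated at `z_max`, so a success at `z_max` leaves the level unchanged and results near that boundary are artifacts of the truncation. None of these numerical choices come from the paper.

```python
import itertools

P_X = [[0.92, 0.08], [0.18, 0.82]]   # incoming channel (index 0 = good, 1 = bad)
P_Y = [[0.90, 0.10], [0.16, 0.84]]   # outgoing channel
S_X = (0.9, 0.3)                     # assumed success probabilities, incoming link
S_Y = (0.8, 0.5)                     # assumed success probabilities, outgoing link
RHO = 0.95                           # discount factor from the text


def value_iteration(h, z_max=30, n_iter=500):
    """Return (value, policy) dicts keyed by (cx, cy, z) for weight function h."""
    states = list(itertools.product((0, 1), (0, 1), range(z_max + 1)))
    v = {s: 0.0 for s in states}
    policy = {}
    for _ in range(n_iter):
        new_v = {}
        for (cx, cy, z) in states:
            def cont(z_next):
                # Expected next-step value over the channel transitions.
                return sum(P_X[cx][nx] * P_Y[cy][ny] * v[(nx, ny, z_next)]
                           for nx in (0, 1) for ny in (0, 1))
            z_up = min(z + 1, z_max)             # truncated buffer (assumption)
            q1 = (S_X[cx] * (-h(z) + RHO * cont(z_up))
                  + (1.0 - S_X[cx]) * RHO * cont(z))
            best, act = q1, 1
            if z >= 1:                           # outgoing link needs a packet
                q2 = (S_Y[cy] * (-1.0 + RHO * cont(z - 1))
                      + (1.0 - S_Y[cy]) * RHO * cont(z))
                if q2 < q1:
                    best, act = q2, 2
            new_v[(cx, cy, z)] = best
            policy[(cx, cy, z)] = act
        v = new_v
    return v, policy


v, policy = value_iteration(lambda z: 1.0)       # no buffer load control
for cx, cy in itertools.product((0, 1), (0, 1)):
    print((cx, cy), [policy[(cx, cy, z)] for z in range(10)])
```

Passing a decreasing weight such as `lambda z: 1.0 / (1.0 + 0.001 * z)` in place of the constant weight gives the corresponding computation with buffer load control.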
Notice that in Figure 3, for the three scenarios with channel state pairs (G, G), (G, B), and (B, G), the associated action a is consistent with (28). Here G and B stand for "good" and "bad" states, respectively. With the channel state pair (B, B), there is a minor discrepancy between the optimally computed a and (28) in that a(z) = 1 for z = 0, 1, 2, 3, as shown in Figure 3. Inspecting Figure 4 gives a natural interpretation: by activating the incoming link so as to raise the number of buffered packets from a very low level, the system is steered into a lower-cost state. Indeed, when the initial state corresponds to a mild buffer load z > 4, the optimal cost is lower. The reason is that with a higher buffer level, the scheduler has more flexibility (i.e., it can exploit channel diversity) in choosing the most profitable action before hitting the boundary z = 0, which would force the scheduler to take a = 1 even if the incoming channel link is poor.
Although the above opportunistic scheduling, as well as its approximate version shown in Figure 3, is simple to implement, it may cause the buffer to grow without bound and thus necessitate buffer load control. We state the following result.
Proof. See the appendix.
The above instability result suggests that, in link scheduling, basic opportunistic scheduling is generally inadequate for producing practical control laws.
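The intuition behind the instability can be sketched with a simple drift estimate. Under rule (28) and away from the boundary z = 0, the buffer gains a packet when the incoming link is good and its transmission succeeds, and loses one when the incoming link is bad and the outgoing transmission succeeds. The success probabilities below are hypothetical placeholders rather than the values of the example, the two channels are assumed independent and stationary, and retransmission limits and packet discard are ignored, so this is an illustration and not the argument of the appendix.

```python
def good_state_probability(P):
    """Stationary probability of state 0 ("good") for a two-state chain."""
    return P[1][0] / (P[0][1] + P[1][0])


def buffer_drift(P_x, P_y, s_in_good, s_out_good, s_out_bad):
    """Long-run expected buffer change per slot under the opportunistic rule, for z > 0.

    Assumes independent, stationary channels and one packet per slot (R = 1).
    """
    pi_x_good = good_state_probability(P_x)
    pi_y_good = good_state_probability(P_y)
    # Incoming link is scheduled whenever it is good.
    arrivals = pi_x_good * s_in_good
    # Outgoing link is scheduled whenever the incoming link is bad.
    mean_out_success = pi_y_good * s_out_good + (1.0 - pi_y_good) * s_out_bad
    departures = (1.0 - pi_x_good) * mean_out_success
    return arrivals - departures


P_x = [[0.92, 0.08], [0.18, 0.82]]  # incoming link, from the text
P_y = [[0.90, 0.10], [0.16, 0.84]]  # outgoing link, from the text

# Hypothetical success probabilities, chosen only for illustration.
drift = buffer_drift(P_x, P_y, s_in_good=0.9, s_out_good=0.9, s_out_bad=0.5)
print("expected buffer change per slot: %+.4f" % drift)
```

Because the incoming link is bad only about 31% of the time, the departure term cannot exceed roughly 0.31 packets per slot, so under these assumptions any good-state incoming success probability above about 0.44 gives a positive drift.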

The case with buffer load control
We select the weight function h(i) = 1/(1 + 0.001i), i ≥ 0. As in the case without buffer load control, the optimal scheduling rule is computed by value iteration. For the scenarios (G, B), (B, G), and (B, B), the scheduling action is the same as in that case; see Figure 5. In contrast, when both channels are in the good state, the transmission time allocation depends on the buffer level. Once the number of buffered packets exceeds a certain threshold, transmission switches to the outgoing link. This effectively prevents the unlimited growth of the buffer level.
For justifications of such rational fraction expressions for the success probability in terms of the signal-to-interference ratio, see [11] and references therein. Here we use an exponent of 4 for the ratio p/R so that a large R can rapidly decrease the success probability. The reason for introducing such an effect is that the successful transmission of a packet relies on the correct detection of all its bits. Thus, the packet error probability can be made very sensitive to the bit-energy-to-interference ratio, which affects the bit error rate. The value function is computed by value iteration in 50 steps, which further determines the optimal transmission link allocation and the rate and power selection.
Our computations indicate that the value function v(x, y, l, i) is insensitive to the label index l, which denotes the retransmission index. For a fixed triple (i, x, y), when l changes in the range 1 ≤ l ≤ M = 6, the relative error is less than $2 \times 10^{-4}$. Hence in Figure 6 we select l = 1 and display v as a function of the buffer state i for given values of x, y.

Figure 5: Link scheduling with buffer load control. When both links are in the good state and the buffer level exceeds a certain level, there is a switch of transmission from the incoming link to the outgoing link. Horizontal axis: buffer level; vertical axis: scheduler state.
The optimal link allocation is shown in Figure 7, where the incoming and outgoing links are represented by the numbers 1 and 2 along the vertical axis, respectively, and the associated rate is given in Figure 8. It is clearly seen that the optimal link scheduling is based on a threshold-type policy; that is, once the buffer level exceeds a certain threshold, link switching takes place. It is also seen that when the link with a poor state is required to transmit, a low rate R = 1 is used. When the channel condition is good for either the incoming or the outgoing link, the optimal rate selection is high for the appropriate link, with R = 3. Here we do not explicitly display the power, but for the reader's reference, when the channel state is (G, G), the power level p = 1.5 is used for either active link. When the channel state is (B, B) and the outgoing link is active, the low rate R = 1 is used and the power is taken as p = 3.0 to ensure an adequate success probability.
For the channel state (B, G) in both Figures 7 and 8, we redisplay the low buffer level part in Figure 9. An interesting link and rate adaptation phenomenon is observed. With the low buffer condition i = 0, 1, the incoming link is active with R = 1 as constrained by the poor channel state. For i = 2, the outgoing link with the "good" channel state becomes active with R = 2, and if there is an adequate number of packets stored (i ≥ 3), it operates more aggressively with R = 3.

MULTIUSER POWER-CONTROLLED RELAY
In this section, we focus on power control for a multiuser packet relay model.

Channel modeling
We consider N transmitting source nodes. Let $x_i(t)$, $t = kT$, $i = 1, \ldots, N$, denote the gain of the incoming channel between the ith source and the relay. Let the gain of the outgoing link still be denoted by $y(kT)$.
We model $x_i(t)$ and $y(t)$ by independent finite state Markov chains with state spaces $S_{x_i} = \{a^i_1, \ldots, a^i_n\}$ and $S_y = \{b_1, \ldots, b_m\}$. Both $x_i(t)$ and $y(t)$, $t = kT$, are homogeneous with one-step transition matrices $P_{x_i}$ and $P_y$, respectively. To simplify notation, we assume the number of channel states is n for all incoming links; the generalization to different state space sizes is straightforward. As in the joint rate and power control formulation, at a given time the relay node can only operate in the transmitting or receiving mode, and all source nodes can simultaneously transmit to the relay node by use of a CDMA scheme. In this model, the packet rate $R_p$ is set to one, that is, each transmitting node sends out one packet on the interval $[kT, (k + 1)T)$. The retransmission scheme is similar to that of Section 2 and will not be repeated here.

The received signal-to-interference ratio
For the ith incoming link, we denote the power at time instant t by $p_i(t)$. The background noise intensity at the relay receiver is $\eta_1 > 0$. The signal-to-interference ratio (SIR) after detection by matched filtering can then be written as

$$e_i(t) = \frac{x_i(t)\, p_i(t)}{\sum_{j \neq i} h_{ij}\, x_j(t)\, p_j(t) + \eta_1},$$

where $h_{ij}$ denotes the squared cross-correlation between the signature sequences of users i and j. Let the signature sequence of the ith user be $s_i = (1/\sqrt{G_p})(s_{i1}, \ldots, s_{iG_p})^{\top}$, where $s_{ik} \in \{-1, 1\}$; then we have the relation

$$h_{ij} = \bigl(s_i^{\top} s_j\bigr)^2.$$

It is obvious that $s_i^{\top} s_i = 1$. In practical implementation, one can use the simple method of generating $s_{ik}$, $1 \leq k \leq G_p$, as $G_p$ i.i.d. binary random variables, and $h_{ij}$, $i \neq j$, can be reduced by increasing $G_p$ [27]. In decoding the source bits, the bit-error rate (BER) depends on the above received SIR.
Similarly, for the outgoing link we introduce the SIR $e_y(t) = c_2\, y(t)\, p_y(t)$, where $c_2 = 1/\eta_2$ and $\eta_2 > 0$ is the background noise intensity observed by the receiver at the destination. Note that $e_y(t)$ does not depend on $G_p$ due to the scaled spreading sequence. For all wireless links, we use the same function $P_s(r)$ to denote the success probability of a packet transmission when the received SIR is $r \geq 0$.

System dynamics and cost function
As in Section 3, the system state transition can be described in a similar fashion; the main difference is that when the incoming links are active, the buffer state may transit from $j \geq 0$ to $j' \in \{j, j + 1, \ldots, j + N\}$, depending on how many sources succeed in transmission. Subsequently, the performance function is determined as follows: corresponding to $[kT, (k + 1)T)$, we introduce the one-stage cost $J_c(kT, a, x, y, l_y, j)$, where $x = (x_1, \ldots, x_N)$ and y correspond to the channel states at the initial time of the cycle $[kT, (k + 1)T)$. We denote the scheduler by a. The values of $I_y$ and z at kT are denoted by $l_y$ and j, respectively. The power is not explicitly indicated inside $J_c$. The scheduler a determines which set of powers is positive, and the constants $\lambda_i, \lambda > 0$ are the weight coefficients for power penalty. Now we introduce the cost function (32) for scheduling and power control, where we again omit the power entries and $(x, y, l_y, j)$ is determined by the sample path of the channel states, the label process $I_y$, and the buffer state. The parameter $\rho \in (0, 1)$ is the discount factor, and $a^{\infty}$ denotes the sequence of scheduling actions. The variables $x = (x_1, \ldots, x_N)$, y, $l_y$, and j on the left-hand side of (32) describe the system condition at the initial time t = 0. Then, by using the same method as in Section 6, we may write the dynamic programming equation for the optimal scheduler and powers. The details are omitted here.

A numerical example with two users
In the following, we analyze a two-user model. First, we denote the received SIR in the form

$$e_1 = \frac{x_1 p_1}{h_{12}\, x_2 p_2 + \eta_1}, \qquad e_2 = \frac{x_2 p_2}{h_{21}\, x_1 p_1 + \eta_1}$$

for the two users. For the relay node, the SIR is given as $e_y = y\, p_y / \eta_2$. Let the channel state transition matrices for the two incoming links and the outgoing link be given by

$$P_{x_1} = P_{x_2} = \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}, \qquad P_y = \begin{pmatrix} 0.92 & 0.08 \\ 0.25 & 0.75 \end{pmatrix}, \tag{34}$$

respectively. Other parameters for channel modeling and packet transmission are chosen as follows. The squared cross-correlation coefficients are chosen to be $h_{12} = h_{21} = 0.015$. Making use of the random construction of signature sequences in [27], $h_{ij}$ of such a magnitude can be attained by a processing gain $G_p$ approximately equal to 64 (1/64 ≈ 0.0156). The noise power intensity is $\eta_1 = \eta_2 = 10^{-10}$ mW. The channel gain for $x_i$, i = 1, 2, or y may change between the two values $\{10^{-10}, 10^{-11}\}$. In other words, when deteriorating, the channel gain may drop by 10 dB. The emitting power for each of the three wireless links may be chosen from the set {40, 130, 220, 310, 400} in mW. We take the maximum retransmission number M = 5.
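The link between the cross-correlation level and the processing gain can be made explicit with a short calculation. It assumes, in line with the random construction mentioned earlier, that the chips $s_{ik}$ are i.i.d. and equally likely to be $+1$ or $-1$; the equal-probability assumption is ours.

```latex
\begin{align*}
h_{ij} &= \bigl(s_i^{\top} s_j\bigr)^2
        = \frac{1}{G_p^{2}} \Bigl( \sum_{k=1}^{G_p} s_{ik} s_{jk} \Bigr)^{2},
        \qquad i \neq j, \\
\mathbb{E}[h_{ij}]
       &= \frac{1}{G_p^{2}} \sum_{k=1}^{G_p} \sum_{l=1}^{G_p}
          \mathbb{E}[s_{ik} s_{il}] \, \mathbb{E}[s_{jk} s_{jl}]
        = \frac{1}{G_p^{2}} \cdot G_p
        = \frac{1}{G_p}.
\end{align*}
```

Only the terms with $k = l$ survive, since distinct chips are independent with zero mean. For $G_p = 64$ this gives an average squared cross-correlation of $1/64 \approx 0.0156$, which matches the order of magnitude of the chosen value 0.015.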
To avoid calculations with very small quantities, we use appropriate normalization for the noise intensity, channel gain, and power to set $\eta_1 = \eta_2 = 0.025$ (for $10^{-10}$ mW), and the values of $x_1$, $x_2$, and y by the same set {1, 0.1}, corresponding to the "good" ($10^{-10}$) and "bad" ($10^{-11}$) channel conditions, respectively. We also set the candidate power levels for the mobile users and the relay node by {1, 3.25, 5.50, 7.75, 10}, with the base power level being 1 (representing 40 mW). It can be checked that under the base power level ($p_i = 1$) and the good channel state ($x_i = 1$), the received SIR is about 16.02 dB when only one source node is transmitting. This is consistent with the observation that for data networks, the target received SIR or bit-energy-to-interference ratio needs to be maintained at high levels for reliable detection of the source bits [20]. In the numerical results presented below, all related quantities are computed in terms of these normalized values.

Figure 8: The optimal rate assigned to the active link for different combinations of channel states. Except in the case of (G, B), there is a switch of transmitting node as shown in Figure 7. Horizontal axis: buffer level; vertical axis: rate.
For the three wireless links connecting the two sources, the relay, and the destination, we model the packet transmission success probability (similarly to the modeling in [10]) by $P_s(r) = 1 - e^{-0.1 r}$ for an SIR level r. For other typical approximations of packet success probability in terms of exponential functions and rational fractions, see [10]. For this choice, an SIR level of 16.02 dB amounts to a packet success rate of 0.9817. In the cost function, we use a discount factor ρ = 0.9, and $\lambda_1 = \lambda_2 = \lambda = 0.02$ denote the power penalty factors.
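The two figures quoted above (16.02 dB and 0.9817) can be reproduced from the normalized parameters. The sketch below assumes the standard matched-filter SIR form, in which the other user's received power is scaled by the squared cross-correlation and added to the noise intensity; the function names are illustrative only.

```python
import math

ETA = 0.025      # normalized noise intensity (from the text)
H_CROSS = 0.015  # squared cross-correlation h_12 = h_21 (from the text)


def sir_two_user(x_own, p_own, x_other, p_other, eta=ETA, h=H_CROSS):
    """Received SIR of one source; assumed matched-filter form with one interferer."""
    return x_own * p_own / (h * x_other * p_other + eta)


def packet_success(r):
    """Packet success probability P_s(r) = 1 - exp(-0.1 r)."""
    return 1.0 - math.exp(-0.1 * r)


def to_db(r):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(r)


# One source at base power over a good channel, the other source silent.
r_single = sir_two_user(1.0, 1.0, 1.0, 0.0)
print("single user: %.2f dB, P_s = %.4f" % (to_db(r_single), packet_success(r_single)))

# Both sources at base power over good channels.
r_both = sir_two_user(1.0, 1.0, 1.0, 1.0)
print("two users:   %.2f dB, P_s = %.4f" % (to_db(r_both), packet_success(r_both)))
```

With a single active source the SIR is 1/0.025 = 40. Under the assumed form, a second source at base power lowers it to 25 (about 13.98 dB), which illustrates why mutual interference can push both users toward higher power levels.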
In Figures 10-13, we display the numerical solution for the optimal cost v as well as the associated control policies, where the value of the retransmission index l is 1. Figure 10 shows the curves of the optimal cost as a function of the initial buffer level with two sets of initial channel conditions. It is seen that for the two curves, the cost monotonically increases. The reason is that when the buffer level is higher, the resulting cost due to receiving packets into the buffer is also higher (i.e., less profitable).

Figure 11 shows the allocation of link scheduling to the incoming links or the outgoing link, depending on whether or not the buffer level exceeds a certain threshold level when the channel states are fixed. In Figure 11, the channel states are given by (1, 2, 1) and (1, 2, 2), listed in the order $(x_1, x_2, y)$, where 1 represents a "good" channel and 2 a "bad" channel. For the channel state (1, 2, 2), the system initially utilizes the incoming links and stops doing so only when the buffer level exceeds a higher threshold than in the case of channel state (1, 2, 1).

The power allocation in Figure 12 is associated with the scheduling rule depicted in Figure 11(a), where the channel state is (1, 2, 1) and only the buffer level is treated as a variable. Figure 12(b) shows that the second user has a poorer link gain and is hence compensated with a higher power. Once the buffer level exceeds a certain value, the outgoing link (with the good channel state) should be activated using the base power level 1. Figure 13 displays the power allocation with channel states (1, 1, 1), in which the low buffer level corresponds to higher powers $p_1 = p_2 = 3.25$. When the buffer level j is small, the reward rate h(j) is high. This makes it worthwhile to raise the transmission success probability by using higher transmission power, and the resulting interference further causes the two users to mutually increase their power levels to 3.25.
We have also examined the difference $v(x, y, l_1, j) - v(x, y, l_2, j)$, as a function of the buffer level j, incurred by taking two different label indices ($1 \leq l_1 \neq l_2 \leq M = 5$) while the channel states are fixed as $(x_1, x_2, y) = (1, 1, 1)$ or other fixed triples. Compared to the magnitude of v itself, this difference is seen to be negligible. This is a very interesting and useful feature that simplifies numerical computations. Specifically, in a suboptimal computation of the value function v, one can essentially treat v simply as a function of the buffer level and channel states, and can then obtain the rate and power selection solutions using only a fraction of the time required for solving the original problem optimally. In effect, this is equivalent to solving the problem with M = ∞. Indeed, when M has a moderate magnitude (say, above 5), packet discard becomes rare and the system behavior, including the evolution of the buffer level, is very close to that of the case M = ∞.
It is worthwhile pointing out that in this discrete dynamic programming context, although one cannot find a closed-form solution for the optimal power control and transmission scheduling strategies, link scheduling can be achieved by some simple switching rules or threshold-type policies, specified in terms of the buffer level and channel states, and this feature holds for different values of the label index $I_y$. In practical applications, this fact can be used to design low-complexity implementations of the optimal control law by specifying some simple lookup tables.
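A lookup-table implementation of such a threshold rule might look as follows. This is only a sketch: the threshold values are hypothetical placeholders (in practice they would be read off the computed optimal policy), and the table is keyed by the channel state triple (x1, x2, y) with 1 for "good" and 2 for "bad".

```python
INCOMING, OUTGOING = 1, 2

# Hypothetical switching thresholds keyed by channel states (x1, x2, y).
# The larger value for (1, 2, 2) mirrors the qualitative behavior described
# for a bad outgoing channel; the numbers themselves are made up.
SWITCH_THRESHOLD = {
    (1, 2, 1): 3,
    (1, 2, 2): 8,
}


def scheduler_action(channel_state, buffer_level, table=SWITCH_THRESHOLD):
    """Activate the outgoing link once the buffer level exceeds the stored threshold."""
    if buffer_level > table[channel_state]:
        return OUTGOING
    return INCOMING


for j in (0, 3, 4, 8, 9):
    print(j, scheduler_action((1, 2, 1), j), scheduler_action((1, 2, 2), j))
```

Each decision then costs one dictionary lookup and one comparison, so the online complexity is independent of the size of the dynamic program used to compute the thresholds offline.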

CONCLUDING REMARKS AND FUTURE WORK
In this paper, we developed a unified optimization framework based on a two-stage dynamic programming algorithm for link scheduling and joint rate and power control in wireless data packet relay networks with fading channels. This approach captures the real-time utility of the network and leads to simple "threshold-type" scheduling rules for link allocation as well as simple rate/power selection. For the case of multiple users, the dynamic programming algorithm leads to a high computational complexity, and a potentially useful approach may lie in seeking suboptimal policies via approximate dynamic programming [22,28].
In future work, it is of interest to consider the deployment of a dual-mode mobile user as a relay station. For such systems, it is potentially useful to introduce an incentive mechanism [29] (e.g., a node receives credit for forwarding traffic) that promotes the relay node's willingness to share its resources with other users while maintaining its own service. In general, this requires introducing a performance measure that captures the service objectives of all users in a balanced manner.

APPENDIX
The buffer state may be regarded as being driven by the Markov chains, and its growth rate can be estimated by use of the asymptotics of $x_t$, $y_t$ as well as the scheduler a. In fact, $(x_t, y_t, z_t)$ may be looked at as a joint Markov process. Let $E[z_{t+1} \mid z_t, x_t, y_t]$ denote the conditional expectation of $z_{t+1}$ given $(z_t, x_t, y_t)$. In view of the scheduling rule, we estimate the increment

Figure 3: Link scheduling without buffer load control. Horizontal axis: buffer level; vertical axis: scheduler state.

Figure 4: The optimal cost as a function of the initial buffer state with different combinations of initial channel states.

Figure 6: The shape of the cost versus buffer load (with optimal rate and power control).

Figure 7: The switch of transmission time between two links due to buffer conditions. Horizontal axis: buffer level; vertical axis: scheduler state.

Figure 9: The scheduler and rate's dependence on the buffer state with limited load.The channel condition is (B, G).

Figure 10: v as a function of buffer level when the channel states are fixed.

Figure 11: The state of the scheduler switching between 1 and 2.