Unified approach to cross-layer scheduling and resource allocation in OFDMA wireless networks

Orthogonal frequency division multiple access (OFDMA) has been selected as the core physical layer access scheme for state-of-the-art and next-generation wireless communications standards. In these systems, scheduling and resource allocation algorithms that jointly assign transmission data rates, bandwidth and power become crucial to optimize resource utilization while supporting multimedia applications with heterogeneous quality of service (QoS) requirements. In this article, a unified framework for channel- and queue-aware QoS-guaranteed cross-layer scheduling and resource allocation algorithms for heterogeneous multiservice OFDMA wireless networks is presented. The framework encompasses different types of traffic, uniform and continuous power allocation, discrete and continuous rate allocation, and protocols with different amounts of channel- and queue-awareness. System parameters and QoS requirements are projected into utility functions and the optimization problem is then formulated as a constrained utility maximization problem. Optimal solutions for this problem are obtained for the uniform power allocation schemes, and novel quasi-optimal algorithms are proposed for the adaptive power allocation strategies. Remarkably, these techniques exhibit complexities that are linear in the number of resource units and users. Simulation results demonstrate the validity and merits of the proposed cross-layer unified approach.


Introduction
Due to its high spectral efficiency, inherent robustness against frequency-selective fading and flexibility in resource allocation, orthogonal frequency division multiple access (OFDMA), combined with multiple-input multiple-output (MIMO) strategies, has been chosen as the multiple access technique for state-of-the-art and next-generation wireless communications standards such as IEEE 802.16e/m-based WiMAX systems [1] and Third Generation Partnership Project (3GPP) technologies based on long-term evolution (LTE) and LTE-Advanced (LTE-A) [2]. These systems have been designed with different quality of service (QoS) frameworks and strategies to allow the delivery of the wide range of emerging Internet multimedia applications with diverse QoS requirements [3]. In this context, scheduling and resource allocation algorithms that jointly assign transmission data rates (through adaptive modulation and coding, AMC), subcarriers, time slots and power become crucial for maximizing resource utilization while providing satisfactory service delivery to end users.
The instantaneous characteristics of the transmission channel used by wireless MIMO-OFDMA networks inherently vary in time and frequency due to multipath propagation, changing positions of mobile stations (MS) relative to the base station (BS), and a nonstationary environment. Consequently, different users sharing a BS experience different channel conditions at the same time and frequency. This phenomenon, referred to as multiuser diversity, constitutes the basis of opportunistic or channel-aware scheduling algorithms. The goal of these strategies is to jointly allocate resources (i.e., power, subcarriers and/or time slots) in order to either minimize the weighted sum of powers under a prescribed minimum rate budget [4][5][6] or maximize the weighted sum-rate under a prescribed power budget [7][8][9]. Nevertheless, greedy opportunistic schedulers serving only the users with favorable channel conditions raise the issue of fairness, as users experiencing bad channel conditions may suffer from starvation. Therefore, besides channel state information (CSI), fairness is also an important issue that has been taken into account when designing scheduling algorithms for OFDMA-based multiservice networks [10][11][12][13][14][15][16]. Fairness, however, may lead to low spectral efficiency, and this may become an issue when facing real-time services with stringent QoS requirements in terms of delay and error tolerance. Thus, beyond channel quality conditions and fairness, another important issue that should be considered to maximize users' satisfaction is the wide range of QoS requirements of the heterogeneous applications supported by emerging OFDMA-based wireless networks.
In order to tackle all previously mentioned issues, the data link layer (DLC) bursty packet arrivals and queueing behavior should be jointly taken into consideration, in a cross-layer fashion, with the physical layer (PHY) channel conditions when designing scheduling and resource allocation algorithms. In this context, publications such as [17][18][19][20][21][22][23], reporting optimal and suboptimal cross-layer algorithms for very specific wireless multiuser OFDMA network configurations, lack a complete overview of the full problem, making it difficult to extract general conclusions. Song and Li [17,18] present a framework for cross-layer optimization of downlink multiuser single-cell OFDMA systems, where the interactions between the physical and DLC layers are modeled using a utility function that trades fairness for throughput efficiency. This work assumes, however, that the system has an infinite number of subcarriers and proposes suboptimal allocation algorithms for practical realization. A cross-layer scheduling scheme for OFDMA wireless systems with heterogeneous delay requirements, taking into account both queueing theory and information theory in modeling the system dynamics, is presented in [19]. The objective of maximizing system throughput with constraints on the delay and the maximum transmitted power is formulated as a mixed convex and combinatorial optimization problem. Mohanram and Bhashyan [20] propose a suboptimal joint subcarrier and power allocation for channel- and queue-aware schedulers aiming at the maximization of the global average long-term throughput. This scheduler, however, seems to be applicable only to traffic types without any constraint on delays. In [21] the authors propose a QoS-aware proportional fairness (QPF) scheduling policy based on a cross-layer design where the scheduler is aware of both the channel and the queue state information. The proposed approach, however, apart from using suboptimal modified greedy multicarrier proportional fairness algorithms, only considers Shannon's capacity-based data rate allocation schemes and uniform power allocation (UPA) in the frequency domain. In [22], Song et al. propose a joint channel- and queue-aware scheduler, called max-delay-utility (MDU) scheduling, designed to efficiently support delay-sensitive applications. However, this scheduler is only effective for traffic types without explicit constraints on the minimum achievable average data rate and/or the maximum allowable absolute delay. Furthermore, only suboptimal sorting-search algorithms for the subcarrier (subband) allocation problem and greedy algorithms for the power allocation problem are proposed. Finally, Zhou et al. [23] propose a packet-dependent adaptive cross-layer design for downlink multiuser OFDMA systems, designed to maximize the weighted sum capacity of users with multiple heterogeneous traffic queues and based on the suboptimal algorithms proposed in [19].
Scheduling and resource allocation based on cross-layer principles can be regarded as a multi-objective optimization problem taking into account not only the system throughput but also the transmitted power, the QoS constraints on traffic delay and minimum and maximum data rates, the priority levels of different traffic classes and the amount of backlogged data in the queues. In general, there is no single optimal solution to a multi-objective optimization problem. However, using tools from information theory, queueing theory, convex optimization, and stochastic approximation [24], a unified framework for channel- and queue-aware QoS-guaranteed scheduling and resource allocation for heterogeneous multiservice OFDMA wireless networks is proposed in this article. To this end, this study introduces a framework able to account for different types of traffic (e.g., best effort, non-real-time and real-time), different allocation strategies (e.g., continuous and discrete rate allocation (DRA), uniform and adaptive power allocation (APA)), protocols with different amounts of channel- and queue-awareness, and different utility functions measuring users' satisfaction in terms of, for instance, throughput, queue length and/or service time (waiting time in the queues). Channel state, physical-layer characteristics, queueing delay and/or QoS requirements are projected into utility functions and the multi-objective optimization problem is then formulated as a constrained utility maximization problem, where the objective is the maximization of the user services' utility functions. The constraints are related to the specifications of the network and offered services under consideration, namely, power limitations, per-service rate limits, and exclusive chunk (frequency/time resource unit) assignment. The unified algorithmic framework presented in this article generalizes results presented in, for instance, [19,21,22,25,26]. The proposed approach is based on dual decomposition optimization [27] and stochastic approximation techniques [24], exhibits complexities that are linear in the number of resource units and users, and achieves negligible duality gaps in numerical simulations based on scenarios resembling current standards. Algorithms presented in this article optimize non-static utility functions based on the temporal evolution of throughput and/or waiting time of packets in the queues. Stochastic approximation techniques are used that allow these strategies to be implemented in real time.
This article is organized as follows. Section 2 presents a brief description of the system model under consideration along with the key assumptions made in the formulation of the optimization problem. A thorough description of the single-cell scenario, transmitter and receiver architectures, as well as of the channel model employed, is also provided. As part of the cross-layer unified framework, the variables involved in the optimization problem are described in Section 3. Next, Section 4 presents a unified framework for constrained channel- and queue-aware QoS-guaranteed scheduling and resource allocation for heterogeneous multiservice OFDMA wireless networks. Both continuous (Shannon-capacity-based) and discrete (AMC-based) strategies are considered, and solutions based on dual-optimization techniques are provided. In Section 6, numerical results illustrating the different performance/complexity tradeoffs of the proposed unified optimization framework are presented. Special emphasis is placed on efficiency, fairness and the fulfillment of QoS requirements. Finally, Section 7 summarizes the contributions of this article and outlines the most interesting avenues for further research.
This introduction ends with a notational remark. Vectors and matrices are denoted by lower- and uppercase bold letters, respectively. The $K$-dimensional identity matrix is represented by $\mathbf{I}_K$. The symbols $\mathbb{R}_+$ and $\mathbb{C}$ denote the set of non-negative real numbers and the set of complex numbers, respectively. Superscripts $(\cdot)^T$ and $(\cdot)^\dagger$ denote the transpose and the conjugate transpose (Hermitian) of a matrix. Finally, and for the sake of clarity, a list of the most important symbols (in order of appearance) is also provided in Table 1.

System model and assumptions
Let us consider the downlink of a time-slotted MIMO-OFDMA wireless packet access network such as the one depicted in Figure 1. In this setup, a BS with a total transmit power $P_T$ and equipped with $N_T$ transmit antennas provides service to $N_m$ active MSs, each equipped, without loss of generality, with an equal number of receive antennas, denoted by $N_R$.
Transmission between the BS and active MSs is organized in time slots of a fixed duration $T_s$, assumed to be less than the channel coherence time. Thus, the channel fading can be considered constant over the whole slot and it only varies from slot to slot, i.e., a slot-based block fading channel is assumed. Each of these slots consists of a fixed number $N_o$ of OFDM symbols of duration $T_o + T_{CP} = T_s/N_o$, where $T_{CP}$ is the cyclic prefix duration. Slotted transmissions take place over a bandwidth $B$, which is divided into $N_b$ orthogonal subbands, each consisting of $N_{sc}$ adjacent subcarriers and with a bandwidth $B_b = B/N_b$ small enough to assume that all subcarriers in a subband experience frequency flat fading. One subband in the frequency axis over one slot in the time axis forms a basic resource allocation unit. Active MSs and frequency subbands in a given slot are indexed by the sets $\mathcal{N}_m = \{1, \ldots, N_m\}$ and $\mathcal{N}_b = \{1, \ldots, N_b\}$, respectively.
Without loss of generality, and in order to simplify the mathematical notation of the problem, only one service data flow (also known as connection or session) per active MS will be assumed. Depending on the traffic type, three classes of service and the associated QoS requirements and priorities must be accounted for in wireless communications [24]:
- Best effort (BE) low priority services with a prescribed maximum allowable error rate but without specific requirements on rate or delay guarantees. Examples of best-effort services include applications such as email or HTTP web browsing.
- Non-real-time (nRT) services entail applications such as file transfers (FTP). They do not impose any constraint on delays but, in addition to a maximum allowable error rate, they require sustained throughput guarantees.
- Real-time (RT) high priority services are used for applications such as video conferencing and streaming, entailing QoS guarantees on maximum allowable error rate, minimum throughput, and maximum delay.
Traffic flows arriving from higher layers are buffered into the corresponding $N_m$ first-in first-out (FIFO) queues at the DLC layer. At the beginning of each scheduling time interval, based on the available joint channel- and queue-state information (CSI/QSI), the cross-layer scheduling and resource allocation algorithms select some packets in the queues for transmission. These packets are then forwarded to the OFDM transmitter at a rate $R_m(t)$, for all $m \in \mathcal{N}_m$, where they are adaptively modulated and channel encoded (AMC) and are allocated power and subbands just before MIMO processing.

PHY layer modeling

Transmitter
Multiple-input multiple-output technology provides a great variety of techniques to exploit the multiple propagation paths between the $N_T$ transmit antennas and the $N_R$ receive antennas. Notably, when CSI is available at the transmitter and receiver sides, and multiplexing in the spatial domain is not used, the joint use of maximum ratio transmission (MRT) [28] at the transmitter and maximal ratio combining (MRC) at the receiver is known to provide optimum performance in the sense of maximizing the received signal-to-noise ratio (SNR).
Let us assume that subband $b$ has been allocated to MS $m$ and that the BS uses an MRT scheme to exploit the spatial diversity provided by the MIMO channel. In this case, bits from the queue of MS $m$ are channel encoded and mapped onto a sequence of symbols drawn from the allocated normalized unit energy complex constellation (e.g., BPSK, QPSK, 16QAM, 64QAM). Furthermore, before the usual OFDM modulation steps on each transmit antenna (IFFT, cyclic prefix appending and up-conversion), the symbols are allocated power and are processed in accordance with the MRT scheme. Here, $p_{m,b}(t)$ is the power allocated to MS $m$ on subband $b$ during time slot $t$ (in a given subband, power is uniformly allocated to subcarriers), and $\boldsymbol{\upsilon}_{m,b}(t) \in \mathbb{C}^{N_T \times 1}$ denotes the unit energy linear transmit filter used by the MRT transmission system.

Channel model
The propagation channel between the BS and MS $m$ is characterized by a power delay profile [29], common to all pairs of transmit and receive antennas, defined by the number of independent propagation paths $L_p$ and by the power $\sigma^2_{m,l}$ and delay $\tau_l$ of the $l$th propagation path. Assuming that the channel coherence time is greater than $T_s$, the channel impulse response between transmit antenna $n_T$ and receive antenna $n_R$ of MS $m$ does not change over the whole slot period $t$. Its frequency response can be safely approximated as flat over subband $b$ and evaluated at the subband center frequency $f_b$. Accordingly, the MIMO channel between the BS and MS $m$, for subband $b$ and over the whole time slot period $t$, is characterized by the complex-valued $N_R \times N_T$ matrix $\mathbf{H}_{m,b}(t)$.

Receiver
At the receiver side, as usual, ideal synchronization and sampling processes, and an OFDM cyclic prefix duration greater than the maximum delay spread of the channel impulse response, are assumed. In this case, the received samples at the output of the $N_R$ FFT processing stages of MS $m$ over subcarrier $c$ of subband $b$ and OFDM symbol $o$ during time slot $t$ form an $N_R \times 1$ complex-valued vector that includes a noise vector $\boldsymbol{\nu}$ with elements modeled as independent identically distributed (i.i.d.) zero-mean circularly symmetric complex Gaussian random variables, with covariance matrix $\sigma_\nu^2 \mathbf{I}_{N_R}$. According to the MRT strategy [28], the transmission filter $\boldsymbol{\upsilon}_{m,b}(t)$ that maximizes the SNR at the receiver side is the right singular vector of matrix $\mathbf{H}_{m,b}(t)$ associated with its largest singular value, denoted as $s_{\max}(\mathbf{H}_{m,b}(t))$. In this case, the instantaneous SNR experienced by all subcarriers in subband $b$ at the output of the maximal ratio combiner (MRC) used at the receiver side can be expressed as

$$\gamma_{m,b}(t) = \frac{p_{m,b}(t)\,\delta_{m,b}(t)}{\sigma_\nu^2}, \qquad (7)$$

where $\delta_{m,b}(t) = s_{\max}^2(\mathbf{H}_{m,b}(t))$.
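As a numerical illustration of the MRT/MRC link described above, the following Python sketch computes the post-combining SNR of one subband from its channel matrix. It assumes that the SNR equals the allocated power times the squared largest singular value of the channel matrix, divided by the noise variance. The function names and the power-iteration solver are illustrative choices and are not taken from the article.

```python
import math

def largest_singular_value_sq(H, iters=200):
    """Estimate s_max(H)^2, the largest eigenvalue of H^H H, by power iteration.

    H is a list of rows (N_R x N_T) of complex numbers. The method assumes the
    starting vector is not orthogonal to the dominant right singular vector.
    """
    n_r, n_t = len(H), len(H[0])
    v = [complex(1.0 / math.sqrt(n_t), 0.0)] * n_t  # unit-norm starting vector
    lam = 0.0
    for _ in range(iters):
        # u = H v (N_R x 1)
        u = [sum(H[i][j] * v[j] for j in range(n_t)) for i in range(n_r)]
        # w = H^H u (N_T x 1)
        w = [sum(H[i][j].conjugate() * u[i] for i in range(n_r))
             for j in range(n_t)]
        lam = math.sqrt(sum(abs(x) ** 2 for x in w))  # converges to s_max^2
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam

def mrt_mrc_snr(H, power, noise_var):
    """Assumed model: SNR = power * s_max(H)^2 / noise_var."""
    return power * largest_singular_value_sq(H) / noise_var

# Hypothetical 2x2 channel with singular values 3 and 4.
H_example = [[3 + 0j, 0j], [0j, 4 + 0j]]
snr_example = mrt_mrc_snr(H_example, power=0.5, noise_var=2.0)
print(snr_example)
```

Power iteration is used only to keep the sketch free of external dependencies; any singular value decomposition routine would serve the same purpose.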

DLC layer modeling
In order to characterize the queueing behavior at the DLC layer, a slightly modified version of the model proposed by Kong et al. [[21], Section IV.A] is assumed. At the beginning of time slot $t$, MS $m$ is assumed to have $Q_m(t)$ bits in the queue. If there are $A_m(t)$ bits arriving during time slot $t$, the queue length at the end of this time slot, assuming queues of infinite capacity, can then be expressed as

$$Q_m(t+1) = Q_m(t) + A_m(t) - R_m(t)\,T_s, \qquad (8)$$

where

$$R_m(t) = \min\{r_m(t),\, Q_m(t)/T_s\}, \qquad (9)$$

with $r_m(t)$ denoting the data rate allocated to user $m$ during time slot $t$. A cross-layer resource allocation strategy that, in order to avoid the waste of resources, selects a transmission rate $r_m(t) \le Q_m(t)/T_s$ is said to fulfill the frugality constraint (FC) [22].
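The queue dynamics and the frugality constraint can be sketched in a few lines of Python. The sketch assumes that the bits served in a slot are limited by the queue backlog at the start of the slot, and that the FC holds when the allocated rate does not exceed the rate needed to empty the queue. All function and parameter names are hypothetical.

```python
def served_rate(r_alloc, q_bits, t_slot):
    """Effective service rate (bits/s): the allocated rate, capped by the
    rate that would empty the queue within one slot (assumed model)."""
    return min(r_alloc, q_bits / t_slot)

def queue_update(q_bits, arrivals_bits, r_alloc, t_slot):
    """Queue length (bits) at the end of the slot for an infinite-capacity queue."""
    return q_bits + arrivals_bits - served_rate(r_alloc, q_bits, t_slot) * t_slot

def fulfills_frugality(r_alloc, q_bits, t_slot):
    """True if the allocation does not waste resources on an empty queue."""
    return r_alloc * t_slot <= q_bits

# Hypothetical numbers: 1000 bits queued, 200 bits arriving, 0.5 s slot.
print(queue_update(1000.0, 200.0, r_alloc=1000.0, t_slot=0.5))
print(fulfills_frugality(3000.0, 1000.0, 0.5))
```

An allocation that violates the FC still empties the queue, but part of the assigned rate is left unused, which is the waste the constraint is meant to prevent.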
As will be shown in Section 4.1, most of the schedulers and resource allocation schemes that have been proposed in the literature can be interpreted as decision-making algorithms that, at the beginning of time slot $t$, estimate or predict the future behavior of quantitative QoS performance measures such as the throughput, average delay, queue length and/or head-of-line delay, and decide which users will be granted a transmission opportunity and the amount of resources that they will be allocated.

Predicting the queue length
As $A_m(t)$ is unknown at the beginning of time slot $t$, and assuming that the DLC layer only knows the average arrival data rate $\lambda_m$, a prediction of the queue length at the end of this time slot can be obtained from (8) by replacing $A_m(t)$ with its expected value $\lambda_m T_s$, where $E_x\{\cdot\}$ denotes the statistical expectation operator with respect to the random variable $x$.

Predicting the average waiting time
Using standard stochastic approximation recursions, a recursive estimate of the slot-by-slot queue length sample average can be obtained [24], where the step-size $\beta_t \in (0, 1)$ implements a forgetting factor in the averaging and can be selected to be either constant (i.e., $\beta_t = \beta$) or asymptotically vanishing (e.g., $\beta_t = 1/t$). Little's law [30] asserts that, with stable queues, the average delay at the end of time slot $t$ equals the average queue length divided by the average arrival rate $\lambda_m$. Using (11) and (12), this in turn leads to a recursive prediction of the slot-by-slot average delay.
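The recursion and Little's law can be illustrated with the following Python sketch. It assumes the common exponentially weighted form of the stochastic approximation update, in which the new average is a convex combination of the previous average and the latest sample. The function names are illustrative and do not come from the article.

```python
def sa_update(avg, sample, beta):
    """One stochastic approximation step with step-size (forgetting factor) beta."""
    return (1.0 - beta) * avg + beta * sample

def average_delay(avg_queue_bits, arrival_rate_bps):
    """Little's law: average delay = average queue length / average arrival rate."""
    return avg_queue_bits / arrival_rate_bps

# Hypothetical queue length samples (bits) observed over three slots.
samples = [100.0, 200.0, 300.0]
avg_queue = 0.0
for t, q in enumerate(samples, start=1):
    avg_queue = sa_update(avg_queue, q, beta=1.0 / t)  # vanishing step-size

print(avg_queue, average_delay(avg_queue, arrival_rate_bps=1000.0))
```

With the vanishing step-size 1/t the recursion reproduces the plain sample mean, whereas a constant step-size gives more weight to recent slots and therefore tracks non-stationary traffic more quickly.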

Estimating the average throughput
Stochastic approximation tools can also be used to obtain a recursive estimate $T_m(t)$ of the frame-by-frame throughput sample average.

Predicting the head-of-line delay
The HOL delay of user $m$ at the beginning of time slot $t$ (or equivalently, the end of time slot $(t-1)$) can be written as $W_{\mathrm{HOL},m}(t) = tT_s - \tau$

Optimization variables
For a given set of constraints, the scheduling and resource allocation algorithm will be in charge of determining the power allocation vector optimizing a prescribed objective function. In addition to determining the power allocation values, the resource allocation algorithms should also allocate subbands and transmission rates. Nevertheless, as will be shown next, the power allocation vector $\mathbf{p}(t)$ can also be used to represent the allocation of all these resources, thus simplifying the formulation of the optimization problem [9].

Subband allocation
As usual, it is assumed that subband allocation is exclusive, that is, only one MS is allowed to transmit on a given subband. Hence, the subband allocation constraints can be captured by constraining the per-subband power allocation vectors so that at most one MS is assigned non-zero power on each subband, with $\mathbb{R}_+$ denoting the set of all non-negative real numbers. The power allocation vector $\mathbf{p}(t)$ therefore belongs to a set $\mathcal{P}$ built as the Cartesian product (or product set), denoted by $\times$, of the per-subband constraint sets.

Rate allocation
In the downlink of multi-rate systems based on AMC, a channel estimate is obtained at the receiver of each MS and it is then fed back to the BS so that the transmission scheme, comprising a modulation format and a channel code, can be adapted in accordance with the channel characteristics.
If MS $m$ is allocated subband $b$ over time slot $t$, then the BS selects a modulation and coding scheme (MCS) that can be characterized by a transmission rate $r_{m,b}(t)$ (measured in bits per second). As each subband contains $N_{sc}$ subcarriers, the aggregated data rate allocated to MS $m$ over time slot $t$, $r_m(t)$, follows by aggregating the rates $r_{m,b}(t)$ over the subcarriers of all its subbands (20). The transmission rate $r_{m,b}(t)$ can be related to the bit error rate (BER) observed by MS $m$, denoted as $\epsilon_m$, and to the instantaneous SNR $\gamma_{m,b}(t)$ through the approximation (21) given in [[31], Chapter 9] (see also [24]), where $\kappa_1$ and $\kappa_2$ are modulation- and code-specific constants that can be accurately approximated by exponential curve fitting. This expression is general enough to obtain the BER performance of any transmission system for which the joint effects of transmission filters, channel coefficients and reception filters can be represented through an instantaneous SNR $\gamma_{m,b}(t)$. For the special case of the MRT/MRC scheme, $\gamma_{m,b}(t)$ is defined by (7).

Discrete-rate AMC
Realistic AMC strategies can only use a discrete set $\mathcal{N}_k = \{0, 1, \ldots, N_k\}$ of MCSs that can differ for different MSs. Each MCS $k$ is characterized by a particular transmission rate, with rates strictly increasing in $k$ and with the zero rate of MCS $k = 0$ denoting the case where MS $m$ does not transmit.
Given $p_{m,b}(t)$, $\delta_{m,b}(t)$ and the noise variance $\sigma_\nu^2$, we can use (7) to find $\gamma_{m,b}(t)$ and then, considering the maximum allowable BER $\epsilon_m$, employ (21) to select the most adequate MCS. In fact, the transmission rate $r_{m,b}(t)$ can be expressed as a staircase function of the instantaneous SNR (23), whose steps are located at the instantaneous SNR boundaries defining the MCS intervals, which can be obtained from (21).
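The staircase rate selection can be sketched as follows. The sketch assumes the widely used exponential BER approximation, BER ≈ k1·exp(−k2·SNR/(2^rate − 1)), with the rate expressed in bits per symbol and the values k1 = 0.2 and k2 = 1.5 of the classical uncoded M-QAM fit. The MCS rate set and all names are hypothetical and are not the parameters used in the article.

```python
import bisect
import math

def snr_thresholds(rates, ber_target, k1=0.2, k2=1.5):
    """SNR boundary of each MCS: smallest SNR meeting the BER target under the
    assumed model BER = k1 * exp(-k2 * snr / (2**rate - 1))."""
    gap = math.log(k1 / ber_target) / k2
    return [(2.0 ** r - 1.0) * gap for r in rates]

def discrete_rate(snr, rates, thresholds):
    """Staircase function: highest rate whose SNR boundary does not exceed snr,
    or 0.0 (no transmission) if even the lowest MCS is not supported."""
    k = bisect.bisect_right(thresholds, snr)
    return 0.0 if k == 0 else rates[k - 1]

# Hypothetical MCS set (bits per symbol): BPSK, QPSK, 16QAM, 64QAM, uncoded.
mcs_rates = [1.0, 2.0, 4.0, 6.0]
mcs_thresholds = snr_thresholds(mcs_rates, ber_target=1e-3)
print([discrete_rate(s, mcs_rates, mcs_thresholds) for s in (2.0, 11.0, 60.0, 1000.0)])
```

Because the thresholds are sorted, each lookup is a binary search over the MCS set, so the per-user, per-subband cost of rate selection stays small.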

Continuous-rate AMC
A useful abstraction when exploring rate limits is to assume that each user's set of MCSs is infinite. In this case, the maximum allowable transmission rate fulfilling the prescribed BER constraint with equality can be obtained from (21) in closed form (25), where $\Lambda_m = \kappa_2^{-1} \ln(\kappa_1/\epsilon_m) \ge 1$ represents the coding gap due to the utilization of a practical (rather than ideal) coding scheme. With $\Lambda_m = 1$ this expression results in Shannon's capacity limit and allows the comparison of practical AMC-based schemes against fundamental capacity-achieving benchmarks.
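A minimal sketch of the continuous-rate abstraction is given below. It assumes that the rate is measured in bits per symbol and takes the familiar gap-approximation form log2(1 + SNR/gap), which reduces to Shannon's formula when the gap equals one. The constants k1 = 0.2 and k2 = 1.5 are the values of the classical uncoded M-QAM fit and are used here purely for illustration.

```python
import math

def coding_gap(k1, k2, ber_target):
    """Coding gap: ln(k1 / ber_target) / k2."""
    return math.log(k1 / ber_target) / k2

def continuous_rate(snr, gap=1.0):
    """Assumed continuous-rate model (bits per symbol): log2(1 + snr / gap)."""
    return math.log2(1.0 + snr / gap)

gap_example = coding_gap(0.2, 1.5, 1e-3)
print(gap_example)                          # gap for a 1e-3 BER target
print(continuous_rate(3.0))                 # Shannon limit at SNR = 3
print(continuous_rate(3.0, gap_example))    # practical scheme at the same SNR
```

Comparing the last two values shows the rate penalty that a practical scheme pays, at a given SNR, with respect to the capacity-achieving benchmark.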

Problem formulation
The main objective of cross-layer scheduling and resource allocation algorithms over a wireless network is the establishment of effective policies able to optimize metrics related to spectral/energy efficiency and fairness, while satisfying prescribed QoS constraints.The issues of efficient and fair allocation of resources have been intensively investigated in the context of economics, where utility functions have been used to quantify the benefit obtained from the usage of a pool of resources.In a similar way, utility theory can be used in wireless communication networks to evaluate the degree up to which a given network configuration can satisfy users' QoS requirements [17,18].
Utility functions are used to map the resources (e.g., bandwidth, power, ...), performance criteria (e.g., throughput, delay, ...) and QoS requirements (e.g., maximum tolerable error rate, maximum absolute delay, maximum allowable outage delay probability, ...) into the corresponding user's satisfaction. Different applications can be characterized by different utility functions and/or even different quantitative performance measures and QoS requirements. For instance, utility functions for BE applications are typically characterized in terms of throughput, whereas those for nRT or RT delay-sensitive applications are characterized in terms of queueing delay with QoS requirements on the sustainable throughput, and/or the average or absolute delay. Thus, in general, the satisfaction of MS $m$ at time $t$ can be expressed by a utility function $U_m(\theta_m(t), \Omega_m)$.

Gradient-based scheduling and resource allocation
The first-order Taylor expansion of $U_m(\theta, \Omega_m)$ in a neighborhood of $\theta = \theta_m(t)$ is written in terms of $\nabla_\theta$, the vector differential operator or gradient function with respect to $\theta$. Using this approximation, the variation of utility for MS $m$ during time slot $t$ can be evaluated and, as shown in [17,18,32], the cross-layer long-term optimization problem in (26) can be rewritten as an instantaneous gradient-based optimization problem. Although utility functions based on quantitative QoS performance measures other than the throughput, the average delay, the queue length and/or the HOL delay could be devised, most practical utility functions used in state-of-the-art wireless communications are based on either one of these performance measures or a combination of them. Therefore, let us assume hereafter that $\theta_m(t)$ comprises these performance measures. In this case, using (11)-(2.2.4) in (29) and eliminating constants not affecting the optimization process yields the optimization problem (34), in which each MS $m$ is assigned a weighting (prioritization) coefficient for time slot $t$.

Marginal utility functions

Max-sum-rate (MSR) rule
The MSR scheduler [33] is based on a channel-aware scheduling rule that, by weighting all MSs equally, maximizes the slot-by-slot aggregated transmission rate. However, as stated by Song et al. [22], although it maximizes the spectral efficiency, it can lead to unfairness and queue instability, especially for nonuniform traffic patterns and MSs operating in uneven channel conditions. Furthermore, since MSs with unfavorable channel conditions can experience long deep fading periods, long delays are expected and, consequently, the MSR rule is not able to support delay-sensitive applications.

Proportional fair (PF) rule
The PF scheduler [34] is also based on a channel-aware scheduling rule, aiming at maximizing the logarithmic-sum-throughput of the system, that is, the sum over all MSs of $\log T_m(t)$. Thus, the gradient-based PF scheduling algorithm is effected by using weighting coefficients inversely proportional to the average throughput $T_m(t)$. Nevertheless, although the PF rule can trade off spectral efficiency and fairness among users belonging to the same QoS class, it cannot cope with MSs with disparate QoS requirements, especially those supporting delay-sensitive applications. In particular, this rule does not prevent starvation during long deep fading periods.
It is worth pointing out that for incoming low-rate data flows it is quite common that, for some users, $T_m(t) = \lambda_m$ no matter how good their average channel condition is. As a result, for those users, $T_m(t)$ is not a good measure of the actual amount of resources allocated to them, and it is better to use the alternative measure proposed in [35].
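The difference between the MSR and PF rules can be illustrated with a single-subband Python sketch. It assumes that the scheduler serves the user with the largest weighted rate, with unit weights for MSR and weights equal to the reciprocal of the average throughput for PF (the gradient of a logarithmic utility). The helper names and the numerical floor are hypothetical.

```python
def pf_weights(avg_throughput, floor=1e-9):
    """PF weights: reciprocal of the average throughput (floored to avoid
    division by zero for users that have not been served yet)."""
    return [1.0 / max(t, floor) for t in avg_throughput]

def schedule_slot(rates, weights):
    """Index of the user maximizing weight * instantaneous rate."""
    return max(range(len(rates)), key=lambda m: weights[m] * rates[m])

def update_throughput(avg, served, beta):
    """Stochastic approximation update of the average throughputs."""
    return [(1.0 - beta) * a + beta * s for a, s in zip(avg, served)]

# Hypothetical two-user case: user 0 has the better channel but has already
# received a much larger average throughput than user 1.
rates = [10.0, 2.0]
avg = [10.0, 1.0]
print(schedule_slot(rates, [1.0, 1.0]))        # MSR choice
print(schedule_slot(rates, pf_weights(avg)))   # PF choice
```

In this example MSR keeps serving the user with the best channel, whereas PF switches to the user whose rate is high relative to what it has received so far.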

Modified largest weighted delay first (M-LWDF) rule
The M-LWDF scheduler was proposed by Andrews et al. [36] for single-carrier CDMA networks with a shared downlink channel and was proved to be throughput optimal. It is based on a channel- and queue-aware scheduling rule that considers the waiting time in the queues, the instantaneous potential transmission rates and the maximum tolerable delay requirements. At each time slot $t$, the M-LWDF scheduler aims at choosing the best combination of queueing delay and potential transmission rate, serving the users that maximize the sum of marginal utility functions [36], each weighting the potential transmission rate by the product $c_m(t)\,W_{\mathrm{HOL},m}(t)$, where $c_m(t)$ are arbitrary positive constants that can be used to set different priority levels between traffic flows. The M-LWDF scheduling rule remains throughput optimal if, for all or some queues, the head-of-line delay $W_{\mathrm{HOL},m}(t)$ is replaced by the queue length $Q_m(t)$. Thus, the scheduler can be easily implemented by time stamping arriving data packets of all MSs, and/or keeping track of the corresponding queue lengths.
In order to guarantee that users with absolute delay requirement $\check{D}_m$ and maximum outage delay probability requirement $\xi_m$ will be satisfied, the authors of [36] propose to set the values of $c_m(t)$ as a function of these requirements, thereby providing QoS differentiation among users' flows.
As stated by Andrews et al. [37], services with QoS constraints on the minimum sustainable throughput $\check{T}_m$ can also be supported by the M-LWDF scheduling rule, provided that the scheduler is used in conjunction with a token bucket control. In this case, each queue $m$ is associated with a virtual token bucket, with tokens arriving at a constant rate $\check{T}_m$. At each time slot, queues are served according to the M-LWDF rule, with $W_{\mathrm{HOL},m}(t)$ denoting the delay of the head-of-line token in bucket $m$ instead of the head-of-line packet delay for queue $m$. After serving a given queue $m$, the number of tokens in the corresponding bucket must be reduced by the actual amount of data served.
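The token bucket mechanism can be sketched as follows. The sketch assumes fluid tokens measured in bits, so that with a constant token arrival rate the delay of the head-of-line token equals the bucket content divided by the token rate. It also assumes that the M-LWDF metric of a user is the product of its constant, its delay and its potential rate, and it clamps the bucket at zero, which is a simplification introduced here. The class and function names are hypothetical.

```python
class TokenBucket:
    """Virtual token bucket fed at a constant rate (bits per second)."""

    def __init__(self, rate_bps):
        self.rate = rate_bps
        self.tokens = 0.0

    def tick(self, t_slot):
        """Add the tokens arriving during one slot of duration t_slot."""
        self.tokens += self.rate * t_slot

    def hol_delay(self):
        """Delay of the head-of-line token under constant-rate arrivals."""
        return self.tokens / self.rate

    def serve(self, bits):
        """Remove as many tokens as bits actually served (clamped at zero,
        an assumption of this sketch)."""
        self.tokens = max(0.0, self.tokens - bits)

def mlwdf_pick(c, delays, rates):
    """Index of the user maximizing c * delay * potential rate."""
    return max(range(len(rates)), key=lambda m: c[m] * delays[m] * rates[m])

bucket = TokenBucket(rate_bps=1000.0)
bucket.tick(0.5)
bucket.tick(0.5)
print(bucket.hol_delay())
print(mlwdf_pick([1.0, 1.0], [0.1, 0.5], [10.0, 4.0]))
```

A flow that has received less than its guaranteed rate accumulates tokens, so its virtual delay grows and its scheduling priority rises until the deficit is served.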

Exponential (EXP) rule
The EXP scheduler, proposed by Shakkottai and Stolyar [38], is also based on a channel- and queue-aware throughput optimal scheduling rule that considers the waiting time in the queues, the instantaneous potential transmission rates and the maximum tolerable delay requirements. It was proposed for single-carrier CDMA networks with a shared downlink channel but, similarly to M-LWDF, it can easily be extended to multichannel scenarios. In this case, at each time slot $t$, the EXP scheduler serves the users maximizing the sum of marginal utility functions given in [35]. As in the M-LWDF scheduler, the head-of-line delay $W_{\mathrm{HOL},m}(t)$ can be replaced, for all or some queues, by the queue length $Q_m(t)$ without affecting the throughput optimality of this strategy. Furthermore, if providing a minimum throughput $\check{T}_m$ to a given flow is a goal, the EXP rule can also be modified by introducing a virtual token queue, where tokens arrive at a constant rate $\check{T}_m$, and serving the queue according to the EXP rule, with $W_{\mathrm{HOL},m}(t)$ denoting the delay of the head-of-line token in bucket $m$.
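For illustration, the following Python sketch computes EXP-rule weights using the form commonly quoted in the scheduling literature, in which each user's weighted delay is compared with the average weighted delay across users and the difference is normalized by one plus the square root of that average. This is an assumption made here for the example. The exact marginal utility used in the article is not reproduced, and all names are hypothetical.

```python
import math

def exp_rule_weights(a, delays, gamma=None):
    """EXP-rule weights: gamma_m * exp((a_m*W_m - mean(aW)) / (1 + sqrt(mean(aW))))."""
    n = len(delays)
    gamma = gamma or [1.0] * n
    aw = [ai * wi for ai, wi in zip(a, delays)]
    mean_aw = sum(aw) / n
    scale = 1.0 + math.sqrt(mean_aw)
    return [g * math.exp((x - mean_aw) / scale) for g, x in zip(gamma, aw)]

# Equal delays give equal weights; unequal delays favor the most delayed user.
print(exp_rule_weights([1.0, 1.0], [2.0, 2.0]))
print(exp_rule_weights([1.0, 1.0], [0.0, 4.0]))
```

The exponential makes the rule behave like a channel-aware scheduler while delays are balanced, and lets it react sharply once one user's weighted delay drifts above the average.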

MDU rule
To efficiently support delay-sensitive applications, Song et al. [22] proposed another joint channel- and queue-aware scheduling approach, known as the MDU rule, which maximizes the total utility with respect to average delays or average waiting times in the queues. Generalizing the marginal utility functions proposed by [39], the MDU scheduling rule can be treated in the unified optimization framework defined in (34) by means of marginal utility functions parameterized, for each MS $m$, by a set of two constants used to differentiate between heterogeneous services and by a delay threshold related to the maximum tolerable delay $\check{T}_m$. In [39], based on the corresponding required QoS, the two constants were set to {1, 1.5} and the delay threshold to 25 ms for packet-switched voice with end-to-end delay required to be less than 100 ms, to {0.6, 1} and 100 ms for good-quality streaming transmission requiring end-to-end delays between 150 and 400 ms and, finally, to {0.5, 0} and 100 ms for BE traffic. With these settings, MDU scheduling for best-effort traffic reduces to PF scheduling.

Other scheduling rules
Although not treated in this article, the unified cross-layer optimization approach defined in (34) can also be extended to scheduling rules such as those proposed in [16,23,40-42]. Notably, Al-Manthari et al. [41] propose the use of utility functions that are based on both the throughput and the average delay.

Unified optimization framework
The optimization problem formulated in (34) is general enough to account for different power and rate allocation strategies, either with or without FC. For clarity of presentation, the following acronyms will be used: uniform power allocation (UPA), adaptive power allocation (APA), continuous rate allocation (CRA), discrete rate allocation (DRA) and FC. Furthermore, since optimization is performed on a slot-by-slot basis, from this point onwards the time dependence (i.e., (t)) of all the variables will be dropped.

UPA without FC
Let us assume a system where the scheduling and rate allocation schemes do not consider the FC. In this case, problem (34) reduces to problem (44). Let us also assume that the BS transmit power P T is uniformly allocated to all subbands. In this case, if subband b is allocated to user m * b , then the subband exclusive allocation constraint (i.e., p ∈ P) forces that $p_{m,b} = P_T / N_b$ if $m = m_b^*$ and $p_{m,b} = 0$ otherwise, for all b. Thus, using (20) in (44) it is straightforward to show that subband b must be allocated to the MS m * b that maximizes the objective of (44) on that subband, with r m, b obtained as in either (23), for the DRA case, or (25), for the CRA case.
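The per-subband selection under UPA can be illustrated with a minimal Python sketch. It is not the article's exact rule: it assumes that each user m carries a scalar scheduling weight (here `weights[m]`, a hypothetical name standing in for the marginal utility), that the per-subband metric is the weighted rate, and that the continuous rate takes the form log2(1 + p·γ/Λ). The function name `upa_allocate` and its arguments are illustrative only.

```python
import math


def upa_allocate(weights, gains, total_power, coding_gap=1.0):
    """Assign each subband to the user with the largest weighted rate.

    weights[m]  : scheduling weight of user m (assumed to be given)
    gains[m][b] : gain-to-noise ratio of user m on subband b (linear scale)
    total_power : BS transmit power, split evenly across all subbands
    coding_gap  : SNR gap (1.0 corresponds to the Shannon limit)
    """
    n_users = len(gains)
    n_subbands = len(gains[0])
    p_b = total_power / n_subbands  # uniform power per subband
    owner, rates = [], []
    for b in range(n_subbands):
        # Assumed continuous-rate expression, in bits per symbol.
        r = [math.log2(1.0 + p_b * gains[m][b] / coding_gap)
             for m in range(n_users)]
        best = max(range(n_users), key=lambda m: weights[m] * r[m])
        owner.append(best)
        rates.append(r[best])
    return owner, rates


if __name__ == "__main__":
    print(upa_allocate([1.0, 1.0], [[10.0, 1.0], [1.0, 10.0]], 2.0))
```

Because the power is fixed, the subbands decouple completely and the cost is one pass over users per subband, which is consistent with the linear complexity claimed for the UPA schemes.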

APA without FC
The objective function in (44) is concave, but P is a highly non-convex discrete constraint space. Fortunately, problem (44) is separable across the subbands and, as stated in [9,27], it can be approached by using Lagrange duality principles. With μ denoting the Lagrange multiplier associated with the power constraint, the Lagrangian of (44) can be expressed as and the dual problem can then be written as [43] g(p, μ) = min Now, using the subband exclusive allocation constraint and the separability of power variables across subbands, the dual problem can be simplified as [26] g(p, μ) = min (49) The solution to the simplified dual problem is given by optimizing (49) over all (p, μ) ≽ 0. This optimization can be done iteratively and coordinate-wise, starting with the p variables and continuing with μ.

Optimizing the dual function over p
CRA: When r m, b is defined as in (25), and for a given value of μ, the innermost maximization in (49) provides a multilevel water-filling closed-form expression for the optimal power allocation given by where [x] + ≜ max{0, x}. Now, using (50) in (49) yields Hence, for a fixed dual variable μ, the subband b will be allocated to MS m * b satisfying DRA: In this case r m, b is a discontinuous, non-differentiable function. However, the approach proposed in [9, Chapter 3] can be applied to arrive at the optimal solution. Using (23) the set of non-negative real numbers (i.e., R + ) can be subdivided, for each MS m and subband b, into Furthermore, given that μ and p m, b belong to R + , if a power allocation p m, b is used such that As a consequence, there only exist N k candidate power allocations from which the one maximizing where Furthermore, as in the CRA case, given μ and p * m,b , the subband b must be allocated to MS m * b satisfying (53).
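For the CRA case, the inner optimization for a fixed μ can be sketched in Python as follows. This is an illustration under stated assumptions rather than a transcription of (50) to (53): the rate is assumed to be log2(1 + γ·p/Λ) and the per-user objective on a subband is assumed to be w·r − μ·p, which gives the water-filling level w/(μ ln 2). The names `waterfill_power` and `allocate_for_mu` are hypothetical.

```python
import math


def waterfill_power(weight, gain, mu, gap=1.0):
    """Power maximizing weight*log2(1 + gain*p/gap) - mu*p over p >= 0."""
    return max(0.0, weight / (mu * math.log(2)) - gap / gain)


def allocate_for_mu(weights, gains, mu, gap=1.0):
    """For a fixed multiplier mu, pick one user and one power per subband."""
    n_users, n_subbands = len(gains), len(gains[0])
    owner, power = [], []
    for b in range(n_subbands):
        best_m, best_p, best_val = None, 0.0, -math.inf
        for m in range(n_users):
            p = waterfill_power(weights[m], gains[m][b], mu, gap)
            rate = math.log2(1.0 + gains[m][b] * p / gap)
            val = weights[m] * rate - mu * p  # per-subband dual objective
            if val > best_val:
                best_m, best_p, best_val = m, p, val
        # A subband that receives no power is left unassigned.
        owner.append(best_m if best_p > 0.0 else None)
        power.append(best_p)
    return owner, power


if __name__ == "__main__":
    print(allocate_for_mu([1.0, 1.0], [[10.0, 1.0], [1.0, 10.0]], 0.5))
```

A larger μ makes power more expensive, lowers every water level and eventually switches subbands off, which is the mechanism the outer search over μ exploits to meet the total power constraint.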

Optimizing the dual function over μ
Once the optimal vector p* is known for a given μ, the dual optimization problem (49) reduces to a minimization over μ alone. Using standard properties of dual optimization problems [9,27], it can be shown that this problem is convex with respect to μ, and thus derivative-free line search methods such as Golden-section or Fibonacci search can be used to determine μ*. Once μ* has been found, it can be used to obtain the optimal power, subband and rate allocation for each of the data flows in the system.
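The outer search over μ can be sketched with a plain golden-section routine. The example below is a simplified illustration, not the article's algorithm: it uses a single user with unit weight, a coding gap of 1 and an assumed rate log2(1 + γ·p), so that the dual function has a closed form and the result can be checked against classical water-filling. The function names, the search bracket [1e-3, 10] and the tolerance are assumptions.

```python
import math


def golden_section_min(f, lo, hi, tol=1e-7):
    """Minimize a unimodal function f on [lo, hi] without derivatives."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:  # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = f(x1)
        else:        # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = f(x2)
    return 0.5 * (lo + hi)


def powers(mu, gains):
    """Water-filling powers for a single unit-weight user (gap = 1)."""
    return [max(0.0, 1.0 / (mu * math.log(2)) - 1.0 / g) for g in gains]


def dual(mu, gains, total_power):
    """Dual function: maximized Lagrangian for a given multiplier mu."""
    value = mu * total_power
    for g, p in zip(gains, powers(mu, gains)):
        value += math.log2(1.0 + g * p) - mu * p
    return value


if __name__ == "__main__":
    gains, p_total = [10.0, 1.0], 2.0
    mu_star = golden_section_min(lambda mu: dual(mu, gains, p_total), 1e-3, 10.0)
    print(mu_star, sum(powers(mu_star, gains)))
```

At the minimizer the allocated powers add up to the power budget, which is the usual complementary-slackness check for this kind of dual search.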

UPA and APA with FC
When considering the so-called FC, the unified optimization problem in (34) can be rewritten as This problem belongs to the class of nonlinear integer optimization programs, for which no general method of finding the global optimum is available. In an attempt to provide a fast and efficient suboptimal solution to the joint scheduling and resource allocation problem, an iterative searching algorithm providing quasi-optimal solutions is proposed in Algorithm 1. Our approach is based on a modified version of the optimal solutions presented in Sections 5.1 and 5.2. If necessary, the proposed algorithm allocates a subband b ∈ N b per iteration i, assuming that queue length and available transmit power (APA cases only) are updated, in each iteration, by taking into account the data rate and power (APA cases only) allocated to subband b.

UPA with FC
When implementing UPA strategies, if subband b is allocated to user m * b in iteration i, then the subband exclusive allocation constraint (i.e., p ∈ P) forces that $p_{m,b} = P_T / N_b$ if $m = m_b^*$ and $p_{m,b} = 0$ otherwise, for all b. Thus, in iteration i, subband b is allocated to MS m * b satisfying where m is the updated queue length of user m at the beginning of this iteration, with Q The per-subband data rate r m, b is obtained as in either (23), for the DRA case, or (25), for the CRA case.
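One plausible reading of this iterative procedure is sketched below in Python. Algorithm 1 is not reproduced in this section, so the details are assumptions: the FC is modeled by capping the bits served on a subband at the bits still waiting in the user's queue, the selection metric is the weighted number of useful bits, and subbands are visited in index order. The names `upa_fc_allocate`, `weights`, `bits_per_subband` and `queues` are hypothetical.

```python
def upa_fc_allocate(weights, bits_per_subband, queues):
    """Greedy per-subband allocation with queue-limited service.

    weights[m]             : scheduling weight of user m (assumed given)
    bits_per_subband[m][b] : bits user m could send on subband b in one
                             slot under uniform power
    queues[m]              : bits waiting in the queue of user m
    """
    n_users = len(queues)
    n_subbands = len(bits_per_subband[0])
    remaining = list(queues)  # queue lengths updated after each iteration
    owner, served = [], []
    for b in range(n_subbands):
        # Bits that would actually be useful to each user on this subband.
        useful = [min(bits_per_subband[m][b], remaining[m])
                  for m in range(n_users)]
        best = max(range(n_users), key=lambda m: weights[m] * useful[m])
        if useful[best] <= 0:
            # Nothing left to send: the subband stays unallocated.
            owner.append(None)
            served.append(0)
            continue
        owner.append(best)
        served.append(useful[best])
        remaining[best] -= useful[best]
    return owner, served, remaining


if __name__ == "__main__":
    print(upa_fc_allocate([1, 1], [[100, 100, 100], [80, 80, 80]], [150, 500]))
```

In the printed example the user with the better channel wins the first subband, but once its queue is nearly empty the remaining subbands go to the other user, which is the behavior the FC is meant to produce.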

APA with FC
When implementing APA strategies, assuming that vector p* (i) for a given μ (i) fulfils the FC, then the corresponding dual optimization problem, for iteration i, reduces to which, as previously stated, can be solved by using derivative-free line search methods. Once μ* (i) has been found, it can be used to obtain power, subband and rate allocation.
In the APA/CRA scheme the optimal power allocation in (50) must be redefined in order to fulfil the FC. Thus, where which has been obtained from (25), is the minimum power required to fulfil the FC in iteration i. Subband b will be allocated to MS m * b satisfying In the APA/DRA scheme, the power allocation in iteration i is obtained as where m,b is obtained by redefining (58) as Furthermore, as in the APA/CRA case, given μ* (i) and m,b , the subband b must be allocated to MS m * b satisfying (67).

Numerical results
Based on the unified cross-layer framework previously described, this section is devoted to the performance comparison of different scheduling and resource allocation algorithms in the downlink of a MIMO-OFDMA network. The following performance metrics will be discussed:
- Average system throughput: average number of bits per second transmitted by the BS. It is obtained as the average sum-rate achieved by the whole set of users connected to the BS.
- Average delay: average amount of time the bits spend in the queue at the BS in addition to the transmission time. Notice that delay can be interpreted as an indirect throughput measure. If a given MS is not allocated enough resources, the achievable throughput is below the traffic arrival rate, the corresponding queue becomes unstable and the delay grows toward infinity. On the other hand, if the MS is overprovisioned, the traffic arrival rate is below the maximum achievable throughput, the queue is stable, and the mean delay remains bounded.
- Fairness: Jain's fairness index [44] will be used to calculate fairness among users of the same class of service. With Ω m denoting the performance metric for user m (i.e., throughput or average delay), Jain's fairness index is calculated as $JFI_C = \left(\sum_{m \in C} \Omega_m\right)^2 / \left(|C| \sum_{m \in C} \Omega_m^2\right)$, where C is the set of users belonging to a given class of service and |C| denotes the cardinality of this set. Jain's fairness index is constrained to the range $1/|C| \le JFI_C \le 1$. If all the users in C get the same Ω m , then JFI C = 1 and maximum fairness is achieved. Lower values of the index indicate a higher variance in the achieved QoS, revealing unfairness in scheduling and resource allocation.
- Service coverage: percentage of users who achieve their QoS requirements in terms of minimum throughput or maximum allowable average or absolute delay.
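As a small illustration of the fairness metric listed above, Jain's index can be computed in a few lines of Python. The function name `jain_fairness_index` is illustrative, and the input is assumed to be a list of non-negative per-user metrics (throughputs or average delays) for one class of service.

```python
def jain_fairness_index(values):
    """Jain's fairness index of a list of non-negative per-user metrics."""
    n = len(values)
    sum_sq = sum(v * v for v in values)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one non-zero value")
    total = sum(values)
    return (total * total) / (n * sum_sq)


if __name__ == "__main__":
    print(jain_fairness_index([1.0, 1.0, 1.0, 1.0]))  # all users equal
    print(jain_fairness_index([1.0, 0.0, 0.0, 0.0]))  # one user gets everything
```

The two printed cases correspond to the extremes of the index: equal metrics give the maximum value of 1, while a single served user among four gives the minimum value of 1/4.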

Simulation configuration
Let us consider a single-cell downlink scenario where the BS, transmitting with a power of P T = 37 dBm over a carrier frequency f 0 = 2 GHz, is assumed to be located at the center of a circular coverage area with a radius R = 500 m. This BS serves a set of N u MSs that are uniformly distributed over the whole coverage area. Unless otherwise specified, a default 2 × 2 MIMO configuration will be assumed. The entire system bandwidth is B = 5.6 MHz, and is divided into N b = 64 orthogonal subbands, each with a bandwidth B b = 87.5 kHz and consisting of N sc = 8 adjacent subcarriers. Transmission between the BS and active MSs is organized in time slots of duration T s = 2.0571 ms, and each of these slots consists of N o = 20 OFDM symbols of duration (without considering the cyclic prefix) T o = 91.4286 μs. Thus, the basic resource allocation unit is formed by 8 adjacent subcarriers and 20 OFDM symbols. Without loss of generality, most of the chosen parameters are closely aligned with those considered in the Mobile WiMAX standard (see, for instance, [45, Table 2.3]).
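The numerical values quoted above are mutually consistent, as the following short Python check suggests. It relies on the standard OFDM relation between subcarrier spacing and useful symbol duration (T o = 1/Δf). The cyclic prefix ratio it derives (close to 1/8) is an inference from the quoted slot and symbol durations, not a value stated in the text, and the function names are illustrative.

```python
def ofdma_numerology(bandwidth_hz, n_subbands, n_subcarriers, n_symbols, slot_s):
    """Derive subband width, subcarrier spacing and symbol timing."""
    subband_hz = bandwidth_hz / n_subbands
    spacing_hz = subband_hz / n_subcarriers
    symbol_s = 1.0 / spacing_hz             # useful symbol, no cyclic prefix
    cp_s = slot_s / n_symbols - symbol_s    # inferred cyclic prefix duration
    return {
        "subband_hz": subband_hz,
        "spacing_hz": spacing_hz,
        "symbol_s": symbol_s,
        "cp_ratio": cp_s / symbol_s,
    }


def dbm_to_watt(dbm):
    """Convert a power level in dBm to watts."""
    return 10.0 ** (dbm / 10.0) / 1000.0


if __name__ == "__main__":
    print(ofdma_numerology(5.6e6, 64, 8, 20, 2.0571e-3))
    print(dbm_to_watt(37.0))
```

Running the check gives a subband width of 87.5 kHz and a useful symbol duration of about 91.4286 μs, matching the quoted figures, with a transmit power of roughly 5 W.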
When using DRA strategies, the set of achievable transmission rates in bits/symbol has been fixed to {0, 0.5, 1, 1.5, 2, 3, 4, 4.5}, the coding gap has been set to Λ m = 3, and the switching thresholds between transmission modes have been obtained as . In contrast, a coding gap of Λ m = 1 (Shannon's capacity limit) has been set when using CRA strategies, whose performance serves as a benchmark against which practical DRA strategies can be measured.
The channel model describing the path-losses, shadowing effects and frequency-, time- and space-selective fading experienced by the transmitted signal on its way from the BS to the MSs has been implemented by using Stanford University Interim (SUI) channel model 4 [46] with a shadow fading standard deviation of 6 dB. The power delay profile of this model is characterized by L p = 3 Rayleigh distributed paths with power gains σ 2 m,0 = 0 dB, σ 2 m,1 = −4 dB and σ 2 m,2 = −8 dB, and corresponding delays τ 0 = 0 μs, τ 1 = 1.5 μs and τ 2 = 4 μs. Moreover, a per subcarrier AWGN power of σ 2 ν = −163.6 dBW has been assumed at the receiver front-end.
To demonstrate the ability of our proposed unified framework to schedule and allocate resources to service flows with different QoS requirements, three traffic classes are considered, i.e., real time (RT), non real time (nRT) and BE.As in [22], traffic arrivals have been modeled as Poisson random variables, with a mean that depends on the average arrival rate per flow (measured in bits/s).Without loss of generality, the maximum tolerable delays (Ď m ) for each traffic class have been set to 100 ms (RT), 2 s (nRT) and 20 s (BE), and the outage delay probabilities (ξ m ) to 0.01 (RT), 0.01 (nRT) and 0.1 (BE).
All the numerical results presented in this article have been obtained by averaging the outcomes of a dynamic discrete event simulation performed over 60 scenarios, each with a particular random distribution of MSs over the coverage area, and transmitting 15,000 slots per scenario.To guarantee that the presented results correspond to the steady-state of the system, initial transitory periods of 1,000 slots per scenario, which are not accounted for in the performance evaluation process, have been used.

Comparing scheduling rules
The performance metrics versus traffic load for MSR, PF, EXP and MLWDF scheduling rules are compared in Figure 2 for an adaptive MIMO-OFDMA system serving N u = 20 RT users with the same average arrival rate. The use of uniform power and CRA strategies over a 2 × 2 MIMO system has been considered. Furthermore, since the FC can be implemented with all scheduling rules, performance results have been obtained for each scheduling rule either with or without FC.
Without FC, the EXP and MLWDF rules provide the best joint results in terms of throughput, delay, Jain's fairness indexes and service coverage, with MLWDF achieving a slightly higher throughput, lower delay and better service coverage than EXP, at the cost of lower throughput and delay fairness indexes (see endnote c). The PF scheduler, although it achieves quite a good result in terms of average throughput per flow, fails to meet the QoS requirements. In fact, the PF rule can only guarantee a 99% service coverage for average arrival rates per flow less than 0.3 Mbps compared to the 0.8 and 1 Mbps that can be guaranteed by the EXP and MLWDF rules, respectively. The MSR scheduling rule, which only considers CSI as a quality indicator, allocates all the resources to the users with favorable channel quality conditions, and those users experiencing bad channel quality conditions suffer from starvation. Hence, as it wastes resources, the MSR rule is not capable of achieving queue stability and presents a very low average throughput and an infinite average delay per flow (see endnote d), irrespective of the average traffic arrival rate.
Except for a slight increase in delay Jain's fairness index, which is only perceptible for light or moderate traffic loads, the effect of implementing FC on the performance of EXP and MLWDF scheduling rules is very small. This can be explained by the fact that, when calculating the weighting coefficients w m (t), the EXP and MLWDF schedulers use QSI and thus, the performance gains provided by the introduction of the FC are just incremental. On the contrary, the performance improvement induced by the implementation of FC is considerable for the PF rule, and especially important for the MSR scheduler, neither of which uses QSI when calculating w m (t). In fact, even though the PF and MSR rules provide poorer Jain's fairness indexes than those delivered by the EXP and MLWDF rules, they can guarantee a 99% service coverage for average arrival rates per flow less than approximately 0.8 Mbps, which is almost the same as that guaranteed when using the EXP scheduler.

Comparing allocation strategies
Figure 3 shows the performance metrics versus traffic load for a MIMO-OFDMA system using the MLWDF scheduling rule and different combinations of UPA, APA, DRA and CRA strategies, with and without FC. A set of N u = 20 RT users with the same average arrival rate has been assumed. As can be observed, APA-based strategies improve the performance of UPA-based ones. Nevertheless, this performance improvement, although noticeable for discrete rate-based systems, becomes almost negligible when using continuous rate-based schemes. This result suggests that using AMC schemes with a large set of modulation formats combined with powerful channel codes with adaptive coding rates can make power allocation strategies unnecessary.
The effect of implementing FC on the system performance metrics is practically identical irrespective of the power and rate allocation strategies implemented at the cross-layer resource allocation unit.The average throughput per flow, delay, throughput JFI and service coverage are basically unaffected, and only an improvement in delay JFI is obtained with light and moderate traffic loads.Although not shown in the graphs, when implementing APA strategies, the use of FC also introduces a decrease in power consumption.This is due to the fact that resources (power and subbands) are only allocated when necessary, that is, when there is enough information in the queues ready to be transmitted.

Comparing MIMO configurations
The effects of using different N T × N R MRT/MRC MIMO configurations on the average throughput per flow and service coverage are depicted in Figure 4. Results have been obtained for an adaptive MIMO-OFDMA system using the MLWDF scheduler, uniform power and CRA strategies, without FC, to serve N u = 20 RT users with the same average arrival rate. As can be observed, increasing the number of transmit and/or receive antennas at the PHY can significantly improve the system capacity. In fact, the increase of N T and/or N R translates into a widening of the stability region, which shows the benefit of employing MIMO spatial diversity at the PHY to support statistical QoS for upper layer protocols. For instance, Figure 4b shows that, using this particular configuration, a single transmit/receive antenna system can only guarantee a 99% service coverage for average arrival rates per flow less than 0.45 Mbps, compared to the 0.95 or 1.5 Mbps that can be guaranteed by using 2 × 2 or 4 × 4 MIMO configurations, respectively.

Performance results for heterogeneous traffic scenarios
Figure 5 shows the performance metrics versus traffic load for a MIMO-OFDMA system serving a set of heterogeneous traffic flows. The behavior of three scheduling rules, namely, MLWDF, EXP and MDU, is compared for a system implementing UPA and CRA, without FC. Simulations have been performed assuming that N = 10 users are always active in the system. Furthermore, based on the required QoS of the different traffic flows, the parameters of the MDU scheduler have been set to m = {1, 1.5} and Wm = 25 ms for RT users, m = {0.6, 1} and Wm = 500 ms for nRT users and, finally, m = {0.5, 0} and Wm = 500 ms for BE users. As can be observed, cross-layer scheduling and resource allocation strategies are able to fairly allocate resources among traffic classes, according to the assigned priorities c m (t), obtained from the QoS requirements. Obviously, RT users, which exhibit stringent absolute delay requirements, tend to be allocated more resources than nRT and BE users as the arrival data rates increase. For the same reasons, nRT users are allocated more resources than BE users. The result is that, although for light traffic arrivals the three classes of service can achieve good performance figures, for moderate traffic arrivals, RT and nRT users can only maintain acceptable performance at the cost of a decrease in the performance of BE users. Furthermore, for heavy traffic arrivals, the performance of RT users can only be maintained by sacrificing that of nRT and BE users.
In this particular scenario, except for very heavy traffic arrivals, the MLWDF scheduling rule provides the best performance results in terms of average throughput per flow and service coverage at the cost of a worse behavior of the delay Jain's fairness index.The MDU scheduler provides the best results in terms of both throughput and delay Jain's fairness indexes for RT and nRT traffic classes, but such a fair behavior is obtained at the cost of service coverage.The EXP scheduler sacrifices the average throughput and delay per flow of BE users to obtain a good trade-off between service coverage and delay Jain's fairness index.

Conclusions
The emergence of state-of-the-art and next-generation wireless communications networks based on adaptive MIMO-OFDMA PHY access schemes will enable the support of a wide range of multimedia applications with heterogeneous QoS requirements. In order to optimize the resource utilization while maintaining the QoS provided to as many users as possible, these systems require adaptive scheduling and resource allocation algorithms able to strike a proper trade-off between efficiency and fairness. In this context, using tools from information and queueing theories, mathematical convex programming, and stochastic approximation, a unified framework for channel- and queue-aware QoS-guaranteed cross-layer scheduling and resource allocation algorithms has been developed in this article. The proposed unified framework generalizes previous work on this topic by encompassing different types of traffic, different utility functions measuring user's satisfaction, uniform and adaptive power allocation, continuous and discrete rate allocation, and protocols with different amounts of channel- and queue-awareness. System parameters and QoS requirements have been projected into utility functions, which have then been used to formulate a unified constrained utility maximization problem, whose main aim is to balance the efficiency and fairness of resource allocation. Optimal solutions for this problem have been obtained for the UPA schemes, and novel quasi-optimal algorithms have been proposed for the APA strategies, exhibiting complexities that are linear in the number of resource units and users.
The proposed unified optimization framework allows for a fair performance comparison of different scheduling rules, different allocation strategies, different MRT/MRC-based MIMO configurations, and different traffic scenarios. Simulation results presented in this article have shown that:
- Without FC, the EXP and MLWDF rules provide the best joint performance results, with MLWDF achieving a slightly higher throughput, lower delay and better service coverage than EXP, at the cost of lower throughput and delay fairness indexes. The PF and MSR scheduling rules, which only consider CSI as a quality indicator, fail to provide QoS. However, although implementing FC has a negligible impact on the performance of EXP and MLWDF scheduling rules, the performance improvement induced by FC is remarkable for the PF and MSR schedulers.
- APA-based strategies improve the performance of UPA-based ones. Nevertheless, this performance improvement, although noticeable for DRA systems, becomes almost negligible when using CRA schemes. Thus, using AMC schemes with a large set of modulation and coding formats can make power allocation strategies unnecessary.
- Increasing the number of transmit and/or receive antennas at the PHY translates into a widening of the stability region, which shows the benefit of employing MIMO spatial diversity to support statistical QoS provision to upper layer protocols.
- Channel- and queue-aware cross-layer scheduling and resource allocation strategies can fairly allocate resources among heterogeneous traffic classes, with different scheduling policies (e.g., EXP, MDU and MLWDF) providing different trade-offs between efficiency, delay, fairness and service coverage.
Simulation results have demonstrated the validity and merits of the proposed cross-layer unified approach. However, the optimization problem treated in this article is only applicable to single cell scenarios using MRT/MRC-based MIMO techniques. Therefore, to widen its application scope, current work focusses on extending the cross-layer unified approach to distributed scheduling and resource allocation in generalized MIMO-OFDMA multicellular wireless heterogeneous networks, possibly including more sophisticated MIMO techniques, one- and two-way relays, shared relays, femto-cells and/or clusters of coordinated BSs.

Endnotes
a LTE was introduced in 3GPP Releases 8 and 9 as a major step forward for UMTS-based networks, and LTE-Advanced is the fourth generation (4G) LTE standard in 3GPP Release 10.
b A scheduling algorithm is said to be throughput optimal if it can keep all the queues stable whenever this is at all feasible.
c Typically, the delay Jain's fairness index is high for light traffic arrival rates because, in this case, all the flows can be served after very low average waiting times in the queues. For moderate traffic arrival rates, the variance of per flow waiting times in the queues increases and, consequently, the delay Jain's fairness index decreases. Heavy traffic arrivals tend to cause queue instability, with almost all the flows experiencing large average delays, thus producing again an increase of the delay fairness index.
d The average delay per flow would be infinite if simulations were performed over an infinite period of time.
transmission scheme. Denoting by $d^{(c,o)}_{m,b}(t)$ the symbol to be sent to MS m over subcarrier $c \in \{1, \ldots, N_{sc}\}$ of subband b and OFDM symbol $o \in \{1, \ldots, N_o\}$ during time slot t, the corresponding $N_T \times 1$ transmitted vector can be expressed as $x^{(c,o)}_{m,b}(t) = \sqrt{p_{m,b}(t)/N_{sc}}\; \upsilon_{m,b}(t)\, d^{(c,o)}_{m,b}(t)$,

Figure 1 System model for cross-layer downlink scheduling and resource allocation over an OFDMA wireless network.

m
(t)  , where τ (A) m (t) denotes the arrival time of the HOL packet in the queue of user m.Hence, a prediction of the HOL delay at the end of time slot t can be readily obtained as

3.1 Power allocation
Let p b (t) = [p 1,b (t) ... p Nm, b (t)] T denote the vector of power allocation values for subband b and time slot t.

Figure 5 Performance metrics for heterogeneous traffic.

Table 1
List of selected symbols (in order of appearance) } is the set of quantitative QoS measures used to characterize the satisfaction of MS m (e.g., throughput T m (t), average delay W m (t), queue length Q m (t) or HOL delay W HOL,m (t)) and