Optimal resource allocation in femtocell networks based on Markov modeling of interferers’ activity
EURASIP Journal on Wireless Communications and Networking volume 2012, Article number: 371 (2012)
Abstract
Femtocell networks offer a series of advantages with respect to conventional cellular networks. However, a potential massive deployment of femto-access points (FAPs) poses a major challenge in terms of interference management, which requires proper radio resource allocation techniques. In this article, we propose alternative optimal power/bit allocation strategies over a time–frequency frame based on a statistical modeling of the interference activity. Given the lack of knowledge of the interference activity, we adopt a Bayesian approach that provides the optimal allocation, conditioned on periodic spectrum sensing and on the estimation of the statistical parameters of the interference activity. We consider first a single FAP accessing the radio channel in the presence of a dynamic interference environment. Then, we extend the formulation to a multi-FAP scenario, where nearby FAPs react to the strategies of the other FAPs, still within a dynamic interference scenario. The multiuser case is first approached using a strategic noncooperative game formulation. Then, we propose a coordination game based on the introduction of a pricing mechanism that exploits the backhaul link to enable the exchange of parameters (prices) among FAPs.
Introduction
Femtocell networks are composed of cells having a coverage radius on the order of tens of meters, providing enhanced indoor coverage through the use of femto-access points (FAPs), or home-enhanced node Bs (HeNBs) in long-term evolution (LTE) terminology [1, 2]. A typical scenario is sketched in Figure 1, which shows the wireless links among femto user equipments (FUEs), macro user equipments (MUEs), macro base stations (MBSs) and FAPs. More specifically, the wireless links are classified as useful or interfering depending on whether they refer, respectively, to the link between a transmitter and its intended receiver or to other receivers falling within its coverage area. Being installed in residential areas, e.g., homes, offices, etc., the FAPs are typically interconnected with each other through a wired link, usually an ADSL subscriber line which allows access to a broadband Internet network, as depicted in Figure 1. One of the ideas proposed in this article is to exploit the backhaul to set up a local coordination among nearby FAPs to improve the efficiency of the radio resource management (RRM), without the presence of a centralized control.
Femtocells are becoming increasingly attractive because of their benefits to both cellular operators and subscribers. On the one hand, operators see femtocells as a way to improve indoor coverage and to offload wireless traffic from the macro cellular network to the wired network, thus releasing wireless channels to additional mobile users. On the other hand, subscribers see femtocells as a way to get higher-quality services, either higher data throughput or better voice quality, thanks to better indoor coverage and seamless connectivity.
Following the current evolution of the cellular standardization process, in this study we assume an LTE framework and we focus on the downlink channel, which uses OFDMA. In this context, femtocell networks offer advantages with respect to WiFi, as they avoid vertical handoff and offer better QoS.
In view of a potential massive deployment of FAPs, special attention has to be devoted to RRM. In fact, unlike MBSs, FAPs are typically installed by the subscribers and maintained without global planning, with no special consideration of traffic demands or of interference with other cells, either femto or macro. Hence, a dense deployment of FAPs might induce intolerable interference from FUEs to MUEs or to other FUEs. Interference management is then arguably one of the major challenges to be faced in femtocell networks.
The goal of this study is to propose an algorithm for optimizing power/bit allocation over a joint time–frequency domain, incorporating a statistical model of the macro-user activity. Since the interference is unknown, the proposed algorithm follows a Bayesian approach, which allocates power/bits over successive time/frequency slots depending on a preliminary sensing and estimation of the parameters of the interference model. We assume a Markov model for simplicity, but the approach can be generalized to more sophisticated models, such as those in [3, 4]. More specifically, in this study the interference over different frequency subchannels is modeled as a set of statistically independent homogeneous discrete-time Markov chains (DTMCs). We consider a single-user allocation first, where a single FAP finds the optimal resource allocation according to two alternative strategies: (i) maximize the expected rate, conditioned on the result of the sensing and estimation phase, under a transmit power constraint; (ii) minimize the transmit power under an expected rate constraint.
Opportunistic spectrum access (OSA) in multicarrier networks where the channel occupancy follows a Markovian evolution has already been studied in the framework of cognitive radio (CR), for example in [5, 6]. Chen et al. [5] develop an optimal OSA scheme aimed at jointly optimizing spectrum sensing and access policies. They assume that the secondary transmitter receives error-free ACK signals from the secondary receiver whenever the transmission is successful, and this information is used to track the state of the primary channels. Interestingly, Chen et al. [5] establish a separation principle that decouples the design of the spectrum sensor and of the access policy. A similar context is studied in [6, 7], where the authors combine learning and dynamic spectrum access. Both Chen et al. [5] and Unnikrishnan and Veeravalli [6] consider an objective function that depends only on the available cognitive bandwidth and put a constraint on the collision probability with the primary users. Anandkumar et al. [8] and Liu and Zhao [9] formulate the multiuser OSA problem as a decentralized multi-armed bandit problem [10]. In such a framework, each user learns the channel availability statistics and designs a channel access rule in order to maximize the transmission throughput (or equivalently minimize the system regret, defined as the loss in secondary throughput due to learning errors and collisions under distributed access). In [9], which extends the single-user policy proposed in [10] to the multiuser case, Liu and Zhao propose a family of distributed learning and access policies known as time-division fair share. For these policies, they prove the minimum growth rate of the system regret, which is shown to behave logarithmically with respect to the number of time slots. Moreover, Liu and Zhao [9] distinguish the case of a known number of secondary users from the case in which this number is unknown but estimated at each user through feedback.
An alternative scheme for distributed resource allocation between CRs incorporating aggregated interference control is analyzed in [11, 12], where the authors propose a form of real-time multi-agent reinforcement learning, known as decentralized Q-learning [13], to manage the aggregated interference. The objective function to be minimized is an expected discounted cumulative cost related to the difference between the effective signal-to-interference-plus-noise ratio (SINR) and a target SINR, which has to be guaranteed to the primary system. This SINR is measured at some control points located in the protection contour of the primary network and it is fed back to the secondary base stations, which adjust their transmit power accordingly. One of the interesting aspects of such an approach is that it is model-free and does not require knowledge of the transition probabilities of the underlying Markov process. Finally, Geirhofer et al. [14] propose an interference-aware resource allocation for OFDMA systems, based on the sensing and prediction of the ad hoc users from the infrastructure users.
Unlike the previous studies, in this article we propose a Bayesian radio access method enabling (possibly multiple) FAPs to allocate power/bits over a time–frequency grid based on the current belief on the interference level, as obtained from previous sensing. Since the interference cannot be known in advance, we use a Bayesian approach and formulate the utility function as the expected value of the utility conditioned on previous measurements. The goal is to relax the requirement on sensing time and allocate resources over a certain number of future time slots, depending on the interference model and on our prediction capability.
The article is organized as follows. We consider first the radio access of a single FAP and we maximize the expected rate, averaged over the interference activity model, under a transmit power constraint. In this case, the solution can be found in closed form and it represents a sort of generalized water-filling algorithm, with a water level depending on the interference activity probabilities. Then, we illustrate an alternative approach consisting in the minimization of the transmit power, subject to a constraint on the minimum average femto-user rate. Next, we generalize the proposed approaches to the multi-FAP scenario, where we analyze the interaction among FAPs using a game-theoretic approach. In particular, we consider first a purely competitive game, where each FAP adopts a purely selfish strategy. Since the competitive game might lead to inefficient Nash equilibria, we also propose a coordinated game where, thanks to the exchange of a few parameters through the backhaul link, the FAPs coordinate their actions to improve upon inefficient Nash equilibria and maximize the sum-rate or minimize the sum-power.
Single-user Bayesian adaptive allocation
Femtocell networks are fully compliant with cellular standards. Given the current evolution of 3G systems, in this article we are concerned with an LTE system and the goal is to allocate power over a time–frequency grid adaptively, as a function of the current occupancy. This implies that the channel and interference power must be sensed at the beginning of each frame. Given the low mobility of indoor users, the channel can be assumed to be nearly constant over the frame. However, the interference from macro-users may vary along the frame depending on the macro-user activity. A correct power allocation across time and frequency would require a noncausal knowledge of the interference, which is of course unavailable. To circumvent this problem, we propose a time–frequency resource allocation based on a Markov modeling of the interference activity over each frequency subchannel. More specifically, we assume that the activity of the macro-users over the frequency subchannels is modeled as a set of statistically independent homogeneous DTMCs. The parameters of the statistical model are assumed constant within the frame, but they may vary over successive frames. Each FUE estimates the interference power and the transition probabilities of the interference activity over each subchannel and feeds this information back to the associated FAP, which computes the optimal power allocation over a time–frequency frame, following a Bayesian approach.
In this section, we assume that the interferers do not react to the strategy of the FAP of interest. In the subsequent sections, we will extend the study to the case where the other FAPs react to the choice of nearby, interfering FAPs, thus generating an iterative process, whose stability properties will properly be studied.
Markovian interference model
The interference activity over each frequency subchannel is modeled as a DTMC. We use the random binary variable $S_{k,m}$ to indicate the macro activity over the $k$th subchannel at time $m$: $S_{k,m} = 1$ if the subchannel is busy, with interference power $\sigma_I^2(k,m)$, while $S_{k,m} = 0$ if the subchannel is idle. We consider different orders for the DTMC so that we can test the effect of the order on the performance of the proposed strategy. As an example, for the orders $L = 1,2,3$, we introduce the following transition probabilities.
where $k$ is the subchannel index and $(j)$, $(i,j)$, $(r,i,j)$ are, respectively, the states for $L = 1,2,3$, i.e., all the binary sequences in $\{0,1\}^L$. The probability of being in state $i \in \{0,1\}$ at time $m$ over the $k$th subchannel is denoted by $\Pi_i^{(k,m)} = \Pr(S_{k,m} = i)$. In the case of a first-order DTMC, starting from an initial time slot $m_0 = 1$, the probability $\Pi_i^{(k,m)}$ can be obtained recursively as
where $\omega_1^{(k)} := p_{00}^{(k)}$, $\mu_1^{(k)} := p_{11}^{(k)}$, whereas the initial state $(\Pi_0^{(k,1)}, \Pi_1^{(k,1)})$ is obtained by observing the channel state at the time slot of index $m_0$. Equation (4) can be written in compact matrix form as
where $\boldsymbol{\pi}^{(k,m)} = [\Pi_0^{(k,m)}, \Pi_1^{(k,m)}]^T$ and the entries $p_{jl}^{(k)}$ of the transition matrix $\mathbf{P}_1^{(k)}$ are given in (4). Let $\beta_{k,m} = \Pr(S_{k,m} = 0)$ and $\gamma_{k,m} = \Pr(S_{k,m} = 1)$ be the probabilities that channel $k$ at time $m$ is, respectively, idle and busy. Then, we can iteratively calculate them at time $m$ from Equation (4) as $\beta_{k,m} = \Pi_0^{(k,m)}$ and $\gamma_{k,m} = \Pi_1^{(k,m)}$. The generalization to higher orders is straightforward and the formulas are reported in Appendix 1, for convenience. In Appendix 1, we also report the formulas used to estimate the transition probabilities from the observations.
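As an illustration of the first-order case, the following minimal sketch propagates the idle/busy probabilities of one subchannel from an observed initial state. It assumes the standard two-state recursion $\Pi_0^{(m)} = p_{00}\Pi_0^{(m-1)} + (1-p_{11})\Pi_1^{(m-1)}$ and $\Pi_1^{(m)} = (1-p_{00})\Pi_0^{(m-1)} + p_{11}\Pi_1^{(m-1)}$; the function name and argument layout are hypothetical and not taken from the article.

```python
def occupancy_probabilities(p00, p11, initial_state, num_slots):
    """Return [(beta_m, gamma_m)] for m = 1..num_slots on one subchannel.

    p00, p11      : assumed first-order transition probabilities (omega, mu)
    initial_state : state observed in the first slot (0 = idle, 1 = busy)
    """
    # The observed slot is known with certainty.
    idle, busy = (1.0, 0.0) if initial_state == 0 else (0.0, 1.0)
    history = [(idle, busy)]
    for _ in range(num_slots - 1):
        # One step of the two-state chain.
        idle, busy = (p00 * idle + (1.0 - p11) * busy,
                      (1.0 - p00) * idle + p11 * busy)
        history.append((idle, busy))
    return history


if __name__ == "__main__":
    for m, (beta, gamma) in enumerate(occupancy_probabilities(0.9, 0.8, 0, 5), 1):
        print(f"slot {m}: idle={beta:.4f} busy={gamma:.4f}")
```

As the prediction horizon grows, the probabilities drift from the observed state toward the stationary distribution of the chain, which is why the benefit of the initial observation fades over later slots.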
Maximum expected rate optimization
Having introduced the interference model, our goal now is to find the bit/power allocation over an OFDM frame composed of $N$ subcarriers and $M$ consecutive time slots, in order to maximize the expected rate, taking into account the macro-user activity. The assumptions underlying the proposed approach are: (1) the channels are affected by multipath, with time-invariant coefficients within each frame, assumed to be known at the transmitter side; (2) the activity of the interferers over each subchannel is modeled as a homogeneous DTMC of order $L$ and the transition probabilities are estimated by using the ML estimators discussed in Appendix 1; (3) the activities of the interferers over different channels are statistically independent of each other; (4) the power allocation of the interferers is independent of the power allocation of the FAP of interest.
The last assumption is made to distinguish this situation from the case where the interferers are themselves sensing the channel (interference) and adapting their strategy accordingly. In this second case, each adaptive transmitter reacts to the strategies of the others, thus inducing an iterative process that must be properly studied. The first scenario, which is the subject of this section, is appropriate when the interferer is an MBS, for example. The second scenario is more appropriate to model the situation where there are a few nearby FAPs attempting to access the radio channel at the same time. The analysis of this scenario will be carried out in the next section by resorting to game-theoretic tools.
In the case where there is only one adaptive device, the FUE is supposed to measure the interference power from the macro network over each subchannel over a number of time slots that depends on the order of the Markov chain as well as on the required estimation accuracy.
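To make the estimation step concrete, the sketch below computes the usual maximum-likelihood estimate of first-order transition probabilities by counting observed transitions in a sensed 0/1 activity sequence. It is only an assumed illustration: the estimators of Appendix 1 are not reproduced here and may differ in detail, and the function name is hypothetical.

```python
def estimate_first_order(states):
    """Estimate (p00, p11) from a 0/1 activity sequence by transition counting."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for prev, curr in zip(states, states[1:]):
        counts[(prev, curr)] += 1
    from_idle = counts[(0, 0)] + counts[(0, 1)]
    from_busy = counts[(1, 0)] + counts[(1, 1)]
    # None signals that no transition out of that state was observed,
    # so no estimate is available for it.
    p00 = counts[(0, 0)] / from_idle if from_idle else None
    p11 = counts[(1, 1)] / from_busy if from_busy else None
    return p00, p11


if __name__ == "__main__":
    sensed = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0]
    print(estimate_first_order(sensed))
```

A higher-order chain needs one counter per length-$L$ history, so the number of parameters doubles with each increment of $L$ and more sensed slots are required for the same accuracy.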
Based on the channel sensing, up to a current time slot $m_0$, our goal is to find the optimal power allocation over a set of $M$ successive time slots $m = m_0, m_0+1, \ldots, m_0+M-1$. Since the interference in the slots following the current one is not known, we follow a Bayesian approach. More specifically, our first optimization criterion is the maximization of the expected rate, conditioned on the current estimate of the interference power profile and of the Markov chain parameters, over each frequency subchannel. In formulas, our objective function is
where
where $\sigma_n^2(k)$ denotes the noise variance and $H_k$ is the FAP channel transfer coefficient over the $k$th subchannel. The average rate is then
where $a_n(k) := |H_k|^2/\sigma_n^2(k)$ and $a_I(k,m) := |H_k|^2/(\sigma_n^2(k) + \sigma_I^2(k,m))$. Since the transition probabilities are not known a priori, they are estimated from the observations, using Equations (42), (44) or (45), depending on the most appropriate Markov order $L$. Knowing the transition probabilities, the occupancy probabilities $\beta_{k,m}$ and $\gamma_{k,m}$ at any time $m$, conditioned on the observation of the channel state in the first $L$ time slots, can easily be derived by using Equations (4), (39), (40). Then, denoting by $\mathbf{p}$ the (time–frequency) $NM$-dimensional power allocation vector, the max-rate optimization problem is formulated as follows
where the upper limit $p^{\max}(k)$, $k = 1,\ldots,N$, represents a mask constraint useful to limit the transmit power over some prescribed channels, for example, the channels occupied by the MBSs. This is a convex problem, as $\bar{r}(\mathbf{p})$ is a concave function of $\mathbf{p}$ and the constraint set is convex. The optimum power vector $\mathbf{p}^*$ can be expressed in closed form by imposing the KKT conditions (see Appendix 2 for further details). The optimal power over the $k$th frequency subchannel, at time $m$, is
where $[x]_a^b = a$ if $x \le a$, $[x]_a^b = b$ if $x \ge b$ and $[x]_a^b = x$ if $a < x < b$. The coefficients $\tilde{b}_{k,m}, \tilde{c}_{k,m}, \tilde{d}_{k,m}$ are related to $a_n(k)$ and $a_I(k,m)$ as follows
where $\lambda$ is the Lagrange multiplier. Since the optimal powers $p_{k,m}^*$ are functions of $\lambda$, we can find this multiplier numerically as the solution of the power constraint $\sum_{m=m_0}^{m_0+M-1} \sum_{k=1}^{N} p_{k,m}^* = M P_T$. Expression (10) is a generalization of the well-known water-filling solution. Indeed, it can be shown that, by taking the limit for the transition probabilities going to 1 or 0, i.e., by turning the Markov chain into the degenerate case of a deterministic signal, Equation (10) converges to the water-filling solution.
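The numerical search for $\lambda$ can be sketched as follows. The sketch assumes that the expected rate of each time–frequency cell is $\beta\log(1+a_n p)+\gamma\log(1+a_I p)$ with natural logarithms, derives the per-cell power directly from the stationarity condition $\beta a_n/(1+a_n p)+\gamma a_I/(1+a_I p)=\lambda$ rather than from the article's coefficients $\tilde{b}, \tilde{c}, \tilde{d}$, and finds $\lambda$ by bisection. All names are hypothetical, and `budget` stands for $M P_T$.

```python
import math


def slot_power(lam, beta, gamma, a_n, a_i, p_max):
    """Maximize beta*log(1+a_n*p) + gamma*log(1+a_i*p) - lam*p over [0, p_max]."""
    if beta * a_n + gamma * a_i <= lam:
        return 0.0  # marginal rate at p = 0 does not exceed the price lam
    # Stationarity gives A*p^2 + B*p + C = 0 with A > 0 and C < 0,
    # so exactly one root is positive.
    A = lam * a_n * a_i
    B = lam * (a_n + a_i) - a_n * a_i * (beta + gamma)
    C = lam - beta * a_n - gamma * a_i
    root = (-B + math.sqrt(B * B - 4.0 * A * C)) / (2.0 * A)
    return min(max(root, 0.0), p_max)


def max_rate_allocation(cells, budget, iterations=100):
    """cells: list of (beta, gamma, a_n, a_i, p_max), one per (k, m) pair."""
    if sum(c[4] for c in cells) <= budget:
        return [c[4] for c in cells]  # only the mask constraints are active
    lo = 1e-12
    hi = max(c[0] * c[2] + c[1] * c[3] for c in cells)  # all powers are zero here
    for _ in range(iterations):
        lam = 0.5 * (lo + hi)
        used = sum(slot_power(lam, *c) for c in cells)
        if used > budget:
            lo = lam  # too much power requested: raise the price
        else:
            hi = lam
    return [slot_power(hi, *c) for c in cells]


if __name__ == "__main__":
    cells = [(0.9, 0.1, 2.0, 0.5, 10.0), (0.3, 0.7, 2.0, 0.5, 10.0)]
    print(max_rate_allocation(cells, budget=3.0))
```

With $\beta = 1$ and $\gamma = 0$ the per-cell power reduces to $1/\lambda - 1/a_n$, i.e., classical water-filling, which is consistent with the limiting behavior described above.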
Min-power optimization strategy
Since one of the most critical issues in femtocells is interference management, an alternative optimization procedure consists in minimizing the FAP transmit power, under the constraint of guaranteeing the required rate over the link between the FAP and the associated FUE. This strategy was proposed, for example, in [15], assuming static interference. Here, we generalize that approach to the case where the interference is dynamic and its activity evolves as a Markov chain, as described in the previous section. The objective now is to minimize the average transmit power across the $N$ subchannels and over $M$ consecutive time slots. Denoting by $m_0$ the index of the time slot where the interference power profile is measured, the goal is to allocate power over a set of consecutive slots, starting from $m_0$, i.e., for $m = m_0, \ldots, m_0+M-1$, under the constraint that the expected rate, conditioned on the observation of the initial $L$ time slots, i.e., for $m = m_0-L+1, \ldots, m_0$, is not smaller than a given value $R_0$.
The optimization problem can then be formulated as
where the expected rate is computed as in (8).
The minimization problem in (11) is a convex optimization problem, since the objective function is a linear (hence convex) function of $\mathbf{p}$ and the feasible set is convex. The solution can be written in closed form by exploiting the KKT conditions and following the same steps as in Appendix 2; the result is
with
where the Lagrange multiplier $\lambda$ is found numerically in order to satisfy the rate constraint $\bar{R}(\mathbf{p}^*) = R_0$.
One important difference between the min-power and the max-rate problems is that, in the min-power case, the feasible set could be empty. If this happens, it means that the rate requirement is too high to be accommodated. Hence, either the rate requirement is lowered until the feasible set becomes nonempty, or the user is not admitted. This protocol is handled by the call admission control.
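The feasibility check and the search for the multiplier can be sketched as below, under the same assumed per-cell rate expression $\beta\log(1+a_n p)+\gamma\log(1+a_I p)$ (natural logarithms). The stationarity condition is solved for a price equal to $1/\lambda$, and bisection raises the price as long as the target rate is still met. Function names and the `None` return convention for an empty feasible set are assumptions of this sketch.

```python
import math


def expected_rate(powers, cells):
    """Sum of the assumed per-cell expected rates, in nats."""
    return sum(b * math.log1p(an * p) + g * math.log1p(ai * p)
               for p, (b, g, an, ai, _) in zip(powers, cells))


def power_at_price(price, beta, gamma, a_n, a_i, p_max):
    """Power at which the marginal expected rate equals `price`, clipped to [0, p_max]."""
    if beta * a_n + gamma * a_i <= price:
        return 0.0
    A = price * a_n * a_i
    B = price * (a_n + a_i) - a_n * a_i * (beta + gamma)
    C = price - beta * a_n - gamma * a_i
    root = (-B + math.sqrt(B * B - 4.0 * A * C)) / (2.0 * A)
    return min(max(root, 0.0), p_max)


def min_power_allocation(cells, target_rate, iterations=100):
    """cells: list of (beta, gamma, a_n, a_i, p_max). Returns None if infeasible."""
    best = [c[4] for c in cells]
    if expected_rate(best, cells) < target_rate:
        return None  # empty feasible set: lower the target or reject the user
    lo, hi = 1e-12, max(c[0] * c[2] + c[1] * c[3] for c in cells)
    for _ in range(iterations):
        price = 0.5 * (lo + hi)
        powers = [power_at_price(price, *c) for c in cells]
        if expected_rate(powers, cells) >= target_rate:
            best, lo = powers, price  # target met: try spending less power
        else:
            hi = price
    return best


if __name__ == "__main__":
    cells = [(0.9, 0.1, 2.0, 0.5, 10.0), (0.3, 0.7, 2.0, 0.5, 10.0)]
    print(min_power_allocation(cells, target_rate=1.0))
```

Returning the last feasible iterate guarantees that the rate constraint holds for the reported powers, at the cost of a negligible excess over the exact optimum.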
Multi-FAP case: maximum expected rate game
In a scenario containing multiple nearby FAPs implementing the radio access according to the adaptive strategy described in the previous section, each FAP may react to the power allocation of nearby FAPs by changing its own power allocation, and so on. This interaction induces an iterative mechanism whose convergence properties have to be carefully studied. The problem can be studied using the tools of game theory, which is well suited to this kind of multi-objective decision problem. In particular, given the existence of a wired backhaul connecting the FAPs, we will consider a purely competitive game, where each FAP (player) seeks to optimize its own utility function, irrespective of the other FAPs’ performance, and a coordination game, where nearby FAPs exchange some parameters to improve performance with respect to the purely competitive case.
Denoting again by the binary variable $S_{k,m}$ the macro-interference activity over channel $k$ at time $m$ and by $\Pr(S_{k,m} = 0) = \beta_{k,m}$ and $\Pr(S_{k,m} = 1) = \gamma_{k,m}$ the probabilities that channel $k$ is idle or busy at time $m$, the expected rate of the $q$th FAP is^{a}
with $q = 1,\ldots,Q$, and
where $\mathcal{N}_q$ is the set of neighbors of the $q$th FAP and $H_k^{rq}$ is the channel transfer function of the $k$th subchannel between the $r$th transmitter and the $q$th receiver. The probabilities $\beta_{k,m}$ and $\gamma_{k,m}$ evolve in time according to a Markov chain of order $L$. We assume, as in the previous section, that the allocation over $M$ consecutive time slots is carried out on the basis of the observation of a number of initial time slots equal to the order of the Markov chain, i.e., $L$.
It is worth noticing that the major difference between the expected rate in (14) and that in (8) is that now the interference is composed of two contributions: the dynamic interference of the macro-users, whose activities evolve as Markov chains but whose power profile, when on, is fixed, and the interference from the other FAPs, whose activity is always on, but whose power profile, described by the entries $p_{k,m}^r$, evolves as a response to the choices of the other FAPs.
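The two interference contributions can be illustrated on a single time–frequency cell. The sketch below assumes that the per-cell expected rate is $\beta\log(1+\text{SINR}_{\text{idle}})+\gamma\log(1+\text{SINR}_{\text{busy}})$ in nats, where the always-on FAP interference enters both terms and the macro interference power enters only the busy term. The function and parameter names are hypothetical, and the gains are squared channel magnitudes.

```python
import math


def expected_cell_rate(p_own, gain_own, noise, macro_power, beta, gamma,
                       neighbor_powers, cross_gains):
    """Expected rate (nats) of one FAP on one time-frequency cell."""
    # Aggregate interference from neighboring FAPs, which are always active.
    fap_interference = sum(p * g for p, g in zip(neighbor_powers, cross_gains))
    # Macro interference is present only when the subchannel is busy.
    sinr_idle = gain_own * p_own / (noise + fap_interference)
    sinr_busy = gain_own * p_own / (noise + macro_power + fap_interference)
    return beta * math.log1p(sinr_idle) + gamma * math.log1p(sinr_busy)


if __name__ == "__main__":
    alone = expected_cell_rate(1.0, 1.0, 1.0, 3.0, 0.5, 0.5, [], [])
    shared = expected_cell_rate(1.0, 1.0, 1.0, 3.0, 0.5, 0.5, [1.0], [1.0])
    print(alone, shared)
```

Any increase in a neighbor's power lowers both SINR terms, which is the coupling that turns the single-FAP problem into a game.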
Denoting by $\Omega = \{1,\ldots,Q\}$ the set of $Q$ players, by $P_q$ the maximum transmit power over a frame and by $p_q^{\max}(k)$ the mask constraint over each subcarrier, the problem can be cast as a game, i.e.,
where the feasible set of FAP $q$ is
Since the objective function in (16) is strictly concave in $\mathbf{p}_q \in \mathcal{P}_q$, for any given $\mathbf{p}_{-q}$, and the feasible set $\mathcal{P}_q$ is compact and convex, game $\mathcal{G}_1$ admits a nonempty solution set for any set of channels and transmit power constraints of the users. In [16], we reformulated this game as a variational inequality [17] and applied the iterative gradient projection algorithm to solve it, deriving sufficient conditions for its convergence to a Nash equilibrium (NE).
Game $\mathcal{G}_1$ may possess multiple equilibria, which may not be Pareto-efficient.^{b} To improve upon the performance of the NE of the purely competitive game $\mathcal{G}_1$, we can modify the utility function of each user in order to induce the players to incorporate a social utility function, rather than being purely selfish. For example, in [18, 19] it has been proposed to modify the utility function of each player so as to maximize the sum of all users’ rates. In principle, this change should require a centralized solution. Nevertheless, Huang et al. [18] showed that the solution of the sum-rate game can still be achieved in decentralized form, provided that the players exchange some parameters, the so-called prices. These parameters induce a penalty on each player’s utility proportional to the rate decrease that each player’s strategy induces on the other players. Introducing pricing mechanisms in femtocell networks is possible thanks to the existence of the backhaul link, which allows the exchange of prices among FAPs. Furthermore, we will show next that every FAP needs to exchange pricing coefficients only with its neighbors, thus keeping the amount of extra signaling limited.
Generalizing the approach proposed in [18] to our Bayesian formulation, we introduce the price coefficient:
with $I_{k,m}^r(\mathbf{p}_{-r}) := \sum_{i \in \mathcal{N}_r} p_{k,m}^i |H_k^{ir}|^2$. These coefficients are proportional to the marginal decrease of user $r$’s expected rate resulting from an increase of the $q$th node’s transmit power, as $-\frac{\partial R_r(\mathbf{p})}{\partial p_{k,m}^q} = \Pi_{k,m}^r \frac{\partial I_{k,m}^r}{\partial p_{k,m}^q} = \Pi_{k,m}^r |H_k^{qr}|^2$. The incorporation of the pricing mechanism leads to the new game^{c}:
Each local problem is convex, hence the KKT conditions lead to power coefficients $p_{k,m}^q$ of each FAP that, within the interval $[0, p_q^{\max}(k))$, must satisfy the following equation
where, denoting by $\nu_q$ the Lagrangian multiplier, we have set
We can verify that, $\forall \nu_q > 0$, we have $\tilde{b}^q(k,m)^2 - 4\tilde{a}^q(k,m)\tilde{c}^q(k,m) \ge 0$, and the only solution is
More specifically, we get
and the optimal power allocation vector is $p_{k,m}^{q*} = [\tilde{p}_{k,m}^{b}]_0^{p_q^{\max}(k)}$, where the multiplier $\nu_q$ is chosen in order to satisfy the constraint $\sum_{k=1}^{N} \sum_{m=m_0}^{m_0+M-1} [\tilde{p}_{k,m}^{b}]_0^{p_q^{\max}(k)} = P_q$. The previous solution assumes, for each player, that the powers used by the other players are given.
In practice, the game evolves with each FAP reacting to the choices of the other FAPs. It is then fundamental to prove the convergence of this iterative mechanism. In the following, we present a version of the so-called Modified Asynchronous Distributed Pricing (MADP) algorithm proposed in [19], adapted to our formulation. To this purpose, it is useful to rewrite (19) introducing a single index $h$ so that the entries of the power vector $\mathbf{p}_q$ are $p_h^q$ for $h = 1,\ldots,NM$. Then, defining the quantities
we can derive the $q$th user’s best response as
where $c_h^q = \frac{\beta_h^q \text{SNIR}_h^{\beta^q}}{1+\text{SNIR}_h^{\beta^q}} + \frac{\gamma_h^q \text{SNIR}_h^{\gamma^q}}{1+\text{SNIR}_h^{\gamma^q}}$ and $\nu_q$ and $\eta_h^q$ are the Lagrangian multipliers. Given this setting, the modified MADP algorithm is illustrated below.
Algorithm 1 MADP algorithm
Each FAP performs its allocation over $M$ consecutive time slots from $m_0$ to $m_0 + M - 1$.
Before performing its allocation, each FAP has to observe $q_s$ samples, from $m_0 - q_s + 1$ to $m_0$, in order to estimate the transition probabilities of the underlying Markov chain.
S.0: Each FAP $q$ chooses an initial power profile in the set $\mathcal{P}_q$ and sets $n = 0$;
S.1: Each FAP computes its interference prices $\Pi_h^q(n) |H_k^{iq}|^2$ for $h = 1,\ldots,MN$ and sends them to its neighbors with index $i \in \mathcal{N}_q$;
S.2: At each time $n$, each FAP updates its power profile so as to maximize its utility function $\bar{R}_q$, given the other FAPs’ power profiles $\mathbf{p}_{-q}$ and price vectors, according to $p_h^q(n+1) = p_h^q(n) + \alpha_q(n)\left(p_h^{q*} - p_h^q(n)\right)$ for $h = 1,\ldots,MN$, where $p_h^{q*}$ is given by (25);
S.3: Set $n = n + 1$, go to step S.1 and repeat until convergence is reached.
Following arguments similar to those in [19], we prove in Appendix 3 that there exist small enough step sizes $\alpha_q(n)$ for which the MADP algorithm converges monotonically to a fixed point.
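The relaxed update of step S.2 can be sketched in isolation. The function below, whose name is hypothetical, only implements the convex combination between the current profile and a best response supplied by the caller; computing the best response itself (Equation (25)) is outside the scope of this sketch.

```python
def relaxed_update(current, best_response, step):
    """One S.2 update: move a fraction `step` of the way toward the best response."""
    if not 0.0 <= step <= 1.0:
        raise ValueError("step must lie in [0, 1]")
    return [p + step * (p_star - p) for p, p_star in zip(current, best_response)]


if __name__ == "__main__":
    profile = [1.0, 3.0]
    target = [3.0, 1.0]  # placeholder best response, held fixed for illustration
    for n in range(5):
        profile = relaxed_update(profile, target, 0.25)
        print(n + 1, profile)
```

Because the update is a convex combination and each feasible set $\mathcal{P}_q$ is convex, the new profile remains feasible whenever both the current profile and the best response are feasible. A smaller step slows each move but damps the oscillations that can arise when all FAPs update simultaneously.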
Multi-FAP case: min-power game
As with the max-rate game, let us consider now the generalization of the min-power algorithm to the multi-FAP case. The utility of each player is now the total transmit power over the $N$ subchannels and the $M$ time slots
and the constraint is that the expected rate for each FAP, conditioned on the initial observations, be no smaller than a given value $R_q^0$. The feasible set is now
and the game is
The optimal strategy for each player amounts to solving the following optimization problem
The minimization problem in (29) for each player $q$, given the strategies of the others, is a convex optimization problem, since the objective function is a linear (hence convex) function of $\mathbf{p}_q$ and the set $\tilde{\mathcal{F}}_q(\mathbf{p}_{-q})$, given the power vector $\mathbf{p}_{-q}$ of the other players, is a convex set. Imposing the KKT conditions, as in the single-FAP case, the solution can be expressed in closed form as $\mathbf{p}_q^* = \mathbf{g}(\mathbf{p}_{-q})$, whose entries are (see Appendix 4)
with
where the Lagrange multiplier $\lambda_q$ is chosen in order to satisfy the rate constraint $\bar{R}_q(\mathbf{p}_q^*, \mathbf{p}_{-q}) = R_q^0$. However, now the overall feasible set is not jointly convex with respect to the power vectors of all the users, i.e., it is not convex in $\mathbf{p} = (\mathbf{p}_q)_{q=1}^{Q}$. This makes the study of this game much harder than the standard NE problem. Nevertheless, game $\tilde{\mathcal{G}}_1$ is a Generalized Potential Game (GPG) [20], with a potential $\Phi$ equal to the sum power. In such a case, the existence of a NE of the potential game can be proved directly by the existence of a maximum of the potential function $\Phi$ on the set $\tilde{\mathcal{X}}$ of the game. To exploit the theory of GPGs, we must prove that game $\tilde{\mathcal{G}}_1$ admits a nonempty feasible set. The proof of this result is given in Appendix 5, which contains the sufficient conditions under which the feasible set of the game $\tilde{\mathcal{G}}_1$, i.e.,
is compact and nonempty. Hence, game $\tilde{\mathcal{G}}_1 = \{\Omega, \{\tilde{\mathcal{F}}_q(\mathbf{p}_{-q})\}_{q \in \Omega}, \{u_q(\mathbf{p}_q)\}_{q \in \Omega}\}$ is a GPG whose potential function $\Phi(\mathbf{p})$ is the sum of the objective functions of all players, i.e., $\Phi(\mathbf{p}) = \sum_{q=1}^{Q} u_q(\mathbf{p}_q)$.
Nevertheless, being a potential game does not guarantee that the equilibrium is efficient. Hence, as with the max-rate game, efficiency can be improved by introducing pricing. The introduction of pricing leads to a modified game which can be cast as
where $\lambda_s$ is the Lagrangian multiplier of user $s$ relative to the rate constraint. The pricing coefficients $\Pi_{k,m}^s$ are defined in the same way as in the max-rate case.
Numerical results
In this section, we present some numerical results in order to assess the performance of the algorithms proposed in the previous sections. Let us start with the single-FAP case. In all the simulation results, we have considered Rayleigh fading frequency-selective channels where the number of resolvable paths is 4, each one with unit variance. The number of subcarriers $N$ is set to 12, as in an LTE physical resource block (PRB). In Figure 2 we show the average rate per OFDM symbol as a function of the allocated time slots obtained in the max-rate problem. The simulation results have been averaged over 100 independent channel and Markov chain realizations. The different curves indicate the rate obtained by assuming different kinds of knowledge of the interference: the green curve assumes perfect knowledge of the future evolution of the macro-user activity and is used as a benchmark; the pink curve assumes that the interference power level over each subchannel is equal to the value observed in the first time slot; all other curves refer to the proposed algorithm, where we observe the channels in the first slot and allocate power using our proposed method. The different curves refer to different Markov orders (from $L = 1$ to $L = 3$). We also compare the case where the transition probabilities are perfectly known and the case where the probabilities are estimated from the data. The interesting behavior is that, as the order increases, our approach gets closer to the ideal case where the interference activity is noncausally known. The price to be paid is the loss of performance resulting from the estimation of the Markov parameters. Further developments could incorporate some kind of reinforcement learning to allocate resources without necessarily estimating the transition probabilities, as proposed in [12], for example.
In Figure 3, we report the optimal rate of our algorithm versus the number of time slots for different Markov orders, L = 3 in the upper subplot and L = 1 in the lower one. We consider different numbers of samples q_{s} used to estimate the transition probabilities. Of course, the higher the Markov order, the greater the number of parameters to estimate. Indeed, for the same number of samples q_{s}, the performance loss with respect to the ideal case of perfect knowledge of the macro-user activity (curve with red square markers) is much higher for L = 3 than for L = 1. Figure 4 shows the cumulative rate versus the number of slots m_{0} used for the recursive estimation of the transition probabilities, assuming M = 3 and modeling the macro-user activity as a first-order Markov chain. We observe that, after fewer than 50 time slots, the performance gets very close to the asymptotic case. Finally, in Figure 5, we show the performance of the min-power allocation strategy. The utility function is the SNR [dB] at the FUE receiver obtained for different numbers of allocated time slots. As in the max-rate case, the advantage of increasing the order of statistical knowledge (from L = 1 to L = 3) is evident, and a gain of about 4 dB can be observed with respect to the case where no knowledge about the macro-user activity is assumed.
Finally, we provide some numerical examples to assess the performance of the proposed approaches (max-rate and min-power) in the multiuser case. The reference scenario is composed of one MBS and Q = 10 FAPs randomly distributed over a square area. The MBS activity is modeled as a third-order Markov chain and the results are averaged over 50 independent Markov chain realizations. In Figure 6, we report the users’ sum-rate versus the iteration index for the maximum expected rate game ${\mathcal{G}}_{1}$ in order to test the convergence of the algorithm, which converges in a few iterations. In Figures 7 and 8, we depict the FAPs’ sum-rate versus the number of allocated time slots. In particular, Figure 7 refers to the purely competitive maximum expected rate game ${\mathcal{G}}_{1}$, while Figure 8 refers to the modified pricing game ${\mathcal{G}}_{2}$. In both cases, we assume the same maximum transmit power per FAP. The three curves in each figure indicate the sum-rate obtained by assuming perfect (noncausal) knowledge of the macro-user activity, no knowledge at all (thus assuming that the interference remains equal to the values observed in the first slots of each frame), or knowledge of the Markov parameters only. Both figures show that acquiring statistical knowledge (estimation) of the interference activity parameters (Markov transition rates) yields a performance advantage over the case with no information and brings the performance close to the ideal case of perfect noncausal knowledge of the interference activity. Of course, as time evolves, there is a mismatch between what is predicted and the real interference, so that the performance improvement tends to decrease in time. Furthermore, comparing Figures 7 and 8, the gain achieved with the introduction of pricing is evident.
Considering the same scenario, in Figures 9 and 10 we report the simulation results corresponding to our proposed minimum power games, ${\stackrel{~}{\mathcal{G}}}_{1}$ and ${\stackrel{~}{\mathcal{G}}}_{2}$. Figure 9 refers to the min-power game with no pricing, while Figure 10 refers to the game including pricing. The curves show the average SNR per FUE as a function of the number of allocated time slots. The expected target rate ${R}_{q}^{0}$ is set to 3 bps for each user in both cases. From both figures we can verify that, also in this case, the simple statistical knowledge of the transition rates yields performance close to the ideal case where the interference activity is noncausally known. Observe that the curve referring to noncausal knowledge tends to have zero slope asymptotically, while the statistical knowledge curve presents a performance gain that tends to decrease as time evolves, due to the mismatch between what is predicted and the real interference. In both Figures 9 and 10, the advantage of the statistical approach over the case with no knowledge is considerable, and comparing the two figures makes evident the performance gain due to the introduction of a pricing mechanism.
Conclusion
In conclusion, in this article we have shown how the estimation of the interference statistical parameters can be beneficial to improve the performance of a power allocation technique, provided that the statistical model fits the real data. In this study, we assume that the transition probabilities are estimated from the data. An interesting future direction consists in incorporating methods which do not really require such an estimation, but acquire the proper behavior through reinforcement learning. The interesting part of our method is that, for a given estimation of the transition probabilities, the power allocation across the set of time slots/frequency channels is found in closed form. This is indeed useful to save convergence time with respect to gradient-based techniques.
The other important contribution of this article is the decentralized approach for resource allocation based on game theory, in the case where the interference is dynamically varying. Our Bayesian formulation of the game provides interesting results in such an uncertain environment. Finally, the introduction of coordination among FAPs based on the exchange of a few parameters (prices) through the backhaul link has been shown to provide significant performance gains with respect to the purely competitive game.
Further developments should incorporate a robust approach for the situation where the interference statistical model is not known or the statistical parameters are timevarying. One more critical aspect is the availability of the backhaul link for the exchange of prices. Since such a link is affected by random delays, it may be useful to incorporate robust mechanisms to cope with the situation where the price does not arrive within a maximum tolerable delay.
Appendix 1
Extending the Markov model used in (5) to higher-order chains, we can write the time evolution of the occupancy probabilities conditioned on the observations of the channel state at the first L time slots as
for $m=L+1,\dots ,\infty $ and given the initial state π^{(k,1,…,L)}. More specifically, the entries of the 2^{L}-dimensional vector π^{(k,m−L+1,…,m)} are defined as
while the transition matrices ${\mathit{P}}_{L}^{\left(k\right)}$ are expressed, respectively, for L = 2 as
with ${\omega}_{2}^{\left(k\right)}={p}_{000}^{\left(k\right)}$, ${\theta}_{2}^{\left(k\right)}={p}_{100}^{\left(k\right)}$, ${\nu}_{2}^{\left(k\right)}={p}_{010}^{\left(k\right)}$, ${\mu}_{2}^{\left(k\right)}={p}_{111}^{\left(k\right)}$, and for L = 3 as
where ${\omega}_{3}^{\left(k\right)}={p}_{0000}^{\left(k\right)}$, ${\lambda}_{3}^{\left(k\right)}={p}_{1000}^{\left(k\right)}$, ${\nu}_{3}^{\left(k\right)}={p}_{0010}^{\left(k\right)}$, ${\psi}_{3}^{\left(k\right)}={p}_{1010}^{\left(k\right)}$, ${\eta}_{3}^{\left(k\right)}={p}_{0100}^{\left(k\right)}$, ${\theta}_{3}^{\left(k\right)}={p}_{1100}^{\left(k\right)}$, ${\gamma}_{3}^{\left(k\right)}={p}_{0111}^{\left(k\right)}$, ${\mu}_{3}^{\left(k\right)}={p}_{1111}^{\left(k\right)}$. Hence, setting for simplicity of notation ${\Pi}_{ij}^{(k,m-2,m-1)}={\Pi}_{ij}^{\left(k\right)}$ and ${\Pi}_{rij}^{(k,m-3,m-2,m-1)}={\Pi}_{rij}^{\left(k\right)}$, the probabilities that the kth subchannel is idle (busy) at time m, i.e., β_{k,m} (γ_{k,m}), are given, for L = 2 and L = 3 respectively, by
and
The transition probabilities of a Markov chain of arbitrary order can be estimated from the observed data using the maximum likelihood strategy, as suggested for example in [21, 22]. To simplify the description of the estimator we focus on a first-order Markov chain, but the extension to higher orders is straightforward. Let us assume that a set of m states, namely, s^{m} ≡ s_{k,1},…,s_{k,m}, is observed. The probability of observing a specific sequence of states is
Let n_{ij}(l) denote the number of times the state i at time l − 1 switches to state j at time l. It has been proved in [21] that, for a stationary Markov chain, the set ${n}_{ij}=\sum _{l=2}^{m}{n}_{ij}\left(l\right)$ forms a set of sufficient statistics. Furthermore, the maximum likelihood estimator based on the set s^{m} is
Introducing the counter ${N}_{ij}\left(m\right):=\sum _{l=2}^{m}{n}_{ij}\left(l\right)$, (42) can be rewritten in a recursive form as
The extension of the estimator to higher-order Markov chains is straightforward. As shown in [21], the estimators for a second- and third-order Markov chain are, respectively,
and
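The counting estimators of this appendix can be sketched in a few lines of Python. The sketch assumes that the maximum likelihood estimate of a transition probability is the relative frequency of that transition among all transitions leaving the same context, which is the standard result for stationary chains; the class and method names are hypothetical, and the counters are updated one observation at a time in the spirit of the recursive form.

```python
class TransitionEstimator:
    """Running ML estimate of binary Markov transition probabilities."""

    def __init__(self, order=1):
        self.order = order    # Markov order L
        self.history = []     # last `order` observed states
        self.counts = {}      # context tuple -> [count to 0, count to 1]

    def update(self, state):
        """Feed one new observation (0 = idle, 1 = busy)."""
        if len(self.history) == self.order:
            context = tuple(self.history)
            self.counts.setdefault(context, [0, 0])[state] += 1
            self.history.pop(0)
        self.history.append(state)

    def prob(self, context, next_state):
        """Estimated P(next_state | context); None if context never seen."""
        n = self.counts.get(tuple(context))
        if n is None:
            return None
        return n[next_state] / (n[0] + n[1])


if __name__ == "__main__":
    est = TransitionEstimator(order=1)
    for s in [0, 0, 1, 1, 0, 1, 0, 0]:
        est.update(s)
    # Transitions leaving state 0: two to 0 and two to 1.
    print(est.prob((0,), 1))  # 0.5
```

Setting `order=2` or `order=3` reuses the same counters with longer contexts, which mirrors the extension to higher-order chains.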
Appendix 2
In order to solve the problem (9) let us consider the following Lagrangian function
where λ, μ_{k,m}, ν_{k,m} are the Lagrangian multipliers and the KKT conditions can be written as
Observe that if p_{k,m}< p^{max}(k) then ν_{k,m}= 0 and this system can be reduced to
By exploiting the following inequality whose validity can easily be proved
we can deduce that if λ < a_{n}(k)β_{k,m} + a_{I}(k,m)γ_{k,m}, the second inequality in (48) can hold only if p_{k,m} > 0, so that from the first equation in (48) we obtain
On the other hand, if λ ≥ a_{n}(k)β_{k,m} + a_{I}(k,m)γ_{k,m}, then p_{k,m} > 0 can never hold, since it would imply
which violates the complementarity conditions. As a consequence, for λ ≥ a_{n}(k)β_{k,m} + a_{I}(k,m)γ_{k,m} we have p_{k,m} = 0. Let us now solve Equation (50), i.e.,
then, defining ${\tilde{a}}_{k,m}=\lambda \,{a}_{n}(k)\,{a}_{I}(k,m)$, ${\tilde{b}}_{k,m}=\lambda \left[{a}_{n}(k)+{a}_{I}(k,m)\right]-{a}_{n}(k)\,{a}_{I}(k,m)$ and ${\tilde{c}}_{k,m}=\lambda -{a}_{n}(k){\beta}_{k,m}-{a}_{I}(k,m){\gamma}_{k,m}$, we obtain the two solutions
$${p}_{k,m}=\frac{-{\tilde{b}}_{k,m}\pm \sqrt{{\tilde{b}}_{k,m}^{2}-4{\tilde{a}}_{k,m}{\tilde{c}}_{k,m}}}{2{\tilde{a}}_{k,m}}.$$(53)
Let us now make some useful observations:

1.
The solutions in (53) are always real. In fact, the term under the square root is always nonnegative ∀ λ ≥ 0, since
$$\begin{array}{l}{\tilde{b}}_{k,m}^{2}-4{\tilde{a}}_{k,m}{\tilde{c}}_{k,m}={\lambda}^{2}{\left({a}_{n}(k)-{a}_{I}(k,m)\right)}^{2}+2\lambda \,{a}_{n}(k)\,{a}_{I}(k,m)\left({a}_{n}(k)-{a}_{I}(k,m)\right)\left(1-2{\gamma}_{k,m}\right)+{a}_{n}{(k)}^{2}{a}_{I}{(k,m)}^{2}\end{array}$$(54) where a_{n}(k) − a_{I}(k,m) > 0 and the minimum, achieved for γ_{k,m} = 1, is given by
$${\left[\lambda \left({a}_{n}(k)-{a}_{I}(k,m)\right)-{a}_{n}(k)\,{a}_{I}(k,m)\right]}^{2}\ge 0,\quad \forall \lambda \ge 0;$$(55)
2.
The solution
$${p}_{k,m}^{a}=\frac{-{\tilde{b}}_{k,m}-\sqrt{{\tilde{b}}_{k,m}^{2}-4{\tilde{a}}_{k,m}{\tilde{c}}_{k,m}}}{2{\tilde{a}}_{k,m}}$$(56) is always negative since the inequality
$$\sqrt{{\tilde{b}}_{k,m}^{2}-4\lambda \,{a}_{n}(k)\,{a}_{I}(k,m)\left(\lambda -{a}_{n}(k){\beta}_{k,m}-{a}_{I}(k,m){\gamma}_{k,m}\right)}>-{\tilde{b}}_{k,m}$$(57) is verified for all λ > 0;

3.
The sign of the solution
$${p}_{k,m}^{b}=\frac{-{\tilde{b}}_{k,m}+\sqrt{{\tilde{b}}_{k,m}^{2}-4{\tilde{a}}_{k,m}{\tilde{c}}_{k,m}}}{2{\tilde{a}}_{k,m}}$$(58) can be studied by considering the following inequality
$$\sqrt{{\tilde{b}}_{k,m}^{2}-4\lambda \,{a}_{n}(k)\,{a}_{I}(k,m)\left(\lambda -{a}_{n}(k){\beta}_{k,m}-{a}_{I}(k,m){\gamma}_{k,m}\right)}>{\tilde{b}}_{k,m}.$$(59) In particular^{d} we have
$${p}_{k,m}^{b}\;\left\{\begin{array}{lll}>0& \text{for}& \lambda <{a}_{n}(k){\beta}_{k,m}+{a}_{I}(k,m){\gamma}_{k,m}\\ \le 0& \text{for}& \lambda \ge {a}_{n}(k){\beta}_{k,m}+{a}_{I}(k,m){\gamma}_{k,m}\end{array}\right.$$(60)
According to the above considerations, the solution for 0 < p_{k,m} < p^{max}(k) and λ < a_{n}(k)β_{k,m} + a_{I}(k,m)γ_{k,m} is ${p}_{k,m}={p}_{k,m}^{b}$, hence we can write
so that the optimal solution can be written as ${p}_{k,m}^{\ast}={\left[{p}_{k,m}^{b}\right]}_{0}^{{p}^{\text{max}}\left(k\right)}$ with $\sum _{k=1}^{N}\sum _{m={m}_{0}}^{{m}_{0}+M-1}{\left[{p}_{k,m}^{b}\right]}_{0}^{{p}^{\text{max}}\left(k\right)}={P}_{T}M$.
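The closed-form allocation derived in this appendix can be illustrated with a short Python sketch. It assumes a_n > a_I > 0, γ = 1 − β, and that any constant factor of the rate (such as the logarithm base) is absorbed into λ. The bisection search on λ is one possible way to meet the total power constraint and is not taken from the article; the function names and the tuple layout of `cells` are hypothetical.

```python
import math


def power_for_lambda(lam, a_n, a_i, beta, p_max):
    """Clipped root p^b of the KKT quadratic for one (subchannel, slot) pair."""
    gamma = 1.0 - beta                      # busy probability
    if lam >= a_n * beta + a_i * gamma:     # threshold above which p = 0
        return 0.0
    a = lam * a_n * a_i
    b = lam * (a_n + a_i) - a_n * a_i
    c = lam - a_n * beta - a_i * gamma      # negative here, so the root is > 0
    p_b = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return min(p_b, p_max)


def allocate(cells, total_power, iters=100):
    """Bisection on lam so that the clipped powers sum to total_power.

    cells is a list of (a_n, a_i, beta, p_max) tuples with a_n > a_i > 0.
    """
    if sum(cell[3] for cell in cells) <= total_power:
        return [cell[3] for cell in cells]  # the budget never binds
    lo = 0.0
    hi = max(a_n * beta + a_i * (1.0 - beta) for a_n, a_i, beta, _ in cells)
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        used = sum(power_for_lambda(lam, *cell) for cell in cells)
        if used > total_power:
            lo = lam                        # too much power: raise the price
        else:
            hi = lam
    return [power_for_lambda(hi, *cell) for cell in cells]


if __name__ == "__main__":
    cells = [(4.0, 1.0, 0.5, 10.0), (2.0, 0.5, 0.8, 10.0)]
    print(allocate(cells, total_power=3.0))
```

Because every per-cell power is nonincreasing in λ, the total allocated power is monotone in λ and the bisection is well defined; in the notation of this appendix, `total_power` plays the role of P_T M.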
Appendix 3
Convergence Analysis of MADP Algorithm
Proceeding as in[19], in order to prove the convergence of the algorithm it is sufficient to show that

(a)
With a proper choice of the step α _{ q }(n), MADP converges to a fixed point;

(b)
This point is a solution of the KKT conditions of the modified game in (19) and then it is also a solution point of the optimization problem
$$\begin{array}{ll}\underset{\mathit{p}}{\text{max}}& \bar{R}\left(\mathit{p}\right)\\ \mathrm{s.t.}& \mathit{p}\in \mathcal{P}\end{array}$$(62)
where the FAPs’ sum rate is $\bar{R}\left(\mathit{p}\right)=\sum _{q=1}^{Q}{\bar{R}}_{q}\left(\mathit{p}\right)$ and $\mathcal{P}$ is the Cartesian product of the sets ${\mathcal{P}}_{q}$.
Let us denote with ${U}_{1}\left(n\right)=\bar{R}\left(\mathit{p}\left(n\right)\right)$ the sum utility reached at step n of the MADP algorithm. Then for each user q we must prove that there exists a sequence α_{q}(n) > 0 such that U_{1}(n) is monotonically increasing and convergent, i.e., U_{1}(n + 1) ≥ U_{1}(n) ∀ n and ${U}_{1}\left(n\right)\to {U}_{1}^{\ast}$ as $n\to \infty $. As discussed in [19], we only need to show that U_{1}(n) is monotonically increasing, i.e., it suffices to consider a given iteration n in which user q is selected to update its power profile, and show that U_{1}(p_{q}(n + 1)) ≥ U_{1}(p_{q}(n)), where the total utility U_{1} is now regarded as a function of p_{q} because only the power profile of user q is updated. To do this, we use the descent lemma to bound U_{1}(p_{q}(n + 1)). The descent lemma [19] states that if a function $F:{\mathbb{R}}^{n}\to \mathbb{R}$ is continuously differentiable and its gradient is Lipschitz continuous with Lipschitz constant K, then, $\forall \mathit{x},\mathit{y}\in {\mathbb{R}}^{n}$,
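In its standard textbook form, which we assume matches the statement used in [19] up to notation, the descent lemma inequality reads

$$F\left(\mathit{x}\right)\le F\left(\mathit{y}\right)+{\left(\mathit{x}-\mathit{y}\right)}^{T}\nabla F\left(\mathit{y}\right)+\frac{K}{2}{\Vert \mathit{x}-\mathit{y}\Vert }_{2}^{2}.$$

Applied to −U_{1}, it gives a quadratic lower bound on the utility after an update, which is what allows a sufficiently small step α_{q}(n) to guarantee a nondecreasing sum utility.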
One sufficient condition for Lipschitz continuity of the gradient is that the l_{2}-norm of the Hessian matrix of F(x) is bounded, in which case this bound can be used as the Lipschitz constant. It can easily be shown that this holds for U_{1}(p_{q}). Specifically, there exists a constant ${B}_{{U}_{1}}^{q}$ which upper bounds the l_{2}-norm of the Hessian matrix of U_{1}(p_{q}) independently of the others’ power profiles.
Applying the Descent lemma to −U_{1}(p_{ q }), we get
Hence to prove that U_{1}(p_{ q }(n + 1)) ≥ U_{1}(p_{ q }(n)), it suffices to show that
Using the power updating rule
with the best response of user q defined in (25), the inequality in (65) can be written as
Observe that
then, exploiting the result in (25), we can write the left-hand side (LHS) of (67) as
Now from (69), with the same steps as in[19], to ensure that
we can choose the step α_{ q }(n) as
where the coefficient${A}_{q}^{n}$ is defined as
Finally, in order to prove point (b), let ${U}_{1}^{\ast}$ be a fixed point of the algorithm such that ${U}_{1}\left(n\right)={U}_{1}^{\ast}$ for some index n. Since this is a fixed point, it follows that ${p}_{h}^{q}\left(n\right)={p}_{h}^{q\ast}$, ∀ h,q. It can then be seen that, for all q, ${p}_{h}^{q\ast}$ must be an optimal solution to the problem (19), given the other users’ current power profiles and interference price vectors. Hence, p(n) also satisfies the KKT conditions of the problem (62).
Appendix 4
In order to find the optimal solutions of the convex problem $\left({\stackrel{~}{P}}_{1}\right)$ by studying the Lagrange dual problem, some additional constraint qualification conditions must hold, beyond convexity, to ensure strong duality [23]. One simple constraint qualification is Slater’s condition, i.e., we must verify that some strictly feasible point exists. We can prove that the set ${\stackrel{~}{\mathcal{F}}}_{q}\left({\mathit{p}}_{-q}\right)$ of each user q, for fixed strategies of the others, is nonempty. For simplicity, in this proof we assume w.l.o.g. m_{0} = 1. More specifically, the constraint ${R}_{q}\left(\mathit{p}\right)>{R}_{q}^{0}$ can be written as
where we have denoted with ${\mathcal{V}}_{N}^{q}\subseteq \{1,\dots ,N\}$ and ${\mathcal{V}}_{M}^{q}\subseteq \{1,\dots ,M\}$ the subsets, respectively, of subcarriers and time slots that player q is using during the game. Since ${a}_{n}^{q}(k,m)>{a}_{I}^{q}(k,m)$, to verify (73) it is sufficient to prove that
and clearly there always exists a set of positive values ${p}_{k,m}^{q},{p}_{q}^{\text{max}}\left(k\right)\in {\mathbb{R}}_{+}$ such that
Let us consider, for k = 1,…,N, m = 1,…,M, the KKT conditions of the optimization problem$\left({\stackrel{~}{P}}_{1}\right)$:
Observe that if ${p}_{q}^{\text{max}}\left(k\right)-{p}_{k,m}^{q}>0$, then ${\alpha}_{k,m}^{q}=0$ so that, by eliminating in (76) the multiplier ${\mu}_{k,m}^{q}$, we obtain
where λ_{q} > 0, since otherwise complementarity yields ${p}_{k,m}^{q}=0$, ∀ k = 1,…,N, m = 1,…,M, and the rate constraint is violated. Then, the optimum power vector must satisfy the following equation
having set
The solutions of (78) are
It can be proved that, ∀ λ_{q} > 0, ${p}_{k,m}^{a}\le 0$, b^{q}(k,m)^{2} − 4a^{q}(k,m)c^{q}(k,m) ≥ 0, and
According to the above considerations, the solution is ${p}_{k,m}^{q}={p}_{k,m}^{b}$ for $0<{p}_{k,m}^{q}<{p}_{q}^{\text{max}}\left(k\right)$ and ${\lambda}_{q}>\frac{1}{{a}_{n}^{q}(k,m){\beta}_{k,m}+{a}_{I}^{q}(k,m){\gamma}_{k,m}}$, so that we can write the optimal power allocation vector as
where the multiplier λ_{q} is chosen in order to satisfy the constraint ${\bar{R}}_{q}({\mathit{p}}_{q}^{\ast},{\mathit{p}}_{-q})={R}_{q}^{0}$.
Appendix 5
Proof that the feasible set of the game ${\stackrel{~}{\mathcal{G}}}_{1}$ is compact and nonempty so that it can be cast as a GPG.
Let us start from the following definition of GPG given in [20]:
Definition 1
A Generalized Nash Equilibrium Problem is a GPG if

(a)
There exists a nonempty, closed set $\stackrel{~}{\mathcal{X}}\subseteq {\mathbb{R}}^{n}$ such that
$${\mathcal{X}}_{q}\left({\mathit{x}}_{-q}\right)=\{{\mathit{x}}_{q}\in {D}_{q}\;:\;({\mathit{x}}_{q},{\mathit{x}}_{-q})\in \stackrel{~}{\mathcal{X}}\}\quad \forall \,q=1,\dots ,Q$$(83) where ${D}_{q}\subseteq {\mathbb{R}}^{{n}_{q}}$^{e} are nonempty, closed sets such that $\prod _{q=1}^{Q}{D}_{q}\bigcap \stackrel{~}{\mathcal{X}}\ne \varnothing $;

(b)
There exists a continuous function, $\mathrm{\Phi}\left(\mathit{x}\right):{\mathbb{R}}^{n}\to \mathbb{R}$, named potential function, such that ∀ q ∈ Ω, ∀ x_{−q} and for all ${\mathit{y}}_{q},{\mathit{z}}_{q}\in {\mathcal{X}}_{q}\left({\mathit{x}}_{-q}\right)$
$${u}_{q}\left({\mathit{y}}_{q},{\mathit{x}}_{-q}\right)-{u}_{q}\left({\mathit{z}}_{q},{\mathit{x}}_{-q}\right)>0$$(84) implies
$$\mathrm{\Phi}\left({\mathit{y}}_{q},{\mathit{x}}_{-q}\right)-\mathrm{\Phi}\left({\mathit{z}}_{q},{\mathit{x}}_{-q}\right)\ge {u}_{q}\left({\mathit{y}}_{q},{\mathit{x}}_{-q}\right)-{u}_{q}\left({\mathit{z}}_{q},{\mathit{x}}_{-q}\right)$$(85) where u_{q} is the qth player payoff function.
According to Definition 1, we have to check the validity of conditions (a) and (b) for the game ${\stackrel{~}{\mathcal{G}}}_{1}$. As regards condition (a), let us consider the feasible set of the game ${\stackrel{~}{\mathcal{G}}}_{1}$, i.e.,
where we have assumed w.l.o.g. m_{0} = 1. Then we have to prove the following lemma.
Lemma 1
The feasible set $\stackrel{~}{\mathcal{X}}$ of the game ${\stackrel{~}{\mathcal{G}}}_{1}$ is a nonempty, closed and bounded (hence compact) subset of ${\mathbb{R}}^{NMQ\times 1}$ if the matrices A_{k} defined in (91) are P-matrices, for all k = 1,…,N, m = 1,…,M. Sufficient conditions for this to hold are
Proof
Let us start by considering the constraints ${\bar{R}}_{q}\left(\mathit{p}\right)\ge {R}_{q}^{0}$ that, by considering only the subcarriers and the time slots that are effectively occupied, can be written as
where ${\mathcal{V}}_{N}^{q}\subseteq \{1,\dots ,N\}$ and ${\mathcal{V}}_{M}^{q}\subseteq \{1,\dots ,M\}$ are the subsets, respectively, of subcarriers and time slots that player q is using during the game. Since ${a}_{n}^{q}(k,m)>{a}_{I}^{q}(k,m)$, (88) certainly holds if we prove that