
Optimal resource allocation in femtocell networks based on Markov modeling of interferers’ activity

Abstract

Femtocell networks offer a series of advantages with respect to conventional cellular networks. However, a potential massive deployment of femto-access points (FAPs) poses a big challenge in terms of interference management, which requires proper radio resource allocation techniques. In this article, we propose alternative optimal power/bit allocation strategies over a time-frequency frame based on a statistical modeling of the interference activity. Given the lack of knowledge of the interference activity, we assume a Bayesian approach that provides the optimal allocation, conditioned to periodic spectrum sensing, and estimation of the interference activity statistical parameters. We consider first a single FAP accessing the radio channel in the presence of a dynamical interference environment. Then, we extend the formulation to a multi-FAP scenario, where nearby FAP’s react to the strategies of the other FAP’s, still within a dynamical interference scenario. The multi-user case is first approached using a strategic non-cooperative game formulation. Then, we propose a coordination game based on the introduction of a pricing mechanism that exploits the backhaul link to enable the exchange of parameters (prices) among FAP’s.

Introduction

Femtocell networks are composed of cells having a coverage radius in the order of tens of meters, providing enhanced indoor coverage through the use of femto-access points (FAPs), or home-enhanced node Bs (HeNBs) in the long-term evolution (LTE) terminology [1, 2]. A typical scenario is sketched in Figure 1, which shows the wireless links among femto user equipments (FUEs), macro user equipments (MUEs), macro base stations (MBSs) and FAPs. More specifically, the wireless links are classified as useful or interfering depending on whether they refer, respectively, to the link between a transmitter and its intended receiver or to other receivers falling within its coverage area. Being installed in residential areas, e.g., homes, offices, etc., the FAPs are typically interconnected with each other through a wired link, usually an ADSL subscriber line, which gives access to a broadband Internet network, as depicted in Figure 1. One of the ideas proposed in this article is to exploit the backhaul to set up a local coordination among nearby FAPs to improve the efficiency of the radio resource management (RRM), without the presence of a centralized control.

Figure 1. Femtocell network scenario.

Femtocells are becoming more and more attractive due to their benefits to both cellular operators and subscribers. On the one hand, operators see femtocells as a way to improve indoor coverage and to off-load wireless traffic from the macro cellular network to the wired network, thus releasing wireless channels to additional mobile users. On the other hand, subscribers see femtocells as a way to get higher quality services, either higher data throughput or better voice quality, thanks to a better indoor coverage, and seamless connectivity.

Following the current evolution of cellular standardization process, in this study we assume an LTE framework and we focus on the downlink channel, which assumes an OFDMA strategy. In this context, femtocell networks offer advantages with respect to Wi-Fi, as they avoid vertical hand-off and offer better QoS.

In view of a potential massive deployment of FAPs, special attention has to be devoted to RRM. In fact, unlike MBSs, FAPs are typically installed by the subscribers and maintained without global planning, with no special consideration about traffic demands or interference with other cells, either femto or macro cells. Hence, a dense deployment of FAPs might induce intolerable interference from FUEs to MUEs or to other FUEs. Interference management is then arguably one of the major challenges to be faced in femtocell networks.

The goal of this study is to propose an algorithm for optimizing power/bit allocation over a joint time–frequency domain, incorporating a statistical model of the macro-users activity. Since the interference is unknown, the proposed algorithm follows a Bayesian approach, which allocates power/bits over successive time/frequency slots depending on a preliminary sensing and estimation of the parameters of the interference model. We assume a Markov modeling for simplicity, but the approach can be generalized to more sophisticated models, like e.g.,[3, 4]. More specifically, in this study the interference over different frequency subchannels is modeled as a set of statistically independent homogeneous discrete-time Markov chains (DTMCs). We consider a single-user allocation first, where a single FAP finds the optimal resource allocation according to two alternative strategies: (i) maximize the expected rate, conditioned to the result of the sensing and estimation phase, under a transmit power constraint; (ii) minimize the transmit power under the expected rate constraint.

Opportunistic spectrum access (OSA) in multicarrier networks where the channel occupancy follows a Markovian evolution has already been studied in the framework of cognitive radio (CR) in[5, 6], for example. Chen et al.[5] develop an optimal OSA scheme aimed at optimizing spectrum sensing and access policies jointly. They assumed that the secondary transmitter receives error-free ACK signals from the secondary’s receiver, whenever the transmission is successful, and this information is used to track the state of the primary channels. Interestingly enough, Chen et al.[5] establish a separation principle that decouples the design of spectrum sensor and access policy. A similar context is studied in[6, 7], where the authors combine learning and dynamic spectrum access. Both Chen et al.[5] and Unnikrishnan and Veeravalli[6] consider an objective function that depends only on the available cognitive bandwidth and puts a constraint on the collision probability with the primary users. Anandkumar et al.[8] and Liu and Zhao[9] formulate the multi-user OSA problem as a decentralized multi-armed bandit problem[10]. In such a framework, each user learns the channel availability statistics and designs a channel access rule in order to maximize the transmission throughput (or equivalently minimize the system regret, defined as the loss in secondary throughput due to learning errors and collisions under distributed access). In[9], which is an extension of the single-user policy proposed in[10] to the multi-user case, Liu and Zhao propose a family of distributed learning and access policies known as time-division fair share. For these policies, they prove the minimum growth rate of the system regret, which is shown to behave logarithmically with respect to the number of time slots. Moreover, Liu and Zhao[9] distinguish the case of known number of secondary users from the case in which this number is unknown but estimated at each user through feedback. 
An alternative scheme for distributed resource allocation between CRs incorporating aggregated interference control is analyzed in[11, 12], where the authors propose a form of real-time multi-agent reinforcement learning, known as decentralized Q-learning[13], to manage the aggregated interference. The objective function to be minimized is an expected discounted cumulative cost related to the difference between the effective signal-to-noise plus interference ratio (SINR) and a target SINR, which has to be guaranteed to the primary system. This SINR is measured at some control points located in the protection contour of the primary network and it is fed back to the secondary base stations that adjust their transmit power consequently. One of the interesting aspects of such an approach is that it is model-free and does not require the knowledge of the transition probabilities of the underlying Markov process. Finally, Geirhofer et al.[14] propose an interference-aware resource allocation for OFDMA systems, based on the sensing and prediction of the ad hoc users from the infrastructure users.

Different from the previous studies, in this article we propose a Bayesian radio access method enabling (possibly multiple) FAP’s to allocate power/bits over a time–frequency grid based on the current belief on the interference level, as obtained from previous sensing. Since the interference cannot be known in advance, we use a Bayesian approach and formulate the utility function as the expected value of the utility conditioned to previous measurements. The goal is to relax the requirement on sensing time and allocate resources over a certain number of future time slots, depending on the interference model and on our prediction capability.

The article is organized as follows. We consider first the radio access of a single FAP and we maximize the expected rate, averaged over the interference activity model, under a transmit power constraint. In this case, the solution can be found in closed form and it represents a sort of generalized water-filling algorithm, with water level depending on the interference activity probabilities. Then, we illustrate an alternative approach consisting in the minimization of the transmit power, subject to a constraint on the minimum average femto-user rate. Then, we generalize the proposed approaches to the multi-FAP scenario, where we analyze the interaction among FAPs using a game-theoretic approach. In particular, we consider first a purely competitive game, where each FAP adopts a purely selfish strategy. Since the competitive game might lead to inefficient Nash equilibria, we also propose a coordinated game where, thanks to the exchange of a few parameters through the backhaul link, the FAP’s coordinate their action to improve upon inefficient Nash equilibria and maximize the sum-rate or minimize the sum-power.

Single-user Bayesian adaptive allocation

Femtocell networks are fully compliant with cellular standards. Given the current evolution of 3G systems, in this article we are concerned with an LTE system and the goal is to allocate power over a time–frequency grid adaptively, as a function of the current occupancy. This implies that the channel and interference power must be sensed at the beginning of each frame. Given the low mobility of indoor users, the channel can be assumed to be nearly constant over the frame. However, the interference from macro-users may vary along the frame depending on the macro-user activity. A correct power allocation across time and frequency would require a non-causal knowledge of the interference, which is of course unavailable. To circumvent this inconvenience, we propose a time–frequency resource allocation based on a Markov modeling of the interference activity over each frequency subchannel. More specifically, we assume that the activity of the macro users over the frequency subchannels is modeled as a set of statistically independent homogeneous DTMCs. The parameters of the statistical model are assumed constant within the frame, but they may vary over successive frames. Each FUE estimates the interference power and the transition probabilities of the interference activity over each subchannel and feeds this information back to the associated FAP, which computes the optimal power allocation over a time–frequency frame, following a Bayesian approach.

In this section, we assume that the interferers do not react to the strategy of the FAP of interest. In the subsequent sections, we will extend the study to the case where the other FAPs react to the choice of nearby, interfering FAPs, thus generating an iterative process, whose stability properties will properly be studied.

Markovian interference model

The interference activity over each frequency subchannel is modeled as a DTMC. We use the binary random variable $S_{k,m}$ to indicate the macro activity over the $k$th subchannel at time $m$: $S_{k,m} = 1$ if the subchannel is busy, with interference power $\sigma_I^2(k,m)$, while $S_{k,m} = 0$ if the subchannel is idle. We consider different orders for the DTMC so that we can test the effect of the order on the performance of the proposed strategy. As an example, for the orders $L = 1, 2, 3$, we introduce the following transition probabilities:

$$p_{jl}^{k} = \Pr\left(S_{k,m} = l \mid S_{k,m-1} = j\right) \quad \text{for } L = 1 \tag{1}$$

$$p_{ijl}^{k} = \Pr\left(S_{k,m} = l \mid S_{k,m-1} = j,\ S_{k,m-2} = i\right) \quad \text{for } L = 2 \tag{2}$$

$$p_{rijl}^{k} = \Pr\left(S_{k,m} = l \mid S_{k,m-1} = j,\ S_{k,m-2} = i,\ S_{k,m-3} = r\right) \quad \text{for } L = 3 \tag{3}$$

where $k$ is the subchannel index and $(j)$, $(i,j)$, $(r,i,j)$ are, respectively, the states for $L = 1, 2, 3$, i.e., all the binary sequences in $\{0,1\}^L$. The probability of being in state $i \in \{0,1\}$ at time $m$ over the $k$th subchannel is denoted by $\Pi_i^{(k,m)} = \Pr(S_{k,m} = i)$. In the case of a first-order DTMC, starting from an initial time slot $m_0 = 1$, the probability $\Pi_i^{(k,m)}$ can be obtained recursively as

$$\begin{bmatrix} \Pi_0^{(k,m)} \\ \Pi_1^{(k,m)} \end{bmatrix} = \begin{bmatrix} \omega_1^{(k)} & 1 - \mu_1^{(k)} \\ 1 - \omega_1^{(k)} & \mu_1^{(k)} \end{bmatrix} \begin{bmatrix} \Pi_0^{(k,m-1)} \\ \Pi_1^{(k,m-1)} \end{bmatrix}, \quad m = 2, 3, \ldots, \tag{4}$$

where $\omega_1^{(k)} := p_{00}^{(k)}$, $\mu_1^{(k)} := p_{11}^{(k)}$, whereas the initial state $\left(\Pi_0^{(k,1)}, \Pi_1^{(k,1)}\right)$ is obtained by observing the channel state at the time slot of index $m_0$. Equation (4) can be written in compact matrix form as

$$\boldsymbol{\pi}^{(k,m)} = \mathbf{P}_1^{(k)}\, \boldsymbol{\pi}^{(k,m-1)} \tag{5}$$

where $\boldsymbol{\pi}^{(k,m)} = [\Pi_0^{(k,m)}, \Pi_1^{(k,m)}]^T$ and the entries $p_{jl}^{(k)}$ of the transition matrix $\mathbf{P}_1^{(k)}$ are given in (4). Let $\beta_{k,m} = \Pr(S_{k,m} = 0)$ and $\gamma_{k,m} = \Pr(S_{k,m} = 1)$ be the probabilities that channel $k$ at time $m$ is, respectively, idle and busy. Then, we can calculate them iteratively at time $m$ from Equation (4) as $\beta_{k,m} = \Pi_0^{(k,m)}$ and $\gamma_{k,m} = \Pi_1^{(k,m)}$. The generalization to higher orders is straightforward and the formulas are reported in Appendix 1, for convenience. In Appendix 1, we also report the formulas used to estimate the transition probabilities from the observations.
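As an illustration of the recursion in (4), the following minimal sketch propagates the idle/busy probabilities of one subchannel over a few slots. The function name, the argument names and the numerical values are illustrative assumptions, not taken from the article; the initial state is assumed to be known with certainty from sensing.

```python
def propagate_occupancy(p00, p11, idle_at_start, num_slots):
    """Return [(beta_m, gamma_m)] for num_slots slots of a first-order chain, Eq. (4).

    p00 and p11 play the role of omega_1 and mu_1; all names are illustrative.
    """
    # Sensing fixes the initial state, so all probability mass sits on it.
    beta, gamma = (1.0, 0.0) if idle_at_start else (0.0, 1.0)
    history = [(beta, gamma)]
    for _ in range(num_slots - 1):
        # One step of the column-stochastic update in Eq. (4).
        beta, gamma = (p00 * beta + (1.0 - p11) * gamma,
                       (1.0 - p00) * beta + p11 * gamma)
        history.append((beta, gamma))
    return history


# Hypothetical values: the subchannel was sensed idle in the first slot.
probs = propagate_occupancy(p00=0.9, p11=0.8, idle_at_start=True, num_slots=4)
```

The further the allocation looks ahead, the closer the pair drifts to the stationary distribution of the chain, which is why the sensed state becomes less informative for late slots.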

Maximum expected rate optimization

Having introduced the interference model, our goal now is to find the bit/power allocation over an OFDM frame composed of N subcarriers and M consecutive time slots, in order to maximize the expected rate, taking into account the macro-users activity. The assumptions underlying the proposed approach are (1) the channels are affected by multipath, with time-invariant coefficients within each frame, supposed to be known at the transmitter side; (2) the activity of the interferers over each subchannel is modeled as a homogeneous DTMC of order L and the transition probabilities are estimated by using the ML estimators discussed in Appendix 1; (3) the activities of the interferers over different channels are statistically independent of each other; (4) the power allocation of the interferers is independent of the power allocation of the FAP of interest.

The last assumption is made to distinguish this situation from the case where the interferers are themselves sensing the channel (interference) and adapting their strategy consequently. In this second case, each adaptive transmitter reacts to the strategies of the other, thus inducing an iterative process that must properly be studied. The first scenario, which is the subject of this section, is appropriate when the interferer is an MBS, for example. The second scenario is more appropriate to model the situation where there are a few nearby FAP’s attempting to access the radio channel at the same time. The analysis of this scenario will be carried out in the next section by resorting to game-theoretic tools.

In the case where there is only one adaptive device, the FUE is supposed to measure the interference power from the macro network over each subchannel over a number of time slots that depend on the order of the Markov chain as well as on the accuracy of the estimation.
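The estimation step can be sketched as follows for a first-order chain. The formulas of Appendix 1 are not reproduced in this section, so the sketch assumes the standard relative-frequency estimator (the ML estimator of a first-order chain), which may differ in detail from the article's expressions; the function name, the fallback value and the observed sequence are hypothetical.

```python
def estimate_first_order(states):
    """Relative-frequency estimate of (p00, p11) from a binary occupancy sequence."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    # Count every observed transition (previous state, current state).
    for prev, cur in zip(states, states[1:]):
        counts[(prev, cur)] += 1
    from_idle = counts[(0, 0)] + counts[(0, 1)]
    from_busy = counts[(1, 0)] + counts[(1, 1)]
    # Fall back to 0.5 when a state was never left (an arbitrary assumption).
    p00 = counts[(0, 0)] / from_idle if from_idle else 0.5
    p11 = counts[(1, 1)] / from_busy if from_busy else 0.5
    return p00, p11


# Hypothetical sensing record of one subchannel (0 = idle, 1 = busy).
observed = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]
p00_hat, p11_hat = estimate_first_order(observed)
```

Longer sensing windows reduce the variance of the estimates, at the cost of the sensing time that the proposed approach tries to limit.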

Based on the channel sensing, up to a current time slot $m_0$, our goal is to find the optimal power allocation over a set of $M$ successive time slots $m = m_0, m_0 + 1, \ldots, m_0 + M - 1$. Since the interference in the slots following the current one is not known, we follow a Bayesian approach. More specifically, our first optimization criterion is the maximization of the expected rate, conditioned to the current estimation of the interference power profile and of the Markov chain parameters, over each frequency subchannel. In formulas, our objective function is

$$\bar{r} = \frac{1}{M} \sum_{m=m_0}^{m_0+M-1} \sum_{k=1}^{N} E_{S_{k,m}}\left\{ r(S_{k,m}) \right\} \tag{6}$$

where

$$r(S_{k,m}) = \begin{cases} \log\left(1 + \dfrac{p_{k,m} |H_k|^2}{\sigma_n^2(k)}\right) & \text{if } S_{k,m} = 0 \\[2ex] \log\left(1 + \dfrac{p_{k,m} |H_k|^2}{\sigma_n^2(k) + \sigma_I^2(k,m)}\right) & \text{if } S_{k,m} = 1, \end{cases} \tag{7}$$

where $\sigma_n^2(k)$ denotes the noise variance and $H_k$ is the FAP channel transfer coefficient over the $k$th subchannel. The average rate is then

$$\bar{r} = \frac{1}{M} \sum_{m=m_0}^{m_0+M-1} \sum_{k=1}^{N} \left[ \beta_{k,m} \log\left(1 + p_{k,m}\, a_n(k)\right) + \gamma_{k,m} \log\left(1 + p_{k,m}\, a_I(k,m)\right) \right] \tag{8}$$

where $a_n(k) := |H_k|^2 / \sigma_n^2(k)$ and $a_I(k,m) := |H_k|^2 / \left(\sigma_n^2(k) + \sigma_I^2(k,m)\right)$. Since the transition probabilities are not known a priori, they are estimated from the observations, using Equations (42), (44) or (45), depending on the most appropriate Markov order $L$. Knowing the transition probabilities, the occupancy probabilities $\beta_{k,m}$ and $\gamma_{k,m}$ at any time $m$, conditioned to the observation of the channel state at the first $L$ time slots, can easily be derived by using Equations (4), (39), (40). Then, denoting with $\mathbf{p}$ the (time–frequency) $NM$-dimensional power allocation vector, the max-rate optimization problem is formulated as follows

$$\begin{aligned} \max_{\mathbf{p}} \quad & \bar{r}(\mathbf{p}) \\ \text{s.t.} \quad & \frac{1}{M} \sum_{m=m_0}^{m_0+M-1} \sum_{k=1}^{N} p_{k,m} \le P_T \\ & 0 \le p_{k,m} \le p_{\max}(k) \quad \forall k, m \end{aligned} \tag{9}$$

where the upper limit $p_{\max}(k)$, $k = 1, \ldots, N$, represents a mask constraint useful to limit the transmit power over some prescribed channels, for example, the channels occupied by the MBSs. This is a convex problem, as $\bar{r}(\mathbf{p})$ is a concave function of $\mathbf{p}$ and the constraint set is convex. The optimum power vector $\mathbf{p}$ can be expressed in closed form by imposing the KKT conditions (see Appendix 2 for further details). The optimal power over the $k$th frequency subchannel, at time $m$, is

$$p_{k,m} = \left[ \frac{-\tilde{b}_{k,m} + \sqrt{\tilde{b}_{k,m}^2 - 4\, \tilde{a}_{k,m}\, \tilde{c}_{k,m}}}{2\, \tilde{a}_{k,m}} \right]_0^{p_{\max}(k)} \tag{10}$$

where $[x]_a^b = a$ if $x \le a$, $[x]_a^b = b$ if $x \ge b$ and $[x]_a^b = x$ if $a < x < b$. The coefficients $\tilde{a}_{k,m}$, $\tilde{b}_{k,m}$, $\tilde{c}_{k,m}$ are related to $a_n(k)$ and $a_I(k,m)$ as follows

$$\begin{aligned} \tilde{a}_{k,m} &= \lambda\, a_n(k)\, a_I(k,m) \\ \tilde{b}_{k,m} &= \lambda \left[ a_n(k) + a_I(k,m) \right] - a_n(k)\, a_I(k,m) \\ \tilde{c}_{k,m} &= \lambda - a_n(k)\, \beta_{k,m} - a_I(k,m)\, \gamma_{k,m}, \end{aligned}$$

where $\lambda$ is the Lagrange multiplier. Since the optimal powers $p_{k,m}$ are functions of $\lambda$, we can find this multiplier numerically as the solution of the power constraint $\sum_{m=m_0}^{m_0+M-1} \sum_{k=1}^{N} p_{k,m} = M P_T$. Expression (10) is a generalization of the well-known water-filling solution. Indeed, it can be shown that, by taking the limit for the transition probabilities going to 1 or 0, i.e., by turning the Markov chain into the degenerate case of a deterministic signal, Equation (10) converges to the water-filling solution.
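A possible numerical implementation of (10) is sketched below: the closed form is evaluated for a trial multiplier and $\lambda$ is then found by bisection on the power constraint. This is a sketch under stated assumptions rather than the authors' implementation: the geometric bisection, the bracketing values, the function names and the numbers in the example are all illustrative choices.

```python
import math


def power_for_multiplier(lam, beta, a_n, a_i, p_max):
    """Closed form of Eq. (10) for one time-frequency slot and a trial multiplier."""
    gamma = 1.0 - beta
    a = lam * a_n * a_i
    b = lam * (a_n + a_i) - a_n * a_i
    c = lam - a_n * beta - a_i * gamma
    # The discriminant is non-negative in theory; max() guards against rounding.
    root = (-b + math.sqrt(max(b * b - 4.0 * a * c, 0.0))) / (2.0 * a)
    return min(max(root, 0.0), p_max)


def generalized_waterfilling(beta, a_n, a_i, p_max, total_power, iters=200):
    """Find lambda by bisection so that the powers sum to total_power (= M * P_T).

    beta and a_i are indexed [m][k]; a_n and p_max are indexed [k].
    """
    n_sub = len(a_n)
    slots = [(beta[m][k], a_n[k], a_i[m][k], p_max[k])
             for m in range(len(beta)) for k in range(n_sub)]
    if sum(s[3] for s in slots) <= total_power:
        # The mask alone satisfies the budget: transmit at the mask.
        return [list(p_max) for _ in beta]
    lo = 1e-12  # assumed small enough to saturate every slot at its mask
    hi = max(b * an + (1.0 - b) * ai for b, an, ai, _ in slots)  # all powers are zero here
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric bisection: lambda spans many decades
        if sum(power_for_multiplier(mid, *s) for s in slots) > total_power:
            lo = mid
        else:
            hi = mid
    flat = [power_for_multiplier(hi, *s) for s in slots]
    return [flat[m * n_sub:(m + 1) * n_sub] for m in range(len(beta))]


# Degenerate example: both subchannels surely idle, so classical water-filling applies.
allocation = generalized_waterfilling(
    beta=[[1.0, 1.0]], a_n=[2.0, 1.0], a_i=[[0.5, 0.5]],
    p_max=[10.0, 10.0], total_power=1.0)
```

In the degenerate example the result coincides with the classical water-filling allocation, in line with the limiting behavior stated above.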

Min-power optimization strategy

Since one of the most critical issues in femtocells is interference management, an alternative optimization procedure consists in minimizing the FAP transmit power, under the constraint of guaranteeing the required rate over the link between the FAP and the associated FUE. This strategy was proposed, for example, in [15] assuming a static interference. Here, we generalize that approach to the case where the interference is dynamic and its activity evolves as a Markov chain, as described in the previous section. The objective now is to minimize the average transmit power across the $N$ subchannels and over $M$ consecutive time slots. Denoting with $m_0$ the index of the time slot where the interference power profile is measured, the goal is to allocate power over a set of consecutive slots, starting from $m_0$, i.e., for $m = m_0, \ldots, m_0 + M - 1$, under the constraint that the expected rate, conditioned to the observation on the initial $L$ time slots, i.e., for $m = m_0 - L + 1, \ldots, m_0$, must not be smaller than a given value $R_0$.

The optimization problem can then be formulated as

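$$\begin{aligned} \min_{\mathbf{p}} \quad & \frac{1}{M} \sum_{k=1}^{N} \sum_{m=m_0}^{m_0+M-1} p_{k,m} \\ \text{subject to} \quad & \bar{R}(\mathbf{p}) \ge R_0 \\ & 0 \le p_{k,m} \le p_{\max}(k), \quad k = 1, \ldots, N, \; m = 1, \ldots, M \end{aligned} \tag{11}$$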

where the expected rate is computed as in (8).
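The expected rate of (8), which also acts as the constraint in (11), can be evaluated with a few lines of code. The sketch below uses natural logarithms (rates in nats) and an array layout chosen for illustration only; the function name and the indexing convention are assumptions.

```python
import math


def expected_rate(power, beta, gain_idle, gain_busy):
    """Expected rate of Eq. (8), in nats.

    power, beta and gain_busy are indexed [m][k]; gain_idle is indexed [k].
    gain_idle corresponds to a_n(k) and gain_busy to a_I(k, m).
    """
    num_slots = len(power)
    total = 0.0
    for m in range(num_slots):
        for k in range(len(power[m])):
            gamma = 1.0 - beta[m][k]  # busy probability
            total += beta[m][k] * math.log(1.0 + power[m][k] * gain_idle[k])
            total += gamma * math.log(1.0 + power[m][k] * gain_busy[m][k])
    return total / num_slots  # average over the M time slots


# Hypothetical frame with M = 2 time slots and N = 2 subchannels.
rate = expected_rate(power=[[1.0, 0.5], [1.0, 0.5]],
                     beta=[[1.0, 0.0], [0.8, 0.3]],
                     gain_idle=[4.0, 2.0],
                     gain_busy=[[1.0, 0.5], [1.0, 0.5]])
```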

The minimization problem in (11) is a convex optimization problem, since the objective function is a linear (hence convex) function of $\mathbf{p}$ and the feasible set is convex. The solution can be written in closed form by exploiting the KKT conditions and following the same steps as in Appendix 2. The result is

$$p_{k,m} = \left[ \frac{-b(k,m) + \sqrt{b(k,m)^2 - 4\, a(k,m)\, c(k,m)}}{2\, a(k,m)} \right]_0^{p_{\max}(k)}, \quad k = 1, \ldots, N, \; m = 1, \ldots, M \tag{12}$$

with

$$\begin{aligned} a(k,m) &= a_n(k)\, a_I(k,m) \\ b(k,m) &= a_n(k) + a_I(k,m) - \lambda\, a_n(k)\, a_I(k,m) \\ c(k,m) &= 1 - \lambda \left[ a_n(k)\, \beta_{k,m} + a_I(k,m)\, \gamma_{k,m} \right] \end{aligned} \tag{13}$$

where the Lagrange multiplier $\lambda$ is found numerically in order to satisfy the rate constraint $\bar{R}(\mathbf{p}) = R_0$.

One important difference between the min-power and the max-rate problems is that, in the min-power case, the feasible set could be empty. If this happens, it means that the rate requirement is too high to be accommodated. Hence, either the rate requirement is lowered until the feasible set becomes non-empty, or the user is not admitted. This protocol is handled by the call admission control.
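The min-power solution (12)-(13), together with the feasibility check just described, might be sketched as follows. The bracketing by doubling, the plain bisection on $\lambda$ and the use of `None` to signal an infeasible rate target are illustrative assumptions; the article does not prescribe a particular numerical method.

```python
import math


def min_power_allocation(slots, num_time_slots, target_rate, iters=200):
    """Min-power allocation of Eqs. (12)-(13) with lambda found by bisection.

    slots: list of (beta, a_n, a_i, p_max), one tuple per time-frequency slot.
    Returns the list of powers, or None when the rate target is infeasible.
    """
    def rate(powers):
        return sum(b * math.log(1.0 + p * an) + (1.0 - b) * math.log(1.0 + p * ai)
                   for p, (b, an, ai, _) in zip(powers, slots)) / num_time_slots

    def powers_for(lam):
        out = []
        for b, an, ai, p_max in slots:
            qa = an * ai
            qb = an + ai - lam * an * ai
            qc = 1.0 - lam * (an * b + ai * (1.0 - b))
            root = (-qb + math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))) / (2.0 * qa)
            out.append(min(max(root, 0.0), p_max))
        return out

    # Empty feasible set: even transmitting at the mask misses the target.
    if rate([s[3] for s in slots]) < target_rate:
        return None
    lo, hi = 0.0, 1.0
    while rate(powers_for(hi)) < target_rate:  # the rate grows with lambda
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rate(powers_for(mid)) < target_rate:
            lo = mid
        else:
            hi = mid
    return powers_for(hi)


# Hypothetical single slot that is surely idle: log(1 + p) = log(2) requires p = 1.
single_slot = [(1.0, 1.0, 0.5, 10.0)]
powers = min_power_allocation(single_slot, num_time_slots=1, target_rate=math.log(2.0))
rejected = min_power_allocation(single_slot, num_time_slots=1, target_rate=100.0)
```

In a complete system, the `None` outcome would be handed to the call admission control, which either lowers the rate requirement or rejects the user.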

Multi-FAP case: maximum expected rate game

In a scenario containing multiple nearby FAP’s implementing the radio access according to the adaptive strategy described in the previous section, each FAP may react to the power allocation of nearby FAP’s, by changing its own power allocation and so on. This interaction induces an iterative mechanism whose convergence properties have to be carefully studied. The problem can be studied using the theoretical tools of game theory, which is well suited for this kind of multi-objective decision problem. In particular, given the existence of a wired backhaul connecting the FAP’s, we will consider a purely competitive game, where each FAP (player) seeks to optimize its own utility function, irrespective of the other FAP’s performance, and a coordination game, where nearby FAP’s exchange some parameters to improve performance with respect to the purely competitive case.

Denoting again with the binary variable $S_{k,m}$ the macro-interference activity over channel $k$ at time $m$ and with $\Pr(S_{k,m} = 0) = \beta_{k,m}$ and $\Pr(S_{k,m} = 1) = \gamma_{k,m}$ the probabilities that channel $k$ is idle or busy at time $m$, the expected rate of the $q$th FAP is$^{a}$

$$\bar{R}_q(\mathbf{p}_q, \mathbf{p}_{-q}) = \sum_{m=m_0}^{m_0+M-1} \sum_{k=1}^{N} \left[ \beta_{k,m} \log\left(1 + p_{k,m}^{q}\, a_n^{q}(k,m)\right) + \gamma_{k,m} \log\left(1 + p_{k,m}^{q}\, a_I^{q}(k,m)\right) \right], \tag{14}$$

with $q = 1, \ldots, Q$, and

$$a_n^{q}(k,m) = \frac{|H_k^{qq}|^2}{\sigma_{n,q}^2(k) + \sum_{r \in \mathcal{N}_q} p_{k,m}^{r} |H_k^{rq}|^2}, \qquad a_I^{q}(k,m) = \frac{|H_k^{qq}|^2}{\sigma_{n,q}^2(k) + \sum_{r \in \mathcal{N}_q} p_{k,m}^{r} |H_k^{rq}|^2 + \sigma_{I_q}^2(k,m)} \tag{15}$$

where $\mathcal{N}_q$ is the set of neighbors of the $q$th FAP and $H_k^{rq}$ is the channel transfer function of the $k$th subchannel between the $r$th transmitter and the $q$th receiver. The probabilities $\beta_{k,m}$ and $\gamma_{k,m}$ evolve in time according to a Markov chain of order $L$. We assume, as in the previous section, that the allocation over $M$ consecutive time slots is carried out on the basis of the observation of a number of initial time slots equal to the order of the Markov chain, i.e., $L$.

It is worth noticing that the major difference between the expected rate in (14) and the one in (8) is that now the interference is composed of two contributions: the dynamic interference of the macro-users, whose activities evolve as Markov chains but whose power profile, when on, is fixed, and the interference from the other FAPs, whose activity is always on, but whose power profile, described by the entries $p_{k,m}^{r}$, evolves as a response to the choices of the other FAPs.
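The two effective gains in (15) can be computed as in the sketch below, for one FAP, one subchannel and one time slot. The nested-list layout, the dictionary of neighbors and the numerical values are assumptions made for the example; `channel[r][q][k]` stands for the power gain $|H_k^{rq}|^2$.

```python
def effective_gains(q, k, powers, channel, noise, macro_power, neighbors):
    """Effective gains (a_n, a_I) of Eq. (15) for FAP q on subchannel k.

    powers[r][k]      : transmit power of FAP r on subchannel k
    channel[r][q][k]  : power gain |H_k^{rq}|^2 from transmitter r to receiver q
    noise[q][k]       : noise variance at receiver q
    macro_power[q][k] : macro interference power at receiver q when the macro user is on
    neighbors[q]      : indices of the FAPs interfering with FAP q
    """
    # Always-on interference produced by the neighboring FAPs.
    femto = sum(powers[r][k] * channel[r][q][k] for r in neighbors[q])
    direct = channel[q][q][k]
    a_n = direct / (noise[q][k] + femto)                      # macro user idle
    a_i = direct / (noise[q][k] + femto + macro_power[q][k])  # macro user busy
    return a_n, a_i


# Hypothetical two-FAP scenario with a single subchannel.
gains_fap0 = effective_gains(
    q=0, k=0,
    powers=[[1.0], [1.0]],
    channel=[[[2.0], [1.0]], [[1.0], [2.0]]],
    noise=[[1.0], [1.0]],
    macro_power=[[2.0], [2.0]],
    neighbors={0: [1], 1: [0]})
```

Unlike the single-FAP case, both gains change whenever a neighbor changes its power, which is what couples the FAPs' decisions.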

Denoting by $\Omega = \{1, \ldots, Q\}$ the set of $Q$ players, with $P_q$ the maximum transmit power over a frame and with $p_q^{\max}(k)$ the mask constraint over each subcarrier, the problem can be cast as a game, i.e.,

$$\mathcal{G}_1: \quad \max_{\mathbf{p}_q} \ \bar{R}_q(\mathbf{p}_q, \mathbf{p}_{-q}) \quad \text{subject to} \quad \mathbf{p}_q \in \mathcal{P}_q, \qquad \forall q \in \Omega \tag{16}$$

where the feasible set of FAP $q$ is

$$\mathcal{P}_q = \left\{ \mathbf{p}_q \in \mathbb{R}^{NM \times 1} : \sum_{m=m_0}^{m_0+M-1} \sum_{k=1}^{N} p_{k,m}^{q} \le P_q, \ 0 \le p_{k,m}^{q} \le p_q^{\max}(k), \ \forall k \in \{1, \ldots, N\}, \ m \in \{m_0, \ldots, m_0 + M - 1\} \right\}. \tag{17}$$

Since the objective function in (16) is strictly concave in $\mathbf{p}_q \in \mathcal{P}_q$, for any given $\mathbf{p}_{-q}$, and the feasible set $\mathcal{P}_q$ is compact and convex, game $\mathcal{G}_1$ admits a non-empty solution set for any set of channels and transmit power constraints of the users. In [16], we reformulated this game as a Variational Inequality [17] and we applied the iterative gradient projection algorithm to solve it, deriving sufficient conditions for its convergence to a Nash Equilibrium (NE).

Game $\mathcal{G}_1$ may possess multiple equilibria, which may not be Pareto-efficient.$^{b}$ To improve upon the performance of the NE of the purely competitive game $\mathcal{G}_1$, we can modify the utility function of each user in order to induce the players to incorporate a social utility function, rather than being purely selfish. For example, in [18, 19] it has been proposed to modify the utility function of each player so as to maximize the sum of all users’ rates. In principle, this change should require a centralized solution. Nevertheless, Huang et al. [18] showed that the solution of the sum-rate game can still be achieved in decentralized form, provided that the players exchange some parameters, the so-called prices. These parameters induce a penalty on each player's utility proportional to the rate decrease that each player's strategy induces on the other players. Introducing pricing mechanisms in femtocell networks is possible thanks to the existence of the backhaul link, which allows the exchange of prices among FAPs. Furthermore, we will show next that every FAP needs to exchange pricing coefficients only with its neighbors, thus keeping the amount of extra signaling limited.

Generalizing the approach proposed in [18] to our Bayesian formulation, we introduce the price coefficient:

$$\Pi_{k,m}^{r} := -\frac{\partial \bar{R}_r(\mathbf{p})}{\partial I_{k,m}^{r}(\mathbf{p}_{-r})} \tag{18}$$

with $I_{k,m}^{r}(\mathbf{p}_{-r}) := \sum_{i \in \mathcal{N}_r} p_{k,m}^{i} |H_k^{ir}|^2$. These coefficients are proportional to the marginal decrease of user $r$’s expected rate resulting from an increase of the $q$th node’s transmit power, as $\dfrac{\partial \bar{R}_r(\mathbf{p})}{\partial p_{k,m}^{q}} = -\Pi_{k,m}^{r} \dfrac{\partial I_{k,m}^{r}}{\partial p_{k,m}^{q}} = -\Pi_{k,m}^{r} |H_k^{qr}|^2$. The incorporation of the pricing mechanism leads to the new game$^{c}$:

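$$\mathcal{G}_2: \quad \max_{\mathbf{p}_q} \ \bar{R}_q(\mathbf{p}_q, \mathbf{p}_{-q}) - \sum_{m=m_0}^{m_0+M-1} \sum_{k=1}^{N} p_{k,m}^{q} \sum_{r \in \mathcal{N}_q} \Pi_{k,m}^{r} |H_k^{qr}|^2 \quad \text{s.t.} \quad \mathbf{p}_q \in \mathcal{P}_q. \tag{19}$$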
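For the expected rate in (14), the derivative in (18) has a simple closed form for each time-frequency slot. The sketch below evaluates it and carries a helper that can be used to cross-check it against a finite difference; the closed form is derived here from (14)-(15) rather than quoted from the article, and all names and numbers are illustrative.

```python
import math


def slot_rate(p, beta, direct_gain, noise, femto_interference, macro_power):
    """Contribution of one time-frequency slot to the expected rate in Eq. (14)."""
    d_idle = noise + femto_interference
    d_busy = d_idle + macro_power
    return (beta * math.log(1.0 + p * direct_gain / d_idle)
            + (1.0 - beta) * math.log(1.0 + p * direct_gain / d_busy))


def interference_price(p, beta, direct_gain, noise, femto_interference, macro_power):
    """Price of Eq. (18): minus the derivative of slot_rate w.r.t. femto_interference."""
    signal = p * direct_gain
    d_idle = noise + femto_interference
    d_busy = d_idle + macro_power
    # d/dI log(1 + s / D) = -s / (D * (D + s)), applied to the idle and busy terms.
    return (beta * signal / (d_idle * (d_idle + signal))
            + (1.0 - beta) * signal / (d_busy * (d_busy + signal)))


# Hypothetical operating point of one FAP on one subchannel.
args = dict(p=0.8, beta=0.6, direct_gain=2.0, noise=0.1,
            femto_interference=0.3, macro_power=1.0)
price = interference_price(**args)
```

The price is always non-negative and vanishes when the FAP does not transmit on the slot, so an idle neighbor imposes no penalty.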

Each local problem is convex, hence the KKT conditions lead to power coefficients $p_{k,m}^{q}$ of each FAP that, within the interval $[0, p_q^{\max}(k))$, must satisfy the following equation

$$\tilde{a}_q(k,m) \left(p_{k,m}^{q}\right)^2 + \tilde{b}_q(k,m)\, p_{k,m}^{q} + \tilde{c}_q(k,m) = 0 \tag{20}$$

where, denoting with $\nu_q$ the Lagrangian multiplier, we have set

$$\begin{aligned} \tilde{a}_q(k,m) &= \Big( \nu_q + \sum_{r \in \mathcal{N}_q} \Pi_{k,m}^{r} |H_k^{qr}|^2 \Big)\, a_n^{q}(k,m)\, a_I^{q}(k,m) \\ \tilde{b}_q(k,m) &= \Big( \nu_q + \sum_{r \in \mathcal{N}_q} \Pi_{k,m}^{r} |H_k^{qr}|^2 \Big) \left[ a_n^{q}(k,m) + a_I^{q}(k,m) \right] - a_n^{q}(k,m)\, a_I^{q}(k,m) \\ \tilde{c}_q(k,m) &= \nu_q + \sum_{r \in \mathcal{N}_q} \Pi_{k,m}^{r} |H_k^{qr}|^2 - \left[ a_n^{q}(k,m)\, \beta_{k,m} + a_I^{q}(k,m)\, \gamma_{k,m} \right]. \end{aligned} \tag{21}$$

We can verify that, for all $\nu_q > 0$, we have $\tilde{b}_q(k,m)^2 - 4\, \tilde{a}_q(k,m)\, \tilde{c}_q(k,m) \ge 0$, and the only solution is

$$\tilde{p}_{k,m}^{b} = \frac{-\tilde{b}_q(k,m) + \sqrt{\tilde{b}_q(k,m)^2 - 4\, \tilde{a}_q(k,m)\, \tilde{c}_q(k,m)}}{2\, \tilde{a}_q(k,m)}. \tag{22}$$

More specifically, we get

$$p_{k,m}^{q} = \begin{cases} 0 & \text{if } \nu_q + \sum_{r \in \mathcal{N}_q} \Pi_{k,m}^{r} |H_k^{qr}|^2 \ge a_n^{q}(k,m)\, \beta_{k,m} + a_I^{q}(k,m)\, \gamma_{k,m} \\[1ex] \tilde{p}_{k,m}^{b} & \text{if } \nu_q + \sum_{r \in \mathcal{N}_q} \Pi_{k,m}^{r} |H_k^{qr}|^2 < a_n^{q}(k,m)\, \beta_{k,m} + a_I^{q}(k,m)\, \gamma_{k,m} \end{cases} \tag{23}$$

and the optimal power allocation vector is $p_{k,m}^{q} = \left[ \tilde{p}_{k,m}^{b} \right]_0^{p_q^{\max}(k)}$, where the multiplier $\nu_q$ is chosen in order to satisfy the constraint $\sum_{k=1}^{N} \sum_{m=m_0}^{m_0+M-1} \left[ \tilde{p}_{k,m}^{b} \right]_0^{p_q^{\max}(k)} = P_q$. The previous solution assumes, for each player, that the powers used by the other players are given.

In practice, the game evolves with each FAP reacting to the choices of the other FAPs. It is then fundamental to prove the convergence of this iterative mechanism. In the following, we present a version of the so-called Modified Asynchronous Distributed Pricing (MADP) algorithm proposed in [19], adapted to our formulation. To this purpose, it is useful to rewrite (19) introducing a unique index $h$ so that the entries of the power vector $\mathbf{p}_q$ are $p_h^{q}$ for $h = 1, \ldots, NM$. Then, defining the quantities

$$\mathrm{SNIR}_{h_\beta}^{q} := \frac{p_h^{q} |H_h^{qq}|^2}{\sigma_{n,q}^2(h) + \sum_{r \in \mathcal{N}_q} p_h^{r} |H_h^{rq}|^2}, \qquad \mathrm{SNIR}_{h_\gamma}^{q} := \frac{p_h^{q} |H_h^{qq}|^2}{\sigma_{n,q}^2(h) + \sum_{r \in \mathcal{N}_q} p_h^{r} |H_h^{rq}|^2 + \sigma_{I_q}^2(h)} \tag{24}$$

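we can derive the $q$th user best response as

$$p_h^{q} = \frac{2\, c_h^{q}}{\sum_{r \in \mathcal{N}_q} \Pi_h^{r} |H_h^{qr}|^2 + \nu_q - \eta_h^{q}} - p_h^{q}, \qquad h = 1, \ldots, MN, \tag{25}$$

where $c_h^{q} = \beta_h^{q} \dfrac{\mathrm{SNIR}_{h_\beta}^{q}}{1 + \mathrm{SNIR}_{h_\beta}^{q}} + \gamma_h^{q} \dfrac{\mathrm{SNIR}_{h_\gamma}^{q}}{1 + \mathrm{SNIR}_{h_\gamma}^{q}}$ and $\nu_q$ and $\eta_h^{q}$ are the Lagrangian multipliers. Given this setting, the modified MADP algorithm is illustrated below.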

Algorithm 1 MADP algorithm

Each FAP performs its allocation over M consecutive time slots from m0 to m0 + M − 1.

Before performing its allocation, each FAP has to observe $q_s$ samples, from $m_0 - q_s + 1$ to $m_0$, in order to estimate the transition probabilities of the underlying Markov chain.

S.0: Each FAP $q$ chooses an initial power profile in the set $\mathcal{P}_q$ and sets $n = 0$;

S.1: Each FAP computes its interference prices $\Pi_h^{q}(n) |H_k^{iq}|^2$ for $h = 1, \ldots, MN$ and sends them to its neighbors with index $i \in \mathcal{N}_q$;

S.2: At each time $n$, each FAP updates its power profile so as to maximize its utility function $\bar{R}_q$, given the other FAPs’ power profiles $\mathbf{p}_{-q}$ and price vectors, according to $p_h^{q}(n+1) = p_h^{q}(n) + \alpha_q(n) \left[ p_h^{q} - p_h^{q}(n) \right]$ for $h = 1, \ldots, MN$, where $p_h^{q}$ is given by (25);

S.3: Set n = n + 1, go to step S.1 and repeat until convergence is reached.

Following arguments similar to those in [19], we prove in Appendix 3 that there exist sufficiently small step sizes $\alpha_q(n)$ for which the MADP algorithm converges monotonically to a fixed point.
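The structure of steps S.0-S.3 can be illustrated with the following sketch for a single time slot ($M = 1$). It is a simplified stand-in rather than the algorithm analyzed in Appendix 3: the target of each update is the priced closed form (20)-(23), with $\nu_q$ found by bisection, instead of the best response (25); the updates are synchronous; and the step size, the iteration count and all numerical values are arbitrary assumptions. Every other FAP is treated as a neighbor.

```python
import math


def noise_plus_interference(q, k, powers, G, noise, macro):
    """Denominators of Eq. (15) for FAP q on subchannel k: (macro idle, macro busy)."""
    femto = sum(powers[r][k] * G[r][q][k] for r in range(len(powers)) if r != q)
    return noise + femto, noise + femto + macro[q][k]


def price(q, k, powers, G, noise, macro, beta):
    """Interference price of FAP q on subchannel k, Eq. (18)."""
    d_idle, d_busy = noise_plus_interference(q, k, powers, G, noise, macro)
    s = powers[q][k] * G[q][q][k]
    return (beta[k] * s / (d_idle * (d_idle + s))
            + (1.0 - beta[k]) * s / (d_busy * (d_busy + s)))


def slot_power(lam, beta, a_n, a_i, p_max):
    """Eqs. (20)-(23), with lam = nu_q + priced interference cost."""
    a = lam * a_n * a_i
    b = lam * (a_n + a_i) - a_n * a_i
    c = lam - a_n * beta - a_i * (1.0 - beta)
    root = (-b + math.sqrt(max(b * b - 4.0 * a * c, 0.0))) / (2.0 * a)
    return min(max(root, 0.0), p_max)


def priced_response(q, powers, G, noise, macro, beta, p_max, budget):
    """Priced allocation of FAP q, given the powers and prices of the other FAPs."""
    n_sub = len(beta)
    others = [r for r in range(len(powers)) if r != q]
    cost = [sum(price(r, k, powers, G, noise, macro, beta) * G[q][r][k] for r in others)
            for k in range(n_sub)]
    a_n, a_i = [], []
    for k in range(n_sub):
        d_idle, d_busy = noise_plus_interference(q, k, powers, G, noise, macro)
        a_n.append(G[q][q][k] / d_idle)
        a_i.append(G[q][q][k] / d_busy)

    def alloc(nu):
        return [slot_power(nu + cost[k], beta[k], a_n[k], a_i[k], p_max)
                for k in range(n_sub)]

    lo, hi = 1e-12, max(a_n)  # at nu = hi every power is zero
    if sum(alloc(lo)) <= budget:
        return alloc(lo)  # the power budget is not binding
    for _ in range(100):
        mid = math.sqrt(lo * hi)
        lo, hi = (mid, hi) if sum(alloc(mid)) > budget else (lo, mid)
    return alloc(hi)


def madp_sketch(powers, G, noise, macro, beta, p_max, budget, alpha=0.2, iters=50):
    """Relaxed synchronous updates in the spirit of steps S.1-S.3."""
    for _ in range(iters):
        targets = [priced_response(q, powers, G, noise, macro, beta, p_max, budget)
                   for q in range(len(powers))]
        powers = [[p + alpha * (t - p) for p, t in zip(powers[q], targets[q])]
                  for q in range(len(powers))]
    return powers


# Hypothetical scenario: 2 FAPs, 2 subchannels, G[r][q][k] = |H_k^{rq}|^2.
G = [[[1.0, 1.0], [0.2, 0.3]], [[0.3, 0.2], [1.0, 1.0]]]
final_powers = madp_sketch(powers=[[0.5, 0.5], [0.5, 0.5]], G=G, noise=0.1,
                           macro=[[1.0, 0.0], [0.0, 1.0]], beta=[0.7, 0.4],
                           p_max=2.0, budget=1.0)
```

Because each update is a convex combination of two feasible profiles, the iterates stay inside the feasible set (17); whether the sum-rate improves monotonically depends on the step size, as discussed above.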

Multi-FAP case: min-power game

As with the max-rate game, let us consider now the generalization of the min-power algorithm to the multi-FAP case. The utility of each player is now the total transmit power over the N subchannels and the M time slots

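$$u_q(\mathbf{p}_q) = \sum_{k=1}^{N} \sum_{m=m_0}^{m_0+M-1} p_{k,m}^{q} \tag{26}$$

and the constraint is that the expected rate for each FAP, conditioned to the initial observations, be not smaller than a given value $R_q^{0}$. The feasible set is now

$$\tilde{\mathcal{F}}_q(\mathbf{p}_{-q}) = \left\{ \mathbf{p}_q \in \mathbb{R}^{NM \times 1} : \bar{R}_q(\mathbf{p}_q, \mathbf{p}_{-q}) \ge R_q^{0}, \ 0 \le p_{k,m}^{q} \le p_q^{\max}(k), \ k = 1, \ldots, N, \ m = m_0, \ldots, m_0 + M - 1 \right\} \tag{27}$$

and the game is

$$\tilde{\mathcal{G}}_1 = \left\{ \Omega, \ \{\tilde{\mathcal{F}}_q(\mathbf{p}_{-q})\}_{q \in \Omega}, \ \{u_q(\mathbf{p}_q)\}_{q \in \Omega} \right\}. \tag{28}$$

The optimal strategy for each player amounts to solving the following optimization problem

$$(\tilde{\mathcal{P}}_1) \quad \min_{\mathbf{p}_q} \ u_q(\mathbf{p}_q) \quad \text{subject to} \quad \mathbf{p}_q \in \tilde{\mathcal{F}}_q(\mathbf{p}_{-q}), \qquad \forall q \in \Omega. \tag{29}$$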

The minimization problem in (29) for each player $q$, given the strategies of the others, is a convex optimization problem, since the objective function is a linear (hence convex) function of $\mathbf{p}_q$ and the set $\tilde{\mathcal{F}}_q(\mathbf{p}_{-q})$, given the power vector $\mathbf{p}_{-q}$ of the other players, is a convex set. Imposing the KKT conditions, as in the single-FAP case, the solution can be expressed in closed form as $\mathbf{p}_q = g(\mathbf{p}_{-q})$, whose entries are (see Appendix 4)

$$p_{k,m}^{q} = \left[ g(\mathbf{p}_{-q}) \right]_{k,m} = \left[ \frac{-b_q(k,m) + \sqrt{b_q(k,m)^2 - 4\, a_q(k,m)\, c_q(k,m)}}{2\, a_q(k,m)} \right]_0^{p_q^{\max}(k)}, \quad k = 1, \ldots, N, \; m = 1, \ldots, M \tag{30}$$

with

$$\begin{aligned} a_q(k,m) &= a_n^{q}(k,m)\, a_I^{q}(k,m) \\ b_q(k,m) &= a_n^{q}(k,m) + a_I^{q}(k,m) - \lambda_q\, a_n^{q}(k,m)\, a_I^{q}(k,m) \\ c_q(k,m) &= 1 - \lambda_q \left[ a_n^{q}(k,m)\, \beta_{k,m} + a_I^{q}(k,m)\, \gamma_{k,m} \right] \end{aligned} \tag{31}$$

where the Lagrange multiplier λ q is chosen in order to satisfy the rate constraint R ̄ q ( p q , p q )= R q 0 . However, now the overall feasible set is not jointly convex with respect to the power vectors of all the users, i.e., it is not convex inp= ( p q ) q = 1 Q . This makes the study of this game much harder than the standard NE problem. Nevertheless, game G ~ 1 is a Generalized Potential Game (GPG)[20], with a potential Φ equal to the sum power. In such a case, the existence of a NE of the potential game can be proved directly by the existence of a maximum of the potential function Φ on the set X ~ of the game. To exploit the theory of GPG, we must prove that game G ~ 1 admits a non-empty feasible set. The proof of this result is given in Appendix 5, containing the sufficient conditions under which the feasible set of the game G ~ 1 , i.e.,

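$$\tilde{\mathcal{X}} = \Bigl\{ \mathbf{p} \in \mathbb{R}^{NMQ} : \bar{R}_q(\mathbf{p}) \ge R_q^0,\; 0 \le p_{k,m}^q \le p_q^{\max}(k),\; \forall k, m,\ \forall q\in\Omega \Bigr\}$$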
(32)

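is compact and non-empty. Hence, game $\tilde{\mathcal{G}}_1 = \{\Omega, \{\tilde{\mathcal{F}}_q(\mathbf{p}_{-q})\}_{q\in\Omega}, \{u_q(\mathbf{p}_q)\}_{q\in\Omega}\}$ is a GPG whose potential function $\Phi(\mathbf{p})$ is the sum of the objective functions of all players, i.e., $\Phi(\mathbf{p}) = \sum_{q=1}^{Q} u_q(\mathbf{p}_q)$.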

Nevertheless, being a potential game does not guarantee the equilibrium to be efficient. Hence, as with the max-rate game, efficiency can be improved by introducing pricing. The introduction of pricing leads to a modified game which can be cast as

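$$\tilde{\mathcal{G}}_2:\quad \min_{\mathbf{p}_q}\ u_q(\mathbf{p}_q) + \sum_{m=m_0}^{m_0+M-1}\sum_{k=1}^{N}\sum_{s\in\mathcal{N}_q} \lambda_s\,\Pi_{k,m}^{s}\,|H_k^{qs}|^2\,p_{k,m}^q \quad \text{subject to}\quad \mathbf{p}_q \in \tilde{\mathcal{F}}_q(\mathbf{p}_{-q}), \qquad \forall q\in\Omega$$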
(33)

where $\lambda_s$ is the Lagrange multiplier of user $s$ relative to the rate constraint. The pricing coefficients $\Pi_{k,m}^{s}$ are defined in the same way as in the max-rate case.

Numerical results

In this section, we present some numerical results to assess the performance of the algorithms proposed in the previous sections. Let us start with the single-FAP case. In all the simulations, we have considered Rayleigh fading frequency-selective channels with 4 resolvable paths, each one with unit variance. The number of subcarriers N is set to 12, as in an LTE Physical Resource Block (PRB). In Figure 2 we show the average rate per OFDM symbol as a function of the allocated time slots obtained in the max-rate problem. The simulation results have been averaged over 100 independent channel and Markov chain realizations. The different curves indicate the rate obtained under different kinds of knowledge of the interference: the green curve assumes perfect knowledge of the future evolution of the macro-user activity and is used as a benchmark; the pink curve assumes that the interference power level over each subchannel is equal to the value observed in the first time slot; all other curves refer to the proposed algorithm, where we observe the channels in the first slot and allocate power using our proposed method. These curves refer to different Markov orders (from L = 1 to L = 3). We also compare the case where the transition probabilities are perfectly known with the case where they are estimated from the data. The interesting behavior is that, as the order increases, our approach gets closer to the ideal case where the interference activity is non-causally known. The price to be paid is the loss of performance resulting from the estimation of the Markov parameters. Further developments could incorporate some kind of reinforcement learning to allocate resources without necessarily estimating the transition probabilities, as proposed in [12], for example.
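As an illustration of the simulation setup described above, the following Python sketch draws one frequency-selective Rayleigh channel realization (4 unit-variance taps, 12 subcarriers) and one realization of a first-order two-state interferer activity chain. The function names, the seed and the parameters `p01` and `p10` are hypothetical and chosen only for this example; the article does not specify how its simulations were implemented.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility of the sketch

N_SUBCARRIERS = 12  # one LTE resource block, as in the text
N_PATHS = 4         # resolvable paths, each with unit variance


def rayleigh_channel_gains(n_subcarriers=N_SUBCARRIERS, n_paths=N_PATHS):
    """Return the power gains |H_k|^2 on each subcarrier for one realization."""
    # Circularly symmetric complex Gaussian taps, unit variance per tap.
    taps = (rng.standard_normal(n_paths)
            + 1j * rng.standard_normal(n_paths)) / np.sqrt(2)
    # Frequency response on the subcarrier grid (zero-padded DFT of the taps).
    freq_response = np.fft.fft(taps, n=n_subcarriers)
    return np.abs(freq_response) ** 2


def markov_activity(n_slots, p01, p10, s0=0):
    """Simulate a first-order two-state chain (0 = idle, 1 = busy).

    p01 is the idle-to-busy and p10 the busy-to-idle transition probability.
    """
    states = np.empty(n_slots, dtype=int)
    states[0] = s0
    for m in range(1, n_slots):
        flip_prob = p01 if states[m - 1] == 0 else p10
        flipped = rng.random() < flip_prob
        states[m] = 1 - states[m - 1] if flipped else states[m - 1]
    return states
```

Averaging a performance metric over many calls to these two functions would mimic the averaging over independent channel and Markov chain realizations mentioned in the text.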

Figure 2

Single FAP max-rate: rate versus number of time slots.

In Figure 3, we report the optimal rate of our algorithm versus the number of time slots, for different Markov orders, L = 3 in the upper subplot and L = 1 in the lower. We have considered different numbers of samples $q_s$ used to estimate the transition probabilities. Of course, the higher the Markov order, the greater the number of parameters to estimate. In fact, we can notice that, for the same number of samples $q_s$ used for the estimate, the performance loss with respect to the ideal case of perfect knowledge of the macro-user activity (curve with red square markers) is much higher for L = 3 than for L = 1. Figure 4 shows the cumulative rate versus the number of slots $m_0$ used for the recursive estimation of the transition probabilities, assuming M = 3 and modeling the macro-user activity as a first-order Markov chain. We observe that, after fewer than 50 time slots, the performance gets very close to the asymptotic case. Finally, in Figure 5, we show the performance of the min-power allocation strategy. The utility function is the SNR [dB] at the FUE receiver obtained for different numbers of allocated time slots. As in the max-rate case, the advantage of increasing the order of statistical knowledge (from L = 1 to L = 3) is evident, and a gain of about 4 dB can be observed with respect to the case where no knowledge about the macro-user activity is assumed.

Figure 3

Single FAP max-rate: rate versus number of time slots for different number of samples used for TPs estimation.

Figure 4

Single FAP max-rate: cumulative rate versus number of time slots used for the estimation of the transition probabilities.

Figure 5

Single FAP min-power: SNR versus number of time slots.

Finally, we provide some numerical examples to assess the performance of the proposed approaches (max-rate and min-power) in the multiuser case. The reference scenario is composed of one MBS and Q = 10 FAPs randomly distributed over a square area. The MBS activity is modeled as a third-order Markov chain and the results have been averaged over 50 independent Markov chain realizations. In Figure 6, we report the users' sum-rate versus the iteration index for the maximum expected rate game $\mathcal{G}_1$ in order to test the convergence of the algorithm. It can be observed that it converges in a few iterations. In Figures 7 and 8, we depict the FAPs' sum-rate versus the number of allocated time slots. In particular, Figure 7 refers to the purely competitive maximum expected rate game $\mathcal{G}_1$, while Figure 8 refers to the modified pricing game $\mathcal{G}_2$. In both cases, we assumed the same maximum transmit power per FAP. The three curves in each figure indicate the sum-rate obtained by assuming perfect (non-causal) knowledge of the macro-user activity, no knowledge at all (thus assuming the interference to remain equal to the values observed in the first slots of each frame), or only knowledge of the Markov parameters. Both figures show that acquiring a statistical knowledge (estimation) of the interference activity parameters (Markov transition rates) yields a performance advantage over the case with no information and brings the performance close to the ideal case of perfect non-causal knowledge of the interference activity. Of course, as time evolves, there is a mismatch between what is predicted and the real interference, so that the performance improvement tends to decrease in time. Furthermore, comparing Figures 7 and 8, the gain achieved with the introduction of pricing is evident.

Figure 6

Multi-FAP max-rate: convergence of the game.

Figure 7

Multi-FAP max-rate: sum-rate versus number of time slots for the maximum expected rate game without pricing.

Figure 8

Multi-FAP max-rate: sum-rate versus number of time slots for the max-rate game with pricing.

Considering the same scenario, in Figures 9 and 10 we report the simulation results corresponding to our proposed minimum power games, $\tilde{\mathcal{G}}_1$ and $\tilde{\mathcal{G}}_2$. Figure 9 refers to the min-power game with no pricing, while Figure 10 refers to the game including pricing. The curves show the average SNR per FUE as a function of the number of allocated time slots. The expected target rate $R_q^0$ in both cases is set to 3 bps for each user. From both Figures 9 and 10 we can verify that, also in this case, the simple statistical knowledge of the transition rates yields performance close to the ideal case where the interference activity is non-causally known. Observe that the curve referring to the non-causal knowledge tends to have zero slope asymptotically, while the statistical knowledge curve presents a performance gain which tends to decrease as time evolves, due to the mismatch between what is predicted and the real interference. In both Figures 9 and 10, the advantage of the statistical approach with respect to the case where there is no knowledge is considerable, and by comparing the two figures the performance gain due to the introduction of a pricing mechanism is evident.

Figure 9

Multi-FAP min-power: average SNR versus number of time slots for the min-power game without pricing.

Figure 10

Multi-FAP min-power: average SNR versus number of time slots for the min power game with pricing.

Conclusion

In conclusion, in this article we have shown how the estimation of the interference statistical parameters can be beneficial to improve the performance of a power allocation technique, provided that the statistical model fits the real data. In this study, we assume that the transition probabilities are estimated from the data. An interesting future direction consists in incorporating methods which do not really require such an estimation, but acquire the proper behavior through reinforcement learning. The interesting part of our method is that, for a given estimation of the transition probabilities, the power allocation across the set of time slots/frequency channels is found in closed form. This is indeed useful to save convergence time with respect to gradient-based techniques.

The other important contribution of this article is the decentralized approach for resource allocation based on game theory, in the case where the interference is dynamically varying. Our Bayesian formulation of the game provides interesting results in such an uncertain environment. Finally, the introduction of coordination among FAPs based on the exchange of a few parameters (prices) through the backhaul link has been shown to provide significant performance gains with respect to the purely competitive game.

Further developments should incorporate a robust approach for the situation where the interference statistical model is not known or the statistical parameters are time-varying. One more critical aspect is the availability of the backhaul link for the exchange of prices. Since such a link is affected by random delays, it may be useful to incorporate robust mechanisms to cope with the situation where the price does not arrive within a maximum tolerable delay.

Appendix 1

Extending the Markov model used in (5) to higher-order chains, we can write the time evolution of the occupancy probabilities conditioned to the observations of the channel state at the first L time slots as

$$\boldsymbol{\pi}(k, m-L+1, \ldots, m) = \mathbf{P}_L(k)\,\boldsymbol{\pi}(k, m-L, m-L+1, \ldots, m-1)$$
(34)

for $m = L+1, L+2, \ldots$, and given the initial state $\boldsymbol{\pi}(k,1,\ldots,L)$. More specifically, the entries of the $2^L$-dimensional vector $\boldsymbol{\pi}(k, m-L+1, \ldots, m)$ are defined as

$$\Pi_{ij}(k, m-1, m) = \Pr\bigl(S_{k,m-1} = i,\ S_{k,m} = j\bigr) \quad \text{for } L = 2,\ (i,j)\in\{0,1\}^2$$
(35)
$$\Pi_{rij}(k, m-2, m-1, m) = \Pr\bigl(S_{k,m-2} = r,\ S_{k,m-1} = i,\ S_{k,m} = j\bigr) \quad \text{for } L = 3,\ (r,i,j)\in\{0,1\}^3$$
(36)

while the transition matrices $\mathbf{P}_L(k)$ are expressed, respectively, for L = 2 as

$$
\mathbf{P}_2(k) =
\begin{pmatrix}
\omega_2(k) & 0 & \theta_2(k) & 0 \\
1-\omega_2(k) & 0 & 1-\theta_2(k) & 0 \\
0 & \nu_2(k) & 0 & 1-\mu_2(k) \\
0 & 1-\nu_2(k) & 0 & \mu_2(k)
\end{pmatrix}
$$
(37)

with $\omega_2(k) = p_{000}(k)$, $\theta_2(k) = p_{100}(k)$, $\nu_2(k) = p_{010}(k)$, $\mu_2(k) = p_{111}(k)$, and for L = 3 as

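$$
\mathbf{P}_3(k) =
\begin{pmatrix}
\omega_3(k) & 0 & 0 & 0 & \lambda_3(k) & 0 & 0 & 0 \\
1-\omega_3(k) & 0 & 0 & 0 & 1-\lambda_3(k) & 0 & 0 & 0 \\
0 & \nu_3(k) & 0 & 0 & 0 & \psi_3(k) & 0 & 0 \\
0 & 1-\nu_3(k) & 0 & 0 & 0 & 1-\psi_3(k) & 0 & 0 \\
0 & 0 & \eta_3(k) & 0 & 0 & 0 & \theta_3(k) & 0 \\
0 & 0 & 1-\eta_3(k) & 0 & 0 & 0 & 1-\theta_3(k) & 0 \\
0 & 0 & 0 & 1-\gamma_3(k) & 0 & 0 & 0 & 1-\mu_3(k) \\
0 & 0 & 0 & \gamma_3(k) & 0 & 0 & 0 & \mu_3(k)
\end{pmatrix}
$$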
(38)

where $\omega_3(k) = p_{0000}(k)$, $\lambda_3(k) = p_{1000}(k)$, $\nu_3(k) = p_{0010}(k)$, $\psi_3(k) = p_{1010}(k)$, $\eta_3(k) = p_{0100}(k)$, $\theta_3(k) = p_{1100}(k)$, $\gamma_3(k) = p_{0111}(k)$, $\mu_3(k) = p_{1111}(k)$. Hence, writing for simplicity of notation $\Pi_{ij}(k, m-2, m-1) = \Pi_{ij}(k)$ and $\Pi_{rij}(k, m-3, m-2, m-1) = \Pi_{rij}(k)$, the probabilities that the $k$th subchannel is idle (busy) at time $m$, i.e., $\beta_{k,m}$ ($\gamma_{k,m}$), are, for L = 2 and L = 3, respectively,

$$
\begin{aligned}
\beta_{k,m} &= \omega_2(k)\,\Pi_{00}(k) + \theta_2(k)\,\Pi_{10}(k) + \nu_2(k)\,\Pi_{01}(k) + \bigl(1-\mu_2(k)\bigr)\Pi_{11}(k) \\
\gamma_{k,m} &= \bigl(1-\omega_2(k)\bigr)\Pi_{00}(k) + \bigl(1-\theta_2(k)\bigr)\Pi_{10}(k) + \bigl(1-\nu_2(k)\bigr)\Pi_{01}(k) + \mu_2(k)\,\Pi_{11}(k)
\end{aligned}
$$
(39)

and

$$
\begin{aligned}
\beta_{k,m} ={}& \omega_3(k)\,\Pi_{000}(k) + \lambda_3(k)\,\Pi_{100}(k) + \nu_3(k)\,\Pi_{001}(k) + \psi_3(k)\,\Pi_{101}(k) \\
&+ \eta_3(k)\,\Pi_{010}(k) + \theta_3(k)\,\Pi_{110}(k) + \bigl(1-\gamma_3(k)\bigr)\Pi_{011}(k) + \bigl(1-\mu_3(k)\bigr)\Pi_{111}(k) \\
\gamma_{k,m} ={}& \bigl(1-\omega_3(k)\bigr)\Pi_{000}(k) + \bigl(1-\lambda_3(k)\bigr)\Pi_{100}(k) + \bigl(1-\nu_3(k)\bigr)\Pi_{001}(k) + \bigl(1-\psi_3(k)\bigr)\Pi_{101}(k) \\
&+ \bigl(1-\eta_3(k)\bigr)\Pi_{010}(k) + \bigl(1-\theta_3(k)\bigr)\Pi_{110}(k) + \gamma_3(k)\,\Pi_{011}(k) + \mu_3(k)\,\Pi_{111}(k).
\end{aligned}
$$
(40)
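The recursion (34) and the idle/busy probabilities in (39) can be illustrated numerically for L = 2. The following Python sketch builds the matrix of (37) and propagates the pair probabilities over a few slots. The function names are hypothetical, and the numerical values of the transition probabilities in the usage note are arbitrary and serve only as an example.

```python
import numpy as np


def transition_matrix_l2(omega, theta, nu, mu):
    """Matrix P_2(k) of (37); pair states are ordered 00, 01, 10, 11."""
    return np.array([
        [omega,       0.0,      theta,       0.0],
        [1.0 - omega, 0.0,      1.0 - theta, 0.0],
        [0.0,         nu,       0.0,         1.0 - mu],
        [0.0,         1.0 - nu, 0.0,         mu],
    ])


def idle_busy_probabilities(P, pi0, n_slots):
    """Propagate the pair probabilities and return (beta, gamma) per slot."""
    pi = np.asarray(pi0, dtype=float)
    beta, gamma = [], []
    for _ in range(n_slots):
        pi = P @ pi                    # one step of recursion (34)
        beta.append(pi[0] + pi[2])     # most recent state idle: pairs 00, 10
        gamma.append(pi[1] + pi[3])    # most recent state busy: pairs 01, 11
    return np.array(beta), np.array(gamma)
```

For instance, with `P = transition_matrix_l2(0.9, 0.6, 0.3, 0.8)` and the initial observation "idle, idle" encoded as `pi0 = [1, 0, 0, 0]`, the first idle probability returned is `omega = 0.9`, in agreement with (39).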

The transition probabilities of a Markov chain of arbitrary order can be estimated from the observed data using the maximum likelihood strategy, as suggested for example in [21, 22]. To simplify the description of the estimator we focus on a first-order Markov chain, but the extension to higher orders is straightforward. Let us assume that a sequence of $m$ states, namely $s^m \triangleq (s_{k,1}, \ldots, s_{k,m})$, is observed. The probability of observing a specific sequence of states is

$$\Pr\bigl\{S^m = s^m\bigr\} = \prod_{l=2}^{m} \Pr\bigl\{S_{k,l} = s_{k,l} \mid S_{k,l-1} = s_{k,l-1}\bigr\} \times \Pr\bigl\{S_{k,1} = s_{k,1}\bigr\}.$$
(41)

Let $n_{ij}(l)$ denote the number of times the state $i$ at time $l-1$ switches to state $j$ at time $l$. It has been proved in [21] that, for a stationary Markov chain, the set $n_{ij} = \sum_{l=2}^{m} n_{ij}(l)$ forms a set of sufficient statistics. Furthermore, the maximum likelihood estimator based on the set $s^m$ is

$$\hat{p}_{ij}(m) = \frac{\sum_{l=2}^{m} n_{ij}(l)}{\sum_{l=2}^{m} \bigl[n_{i0}(l) + n_{i1}(l)\bigr]}.$$
(42)

Introducing the counter $N_{ij}(m) := \sum_{l=2}^{m} n_{ij}(l)$, (42) can be rewritten in a recursive form as

$$\hat{p}_{ij}(m) = \frac{N_{ij}(m-1) + n_{ij}(m)}{N_{i0}(m-1) + N_{i1}(m-1) + n_{i0}(m) + n_{i1}(m)}.$$
(43)

The extension of the estimator to higher-order Markov chains is straightforward. As shown in[21], the estimators for a second- and third-order Markov chain are, respectively,

$$\hat{p}_{ijk}(m) = \frac{\sum_{l=3}^{m} n_{ijk}(l)}{\sum_{l=3}^{m} n_{ij}(l)}$$
(44)

and

$$\hat{p}_{ijkr}(m) = \frac{\sum_{l=4}^{m} n_{ijkr}(l)}{\sum_{l=4}^{m} n_{ijk}(l)}.$$
(45)
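The recursive first-order estimator (43) amounts to keeping running transition counts and normalizing each row. A minimal Python sketch follows; the class name `TransitionEstimator` and its methods are hypothetical, and returning NaN for rows with no observations is a choice made for this example, not something prescribed by the article.

```python
import numpy as np


class TransitionEstimator:
    """Recursive ML estimate of first-order transition probabilities, as in (43)."""

    def __init__(self):
        self.counts = np.zeros((2, 2))  # N_ij: transitions i -> j observed so far
        self.prev = None                # last observed state (0 = idle, 1 = busy)

    def update(self, state):
        """Feed one newly sensed state and update the counters N_ij."""
        if self.prev is not None:
            # n_ij(m) equals 1 for the observed pair and 0 for all other pairs.
            self.counts[self.prev, state] += 1
        self.prev = state

    def estimate(self):
        """Return the 2x2 matrix of estimates p_ij (NaN where row i is unseen)."""
        row_totals = self.counts.sum(axis=1, keepdims=True)
        with np.errstate(invalid="ignore", divide="ignore"):
            return np.where(row_totals > 0, self.counts / row_totals, np.nan)
```

As a usage example, feeding the sequence 0, 0, 1, 0, 0, 0, 1, 1 gives the counts $N_{00} = 3$, $N_{01} = 2$, $N_{10} = 1$, $N_{11} = 1$, hence $\hat{p}_{00} = 0.6$ and $\hat{p}_{10} = 0.5$.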

Appendix 2

In order to solve the problem (9) let us consider the following Lagrangian function

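$$
\begin{aligned}
\mathcal{L}(\mathbf{p}) ={}& \frac{1}{M}\sum_{k=1}^{N}\sum_{m=m_0}^{m_0+M-1}\Bigl[\beta_{k,m}\log_2\bigl(1 + p_{k,m}\,a_n(k)\bigr) + \gamma_{k,m}\log_2\bigl(1 + p_{k,m}\,a_I(k,m)\bigr)\Bigr] \\
&- \lambda\left(\frac{1}{M}\sum_{k=1}^{N}\sum_{m=m_0}^{m_0+M-1} p_{k,m} - P_T\right) + \sum_{k=1}^{N}\sum_{m=m_0}^{m_0+M-1}\mu_{k,m}\,p_{k,m} - \sum_{k=1}^{N}\sum_{m=m_0}^{m_0+M-1}\nu_{k,m}\bigl(p_{k,m} - p^{\max}(k)\bigr)
\end{aligned}
$$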
(46)

where $\lambda$, $\mu_{k,m}$, $\nu_{k,m}$ are the Lagrange multipliers. The KKT conditions can be written as

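$$
\begin{aligned}
&\nabla\mathcal{L} = 0 \\
&0 \le \mu_{k,m} \perp p_{k,m} \ge 0, \qquad \forall k, m \\
&0 \le \nu_{k,m} \perp \bigl(p_{k,m} - p^{\max}(k)\bigr) \le 0, \qquad \forall k, m \\
&0 \le \lambda \perp \left(\frac{1}{M}\sum_{k=1}^{N}\sum_{m=m_0}^{m_0+M-1} p_{k,m} - P_T\right) \le 0.
\end{aligned}
$$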
(47)

Observe that if $p_{k,m} < p^{\max}(k)$ then $\nu_{k,m} = 0$ and this system can be reduced to

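$$
\begin{aligned}
&\frac{1}{M}\left[\lambda - \frac{a_n(k)\beta_{k,m} + a_I(k,m)\bigl(\gamma_{k,m} + a_n(k)p_{k,m}\bigr)}{\bigl(1 + a_n(k)p_{k,m}\bigr)\bigl(1 + a_I(k,m)p_{k,m}\bigr)}\right] p_{k,m} = 0 \\
&\lambda - \frac{a_n(k)\beta_{k,m} + a_I(k,m)\bigl(\gamma_{k,m} + a_n(k)p_{k,m}\bigr)}{\bigl(1 + a_n(k)p_{k,m}\bigr)\bigl(1 + a_I(k,m)p_{k,m}\bigr)} \ge 0 \\
&0 \le \lambda \perp \left(\frac{1}{M}\sum_{k=1}^{N}\sum_{m=m_0}^{m_0+M-1} p_{k,m} - P_T\right) \le 0 \\
&p_{k,m} \ge 0.
\end{aligned}
$$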
(48)

By exploiting the following inequality whose validity can easily be proved

$$\frac{a_n(k)\beta_{k,m} + a_I(k,m)\bigl(\gamma_{k,m} + a_n(k)p_{k,m}\bigr)}{\bigl(1 + a_n(k)p_{k,m}\bigr)\bigl(1 + a_I(k,m)p_{k,m}\bigr)} \le \left.\frac{a_n(k)\beta_{k,m} + a_I(k,m)\bigl(\gamma_{k,m} + a_n(k)p_{k,m}\bigr)}{\bigl(1 + a_n(k)p_{k,m}\bigr)\bigl(1 + a_I(k,m)p_{k,m}\bigr)}\right|_{p_{k,m}=0} = a_n(k)\beta_{k,m} + a_I(k,m)\gamma_{k,m},$$
(49)

we can deduce that if $\lambda < a_n(k)\beta_{k,m} + a_I(k,m)\gamma_{k,m}$, the second inequality in (48) can hold only if $p_{k,m} > 0$, so that from the first equation in (48) it results

$$\lambda = \frac{a_n(k)\beta_{k,m} + a_I(k,m)\bigl(\gamma_{k,m} + a_n(k)p_{k,m}\bigr)}{\bigl(1 + a_n(k)p_{k,m}\bigr)\bigl(1 + a_I(k,m)p_{k,m}\bigr)}.$$
(50)

On the other hand, if $\lambda \ge a_n(k)\beta_{k,m} + a_I(k,m)\gamma_{k,m}$, then $p_{k,m} > 0$ is never verified, since it would imply

$$\lambda \ge a_n(k)\beta_{k,m} + a_I(k,m)\gamma_{k,m} > \frac{a_n(k)\beta_{k,m} + a_I(k,m)\bigl(\gamma_{k,m} + a_n(k)p_{k,m}\bigr)}{\bigl(1 + a_n(k)p_{k,m}\bigr)\bigl(1 + a_I(k,m)p_{k,m}\bigr)}$$
(51)

which violates the complementarity conditions. As a consequence, for $\lambda \ge a_n(k)\beta_{k,m} + a_I(k,m)\gamma_{k,m}$ we have $p_{k,m} = 0$. Let us now solve Equation (50), i.e.,

$$\tilde{a}_{k,m}\,p_{k,m}^2 + \tilde{b}_{k,m}\,p_{k,m} + \tilde{c}_{k,m} = 0$$
(52)

where $\tilde{a}_{k,m} = \lambda\,a_n(k)\,a_I(k,m)$, $\tilde{b}_{k,m} = \lambda\bigl[a_n(k) + a_I(k,m)\bigr] - a_n(k)\,a_I(k,m)$ and $\tilde{c}_{k,m} = \lambda - a_n(k)\beta_{k,m} - a_I(k,m)\gamma_{k,m}$. Its solutions are

$$p_{k,m} = \frac{-\tilde{b}_{k,m} \pm \sqrt{\tilde{b}_{k,m}^2 - 4\,\tilde{a}_{k,m}\,\tilde{c}_{k,m}}}{2\,\tilde{a}_{k,m}}.$$
(53)

Let us now make some useful observations:

  1. The solutions in (53) are always real. In fact, the term under the square root is non-negative for all $\lambda \ge 0$, since it results

    $$\tilde{b}_{k,m}^2 - 4\,\tilde{a}_{k,m}\,\tilde{c}_{k,m} = \lambda^2\bigl(a_n(k) - a_I(k,m)\bigr)^2 + 2\lambda\,a_n(k)\,a_I(k,m)\bigl(a_n(k) - a_I(k,m)\bigr)\bigl(1 - 2\gamma_{k,m}\bigr) + a_n(k)^2\,a_I(k,m)^2$$
    (54)

    where $a_n(k) - a_I(k,m) > 0$ and the minimum, achieved for $\gamma_{k,m} = 1$, is given by

    $$\Bigl[\lambda\bigl(a_n(k) - a_I(k,m)\bigr) - a_n(k)\,a_I(k,m)\Bigr]^2 \ge 0, \qquad \forall \lambda \ge 0;$$
    (55)
  2. The solution

    $$p_{k,m}^{a} = \frac{-\tilde{b}_{k,m} - \sqrt{\tilde{b}_{k,m}^2 - 4\,\tilde{a}_{k,m}\,\tilde{c}_{k,m}}}{2\,\tilde{a}_{k,m}}$$
    (56)

    is always negative since the inequality

    $$\sqrt{\tilde{b}_{k,m}^2 - 4\lambda\,a_n(k)\,a_I(k,m)\bigl(\lambda - a_n(k)\beta_{k,m} - a_I(k,m)\gamma_{k,m}\bigr)} > -\tilde{b}_{k,m}$$
    (57)

    is verified for all λ > 0;

  3. The sign of the solution

    $$p_{k,m}^{b} = \frac{-\tilde{b}_{k,m} + \sqrt{\tilde{b}_{k,m}^2 - 4\,\tilde{a}_{k,m}\,\tilde{c}_{k,m}}}{2\,\tilde{a}_{k,m}}$$
    (58)

    can be studied by considering the following inequality

    $$\sqrt{\tilde{b}_{k,m}^2 - 4\lambda\,a_n(k)\,a_I(k,m)\bigl(\lambda - a_n(k)\beta_{k,m} - a_I(k,m)\gamma_{k,m}\bigr)} > \tilde{b}_{k,m}.$$
    (59)

    In particular, it results

    $$
    p_{k,m}^{b}
    \begin{cases}
    > 0 & \text{for } \lambda < a_n(k)\beta_{k,m} + a_I(k,m)\gamma_{k,m} \\
    \le 0 & \text{for } \lambda \ge a_n(k)\beta_{k,m} + a_I(k,m)\gamma_{k,m}.
    \end{cases}
    $$
    (60)

According to the above considerations, the solution for $0 < p_{k,m} < p^{\max}(k)$ and $\lambda < a_n(k)\beta_{k,m} + a_I(k,m)\gamma_{k,m}$ is $p_{k,m} = p_{k,m}^{b}$, hence we can write

$$
p_{k,m} =
\begin{cases}
0 & \text{if } \lambda \ge a_n(k)\beta_{k,m} + a_I(k,m)\gamma_{k,m} \\
p_{k,m}^{b} & \text{if } \lambda < a_n(k)\beta_{k,m} + a_I(k,m)\gamma_{k,m}
\end{cases}
$$
(61)

so that the optimal solution can be written as $p_{k,m} = \bigl[p_{k,m}^{b}\bigr]_0^{p^{\max}(k)}$, with $\sum_{k=1}^{N}\sum_{m=m_0}^{m_0+M-1} \bigl[p_{k,m}^{b}\bigr]_0^{p^{\max}(k)} = P_T M$.
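The closed form above still requires finding the multiplier $\lambda$ that meets the power budget. One simple possibility, sketched below in Python, is a bisection on $\lambda$, exploiting the fact that each $p_{k,m}^{b}$ decreases as $\lambda$ grows. The bisection itself, the function names and the argument layout (arrays broadcastable to shape N x M, with strictly positive `a_i`) are assumptions of this sketch and are not taken from the article.

```python
import numpy as np


def power_for_multiplier(lam, a_n, a_i, beta, gamma, p_max):
    """Clipped root p^b of the quadratic (52) for a given multiplier lam > 0."""
    a = lam * a_n * a_i
    b = lam * (a_n + a_i) - a_n * a_i
    c = lam - a_n * beta - a_i * gamma
    disc = np.maximum(b * b - 4.0 * a * c, 0.0)  # guard against rounding
    p_b = (-b + np.sqrt(disc)) / (2.0 * a)
    return np.clip(p_b, 0.0, p_max)


def max_rate_allocation(a_n, a_i, beta, gamma, p_max, total_power, iters=100):
    """Bisection on lam so that the allocated power sums to total_power = P_T * M."""
    a_n, a_i, beta, gamma, p_max = np.broadcast_arrays(a_n, a_i, beta, gamma, p_max)
    if p_max.sum() <= total_power:
        return np.array(p_max, dtype=float)  # budget is not binding
    hi = np.max(a_n * beta + a_i * gamma)    # at this lam every power is zero
    lo = hi * 1e-12                          # tiny lam: powers saturate at p_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        used = power_for_multiplier(mid, a_n, a_i, beta, gamma, p_max).sum()
        if used > total_power:
            lo = mid   # too much power: increase the multiplier
        else:
            hi = mid   # within budget: try a smaller multiplier
    return power_for_multiplier(hi, a_n, a_i, beta, gamma, p_max)
```

With a single subchannel and slot, `a_n = 2`, `a_i = 1`, `beta = gamma = 0.5` and a budget of 1, the sketch returns a power of 1 (the whole budget), which corresponds to $\lambda = 3.5/6$ in (50).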

Appendix 3

Convergence Analysis of MADP Algorithm

Proceeding as in[19], in order to prove the convergence of the algorithm it is sufficient to show that

  (a) With a proper choice of the step $\alpha_q(n)$, MADP converges to a fixed point;

  (b) This point is a solution of the KKT conditions of the modified game in (19) and then it is also a solution point of the optimization problem

    $$\max_{\mathbf{p}}\ \bar{R}(\mathbf{p}) \quad \text{s.t.}\quad \mathbf{p}\in\mathcal{P}$$
    (62)

where the FAPs' sum rate is $\bar{R}(\mathbf{p}) = \sum_{q=1}^{Q} \bar{R}_q(\mathbf{p})$ and $\mathcal{P}$ is the Cartesian product of the sets $\mathcal{P}_q$.

Let us denote with $U_1(n) = \bar{R}(\mathbf{p}(n))$ the sum utility reached at step $n$ of the MADP algorithm. Then for each user $q$ we must prove that there exists a sequence $\alpha_q(n) > 0$ such that $U_1(n)$ is monotonically increasing and convergent, i.e., $U_1(n+1) \ge U_1(n)$ $\forall n$ and $U_1(n) \to U_1^{\star}$ as $n \to \infty$. As discussed in [19], we only need to show that $U_1(n)$ is monotonically increasing, i.e., it suffices to consider a given iteration $n$ in which user $q$ is selected to update its power profile, and show that $U_1(\mathbf{p}_q(n+1)) \ge U_1(\mathbf{p}_q(n))$, where the total utility $U_1$ is now regarded as a function of $\mathbf{p}_q$ because only the power profile of user $q$ is updated. To do this we will use the descent lemma to bound $U_1(\mathbf{p}_q(n+1))$. The descent lemma [19] says that if a function $F: \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and its gradient is Lipschitz continuous with Lipschitz constant $K$, then, $\forall \mathbf{x}, \mathbf{y} \in \mathbb{R}^n$,

$$F(\mathbf{x}+\mathbf{y}) \le F(\mathbf{x}) + \mathbf{y}^T\nabla F(\mathbf{x}) + \frac{K}{2}\,\|\mathbf{y}\|_2^2.$$
(63)

One sufficient condition for Lipschitz continuity is that the $\ell_2$-norm of the Hessian matrix of $F(\mathbf{x})$ is bounded, in which case this bound can be used as the Lipschitz constant. It can easily be shown that this is true for $U_1(\mathbf{p}_q)$. Specifically, there exists a constant $B_{U_1}^q$ which upper bounds the $\ell_2$-norm of the Hessian matrix of $U_1(\mathbf{p}_q)$ independently of the others' power profiles.

Applying the descent lemma to $-U_1(\mathbf{p}_q)$, we get

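$$U_1\bigl(\mathbf{p}_q(n+1)\bigr) \ge U_1\bigl(\mathbf{p}_q(n)\bigr) + \bigl(\mathbf{p}_q(n+1) - \mathbf{p}_q(n)\bigr)^T \nabla_{\mathbf{p}_q} U_1\bigl(\mathbf{p}_q(n)\bigr) - \frac{B_{U_1}^q}{2}\,\bigl\|\mathbf{p}_q(n+1) - \mathbf{p}_q(n)\bigr\|_2^2.$$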
(64)

Hence, to prove that $U_1(\mathbf{p}_q(n+1)) \ge U_1(\mathbf{p}_q(n))$, it suffices to show that

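$$\bigl(\mathbf{p}_q(n+1) - \mathbf{p}_q(n)\bigr)^T \nabla_{\mathbf{p}_q} U_1\bigl(\mathbf{p}_q(n)\bigr) \ge \frac{B_{U_1}^q}{2}\,\bigl\|\mathbf{p}_q(n+1) - \mathbf{p}_q(n)\bigr\|_2^2.$$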
(65)

Using the power updating rule

$$p_h^q(n+1) = p_h^q(n) + \alpha_q(n)\bigl(\hat{p}_h^q - p_h^q(n)\bigr)$$
(66)

with the best response of user q defined in (25), the inequality in (65) can be written as

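$$\bigl(\hat{\mathbf{p}}_q - \mathbf{p}_q(n)\bigr)^T \nabla_{\mathbf{p}_q} U_1\bigl(\mathbf{p}_q(n)\bigr) \ge \alpha_q(n)\,\frac{B_{U_1}^q}{2}\,\bigl\|\hat{\mathbf{p}}_q - \mathbf{p}_q(n)\bigr\|_2^2.$$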
(67)
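The step from (65) to (67) can be made explicit. In the short derivation below, $\hat{\mathbf{p}}_q$ denotes the best-response vector of user $q$ from (25); this symbol is a notational assumption made here, since the update rule only needs some name for the best response.

```latex
\begin{align*}
\mathbf{p}_q(n+1) - \mathbf{p}_q(n)
  &= \alpha_q(n)\,\bigl(\hat{\mathbf{p}}_q - \mathbf{p}_q(n)\bigr)
  && \text{by (66)} \\
\alpha_q(n)\,\bigl(\hat{\mathbf{p}}_q - \mathbf{p}_q(n)\bigr)^T
  \nabla_{\mathbf{p}_q} U_1\bigl(\mathbf{p}_q(n)\bigr)
  &\ge \frac{B_{U_1}^q}{2}\,\alpha_q(n)^2\,
  \bigl\|\hat{\mathbf{p}}_q - \mathbf{p}_q(n)\bigr\|_2^2
  && \text{substituting into (65)}
\end{align*}
```

Dividing both sides by $\alpha_q(n) > 0$ gives (67).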

Observe that

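$$\left.\frac{\partial U_1(\mathbf{p}_q)}{\partial p_h^q}\right|_{\mathbf{p}_q = \mathbf{p}_q(n)} = \beta_h^q\,\frac{\mathrm{SNIR}_{h\beta}^{q}}{1 + \mathrm{SNIR}_{h\beta}^{q}}\cdot\frac{1}{p_h^q(n)} + \gamma_h^q\,\frac{\mathrm{SNIR}_{h\gamma}^{q}}{1 + \mathrm{SNIR}_{h\gamma}^{q}}\cdot\frac{1}{p_h^q(n)} - \sum_{r\in\mathcal{N}_q}\Pi_h^r(n)\,|H_h^{qr}|^2,$$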
(68)

then exploiting the result in (25), we can write the left-hand side (LHS) of (67) as

LHS = q = 1 Q r N q Π h r ( n ) H h qr 2 2 p h q ( n ) p h q p h q n 2 + q = 1 Q c h q η h q ν q p h q p h q n p h q ( n ) r N q Π h r ( n ) H h qr 2 + ν q η h q .
(69)

Now from (69), with the same steps as in[19], to ensure that

$$\mathrm{LHS} \ge \alpha_q(n)\,\frac{B_{U_1}^q}{2}\,\bigl\|\hat{\mathbf{p}}_q - \mathbf{p}_q(n)\bigr\|_2^2,$$
(70)

we can choose the step α q (n) as

$$\alpha_q(n) \le \min\left\{\frac{2A_q^n}{B_{U_1}^q},\ 1\right\}$$
(71)

where the coefficient $A_q^n$ is defined as

A q n = min h r N q Π h r ( n ) H h qr 2 p h q n c h q .
(72)

Finally, in order to prove point (b), let $U_1^{\star}$ be a fixed point of the algorithm such that $U_1(n) = U_1^{\star}$ for some index $n$. Since this is a fixed point, it follows that $p_h^q(n) = \hat{p}_h^q$, $\forall h, q$. It can then be seen that, for all $q$, $p_h^q(n)$ must be an optimal solution to the problem (19), given the other users' current power profiles and interference price vectors. Hence, $\mathbf{p}(n)$ will also satisfy the KKT conditions of the problem (62).

Appendix 4

In order to find the optimal solutions of the convex problem $(\tilde{P}_1)$ by studying the Lagrange dual problem, some additional constraint qualification conditions must hold, beyond convexity, to ensure strong duality [23]. One simple constraint qualification is Slater's condition, i.e., we must verify that some strictly feasible point exists. We can prove that the set $\tilde{\mathcal{F}}_q(\mathbf{p}_{-q})$ of each user $q$, for fixed strategies of the others, is nonempty. For simplicity, in this proof we assume w.l.o.g. $m_0 = 1$. More specifically, the constraint $R_q(\mathbf{p}) > R_q^0$ can be written as

$$\sum_{k=1}^{|\mathcal{V}_N^q|}\sum_{m=1}^{|\mathcal{V}_M^q|}\Bigl[\beta_{k,m}\log\bigl(1 + p_{k,m}^q\,a_n^q(k,m)\bigr) + \gamma_{k,m}\log\bigl(1 + p_{k,m}^q\,a_I^q(k,m)\bigr)\Bigr] > R_q^0$$
(73)

where we have denoted with $\mathcal{V}_N^q \subseteq \{1,\ldots,N\}$ and $\mathcal{V}_M^q \subseteq \{1,\ldots,M\}$ the subsets, respectively, of subcarriers and time slots that player $q$ is using during the game. Since $a_n^q(k,m) > a_I^q(k,m)$, to verify (73) it is sufficient to prove that

$$\log\bigl(1 + p_{k,m}^q\,a_I^q(k,m)\bigr) > R_q^0, \qquad \forall k\in\mathcal{V}_N^q,\ m\in\mathcal{V}_M^q,\ q\in\Omega$$
(74)

and clearly there always exists a set of positive values $p_{k,m}^q,\ p_q^{\max}(k) \in \mathbb{R}_+$ such that

$$p_q^{\max}(k) > p_{k,m}^q > \bigl(e^{R_q^0} - 1\bigr)\frac{1}{a_I^q(k,m)}, \qquad \forall k\in\mathcal{V}_N^q,\ m\in\mathcal{V}_M^q,\ q\in\Omega.$$
(75)

Let us consider, for k = 1,…,N, m = 1,…,M, the KKT conditions of the optimization problem $(\tilde{P}_1)$:

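$$
\begin{aligned}
&1 - \lambda_q\left[\frac{\beta_{k,m}\,a_n^q(k,m)}{1 + p_{k,m}^q\,a_n^q(k,m)} + \frac{\gamma_{k,m}\,a_I^q(k,m)}{1 + p_{k,m}^q\,a_I^q(k,m)}\right] - \mu_{k,m}^q + \alpha_{k,m}^q = 0 \\
&0 \le \lambda_q \perp \bigl(\bar{R}_q(\mathbf{p}_q, \mathbf{p}_{-q}) - R_q^0\bigr) \ge 0 \\
&0 \le p_{k,m}^q \perp \mu_{k,m}^q \ge 0 \\
&0 \le \alpha_{k,m}^q \perp \bigl(p_q^{\max}(k) - p_{k,m}^q\bigr) \ge 0.
\end{aligned}
$$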
(76)

Observe that if $p_q^{\max}(k) - p_{k,m}^q > 0$, then $\alpha_{k,m}^q = 0$ so that, by eliminating in (76) the multiplier $\mu_{k,m}^q$, we obtain

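$$
\begin{aligned}
&0 \le \left\{1 - \lambda_q\left[\frac{\beta_{k,m}\,a_n^q(k,m)}{1 + p_{k,m}^q\,a_n^q(k,m)} + \frac{\gamma_{k,m}\,a_I^q(k,m)}{1 + p_{k,m}^q\,a_I^q(k,m)}\right]\right\} \perp p_{k,m}^q \ge 0 \\
&0 \le \lambda_q \perp \bigl(\bar{R}_q(\mathbf{p}_q, \mathbf{p}_{-q}) - R_q^0\bigr) \ge 0
\end{aligned}
$$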
(77)

where $\lambda_q > 0$, since otherwise complementarity yields $p_{k,m}^q = 0$, $\forall k = 1,\ldots,N$, $m = 1,\ldots,M$, and the rate constraint is violated. Then, the optimum power vector must satisfy the following equation

$$a_q(k,m)\,\bigl(p_{k,m}^q\bigr)^2 + b_q(k,m)\,p_{k,m}^q + c_q(k,m) = 0$$
(78)

having set

$$
\begin{aligned}
a_q(k,m) &= a_n^q(k,m)\,a_I^q(k,m) \\
b_q(k,m) &= a_n^q(k,m) + a_I^q(k,m) - \lambda_q\,a_n^q(k,m)\,a_I^q(k,m) \\
c_q(k,m) &= 1 - \lambda_q\bigl[a_n^q(k,m)\,\beta_{k,m} + a_I^q(k,m)\,\gamma_{k,m}\bigr].
\end{aligned}
$$
(79)

The solutions of (78) are

$$
p_{k,m}^q =
\begin{cases}
p_{k,m}^{a} = \dfrac{-b_q(k,m) - \sqrt{b_q(k,m)^2 - 4\,a_q(k,m)\,c_q(k,m)}}{2\,a_q(k,m)} \\[2ex]
p_{k,m}^{b} = \dfrac{-b_q(k,m) + \sqrt{b_q(k,m)^2 - 4\,a_q(k,m)\,c_q(k,m)}}{2\,a_q(k,m)}.
\end{cases}
$$
(80)

It can be proved that, $\forall \lambda_q > 0$, it results $p_{k,m}^{a} \le 0$, $b_q(k,m)^2 - 4\,a_q(k,m)\,c_q(k,m) \ge 0$, and

$$
p_{k,m}^{b}
\begin{cases}
> 0 & \text{for } \lambda_q > \dfrac{1}{a_n^q(k,m)\,\beta_{k,m} + a_I^q(k,m)\,\gamma_{k,m}} \\[2ex]
\le 0 & \text{for } \lambda_q \le \dfrac{1}{a_n^q(k,m)\,\beta_{k,m} + a_I^q(k,m)\,\gamma_{k,m}}.
\end{cases}
$$
(81)

According to the above considerations, the solution is $p_{k,m}^q = p_{k,m}^{b}$ for $0 < p_{k,m}^q < p_q^{\max}(k)$ and $\lambda_q > \dfrac{1}{a_n^q(k,m)\,\beta_{k,m} + a_I^q(k,m)\,\gamma_{k,m}}$, so that we can write the optimal power allocation vector as

$$p_{k,m}^q = \bigl[p_{k,m}^{b}\bigr]_0^{p_q^{\max}(k)}$$
(82)

where the multiplier $\lambda_q$ is chosen in order to satisfy the constraint $\bar{R}_q(\mathbf{p}_q, \mathbf{p}_{-q}) = R_q^0$.
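To make the last step concrete, the following Python sketch computes one player's min-power best response for fixed strategies of the others: it evaluates the clipped root of (78) for a trial multiplier and searches for $\lambda_q$ by bisection until the expected rate meets the target. The bisection, the function names, and the use of a plain (unnormalized) sum of natural logarithms as the expected rate, as in (73), are assumptions of this sketch; the coefficients `a_n` and `a_i` are taken as given arrays that already account for the other players' powers.

```python
import numpy as np


def min_power_response(lam, a_n, a_i, beta, gamma, p_max):
    """Clipped root p^b of the quadratic (78) for a given multiplier lam."""
    a = a_n * a_i
    b = a_n + a_i - lam * a_n * a_i
    c = 1.0 - lam * (a_n * beta + a_i * gamma)
    disc = np.maximum(b * b - 4.0 * a * c, 0.0)  # guard against rounding
    p_b = (-b + np.sqrt(disc)) / (2.0 * a)
    return np.clip(p_b, 0.0, p_max)


def expected_rate(p, a_n, a_i, beta, gamma):
    """Expected rate in nats: sum of beta*log(1+p*a_n) + gamma*log(1+p*a_i)."""
    return np.sum(beta * np.log1p(p * a_n) + gamma * np.log1p(p * a_i))


def min_power_allocation(a_n, a_i, beta, gamma, p_max, target_rate, iters=200):
    """Find the smallest multiplier whose allocation reaches target_rate."""
    if expected_rate(p_max, a_n, a_i, beta, gamma) < target_rate:
        raise ValueError("target rate not reachable within the power mask")

    def rate_at(lam):
        p = min_power_response(lam, a_n, a_i, beta, gamma, p_max)
        return expected_rate(p, a_n, a_i, beta, gamma)

    lo, hi = 0.0, 1.0
    while rate_at(hi) < target_rate:  # the rate is non-decreasing in lam
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rate_at(mid) < target_rate:
            lo = mid
        else:
            hi = mid
    return min_power_response(hi, a_n, a_i, beta, gamma, p_max)
```

With a single subchannel and slot, `a_n = 2`, `a_i = 1`, `beta = gamma = 0.5` and a target of `0.5 * log(6)` nats, the sketch returns a power of 1, since $0.5\log(1+2) + 0.5\log(1+1) = 0.5\log 6$.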

Appendix 5

Proof that the feasible set of the game $\tilde{\mathcal{G}}_1$ is compact and non-empty so that it can be cast as a GPG.

Let us start with the following definition of GPG given in [20]:

Definition 1

A Generalized Nash Equilibrium Problem is a GPG if

  (a) There exists a non-empty, closed set $\tilde{\mathcal{X}} \subseteq \mathbb{R}^n$ such that

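    $$\mathcal{X}_q(\mathbf{x}_{-q}) = \bigl\{\mathbf{x}_q \in D_q : (\mathbf{x}_q, \mathbf{x}_{-q}) \in \tilde{\mathcal{X}}\bigr\}, \qquad \forall q = 1,\ldots,Q$$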
    (83)

    where $D_q \subseteq \mathbb{R}^{n_q}$ are non-empty, closed sets such that $\bigl(\prod_{q=1}^{Q} D_q\bigr) \cap \tilde{\mathcal{X}} \neq \emptyset$;

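  (b) There exists a continuous function $\Phi(\mathbf{x}): \mathbb{R}^n \to \mathbb{R}$, named potential function, such that $\forall q\in\Omega$, $\forall \mathbf{x}_{-q}$ and for all $\mathbf{y}_q, \mathbf{z}_q \in \mathcal{X}_q(\mathbf{x}_{-q})$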

    $$u_q(\mathbf{y}_q, \mathbf{x}_{-q}) - u_q(\mathbf{z}_q, \mathbf{x}_{-q}) > 0$$
    (84)

    implies

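    $$\Phi(\mathbf{y}_q, \mathbf{x}_{-q}) - \Phi(\mathbf{z}_q, \mathbf{x}_{-q}) \ge u_q(\mathbf{y}_q, \mathbf{x}_{-q}) - u_q(\mathbf{z}_q, \mathbf{x}_{-q})$$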
    (85)

    where $u_q$ is the $q$th player payoff function.

According to Definition 1, we have to check the validity of conditions (a) and (b) for the game $\tilde{\mathcal{G}}_1$. As regards condition (a), let us consider the feasible set of the game $\tilde{\mathcal{G}}_1$, i.e.,

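$$\tilde{\mathcal{X}} = \Bigl\{\mathbf{p} \in \mathbb{R}^{NMQ\times 1} : \bar{R}_q(\mathbf{p}) \ge R_q^0,\; 0 \le p_{k,m}^q \le p_q^{\max}(k),\; \forall k\in\{1,\ldots,N\},\ m\in\{1,\ldots,M\},\ \forall q\in\Omega\Bigr\},$$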
(86)

where we have assumed w.l.o.g. m0 = 1. Then we have to prove the following lemma.

Lemma 1

The feasible set $\tilde{\mathcal{X}}$ of the game $\tilde{\mathcal{G}}_1$ is a non-empty, closed and bounded (hence compact) subset of $\mathbb{R}^{NMQ\times 1}$ if the matrices $\mathbf{A}_k$ defined in (91) are P-matrices, for all k = 1,…,N, m = 1,…,M. Sufficient conditions for this to happen are

$$\sum_{r\in\mathcal{N}_q}\frac{|H_k^{rq}|^2}{|H_k^{qq}|^2} < \frac{1}{e^{R_q^0} - 1}, \qquad \forall q\in\Omega,\ k = 1,\ldots,N.$$
(87)

Proof

Let us start by considering the constraints $\bar{R}_q(\mathbf{p}) \ge R_q^0$ that, by considering only the subcarriers and the time slots that are effectively occupied, can be written as

k = 1 V N q m = 1 V M q β k , m log 1 + p k , m q a n q ( k , m ) + γ