
# Optimal resource allocation in femtocell networks based on Markov modeling of interferers’ activity

- Stefania Sardellitti¹
- Alessandro Carfagna¹
- Sergio Barbarossa¹

**2012**:371

https://doi.org/10.1186/1687-1499-2012-371

© Sardellitti et al.; licensee Springer. 2012

**Received:** 28 November 2011 **Accepted:** 12 November 2012 **Published:** 27 December 2012

## Abstract

Femtocell networks offer a series of advantages with respect to conventional cellular networks. However, a potentially massive deployment of femto-access points (FAPs) poses a big challenge in terms of interference management, which requires proper radio resource allocation techniques. In this article, we propose alternative optimal power/bit allocation strategies over a time-frequency frame based on a statistical model of the interference activity. Given the lack of knowledge of the interference activity, we adopt a Bayesian approach that provides the optimal allocation, conditioned on periodic spectrum sensing and on the estimation of the statistical parameters of the interference activity. We first consider a single FAP accessing the radio channel in the presence of a dynamic interference environment. Then, we extend the formulation to a multi-FAP scenario, where nearby FAPs react to the strategies of the other FAPs, still within a dynamic interference scenario. The multi-user case is first approached using a strategic non-cooperative game formulation. Then, we propose a coordination game based on the introduction of a pricing mechanism that exploits the backhaul link to enable the exchange of parameters (prices) among FAPs.

## Introduction

Femtocells are becoming more and more attractive due to their benefits to both cellular operators and subscribers. On the one hand, operators see femtocells as a way to improve indoor coverage and to off-load wireless traffic from the macro cellular network to the wired network, thus releasing wireless channels to additional mobile users. On the other hand, subscribers see femtocells as a way to get higher quality services, either higher data throughput or better voice quality, thanks to a better indoor coverage, and seamless connectivity.

Following the current evolution of the cellular standardization process, in this study we assume an LTE framework and focus on the downlink channel, which uses an OFDMA access scheme. In this context, femtocell networks offer advantages with respect to Wi-Fi, as they avoid vertical handoff and offer better QoS.

In view of a potentially massive deployment of FAPs, special attention has to be devoted to radio resource management (RRM). In fact, unlike macro base stations (MBSs), FAPs are typically installed by the subscribers and maintained without global planning, with no special consideration of traffic demands or of interference with other cells, either femto or macro cells. Hence, a dense deployment of FAPs might induce intolerable interference from femto-user equipments (FUEs) to macro-user equipments (MUEs) or to other FUEs. Interference management is then arguably one of the major challenges to be faced in femtocell networks.

The goal of this study is to propose an algorithm for optimizing the power/bit allocation over a joint time–frequency domain, incorporating a statistical model of the macro-users’ activity. Since the future interference is unknown, the proposed algorithm follows a Bayesian approach, which allocates power/bits over successive time/frequency slots depending on a preliminary sensing and estimation of the parameters of the interference model. We assume a Markov model for simplicity, but the approach can be generalized to more sophisticated models, e.g., [3, 4]. More specifically, in this study the interference over different frequency subchannels is modeled as a set of statistically independent homogeneous discrete-time Markov chains (DTMCs). We first consider a single-user allocation, where a single FAP finds the optimal resource allocation according to two alternative strategies: (i) maximize the *expected* rate, conditioned on the result of the sensing and estimation phase, under a transmit power constraint; (ii) minimize the transmit power under an *expected* rate constraint.

Opportunistic spectrum access (OSA) in multicarrier networks where the channel occupancy follows a Markovian evolution has already been studied in the framework of cognitive radio (CR), for example in [5, 6]. Chen et al. [5] develop an optimal OSA scheme aimed at optimizing spectrum sensing and access policies *jointly*. They assume that the secondary transmitter receives error-free ACK signals from the secondary receiver whenever the transmission is successful, and this information is used to track the state of the primary channels. Interestingly, Chen et al. [5] establish a separation principle that decouples the design of the spectrum sensor from that of the access policy. A similar context is studied in [6, 7], where the authors combine learning and dynamic spectrum access. Both Chen et al. [5] and Unnikrishnan and Veeravalli [6] consider an objective function that depends only on the available cognitive bandwidth and impose a constraint on the probability of collision with the primary users. Anandkumar et al. [8] and Liu and Zhao [9] formulate the multi-user OSA problem as a decentralized multi-armed bandit problem [10]. In such a framework, each user *learns* the channel availability statistics and designs a channel access rule in order to maximize the transmission throughput (or, equivalently, minimize the system *regret*, defined as the loss in secondary throughput due to learning errors and collisions under distributed access). In [9], which extends the single-user policy proposed in [10] to the multi-user case, Liu and Zhao propose a family of distributed learning and access policies known as time-division fair share. For these policies, they prove that the system regret achieves the minimum growth rate, which is logarithmic in the number of time slots. Moreover, Liu and Zhao [9] distinguish the case of a known number of secondary users from the case in which this number is unknown but estimated at each user through feedback.

An alternative scheme for distributed resource allocation among CRs, incorporating aggregated interference control, is analyzed in [11, 12], where the authors propose a form of real-time multi-agent reinforcement learning, known as decentralized Q-learning [13], to manage the aggregated interference. The objective function to be minimized is an expected discounted cumulative cost related to the difference between the effective signal-to-interference-plus-noise ratio (SINR) and a target SINR, which has to be guaranteed to the primary system. This SINR is measured at some control points located on the protection contour of the primary network and is fed back to the secondary base stations, which adjust their transmit power accordingly. An interesting aspect of this approach is that it is model-free and does not require knowledge of the transition probabilities of the underlying Markov process. Finally, Geirhofer et al. [14] propose an interference-aware resource allocation for OFDMA systems, based on the sensing and prediction of the ad hoc users’ activity by the infrastructure users.

Unlike previous studies, in this article we propose a Bayesian radio access method enabling (possibly multiple) FAPs to allocate power/bits over a time–frequency grid based on the current belief about the interference level, as obtained from previous sensing. Since the interference cannot be known in advance, we use a Bayesian approach and formulate the utility function as the expected value of the utility conditioned on previous measurements. The goal is to relax the requirement on sensing time and to allocate resources over a certain number of future time slots, depending on the interference model and on our prediction capability.

The article is organized as follows. We consider first the radio access of a single FAP and we maximize the expected rate, averaged over the interference activity model, under a transmit power constraint. In this case, the solution can be found in closed form and it represents a sort of generalized water-filling algorithm, with water level depending on the interference activity probabilities. Then, we illustrate an alternative approach consisting in the minimization of the transmit power, subject to a constraint on the minimum average femto-user rate. Then, we generalize the proposed approaches to the multi-FAP scenario, where we analyze the interaction among FAPs using a game-theoretic approach. In particular, we consider first a purely competitive game, where each FAP adopts a purely selfish strategy. Since the competitive game might lead to inefficient Nash equilibria, we also propose a coordinated game where, thanks to the exchange of a few parameters through the backhaul link, the FAP’s coordinate their action to improve upon inefficient Nash equilibria and maximize the sum-rate or minimize the sum-power.

## Single-user Bayesian adaptive allocation

Femtocell networks are fully compliant with cellular standards. Given the current evolution of 3G systems, in this article we are concerned with an LTE system, and the goal is to allocate power over a time–frequency grid adaptively, as a function of the current occupancy. This implies that the channel and interference power must be sensed at the beginning of each frame. Given the low mobility of indoor users, the channel can be assumed to be nearly constant over the frame. However, the interference from macro-users may vary along the frame depending on the macro-user activity. A correct power allocation across time and frequency would require non-causal knowledge of the interference, which is of course unavailable. To overcome this limitation, we propose a time–frequency resource allocation based on a Markov model of the interference activity over each frequency subchannel. More specifically, we assume that the activity of the macro-users over the frequency subchannels is modeled as a set of statistically independent homogeneous DTMCs. The parameters of the statistical model are assumed constant within the frame, but they may vary over successive frames. Each FUE estimates the interference power and the transition probabilities of the interference activity over each subchannel and feeds this information back to the associated FAP, which computes the optimal power allocation over a time–frequency frame following a Bayesian approach.

In this section, we assume that the interferers do not react to the strategy of the FAP of interest. In the subsequent sections, we will extend the study to the case where the other FAPs react to the choice of nearby, interfering FAPs, thus generating an iterative process, whose stability properties will properly be studied.

### Markovian interference model

We use $S_{k,m}$ to indicate the macro activity over the $k$th subchannel at time $m$: $S_{k,m} = 1$ if the subchannel is busy, with interference power $\sigma_I^2(k,m)$, while $S_{k,m} = 0$ if the subchannel is idle. We consider different orders for the DTMC so that we can test the effect of the order on the performance of the proposed strategy. As an example, for orders $L = 1, 2, 3$, we introduce the following transition probabilities.

where $k$ is the subchannel index and $(j)$, $(i,j)$, $(r,i,j)$ are, respectively, the states for $L = 1, 2, 3$, i.e., all the binary sequences in $\{0,1\}^{L}$. The probability of being in state $i \in \{0,1\}$ at time $m$ over the $k$th subchannel is denoted by $\Pi_i^{(k,m)} = \Pr(S_{k,m} = i)$. In the case of a first-order DTMC, starting from an initial time slot $m_0 = 1$, the probability $\Pi_i^{(k,m)}$ can be obtained recursively as

$m_0$. Equation (4) can be written in compact matrix form as

where $\boldsymbol{\pi}^{(k,m)} = [\Pi_0^{(k,m)}, \Pi_1^{(k,m)}]^T$ and the entries $p_{jl}^{(k)}$ of the transition matrix $\mathbf{P}_1^{(k)}$ are given in (4). Let $\beta_{k,m} = \Pr(S_{k,m} = 0)$ and $\gamma_{k,m} = \Pr(S_{k,m} = 1)$ denote the probabilities that channel $k$ at time $m$ is idle and busy, respectively. These probabilities can then be computed iteratively at time $m$ from Equation (4) as $\beta_{k,m} = \Pi_0^{(k,m)}$ and $\gamma_{k,m} = \Pi_1^{(k,m)}$. The generalization to higher orders is straightforward, and the formulas are reported in Appendix 1 for convenience. Appendix 1 also reports the formulas used to estimate the transition probabilities from the observations.
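To make the recursion concrete, the following Python sketch estimates the transition probabilities of an order-$L$ binary chain from observed channel states and propagates the occupancy probabilities over the next $M$ slots. It is a minimal sketch under our own assumptions, not the article's Appendix 1 formulas: the last $L$ states are encoded as an integer whose bits are the channel states (oldest first), the transition probabilities are estimated with the standard count-based maximum-likelihood estimator for a homogeneous chain, contexts never observed default to 0.5, and all function names are hypothetical. For $L = 1$ the propagation reduces to the first-order recursion above; the returned values are the busy probabilities $\gamma_{k,m}$, with $\beta_{k,m} = 1 - \gamma_{k,m}$.

```python
import numpy as np

def context_index(states):
    """Encode a sequence of binary channel states (oldest first) as an integer."""
    return int("".join(str(int(s)) for s in states), 2)

def estimate_transitions(obs, L):
    """Count-based ML estimate of Pr(S_m = 1 | previous L states) for one subchannel.

    Indexed by context_index(previous L states). Contexts never seen in obs
    default to 0.5 (an assumption, not part of the ML estimator).
    """
    ones = np.zeros(2 ** L)
    visits = np.zeros(2 ** L)
    for m in range(L, len(obs)):
        ctx = context_index(obs[m - L:m])
        visits[ctx] += 1
        ones[ctx] += obs[m]
    return np.where(visits > 0, ones / np.maximum(visits, 1), 0.5)

def predict_busy(p1, last_states, M):
    """Busy probabilities gamma for the next M slots, given the last L observed states.

    Propagates the distribution of the 2**L-dimensional joint state
    (S_{m-L+1}, ..., S_m): the oldest state is dropped and the newest appended.
    """
    L = len(last_states)
    mask = 2 ** L - 1
    dist = np.zeros(2 ** L)
    dist[context_index(last_states)] = 1.0
    gammas = []
    for _ in range(M):
        new = np.zeros(2 ** L)
        for ctx in range(2 ** L):
            shifted = (ctx << 1) & mask                   # drop the oldest state
            new[shifted | 1] += dist[ctx] * p1[ctx]       # next slot busy
            new[shifted] += dist[ctx] * (1.0 - p1[ctx])   # next slot idle
        dist = new
        gammas.append(dist[1::2].sum())                   # newest state (lowest bit) busy
    return np.array(gammas)

# Usage: an alternating idle/busy pattern is learned exactly with L = 1.
obs = [0, 1, 0, 1, 0, 1, 0, 1]
p1 = estimate_transitions(obs, L=1)       # array([1., 0.])
gamma = predict_busy(p1, obs[-1:], M=3)   # array([0., 1., 0.])
```

Encoding the joint state of the last $L$ slots keeps the same code valid for $L = 1, 2, 3$, at the cost of a $2^L$-dimensional state vector, which is why higher orders need more samples to estimate.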

### Maximum expected rate optimization

Having introduced the interference model, our goal now is to find the bit/power allocation over an OFDM frame composed of $N$ subcarriers and $M$ consecutive time slots that maximizes the expected rate, taking into account the macro-users’ activity. The proposed approach rests on the following assumptions: (1) the channels are affected by multipath, with time-invariant coefficients within each frame, which are assumed to be known at the transmitter side; (2) the activity of the interferers over each subchannel is modeled as a homogeneous DTMC of order $L$, whose transition probabilities are estimated using the ML estimators discussed in Appendix 1; (3) the activities of the interferers over different subchannels are statistically independent of each other; (4) the power allocation of the interferers is independent of the power allocation of the FAP of interest.

The last assumption distinguishes this situation from the case where the interferers themselves sense the channel (interference) and adapt their strategy accordingly. In this second case, each adaptive transmitter reacts to the strategies of the others, thus inducing an iterative process that must be properly studied. The first scenario, which is the subject of this section, is appropriate, for example, when the interferer is an MBS. The second scenario is more appropriate to model the situation where a few nearby FAPs attempt to access the radio channel at the same time. This scenario will be analyzed in the next section by resorting to game-theoretic tools.

In the case where there is only one adaptive device, the FUE is assumed to measure the interference power from the macro network over each subchannel, over a number of time slots that depends on the order of the Markov chain as well as on the required estimation accuracy.

Denoting with $m_0$ the time slot where the interference power profile is measured, our goal is to find the optimal power allocation over a set of $M$ successive time slots $m = m_0, m_0+1, \ldots, m_0+M-1$. Since the interference in the slots following the current one is not known, we follow a Bayesian approach. More specifically, our first optimization criterion is the maximization of the expected rate, conditioned on the current estimate of the interference power profile and of the Markov chain parameters over each frequency subchannel. In formulas, our objective function is

where $H_k$ is the FAP channel transfer coefficient over the $k$th subchannel. The average rate is then

Since the transition probabilities are not known *a priori*, they are estimated from the observations, using Equations (42), (44) or (45), depending on the most appropriate Markov order $L$. Knowing the transition probabilities, the occupancy probabilities $\beta_{k,m}$ and $\gamma_{k,m}$ at any time $m$, conditioned on the observation of the channel state over the first $L$ time slots, can easily be derived using Equations (4), (39) and (40). Then, denoting with $\mathbf{p}$ the $NM$-dimensional (time–frequency) power allocation vector, the max-rate optimization problem is formulated as follows

where $p^{\max}(k)$, $k = 1,\ldots,N$, represents a mask constraint useful to limit the transmit power over some prescribed channels, for example, the channels occupied by the MBSs. This is a convex problem, as $\bar{r}(\mathbf{p})$ is a concave function of $\mathbf{p}$ and the constraint set is convex. The optimal power vector $\mathbf{p}^{\ast}$ can be expressed in closed form by imposing the KKT conditions (see Appendix 2 for further details). The optimal power over the $k$th frequency subchannel at time $m$ is

where $[x]_a^b = a$ if $x \le a$, $[x]_a^b = b$ if $x \ge b$, and $[x]_a^b = x$ if $a < x < b$. The coefficients $\tilde{b}_{k,m}$, $\tilde{c}_{k,m}$, $\tilde{d}_{k,m}$ are related to $a_n(k)$ and $a_I(k,m)$ as follows

where *λ* is the Lagrange multiplier. Since the optimal powers${p}_{k,m}^{\ast}$ are functions of *λ*, we can find this multiplier numerically as the solution of the power constraint$\sum _{m={m}_{0}}^{{m}_{0}+M-1}\sum _{k=1}^{N}{p}_{k,m}^{\ast}=M{P}_{T}$. Expression (10) is a generalization of the well-known water-filling solution. Indeed, it can be shown that, by taking the limit for the transition probabilities going to 1 or 0, i.e., by turning the Markov chain into the degenerate case of a deterministic signal, Equation (10) converges to the water-filling solution.
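The closed form above lends itself to a compact numerical sketch. The Python code below is not the article's Equation (10), whose coefficients are not reproduced here; it assumes that the expected rate of slot $(k,m)$, in nats, is $\beta_{k,m}\ln(1+a_n(k)p_{k,m}) + \gamma_{k,m}\ln(1+a_I(k,m)p_{k,m})$, with $a_n(k)$ and $a_I(k,m)$ taken as the channel-to-noise and channel-to-(noise plus macro interference) ratios, and it solves the stationarity condition $\beta a_n/(1+a_n p) + \gamma a_I/(1+a_I p) = \lambda$ directly. Using $\beta + \gamma = 1$, this condition becomes the quadratic $\lambda a_n a_I p^2 + (\lambda(a_n+a_I) - a_n a_I)p + \lambda - \beta a_n - \gamma a_I = 0$, whose positive root is clipped to $[0, p^{\max}]$; $\lambda$ is then found by bisection on the total-power constraint. Function names and numbers are hypothetical.

```python
import numpy as np

def slot_power(lam, beta, a_n, a_i, p_max):
    """Maximize beta*ln(1 + a_n p) + (1 - beta)*ln(1 + a_i p) - lam*p over [0, p_max].

    Zeroing the derivative gives A p^2 + B p + C = 0; when C < 0 (marginal
    expected rate at p = 0 exceeds lam) there is exactly one positive root.
    """
    A = lam * a_n * a_i
    B = lam * (a_n + a_i) - a_n * a_i
    C = lam - beta * a_n - (1.0 - beta) * a_i
    root = (-B + np.sqrt(np.maximum(B * B - 4.0 * A * C, 0.0))) / (2.0 * A)
    return np.clip(np.where(C < 0.0, root, 0.0), 0.0, p_max)

def bayesian_waterfilling(beta, a_n, a_i, p_max, budget, iters=100):
    """Bisection on lam so that the total power over the N x M grid equals budget."""
    p_max = np.broadcast_to(np.asarray(p_max, dtype=float), beta.shape)
    if p_max.sum() <= budget:                 # power constraint inactive: use the mask
        return p_max.copy()
    lo, hi = 0.0, float(np.max(beta * a_n + (1.0 - beta) * a_i))  # hi switches all slots off
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if slot_power(lam, beta, a_n, a_i, p_max).sum() > budget:
            lo = lam                          # too much power: raise lam
        else:
            hi = lam
    return slot_power(hi, beta, a_n, a_i, p_max)

# Usage on a random N x M grid (all numbers are illustrative).
rng = np.random.default_rng(0)
N, M, P_T = 12, 3, 2.0
beta = rng.uniform(0.2, 0.9, (N, M))                          # Pr(idle), e.g. from the Markov predictor
a_n = np.repeat(rng.uniform(1.0, 10.0, (N, 1)), M, axis=1)    # channel constant over the frame
a_i = a_n / rng.uniform(2.0, 20.0, (N, M))                    # lower ratio when the macro user is on
p = bayesian_waterfilling(beta, a_n, a_i, p_max=1.0, budget=M * P_T)
```

Setting $\beta_{k,m} = 1$ (an always-idle chain) makes the positive root equal to $1/\lambda - 1/a_n$, i.e., classic water-filling with a common water level, in line with the degenerate limit mentioned in the text.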

### Min-power optimization strategy

Since one of the most critical issues in femtocells is interference management, an alternative optimization procedure consists in minimizing the FAP transmit power under the constraint of guaranteeing the required rate over the link between the FAP and the associated FUE. This strategy was proposed, for example, in [15], assuming static interference. Here, we generalize that approach to the case where the interference is dynamic and its activity evolves as a Markov chain, as described in the previous section. The objective now is to minimize the average transmit power across the *N* subchannels and over *M* consecutive time slots. Denoting with *m*_{0} the index of the time slot where the interference power profile is measured, the goal is to allocate power over a set of consecutive slots starting from *m*_{0}, i.e., for *m* = *m*_{0},…,*m*_{0} + *M*−1, under the constraint that the expected rate, conditioned on the observation over the initial *L* time slots, i.e., for *m* = *m*_{0}−*L* + 1,…,*m*_{0}, is not smaller than a given value *R*_{0}.

where the expected rate is computed as in (8).

where the Lagrange multiplier *λ* is found numerically in order to satisfy the rate constraint$\stackrel{\u0304}{R}\left({\mathit{p}}^{\ast}\right)={R}_{0}$.

One important difference between the min-power and the max-rate problems is that, in the min-power case, the feasible set could be empty. If this happens, the rate requirement is too high to be accommodated. Hence, either the rate requirement is lowered until the feasible set becomes non-empty, or the user is not admitted. This decision is taken by the call admission control.
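The min-power problem has the same per-slot structure as the max-rate one: the Lagrangian $\sum p - \mu(\bar{r} - R_0)$ leads to the same stationarity condition with $\lambda = 1/\mu$. The Python sketch below therefore reuses the per-slot closed form of the max-rate sketch and bisects on $\lambda$ until the expected rate meets $R_0$. It also mirrors the admission check described above: if even the mask powers $p^{\max}$ cannot reach $R_0$, the feasible set is empty and the function returns `None`. As before, the rate model (in nats), the function names and the numbers are our assumptions, not the article's expressions.

```python
import numpy as np

def slot_power(lam, beta, a_n, a_i, p_max):
    """Maximize beta*ln(1 + a_n p) + (1 - beta)*ln(1 + a_i p) - lam*p over [0, p_max]."""
    A = lam * a_n * a_i
    B = lam * (a_n + a_i) - a_n * a_i
    C = lam - beta * a_n - (1.0 - beta) * a_i
    root = (-B + np.sqrt(np.maximum(B * B - 4.0 * A * C, 0.0))) / (2.0 * A)
    return np.clip(np.where(C < 0.0, root, 0.0), 0.0, p_max)

def expected_rate(p, beta, a_n, a_i):
    """Expected rate (nats) summed over the N x M grid, under the assumed rate model."""
    return float(np.sum(beta * np.log1p(a_n * p) + (1.0 - beta) * np.log1p(a_i * p)))

def min_power_allocation(beta, a_n, a_i, p_max, rate_target, iters=100):
    """Least-power allocation whose expected rate reaches rate_target; None if infeasible."""
    p_max = np.broadcast_to(np.asarray(p_max, dtype=float), beta.shape)
    if expected_rate(p_max, beta, a_n, a_i) < rate_target:
        return None                           # empty feasible set: lower R_0 or reject the user
    lo, hi = 0.0, float(np.max(beta * a_n + (1.0 - beta) * a_i))
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if expected_rate(slot_power(lam, beta, a_n, a_i, p_max), beta, a_n, a_i) >= rate_target:
            lo = lam                          # target still met: try a higher lam (less power)
        else:
            hi = lam
    return slot_power(lo, beta, a_n, a_i, p_max) if lo > 0.0 else p_max.copy()

rng = np.random.default_rng(1)
N, M = 12, 3
beta = rng.uniform(0.2, 0.9, (N, M))
a_n = np.repeat(rng.uniform(1.0, 10.0, (N, 1)), M, axis=1)
a_i = a_n / rng.uniform(2.0, 20.0, (N, M))
r_max = expected_rate(np.ones((N, M)), beta, a_n, a_i)   # best rate under the mask p_max = 1
R_0 = 0.5 * r_max
p = min_power_allocation(beta, a_n, a_i, p_max=1.0, rate_target=R_0)
```

Returning the allocation at the feasible end of the bisection interval guarantees that the rate constraint is met, while the power is only slightly above the minimum.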

## Multi-FAP case: maximum expected rate game

In a scenario containing multiple nearby FAP’s implementing the radio access according to the adaptive strategy described in the previous section, each FAP may react to the power allocation of nearby FAP’s, by changing its own power allocation and so on. This interaction induces an iterative mechanism whose convergence properties have to be carefully studied. The problem can be studied using the theoretical tools of game theory, which is well suited for this kind of multi-objective decision problem. In particular, given the existence of a wired backhaul connecting the FAP’s, we will consider a purely competitive game, where each FAP (player) seeks to optimize its own utility function, irrespective of the other FAP’s performance, and a coordination game, where nearby FAP’s exchange some parameters to improve performance with respect to the purely competitive case.

Denoting with $S_{k,m}$ the macro-interference activity over channel $k$ at time $m$, and with $\Pr(S_{k,m} = 0) = \beta_{k,m}$ and $\Pr(S_{k,m} = 1) = \gamma_{k,m}$ the probabilities that channel $k$ is idle or busy at time $m$, the expected rate of the $q$th FAP is^{a}

for $q = 1,\ldots,Q$, and

where $\mathcal{N}_q$ is the set of neighbors of the $q$th FAP and $H_k^{rq}$ is the channel transfer function of the $k$th subchannel between the $r$th transmitter and the $q$th receiver. The probabilities $\beta_{k,m}$ and $\gamma_{k,m}$ evolve in time according to a Markov chain of order $L$. We assume, as in the previous section, that the allocation over $M$ consecutive time slots is carried out on the basis of the observation of a number of initial time slots equal to the order $L$ of the Markov chain.

It is worth noticing that the main difference between the expected rate in (14) and that in (8) is that the interference now has two contributions: the dynamic interference of the macro-users, whose activities evolve as Markov chains but whose power profile, when on, is fixed; and the interference from the other FAPs, which are always on, but whose power profiles, described by the vectors ${p}_{k,m}^{r}$, evolve in response to the choices of the other FAPs.

Denoting with $\Omega = \{1,\ldots,Q\}$ the set of $Q$ players, with $P_q$ the maximum transmit power over a frame, and with $p_q^{\max}(k)$ the mask constraint over each subcarrier, the problem can be cast as a game, i.e.,

where the set of admissible strategies of player $q$ is

Since the objective function in (16) is strictly concave in $\mathbf{p}_q \in \mathcal{P}_q$ for any given $\mathbf{p}_{-q}$, and the feasible set $\mathcal{P}_q$ is compact and convex, game $\mathcal{G}_1$ admits a non-empty solution set for any set of channels and transmit power constraints of the users. In [16], we reformulated this game as a *variational inequality* [17] and applied the iterative gradient projection algorithm to solve it, deriving sufficient conditions for its convergence to a Nash equilibrium (NE).

Game $\mathcal{G}_1$ may possess multiple equilibria, which may not be Pareto-efficient.^{b} To improve upon the performance of the NE of the purely competitive game $\mathcal{G}_1$, we can modify the utility function of each user so as to induce the players to incorporate a social utility, rather than being purely selfish. For example, in [18, 19] it has been proposed to modify the utility function of each player so as to maximize the sum of all users’ rates. In principle, this change would require a centralized solution. Nevertheless, Huang et al. [18] showed that the solution of the sum-rate game can still be achieved in decentralized form, provided that the players exchange some parameters, the so-called *prices*. These parameters induce a penalty on each player’s utility proportional to the rate decrease that the player’s strategy induces on the other players. Introducing pricing mechanisms in femtocell networks is possible thanks to the backhaul link, which allows the exchange of prices among FAPs. Furthermore, we will show next that every FAP needs to exchange pricing coefficients only with its neighbors, thus keeping the amount of extra signaling limited.

Let us introduce the following *price* coefficient:

We can then compute the variation of user $r$’s *expected* rate resulting from an increase of the $q$th node’s transmit power as $\frac{\partial R_r(\mathbf{p})}{\partial p_{k,m}^{q}} = -\Pi_{k,m}^{r}\frac{\partial I_{k,m}^{r}}{\partial p_{k,m}^{q}} = -\Pi_{k,m}^{r}|H_k^{qr}|^2$. The incorporation of the pricing mechanism leads to the new game^{c}:

Denoting with $\nu_q$ the Lagrangian multiplier, we have set

For $\nu_q > 0$, we have $\tilde{b}^q(k,m)^2 - 4\tilde{a}^q(k,m)\tilde{c}^q(k,m) \ge 0$, and the only solution is

and the optimal power allocation vector is $p_{k,m}^{q\ast} = [\tilde{p}_{k,m}^{b}]_0^{p_q^{\max}(k)}$, where the multiplier $\nu_q$ is chosen in order to satisfy the constraint $\sum_{k=1}^{N}\sum_{m=m_0}^{m_0+M-1}[\tilde{p}_{k,m}^{b}]_0^{p_q^{\max}(k)} = P_q$. The previous solution assumes, for each player, that the powers used by the other players are given.

Let us now map each time–frequency pair $(k,m)$ into a single index $h$, so that the entries of the power vector $\mathbf{p}_q$ are $p_h^q$ for $h = 1,\ldots,NM$. Then, defining the quantities

we can write the $q$th user’s best response as

where $c_h^q = \frac{\beta_h^q \mathrm{SNIR}_h^{\beta^q}}{1+\mathrm{SNIR}_h^{\beta^q}} + \frac{\gamma_h^q \mathrm{SNIR}_h^{\gamma^q}}{1+\mathrm{SNIR}_h^{\gamma^q}}$, and $\nu_q$ and $\eta_h^q$ are the Lagrangian multipliers. Given this setting, the modified MADP algorithm is illustrated below.

### Algorithm 1 MADP algorithm

Each FAP performs its allocation over $M$ consecutive time slots, from $m_0$ to $m_0 + M - 1$.

Before performing its allocation, each FAP has to observe $q_s$ samples, from $m_0 - q_s + 1$ to $m_0$, in order to estimate the transition probabilities of the underlying Markov chain.

S.0: Each FAP $q$ chooses an initial power profile in the set $\mathcal{P}_q$; set $n = 0$;

S.1: Each FAP computes its interference prices $\Pi_h^q(n)|H_k^{iq}|^2$ for $h = 1,\ldots,MN$ and sends them to its neighbors with index $i \in \mathcal{N}_q$;

S.2: At each iteration $n$, each FAP updates its power profile so as to maximize its utility function $\bar{R}_q$, given the other FAPs’ power profiles $\mathbf{p}_{-q}$ and price vectors, according to $p_h^q(n+1) = p_h^q(n) + \alpha_q(n)\left(p_h^{q\ast} - p_h^q(n)\right)$ for $h = 1,\ldots,MN$, where $p_h^{q\ast}$ is given by (25);

S.3: Set $n = n + 1$, go to step S.1, and repeat until convergence is reached.

Following arguments similar to those in [19], we prove in Appendix 3 that there exist sufficiently small step sizes $\alpha_q(n)$ for which the MADP algorithm converges monotonically to a fixed point.
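Steps S.0 to S.3 can be sketched as a damped best-response iteration with interference prices. The Python code below is a minimal sketch under several assumptions of ours, since Equation (25) is not reproduced here: every FAP serves one FUE and all other FAPs are neighbors; the expected rate of FAP $q$ on slot $h$ is $\beta\ln(1+gp/(\sigma^2+I)) + \gamma\ln(1+gp/(\sigma^2+\sigma_I^2+I))$; the price $\Pi_h^r = -\partial\bar{R}_r/\partial I_h^r$ is obtained by differentiating this expression; each best response maximizes the expected rate minus the price term under the power budget, using the per-slot closed form of the single-FAP max-rate case; and all FAPs update simultaneously with a constant step $\alpha$. The weak cross gains in the usage example are chosen so that the iteration contracts; convergence is not guaranteed for arbitrary gains or step sizes. All names are hypothetical.

```python
import numpy as np

def slot_power(lam, beta, a_n, a_i, p_max):
    """Maximize beta*ln(1 + a_n p) + (1 - beta)*ln(1 + a_i p) - lam*p over [0, p_max]."""
    A = lam * a_n * a_i
    B = lam * (a_n + a_i) - a_n * a_i
    C = lam - beta * a_n - (1.0 - beta) * a_i
    root = (-B + np.sqrt(np.maximum(B * B - 4.0 * A * C, 0.0))) / (2.0 * A)
    return np.clip(np.where(C < 0.0, root, 0.0), 0.0, p_max)

def best_response(price, beta, a_n, a_i, p_max, budget, iters=100):
    """Maximize expected rate - price . p subject to sum(p) <= budget (bisection on nu)."""
    def power(nu):
        return slot_power(np.maximum(nu + price, 1e-12), beta, a_n, a_i, p_max)
    if power(0.0).sum() <= budget:
        return power(0.0)                     # budget not binding: nu = 0
    lo, hi = 0.0, float(np.max(beta * a_n + (1.0 - beta) * a_i))
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        lo, hi = (nu, hi) if power(nu).sum() > budget else (lo, nu)
    return power(hi)

def madp_sketch(G, beta, s_i, noise, p_max, budget, alpha=0.5, n_iter=100):
    """Damped, priced best-response iteration with simultaneous updates.

    G[r, q, h] = |H^{rq}|^2 on slot h (FAP r -> FUE q); beta[q, h] = Pr(idle) at FUE q.
    """
    Q, H = beta.shape
    idx = np.arange(Q)
    g = G[idx, idx]                           # direct gains, shape (Q, H)
    p = np.full((Q, H), budget / H)           # S.0: uniform initial profile
    for _ in range(n_iter):                   # S.3: fixed number of iterations here
        I = np.einsum("rqh,rh->qh", G, p) - g * p          # interference from other FAPs
        n0, n1 = noise + I, noise + s_i + I                # macro user idle / busy
        gp = g * p
        # S.1: Pi^r = -dR_r/dI_r for the assumed expected-rate model
        pi = beta * gp / (n0 * (n0 + gp)) + (1 - beta) * gp / (n1 * (n1 + gp))
        t = np.einsum("qrh,rh->qh", G, pi) - g * pi        # sum_{r != q} Pi^r |H^{qr}|^2
        # S.2: move each FAP toward its priced best response
        p_star = np.stack([best_response(t[q], beta[q], g[q] / n0[q], g[q] / n1[q],
                                         p_max, budget) for q in range(Q)])
        p = p + alpha * (p_star - p)
    return p

rng = np.random.default_rng(2)
Q, N, M = 3, 12, 3
Gk = rng.uniform(0.005, 0.02, (Q, Q, N))                       # weak cross gains |H_k^{rq}|^2
Gk[np.arange(Q), np.arange(Q)] = rng.uniform(0.5, 2.0, (Q, N)) # direct gains
G = np.repeat(Gk, M, axis=2)                                    # channel constant over M slots
beta = rng.uniform(0.2, 0.9, (Q, N * M))                        # Pr(idle) per FUE and slot
p = madp_sketch(G, beta, s_i=1.0, noise=0.5, p_max=1.0, budget=6.0)
```

Only the aggregate price $t$ has to reach each FAP, which is consistent with the neighbor-only exchange over the backhaul described above.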

## Multi-FAP case: min-power game

In the min-power game, each FAP minimizes its transmit power over the $N$ subchannels and the $M$ time slots, subject to a constraint on its own expected rate. The problem of each player $q$, given the strategies of the others, is a convex optimization problem, since the objective function is a linear (hence convex) function of $\mathbf{p}_q$ and the set $\tilde{\mathcal{F}}_q(\mathbf{p}_{-q})$, given the power vector $\mathbf{p}_{-q}$ of the other players, is a convex set. Imposing the KKT conditions, as in the single-FAP case, the solution can be expressed in closed form as $\mathbf{p}_q^{\ast} = \mathbf{g}(\mathbf{p}_{-q})$, whose entries are (see Appendix 4)

where $\lambda_q$ is chosen in order to satisfy the rate constraint $\bar{R}_q(\mathbf{p}_q^{\ast}, \mathbf{p}_{-q}) = R_q^0$. However, the overall feasible set is now not jointly convex with respect to the power vectors of all the users, i.e., it is not convex in $\mathbf{p} = (\mathbf{p}_q)_{q=1}^{Q}$. This makes the study of this game much harder than the standard NE problem. Nevertheless, game $\tilde{\mathcal{G}}_1$ is a generalized potential game (GPG) [20], with a potential $\Phi$ equal to the sum power. In such a case, the existence of an NE of the potential game follows directly from the existence of a minimum of the potential function $\Phi$ on the feasible set $\tilde{\mathcal{X}}$ of the game. To exploit the theory of GPGs, we must prove that game $\tilde{\mathcal{G}}_1$ admits a non-empty feasible set. The proof is given in Appendix 5, which contains the sufficient conditions under which the feasible set of game $\tilde{\mathcal{G}}_1$, i.e.,

is compact and non-empty. Hence, game $\tilde{\mathcal{G}}_1 = \{\Omega, \{\tilde{\mathcal{F}}_q(\mathbf{p}_{-q})\}_{q\in\Omega}, \{u_q(\mathbf{p}_q)\}_{q\in\Omega}\}$ is a GPG whose potential function $\Phi(\mathbf{p})$ is the sum of the objective functions of all players, i.e., $\Phi(\mathbf{p}) = \sum_{q=1}^{Q} u_q(\mathbf{p}_q)$.

where $\lambda_s$ is the Lagrangian multiplier of user $s$ relative to the rate constraint. The pricing coefficients $\Pi_{k,m}^{s}$ are defined in the same way as in the max-rate case.

## Numerical results

The number of subcarriers $N$ is set to 12, as in an LTE physical resource block (PRB). In Figure 2, we show the average rate per OFDM symbol as a function of the number of allocated time slots, obtained in the max-rate problem. The simulation results have been averaged over 100 independent channel and Markov chain realizations. The different curves indicate the rate obtained by assuming different kinds of knowledge of the interference: the green curve assumes perfect knowledge of the future evolution of the macro-user activity and is used as a benchmark; the pink curve assumes that the interference power level over each subchannel is equal to the value observed in the first time slot; all other curves refer to the proposed algorithm, where we observe the channels in the first slot and allocate power using the proposed method. These curves refer to different Markov orders (from $L = 1$ to $L = 3$). We also compare the case where the transition probabilities are perfectly known with the case where they are estimated from the data. Interestingly, as the order increases, our approach gets closer to the ideal case where the interference activity is non-causally known. The price to be paid is the performance loss resulting from the estimation of the Markov parameters. Further developments could incorporate some form of reinforcement learning, so as to allocate resources without necessarily estimating the transition probabilities, as proposed, for example, in [12].

Figure 3 refers to $L = 3$ in the upper subplot and $L = 1$ in the lower one. We have considered different numbers of samples $q_s$ used to estimate the transition probabilities. Clearly, the higher the Markov order, the greater the number of parameters to estimate. Indeed, for the same number of samples $q_s$ used for the estimate, the performance loss with respect to the ideal case of perfect knowledge of the macro-user activity (curve with red square markers) is much higher for $L = 3$ than for $L = 1$. Figure 4 shows the cumulative rate versus the number of slots $m_0$ used for the recursive estimation of the transition probabilities, assuming $M = 3$ and modeling the macro-user activity as a first-order Markov chain. We observe that, after fewer than 50 time slots, the performance gets very close to the asymptotic case. Finally, in Figure 5, we show the performance of the min-power allocation strategy. The utility function is the SNR (dB) at the FUE receiver, obtained for different numbers of allocated time slots. As in the max-rate case, increasing the order of statistical knowledge (from $L = 1$ to $L = 3$) is clearly advantageous, and a gain of about 4 dB can be observed with respect to the case where no knowledge about the macro-user activity is assumed.

In the multi-FAP scenario, we consider $Q = 10$ FAPs randomly distributed over a square area. The MBS activity is modeled as a third-order Markov chain, and the results have been averaged over 50 independent Markov chain realizations. In Figure 6, we report the users’ sum-rate versus the iteration index for the maximum expected rate game $\mathcal{G}_1$, in order to test the convergence of the algorithm, which converges in a few iterations. In Figures 7 and 8, we depict the FAPs’ sum-rate versus the number of allocated time slots. In particular, Figure 7 refers to the purely competitive maximum expected rate game $\mathcal{G}_1$, while Figure 8 refers to the modified pricing game $\mathcal{G}_2$. In both cases, we assume the same maximum transmit power per FAP. The three curves in each figure indicate the sum-rate obtained by assuming perfect (non-causal) knowledge of the macro-user activity, no knowledge at all (thus assuming the interference to remain equal to the values observed in the first slots of each frame), or knowledge of the Markov parameters only. Both figures show that acquiring statistical knowledge (estimation) of the interference activity parameters (Markov transition rates) yields a performance advantage over the case with no information and brings the performance close to the ideal case of perfect non-causal knowledge of the interference activity. Of course, as time evolves, the mismatch between the predicted and the actual interference increases, so that the performance improvement tends to decrease over time. Furthermore, comparing Figures 7 and 8 shows the gain achieved by introducing pricing.

## Conclusion

In this article, we have shown how the estimation of the interference statistical parameters can improve the performance of a power allocation technique, provided that the statistical model fits the real data. In this study, we assume that the transition probabilities are estimated from the data. An interesting future direction consists in incorporating methods that do not require such an estimation, but instead acquire the proper behavior through reinforcement learning. A notable feature of our method is that, for a given estimate of the transition probabilities, the power allocation across the set of time slots/frequency channels is found in closed form. This saves convergence time with respect to gradient-based techniques.

The other important contribution of this article is the decentralized approach for resource allocation based on game theory, in the case where the interference is dynamically varying. Our Bayesian formulation of the game provides interesting results in such an uncertain environment. Finally, the introduction of coordination among FAPs, based on the exchange of a few parameters (prices) through the backhaul link, has been shown to provide significant performance gains with respect to the purely competitive game.

Further developments should incorporate a robust approach for the situation where the interference statistical model is not known or the statistical parameters are time-varying. Another critical aspect is the availability of the backhaul link for the exchange of prices. Since such a link is affected by random delays, it may be useful to incorporate robust mechanisms to cope with the situation where the price does not arrive within a maximum tolerable delay.

## Appendix 1

*L* time slots as $\boldsymbol{\pi}^{(k,1,\dots,L)}$. More specifically, the entries of the $2^{L}$-dimensional vector $\boldsymbol{\pi}^{(k,m-L+1,\dots,m)}$ are defined as

*L* = 2 as

*L* = 3 as

*k*th subchannel is idle (busy) at time *m*, i.e., $\beta_{k,m}$ ($\gamma_{k,m}$) will be at time *m* for *L* = 2, 3, respectively,

*m* states, namely, $\mathbf{s}^{m} \equiv s_{k,1},\dots,s_{k,m}$, are observed. The probability of observing a specific sequence of states is

Let $n_{ij}(l)$ denote the number of times the state *i* at time *l* − 1 switches to state *j* at time *l*. It has been proved in [21] that, for a stationary Markov chain, the set $n_{ij} = \sum_{l=2}^{m} n_{ij}(l)$ forms a set of sufficient statistics. Furthermore, the maximum likelihood estimator of the transition probability from state *i* to state *j* based on the set $\mathbf{s}^{m}$ is

$$\hat{p}_{ij} = \frac{n_{ij}}{\sum_{j} n_{ij}}.$$
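The count-based estimate can be sketched as follows. The sketch assumes binary states (0 = idle, 1 = busy) and, for an *L*th-order chain, treats each tuple of the last *L* states as the conditioning context; for *L* = 1 it reduces to the ratio between the transition counts $n_{ij}$ and the number of transitions leaving state *i*. The function name `estimate_busy_probabilities` is hypothetical, and contexts never observed in the sequence are simply left out, since the estimate is undefined for them.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def estimate_busy_probabilities(states: Sequence[int],
                                order: int = 1) -> Dict[Tuple[int, ...], float]:
    """Count-based ML estimate of P(s_l = 1 | s_{l-order}, ..., s_{l-1}).

    `states` is an observed binary sequence (0 = idle, 1 = busy). Only contexts
    that actually occur in the sequence get an estimate.
    """
    if order < 1:
        raise ValueError("order must be >= 1")
    leaving = Counter()   # transitions that leave each context
    to_busy = Counter()   # of those, transitions that end in the busy state
    for l in range(order, len(states)):
        context = tuple(states[l - order:l])
        leaving[context] += 1
        to_busy[context] += states[l]
    return {ctx: to_busy[ctx] / leaving[ctx] for ctx in leaving}


observed = [0, 0, 1, 1, 0, 1, 0, 0]
first_order = estimate_busy_probabilities(observed, order=1)
second_order = estimate_busy_probabilities(observed, order=2)
# The probability that the next slot is idle is 1 minus the busy estimate.
idle_after_busy = 1.0 - first_order[(1,)]
```

In the short example sequence, state 0 is left four times (twice towards state 1) and state 1 is left three times (once towards state 1), so the first-order estimates are 1/2 and 1/3.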

## Appendix 2

$\lambda$, $\mu_{k,m}$, $\nu_{k,m}$ are the Lagrangian multipliers, and the KKT conditions can be written as

$p_{k,m} < p^{\max}(k)$, then $\nu_{k,m} = 0$ and this system can be reduced to

$\lambda < a_n(k)\beta_{k,m} + a_I(k,m)\gamma_{k,m}$, the second inequality in (48) can hold only if $p_{k,m} > 0$, so that, from the first equation in (48), it results

$\lambda \ge a_n(k)\beta_{k,m} + a_I(k,m)\gamma_{k,m}$, then $p_{k,m} > 0$ is never verified, since it would imply

$\lambda \ge a_n(k)\beta_{k,m} + a_I(k,m)\gamma_{k,m}$, we have $p_{k,m} = 0$. Let us now solve Equation (50), i.e.,

- 1. The solutions in (53) are always real. In fact, the term under the square root is nonnegative for all $\lambda \ge 0$, since
  $$\tilde{b}_{k,m}^{2} - 4\tilde{a}_{k,m}\tilde{c}_{k,m} = \lambda^{2}\left(a_n(k) - a_I(k,m)\right)^{2} + 2\lambda\, a_n(k)\, a_I(k,m)\left(a_n(k) - a_I(k,m)\right)\left(1 - 2\gamma_{k,m}\right) + a_n(k)^{2}\, a_I(k,m)^{2} \tag{54}$$
  where $a_n(k) - a_I(k,m) > 0$, so that (54) is nonincreasing in $\gamma_{k,m}$ and its minimum, achieved for $\gamma_{k,m} = 1$, is given by
  $$\left[\lambda\left(a_n(k) - a_I(k,m)\right) - a_n(k)\, a_I(k,m)\right]^{2} \ge 0, \quad \forall \lambda \ge 0; \tag{55}$$
- 2. The solution
  $$p_{k,m}^{a} = \frac{-\tilde{b}_{k,m} - \sqrt{\tilde{b}_{k,m}^{2} - 4\tilde{a}_{k,m}\tilde{c}_{k,m}}}{2\tilde{a}_{k,m}} \tag{56}$$
  is always negative, since the inequality
  $$\sqrt{\tilde{b}_{k,m}^{2} - 4\lambda\, a_n(k)\, a_I(k,m)\left(\lambda - a_n(k)\beta_{k,m} - a_I(k,m)\gamma_{k,m}\right)} > -\tilde{b}_{k,m} \tag{57}$$
  is verified for all $\lambda > 0$;
- 3. The sign of the solution
  $$p_{k,m}^{b} = \frac{-\tilde{b}_{k,m} + \sqrt{\tilde{b}_{k,m}^{2} - 4\tilde{a}_{k,m}\tilde{c}_{k,m}}}{2\tilde{a}_{k,m}} \tag{58}$$
  can be studied by considering the following inequality:
  $$\sqrt{\tilde{b}_{k,m}^{2} - 4\lambda\, a_n(k)\, a_I(k,m)\left(\lambda - a_n(k)\beta_{k,m} - a_I(k,m)\gamma_{k,m}\right)} > \tilde{b}_{k,m}. \tag{59}$$
  In particular,^{d} it results
  $$p_{k,m}^{b}\ \begin{cases} > 0 & \text{for } \lambda < a_n(k)\beta_{k,m} + a_I(k,m)\gamma_{k,m} \\ \le 0 & \text{for } \lambda \ge a_n(k)\beta_{k,m} + a_I(k,m)\gamma_{k,m} \end{cases} \tag{60}$$

$p_{k,m} < p^{\max}(k)$ and $\lambda < a_n(k)\beta_{k,m} + a_I(k,m)\gamma_{k,m}$ is $p_{k,m} = p_{k,m}^{b}$; hence, we can write

so that the optimal solution can be written as $p_{k,m}^{\ast} = \left[p_{k,m}^{b}\right]_{0}^{p^{\max}(k)}$, where $\lambda$ is chosen such that $\sum_{k=1}^{N}\sum_{m=m_0}^{m_0+M-1}\left[p_{k,m}^{b}\right]_{0}^{p^{\max}(k)} = P_T M$.
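The closed-form solution can be turned into a complete allocation by searching for the multiplier $\lambda$ that meets the total power budget $P_T M$. The sketch below does this by bisection, since the total clipped power is nonincreasing in $\lambda$. It assumes per-slot coefficients $\tilde{a}_{k,m} = \lambda a_n(k) a_I(k,m)$, $\tilde{b}_{k,m} = \lambda\left(a_n(k) + a_I(k,m)\right) - a_n(k) a_I(k,m)$ and $\tilde{c}_{k,m} = \lambda - a_n(k)\beta_{k,m} - a_I(k,m)\gamma_{k,m}$, with $\gamma_{k,m} = 1 - \beta_{k,m}$. These are reconstructed to be consistent with (54), (57) and (60) (they follow from setting $\beta_{k,m}a_n(k)/(1 + a_n(k)p) + \gamma_{k,m}a_I(k,m)/(1 + a_I(k,m)p) = \lambda$), not copied from the article's own definitions. The $N \times M$ pairs $(k,m)$ are flattened into one list, all names are hypothetical, and the numerical values are placeholders.

```python
import math
from typing import List, Sequence, Tuple


def clipped_root(lam: float, a_n: float, a_i: float, beta: float,
                 p_max: float) -> float:
    """[p^b]_0^{p_max} for one (k, m) pair; a_i stands for a_I(k, m) < a_n."""
    if lam <= 0.0:
        return p_max                      # p^b grows without bound as lam -> 0+
    gamma = 1.0 - beta
    a = lam * a_n * a_i                   # assumed a~_{k,m}
    b = lam * (a_n + a_i) - a_n * a_i     # assumed b~_{k,m}
    c = lam - a_n * beta - a_i * gamma    # assumed c~_{k,m}
    if c >= 0.0:
        return 0.0                        # second case of (60): p^b <= 0
    s = math.sqrt(b * b - 4.0 * a * c)
    # Rationalized form of (-b + s) / (2a); b + s > 0 whenever c < 0.
    denom = b + s
    if denom <= 0.0:                      # reachable only through rounding
        return p_max
    return min(p_max, -2.0 * c / denom)


def allocate_power(a_n: Sequence[float], a_i: Sequence[float],
                   beta: Sequence[float], p_max: Sequence[float],
                   budget: float, iters: int = 200) -> Tuple[List[float], float]:
    """Bisection on lam so that sum_j [p^b_j]_0^{p_max_j} = budget (= P_T * M)."""
    n = len(a_n)
    if sum(p_max) <= budget:
        return list(p_max), 0.0           # total power constraint is inactive
    lo = 0.0                              # at lo every slot gets p_max: too much
    hi = max(a_n[j] * beta[j] + a_i[j] * (1.0 - beta[j]) for j in range(n))
    for _ in range(iters):                # at hi every slot gets 0: too little
        lam = 0.5 * (lo + hi)
        total = sum(clipped_root(lam, a_n[j], a_i[j], beta[j], p_max[j])
                    for j in range(n))
        if total > budget:
            lo = lam
        else:
            hi = lam
    powers = [clipped_root(hi, a_n[j], a_i[j], beta[j], p_max[j]) for j in range(n)]
    return powers, hi


a_n = [2.0, 1.0, 3.0]   # hypothetical a_n(k) per flattened (k, m) pair
a_i = [0.5, 0.2, 1.0]   # hypothetical a_I(k, m), smaller than a_n
beta = [0.9, 0.3, 0.6]  # hypothetical idle probabilities beta_{k,m}
powers, lam = allocate_power(a_n, a_i, beta, [1.0] * 3, budget=1.5)
```

The rationalized root avoids the cancellation in $-\tilde{b}_{k,m} + \sqrt{\cdot}$ when $\lambda$ is small; slots whose threshold $a_n(k)\beta_{k,m} + a_I(k,m)\gamma_{k,m}$ falls below the final $\lambda$ receive no power, as predicted by (60).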

## Appendix 3

### Convergence Analysis of MADP Algorithm

- (a) With a proper choice of the step size $\alpha_q(n)$, MADP converges to a fixed point;
- (b) This point is a solution of the KKT conditions of the modified game in (19) and, hence, also a solution point of the optimization problem
  $$\max_{\mathbf{p}} \ \bar{R}(\mathbf{p}) \quad \text{s.t.} \quad \mathbf{p} \in \mathcal{P} \tag{62}$$

where the FAPs’ sum rate is $\bar{R}(\mathbf{p}) = \sum_{q=1}^{Q} \bar{R}_q(\mathbf{p})$ and $\mathcal{P}$ is the Cartesian product of the sets $\mathcal{P}_q$.

*n* of the MADP algorithm. Then, for each user *q*, we must prove that there exists a sequence $\alpha_q(n) > 0$ such that $U_1(n)$ is monotonically increasing and convergent, i.e., $U_1(n+1) \ge U_1(n)$ $\forall n$ and $U_1(n) \to U_1^{\ast}$ as $n \to \infty$. As discussed in [19], since $U_1(n)$ is bounded from above, we only need to show that $U_1(n)$ is monotonically increasing. It suffices to consider a given iteration *n* in which user *q* is selected to update its power profile, and show that $U_1(\mathbf{p}_q(n+1)) \ge U_1(\mathbf{p}_q(n))$, where the total utility $U_1$ is now regarded as a function of $\mathbf{p}_q$ because only the power profile of user *q* is updated. To do this, we use the descent lemma to bound $U_1(\mathbf{p}_q(n+1))$. The descent lemma [19] states that if a function $F:\mathbb{R}^{n} \to \mathbb{R}$ is continuously differentiable and its gradient is Lipschitz continuous with Lipschitz constant *K*, then, $\forall \mathbf{x}, \mathbf{y} \in \mathbb{R}^{n}$,

$$\left|F(\mathbf{y}) - F(\mathbf{x}) - \nabla F(\mathbf{x})^{T}(\mathbf{y} - \mathbf{x})\right| \le \frac{K}{2}\left\|\mathbf{y} - \mathbf{x}\right\|^{2}.$$

One sufficient condition for the Lipschitz continuity of the gradient is that the $l_2$-norm of the Hessian matrix of $F(\mathbf{x})$ is bounded, in which case this bound can be used as the Lipschitz constant. It can be easily shown that this condition holds for $U_1(\mathbf{p}_q)$. Specifically, there exists a constant $B_{U_1}^{q}$ that upper bounds the $l_2$-norm of the Hessian matrix of $U_1(\mathbf{p}_q)$, independently of the others’ power profiles.

$U_1(\mathbf{p}_q)$, we get

$U_1(\mathbf{p}_q(n+1)) \ge U_1(\mathbf{p}_q(n))$, it suffices to show that

*q* defined in (25), the inequality in (65) can be written as

$\alpha_q(n)$ as

Finally, in order to prove point (b), let $U_1^{\ast}$ correspond to a fixed point of the algorithm, such that $U_1(n) = U_1^{\ast}$ for some index *n*. Since this is a fixed point, it follows that $p_h^{q}(n) = p_h^{q\ast}$, $\forall h, q$. It can then be seen that, for all *q*, $p_h^{q\ast}$ must be an optimal solution to problem (19), given the other users’ current power profiles and interference price vectors. Hence, $\mathbf{p}(n)$ also satisfies the KKT conditions of problem (62).
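The role of the Hessian bound in choosing the step size can be illustrated on a simpler problem. The sketch below is not the MADP update of (25), which is not reproduced here; it is plain projected gradient ascent on a hypothetical separable utility $\sum_j w_j \log(1 + a_j p_j) - \rho \sum_j p_j$ over a box, with step $1/K$, where $K = \max_j w_j a_j^2$ bounds the $l_2$-norm of the (diagonal) Hessian for $p_j \ge 0$. By the descent lemma, a step no larger than $1/K$ guarantees that no iterate decreases the utility, which is the monotonicity property used in part (a). All names and values are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence


def projected_gradient_ascent(grad: Callable[[List[float]], List[float]],
                              x0: Sequence[float], lower: Sequence[float],
                              upper: Sequence[float], lipschitz: float,
                              n_iter: int = 50) -> List[List[float]]:
    """Projected gradient ascent on a box with step 1 / lipschitz.

    Returns every iterate so that monotonicity can be checked.
    """
    step = 1.0 / lipschitz
    x = list(x0)
    iterates = [list(x)]
    for _ in range(n_iter):
        g = grad(x)
        # Gradient step followed by projection onto the box [lower, upper].
        x = [min(hi, max(lo, xi + step * gi))
             for xi, gi, lo, hi in zip(x, g, lower, upper)]
        iterates.append(list(x))
    return iterates


# Hypothetical utility: sum_j w_j*log(1 + a_j*p_j) - rho*sum_j p_j.
w = [1.0, 0.5, 2.0]
a = [3.0, 1.0, 0.5]
rho = 0.4


def utility(p: Sequence[float]) -> float:
    return sum(wj * math.log1p(aj * pj) for wj, aj, pj in zip(w, a, p)) - rho * sum(p)


def utility_grad(p: List[float]) -> List[float]:
    return [wj * aj / (1.0 + aj * pj) - rho for wj, aj, pj in zip(w, a, p)]


# For p >= 0 the Hessian is diagonal with entries of magnitude at most w_j*a_j^2.
K = max(wj * aj * aj for wj, aj in zip(w, a))
path = projected_gradient_ascent(utility_grad, [0.0, 0.0, 0.0],
                                 [0.0] * 3, [2.0] * 3, K)
values = [utility(p) for p in path]
```

Using a larger step than $1/K$ can break the monotonicity, which is why the step $\alpha_q(n)$ is tied to the Hessian bound $B_{U_1}^{q}$ in the argument above.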

## Appendix 4

*q*, for fixed strategies of the others, is nonempty. For simplicity, in this proof we assume w.l.o.g. $m_0 = 1$. More specifically, the constraint $R_q(\mathbf{p}) > R_q^{0}$ can be written as

*q* is using during the game. Since $a_n^{q}(k,m) > a_I^{q}(k,m)$, to verify (73) it is sufficient to prove that

*k* = 1, …, *N*, *m* = 1, …, *M*, the KKT conditions of the optimization problem $(\tilde{P}_1)$:

$\lambda_q > 0$; otherwise, complementarity yields $p_{k,m}^{q} = 0$, $\forall k = 1,\dots,N$, $m = 1,\dots,M$, and the rate constraint is contradicted. Then, the optimum power vector must satisfy the following equation

$\lambda_q > 0$, it results $p_{k,m}^{a} \le 0$, $b^{q}(k,m)^{2} - 4a^{q}(k,m)c^{q}(k,m) \ge 0$, and

where the multiplier $\lambda_q$ is chosen in order to satisfy the constraint $\bar{R}_q(\mathbf{p}_q^{\ast}, \mathbf{p}_{-q}) = R_q^{0}$.
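The choice of the multiplier $\lambda_q$ can be illustrated with a bisection search. The sketch below assumes, for illustration only, that the expected rate of FAP *q* has the form $\sum_{k,m}\left[\beta_{k,m}\log(1 + a_n p_{k,m}) + \gamma_{k,m}\log(1 + a_I p_{k,m})\right]$ in nats, with the other FAPs' powers already absorbed into the coefficients and no per-channel power cap. Under that assumption, stationarity of the Lagrangian of the min-power problem reads $\beta a_n/(1 + a_n p) + \gamma a_I/(1 + a_I p) = 1/\lambda_q$ on active channels, and the expected rate grows with $\lambda_q$. The article's coefficients $a^{q}(k,m)$, $b^{q}(k,m)$, $c^{q}(k,m)$ are not reproduced, and every name and value in the code is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def power_for_level(level: float, a_n: float, a_i: float, beta: float) -> float:
    """Nonnegative p solving beta*a_n/(1+a_n p) + (1-beta)*a_i/(1+a_i p) = level.

    Returns 0 when even p = 0 gives a marginal rate not above `level`.
    """
    gamma = 1.0 - beta
    c = level - a_n * beta - a_i * gamma
    if c >= 0.0:
        return 0.0
    a = level * a_n * a_i
    b = level * (a_n + a_i) - a_n * a_i
    # Positive root (-b + sqrt(b^2 - 4ac)) / (2a), written in rationalized form.
    return -2.0 * c / (b + math.sqrt(b * b - 4.0 * a * c))


def expected_rate(p: Sequence[float], a_n: Sequence[float],
                  a_i: Sequence[float], beta: Sequence[float]) -> float:
    """Hypothetical expected rate in nats, summed over the flattened (k, m) pairs."""
    return sum(bj * math.log1p(anj * pj) + (1.0 - bj) * math.log1p(aij * pj)
               for pj, anj, aij, bj in zip(p, a_n, a_i, beta))


def min_power_for_rate(a_n: Sequence[float], a_i: Sequence[float],
                       beta: Sequence[float], target: float,
                       iters: int = 200) -> Tuple[List[float], float]:
    """Bisection on lambda_q so that the expected rate equals `target`."""
    def powers(lam: float) -> List[float]:
        return [power_for_level(1.0 / lam, anj, aij, bj)
                for anj, aij, bj in zip(a_n, a_i, beta)]

    lo, hi = 0.0, 1.0
    while expected_rate(powers(hi), a_n, a_i, beta) < target:
        hi *= 2.0  # the rate grows without bound as lambda_q increases
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if expected_rate(powers(lam), a_n, a_i, beta) < target:
            lo = lam
        else:
            hi = lam
    return powers(hi), hi
```

For example, `min_power_for_rate([2.0, 1.0, 3.0], [0.5, 0.2, 1.0], [0.9, 0.3, 0.6], target=1.0)` returns the powers and the multiplier at which the target rate is met with equality, mirroring the condition $\bar{R}_q(\mathbf{p}_q^{\ast}, \mathbf{p}_{-q}) = R_q^{0}$.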

## Appendix 5

*Proof that the feasible set of the game* $\tilde{\mathcal{G}}_1$ *is compact and non-empty, so that it can be cast as a GPG.*

Let us start with the following definition of a GPG given in [20]:

### Definition 1

- (a) There exists a non-empty, closed set $\tilde{\mathcal{X}} \subseteq \mathbb{R}^{n}$ such that
  $$\mathcal{X}_q(\mathbf{x}_{-q}) = \left\{\mathbf{x}_q \in D_q : (\mathbf{x}_q, \mathbf{x}_{-q}) \in \tilde{\mathcal{X}}\right\} \quad \forall q = 1,\dots,Q \tag{83}$$
  where $D_q \subseteq \mathbb{R}^{n_q}$^{e} are non-empty, closed sets such that $\left(\prod_{q=1}^{Q} D_q\right) \cap \tilde{\mathcal{X}} \ne \varnothing$;
- (b) There exists a continuous function $\Phi(\mathbf{x}) : \mathbb{R}^{n} \to \mathbb{R}$, named *potential function*, such that $\forall q \in \Omega$, $\forall \mathbf{x}_{-q}$, and for all $\mathbf{y}_q, \mathbf{z}_q \in \mathcal{X}_q(\mathbf{x}_{-q})$,
  $$u_q(\mathbf{y}_q, \mathbf{x}_{-q}) - u_q(\mathbf{z}_q, \mathbf{x}_{-q}) > 0 \tag{84}$$
  implies
  $$\Phi(\mathbf{y}_q, \mathbf{x}_{-q}) - \Phi(\mathbf{z}_q, \mathbf{x}_{-q}) \ge u_q(\mathbf{y}_q, \mathbf{x}_{-q}) - u_q(\mathbf{z}_q, \mathbf{x}_{-q}) \tag{85}$$
  where $u_q$ is the *q*th player payoff function.

where we have assumed w.l.o.g. $m_0 = 1$. Then, we have to prove the following lemma.

### Lemma 1

$_{k}$ defined in (91) are *P*-matrices, for all *k* = 1, …, *N*, *m* = 1, …, *M*. Sufficient conditions for this to hold are

### Proof

*q* is using during the game. Since $a_n^{q}(k,m) > a_I^{q}(k,m)$, (88) is surely valid if we prove that