Optimal resource allocation in femtocell networks based on Markov modeling of interferers’ activity

Sardellitti, Stefania; Carfagna, Alessandro; Barbarossa, Sergio

doi:10.1186/1687-1499-2012-371

Research
Open access
Published: 27 December 2012

EURASIP Journal on Wireless Communications and Networking volume 2012, Article number: 371 (2012)

Abstract

Femtocell networks offer a series of advantages with respect to conventional cellular networks. However, a potential massive deployment of femto-access points (FAPs) poses a big challenge in terms of interference management, which requires proper radio resource allocation techniques. In this article, we propose alternative optimal power/bit allocation strategies over a time-frequency frame based on a statistical modeling of the interference activity. Given the lack of knowledge of the interference activity, we assume a Bayesian approach that provides the optimal allocation, conditioned to periodic spectrum sensing, and estimation of the interference activity statistical parameters. We consider first a single FAP accessing the radio channel in the presence of a dynamical interference environment. Then, we extend the formulation to a multi-FAP scenario, where nearby FAP’s react to the strategies of the other FAP’s, still within a dynamical interference scenario. The multi-user case is first approached using a strategic non-cooperative game formulation. Then, we propose a coordination game based on the introduction of a pricing mechanism that exploits the backhaul link to enable the exchange of parameters (prices) among FAP’s.

Introduction

Femtocell networks are composed of cells having a coverage radius in the order of tens of meters, providing enhanced indoor coverage through the use of femto-access points (FAPs) or home-enhanced node B (HeNB), in the long-term evolution (LTE) terminology [1, 2]. A typical scenario is sketched in Figure 1, where we can notice the wireless links among femto user equipments (FUE), macro user equipments (MUE), macro base stations (MBSs) and FAPs. More specifically, the wireless links are classified as useful or interfering depending on whether they refer, respectively, to the useful link between a transmitter and its intended receiver or to other receivers falling within its coverage area. Being installed in residential areas, e.g., homes, offices, etc., the FAPs are typically interconnected with each other through a wired link, usually an ADSL subscriber line which allows the access to a broadband Internet network, as depicted in Figure 1. One of the ideas proposed in this article is to exploit the backhaul to set up a local coordination among nearby FAPs to improve the efficiency of the radio resource management (RRM), without the presence of a centralized control.

Femtocells are becoming more and more attractive due to their benefits to both cellular operators and subscribers. On the one hand, operators see femtocells as a way to improve indoor coverage and to off-load wireless traffic from the macro cellular network to the wired network, thus releasing wireless channels to additional mobile users. On the other hand, subscribers see femtocells as a way to get higher quality services, either higher data throughput or better voice quality, thanks to a better indoor coverage, and seamless connectivity.

Following the current evolution of cellular standardization process, in this study we assume an LTE framework and we focus on the downlink channel, which assumes an OFDMA strategy. In this context, femtocell networks offer advantages with respect to Wi-Fi, as they avoid vertical hand-off and offer better QoS.

In view of a potential massive deployment of FAP’s, a special attention has to be devoted to RRM. In fact, different from MBSs, FAPs are typically installed by the subscribers and maintained without global planning, with no special consideration about traffic demands or interference with other cells, either femto or macro cells. Hence, a dense deployment of FAPs might induce an intolerable interference from FUE’s to MUE’s or to other FUE’s. Interference management is then arguably one of the major challenges to be faced in femtocell networks.

The goal of this study is to propose an algorithm for optimizing power/bit allocation over a joint time–frequency domain, incorporating a statistical model of the macro-users' activity. Since the interference is unknown, the proposed algorithm follows a Bayesian approach, which allocates power/bits over successive time/frequency slots depending on a preliminary sensing and estimation of the parameters of the interference model. We assume a Markov model for simplicity, but the approach can be generalized to more sophisticated models, such as those in [3, 4]. More specifically, in this study the interference over different frequency subchannels is modeled as a set of statistically independent homogeneous discrete-time Markov chains (DTMCs). We consider a single-user allocation first, where a single FAP finds the optimal resource allocation according to two alternative strategies: (i) maximize the expected rate, conditioned on the result of the sensing and estimation phase, under a transmit power constraint; (ii) minimize the transmit power under an expected rate constraint.

Opportunistic spectrum access (OSA) in multicarrier networks where the channel occupancy follows a Markovian evolution has already been studied in the framework of cognitive radio (CR) in[5, 6], for example. Chen et al.[5] develop an optimal OSA scheme aimed at optimizing spectrum sensing and access policies jointly. They assumed that the secondary transmitter receives error-free ACK signals from the secondary’s receiver, whenever the transmission is successful, and this information is used to track the state of the primary channels. Interestingly enough, Chen et al.[5] establish a separation principle that decouples the design of spectrum sensor and access policy. A similar context is studied in[6, 7], where the authors combine learning and dynamic spectrum access. Both Chen et al.[5] and Unnikrishnan and Veeravalli[6] consider an objective function that depends only on the available cognitive bandwidth and puts a constraint on the collision probability with the primary users. Anandkumar et al.[8] and Liu and Zhao[9] formulate the multi-user OSA problem as a decentralized multi-armed bandit problem[10]. In such a framework, each user learns the channel availability statistics and designs a channel access rule in order to maximize the transmission throughput (or equivalently minimize the system regret, defined as the loss in secondary throughput due to learning errors and collisions under distributed access). In[9], which is an extension of the single-user policy proposed in[10] to the multi-user case, Liu and Zhao propose a family of distributed learning and access policies known as time-division fair share. For these policies, they prove the minimum growth rate of the system regret, which is shown to behave logarithmically with respect to the number of time slots. Moreover, Liu and Zhao[9] distinguish the case of known number of secondary users from the case in which this number is unknown but estimated at each user through feedback. 
An alternative scheme for distributed resource allocation between CRs incorporating aggregated interference control is analyzed in[11, 12], where the authors propose a form of real-time multi-agent reinforcement learning, known as decentralized Q-learning[13], to manage the aggregated interference. The objective function to be minimized is an expected discounted cumulative cost related to the difference between the effective signal-to-noise plus interference ratio (SINR) and a target SINR, which has to be guaranteed to the primary system. This SINR is measured at some control points located in the protection contour of the primary network and it is fed back to the secondary base stations that adjust their transmit power consequently. One of the interesting aspects of such an approach is that it is model-free and does not require the knowledge of the transition probabilities of the underlying Markov process. Finally, Geirhofer et al.[14] propose an interference-aware resource allocation for OFDMA systems, based on the sensing and prediction of the ad hoc users from the infrastructure users.

Different from the previous studies, in this article we propose a Bayesian radio access method enabling (possibly multiple) FAP’s to allocate power/bits over a time–frequency grid based on the current belief on the interference level, as obtained from previous sensing. Since the interference cannot be known in advance, we use a Bayesian approach and formulate the utility function as the expected value of the utility conditioned to previous measurements. The goal is to relax the requirement on sensing time and allocate resources over a certain number of future time slots, depending on the interference model and on our prediction capability.

The article is organized as follows. We consider first the radio access of a single FAP and we maximize the expected rate, averaged over the interference activity model, under a transmit power constraint. In this case, the solution can be found in closed form and it represents a sort of generalized water-filling algorithm, with water level depending on the interference activity probabilities. Then, we illustrate an alternative approach consisting in the minimization of the transmit power, subject to a constraint on the minimum average femto-user rate. Then, we generalize the proposed approaches to the multi-FAP scenario, where we analyze the interaction among FAPs using a game-theoretic approach. In particular, we consider first a purely competitive game, where each FAP adopts a purely selfish strategy. Since the competitive game might lead to inefficient Nash equilibria, we also propose a coordinated game where, thanks to the exchange of a few parameters through the backhaul link, the FAP’s coordinate their action to improve upon inefficient Nash equilibria and maximize the sum-rate or minimize the sum-power.

Single-user Bayesian adaptive allocation

Femtocell networks are fully compliant with cellular standards. Given the current evolution of 3G systems, in this article we are concerned with an LTE system and the goal is to allocate power over a time–frequency grid adaptively, as a function of the current occupancy. This implies that the channel and interference power must be sensed at the beginning of each frame. Given the low mobility of indoor users, the channel can be assumed to be nearly constant over the frame. However, the interference from macro-users may vary along the frame depending on the macro-user activity. A correct power allocation across time and frequency would require a non-causal knowledge of the interference, which is of course unavailable. To circumvent this inconvenience, we propose a time–frequency resource allocation based on a Markov modeling of the interference activity over each frequency subchannel. More specifically, we assume that the activity of the macro users over the frequency subchannels is modeled as a set of statistically independent homogeneous DTMCs. The parameters of the statistical model are assumed constant within the frame, but they may vary over successive frames. Each FUE estimates the interference power and the transition probabilities of the interference activity over each subchannel and feeds this information back to the associated FAP, which computes the optimal power allocation over a time–frequency frame, following a Bayesian approach.

In this section, we assume that the interferers do not react to the strategy of the FAP of interest. In the subsequent sections, we will extend the study to the case where the other FAPs react to the choice of nearby, interfering FAPs, thus generating an iterative process, whose stability properties will properly be studied.

Markovian interference model

The interference activity over each frequency subchannel is modeled as a DTMC. We use the binary random variable $S_{k, m}$ to indicate the macro activity over the kth subchannel at time m: $S_{k, m} = 1$ if the subchannel is busy, with interference power $σ_{I}^{2} (k, m)$, while $S_{k, m} = 0$ if the subchannel is idle. We consider different orders for the DTMC so that we can test the effect of the order on the performance of the proposed strategy. As an example, for the orders L = 1, 2, 3, we introduce the following transition probabilities.

p_{jl}^{k} = \Pr (S_{k, m} = l \mid S_{k, m - 1} = j) \quad \text{for } L = 1

(1)

p_{ijl}^{k} = \Pr (S_{k, m} = l \mid S_{k, m - 1} = j, S_{k, m - 2} = i) \quad \text{for } L = 2

(2)

p_{rijl}^{k} = \Pr (S_{k, m} = l \mid S_{k, m - 1} = j, S_{k, m - 2} = i, S_{k, m - 3} = r) \quad \text{for } L = 3

(3)

where k is the subchannel index and (j), (i,j), (r,i,j) are, respectively, the states for L = 1, 2, 3, i.e., all the binary sequences in {0,1}^L. The probability of being in the state i ∈ {0,1} at time m over the kth subchannel is denoted by $Π_{i}^{(k, m)} = \Pr (S_{k, m} = i)$. In the case of a first-order DTMC, starting from an initial time slot m₀ = 1, the probability $Π_{i}^{(k, m)}$ can be obtained recursively as

\begin{pmatrix} Π_{0}^{(k, m)} \\ Π_{1}^{(k, m)} \end{pmatrix} = \begin{pmatrix} ω_{1}^{(k)} & 1 - μ_{1}^{(k)} \\ 1 - ω_{1}^{(k)} & μ_{1}^{(k)} \end{pmatrix} \begin{pmatrix} Π_{0}^{(k, m - 1)} \\ Π_{1}^{(k, m - 1)} \end{pmatrix}, \quad m = 2, 3, \dots,

(4)

where $ω_{1}^{(k)} : = p_{00}^{(k)}$ , $μ_{1}^{(k)} : = p_{11}^{(k)}$ , whereas the initial state $(Π_{0}^{(k, 1)}, Π_{1}^{(k, 1)})$ is obtained by observing the channel state at the time slot of index m₀. Equation (4) can be written in compact matrix form as

π^{(k, m)} = P_{1}^{(k)} π^{(k, m - 1)}

(5)

where $π^{(k, m)} = {[Π_{0}^{(k, m)}, Π_{1}^{(k, m)}]}^{T}$ and the entries $p_{jl}^{(k)}$ of the transition matrix $P_{1}^{(k)}$ are given in (4). Let $β_{k, m} = \Pr (S_{k, m} = 0)$ and $γ_{k, m} = \Pr (S_{k, m} = 1)$ be the probabilities that channel k at time m is, respectively, idle and busy. Then, we can calculate them iteratively at time m from Equation (4) as $β_{k, m} = Π_{0}^{(k, m)}$ and $γ_{k, m} = Π_{1}^{(k, m)}$. The generalization to higher orders is straightforward and the formulas are reported in Appendix 1, for convenience. In Appendix 1, we also report the formulas used to estimate the transition probabilities from the observations.
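As an illustration of the first-order recursion in Equation (4), the following minimal Python sketch estimates the two self-transition probabilities from a sensed binary sequence and then propagates the idle/busy probabilities over a frame. The function names are hypothetical, and the count-based estimator is only assumed to agree with the ML estimators of Appendix 1 (not reproduced here); the 0.5 fallback for unvisited states is an arbitrary choice made for the example.

```python
def estimate_first_order(states):
    """Count-based estimate of (p00, p11) from a binary occupancy sequence.

    Assumption: relative transition frequencies stand in for the ML
    estimators of Appendix 1.
    """
    n0 = n00 = n1 = n11 = 0
    for prev, nxt in zip(states[:-1], states[1:]):
        if prev == 0:
            n0 += 1
            n00 += (nxt == 0)
        else:
            n1 += 1
            n11 += (nxt == 1)
    # Arbitrary fallback of 0.5 when a state was never observed.
    p00 = n00 / n0 if n0 else 0.5
    p11 = n11 / n1 if n1 else 0.5
    return p00, p11


def occupancy_probabilities(p00, p11, last_state, num_slots):
    """Return (beta, gamma) for slots m0, ..., m0 + num_slots - 1.

    The first entry corresponds to the observed slot m0, so it is
    deterministic; later entries follow the recursion of Equation (4).
    """
    pi0, pi1 = (1.0, 0.0) if last_state == 0 else (0.0, 1.0)
    beta, gamma = [], []
    for _ in range(num_slots):
        beta.append(pi0)
        gamma.append(pi1)
        # Both right-hand sides use the previous (pi0, pi1).
        pi0, pi1 = (p00 * pi0 + (1.0 - p11) * pi1,
                    (1.0 - p00) * pi0 + p11 * pi1)
    return beta, gamma
```

For a sensed sequence ending in the idle state, `occupancy_probabilities(p00, p11, 0, M)` gives the vectors β and γ for one subchannel over the M slots of the frame; the further a slot lies from the observation, the closer these values drift to the stationary distribution of the chain.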

Maximum expected rate optimization

Having introduced the interference model, our goal now is to find the bit/power allocation over an OFDM frame composed of N subcarriers and M consecutive time slots, in order to maximize the expected rate, taking into account the macro-users' activity. The assumptions underlying the proposed approach are: (1) the channels are affected by multipath, with time-invariant coefficients within each frame, supposed to be known at the transmitter side; (2) the activity of the interferers over each subchannel is modeled as a homogeneous DTMC of order L and the transition probabilities are estimated by using the ML estimators discussed in Appendix 1; (3) the activities of the interferers over different channels are statistically independent of each other; (4) the power allocation of the interferers is independent of the power allocation of the FAP of interest.

The last assumption is made to distinguish this situation from the case where the interferers are themselves sensing the channel (interference) and adapting their strategy consequently. In this second case, each adaptive transmitter reacts to the strategies of the other, thus inducing an iterative process that must properly be studied. The first scenario, which is the subject of this section, is appropriate when the interferer is an MBS, for example. The second scenario is more appropriate to model the situation where there are a few nearby FAP’s attempting to access the radio channel at the same time. The analysis of this scenario will be carried out in the next section by resorting to game-theoretic tools.

In the case where there is only one adaptive device, the FUE is supposed to measure the interference power from the macro network over each subchannel over a number of time slots that depends on the order of the Markov chain as well as on the required accuracy of the estimation.

Based on the channel sensing, up to a current time slot m₀, our goal is to find out the optimal power allocation over a set of M successive time slots m = m₀,m₀ + 1,…,m₀ + M−1. Since the interference in the slots successive to the current one is not known, we follow a Bayesian approach. More specifically, our first optimization criterion is the maximization of the expected rate, conditioned to the current estimation of the interference power profile and of the Markov chain parameters, over each frequency subchannel. In formulas, our objective function is

\bar{r} = \frac{1}{M} \sum_{m = m_{0}}^{m_{0} + M - 1} \sum_{k = 1}^{N} E_{S_{k, m}} \{ r (S_{k, m}) \}

(6)

where

r (S_{k, m}) = \begin{cases} \log \left( 1 + \frac{p_{k, m} | H_{k} |^{2}}{σ_{n}^{2} (k)} \right) & \text{if } S_{k, m} = 0 \\ \log \left( 1 + \frac{p_{k, m} | H_{k} |^{2}}{σ_{n}^{2} (k) + σ_{I}^{2} (k, m)} \right) & \text{if } S_{k, m} = 1 \end{cases},

(7)

where $σ_{n}^{2} (k)$ denotes the noise variance and $H_{k}$ is the FAP channel transfer coefficient over the kth subchannel. The average rate is then

\bar{r} = \frac{1}{M} \sum_{m = m_{0}}^{m_{0} + M - 1} \sum_{k = 1}^{N} \left[ β_{k, m} \log (1 + p_{k, m} a_{n} (k)) + γ_{k, m} \log (1 + p_{k, m} a_{I} (k, m)) \right]

(8)

where $a_{n} (k) : = | H_{k} |^{2} / σ_{n}^{2} (k)$ and $a_{I} (k, m) : = | H_{k} |^{2} / (σ_{n}^{2} (k) + σ_{I}^{2} (k, m))$ . Since the transition probabilities are not a priori known, they are estimated from the observations, using Equations (42), (44) or (45), depending on the most appropriate Markov order L. Knowing the transition probabilities, the occupancy probabilities β_k,m and γ_k,m at any time m, conditioned to the observation of the channel state at the first L time slots can easily be derived by using Equations (4), (39), (40). Then, denoting with p the (time–frequency) NM-dimensional power allocation vector, the max-rate optimization problem is formulated as follows

\begin{array}{ll} \max_{p} & \bar{r} (p) \\ \text{s.t.} & \frac{1}{M} \sum_{m = m_{0}}^{m_{0} + M - 1} \sum_{k = 1}^{N} p_{k, m} \leq P_{T} \\ & 0 \leq p_{k, m} \leq p^{\max} (k) \quad \forall k, m \end{array}

(9)

where the upper limit p^max(k), k = 1,…,N represents a mask constraint useful to limit the transmit power over some prescribed channels, for example, the channels occupied by the MBSs. This is a convex problem, as $\bar{r} (p)$ is a concave function of p and the constraint set is convex. The optimum power vector p^∗ can be expressed in closed form by imposing the KKT conditions (see Appendix 2 for further details). The optimal power over the k th frequency subchannel, at time m, is

p_{k, m}^{*} = \left[ \frac{- {\tilde{b}}_{k, m} + \sqrt{{\tilde{b}}_{k, m}^{2} - 4 {\tilde{a}}_{k, m} {\tilde{c}}_{k, m}}}{2 {\tilde{a}}_{k, m}} \right]_{0}^{p^{\max} (k)}

(10)

where ${[x]}_{a}^{b} = a$ if x ≤ a, ${[x]}_{a}^{b} = b$ if x ≥ b and ${[x]}_{a}^{b} = x$ if a < x < b. The coefficients ${\tilde{a}}_{k, m}, {\tilde{b}}_{k, m}, {\tilde{c}}_{k, m}$ are related to a_n(k) and a_I(k,m) as follows

\begin{array}{l} {\tilde{a}}_{k, m} = λ a_{n} (k) a_{I} (k, m) \\ {\tilde{b}}_{k, m} = λ [a_{n} (k) + a_{I} (k, m)] - a_{n} (k) a_{I} (k, m) \\ {\tilde{c}}_{k, m} = λ - a_{n} (k) β_{k, m} - a_{I} (k, m) γ_{k, m}, \end{array}

where λ is the Lagrange multiplier. Since the optimal powers $p_{k, m}^{*}$ are functions of λ, we can find this multiplier numerically as the solution of the power constraint $\sum_{m = m_{0}}^{m_{0} + M - 1} \sum_{k = 1}^{N} p_{k, m}^{*} = M P_{T}$ . Expression (10) is a generalization of the well-known water-filling solution. Indeed, it can be shown that, by taking the limit for the transition probabilities going to 1 or 0, i.e., by turning the Markov chain into the degenerate case of a deterministic signal, Equation (10) converges to the water-filling solution.
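The closed form in (10) and the numerical search for λ can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the N·M time-frequency slots are flattened into plain lists, natural logarithms are assumed, and bisection is one possible way (chosen here for simplicity) to find the multiplier, relying on the allocated power being non-increasing in λ.

```python
import math


def power_for_multiplier(lam, a_n, a_i, beta, gamma, p_max):
    """Per-slot power from Equation (10) for a given multiplier lam > 0."""
    a = lam * a_n * a_i
    b = lam * (a_n + a_i) - a_n * a_i
    c = lam - a_n * beta - a_i * gamma
    disc = max(b * b - 4.0 * a * c, 0.0)  # guard against round-off
    root = (-b + math.sqrt(disc)) / (2.0 * a)
    return min(max(root, 0.0), p_max)      # clipping [.]_0^{p_max}


def max_rate_allocation(a_n, a_i, beta, gamma, p_max, total_power, iters=200):
    """Allocate power over flattened slots; total_power plays the role of M * P_T."""
    slots = list(zip(a_n, a_i, beta, gamma, p_max))

    def allocate(lam):
        return [power_for_multiplier(lam, *s) for s in slots]

    # If the masks alone fit in the budget, the sum constraint is inactive.
    if sum(p_max) <= total_power:
        return list(p_max)

    lo = 1e-12
    # For lam >= a_n*beta + a_i*gamma the root is non-positive, so power is 0.
    hi = max(an * b + ai * g for an, ai, b, g, _ in slots)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(allocate(mid)) > total_power:
            lo = mid   # too much power: increase the multiplier
        else:
            hi = mid
    return allocate(hi)
```

When every slot is known to be idle (β = 1, γ = 0), the larger root of the quadratic reduces to 1/λ − 1/a_n(k), which is the classical water-filling level, in line with the limiting behavior described above.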

Min-power optimization strategy

Since one of the most critical issues in femtocells is interference management, an alternative optimization procedure consists in minimizing the FAP transmit power, under the constraint of guaranteeing the required rate over the link between the FAP and the associated FUE. This strategy was proposed, for example, in [15] assuming a static interference. Here, we generalize that approach to the case where the interference is dynamic and its activity evolves as a Markov chain, as described in the previous section. The objective now is to minimize the average transmit power across the N subchannels and over M consecutive time slots. Denoting with m₀ the index of the time slot where the interference power profile is measured, the goal is to allocate power over a set of consecutive slots, starting from m₀, i.e., for m = m₀,…,m₀ + M−1, under the constraint that the expected rate, conditioned on the observation of the initial L time slots, i.e., for m = m₀−L + 1,…,m₀, must not be smaller than a given value R₀.

The optimization problem can then be formulated as

\begin{array}{ll} \min_{p} & \frac{1}{M} \sum_{k = 1}^{N} \sum_{m = m_{0}}^{m_{0} + M - 1} p_{k, m} \\ \text{subject to} & \bar{R} (p) \geq R_{0} \\ & 0 \leq p_{k, m} \leq p^{\max} (k), \quad \forall k = 1, \dots, N, \; m = 1, \dots, M \end{array}

(11)

where the expected rate is computed as in (8).

The minimization problem in (11) is a convex optimization problem, since the objective function is a linear (hence convex) function of p and the feasible set is convex. The solution can be written in closed form by exploiting the KKT conditions and following the same steps as in Appendix 2. The result is

p_{k, m}^{*} = \left[ \frac{- b (k, m) + \sqrt{b (k, m)^{2} - 4 a (k, m) c (k, m)}}{2 a (k, m)} \right]_{0}^{p^{\max} (k)} \quad \forall k = 1, \dots, N, \; m = 1, \dots, M

(12)

with

\begin{array}{l} a (k, m) = a_{n} (k) a_{I} (k, m) \\ b (k, m) = a_{n} (k) + a_{I} (k, m) - λ a_{n} (k) a_{I} (k, m) \\ c (k, m) = 1 - λ [a_{n} (k) β_{k, m} + a_{I} (k, m) γ_{k, m}] \end{array}

(13)

where the Lagrange multiplier λ is found numerically in order to satisfy the rate constraint $\bar{R} (p^{*}) = R_{0}$ .
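A minimal sketch of the min-power allocation in (12) and (13) is given below, under the same illustrative conventions as before: hypothetical function names, slots flattened into lists, natural logarithms, and a bisection search on λ (one possible numerical method, exploiting the fact that the allocated power is non-decreasing in λ). Returning `None` when the mask constraints cannot support the target rate is an assumption of this sketch, standing in for the empty feasible set.

```python
import math


def expected_rate(p, a_n, a_i, beta, gamma, num_slots):
    """Expected rate as in Equation (8), averaged over num_slots time slots."""
    total = sum(b * math.log1p(x * an) + g * math.log1p(x * ai)
                for x, an, ai, b, g in zip(p, a_n, a_i, beta, gamma))
    return total / num_slots


def min_power_slot(lam, a_n, a_i, beta, gamma, p_max):
    """Per-slot power from Equations (12)-(13) for a given multiplier lam."""
    a = a_n * a_i
    b = a_n + a_i - lam * a_n * a_i
    c = 1.0 - lam * (a_n * beta + a_i * gamma)
    root = (-b + math.sqrt(max(b * b - 4.0 * a * c, 0.0))) / (2.0 * a)
    return min(max(root, 0.0), p_max)


def min_power_allocation(a_n, a_i, beta, gamma, p_max, rate_target,
                         num_slots, iters=200):
    """Smallest-power allocation meeting rate_target, or None if infeasible."""
    slots = list(zip(a_n, a_i, beta, gamma, p_max))

    def allocate(lam):
        return [min_power_slot(lam, *s) for s in slots]

    def rate(lam):
        return expected_rate(allocate(lam), a_n, a_i, beta, gamma, num_slots)

    # Even full-mask transmission cannot reach the target: empty feasible set.
    if expected_rate(p_max, a_n, a_i, beta, gamma, num_slots) < rate_target:
        return None

    lo, hi = 0.0, 1.0
    while rate(hi) < rate_target:   # grow the bracket until the target is met
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rate(mid) < rate_target:
            lo = mid
        else:
            hi = mid
    return allocate(hi)
```

A `None` result corresponds to the situation in which the rate requirement cannot be accommodated and a decision has to be taken outside the allocation routine.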

One important difference between the min-power and the max-rate problems is that, in the min-power case, the feasible set could be empty. If this happens, it means that the rate requirement is too high to be accommodated. Hence, either the rate requirement is lowered until the feasible set becomes non-empty, or the user is not admitted. This protocol is handled by the call admission control.

Multi-FAP case: maximum expected rate game

In a scenario containing multiple nearby FAP’s implementing the radio access according to the adaptive strategy described in the previous section, each FAP may react to the power allocation of nearby FAP’s, by changing its own power allocation and so on. This interaction induces an iterative mechanism whose convergence properties have to be carefully studied. The problem can be studied using the theoretical tools of game theory, which is well suited for this kind of multi-objective decision problem. In particular, given the existence of a wired backhaul connecting the FAP’s, we will consider a purely competitive game, where each FAP (player) seeks to optimize its own utility function, irrespective of the other FAP’s performance, and a coordination game, where nearby FAP’s exchange some parameters to improve performance with respect to the purely competitive case.

Denoting again with the binary variable $S_{k, m}$ the macro-interference activity over channel k at time m, and with $\Pr (S_{k, m} = 0) = β_{k, m}$ and $\Pr (S_{k, m} = 1) = γ_{k, m}$ the probabilities that channel k is idle or busy at time m, the expected rate of the qth FAP is^a

{\bar{R}}_{q} (p_{q}, p_{- q}) = \sum_{m = m_{0}}^{m_{0} + M - 1} \sum_{k = 1}^{N} \left[ β_{k, m} \log (1 + p_{k, m}^{q} a_{n}^{q} (k, m)) + γ_{k, m} \log (1 + p_{k, m}^{q} a_{I}^{q} (k, m)) \right],

(14)

with q = 1,…,Q, and

a_{n}^{q} (k, m) = \frac{| H_{k}^{qq} |^{2}}{σ_{n, q}^{2} (k) + \sum_{r \in N_{q}} p_{k, m}^{r} | H_{k}^{rq} |^{2}}, \qquad a_{I}^{q} (k, m) = \frac{| H_{k}^{qq} |^{2}}{σ_{n, q}^{2} (k) + \sum_{r \in N_{q}} p_{k, m}^{r} | H_{k}^{rq} |^{2} + σ_{I^{q}}^{2} (k, m)}

(15)

where $N_{q}$ is the set of neighbors of the q th FAP and $H_{k}^{rq}$ is the channel transfer function of the k th subchannel between the r th transmitter and the q th receiver. The probabilities β_k,m and γ_k,m evolve in time according to a Markov chain of order L. We assume, as in the previous section that the allocation over M consecutive time slots is carried out on the basis of the observation of a number of initial time slots equal to the order of the Markov chain, i.e., L.

It is worth noticing that the major difference between the expected rate in (14) and that in (8) is that the interference is now composed of two contributions: the dynamic interference of the macro-users, whose activities evolve as Markov chains but whose power profile, when on, is fixed; and the interference from the other FAPs, whose activity is always on but whose power profile, described by the vectors $p_{k, m}^{r}$, evolves as a response to the choices of the other FAPs.

Denoting by Ω = {1,…,Q} the set of Q players, with P_q the maximum transmit power over a frame and with $p_{q}^{max} (k)$ the mask constraint over each subcarrier, the problem can be cast as a game, i.e.,

G_{1} : \quad \begin{array}{ll} \max_{p_{q}} & {\bar{R}}_{q} (p_{q}, p_{- q}) \\ \text{subject to} & p_{q} \in P_{q} \end{array} \qquad \forall q \in Ω

(16)

where the feasible set of FAP q is

P_{q} = \left\{ p_{q} \in \mathbb{R}^{NM \times 1} : \sum_{m = m_{0}}^{m_{0} + M - 1} \sum_{k = 1}^{N} p_{k, m}^{q} \leq P_{q}, \; 0 \leq p_{k, m}^{q} \leq p_{q}^{\max} (k), \; \forall k \in \{1, \dots, N\}, \; m \in \{m_{0}, \dots, m_{0} + M - 1\} \right\} .

(17)

Since the objective function in (16) is strictly concave in $p_{q} \in P_{q}$ , for any given p_−q, and the feasible set $P_{q}$ is compact and convex, game $G_{1}$ admits a non-empty solution set for any set of channels and transmit power constraints of the users. In[16], we reformulated this game as a Variational Inequality[17] and we applied the iterative gradient projection algorithm to solve it, deriving sufficient conditions for its convergence to a Nash Equilibrium (NE).

Game $G_{1}$ may possess multiple equilibria, which may not be Pareto-efficient.^b To improve upon the performance of the NE of purely competitive game $G_{1}$ , we can modify the utility function of each user in order to induce the players to incorporate a social utility function, rather than being purely selfish. For example, in[18, 19] it has been proposed to modify the utility function of each player so as to maximize the sum of all users’ rates. In principle, this change should require a centralized solution. Nevertheless, Huang et al.[18] showed that the solution of the sum-rate game can be still achieved in decentralized form, provided that the players exchange some parameters, the so-called prices. These parameters induce a penalty on each player utility proportional to the rate decrease that each player strategy induces on the other players. Introducing pricing mechanisms in femtocell networks is possible, thanks to the existence of the backhaul link, which allows the exchange of prices among FAP’s. Furthermore, we will show next that every FAP needs to exchange pricing coefficients only with its neighbors, thus keeping the amount of extra signaling limited.

Generalizing the approach proposed in[18] to our Bayesian formulation, we introduce the price coefficient:

Π_{k, m}^{r} : = - \frac{\partial {\bar{R}}_{r} (p)}{\partial I_{k, m}^{r} (p_{- r})}

(18)

with $I_{k, m}^{r} (p_{- r}) : = \sum_{i \in N_{r}} p_{k, m}^{i} | H_{k}^{ir} |^{2}$. These coefficients are proportional to the marginal decrease of user r’s expected rate resulting from an increase of the qth node’s transmit power, as $\frac{\partial {\bar{R}}_{r} (p)}{\partial p_{k, m}^{q}} = - Π_{k, m}^{r} \frac{\partial I_{k, m}^{r}}{\partial p_{k, m}^{q}} = - Π_{k, m}^{r} | H_{k}^{qr} |^{2}$. The incorporation of the pricing mechanism leads to the new game^c:
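To make the price in (18) concrete, the sketch below evaluates it for a single time-frequency slot by differentiating the per-slot term of the expected rate (14) with respect to the aggregate interference. The closed-form expression in `price` is our own derivation from (14) and (15), not a formula quoted from the article, and all function and parameter names (`gain` for |H|², `noise`, `interference`, `sigma_i2`) are hypothetical.

```python
import math


def slot_rate(p, gain, noise, interference, sigma_i2, beta, gamma):
    """Per-slot term of the expected rate (14) for one FAP.

    interference is the aggregate power received from neighboring FAPs,
    sigma_i2 the macro interference power when the subchannel is busy.
    """
    a_n = gain / (noise + interference)
    a_i = gain / (noise + interference + sigma_i2)
    return beta * math.log1p(p * a_n) + gamma * math.log1p(p * a_i)


def price(p, gain, noise, interference, sigma_i2, beta, gamma):
    """Price (18): minus the derivative of slot_rate w.r.t. interference.

    Uses d/dI log(1 + s / (d + I)) = -s / ((d + I) * (d + I + s)),
    with s = p * gain the received useful power.
    """
    s = p * gain
    d0 = noise + interference           # idle macro user
    d1 = d0 + sigma_i2                  # busy macro user
    return beta * s / (d0 * (d0 + s)) + gamma * s / (d1 * (d1 + s))
```

The price is non-negative and vanishes when the FAP transmits no power on the slot, so an idle FAP imposes no penalty on its neighbors for that slot.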

G_{2} : \quad \begin{array}{ll} \max_{p_{q}} & {\bar{R}}_{q} (p_{q}, p_{- q}) - \sum_{m = m_{0}}^{m_{0} + M - 1} \sum_{k = 1}^{N} p_{k, m}^{q} \left( \sum_{r \in N_{q}} Π_{k, m}^{r} | H_{k}^{qr} |^{2} \right) \\ \text{s.t.} & p_{q} \in P_{q} . \end{array}

(19)

Each local problem is convex, hence the KKT conditions lead to power coefficients $p_{k, m}^{q}$ of each FAP that, within the interval $[0, p_{q}^{max} (k))$ , must satisfy the following equation

{\tilde{a}}^{q} (k, m) {(p_{k, m}^{q})}^{2} + {\tilde{b}}^{q} (k, m) p_{k, m}^{q} + {\tilde{c}}^{q} (k, m) = 0

(20)

where, denoting with ν_q the Lagrangian multiplier, we have set

\begin{array}{l} {\tilde{a}}^{q} (k, m) = \left( ν_{q} + \sum_{r \in N_{q}} Π_{k, m}^{r} | H_{k}^{qr} |^{2} \right) a_{n}^{q} (k, m) a_{I}^{q} (k, m) \\ {\tilde{b}}^{q} (k, m) = \left( ν_{q} + \sum_{r \in N_{q}} Π_{k, m}^{r} | H_{k}^{qr} |^{2} \right) (a_{n}^{q} (k, m) + a_{I}^{q} (k, m)) - a_{n}^{q} (k, m) a_{I}^{q} (k, m) \\ {\tilde{c}}^{q} (k, m) = ν_{q} + \sum_{r \in N_{q}} Π_{k, m}^{r} | H_{k}^{qr} |^{2} - (a_{n}^{q} (k, m) β_{k, m} + a_{I}^{q} (k, m) γ_{k, m}) . \end{array}

(21)

We can verify that, ∀ ν_q > 0, we have ${\tilde{b}}^{q} {(k, m)}^{2} - 4 {\tilde{a}}^{q} (k, m) {\tilde{c}}^{q} (k, m) \geq 0$, and the only admissible root is

{\tilde{p}}_{k, m}^{b} = \frac{- {\tilde{b}}^{q} (k, m) + \sqrt{{\tilde{b}}^{q} {(k, m)}^{2} - 4 {\tilde{a}}^{q} (k, m) {\tilde{c}}^{q} (k, m)}}{2 {\tilde{a}}^{q} (k, m)} .

(22)

More specifically, we get

p_{k, m}^{q} = \begin{cases} 0 & \text{if } ν_{q} + \sum_{r \in N_{q}} Π_{k, m}^{r} | H_{k}^{qr} |^{2} \geq a_{n}^{q} (k, m) β_{k, m} + a_{I}^{q} (k, m) γ_{k, m} \\ {\tilde{p}}_{k, m}^{b} & \text{if } ν_{q} + \sum_{r \in N_{q}} Π_{k, m}^{r} | H_{k}^{qr} |^{2} < a_{n}^{q} (k, m) β_{k, m} + a_{I}^{q} (k, m) γ_{k, m} \end{cases}

(23)

and the optimal power allocation vector is $p_{k, m}^{q *} = {[{\tilde{p}}_{k, m}^{b}]}_{0}^{p_{q}^{max} (k)}$ where the multiplier ν_q is chosen in order to satisfy the constraint $\sum_{k = 1}^{N} \sum_{m = m_{0}}^{m_{0} + M - 1} {[{\tilde{p}}_{k, m}^{b}]}_{0}^{p_{q}^{max} (k)} = P_{q}$ . The previous solution assumes, for each player, that the powers used by the other players are given.

In practice, the game evolves with each FAP reacting to the choices of the other FAPs. It is then fundamental to prove the convergence of this iterative mechanism. In the following, we present a version of the so-called Modified Asynchronous Distributed Pricing algorithm (MADP) proposed in[19], adapted to our formulation. To this purpose, it is useful to rewrite (19) introducing a unique index h so that the entries of the power vector p_q are $p_{h}^{q}$ for h = 1,…,NM. Then, defining the quantities

\begin{array}{l} \mathrm{SNIR}_{h}^{β^{q}} : = \frac{p_{h}^{q} {|H_{h}^{qq}|}^{2}}{σ_{n, q}^{2} (h) + \sum_{r \in N_{q}} p_{h}^{r} {|H_{h}^{rq}|}^{2}}, \\ \mathrm{SNIR}_{h}^{γ^{q}} : = \frac{p_{h}^{q} {|H_{h}^{qq}|}^{2}}{σ_{n, q}^{2} (h) + \sum_{r \in N_{q}} p_{h}^{r} {|H_{h}^{rq}|}^{2} + σ_{I_{q}}^{2} (h)} \end{array}

(24)

we can derive the q th user best response as

p_{h}^{q^{*}} = \frac{2 c_{h}^{q}}{\sum_{r \in N_{q}} Π_{h}^{r} {|H_{h}^{qr}|}^{2} + ν_{q} - η_{h}^{q}} - p_{h}^{q} \quad \forall h = 1, \dots, MN,

(25)

where $c_{h}^{q} = \frac{β_{h}^{q} {SNIR}_{h}^{β^{q}}}{1 + {SNIR}_{h}^{β^{q}}} + \frac{γ_{h}^{q} {SNIR}_{h}^{γ^{q}}}{1 + {SNIR}_{h}^{γ^{q}}}$ and ν_q and $η_{h}^{q}$ are the Lagrangian multipliers. Given this setting, the modified MADP algorithm is illustrated below.

Algorithm 1 MADP algorithm

Each FAP performs its allocation over M consecutive time slots from m₀ to m₀ + M − 1.

Before performing its allocation, each FAP has to observe q_s samples, from m₀ − q_s + 1 to m₀, in order to estimate the transition probabilities of the underlying Markov chain.

S.0: Each FAP q chooses an initial power profile in the set $P_{q}$ and sets n = 0;

S.1: Each FAP computes its interference prices $Π_{h}^{q} (n) | H_{h}^{iq} |^{2}$ for h = 1,…,MN and sends them to its neighbors with index $i \in N_{q}$ ;

S.2: At each time n, each FAP updates its power profile so as to maximize its utility function ${\bar{R}}_{q}$ , given the other FAPs’ power profiles p_−q and price vectors, according to $p_{h}^{q} (n + 1) = p_{h}^{q} (n) + α_{q} (n) (p_{h}^{q *} - p_{h}^{q} (n))$ for h = 1,…,MN, where $p_{h}^{q *}$ is given by (25);

S.3: Set n = n + 1, go to step S.1 and repeat until convergence is reached.

Following arguments similar to those in [19], we prove in Appendix 3 that there exist sufficiently small step sizes α_q(n) for which the MADP algorithm converges monotonically to a fixed point.
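The relaxed update of step S.2 can be sketched in a few lines of Python. This is only an illustration of the iteration p(n + 1) = p(n) + α (p* − p(n)): the function names are hypothetical, a constant step size is assumed, and the best response is replaced by a toy affine map rather than the expression in (25), so the price exchange of step S.1 is abstracted away.

```python
import numpy as np

def relaxed_best_response(p0, best_response, alpha=0.5, tol=1e-9, max_iter=1000):
    """Iterate p(n+1) = p(n) + alpha * (p* - p(n)) until the update is below tol."""
    p = np.asarray(p0, dtype=float)
    for n in range(max_iter):
        p_star = best_response(p)            # plays the role of (25)
        p_next = p + alpha * (p_star - p)    # relaxation step S.2
        if np.max(np.abs(p_next - p)) < tol:
            return p_next, n + 1
        p = p_next
    return p, max_iter

# Toy stand-in for the best response (NOT the paper's (25)): an affine
# contraction whose fixed point is p = 2 in every entry.
def toy_best_response(p):
    return 0.5 * p + 1.0

p_fix, iters = relaxed_best_response(np.zeros(4), toy_best_response)
```

With a smaller α the iteration takes more steps but moves more cautiously, which is the trade-off behind the step-size condition derived in Appendix 3.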

Multi-FAP case: min-power game

As with the max-rate game, let us consider now the generalization of the min-power algorithm to the multi-FAP case. The utility of each player is now the total transmit power over the N subchannels and the M time slots

u_{q} (p_{q}) = \sum_{k = 1}^{N} \sum_{m = m_{0}}^{m_{0} + M - 1} p_{k, m}^{q}

(26)

and the constraint is that the expected rate for each FAP, conditioned to the initial observations, be not smaller than a given value $R_{q}^{0}$ . The feasible set is now

\begin{array}{l} {\tilde{F}}_{q} (p_{- q}) = \{p_{q} \in R^{NM \times 1} : {\bar{R}}_{q} (p_{q}, p_{- q}) \geq R_{q}^{0}, 0 \leq p_{k, m}^{q} \\ \leq p_{q}^{max} (k), \forall k = 1, \dots, N, m = m_{0}, \dots, m_{0} \\ + M - 1\} \end{array}

(27)

and the game is

{\tilde{G}}_{1} = \{Ω, \{{\tilde{F}}_{q} (p_{- q})\}_{q \in Ω}, \{u_{q} (p_{q})\}_{q \in Ω}\} .

(28)

The optimal strategy for each player amounts to solving the following optimization problem

\begin{array}{l} ({\tilde{P}}_{1}) \begin{array}{l} min_{p_{q}} & u_{q} (p_{q}) \\ subject to & p_{q} \in {\tilde{F}}_{q} (p_{- q}) \end{array} & \forall q \in Ω . \end{array}

(29)

The minimization problem in (29) for each player q, given the strategies of the others, is a convex optimization problem, since the objective function is a linear (then convex) function of p_q and the set ${\tilde{F}}_{q} (p_{- q})$ , given the power vector p_−q of the other players, is a convex set. Imposing the KKT conditions, as in the single FAP case, the solution can be expressed in closed form as $p_{q}^{*} = g (p_{- q})$ whose entries are (see Appendix 4)

\begin{array}{l} p_{k, m}^{q *} = {[g (p_{- q})]}_{k, m} \\ = {[\frac{- b^{q} (k, m) + \sqrt{b^{q} {(k, m)}^{2} - 4 a^{q} (k, m) c^{q} (k, m)}}{2 a^{q} (k, m)}]}_{0}^{p_{q}^{max} (k)}, \quad \forall k = 1, \dots, N, m = 1, \dots, M \end{array}

(30)

with

\begin{align} a^{q} (k, m) = a_{n}^{q} (k, m) a_{I}^{q} (k, m) \\ b^{q} (k, m) = a_{n}^{q} (k, m) + a_{I}^{q} (k, m) - λ_{q} a_{n}^{q} (k, m) a_{I}^{q} (k, m) \\ c^{q} (k, m) = 1 - λ_{q} [a_{n}^{q} (k, m) β_{k, m} + a_{I}^{q} (k, m) γ_{k, m}] \end{align}

(31)

where the Lagrange multiplier λ_q is chosen in order to satisfy the rate constraint ${\bar{R}}_{q} (p_{q}^{*}, p_{- q}) = R_{q}^{0}$ . However, now the overall feasible set is not jointly convex with respect to the power vectors of all the users, i.e., it is not convex in $p = {(p_{q})}_{q = 1}^{Q}$ . This makes the study of this game much harder than the standard NE problem. Nevertheless, game ${\tilde{G}}_{1}$ is a Generalized Potential Game (GPG)[20], with a potential Φ equal to the sum power. In such a case, the existence of a NE of the potential game can be proved directly by the existence of a maximum of the potential function Φ on the set $\tilde{X}$ of the game. To exploit the theory of GPG, we must prove that game ${\tilde{G}}_{1}$ admits a non-empty feasible set. The proof of this result is given in Appendix 5, containing the sufficient conditions under which the feasible set of the game ${\tilde{G}}_{1}$ , i.e.,

\begin{array}{l} \tilde{X} = \{p \in R^{NMQ} : {\bar{R}}_{q} (p) \geq R_{q}^{0}, 0 \leq p_{k, m}^{q} \leq p_{q}^{max} (k), \\ \forall k, m, \forall q \in Ω\} \end{array}

(32)

is compact and non-empty. Hence, game ${\tilde{G}}_{1} = \{Ω, \{{\tilde{F}}_{q} (p_{- q})\}_{q \in Ω}, \{u_{q} (p_{q})\}_{q \in Ω}\}$ is a GPG whose potential function Φ(p) is the sum of the objective functions of all players, i.e., $Φ (p) = \sum_{q = 1}^{Q} u_{q} (p_{q})$ .

Nevertheless, being a potential game does not guarantee the equilibrium to be efficient. Hence, as with the max-rate game, efficiency can be improved by introducing pricing. The introduction of pricing leads to a modified game which can be cast as

\begin{array}{l} {\tilde{G}}_{2} : & \begin{array}{l} min_{p_{q}} & u_{q} (p_{q}) + \sum_{m = m_{0}}^{m_{0} + M - 1} \sum_{k = 1}^{N} (\sum_{s \in N_{q}} λ_{s} Π_{k, m}^{s} | H_{k}^{qs} |^{2}) \\ \times p_{k, m}^{q} \forall q \in Ω \\ subject to & p_{q} \in {\tilde{F}}_{q} (p_{- q}) \end{array} \end{array}

(33)

where λ_s is the Lagrangian multiplier of user s relative to the rate constraint. The pricing coefficients $Π_{k, m}^{s}$ are defined in the same way as in the max-rate case.

Numerical results

In this section, we present numerical results to assess the performance of the algorithms proposed in the previous sections. Let us start with the single-FAP case. In all the simulations, we have considered Rayleigh fading frequency-selective channels with 4 resolvable paths, each with unit variance. The number of subcarriers N is set to 12, as in an LTE Physical Resource Block (PRB). In Figure 2 we show the average rate per OFDM symbol as a function of the allocated time slots obtained in the max-rate problem. The simulation results have been averaged over 100 independent channel and Markov chain realizations. The different curves indicate the rate obtained by assuming different kinds of knowledge of the interference: the green curve assumes perfect knowledge of the future evolution of the macro-user activity and is used as a benchmark; the pink curve assumes that the interference power level over each subchannel is equal to the value observed in the first time slot; all other curves refer to the proposed algorithm, where we observe the channels in the first slot and allocate power using our proposed method. These curves refer to different Markov orders (from L = 1 to L = 3). We also compare the case where the transition probabilities are perfectly known with the case where the probabilities are estimated from the data. The interesting behavior is that, as the order increases, our approach gets closer to the ideal case where the interference activity is non-causally known. The price to be paid is the loss of performance resulting from the estimation of the Markov parameters. Further developments could incorporate some kind of reinforcement learning to allocate resources without necessarily estimating the transition probabilities, as proposed in [12], for example.

In Figure 3, we report the optimal rate of our algorithm versus the number of time slots, for different Markov orders, L = 3 in the upper subplot and L = 1 in the lower. We have considered different numbers of samples q_s used to estimate the transition probabilities. Of course, the higher the Markov order, the greater the number of parameters to estimate. In fact, we can notice that, for the same number of samples q_s used for the estimate, the performance loss with respect to the ideal case of perfect knowledge of the macro-user activity (curve with red square markers) is much higher for L = 3 than for L = 1. The aim of Figure 4 is to show the cumulative rate versus the number of slots m₀ used for the recursive estimation of the transition probabilities, assuming M = 3 and modeling the macro-user activity as a first-order Markov chain. We observe that, after fewer than 50 time slots, the performance gets very close to the asymptotic case. Finally, in Figure 5, we show the performance of the min-power allocation strategy. The utility function is the SNR [dB] at the FUE receiver obtained for different numbers of allocated time slots. As in the max-rate case, the advantage of increasing the order of statistical knowledge (from L = 1 to L = 3) is evident, and a gain of about 4 dB can be observed with respect to the case where no knowledge about the macro-user activity is assumed.

Finally, we provide some numerical examples to assess the performance of the proposed approaches (max-rate and min-power) in the multiuser case. The reference scenario is composed of one MBS and Q = 10 FAPs randomly distributed over a square area. The MBS activity is modeled as a third-order Markov chain and the results have been averaged over 50 independent Markov chain realizations. In Figure 6, we report the users’ sum-rate versus the iteration index for the maximum expected rate game $G_{1}$ in order to test the convergence of the algorithm. It can be observed that it converges in a few iterations. In Figures 7 and 8, we depict the FAPs’ sum-rate versus the number of allocated time slots. In particular, Figure 7 refers to the purely competitive maximum expected rate game $G_{1}$ , while Figure 8 refers to the modified pricing game $G_{2}$ . In both cases, we assumed the same maximum transmit power per FAP. The three curves in each figure indicate the sum-rate obtained by assuming perfect (non-causal) knowledge of the macro-user activity, no knowledge at all (thus assuming the interference to remain equal to the values observed in the first slots of each frame), or only knowledge of the Markov parameters. Both figures show that acquiring a statistical knowledge (estimation) of the interference activity parameters (Markov transition rates) yields a performance advantage over the case with no information and brings the performance close to the ideal case of perfect non-causal knowledge of the interference activity. Of course, as time evolves, there is a mismatch between what is predicted and the real interference, so that the performance improvement tends to decrease in time. Furthermore, comparing Figures 7 and 8, the gain achieved with the introduction of pricing is evident.

Considering the same scenario, in Figures 9 and 10 we report the simulation results corresponding to our proposed minimum power games, ${\tilde{G}}_{1}$ and ${\tilde{G}}_{2}$ . Figure 9 refers to the min-power game with no pricing, while Figure 10 refers to the game including pricing. The curves show the average SNR per FUE as a function of the number of allocated time slots. The expected target rate $R_{q}^{0}$ in both cases is set to 3 bps for each user. From both Figures 9 and 10 we can verify that, also in this case, the simple statistical knowledge of the transition rates yields performance close to the ideal case where the interference activity is non-causally known. Observe that the curve referring to the non-causal knowledge tends to have zero slope asymptotically, while the statistical knowledge curve presents a performance gain which tends to decrease as time evolves, due to the mismatch between what is predicted and the real interference. In both Figures 9 and 10, the advantage of the statistical approach with respect to the case where there is no knowledge is considerable, and comparing the two figures makes evident the performance gain due to the introduction of a pricing mechanism.

Conclusion

In conclusion, in this article we have shown how the estimation of the interference statistical parameters can be beneficial to improve the performance of a power allocation technique, provided that the statistical model fits the real data. In this study, we assume that the transition probabilities are estimated from the data. An interesting future direction consists in incorporating methods which do not really require such an estimation, but acquire the proper behavior through reinforcement learning. The interesting part of our method is that, for a given estimation of the transition probabilities, the power allocation across the set of time slots/frequency channels is found in closed form. This is indeed useful to save convergence time with respect to gradient-based techniques.

The other important contribution of this article is the decentralized approach for resource allocation based on game theory, in the case where the interference is dynamically varying. Our Bayesian formulation of the game provides interesting results in such an uncertain environment. Finally, the introduction of coordination among FAP’s based on the exchange of a few parameters (prices) through the backhaul link has been shown to provide significant performance gains with respect to the purely competitive game.

Further developments should incorporate a robust approach for the situation where the interference statistical model is not known or the statistical parameters are time-varying. One more critical aspect is the availability of the backhaul link for the exchange of prices. Since such a link is affected by random delays, it may be useful to incorporate robust mechanisms to cope with the situation where the price does not arrive within a maximum tolerable delay.

Appendix 1

Extending the Markov model used in (5) to higher-order chains, we can write the time evolution of the occupancy probabilities conditioned to the observations of the channel state at the first L time slots as

π^{(k, m - L + 1, \dots, m)} = P_{L}^{(k)} π^{(k, m - L, m - L + 1, \dots, m - 1)}

(34)

for $m = L + 1, \dots, \infty$ and given the initial state π^(k,1,…,L). More specifically, the entries of the 2^L-dimensional vector π^{(k,m−L + 1,…,m)} are defined as

\begin{array}{l} Π_{ij}^{(k, m - 1, m)} = \Pr (S_{k, m - 1} = i, S_{k, m} = j) \quad \text{for } L = 2, \\ \forall (i, j) \in \{0, 1\}^{2} \end{array}

(35)

\begin{array}{l} Π_{rij}^{(k, m - 2, m - 1, m)} = \Pr (S_{k, m - 2} = r, S_{k, m - 1} = i, S_{k, m} = j) \\ \text{for } L = 3, \forall (r, i, j) \in \{0, 1\}^{3} \end{array}

(36)

while the transition matrices $P_{L}^{(k)}$ are expressed, for L = 2, as

P_{2}^{(k)} = (\begin{array}{l} ω_{2}^{(k)} & 0 & θ_{2}^{(k)} & 0 \\ 1 - ω_{2}^{(k)} & 0 & 1 - θ_{2}^{(k)} & 0 \\ 0 & ν_{2}^{(k)} & 0 & 1 - μ_{2}^{(k)} \\ 0 & 1 - ν_{2}^{(k)} & 0 & μ_{2}^{(k)} \end{array})

(37)

with $ω_{2}^{(k)} = p_{000}^{(k)}$ , $θ_{2}^{(k)} = p_{100}^{(k)}$ , $ν_{2}^{(k)} = p_{010}^{(k)}$ , $μ_{2}^{(k)} = p_{111}^{(k)}$ and for L = 3 as

\begin{array}{l} P_{3}^{(k)} = \\ (\begin{array}{l} ω_{3}^{(k)} & 0 & 0 & 0 & λ_{3}^{(k)} & 0 & 0 & 0 \\ 1 - ω_{3}^{(k)} & 0 & 0 & 0 & 1 - λ_{3}^{(k)} & 0 & 0 & 0 \\ 0 & ν_{3}^{(k)} & 0 & 0 & 0 & ψ_{3}^{(k)} & 0 & 0 \\ 0 & 1 - ν_{3}^{(k)} & 0 & 0 & 0 & 1 - ψ_{3}^{(k)} & 0 & 0 \\ 0 & 0 & η_{3}^{(k)} & 0 & 0 & 0 & θ_{3}^{(k)} & 0 \\ 0 & 0 & 1 - η_{3}^{(k)} & 0 & 0 & 0 & 1 - θ_{3}^{(k)} & 0 \\ 0 & 0 & 0 & 1 - γ_{3}^{(k)} & 0 & 0 & 0 & 1 - μ_{3}^{(k)} \\ 0 & 0 & 0 & γ_{3}^{(k)} & 0 & 0 & 0 & μ_{3}^{(k)} \end{array}) \end{array}

(38)

where $ω_{3}^{(k)} = p_{0000}^{(k)}$ , $λ_{3}^{(k)} = p_{1000}^{(k)}$ , $ν_{3}^{(k)} = p_{0010}^{(k)}$ , $ψ_{3}^{(k)} = p_{1010}^{(k)}$ , $η_{3}^{(k)} = p_{0100}^{(k)}$ , $θ_{3}^{(k)} = p_{1100}^{(k)}$ , $γ_{3}^{(k)} = p_{0111}^{(k)}$ , $μ_{3}^{(k)} = p_{1111}^{(k)}$ . Hence, writing for simplicity of notation $Π_{ij}^{(k, m - 2, m - 1)} = Π_{ij}^{(k)}$ and $Π_{rij}^{(k, m - 3, m - 2, m - 1)} = Π_{rij}^{(k)}$ , the probabilities β_k,m and γ_k,m that the k th subchannel is, respectively, idle or busy at time m are given, for L = 2 and L = 3, by

\begin{array}{l} β_{k, m} = ω_{2}^{(k)} Π_{00}^{(k)} + θ_{2}^{(k)} Π_{10}^{(k)} + ν_{2}^{(k)} Π_{01}^{(k)} + (1 - μ_{2}^{(k)}) Π_{11}^{(k)} \\ γ_{k, m} = (1 - ω_{2}^{(k)}) Π_{00}^{(k)} + (1 - θ_{2}^{(k)}) Π_{10}^{(k)} + (1 - ν_{2}^{(k)}) Π_{01}^{(k)} \\ + μ_{2}^{(k)} Π_{11}^{(k)} \end{array}

(39)

and

\begin{array}{l} β_{k, m} = ω_{3}^{(k)} Π_{000}^{(k)} + λ_{3}^{(k)} Π_{100}^{(k)} + ν_{3}^{(k)} Π_{001}^{(k)} + ψ_{3}^{(k)} Π_{101}^{(k)} \\ + η_{3}^{(k)} Π_{010}^{(k)} + θ_{3}^{(k)} Π_{110}^{(k)} + (1 - γ_{3}^{(k)}) Π_{011}^{(k)} \\ + (1 - μ_{3}^{(k)}) Π_{111}^{(k)} \\ γ_{k, m} = (1 - ω_{3}^{(k)}) Π_{000}^{(k)} + (1 - λ_{3}^{(k)}) Π_{100}^{(k)} + (1 - ν_{3}^{(k)}) Π_{001}^{(k)} \\ + (1 - ψ_{3}^{(k)}) Π_{101}^{(k)} + (1 - η_{3}^{(k)}) Π_{010}^{(k)} + (1 - θ_{3}^{(k)}) Π_{110}^{(k)} \\ + γ_{3}^{(k)} Π_{011}^{(k)} + μ_{3}^{(k)} Π_{111}^{(k)} . \end{array}

(40)
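As a small numerical illustration of (34), (37) and (39), the following Python sketch builds the second-order matrix and propagates the pair probabilities over a few slots. The state ordering (00, 01, 10, 11), the function names, and the parameter values are assumptions made for the example; they are not taken from the simulations reported in the paper.

```python
import numpy as np

def second_order_matrix(omega, theta, nu, mu):
    """Column-stochastic matrix of (37); states ordered (00, 01, 10, 11)."""
    return np.array([
        [omega,     0.0,    theta,     0.0   ],
        [1 - omega, 0.0,    1 - theta, 0.0   ],
        [0.0,       nu,     0.0,       1 - mu],
        [0.0,       1 - nu, 0.0,       mu    ],
    ])

def idle_busy_probabilities(P, pi0, n_steps):
    """Propagate pi via (34) and return (beta, gamma) for each predicted slot."""
    pi = np.asarray(pi0, dtype=float)
    beta, gamma = [], []
    for _ in range(n_steps):
        pi = P @ pi
        beta.append(pi[0] + pi[2])   # Pr(S_m = 0): pairs 00 and 10
        gamma.append(pi[1] + pi[3])  # Pr(S_m = 1): pairs 01 and 11
    return np.array(beta), np.array(gamma)

# Illustrative parameters only.
P2 = second_order_matrix(omega=0.9, theta=0.6, nu=0.3, mu=0.8)
pi0 = np.array([1.0, 0.0, 0.0, 0.0])   # the last two observed slots were both idle
beta, gamma = idle_busy_probabilities(P2, pi0, n_steps=3)
```

Summing the entries of the propagated vector whose last state is 0 gives the same value as the right-hand side of (39), so β and γ always add up to one.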

The transition probabilities of a Markov chain of arbitrary order can be estimated from the observed data using the maximum likelihood strategy, as suggested for example in[21, 22]. To simplify the description of the estimator we focus on a first-order Markov chain, but the extension to higher orders is straightforward. Let us assume that a set of m states, namely, s^m≡ s_k,1,…,s_k,m, are observed. The probability of observing a specific sequence of states is

\begin{array}{l} \Pr (S^{m} = s^{m}) = \prod_{l = 2}^{m} \Pr (S_{k, l} = s_{k, l} | S_{k, l - 1} = s_{k, l - 1}) \\ \times \Pr (S_{k, 1} = s_{k, 1}) . \end{array}

(41)

Let n_ij(l) denote the number of times the state i at time l − 1 switches to state j at time l. It has been proved in[21] that for a stationary Markov chain, the set $n_{ij} = \sum_{l = 2}^{m} n_{ij} (l)$ forms a set of sufficient statistics. Furthermore, the maximum likelihood estimator based on the set s^m is

{\hat{p}}_{ij} (m) = \frac{\sum_{l = 2}^{m} n_{ij} (l)}{\sum_{l = 2}^{m} [n_{i 0} (l) + n_{i 1} (l)]} .

(42)

Introducing the counter $N_{ij} (m) : = \sum_{l = 2}^{m} n_{ij} (l)$ , (42) can be rewritten in a recursive form as

{\hat{p}}_{ij} (m) = \frac{N_{ij} (m - 1) + n_{ij} (m)}{N_{i 0} (m - 1) + N_{i 1} (m - 1) + n_{i 0} (m) + n_{i 1} (m)} .

(43)
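A minimal Python sketch of the first-order estimator (42), written so that the counters N_ij are accumulated one slot at a time as in (43), could look as follows. The function name and the sample sequence are illustrative assumptions.

```python
import numpy as np

def estimate_transitions(states):
    """Return the 2x2 matrix of ML estimates p_hat[i, j] as in (42), plus the counts."""
    counts = np.zeros((2, 2))
    for prev, curr in zip(states[:-1], states[1:]):
        counts[prev, curr] += 1          # accumulates N_ij(m) one slot at a time
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows with no observed departure are left as NaN instead of guessing a value.
    with np.errstate(invalid="ignore", divide="ignore"):
        p_hat = np.where(row_totals > 0, counts / row_totals, np.nan)
    return p_hat, counts

# Illustrative idle (0) / busy (1) sequence observed on one subchannel.
observed = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0]
p_hat, counts = estimate_transitions(observed)
```

Each row of `p_hat` sums to one whenever the corresponding state has been visited at least once before the last slot.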

The extension of the estimator to higher-order Markov chains is straightforward. As shown in[21], the estimators for a second- and third-order Markov chain are, respectively,

{\hat{p}}_{ijk} (m) = \frac{\sum_{l = 3}^{m} n_{ijk} (l)}{\sum_{l = 3}^{m} n_{ij} (l)}

(44)

and

{\hat{p}}_{ijkr} (m) = \frac{\sum_{l = 4}^{m} n_{ijkr} (l)}{\sum_{l = 4}^{m} n_{ijk} (l)} .

(45)

Appendix 2

In order to solve the problem (9) let us consider the following Lagrangian function

\begin{array}{l} L (p) = \frac{1}{M} \sum_{k = 1}^{N} \sum_{m = m_{0}}^{m_{0} + M - 1} [β_{k, m} \log_{2} (1 + p_{k, m} a_{n} (k)) + γ_{k, m} \log_{2} (1 + p_{k, m} a_{I} (k, m))] \\ - λ (\frac{1}{M} \sum_{k = 1}^{N} \sum_{m = m_{0}}^{m_{0} + M - 1} p_{k, m} - P_{T}) + \sum_{k = 1}^{N} \sum_{m = m_{0}}^{m_{0} + M - 1} μ_{k, m} p_{k, m} \\ - \sum_{k = 1}^{N} \sum_{m = m_{0}}^{m_{0} + M - 1} ν_{k, m} (p_{k, m} - p^{max} (k)) \end{array}

(46)

where λ, μ_k,m, ν_k,m are the Lagrangian multipliers and the KKT conditions can be written as

\{\begin{array}{l} \nabla L = 0 \\ 0 \leq μ_{k, m} ⊥ p_{k, m} \geq 0 \forall k, m \\ 0 \leq ν_{k, m} ⊥ (p_{k, m} - p^{max} (k)) \leq 0 \forall k, m \\ 0 \leq λ ⊥ (\frac{1}{M} \sum_{k = 1}^{N} \sum_{m = m_{0}}^{m_{0} + M - 1} p_{k, m} - P_{T}) \leq 0 \end{array} .

(47)

Observe that if p_k,m< p^max(k) then ν_k,m= 0 and this system can be reduced to

\{\begin{array}{l} \frac{1}{M} [λ - \frac{a_{n} (k) β_{k, m} + a_{I} (k, m) (γ_{k, m} + a_{n} (k) p_{k, m})}{(1 + a_{n} (k) p_{k, m}) (1 + a_{I} (k, m) p_{k, m})}] p_{k, m} = 0 \\ λ \geq \frac{a_{n} (k) β_{k, m} + a_{I} (k, m) (γ_{k, m} + a_{n} (k) p_{k, m})}{(1 + a_{n} (k) p_{k, m}) (1 + a_{I} (k, m) p_{k, m})} \\ 0 \leq λ ⊥ (\frac{1}{M} \sum_{k = 1}^{N} \sum_{m = m_{0}}^{m_{0} + M - 1} p_{k, m} - P_{T}) \leq 0 \\ p_{k, m} \geq 0 \end{array} .

(48)

By exploiting the following inequality whose validity can easily be proved

\begin{array}{l} \frac{a_{n} (k) β_{k, m} + a_{I} (k, m) (γ_{k, m} + a_{n} (k) p_{k, m})}{(1 + a_{n} (k) p_{k, m}) (1 + a_{I} (k, m) p_{k, m})} \\ \leq \frac{a_{n} (k) β_{k, m} + a_{I} (k, m) (γ_{k, m} + a_{n} (k) p_{k, m})}{(1 + a_{n} (k) p_{k, m}) (1 + a_{I} (k, m) p_{k, m})} ∣_{p_{k, m} = 0} \\ = a_{n} (k) β_{k, m} + a_{I} (k, m) γ_{k, m}, \end{array}

(49)

we can deduce that if λ < a_n(k)β_k,m + a_I(k,m)γ_k,m the second inequality in (48) can hold only if p_k,m> 0 so that from the first equation in (48) it results

λ = \frac{a_{n} (k) β_{k, m} + a_{I} (k, m) (γ_{k, m} + a_{n} (k) p_{k, m})}{(1 + a_{n} (k) p_{k, m}) (1 + a_{I} (k, m) p_{k, m})} .

(50)

On the other hand if λ ≥ a_n(k)β_k,m + a_I(k,m)γ_k,m, then p_k,m> 0 is never verified since it would imply

\begin{array}{l} λ \geq a_{n} (k) β_{k, m} + a_{I} (k, m) γ_{k, m} \\ > \frac{a_{n} (k) β_{k, m} + a_{I} (k, m) (γ_{k, m} + a_{n} (k) p_{k, m})}{(1 + a_{n} (k) p_{k, m}) (1 + a_{I} (k, m) p_{k, m})} \end{array}

(51)

which violates the complementarity conditions. As a consequence, for λ ≥ a_n(k)β_k,m + a_I(k,m)γ_k,m we have p_k,m = 0. Let us now solve Equation (50), i.e.,

\begin{array}{l} ã_{k, m} p_{k, m}^{2} + {\tilde{b}}_{k, m} p_{k, m} + {\tilde{c}}_{k, m} = 0 \end{array}

(52)

where $ã_{k, m} = λ a_{n} (k) a_{I} (k, m)$ , ${\tilde{b}}_{k, m} = λ [a_{n} (k) + a_{I} (k, m)] - a_{n} (k) a_{I} (k, m)$ and ${\tilde{c}}_{k, m} = λ - a_{n} (k) β_{k, m} - a_{I} (k, m) γ_{k, m}$ , whose solutions are

\begin{array}{l} p_{k, m} = \frac{- {\tilde{b}}_{k, m} \pm \sqrt{{\tilde{b}}_{k, m}^{2} - 4 ã_{k, m} {\tilde{c}}_{k, m}}}{2 ã_{k, m}} \end{array} .

(53)

Let us now make some useful observations:

1.
The solutions in (53) are always real. In fact, the term under the square root is always nonnegative ∀ λ ≥ 0, since
$\begin{array}{l} {\tilde{b}}_{k, m}^{2} - 4 ã_{k, m} {\tilde{c}}_{k, m} = λ^{2} {(a_{n} (k) - a_{I} (k, m))}^{2} + 2 λ a_{n} (k) \\ \times a_{I} (k, m) (a_{n} (k) - a_{I} (k, m)) \\ \times (1 - 2 γ_{k, m}) + a_{n} {(k)}^{2} a_{I} {(k, m)}^{2} \end{array}$
(54)

where a_n(k)−a_I(k,m) > 0 and the minimum, achieved for γ_k,m= 1, is given by
${[λ (a_{n} (k) - a_{I} (k, m)) - a_{n} (k) a_{I} (k, m)]}^{2} \geq 0, \forall λ \geq 0;$
(55)
2.
The solution
$\begin{array}{l} p_{k, m}^{a} = \frac{- {\tilde{b}}_{k, m} - \sqrt{{\tilde{b}}_{k, m}^{2} - 4 ã_{k, m} {\tilde{c}}_{k, m}}}{2 ã_{k, m}} \end{array}$
(56)

is always negative since the inequality
$\begin{array}{l} \sqrt{{\tilde{b}}_{k, m}^{2} - 4 λ a_{n} (k) a_{I} (k, m) (λ - a_{n} (k) β_{k, m} - a_{I} (k, m) γ_{k, m})} \\ > - {\tilde{b}}_{k, m} \end{array}$
(57)

is verified for all λ > 0;
3.
The sign of the solution
$\begin{array}{l} p_{k, m}^{b} = \frac{- {\tilde{b}}_{k, m} + \sqrt{{\tilde{b}}_{k, m}^{2} - 4 ã_{k, m} {\tilde{c}}_{k, m}}}{2 ã_{k, m}} \end{array}$
(58)

can be studied by considering the following inequality
$\begin{array}{l} \sqrt{{\tilde{b}}_{k, m}^{2} - 4 λ a_{n} (k) a_{I} (k, m) (λ - a_{n} (k) β_{k, m} - a_{I} (k, m) γ_{k, m})} \\ > {\tilde{b}}_{k, m} . \end{array}$
(59)

In particular^d it results
$p_{k, m}^{b} = \{\begin{array}{l} > 0 & for & λ < a_{n} (k) β_{k, m} + a_{I} (k, m) γ_{k, m} \\ \leq 0 & for & λ \geq a_{n} (k) β_{k, m} + a_{I} (k, m) γ_{k, m} \end{array} .$
(60)

According to the above considerations the solution for 0 < p_k,m< p^max(k) and λ < a_n(k)β_k,m + a_I(k,m)γ_k,m is $p_{k, m} = p_{k, m}^{b}$ , hence we can write

p_{k, m} = \{\begin{array}{l} 0 & if & λ \geq a_{n} (k) β_{k, m} + a_{I} (k, m) γ_{k, m} \\ p_{k, m}^{b} & if & λ < a_{n} (k) β_{k, m} + a_{I} (k, m) γ_{k, m} \end{array}

(61)

so that the optimal solution can be written as $p_{k, m}^{*} = {[p_{k, m}^{b}]}_{0}^{p^{max} (k)}$ with $\sum_{k = 1}^{N} \sum_{m = m_{0}}^{m_{0} + M - 1} {[p_{k, m}^{b}]}_{0}^{p^{max} (k)} = P_{T} M$ .
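The closed form (61), together with the search for the multiplier λ that meets the power constraint, can be sketched in Python as follows. This is only an illustration under stated assumptions: the arrays are flattened over (k, m), β + γ = 1 and a_n > a_I > 0 hold entrywise, the budget is below the sum of the power caps, constant factors such as 1/M and ln 2 are absorbed into λ, and bisection is one possible way (not necessarily the authors') to find λ. Function names and numerical values are hypothetical.

```python
import numpy as np

def power_for_multiplier(lam, a_n, a_I, beta, gamma, p_max):
    """Evaluate (61): the root p^b of (52)-(53), clipped to [0, p_max]."""
    a = lam * a_n * a_I
    b = lam * (a_n + a_I) - a_n * a_I
    c = lam - a_n * beta - a_I * gamma
    disc = np.maximum(b**2 - 4.0 * a * c, 0.0)   # guard against rounding below zero
    return np.clip((-b + np.sqrt(disc)) / (2.0 * a), 0.0, p_max)

def max_rate_allocation(a_n, a_I, beta, gamma, p_max, budget, iters=200):
    """Bisect on lambda until the powers sum to `budget` (P_T * M in the text)."""
    lo = 1e-12                                   # powers saturate at p_max here
    hi = np.max(a_n * beta + a_I * gamma)        # every power is zero here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = power_for_multiplier(mid, a_n, a_I, beta, gamma, p_max).sum()
        if total > budget:
            lo = mid                             # spending too much: raise lambda
        else:
            hi = mid
    return power_for_multiplier(hi, a_n, a_I, beta, gamma, p_max), hi

# Illustrative values only (not taken from the paper's simulations).
a_n = np.array([4.0, 2.0, 1.0, 3.0])
a_I = np.array([1.0, 0.5, 0.2, 0.3])
beta = np.array([0.9, 0.5, 0.2, 0.7])
gamma = 1.0 - beta
p_opt, lam_opt = max_rate_allocation(a_n, a_I, beta, gamma, p_max=5.0, budget=6.0)
```

The bisection relies on the fact that each clipped power is non-increasing in λ, so the total allocated power crosses the budget exactly once.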

Appendix 3

Convergence Analysis of MADP Algorithm

Proceeding as in[19], in order to prove the convergence of the algorithm it is sufficient to show that

(a)
With a proper choice of the step α _q(n), MADP converges to a fixed point;
(b)
This point is a solution of the KKT conditions of the modified game in (19) and then it is also a solution point of the optimization problem
$\begin{array}{l} max_{p} & \bar{R} (p) \\ s.t. & p \in P \end{array}$
(62)

where the FAPs’ sum rate is $\bar{R} (p) = \sum_{q = 1}^{Q} {\bar{R}}_{q} (p)$ and $P$ is the cartesian product of the sets $P_{q}$ .

Let us denote with $U_{1} (n) = \bar{R} (p (n))$ the sum utility reached at step n of the MADP algorithm. Then for each user q we must prove that there exists a sequence α_q(n) > 0 such that U₁(n) is monotonically increasing and convergent, i.e., U₁(n + 1) ≥ U₁(n) ∀ n and $U_{1} (n) \to U_{1}^{*}$ as $n \to \infty$ . As discussed in [19], we only need to show that U₁(n) is monotonically increasing, since U₁ is bounded above over the bounded set of feasible powers. It therefore suffices to consider a given iteration n in which user q is selected to update its power profile, and show that U₁(p_q(n + 1)) ≥ U₁(p_q(n)), where the total utility U₁ is now regarded as a function of p_q because only the power profile of user q is updated. To do this we will use the descent lemma to bound U₁(p_q(n + 1)). The descent lemma [19] states that if a function $F : R^{n} \to R$ is continuously differentiable and its gradient is Lipschitz continuous with Lipschitz constant K, then, $\forall x, y \in R^{n}$

F (x + y) \leq F (x) + y^{T} \nabla F (x) + \frac{K}{2} {∥y∥}_{2}^{2} .

(63)

One sufficient condition for Lipschitz continuity is that the l₂-norm of the Hessian matrix of F(x) is bounded, in which case this bound can be used for the Lipschitz constant. It can easily be shown that it is true for U₁(p_q). Specifically, there exists a constant $B_{U_{1}}^{q}$ which upper bounds the l₂-norm of the Hessian matrix of U₁(p_q) independent of others’ power profiles.

Applying the Descent lemma to −U₁(p_q), we get

\begin{array}{l} U_{1} (p_{q} (n + 1)) \geq U_{1} (p_{q} (n)) + {[p_{q} (n + 1) - p_{q} (n)]}^{T} \nabla_{p_{q}} U_{1} (p_{q} (n)) \\ - \frac{B_{U_{1}}^{q}}{2} {∥p_{q} (n + 1) - p_{q} (n)∥}_{2}^{2} . \end{array}

(64)

Hence to prove that U₁(p_q(n + 1)) ≥ U₁(p_q(n)), it suffices to show that

\begin{array}{l} {[p_{q} (n + 1) - p_{q} (n)]}^{T} \nabla_{p_{q}} U_{1} (p_{q} (n)) \geq \frac{B_{U_{1}}^{q}}{2} {∥p_{q} (n + 1) - p_{q} (n)∥}_{2}^{2} . \end{array}

(65)

Using the power updating rule

p_{h}^{q} (n + 1) = p_{h}^{q} (n) + α_{q} (n) (p_{h}^{q *} - p_{h}^{q} (n))

(66)

with the best response of user q defined in (25), the inequality in (65) can be written as

{[p_{q}^{*} - p_{q} (n)]}^{T} \nabla_{p_{q}} U_{1} (p_{q} (n)) \geq α_{q} (n) \frac{B_{U_{1}}^{q}}{2} {∥p_{q}^{*} - p_{q} (n)∥}_{2}^{2} .

(67)

Observe that

\begin{array}{l} {\frac{\partial U_{1} (p_{q})}{\partial p_{h}^{q}}|}_{p_{q} = p_{q} (n)} = \frac{β_{h}^{q} {SNIR}_{h}^{β^{q}}}{1 + {SNIR}_{h}^{β^{q}}} \cdot \frac{1}{p_{h}^{q} (n)} + \frac{γ_{h}^{q} {SNIR}_{h}^{γ^{q}}}{1 + {SNIR}_{h}^{γ^{q}}} \\ \times \frac{1}{p_{h}^{q} (n)} - \sum_{r \in N_{q}} Π_{h}^{r} (n) | H_{h}^{qr} |^{2}, \end{array}

(68)

then exploiting the result in (25), we can write the left-hand side (LHS) of (67) as

\begin{array}{l} LHS = \sum_{q = 1}^{Q} \frac{\sum_{r \in N_{q}} Π_{h}^{r} (n) {|H_{h}^{qr}|}^{2}}{2 p_{h}^{q} (n)} {({p_{h}^{q}}^{*} - p_{h}^{q} (n))}^{2} \\ + \sum_{q = 1}^{Q} \frac{c_{h}^{q} (η_{h}^{q} - ν^{q}) ({p_{h}^{q}}^{*} - p_{h}^{q} (n))}{p_{h}^{q} (n) (\sum_{r \in N_{q}} Π_{h}^{r} (n) {|H_{h}^{qr}|}^{2} + ν^{q} - η_{h}^{q})} . \end{array}

(69)

Now from (69), with the same steps as in[19], to ensure that

LHS \geq α_{q} (n) \frac{B_{U_{1}}^{q}}{2} {∥p_{q}^{*} - p_{q} (n)∥}_{2}^{2},

(70)

we can choose the step α_q(n) as

α_{q} (n) \leq min \{\frac{2 A_{q}^{n}}{B_{U_{1}}^{q}}, 1\}

(71)

where the coefficient $A_{q}^{n}$ is defined as

A_{q}^{n} = min_{h} \{\frac{\sum_{r \in N_{q}} Π_{h}^{r} (n) {|H_{h}^{qr}|}^{2}}{p_{h}^{q} (n) c_{h}^{q}}\} .

(72)
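For concreteness, the bound (71)-(72) can be evaluated numerically as in the Python sketch below. The Hessian bound $B_{U_{1}}^{q}$ is treated here as a given input, since its value is not derived explicitly in the text; the function name and the sample numbers are hypothetical.

```python
import numpy as np

def madp_step_size(prices, p, c, hessian_bound):
    """Upper limit on alpha_q(n) from (71)-(72).

    prices[h] stands for sum_r Pi_h^r(n) |H_h^{qr}|^2, p[h] for p_h^q(n),
    c[h] for c_h^q, and hessian_bound for B_{U_1}^q.
    """
    A = np.min(prices / (p * c))                # coefficient A_q^n in (72)
    return min(2.0 * A / hessian_bound, 1.0)    # bound (71)

# Illustrative values only.
alpha_max = madp_step_size(
    prices=np.array([0.2, 0.4, 0.1]),
    p=np.array([1.0, 2.0, 0.5]),
    c=np.array([0.5, 0.8, 0.4]),
    hessian_bound=4.0,
)
```

In this example the smallest ratio in (72) is 0.25, so the largest admissible step is 2 · 0.25 / 4 = 0.125.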

Finally, in order to prove point (b), let $U_{1}^{*}$ be a fixed point of the algorithm such that $U_{1} (n) = U_{1}^{*}$ for some index n. Since this is a fixed point, it follows that $p_{h}^{q} (n) = p_{h}^{q *}$ , ∀h,q. It can then be seen that, for all q, $p_{h}^{q *}$ must be an optimal solution to the problem (19), given the other users’ current power profiles and interference price vectors. Hence, p(n) also satisfies the KKT conditions of problem (62).

Appendix 4

In order to find the optimal solutions of the convex problem $({\tilde{P}}_{1})$ by studying the Lagrange dual problem, some additional constraint qualification conditions must hold, beyond convexity, to ensure strong duality [23]. One simple constraint qualification is Slater’s condition, i.e., we must verify that some strictly feasible point exists. We can prove that, for each user q and for fixed strategies of the others, the set ${\tilde{F}}_{q} (p_{- q})$ is nonempty. For simplicity, in this proof we assume w.l.o.g. m₀ = 1. More specifically, the constraint $R_{q} (p) > R_{q}^{0}$ can be written as

\begin{array}{l} \sum_{k = 1}^{V_{N}^{q}} \sum_{m = 1}^{V_{M}^{q}} [β_{k, m} log (1 + p_{k, m}^{q} a_{n}^{q} (k, m)) + γ_{k, m} \\ \times log (1 + p_{k, m}^{q} a_{I}^{q} (k, m))] > R_{q}^{0} \end{array}

(73)

where we have denoted with $V_{N}^{q} \subseteq \{1, \dots, N\}$ and $V_{M}^{q} \subseteq \{1, \dots, M\}$ the subsets, respectively, of subcarriers and time slots that the player q is using during the game. Since $a_{n}^{q} (k, m) > a_{I}^{q} (k, m)$ , to verify (73), it is sufficient to prove that

log (1 + p_{k, m}^{q} a_{I}^{q} (k, m)) > R_{q}^{0}, \forall k \in V_{N}^{q}, m \in V_{M}^{q}, q \in Ω

(74)

and clearly there always exists a set of positive values $p_{k, m}^{q}, p_{q}^{max} (k) \in R_{+}$ such that

\begin{array}{l} p_{q}^{max} (k) > p_{k, m}^{q} \geq (e^{R_{q}^{0}} - 1) \frac{1}{a_{I}^{q} (k, m)} \forall k \in V_{N}^{q}, \\ m \in V_{M}^{q}, q \in Ω . \end{array}

(75)

Let us consider, for k = 1,…,N, m = 1,…,M, the KKT conditions of the optimization problem $({\tilde{P}}_{1})$ :

\begin{array}{l} 1 - λ_{q} [\frac{β_{k, m} a_{n}^{q} (k, m)}{1 + p_{k, m}^{q} a_{n}^{q} (k, m)} + \frac{γ_{k, m} a_{I}^{q} (k, m)}{1 + p_{k, m}^{q} a_{I}^{q} (k, m)}] - μ_{k, m}^{q} + α_{k, m}^{q} = 0 \\ 0 \leq λ_{q} ⊥ {\bar{R}}_{q} (p_{q}, p_{- q}) - R_{q}^{0} \geq 0 \\ 0 \leq p_{k, m}^{q} ⊥ μ_{k, m}^{q} \geq 0 \\ 0 \leq α_{k, m}^{q} ⊥ p_{q}^{max} (k) - p_{k, m}^{q} \geq 0 \end{array} .

(76)

Observe that if $p_{q}^{max} (k) - p_{k, m}^{q} > 0$ , then $α_{k, m}^{q} = 0$ so that, by eliminating in (76) the multiplier $μ_{k, m}^{q}$ , we obtain

\begin{array}{l} 0 \leq [1 - λ_{q} (\frac{β_{k, m} a_{n}^{q} (k, m)}{1 + p_{k, m}^{q} a_{n}^{q} (k, m)} + \frac{γ_{k, m} a_{I}^{q} (k, m)}{1 + p_{k, m}^{q} a_{I}^{q} (k, m)})] ⊥ p_{k, m}^{q} \geq 0 \\ 0 \leq λ_{q} ⊥ {\bar{R}}_{q} (p_{q}, p_{- q}) - R_{q}^{0} \geq 0 \end{array}

(77)

where λ_q > 0, since otherwise complementarity yields $p_{k, m}^{q} = 0$ , ∀ k = 1,…,N, m = 1,…,M, which contradicts the rate constraint. Then, the optimum power vector must satisfy the following equation

a^{q} (k, m) {(p_{k, m}^{q})}^{2} + b^{q} (k, m) p_{k, m}^{q} + c^{q} (k, m) = 0

(78)

having set

\begin{array}{l} a^{q} (k, m) = a_{n}^{q} (k, m) a_{I}^{q} (k, m) \\ b^{q} (k, m) = a_{n}^{q} (k, m) + a_{I}^{q} (k, m) - λ_{q} a_{n}^{q} (k, m) a_{I}^{q} (k, m) \\ c^{q} (k, m) = 1 - λ_{q} [a_{n}^{q} (k, m) β_{k, m} + a_{I}^{q} (k, m) γ_{k, m}] \end{array} .

(79)

The solutions of (78) are

p_{k, m}^{q} = \{\begin{array}{l} p_{k, m}^{a} = \frac{- b^{q} (k, m) - \sqrt{b^{q} {(k, m)}^{2} - 4 a^{q} (k, m) c^{q} (k, m)}}{2 a^{q} (k, m)} \\ p_{k, m}^{b} = \frac{- b^{q} (k, m) + \sqrt{b^{q} {(k, m)}^{2} - 4 a^{q} (k, m) c^{q} (k, m)}}{2 a^{q} (k, m)} \end{array} .

(80)

It can be proved that ∀λ_q> 0, it results $p_{k, m}^{a} \leq 0$ , b^q(k,m)²−4a^q(k,m)c^q(k,m) ≥ 0, and

p_{k, m}^{b} = \{\begin{array}{l} > 0 & for & λ_{q} > \frac{1}{a_{n}^{q} (k, m) β_{k, m} + a_{I}^{q} (k, m) γ_{k, m}} \\ \leq 0 & for & λ_{q} \leq \frac{1}{a_{n}^{q} (k, m) β_{k, m} + a_{I}^{q} (k, m) γ_{k, m}} \end{array} .

(81)

According to the above considerations, the solution is $p_{k, m}^{q} = p_{k, m}^{b}$ for $0 < p_{k, m}^{q} < p_{q}^{max} (k)$ and $λ_{q} > \frac{1}{a_{n}^{q} (k, m) β_{k, m} + a_{I}^{q} (k, m) γ_{k, m}}$ , so that we can write the optimal power allocation as

p_{k, m}^{q *} = {[p_{k, m}^{b}]}_{0}^{p_{q}^{max} (k)}

(82)

where ${[x]}_{a}^{b}$ denotes the projection of x onto the interval [a, b] and the multiplier λ_q is chosen in order to satisfy the constraint ${\bar{R}}_{q} (p_{q}^{*}, p_{- q}) = R_{q}^{0}$ .
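
The following is a minimal numerical sketch of (78)-(82), not the authors' implementation. For a trial λ_q it takes the larger root p^b of the quadratic, clips it to [0, p_q^max(k)], and then searches for λ_q by bisection until the expected rate meets R_q^0. The function names, the flat list of slots, the use of bisection, and the expected-rate expression (taken from the form of (88), in nats) are assumptions made for illustration. The coefficient b follows (79), which appears to take β + γ = 1, and the search relies on the clipped power being non-decreasing in λ_q.

```python
import math


def power_for_multiplier(lam, a_n, a_i, beta, gamma, p_max):
    """Larger root of the quadratic (78), clipped to [0, p_max] as in (82)."""
    a = a_n * a_i
    b = a_n + a_i - lam * a_n * a_i            # coefficient b in (79)
    c = 1.0 - lam * (a_n * beta + a_i * gamma)
    disc = max(b * b - 4.0 * a * c, 0.0)       # guard against rounding below zero
    p_b = (-b + math.sqrt(disc)) / (2.0 * a)
    return min(max(p_b, 0.0), p_max)


def expected_rate(powers, slots):
    """Expected rate in nats, following the form of (88)."""
    return sum(
        beta * math.log1p(p * a_n) + gamma * math.log1p(p * a_i)
        for p, (a_n, a_i, beta, gamma, _) in zip(powers, slots)
    )


def min_power_allocation(slots, r0, iters=200):
    """Bisection on lam so that the expected rate meets the target r0.

    slots: list of hypothetical tuples (a_n, a_i, beta, gamma, p_max),
    one per (subcarrier, time slot) pair.
    """
    def allocate(lam):
        return [power_for_multiplier(lam, *s) for s in slots]

    # The target must be reachable when every slot transmits at its cap.
    if expected_rate([s[4] for s in slots], slots) < r0:
        raise ValueError("rate target not reachable within the power caps")

    lo, hi = 0.0, 1.0
    while expected_rate(allocate(hi), slots) < r0:   # bracket the multiplier
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_rate(allocate(mid), slots) < r0:
            lo = mid
        else:
            hi = mid
    return hi, allocate(hi)
```

A call such as `min_power_allocation([(2.0, 1.0, 0.5, 0.5, 10.0)], 0.5)` returns the multiplier together with the per-slot powers; slots whose threshold in (81) exceeds the returned multiplier receive zero power.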

Appendix 5

Proof that the feasible set of the game ${\tilde{G}}_{1}$ is compact and non-empty so that it can be cast as a GPG.

Let us start from the following definition of GPG given in[20]:

Definition 1

A Generalized Nash Equilibrium Problem is a GPG if

(a)
There exists a non-empty, closed set $\tilde{X} \subseteq R^{n}$ such that
$X_{q} (x_{- q}) = \{x_{q} \in D_{q} : (x_{q}, x_{- q}) \in \tilde{X}\} \quad \forall q = 1, \dots, Q$
(83)

where $D_{q} \subseteq R^{n_{q}}$ ^e are non-empty, closed sets such that $(\prod_{q = 1}^{Q} D_{q}) \cap \tilde{X} \neq \emptyset$ ;
(b)
There exists a continuous function $Φ (x) : R^{n} \to R$ , named potential function, such that ∀ q ∈ Ω, ∀ $x_{- q}$ and for all $y_{q}, z_{q} \in X_{q} (x_{- q})$
$u_{q} (y_{q}, x_{- q}) - u_{q} (z_{q}, x_{- q}) > 0$
(84)

implies
$Φ (y_{q}, x_{- q}) - Φ (z_{q}, x_{- q}) \geq u_{q} (y_{q}, x_{- q}) - u_{q} (z_{q}, x_{- q})$
(85)

where u_q is the q th player payoff function.

According to Definition 1, we have to check the validity of conditions (a) and (b) for the game ${\tilde{G}}_{1}$ . As regards condition (a), let us consider the feasible set of the game ${\tilde{G}}_{1}$ , i.e.,

\tilde{X} = \{p \in R^{NMQ \times 1} : {\bar{R}}_{q} (p) \geq R_{q}^{0}, \; 0 \leq p_{k, m}^{q} \leq p_{q}^{max} (k), \; \forall k \in \{1, \dots, N\}, \forall m \in \{1, \dots, M\}, \forall q \in Ω\},

(86)

where we have assumed w.l.o.g. m₀ = 1. Then we have to prove the following lemma.

Lemma 1

The feasible set $\tilde{X}$ of the game ${\tilde{G}}_{1}$ is a non-empty, closed and bounded (hence compact) subset of $R^{NMQ \times 1}$ if the matrices A_k defined in (91) are P-matrices, for all k = 1,…,N, m = 1,…,M. A sufficient condition for this to hold is

\sum_{r \in N_{q}} \frac{| H_{k}^{rq} |^{2}}{| H_{k}^{qq} |^{2}} < \frac{1}{e^{R_{q}^{0}} - 1} \forall q \in Ω, \forall k = 1, \dots, N .

(87)

Proof

Let us start with the constraints ${\bar{R}}_{q} (p) \geq R_{q}^{0}$ , which, restricted to the subcarriers and time slots that are actually occupied, can be written as

\sum_{k \in V_{N}^{q}} \sum_{m \in V_{M}^{q}} [β_{k, m} \log (1 + p_{k, m}^{q} a_{n}^{q} (k, m)) + γ_{k, m} \log (1 + p_{k, m}^{q} a_{I}^{q} (k, m))] \geq R_{q}^{0}, \quad \forall q \in Ω

(88)

where $V_{N}^{q} \subseteq \{1, \dots, N\}$ and $V_{M}^{q} \subseteq \{1, \dots, M\}$ are the subsets of subcarriers and time slots, respectively, that player q uses during the game. Since $a_{n}^{q} (k, m) > a_{I}^{q} (k, m)$ , (88) certainly holds if we prove that

log (1 + \frac{p_{k, m}^{q} | H_{k}^{qq} |^{2}}{σ_{n, q}^{2} (k) + \sum_{r \in N_{q}} p_{k, m}^{r} | H_{k}^{rq} |^{2} + σ_{I^{q}}^{2} (k, m)}) \geq R_{q}^{0}

(89)

so that we have to verify $\forall k \in V_{N}^{q}, m \in V_{M}^{q}$ the following set of inequalities

\begin{array}{l} p_{k, m}^{q} | H_{k}^{qq} |^{2} - (e^{R_{q}^{0}} - 1) \sum_{r \in N_{q}} p_{k, m}^{r} | H_{k}^{rq} |^{2} \\ \geq (e^{R_{q}^{0}} - 1) (σ_{n, q}^{2} (k) + σ_{I^{q}}^{2} (k, m)) \forall q \in Ω . \end{array}

(90)

Defining the vector $p_{k, m} = {(p_{k, m}^{q})}_{q = 1}^{Q}$ and the matrices

A_{k} = {[\begin{array}{cccc} | H_{k}^{11} |^{2} & - (e^{R_{1}^{0}} - 1) | H_{k}^{12} |^{2} & \dots & - (e^{R_{1}^{0}} - 1) | H_{k}^{1 Q} |^{2} \\ - (e^{R_{2}^{0}} - 1) | H_{k}^{21} |^{2} & | H_{k}^{22} |^{2} & \dots & - (e^{R_{2}^{0}} - 1) | H_{k}^{2 Q} |^{2} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ - (e^{R_{Q}^{0}} - 1) | H_{k}^{Q 1} |^{2} & - (e^{R_{Q}^{0}} - 1) | H_{k}^{Q 2} |^{2} & \dots & | H_{k}^{QQ} |^{2} \end{array}]}^{T}

(91)

we can express the system of inequalities in (90) as

A_{k} p_{k, m} \geq v_{k, m} \forall k \in V_{N}^{q}, m \in V_{M}^{q}

(92)

where the positive entries of the vector $v_{k, m} = {(v_{k, m}^{q})}_{q = 1}^{Q}$ are given by $v_{k, m}^{q} = (e^{R_{q}^{0}} - 1) (σ_{n, q}^{2} (k) + σ_{I^{q}}^{2} (k, m))$ . Each matrix A_k is a Z-matrix, i.e., a matrix whose off-diagonal elements are all non-positive. Furthermore, if we ensure that each A_k is also a P-matrix, i.e., a matrix whose principal minors (including the determinant) are all positive[24], its inverse is well defined; being the inverse of a Z-matrix that is also a P-matrix, it has non-negative entries. By imposing diagonal dominance on the matrices A_k, ∀ k = 1,…,N, we find the following sufficient conditions for them to be P-matrices:

\sum_{r \in N_{q}} \frac{| H_{k}^{rq} |^{2}}{| H_{k}^{qq} |^{2}} < \frac{1}{e^{R_{q}^{0}} - 1} \forall q \in Ω, \forall k = 1, \dots, N .

(93)

Hence, by considering the general case $V_{N}^{q} = \{1, \dots, N\}$ and $V_{M}^{q} = \{1, \dots, M\}$ , we can deduce from (92) that there exist positive vectors $p_{k, m}$ and $p^{max} (k) ≜ {(p_{q}^{max} (k))}_{q = 1}^{Q} \in R^{Q}$ such that

p^{max} (k) > p_{k, m} \geq A_{k}^{- 1} v_{k, m} \forall k = 1, \dots, N, m = 1, \dots, M

(94)

i.e., the sets $F_{k, m} = \{p_{k, m} \in R^{Q} : \log (1 + p_{k, m}^{q} a_{I}^{q} (k, m)) \geq R_{q}^{0}, 0 \leq p_{k, m}^{q} \leq p_{q}^{max} (k), \forall q\}$ are non-empty. The product set $F = \prod_{k = 1}^{N} \prod_{m = 1}^{M} F_{k, m} \subseteq \tilde{X}$ is then also non-empty, which implies the non-emptiness of $\tilde{X}$ .
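
The non-emptiness argument can be illustrated numerically on a single subcarrier and time slot. The sketch below checks the sufficient condition (87) and then computes a power vector that meets (90) with equality. It uses a plain Jacobi fixed-point iteration in place of the explicit inverse A_k^{-1}; this is a swapped-in technique chosen for illustration, which converges for strictly diagonally dominant systems. The function names, the `gains[r][q]` layout for |H_k^{rq}|², the lumped `noise[q]` term (σ²_{n,q}(k) + σ²_{I^q}(k,m)), and the choice of treating every other player as a neighbor in N_q are all assumptions.

```python
import math


def satisfies_condition_87(gains, r0):
    """Check the sufficient condition (87) on one subcarrier.

    gains[r][q] is |H^{rq}|^2 and r0[q] is the rate target of player q in nats.
    Every player r != q is treated as a neighbor of q (an assumption).
    """
    n_players = len(r0)
    for q in range(n_players):
        ratio = sum(gains[r][q] for r in range(n_players) if r != q) / gains[q][q]
        if ratio >= 1.0 / math.expm1(r0[q]):
            return False
    return True


def minimal_feasible_powers(gains, r0, noise, iters=500):
    """Jacobi iteration for the equality version of (90).

    noise[q] lumps the thermal noise and the external interference power.
    Starting from zero, each sweep recomputes the power every player needs
    to reach its target given the other players' current powers.
    """
    n_players = len(r0)
    p = [0.0] * n_players
    for _ in range(iters):
        p = [
            math.expm1(r0[q])
            * (noise[q] + sum(p[r] * gains[r][q] for r in range(n_players) if r != q))
            / gains[q][q]
            for q in range(n_players)
        ]
    return p
```

If the resulting powers stay below the caps p_q^max(k), the corresponding set F_{k,m} contains at least that point; if (87) fails, the iteration is not guaranteed to converge and the sketch says nothing about feasibility.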

Furthermore, ∀ q ∈ Ω the set $\{p \in R_{+}^{MNQ \times 1} : {\bar{R}}_{q} (p) \geq R_{q}^{0}\}$ is an upper level set of the continuous function ${\bar{R}}_{q} (p)$ , hence it is closed for every scalar $R_{q}^{0}$ [25]. Hence, the set $\tilde{X} = \{p \in R^{NMQ \times 1} : {\bar{R}}_{q} (p) \geq R_{q}^{0}, 0 \leq p_{k, m}^{q} \leq p_{q}^{max} (k), \forall k \in \{1, \dots, N\}, \forall m \in \{1, \dots, M\}, \forall q \in Ω\}$ , as a non-empty intersection of closed sets, is closed[25] and, since it is also bounded, its compactness is proved.

Verification of condition (b) is rather straightforward. In our case, the objective functions do not depend on the other players' variables, i.e., u_q(p_q,p_−q) = u_q(p_q), so that the interaction among the players takes place only at the level of the feasible sets. In this case, it is immediate to see that condition (b) is satisfied with the potential function Φ(p) simply given by the sum of the objective functions of all players, i.e.,

Φ (p) = \sum_{q = 1}^{Q} u_{q} (p_{q})

(95)

and this concludes the proof. □

Endnotes

^a We denote with p_q the NM-dimensional power vector with entries $p_{k, m}^{q}$ and define $p_{- q} ≜ {(p_{i})}_{i = 1, i \neq q}^{Q}$ where Q is the number of FAPs.

^b We recall that a set of strategies is Pareto efficient, or Pareto optimal, if it is not possible to make at least some player better off without making any other player worse off. Given the whole set of feasible strategies, i.e., the strategies satisfying the system constraints, the Pareto boundary is defined as the set of choices that are Pareto efficient. If an equilibrium point belongs to the Pareto boundary, the equilibrium is said to be efficient.

^c We will assume the prices constant with respect to $p_{k, m}^{q}$ . In general, the assumption of $Π_{k, m}^{r}$ to be constant with respect to $p_{k, m}^{q}$ is only an approximation. Nevertheless, the resulting algorithm provides significant performance improvement with respect to the purely competitive game.

^d In order to prove this result we have exploited the inequality $\frac{a_{n} (k) a_{I} (k, m)}{[a_{n} (k) + a_{I} (k, m)]} < a_{n} (k) - [a_{n} (k) - a_{I} (k, m)] γ_{k, m}$ whose validity can easily be proved.

^e We assume that $\prod_{q = 1}^{Q} R^{n_{q}} = R^{n}$ .

References

1. Chandrasekhar V, Andrews JG, Gatherer A: Femtocell networks: a survey. IEEE Commun. Mag 2008, 46: 59-67.
2. Simeone O, Erkip E, Shitz SS: Robust transmission and interference management for femtocells with unreliable network access. IEEE J. Sel. Areas Commun 2010, 28: 1469-1478.
3. Chandrasekhar V, Andrews JG: Uplink capacity and interference avoidance for two-tier femtocell networks. IEEE Trans. Wirel. Commun 2009, 8: 3498-3509.
4. Baccelli F, Blaszczyszyn B: Stochastic geometry and wireless networks. Vol. II–applications: Foundations and Trends in Networking. New York; November 2009.
5. Chen Y, Zhao Q, Swami A: Joint design and separation principle for opportunistic spectrum access in the presence of sensing errors. IEEE Trans. Inf. Theory 2008, 54(5):2053-2071.
6. Unnikrishnan J, Veeravalli VV: Algorithms for dynamic spectrum access with learning for cognitive radio. IEEE Trans. Signal Process 2010, 58(2):750-760.
7. Motamedi A, Bahai A: Optimal channel selection for spectrum-agile low-power wireless packet switched networks in unlicensed band. EURASIP J. Wirel. Commun. Netw 2008, 2008: 10.
8. Anandkumar A, Michael N, Tang AK, Swami A: Distributed algorithms for learning and cognitive medium access with logarithmic regret. IEEE J. Sel. Areas Commun 2011, 29(4):731-745.
9. Liu K, Zhao Q: Distributed learning in multi-armed bandit with multiple players. IEEE Trans. Signal Process 2010, 58(11):5667-5681.
10. Auer P, Cesa-Bianchi N, Fischer P: Finite-time analysis of the multiarmed bandit problem. Mach. Learn 2002, 47(2):235-256. 10.1023/A:1013689704352
11. Galindo-Serrano A, Giupponi L: Distributed Q-learning for aggregated interference control in cognitive radio networks. IEEE Trans. Veh. Technol 2010, 59(4):1823-1834.
12. Galindo-Serrano A, Giupponi L: Distributed Q-learning for interference control in OFDMA-based femtocell networks. In 71st Veh. Technol. Conf. (VTC 2010-Spring). Taipei; May 2010:1-5.
13. Watkins CJ, Dayan P: Technical note: Q-learning. Mach. Learn 1992, 8(3–4):279-292 (Kluwer Academic Publishers, Boston).
14. Geirhofer S, Tong L, Sadler BM: Interference-aware OFDMA resource allocation: a predictive approach. In Proc. of Military Commun. Conf. (MILCOM 2008). San Diego, CA; November 2008:1-7.
15. Pang JS, Scutari G, Facchinei F, Wang C: Distributed power allocation with rate constraints in Gaussian parallel interference channels. IEEE Trans. Inf. Theory 2008, 54(8):3471-3489.
16. Barbarossa S, Carfagna A, Sardellitti S, Omilipo M, Pescosolido L: Optimal radio access in femtocell networks based on Markov modeling of interferers’ activity. In ICASSP 2011. Prague; 22–27 May 2011:3212-3215.
17. Facchinei F, Pang J: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer-Verlag, New York; 2003.
18. Huang J, Berry RA, Honig ML: Distributed interference compensation for wireless networks. IEEE J. Sel. Areas Commun 2006, 24: 1074-1084.
19. Shi C, Berry RA, Honig ML: Distributed interference pricing for OFDM wireless networks with non-separable utilities. In Proc. of CISS 2008. Princeton; March 2008:755-760.
20. Facchinei F, Piccialli V, Sciandrone M: Decomposition algorithms for generalized potential games. In Computational Optimization and Applications. Springer-Verlag, New York; 2010.
21. Anderson TW, Goodman LA: Statistical inference about Markov chains. Ann. Math. Stat 1957, 28: 89-110. 10.1214/aoms/1177707039
22. Bai DS, Kim S: Estimation of transition probabilities in a two-state Markov chain. Commun. Stat. Theory Methods 1979, 8(6):591-599. 10.1080/03610927908827785
23. Boyd S, Vandenberghe L: Convex Optimization. Cambridge University Press, New York; 2004.
24. Cottle RW, Pang J-S, Stone RE: The Linear Complementarity Problem. Academic Press, Cambridge, UK; 1992.
25. Bertsekas DP: Convex Analysis and Optimization. Athena Scientific, Belmont, MA; 2003.

Acknowledgements

This study was performed in the framework of the FP7 project FREEDOM ICT-248891 STP, which was funded by the European Community. The authors would like to acknowledge the contributions of their colleagues from FREEDOM Consortium (http://www.ict-freedom.eu). Part of this study has been presented at ICASSP 2011.

Author information

Authors and Affiliations

Department of Information Engineering, Electronics and Telecommunications, Sapienza University of Rome, Via Eudossiana 18, 00184, Rome, Italy
Stefania Sardellitti, Alessandro Carfagna & Sergio Barbarossa

Corresponding author

Correspondence to Stefania Sardellitti.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Sardellitti, S., Carfagna, A. & Barbarossa, S. Optimal resource allocation in femtocell networks based on Markov modeling of interferers’ activity. J Wireless Com Network 2012, 371 (2012). https://doi.org/10.1186/1687-1499-2012-371

Received: 28 November 2011
Accepted: 12 November 2012
Published: 27 December 2012
DOI: https://doi.org/10.1186/1687-1499-2012-371
