Optimal resource allocation in femtocell networks based on Markov modeling of interferers’ activity
© Sardellitti et al.; licensee Springer. 2012
Received: 28 November 2011
Accepted: 12 November 2012
Published: 27 December 2012
Femtocell networks offer a series of advantages with respect to conventional cellular networks. However, a potentially massive deployment of femto-access points (FAPs) poses a big challenge in terms of interference management, which requires proper radio resource allocation techniques. In this article, we propose alternative optimal power/bit allocation strategies over a time–frequency frame based on a statistical model of the interference activity. Since the interference activity is not known in advance, we adopt a Bayesian approach that provides the optimal allocation conditioned on periodic spectrum sensing and on the estimation of the statistical parameters of the interference activity. We first consider a single FAP accessing the radio channel in a dynamic interference environment. Then, we extend the formulation to a multi-FAP scenario, where nearby FAPs react to the strategies of the other FAPs, still within a dynamic interference scenario. The multi-user case is first approached using a strategic non-cooperative game formulation. Then, we propose a coordination game based on a pricing mechanism that exploits the backhaul link to enable the exchange of parameters (prices) among FAPs.
Femtocells are becoming more and more attractive due to their benefits to both cellular operators and subscribers. On the one hand, operators see femtocells as a way to improve indoor coverage and to off-load wireless traffic from the macro cellular network to the wired network, thus releasing wireless channels for additional mobile users. On the other hand, subscribers see femtocells as a way to get higher-quality services, either higher data throughput or better voice quality, thanks to better indoor coverage and seamless connectivity.
Following the current evolution of the cellular standardization process, in this study we assume an LTE framework and focus on the downlink channel, which uses OFDMA. In this context, femtocell networks offer advantages with respect to Wi-Fi, as they avoid vertical handoff and offer better QoS.
In view of a potentially massive deployment of FAPs, special attention must be devoted to radio resource management (RRM). In fact, unlike macro base stations (MBSs), FAPs are typically installed by the subscribers and maintained without global planning, with no special consideration of traffic demands or of interference with other cells, either femto or macro cells. Hence, a dense deployment of FAPs might induce intolerable interference from femto user equipments (FUEs) to macro user equipments (MUEs) or to other FUEs. Interference management is then arguably one of the major challenges to be faced in femtocell networks.
The goal of this study is to propose an algorithm for optimizing power/bit allocation over a joint time–frequency domain, incorporating a statistical model of the macro-users' activity. Since the interference is unknown, the proposed algorithm follows a Bayesian approach, which allocates power/bits over successive time/frequency slots depending on a preliminary sensing and estimation of the parameters of the interference model. We assume a Markov model for simplicity, but the approach can be generalized to more sophisticated models, e.g., [3, 4]. More specifically, in this study the interference over different frequency subchannels is modeled as a set of statistically independent homogeneous discrete-time Markov chains (DTMCs). We first consider a single-user allocation, where a single FAP finds the optimal resource allocation according to two alternative strategies: (i) maximize the expected rate, conditioned on the result of the sensing and estimation phase, under a transmit power constraint; (ii) minimize the transmit power under an expected rate constraint.
Opportunistic spectrum access (OSA) in multicarrier networks where the channel occupancy follows a Markovian evolution has already been studied in the framework of cognitive radio (CR), for example, in [5, 6]. Chen et al. develop an optimal OSA scheme that jointly optimizes spectrum sensing and access policies. They assume that the secondary transmitter receives error-free ACK signals from the secondary receiver whenever a transmission is successful, and they use this information to track the state of the primary channels. Interestingly, Chen et al. establish a separation principle that decouples the design of the spectrum sensor from that of the access policy. A similar context is studied in [6, 7], where the authors combine learning and dynamic spectrum access. Both Chen et al. and Unnikrishnan and Veeravalli consider an objective function that depends only on the available cognitive bandwidth and constrain the collision probability with the primary users. Anandkumar et al. and Liu and Zhao formulate the multi-user OSA problem as a decentralized multi-armed bandit problem. In this framework, each user learns the channel availability statistics and designs a channel access rule that maximizes the transmission throughput (or, equivalently, minimizes the system regret, defined as the loss in secondary throughput due to learning errors and collisions under distributed access). Extending a single-user policy to the multi-user case, Liu and Zhao propose a family of distributed learning and access policies known as time-division fair share. For these policies, they establish the minimum growth rate of the system regret, which is logarithmic in the number of time slots. Moreover, Liu and Zhao distinguish the case of a known number of secondary users from the case in which this number is unknown and is estimated at each user through feedback.
An alternative scheme for distributed resource allocation among CRs that incorporates aggregated interference control is analyzed in [11, 12], where the authors propose a form of real-time multi-agent reinforcement learning, known as decentralized Q-learning, to manage the aggregated interference. The objective function to be minimized is an expected discounted cumulative cost related to the difference between the effective signal-to-interference-plus-noise ratio (SINR) and a target SINR, which has to be guaranteed to the primary system. This SINR is measured at some control points located in the protection contour of the primary network and is fed back to the secondary base stations, which adjust their transmit power accordingly. An interesting aspect of this approach is that it is model-free and does not require knowledge of the transition probabilities of the underlying Markov process. Finally, Geirhofer et al. propose an interference-aware resource allocation for OFDMA systems, in which the infrastructure users sense and predict the activity of the ad hoc users.
Unlike the previous studies, in this article we propose a Bayesian radio access method enabling (possibly multiple) FAPs to allocate power/bits over a time–frequency grid based on the current belief about the interference level, as obtained from previous sensing. Since the interference cannot be known in advance, we formulate the utility function as the expected value of the utility conditioned on previous measurements. The goal is to relax the requirement on sensing time and to allocate resources over a certain number of future time slots, depending on the interference model and on our prediction capability.
The article is organized as follows. We first consider the radio access of a single FAP and maximize the expected rate, averaged over the interference activity model, under a transmit power constraint. In this case, the solution can be found in closed form and represents a sort of generalized water-filling algorithm, with a water level depending on the interference activity probabilities. Next, we illustrate an alternative approach consisting in the minimization of the transmit power, subject to a constraint on the minimum average femto-user rate. Then, we generalize the proposed approaches to the multi-FAP scenario, where we analyze the interaction among FAPs using a game-theoretic approach. In particular, we first consider a purely competitive game, where each FAP adopts a purely selfish strategy. Since the competitive game might lead to inefficient Nash equilibria, we also propose a coordinated game where, thanks to the exchange of a few parameters through the backhaul link, the FAPs coordinate their actions to improve upon inefficient Nash equilibria and maximize the sum-rate or minimize the sum-power.
Single-user Bayesian adaptive allocation
Femtocell networks are fully compliant with cellular standards. Given the current evolution of 3G systems, in this article we consider an LTE system, and the goal is to allocate power over a time–frequency grid adaptively, as a function of the current occupancy. This implies that the channel and interference power must be sensed at the beginning of each frame. Given the low mobility of indoor users, the channel can be assumed to be nearly constant over the frame. However, the interference from macro-users may vary along the frame depending on the macro-user activity. A correct power allocation across time and frequency would require non-causal knowledge of the interference, which is of course unavailable. To circumvent this problem, we propose a time–frequency resource allocation based on a Markov model of the interference activity over each frequency subchannel. More specifically, we assume that the activity of the macro users over the frequency subchannels is modeled as a set of statistically independent homogeneous DTMCs. The parameters of the statistical model are assumed constant within the frame, but they may vary over successive frames. Each FUE estimates the interference power and the transition probabilities of the interference activity over each subchannel and feeds this information back to the associated FAP, which computes the optimal power allocation over a time–frequency frame following a Bayesian approach.
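To make the interference model concrete, the following sketch simulates the on/off activity of the macro interferers over K subchannels as independent homogeneous first-order two-state DTMCs. It is only an illustration under stated assumptions: the function name `simulate_interference_activity`, the parameterization through `p01` (idle to busy) and `p11` (busy to busy), and the choice of drawing the first slot from the stationary distribution are our own, and higher-order chains are not covered.

```python
import numpy as np

def simulate_interference_activity(p01, p11, num_slots, rng, init_busy=None):
    """Simulate S[k, m] in {0 (idle), 1 (busy)} for K independent two-state DTMCs.

    Hypothetical helper for illustration:
    p01[k] = Pr(S[k, m] = 1 | S[k, m-1] = 0)   (idle -> busy)
    p11[k] = Pr(S[k, m] = 1 | S[k, m-1] = 1)   (busy -> busy)
    """
    p01 = np.asarray(p01, dtype=float)
    p11 = np.asarray(p11, dtype=float)
    num_sub = p01.size
    states = np.zeros((num_sub, num_slots), dtype=int)
    if init_busy is None:
        # stationary busy probability p01 / (p01 + p10), with p10 = 1 - p11
        pi_busy = p01 / (p01 + (1.0 - p11))
        states[:, 0] = rng.random(num_sub) < pi_busy
    else:
        states[:, 0] = np.asarray(init_busy, dtype=int)
    for m in range(1, num_slots):
        prev = states[:, m - 1]
        # homogeneous chain: the same transition probabilities at every slot
        p_busy = np.where(prev == 1, p11, p01)
        states[:, m] = rng.random(num_sub) < p_busy
    return states
```

Independence across subchannels is obtained simply by drawing a separate uniform variate for each subchannel at every slot.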
In this section, we assume that the interferers do not react to the strategy of the FAP of interest. In the subsequent sections, we extend the study to the case where the other FAPs react to the choices of nearby, interfering FAPs, thus generating an iterative process whose stability properties are studied.
Markovian interference model
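where the entries $p^{(k)}_{ij}=\Pr(S_{k,m}=j \mid S_{k,m-1}=i)$, $i,j\in\{0,1\}$, of the transition matrix of subchannel $k$ are given in (4). Let $\beta_{k,m}=\Pr(S_{k,m}=0)$ and $\gamma_{k,m}=\Pr(S_{k,m}=1)=1-\beta_{k,m}$ denote the probabilities that channel $k$ at time $m$ is idle and busy, respectively. For a first-order chain, these probabilities can be computed iteratively from (4) as $\beta_{k,m}=p^{(k)}_{00}\,\beta_{k,m-1}+p^{(k)}_{10}\,\gamma_{k,m-1}$ and $\gamma_{k,m}=p^{(k)}_{01}\,\beta_{k,m-1}+p^{(k)}_{11}\,\gamma_{k,m-1}$. The generalization to higher orders is straightforward, and the formulas are reported in Appendix 1 for convenience. In Appendix 1, we also report the formulas used to estimate the transition probabilities from the observations.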
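For a first-order two-state chain, the idle probability evolves as $\beta_{k,m}=p_{00}\,\beta_{k,m-1}+p_{10}\,\gamma_{k,m-1}$ with $\gamma_{k,m}=1-\beta_{k,m}$, starting from the state sensed at slot $m_0$. A minimal sketch of this prediction step follows, assuming perfect sensing at $m_0$; the helper name `predict_activity_probs` and its argument layout are hypothetical. The first column corresponds to the sensed slot $m_0$, where $\beta_{k,m_0}$ is either 0 or 1.

```python
import numpy as np

def predict_activity_probs(p00, p10, observed_busy, num_slots):
    """Idle/busy probabilities for slots m0, ..., m0 + num_slots - 1 (hypothetical helper).

    p00[k] = Pr(S[k, m] = 0 | S[k, m-1] = 0), p10[k] = Pr(S[k, m] = 0 | S[k, m-1] = 1),
    observed_busy[k] = state sensed on subchannel k at slot m0 (1 = busy).
    Returns (beta, gamma), each of shape (K, num_slots).
    """
    p00 = np.asarray(p00, dtype=float)
    p10 = np.asarray(p10, dtype=float)
    gamma = np.asarray(observed_busy, dtype=float)  # gamma[k, m0] is 0 or 1 after sensing
    beta = 1.0 - gamma
    betas, gammas = [beta], [gamma]
    for _ in range(num_slots - 1):
        # beta_{k,m} = p00 * beta_{k,m-1} + p10 * gamma_{k,m-1}
        beta = p00 * beta + p10 * gamma
        gamma = 1.0 - beta
        betas.append(beta)
        gammas.append(gamma)
    return np.stack(betas, axis=1), np.stack(gammas, axis=1)
```

As the horizon grows, the prediction forgets the sensed state and approaches the stationary idle probability $p_{10}/(p_{10}+1-p_{00})$, which is why the allocation horizon M should be tied to how fast the chain mixes.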
Maximum expected rate optimization
Having introduced the interference model, our goal now is to find the bit/power allocation over an OFDM frame composed of N subcarriers and M consecutive time slots that maximizes the expected rate, taking into account the macro-users' activity. The assumptions underlying the proposed approach are the following:
- (1) the channels are affected by multipath, with time-invariant coefficients within each frame, which are assumed to be known at the transmitter side;
- (2) the activity of the interferers over each subchannel is modeled as a homogeneous DTMC of order L, and the transition probabilities are estimated by using the maximum likelihood (ML) estimators discussed in Appendix 1;
- (3) the activities of the interferers over different channels are statistically independent of each other;
- (4) the power allocation of the interferers is independent of the power allocation of the FAP of interest.
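For reference, one form of the expected rate consistent with the quantities $a_n(k)$ and $a_I(k,m)$ used in the appendix is sketched below, together with the stationarity condition it implies. This is a reconstruction under explicit assumptions, not necessarily the paper's own expression: $a_n(k)=|H(k)|^2/\sigma_n^2$ and $a_I(k,m)=|H(k)|^2/(\sigma_n^2+P_I(k,m))$ are assumed to be the channel-to-noise ratios when subchannel $k$ is idle and busy, $P_T$ is an assumed total power budget over the frame, and rates are in nats (using $\log_2$ only rescales the multiplier).

```latex
\bar R(\mathbf p)=\sum_{k=1}^{N}\sum_{m=m_0}^{m_0+M-1}\Big[\beta_{k,m}\ln\!\big(1+a_n(k)\,p_{k,m}\big)
   +\gamma_{k,m}\ln\!\big(1+a_I(k,m)\,p_{k,m}\big)\Big],
\qquad \text{s.t. } \sum_{k,m}p_{k,m}\le P_T,\quad p_{k,m}\ge 0 .

% KKT condition for an active resource (p_{k,m} > 0):
\frac{\beta_{k,m}\,a_n(k)}{1+a_n(k)\,p_{k,m}}+\frac{\gamma_{k,m}\,a_I(k,m)}{1+a_I(k,m)\,p_{k,m}}=\lambda .

% Clearing denominators and using \beta_{k,m}+\gamma_{k,m}=1 (indices dropped):
\lambda\,a_n a_I\,p^2+\big[\lambda(a_n+a_I)-a_n a_I\big]\,p+\lambda-\beta a_n-\gamma a_I=0 .
```

Because the left-hand side of the KKT condition decreases in $p_{k,m}$, a resource is left unused when $\lambda\ge\beta_{k,m}a_n(k)+\gamma_{k,m}a_I(k,m)$; otherwise the constant term of the quadratic is negative, so it has exactly one positive root, which is the allocated power. Setting $\beta_{k,m}=1$ gives $p_{k,m}=1/\lambda-1/a_n(k)$, the classical water-filling level.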
The last assumption distinguishes this situation from the case where the interferers themselves sense the channel (interference) and adapt their strategy accordingly. In this second case, each adaptive transmitter reacts to the strategies of the others, thus inducing an iterative process that must be properly studied. The first scenario, which is the subject of this section, is appropriate when the interferer is an MBS, for example. The second scenario is more appropriate to model the situation where a few nearby FAPs attempt to access the radio channel at the same time. The analysis of this scenario is carried out in the next section by resorting to game-theoretic tools.
In the case where there is only one adaptive device, the FUE is supposed to measure the interference power from the macro network over each subchannel over a number of time slots that depends on the order of the Markov chain as well as on the required estimation accuracy.
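For a first-order two-state chain, the maximum likelihood estimate of the transition probabilities reduces to counting, $\hat p_{ij}=n_{ij}/\sum_{l}n_{il}$, where $n_{ij}$ is the number of observed transitions from state $i$ to state $j$. The sketch below illustrates this counting step; the function name `estimate_transition_probs` and the optional pseudo-count `prior_count` are hypothetical additions (with `prior_count > 0` the estimate is a smoothed one rather than pure ML), and the formulas actually used in this work, including the higher-order case, are those of Appendix 1.

```python
import numpy as np

def estimate_transition_probs(states, prior_count=0.0):
    """Counting (ML) estimate of a first-order two-state DTMC from a 0/1 sequence.

    Returns P with P[i, j] ~ Pr(S_m = j | S_{m-1} = i). Rows for states that
    were never left during the observation window are NaN (not identifiable).
    """
    s = np.asarray(states, dtype=int)
    counts = np.full((2, 2), float(prior_count))
    # n_ij: number of consecutive pairs (s[m-1], s[m]) equal to (i, j)
    np.add.at(counts, (s[:-1], s[1:]), 1.0)
    row_tot = counts.sum(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        return counts / row_tot
```

The NaN rows make explicit that a short observation window may not visit both states, which is one reason the number of sensing slots must grow with the required accuracy.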
where λ is the Lagrange multiplier. Since the optimal powers are functions of λ, the multiplier can be found numerically by imposing the power constraint. Expression (10) is a generalization of the well-known water-filling solution. Indeed, it can be shown that, in the limit where the transition probabilities go to 0 or 1, i.e., when the Markov chain degenerates into a deterministic signal, Equation (10) converges to the water-filling solution.
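A minimal sketch of the generalized water-filling step follows. It assumes, as an illustration and not necessarily the exact form of (10), that the expected rate of each time–frequency resource is $\beta\ln(1+a_n p)+\gamma\ln(1+a_I p)$, with $a_n$ and $a_I$ the channel-to-noise ratios when the subchannel is idle and busy, and a total power budget `p_total` over the frame. For a given λ, each power is the positive root of the quadratic obtained from the stationarity condition $\beta a_n/(1+a_n p)+\gamma a_I/(1+a_I p)=\lambda$ (or zero if there is none), and λ is found by bisection on the power constraint. The names `slot_power` and `bayesian_water_filling` are hypothetical.

```python
import numpy as np

def slot_power(lam, beta, a_n, a_i):
    """Power solving beta*a_n/(1+a_n p) + (1-beta)*a_i/(1+a_i p) = lam, clipped at 0."""
    gamma = 1.0 - beta
    A = lam * a_n * a_i
    B = lam * (a_n + a_i) - a_n * a_i
    C = lam - (beta * a_n + gamma * a_i)
    # C < 0 <=> the marginal rate at p = 0 exceeds lam, so a positive root exists
    active = C < 0.0
    denom = B + np.sqrt(np.maximum(B * B - 4.0 * A * C, 0.0))
    # numerically stable form of the positive root (-B + sqrt(B^2 - 4AC)) / (2A)
    return np.where(active, -2.0 * C / np.where(active, denom, 1.0), 0.0)

def bayesian_water_filling(beta, a_n, a_i, p_total, iters=200):
    """Bisection on lam so that the total power meets the budget p_total."""
    beta, a_n, a_i = (np.asarray(x, dtype=float) for x in (beta, a_n, a_i))
    lo, hi = 0.0, float(np.max(beta * a_n + (1.0 - beta) * a_i))  # at hi all powers are 0
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if slot_power(lam, beta, a_n, a_i).sum() > p_total:
            lo = lam   # too much power: raise the multiplier (lower the water level)
        else:
            hi = lam
    return slot_power(hi, beta, a_n, a_i)   # hi side is always feasible
```

With `beta` equal to 1 everywhere the routine reduces to classical water-filling, $p=1/\lambda-1/a_n$, which is a convenient sanity check of the limit discussed above.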
Min-power optimization strategy
Since one of the most critical issues in femtocells is interference management, an alternative optimization procedure consists in minimizing the FAP transmit power under the constraint of guaranteeing the required rate over the link between the FAP and the associated FUE. This strategy has been proposed in the literature under the assumption of static interference. Here, we generalize that approach to the case where the interference is dynamic and its activity evolves as a Markov chain, as described in the previous section. The objective now is to minimize the average transmit power across the N subchannels and over M consecutive time slots. Denoting by m0 the index of the time slot where the interference power profile is measured, the goal is to allocate power over a set of consecutive slots starting from m0, i.e., for m = m0,…,m0 + M−1, under the constraint that the expected rate, conditioned on the observation of the initial L time slots, i.e., for m = m0−L + 1,…,m0, is not smaller than a given value R0.
where the expected rate is computed as in (8).
where the Lagrange multiplier λ is found numerically in order to satisfy the rate constraint.
One important difference between the min-power and the max-rate problems is that, in the min-power case, the feasible set could be empty. If this happens, the rate requirement is too high to be accommodated. Hence, either the rate requirement is lowered until the feasible set becomes non-empty, or the user is not admitted. This decision is left to call admission control.
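The min-power problem can be handled with the same per-resource rule: the stationarity condition of $\sum_{k,m} p_{k,m}-\lambda(\bar R-R_0)$ reads $\beta a_n/(1+a_n p)+\gamma a_I/(1+a_I p)=1/\lambda$, so only the threshold changes and the bisection now targets the rate constraint. The sketch below assumes a per-resource expected rate $\beta\ln(1+a_n p)+\gamma\ln(1+a_I p)$ in nats; `bayesian_min_power`, `expected_rate`, and the total power cap `p_max`, used here to emulate an empty feasible set and the resulting admission-control decision, are hypothetical.

```python
import numpy as np

def slot_power(mu, beta, a_n, a_i):
    """Power solving beta*a_n/(1+a_n p) + (1-beta)*a_i/(1+a_i p) = mu, clipped at 0."""
    gamma = 1.0 - beta
    A = mu * a_n * a_i
    B = mu * (a_n + a_i) - a_n * a_i
    C = mu - (beta * a_n + gamma * a_i)
    active = C < 0.0
    denom = B + np.sqrt(np.maximum(B * B - 4.0 * A * C, 0.0))
    return np.where(active, -2.0 * C / np.where(active, denom, 1.0), 0.0)

def expected_rate(p, beta, a_n, a_i):
    """Expected rate in nats: idle resources see a_n, busy resources see a_i."""
    return float(np.sum(beta * np.log1p(a_n * p) + (1.0 - beta) * np.log1p(a_i * p)))

def bayesian_min_power(beta, a_n, a_i, rate_target, p_max=np.inf, iters=200):
    beta, a_n, a_i = (np.asarray(x, dtype=float) for x in (beta, a_n, a_i))
    # mu = 1/lambda; for mu >= max marginal rate at p = 0 every power is zero
    hi = float(np.max(beta * a_n + (1.0 - beta) * a_i))
    lo = hi
    # shrink mu until the target is met (the rate grows without bound as mu -> 0)
    while expected_rate(slot_power(lo, beta, a_n, a_i), beta, a_n, a_i) < rate_target:
        lo *= 0.5
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if expected_rate(slot_power(mu, beta, a_n, a_i), beta, a_n, a_i) >= rate_target:
            lo = mu   # target met: try a larger mu, i.e., less power
        else:
            hi = mu
    p = slot_power(lo, beta, a_n, a_i)
    # hypothetical admission check: even the minimum power exceeds the budget
    return None if p.sum() > p_max else p
```

Returning `None` mirrors the case discussed above in which the rate requirement cannot be accommodated and the user is either renegotiated or not admitted.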
Multi-FAP case: maximum expected rate game
In a scenario containing multiple nearby FAPs implementing the radio access according to the adaptive strategy described in the previous section, each FAP may react to the power allocation of nearby FAPs by changing its own power allocation, and so on. This interaction induces an iterative mechanism whose convergence properties have to be carefully studied. The problem can be studied using game theory, which is well suited to this kind of multi-objective decision problem. In particular, given the existence of a wired backhaul connecting the FAPs, we consider a purely competitive game, where each FAP (player) seeks to optimize its own utility function irrespective of the other FAPs' performance, and a coordination game, where nearby FAPs exchange some parameters to improve performance with respect to the purely competitive case.
where $\mathcal{N}_q$ is the set of neighbors of the $q$th FAP and $H_{rq}(k)$ is the channel transfer function of the $k$th subchannel between the $r$th transmitter and the $q$th receiver. The probabilities $\beta_{k,m}$ and $\gamma_{k,m}$ evolve in time according to a Markov chain of order L. As in the previous section, we assume that the allocation over M consecutive time slots is carried out on the basis of the observation of a number of initial time slots equal to the order of the Markov chain, i.e., L.
It is worth noticing that the major difference between the expected rate in (14) and that in (8) is that the interference is now composed of two contributions: the dynamic interference of the macro-users, whose activities evolve as Markov chains but whose power profile, when on, is fixed; and the interference from the other FAPs, which are always on, but whose power profiles evolve in response to the choices of the other FAPs.
Since the objective function in (16) is strictly concave in $\mathbf{p}_q$ for any given $\mathbf{p}_{-q}$, and the feasible set is compact and convex, the game admits a non-empty solution set for any set of channels and transmit power constraints of the users. In previous work, we reformulated this game as a variational inequality and applied the iterative gradient projection algorithm to solve it, deriving sufficient conditions for its convergence to a Nash equilibrium (NE).
The game may possess multiple equilibria, which may not be Pareto-efficient (footnote b). To improve upon the performance of the NE of the purely competitive game, we can modify the utility function of each user so as to induce the players to incorporate a social utility function rather than being purely selfish. For example, in [18, 19] it has been proposed to modify the utility function of each player so as to maximize the sum of all users' rates. In principle, this change would require a centralized solution. Nevertheless, Huang et al. showed that the solution of the sum-rate game can still be achieved in decentralized form, provided that the players exchange some parameters, the so-called prices. These parameters induce a penalty on each player's utility proportional to the rate decrease that the player's strategy induces on the other players. Introducing pricing mechanisms in femtocell networks is possible thanks to the backhaul link, which allows the exchange of prices among FAPs. Furthermore, we show next that every FAP needs to exchange pricing coefficients only with its neighbors, which keeps the amount of extra signaling limited.
and the optimal power allocation vector is where the multiplier $\nu_q$ is chosen in order to satisfy the constraint. The previous solution assumes, for each player, that the powers used by the other players are given.
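To illustrate the interaction described above, the following sketch lets each FAP repeatedly play its best response, i.e., the single-FAP Bayesian water-filling computed while treating the current power of the other FAPs (always on) as additional noise on top of the Markov macro interference. This simultaneous (Jacobi) best-response iteration is a simple stand-in chosen for illustration, not the gradient projection algorithm mentioned above, and it should only be expected to converge under sufficiently weak coupling. The gain layout `G[q, r, h]` (gain from FAP r to the receiver of FAP q on resource h, with the N×M grid flattened), the per-resource expected rate $\beta\ln(1+a_n p)+\gamma\ln(1+a_I p)$, and all function names are assumptions.

```python
import numpy as np

def slot_power(lam, beta, a_n, a_i):
    """Power solving beta*a_n/(1+a_n p) + (1-beta)*a_i/(1+a_i p) = lam, clipped at 0."""
    gamma = 1.0 - beta
    A = lam * a_n * a_i
    B = lam * (a_n + a_i) - a_n * a_i
    C = lam - (beta * a_n + gamma * a_i)
    active = C < 0.0
    denom = B + np.sqrt(np.maximum(B * B - 4.0 * A * C, 0.0))
    return np.where(active, -2.0 * C / np.where(active, denom, 1.0), 0.0)

def water_fill(beta, a_n, a_i, p_total, iters=100):
    """Bisection on lam so that the powers use the budget p_total."""
    lo, hi = 0.0, float(np.max(beta * a_n + (1.0 - beta) * a_i))
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if slot_power(lam, beta, a_n, a_i).sum() > p_total:
            lo = lam
        else:
            hi = lam
    return slot_power(hi, beta, a_n, a_i)

def best_response_game(G, beta, noise, p_macro, p_total, rounds=50, tol=1e-9):
    """Jacobi best-response iterations (hypothetical); returns powers of shape (Q, H)."""
    Q, _, H = G.shape
    p = np.zeros((Q, H))
    for _ in range(rounds):
        new_p = np.empty_like(p)
        for q in range(Q):
            # interference received from the other FAPs (always on)
            others = np.zeros(H)
            for r in range(Q):
                if r != q:
                    others += G[q, r] * p[r]
            a_n = G[q, q] / (noise + others)               # macro interferer idle
            a_i = G[q, q] / (noise + p_macro[q] + others)  # macro interferer busy
            new_p[q] = water_fill(beta[q], a_n, a_i, p_total[q])
        if np.max(np.abs(new_p - p)) < tol:
            return new_p
        p = new_p
    return p
```

All FAPs update simultaneously from the previous profile; a Gauss–Seidel variant (updating in place) is an equally simple alternative.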
where and ν q and are the Lagrangian multipliers. Given this setting, the modified MADP algorithm is illustrated below.
Algorithm 1 MADP algorithm
Each FAP performs its allocation over M consecutive time slots, from m0 to m0 + M − 1.
Before performing its allocation, each FAP observes $q_s$ samples, from $m_0-q_s+1$ to $m_0$, in order to estimate the transition probabilities of the underlying Markov chain.
S.0: Each FAP q chooses an initial power profile in the set and sets n = 0;
S.1: Each FAP computes its interference prices for h = 1,…,MN and sends them to its neighbors with index;
S.2: At each iteration n, each FAP updates its power profile so as to maximize its utility function, given the other FAPs' power profiles $\mathbf{p}_{-q}$ and price vectors, according to for h = 1,…,MN, where is given by (25);
S.3: Set n = n + 1, go to step S.1, and repeat until convergence is reached.
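The steps S.0–S.3 can be sketched as follows. This is a hedged illustration, not the exact update of (25): we assume that each receiver announces, for every resource h, the price $\pi_{q,h}=-\partial \bar R_q/\partial I_{q,h}$ (the marginal loss of its expected rate per unit of interference $I_{q,h}$ from the other FAPs), that each FAP maximizes its expected rate minus the priced interference it causes to the other receivers, and that the new profile is a damped step of size `step` (playing the role of $\alpha_q(n)$) toward that maximizer. The gain layout `G[q, r, h]` (from FAP r to the receiver of FAP q), all function names, and the fixed step are hypothetical; for simplicity every other FAP is treated as a neighbor.

```python
import numpy as np

def slot_power(mu, beta, a_n, a_i):
    """Power solving beta*a_n/(1+a_n p) + (1-beta)*a_i/(1+a_i p) = mu, clipped at 0."""
    gamma = 1.0 - beta
    A = mu * a_n * a_i
    B = mu * (a_n + a_i) - a_n * a_i
    C = mu - (beta * a_n + gamma * a_i)
    active = C < 0.0
    denom = B + np.sqrt(np.maximum(B * B - 4.0 * A * C, 0.0))
    return np.where(active, -2.0 * C / np.where(active, denom, 1.0), 0.0)

def priced_allocation(beta, a_n, a_i, cost, p_total, iters=100):
    """Maximize expected rate minus sum(cost * p) s.t. sum(p) <= p_total (bisection on nu)."""
    lo, hi = 0.0, float(np.max(beta * a_n + (1.0 - beta) * a_i))
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        if slot_power(nu + cost, beta, a_n, a_i).sum() > p_total:
            lo = nu
        else:
            hi = nu
    return slot_power(hi + cost, beta, a_n, a_i)

def pricing_game(G, beta, noise, p_macro, p_total, step=0.5, rounds=300, tol=1e-9):
    """Damped pricing iterations (hypothetical); returns powers of shape (Q, H)."""
    Q, _, H = G.shape
    g_own = G[np.arange(Q), np.arange(Q)]                  # direct gains, shape (Q, H)
    p = np.zeros((Q, H))                                   # S.0: start from zero power
    for _ in range(rounds):
        interf = np.einsum("qrh,rh->qh", G, p) - g_own * p  # from the other FAPs
        d_idle = noise + interf                            # macro interferer off
        d_busy = noise + p_macro + interf                  # macro interferer on
        s = g_own * p
        # S.1: price[q, h] = -d(expected rate of q) / d(interference at receiver q)
        price = beta * s / (d_idle * (d_idle + s)) + (1.0 - beta) * s / (d_busy * (d_busy + s))
        # cost for FAP q: sum over r != q of price[r, h] * gain from q to receiver r
        cost = np.einsum("rqh,rh->qh", G, price) - g_own * price
        new_p = np.empty_like(p)
        for q in range(Q):                                 # S.2: damped priced update
            target = priced_allocation(beta[q], g_own[q] / d_idle[q],
                                       g_own[q] / d_busy[q], cost[q], p_total[q])
            new_p[q] = p[q] + step * (target - p[q])
        if np.max(np.abs(new_p - p)) < tol:                # S.3: stop at convergence
            return new_p
        p = new_p
    return p
```

Because every update is a convex combination of two profiles that satisfy the power budget, the iterates remain feasible; with a single FAP the cost term vanishes and the routine returns the single-FAP allocation.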
Following similar arguments, we prove in Appendix 3 that there exist sufficiently small step sizes $\alpha_q(n)$ for which the MADP algorithm converges monotonically to a fixed point.
Multi-FAP case: min-power game
is compact and non-empty. Hence, the game is a generalized potential game (GPG) whose potential function Φ(p) is the sum of the objective functions of all players.
where $\lambda_s$ is the Lagrange multiplier of user s associated with the rate constraint. The pricing coefficients are defined in the same way as in the max-rate case.
In conclusion, in this article we have shown how the estimation of the statistical parameters of the interference can improve the performance of a power allocation technique, provided that the statistical model fits the real data. In this study, we assume that the transition probabilities are estimated from the data. An interesting future direction consists in incorporating methods that do not require such an estimation but acquire the proper behavior through reinforcement learning. A useful feature of our method is that, for a given estimate of the transition probabilities, the power allocation across the set of time slots/frequency channels is found in closed form, up to a scalar Lagrange multiplier. This reduces convergence time with respect to gradient-based techniques.
The other important contribution of this article is the decentralized approach to resource allocation based on game theory, in the case where the interference varies dynamically. Our Bayesian formulation of the game provides interesting results in such an uncertain environment. Finally, the introduction of coordination among FAPs based on the exchange of a few parameters (prices) through the backhaul link has been shown to provide significant performance gains with respect to the purely competitive game.
Further developments should incorporate a robust approach for situations where the statistical model of the interference is not known or its parameters are time-varying. Another critical aspect is the availability of the backhaul link for the exchange of prices. Since such a link is affected by random delays, it may be useful to incorporate robust mechanisms to cope with situations where a price does not arrive within a maximum tolerable delay.
- 1. The solutions in (53) are always real. In fact, the term under the square root is positive for all λ ≥ 0, since (54) holds, where $a_n(k)-a_I(k,m) > 0$, and its minimum, achieved for $\gamma_{k,m}=1$, is given by (55).
- 2. The solution (56) is always negative, since inequality (57) is verified for all λ > 0.
- 3. The sign of the solution (58) can be studied by considering inequality (59). In particular (footnote d), (60) holds,
so that the optimal solution can be written as with.
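As a check of item 1, consider the quadratic that follows from the per-resource stationarity condition $\beta a_n/(1+a_n p)+\gamma a_I/(1+a_I p)=\lambda$ with $\gamma=1-\beta$ and indices dropped, namely $\lambda a_n a_I p^2+[\lambda(a_n+a_I)-a_n a_I]p+\lambda-\beta a_n-\gamma a_I=0$. This normalization is our assumption and may differ from (53) by a positive factor, which does not affect the sign of the discriminant. Expanding the discriminant gives:

```latex
\Delta = \big[\lambda(a_n+a_I)-a_n a_I\big]^2-4\lambda a_n a_I\,(\lambda-\beta a_n-\gamma a_I)
       = \lambda^2(a_n-a_I)^2+2\lambda a_n a_I\,(1-2\gamma)(a_n-a_I)+a_n^2 a_I^2 .
% Since a_n - a_I > 0, \Delta is linear and decreasing in \gamma for \lambda > 0,
% so its minimum over \gamma \in [0,1] is attained at \gamma = 1:
\Delta\big|_{\gamma=1} = \big(\lambda(a_n-a_I)-a_n a_I\big)^2 \;\ge\; 0 .
```

This agrees with item 1: the minimum over $\gamma$ is attained at $\gamma=1$, and it vanishes only for $\lambda=a_n a_I/(a_n-a_I)$. Moreover, when $\lambda<\beta a_n+\gamma a_I$ the product of the roots is negative, so exactly one root is positive.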
Convergence Analysis of MADP Algorithm
- (a) With a proper choice of the step size $\alpha_q(n)$, MADP converges to a fixed point;
- (b) This point is a solution of the KKT conditions of the modified game in (19), and hence it is also a solution of the optimization problem (62)
where the FAPs’ sum rate is and is the Cartesian product of the sets.
A sufficient condition for the Lipschitz continuity of the gradient of F(x) is that the $\ell_2$-norm of the Hessian matrix of F(x) is bounded, in which case this bound can be used as the Lipschitz constant. This condition can easily be shown to hold for $U_1(\mathbf{p}_q)$: specifically, there exists a constant that upper bounds the $\ell_2$-norm of the Hessian matrix of $U_1(\mathbf{p}_q)$ independently of the other players' power profiles.
Finally, to prove point (b), let $\mathbf{p}(n)$ be a fixed point of the algorithm, reached at some iteration n. Since this is a fixed point, the update leaves every entry unchanged for all h, q. It can then be seen that, for all q, the current power profile of FAP q must be an optimal solution to problem (19), given the other users' current power profiles and interference price vectors. Hence, $\mathbf{p}(n)$ also satisfies the KKT conditions of problem (62).
where the multiplier λ q is chosen in order to satisfy the constraint.
Proof that the feasible set of the game is compact and non-empty so that it can be cast as a GPG.
We start from the following definition of a GPG:
- (a) There exists a non-empty, closed set such that (83)
where (footnote e) are non-empty, closed sets such that;
- (b) There exists a continuous function, named potential function, such that, ∀ q ∈ Ω, ∀ $x_{-q}$, and for all, (84) implies (85),
where $u_q$ is the payoff function of the $q$th player.
where we have assumed w.l.o.g. m0 = 1. Then we have to prove the following lemma.
or the sets are non-empty. Of course, the product set is then also non-empty, so that the non-emptiness of is implied.
Furthermore, ∀ q ∈ Ω, the set is an upper level set of a continuous function and is therefore closed for every scalar. Hence, the set, being a non-empty intersection of closed sets, is closed and, since it is also bounded, it is compact.
and this concludes the proof. □
a We denote by $\mathbf{p}_q$ the NM-dimensional power vector with entries and define where Q is the number of FAPs.
b We recall that a set of strategies is Pareto efficient, or Pareto optimal, if it is not possible to make some player better off without making at least one other player worse off. Given the whole set of feasible strategies, i.e., the strategies satisfying the system constraints, the Pareto boundary is defined as the set of choices that are Pareto efficient. If an equilibrium point belongs to the Pareto boundary, the equilibrium is said to be efficient.
c We assume the prices to be constant with respect to. In general, this assumption is only an approximation. Nevertheless, the resulting algorithm provides significant performance improvement with respect to the purely competitive game.
d To prove this result, we have used the inequality, whose validity can easily be proved.
e We assume that.
This study was performed in the framework of the FP7 project FREEDOM ICT-248891 STP, which was funded by the European Community. The authors would like to acknowledge the contributions of their colleagues from FREEDOM Consortium (http://www.ict-freedom.eu). Part of this study has been presented at ICASSP 2011.