A two-stage game theoretic approach for self-organizing networks
 Habib BA Sidi^{1},
 Rachid El-Azouzi^{1} and
 Majed Haddad^{1}
https://doi.org/10.1186/168714992013119
© Sidi et al.; licensee Springer. 2013
Received: 10 August 2012
Accepted: 15 March 2013
Published: 2 May 2013
Abstract
Growth of network access technologies in the mobile environment has raised several new issues due to the interference between the available accesses. Thanks to currently used access methods such as orthogonal frequency division multiple access (OFDMA) in mobile networks and Long Term Evolution-Advanced systems, intra-cell interference is avoided and the quality of service has increased. Nevertheless, the diversity and multiplicity of base stations in the network leave inter-cell interference as a major open problem. In this paper, we focus on the optimization of the total throughput of cellular networks using fractional frequency reuse and allowing each mobile user to individually choose its serving base station. We analytically derive the utilities of the network manager and the mobile users and develop a Stackelberg game to obtain the equilibrium. We propose a distributed algorithm that allows the base stations, using light collaboration, to achieve an efficient utilization of the frequencies, with the aim of maximizing the total system utility. This algorithm is based on stochastic gradient descent, which requires some information to be exchanged between neighboring base stations. At the user association level, we propose an iterative distributed algorithm based on learning automata. Both algorithms allow the system to converge to the Stackelberg equilibrium. Furthermore, simulations based on a realistic network setting show promising results in terms of global utility and convergence. In this setting, we include scenarios with a varying number of users and address the robustness and scalability of the proposed approach.
Keywords
Introduction
Recently, the use of self-organizing network (SON) features in a framework of general policy management has been suggested. In such frameworks, SON entities are used as a means to enforce high-level operator policies, introduced in the management plane and translated into low-level objectives guiding coordinated SON entities [1]. Among the most important self-optimization mechanisms in radio access networks (RAN) are interference coordination [2], mobility management, and energy saving [3]. Several such problems need further investigation to fully benefit from SON in RAN, in areas where little material has been published. Examples are autonomous cell outage management and coverage capacity optimization [4]. Coordinating simultaneous SON processes remains an open and challenging problem that needs to be addressed in order to allow the deployment of SON mechanisms.
Specifically, we assume that the fractional frequency reuse (FFR) of a cell can be configured dynamically. In that case, some base stations (BSs or eNodeBs) would be enabled to adjust their FFR in order to provide coverage/capacity for other neighboring cells. We further model the network behavior as a Stackelberg game between the network manager and the mobile users using the game theory framework [8].
At the core lies the idea that introducing a certain degree of hierarchy in non-cooperative games not only improves the individual efficiency of all users but can also be a way of reaching a desired trade-off between the global network performance at the equilibrium and the required amount of signaling. The proposed approach can be seen as an intermediate scheme between the totally centralized policy and the non-cooperative policy. It is also quite relevant for flexible networks where the trend is to split the intelligence between the network infrastructure and the mobile users' equipment. In the Stackelberg game, the network manager acts as the leader and the mobile users as the followers. In the first stage, the leader chooses its strategy profile and announces it to the followers. Then, the followers decide their respective outcomes depending on the strategy profile of the leader. Under our scenario, the network manager maximizes the total network throughput by means of power control and announces its strategy profile to the mobile users. Each mobile then decides individually to which of the available base stations it is best to connect, according to its radio condition and the strategy profile broadcast by the network.
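The leader/follower interaction described above can be illustrated with a toy example. The following is a minimal sketch, not the model of this paper: the gain table `GAIN`, the noise level, the rate formula (Shannon rate divided by cell load), and the finite list of candidate power profiles are all illustrative assumptions. For each announced power profile, the followers run best-response updates until no user wants to change its base station; the leader then keeps the profile with the highest total rate.

```python
import math

# Hypothetical channel gains GAIN[user][bs] and noise power (illustrative only).
GAIN = [[1.0, 0.2], [0.3, 0.9]]
NOISE = 0.1

def user_rate(k, assoc, power):
    """Toy utility of user k: Shannon rate on its BS, shared equally among attached users."""
    j = assoc[k]
    interference = sum(power[i] * GAIN[k][i] for i in range(len(power)) if i != j)
    sinr = power[j] * GAIN[k][j] / (NOISE + interference)
    load = sum(1 for a in assoc if a == j)
    return math.log2(1.0 + sinr) / load

def deviate(assoc, k, j):
    """Association profile in which only user k switches to BS j."""
    return assoc[:k] + [j] + assoc[k + 1:]

def followers_equilibrium(power, max_rounds=50):
    """Second stage: best-response dynamics among users for a fixed leader strategy."""
    n_users, n_bs = len(GAIN), len(power)
    assoc = [0] * n_users
    for _ in range(max_rounds):
        changed = False
        for k in range(n_users):
            best = max(range(n_bs), key=lambda j: user_rate(k, deviate(assoc, k, j), power))
            if user_rate(k, deviate(assoc, k, best), power) > user_rate(k, assoc, power) + 1e-12:
                assoc = deviate(assoc, k, best)
                changed = True
        if not changed:
            return assoc          # no user can improve alone: pure Nash equilibrium
    return assoc                  # round limit reached; may not be an equilibrium

def leader_choice(power_options):
    """First stage: the leader anticipates the followers' reaction to each option."""
    best = None
    for power in power_options:
        assoc = followers_equilibrium(power)
        total = sum(user_rate(k, assoc, power) for k in range(len(assoc)))
        if best is None or total > best[2]:
            best = (power, assoc, total)
    return best

if __name__ == "__main__":
    print(leader_choice([(1.0, 1.0), (1.0, 0.5), (0.5, 1.0)]))
```

Enumerating a handful of power profiles is only practical for a toy instance; it stands in here for the leader's optimization, which the paper carries out with a gradient algorithm.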
We also propose a two-stage self-optimization algorithm for both the leader and the followers. The objective is to dynamically achieve an efficient frequency reuse pattern based on their past experience and their learning capabilities. The leader's algorithm is based on a stochastic gradient descent algorithm, which requires some information to be exchanged between neighboring base stations. For user association, we propose an iterative distributed algorithm based on learning automata mechanisms. Both algorithms are shown to converge to the Stackelberg equilibrium while performing close to the optimal solution and providing a substantial gain over a fixed full reuse scheme.
The original contributions of our approach are threefold:

Investigating the fractional frequency reuse technique for inter-cell interference coordination in an OFDMA network

Modeling the interaction between the network and mobiles using a Stackelberg game framework

Proposing a hierarchical algorithm that allows convergence towards the Stackelberg equilibrium
In comparison to our previous work [9] presented at Wireless Days 2011, this paper^{a} extends the material presented there with richer developments. In particular, we further explore the case where the network environment is dynamic, meaning that the number of users varies over time as mobiles arrive in and depart from the system. Through extensive simulations based on a realistic network setting, the proposed approach is shown to be robust and scalable. In this latter setting, we also give some insight into how to design a trade-off between the global network performance at the equilibrium and the required amount of signaling. More precisely, the following contributions have been developed:

Addressing the convergence and stability properties of our distributed mechanisms, with an evaluation of the computational cost.

Exploring the robustness of the proposed approach with a time-varying number of users, thus simulating a seamless dynamic environment.

Giving some insight into ways of finding a desired trade-off between the global network performance and the amount of control feedback.

At the equilibrium, our mechanisms achieve up to 90% of the performance of the optimal association policy, with similar results under partial exchange of information or in a perturbed environment.
The paper is organized as follows: The system model is presented in the 'The system model' section. The 'Network resources' section provides a description of the network scenario adopted throughout the paper. In the 'Hierarchical game formulation' section, we present the game theoretic framework and formally describe how the network manager and mobile users can obtain their respective equilibria by means of a Stackelberg formulation. In the 'Learning for optimal decision' section, the proposed hierarchical algorithm is investigated for both the leader and the followers. In the 'Implementation and validation' section, simulation results under realistic wireless network settings exhibit interesting features in terms of self-optimizing deployment for inter-cell interference coordination. The 'Conclusion' section concludes the paper.
Scenario description
The system model
Network resources
A key example of dynamic resource allocation is power control, which serves as a means for both battery savings at the mobile and interference management in the network. Formally, in this work, we assume that the network manager optimizes its global utility by means of power control optimization. Let P be the (M×N) power control matrix whose element P(j,n) represents the power received from BS $j\in \mathcal{M}$ at PRB $n\in \mathcal{N}$. Given these optimized power levels P, mobile users choose the association actions that optimize their individual utilities. Notice that the maximization of the total throughput by the network manager is based on information sent by mobile users on the interference experienced from neighboring cells. We further assume that each base station can allocate a PRB to only one mobile user in a given time slot.
Hierarchical game formulation
where s^{NE} is a Nash equilibrium among K mobiles considering the strategy of the leader.
Let $\mathcal{S}=\{{\Omega}_{1}\times \cdots \times {\Omega}_{K}\}$ be the strategy space of our one-shot game and s=(s_{k},s_{−k}) a strategy profile in the game.
for every r_{k}∈Ω_{k} and s_{−k}∈Ω_{−k}, where Ω_{−k} = {Ω_{1} × ··· × Ω_{k−1} × Ω_{k+1} × ··· × Ω_{K}} is the joint feasible strategy space of all users but the k-th one.
Learning for optimal decision
The interaction between the leader and the followers provides a potential incentive for both agents to base their decision process on their respective perceived payoffs. This section focuses on how to reach the Stackelberg equilibrium for both the leader and the followers. To solve the global optimization problem, a two-stage optimization algorithm is proposed. One difficulty in our context is that mobile users do not know each other's payoffs (and thus strategies) at each stage. Thus, the environment of each mobile user, including its opponents, is dynamic and may not ensure convergence of the algorithm. In [11], the authors develop a Nash-Stackelberg fuzzy Q-learning approach in a heterogeneous cognitive network. As an alternative, we adopt a hierarchical algorithm. The proposed approach requires neighboring base stations to exchange, at regular intervals, load (or interference) information experienced at the user level. Consequently, the hierarchical algorithm relies on coordination at both the local scope (user level) and the global scope (network level), which could scale accordingly.
The two-stage learning algorithm can be conducted in the following steps: First, every user reports to its serving base station the interference experienced from neighboring cells. Then, the interference information is exchanged between base stations over the X2 interface while they try to optimize the global network utility by means of power control. Based on these power levels (broadcast by the BSs), each user checks in a distributed way whether its serving BS is still the best choice according to its utility. If not, it can perform a handover to another BS after checking that it can be admitted there. As a result, this approach tends to substantially reduce signaling overhead from the base stations.
Leader: gradient computation mechanism
(1) The UE collects interference information from each neighbor over every allocated RB and forwards the resulting vector to its serving eNodeB.
(2) Once information has been received at the eNodeB from all the attached users, an interference matrix is built that contains interference information from each neighbor (M neighbors in the figure).
(3) The interference load is then formatted for each neighbor.
(4) Finally, each neighbor receives over the X2 interface the interference information concerning all the covered mobiles.
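The four reporting steps above can be sketched as a small data-flow example. This is an illustration under stated assumptions: the report layout (a dictionary keyed by `(neighbor, rb)`), the function names, and the choice to sum the values reported by different UEs are hypothetical, since the text does not specify how reports are combined.

```python
def build_interference_matrix(reports, n_neighbors, n_rbs):
    """Steps (1)-(2): merge per-UE interference vectors into one matrix per eNodeB.

    reports: {ue_id: {(neighbor, rb): interference}} as forwarded by attached UEs.
    Returns matrix[neighbor][rb]; summing over UEs is an assumption of this sketch.
    """
    matrix = [[0.0] * n_rbs for _ in range(n_neighbors)]
    for vector in reports.values():
        for (neighbor, rb), value in vector.items():
            matrix[neighbor][rb] += value
    return matrix

def per_neighbor_messages(matrix):
    """Steps (3)-(4): one row per neighbor, i.e. the payload sent to it over X2."""
    return {neighbor: list(row) for neighbor, row in enumerate(matrix)}

if __name__ == "__main__":
    reports = {
        "ue1": {(0, 0): 0.2, (1, 0): 0.1},
        "ue2": {(0, 0): 0.3, (1, 1): 0.4},
    }
    matrix = build_interference_matrix(reports, n_neighbors=2, n_rbs=2)
    print(per_neighbor_messages(matrix))
```

Each neighbor receives only its own row, which keeps the size of an X2 message proportional to the number of resource blocks rather than to the number of users.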
The pseudocode for the proposed gradient descent approach is given in Algorithm 1.
Algorithm 1
Note that gradient-like algorithms are commonly used in optimization problems. The convergence of such algorithms has been shown in [12] under specific conditions, namely that the derivative of the objective function is Lipschitz continuous, which is satisfied here, and that the step size γ_{t} is chosen appropriately.
Proposition 1
Proposition 1. The derivative of our utility function is Lipschitz continuous.
Proof.
This ends the proof. □
In our computations, we use an implementation of the Wolfe line search to find an appropriate value of γ_{t} at each iteration.
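To make the step-size selection concrete, here is a minimal sketch of a line search that returns a step satisfying the weak Wolfe conditions (sufficient decrease and curvature). It is not the implementation used in our computations: it is written for minimization, so a utility to be maximized would be passed as its negative, and the bisection scheme, the constants `c1` and `c2`, and all function names are assumptions chosen for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weak_wolfe_step(f, grad, x, direction, c1=1e-4, c2=0.9, max_iter=50):
    """Bisection search for a step size meeting the weak Wolfe conditions.

    f, grad: objective to minimize and its gradient (lists of floats in, float/list out).
    direction: a descent direction, i.e. dot(grad(x), direction) < 0.
    """
    f0 = f(x)
    slope0 = dot(grad(x), direction)
    lo, hi, step = 0.0, math.inf, 1.0
    for _ in range(max_iter):
        trial = [xi + step * di for xi, di in zip(x, direction)]
        if f(trial) > f0 + c1 * step * slope0:
            # Sufficient-decrease condition fails: the step is too long.
            hi = step
            step = 0.5 * (lo + hi)
        elif dot(grad(trial), direction) < c2 * slope0:
            # Curvature condition fails: the step is too short.
            lo = step
            step = 2.0 * lo if math.isinf(hi) else 0.5 * (lo + hi)
        else:
            return step
    return step  # iteration limit reached; both conditions may not hold

if __name__ == "__main__":
    f = lambda x: sum((xi - 1.0) ** 2 for xi in x)
    grad = lambda x: [2.0 * (xi - 1.0) for xi in x]
    x0 = [3.0, -2.0]
    d0 = [-g for g in grad(x0)]  # steepest-descent direction
    print(weak_wolfe_step(f, grad, x0, d0))
```

The returned value plays the role of γ_{t} in one gradient iteration; the search is rerun at every iteration because the local curvature of the objective changes with the power levels.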
On the other hand, at each iteration, the gradient algorithm delivers power vectors for each base station that can fall outside the bounds of the allowed space. To handle this problem, we implement a computational mechanism to satisfy the power constraint in (5). We define the constraint c(P(j,n), P_{max}) to relax the problem, where c is built as follows: Define $\xi =\left\{n\in \mathcal{N}\ \mathrm{s.t.}\ P(j,n)>{p}_{\text{th}}\right\}$, where $p_{\text{th}}$ is a threshold value for every PRB $n$ s.t. $\sum _{n}{p}_{\text{th}}={P}_{\text{max}}$. Let $\delta ={P}_{\text{max}}-\sum _{\mathcal{N}\backslash \xi}P(j,n)-\sum _{\xi}{p}_{\text{th}}$; if $\exists \, k\in \xi$ and $\sum _{n}P(j,n)\ge {P}_{\text{max}}$, set the value of each P(j,n) using the projection $\bar{P}(j,n)=\min \left(\frac{\delta}{|\xi|}+{p}_{\text{th}},\ P(j,n)\right)$. That is to say, the remaining power in each BS power budget, if any, is evenly shared among the channels requiring a power level above the threshold value.
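The projection step can be sketched for a single base station as follows. This is a minimal illustration under assumptions: the threshold is taken as p_th = P_max / N (so that the thresholds sum to P_max over the N PRBs), powers are assumed non-negative, and the function name `project_power` is hypothetical. PRBs above the threshold are capped at p_th plus an equal share of the budget δ left after serving the PRBs below the threshold.

```python
def project_power(p_row, p_max):
    """Project the per-PRB powers of one BS onto the budget sum(p_row) <= p_max.

    p_row: powers P(j, n) for a fixed BS j, one entry per PRB n (assumed >= 0).
    """
    n_prbs = len(p_row)
    p_th = p_max / n_prbs                      # assumption: equal threshold per PRB
    over = [n for n, p in enumerate(p_row) if p > p_th]
    if not over or sum(p_row) < p_max:
        return list(p_row)                     # already inside the allowed space
    under_total = sum(p for p in p_row if p <= p_th)
    delta = p_max - under_total - len(over) * p_th
    cap = delta / len(over) + p_th
    # PRBs at or below the threshold are unaffected because cap >= p_th.
    return [min(cap, p) for p in p_row]

if __name__ == "__main__":
    print(project_power([0.9, 0.1, 0.6, 0.0], p_max=1.0))
```

Since every PRB outside ξ uses at most p_th, δ is non-negative, and the projected powers sum to at most P_max.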
Followers: pursuit algorithm
At the user level of our Stackelberg framework, we use the pursuit algorithm as a tool that allows users to reach a Nash equilibrium iteratively and individually. The pursuit algorithm is a distributed association algorithm proposed in [13] that allows each individual in a set of players to select, among several available strategies, the one that maximizes its utility within a limited number of iterations.
Algorithm 2
It has been proven in [13] that the pursuit algorithm always converges under some specific conditions on the step size parameter. The authors show that when the step size parameter is very small, the game converges to a stable equilibrium of the learning automata game. This algorithm converges to an extremum of the game when a pure equilibrium exists. To reach a mixed equilibrium, the authors in [14] present a distributed algorithm that can be used in such situations. However, mixed equilibria are not efficient in our context since they would lead mobile users to perform handovers between base stations continuously. To avoid mixed equilibria, we introduce a handover cost in the utility function to give mobile users more incentive to reach pure equilibria.
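As an illustration of the follower stage, the sketch below implements the pursuit update as it is commonly stated in the learning-automata literature: each user keeps a probability vector over base stations and a running-average reward estimate per base station, and at every iteration it moves its probability vector a small step toward the base station with the highest estimate. It is not a transcription of Algorithm 2. The reward table `BASE`, the equal sharing of a cell among attached users, the initialization round, and the parameter values are assumptions made for this example, and no handover cost is modeled.

```python
import random

# Hypothetical normalized rates BASE[user][bs] in [0, 1] (illustrative only).
BASE = [[0.9, 0.2], [0.3, 0.8]]

def reward(user, actions):
    """Toy reward: the user's rate on its BS, shared equally among attached users."""
    bs = actions[user]
    return BASE[user][bs] / actions.count(bs)

def pursuit(step=0.05, iterations=300, seed=0):
    n_users, n_actions = len(BASE), len(BASE[0])
    rng = random.Random(seed)
    probs = [[1.0 / n_actions] * n_actions for _ in range(n_users)]
    estimates = [[0.0] * n_actions for _ in range(n_users)]
    counts = [[0] * n_actions for _ in range(n_users)]

    def update_estimates(actions):
        for k, a in enumerate(actions):
            counts[k][a] += 1
            # Incremental running average of the rewards observed for action a.
            estimates[k][a] += (reward(k, actions) - estimates[k][a]) / counts[k][a]

    # Initialization round (an assumption): every action is tried once.
    for a in range(n_actions):
        update_estimates([a] * n_users)

    for _ in range(iterations):
        actions = [rng.choices(range(n_actions), weights=probs[k])[0]
                   for k in range(n_users)]
        update_estimates(actions)
        for k in range(n_users):
            best = max(range(n_actions), key=lambda a: estimates[k][a])
            # Pursuit step: move the probability vector toward the best estimate.
            probs[k] = [p + step * ((1.0 if a == best else 0.0) - p)
                        for a, p in enumerate(probs[k])]
    return probs

if __name__ == "__main__":
    for k, p in enumerate(pursuit()):
        print("user", k, [round(x, 4) for x in p])
```

In this toy game each user has a dominant base station, so the probability vectors concentrate on a pure association profile; a smaller `step` slows convergence but makes the estimates more reliable before the probabilities commit.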
Discussion on cost of handover
Implementation and validation
Simulations settings
Parameter descriptions  Values 

Number of cells  7 
Number of PRBs per cell  10 
Number of users per cell  4 
Outer radius of hexagonal cells  200 m 
Distance to ensure target SINR  300 m 
Fairness parameter  α=1 
Iterations scale^{a}  1 for 30 
The iteration scale parameter in Table 1 reflects how frequently the BSs update the gradient algorithm and set new power values. By tuning this parameter, one can control the amount of signaling between BSs. In our simulations, users run 30 iterations of the association algorithm for 1 iteration of the gradient. We first build the framework for a fairness parameter α = 1, which represents the proportional fairness algorithm, and then extend it to different values of α.
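The role of the iteration scale can be illustrated with a short scheduling sketch. It only shows the two time scales, assuming that one power update by the BSs is followed by a fixed number of association iterations by the users; the callback names `update_powers` and `update_association` are hypothetical placeholders for the leader and follower algorithms.

```python
def run_two_time_scales(n_gradient_updates, iteration_scale,
                        update_powers, update_association):
    """Interleave one leader update with `iteration_scale` follower iterations.

    Returns a trace of events: "P" for a power (gradient) update, which requires
    an exchange between BSs, and "a" for one association iteration by the users.
    """
    trace = []
    for t in range(n_gradient_updates):
        update_powers(t)                 # slow time scale: inter-BS signaling
        trace.append("P")
        for i in range(iteration_scale):
            update_association(t, i)     # fast time scale: local user decisions
            trace.append("a")
    return trace

if __name__ == "__main__":
    calls = {"P": 0, "a": 0}

    def update_powers(t):
        calls["P"] += 1

    def update_association(t, i):
        calls["a"] += 1

    run_two_time_scales(3, 30, update_powers, update_association)
    print(calls)
```

With an iteration scale of 30, the number of inter-BS exchanges is one thirtieth of the number of association iterations, which is how this parameter trades signaling load against the responsiveness of the power updates.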
Dynamic fractional frequency reuse
Utility maximization
Fairness issues
Robustness and scalability
For the first (Figure 10), second (Figure 11), and third (Figure 12) scenarios, we consider that each BS exchanges data, respectively, with the first-ring neighbors, only with the two closest neighbors, and finally with all the interfering neighbors from the two considered rings. Comparing the different scenarios, we observe that more exchanged information leads, as one can intuitively expect, to a higher utility. However, although this may be attributable to randomness, a comparison of Figures 10 and 12 shows that system stability is not necessarily ensured by a higher rate of information exchange. On the other hand, even with a very small amount of exchanged information, our algorithm still converges to 97% of the utility obtained by exhaustive association search. The same figures address the scalability of our algorithms through the introduction of a burst of new arrivals in the system at iteration 200. Although this event is not clearly captured in Figure 11, where less information is exchanged, Figures 10 and 12 show that our mechanism adapts very quickly to the system evolution and reaches the new point of convergence.
Conclusion
In this paper, we have investigated the idea of a hierarchical learning game for fractional frequency reuse in an OFDMA network. In this framework, both the network manager and mobile users learn to reach an equilibrium that optimizes the global network utility while ensuring individual utility optimization for mobile users. We have first formally proposed a game model to define how the network manager and mobile users can obtain their respective equilibria by means of a Stackelberg formulation. Then, we have presented a two-stage learning algorithm for finding a Stackelberg equilibrium and the corresponding mobiles' association strategies. Practical directions for implementing our solution are also presented. We have shown through several numerical examples the efficiency of the obtained equilibrium compared to the exhaustive (optimal) solution and a fixed full frequency reuse pattern. In particular, in the case of the proportional fair policy, the proposed FFR approach achieves approximately 90% of the performance of the optimal association policy and a gain of 40% with respect to the fixed full reuse. For implementation purposes, and in order to adapt to the dynamics of the mobile environment, the number of iterations before convergence should remain on the order of a few tens. Finally, we have addressed issues such as fairness, robustness, and scalability and offered insights into how to design such a scenario in a wireless network environment.
Endnotes
^{a} The work reported herein was partially supported by the projects Ecocells.
^{b} Though some users stay silent, they may be active during the next scheduling period.
Declarations
Authors’ Affiliations
References
1. 3rd Generation Partnership Project (3GPP): TR 36.902, Evolved Universal Terrestrial Radio Access Network (E-UTRAN), Self-configuring and self-optimizing network (SON) use cases and solutions. Dec 2008. Accessed May 2011. http://www.etsi.org/deliver/etsi_tr/136900_136999/136902/09.03.01_60/tr_136902v090301p.pdf
2. Stolyar AL, Viswanathan H: Self-organizing dynamic fractional frequency reuse for best-effort traffic through distributed inter-cell coordination. In Proc. IEEE INFOCOM. Rio de Janeiro; April 2009:1925.
3. NGMN Alliance: NGMN Recommendation on SON and O&M Requirements, Frankfurt. Edited by: Lehser F. 5 Dec 2008. http://www.ngmn.org/uploads/media/NGMN_Recommendation_on_SON_and_O_M_Requirements.pdf Accessed 23 April 2013
4. Combes R, Altman Z, Haddad M, Altman E: Self-optimizing strategies for interference coordination in OFDMA networks. In IEEE ICC 2011 Workshop on Planning and Optimization of Wireless Communication Networks. Kyoto, Japan; June 2011:59.
5. Giuliano R, Monti C, Loreti P: WiMAX fractional frequency reuse for rural environments. In IEEE ICC 2011 Workshop on Planning and Optimization of Wireless Communication Networks. Washington: IEEE; 2008:6065.
6. Son K, Chong S, de Veciana G: Dynamic association for load balancing and interference avoidance in multi-cell networks. In IEEE ICC 2011 Workshop on Planning and Optimization of Wireless Communication Networks. Washington: IEEE; 2009:3566-3576.
7. R1-050507: 'Soft Frequency Reuse Scheme for UTRAN LTE'. In Huawei, 3GPP TSG RAN WG1 Meeting. Athens, Greece; 2005:913.
8. Fudenberg D, Tirole J: Game Theory. Cambridge: MIT Press; 1991.
9. Sidi HBA, El-Azouzi R, Haddad M: Fractional frequency reuse Stackelberg model for self-organizing networks. In Wireless Days (WD), 2011 IFIP. Ontario, Canada; 2011:16. http://dx.doi.org/10.1109/WD.2011.6098171
10. Mo J, Walrand J: Fair end-to-end window-based congestion control. IEEE/ACM Trans. Netw. 2000, 8: 556-567. http://dx.doi.org/10.1109/90.879343
11. Haddad M, Altman Z, Elayoubi SE, Altman E: A Nash-Stackelberg Fuzzy Q-Learning Decision Approach in Heterogeneous Cognitive Networks. In Global Telecommunications Conference (GLOBECOM 2010), 2010 IEEE. Miami; 2010:16. http://dx.doi.org/10.1109/GLOCOM.2010.5684318
12. Bertsekas DP, Tsitsiklis JN: Parallel and distributed computation: numerical methods. GLOBECOM Accessed 23 April 2013 http://dspace.mit.edu/handle/1721.1/3719
13. Thathachar M, Sastry P: Network of Learning Automata: Techniques for Online Stochastic Optimization. In GLOBECOM. New York: Kluwer Academic; 2004.
14. Xing Y, Chandramouli R: Stochastic learning solution for distributed discrete power control game in wireless data networks. IEEE/ACM Trans. Netw. 2008, 16(4):932-944. http://doi.acm.org/10.1145/1453698.1453713
15. EUROCOST Std. 231: Urban transmission loss models for mobile radio in the 900 and 1800 MHz bands. In European Cooperation in the Field of Scientific and Technical Research. Luxembourg: Commission of European Communities; 1991.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.