A two-stage game theoretic approach for self-organizing networks

Sidi, Habib BA; El-Azouzi, Rachid; Haddad, Majed

doi:10.1186/1687-1499-2013-119

Research
Open access
Published: 02 May 2013

Habib BA Sidi¹,
Rachid El-Azouzi¹ &
Majed Haddad¹

EURASIP Journal on Wireless Communications and Networking volume 2013, Article number: 119 (2013)

Abstract

The growth of network access technologies in the mobile environment has raised several new issues due to interference between the available access technologies. Thanks to currently used access methods such as orthogonal frequency division multiple access in mobile networks and long term evolution-advanced systems, intra-cell interference is avoided and the quality of service has increased. Nevertheless, the diversity and multiplicity of base stations in the network leave inter-cell interference as a major problem. In this paper, we focus on the optimization of the total throughput of cellular networks using fractional frequency reuse and allowing each mobile user to individually choose its serving base station. We derive analytically the utilities related to the network manager and mobile users and develop a Stackelberg game to obtain the equilibrium. We propose a distributed algorithm that allows the base stations, using light collaboration, to achieve an efficient utilization of the frequencies, with the aim of maximizing the total system utility. This algorithm is based on stochastic gradient descent, which requires some information to be exchanged between neighboring base stations. At the user association level, we propose an iterative distributed algorithm based on learning automata. Both algorithms allow the system to converge to the Stackelberg equilibrium. Furthermore, simulations based on a realistic network setting show promising results in terms of global utility and convergence. In this setting, we include scenarios with a varying number of users and address the robustness and scalability of the proposed approach.

Introduction

Recently, the use of self-organizing network (SON) features in a framework of general policy management has been suggested. In such frameworks, SON entities are used as a means to enforce high-level operator policies, introduced in the management plane, and translated into low-level objectives guiding coordinated SON entities [1]. Among the most important self-optimization mechanisms in radio access networks (RAN) are interference coordination [2], mobility management, and energy saving [3]. Several such problems need further investigation to fully benefit from SON in RAN, in areas where little material has been published. Examples are autonomous cell outage management and coverage capacity optimization [4]. It is noted that the problem of coordinating simultaneous SON processes is an open and challenging problem that needs to be addressed in order to allow the deployment of SON mechanisms.

In this paper, we propose a self-optimization framework for inter-cell interference coordination in an orthogonal frequency division multiple access (OFDMA) network. Inter-cell interference can dramatically degrade cell performance and perceived quality of service (QoS), particularly at the cell edge. We are interested in distributed solutions that can be implemented in a flat architecture (e.g., the long term evolution (LTE)-Advanced architecture). To coordinate interference between neighboring cells, eNodeBs need to exchange information. In the case of LTE, for example, signaling between eNodeBs can be exchanged over the X2 interface (see Figure 1). Schemes such as fractional frequency reuse [5, 6] and soft frequency reuse [7], which allow users in different channel conditions to benefit from different reuse patterns, have been proposed. Still, all of these schemes are static interference management approaches, in which a specific reuse pattern is predetermined offline by the network operator.

Specifically, we assume that the fractional frequency-reuse (FFR) of a cell can be configured dynamically. In that case, some base stations (BSs or eNodeBs) would be enabled to adjust their FFR in order to provide coverage/capacity for other neighboring cells. We further model the network behavior as a Stackelberg game between the network manager and the mobile users using the game theory framework [8].

At the core lies the idea that introducing a certain degree of hierarchy in non-cooperative games not only improves the individual efficiency of all users but can also be a way of reaching a desired trade-off between the global network performance at the equilibrium and the requested amount of signaling. The proposed approach can be seen as an intermediate scheme between the totally centralized policy and the non-cooperative policy. It is also quite relevant for flexible networks where the trend is to split the intelligence between the network infrastructure and mobile users’ equipment. In the Stackelberg game, the network manager acts as the leader and mobile users as the followers. In the first stage, the leader chooses its strategy profile and announces it to the followers. Then, the followers decide their respective outcomes depending on the strategy profile of the leader. Under our scenario, the network manager maximizes the total network throughput by means of power control and announces its strategy profile to mobile users. Each mobile then decides individually to which of the available base stations it is best to connect according to its radio condition and the strategy profile broadcast by the network.

We also propose a two-stage self-optimization algorithm for both the leader and the followers. The objective is to dynamically achieve an efficient frequency reuse pattern based on their past experience and their learning capabilities. The leader’s algorithm is based on a stochastic gradient descent algorithm, which requires some information to be exchanged between neighboring base stations. For user association, we propose an iterative distributed algorithm based on automata learning mechanisms. Both algorithms are shown to converge to the Stackelberg equilibrium while performing close to the optimal solution and providing substantial gain compared to the fixed full reuse scheme.

The original contributions of our approach are threefold:

Investigating fractional frequency reuse technique for inter-cell interference coordination in an OFDMA network
Modeling the interaction between the network and mobiles using a Stackelberg game framework
Proposing a hierarchical algorithm that allows convergence towards the Stackelberg equilibrium

In comparison to our previous work [9] presented at Wireless Days 2011, this paper^a extends the material presented there with richer developments. In particular, we further explore the case in which the network environment is dynamic. By dynamic, we mean that the number of users varies in time, with mobiles arriving in and departing from the system. Through extensive simulations based on a realistic network setting, the proposed approach is shown to be robust and scalable. In this latter setting, we also give some insight on how to design a trade-off between the global network performance at the equilibrium and the requested amount of signaling. More specifically, the following contributions have been developed:

Addressing convergence and stability properties of our distributed mechanisms with an evaluation of the computational cost.
Exploring the robustness of the proposed approach with time-varying number of users, thus simulating a seamless dynamic environment.
Giving some insight on the ways of finding a desired trade-off between the desired global network performance and the amount of control feedback.
At the equilibrium, our mechanisms achieve up to 90% of the optimal association policy, with similar results with partial exchange of information or perturbed environment.

The paper is organized as follows: The system model is presented in the ‘The system model’ section. The ‘Network resources’ section provides a description of the network scenario adopted throughout the paper. In the ‘Hierarchical game formulation’ section, we present the game theoretic framework and formally describe how the network manager and mobile users can obtain their respective equilibria by means of a Stackelberg formulation. In the ‘Learning for optimal decision’ section, the proposed hierarchical algorithm is investigated for both the leader and the followers. In the ‘Implementation and validation’ section, simulation results under realistic wireless network settings are shown to exhibit interesting features in terms of self-optimizing deployment for inter-cell interference coordination. The ‘Conclusion’ section concludes the paper.

Scenario description

The system model

Consider the downlink of a multi-cell system operating in an OFDMA context, which gives rise to inter-cell interference. Power control is used by the base stations in an effort to preserve power and to limit interference and fading effects. With the same goal of maximizing their payoff, mobile users try to connect to the best serving cell. Specifically, we consider $M = \{1, \dots, M\}$ as the set of all possible serving base stations (or cells) within the network and $K = \{1, \dots, K\}$ as a set of K mobile users randomly distributed over the network. Each cell operates in a multi-band context with N physical resource blocks (PRBs). Let $N = \{1, \dots, N\}$ be the set of N PRBs per cell. The strategy s_k of mobile user k is the choice of a PRB n at a given BS j, i.e., s_k = (j,n). Hence, the signal received by a mobile user k using strategy s_k depends not only on the BS transmit power but also on the interference introduced by the other cells. The signal-to-interference-plus-noise ratio (SINR) measured at user k associated with BS j can be expressed for all $j \in M$ and $k \in K$ as follows:

\mathrm{SINR}_{j,k} = \frac{h_{j,k} \cdot P(s_{k})}{σ^{2} + \sum_{l \neq k} h_{j',k} \cdot P(s_{l}) \cdot f(s_{k}, s_{l})},

(1)

where h_{j,k} is the block fading process measured at user k associated with BS j, P(s_l) is the power received from BS j^′ at PRB n^′ for mobile user l with s_l=(j^′,n^′), and σ² is the noise variance. The interference function f(s_k,s_l) is defined as follows:

f(s_{k}, s_{l}) = \begin{cases} 1, & \text{if } n = n' \\ 0, & \text{otherwise.} \end{cases}

(2)
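As an illustration of (1) and (2), the following minimal Python sketch evaluates the SINR of one user from a strategy profile. The data layout is an assumption made for the example and does not come from the paper: `strategies` maps each transmitting user to its pair (BS, PRB), `power[j][n]` stands for P(j,n), and `gain[j][k]` stands for h_{j,k}. Silent users are simply left out of `strategies`.

```python
def sinr(k, strategies, power, gain, noise):
    """SINR of user k following (1); all argument layouts are illustrative.

    strategies: dict user -> (bs, prb), i.e. s_k = (j, n)
    power:      power[j][n] plays the role of P(j, n)
    gain:       gain[j][k] plays the role of h_{j,k}
    noise:      noise variance sigma^2
    """
    j, n = strategies[k]
    interference = 0.0
    for l, (j_other, n_other) in strategies.items():
        if l == k:
            continue
        # Interference function (2): only users on the same PRB interfere.
        f = 1.0 if n_other == n else 0.0
        interference += gain[j_other][k] * power[j_other][n_other] * f
    return gain[j][k] * power[j][n] / (noise + interference)
```

With two users on the same PRB, each one sees the other's serving BS as an interferer; moving one of them to another PRB removes the interference term entirely.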

Network resources

A key example of dynamic resource allocation is power control, which serves as a means for both battery savings at the mobile and interference management in the network. Formally, in this work, we assume that the network manager optimizes its global utility by means of power control optimization. Let P be the (M×N) power control matrix whose element P(j,n) represents the power received from BS $j \in M$ at PRB $n \in N$. Given these optimized power levels P, mobile users choose the association actions that optimize their individual utilities. Notice that the maximization of the total throughput by the network manager is based on information sent by mobile users on the interference experienced from neighboring cells. We further assume that each base station can allocate a PRB to only one mobile user at a given time slot.

Hierarchical game formulation

We make use of a hierarchical equilibrium solution concept, i.e., the Stackelberg game, where the network manager acts as the leader and mobile users are the followers. In view of maximizing its utility, the leader enforces its strategy on the followers, which react rationally to this enforcement. A mobile user can decide to either transmit data or stay silent depending on its utility^b. We assume that each mobile k has a target SINR denoted by η_k, which reflects its required QoS. Let $H_{k} \subset M$ be the set of base stations within a radius r of user k such that $r \leq {(\frac{P_{max}}{η_{k} \cdot σ^{2}})}^{1 / β}$, where β is the path loss coefficient and P_max is the maximum power at each base station. The motivation is that, for computational purposes, one may consider only the subset $H_{k}$ rather than the original set $M$. Let w_k be the user’s strategy when the user decides to stay silent on a specific slot. Hence, the set of mobile user actions is $Ω_{k} = (H_{k} \times N) \cup \{w_{k}\}$. The mobile user utility function for each choice of strategy is the following:

v_{k}(s_{k}) = \begin{cases} (R_{j,k} + ϵ) \, \mathbb{1}_{\{\mathrm{SINR}_{j,k} > η_{k}\}} - ϵ, & \text{if } s_{k} \neq w_{k} \\ 0, & \text{otherwise} \end{cases}

(3)

where ϵ is a small positive value and R_{j,k} = log(1 + SINR_{j,k}) is the throughput of user k associated with BS j. This means that if a mobile user decides to transmit, it obtains either a utility equal to its transmission rate (R_{j,k}) or a negative utility (−ϵ), depending on its SINR. Otherwise, the mobile user stays silent (v_k(s_k) = 0). As a result, users who do not contribute enough utility to outweigh the interference degradation tend to remain silent. In order to provide the right balance between efficiency and fairness between cells, one possible remedy is to use the so-called α-fairness [10]. This guarantees that any point in which one BS is shut down cannot be a local maximum. The global utility can be expressed as follows:

U = \begin{cases} \frac{1}{1 - α} \sum_{j} U_{j}^{1 - α}, & \text{if } α \neq 1 \\ \sum_{j} \log(U_{j}), & \text{if } α = 1 \end{cases}

(4)

where $U_{j} = \sum_{k} R_{j, k}$ and α is the fairness parameter. The network manager is assumed to perfectly know the set of strategies and the utilities of the K mobile users. Similarly, it is guaranteed under this setting that the followers can observe the actions of the leader through the broadcast channel. Accordingly, the Stackelberg game can be formulated as follows:

\begin{aligned} P^{SE} = {} & \arg\max_{P} U(P(s^{NE})) \\ \text{s.t. } & \sum_{n = 1}^{N} P(j, n) \leq P_{max}, \quad \forall j \in M \end{aligned}

(5)

where s^NE is a Nash equilibrium among K mobiles considering the strategy of the leader.

Let $S = Ω_{1} \times \cdots \times Ω_{K}$ be the strategy space of our one-shot game and $s = (s_{k}, s_{-k})$ a strategy profile in the game.

Mathematically, the Nash equilibrium can be expressed by the following inequality for all association strategies $s \in S$ :

v_{k} (s_{k}, s_{- k}) \geq v_{k} (r_{k}, s_{- k}); \forall k = 1, \dots, K

(6)

for every $r_{k} \in Ω_{k}$ and $s_{-k} \in Ω_{-k}$, where $Ω_{-k} = Ω_{1} \times \cdots \times Ω_{k-1} \times Ω_{k+1} \times \cdots \times Ω_{K}$ is the joint feasible strategy space of all users but the k-th one.
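Condition (6) can be checked by brute force on small instances: a profile is a Nash equilibrium if no user gains by deviating unilaterally. The sketch below is only an illustration; the names `is_nash`, `action_sets`, and `utility` are hypothetical, and the utility is passed in as a callable so that any v_k, such as (3), can be plugged in.

```python
def is_nash(profile, action_sets, utility, tol=1e-12):
    """Return True if no user can improve v_k by a unilateral deviation.

    profile:     tuple of actions, one per user (s_1, ..., s_K)
    action_sets: action_sets[k] is the iterable of actions available to user k
    utility:     callable utility(k, profile) -> float, standing in for v_k
    """
    for k, actions in enumerate(action_sets):
        current = utility(k, profile)
        for r in actions:
            # Replace only user k's action; all other users keep s_{-k}.
            deviation = profile[:k] + (r,) + profile[k + 1:]
            if utility(k, deviation) > current + tol:
                return False
    return True
```

The cost of this check grows with the total number of actions, which is one reason a learning procedure is used instead of enumeration at the user level.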

Learning for optimal decision

The interaction between the leader and the followers provides a potential incentive for both agents to base their decision process on their respective perceived payoff. This section focuses on how to reach the Stackelberg equilibrium for both the leader and the followers. To solve the global optimization problem, a two-stage optimization algorithm is proposed. One difficulty in our context is that mobile users do not know each other’s payoffs (and thus strategies) at each stage. Thus, the environment of each mobile user, including its opponents, is dynamic and may not ensure convergence of the algorithm. In [11], the authors develop a Nash-Stackelberg fuzzy Q-learning in a heterogeneous cognitive network. As an alternative, we adopt a hierarchical algorithm. The proposed approach requires neighboring base stations to exchange load (or interference) information experienced at the user level at regular intervals. Consequently, the hierarchical algorithm relies on coordination at both the local (user level) and global (network level) scope, which could scale accordingly.

The two-stage learning algorithm is conducted in the following steps: First, every user reports to its serving base station the interference experienced from neighboring cells. Then, the interference information is exchanged between base stations over the X2 interface while they try to optimize the global network utility by means of power control. Based on these power levels (broadcast by the BSs), each user checks in a distributed manner whether its serving BS is still the best choice according to its utility. Otherwise, it can perform a handover to another RAN after checking that it can be admitted to it. As a result, this approach tends to substantially reduce the signaling overhead from the base stations.

Leader: gradient computation mechanism

In this section, we propose an operational way of computing the derivative of the global utility in a distributed fashion. At each time epoch, consider that each mobile user k reports to its base station j the matrix $(b_{k}^{1 j}, \dots, b_{k}^{Mj})$, where $b_{k}^{mj} = (P (m, n^{'}) \cdot h_{m, k}, n^{'} \in N)$ is the vector of interferences perceived by user k from base station m, or its signal strength on sub-band n when m = j. In the scope of this paper, and without loss of generality, we consider that only one user per base station can interfere with a user from a neighboring base station, and only if they use corresponding channels. Hence, adjacent channel interference is not included, and we assume that users are not allocated more than one PRB at a time. The vector $b_{k}^{mj}$ then reduces to a single interference value for each m. Base station j is then able to build the hypermatrix $B = \begin{pmatrix} b_{1}^{1 j} & \cdots & b_{K}^{1 j} \\ \vdots & \ddots & \vdots \\ b_{1}^{Mj} & \cdots & b_{K}^{Mj} \end{pmatrix}$ and send the matrices $a^{j} = (a_{k}^{j},\ k = 1, \dots, K)$, with $a_{k}^{j} = \sum_{m} b_{k}^{mj}$, and $b^{j} = \begin{pmatrix} b_{1}^{j^{'} j} & \cdots & b_{K}^{j^{'} j} \\ b_{1}^{jj} & \cdots & b_{K}^{jj} \end{pmatrix}$ to each base station j^′. Figure 2 depicts the flow of information. The exchange of information works as follows:

(1) The UE collects interference information from each neighbor over every allocated RB and forwards the obtained vector to its serving eNodeB.
(2) Once information is received at the eNodeB from all the attached users, an interference matrix is built that contains interference information from each neighbor (M neighbors in the figure).
(3) The interference load is then formatted for each neighbor.
(4) Finally, each neighbor receives over the X2 interface the interference information concerning all the covered mobiles.
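The aggregation performed at the eNodeB in steps (2) and (3) can be sketched as follows, assuming the single-value case described above in which each report reduces to one number per neighbor. The layout `b[m][k]` (power received by user k from BS m) and the function names are assumptions made for this example only.

```python
def total_received(b):
    """a_k = sum over m of b[m][k]: total power received by each user k."""
    n_bs, n_users = len(b), len(b[0])
    return [sum(b[m][k] for m in range(n_bs)) for k in range(n_users)]


def message_for_neighbor(b, j, j_prime):
    """Rows sent by BS j to neighbor j': interference from j' and own signal."""
    return [list(b[j_prime]), list(b[j])]


def sinr_from_reports(b, j, noise):
    """SINR of each user served by BS j, rebuilt from the reports alone."""
    a = total_received(b)
    # a[k] - b[j][k] is the interference part: everything except the serving BS.
    return [b[j][k] / (noise + a[k] - b[j][k]) for k in range(len(a))]
```

The point of the sketch is that the serving BS never needs the individual channel gains: the reported products P(m,n)·h_{m,k} are enough to rebuild both the SINR and the terms entering the derivative.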

The derivative of the utility from base station j computed here below is then obtained in base station j^′ for sub-band n by

\begin{aligned} \frac{\partial U_{j}(P(j', n))}{\partial P(j', n)} &= \sum_{i = 1}^{K} \left( \frac{b_{i}^{j' j} / P(j', n)}{σ^{2} + a_{i}^{j}} - \frac{b_{i}^{j' j} / P(j', n)}{σ^{2} + a_{i}^{j} - b_{i}^{jj}} \right) \quad \text{and} \\ \frac{\partial U_{j'}(P(j', n))}{\partial P(j', n)} &= \sum_{i = 1}^{K} \frac{h_{j', i}}{σ^{2} + a_{i}^{j'}}. \end{aligned}

(7)

So far, $\frac{\partial U}{\partial P (j^{'}, n)} = \sum_{j} U_{j}^{- α} \frac{\partial U_{j}}{\partial P (j^{'}, n)}$, which follows from applying the chain rule to (4). We then need to express $\frac{\partial U_{j}}{\partial P (j^{'}, n)}$ for all j, assuming we are considering cell j^′. Because the derivation is identical for every sub-band, we focus on one particular sub-band n. It can be easily shown that $\frac{\partial U_{j}}{\partial P (j^{'}, n)}$ is given by

\begin{cases} \frac{h_{j', k}}{σ^{2} + \sum_{m} P(m, n) \cdot h_{m, k}} - \frac{h_{j', k}}{σ^{2} + \sum_{m \neq j} P(m, n) \cdot h_{m, k}}, & \text{if } j \neq j' \\ \frac{h_{j, k}}{σ^{2} + \sum_{m} P(m, n) \cdot h_{m, k}}, & \text{if } j = j'. \end{cases}

The pseudo-code for the proposed gradient descent approach is given in Algorithm 1.

Algorithm 1

Note that gradient-like algorithms are common in optimization problems. The convergence of such algorithms has been shown in [12] under specific conditions, namely that the derivative of the objective function is Lipschitz continuous (which is satisfied here) and that the step size γ_t is chosen appropriately.
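To make the leader's update concrete, the following Python sketch computes the gradient of the α-fair utility (4) on a deliberately simplified toy model and applies one gradient step. It is not the authors' Algorithm 1: it assumes a single PRB, exactly one user per cell (user j served by BS j), a fixed step size instead of a line search, and a layout `h[m][j]` for the gain from BS m to user j. Since U is maximized, the step is an ascent on U, which is equivalent to a descent on −U.

```python
import math


def rates(P, h, noise):
    """U_j = log(1 + SINR_j) for user j served by BS j on one shared PRB."""
    M = len(P)
    out = []
    for j in range(M):
        total = sum(P[m] * h[m][j] for m in range(M))
        interference = total - P[j] * h[j][j]
        out.append(math.log(1.0 + P[j] * h[j][j] / (noise + interference)))
    return out


def alpha_fair(U, alpha):
    """Global utility (4)."""
    if alpha == 1:
        return sum(math.log(u) for u in U)
    return sum(u ** (1 - alpha) for u in U) / (1 - alpha)


def gradient(P, h, noise, alpha):
    """dU/dP(j') = sum_j U_j^(-alpha) * dU_j/dP(j') for every BS j'."""
    M = len(P)
    U = rates(P, h, noise)
    grad = [0.0] * M
    for j in range(M):
        total = sum(P[m] * h[m][j] for m in range(M))
        interference = total - P[j] * h[j][j]
        for jp in range(M):
            if jp == j:
                d = h[j][j] / (noise + total)
            else:
                d = h[jp][j] / (noise + total) - h[jp][j] / (noise + interference)
            grad[jp] += U[j] ** (-alpha) * d
    return grad


def ascent_step(P, h, noise, alpha, step):
    """One gradient step with a fixed, illustrative step size."""
    g = gradient(P, h, noise, alpha)
    return [p + step * gi for p, gi in zip(P, g)]
```

The updated powers may violate the budget in (5), so a step of this kind has to be followed by a feasibility correction.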

Proposition 1. The derivative of our utility function is Lipschitz continuous.

Proof.

If s_k≠w_k, the utility function is given by

U = \begin{cases} R_{jk} + ϵ, & \text{if } \mathrm{SINR}_{jk} > η_{k} \\ - ϵ, & \text{otherwise.} \end{cases}

Again, SINR_{jk} > η_k ⇒ U = R_{jk} + ϵ. Let us show that ∇U is Lipschitz continuous. We have $\frac{\partial U_{j}}{\partial P (j^{'}, n)} =$

\begin{cases} \frac{h_{j' k}}{σ^{2} + \sum_{m} P(m, n) h_{m, k}} - \frac{h_{j' k}}{σ^{2} + \sum_{m \neq j} P(m, n) h_{m, k}}, & \text{if } j \neq j' \\ \frac{h_{jk}}{σ^{2} + \sum_{m} P(m, n) h_{m, k}}, & \text{if } j = j'. \end{cases}

Then, for two values P₁ (j^′,n) and P₂ (j^′,n) of the power level on channel n, we obtain

∥ \frac{\partial U_{j}}{\partial P_{1}(j', n)} - \frac{\partial U_{j}}{\partial P_{2}(j', n)} ∥ = \begin{cases} ∥ \frac{h_{j' k}}{σ^{2} + \sum_{m \neq j, j'} P(m, n) h_{m, k} + P(j, n) h_{j, k} + P_{1}(j', n) h_{j', k}} - \frac{h_{j' k}}{σ^{2} + \sum_{m \neq j, j'} P(m, n) h_{m, k} + P_{1}(j', n) h_{j', k}} \\ \quad - \frac{h_{j' k}}{σ^{2} + \sum_{m \neq j, j'} P(m, n) h_{m, k} + P(j, n) h_{j, k} + P_{2}(j', n) h_{j', k}} + \frac{h_{j' k}}{σ^{2} + \sum_{m \neq j, j'} P(m, n) h_{m, k} + P_{2}(j', n) h_{j', k}} ∥, & \text{if } j \neq j' \\ ∥ \frac{h_{jk}}{σ^{2} + \sum_{m \neq j} P(m, n) h_{m, k} + P_{1}(j, n) h_{j, k}} - \frac{h_{jk}}{σ^{2} + \sum_{m \neq j} P(m, n) h_{m, k} + P_{2}(j, n) h_{j, k}} ∥, & \text{if } j = j' \end{cases}

if j = j^′, then

\begin{aligned} ∥ \frac{\partial U_{j}}{\partial P_{1}(j', n)} - \frac{\partial U_{j}}{\partial P_{2}(j', n)} ∥ &= ∥ \frac{P_{2}(j, n) - P_{1}(j, n)}{(C + P_{1}(j, n)) (C + P_{2}(j, n))} ∥ \\ &\leq ∥ \frac{1}{C^{2}} ∥ \, ∥ P_{2}(j, n) - P_{1}(j, n) ∥, \end{aligned}

where $C = σ^{2} / h_{j', k} + (1 / h_{j', k}) \sum_{m \neq j'} P(m, n) h_{m, k}$.

We have

\begin{aligned} C &= σ^{2} / h_{j, k} + \frac{1}{h_{j, k}} \sum_{m \neq j} P(m, n) h_{m, k} \\ &\geq σ^{2} + \sum_{m \neq j} P(m, n) h_{m, k} \quad \text{(since } h_{j, k} \text{ is an attenuation factor)} \\ &\geq σ^{2} \end{aligned}

so that

∥ \frac{\partial U_{j}}{\partial P_{1} (j^{'}, n)} - \frac{\partial U_{j}}{\partial P_{2} (j^{'}, n)} ∥ \leq \frac{1}{σ^{4}} ∥ P_{2} (j, n) - P_{1} (j, n) ∥

Similarly, if j ≠j^′, then

\begin{array}{l} ∥ \frac{\partial U_{j}}{\partial P_{1} (j^{'}, n)} - \frac{\partial U_{j}}{\partial P_{2} (j^{'}, n)} ∥ \\ = ∥ \frac{P_{2} (j^{'}, n) - P_{1} (j^{'}, n)}{(C + P_{1} (j^{'}, n) + \frac{P (j, n) h_{j, k}}{h_{j^{'}, k}}) (C + P_{2} (j^{'}, n) + \frac{P (j, n) h_{j, k}}{h_{j^{'}, k}})} \\ - \frac{P_{2} (j^{'}, n) - P_{1} (j^{'}, n)}{(C + P_{1} (j^{'}, n)) (C + P_{2} (j^{'}, n))} ∥ \\ = ∥ \frac{\frac{P (j, n) h_{j, k}}{h_{j^{'}, k}} [C + P_{1} (j^{'}, n) + P_{2} (j^{'}, n) + \frac{P (j, n) h_{j, k}}{h_{j^{'}, k}}]}{(C + P_{1} (j^{'}, n)) (C + P_{2} (j^{'}, n)) (C + P_{1} (j^{'}, n) + \frac{P (j, n) h_{j, k}}{h_{j^{'}, k}}) (C + P_{2} (j^{'}, n) + \frac{P (j, n) h_{j, k}}{h_{j^{'}, k}})} ∥ \\ \times ∥ P_{2} (j^{'}, n) - P_{1} (j^{'}, n) ∥ \\ \leq ∥ \frac{\frac{P (j, n) h_{j, k}}{h_{j^{'}, k}} [C + P_{1} (j^{'}, n) + P_{2} (j^{'}, n) + \frac{P (j, n) h_{j, k}}{h_{j^{'}, k}}]}{C^{4}} ∥ ∥ P_{2} (j^{'}, n) - P_{1} (j^{'}, n) ∥ \\ \leq ∥ \frac{P_{max} (C + 3 P_{max})}{C^{4}} ∥ ∥ P_{2} (j^{'}, n) - P_{1} (j^{'}, n) ∥ \\ \leq ∥ \frac{P_{max} (σ^{2} + (M + 2) P_{max})}{σ^{8}} ∥ ∥ P_{2} (j^{'}, n) - P_{1} (j^{'}, n) ∥ \end{array}

This ends the proof. □

In our computations, we use an implementation of the Wolfe line search to find an appropriate value of γ_t at each iteration.

On the other hand, at each iteration, the gradient algorithm delivers power vectors for each base station that may leave the allowed space. To handle this problem, we implement a computational mechanism to satisfy the power constraint in (5). We define the constraint c (P (j,n),P_max) to relax the problem, where c is built as follows. Define $ξ = \{n \in N \ \text{s.t.} \ P (j, n) > p_{th}\}$, where $p_{th}$ is a threshold value for every PRB n such that $\sum_{n} p_{th} = P_{max}$. Let $δ = P_{max} - \sum_{n \in N \setminus ξ} P (j, n) - \sum_{n \in ξ} p_{th}$. If ξ is non-empty and $\sum_{n} P (j, n) \geq P_{max}$, set the value of each P (j,n) using the projection $\bar{P} (j, n) = \min (\frac{δ}{|ξ|} + p_{th}, P (j, n))$. That is, the remaining power in each BS power budget, if any, is evenly shared among the channels requiring a power level above the threshold value.
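The projection just described can be written in a few lines. The sketch below follows the definitions of ξ, δ, and the capped power literally for one base station; the function name `project_power` and the list layout of the power row are assumptions made for the example.

```python
def project_power(p_row, p_max):
    """Project the power row of one BS onto the budget sum_n P(j, n) <= p_max.

    p_row: list of P(j, n) over the N PRBs of base station j
    p_max: power budget P_max of the base station
    """
    n_prb = len(p_row)
    p_th = p_max / n_prb          # threshold such that sum_n p_th = P_max
    xi = [n for n, p in enumerate(p_row) if p > p_th]
    if not xi or sum(p_row) < p_max:
        return list(p_row)        # nothing to correct
    below = sum(p for n, p in enumerate(p_row) if n not in xi)
    delta = p_max - below - len(xi) * p_th
    cap = delta / len(xi) + p_th  # even share of the remaining budget
    # Channels not in xi are already below the cap, so min() leaves them as is.
    return [min(cap, p) for p in p_row]
```

For example, with a budget of 80 over four PRBs, the row [50, 40, 5, 5] has threshold 20, ξ contains the first two PRBs, δ = 30, and the projected row is [35, 35, 5, 5], which uses the budget exactly. Since the capped channels sum to at most δ + |ξ|·p_th, the projected row never exceeds P_max.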

Followers: pursuit algorithm

At the user level of our Stackelberg framework, we use the pursuit algorithm as a tool to allow users to reach a Nash equilibrium iteratively and individually. The pursuit algorithm is a distributed association algorithm proposed in [13] that allows each individual in a set of players to select, among several strategies, the one that maximizes its utility within a limited number of iterations.

Algorithm 2

It has been proven in [13] that the pursuit algorithm always converges under some specific conditions on the step size parameter. The authors show that when the step size parameter is very small, the game converges to a stable equilibrium for the learning automata game. This algorithm has the property of converging to an extremum of the game when there exists a pure equilibrium. To reach a mixed equilibrium, the authors in [14] present a distributed algorithm that can be used in such situations. However, mixed equilibria are not efficient in our context since they would lead mobile users to perform handovers continuously between base stations. To avoid mixed equilibria, we introduce a cost of handovers in the utility function to give mobile users more incentive to reach pure equilibria.
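For readers unfamiliar with pursuit learning, the sketch below shows the generic form of a pursuit automaton for a single user: it keeps a probability vector over actions and a running reward estimate per action, and at every iteration moves the probability vector a small step toward the action with the best estimate. This is a textbook-style sketch written under our own assumptions (class name, initial exploration of every action once, running-average estimates, default step size); the exact Algorithm 2 of the paper and of [13] may differ in these details.

```python
import random


class PursuitLearner:
    """Minimal single-user pursuit automaton (illustrative, not Algorithm 2)."""

    def __init__(self, n_actions, step=0.05, seed=0):
        self.p = [1.0 / n_actions] * n_actions   # action probabilities
        self.estimate = [0.0] * n_actions        # running mean reward per action
        self.count = [0] * n_actions
        self.step = step                         # small step size parameter
        self.rng = random.Random(seed)

    def choose(self):
        # Assumption: try every action once before sampling from p.
        for action, c in enumerate(self.count):
            if c == 0:
                return action
        return self.rng.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, action, reward):
        self.count[action] += 1
        self.estimate[action] += (reward - self.estimate[action]) / self.count[action]
        best = max(range(len(self.p)), key=lambda a: self.estimate[a])
        # Pursue the currently best action: p <- p + step * (e_best - p).
        self.p = [(1.0 - self.step) * q + (self.step if a == best else 0.0)
                  for a, q in enumerate(self.p)]
```

In the game setting, each user would run one such learner over its action set Ω_k and use its own utility as the reward. The rewards then depend on the other users' choices, which is why a small step size matters for convergence.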

Discussion on cost of handover

As stated in the previous paragraph, a major weakness of the learning algorithm in mobile networks is the number of handovers, especially when the algorithm converges to mixed equilibria. We tackle this issue by introducing a cost of handover, in the form of a reward in the utility function for users who do not perform handovers. The user utility function is given by

v_{k}(s_{k}) = \begin{cases} (R_{j, k} + ϵ) \, \mathbb{1}_{\{\mathrm{SINR}_{j, k} > η_{k}\}} - ϵ + α_{h}, & \text{if } s_{k} \neq w_{k} \\ 0, & \text{otherwise} \end{cases}

(8)

where $α_{h} = β_{h} (1 - \mathbb{1}_{\{handoff\}})$ and β_h is a small positive value. In Figure 3, we compare the number of handovers with and without the defined handover control. In the figure, we can see that the handover control policy considerably decreases the number of handovers and that the system remains stable after a few iterations. Interestingly, we noticed in our simulations that users are more motivated to follow the handover control when the control is a reward rather than a penalty, as we suggested in the first place. The trade-off of using such a control can be seen on the utility side. As shown in Figure 4, the gap in utility between the two policies can be marginal.
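The utility (8) is simple to evaluate once the SINR is known. The short sketch below is an illustration only; the function name and the default values of `eps` and `beta_h` are arbitrary assumptions, not values taken from the paper. Setting `beta_h` to 0 recovers the utility (3).

```python
import math


def utility_with_handover(sinr, target, handoff, silent=False,
                          eps=1e-3, beta_h=0.05):
    """User utility (8): rate or -eps, plus a reward when no handover occurs.

    sinr:    SINR_{j,k} of the chosen (BS, PRB) pair
    target:  target SINR eta_k
    handoff: True if choosing this BS requires a handover
    silent:  True if the user plays the silent action w_k
    """
    if silent:
        return 0.0
    base = math.log(1.0 + sinr) if sinr > target else -eps
    alpha_h = beta_h * (0.0 if handoff else 1.0)
    return base + alpha_h
```

A user already served with a given rate therefore changes BS only if the new BS improves its rate by more than `beta_h`, which damps oscillations between nearly equivalent cells.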

Implementation and validation

To go further with the analysis, we resort to realistic network simulations. We consider a cellular radio network as described in Figure 1 where users are attempting to communicate during a downlink transmission, subject to mutual inter-cell interference. Specifically, a hexagonal cellular system operating at 1.8 GHz with a cell radius of R=200 m is considered. Note that this radius only stands for geographical positions in the network. It does not prevent users located outside this area from connecting to another base station in case of a significant connection opportunity. Channel gains are based on the COST-231 path loss model [15] including log-normal shadowing with a standard deviation of 10 dB, plus fast fading assumed to be i.i.d. circularly symmetric with distribution $\mathcal{CN}(0, 1)$. The peak power constraint is given by P_max=100 mW. We evaluate under those settings the joint processing of the gradient descent algorithm with the pursuit algorithm. Without loss of generality, we assume that every cell has the same number of users randomly positioned inside the cell. We consider a cluster of seven interfering cells, with ten PRBs each. The values of the other parameters are set in Table 1.

Table 1 Simulations settings

The iteration scale parameter in Table 1 reflects how frequently BSs update the gradient algorithm and set new power values. By tuning this parameter, one can control the amount of signaling between BSs. We consider in our simulations that users run 30 iterations of the association algorithm for 1 iteration of the gradient. We first build the framework for a fairness parameter α = 1, which represents the proportional fairness algorithm, and then extend it to different values of α.
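The role of the iteration scale can be pictured as a two-timescale loop: many cheap user-level iterations for each leader update that requires signaling between BSs. The sketch below is a schematic of that schedule only. The callables `follower_step` and `leader_step` are hypothetical placeholders for one pursuit iteration and one gradient update, and running the follower iterations before each leader update is an assumption about the ordering.

```python
def run_two_timescale(n_leader_updates, scale, follower_step, leader_step):
    """Run `scale` follower iterations for each leader (gradient) update.

    Returns the counts of follower iterations and leader updates; the latter
    is also the number of signaling rounds between base stations.
    """
    n_follower = 0
    n_leader = 0
    for _ in range(n_leader_updates):
        for _ in range(scale):
            follower_step()      # one association iteration for all users
            n_follower += 1
        leader_step()            # one gradient update, needs BS-to-BS exchange
        n_leader += 1
    return n_follower, n_leader
```

With a scale of 30, as used in the simulations, 300 user-level iterations trigger only 10 exchanges between base stations; a larger scale reduces signaling at the price of slower power adaptation.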

Dynamic fractional frequency reuse

In Figure 5, we illustrate a snapshot of the dynamic fractional frequency reuse pattern at the equilibrium. The small colored disks indicate the positions of users inside the cells, and the face colors are the frequencies used by those users. Disks are indexed with a pair of values (BS, power), where the first value represents the base station to which the user is connected and the second the power level assigned by the base station on that frequency. As expected, users close to each other are assigned different frequencies, and power levels are set accordingly to avoid a high level of interference. From the same figure, we also have an overview of user-network association. Indeed, many cases appear where users prefer to associate with a neighboring cell rather than with the cell where they are positioned, due to the influence of path loss and/or interference impairments. For instance, in Figure 5, the user indexed (2,7.5) in cell 2 is connected to BS 2 and is assigned frequency F2 with a high power level. This reflects the maximization goal of the gradient algorithm since frequency F2 is reused only once, by a user far away in cell 7 at a low power level.

Utility maximization

In Figure 6, we compare the proposed FFR algorithm with traditional fixed reuse patterns, namely, the full reuse. The exhaustive search algorithm, shown as a dashed line, considers all possible combinations of PRB selection given the power level of the gradient algorithm. This will thus serve as an optimal association solution for users and will demonstrate just how much gain may theoretically be exploited through the pursuit algorithm. It clearly appears that the joint gradient and pursuit algorithms perform better than the full reuse and reduce considerably the gap with the exhaustive search. As shown in Figure 6, we reach up to 90% of the overall network throughput compared to the exhaustive (optimal) association search.

Fairness issues

In this section, we show the impact of fairness on the global utility maximization by simulating different values of the α-fairness parameter. For α=0 (the maximum throughput algorithm), we can see from Figure 7 that some BSs (BS 6 for instance) are set to idle. Several other channels in the network are also switched off, while a small number of users are assigned very high power levels. This behavior was somewhat expected since using α=0 means that the major goal of the network is to maximize the overall network utility. Nevertheless, as shown in Figure 8, this policy does not tighten the gap to the exhaustive association search (71%) as much as using a value of α=1.

Further analysis of the max-min fairness policy (α → ∞) shows that most of the BSs are set to idle, and only a few channels are activated. Being too fair thus leads the network to follow the policy of highly loaded BSs, providing an overall network utility which is almost null. Finally, we plot in Figure 9 the network block call rate (BCR) against the number of iterations, obtained when users follow the strategy corresponding to the Stackelberg equilibrium. We can observe that the BCR is substantially reduced as the number of iterations increases. Moreover, the fairness policy has a negligible influence on the BCR, which remains below 10% for the different fairness policies.

Robustness and scalability

Next, we evaluate how seamlessly our algorithms adapt to a dynamic environment. We simulate a discrete-time system over several iterations and generate a burst of user arrivals at a specific time instant during the simulation. Indeed, while new arrivals generally occur every minute in cellular systems, our association algorithm converges on the order of a few milliseconds. This speed of reaction and adaptation is a strength of our hierarchical algorithm and is reflected in Figures 10, 11, and 12. We consider two rings of a small cell network in which each base station is equipped with four PRBs and serves three users, randomly positioned inside the cell at the beginning of the simulation. In this new setting, we assume that the algorithm iteration scale is 1 for 100. For each of the simulated schemes, we assume that each BS can exchange interference information only with a subset of all the interfering neighbors. This restriction helps us understand how the proposed scheme reacts when the amount of information exchanged between BSs is limited.

For the first (Figure 10), second (Figure 11), and third (Figure 12) scenarios, we consider that each BS exchanges data, respectively, with its first-ring neighbors, with only its two closest neighbors, and with all the interfering neighbors from the two considered rings. Comparing the different scenarios, we observe that more exchanged information leads, as one can intuitively expect, to a higher utility. However, although this may be attributable to randomness, a comparison of Figures 10 and 12 shows that system stability is not necessarily ensured by increasing the amount of exchanged information. On the other hand, even with a very small amount of exchanged information, our algorithm still converges to 97% of the exhaustive association search utility. From the same figures, we address the scalability of our algorithms by introducing a burst of new arrivals in the system at iteration 200. Although this event is not clearly captured in Figure 11, where less information is exchanged, we observe from Figures 10 and 12 that our mechanism adapts very quickly to the system evolution and reaches the new point of convergence.
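The three exchange scenarios can be sketched as three rules for selecting the set of neighbors a BS exchanges information with. The sketch assumes a regular hexagonal layout with unit inter-site distance, which the text does not specify; `hex_layout`, `exchange_set`, and the policy names are hypothetical helpers introduced for illustration.

```python
import math


def hex_layout(rings=2, isd=1.0):
    """BS positions on an assumed hexagonal grid with the given ring count."""
    sites = []
    for q in range(-rings, rings + 1):
        for r in range(-rings, rings + 1):
            if abs(q + r) <= rings:
                x = isd * (q + r / 2)
                y = isd * math.sqrt(3) / 2 * r
                sites.append((x, y))
    return sites


def exchange_set(bs, sites, policy, isd=1.0, k=2):
    """Indices of the BSs that BS `bs` exchanges interference data with."""
    others = sorted(
        (i for i in range(len(sites)) if i != bs),
        key=lambda i: math.dist(sites[bs], sites[i]),
    )
    if policy == "first_ring":  # scenario of Figure 10
        return [i for i in others
                if math.dist(sites[bs], sites[i]) <= 1.01 * isd]
    if policy == "closest":     # scenario of Figure 11 (k closest neighbors)
        return others[:k]
    if policy == "all":         # scenario of Figure 12
        return others
    raise ValueError(f"unknown policy: {policy}")


sites = hex_layout(rings=2)
center = min(range(len(sites)), key=lambda i: math.hypot(*sites[i]))
sizes = {p: len(exchange_set(center, sites, p))
         for p in ("closest", "first_ring", "all")}
print(sizes)
```

For the central BS of a two-ring layout, the three policies give nested neighbor sets of increasing size, which is one way to read the increasing amount of exchanged information across the three scenarios.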

Conclusion

In this paper, we have investigated the idea of a hierarchical learning game for fractional frequency reuse in an OFDMA network. In this framework, both the network manager and mobile users learn to reach an equilibrium that optimizes the global network utility while ensuring individual utility optimization for mobile users. We have first formally proposed a game model that defines how the network manager and mobile users can obtain their respective equilibria by means of a Stackelberg formulation. Then, we have presented a two-stage learning algorithm for finding a Stackelberg equilibrium and the corresponding mobiles’ association strategies. Practical directions for the implementation of our solution are also presented. Using several numerical examples, we have shown the efficiency of the obtained equilibrium compared to the exhaustive (optimal) solution and to a fixed full frequency reuse pattern. In particular, in the case of the proportional fair policy, the proposed FFR approach achieves approximately 90% of the performance of the optimal association policy and a gain of 40% with respect to fixed full reuse. For implementation purposes, and in order to adapt to the dynamics of the mobile environment, the number of iterations before convergence should remain on the order of a few tens. Finally, we have addressed interesting issues such as fairness, robustness, and scalability and offered insights into how to design such scenarios in a wireless network environment.

Endnotes

^a The work reported herein was partially supported by the Ecocells project.

^b Though some users stay silent, they may be active during the next scheduling period.

References

3rd Generation Partnership Project (3GPP): TR 36.902, Evolved Universal Terrestrial Radio Access Network (EUTRAN), Self-configuring and self-optimizing network (SON) use cases and solutions. Dec 2008. http://www.etsi.org/deliver/etsi_tr/136900_136999/136902/09.03.01_60/tr_136902v090301p.pdf Accessed May 2011
Stolyar AL, Viswanathan H: Self-organizing dynamic fractional frequency reuse for best-effort traffic through distributed inter-cell coordination. In Proc. IEEE INFOCOM. Rio de Janeiro; April 2009:19-25.
NGMN Alliance: NGMN Recommendation on SON and O&M Requirements. Edited by: Lehser F. Frankfurt; 5 Dec 2008. http://www.ngmn.org/uploads/media/NGMN_Recommendation_on_SON_and_O_M_Requirements.pdf Accessed 23 April 2013
Combes R, Altman Z, Haddad M, Altman E: Self-optimizing strategies for interference coordination in OFDMA networks. In IEEE ICC 2011 Workshop on Planning and Optimization of Wireless Communication Networks. Kyoto, Japan; June 2011:5-9.
Giuliano R, Monti C, Loreti P: WiMAX fractional frequency reuse for rural environments. In IEEE ICC 2011 Workshop on Planning and Optimization of Wireless Communication Networks. Washington: IEEE; 2008:60-65.
Son K, Chong S, de Veciana G: Dynamic association for load balancing and interference avoidance in multi-cell networks. In IEEE ICC 2011 Workshop on Planning and Optimization of Wireless Communication Networks. Washington: IEEE; 2009:3566-3576.
R1-050507: ‘Soft Frequency Reuse Scheme for UTRAN LTE’. In Huawei, 3GPP TSG RAN WG1 Meeting. Athens, Greece; 2005:9-13.
Fudenberg D, Tirole J: Game Theory. Cambridge: MIT Press; 1991.
Sidi HBA, El-Azouzi R, Haddad M: Fractional frequency reuse stackelberg model for self-organizing networks. In Wireless Days (WD), 2011 IFIP. Ontario, Canada; 2011:1-6. http://dx.doi.org/10.1109/WD.2011.6098171
Mo J, Walrand J: Fair end-to-end window-based congestion control. IEEE/ACM Trans. Netw. 2000, 8: 556-567. http://dx.doi.org/10.1109/90.879343
Haddad M, Altman Z, Elayoubi SE, Altman E: A Nash-Stackelberg Fuzzy Q-Learning Decision Approach in Heterogeneous Cognitive Networks. In Global Telecommunications Conference (GLOBECOM 2010), 2010 IEEE. Miami; 2010:1-6. http://dx.doi.org/10.1109/GLOCOM.2010.5684318
Bertsekas DP, Tsitsiklis JN: Parallel and distributed computation: numerical methods. http://dspace.mit.edu/handle/1721.1/3719 Accessed 23 April 2013
Thathachar M, Sastry P: Network of Learning Automata: Techniques for Online Stochastic Optimization. New York: Kluwer Academic; 2004.
Xing Y, Chandramouli R: Stochastic learning solution for distributed discrete power control game in wireless data networks. IEEE/ACM Trans. Netw. 2008, 16(4):932-944. http://doi.acm.org/10.1145/1453698.1453713
EURO-COST Std. 231: Urban transmission loss models for mobile radio in the 900 and 1800 MHz bands. In European Cooperation in the Field of Scientific and Technical Research. Luxembourg: Commission of European Communities; 1991.

Author information

Authors and Affiliations

CERI/LIA, University of Avignon, Agroparc BP 1228, Avignon, 84911, France
Habib BA Sidi, Rachid El-Azouzi & Majed Haddad

Corresponding author

Correspondence to Habib BA Sidi.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Sidi, H.B., El-Azouzi, R. & Haddad, M. A two-stage game theoretic approach for self-organizing networks. J Wireless Com Network 2013, 119 (2013). https://doi.org/10.1186/1687-1499-2013-119

Received: 10 August 2012
Accepted: 15 March 2013
Published: 02 May 2013
DOI: https://doi.org/10.1186/1687-1499-2013-119
