Subcarrier allocation based on correlated equilibrium in multi-cell OFDMA systems

In uplink orthogonal frequency division multiple access (OFDMA) systems, efficient resource allocation can greatly improve system performance. In this article, taking throughput, inter-cell interference and complexity into account, we present a game-theoretical approach to distributed subcarrier allocation for multi-cell OFDMA systems with limited base station (BS) coordination. First, we construct a multi-cell resource allocation game in which each subcarrier is viewed as a game player that chooses its most satisfying user, while the BS acts as a referee or coordinator. Then, we introduce the correlated equilibrium, which helps the non-cooperative players coordinate their strategies and hence achieves better performance than the Nash equilibrium. In particular, we point out the condition under which the correlated equilibrium is Pareto efficient. Moreover, we propose a novel subcarrier allocation algorithm based on the no-regret procedure that guarantees convergence to the set of correlated equilibria, in which the BS coordinates the players' strategies and only partial information exchange is required. Extensive simulation results are provided to demonstrate the effectiveness of the proposed algorithm.


Introduction
Orthogonal frequency division multiple access (OFDMA) has emerged as one of the most promising multiple access techniques for high data rate transmission over wireless channels due to its ability to mitigate multipath fading and its efficient implementation using IFFT and FFT blocks. The most recently proposed next generation wireless communication technologies, such as wireless wide area network (WWAN) standards, 3GPP2 ultra mobile broadband (UMB), IEEE 802.20 mobile broadband wireless access (MBWA), 3GPP LTE and worldwide interoperability for microwave access (WiMAX) are all OFDMA based [1].
In an OFDMA system, the spectrum is orthogonally divided into time-frequency resource blocks (RBs), which increases flexibility in resource allocation, thereby allowing high spectral efficiency. Exploiting all RBs simultaneously in every cell to achieve universal frequency reuse has become a key objective toward the deployment of 4G networks [2]. Under the universal frequency reuse scheme, inter-cell interference is a major impairment that limits the system throughput [3]. In a multi-cell environment, one of the major research issues is how to maximize performance by controlling the co-channel interference among neighboring cells [4]. Interference coordination can be fulfilled by allocating system resources with interference awareness in terms of frequency, time, transmit power, space, etc. [5,6]. Because radio resources in cellular networks are limited and precious, interference-aware resource allocation is a challenging problem and has received much attention from both the research and standardization communities in recent years [5-9]. Moreover, because any change of resource allocation in a specific cell affects the performance of the nearby cells, joint resource allocation considering both throughput and interference over a cluster of neighboring cells via BS coordination is a promising solution.
*Correspondence: longxingren.zjc.s@163.com. Institute of Communications Engineering, PLA University of Science and Technology, Nanjing, China
Recently, BS coordination, where neighboring BSs connected through high-speed wireline links share only channel state information (CSI) and jointly compute their transmit power and user scheduling, has been proposed as a major technique to mitigate co-channel interference, since it shifts the signal processing burden to the BSs [10]. Much work has been done on coordinated resource allocation in cellular wireless networks, including both centralized and distributed procedures [11]. Centralized algorithms (e.g., [12-14]) require global information to decide the user assignment and transmit power in each cell. The problem is often formulated as an optimization task subject to bit rate, power level, or other types of constraints [5,8]. However, since most of these optimization problems in cellular networks can be proved to be NP-hard [15,16] (as mixed-integer nonlinear problems), standard optimization techniques do not apply directly, and even centralized algorithms cannot guarantee that the globally optimal solution is found. In addition, even if the computational issues were resolved, the optimal solution would still require a central controller updated with instantaneous inter-cell channel gains, which would create serious signaling overhead and thus hinder practical deployment [17]. Consequently, distributed algorithms (e.g., [11,18-20]) are more attractive, as they do not require a central controller and may demand less information exchange and computational complexity.
http://jwcn.eurasipjournals.com/content/2012/1/233
Game theory, which is the natural paradigm for analyzing decentralized frameworks, has recently been adopted by many researchers to seek a satisfactory solution to the problem of resource allocation and/or interference coordination [4,21-23]. Kwon and Le [4] design a utility function that represents the weighted sum of the data rates and the power consumption in a cell. The problem of maximizing the utility under the maximum power constraint is modeled as a noncooperative resource allocation game, in which the BS is viewed as the game player. Liang et al. [21] focus on the adaptive allocation of subcarriers, bits, and power among the BSs of a downlink multi-cell OFDMA system. The utility function takes both data rate and power consumption into account. However, the authors do not formulate the problem from the perspective of interference mitigation. In [22], a noncooperative game is proposed in which each user selfishly tries to minimize its own transmit power subject to a transmission rate constraint. Nevertheless, the proposed game is not guaranteed to converge to a Nash equilibrium, and therefore a virtual referee is introduced to monitor the resource allocation and force it to a stable and efficient equilibrium point. Al-Zahrani et al. [23] consider a transmit power adaptation method using a noncooperative game theory approach to reduce the inter-cell interference in OFDM networks. The throughput is enhanced by finding the optimum transmit power for each co-channel user using a game theory-based scheme. However, no subcarrier allocation is discussed.
The existing works based on game theory mainly concentrate on power control, while subcarrier allocation is more or less simplified. Moreover, power allocation is comparatively easy to solve with continuous game methods, whereas the discrete game that applies to subcarrier allocation is much harder and has been considered in few works. Thus, in this work we make a game-theoretic study of distributed subcarrier allocation in uplink multi-cell OFDMA systems. Note that a purely non-cooperative game may result in non-convergence or in undesirable Nash equilibria with low system and individual performance. To enhance the performance, we introduce an important generalization of the Nash equilibrium, known as the correlated equilibrium, which is preferable to the Nash equilibrium since it directly considers the ability of agents to coordinate their actions. This coordination can lead to better performance than if each agent were required to act in isolation [24].
The main contributions of this article are summarized as follows:
• We formulate the subcarrier allocation problem from a novel point of view in which each subcarrier acts as a game player that chooses its most satisfying user, which guarantees fairness from the perspective of the subcarriers. This differs from traditional subcarrier allocation, in which the subcarriers are allocated passively.
• An efficient distributed learning algorithm is developed to perform subcarrier allocation for the multi-cell scenario. It achieves good performance while jointly considering throughput, interference and fairness. The proposed algorithm exhibits low complexity and converges to the set of correlated equilibria with probability one.
• In general, the outcomes of individual optimization might not always be as good as those of system optimization. To solve this problem, the BS is introduced as a referee or coordinator, which is in charge of monitoring and improving the outcome of the non-cooperative competition among the distributed players. Thus, strictly speaking, the approach is limited cooperation among BSs by means of distributed algorithms, which is recognized as a good tradeoff between the performance gain and the relevant cost, considering that such algorithms demand less information exchange and computational complexity.
The rest of this article is organized as follows. In Section "System model and problem formulation", we present the system model and a novel utility function considering both throughput and interference. In Section "Correlated equilibrium for joint strategy selection", we study the correlated equilibrium. Then, we construct a distributed subcarrier allocation algorithm based on no-regret procedure in Section "Distributed learning algorithm for joint strategy selection" and prove that the algorithm converges to a set of correlated equilibria. Simulation results are shown in Section "Simulation results and analysis" and finally conclusions are drawn in Section "Conclusion".

System model
We consider the uplink of a multi-cell OFDMA system which consists of a set of $L$ BSs denoted by $\mathcal{L} = \{1, 2, \ldots, l, \ldots, L\}$, as shown in Figure 1. Neighboring BSs connected through high-speed wireline links can be regarded as a BS pool which is managed by a joint central BS controller, as in [25]. The available spectrum is divided into $K$ subchannels. Denote the index sets of all users and all subcarriers as $\mathcal{N} = \{1, 2, \ldots, N\}$ and $\mathcal{K} = \{1, 2, \ldots, K\}$, respectively, and let $\mathcal{N}_l$ be the set of users in cell $l$. Users and BSs are equipped with one transmit and one receive antenna, respectively. We define the channel gain matrix $G \in \mathbb{R}^{N \times N \times K}$, where $g_{ij}^k$ gives the channel gain between the transmitter of user $i$ and the receiver of user $j$ when transmission is made through subcarrier $k$. In general, $g_{ij}^k \neq g_{ji}^k$. $g_{ii}^k$ denotes the channel gain between the transmitter of user $i$ and the BS over subcarrier $k$. Similarly, the transmission power matrix is denoted by $P \in \mathbb{R}^{N \times K}$, whose element $p_{ik}$ is the transmit power of user $i$ over subcarrier $k$, which must be non-negative. The total power transmitted by user $i$ should be less than $P_i^{\max}$. In addition, the following assumptions are made:
(1) In each single-cell OFDMA network, $K$ is always much larger than the number of users and no subcarrier can simultaneously support transmission for more than one user. Hence, all the users can simultaneously transmit data to the BS on one or more subcarriers without interfering with others.
(2) Each user is served by only one BS, which is located in its cell. Thus, $\mathcal{N}_l \cap \mathcal{N}_{l'} = \emptyset$ for $l \neq l'$.
(3) The bandwidth of each subchannel is less than the coherence bandwidth of the channel, so that each subcarrier experiences flat fading.
(4) The subcarriers are perfectly orthogonal such that no intersymbol interference between adjacent symbols occurs.
(5) Perfect synchronization is assumed, so that there is no inter-subcarrier interference.
(6) The BS periodically estimates the uplink channel gains on all subcarriers for all the users through pilot signals, and all the CSI needed can be accurately tracked by the BS.
(7) The network is geographically static in the sense that the time scale of algorithm convergence is shorter than the channel's coherence time. Thus, the channel gains on the subcarriers remain unchanged during one implementation of the algorithm [26].

Problem formulation
In this section, we model the subcarrier allocation as a multi-player, discrete, finite-strategy game, in which the subcarriers are considered as players. There is therefore a shift in perspective from a user's view of allocation to a subcarrier's view, and the subcarriers can choose the most satisfying users for themselves. This means that fairness from the perspective of the subcarriers can be ensured. Each BS is assumed to have access to all available subcarriers, i.e., the frequency reuse factor is 1. Consequently, each BS has $K$ available subcarriers. Note that different cells have different user sets, which means that the same subcarrier in different cells has different strategy sets; the same subcarrier in different cells should therefore be treated as different players. Thus, we denote subcarrier $k$ in cell $l$ by $k_l$. The strategy of player $k_l$ is denoted by $S_{k_l}$, while the joint strategy of the opponents of player $k_l$ is denoted by $S_{-k_l}$. Hence, $S = (S_{k_l}, S_{-k_l})$ is the joint strategy of all players, also known as a strategy profile. $U_{k_l}(S_{k_l}, S_{-k_l})$ or $U_{k_l}(S)$ denotes the utility function of player $k_l$.
Taking both throughput and inter-cell interference into consideration, we define the utility function similar to [27]: (1) where $i = S_{k_l}$ and $j = S_{k_{l'}}$ denote the users chosen by subcarriers $k_l$ and $k_{l'}$, respectively. Obviously, user $i$ and user $j$ are in different cells. Subcarriers $k_l$ and $k_{l'}$ occupy the same frequency band and are allocated by different BSs; $l$ and $l'$ denote the cell indices.
The utility function of each subcarrier is designed based on the profit of the user that obtains the subcarrier and from the system optimization point of view. In effect, users act as the spokespersons of the subcarriers that they obtain. The first part of the utility function denotes the profit of the user on subcarrier $k_l$, which is related to throughput, while the second part indicates the total interference it receives from the neighboring cells. Hence, an increase in utility indicates an improvement in throughput and a decrease in interference. The objective is to maximize the throughput and minimize the interference simultaneously. All subcarriers compete for the most suitable user assignment under the coordination of the BSs in order to maximize their utility functions. This problem is given by
$$\max_{S_{k_l} \in \Omega_{k_l}} U_{k_l}(S_{k_l}, S_{-k_l}), \quad \forall k_l \in \mathcal{K}_l,\ \forall l \in \mathcal{L},$$
which can be solved by modeling a game
$$\mathcal{G} = \left[ \mathcal{K}_l, \{\Omega_{k_l}\}, \{U_{k_l}\} \right],$$
where the components of the game are as follows:
(1) $\mathcal{K}_l = \{1, 2, \ldots, K\}$ is the index set of the players in cell $l$ (we use player and subcarrier interchangeably). The same subcarrier in different cells acts as different players, each making its own decisions. Therefore, the total number of players is $KL$.
(2) $\Omega_{k_l}$ is the strategy space of player $k_l$. Obviously, $\Omega_{k_l} = \mathcal{N}_l$. Therefore, the space of joint strategy profiles is defined by $\mathcal{S} = \Omega_1 \times \Omega_2 \times \cdots \times \Omega_{KL}$.
(3) $U_{k_l}: \mathcal{S} \to \mathbb{R}$ is the individual utility, mapping the joint strategy space to the set of real numbers.
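As a rough illustration of the game components listed above, the following Python sketch represents each player as a (cell, subcarrier) pair whose strategy is a user of its own cell. The utility used here, the received signal power of the chosen user minus the co-channel interference from the other cells, is only an assumed form that matches the verbal description of the two parts of the utility; it is not claimed to be the exact expression of Equation (1). The names `utility`, `power` and `gain` and all numerical values are hypothetical.

```python
from typing import Dict, Tuple

Player = Tuple[int, int]       # (cell index l, subcarrier index k), i.e. player k_l
Profile = Dict[Player, int]    # maps each player to the user it selects


def utility(player: Player, profile: Profile,
            power: Dict[Tuple[int, int], float],
            gain: Dict[Tuple[int, int, int], float]) -> float:
    """Assumed utility: signal power of the chosen user minus the
    interference caused by co-channel users selected in other cells.

    power[(user, k)]: transmit power of `user` on subcarrier k
    gain[(i, j, k)]:  gain from the transmitter of user i to the
                      receiver serving user j on subcarrier k
    """
    cell, k = player
    i = profile[player]
    signal = power[(i, k)] * gain[(i, i, k)]
    interference = 0.0
    for (other_cell, other_k), j in profile.items():
        if other_k == k and other_cell != cell:
            interference += power[(j, k)] * gain[(j, i, k)]
    return signal - interference


# Toy example: two cells, one subcarrier, one candidate user per cell.
power = {(0, 0): 0.1, (1, 0): 0.1}
gain = {(0, 0, 0): 1e-6, (1, 1, 0): 1e-6, (1, 0, 0): 1e-8, (0, 1, 0): 2e-8}
profile = {(0, 0): 0, (1, 0): 1}   # cell 0 picks user 0, cell 1 picks user 1
u0 = utility((0, 0), profile, power, gain)
u1 = utility((1, 0), profile, power, gain)
```

In this toy setting a player's utility drops whenever the co-channel player in the other cell selects a user with a strong cross gain, which is the coupling that the game is meant to capture.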

Remark 1.
Although the subcarriers can decide which user to choose at their will, they have no capability of performing strategy selection. Therefore, the subcarriers are virtual game players actually. In essence, the game is managed by BSs who act as the coordinators or referees, and BSs allocate the subcarriers according to the equilibrium point of the game.

Correlated equilibrium for joint strategy selection
In order to analyze the outcome of the proposed game, we focus on an important generalization of the Nash equilibrium, known as the correlated equilibrium. In a correlated equilibrium, a strategy profile is chosen randomly according to a certain distribution given to the players by some "coordinator" or "referee". Each player is privately given instructions for its own play only, and the joint distribution is known to all of them. If it is in every player's best interest to conform to the recommended strategy, the distribution is called a correlated equilibrium [28].
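Correlated equilibrium
Definition 1. [29]: For the proposed game $\mathcal{G}$, a joint probability distribution $p$ over the strategy space $\mathcal{S} = \Omega_1 \times \Omega_2 \times \cdots \times \Omega_{KL}$ is a correlated equilibrium if and only if, for all $k_l \in \mathcal{K}_l$ and all $S_{k_l}, S'_{k_l} \in \Omega_{k_l}$,
$$\sum_{S_{-k_l} \in \Omega_{-k_l}} p(S_{k_l}, S_{-k_l}) \left[ U_{k_l}(S'_{k_l}, S_{-k_l}) - U_{k_l}(S_{k_l}, S_{-k_l}) \right] \le 0.$$
The inequality means that when the recommendation to player $k_l$ is to choose action $S_{k_l}$, choosing any other action instead cannot yield a higher expected utility.

Theorem 1.
A correlated equilibrium exists in the proposed game $\mathcal{G}$.
Proof. The result in [30] shows that every finite game has a correlated equilibrium. Since $\mathcal{G}$ has finitely many players and strategies, Theorem 1 is justified, which enables the application of the proposed game.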
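To make the correlated equilibrium condition concrete, the sketch below checks the inequality of Definition 1 for a small finite game. It is only an illustration under stated assumptions: the function name `is_correlated_equilibrium` is hypothetical, and the two-player game of chicken used as the example (with its payoff numbers) is a standard textbook game, not part of the subcarrier allocation model.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Profile = Tuple[int, ...]


def is_correlated_equilibrium(
    dist: Dict[Profile, float],
    strategies: Sequence[List[int]],
    utility: Callable[[int, Profile], float],
    tol: float = 1e-9,
) -> bool:
    """Check the correlated equilibrium inequalities of a finite game.

    dist maps joint strategy profiles to probabilities; strategies[i]
    lists the strategies of player i; utility(i, profile) is player i's payoff.
    """
    for player, options in enumerate(strategies):
        for recommended, deviation in product(options, options):
            if recommended == deviation:
                continue
            expected_gain = 0.0
            for profile, prob in dist.items():
                if profile[player] != recommended:
                    continue
                deviated = profile[:player] + (deviation,) + profile[player + 1:]
                expected_gain += prob * (utility(player, deviated)
                                         - utility(player, profile))
            if expected_gain > tol:   # deviating from the recommendation pays off
                return False
    return True


# Game of chicken: strategy 0 = dare, 1 = chicken (illustrative payoffs).
payoffs = {(0, 0): (0, 0), (0, 1): (7, 2), (1, 0): (2, 7), (1, 1): (6, 6)}
chicken_utility = lambda player, profile: payoffs[profile][player]
strategies = [[0, 1], [0, 1]]

mixed = {(0, 1): 1 / 3, (1, 0): 1 / 3, (1, 1): 1 / 3}
always_chicken = {(1, 1): 1.0}
mixed_is_ce = is_correlated_equilibrium(mixed, strategies, chicken_utility)
chicken_is_ce = is_correlated_equilibrium(always_chicken, strategies, chicken_utility)
```

The first distribution never recommends that both players dare, so no player gains by disobeying. The second fails because a player told to play chicken against a sure chicken would rather dare.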

Remark 2.
The set of correlated equilibria of $\mathcal{G}$ is nonempty, closed and convex. In fact, every Nash equilibrium is a correlated equilibrium, and a Nash equilibrium corresponds to the special case where the actions of the different players are independent, i.e., $p(S_{k_l}, S_{-k_l}) = p(S_1) \times \cdots \times p(S_k) \times \cdots \times p(S_{KL})$. Moreover, the set of correlated equilibrium distributions of $\mathcal{G}$ is a convex polytope and the Nash equilibria all lie on the boundary of the polytope [31].

Optimal correlated equilibrium
The correlated equilibria define a set of solutions that can outperform the Nash equilibrium, but which one is the most suitable should be carefully considered in practical design. Altman et al. [32,33] discussed the criterion of the optimal correlated equilibrium. Han et al. [29] proposed two refinements. The first one is the maximum sum correlated equilibrium, which maximizes the sum of the utilities of the players. The second one is the max-min fair correlated equilibrium, which aims to improve the situation of the worst-off player. The problem can be formulated as a linear program in which $E_p(\cdot)$ denotes the expectation over $p$ and the constraints guarantee that the solution lies within the correlated equilibrium set.
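One common way to write the maximum sum refinement as a linear program over the joint distribution $p$ is sketched below. This is an assumed layout based on the general correlated equilibrium conditions for finite games, and the exact form used in [29] may differ. Here $\Omega_{k_l}$ denotes the strategy set of player $k_l$, $\mathcal{S}$ the joint strategy space, and $E_p(U_{k_l}) = \sum_{S \in \mathcal{S}} p(S) U_{k_l}(S)$.

```latex
\begin{aligned}
\max_{p}\quad & \sum_{l \in \mathcal{L}} \sum_{k_l \in \mathcal{K}_l} E_p\left(U_{k_l}\right) \\
\text{s.t.}\quad & p(S) \ge 0 \quad \forall S \in \mathcal{S}, \qquad \sum_{S \in \mathcal{S}} p(S) = 1, \\
& \sum_{S_{-k_l} \in \Omega_{-k_l}} p(S_{k_l}, S_{-k_l})
  \left[ U_{k_l}(S'_{k_l}, S_{-k_l}) - U_{k_l}(S_{k_l}, S_{-k_l}) \right] \le 0
  \quad \forall k_l,\ \forall S_{k_l}, S'_{k_l} \in \Omega_{k_l}.
\end{aligned}
```

Both the objective and the constraints are linear in the probabilities $p(S)$, which is why the problem is a linear program even though the underlying game is discrete.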

Theorem 2.
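In the proposed game $\mathcal{G}$, the correlated equilibrium which maximizes the expected sum of utilities of the subcarriers, $P^*$, is Pareto efficient.
Proof. If the resulting correlated equilibrium $P^*$ is not Pareto efficient, there must exist a different probability distribution $P'$ such that $\sum_{S \in \mathcal{S}} P'(S) U_{k_l}(S) \ge \sum_{S \in \mathcal{S}} P^*(S) U_{k_l}(S)$, $\forall k_l \in \mathcal{K}_l$, $\forall l \in \mathcal{L}$, and $\sum_{S \in \mathcal{S}} P'(S) U_{k_l}(S) > \sum_{S \in \mathcal{S}} P^*(S) U_{k_l}(S)$ for some $k_l$. Thus $\sum_{l \in \mathcal{L}} \sum_{k_l \in \mathcal{K}_l} \sum_{S \in \mathcal{S}} P'(S) U_{k_l}(S) > \sum_{l \in \mathcal{L}} \sum_{k_l \in \mathcal{K}_l} \sum_{S \in \mathcal{S}} P^*(S) U_{k_l}(S)$, which contradicts the fact that $P^*$ is the optimal solution. The proof is completed.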

Algorithm description
In this section, we present a distributed learning algorithm which always leads to the set of correlated equilibria. From the result, each player can independently determine its own cooperative strategy. Concretely, the proposed algorithm is based on the no-regret procedure of [29]. In this procedure, players may depart from their current play with probabilities that are proportional to measures of regret for not having used other strategies in the past.
The learning algorithm is executed independently by each virtual player, coordinated by the BSs and summarized as follows.

Utility update
For all $l \in \mathcal{L}$, each player $k_l \in \mathcal{K}_l$ calculates the utility of its current strategy $S_{k_l} \in \Omega_{k_l}$ and the utility of choosing a different strategy $S'_{k_l} \in \Omega_{k_l}$.

Regret value update
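If player $k_l$ had replaced strategy $S_{k_l}$, every time that it was played in the past, by the different strategy $S'_{k_l}$, the resulting difference in $k_l$'s average utility up to time $n$ is
$$D_{k_l}^n(S_{k_l}, S'_{k_l}) = \frac{1}{n} \sum_{\tau \le n:\, S_{k_l}^{(\tau)} = S_{k_l}} \left[ U_{k_l}(S'_{k_l}, S_{-k_l}^{(\tau)}) - U_{k_l}(S_{k_l}^{(\tau)}, S_{-k_l}^{(\tau)}) \right],$$
where $(S_{k_l}^{(\tau)}, S_{-k_l}^{(\tau)})$ denotes the strategy profile chosen at time $\tau$. Then,
$$R_{k_l}^n(S_{k_l}, S'_{k_l}) = \max\left\{ D_{k_l}^n(S_{k_l}, S'_{k_l}),\ 0 \right\},$$
where $R_{k_l}^n(S_{k_l}, S'_{k_l})$ represents the average regret value at time $n$ for not having played, every time that $S_{k_l}$ was played in the past, the different strategy $S'_{k_l}$.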
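The regret update can be sketched in a few lines of Python. This is a minimal illustration in the style of the Hart and Mas-Colell regret definition, not the authors' implementation; the function `average_regret`, the history layout and the toy utility are assumptions made for the example.

```python
from typing import Callable, List, Tuple

# One history entry: (own strategy, tuple of the opponents' strategies)
Step = Tuple[int, Tuple[int, ...]]


def average_regret(history: List[Step],
                   utility: Callable[[int, Tuple[int, ...]], float],
                   played: int, alternative: int) -> float:
    """Average regret for not having played `alternative` every time
    `played` was chosen, clipped at zero."""
    n = len(history)
    if n == 0:
        return 0.0
    # Sum the utility differences over the periods in which `played` was used.
    diff = sum(utility(alternative, others) - utility(own, others)
               for own, others in history if own == played)
    return max(diff / n, 0.0)


# Toy utility: the player earns 1 when it picks a different index than its opponent.
toy_utility = lambda own, others: 1.0 if own != others[0] else 0.0
history = [(0, (0,)), (0, (0,)), (1, (0,)), (0, (1,))]

regret_0_to_1 = average_regret(history, toy_utility, played=0, alternative=1)
regret_1_to_0 = average_regret(history, toy_utility, played=1, alternative=0)
```

In the toy history, switching from strategy 0 to strategy 1 would have helped in two of the three periods in which 0 was played and hurt in one, so the regret is positive, whereas the opposite switch has zero regret.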

Transition probability update
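Assuming $S_{k_l} \in \Omega_{k_l}$ is the strategy last chosen by player $k_l$, i.e., $S_{k_l}^n = S_{k_l}$, the transition probability distribution is defined as
$$p_{k_l}^{n+1}(S'_{k_l}) = \frac{1}{\mu} R_{k_l}^n(S_{k_l}, S'_{k_l}) \quad \forall S'_{k_l} \neq S_{k_l}, \qquad p_{k_l}^{n+1}(S_{k_l}) = 1 - \sum_{S'_{k_l} \neq S_{k_l}} p_{k_l}^{n+1}(S'_{k_l}),$$
where $\mu$ is a normalization factor chosen large enough to ensure that the probabilities are non-negative.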

Strategy update
At period $n + 1$, player $k_l$ updates its strategy according to the transition probability distribution.
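The transition probability and strategy update steps can be sketched together as follows. The sketch assumes the usual regret-matching rule, in which the probability of switching to an alternative is its regret divided by the normalization factor and the remaining mass stays on the current strategy; the names `transition_probabilities` and `next_strategy` and the numbers are hypothetical.

```python
import random
from typing import Dict


def transition_probabilities(regrets: Dict[int, float], current: int,
                             mu: float) -> Dict[int, float]:
    """regrets[s] is the average regret for not having played s instead
    of `current`; mu is the normalization factor."""
    probs = {s: r / mu for s, r in regrets.items() if s != current}
    stay = 1.0 - sum(probs.values())
    if stay < 0.0:
        raise ValueError("mu is too small: the probabilities would be negative")
    probs[current] = stay
    return probs


def next_strategy(probs: Dict[int, float], rng: random.Random) -> int:
    """Draw the strategy for the next period from the transition distribution."""
    candidates = list(probs)
    weights = [probs[s] for s in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Example: the player currently uses strategy 0 and regrets not using strategy 1.
probs = transition_probabilities({1: 0.25, 2: 0.0}, current=0, mu=2.0)
choice = next_strategy(probs, random.Random(0))
```

A larger normalization factor makes the player more inertial, since more probability mass stays on the current strategy.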
In the proposed algorithm, each player does not need to be concerned about the individual strategies and utilities of other players, global network structure, etc. Each one just needs to know the effect of other players on its individual utility function. In addition, each player views its current actual strategy as a reference point, and makes a decision for next period according to propensities to depart from it. However, the change should bring the improvement in individual utility, relative to the current choice.

Remark 3.
The implementation of the proposed algorithm requires the history of play $H^n = (S^{\tau})_{\tau=1}^{n} \in \prod_{\tau=1}^{n} \mathcal{S}$ to be given. The BSs naturally and conveniently take on this responsibility, and thus the cooperative strategy is obtained. As Hart and Mas-Colell observe in [28], "there is a natural coordination device: the common history, observed by all players."

Convergence analysis
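Define $z_n \in \Delta(\mathcal{S})$ as the empirical distribution of the $KL$-tuples of strategies played up to time $n$. Its element, denoted by $z_n(S)$, $\forall S \in \mathcal{S}$, represents the relative frequency with which $S$ has been played up to time $n$, i.e.,
$$z_n(S) = \frac{1}{n} \left| \{ \tau \le n : S^{\tau} = S \} \right|.$$
Moreover, the empirical distribution $z_n$ can be obtained by the recursion
$$z_{n+1} = z_n + \frac{1}{n+1} \left( e_{S^{n+1}} - z_n \right),$$
where $e_{S^{n+1}} = [0, 0, \ldots, 1, 0, \ldots, 0]$ denotes the $|\mathcal{S}|$-dimensional unit vector with the one in the position of $S^{n+1}$.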
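The recursive form of the empirical distribution can be checked numerically. The sketch below stores the distribution as a dictionary over the profiles seen so far and applies the standard running-average update; the function name `update_empirical` and the toy history are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, List


def update_empirical(z: Dict[Hashable, float], n: int,
                     profile: Hashable) -> Dict[Hashable, float]:
    """One step of z_{n+1} = z_n + (e_{S^{n+1}} - z_n) / (n + 1),
    where n is the number of profiles already counted in z."""
    step = 1.0 / (n + 1)
    updated = {s: freq - step * freq for s, freq in z.items()}
    updated[profile] = updated.get(profile, 0.0) + step
    return updated


history: List[Hashable] = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "x")]

z: Dict[Hashable, float] = {}
for n, profile in enumerate(history):
    z = update_empirical(z, n, profile)

# Direct relative frequencies for comparison.
counts = Counter(history)
direct = {s: c / len(history) for s, c in counts.items()}
```

The recursion only needs the previous distribution and the newest profile, so the history does not have to be replayed at every period.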

Theorem 3.
If every player follows the proposed algorithm, the empirical distributions of play $z_n$ converge almost surely as $n \to \infty$ to the set of correlated equilibria of our game.
That $z_n$ converges to the set of correlated equilibria has been proved in many works, such as [24,28,34]. Here, we only provide a brief sketch of these proofs:
(1) Huang and Krishnamurthy [24] prove convergence indirectly by proving an inequality that originates from Blackwell's sufficient condition for approachability.
(2) In [28], the proof is based on a recursive formula for the distance of the vector of regrets to the negative orthant. In particular, by adopting a multi-period recursion, where a large "block" of periods is combined together, instead of a one-period recursion, the conditions of Blackwell's approachability theorem are proved.
(3) In [34], the proof relies on stochastic averaging theory. Due to the set-theoretic nature of the correlated equilibria, the convergence analysis is carried out through a differential inclusion, which is the set-theoretic extension of a differential equation.

Computational complexity analysis
At each iteration, each player $k_l$ needs to keep a record of the utility of its current strategy and the utilities of changing to the other strategies. In addition, the proposed algorithm requires one table lookup, no more than $n + KL$ additions and $KL + 1$ multiplications to update the regret value, and one comparison to choose the next strategy. Similar to the analysis in [35,36], the complexity of our algorithm depends only on the number of the player's strategies, that is, $O(|\Omega_{k_l}|)$.

Simulation results and analysis
In this section, we conduct simulations to study the performance of the proposed subcarrier allocation algorithm. We consider a 3-cell OFDMA system, as shown in Figure 2, where each hexagonal cell has a radius of 100 m, similar to the case in [22], and the users are uniformly distributed within the corresponding cell. The BSs are located at the center of each cell and are separated by $100\sqrt{3}$ m from each other. The path loss between two users is expressed as $h_{ij} = 0.097/d_{ij}^{\upsilon}$, where $\upsilon = 4$ and $d_{ij}$ is the distance between the transmitter of user $i$ and the receiver of user $j$. Then, for users $i$, $j$ and subcarrier $k$, the channel gain is $g_{ij}^k = h_{ij} |\beta_k|^2$, where $\beta_k \sim \mathcal{CN}(0, 1)$ is a unit-power Rayleigh fading coefficient. The total bandwidth is divided into sub-channels, and the capacity of user $i$ in cell $l$ over subcarrier $k$ is determined by the signal-to-interference-plus-noise ratio (SINR)
$$\gamma_{ik}^l = \frac{p_{ik_l} g_{ii}^{k_l}}{\sum_{l'=1, l' \neq l}^{L} p_{jk_{l'}} g_{ji}^{k_{l'}} + \sigma^2}$$
and the bit error rate (BER) gap $\Gamma = -\ln(5\,\mathrm{BER})/1.5$. For simplification, we set $\Gamma = 1$ for the capacity comparison in the simulation. The Gaussian noise variance $\sigma^2$ is $10^{-10}$ W. In order to focus on the subcarrier allocation, we decide the maximum power per user beforehand according to the channel gain of each user, and this power budget is distributed among the subcarriers assigned to the same user. The maximal power constraint of all users is set to $P_{\max} = 0.2$ W.
We initialize the game with a random user assignment for each player. The players then act to improve their utility values by looking for the best response strategy after observing the opponents' actions. Figure 3 plots the improvement of system capacity achieved by the proposed algorithm versus the number of iterations for an OFDMA system with 12 users employing 32, 64 and 128 subcarriers, respectively. The capacity value is updated at each iteration and is greatly improved by the time of convergence. We can also see that increasing the number of subcarriers brings a higher capacity value through improved frequency diversity gains, and a higher capacity value indicates an improvement in throughput and a decrease in interference. Figure 4 plots the variation of system capacity against the number of iterations with 12, 24 and 48 users served in the system, respectively, with the number of subcarriers fixed at 64. This figure indicates that the capacity value is higher when more users are located in the system, owing to better multiuser diversity. Similar simulation results can be obtained when more subcarriers are considered. Figures 3 and 4 also show that, no matter how many subcarriers are employed and how many users are placed in the uplink OFDMA system, the correlated equilibrium can be obtained by using the proposed algorithm, and convergence takes no more than 100 iterations.
Furthermore, the more subcarriers are employed or the more users are located in the system, the slower the convergence. This can be explained by the fact that increasing the number of subcarriers or users increases the interaction among the players. In addition, it should be noted that the speed of convergence changes with $\mu$ (the normalization factor) and the initial strategies of the players.
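The channel model of the simulation setup can be reproduced with a short sketch. The path loss constant 0.097, the exponent 4, the noise variance and the 0.2 W power limit are taken from the text. The capacity expression $\log_2(1 + \gamma/\Gamma)$ is an assumed standard gap approximation, and the helper names below are hypothetical.

```python
import math
import random
from typing import Iterable, Tuple


def path_loss(distance_m: float, exponent: float = 4.0,
              constant: float = 0.097) -> float:
    """Path loss h = 0.097 / d^4 between a transmitter and a receiver."""
    return constant / distance_m ** exponent


def channel_gain(distance_m: float, rng: random.Random) -> float:
    """Gain g = h * |beta|^2 with beta ~ CN(0, 1); |beta|^2 is then
    exponentially distributed with unit mean."""
    return path_loss(distance_m) * rng.expovariate(1.0)


def sinr(signal_power_w: float, signal_gain: float,
         interferers: Iterable[Tuple[float, float]],
         noise_w: float = 1e-10) -> float:
    """SINR with co-channel interferers given as (power, gain) pairs."""
    interference = sum(p * g for p, g in interferers)
    return signal_power_w * signal_gain / (interference + noise_w)


def capacity(sinr_value: float, gap: float = 1.0) -> float:
    """Assumed gap-approximation capacity log2(1 + SINR / gap)."""
    return math.log2(1.0 + sinr_value / gap)


rng = random.Random(0)
faded_gain = channel_gain(100.0, rng)          # random draw, depends on the seed
h_100 = path_loss(100.0)                       # 0.097 / 100^4
clean_sinr = sinr(0.2, h_100, [])              # full power, no interferers
loaded_sinr = sinr(0.2, h_100, [(0.2, path_loss(200.0))])
rate_clean = capacity(clean_sinr)
rate_loaded = capacity(loaded_sinr)
```

At a distance of 100 m the path loss is about $9.7 \times 10^{-10}$, so a user transmitting at the full 0.2 W reaches an SINR of roughly 1.94 without interference, and a single co-channel interferer at 200 m already lowers it noticeably.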
From Figure 5 we can see that the interference value decreases quickly with the number of iterations. Thus, we conclude that the proposed algorithm achieves good interference mitigation in multi-cell OFDMA systems. Here the interference value is on the scale of $10^{-11}$, which is similar to the scale of the utility value shown in Figure 2 in [27], because the interference received by user $i$ in cell $l$ over subcarrier $k$ is expressed as $I_k^l = \sum_{l'=1, l' \neq l}^{L} p_{jk_{l'}} g_{ji}^{k_{l'}}$, where $g_{ij}^k = c/d_{ij}^4$ is on the scale of $10^{-10}$ when $d_{ij}$ is larger than 100 m and $p_{ik_l} \le 0.2$ W.
For comparison, the following algorithms are considered:
(1) Algorithm 1: our proposed distributed subcarrier allocation algorithm;
(2) Algorithm 2: the Nash bargaining algorithm in [37];
(3) Algorithm 3: each subcarrier is assigned to the user according to the channel gain;
(4) Algorithm 4: each user is allocated the same number of subcarriers.
Equal power allocation is performed for all the algorithms. The fairness and efficiency of these four subcarrier allocation algorithms are compared. With the number of subcarriers fixed at 64, Figure 6 shows the number of subcarriers allocated to each user, Figure 7 plots the capacity achieved by each user, and Figure 8 plots the system capacity for a varying number of users. When fairness is evaluated by the number of subcarriers assigned to each user (Figure 6), Algorithm 4 is of course the best, Algorithm 1 follows, and the other two are worse. However, when fairness is evaluated by the capacity each user achieves (Figure 7), which is the more appropriate measure, Algorithms 1, 2 and 3 are nearly indistinguishable and Algorithm 4 is the worst. In terms of system performance (Figure 8), Algorithm 2 is the best, Algorithm 1 follows, and Algorithm 3 ranks third. Algorithm 4 is much worse than the other three because it does not consider the channel conditions when allocating subcarriers, which causes severe interference. The convergence of Algorithm 1 and Algorithm 2 is compared in Figure 9. The Nash bargaining solution found by Algorithm 2, which is proved to be Pareto optimal [37], outperforms our proposed distributed algorithm in terms of achievable capacity, but it is a cooperative game-theoretical approach that requires much more information exchange. Moreover, for each iteration, the complexity of Algorithm 2 is given as $O(N^2)$ in [37], while our proposed algorithm has a complexity of only $O(N)$. In addition, our proposed algorithm achieves performance very close to that of the Nash bargaining solution. Hence, the proposed algorithm is more suitable for implementation in multi-cell OFDMA networks, especially when the number of users is large. Figure 8 also implies that the growth of system capacity slows down when more users are served in the system, because a large number of users may bring about more serious interference.

Conclusion
In this work, we have presented a distributed subcarrier allocation approach with limited BS coordination for multi-cell OFDMA systems. The goal is to maximize performance while controlling the co-channel interference. Concretely, we model a joint strategy selection game from a novel point of view in which each subcarrier acts as a game player that chooses its most satisfying user, which guarantees fairness from the perspective of the subcarriers, and we focus on the set of correlated equilibria to analyze the outcome of the proposed game. Moreover, since any change of resource allocation in a specific cell affects the performance of the nearby cells, and the outcomes of individual optimization might not always be as good as those of system optimization, joint resource allocation via BS coordination is considered. We then develop a novel distributed subcarrier allocation algorithm based on the no-regret procedure to learn the correlated equilibrium, which demands less information exchange and computational complexity. The simulation results show that the proposed algorithm achieves good performance in terms of quick convergence, substantial interference mitigation, evident capacity improvement, and good fairness. Further study could address power and subcarrier allocation jointly to achieve a higher overall system throughput.