# Basics of coalitional games with applications to communications and networking

- Farshad Shams^{1}
- Marco Luise^{2}

**2013**:201

https://doi.org/10.1186/1687-1499-2013-201

© Shams and Luise; licensee Springer. 2013

**Received:** 1 December 2012

**Accepted:** 2 July 2013

**Published:** 31 July 2013

## Abstract

Game theory is the study of decision making in an interactive environment. *Coalitional games* fulfill the promise of group-efficient solutions to problems involving strategic actions. Formulation of optimal player behavior is a fundamental element in this theory. This paper is a self-contained tutorial on the basics of coalitional games, indicating how coalitional game theory tools can provide a framework to tackle different problems in communications and networking. We show that coalitional game approaches achieve improved performance compared to non-cooperative game-theoretic solutions.

## Keywords

## 1 Introduction

The increasing number of wireless services, combined with the demand for high-definition multimedia communications, has made radio resources, particularly spectrum and power, very precious and scarce, not because they are unavailable but because they are used inefficiently. For licensed spectrum, the measurements by Shared Spectrum Company [1] show that the maximal usage of the spectrum is a low percentage of the whole licensed band. While the number of users and the spectrum usage steadily increase, the amount of spectrum is still considered a limited resource. Besides, differentiating between the true signal and background noise is complex for radio equipment. Generally, this complexity forces terminals to transmit stronger signals, which wastes transmitter energy.

Modern wireless entities, i.e., wireless terminals and base stations, have considerable capacity to execute dynamic processes. This capability encourages wireless service providers to consider wireless entities as autonomous agents that can cooperate and negotiate with each other to achieve an efficient resource allocation in different situations. Cooperation among wireless terminals is usually intended to achieve a fair radio resource allocation. Cooperation between base stations can be devised to mitigate interference and promote soft handover where the channel gain varies rapidly, which is a challenge in LTE [2].

Game theory is the most prominent tool to analyze interaction issues in the social sciences, wherein cooperation among autonomous agents is often essential for successful task completion. In many settings, groups of competing agents are simultaneously concerned with both individual and overall benefits. In the game theory literature, this branch is known as cooperative game theory [3, 4]. The players, as the main decision-making entities in the game, are assumed to negotiate with each other to determine a binding agreement among them. If we assume that all users act rationally and we know what the behavior of the users is, it is possible to determine the overall performance of a system, since the actions of one user become part of the circumstances for another user. Thus, we are interested in individual performance and overall system performance under a specific set of rules. To fully develop the different possibilities for cooperation among players within a game, we have to address what groups of players can achieve collectively. Indeed, if a player assesses that within a certain group it does not receive what it is able to get by itself, then it might decide to abandon the cooperation and pursue an alternative allocation by itself. Cooperative game theory offers the opportunity to extend and expand the treatment of the players in traditional non-cooperative games, especially where selfish players compete over a set of resources. Cooperative game theory is divided into two parts: coalitional game theory and bargaining games [3, 4]. In this contribution, we focus on coalitional game theory. We show that, in comparison to non-cooperative game theory, coalitional game approaches can achieve better results in terms of performance and stability.

Saad et al. in a tutorial paper [5] classify coalitional games into three categories: canonical (coalitional) games, coalition formation games, and coalitional graph games. In canonical games, no group of players can do worse by joining a coalition than by acting non-cooperatively. In coalition formation games, forming a coalition brings advantage to its members, but the gains are limited by a cost for forming the coalition. In coalitional graph games, the coalitional game is in graph form, and the interconnection between the players strongly affects the characteristics as well as the outcome of the game.

In the last few years, cooperative game theory has been successfully applied to communications and networking. Hossain et al. [6] provide a guide to the state of the art that unifies the essential information, addressing both theoretical and practical aspects of cooperative communications and networking in the context of cellular design. The current literature is mainly focused on applying cooperative games in various applications such as distributed/centralized radio resource allocation [7–9], power control [10, 11], spectrum sharing in cognitive radio [12, 13], cooperative automatic repeat request (ARQ) mechanisms [14], cooperative routing [15], and cooperative communications [16, 17]. These problems in wireless networks can be modeled as cooperative games since it is highly likely that each wireless user can obtain a better utility value by forming groups and controlling resources cooperatively rather than individually. It has been shown that cooperation can result in an enhanced QoS in terms of throughput expansion, bit error rate reduction, or energy saving [6].

Cooperation can be realized at various layers of the network. At the physical layer, different separate antennas can constitute a cluster and then cooperate with each other to exploit multiple-input multiple-output gains. At the MAC sublayer, some wireless terminals can cooperate with each other to share a common wireless medium in an efficient manner and consequently mitigate the interference hazard. There is also the possibility of cooperation of physical and application layers among individual terminals to adapt channel and source codings in multimedia communications. The altruistic decision of cooperation with other network entities may result in an improvement of the overall network performance and concurrently achieve an egoistic interest of self improvement.

The rest of this paper is divided into nine sections. The following section provides an introductory discussion of coalitional game theory, in which we systematically study the fundamental definitions and conditions of coalitional games: superadditivity and convexity. Then, Section 3 and its subsection discuss the core set solution as the best-known solution for payoff distribution. Section 4 is devoted to the study of a strong payoff distribution, the so-called Shapley value. In Section 5, we present a systematic study of two other reward divisions called the *kernel* and the *nucleolus*. Then, in Section 6, we extend the concept of Nash equilibria to coalitional games. Section 7 investigates the concept of coordinated equilibria, where the players of the game are allowed to pre-communicate among themselves at once. Section 8 helps the reader understand the basic concepts and importance of dynamic learning in coalitional games. Every section contains some motivating examples that help to understand how different communication network problems can be modeled as coalitional games. We discuss the features of the mentioned approaches in Section 9, and finally we conclude this paper in Section 10.

## 2 Preliminaries

Game theory deals with the study, through mathematical models, of conflict situations in which two or more rational players make decisions that will influence each other’s welfare. The theory of coalitional games [3, 4] also assumes that binding agreements may be established among the players in the course of the conflict situation. In transferable utility (TU) games, the agreement may be reached by any subset of the players, and the gain obtained from this agreement is a real number and is transferable among these players. In non-transferable utility (NTU) games, the agreement may be reached by any subset of the players, but the gain may be non-transferable. The main focus of this paper is on the study of TU games.

A TU game is a pair $\mathcal{G}=(\mathcal{K},\nu)$, where $\mathcal{K}$ is the set of players and *ν* is the *coalition (characteristic) function*, which is interpreted as the maximum outcome (a real number) that the players of each coalition (subset of $\mathcal{K}$) can jointly produce. An NTU game is a pair $\mathcal{G}=(\mathcal{K},V)$, where *V* is a mapping which, for each coalition $\mathcal{A}$, defines a characteristic set $V(\mathcal{A})$ satisfying

- (1)
$V(\mathcal{A})$ is a non-empty and closed subset of $\mathbb{R}^{|\mathcal{A}|}$,

- (2)
For each $k\in\mathcal{A}$, there is a $V_k\in\mathbb{R}$ such that $V(\{k\})=(-\infty,V_k]$,

- (3)
$V(\mathcal{A})$ is comprehensive, i.e., for all $\mathbf{x}\in V(\mathcal{A})$ and for all $\mathbf{y}\in\mathbb{R}^{|\mathcal{A}|}$, if $\mathbf{y}[k]\le\mathbf{x}[k]\ \forall\,k\in\mathcal{A}$, then $\mathbf{y}\in V(\mathcal{A})$,

- (4)
The set $V(\mathcal{A})\cap\{\mathbf{y}\in\mathbb{R}^{|\mathcal{A}|}\mid\mathbf{y}[k]\ge V_k\ \forall\,k\in\mathcal{A}\}$ is bounded,

where $\mathbf{x}=[x_1,\dots,x_{|\mathcal{A}|}]\in\mathbb{R}^{|\mathcal{A}|}$, $x_k$ is the payoff of player *k* in $\mathcal{A}$, and $\nu:2^{\mathcal{K}}\to\mathbb{R}$. If $\mathcal{A}$ is a coalition (subset) of $\mathcal{K}$ formed in $\mathcal{G}$, then its members get an overall payoff $\nu(\mathcal{A})$, which is zero for the empty set. Each coalition can be represented as a pure strategy in non-cooperative game theory. Only a few works exist on NTU game applications to problems in communications [18], because defining a utility function that meets all the conditions of a characteristic set in an NTU game is not always feasible.

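An important property of interest in characteristic form TU games is *superadditivity*, which, if present, implies that the value of the union of any two disjoint coalitions is at least as large as the sum of their values.

*Definition 1.* A TU game $\mathcal{G}$ is superadditive if

$$\nu(\mathcal{A}_i\cup\mathcal{A}_j)\ \ge\ \nu(\mathcal{A}_i)+\nu(\mathcal{A}_j)\qquad\forall\,\mathcal{A}_i,\mathcal{A}_j\subset\mathcal{K},\ \ \mathcal{A}_i\cap\mathcal{A}_j=\varnothing. \tag{2}$$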

In a superadditive TU game, there are positive synergies and the players prefer to join each other rather than act alone. Under superadditivity condition, the players are willing to form the *grand coalition* (the set $\mathcal{K}$).

*Convex* or alternatively *supermodular* coalitional games were introduced by Shapley [19]. He models coalitional situations, where the marginal contribution of a player to a coalition increases as the coalition becomes larger.

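*Definition 2.* A TU game $\mathcal{G}$ is convex or supermodular if for all $k\in\mathcal{K}$

$$\nu(\mathcal{A}_i\cup\{k\})-\nu(\mathcal{A}_i)\ \le\ \nu(\mathcal{A}_j\cup\{k\})-\nu(\mathcal{A}_j)\qquad\forall\,\mathcal{A}_i\subseteq\mathcal{A}_j\subseteq\mathcal{K}\setminus\{k\}. \tag{3}$$

Equivalently,

*Definition 3.* A TU game $\mathcal{G}$ is convex or supermodular if

$$\nu(\mathcal{A}_i)+\nu(\mathcal{A}_j)\ \le\ \nu(\mathcal{A}_i\cup\mathcal{A}_j)+\nu(\mathcal{A}_i\cap\mathcal{A}_j)\qquad\forall\,\mathcal{A}_i,\mathcal{A}_j\subseteq\mathcal{K}. \tag{4}$$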

Convexity means that there are increasing returns to scale. Note that a convex game is superadditive. To better understand the importance of the convexity approach in network problems, we verify the convexity condition in a *K*-user channel access game. The payoff of each coalition of players (transmitters) is defined as the outer MAC capacity region. ParandehGheibi et al. ([20], Lemma 1) show that in a multiple access channel scenario, the inequality (4) is not met. This means that the game is not convex, and thus adding a new player does not benefit the other transmitters.
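Both conditions can be checked by brute force on a small game. The following Python sketch is illustrative only: it assumes the standard forms of superadditivity (the worth of the union of two disjoint coalitions is at least the sum of their worths) and of supermodularity, and it uses a hypothetical symmetric three-player game whose worth depends only on coalition size (0 for singletons, 0.8 for pairs, 1 for the grand coalition). The function names are ours and do not come from the cited works.

```python
from itertools import combinations

def coalitions(players):
    """All subsets of `players`, including the empty set, as frozensets."""
    players = list(players)
    return [frozenset(c) for r in range(len(players) + 1)
            for c in combinations(players, r)]

def is_superadditive(players, v, tol=1e-12):
    """Check v(A | B) >= v(A) + v(B) for every pair of disjoint coalitions."""
    subsets = coalitions(players)
    return all(v[a | b] >= v[a] + v[b] - tol
               for a in subsets for b in subsets if not (a & b))

def is_convex(players, v, tol=1e-12):
    """Check v(A) + v(B) <= v(A | B) + v(A & B) for every pair of coalitions."""
    subsets = coalitions(players)
    return all(v[a] + v[b] <= v[a | b] + v[a & b] + tol
               for a in subsets for b in subsets)

# Hypothetical symmetric game: the worth depends only on the coalition size.
PLAYERS = (1, 2, 3)
WORTH_BY_SIZE = {0: 0.0, 1: 0.0, 2: 0.8, 3: 1.0}
v = {c: WORTH_BY_SIZE[len(c)] for c in coalitions(PLAYERS)}

print(is_superadditive(PLAYERS, v))  # True
print(is_convex(PLAYERS, v))         # False
```

Convexity fails here because two overlapping pairs are together worth 1.6, more than the grand coalition plus their (worthless) intersection. The enumeration is exponential in the number of players, so this check is only practical for small games.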

## 3 The core solution

A central question in a coalitional game is how to divide the extra earnings (or cost savings) among the members of the formed coalition. In a TU game, an allocation is a function **x** from $\mathcal{K}$ to $\mathbb{R}$ that specifies for each player $k\in \mathcal{K}$ the payoff ${x}_{k}\in \mathbb{R}$ that this player can expect when it cooperates with the other players. The payoff of each player can show the cost borne by the player, the power of influence, and so on, depending on the problem setting.

*Definition 4.* Let $\mathcal{K}$ be the set of *K* players of the superadditive TU game $\mathcal{G}$, and let *ν* be the payoff of the game. The set of all ‘imputations’ of $\mathcal{G}$ is the set

$$\mathcal{I}(\mathcal{K},\nu)=\left\{\mathbf{x}\in\mathbb{R}^{K}\ \middle|\ \sum_{k\in\mathcal{K}}x_k=\nu(\mathcal{K})\ \ \text{and}\ \ x_k\ge\nu(\{k\})\ \ \forall\,k\in\mathcal{K}\right\} \tag{5}$$

where $\mathbf{x}=[x_1,\dots,x_k,\dots,x_K]\in\mathbb{R}^{K}$ is the imputation vector of the players. The former condition is called *feasibility*, and the latter the *individual rationality* condition.

The *core* concept was introduced in [21] and is the most attractive and natural way to define a payoff distribution: if a payoff distribution is in the core, no agent has any incentive to be in a different coalition. The core of a TU game is the subset of all imputations $\mathbf{x}\in\mathcal{I}(\mathcal{K},\nu)$ that no other imputation *directly dominates*, that is, $\nexists\,\mathbf{y}\in\mathcal{I}(\mathcal{K},\nu)$ s.t. $y_k>x_k\ \forall\,k\in\mathcal{K}$. As can be seen, the notion of dominance is essentially the same for coalitional games and non-cooperative games: the payoffs under the various situations are compared, and one situation dominates the others if its payoffs are higher. The core actually presents a condition stronger than the Nash equilibrium in non-cooperative games: no group of agents should be able to profitably deviate from a configuration in the core. Equivalently, no set of players can benefit from forming a new coalition, which corresponds to the group rationality assumption.

In an NTU game $\mathcal{G}=(\mathcal{K},V)$, the core apportionment is defined as ([4], Ch. 12)

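*Definition 5.* Let $\mathcal{K}$ be the set of *K* players of the superadditive NTU game $\mathcal{G}$, and let *V* be the payoff of the game. The core of $\mathcal{G}$ is the set

$$\mathcal{C}(\mathcal{K},V)=\left\{\mathbf{x}\in V(\mathcal{K})\ \middle|\ \forall\,\mathcal{A}\subseteq\mathcal{K},\ \nexists\,\mathbf{y}\in V(\mathcal{A})\ \ \text{s.t.}\ \ \mathbf{y}[k]>\mathbf{x}[k]\ \ \forall\,k\in\mathcal{A}\right\} \tag{6}$$

where **x** is the payoff distribution across players, and $x_k\in\mathbf{x}$ if and only if no coalition can improve upon $x_k$.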

In a TU game $\mathcal{G}=(\mathcal{K},\nu )$, the core apportionment is defined as follows:

*Definition 6.* Let $\mathcal{K}$ be the set of *K* players of the superadditive TU game $\mathcal{G}$, and let *ν* be the payoff of the game. The core of $\mathcal{G}$ is the set

$$\mathcal{C}(\mathcal{K},\nu)=\left\{\mathbf{x}\in\mathbb{R}^{K}\ \middle|\ \sum_{k\in\mathcal{K}}x_k=\nu(\mathcal{K})\ \ \text{and}\ \ \sum_{k\in\mathcal{A}}x_k\ge\nu(\mathcal{A})\ \ \forall\,\mathcal{A}\subseteq\mathcal{K}\right\} \tag{7}$$

where $\mathbf{x}=[x_1,\dots,x_k,\dots,x_K]\in\mathbb{R}^{K}$ is the payoff distribution across players, and $x_k\in\mathbf{x}$ if and only if no coalition can improve upon $x_k$. The second condition is called the *non-blocking* condition.
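Core membership in a small TU game can be tested directly from the two conditions named above, feasibility and non-blocking. The sketch below is a hedged illustration, not the authors' procedure: it assumes feasibility means that the payoffs sum to the worth of the grand coalition and non-blocking means that every coalition receives at least its worth. The three-player game (pairs worth 0.6, grand coalition worth 1) and the helper name `in_core` are hypothetical.

```python
from itertools import combinations

def in_core(x, v, tol=1e-9):
    """x: dict player -> payoff; v: dict frozenset -> worth of each non-empty coalition."""
    players = sorted(x)
    # Feasibility: the whole worth of the grand coalition is distributed.
    feasible = abs(sum(x.values()) - v[frozenset(players)]) <= tol
    # Non-blocking: no coalition is paid less than it can secure on its own.
    non_blocking = all(
        sum(x[k] for k in c) >= v[frozenset(c)] - tol
        for r in range(1, len(players) + 1)
        for c in combinations(players, r))
    return feasible and non_blocking

# Hypothetical symmetric game: singletons 0, pairs 0.6, grand coalition 1.
PLAYERS = ("k1", "k2", "k3")
WORTH_BY_SIZE = {1: 0.0, 2: 0.6, 3: 1.0}
v = {frozenset(c): WORTH_BY_SIZE[len(c)]
     for r in (1, 2, 3) for c in combinations(PLAYERS, r)}

equal_split = {"k1": 1 / 3, "k2": 1 / 3, "k3": 1 / 3}
skewed = {"k1": 0.5, "k2": 0.5, "k3": 0.0}

print(in_core(equal_split, v))  # True: every pair receives 2/3 >= 0.6
print(in_core(skewed, v))       # False: the pair {k1, k3} receives 0.5 < 0.6
```

Raising the pair worth above 2/3 in this hypothetical game makes every feasible split blockable by some pair, so the core in this sense becomes empty.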

Madiman [22] introduces some intuitive applications of core solution to information theory contexts, e.g., source coding and multiple-access channel, and summarizes some of its limitations in multi-user scenarios. Li et al. [23] show that the cooperation among wireless nodes and core apportionment can increase spectrum efficiency in a TDMA cooperative communication. In [24], Niyato and Hossain apply the core solution in a coalition among different wireless access networks to offer a stable and efficient bandwidth allocation.

Indeed, there are a number of realistic application scenarios in which the emergence of the grand coalition is either not guaranteed, might be perceivably harmful, or is plainly impossible [25]. For a non-superadditive coalitional game, the coalition formation process does not lead the players to form the grand coalition. In this case, Definition 6 does not apply. Let us redefine the core set in a general (not necessarily superadditive) coalition formation TU game [9]. Let $\psi=[\mathcal{A}_1,\mathcal{A}_2,\dots,\mathcal{A}_m]$ denote a partition of the set $\mathcal{K}$, wherein $\mathcal{A}_i\cap\mathcal{A}_j=\varnothing$ for *i* ≠ *j*, $\bigcup_{i=1}^{m}\mathcal{A}_i=\mathcal{K}$ and $\mathcal{A}_i\ne\varnothing$ for *i* = 1,…,*m*, and let Ψ denote the set of all possible partitions *ψ*. Let us also define $\mathcal{F}=[\mathcal{A}_1,\mathcal{A}_2,\dots,\mathcal{A}_n]$, such that $\bigcup_{i=1}^{n}\mathcal{A}_i=\mathcal{K}$ and $\mathcal{A}_i\ne\varnothing$ for *i* = 1,…,*n*, as a family of (not necessarily disjoint) coalitions.

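*Definition 7.* A ‘core apportionment’ $\mathbf{x}\in\mathbb{R}^{K}$ is a payoff distribution with the following property:

$$\sum_{k\in\mathcal{K}}x_k=\max_{\psi\in\Psi}\sum_{\mathcal{A}\in\psi}\nu(\mathcal{A})\quad\text{and}\quad\sum_{k\in\mathcal{A}}x_k\ge\nu(\mathcal{A})\ \ \forall\,\mathcal{A}\in\mathcal{F}. \tag{8}$$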

Note that, if $\mathcal{G}$ is superadditive, then $\underset{\psi \in \mathit{\Psi}}{\text{max}}\sum _{\mathcal{A}\in \psi}\nu \left(\mathcal{A}\right)=\nu \left(\mathcal{K}\right)$.
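The maximum over partitions can be evaluated by exhaustive enumeration when the player set is small. The Python sketch below is an illustration under our own assumptions: the recursive generator `partitions` and the two symmetric three-player games (one superadditive, one in which the grand coalition is worth only 0.5) are hypothetical and serve only to show that the maximum equals the grand coalition's worth in the superadditive case.

```python
def partitions(items):
    """Yield every partition of `items` as a list of frozensets."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Either `first` forms a block of its own ...
        yield [frozenset([first])] + part
        # ... or it joins one of the blocks already present.
        for i, block in enumerate(part):
            yield part[:i] + [block | {first}] + part[i + 1:]

def best_partition_value(players, v):
    """Largest total worth over all partitions of the player set."""
    return max(sum(v(a) for a in psi) for psi in partitions(players))

def make_game(worth_by_size):
    """Hypothetical symmetric game: worth depends only on coalition size."""
    return lambda a: worth_by_size[len(a)]

players = ("k1", "k2", "k3")
superadditive = make_game({1: 0.0, 2: 0.8, 3: 1.0})
not_superadditive = make_game({1: 0.0, 2: 0.8, 3: 0.5})

print(best_partition_value(players, superadditive))      # 1.0
print(best_partition_value(players, not_superadditive))  # 0.8
```

The number of partitions grows as the Bell numbers, so the enumeration is meant for illustration rather than for large networks.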

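The core allocation set can be found through linear programming; its existence, in general, depends upon the feasibility of (8). Unfortunately, the core is a strong notion, and there exist many games where it is empty. We can study the non-emptiness of the core without explicitly solving the core equation. The following notation helps simplify the dual of (8): a collection of weights $\mu_{\mathcal{A}}\in[0,1]$, $\mathcal{A}\in\mathcal{F}$, is *balanced* if

$$\sum_{\mathcal{A}\in\mathcal{F}:\,k\in\mathcal{A}}\mu_{\mathcal{A}}=1\qquad\forall\,k\in\mathcal{K}. \tag{9}$$

*Definition 8.* A superadditive TU game $\mathcal{G}$ for a family $\mathcal{F}$ of coalitions is *totally balanced* if, for every balanced collection of weights $\mu_{\mathcal{A}}$ and for any $\mathcal{A}\in\mathcal{F}$, the inequality

$$\sum_{\mathcal{A}\in\mathcal{F}}\mu_{\mathcal{A}}\,\nu(\mathcal{A})\ \le\ \nu(\mathcal{K}) \tag{10}$$

holds.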

The following pathbreaking result in the theory of TU games was independently given by Bondareva [26] and Shapley [27].

*Lemma 1.* [3]. A totally balanced TU game has a non-empty core set.

Where forming the grand coalition is not guaranteed, the following notation is applied:

*Definition 9.* A (not necessarily superadditive) TU game $\mathcal{G}$ for a family $\mathcal{F}$ of coalitions is totally balanced if for every balanced collection of weights $\mu_{\mathcal{A}}$, and for any $\mathcal{A}\in\mathcal{F}$,

$$\sum_{\mathcal{A}\in\mathcal{F}}\mu_{\mathcal{A}}\,\nu(\mathcal{A})\ \le\ \max_{\psi\in\Psi}\sum_{\mathcal{A}\in\psi}\nu(\mathcal{A}).$$

So, if a TU game is totally balanced, then the core is non-empty; therefore, it is a convenient solution concept on the class of totally balanced TU games. There is an interesting relation between convex and balanced games.

*Lemma 2.* [4]. A convex game is totally balanced, but the converse is not necessarily true.

The other key feature of coalitional convex games is

*Lemma 3.* [19] The core set of a convex game is unique.

As an example, consider a *power distribution* problem based on the core set solution. This example is an extended form of the example established in ([28], Ch. 12). The network sketched in Figure 1 wishes to allocate power among three players $\mathcal{K}=\{k_1,k_2,k_3\}$ according to their willingness to cooperate with each other. A power of 1 mW is provided to the network if all three players decide to cooperate, or equivalently if the grand coalition forms. If only one player refuses to cooperate, a power of 0.8 mW is assigned to the pair of cooperating nodes. The coalitional game of Figure 1 is defined by

$$\nu(\mathcal{A})=\begin{cases}0, & |\mathcal{A}|=1,\\ 0.8, & |\mathcal{A}|=2,\\ 1, & |\mathcal{A}|=3.\end{cases} \tag{14}$$

The players of each coalition cooperate with each other. The player of a singleton coalition is isolated.

Each player receives a positive payoff if it decides to cooperate, whereas all players receive zero if no agreement is bound. To divide the total payoff (power) in some appropriate way, we rest on the core set definition. It is straightforward to show that the coalitional TU game defined by (14) is superadditive. From Equations 3 and 4, it is easy to show that TU game (14) is not convex (supermodular). To check whether the core set of TU game (14) is empty or not, we resort to the balanced solution. With the balanced weights $\mu_{\mathcal{A}}=1$ for singleton coalitions and $\mu_{\mathcal{A}}=0$ otherwise, inequality (10) holds. However, another balanced collection of weights exists, namely $\mu_{\mathcal{A}}=\frac{1}{2}$ for $|\mathcal{A}|=2$ and $\mu_{\mathcal{A}}=0$ otherwise, for which inequality (10) is violated, since the three pairs contribute $3\cdot\frac{1}{2}\cdot 0.8=1.2>1$. Hence, the game is not balanced, and its core set may be empty. Note that this result *does not* mean that the core set of the game is *surely* empty.
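The two weight collections mentioned above can be checked numerically. The following Python sketch is a minimal illustration that assumes the standard Bondareva–Shapley notion of balancedness (for every player, the weights of the coalitions containing that player sum to one) and compares the weighted sum of coalition worths with the worth of the grand coalition. The worths 0, 0.8, and 1 are the ones stated for the game of Figure 1; the helper names are hypothetical.

```python
from itertools import combinations

PLAYERS = ("k1", "k2", "k3")
WORTH_BY_SIZE = {1: 0.0, 2: 0.8, 3: 1.0}   # worths stated for the Figure 1 game
COALITIONS = [frozenset(c) for r in (1, 2, 3) for c in combinations(PLAYERS, r)]
GRAND_WORTH = WORTH_BY_SIZE[len(PLAYERS)]

def is_balanced_collection(mu, tol=1e-9):
    """Assumed balancedness: weights of the coalitions containing each player sum to 1."""
    return all(abs(sum(w for a, w in mu.items() if k in a) - 1.0) <= tol
               for k in PLAYERS)

def weighted_worth(mu):
    """Weighted sum of coalition worths, the left-hand side of the balancedness test."""
    return sum(w * WORTH_BY_SIZE[len(a)] for a, w in mu.items())

# Weight 1 on every singleton, 0 elsewhere (zero weights are simply omitted).
mu_singletons = {a: 1.0 for a in COALITIONS if len(a) == 1}
# Weight 1/2 on every pair, 0 elsewhere.
mu_pairs = {a: 0.5 for a in COALITIONS if len(a) == 2}

for name, mu in (("singletons", mu_singletons), ("pairs", mu_pairs)):
    print(name, is_balanced_collection(mu), weighted_worth(mu) <= GRAND_WORTH)
```

Both collections pass the balancedness check, but only the singleton weights keep the weighted worth below the 1 mW of the grand coalition; the pair weights give roughly 1.2.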

Now, we heuristically find a core apportionment by studying various possible networks. When there is no cooperation among players, the players are not provided with any power, that is, $\mathcal{F}=[\{k_1\},\{k_2\},\{k_3\}]$ with payoff distribution $\mathbf{x}=[0,0,0]$.

Suppose now that one player, say *k*_{2}, decides to cooperate with both *k*_{1} and *k*_{3}, but the two players *k*_{1} and *k*_{3} do not bind an agreement to mutually cooperate. It is reasonable to suppose that the player *k*_{2} can act as a relay between *k*_{1} and *k*_{3}, and it must be provided with more power, that is, $\mathcal{F}=[\{k_1,k_2\},\{k_2,k_3\}]$ with payoff distribution $\mathbf{x}=[0.2,0.6,0.2]$.

As can be easily seen, the above argument satisfies feasibility and non-blocking conditions of the core set apportionment in Definition 6. It is worthwhile to note that the core set definition does not imply an even division of the whole payoff across players. Thus, it is clear that this game consists of multiple core sets. The power distribution problem can also be solved by game-theoretic bargaining solutions, e.g., Nash bargaining game and auction [3].

### 3.1 On core stability

The goal of the network in Figure 1 is to allocate power among players in order to stimulate all of them to cooperate. Obviously, each player tries to get the highest possible payoff. Let us predict the behavior of the players once they know the definition of the game. Suppose that the players *k*_{1} and *k*_{2} find an opportunity to meet each other. Obviously, they quickly take the opportunity to cooperate and achieve the payoff distribution **x** = [0.4, 0.4, 0]. Then, it is profitable for player *k*_{1} to invite player *k*_{3} to join, thereby improving its own payoff from 0.4 to 0.6 and that of player *k*_{3} from zero to 0.2. On the other hand, this new agreement decreases the payoff of player *k*_{2} from 0.4 to 0.2, and now the players *k*_{2} and *k*_{3} have an incentive to cooperate and increase their own payoffs from 0.2 to 1/3. Note that this agreement makes player *k*_{1}’s payoff decrease from 0.6 to 1/3. The unfavorable decision of player *k*_{2} would tempt player *k*_{1} to retaliate. A negotiation between *k*_{1} and *k*_{3} to end cooperation with *k*_{2} results in increasing their payoffs and reducing *k*_{2}’s payoff to zero. The upshot of the above argument is that the network is sustained by the cooperation of only one pair, under the threat ‘If you cooperate with the third player, then I will do the same’. ^{a} It is fairly clear that the players would seek to cooperate only as pairs for the purpose of negotiation, and not in the grand coalition framework, even though the game is superadditive. This is because the game is superadditive but not balanced. The pairs can change as time goes on. In fact, the core apportionment lacks ‘farsighted’ (i.e., long-term) stability.

A coalition structure based on the core set is not adequately farsighted to avoid the elusiveness of the negotiation structure. At first sight, the core appears to be an extremely myopic notion, requiring the stability of a proposed allocation to deviations or blocks by coalitions, but not examining the stability of the deviations themselves. In general, the stability requirement is that the outcome be immune to deviations of a certain sort by coalitions. To provide the formal definition of farsighted stability, we need some additional notation.

*Definition 10.* [29] For $\mathbf{x},\mathbf{y}\in\mathcal{I}(\mathcal{K},\nu)$, **x** *indirectly dominates* **y**, which is denoted by **y** ≪ **x**, if there exist a finite sequence of imputations $\mathbf{y}=\mathbf{x}_1,\mathbf{x}_2,\dots,\mathbf{x}_m=\mathbf{x}$ and a finite sequence of nonempty coalitions $\mathcal{A}_1,\mathcal{A}_2,\dots,\mathcal{A}_m$, such that for each *j* = 1, 2,…,*m* − 1: (i) by the deviation of $\mathcal{A}_j$, the imputation $\mathbf{x}_j$ is replaced by $\mathbf{x}_{j+1}$, and (ii) $\mathbf{x}_j[k]<\mathbf{x}[k]$ for all $k\in\mathcal{A}_j$.

Condition (i) says that each coalition $\mathcal{A}_j$ has the power to replace imputation $\mathbf{x}_j$ by imputation $\mathbf{x}_{j+1}$, and condition (ii) says that each player in $\mathcal{A}_j$ strictly prefers imputation **x** to imputation $\mathbf{x}_j$. It is clear that the indirect dominance relation contains the direct dominance relation.

*Definition 11.* [29, 30] Let $\mathcal{G}=\left(\mathcal{K},\nu \right)$ be a TU game. A subset $\mathcal{J}$ of $\mathcal{I}\left(\mathcal{K},\mathit{\nu}\right)$ is a *farsighted stable* set if: (i) for all $\mathbf{x},\mathbf{y}\in \mathcal{J}$, neither **x** ≪ **y** nor **y** ≪ **x**, and (ii) for all $\mathbf{y}\in \mathcal{I}(\mathcal{K},\nu )\setminus \mathcal{J}$ there exists $\mathbf{x}\in \mathcal{J}$ such that **y** ≪ **x**. Conditions (i) and (ii) are called *internal stability* and *external stability*, respectively.

By internal stability, there is no imputation in $\mathcal{J}$ that is dominated by another imputation in $\mathcal{J}$. By external stability, an imputation outside a stable set $\mathcal{J}$ is unlikely to be attained. Let us introduce three other different payoff distribution concepts which capture foresight of the players.

## 4 Shapley value

The Shapley value is an alternative solution for the payoff distribution in TU games. The Shapley value has long been a central solution concept in coalitional game theory. It was introduced by L. S. Shapley in the seminal paper [31], and it was seen as a reasonable way of distributing the gains of cooperation, in a fair and unique way, among the players in the game. In the Shapley solution, those who contribute more to the groups that include them are paid more. Let us denote by *ϕ*_{k}(*ν*) the Shapley value of player *k* in the TU game defined by *ν*. The surprising result due to Shapley is the following theorem.

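*Theorem 1.* There is a unique single-valued solution to TU games satisfying efficiency, symmetry, additivity, and dummy. It is the well-known Shapley value, the function that assigns to each player *k* the payoff:

$$\varphi_k(\nu)=\sum_{\mathcal{A}\subseteq\mathcal{K}\setminus\{k\}}\frac{|\mathcal{A}|!\,\left(K-|\mathcal{A}|-1\right)!}{K!}\left[\nu(\mathcal{A}\cup\{k\})-\nu(\mathcal{A})\right],$$

where $\nu(\mathcal{A}\cup\{k\})-\nu(\mathcal{A})$ is the marginal contribution of player *k* to the coalition $\mathcal{A}$. The Shapley value can be interpreted as the expected marginal contribution made by a player to the value of a coalition, where the distribution of coalitions is such that any ordering of the players is equally likely. Because the sum runs over all coalitions, the Shapley value is exponentially hard to compute. Shapley characterized this value as the unique solution that satisfies the following four axioms: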

- (1)
*Efficiency*: The payoffs must add up to $\nu(\mathcal{K})$, which means that all the grand coalition surplus is allocated, that is, $\sum_{k\in\mathcal{K}}\varphi_k(\nu)=\nu(\mathcal{K})$. In the absence of superadditivity, we instead use $\max_{\psi\in\Psi}\sum_{\mathcal{A}\in\psi}\nu(\mathcal{A})$.

- (2)
*Symmetry*: This axiom requires that the names of the players play no role in determining the value. If two players are substitutes because they contribute the same to each coalition, the solution should treat them equally, that is, $\nu(\mathcal{A}\cup\{k\})=\nu(\mathcal{A}\cup\{i\})\ \ \forall\,\mathcal{A}\subseteq\mathcal{K}\setminus\{k,i\}\ \Rightarrow\ \varphi_k(\nu)=\varphi_i(\nu)$.

- (3)
*Additivity*: The solution to the sum of two TU games must be the sum of what it awards to each of the two games, that is, $\varphi_k(\nu+\omega)=\varphi_k(\nu)+\varphi_k(\omega)\ \ \forall\,k\in\mathcal{K}$.

- (4)
*Dummy player*: The player *k* is dummy (null) if $\nu(\mathcal{A}\cup\{k\})=\nu(\mathcal{A})$ for all $\mathcal{A}$ not containing *k*. If a player *k* is dummy, the solution should pay it nothing, i.e., *ϕ*_{k}(*ν*) = 0.

[33]. As an example, let us calculate the Shapley value of the players in the power distribution game of Figure 1. The three players are symmetric, so by symmetry and efficiency each of them receives $\varphi_k(\nu)=1/3$.
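The calculation can be sketched in Python using the interpretation given earlier, namely the average marginal contribution of a player over all equally likely orderings. This is a minimal illustration rather than the authors' computation: the function `shapley_values` is hypothetical, and the game is encoded with the worths stated for Figure 1 (0 for a single player, 0.8 mW for a pair, 1 mW for the grand coalition).

```python
from itertools import permutations
from math import factorial

def shapley_values(players, v):
    """Average marginal contribution of each player over all orderings."""
    players = list(players)
    totals = {k: 0.0 for k in players}
    for order in permutations(players):
        coalition = frozenset()
        for k in order:
            # Marginal contribution of k to the players that arrived before it.
            totals[k] += v(coalition | {k}) - v(coalition)
            coalition = coalition | {k}
    n_orders = factorial(len(players))
    return {k: total / n_orders for k, total in totals.items()}

def v(coalition):
    """Power distribution game: the worth depends only on the coalition size."""
    return {0: 0.0, 1: 0.0, 2: 0.8, 3: 1.0}[len(coalition)]

phi = shapley_values(("k1", "k2", "k3"), v)
print(phi)  # each player receives approximately 1/3
```

Enumerating all orderings costs K! evaluations, which is consistent with the remark that the Shapley value is hard to compute for large player sets; sampling random orderings is a common approximation in that case.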

- (1)
*Marginality*: If the marginal contribution of a player to coalitions is the same in two games *ν* and *ω*, then the award of the player must be the same, that is, if $\nu(\mathcal{A})-\nu(\mathcal{A}\setminus\{k\})=\omega(\mathcal{A})-\omega(\mathcal{A}\setminus\{k\})$ for every coalition $\mathcal{A}\subseteq\mathcal{K}$ containing *k*, then *ϕ*_{k}(*ν*) = *ϕ*_{k}(*ω*).

Marginality is an idea with a strong tradition in economic theory. In Young’s definition, marginality is assumed and additivity is dropped. Young [34] shows that the Shapley value is unique.

*Theorem 2.* [34] There exists a unique single-valued solution to TU games satisfying efficiency, symmetry, and marginality; this solution is the Shapley value.

In the network engineering literature, Kim [35] proposes an energy-efficient routing protocol based on the Shapley value. The concept of the Shapley value is used by Khouzani and Sarkar [36] to achieve a fair aggregate cost of link sharing among primary and secondary users in a cognitive network. Using the Shapley value, a suitable network resource sharing among multimedia users is fairly achievable, as Park and van der Schaar propose in [37].

## 5 The kernel and nucleolus

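The *excess* of the coalition $\mathcal{A}$ with respect to the payoff vector $\mathbf{x}\in\mathbb{R}^{K}$ is defined as

$$e(\mathcal{A},\mathbf{x})=\nu(\mathcal{A})-\sum_{k\in\mathcal{A}}x_k,$$

and the *maximum excess* of player *k* against *i* is defined as

$$e_{ki}(\mathbf{x})=\max_{\mathcal{A}\subseteq\mathcal{K}:\ k\in\mathcal{A},\ i\notin\mathcal{A}}e(\mathcal{A},\mathbf{x}).$$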

If player *k* departs from **x**, the most it can hope to gain (the least to lose) without the consent of player *i* is the amount of maximum excess. The extensions of the excess for NTU games are formalized in [38].
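A short Python sketch may make the two quantities concrete. It assumes the usual convention that the excess of a coalition is its worth minus the total payoff of its members (so a positive excess means the coalition gains too little), and that the maximum excess of *k* against *i* ranges over the coalitions that contain *k* but not *i*. The game reuses the worths stated for Figure 1, while the payoff vector and the helper names are hypothetical.

```python
from itertools import combinations

def v(coalition):
    """Worths stated for the Figure 1 game: they depend only on coalition size."""
    return {0: 0.0, 1: 0.0, 2: 0.8, 3: 1.0}[len(coalition)]

def excess(coalition, x, v):
    """Assumed definition: worth of the coalition minus what its members receive."""
    return v(coalition) - sum(x[k] for k in coalition)

def max_excess(k, i, x, v):
    """Largest excess over the coalitions that contain k but exclude i."""
    others = [p for p in x if p not in (k, i)]
    return max(excess(frozenset((k,) + c), x, v)
               for r in range(len(others) + 1)
               for c in combinations(others, r))

# Hypothetical payoff vector in which k3 receives less than the other two.
x = {"k1": 0.4, "k2": 0.4, "k3": 0.2}

print(max_excess("k1", "k3", x, v))  # about 0.0, attained by {k1, k2}
print(max_excess("k3", "k1", x, v))  # about 0.2, attained by {k2, k3}
```

Under these assumptions the poorly paid player *k*_{3} has the larger maximum excess against *k*_{1}, which is the kind of imbalance the kernel is designed to rule out.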

As defined by Osborne and Rubinstein ([3], Ch. 14), a coalition $\mathcal{A}_i$ is an *objection* of *k* against *i* to **x** if $\mathcal{A}_i$ includes *k* but not *i* and $x_i>\nu(\{i\})$. Equivalently, $\mathcal{A}_i$ is a coalition that contains *k*, excludes *i*, and gains too little. A coalition $\mathcal{A}_j$ is a *counter-objection* to the objection $\mathcal{A}_i$ of *k* against *i* if $\mathcal{A}_j$ includes *i* but not *k* and $e(\mathcal{A}_j,\mathbf{x})\ge e(\mathcal{A}_i,\mathbf{x})$. Equivalently, $\mathcal{A}_j$ is a coalition that contains *i*, excludes *k*, and gains even less. Objections and counter-objections are exchanged between members of the same coalition in $\mathcal{A}_i$.

The idea captured by the *kernel* is that if at a non-empty imputation **x**, the maximum excess of player *k* against any other player *i* is less than the maximum excess of player *i* against the player *k*, then player *k* should get less. Of course, the players cannot get less than their individual worths if **x** is an imputation. The definition of the kernel follows:

*Definition 12.* The kernel is the set of all imputations **x** with the property that for every objection ${\mathcal{A}}_{i}$ of any player *k* against any other player *i* to **x**, there is a counter-objection of *i* to ${\mathcal{A}}_{i}$, such that

- (a) *e*_{ki}(**x**) = *e*_{ik}(**x**); or
- (b) *e*_{ki}(**x**) < *e*_{ik}(**x**) and *x*_{k} = *ν*({*k*}); or
- (c) *e*_{ki}(**x**) > *e*_{ik}(**x**) and *x*_{i} = *ν*({*i*}).

The kernel is the set of imputations **x** such that for any coalition ${\mathcal{A}}_{i}$, for each objection ${\mathcal{A}}_{j}$ of a user $k\in {\mathcal{A}}_{i}$ over any other member $i\in {\mathcal{A}}_{i}$, there is a counter-objection of *i* to ${\mathcal{A}}_{j}$. The kernel is contained in the (non-empty) core in any assignment game *ν* ([39], Theorem 1). In Figure 1, the unique kernel element is the equal split **x** = [ 1/3,1/3,1/3 ]; otherwise, for the single-player coalition objection of the player with the minimum payoff, there is no counter-objection.
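The conditions of Definition 12 can be checked mechanically for small games. The sketch below assumes the usual definition of the excess, $e(\mathcal{A},\mathbf{x})=\nu(\mathcal{A})-\sum_{k\in\mathcal{A}}x_{k}$, and takes the maximum excess of *k* against *i* over all coalitions that contain *k* but not *i*. The symmetric three-player game `nu` is a hypothetical example and is not claimed to be the game of Figure 1; the helper names are likewise illustrative.

```python
from fractions import Fraction
from itertools import combinations

PLAYERS = (0, 1, 2)

def nu(coalition):
    # Hypothetical symmetric TU game: two or more players earn 1.
    return Fraction(1) if len(coalition) >= 2 else Fraction(0)

def excess(coalition, x):
    """e(A, x) = nu(A) minus the total payoff of the members of A (assumed definition)."""
    return nu(coalition) - sum(x[k] for k in coalition)

def max_excess(k, i, x):
    """Largest excess over the coalitions that contain k but not i."""
    others = [p for p in PLAYERS if p not in (k, i)]
    candidates = [
        (k,) + extra
        for size in range(len(others) + 1)
        for extra in combinations(others, size)
    ]
    return max(excess(c, x) for c in candidates)

def in_kernel(x):
    """Check conditions (a)-(c) of Definition 12 for every ordered pair of players.

    The caller is expected to pass an imputation; that is not verified here.
    """
    for k in PLAYERS:
        for i in PLAYERS:
            if k == i:
                continue
            e_ki, e_ik = max_excess(k, i, x), max_excess(i, k, x)
            balanced = e_ki == e_ik
            k_at_floor = e_ki < e_ik and x[k] == nu((k,))
            i_at_floor = e_ki > e_ik and x[i] == nu((i,))
            if not (balanced or k_at_floor or i_at_floor):
                return False
    return True

equal_split = (Fraction(1, 3),) * 3
skewed = (Fraction(1, 2), Fraction(1, 2), Fraction(0))
print(in_kernel(equal_split), in_kernel(skewed))
```

For the skewed imputation, the third player has a larger maximum excess against each of the others, who are both above their individual worth, so none of the three conditions holds.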

A related solution concept is the *nucleolus*. With the nucleolus, no confusion regarding the player set can arise. The basic motivation behind the nucleolus is that one can provide an allocation that minimizes the excess of the coalitions in a given coalitional game $\mathcal{G}=(\mathcal{K},\nu )$. For a TU game $\mathcal{G}=(\mathcal{K},\nu )$ and the payoff vector $\mathbf{x}\in {\mathbb{R}}^{K}$, let us denote $\mathbf{E}\left(\mathbf{x}\right)=\left[\cdots \ge e\left(\mathcal{A},\mathbf{x}\right)\ge \cdots :\ \varnothing \ne \mathcal{A}\ne \mathcal{K}\right]$ as a $(2^{K}-2)$-dimensional vector whose components are the values of the excess function for all $\mathcal{A}\subset \mathcal{K}$, arranged in a non-increasing order. The nucleolus of a game is the imputation which minimizes the excess with respect to the lexicographic order^{b} over the set of imputations. The nucleolus of $\mathcal{G}$ with respect to $\mathcal{I}\left(\mathcal{K},\nu \right)$ is given by

$$\left\{\mathbf{x}\in \mathcal{I}\left(\mathcal{K},\nu \right):\ \mathbf{E}\left(\mathbf{x}\right){\le }_{\text{lex}}\mathbf{E}\left(\mathbf{y}\right)\quad \forall \,\mathbf{y}\in \mathcal{I}\left(\mathcal{K},\nu \right)\right\}.$$

The definition of the nucleolus of a coalitional game in characteristic function form entails comparisons between vectors of exponential length. Thus, if one attempts to compute the nucleolus by simply following its definition, the computation takes exponential time. In the network engineering literature, Han and Poor [40] apply the Shapley value, excess, and nucleolus solutions to study a possible cooperative transmission among intermediate nodes to help relay the information of wireless users.
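The definition can be followed literally for a very small game. The sketch below builds the sorted excess vector for every imputation on a coarse grid and keeps the lexicographically smallest one. It assumes the excess $e(\mathcal{A},\mathbf{x})=\nu(\mathcal{A})-\sum_{k\in\mathcal{A}}x_{k}$ and uses a hypothetical symmetric three-player game; the grid search is a brute-force illustration, not an efficient algorithm, and it only approximates the nucleolus when the true solution does not lie on the grid.

```python
from fractions import Fraction
from itertools import combinations

PLAYERS = (0, 1, 2)

def nu(coalition):
    # Hypothetical symmetric TU game: two or more players earn 1.
    return Fraction(1) if len(coalition) >= 2 else Fraction(0)

def excess_vector(x):
    """Excesses of all proper non-empty coalitions, sorted in non-increasing order."""
    coalitions = [
        c for size in range(1, len(PLAYERS))
        for c in combinations(PLAYERS, size)
    ]
    values = [nu(c) - sum(x[k] for k in c) for c in coalitions]
    return sorted(values, reverse=True)

def grid_imputations(steps):
    """Imputations on a grid: x_k >= nu({k}) = 0 and the payoffs sum to nu(K) = 1."""
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            c = steps - a - b
            yield tuple(Fraction(v, steps) for v in (a, b, c))

def approximate_nucleolus(steps=12):
    # Python compares lists lexicographically, which matches the order used on E(x).
    return min(grid_imputations(steps), key=excess_vector)

print(approximate_nucleolus())
```

The vector returned by `excess_vector` has 2^K - 2 entries, which makes the exponential cost mentioned above visible even in this toy setting.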

This defining property makes the nucleolus appealing as a fair single-valued solution. It is easy to see that whenever the core of a game is non-empty, the nucleolus lies in it [4]. Moreover, the nucleolus always belongs to the kernel and satisfies the symmetry and dummy axioms of Shapley: dummy players receive zero payoffs. If a null player is removed from the game, the payoff allocation of the remaining players is uninfluenced by its departure. Because of these desirable properties, the nucleolus solution has found many applications in cost sharing and resource allocation, as Maschler reports in [41]. However, the nucleolus possesses certain features that make it less agreeable. The original definition treats the excesses of any two coalitions as equally important, regardless of coalition sizes and coalition composition. Some unappealing features of utility distribution, derived with the nucleolus, are listed in [34]. For instance, the nucleolus lacks many monotonicity properties: it can violate the requirement that if a game changes so that some player’s contribution to all coalitions increases, then the player’s allocation should not decrease. Monotonicity states that as the underlying data of the game change, the utility must change in a parallel fashion.

## 6 Cooperative Nash equilibria

Coalitional games aim at identifying the best coalitions of the agents and a fair distribution of the payoff among the agents. The classic core solution is an extension of the Nash equilibrium, since the coalitions bind agreements of agents with each other and earn a vector value rather than a real number. In ([42], Section 7.6), it is shown that the core set of an underlying coalitional game, if it exists, asymptotically coincides with the set of Nash equilibria of the repeated game in the long run. The result of the Nash equilibrium is not always a satisfactory outcome for an external observer (e.g., the prisoner’s dilemma game). Aumann [43] and Bernheim et al. [44] introduce stronger notions of Nash equilibria based on coalitional game theory. First, let us review the definition of the Nash equilibrium, where each pure strategy in a static game is presented as a coalition in a coalitional game. Thus, each player belongs to only one coalition.

*Definition 13.* A pure strategy (coalition) combination $\psi =\left[{\mathcal{A}}_{1},{\mathcal{A}}_{2},\dots ,{\mathcal{A}}_{m}\right]$, wherein ${\mathcal{A}}_{i}\bigcap {\mathcal{A}}_{j\ne i}=\varnothing $, $\bigcup _{i=1}^{m}{\mathcal{A}}_{i}=\mathcal{K}$, and a payoff distribution **x** = [*x*_{1},…,*x*_{K}] is a pure Nash equilibrium, if a player $k\in \mathcal{K}$ whose unilateral deviation to a different coalition (pure strategy) yields a new distribution **y** = [*y*_{1},…,*y*_{K}], such that *y*_{k} > *x*_{k}, does not exist.

In the *forwarder’s dilemma* game, two players, *k*_{1} and *k*_{2}, are supposed to operate a direct link that enables them to communicate without intermediaries. Each player wants to send a packet to its destination, *d*_{1} and *d*_{2} respectively, in each time step using the other player as a forwarder. We assume that each forwarding has an energy cost 0 < *c* ≪ 1. If player *k*_{1} forwards (*F*) the packet of player *k*_{2}, player *k*_{2} gets a reward 1 and vice versa. Each player’s utility is its reward minus the cost. Each player is tempted to drop (*D*) the received packet to save energy. The strategic form of this game is depicted in Figure 3. In the cooperative representation of the forwarder’s dilemma game, there are two coalitions $\psi =\left[{\mathcal{A}}_{F},{\mathcal{A}}_{D}\right]$, and each player in $\mathcal{K}=\{{k}_{1},{k}_{2}\}$ must choose one coalition. For instance, $\psi =\left[{\mathcal{A}}_{F}=\{{k}_{1},{k}_{2}\},{\mathcal{A}}_{D}=\varnothing \right]$ is equivalent to the strategy profile (*F*,*F*), and $\psi =\left[{\mathcal{A}}_{F}=\left\{{k}_{2}\right\},{\mathcal{A}}_{D}=\left\{{k}_{1}\right\}\right]$ corresponds to the strategy profile (*D*,*F*), and so on.

Unilateral deviation of player *k*_{1} from $\psi =\left[{\mathcal{A}}_{F}=\{{k}_{1},{k}_{2}\},{\mathcal{A}}_{D}=\varnothing \right]$ to $\psi =\left[{\mathcal{A}}_{F}=\left\{{k}_{2}\right\},{\mathcal{A}}_{D}=\left\{{k}_{1}\right\}\right]$ increases its own payoff; therefore, the pure strategy profile (*F*,*F*) is not a Nash equilibrium point. The same applies to the departure of player *k*_{2} from $\psi =\left[{\mathcal{A}}_{F}=\{{k}_{1},{k}_{2}\},{\mathcal{A}}_{D}=\varnothing \right]$ to the pure strategy $\psi =\left[{\mathcal{A}}_{F}=\left\{{k}_{1}\right\},{\mathcal{A}}_{D}=\left\{{k}_{2}\right\}\right]$. We can easily check the different combinations of $\psi =\left[{\mathcal{A}}_{F}=\left\{{k}_{1}\right\},{\mathcal{A}}_{D}=\left\{{k}_{2}\right\}\right]$, $\psi =\left[{\mathcal{A}}_{F}=\left\{{k}_{2}\right\},{\mathcal{A}}_{D}=\left\{{k}_{1}\right\}\right]$, and finally $\psi =\left[{\mathcal{A}}_{F}=\varnothing ,{\mathcal{A}}_{D}=\{{k}_{1},{k}_{2}\}\right]$. The unilateral move of user *k*_{1} (respectively *k*_{2}) from the strategy profile $\psi =\left[{\mathcal{A}}_{F}=\varnothing ,{\mathcal{A}}_{D}=\{{k}_{1},{k}_{2}\}\right]$ to $\psi =\left[{\mathcal{A}}_{F}=\left\{{k}_{1}\right\},{\mathcal{A}}_{D}=\left\{{k}_{2}\right\}\right]$ (respectively to $\psi =\left[{\mathcal{A}}_{F}=\left\{{k}_{2}\right\},{\mathcal{A}}_{D}=\left\{{k}_{1}\right\}\right]$) does not yield any benefit. This game has a unique Nash equilibrium at the pure joint strategy $\psi =\left[{\mathcal{A}}_{F}=\varnothing ,{\mathcal{A}}_{D}=\{{k}_{1},{k}_{2}\}\right]$ with unsatisfactory payoff distribution **x** = [0,0]. At the Nash equilibrium point, both players choose the ‘competitive’ and ‘egoistic’ strategy *D*.
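The case-by-case check above can be reproduced with a few lines of Python. The sketch encodes the payoffs as described in the text (reward 1 when the other player forwards, cost *c* when a player forwards) and tests every pure profile against unilateral deviations. The value `c=0.1` and the function names are illustrative assumptions.

```python
from itertools import product

STRATEGIES = ("F", "D")

def payoffs(profile, c=0.1):
    """Utility of each player: reward 1 if the other forwards, minus cost c if it forwards."""
    s1, s2 = profile
    x1 = (1 if s2 == "F" else 0) - (c if s1 == "F" else 0)
    x2 = (1 if s1 == "F" else 0) - (c if s2 == "F" else 0)
    return (x1, x2)

def is_nash(profile, c=0.1):
    """No player gains by unilaterally switching its own strategy."""
    current = payoffs(profile, c)
    for player in range(2):
        for alt in STRATEGIES:
            deviation = list(profile)
            deviation[player] = alt
            if payoffs(tuple(deviation), c)[player] > current[player]:
                return False
    return True

nash_profiles = [p for p in product(STRATEGIES, repeat=2) if is_nash(p)]
print(nash_profiles)  # [('D', 'D')]
```

The first entry of each profile is the strategy of *k*_{1} and the second that of *k*_{2}, so `('D', 'D')` corresponds to both players joining the coalition of droppers.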

In many games, there are opportunities for joint deviations that are mutually beneficial for a subset of players. This led Aumann [43] to propose the idea of *strong Nash equilibrium*, which ensures a more restrictive stability than the conventional Nash equilibrium. Strong Nash equilibrium reflects the unprofitability of coalition deviations. It is a strategy profile that is stable against deviations not only by single players but also by all coalitions of players. A strong equilibrium is defined as a strategy profile for which no subset of players has a joint deviation that strictly benefits all of them, while all other players (those outside the subset) are expected to maintain their equilibrium strategies.

*Definition 14.* A strategy (coalition) combination $\psi =\left[{\mathcal{A}}_{1},{\mathcal{A}}_{2},\dots ,{\mathcal{A}}_{m}\right]$, where ${\mathcal{A}}_{i}\bigcap {\mathcal{A}}_{j\ne i}=\varnothing $ and $\bigcup _{i=1}^{m}{\mathcal{A}}_{i}=\mathcal{K}$ with payoff distribution **x** = [*x*_{1},…,*x*_{K}] is a strong Nash equilibrium if there does not exist a coalition ${\mathcal{A}}_{i}\in \psi $ whose deviation yields a new distribution **y** = [*y*_{1},…,*y*_{K}] such that ${y}_{k}\ge {x}_{k}\ \forall \,k\in {\mathcal{A}}_{i}$ and $\exists \,k\in {\mathcal{A}}_{i}$ such that *y*_{k} > *x*_{k}.

This definition of strong equilibrium is actually slightly different from those of [43] and [44]. Definition 14 allows a coalition to deviate from a strategy profile when the deviation strictly increases the payoffs of some of its members without decreasing those of the other members, whereas the original definition allows only deviations that strictly increase the payoffs of all members of a deviating coalition. We note that if a game implements a strategy for strong equilibrium, it does not necessarily implement it for Nash equilibrium. Both interpretations of strong Nash equilibrium are prominent in the literature, and in most games, the two definitions lead to the same sets of strong Nash equilibria; however, the one that we use here is slightly more appealing in the context of network formation games (see, e.g., [46]). Network formation games involve a number of independent players that interact with each other in order to form a suitable graph that connects them.

1. $\psi =\left[{\mathcal{A}}_{F}=\left\{{k}_{1}\right\},{\mathcal{A}}_{D}=\left\{{k}_{2}\right\}\right]$ is not a strong Nash equilibrium because the deviation of ${\mathcal{A}}_{F}$ increases its member’s payoff.
2. $\psi =\left[{\mathcal{A}}_{F}=\left\{{k}_{2}\right\},{\mathcal{A}}_{D}=\left\{{k}_{1}\right\}\right]$ is not a strong Nash equilibrium because the deviation of ${\mathcal{A}}_{F}$ renders its member’s payoff higher.
3. $\psi =\left[{\mathcal{A}}_{F}=\varnothing ,{\mathcal{A}}_{D}=\{{k}_{1},{k}_{2}\}\right]$ is not a strong Nash equilibrium because the deviation of both players from ${\mathcal{A}}_{D}$ to ${\mathcal{A}}_{F}$ increases the payoff distribution.
4. $\psi =\left[{\mathcal{A}}_{F}=\{{k}_{1},{k}_{2}\},{\mathcal{A}}_{D}=\varnothing \right]$ is a strong Nash equilibrium because the departure of one or both players from ${\mathcal{A}}_{F}$ to ${\mathcal{A}}_{D}$ decreases at least one player’s payoff.

The unique strong Nash equilibrium is the strategy profile (*F*,*F*), which corresponds to the coalition set $\psi =\left[{\mathcal{A}}_{F}=\{{k}_{1},{k}_{2}\},{\mathcal{A}}_{D}=\varnothing \right]$, since no deviation can improve the payoff distribution vector **x** = [ 1 - *c*,1 - *c* ]. In fact, at the strong Nash equilibrium, both players choose the ‘cooperative’ and ‘altruistic’ strategy *F* in spite of the energy transmission cost.
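The four cases can be verified with a short Python sketch that applies Definition 14 as stated in this paper: a deviation by one or more members of a coalition in ψ is judged by its effect on all members of that coalition. The cost `c=0.1` and the function names are assumptions made for illustration.

```python
from itertools import combinations, product

STRATEGIES = ("F", "D")

def payoffs(profile, c=0.1):
    """Reward 1 if the other player forwards, minus cost c if the player itself forwards."""
    s1, s2 = profile
    x1 = (1 if s2 == "F" else 0) - (c if s1 == "F" else 0)
    x2 = (1 if s1 == "F" else 0) - (c if s2 == "F" else 0)
    return (x1, x2)

def is_strong_nash(profile, c=0.1):
    """Definition 14: no coalition in psi has a deviation that hurts none of its
    members and strictly benefits at least one of them."""
    x = payoffs(profile, c)
    for strategy in STRATEGIES:
        coalition = [k for k, s in enumerate(profile) if s == strategy]
        other = "D" if strategy == "F" else "F"
        # One or more members of the coalition move to the other strategy.
        for size in range(1, len(coalition) + 1):
            for movers in combinations(coalition, size):
                deviation = tuple(other if k in movers else s
                                  for k, s in enumerate(profile))
                y = payoffs(deviation, c)
                nobody_worse = all(y[k] >= x[k] for k in coalition)
                someone_better = any(y[k] > x[k] for k in coalition)
                if nobody_worse and someone_better:
                    return False
    return True

strong_profiles = [p for p in product(STRATEGIES, repeat=2) if is_strong_nash(p)]
print(strong_profiles)  # [('F', 'F')]
```

Because the deviation is evaluated over the whole coalition, (*F*,*F*) passes the test even though a single player would gain by dropping: its partner in the coalition of forwarders would lose.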

In network problems, Zhong and Wu show that the strong Nash equilibrium concept makes collusion-resistant routing possible in non-cooperative wireless *ad hoc* networks [47]. Altman et al. [48] examine a dynamic random access game with orthogonal power constraints, in which the probability of the transmission of a terminal in each slot depends on the amount of energy left prior to that slot. They show the existence of a strong Nash equilibrium point.

Conventional Nash equilibrium is concerned with the possibilities of only one step deviation by any player. The notion of strong Nash equilibrium requires an agreement not be subject to an improving (one step) deviation by any coalition of players given that all other coalitions be inert. This notion is stronger than the Nash equilibrium, but it is not resistant to further deviation by sub-coalitions (the subsets of a coalition). Recognizing this problem, Bernheim et al. [44] introduced the notion of *coalition-proof Nash equilibrium*, which requires only that an agreement be immune to improving deviations which are *self-enforcing*. The definition of a self-enforcing deviation is recursive.

*Definition 15.* For a singleton coalition, a deviation is self-enforcing if it maximizes the player’s payoff. For a coalition of more than one player, a deviation is self-enforcing if (1) it is profitable for all its members and (2) there is no further self-enforcing and improving deviation available to a proper sub-coalition of players.

Generally, a deviation by a coalition is self-enforcing if no sub-coalition has an incentive to initiate a new deviation. In the forwarder’s dilemma game, the Nash equilibrium is upset by a deviation of the coalition of both players *k*_{1} and *k*_{2}. At the pure strategy Nash equilibrium where each player chooses strategy *D*, they each obtain a payoff of 0. By jointly deviating (both choosing *F* instead), *k*_{1} and *k*_{2} each earn a payoff 1 - *c*. This deviation is not self-enforcing even though the movement to the pure strategy $\psi =\left[{\mathcal{A}}_{F}=\{{k}_{1},{k}_{2}\},{\mathcal{A}}_{D}=\varnothing \right]$ is profitable for both players. At the strong Nash pure strategy (*F*, *F*), player *k*_{1} is tempted to move to strategy (*D*, *F*) to get more payoff, and player *k*_{2} to (*F*, *D*). Thus, the strong Nash equilibrium is not immune to further deviations by sub-coalitions.

This notion of self-enforceability provides a useful means of distinguishing coalitional deviations that are viable from those that are not resistant to further deviations. With the concept of self-enforceability, our notion of coalition-proofness is easily formulated.

*Definition 16.* In a one-player game, a strategy is a coalition-proof Nash equilibrium if it maximizes the player’s utility. In a game with more than one player, a strategy combination is a coalition-proof Nash equilibrium if no sub-coalition has a self-enforcing deviation that makes all its members better off.

This solution concept requires that there is no sub-coalition that can make a mutually beneficial deviation (keeping the strategies of non-members fixed) in a way that the deviation itself is stable according to the same criterion. In the forwarder’s dilemma game, the strong Nash equilibrium profile (*F*,*F*) is not a coalition-proof Nash equilibrium. This is due to the fact that the deviation of $\left\{{k}_{1}\right\}\subset {\mathcal{A}}_{F}=\{{k}_{1},{k}_{2}\}$ to the strategy (*D*,*F*) increases the payoff of *k*_{1}. In this game, no coalition-proof Nash equilibrium exists, because every pure strategy profile has at least one self-enforcing deviation.

Bernheim et al. [44] note that for two-person games, the set of coalition-proof equilibria coincides with the set of Nash equilibria that are not Pareto-dominated by any other Nash equilibrium. However in *n*-person games (*K* ≥3), the equilibrium concepts are independent. At coalition-proof Nash equilibrium, the deviations are restricted to be stable themselves against further deviations by sub-coalitions. Moldovanu [49] discusses the situations of a three-player game, wherein coalition-proof Nash equilibrium is equivalent to the core set. The conditions under which the set of coalition-proof Nash equilibria coincides with the set of strong Nash equilibria are formulated by Konishi et al. [50].

*ad hoc* networks [52]. To better understand the concepts of self-enforceability and coalition-proof Nash equilibrium, let us introduce an intuitive *subcarrier allocation game in an OFDMA network*. Let us focus on three wireless transmitters $\mathcal{K}=\left\{{k}_{1},{k}_{2},{k}_{3}\right\}$ and an OFDMA base station with two subcarriers $\mathcal{N}=\{1,2\}$. Every subcarrier $n\in \mathcal{N}$ has a frequency spacing *Δ**f*. Each user $k\in \mathcal{K}$ experiences a Gaussian complex-valued channel gain |*H*_{kn}|^{2} on the *n*th subcarrier to the base station. We assume that each subcarrier can be shared among more than one transmitter. The payoff of each player (transmitter) is defined as the achieved Shannon channel capacity. Each user $k\in \mathcal{K}$ is allowed to either spend a certain power ${\overline{p}}_{k}$ on only one chosen subcarrier, or equally divide it among both subcarriers. In the pure strategy *a*_{1}, player *k* transmits with the maximum power ${\overline{p}}_{k}$ on subcarrier *n* = 1 and does not transmit any information on subcarrier *n* = 2. The strategy *a*_{2} is contrary to *a*_{1}, i.e., exclusively transmitting on subcarrier *n* = 2 with maximum power. Finally, strategy *a*_{3} equally divides its power on two subcarriers and exploits transmitting on both tones. The terminal *k* achieves a channel capacity:

$${C}_{k}=\sum _{n\in \mathcal{N}}{C}_{kn},$$

where *C*_{kn} is the Shannon capacity achieved by user *k* on the *n*th subcarrier

$${C}_{kn}=\Delta f\,{\log}_{2}\left(1+\frac{|{H}_{kn}{|}^{2}{p}_{kn}}{{\sigma}_{\mathrm{w}}^{2}+\sum _{k\ne i\in \mathcal{K}}|{H}_{in}{|}^{2}{p}_{in}}\right),$$

wherein *p*_{kn} represents the power allocated by terminal *k* over the *n*th subcarrier and where the interference term $\sum _{k\ne i\in \mathcal{K}}|{H}_{in}{|}^{2}{p}_{in}$ is approximated with a Gaussian random variable of equal mean and variance. Choosing the strategy *a*_{1} means selecting ${p}_{k1}={\overline{p}}_{k}$ and *p*_{k2} = 0. For the strategy *a*_{2}, *p*_{k1} = 0 and ${p}_{k2}={\overline{p}}_{k}$, and for strategy *a*_{3}, ${p}_{k1}={p}_{k2}=\frac{{\overline{p}}_{k}}{2}$. The parameter ${\sigma}_{\mathrm{w}}^{2}$ is the power of the additive white Gaussian noise (AWGN). Note that in an OFDMA system, there is no interference between adjacent subcarriers. Hence, *C*_{kn} considers only the intra-subcarrier interference that occurs when the same subcarrier is shared by more than one terminal.

In the strategic form of this game (Figure 4), player *k*_{1} chooses the row, player *k*_{2} chooses the column, and player *k*_{3} chooses the matrix. Each payoff reports the (rounded) value of the achieved Shannon channel capacity in kb/s. We consider the following parameters for our simulations: the maximum power of each terminal *k* is ${\overline{p}}_{k}=10\ \text{mW}$; the power of the ambient AWGN noise on each subcarrier is ${\sigma}_{\mathrm{w}}^{2}=100\ \text{pW}$; and finally the carrier spacing is $\Delta f=\frac{10}{1024}\ \text{MHz}$.^{c} The path coefficients |*H*_{kn}|^{2}, corresponding to the frequency response of the multipath wireless channel, are computed using the 24-tap ITU modified vehicular-B channel model adopted by the IEEE 802.16m standard [53].
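As a rough sketch of how one entry of such a payoff table could be computed, the Python snippet below evaluates the capacity of each terminal for a given pure strategy profile. It assumes the standard Shannon formula with the interference on a shared subcarrier treated as additional Gaussian noise, and it uses the simulation parameters listed above. The channel gains in `GAINS` are hypothetical placeholders, not the ITU vehicular-B values behind Figure 4, so the resulting numbers are not expected to match the figure.

```python
import math

DELTA_F = 10e6 / 1024      # subcarrier spacing in Hz
P_MAX = 10e-3              # maximum power per terminal in W
NOISE = 100e-12            # AWGN power per subcarrier in W

# Power split (subcarrier 1, subcarrier 2) for the strategies a1, a2, a3.
POWER_SPLIT = {"a1": (P_MAX, 0.0), "a2": (0.0, P_MAX), "a3": (P_MAX / 2, P_MAX / 2)}

def capacities(profile, gains):
    """Capacity in bit/s of every terminal for a pure strategy profile.

    gains[k][n] is |H_kn|^2; interference from the other terminals on the
    same subcarrier is treated as additional Gaussian noise.
    """
    powers = [POWER_SPLIT[a] for a in profile]
    result = []
    for k in range(len(profile)):
        total = 0.0
        for n in range(2):
            interference = sum(gains[i][n] * powers[i][n]
                               for i in range(len(profile)) if i != k)
            sinr = gains[k][n] * powers[k][n] / (NOISE + interference)
            total += DELTA_F * math.log2(1.0 + sinr)
        result.append(total)
    return result

# Hypothetical channel gains, chosen only to give capacities of a few kb/s.
GAINS = [[2e-8, 1e-8], [1e-8, 3e-8], [1.5e-8, 2.5e-8]]
print([round(c / 1e3) for c in capacities(("a1", "a2", "a2"), GAINS)])
```

Evaluating `capacities` for all 27 pure profiles would yield a table of the same shape as the strategic form, on which the equilibrium checks of this section can then be run.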

It is easy to show that the (pure) Nash equilibrium strategies of Figure 4 are (*a*_{3},*a*_{3},*a*_{3}), equivalent to $\psi =\left[{\mathcal{A}}_{{a}_{1}}=\varnothing ,{\mathcal{A}}_{{a}_{2}}=\varnothing ,{\mathcal{A}}_{{a}_{3}}=\mathcal{K}\right]$, and (*a*_{1},*a*_{2},*a*_{2}), equivalent to $\psi =\left[{\mathcal{A}}_{{a}_{1}}=\left\{{k}_{1}\right\},{\mathcal{A}}_{{a}_{2}}=\{{k}_{2},{k}_{3}\},{\mathcal{A}}_{{a}_{3}}=\varnothing \right]$. The Nash equilibrium strategy (*a*_{3},*a*_{3},*a*_{3}) is neither coalition-proof nor strong. With the deviation of the coalition ${\mathcal{A}}_{{a}_{3}}$ to the strategy profile (*a*_{2},*a*_{1},*a*_{3}), all players profit more, with payoff distribution [ 13, 9, 11 ]. This change does not last, since player *k*_{1} has an incentive to move on to the strategy profile (*a*_{3},*a*_{1},*a*_{3}). This transition is not favorable for players *k*_{2} and *k*_{3}. The player *k*_{2} is then tempted to move to the Nash equilibrium point to earn a higher payoff. In contrast, the Nash equilibrium strategy profile (*a*_{1},*a*_{2},*a*_{2}) with payoff vector [ 15, 10, 10 ] is a strong and coalition-proof Nash equilibrium. This is due to the fact that in $\psi =\left[{\mathcal{A}}_{{a}_{1}}=\left\{{k}_{1}\right\},{\mathcal{A}}_{{a}_{2}}=\{{k}_{2},{k}_{3}\},{\mathcal{A}}_{{a}_{3}}=\varnothing \right]$, there is no deviation, self-enforcing or otherwise, that can improve the payoff distribution. As can be seen, all players prefer to stay at the coalition-proof Nash equilibrium rather than at the pure Nash equilibrium strategy (*a*_{3},*a*_{3},*a*_{3}). Note that a strong or coalition-proof Nash equilibrium does not necessarily coincide with a Nash equilibrium strategy profile, and the result of Figure 4 is an exception.

In general, the existence of a pure cooperative or non-cooperative Nash equilibrium for the subcarrier allocation game in an OFDMA network is not guaranteed. Different parameter values lead to quite different channel capacities, and this may result in a matrix form without any type of Nash equilibrium. There might even exist a Nash equilibrium which is Pareto-dominated by another strategy profile. This shows that in OFDMA networks, an appropriate resource allocation technique is needed [9].

## 7 Coordinated equilibrium

The most common solution concept in (non-cooperative) game theory, Nash equilibrium, assumes that players take mixed actions independently from each other. Cooperative games allow players to coordinate with each other to find possible equilibria and (joint) optimizations that the players can perform on their own. Unlike evolutionary games ([3], Ch. 3), in coordinated games, the interaction between players is implemented once among all players by a central authority to increase their throughput. The notion of *correlated equilibrium* was introduced by Aumann [54]. Correlated equilibria are defined in a context where there is an intermediator who sends random (private or public) signals to the players. The intermediator need not have any intelligence or knowledge of the game. These signals allow players to coordinate their actions and, in particular, to perform joint randomization over strategies. ‘Correlated strategies are familiar from cooperative game theory, but their applications in non-cooperative games are less understood’, says Aumann [54]. This is because the players of a coordination game are not totally isolated, and without communication between them, achieving a coordinated strategy profile is not possible.

Consider the *multiple access game*, in which two transmitters *k*_{1} and *k*_{2} wish to send some packets to their receivers sharing a common resource, i.e., the wireless medium. They are within range of each other, and accordingly, they interfere if they transmit at the same time. The users have two possible pure strategies: access (*A*) and wait (*W*). In this game, two identical transmitters must simultaneously decide whether to access the channel or wait. The transmission of each packet has an energy cost of 0 < *c* ≪ 1. Each player earns a payoff 1 if it succeeds in transmitting its packet without collision with the other. Waiting brings neither cost nor reward for the player. Each player’s utility is its reward minus the cost. This game has three Nash equilibria: (*A*,*W*), (*W*,*A*), and a mixed strategy Nash equilibrium, where each player transmits with the probability 1 - *c* ([45], Sections 2.3 and 2.4). The utilities of the Nash equilibrium strategies are (1 - *c*,0), (0,1 - *c*), and (0,0), respectively. It is clear that the mixed strategy is not resistant to an improving deviation. In the following, we give the possibility of preplay communication to achieve a stable Nash equilibrium.

In the game with ‘cheap conversation’, each player simultaneously and publicly announces whether it decides to access or wait. Following the announcements, each player makes its choice. Suppose the players agree to participate in the game under the following binding agreement: each player announces *A* with probability $\frac{3}{4}$. If the profile of announcements is either (*A*,*W*) or (*W*,*A*), then each player plays its own announcement. Otherwise, each player plays *A* with probability $\frac{1}{2}$. Note that no further communication is possible. The use of a joint deviation requires the unanimity of all members of the deviating coalition. A player agrees to be a part of a joint deviation if, given its own information, the deviation is profitable. Thus, if a joint deviation is used, it is common knowledge that each deviator believes that the deviation is profitable.

This agreement induces the *correlated strategy* [54], [55] (of the original game) given in Figure 6, in order to obtain a higher utility in the strategy profiles (*A*,*W*) and (*W*,*A*). It is important to note that this joint probability distribution is not the product of its marginal distributions and therefore cannot be achieved from a mixed strategy profile of the game without correlation among players.
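The joint distribution induced by the announcement agreement can be derived directly from its description. The Python sketch below does so with exact fractions and compares one entry with the product of the marginals. It assumes that a colliding access yields a payoff of -*c* to each player, which follows from the stated reward and cost; the function names are illustrative, and the resulting numbers are computed from the text rather than read from Figure 6.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("A", "W")
P_ANNOUNCE = {"A": Fraction(3, 4), "W": Fraction(1, 4)}

def joint_distribution():
    """Distribution over played profiles induced by the agreement."""
    joint = {profile: Fraction(0) for profile in product(ACTIONS, repeat=2)}
    for announced in product(ACTIONS, repeat=2):
        prob = P_ANNOUNCE[announced[0]] * P_ANNOUNCE[announced[1]]
        if announced[0] != announced[1]:
            # (A, W) or (W, A): each player plays its own announcement.
            joint[announced] += prob
        else:
            # Otherwise each player independently plays A with probability 1/2.
            for played in joint:
                joint[played] += prob * Fraction(1, 4)
    return joint

def expected_payoff(joint, c):
    """Expected utility of k1: reward 1 for a collision-free access, cost c per access."""
    return joint[("A", "W")] * (1 - c) + joint[("A", "A")] * (-c)

joint = joint_distribution()
marginal_a = joint[("A", "W")] + joint[("A", "A")]
print(joint[("A", "W")], marginal_a * (1 - marginal_a))  # 11/32 1/4
```

The profile (*A*,*W*) is played with probability 11/32, whereas independent play with the same marginals would give 1/4, which illustrates why the distribution is not a product of its marginals. Under these assumptions, the expected payoff of each player is 11/32 - *c*/2, which is positive for small *c* and therefore exceeds the zero payoff of the mixed strategy equilibrium.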

As can be seen, the proposed correlated deviation from the mixed strategy equilibrium makes both players better off. Note that the players are allowed to bind an agreement only on the space of feasible outcomes. In the correlated multiple access game, the outcome is feasible since the correlated results are in the range between the smallest and highest possible payoff. In fact, the set of correlated equilibria contains those equilibria from which no coalition has a self-enforcing deviation, making all members better off.

As another example, consider two players *k*_{1} and *k*_{2} that are placed close to and far from a certain access point (AP), respectively, in a code division multiple access (CDMA) network with high SINR regime. The strategy of each player is to transmit either with the maximum power $\overline{p}$ or with a weakened level $\eta \overline{p}$, where 0 < *η* < 1. Due to the interference at the AP, the throughput (the amount of delivered information) of each player depends on the strategies chosen by both players. Transmitting with a higher power increases the BER, and this results in a decreased throughput. Each player is rewarded *r* if it successfully delivers its packet and a reduced *δ**r* if it delivers a corrupted version of the packet, where 0 < *η* < *δ* < 1. If the near player *k*_{1} decides to transmit with the power $\overline{p}$, the farther player *k*_{2} will not be able to deliver any information to the AP.

This results in no benefit for *k*_{2} and causes a power consumption cost equal to -*η**c* if *k*_{2} chooses strategy $\eta \overline{p}$ and -*c* otherwise, where *c* ≪ *r*. Obviously, transmitting with power $\overline{p}$ results in a complete information delivery for *k*_{1}. This yields a payoff equal to the reward minus the power consumption cost, i.e., *r* - *c*, irrespective of the strategy of *k*_{2}. The packets of player *k*_{2} are successfully delivered if it chooses the maximum power $\overline{p}$ and player *k*_{1} chooses the reduced power $\eta \overline{p}$. On the other hand, if both players decide to transmit with reduced power $\eta \overline{p}$, the near player takes the payoff *δ**r* - *η**c* > 0, while the farther player *k*_{2} will not successfully deliver any packet and suffers only a power cost -*η**c*.

At the Nash equilibrium $\left(\overline{p},\eta \overline{p}\right)$, the payoffs are *r* - *c* and -*η**c* for *k*_{1} and *k*_{2}, respectively. This means that at the Nash equilibrium point, the farther player is not able to send any information. On the other hand, the Pareto optimal solutions of the game are the strategies $\left(\overline{p},\eta \overline{p}\right)$ and $\left(\eta \overline{p},\overline{p}\right)$. This is an unsatisfactory outcome for the far player *k*_{2}, while the near player *k*_{1} takes the highest possible payoff. Now, let us find the mixed strategy of the game. We denote by *α*_{1} the probability with which the near player *k*_{1} decides to transmit with the maximum power $\overline{p}$ and by *α*_{2} the same probability for the far player *k*_{2}. The payoffs of the players *k*_{1} and *k*_{2} are represented by

$${x}_{{k}_{1}}={\alpha}_{1}\left(r-c\right)+\left(1-{\alpha}_{1}\right)\left(\delta r-\eta c\right),$$

$${x}_{{k}_{2}}=\left(1-{\alpha}_{1}\right){\alpha}_{2}\,\delta r-{\alpha}_{2}c-\left(1-{\alpha}_{2}\right)\eta c.$$

Both players want to maximize their own payoff. As can be seen, ${x}_{{k}_{1}}$ takes its maximum value *r* - *c* with *α*_{1} = 1. On the other hand, with *α*_{1} = 1, the far player *k*_{2} earns a negative payoff whatever *α*_{2} ∈ [0,1]. Instead, with *α*_{1} = 0, the near player *k*_{1} gains *δ**r* - *η**c*, and player *k*_{2}, setting *α*_{2} = 1, achieves the payoff *δ**r* - *c*. Thus, the best values for *α*_{1} and *α*_{2} are 0 and 1, respectively. The conclusion is that the mixed strategy is equivalent to the pure strategy $\left(\eta \overline{p},\overline{p}\right)$ with payoff **x** = [ *δ**r* - *η**c*,*δ**r* - *c* ]. In this game, there is no (totally) mixed strategy; the mixed strategy reduces to one of the pure Pareto optimal points.
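The reasoning about *α*_{1} and *α*_{2} can be checked numerically. The Python sketch below encodes the four pure-strategy payoffs exactly as described in the text and averages them over the two independent randomizations. The numerical values of `r`, `c`, `eta`, and `delta` are hypothetical; they are only chosen to satisfy 0 < *η* < *δ* < 1 and *c* ≪ *r*.

```python
def pure_payoffs(high1, high2, r=1.0, c=0.1, eta=0.5, delta=0.8):
    """Payoffs (k1, k2); highN is True when player kN transmits with maximum power."""
    cost2 = c if high2 else eta * c
    if high1:
        # The near player delivers everything; the far player only pays for power.
        return (r - c, -cost2)
    # The near player uses reduced power.
    reward2 = delta * r if high2 else 0.0
    return (delta * r - eta * c, reward2 - cost2)

def expected_payoffs(alpha1, alpha2, **params):
    """Expected payoffs when kN uses maximum power with probability alphaN."""
    x1 = x2 = 0.0
    for high1, p1 in ((True, alpha1), (False, 1 - alpha1)):
        for high2, p2 in ((True, alpha2), (False, 1 - alpha2)):
            u1, u2 = pure_payoffs(high1, high2, **params)
            x1 += p1 * p2 * u1
            x2 += p1 * p2 * u2
    return (x1, x2)

print(expected_payoffs(1.0, 0.5))
print(expected_payoffs(0.0, 1.0))
```

With *α*_{1} = 1 the second component is negative for every *α*_{2}, while (*α*_{1}, *α*_{2}) = (0, 1) reproduces the payoff pair [ *δ**r* - *η**c*, *δ**r* - *c* ] quoted above.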

*δ**r* - *c*. We show that an appropriate agreement among players can satisfy both of them at correlated equilibrium. Players *k*_{1} and *k*_{2} can guarantee an expected payoff of **x** = [ *r* - *c*, *δ**r* - *c* ] by playing the correlated strategy profile:

The value of *κ* in the expression $\kappa \cdot \left(\overline{p},\eta \overline{p}\right)+\left(1-\kappa \right)\cdot \left(\overline{p},\overline{p}\right)$ is indifferent for the near player *k*_{1}, since it gets its own highest possible payoff, *r* - *c*, as well. To satisfy the far player *k*_{2}, it is enough to solve the following equation for ${x}_{{k}_{2}}$:

Bonneau et al. [57] show that the coordination among mobile users can significantly increase the performance of access to a common channel in ALOHA setting. A coordination mechanism is also considered by Bonneau et al. [58] to achieve the optimal power allocation in a wireless network, wherein each terminal knows only its own channel state. The concept of correlated equilibrium is also introduced in a multi-user interference channel context in [59]. Different types of coordination are deeply discussed and widely used in [55].

## 8 Dynamic learning

Until now, we have realized that the Nash equilibrium suffers from the lack of farsighted stability, i.e., the relative results can be unsatisfactory; because of this, any player can have incentive to improve its outcome by moving to another strategy. The existence of the strong and coalition-proof Nash equilibrium is not guaranteed and even if so, when the number of pure strategies is large, finding such solutions is very complicated. The challenge of finding a profitable accord among players is persistent in coordinated equilibria solution. In this section, the main question we seek an answer to is *How can the players be led to a stable joint pure strategy gaining an acceptable payoff?* This question is important, even if multiple equilibrium points with the same payoff have been identified, since each player may autonomously decide to stay in a different strategy.

*Dynamic learning* [60] has been widely used to get rid of the anarchy derived from the conflicts between selfish decisions. Learning is a joint adaptive process through which agents converge to the best final response. The agents either share a common interest, as in teamwork, or each pursue their own greedy goal. Generally, there are three types of learning process: *individual learning*, *joint-action learning*, and *stochastic learning*. In an individual learning process, the independent agents cannot observe one another’s actions, i.e., for each player, the opponents are passive agents. In a joint-action learning process, instead, the notion of ‘optimality’ is improved by adding the observation of other concurrent learners, in order to reach a stable optimal solution. The stochastic learning framework, which has the Markov property and a stochastic inter-state transition rule, enables each player to observe the history of the opponents’ actions.

In the network engineering literature, van der Schaar and Fu [61] introduce a stochastic learning process among autonomous wireless agents for the optimization of dynamic spectrum access, given the QoS of multimedia applications. A reconfigurable multi-hop wireless network is studied by Shiang and van der Schaar [62], wherein a decentralized stochastic learning process optimizes the transmission decisions of nodes with the aim of supporting mission-critical applications. In [63], Lin and van der Schaar propose a reinforcement learning scheme among the agents of a multi-hop wireless network, based on a Markov decision process. Each terminal autonomously adjusts its transmission power in order to maximize the network utility in a dynamic delay-sensitive environment.

Each agent $k$ has a finite set of individual actions $\mathbf{A}_k$. Each agent $k$ individually chooses a pure joint action (strategy) to be performed, $\mathbf{a}_k = (a_1, \ldots, a_K) \in \mathbf{A}_1 \times \cdots \times \mathbf{A}_K$, from the available joint strategy space. Q-learning enables the individual learners to achieve optimal coordination from repeated trials. Q-learning introduces a certain value $Q$ as the immediate reward obtained after having moved to the new strategy. Each player individually updates a $Q$ value for each of its actions. In each time step, after the new joint action $\mathbf{a}_k$ has been selected, the value of $Q_k^t$ is individually updated. In particular, the value of $Q_k^{t+1}(\mathbf{a}_k)$ estimates the utility of performing the joint strategy $\mathbf{a}_k$ for user $k$. In the seminal paper of Watkins and Dayan [64], the $Q$ value is updated by the following recursion:

In this recursion, $\delta_k \in (0,1)$ is a discount factor, and $r_k(\mathbf{a}_k)$ is the reward of the joint action $\mathbf{a}_k$ for the respective player; $f_k$ is a function of $t$ which is related to the ‘learning rate’. Watkins and Dayan showed that, given bounded rewards, learning rates $0 \le f_k^t < 1$, and $\sum_{t} f_k^t = \infty$, $\sum_{t} (f_k^t)^2 < \infty$, the $Q_k$ values updated by (25) converge to a common joint pure strategy with probability one. The reward $r_k$ is defined by a learning policy, and it is not necessarily equal to the payoff defined by the game. The learning policy is greedy with respect to the $Q$ value, i.e., a particular action $\mathbf{a}_k$ will be selected in the long run if it improves the $Q$ value. Q-learning is guaranteed to converge to an optimal and stable joint strategy regardless of the action selection policy. Q-learning is not applicable where the strategy space is continuous or the number of strategies is not finite. Claus and Boutilier [65] establish a simplified version of the $Q$ recursion (25), which updates the $Q$ value by the following recursion:

$Q$ recursion (27). In a multi-learner scenario, a major challenge of Q-learning is strategy selection. When the numbers of strategies and players are large, the number of time steps needed to achieve an optimal joint action increases exponentially. It is fairly clear that the best approach is to start with ‘exploration’ of different strategies and then focus on ‘exploitation’ of the strategies with the best value of $Q$. Kaelbling et al. [66] recall the *Boltzmann function* as an efficient strategy selection rule that strikes a balance between exploration and exploitation. Boltzmann functions define a probability distribution over the different joint actions. At each time step $t + 1$, every player individually selects the joint strategy $\mathbf{a}_k$ with probability $p(\mathbf{a}_k)$:

In this expression, $(\delta_k)^t \cdot r_k(\mathbf{a}_k)$ is the discounted reward for taking action $\mathbf{a}_k$ by user $k$ in time step $t$, and $T$ is a function which provides a randomness component to control exploration and exploitation of the actions. In practice, the *temperature function* $T$ is a decreasing function of time, so as to decrease exploration and increase exploitation. High values of $T$ yield a small $p(\mathbf{a}_k)$ value and this encourages exploration, whereas a low $T$ makes $Q(\mathbf{a}_k)$ more important and this encourages exploitation. At time $t = 0$, each player randomly chooses a strategy and assigns a random number to its own $Q$ value. At time step $t$, after the function $T$ has been updated, each concurrent agent’s experience consists of a sequence of stages [65]:

1. Computing $p(\mathbf{a}_k)$ for all $\mathbf{a}_k \in \times_{\forall k \in \mathcal{K}} \mathbf{A}_k$.
2. Generating a random number $\xi_k^t$ uniformly distributed in [0,1], and then choosing the best joint strategy $\mathbf{a}_k$, i.e., the one with the highest $p(\mathbf{a}_k)$ such that $\xi_k^t \ge p(\mathbf{a}_k)$. If $\xi_k^t < p(\mathbf{a}_k)$ **for all** $\mathbf{a}_k \in \times_{\forall k \in \mathcal{K}} \mathbf{A}_k$, then the learner randomly picks a strategy.
3. Updating the $Q_k^t$ value according to (27). If $Q_k^t$ grows, then the learner moves to the selected joint strategy $\mathbf{a}_k$; otherwise, it stays in the current joint action and does not update $Q$.

Although each learner selects its best strategy individually, this process reaches a common stable joint strategy in which all players stay forever, i.e., no player deviates from the (common) achieved joint strategy.
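The three stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Because the displayed recursions and the Boltzmann formula are not reproduced in this text, the sketch assumes the stateless update $Q \leftarrow Q + \alpha (r - Q)$ in the spirit of Claus and Boutilier [65] and the usual softmax form of the Boltzmann distribution recalled by Kaelbling et al. [66]. It also swaps the threshold rule of stage 2 for direct sampling from the distribution. The function names, the step size `alpha`, and the toy rewards are hypothetical.

```python
import math
import random


def boltzmann_probabilities(q_values, temperature):
    """Softmax over Q values: p(a) is proportional to exp(Q(a) / T)."""
    q_max = max(q_values.values())  # shift by the max to avoid overflow
    weights = {a: math.exp((q - q_max) / temperature) for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def select_action(q_values, temperature, rng):
    """Stage 2 (assumed variant): sample one action from the distribution."""
    probs = boltzmann_probabilities(q_values, temperature)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


def update_q(q_values, action, reward, alpha):
    """Stage 3 (assumed form): move Q(action) a fraction alpha toward the reward."""
    q_values[action] += alpha * (reward - q_values[action])


def run_learner(rewards, steps, alpha=0.09, q=50.0, m=0.001, seed=0):
    """Single learner facing fixed, hypothetical rewards per action."""
    rng = random.Random(seed)
    q_values = {a: 0.0 for a in rewards}
    for t in range(steps):
        temperature = q * math.exp(-m * t)  # decaying temperature T = q * e^(-m t)
        action = select_action(q_values, temperature, rng)
        update_q(q_values, action, rewards[action], alpha)
    return q_values


toy_rewards = {"low": 0.2, "mid": 1.0, "high": 0.5}  # hypothetical values
learned = run_learner(toy_rewards, steps=2000)
print(learned)
```

With such a slowly decaying temperature the learner keeps exploring for the whole run, so every entry of `learned` approaches the reward of its action. A faster decay would concentrate the choices on the action with the largest $Q$ value much earlier.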

The theory of learning in games studies how and which equilibria might arise as a consequence of a long-run non-equilibrium process of learning. A natural question is *Can learning algorithms find a Nash equilibrium?* The motivation for this question is the hope of reaching Nash equilibria, as a plausible concept, via a reasonable learning algorithm, in particular when there are a large number of players and strategies. At first glance, the stability of the dynamic learning approach addressed above is described as convergence to a *pure* joint strategy, and it is clear that the existence of a pure Nash equilibrium is not guaranteed. In fact, a dynamic learning algorithm cannot, in general, guarantee reaching a non-cooperative or cooperative Nash equilibrium. In the literature, there are some efforts to present dynamic learning algorithms that achieve a Nash equilibrium in dynamic and repeated games under particular constraints [67–70].

We now present some results about Q-learning in a CDMA network. In what follows, the experimental work is presented, highlighting how the agents learn to increase their individual rewards by revealing their actions. As mentioned above, the strategy selection can significantly influence the number of time steps needed to converge. Choosing an appropriate temperature function is a heuristic search. In our experiment, we define $T = q \cdot e^{-mt}$ as our temperature function, wherein $m$ controls the rate of exponential decay and $q > 1$ encourages the exploration of different strategies in the initial time steps.
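To get a feeling for how slowly this temperature decays, the schedule can be evaluated at a few time steps. The sketch below is only illustrative. The function name `temperature` is hypothetical, and the default values of `q` and `m` are the ones the article reports for its own experiment ($q = 50$, $m = 0.001$).

```python
import math


def temperature(t, q=50.0, m=0.001):
    """Temperature T(t) = q * exp(-m * t); q sets the start, m the decay rate."""
    return q * math.exp(-m * t)


# Print the temperature at a few time steps to see the decay.
for step in (0, 10, 100, 1000):
    print(step, round(temperature(step), 3))
```

With these values the temperature barely changes over the first tens of steps, so the initial phase of learning is dominated by exploration.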

Consider a *power control problem in a CDMA network* applying Q-learning and the Boltzmann function. Assume a CDMA network with $K$ mobile terminals denoted by the set $\mathcal{K}$. The players wish to transmit data to a certain AP. The strategy set of every player is a set of discrete power levels denoted by $\mathbf{A} = \mathbf{A}_k = [\Delta p, 2\Delta p, \ldots, M\Delta p]$, where $\Delta p$ is our power step and $M > 1$ is an integer. Each user has $M$ actions to choose from, and accordingly, the matrix game is made by $\times_{k \in \mathcal{K}} \mathbf{A}$, which consists of $M^K$ joint strategies. The Shannon capacity between player $k$ and the AP is

with $N_s$, $|H_k|^2$, and $\sigma_{\mathrm{w}}^2$ denoting the spreading factor (common to all players), the path gain of user $k$, and the AWGN power, respectively, and where $p_k \in \mathbf{A}$ denotes the transmit power of user $k$.
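The quantities just defined can be combined in a small numerical sketch. The displayed capacity formula is not reproduced in this text, so the code below relies on an assumption that should not be read as the article's exact expression: a single-cell CDMA model in which user $k$ sees the SINR $\gamma_k = N_s |H_k|^2 p_k / (\sum_{j \ne k} |H_j|^2 p_j + \sigma_{\mathrm{w}}^2)$ and achieves $C_k = \log_2(1 + \gamma_k)$ b/s/Hz. The function name and the two-user numbers are hypothetical.

```python
import math


def shannon_capacities(powers, path_gains, spreading_factor, noise_power):
    """Per-user capacity in b/s/Hz under the ASSUMED SINR model
    gamma_k = N_s * g_k * p_k / (sum_{j != k} g_j * p_j + noise_power)."""
    received = [g * p for g, p in zip(path_gains, powers)]
    total = sum(received)
    capacities = []
    for signal in received:
        interference = total - signal  # power received from all other users
        sinr = spreading_factor * signal / (interference + noise_power)
        capacities.append(math.log2(1.0 + sinr))
    return capacities


# Hypothetical two-user example: powers in W, path gains dimensionless.
caps = shannon_capacities(
    powers=[0.1, 0.1], path_gains=[1e-8, 1e-8],
    spreading_factor=64, noise_power=1e-9,
)
print(caps)
```

Under this model, raising one user's power increases its own capacity and lowers everybody else's, which is the conflict of interest that the learning process has to resolve.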

Our scenario has $K = 8$ players, such that each player $k$ must choose the best $p_k$ among $M = 5$ strategies. The power step is assumed to be $\Delta p = 100$ mW, the AWGN power $\sigma_{\mathrm{w}}^2 = 1$ nW, and the spreading factor $N_s = 64$. The players are uniformly located at a distance between 3 and 50 m from the AP. The matrix form of this game is composed of 390,625 joint strategies, and there may exist different power combinations (joint strategies) which achieve the same Shannon channel capacities. Q-learning leads the players to a joint strategy $(p_1, \ldots, p_8) \in \mathbf{A}^8$ in which all players are satisfied with the achieved Shannon channel capacities. In the $Q$ function of (27), the discount factor parameter is fixed to $\delta_k = 0.09$ for all players, and the payoff function $r_k$ is defined as

Our experiments with different parameters show that good values of the temperature function parameters are $m = 0.001$ and $q = 50$, and we start with $Q_k^{t=0} = 0$. Clearly, the existence of a strategy in which all $r_k(\mathbf{a}_k)$ attain their maximal value is not guaranteed, since there is a huge conflict of interest between the players in choosing different strategies.
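The size of the joint strategy space quoted for this scenario can be checked directly from $M = 5$, $K = 8$, and $\Delta p = 100$ mW. The sketch below builds the common action set and counts the joint strategies without storing them. The variable names are hypothetical and powers are expressed in watts.

```python
from itertools import islice, product

POWER_STEP_W = 0.1  # Delta p = 100 mW, expressed in watts
M = 5               # power levels per player
K = 8               # number of players

# Common action set [Delta p, 2 Delta p, ..., M Delta p].
action_set = [round(i * POWER_STEP_W, 10) for i in range(1, M + 1)]

# Number of joint strategies M^K.
num_joint = len(action_set) ** K

# product() enumerates joint strategies lazily; take only the first few.
first_profiles = list(islice(product(action_set, repeat=K), 3))

print(action_set, num_joint)
print(first_profiles)
```

Exhaustive search over this space is still feasible here, but the count grows as $M^K$, which is why a learning procedure becomes attractive for larger networks.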

The accompanying figure shows the Shannon capacities $C_k$ of the $K = 8$ terminals as a function of the time step $t$ in our scenario. The figure exhibits the convergence of all learners to a stable joint strategy after six time steps. Numerical results over 500 random realizations of the network show the convergence of all players to a stable joint strategy after (on average) six steps of the iterative Q-learning algorithm, wherein each joint action is probabilistically chosen according to the Boltzmann distribution. Furthermore, it is experimentally observed that the sum of the achieved Shannon channel capacities is (on average) 22.4 b/s/Hz, which is 94% of the maximum possible value of $\sum_{k \in \mathcal{K}} C_k$.

## 9 Discussion

Cooperation can be seen as the action of obtaining some advantage by giving, sharing, or allowing something. In this contribution, we aimed at mapping different coalitional game approaches onto communications and networking systems. A very important boundary condition for cooperation is that each participating entity gains more by cooperating than it would by operating alone. It is not important that all entities contribute the same effort, gain the same amount, or even have the same gain-to-cost ratio, but cooperation should bring an advantage or gain to each cooperating entity. A different form of cooperation is altruism, a strategy wherein one of the players may sacrifice itself and gain nothing from the cooperation in order to support others. In networking, for instance, one terminal sacrifices battery power and bandwidth to act as a relay for other terminals and to increase the throughput of the whole system. In some communication systems, the network protocols themselves can be seen as an implicit form of cooperation to achieve better performance, e.g., the ALOHA system. In others, network entities explicitly establish cooperation with each other to achieve better performance, e.g., relay communications.

Cooperative game theory is a branch of game theory which studies cooperation among individual and rational participants. Unlike non-cooperative game approaches, cooperative game concepts are centralized, and they need a central authority for the exchange of information and the policy-making process. The most challenging part of a cooperative game theoretic framework is the choice of the characteristic function, since it interprets the agents’ perceptions of gain and satisfaction.

The fundamental question in coalitional game theory is how to allocate the total gain generated by the collective of all players among the individual players in the game. The distribution of payoff is described as a binding contract between the players, and various criteria have been developed. The problem of gain distribution is approached with the aid of solution concepts in coalitional game theory such as the core, the Shapley value, the kernel, and the nucleolus. The core is the most classic solution, and its result is stable against deviations of coalitions. The core is useful where the negotiation process is centralized and no subset of players can selfishly and privately negotiate with each other. The core can be empty. The Shapley value is the unique single-valued solution which explores fairness in every possible prospective coalition formation. The kernel should be understood as the set of all efficient allocations for which no pair of players wants to exchange payoff. The nucleolus selects the unique imputation that successively (lexicographically) minimizes the maximal excesses. This defining property makes the nucleolus appealing as a fair single-valued solution. The kernel of a game always contains the nucleolus. Computing the kernel and the nucleolus of arbitrary transferable utility games is hard.
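As a concrete reminder of two of these concepts, the following sketch computes the Shapley value of a small transferable utility game by averaging marginal contributions over all orderings of the players, and then tests whether the resulting allocation lies in the core. The three-player characteristic function is a hypothetical toy example, not taken from the article, and the function names are illustrative.

```python
from itertools import combinations, permutations
from math import factorial


def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            totals[p] += value[joined] - value[coalition]
            coalition = joined
    return {p: totals[p] / factorial(len(players)) for p in players}


def in_core(allocation, players, value, tol=1e-9):
    """Efficient, and no coalition S receives less than its own worth v(S)."""
    grand = frozenset(players)
    if abs(sum(allocation.values()) - value[grand]) > tol:
        return False
    for size in range(1, len(players)):
        for subset in combinations(players, size):
            if sum(allocation[p] for p in subset) < value[frozenset(subset)] - tol:
                return False
    return True


# Hypothetical characteristic function v(S) for players a, b, c.
players = ("a", "b", "c")
value = {
    frozenset(): 0.0,
    frozenset("a"): 0.0, frozenset("b"): 0.0, frozenset("c"): 0.0,
    frozenset("ab"): 4.0, frozenset("ac"): 3.0, frozenset("bc"): 2.0,
    frozenset("abc"): 6.0,
}
phi = shapley_values(players, value)
print(phi, in_core(phi, players, value))
```

The enumeration over all orderings grows factorially with the number of players, so this brute-force form is only practical for very small games.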

The most fundamental solution concept for non-cooperative games is the Nash equilibrium. In a Nash equilibrium, no agent is motivated to deviate from its strategy given that the others do not deviate. If every player individually agrees on a certain profile of strategies without a binding agreement, then these strategies constitute a Nash equilibrium. The Nash equilibrium does not account for the possibility that groups of agents (coalitions) can change their strategies in a coordinated manner. A strategy profile is a strong Nash equilibrium if no subgroup of agents is motivated to change their strategies given that the others do not change. Often, the strong Nash equilibrium is too strong a solution concept, since in many games no such equilibrium exists. The coalition-proof Nash equilibrium has been suggested as a partial remedy to this problem. This solution concept requires that no subgroup can make a mutually beneficial deviation (keeping the strategies of nonmembers fixed) in a way that the deviation itself is stable according to the same criterion. These solution concepts, which allow coalitions to make agreements simultaneously, typically suffer from incompatibility of agreements, which can give rise to empty solution sets in games of networking interest. Mixed (as opposed to pure) strong and coalition-proof Nash equilibria have not been introduced.
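The difference between Nash and strong Nash stability can be made concrete on a small matrix game. The sketch below checks whether any coalition up to a given size has a joint deviation that strictly improves the payoff of all its members: size one gives the pure Nash test, and the full player set gives the strong Nash test. The strict-improvement reading of the definition is an assumption of this sketch, and the payoff table is a hypothetical prisoner's-dilemma example rather than a game from the article.

```python
from itertools import combinations, product


def is_stable(profile, payoffs, action_sets, max_coalition_size):
    """True if no coalition of up to the given size has a joint deviation
    that strictly improves the payoff of every one of its members."""
    n = len(action_sets)
    for size in range(1, max_coalition_size + 1):
        for coalition in combinations(range(n), size):
            for moves in product(*(action_sets[i] for i in coalition)):
                deviation = list(profile)
                for i, a in zip(coalition, moves):
                    deviation[i] = a
                deviation = tuple(deviation)
                if all(payoffs[deviation][i] > payoffs[profile][i] for i in coalition):
                    return False
    return True


# Hypothetical two-player game: C = cooperate, D = defect.
action_sets = [("C", "D"), ("C", "D")]
payoffs = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}
nash = [p for p in product(*action_sets) if is_stable(p, payoffs, action_sets, 1)]
strong = [p for p in nash if is_stable(p, payoffs, action_sets, len(action_sets))]
print(nash, strong)
```

In this toy game the only pure Nash equilibrium is mutual defection, and it is not strong because the two players can jointly move to mutual cooperation and both gain, which illustrates why strong equilibria often fail to exist.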

In a game with a huge number of agents and strategies, finding a pure cooperative or non-cooperative Nash equilibrium is hard and may even be impossible. A learning process leads participants to a common joint action with an acceptable payoff. During a learning process, agents act as independent learners, i.e., they only get information about their own action choice and payoff. As such, they neglect the presence of the other agents. The learning process happens at regular time steps and is basically a signal for the agents to start an exploration phase. During each exploration phase, some agents exclude their current best action so as to give the team the opportunity to look for a possibly better joint action. This technique of reducing the action space by exclusions was only recently introduced for finding periodical policies in games of conflicting interests. There are two problems in the process of learning an optimal cooperative pursuit strategy for multiple agents. One is the possibility of cycling among the actions chosen by the agents, which prevents the learning process from converging; the other is that there are many conflicts among the actions chosen by the agents, which make the learned pursuit strategy suboptimal. Q-learning with the Boltzmann action-selection strategy guarantees the convergence of multiple agents to a common and optimal joint strategy after a few time steps.

## 10 Conclusion

This paper has provided a unified reference for network engineers investigating the applicability of coalitional game theory to practical problems. Different approaches such as the core, the Shapley value, the kernel, and the nucleolus were shown to provide a strong foundation for finding possible and stable resource/cost sharing arrangements. The results confirm the apparent analogy between the definition of Nash equilibrium in non-cooperative and coalitional game theory: both strong and coalition-proof Nash equilibria reflect the unprofitability of coalition deviations rather than of an individual player's deviation. In a network wherein informational exchange is possible, either through a central controller or among the players themselves, the concept of coordinated equilibrium arises. The results of intuitive examples show a significant improvement in coordinated equilibrium when compared with non-cooperative schemes. When the number of agents or strategies is large, the ability to jointly reach a consensus through environmental learning guarantees convergence to the best joint action.

## Endnotes

^{a} *Two is cooperation, three is a crowd.*

^{b} The lexicographic order between two vectors **x** and **y** is defined by **x** ≼_{lex} **y** if there exists an index *k*, such that **x**[*l*] = **y**[*l*] for all *l* < *k*, and **x**[*k*] < **y**[*k*].

^{c} This is the carrier spacing of each subcarrier at a base station with 10 MHz bandwidth and 1024 subcarriers.

## Declarations

### Acknowledgements

This work was supported by the European Commission in the framework of the FP7 Network of Excellence in Wireless COMmunications NEWCOM (grant agreement no.: 318306).

## References

1. McHenry MA, McCloskeyk D: Spectrum occupancy measurements. Shared Spectrum Company, 2005. http://www.sharedspectrum.com/wp-content/uploads/NSF_Chicago_2005-11_measurements_v12.pdf. Accessed 13 July 2013
2. Hyun-Ho C, Jong Bu L, Hyosun H, Kyunghun J: Optimal handover decision algorithm for throughput enhancement in cooperative cellular networks. Paper presented at the 2010 IEEE 72nd vehicular technology conference fall (VTC 2010-fall), Ottawa, Ontario, Canada, 6–9 September 2010
3. Osborne MJ, Rubinstein A: *A Course in Game Theory*. MIT Press, Cambridge; 1994.
4. Peleg B, Sudhölter P: *Introduction to the Theory of Cooperative Games*. Springer, Berlin; 2007.
5. Saad W, Han Z, Debbah M, Hjørungnes A, Basar T: Coalitional game theory for communication networks: A tutorial. *IEEE Signal Process. Mag* 2009, 26(5):77-97.
6. Hossain E, Kim DI, Bhargava VK: *Cooperative Cellular Wireless Networks*. Cambridge University Press, New York; 2011.
7. Hew S-L, White L: Cooperative resource allocation games in shared networks: Symmetric and asymmetric fair bargaining models. *IEEE Trans. Wireless Commun* 2008, 7(11):4166-4175.
8. Chee TK, Lim CC, Choi J: A cooperative game theoretic framework for resource allocation in OFDMA systems. Paper presented at IEEE international conference on communication systems (ICCS), Singapore, 30 October–1 November 2006
9. Shams F, Bacci G, Luise M: An OFDMA resource allocation algorithm based on coalitional games. *EURASIP J. Wireless Commun. Netw* 2011, 2011(1):46. doi:10.1186/1687-1499-2011-46
10. Kwon H, Lee GB: Cooperative power allocation for broadcast/multicast services in cellular OFDM systems. *IEEE Trans. Commun* 2009, 57(10):3092-3102.
11. Zeydan E, Kivanc D, Tureli U, Comaniciu C: Joint iterative beamforming power adaptation for MIMO ad hoc networks. *EURASIP J. Wireless Commun. Netw* 2011, 2011(1):79. doi:10.1186/1687-1499-2011-79
12. Li D, Xu Y, Wang X, Guizani M: Coalitional game theoretic approach for secondary spectrum access in cooperative cognitive radio networks. *IEEE Trans. Wireless Commun* 2011, 10(3):844-856.
13. Khan Z, Glisic S, DaSilva L, Lehtomäki J: Modeling the dynamics of coalition formation games for cooperative spectrum sharing in an interference channel. *IEEE Trans. Comput. Intell. AI Games* 2011, 3(1):17-30.
14. Stanojev I, Simeone O, Spagnolini U, Bar-Ness Y, Pickholtz R: Cooperative ARQ via auction-based spectrum leasing. *IEEE Trans. Commun* 2010, 58(6):1843-1856.
15. Javadi F, Kibria M, Jamalipour A: Bilateral Shapley value based cooperative gateway selection in congested wireless mesh networks. Paper presented at IEEE global telecommunications conference (GLOBECOM), New Orleans, LA, USA, 30 November–4 December 2008
16. Huang J, Han Z, Chiang M, Poor H: Auction-based resource allocation for multi-relay asynchronous cooperative networks. Paper presented at IEEE international conference on acoustics, speech, and signal processing (ICASSP), Las Vegas, NV, USA, 31 March–4 April 2008
17. Deng H, Wang Y, Lu J: Auction based resource allocation for balancing efficiency and fairness in OFDMA relay networks with service differentiation. Paper presented at IEEE 72nd vehicular technology conference (VTC), Ottawa, Canada, 6–9 September 2010
18. Rodoplu V, Meng T: Core capacity region of energy-limited, delay-tolerant wireless networks. *IEEE Trans. Wireless Commun* 2007, 6(5):1844-1853.
19. Shapley LS: Cores of convex games. *Int. J. Game Theory* 1971, 1: 11-26. doi:10.1007/BF01753431
20. ParandehGheibi A, Eryilmaz A, Ozdaglar A, Medard M: Resource allocation in multiple access channels. Paper presented at conference on signals, systems and computers (ACSSC), Pacific Grove, CA, USA, 4–7 November 2007
21. Gillies DB: *Some Theorems on N-Person Games, PhD Thesis*. Department of Mathematics, Princeton University; 1953.
22. Madiman MM: Cores of cooperative games in information theory. *EURASIP J Wireless Commun. Netw* 2008, 2008: 318704. doi:10.1155/2008/318704
23. Li D, Xu Y, Liu J, Wang X: Relay assignment cooperation maintenance in wireless networks. Paper presented at IEEE wireless communication networking conference (WCNC), Sydney, Australia, 18–21 April 2010
24. Niyato D, Hossain E: A cooperative game framework for bandwidth allocation in 4G heterogeneous wireless networks. *ICT-MICC* 2006, 9: 4357-4362.
25. Sandholm TW, Lesser VR: Coalitions among computationally bounded agents. *Artif. Intell* 1997, 94: 99-137. doi:10.1016/S0004-3702(97)00030-1
26. Bondareva O: Some applications of the methods of linear programming to the theory of cooperative games. *Problemy Kibernetiki* 1963, 10: 119-139. (Russian)
27. Shapley LS: On balanced sets and cores. *Naval Res. Logistics Q* 1967, 14(4):453-560. doi:10.1002/nav.3800140404
28. Roth AE: *The Shapley Value. Essays in Honor of Lloyd S. Shapley*. Cambridge University Press, UK; 1988.
29. Harsanyi JC: An equilibrium-point interpretation of stable sets and a proposed alternative definition. *Manage. Sci* 1974, 20(11):1472-1495. doi:10.1287/mnsc.20.11.1472
30. Rafels C, Tijs S: On the cores of cooperative games and the stability of the Weber set. *Int. J. Game Theory* 1997, 26: 491-499. doi:10.1007/BF01813887
31. Shapley LS: A value for n-person games. Contribution to the theory of games. *Annals Math. Studies* 1953, 2: 28.
32. Hart S: A comparison of non-transferable utility values. Center for Rationality and Interactive Decision Theory, Hebrew University, Jerusalem. Discussion Paper Series, 2003
33. Otten G-J, Peters HJ: *The Shapley Transfer Procedure for NTU-Games*. Maastricht University, Netherlands; 2002.
34. Young HP: Monotonic solutions of cooperative games. *Int. J. Game Theory* 1985, 14: 65-72. doi:10.1007/BF01769885
35. Kim S: Cooperative game theoretic online routing scheme for wireless network managements. *Commun. IET* 2010, 4(17):2074-2083. doi:10.1049/iet-com.2009.0686
36. Khouzani M, Sarkar S: Economy of spectrum access in time varying multichannel networks. *IEEE Trans. Mobile Comput* 2010, 9(10):1361-1376.
37. Park H, van der Schaar M: Coalition-based resource negotiation for multimedia applications in informationally decentralized networks. *IEEE Trans. Multimedia* 2009, 11(4):765-779.
38. Pechersky S: On proportional excess for NTU games. European University at St. Petersburg, Department of Economics, Tech. Rep. Ec-02/01 (2001)
39. Driessen TSH: A note on the inclusion of the kernel in the core of the bilateral assignment game. *Int. J. Game Theory* 1998, 27(2):301-303. doi:10.1007/s001820050073
40. Han Z, Poor HV: Coalition games with cooperative transmission: A cure for the curse of boundary nodes in selfish packet-forwarding wireless networks. *IEEE Trans. Commun* 2009, 57(1):203-213.
41. Maschler M: The bargaining set, kernel, and nucleolus. In *Handbook of Game Theory with Economic Applications*. Edited by: Aumann R, Hart S. Elsevier, New York; 1992:591-667.
42. Demange G, Wooders M: *Group Formation in Economics: Networks, Clubs, and Coalitions*. Cambridge University Press, UK; 2005.
43. Aumann RJ: Acceptable points in general cooperative n-person games. In *Annals of Mathematics Studies, 40, in Contributions to the Theory of Games*. Princeton University Press, NJ; 1959:287-324.
44. Bernheim BD, Peleg B, Whinston MD: Coalition-proof Nash equilibria I. Concepts. *J Econ. Theory* 1987, 42(1):1-12. doi:10.1016/0022-0531(87)90099-8
45. Félegyházi M, Hubaux JP: Game theory in wireless networks: A tutorial. ACM Comput. Surveys 2006, Technical Report: LCA-REPORT-2006-002, EPFL
46. Jackson M, Nouweland VD: Strongly stable networks. *Games Econ. Behavior* 2005, 51(2):420-444. doi:10.1016/j.geb.2004.08.004
47. Zhong S, Wu F: A collusion-resistant routing scheme for noncooperative wireless ad hoc networks. *IEEE/ACM Trans. Netw* 2010, 18(2):582-595.
48. Altman E, Basar T, Menache I, Tembine H: A dynamic random access game with energy constraints. Paper presented at the international symposium on modeling and optimization in mobile, ad hoc, and wireless networks (WiOpt), Seoul, South Korea, 23–27 June 2009
49. Moldovanu B: Coalition-proof Nash equilibria and the core in three-player games. *Games Econ. Behavior* 1992, 4(4):565-581. doi:10.1016/0899-8256(92)90037-S
50. Konishi H, Le Breton M, Weber S: Equivalence of strong and coalition-proof Nash equilibria in games without spillovers. *Econ. Theory* 1997, 9: 97-113. doi:10.1007/BF01213445
51. Félegyházi M, Cagalj M, Bidokhti S, Hubaux J-P: Non-cooperative multi-radio channel allocation in wireless networks. Paper presented at IEEE computer and communication societies conference (INFOCOM), Anchorage, AK, 6–12 May 2007
52. Gao L, Wang X, Xu Y: Multiradio channel allocation in multihop wireless networks. *IEEE Trans. Mobile Comput* 2009, 8(11):1454-1468.
53. IEEE 802.16 Broadband Wireless Access Working Group: IEEE 802.16m Evaluation Methodology Document (EMD). Tech. Rep. IEEE 802.16m-08/004r5 (Jan. 2009)
54. Aumann RJ: Subjectivity and correlation in randomized strategies. *J. Math. Econ* 1974, 1(1):67-96. doi:10.1016/0304-4068(74)90037-8
55. Heller Y: Correlated equilibrium behavior and seemingly-iterational. Ph.D. Dissertation, Faculty of Exact Sciences, Tel-Aviv University, Israel, 2011
56. Bacci G, Luise M, Poor HV: Game theory and power control in ultrawideband networks. *Phys. Commun* 2008, 1(1):21-39. doi:10.1016/j.phycom.2008.01.004
57. Bonneau N, Altman E, Debbah M: Correlated equilibrium in access control for wireless communications. Paper presented at international conference on networking (IFIP), Coimbra, Portugal, 15–19 May 2006
58. Bonneau N, Debbah M, Altman E, Hjorungnes A: Non-atomic games for multi-user systems. *IEEE J. Sel. Areas Commun* 2008, 26(7):1047-1058.
59. Charafeddine M, Han Z, Paulraj A, Cioffi J: Crystallized rates region of the interference channel via correlated equilibrium with interference as noise. Paper presented at IEEE international conference on communication (ICC), Dresden, Germany, 14–18 June 2009
60. Dilts R, Epstein TA: *Dynamic Learning*. MeTa Publications, Capitola; 1995.
61. van der Schaar M, Fu F: Spectrum access games and strategic learning in cognitive radio networks for delay-critical applications. *Proc. IEEE* 2009, 97(4):720-740.
62. Shiang H-P, van der Schaar M: Online learning in autonomic multi-hop wireless networks for transmitting mission-critical applications. *IEEE J. Sel. Areas Commun* 2010, 28(5):728-741.
63. Lin Z, van der Schaar M: Autonomic and distributed joint routing and power control for delay-sensitive applications in multi-hop wireless networks. *IEEE Trans. Wireless Commun* 2011, 10(1):102-113.
64. Watkins CJCH, Dayan P: Q-learning. *Mach. Learn* 1992, 8(3):279-292.
65. Claus C, Boutilier C: The dynamics of reinforcement learning in cooperative multiagent systems. Paper presented at the fifteenth national conference on artificial intelligence (AAAI-98), Madison, WI, 26–30 July 1998
66. Kaelbling LP, Littman ML, Moore AW: Reinforcement learning: A survey. *J. Artif. Intell. Res* 1996, 4: 237-285.
67. Kalai E, Lehrer E: Rational learning leads to Nash equilibrium. *Econometrica* 1993, 61(5):1019-1045. doi:10.2307/2951492
68. Fudenberg D, Levine DK: Learning and equilibrium. *Annu. Rev. Econ* 2009, 1(1):385-420. doi:10.1146/annurev.economics.050708.142930
69. Milgrom P, Roberts J: Rationalizability, learning, and equilibrium in games with strategic complementarities. *Econometrica* 1990, 58(6):1255-1277. doi:10.2307/2938316
70. Daskalakis C, Frongillo R, Papadimitriou CH, Pierrakos G, Valiant G: On learning algorithms for Nash equilibria. Paper presented at the third international symposium on algorithmic game theory (SAGT), Athens, Greece, 18–20 October 2010

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.