Basics of coalitional games with applications to communications and networking
EURASIP Journal on Wireless Communications and Networking volume 2013, Article number: 201 (2013)
Abstract
Game theory is the study of decision making in an interactive environment. Coalitional games fulfill the promise of group-efficient solutions to problems involving strategic actions. Formulation of optimal player behavior is a fundamental element in this theory. This paper is a self-instructive tutorial on the basics of coalitional games, indicating how coalitional game theory tools can provide a framework to tackle different problems in communications and networking. We show that coalitional game approaches achieve improved performance compared to noncooperative game-theoretic solutions.
1 Introduction
The increase in the number of wireless services, combined with the demand for high-definition multimedia communications, has made radio resources, particularly spectrum and power, precious and scarce, not because they are unavailable but because they are used inefficiently. For licensed spectrum, the measurements by Shared Spectrum Company [1] show that the maximal usage of the spectrum is a low percentage of the whole licensed band. While the number of users and the spectrum usage steadily increase, the amount of spectrum remains a limited resource. Besides, differentiating between the true signal and background noise is complex for radio equipment. Generally, this complexity forces terminals to transmit stronger signals, which wastes transmitter energy.
The modern wireless entities, i.e., wireless terminals and base stations, have considerable capacity to execute dynamic processes. This capability encourages wireless service providers to treat wireless entities as autonomous agents that can cooperate and negotiate with each other to achieve an efficient resource allocation in different situations. Cooperation among wireless terminals is usually intended to achieve a fair radio resource allocation. Cooperation between base stations can be devised to mitigate interference and to promote soft handover where the channel gain varies rapidly, which is a challenge in LTE [2].
Game theory is the most prominent tool for analyzing interaction issues in the social sciences, where cooperation among autonomous agents is often essential for successful task completion. In many settings, groups of competing agents are simultaneously concerned with both individual and overall benefits. In the game theory literature, this branch is known as cooperative game theory [3, 4]. The players, as the main decision-making entities in the game, are assumed to negotiate with each other to reach a binding agreement. If we assume that all users act rationally and we know what the behavior of the users is, it is possible to determine the overall performance of a system, since the actions of one user become part of the circumstances for another user. Thus, we are interested in individual performance and overall system performance under a specific set of rules. To fully develop the different possibilities for cooperation among players within a game, we have to address what groups of players can achieve collectively. Indeed, if a player assesses that within a certain group it does not receive what it is able to get by itself, then it might decide to abandon the cooperation and pursue an alternative allocation by itself. Cooperative game theory offers the opportunity to extend the treatment of the players in traditional noncooperative games, especially where selfish players compete over a set of resources. Cooperative game theory is divided into two parts: coalitional game theory and bargaining games [3, 4]. In this contribution, we focus on coalitional game theory. We show that, in comparison to noncooperative game theory, coalitional game approaches can achieve better results in terms of performance and stability.
Saad et al. in a tutorial paper [5] classify coalitional games into three categories: canonical (coalitional) games, coalition formation games, and coalitional graph games. In canonical games, no group of players can do worse by joining a coalition than by acting noncooperatively. In coalition formation games, forming a coalition brings advantage to its members, but the gains are limited by a cost for forming the coalition. In coalitional graph games, the coalitional game is in graph form, and the interconnection between the players strongly affects the characteristics as well as the outcome of the game.
In the last few years, cooperative game theory has been successfully applied to communications and networking. Hossain et al. [6] provide a guide to the state of the art which unifies the essential information, addressing both theoretical and practical aspects of cooperative communications and networking in the context of cellular design. The current literature is mainly focused on applying cooperative games in various applications such as distributed/centralized radio resource allocation [7–9], power control [10, 11], spectrum sharing in cognitive radio [12, 13], cooperative automatic repeat request (ARQ) mechanisms [14], cooperative routing [15], and cooperative communications [16, 17]. These problems in wireless networks can be modeled as cooperative games since it is highly likely that each wireless user can obtain a better utility value by forming groups and controlling resources cooperatively rather than individually. It has been shown that cooperation can result in an enhanced QoS in terms of throughput expansion, bit error rate reduction, or energy saving [6].
Cooperation can be realized at various layers of the network. At the physical layer, separate antennas can constitute a cluster and then cooperate with each other to exploit multiple-input multiple-output gains. At the MAC sublayer, some wireless terminals can cooperate with each other to share a common wireless medium in an efficient manner and consequently mitigate the interference hazard. There is also the possibility of cooperation between the physical and application layers of individual terminals to adapt channel and source coding in multimedia communications. The altruistic decision to cooperate with other network entities may improve the overall network performance and, at the same time, serve the egoistic interest of self-improvement.
The rest of this paper is divided into nine sections. The following section provides an introductory discussion of coalitional game theory. We systematically study the fundamental definitions and conditions of coalitional games: superadditivity and convexity. Then, Section 3 and its subsection discuss the core set solution as the best-known solution for payoff distribution. Section 4 is devoted to a study of a strong payoff distribution, the so-called Shapley value. In Section 5, we present a systematic study of two other reward divisions called the kernel and the nucleolus. Then, in Section 6, we extend the concept of Nash equilibria to coalitional games. Section 7 investigates the concept of coordinated equilibria, where the players of the game are allowed to communicate among themselves once beforehand. Section 8 helps the reader to understand the basic concepts and importance of dynamic learning in coalitional games. Every section contains some motivating examples that help to understand how different communication network problems can be modeled as coalitional games. We discuss the features of the mentioned approaches in Section 9, and finally we conclude this paper in Section 10.
2 Preliminaries
Game theory deals with the study, through mathematical models, of conflict situations in which two or more rational players make decisions that will influence each other’s welfare. The theory of coalitional games [3, 4] also assumes that binding agreements may be established among the players in the course of the conflict situation. In transferable utility (TU) games, the agreement may be reached by any subset of the players, and the gain obtained from this agreement is a real number and is transferable among these players. In nontransferable utility (NTU) games, the agreement may be reached by any subset of the players, but the gain may be nontransferable. The main focus of this paper will be on the study of TU games.
A TU game is a pair $\mathcal{G}=(\mathcal{K},\nu )$, where $\mathcal{K}=[1,\dots ,K]$ denotes the set of players and ν is the coalition (characteristic) function, which assigns to each coalition (subset of $\mathcal{K}$) the maximum outcome (a real number) that its players can jointly produce. An NTU game is a pair $\mathcal{G}=(\mathcal{K},V)$, where V is a mapping that defines, for each coalition $\mathcal{A}$, a characteristic set $V(\mathcal{A})$ satisfying

(1)
$V(\mathcal{A})$ is a nonempty and closed subset of ${\mathbb{R}}^{|\mathcal{A}|}$,

(2)
For each $k\in \mathcal{A}$, there is a ${V}_{k}\in \mathbb{R}$ such that $V(\{k\})=(-\infty ,{V}_{k}]$,

(3)
$V(\mathcal{A})$ is comprehensive, i.e., for all $\mathbf{x}\in V(\mathcal{A})$ and for all $\mathbf{y}\in {\mathbb{R}}^{|\mathcal{A}|}$, if $\mathbf{y}[k]\le \mathbf{x}[k]\ \forall \,k\in \mathcal{A}$, then $\mathbf{y}\in V(\mathcal{A})$,

(4)
The set $V(\mathcal{A})\cap \left\{\mathbf{y}\in {\mathbb{R}}^{|\mathcal{A}|}\mid \mathbf{y}[k]\ge {V}_{k}\ \forall \,k\in \mathcal{A}\right\}$ is bounded.
The characteristic set, $V(\mathcal{A})$, is interpreted as the set of achievable outcomes the players in $\mathcal{A}$ can guarantee themselves without cooperating with the players in $\mathcal{K}\setminus \mathcal{A}$. In particular, an NTU game $\mathcal{G}=(\mathcal{K},V)$ is called a TU game when the characteristic set for each coalition $\mathcal{A}$ takes the form
$$V(\mathcal{A})=\left\{\mathbf{x}\in {\mathbb{R}}^{|\mathcal{A}|}\;\middle|\;\sum _{k\in \mathcal{A}}{x}_{k}\le \nu (\mathcal{A})\right\}, \tag{1}$$
where $\mathbf{x}=\left[{x}_{1},\dots ,{x}_{|\mathcal{A}|}\right]\in {\mathbb{R}}^{|\mathcal{A}|}$, x_{ k } is the payoff of player k in $\mathcal{A}$, and $\nu :{2}^{\mathcal{K}}\to \mathbb{R}$. If $\mathcal{A}$ is a coalition (subset) of $\mathcal{K}$ formed in $\mathcal{G}$, then its members get an overall payoff $\nu (\mathcal{A})$, which is zero for the empty set. Each coalition can be represented as a pure strategy in noncooperative game theory. Only a few works apply NTU games to problems in communications [18]. This is because defining a utility function which meets all conditions of a characteristic set in an NTU game is not always feasible.
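An important property of interest in characteristic form TU games is superadditivity, which, if present, implies that the value of the union of any two disjoint coalitions is at least as large as the sum of their values.
Definition 1. A TU game $\mathcal{G}$ is superadditive if
$$\nu ({\mathcal{A}}_{1}\cup {\mathcal{A}}_{2})\ge \nu ({\mathcal{A}}_{1})+\nu ({\mathcal{A}}_{2})\qquad \forall \,{\mathcal{A}}_{1},{\mathcal{A}}_{2}\subseteq \mathcal{K}\ \text{with}\ {\mathcal{A}}_{1}\cap {\mathcal{A}}_{2}=\varnothing . \tag{2}$$
In a superadditive TU game, there are positive synergies and the players prefer to join each other rather than act alone. Under the superadditivity condition, the players are willing to form the grand coalition (the set $\mathcal{K}$).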
Convex, or alternatively supermodular, coalitional games were introduced by Shapley [19]. They model coalitional situations in which the marginal contribution of a player to a coalition increases as the coalition becomes larger.
Definition 2. A TU game $\mathcal{G}$ is convex or supermodular if for all $k\in \mathcal{K}$
$$\nu ({\mathcal{A}}_{1}\cup \{k\})-\nu ({\mathcal{A}}_{1})\le \nu ({\mathcal{A}}_{2}\cup \{k\})-\nu ({\mathcal{A}}_{2})\qquad \forall \,{\mathcal{A}}_{1}\subseteq {\mathcal{A}}_{2}\subseteq \mathcal{K}\setminus \{k\}. \tag{3}$$
Equivalently,
Definition 3. A TU game $\mathcal{G}$ is convex or supermodular if
$$\nu ({\mathcal{A}}_{1})+\nu ({\mathcal{A}}_{2})\le \nu ({\mathcal{A}}_{1}\cup {\mathcal{A}}_{2})+\nu ({\mathcal{A}}_{1}\cap {\mathcal{A}}_{2})\qquad \forall \,{\mathcal{A}}_{1},{\mathcal{A}}_{2}\subseteq \mathcal{K}. \tag{4}$$
Convexity means that there are increasing returns to scale. Note that a convex game is superadditive. To better understand the importance of the convexity approach in network problems, we verify the convexity condition in a K-user channel access game. The payoff of each coalition of players (transmitters) is defined as the outer MAC capacity region. ParandehGheibi et al. ([20], Lemma 1) show that in a multiple access channel scenario, the inequality (4) is not met. This means that the game is not convex, and thus adding a new player does not benefit the other transmitters.
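As a rough illustration of Definitions 1 and 3, the following Python sketch checks by brute force whether a characteristic function is superadditive (the value of a union of disjoint coalitions is at least the sum of their values) and whether it is supermodular (the sum of the values of two coalitions does not exceed the sum of the values of their union and intersection). The helper names and both toy characteristic functions are assumptions made purely for illustration; in particular, the stylized payoff $\log_2(1+|\mathcal{A}|\cdot \text{snr})$ is not the capacity-region payoff analyzed in [20], it is only a concave function of the coalition size that behaves similarly.

```python
from itertools import combinations
from math import log2

def coalitions(players):
    """Yield every subset of `players` (including the empty set) as a frozenset."""
    players = list(players)
    for size in range(len(players) + 1):
        for subset in combinations(players, size):
            yield frozenset(subset)

def is_superadditive(players, nu, tol=1e-12):
    """Check nu(A1 | A2) >= nu(A1) + nu(A2) for every pair of disjoint coalitions."""
    subsets = list(coalitions(players))
    return all(
        nu(a1 | a2) + tol >= nu(a1) + nu(a2)
        for a1 in subsets for a2 in subsets if not (a1 & a2)
    )

def is_convex(players, nu, tol=1e-12):
    """Check nu(A1) + nu(A2) <= nu(A1 | A2) + nu(A1 & A2) for every pair of coalitions."""
    subsets = list(coalitions(players))
    return all(
        nu(a1) + nu(a2) <= nu(a1 | a2) + nu(a1 & a2) + tol
        for a1 in subsets for a2 in subsets
    )

players = range(1, 5)  # K = 4 hypothetical players

def quadratic(coalition):
    # Toy payoff with increasing returns to scale: nu(A) = |A|^2.
    return len(coalition) ** 2

def stylized_sum_rate(coalition, snr=1.0):
    # Toy concave payoff (an assumption, not the model of [20]): log2(1 + |A| * snr).
    return log2(1.0 + len(coalition) * snr)

print("quadratic:", is_superadditive(players, quadratic), is_convex(players, quadratic))
print("sum rate :", is_superadditive(players, stylized_sum_rate),
      is_convex(players, stylized_sum_rate))
```

The exhaustive check visits all pairs of coalitions, so it is only practical for a handful of players; it is meant as a sanity check on small examples rather than as a general-purpose test.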
3 The core solution
A central question in a coalitional game is how to divide the extra earnings (or cost savings) among the members of the formed coalition. In a TU game, an allocation is a function x from $\mathcal{K}$ to $\mathbb{R}$ that specifies for each player $k\in \mathcal{K}$ the payoff ${x}_{k}\in \mathbb{R}$ that this player can expect when it cooperates with the other players. The payoff of each player can show the cost borne by the player, the power of influence, and so on, depending on the problem setting.
Definition 4. Let $\mathcal{K}$ be the set of K players of the superadditive TU game $\mathcal{G}$, and let ν be the payoff of the game. The set of all ‘imputations’ of $\mathcal{G}$ is the set
$$\mathcal{I}(\mathcal{K},\nu )=\left\{\mathbf{x}\in {\mathbb{R}}^{K}\;\middle|\;\sum _{k\in \mathcal{K}}{x}_{k}=\nu (\mathcal{K})\ \text{and}\ {x}_{k}\ge \nu (\{k\})\ \forall \,k\in \mathcal{K}\right\}, \tag{5}$$
where $\mathbf{x}=\left[{x}_{1},\dots ,{x}_{k},\dots ,{x}_{K}\right]\in {\mathbb{R}}^{K}$ is the imputation vector of the players. The first condition is called feasibility, and the second individual rationality.
The core concept was introduced in [21] and is the most attractive and natural way to define a payoff distribution: if a payoff distribution is in the core, no agent has any incentive to be in a different coalition. The core of a TU game is the subset of all imputations $\mathbf{x}\in \mathcal{I}(\mathcal{K},\nu )$ that no other imputation directly dominates, that is, $\nexists \,\mathbf{y}\in \mathcal{I}(\mathcal{K},\nu )\ \text{s.t.}\ {y}_{k}>{x}_{k}\ \forall \,k\in \mathcal{K}$. As can be seen, the notion of dominance is essentially the same for coalitional games and noncooperative games; the payoffs under the various situations are compared, and one situation dominates the others if its payoffs are higher. The core actually imposes a condition stronger than the Nash equilibrium in noncooperative games: no group of agents should be able to profitably deviate from a configuration in the core. Equivalently, no set of players can benefit from forming a new coalition, which corresponds to the group rationality assumption.
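In an NTU game $\mathcal{G}=(\mathcal{K},V)$, the core apportionment is defined as follows ([4], Ch. 12):
Definition 5. Let $\mathcal{K}$ be the set of K players of the superadditive NTU game $\mathcal{G}$, and let V be the payoff of the game. The core of $\mathcal{G}$ is the set
$$\left\{\mathbf{x}\in V(\mathcal{K})\;\middle|\;\nexists \,\mathcal{A}\subseteq \mathcal{K},\ \mathcal{A}\ne \varnothing ,\ \mathbf{y}\in V(\mathcal{A})\ \text{s.t.}\ \mathbf{y}[k]>\mathbf{x}[k]\ \forall \,k\in \mathcal{A}\right\}, \tag{6}$$
where x is the payoff distribution across players, and x_{ k } ∈ x if and only if no coalition can improve upon x_{ k }.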
In a TU game $\mathcal{G}=(\mathcal{K},\nu )$, the core apportionment is defined as follows:
Definition 6. Let $\mathcal{K}$ be the set of K players of the superadditive TU game $\mathcal{G}$, and let ν be the payoff of the game. The core of $\mathcal{G}$ is the set
$$\left\{\mathbf{x}\in {\mathbb{R}}^{K}\;\middle|\;\sum _{k\in \mathcal{K}}{x}_{k}=\nu (\mathcal{K})\ \text{and}\ \sum _{k\in \mathcal{A}}{x}_{k}\ge \nu (\mathcal{A})\ \forall \,\mathcal{A}\subseteq \mathcal{K}\right\}, \tag{7}$$
where $\mathbf{x}=\left[{x}_{1},\dots ,{x}_{k},\dots ,{x}_{K}\right]\in {\mathbb{R}}^{K}$ is the payoff distribution across players, and x_{ k } ∈ x if and only if no coalition can improve upon x_{ k }. The second condition is called the nonblocking condition.
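The two conditions of Definition 6 can be checked mechanically for a given payoff vector. The sketch below is a minimal illustration, assuming that feasibility means the payoffs sum to the value of the grand coalition and that nonblocking means every coalition receives at least its own value; the function name `in_core` and the toy game $\nu (\mathcal{A})=|\mathcal{A}|^{2}$ are hypothetical and are not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations

def in_core(x, nu):
    """Check feasibility and the nonblocking condition for a payoff vector.

    x  : dict mapping each player to its payoff
    nu : function mapping a frozenset of players to the coalition value
    Returns (True, None) if x passes both checks; otherwise (False, coalition),
    where `coalition` is the first violated coalition found (the grand coalition
    when feasibility fails).
    """
    players = sorted(x)
    grand = frozenset(players)
    if sum(x.values()) != nu(grand):                 # feasibility
        return False, grand
    for size in range(1, len(players)):              # proper nonempty coalitions
        for subset in combinations(players, size):
            coalition = frozenset(subset)
            if sum(x[k] for k in coalition) < nu(coalition):   # coalition blocks x
                return False, coalition
    return True, None

def nu(coalition):
    # Hypothetical three-player convex game used only for this illustration.
    return Fraction(len(coalition) ** 2)

equal_split = {1: Fraction(3), 2: Fraction(3), 3: Fraction(3)}
lopsided = {1: Fraction(9), 2: Fraction(0), 3: Fraction(0)}

print(in_core(equal_split, nu))   # passes both conditions
print(in_core(lopsided, nu))      # player 2 alone can block
```

Exact rational arithmetic (`Fraction`) is used so that the equality test in the feasibility check is not affected by floating-point rounding.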
The core consists of the set of allocations that cannot be blocked by any coalition of agents. If for some set of agents $\mathcal{A}$ the nonblocking condition does not hold, then the agents in $\mathcal{A}$ have an incentive to collectively deviate from the coalition structure and divide $\nu (\mathcal{A})$ among themselves. In general, the core of a given TU game $(\mathcal{K},\nu )$ is found by linear programming as
$$\underset{\mathbf{x}}{\text{min}}\ \sum _{k\in \mathcal{K}}{x}_{k}\qquad \text{s.t.}\qquad \sum _{k\in \mathcal{A}}{x}_{k}\ge \nu (\mathcal{A})\quad \forall \,\mathcal{A}\subseteq \mathcal{K}. \tag{8}$$
Madiman [22] introduces some intuitive applications of the core solution to information theory contexts, e.g., source coding and multiple-access channels, and summarizes some of its limitations in multiuser scenarios. Li et al. [23] show that cooperation among wireless nodes and core apportionment can increase spectrum efficiency in TDMA cooperative communication. In [24], Niyato and Hossain apply the core solution in a coalition among different wireless access networks to offer a stable and efficient bandwidth allocation.
Indeed, there are a number of realistic application scenarios in which the emergence of the grand coalition is not guaranteed, might be perceivably harmful, or is plainly impossible [25]. For a nonsuperadditive coalitional game, the coalition formation process does not lead the players to form the grand coalition. In this case, Definition 6 does not apply. Let us redefine the core set in a general (not necessarily superadditive) coalition formation TU game [9]. Let $\psi =\left[{\mathcal{A}}_{1},{\mathcal{A}}_{2},\dots ,{\mathcal{A}}_{m}\right]$ denote a partition of the set $\mathcal{K}$, wherein ${\mathcal{A}}_{i}\cap {\mathcal{A}}_{j}=\varnothing $ for i ≠ j, $\bigcup _{i=1}^{m}{\mathcal{A}}_{i}=\mathcal{K}$ and ${\mathcal{A}}_{i}\ne \varnothing $ for i = 1,…,m, and let Ψ denote the set of all possible partitions ψ. Let us also define $\mathcal{F}=\left[{\mathcal{A}}_{1},{\mathcal{A}}_{2},\dots ,{\mathcal{A}}_{n}\right]$, such that $\bigcup _{i=1}^{n}{\mathcal{A}}_{i}=\mathcal{K}$ and ${\mathcal{A}}_{i}\ne \varnothing $ for i = 1,…,n, as a family of (not necessarily disjoint) coalitions.
Definition 7. A ‘core apportionment’ $\mathbf{x}\in {\mathbb{R}}^{K}$ is a payoff distribution with the following property:
Note that, if $\mathcal{G}$ is superadditive, then $\underset{\psi \in \mathit{\Psi}}{\text{max}}\sum _{\mathcal{A}\in \psi}\nu \left(\mathcal{A}\right)=\nu \left(\mathcal{K}\right)$.
The core allocation set can be found through linear programming; its existence, in general, depends upon the feasibility of (8). Unfortunately, the core is a strong notion, and there exist many games where it is empty. We can study the nonemptiness of the core without explicitly solving the core equation. The following notion helps simplify the dual of (8):
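Definition 8. A superadditive TU game $\mathcal{G}$ for a family $\mathcal{F}$ of coalitions is totally balanced if for any $\mathcal{A}\in \mathcal{F}$, the inequality
$$\sum _{\mathcal{A}\in \mathcal{F}}{\mu}_{\mathcal{A}}\,\nu (\mathcal{A})\le \nu (\mathcal{K}) \tag{10}$$
holds, where ${\mu}_{\mathcal{A}}$ is a collection of numbers in [0,1] (balanced collection of weights) such that
$$\sum _{\mathcal{A}\in \mathcal{F}}{\mu}_{\mathcal{A}}\,{1}_{\mathcal{A}}={1}_{\mathcal{K}}, \tag{11}$$
with ${1}_{\mathcal{A}}\in {\mathbb{R}}^{K}$ denoting the characteristic vector whose elements are
$${1}_{\mathcal{A}}[k]=\begin{cases}1, & k\in \mathcal{A},\\ 0, & \text{otherwise.}\end{cases} \tag{12}$$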
The following path-breaking result in the theory of TU games was independently given by Bondareva [26] and Shapley [27].
Lemma 1. [3]. A totally balanced TU game has a nonempty core set.
Where forming the grand coalition is not guaranteed, the following notation is applied:
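Definition 9. A (not necessarily superadditive) TU game $\mathcal{G}$ for a family $\mathcal{F}$ of coalitions is totally balanced if for every balanced collection of weights ${\mu}_{\mathcal{A}}$, and for any $\mathcal{A}\in \mathcal{F}$,
$$\sum _{\mathcal{A}\in \mathcal{F}}{\mu}_{\mathcal{A}}\,\nu (\mathcal{A})\le \underset{\psi \in \mathit{\Psi}}{\text{max}}\sum _{\mathcal{A}\in \psi}\nu (\mathcal{A}). \tag{13}$$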
So, if a TU game is totally balanced, then the core is nonempty; therefore, it is a convenient solution concept on the class of totally balanced TU games. There is an interesting relation between convex and balanced games.
Lemma 2. [4]. A convex game is totally balanced, but the converse is not necessarily true.
The other key feature of coalitional convex games is
Lemma 3. [19] The core of a convex game is nonempty, and it is the unique von Neumann-Morgenstern stable set of the game.
Now, we illustrate an intuitive example of power distribution based on the core set solution. This example is an extended form of the example established by ([28], Ch. 12). The network sketched in Figure 1 wishes to allocate power among three players $\mathcal{K}=\{{k}_{1},{k}_{2},{k}_{3}\}$, according to their will to cooperate with each other. A power of 1 mW is provided to the network if the three players decide to cooperate, or equivalently if the grand coalition forms. If only one player refuses to cooperate, a power of 0.8 mW will be assigned to the pair of cooperating nodes. The coalition game of Figure 1 is defined by
$$\nu (\mathcal{A})=\begin{cases}0, & |\mathcal{A}|\le 1,\\ 0.8, & |\mathcal{A}|=2,\\ 1, & |\mathcal{A}|=3.\end{cases} \tag{14}$$
The players of each coalition will cooperate with each other. The player of a singleton coalition will be isolated.
Each player receives a positive payoff if it decides to cooperate, whereas all players receive zero if no agreement is reached. To divide the total payoff (power) in an appropriate way, we rely on the core set definition. It is straightforward to show that the coalitional TU game defined by (14) is superadditive. From Equations 3 and 4, it is easy to show that TU game (14) is not convex (supermodular). To check whether the core set of TU game (14) is empty or not, we resort to the balancedness condition. With the balanced weights ${\mu}_{\mathcal{A}}=1$ for singleton coalitions and ${\mu}_{\mathcal{A}}=0$ otherwise, inequality (10) holds. However, there exists another balanced collection of weights, with ${\mu}_{\mathcal{A}}=\frac{1}{2}$ for $|\mathcal{A}|=2$ and ${\mu}_{\mathcal{A}}=0$ otherwise, for which inequality (10) is violated, since $\frac{1}{2}(0.8+0.8+0.8)=1.2>1=\nu (\mathcal{K})$. Hence, the game is not balanced, and its core set may be empty. Note that this result does not mean that the core set of the game is surely empty.
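The balancedness check described above can be reproduced numerically. The following sketch assumes that the game is the one described in the text (zero for singletons, 0.8 for pairs, 1 for the grand coalition) and that the balancedness inequality compares $\sum_{\mathcal{A}}\mu_{\mathcal{A}}\nu(\mathcal{A})$ with $\nu(\mathcal{K})$; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

PLAYERS = ("k1", "k2", "k3")

def nu(coalition):
    # Coalition value by size: 0 for singletons, 0.8 for pairs, 1 for the grand coalition.
    return {0: Fraction(0), 1: Fraction(0), 2: Fraction(4, 5), 3: Fraction(1)}[len(coalition)]

def is_balanced_collection(weights):
    """True if, for every player, the weights of the coalitions containing it sum to 1."""
    return all(
        sum(mu for coalition, mu in weights.items() if k in coalition) == 1
        for k in PLAYERS
    )

def weighted_value(weights):
    """Sum of mu_A * nu(A) over the weighted coalitions."""
    return sum(mu * nu(coalition) for coalition, mu in weights.items())

# The two weight collections discussed in the text.
singles = {frozenset([k]): Fraction(1) for k in PLAYERS}
pairs = {frozenset(p): Fraction(1, 2) for p in combinations(PLAYERS, 2)}
grand_value = nu(frozenset(PLAYERS))

for name, weights in (("singletons", singles), ("pairs", pairs)):
    lhs = weighted_value(weights)
    print(name, "balanced collection:", is_balanced_collection(weights),
          "| weighted value:", lhs, "| inequality holds:", lhs <= grand_value)
```

With the pair weights the weighted value is 6/5, which exceeds the grand coalition value of 1, matching the violation noted in the text.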
Now, we heuristically find a core apportionment by studying the various possible networks. When there is no cooperation among players, the players are not provided with any power, that is, $\mathcal{F}=\left[\{{k}_{1}\},\{{k}_{2}\},\{{k}_{3}\}\right]$ with payoff distribution:
$$\mathbf{x}=[\,0,\ 0,\ 0\,].$$
If only one player decides to stay alone, the payoff 0.8 is equally divided between the two cooperative players, and the isolated player gets zero, that is, for instance, $\mathcal{F}=\left[\{{k}_{1},{k}_{2}\},\{{k}_{3}\}\right]$ with payoff distribution:
$$\mathbf{x}=[\,0.4,\ 0.4,\ 0\,].$$
Now, we suppose a player, for example k_{2}, decides to cooperate with both k_{1} and k_{3}, but the two players k_{1} and k_{3} do not reach an agreement to mutually cooperate. It is reasonable to suppose that the player k_{2} can act as a relay between k_{1} and k_{3}, and it must be provided with more power, that is, $\mathcal{F}=\left[\{{k}_{1},{k}_{2}\},\{{k}_{2},{k}_{3}\}\right]$ with payoff distribution:
$$\mathbf{x}=[\,0.2,\ 0.6,\ 0.2\,].$$
Finally, in the complete network, each player receives the same payoff, that is, $\mathcal{F}=\left[\{{k}_{1},{k}_{2},{k}_{3}\}\right]$ with payoff distribution:
$$\mathbf{x}=\left[\,\tfrac{1}{3},\ \tfrac{1}{3},\ \tfrac{1}{3}\,\right].$$
As can be easily seen, the above argument satisfies the feasibility and nonblocking conditions of the core set apportionment in Definition 6. It is worthwhile to note that the core set definition does not imply an even division of the whole payoff across players. Thus, it is clear that this game admits multiple core apportionments. The power distribution problem can also be solved by game-theoretic bargaining solutions, e.g., the Nash bargaining game and auctions [3].
3.1 On core stability
The goal of the network of Figure 1 is to allocate power among the players in order to stimulate all of them to cooperate. Obviously, each player tries to get the highest possible payoff. Let us predict the behavior of the players once they know the definition of the game. Suppose that players k_{1} and k_{2} find an opportunity to meet each other. Obviously, they quickly take the opportunity to cooperate and achieve the payoff distribution x = [ 0.4,0.4,0 ]. Then, it is profitable for player k_{1} to invite player k_{3} to join, thereby improving its own payoff from 0.4 to 0.6 and that of player k_{3} from zero to 0.2. On the other hand, this new agreement decreases the payoff of player k_{2} from 0.4 to 0.2, and now players k_{2} and k_{3} have an incentive to cooperate and increase their own payoffs from 0.2 to 1/3. Note that this agreement decreases player k_{1}’s payoff from 0.6 to 1/3. The unfavorable decision of player k_{2} would tempt player k_{1} to retaliate. A negotiation between k_{1} and k_{3} to end cooperation with k_{2} increases their payoffs and reduces k_{2}’s payoff to zero. The upshot of the above argument is that the network is sustained by the cooperation of only one pair, under the threat ‘If you cooperate with the third player, then I will do the same’. ^{a} It is fairly clear that the players would seek to cooperate only as pairs for the purpose of negotiation, and not cooperate in the grand coalition framework, even though the game is superadditive. This is due to the game being superadditive but not balanced. The pairs can change as time goes on. In fact, the core apportionment lacks ‘farsighted’ (i.e., long-term) stability.
A coalition structure based on the core set is not adequately farsighted to avoid the elusiveness of the negotiation structure. At first sight, the core appears to be an extremely myopic notion, requiring the stability of a proposed allocation to deviations or blocks by coalitions, but not examining the stability of the deviations themselves. In general, the stability requirement is that the outcome be immune to deviations of a certain sort by coalitions. To provide the formal definition of farsighted stability, we need some additional notation.
Definition 10. [29] For $\mathbf{x},\mathbf{y}\in \mathcal{I}\left(\mathcal{K},\mathit{\nu}\right)$, x indirectly dominates y, which is denoted by y ≪ x, if there exist a finite sequence of imputations y = x_{1},x_{2},…,x_{ m } = x and a finite sequence of nonempty coalitions ${\mathcal{A}}_{1},{\mathcal{A}}_{2},\dots ,{\mathcal{A}}_{m}$, such that for each j = 1, 2,…,m − 1: (i) by the deviation of ${\mathcal{A}}_{j}$, the imputation x_{ j } is replaced by x_{j+1}, and (ii) x_{ j } [k] < x [k] for all $k\in {\mathcal{A}}_{j}$.
Condition (i) says that each coalition ${\mathcal{A}}_{j}$ has the power to replace imputation x_{ j } by imputation x_{j+1}, and condition (ii) says that each player in ${\mathcal{A}}_{j}$ strictly prefers imputation x to imputation x_{ j }. It is clear that the indirect dominance relation contains the direct dominance relation.
Definition 11. [29, 30] Let $\mathcal{G}=\left(\mathcal{K},\nu \right)$ be a TU game. A subset $\mathcal{J}$ of $\mathcal{I}\left(\mathcal{K},\mathit{\nu}\right)$ is a farsighted stable set if: (i) for all $\mathbf{x},\mathbf{y}\in \mathcal{J}$, neither x ≪ y nor y ≪ x, and (ii) for all $\mathbf{y}\in \mathcal{I}(\mathcal{K},\nu )\setminus \mathcal{J}$ there exists $\mathbf{x}\in \mathcal{J}$ such that y ≪ x. Conditions (i) and (ii) are called internal stability and external stability, respectively.
By internal stability, there is no imputation in $\mathcal{J}$ that is dominated by another imputation in $\mathcal{J}$. By external stability, an imputation outside a stable set $\mathcal{J}$ is unlikely to be attained. Let us introduce three other different payoff distribution concepts which capture foresight of the players.
4 Shapley value
The Shapley value is an alternative solution for the payoff distribution in TU games. The Shapley value has long been a central solution concept in coalitional game theory. It was introduced by L. S. Shapley in the seminal paper [31] and it was seen as a reasonable way of distributing the gains of cooperation, in a fair and unique way, among the players in the game. In the Shapley solution, those who contribute more to the groups that include them are paid more. Let us denote ϕ_{ k } (ν) as the Shapley value of player k in the TU game defined by ν. The surprising result due to Shapley is the following theorem.
Theorem 1. There is a unique single-valued solution to TU games satisfying efficiency, symmetry, additivity, and the dummy axiom. It is the well-known Shapley value, the function that assigns to each player k the payoff:
$${\varphi}_{k}(\nu )=\sum _{\mathcal{A}\subseteq \mathcal{K},\ k\in \mathcal{A}}\frac{(|\mathcal{A}|-1)!\,(K-|\mathcal{A}|)!}{K!}\left[\nu (\mathcal{A})-\nu (\mathcal{A}\setminus \{k\})\right].$$
The expression $\nu (\mathcal{A})-\nu (\mathcal{A}\setminus \{k\})$ is the marginal payoff of player k to the coalition $\mathcal{A}$. The Shapley value can be interpreted as the expected marginal contribution made by a player to the value of a coalition, where the distribution of coalitions is such that any ordering of the players is equally likely. Because the number of coalitions grows exponentially with K, the Shapley value is expensive to compute directly. Shapley characterized this value as the unique solution that satisfies the following four axioms:

(1)
Efficiency: The payoffs must add up to $\nu \left(\mathcal{K}\right)$, which means that all the grand coalition surplus is allocated, that is,
$$\sum _{k\in \mathcal{K}}{\varphi}_{k}\left(\nu \right)=\nu \left(\mathcal{K}\right).$$In the absence of superadditivity, instead we use $\underset{\psi \in \mathit{\Psi}}{\text{max}}\phantom{\rule{0.3em}{0ex}}\sum _{\mathcal{A}\in \psi}\nu \left(\mathcal{A}\right)$.

(2)
Symmetry: This axiom requires that the names of the players play no role in determining the value. If two players are substitutes because they contribute the same to each coalition, the solution should treat them equally, that is,
$$\nu (\mathcal{A}\cup \{k\})=\nu (\mathcal{A}\cup \{i\})\ \ \forall \,\mathcal{A}\subseteq \mathcal{K}\setminus \{k,i\}\quad \Rightarrow \quad {\varphi}_{k}(\nu )={\varphi}_{i}(\nu ).$$
(3)
Additivity: The solution to the sum of two TU games must be the sum of what it awards to each of the two games, that is,
$${\varphi}_{k}\left(\nu +\omega \right)={\varphi}_{k}\left(\nu \right)+{\varphi}_{k}\left(\omega \right)\phantom{\rule{2em}{0ex}}\forall \phantom{\rule{2.77626pt}{0ex}}k\in \mathcal{K.}$$ 
(4)
Dummy player: The player k is dummy (null) if $\nu (\mathcal{A}\cup \{k\left\}\right)=\nu \left(\mathcal{A}\right)$ for all $\mathcal{A}$ not containing k. If a player k is dummy, the solution should pay it nothing, i.e., ϕ _{ k } (ν) = 0.
The Shapley value is a feasible allocation but need not be individually rational. Whenever the TU game is superadditive, the Shapley value is feasible and individually rational, but it need not be in the core and hence can be directly dominated by another imputation. Shapley [19] shows that the Shapley value of a supermodular TU game is a core imputation, that is, the Shapley value is not dominated. For a superadditive TU game, the Shapley value is an internally and externally stable imputation, and for NTU games, it is formulated in [32, 33]. As an example, let us calculate the Shapley value of the players in the power distribution game of Figure 1. Since the three players are interchangeable and the payoffs must add up to $\nu (\mathcal{K})=1$, symmetry and efficiency give
$${\varphi}_{{k}_{1}}(\nu )={\varphi}_{{k}_{2}}(\nu )={\varphi}_{{k}_{3}}(\nu )=\frac{1}{3}.$$
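The Shapley value of this three-player game can also be computed directly from the ordering interpretation: average each player's marginal contribution over all orderings of the players. The sketch below is only an illustration under the assumption that the game is the one described in the text (zero for singletons, 0.8 for pairs, 1 for the grand coalition); the function name `shapley_values` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

PLAYERS = ("k1", "k2", "k3")

def nu(coalition):
    # Coalition value by size: 0 for singletons, 0.8 for pairs, 1 for the grand coalition.
    return {0: Fraction(0), 1: Fraction(0), 2: Fraction(4, 5), 3: Fraction(1)}[len(coalition)]

def shapley_values(players, nu):
    """Average each player's marginal contribution over all orderings of the players."""
    totals = {k: Fraction(0) for k in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for k in order:
            # Marginal contribution of k when joining the players that precede it.
            totals[k] += nu(coalition | {k}) - nu(coalition)
            coalition = coalition | {k}
    return {k: total / len(orderings) for k, total in totals.items()}

phi = shapley_values(PLAYERS, nu)
for player, value in phi.items():
    print(player, value)
```

Enumerating all K! orderings is feasible only for very small games, which is consistent with the remark that the Shapley value is costly to compute in general.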
Young [34] gives an equivalent characterization of the Shapley value. He drops the additivity axiom and instead adds an axiom of marginality.

(1)
Marginality: If the marginal contribution to coalitions of a player in two games is the same, then the award of the player must be the same, that is, if
$$\nu ({\mathcal{A}}_{i})-\nu ({\mathcal{A}}_{i}\setminus \{k\})=\omega ({\mathcal{A}}_{j})-\omega ({\mathcal{A}}_{j}\setminus \{k\})\qquad \forall \,{\mathcal{A}}_{i}\in \nu \ \ \text{and}\ \ \forall \,{\mathcal{A}}_{j}\in \omega ,$$
then ϕ_{ k } (ν) = ϕ_{ k } (ω).
Marginality is an idea with a strong tradition in economic theory. In Young’s definition, marginality is assumed and additivity is dropped. Young [34] shows that the Shapley value is unique.
Theorem 2. [34] There exists a unique single-valued solution to TU games satisfying efficiency, symmetry, and marginality; this solution is the Shapley value.
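The role of marginal contributions can be illustrated with a minimal sketch that averages each player's marginal contribution over all orderings of the players. The three-player game `v` below is a hypothetical symmetric example (any two players generate the full worth) and is not the power distribution game of Figure 1; the function and variable names are likewise illustrative.

```python
from fractions import Fraction
from itertools import permutations


def shapley_value(players, v):
    """Average each player's marginal contribution over all orderings."""
    phi = {k: Fraction(0) for k in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for k in order:
            with_k = coalition | {k}
            # Marginal contribution of k to the players that precede it.
            phi[k] += Fraction(v(with_k)) - Fraction(v(coalition))
            coalition = with_k
    return {k: phi[k] / len(orderings) for k in players}


def v(coalition):
    """Hypothetical characteristic function: worth 1 for two or more players."""
    return 1 if len(coalition) >= 2 else 0


players = ("k1", "k2", "k3")
phi = shapley_value(players, v)
for k in players:
    print(k, phi[k])
```

Because the assumed game is symmetric, the three players receive equal shares, and the shares sum to the worth of the grand coalition (efficiency). Enumerating all orderings costs K! evaluations, so this direct approach is only practical for small player sets.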
In the network engineering literature, Kim [35] proposes an energy-efficient routing protocol based on the Shapley value. The concept of the Shapley value is used by Khouzani and Sarkar [36] to achieve a fair aggregate cost of link sharing among primary and secondary users in a cognitive network. Using the Shapley value, a fair network resource sharing among multimedia users is achievable, as Park and van der Schaar propose in [37].
5 The kernel and nucleolus
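Let $\mathcal{G}=(\mathcal{K},\nu )$ be a coalitional game with transferable payoff. The excess of the coalition $\mathcal{A}$ with respect to the payoff vector $\mathbf{x}\in {\mathbb{R}}^{K}$ is defined as
$$e\left(\mathcal{A},\mathbf{x}\right)=\nu \left(\mathcal{A}\right)-\sum _{k\in \mathcal{A}}{x}_{k}.$$
A positive excess can be interpreted as an incentive for a coalition to generate more utility. Using the excess notion, the core apportionment in a TU game can be redefined as
$$\left\{\mathbf{x}\in \mathcal{I}\left(\mathcal{K},\nu \right):\phantom{\rule{0.3em}{0ex}}e\left(\mathcal{A},\mathbf{x}\right)\le 0\phantom{\rule{0.5em}{0ex}}\forall \phantom{\rule{0.3em}{0ex}}\mathcal{A}\subseteq \mathcal{K}\right\}.$$
The maximum excess of player k against i is defined as
$${e}_{ki}\left(\mathbf{x}\right)=\underset{\mathcal{A}\subseteq \mathcal{K}:\phantom{\rule{0.3em}{0ex}}k\in \mathcal{A},\phantom{\rule{0.3em}{0ex}}i\notin \mathcal{A}}{\max }e\left(\mathcal{A},\mathbf{x}\right).$$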
If player k departs from x, the most it can hope to gain (the least to lose) without the consent of player i is the amount of maximum excess. The extensions of the excess for NTU games are formalized in [38].
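As a minimal sketch, the excess, the core condition, and the maximum excess can be computed by direct enumeration. The sketch assumes the standard definitions, namely that the excess is the coalition worth minus the sum of its members' payoffs, that the core requires nonpositive excess for every coalition, and that the maximum excess of k against i is taken over coalitions containing k but not i. The game `v` and the payoff vectors are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

PLAYERS = ("k1", "k2", "k3")


def v(coalition):
    """Hypothetical symmetric TU game: singletons 0, pairs 1/2, grand coalition 1."""
    return {0: Fraction(0), 1: Fraction(0), 2: Fraction(1, 2), 3: Fraction(1)}[len(coalition)]


def proper_coalitions(players):
    """All nonempty proper subsets of the player set."""
    for size in range(1, len(players)):
        for members in combinations(players, size):
            yield frozenset(members)


def excess(v, coalition, x):
    """Worth of the coalition minus what its members receive under x."""
    return v(coalition) - sum(x[k] for k in coalition)


def in_core(players, v, x):
    """Efficient allocation for which no coalition has a positive excess."""
    efficient = sum(x.values()) == v(frozenset(players))
    return efficient and all(excess(v, a, x) <= 0 for a in proper_coalitions(players))


def max_excess(players, v, x, k, i):
    """Largest excess among coalitions that contain k and exclude i."""
    return max(excess(v, a, x) for a in proper_coalitions(players) if k in a and i not in a)


equal_split = {k: Fraction(1, 3) for k in PLAYERS}
one_takes_all = {"k1": Fraction(1), "k2": Fraction(0), "k3": Fraction(0)}
print(in_core(PLAYERS, v, equal_split), in_core(PLAYERS, v, one_takes_all))
print(max_excess(PLAYERS, v, equal_split, "k1", "k2"))
```

In this assumed game the equal split leaves every coalition with a negative excess, whereas giving everything to one player leaves the other two with a positive excess as a pair.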
As defined by Osborne and Rubinstein ([3], Ch. 14), a coalition ${\mathcal{A}}_{i}$ is an objection of k against i to x, if ${\mathcal{A}}_{i}$ includes k but not i and x_{ i } > ν ({i}). Equivalently, ${\mathcal{A}}_{i}$ is a coalition that contains k, excludes i, and which gains too little. A coalition ${\mathcal{A}}_{j}$ is a counterobjection to the objection ${\mathcal{A}}_{i}$ of k against i, if ${\mathcal{A}}_{j}$ includes i but not k and $e\left({\mathcal{A}}_{j},\mathbf{x}\right)\phantom{\rule{0.3em}{0ex}}\ge \phantom{\rule{0.3em}{0ex}}e\left({\mathcal{A}}_{i},\mathbf{x}\right)$. Equivalently, ${\mathcal{A}}_{j}$ is a coalition that contains i and excludes k and that gains even less. Objections and counterobjections are exchanged between members of the same coalition in ${\mathcal{A}}_{i}$.
The idea captured by the kernel is that if at a nonempty imputation x, the maximum excess of player k against any other player i is less than the maximum excess of player i against the player k, then player k should get less. Of course, the players cannot get less than their individual worths if x is an imputation. The definition of the kernel follows:
Definition 12. The kernel is the set of all imputations x with the property that for every objection ${\mathcal{A}}_{i}$ of any player k against any other player i to x, there is a counterobjection of i to ${\mathcal{A}}_{i}$, such that

(a)
e _{ ki } (x) = e _{ i k } (x); or

(b)
e _{ ki } (x) < e _{ i k } (x) and x _{ k } = ν ({k}); or

(c)
e _{ ki } (x) > e _{ i k } (x) and x _{ i } = ν ({i}).
The kernel is the set of imputations x such that for any coalition ${\mathcal{A}}_{i}$, for each objection ${\mathcal{A}}_{j}$ of a user $k\in {\mathcal{A}}_{i}$ over any other member $i\in {\mathcal{A}}_{i}$, there is a counterobjection of i to ${\mathcal{A}}_{j}$. The kernel is contained in the (nonempty) core in any assignment game ν ([39], Theorem 1). In Figure 1, the unique kernel element is the equal split x = [ 1/3,1/3,1/3 ]; otherwise, the single-player coalition objection of the player with the minimum payoff has no counterobjection.
The last type of a stable imputation we will study is the nucleolus. With the nucleolus, no confusion regarding the player set can arise. The basic motivation behind the nucleolus is that one can provide an allocation that minimizes the excess of the coalitions in a given coalitional game $\mathcal{G}=(\mathcal{K},\nu )$. For a TU game $\mathcal{G}=(\mathcal{K},\nu )$ and the payoff vector $\mathbf{x}\in {\mathbb{R}}^{K}$, let us denote $\mathbf{E}\left(\mathbf{x}\right)=\left[\cdots \ge e\left(\mathcal{A},\mathbf{x}\right)\ge \cdots :\phantom{\rule{2.77626pt}{0ex}}\phantom{\rule{2.77626pt}{0ex}}\varnothing \ne \mathcal{A}\ne \mathcal{K}\right]$ as a $({2}^{K}-2)$-dimensional vector whose components are the values of the excess function for all $\mathcal{A}\subset \mathcal{K}$, arranged in a nonincreasing order. The nucleolus of a game is the imputation which minimizes the excess with respect to the lexicographic order ^{b} over the set of imputations. The nucleolus of $\mathcal{G}$ with respect to $\mathcal{I}\left(\mathcal{K},\mathit{\nu}\right)$ is given by
$$\left\{\mathbf{x}\in \mathcal{I}\left(\mathcal{K},\nu \right):\phantom{\rule{0.3em}{0ex}}\mathbf{E}\left(\mathbf{x}\right){\preceq }_{\text{lex}}\mathbf{E}\left(\mathbf{y}\right)\phantom{\rule{0.5em}{0ex}}\forall \phantom{\rule{0.3em}{0ex}}\mathbf{y}\in \mathcal{I}\left(\mathcal{K},\nu \right)\right\}.$$
The definition of the nucleolus of a coalitional game in characteristic function form entails comparisons between vectors of exponential length. Thus, if one attempts to compute the nucleolus by simply following its definition, it would take an exponential time. In the network engineering literature, Han and Poor [40] apply the Shapley value, excess, and nucleolus solutions to study a possible cooperative transmission among intermediate nodes to help relay the information of wireless users.
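To illustrate what following the definition involves, the sketch below builds the sorted excess vector E(x) and picks the lexicographically smallest one over a coarse grid of imputations. It is only a brute-force illustration: the game `v` is a hypothetical symmetric three-player example, the grid resolution is an arbitrary choice, and the excess is assumed to be the coalition worth minus the members' payoffs.

```python
from fractions import Fraction
from itertools import combinations

PLAYERS = ("k1", "k2", "k3")


def v(coalition):
    """Hypothetical symmetric TU game: singletons 0, pairs 1/2, grand coalition 1."""
    return {0: Fraction(0), 1: Fraction(0), 2: Fraction(1, 2), 3: Fraction(1)}[len(coalition)]


def excess_vector(x):
    """E(x): excesses of all nonempty proper coalitions in nonincreasing order."""
    values = []
    for size in range(1, len(PLAYERS)):
        for members in combinations(PLAYERS, size):
            values.append(v(members) - sum(x[k] for k in members))
    return sorted(values, reverse=True)


def grid_imputations(steps=12):
    """Imputations on a grid: x_k >= v({k}) = 0 and payoffs summing to v(K) = 1."""
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            shares = (a, b, steps - a - b)
            yield {k: Fraction(s, steps) for k, s in zip(PLAYERS, shares)}


# Python compares lists lexicographically, which matches the order on E(x).
best = min(grid_imputations(), key=excess_vector)
print(best, excess_vector(best))
```

For this assumed game the search returns the equal split. The number of components of E(x) grows as 2^K − 2, which is the source of the exponential cost mentioned above.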
This defining property makes the nucleolus appealing as a fair single-valued solution. It is easy to see that whenever the core of a game is nonempty, the nucleolus lies in it [4]. Moreover, the nucleolus always belongs to the kernel and satisfies the symmetry and dummy axioms of Shapley: dummy players receive zero payoffs. If a null player is removed from the game, the payoff allocation of the remaining players is not influenced by its departure. Because of these desirable properties, the nucleolus solution has found many applications in cost sharing and resource allocation, as Maschler reports in [41]. However, the nucleolus possesses certain features that make it less agreeable. The original definition treats the excesses of any two coalitions as equally important, regardless of coalition sizes and coalition composition. Some unappealing features of utility distribution, derived with the nucleolus, are listed in [34]. For instance, the nucleolus lacks many monotonicity properties, which require that if a game changes so that some player’s contribution to all coalitions increases, then that player’s allocation should not decrease. Monotonicity states that as the underlying data of the game change, the utility must change in a parallel fashion.
6 Cooperative Nash equilibria
Coalitional games aim at identifying the best coalitions of the agents and a fair distribution of the payoff among the agents. The classic core solution is an extension of the Nash equilibrium, since the coalitions bind agents to agreements with each other and earn a vector value rather than a real number. In ([42], Section 7.6), it is shown that the core set of an underlying coalitional game, if it exists, asymptotically coincides with the set of Nash equilibria of the repeated game in the long run. The result of the Nash equilibrium is not always a satisfactory outcome for an external observer (e.g., the prisoner’s dilemma game). Aumann in [43] and Bernheim et al. [44] introduce a stronger notion of Nash equilibria based on coalitional game theory. First, let us review the definition of the Nash equilibrium, where each pure strategy in a static game is presented as a coalition in a coalitional game. Thus, each player belongs to only one coalition.
Definition 13. A pure strategy (coalition) combination $\psi =\left[{\mathcal{A}}_{1},{\mathcal{A}}_{2},\dots ,{\mathcal{A}}_{m}\right]$, wherein ${\mathcal{A}}_{i}\bigcap {\mathcal{A}}_{j\ne i}=\varnothing $, $\bigcup _{i=1}^{m}{\mathcal{A}}_{i}=\mathcal{K}$, and a payoff distribution x = [x_{1},…,x_{ K }] is a pure Nash equilibrium, if a player $k\in \mathcal{K}$ whose unilateral deviation to a different coalition (pure strategy) yields a new distribution y = [y_{1},…,y_{ K }], such that y_{ k } > x_{ k }, does not exist.
In other words, in a Nash equilibrium, no agent is motivated to deviate from its coalition (strategy) given that the others do not deviate. As an example, we study the forwarder’s dilemma game [45] presented in Figure 2. This game is intended to represent a basic wireless relay operation between two different wireless terminals. These two agents, represented by players k_{1} and k_{2}, are supposed to operate a direct link that enables them to communicate without intermediaries. Each player wants to send a packet to its destination, d_{1} and d_{2} respectively, in each time step using the other player as a forwarder. We assume that each forwarding has an energy cost 0 < c ≪ 1. If player k_{1} forwards (F) the packet of player k_{2}, player k_{2} gets a reward 1 and vice versa. Each player’s utility is its reward minus the cost. Each player is tempted to drop (D) the received packet to save energy. The strategic form of this game is depicted in Figure 3. In the cooperative representation of the forwarder’s dilemma game, there are two coalitions $\psi =\left[{\mathcal{A}}_{F},{\mathcal{A}}_{D}\right]$, and each player in $\mathcal{K}=\{{k}_{1},{k}_{2}\}$ must choose one coalition. For instance, $\psi =\left[{\mathcal{A}}_{F}=\{{k}_{1},{k}_{2}\},{\mathcal{A}}_{D}=\varnothing \right]$ is equivalent to the strategy profile (F, F), and $\psi =\left[{\mathcal{A}}_{F}=\left\{{k}_{2}\right\},{\mathcal{A}}_{D}=\left\{{k}_{1}\right\}\right]$ corresponds to the strategy profile (D, F), and so on.
Unilateral deviation of player k_{1} from $\psi =\left[{\mathcal{A}}_{F}=\{{k}_{1},{k}_{2}\},{\mathcal{A}}_{D}=\varnothing \right]$ to $\psi =\left[{\mathcal{A}}_{F}=\left\{{k}_{2}\right\},{\mathcal{A}}_{D}=\left\{{k}_{1}\right\}\right]$ increases its own payoff; therefore, the pure strategy profile (F,F) is not a Nash equilibrium point. The same applies to the departure of player k_{2} from $\psi =\left[{\mathcal{A}}_{F}=\{{k}_{1},\phantom{\rule{0.3em}{0ex}}{k}_{2}\},{\mathcal{A}}_{D}=\varnothing \right]$ to the pure strategy $\psi =\left[{\mathcal{A}}_{F}=\left\{{k}_{1}\right\},{\mathcal{A}}_{D}=\left\{{k}_{2}\right\}\right]$. We can easily check the different combinations of $\psi =\left[{\mathcal{A}}_{F}=\left\{{k}_{1}\right\},\phantom{\rule{0.3em}{0ex}}{\mathcal{A}}_{D}=\left\{{k}_{2}\right\}\right]$, $\psi =\left[{\mathcal{A}}_{F}=\left\{{k}_{2}\right\},\phantom{\rule{0.3em}{0ex}}{\mathcal{A}}_{D}=\left\{{k}_{1}\right\}\right]$, and finally $\psi =\left[{\mathcal{A}}_{F}=\varnothing ,\phantom{\rule{0.3em}{0ex}}{\mathcal{A}}_{D}=\{{k}_{1},\phantom{\rule{0.3em}{0ex}}{k}_{2}\}\right]$. The unilateral move of user k_{1} (respectively k_{2}) from the strategy profile $\psi =\left[{\mathcal{A}}_{F}=\varnothing ,\phantom{\rule{0.3em}{0ex}}{\mathcal{A}}_{D}=\{{k}_{1},\phantom{\rule{0.3em}{0ex}}{k}_{2}\}\right]$ to $\psi =\left[{\mathcal{A}}_{F}=\left\{{k}_{1}\right\},\phantom{\rule{0.3em}{0ex}}{\mathcal{A}}_{D}=\left\{{k}_{2}\right\}\right]$ (respectively to $\psi =\left[{\mathcal{A}}_{F}=\left\{{k}_{2}\right\},\phantom{\rule{0.3em}{0ex}}{\mathcal{A}}_{D}=\left\{{k}_{1}\right\}\right]$) does not yield any benefit. This game has a unique Nash equilibrium at the pure joint strategy $\psi =\left[{\mathcal{A}}_{F}=\varnothing ,{\mathcal{A}}_{D}=\{{k}_{1},{k}_{2}\}\right]$ with unsatisfactory payoff distribution x = [0,0]. At the Nash equilibrium point, both players choose the ‘competitive’ and ‘egoistic’ strategy D.
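The unilateral-deviation check can be reproduced with a short enumeration. This is a sketch under the payoff description given in the text (a player earns 1 when the other forwards its packet and pays c when it forwards itself); the numerical value of `C` is an arbitrary illustrative choice, and a profile is written as (action of k_1, action of k_2).

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("F", "D")
C = Fraction(1, 10)  # hypothetical forwarding cost, 0 < c << 1


def payoffs(profile, c=C):
    """Payoffs (k1, k2): reward 1 if the other forwards, cost c if I forward."""
    a1, a2 = profile
    u1 = (1 if a2 == "F" else 0) - (c if a1 == "F" else 0)
    u2 = (1 if a1 == "F" else 0) - (c if a2 == "F" else 0)
    return (u1, u2)


def is_nash(profile):
    """True if no single player gains by changing only its own action."""
    for player in range(2):
        for alternative in ACTIONS:
            deviated = list(profile)
            deviated[player] = alternative
            if payoffs(tuple(deviated))[player] > payoffs(profile)[player]:
                return False
    return True


nash_profiles = [p for p in product(ACTIONS, repeat=2) if is_nash(p)]
print(nash_profiles, [payoffs(p) for p in nash_profiles])
```

Under these assumptions the enumeration returns only (D, D), with payoff distribution [0, 0].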
In many games, there are opportunities for joint deviations that are mutually beneficial for a subset of players. This led Aumann [43] to propose the idea of strong Nash equilibrium, which ensures a more restrictive stability than the conventional Nash equilibrium. Strong Nash equilibrium reflects the unprofitability of coalition deviations. It is a strategy profile that is stable against deviations not only by single players but also by all coalitions of players. A strong equilibrium is defined as a strategy profile for which no subset of players has a joint deviation that strictly benefits all of them, while all other players (outside the subset) are expected to maintain their equilibrium strategies.
Definition 14. A strategy (coalition) combination $\psi =\left[{\mathcal{A}}_{1},{\mathcal{A}}_{2},\dots ,{\mathcal{A}}_{m}\right]$, where ${\mathcal{A}}_{i}\bigcap {\mathcal{A}}_{j\ne i}=\varnothing $ and $\bigcup _{i=1}^{m}{\mathcal{A}}_{i}=\mathcal{K}$ with payoff distribution x = [x_{1},…,x_{ K }] is a strong Nash equilibrium if there does not exist a coalition ${\mathcal{A}}_{i}\in \psi $ whose deviation yields a new distribution y = [ y_{1},…,y_{ K }] such that ${y}_{k}\ge {x}_{k}\forall \phantom{\rule{0.3em}{0ex}}k\in {\mathcal{A}}_{i}$ and $\exists \phantom{\rule{0.3em}{0ex}}k\in {\mathcal{A}}_{i}$ such that y_{ k } > x_{ k }.
This definition of strong equilibrium is actually slightly different from those of [43] and [44]. Definition 14 allows a coalition to make a deviation that strictly increases the payoffs of some of its members without decreasing those of the other members, whereas the original definition allows only deviations that strictly increase the payoffs of all members of a deviating coalition. We note that if a game implements a strategy for strong equilibrium, it does not necessarily implement it for Nash equilibrium. Both interpretations of strong Nash equilibrium are prominent in the literature, and in most games, the two definitions lead to the same sets of strong Nash equilibria; however, the one that we use here is slightly more appealing in the context of network formation games (see, e.g., [46]). Network formation games involve a number of independent players that interact with each other in order to form a suitable graph that connects them.
Now, we revisit the forwarder’s dilemma game and try to find its strong Nash equilibrium profiles. We will show that the game possesses a strong Nash equilibrium which is not equivalent to the Nash equilibrium. We pick different coalition combinations and test whether there exists any coalition whose deviation benefits its own members.

1.
$\psi =\left[{\mathcal{A}}_{F}=\left\{{k}_{1}\right\},{\mathcal{A}}_{D}=\left\{{k}_{2}\right\}\right]$ is not a strong Nash equilibrium because the deviation of ${\mathcal{A}}_{F}$ increases its member’s payoff.

2.
$\psi =\left[{\mathcal{A}}_{F}=\left\{{k}_{2}\right\},{\mathcal{A}}_{D}=\left\{{k}_{1}\right\}\right]$ is not a strong Nash equilibrium because the deviation of ${\mathcal{A}}_{F}$ renders its member’s payoff higher.

3.
$\psi =\left[{\mathcal{A}}_{F}=\varnothing ,{\mathcal{A}}_{D}=\{{k}_{1},{k}_{2}\}\right]$ is not a strong Nash equilibrium because the deviation of both players from ${\mathcal{A}}_{D}$ to ${\mathcal{A}}_{F}$ increases the payoff of both players.

4.
$\psi =\left[{\mathcal{A}}_{F}=\{{k}_{1},{k}_{2}\},{\mathcal{A}}_{D}=\varnothing \right]$ is a strong Nash equilibrium because the departure of one or both players from ${\mathcal{A}}_{F}$ to ${\mathcal{A}}_{D}$ decreases at least one player’s payoff.
The unique strong Nash equilibrium is the strategy profile (F,F), which corresponds to the coalition set $\psi =\left[{\mathcal{A}}_{F}=\{{k}_{1},{k}_{2}\},{\mathcal{A}}_{D}=\varnothing \right]$, since no deviation can improve on the payoff distribution vector x = [ 1 − c, 1 − c ]. In fact, at the strong Nash equilibrium, both players choose the ‘cooperative’ and ‘altruistic’ strategy F in spite of the energy transmission cost.
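The four cases above can be checked mechanically. The sketch below reads Definition 14 literally, which is an interpretive assumption: only the coalitions that appear in ψ (the players currently sharing a strategy) deviate, they deviate jointly, and the deviation counts if no member loses and at least one member gains. Definitions that also test every other subset of players, including single members of a larger coalition, may give a different answer. The cost value `C` is hypothetical.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("F", "D")
C = Fraction(1, 10)  # hypothetical forwarding cost, 0 < c << 1


def payoffs(profile, c=C):
    """Payoffs (k1, k2): reward 1 if the other forwards, cost c if I forward."""
    a1, a2 = profile
    u1 = (1 if a2 == "F" else 0) - (c if a1 == "F" else 0)
    u2 = (1 if a1 == "F" else 0) - (c if a2 == "F" else 0)
    return (u1, u2)


def coalitions_of(profile):
    """Nonempty coalitions of psi: players grouped by the strategy they chose."""
    groups = []
    for action in ACTIONS:
        members = tuple(i for i, a in enumerate(profile) if a == action)
        if members:
            groups.append(members)
    return groups


def is_strong_def14(profile):
    """Definition 14 read literally: no coalition in psi has an improving joint move."""
    x = payoffs(profile)
    for members in coalitions_of(profile):
        for joint in product(ACTIONS, repeat=len(members)):
            deviated = list(profile)
            for player, action in zip(members, joint):
                deviated[player] = action
            deviated = tuple(deviated)
            if deviated == profile:
                continue
            y = payoffs(deviated)
            if all(y[k] >= x[k] for k in members) and any(y[k] > x[k] for k in members):
                return False
    return True


strong_profiles = [p for p in product(ACTIONS, repeat=2) if is_strong_def14(p)]
print(strong_profiles)
```

Under this reading the enumeration returns only (F, F), in agreement with the case analysis in the text.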
In network problems, Zhong and Wu show that the strong Nash equilibrium concept makes collusion-resistant routing possible in noncooperative wireless ad hoc networks [47]. Altman et al. [48] examine a dynamic random access game with orthogonal power constraints, in which the probability of the transmission of a terminal in each slot depends on the amount of energy left prior to that slot. They show the existence of a strong Nash equilibrium point.
Conventional Nash equilibrium is concerned with the possibilities of only one-step deviation by any player. The notion of strong Nash equilibrium requires that an agreement not be subject to an improving (one-step) deviation by any coalition of players, given that all other coalitions are inert. This notion is stronger than the Nash equilibrium, but it is not resistant to further deviation by subcoalitions (the subsets of a coalition). Recognizing this problem, Bernheim et al. [44] introduced the notion of coalition-proof Nash equilibrium, which requires only that an agreement be immune to improving deviations which are self-enforcing. The definition of a self-enforcing deviation is recursive.
Definition 15. For a singleton coalition, a deviation is self-enforcing if it maximizes the player’s payoff. For a coalition of more than one player, a deviation is self-enforcing if (1) it is profitable for all its members and (2) there is no further self-enforcing and improving deviation available to a proper subcoalition of players.
Generally, a deviation by a coalition is self-enforcing if no subcoalition has an incentive to initiate a new deviation. In the forwarder’s dilemma game, the Nash equilibrium is upset by a deviation of the coalition of both players k_{1} and k_{2}. At the pure strategy Nash equilibrium where each player chooses strategy D, they each obtain a payoff of 0. By jointly deviating (both choosing F instead), k_{1} and k_{2} each earn a payoff of 1 − c. This deviation is not self-enforcing even though the movement to the pure strategy $\psi =\left[{\mathcal{A}}_{F}=\{{k}_{1},{k}_{2}\},{\mathcal{A}}_{D}=\varnothing \right]$ is profitable for both players. At the strong Nash pure strategy (F, F), player k_{1} is tempted to move to the strategy profile (D, F) to get a higher payoff, and player k_{2} to (F, D). Thus, the strong Nash equilibrium is not immune against self-enforceability.
This notion of self-enforceability provides a useful means of distinguishing coalitional deviations that are viable from those that are not resistant to further deviations. With the concept of self-enforceability, our notion of coalition-proofness is easily formulated.
Definition 16. In a one-player game, a strategy is a coalition-proof Nash equilibrium if it maximizes the player’s utility. In a game with more than one player, a strategy combination is a coalition-proof Nash equilibrium if no subcoalition has a self-enforcing deviation that makes all its members better off.
This solution concept requires that there is no subcoalition that can make a mutually beneficial deviation (keeping the strategies of nonmembers fixed) in a way that the deviation itself is stable according to the same criterion. In the forwarder’s dilemma game, the strong Nash equilibrium profile (F,F) is not equivalent to a coalition-proof Nash equilibrium. This is due to the fact that the deviation of $\left\{{k}_{1}\right\}\subset {\mathcal{A}}_{F}\phantom{\rule{0.3em}{0ex}}=\phantom{\rule{0.3em}{0ex}}\{{k}_{1},{k}_{2}\}$ to the strategy (D,F) increases the payoff of k_{1}. In this game, no coalition-proof Nash equilibrium exists, because all pure strategies have at least one self-enforcing deviation.
Bernheim et al. [44] note that for two-person games, the set of coalition-proof equilibria coincides with the set of Nash equilibria that are not Pareto-dominated by any other Nash equilibrium. However, in n-person games (K ≥ 3), the equilibrium concepts are independent. At a coalition-proof Nash equilibrium, the deviations are restricted to be stable themselves against further deviations by subcoalitions. Moldovanu [49] discusses the situations of a three-player game, wherein the coalition-proof Nash equilibrium is equivalent to the core set. The conditions under which the set of coalition-proof Nash equilibria coincides with the set of strong Nash equilibria are formulated by Konishi et al. [50].
In the network engineering literature, Félegyházi et al. [51] apply the concept of coalition-proof Nash equilibria to achieve a stable and fair channel allocation solution in a competitive multi-radio multi-channel wireless cognitive network. Gao et al. investigate multi-radio multi-channel allocation in multi-hop ad hoc networks [52]. To better understand the concepts of self-enforceability and coalition-proof Nash equilibrium, let us introduce an intuitive subcarrier allocation game in an OFDMA network. Let us focus on three wireless transmitters $\mathcal{K}=\left\{{k}_{1},\phantom{\rule{0.3em}{0ex}}{k}_{2},\phantom{\rule{0.3em}{0ex}}{k}_{3}\right\}$ and an OFDMA base station with two subcarriers $\mathcal{N}=\{1,2\}$. Every subcarrier $n\in \mathcal{N}$ has a frequency spacing Δ f. Each user $k\in \mathcal{K}$ experiences a Gaussian complex-valued channel with power gain $|{H}_{kn}|^{2}$ on the n th subcarrier to the base station. We assume that each subcarrier can be shared among more than one transmitter. The payoff of each player (transmitter) is defined as the achieved Shannon channel capacity. Each user $k\in \mathcal{K}$ is allowed to either spend a certain power ${\overline{p}}_{k}$ on only one chosen subcarrier, or equally divide it among both subcarriers. In the pure strategy a_{1}, player k transmits with the maximum power ${\overline{p}}_{k}$ on subcarrier n = 1 and does not transmit any information on subcarrier n = 2. The strategy a_{2} is the opposite of a_{1}, i.e., exclusively transmitting on subcarrier n = 2 with maximum power. Finally, strategy a_{3} equally divides the power between the two subcarriers and transmits on both tones. The terminal k achieves a channel capacity:
$${C}_{k}=\sum _{n\in \mathcal{N}}{C}_{kn},$$
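where C_{ kn } is the Shannon capacity achieved by user k on the n th subcarrier
$${C}_{kn}=\Delta f\phantom{\rule{0.3em}{0ex}}{\log }_{2}\left(1+\frac{|{H}_{kn}|^{2}{p}_{kn}}{{\sigma}_{\mathrm{w}}^{2}+\sum _{i\in \mathcal{K},\phantom{\rule{0.3em}{0ex}}i\ne k}|{H}_{in}|^{2}{p}_{in}}\right),$$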
wherein p_{kn} represents the power allocated by terminal k over the n th subcarrier and where the interference term $\sum _{i\in \mathcal{K},\phantom{\rule{0.3em}{0ex}}i\ne k}|{H}_{in}|^{2}{p}_{in}$ is approximated with a Gaussian random variable of equal mean and variance. Choosing the strategy a_{1} means selecting ${p}_{k1}={\overline{p}}_{k}$ and p_{k 2} = 0. For the strategy a_{2}, p_{k 1} = 0 and ${p}_{k2}={\overline{p}}_{k}$, and for strategy a_{3}, ${p}_{k1}\phantom{\rule{0.3em}{0ex}}=\phantom{\rule{0.3em}{0ex}}{p}_{k2}\phantom{\rule{0.3em}{0ex}}=\phantom{\rule{0.3em}{0ex}}\frac{{\overline{p}}_{k}}{2}$. The parameter ${\sigma}_{\mathrm{w}}^{2}$ is the power of the additive white Gaussian noise (AWGN). Note that in an OFDMA system, there is no interference between adjacent subcarriers. Hence, C_{kn} considers only the intra-subcarrier interference that occurs when the same subcarrier is shared by more than one terminal.
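As a rough sketch of how one payoff entry of such a game could be evaluated, the code below assumes the usual Shannon form, C_kn = Δf log2(1 + SINR), with the other terminals' received power on the same subcarrier treated as additional noise. The power, noise, and spacing values are the simulation parameters quoted in the text; the channel gains in `HYPOTHETICAL_GAINS` are made up for illustration and do not come from the channel model used for Figure 4.

```python
import math

DELTA_F = 10e6 / 1024   # subcarrier spacing in Hz (10/1024 MHz)
P_MAX = 10e-3           # maximum power per terminal in W (10 mW)
NOISE = 100e-12         # AWGN power per subcarrier in W (100 pW)

# Fraction of the power budget spent on subcarriers 1 and 2 for each strategy.
STRATEGIES = {"a1": (1.0, 0.0), "a2": (0.0, 1.0), "a3": (0.5, 0.5)}

# Hypothetical power gains |H_kn|^2, one row per terminal, one column per subcarrier.
HYPOTHETICAL_GAINS = [
    [1e-7, 2e-8],
    [5e-8, 8e-8],
    [3e-8, 6e-8],
]


def capacities(profile, gains):
    """Capacity in bit/s of every terminal for a pure strategy profile."""
    powers = [[P_MAX * share for share in STRATEGIES[s]] for s in profile]
    result = []
    for k in range(len(profile)):
        total = 0.0
        for n in range(2):
            # Power received from the other terminals on the same subcarrier.
            interference = sum(
                gains[i][n] * powers[i][n] for i in range(len(profile)) if i != k
            )
            sinr = gains[k][n] * powers[k][n] / (NOISE + interference)
            total += DELTA_F * math.log2(1.0 + sinr)
        result.append(total)
    return result


caps = capacities(("a1", "a2", "a2"), HYPOTHETICAL_GAINS)
print([round(c / 1e3, 1) for c in caps])  # capacities in kb/s
```

Looping `capacities` over all 27 pure strategy profiles would produce a payoff table with the same layout as the strategic form described for Figure 4, for whatever gains are supplied.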
Figure 4 reports the simulation results obtained after 100 random realizations of a network with terminals distributed at a distance between 3 m and 50 m from the base station. In the pure strategy matrix form of Figure 4, player k_{1} chooses the row, player k_{2} chooses the column, and player k_{3} chooses the matrix. Each payoff reports the (rounded) value of the achieved Shannon channel capacity in kb/s. We consider the following parameters for our simulations: the maximum power of each terminal k is ${\overline{p}}_{k}=10\phantom{\rule{1em}{0ex}}\text{mW}$; the power of the ambient AWGN noise on each subcarrier is ${\sigma}_{\mathrm{w}}^{2}=100\phantom{\rule{1em}{0ex}}\text{pW}$, and finally the carrier spacing is $\Delta \phantom{\rule{0.3em}{0ex}}f=\frac{10}{1024}\phantom{\rule{1em}{0ex}}\text{MHz}$. ^{c} The path coefficients $|{H}_{kn}|^{2}$, corresponding to the frequency response of the multipath wireless channel, are computed using the 24-tap ITU modified vehicular-B channel model adopted by the IEEE 802.16m standard [53].
It is easy to show that the (pure) Nash equilibrium strategies of Figure 4 are (a_{3},a_{3},a_{3}) equivalent to $\psi =\left[{\mathcal{A}}_{{a}_{1}}=\varnothing ,{\mathcal{A}}_{{a}_{2}}=\varnothing ,{\mathcal{A}}_{{a}_{3}}=\mathcal{K}\phantom{\rule{0.3em}{0ex}}\right]$ and (a_{1},a_{2},a_{2}) to $\psi =\left[{\mathcal{A}}_{{a}_{1}}=\left\{\phantom{\rule{0.3em}{0ex}}{k}_{1}\phantom{\rule{0.3em}{0ex}}\right\},{\mathcal{A}}_{{a}_{2}}=\{\phantom{\rule{0.3em}{0ex}}{k}_{2},{k}_{3}\phantom{\rule{0.3em}{0ex}}\},{\mathcal{A}}_{{a}_{3}}=\varnothing \right]$. The Nash equilibrium strategy (a_{3},a_{3},a_{3}) is neither coalition-proof nor strong. With the deviation of the coalition ${\mathcal{A}}_{{a}_{3}}$ to the strategy profile (a_{2},a_{1},a_{3}), all players profit more with payoff distribution [ 13, 9, 11 ]. This deviation is not viable, however, since player k_{1} then has an incentive to move to the strategy profile (a_{3},a_{1},a_{3}). This transition is not favorable for players k_{2} and k_{3}. The player k_{2} is tempted to move to the Nash equilibrium point to earn a higher payoff. In contrast, the Nash equilibrium strategy profile (a_{1},a_{2},a_{2}) with payoff vector [ 15, 10, 10 ] is a strong and coalition-proof Nash equilibrium. This is due to the fact that in $\psi =\left[{\mathcal{A}}_{{a}_{1}}=\left\{\phantom{\rule{0.3em}{0ex}}{k}_{1}\phantom{\rule{0.3em}{0ex}}\right\},\phantom{\rule{0.3em}{0ex}}{\mathcal{A}}_{{a}_{2}}=\{\phantom{\rule{0.3em}{0ex}}{k}_{2},{k}_{3}\phantom{\rule{0.3em}{0ex}}\},\phantom{\rule{0.3em}{0ex}}{\mathcal{A}}_{{a}_{3}}=\varnothing \right]$, there is no deviation, and hence no self-enforcing deviation, that can improve the payoff distribution. As can be seen, all players prefer to stay at the coalition-proof Nash equilibrium rather than at the pure Nash equilibrium strategy (a_{3},a_{3},a_{3}).
Note that a strong or coalition-proof Nash equilibrium does not necessarily coincide with a Nash equilibrium strategy profile, and the result of Figure 4 is an exception.
In general, the existence of a pure cooperative or noncooperative Nash equilibrium for the subcarrier allocation game in an OFDMA network is not guaranteed. Different parameters lead to quite different channel capacities, and this may result in a matrix form without any type of Nash equilibrium. There might even exist a Nash equilibrium which is Pareto-dominated by another strategy profile. This shows that in OFDMA networks, an appropriate resource allocation technique is needed [9].
7 Coordinated equilibrium
The most common solution concept in (noncooperative) game theory, Nash equilibrium, assumes that players take mixed actions independently from each other. Cooperative games allow players to coordinate with each other to find possible equilibria and (joint) optimizations that the players can perform on their own. Unlike evolutionary games ([3], Ch. 3), in coordinated games, the interaction between players is implemented once among all players by a central authority to increase their throughput. The notion of correlated equilibrium was introduced by Aumann [54]. Correlated equilibria are defined in a context where there is an intermediator who sends random (private or public) signals to the players. An intermediator need not have any intelligence or knowledge of the game. These signals allow players to coordinate their actions and, in particular, to perform joint randomization over strategies. ‘Correlated strategies are familiar from cooperative game theory, but their applications in noncooperative games are less understood’, says Aumann [54]. This is because the players of a coordination game are not totally isolated, and without communication between them, achieving a coordinated strategy profile is not possible.
Let us start with an intuitive example. Consider the multiple access game ([45], Table three) described in Figure 5. The players k_{1} and k_{2} wish to send some packets to their receivers sharing a common resource, i.e., the wireless medium. They are within range of each other, and accordingly, they interfere if they transmit at the same time. The users have two possible pure strategies: access (A) and wait (W). In this game, two identical transmitters must simultaneously decide whether to access the channel or wait. The transmission of each packet has an energy cost of 0 < c ≪ 1. Each player earns a payoff 1 if it succeeds in transmitting its packet without collision with the other. Waiting brings neither cost nor reward for the player. Each player’s utility is its reward minus the cost. This game has three Nash equilibria: (A,W), (W,A), and a mixed strategy Nash equilibrium, where each player transmits with the probability 1 − c ([45], Sections 2.3 and 2.4). The utilities of the Nash equilibria strategies are (1 − c, 0), (0, 1 − c), and (0,0), respectively. It is clear that the mixed strategy is not resistant to an improving deviation. In the following, we give the possibility of preplay communication to achieve a stable Nash equilibrium.
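The mixed equilibrium probability follows from an indifference argument, sketched below. The sketch assumes, consistently with the description above, that a collision leaves each transmitting player with payoff −c (cost paid, no reward) and that waiting always yields 0; the value of `c` is an arbitrary illustrative choice.

```python
from fractions import Fraction


def payoff_access(q, c):
    """Expected payoff of accessing when the opponent accesses with probability q."""
    return (1 - q) * (1 - c) + q * (-c)


def payoff_wait(q, c):
    """Waiting brings neither cost nor reward, whatever the opponent does."""
    return Fraction(0)


c = Fraction(1, 10)  # hypothetical transmission cost, 0 < c << 1

# Indifference between A and W: (1 - q)(1 - c) - q c = 0, hence q = 1 - c.
q_star = 1 - c
expected_payoff = q_star * payoff_access(q_star, c) + (1 - q_star) * payoff_wait(q_star, c)
print(q_star, expected_payoff)
```

At q = 1 − c both actions give the same expected payoff, so any mixture is a best response, and the resulting expected utility of each player is 0.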
In the game with ‘cheap conversation’, each player simultaneously and publicly announces whether it decides to access or wait. Following the announcements, each player makes its choice. Suppose the players agree to play the game under the following binding agreement: each player announces A with probability $\frac{3}{4}$. If the profile of announcements is either (A,W) or (W,A), then each player plays its own announcement. Otherwise, each player plays A with probability $\frac{1}{2}$. Note that no further communication is possible. The use of a joint deviation requires the unanimity of all members of the deviating coalition. A player agrees to be a part of a joint deviation if, given its own information, the deviation is profitable. Thus, if a joint deviation is used, it is common knowledge that each deviator believes that the deviation is profitable.
This tradeoff results in an expected payoff for each player of $\frac{11-16c}{32}>0$, while in the mixed Nash equilibrium of the original game, each player has an expected payoff of 0. In this coordinated Nash equilibrium of the game, the players effectively play the correlated strategy [54],[55] (of the original game) given in Figure 6, in order to obtain a higher utility in strategy profiles (A,W) and (W,A). It is important to note that this joint probability distribution is not the product of its marginal distributions and therefore cannot be achieved from a mixed strategy profile of the game without correlation among players.
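The expected payoff can be verified by enumerating the announcement and action stages of the agreement. The sketch assumes the payoffs described for the multiple access game, with a collision costing each transmitter c; the function names and the numerical cost are illustrative.

```python
from fractions import Fraction
from itertools import product


def stage_payoff(own, other, c):
    """Payoff of a player choosing `own` against `other` in the access game."""
    if own == "W":
        return Fraction(0)
    return (1 - c) if other == "W" else -c


def outcome_distribution(p_announce=Fraction(3, 4), p_fallback=Fraction(1, 2)):
    """Joint distribution over action profiles induced by the agreement."""
    dist = {profile: Fraction(0) for profile in product("AW", repeat=2)}
    for m1, m2 in product("AW", repeat=2):
        p_msg = (p_announce if m1 == "A" else 1 - p_announce) * (
            p_announce if m2 == "A" else 1 - p_announce
        )
        if m1 != m2:
            # Announcements differ: each player plays its own announcement.
            dist[(m1, m2)] += p_msg
        else:
            # Announcements coincide: both randomize independently.
            for a1, a2 in product("AW", repeat=2):
                p_act = (p_fallback if a1 == "A" else 1 - p_fallback) * (
                    p_fallback if a2 == "A" else 1 - p_fallback
                )
                dist[(a1, a2)] += p_msg * p_act
    return dist


c = Fraction(1, 10)  # hypothetical transmission cost
dist = outcome_distribution()
expected = sum(p * stage_payoff(a1, a2, c) for (a1, a2), p in dist.items())
marginal_access = dist[("A", "A")] + dist[("A", "W")]
print(dist, expected)
```

The enumeration gives probability 11/32 to each of (A,W) and (W,A) and 5/32 to each of (A,A) and (W,W), so the expected payoff is (11 − 16c)/32. Each player accesses with marginal probability 1/2, and since 11/32 differs from 1/4 the joint distribution is indeed not the product of its marginals.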
As can be seen, the proposed correlated deviation from the mixed strategy equilibrium makes both players better off. Note that the players are allowed to bind an agreement only on the space of feasible outcomes. In the correlated multiple access game, the outcome is feasible since the correlated results are in the range between the smallest and highest possible payoff. In fact, the set of correlated equilibria contains those equilibria from which no coalition has a self-enforcing deviation, making all members better off.
Let us describe a more complicated correlated equilibrium. We study the near-far effect game established by Bacci et al. ([56], Figure six). The basic idea of the near-far effect game is depicted in Figure 7. Two wireless terminals k_{1} and k_{2} are placed close to and far from a certain access point (AP), respectively, in a code division multiple access (CDMA) network in the high SINR regime. The strategy of each player is to transmit either with the maximum power $\overline{p}$ or with a weakened level $\eta \overline{p}$, where 0 < η < 1. Due to the interference at the AP, the throughput (the amount of delivered information) of each player depends on the strategies chosen by both players. Transmitting with a higher power increases the BER, which decreases the throughput. Each player is rewarded r if it successfully delivers its packet and a reduced δ r if it delivers a corrupted version of the packet, where 0 < η < δ < 1. If the near player k_{1} decides to transmit with the power $\overline{p}$, the farther player k_{2} will not be able to deliver any information to the AP.
This results in no benefit for k_{2} and causes a power consumption cost equal to η c if k_{2} chooses strategy $\eta \overline{p}$ and c otherwise, where c ≪ r. Obviously, transmitting with power $\overline{p}$ for k_{1} results in a complete information delivery. This yields a payoff equal to the reward minus the power consumption cost, i.e., r − c, irrespective of the strategy of k_{2}. The packets of player k_{2} are successfully delivered if it chooses the maximum power $\overline{p}$ and player k_{1} chooses the reduced power $\eta \overline{p}$. On the other hand, if both players decide to transmit with reduced power $\eta \overline{p}$, the near player takes the payoff δ r − η c > 0, while the farther player k_{2} will not successfully deliver any packet and suffers only a power cost η c.
The payoff matrix of the near-far effect game is depicted in Figure 8. As can be seen, the unique pure-strategy Nash equilibrium of this game is the strategy profile $\left(\overline{p},\eta \overline{p}\right)$ with payoffs r − c and −η c for k_{1} and k_{2}, respectively. This means that at the Nash equilibrium point, the farther player is not able to send any information. On the other hand, the Pareto optimal solutions of the game are the strategies $\left(\overline{p},\eta \overline{p}\right)$ and $\left(\eta \overline{p},\overline{p}\right)$. The Nash equilibrium is an unsatisfactory outcome for the far player k_{2}, while the near player k_{1} takes the highest possible payoff. Now, let us find the mixed strategy of the game. We denote by α_{1} the probability with which the near player k_{1} decides to transmit with the maximum power $\overline{p}$ and by α_{2} the same probability for the far player k_{2}. The payoffs of the players k_{1} and k_{2} are represented by
Both players want to maximize their own payoff. As can be seen, ${x}_{{k}_{1}}$ takes its maximum value r − c with α_{1} = 1. On the other hand, with α_{1} = 1, the far player k_{2} earns a negative payoff for any α_{2} ∈ [0,1]. Instead, with α_{1} = 0, the near player k_{1} gains δ r − η c, and player k_{2}, by setting α_{2} = 1, achieves the payoff δ r − c. Thus, the best values for α_{1} and α_{2} are 0 and 1, respectively. The conclusion is that the mixed strategy is equivalent to the pure strategy $\left(\eta \overline{p},\overline{p}\right)$ with payoff x = [δ r − η c, δ r − c]. In this game, there is no totally mixed strategy: the mixed strategy coincides with one of the pure Pareto optimal points.
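As a sketch, the expected payoffs implied by the verbal description of the game can be written out explicitly. The matrix entries used here are assumptions read off the text, not a copy of Figure 8: $(r-c,\,-c)$ for $(\overline{p},\overline{p})$, $(r-c,\,-\eta c)$ for $(\overline{p},\eta\overline{p})$, $(\delta r-\eta c,\,\delta r-c)$ for $(\eta\overline{p},\overline{p})$, and $(\delta r-\eta c,\,-\eta c)$ for $(\eta\overline{p},\eta\overline{p})$.

```latex
x_{k_1} = \alpha_1\,(r - c) + (1-\alpha_1)\,(\delta r - \eta c)

x_{k_2} = -\eta c \;-\; \alpha_2\,(1-\eta)\,c \;+\; (1-\alpha_1)\,\alpha_2\,\delta r
```

Under these assumed entries, setting α_{1} = 1 gives ${x}_{{k}_{2}} = -\eta c - \alpha_2 (1-\eta) c < 0$ for every α_{2}, and setting α_{1} = 0, α_{2} = 1 gives ${x}_{{k}_{2}} = \delta r - c$, in agreement with the values quoted in the text.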
The near player earns the highest possible payoff at the Nash equilibrium; hence, it has no incentive to leave this strategy profile. The highest possible payoff for the far player is, on the contrary, δ r − c. We show that an appropriate agreement among the players can satisfy both of them at a correlated equilibrium. Players k_{1} and k_{2} can guarantee an expected payoff of x = [r − c, δ r − c] by playing the correlated strategy profile:
This is a plausible outcome since both players earn their own highest possible payoff. The correlated strategy (23) is derived from the fact that the near player k_{1} is indifferent to the choice of the real number κ in the expression $\kappa \cdot \left(\overline{p},\eta \overline{p}\right)+\left(1-\kappa \right)\cdot \left(\overline{p},\overline{p}\right)$, since it gets its own highest possible payoff r − c in either case. To satisfy the far player k_{2}, it is enough to solve the following equation for ${x}_{{k}_{2}}$:
Supposing $\kappa =\frac{\delta r}{\left(1-\eta \right)c}<1$, the correlated strategy (23) means that the near player always transmits at its highest power level $\overline{p}$, and the far player transmits at the reduced level $\eta \overline{p}$ with probability $\frac{\delta r}{\left(1-\eta \right)c}$ and at the maximum power $\overline{p}$ otherwise. In effect, the near and far players play the matrix-form game of Figure 9.
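The value of κ can be checked numerically. The following is a minimal sketch, assuming the payoff pairs $(r-c,\,-\eta c)$ for $(\overline{p},\eta\overline{p})$ and $(r-c,\,-c)$ for $(\overline{p},\overline{p})$ as described in the text; the function name and the numerical values are hypothetical and are chosen only so that κ < 1.

```python
def correlated_strategy_payoffs(r, c, delta, eta):
    """Return (kappa, x_near, x_far) for the correlated strategy
    kappa*(p_max, eta*p_max) + (1 - kappa)*(p_max, p_max)."""
    if not (0.0 < eta < delta < 1.0):
        raise ValueError("expected 0 < eta < delta < 1")
    kappa = delta * r / ((1.0 - eta) * c)
    if kappa >= 1.0:
        raise ValueError("kappa must be below 1 to be a probability")
    # Payoff pairs (near, far), assumed from the verbal description.
    far_reduced = (r - c, -eta * c)  # near at p_max, far at eta*p_max
    far_max = (r - c, -c)            # both players at p_max
    x_near = kappa * far_reduced[0] + (1.0 - kappa) * far_max[0]
    x_far = kappa * far_reduced[1] + (1.0 - kappa) * far_max[1]
    return kappa, x_near, x_far


# Hypothetical parameter values, for illustration only.
kappa, x_near, x_far = correlated_strategy_payoffs(r=10.0, c=1.0, delta=0.05, eta=0.02)
print(kappa, x_near, x_far)
```

With these inputs, the expected payoff of the near player equals r − c and that of the far player equals δ r − c, which is the payoff vector stated for the correlated strategy.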
Bonneau et al. [57] show that coordination among mobile users can significantly increase the performance of access to a common channel in an ALOHA setting. A coordination mechanism is also considered by Bonneau et al. [58] to achieve the optimal power allocation in a wireless network wherein each terminal knows only its own channel state. The concept of correlated equilibrium is also introduced in a multiuser interference channel context in [59]. Different types of coordination are discussed in depth and widely used in [55].
8 Dynamic learning
Until now, we have seen that the Nash equilibrium suffers from a lack of farsighted stability, i.e., the resulting outcomes can be unsatisfactory, so that any player can have an incentive to improve its outcome by moving to another strategy. The existence of the strong and coalition-proof Nash equilibria is not guaranteed, and even when they exist, finding such solutions is very complicated when the number of pure strategies is large. The challenge of finding a profitable accord among players persists in the coordinated equilibrium solution. In this section, the main question we seek to answer is: How can the players be led to a stable joint pure strategy that yields an acceptable payoff? This question is important even if multiple equilibrium points with the same payoff have been identified, since each player may autonomously decide to stay in a different strategy.
Dynamic learning [60] has been widely used to get rid of the anarchy derived from the conflicts between selfish decisions. Learning is a joint adaptive process through which agents converge to the best final response. The agents either have a common interest, as in team work, or each agent has its own greedy goal. Generally, there are three types of learning process: individual learning, joint-action learning, and stochastic learning. In an individual learning process, the independent agents cannot observe one another’s actions, i.e., for each player, the opponents are passive agents. Instead, during a joint-action learning process, the notion of ‘optimality’ is improved by adding the observation of the other concurrent learners to reach a stable optimal solution. The stochastic learning framework, which has the Markovian property and a stochastic inter-state transition rule, enables each player to observe the history of the opponents’ actions.
In the network engineering literature, van der Schaar and Fu [61] introduce a stochastic learning process among autonomous wireless agents for the optimization of dynamic spectrum access, given the QoS of multimedia applications. A reconfigurable multi-hop wireless network is studied by Shiang and van der Schaar [62], wherein a decentralized stochastic learning process optimizes the transmission decisions of nodes aimed at supporting mission-critical applications. In [63], Lin and van der Schaar propose a reinforcement learning scheme among the agents of a multi-hop wireless network based on a Markov decision process. Each terminal autonomously adjusts its transmission power in order to maximize the network utility in a dynamic delay-sensitive environment.
Here, we study a well-known individual reinforcement learning task, namely the so-called Q-learning [64]. We assume a set of players $\mathcal{K}$, where each player k has a finite set of individual actions A_{k}. Each agent k individually chooses a pure joint action (strategy) a_{k} = (a_{1},…,a_{K}) ∈ A_{1} × ⋯ × A_{K} to be performed from the available joint strategy space. Q-learning enables the individual learners to achieve optimal coordination from repeated trials. Q-learning introduces a certain value Q as the immediate reward obtained after having moved to the new strategy. Each player individually updates a Q value for each of its actions. At each time step, after the new joint action a_{k} has been selected, the value of ${Q}_{k}^{t}$ is individually updated. In particular, the value of ${Q}_{k}^{t+1}\left({\mathbf{a}}_{k}\right)$ estimates the utility of performing the joint strategy a_{k} for user k. In the seminal paper of Watkins and Dayan [64], the Q value is updated by the following recursion:
where δ_{k} ∈ (0,1) is a discount factor, and r_{k}(a_{k}) is the reward of the joint action a_{k} for the respective player; f_{k} is a function of t which is related to the ‘learning rate’. Watkins and Dayan showed that, given bounded rewards, a learning rate $0\le {f}_{k}^{t}<1$, and
all Q_{k} values updated by (25) converge to a common joint pure strategy with probability one. The reward r_{k} is defined by a learning policy, and it is not necessarily equal to the payoff defined by the game. The learning policy is greedy with respect to the Q value, i.e., the particular action a_{k} will be selected in the long run if it improves the Q value. Under these conditions, Q-learning is guaranteed to converge to an optimal and stable joint strategy regardless of the action selection policy. Q-learning is not applicable where the strategy space is continuous or the number of strategies is not finite. Claus and Boutilier [65] establish a simplified version of the Q recursion (25) which updates the Q value by the following recursion:
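For reference, the recursions discussed in this passage can be sketched in their standard textbook forms using the symbols of this section. The stateless (single-state) form, the placement of $f_k^t$ and $\delta_k$, and the use of $\delta_k$ as the step size of the simplified recursion are assumptions and may differ from the typeset expressions (25) to (27).

```latex
% Recursion of the Watkins and Dayan type, written for a stateless repeated game
Q_k^{t+1}(\mathbf{a}_k) = \bigl(1 - f_k^t\bigr)\, Q_k^{t}(\mathbf{a}_k)
  + f_k^t \Bigl[ r_k(\mathbf{a}_k) + \delta_k \max_{\mathbf{a}'} Q_k^{t}(\mathbf{a}') \Bigr]

% Standard step-size conditions for convergence with probability one
\sum_{t=0}^{\infty} f_k^t = \infty, \qquad \sum_{t=0}^{\infty} \bigl(f_k^t\bigr)^2 < \infty

% Simplified recursion in the style of Claus and Boutilier
Q_k^{t+1}(\mathbf{a}_k) = Q_k^{t}(\mathbf{a}_k)
  + \delta_k \bigl[ r_k(\mathbf{a}_k) - Q_k^{t}(\mathbf{a}_k) \bigr]
```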
For the sake of simplicity, we apply the Q recursion (27). In a multi-learner scenario, a major challenge of Q-learning is strategy selection. When the numbers of strategies and players are large, the number of time steps needed to reach an optimal joint action increases exponentially. It is fairly clear that the best approach is to start with ‘exploration’ of different strategies and then focus on ‘exploitation’ of the strategies with the best value of Q. Kaelbling et al. [66] recall the Boltzmann function as an efficient strategy selection rule that strikes a balance between exploration and exploitation. The Boltzmann function defines a probability distribution over the different joint actions. At each time step t + 1, every player individually selects the joint strategy a_{k} with the probability p(a_{k}):
Here, E_{k}(a_{k}) = (δ_{k})^{t} · r_{k}(a_{k}) is the discounted reward for taking action a_{k} by user k in time step t, and T is a temperature function which provides a randomness component to control the exploration and exploitation of the actions. In practice, the temperature function T decreases over time to reduce exploration and increase exploitation. High values of T flatten the distribution p(a_{k}), so that no single action dominates, and this encourages exploration, whereas a low T makes Q(a_{k}) more important and this encourages exploitation. At time t = 0, each player randomly chooses a strategy and assigns a random number to its own Q value. At time step t, after the function T has been updated, each concurrent agent’s experience consists of a sequence of stages [65]:

1. Computing p(a_{k}) for all ${\mathbf{a}}_{k}\in {\times}_{k\in \mathcal{K}}{\mathbf{A}}_{k}$.

2. Generating a random number ${\xi}_{k}^{t}$ uniformly distributed in [0,1], and then choosing the best joint strategy a_{k}, i.e., the one with the highest p(a_{k}) such that ${\xi}_{k}^{t}\ge p\left({\mathbf{a}}_{k}\right)$. If ${\xi}_{k}^{t}<p\left({\mathbf{a}}_{k}\right)$ for all ${\mathbf{a}}_{k}\in {\times}_{k\in \mathcal{K}}{\mathbf{A}}_{k}$, then the learner randomly picks a strategy.

3. Updating the ${Q}_{k}^{t}$ value according to (27). If ${Q}_{k}^{t}$ grows, then the learner moves to the selected joint strategy a_{k}; otherwise, it stays in the current joint action and does not update Q.
Despite the individual best strategy selection of the learners, this process reaches a common stable joint strategy such that all players stay there forever, i.e., no player deviates from the (common) achieved joint strategy.
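The interplay between Boltzmann selection and the simplified Q update can be sketched for a single learner. This is an illustrative simplification, not the procedure of this paper: it samples directly from the Boltzmann distribution over the current Q values (the common softmax rule) instead of applying the thresholding of stage 2, it uses fixed hypothetical rewards, and all function names and parameter values are assumptions.

```python
import math
import random


def boltzmann_probabilities(values, temperature):
    """Boltzmann (softmax) distribution over `values` at a given temperature."""
    peak = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - peak) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(values, temperature, rng):
    """Sample an action index according to the Boltzmann distribution."""
    probs = boltzmann_probabilities(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]


def q_update(q_old, reward, step_size):
    """Simplified recursion: move Q a fraction of the way toward the reward."""
    return q_old + step_size * (reward - q_old)


def learn(rewards, steps=2000, step_size=0.1, scale=50.0, decay=0.005, seed=0):
    """Run one learner against fixed (hypothetical) per-action rewards."""
    rng = random.Random(seed)
    q = [0.0] * len(rewards)
    for t in range(steps):
        temperature = scale * math.exp(-decay * t)  # decreasing temperature
        action = select_action(q, temperature, rng)
        q[action] = q_update(q[action], rewards[action], step_size)
    return q


q_values = learn([0.2, 1.0, 0.5])
best_action = max(range(len(q_values)), key=q_values.__getitem__)
print(q_values, best_action)
```

Early on, the high temperature makes the selection nearly uniform, so every action is tried; as the temperature decays, the probability mass concentrates on the action with the largest Q value.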
The theory of learning in games studies how and which equilibria might arise as a consequence of a long-run non-equilibrium process of learning. A natural question is: Can learning algorithms find a Nash equilibrium? The reason for asking this question lies in the hope of reaching Nash equilibria, as a plausible concept, via a reasonable learning algorithm, in particular when there are large numbers of players and strategies. At first glance, the stability of the dynamic learning approach addressed above is described as convergence to a pure joint strategy, whereas the existence of a pure Nash equilibrium is not guaranteed. In fact, a dynamic learning algorithm cannot in general guarantee reaching a noncooperative or cooperative Nash equilibrium. In the literature, there are some efforts to present dynamic learning algorithms that achieve a Nash equilibrium in dynamic and repeated games under particular constraints [67–70].
We now present some results about Q-learning in a CDMA network. In what follows, the experimental work highlights how the agents learn to increase their individual rewards by revealing their actions. As mentioned above, the strategy selection can significantly influence the number of time steps needed to converge. Choosing an appropriate temperature function is a heuristic search. In our experiment, we define T = q · e^{−mt} as our temperature function, wherein m controls the rate of exponential decay and q > 1 encourages the exploration of different strategies in the initial time steps.
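The temperature schedule can be written as a one-line function. The sketch below assumes a negative exponent, consistent with the description of m as a decay rate; the function name is hypothetical, and the default values q = 50 and m = 0.001 are the ones reported for the experiment in this section.

```python
import math


def temperature(t, q=50.0, m=0.001):
    """Exponentially decaying temperature T(t) = q * exp(-m * t)."""
    if q <= 1.0 or m <= 0.0:
        raise ValueError("expected q > 1 and m > 0")
    return q * math.exp(-m * t)


# The temperature halves every ln(2) / m time steps.
half_life = math.log(2.0) / 0.001
print(temperature(0), temperature(1000), half_life)
```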
We illustrate the behavior of mobile terminals as Q-learners in a CDMA network. Our example is a power control problem in a CDMA network applying Q-learning and the Boltzmann function. Assume a CDMA network with K mobile terminals denoted by the set $\mathcal{K}$. The players wish to transmit data to a certain AP. The strategy set of every player is a set of discrete power levels denoted by A = A_{k} = [Δ p, 2·Δ p, …, M·Δ p], where Δ p is the power step and M > 1 is an integer. Each user has M actions to choose from, and accordingly, the matrix game is defined on ${\times}_{k\in \mathcal{K}}\mathbf{A}$, which consists of M^{K} joint strategies. The Shannon capacity between player k and the AP is
with N_{s}, |H_{k}|^{2}, and ${\sigma}_{\mathrm{w}}^{2}$ denoting the spreading factor (common to all players), the path gain of user k, and the AWGN power, respectively, and where p_{k} ∈ A denotes the transmit power of user k.
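A capacity of this kind can be evaluated as follows. The sketch assumes the usual matched-filter CDMA expression, in which the SINR of user k is the spreading factor times its received power divided by the sum of the other users' received powers plus the noise power; this specific form, the function name, and the sample values are assumptions rather than the exact expression of this paper.

```python
import math


def shannon_capacity(k, powers, gains, spreading_factor, noise_power):
    """Capacity (b/s/Hz) of user k under an assumed CDMA SINR model.

    powers[j] is the transmit power of user j (W) and gains[j] its
    path gain |H_j|^2; both are plain sequences of floats.
    """
    signal = gains[k] * powers[k]
    interference = sum(
        g * p for j, (g, p) in enumerate(zip(gains, powers)) if j != k
    )
    sinr = spreading_factor * signal / (interference + noise_power)
    return math.log2(1.0 + sinr)


# Hypothetical two-user example: equal gains, 100 mW each, 1 nW noise.
gains = [1e-8, 1e-8]
c_alone = shannon_capacity(0, [0.1], [1e-8], 64, 1e-9)
c_shared = shannon_capacity(0, [0.1, 0.1], gains, 64, 1e-9)
c_jammed = shannon_capacity(0, [0.1, 0.5], gains, 64, 1e-9)
print(c_alone, c_shared, c_jammed)
```

In this model, raising the power of another user lowers the capacity of user k, which is the source of the conflict of interest among the learners.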
We introduce an individual task in which each player must individually choose the joint strategy at which it achieves the best Shannon channel capacity. We simulate a learning process with K = 8 players, such that each player k must choose the best p_{k} among M = 5 strategies. The power step is assumed to be Δ p = 100 mW, the AWGN power ${\sigma}_{\mathrm{w}}^{2}=1\text{ nW}$, and the spreading factor N_{s} = 64. The players are uniformly located at a distance between 3 and 50 m from the AP. The matrix form of this game is composed of 5^{8} = 390,625 joint strategies, and there may exist different power combinations (joint strategies) which achieve the same Shannon channel capacities. Q-learning leads the players to a joint strategy (p_{1},…,p_{8}) ∈ A^{8} in which all players are satisfied with the achieved Shannon channel capacities. In the Q function of (27), for all players, the discount factor parameter is fixed to δ_{k} = 0.09, and the payoff function r_{k} is defined as
Our experiments with different parameters show that good values of the temperature function parameters are m = 0.001 and q = 50, and we start with ${Q}_{k}^{t=0}=0$. Clearly, the existence of a strategy in which all r_{k}(a_{k}) take their maximal value is not guaranteed, since there is a strong conflict of interest between the players in choosing different strategies.
Figure 10 reports the behavior of the achievable rate (reward) C_{k} of the K = 8 terminals as a function of the time step t in our scenario. The figure exhibits the convergence of all learners to a stable joint strategy after six time steps. Numerical results over 500 random realizations of the network show the convergence of all players to a stable joint strategy after (on average) six steps of the iterative Q-learning algorithm, wherein each joint action is probabilistically chosen according to the Boltzmann distribution. Furthermore, it is experimentally observed that the sum of the achieved Shannon channel capacities is (on average) 22.4 b/s/Hz, which is 94% of the maximum possible value of $\sum _{k\in \mathcal{K}}{C}_{k}$.
9 Discussion
Cooperation can be seen as the action of obtaining some advantage by giving, sharing, or allowing something. In this contribution, we aimed at mapping different coalitional game approaches onto communications and networking systems. A very important boundary condition for cooperation is that each participating entity gains more by cooperating than it would by operating alone. It is not important that all entities contribute the same effort, gain the same amount, or even have the same gain-to-cost ratio, but cooperation should bring an advantage or gain to each cooperating entity. One different form of cooperation is altruism, a strategy wherein one of the players may sacrifice itself and does not gain from the cooperation in order to support others. In networking, for instance, one terminal sacrifices battery power and bandwidth to act as a relay for other terminals and to increase the throughput of the whole system. In some communication systems, the network protocols themselves can be seen as an implicit cooperation to achieve better performance, e.g., the ALOHA system. In others, network entities establish a cooperation with each other to achieve better performance, e.g., relay communications.
Cooperative game theory is a branch of game theory which aims at studying cooperation among individual and rational participants. Unlike noncooperative game approaches, cooperative game concepts are centralized, and they need a central authority for the exchange of information and the policy-making process. The most challenging part of a cooperative game theoretic framework is the choice of the characteristic function, since it captures the agents’ perceptions of gain and satisfaction.
The fundamental question in coalitional game theory is how to allocate the total gain generated by the collective of all players among the individual players in the game. The distribution of payoff is described as a binding contract between the players, and various criteria have been developed. The problem of gain distribution is approached with the aid of solution concepts in coalitional game theory such as the core, the Shapley value, the kernel, and the nucleolus. The core is the most classic solution, whose result is stable against deviations of coalitions. The core solution is useful where the negotiation process is centralized and no subset of players can selfishly and privately negotiate with each other. The core can be empty. The Shapley value is the unique single-valued solution which explores fairness in every possible prospective coalition formation. The kernel should be understood as the set of all efficient allocations for which no pair of players wants to exchange payoff. The nucleolus selects the unique imputation that successively (lexicographically) minimizes the maximal excesses. This defining property makes the nucleolus appealing as a fair single-valued solution. The kernel of a game always contains the nucleolus. Computing the kernel and nucleolus of arbitrary transferable utility games is hard.
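For small games, the Shapley value and a core membership test can be computed by brute force. The sketch below assumes a transferable utility game given as a characteristic function on coalitions; the three-player game and all names in it are hypothetical illustrations, and the enumeration over all orderings is practical only for a handful of players.

```python
from itertools import combinations, permutations


def shapley_value(players, value):
    """Average marginal contribution of each player over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            extended = coalition | {p}
            totals[p] += value(extended) - value(coalition)
            coalition = extended
    return {p: totals[p] / len(orders) for p in players}


def in_core(players, value, allocation, tol=1e-9):
    """Check efficiency and that no coalition can gain by deviating."""
    grand = frozenset(players)
    if abs(sum(allocation[p] for p in players) - value(grand)) > tol:
        return False
    for size in range(1, len(players)):
        for subset in combinations(players, size):
            if sum(allocation[p] for p in subset) < value(frozenset(subset)) - tol:
                return False
    return True


# Hypothetical three-player game (coalition worths chosen for illustration).
worth = {
    frozenset({1, 2}): 4.0,
    frozenset({1, 3}): 3.0,
    frozenset({2, 3}): 2.0,
    frozenset({1, 2, 3}): 6.0,
}


def v(coalition):
    return worth.get(frozenset(coalition), 0.0)  # singletons and empty set: 0


phi = shapley_value([1, 2, 3], v)
print(phi, in_core([1, 2, 3], v, phi))
```

In this example, the Shapley allocation is efficient and also lies in the core, although in general the Shapley value need not belong to the core.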
The most fundamental solution concept for noncooperative games is the Nash equilibrium. In a Nash equilibrium, no agent is motivated to deviate from its strategy given that the others do not deviate. If every player individually agrees on a certain profile of strategies without a binding agreement, then these strategies constitute a Nash equilibrium. The Nash equilibrium does not account for the possibility that groups of agents (coalitions) can change their strategies in a coordinated manner. A strategy profile is a strong Nash equilibrium if no subgroup of agents is motivated to change their strategies given that the others do not change. Often, the strong Nash equilibrium is too strong a solution concept, since in many games no such equilibrium exists. The coalition-proof Nash equilibrium has been suggested as a partial remedy to this problem. This solution concept requires that there is no subgroup that can make a mutually beneficial deviation (keeping the strategies of nonmembers fixed) in a way that the deviation itself is stable according to the same criterion. These solution concepts, which allow coalitions to make agreements simultaneously, typically suffer from incompatibility of agreements, which can give rise to empty solution sets in games of networking interest. Mixed (vs. pure) strong and coalition-proof Nash equilibria have not been introduced.
In a game with a huge number of agents and strategies, finding a pure cooperative/noncooperative Nash equilibrium is hard and may even be impossible. A learning process leads participants to a common joint action with an acceptable payoff. During a learning process, agents act as independent learners, i.e., they only get information about their own action choice and payoff. As such, they neglect the presence of the other agents. The learning process happens at regular time steps and is basically a signal for the agents to start an exploration phase. During each exploration phase, some agents exclude their current best action so as to give the team the opportunity to look for a possibly better joint action. This technique of reducing the action space by exclusions was only recently introduced for finding periodical policies in games of conflicting interests. There are two problems in the process of learning an optimal cooperative pursuit strategy for multiple agents. One is the probability of cycling among the actions chosen by the agents, which prevents the learning process from converging; the other is that there are many conflicts among the actions chosen by the agents, which makes the learned pursuit strategy suboptimal. Q-learning with the Boltzmann action-selection strategy guarantees the convergence of multiple agents to a common and optimal joint strategy after a few time steps.
10 Conclusion
This paper has provided a unified reference for network engineers investigating the applicability of coalitional game theory to practical problems. Different approaches such as the core, the Shapley value, the kernel, and the nucleolus were shown to provide a strong foundation for finding possible and stable resource/cost sharing arrangements. The results confirm the apparent analogy between the definitions of Nash equilibrium in noncooperative and coalitional game theory: both strong and coalition-proof Nash equilibria reflect the unprofitability of coalition deviations rather than of an individual player's deviation. In a network wherein informational exchange is possible, either through a central controller or among the players themselves, the concept of coordinated equilibrium arises. The results of intuitive examples show a significant improvement in coordinated equilibrium when compared with noncooperative schemes. When the number of agents or strategies is large, the ability to jointly reach a consensus through environmental learning guarantees convergence to the best joint action.
Endnotes
^{a} Two is cooperation, three is a crowd.
^{b} The lexicographic order between two vectors x and y is defined by x≼_{ lex }y if there exists an index k, such that x [ l] = y [ l] for all l < k, and x [ k] < y [ k].
^{c} This is the carrier spacing of each subcarrier at a base station with 10 MHz bandwidth and 1024 subcarriers.
References
 1.
McHenry MA, McCloskey D: Spectrum occupancy measurements. (Shared Spectrum Company, 2005). http://www.sharedspectrum.com/wp-content/uploads/NSF_Chicago_2005-11_measurements_v12.pdf. Accessed 13 July 2013
 2.
HyunHo C, Jong Bu L, Hyosun H, Kyunghun J: Optimal handover decision algorithm for throughput enhancement in cooperative cellular networks. Paper presented at the 2010 IEEE 72nd vehicular technology conference fall (VTC 2010fall). Ottawa, Ontario, Canada, 6–9 September 2010
 3.
Osborne MJ, Rubinstein A: A Course in Game Theory. MIT Press, Cambridge; 1994.
 4.
Peleg B, Sudhölter P: Introduction to the Theory of Cooperative Games. Springer, Berlin; 2007.
 5.
Saad W, Han Z, Debbah M, Hjørungnes A, Basar T: Coalitional game theory for communication networks: A tutorial. IEEE Signal Process. Mag 2009, 26(5):7797.
 6.
Hossain E, Kim DI, Bhargava VK: Cooperative Cellular Wireless Networks. Cambridge University Press, New York; 2011.
 7.
Hew SL, White L: Cooperative resource allocation games in shared networks: Symmetric and asymmetric fair bargaining models. IEEE Trans. Wireless Commun 2008, 7(11):41664175.
 8.
Chee TK, Lim CC, Choi J: A cooperative game theoretic framework for resource allocation in OFDMA systems. Paper presented at IEEE international conference on communication systems (ICCS), Singapore, 30 October–1, November 2006
 9.
Shams F, Bacci G, Luise M: An OFDMA resource allocation algorithm based on coalitional games. EURASIP J. Wireless Commun. Netw 2011, 2011(1):46. 10.1186/16871499201146
 10.
Kwon H, Lee GB: Cooperative power allocation for broadcast/multicast services in cellular OFDM systems. IEEE Trans. Commun 2009, 57(10):30923102.
 11.
Zeydan E, Kivanc D, Tureli U, Comaniciu C: Joint iterative beamforming power adaptation for MIMO ad hoc networks. EURASIP J. Wireless Commun. Netw 2011, 2011(1):79. 10.1186/16871499201179
 12.
Li D, Xu Y, Wang X, Guizani M: Coalitional game theoretic approach for secondary spectrum access in cooperative cognitive radio networks. IEEE Trans. Wireless Commun 2011, 10(3):844856.
 13.
Khan Z, Glisic S, DaSilva L, Lehtomȧndki J: Modeling the dynamics of coalition formation games for cooperative spectrum sharing in an interference channel. IEEE Trans. Comput. Intell. AI Games 2011, 3(1):1730.
 14.
Stanojev I, Simeone O, Spagnolini U, BarNess Y, Pickholtz R: Cooperative ARQ via auctionbased spectrum leasing. IEEE Trans. Commun 2010, 58(6):18431856.
 15.
Javadi F, Kibria M, Jamalipour A: Bilateral Shapley value based cooperative gateway selection in congested wireless mesh networks. Paper presented at IEEE global telecommunications conference (GLOBECOM), New Orleans, LO, USA, 30 November–4 December 2008
 16.
Huang J, Han Z, Chiang M, Poor H: Auctionbased resource allocation for multirelay asynchronous cooperative networks. Paper presented at IEEE international conference on acoustics, speech, and signal processing (ICASSP), Las Vegas, NV, USA, 31 March–4 April 2008
 17.
Deng H, Wang Y, Lu J: Auction based resource allocation for balancing efficiency and fairness in OFDMA relay networks with service differentiation. Paper presented at IEEE 72nd vehicular technology conference (VTC), Ottawa, Canada, 6–9 September 2010
 18.
Rodoplu V, Meng T: Core capacity region of energylimited, delaytolerant wireless networks. IEEE Trans. Wireless Commun 2007, 6(5):18441853.
 19.
Shapley LS: Cores of convex games. Int. J. Game Theory 1971, 1: 1126. 10.1007/BF01753431
 20.
ParandehGheibi A, Eryilmaz A, Ozdaglar A, Medard M: Resource allocation in multiple access channels. Paper presented at conference on signals, systems and computers (ACSSC), Pacific Grove, CA, USA, 4–7 November 2007
 21.
Gillies DB: Some Theorems on NPerson Games, PhD Thesis. Department of Mathematics, Princeton University; 1953.
 22.
Madiman MM: Cores of cooperative games in information theory. EURASIP J Wireless Commun. Netw 2008, 2008: 318704. 10.1155/2008/318704
 23.
Li D, Xu Y, Liu J, Wang X: Relay assignment cooperation maintenance in wireless networks. Paper presented at IEEE wireless communication networking conference (WCNC), Sydney, Australia, 18–21 April 2010
 24.
Niyato D, Hossain E: A cooperative game framework for bandwidth allocation in 4G heterogeneous wireless networks. ICTMICC 2006, 9: 43574362.
 25.
Sandholm TW, Lesser VR: Coalitions among computationally bounded agents. Artif. Intell 1997, 94: 99137. 10.1016/S00043702(97)000301
 26.
Bondareva O: Some applications of the methods of linear programming to the theory of cooperative games. Problemy Kibernetiki 1963, 10: 119139. (Russian)
 27.
Shapley LS: On balanced sets and cores. Naval Res. Logistics Q 1967, 14(4):453560. 10.1002/nav.3800140404
 28.
Roth AE: The Shapley Value. Essays in Honor of Lloyd S. Shapley. Cambridge University Press, UK; 1988.
 29.
Harsanyi JC: An equilibriumpoint interpretation of stable sets and a proposed alternative definition. Manage. Sci 1974, 20(11):14721495. 10.1287/mnsc.20.11.1472
 30.
Rafels C, Tijs S: On the cores of cooperative games and the stability of the Weber set. In. J. Game Theory 1997, 26: 491499. 10.1007/BF01813887
 31.
Shapley LS: A value for nperson games. Contribution to the theory of games. Annals Math. Studies 1953, 2: 28.
 32.
Hart S: A comparison of nontransferable utility values. Center for Rationality and Interactive Decision Theory. Hebrew University, Jerusalem. Discussion Paper Series, 2003
 33.
Otten GJ, Peters HJ: The Shapley Transfer Procedure for NTUGames. Maastricht University, Netherlands; 2002.
 34.
Young HP: Monotonic solutions of cooperative games. Int. J. Game Theory 1985, 14: 6572. 10.1007/BF01769885
 35.
Kim S: Cooperative game theoretic online routing scheme for wireless network managements. Commun. IET 2010, 4(17):20742083. 10.1049/ietcom.2009.0686
 36.
Khouzani M, Sarkar S: Economy of spectrum access in time varying multichannel networks. IEEE Trans. Mobile Comput 2010, 9(10):13611376.
 37.
Park H, van der Schaar M: Coalitionbased resource negotiation for multimedia applications in informationally decentralized networks. IEEE Trans. Multimedia 2009, 11(4):765779.
 38.
Pechersky S: On proportional excess for NTU games. European University at St. Petersburg, Department of Economics, Tech. Rep. Ec02/01 (2001)
 39.
Driessen TSH: A note on the inclusion of the kernel in the core of the bilateral assignment game. Int. J. Game Theory 1998, 27(2):301303. 10.1007/s001820050073
 40.
Han Z, Poor HV: Coalition games with cooperative transmission: A cure for the curse of boundary nodes in selfish packetforwarding wireless networks. IEEE Trans. Commun 2009, 57(1):203213.
 41.
Maschler M: The bargaining set, kernel, and nucleolus. In Handbook of Game Theory with Economic Applications. Edited by: Aumann R, Hart S. Elsevier New York; 1992:591667.
 42.
Demange G, Wooders M: Group Formation in Economics: Networks, Clubs, and Coalitions. Cambridge University Press, UK; 2005.
 43.
Aumann RJ: Acceptable points in general cooperative nperson games. In Annals of Mathematics Studies, 40, in Contributions to the Theory of Games. Edited by: Princeton University, Princeton University . Princeton University Press, NJ; 1959:287324.
 44.
Bernheim BD, Peleg B, Whinston MD: Coalitionproof Nash equilibria I. Concepts. J Econ. Theory 1987, 42(1):112. 10.1016/00220531(87)900998
 45.
Félegyházi M, Hubaux JP: Game theory in wireless networks: A tutorial. ACM Comput. Surveys 2006, Technical Report: LCAREPORT2006002, EPFL
 46.
Jackson M, Nouweland VD: Strongly stable networks. Games Econ. Behavior 2005, 51(2):420444. 10.1016/j.geb.2004.08.004
 47.
Zhong S, Wu F: A collusionresistant routing scheme for noncooperative wireless ad hoc networks. IEEE/ACM Trans. Netw 2010, 18(2):582595.
 48.
Altman E, Basar T, Menache I, Tembine H: A dynamic random access game with energy constraints. Paper presented at the international symposium on modeling and optimization in mobile, ad hoc, and wireless networks (WiOpt), Seoul, South Korea, 23–27 June 2009
 49.
Moldovanu B: Coalitionproof Nash equilibria and the core in threeplayer games. Games Econ. Behavior 1992, 4(4):565581. 10.1016/08998256(92)90037S
 50.
Konishi H, Le Breton M, Weber S: Equivalence of strong and coalitionproof Nash equilibria in games without spillovers. Econ. Theory 1997, 9: 97113. 10.1007/BF01213445
 51.
Félegyházi M, Cagalj M, Bidokhti S, Hubaux JP: Noncooperative multiradio channel allocation in wireless networks. Paper presented at IEEE computer and communication societies conference (INFOCOM), Anchorage, AK, 6–12 May 2007
 52.
Gao L, Wang X, Xu Y: Multiradio channel allocation in multihop wireless networks. IEEE Trans. Mobile Comput 2009, 8(11):14541468.
 53.
IEEE 802.16 Broadband Wireless Access Working Group: IEEE 802.16m Evaluation Methodology Document (EMD). Tech. Rep. IEEE 802.16m08/004r5 (Jan. 2009)
 54.
Aumann RJ: Subjectivity and correlation in randomized strategies. J. Math. Econ 1974, 1(1):6796. 10.1016/03044068(74)900378
 55.
Heller Y: Correlated equilibrium behavior and seeminglyiterational. Ph.D. Dissertation, Faculty of Exact Sciences, TelAviv University, Israel, 2011
 56.
Bacci G, Luise M, Poor HV: Game theory and power control in ultrawideband networks. Phys. Commun 2008, 1(1):2139. 10.1016/j.phycom.2008.01.004
 57.
Bonneau N, Altman E, Debbah M: Correlated equilibrium in access control for wireless communications. Paper presented at international conference on networking (IFIP), Coimbra, Portugal, 15–19 May 2006
 58.
Bonneau N, Debbah M, Altman E, Hjorungnes A: Non-atomic games for multi-user systems. IEEE J. Sel. Areas Commun 2008, 26(7):1047-1058.
 59.
Charafeddine M, Han Z, Paulraj A, Cioffi J: Crystallized rates region of the interference channel via correlated equilibrium with interference as noise. Paper presented at IEEE international conference on communication (ICC), Dresden, Germany, 14–18 June 2009
 60.
Dilts R, Epstein TA: Dynamic Learning. MeTa Publications, Capitola; 1995.
 61.
van der Schaar M, Fu F: Spectrum access games and strategic learning in cognitive radio networks for delay-critical applications. Proc. IEEE 2009, 97(4):720-740.
 62.
Shiang HP, van der Schaar M: Online learning in autonomic multihop wireless networks for transmitting mission-critical applications. IEEE J. Sel. Areas Commun 2010, 28(5):728-741.
 63.
Lin Z, van der Schaar M: Autonomic and distributed joint routing and power control for delay-sensitive applications in multihop wireless networks. IEEE Trans. Wireless Commun 2011, 10(1):102-113.
 64.
Watkins CJCH, Dayan P: Q-learning. Mach. Learn 1992, 8(3):279-292.
 65.
Claus C, Boutilier C: The dynamics of reinforcement learning in cooperative multiagent systems. Paper presented at the fifteenth national conference on artificial intelligence (AAAI-98), Madison, WI, 26–30 July 1998
 66.
Kaelbling LP, Littman ML, Moore AW: Reinforcement learning: A survey. J. Artif. Intell. Res 1996, 4: 237-285.
 67.
Kalai E, Lehrer E: Rational learning leads to Nash equilibrium. Econometrica 1993, 61(5):1019-1045. 10.2307/2951492
 68.
Fudenberg D, Levine DK: Learning and equilibrium. Annu. Rev. Econ 2009, 1(1):385-420. 10.1146/annurev.economics.050708.142930
 69.
Milgrom P, Roberts J: Rationalizability, learning, and equilibrium in games with strategic complementarities. Econometrica 1990, 58(6):1255-1277. 10.2307/2938316
 70.
Daskalakis C, Frongillo R, Papadimitriou CH, Pierrakos G, Valiant G: On learning algorithms for Nash equilibria. Paper presented at the third international symposium on algorithmic game theory (SAGT), Athens, Greece, 18–20 October 2010
Acknowledgements
This work was supported by the European Commission in the framework of the FP7 Network of Excellence in Wireless COMmunications NEWCOM# (grant agreement no. 318306).
Additional information
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Shams, F., Luise, M. Basics of coalitional games with applications to communications and networking. J Wireless Com Network 2013, 201 (2013). https://doi.org/10.1186/1687-1499-2013-201
Keywords
 Nash Equilibrium
 Payoff
 Pure Strategy
 Payoff Distribution
 Grand Coalition