Game theoretical analysis of rate adaptation protocols conciliating QoS and QoE
 Smrati Gupta^{1, 4},
 E. V. Belmega^{2, 3} and
 M. A. Vázquez-Castro^{1}
https://doi.org/10.1186/s1363801605695
© Gupta et al. 2016
Received: 16 August 2015
Accepted: 24 February 2016
Published: 5 March 2016
Abstract
The recent increase in the use of wireless networks for video transmission has led to wider use of rate-adaptive protocols, which aim to maximize resource utilization and transmission efficiency. However, a number of these protocols lead to interactions among the users that are subjective in nature and affect the overall performance. In this paper, we present an in-depth analysis of the interplay between the wireless network dynamics and the video transmission dynamics in light of the subjective perceptions of the end users in their interactions. We investigate video exchange applications in which two users interact repeatedly over a wireless relay channel. Each user is driven by three conflicting objectives: maximizing the Quality of Service (QoS) and the Quality of Experience (QoE) of the received video, while minimizing the transmission cost. Noncooperative repeated games precisely model interactions among users with independent agendas. We show that adaptive video exchange is impossible if the duration of the interaction is known in advance. However, if the users interact indefinitely, they can achieve cooperation via the exchange of video streams. Our simulations demonstrate the tradeoff between the QoS and the QoE of the users’ received video. The expected duration of the interaction plays a role and determines the region of solution tradeoffs. We propose further means of shaping this region using Pareto optimality and user-fairness arguments. This work proposes a concrete game theoretical framework that allows the optimal use of traditional protocols by taking into account the subjective interactions that occur in practical scenarios.
Keywords
Rate adaptation; QoS vs. QoE tradeoff; Video exchange; Noncooperative repeated games
1 Introduction
The past decade has seen an enormous growth in the number of users of wireless networks. With the increasing variety of video applications delivered over wireless channels, this number is bound to keep growing. Given the limited resources at their disposal, competition emerges among the users to access these video services effectively. This competition affects the experienced video quality and plays a role in optimizing the resource allocation.
An important demand of the users of video applications is a high quality of experience of the perceived video. The Quality of Experience (QoE) at the end user depends on the temporal and spatial structure of the video, and its optimization involves the minimization of video distortion and of disruptions in video playback. The temporal structure may relate to many features, for example, the continuity of the video observed at the end user without freezing or disruptions, ideally implying timely playback of the video. Similarly, the spatial structure may relate to features such as the avoidance of artifacts caused by packet losses at the end user [1]. Furthermore, the wireless nodes involved in the video transmission are distributed and autonomous devices. In a video exchange scenario between two selfish autonomous users, the ability of a user to obtain a video of the desired quality depends on the action chosen by the other user, i.e. how much information it sends. In turn, the other user incurs a transmission cost. This leads to a natural conflict for each user between minimizing its own transmission cost and maximizing the Quality of Service (QoS) and QoE of its received video. This paper addresses such competitive video exchange between two users.
A relevant framework to model this type of competitive interaction is provided by game theory. Tools from game theory have already been used to study various aspects of transmission over wireless networks [2, 3]. More precisely, the game theoretical framework has been applied at the physical layer to design power allocation policies [4, 5] and at the application layer to design rate allocation policies [6, 7]. Furthermore, other types of strategic interactions have been studied, such as network topology selection games [8] and pricing games between service providers and users for network congestion control [9]. The aforementioned works consider a one-shot interaction model; however, repeated interactions that take place over multiple stages seem more realistic. Repeated games have been used in [10] to study a distributed state estimation problem for the interconnected electrical power grid; also, in [11], the authors study a distributed power control problem in a wireless communication scenario. Therefore, the repeated games framework seems very promising for many other interactive multiuser applications such as distributed rate adaptation problems.
The game theoretical analysis of different transmission protocols has been considered in the literature [7, 12]. In [7], the authors have modeled the players as the Transmission Control Protocol (TCP) flows and the actions as the rate adaptation parameters to optimize their own average throughputs. The user actions, which are the rate adaptation parameters, control the number of buffer packets that are stored to be transmitted at a later stage. In [12], the competition in TCP has been modeled by allowing the users to choose the gradient of rate adaptation. In these works, the players have discrete (often binary) sets of actions. It is important to build models that allow the users to choose smoother actions, e.g. from continuous sets.

We model the video exchange interaction between two users using a noncooperative repeated game framework. The overall game-theoretical framework that has been built is particularly suited for wireless networking analysis. The reason is that both the wireless channel dynamics (network resource) and the video rate dynamics can be treated jointly and analytically to assess the performance of interactive protocols. Moreover, an analysis based on the selection of (conflicting) objectives makes it possible to study a wide range of purely technological issues against different subjective considerations.

We apply the game theoretical framework to the concrete problem of QoE-driven rate adaptation over underlying Markovian channel dynamics, taking into account subjective factors. The use of this framework reveals the existence of a tradeoff between QoE and QoS in such systems. This tradeoff arises from two conflicting goals of the users: a higher quality of the received video (QoS), which calls for a faster rate adaptation to the channel conditions, and a higher quality of video playback (QoE), which is adversely affected by faster rate adaptation because of the higher probability of exceeding the channel capacity. This tradeoff captures the threefold impact of the dynamics of the network resources, the video transmission and the subjective interactions, all three of which are modeled in our framework.

We illustrate that this tradeoff is controlled not only by the expected channel conditions but also by other factors, such as the expected duration of the interaction among the users, the tolerance of the users and Pareto optimality arguments. As the expected duration of the interaction increases, the speed of rate adaptation to the channel conditions increases, which provides better QoS but leads to more outages that degrade the QoE. Likewise, we show and quantify that the achievable QoS and QoE of a user depend on its level of tolerance towards the performance gains of the other user exceeding its own. Further, we argue that rational users are more likely to agree upon Pareto optimal tradeoffs. Pareto optimality of an outcome means that no other tradeoff exists that offers a strictly better utility to one user without decreasing the other’s. Hence, a number of factors are shown to directly or indirectly determine the achievable rates and consequently the QoS and QoE obtained by the users.

Overall, our results indicate that competing protocols over underlying wireless networks can be understood more accurately when subjective metrics are also considered. Robust protocols for wireless channels should therefore take the underlying tradeoffs into account to achieve optimal and stable behavior.
This work extends and improves the conference version [13] in four aspects. (i) The results regarding the solution of the repeated game were only announced in [13]. Here, we provide the complete proofs of all our claims. (ii) We extend the system model to include both satellite and terrestrial-based bent-pipe topologies. (iii) We add a discussion on the means to shape the optimal tradeoff region of the game using user-fairness and Pareto optimality arguments. (iv) Finally, our simulation section is broader and offers a more complete view of how the system parameters impact the outcome of the game, with an emphasis on the QoS vs. QoE tradeoffs.
This paper is organized as follows. In Section 2, we describe the system model and the QoE-driven rate adaptation performance metrics which are used in the paper. We present the game theoretical framework for video transmission in Section 3. The numerical results are presented in Section 4, followed by the conclusions in Section 5.
2 System and rate adaptation models
In this section, we describe the system model and the rate adaptation scheme. We also define the users’ performance metrics.
2.1 System model
2.1.1 Topology
2.1.2 Channel model
The wireless channel between any two nodes has a variable capacity to transmit packets, depending on factors such as environmental conditions (rain, cloud, etc.) and congestion due to other users. We model the channel links between the nodes as having a throughput capacity which varies with time. The throughput capacity is the maximum number of packets that can be transmitted between any node k and node l at time t and is denoted by R _{ kl }(t). It is measured in packets per time slot (pps). As shown in Fig. 1, R _{ AB }=0 for all t. All links are assumed to have equal throughput capacities R _{ c }(t), that is, R _{ kl }(t)=R _{ c }(t), ∀kl∈{AR,BR,RA,RB} at any time instant t. Each time instant is described by t=n Δ, where \(n\in \mathbb {Z}^{+}\) and Δ is the interval size, i.e. a precision parameter. We will refer to instantaneous quantities using the integer value n henceforth. We assume that the relay node (e.g. satellite) simply forwards the packet without any processing. Therefore, R _{ c }(n) is the instantaneous end-to-end throughput capacity of the channel between A and B and vice versa.
This assumption (of symmetric channel links) is made for the sake of simplicity of analysis and illustration. The main results of this work can be generalized to the case of asymmetric links. However, the underlying mathematical analysis becomes more tedious and hinders the reader’s understanding of the major ideas and intuition behind the repeated user interactions.
We consider a bent-pipe topology that belongs to a bigger and denser network. Our aim is to study the dynamics of transmission within this subset of a denser network such that the results can be modularized and extrapolated to more general cases.
The channel coherence time is given by T _{ c }=c Δ, \(c\in \mathbb {Z}^{+}\). We assume that c≫1, i.e. the channel coherence time is very large, which is reasonable in a number of wireless transmission scenarios (e.g. satellite scenarios). We also assume that each video stream exchange lasts for a time T=N Δ, i.e. N slots, and we focus on the cases where N≫c. This is also a reasonable assumption, as video exchanges often last a long period of time in many wireless transmission scenarios.
We point out here that this scenario is general and is applicable to any bent-pipe topology. The model captures the scenarios in which the relay node may be a terrestrial node or a satellite node and can be adapted to different wireless scenarios by adjusting the parameters (e.g. the transmission delay, depending on the satellite or terrestrial scenario).
2.1.3 Cross-layer design model
The video content is transmitted using the Real-Time Transport Protocol (RTP) over the User Datagram Protocol (UDP). The source node is equipped with a codec responsible for the compression of the video content. The output rate of the codec can be reconfigured to deliver a target bitrate. Therefore, at the source node, the output rate can be adjusted as desired. This output rate from any node k∈{A,B} at time t=n Δ is denoted by r _{ k }(n), and it is measured in packets per time slot. The packets are then passed down to the network layer, in keeping with the standard protocol stack. The transmission rate of the video payload is adapted to the network conditions by optimizing the QoE perceived at the end user. Using Real-Time Transport Control Protocol (RTCP) signaling, the source node collects feedback information about the round-trip time from the destination. The RTCP feedback signaling provides the source node with the information needed to reconfigure the codec rate to suit the target needs.
2.1.4 Buffer and delay model
Each of the users maintains a transmission buffer of size B _{ c } packets. The number of packets stored in transmission buffer at time t=n Δ is given by B(n). The buffer is maintained using a first in, first out (FIFO) queue. Let the source node k transmit the packets at time t=n Δ at the rate r _{ k }(n). The source node is unaware of the channel link capacity R _{ c }(n). Assume that the buffer is initially empty. If r _{ k }(n)<R _{ c }(n), all the packets are transmitted through the channel with rate r _{ k }(n). If r _{ k }(n)≥R _{ c }(n), then packets are transmitted with a rate of R _{ c }(n) through the channel link. From the remaining r _{ k }(n)−R _{ c }(n) packets, B(n)≤B _{ c } packets are stored in the transmission buffer. The remaining packets are lost. In the next time slot, the packets stored in the buffer are transmitted through the channel. These packets face a delay in reaching the destination. The same process as above is applied to the rest of the new packets to be transmitted, but now the effective channel capacity is R _{ c }(n)−B(n−1).
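The buffer dynamics described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `buffer_step` and its return tuple are hypothetical, packet counts are taken to be non-negative integers, and the case in which the backlog exceeds the current capacity (not discussed in the text) is handled by keeping the unserved backlog in the buffer.

```python
def buffer_step(rate, capacity, backlog, buffer_size):
    """Simulate one time slot of the FIFO transmission buffer.

    rate        -- new packets offered by the source in this slot, r_k(n)
    capacity    -- channel throughput capacity in this slot, R_c(n)
    backlog     -- packets left in the buffer from the previous slot, B(n-1)
    buffer_size -- maximum buffer occupancy, B_c

    Returns (fresh, delayed, new_backlog, lost).
    """
    # Buffered packets are served first (FIFO); they arrive with a delay.
    delayed = min(backlog, capacity)
    leftover = backlog - delayed           # assumption: unserved backlog stays queued
    effective_capacity = capacity - delayed

    # New packets use whatever capacity remains, R_c(n) - B(n-1).
    fresh = min(rate, effective_capacity)
    excess = rate - fresh

    # Excess packets are stored up to the buffer size; the rest are lost.
    stored = min(excess, buffer_size - leftover)
    lost = excess - stored
    return fresh, delayed, leftover + stored, lost


if __name__ == "__main__":
    backlog = 0
    for n, (r, cap) in enumerate([(5, 10), (20, 10), (8, 10)]):
        fresh, delayed, backlog, lost = buffer_step(r, cap, backlog, buffer_size=4)
        print(n, fresh, delayed, backlog, lost)
```

In the second slot of the usage example the offered rate exceeds the capacity, so part of the excess is buffered and the remainder is dropped; in the third slot the buffered packets consume part of the capacity before any new packet is sent.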
Using the above model, we will now describe a smooth adaptation method that depends on the QoE, QoS and transmission cost derived in [15] that will enable us to study the game theoretical interaction among the nodes in subsequent sections.
2.2 Generic rate adaptation
Note that this rate adaptation is done at the source node k, and it depends on the delay statistics observed at node l≠k. Henceforth, we denote the player other than k as −k. Note that the increment in rate can be modeled using different functions such as logarithmic, linear or exponential functions. Here, we choose a squared function with a slow start followed by a rapid growth, which makes our model resemble TCP to some extent [16, 17].
We now describe the performance metrics which will be used in this work.
2.2.1 Quality of service metric
It can be seen that the QoS of node k depends on the rate adaptation at the other node via α _{−k }.
2.2.2 Quality of experience metric
We note that the flow continuity at k is a function of the rate adaptation gradient at the other node α _{−k }.
2.2.3 Transmission cost
The cost of transmission at node k is determined by its own rate adaptation gradient α _{ k }.
3 Game theoretical framework for QoE-driven adaptive video transmission
From the previous discussion in Section 2.1, the quality of the video obtained by a player, say A, depends on the rate of transmission of player B and vice versa. Additionally, in order to transmit the video packets to player A, player B incurs a cost and vice versa. Therefore, if the players are selfish, there is a conflict of interest as both players want to incur minimum cost (affecting the video quality of other player) and also obtain a good quality of video themselves (affected by the rate of transmission by other players). Therefore, there is an interaction arising naturally among such two nodes which is modeled using game theory.
3.1 One-shot noncooperative game

Players Set \(\mathcal {P}\): The set of players or users is given by \(\mathcal {P}=\{A,B\}\) which are the two nodes A and B that exchange the video packets.

Action Set \(\mathcal {A}_{k}\): The set of actions that can be taken by the player k, where k∈{A,B}, is given by \(\mathcal {A}_{k}=\left [0,\lambda \frac {\pi }{2}\right ]\) and λ∈(0,1). The action chosen by the kth player from \(\mathcal {A}_{k}\) is denoted by α _{ k }. Hence, the player can choose the gradient of rate adaptation as its action in order to maximize its own benefits. The factor λ ensures that the gradient is always \(\alpha <\frac {\pi }{2}\) to prevent infinite increase in rate.

Payoff function u _{ k }: The payoff function of the kth player is defined as follows:$$ u_{k}(\alpha_{k},\alpha_{-k})=w_{1}f_{k}^{\text{QoS}}(\alpha_{-k})+w_{2}f_{k}^{\text{QoE}}(\alpha_{-k})-w_{3}f_{k}^{\text{COST}}(\alpha_{k}) $$(9)
The payoff function of the kth player depends not only on its own action α _{ k } but also on the other player’s action α _{−k }. This function jointly combines the video quality experienced by player k and the cost of transmitting video to the other player. The throughput and the flow continuity (given by \(f_{k}^{\text {QoS}}(\alpha _{-k})\) and \(f_{k}^{\text {QoE}}(\alpha _{-k})\) in (3) and (4)) affect the quality of the video experienced by player k and are both determined by the action of the other player α _{−k }. The weights w _{ i }∈(0,1) are positive parameters that set the relative importance of the three factors, such that \(\sum _{i=1}^{3}w_{i}=1\).
A solution concept of this game is the Nash equilibrium. The Nash equilibrium (NE) of a noncooperative game \(\mathcal {G}\) is defined as a profile of actions of the players (\(\alpha _{k}^{*},\alpha _{-k}^{*}\)) from which no player has an incentive to deviate unilaterally. Hence, selfish rational players can be expected to play the NE actions.
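In symbols, with the convention that a player's own action is the first argument of its payoff, this standard definition can be restated as follows (a restatement for convenience only, using the player set \(\mathcal {P}\) and the action sets \(\mathcal {A}_{k}\) defined above):$$ u_{k}(\alpha_{k}^{*},\alpha_{-k}^{*})\geq u_{k}(\alpha_{k},\alpha_{-k}^{*}),\quad \forall\,\alpha_{k}\in\mathcal{A}_{k},\ \forall\,k\in\mathcal{P}. $$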
Proposition 1.
The unique Nash equilibrium of the one-shot QoE-driven adaptive video exchange game \(\mathcal {G}_{O}=\{\mathcal {P},\{\mathcal {A}_{k}\}_{k\in \mathcal {P}},\{u_{k}\}_{k\in \mathcal {P}}\}\) is given by \((\alpha _{1}^{*},\alpha _{2}^{*})=(0,0).\)
Proof.
See Appendix A.
The NE in the oneshot game shows that the two selfish players will not exchange any data if their interaction takes place a single time. No selfish player would be willing to incur a cost to send the video when there is no guarantee of receiving anything in return. Moreover, even if the other player were to send data, rational behavior leads to the choice which minimizes the cost.
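The reasoning behind this equilibrium can be checked numerically with a small sketch. It assumes only the structure of the payoff in (9): the QoS and QoE terms depend on the other player's gradient alone, and the cost term increases with the player's own gradient. The three functions `f_qos`, `f_qoe` and `f_cost` below are hypothetical placeholders (they are not the metrics of Section 2.2), and the value of λ is likewise an arbitrary choice; the weights are those of the numerical section.

```python
import math

LAMBDA = 0.9                       # hypothetical value of lambda in (0, 1)
ALPHA_MAX = LAMBDA * math.pi / 2   # upper end of the action set
W1, W2, W3 = 0.45, 0.45, 0.1       # weights w1, w2, w3 (sum to 1)


def f_qos(alpha_other):
    """Placeholder QoS term; depends only on the other player's gradient."""
    return math.sin(alpha_other)


def f_qoe(alpha_other):
    """Placeholder QoE term; depends only on the other player's gradient."""
    return math.cos(alpha_other)


def f_cost(alpha_own):
    """Placeholder cost term; strictly increasing in the player's own gradient."""
    return math.tan(alpha_own)


def payoff(alpha_own, alpha_other):
    """One-stage payoff with the structure of (9)."""
    return (W1 * f_qos(alpha_other) + W2 * f_qoe(alpha_other)
            - W3 * f_cost(alpha_own))


def best_response(alpha_other, grid_size=201):
    """Grid search for the own action that maximizes the payoff."""
    grid = [ALPHA_MAX * i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda a: payoff(a, alpha_other))


if __name__ == "__main__":
    for other in (0.0, 0.5, 1.0, ALPHA_MAX):
        print(other, best_response(other))
```

Whatever the other player does, the own action only enters through the cost, so the best response is always zero; the profile (0,0) is therefore the only mutual best response in this sketch.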
In the following, we study repeated games as a mechanism to motivate rational players to share data in the absence of a central authority which may use other mechanisms such as pricing techniques to manipulate the Nash equilibrium [19].
3.2 Repeated games framework
We consider that the players interact repeatedly under the same conditions, i.e. the same game \(\mathcal {G}_{O}\) is played repeatedly. The repeated game can lead to a change in the outcome because the players can now observe the past interactions and decide accordingly upon their present action. We will first formulate the game tuple, consistently with the definition in Section 3.1. We will consider two cases: (i) finite-horizon repeated game: the players know in advance how many times they will interact or when the game ends and (ii) infinite-horizon repeated game: the players are unaware of how many times they will interact. We analyze the outcomes or game equilibria in these two cases.
Before proceeding, we remark that the assumption N≫c is required for the same game \(\mathcal {G}_{O}\) to be played repeatedly (see Section 2.1). The QoS and QoE terms in the players’ payoff functions are empirical averages (over a horizon of N time slots) of random quantities that depend on the varying channel state. This state changes every c time slots and, if N≫c, these functions are approximately equal to their statistical counterparts and can thus be treated as deterministic functions of α _{−k }. Otherwise, since the payoff functions would change randomly at every stage, more advanced tools such as Bayesian games would have to be used instead of repeated games.

Players set \(\mathcal {P}\): refers to the set of players {A,B}.

Strategy set \(\mathcal {S}_{k}\): The strategy set of player k is different from the action set in (8) because it describes a strategic plan on how to choose an action in \(\mathcal {A}_{k}\) at every stage of the game and for any history of play. More precisely, let the actions taken by the players in the τth stage be \(a^{(\tau)}=(a_{1}^{(\tau)},a_{2}^{(\tau)})\). Then the history of the game at the end of stage t≥1 is given by h ^{(t+1)}=(a ^{(1)},a ^{(2)},…,a ^{(t)}). The set of all possible histories up to t is given by \(\mathcal {H}^{(t)}=\{\mathcal {A}_{k}\times \mathcal {A}_{-k}\}^{t}\) where \(\mathcal {A}_{k}\) is the action set in (8) given by \([0,\lambda \frac {\pi }{2}]\). The strategy of a player k for T interactions is a sequence of functions \(s_{k}=(s_{k}^{(1)},s_{k}^{(2)},\ldots,s_{k}^{(T)})\in \mathcal {S}_{k}\) that map each possible history to an action: \(s_{k}^{(t)}:\mathcal {H}^{(t)}\rightarrow \mathcal {A}_{k}\) such that \(s_{k}^{(t)}(h^{(t)})=a_{k}^{(t)}\).

Overall payoff function v _{ k }: The payoff function is an overall average of the payoffs obtained by player k in every stage of the game. We consider a discounted average payoff such that the player discounts the future payoffs by a factor δ∈(0,1):$$ v_{k}(s)=\frac{(1-\delta)}{(1-\delta^{T})}\sum_{t=1}^{T}\delta^{t-1}u_{k}(a^{(t)}) $$(10)
where a ^{(t)} is the action profile at stage t induced by strategy s (i.e. \(s_{k}^{(t)}(h^{(t)})=a_{k}^{(t)}\forall \: k\)), u _{ k } is the onestage payoff function defined in (9), and T is defined as the number of times the interaction takes place. It is assumed that T>1.
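As a small illustration, the discounted average in (10) can be computed as follows, reading (10) as the stage payoffs weighted by δ^{t−1} and normalized by (1−δ)/(1−δ^{T}). The function name `discounted_average_payoff` is hypothetical, and the stage payoffs are assumed to be given as a plain list ordered by stage.

```python
def discounted_average_payoff(stage_payoffs, delta):
    """Normalized discounted average of a finite sequence of stage payoffs.

    stage_payoffs -- list [u(a^(1)), ..., u(a^(T))] ordered by stage
    delta         -- discount factor, strictly between 0 and 1
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    horizon = len(stage_payoffs)
    if horizon == 0:
        raise ValueError("at least one stage payoff is required")
    # Normalization so that a constant stage payoff u yields exactly u.
    norm = (1.0 - delta) / (1.0 - delta ** horizon)
    # enumerate starts at 0, which matches the exponent t - 1 for t = 1..T.
    return norm * sum(delta ** t * u for t, u in enumerate(stage_payoffs))


if __name__ == "__main__":
    print(discounted_average_payoff([1.0] * 5, 0.9))
    print(discounted_average_payoff([1.0, 0.0], 0.5))
```

The normalization makes the overall payoff directly comparable with a one-stage payoff: a player who receives the same stage payoff at every stage obtains exactly that value as its overall payoff.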
One interesting solution concept in repeated games is the subgame perfect equilibrium [10], which is a refinement of the NE. As for one-shot games, the NE of a repeated game is a strategy profile from which no player gains by deviating unilaterally. However, some strategy profiles are not expected to occur because of player rationality, although they are NE of the overall interaction. Hence, the subgame perfect equilibrium region is a subset of the NE region. A subgame is a repeated game starting from stage t onwards which depends on the starting history h ^{(t)}, and is denoted by \(\mathcal {G}_{R}(h^{(t)})\). The final history for this subgame is given by h ^{(T+1)}=(h ^{(t)},a ^{(t)},…,a ^{(T)}). The strategies and payoffs used for the subgame are functions of the possible histories that are consistent with h ^{(t)}. Any strategy profile s of the whole game induces a strategy s∣h ^{(t)} on any subgame \(\mathcal {G}_{R}(h^{(t)})\) such that for all k, s _{ k }∣h ^{(t)} denotes the restriction of s _{ k } to the histories consistent with h ^{(t)}. A subgame perfect equilibrium (SPE) is defined as a strategy profile \(s^{*}=(s_{1}^{*},s_{2}^{*})\) such that for any stage t and any history \(h^{(t)}\in \mathcal {H}^{(t)}\), the strategy s ^{∗}∣h ^{(t)} is an NE for the relevant subgame \(\mathcal {G}_{R}(h^{(t)})\).
3.3 Finite horizon repeated game
We investigate the expected outcomes or SPEs for the repeated video exchange game \(\mathcal {G}_{R}\) assuming T is fixed and known by the two players. The only SPE of this game is given in the following proposition.
Proposition 2.
The details of the proof are provided in Appendix 2. The players are aware of the number of times the video exchange will occur; hence, the players act selfishly at each stage of the game in order to maximize their payoffs. There is no incentive to build long-term trust in this game for any player, because it is known that the game will be played only T times. This result is similar to the repeated prisoners’ dilemma [19] and is due to the fact that the players have a strictly dominant action (an action that offers a strictly higher payoff than any other action irrespective of what the other player does) at every stage of the game, which is α _{ k }=0, for all k.
3.4 Infinite horizon repeated game
If the players’ interaction is only temporary and occurs over a predetermined number of stages, the only rational outcome is no video exchange at all, because each player knows exactly when the interaction ends. Here, we investigate the possibility of achieving different outcomes in the case of uncertain duration or long-term interaction.
We will now study the SPE for the infinite-horizon repeated game. The players do not know precisely when the game will end, or equivalently, it is assumed that T→+∞. We will now identify some strategies that are SPE for such games. Note that characterizing the overall achievable SPE region for infinite-horizon repeated games is an open problem and the region is not known in general [19].
Proposition 3.
The proof is given in Appendix 3. This pessimistic SPE given by (11) is independent of the choice of the discount factor δ∈(0,1). However, there are other possible SPEs. We will show that, depending upon the discount factor and other system parameters, an SPE that allows nontrivial video exchange is sustainable in the long-term interaction. The intuition is similar to the infinite time horizon prisoner’s dilemma [19]. In the long term, the players can build trust with one another to exchange their video nontrivially (in spite of the incurred transmission cost) and improve the QoS and QoE of their received video (which depend on the opposite player’s strategy). This results in overall payoffs that are higher than in the no-cooperation state (0,0) for both players.
We focus on action profiles that provide higher payoffs than the oneshot NE for both players. Each player is willing to take the risk of paying a transmission cost in the hope that the other player will do the same and which will lead them both to higher average payoffs.
In the next proposition, we describe such an SPE of the game \(\mathcal {G}_{R}\), which is conditional on the value of the discount factor δ.
Proposition 4.
then the following strategy is an SPE: “A player k transmits with gradient \(\alpha _{k}^{*}\) at the first stage and continues to adapt the rate with this gradient as long as the other player adapts its rate by at least \(\alpha _{-k}^{*}\). If a defection is detected, both players stop transmitting (i.e. α=0) for the rest of the interaction.”
We detail this proof in Appendix 4.
We remark that the inequality \(0<\delta _{\text {asym}}^{\text {min}}(\alpha _{1}^{*},\alpha _{2}^{*})<1\) holds for any \((\alpha _{1}^{*},\alpha _{2}^{*})\) satisfying (12), as explained in Appendix 4; therefore, (13) does not imply any additional condition on the system parameters. If an agreement point satisfies (12), then there is an admissible discount factor range \((\delta _{\text {asym}}^{\text {min}}(\alpha _{1}^{*},\alpha _{2}^{*}),1)\) within which the agreement point \((\alpha _{1}^{*},\alpha _{2}^{*})\) is sustainable. Intuitively, if an agreement point provides a higher utility than the one-shot NE to both players, such an agreement point can be sustained.
From Proposition 4 above, we have identified the set of discount factors leading to a long-term sustainable SPE different from (0,0). The discount factor can be interpreted as the players’ belief, at every stage, that the game will continue. If the probability that the game continues is large enough, the players develop trust and obtain better overall payoffs than by minimizing their instantaneous costs.
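The trigger strategy quoted in Proposition 4 and the role of the discount factor can be illustrated with a short sketch. The threshold computed below is the generic textbook bound for a grim-trigger strategy in a discounted infinitely repeated game: cooperating forever yields the agreement payoff, whereas deviating once yields (1−δ) times the deviation payoff plus δ times the punishment payoff. It is given as an assumption-level illustration and is not claimed to coincide with the expression in (13). All function names are hypothetical.

```python
def grim_trigger_action(history, player, agreement):
    """Action of `player` (0 or 1) under the trigger strategy.

    history   -- list of past action profiles (alpha_0, alpha_1)
    agreement -- agreed profile (alpha_0_star, alpha_1_star)
    """
    for profile in history:
        # A defection is any past gradient below the agreed one.
        if profile[0] < agreement[0] or profile[1] < agreement[1]:
            return 0.0          # stop transmitting for the rest of the game
    return agreement[player]


def min_discount_factor(u_agree, u_deviate, u_punish):
    """Smallest delta for which deviating once is not profitable.

    Cooperation is sustained when
        u_agree >= (1 - delta) * u_deviate + delta * u_punish.
    """
    if not u_deviate > u_agree > u_punish:
        raise ValueError("expected u_deviate > u_agree > u_punish")
    return (u_deviate - u_agree) / (u_deviate - u_punish)


if __name__ == "__main__":
    agreement = (0.3, 0.4)
    print(grim_trigger_action([], 0, agreement))
    print(grim_trigger_action([(0.3, 0.4), (0.3, 0.1)], 0, agreement))
    print(min_discount_factor(u_agree=3.0, u_deviate=5.0, u_punish=1.0))
```

With two players, a sketch of this kind would take the larger of the two players' thresholds, since the agreement must be sustainable for both of them.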
3.5 Selection of sustainable agreement profiles
We have shown that, when the video exchange is performed repeatedly, the players can transmit the video at an agreement profile defined in (12). A natural question arises: out of all these agreement profiles, which specific profile is more likely to be selected by the players? In general, this is an open and difficult question. Also, a unified framework to tackle this problem [19] is still missing.
In this section, we present a qualitative analysis of this problem specific to our video exchange scenario. We illustrate numerically which of the achievable agreement profiles are more likely to occur over a period of time. We consider two factors that play a role in choosing a particular agreement profile: the tolerance index of the players and the Pareto optimality of the agreement points.
3.5.1 Tolerance index
It can be easily observed that the achievable agreement profiles do not provide equal payoffs to both the players. Some of these agreement profiles are advantageous to one of the players, and only a subset provides equal payoffs to both players.
Until now, we have assumed that every player k agrees to the action profile \((\alpha _{k}^{*},\alpha _{-k}^{*})\) without considering the payoff obtained by the other player, \(u_{-k}(\alpha _{-k}^{*},\alpha _{k}^{*})\), as long as its own payoff \(u_{k}(\alpha _{k}^{*},\alpha _{-k}^{*})\) is better than at the NE. This means that the players are selfish but not malicious. However, in realistic scenarios, if the two selfish players have similar negotiation standpoints, it is unlikely that they agree to a profile \((\alpha _{k}^{*},\alpha _{-k}^{*})\) that provides a huge advantage to one of the players.
In addition, there may also be cases when a player agrees to an action profile that offers a large advantage to the other player and leads to asymmetric payoffs in situations when the player wishes to obtain the video at any cost. Such a behavior is critical from the point of view of service provider providing the transmission to both the players: the provider prefers to provide just enough rate (from the other player) as desired by the player and not more, since the player is ready to settle for lower rates due to its extreme needs.
To model such behavior, we introduce the concept of tolerance index as follows:
Corollary 1.
where \(u_{k}^{*}=u_{k}(\alpha _{k}^{*},\alpha _{-k}^{*})\), \(u_{-k}^{*}=u_{-k}(\alpha _{-k}^{*},\alpha _{k}^{*})\) and ξ>0 represents the tolerance index.
The agreement points in (12) do not imply that the players’ utilities must be equal. If the players have infinite tolerance index, then the players may agree to any action profile as long as they obtain a higher payoff than the NE. If the players have a limited tolerance index ξ, then the condition (15) restricts the previous region of agreement points as follows: the player obtaining a lower payoff will only agree to (\(\alpha _{1}^{*},\alpha _{2}^{*}\)) if the difference in payoffs is lower than ξ times its own payoff. In other words, when the players exchange the video, they can tolerate a difference in their video qualities or costs which is bounded. Rational players that have similar negotiating stand points will only agree to fair contracts.
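The verbal description of the tolerance condition can be turned into a simple check, sketched below. It encodes only the sentence above ("the difference in payoffs is lower than ξ times its own payoff", from the point of view of the player with the lower payoff); the exact form of condition (15), the treatment of the boundary case (a non-strict inequality is used here) and the assumption of positive payoffs are choices made for this illustration, and the function name is hypothetical.

```python
def within_tolerance(u_own, u_other, xi):
    """Check whether a payoff pair is acceptable for a tolerance index xi.

    The player with the lower payoff accepts the agreement only if the
    payoff gap does not exceed xi times its own (lower) payoff.
    Payoffs are assumed to be positive in this sketch.
    """
    if xi <= 0:
        raise ValueError("the tolerance index must be positive")
    lower_payoff = min(u_own, u_other)
    return abs(u_own - u_other) <= xi * lower_payoff


if __name__ == "__main__":
    print(within_tolerance(1.0, 1.2, xi=0.25))   # small gap
    print(within_tolerance(1.0, 2.0, xi=0.5))    # large gap
```

A larger ξ admits more asymmetric agreement points, and letting ξ grow without bound recovers the case in which any profile improving on the NE is acceptable.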
Note that apart from this tolerance index, there exist other parameters that could be used to quantify the fairness among the different users. For example, Jain’s fairness index [20] and the max-min fairness index capture the fairness among all the users at a holistic level and from a centralized (system-wise) point of view. In this work, we use the tolerance index for a different reason. Indeed, our tolerance index is a user-centric one in which the individual tolerance of each user is quantified independently of the overall system.
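For comparison, Jain’s fairness index mentioned above has a simple closed form: the square of the sum of the allocations divided by the number of users times the sum of the squared allocations. A minimal sketch follows; the function name is hypothetical and the allocations are assumed to be non-negative and not all zero.

```python
def jain_fairness_index(allocations):
    """Jain's fairness index of a list of non-negative allocations.

    Equals 1 when all users receive the same amount and 1/n when a single
    user receives everything.
    """
    n = len(allocations)
    sum_sq = sum(x * x for x in allocations)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one non-zero allocation")
    total = sum(allocations)
    return (total * total) / (n * sum_sq)


if __name__ == "__main__":
    print(jain_fairness_index([1.0, 1.0]))
    print(jain_fairness_index([1.0, 0.0]))
```

Unlike the tolerance check, this index summarizes the whole allocation in one system-wide number and says nothing about what an individual user is willing to accept.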
3.5.2 Pareto optimality
Aside from the fairness criterion, Pareto optimality also plays a role in determining the profiles most likely to be chosen by rational players [19]. Pareto optimal profiles are those profiles from which no player can improve its own payoff without making another’s payoff worse. These profiles lie on the boundary of the whole achievable region. Intuitively, players will tend to agree upon profiles which are Pareto optimal. Indeed, assume that the players choose an agreement profile which is not Pareto optimal. This means that at least one player can improve its payoff without worsening anyone’s payoff. Therefore, the players will tend to agree upon Pareto optimal agreement profiles, from which neither player’s payoff can be improved further without a loss for the other, at a given discount factor and tolerance index.
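Selecting the Pareto optimal profiles from a finite set of candidate payoff pairs can be sketched as follows. The candidate set is assumed to come from a discretization of the agreement region; the function names and the sample values are hypothetical.

```python
def dominates(q, p):
    """True if payoff vector q is at least as good as p for every player
    and strictly better for at least one player."""
    return (all(qi >= pi for qi, pi in zip(q, p))
            and any(qi > pi for qi, pi in zip(q, p)))


def pareto_optimal(payoffs):
    """Keep only the payoff vectors that no other vector dominates."""
    return [p for p in payoffs
            if not any(dominates(q, p) for q in payoffs)]


if __name__ == "__main__":
    candidates = [(1, 1), (2, 1), (1, 2), (0, 0)]
    print(pareto_optimal(candidates))
```

In the usage example, (1, 1) and (0, 0) are discarded because (2, 1) improves one player's payoff without reducing the other's, while (2, 1) and (1, 2) are both retained since neither dominates the other.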
4 Numerical results
We now present our simulation results to illustrate how the channel conditions modify the outcome when two players exchange video repeatedly over an indefinite time horizon. We consider the following scenario: \(R_{1}=350\) kbps (a good state channel), \(R_{2}=200\) kbps (a bad state channel), \(T_{c}=20\) s (coherence time), \(T=20000\) s (duration of each video exchange), \(\beta=150\) kbps, \(\Delta=1\), \(w_{1}=w_{2}=0.45\), and \(w_{3}=0.1\).
4.1 Achievable agreement region
We remark that the boundary curve has periodic spikes that can be explained by the shape of the utility \(u_{k}(\alpha_{k},\alpha_{-k})\) as a function of \((\alpha_{k},\alpha_{-k})\). For a fixed \(\alpha_{-k}\), this function decreases with \(\alpha_{k}\). However, the utility is not a monotonic function of \(\alpha_{-k}\), because it is composed of QoE and QoS terms aside from the transmission cost. Although the QoE decreases with \(\alpha_{-k}\) (due to abrupt breaks in flow continuity when the rate exceeds the channel capacity), the QoS may either increase or decrease with \(\alpha_{-k}\), depending on the magnitude of \(\alpha_{-k}\). Therefore, owing to the joint variation of QoS and QoE with \(\alpha_{-k}\), the overall utility does not vary monotonically with \(\alpha_{-k}\). In fact, the derivative of the utility with respect to \(\alpha_{-k}\) is a complex trigonometric function which is periodic in nature, leading to the shape of the curve.
4.2 Minimal discount factor
4.3 QoS vs. QoE tradeoff
Figure 7 also shows that as the discount factor increases, the QoS increases. On the contrary, in Fig. 8, as the discount factor increases, the QoE decreases. As the discount factor increases, the probability that the game continues also increases, which allows the players to agree on higher rate adaptation gradients. This causes a higher number of packets to be transmitted per unit time, which improves the overall QoS. However, higher gradients increase the chance that the rate exceeds the channel capacity, resulting in delays and freezing of the video, i.e., a lower QoE. Therefore, a tradeoff arises between QoS and QoE depending on the discount factor.
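The link between the discount factor and the length of the interaction can be made concrete under one common reading, assumed here for illustration: if \(\delta\) is interpreted as the probability that the game continues after each stage, independently across stages, then the number of stages \(N\) is geometrically distributed and

\[
\Pr[N=n]=\delta^{n-1}(1-\delta),\qquad \mathbb{E}[N]=\sum_{n\geq 1} n\,\delta^{n-1}(1-\delta)=\frac{1}{1-\delta}.
\]

For instance, \(\delta=0.9\) corresponds to an expected interaction of 10 stages and \(\delta=0.99\) to 100 stages, so a larger discount factor means that the players expect to meet many more times.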
4.4 Tolerance index and Pareto optimality
Another interesting observation from Fig. 10 is that when the players have a higher tolerance index, the Pareto optimal asymmetric payoffs lie on both sides of the first bisector. If the players have different leverage over each other, the player that has higher leverage (or is more powerful) can influence the payoff in its favor. For example, if player 1 has a higher leverage/power, for a tolerance index of ξ>0 %, it would prefer to have a higher utility, and an agreement point below the first bisector is more likely to be chosen.
4.5 Influence of weights of QoS, QoE, and cost
We now show the dependence of the sustainable agreements’ region on the different weights \(w_{i}\) assigned to the different factors affecting the utility in (9): QoS, QoE, and cost.
5 Conclusions
It is clear that the rapid increase in multimedia transmission over wireless networks requires existing transmission protocols to evolve. As there is greater aggression among the players competing for the existing resources, it is important to account for this competition within the existing protocols from the end players’ point of view if stable communication systems are to evolve. This paper contributes towards this aim by presenting a novel approach to model such competition and by analyzing its outcomes in a generic framework applicable to different rate adaptation protocols. We have shown, using theoretical analysis and simulations, that video exchange over wireless networks is quite sensitive to subjective considerations such as player rationality, tolerance towards other players, and the period of time over which the interaction takes place. The QoS, QoE, and cost tradeoff that we present in this work is critical to the design of communication models for realistic settings in the presence of selfish players.
This work provides one of the first analyses paving the way for future research on robust protocols for transmission over wireless channels. Furthermore, this generic analysis needs to be fine-tuned for implementation within the scope of existing protocols in order to accommodate and react to the underlying tradeoffs. Future lines of work also include extending the channel models to accommodate dense networks. Another consideration for future work is the case of asymmetric channel conditions, in which the throughputs of the links are different and reflect more complex real-world scenarios. From this perspective, future investigations could also include physical layer aspects such as modulation and channel coding.
6 Appendix 1: proof of proposition 1
Consider the utility function defined in (9). Node \(k\) can maximize its utility by minimizing the cost given by (7). The cost is a logarithmic function which is increasing in \(\alpha_{k}\) and is therefore minimized at \(\alpha_{k}=0\). Node \(k\) has no control over the action taken by the other player; therefore, the part of its utility affected by the other player’s action, i.e., its QoS and QoE, cannot be controlled. In the one-shot game, when the players cannot build trust with each other, they selfishly choose the action that maximizes their own utility. Hence, the NE for video exchange is given by (0,0).
7 Appendix 2: proof of proposition 2
This proof follows by the backward induction principle [10, 19]. We consider the last stage of the game. When the game is played at the \(T\)th stage, the players are aware that they interact for the last time. They have no incentive to transmit video for the other player while incurring a cost themselves, since there is no guarantee that the other player will transmit in return. Hence, their optimal strategy is \(s_{k}^{(T),*}=0\). Now, when the players play the game at stage \(T-1\), given that in stage \(T\) they will not transmit, there is no incentive to transmit for any history \(h^{(T-1)}\). Therefore, \(s_{k}^{(T-1),*}=0\). Following the backward induction principle, the players have no incentive to transmit at any stage of the repeated game; therefore, the SPNE is given by \(s_{k}^{(t),*}=0\:\forall \: t\in \{1,\ldots,T\}\). The discounted payoff is then \(v_{k}(s^{*})=0\). Note that this result is based on the principle that a rational player will never choose an action that is strictly dominated [19]. At each stage of the game, not transmitting strictly dominates every other action for each player, which leads to the profile (0,0).
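The backward induction argument can be traced numerically on a toy stage game. The sketch below does not use the utility (9) or the cost (7) of the paper: the action grid, the linear benefit, the cost \(\log(1+\alpha_{k})\), and the discount factor are placeholders assumed only so that the cost increases with the player's own action, as in the proof.

```python
import math

# Hypothetical grid of rate adaptation gradients (not the paper's action set).
ACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]


def stage_utility(own: float, other: float) -> float:
    """Toy stage payoff: benefit from the other's action minus own cost."""
    benefit = other                # assumed: grows with the other's action
    cost = math.log(1.0 + own)     # assumed: increasing in the own action
    return benefit - cost


def backward_induction(horizon: int, delta: float = 0.9):
    """Solve the finite-horizon game from the last stage to the first."""
    continuation = 0.0  # equilibrium value of the stages still to come
    plan = []
    for _ in range(horizon):
        # Future play does not depend on today's action, so the best reply
        # is the stage-game best reply, whatever the other player does.
        best = {
            max(ACTIONS, key=lambda a: stage_utility(a, other) + delta * continuation)
            for other in ACTIONS
        }
        assert len(best) == 1, "expected a single dominant action"
        action = best.pop()
        plan.append(action)
        continuation = stage_utility(action, action) + delta * continuation
    return plan[::-1], continuation


plan, value = backward_induction(horizon=5)
print(plan, value)
```

In this toy game, the same action is selected at every stage because the continuation value never depends on the current action, which is the mechanism the proof relies on.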
8 Appendix 3: proof of proposition 3
To prove this proposition, we cannot use backward induction as for Proposition 2, because the players are not aware of the last stage of the game. Instead, we use the one-step deviation principle to prove this SPE [10, 19]. The one-step deviation principle states that a strategy profile \(s^{*}=(s_{1}^{*},s_{2}^{*})\) is an SPE if for every player \(k\), there exists no strategy \(\hat {s}_{k}\neq s_{k}^{*}\) such that at any stage \(\tau\) and history \(h^{(\tau)}\), the strategy \(\hat {s}_{k}\mid h^{(\tau)}\) is a better response than \(s_{k}^{*}\mid h^{(\tau)}\) in a subgame \(\mathcal {G_{R}}(h^{(\tau)})\). This principle is based on the fact that if there is a strategy which gives a player an incentive to deviate at a single stage of the game, then the initial strategy is not an SPE. However, if the player has no incentive to deviate from its current strategy at any stage, such a strategy is an SPE.
Since \(w_{i}\in [0,1]\), the stage payoffs are uniformly bounded.
Secondly, we evaluate whether any deviation from the profile in (11) offers a profit to one of the players. Let us assume that player \(k\) deviates from the strategy \(s_{k}^{*}\) at stage \(\tau\) and history \(h^{(\tau)}\) with the strategy \(\hat {s}_{k}^{(\tau)}(h^{(\tau)})=\alpha _{k}\in (0,\lambda \frac {\pi }{2}]\) and from then on conforms again to the strategy \(s_{k}^{*}\), such that \(\hat {s}_{k}^{(t)}=s_{k}^{(t),*}\) for all \(t>\tau\). This implies that player \(k\) transmits video with the rate adaptation gradient \(\alpha_{k}>0\) at the \(\tau\)th stage and then conforms to \(\alpha_{k}=0\) for the rest of the game. Player \(k\) therefore incurs an extra transmission cost at the \(\tau\)th stage, given by (7), which leads to \(u_{k}(\alpha_{k},0)<u_{k}(0,0)\). Therefore, the payoff satisfies \(v_{k}(\hat {s}_{k}\mid h^{(\tau)},s_{-k}^{*}\mid h^{(\tau)})<v_{k}(s_{k}^{*}\mid h^{(\tau)},s_{-k}^{*}\mid h^{(\tau)})\). Hence, player \(k\) has no incentive to send any video packets at any stage of the game, thereby making (11) an SPE.
9 Appendix 4: proof of proposition 4
We will once again use the one-step deviation principle to prove the SPE. Consider the players following the agreement profile \((\alpha _{1}^{*},\alpha _{2}^{*})\) satisfying (12). Let us consider two cases: (i) case 1: there has been no deviation from \((\alpha _{1}^{*},\alpha _{2}^{*})\) and (ii) case 2: there has been a deviation from \((\alpha _{1}^{*},\alpha _{2}^{*})\) and now both players are using the threat point (0,0).
Case 1.
(2) Let us assume that \(\overline {\alpha }_{k}>\alpha _{k}^{*}\). This is a trivial case because transmitting at one stage at \(\overline {\alpha }_{k}\) is not a deviation from the strategy, as the player still transmits at least \(\alpha _{k}^{*}\). However, for completeness, we provide the analysis. Player \(k\) conforms to the agreement point until stage \(\tau\). At stage \(\tau+1\), it deviates from the agreement point and transmits at an \(\overline {\alpha }_{k}\) greater than \(\alpha _{k}^{*}\). From stage \(\tau+2\), it again conforms to the strategy, which implies sending at \(\alpha _{k}^{*}\). In this case, the discounted payoff without deviation, obtained with the stage payoff \(u_{k}(\alpha _{k}^{*},\alpha _{-k}^{*})\), is strictly greater than the discounted payoff with deviation. This is because, at stage \(\tau+1\), the payoff satisfies \(u_{k}(\overline {\alpha }_{k},\alpha _{-k}^{*})<u_{k}(\alpha _{k}^{*},\alpha _{-k}^{*})\) for any \(\overline {\alpha }_{k}>\alpha _{k}^{*}\). Further, from stage \(\tau+2\) onwards, the payoff at each stage after the deviation is equal to the payoff at each stage without deviation, i.e., \(u_{k}(\alpha _{1}^{*},\alpha _{2}^{*})\). Therefore, overall, the discounted payoff without deviation is strictly greater than the discounted payoff with deviation.
Case 2.
We now consider the case when there has been a deviation from the agreement point \((\alpha _{1}^{*},\alpha _{2}^{*})\) and both players are now using the threat point (0,0). We examine whether any player can profitably deviate from this strategy and play \(\alpha_{k}>0\). We note that \(u_{k}(\alpha_{k},0)<u_{k}(0,0)\). Hence, if any player deviates at any intermediate stage from (0,0) and then conforms to the strategy by playing (0,0), the stage payoff at the deviation will be lower than the stage payoff without deviation. Therefore, the overall payoff without deviation is higher than the payoff with deviation, and the player has no incentive to deviate from (0,0).
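The one-step deviation check used in this proof can be illustrated with the standard threat-point (grim-trigger) comparison. The sketch below assumes unnormalized discounted sums and hypothetical stage payoffs `u_coop` (both players conform to the agreement), `u_dev` (the player stops transmitting once while the other still conforms), and `u_ne` (threat point); none of these numbers come from the paper, and condition (12) itself is not reproduced here.

```python
def discounted_conform(u_coop: float, delta: float) -> float:
    """Discounted payoff when the agreement is followed forever."""
    return u_coop / (1.0 - delta)


def discounted_deviate(u_dev: float, u_ne: float, delta: float) -> float:
    """Discounted payoff of one deviation followed by the threat point."""
    return u_dev + delta * u_ne / (1.0 - delta)


def min_discount_factor(u_coop: float, u_dev: float, u_ne: float) -> float:
    """Smallest delta for which conforming is at least as good as deviating."""
    if not (u_dev > u_coop > u_ne):
        raise ValueError("sketch assumes u_dev > u_coop > u_ne")
    return (u_dev - u_coop) / (u_dev - u_ne)


# Hypothetical stage payoffs.
u_coop, u_dev, u_ne = 2.0, 3.0, 0.0
print(min_discount_factor(u_coop, u_dev, u_ne))
for delta in (0.2, 0.5):
    stays = discounted_conform(u_coop, delta) >= discounted_deviate(u_dev, u_ne, delta)
    print(delta, stays)
```

The threshold follows from rearranging \(u_{\text{coop}}/(1-\delta)\geq u_{\text{dev}}+\delta u_{\text{ne}}/(1-\delta)\); multiplying both sides by \((1-\delta)\), as in a normalized discounted payoff, leaves the comparison unchanged.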
Declarations
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
1. RKP Mok, EWW Chan, RKC Chang, in Integrated Network Management, ed. by N Agoulmine, C Bartolini, T Pfeifer, and D O’Sullivan. Measuring the quality of experience of HTTP video streaming (IEEE, 2011), pp. 485–492.
2. AB MacKenzie, LA DaSilva, Synthesis Lectures on Communications (Morgan & Claypool Publishers, 2006).
3. S Lasaulce, H Tembine, Game Theory and Learning for Wireless Networks: Fundamentals and Applications, 1st ed (Academic Press, 2011).
4. EV Belmega, S Lasaulce, M Debbah, Power allocation games for MIMO multiple access channels with coordination. IEEE Trans. Wirel. Commun. 8(6), 3182–3192 (2009).
5. YE Sagduyu, A Ephremides, in Decision and Control, 2003. Proceedings. 42nd IEEE Conference on. Vol. 4. Power control and rate adaptation as stochastic games for random access (IEEE, 2003), pp. 4202–4207.
6. J Ye, M Hamdi, in Global Telecommunications Conference (GLOBECOM 2011). Priority-Based Rate Adaptation Using Game Theory in Vehicular Networks (IEEE, Houston, TX, USA, 2011), pp. 1–6. doi:http://dx.doi.org/10.1109/GLOCOM.2011.6133998.
7. TA Trinh, S Molnár, in Quality of Service in the Emerging Networking Panorama. A game-theoretic analysis of TCP Vegas (Springer, Berlin Heidelberg, 2004), pp. 338–347.
8. H Ren, MH Meng, Game-theoretic modeling of joint topology control and power scheduling for wireless heterogeneous sensor networks. IEEE Trans. Autom. Sci. Eng. 6(4), 610–625 (2009).
9. MA Vázquez-Castro, F Perez-Fontan, LMS Markov model and its use for power control error impact analysis on system capacity. IEEE J. Sel. Areas Commun. 20(6), 1258–1265 (2002).
10. EV Belmega, L Sankar, HV Poor, Enabling data exchange in two-agent interactive systems under privacy constraints. IEEE J. Sel. Top. Signal Process. 9(7), 1285–1297 (2015).
11. M Le Treust, S Lasaulce, A repeated game formulation of energy-efficient decentralized power control. IEEE Trans. Wirel. Commun. 9(9), 2860–2869 (2010).
12. A Akella, S Seshan, R Karp, S Shenker, C Papadimitriou, Selfish behavior and stability of the internet: a game-theoretic analysis of TCP. SIGCOMM Comput. Commun. Rev. 32(4), 117–130 (2002).
13. S Gupta, EV Belmega, MA Vázquez-Castro, in Advanced Satellite Multimedia Systems Conference and the 13th Signal Processing for Space Communications Workshop (ASMS/SPSC), 7th. Game theoretical analysis of the tradeoff between QoE and QoS over satellite channels (Livorno, 2014), pp. 24–31.
14. E Lutz, A Markov model for correlated land mobile satellite channels. Int. J. Satell. Commun. 14, 333–339 (1996). Wiley Subscription Services, Inc., A Wiley Company.
15. S Gupta, MA Pimentel-Niño, MA Vázquez-Castro, in Mosharaka 3rd International Conference on Wireless Communications and Mobile Computing (MIC-WCMC 2013). Joint network coded-cross layer optimized video streaming over relay satellite channel (Valencia, 2013), pp. 14–16.
16. MA Pimentel-Niño, MA Vázquez-Castro, H Skinnemoen, in 30th AIAA International Communications Satellite Systems Conference. Optimized ASMIRA advanced QoE video streaming for mobile satellite communications systems (Ottawa, 2012).
17. A Afanasyev, N Tilley, P Reiher, L Kleinrock, Host-to-host congestion control for TCP. IEEE Commun. Surv. Tutor. 12(3), 304–342 (2010). Third Quarter. doi:http://dx.doi.org/10.1109/SURV.2010.042710.00114.
18. S Boyd, L Vandenberghe, Convex optimization (Cambridge University Press, New York, NY, USA, 2009).
19. D Fudenberg, J Tirole, Game Theory (MIT Press, Cambridge, MA, 1991).
20. R Jain, DM Chiu, W Hawe, A quantitative measure of fairness and discrimination for resource allocation in shared computer system, vol. 38 (Eastern Research Laboratory, Digital Equipment Corporation, Hudson, MA, 1984).