 Research
 Open Access
Dynamic position changeable Stackelberg game for user-provided network control algorithms
EURASIP Journal on Wireless Communications and Networking volume 2017, Article number: 214 (2017)
Abstract
Nowadays, wireless mobile services are undergoing a paradigm shift for three reasons: (i) the increasing need for ubiquitous connectivity, (ii) an unprecedented volume of mobile data traffic, and (iii) technologically advanced mobile devices with enhanced capabilities. To address these challenges, the user-provided network (UPN) is an emerging technology that can be extensively deployed to provide substantial improvements in cellular coverage and capacity. In this study, we focus on the design of a novel UPN control scheme based on game theory. Motivated by the Stackelberg game model, our proposed scheme allows mobile devices to play changeable roles, i.e., hosts or clients, to improve UPN system performance; this is a suitable approach for dynamically changing UPN environments. Based on a decentralized, individually non-cooperative manner, we capture the dynamics of the UPN system while making meaningful progress toward future networks. Simulations and performance analysis verify the efficiency of the proposed scheme, showing that our approach can outperform existing schemes in terms of bandwidth utilization, quality of experience (QoE) of service success, delay and throughput ratios, and normalized users’ profit.
Introduction
Today, we are witnessing technological advances that herald the advent of a new era in telecommunication networks. With widespread wireless techniques and a variety of user-friendly terminals, cellular networks are expected to face new challenges. Existing cellular systems might run out of capacity in the near future due to significantly increasing machine-to-machine (M2M) data traffic with various service requirements [1, 2]. With the emergence of a variety of new wireless network paradigms, it is envisioned that a new era of personalized services has arrived, one that emphasizes users’ quality of experience (QoE). Fifth-generation (5G) network standards represent a promising technology to support users’ QoE, which will become one of the key features of future networks [3].
QoE is a measure of customers’ experiences with services. It focuses on the entire service experience and is a more holistic evaluation than the more narrowly focused user experience. Traditionally, quality of service (QoS) parameters have been employed to evaluate network-oriented service quality. However, QoE expands this horizon to capture people’s esthetic and even hedonic needs. Therefore, QoE emphasizes end-to-end performance from both subjective and objective perspectives and keeps user perception in the loop [4, 5]. While the QoE concept is easily motivated, supporting it in device-to-device, ultra-reliable, and massive machine-based 5G networks remains a difficult and complex problem.
Within the past few years, the increasing demand for mobile data in current cellular networks and the proliferation of advanced handheld devices have led to a new network paradigm, known as the user-provided network (UPN). The concept of UPN was first defined by R. Sofia as “such a networking technology where the end-user is, at the same time, a customer and a provider of network access” [6]. Owing to socio-technological advances, mobile devices have surplus capacity to act as “micro-providers” that offer services to other nearby users without additional network infrastructure. In particular, the rise of social networks inspires end users’ willingness to become micro-providers and spread common interests by means of shared connectivity [4].
Recently, two UPN models have been implemented through different approaches. One method enables mobile devices to create a mesh network and share their Internet connections without the intervention of network operators. The other method adopts a virtual mobile operator that enables its subscribers to act as mobile Wi-Fi hotspots serving non-subscribers in exchange for free data quota. Open Garden and Karma are well-known startup examples of these two methods of UPN service, respectively [7]. Even though the two methods have different control mechanisms, there is growing consensus that UPN operation relies on the participation of self-organizing user equipments (UEs). Therefore, a central issue in UPNs is that users must agree to serve each other. However, participating users have conflicting interests, and existing UPN methods lack proper control algorithms to coordinate these conflicts [8].
In this paper, we study and design a novel UPN control scheme in which individual users share their mobile connections and act as hosts for other users, who act as clients in the hosts’ vicinity. By providing UPN services, users in the proposed scheme can benefit from sharing leftover resources with other users. To share users’ connectivity and resources both fairly and efficiently, we need a new control paradigm. Nowadays, the game-theoretic approach is widely recognized as a practical perspective for implementing real-world network operations. Game theory is the study of strategic interactions between multiple intelligent, rational decision makers, each trying to maximize the expected value of his own payoff, measured on some utility scale [8]. Motivated by the UPN situation, in which users are rational individuals able to make control decisions logically in pursuit of their own interests, we adopt a game-theoretic mechanism. In this way, we can ease the heavy computational burden of theoretically optimal centralized solutions.
To model the interaction among multiple hosts and clients, the Stackelberg game is a suitable model; it was initially proposed by the German economist H. von Stackelberg in 1934 to explain certain economic monopolization phenomena. In a classical Stackelberg game, one player acts as the leader and the rest as followers, and the main goal is to find an optimal strategy for the leader, assuming that the followers react rationally by optimizing their objective functions given the leader’s actions [8, 9]. In this study, we devise a new Stackelberg game, called the position changeable Stackelberg (PCS) game, to adapt effectively to the UPN situation. In the PCS game, each individual user can dynamically be a leader, i.e., a host, or a follower, i.e., a client, to maximize his payoff. Based on his current position, each user learns the UPN condition and dynamically selects his strategy.
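To make the leader-follower logic concrete, the following toy sketch (not the paper’s model) shows backward induction in a pricing Stackelberg game: the leader anticipates the follower’s best response to each candidate price and picks the revenue-maximizing one. The quadratic follower cost is an assumption for illustration only.

```python
def follower_best_response(price, value=10.0):
    """Follower chooses quantity q maximizing value*q - price*q - q**2
    (an assumed quadratic cost), giving q* = (value - price) / 2,
    floored at zero."""
    return max(0.0, (value - price) / 2)

def stackelberg_leader_price(prices, value=10.0):
    """Backward induction: the leader evaluates its revenue price * q*(price)
    under the follower's anticipated best response and picks the best price."""
    return max(prices, key=lambda p: p * follower_best_response(p, value))
```

With these assumed parameters, the leader’s optimum over the price grid 1..9 is 5: a lower price forgoes margin, a higher price suppresses the follower’s demand.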
The success of UPN systems relies on users’ willingness to contribute their Internet connectivity and network resources [10]. For practical UPN operations, it is important to design effective incentive mechanisms that encourage hosts to participate in UPN services. For example, the Karma UPN system rewards UPN hosts who share their connectivity with free data quota, and the Open Garden UPN system uses a distributed bargaining-based virtual currency approach to share resources fairly and efficiently. However, these prior studies focused neither on interactions between a particular host and his clients nor on dynamic UPN structures with changeable topologies [10].
In the proposed scheme, hosts collect a UPN service fee from their clients; this fee serves as an incentive for each host. To decide the best price strategy, each host uses a reinforcement learning algorithm in a distributed manner. By taking into account the temporal and spatial UPN system status, an individual host makes intelligent decisions. Based on the PCS game and reinforcement learning, our proposed UPN control scheme mainly considers three design and operational issues: (i) which position is better for each user in the current UPN topology, (ii) what is an appropriate incentive for each host, and (iii) how users select their strategies under imperfect information. To address these issues effectively, we focus on design principles such as feasibility, self-adaptability, and effectiveness in providing a desirable solution. Although several UPN control schemes have been proposed, no systematic study based on realistic UPN scenarios has been conducted. To the best of our knowledge, there has been very little research on UPN control problems that integrates game theory and reinforcement learning algorithms.
Contribution
This study generalizes the UPN control algorithm in the following aspects. First, users in the UPN can dynamically decide their positions. To model the interaction among position changeable users, we employ the PCS game. Second, the host’s pricing policy for UPN services is developed as an incentive mechanism. Third, a novel price decision process is designed under practical assumptions. Using a new reinforcement learning approach, hosts select their price strategies through individual and social learning. Finally, a fair and balanced solution can be obtained under diversified UPN situations. In summary, the main contributions are as follows:

Position changeable Stackelberg game model: motivated by diversified UPN situations, we introduce a new game model while capturing a variety of system characteristics. This approach is generic and applicable to various UPN scenarios.

Well-balanced network performance: we model the interaction of users by considering the responsive trade-off between hosts and clients. Through a feedback-based interactive process, we thoroughly examine the impact of the host’s price strategy and provide an effective incentive mechanism for attractive UPN performance.

Implementation practicality: as game players, users learn how to modify their prior knowledge and select their strategies with bounded rationality. Therefore, the decision mechanism is implemented with low complexity. It is practical and suitable for realworld UPN operations.

Solution concept: the main idea of our PCS game lies in its responsiveness to the reciprocal combination of optimality and practicality. Players act with self-adaptability and real-time effectiveness. Our solution concept is to approximate an efficient UPN status using individual and social learning approaches.

Conclusions: numerical study shows that our game-based approach can increase bandwidth utilization, service QoE, and users’ profit by 5 to 10% under different service request rates, compared with the existing IDME [10] and MGDT [11] schemes.
Organization
The remainder of this article is organized as follows. In the next section, we review some related UPN schemes and their problems. In Section 3, we provide a detailed description of the proposed PCS game model and UPN control algorithms. In particular, this section provides fresh insights into the benefits and design of PCS gamebased UPN control approach. For convenience, the main steps of the proposed scheme are then listed. In Section 4, we validate the performance of the proposed scheme by means of a comparison with some existing methods. Finally, we present our conclusion and discuss the remaining open challenges in this area along with possible solutions.
Related work
There has been considerable research into the design of UPN control schemes. Sofia et al. [6] first defined the concept of UPN, in which end users can be consumers and providers of Internet access at the same time. The authors characterized UPNs and compared their connectivity features against ad hoc and multihop networks. To clarify the main differences from other autonomic networks, the article investigated a paradigm shift in Internet services and wholesale models [6].
The article [4] surveyed recent technical developments in the fields of QoE and UPN and then proposed a new UPN-based framework for mobile networks. In particular, the importance of end users in the QoE provisioning chain was highlighted. To illustrate the proposed UPN-based networks, a case study was conducted with the proposed adaptive resource scheduling method for QoE improvement. Possible challenges and future research were also discussed [4].
Iosifidis et al. [12] analyzed the design challenges of incentive mechanisms for encouraging user engagement in UPNs. Motivated by recently launched business models, they discussed the technical issues pertaining to resource allocation for UPN services and explained the importance of incorporating incentive schemes. They then analyzed mobile UPN models inspired by Open Garden and Karma. Finally, the challenges in designing incentive mechanisms for these services were discussed, along with two appropriate solutions [12].
Gao et al. [13] designed an optimal pricing and reimbursing strategy for a mobile virtual network operator that maximized its total revenue while providing the necessary incentives to hosts. They employed a game-theoretic analysis and modeled the interaction between the operator and the hosts as a two-stage leader-follower Stackelberg game. In the first stage, the operator decided the price and the free-data-quota reimbursing plan. In the second stage, every host decided how much data to consume for its own needs and how much data to forward for clients. Finally, they systematically analyzed the game equilibrium, which can capture a variety of system characteristics, including the service types of hosts and clients, the capacities of hosts’ Internet connections, and the energy consumption patterns of hosts [13].
The Matching Game-based Data Trading (MGDT) scheme [11] considered a new operator-supervised UPN based on the matching game model. In this scheme, users shared connectivity and acted as access points for other users. To incentivize user participation in the UPN, the MGDT scheme allowed users to trade their data plans and profit by selling and buying leftover data capacity from each other. To formulate the buyer-seller data trading process, a many-to-one matching game was designed, and a distributed algorithm was developed that enabled users to self-organize into a stable matching. By combining matching theory and market equilibrium, the MGDT scheme ensured dynamic adaptation of price to data demand and supply and calculated the operator’s gains and the benchmark price that would encourage users to join the UPN [11].
The Incentive Design and Market Evolution (IDME) scheme [10] studied user membership evolution in an operator-assisted UPN and the operator’s best strategy to maximize its profit. Simply put, the two key questions in the IDME scheme were (i) what is the best membership choice for each mobile user and (ii) what is the operator’s best pricing strategy. To answer these two questions effectively, the IDME scheme formulated the interaction between the operator and mobile devices as a two-stage game. In stage I, the operator determined the usage-based pricing and quota-based incentive mechanism for data usage. In stage II, the mobile devices decided whether to be hosts or clients [10]. These earlier studies [4, 6, 10,11,12,13] have attracted considerable attention while introducing unique challenges in handling UPN control problems. In this paper, we demonstrate that our proposed scheme significantly outperforms the existing IDME [10] and MGDT [11] schemes.
The proposed PCS game-based UPN algorithms
In this section, we provide a brief introduction to our PCS game model, which forms the theoretical basis of the proposed UPN control scheme. By adopting a new PCS game-based approach, we design effective control protocols that adapt to dynamically changing UPN environments.
PCS game model and players’ utility functions
Currently, there are several UPN models that differ in the architectures and services they offer to users. However, one common aspect of these models is that a UPN consists of several network agents organized in a mesh topology, where all agents cooperate with each other to improve network performance. Generally, there are two types of agents: base stations, i.e., eNBs, and UEs. The function of eNBs is to support Internet connectivity by acting as gateways to the backbone network. UEs participate in UPN services to improve traffic management and reduce energy consumption. The main idea of the UPN is to exploit the diversity of UEs’ needs, resources, and Internet connectivity by building an autonomous intervention mechanism.
We consider the operation of a multi-cellular network that consists of a set B = {B_1, …, B_n} of eNBs and a set G = {G_1, …, G_y} of UEs. Some UEs may be within range of an eNB B ∈ B, while others may not. Each eNB has a set of bandwidth resource blocks that can be allocated to UEs in each time period. We assume that UEs may change their locations and hence have different neighbors in different time periods. Such a mobility assumption allows us to obtain closed-form expressions and gain valuable insights into the design of UPNs.
Different UEs are in different situations. In our model, there are two types of UEs in the UPN, i.e., hosts and clients. A host can connect to the eNB directly and share his Internet connection with clients; hosts therefore serve as temporary gateways for clients. A client can connect to the eNB directly for his own service or connect to the eNB via a nearby host. The host’s goal is to maximize his profit, which depends on the revenue collected from his clients, and the client’s objective is likewise to maximize his profit, which depends on the outcome of UPN services. According to their current status, users dynamically select their positions, and their decisions are adaptively changeable over time. In this study, the UPN control scheme is formulated as a new PCS game model. As game players, UEs select their positions and strategies based on the interactions of other UEs. At each time period of gameplay, we formally define the PCS game model \( {\mathbb{G}}^{PCS} \) = {B, G, \( \left\{{\boldsymbol{S}}_h^G,{\boldsymbol{S}}_c^G\right\} \), \( \left\{{U}_h^G,{U}_c^G\right\} \), \( {L}_G^s \), T} as follows:

B = {B_1, …, B_n} represents the set of eNBs, and each B has orthogonal bandwidth blocks.

G = {G_1, …, G_y} is the finite set of PCS game players, i.e., UEs. G can be divided into three subsets G^h, G^c, and G^n: G^h is the subset of hosts \( {G}_k^h\in {\mathbf{G}}^{\boldsymbol{h}}\subset \mathbf{G} \), G^c is the subset of clients \( {G}_i^c\in {\mathbf{G}}^{\boldsymbol{c}}\subset \mathbf{G} \), and G^n is the subset of UEs that temporarily give up service requests \( {G}_l^n\in {\mathbf{G}}^{\boldsymbol{n}}\subset \mathbf{G} \).

In \( \left\{{\boldsymbol{S}}_h^G,{\boldsymbol{S}}_c^G\right\} \), \( {\boldsymbol{S}}_h^G=\left\{{s}_{\mathrm{min}}^h\dots {s}_i^h\dots {s}_{\mathrm{max}}^h\right\} \) is the set of the host’s strategies, where \( {s}_i^h\in {\boldsymbol{S}}_h^G \) denotes the i-th price level for clients’ communications. \( {\boldsymbol{S}}_c^G \) is the set of the client’s available strategies; each client has three strategy options: (i) select one host among his neighboring hosts for relay communications, (ii) directly connect to his corresponding eNB, or (iii) become a member of G^n.

\( {U}_h^G \) is the payoff received by the host, and \( {U}_c^G \) is the payoff received by the client during the UPN operation.

\( {L}_G^s \) is the learning value for the host’s strategy s; it is used to estimate the probability distribution (P^G) for the next strategy selection.

T = {H_1, …, H_t, H_{t+1}, …} denotes time, represented as a sequence of time steps with imperfect information for the PCS game process.
Table 1 lists the notations used in this paper.
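For reference, the elements of the PCS game tuple can be organized as simple data structures. The Python sketch below is purely illustrative; the class and field names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from enum import Enum

class Position(Enum):
    HOST = "G^h"    # shares its eNB connection with nearby clients
    CLIENT = "G^c"  # buys connectivity from a host or the eNB
    IDLE = "G^n"    # temporarily gives up service requests

@dataclass
class UE:
    """One PCS game player (an element of G); field names are assumptions."""
    uid: str
    position: Position = Position.IDLE
    price_strategies: list = field(default_factory=list)  # S_h^G price levels
    learning_values: dict = field(default_factory=dict)   # L value per strategy

@dataclass
class ENB:
    """One base station (an element of B) with orthogonal bandwidth blocks."""
    bid: str
    bandwidth_blocks: int = 0
```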
In the UPN system, eNBs charge their subscribers, i.e., hosts and clients, based on a usage-based price mechanism. If unexpected traffic growth develops at a specific eNB, it may create local traffic congestion. To alleviate this kind of traffic overload, the current price (P_B) should increase. If few subscribers access an eNB, the current price should decrease to attract subscribers to participate in UPN services. In the same manner, hosts charge their clients based on the current traffic load. Given the current price, individual clients act independently and selfishly; all game players select their strategies to maximize their payoffs.
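A minimal sketch of this congestion-responsive rule, assuming the price is simply the minimum charge α plus the current utilization term Θ (matching the example eNB price strategy [α + Θ^B(H_t)] given below); the function name and the use of raw utilization as the surcharge are assumptions.

```python
def enb_price(alpha, utilization):
    """Usage-based eNB price: minimum charge alpha plus a congestion term
    that grows with the current bandwidth utilization (0 <= Theta <= 1),
    so the price rises under overload and falls when the cell is idle."""
    return alpha + utilization
```

A congested cell (Θ = 0.8) is thus priced higher than an idle one (Θ = 0.2), nudging subscribers toward less loaded cells.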
Each host can connect to the eNB anytime via his device. At the same time, this device can serve as a gateway providing Internet connections for nearby clients. Therefore, the communication resource assigned to a host can be shared with clients, and clients pay a price for their UPN services. Depending on the payment, cost, and expense functions of UPN services, the utility function of the host \( {G}_k^h \) at time H_t, \( {U}_h^{G_k^h}\left({H}_t\right) \), can be defined as follows:
where \( {P}_h^{G_k^h}\left(\cdotp \right) \) is the price function for clients at time H_t, and \( {C}_h^{G_k^h}\left(\cdotp \right) \) and \( {D}_h^{G_k^h}\left(\cdotp \right) \) are the cost and expense functions of \( {G}_k^h \), respectively, for the UPN services. \( {\boldsymbol{N}}_{G_k^h} \) is the set of \( {G}_k^h \)’s neighboring hosts, and \( {r}_c^{G_i^c} \) is the bandwidth amount of \( {G}_i^c \)’s UPN service. \( {F}^s\left({G}_k^h,{H}_t\right) \) is \( {G}_k^h \)’s UPN price strategy per bit at time H_t, where \( {F}^s\left({G}_k^h,{H}_t\right)\in {\boldsymbol{S}}_h^{G_k^h} \). Let α and Θ^B(H_t) denote the connected eNB’s minimum charge and bandwidth utilization at time H_t, respectively. \( {m}_h^{G_k^h} \) is \( {G}_k^h \)’s expense factor per bit of UPN services.
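Since Eq. (1) itself is not reproduced above, the following sketch only illustrates its structure: revenue from clients at the per-bit price F^s, minus the usage-based charge paid to the eNB and the per-bit expense m_h. The linear forms of all three terms are assumptions, not the paper’s exact equation.

```python
def host_utility(price_per_bit, client_demands, enb_price_per_bit, m_h):
    """Illustrative host payoff: income collected from clients minus the
    eNB charge and the operating expense, all assumed linear in the total
    relayed bandwidth (a hedged stand-in for Eq. (1))."""
    served = sum(client_demands)          # total client bandwidth (r_c terms)
    revenue = price_per_bit * served      # payment P_h from clients
    cost = enb_price_per_bit * served     # cost C_h paid to the eNB
    expense = m_h * served                # expense D_h, e.g., energy per bit
    return revenue - cost - expense
```

A host is profitable only while its price strategy exceeds the eNB charge plus its per-bit expense, which is exactly the trade-off the learning mechanism below tunes.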
Each client can connect to the backbone Internet directly via the eNB or indirectly via a nearby host. If there are multiple hosts within his coverage area, the client selects one of them to maximize his payoff. If \( {G}_k^h \) is selected as the host, the utility function of the client \( {G}_i^c \) at time H_t, \( {U}_c^{G_i^c}\left({G}_k^h,{H}_t\right) \), can be defined as follows:
where \( {Q}_c^{G_i^c}\left(\cdotp \right) \), \( {I}_c^{G_i^c}\left(\cdotp \right) \), and \( {K}_c^{G_i^c}\left(\cdotp \right) \) are the outcome, charge, and expense functions of \( {G}_i^c \), respectively. Let \( {\psi}_c^{G_i^c} \) and \( {\zeta}_c^{G_i^c} \) denote \( {G}_i^c \)’s satisfaction and expense factors per bit of UPN service, respectively. PS_B(H_t) represents the corresponding eNB’s price strategy at time H_t, e.g., [α + Θ^B(H_t)].
As discussed previously, a UE can take one of three positions: host (G^h), client (G^c), or member of G^n. At each time step, UEs try to decide their best positions. Hence, each UE selects the position Χ that maximizes his payoff (U), where Χ ∈ G^h ∪ G^c ∪ G^n.
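The position decision can be sketched as a simple comparison of candidate payoffs, a hedged stand-in for Eqs. (2)-(3): the UE evaluates being a client of each neighboring host, connecting directly at the eNB price, or joining G^n with zero payoff. The linear per-bit outcome (ψ), charge, and expense (ζ) terms are assumptions.

```python
def best_position(demand, psi, zeta, direct_price, host_prices):
    """Pick the payoff-maximizing option among: idle (G^n, payoff 0),
    direct eNB connection, or each neighboring host. All payoffs use an
    assumed linear form (psi - price - zeta) * demand."""
    def client_payoff(price):
        return (psi - price - zeta) * demand

    options = {"idle": 0.0, "direct": client_payoff(direct_price)}
    for host_id, price in host_prices.items():
        options[host_id] = client_payoff(price)
    return max(options, key=options.get)
```

Note how a UE drops into G^n whenever every connection option yields a negative payoff, which is exactly the "temporarily give up service requests" behavior in the model.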
Strategy selection mechanisms in the PCS game
During UPN operation, eNBs have to manage radio resources to optimize all UEs’ communications. Therefore, UPN performance depends on the radio resource management algorithm and its implementation. The resource allocation process in each eNB is performed by the radio resource management (RRM) entity, which dynamically distributes radio resources to each active UE in its coverage area. The amount of radio resource allocated is specified in terms of resource units (RUs), where one RU is the minimum allocation amount, e.g., 128 Mbps. The RRM must be able to meet UEs’ requirements while maximizing resource usage in a flat radio network structure [14].
In this study, we use the main concept of the Nash bargaining solution (NBS) to allocate RUs. The NBS is an effective tool for achieving a mutually desirable solution with a good balance between efficiency and fairness [8]. However, in many cases, the assumptions of the traditional NBS are unrealistic in real-life environments. In particular, if we consider a scenario of dynamic UPNs, the classical NBS cannot be directly applied to the RU allocation problem. To avoid the inevitable burden of uncertainty, it is necessary to adopt a new NBS approach that adapts to current network changes.
In consideration of UEs’ mobility, we develop an iterative resource allocation algorithm, formulated in a sequential Nash bargaining manner. The main concept of sequential Nash bargaining is to observe the current system environment and dynamically update the NBS to adapt to network dynamics. This approach relaxes the traditional NBS assumption that all information is completely known. Therefore, during the step-by-step iteration, the eNB adaptively assigns RUs to all resource-requesting UEs. In B_j, the RU amount allocated to G_k at time H_t, \( {\phi}_{H_t}^{G_k}\left({B}_j\right) \), is defined as follows:
where \( {\vartheta}_{H_t}^{G_k} \) is the amount of RUs requested by G_k at time H_t and \( {R}_{H_t}^{B_j} \) is the available RU amount of B_j. A(B_j) is the set of UEs that currently request new RUs from B_j. \( {\gamma}_{H_t}^{G_l} \) is G_l’s bargaining power at time H_t, i.e., his relative ability to exert influence over other UEs. Usually, the bargaining solution depends strongly on the bargaining powers: if different bargaining powers are used, a UE with a higher bargaining power obtains more RUs. In the proposed algorithm, the bargaining powers are determined by the UEs’ mobility. Let \( {\mathcal{E}}_{H_t}^{G_l} \) be the entropy for G_l at time H_t. The basic concept of entropy is uncertainty, a measure of the disorder in a system [15,16,17]. To evaluate G_l’s stability in the UPN, \( {\mathcal{E}}_{H_t}^{G_l} \) represents the topological change, which is a natural quantification of the effect of G_l’s mobility in the UPN; it is calculated as follows:
where ∆_H is a time interval, \( {F}_{G_l} \) denotes the set of neighboring UEs of G_l, and \( C\left({F}_{G_l}\right) \) is the cardinality (degree) of the set \( {F}_{G_l} \). O(G_l, G_f) represents a measure of the relative mobility between G_l and G_f. Recently, sensor technology has developed rapidly, and on-board units (OBUs) are now embedded in most UEs [18]. OBUs integrate an event data recorder, a global positioning system, forward and backward radar, sensing facilities, and a short-range wireless interface. Using OBUs, each UE can easily estimate the relative mobility of other UEs.
The entropy \( {\mathcal{E}}_{H_t}^{G_l} \) is normalized as \( 0\le {\mathcal{E}}_{H_t}^{G_l}\le 1 \). If the value of \( {\mathcal{E}}_{H_t}^{G_l} \) is close to 1 (or close to 0), G_l is stable (or unstable) [16]. Finally, G_l’s bargaining power at time H_t, \( {\gamma}_{H_t}^{G_l} \), is calculated as follows:
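Because Eqs. (4)-(6) are not reproduced above, the sketch below stands in for them with a deliberately simplified rule: when demand exceeds capacity, RUs are split in proportion to bargaining powers, and the bargaining power is taken directly as the normalized entropy (stable UEs get more weight). Both simplifications are assumptions about the exact bargaining forms.

```python
def allocate_rus(requests, entropies, available_rus):
    """Sequential-bargaining-style RU split (a stand-in for Eq. (4)):
    serve everyone when capacity suffices; otherwise share capacity in
    proportion to entropy-derived bargaining powers (a stand-in for
    Eqs. (5)-(6)), capped at each UE's own request."""
    total = sum(requests.values())
    if total <= available_rus:
        return dict(requests)                 # no contention: full allocation
    powers = {u: entropies[u] for u in requests}   # gamma ~ entropy (assumed)
    psum = sum(powers.values())
    return {u: min(requests[u], available_rus * powers[u] / psum)
            for u in requests}
```

Under contention, a stable UE (entropy near 1) thus receives a proportionally larger share than a highly mobile one, reflecting the bargaining-power design above.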
As a host, G^h’s strategy (S^h) represents the price level for clients’ communications. Therefore, the payoff of \( {G}^h \), \( {U}_h^{G^h} \), can be defined as a function of S^h and the requested client services \( {r}_c^{G^c} \). Interactions between G^h and G^c continue repeatedly during the UPN process over time. Therefore, G^h should consider the reactions of G^c when determining his strategy S^h. Under an incomplete and asymmetric information situation, G^h should learn from the current environment, build knowledge, and ultimately make strategy decisions that maximize his payoff. To date, several learning algorithms have been developed for learning from dynamic environments. However, traditional learning algorithms are not sufficient to follow rapid UPN changes; in particular, slow reaction to system fluctuations is their main drawback.
In this study, we develop a new learning algorithm to decide the host’s price policy for UPN services effectively. Usually, learning is divided into two categories: individual learning and social learning. Individual learning refers to trial-and-error or insight-based temporal learning, and social learning refers to spatial learning through interactions with other individuals. The main novelty of our learning method is its joint design combining the individual and social learning approaches. If the strategy \( {S}_j^h \) is selected at time H_{t−1} by \( {G}_k^h \) in B_g’s coverage area, \( {G}_k^h \) updates the strategy \( {S}_j^h \)’s learning value for the next time step, \( {L}_{G_k^h}^{s_j^h}\left({H}_t\right) \), as follows:
where x is a learning rate that models how the L values are updated, and Λ(B_g) is the set of hosts connected to B_g. ‖B_g‖ is the cardinality of the set Λ(B_g). In Eq. (7), v and w represent the individual and social learning values, respectively, and β is a control factor for the weighted average between the two learning approaches. When UE mobility in \( {G}_k^h \)’s coverage area is high, we place more emphasis on social learning, i.e., on w; in this case, a higher value of β is more suitable. When UE mobility in \( {G}_k^h \)’s coverage area is low, the L(·) value should depend strongly on other UEs’ responses in \( {G}_k^h \)’s area; in this case, a lower value of β is preferable to emphasize individual learning, i.e., v. In the proposed scheme, the value of β is dynamically adjusted based on the current entropy average of the UEs covered by \( {G}_k^h \); that is, β is the arithmetic mean of each UE’s entropy, \( {\mathcal{E}}_{H_{t-1}}^G \), according to Eq. (5). Therefore, we can learn the best strategy at both the individual and social levels under incomplete information about UPN conditions.
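Eq. (7) itself is not reproduced above; the sketch below assumes it is an exponentially weighted update whose target blends an individual term v (the host’s own payoff feedback) and a social term w (the mean L value of the same strategy at the other hosts in Λ(B_g)), weighted by β. The exact functional form is an assumption.

```python
def update_learning_value(l_own, own_payoff, neighbor_ls, x, beta):
    """Blend individual (v) and social (w) learning signals with weight beta,
    then move the stored L value toward the blend at learning rate x
    (an assumed stand-in for Eq. (7))."""
    v = own_payoff                               # individual learning signal
    w = sum(neighbor_ls) / len(neighbor_ls)      # social signal: neighbors' L
    target = (1 - beta) * v + beta * w           # high beta => trust neighbors
    return (1 - x) * l_own + x * target
```

Setting β to the mean neighborhood entropy, as the scheme prescribes, makes a host lean on its neighbors’ experience exactly when its own locality is too volatile to trust.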
Based on the L(·) values, a strategy selection distribution (P) for each host can be defined. During the UPN process, we determine \( {\mathbf{P}}^{G_k^h}\left({H}_t\right)=\left\{{p}_{s_{\mathrm{min}}^h}^{G_k^h}\left({H}_t\right)\dots {p}_{s_j^h}^{G_k^h}\left({H}_t\right)\dots {p}_{s_{\mathrm{max}}^h}^{G_k^h}\left({H}_t\right)\right\} \) as the probability distribution of \( {G}_k^h \)’s strategy selection at time H_t; it is sequentially modified over time. In this study, \( {p}_s^G\left(\bullet \right) \) is defined based on the Boltzmann distribution, which has been used extensively in various machine learning algorithms. Finally, the probability that \( {G}_k^h \) selects strategy \( {s}_j^h \) at time H_t, \( {\mathcal{P}}_{s_j^h}^{G_k^h}\left({H}_t\right) \), is given by:
where \( {U}_h^{G_k^h}\left({H}_{t-1},{s}_j^h\right) \) is \( {G}_k^h \)’s payoff with the strategy \( {s}_j^h \) at time H_{t−1}. The min{max(⋅)} term is used to exclude non-profitable strategies from selection at time H_t.
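The Boltzmann selection rule of Eq. (8) can be sketched as a standard softmax over the learning values; the temperature parameter and the omission of the min{max(·)} exclusion step are simplifications of the paper’s exact formula.

```python
import math

def strategy_probabilities(l_values, temperature=1.0):
    """Softmax (Boltzmann) distribution over strategy learning values:
    higher-valued price levels are selected more often, while a nonzero
    probability of exploring every strategy is preserved."""
    exps = [math.exp(l / temperature) for l in l_values]
    z = sum(exps)
    return [e / z for e in exps]
```

Raising the temperature flattens the distribution (more exploration); lowering it concentrates probability on the best-performing price level.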
Main steps of proposed UPN control algorithm
In this paper, we propose a novel UPN control scheme based on the PCS game, implemented as a distributed, dynamic repeated game for opportunistic UPN operations. In the proposed scheme, individual UEs are game players who attempt to maximize their payoffs through a step-by-step interactive game process. Therefore, UEs can learn the current UPN situation and determine their best positions and strategies. Generally, well-known solution concepts in game theory are presented as closed-form expressions under complete information; however, they cannot capture the adaptation of UPN operations over time.
Usually, control algorithms for classical optimization problems have exponential time complexity. Furthermore, they have mostly been developed in static settings, making them impractical to implement in realistic system operations. In this study, we do not try to obtain an optimal solution based on the traditional optimization approach; instead, we propose an interactive game model. Using feedback-based self-monitoring and distributed learning techniques, control decisions are made dynamically while adapting to the current UPN situation. This decision mechanism is implemented with polynomial complexity. In addition, we can transfer the computational burden from a central system to the UEs in a distributed online fashion. Therefore, it is a suitable approach for real-world UPN systems from the viewpoint of practical operation. The main steps of the proposed scheme are described as follows (see Fig. 1).

Step 1: Control parameters are determined by the simulation scenario (see Table 2).

Step 2: At the initial time, the L learning values in each host UE are equally distributed. This initial guess ensures that each host’s price strategies enjoy the same benefit at the beginning of the PCS game.

Step 3: During the PCS game, UEs move freely among the eNBs’ coverage areas, and each UE constantly checks the connection availability to his corresponding eNB.

Step 4: At every time step (H), each individual UE estimates his available payoff for each position and independently decides his position among {G ^{h} ∪ G ^{c} ∪ G ^{n}} to maximize his payoff according to Eqs. (1)–(3).

Step 5: If a UE can directly connect to his corresponding eNB, he monitors UPN service requests from his neighboring client UEs. Otherwise, the UE requests UPN services from his neighboring host UEs.

Step 6: When the total bandwidth requested by UEs exceeds the eNB’s capacity, the eNB distributes its available bandwidth through the sequential NBS according to (4). By considering each UE’s stability, the bargaining powers (γ) of UEs are adaptively adjusted using (5)–(6).
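A weighted proportional split gives a feel for how Step 6 behaves when demand exceeds capacity. This is an illustrative stand-in, not the paper's sequential NBS of Eqs. (4)–(6); the function name and the single-pass redistribution are assumptions.

```python
def allocate_bandwidth(capacity, requests, powers):
    """Split eNB capacity among UEs in proportion to bargaining power.

    `requests[i]` is UE i's requested bandwidth and `powers[i]` its
    (positive) bargaining power gamma; no UE receives more than it asked
    for. Illustrative stand-in for the paper's sequential NBS.
    """
    total_power = sum(powers)
    alloc = [min(r, capacity * g / total_power)
             for r, g in zip(requests, powers)]
    # Single-pass redistribution of capacity freed up by capped UEs.
    leftover = capacity - sum(alloc)
    unsat = [i for i, (a, r) in enumerate(zip(alloc, requests)) if a < r]
    if leftover > 1e-12 and unsat:
        w = sum(powers[i] for i in unsat)
        for i in unsat:
            alloc[i] = min(requests[i], alloc[i] + leftover * powers[i] / w)
    return alloc
```

With equal powers the split degenerates to an equal share; raising one UE's gamma shifts bandwidth toward it, which is how the adaptive adjustment in (5)–(6) rewards stable UEs.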

Step 7: During each time period (H), a host UE (G ^{h}) dynamically calculates the current learning value L(·) for each strategy S ^{h} according to (7). From the individual and social learning viewpoints, the L(·) values effectively reflect the UPN’s temporal and spatial situations.

Step 8: The probability distribution (\( {\mathbf{P}}^{G^h} \)) for each G ^{h}’s strategy selection is dynamically adjusted based on the obtained learning values (L(·)) using Eq. (8).

Step 9: Based on the interactive feedback process, the dynamics of our PCS game can cause a cascade of interactions among the UEs. As game players, UEs dynamically choose their best positions and strategies in an online, distributed fashion.

Step 10: Under the dynamic UPN environment, individual UEs constantly self-monitor in preparation for the next game round; go to Step 3.
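The per-UE portion of the steps above can be condensed into a short control-loop sketch. All helper names, the string constants, and the payoff inputs are placeholders introduced here, not the paper's API; payoff estimation (Eqs. (1)–(3)) and learning-value updates (Eq. (7)) are assumed to happen outside this fragment.

```python
import random

HOST, CLIENT, NONPARTICIPANT = "host", "client", "non-participant"

def choose_position(payoffs):
    """Step 4: pick the position among {G^h, G^c, G^n} with the highest
    estimated payoff (payoff estimation itself is outside this sketch)."""
    return max(payoffs, key=payoffs.get)

def play_round(ue_state, payoffs, rng=None):
    """One PCS-game round for a single UE (Steps 4 and 7-8, illustrative)."""
    rng = rng or random.Random()
    ue_state["position"] = choose_position(payoffs)
    if ue_state["position"] == HOST:
        # Steps 7-8: re-sample a price strategy from the learned
        # Boltzmann distribution over the host's strategy set.
        strategies, weights = zip(*ue_state["strategy_dist"].items())
        ue_state["price"] = rng.choices(strategies, weights=weights)[0]
    return ue_state
```

Repeating `play_round` for every UE at each time step H, with payoffs re-estimated from feedback, reproduces the interactive, self-monitoring loop of Steps 3–10.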
Performance evaluation
In this section, we evaluate the performance of our proposed scheme and compare it with that of the IDME [10] and MGDT [11] schemes; the simulation results confirm the advantage of the proposed approach. Alternative performance analysis methods, such as theoretical or numerical analysis, are limited in scope because they offer little capability for modeling dynamic behavior. For complex algorithms such as our proposed scheme, theoretical and numerical models become tractable only through many simplifications, and thus cannot provide a precise performance evaluation. In contrast, simulation allows more complex and realistic modeling of a real-world system. Therefore, we evaluate our UPN scheme by simulation. To ensure a fair comparison, the following assumptions and system scenario were used:

The simulated system consists of 50 eNBs and 1000 UEs. The bandwidth capacity of each eNB is 100 Gbps, and the bandwidth resource blocks of each eNB are orthogonal to each other.

UEs can travel in one of six directions with equal probability. We consider four user velocities, selected at random: fast (120 km/h), medium (60 km/h), slow (20 km/h), and stationary (0 km/h).

Based on its speed, a UE’s residence time in an eNB area is estimated as 30 s for a fast UE, 60 s for a medium-speed UE, and 180 s for a slow UE; for a stationary UE, it equals the service duration time.

According to the UE’s characteristics, new service requests are generated by a Poisson process with rate λ (services/s), which is varied from 0 to 3.

There are eight different service types, and each UE randomly generates service requests among them.

To represent various application services, the eight traffic types are distinguished by connection duration, bandwidth requirement, and required QoE; they are generated with equal probability.

The durations of service applications are exponentially distributed.

The price strategies in \( {\boldsymbol{S}}_h^{\mathfrak{I}} \) are defined as \( {s}_{\min =1}^h \)= 1, \( {s}_2^h \)= 1.2, \( {s}_3^h \)= 1.4, \( {s}_4^h \)= 1.6, \( {s}_5^h \)= 1.8, and \( {s}_{\max =6}^h \)= 2; the communication resource unit is 1 bps.

System performance measures obtained on the basis of 100 simulation runs are plotted as functions of the service request generation rate.

For simplicity, we assume the absence of physical obstacles and interference in the experiments.
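Under the stated assumptions, a UE's mobility and traffic profile might be generated as follows. The numeric values come from the list above; the function names, the dictionary layout, and the 300 s example duration are assumptions for illustration.

```python
import random

SPEEDS_KMH = (120, 60, 20, 0)              # fast / medium / slow / stationary
RESIDENCE_S = {120: 30, 60: 60, 20: 180}   # eNB residence time by speed

def next_request_interval(lam, rng):
    """Poisson arrivals with rate lam (services/s) imply exponentially
    distributed inter-request gaps; lam = 0 means no new requests."""
    return rng.expovariate(lam) if lam > 0 else float("inf")

def make_ue(service_duration_s, rng):
    """Draw one UE profile per the stated simulation assumptions."""
    speed = rng.choice(SPEEDS_KMH)         # four velocities, equal probability
    # A stationary UE's residence time equals the service duration.
    residence = RESIDENCE_S.get(speed, service_duration_s)
    service_type = rng.randrange(8)        # eight traffic types, equal prob.
    return {"speed": speed, "residence_s": residence,
            "service_type": service_type}
```

Sampling `make_ue` 1000 times and driving arrivals with `next_request_interval` for λ between 0 and 3 reproduces the scenario's load range.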
To demonstrate the validity of our proposed method, we measured the bandwidth utilization, QoE of the service success ratio, delay rate, system throughput, and normalized UEs’ profit. Table 2 lists the major system control parameters used in the simulation, which facilitated the development and implementation of our simulator.
Figure 2 compares the bandwidth utilization of each scheme. In this study, the bandwidth utilization is measured as the percentage of bandwidth actually used; it is a key factor in estimating resource usability in UPN systems. All schemes exhibit a similar trend; however, the proposed scheme outperforms the existing methods from low to high service loads. By using the dynamic PCS game model, UEs in our scheme adaptively select their positions and strategies, which improves bandwidth utilization compared with the other schemes.
Figure 3 compares the QoE in the UPN system. In this study, we develop a QoE model using the popular sigmoid function with respect to service provision, delay, and system throughput [19, 20], that is,
where κ is the service success ratio, τ is the delay rate, and \( \mathcal{z} \) is the UPN system throughput; κ, τ, and \( \mathcal{z} \) are normalized values constraining the quantization of QoE. The QoE gain achieved by the proposed scheme results from its self-adaptability and real-time effectiveness: all UEs make control decisions strategically to ensure their services. For this reason, the proposed scheme attains a superior QoE to the other existing schemes.
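A sigmoid-shaped aggregation of the three normalized metrics could look like the sketch below. The exact functional form, the equal weighting, and the steepness and midpoint parameters are assumptions; the paper's actual QoE equation is not reproduced here.

```python
import math

def sigmoid(x, steepness=10.0, midpoint=0.5):
    """Map a normalized metric in [0, 1] to an S-shaped satisfaction score."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))

def qoe(kappa, tau, z):
    """Toy QoE combining success ratio (kappa), delay rate (tau, lower is
    better, hence inverted), and throughput (z), all normalized to [0, 1].
    Illustrative form only, not the paper's QoE equation."""
    return (sigmoid(kappa) + sigmoid(1.0 - tau) + sigmoid(z)) / 3.0
```

The sigmoid captures the usual QoE intuition: user satisfaction saturates at both extremes and is most sensitive around a mid-range quality threshold.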
The curves in Fig. 4 indicate the normalized UEs’ profit in the UPN system. As game players, all UEs adaptively select their positions in a distributed online manner. From the viewpoint of a host, the main goal is to adaptively decide the price strategy that maximizes his total revenue; from a client’s perspective, the main concern is to choose the best connection to maximize his payoff. Through an interactive feedback mechanism, hosts and clients learn the current UPN environment and attempt to improve their profits. At every game period, our learning-based approach provides synergistic and complementary features that adapt to dynamic UPN situations.
The simulation results shown in Figs. 2, 3, and 4 demonstrate that the proposed scheme, which uses a learning-based PCS game model, can monitor the current UPN conditions and adapt to highly dynamic system situations. In particular, all UEs in our approach gain real-time information from the UPN environment and make intelligent decisions in a self-adapting manner. The simulation results indicate that the proposed scheme attains an attractive UPN performance, something that the IDME [10] and MGDT [11] schemes cannot offer.
Conclusions
Today, technology enables widespread communications without depending on traditional network structures. From a global perspective, UPN has tremendous potential and is introducing a paradigm shift in network services by allowing the end user to be a host or a client. In this article, we have proposed a new UPN control scheme based on the PCS game model. As game players, UEs decide their positions by considering their mutual-interaction relationships and learn their strategies under dynamic UPN environments. Using feedback-based self-monitoring and distributed learning techniques, game players dynamically adapt to the current UPN situation and effectively maximize their expected benefits. Owing to the unique features of UPNs, our proposed scheme can provide satisfactory services under incomplete information conditions. To demonstrate the validity of our approach, we compared our scheme with existing schemes; our simulation analysis indicates that our approach can outperform them in a simulation environment.
There are many fascinating directions for future work. Our next step is to develop a new mechanism design that provides adaptive incentives to hosts; it will differ from classical mechanism design by adopting distributional assumptions about the agents, and, to be a viable solution, it must also consider computational constraints. Another interesting direction is to study the impact of users’ social relationships in a multi-client multi-host UPN model. In addition, our future studies will address optimality issues in the UPN system from the operator’s perspective.
Abbreviations
 5G:

5th Generation
 IDME:

Incentive Design and Market Evolution
 M2M:

Machine-to-machine
 MGDT:

Matching GameBased Data Trading
 NBS:

Nash bargaining solution
 OBUs:

On-board units
 PCS:

Position changeable Stackelberg
 QoE:

Quality of experience
 QoS:

Quality of Service
 RRM:

Radio resource management
 RUs:

Resource units
 UEs:

User equipments
 UPN:

User-provided network
References
M Dighriri, ASD Alfoudi, GM Lee, T Baker, Data Traffic Model in Machine to Machine Communications over 5G Network Slicing, IEEE DeSE’2016 (2016), pp. 239–244
KC Chen, Machine-to-machine communications for healthcare. JCSE 6(2), 119–126 (2012)
Y Wang, P Li, L Jiao, S Zhou, N Cheng, XS Shen, P Zhang, A data-driven architecture for personalized QoE management in 5G wireless networks. IEEE Wirel. Commun. 24(1), 102–110 (2017)
Y Wang, X Lin, User-provided networking for QoE provisioning in mobile networks. IEEE Wirel. Commun. 22(4), 26–33 (2015)
KS Kim, S Uno, MW Kim, Adaptive QoS mechanism for wireless mobile network. JCSE 4(2), 153–172 (2010)
R Sofia, P Mendes, User-provided networks: consumer as provider. IEEE Commun. Mag. 46(12), 86–91 (2008)
B Lorenzo, F Gomez-Cuba, J Garcia-Rois, FJ Gonzalez-Castano, JC Burguillo, A Microeconomic Approach to Data Trading in User Provided Networks, IEEE GLOBECOM’2015 (2015), pp. 1–7
S Kim, Game Theory Applications in Network Design (IGI Global, Hershey, 2014)
S Kim, Multi-leader multi-follower Stackelberg model for cognitive radio spectrum sharing scheme. Comput. Netw. 56(17), 3682–3692 (2012)
MM Khalili, L Gao, J Huang, BH Khalaj, Incentive Design and Market Evolution of Mobile User-Provided Networks, IEEE INFOCOM’2015 (2015), pp. 498–503
B Lorenzo, FJ Gonzalez-Castano, A Matching Game for Data Trading in Operator-Supervised User-Provided Networks, IEEE ICC’2016 (2016), pp. 1–7
G Iosifidis, G Lin, J Huang, L Tassiulas, Incentive mechanisms for user-provided networks. IEEE Commun. Mag. 52(9), 20–27 (2014)
G Lin, G Iosifidis, J Huang, L Tassiulas, Hybrid Data Pricing for Network-Assisted User-Provided Connectivity, IEEE INFOCOM’2014 (2014), pp. 682–690
IW Mustika, I Nurcahyani, Proportional Fairness with Adaptive Bandwidth Allocation for Video Service in Downlink LTE, IEEE COMNESTAT’2015 (2015), pp. 54–59
MS Jeon, SK Kim, JH Yoon, JY Lee, SB Yang, A direction entropy-based forwarding scheme in an opportunistic network. JCSE 8(3), 173–179 (2014)
S Kim, Adaptive MANET multipath routing algorithm based on the simulated annealing approach. Sci. World J. 2014, 1–9 (2014)
CT Hieu, CS Hong, A connection entropy-based multi-rate routing protocol for mobile ad hoc networks. JCSE 4(3), 225–239 (2010)
Y Wang, Y Liu, J Zhang, H Ye, Z Tan, Cooperative store–carry–forward scheme for intermittently connected vehicular networks. IEEE Trans. Veh. Technol. 66(1), 777–784 (2017)
JY Kim, I Bang, DK Sung, Y Yi, BH Kim, Design of a Multi-Variable QoE Function Based on the Remaining Battery Energy, IEEE PIMRC’2015 (2015), pp. 976–980
CK Tham, T Luo, Quality of contributed service and market equilibrium for participatory sensing. IEEE Trans. Mob. Comput. 14(4), 829–842 (2015)
Acknowledgements
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP20172014000636) supervised by the IITP (Institute for Information & Communications Technology Promotion) and was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF2015R1D1A1A01060835).
Funding
This study is funded by the MSIT (Ministry of Science and ICT) and the National Research Foundation of Korea (NRF).
Availability of data and materials
Please contact the corresponding author at swkim01@sogang.ac.kr.
Ethics declarations
Author’s information
Sungwook Kim received his BS and MS degrees in computer science from Sogang University, Seoul, in 1993 and 1995, respectively, and his PhD degree in computer science from Syracuse University, Syracuse, New York, in 2003, supervised by Prof. Pramod K. Varshney. He is currently a professor in the Department of Computer Science & Engineering and the research director of the Internet Communication Control research laboratory (ICC Lab.). His current research interests are game theory and network design applications.
Competing interests
The author declares no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Kim, S. Dynamic position changeable Stackelberg game for user-provided network control algorithms. J Wireless Com Network 2017, 214 (2017). https://doi.org/10.1186/s13638-017-1004-2
Keywords
 User-provided network
 Position changeable Stackelberg game
 Quality of experience
 Reinforcement learning
 Game theory