
Dynamic position changeable Stackelberg game for user-provided network control algorithms


Nowadays, wireless mobile services are undergoing a paradigm shift for three reasons: (i) the increasing need for ubiquitous connectivity, (ii) an unprecedented volume of mobile data traffic, and (iii) technologically advanced mobile devices with enhanced capabilities. To address these challenges, the user-provided network (UPN) is an emerging technology that can be extensively deployed to substantially improve cellular coverage and capacity. In this study, we design a novel UPN control scheme based on game theory. Motivated by the Stackelberg game model, the proposed scheme allows mobile devices to play changeable roles, i.e., hosts or clients, to improve UPN system performance, which makes it well suited to dynamically changing UPN environments. Operating in a decentralized, individually non-cooperative manner, the scheme captures the dynamics of the UPN system and offers a step toward future networks. Simulations and performance analysis verify the efficiency of the proposed scheme, showing that it outperforms existing schemes in terms of bandwidth utilization, quality of experience (QoE) of service success, delay and throughput ratios, and normalized users’ profit.

1 Introduction

Today, we are witnessing technological advances that herald a new era in telecommunication networks. With widespread wireless techniques and a variety of user-friendly terminals, cellular networks are expected to face new challenges. Existing cellular systems might run out of capacity in the near future owing to rapidly increasing machine-to-machine (M2M) data traffic with diverse service requirements [1, 2]. With the emergence of a variety of new wireless network paradigms, a new era of personalized services that emphasizes users’ quality of experience (QoE) is envisioned. Fifth-generation (5G) network standards are a promising technology for supporting users’ QoE, which will be one of the key features of future networks [3].

QoE is a measure of customers’ experiences with services. It covers the entire service experience and is therefore a more holistic evaluation than the narrower notion of user experience. Traditionally, quality of service (QoS) parameters have been used to evaluate network-oriented service quality; QoE expands this horizon to capture people’s esthetic and even hedonic needs. QoE thus emphasizes end-to-end performance from both subjective and objective perspectives and keeps user perception in the loop [4, 5]. Although the QoE concept is easily motivated, supporting it in device-to-device, ultra-reliable, and massive machine-type 5G networks remains a difficult and complex problem.

Within the past few years, the increasing demand for mobile data in cellular networks and the proliferation of advanced handheld devices have led to a new network paradigm known as the user-provided network (UPN). The concept of UPN was first defined by R. Sofia as “such a networking technology where the end-user is, at the same time a customer and a provider of network access” [6]. Owing to socio-technological advances, mobile devices often have surplus capacity and can act as “micro-providers” that serve nearby users without additional network infrastructure. In particular, the rise of social networks encourages end users to become micro-providers and spread common interests through shared connectivity [4].

Recently, two UPN models have been implemented through different approaches. The first enables mobile devices to form a mesh network and share their Internet connections without the intervention of network operators. The second adopts a virtual mobile operator that enables its subscribers to act as mobile Wi-Fi hotspots that serve non-subscribers, in return for some free data quota. Open Garden and Karma are well-known startup examples of these two methods, respectively [7]. Although the two methods use different control mechanisms, there is a growing consensus that UPN operations rely on the participation of self-organizing user equipment (UEs). A central issue in UPNs is therefore that users must agree to serve each other. However, participating users have conflicting interests, and existing UPN methods lack proper control algorithms to coordinate these conflicts [8].

In this paper, we design a novel UPN control scheme in which individual users share their mobile connections and act as hosts for other users, who act as clients in the hosts’ vicinity. Through UPN services, users in the proposed scheme benefit from sharing leftover resources with one another. Sharing users’ connectivity and resources both fairly and efficiently requires a new control paradigm. The game-theoretic approach is now widely recognized as a practical perspective for implementing real-world network operations. Game theory is the study of strategic interactions among multiple intelligent, rational decision makers, each trying to maximize the expected value of its own payoff measured on some utility scale [8]. Because UPN users are rational individuals who make control decisions logically to pursue their own interests, we adopt a game-theoretic mechanism. This approach eases the heavy computational burden of theoretically optimal centralized solutions.

The Stackelberg game is a suitable model for the interaction among multiple hosts and clients. It was initially proposed by the German economist H. von Stackelberg in 1934 to explain certain economic monopolization phenomena. In a classical Stackelberg game, one player acts as the leader and the rest as followers; the main goal is to find an optimal strategy for the leader, assuming that the followers react rationally by optimizing their own objective functions given the leader’s actions [8, 9]. In this study, we devise a new Stackelberg game, called the position changeable Stackelberg (PCS) game, that adapts effectively to the UPN situation. In the PCS game, each user can dynamically become a leader, i.e., a host, or a follower, i.e., a client, to maximize his payoff. Based on his current position, each user learns the UPN condition and dynamically selects his strategy.

The success of UPN systems relies on users’ willingness to contribute their Internet connectivity and network resources [10]. For practical UPN operations, it is important to design effective incentive mechanisms that encourage hosts to participate in UPN services. For example, the Karma UPN system rewards hosts who share their connectivity with free data quota, and the Open Garden UPN system uses a distributed, bargaining-based virtual currency to share resources fairly and efficiently. However, these prior studies focused neither on the interactions between a particular host and his clients nor on dynamic UPN structures with changeable topologies [10].

In the proposed scheme, hosts collect a UPN service fee from their clients, which serves as each host’s incentive. To decide the best price strategy, each host runs a reinforcement learning algorithm in a distributed manner and makes intelligent decisions by taking the temporal and spatial UPN system status into account. Based on the PCS game and reinforcement learning, the proposed UPN control scheme addresses three design and operational issues: (i) which position is better for each user in the current UPN topology, (ii) what incentive is appropriate for each host, and (iii) how users select their strategies with imperfect information. To address these issues, we focus on design principles such as feasibility, self-adaptability, and effectiveness. Although several UPN control schemes have been proposed, no systematic study based on a realistic UPN scenario has been conducted. To the best of our knowledge, very little research has addressed UPN control problems by integrating game theory and reinforcement learning.

1.1 Contribution

This study generalizes UPN control algorithms in the following aspects. First, users in the UPN can dynamically decide their positions; to model the interaction among position changeable users, we employ the PCS game. Second, the host’s pricing policy for UPN services is developed as an incentive mechanism. Third, a novel price decision process is designed under practical assumptions: using a new reinforcement learning approach, hosts select their price strategies through individual and social learning. Finally, a fair, well-balanced solution is obtained under diversified UPN situations. In summary, the main contributions are as follows:

  • Position changeable Stackelberg game model: motivated by diversified UPN situations, we introduce a new game model while capturing a variety of system characteristics. This approach is generic and applicable to various UPN scenarios.

  • Well-balanced network performance: we model the interaction of users by considering the responsive tradeoff between hosts and clients. Through a feedback-based interactive process, we analyze the impact of the host’s price strategy and provide an effective incentive mechanism for attractive UPN performance.

  • Implementation practicality: as game players, users learn how to modify their prior knowledge and select their strategies with bounded rationality. Therefore, the decision mechanism is implemented with low complexity. It is practical and suitable for real-world UPN operations.

  • Solution concept: the main idea of our PCS game lies in its responsiveness to the reciprocal combination of optimality and practicality. Players act with self-adaptability and real-time effectiveness. The concept of our solution is to approximate an efficient UPN status using individual and social learning approaches.

  • Conclusions: numerical results show that our game-based approach increases bandwidth utilization, service QoE, and users’ profit by 5 to 10% under different service request rates, compared with the existing IDME [10] and MGDT [11] schemes.

1.2 Organization

The remainder of this article is organized as follows. In the next section, we review some related UPN schemes and their problems. In Section 3, we provide a detailed description of the proposed PCS game model and UPN control algorithms. In particular, this section provides fresh insights into the benefits and design of PCS game-based UPN control approach. For convenience, the main steps of the proposed scheme are then listed. In Section 4, we validate the performance of the proposed scheme by means of a comparison with some existing methods. Finally, we present our conclusion and discuss the remaining open challenges in this area along with possible solutions.

2 Related work

There has been considerable research into the design of UPN control schemes. Sofia et al. [6] first defined the concept of UPN, where end users can be consumers and providers of Internet access at the same time. The authors characterized UPNs and compared their connectivity features with those of ad hoc and multi-hop networks. To clarify the main differences from other autonomic networks, the article also investigated a paradigm shift in Internet services and wholesale models [6].

The article [4] surveyed recent technical developments in the fields of QoE and UPN. After that, a new UPN-based framework in mobile networks was proposed. In particular, the importance of end users in the QoE provisioning chain was highlighted. In order to aid descriptions of the proposed UPN-based networks, a case study was conducted with the proposed adaptive resource scheduling method for QoE improvement. Some possible challenges and future research were also discussed [4].

Iosifidis et al. [12] analyzed the design challenges of incentive mechanisms that encourage user engagement in UPNs. Motivated by recently launched business models, they discussed the technical issues of resource allocation for UPN services and explained the importance of incorporating incentive schemes. They then analyzed mobile UPN models inspired by Open Garden and Karma. Finally, they discussed the challenges in designing incentive mechanisms for these services, along with two appropriate solutions [12].

Gao et al. [13] designed an optimal pricing and reimbursing strategy for a mobile virtual network operator that maximized its total revenue while considering the necessary incentives to the operator. They employed a game-theoretic analysis and modeled the interaction between the operator and the hosts as a two-stage leader-follower Stackelberg game. In the first stage, the operator decided the price and the free data quota reimbursing plan. In the second stage, every host decided how much data to consume for its own needs and how much to forward for clients. Finally, they systematically analyzed the game equilibrium, which captured a variety of system characteristics, including the service types of hosts or clients, the capacities of hosts’ Internet connections, and the energy consumption patterns of hosts [13].

The Matching Game based Data Trading (MGDT) scheme [11] considered a new operator-supervised UPN scheme based on the matching game model. In this scheme, users shared connectivity and acted as access points for other users. To incentivize user participation in the UPN, the MGDT scheme allowed users to trade their data plans and obtain a profit by selling and buying leftover data capacity from each other. To formulate the buyer-seller data trading process, a many-to-one matching game was designed, together with a distributed algorithm that enabled users to self-organize into a stable matching. By combining matching theory and market equilibrium, the MGDT scheme ensured dynamic adaptation of price to data demand and supply, and calculated the operator’s gains and the benchmark price that would encourage users to join the UPN [11].

The Incentive Design and Market Evolution (IDME) scheme [10] studied the user membership evolution in an operator-assisted UPN and the operator’s best strategy to maximize its profit. The two key questions in the IDME scheme were (i) what the best membership choice was for each mobile user and (ii) what the operator’s best pricing strategy was. To answer these questions, the IDME scheme formulated the interaction between the operator and mobile devices as a two-stage game. In stage I, the operator determined the usage-based pricing and the quota-based incentive mechanism for data usage. In stage II, the mobile devices decided whether to be hosts or clients [10]. These earlier studies [4, 6, 10,11,12,13] have attracted considerable attention while introducing unique challenges in handling UPN control problems. In this paper, we demonstrate that the proposed scheme significantly outperforms the existing IDME [10] and MGDT [11] schemes.

3 The proposed PCS game-based UPN algorithms

In this section, we briefly introduce the PCS game model, which forms the theoretical basis of the proposed UPN control scheme. Using this PCS game-based approach, we design effective control protocols that adapt to dynamically changing UPN environments.

3.1 PCS game model and players’ utility functions

Currently, several UPN models exist that differ in their architectures and in the services they offer to users. However, these models share a common aspect: a UPN consists of several network agents organized in a mesh topology, where all agents cooperate with each other to improve network performance. Generally, there are two types of agents: base stations, i.e., eNBs, and UEs. eNBs provide Internet connectivity by acting as gateways to backbone networks. UEs participate in UPN services to improve traffic management and reduce energy consumption. The main idea of UPN is to exploit the diversity of UEs’ needs, resources, and Internet connectivity by building an autonomous control mechanism.

We consider a multicellular network that consists of a set \( \mathbf{B}=\{B_1,\dots,B_n\} \) of eNBs and a set \( \mathbf{G}=\{G_1,\dots,G_y\} \) of UEs. Some UEs may be within range of an eNB \( B\in\mathbf{B} \), while others may not. Each eNB has a set of bandwidth resource blocks that can be allocated to UEs in each time period. We assume that UEs may change their locations and hence have different neighbors in different time periods. This mobility assumption allows us to obtain closed-form expressions and gain valuable insights into UPN design.

Different UEs are in different situations. In our model, there are two types of UEs in the UPN: hosts and clients. A host connects to the eNB directly and shares his Internet connection with clients; hosts therefore serve as temporary gateways for clients. A client either connects to the eNB directly, only for his own service, or connects to the eNB via a nearby host. The host’s goal is to maximize his profit, which depends on the revenue collected from his clients, and the client’s goal is likewise to maximize his profit, which depends on the outcome of UPN services. According to their current status, UEs dynamically select their positions, and these decisions adaptively change over time. In this study, the UPN control problem is formulated as a new PCS game model in which UEs, as game players, select their positions and strategies based on their interactions with other UEs. At each time period of gameplay, we formally define the PCS game model \( {\mathbb{G}}^{PCS} \) = {B, G, \( \left\{{\boldsymbol{S}}_h^G,{\boldsymbol{S}}_c^G\right\} \), \( \left\{{U}_h^G,{U}_c^G\right\} \), \( {L}_G^{\mathfrak{P}} \), T} as follows:

  • \( \mathbf{B}=\{B_1,\dots,B_n\} \) represents the set of eNBs, and each \( B_j\in\mathbf{B} \) has orthogonal bandwidth blocks.

  • \( \mathbf{G}=\{G_1,\dots,G_y\} \) is the finite set of PCS game players, i.e., UEs. \( \mathbf{G} \) can be divided into three subsets \( \mathbf{G}^h \), \( \mathbf{G}^c \), and \( \mathbf{G}^n \): \( \mathbf{G}^h \) is the subset of hosts \( {G}_k^h\in {\mathbf{G}}^{\boldsymbol{h}}\subset \mathbf{G} \), \( \mathbf{G}^c \) is the subset of clients \( {G}_i^c\in {\mathbf{G}}^{\boldsymbol{c}}\subset \mathbf{G} \), and \( \mathbf{G}^n \) is the subset of UEs that temporarily give up their service requests \( {G}_l^n\in {\mathbf{G}}^{\boldsymbol{n}}\subset \mathbf{G} \).

  • In \( \left\{{\boldsymbol{S}}_h^G,{\boldsymbol{S}}_c^G\right\} \), \( {\boldsymbol{S}}_h^G=\left\{{s}_{\mathrm{min}}^h\dots {s}_i^h\dots {s}_{\mathrm{max}}^h\right\} \) is the set of host strategies, where \( {s}_i^h\in {\boldsymbol{S}}_h^G \) denotes the i th price level for clients’ communications. \( {\boldsymbol{S}}_c^G \) is the set of the client’s available strategies; each client has three options: (i) select one of his neighboring hosts for relay communications, (ii) connect directly to his corresponding eNB, or (iii) become a member of \( \mathbf{G}^n \).

  • \( {U}_h^G \) is the payoff received by the host, and \( {U}_c^G \) is the payoff received by the client during the UPN operation.

  • \( {L}_G^s \) is the learning value for the host’s strategy s; \( {L}_G^s \) is used to estimate the probability distribution \( \mathbf{P}^G \) for the next strategy selection.

  • \( \mathbf{T}=\{H_1,\dots,H_t,H_{t+1},\dots\} \) denotes time, represented as a sequence of time steps with imperfect information for the PCS game process.

Table 1 lists the notations used in this paper.

Table 1 Parameters used in the proposed algorithm

In the UPN system, eNBs charge their subscribers, i.e., hosts and clients, based on a usage-based price mechanism. If traffic grows unexpectedly at a specific eNB, local congestion may occur; to alleviate such overload, the current price \( P_B \) should increase. If few subscribers access an eNB, the current price should decrease to attract subscribers to participate in UPN services. In the same manner, hosts charge their clients based on the current traffic load. Given the current prices, individual clients act independently and selfishly, and all game players select their strategies to maximize their payoffs.

Each host can connect to the eNB at any time via his device. At the same time, this device can act as a gateway that provides Internet connections for nearby clients. The communication resource assigned to a host can therefore be shared with clients, who pay a price for their UPN services. Based on the price, cost, and expense functions of UPN services, the utility function of the host \( {G}_k^h \) at time \( H_t \), \( {U}_h^{G_k^h}\left({H}_t\right) \), is defined as follows:

$$ {U}_h^{G_k^h}\left({H}_t\right)=\sum \limits_{G_i^c\in {\boldsymbol{N}}_{G_k^h}}\left({P}_h^{G_k^h}\left({r}_c^{G_i^c}\left({H}_t\right)\right)-{C}_h^{G_k^h}\left({\Theta}_{G_k^h}^B\left({H}_t\right),{r}_c^{G_i^c}\left({H}_t\right)\right)-{D}_h^{G_k^h}\left({r}_c^{G_i^c}\left({H}_t\right)\right)\right) $$
$$ \mathrm{s}.\mathrm{t}.,\left\{\begin{array}{l}{P}_h^{G_k^h}\left({r}_c^{G_i^c}\left({H}_t\right)\right)={F}^s\left({G}_k^h,{H}_t\right)\times {r}_c^{G_i^c}\left({H}_t\right)\\ {}{C}_h^{G_k^h}\left({\Theta}_{G_k^h}^B\left({H}_t\right),{r}_c^{G_i^c}\left({H}_t\right)\right)=\left[\alpha +{\Theta}^B\left({H}_t\right)\right]\times {r}_c^{G_i^c}\left({H}_t\right)\\ {}{D}_h^{G_k^h}\left({r}_c^{G_i^c}\left({H}_t\right)\right)={m}_h^{G_k^h}\times {r}_c^{G_i^c}\left({H}_t\right)\ \end{array}\right. $$

where \( {P}_h^{G_k^h}\left(\cdotp \right) \) is the price function charged to clients at time \( H_t \), and \( {C}_h^{G_k^h}\left(\cdotp \right) \) and \( {D}_h^{G_k^h}\left(\cdotp \right) \) are the cost and expense functions of \( {G}_k^h \), respectively, for UPN services. \( {\boldsymbol{N}}_{G_k^h} \) is the set of \( {G}_k^h \)’s neighboring clients, and \( {r}_c^{G_i^c} \) is the bandwidth amount of \( {G}_i^c \)’s UPN service. \( {F}^s\left({G}_k^h,{H}_t\right) \) is \( {G}_k^h \)’s UPN price strategy per bit at time \( H_t \), where \( {F}^s\left({G}_k^h,{H}_t\right)\in {\boldsymbol{S}}_h^{G_k^h} \). Let α and \( \Theta^B(H_t) \) denote the connected eNB’s minimum charge and bandwidth utilization at time \( H_t \), respectively. \( {m}_h^{G_k^h} \) is \( {G}_k^h \)’s expense factor per bit of UPN services.
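As a rough illustration of the host utility above, the following Python sketch evaluates \( {U}_h^{G_k^h}(H_t) \) for one time step. It is a minimal sketch, assuming that the host’s client set and the per-client bandwidth amounts are already known and that all quantities share one arbitrary monetary unit per bit; the function and parameter names are hypothetical and not part of the original scheme.

```python
from typing import Sequence


def host_utility(price_per_bit: float, client_rates: Sequence[float],
                 alpha: float, theta_b: float, expense_factor: float) -> float:
    """Host payoff U_h^{G_k^h}(H_t) for a single time step.

    price_per_bit:  F^s(G_k^h, H_t), the host's chosen price level
    client_rates:   r_c^{G_i^c}(H_t) for each client the host serves
    alpha:          minimum charge of the connected eNB
    theta_b:        bandwidth utilization Theta^B(H_t) of that eNB
    expense_factor: m_h^{G_k^h}, the host's own expense per bit
    """
    total = 0.0
    for r in client_rates:
        revenue = price_per_bit * r       # P_h: fee paid by the client
        cost = (alpha + theta_b) * r      # C_h: charge paid to the eNB
        expense = expense_factor * r      # D_h: host's own expense
        total += revenue - cost - expense
    return total


print(host_utility(2.0, [10, 5], alpha=0.5, theta_b=0.25, expense_factor=0.25))
# 15.0
```

Because the price, cost, and expense terms are all linear in \( r_c^{G_i^c} \), the payoff reduces to the per-bit margin \( F^s-\alpha-\Theta^B-m_h \) times the total bandwidth served, so a price below \( \alpha+\Theta^B+m_h \) yields a loss.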

Each client can connect to the backbone Internet directly via the eNB or indirectly via a nearby host. If there are multiple hosts in his coverage area, the client selects the one that maximizes his payoff. If \( {G}_k^h \) is selected as the host, the utility function of the client \( {G}_i^c \) at time \( H_t \), \( {U}_c^{G_i^c}\left({G}_k^h,{H}_t\right) \), is defined as follows:

$$ {U}_c^{G_i^c}\left({G}_k^h,{H}_t\right)={Q}_c^{G_i^c}\left({r}_c^{G_i^c}\left({H}_t\right)\right)-{I}_c^{G_i^c}\left({G}_k^h,{r}_c^{G_i^c}\left({H}_t\right)\right)-{K}_c^{G_i^c}\left({r}_c^{G_i^c}\left({H}_t\right)\right) $$
$$ \mathrm{s}.\mathrm{t}.,\left\{\begin{array}{l}{Q}_c^{G_i^c}\left({r}_c^{G_i^c}\left({H}_t\right)\right)={\psi}_c^{G_i^c}\times {r}_c^{G_i^c}\left({H}_t\right)\\ {}{I}_c^{G_i^c}\left({G}_k^h,\kern0.5em {r}_c^{G_i^c}\left({H}_t\right)\right)\\ {}=\mathbf{\min}\left\{\mathbf{\min}\left\{{G}_k^h\in {\boldsymbol{N}}_{G_i^c}|{F}^s\left({G}_k^h,{H}_t\right)\right\},{PS}_B\left({H}_t\right)\right\}\times {r}_c^{G_i^c}\left({H}_t\right)\\ {}{K}_c^{G_i^c}\left({r}_c^{G_i^c}\left({H}_t\right)\right)={\zeta}_c^{G_i^c}\times {r}_c^{G_i^c}\left({H}_t\right)\end{array}\right. $$

where \( {Q}_c^{G_i^c}\left(\cdotp \right) \), \( {I}_c^{G_i^c}\left(\cdotp \right) \), and \( {K}_c^{G_i^c}\left(\cdotp \right) \) are the outcome, charge, and expense functions of \( {G}_i^c \), respectively. Let \( {\psi}_c^{G_i^c} \) and \( {\zeta}_c^{G_i^c} \) denote \( {G}_i^c \)’s satisfaction and expense factors per bit of UPN service, respectively. \( PS_B(H_t) \) represents the corresponding eNB’s price strategy at time \( H_t \), e.g., \( \alpha+\Theta^B(H_t) \).
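The client’s charge term picks the cheaper of the best neighboring host price and the eNB price. The sketch below is a minimal illustration, assuming that the client already knows the per-bit prices advertised by its neighboring hosts; it computes \( {U}_c^{G_i^c} \) and reports which option was chosen. The function name, the tie-breaking rule (the eNB is preferred when prices are equal), and the use of \( PS_B(H_t)=\alpha+\Theta^B(H_t) \) in the example are illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple


def client_utility(rate: float, satisfaction: float, expense_factor: float,
                   host_prices: Sequence[float], enb_price: float
                   ) -> Tuple[float, Optional[int]]:
    """Return (U_c, choice): choice is the index of the selected host,
    or None when the client connects directly to the eNB.

    rate:           r_c^{G_i^c}(H_t), bandwidth of the client's UPN service
    satisfaction:   psi_c^{G_i^c}, satisfaction per bit
    expense_factor: zeta_c^{G_i^c}, the client's own expense per bit
    host_prices:    F^s(G_k^h, H_t) for each neighboring host in N_{G_i^c}
    enb_price:      PS_B(H_t), the eNB's per-bit price
    """
    # Index of the cheapest neighboring host, or None if no host is in range.
    best_host = min(range(len(host_prices)),
                    key=lambda k: host_prices[k], default=None)
    # Charge per bit = min{min host price, eNB price}; ties go to the eNB.
    if best_host is not None and host_prices[best_host] < enb_price:
        price, choice = host_prices[best_host], best_host
    else:
        price, choice = enb_price, None
    outcome = satisfaction * rate     # Q_c
    charge = price * rate             # I_c
    expense = expense_factor * rate   # K_c
    return outcome - charge - expense, choice


# Two hosts in range; eNB price alpha + Theta^B = 0.5 + 1.25.
print(client_utility(10, 3.0, 0.5, [2.0, 1.5], 0.5 + 1.25))
# (10.0, 1)
```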

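As discussed previously, a UE can take one of three positions: host (\( \mathbf{G}^h \)), client (\( \mathbf{G}^c \)), or member of \( \mathbf{G}^n \). At each time step, every UE tries to decide his best position; that is, each UE selects the position \( \mathrm{X} \) that maximizes his payoff U, where \( \mathrm{X}\in\mathbf{G}^h\cup\mathbf{G}^c\cup\mathbf{G}^n \).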

$$ \mathrm{X}=\arg \underset{\boldsymbol{X} \in {\boldsymbol{G}}^{\boldsymbol{h}}\cup {\boldsymbol{G}}^{\boldsymbol{c}}\cup {\boldsymbol{G}}^{\boldsymbol{n}}}{\mathbf{\max}\ }\left\{\ {U}_h^{\boldsymbol{X} \in {\boldsymbol{G}}^{\boldsymbol{h}}},{U}_c^{\boldsymbol{X} \in {\boldsymbol{G}}^{\boldsymbol{c}}},{U}_c^{\boldsymbol{X} \in {\boldsymbol{G}}^{\boldsymbol{n}}}\ \right\} $$
$$ \mathrm{s}.\mathrm{t}.,{U}_h^{\mathrm{X}\in {\mathbf{G}}^{\boldsymbol{h}}}={U}_h^{G^h}\left({H}_t\right),{U}_c^{\mathrm{X}\in {\mathbf{G}}^{\boldsymbol{c}}}={U}_c^{G^c}\left({G}^h,{H}_t\right)\kern0.5em \mathrm{and}\kern0.5em {U}_c^{\mathrm{X}\in {\mathbf{G}}^{\boldsymbol{n}}}=0 $$
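The position rule above compares three payoffs, with a non-participating UE in \( \mathbf{G}^n \) receiving zero. The following is a minimal Python sketch, assuming that the candidate host and client payoffs have already been evaluated (for example, with the utility functions defined earlier); the label strings and the tie-breaking order (host, then client, then non-participant) are hypothetical choices, not part of the original scheme.

```python
def select_position(u_host: float, u_client: float) -> str:
    """Pick the position with the largest payoff; G^n always yields 0."""
    payoffs = {
        "host": u_host,          # U_h^{X in G^h}
        "client": u_client,      # U_c^{X in G^c}
        "non-participant": 0.0,  # U_c^{X in G^n} = 0
    }
    # max() returns the first key with the largest payoff, so ties are
    # resolved in the insertion order above.
    return max(payoffs, key=payoffs.get)


print(select_position(3.0, 5.0))    # client
print(select_position(-1.0, -2.0))  # non-participant
```

Because the non-participant payoff is fixed at zero, a UE withdraws from UPN service only when both candidate payoffs are negative.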

3.2 Strategy selection mechanisms in the PCS game

During UPN operation, eNBs must manage radio resources to optimize all UEs’ communications, so UPN performance depends on the radio resource management algorithm and its implementation. Resource allocation in each eNB is performed by the radio resource management (RRM) entity, which dynamically distributes radio resources to each active UE in its coverage area. Radio resource allocations are specified in resource units (RUs), where one RU is the minimum allocation amount, e.g., 128 Mbps. The RRM must meet UEs’ requirements while maximizing resource usage in a flat radio network structure [14].

In this study, we use the main concept of the Nash bargaining solution (NBS) to allocate RUs. The NBS is an effective tool for achieving a mutually desirable solution with a good balance between efficiency and fairness [8]. However, in many cases, the assumptions of the traditional NBS are unrealistic in real-life environments. In particular, in dynamic UPN scenarios, the classical NBS cannot be directly applied to the RU allocation problem. To cope with the inevitable uncertainty, a new NBS approach that adapts to current network changes is needed.

Considering UEs’ mobility, we develop an iterative resource allocation algorithm formulated as a sequential Nash bargaining process. The main concept of sequential Nash bargaining is to observe the current system environment and dynamically update the NBS to adapt to network dynamics. This approach relaxes the traditional NBS assumption that all information is completely known. During the step-by-step iteration, the eNB adaptively assigns RUs to all UEs that request resources. In \( B_j \), the RU amount allocated to \( G_k \) at time \( H_t \), \( {\phi}_{H_t}^{G_k}\left({B}_j\right) \), is defined as follows:

$$ {\phi}_{H_t}^{G_k}\left({B}_j\right)=\left\{\begin{array}{l}{\vartheta}_{H_t}^{G_k},\kern0.75em \mathrm{if}\ {R}_{H_t}^{B_j}\ge \kern0.5em \sum \limits_{G_l\in \boldsymbol{A}\left({B}_j\right)}{\vartheta}_{H_t}^{G_l}\\ {}\arg \underset{\left[\dots {\vartheta}_{H_t}^{G_l\in \boldsymbol{A}\left({B}_j\right)}\dots \right]}{\mathbf{\max}}\left\{\sum \limits_{G_l\in \boldsymbol{A}\left({B}_j\right)}\left({\gamma}_{H_t}^{G_l}\times \log\ \left({\vartheta}_{H_t}^{G_l}\right)\right)\ \right\},\kern1em \mathrm{otherwise}\\ {}=\arg \underset{\left[\dots {\vartheta}_{H_t}^{G_l\in \boldsymbol{A}\left({B}_j\right)}\dots \right]}{\mathbf{\max}}\left\{\log \prod \limits_{G_l\in \boldsymbol{A}\left({B}_j\right)}{\left({\vartheta}_{H_t}^{G_l}\right)}^{\gamma_{H_t}^{G_l}}\right\},\mathrm{s}.\mathrm{t}.,\kern0.5em \sum \limits_{G_l\in \boldsymbol{A}\left({B}_j\right)}{\vartheta}_{H_t}^{G_l}={R}_{H_t}^{B_j}\ \end{array}\right. $$

where \( {\vartheta}_{H_t}^{G_k} \) is the amount of RU requested by \( G_k \) at time \( H_t \), and \( {R}_{H_t}^{B_j} \) is the available RU amount of \( B_j \). \( \boldsymbol{A}(B_j) \) is the set of UEs that currently request new RUs from \( B_j \). \( {\gamma}_{H_t}^{G_l} \) is \( G_l \)’s bargaining power at time \( H_t \), i.e., its relative ability to exert influence over other UEs. The bargaining solution usually depends strongly on the bargaining powers: when different bargaining powers are used, a UE with a higher bargaining power obtains more RUs. In the proposed algorithm, the bargaining powers are determined by the UEs’ mobility. Let \( {\mathcal{E}}_{H_t}^{G_l} \) be the entropy of \( G_l \) at time \( H_t \). The basic concept of entropy is a measure of uncertainty and disorder in a system [15,16,17]. To evaluate \( G_l \)’s stability in the UPN, \( {\mathcal{E}}_{H_t}^{G_l} \) represents the topological change, which is a natural quantification of the effect of \( G_l \)’s mobility in the UPN; it is calculated as follows:

$$ {\mathcal{E}}_{H_t}^{G_l}=\left(-{\sum}_{G_k\in {F}_{G_l}}\left({P}_{G_k}\left({H}_t,{\Delta }_H\right)\times \mathit{\log}\ {P}_{G_k}\left({H}_t,{\Delta }_H\right)\right)\right)/\left(\mathit{\log}\ C\left({F}_{G_l}\right)\right) $$
$$ \mathrm{s}.\mathrm{t}.,{P}_{G_k}\left({H}_t,{\Delta }_H\right)=O\left({G}_l,{G}_k\right)/{\sum}_{G_f\in {F}_{G_l}}O\left({G}_l,{G}_f\right) $$

where \( \Delta_H \) is a time interval, \( {F}_{G_l} \) denotes the set of \( G_l \)’s neighboring UEs, and \( C\left({F}_{G_l}\right) \) is the cardinality (degree) of \( {F}_{G_l} \). \( O(G_l,G_f) \) measures the relative mobility between \( G_l \) and \( G_f \). Sensor technology has developed rapidly in recent years, and on-board units (OBUs) are now embedded in most UEs [18]. OBUs provide capabilities such as event data recording, global positioning, forward and backward radar, sensing, and a short-range wireless interface. Using OBUs, each UE can easily estimate the relative mobility of other UEs.
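The normalized entropy can be computed directly from the relative mobility values \( O(G_l,G_f) \) that a UE measures for its neighbors over \( \Delta_H \). The Python sketch below is a minimal illustration, assuming that those values are nonnegative and already collected into a list; how \( O(\cdot,\cdot) \) is derived from OBU data is not specified here, and returning 0 when the normalization is undefined (fewer than two neighbors, or no measured mobility) is our own assumption.

```python
import math
from typing import Sequence


def normalized_entropy(relative_mobility: Sequence[float]) -> float:
    """Entropy E^{G_l} normalized by log C(F_{G_l}).

    relative_mobility: O(G_l, G_k) for each neighbor G_k in F_{G_l},
                       measured over the interval Delta_H
    """
    n = len(relative_mobility)              # C(F_{G_l})
    total = sum(relative_mobility)
    if n < 2 or total <= 0:
        # Assumption: log C(F) would be 0 or the P values undefined.
        return 0.0
    h = 0.0
    for o in relative_mobility:
        p = o / total                       # P_{G_k}(H_t, Delta_H)
        if p > 0:                           # 0 * log 0 is taken as 0
            h -= p * math.log(p)
    return h / math.log(n)                  # normalized to [0, 1]
```

A UE whose neighbors all show the same relative mobility obtains the maximum value of 1, whereas a UE dominated by a single neighbor obtains a value near 0.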

The entropy \( {\mathcal{E}}_{H_t}^{G_l} \) is normalized so that \( 0\le {\mathcal{E}}_{H_t}^{G_l}\le 1 \). If \( {\mathcal{E}}_{H_t}^{G_l} \) is close to 1 (or close to 0), \( G_l \) is stable (or unstable) [16]. Finally, \( G_l \)’s bargaining power at time \( H_t \), \( {\gamma}_{H_t}^{G_l} \), is calculated as follows:

$$ {\gamma}_{H_t}^{G_l}=\frac{{\mathcal{E}}_{H_t}^{G_l}}{\sum_{G_k\in \boldsymbol{A}\left({B}_j\right)}{\mathcal{E}}_{H_t}^{G_k}}, $$
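Because the bargaining powers sum to one over \( \boldsymbol{A}(B_j) \), the "otherwise" branch of the RU allocation has a closed form: setting the derivative of the Lagrangian of \( \sum_l \gamma^{G_l}\log\vartheta^{G_l} \) subject to \( \sum_l \vartheta^{G_l}=R^{B_j} \) to zero gives \( \vartheta^{G_l}=\gamma^{G_l}R^{B_j}/\sum_k\gamma^{G_k}=\gamma^{G_l}R^{B_j} \). The Python sketch below combines the bargaining-power equation with this closed form for one eNB and one time step. It is a minimal sketch under stated assumptions: RUs are treated as continuous (no rounding to whole RUs), the equal-power fallback when every entropy is zero is our own choice, and, as in the equation, the bargained share is not capped at a UE’s request.

```python
from typing import Dict


def bargaining_powers(entropy: Dict[str, float]) -> Dict[str, float]:
    """gamma^{G_l} = E^{G_l} / sum_k E^{G_k} over the requesting UEs A(B_j)."""
    total = sum(entropy.values())
    if total <= 0:
        # Assumption: fall back to equal powers if every entropy is zero.
        return {ue: 1.0 / len(entropy) for ue in entropy}
    return {ue: e / total for ue, e in entropy.items()}


def allocate_rus(requests: Dict[str, float], entropy: Dict[str, float],
                 available: float) -> Dict[str, float]:
    """RU allocation phi^{G_k}(B_j) for every UE in A(B_j) at one time step.

    requests:  vartheta^{G_l}, RUs requested by each UE
    entropy:   normalized entropy E^{G_l} of each requesting UE
    available: R^{B_j}, RUs currently available at the eNB
    """
    if available >= sum(requests.values()):
        # Enough capacity: every request is granted in full.
        return dict(requests)
    gamma = bargaining_powers({ue: entropy[ue] for ue in requests})
    # Weighted NBS: maximizing sum(gamma * log(theta)) with sum(theta) = R
    # gives theta_l = gamma_l * R, because the gammas sum to one.
    return {ue: gamma[ue] * available for ue in requests}


print(allocate_rus({"a": 6, "b": 6}, {"a": 0.75, "b": 0.25}, 8))
# {'a': 6.0, 'b': 2.0}
```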

As a host, \( G^h \)’s strategy \( S^h \) represents the price level for clients’ communications. Therefore, the payoff of \( G^h \), \( {U}_h^{G^h} \), can be defined as a function of \( S^h \) and the requested client services \( {r}_c^{G^c} \). Interactions between \( G^h \) and \( G^c \) repeat continuously during the UPN process, so \( G^h \) should consider the reactions of \( G^c \) when determining his strategy \( S^h \). Under incomplete and asymmetric information, \( G^h \) should learn from the current environment, build knowledge, and ultimately make strategy decisions that maximize his payoff. Several learning algorithms have been developed for dynamic environments. However, traditional learning algorithms cannot keep up with rapid UPN changes; in particular, their slow reaction to system fluctuations is a main drawback.

In this study, we develop a new learning algorithm to effectively decide the host’s price policy for UPN services. Learning is usually divided into two categories: individual learning and social learning. Individual learning refers to temporal learning by trial and error or insight, and social learning refers to spatial learning through interactions with other individuals. The main novelty of our learning method is that it jointly designs individual and social learning. If the strategy \( {S}_j^h \) is selected at time \( H_{t-1} \) by \( {G}_k^h \) in \( B_g \)’s coverage area, \( {G}_k^h \) updates strategy \( {S}_j^h \)’s learning value for the next time step, \( {L}_{G_k^h}^{s_j^h}\left({H}_t\right) \), as follows:

$$ {L}_{G_k^h}^{s_j^h}\left({H}_t\right)=\left(1-x\right)\times {L}_{G_k^h}^{s_j^h}\left({H}_{t-1}\right)+x\times \left(v+w\right) $$
$$ \mathrm{where}\quad v=\left(1-\beta \right)\times {U}_h^{G_k^h}\left({H}_{t-1}\right)\quad \mathrm{and}\quad w=\beta \times \sum_{G_i^h\in \boldsymbol{\Lambda} \left({B}_g\right)}\frac{L_{G_i^h}^{s_j^h}\left({H}_{t-1}\right)}{\left\Vert \boldsymbol{\Lambda} \left({B}_g\right)\right\Vert} $$

where x is a learning rate that controls how quickly the L values are updated, \( \boldsymbol{\Lambda}(B_g) \) is the set of hosts connected to \( B_g \), and \( \left\Vert \boldsymbol{\Lambda}(B_g)\right\Vert \) is the cardinality of this set. In Eq. (7), v and w represent the individual and social learning values, respectively, and β is a control factor that weights the two learning approaches. When UE mobility in \( {G}_k^h \)'s coverage area is high, we place more emphasis on social learning, i.e., on w, so a higher value of β is more suitable. When UE mobility in \( {G}_k^h \)'s coverage area is low, the L(·) value should depend strongly on the responses of the UEs in \( {G}_k^h \)'s own area, so a lower value of β is preferable to emphasize individual learning, i.e., v. In the proposed scheme, β is adjusted dynamically according to the current average entropy of the UEs covered by \( {G}_k^h \); that is, β is the arithmetic mean of each UE's entropy \( {\mathcal{E}}_{H_{t-1}}^G \) from Eq. (5). In this way, hosts can learn the best strategy at both the individual and social levels with incomplete information about UPN conditions.
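One application of Eq. (7) for a single strategy can be sketched as follows. This is a minimal sketch, assuming the host's previous payoff and the previous learning values of all hosts in \( \boldsymbol{\Lambda}(B_g) \) are available as plain numbers, and that \( \boldsymbol{\Lambda}(B_g) \) contains at least one host; β is computed as the arithmetic mean of UE entropies exactly as stated above. The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def update_learning_value(
    L_prev: float,
    own_payoff_prev: float,
    hosts_L_prev: Sequence[float],
    x: float,
    beta: float,
) -> float:
    """One application of Eq. (7) for a single strategy s_j^h of host G_k^h.

    L_prev          -- L_{G_k^h}^{s_j^h}(H_{t-1})
    own_payoff_prev -- U_h^{G_k^h}(H_{t-1})
    hosts_L_prev    -- L_{G_i^h}^{s_j^h}(H_{t-1}) for every host G_i^h in Lambda(B_g)
    x               -- learning rate
    beta            -- weight between individual (v) and social (w) learning
    """
    v = (1.0 - beta) * own_payoff_prev                # individual learning value
    w = beta * sum(hosts_L_prev) / len(hosts_L_prev)  # social learning value (mean over Lambda(B_g))
    return (1.0 - x) * L_prev + x * (v + w)


def beta_from_entropies(ue_entropies: Sequence[float]) -> float:
    """beta = arithmetic mean of the entropies of the UEs covered by G_k^h."""
    return mean(ue_entropies)


beta = beta_from_entropies([0.2, 0.4])
L_next = update_learning_value(0.5, 0.8, [0.4, 0.6], x=0.1, beta=beta)
```

With β = 0 the update reduces to pure individual learning on the host's own payoff, and with β = 1 it tracks only the average learning value of the neighboring hosts.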

Based on the L(·) values, a strategy selection distribution P can be defined for each host. During the UPN process, \( {\mathbf{P}}^{G_k^h}\left({H}_t\right)=\left\{{p}_{s_{\mathrm{min}}^h}^{G_k^h}\left({H}_t\right),\dots, {p}_{s_j^h}^{G_k^h}\left({H}_t\right),\dots, {p}_{s_{\mathrm{max}}^h}^{G_k^h}\left({H}_t\right)\right\} \) is the probability distribution of \( {G}_k^h \)'s strategy selection at time \( H_t \); it is modified sequentially over time. In this study, \( {p}_s^G\left(\bullet \right) \) is defined based on the Boltzmann distribution, which has been used extensively in machine learning algorithms. The probability that \( {G}_k^h \) selects strategy \( {s}_j^h \) at time \( H_t \), \( {p}_{s_j^h}^{G_k^h}\left({H}_t\right) \), is given by:

$$ {p}_{s_j^h}^{G_k^h}\left({H}_t\right)=\frac{\exp\left({L}_{G_k^h}^{s_j^h}\left({H}_{t-1}\right)\right)\times \min\left\{\max\left({U}_h^{G_k^h}\left({H}_{t-1},{s}_j^h\right),0\right),1\right\}}{\sum_{i=\min}^{\max}\exp\left({L}_{G_k^h}^{s_i^h}\left({H}_{t-1}\right)\right)\times \min\left\{\max\left({U}_h^{G_k^h}\left({H}_{t-1},{s}_i^h\right),0\right),1\right\}} $$

where \( {U}_h^{G_k^h}\left({H}_{t-1},{s}_j^h\right) \) is \( {G}_k^h \)'s payoff with strategy \( {s}_j^h \) at time \( H_{t-1} \). The min{max(·)} term clips this payoff to [0, 1], so strategies that were non-profitable at time \( H_{t-1} \) receive zero selection probability at time \( H_t \).
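The Boltzmann-style selection rule in Eq. (8) can be sketched as below. This is a minimal illustration, assuming the learning values and payoffs from the previous period are given per strategy; the learning and payoff values in the example are illustrative only, and the uniform fallback when every strategy was non-profitable is our own assumption. Subtracting the maximum L before exponentiating is a standard numerical-stability step that cancels in the ratio, so the probabilities are unchanged.

```python
import math
import random
from typing import List, Sequence


def selection_probabilities(L_prev: Sequence[float], payoff_prev: Sequence[float]) -> List[float]:
    """Eq. (8): p_j proportional to exp(L_j(H_{t-1})) * min{max(U_j(H_{t-1}), 0), 1}."""
    m = max(L_prev)  # shift for numerical stability; cancels between numerator and denominator
    weights = [math.exp(l - m) * min(max(u, 0.0), 1.0) for l, u in zip(L_prev, payoff_prev)]
    total = sum(weights)
    if total == 0.0:
        # Assumption (not in the paper): fall back to uniform selection
        # when no strategy was profitable in the previous period.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


prices = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]        # s_1^h ... s_6^h from Section 4
L_prev = [0.0, 0.2, 0.4, 0.4, 0.2, 0.0]        # illustrative learning values
payoff_prev = [0.3, 0.6, 0.9, -0.1, 1.4, 0.5]  # illustrative payoffs; -0.1 is non-profitable
p = selection_probabilities(L_prev, payoff_prev)
chosen = random.Random(7).choices(prices, weights=p, k=1)[0]
```

Here the strategy with a negative payoff (price 1.6) gets probability 0, and the payoff of 1.4 is clipped to 1 before weighting.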

3.3 Main steps of proposed UPN control algorithm

In this paper, we propose a novel UPN control scheme based on the PCS game, which is implemented as a distributed and dynamic repeated game for opportunistic UPN operations. In the proposed scheme, individual UEs are game players that attempt to maximize their payoffs through a step-by-step interactive game process. In this way, UEs can learn the current UPN situation and determine their best positions and strategies. Well-known game-theoretic solution concepts are generally presented as closed-form expressions under complete information; however, they cannot capture how UPN operations must adapt over time.

Control algorithms that solve classical optimization problems usually have exponential time complexity. Furthermore, most of them have been developed for a static setting, which makes them impractical for realistic system operations. In this study, we do not try to obtain an optimal solution with the traditional optimization approach; instead, we propose an interactive game model. Using feedback-based self-monitoring and distributed learning techniques, control decisions are made dynamically while adapting to the current UPN situation. This decision mechanism runs with polynomial complexity. In addition, the computational burden is moved from a central system to the UEs in a distributed online fashion. Therefore, it is a suitable approach for real-world UPN systems from the viewpoint of practical operation. The main steps of the proposed scheme are described as follows (see Fig. 1).

  • Step 1: Control parameters are determined by the simulation scenario (see Table 2).

  • Step 2: At the initial time, the L learning values of each host UE are equal for all strategies. This starting point ensures that every price strategy of a host has the same chance at the beginning of the PCS game.

  • Step 3: During the PCS game, UEs move freely among eNBs' coverage areas, and each UE constantly checks whether a connection to his corresponding eNB is available.

  • Step 4: At every time step (H), each UE estimates his available payoffs for the different positions and independently decides his position among \( \{G^h, G^c, G^n\} \) to maximize his payoff according to Eqs. (1)–(3).

  • Step 5: If a UE can connect directly to his corresponding eNB, he monitors UPN service requests from neighboring client UEs. Otherwise, the UE requests UPN services from neighboring host UEs.

  • Step 6: When the total bandwidth requested by UEs exceeds the eNB's capacity, the eNB distributes its available bandwidth through the sequential NBS according to Eq. (4). Considering each UE's stability, the bargaining powers (γ) of UEs are adaptively adjusted using Eqs. (5)–(6).

  • Step 7: During each time period (H), a host UE (\( G^h \)) dynamically calculates the current learning values L(·) for each strategy \( S_h \) according to Eq. (7). Because they combine the individual and social learning viewpoints, the L(·) values reflect the UPN's temporal and spatial situations.

  • Step 8: The probability distribution \( {\mathbf{P}}^{G^h} \) for each \( G^h \)'s strategy selection is dynamically adjusted based on the obtained learning values L(·) using Eq. (8).

  • Step 9: Based on the interactive feedback process, the dynamics of our PCS game can cause a cascade of interactions among the UEs. As game players, UEs dynamically choose their best position and strategies in an online distributed fashion.

  • Step 10: Under the dynamic UPN environment, individual UEs are constantly self-monitoring for the next game process; go to Step 3.
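The control flow of Steps 2 to 5 can be sketched as a small Python skeleton. This is a hypothetical sketch, not the author's simulator: Eqs. (1)–(3) are not reproduced here, so `toy_payoff` is an invented stand-in for the position payoffs, the labels "host", "client", and "none" for \( G^h \), \( G^c \), \( G^n \) are our own, and restricting the host role to UEs with a direct eNB link is our reading of Step 5. Steps 6 to 8 would plug in at the marked comment.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

POSITIONS = ("host", "client", "none")   # hypothetical labels for G^h, G^c, G^n
PRICES = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]  # price strategies s_1^h ... s_6^h


@dataclass
class UE:
    name: str
    has_enb_link: bool  # Step 3: result of the connection-availability check
    # Step 2: all price strategies start with the same learning value
    L: List[float] = field(default_factory=lambda: [0.0] * len(PRICES))
    position: str = "none"


PayoffFn = Callable[[UE, str], float]


def choose_position(ue: UE, estimate_payoff: PayoffFn) -> str:
    """Step 4: pick the position with the largest estimated payoff.

    Assumption: only a UE with a direct eNB link may act as a host (Step 5).
    """
    candidates = POSITIONS if ue.has_enb_link else ("client", "none")
    return max(candidates, key=lambda pos: estimate_payoff(ue, pos))


def game_period(ues: List[UE], estimate_payoff: PayoffFn) -> Tuple[List[UE], List[UE]]:
    """Steps 4-5 for one period H: every UE picks a position independently."""
    for ue in ues:
        ue.position = choose_position(ue, estimate_payoff)
    hosts = [ue for ue in ues if ue.position == "host"]
    clients = [ue for ue in ues if ue.position == "client"]
    # Steps 6-8 (NBS allocation, Eq. (7) update, Eq. (8) selection) would run here.
    return hosts, clients


def toy_payoff(ue: UE, pos: str) -> float:
    # Hypothetical payoff estimates standing in for Eqs. (1)-(3).
    return {"host": 2.0, "client": 1.0, "none": 0.0}[pos]


ues = [UE("a", has_enb_link=True), UE("b", has_enb_link=False)]
hosts, clients = game_period(ues, toy_payoff)
```

Running `game_period` repeatedly, with the eNB-link check refreshed between calls, mirrors the return to Step 3 in Step 10.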

Fig. 1 The flowchart of the proposed UPN control algorithm

Table 2 System parameters used in the simulation experiments

4 Performance evaluation

In this section, we evaluate the performance of our proposed protocol and compare it with that of the IDME [10] and MGDT [11] schemes. Based on the simulation results, we confirm the superiority of the proposed approach. Theoretical and numerical analyses are alternative performance evaluation methods; however, they are limited in scope because they can model dynamic behavior only to a limited extent. For a complex algorithm such as our proposed scheme, a tractable theoretical or numerical model cannot be built without many simplifications, and such a simplified model cannot provide a precise performance evaluation. In contrast, simulation analysis allows more complex and realistic modeling of a real-world system. Therefore, in this paper, we use a simulation model to evaluate the performance of our UPN scheme. To ensure a fair comparison, the following assumptions and system scenario were used:

  • The simulated system consists of 50 eNBs and 1000 UEs. The bandwidth capacity of each eNB is 100 Gbps, and the bandwidth resource blocks of each eNB are orthogonal to each other.

  • UEs can travel in one of six directions with equal probability. For simplicity, we consider four user velocities: fast (120 km/h), medium (60 km/h), slow (20 km/h), and stationary (0 km/h); each UE's velocity is selected randomly.

  • Based on its speed, a UE's residence time in an eNB's area is estimated as (i) 30 s for a fast UE, (ii) 60 s for a medium-speed UE, (iii) 180 s for a slow UE, and (iv) the service duration for a stationary UE.

  • According to the UE's characteristics, new service requests are generated by a Poisson process with rate λ (services/s), which is varied from 0 to 3.

  • To represent various application services, eight different traffic types are assumed based on connection duration, bandwidth requirement, and required QoE. UEs randomly generate service requests of these eight types with equal probability.

  • The durations of service applications are exponentially distributed.

  • The price strategies in \( {\boldsymbol{S}}_h^{\mathfrak{I}} \) are defined as \( s_{\min}^h = s_1^h = 1 \), \( s_2^h = 1.2 \), \( s_3^h = 1.4 \), \( s_4^h = 1.6 \), \( s_5^h = 1.8 \), and \( s_{\max}^h = s_6^h = 2 \); the communication resource unit is 1 bps.

  • System performance measures obtained on the basis of 100 simulation runs are plotted as functions of the service request generation rate.

  • For simplicity, we assume the absence of physical obstacles and interference in the experiments.
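The traffic and mobility assumptions listed above can be sketched as a small workload generator. This is a minimal sketch, not the author's simulator: the mean service duration (`mean_duration_s`) is not given in this section and is an assumed parameter, the per-type bandwidth and QoE requirements are not modeled, and all function names are hypothetical. Arrivals follow a Poisson process with rate λ, generated through exponential inter-arrival times.

```python
import random
from typing import List, Optional, Tuple

# Residence time (s) in an eNB area per speed (km/h), as listed above;
# None marks a stationary UE, whose residence time equals its service duration.
RESIDENCE_S = {120: 30.0, 60: 60.0, 20: 180.0, 0: None}
NUM_DIRECTIONS = 6
NUM_SERVICE_TYPES = 8


def residence_time(speed_kmh: int, service_duration_s: float) -> float:
    r: Optional[float] = RESIDENCE_S[speed_kmh]
    return service_duration_s if r is None else r


def generate_requests(lam: float, horizon_s: float, mean_duration_s: float,
                      rng: random.Random) -> List[Tuple[float, int, float]]:
    """Poisson arrivals with rate lam (services/s) on [0, horizon_s).

    Each request is (arrival_time, service_type, duration): inter-arrival times are
    exponential with rate lam, the eight service types are equiprobable, and durations
    are exponential with the assumed mean `mean_duration_s`.
    """
    requests: List[Tuple[float, int, float]] = []
    if lam <= 0.0:
        return requests  # lambda = 0 generates no traffic
    t = rng.expovariate(lam)
    while t < horizon_s:
        service_type = rng.randrange(NUM_SERVICE_TYPES)
        duration = rng.expovariate(1.0 / mean_duration_s)
        requests.append((t, service_type, duration))
        t += rng.expovariate(lam)
    return requests


rng = random.Random(2017)
speed = rng.choice(list(RESIDENCE_S))      # one of the four velocities, chosen randomly
direction = rng.randrange(NUM_DIRECTIONS)  # one of six directions, equiprobable
reqs = generate_requests(lam=1.5, horizon_s=100.0, mean_duration_s=60.0, rng=rng)
```

Sweeping `lam` from 0 to 3 reproduces the load axis used in the figures, and averaging over 100 seeds matches the number of runs stated above.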

To demonstrate the validity of our proposed method, we measured the bandwidth utilization, the QoE of the service success ratio, delay rate, and system throughput, and the normalized UEs' profit. Table 2 lists the major system control parameters used to develop and run our simulator.

Figure 2 compares the bandwidth utilization of each scheme. In this study, bandwidth utilization is measured as the percentage of bandwidth actually used, and it is a key factor in estimating resource usability in UPN systems. All schemes exhibit a similar trend; however, the proposed scheme outperforms the existing methods from low to high service loads. Because UEs in our scheme adaptively select their positions and strategies through the dynamic PCS game model, the scheme improves bandwidth utilization compared with the other schemes.

Fig. 2 Bandwidth utilization of the UPN system

Figure 3 compares the QoE in the UPN system. In this study, we develop a QoE model using the popular sigmoid function with respect to service provision, delay, and system throughput [19, 20], that is,

$$ \mathrm{QoE}=\frac{1}{1+{e}^{-\left(\kappa +\tau +\mathcal{z}\right)}} $$

where κ is the service success ratio, τ is the delay rate, and \( \mathcal{z} \) is the UPN system throughput; κ, τ, and \( \mathcal{z} \) are normalized parameters that quantify QoE. The QoE gain achieved by the proposed scheme results from its self-adaptability and real-time effectiveness: all UEs in the proposed scheme make control decisions strategically to ensure their services. For this reason, the proposed scheme attains higher QoE than the existing schemes.

Fig. 3 Service QoE in the UPN system
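The sigmoid QoE model can be evaluated directly. This is a minimal sketch that follows the QoE formula term by term, so τ enters with the same positive sign as κ and \( \mathcal{z} \); the inputs are assumed to be already normalized, and the function name `qoe` is hypothetical.

```python
import math


def qoe(kappa: float, tau: float, z: float) -> float:
    """Sigmoid QoE of the normalized service success ratio (kappa),
    delay rate (tau), and system throughput (z), as in the QoE formula."""
    return 1.0 / (1.0 + math.exp(-(kappa + tau + z)))


baseline = qoe(0.0, 0.0, 0.0)  # sigmoid midpoint
better = qoe(0.9, 0.1, 0.8)
```

The sigmoid keeps QoE in (0, 1) and saturates as the combined normalized score grows.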

The curves in Fig. 4 show the normalized UEs' profit in the UPN system. As game players, all UEs adaptively select their positions in a distributed online manner. From a host's viewpoint, the main goal is to adaptively decide the price strategy that maximizes his total revenue. From a client's perspective, the main concern is to choose the best way to connect so as to maximize his payoff. Through an interactive feedback mechanism, hosts and clients learn the current UPN environment and attempt to improve their profits. At every game period, our learning-based approach provides synergistic and complementary features for adapting to dynamic UPN situations.

Fig. 4 Normalized UEs' profit in the UPN system

The simulation results shown in Figs. 2, 3, and 4 demonstrate that the proposed scheme, which uses a learning-based PCS game model, can monitor the current UPN conditions and adapt to highly dynamic system situations. In particular, all UEs in our approach gain real-time information from the UPN environment and make intelligent decisions in a self-adapting manner. The simulation results indicate that the proposed scheme attains an attractive UPN performance that the IDME [10] and MGDT [11] schemes do not match.

5 Conclusions

Today, technology can enable widespread communications without depending on traditional network structures. From a global perspective, UPN has tremendous potential and is introducing a paradigm shift in network services by allowing the end user to be a host or a client. In this article, we have proposed a new UPN control scheme based on the PCS game model. As game players, UEs decide their positions by considering their mutual interactions and learn better strategies under dynamic UPN environments. Using feedback-based self-monitoring and distributed learning techniques, game players dynamically adapt to the current UPN situation and effectively maximize their expected benefits. Based on the unique features of UPNs, our proposed scheme can provide satisfactory services under incomplete information. To demonstrate the validity of our approach, we compared our scheme with existing schemes; our simulation analysis indicates that our approach can outperform them.

There are many fascinating directions for future work. Our next step is to develop a new mechanism design that provides adaptive incentives to hosts. It will differ from classical mechanism design by adopting distributional assumptions about the agents, and to make it a viable solution, we will also consider computational constraints. Another interesting direction is to study the impact of users' social relationships in a multi-client, multi-host UPC model. In addition, our future studies will address optimality issues in the UPN system from the operator's perspective.



Abbreviations

  • 5G: Fifth generation

  • IDME: Incentive Design and Market Evolution

  • MGDT: Matching Game-Based Data Trading

  • NBS: Nash bargaining solution

  • OBUs: On-board units

  • PCS: Position changeable Stackelberg

  • QoE: Quality of experience

  • QoS: Quality of Service

  • RRM: Radio resource management

  • RUs: Resource units

  • UEs: User equipments

  • UPN: User-provided network


  1. M Dighriri, ASD Alfoudi, GM Lee, T Baker, Data Traffic Model in Machine to Machine Communications over 5G Network Slicing, IEEE DeSE’2016 (2016), pp. 239–244

  2. K-C Chen, Machine-to-machine Communications for Healthcare. JCSE 6(2), 119–126 (2012)

  3. Y Wang, P Li, L Jiao, S Zhou, N Cheng, XS Shen, P Zhang, A data-driven architecture for personalized QoE management in 5G wireless networks. IEEE Wirel. Commun. 24(1), 102–110 (2017)

  4. Y Wang, X Lin, User-provided networking for QoE provisioning in mobile networks. IEEE Wirel. Commun. 22(4), 26–33 (2015)

  5. KS Kim, S Uno, MW Kim, Adaptive QoS mechanism for wireless mobile network. JCSE 4(2), 153–172 (2010)

  6. R Sofia, P Mendes, User-provided networks: Consumer as provider. IEEE Commun. Mag. 46(12), 86–91 (2008)

  7. B Lorenzo, F Gomez-Cuba, J Garcia-Rois, FJ Gonzalez-Castano, JC Burguillo, A Microeconomic Approach to Data Trading in User Provided Networks, IEEE Globecom’2015 (2015), pp. 1–7

  8. S Kim, Game Theory Applications in Network Design (IGI Global, Hershey, 2014)

  9. S Kim, Multi-leader multi-follower Stackelberg model for cognitive radio spectrum sharing scheme. Comput. Netw. 56(17), 3682–3692 (2012)

  10. MM Khalili, L Gao, J Huang, BH Khalaj, Incentive Design and Market Evolution of Mobile User-Provided Networks, IEEE INFOCOM’2015 (2015), pp. 498–503

  11. B Lorenzo, FJ Gonzalez-Castano, A Matching Game for Data Trading in Operator-Supervised User-Provided Networks, IEEE ICC’2016 (2016), pp. 1–7

  12. G Iosifidis, G Lin, J Huang, L Tassiulas, Incentive mechanisms for user-provided networks. IEEE Commun. Mag. 52(9), 20–27 (2014)

  13. G Lin, G Iosifidis, J Huang, L Tassiulas, Hybrid Data Pricing for Network-Assisted User-Provided Connectivity, IEEE INFOCOM’2014 (2014), pp. 682–690

  14. IW Mustika, I Nurcahyani, Proportional Fairness with Adaptive Bandwidth Allocation for Video Service in Downlink LTE, IEEE COMNESTAT’2015 (2015), pp. 54–59

  15. MS Jeon, S-K Kim, J-H Yoon, JY Lee, S-B Yang, A direction entropy-based forwarding scheme in an opportunistic network. JCSE 8(3), 173–179 (2014)

  16. S Kim, Adaptive MANET multipath routing algorithm based on the simulated annealing approach. Sci. World J. 2014, 1–9 (2014)

  17. CT Hieu, CS Hong, A connection entropy-based multi-rate routing protocol for mobile ad hoc networks. JCSE 4(3), 225–239 (2010)

  18. Y Wang, Y Liu, J Zhang, H Ye, Z Tan, Cooperative store–carry–forward scheme for intermittently connected vehicular networks. IEEE Trans. Veh. Technol. 66(1), 777–784 (2017)

  19. JY Kim, I Bang, DK Sung, Y Yi, B-H Kim, Design of a Multi-Variable QoE Function Based on the Remaining Battery Energy, IEEE PIMRC’2015 (2015), pp. 976–980

  20. C-K Tham, T Luo, Quality of contributed service and market equilibrium for participatory sensing. IEEE Trans. Mob. Comput. 14(4), 829–842 (2015)



This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-2014-0-00636) supervised by the IITP (Institute for Information & Communications Technology Promotion) and was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2015R1D1A1A01060835).


This study is funded by the MSIT (Ministry of Science and ICT) and the National Research Foundation of Korea (NRF).

Availability of data and materials

Please contact the corresponding author at

Author information



Corresponding author

Correspondence to Sungwook Kim.

Ethics declarations

Author’s information

Sungwook Kim received the BS and MS degrees in computer science from Sogang University, Seoul, in 1993 and 1995, respectively. In 2003, he received the PhD degree in computer science from Syracuse University, Syracuse, New York, under the supervision of Prof. Pramod K. Varshney. He is currently a professor in the Department of Computer Science & Engineering and is a research director of the Internet Communication Control research laboratory (ICC Lab.). His current research interests are in game theory and network design applications.

Competing interests

The author declares no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Kim, S. Dynamic position changeable Stackelberg game for user-provided network control algorithms. J Wireless Com Network 2017, 214 (2017).


  • Received:

  • Accepted:

  • Published:

  • DOI: