A game-theoretic learning approach to QoE-driven resource allocation scheme in 5G-enabled IoT


To promote the development of the Internet of Things (IoT), 5G networks are expected to support IoT communications without limitations of distance and location. This paper investigates the channel allocation problem for IoT uplink communications in the 5G network, with the aim of improving the quality of experience (QoE) of smart objects (SOs). To begin with, we define a mean opinion score (MOS) function of transmission delay to measure the QoE of each SO. We then leverage a game-theoretic learning approach to solve the sum-MOS maximization problem. Specifically, the original optimization problem is equivalently transformed into a tractable form. Then, we formulate the converted problem as a game and define a potential function whose maximizer is a near-optimum of the optimization objective. To optimize the potential function, a distributed channel allocation algorithm is proposed that converges to the best Nash equilibrium, which is the global maximizer of the potential function. Finally, numerical results verify the effectiveness of the proposed scheme.

1 Introduction

The Internet of Things (IoT) is a system of human-to-object or object-to-object connections in which sensors, controllers, mechanical and digital machines, objects, animals, or people are interrelated and transfer data over a network by using information technology [1, 2]. In IoT, a thing can be a person with wearable devices, an autonomous vehicle with sensors, a farm animal with biochip transponders, or any other smart object (SO) provided with the ability to transfer data over a network [3, 4]. The concept of IoT was first mentioned by Kevin Ashton in a presentation he made to Procter & Gamble in 1999. At that time, computers depended on human beings to gather the data they used [5]. However, people have very limited time, attention, and accuracy, so they are not very good at capturing data about things in reality. The enormous potential demand for connecting things drives the rapid development of IoT. IoT SOs are of different types: some are delay-sensitive, some require high reliability, and some are low-power and low-cost. Moreover, most IoT traffic is in the uplink, and IoT messages are typically small in size and sparse in time. These characteristics make the network access of IoT SOs different from that of classical users, which poses a great challenge to the network [6]. Therefore, providing satisfactory service for IoT applications with differentiated demands is an important field, and the requirement for ultra-reliable low-latency communications of IoT SOs is greatly emphasized.

5G heterogeneous networks are envisioned to play a key role in providing a promising infrastructure for the massive proliferation of IoT SOs and the corresponding services [7–11]. IoT SOs with very limited computing and storage capabilities are associated with access points of the 5G network for cloud services and communications [12, 13]. To handle the massive connectivity and satisfy the requirements of ultra-reliable low-latency communications, a 5G network supporting IoT communications requires either huge spectrum resources or improved spectral efficiency [14]. Moreover, interference management is one of the key challenges in 5G-enabled IoT, since the 5G network generally adopts the co-channel model, in which small cell base stations (SCBSs) overlaid on the coverage area of a macrocell base station (MBS) share the same frequency band [15]. Resource allocation strategies are usually optimized to overcome the interference problem [16]. In particular, the requirement of ultra-reliable low-latency communications for IoT SOs is greatly emphasized, and thus, enhancing the quality of experience (QoE) of IoT is a challenging and attractive research area. Motivated by the goal of real-time and reliable IoT transmission, a QoE-driven resource allocation scheme is proposed in this paper.

Next, we give a brief review of the works related to our research. More related works on efficient IoT support in 5G can be found in [17–19]. Yerrapragada and Kelley [17] investigated a perfect interference alignment scheme for multiple-input multiple-output systems and applied it to a 5G-enabled IoT architecture. Since inter-cell interference significantly degrades the performance of IoT communications, Dao et al. [18] proposed a novel algorithm for finding the most appropriate pair of IoT terminal or its associated BS to provide relay-assisted communication for an IoT terminal with poor signals in the inter-cell interference area. For IoT in cognitive 5G networks, a multiband cooperative spectrum sensing and resource allocation framework was presented in [19]. IoT communications in the 5G network are expected to provide flexible delivery of broad services with a high QoE. Recently, research on improving the QoE of IoT SOs has attracted more and more attention [20–22]. Aminjavaheri et al. [20] presented an underlay control signaling method for ultra-reliable low-latency communication applications in an LTE network and analyzed its performance. Since satisfying QoE is the major challenge in content-centric IoT, the authors in [21] analyzed several factors, e.g., content popularity and weight factor, that impact the resource allocation and how they subsequently influence the QoE. As a cloud resource, fog computing is used for the delay-sensitive services of IoT by minimizing resource underutilization and enhancing QoE [22].

Resource allocation in IoT has been investigated in many works by using game theory [23–27]. Huang et al. [23] employed a cooperative game to model and analyze device-to-device communication for achieving high-performance data transportation in the new cloud-centric IoT paradigm. Device-to-device communication underlaying cellular networks was investigated in [24] to improve spectral efficiency, and a game-theoretic resource allocation scheme was designed by exploring the inherent competition for spectrum resources among users. The authors in [25] proposed a Stackelberg game and many-to-many matching to solve the multi-stage problems of pairing, resource pricing, and purchasing in three-tier IoT fog networks. Although the proposed framework can achieve high performance, the optimal solution is ambiguous. In addition, matching theory was also used in [26] to find a stable IoT node pairing. In [27], the problem of efficiently and effectively securing IoT networks was investigated by carefully allocating security tools.

2 Methodology

Bearing the above in mind, we leverage a game-theoretic learning algorithm to solve the resource allocation problem in the 5G-enabled IoT network. In this paper, we assume that a number of SOs access the 5G network, which is expected to enable the effective deployment of IoT without limitations of distance and location. This setting raises several challenges. SOs are usually sensitive to latency, which places a higher demand on data transmission. However, the interference arising from the reuse of radio resources greatly affects the SOs’ QoE in the 5G network. The non-convex, integer-valued optimization problem makes a rational allocation of resources difficult to achieve. Moreover, a distributed algorithm is desired for various SOs with different service requirements. In this paper, we study the channel allocation problem by applying game theory to analyze the distributed decisions made by SOs, and we apply a learning algorithm to maximize the sum-MOS of SOs in the 5G network. The main contributions of our work are summarized as follows:

  • We consider the QoE of all SOs in the 5G-enabled IoT network as the objective function. A MOS standard in terms of the data transmission delay is proposed to measure QoE of various services. Then, an equivalent form is derived to replace the original optimization objective.

  • We use the game-theoretic model to formulate the modified optimization problem in which the designed potential function is an approximation of the optimization objective. Then, we prove it to be an exact potential game, whose best Nash equilibrium (NE) point is a near-optimal solution of the original optimization problem.

  • To find the best NE point, we design a distributed learning algorithm that asymptotically converges, with arbitrarily high probability, to the global optimal solution that maximizes the potential function.

The rest of this paper is organized as follows. In Section 3, the system model and the QoE metric are presented. Then, the proposed channel allocation problem is equivalently converted into a tractable problem. According to the converted optimization problem, Section 4 establishes a game framework and then investigates the properties of the equilibrium. In Section 5, we propose an algorithm and the asymptotical optimality is verified. Finally, numerical results and discussion are presented in Section 6, and Section 7 concludes this paper.

3 System model and problem formulation

3.1 System model

We consider an uplink 5G-enabled IoT network consisting of B BSs, K diverse SOs (e.g., smart phones, smart meters, wearable devices, and monitoring devices) and N orthogonal channels, as illustrated in Fig. 1. The set of all BSs is denoted by \(\mathcal {B}=\{1,2,\dots,B\}\) and the set of all SOs is represented by \(\mathcal {K}=\{1,2,\dots,K\}\). Suppose that the associations between BSs and SOs have been predetermined and let \(b_{k}\in \mathcal {B}\) be the BS serving SO \(k\in \mathcal {K}\). Moreover, suppose that each SO chooses a channel for data transmission and the bandwidth of each orthogonal channel is the same. We denote the set of the channels by \(\mathcal {N}=\{1,2,\dots,N\}\). Let \(a_{k}\) be the channel allocation strategy of SO \(k\in \mathcal {K}\) and \(\mathcal {A}_{k}\) be the set of all possible selections for k. Thus, \(\mathbf{a}=(a_{1},a_{2},\dots,a_{K})\) is the channel selection profile for all SOs and \(\mathcal {A}=\mathcal {A}_{1}\times \mathcal {A}_{2}\times \dots \times \mathcal {A}_{K}\) is the space of all possible selections for all SOs.

Fig. 1 The 5G-enabled IoT network

The channel from SO \(k\in \mathcal {K}\) to BS \(b_{k}\in \mathcal {B}\) is supposed to be flat fading and the channel gain is denoted by \(h_{k,b_{k}}\). Let \(p_{k}\) be the transmit power of SO k. Then, the received signal-to-interference-plus-noise ratio (SINR) at BS \(b_{k}\) from SO k is given by:

$$ \gamma_{k}=\frac{p_{k}h_{k,b_{k}}}{\sum_{l\in\mathcal{K}\setminus\{k\}}p_{l}h_{l,b_{k}}\mathbb{1}_{\{a_{l}=a_{k}\}}+{\sigma_{k}^{2}}}, \tag{1} $$

where \({\sigma _{k}^{2}}\) is the power of the additive white Gaussian noise at the BS associated with SO k, and the indicator variable \(\mathbb{1}_{\{a_{l}=a_{k}\}}\in \{0,1\}\) denotes whether the channel allocated to SO k is also occupied by SO l, i.e., \(\mathbb{1}_{\{a_{l}=a_{k}\}}=1\), or not, i.e., \(\mathbb{1}_{\{a_{l}=a_{k}\}}=0\).

In IoT, different SOs perform different applications, e.g., picture/video collection, gaming, file upload, and control information transmission. When performing different applications, SOs need to transfer different amounts of data in order to obtain the same user experience in the same period of time. Mathematically, the set of service types required by SOs is represented as \(\mathcal {S}=\{1,2,\ldots,S\}\), and \(s_{k}\in \mathcal {S}\) denotes the service type performed by SO k. Let \(C_{s_{k}}\) be the amount of data required from SO k during a given period of time. Hence, the uplink transmission time of SO k is given as follows.

$$ T_{k}=\frac{C_{s_{k}}}{R_{k}}, \tag{2} $$

where \(R_{k}=B\log_{2}(1+\gamma_{k})\) is the achievable rate at BS \(b_{k}\) from SO k and, in this expression, B is the bandwidth of each channel.
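As a minimal sketch of how the SINR, rate, and delay expressions above fit together, the following Python snippet evaluates them for a toy network. The function names, the three-SO topology, and all numerical values are illustrative assumptions and are not parameters taken from this paper.

```python
import math


def sinr(k, channels, power, gain, serving_bs, noise):
    """SINR of SO k at its serving BS b_k.

    channels[l] is the channel a_l chosen by SO l, power[l] is p_l,
    gain[l][b] is the channel gain from SO l to BS b, and noise is sigma_k^2.
    """
    b_k = serving_bs[k]
    # Only SOs that picked the same channel as SO k contribute interference.
    interference = sum(
        power[l] * gain[l][b_k]
        for l in range(len(channels))
        if l != k and channels[l] == channels[k]
    )
    return power[k] * gain[k][b_k] / (interference + noise)


def delay(k, channels, power, gain, serving_bs, noise, bandwidth, data_bits):
    """Uplink transmission time T_k = C_{s_k} / R_k, R_k = B log2(1 + SINR)."""
    gamma_k = sinr(k, channels, power, gain, serving_bs, noise)
    rate = bandwidth * math.log2(1.0 + gamma_k)
    return data_bits / rate


# Hypothetical toy network: 3 SOs and 2 BSs (all numbers are made up).
power = [0.2, 0.2, 0.2]                                   # watts
gain = [[1e-9, 1e-11], [2e-11, 1e-9], [5e-11, 4e-11]]     # gain[l][b]
serving_bs = [0, 1, 0]
noise = 1e-14                                             # watts
args = (power, gain, serving_bs, noise)

shared = [0, 0, 1]   # SO 0 and SO 1 reuse channel 0
alone = [0, 1, 1]    # SO 0 has channel 0 to itself

sinr_shared = sinr(0, shared, *args)
sinr_alone = sinr(0, alone, *args)
delay_shared = delay(0, shared, *args, bandwidth=487.5e3, data_bits=1e5)
delay_alone = delay(0, alone, *args, bandwidth=487.5e3, data_bits=1e5)
```

In this toy setting, sharing a channel lowers the SINR of SO 0 and therefore lengthens its transmission time, which is the coupling between SOs that the rest of the paper exploits.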

3.2 QoE metric

To measure the QoE of various services, we propose a mean opinion score (MOS) standard, ranging from 1 to 5, in terms of the data transmission delay. Letting \(\tau _{1,s_{k}}\) and \(\tau _{2,s_{k}}\) denote, respectively, the most satisfactory delay and the maximum tolerable delay of service type \(s_{k}\), the MOS is defined as follows.

$$ \text{MOS}_{k}(\mathbf{a})= \left\{ \begin{array}{ll} 5, & T_{k}\leq\tau_{1,s_{k}},\\ \alpha \ln \frac{\tau_{1,s_{k}}+\tau_{2,s_{k}}-T_{k}} {\beta}, & \tau_{1,s_{k}}<T_{k}<\tau_{2,s_{k}},\\ 1, & T_{k}\geq \tau_{2,s_{k}}, \end{array} \right. \tag{3} $$

where \(\alpha =\frac {4}{\ln \tau _{2,s_{k}}-\ln \tau _{1,s_{k}}}\) and \(\beta =\tau _{1,s_{k}} \left (\frac {\tau _{1,s_{k}}} {\tau _{2,s_{k}}} \right)^{\frac {1}{4}}\). With these constants, the middle branch equals 5 at \(T_{k}=\tau _{1,s_{k}}\) and 1 at \(T_{k}=\tau _{2,s_{k}}\), so the MOS is continuous in the delay. Figure 2 shows how the MOS varies with the delay. The MOS values range from 1 to 5, where MOS=1 represents an unacceptable QoE for SOs and MOS=5 reflects an excellent user experience. In general, SOs have different tolerances for delay depending on the service. The application characteristics of SOs are determined by \(\tau _{1,s_{k}}\) and \(\tau _{2,s_{k}}\).
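The piecewise MOS definition can be sketched in a few lines of Python. The function name `mos` and the delay thresholds in the example are hypothetical; only the formula itself is taken from the definition above.

```python
import math


def mos(delay, tau1, tau2):
    """Mean opinion score as a function of transmission delay.

    tau1 is the most satisfactory delay and tau2 the maximum tolerable
    delay of the service type, with 0 < tau1 < tau2.
    """
    if delay <= tau1:
        return 5.0
    if delay >= tau2:
        return 1.0
    alpha = 4.0 / (math.log(tau2) - math.log(tau1))
    beta = tau1 * (tau1 / tau2) ** 0.25
    return alpha * math.log((tau1 + tau2 - delay) / beta)


# Hypothetical service type: fully satisfied below 0.1 s, unusable above 1 s.
tau1, tau2 = 0.1, 1.0
scores = [mos(t, tau1, tau2) for t in (0.05, 0.3, 0.55, 0.8, 2.0)]
```

The list `scores` starts at 5, ends at 1, and decreases in between, since the logarithmic branch is a decreasing function of the delay.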

Fig. 2 The QoE metric

3.3 Problem formulation and transformation

In this paper, to improve the overall transmission performance of the 5G-enabled IoT network by the optimization of channel allocation, we consider the sum-MOS maximization problem, which is mathematically expressed as:

$$ P: \max\limits_{\mathbf{a}\in\mathcal{A}} \sum_{k\in\mathcal{K}}\text{MOS}_{k}(\mathbf{a}). \tag{4} $$

Problem P is a non-convex, discrete optimization problem, so finding its solution is expected to be very challenging. In what follows, we convert it into a tractable form. For notational convenience, we first define \(U_{k}=p_{k}g_{k,b_{k}}\) and \(I_{k}(a_{k},\mathbf {a}_{-k}) = \sum _{l\in \mathcal {K} \setminus \{k\}} p_{l} g_{l,b_{k}} \mathbb{1}_{\{a_{l}=a_{k}\}} + {\sigma _{k}^{2}}\), where \(g_{k,b_{k}}\) denotes the same channel gain as \(h_{k,b_{k}}\) in the SINR expression and \(\mathbf {a}_{-k}\) is the channel selection profile of all the SOs except SO k. Then, we have

$$ T_{k}(\mathbf{a})=\frac{C_{s_{k}}}{B\log_{2} \left(1+\frac{U_{k}}{I_{k}(\mathbf{a})}\right)}. \tag{5} $$

By using the first-order Taylor expansion at the point \(\mathbf {a}^{'}\), \(T_{k}\) is approximated by \(\tilde {T}_{k}\), namely,

$$ \begin{aligned} \tilde{T}_{k}\left(\mathbf{a}^{'},\mathbf{a} \right)& \triangleq T_{k} \left(\mathbf{a}^{'}\right) + \frac{\mathrm{d}T_{k}\left(\mathbf{a}^{'}\right)} {\mathrm{d} I_{k}}\left(I_{k} (\mathbf{a})-I_{k} \left(\mathbf{a}^{'}\right)\right)\\ &=\Delta_{1,k}I_{k}(\mathbf{a})+\Delta_{2,k}, \end{aligned} \tag{6} $$

where \(\Delta _{1,k} = \frac {BC_{s_{k}}U_{k}} {\ln 2\, {R_{k}^{2}} \left (\mathbf {a}^{'}\right) \left ({I_{k}^{2}} \left (\mathbf {a}^{'} \right) + U_{k}I_{k} \left (\mathbf {a}^{'}\right) \right)}\) is the derivative of \(T_{k}\) with respect to \(I_{k}\) at \(\mathbf {a}^{'}\), \(R_{k}\left (\mathbf {a}^{'}\right)=C_{s_{k}}/T_{k}\left (\mathbf {a}^{'}\right)\) is the rate at \(\mathbf {a}^{'}\), and \(\Delta _{2,k}=T_{k}\left (\mathbf {a}^{'}\right)-\Delta _{1,k}I_{k}\left (\mathbf {a}^{'}\right)\).
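The linearization can be sanity-checked numerically. The sketch below obtains the slope of the first-order expansion, that is, the derivative of the delay with respect to the interference-plus-noise term, by the chain rule through the rate \(R_{k}=B\log_{2}(1+U_{k}/I_{k})\), and compares it with a central finite difference. All function names and numerical values are assumptions chosen for illustration.

```python
import math


def delay_of_interference(interference, useful, bandwidth, data_bits):
    """T_k as a function of I_k for fixed useful received power U_k."""
    return data_bits / (bandwidth * math.log2(1.0 + useful / interference))


def linearize_delay(i_ref, useful, bandwidth, data_bits):
    """Slope and intercept of the first-order expansion of T_k around i_ref."""
    rate = bandwidth * math.log2(1.0 + useful / i_ref)
    # dT/dI = B * C * U / (ln2 * R^2 * (I^2 + U * I)), from T = C / R.
    slope = (bandwidth * data_bits * useful) / (
        math.log(2.0) * rate ** 2 * (i_ref ** 2 + useful * i_ref)
    )
    intercept = data_bits / rate - slope * i_ref
    return slope, intercept


# Hypothetical operating point (watts, hertz, bits).
useful, i_ref = 2e-10, 4e-12
bandwidth, data_bits = 487.5e3, 1e5

slope, intercept = linearize_delay(i_ref, useful, bandwidth, data_bits)

# Central finite difference of the exact delay around the same point.
h = 1e-4 * i_ref
numeric_slope = (
    delay_of_interference(i_ref + h, useful, bandwidth, data_bits)
    - delay_of_interference(i_ref - h, useful, bandwidth, data_bits)
) / (2.0 * h)
```

The analytic slope and `numeric_slope` agree closely, and the slope is positive: more interference means a longer transmission time.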

According to (6), (3) is expanded at the point \(\mathbf {a}^{'}\), namely,

$$ \begin{aligned} &\mathrm{\widetilde{MOS}}_{k}\left(\mathbf{a}^{\prime},\mathbf{a}\right)\\ & = \left\{ \begin{array}{ll} 5, & I_{k} (\mathbf{a}) \leq \tilde{\tau}_{1,s_{k}},\\ \Delta_{3,k} I_{k}(\mathbf{a}) + \Delta_{4,k}, & \tilde{\tau}_{1,s_{k}}<I_{k} (\mathbf{a}) < \tilde{\tau}_{2,s_{k}},\\ 1, & I_{k}(\mathbf{a})\geq \tilde{\tau}_{2,s_{k}}, \end{array} \right. \end{aligned} \tag{7} $$

where \(\tilde {\tau }_{1,s_{k}}=\frac {\tau _{1,s_{k}}-\Delta _{2,k}\left (\mathbf {a}^{'}\right)}{\Delta _{1,k}\left (\mathbf {a}^{'}\right)}\), \(\tilde {\tau }_{2,s_{k}}=\frac {\tau _{2,s_{k}}-\Delta _{2,k}\left (\mathbf {a}^{'}\right)}{\Delta _{1,k}\left (\mathbf {a}^{'}\right)}\), \(\Delta _{3,k}=\frac {\alpha \Delta _{1,k}\left (\mathbf {a}^{'}\right)}{T_{k}\left (\mathbf {a}^{'}\right)-\left (\tau _{1,s_{k}}+\tau _{2,s_{k}}\right)}\) is the derivative of the middle branch of (3) with respect to \(I_{k}\) at \(\mathbf {a}^{'}\), and \(\Delta _{4,k}=\alpha \ln \frac {\tau _{1,s_{k}}+\tau _{2,s_{k}}-T_{k}\left (\mathbf {a}^{'}\right)}{\beta }-\Delta _{3,k}I_{k}\left (\mathbf {a}^{'}\right)\).

By comparing (7) with (3), it is noted that (4) and \(\sum _{k\in \mathcal {K}} \mathrm {\widetilde {MOS}}_{k} \left (\mathbf {a}^{'},\mathbf {a}\right)\) have the same solution when \(\mathbf {a}^{'} = \mathbf {a}^{\ast }\), where \(\mathbf {a}^{\ast }\) is the optimal solution to (4). Therefore, the original problem (4) is equivalently transformed into the following optimization problem.

$$ \tilde{P}: \max\limits_{\mathbf{a}^{\prime}=\mathbf{a}^{\ast},\mathbf{a}\in\mathcal{A}} \sum_{k\in\mathcal{K}} \mathrm{\widetilde{MOS}}_{k} \left(\mathbf{a}^{\prime},\mathbf{a}\right). \tag{8} $$

According to the above definition, the MOS of each SO depends not only on its own channel selection strategy but also on the strategies of the other SOs. If too many SOs occupy the same channel to transmit data, the transmission rates are relatively low, and the resulting low MOSs lead to inefficient data processing and put pressure on data storage. Because of this interdependence among SOs, we adopt game theory to model and analyze the channel allocation strategies of SOs in \(\tilde {P}\). Furthermore, it is difficult for each SO to obtain information about other SOs of different types, which motivates us to propose a distributed learning algorithm for achieving the equilibrium solution of the game modeled from the channel access problem.

4 Game-theoretic analysis

In this section, we study the distributed optimization of the channel access problem by using game theory. Every SO is regarded as a player in the game, and the channel access game is defined as \(\mathcal {G}_{\mathbf {a}^{'}} = \{\mathcal {K}, \{\mathcal {A}_{k}\}_{k\in \mathcal {K}}, \{u_{k}\}_{k\in \mathcal {K}}\}\), where \(\mathcal {K}\) is the player (SO) set, \(\mathcal {A}_{k}\) is the action space of player k, and \(u_{k}\) is the utility function of player k. The action space of each player is exactly the available channel set. To build a bridge between \(\mathcal {G}_{\mathbf {a}^{'}}\) and problem (8), we define the utility function of player \(k\in \mathcal {K}\) as follows.

$$ u_{k}=\Delta_{3,k}I_{k}(\mathbf{a}) + \sum_{l\in\mathcal{K} \setminus\{k\}} \Delta_{3,l} p_{k} g_{k,b_{l}} \mathbb{1}_{\{a_{l} = a_{k}\}}. \tag{9} $$

Then, we investigate the properties of \(\mathcal {G}_{\mathbf {a}^{'}}\).

Theorem 1

If the variable \(\mathbf {a}^{'}\) is predetermined and the potential function is \(\phi (\mathbf {a}) = \sum _{l\in \mathcal {K}} \left (\Delta _{3,l}I_{l} (\mathbf {a}) + \Delta _{4,l} \right)\), then \(\mathcal {G}_{\mathbf {a}^{'}}\) is an exact potential game, which has at least one NE point. Moreover, the near-optimal solution to the proposed channel access problem (4) is a pure-strategy NE of \(\mathcal {G}_{\mathbf {a}^{\ast }}\).


Proof

The potential function of \(\mathcal {G}_{\mathbf {a}^{'}}\) is defined as follows:

$$ \phi(\mathbf{a})=\sum_{l\in\mathcal{K}}\left(\Delta_{3,l}I_{l}(\mathbf{a})+\Delta_{4,l}\right). \tag{10} $$

Then, (10) is rewritten as (11).

$$ {\begin{aligned} \phi(\mathbf{a}) =& \underbrace{\Delta_{3,k}I_{k} (\mathbf{a}) + \sum_{l\in\mathcal{K}\setminus\{k\}} \Delta_{3,l} p_{k} g_{k,b_{l}} \mathbb{1}_{\{a_{l}=a_{k}\}} }_{u_{k} (a_{k},\mathbf{a}_{-k})} \\ &+ \underbrace{\sum_{l\in\mathcal{K}} \Delta_{4,l} + \sum_{l\in\mathcal{K}\setminus\{k\}} \Delta_{3,l} \left({\sigma_{l}^{2}} + \sum_{m\in\mathcal{K}\setminus\{l,k\}} p_{m} g_{m,b_{l}} \mathbb{1}_{\{a_{m}=a_{l}\}} \right)}_{v(\mathbf{a}_{-k})} \end{aligned}} \tag{11} $$

Suppose that an arbitrary player k unilaterally changes its strategy from \(a_{k}\) to \(\bar {a}_{k}\). Since \(v(\mathbf {a}_{-k})\) does not depend on the strategy of player k, we obtain the following equation based on (11):

$$ \phi(\bar{a}_{k}, \mathbf{a}_{-k}) - \phi(a_{k}, \mathbf{a}_{-k}) = u_{k} (\bar{a}_{k}, \mathbf{a}_{-k}) - u_{k}(a_{k}, \mathbf{a}_{-k}). \tag{12} $$

The equation above shows that the change in any single player’s utility function due to a unilateral strategy deviation results in exactly the same amount of change in the potential function. Therefore, according to Definition 2.2 in [24, 28], \(\mathcal {G}_{\mathbf {a}^{'}}\) is an exact potential game with potential function \(\phi (\mathbf {a})\). As a potential game, \(\mathcal {G}_{\mathbf {a}^{'}}\) has some desirable properties, one of which is that it has at least one NE point.

Although each player in \(\mathcal {G}_{\mathbf {a}^{'}}\) focuses on maximizing its own utility value, we characterize the achievable performance of NE points of \(\mathcal {G}_{\mathbf {a}^{'}}\) by exploiting the inherent structure of the exact potential game.

Denote by \(\mathbf {a}^{\text {opt}}\) an optimal channel allocation profile that maximizes the potential function \(\phi\), i.e.:

$$ \mathbf{a}^{\text{opt}}\in \arg \max_{\mathbf{a}\in\mathcal{A}} \phi(\mathbf{a}). \tag{13} $$

It has been proved that, for any exact potential game, all NEs are maximizers of the potential function \(\phi\), either locally or globally [28]. The best equilibrium point is \(\mathbf {a}^{\text {opt}}\). Hence, \(\mathbf {a}^{\text {opt}}\) is a near-optimal solution of (4) when \(\mathbf {a}^{'}=\mathbf {a}^{\ast }\).

Hence, Theorem 1 is proved. □
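The exact potential property in (12) can also be verified by brute force on a small instance. The following sketch draws random toy parameters (network size, gains, and the coefficients \(\Delta_{3,k}\) and \(\Delta_{4,k}\) are all assumptions in normalized units) and checks every unilateral deviation.

```python
import itertools
import random

rng = random.Random(0)
K, N, NUM_BS = 4, 2, 2                                   # toy sizes (assumed)
bs = [rng.randrange(NUM_BS) for _ in range(K)]           # serving BS b_k
p = [1.0] * K                                            # transmit powers p_k
g = [[rng.uniform(0.1, 1.0) for _ in range(NUM_BS)] for _ in range(K)]
d3 = [-rng.uniform(0.5, 2.0) for _ in range(K)]          # Delta_{3,k} (negative)
d4 = [rng.uniform(0.0, 5.0) for _ in range(K)]           # Delta_{4,k}
sigma2 = 0.01                                            # noise power


def interference(l, a):
    """I_l(a): co-channel interference at BS b_l plus noise."""
    return sigma2 + sum(
        p[m] * g[m][bs[l]] for m in range(K) if m != l and a[m] == a[l]
    )


def utility(k, a):
    """u_k(a): own weighted interference plus weighted interference caused."""
    caused = sum(
        d3[l] * p[k] * g[k][bs[l]] for l in range(K) if l != k and a[l] == a[k]
    )
    return d3[k] * interference(k, a) + caused


def potential(a):
    """phi(a) = sum_l (Delta_{3,l} I_l(a) + Delta_{4,l})."""
    return sum(d3[l] * interference(l, a) + d4[l] for l in range(K))


# Largest mismatch between the potential change and the deviating player's
# utility change over all profiles and all unilateral deviations.
max_gap = 0.0
for a in itertools.product(range(N), repeat=K):
    for k in range(K):
        for c in range(N):
            b = a[:k] + (c,) + a[k + 1:]
            gap = (potential(b) - potential(a)) - (utility(k, b) - utility(k, a))
            max_gap = max(max_gap, abs(gap))
```

Up to floating-point rounding, `max_gap` is zero, which is what the exact potential property requires. The second term of the utility is what makes this work: each player is charged for the weighted interference it causes to others.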

5 Decentralized algorithm for achieving the best NE

According to the above theoretical analysis of \(\mathcal {G}_{\mathbf {a}^{'}}\), an approach for achieving the best NE of \(\mathcal {G}_{\mathbf {a}^{'}}\) is also an approach for obtaining a near-optimal solution of problem (4). In this section, we propose a distributed channel allocation learning algorithm to solve (4) in a distributed manner.

5.1 Algorithm description

Taking into account the above analysis, we give a detailed procedure for solving the channel allocation problem, labeled as Algorithm 1. Algorithm 1 consists of an outer loop and an inner loop: the outer loop updates the variable \(\mathbf {a}^{'}\) until a near-optimal solution is achieved, while the inner loop runs with \(\mathbf {a}^{'}\) fixed. Each iteration of the inner loop has two stages: (a) In step 1, one player is randomly selected from the set of updatable players to update its strategy. Then, the selected player chooses an action and gets feedback in the form of the resulting state and an associated reward. (b) In step 2, the selected player updates its alternative action selection based on (14). In Algorithm 1, the algorithm stops when the change of the potential function becomes negligible.

When the game has multiple NE points, the proposed algorithm is not easily trapped in an undesirable NE, owing to two favorable properties: (a) it is an uncoupled algorithm, namely, each player only needs to acquire the information of channel selection actions; (b) it can achieve the best NE, which is the global maximizer of the potential function.
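Since the listing of Algorithm 1 is not reproduced in this text, the following Python sketch of the inner loop is an interpretation, not the authors' exact procedure. It is read off from the transition probability used in the proof of Theorem 2: a player is picked uniformly at random and moves to a candidate channel with probability \((1+\lambda)^{\gamma u_{k}(\bar{a}_{k},\mathbf{a}_{-k})}/\max\{(1+\lambda)^{\gamma u_{k}(\mathbf{a})},(1+\lambda)^{\gamma u_{k}(\bar{a}_{k},\mathbf{a}_{-k})}\}\). Drawing the candidate channel uniformly at random, the fixed iteration budget, and the toy collision-counting utility are assumptions made for illustration.

```python
import math
import random


def learn_channels(num_players, num_channels, utility,
                   gamma=50.0, lam=1.0, iters=2000, seed=0):
    """Sketch of the inner learning loop for a fixed reference point a'.

    utility(k, profile) returns the utility u_k of player k under profile.
    """
    rng = random.Random(seed)
    profile = [rng.randrange(num_channels) for _ in range(num_players)]
    for _ in range(iters):
        k = rng.randrange(num_players)              # step 1: pick a player
        candidate = profile.copy()
        candidate[k] = rng.randrange(num_channels)  # assumed uniform proposal
        change = utility(k, candidate) - utility(k, profile)
        # step 2: accept with probability min(1, (1 + lam)^(gamma * change)),
        # computed in the log domain to avoid overflow.
        if change >= 0:
            accept_prob = 1.0
        else:
            accept_prob = math.exp(gamma * change * math.log1p(lam))
        if rng.random() < accept_prob:
            profile = candidate
    return profile


def collision_utility(k, profile):
    """Toy utility: minus the number of other players on player k's channel."""
    return -sum(1 for l, ch in enumerate(profile) if l != k and ch == profile[k])


final_profile = learn_channels(3, 3, collision_utility)
```

With three players, three channels, and a large `gamma`, the sketch settles on a profile in which every player occupies a different channel, which maximizes the potential of this toy game (minus the number of colliding pairs). Smaller values of `gamma` make the players explore more and accept worsening moves more often.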

5.2 Convergence and optimality analysis

In order to investigate the actual performance of Algorithm 1, Theorems 2 and 3 characterize its convergence and optimality.

Theorem 2

If all players perform the proposed distributed channel allocation learning algorithm with fixed \(\mathbf {a}^{'}\), the network converges to a unique stationary distribution of the players’ strategy profile, which is given by:

$$ \pi(\bar{\mathbf{a}})=\frac{(1+\lambda)^{\gamma\phi(\bar{\mathbf{a}})}}{\sum_{\mathbf{a}\in\mathcal{A}}(1+\lambda)^{\gamma\phi(\mathbf{a})}}. \tag{14} $$


Proof

Let \(z(t)\) be the state of channel allocations at the t-th iteration of Algorithm 1 with fixed \(\mathbf {a}^{'}\). Since every profile can be reached from any other through a sequence of single-player changes and a player may keep its current channel, \(z(t)\) is an irreducible and aperiodic Markov process. Next, we verify that the process is reversible with respect to the distribution (14), that is, for \(\forall \mathbf {a},\bar {\mathbf {a}}\in \mathcal {A}\), we have:

$$ \pi(\mathbf{a})P(\bar{\mathbf{a}}|\mathbf{a})=\pi(\bar{\mathbf{a}})P(\mathbf{a}|\bar{\mathbf{a}}), \tag{15} $$

where \(P(\bar {\mathbf {a}}|\mathbf {a})\) is the transition probability from state \(\mathbf {a}\) to \(\bar {\mathbf {a}}\).

When \(\mathbf {a}=\bar {\mathbf {a}}\), (15) clearly holds. When \(\mathbf {a}\neq \bar {\mathbf {a}}\), one player, say k, changes its working channel, so that exactly one element of the network state changes, i.e., \(\mathbf {a}=(a_{k},\mathbf {a}_{-k})\) and \(\bar {\mathbf {a}}=(\bar {a}_{k},\mathbf {a}_{-k})\). It is easy to check that:

$$ \begin{aligned} \pi(\mathbf{a}) P (\bar{\mathbf{a}} | \mathbf{a}) & = \left(\frac{(1+\lambda)^{\gamma\phi(\mathbf{a})}} {\sum_{\tilde{\mathbf{a}} \in \mathcal{A}} (1+\lambda)^{\gamma \phi\left(\tilde{ \mathbf{a}}\right)}} \right) \left(\frac{1} {|\mathcal{K}|} \right)\\ & \quad\left(\frac{(1+\lambda)^{\gamma u_{k}(\bar{a}_{k}, \mathbf{a}_{-k})}} {\max \left\{ (1+\lambda)^{\gamma u_{k} (\mathbf{a})}, (1+\lambda)^{\gamma u_{k} (\bar{a}_{k}, \mathbf{a}_{-k})} \right\}} \right)\\ & = c(1+\lambda)^{\gamma (\phi(\mathbf{a})+u_{k}(\bar{a}_{k}, \mathbf{a}_{-k}))}, \end{aligned} \tag{16} $$

where \(c=c_{1}c_{2}\), \(c_{1} = \frac {1} {|\mathcal {K}| \sum _{\tilde {\mathbf {a}} \in \mathcal {A}}(1+\lambda)^{\gamma \phi (\tilde {\mathbf {a}})}}\), and \(c_{2} = \frac {1}{\max \left \{ (1+\lambda)^{\gamma u_{k} (\mathbf {a})}, (1+\lambda)^{\gamma u_{k} \left (\bar {a}_{k},\mathbf {a}_{-k}\right)} \right \}}\).

According to the symmetry, we have:

$$ \pi(\bar{\mathbf{a}})P(\mathbf{a}|\bar{\mathbf{a}})=c(1+\lambda)^{\gamma(\phi(\bar{\mathbf{a}})+u_{k}(a_{k},\mathbf{a}_{-k}))}. \tag{17} $$

By substituting (12) into (16), we can obtain:

$$ \begin{aligned} \pi(\mathbf{a})P(\bar{\mathbf{a}}|\mathbf{a})&=c(1+\lambda)^{\gamma(\phi(\bar{\mathbf{a}})+u_{k}(a_{k},\mathbf{a}_{-k}))}\\ & = \pi(\bar{\mathbf{a}})P(\mathbf{a}|\bar{\mathbf{a}}). \end{aligned} \tag{18} $$

Thus, we can derive that:

$$ \sum_{\mathbf{a}\in\mathcal{A}} \pi (\mathbf{a}) P (\bar{\mathbf{a}} | \mathbf{a}) = \sum_{\mathbf{a}\in\mathcal{A}} \pi (\bar{\mathbf{a}}) P (\mathbf{a}| \bar{\mathbf{a}}) = \pi(\bar{\mathbf{a}}), \tag{19} $$

which is the balance equation of the Markov process. Hence, (14) is its stationary distribution.

Hence, Theorem 2 is proved. □
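The stationarity claim can be checked numerically on a very small game. The sketch below builds the full transition matrix of the learning dynamics for two players and two channels and measures how far the distribution in (14) is from satisfying \(\pi P=\pi\). For brevity, it assumes an identical-interest game in which every player's utility equals the potential, so that utility differences and potential differences coincide as in (12); the potential values, the uniform channel proposal, and the parameter choices are illustrative assumptions.

```python
import itertools


def stationary_check(phi, num_players, num_channels, gamma, lam):
    """Return the distribution (14) and the residual max |(pi P)(t) - pi(t)|."""
    states = list(itertools.product(range(num_channels), repeat=num_players))
    weight = {s: (1.0 + lam) ** (gamma * phi[s]) for s in states}
    total = sum(weight.values())
    pi = {s: weight[s] / total for s in states}

    # Transition probabilities: uniform player, uniform candidate channel
    # (assumption), acceptance min(1, (1 + lam)^(gamma * potential change)).
    P = {s: {t: 0.0 for t in states} for s in states}
    for s in states:
        for k in range(num_players):
            for c in range(num_channels):
                t = s[:k] + (c,) + s[k + 1:]
                if t == s:
                    continue
                accept = min(1.0, (1.0 + lam) ** (gamma * (phi[t] - phi[s])))
                P[s][t] += accept / (num_players * num_channels)
        # Remaining probability mass stays in the current state.
        P[s][s] = 1.0 - sum(P[s].values())

    residual = max(
        abs(sum(pi[s] * P[s][t] for s in states) - pi[t]) for t in states
    )
    return pi, residual


# Hypothetical potential values for the four channel profiles.
phi = {(0, 0): -2.0, (0, 1): 0.0, (1, 0): -0.5, (1, 1): -2.0}
pi, residual = stationary_check(phi, num_players=2, num_channels=2,
                                gamma=2.0, lam=1.0)
most_likely = max(pi, key=pi.get)
```

The residual is zero up to rounding, and the most likely profile under `pi` is the one with the largest potential.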

Theorem 3

If the variable \(\mathbf {a}^{'}\) is fixed, the inner loop of Algorithm 1 converges to the best NE point of \(\mathcal {G}_{\mathbf {a}^{'}}\) with an arbitrarily high probability when γ is sufficiently large. Therefore, the MOS level of the IoT network is approximately maximized when \(\mathbf {a}^{'}\) is the NE point.


Proof

It is noted from Theorem 1 that \(\mathbf {a}^{\text {opt}}\) is an optimal channel allocation profile that maximizes the potential function \(\phi\), which is also the best NE of \(\mathcal {G}_{\mathbf {a}^{'}}\).

According to Theorem 2, the proposed algorithm converges to a unique stationary distribution. When γ is sufficiently large, \((1+\lambda)^{\gamma \phi \left (\mathbf {a}^{\text {opt}} \right)} \gg (1+\lambda)^{\gamma \phi (\mathbf {a})}, \forall \mathbf {a}\in \mathcal {A}\setminus \{\mathbf {a}^{\text {opt}}\}\). According to (14), the unique stationary distribution of the players’ strategy profile then approaches (0,…,0,1,0,…,0), where 1 is the probability of the optimal channel allocation solution and the probabilities of the non-optimal solutions all tend to 0. Thus:

$$ {\lim}_{\gamma \rightarrow \infty} \pi \left(\mathbf{a}^{\text{opt}}\right)=1, \tag{20} $$

which means that the proposed learning algorithm converges to the best NE of \(\mathcal {G}_{\mathbf {a}^{'}}\) with an arbitrarily high probability. When \(\mathbf {a}^{'}=\mathbf {a}^{\text {opt}}\), the NE obtained by the proposed algorithm is a near-optimum solution to problem (8). Then, the MOS level of the IoT network is approximately maximized when \(\mathbf {a}^{'}\) is the NE point.

Thus, the proof is completed. □
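The limit in Theorem 3 can be illustrated with a few lines of Python. The sketch evaluates the probability that the stationary distribution (14) assigns to the potential maximizer for increasing γ. The list of potential values is hypothetical and is assumed to have a unique maximizer.

```python
def best_profile_probability(potentials, gamma, lam=1.0):
    """Probability of the potential maximizer under distribution (14)."""
    top = max(potentials)
    # Subtracting the maximum keeps the powers in a safe numerical range
    # and does not change the normalized probabilities.
    weights = [(1.0 + lam) ** (gamma * (value - top)) for value in potentials]
    return weights[potentials.index(top)] / sum(weights)


# Hypothetical potential values of four channel allocation profiles.
potentials = [-3.0, -2.5, -2.4, -4.0]
probabilities = [best_profile_probability(potentials, g) for g in (1, 10, 100)]
```

The entries of `probabilities` increase toward 1 as γ grows. When the gap between the best and second-best potential is small, as it is here, a large γ is needed before the best profile dominates.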

6 Simulation results and discussion

In this section, numerical simulations are performed in MATLAB to validate the efficiency and performance of our proposed algorithm for solving the channel allocation problem of IoT uplink communications over cellular networks.

6.1 Scenario setup

We consider one MBS with a hexagonal coverage area in which 2 SCBSs are randomly deployed. For convenience, we assume that 3 SOs are randomly located in each SCBS and the other 10 SOs are in the coverage area of the MBS. Here, we suppose that each SO has the same uplink transmission power, which is set to 23 dBm. Accordingly, suppose that the total 5 MHz bandwidth in this heterogeneous network constitutes N=10 channels, each with the same bandwidth of 487.5 kHz. Each SO chooses 1 channel for transmission. The Rayleigh fading model is considered in the simulation, and \(h_{b_{c}}\) is the link gain from SO d to BS \(b_{c}\), which is expressed as \(h_{b_{c}}=\xi _{b_{c}}\left (L_{b_{c}}\right)^{-\theta }\), where \(L_{b_{c}}\) is the distance between SO d and BS \(b_{c}\), \(\xi _{b_{c}}\) denotes the channel fading component, and θ is the path loss exponent. The noise power spectral density is set to \(\sigma ^{2}=-174\) dBm/Hz. In the following simulations, the results are obtained from 400 independent trials, and the parameters involved are tuned experimentally.
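The unit conversions behind this setup can be sketched as follows. The helper names are hypothetical. Modeling the fading component as a unit-mean exponential power gain (the power of a Rayleigh-faded amplitude) is an assumption about how the fading term is drawn, and the path loss exponent and distance in the example are illustrative values that are not specified in this section.

```python
import math
import random


def dbm_to_watts(dbm):
    """Convert a power level in dBm to watts."""
    return 10.0 ** ((dbm - 30.0) / 10.0)


def noise_power_watts(density_dbm_per_hz, bandwidth_hz):
    """Noise power in one channel from a spectral density in dBm/Hz."""
    return dbm_to_watts(density_dbm_per_hz + 10.0 * math.log10(bandwidth_hz))


def link_gain(distance_m, path_loss_exponent, fading):
    """Link gain: fading component times distance^(-path loss exponent)."""
    return fading * distance_m ** (-path_loss_exponent)


tx_power = dbm_to_watts(23.0)                       # about 0.2 W
channel_noise = noise_power_watts(-174.0, 487.5e3)  # about 1.9e-15 W

rng = random.Random(1)
fading = rng.expovariate(1.0)      # assumed unit-mean exponential power gain
sample_gain = link_gain(100.0, 3.0, fading)         # hypothetical 100 m link
```

For a 487.5 kHz channel, the density of −174 dBm/Hz corresponds to roughly −117 dBm of noise power per channel.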

6.2 Convergence behavior and optimality of this algorithm

In this subsection, we first compare the convergence behavior of Algorithm 1 with that of best response dynamic (BRD). Figure 3 shows that Algorithm 1 and BRD each converge to a stable point as the number of iterations increases. Compared with BRD, Algorithm 1 converges faster and achieves a better solution. This is consistent with the proven fact that BRD is only guaranteed to converge to some NE of the potential game, which may not be the best NE. In contrast, our algorithm can find the best NE, which is also the maximizer of the potential function. Therefore, the proposed algorithm is distributed and achieves better convergence performance. Figure 4 plots the optimization objectives of problems P and \(\tilde {P}\) against the number of iterations when Algorithm 1 is performed with fixed \(\mathbf {a}^{'}\). Figure 4 shows that the sum-MOS in problem P gradually increases and eventually converges as the number of iterations grows, which is consistent with the trend of the sum-\(\widetilde {\text {MOS}}\) in problem \(\tilde {P}\). This indicates that increasing the sum-\(\widetilde {\text {MOS}}\) by selecting a better channel allocation strategy profile also improves the sum-MOS. Although the sum-\(\widetilde {\text {MOS}}\) continues to increase in the later iterations, the value of the sum-MOS no longer changes. Since MOS is a piecewise function, a better solution for \(\tilde {P}\) cannot further enhance the performance of P, which implies that multiple optimal solutions exist.

Fig. 3 The comparison results of the convergence performance of Algorithm 1 and BRD with fixed \(\mathbf {a}^{'}\)

Fig. 4 The changing curves of the optimization objectives as the number of iterations increases by Algorithm 1 with fixed \(\mathbf {a}^{'}\): a in problem P; b in problem \(\tilde {P}\)

In the following, we evaluate the MOS performance of each SO obtained by performing Algorithm 1. Figure 5 shows that Algorithm 1 maintains better fairness among SOs with respect to MOS performance by taking into account the impact of the interference generated by each SO on the entire network. Algorithm 1 is designed to find the best NE of the channel access game \(\mathcal {G}_{\mathbf {a}^{'}}\), which achieves an approximately equal utility value for each SO, as shown in Fig. 5. Moreover, the fairness among SOs with respect to MOS or \(\widetilde {\text {MOS}}\) performance is guaranteed. Figure 6 plots the sum-MOS against the number of iterations when Algorithm 1 is performed. It shows that Algorithm 1 can improve the QoE of SOs. However, our proposed algorithm cannot guarantee convergence to the global optimal solution of \(\tilde {P}\), since the potential function in \(\mathcal {G}_{\mathbf {a}^{'}}\) differs from the optimization objective in \(\tilde {P}\), where \(\widetilde {\text {MOS}}\) is a piecewise function. The best NE of \(\mathcal {G}_{\mathbf {a}^{\ast }}\), i.e., the maximizer of the potential function, is obtained by performing Algorithm 1 and is only a near-optimum of P. Figure 6 shows that the sum-MOS in P eventually converges to a fixed point as the number of iterations increases and is close to the maximum value of the sum-MOS. This indicates that our approach performs well on this difficult non-convex optimization problem.

Fig. 5

Comparison of the utility, \(\widetilde {\text {MOS}}\), and MOS of each SO obtained by performing Algorithm 1

Fig. 6

The sum-MOS versus the number of iterations when performing Algorithm 1

7 Conclusion

In this paper, we investigated the channel allocation problem in 5G-enabled IoT and used a game-theoretic learning algorithm to improve the QoE of SOs. To measure the QoE of SOs in IoT, we first defined a MOS function. Then, we formulated the optimization problem as an exact potential game, in which the potential function was designed by approximately converting the objective function into a tractable form. It was proved that the exact potential game admits a best NE, which is a near-optimal solution of the channel allocation problem. For the proposed game, we designed a distributed learning algorithm and proved that it converges to the best NE with an arbitrarily high probability.



Abbreviations

BRD: Best response dynamic

IoT: Internet of things

MBS: Macrocell base station

MOS: Mean opinion score

NE: Nash equilibrium

QoE: Quality of experience

SBSs: Small cell base stations

SINR: Signal-to-interference-plus-noise ratio

SOs: Smart objects


  1. M. A. M. Albreem, A. A. El-Saleh, M. Isa, W. Salah, M. Jusoh, M. M. Azizan, A. Ali, in 2017 IEEE 4th International Conference on Smart Instrumentation, Measurement and Application (ICSIMA). Green Internet of things (IoT): an overview, (2017), pp. 1–6.

  2. W. Tan, M. Matthaiou, S. Jin, X. Li, Spectral efficiency of DFT-based processing hybrid architectures in massive MIMO. IEEE Wirel. Commun. Lett. 6(5), 586–589 (2017).

  3. A. Musaddiq, Y. B. Zikria, O. Hahm, H. Yu, A. K. Bashir, S. W. Kim, A survey on resource management in IoT operating systems. IEEE Access 6, 8459–8482 (2018).

  4. W. Tan, S. Jin, C.-K. Wen, T. Jiang, Spectral efficiency of multi-user millimeter wave systems under single path with uniform rectangular arrays. EURASIP J. Wirel. Commun. Netw. 2017(1), 181 (2017).

  5. X. Sun, N. Ansari, Dynamic resource caching in the IoT application layer for smart cities. IEEE Internet Things J. 5(2), 606–613 (2018).

  6. Z. Qin, J. A. McCann, in 2017 IEEE Global Communications Conference. Resource efficiency in low-power wide-area networks for IoT applications, (2017), pp. 1–7.

  7. B. Khalfi, B. Hamdaoui, M. Guizani, Extracting and exploiting inherent sparsity for efficient IoT support in 5G: Challenges and potential solutions. IEEE Wirel. Commun. 24(5), 68–73 (2017).

  8. C. Li, Y. Li, K. Song, L. Yang, Energy efficient design for multiuser downlink energy and uplink information transfer in 5G. Sci. China Inf. Sci. 59(2), 1–8 (2016).

  9. J. Yuan, S. Jin, W. Xu, W. Tan, M. Matthaiou, K. K. Wong, User-centric networking for dense C-RANs: high-SNR capacity analysis and antenna selection. IEEE Trans. Commun. 65(11), 5067–5080 (2017).

  10. M. Zhang, W. Tan, J. Gao, S. Jin, Spectral efficiency and power allocation for mixed-ADC massive MIMO system. China Commun. 15(3), 112–127 (2018).

  11. W. Tan, D. Xie, J. Xia, W. Tan, L. Fan, S. Jin, Spectral and energy efficiency of massive MIMO for hybrid architectures based on phase shifters. IEEE Access 6, 11751–11759 (2018).

  12. H. Ji, S. Park, J. Yeo, Introduction to ultra reliable and low latency communications in 5G (2017). arXiv preprint arXiv:1704.05565.

  13. C. Li, K. Song, D. Wang, F. Zheng, L. Yang, Optimal remote radio head selection for cloud radio access networks. Sci. China Inf. Sci. 59(10), 102315:1–102315:12 (2016).

  14. S. Li, Q. Ni, Y. Sun, G. Min, S. Al-Rubaye, Energy-efficient resource allocation for industrial cyber-physical IoT systems in 5G era. IEEE Trans. Ind. Inform. 4(6), 2618–28 (2018).

  15. C. Li, S. Zhang, P. Liu, F. Sun, J. M. Cioffi, L. Yang, Overhearing protocol design exploiting intercell interference in cooperative green networks. IEEE Trans. Veh. Technol. 65(1), 441–446 (2016).

  16. H. Dai, Y. Huang, J. Wang, L. Yang, Resource optimization in heterogeneous cloud radio access networks. IEEE Commun. Lett. 22(3), 494–497 (2018).

  17. A. K. Yerrapragada, B. Kelley, in 2017 12th System of Systems Engineering Conference (SoSE). An IoT self organizing network for 5G dense network interference alignment, (2017), pp. 1–6.

  18. N. N. Dao, M. Park, J. Kim, Resource-aware relay selection for inter-cell interference avoidance in 5G heterogeneous network for internet of things systems. Futur. Gener. Comput. Syst. (2018).

  19. W. Ejaz, M. Ibnkahla, Multiband spectrum sensing and resource allocation for IoT in cognitive 5G networks. IEEE Internet Things J. 5(1), 150–163 (2018).

  20. A. Aminjavaheri, A. RezazadehReyhani, R. Khalona, Underlay control signaling for ultra-reliable low-latency IoT communications (2018). arXiv preprint arXiv:1711.02248.

  21. X. He, K. Wang, H. Huang, T. Miyazaki, Y. Wang, S. Guo, Green resource allocation based on deep reinforcement learning in content-centric IoT. IEEE Trans. Emerg. Top. Comput. 9(3), 1–15 (2018).

  22. M. Aazam, M. St-Hilaire, C. H. Lung, I. Lambadaris, in 2016 23rd International Conference on Telecommunications (ICT). MeFoRE: QoE based resource estimation at Fog to enhance QoS in IoT, (2016), pp. 1–5.

  23. J. Huang, Y. Yin, Q. Duan, H. Yan, in 2015 3rd International Conference on Future Internet of Things and Cloud. A game-theoretic analysis on context-aware resource allocation for device-to-device communications in cloud-centric internet of things, (2015), pp. 80–86.

  24. H. Dai, Y. Huang, R. Zhao, J. Wang, L. Yang, Resource optimization for device-to-device and small cell uplink communications underlaying cellular networks. IEEE Trans. Veh. Technol. 67(2), 1187–1201 (2018).

  25. H. Zhang, Y. Xiao, S. Bu, D. Niyato, F. R. Yu, Z. Han, Computing resource allocation in three-tier IoT Fog networks: a joint optimization approach combining stackelberg game and matching. IEEE Internet Things J. 4(5), 1204–1215 (2017).

  26. S. F. Abedin, M. G. R. Alam, N. H. Tran, C. S. Hong, in 2015 17th Asia-Pacific Network Operations and Management Symposium (APNOMS). A Fog based system model for cooperative IoT node pairing using matching theory, (2015), pp. 309–314.

  27. A. Rullo, D. Midi, E. Serra, E. Bertino, in 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS). Strategic security resource allocation for internet of things, (2016), pp. 737–738.

  28. H. Dai, Y. Huang, C. Li, S. Li, L. Yang, Energy-efficient resource allocation for device-to-device communication with WPT. IET Commun. 11(3), 326–334 (2017).



This work was supported by the National Natural Science Foundation of China (Grant No. 61801243), by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 18KJB510026), and by the Foundation of Nanjing University of Posts and Telecommunications (Grant No. NY218124).

Availability of data and materials

The writing material was mostly drawn from the journals listed in the references. MATLAB was used to simulate the proposed concept.

Author information

Authors and Affiliations



All authors contributed significantly to the research work presented in this paper. HD had a leading role in the formulation and solution of the considered optimization problem, while performing a detailed evaluation and analysis of the developed algorithm, through conducting an extensive set of simulations. HD and HZ completed the writing and formatting of the paper. WW did the experiments and simulations. BW helped in finalizing the solution and amending the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Baoyun Wang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Dai, H., Zhang, H., Wu, W. et al. A game-theoretic learning approach to QoE-driven resource allocation scheme in 5G-enabled IoT. J Wireless Com Network 2019, 55 (2019).
