Distributed resource allocation for D2D communications underlaying cellular network based on Stackelberg game

With the development of artificial intelligence, the large-scale access of intelligent devices with complex heterogeneity will bring unpredictable spectrum limitations and complex interference to traditional cellular networks. This situation can be alleviated by the device-to-device (D2D) technique, which improves spectrum efficiency by reusing cellular resources. In this paper, a resource allocation framework comprising channel allocation and power control based on a Stackelberg game is proposed for distributed interference coordination between D2D and cellular communications with quality-of-service guarantees. First, the base station matches channels to D2D pairs on the basis of the potential throughput gain. Second, the interference from D2D pairs to cellular users is converted into a penalty for the D2D pairs through an interlayer price, and the system throughput optimization problem is decoupled into multiple subproblems that can be solved in a distributed and iterative manner by each D2D pair. Simulation results show that the two proposed distributed algorithms for channel allocation and power control perform well in terms of convergence and overall system throughput.

Intelligent devices can use D2D technology to communicate directly with each other rather than through the base station; an isotropic antenna is assumed at every base station. D2D provides a new way to achieve low-latency communication, large data transmission and massive access for AI terminals [8].
D2D communications improve the performance of traditional cellular networks in terms of spectral efficiency, overall throughput and energy efficiency [9,10]. Despite the potential data rate gain, D2D communications also pose new challenges for interference management [11][12][13]. On the one hand, cellular users (CUs) experience interlayer interference from D2D pairs. On the other hand, D2D pairs experience interlayer interference from CUs and intralayer interference from other D2D pairs reusing the same resource [14]. Channel allocation and power control are two well-known, effective methods for coordinating cellular and D2D communications [15,16]: they reduce the interference from D2D communications to the cellular network and enhance the gain of D2D communications [17]. Many interference mitigation strategies based on channel allocation and power control have recently been proposed in the literature.
Many studies have addressed resource allocation without power control as a way to avoid interference. The resource allocation methods in [18] and [19] prevent serious interference by applying the near-far principle: the base station (BS) always chooses CUs that are well isolated from a particular D2D pair to share resource blocks (RBs). These methods are simple but may perform poorly when quality of service (QoS) is also considered. Li et al. propose a resource allocation solution based on adaptive antenna arrays and interference alignment to coordinate interference efficiently [20]; in their scheme, one D2D pair reuses the resources of one CU, and each CU shares its resources with only one D2D pair. Evidently, the spectral efficiency of such a one-to-one reuse model is limited. Many studies therefore allow multiple D2D pairs to share the resources of one CU, or one D2D pair to reuse the resources of multiple CUs [21,22]. An increasing number of researchers combine resource allocation with power control to improve reuse efficiency. Xu et al. study the problem of pairing CUs and D2D users that share the same radio resources and propose a reverse iterative combinatorial auction-based power allocation scheme to optimize system throughput [23]. Wang et al. further analyze the relationship between the access probability and the channel gain of D2D pairs and propose a heuristic resource allocation scheme based on a greedy method [24]. A joint RB assignment and power allocation framework based on an interference graph is proposed in [25]. Similarly, the authors of [26] use hypergraph-based channel allocation to coordinate interference between D2D pairs and CUs. However, such centralized schemes are not suitable for large-scale networks [27]. Moreover, CUs and D2D users may be self-interested, each seeking to maximize its own benefit, which requires CUs, D2D users and the BS to solve distributed decision problems.
A distributed framework for resource management and interference coordination was designed in [28], where the resource allocation is solved by a column generation algorithm. Several auction-based resource allocation schemes have also been developed [29]. In [30], a distributed algorithm based on an auction game was adopted to maximize the data rate of underlay users while keeping the interference in the macro tier within an acceptable range. However, an auction game requires many rounds of negotiation between buyers and sellers, which may result in large control signaling overhead [31]. In game theory, fictitious prices can be used to coordinate and control the transmissions of network nodes and thereby resolve the conflict between signaling overhead and performance gain.
In this study, we design a distributed channel allocation and power control strategy that addresses the interlayer and intralayer interference among users, maximizing the overall system throughput while protecting cellular communications. First, we design a low-complexity channel allocation method based on the potential throughput gain of D2D users. Next, for the power control problem of D2D pairs, we model the interaction between CUs and D2D pairs as a Stackelberg game in which the BS prices the interference it receives from the D2D pairs. With the restriction and penalty imposed through pricing, competition among users turns into cooperation.

System model
We consider resource sharing in a single-cell cellular network, where D2D communications reuse the uplink spectrum resources of cellular communications, as shown in Fig. 1. The network has a set $\mathcal{K} = \{1, 2, \ldots, K\}$ of $K$ orthogonal channels, each indexed by $k \in \mathcal{K}$. The $C$ CUs are denoted by the set $\mathcal{C}$. We consider a fully loaded network in which the number of CUs equals the number of channels, $C = K$, so that each channel is occupied by one CU; for brevity, we refer to the CU occupying channel $k$ as CU $k$. The $N$ D2D pairs in the network are denoted by the set $\mathcal{D} = \{1, 2, \ldots, N\}$, with $N > K$. Multiple D2D pairs can share the same channel, and each D2D pair can reuse at most one channel. The set of D2D pairs reusing channel $k$ is denoted by $\mathcal{D}_k = \{1, 2, \ldots, D_k\}$. $p_i^k$ is the transmit power of D2D pair $i$ on channel $k$, and $\mathbf{p}_{\mathcal{D}_k} = (p_1^k, p_2^k, \ldots, p_i^k, \ldots, p_{D_k}^k)$ is the power allocation vector of $\mathcal{D}_k$. The channel reuse indicator of the $i$-th D2D link on channel $k$ is $x_{ik} \in \{0, 1\}$, where $x_{ik} = 1$ if D2D pair $i$ accesses channel $k$, and $x_{ik} = 0$ otherwise. In uplink transmission, the BS suffers interlayer interference from D2D transmitters, while D2D receivers suffer interlayer interference from the CU and intralayer interference from the other D2D transmitters that reuse the same channel. Therefore, the received SINRs of D2D pair $i$ and CU $k$ on channel $k$ can be written as
$$\gamma_i^k = \frac{p_i^k g_{ii}^k}{\sum_{k\in\mathcal{C}} x_{ik} p_c g_{ki} + \sum_{j\in\mathcal{D}, j\neq i} x_{jk} p_j^k g_{ji}^k + \sigma^2} \quad (1)$$
$$\gamma_k = \frac{p_c g_{kb}}{\sum_{i\in\mathcal{D}} x_{ik} p_i^k g_{ib}^k + \sigma^2} \quad (2)$$
where $p_c$ denotes the transmit power of the CUs, $g_{ii}^k$ is the channel gain from the $i$-th D2D transmitter to the $i$-th D2D receiver on channel $k$, $g_{ki}$ is the channel gain from CU $k$ to the $i$-th D2D receiver, $g_{ji}^k$ is the channel gain from the $j$-th D2D transmitter to the $i$-th D2D receiver on channel $k$, $g_{kb}$ is the channel gain from CU $k$ to the BS, $g_{ib}^k$ is the channel gain from the $i$-th D2D transmitter to the BS on channel $k$, and $\sigma^2$ is the noise power.
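As a minimal illustration of Eqs. (1) and (2), the Python sketch below evaluates the SINRs on a single channel $k$, restricted to the pairs in $\mathcal{D}_k$ (so every $x_{ik} = 1$). The function and argument names are hypothetical, all gains and powers are plain linear values, and the numbers in the usage lines are illustrative rather than taken from the simulations.

```python
import math


def d2d_sinr(i, p, g_d2d, g_cu_to_d2d, p_c, noise):
    """SINR of the i-th pair in D_k, Eq. (1) with x_ik = 1 for every pair in D_k.

    p[j]           -- transmit power p_j^k of the j-th pair in D_k
    g_d2d[j][i]    -- gain g_ji^k from pair j's transmitter to pair i's receiver
                      (the diagonal entry g_d2d[i][i] is the direct gain g_ii^k)
    g_cu_to_d2d[i] -- gain g_ki from CU k to pair i's receiver
    """
    intralayer = sum(p[j] * g_d2d[j][i] for j in range(len(p)) if j != i)
    return p[i] * g_d2d[i][i] / (p_c * g_cu_to_d2d[i] + intralayer + noise)


def cu_sinr(p, g_d2d_to_bs, p_c, g_cu_to_bs, noise):
    """SINR of CU k at the BS, Eq. (2); g_d2d_to_bs[i] is g_ib^k."""
    interlayer = sum(pi * gi for pi, gi in zip(p, g_d2d_to_bs))
    return p_c * g_cu_to_bs / (interlayer + noise)


def rate(sinr, w_c):
    """Shannon rate w_c * log2(1 + SINR) of a channel with bandwidth w_c."""
    return w_c * math.log2(1.0 + sinr)


# Two pairs sharing channel k (illustrative numbers).
p = [1.0, 2.0]
g_d2d = [[4.0, 0.5], [0.25, 3.0]]
print(d2d_sinr(0, p, g_d2d, [0.1, 0.2], p_c=1.0, noise=0.1))  # 4 / 0.7
print(cu_sinr(p, [0.1, 0.05], p_c=1.0, g_cu_to_bs=2.0, noise=0.1))  # 2 / 0.3
```

Each D2D receiver sees the CU term plus the other pairs' transmissions, while the BS sees only the D2D transmitters, which is exactly the interlayer/intralayer split described above.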

Problem formulation
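Our objective is to maximize the total throughput of CUs and D2D pairs subject to a constraint that guarantees the performance of cellular transmissions. Mathematically, the overall throughput optimization problem for channel allocation and power control can be formulated as
$$\max_{\mathbf{X}, \{p_i^k\}} \sum_{k\in\mathcal{K}} \left[ w_c \log_2\left(1+\gamma_k\right) + \sum_{i\in\mathcal{D}} x_{ik}\, w_c \log_2\left(1+\gamma_i^k\right) \right] \quad (3)$$
$$\text{s.t.}\quad 0 \le p_i^k \le p_{max}, \quad \forall i\in\mathcal{D},\ k\in\mathcal{K} \quad (3a)$$
$$\sum_{i\in\mathcal{D}} x_{ik}\, p_i^k g_{ib}^k \le Q_k, \quad \forall k\in\mathcal{K} \quad (3b)$$
$$\sum_{k\in\mathcal{K}} x_{ik} \le 1, \quad x_{ik}\in\{0, 1\}, \quad \forall i\in\mathcal{D} \quad (3c)$$
where $w_c$ is the bandwidth of a channel. Constraint (3a) limits the transmit power of each D2D transmitter to $p_{max}$. Constraint (3b) guarantees the QoS of CUs, where $Q_k$ is the interference tolerance level, which depends on the requirements and channel gain of the CU on channel $k$. Constraint (3c) ensures that at most one channel is assigned to each D2D pair.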
Problem (3) is a typical mixed-integer nonlinear programming (MINLP) problem, which is usually intractable. The channel allocation matrix $\mathbf{X}_{N\times K} = [x_{ik}]$ makes it a combinatorial problem, and, even for a given channel allocation, the objective function is non-concave in the power allocation vector. The optimal solution could be found by exhaustive search, but its complexity is extremely high even in a modest-size network. One approach is to relax the integer constraint to $x_{ik}\in[0, 1]$ [31]; after relaxation, the data rates of D2D pairs and CUs can be calculated from their expected SINRs. However, the relaxed problem (3) is still not a convex optimization problem. Another approach transforms the expected rate into a convex function through a logarithmic change of the variable $x_{ik}$, so that the optimal solution lies on the boundary of the feasible set [32]. However, because the feasible set is then no longer a polyhedron, the optimal solution is still difficult to find. Instead of centralized methods, we propose a distributed strategy with low coordination and communication overhead. We decouple the problem into two subproblems: channel allocation and power control. First, a heuristic algorithm based on the potential rate of each single reuse partnership between one D2D pair and one CU is adopted to allocate channels to D2D pairs. Subsequently, we introduce a price for channel resources to decouple the interference constraint and develop a Stackelberg game model that reaches the optimal power allocation iteratively.
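To make the cost of exhaustive search concrete: under (3c), each D2D pair either picks one of the $K$ channels or stays silent, so there are $(K+1)^N$ possible allocation matrices $\mathbf{X}$, and each of them still needs a continuous power optimization. The short Python check below (function names are hypothetical) counts them in closed form and confirms the count by brute-force enumeration for a tiny case; the sizes of 20 pairs and 10 channels are illustrative only.

```python
from itertools import product


def count_allocations(n_pairs, n_channels):
    """Closed form: each pair chooses one of K channels or none, per (3c)."""
    return (n_channels + 1) ** n_pairs


def count_allocations_brute_force(n_pairs, n_channels):
    """Enumerate every binary N x K matrix and keep those whose rows sum to at most 1."""
    count = 0
    for bits in product((0, 1), repeat=n_pairs * n_channels):
        rows = (bits[r * n_channels:(r + 1) * n_channels] for r in range(n_pairs))
        if all(sum(row) <= 1 for row in rows):
            count += 1
    return count


print(count_allocations_brute_force(3, 2), count_allocations(3, 2))  # 27 27
print(f"{count_allocations(20, 10):.2e}")  # 6.73e+20
```

Even this small illustrative network has roughly $6.7\times 10^{20}$ channel allocations, which is why the problem is decoupled and solved heuristically instead.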

Channel allocation
When D2D pair $i$ reuses the channel resources of CU $k$, the sum rate of D2D pair $i$ and CU $k$ is
$$T(p_i^k) = w_c\log_2\left(1+\frac{p_c g_{kb}}{p_i^k g_{ib}^k + \sigma^2}\right) + w_c\log_2\left(1+\frac{p_i^k g_{ii}^k}{p_c g_{ki} + \sigma^2}\right) \quad (4)$$
where $w_c$ is the channel bandwidth. In the overall throughput optimization problem, the sum rate obtained by a single reuse partnership between one D2D pair and one CU is rarely considered. We optimize $T(p_i^k)$ in problem (5) to find an approximately optimal reuse relationship between D2D pairs and CUs. The SINR of CUs in a cellular network is usually required to exceed a certain threshold; here, we set the SINR threshold of CUs in our model to $\gamma_{th}^c$. Requiring $\gamma_k \ge \gamma_{th}^c$ when D2D pair $i$ alone reuses channel $k$, the actual power constraint of D2D pairs in the channel allocation phase can be further expressed as
$$0 \le p_i^k \le \min\left\{p_{max},\ \frac{1}{g_{ib}^k}\left(\frac{p_c g_{kb}}{\gamma_{th}^c} - \sigma^2\right)\right\} \quad (6)$$
$T(p_i^k)$ is a convex function of $p_i^k$, and the optimal power of problem (5) can be expressed as in Eq. (7). At the same time, we obtain the throughput $T(p_i^k)$ corresponding to the optimal power. For channel $k$, admitting the D2D pair with the largest optimal throughput $T(p_i^k)$ is more likely to improve the system reuse efficiency while meeting the SINR threshold of CUs. Therefore, the channel allocation for D2D pairs proceeds as follows.
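Step 1 Initialize $\mathcal{D}_k = \emptyset$, $\forall k\in\mathcal{C}$, and $x_{ik} = 0$, $\forall i\in\mathcal{D}$. Calculate the optimal power $p_i^k$ of each D2D pair on each channel and the corresponding $T(p_i^k)$.
Step 2 $(i^*, k^*) = \arg\max_{k\in\mathcal{C}, i\in\mathcal{D}} T(p_i^k)$; that is, find the D2D pair $i^*$ and channel $k^*$ corresponding to the maximum throughput.
Step 3 If
Step 4 Check $\mathcal{D}$. Go to Step 5 when $\mathcal{D}$ is empty; otherwise, repeat Steps 2 and 3.
Step 5 Output $\mathcal{D}_k$ and $\mathbf{X}_{N\times K}$.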
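The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the closed-form optimal power of Eq. (7) is replaced by a dense grid search of $T(p_i^k)$ over the interval of Eq. (6), and because the condition of Step 3 is not recoverable from the text, a hypothetical admission test is used (pair $i^*$ is admitted to channel $k^*$ only while the accumulated interference at the BS stays within $Q_{k^*}$; rejected candidates are discarded). The names `Gains`, `sum_rate_T`, `best_power` and `allocate_channels` are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Gains:
    """Linear channel gains for one (D2D pair i, channel k) combination."""
    g_kb: float  # CU k -> BS
    g_ib: float  # D2D transmitter i -> BS on channel k
    g_ii: float  # D2D transmitter i -> D2D receiver i on channel k
    g_ki: float  # CU k -> D2D receiver i


def sum_rate_T(p, g, w_c, p_c, noise):
    """Eq. (4): sum rate of CU k and D2D pair i when they are alone on channel k."""
    cu = w_c * math.log2(1 + p_c * g.g_kb / (p * g.g_ib + noise))
    d2d = w_c * math.log2(1 + p * g.g_ii / (p_c * g.g_ki + noise))
    return cu + d2d


def best_power(g, w_c, p_c, noise, p_max, gamma_th, steps=1000):
    """Step 1 for one (i, k): maximize T over the interval of Eq. (6).

    A grid search (endpoints included) stands in for the closed form of Eq. (7).
    Returns (None, -inf) if even p = 0 violates the CU SINR threshold.
    """
    p_up = min(p_max, (p_c * g.g_kb / gamma_th - noise) / g.g_ib)
    if p_up < 0:
        return None, -math.inf
    grid = [p_up * s / steps for s in range(steps + 1)]
    p_best = max(grid, key=lambda p: sum_rate_T(p, g, w_c, p_c, noise))
    return p_best, sum_rate_T(p_best, g, w_c, p_c, noise)


def allocate_channels(gains, Q, w_c, p_c, noise, p_max, gamma_th):
    """Greedy Steps 1-5; gains[i][k] is a Gains record, Q[k] the budget of channel k.

    ASSUMPTION: Step 3 is replaced by a hypothetical admission test on the
    interference that the admitted pairs cause at the BS on channel k*.
    """
    n_pairs, n_channels = len(gains), len(gains[0])
    best = [[best_power(gains[i][k], w_c, p_c, noise, p_max, gamma_th)
             for k in range(n_channels)] for i in range(n_pairs)]  # Step 1
    X = [[0] * n_channels for _ in range(n_pairs)]
    D = [[] for _ in range(n_channels)]
    load = [0.0] * n_channels
    candidates = {(i, k) for i in range(n_pairs) for k in range(n_channels)
                  if best[i][k][0] is not None}
    while candidates:  # Step 4: stop when no candidate (i, k) is left
        i, k = max(candidates, key=lambda ik: best[ik[0]][ik[1]][1])  # Step 2
        added = best[i][k][0] * gains[i][k].g_ib
        if load[k] + added <= Q[k]:  # Step 3 (hypothetical test)
            X[i][k] = 1
            D[k].append(i)
            load[k] += added
            candidates -= {(i, kk) for kk in range(n_channels)}  # pair i is done
        else:
            candidates.discard((i, k))
    return D, X  # Step 5
```

The grid search keeps the sketch independent of the exact form of Eq. (7); replacing `best_power` with the closed-form expression would leave the greedy loop unchanged.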

Power control
We assume that the transmit power of CUs in the network is a fixed value $p_c$ and that the channel allocation has been determined by the heuristic algorithm described in the channel allocation section. In this section, we consider how to design a reasonable power control strategy for D2D pairs that maximizes system throughput under the QoS constraint of CUs. Because the channels occupied by CUs are orthogonal, the power control problem can be decoupled into $K$ independent subproblems, each corresponding to one CU; hence, it suffices to study power control within a single D2D pair set $\mathcal{D}_k$. Without interference coordination, D2D pairs would choose the maximum transmit power to maximize their own revenue, while CUs would refuse to share channel resources with D2D pairs. Therefore, we propose a power allocation scheme based on a Stackelberg game with a price-charging mechanism, in which the BS charges D2D pairs for the interference they cause to the BS on channel $k$. A Stackelberg game is a strategic game consisting of a leader and multiple followers that compete with one another. In our model, the leader first sets the interference price for the channel, and the D2D pairs, as followers, then update their transmit power to maximize their utilities on the basis of the announced price.

Utility functions
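The channel rates of D2D pair $i$ and CU $k$ on channel $k$ can be obtained by
$$R_i^k = w_c\log_2\left(1+\frac{p_i^k g_{ii}^k}{p_c g_{ki} + \sum_{j\in\mathcal{D}_k\setminus\{i\}} p_j^k g_{ji}^k + \sigma^2}\right) \quad (8)$$
$$R_k = w_c\log_2\left(1+\frac{p_c g_{kb}}{\sum_{i\in\mathcal{D}_k} p_i^k g_{ib}^k + \sigma^2}\right) \quad (9)$$
For the leader, let $\mu_k \ge 0$ denote the unit price of the interference brought by D2D pairs to channel $k$. The utility of the leader is defined as its own throughput plus the gain earned from the followers:
$$U^L(\mu_k, \mathbf{p}_{\mathcal{D}_k}) = R_k + \mu_k \sum_{i\in\mathcal{D}_k} p_i^k g_{ib}^k \quad (10)$$
The optimization problem of the leader is to maximize this utility:
$$\max_{\mu_k \ge 0}\ U^L(\mu_k, \mathbf{p}_{\mathcal{D}_k}) \quad (11)$$
$$\text{s.t.}\quad \sum_{i\in\mathcal{D}_k} p_i^k g_{ib}^k \le Q_k \quad (11a)$$
where $Q_k$ is the interference tolerance level, which depends on the channel condition of CU $k$. Equation (11a) protects cellular transmissions. We define $C_{D2D,k}^{-i}$ as the sum rate of all D2D pairs in $\mathcal{D}_k$ except pair $i$:
$$C_{D2D,k}^{-i} = \sum_{j\in\mathcal{D}_k\setminus\{i\}} R_j^k \quad (12)$$
Taking the partial derivative of $C_{D2D,k}^{-i}$ with respect to $p_i^k$ and evaluating it at $p_i^k = p_i^{k*}$, where $p_i^{k*}$ is the value obtained in the previous power iteration, we derive the intralayer price $c_i^k$ given in Eq. (13). In the followers' game, the players are the D2D pairs allocated to channel $k$, i.e., the set $\mathcal{D}_k$. As a follower, D2D pair $i$ has the utility function
$$U_i^F(\mu_k, p_i^k) = R_i^k - \mu_k p_i^k g_{ib}^k - p_i^k \sum_{j\in\mathcal{D}_k\setminus\{i\}} c_i^k g_{ij}^k \quad (14)$$
where the second term is the cost of the interlayer interference from D2D pair $i$ to CU $k$, and the last term is the cost of the intralayer interference among D2D pairs. On the basis of this utility function, the optimization problem at each follower can be defined as
$$\max_{0 \le p_i^k \le p_{max}}\ U_i^F(\mu_k, p_i^k) \quad (15)$$
Let $p_i^{k*}$ be the optimal transmit power of D2D pair $i$ on channel $k$, and let $\mathbf{p}_{\mathcal{D}_k}^* = (p_1^{k*}, p_2^{k*}, \ldots, p_i^{k*}, \ldots, p_{D_k}^{k*})$ be the optimal power allocation vector of $\mathcal{D}_k$. For the proposed Stackelberg game, the Stackelberg equilibrium (SE) is defined as follows.
Definition 1: Let $\mu_k^*$ and $\mathbf{p}_{\mathcal{D}_k}^*$ be the optimal strategies of the leader and the followers, respectively. The strategy pair $(\mu_k^*, \mathbf{p}_{\mathcal{D}_k}^*)$ is an SE if no unilateral deviation by the leader or by any follower is profitable, that is (with the other followers' powers fixed at their values in $\mathbf{p}_{\mathcal{D}_k}^*$),
$$U^L(\mu_k^*, \mathbf{p}_{\mathcal{D}_k}^*) \ge U^L(\mu_k, \mathbf{p}_{\mathcal{D}_k}^*), \quad \forall \mu_k \ge 0 \quad (16)$$
$$U_i^F(\mu_k^*, p_i^{k*}) \ge U_i^F(\mu_k^*, p_i^k), \quad \forall p_i^k \in [0, p_{max}],\ \forall i\in\mathcal{D}_k$$
To obtain the SE, the best responses (BRs) of the followers must generally be calculated first.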
This is because the leader moves first and the followers respond to its price, so the leader can derive its own BR only from the followers' best strategies. Given the utility functions of the leader and the followers, the problem can therefore be solved by backward induction, starting with the followers' problem.
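The utilities in Eqs. (10), (12) and (14) can be sketched in Python for a single channel $k$ whose pairs are indexed $0, \ldots, D_k-1$. This is an illustrative sketch under assumptions: the function names are hypothetical, the intralayer price $c_i^k$ is passed in as a given number because Eq. (13) is not reproduced here, and the derivative of $C_{D2D,k}^{-i}$ from which that price is built is estimated by a central finite difference rather than in closed form.

```python
import math


def d2d_sinr(i, p, G, p_c, g_cu, noise):
    """SINR inside Eq. (8); G[j][i] = g_ji^k (G[i][i] = g_ii^k), g_cu[i] = g_ki."""
    intra = sum(p[j] * G[j][i] for j in range(len(p)) if j != i)
    return p[i] * G[i][i] / (p_c * g_cu[i] + intra + noise)


def leader_utility(mu, p, g_bs, p_c, g_kb, w_c, noise):
    """Eq. (10): CU k's rate R_k (Eq. (9)) plus the revenue mu * sold interference."""
    sold = sum(pi * gi for pi, gi in zip(p, g_bs))  # g_bs[i] = g_ib^k
    return w_c * math.log2(1 + p_c * g_kb / (sold + noise)) + mu * sold


def follower_utility(i, mu, c_i, p, G, g_bs, p_c, g_cu, w_c, noise):
    """Eq. (14): own rate minus interlayer and intralayer interference costs."""
    own_rate = w_c * math.log2(1 + d2d_sinr(i, p, G, p_c, g_cu, noise))
    intra_gain = sum(G[i][j] for j in range(len(p)) if j != i)  # sum of g_ij^k
    return own_rate - mu * p[i] * g_bs[i] - p[i] * c_i * intra_gain


def others_sum_rate(i, p, G, p_c, g_cu, w_c, noise):
    """Eq. (12): sum rate of all pairs in D_k except pair i."""
    return sum(w_c * math.log2(1 + d2d_sinr(j, p, G, p_c, g_cu, noise))
               for j in range(len(p)) if j != i)


def others_rate_slope(i, p, G, p_c, g_cu, w_c, noise, h=1e-6):
    """Central-difference estimate of dC^{-i}/dp_i at the current powers p."""
    up, down = list(p), list(p)
    up[i] += h
    down[i] -= h
    return (others_sum_rate(i, up, G, p_c, g_cu, w_c, noise)
            - others_sum_rate(i, down, G, p_c, g_cu, w_c, noise)) / (2 * h)


p = [1.0, 2.0]
G = [[4.0, 0.5], [0.25, 3.0]]
slope = others_rate_slope(0, p, G, p_c=1.0, g_cu=[0.1, 0.2], w_c=1.0, noise=0.1)
print(slope < 0)  # True: raising pair 0's power lowers the other pair's rate
```

The negative slope is the marginal damage that pair $i$ inflicts on the other pairs; the intralayer price turns this damage into a cost in the follower's own utility.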

Analysis of followers' game
For D2D pairs, a high transmit power brings a high data rate, but it also incurs a high cost because of the interference caused to other D2D pairs and to the CU. Therefore, there is a trade-off between data rate and cost. The followers' game can be written as the tuple
$$\mathcal{G}_F = \left\{\mathcal{D}_k, \{\mathcal{P}_i\}_{i\in\mathcal{D}_k}, \{U_i^F\}_{i\in\mathcal{D}_k}\right\}, \quad \mathcal{P}_i = [0, p_{max}] \quad (17)$$
The rate term of $U_i^F$ is logarithmic, and hence concave, in $p_i^k$, whereas the cost terms grow linearly; therefore, the objective of problem (15) is concave in $p_i^k$, with a negative second derivative. The BR power is derived by setting the first-order partial derivative of the objective to zero and projecting the result onto $[0, p_{max}]$:
$$p_i^{k*} = \left[\frac{w_c}{\ln 2\left(\mu_k g_{ib}^k + \sum_{j\in\mathcal{D}_k\setminus\{i\}} c_i^k g_{ij}^k\right)} - \frac{p_c g_{ki} + \sigma^2 + \sum_{j\in\mathcal{D}_k\setminus\{i\}} p_j^k g_{ji}^k}{g_{ii}^k}\right]_0^{p_{max}} \quad (18)$$
where $[x]_0^{p_{max}} = \min\{\max\{x, 0\}, p_{max}\}$. Because the optimal power in Eq. (18) has a water-filling form, as in [22], the followers' game can reach an equilibrium among the D2D pairs through iteration. We define the function $f(\mathbf{p}_{\mathcal{D}_k}) = (f_1(\mathbf{p}_{\mathcal{D}_k}), \ldots, f_{D_k}(\mathbf{p}_{\mathcal{D}_k}))$, where $f_i$ is the right-hand side of Eq. (18). $f$ gives the optimal transmit power of each pair when the powers of the other D2D pairs are fixed; that is, $f$ is the BR function. We propose a synchronous iterative algorithm based on the BR function, called the BR algorithm, in which all D2D links adjust their powers simultaneously according to
$$\mathbf{p}_{\mathcal{D}_k}^{(t+1)} = f\left(\mathbf{p}_{\mathcal{D}_k}^{(t)}\right) \quad (19)$$
The BR algorithm is described in Algorithm 1. By applying the maximum theorem to $U_i^F(\mu_k, p_i^k)$, we find that the BR function is continuous [33]. If the BR algorithm converges, each D2D pair transmits at the power that maximizes its utility, so no D2D pair can increase its utility by unilaterally adjusting its power; that is, the D2D pairs are at an SE. Hence, the BR algorithm never converges to a point that is not an SE.
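A minimal Python sketch of the BR algorithm is given below, assuming the clipped water-filling form of Eq. (18) with the intralayer prices $c_i^k$ held fixed during the iteration. It is not Algorithm 1 itself: the starting point ($p_{max}$ for every pair), the tolerance and the iteration cap are assumptions, and the names are hypothetical.

```python
import math


def best_response(i, p, mu, c, G, g_bs, p_c, g_cu, w_c, noise, p_max):
    """Eq. (18): clipped water-filling power of pair i, other powers held fixed.

    G[j][i] = g_ji^k (G[i][i] = g_ii^k), g_bs[i] = g_ib^k, g_cu[i] = g_ki, c[i] = c_i^k.
    """
    others = [j for j in range(len(p)) if j != i]
    interference = p_c * g_cu[i] + noise + sum(p[j] * G[j][i] for j in others)
    unit_cost = mu * g_bs[i] + c[i] * sum(G[i][j] for j in others)
    if unit_cost <= 0:  # interference is free: transmit at full power
        return p_max
    water_level = w_c / (math.log(2) * unit_cost)
    return min(p_max, max(0.0, water_level - interference / G[i][i]))


def br_algorithm(mu, c, G, g_bs, p_c, g_cu, w_c, noise, p_max, tol=1e-9, max_iter=500):
    """Synchronous BR iteration, Eq. (19): every pair updates from the previous vector."""
    p = [p_max] * len(G)
    for t in range(1, max_iter + 1):
        new_p = [best_response(i, p, mu, c, G, g_bs, p_c, g_cu, w_c, noise, p_max)
                 for i in range(len(p))]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p, t
        p = new_p
    return p, max_iter  # no convergence within max_iter


G = [[1.0, 0.01, 0.01], [0.01, 1.0, 0.01], [0.01, 0.01, 1.0]]
g_bs = [0.1, 0.2, 0.3]
p_star, iterations = br_algorithm(mu=10.0, c=[0.5] * 3, G=G, g_bs=g_bs, p_c=1.0,
                                  g_cu=[0.05] * 3, w_c=1.0, noise=0.01, p_max=1.0)
print(p_star, iterations)
print(sum(pi * gi for pi, gi in zip(p_star, g_bs)))  # aggregate interference, compare with Q_k
```

Every pair updates from the same previous vector, matching the synchronous update of Eq. (19). Because convergence is only guaranteed under suitable conditions, `max_iter` bounds the loop; with the weak cross gains used here the iteration settles within a few steps, and pairs with larger gains toward the BS end up at lower powers.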

Analysis of leader's game
The utility of the leader consists of two parts: the data rate of CU $k$ on channel $k$ and the revenue from selling interference to D2D users. Therefore, the leader's optimization can be written approximately as problem (20), whose solution, which we call the analytical price, has the form of Eq. (21). The interlayer price $\mu_k^*$ is the optimal price set by the leader for a given set of transmit powers; the BS updates this price and dynamically broadcasts it to the D2D pairs in $\mathcal{D}_k$.
We propose a Stackelberg game algorithm based on the analytical price, which is executed periodically to track the channel state changes caused by mobility. The iterative update process in each cycle is as follows.
Step 2 Collect the information of the D2D and cellular links and execute the BR algorithm.
Step 5 Broadcast the final price.
The interference management scheme for D2D and cellular communications based on the heuristic channel allocation and the Stackelberg game power control described above is named Distributed Resource Allocation for D2D communications underlaying cellular network based on Stackelberg Game (DRASG). Because its power control is updated on the basis of the analytical price, we call this variant the DRASG-AP algorithm.
The total complexity of DRASG-AP follows from the complexity of each step of the two-stage algorithm of channel selection and power control. The complexity of the channel selection phase is $O(KN)$. In power control, the complexity of the followers' game is upper-bounded by $O(2D_k\log_2\varepsilon^{-1})$, and that of the leader's game based on the Lagrangian analytical price is upper-bounded by $O(2D_k \cdot D_k^2\log_2\varepsilon_2^{-1})$, where $\varepsilon$ and $\varepsilon_2$ denote the convergence tolerances of the respective iterations. Therefore, the total complexity of DRASG-AP is given by Eq. (22). To reduce computational complexity, we also propose a method that obtains the optimal price by decreasing the price. The leader's optimization problem is a utility maximization subject to an interference constraint, so the optimal price depends on the channel conditions, the interference and the power constraints. If the leader sets the price too low, the followers buy the interference generated at $p_{max}$, and the leader will then raise the price to earn more revenue. If the leader sets the price too high, the followers stop transmitting and the revenue from D2D pairs is zero. Given $0 \le p_i^k \le p_{max}$, the optimal price is therefore limited to a certain range, denoted $\mu_k^l \le \bar{\mu}_k \le \mu_k^u$. The lower bound $\mu_k^l$ and upper bound $\mu_k^u$ of the price are obtained from Eq. (18) by setting $p_i^{k*} = p_{max}$ and $p_i^{k*} = 0$, respectively:
$$\mu_k^l = \frac{w_c g_{ii}^k}{\ln 2\, g_{ib}^k\left(g_{ii}^k p_{max} + p_c g_{ki} + \sigma^2 + \sum_{j\in\mathcal{D}_k\setminus\{i\}} p_j^k g_{ji}^k\right)} - \frac{\sum_{j\in\mathcal{D}_k\setminus\{i\}} c_i^k g_{ij}^k}{g_{ib}^k} \quad (25)$$
$$\mu_k^u = \frac{w_c g_{ii}^k}{\ln 2\, g_{ib}^k\left(p_c g_{ki} + \sigma^2 + \sum_{j\in\mathcal{D}_k\setminus\{i\}} p_j^k g_{ji}^k\right)} - \frac{\sum_{j\in\mathcal{D}_k\setminus\{i\}} c_i^k g_{ij}^k}{g_{ib}^k} \quad (26)$$
On the basis of this analysis, we first divide the price range $[\mu_k^l, \mu_k^u]$ into sufficiently small intervals. The leader then evaluates the price of each interval and measures the resulting aggregate interference. The BS checks the prices in descending order and stops at the price that maximizes revenue while satisfying the interference constraint. Because the power control of this scheme is updated on the basis of the decreasing price, we call it the DRASG-DP algorithm. The iterative update algorithm based on the decreasing price is shown in Algorithm 2.
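The decreasing-price search of DRASG-DP can be sketched in Python as below. This is a minimal, hypothetical sketch rather than Algorithm 2 itself. It reuses the clipped water-filling BR of Eq. (18) for the followers and evaluates the bounds of Eqs. (25) and (26) per pair. It then takes the widest resulting range (upper bounds with the other pairs silent, lower bounds with them at $p_{max}$, so the range covers every possible follower response), scores each grid price by the leader utility of Eq. (10), and stops at the first price that violates (11a), assuming that aggregate interference grows as the price falls. All function names and the grid size are illustrative.

```python
import math


def best_response(i, p, mu, c, G, g_bs, p_c, g_cu, w_c, noise, p_max):
    """Eq. (18): clipped water-filling power of pair i with the other powers fixed."""
    others = [j for j in range(len(p)) if j != i]
    interference = p_c * g_cu[i] + noise + sum(p[j] * G[j][i] for j in others)
    unit_cost = mu * g_bs[i] + c[i] * sum(G[i][j] for j in others)
    if unit_cost <= 0:
        return p_max
    return min(p_max, max(0.0, w_c / (math.log(2) * unit_cost) - interference / G[i][i]))


def follower_equilibrium(mu, c, G, g_bs, p_c, g_cu, w_c, noise, p_max, tol=1e-9, max_iter=500):
    """Synchronous BR iteration (Eq. (19)) at a fixed price mu."""
    p = [p_max] * len(G)
    for _ in range(max_iter):
        new_p = [best_response(i, p, mu, c, G, g_bs, p_c, g_cu, w_c, noise, p_max)
                 for i in range(len(p))]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


def price_bounds(i, p, c, G, g_bs, p_c, g_cu, w_c, noise, p_max):
    """Eqs. (25)-(26): prices at which pair i's BR equals p_max (mu_l) and 0 (mu_u)."""
    others = [j for j in range(len(p)) if j != i]
    interference = p_c * g_cu[i] + noise + sum(p[j] * G[j][i] for j in others)
    intra = c[i] * sum(G[i][j] for j in others) / g_bs[i]
    scale = w_c * G[i][i] / (math.log(2) * g_bs[i])
    return scale / (G[i][i] * p_max + interference) - intra, scale / interference - intra


def decreasing_price_search(Q, c, G, g_bs, p_c, g_kb, g_cu, w_c, noise, p_max, n_grid=200):
    """Scan grid prices from high to low; return (mu, powers, leader utility)."""
    n = len(G)
    lows = [price_bounds(i, [p_max] * n, c, G, g_bs, p_c, g_cu, w_c, noise, p_max)[0]
            for i in range(n)]
    highs = [price_bounds(i, [0.0] * n, c, G, g_bs, p_c, g_cu, w_c, noise, p_max)[1]
             for i in range(n)]
    lo = max(0.0, min(lows))
    hi = max(lo, max(highs))
    best = None
    for s in range(n_grid, -1, -1):  # prices in descending order
        mu = lo + (hi - lo) * s / n_grid
        p = follower_equilibrium(mu, c, G, g_bs, p_c, g_cu, w_c, noise, p_max)
        sold = sum(pi * gi for pi, gi in zip(p, g_bs))
        if sold > Q:  # (11a) violated; lower prices are assumed to violate it too
            break
        utility = w_c * math.log2(1 + p_c * g_kb / (sold + noise)) + mu * sold  # Eq. (10)
        if best is None or utility > best[2]:
            best = (mu, p, utility)
    return best


G = [[1.0, 0.01, 0.01], [0.01, 1.0, 0.01], [0.01, 0.01, 1.0]]
mu, p, utility = decreasing_price_search(Q=0.2, c=[0.5] * 3, G=G, g_bs=[0.1, 0.2, 0.3],
                                         p_c=1.0, g_kb=1.0, g_cu=[0.05] * 3, w_c=1.0,
                                         noise=0.01, p_max=1.0)
print(mu, p, utility)
```

At the top of the range every pair is silent, so the first price checked always satisfies (11a) when $Q_k \ge 0$; the scan then lowers the price until the interference budget is exhausted.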

Experimental method
We conduct comprehensive simulations to evaluate the performance of the proposed distributed channel allocation and power control algorithms for D2D communications underlaying cellular networks. We consider a single-cell network with a radius of 500 m, where the BS is located at the center and D2D pairs are randomly distributed. The distance between each D2D transmitter and its receiver is less than 50 m, and the bandwidth of each subband is 180 kHz. Unless otherwise specified, the main simulation parameters are listed in Table 1. Simulations were performed on the MATLAB 7.0 platform, and each set of simulations was run for 100 rounds to ensure the reliability of the results.
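A minimal Python sketch of the simulated topology is given below, assuming uniform placement: D2D transmitters are dropped uniformly in the 500 m cell around the BS, and each receiver is dropped uniformly within 50 m of its transmitter. The requirement that the receiver also lies inside the cell, the number of pairs per drop, the function names and the seeding are assumptions; the path-loss and fading models of Table 1 are not reproduced.

```python
import math
import random

CELL_RADIUS_M = 500.0       # single cell, BS at the origin
MAX_D2D_DISTANCE_M = 50.0   # D2D transmitter-receiver distance bound
SUBBAND_HZ = 180e3          # bandwidth w_c of one subband


def uniform_point_in_disk(radius, rng, center=(0.0, 0.0)):
    """Area-uniform point in a disk (the sqrt keeps the density uniform in area)."""
    r = radius * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return (center[0] + r * math.cos(theta), center[1] + r * math.sin(theta))


def drop_d2d_pairs(n_pairs, rng):
    """Return (tx, rx) coordinates for n_pairs D2D pairs inside the cell."""
    pairs = []
    while len(pairs) < n_pairs:
        tx = uniform_point_in_disk(CELL_RADIUS_M, rng)
        rx = uniform_point_in_disk(MAX_D2D_DISTANCE_M, rng, center=tx)
        if math.hypot(*rx) <= CELL_RADIUS_M:  # ASSUMPTION: keep receivers in the cell
            pairs.append((tx, rx))
    return pairs


rng = random.Random(2024)
drops = [drop_d2d_pairs(20, rng) for _ in range(100)]  # e.g. one drop per simulation round
print(len(drops), len(drops[0]))  # 100 20
```

Results such as the sum rates in the following figures would then be averaged over the independent drops.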

Sum rate of D2D pairs
Figure 2 shows the sum rate of D2D pairs versus the interference tolerance level of CUs. The sum rates of D2D pairs under the two proposed resource allocation algorithms, DRASG-AP and DRASG-DP, increase with the interference tolerance level: when CUs accept more interference, more D2D links can access the occupied channels. Fig. 2 also shows that DRASG-AP performs slightly better than DRASG-DP in terms of the sum rate of D2D pairs, by 7.35% on average. This is because, in the first subgame stage of DRASG-AP, each iteration maximizes the leader's revenue and selects the optimal price for power adjustment, which approaches the optimal power distribution under the interference tolerance quickly and accurately. We also compare the performance of the algorithms without the influence of intralayer interference, that is, with $c_i^k = 0$. The intralayer interference price restrains the selfishness of D2D pairs and improves the sum rate of D2D communications.

Sum rate of CUs
We simulate the sum rate of CUs under different interference tolerance levels. Figure 3 shows that the sum rate of CUs decreases as the interference tolerance level increases, because more D2D pairs are able to access the channels. We compare the performance of the proposed algorithms with that of the limited-area scheme, in which D2D pairs are not permitted to access any channel when their transmitters are within a restricted area of radius $D$ centered at the BS. The limited-area scheme is adopted only as a benchmark for observing the proposed algorithms, so its average CU sum rate is drawn as a horizontal line. Although the throughput of CUs can be protected by increasing the radius of the restricted area, Fig. 2 shows that our algorithms perform better in terms of the sum rate of D2D pairs when the interference tolerance level is high.
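The admission rule of the limited-area benchmark amounts to a simple distance filter, sketched below in Python. The function name and the treatment of a transmitter lying exactly on the boundary (excluded) are assumptions.

```python
import math


def limited_area_admitted(tx_positions, radius_d, bs=(0.0, 0.0)):
    """Indices of D2D pairs allowed to access channels in the limited-area scheme.

    Pairs whose transmitter lies within the restricted disk of radius D around
    the BS are denied access to every channel.
    """
    return [i for i, tx in enumerate(tx_positions) if math.dist(tx, bs) > radius_d]


print(limited_area_admitted([(10.0, 0.0), (300.0, 0.0), (0.0, -250.0)], radius_d=200.0))  # [1, 2]
```

Enlarging `radius_d` protects the CUs by excluding more nearby transmitters, at the cost of admitting fewer D2D pairs.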

Total system throughput
We compare our algorithms with two similar resource allocation algorithms, PBRA [34] and WOA [35], in terms of total system throughput. On the one hand, Fig. 4 shows that the proposed DRASG-AP and DRASG-DP algorithms are sensitive to the interference tolerance level of CUs: compared with WOA and PBRA, their total throughput varies greatly across interference tolerance levels. This is because the optimal or suboptimal power and price are obtained as D2D pairs access each sublink during the two game stages, which finally reach the Nash equilibrium. On the other hand, the total system throughput of the proposed algorithms is considerably higher than that of the two comparison schemes at the same interference tolerance level. In particular, when $Q_k = 0$, DRASG-AP improves on WOA by 39.1% and on PBRA by 112.4%. In WOA and PBRA, the intralayer interference between D2D pairs is reflected only in the calculation of the D2D data rate, whereas the proposed algorithms introduce the intralayer price into the followers' game and thereby reduce the adverse impact of intralayer interference on system throughput.

Fig. 4 Total system throughput (bps/Hz) versus interference tolerance level for DRASG-AP, DRASG-DP, WOA and PBRA

Power convergence of D2D links
The transmit powers of five D2D pairs on a specific channel are shown in Fig. 5, which reflects the power convergence of D2D links. In this experiment, the initial price of the CU is set to 0.1, and $Q_k = 0$. The transmit powers of the D2D links converge quickly, and the convergence value of each link depends on its channel gain. In the initial phase, all D2D links transmit at high power because of the low initial price. However, not all D2D links can maintain high power because of the interference constraint of CUs. In subsequent iterations, the interference from D2D pairs to the CU is quantitatively converted into a penalty through the interlayer price. Therefore, D2D links gradually reduce their transmit power to avoid reducing their revenue, and this process continues until the transmit powers of all D2D pairs converge.

Convergence of system throughput
We compare the convergence of system throughput under DRASG-AP and DRASG-DP at different interference tolerance levels. Figure 6 shows the convergence trends of the two algorithms, with the initial interlayer price set to a relatively high value, $\mu_k = 0.45$. DRASG-AP converges faster than DRASG-DP at the same interference level because DRASG-AP uses the current optimal price as the guide price in each iteration of the followers' game, so the power of D2D pairs in DRASG-AP converges rapidly to maximize system throughput. Nevertheless, both DRASG-AP and DRASG-DP converge well: when $Q_k = 10$, the system throughputs of DRASG-AP and DRASG-DP stabilize after 7 and 10 iterations, respectively. In addition, the convergence performance depends on the interference tolerance level: the larger $Q_k$ is, the slower the convergence.

Conclusions
In this study, a distributed framework of joint channel allocation and power control is proposed for D2D communications underlaying cellular networks, where each uplink channel can be shared by one CU and multiple D2D pairs. To maximize system throughput and guarantee the QoS of CUs, a heuristic channel allocation scheme based on the potential data rate gain is designed to find a suitable channel matching between D2D pairs and CUs. The power control problem is formulated as a pricing-based Stackelberg game in which the BS acts as the leader and sets the interference price on each subchannel to suppress the interlayer interference brought by D2D communications. Simulation results demonstrate that the proposed algorithms perform well in terms of overall system throughput and convergence.