Cache-enabled physical-layer secure game against smart uAV-assisted attacks in b5G NOMA networks

Li, Chao; Gao, Zihe; Xia, Junjuan; Deng, Dan; Fan, Liseng

doi:10.1186/s13638-019-1595-x

Research
Open access
Published: 06 January 2020

Cache-enabled physical-layer secure game against smart uAV-assisted attacks in b5G NOMA networks

Chao Li¹,
Zihe Gao²,
Junjuan Xia ORCID: orcid.org/0000-0003-2787-6582¹,
Dan Deng³ &
…
Liseng Fan¹

EURASIP Journal on Wireless Communications and Networking volume 2020, Article number: 7 (2020)


Abstract

This paper investigates cache-enabled physical-layer secure communication in a non-orthogonal multiple access (NOMA) network with two users, where an intelligent unmanned aerial vehicle (UAV) is equipped with an attack module that can operate in multiple attack modes. We present a power allocation strategy to enhance transmission security. To this end, we propose a reinforcement learning-based algorithm that adaptively controls the power allocation factor of the source station in the NOMA network. The interaction between the source station and the UAV is modeled as a dynamic game. During the game, the source station adjusts the power allocation factor according to the current work mode of the attack module on the UAV. To maximize its benefit, the source station keeps exploring the changing radio environment until the Nash equilibrium (NE) is reached. Moreover, a proof of the NE is given to verify that the proposed strategy is optimal. Simulation results demonstrate the effectiveness of the strategy.

1 Introduction

In recent years, ultra-reliable and low-latency communication has become a very important requirement for wireless services in B5G wireless communications [1–4]. To support this requirement, caching techniques can pre-store wireless data during off-peak traffic periods and hence reduce the traffic load significantly [5–8]. In addition, non-orthogonal multiple access (NOMA) can provide much higher capacity and spectrum efficiency than orthogonal multiple access, and hence, it is one of the most promising candidates for supporting ultra-reliable and low-latency services. Moreover, the NOMA protocol enables the source station to allocate the same spectrum and time resources to multiple users through power-domain multiplexing. In particular, the NOMA protocol can serve different kinds of users, and it can flexibly support ultra-reliable and low-latency services for both far and near users.

Although NOMA technology can provide reliable performance in enhancing wireless transmission, its transmission security is threatened by eavesdroppers due to the broadcast nature of wireless communications [9–13]. The authors in [14] studied the protection of physical-layer security and proposed strategies for wireless communication networks that were confirmed to perform efficiently. In [15], the authors studied an antenna selection algorithm to protect physical-layer security in a NOMA network with an eavesdropper. However, the conventional strategies for protecting physical-layer security in NOMA systems work well only when the attacker has a single work mode. Intelligent attackers with multiple work modes were proposed in [16–20]; they reduce the data rate of communication systems by freely switching among eavesdropping, jamming, deception, and keeping silent. If the networks continue to adopt the conventional strategies, such intelligent attacks will not be suppressed.

To tackle this problem, the authors in [21–24] proposed transmission policies based on reinforcement learning. As a special branch of artificial intelligence, reinforcement learning [25] can be regarded as a Markov decision process. An agent trained by reinforcement learning decides which action to execute according to the current environment state and maximizes the long-term cumulative reward to obtain the optimal action set. However, the state transition probability is generally unknown to the agent. Q-learning was proposed in [26] to solve this problem. Combining dynamic programming with the Monte Carlo method, Q-learning lets the agent learn optimal strategies without knowing the state transition probability. As far as we know, no previous work has used the Q-learning algorithm to protect secure transmission in a NOMA system threatened by an intelligent attacker.

Due to their mobility and ease of deployment, unmanned aerial vehicles (UAVs) have emerged as a new type of communication node in wireless networks; for example, UAVs can act as relays or base stations under extreme natural conditions. However, a UAV can become a mobile intelligent attacker if it is equipped with an attack module. In this paper, we investigate a NOMA network with two users in the presence of a UAV attacker that can execute multiple attack modes. The source station sends the composite signal to the two users at the same time; therefore, the total transmit power is divided into two parts. We dynamically allocate the proportions of transmit power to confront the intelligent attacker. In the wireless communication process, it is hard to know the work mode transition probability of the intelligent attacker. As a model-free learning method that does not depend on the state transition probability, Q-learning is adopted to obtain a learning-based adaptive policy. Furthermore, we formulate the confrontation between the source station and the intelligent attacker as a dynamic game, and we derive the Nash equilibrium (NE) of the dynamic game. Simulation results show that the proposed strategy significantly improves the data rate of the NOMA system.

2 Methods/experimental

Consider a cache-enabled source station S that can pre-store a certain amount of information. There exist one cell-edge user U₁ and one central user U₂ in the coverage of S, where U₂ is closer to S than U₁. When the request signals from the users are received, S transmits the cached messages to the users based on the NOMA protocol. Furthermore, there exists a UAV that acts as an intelligent attacker E in this area. We suppose that the UAV is more likely to attack the cell-edge user U₁ and that the UAV remains in the same position when attacking. Programmable radio equipment on E can flexibly choose to keep silent, overhear information from S, send jamming signals to U₁, or send deception signals to U₁. We denote these four work modes of E as m=0,1,2, and 3, respectively. In the experiment, E attempts to decrease the system data rate and reduce the correctness of user decoding. For simplicity, all the devices in this experiment are equipped with a single antenna.

3 NOMA networks

Now, we describe the NOMA network system model, which is shown in Fig. 1. We suppose that S transmits a composite signal consisting of x₁ and x₂, which contain the messages requested by U₁ and U₂, respectively. According to the NOMA protocol, S divides the total transmit power P_S into two portions, i.e., αP_S and βP_S, where α and β are the power allocation factors for x₁ and x₂, respectively. In order to satisfy the requirements of the different transmission distances, the two factors α and β have to meet the following constraints:

$$ \left\{ \begin{array}{lr} {\alpha \gg \beta,} \\ {\alpha + \beta \leq 1.} \end{array} \right. $$

(1)

In order to fight against the intelligent UAV attacker E, S works on improving the system data rate by deliberately adjusting its power allocation factor α. In the first step of the transmission process, S chooses a value for the power allocation factor α to transmit the composite signal of x₁ and x₂, and then the received signal at U₁, denoted by $y_{_{U_{1}}}$, is given by:

$$\begin{array}{*{20}l} y_{_{U_{1}}}=h_{SU_{1}}(\sqrt{\alpha P_{S}}x_{1} + \sqrt{\beta P_{S}}x_{2})+n_{_{U_{1}}} \end{array} $$

(2)

where $h_{SU_{1}}\sim \mathcal {CN}(0,{\nu }^{2})$ is the instantaneous channel coefficient of S−U₁ link. $n_{_{U_{1}}}{\sim }\mathcal {CN}(0,{\sigma }^{2})$ represents the additive white Gaussian noise (AWGN) received at U₁ [27–30]. The resultant SINR for x₁ at U₁ can be written as:

$$\begin{array}{*{20}l} {\text{SINR}}_{U_{1}}^{x_{1}} = \frac{\alpha P_{S}|h_{SU_{1}}|^{2}}{\beta P_{S}|h_{SU_{1}}|^{2}+ {\sigma}^{2}}. \end{array} $$

(3)

When m=0 holds, E shuts down its radio equipment and stays silent. In this case, the achievable rate of x₁ at U₁, denoted by $C_{_{U_{1}}}$, is exactly the system data rate C_sys,0. Thus, the system data rate is given by [31]:

$$\begin{array}{*{20}l} C_{sys, 0} & = \log_{2}(1+ \frac{\alpha P_{S}|h_{SU_{1}}|^{2}}{\beta P_{S}|h_{SU_{1}}|^{2}+ {\sigma}^{2}}) \\ & = \log_{2}(1+\frac{\alpha{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}}{\beta{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}+1}), \end{array} $$

(4)

where $\widetilde {P}_{S} = P_{S}/{\sigma }^{2}$. When m=1 holds, E overhears information from S; the received signal at E is given by:

$$\begin{array}{*{20}l} y_{_{E}} = h_{SE}(\sqrt{\alpha P_{S}}x_{1} + \sqrt{\beta P_{S}}x_{2}) + n_{_{E}}, \end{array} $$

(5)

we assume that a perfect SIC receiver is applied at E; thus, according to [32], the achievable rate of x₁ at E, denoted by $C_{_{E}}$, can be written as:

$$\begin{array}{*{20}l} C_{_{E}} = \log_{2}(1+\frac{\alpha{\widetilde{P}_{S}}|h_{SE}|^{2}}{\beta{\widetilde{P}_{S}}|h_{SE}|^{2}+1}), \end{array} $$

(6)

where $h_{SE}{\sim }\mathcal {CN}(0,{\mu }^{2})$ is the instantaneous channel coefficient of S−E link. $n_{_{E}}{\sim }\mathcal {CN}(0,{\sigma }^{2})$ represents AWGN received at E. Consequently, according to [17], the system data rate C_sys,1 can be computed by:

$$\begin{array}{*{20}l} C_{sys, 1} = [C_{sys, 0}-C_{_{E}}]^{+}, \end{array} $$

(7)

where [X]⁺ returns X if X is positive and 0 otherwise. When m=2 holds, E transmits a jamming signal to U₁; the received signal at U₁, denoted by $y_{_{U_{1}, J}}$, is given by:

$$\begin{array}{*{20}l} y_{_{U_{1}, J}}=\! h_{SU_{1}}(\sqrt{\alpha P_{S}}x_{1}\,+\, \sqrt{\beta P_{S}}x_{2})+ h_{EU_{1}}\sqrt{P_{J}}x_{_{J}} \,+\, n_{_{U_{1}}} \end{array} $$

(8)

where $h_{EU_{1}}{\sim }\mathcal {CN} (0, {\lambda }^{2})$ is the instantaneous channel coefficient of E−U₁ link. P_J is the jamming power of E, and $x_{_{J}}$ represents the jamming signal transmitted by E. Therefore, in this case, the system data rate C_sys,2 can be computed by:

$$\begin{array}{*{20}l} C_{sys, 2}=\log_{2}(1+\frac{{\alpha}{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}}{\beta{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}+{\widetilde{P}_{J}}|h_{EU_{1}}|^{2}+1}) \end{array} $$

(9)

where $\widetilde {P}_{J} = P_{J}/{\sigma }^{2}$. When m=3 holds, S does not send a signal to U₁ while E transmits the deception signal $x_{_{D}}$. The received signal at U₁ becomes:

$$\begin{array}{*{20}l} y_{_{U_{1}, D}}=h_{EU_{1}}\sqrt{P_{D}}x_{_{D}}+n_{_{U_{1}}}, \end{array} $$

(10)

where P_D is the deception power. A stronger deception signal received by U₁ causes a larger loss in the achievable rate at U₁. Thus, the system data rate C_sys,3 is formulated as a linear function of this loss and given by:

$$\begin{array}{*{20}l} C_{sys, 3}=C_{sys, 0}-{\gamma}\log_{2}(1+{\widetilde{P}_{D}} |h_{EU_{1}}|^{2}), \end{array} $$

(11)

where $\widetilde {P}_{D}=P_{D}/{\sigma }^{2}$, and γ∈(0,1) is the deception factor, which quantifies the probability that each deception signal takes effect.
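The four system data rates in Eqs. (4), (7), (9), and (11) can be collected into one function. The following is a minimal sketch rather than the authors' simulation code; the function name `system_rate`, its argument names, and the use of channel power gains |h|² as inputs are illustrative assumptions.

```python
import math


def system_rate(mode, alpha, beta, ps, g_su1, g_se=0.0, g_eu1=0.0,
                pj=0.0, pd=0.0, gamma=0.0):
    """Return C_sys,m in bit/s/Hz for attack mode m in {0, 1, 2, 3}.

    ps, pj, pd are noise-normalised powers (P/sigma^2);
    g_su1, g_se, g_eu1 are the channel power gains |h|^2 of the
    S-U1, S-E, and E-U1 links.
    """
    # Eq. (4): rate of x1 at U1, with x2 treated as interference.
    c0 = math.log2(1 + alpha * ps * g_su1 / (beta * ps * g_su1 + 1))
    if mode == 0:                      # E keeps silent
        return c0
    if mode == 1:                      # eavesdropping, Eqs. (6) and (7)
        c_e = math.log2(1 + alpha * ps * g_se / (beta * ps * g_se + 1))
        return max(c0 - c_e, 0.0)
    if mode == 2:                      # jamming, Eq. (9)
        return math.log2(
            1 + alpha * ps * g_su1 / (beta * ps * g_su1 + pj * g_eu1 + 1))
    if mode == 3:                      # deception, Eq. (11)
        return c0 - gamma * math.log2(1 + pd * g_eu1)
    raise ValueError("mode must be 0, 1, 2, or 3")
```

For example, `system_rate(2, 0.8, 0.1, 10.0, 1.2, g_eu1=2.0, pj=2.0)` evaluates Eq. (9) for one channel realisation; the value 10.0 for the normalised transmit power is a placeholder, since the text does not state it.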

4 Secure game in NOMA network

The interaction between S and E in the NOMA network is adversarial and is formulated as a secure game. To discuss the process of the secure game, we first need to quantify the feasible range of α. While ensuring that U₁ can decode the received information correctly, we must also ensure that U₂ can correctly decode x₂. We denote the minimum data rate requirements for U₁ and U₂ as $C_{\min }^{U_{1}}$ and $C_{\min }^{U_{2}}$, respectively. Thus, α and β satisfy the following constraints:

$$\begin{array}{*{20}l} &\log_{2}(1+ \frac{\alpha {\widetilde{P}_{S}}|h_{SU_{1}}|^{2}}{\beta{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}+1})\geq C_{\min}^{U_{1}}, \end{array} $$

(12)

$$\begin{array}{*{20}l} &\log_{2}(1+{\beta}{\widetilde{P}_{S}}|h_{SU_{2}}|^{2})\geq C_{\min}^{U_{2}}, \end{array} $$

(13)

according to (1), (12), and (13), the threshold values of α are given by:

$$ \left\{ \begin{array}{lr} {\alpha_{\max} = 1-\frac{2^{C_{\min}^{U_{2}}}-1}{{\widetilde{P}_{S}}|h_{SU_{2}}|^{2}},} \\ {\alpha_{\min} = \frac{(2^{C_{\min}^{U_{1}}}-1)({\beta}{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}+1)}{{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}}.} \end{array} \right. $$

(14)

where α_max and α_min are the maximum and minimum power allocation factors for x₁, respectively. We now turn to the process of the secure game. S adaptively adjusts its power allocation factor in the range [α_min,α_max], while E selects an attack mode m∈{0,1,2,3}, which represents keeping silent, eavesdropping, jamming, or deception, respectively. In each time slot, E attempts to reduce the system data rate to C_sys,1, C_sys,2, or C_sys,3. S strives to increase the system data rate by controlling α while suppressing the probability of attack. In view of this, we regard the confrontation between S and E as a zero-sum game. Depending on the system data rate and power consumption, the reward function of S in the zero-sum game, denoted by R_S, is formulated as:

$$\begin{array}{*{20}l} R_{S}(\alpha, m)=\ln2 C_{sys, m}- \alpha{\theta}, \end{array} $$

(15)

where θ is the total power consumption. We introduce the coefficient ln2 to simplify the subsequent derivation. According to the defining feature of a zero-sum game, the reward function of E, denoted by R_E, is defined as:

$$\begin{array}{*{20}l} R_{E}(\alpha, m)=-\ln2 C_{sys, m}- \varphi_{m}, \end{array} $$

(16)

where φ_m, m∈{0,1,2,3}, denotes the cost incurred by E in mode m. In the secure game, S tries to find an optimal power allocation factor in [α_min,α_max] to maximize R_S, while E dynamically adjusts its work mode to maximize R_E. The goal of S and E in the game is to reach their own optimal strategies α^∗ and m^∗, respectively. We then define the strategy pair {α^∗,m^∗} as the Nash equilibrium (NE) of the secure game, at which neither S nor E can increase its reward by unilaterally changing its strategy. Thus, the NE strategy satisfies:

$$\begin{array}{*{20}l} & R_{S}(\alpha^{*}, m^{*}) \geq R_{S}(\alpha, m^{*}), \end{array} $$

(17)

$$\begin{array}{*{20}l} & R_{E}(\alpha^{*}, m^{*}) \geq R_{E}(\alpha^{*}, m). \end{array} $$

(18)
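The feasible range in Eq. (14), the reward functions in Eqs. (15) and (16), and the NE conditions in Eqs. (17) and (18) can be checked numerically on a finite grid of strategies. The sketch below is illustrative only: the function names, the grid-based check, and the callable `rate(alpha, m)` supplied by the caller are assumptions, not part of the original work.

```python
import math

LN2 = math.log(2)


def alpha_range(beta, ps, g_su1, g_su2, c_min_u1, c_min_u2):
    """Eq. (14): feasible interval [alpha_min, alpha_max] for alpha."""
    alpha_max = 1 - (2 ** c_min_u2 - 1) / (ps * g_su2)
    alpha_min = (2 ** c_min_u1 - 1) * (beta * ps * g_su1 + 1) / (ps * g_su1)
    return alpha_min, alpha_max


def reward_s(c_sys, alpha, theta):
    """Eq. (15): reward of S for a system rate c_sys in bit/s/Hz."""
    return LN2 * c_sys - alpha * theta


def reward_e(c_sys, phi_m):
    """Eq. (16): reward of E, where phi_m is the cost of mode m."""
    return -LN2 * c_sys - phi_m


def is_nash(alpha_star, m_star, alphas, modes, rate, theta, phi):
    """Check Eqs. (17) and (18) over finite strategy sets.

    rate(alpha, m) must return C_sys,m; phi[m] is the cost of mode m.
    """
    rs_star = reward_s(rate(alpha_star, m_star), alpha_star, theta)
    re_star = reward_e(rate(alpha_star, m_star), phi[m_star])
    # Eq. (17): S cannot gain by changing alpha while E keeps m_star.
    s_ok = all(rs_star >= reward_s(rate(a, m_star), a, theta) for a in alphas)
    # Eq. (18): E cannot gain by changing mode while S keeps alpha_star.
    e_ok = all(re_star >= reward_e(rate(alpha_star, m), phi[m]) for m in modes)
    return s_ok and e_ok
```

Because the check runs over a grid, it can only confirm an equilibrium with respect to the sampled values of α; it does not replace the analytical result.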

Through analytical derivation, we obtain one NE solution {α^∗,0}. That is to say, if S keeps choosing the power allocation factor α^∗, E obtains its maximum reward by keeping silent and has no motivation to execute any attack mode. Specifically, the NE solution is given in Lemma 1 and proved in the Appendix.

Lemma 1

The secure game in the NOMA network has one NE solution {α^∗,0}, which is given by

$$\begin{array}{*{20}l} {\alpha}^{*}=\frac{\widetilde{P}_{S}|h_{SU_{1}}|^{2}-\theta}{\widetilde{P}_{S}|h_{SU_{1}}|^{2}\theta}-\beta \qquad \alpha_{\min} < {\alpha}^{*} \leq \alpha_{\max}. \end{array} $$

(19)

if the following constraints are met:

$$\begin{array}{*{20}l} &\frac{\widetilde{P}_{S}|h_{SU_{1}}|^{2}}{({\alpha_{\max}}\!\,+\,\!\beta)\widetilde{P}_{S}|h_{SU_{1}}|^{2}\!\,+\,\!1}\!\! < \theta < \!\! \frac{\widetilde{P}_{S}|h_{SU_{1}}|^{2}}{({\alpha_{\min}}\!\,+\,\!\beta)\widetilde{P}_{S}|h_{SU_{1}}|^{2}\!\,+\,\!1}, \end{array} $$

(20a)

$$\begin{array}{*{20}l} &\varphi_{1} \geq \ln(1+\frac{\alpha^{*}{\widetilde{P}_{S}}|h_{SE}|^{2}}{\beta{\widetilde{P}_{S}}|h_{SE}|^{2}+1}), \end{array} $$

(20b)

$$\begin{array}{*{20}l} &\varphi_{2} \geq \ln(1+\frac{{\alpha^{*}}{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}}{\beta{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}+1}) \\ &\quad-\ln(1+\frac{{\alpha^{*}}{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}}{\beta{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}+{\widetilde{P}_{J}}|h_{EU_{1}}|^{2}+1}), \end{array} $$

(20c)

$$\begin{array}{*{20}l} &\varphi_{3} \geq \gamma\ln(1+{\widetilde{P}_{D}} |h_{EU_{1}}|^{2}). \end{array} $$

(20d)

Proof

The proof of this lemma is given in the Appendix. □
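The closed form in Eq. (19) and the condition (20a) are easy to verify numerically. The sketch below is a sanity check under assumed values; the function names are hypothetical, and the values of the normalised transmit power (10) and of θ (1.1) are placeholders that the text does not specify.

```python
def alpha_star(ps, g_su1, beta, theta):
    """Eq. (19): stationary point of R_S(alpha, 0)."""
    a = ps * g_su1
    return (a - theta) / (a * theta) - beta


def d_reward_s(alpha, ps, g_su1, beta, theta):
    """Eq. (23): derivative of R_S(alpha, 0) with respect to alpha."""
    a = ps * g_su1
    return a / ((alpha + beta) * a + 1) - theta


def theta_window(alpha_min, alpha_max, ps, g_su1, beta):
    """Eq. (20a): interval of theta for which alpha* lies inside
    (alpha_min, alpha_max)."""
    a = ps * g_su1
    lower = a / ((alpha_max + beta) * a + 1)
    upper = a / ((alpha_min + beta) * a + 1)
    return lower, upper


# Placeholder values (assumptions): ps = 10, |h_SU1|^2 = 1.2, theta = 1.1.
ps, g_su1, beta, theta = 10.0, 1.2, 0.1, 1.1
a_opt = alpha_star(ps, g_su1, beta, theta)
lower, upper = theta_window(0.6, 0.9, ps, g_su1, beta)
print(a_opt, d_reward_s(a_opt, ps, g_su1, beta, theta), lower < theta < upper)
```

With these placeholder values, θ falls inside the window of (20a), the derivative in Eq. (23) vanishes at α^∗ up to rounding, and α^∗ lies between 0.6 and 0.9.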

5 NOMA power allocation algorithm

In order to suppress the attack probability efficiently in the secure game, S must adopt an appropriate power allocation strategy. However, because of the complexity and variability of radio signals in the NOMA network, S can barely predict the channel state information and the work modes of E. For this reason, we propose a power allocation algorithm based on Q-learning. By incorporating the Monte Carlo and dynamic programming methods, Q-learning is regarded as one of the most effective algorithms in model-free reinforcement learning. Without knowing the state of the environment and its transition probability in advance, the agent constantly explores the environment through trial and error. After many independent repeated experiments are averaged, the Q-learning-based agent acquires the optimal strategy.

Based on the above ideas, we propose a NOMA power allocation algorithm for the secure game. Given the inherent relation between S and E, the work mode of E determines the state of S; similarly, S influences the environment of E by adjusting α. In the first step of the algorithm, we initialize the Q-table, denoted by Q(m,α), which stores and updates the values of state-action pairs. In each experiment, E first selects a work mode at random, and S then adopts an instantaneous α_t accordingly, where α_t denotes the power allocation factor at time t. Note that S should not always select the power allocation factor by greedily searching the Q-table. To avoid being trapped in a locally optimal solution, S uses an ε-greedy policy when choosing α. Specifically, S selects the currently optimal α in the Q-table with probability ε and otherwise chooses a value in the range [α_min,α_max] at random. In this time slot, S transmits x₁ with power α_tP_S and computes the reward value R_S from the system data rate observed in the environment. Then, E changes its work mode from m_t to m_{t+1} according to the system data rate. By incorporating the instantaneous reward value R_S and the accumulated experience in the Q-table, the update process of the Q-table presented by the authors in [33] can be formulated as:

$$\begin{array}{*{20}l} Q(m_{t}, \alpha_{t}){\leftarrow}Q(m_{t}, \alpha_{t})&{+}\zeta[R_{S}\\ &{+}\rho \max Q(m_{t+1}, \alpha)-Q(m_{t}, \alpha_{t})], \end{array} $$

(21)

where ζ∈(0,1] is the learning rate and ρ∈[0,1] is the discount factor, which weights the accumulated experience. To cope with the unknown state transition probability, we repeat the experiment multiple times and compute the average reward value. After enough updates and repeated experiments, the Q-table converges to the optimal one. From the optimal Q-table, S can obtain a learning-based optimal power allocation strategy. Algorithm 1 describes the learning process.
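The learning loop described above can be sketched in a few dozen lines. This is not the authors' Algorithm 1: the attacker model (E best-responds to the α it has just observed and otherwise picks a random mode), the values of ε, ζ, ρ, θ, and the normalised transmit power, and all function names are assumptions made only for illustration.

```python
import math
import random

LN2 = math.log(2)


def rate(m, alpha, beta, ps, g, pj, pd, gamma):
    """C_sys,m from Eqs. (4), (7), (9), (11); g = (|h_SU1|^2, |h_SE|^2, |h_EU1|^2)."""
    g_su1, g_se, g_eu1 = g
    c0 = math.log2(1 + alpha * ps * g_su1 / (beta * ps * g_su1 + 1))
    if m == 1:
        c_e = math.log2(1 + alpha * ps * g_se / (beta * ps * g_se + 1))
        return max(c0 - c_e, 0.0)
    if m == 2:
        return math.log2(
            1 + alpha * ps * g_su1 / (beta * ps * g_su1 + pj * g_eu1 + 1))
    if m == 3:
        return c0 - gamma * math.log2(1 + pd * g_eu1)
    return c0


def q_learning(slots=2000, eps=0.9, zeta=0.5, rho=0.5, theta=1.1, ps=10.0,
               beta=0.1, pj=2.0, pd=2.1, gamma=0.6,
               phi=(0.0, 1.8, 2.0, 2.1), var=(1.2, 0.5, 2.0), seed=0):
    rng = random.Random(seed)
    alphas = [round(0.6 + 0.02 * k, 2) for k in range(16)]  # 0.60 ... 0.90
    q = [[0.0] * len(alphas) for _ in range(4)]             # Q(m, alpha)
    m = rng.randrange(4)                                    # random initial mode of E
    for _ in range(slots):
        # Epsilon-greedy as described in the text: exploit with probability eps.
        if rng.random() < eps:
            a = max(range(len(alphas)), key=lambda k: q[m][k])
        else:
            a = rng.randrange(len(alphas))
        # Rayleigh fading: |h|^2 is exponential with mean equal to the variance.
        g = tuple(rng.expovariate(1.0 / v) for v in var)
        r_s = LN2 * rate(m, alphas[a], beta, ps, g, pj, pd, gamma) - alphas[a] * theta

        # ASSUMPTION: E best-responds via Eq. (16) with probability eps,
        # otherwise it explores a random mode.
        def e_reward(k):
            return -LN2 * rate(k, alphas[a], beta, ps, g, pj, pd, gamma) - phi[k]

        m_next = max(range(4), key=e_reward) if rng.random() < eps else rng.randrange(4)
        # Eq. (21): update of the Q-table.
        q[m][a] += zeta * (r_s + rho * max(q[m_next]) - q[m][a])
        m = m_next
    policy = [alphas[max(range(len(alphas)), key=lambda k: q[s][k])] for s in range(4)]
    return q, policy
```

Calling `q_learning()` returns the learned Q-table and, for each attack mode, the α with the largest Q-value. Averaging over many seeds would mirror the repeated experiments described in the text.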

6 Results and discussion

In this section, we simulate the communication process to verify the effectiveness of the proposed algorithm. The links in the network experience Rayleigh flat fading [34–37], and the nodes are equipped with a single antenna. We set the parameters as follows: $\{\nu ^{2}, \mu ^{2}, \lambda ^{2}\} = \{1.2, 0.5, 2\}, \varphi _{m=\{0, 1, 2, 3\}}=\{0, 1.8, 2.0, 2.1\}, \gamma = 0.6, \widetilde {P}_{J} = 2, \widetilde {P}_{D} = 2.1$. We let the power allocation factor α vary from 0.6 to 0.9 in steps of 0.02, and β is set to a constant value of 0.1. Specifically, we run 10,000 time slots in each experiment and average the results over 5000 repeated experiments.
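As an illustration of this setup, the sketch below draws Rayleigh-faded channel gains and averages the attack-free rate of Eq. (4) over many realisations. It is a minimal Monte Carlo example, not the authors' simulator; the helper names and the normalised transmit power of 10 are assumptions.

```python
import math
import random


def sample_gain(variance, rng):
    """Draw |h|^2 for h ~ CN(0, variance): exponential with mean `variance`."""
    return rng.expovariate(1.0 / variance)


def average_rate_silent(alpha, beta, ps, nu2, trials, seed=0):
    """Monte Carlo average of C_sys,0 in Eq. (4) over Rayleigh fading."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g = sample_gain(nu2, rng)
        total += math.log2(1 + alpha * ps * g / (beta * ps * g + 1))
    return total / trials


# Grid of power allocation factors used in the simulations: 0.60, 0.62, ..., 0.90.
alpha_grid = [round(0.6 + 0.02 * k, 2) for k in range(16)]

# Placeholder normalised transmit power (assumption): ps = 10.
avg = average_rate_silent(alpha=0.75, beta=0.1, ps=10.0, nu2=1.2, trials=5000)
print(avg)
```

Since the SINR in Eq. (3) is bounded by α/β, the averaged rate always stays below log₂(1+α/β).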

Figure 2 shows the variation of the average reward values of S and E from 0 to 10,000 time slots. From this figure, we can see that the average reward values of S and E both increase rapidly between 0 and 1000 time slots. Subsequently, the two curves rise slowly and reach their peak values at around the 3000th time slot. Then, the two curves remain steady until the end of the experiment. In learning-based algorithms, agents are expected to select actions that improve their long-term cumulative rewards, which is consistent with the experimental results.

The purpose of the proposed power allocation strategy is to improve the average data rate of the system, which is well reflected in Fig. 3. From 0 to 1000 time slots, the average system data rate grows sharply from the initial value of 0.76 to a temporary value of 1.23. After that, it continues to rise slowly until it converges to 1.31 at around the 3000th time slot and then stays at a steady level until the end. The trend of the system data rate is basically consistent with that of the average reward value, which also indicates that an increase in the system data rate brings more rewards to the agents.

Figure 4 shows how the average power allocation factor evolves during the reinforcement learning process. As can be seen from the figure, the power allocation factor has a random initial value of 0.75. After the start of the experiment, the work mode of E begins to change, and S dynamically adjusts the power allocation factor according to the changes in the environment. In the first 500 time slots, the average power allocation factor gradually decreases to a temporary value of 0.708. Between 500 and 4000 time slots, the average power allocation factor gradually increases and then remains stable at around 0.737.

Figure 5 shows the average attack probabilities of E versus the time slot, which varies from 0 to 10,000. We find that the average attack probabilities fall quickly from 0 to 1000 time slots. After that, the three curves decrease slowly and gradually converge. The probability of eavesdropping drops from the initial value of 0.25 to the converged value of 0.025, a decline of 90%. The probability of jamming drops from the initial value of 0.26 to the converged value of 0.02, a decline of 92.3%. Similarly, the probability of deception drops from the initial value of 0.27 to the converged value of 0.01, a decline of 96.2%. Furthermore, we simulate the average attack probabilities of the power allocation algorithm again with different parameters. We set the channel parameters as {ν²,μ²,λ²}={0.9,0.3,2}. That is to say, we assume that the cell-edge user U₁ is placed further away from S. Correspondingly, E is also further away from S. Compared with Fig. 5, Fig. 6 shows that the converged eavesdropping probability becomes lower; at the same time, the converged deception and jamming probabilities increase by 2% when the jamming and deception powers are fixed. Comparing Fig. 5 with Fig. 6 shows that the proposed policy performs well regardless of the locations of the cell-edge user and the UAV.

7 Conclusions

In this paper, we investigated the cache-assisted physical-layer security of a NOMA communication network in which an intelligent attacker UAV is located near the cell-edge user. The UAV within the coverage of the network tries to reduce the system data rate of the NOMA network by flexibly switching its work mode among eavesdropping, jamming, deception, and keeping silent. According to the NOMA protocol, the transmitter in the system has to allocate the total power to the two users in a certain proportion. Hence, we need a responsive strategy that adjusts the power allocation factor to suppress the attack motivation of the UAV. To tackle this problem, we proposed a power allocation strategy based on Q-learning to control the power allocation factor. The simulation results show that the proposed strategy adjusts the power allocation factor well in real time. Furthermore, we confirmed that this strategy performs well in enhancing the system data rate and suppressing the attack probabilities. In future work, we will apply the wireless caching technique [38–40] to NOMA systems to further enhance the transmission reliability and security. In addition, we will consider some new materials [41–43] for enhancing the communication performance in practical applications. Furthermore, intelligent algorithms such as deep learning-based algorithms [44–47] will be applied to the considered system in order to further enhance the network performance.

8 Appendix

Proof

By substituting m=0 into (15), we have

$$\begin{array}{*{20}l} R_{S}(\alpha, 0)=\ln(1+\frac{\alpha{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}}{\beta{\widetilde{P}_{S}}|h_{SU_{1}}|^{2}+1})-\alpha \theta. \end{array} $$

(22)

We take the partial derivative of R_S(α,0) with respect to α and have

$$\begin{array}{*{20}l} \frac{\partial{R_{S} (\alpha, 0)}}{\partial{\alpha}}=\frac{{\widetilde{P}_{S} |h_{SU_{1}}|^{2}}}{(\alpha+\beta)\widetilde{P}_{S}|h_{SU_{1}}|^{2}+1} - \theta, \end{array} $$

(23)

and differentiating once more, we find

$$ \frac{\partial^{2}{R_{S} (\alpha, 0)}}{\partial{\alpha}^{2}}=-\frac{{\widetilde{P}_{S}^{2} |h_{SU_{1}}|^{4}}}{[(\alpha+\beta)\widetilde{P}_{S}|h_{SU_{1}}|^{2}+1]^{2}} \leq 0, $$

(24)

showing that (22) is a concave function of α, so its maximum is attained where ∂R_S(α,0)/∂α=0. Setting (23) to zero at α=α^∗ yields (19). To ensure that (22) attains its maximum within the range [α_min,α_max], let the following inequalities hold:

$$\begin{array}{*{20}l} &\frac{\partial{R_{S} (\alpha, 0)}}{\partial{\alpha}}|_{\alpha={\alpha_{\min}}}\,=\,\frac{{\widetilde{P}_{S} |h_{SU_{1}}|^{2}}}{(\alpha_{\min}+\beta)\widetilde{P}_{S}|h_{SU_{1}}|^{2}+1}\! - \theta\!>\!\!0, \end{array} $$

(25)

$$\begin{array}{*{20}l} &\frac{\partial{R_{S} (\alpha, 0)}}{\partial{\alpha}}\!|_{\alpha=\!{\alpha_{\max}}}\,=\,\frac{{\widetilde{P}_{S} |h_{SU_{1}}|^{2}}}{(\alpha_{\max}+\beta)\widetilde{P}_{S}|h_{SU_{1}}|^{2}+1}\! - \!\theta\!<\!\!0, \end{array} $$

(26)

i.e., (20a) holds. Therefore, (α^∗,0) satisfies (17). To ensure that (α^∗,0) satisfies (18), we substitute (α^∗,0) into (16) and let the following inequalities hold:

$$\begin{array}{*{20}l} &R_{E}(\alpha^{*}, 0)-R_{E}(\alpha^{*}, 1) \geq 0, \end{array} $$

(27a)

$$\begin{array}{*{20}l} &R_{E}(\alpha^{*}, 0)-R_{E}(\alpha^{*}, 2) \geq 0, \end{array} $$

(27b)

$$\begin{array}{*{20}l} &R_{E}(\alpha^{*}, 0)-R_{E}(\alpha^{*}, 3) \geq 0, \end{array} $$

(27c)

i.e., (20b)–(20d) hold. Therefore, (α^∗,0) also satisfies (18).

In summary, we have proved that the strategy pair (α^∗,0) satisfies both (17) and (18), which is the strict definition of an NE. This completes the proof of Lemma 1. □
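For readers who want the intermediate algebra, the two rearrangements used in the proof can be written out as follows. The steps only manipulate (16) and (23); the second line additionally assumes φ₀=0, which is the value used in the simulations.

```latex
$$\begin{aligned}
\frac{\widetilde{P}_{S}|h_{SU_{1}}|^{2}}{(\alpha^{*}+\beta)\widetilde{P}_{S}|h_{SU_{1}}|^{2}+1}=\theta
&\;\Longrightarrow\;
(\alpha^{*}+\beta)\widetilde{P}_{S}|h_{SU_{1}}|^{2}+1=\frac{\widetilde{P}_{S}|h_{SU_{1}}|^{2}}{\theta} \\
&\;\Longrightarrow\;
\alpha^{*}=\frac{1}{\theta}-\frac{1}{\widetilde{P}_{S}|h_{SU_{1}}|^{2}}-\beta
=\frac{\widetilde{P}_{S}|h_{SU_{1}}|^{2}-\theta}{\widetilde{P}_{S}|h_{SU_{1}}|^{2}\theta}-\beta, \\[6pt]
R_{E}(\alpha^{*},0)-R_{E}(\alpha^{*},m)
&=\ln 2\,\bigl(C_{sys,m}-C_{sys,0}\bigr)+\varphi_{m}-\varphi_{0}\geq 0
\;\Longleftrightarrow\;
\varphi_{m}\geq \ln 2\,\bigl(C_{sys,0}-C_{sys,m}\bigr),\quad m=1,2,3.
\end{aligned}$$
```

Substituting the expressions for C_sys,1, C_sys,2, and C_sys,3 into the last inequality gives the lower bounds on φ₁, φ₂, and φ₃ stated in Lemma 1; for m=1 this uses C_sys,0 ≥ C_E so that the [·]⁺ operator in (7) is inactive.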

Availability of data and materials

The authors state the data availability in this manuscript through the email to the corresponding author.

Abbreviations

NE: Nash equilibrium
NOMA: non-orthogonal multiple access
UAV: unmanned aerial vehicle

References

J. Zhao, Q. Li, Y. Gong, Computation offloading and resource allocation for mobile edge computing with multiple access points. IET Commun.PP(99), 1–10 (2019).
Google Scholar
J. Yang, D. Ruan, J. Huang, X. Kang, Y. -Q. Shi, An embedding cost learning framework using gan. IEEE Trans. Inf. Forensic. Secur. PP(99), 1–10 (2019).
Article Google Scholar
B. Wang, F. Gao, S. Jin, H. Lin, G. Y. Li, Spatial- and frequency-wideband effects in millimeter-wave massive MIMO systems. IEEE Trans. Sig. Processing. 66(13), 3393–3406 (2018).
Article MathSciNet MATH Google Scholar
X. Hu, C. Zhong, X. Chen, W. Xu, Z. Zhang, Cluster grouping and power control for angle-domain mmwave mimo noma systems. IEEE J Sel. Top. Sig. Process.13(5), 1167–1180 (2019).
Article Google Scholar
L. Fan, N. Zhao, X. Lei, Q. Chen, N. Yang, G. K. Karagiannidis, Outage probability and optimal cache placement for multiple amplify-and-forward relay networks. IEEE Trans. Veh. Technol.67(12), 12373–12378 (2018).
Article Google Scholar
X. Lin, Probabilistic caching placement in uav-assisted heterogeneous wireless networks. Phys. Commun.33:, 54–61 (2019).
Article Google Scholar
F. Shi, Secure probabilistic caching in random multi-user multi-uav relay networks. Phys. Commun.32:, 31–40 (2019).
Article Google Scholar
C. Li, L. Peng, Z. Chao, S. Fan, J. Cioffi, L. Yang, Spectral-efficient cellular communications with coexistent one- and two-hop transmissions. IEEE Trans. Veh. Technol.65(8), 6765–6772 (2016).
Article Google Scholar
G. Gomez, F. J. Martin-Vega, F. J. Lopez-Martinez, Y. Liu, M. Elkashlan, G. Gomez, F. J. Martin-Vega, F. J. Lopez-Martinez, Y. Liu, M. Elkashlan, Uplink noma in large-scale systems: Coverage and physical layer security. CoRR. abs/1709.04693: (2017).
C. Zheng, H. Xin, X. Guo, T. Ristaniemi, H. Zhu, Secure and energy efficient resource allocation for wireless power enabled full-/half-duplex multiple-antenna relay systems. IEEE Trans. Veh. Technol.65(12), 11208–11219 (2017).
Google Scholar
X. Liang, C. Xie, M. Min, W. Zhuang, User-centric view of unmanned aerial vehicle transmission against smart attacks. IEEE Trans. Veh. Technol.67(4), 3420–3430 (2017).
Google Scholar
C. Li, S. Zhang, P. Liu, F. Sun, J. Cioffi, L. Yang, Overhearing protocol design exploiting inter-cell interference in cooperative greennetworks. IEEE Trans. Veh. Technol.65(1), 441–446 (2016).
Article Google Scholar
C. Li, H. J. Yang, S. Fan, J. Cioffi, L. Yang, Multi-user overhearing for cooperative two-way multi-antenna relays. IEEE Trans. Veh. Technol.65(5), 3796–3802 (2016).
Article Google Scholar
J. Xia, Secure cache-aided multi-relay networks in the presence of multiple eavesdroppers. IEEE Trans. Commun.PP(99), 1–10 (2019).
Article Google Scholar
C. Zheng, L. Lei, H. Zhang, T. Ristaniemi, H. Zhu, Energy-efficient and secure resource allocation for multiple-antenna noma with wireless power transfer. IEEE Trans. Green Commun. Netw.2(4), 1059–1071 (2018).
Article Google Scholar
Y. Li, L. Xiao, H. Dai, P. H. Vincent, in IEEE Int. Conf. Commun.Game theoretic study of protecting mimo transmissions against smart attacks, (2017), pp. 1–6.
C. Li, Y. Xu, Protecting secure communication under UAV smart attack with imperfect channel estimation. IEEE Access. 6(1), 76395–76401 (2018).
Article Google Scholar
Y. Xu, Q-learning based physical-layer secure game against multi-agent attacks. IEEE Access. 7:, 49212–49222 (2019).
Article Google Scholar
X. Liang, Y. Li, G. Han, H. Dai, H. V. Poor, A secure mobile crowdsensing game with deep reinforcement learning. IEEE Trans. Inf. Forensic. Secur. 13(1), 35–47 (2018).
Article Google Scholar
C. Li, S. Fan, J. M. Cioffi, L. Yang, Energy efficient mimo relay transmissions via joint power allocations. IEEE Trans. Circ. Syst. II Express Briefs. 61(7), 531–535 (2014).
Google Scholar
X. Liang, C. Xie, et al., A mobile offloading game against smart attacks. IEEE Access. 4:, 2281–2291 (2017).
Google Scholar
C. Li, W. Zhou, Enhanced secure transmission against intelligent attacks. IEEE Access. 7:, 53596–53602 (2019).
Article Google Scholar
X. Liang, T. Chen, C. Xie, H. Dai, V. Poor, Mobile crowdsensing games in vehicular networks. IEEE Trans. Veh. Technol.67(2), 1535–1545 (2018).
Article Google Scholar
X. Liang, Y. Li, C. Dai, H. Dai, H. V. Poor, Reinforcement learning-based noma power allocation in the presence of smart jamming. IEEE Trans. Veh. Technol.67(4), 3377–3389 (2018).
Article Google Scholar
A. G. Barto, Reinforcement learning. A Bradford Book. 15(7), 665–685 (1998).
C. J. C. H. Watkins, P. Dayan, Technical note: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992).
X. Lin, MARL-based distributed cache placement for wireless networks. IEEE Access. 7, 62606–62615 (2019).
J. Zhao, A dual-link soft handover scheme for C/U plane split network in high-speed railway. IEEE Access. 6, 12473–12482 (2018).
H. Xie, F. Gao, S. Zhang, S. Jin, A unified transmission strategy for TDD/FDD massive MIMO systems with spatial basis expansion model. IEEE Trans. Veh. Technol. 66(4), 3170–3184 (2017).
X. Lai, Distributed secure switch-and-stay combining over correlated fading channels. IEEE Trans. Inf. Forensic. Secur. 14(8), 2088–2101 (2019).
Z. Na, Y. Wang, Subcarrier allocation based simultaneous wireless information and power transfer algorithm in 5G cooperative OFDM communication systems. Phys. Commun. 29, 164–170 (2018).
C. E. Shannon, Communication theory of secrecy systems. Bell Syst. Tech. J. 28, 656–715 (1949).
E. N. Barron, H. Ishii, The Bellman equation for minimizing the maximum cost. Nonlinear Anal. Theory Methods Appl. 13(9), 1067–1090 (1989).
Z. Na, J. Lv, M. Zhang, M. Xiong, GFDM based wireless powered communication for cooperative relay system. IEEE Access. 7, 50971–50979 (2019).
X. Lai, W. Zou, DF relaying networks with randomly distributed interferers. IEEE Access. 5, 18909–18917 (2017).
J. Zhao, J. Liu, Y. Nie, S. Ni, Location-assisted beam alignment for train-to-train communication in urban rail transit system. IEEE Access. 7, 80133–80145 (2019).
J. Xia, Cache-aided mobile edge computing for B5G wireless communication networks. EURASIP J. Wirel. Commun. Netw. PP(99), 1–5 (2019).
J. Xia, When distributed switch-and-stay combining meets buffer in IoT relaying networks. Phys. Commun. PP, 1–9 (2019).
S. Lai, Intelligent secure communication for cognitive networks with multiple primary transmit power. IEEE Access. PP(99), 1–7 (2019).
J. Zhao, Q. Li, Y. Gong, K. Zhang, Computation offloading and resource allocation for cloud assisted mobile edge computing in vehicular networks. IEEE Trans. Veh. Technol. 68(8), 7944–7956 (2019).
J. Yang, Inverse optimization of building thermal resistance and capacitance for minimizing air conditioning loads. Renew. Energy. PP, 1–10 (2020).
H. Huang, Optimum insulation thicknesses and energy conservation of building thermal insulation materials in Chinese zone of humid subtropical climate. Renew. Energy. 52, 101840 (2020).
J. Yang, Numerical and experimental study on the thermal performance of aerogel insulating panels for building energy efficiency. Renew. Energy. 138, 445–457 (2019).
G. Liu, Deep learning based channel prediction for edge computing networks towards intelligent connected vehicles. IEEE Access. 7, 114487–114495 (2019).
Z. Zhao, A novel framework of three-hierarchical offloading optimization for MEC in industrial IoT networks. IEEE Trans. Ind. Inform. PP(99), 1–12 (2019).
J. Xia, Intelligent secure communication for internet of things with statistical channel state information of attacker. IEEE Access. 7(1), 144481–144488 (2019).
K. He, A MIMO detector with deep learning in the presence of correlated interference. IEEE Trans. Veh. Technol. PP(99), 1–5 (2019).

Acknowledgements

Not applicable.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 61871139, by the Science and Technology Program of Guangzhou under Grant 201807010103, by the Natural Science Foundation of Guangdong Province under Grant 2018A030313736, by the Scientific Research Project of Education Department of Guangdong, China under Grant 2017GKTSCX045, by the Science and Technology Program of Guangzhou, China under Grant 201707010389, and by the Project of Technology Development Foundation of Guangdong under Grant 706049150203.

Author information

Authors and Affiliations

The School of Computer Science, Guangzhou University, Guangzhou, China
Chao Li, Junjuan Xia & Liseng Fan
The Research Center of Institute of Telecommunication Satellite, China Academy of Space Technology, Beijing, China
Zihe Gao
Guangzhou Panyu Polytechnic, Guangzhou, China
Dan Deng

Authors

Chao Li
Zihe Gao
Junjuan Xia
Dan Deng
Liseng Fan

Contributions

CL derived the formulas and carried out the simulation experiments. ZG analyzed the communication scenarios and modeled the network of this paper. JX presented the reinforcement learning algorithm in this work. DD polished the language of this manuscript. LF improved the presentation of the figures in this work and strengthened its novelty. All the authors have read and approved the final manuscript.

Corresponding authors

Correspondence to Junjuan Xia, Dan Deng or Liseng Fan.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Li, C., Gao, Z., Xia, J. et al. Cache-enabled physical-layer secure game against smart uAV-assisted attacks in b5G NOMA networks. J Wireless Com Network 2020, 7 (2020). https://doi.org/10.1186/s13638-019-1595-x

Received: 20 September 2019
Accepted: 06 November 2019
Published: 06 January 2020
DOI: https://doi.org/10.1186/s13638-019-1595-x
