 Research
 Open Access
Cache-enabled physical-layer secure game against smart UAV-assisted attacks in B5G NOMA networks
EURASIP Journal on Wireless Communications and Networking volume 2020, Article number: 7 (2020)
Abstract
This paper investigates cache-enabled physical-layer secure communication in a non-orthogonal multiple access (NOMA) network with two users. An intelligent unmanned aerial vehicle (UAV) carries an attack module that can operate in multiple attack modes. We present a power allocation strategy to enhance transmission security. To this end, we propose a reinforcement learning-based algorithm that adaptively controls the power allocation factor of the source station in the NOMA network. The interaction between the source station and the UAV is modeled as a dynamic game. During the game, the source station adjusts the power allocation factor according to the current work mode of the attack module on the UAV. To maximize its reward, the source station keeps exploring the changing radio environment until the Nash equilibrium (NE) is reached. Moreover, we prove the NE to verify that the proposed strategy is optimal. Simulation results demonstrate the effectiveness of the strategy.
Introduction
In recent years, ultra-reliable and low-latency communication has become an important requirement for wireless services in B5G wireless communications [1–4]. To support this requirement, caching can prestore wireless data during off-peak traffic periods and thereby significantly reduce the traffic load [5–8]. In addition, non-orthogonal multiple access (NOMA) provides much higher capacity and spectrum efficiency than orthogonal multiple access. Hence, it is one of the most promising candidates for supporting ultra-reliable and low-latency services. Moreover, the NOMA protocol enables the source station to allocate the same spectrum and time resources to multiple users through power-domain multiplexing. In particular, NOMA can serve different kinds of users and can flexibly support ultra-reliable and low-latency services for both far and near users.
Although NOMA can provide reliable performance for wireless transmission, eavesdroppers threaten its security because wireless communication is broadcast in nature [9–13]. The authors in [14] studied the protection of physical-layer security and proposed strategies for wireless communication networks, and these strategies were shown to perform efficiently. In [15], the authors studied antenna selection algorithms to protect physical-layer security in a NOMA network with an eavesdropper. However, conventional strategies for protecting physical-layer security in NOMA systems work well only when the attacker has a single work mode. Intelligent attackers with multiple work modes, considered in [16–20], reduce the data rate of communication systems by freely switching among eavesdropping, jamming, deception, and silence. If networks continue to adopt conventional strategies, such intelligent attacks cannot be suppressed.
To tackle this problem, the authors in [21–24] proposed transmission policies based on reinforcement learning. Reinforcement learning [25] is a branch of artificial intelligence and can be modeled as a Markov decision process. An agent trained by reinforcement learning decides which action to execute according to the current state of the environment. It maximizes the long-term cumulative reward to obtain the optimal policy. However, the state transition probability is generally unknown to the agent. Q-learning was proposed in [26] to solve this problem. Q-learning combines ideas from dynamic programming and the Monte Carlo method, which lets the agent learn optimal strategies without knowing the state transition probability. To the best of our knowledge, no previous work has used Q-learning to protect secure transmission in a NOMA system threatened by an intelligent attacker.
Owing to their mobility and ease of deployment, unmanned aerial vehicles (UAVs) have emerged as a new type of communication node in wireless networks. For example, UAVs can act as relays or base stations under extreme natural conditions. However, a UAV can also become a mobile intelligent attacker if it is equipped with an attack module. In this paper, we investigate a two-user NOMA network in the presence of a UAV attacker that can execute multiple attack modes. The source station sends a composite signal to the two users simultaneously, so the total transmit power is divided into two parts. We dynamically allocate the proportions of the transmit power to confront the intelligent attacker. In practice, the work mode transition probability of the intelligent attacker is hard to know. We therefore adopt Q-learning to obtain a learning-based adaptive policy, because it is a model-free learning method that does not depend on the state transition probability. Furthermore, we formulate the confrontation between the source station and the intelligent attacker as a dynamic game and derive its Nash equilibrium (NE). Simulation results show that the proposed strategy significantly improves the data rate of the NOMA system.
Methods/experimental
Consider a cache-enabled source station S that can prestore a certain amount of information. The coverage of S contains one cell-edge user U_{1} and one central user U_{2}, where U_{2} is closer to S than U_{1}. Upon receiving request signals from the users, S transmits the cached messages to them based on the NOMA protocol. Furthermore, a UAV acting as an intelligent attacker E is present in this area. We suppose that the UAV is more likely to attack the cell-edge user U_{1} and that it remains at the same position while attacking. The programmable radio equipment on E can flexibly choose to keep silent, overhear information from S, or send jamming or deception signals to U_{1}. We denote these four work modes of E by m=0,1,2, and 3, respectively. The goal of E is to decrease the system data rate and degrade the decoding correctness of the users. For simplicity, all devices are equipped with a single antenna.
NOMA networks
Figure 1 depicts the system model of the NOMA network. We suppose that S transmits a composite signal consisting of x_{1} and x_{2}, which contain the messages requested by U_{1} and U_{2}, respectively. According to the NOMA protocol, S divides the total transmit power P_{S} into two portions, i.e., αP_{S} and βP_{S}, where α and β are the power allocation factors for x_{1} and x_{2}, respectively. To satisfy the requirements of the different transmission distances, the factors α and β have to meet the following constraints:
To fight against the intelligent UAV attacker E, S improves the system data rate by adaptively changing its power allocation factor α. In the first step of the transmission process, S chooses a value of α to transmit the composite signal of x_{1} and x_{2}. The received signal at U_{1}, denoted by \(y_{_{U_{1}}}\), is then given by:
where \(h_{SU_{1}}\sim \mathcal {CN}(0,{\nu }^{2})\) is the instantaneous channel coefficient of the S−U_{1} link, and \(n_{_{U_{1}}}{\sim }\mathcal {CN}(0,{\sigma }^{2})\) is the additive white Gaussian noise (AWGN) at U_{1} [27–30]. The resulting SINR for x_{1} at U_{1} can be written as:
When m=0, E shuts down its radio equipment and stays silent. In this case, the achievable rate of x_{1} at U_{1}, denoted by \(C_{_{U_{1}}}\), is exactly the system data rate C_{sys,0}. Thus, the system data rate is given by [31]:
where \(\widetilde {P}_{S} = P_{S}/{\sigma }^{2}\). When m=1, E overhears information from S, and the received signal at E is given by:
We assume that E applies a perfect SIC receiver. Thus, according to [32], the achievable rate of x_{1} at E, denoted by \(C_{_{E}}\), can be written as:
where \(h_{SE}{\sim }\mathcal {CN}(0,{\mu }^{2})\) is the instantaneous channel coefficient of the S−E link, and \(n_{_{E}}{\sim }\mathcal {CN}(0,{\sigma }^{2})\) is the AWGN at E. Consequently, according to [17], the system data rate C_{sys,1} can be computed as:
where [X]^{+}=max(X,0), i.e., it returns X if X is positive and 0 otherwise. When m=2, E transmits a jamming signal to U_{1}, and the received signal \(y_{_{U_{1}}}\) at U_{1} becomes:
where \(h_{EU_{1}}{\sim }\mathcal {CN} (0, {\lambda }^{2})\) is the instantaneous channel coefficient of the E−U_{1} link, P_{J} is the jamming power of E, and \(x_{_{J}}\) is the jamming signal transmitted by E. In this case, the system data rate C_{sys,2} is given by:
where \(\widetilde {P}_{J} = P_{J}/{\sigma }^{2}\). When m=3, S does not send a signal to U_{1}, while E transmits the deception signal \(x_{_{D}}\). The received signal at U_{1} becomes:
where P_{D} is the deception power. A stronger deception signal at U_{1} inevitably causes a larger loss in the achievable rate at U_{1}. Thus, the system data rate C_{sys,3} is modeled as a linear function, given by:
where \(\widetilde {P}_{D}=P_{D}/{\sigma }^{2}\), and γ∈(0,1) is the deception factor, which quantifies the probability that each deception signal takes effect.
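The mode-dependent rates above can be evaluated numerically. The rate equations are not reproduced in this text, so this is only a sketch under stated assumptions and not the paper's exact model:

- The standard two-user NOMA form is used. U_{1} decodes x_{1} while treating x_{2} (and, for m=2, the jamming signal) as noise.
- E is assumed to decode x_{1} in the same way. The paper's perfect-SIC assumption at E may yield a different C_{E}.
- Channel power gains |h|^{2} and SNRs \(\widetilde{P}=P/\sigma^{2}\) are passed in directly.

The function names are hypothetical. Mode 3 is omitted because its linear deception model is not given here.

```python
import math

def rate_x1(alpha, beta, gain_s, snr_s, gain_j=0.0, snr_j=0.0):
    """Assumed NOMA rate (bit/s/Hz) of x_1 at a receiver treating x_2 and jamming as noise.

    gain_s: |h|^2 of the link from S; snr_s: P_S / sigma^2.
    gain_j, snr_j: |h_EU1|^2 and P_J / sigma^2 of the jamming link (zero if absent).
    """
    sinr = alpha * gain_s * snr_s / (beta * gain_s * snr_s + gain_j * snr_j + 1.0)
    return math.log2(1.0 + sinr)

def system_rate(mode, alpha, beta, snr_s, g_su1, g_se=0.0, g_eu1=0.0, snr_j=0.0):
    """Hypothetical system data rate C_sys,m for m = 0, 1, 2 under the assumptions above."""
    if mode == 0:  # E is silent
        return rate_x1(alpha, beta, g_su1, snr_s)
    if mode == 1:  # E eavesdrops: secrecy rate [C_U1 - C_E]^+
        c_u1 = rate_x1(alpha, beta, g_su1, snr_s)
        c_e = rate_x1(alpha, beta, g_se, snr_s)  # assumption: E also treats x_2 as noise
        return max(c_u1 - c_e, 0.0)
    if mode == 2:  # E jams U_1
        return rate_x1(alpha, beta, g_su1, snr_s, g_eu1, snr_j)
    raise ValueError("only modes 0, 1 and 2 are modelled in this sketch")

print(system_rate(0, 0.8, 0.1, 10.0, 1.0))                       # silent
print(system_rate(1, 0.8, 0.1, 10.0, 1.0, g_se=0.25))             # eavesdropping
print(system_rate(2, 0.8, 0.1, 10.0, 1.0, g_eu1=1.0, snr_j=2.0))  # jamming
```

Any attack lowers the rate relative to m=0 in this sketch. For instance, jamming adds g_{EU_1}\(\widetilde{P}_J\) to the interference-plus-noise term.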
Secure game in NOMA network
The interaction between S and E in the NOMA network is adversarial, and we formulate it as a secure game. To analyze the game, we first determine the feasible range of α. U_{1} must be able to decode its received information correctly, and U_{2} must also be able to decode x_{2} correctly. We denote the minimum data rate requirements of U_{1} and U_{2} by \(C_{\min }^{U_{1}}\) and \(C_{\min }^{U_{2}}\), respectively. Thus, α and β satisfy the following constraint:
According to (1), the thresholds of α are given by:
where α_{max} and α_{min} are the maximum and minimum power allocation factors for x_{1}, respectively. We now turn to the process of the secure game. S adaptively adjusts its power allocation factor within [α_{min},α_{max}]. Meanwhile, E selects a work mode m∈{0,1,2,3}, representing keeping silent, eavesdropping, jamming, or deception, respectively. In each time slot, E attempts to reduce the system data rate, i.e., C_{sys,1}, C_{sys,2}, or C_{sys,3}. S, in contrast, aims to increase the system data rate by controlling α while suppressing the attack probability. In view of this, we regard the confrontation between S and E as a zero-sum game. The reward function of S in the zero-sum game, denoted by R_{S}, accounts for both the system data rate and the power consumption and is formulated as:
where θ is the total power consumption. The coefficient ln2 is introduced to simplify the subsequent derivation. By the defining property of a zero-sum game, the reward function of E, denoted by R_{E}, is defined as:
where φ_{m}, m∈{0,1,2,3}, denotes the consumption of E in mode m. In the secure game, S tries to find the power allocation factor in [α_{min},α_{max}] that maximizes R_{S}, while E dynamically adjusts its work mode to maximize R_{E}. Each player seeks its own optimal strategy: α^{∗} for S and m^{∗} for E. The strategy pair {α^{∗},m^{∗}} is the Nash equilibrium (NE) of the secure game if neither S nor E can increase its reward by unilaterally deviating from it. Thus, the NE strategy is given by:
Through analytical derivation, we obtain an NE solution {α^{∗},0}. That is, if S keeps choosing the power allocation factor α^{∗}, E obtains its maximum reward by keeping silent and has no incentive to launch any attack. Lemma 1 below states this NE solution, and its proof follows.
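Once the rewards are tabulated on a discretized α grid, a candidate pair such as {α^{∗},0} can be checked mechanically for being an NE. The sketch below uses hypothetical reward tables r_s and r_e, where rows are candidate α values and columns are work modes m. It tests the two no-unilateral-deviation conditions on those tables only and does not reproduce the analytical derivation of Lemma 1.

```python
def is_pure_ne(r_s, r_e, i, j):
    """True if (alpha index i, mode j) is a pure-strategy NE of the tabulated game.

    r_s[i][j] and r_e[i][j] are the rewards of S and E when S uses its i-th
    alpha value and E uses work mode j.
    """
    s_best = all(r_s[k][j] <= r_s[i][j] for k in range(len(r_s)))     # S cannot gain by changing alpha
    e_best = all(r_e[i][l] <= r_e[i][j] for l in range(len(r_e[i])))  # E cannot gain by changing mode
    return s_best and e_best

def pure_nes(r_s, r_e):
    """All pure-strategy NE index pairs of the tabulated game."""
    return [(i, j) for i in range(len(r_s)) for j in range(len(r_s[i]))
            if is_pure_ne(r_s, r_e, i, j)]

# Hypothetical 2 x 2 example: rows are two alpha values, columns are m = 0 (silent) and m = 1.
R_S = [[1.0, 0.5],
       [1.2, 0.9]]
R_E = [[-1.0, -0.6],
       [-1.2, -1.3]]
print(pure_nes(R_S, R_E))  # prints [(1, 0)]
```

In this toy table, the only equilibrium pairs the second α value with silence (m=0), which mirrors the structure of the {α^{∗},0} solution.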
Lemma 1
The secure game in the NOMA network has an NE solution {α^{∗},0}, where α^{∗} is given by
if the following constraints are met:
Proof
The proof of this Lemma is given in the Appendix □
NOMA power allocation algorithm
To suppress the attack probability efficiently in the secure game, S must adopt an appropriate power allocation strategy. However, because radio signals in the NOMA network are complex and variable, S can hardly predict the channel state information or the work mode of E. For this reason, we propose a power allocation algorithm based on Q-learning. Q-learning combines ideas from the Monte Carlo and dynamic programming methods and is regarded as one of the most effective model-free reinforcement learning algorithms. Without knowing the state transition probability of the environment, the agent keeps exploring the environment by trial and error. After averaging over many independent experiments, the Q-learning-based agent acquires the optimal strategy.
Based on the above ideas, we propose a NOMA power allocation algorithm for the secure game. S and E are coupled: the work mode of E determines the state of S, and S influences the environment of E by adjusting α. In the first step of the algorithm, we initialize the Q-table, denoted by Q(m,α), which stores the values of the state-action pairs. In each experiment, E first selects a work mode at random. Based on this mode, S adopts an instantaneous power allocation factor α_{t}, where α_{t} denotes the power allocation factor at time t. Note that S should not always select the power allocation factor greedily from the Q-table. To avoid converging to a locally optimal solution, S chooses α with an ε-greedy policy. Specifically, S picks the currently best α in the Q-table with probability ε and otherwise chooses a value in [α_{min},α_{max}] at random. In this time slot, S transmits x_{1} with power α_{t}P_{S} and computes its reward R_{S} from the resulting system data rate. Then, E changes its work mode from m to m_{t+1} according to the system data rate. Following [33], the Q-table update combines the instantaneous reward R_{S} with the experience accumulated in the Q-table:
where ζ∈(0,1] is the learning rate and ρ∈[0,1] weights the accumulated experience, i.e., it plays the role of the discount factor. Because the state transition probability is unknown, we repeat the experiment many times and average the reward values. After sufficiently many updates and repeated experiments, the Q-table converges to its optimum, from which S obtains a learning-based optimal power allocation strategy. Algorithm 1 summarizes the learning process.
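As an illustration of the update above, and not a reproduction of Algorithm 1, here is a minimal tabular sketch of the learning loop. The state is the work mode m, and the action is the index of α on the 0.6 to 0.9 grid.

- The environment `toy_step` is a hypothetical stand-in for the radio environment. Its rewards and attacker behaviour are invented for illustration and are not the paper's model.
- The update uses the standard Q-learning form with learning rate ζ and weight ρ. The parameter values are assumptions.
- Following the text, S exploits the Q-table with probability ε and explores otherwise. The more common ε-greedy convention is the reverse.

```python
import random

ALPHAS = [round(0.6 + 0.02 * k, 2) for k in range(16)]  # action set: 0.60, 0.62, ..., 0.90
N_MODES = 4                                             # states: 0 silent, 1 eavesdrop, 2 jam, 3 deceive
BASE_RATE = [2.0, 1.2, 1.0, 0.8]                        # hypothetical rate of S under each work mode

def toy_step(rng, mode, a_idx):
    """Hypothetical environment: return (reward R_S, next work mode of E)."""
    reward = BASE_RATE[mode] - 50.0 * abs(ALPHAS[a_idx] - 0.74)  # hypothetical peak at alpha = 0.74
    # Hypothetical attacker: silent once alpha >= 0.74 (index 7), otherwise a random attack mode
    next_mode = 0 if a_idx >= 7 else rng.choice([1, 2, 3])
    return reward, next_mode

def greedy(q_row):
    """Index of the best action in one row of the Q-table."""
    return max(range(len(q_row)), key=lambda k: q_row[k])

def q_learning(steps=20_000, zeta=0.1, rho=0.5, eps=0.5, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ALPHAS) for _ in range(N_MODES)]  # Q(m, alpha) initialised to zero
    mode = rng.randrange(N_MODES)                        # E starts in a random work mode
    for _ in range(steps):
        if rng.random() < eps:                           # as in the text: exploit with probability eps
            a = greedy(q[mode])
        else:                                            # otherwise explore uniformly over the grid
            a = rng.randrange(len(ALPHAS))
        r, nxt = toy_step(rng, mode, a)
        # Q(m, a) <- (1 - zeta) * Q(m, a) + zeta * (R_S + rho * max_a' Q(m', a'))
        q[mode][a] = (1 - zeta) * q[mode][a] + zeta * (r + rho * max(q[nxt]))
        mode = nxt
    return q

q_table = q_learning()
print([ALPHAS[greedy(row)] for row in q_table])
```

In this toy environment, E stays silent once α ≥ 0.74, and the immediate reward also peaks there. The learned greedy action in every state should therefore be α = 0.74.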
Results and discussion
In this section, we simulate the communication process to verify the effectiveness of the proposed algorithm. All links experience Rayleigh flat fading [34–37], and each node is equipped with a single antenna. The parameters are set as follows: \(\{\nu ^{2}, \mu ^{2}, \lambda ^{2}\} = \{1.2, 0.5, 2\}, \varphi _{m=\{0, 1, 2, 3\}}=\{0, 1.8, 2.0, 2.1\}, \gamma = 0.6, \widetilde {P}_{J} = 2, \widetilde {P}_{D} = 2.1\). The power allocation factor α varies from 0.6 to 0.9 in steps of 0.02, and β is fixed at 0.1. Each experiment lasts 10,000 time slots, and the results are averaged over 5000 independent experiments.
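The simulation set-up above can be written as a small configuration plus a fading sampler. The sketch assumes the channels are independent across links and time slots. Since h∼CN(0,v) implies that |h|^{2} is exponentially distributed with mean v, the Rayleigh fading is generated directly through the channel power gains. The helper names `sample_gain` and `sample_slot` are hypothetical.

```python
import random

# Simulation parameters as listed in the text
NU2, MU2, LAMBDA2 = 1.2, 0.5, 2.0          # nu^2 (S-U1), mu^2 (S-E), lambda^2 (E-U1)
PHI = {0: 0.0, 1: 1.8, 2: 2.0, 3: 2.1}      # phi_m for m = 0, 1, 2, 3
GAMMA, PJ_TILDE, PD_TILDE = 0.6, 2.0, 2.1   # deception factor, P_J/sigma^2, P_D/sigma^2
BETA = 0.1
ALPHAS = [round(0.6 + 0.02 * k, 2) for k in range(16)]  # 0.60 to 0.90 in steps of 0.02
SLOTS_PER_EXPERIMENT, EXPERIMENTS = 10_000, 5_000

def sample_gain(rng, variance):
    """|h|^2 for h ~ CN(0, variance): exponentially distributed with mean `variance`."""
    return rng.expovariate(1.0 / variance)

def sample_slot(rng):
    """One Rayleigh flat-fading draw of the three link power gains (assumed independent)."""
    return {"SU1": sample_gain(rng, NU2),
            "SE": sample_gain(rng, MU2),
            "EU1": sample_gain(rng, LAMBDA2)}

rng = random.Random(1)
print(len(ALPHAS), ALPHAS[0], ALPHAS[-1])
print(sample_slot(rng))
```

The α grid contains 16 candidate actions, which sets the width of the Q-table rows.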
Figure 2 shows the average reward values of S and E over time slots 0 to 10,000. Both average rewards increase rapidly between time slots 0 and 1000. Afterwards, the two curves rise slowly, reach their peak values at around time slot 3000, and then remain steady until the end of the experiment. In learning-based algorithms, agents are expected to select actions that improve their long-term cumulative rewards, which is consistent with these results.
The proposed power allocation strategy aims to improve the average system data rate, and Fig. 3 clearly reflects this improvement. From time slot 0 to 1000, the average system data rate grows sharply from an initial value of 0.76 to an intermediate value of 1.23. After that, it continues to rise slowly until it converges to 1.31 at around time slot 3000, and it remains steady from then until the end. The trend of the system data rate is consistent with that of the average reward, which confirms that a higher system data rate brings more reward to the agents.
Figure 4 shows the evolution of the average power allocation factor during the reinforcement learning process. The power allocation factor starts from a random initial value of 0.75. Once the experiment begins, E starts to change its work mode, and S dynamically adjusts the power allocation factor according to the changing environment. In the first 500 time slots, the average power allocation factor gradually decreases to an intermediate value of 0.708. Between time slots 500 and 4000, it gradually increases and then remains stable at around 0.737.
Figure 5 shows the average attack probabilities of E versus the time slot from 0 to 10,000. The average attack probabilities fall quickly between time slots 0 and 1000. After that, the three curves decrease slowly and gradually converge. The eavesdropping probability drops from an initial value of 0.25 to a converged value of 0.025, a reduction of 90%. The jamming probability drops from 0.26 to 0.02, a reduction of 92.3%. Similarly, the deception probability drops from 0.27 to 0.01, a reduction of about 96.3%. We also rerun the power allocation algorithm with different channel parameters, {ν^{2},μ^{2},λ^{2}}={0.9,0.3,2}, and again simulate the average attack probabilities. In this setting, the cell-edge user U_{1} is placed farther away from S, and E is correspondingly also farther away from S. Fig. 6 shows that, compared with Fig. 5, the converged eavesdropping probability becomes lower. At the same time, with the jamming and deception powers fixed, the converged deception and jamming probabilities increase by about 2%. Comparing Fig. 5 with Fig. 6 shows that the proposed policy performs well regardless of the locations of the cell-edge user and the UAV.
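Each reduction quoted above is the relative change (p_{0}−p_{conv})/p_{0}, where p_{0} is the initial probability and p_{conv} is the converged probability. A short check using the values read from Fig. 5, rounded to one decimal place:

```python
def decline_rate(initial, converged):
    """Relative reduction, in percent, from the initial to the converged probability."""
    return 100.0 * (initial - converged) / initial

# Initial and converged attack probabilities quoted for Fig. 5
FIG5 = {"eavesdropping": (0.25, 0.025), "jamming": (0.26, 0.02), "deception": (0.27, 0.01)}
for attack, (p0, p_conv) in FIG5.items():
    print(f"{attack}: {decline_rate(p0, p_conv):.1f}%")
# prints, one per line: eavesdropping: 90.0%, jamming: 92.3%, deception: 96.3%
```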
Conclusions
In this paper, we investigated the cache-assisted physical-layer security of a NOMA communication network with an intelligent UAV attacker located near the cell-edge user. The UAV within the coverage of the network tries to reduce the system data rate of the NOMA network by flexibly switching among eavesdropping, jamming, deception, and silence. According to the NOMA protocol, the transmitter has to allocate the total power to the two users in a certain proportion. Therefore, a real-time strategy is needed to adjust the power allocation factor and suppress the attack motivation of the UAV. To tackle this problem, we proposed a Q-learning-based power allocation strategy that controls the power allocation factor. The simulation results show that the proposed strategy adjusts the power allocation factor well in real time. Furthermore, we confirmed that this strategy is effective in enhancing the system data rate and suppressing the attack probabilities. In future work, we will apply wireless caching techniques [38–40] to NOMA systems to further enhance transmission reliability and security. In addition, we will consider new materials [41–43] for enhancing communication performance in practical applications. We will also apply intelligent algorithms, such as deep learning-based algorithms [44–47], to the considered system to further enhance network performance.
Appendix
Proof
By substituting m=0 into (15), we have
Taking the partial derivative of R_{S}(α,0) with respect to α, we obtain
Taking the derivative once more, it is easy to find that
which shows that (22) is concave in α, so its maximum is attained at the stationary point where ∂R_{S}(α,0)/∂α=0. Setting (23) equal to zero at α=α^{∗} yields (19). To ensure that the maximizer α^{∗} lies within [α_{min},α_{max}], the following inequalities must hold:
i.e., (20a) holds. Therefore, (α^{∗},0) satisfies (17). To ensure that (α^{∗},0) also satisfies (18), we substitute (α^{∗},0) into (16) and require the following inequalities to hold:
i.e., (20b)–(20d) hold. Therefore, (α^{∗},0) also satisfies (18).
In summary, the strategy pair (α^{∗},0) satisfies both (17) and (18), which is exactly the definition of an NE. This completes the proof of Lemma 1. □
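The argument in the proof can be illustrated with a simple hypothetical reward. The reward is concave and is maximized at its stationary point, and extra conditions keep that point inside [α_{min},α_{max}]. The sketch makes two assumptions:

- β is fixed, so the SINR of x_{1} is cα for some constant c>0.
- Power has a linear cost κα, where κ is a hypothetical cost weight.

Under these assumptions, R(α)=ln(1+cα)−κα is concave, and its maximizer over the interval is the stationary point clipped to the interval. This illustrates the reasoning only and is not the paper's R_{S}.

```python
import math

def reward(alpha, c, kappa):
    """Hypothetical concave reward ln(1 + c*alpha) - kappa*alpha (ln 2 * log2(.) = ln(.))."""
    return math.log1p(c * alpha) - kappa * alpha

def optimal_alpha(c, kappa, alpha_min, alpha_max):
    """Maximiser of `reward` over [alpha_min, alpha_max].

    dR/dalpha = c / (1 + c*alpha) - kappa vanishes at alpha = 1/kappa - 1/c, and
    d2R/dalpha2 = -c**2 / (1 + c*alpha)**2 < 0, so R is concave and the constrained
    maximiser is the stationary point clipped to the interval.
    """
    stationary = 1.0 / kappa - 1.0 / c
    return min(max(stationary, alpha_min), alpha_max)

# Usage: with c = 4 and kappa = 1 the stationary point 0.75 lies inside [0.6, 0.9]
a_star = optimal_alpha(4.0, 1.0, 0.6, 0.9)
grid = [0.6 + 0.01 * k for k in range(31)]
best_on_grid = max(grid, key=lambda a: reward(a, 4.0, 1.0))
print(a_star, round(best_on_grid, 2))
```

A brute-force grid search agrees with the closed-form stationary point. When the stationary point falls outside the interval, the maximizer sits on the boundary, which is the situation that conditions of the form (20a) are meant to exclude.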
Availability of data and materials
The data supporting this manuscript are available from the corresponding author by email.
Abbreviations
NE: Nash equilibrium
NOMA: non-orthogonal multiple access
UAV: unmanned aerial vehicle
References
 1
J. Zhao, Q. Li, Y. Gong, Computation offloading and resource allocation for mobile edge computing with multiple access points. IET Commun. PP(99), 1–10 (2019).
 2
J. Yang, D. Ruan, J. Huang, X. Kang, Y. Q. Shi, An embedding cost learning framework using GAN. IEEE Trans. Inf. Forensic. Secur. PP(99), 1–10 (2019).
 3
B. Wang, F. Gao, S. Jin, H. Lin, G. Y. Li, Spatial- and frequency-wideband effects in millimeter-wave massive MIMO systems. IEEE Trans. Sig. Process. 66(13), 3393–3406 (2018).
 4
X. Hu, C. Zhong, X. Chen, W. Xu, Z. Zhang, Cluster grouping and power control for angle-domain mmWave MIMO NOMA systems. IEEE J. Sel. Top. Sig. Process. 13(5), 1167–1180 (2019).
 5
L. Fan, N. Zhao, X. Lei, Q. Chen, N. Yang, G. K. Karagiannidis, Outage probability and optimal cache placement for multiple amplify-and-forward relay networks. IEEE Trans. Veh. Technol. 67(12), 12373–12378 (2018).
 6
X. Lin, Probabilistic caching placement in UAV-assisted heterogeneous wireless networks. Phys. Commun. 33, 54–61 (2019).
 7
F. Shi, Secure probabilistic caching in random multi-user multi-UAV relay networks. Phys. Commun. 32, 31–40 (2019).
 8
C. Li, L. Peng, Z. Chao, S. Fan, J. Cioffi, L. Yang, Spectral-efficient cellular communications with coexistent one- and two-hop transmissions. IEEE Trans. Veh. Technol. 65(8), 6765–6772 (2016).
 9
G. Gomez, F. J. Martin-Vega, F. J. Lopez-Martinez, Y. Liu, M. Elkashlan, Uplink NOMA in large-scale systems: coverage and physical layer security. CoRR abs/1709.04693 (2017).
 10
C. Zheng, H. Xin, X. Guo, T. Ristaniemi, H. Zhu, Secure and energy efficient resource allocation for wireless power enabled full-/half-duplex multiple-antenna relay systems. IEEE Trans. Veh. Technol. 65(12), 11208–11219 (2017).
 11
X. Liang, C. Xie, M. Min, W. Zhuang, User-centric view of unmanned aerial vehicle transmission against smart attacks. IEEE Trans. Veh. Technol. 67(4), 3420–3430 (2017).
 12
C. Li, S. Zhang, P. Liu, F. Sun, J. Cioffi, L. Yang, Overhearing protocol design exploiting intercell interference in cooperative green networks. IEEE Trans. Veh. Technol. 65(1), 441–446 (2016).
 13
C. Li, H. J. Yang, S. Fan, J. Cioffi, L. Yang, Multiuser overhearing for cooperative two-way multiantenna relays. IEEE Trans. Veh. Technol. 65(5), 3796–3802 (2016).
 14
J. Xia, Secure cache-aided multi-relay networks in the presence of multiple eavesdroppers. IEEE Trans. Commun. PP(99), 1–10 (2019).
 15
C. Zheng, L. Lei, H. Zhang, T. Ristaniemi, H. Zhu, Energy-efficient and secure resource allocation for multiple-antenna NOMA with wireless power transfer. IEEE Trans. Green Commun. Netw. 2(4), 1059–1071 (2018).
 16
Y. Li, L. Xiao, H. Dai, P. H. Vincent, Game theoretic study of protecting MIMO transmissions against smart attacks, in IEEE Int. Conf. Commun. (2017), pp. 1–6.
 17
C. Li, Y. Xu, Protecting secure communication under UAV smart attack with imperfect channel estimation. IEEE Access 6(1), 76395–76401 (2018).
 18
Y. Xu, Q-learning based physical-layer secure game against multiagent attacks. IEEE Access 7, 49212–49222 (2019).
 19
X. Liang, Y. Li, G. Han, H. Dai, H. V. Poor, A secure mobile crowdsensing game with deep reinforcement learning. IEEE Trans. Inf. Forensic. Secur. 13(1), 35–47 (2018).
 20
C. Li, S. Fan, J. M. Cioffi, L. Yang, Energy efficient MIMO relay transmissions via joint power allocations. IEEE Trans. Circ. Syst. II Express Briefs 61(7), 531–535 (2014).
 21
X. Liang, C. Xie, et al., A mobile offloading game against smart attacks. IEEE Access 4, 2281–2291 (2017).
 22
C. Li, W. Zhou, Enhanced secure transmission against intelligent attacks. IEEE Access 7, 53596–53602 (2019).
 23
X. Liang, T. Chen, C. Xie, H. Dai, V. Poor, Mobile crowdsensing games in vehicular networks. IEEE Trans. Veh. Technol. 67(2), 1535–1545 (2018).
 24
X. Liang, Y. Li, C. Dai, H. Dai, H. V. Poor, Reinforcement learning-based NOMA power allocation in the presence of smart jamming. IEEE Trans. Veh. Technol. 67(4), 3377–3389 (2018).
 25
A. G. Barto, Reinforcement learning. A Bradford Book 15(7), 665–685 (1998).
 26
C. J. C. H. Watkins, P. Dayan, Technical note: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992).
 27
X. Lin, MARL-based distributed cache placement for wireless networks. IEEE Access 7, 62606–62615 (2019).
 28
J. Zhao, A dual-link soft handover scheme for C/U plane split network in high-speed railway. IEEE Access 6, 12473–12482 (2018).
 29
H. Xie, F. Gao, S. Zhang, S. Jin, A unified transmission strategy for TDD/FDD massive MIMO systems with spatial basis expansion model. IEEE Trans. Veh. Technol. 66(4), 3170–3184 (2017).
 30
X. Lai, Distributed secure switch-and-stay combining over correlated fading channels. IEEE Trans. Inf. Forensic. Secur. 14(8), 2088–2101 (2019).
 31
Z. Na, Y. Wang, Subcarrier allocation based simultaneous wireless information and power transfer algorithm in 5G cooperative OFDM communication systems. Phys. Commun. 29, 164–170 (2018).
 32
C. E. Shannon, Communication theory of secrecy systems. Bell Syst. Tech. J. 28, 656–715 (1949).
 33
E. N. Barron, H. Ishii, The Bellman equation for minimizing the maximum cost. Nonlinear Anal. Theory Methods Appl. 13(9), 1067–1090 (1989).
 34
Z. Na, J. Lv, M. Zhang, M. Xiong, GFDM based wireless powered communication for cooperative relay system. IEEE Access 7, 50971–50979 (2019).
 35
X. Lai, W. Zou, DF relaying networks with randomly distributed interferers. IEEE Access 5, 18909–18917 (2017).
 36
J. Zhao, J. Liu, Y. Nie, S. Ni, Location-assisted beam alignment for train-to-train communication in urban rail transit system. IEEE Access 7, 80133–80145 (2019).
 37
J. Xia, Cache-aided mobile edge computing for B5G wireless communication networks. EURASIP J. Wirel. Commun. Netw. PP(99), 1–5 (2019).
 38
J. Xia, When distributed switch-and-stay combining meets buffer in IoT relaying networks. Phys. Commun. PP, 1–9 (2019).
 39
S. Lai, Intelligent secure communication for cognitive networks with multiple primary transmit power. IEEE Access PP(99), 1–7 (2019).
 40
J. Zhao, Q. Li, Y. Gong, K. Zhang, Computation offloading and resource allocation for cloud assisted mobile edge computing in vehicular networks. IEEE Trans. Veh. Technol. 68(8), 7944–7956 (2019).
 41
J. Yang, Inverse optimization of building thermal resistance and capacitance for minimizing air conditioning loads. Renew. Energy PP, 1–10 (2020).
 42
H. Huang, Optimum insulation thicknesses and energy conservation of building thermal insulation materials in Chinese zone of humid subtropical climate. Renew. Energy 52, 101840 (2020).
 43
J. Yang, Numerical and experimental study on the thermal performance of aerogel insulating panels for building energy efficiency. Renew. Energy 138, 445–457 (2019).
 44
G. Liu, Deep learning based channel prediction for edge computing networks towards intelligent connected vehicles. IEEE Access 7, 114487–114495 (2019).
 45
Z. Zhao, A novel framework of three-hierarchical offloading optimization for MEC in industrial IoT networks. IEEE Trans. Ind. Inform. PP(99), 1–12 (2019).
 46
J. Xia, Intelligent secure communication for internet of things with statistical channel state information of attacker. IEEE Access 7(1), 144481–144488 (2019).
 47
K. He, A MIMO detector with deep learning in the presence of correlated interference. IEEE Trans. Veh. Technol. PP(99), 1–5 (2019).
Acknowledgements
Not applicable.
Funding
This work was supported by the National Natural Science Foundation of China under Grant 61871139, by the Science and Technology Program of Guangzhou under Grant 201807010103, by the Natural Science Foundation of Guangdong Province under Grant 2018A030313736, by the Scientific Research Project of the Education Department of Guangdong, China under Grant 2017GKTSCX045, by the Science and Technology Program of Guangzhou, China under Grant 201707010389, and by the Project of Technology Development Foundation of Guangdong under Grant 706049150203.
Author information
Contributions
LC derived the formulas and performed the simulation experiments. ZG analyzed the communication scenarios and modeled the network in this paper. JX proposed the reinforcement learning algorithm used in this work. DD polished the language of the manuscript. FL improved the figures and enhanced the novelty of this work. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Li, C., Gao, Z., Xia, J. et al. Cache-enabled physical-layer secure game against smart UAV-assisted attacks in B5G NOMA networks. J Wireless Com Network 2020, 7 (2020). https://doi.org/10.1186/s13638-019-1595-x
Keywords
 Cache
 UAV
 B5G
 NOMA
 Physical-layer security
 Reinforcement learning