
An intelligent wireless channel allocation in HAPS 5G communication system based on reinforcement learning

Abstract

Channel allocation is a prerequisite for the HAPS (high-altitude platform station) 5G communication network to transmit information. This paper proposes an intelligent wireless channel allocation algorithm for the HAPS 5G massive MIMO (multiple-input multiple-output) system based on reinforcement learning. The Q-learning algorithm is combined with a back-propagation neural network, enabling the HAPS 5G massive MIMO system to learn autonomously from its environment and allocate the system's channel resources efficiently. The agent perceives state information through continuous interaction with the channel environment and learns the mapping from environment states to actions. A back-propagation neural network replaces the Q-value table, and each Q update serves as a training example for updating the evaluation function. The proposed algorithm effectively improves the overall performance of the system.

1 Introduction

Nowadays, with the unprecedented development of aviation communication and wireless network technology, the high-altitude platform information network can provide anytime, anywhere communication services not only for fixed terminals but also for airborne, shipborne, and personal mobile terminals, as well as access to the Internet and Fifth Generation (5G) communication networks, forming a seamless communication network spanning space, air, sea, and ground. While 4G deployment is still in progress, 5G has become a research hot spot. In the 5G era, the development of the networked society will bring a surge in mobile and wireless traffic as core services such as electronic banking, electronic teaching, and electronic medical services become widespread. Data traffic is expected to grow more than 1000-fold from 2010 to 2020. In the 4G era, about 5 billion devices are interconnected worldwide; in the 5G era, more than 100 billion will be. Most consumer products, industrial products, logistics, and so on can be connected to the network [1,2,3,4]. The 5G Internet of Things will also be combined with cloud computing and big data technology to make the whole society fully intelligent. 5G is not only an evolution of communication technology but an all-around change in the integration of computing and communication. As the next-generation wireless communication technology, the future 5G network will be an intelligent system with multiple services, multiple access technologies, and multi-level coverage. The purpose of developing the HAPS 5G massive MIMO communication system is to provide supplementary wireless services for ground stations and satellites; the concept is based on aircraft operating in the stratosphere [5,6,7,8,9].

Because its location is more favorable than that of satellite and terrestrial networks, HAPS has great potential for various applications in the future 5G era, and applying 5G key technologies in HAPS communication systems offers many outstanding advantages. For example, massive MIMO technology is inherently well suited to HAPS: a HAPS system can be matched with massive MIMO to solve problems in many existing ground communication systems. The future HAPS 5G massive MIMO network must be able to provide real-time, accurate emergency communication services when natural disasters, wars, or epidemics occur, a requirement existing ground communication systems cannot meet. A HAPS communication platform deployed in the stratosphere is suitable not only for cities but also for marine, mountainous, and disaster-prone areas, matching these application requirements well.

In reference [10], a wireless dynamic channel allocation algorithm for HAPS communication based on distance decision was proposed, which guarantees the quality of service of all traffic types and maximizes the resource utilization of high-altitude platform communication. To address the horizontal swing caused by stratospheric crosswinds on high-altitude platforms, reference [11] proposed a channel allocation algorithm combining channel reservation with handoff queuing, so that ground users continue to obtain reliable service during inter-cell handoff. That algorithm accounts for the service-level requirements of different types of user terminals, differentiates their priorities, and queues handoff callers on top of channel reservation to reduce the handover failure rate. Although this literature solves some aspects of channel allocation in HAPS systems, it is not well suited to the HAPS 5G massive MIMO network. The future network will face a massive number of data connections and channel allocation decisions, and it should be an autonomously learning, self-updating intelligent system: one that senses the external environment and learns from it using artificial intelligence, adapting operating parameters (such as transmission power, carrier frequency, and modulation) in real time to the statistical characteristics of the received wireless signals, thereby achieving reliable communication and efficient spectrum utilization anytime and anywhere. Therefore, this paper proposes an intelligent wireless channel allocation algorithm for the HAPS 5G massive MIMO communication system based on reinforcement learning. It adopts the Q-learning algorithm combined with a back-propagation neural network, enabling the system to learn independently from the environment and to allocate channel resources efficiently according to channel load and blocking conditions.

2 Methodology

Reinforcement learning is a machine learning method in artificial intelligence that lets an agent act in a specific environment: based on the current state, the agent receives rewards and punishments from the environment and autonomously discovers the strategy that yields the greatest reward.

Reinforcement learning usually consists of two parts: the agent and the environment. The agent is the learning subject, and the environment is the scene in which the agent acts. The environment first sends a state to the agent, which takes an action based on its knowledge in response. The environment then sends the next state and returns a reward to the agent. The agent uses the returned reward to update its knowledge and evaluate its last action. This loop continues until the environment ends it by sending a terminal state.

The reinforcement learning algorithm generally consists of six elements:

  1. Action (a): all possible actions that an agent can take;

  2. State (s): the current situation of the environment;

  3. Reward (r): the immediate return value from the environment, evaluating the agent's last action;

  4. Strategy (π): how the agent decides its next action according to the current state;

  5. Value (V): the long-term expectation of discounted return. V(π(s)) is the expected long-term return of the current state s under policy π;

  6. Q value or action value: similar to the value V, but with an additional parameter, the current action a. Q(s, a) is the long-term return of taking action a in the current state s under policy π.

Q-learning is one of the most famous algorithms in reinforcement learning. It learns how to choose the next action a by perceiving the reward and punishment r. The detailed algorithm steps are as follows:

  1) For each state s and action a, initialize the table entry Q(s, a) to 0.

  2) Observe the current state s.

  3) Repeat forever:

    a) Select an action a and execute it;

    b) Receive the immediate return r;

    c) Observe the new state s′ and update the table entry Q as follows:

$$ Q\left(s,a\right)=r\left(s,a\right)+\gamma \max_{a'} Q\left({s}^{\prime },{a}^{\prime}\right),\qquad s\leftarrow {s}^{\prime } $$
(1)

The Q-learning algorithm is an unsupervised online learning technique that learns the mapping from environment states to actions, making the agent adopt the optimal strategy according to the maximum reward value.
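As a concrete illustration, the following minimal Python sketch implements the tabular Q-learning update of Eq. (1) with an ε-greedy action choice. The state and action types, the exploration rate, and the discount factor are illustrative assumptions, not values specified in this paper.

```python
import random
from collections import defaultdict

GAMMA = 0.9    # discount factor (assumed value; the paper only requires 0 <= gamma < 1)
EPSILON = 0.1  # exploration rate for epsilon-greedy selection (assumed)

Q = defaultdict(float)  # Q table; every unseen (s, a) pair starts at 0, as in step 1)

def choose_action(state, actions):
    """Epsilon-greedy policy: explore occasionally, otherwise exploit the best known action."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, actions):
    """One learning step of Eq. (1): Q(s,a) = r(s,a) + gamma * max_a' Q(s',a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] = reward + GAMMA * best_next
```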

3 Intelligent wireless channel allocation algorithm based on Q-learning reinforcement learning

The block diagram of an intelligent wireless channel allocation algorithm based on Q-learning reinforcement learning is shown in Fig. 1.

Fig. 1 System diagram

As shown in Fig. 1, the agent perceives and processes information from a complex environment. It learns to improve its own performance and to choose behavior, which produces group behavior choices; individual and group behavior choices lead the agent to decide on a certain action, which in turn affects the environment.

The channel assignment problem of the HAPS 5G massive MIMO communication system is solved with the Q-learning reinforcement learning algorithm. The problem is modeled as a Markov process that generates an instantaneous return value at each learning step, with the state converging at the end of learning. To model the algorithm, the instantaneous return value R, the channel state S, and the channel assignment action A must be determined.

1) Instantaneous return value R

The following principles must be satisfied for intelligent channel allocation:

  a) Given the existing channel resources, all channels are allocated and the fairness principle is satisfied;

  b) The channel allocation satisfies the principle of minimizing the outage rate and the GoS (grade of service);

  c) The channel assignment satisfies the principle of minimizing the blocking rate.

Accordingly, the instantaneous return value of the intelligent channel allocation algorithm is designed to achieve convergence according to the above principles (a minimal code sketch follows the list):

  • If principles a, b, and c are all met, the instantaneous return value of the channel assignment is R = 10;

  • If only a and b are met (c is not), R = 7;

  • If only a and c are met (b is not), R = 5;

  • If only a is met (b and c are not), R = 3;

  • If a is not met and only b and c are met, R = 0;

  • If none of the three principles is satisfied, R = −10.
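The return values above translate directly into a lookup. The sketch below is one possible Python rendering; the three boolean inputs stand for principles a, b, and c, and the combinations the paper leaves unspecified (b alone or c alone, without a) are mapped to −10 as an assumption.

```python
def instantaneous_return(fair: bool, gos_ok: bool, low_blocking: bool) -> int:
    """Map principles a (fairness), b (outage/GoS), c (blocking rate) to R."""
    if fair and gos_ok and low_blocking:
        return 10
    if fair and gos_ok:
        return 7
    if fair and low_blocking:
        return 5
    if fair:
        return 3
    if gos_ok and low_blocking:
        return 0
    return -10  # assumption: combinations not listed in the paper get the all-fail penalty
```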

2) Channel state S

The channel state represents the quality, usage, and idleness of each channel in the period before allocation. The state set of the currently allocated channels can be derived from the channel state information.

The channels allocated by the intelligent channel assignment algorithm at any given time must meet the following requirements:

a) Free channel resources

Suppose the traffic of each cell is A1, A2, ⋯, AK, where K represents the cell reuse pattern of the high-altitude platform communication network and K ≥ 2. Assume the total number of system channels is N and the required grade of service is B.

There is a relationship between the number of channels and the traffic; suppose this relationship is C = f(A, B), where C is the number of channels and A is the traffic. The number of channels needed by cell i varies with time as ci(t). Let \( {C}_i=\underset{t}{\max}\left({c}_i(t)\right) \), (1 ≤ i ≤ K), let the numbers of occupied channels in cells 1, 2, ⋯, K be F1, F2, ⋯, FK, and let the number of free channels be D. Under normal operating conditions, the system must satisfy:

$$ \left.\begin{array}{l}{F}_1+D\ge {C}_1\\ {F}_2+D\ge {C}_2\\ \vdots \\ {F}_K+D\ge {C}_K\\ \sum \limits_{i=1}^K{F}_i+D\le N\end{array}\right\} $$
(2)

Through Formula (2), we can get

$$ D\ge \frac{\sum \limits_{i=1}^K{C}_i-N}{K-1} $$
(3)

Through Formula (3), we define

$$ {D}_{\mathrm{min}}=\frac{\sum \limits_{i=1}^K{C}_i-N}{K-1} $$
(4)

If Dmin > 0, the sum over cells of the peak number of channels required at their respective service levels exceeds the total number of channels available to the system, that is, \( N<\sum \limits_{i=1}^K\underset{t}{\max}\left({c}_i(t)\right) \). In this case, if the traffic of every cell peaks at the same time, the system cannot meet the required performance indicators.

If Dmin = 0, the capacity the system can provide exactly meets the needs of the users in each cell, that is, \( N=\sum \limits_{i=1}^K\underset{t}{\max}\left({c}_i(t)\right) \). Here the system has just enough channels for the peak traffic of each cell, which is also the case of highest resource utilization. The allocation satisfies Fi = Ci, (i = 1, 2, ⋯, K), with free channels D = 0.

If Dmin < 0, the number of channels provided by the system exceeds the sum of the per-cell peak requirements within a cluster, that is, \( N>\sum \limits_{i=1}^K\underset{t}{\max}\left({c}_i(t)\right) \). In this case user demand does not reach system capacity, so each cell only needs a certain number of channels to meet demand, and the remaining channels can be treated as a dynamically allocated pool assigned by the reinforcement learning algorithm.
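To make the three cases concrete, the sketch below computes Dmin from Eq. (4), taking the Erlang-B formula as one common choice for the relationship C = f(A, B); the paper does not name a specific f, and the per-cell traffic values, channel count, and GoS target here are illustrative assumptions.

```python
def erlang_b(traffic: float, channels: int) -> float:
    """Blocking probability of `traffic` Erlangs on `channels` channels (recursive Erlang-B)."""
    b = 1.0
    for m in range(1, channels + 1):
        b = traffic * b / (m + traffic * b)
    return b

def channels_needed(traffic: float, gos: float) -> int:
    """C = f(A, B): smallest channel count whose blocking stays within the GoS."""
    c = 1
    while erlang_b(traffic, c) > gos:
        c += 1
    return c

A = [12.0, 9.5, 15.0, 8.0]   # assumed peak traffic (Erlangs) per cell, K = 4
N = 32                        # assumed total number of system channels
GOS = 0.02                    # assumed grade of service (2% blocking)

C = [channels_needed(a, GOS) for a in A]   # C_i = max_t c_i(t), evaluated at the peak
d_min = (sum(C) - N) / (len(C) - 1)        # Eq. (4)

if d_min > 0:
    print("Peak demand exceeds capacity (N < sum C_i)")
elif d_min == 0:
    print("Capacity exactly matches peak demand (F_i = C_i, D = 0)")
else:
    print("Spare channels remain; the surplus pool can be allocated by the RL algorithm")
```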

b) Scheduling time does not conflict

In the training phase, a conflict coefficient is obtained by recording each channel in which a conflict occurs at a given scheduling time. After training, a channel conflict distribution table is obtained: the conflict coefficient of a conflict-free scheduling time is 0, and the more conflicts recorded at a time, the larger its coefficient (a minimal sketch follows).
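One possible rendering of such a conflict table, assuming a channel is identified by its index and a scheduling time by a discrete slot number:

```python
from collections import Counter

conflicts = Counter()  # (channel, slot) -> conflict coefficient; 0 when never recorded

def record_conflict(channel: int, slot: int) -> None:
    """Called during training whenever `channel` conflicts at scheduling time `slot`."""
    conflicts[(channel, slot)] += 1

def conflict_coefficient(channel: int, slot: int) -> int:
    """0 for conflict-free scheduling times; grows with the number of recorded conflicts."""
    return conflicts[(channel, slot)]
```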

c) Channel quality

According to channel estimation, the idle channels can be divided into three levels, from high to low channel quality:

  • The best channel quality = 10;

  • The qualifying channel quality = 5;

  • The worst channel quality = 0.

d) GoS (grade of service)

According to the GoS requirement, the priority of channel assignment can be divided into the following four categories:

  • Emergency business level, level = 100;

  • High priority business level, level = 50;

  • Medium priority business level, level = 30;

  • Low priority business level, level = 10.

3) Channel assignment action A

The channel allocation action selects which channel to allocate among the free channel resources, and it must also reflect the service-level information.

We use a 5-bit binary representation. The lowest bit denotes whether the channel is allocated: 1 if allocated, otherwise 0. The second and third bits represent channel quality: the best quality is 10, the qualifying quality is 01, the worst quality is 00, and 11 is reserved. The fourth and fifth bits encode the service level: emergency service is 11, high priority is 10, medium priority is 01, and low priority is 00. A minimal encoding sketch follows.
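The bit layout can be checked with a short encoder/decoder; the function names are ours, but the field positions follow the description above.

```python
# Quality codes: best = 0b10, qualifying = 0b01, worst = 0b00 (0b11 reserved).
# Service codes: emergency = 0b11, high = 0b10, medium = 0b01, low = 0b00.

def encode_action(allocated: bool, quality: int, service: int) -> int:
    """Pack the allocation flag (bit 0), quality (bits 1-2), service level (bits 3-4)."""
    return (service << 3) | (quality << 1) | int(allocated)

def decode_action(code: int):
    """Unpack a 5-bit action code into (allocated, quality, service)."""
    return bool(code & 0b1), (code >> 1) & 0b11, (code >> 3) & 0b11

# Allocating a best-quality channel to an emergency service gives 0b11101 = 29.
code = encode_action(True, 0b10, 0b11)
assert code == 0b11101 and decode_action(code) == (True, 0b10, 0b11)
```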

In this way, the channel state is discretized into the above four variables, and with N total channels the channel state table of the intelligent channel allocator contains 4 × N elements, called the channel state mode matrix.

After quantifying the three elements of the reinforcement learning algorithm, we can design the intelligent channel allocation algorithm according to the flow of the reinforcement learning algorithm.

The intelligent channel allocation algorithm uses Q-learning and takes the evaluation value Q of a channel allocation action, rather than the state value, as the basis of channel allocation. The Q-value evaluation is relatively objective and requires no assumptions about the current channel allocation, and iterative Q-learning is policy-independent (off-policy): it always selects the maximum Q value as the iterative input, so after enough iterations the learned Q value gradually approaches the true Q value.

The algorithm steps are as follows:

  1) Cycle through the tasks one by one in the order of the channel assignment tasks.

  2) Initialize the state matrix to 0.

  3) Analyze the current channel state S and write the state mode matrix.

  4) Obtain the set of schedulable channel assignment actions from the channel state information and the channel assignment principles.

  5) Select and execute a channel assignment action, obtaining the instantaneous return value R.

  6) When the channel assignment of a business is completed, update Q according to Q(s, a) = r + γ × max Q(s′, a′), with 0 ≤ γ < 1.

  7) Record the state mode, the executed action, and the obtained Q value. If the same state mode and action were recorded before, keep only the larger Q value.

  8) Keep cycling until convergence.

The learning process converges only if every idle channel state and allocation action is visited infinitely often, yet the wireless channel environment is complex and changeable, wireless services are diverse, and users are mobile and uncertain. In a HAPS 5G massive MIMO communication system there will be a huge number of traffic connections, and the state-action pairs of the channel assignment problem form an enormous state space. Searching such a space exhaustively is impractical, and obtaining a complete state-action Q-value table is nearly impossible. To make reinforcement learning effective in this case, we use a back-propagation (BP) neural network to obtain the Q-value estimate quickly. The neural network replaces the Q-value table, and every Q update serves as a training example for the network. When training the intelligent channel assignment BP network, the quantized channel state S is taken as the network input; the network outputs an estimate of the Q value, which is compared with the Q value obtained from the previous learning step, and the BP network is trained toward the expected Q value.

The intelligent channel allocation BP network has three layers: the input layer has 4 × N units (the channel state), the output layer has a single unit, and the hidden layer has 32 units. The layers are fully interconnected; the hidden layer uses a sigmoid (S-shaped) transfer function and the output layer a linear transfer function.

The initial weight matrices of the BP network in this paper have the dimensions

$$ {W}^2:4N\times 32,\quad {W}^3:32\times 1,\quad {b}^2:1\times 32,\quad {b}^3:1\times 1 $$
(5)

The initial weights are chosen randomly in (0, 1) to avoid saddle points and the flat regions of the performance surface. A minimal sketch of the network follows.
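The following NumPy sketch realizes this 4N–32–1 network with the stated transfer functions and (0, 1) initial weights. The learning rate and the plain gradient-descent training step are our assumptions, since the paper does not specify how the network is trained.

```python
import numpy as np

N = 32                        # assumed total number of channels
n_in, n_hidden = 4 * N, 32    # 4 state variables per channel; 32 hidden units

rng = np.random.default_rng(0)
# Initial weights and biases drawn uniformly from (0, 1), matching Eq. (5)'s dimensions.
W2 = rng.uniform(size=(n_in, n_hidden))
b2 = rng.uniform(size=(1, n_hidden))
W3 = rng.uniform(size=(n_hidden, 1))
b3 = rng.uniform(size=(1, 1))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def q_estimate(state):
    """Forward pass: sigmoid hidden layer, linear output layer -> scalar Q estimate."""
    h = sigmoid(state @ W2 + b2)          # state has shape (1, n_in)
    return (h @ W3 + b3).item()

def train_step(state, target_q, lr=0.01):
    """One back-propagation step toward the target Q value (squared-error loss)."""
    global W2, b2, W3, b3
    h = sigmoid(state @ W2 + b2)          # hidden activations, shape (1, 32)
    err = (h @ W3 + b3).item() - target_q # dLoss/dq for 0.5 * (q - target)^2
    dz = err * W3.T * h * (1.0 - h)       # gradient back-propagated through the sigmoid
    W3 -= lr * err * h.T
    b3 -= lr * err
    W2 -= lr * (state.T @ dz)
    b2 -= lr * dz
```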

A large amount of training data is generated as the system trains. Although these data are not the best strategy for the environment at the time, they are experience gained by interacting with the environment and are very helpful for training. We therefore set up a replay_buffer in which new interaction data overwrite the oldest data, and each training step randomly draws a batch from the replay_buffer to train the system.

Each record in the replay_buffer contains the following fields (a minimal buffer sketch follows the list):

  • State: the channel state of the current device;

  • Action: the behavior of the agent in the current state;

  • Reward: the return obtained from the environment after the agent takes the chosen action;

  • Next_state: the state the agent transfers to after taking the chosen action;

  • Done: a flag indicating whether the episode has finished.
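A minimal replay buffer matching this record layout, sketched in Python (the capacity and class name are our assumptions):

```python
import random
from collections import deque, namedtuple

Record = namedtuple("Record", ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size experience store: new records overwrite the oldest,
    and training batches are drawn uniformly at random."""

    def __init__(self, capacity: int = 10_000):
        self.buffer = deque(maxlen=capacity)  # deque discards the oldest items automatically

    def push(self, state, action, reward, next_state, done) -> None:
        self.buffer.append(Record(state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```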

In summary, the intelligent wireless channel allocation algorithm based on reinforcement learning in the HAPS 5G massive MIMO communication system is as follows (a consolidated sketch follows the list):

  a. Initialize the weight matrices of the BP network.

  b. Cycle through the tasks one by one in the order of the channel assignment tasks.

  c. Initialize the channel state matrix to 0.

  d. Analyze the current channel state S and write the state mode matrix.

  e. Obtain the set of schedulable channel assignment actions from the channel state information and the channel assignment principles.

  f. Select and execute a channel assignment action, obtaining the instantaneous return value R.

  g. When the channel assignment of a business is completed, update Q according to Q(s, a) = r + γ × max Q(s′, a′), with 0 ≤ γ < 1.

  h. After updating, continue cycling through the channel assignment tasks of the business.

  i. Take channel states S from the replay_buffer one by one as the input of the BP network, perform the channel assignment action, compare the network's output Q value with the actually updated Q value, train the network to output the desired target Q value, and obtain the new state S.

  j. Record the state mode, the executed action, and the obtained Q value. If the same state mode and action were recorded before, keep only the larger Q value.

  k. Overwrite the oldest data in the replay_buffer with the new record, then cycle from step h to train the BP network until the channel assignment task training for this service is completed.

  l. Repeat from step c until convergence.
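Putting the pieces together, the loop below sketches steps b–l. It builds on the earlier sketches (encode_action, instantaneous_return, q_estimate, train_step, ReplayBuffer); the DummyEnv class and the features encoding are clearly hypothetical stand-ins for the HAPS channel environment, since the paper does not specify those interfaces.

```python
import numpy as np

def features(state, action):
    """Hypothetical input encoding: the paper does not specify how the action enters
    the network, so this sketch writes the action code into a copy of the state."""
    x = state.copy()
    x[0, 0] = float(action)
    return x

class DummyEnv:
    """Illustrative stand-in for the HAPS channel environment."""
    def reset(self):
        return np.zeros((1, n_in))              # step c: zeroed channel state matrix
    def valid_actions(self, state):
        return [encode_action(True, q, lvl)     # step e: schedulable 5-bit actions
                for q in (0b10, 0b01, 0b00) for lvl in range(4)]
    def step(self, action):
        reward = instantaneous_return(True, True, True)  # placeholder principle checks
        return np.zeros((1, n_in)), reward, True

GAMMA, BATCH = 0.9, 32          # assumed discount factor and batch size
env, buffer = DummyEnv(), ReplayBuffer()

for episode in range(100):                       # steps b-c: cycle assignment tasks
    state, done = env.reset(), False
    while not done:                              # steps d-f
        actions = env.valid_actions(state)
        action = max(actions, key=lambda a: q_estimate(features(state, a)))
        next_state, reward, done = env.step(action)
        buffer.push(state, action, reward, next_state, done)   # steps j-k
        for rec in buffer.sample(BATCH):         # steps g-i: train the BP network
            target = rec.reward
            if not rec.done:                     # bootstrap from the next state
                target += GAMMA * max(q_estimate(features(rec.next_state, a))
                                      for a in env.valid_actions(rec.next_state))
            train_step(features(rec.state, rec.action), target)
        state = next_state
```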

The algorithm flow chart is shown in Fig. 2.

Fig. 2 Flow of the intelligent channel allocation algorithm

4 Performance comparison and analysis

4.1 Establishment of simulation environment

Next, we simulate the proposed algorithm to verify its performance. The simulation model is shown in Fig. 3. It uses a typical 4-platform, 32-channel model; the simulation area consists of seven hexagonal cellular cells, all located in the inner ring area covered by the four high-altitude communication platforms. The antenna gain pattern meets the ITU standard for high-altitude platform communication. Mobile users are assumed to be distributed uniformly throughout the service area, each user's antenna points without bias to the high-altitude platform it accesses, all mobile users transmit at the same power, and the propagation environment obeys the free-space path loss law. We analyze the network performance of the intelligent channel allocation algorithm in a mixed-service environment, choosing several business scenarios from the main application scenarios of the HAPS 5G massive MIMO communication system:

  1) Traffic I: cloud AR/VR business. VR/AR requires high-volume data transmission, and the channel quality determines the quality of VR/AR video delivery. The HAPS 5G massive MIMO network needs to allocate channels of the appropriate quality level for different types of AR/VR services in different environments.

  2) Traffic II: vehicle networking business. This includes traditional cars, remote-controlled driving, and unmanned autonomous driving. Vehicle life-cycle maintenance, sensor data packets, and similar traffic require secure, reliable, low-latency, high-bandwidth connections, which are essential on highways and in dense cities. The network needs to assign channels of the appropriate quality level and service priority to different types of vehicles in different environments.

  3) Traffic III: voice business. A large number of voice services remain in the HAPS 5G massive MIMO network, which must allocate them optimally under channel collisions and the sudden arrival of high-priority services.

Fig. 3 HAPS 5G massive MIMO simulation model

4.2 Performance comparison

We choose two classical channel assignment algorithms in HAPS communication systems as baselines: the random channel allocation algorithm [12] and the worst acceptable channel allocation algorithm [13].

Figures 4, 5, and 6 show the channel allocation accuracy of Traffic I, Traffic II, and Traffic III under the three channel resource allocation algorithms, respectively. As can be seen, the accuracy of all three algorithms decreases as the number of agents in the network increases; for the random channel allocation algorithm in particular, the drop becomes especially pronounced once the number of agents passes a certain point. For every traffic type, the reinforcement-learning-based intelligent channel allocation algorithm achieves higher accuracy than the other two algorithms, and its accuracy remains high even when the number of agents is very large. This is because the algorithm dynamically adjusts the channel allocation according to the channel quality and the priority of different services, selecting the channel best suited to the current service's channel quality and service level.

Fig. 4 Channel allocation accuracy of Traffic I

Fig. 5 Channel allocation accuracy of Traffic II

Fig. 6 Channel allocation accuracy of Traffic III

5 Results and discussion

In this paper, an intelligent wireless channel allocation algorithm based on reinforcement learning is proposed for the HAPS 5G massive MIMO communication system. It uses the Q-learning algorithm combined with a back-propagation neural network, enabling the system to learn independently from the environment and to allocate channels intelligently according to channel quality, load, and blocking. The agent perceives state information through continuous interaction with the channel environment and learns the mapping from environment states to actions; a back-propagation neural network replaces the Q-value table, each Q update serves as a training example for updating the evaluation function, and the cycle iterates until the convergence condition is satisfied. The network performance of the proposed algorithm was compared with that of the random channel allocation algorithm and the worst acceptable channel allocation algorithm. For every traffic type, the proposed algorithm achieves higher channel allocation accuracy than the other two, and the accuracy remains high even when the number of agents in the network is very large. The algorithm effectively improves the overall performance of the system.

Abbreviations

5G: The Fifth Generation

AR: Augmented reality

HAPS: High-altitude platform station

MIMO: Multiple-input multiple-output

VR: Virtual reality

References

  1. Z. Cao, X. Zhao, F.M. Soares, 38-GHz millimeter wave beam steered fiber wireless systems for 5G indoor coverage: architectures, devices, and links. IEEE J. Quantum Electron. 53(1), 1–9 (2017)

  2. Y. Yifei, Z. Xiaowu, 5G: vision, scenarios and enabling technologies. ZTE Commun. 13(1), 69–79 (2015)

  3. E. Lemos Cid, M. Portela Taboas, M. Garcia Sanchez, Microcellular radio channel characterization at 60 GHz for 5G communications. IEEE Antennas Wirel. Propag. Lett. (99), 1–4 (2017)

  4. M. Moghaddam, Ready for 5G? IEEE Antennas Propag. Mag. 59(1), 1–4 (2017)

  5. L. Xin, Z. Xueyan, J. Min, 5G-based green broadband communication system design with simultaneous wireless information and power transfer. Phys. Commun. (25), 539–545 (2018)

  6. Z. Na, Y. Wang, X. Li, J. Xia, X. Liu, M. Xiong, W. Lu, Subcarrier allocation based simultaneous wireless information and power transfer algorithm in 5G cooperative OFDM communication systems. Phys. Commun. 29, 164–170 (2018)

  7. Y. Wang, M.C. Erturk, J. Liu, Throughput and delay of single-hop and two-hop aeronautical communication networks. J. Commun. Netw. 17(1), 58–66 (2015)

  8. A. Ibrahim, A.S. Alfa, Using Lagrangian relaxation for radio resource allocation in high altitude platforms. IEEE Trans. Wirel. Commun. 14(10), 5823–5835 (2015)

  9. F. Dong, Y. He, X. Zhou, Optimization and design of HAPS broadband communication networks. 5th Int. Conf. Inf. Sci. Technol., 154–159 (2015)

  10. G. Ming-xiang, G. Qing, G. Xue-mai, A dynamic wireless channel allocation algorithm in HAPS communication based on distance verdict. Electron. J. 41(1), 18–23 (2013)

  11. J. Jingya, Z. Bangning, G. Daoxing, Y. Zhan, A channel assigning algorithm for platform displacement model in HAPS communication system. Telecommunication Eng. 55(8), 906–912 (2015)

  12. D. Grace, C. Spillard, J. Thornton, T.C. Tozer, Channel assignment strategies for high altitude platform spot-beam architecture. 13th IEEE Int. Symp. Personal, Indoor and Mobile Radio Communications 9, 1586–1590 (2002)

  13. P. Pace, G. Aloi, F. De Rango, An integrated satellite-HAP-terrestrial system architecture: resources allocation and traffic management issues. IEEE 59th Vehicular Technol. Conf. 5, 2872–2875 (2004)


Acknowledgements

The research presented in this paper was supported by the Department of Education of Guangdong Province and Shenzhen Science and Technology Innovation Committee, China.

Funding

This paper is supported by the Guangdong Province higher vocational colleges and schools Pearl River scholar funded scheme (2016), the project of Shenzhen Science and Technology Innovation Committee (JCYJ20170817114522834, JCYJ20160608151239996), the Science and Technology Development Center of Ministry of Education of China (2017A15009), and Engineering Applications of Artificial Intelligence Technology Laboratory (PT201701).

Author information

Contributions

MG is the main writer of this paper. He proposed the main idea, deduced the performance of intelligent wireless channel allocation detection, completed the simulation, and analyzed the result. ZW and LW introduced the reinforcement learning algorithm. YC, XC, and JY simulated the algorithm. BP gave some important suggestions for the establishment of simulation environment. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Mingxiang Guan.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.



Cite this article

Guan, M., Wu, Z., Cui, Y. et al. An intelligent wireless channel allocation in HAPS 5G communication system based on reinforcement learning. J Wireless Com Network 2019, 138 (2019). https://doi.org/10.1186/s13638-019-1463-8

