Deep learning-based optimal placement of a mobile HAP for common throughput maximization in wireless powered communication networks

A hybrid access point (HAP) is a node in wireless powered communication networks (WPCN) that distributes energy to wireless devices and receives information from them. Recently, mobile HAPs have emerged for efficient network use, and the throughput of the network depends on their location. There are two throughput metrics, sum throughput and common throughput, defined as the sum and the minimum, respectively, of the throughput between a HAP and each wireless device. Likewise, two types of throughput maximization problems can be considered: sum throughput maximization and common throughput maximization. In this paper, we focus on the latter and propose a deep learning-based methodology for common throughput maximization by optimally placing a mobile HAP in a WPCN. Our study implies that deep learning can be applied to optimize the complex objective of common throughput maximization, which is a convex function or a combination of a few convex functions. The experimental results show that our approach provides better performance than mathematical methods for smaller maps.

Because the distance between the HAP and each WD differs from WD to WD, there is an energy efficiency gap between the WDs, which causes a difference in throughput for each WD. That is, a WD near the HAP receives more energy from the HAP and needs less energy to transmit information, whereas a WD far from the HAP receives less energy but needs more energy to transmit information. To address this unfairness, the worst case, that is, the WD that receives the least energy and needs the most energy, is the most important. For this purpose, we use the concept of common throughput, defined as the minimum of the throughput values of the WDs, and we concentrate on maximizing the common throughput in the WPCN environment.
In [2], Bi and Zhang studied the placement optimization of energy and information access points in WPCN using the bisection search method, a greedy algorithm, a trial-and-error method, and an alternating method for joint AP-EN placement. There can be more than one HAP in the WPCN environment assumed in that paper. Its methodology repeatedly adds HAPs and checks whether each WD satisfies the conditions of the environment.
Mathematical methodologies are generally suitable for optimization problems with relatively simple objective functions, whereas deep learning is suitable for problems with relatively complex objective functions. Some mathematical methods are suitable for relatively simple problems, while deep learning performs better when there are many and varied cases of inputs and corresponding outputs. In our problem, there are very many possible configurations of device locations in a WPCN environment, and the computation and optimization of the common throughput become more complex as the number of devices grows, so mathematical methods have limits in solving this kind of problem. Therefore, although the method in [2] is suitable for this problem, it is worth applying a deep learning method here for comparison. We can generate many and varied data samples in which the inputs are vectors or tensors encoding the locations of devices, and the outputs are the common throughput obtained when the HAP is located at each point. The motivation of this paper is thus to introduce deep learning to optimize the placement of the HAP in a relatively complex WPCN environment. This paper introduces a deep learning-based methodology for placing an HAP in a WPCN environment to maximize common throughput when time allocation is optimized, and shows that this methodology makes a meaningful contribution to solving this problem and performs better than previously studied mathematical methodologies such as [2]. Section 3 describes our HAP placement model, data preparation, training algorithm, and how to find the best HAP placement. Section 4 describes the design and environment of the experiments and the experimental results of our model. Section 5 presents our analysis of the results. Finally, Sect. 6 concludes this paper.

Related works
Our system has only one HAP, and its goal is to maximize the common throughput of the devices. Regarding the system model, Song et al., Lee, Kim et al., Kwan and Fapojuwo, and Thomas and Malarvizhi [3-7], like this research, consider one HAP and many devices. In detail, the HAP and devices in the system of [3] have antennas. In [4], the spectrum of the HAP and devices for both DL WET and UL WIT is the same. The system of [5] consists of a primary WIT system and a secondary WPCN system, and the HAP and the devices belong to the latter. The system of [6] uses radio frequency (RF) signals to harvest energy. The system of [7] includes not only HTT (harvest-then-transmit) but also a backscatter mode. Tang et al. [8] consider many unmanned aerial vehicles (UAVs) and many devices, Hwan et al. [9] many HAPs and many devices, Xie et al. [10] a UAV and many devices, Biason and Zorzi [11] an access point (AP) and two devices recharged by the AP, and Cao [12] a relay communication system and many devices. Chi et al. [13] compare the performance of TDMA- and NOMA-based WPCN for the energy provision (EP) minimization problem with network throughput constraints, Kwan and Fapojuwo [14] try to maximize the sum throughput of a wireless sensor network using three protocols, and [15] tries to optimize time allocation for backscatter-assisted WPCN to maximize total throughput.

Regarding the objective functions and constraints, Tang et al., Xie et al. and Biason and Zorzi [8, 10, 11] try to maximize common throughput, in other words, minimum throughput, and [11] tries to maximize the long-term minimum throughput. Hwan et al. [9] try to maximize the sum-rate performance. Song et al. and Kim et al. [3, 5] also use a transmit covariance matrix for DL WET. Lee [4] defines the problem as maximizing the sum throughput for U-CWPCN and O-CWPCN (two overlay-based cognitive WPCN models). Kwan and Fapojuwo [6, 14] use bandwidth allocation in the optimization. Chi et al. [13] try to minimize the EP of the H-sink. Cao [12] has three divided time slots as constrained variables. Ramezani and Jamalipour [15] use the achievable throughput of both the users and the EIRs in two phases. Thomas and Malarvizhi [7] define the sum throughput of all users as the sum of the throughputs of the two modes, HTT and backscatter. Therefore, our research can be compared with [10].

Regarding solution methods, [5, 10] use the Lagrange dual method and subgradient-based methods such as the ellipsoid method. Chi et al. [13] also use the bisection method for time allocation, and [5] also uses a line search method. Cao [12] uses SDP (semi-definite programming) relaxation to derive the optimal solution. Tang et al. [8] used multi-agent deep Q-learning (DQL), Hwan et al. [9] used multi-agent deep reinforcement learning (MADRL) and distributed reinforcement learning, Biason and Zorzi [11] used a Markov chain and a Markov decision process, Kwan and Fapojuwo [14] used their own three protocols, and [6] used the MS-BABF/Hybrid-STF method. The method applied in [15] is similar to the mathematical optimization methods used in [3-5, 10, 12, 13] but is combined with the block coordinate descent (BCD) method. Thomas and Malarvizhi [7] describe no particular method for finding the solution.
Consequently, Xie et al. [10], the research most comparable to ours, do not use machine learning methods. Therefore, we can apply machine learning methods to solve this problem, which may yield an improved way to find the optimal placement of the HAP.
Methods: using HAP placement model

Figure 1 describes the system architecture of the model. Let us explain our model using the definitions given in the overview below. The mobile HAP can be placed at any location in the environment and can move to any other location. The goal is to maximize the common throughput, defined as the minimum throughput between the HAP and each WD, by optimizing the HAP placement. So, the HAP needs to move to the location where the minimum throughput is maximized. In the WDs placement map, the HAP can be located at any block of the map and should be located at the best throughput point. The rightmost part of Fig. 1 shows the minimum throughput computed for each block, assuming the HAP is located at that block, together with the best throughput point. Figure 2 is the flow chart of the HAP placement model. The model is composed of three phases. First, "making data" creates the training and test data. Next, "training using data" processes the data into training and test data for the deep learning model and trains the model. Last, "finding the best point" finds the best HAP placement point using the throughput map derived from the model.

Overview
In this paper, we map the physical wireless channel environment onto a 2-dimensional array. As in [1], we assume that the environment is in free space, so the path loss follows the free-space rule. The only exception is that, for each WD, when the distance between the HAP and the WD is less than a specific value, we compute the throughput as if the distance were that value. This is discussed in detail in Sect. 3.2.
From now on, we use the following definitions. The WDs placement map is the grid map representing the environment, as in Fig. 1. N and M are the number of rows and columns of the WDs placement map, respectively, and K is the number of wireless devices in the WDs placement map. A block is each grid cell in the WDs placement map.

Making data
We computed and used Eq. (1), obtained by combining Eqs. (7) and (8) in [17], for the throughput. To make WDPM_i(N, M, K), i = 0, ..., m_total − 1, where m_total is the total number of training and test maps, first define a grid map with N rows and M columns, N × M blocks in total. Then place a WD on a randomly selected point without a HAP, repeating this K times. To make TM_i(N, M), i = 0, ..., m_total − 1, from these WDPM_i(N, M, K), place the HAP at each point of WDPM_i(N, M, K) and compute the throughput for that HAP location and each WD using Algorithm 1, where the throughput is computed using (1). Procedure getThrput finds the optimal time allocation. Substituting the parameter values (Γ = 9.8 and σ = 0.001), where d is the distance between the HAP and each WD, (1) can be converted into (2). To prevent a divide-by-zero error and to consider the limit of the throughput, we assume the distance is 1.0 when the actual distance is less than 1.0.
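To make the data-generation phase concrete, the following Python sketch illustrates how a WDPM_i and its throughput map TM_i can be built. It is only an illustration under our assumptions: the names make_wdpm, throughput_map and get_thrput are hypothetical, and the actual throughput computation of (1) and the time-allocation search of Algorithm 1 are abstracted behind get_thrput.

```python
import numpy as np

def make_wdpm(N, M, K, rng):
    # Place K wireless devices on K distinct, randomly selected blocks of an N x M grid.
    wdpm = np.zeros((N, M), dtype=np.int8)
    chosen = rng.choice(N * M, size=K, replace=False)
    wdpm[np.unravel_index(chosen, (N, M))] = 1
    return wdpm

def throughput_map(wdpm, get_thrput):
    # For every candidate HAP block, compute the common (minimum) throughput over all WDs.
    # get_thrput stands in for Algorithm 1 / Eq. (1): it takes the HAP-to-WD distances and
    # returns the common throughput under the optimal time allocation.
    N, M = wdpm.shape
    wd_rows, wd_cols = np.nonzero(wdpm)
    tm = np.zeros((N, M))
    for n in range(N):
        for m in range(M):
            d = np.hypot(wd_rows - n, wd_cols - m)
            d = np.maximum(d, 1.0)  # distances below 1.0 are treated as 1.0 (see above)
            tm[n, m] = get_thrput(d)
    return tm

# Example: wdpm = make_wdpm(8, 8, 3, np.random.default_rng(0)); tm = throughput_map(wdpm, my_thrput)
```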

Training
First, make the input data for training and testing, supposing that the numbers of training and test samples are m_1 and m_2, respectively. The model treats the first m_1 maps as training data and the next m_2 maps as test data. The input data are N × M maps whose value at each block is −1 when a WD is on that block and 0 otherwise. Then make the output data for training from the corresponding throughput maps. The output data are N × M maps whose value at the block with row index n and column index m is V″_{i,n,m}, defined below. We train the model described in Fig. 3 with the Adam optimizer [18], a learning rate of 0.0001, and 1000 epochs.
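The Keras sketch below shows one way such a training setup can look. The layer configuration is an assumption on our part, since the exact architecture of Fig. 3 is not reproduced in this section; only the Adam optimizer, the learning rate of 0.0001 and the 1000 epochs follow the text, and the sigmoid output activation is assumed because the inverse sigmoid is applied to the outputs in the next subsection.

```python
import tensorflow as tf

def build_model(N, M):
    # Illustrative stand-in for the network of Fig. 3: a small fully convolutional model
    # mapping an N x M WD map (-1 at WD blocks, 0 elsewhere) to an N x M value map.
    inputs = tf.keras.Input(shape=(N, M, 1))
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    outputs = tf.keras.layers.Conv2D(1, 1, padding="same", activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
                  loss="mean_squared_error")
    return model

# x_train: shape (m_1, N, M, 1), -1 at WD blocks and 0 elsewhere
# y_train: shape (m_1, N, M, 1), the target values V''_{i,n,m}
# model = build_model(N, M)
# model.fit(x_train, y_train, epochs=1000, batch_size=32)
```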

Finding the best points
Using the test input data, the model finds the best point for HAP placement. In (4), invSigmoid(x) is the inverse function of sigmoid(x), defined as ln(x/(1 − x)). For each output map, the model finds the maximum value among the values in the blocks of the map. Let us call the row and column indices of this value n_M and m_M, respectively, and call the maximum value V″_{i,n_M,m_M}. Then the row index n_optimal and column index m_optimal of the optimal HAP location are computed by (5) and (6), respectively, and BTP_i(N, M) is computed by (7), as described in Fig. 4.
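A minimal sketch of this step is given below; it covers only locating the maximum-value block (n_M, m_M) and the inverse sigmoid of (4). The sub-block refinement of (5)-(7) is omitted because those equations are not reproduced in this section, and the function names are hypothetical.

```python
import numpy as np

def inv_sigmoid(x):
    # Inverse of the logistic sigmoid, ln(x / (1 - x)), as in (4).
    return np.log(x / (1.0 - x))

def find_max_block(output_map):
    # Return the block indices (n_M, m_M) of the largest predicted value and the value itself.
    n_M, m_M = np.unravel_index(np.argmax(output_map), output_map.shape)
    return n_M, m_M, output_map[n_M, m_M]
```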

Experiment design and test metrics
CT.AVERAGE is the average common throughput over the test maps with the corresponding BTP_i(N, M), i = m_1, ..., m_1 + m_2 − 1; CT.AVGMAX is the average of the maximum common throughput value of the throughput map corresponding to each test map; and CT.RATE is the ratio between the sum of C_i and the sum of MC_i over all test maps, which also equals the ratio between CT.AVERAGE and CT.AVGMAX. We also define the performance rate PR as in (11), which measures how well our methodology performs compared to the methodology used in the original paper; the original paper in (11) means [2].
In (11), M_1 is our methodology, and M_0 is the methodology in the original paper. CT.RATE can be larger than 1.0 because CT.AVGMAX is the average of the largest value among the values at the discrete blocks of the corresponding TM_i, whereas CT.AVERAGE is the average common throughput with a non-discrete HAP location.
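As a rough illustration, the sketch below computes these metrics from per-test-map values. The array names are ours, and since (11) is not reproduced in this section, the form of PR in the final comment is only an assumption.

```python
import numpy as np

def ct_metrics(C, MC):
    # C[i]:  common throughput achieved with the predicted best point BTP_i
    # MC[i]: largest common throughput over the discrete blocks of the corresponding TM_i
    ct_average = np.mean(C)           # CT.AVERAGE
    ct_avgmax = np.mean(MC)           # CT.AVGMAX
    ct_rate = np.sum(C) / np.sum(MC)  # CT.RATE, equal to CT.AVERAGE / CT.AVGMAX
    return ct_average, ct_avgmax, ct_rate

# Assumed form of the performance rate: PR = CT.RATE(M_1) / CT.RATE(M_0),
# where M_1 is our methodology and M_0 is the methodology of [2].
```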

Experimental environment
The computer system information for our experiment is as follows. The operating system is Windows 10 Pro 64-bit (10.0, build 18363), the system manufacturer is LG Electronics, the system model is 17ZD90N-VX5BK, the BIOS is C2ZE0160 X64, the processor is an Intel(R) Core i5-1035G7 CPU @ 1.20 GHz (8 CPUs), ~1.5 GHz, and the memory is 16384 MB RAM. The programming language is Python 3.7.4, and we used the NumPy [20], TensorFlow [21] and Keras libraries. The experiment code can be downloaded from https://github.com/WannaBeSuperteur/2020/tree/master/WPCN. We used π = 3.141592654 for the methodology in [2], and the algorithm to solve (20) in [2] is described in Algorithm 2.

For our methodology, the CT.RATE value increases when the number of WDs increases and decreases when the size of the maps increases, and CT.AVERAGE decreases when the number of WDs and the size of the maps increase. For the methodology in the original paper, CT.RATE increases when the size of the maps increases but has no significant correlation with the number of WDs. Table 2 shows the values of CT.AVGMAX and PR for each map size and number of WDs. The unit for size is one block, as mentioned in Sect. 3; for example, a size of 12 × 12 means that the environment contains 12 rows, and each row contains 12 blocks. CT.AVGMAX decreases when the number of WDs and the size of the maps increase, and PR decreases when the size of the maps increases but has no significant correlation with the number of WDs. For smaller sizes, our methodology shows significantly better performance (PR > 1) than the methodology in the original paper; for the 12 × 12 size, the two methods show almost the same performance (PR ≈ 1); and for the 16 × 16 size, our methodology shows worse performance (PR < 1). Figure 6 is the line chart representation of Tables 1 and 2, and Fig. 7 is the bar chart comparing our methodology and the methodology in the original paper.

Discussion
Our method shows a higher CT.RATE for smaller maps, and the methodology in the original paper shows a higher CT.RATE for larger maps. The reason for the former is, first, that common throughput usually depends on the WDs near the boundary of the environment, and these WDs usually enlarge the minimum value of the maximum possible distance between the HAP and each WD. For larger maps, the influence of the blocks containing these WDs on the learning decreases, because the number of blocks influencing the learning is larger, so the influence of each block decreases. Second, there are fewer possible cases for smaller maps because they have fewer blocks, so our model can be more accurate. The reason for the latter is that the locations of WDs are less realistic for smaller maps, because both their x and y coordinates are always integers, so the methodology in the original paper is less accurate there. Tables 3, 4 and 5 describe the average, standard deviation and 95% confidence interval of some variables from the experimental results using 100 test dataset samples, that is, WDPM_i(N, M, K) and TM_i(N, M), i = m_1, ..., m_1 + m_2 − 1, where m_1 = 900 and m_2 = 100. When the value of 'rows' is r, the size of the grid map is r × r. We computed the confidence interval using (12), where X̄ and σ are the average (see Table 3) and standard deviation (see Table 4) of the sample values, respectively, and n is the number of samples for each case, that is, 100 in the experiment.
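For reference, such a confidence interval can be computed as in the short sketch below, assuming (12) is the usual normal-approximation interval X̄ ± 1.96·σ/√n; this form is an assumption on our part, since the equation itself is not reproduced in this section.

```python
import numpy as np

def confidence_interval_95(samples):
    # 95% confidence interval for the mean, assuming X_bar +/- 1.96 * sigma / sqrt(n).
    x = np.asarray(samples, dtype=float)
    mean = x.mean()
    half_width = 1.96 * x.std(ddof=1) / np.sqrt(x.size)  # sample standard deviation
    return mean - half_width, mean + half_width
```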
On average, the Y-axis and X-axis values of the HAP location maximizing the common throughput should be (r − 1)/2. In Table 5, one can see that all the confidence intervals for both the Y and X values include (r − 1)/2 for all cases with r = 8, r = 12 and r = 16. Y/rows and X/rows should have a positive correlation with r, where r is the number of rows, because Y/r and X/r = ((r − 1)/2)/r = (r − 1)/2r increase when r increases. In Table 5, one can see that this is true, and when comparing the cases for rows = 8 and rows = 16, the confidence intervals do not overlap for the cases with WDs = 10. These results confirm that our method randomly placed wireless devices on the grid maps for the test data. In addition, one can see that the portion of time allocated to the HAP (HAPtime) has a positive correlation with the number of rows in the grid map (rows); in Table 5, the confidence intervals never overlap when the number of rows differs. (Tables 3, 4 and 5 report, for each WDPM_i(N, M, K), the values of n_optimal and m_optimal, the portion of time allocated to the HAP computed with the getThrput function in Algorithm 1 with HAPpoint = [n_optimal, m_optimal], and the values of n_optimal/rows and m_optimal/rows, respectively.) This indicates that when the number of rows in the grid map increases, the portion of time allocated to the HAP also increases.

Conclusion
We showed that our deep learning-based method performs better than the mathematical method in the original paper [2] when the map size is smaller than 12 × 12. Although our method may show worse performance when the size is larger than 12 × 12, our approach of finding the optimal placement and time allocation for the HAP using deep learning is meaningful because there has been no previous attempt to apply deep learning to this problem. In addition, we found that, with the HAP locations derived by our method, the portion of time allocated to the HAP has a positive correlation with the size of the grid map (8 × 8, 12 × 12 and 16 × 16). There are some limits to our study. First, the experimental setting favors our method in that it uses only one HAP, whereas the method in the original paper may, and commonly does, use more than one HAP. Second, we studied only a few conditions: three options for the map size and two options for the number of WDs. Therefore, future research should consider more options for the map size, the number of WDs, and the number of HAPs.