 Research
 Open Access
User selection and dynamic power allocation in the SWIPT-NOMA relay system
EURASIP Journal on Wireless Communications and Networking volume 2021, Article number: 124 (2021)
Abstract
Non-orthogonal multiple access (NOMA) technology provides an effective solution for massive access with high data rate demands in new-generation mobile networks. This paper combines NOMA with a simultaneous wireless information and power transfer (SWIPT) relay to maximize the sum rate of the downlink system. To that end, it is critical to effectively select which users access the system and how power is allocated to the accessed users. Because traditional optimization methods have difficulty solving nonlinear and nonconvex problems, this paper proposes a user selection and dynamic power allocation (USDPA) scheme for the NOMA-SWIPT relay system based on neural networks. We establish a user selection network utilizing a deep neural network (DNN) and propose a power allocation network using deep reinforcement learning. The simulation results show that the proposed scheme achieves better performance than other related schemes, especially for high quality of service requirements.
Introduction
Due to the massive amount of wireless equipment accessed via the internet, researchers have focused on the high demand for charging wireless mobile terminals (MTs). Thus, as a promising green communication solution, simultaneous wireless information and power transfer (SWIPT) was introduced to increase the battery life. SWIPT can achieve significant gains in energy consumption and spectrum efficiency (SE), improve interference management, and reduce transmission delays by enabling the simultaneous transmission of power and information [1]. Two practical receiving methods exist for the SWIPT strategy, i.e., time switching (TS) and power splitting (PS), to harvest energy and decode information. In addition, a cooperative relaying (CoR) method combined with SWIPT was proposed to increase network reliability and expand the signal coverage area [2, 3]. The non-orthogonal multiple access (NOMA) scheme has been regarded as a promising technique to improve the SE for 5G and future communication systems because the signals of different MTs in NOMA can be multiplexed on the same resource elements [4]. Therefore, some researchers combined NOMA with SWIPT relay technology to improve the SE and achieve green communication [4, 5].
Numerous studies were conducted on wireless resource management to improve the performance of the SWIPT-NOMA relay system [6,7,8,9,10,11]. Reference [6] utilized an average power allocation scheme in the downlink and fixed power control in the uplink to evaluate the ergodic rate; however, the strategy does not guarantee that all signals are successfully decoded in the downlink and uplink. Reference [7] compared a cognitive radio-inspired power allocation scheme with a fixed power allocation scheme to ensure the fairness of the data rate. Reference [8] analyzed the error probability of the SWIPT-NOMA system by using a fixed power allocation scheme. In [9], the outage probability was regarded as the optimization function to obtain the power allocation factors. The analysis in [10] was in line with realistic scenarios regarding the impact of imperfect channel state information (ICSI) and residual hardware impairments (RHIs). Reference [11] evaluated the performance of a complex SWIPT scenario that allocated fixed power in the downlink. However, most papers focused on fixed power allocation, whereas artificial intelligence-based (AI) schemes for wireless resource allocation have not been well researched.
Many studies investigated access schemes in SWIPT-NOMA relay systems [12,13,14]. Reference [12] analyzed a SWIPT-NOMA relay system that considered channel estimation errors (CEEs) and RHIs; however, when all users accessed the system, more interference occurred at the receivers. Reference [13] investigated the outage probability when choosing an optimal near destination node and an optimal far destination node, with the near node used as the relay. Nevertheless, neither [12] nor [13] considered the channel gain from the relay to the MT, causing performance degradation. The schemes in [7,8,9,10,11] granted access to all users, which prevented the successful decoding of all signals and generated more interference at the receivers. Furthermore, the performance of such all-users-access schemes is not high [15, 16] despite their high model complexity. In contrast to [7,8,9,10,11,12,13], reference [14] proposed granting access only to users that had fed back channel state information (CSI); however, its algorithm had difficulty converging.
Therefore, it is imperative to develop a scheme that provides user access and allocates power to qualified users. AI techniques can extract valuable information from data to learn and support different functions for optimization, prediction, and decision-making in mobile edge computing, mobility prediction, optimal handover solutions, and spectrum management [17]. Deep reinforcement learning (DRL) can solve real-time and dynamic decision-making problems for power allocation [18,19,20]. Reference [18] proposed a deep Q network (DQN) for each MT to obtain the optimal power allocation scheme. The objective was to reduce the size of the state space; however, this distributed power allocation method has no information interaction between the MTs, resulting in power allocation conflicts. Reference [19] proposed a two-step model-free DRL-based power control scheme to maximize the long-term sum energy efficiency (EE). Based on a multi-carrier NOMA network with SWIPT, reference [20] proposed to use a deep belief network (DBN) to approximate the optimal power allocation.
Contribution
To deal with the problems of the traditional methods [7,8,9,10,11,12,13,14,15,16] and inspired by the above studies [17,18,19,20], we propose a combined user selection and dynamic power allocation (USDPA) scheme that chooses the best users to access the system and determines the optimal power allocation to maximize the sum rate. The main advantages and contributions of this paper are summarized as follows.

The USDPA scheme is proposed for the SWIPT-NOMA relay system to optimize the user access and the power allocation simultaneously and thereby maximize the sum rate, because traditional optimization methods have difficulty solving nonlinear and nonconvex problems. More importantly, the results show that our algorithm can successfully grant access to more users than comparable algorithms.

We use a deep neural network (DNN) for the user selection network to generate the access decision. Subsequently, the access decision is mapped to several candidate access actions, whose number changes adaptively. In addition, the results show that the model converges quickly without adding computational complexity.

We utilize a DQN to generate the optimal power allocation for each candidate access action. Afterwards, the pair of access action and power allocation action with the maximum sum rate is applied in the system. The best power allocation action is stored in the replay memory to train this network.

Finally, we compare the performance of the USDPA with other schemes. The simulations under different scenarios show that the proposed algorithm improves quality of service (QoS) and can achieve better performance than other related schemes.
Organization
The remainder of this paper is organized as follows. Section 2 describes the system model and the problem formulation of the user selection and power allocation model for the SWIPT-NOMA relay system. Section 3 presents the USDPA scheme, including the user selection network and the power allocation network. Section 4 presents the experimental results and analysis, including the convergence, the sum rate, and the number of successful communication users (NSCUs). Finally, the conclusions are summarized in Sect. 5.
System model and problem formulation
System model
We consider a system model that includes a base station (BS), a relay employing a decode-and-forward (DF) protocol, and \(N\) destinations, as illustrated in Fig. 1. Hereafter, subscripts \({\text{S}},{\text{R}}\), and \(D_{i}\) are used for the BS, relay, and destination \(i\), respectively. Sector \(S_{1}\) has radius \(\Upsilon_{{S_{1} }}\), with the BS at its center and an angle \(\phi\); sector \(S_{2}\) has radius \(\Upsilon_{{S_{2} }}\) with the same center and angle as sector \(S_{1}\). The relay is located on the circular arc of radius \(\Upsilon_{{S_{1} }}\), and the \(N\) destinations are randomly and uniformly distributed in the region between \(\Upsilon_{{S_{1} }}\) and \(\Upsilon_{{S_{2} }}\). Each destination node and the relay have a single antenna operating in half-duplex (HD) mode. We assume that all small-scale fading in the system is independent and identically distributed Rayleigh fading. The channel coefficients of the links from the BS to the relay and from the relay to \(D_{i}\) are \(h_{S,R} \sim {\mathcal{C}\mathcal{N}}\left( {0,d_{S,R}^{-\tau} } \right)\) and \(h_{{R,D_{i} }} \sim {\mathcal{C}\mathcal{N}}\left( {0,d_{{R,D_{i} }}^{-\tau} } \right)\), respectively, where \(d_{i,j}\) denotes the distance between node \(i\) and node \(j\), and \(\tau\) is the path-loss exponent.
For simplicity, we assume that the transmission time \(T = 1\) and the bandwidth \(B = 1\). The power splitting relay (PSR) strategy is used (Fig. 2). Within the duration of each \(\alpha T\), the relay performs energy harvesting (EH) and information decoding (ID); within each \((1-\alpha)T\) period, the relay performs information forwarding (IF) in the NOMA mode. \(\rho\) is the power splitting factor for harvesting energy, and \((1-\rho)\) is for decoding information. At the end of \(\alpha T\) of each slot, the relay receives the signal from the BS, which can be expressed as:
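(Reconstructed display equation, consistent with the surrounding definitions; \(P_S\) denotes the BS transmit power, and the exact form in the original may differ.)

\[ y_{R} = \sqrt{P_{S}}\, h_{S,R}\, x_{S} + n_{R}, \]

of which a fraction \(\rho\) of the received power is routed to EH and \(1-\rho\) to ID.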
where \(x_{S}\) is the signal transmitted by the BS to the relay. \(n_{R} \sim {\mathcal{C}\mathcal{N}}\left( {0,\sigma_{R}^{2} } \right)\) is the additive white Gaussian noise.
The energy harvested by the relay is defined as follows:
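(Reconstructed using the standard linear EH model implied by the symbols defined here:)

\[ E_{h} = \eta\, \rho\, P_{S} \left| h_{S,R} \right|^{2} \alpha T, \]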
where \(\eta\) is the energy conversion efficiency factor. The remaining battery power of the relay is \(B_{r} \left( t \right)\) at the beginning of each slot and \(B_{r} \left( 0 \right)\) = 0 in the first time slot. We assume that the harvested energy is much less than the maximum storage capacity of the relay. After \(\alpha T\) of each slot, the total energy of the battery is:
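(A form that follows directly from the text:)

\[ B(t) = B_{r}(t-1) + E_{h}(t), \]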
where \(B_{r}(t-1)\) is the remaining energy of the previous time slot. The relay decodes the received signal and forwards the superimposed signal through NOMA. In each \((1-\alpha)T\), the maximum transmitting power of the relay can be expressed as:
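(Reconstructed under the assumption that the battery can be fully drained over the IF phase:)

\[ P_{R}(t) = \frac{B(t)}{(1-\alpha)T}. \]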
We assume that \(D_{j}\) belongs to a set \({\mathbb{C}}\) that includes the qualified access users, where \(\left| {\mathbb{C}} \right| = \varpi\). The received signals from the relay can be defined as:
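(A standard NOMA superposition consistent with the definitions below, reconstructed here:)

\[ y_{D_{j}} = h_{R,D_{j}} \sum_{i \in \mathbb{C}} \sqrt{\lambda_{i} P_{R}}\, x_{i} + n_{D}, \]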
where \(\lambda_{j}\) is the power allocation factor assigned by the relay to signal \(x_{j}\), and \(n_{D} \sim {\mathcal{C}\mathcal{N}}\left( {0,\sigma_{D}^{2} } \right)\) is the additive white Gaussian noise.
The expression of the remaining energy of the battery at the relay after each time slot can be expressed as:
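(A plausible reconstruction: the energy left after transmitting with the allocated powers during the IF phase.)

\[ B_{r}(t) = B(t) - (1-\alpha)T \sum_{j \in \mathbb{C}} \lambda_{j} P_{R}(t). \]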
We implement successive interference cancellation (SIC) based on the power ranking from strong to weak. If the \(j\)th user is able to eliminate the signals of weaker users, the signal-to-interference-plus-noise ratio (SINR) for decoding its own signal is:
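(Reconstructed standard downlink NOMA SINR; the residual interference stems from the signals with smaller power factors, which are decoded later.)

\[ \gamma_{j} = \frac{\lambda_{j}\, \rho_{R} \left| h_{R,D_{j}} \right|^{2}}{\rho_{R} \left| h_{R,D_{j}} \right|^{2} \sum_{i \in \mathbb{C},\, \lambda_{i} < \lambda_{j}} \lambda_{i} + 1}, \]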
where \(\rho_{R} = P_{R} /\sigma_{D}^{2}\). The achievable data rate at \(D_{j}\) is defined as follows:
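(Reconstructed; the factor \(1-\alpha\) accounting for the IF phase duration under unit bandwidth is an assumption.)

\[ R_{j} = (1-\alpha) \log_{2}\left( 1 + \gamma_{j} \right). \]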
The sum rate of this system is as follows:
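(Summing over the qualified access users:)

\[ R_{\mathrm{sum}} = \sum_{j \in \mathbb{C}} R_{j}. \]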
Problem formulation
We consider the maximum sum rate of the SWIPT-NOMA relay system; thus, the optimization problem is expressed as:
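(A reconstruction consistent with the constraint descriptions below; the original display form may differ.)

\[
\begin{aligned}
\max_{\mathbb{C},\, \Lambda}\quad & R_{\mathrm{sum}} \\
\text{s.t.}\quad C1{:}\quad & D_{j} \in \mathbb{C}, \\
C2{:}\quad & R_{j} \ge R_{th}, \quad \forall D_{j} \in \mathbb{C}, \\
C3{:}\quad & \sum_{j \in \mathbb{C}} \lambda_{j} P_{R} \le P_{R},
\end{aligned}
\]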
where the set \({\mathbb{C}}\) includes the qualified access users, and \(R_{th}\) and \(\Lambda\) are the data rate threshold for \(D_{j}\) to decode the signal successfully and the set of power allocation factors, respectively.
Constraint \(C1\) requires that each access user belong to the qualified set \({\mathbb{C}}\); constraint \(C2\) represents the minimum QoS requirement for the selected access users, i.e., the data rate of each qualified access user needs to be larger than the rate threshold; constraint \(C3\) states that the total allocated power cannot be larger than the transmission power of the relay.
A user selection network is established to reduce the interference caused by the access of all users. In addition, since adjusting the power allocation by conventional search is inefficient, we propose a DQN-based algorithm to solve this problem.
USDPA scheme
In this section, we describe the USDPA scheme to determine user access and power allocation. The USDPA algorithm for the downlink SWIPT-NOMA relay system is presented in Fig. 3. We first determine the user access based on the user selection network and subsequently derive the power allocation based on the DQN.
The relay forwards the signals to the users according to the actions of the user selection network and the power allocation network. By jointly optimizing the user access and power allocation of the system, we maximize the sum rate. The USDPA algorithm is shown in Algorithm 1.
User selection network
In this part, we design an access policy that rapidly generates an access decision \(Y\left( t \right)\):
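(Consistent with the network mapping given in the next subsection:)

\[ Y(t) = f_{\omega_{1}}\left( h_{R,D}(t) \right) = \left\{ y_{1}(t), \ldots, y_{N}(t) \right\}, \qquad y_{j}(t) \in [0,1], \]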
where \(Y\left( t \right)\) represents the output of the user selection network.
User selection algorithm
The user selection network has an embedded parameter \(\omega_{1} \left( t \right)\) that connects the hidden neurons. At the beginning of each slot, the user selection network uses \(h_{R,D} \left( t \right)\) as the input and outputs a relaxed user access action \(Y\left( t \right)\) with \(N\) dimensions according to the access policy \(\pi_{x} \left( t \right)\) and the parameter \(\omega_{1} \left( t \right)\). Since each value in \(Y\left( t \right)\) lies between 0 and 1, it is difficult to determine which users should access the system; thus, we design a mapping rule to quantize the output \(Y\left( t \right)\). According to this rule, \(Y\left( t \right)\) is mapped into \(W\) access vectors, and the one with the maximum sum rate is the best access vector \(q^{*} \left( t \right)\).
A four-layer DNN is designed with one input layer, two hidden layers, and one output layer. The dimensions of the input \(h_{R,D} \left( t \right)\) and the output \(Y\left( t \right)\) are \(N\), which denotes the number of destination nodes. The two hidden layers use the ReLU activation function, and the output layer uses a sigmoid activation function. In the \(t\)th slot, the output of the user selection network can be expressed as \(Y\left( t \right) = f_{{\omega_{1} }} \left( {h_{R,D} \left( t \right)} \right)\). The user selection algorithm is shown in Algorithm 2.
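For concreteness, a minimal TensorFlow 2 sketch of this four-layer network is given below; the hidden-layer widths and the learning rate are illustrative assumptions, as the text does not specify them here.

```python
import tensorflow as tf

N = 10  # number of destination nodes (illustrative value)

# Four-layer user selection DNN: input h_{R,D}(t), two ReLU hidden layers,
# and a sigmoid output producing the relaxed access decision Y(t) in [0,1]^N.
user_selection_net = tf.keras.Sequential([
    tf.keras.Input(shape=(N,)),
    tf.keras.layers.Dense(120, activation="relu"),   # f1 neurons (assumed)
    tf.keras.layers.Dense(80, activation="relu"),    # f2 neurons (assumed)
    tf.keras.layers.Dense(N, activation="sigmoid"),  # Y(t)
])

# Trained with Adam (learning rate theta_1) against the best access vector
# q*(t) as the label, as described in the training subsection.
user_selection_net.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="binary_crossentropy",
)
```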
The mapping rule
The output \(Y\left( t \right)\) of the user selection network is mapped to \(W\) vectors. Each element of a vector is either 0 or 1, where 0 means the corresponding user does not access the system and 1 means it does. It should be noted that there are \(2^{N}\) possible vectors; consequently, \(W \in \left[ {1,2^{N} } \right]\), and its initial value is set to \(N\). Reference [21] proved the effectiveness of this method, using the same binary representation in edge computing to evaluate the output of the DNN. The detailed mapping rules are as follows:
(1) \(q_{1}\) is the first mapping vector of \(Y\left( t \right)\) and is obtained by comparing each element of \(Y\left( t \right)\) with 0.5:
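(A reconstruction of the thresholding rule described above:)

\[ q_{1}[j] = \begin{cases} 1, & Y[j] > 0.5, \\ 0, & Y[j] \le 0.5, \end{cases} \]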
where \(j = 1, \ldots ,N\).
(2) The new sequence \(Y^{*} \left( t \right)\) is obtained by sorting \(Y\left( t \right)\) according to the absolute value of the difference between \(Y\left( t \right)\) and 0.5.
(3) The values of the remaining \(W-1\) mapping vectors are related to \(Y^{*} \left( t \right)\), and the \(l\)th mapping vector is as follows:
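(Reconstructed following the order-preserving quantization of [21]:)

\[ q_{l}[j] = \begin{cases} 1, & Y[j] > Y^{*}[l], \\ 0, & Y[j] < Y^{*}[l], \end{cases} \]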
where \(l = 2, \ldots ,W-1\) and \(j = 1, \ldots ,N\). Specifically, when \(Y\left[ j \right] = Y^{*} \left[ l \right]\),
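the element is set by the tie-breaking rule of [21] (a reconstruction):

\[ q_{l}[j] = \begin{cases} 1, & Y^{*}[l] \le 0.5, \\ 0, & Y^{*}[l] > 0.5. \end{cases} \]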
After every \(\zeta\) slots, \(W^{*} = \min (\max (W(t-1), \ldots ,W(t-\zeta)) + 1,N)\), where \(W(t-\zeta)\) is the index, among the \(W\) vectors, of the best user selection vector in slot \((t-\zeta)\).
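A short Python/NumPy sketch of the full mapping rule may help; the index conventions are assumptions, since the rule above is specified in prose.

```python
import numpy as np

def map_access_vectors(Y, W):
    """Quantize the relaxed decision Y in [0,1]^N into W binary access
    vectors, following rules (1)-(3) above (a sketch; tie-breaking as in
    the order-preserving quantization of [21])."""
    N = len(Y)
    candidates = [(Y > 0.5).astype(int)]        # rule (1): threshold at 0.5
    order = np.argsort(np.abs(Y - 0.5))         # rule (2): sort by |Y - 0.5|
    for l in range(1, W):                       # rule (3): remaining W-1 vectors
        ref = Y[order[l - 1]]                   # Y*[l], the l-th value closest to 0.5
        q = np.zeros(N, dtype=int)
        for j in range(N):
            if Y[j] > ref:
                q[j] = 1
            elif Y[j] == ref:                   # tie at Y*[l]
                q[j] = 1 if ref <= 0.5 else 0
        candidates.append(q)
    return candidates

# Example: a relaxed decision for N = 4 users mapped to W = 3 candidates.
print(map_access_vectors(np.array([0.9, 0.4, 0.55, 0.1]), W=3))
```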
The training of the user selection network
To maximize the sum rate, \(h_{S,R} \left( t \right)\) and each access vector \(q_{k} \left( t \right)\) are the inputs of the power allocation network, which outputs the power allocation \(p_{k} \left( t \right)\). Then, the sum rate is calculated for each action pair \(\left( {q_{k} \left( t \right),p_{k} \left( t \right)} \right)\), where \(k = 1, \ldots ,W\). The system selects the best access action \(q^{*} \left( t \right)\) and adds the newly obtained pair (\(h_{S,R} \left( t \right),q^{*} \left( t \right)\)) to replay memory 1 for training, with \(q^{*} \left( t \right)\) used as the label. Subsequently, a batch of training samples \(\Omega_{1} \left( t \right)\) is drawn from replay memory 1 to train the user selection network, and the parameters \(\omega_{1} \left( t \right)\) and the policy \(\pi_{x} \left( t \right)\) are updated. \(\omega_{1} \left( t \right)\) is updated by reducing the loss function of the user selection network every \(\Delta_{1}\) slots as follows:
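(A plausible form is the binary cross-entropy used for the same DNN-plus-quantization structure in [21]:)

\[ L\left( \omega_{1} \right) = -\frac{1}{\left| \Omega_{1}(t) \right|} \sum_{t' \in \Omega_{1}(t)} \left[ q^{*}(t')^{\mathsf{T}} \log Y(t') + \left( \mathbf{1} - q^{*}(t') \right)^{\mathsf{T}} \log\left( \mathbf{1} - Y(t') \right) \right]. \]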
The Adam optimizer is utilized in the training process with learning rate \(\theta_{1}\). After training, the user selection policy \(\pi_{x} (t)^{*}\) can be updated.
Power allocation algorithm
Next, we obtain the appropriate allocation action using the DQN; the algorithm is shown in Algorithm 3. We first provide some background information on reinforcement learning (RL) to clarify the algorithm. The key elements of RL are defined as follows:
State space: The state space is defined as \(s = \{ [h_{R,D} \left( t \right),q_{1} \left( t \right)], \ldots ,[h_{R,D} \left( t \right),q_{W} \left( t \right)]\}\).
Action space: The power allocation action space is defined as \(a = \left\{ {a^{1} , \ldots ,a^{z} } \right\}\), where \(z = A_{M}^{N}\); with \(M\) candidate power allocation factors, the action space for \(N\) destinations contains \(A_{M}^{N}\) actions.
Reward: We use the NSCUs, i.e., the number of users whose data rate is no less than the QoS threshold, to obtain an immediate reward, which is defined as follows:
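(Reconstructed from the description above:)

\[ r(t) = \varpi_{k}(t), \]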
where \(\varpi_{k} \left( t \right)\) is the number of qualified users accessing the system in the \(k\)th access vector. Moreover, the cumulative reward function of the power allocation network is defined as follows:
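(The standard discounted sum over \(L\) slots:)

\[ R(t) = \sum_{l=0}^{L-1} \gamma^{l}\, r(t+l), \]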
where \(\gamma\) is the discount factor of the reward during \(L\) slots.
Transition probability: \({\mathcal{P}}\) represents the transition probability, i.e., the probability to transition from state \(s\left( t \right)\) to the next state \(s\left( {t + 1} \right)\), given the action \(a\left( t \right)\) executed in the state \(s\left( t \right)\).
The Q value function is instrumental in solving RL problems [22]. It describes the expected cumulative reward \(R\left( t \right)\) when starting from state \(s\left( t \right)\), performing action \(a\left( t \right)\), and thereafter following policy \(\pi_{r} \left( t \right)\). To obtain the appropriate power allocation action, the Q value function is defined as:
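(The standard definition:)

\[ Q^{\pi_{r}}\left( s(t), a(t) \right) = {\mathbb{E}}\left[ R(t) \mid s(t), a(t), \pi_{r} \right]. \]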
The optimal action-value function in Eq. (18) satisfies the Bellman optimality equation [22], which is expressed as follows:
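(Standard form:)

\[ Q^{*}\left( s(t), a(t) \right) = {\mathbb{E}}\left[ r(t) + \gamma \max_{a'} Q^{*}\left( s(t+1), a' \right) \,\middle|\, s(t), a(t) \right]. \]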
After the optimal Q-function \(Q^{*} (s(t),a(t))\) is obtained, the Q-learning policy is determined by:
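(Greedy with respect to the optimal Q-function:)

\[ \pi_{r}^{*}(t) = \mathop{\arg\max}_{a(t)} Q^{*}\left( s(t), a(t) \right). \]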
The state-value function is obtained as follows:
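(A standard form consistent with the greedy policy above:)

\[ V^{*}\left( s(t) \right) = \max_{a(t)} Q^{*}\left( s(t), a(t) \right). \]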
The Q-value is updated as follows:
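(The standard Q-learning update:)

\[ Q\left( s(t), a(t) \right) \leftarrow Q\left( s(t), a(t) \right) + \theta_{2}(t) \left[ r(t) + \gamma \max_{a'} Q\left( s(t+1), a' \right) - Q\left( s(t), a(t) \right) \right], \]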
where \(\theta_{2} \left( t \right)\) is the learning rate of the power allocation network.
In general, the Q-learning algorithm adopts the \(\varepsilon\)-greedy policy, which selects the greedy power allocation action \(a\left( t \right)\) with probability \(1-\varepsilon\) and a random action with probability \(\varepsilon\) (here, \(\varepsilon = 0.8\)). The power allocation action is generated by:
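(The standard \(\varepsilon\)-greedy rule:)

\[ a(t) = \begin{cases} \mathop{\arg\max}_{a} Q\left( s(t), a; \omega_{2} \right), & \text{with probability } 1-\varepsilon, \\ \text{a random action in } \{a^{1},\ldots,a^{z}\}, & \text{with probability } \varepsilon, \end{cases} \]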
where \(\omega_{2}\) is the parameter of the power allocation network.
Power allocation algorithm based on the DQN
Nevertheless, the Bellman equation is difficult to solve directly because it is nonlinear and has no closed-form solution. The solution to this problem is to utilize neural networks to estimate the Q value. Therefore, we adopt a DQN to establish the power allocation network, using a DNN to output the estimated Q value.
We design a power allocation policy \(\pi_{r} \left( t \right)\) that quickly generates a power allocation decision corresponding to each access vector of the user selection network. The power allocation is implemented by the DQN, which is characterized by the embedded parameter \(\omega_{2} \left( t \right)\) that connects the hidden neurons. After the output of the user selection network has been mapped to \(W\) access vectors \(q\left( t \right)\), \(h_{R,D} \left( t \right)\) combined with each access vector \(q_{k} \left( t \right)\) is used as the input of the power allocation network. The output of this algorithm is \(p_{k} \left( t \right)\) corresponding to each access vector \(q_{k} \left( t \right)\). Then, we choose the actions \(\left( {q^{*} \left( t \right),p^{*} \left( t \right)} \right)\) with the maximum sum rate as the best actions and add the newly obtained pair (\(h_{R,D} \left( t \right),p^{*} \left( t \right)\)) to the replay memory 2. Subsequently, a batch of training samples \(\Omega_{2}\) from the replay memory 2 is used to train the power allocation network, and the parameters \(\omega_{2} \left( t \right)\) and \(\pi_{r} \left( t \right)\) are updated.
A five-layer power allocation network is designed, with one input layer, three hidden layers, and one output layer. The ReLU function is used as the activation function in the first two hidden layers, and the tanh function is used in the last hidden layer. The output of the power allocation network can be expressed as \(p_{k} \left( t \right) = f_{{\omega_{2} }} \left( {h_{R,D} \left( t \right),q_{k} \left( t \right)} \right)\).
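A minimal Keras sketch of this five-layer Q network follows; the hidden-layer widths and the action-space size Z are illustrative assumptions, and the channel input is assumed to be real-valued features (e.g., channel gains).

```python
import tensorflow as tf

N = 10   # number of destination nodes (illustrative)
Z = 60   # size of the discrete power allocation action space A_M^N (illustrative)

# Five-layer power allocation (Q) network: input [h_{R,D}(t), q_k(t)],
# hidden layers ReLU/ReLU/tanh, linear output of one estimated Q value
# per discrete power allocation action.
q_network = tf.keras.Sequential([
    tf.keras.Input(shape=(2 * N,)),
    tf.keras.layers.Dense(200, activation="relu"),  # f_Q1 (assumed)
    tf.keras.layers.Dense(120, activation="relu"),  # f_Q2 (assumed)
    tf.keras.layers.Dense(80, activation="tanh"),   # f_Q3 (assumed)
    tf.keras.layers.Dense(Z),                       # estimated Q values
])

# Target Q network with parameters w2_bar, refreshed by a hard copy.
target_q_network = tf.keras.models.clone_model(q_network)
target_q_network.set_weights(q_network.get_weights())
```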
After allocating power according to access vectors, the relay executes the optimal actions \(\left( {q^{*} \left( t \right),p^{*} \left( t \right)} \right)\) with the maximum sum rate of the system and receives the immediate reward \(r\left( t \right)\). Subsequently, the system moves to the next state, and the replay memory 2 is used to store the tuple \(\left( {s\left( t \right),p^{*} \left( t \right),r\left( t \right),s\left( {t + 1} \right)} \right)\) of each slot. When the replay memory 2 is full, the oldest record is removed, and the newest record is stored.
The training of the power allocation algorithm
A batch of training samples \(\Omega_{2} \left( t \right)\) from replay memory 2 is used to train the power allocation network. Then, the target Q value is obtained according to the target Q network as follows:
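(The standard DQN target:)

\[ y(t) = r(t) + \gamma \max_{a'} Q\left( s(t+1), a'; \overline{\omega}_{2} \right), \]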
where \(\overline{\omega }_{2}\) is the parameter of the target Q network. The Q network is trained with \(\Omega_{2} \left( t \right)\) by minimizing the loss function of the power allocation network, which is defined as:
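(The usual mean-squared temporal-difference error:)

\[ L\left( \omega_{2} \right) = {\mathbb{E}}_{\Omega_{2}(t)}\left[ \left( y(t) - Q\left( s(t), a(t); \omega_{2} \right) \right)^{2} \right]. \]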
Meanwhile, the Adam optimizer is utilized in the training process with learning rate \(\theta_{2}\). We update the parameters \(\overline{\omega }_{2}\) of the target Q network by copying the parameters of the Q network in each slot.
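Putting the pieces together, a hedged sketch of the action selection and one training step is given below (reusing q_network, target_q_network, and Z from the sketch above; the batch handling and hyperparameter values are assumptions):

```python
import numpy as np
import tensorflow as tf

GAMMA, EPSILON = 0.9, 0.8   # discount factor (assumed) and epsilon from the text
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)  # theta_2 (assumed)

def select_action(state):
    """Epsilon-greedy power allocation action for one state [h_{R,D}(t), q_k(t)]."""
    if np.random.rand() < EPSILON:
        return np.random.randint(Z)                # random exploration
    q_values = q_network(state[None, :]).numpy()[0]
    return int(np.argmax(q_values))                # greedy action

def train_step(states, actions, rewards, next_states):
    """One DQN update on a batch Omega_2(t) drawn from replay memory 2.
    Expects float32 tensors for states/rewards and int32 action indices."""
    # Target: y(t) = r(t) + gamma * max_a' Q(s(t+1), a'; w2_bar).
    targets = rewards + GAMMA * tf.reduce_max(target_q_network(next_states), axis=1)
    with tf.GradientTape() as tape:
        q_all = q_network(states)                                    # [batch, Z]
        q_taken = tf.reduce_sum(q_all * tf.one_hot(actions, Z), axis=1)
        loss = tf.reduce_mean(tf.square(targets - q_taken))          # MSE loss
    grads = tape.gradient(loss, q_network.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_network.trainable_variables))
    return float(loss)
```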
Complexity analysis
The complexity of the USDPA algorithm depends on the number of layers of the neural network and the number of neurons in each layer. The complexity of the user selection network is \(M_{1} \triangleq Nf_{1} + f_{1} f_{2} + f_{2} N\), where \(f_{1}\) and \(f_{2}\) are the numbers of neurons in the first and second hidden layers, respectively. The complexity of the power allocation network is \(M_{2} \triangleq Nf_{Q1} + f_{Q1} f_{Q2} + f_{Q2} f_{Q3} + f_{Q3} N\), where \(f_{Q1} ,f_{Q2}\) and \(f_{Q3}\) are the numbers of neurons in the first, second, and third hidden layers, respectively. In the USDPA algorithm, the output of the user selection network is mapped to \(W\) user access vectors; thus, the algorithm complexity is \(O\left( {M_{1} + WM_{2} } \right)\).
Results and discussion
In this section, the effectiveness of the proposed user selection and power allocation optimization scheme for the SWIPT-NOMA relay system is verified via simulation. The effects of \(R_{th}\) and various levels of transmitting power at the BS on the performance of the SWIPT-NOMA relay system are analyzed to illustrate the superiority of the proposed scheme in increasing the sum rate.
In this paper, TensorFlow 2.0 is used for the simulation. The simulation parameters are set as follows [23] (Table 1).
The sizes of replay memory 1 and replay memory 2 are 1000 and 400, respectively. The initial number of mapping vectors \(W = N\).
Validation of training effects
In this part, we assess the performance of the proposed USDPA algorithm using simulations with different requirements for the successful decoding of the signals. Figure 4 shows the \(W\) of the USDPA algorithm versus the training slots when \(P_S = 40\) dBm. The value of \(W\) converges quickly after 4000 slots and remains nearly stable at about 2, indicating that the mapping scheme does not increase the computational complexity. It can also be seen that the higher the QoS requirement, the lower the value to which \(W\) converges; the reason is that a lower QoS requirement is easier to satisfy. Figure 5 shows the loss of the USDPA algorithm with \(P_{S}\) = 40 dBm and \(R_{th}\) = 0.3 bits/s/Hz. It can be seen that the loss functions of the user selection network and the power allocation network converge quickly.
Figure 6 shows the average reward of the USDPA versus the training time slots with \(P_{S}\) = 40 dBm for different QoS requirements. Different QoS requirements affect the performance of the USDPA algorithm differently; specifically, the algorithm takes longer to converge when the QoS requirements are high. In addition, when the QoS requirement is 0.3 bits/s/Hz, the loss of the power allocation network converges rapidly after about 10,000 slots (Fig. 5), and the average reward converges rapidly to 10 (Fig. 6). Furthermore, the average reward after 2000 time slots is higher for a QoS requirement of 0.5 bits/s/Hz than for 0.4 bits/s/Hz. The reason is that with a QoS requirement of 0.5 bits/s/Hz, the USDPA algorithm selects fewer users, causing less interference; the selected users can thus access the system more easily and successfully, meet the QoS requirement, and be allocated appropriate power by the DQN, so the average rewards are relatively high. In general, the results indicate that the USDPA algorithm exhibits excellent learning performance for different QoS requirements.
Experimental results and discussion
The goal of this paper is to maximize the sum rate of the SWIPT-NOMA relay system. Consequently, the sum rate and the NSCUs are used to evaluate the algorithm's performance. Four algorithms are compared with the proposed algorithm: (1) all users access + DQN (AU + DQN): all users access the system, and the power allocation uses the DQN, as in the power allocation scheme of [18]; (2) all users access + average power allocation (AU + AP): all users access the system, each user's signal is assigned the average power factor, and the signals are decoded in the order of channel gains from strong to weak; (3) user selection + average power allocation (US + AP): the users that access the system are determined by the proposed user selection network, and each user's signal is assigned the average power factor; (4) random user access + DQN (RU + DQN): the users that access the system are determined randomly, and the DQN is used to allocate the power.
Figure 7 shows the NSCUs for different data rate thresholds. The NSCUs exhibit a decreasing trend for the USDPA, AU + DQN, RU + DQN, AU + AP, and US + AP schemes when \(P_S = 40\) dBm. The reason is that it becomes difficult for the system to allocate appropriate power factors that enable the users to decode the signal successfully. AU + DQN shows the best NSCU performance at the data thresholds \(R_{th}\) = 0.2 bits/s/Hz and \(R_{th}\) = 0.25 bits/s/Hz, because AU + DQN satisfies lower QoS requirements more easily. The USDPA algorithm exhibits the best performance at \(R_{th}\) = 0.3 bits/s/Hz, \(R_{th}\) = 0.35 bits/s/Hz, and \(R_{th}\) = 0.4 bits/s/Hz because the user selection network chooses only some users to access the system; the fewer the users accessing the system, the less interference there is. However, under the USDPA it is not possible for the system to grant access to only one user. The reason is that the power allocation factors lie strictly between 0 and 1, which means that the sum rate with a single accessed user cannot be the maximum; therefore, the system always selects multiple users. Moreover, it can be seen that the performance of the AU + AP and US + AP schemes converges when the QoS requirement is high, because both use the average power allocation factor for the accessed users, which makes it difficult to guarantee that all qualified users can successfully decode the signal under high QoS requirements. When \(R_{th}\) = 0.3 bits/s/Hz, the performance of the USDPA algorithm is 63.3%, 144.7%, 115%, and 7% higher than that of RU + DQN, US + AP, AU + AP, and AU + DQN, respectively.
Figure 8 shows the average sum rate of the five schemes with \(P_S = 40\) dBm. The performance of the USDPA is the best for all QoS requirements. When \(R_{th}\) = 0.3 bits/s/Hz, the average sum rate of the USDPA algorithm is 47.8%, 38.2%, 178%, and 63.1% higher than that of AU + DQN, RU + DQN, AU + AP, and US + AP, respectively. The reason is that the average power allocation schemes cannot dynamically adjust the power allocation factors of the accessed users. More importantly, we observe that the average sum rate of the USDPA scheme is higher for a QoS requirement of 0.4 bits/s/Hz than for 0.35 bits/s/Hz. The reason is that when the QoS requirements are higher, the USDPA algorithm selects fewer users to access the system; thus, there is less interference at the receivers, and the achieved sum rate is higher. In addition, if all users access the system, there is more interference at the receivers even though the DQN is used to allocate power. The US + AP algorithm maintains stable performance as \(R_{th}\) increases because the user selection network chooses appropriate users to access the system. Although the RU + DQN algorithm chooses the users randomly, it still maintains a steady average sum rate because it uses the DQN algorithm to allocate power.
Figure 9 displays the trend of the average sum rate of the five schemes with different levels of transmitting power at the BS, when \(R_{th}\) = 0.3 bits/s/Hz at the receivers. The average sum rate increases with increasing \(P_{S}\). As \(P_{S}\) increases, the SINR at the accessed receivers improves, leading to a performance improvement. In addition, we find that the USDPA scheme outperforms the other four schemes. The proposed scheme jointly optimizes user access and power allocation, and the algorithm exhibits efficient learning ability by utilizing the user selection network and the power allocation network in the dynamic environment.
Conclusion
We propose a USDPA scheme in the SWIPT-NOMA relay system to maximize the sum rate in the downlink. A model of the SWIPT-NOMA relay system was established with a PSR scheme to harvest energy and forward signals. The USDPA was used to optimize the user access action and power allocation action simultaneously. The simulation results showed that the proposed scheme provided the best performance for increasing the sum rate. Due to the complexity of the problem, practical scenarios with a multi-antenna configuration and a bidirectional relay will be analyzed in a future study.
Availability of data and materials
The analysis and simulation datasets are retained by the authors but are not publicly available.
Abbreviations
NOMA: Non-orthogonal multiple access
DF: Decode-and-forward
SWIPT: Simultaneous wireless information and power transfer
USDPA: User selection and dynamic power allocation
DNN: Deep neural network
DRL: Deep reinforcement learning
QoS: Quality of service
MTs: Mobile terminals
SE: Spectrum efficiency
TS: Time switching
PS: Power splitting
CoR: Cooperative relaying
ICSI: Imperfect channel state information
RHIs: Residual hardware impairments
AI: Artificial intelligence
CEEs: Channel estimation errors
CSI: Channel state information
NSCUs: Number of successful communication users
DQN: Deep Q network
EE: Energy efficiency
DBN: Deep belief network
BS: Base station
HD: Half-duplex
PSR: Power splitting relay
EH: Energy harvesting
ID: Information decoding
IF: Information forwarding
SIC: Successive interference cancellation
SINR: Signal-to-interference-plus-noise ratio
RL: Reinforcement learning
AU + DQN: All users access + deep Q network
AU + AP: All users access + average power allocation
US + AP: User selection + average power allocation
RU + DQN: Random user access + deep Q network
References
1. T.D.P. Perera, D.N.K. Jayakody, S. Chatzinotas et al., Simultaneous wireless information and power transfer (SWIPT): recent advances and future challenges. IEEE Commun. Surv. Tut. 20(1), 264–302 (2018)
2. M.A. Hossain, M. Noorr, K.A. Yau et al., A survey on simultaneous wireless information and power transfer with cooperative relay and future challenges. IEEE Access 7, 19166–19198 (2019)
3. T.M. Hoang, X.N. Tran, B.C. Nguyen et al., On the performance of MIMO full-duplex relaying system with SWIPT under outdated CSI. IEEE Trans. Veh. Technol. 69(12), 15580–15593 (2020)
4. H.Q. Tran, C.V. Phan, Q.T. Vien, Power splitting versus time switching based cooperative relaying protocols for SWIPT in NOMA systems. Phys. Commun. 41, 101098 (2020)
5. M. Hedayati, I. Kim, On the performance of NOMA in the two-user SWIPT system. IEEE Trans. Veh. Technol. 67(11), 11258–11263 (2018)
6. S.K. Zaidi, S.F. Hasan, X. Gui, Evaluating the ergodic rate in SWIPT-aided hybrid NOMA. IEEE Commun. Lett. 22(9), 1870–1873 (2018)
7. Z. Yang, Z.G. Ding, P.Z. Fan et al., The impact of power allocation on cooperative non-orthogonal multiple access networks with SWIPT. IEEE Trans. Wirel. Commun. 16(7), 4332–4343 (2017)
8. L. Bariah, S. Muhaidat, A. Al-Dweik, Error probability analysis of NOMA-based relay networks with SWIPT. IEEE Commun. Lett. 23(7), 1223–1226 (2019)
9. J.S. Zhou, Y.J. Sun, Q. Cao et al., QoS-based robust power optimization for SWIPT NOMA system with statistical CSI. IEEE Trans. Green Commun. Netw. 3(3), 765–773 (2019)
10. X.W. Li, J.J. Li, L.H. Li, Performance analysis of impaired SWIPT NOMA relaying networks over imperfect Weibull channels. IEEE Syst. J. 14(1), 669–672 (2020)
11. G.X. Li, D. Mishra, Y.H.S. Atapattu, Optimal designs for relay-assisted NOMA networks with hybrid SWIPT scheme. IEEE Trans. Commun. 68(6), 3588–3590 (2020)
12. X.W. Li, Q.S. Wang, J.J. Liu et al., Cooperative wireless-powered NOMA relaying for B5G IoT networks with hardware impairments and channel estimation errors. IEEE Internet Things J. (2020). https://doi.org/10.1109/JIOT.2020.3029754
13. T.N. Do, D.B. da Costa, T.Q. Duong et al., A BNBF user selection scheme for NOMA-based cooperative relaying systems with SWIPT. IEEE Commun. Lett. 21(3), 664–667 (2017)
14. I.H. Lee, H. Jung, User selection and power allocation for downlink NOMA systems with quality-based feedback in Rayleigh fading channels. IEEE Wirel. Commun. Lett. 9(11), 1924–1927 (2020)
15. J.L. Ou, H.H. Yu, H.W. Wu et al., Security transmission scheme for two-way untrusted relay networks based on simultaneous wireless information and power transfer. J. Electr. Inf. Technol. 42(12), 2908–2914 (2020)
16. T.S. Li, Q.L. Ning, Z. Wang, Optimization scheme for the SWIPT-NOMA opportunity cooperative system. J. Commun. 41(8), 141–154 (2020)
17. H.L. Yang, A. Alphones, Z.H. Xiong et al., Artificial-intelligence-enabled intelligent 6G networks. IEEE Netw. 34(6), 272–280 (2020)
18. X.M. Wang, Y.H. Zhang, R.J. Shen et al., DRL-based energy-efficient resource allocation frameworks for uplink NOMA systems. IEEE Internet Things J. 7(8), 7279–7294 (2020)
19. Y.H. Zhang, X.M. Wang, Y.Y. Xu et al., Energy-efficient resource allocation in uplink NOMA systems with deep reinforcement learning, in IEEE 11th International Conference on Wireless Communications and Signal Processing (WCSP), Xi'an, China, 1–6 (2019)
20. J. Tang, C. Luo Ji, J.H. Ou et al., Decoupling or learning: joint power splitting and allocation in MC-NOMA with SWIPT. IEEE Trans. Commun. 68(9), 5834–5848 (2020)
21. L. Huang, S.Z. Bi, Y.J. Zhang, Deep reinforcement learning for online computation offloading in wireless-powered mobile-edge computing networks. IEEE Trans. Mobile Comput. 19(11), 2581–2593 (2020)
22. H.L. Yang, Z.H. Xiong, J. Zhao et al., Deep reinforcement learning-based intelligent reflecting surface for secure wireless communications. IEEE Trans. Wirel. Commun. 20(1), 375–388 (2021)
23. O. Abbasi, A. Ebrahimi, N. Mokari, NOMA inspired cooperative relaying system using an AF relay. IEEE Wirel. Commun. Lett. 8(1), 261–264 (2019)
Acknowledgements
Not applicable.
Funding
This work was supported in part by the National Nature Science Foundation of China under Grant No. 61601070, the Key Research Project of Chongqing Education Commission under Grant No. KJZDK201800603, and the Doctoral Candidate Innovative Talent Project of CQUPT under Grant No. BYJS201912 and No. BYJS2017003.
Author information
Contributions
XZX contributed to basic idea of the paper and provided suggestions for the experimental simulation. ML was responsible for the theoretical analysis, experimental simulation, and manuscript writing of this research. ZYS contributed to the model construction and algorithm design. QH and HT provided suggestions for theoretical analysis and English writing. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Xie, X., Li, M., Shi, Z. et al. User selection and dynamic power allocation in the SWIPT-NOMA relay system. J Wireless Com Network 2021, 124 (2021). https://doi.org/10.1186/s13638-021-01998-0
Keywords
 Non-orthogonal multiple access (NOMA)
 Simultaneous wireless information and power transfer (SWIPT)
 User selection and power allocation
 Deep neural network (DNN)
 Deep reinforcement learning (DRL)