Fig. 6 | EURASIP Journal on Wireless Communications and Networking

From: User selection and dynamic power allocation in the SWIPT-NOMA relay system

Average reward of the USDPA algorithm versus time with \(P_{S}\) = 40 dBm under different QoS requirements. The reward function determines both the speed and the degree of convergence of the reinforcement learning algorithm: at a lower QoS requirement the average reward converges quickly, whereas a higher requirement slows convergence. The x-axis is the time slot and the y-axis is the average reward. The three curves correspond to \(R_{th}\) = 0.3 bits/s/Hz, \(R_{th}\) = 0.4 bits/s/Hz, and \(R_{th}\) = 0.5 bits/s/Hz
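A minimal sketch of the quantity plotted on the y-axis: the running average of the per-slot reward. The reward values below are synthetic placeholders, not the USDPA algorithm's actual reward signal.

```python
def running_average(rewards):
    """Return the average reward after each time slot."""
    averages = []
    total = 0.0
    for t, r in enumerate(rewards, start=1):
        total += r
        averages.append(total / t)
    return averages

# Toy reward stream that improves and then plateaus, mimicking the
# convergence behaviour shown in the figure (values are illustrative).
rewards = [0.2, 0.4, 0.5, 0.6, 0.6, 0.6]
avg = running_average(rewards)
print(avg[-1])  # final average reward
```

Plotting `avg` against the slot index reproduces the shape of such a convergence curve: early slots fluctuate, and the average settles as learning converges.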