Fig. 14
From: Reinforcement learning-based dynamic band and channel selection in cognitive radio ad-hoc networks
Rewards, states, and actions according to iteration at DDR = 90 kbps. In a, the reward is stable beyond 10 iterations, although it drops temporarily at points throughout the interval because of random actions, as in Fig. 12. As shown in b, the agent mainly visits state 5. c reveals that actions in band group 2 are selected most often.