Fig. 16
From: Reinforcement learning-based dynamic band and channel selection in cognitive radio ad-hoc networks
Rewards, states, and actions according to iteration at DDR = 3.5 Mbps. In panel a, the reward is stable overall, although it drops temporarily at points throughout the interval because of random actions, similar to Figs. 12 and 14. In panel b, the system visits state 4 to some degree, but it mainly remains in states 7 and 8. Panel c shows that actions in band group 2 are selected most often.