Fig. 24
From: Reinforcement learning-based dynamic band and channel selection in cognitive radio ad-hoc networks

Rewards for (a) Q-learning and (b) random channel selection according to iteration (DDR = 3.5 Mbps). Figures 22, 23, and 24 show the rewards for Q-learning and for random band and channel selection as a function of iteration for each DDR. With Q-learning, for all DDRs, the fluctuation in reward decreases over time and the system operates in line with the intended reward design. With random selection, the reward shows more notches and more fluctuation than with Q-learning.