Fig. 20
From: Reinforcement learning-based dynamic band and channel selection in cognitive radio ad-hoc networks
(a) Mean and (b) boxplot of the reward for channel utilization under Q-learning and random selection at each DDR. For every DDR, the Q-learning boxplot shows a denser distribution and higher values than that of random selection, and its average reward is also higher.