Fig. 25 | EURASIP Journal on Wireless Communications and Networking


From: Reinforcement learning-based dynamic band and channel selection in cognitive radio ad-hoc networks


Rewards for Q-learning and random channel selection as the DDR changes. The panels show the rewards and state visits corresponding to the DDR variation in a. Comparing b and d, the rewards of Q-learning selection are more stable than those of random selection. From c and e, we can see that when the DDR is low, the ad-hoc CH selects a low-data-rate channel and Q-learning visits the low-DRE states of band group 1; Q-learning visits a low-DRE state of band group 2 only when the exploration policy selects a high-data-rate channel. When the DDR is high, Q-learning mainly selects channels of band group 2, which provide higher data rates, so the states of band group 1 are visited less frequently. In random channel selection, by contrast, the state visits are distributed evenly across the DREs whether the DDR is low or high. Panels c and e show that Q-learning and random channel selection visit the same set of states for a particular DDR; however, because Q-learning tries to select a channel adapted to the specific DDR, it mainly visits the states of band group 1 when the DDR is low and the states of band group 2 when the DDR is high. These results show that the proposed Q-learning selects an appropriate channel even as the DDR changes
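The adaptive behavior described above can be illustrated with a minimal epsilon-greedy Q-learning sketch. This is not the authors' implementation: the channel rates, the DDR threshold, the two-level state, and the reward shape (which peaks when the channel rate matches the demanded data rate) are all illustrative assumptions, and the paper's DRE component of the state is omitted for brevity.

```python
import random
from collections import defaultdict

# Illustrative only: channels and rates are assumed values, not the paper's.
# Band group 1 offers low data rates, band group 2 high data rates.
CHANNELS = {"bg1_ch0": 2.0, "bg1_ch1": 3.0,   # band group 1 (low rate)
            "bg2_ch0": 8.0, "bg2_ch1": 10.0}  # band group 2 (high rate)

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed hyperparameters

def reward(ddr, rate):
    # Assumed reward: highest when the channel rate matches the demanded
    # data rate (DDR); overshoot and undershoot are both penalized.
    return min(rate, ddr) / max(rate, ddr)

def select_channel(Q, state):
    # Epsilon-greedy policy: occasionally explore, otherwise exploit.
    if random.random() < EPSILON:
        return random.choice(list(CHANNELS))
    return max(CHANNELS, key=lambda a: Q[(state, a)])

def train(ddr_trace, episodes=500):
    # Standard one-step Q-learning over a repeating trace of DDR values,
    # with the state reduced to a coarse low/high DDR level.
    Q = defaultdict(float)
    for _ in range(episodes):
        for i, ddr in enumerate(ddr_trace):
            state = "low" if ddr < 5.0 else "high"
            next_ddr = ddr_trace[(i + 1) % len(ddr_trace)]
            next_state = "low" if next_ddr < 5.0 else "high"
            action = select_channel(Q, state)
            r = reward(ddr, CHANNELS[action])
            best_next = max(Q[(next_state, a)] for a in CHANNELS)
            Q[(state, action)] += ALPHA * (r + GAMMA * best_next
                                           - Q[(state, action)])
    return Q
```

Under these assumptions, the learned greedy action in the low-DDR state settles on a band-group-1 channel and in the high-DDR state on a band-group-2 channel, mirroring the visit patterns in panels c and e, while the epsilon-greedy exploration accounts for the occasional band-group-2 visits at low DDR noted above.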
