Fig. 12 | EURASIP Journal on Wireless Communications and Networking


From: Reinforcement learning-based dynamic band and channel selection in cognitive radio ad-hoc networks


The change of a rewards, b states, and c actions over iterations at a low DDR of 40 kbps. The temporarily low reward values are due to the random actions of Q-learning exploration. The agent visits state 2 more often than states 4 and 5 over time, as seen in Fig. 11b. As shown in Fig. 11a, the action trace in c mainly visits channel 2 or 3 and soon re-adapts to the DDR, even if a channel from band group 2 or a high-data-rate channel from band group 1 is momentarily selected. c shows how the agent selects channels in band group 1 that are suitable for the DDR over time. Therefore, we can see that the agent operates according to the designed mechanism.
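The reward dips described above arise from the random actions taken during Q-learning exploration. The following is a minimal sketch of such an epsilon-greedy channel-selection loop; the numbers of states and channels, the band-group mapping, and the reward function are illustrative assumptions for this sketch, not the paper's exact formulation.

import numpy as np

# Minimal epsilon-greedy Q-learning sketch for channel selection.
# All sizes and the environment model below are hypothetical placeholders.
rng = np.random.default_rng(0)

N_STATES = 6      # assumed number of agent states
N_CHANNELS = 5    # assumed: channels 0-1 in band group 2, channels 2-4 in band group 1
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = np.zeros((N_STATES, N_CHANNELS))

def step(state, action):
    # Hypothetical environment: stand-in for the paper's channel/DDR dynamics.
    achieved_rate = rng.normal(loc=10.0 * (action + 1), scale=2.0)
    reward = -abs(achieved_rate - 40.0)   # penalize deviation from the 40 kbps DDR
    next_state = rng.integers(N_STATES)
    return reward, next_state

state = 0
for it in range(10_000):
    # Epsilon-greedy exploration: occasional random actions cause the
    # temporary low-reward dips seen in panel a.
    if rng.random() < EPSILON:
        action = int(rng.integers(N_CHANNELS))
    else:
        action = int(np.argmax(Q[state]))

    reward, next_state = step(state, action)

    # Standard Q-learning update.
    Q[state, action] += ALPHA * (reward + GAMMA * np.max(Q[next_state]) - Q[state, action])
    state = next_state

print("Greedy channel per state:", np.argmax(Q, axis=1))

Over many iterations, the greedy policy concentrates on the channels whose achieved rate best matches the DDR, while exploration still produces occasional excursions to other channels, which is the behavior the figure illustrates.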
