Fig. 18 | EURASIP Journal on Wireless Communications and Networking

From: Reinforcement learning-based dynamic band and channel selection in cognitive radio ad-hoc networks

Average reward comparison for Q-learning channel selection vs. random channel selection. The figure shows the average reward for each DDR depending on the channel selection method. For all DDR cases, Q-learning band and channel selection achieves a higher reward than random selection. The reward for a DDR of 1.5 or 3.5 Mbps (i.e., the medium and high DDR cases) is lower than that for 10 and 50 kbps. Under a high DDR, the ε-greedy policy occasionally selects a low-band channel that supports an insufficient data rate, which yields a very low channel-utilization reward R_Util, and these effects accumulate in the Q-table
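The accumulation effect described in the caption can be illustrated with a minimal ε-greedy Q-learning sketch. This is not the authors' implementation: the number of channels, learning parameters, and reward values below are illustrative assumptions, chosen only to show how repeated low rewards from exploratory picks of low-band channels persist in the Q-table.

```python
import random

# Illustrative parameters (assumptions, not values from the paper)
NUM_CHANNELS = 8   # hypothetical number of band/channel pairs
EPSILON = 0.1      # exploration rate of the epsilon-greedy policy
ALPHA = 0.5        # learning rate
GAMMA = 0.9        # discount factor

q_table = [0.0] * NUM_CHANNELS  # one Q-value per band/channel choice


def select_channel():
    """Epsilon-greedy selection: with probability EPSILON explore a random
    channel, otherwise exploit the channel with the highest Q-value."""
    if random.random() < EPSILON:
        return random.randrange(NUM_CHANNELS)
    return max(range(NUM_CHANNELS), key=lambda c: q_table[c])


def update_q(channel, reward):
    """Standard Q-learning update. Low rewards returned when a selected
    channel cannot support the demanded data rate (DDR) are folded into
    the Q-table and persist across subsequent selections."""
    best_next = max(q_table)
    q_table[channel] += ALPHA * (reward + GAMMA * best_next - q_table[channel])
```

In this sketch, an exploratory pick of a low-band channel under a high DDR would return a small (or zero) utilization reward, and `update_q` carries that penalty forward in the Q-table, which mirrors the behavior the caption attributes to the ε-greedy policy.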
