Fig. 6 | EURASIP Journal on Wireless Communications and Networking

Fig. 6

From: Reinforcement learning-based dynamic band and channel selection in cognitive radio ad-hoc networks

Proposed procedure for Q-table update, state determination, and action selection. (1) Suppose the learning agent CH determined the state s_{t−1} and the best action a_{t−1} at the end of the (t − 1)-th time period. (2) During the t-th time period, the MNs and the CH monitor the primary activities and channel statistics. (3) The agent CH detects a band and channel change event. (4) The CH calculates the reward r_{t−1} for the previous action a_{t−1} taken in state s_{t−1}. (5) The CH updates the Q-value of (s_{t−1}, a_{t−1}) in the Q-table. (6) The CH determines the current state s_t based on the DRE measured during the t-th time period. (7) The CH selects the optimal action a_t for the next, (t + 1)-th, time period. (8) Return to step 1.
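
Steps 4 to 7 of the procedure can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the caption does not give the update formula, so the sketch assumes the standard tabular Q-learning rule, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. The class name `CHAgent`, the values of `ALPHA` and `GAMMA`, and the state and action labels are hypothetical. The reward and the DRE-based state are passed in as already computed values, because how they are derived is not described in the caption. Since the assumed rule needs s_t to form its target, the sketch receives the current state before performing the update, which differs slightly from the ordering of steps 5 and 6.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value, not from the source)
GAMMA = 0.9  # discount factor (assumed value, not from the source)


class CHAgent:
    """Hypothetical learning agent following steps 4-7 of the caption."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.q = defaultdict(float)  # Q-table keyed by (state, action)
        self.prev_state = None       # s_{t-1}
        self.prev_action = None      # a_{t-1}

    def best_action(self, state):
        # Step 7: greedy choice; ties go to the first listed action.
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def step(self, reward, state):
        """Run one period: update Q(s_{t-1}, a_{t-1}), then choose a_t.

        reward -- r_{t-1}, computed elsewhere (step 4)
        state  -- s_t, derived elsewhere from the measured DRE (step 6)
        """
        if self.prev_state is not None:
            key = (self.prev_state, self.prev_action)
            # Step 5: standard Q-learning update (an assumption).
            best_next = max(self.q[(state, a)] for a in self.actions)
            self.q[key] += ALPHA * (reward + GAMMA * best_next - self.q[key])
        action = self.best_action(state)
        self.prev_state, self.prev_action = state, action
        return action  # a_t, applied during the (t + 1)-th period


agent = CHAgent(["stay", "switch"])
first = agent.step(reward=0.0, state="low")    # no update on the first call
second = agent.step(reward=1.0, state="high")  # updates Q("low", first)
```

Calling `step` once per time period corresponds to step 8, which returns to step 1 with the newly stored state and action.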