Fig. 4
From: Reinforcement learning-based dynamic band and channel selection in cognitive radio ad-hoc networks
Proposed Q-learning mechanism. The CH of the ad-hoc CR system is the Q-learning agent, and an action is the selection of a tuple (band group, channel), taken when the PU is detected on the current band group and channel. The agent (CH) determines the state from member-node information and from the environment statistics produced by the last action. From the Q-learning module, the agent obtains the reward, updates the Q-table, and selects the next action tuple.
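
The loop described in the caption (observe state, pick a (band group, channel) tuple, receive a reward, update the Q-table) can be illustrated with a minimal tabular Q-learning sketch. This is not the paper's implementation: the class name `BandChannelAgent`, the epsilon-greedy policy, the string state labels, the reward values, and the parameters `alpha`, `gamma`, and `epsilon` are all assumptions made for illustration, since the caption does not define the state encoding or the reward function.

```python
import random
from collections import defaultdict
from itertools import product


class BandChannelAgent:
    """Hypothetical tabular Q-learning agent standing in for the CH."""

    def __init__(self, band_groups, channels, alpha=0.5, gamma=0.9,
                 epsilon=0.1, seed=0):
        # Each action is a (band group, channel) tuple, as in the caption.
        self.actions = list(product(band_groups, channels))
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed policy)
        self.q = defaultdict(float)  # Q-table: (state, action) -> value
        self.rng = random.Random(seed)

    def select_action(self, state):
        # Explore with probability epsilon, otherwise pick the greedy tuple.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update toward reward + gamma * max_a' Q(s', a').
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


# Usage: state labels and reward are placeholders, not the paper's definitions.
agent = BandChannelAgent(band_groups=[0, 1], channels=[0, 1, 2], epsilon=0.0)
state = "pu_detected"
action = agent.select_action(state)
agent.update(state, action, reward=1.0, next_state="idle")
```

In a real system the state would be built from the member-node reports and environment statistics mentioned in the caption, and the reward would come from whatever performance measure the Q-learning module uses; both are left abstract here.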