Fig. 17
From: Reinforcement learning-based dynamic band and channel selection in cognitive radio ad-hoc networks

(a) Average operation time, (b) average transmission rate, and (c) reward for utilization according to weight change. The panels show the average operation time, average data rate, and reward for channel utilization as the weight assignment is changed, with the DDR set to 40 kbps. Because the reward function is a weighted sum of the objective functions, Q-learning can be steered toward the desired objective by adjusting the weights. Accordingly, increasing the weight of the operation time increases the average operation time, and increasing the weight of the data transmission rate increases the average transmission rate. Likewise, increasing the weight of the reward for utilization increases the average reward for utilization.
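
The weighted-sum idea described in the caption can be illustrated with a minimal sketch. It is not the authors' implementation and does not include the Q-learning update itself; the objective names, the normalized values for the two candidate channels, and the weight settings are all hypothetical, chosen only to show how shifting the weights shifts which option scores highest.

```python
from typing import Dict

def weighted_reward(objectives: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted sum of per-objective rewards.

    Both dicts use the same keys; the values are assumed to be normalized to [0, 1].
    """
    return sum(weights[name] * objectives[name] for name in objectives)

# Hypothetical normalized objective values for two candidate channels.
candidates = {
    "channel_A": {"operation_time": 0.9, "data_rate": 0.3, "utilization": 0.5},
    "channel_B": {"operation_time": 0.4, "data_rate": 0.8, "utilization": 0.6},
}

def best_candidate(weights: Dict[str, float]) -> str:
    """Pick the candidate with the highest weighted reward under the given weights."""
    return max(candidates, key=lambda c: weighted_reward(candidates[c], weights))

# Two hypothetical weight assignments, each summing to 1.
time_heavy = {"operation_time": 0.6, "data_rate": 0.2, "utilization": 0.2}
rate_heavy = {"operation_time": 0.2, "data_rate": 0.6, "utilization": 0.2}

print(best_candidate(time_heavy))  # favors the long-operation-time channel
print(best_candidate(rate_heavy))  # favors the high-data-rate channel
```

With these assumed numbers, emphasizing operation time selects `channel_A`, while emphasizing data rate selects `channel_B`. This mirrors the trend reported in the figure: whichever objective receives more weight is the one the learned policy improves.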