Figure 9
From: Sensing time and power allocation for cognitive radios using distributed Q-learning

Distance d_t between the secondary SINRs generated by the Q-learning algorithm and the optimal secondary SINRs for different learning frequencies f in the Q-learning implementation. The exploration randomness ϵ is held constant, and a cooperative cost function is used.