Figure 4
From: Sensing time and power allocation for cognitive radios using distributed Q-learning

Exploration strategy that performs pure exploration during the first $\bar{\epsilon}\,T_{\mathrm{TDMA}}$ seconds of each TDMA time slot, then pure exploitation during the remaining $(1-\bar{\epsilon})\,T_{\mathrm{TDMA}}$ seconds of the slot.
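
The time-split strategy described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `select_action`, its parameters, the uniform random choice during exploration, and the greedy argmax over a list of Q-values during exploitation are all assumptions made for illustration. Only the split of the slot into an exploration phase of length ε̄·T_TDMA followed by an exploitation phase comes from the caption.

```python
import random


def select_action(t_in_slot, t_tdma, eps_bar, q_values, rng=random):
    """Pick an action index at time offset t_in_slot within one TDMA slot.

    Hypothetical helper (not from the paper):
      t_in_slot -- seconds elapsed since the start of the current slot
      t_tdma    -- slot duration T_TDMA in seconds
      eps_bar   -- fraction of the slot reserved for exploration, in [0, 1]
      q_values  -- current Q-value estimate for each candidate action
    """
    if not 0.0 <= eps_bar <= 1.0:
        raise ValueError("eps_bar must lie in [0, 1]")
    if not q_values:
        raise ValueError("q_values must not be empty")

    if t_in_slot < eps_bar * t_tdma:
        # Exploration phase: assumed here to be a uniform random choice.
        return rng.randrange(len(q_values))

    # Exploitation phase: assumed here to be the greedy (highest-Q) action.
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With `eps_bar = 0.2` and a 1-second slot, any call with `t_in_slot` below 0.2 s returns a random action, and any later call in the same slot returns the index of the largest Q-value.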