Fig. 9
From: Q-learning-based dynamic joint control of interference and transmission opportunities for cognitive radio
Flow chart for the reward function. We use (27) or (26) as the reward function, depending on whether the interference ratio meets the constraint. If (27) is used, the sign of the parameter depends on whether the transmission opportunity loss ratio satisfies the restriction condition. If (26) is used, the sign of the parameter depends on whether the interference ratio has improved or deteriorated. After that, in each resulting case, the sign of the parameter is set according to whether the transmission opportunity loss ratio has improved or not.
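The branching described in the caption can be sketched in code. This is only a minimal illustration and not the authors' implementation: the caption does not give the forms of (26) and (27), so the sketch returns which equation is selected and the signs, not a reward value. The names `RewardChoice` and `select_reward` are hypothetical. Three things are assumed and not confirmed by the caption: that (27) is the branch taken when the interference constraint is met, that a positive sign corresponds to "satisfied" or "improved", and that the final trend check on the transmission opportunity loss ratio applies to the (26) branch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RewardChoice:
    """Outcome of the flow chart (hypothetical container, not from the paper)."""
    equation: int               # 26 or 27, the equation number in the article
    primary_sign: int           # +1 or -1, sign chosen by the first check
    trend_sign: Optional[int]   # +1 or -1 from the loss-ratio trend, if applied


def select_reward(interference_ok: bool,
                  loss_ok: bool,
                  interference_improved: bool,
                  loss_improved: bool) -> RewardChoice:
    """Pick the reward equation and parameter signs.

    Assumptions (not stated in the caption): (27) is used when the
    interference constraint is met; +1 means satisfied/improved.
    """
    if interference_ok:
        # Branch using (27): sign depends on the loss-ratio constraint.
        return RewardChoice(27, 1 if loss_ok else -1, None)

    # Branch using (26): sign depends on the interference-ratio trend,
    # then on the trend of the transmission opportunity loss ratio.
    primary = 1 if interference_improved else -1
    trend = 1 if loss_improved else -1
    return RewardChoice(26, primary, trend)
```

For example, `select_reward(False, True, True, False)` selects (26) with a positive first sign and a negative trend sign under the assumptions above.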