Algorithm 1 Q-learning algorithm for the proposed scheme

 1: while required packet exists
 2:     if time index > sensing window size
 3:         Initialize the sensing results and obtained results
 4:     else
 5:         time index = time index + 1
 6:         while time index < \( n_{\mathrm{sensing}} n_{\mathrm{reply}} \)
 7:             Determine the sensing tool (with IA or without IA)
 8:             from the k-out-of-N rule with \( N = n_{\mathrm{sensing}} n_{\mathrm{reply}} \) sensing results
 9:         end
10:         Obtain \( P(D_0) \), \( P(D_1) \) from the sensing results
11:         Calculate \( P(H_0) \), \( P(H_1) \) from the simultaneous equations of \( P(D_0) \), \( P(D_1) \)
12:         Calculate \( R_I \), \( R_L \) from the conditional probabilities and determine the state
13:         Select an action \( a_t \) based on the optimal policy from the current state \( s_t \)
14:         Obtain the immediate payoff \( r_t \) from action \( a_t \)
15:         Observe the next state \( s_{t+1} \) by \( R_I \), \( R_L \)
16:         Update \( \mathcal{Q}\left({s}_t,{a}_t\right) \) based upon this experience as
17:         \( \mathcal{Q}\left({s}_t,{a}_t\right)\leftarrow \left(1-\alpha \right)\mathcal{Q}\left({s}_t,{a}_t\right)+\alpha \left\{{r}_t+\gamma \underset{a_{t+1}}{\max}\mathcal{Q}\left({s}_{t+1},{a}_{t+1}\right)\right\} \)
18:     end
19: end
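The core of Algorithm 1 (lines 13-17) is a standard tabular Q-learning loop. The sketch below illustrates just that update rule under stated assumptions: the state/action set sizes, the epsilon-greedy policy, and the toy reward are all placeholders, since the paper's actual states come from thresholding \( R_I \), \( R_L \) on the sensing results and its payoff depends on the sensing outcome.

```python
import numpy as np

# Hypothetical sizes: the paper derives states from (R_I, R_L) and actions
# from the sensing configuration; these small sets are illustrative only.
N_STATES, N_ACTIONS = 4, 3
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))  # tabular Q(s, a)

def select_action(state: int) -> int:
    """Epsilon-greedy stand-in for 'select a_t based on the optimal policy'."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[state]))

def update(s: int, a: int, r: float, s_next: int) -> None:
    """Line 17: Q(s,a) <- (1-alpha) Q(s,a) + alpha (r + gamma max_a' Q(s',a'))."""
    Q[s, a] = (1 - ALPHA) * Q[s, a] + ALPHA * (r + GAMMA * np.max(Q[s_next]))

# Toy environment: reward 1 when the action index matches the state index
# modulo N_ACTIONS, with uniformly random next states. Purely illustrative.
s = 0
for _ in range(5000):
    a = select_action(s)
    r = 1.0 if a == s % N_ACTIONS else 0.0
    s_next = int(rng.integers(N_STATES))
    update(s, a, r, s_next)
    s = s_next
```

Because the toy reward depends only on the current state-action pair and transitions are action-independent, the greedy policy learned by the table converges to picking `s % N_ACTIONS` in each state, which is a quick sanity check that the update rule is implemented correctly.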