
Table 1 Q-learning algorithm for the proposed scheme

From: Q-learning-based dynamic joint control of interference and transmission opportunities for cognitive radio

Algorithm 1 Q-learning algorithm for the proposed scheme

1:  while required packet exists do
2:    if time index > sensing window size then
3:      initialize the sensing results and the obtained results
4:    else
5:      time index = time index + 1
6:      while time index < n_sensing · n_reply do
7:        determine the sensing result (with IA or without IA)
8:        from the k-out-of-N rule with N = n_sensing · n_reply sensing results
9:      end while
10:     obtain P(D_0), P(D_1) from the sensing results
11:     calculate P(H_0), P(H_1) from the simultaneous equations in P(D_0), P(D_1)
12:     calculate R_I, R_L from the conditional probabilities and determine the state
13:     select an action a_t based on the optimal policy from the current state s_t
14:     obtain the immediate payoff r_t from action a_t
15:     observe the next state s_{t+1} from R_I, R_L
16:     update \( \mathcal{Q}\left({s}_t,{a}_t\right) \) based on this experience as
17:     \( \mathcal{Q}\left({s}_t,{a}_t\right)\leftarrow \left(1-\alpha \right)\mathcal{Q}\left({s}_t,{a}_t\right)+\alpha \left\{{r}_t+\gamma \underset{a_{t+1}}{\max}\mathcal{Q}\left({s}_{t+1},{a}_{t+1}\right)\right\} \)
18:   end if
19: end while
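The two numerical steps of the algorithm, the k-out-of-N fusion of individual sensing results and the Q-value update of line 17, can be sketched in Python as below. This is a minimal illustration, not the paper's implementation: the function names, the dict-of-dicts Q-table, and the default α = 0.1, γ = 0.9 are assumptions for the sketch.

```python
def k_out_of_n(sensing_results, k):
    """k-out-of-N fusion rule (lines 7-8): declare the channel
    occupied (1) if at least k of the N binary sensing results
    report a primary-user signal, idle (0) otherwise."""
    return 1 if sum(sensing_results) >= k else 0

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One application of the update rule on line 17:
    Q(s,a) <- (1 - alpha) * Q(s,a)
              + alpha * (r + gamma * max_{a'} Q(s', a')).
    Q is a dict mapping state -> {action: value} (an assumed
    table layout); alpha is the learning rate, gamma the
    discount factor."""
    best_next = max(Q[s_next].values())
    Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (r + gamma * best_next)
    return Q[s][a]
```

For example, with two sensing results out of four reporting a signal and k = 2, `k_out_of_n` declares the channel occupied; starting from Q(s, a) = 0 with payoff r_t = 1 and a best next-state value of 1, one `q_update` call yields 0.1 · (1 + 0.9 · 1) = 0.19.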