1: while required packet exists

2: if time index > sensing window size

3: Initialize the sensing results and the obtained results

4: else

5: time index = time index + 1

6: while time index < \( n_{\mathrm{sensing}}\, n_{\mathrm{reply}} \)

7: Determine the sensing tool (with IA or without IA)

8: Obtain the fused sensing result from the \( k \)-out-of-\( N \) rule applied to the \( N = n_{\mathrm{sensing}}\, n_{\mathrm{reply}} \) sensing results

9: end

10: Obtain \( P(D_0) \), \( P(D_1) \) from the sensing results

11: Calculate \( P(H_0) \), \( P(H_1) \) by solving the simultaneous equations in \( P(D_0) \), \( P(D_1) \)

12: Calculate \( R_I \), \( R_L \) from the conditional probabilities and determine the state

13: Select an action \( a_t \) according to the optimal policy for the current state \( s_t \)

14: Obtain the immediate payoff \( r_t \) from action \( a_t \)

15: Observe the next state \( s_{t+1} \) from \( R_I \), \( R_L \)

16: Update \( \mathcal{Q}\left({s}_t,{a}_t\right) \) based on this experience as

17: \( \mathcal{Q}\left({s}_t,{a}_t\right)\leftarrow \left(1-\alpha \right)\mathcal{Q}\left({s}_t,{a}_t\right)+\alpha \left\{{r}_t+\gamma \underset{a_{t+1}}{\max}\mathcal{Q}\left({s}_{t+1},{a}_{t+1}\right)\right\} \)

18: end

19: end
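Step 8 fuses the \( N = n_{\mathrm{sensing}}\, n_{\mathrm{reply}} \) local sensing results with a \( k \)-out-of-\( N \) rule. The following is a minimal sketch of the standard rule, assuming each local result is a hard binary decision (1 for "signal present", mapped to \( D_1 \); 0 for \( D_0 \)) and reading \( n_{\mathrm{sensing}}\, n_{\mathrm{reply}} \) as a product. The function names, the value of \( k \), and the sizes in the usage lines are hypothetical and do not come from the listing. The second function gives the usual binomial-tail probability that the fused decision is 1 when the local decisions are independent and identically distributed.

```python
from math import comb
from typing import Sequence


def k_out_of_n(decisions: Sequence[int], k: int) -> int:
    """Fuse N hard decisions (1 = signal present) with the k-out-of-N rule.

    Returns 1 (D_1) if at least k of the N decisions are 1, otherwise 0 (D_0).
    k = 1 is the OR rule and k = N is the AND rule.
    """
    n = len(decisions)
    if not 1 <= k <= n:
        raise ValueError(f"k must satisfy 1 <= k <= {n}, got {k}")
    return 1 if sum(decisions) >= k else 0


def k_out_of_n_probability(p: float, n: int, k: int) -> float:
    """P(fused decision = 1) when each of n independent local decisions is 1
    with probability p: the binomial tail sum_{i=k}^{n} C(n,i) p^i (1-p)^(n-i).
    With p = local detection probability this is the fused detection
    probability; with p = local false-alarm probability, the fused false alarm.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical sizes: 2 sensing slots, 3 replies per slot -> N = 6 results.
n_sensing, n_reply = 2, 3
results = [1, 0, 1, 1, 0, 0]
fused = k_out_of_n(results, k=3)  # three of the six decisions are 1
print(fused)  # 1
```

Choosing \( k \) trades detection against false alarms: a small \( k \) favors detection, a large \( k \) suppresses false alarms.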
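Steps 10 and 11 estimate \( P(D_0) \), \( P(D_1) \) from the sensing results and then recover \( P(H_0) \), \( P(H_1) \) from simultaneous equations. The listing does not state those equations, so the sketch below assumes the common model \( P(D_1) = P_{fa} P(H_0) + P_d P(H_1) \) and \( P(D_0) = (1 - P_{fa}) P(H_0) + (1 - P_d) P(H_1) \), where \( P_d \) and \( P_{fa} \) are detection and false-alarm probabilities of the fused detector, assumed known. It also assumes \( P(D_1) \) is estimated as the fraction of fused decisions equal to 1. All names here are hypothetical. Under these assumptions the system has a unique solution whenever \( P_d \neq P_{fa} \), and with \( P(D_0) + P(D_1) = 1 \) it reduces to \( P(H_1) = (P(D_1) - P_{fa}) / (P_d - P_{fa}) \).

```python
from typing import Sequence, Tuple


def decision_probabilities(fused: Sequence[int]) -> Tuple[float, float]:
    """Empirical (P(D_0), P(D_1)): fractions of fused decisions equal to 0 and 1."""
    if not fused:
        raise ValueError("need at least one fused decision")
    p_d1 = sum(fused) / len(fused)
    return 1.0 - p_d1, p_d1


def hypothesis_probabilities(
    p_d0: float, p_d1: float, p_det: float, p_fa: float
) -> Tuple[float, float]:
    """Solve the assumed 2x2 system by Cramer's rule:

        P(D_1) = p_fa * P(H_0)       + p_det * P(H_1)
        P(D_0) = (1 - p_fa) * P(H_0) + (1 - p_det) * P(H_1)

    and return (P(H_0), P(H_1)).
    """
    det = p_fa * (1.0 - p_det) - p_det * (1.0 - p_fa)  # equals p_fa - p_det
    if abs(det) < 1e-12:
        raise ValueError("p_det == p_fa: no unique solution")
    p_h0 = (p_d1 * (1.0 - p_det) - p_det * p_d0) / det
    p_h1 = (p_fa * p_d0 - (1.0 - p_fa) * p_d1) / det
    return p_h0, p_h1


# Example: P_d = 0.9, P_fa = 0.1, and an observed P(D_1) = 0.34.
p_h0, p_h1 = hypothesis_probabilities(0.66, 0.34, p_det=0.9, p_fa=0.1)
print(round(p_h0, 6), round(p_h1, 6))  # 0.7 0.3
```

Because \( P(D_0) \), \( P(D_1) \) are estimated from a finite sensing window, the solved values can fall slightly outside \( [0, 1] \) and may need clipping before they are used downstream.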
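Steps 13 to 17 form one step of tabular Q-learning. In the sketch below, step 13's "optimal policy" is read as the greedy choice \( \arg\max_a \mathcal{Q}(s_t, a) \) under the current table. The optional `epsilon` parameter for epsilon-greedy exploration is an added assumption, not part of the listing, and `epsilon=0.0` keeps the choice purely greedy. The update function implements the rule in step 17 directly. States and actions are hashable placeholders (`"s0"`, `"a1"`, and so on are hypothetical), because the listing does not specify how \( R_I \) and \( R_L \) are mapped to states.

```python
import random
from collections import defaultdict
from collections.abc import Hashable, Sequence
from typing import DefaultDict, Optional, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def select_action(
    q: QTable,
    state: Hashable,
    actions: Sequence[Hashable],
    epsilon: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Hashable:
    """Greedy argmax_a Q(state, a); with probability epsilon (assumed
    exploration), pick a uniformly random action instead."""
    if rng is None:
        rng = random.Random()
    if epsilon > 0.0 and rng.random() < epsilon:
        return rng.choice(list(actions))
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q: QTable, s_t: Hashable, a_t: Hashable, r_t: float,
             s_next: Hashable, actions: Sequence[Hashable],
             alpha: float, gamma: float) -> float:
    """Q(s_t,a_t) <- (1 - alpha) Q(s_t,a_t) + alpha (r_t + gamma max_a' Q(s_next,a'))."""
    best_next = max(q[(s_next, a)] for a in actions)
    q[(s_t, a_t)] = (1 - alpha) * q[(s_t, a_t)] + alpha * (r_t + gamma * best_next)
    return q[(s_t, a_t)]


q: QTable = defaultdict(float)  # unseen (state, action) pairs read as 0.0
actions = ["a0", "a1"]          # hypothetical action labels
q_update(q, "s0", "a1", r_t=1.0, s_next="s1", actions=actions, alpha=0.5, gamma=0.9)
print(q[("s0", "a1")])                  # 0.5
print(select_action(q, "s0", actions))  # a1
```

A `defaultdict` keeps the table sparse: entries are created only for state-action pairs that are actually visited or looked up.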
