# Table 1 Q-learning algorithm for the proposed scheme

Algorithm 1 Q-learning algorithm for the proposed scheme
1: while required packet exists
2: if time index > sensing window size
3: initialize the sensing results and obtained results
4: else
5: time index = time index + 1
6: while time index < $$n_{\text{sensing}}n_{\text{reply}}$$
7: Determine sensing tool (with IA or without IA)
8: from the k-out-of-N rule with N = $$n_{\text{sensing}}n_{\text{reply}}$$ sensing results
9: end
10: Obtain $$P(D_0)$$, $$P(D_1)$$ from the sensing results
11: Calculate $$P(H_0)$$, $$P(H_1)$$ from the simultaneous equations of $$P(D_0)$$, $$P(D_1)$$
12: Calculate $$R_I$$, $$R_L$$ from the conditional probabilities and determine the state
13: Select an action $$a_t$$ based on the optimal policy from the current state $$s_t$$
14: Obtain the immediate payoff R from action $$a_t$$
15: Observe the next state $$s_{t+1}$$ by $$R_I$$, $$R_L$$
16: Update $$\mathcal{Q}\left({s}_t,{a}_t\right)$$ based upon this experience as
17: $$\mathcal{Q}\left({s}_t,{a}_t\right)\leftarrow \left(1-\alpha \right)\mathcal{Q}\left({s}_t,{a}_t\right)+\alpha \left\{{r}_t+\gamma \underset{a_{t+1}}{\max}\mathcal{Q}\left({s}_{t+1},{a}_{t+1}\right)\right\}$$
18: end
19: end 
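
The fusion rule in step 8 and the update in step 17 can be sketched in Python as follows. This is a minimal illustration rather than the scheme's actual implementation: the function names, the state labels `s0`/`s1`, the action labels `a0`/`a1`, and all numeric values are hypothetical. The way states are derived from $$R_I$$, $$R_L$$ is not modeled, and greedy selection is assumed for the "optimal policy" in step 13.

```python
from collections import defaultdict


def k_out_of_n(decisions, k):
    """Fuse binary sensing results: True when at least k of them report a detection."""
    return sum(1 for d in decisions if d) >= k


def greedy_action(q, state, actions):
    """Return the action with the largest Q-value in `state` (first action wins ties)."""
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions, alpha, gamma):
    """Apply Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a'))."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] = (1 - alpha) * q[(state, action)] + alpha * (
        reward + gamma * best_next
    )
    return q[(state, action)]


# Hypothetical toy setup: two abstract actions, Q-table initialised to zero.
actions = ["a0", "a1"]
q = defaultdict(float)

# Step 8 (assumed form): three detections out of four results, with k = 3.
fused = k_out_of_n([1, 0, 1, 1], k=3)

# Steps 13-17: two consecutive experiences with made-up payoffs.
first = q_update(q, "s0", "a0", reward=1.0, next_state="s1",
                 actions=actions, alpha=0.5, gamma=0.9)
second = q_update(q, "s1", "a1", reward=2.0, next_state="s0",
                  actions=actions, alpha=0.5, gamma=0.9)
chosen = greedy_action(q, "s0", actions)
```

Here a dictionary keyed by `(state, action)` pairs stands in for the Q-table, which keeps the sketch independent of how many states the scheme defines.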