1:
|
while required packet exists
|
2:
|
if time index > sensing window size
|
3:
|
initialize the sensing results and obtained results
|
4:
|
else
|
5:
|
time index = time index + 1
|
6:
|
while time index < \( {n}_{\mathrm{sensing}}{n}_{\mathrm{reply}} \)
|
7:
|
Determine sensing tool (with IA or without IA)
|
8:
|
from the k-out-of-N rule with N = \( {n}_{\mathrm{sensing}}{n}_{\mathrm{reply}} \) sensing results
|
9:
|
end
|
10:
|
Obtain P(D0), P(D1) from the sensing results
|
11:
|
Calculate P(H0), P(H1) from the simultaneous equations of P(D0), P(D1)
|
12:
|
Calculate \( {R}_I,{R}_L \) from the conditional probabilities and determine the state
|
13:
|
Select an action \( {a}_t \) based on the optimal policy from the current state \( {s}_t \)
|
14:
|
Obtain the immediate payoff R from action \( {a}_t \)
|
15:
|
Observe the next state \( {s}_{t+1} \) by \( {R}_I,{R}_L \)
|
16:
|
Update \( \mathcal{Q}\left({s}_t,{a}_t\right) \) based upon this experience as
|
17:
|
\( \mathcal{Q}\left({s}_t,{a}_t\right)\leftarrow \left(1-\alpha \right)\mathcal{Q}\left({s}_t,{a}_t\right)+\alpha \left\{{r}_t+\gamma \underset{a_{t+1}}{\max}\mathcal{Q}\left({s}_{t+1},{a}_{t+1}\right)\right\} \)
|
18:
|
end
|
19:
|
end
|
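The fusion rule in step 8 and the update in step 17 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `k_out_of_n` and `q_update`, the state and action labels, and the values of `alpha` and `gamma` are all assumptions introduced here, and the k-out-of-N rule is taken in its usual form (declare the channel occupied when at least k of the N binary reports say so).

```python
from collections import defaultdict


def k_out_of_n(decisions, k):
    """Fuse N binary sensing reports: return 1 if at least k of them are 1."""
    return 1 if sum(decisions) >= k else 0


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply the step-17 update to Q[(s, a)] in place and return the new value.

    Q       : mapping from (state, action) to value (hypothetical layout)
    r       : immediate payoff received after taking action a in state s
    actions : all actions available in s_next
    """
    # Greedy estimate of the best value reachable from the next state.
    best_next = max(Q[(s_next, a_next)] for a_next in actions)
    # Blend the old estimate with the new target, weighted by alpha.
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
    return Q[(s, a)]


# Hypothetical labels; unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
actions = ["a0", "a1"]

fused = k_out_of_n([1, 0, 1, 1], k=3)  # three of four reports are 1
first = q_update(Q, "s0", "a0", r=1.0, s_next="s1",
                 actions=actions, alpha=0.5, gamma=0.9)
```

Storing the table in a `defaultdict(float)` means unvisited state-action pairs start at zero, which is a common but not the only possible initialization.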