Table 1 Parameters of the TD3 algorithm implemented by the secondary user

From: Dynamic spectrum access and sharing through actor-critic deep reinforcement learning

Discount rate \(\gamma\) of cumulative reward: 0.5
Learning rate of the actor: 0.0001
Learning rate of the critic: 0.0003
Update parameter \(\rho\) of the target networks: 0.001
TD3 delayed update of the actor: 1 actor update per 10 critic updates
Experience replay buffer size: 100,000
Mini-batch size: 128
State observation time span \(T_0\): 32 time slots
Reward coefficient \(\beta\): 0.05 (bit/s/Hz)/mW
Exploration noise \(w\) added to the action (decreasing during training): starts at \(\sigma_{w} = 10\) mW and decays as \(\sigma_{w,t+1} = 0.99995\,\sigma_{w,t}\)
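For concreteness, these settings can be collected into one configuration object. The Python sketch below does so and illustrates how two of them are typically applied: the target-network update parameter \(\rho\) in a Polyak soft update, and the multiplicative decay of the exploration-noise scale \(\sigma_w\). This is a minimal sketch, not the paper's implementation; the update convention \(\theta' \leftarrow \rho\theta + (1-\rho)\theta'\) (with \(\rho\) in the role of DDPG's \(\tau\)) is an assumption, and the names TD3_CONFIG, soft_update, and decay_noise are illustrative.

```python
import numpy as np

# Hyperparameters from Table 1 (values as reported; names illustrative).
TD3_CONFIG = {
    "gamma": 0.5,              # discount rate of cumulative reward
    "actor_lr": 1e-4,          # learning rate of the actor
    "critic_lr": 3e-4,         # learning rate of the critic
    "rho": 0.001,              # target-network update parameter
    "critic_updates_per_actor_update": 10,  # TD3 delayed actor update
    "replay_buffer_size": 100_000,
    "batch_size": 128,
    "obs_time_span_T0": 32,    # state observation span, in time slots
    "beta": 0.05,              # reward coefficient, (bit/s/Hz)/mW
    "sigma_w0": 10.0,          # initial exploration-noise std, in mW
    "sigma_w_decay": 0.99995,  # per-step multiplicative decay factor
}

def soft_update(target_params, online_params, rho):
    """Polyak soft update: theta' <- rho*theta + (1 - rho)*theta'.

    Assumed convention: rho = 0.001 blends a small fraction of the
    online weights into the target weights at each update.
    """
    for t, s in zip(target_params, online_params):
        t *= 1.0 - rho
        t += rho * s

def decay_noise(sigma_w, decay=TD3_CONFIG["sigma_w_decay"]):
    """One decay step of the exploration-noise scale (in mW)."""
    return decay * sigma_w

if __name__ == "__main__":
    # Tiny demonstration on dummy weights.
    target = [np.zeros(4)]
    online = [np.ones(4)]
    soft_update(target, online, TD3_CONFIG["rho"])
    print(target[0])  # [0.001 0.001 0.001 0.001]

    sigma_w = TD3_CONFIG["sigma_w0"]
    for _ in range(3):
        sigma_w = decay_noise(sigma_w)
    print(sigma_w)    # ~9.9985 mW after three decay steps
```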