Table 1 Parameters of the TD3 algorithm implemented by the secondary user

From: Dynamic spectrum access and sharing through actor-critic deep reinforcement learning

Parameter                                       Value
Discount rate γ of cumulative reward            0.5
Learning rate of actor                          0.0001
Learning rate of critic                         0.0003
Update parameter \(\rho\) of target networks    0.001
TD3 delayed update of actor                     1 actor update per 10 critic updates
Experience replay buffer size                   100,000
Mini-batch size                                 128
State observation time span \(T_0\)             32 time slots
Reward coefficient \(\beta\)                    0.05 (bit/s/Hz)/mW
Exploration noise \(w\) added to the action     decays during training: starts at \(\sigma_{w} = 10\) mW, with \(\sigma_{w,t+1} = 0.99995\,\sigma_{w,t}\)
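For reference, the settings above can be gathered into a small configuration object. The sketch below is illustrative only: `TD3Config`, `exploration_sigma`, and all field names are assumptions rather than the authors' code; the values are copied from the table, and the helper simply applies the stated geometric decay of the exploration-noise standard deviation.

```python
from dataclasses import dataclass


@dataclass
class TD3Config:
    """Hyperparameters from Table 1 (field names are illustrative, not the authors')."""
    gamma: float = 0.5            # discount rate of cumulative reward
    actor_lr: float = 1e-4        # learning rate of actor
    critic_lr: float = 3e-4       # learning rate of critic
    rho: float = 0.001            # update parameter of target networks
    policy_delay: int = 10        # 1 actor update per 10 critic updates
    buffer_size: int = 100_000    # experience replay buffer size
    batch_size: int = 128         # mini-batch size
    obs_span_T0: int = 32         # state observation time span, in time slots
    beta: float = 0.05            # reward coefficient, (bit/s/Hz)/mW
    sigma_w0: float = 10.0        # initial exploration-noise std, mW
    sigma_decay: float = 0.99995  # per-step multiplicative decay of sigma_w


def exploration_sigma(cfg: TD3Config, t: int) -> float:
    """Noise std after t training steps: sigma_{w,t} = 0.99995**t * sigma_{w,0}."""
    return cfg.sigma_w0 * cfg.sigma_decay ** t


if __name__ == "__main__":
    cfg = TD3Config()
    # After 50,000 steps the noise std has decayed to roughly 0.82 mW.
    print(f"sigma_w after 50k steps: {exploration_sigma(cfg, 50_000):.3f} mW")
```

Under this schedule the exploration noise shrinks smoothly rather than being cut off, so the secondary user keeps probing the spectrum early in training and converges toward near-deterministic actions as training proceeds.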