Reinforcement learning-based hybrid spectrum resource allocation scheme for the high load of URLLC services

EURASIP Journal on Wireless Communications and Networking

Table 1 Partial parameter settings

Parameter	Value	Description
\(t\)	\(0.015625\;{\text{ms}}\)	The time interval for each resource allocation decision
\(L_{rp}\)	\(0.15625\;{\text{ms}}\)	The duration of the resource pool
\(\left| sRB \right|\)	\(5760\;{\text{kHz}}\)	The length of an RB in the frequency domain
\(\left| pRB \right|\)	\(9\;{\text{dBm}}\)	The length of an RB in the power domain
\(D_{MAX}\)	[3 ms, 1 ms]	Maximum delay constraint
\(\left\| Q \right\|\)	100	The length of the URLLC cache queue Q
NUM_U	100	The total amount of received URLLC data
\(\alpha\)	\(0.001\)	Learning step size of policy gradient
\(N\)	4	The number of hidden layers
\(p_{i}\)	1/4	The initial probability of exit at the \(i\)th layer
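
For readers reproducing the setup, the values in Table 1 can be gathered into a single configuration object. The following is a minimal sketch, not the authors' code: the class name, field names, and helper methods are hypothetical, and the reading of \(p_{i}\) as one exit probability per hidden layer is an assumption. Exact fractions are used so that derived quantities are free of floating-point rounding.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class SimulationParameters:
    """Values copied from Table 1; all names here are illustrative only."""

    decision_interval_ms: Fraction = Fraction(1, 64)       # t = 0.015625 ms
    resource_pool_duration_ms: Fraction = Fraction(5, 32)  # L_rp = 0.15625 ms
    rb_bandwidth_khz: int = 5760                           # |sRB|
    rb_power_dbm: int = 9                                  # |pRB|
    # D_MAX as listed; the table does not say which service each bound covers.
    max_delay_ms: tuple = (3, 1)
    queue_length: int = 100                                # |Q|
    num_urllc: int = 100                                   # NUM_U
    learning_rate: float = 0.001                           # alpha
    hidden_layers: int = 4                                 # N
    initial_exit_probability: Fraction = Fraction(1, 4)    # p_i

    def decisions_per_pool(self) -> int:
        """Number of allocation decisions that fit in one resource pool."""
        return int(self.resource_pool_duration_ms / self.decision_interval_ms)

    def initial_exit_probabilities(self) -> list:
        """Assumed: one exit probability per hidden layer, all equal to p_i."""
        return [self.initial_exit_probability] * self.hidden_layers


params = SimulationParameters()
print(params.decisions_per_pool())
print(sum(params.initial_exit_probabilities()))
```

With these values, one resource pool spans ten decision intervals (0.15625 ms / 0.015625 ms), and the four initial exit probabilities of 1/4 sum to 1.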