Fig. 9 | EURASIP Journal on Wireless Communications and Networking

From: Dynamic handoff policy for RAN slicing by exploiting deep reinforcement learning

Value function convergence. This figure shows the convergence of the action value function, where Q(s,a1), Q(s,a2), and Q(s,a3) denote the action values of the three different actions available to the user in the same state s. As the figure shows, the convergence behavior of the action value function for state s is similar to that in Fig. 8.
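The convergence behavior described above can be illustrated with a toy experiment. The sketch below is a minimal tabular Q-learning loop for a single state s with three actions; the mean rewards (0.2, 0.5, 0.9), the learning rate, and the exploration rate are all illustrative assumptions, not values from the paper, which trains a deep Q-network rather than a table.

```python
import random

# Toy illustration: one state s, three actions a1..a3, each with an
# assumed (hypothetical) mean reward. The Q-learning update drives each
# Q(s, a_i) toward its expected reward, mirroring the convergence curves
# plotted in the figure.
random.seed(0)

true_reward = {"a1": 0.2, "a2": 0.5, "a3": 0.9}  # hypothetical mean rewards
Q = {a: 0.0 for a in true_reward}                # action-value estimates
alpha, epsilon = 0.05, 0.1                       # learning rate, exploration rate

for step in range(20000):
    # epsilon-greedy action selection in the single state s
    if random.random() < epsilon:
        a = random.choice(list(Q))
    else:
        a = max(Q, key=Q.get)
    # noisy reward sample around the assumed mean
    r = true_reward[a] + random.gauss(0.0, 0.1)
    # tabular Q-learning update (no next-state term in this one-step setting)
    Q[a] += alpha * (r - Q[a])

print({a: round(v, 2) for a, v in Q.items()})
```

After enough iterations the estimates settle near their true means and the best action (a3 here) dominates, which is the qualitative behavior the convergence curves in the figure exhibit.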