Fig. 9 From: Dynamic handoff policy for RAN slicing by exploiting deep reinforcement learning

Value function convergence. This figure shows the convergence of the action value function, where Q(s,a1), Q(s,a2), and Q(s,a3) correspond to the three different actions available to the user in the same state s. As shown in the figure, the convergence of the action value functions for state s is similar to that in Fig. 8.
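The convergence behavior described in the caption can be illustrated with a minimal sketch. This is not the paper's DRL setup: it is a hypothetical single-state MDP with three actions and assumed reward means, using plain tabular Q-learning, just to show how each Q(s,a) settles toward a stable value over training steps.

```python
import random

# Hypothetical setup (not from the paper): one state, three actions,
# each action yielding a noisy reward around an assumed mean.
random.seed(0)

REWARD_MEANS = [1.0, 2.0, 3.0]  # assumed mean reward for a1, a2, a3
ALPHA = 0.05                    # learning rate
GAMMA = 0.0                     # single-state task: no bootstrapping term

Q = [0.0, 0.0, 0.0]             # Q(s,a1), Q(s,a2), Q(s,a3)
for step in range(20000):
    a = random.randrange(3)                      # explore all actions uniformly
    r = REWARD_MEANS[a] + random.gauss(0, 0.1)   # noisy reward sample
    Q[a] += ALPHA * (r + GAMMA * max(Q) - Q[a])  # tabular Q-learning update

# Each Q(s,a) converges close to its action's mean reward.
print([round(q, 2) for q in Q])
```

Plotting Q over the training steps would reproduce the qualitative shape the figure describes: three curves rising from their initial values and flattening out as the estimates stabilize.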