Reward versus moving speed: v0 = 50 m/s.

The figure depicts the impact of the moving speed on the reward when the original moving speed is v0 = 50 m/s. We assume that the RSs are fixed and deployed in advantageous geographic locations, which is sufficient to ensure the transmission rate. That is, the BS-to-RS channel stays in a good state or transits to a good state with high probability. The BS-to-RS channel is therefore good enough for 16QAM, whereas the modulation scheme of the RS-to-train channel can be BPSK, QAM, or 16QAM, depending on the channel state. The corresponding transmission rates of BPSK, QAM, and 16QAM are 1, 2, and 4, respectively. We set the default discount factor to λ = 0.9. The duration of one symbol is 0.1 ms. The simulation results are obtained with the train moving at speed v1, which is varied from 20 to 200 m/s. The Monte Carlo simulations are conducted over a large number of trials, and the state-transition probability matrices of the RS-to-train channels are chosen randomly for each trial based on different SNR thresholds. All SNR values are normalized to a reference SNR, which has a normalized value of 1.

Note that only one type of inter-relay handoff, either intra-cell or inter-cell, may occur for a given user at any decision epoch. For illustration purposes, we assume that the intra-cell inter-relay handoff introduces negligible overhead, which is set to zero, and we set the overhead of the inter-cell inter-relay handoff to 0.1 by default. The exact values of these parameters may differ between systems, depending on the specific implementation.
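The simulation setup described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' simulator: the rates, discount factor, and handoff overheads are taken from the text, while the per-epoch reward model (rate minus handoff overhead), the uniform sampling of transition matrices, and all function names are assumptions made for the example.

```python
import random

# Values stated in the text.
RATES = [1, 2, 4]          # transmission rates for BPSK, QAM, 16QAM
DISCOUNT = 0.9             # default discount factor (lambda)
HANDOFF_COST = {"none": 0.0, "intra_cell": 0.0, "inter_cell": 0.1}


def random_transition_matrix(n_states, rng):
    """Draw a random row-stochastic matrix.

    Assumption: rows are sampled uniformly and normalized; the paper
    derives them from SNR thresholds, which are not reproduced here.
    """
    matrix = []
    for _ in range(n_states):
        weights = [rng.random() + 1e-9 for _ in range(n_states)]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


def simulate_states(matrix, start, steps, rng):
    """Sample a channel-state trajectory from the Markov chain."""
    states = [start]
    for _ in range(steps - 1):
        row = matrix[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states


def discounted_reward(rates, handoffs, discount=DISCOUNT):
    """Sum of discount**t * (rate_t - handoff overhead_t).

    Assumption: the per-epoch reward is the rate minus the overhead.
    """
    return sum(
        discount ** t * (rate - HANDOFF_COST[handoff])
        for t, (rate, handoff) in enumerate(zip(rates, handoffs))
    )


def monte_carlo_average(trials, steps, seed=0):
    """Average discounted reward over random channels, without handoffs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        matrix = random_transition_matrix(len(RATES), rng)
        states = simulate_states(matrix, start=0, steps=steps, rng=rng)
        rates = [RATES[s] for s in states]
        total += discounted_reward(rates, ["none"] * steps)
    return total / trials
```

For instance, `discounted_reward([4, 2], ["none", "inter_cell"])` evaluates 4 + 0.9 × (2 − 0.1), which shows how the inter-cell overhead of 0.1 enters the discounted sum while an intra-cell handoff would leave it unchanged.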