A multi-objective model-based vertical handoff algorithm for heterogeneous wireless networks

The emergence of 5G communication systems will not replace existing radio access networks; instead, the two will gradually merge to form ultra-dense heterogeneous networks. In such networks, efficient vertical handoff (VHO) algorithms for 5G infrastructures are necessary to improve quality of service (QoS) and system resource utilization. In this paper, an algorithm based on a multi-objective optimization model is proposed to address the lack of comprehensive consideration of user and network impacts during the handoff process in existing VHO algorithms. A Markov chain model of each base station (BS) is built to calculate a more accurate network state value that reflects network performance. Then, a multi-objective optimization model is derived to maximize the network state value and the user data receiving rate. Finally, the multi-objective genetic algorithm NSGA-II is employed to solve the model and obtain the final VHO strategy. Simulation results for network throughput and blocking rate demonstrate that our algorithm significantly increases the system throughput and reduces the blocking rate compared with existing VHO strategies.

of user switching occurrences, and improve user QoS. However, this type of algorithm usually fails to perceive the real-time dynamic changes of the network well, which may lead to a decrease in system resource utilization. Some authors have quantified the network state by introducing the Markov chain model [2,3] so that users can perceive the performance characteristics of BSs well. However, this kind of algorithm does not acquire enough relevant information about users, which tends to lead to poor user QoS. In addition, some intelligent algorithms have been introduced to address the VHO process, such as artificial neural networks [4] and fuzzy logic algorithms [5]. Such algorithms have strong data processing capability and can effectively reduce the ping-pong effect and improve decision-making accuracy. However, the calculations involved are relatively complex and unsuitable for terminal equipment with limited computing capacity.
In this paper, we propose a VHO algorithm based on a multi-objective optimization model to solve the above problems. First, the state values of each network in different states are obtained through the Markov decision process. Then, a multi-objective optimization model that maximizes the value of the network state and user's data receiving rate is constructed. Finally, the multi-objective genetic algorithm NSGA-II is applied to solve the designed model and obtain the user's optimal handoff strategy. Compared with the existing VHO strategies, the scheme can significantly improve the utilization of system resources and reduce the blocking rate. The main contributions of this paper are summarized as follows.
1. We calculate the status value of each BS among the handoff periods to ensure that the user can perceive the performance change characteristics of the BS in real time and can effectively monitor the load rate and remaining available resources of the BS.
2. We combine the changing characteristics of the BS status with the user's QoS and fully consider the status information of the network side and the user side, which can maximize the resource utilization of the system while ensuring the user's QoS.
3. We define the handoff strategies of all handoff users as a decision variable for multi-objective optimization, which increases user coordination and ensures that more users can access the BSs, thereby further improving the resource utilization of the system.
The remainder of this paper is organized as follows. Section 2 presents related works. In Sect. 3, we derive and solve the function of a multi-objective optimization model. In Sect. 4, we simulate the performance of the handoff algorithm, and Sect. 5 summarizes this article.

Related work
In recent years, many VHO algorithms have been proposed in the literature. In the early research studies, the authors used the received signal strength (RSS) as the decision parameter to propose the traditional RSS-based VHO algorithm [6] and improved the algorithm based on the RSS threshold [7]. While this type of algorithm is simple in design and low in its level of complexity, it has a tendency to lead to a 'ping-pong' effect. In addition, since the characteristics of different access networks have become greatly varied in heterogeneous networks, executing handoff only based on RSS cannot guarantee the user's QoS.
To solve the problem that single-attribute algorithms cannot guarantee the user's QoS, a large number of papers have focused on multi-attribute VHO algorithms, which are easy to implement. In [8], a self-selection multi-attribute algorithm based on a decision tree was proposed, and this algorithm can effectively improve the users' QoS and reduce switching times. However, noise interference may cause switching errors. In [9], an improved vertical handover algorithm based on a decision tree was proposed. The algorithm derived the false alarm probability and missed alarm probability to reduce the probability of error handoff, obtained more accurate judgment attributes through the Kalman filtering algorithm, and improved the stability of the system. However, it is less able to consider the dynamic characteristics of the network, and the network performance of the BSs cannot be effectively evaluated, which may easily cause a growing blocking rate of the BS.
In [10], the Markov decision process was applied to the VHO. The algorithm introduced the total expected reward to characterize the performance of the network when the user accessed the BS, which can reflect the characteristics of the network to a certain extent, but the algorithm takes a long time to solve the expected total reward. In [11], the author formulated the VHO decision problem as a Markov decision process, with the objectives of maximizing the expected total reward and minimizing the average number of handoffs. Then, the improved genetic algorithm with simulated annealing was introduced to obtain a set of optimal decisions, which can effectively reduce the calculation time and ensure user QoS, but it lacks consideration of the real-time dynamic characteristics of the network. In [12], a VHO algorithm based on multi-armed bandit was proposed to solve the problem of a lack of consideration of the real-time dynamic characteristics of the network, which can ensure that users can access the network with a low blocking rate and a low delay. However, the algorithm does not take into account the changes of users' own states (such as location) and thus cannot guarantee user QoS requirements. In [13], the authors introduced a deep Q-network to maximize the benefits of the system and used an evolution strategy algorithm to solve the initial parameters of the backpropagation network, which improved the convergence speed and accuracy of parameter learning, but it also increases the computational complexity.
In [14-17], the authors studied multi-attribute intelligent VHO algorithms based on fuzzy logic and artificial neural networks. In [14], the authors considered the accuracy of RSS and combined the Kalman filter with a fuzzy logic algorithm to propose an improved VHO algorithm. The Kalman filter can filter out part of the Gaussian noise in the RSS to obtain more precise parameters and improve the effectiveness of decision-making, and the fuzzy logic algorithm can handle large amounts of fuzzy data. However, the scale of the fuzzy inference rules increases exponentially with the number of attributes, which greatly increases the complexity of the algorithm. Reference [15] proposed a neural network-based network selection algorithm that improves the algorithm's ability to adapt to network variability, but it does not consider the user's QoS and cannot reflect user satisfaction well. In [16], the authors considered the quality of experience (QoE) in a VHO algorithm based on an artificial neural network to improve user satisfaction. The authors in [17] presented a Q-learning-based algorithm for VHO with a QoE evaluation mechanism based on an RNN, which further improves the QoE. However, the complicated calculations make these algorithms unsuitable for terminal devices with limited computing capabilities.
In [18], a novel VHO management scheme for a fifth-generation vehicular cloud computing system is proposed, in which the handoff trigger is initiated in the fog environment using the vehicle's current velocity and the network signal-to-noise-plus-interference ratio. The network selection is done by the cloud using the vehicles' speed. After that, vehicles start the handoff process accordingly. This scheme uses pentagonal fuzzy interval values and the fuzzy analytic network process algorithm to predict the target network. In comparison with other handoff schemes, their work always chooses the best connection in terms of the number of handoffs performed and user satisfaction. In [19], a novel mobility management scheme for 5G systems is proposed with a scheme that considers the quality of service perceived by the mobile nodes as well as the energy level of mobile nodes, which can effectively ensure user requirements and reduce system energy consumption. At the same time, in the execution phase, an improved FPMIP method is proposed to reduce signaling costs, packet delays, and losses, but it is less flexible.
In [20], the authors built a user-centered multi-objective handoff scheme that can improve the throughput of the system to a certain extent but still cannot reflect the true state of the BSs, and it lacks consideration of coordination among users in its design so that network resources cannot be fully utilized.

Method
In this section, we construct a multi-objective optimization model for VHO to address a shortcoming of existing VHO algorithms, namely that they do not jointly consider the user's QoS and the network performance during the handoff process. There are two phases. In the first phase, we obtain the status value sequence of each BS by a Markov chain model. In the second phase, we build a multi-objective optimization model to maximize the value of the network state and the user data receiving rate, and then the multi-objective genetic algorithm NSGA-II is used to solve the model and obtain the final handoff strategy. Table 1 shows the variable notations used in this paper.

Pre-judgment stage
We assume there are $M$ BSs in the heterogeneous networks, and a total of $N(t)$ users are located under the coverage of the heterogeneous networks at time $t$. We divide users into two classes: non-handoff users and handoff users. Non-handoff users stay connected to their current BSs, while handoff users transfer their network connections to new BSs. For each user $j$ ($j = 1, 2, \ldots, N(t)$), the RSS coming from each BS can be calculated. Let $l_{ij}(t)$ be the distance from user $j$ to BS $i$ ($i = 1, 2, \ldots, M$) at time $t$. The RSS that user $j$ receives from BS $i$ at time $t$, $rss_{ij}(t)$, is then given by Eq. (1), where $\rho_i$ is the transmission power of BS $i$, $\kappa_i$ is the path loss factor of BS $i$, and $\epsilon_i^2$ is a zero-mean Gaussian random variable with variance $\sigma_1^2$. Further, according to Shannon's theorem, the data transmission rate $q_{ij}(t)$ that user $j$ obtains from BS $i$ at time $t$ is given by Eq. (2), where $w'_i$ is the average bandwidth that can be allocated to each channel of BS $i$, $\xi_i^2$ is zero-mean Gaussian white noise with variance $\sigma_2^2$, and $D_j(t)$ is the interference signal strength of user $j$ at time $t$.
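The two quantities above can be illustrated with a minimal Python sketch. It assumes a log-distance path-loss form for the RSS and a standard Shannon capacity for the rate; these specific formulas, the dBm/mW units, and all function and parameter names are assumptions made for illustration, and they may differ from the exact Eqs. (1) and (2).

```python
import math
import random


def rss_dbm(tx_power_dbm, path_loss_factor, distance_m, sigma_db=0.0, rng=random):
    """Received signal strength in dBm (assumed log-distance path-loss form).

    tx_power_dbm plays the role of rho_i, path_loss_factor of kappa_i and
    distance_m of l_ij(t); sigma_db is the std. dev. of the Gaussian term.
    """
    shadowing = rng.gauss(0.0, sigma_db) if sigma_db > 0 else 0.0
    return tx_power_dbm - 10.0 * path_loss_factor * math.log10(distance_m) + shadowing


def dbm_to_mw(power_dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10.0 ** (power_dbm / 10.0)


def shannon_rate(channel_bw_hz, rss_dbm_value, interference_mw, noise_mw):
    """Shannon capacity (bit/s) of one channel of bandwidth w'_i.

    interference_mw stands for D_j(t) and noise_mw for the white-noise power.
    """
    sinr = dbm_to_mw(rss_dbm_value) / (interference_mw + noise_mw)
    return channel_bw_hz * math.log2(1.0 + sinr)
```

With these assumptions, a user that is farther from a BS receives a lower RSS and therefore a lower rate from that BS, which is the dependence on distance that the later handoff decision relies on.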
Table 1 Variable notations
$N(t)$: the total number of users located under the coverage of the heterogeneous networks at time $t$
$l_{ij}(t)$: the distance from user $j$ to BS $i$
$rss_{ij}$: the RSS user $j$ receives from BS $i$
$\kappa_i$: the path loss factor of BS $i$
$\epsilon_i^2$: the Gaussian random variable of RSS
$q_{ij}$: the data transmission rate user $j$ obtains from BS $i$
$w'_i$: the average bandwidth that each channel of BS $i$ can be allocated
$D_j(t)$: the interference signal strength of user $j$ at time $t$
$\xi_i^2$: the Gaussian white noise of the interference signal strength
During the decision process, we need to determine the number of non-handoff users under each BS and the total number of handoff users in the heterogeneous networks in order to determine the network status of each BS more accurately. Here, we use the basic bandwidth requirement of the user to determine whether each user requires a handoff. For users who have newly entered the coverage of the heterogeneous networks and those who were not connected to a BS due to blocking during the last handoff, the current actual data transmission rate is 0, so they are treated as users who need to be handed over. For users who are already connected to a BS, we further check whether they are handoff users. When user $j$ is connected to BS $i$ at time $t$, we can obtain $q_{ij}(t)$ according to Eq. (2). The rule is that when $q_{ij}(t) < \gamma$ (where $\gamma$ is the basic bandwidth requirement of the user), user $j$ needs to be handed over; otherwise, user $j$ continues to be connected to BS $i$. In this way, the number of handoff users $u_{\mathrm{is}}(t)$ in the heterogeneous networks and the number of non-handoff users in each BS, $u_{\mathrm{non},i}(t)$, are determined.
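The threshold rule above can be sketched in a few lines of Python. The data layout (a list of serving BS indices with `None` for unconnected users) and the function name are hypothetical choices for illustration only.

```python
def split_handoff_users(serving_bs, current_rate, gamma):
    """Classify users with the basic-bandwidth rule q_ij(t) < gamma.

    serving_bs[j] is the index of the BS serving user j, or None for a new
    or previously blocked user; current_rate[j] is the rate user j gets from
    its serving BS. Returns the list of handoff users and a dict mapping each
    BS index to its number of non-handoff users.
    """
    handoff_users = []
    non_handoff_count = {}
    for j, bs in enumerate(serving_bs):
        # Unconnected users have an actual rate of 0 and always need a handoff.
        rate = 0.0 if bs is None else current_rate[j]
        if rate < gamma:
            handoff_users.append(j)
        else:
            non_handoff_count[bs] = non_handoff_count.get(bs, 0) + 1
    return handoff_users, non_handoff_count
```

For example, with `serving_bs = [0, None, 1, 0]`, `current_rate = [2.0, 0.0, 0.5, 1.0]` and `gamma = 1.0`, users 1 and 2 become handoff users and BS 0 keeps two non-handoff users.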
For each handoff user in the heterogeneous networks, a pre-decision module is required to approximately determine the target BS that the user may access based on the user's data transmission rate. Let $Q_j = \{q_{1j}, q_{2j}, \ldots, q_{Mj}\}$ be the set of data transmission rates that user $j$ can obtain from each BS at time $t$; user $j$ selects the BS with the highest data transmission rate as its pre-handoff target. We can then obtain the pre-target BSs of all users and count the number of users who pre-switch to each BS, denoted $u_{\mathrm{is},i}(t)$ ($i = 1, 2, \ldots, M$). Let the user's decision gap be $\tau$; then, the user arrival rate $\lambda_i$ of BS $i$ within the decision gap is $\lambda_i = u_{\mathrm{is},i}(t)/\tau$. According to queuing theory [21], the probability that $k$ users reach each BS in unit time is given by Eq. (3). When the service rate $\mu_i$ of each BS is known, the probability that $k$ users leave the BS in unit time is given by Eq. (4), where $c_i$ is the number of users served in BS $i$, and $c_i = u_{\mathrm{non},i}(t)$ is the number of non-handoff users in that BS.
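The pre-decision step and the arrival rate can be sketched as follows. The Poisson form used for the arrival probability is the standard queuing-theory assumption and is adopted here only for illustration; the function names and the list-of-lists layout of the rates are hypothetical.

```python
import math


def pre_target_counts(rates, num_bs):
    """Count, for each BS, the handoff users whose best rate comes from it.

    rates[j][i] is the rate handoff user j would obtain from BS i.
    """
    counts = [0] * num_bs
    for user_rates in rates:
        best_bs = max(range(num_bs), key=lambda i: user_rates[i])
        counts[best_bs] += 1
    return counts


def arrival_rates(counts, tau):
    """Arrival rate of each BS within one decision gap tau."""
    return [count / tau for count in counts]


def poisson_pmf(k, lam):
    """Probability of exactly k arrivals in unit time (assumed Poisson)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)
```

A BS that attracts many pre-handoff users gets a large arrival rate, which shifts probability mass toward heavily loaded states in the subsequent state-value computation.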
The state of a BS, such as its remaining available bandwidth and network delay, changes as the number of users accessing the BS changes. The status of the BSs in the heterogeneous network at time $t$ is represented by the set $S(t) = \{s_1(t), s_2(t), \ldots, s_M(t)\}$. At each decision time, a user can access only one BS, and the status of the BS evolves as a Markov chain. To find the state value of each BS in its different states, it is necessary to first obtain the instant reward of each state of each BS and the transition probabilities between states. The instant reward is determined by the network performance of each BS: the better the performance, the greater the instant reward. Here, the remaining available bandwidth and network delay are used to build the instant reward (Eq. (5)). The remaining available bandwidth is a benefit parameter and the network delay is a loss parameter, and each has its own value function; $d_{max}$ and $d_{min}$ are the maximum and minimum delay required for connection, respectively, and $w_b$ and $w_d$ are the weights of the remaining available bandwidth and network delay, respectively. The instant reward reflects the network performance of the BS in the current state rather than the dynamic changes of the state. Therefore, the state value of the BS needs to be further determined according to the state transition probability. The state of the BS is related to the number of users connected to it, so a change in the number of users leads to a change in the state of the BS. Let $t$ be the time when the decision cycle starts and $c_i$ be the number of users being served in the corresponding BS. Then, the probability that the number of users changes from $c_i$ to $c_{next}$ is given by Eq. (6) when $c_{next} < c_i$ and by Eq. (7) when $c_{next} \ge c_i$, where $p(c_i, c_{next})$ is the state transition probability of the BS.
According to the basic bandwidth requirement, the total number of users who need to make a handoff decision at time $t$ is $u_{\mathrm{is}}(t)$. Since all handoff users may access the same BS, it is necessary to calculate the $u_{\mathrm{is}}(t)$ state values corresponding to each BS. Because the number of access channels of each BS is limited, the number of users connected to each BS is limited as well. Suppose that the number of channels of BS $i$ in the heterogeneous network is $\eta_i$; then, according to Eqs. (5)-(7), when $u_{\mathrm{is}}(t) > \eta_i$, the state value of each state is calculated as in Eqs. (8) and (9). In Eqs. (8) and (9), $V^{\pi}(s_i(t + 1))$ represents the state value at the next moment, which has multiple possibilities. The probability of a transition to each state is $p(s_i(t), s_i(t + 1)) = p(c_i, c_{next})$, and $\beta$ is the discount factor. The state values of all states of all BSs can be obtained by using a dynamic programming algorithm.
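The dynamic programming step can be sketched as an iterative evaluation of a Markov reward process. This is a generic Bellman-style recursion under the assumption that the state value equals the instant reward plus the discounted expected value of the next state; it is not claimed to reproduce Eqs. (8) and (9) exactly, and the function name and arguments are hypothetical.

```python
def state_values(rewards, transition, beta=0.9, tol=1e-10, max_iter=10000):
    """Iteratively solve V(s) = r(s) + beta * sum_s' p(s, s') * V(s').

    rewards[s] is the instant reward of state s (here, s could index the
    number of users connected to a BS), transition[s][s2] is the transition
    probability p(s, s2), and beta is the discount factor (0 <= beta < 1).
    """
    n = len(rewards)
    values = [0.0] * n
    for _ in range(max_iter):
        new_values = [
            rewards[s] + beta * sum(transition[s][s2] * values[s2] for s2 in range(n))
            for s in range(n)
        ]
        # Stop once the largest update falls below the tolerance.
        if max(abs(a - b) for a, b in zip(new_values, values)) < tol:
            return new_values
        values = new_values
    return values
```

Because the discount factor is below 1, the iteration is a contraction and converges regardless of the starting values; a state with a high instant reward but a high probability of moving to congested states ends up with a lower value than its reward alone would suggest.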

Switching model design
According to Eqs. (8) and (9), we can obtain the state value corresponding to each state of the BS. From the calculation of the instant reward in Eq. (5), it follows that, at the same moment, the BS with the larger state value has better network performance, and users tend to access the BS with better performance in order to obtain higher bandwidth and lower delay and to reduce the blocking rate of the BS when performing a network handoff. However, a handoff based solely on the better network state cannot guarantee the QoS. We also have to consider changes in the user's status, such as the user's maximum transmission rate, which depends not only on the average bandwidth of the BS but also on the distance from the user to the BS, as implied by Eq. (2). We select the user's data transmission rate to reflect the user's service quality: the greater the data transmission rate, the higher the user's service quality. When designing a handoff scheme, users should therefore choose a BS with both a high data transmission rate and a high state value.
In the same decision cycle, there may be multiple users performing network switching. If each handoff is performed around a single handoff user, as in traditional handoff schemes, the overall resources of the heterogeneous network system are used less efficiently, and the network performance does not reach its optimum because the handoff users are not coordinated. To improve the resource utilization efficiency and the network performance, the handoff users should be coordinated in a unified design, converting the single-user handoff into a multi-user coordinated handoff. First, we represent the BSs connected by all handoff users at time $t$ as a matrix variable $\Theta(t)$:
$$\Theta(t) = [\theta_{ij}(t)]_{M \times u_{\mathrm{is}}(t)} \quad (10)$$
The matrix $\Theta(t)$ reflects the network selection results of all handoff users at time $t$, $M$ is the total number of BSs in the heterogeneous network, and $u_{\mathrm{is}}(t)$ is the number of handoff users at time $t$. Each element of the matrix can only take the value 1 or 0, according to the following rule:
$$\theta_{ij}(t) = \begin{cases} 1, & \text{user } j \text{ connects to BS } i \\ 0, & \text{user } j \text{ does not connect to BS } i \end{cases} \quad (11)$$
where $i = 1, 2, \ldots, M$ and $j = 1, 2, \ldots, u_{\mathrm{is}}(t)$.
The $i$-th row of the matrix is denoted $\Theta_{(i,:)}(t)$ and reflects which handoff users access BS $i$. The $j$-th column of the matrix is denoted $\Theta_{(:,j)}(t)$ and reflects the network selection result of user $j$ at time $t$. Similarly, the data transmission rate from each BS for all handoff users at decision time $t$ can be calculated by Eqs. (1) and (2), and we can construct a data transmission rate matrix $Q(t)$:
$$Q(t) = [q_{ij}(t)]_{M \times u_{\mathrm{is}}(t)} \quad (12)$$
After obtaining the data transmission rate matrix that reflects the user's QoS, we also need to build a state value matrix that reflects the status of the BSs, so that both the network performance of the BSs and the user's QoS can be taken into account in designing a reasonable VHO scheme. The state value can be obtained according to Eqs. (8) and (9), and the state value sequence of each BS at time $t$ has $u_{\mathrm{is}}(t)$ elements, which can also be expressed in matrix form as $V(t)$ (Eq. (13)). Each row of the matrix $V(t)$ is the state value sequence of one BS, arranged in descending order.
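The decision matrix and the rate matrix can be made concrete with a short Python sketch. It builds a 0/1 matrix from a per-user assignment, checks the two limits of the model (each user on at most one BS, each BS within its channel count), and sums the rates of the connected users. The helper names and the use of `None` for a blocked user are hypothetical, and the rate sum is only one plausible reading of the throughput objective.

```python
def build_theta(assignment, num_bs):
    """Build the 0/1 decision matrix from assignment[j] = BS index or None."""
    theta = [[0] * len(assignment) for _ in range(num_bs)]
    for j, bs in enumerate(assignment):
        if bs is not None:
            theta[bs][j] = 1
    return theta


def is_feasible(theta, capacity):
    """Check column sums <= 1 and row sums <= eta_i (capacity[i])."""
    num_users = len(theta[0]) if theta else 0
    one_bs_per_user = all(
        sum(row[j] for row in theta) <= 1 for j in range(num_users)
    )
    within_capacity = all(sum(row) <= cap for row, cap in zip(theta, capacity))
    return one_bs_per_user and within_capacity


def total_rate(theta, q):
    """Sum of q[i][j] over all (BS i, user j) pairs that are connected."""
    return sum(
        theta[i][j] * q[i][j]
        for i in range(len(theta))
        for j in range(len(theta[i]))
    )
```

Representing the strategy of all handoff users in one matrix is what allows a single candidate solution to be checked against the per-BS limits, which a per-user decision cannot do.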
To maximize the resource utilization of the system and satisfy the basic requirements of the user's QoS at the same time, the VHO algorithm is designed to maximize both the sum of the state values and the system throughput. Let the matrix $\Theta(t)$ be the decision variable; considering Eqs. (12) and (13), the multi-objective optimization model of Eq. (14) is obtained, where $Q_{(:,j)}(t)$ represents the data transmission rates from all BSs to user $j$ at time $t$, and $\|\Theta_{(i,:)}(t)\|_1$ is the 1-norm of the vector $\Theta_{(i,:)}(t)$, which represents the total number of users connected to BS $i$ at time $t$. Each element of $\Theta$ can only take the value 0 or 1 (1 if user $j$ connects to BS $i$ and 0 otherwise, Eq. (11)); in mathematical form, $\theta_{ij}(1 - \theta_{ij}) = 0$ ($i = 1, 2, \ldots, M$, $j = 1, 2, \ldots, u_{\mathrm{is}}(t)$). The same user can access at most one BS at a time, so $\sum_{i=1}^{M} \theta_{ij}(t) \le 1$, and since the number of access channels of each BS is limited, the number of users connected to each BS is also limited, that is, $0 \le \|\Theta_{(i,:)}\|_1 \le \eta_i$ ($i = 1, 2, \ldots, M$). Equation (14) shows that the VHO decision problem has been transformed into a multi-objective optimization problem whose optimization variable is the matrix $\Theta$. To solve this constrained multi-objective optimization model, the multi-objective genetic algorithm NSGA-II [22] is applied. The algorithm steps are shown in Algorithm 1. First, we initialize the population according to the constraints of the model and then set the iterative stop condition to perform fast non-dominated sorting, crowding-distance assignment, tournament selection, and the elite
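The non-dominated sorting used by NSGA-II can be illustrated with a small Python sketch for two or more objectives that are all maximized, as in the model above. This is a plain front-peeling version written for clarity, not the bookkeeping-optimized fast sort of NSGA-II itself, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_fronts(objectives):
    """Partition solution indices into successive Pareto fronts.

    objectives[k] is the tuple of objective values of candidate k, e.g.
    (sum of state values, total data rate) for one decision matrix.
    """
    remaining = set(range(len(objectives)))
    fronts = []
    while remaining:
        front = [
            i for i in remaining
            if not any(
                dominates(objectives[j], objectives[i]) for j in remaining if j != i
            )
        ]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts
```

Candidates in the first front are those for which no other candidate is at least as good in both objectives and strictly better in one; the genetic algorithm prefers them when selecting parents and survivors.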

Experiments and results
To evaluate the performance of the proposed VHO algorithm, we set up a simulation model in MATLAB 2018a and compared the proposed algorithm (MBMO) with the VHO algorithm based on a decision tree (DT-VHO) [9], the VHO algorithm based on the multi-armed bandit model (MABA) [12], and the user-centered multi-objective handoff algorithm (MOS) [20]. DT-VHO is a multi-attribute decision-making algorithm based on decision trees; it obtains more accurate decision attribute values through the Kalman filtering algorithm and derives the false alarm probability and missed alarm probability to further reduce the handoff error rate and effectively improve the user's QoS. MABA uses a multi-armed bandit model and the Gittins index to characterize the network performance of the BS, which allows it to monitor the dynamic changes of the network in real time and improve the resource utilization of the system. Both algorithms bring considerable performance improvements, but neither fully considers both user QoS and network dynamic changes. The algorithm proposed in this paper takes both into account, and the comparison is used to verify the resulting performance gain. MOS is similar to our approach: it constructs a multi-objective optimization function to achieve VHO by considering the user's data transmission rate and the blocking rate of the BS, but it still cannot reflect the real-time state of the BSs.
The network simulation scenario is illustrated in Fig. 2. Following [20], we built a heterogeneous network model with three BSs: a 3G (W-CDMA) BS, a 4G (LTE) BS, and a 5G BS. The networks were assumed to be placed in a 520 m × 510 m rectangular area, where the 3G BS was located at coordinate point (250, 510), the 4G BS at coordinate point (−10, 0), and the 5G BS at coordinate point (510, 0). Then, 100 users were randomly generated in a 500 m × 500 m rectangular area in the middle of the BSs. We assumed that the users' moving speed was 0-2 m/s and that the initial direction of movement was random. The relevant parameters of the three BSs in Table 2 are set according to [9] and [20], and common simulation parameters are given in Table 3. Figure 3 shows the value of the network state of each network. To show the value in different states intuitively, we assume that the user arrival rate is 2, and the available bandwidth and delay increase with the number of access users. As the number of access users increases, the network performance of the BS decreases, resulting in a decrease in the status value of the BS. When the number of access users is 0, each BS reaches its best state, and when the number of access users is greater than the maximum number of accessible users set by the BS, the status value of the BS becomes 0. Since the performance of the 5G BS is

Analysis of total throughput
Throughput reflects the amount of data transmitted by the system in unit time. The higher the throughput, the higher the transmission efficiency of the system, which makes it an important performance indicator of the network. Figure 4 shows the throughput under the four VHO algorithms, MBMO, DT-VHO [9], MABA [12], and MOS [20], for different numbers of accessing users. Among them, the throughput of MBMO is calculated by Eq. (14). From Fig. 4, we can see that when the number of accessing users is less than 50, the throughput increases continuously with the number of users, but when the number of accessing users is greater than 50, the throughput increases slowly and gradually levels off. Because the number of accessing users allowed by the heterogeneous network is limited, the system tends to be saturated when the number of accessing users exceeds 50. In addition, Fig. 4 shows that the proposed handoff algorithm MBMO always has the highest throughput.
Figure 5 illustrates how the blocking rate of the four algorithms changes as the number of accessing users increases. The blocking rate curve of the MBMO algorithm is obtained by Eq. (14). It can be seen from Fig. 5 that, overall, the blocking rate rises with the number of accessing users. When the number of accessing users is less than 30, the system still has capacity to accommodate more new users, so the blocking rate of each algorithm is small or even close to 0. When the number of accessing users is greater than 30, the blocking rate of each algorithm starts to increase. With the same number of accessing users, the blocking rate under the MOS method is always quite large because it always considers the worst case and overlaps the BSs that all users can access. As a result, this algorithm cannot truly reflect the blocking rate of the BS.
The benefit function constructed by the DT-VHO method does not consider real-time changes in the BS state during the decision period, so its blocking rate changes gently but does not reach an optimal level. Both the MBMO algorithm proposed in this article and MABA consider the real-time changes of the BS state. Therefore, the blocking rates under those two methods are always low, but it can be seen from the construction of Eq. (14) that the proposed MBMO algorithm further considers the coordination between handoff users and the QoS of users and thus further reduces the blocking rate of the system.
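The effect of missing coordination on the blocking rate can be illustrated with a toy Python sketch. It assumes that the blocking rate is the fraction of handoff users left without a BS, and it uses a naive first-come admission rule in which each user only tries its preferred BS. Both the definition and the rule are assumptions for illustration; the rule does not correspond to any of the four compared algorithms.

```python
def admit(preferences, capacity):
    """Uncoordinated admission: each user takes its preferred BS if a channel is free.

    preferences[j] is the preferred BS index of user j; capacity[i] is the
    number of free channels of BS i. Blocked users are marked with None.
    """
    free = list(capacity)
    assignment = []
    for bs in preferences:
        if free[bs] > 0:
            free[bs] -= 1
            assignment.append(bs)
        else:
            assignment.append(None)
    return assignment


def blocking_rate(assignment):
    """Fraction of handoff users that could not be connected to any BS."""
    if not assignment:
        return 0.0
    blocked = sum(1 for bs in assignment if bs is None)
    return blocked / len(assignment)
```

For instance, `admit([0, 0, 0, 1], [2, 1])` blocks the third user even though the system as a whole is not forced to, since a different preference pattern could have filled the same three channels without leaving a fourth user out; a coordinated assignment over all handoff users can avoid this kind of avoidable blocking when spare channels exist elsewhere.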

Stability of algorithm
To verify the stability of the proposed algorithm, we simulate the throughput of each handoff user after switching when the number of handoff users is less than the system capacity. Figure 6 shows the perceived throughput of each handoff user under each network. The simulation results of each algorithm are shown in Fig. 7. The proposed algorithm MBMO and the VHO algorithm MABA based on the multi-armed bandit model take into account the dynamic characteristics of the networks when users switch, so when the system capacity is sufficient, there will be almost no handoff users that cannot access the BS due to a certain BS blocking. The user-centered VHO algorithm MOS and the decision tree-based VHO algorithm DT-VHO focus on a single user and thus lack the consideration of network-side characteristics, resulting in large fluctuations in the data reception rate obtained by users and unstable user performance. In addition, it can be clearly seen from the mean error graph of the user throughput in Fig. 8 that the MBMO algorithm has the highest mean and the

Conclusions and prospects
In this paper, we propose a VHO algorithm based on a multi-objective optimization model. We build a multi-objective optimization model with full consideration of the dynamic characteristics of the network side and the QoS of the user side. To quantify the performance of the network state and make it convenient for mathematical processing, a Markov chain is used to solve the value sequence of the BS. Finally, we solve the multi-objective optimization function through the NSGA-II algorithm to obtain the users' decision result. The algorithm can effectively improve the user's service quality and the resource utilization of heterogeneous network systems.
In the future, the heterogeneous network environment will become more complex. With the development of the Internet of Things, a massive number of terminals will be connected to heterogeneous networks. To maintain a balance between terminal QoS and network resource utilization, future VHO algorithms should be able to process a large amount of data. In current research, although neural network-based VHO can handle a large amount of data, its complexity is high, its calculation is slow, and it is not suitable for high-speed mobility scenarios. Further study on VHO algorithms may focus on processing massive data with lower latency.