Short-term passenger flow forecast for urban rail transit based on multi-source data

Short-term passenger flow prediction in urban rail transit plays an important role because it informs decision-making on operation scheduling. However, passenger flow prediction is affected by many factors. This study uses the seasonal autoregressive integrated moving average model (SARIMA) and support vector machines (SVM) to establish a traffic flow prediction model. The model is built using intelligent data provided by a large-scale urban traffic flow warning system, such as accurate passenger flow data, collected using the Internet of Things and sensor networks. The model proposed in this paper can adapt to the complexity, nonlinearity, and periodicity of passenger flow in urban rail transit. Test results on a Beijing traffic dataset show that the SARIMA–SVM model can improve accuracy and reduce errors in traffic prediction. The obtained prediction fits well with the measured data. Therefore, the SARIMA–SVM model can fully characterize traffic variations and is suitable for passenger flow prediction.

systems. As Internet of Things technologies are widely used in various fields, real-time information perception and collection have become possible, and the perception ability of the transportation system has improved to an unprecedented degree [11][12][13]. Cloud computing is a computing model built on a pool of resources [14,15]; it offers not only distributed processing but also intelligent processing functions that can be managed and coordinated independently on top of a distributed architecture.
With the application of the Internet of Things, cloud computing and big data technology, the amount of data generated during passenger travel and rail transit operations is growing rapidly. Comprehensive city perception brings a large volume of real-time data. Massive, multi-source, heterogeneous traffic big data and personalized, diversified travel needs pose challenges to passenger flow forecasting methods, and at the same time provide a new perspective for scientific research [16][17][18].
With the continuous opening of new lines and the increasing density of the network, the proportion of citizens traveling by urban rail transit keeps rising. At present, most urban rail transit operation management practices rely on the experience of dispatchers to judge current changes in passenger flow. Quantitative passenger flow forecasting methods or systems have not been applied, and passenger flow forecasted by experience often deviates considerably from the actual situation [19,20]. Therefore, short-term passenger flow forecasting for urban rail transit plays an increasingly important role: it can provide a basis for the metro operation and dispatching department and is of great significance for urban rail transit operation management.
The development trend of intelligent systems for urban rail transit is to construct large-scale integrated systems using advanced technologies such as computer technology, big data technology, automation technology and Internet of Things technology, which realize the interconnection of platforms, information interaction and data sharing [21][22][23]. At present, great progress has been made in the integration and application of intelligent systems for urban rail transit [24][25][26]. In this paper, a large amount of diverse, real data, such as passenger flow data and weather data, is used to realize the intelligent aggregation of information resources.
So far, the methods of short-term station passenger flow forecasting for urban rail transit can generally be divided into three categories: linear models, nonlinear models and combined models. The linear methods include time series models and Kalman filtering models [27][28][29]. The nonlinear methods include genetic algorithms, neural networks, nonparametric regression models, gray system models, support vector machines, chaos theory, etc. [30][31][32].
The advantage of linear prediction algorithms is their low computational complexity, but they perform poorly on complex passenger flow data. Nonlinear prediction models can deal with the volatility of passenger flow time series, but they have the shortcomings of complex theory and calculation. Neither linear nor nonlinear models alone are able to fully characterize short-term urban rail transit traffic, so combined models have gradually become the focus of research. [33] proposes a new approach for traffic flow prediction based on the ARIMA model and the SVM model. [34] presents two novel neural network structures for short-term railway passenger demand forecasting. [35] proposes a novel hybrid optimization algorithm of computational intelligence techniques for highway passenger volume prediction. [36] proposes a hybrid EMD-BPN forecasting approach which combines empirical mode decomposition (EMD) and back-propagation neural networks (BPN).
At present, the short-term traffic passenger flow prediction method based on the combined model can improve the fitting ability of the model to a certain extent, and thus effectively improve the prediction accuracy of the passenger flow.
Generally speaking, the relationship between the short-term station passenger flow of urban rail transit and the historical station passenger flow, large-scale event information and weather conditions is very complicated, and it is not suitable to describe it with a single linear or nonlinear model. Considering the regularity and time-varying characteristics of urban rail transit passenger flow, this paper proposes a SARIMA-SVM combination model for short-term station passenger flow forecasting, based on the seasonal autoregressive integrated moving average model (SARIMA) and the support vector machine model (SVM).
The proposed combined model considers both the periodic characteristics of passenger flow changes and the nonlinear relationship between the short-term passenger flow and the passenger flow in the preceding and following periods. It makes full use of the existing passenger flow information and realizes the prediction of short-term station passenger flow of urban rail transit.
The rest of this paper is organized as follows. Section 2 discusses the principles of the SARIMA model and the SVM model and proposes the SARIMA-SVM combination model. The simulation experimental results are shown in Sect. 3, and Sect. 4 concludes the paper with a summary and future research directions.

SARIMA model
The seasonal autoregressive integrated moving average model (SARIMA) [37,38] is a variant and expansion of the autoregressive integrated moving average model (ARIMA), which fully takes into account the periodicity of the data and is suitable for the daily dynamics of traffic flow. It can not only guarantee the accuracy of the model but also be easily applied to real-time prediction.
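The ARIMA model is composed of an autoregressive model and a moving average model, applied after d-order differencing. ARIMA(p, d, q) is expressed by Eq. (1):

$$\varphi(B)\,(1-B)^{d}\,y(t) = \theta(B)\,\varepsilon(t) \qquad (1)$$

In which, y(t) and ε(t) denote the original time series and the zero-mean white noise sequence, respectively; B is the backward shift operator, $B\,y(t) = y(t-1)$; and $\varphi(B) = 1 - \varphi_1 B - \dots - \varphi_p B^{p}$ and $\theta(B) = 1 - \theta_1 B - \dots - \theta_q B^{q}$ are the autoregressive and moving average polynomials of orders p and q.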
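Taking into account the periodic characteristics of the time series, a seasonal difference is applied to the ARIMA(p, d, q) model, and the SARIMA(p, d, q)(P, D, Q)s model is obtained, which is expressed by Eq. (2):

$$\varphi(B)\,\Phi(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,y(t) = \theta(B)\,\Theta(B^{s})\,\varepsilon(t) \qquad (2)$$

In which, $B^{s}$ denotes the seasonal shift operator, $B^{s} y(t) = y(t-s)$; s denotes the length of the seasonal cycle; D denotes the order of the seasonal difference; P denotes the lag order of the seasonal autoregressive term; and Q denotes the lag order of the seasonal moving average term. $\varphi(B)$ and $\theta(B)$ are the nonseasonal autoregressive and moving average polynomials of orders p and q, and $\Phi(B^{s})$ and $\Theta(B^{s})$ are the seasonal autoregressive and moving average polynomials of orders P and Q.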
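To make the regular and seasonal differencing in the SARIMA(p, d, q)(P, D, Q)s model concrete, the following is a minimal Python sketch of the operators $(1-B)^{d}$ and $(1-B^{s})^{D}$ applied to a series. The function names and the toy series are illustrative assumptions and are not taken from the paper's data.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag) once: y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sarima_difference(series, d=0, D=0, s=1):
    """Apply (1 - B)^d (1 - B^s)^D to a series (hypothetical helper)."""
    out = list(series)
    for _ in range(D):          # seasonal differences with period s
        out = difference(out, lag=s)
    for _ in range(d):          # regular first differences
        out = difference(out, lag=1)
    return out


# Toy series (made up): a period-4 pattern plus a linear trend.
pattern = [10, 50, 30, 20]
toy = [pattern[t % 4] + 2 * t for t in range(16)]

seasonal_only = sarima_difference(toy, d=0, D=1, s=4)
both = sarima_difference(toy, d=1, D=1, s=4)
print(seasonal_only)  # the periodic pattern cancels, leaving the trend step
print(both)           # the remaining constant is removed as well
```

In this toy case one seasonal difference removes the repeating pattern, and one further regular difference removes what is left of the trend, which is the role the two operators play before the ARMA part is fitted.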

SVM model
Support vector machine (SVM) [39,40] is based on the Vapnik-Chervonenkis dimension theory of statistical learning theory and the principle of structural risk minimization. The architecture of SVM is shown in Fig. 1, where K is the kernel function, $x_i$ is the input column vector of the i-th training sample, and $y_i$ is the corresponding output value.
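The SVM regression function is:

$$f(x) = w^{T}\phi(x) + b$$

In which, $\phi(x)$ denotes a nonlinear mapping function, w denotes the weight coefficient vector, and b denotes a threshold. By introducing the relaxation variables $\xi_i$, $\xi_i^{*}$ and the penalty factor C, the following convex quadratic program for solving w and b is obtained, where n is the number of training samples:

$$\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right)$$

$$\text{s.t.}\quad y_i - w^{T}\phi(x_i) - b \le \varepsilon + \xi_i,\qquad w^{T}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*},\qquad \xi_i,\ \xi_i^{*} \ge 0$$

In which, ε denotes the parameter of the insensitive loss function and specifies the error tolerance of the regression function; the larger C is, the greater the penalty for samples whose training error exceeds ε.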
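The roles of ε and C can be illustrated with a short Python sketch of the ε-insensitive loss and the resulting regularized objective. It relies on the standard fact that, at the optimum, the sum of the two relaxation variables of a sample equals its ε-insensitive loss. The function names and the numbers are illustrative assumptions only.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def svr_objective(w, y_true, y_pred, C, eps):
    """0.5 * ||w||^2 + C * sum of eps-insensitive losses (hypothetical helper)."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    penalty = C * sum(
        eps_insensitive_loss(a, p, eps) for a, p in zip(y_true, y_pred)
    )
    return regularizer + penalty


# Made-up values: three observations and their fitted values.
w = [1.0, 2.0]
actual = [100.0, 120.0, 90.0]
fitted = [103.0, 119.0, 80.0]

losses = [eps_insensitive_loss(a, p, eps=2.0) for a, p in zip(actual, fitted)]
print(losses)                                              # per-sample loss
print(svr_objective(w, actual, fitted, C=1.0, eps=2.0))    # small penalty factor
print(svr_objective(w, actual, fitted, C=10.0, eps=2.0))   # large penalty factor
```

The second fitted value lies inside the ε-tube and contributes nothing, while increasing C scales up the cost of the two samples that fall outside it.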

SARIMA-SVM combination model
The schematic diagram of the SARIMA-SVM combined model in this paper is shown in Fig. 2. The basic idea is as follows: first, the passenger flow at the station is forecasted by the SARIMA model and by the SVM model separately; then the two forecasts are combined by a weighted sum:

$$Y(f_1, f_2) = \sum_{i=1}^{2} w_i(t)\, f_i(t) + C$$

In which, $Y(f_1, f_2)$ denotes the station passenger flow data at a certain time; $f_i(t)$ denotes the predicted value of the i-th prediction method at time t; $w_i(t)$ denotes the weight of the i-th prediction method; and C is a constant.

Basic data
The study uses the passenger flow data from May 4, 2015 (Monday), to June 14, 2015 (Sunday), a period that avoids major holidays and similar events that have a major impact on urban traffic passenger flow.
To verify the practicability of the model for different types of subway stations, the 15-min inbound statistics were selected for three stations: Taoranting Station, Gongzhufen Station and BeijingNan Railway Station.
Urban rail transit passenger flow differs markedly between working days and weekends, and this difference is also clearly visible in the inbound passenger flow trend graphs. Therefore, this paper divides passenger flow into two types, workday and weekend.

SARIMA model
Rail transit passenger flow has a significant feature of daily periodicity, that is, there is a certain commonality at the corresponding time every day.
In order to predict the passenger flow $\hat{V}_i(t)$ of the station in the t-th period of the i-th day (the measured value is $V_i(t)$), the time series formed by the latest k passenger flow observations is used. The SARIMA modeling steps are as follows:
Step 1 Verify the stationarity of the passenger flow data;
Step 2 Set the initial values of the p, d, q, P, D, Q, and S parameters;
Step 3 Estimate the SARIMA(p, d, q)(P, D, Q)s model;
Step 4 Run diagnostic tests on the residuals of the fitted SARIMA(p, d, q)(P, D, Q)s model to confirm that the model adequately describes the data;
Step 5 Select the optimal SARIMA model setting based on the corresponding Akaike information criterion (AIC) value or Schwarz information criterion (SIC) value.
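The model selection in Step 5 can be sketched as follows. This is only an illustration under stated assumptions: the criteria are written in their common least-squares form for Gaussian residuals with additive constants dropped, and the candidate orders, residual sums of squares and sample size are made-up placeholders rather than values from Table 1.

```python
import math


def aic(n, rss, k):
    """AIC for a Gaussian least-squares fit, additive constants dropped."""
    return n * math.log(rss / n) + 2 * k


def sic(n, rss, k):
    """Schwarz (Bayesian) criterion in the same form; penalizes k by log(n)."""
    return n * math.log(rss / n) + k * math.log(n)


# Hypothetical candidates: residual sum of squares (rss) and number of
# estimated ARMA coefficients (k) for each fitted model.
n_obs = 680
candidates = {
    "SARIMA(1,0,1)(0,1,1)68": {"rss": 52000.0, "k": 3},
    "SARIMA(2,0,2)(0,1,1)68": {"rss": 51900.0, "k": 5},
    "SARIMA(1,0,0)(0,1,1)68": {"rss": 60000.0, "k": 2},
}

aic_scores = {name: aic(n_obs, c["rss"], c["k"]) for name, c in candidates.items()}
sic_scores = {name: sic(n_obs, c["rss"], c["k"]) for name, c in candidates.items()}

best_by_aic = min(aic_scores, key=aic_scores.get)
best_by_sic = min(sic_scores, key=sic_scores.get)
print(best_by_aic, best_by_sic)
```

Both criteria trade goodness of fit against the number of parameters, so a slightly better fit obtained with two extra coefficients is not necessarily preferred.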
Since the daily periodicity of the sequence is to be considered in the prediction model, and the sampling interval is 15 min with sampling from 6:00 to 22:45 every day, there are 68 passenger flow observations per day, so the seasonal model parameter is S = 68.
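The seasonal period follows directly from the sampling scheme, as the short Python check below shows. The helper name is an illustrative assumption; the interval and the first and last sampling times are those stated above.

```python
from datetime import datetime, timedelta


def daily_slots(first="06:00", last="22:45", step_min=15):
    """List the start times of all sampling intervals in one day."""
    start = datetime.strptime(first, "%H:%M")
    end = datetime.strptime(last, "%H:%M")
    slots = []
    current = start
    while current <= end:
        slots.append(current.strftime("%H:%M"))
        current += timedelta(minutes=step_min)
    return slots


slots = daily_slots()
print(len(slots), slots[0], slots[-1])
```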
Through the SPSS parameter estimation, the parameters of the optimal SARIMA model for the corresponding three station traffic flows are shown in Table 1.

SVM prediction model
Passenger flow follows a 7-day cycle. Assume that the time period of the short-term forecast is T (15 min in this paper); 1 day can then be divided into multiple observation periods, and the traffic volume of a given observation period is closely related to that of the previous s observation periods.
The passenger flow forecast for period t of day d is obtained from the passenger flow data of the same day of the week in the previous m weeks, the previous n days, and the previous s periods, as shown in Eq. (5). On the Matlab platform, the LIBSVM toolbox is called programmatically, the RBF kernel function is used, and 50-fold cross-validation is set. A genetic algorithm determines the optimal parameters of the SVM model, and the trained SVM model is then used for prediction. The model parameters are shown in Table 2.
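One plausible reading of this input construction is sketched below in Python. It assumes that the flows are stored as a day-by-period table, that the weekly and daily lags are taken at the same period t, and that the s recent periods are taken on the same day; the function name, the layout and the toy values are all assumptions, and Eq. (5) in the paper remains the authoritative definition.

```python
def build_features(flow, d, t, m, n, s):
    """Input vector for day d, period t (hypothetical layout: flow[day][period])."""
    if d - 7 * m < 0 or d - n < 0 or t - s < 0:
        raise ValueError("not enough history for the requested lags")
    weekly = [flow[d - 7 * j][t] for j in range(1, m + 1)]  # same weekday, previous m weeks
    daily = [flow[d - j][t] for j in range(1, n + 1)]       # previous n days, same period
    recent = [flow[d][t - j] for j in range(1, s + 1)]      # previous s periods, same day
    return weekly + daily + recent


# Toy table (made up): 15 days x 68 periods, value encodes day and period.
flow = [[100 * day + period for period in range(68)] for day in range(15)]

features = build_features(flow, d=14, t=10, m=2, n=2, s=3)
print(features)
```

Each such vector, paired with the observed flow at day d and period t, would form one training sample for the regression model.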

SARIMA-SVM combined model
According to the previous analysis, the prediction results of both the SARIMA model and the SVM model reflect the real passenger flow. A multiple linear regression of the measured passenger flow on the two model forecasts is therefore performed in SPSS to obtain the regression coefficients of the combined model. The results are shown in Table 3. The comparison of inbound passenger traffic forecasts is shown in Figs. 6, 7, and 8.
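The regression step can be illustrated with a small ordinary least squares sketch in plain Python, standing in for the SPSS analysis. It fits the measured flow on the two model forecasts plus a constant by solving the normal equations. All names and numbers are illustrative assumptions; the coefficients actually used in the paper are those reported in Table 3.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_combination(f_sarima, f_svm, observed):
    """Least-squares weights [w1, w2, c] for observed ~ w1*f1 + w2*f2 + c."""
    rows = [[a, b, 1.0] for a, b in zip(f_sarima, f_svm)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, observed)) for i in range(3)]
    return solve_linear(xtx, xty)


def combine(f_sarima, f_svm, weights):
    """Apply the fitted weighted sum to a pair of forecast series."""
    w1, w2, c = weights
    return [w1 * a + w2 * b + c for a, b in zip(f_sarima, f_svm)]


# Made-up forecasts; the "observed" flow is generated from known weights
# so that the fit can be checked.
f1 = [100.0, 150.0, 220.0, 310.0, 180.0]
f2 = [110.0, 140.0, 230.0, 290.0, 200.0]
y = [0.3 * a + 0.7 * b + 4.0 for a, b in zip(f1, f2)]

weights = fit_combination(f1, f2, y)
print(weights)  # close to the generating weights 0.3, 0.7 and 4.0
```

Because the two forecasts are usually highly correlated, the fitted weights can be sensitive to the sample used, which is one reason to estimate them separately for each station.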

Results and analysis
In this study, the performance of the model was evaluated by root-mean-square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE), see Eqs. (6), (7), and (8). The RMSE is suitable for comparison between different models on the same data set, and the MAPE can be used for comparison between different data sets. The RMSE can be used as the main evaluation index of model robustness.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^{2}} \qquad (6)$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right| \qquad (7)$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right| \times 100\% \qquad (8)$$

In which, $y_t$ indicates the actual observed value of the passenger flow, $\hat{y}_t$ indicates the predicted value of the passenger flow, and n is the number of predicted samples. The comparison of model performance is shown in Table 4.
It can be seen from Table 4 that the error of the SARIMA-SVM model is significantly reduced compared to SARIMA and SVM for Taoranting Station and Gongzhufen Station. For BeijingNan Railway Station, the SARIMA-SVM model is more robust than SARIMA and SVM, and its error is significantly lower than that of SARIMA.
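The three error measures can be computed as in the following Python sketch, which uses their standard definitions. The function names and the sample values are illustrative assumptions and do not reproduce any entry of Table 4.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Made-up 15-min inbound counts and forecasts.
observed = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

print(round(rmse(observed, forecast), 4))
print(round(mae(observed, forecast), 4))
print(round(mape(observed, forecast), 4))
```

Note that the MAPE is undefined for periods with zero observed flow, so such periods would need to be excluded or handled separately.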

Conclusion
Short-term passenger flow forecasting is essential for the operation and management of rail transit. However, the change of passenger flow in urban rail transit stations is characterized by complexity, nonlinearity and periodicity. Thus, a single forecasting method cannot fully describe the changing patterns of passenger flow and is not well suited to daily passenger flow forecasting. Based on Internet of Things technology and sensor networks, and aiming at the characteristics of passenger flow changes in Beijing rail transit, this paper proposes a SARIMA-SVM combination forecasting model for passenger flow, which supports the accurate judgment and intelligent analysis of large passenger flows. The test results show that the model effectively improves the accuracy of passenger traffic prediction, reduces the prediction error, and describes the variation pattern of passenger flow well. It has broad application prospects in short-term passenger flow prediction for urban rail transit.
To study the problem of urban rail transit passenger flow prediction in depth, this paper proposes a rail transit passenger flow prediction model based on SARIMA-SVM, which makes up for deficiencies of previous studies. Compared with the existing forecasting methods, this method is closer to the actual situation and provides strong support for the accurate forecast of passenger flow. The model provides a theoretical basis for the government and related departments to formulate traffic management measures. Directions for future research include how to further update the data in the model to obtain more accurate results, how the daily behavior of passengers changes after short-term incidents, and how to organize passenger flow when short-term incidents occur in subway stations.