Research on a data mining algorithm for logistics time series based on an intelligent integrated network structure

With the continuous development of informatization, big data analysis has become an increasingly important technical means in various fields. By mining time series data, the development pattern of an object can be grasped, so its future trend can be predicted. Based on an intelligent integration architecture, a new double-weighted support vector machine (SVM) algorithm based on category weighting and feature weighting is proposed to solve the problem of unbalanced samples in time series. Traditional classification methods recognize the minority class poorly on unbalanced sample sets. The new algorithm takes a support vector machine as the classifier, applies cost-sensitive learning by assigning different weighting coefficients to the minority and majority classes, and reconstructs the Gaussian kernel function with per-feature weight coefficients, thereby improving the recognition of minority samples. In the experiment, classification accuracy, G-mean, F-measure, TP, and FP were selected as evaluation indexes; the results indicate that the double-weighted SVM algorithm is effective for classifying unbalanced sample sets.


Introduction
With the continuous development of informatization, big data analysis has become an increasingly important technical means in various fields. By mining time series data, the development pattern of an object can be grasped, so as to predict its future trend. Time series modeling and prediction methods are generally divided into two categories: traditional methods and intelligent methods [1,2]. Traditional methods include linear regression analysis, nonlinear regression analysis, auto-regressive moving average (ARMA) modeling, the partial least squares method, and gray prediction. Intelligent methods use intelligent technologies such as expert systems, fuzzy rules, neural networks, and support vector machines to realize predictive modeling [3,4]. Expert system modeling describes the production process based on expert experience and knowledge and is highly interpretable; however, its knowledge acquisition has bottlenecks and its learning ability is poor. Similar to the expert system, fuzzy logic also describes the production process according to expert experience and knowledge; the difference is that fuzzy reasoning can deal well with uncertain information. Modeling technology based on fuzzy rules is likewise limited by the acquired knowledge and suffers from low model precision [5,6]. For complex forecasting problems, a single modeling method is often unable to achieve the required prediction accuracy. It is therefore necessary to integrate multiple modeling technologies and absorb the advantages of each, so that accurate prediction is achieved. Intelligent integration modeling integrates two or more modeling methods in a certain way to model complex industrial processes, at least one of which is an intelligent modeling method [7].
In this paper, an intelligent integration architecture is proposed and an intelligent integration structure is given. For time series data disturbed by a kind of random noise, an auto-regressive moving average model with a nested dual-population particle swarm algorithm is proposed using a parallel nested modeling structure [8]. A least-squares support vector machine based on probability density control is proposed for mining the deterministic trend in the data. The effectiveness of the proposed method was verified by a set of experiments [9].
The specific contributions of this paper are as follows: (1) a double-weighted support vector machine algorithm based on category weighting and feature weighting is proposed; (2) the problem of unbalanced time series samples is addressed; (3) support vector machine classification is performed on the basis of cost-sensitive learning; and (4) the Gaussian kernel function is reconstructed with per-feature weight coefficients.
The rest of this paper is organized as follows. Section 2 discusses the intelligent integration structure, followed by the methods in Section 3. The experiment is discussed in Section 4. Section 5 concludes the paper with a summary and future research directions.

Intelligent integration structure
Intelligent integration refers to integrating two or more pattern mining methods in a certain way to mine the rules or patterns of complex data, of which at least one is an intelligent modeling method. There are four main forms of intelligent integration pattern mining structure [10].

Parallel complement integration structure
The parallel complement integration structure includes two sub-models; neither model is primary, and the two complement each other. The two sub-models are usually obtained by two different modeling methods [11]. A single modeling method can mine part of the information in the time series data and learn the corresponding law but, because of its limitations, cannot obtain all the information in the data. The two modeling methods complement each other to fully exploit the laws or patterns implied in the data [12].

Weighted overlay integrated structure
The weighted superposition integration structure is composed of multiple sub-models weighted and superimposed, and the weight of each sub-model corresponds to its role in the integrated model. The sub-models are usually obtained by a variety of modeling methods [13]. A single modeling method can mine part of the information in the time series data and learn the corresponding law but, because of its limitations, cannot obtain all the information, so a variety of modeling methods complement each other to fully exploit the laws or patterns implied in the data [14].
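A minimal sketch of weighted superposition (the sub-models and weights here are toy examples, purely illustrative):

```python
import numpy as np

def weighted_overlay(submodels, weights, x):
    """Integrated prediction as the weighted sum of sub-model outputs."""
    preds = np.array([m(x) for m in submodels])
    return float(np.dot(weights, preds))

# Two toy sub-models: a linear trend term and a periodic term.
def trend(x):
    return 0.5 * x

def season(x):
    return float(np.sin(x))

y = weighted_overlay([trend, season], [0.7, 0.3], 2.0)
```

Each weight expresses how much its sub-model contributes to the integrated prediction.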

Series integrated structure
The tandem integration structure consists of two or more sub-models; except for the first and last, each sub-model takes the output of the previous model as its input and feeds its output to the next. Nonlinear dynamic systems usually adopt this form. For example, a neural network is used to reflect the static nonlinear characteristics of the system, and the dynamic characteristics are characterized by a NARMAX model (nonlinear auto-regressive moving average with exogenous variables) [15].

Model nested integration structure
The nested integration structure includes at least two sub-models. One, called the base model, models the main structure of the industrial process; the other sub-models are nested in the base model to estimate its unknown parameters [16]. For example, bionic algorithms such as the ant colony algorithm, particle swarm optimization, and the genetic algorithm are applied to system identification to realize parameter estimation in the model.
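As a toy illustration of the nested structure (not the paper's implementation), the sketch below nests a bare-bones particle swarm inside an AR(1) base model to estimate its unknown coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)

# Base model: AR(1) series y_t = a * y_{t-1} + noise, with true a = 0.6.
a_true = 0.6
y = np.zeros(200)
for t in range(1, 200):
    y[t] = a_true * y[t - 1] + 0.1 * rng.standard_normal()

def sse(a):
    """Fitting error of the base model for a candidate parameter a."""
    return np.sum((y[1:] - a * y[:-1]) ** 2)

# Nested estimator: a bare-bones particle swarm searching for a.
pos = rng.uniform(-1, 1, 20)                     # particle positions
vel = np.zeros(20)
best_p = pos.copy()                              # per-particle bests
best_g = pos[np.argmin([sse(p) for p in pos])]   # global best
for _ in range(60):
    r1, r2 = rng.random(20), rng.random(20)
    vel = 0.7 * vel + 1.5 * r1 * (best_p - pos) + 1.5 * r2 * (best_g - pos)
    pos = pos + vel
    for i in range(20):
        if sse(pos[i]) < sse(best_p[i]):
            best_p[i] = pos[i]
    best_g = best_p[np.argmin([sse(p) for p in best_p])]
```

The swarm only sees the base model through its fitting error, which is exactly the division of labor the nested structure describes.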

Data mining technology
In recent years, with the development of data collection and storage technologies, the data of the information society has exploded, producing a situation of "rich data, poor information." Massive data not only makes it difficult to distinguish useful data but also increases the complexity of data analysis. Data mining technology came into being to solve this problem [18,19]. Data mining aims to transform the large amounts of data produced across society into useful knowledge and information for market analysis, fraud and intrusion monitoring, production control, and scientific exploration. In general, data mining can be divided into the following seven steps (Fig. 2):
(1) Data cleansing: eliminating noise and data not related to the mining theme.
(2) Data integration: integrating data from multiple data sources.
(3) Data selection: selecting data related to the mining topic.
(4) Data transformation: transforming data, for example by normalization, into a form suitable for data mining.
(5) Data mining: the core step, mining knowledge using methods such as classification, fusion, and association rules.
(6) Pattern evaluation: identifying the truly interesting patterns that represent knowledge.
(7) Knowledge representation: presenting the mined knowledge to the user through understandable representation techniques.

Time series data mining algorithm
The increase of unbalanced data in recent years has brought new challenges to data mining research. When a classification model is built on an unbalanced data set, misclassifying minority-class data costs more than misclassifying majority-class data, so traditional classification methods are not applicable. Research on unbalanced data sets currently concentrates on two main approaches [20]: one undersamples the majority class, and the other generates minority-class data artificially, but neither is applicable to time series data. Ojha tried setting different misclassification penalty parameters for different sample categories when establishing the classification model and proposed the weighted support vector machine (WSVM) [21]. In addition, Ram et al. proposed the feature-weighted support vector machine (FWSVM) [22]: the information gain of each feature with respect to the classification task is first evaluated and assigned as a weight, and the weights are then applied in the calculation of the SVM kernel function. Building on the sample-weighted and feature-weighted support vector machines, this paper proposes a double-weighted support vector machine (DWSVM) that retains the advantages of both WSVM and FWSVM: different misclassification penalty coefficients are first assigned to the sample categories when constructing the classification hyperplane, and then the weight of each feature is calculated and the kernel function reconstructed, which gives the algorithm better generalization and robustness.

Introducing different penalty factors
For the two-class problem, the majority class is taken as the negative class and the minority class as the positive class. Different misclassification penalty coefficients are given to the two kinds of samples, yielding the weighted structural-risk minimization problem

min (1/2)||w||^2 + C+ Σ_{yi=+1} ξi + C− Σ_{yi=−1} ξi,  s.t. yi(w·ϕ(xi) + b) ≥ 1 − ξi, ξi ≥ 0,  (1)

in which C+ is the penalty factor for a misclassified positive (minority) sample and C− is the penalty factor for a misclassified negative (majority) sample. Problem (1) is converted into its Wolfe dual problem by the convex quadratic programming method in optimization theory; in the dual, K(xi, xj) = ϕ(xi)·ϕ(xj) is the kernel function.
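A minimal sketch of this class-weighting idea, using scikit-learn's class_weight rather than the paper's LIBSVM setup (the data below are synthetic):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic unbalanced set: 20 positive (minority), 200 negative samples.
X = np.vstack([rng.normal(1.5, 1.0, (20, 2)), rng.normal(-1.5, 1.0, (200, 2))])
y = np.array([1] * 20 + [0] * 200)

# class_weight scales the penalty per class: the minority class gets
# C+ = 10 * C while the majority class keeps C- = C.
clf = SVC(C=1.0, kernel="rbf", class_weight={1: 10.0, 0: 1.0})
clf.fit(X, y)
```

Raising the minority penalty pushes the hyperplane toward the majority class, improving minority recall at some cost in majority accuracy.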

Kernel function based on feature weighting
Feature weighting assigns different weight coefficients to the features of a sample according to their contribution to pattern recognition. There are many feature weighting methods, such as the information-weight-based feature weighting method proposed by Wang Yan et al.; the feature weighting method based on manifold learning is adopted here. A weighting matrix P is defined as the diagonal matrix P = diag(ω1, ω2, …, ωn), where ωk is the weight of the kth feature. The role of the kernel function is to find a nonlinear mode using linear functions in the feature space established by the nonlinear feature map [23].
According to Theorems 1 and 2, introducing the matrix P scales the shape of the input space and hence the geometry of the feature space, thereby changing the weights assigned to different linear functions in the feature space during modeling.

Theorem 1: Let K be a kernel function defined on X × X, ϕ the mapping from the input space X to the feature space F, and P a linear transformation matrix; then K_P(xi, xj) = K(Pxi, Pxj) is also a kernel function.

Theorem 2: If ωk = 0 for some 1 ≤ k ≤ n, then the kth feature of the data set does not enter the calculation of the weighted kernel function and has no effect on the classifier's output. More generally, the smaller ωk is, the smaller its influence on the kernel calculation and on the classification result.
Therefore, introducing P into the Gaussian radial basis kernel function gives

K_P(xi, xj) = exp(−||Pxi − Pxj||^2 / (2σ^2)).  (5)

Using the weighted kernel function (5) in the support vector machine classifier yields a classification model with weighted sample features.
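A sketch of the feature-weighted Gaussian kernel with diagonal P = diag(ω1, …, ωn); the function name and NumPy formulation are illustrative:

```python
import numpy as np

def weighted_rbf(X1, X2, w, gamma=1.0):
    """Gaussian kernel over feature-weighted inputs.

    w holds the per-feature weights (the diagonal of P); gamma plays
    the role of 1/(2*sigma^2) in the formula above.
    """
    Xw1, Xw2 = X1 * w, X2 * w      # apply P = diag(w) to every sample
    sq = ((Xw1[:, None, :] - Xw2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# Setting a weight to 0 removes that feature from the kernel, as
# Theorem 2 states; here only the first feature contributes.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
K = weighted_rbf(X, X, w=np.array([1.0, 0.0]))
```

With the second weight set to 0, the two samples differ only by their first feature, so K[0, 1] = exp(−(1 − 3)^2) = exp(−4).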

Double weighted support vector machine
According to the previous two sections, the construction steps of the double-weighted support vector machine (DWSVM) are as follows:
Step 1. Collect the data X, whose feature set is (fs, n), where n is the number of features; that is, the features of X are denoted fs = (f1, f2, …, fn).
Step 2. Calculate the weight coefficient ωi of each feature fi by the MBFS method, and generate the linear transformation weight matrix P based on the ωi.
Step 3. Transform the Gaussian kernel function with the linear transformation weight matrix P to obtain the feature-weighted kernel function (4).
Step 4. Construct the minimized structural-risk function (1) based on weighting the two sample classes, and apply the kernel function (4) in constructing the classification hyperplane to establish the support vector machine classification model.
Step 5. Evaluate the obtained classifier.
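The five steps above can be sketched end to end (a hedged toy implementation: the MBFS weights are passed in as given, and all names and data are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

def weighted_kernel(A, B, feat_w, gamma=1.0):
    """Gaussian kernel after the linear transform P = diag(feat_w)."""
    Aw, Bw = A * feat_w, B * feat_w
    return np.exp(-gamma * ((Aw[:, None, :] - Bw[None, :, :]) ** 2).sum(-1))

def dwsvm_fit(X, y, feat_w, class_w, C=1.0, gamma=1.0):
    """Steps 2-4: weight features, build the kernel, train a class-weighted SVM."""
    K = weighted_kernel(X, X, feat_w, gamma)
    clf = SVC(C=C, kernel="precomputed", class_weight=class_w)
    clf.fit(K, y)
    return clf

def dwsvm_predict(clf, X_train, X_new, feat_w, gamma=1.0):
    """Kernel between new and training samples, then the SVM decision."""
    return clf.predict(weighted_kernel(X_new, X_train, feat_w, gamma))

# Step 1 stand-in data and Step 5 evaluation on two held-out points.
Xtr = np.array([[2.0, 2.0], [2.2, 1.8], [1.8, 2.1],
                [-2.0, -2.0], [-2.1, -1.9], [-1.8, -2.2]])
ytr = np.array([1, 1, 1, 0, 0, 0])
w = np.array([1.0, 1.0])
clf = dwsvm_fit(Xtr, ytr, w, class_w={1: 2.0, 0: 1.0})
pred = dwsvm_predict(clf, Xtr, np.array([[2.0, 2.0], [-2.0, -2.0]]), w)
```

Passing the weighted kernel as a precomputed matrix keeps the feature weighting and the class weighting independent, mirroring the two weighting stages of the algorithm.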

Unbalanced classification evaluation index
In the process of establishing the SVM model, we continuously tune the parameters of the SVM (including the kernel function type t, the penalty parameter c, the kernel function parameter g, and the weighting coefficients w0 and w1) to obtain better prediction results. Because classification accuracy alone cannot measure classification performance in unbalanced classification, appropriate model evaluation indicators must also be selected according to actual needs. In addition to the classification accuracy Acc, the evaluation indicators G-mean, F-measure, and AUC-ROC are selected. The classification accuracy Acc represents the proportion of correctly classified samples among all samples. With M the total number of samples and TM the number of correctly classified samples, the classifier's Acc is defined as

Acc = TM / M.
The TP rate and FP rate of the classifier are defined as

TP rate = TP / (TP + FN),  FP rate = FP / (FP + TN).
Next, the sensitivity and specificity of the classifier are defined:

sensitivity = TP / (TP + FN),  specificity = TN / (TN + FP).

By definition, sensitivity is the accuracy on positive-class samples and specificity is the accuracy on negative-class samples. Based on these two indicators, Kubat et al. proposed a new metric, G-mean, to evaluate unbalanced classification, given by the following definition.
The classifier's G-mean is defined as

G-mean = sqrt(sensitivity × specificity).

From the definition, G-mean takes the accuracy on both the positive and negative classes into account and so better reflects the comprehensive performance of the classifier. Many researchers use G-mean as the measure when evaluating unbalanced classification performance.
In some applications, more attention is paid to the classification performance on the positive class, such as credit card fraud detection, customer churn and arrears forecasting in telecommunications, intrusion behavior in intrusion detection, and abnormal states and disease monitoring in medical diagnosis. F-measure is mainly used to measure the classification effect on positive samples.
First, the classifier's precision and recall are defined:

precision = TP / (TP + FP),  recall = TP / (TP + FN).

By definition, precision is the accuracy of the samples predicted positive, and recall is the coverage of the positive-class samples. Based on these two indicators, the classifier's F-measure is defined as

F-measure = (1 + β^2) × precision × recall / (β^2 × precision + recall),

where usually β = 1. The definition shows that F-measure embodies the classification performance on the positive class as a trade-off between positive coverage and accuracy; many researchers use F-measure as the measure when evaluating unbalanced classification performance. The classifier's AUC is defined as

AUC = (1 / (n+ · n−)) Σ_{x+} Σ_{x−} I(f(x+) > f(x−)),

where n+ is the number of minority-class samples and n− the number of majority-class samples. For any minority-class sample x+ and majority-class sample x−, the indicator I(f(x+) > f(x−)) is 1 if the classifier f scores the minority sample higher and 0 otherwise; summing over all such pairs and dividing by the product n+ · n− gives the AUC value.
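The evaluation indexes above can be computed directly from the confusion counts (a small helper with illustrative numbers, not the paper's results):

```python
import math

def imbalance_metrics(tp, fn, fp, tn, beta=1.0):
    """Acc, G-mean, and F-measure from the definitions above."""
    acc = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)        # positive-class accuracy
    specificity = tn / (tn + fp)        # negative-class accuracy
    gmean = math.sqrt(sensitivity * specificity)
    precision = tp / (tp + fp)
    recall = sensitivity
    f = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return acc, gmean, f

# Illustrative counts: 8 of 10 positives and 85 of 90 negatives correct.
acc, gmean, f = imbalance_metrics(tp=8, fn=2, fp=5, tn=85)
```

Note how Acc (0.93 here) looks strong while the positive-class F-measure is far lower, which is exactly why Acc alone is insufficient for unbalanced data.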

Experiment
To verify the classification performance of the double-weighted support vector machine on time series data, this section uses the LIBSVM toolkit, modified accordingly, and then uses MATLAB to build the double-weighted support vector machine model on a real data set.

Data set introduction
This experiment used a university student campus-card consumption data set (2011.09 to 2011.10). The consumption data come from the logistics department of the university, and the list of impoverished students comes from the school; poor students are marked 1 and non-poor students 0. After pre-processing, the consumption data for September and October form an 8093 × 10 sample matrix, in the format shown in Table 1.
By establishing a classification model on this data set, whether a student is poor can be judged from the student's card consumption data, helping the school identify poor students and find poor students who have not applied for subsidies for other reasons. The data model diagram is shown in Fig. 3.

Parameter settings
The pre-processed September card consumption data set is used to establish the double-weighted SVM model, and the G value, F value, and Acc value of the October data prediction under different parameter settings are recorded.
1) Select the kernel function. In LIBSVM, t = 0, 1, 2, 3, 4 represent the linear kernel, polynomial kernel, RBF kernel, sigmoid kernel, and custom kernel, respectively. As introduced above, this paper uses the feature-weighted kernel function WRBF, so the parameter is set to t = 4.
2) Select the penalty parameter c and the kernel function parameter g. After selecting the kernel function, the grid optimization algorithm is used to find the best c and g values. Based on experience, the parameter range of c and g is set to [2^(−5), 2^(5)] with a step length of 1 in the exponent. As shown in Tables 2, 3, and 4, the DWSVM model obtains the best prediction when c and g are between 2^(−5) and 1. To facilitate the subsequent experiments, we choose c = 1, g = 1 as the DWSVM model parameters; the F value is then 0.6733, the G value 0.9088, and the Acc value 89.00%.
3) Determine the weighting coefficients w0 and w1. In LIBSVM, the two sample classes are weighted with the parameters w0 and w1. Under the premise of s = 0, t = 4, c = 1, g = 1, three experiments are carried out, as shown in Tables 5, 6, and 7. In the first group, w0 = 1 and w1 changes from 1 to 1.9 in steps of 0.1; in the second group, w1 = 1 and w0 changes from 1 to 1.9 in steps of 0.1; in the third group, w1 = 1 and w0 changes from 1 to 20. In each group the G value, F value, Acc value, TP value, and TN value are recorded. In summary, the DWSVM model gives its best prediction at t = 4, c = 1, g = 1, w0 = 1.5, w1 = 1. For comparison, the unweighted support vector machine model achieves its best classification at t = 2, c = 1, g = 1, with G-mean, F-measure, Acc, TP, and FP values of 0.8408, 0.6234, 83.78%, 1081, and 6239, respectively.
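The grid optimization over c and g can be sketched with scikit-learn as a stand-in for the LIBSVM grid search (the data below are synthetic, not the card data set, and the class weights are illustrative):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Synthetic unbalanced stand-in for the card consumption data.
X = np.vstack([rng.normal(1.0, 1.0, (40, 3)), rng.normal(-1.0, 1.0, (200, 3))])
y = np.array([1] * 40 + [0] * 200)

# c and g range over [2^-5, 2^5] with step 1 in the exponent,
# mirroring the parameter range used in the paper.
grid = {"C": 2.0 ** np.arange(-5, 6), "gamma": 2.0 ** np.arange(-5, 6)}
search = GridSearchCV(SVC(kernel="rbf", class_weight={1: 1.5, 0: 1.0}),
                      grid, scoring="f1", cv=3)
search.fit(X, y)
```

Scoring by F1 rather than accuracy keeps the search honest on unbalanced data, for the reasons discussed in the evaluation section.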

Model effect evaluation
Since the training and test sets are randomly selected from the data set, different splits have a certain impact on the experimental results. To make the results more convincing, this experiment uses tenfold cross-validation, taking the average of the 10 runs as the final result. For the classification of unbalanced samples, the classification accuracy on the minority class is the key to evaluating the model. The card consumption data for October contain 1237 positive samples and 6856 negative samples. With the classification model established in this paper, the average classification accuracy Acc reached 91.28%; the TP and FP values were 828 and 6527, respectively, that is, the correct classification rates of the positive and negative classes were 67% and 95.2%. The G-mean and F-measure values are 0.8206 and 0.6917, respectively. Figure 4 is a sample graph of the October data prediction, and Fig. 5 is the ROC curve of the October prediction. The overall effect of the classification model is good and meets the requirements of an unbalanced-sample classification model. Without the assistance of a classification model, purely manual guessing would classify positive samples correctly 50% of the time on average; with the classification model, the classification accuracy on positive samples reaches 67%, a relative improvement of 34%, which has practical guiding significance.
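The tenfold cross-validation averaging described above can be sketched with scikit-learn (synthetic data; the weighting coefficients are illustrative):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(2)
# Synthetic unbalanced data standing in for the card consumption set.
X = np.vstack([rng.normal(1.2, 1.0, (50, 4)), rng.normal(-1.2, 1.0, (250, 4))])
y = np.array([1] * 50 + [0] * 250)

# Ten folds; the reported figure is the mean score over the 10 runs.
scores = cross_val_score(SVC(class_weight={1: 5.0, 0: 1.0}), X, y, cv=10)
mean_acc = scores.mean()
```

Averaging over folds removes the dependence of the result on any single random train/test split.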
Conclusion
In this paper, the characteristics of time series and the classification difficulties of unbalanced data sets are studied. Under the topology of the intelligent integrated architecture, and building on previous research, a double-weighted support vector machine (DWSVM) based on sample class weighting and sample feature weighting is proposed. For sample class weighting, the idea of cost-sensitive learning is used; for sample feature weighting, the feature weight calculation method of the MBFS algorithm is adopted. Introducing these two weighting methods makes the SVM more generalizable and robust. The whole system is functionally complete, stable, real-time, scalable, and accurate in data communication, fully meeting the requirements of the intelligent integrated architecture. Time series are high-dimensional and their unbalanced samples are difficult to mine, so time series data mining has great practical value and broad research prospects.
Abbreviations
ARMA: auto-regressive moving average; SVM: support vector machine; WSVM: weighted support vector machine; FWSVM: feature-weighted support vector machine; DWSVM: double-weighted support vector machine