Time series classification based on statistical features

Abstract

This paper presents a statistical-feature approach to time series classification (TSC) with fully convolutional networks, aimed at improving both accuracy and efficiency. The method builds on the fully convolutional neural network (FCN) and has two distinguishing properties: statistical features are used in data preprocessing, and a fine-tuning strategy is used in network training. The key steps are as follows. First, the original time series is divided into multiple equal-length subsequences by window slicing. Second, statistical features are extracted from each subsequence to form a new sequence, which serves as the input to the neural network, and the network is trained with fine-tuning. Third, classification performance is evaluated on the test sets. Finally, the sample sequence complexity and the network's classification loss and accuracy are compared with those of the FCN trained on the original sequence. Our experimental results show that the proposed method improves the classification performance of both the FCN and the residual network (ResNet), which suggests that it generalizes across network structures.

Introduction

Time series data are widely used for representing special data such as biological observations, stock prices, weather readings, and health monitoring data. How to extract useful information from time series data is becoming more and more important. Time series classification (TSC) is one of the key tasks, which is aimed at predicting class labels for time series. Nowadays, TSC is widely used successfully in medical diagnosis and early warning [1].

Many scholars have studied TSC and proposed methods for it. The approaches fall into three categories: distance-based, feature-based, and ensemble-based [2]. The key idea of the distance-based approach is to measure the similarity between any two given time series. Commonly used techniques include K-nearest neighbor (KNN), support vector machines (SVM) with similarity-based kernels, and dynamic time warping (DTW). Feature-based approaches extract a set of features that represent global time series patterns, which are then commonly assembled into a bag-of-words (BoW) and passed to classifiers [3]. Algorithms of this kind include the bag-of-features framework (TSBF) [4], bag-of-SFA-symbols (BOSS) [5], and the BOSSVS method [6]. Ensemble-based approaches combine different classifiers to achieve higher accuracy. For example, PROP combines 11 classifiers based on elastic distance measures in a weighted ensemble scheme [7], and the shapelet ensemble (SE) builds its classifiers by applying the shapelet transform in conjunction with a heterogeneous ensemble [8]. In summary, the methods described above commonly require data preprocessing and feature extraction.

In recent years, deep neural networks have been applied to TSC tasks, for example, the multi-scale convolutional neural network (MCNN) [1], the fully convolutional network (FCN) [2], and the residual network (ResNet) [2]. FCN and ResNet do not require heavy data preprocessing or feature engineering and still classify well. Karim et al. proposed long short-term memory fully convolutional networks (LSTM-FCN), which further improved the performance of the FCN [9] but increased the complexity of computation and classification. Building on FCN, this paper studies time series preprocessing and finds a way to improve FCN classification performance. The key idea is as follows: given original time series from the UCR data sets [10], the method uses window slicing to divide each original time series into multiple equal-length subsequences and then extracts statistical features from each subsequence to form a new sequence that serves as the input to the neural network. During network training, we introduce fine-tuning and evaluate the classification performance of the network on a test set. Finally, the sample sequence complexity and the network's classification loss and accuracy are compared with those of the FCN trained on the original sequence. Experimental results show that the FCN performs better on the statistical features extracted from subsequences, and that the method also improves the classification performance of ResNet, which suggests that it generalizes across network structures.

The remainder of the paper is organized as follows. Section 2 gives some basic notations such as time series classification. Section 3 introduces the network architecture and fine-tuning of models. Section 4 provides experiments and results analysis. Section 5 improves the experiments from two aspects: increase the number of original sequences and extract more statistical features. Finally, we present conclusions and future work.

Some basic notations

This section will introduce some basic notations such as TSC [11].

Definition 1

An ordered set of real values of the form \( X = [x_1, x_2, ..., x_t] \) is called a univariate time series (UTS). The length of X is t.

Definition 2

Given M different UTS with \( X_i \in R^T \), \( X = [X_1, X_2, ..., X_M] \) is called an M-dimensional multivariate time series (MTS).

Definition 3

A data set \( D = \{(X_1, Y_1), (X_2, Y_2), ..., (X_N, Y_N)\} \) is a set of pairs \( (X_i, Y_i) \), where \( X_i \) can be either a UTS or an MTS and \( Y_i \) is its one-hot label vector. For a data set containing C classes, \( Y_i \) is a vector of length C whose j-th element, \( j \in [1, C] \), is equal to 1 if the class of \( X_i \) is j and 0 otherwise.
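The one-hot labels of Definition 3 can be illustrated with a short sketch. The function name `one_hot` and the convention that class labels run from 1 to `num_classes` are assumptions made for this example only.

```python
def one_hot(label, num_classes):
    """Return a vector of length num_classes with a 1 at the position of `label`.

    Labels are assumed to be integers in [1, num_classes].
    """
    if not 1 <= label <= num_classes:
        raise ValueError("label must lie in [1, num_classes]")
    return [1 if j == label else 0 for j in range(1, num_classes + 1)]


# Three samples from a three-class data set.
labels = [1, 3, 2]
Y = [one_hot(y, 3) for y in labels]
```

Each row of `Y` is the label vector of one sample.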

Definition 4

Given some time series of length t, \( X = \{X_i, i \in 1, 2, ..., N\} \), and their labels \( \{Y_1, Y_2, ..., Y_N\} \), TSC refers to taking a new time series as input and outputting its label \( Y_i \).

Network architecture and fine-tuning of models

Network module and architecture

In FCN, a temporal convolution block is used as the time series feature extraction module. It consists of a convolution layer, a batch normalization layer, and an activation function [4, 12,13,14,15]. Methods for training deep networks are provided in [15,16,17,18]. The activation function may be a rectified linear unit (ReLU) or a parametric rectified linear unit (PReLU). The convolution layer is shown in Fig. 1. Generally, the input to a temporal convolutional network is a time series signal. Given L convolutional layers, the filters capture how the input signals evolve over the course of an action. For each layer, the filters are parameterized by a tensor \( {W}^{(l)}\in {R}^{F_l\times d\times {F}_{l-1}} \) and biases \( {b}^{(l)}\in {R}^{F_l} \), where \( l \in \{1, ..., L\} \) is the layer index and d is the filter duration. For the l-th layer, the i-th component of the activation \( \hat{E}_{i,t}^{(l)} \) is computed from the incoming (normalized) activation matrix \( {E}^{\left(l-1\right)}\in {R}^{F_{l-1}\times {T}_{l-1}} \) from the previous layer for each time t, where f(·) is a rectified linear unit [9, 12, 15]:

$$ \hat{E}_{i,t}^{(l)} = f\left( b_i^{(l)} + \sum \limits_{t'=1}^{d} \left\langle W_{i,t',\cdot}^{(l)},\; E_{\cdot,\,t+d-t'}^{(l-1)} \right\rangle \right) $$
(1)

Fig. 1 Convolution operation diagram [4, 12,13,14,15]
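To make the indexing in Eq. 1 concrete, the following is a minimal pure-Python sketch of one temporal convolution layer. It assumes that only positions where the filter fully overlaps the input are computed (no padding), and the function names `relu` and `temporal_conv` are chosen for this example.

```python
def relu(v):
    """Rectified linear unit applied to a scalar."""
    return max(0.0, v)


def temporal_conv(E_prev, W, b, f=relu):
    """Apply Eq. 1 to the activation matrix of the previous layer.

    E_prev: list of F_prev rows, each of length T (channels x time).
    W:      W[i][tp][j] is the weight of filter i, tap tp, input channel j.
    b:      b[i] is the bias of filter i.
    Returns a list of F rows, each of length T - d + 1 (assumption: no padding).
    """
    d = len(W[0])          # filter duration
    T = len(E_prev[0])     # input length
    out = []
    for i in range(len(W)):
        row = []
        for t in range(T - d + 1):
            acc = b[i]
            for tp in range(d):
                # Zero-based form of the index t + d - t' in Eq. 1.
                idx = t + d - 1 - tp
                acc += sum(W[i][tp][j] * E_prev[j][idx] for j in range(len(E_prev)))
            row.append(f(acc))
        out.append(row)
    return out


# One input channel, one filter of duration 2 that computes x[t+1] - x[t].
E_prev = [[1.0, 2.0, 3.0, 4.0]]
W = [[[1.0], [-1.0]]]
b = [0.0]
features = temporal_conv(E_prev, W, b)
```

In practice a deep learning framework would perform this operation with vectorized kernels; the explicit loops are only meant to mirror the formula.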

When training neural networks, standardizing the input can speed up training. The training data are normalized by subtracting the mean from the raw data and dividing by the standard deviation. The same standardization is also applied at each hidden layer of the network, which is batch normalization, as shown in Fig. 2.

Fig. 2 Batch normalization [15]

Batch normalization processes the input \( z^{(l-1)} \) of the l-th hidden layer as follows, omitting the superscript l − 1:

$$ \begin{array}{l} \mu = \frac{1}{m}\sum \limits_i z^{(i)} \\ \sigma^2 = \frac{1}{m}\sum \limits_i \left( z^{(i)} - \mu \right)^2 \\ z_{\mathrm{norm}}^{(i)} = \frac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \varepsilon}} \end{array} $$
(2)

where m is the number of samples in a single mini-batch and ε prevents the denominator from being zero; a typical value is \( 10^{-8} \). In this way, all the inputs of the hidden layer have a mean of 0 and a variance of 1.

In most cases, we do not want every \( z^{(i)} \) to have a mean of 0 and a variance of 1, so \( z_{\mathrm{norm}}^{(i)} \) usually needs further processing:

$$ \tilde{z}^{(i)} = \gamma \cdot z_{\mathrm{norm}}^{(i)} + \beta $$
(3)

where γ and β are learnable parameters, which can be obtained by gradient descent. By setting γ and β to different values, any mean and variance can be obtained. The ReLU activation function is shown in Fig. 3. ReLU has the following advantages: compared with linear functions, it is more expressive; compared with other nonlinear functions, its gradient is constant on the nonnegative interval, so it does not suffer from the vanishing gradient problem there.
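Equations 2 and 3 can be sketched for a single feature over one mini-batch as follows. This is a training-time illustration only; the function name `batch_norm` and the default values of `gamma` and `beta` are assumptions, and the running statistics that frameworks keep for inference are left out.

```python
import math


def batch_norm(z, gamma=1.0, beta=0.0, eps=1e-8):
    """Normalize one feature over a mini-batch (Eq. 2), then scale and shift (Eq. 3)."""
    m = len(z)
    mu = sum(z) / m
    var = sum((v - mu) ** 2 for v in z) / m
    z_norm = [(v - mu) / math.sqrt(var + eps) for v in z]
    return [gamma * v + beta for v in z_norm]


batch = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(batch)                      # mean close to 0, variance close to 1
shifted = batch_norm(batch, gamma=2.0, beta=5.0)    # mean close to 5, variance close to 4
```

With `gamma=1` and `beta=0` the output is just the normalized input; other values move the mean to `beta` and scale the standard deviation by `gamma`.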

Fig. 3 ReLU function expression and image representation

After the time series is extracted by the time convolutional block, the extracted features are input to the global average pooling module to output the classification result. The global average pooling module is composed of the global average pooling layer and the Softmax layer. Global average pooling layer is shown in Fig. 4.

Fig. 4 Schematic diagram of pooling operation

The last layer of a traditional CNN is a fully connected layer. Its number of parameters is very large, which easily causes over-fitting (as in AlexNet). Therefore, [13] proposes using a global average pooling layer instead of the fully connected layer. Unlike a traditional fully connected layer, global average pooling averages each feature sequence so that each feature sequence yields a single output. This greatly reduces the number of network parameters and helps avoid over-fitting. In addition, each feature sequence is equivalent to one output feature, and this feature represents the characteristics of the output class.

TSC is a multi-class problem. The most common way to handle multi-class classification is to design n output nodes, so that the network outputs an n-dimensional array in which each dimension corresponds to one category of time series. The Softmax layer transforms the output of the neural network into a probability distribution in which each value is the probability of belonging to the corresponding category. Cross entropy can then be used to measure the distance between the predicted probability distribution and the true label distribution.
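The pooling, Softmax, and cross-entropy steps described above can be sketched as follows. As a simplifying assumption for this example, the number of feature sequences equals the number of classes, so the pooled values are passed to Softmax directly; the function names are illustrative.

```python
import math


def global_average_pool(feature_maps):
    """Average each feature sequence over time, giving one value per sequence."""
    return [sum(row) / len(row) for row in feature_maps]


def softmax(scores):
    """Turn raw scores into a probability distribution (shifted for numerical stability)."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, one_hot_label):
    """Cross entropy between predicted probabilities and a one-hot label."""
    return -sum(y * math.log(p) for p, y in zip(probs, one_hot_label) if y)


feature_maps = [[1.0, 3.0], [2.0, 2.0]]        # two feature sequences of length 2
pooled = global_average_pool(feature_maps)
probs = softmax(pooled)
loss = cross_entropy(probs, [1, 0])
```

Both feature sequences have the same average here, so the two classes receive equal probability and the loss equals ln 2.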

FCN acts as a feature extractor. Its overall structure is shown in Fig. 5. The FCN contains three temporal convolutional blocks and one global average pooling block, where the numbers of filters in the convolutional blocks are 128, 256, and 128, respectively.

Fig. 5 The network structure of FCN [2]

The convolution operation is fulfilled by three 1D kernels without striding, whose sizes are 8, 5, and 3, respectively. The 1 × 1 convolution kernel can operate on a cross-channel aggregation and further reduce the dimension. The calculation of the time convolution block [2] is as follows:

$$ {\displaystyle \begin{array}{l}y=W\otimes x+b\\ {}s=\mathrm{BN}(y)\\ {}h=\mathrm{ReLU}(s)\end{array}} $$
(4)

where ⊗ is the convolution operator.

The global average pooling block further processes the feature sequence, and the final classification result is given by the Softmax layer. The residual network is shown in Fig. 6. By adding a shortcut connection in each residual block, ResNet extends neural networks to very deep structures, because the gradient can flow directly through the bottom layers. It achieves state-of-the-art performance in object detection and other vision-related tasks [14]. The residual network also performs well on TSC. The convolutional blocks in Eq. 4 are used to build each residual block. Let \( {\mathrm{Block}}_k \) denote the convolutional block with k filters; the residual block is formalized as follows [2]:

$$ {\displaystyle \begin{array}{l}{h}_1={\mathrm{Block}}_{k_1}(x)\\ {}{h}_2={\mathrm{Block}}_{k_2}\left({h}_1\right)\\ {}{h}_3={\mathrm{Block}}_{k_3}\left({h}_2\right)\\ {}y={h}_3+x\\ {}\hat{h}=\mathrm{ReLU}(y)\end{array}} $$
(5)

Fig. 6 The network structure of ResNet [2]

The numbers of filters are \( (k_1, k_2, k_3) = (64, 128, 128) \). The residual network used here consists of three residual blocks and a global average pooling block, as shown in Fig. 6.
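The structure of Eq. 5 can be read as a composition of three blocks followed by a shortcut addition and a ReLU. The sketch below treats the blocks as arbitrary callables on flat lists of numbers, which is a simplification for illustration; it also assumes the block output has the same shape as the input so that the shortcut can be added directly.

```python
def relu_vec(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def residual_block(x, block1, block2, block3):
    """Compute ReLU(Block3(Block2(Block1(x))) + x), following Eq. 5."""
    h1 = block1(x)
    h2 = block2(h1)
    h3 = block3(h2)
    if len(h3) != len(x):
        # Assumption of this sketch: shapes must already match for the shortcut.
        raise ValueError("shortcut addition needs matching shapes")
    y = [a + b for a, b in zip(h3, x)]
    return relu_vec(y)


# Stand-in blocks (hypothetical): one doubles its input, the others pass it through.
def double(v):
    return [2.0 * a for a in v]


def identity(v):
    return list(v)


out = residual_block([1.0, -2.0], double, identity, identity)
```

With real convolutional blocks, the three callables would each apply convolution, batch normalization, and ReLU as in Eq. 4.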

Fine-tuning of models

Fine-tuning can be described as transfer learning on the same data set [9]. The training procedure consists of two distinct phases. In the initial phase, the model is trained on a given data set with the selected optimal hyper-parameters. In the second phase, fine-tuning is applied to the initial model: the transfer learning procedure is iterated, with each repetition initialized from the model weights of the previous iteration. At each iteration, the learning rate is halved, which accelerates the iterative process. The procedure is repeated K times, until the learning rate reaches its lowest threshold, where K is an arbitrary constant. The algorithm is shown in Algorithm 1.

Algorithm 1 Fine-tuning
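The fine-tuning loop described above might look roughly like the following sketch. The callable `train_once` is hypothetical and stands for one full training run that starts from the given weights; the learning-rate values in the usage example are the initial and final rates quoted in the hyper-parameter settings, used here only for illustration.

```python
def fine_tune(train_once, weights, initial_lr, min_lr):
    """Repeat training, halving the learning rate until it drops below min_lr.

    train_once(weights, lr) is a hypothetical callable that trains the model
    starting from `weights` at learning rate `lr` and returns the new weights.
    Returns the final weights and the list of learning rates that were used.
    """
    lr = initial_lr
    used_rates = []
    while lr >= min_lr:
        # Each repetition starts from the weights of the previous one.
        weights = train_once(weights, lr)
        used_rates.append(lr)
        lr /= 2.0
    return weights, used_rates


# Dummy training step that only counts how often it was called.
def count_runs(weights, lr):
    return weights + 1


final_weights, rates = fine_tune(count_runs, 0, initial_lr=1e-3, min_lr=1e-4)
```

With these values the loop runs four times, because the fifth halving would fall below the threshold.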

Experiments and result analysis

Experiment settings

In the experiment, the FCN is trained on the original sequence data set and on the data set obtained by statistical feature processing, and network performance is evaluated on the test set. To assess whether the statistical-feature approach generalizes across network structures, the same experiment is carried out on ResNet for comparison. The original sequence data set comes from the UCR time series data, which contains 44 classes, each divided into a training set and a test set, as shown in https://www.cs.ucr.edu/~eamonn/time_series_data_2018/. Among them, the classes with longer time series are selected as input samples. The training sample format in the data set is as follows:

$$ \mathrm{data}={\left[\begin{array}{l}{y}_1,{x}_1^1,{x}_2^1,...,{x}_t^1\\ {}{y}_2,{x}_1^2,{x}_2^2,...,{x}_t^2\\ {}......\\ {}{y}_N,{x}_1^N,{x}_2^N,...,{x}_t^N\end{array}\right]}_{N\times \left(t+1\right)} $$
(6)

where N is the number of training samples and t is the length of the time series in the training samples. The only preprocessing applied to the original sequence before the experiment is standardization. Let μ be the mean and σ the standard deviation of the data; then

$$ \mathrm{data}=\frac{\mathrm{data}-\mu }{\sigma} $$
(7)
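The row layout of Eq. 6 and the standardization step can be sketched as below. This example divides by the standard deviation, the usual z-normalization convention, and applies the statistics to one sequence at a time; whether the statistics are computed per sequence or over the whole data set is not stated in the text, so that choice is an assumption, as are the function names.

```python
import math


def split_row(row):
    """Split one row of Eq. 6 into its class label and its time series."""
    return int(row[0]), [float(v) for v in row[1:]]


def standardize(series):
    """Subtract the mean and divide by the standard deviation."""
    mu = sum(series) / len(series)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in series) / len(series))
    if sigma == 0.0:
        # A constant series carries no variation; map it to zeros.
        return [0.0 for _ in series]
    return [(v - mu) / sigma for v in series]


label, series = split_row([2, 1.0, 2.0, 3.0])
z = standardize(series)
```

After this step the sequence has zero mean and unit variance.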

Then the training samples are extracted from the data to form a matrix.

$$ X=\left[\begin{array}{l}{X}^1\\ {}{X}^2\\ {}\vdots \\ {}{X}^N\end{array}\right]={\left[\begin{array}{l}{x}_1^1,{x}_2^1,\dots ,{x}_t^1\\ {}{x}_1^2,{x}_2^2,\dots ,{x}_t^2\\ {}\vdots \\ {}{x}_1^N,{x}_2^N,\dots ,{x}_t^N\end{array}\right]}_{N\times t} $$
(8)

The class label yi corresponding to the training sample is converted to a one-hot vector Yi, which is composed of only 0 and 1. If the training sample Xi belongs to a class, the corresponding class label is 1, and otherwise is 0.

$$ Y=\left[{Y}_1,{Y}_2,...,{Y}_N\right]={\left[\begin{array}{l}1,0,0,...,0\\ {}0,1,0,...,0\\ {}0,0,1,...,0\\ {}...\\ {}0,0,0,...,1\end{array}\right]}_{C\times N} $$
(9)

where C is the number of classes of time series. The normalized time series samples (X, Y) are fed into the FCN for training. The data set built from the statistics of subsequences, which we call the statistical feature data set, is illustrated in Fig. 7.

Fig. 7 Subsequence division diagram

With regard to time series preprocessing, the traditional machine learning approach is to extract statistical features from the entire time series and then classify with the random forest algorithm. The modern deep learning approach is to feed the normalized original sequence directly into the FCN and convolve over it [2]. In this paper, we propose a time series preprocessing method that we found to work better. Before standardization, the original sequence is divided into multiple subsequences according to a given stride and subsequence length. Statistical features (maximum, mean, range, variance, peak and trough, kurtosis, etc.) are then extracted from each subsequence and assembled into a new time series that serves as the input to the neural network. On the data set, the final input data samples are described as follows:

$$ {\left[\begin{array}{c}{x}_1^1,{x}_2^1,\dots, {x}_t^1\\ {}{x}_1^2,{x}_2^2,\dots, {x}_t^2\\ {}\dots \\ {}{x}_1^N,{x}_2^N,\dots, {x}_t^N\end{array}\right]}_{N\times t}\kern0.5em \underrightarrow{sl,s, nsf}{\left[\begin{array}{c}{x}_1^1,\dots, {x}_{\mathrm{nsf}}^1\\ {}{x}_1^2,\dots, {x}_{\mathrm{nsf}}^2\\ {}\dots \\ {}{x}_1^N,\dots, {x}_{\mathrm{nsf}}^N\end{array}|\begin{array}{c}{x}_1^1,\dots, {x}_{\mathrm{nsf}}^1\\ {}{x}_1^2,\dots, {x}_{\mathrm{nsf}}^2\\ {}\dots \\ {}{x}_1^N,\dots, {x}_{\mathrm{nsf}}^N\end{array}|\begin{array}{c}\dots \\ {}\dots \\ {}\dots \\ {}\dots \end{array}\right]}_{N\times \mathrm{nsf}\frac{t-\mathrm{sl}+s}{s}}={X}_{\mathrm{new}} $$
(10)

where sl is the length of the subsequence, s denotes the stride, and nsf is the number of statistical features selected. The new time series samples \( (X_{\mathrm{new}}, Y) \) are then standardized and fed into the FCN for training.
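The transformation in Eq. 10 can be illustrated with a minimal sketch for a single sequence. It uses four of the listed features (maximum, mean, range, variance), computes the population variance, and concatenates the features window by window; the remaining features and the function names are left out or chosen for brevity, so this should be read as an assumption-laden illustration rather than the exact pipeline used in the experiments.

```python
import statistics


def slice_windows(x, sl, s):
    """Cut x into windows of length sl taken every s steps (trailing remainder dropped)."""
    return [x[i:i + sl] for i in range(0, len(x) - sl + 1, s)]


def window_features(w):
    """Maximum, mean, range, and population variance of one window."""
    return [max(w), statistics.mean(w), max(w) - min(w), statistics.pvariance(w)]


def to_feature_sequence(x, sl, s):
    """Concatenate the features of all windows into one new sequence."""
    seq = []
    for w in slice_windows(x, sl, s):
        seq.extend(window_features(w))
    return seq


x = [float(v) for v in range(10)]          # t = 10
new_x = to_feature_sequence(x, sl=4, s=2)  # nsf = 4 features per window
```

With t = 10, sl = 4, and s = 2 there are (t − sl + s)/s = 4 windows, so the new sequence has 4 × 4 = 16 values, matching the column count in Eq. 10.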

Hyper-parameter setting

FCN and ResNet are trained with the Adam algorithm. Learning rate decay is used (the learning rate decreases as the number of iterations increases): the initial learning rate is set to 0.001, and the final value after fine-tuning is 0.0001. To avoid over-fitting when the amount of data is small, early stopping is adopted: if the optimal value does not change for 50 epochs, the network is considered to have found the optimum and training ends early. The Adam parameters are \( \beta_1 = 0.9 \), \( \beta_2 = 0.999 \), and \( \varepsilon = 10^{-8} \). The loss function for all tested models is categorical cross entropy. The number of iterations is 2000.
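The early-termination rule can be sketched as a small helper. The text does not say which quantity is monitored, so this example assumes a loss that should decrease; the class name `EarlyStopping` and its method are illustrative and not taken from any particular framework.

```python
class EarlyStopping:
    """Signal a stop once the monitored loss has not improved for `patience` checks."""

    def __init__(self, patience=50):
        self.patience = patience
        self.best = float("inf")
        self.stale = 0

    def should_stop(self, loss):
        """Record one new loss value and report whether training should end."""
        if loss < self.best:
            self.best = loss
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


# Small patience so the behaviour is visible on a short loss history.
stopper = EarlyStopping(patience=2)
decisions = [stopper.should_stop(loss) for loss in [1.0, 0.9, 0.95, 0.92]]
```

In the setting described above, `patience` would be 50 and the check would run once per training pass.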

Experiment and evaluation

To assess the generalization ability of the statistical-feature method, the same experiments are performed on the residual network. FCN and ResNet have been shown to outperform many other deep learning algorithms on time series classification. According to the standardized MPCE score based on the mean t test given in [2], the rankings of FCN and ResNet are shown in Fig. 8. In the experiment, FCN and ResNet are trained on the standardized raw data sets and on the new data sets based on statistical features, and the test set is then used for error and loss analysis. The experiments were run on Google Colaboratory with a Tesla K80 GPU. Google Colaboratory is a research tool provided by Google, mainly used for machine learning development and research, in which Python code can be run to train neural networks.

Fig. 8 MPCE ranking of time series algorithm

Mean per class error (MPCE) is the arithmetic mean of the per-class error (PCE) over data sets, and can be used to evaluate the classification performance of a specific model on multiple data sets. It is calculated as follows:

$$ {\displaystyle \begin{array}{l}{\mathrm{PCE}}_k=\frac{1-\mathrm{acc}}{C_k}\\ {}{\mathrm{MPCE}}_i=\frac{1}{K}\sum {\mathrm{PCE}}_k\end{array}} $$
(11)

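Here acc is the accuracy on the k-th data set, \( C_k \) is its number of classes, and K is the number of data sets. MPCE can be interpreted as the expected error rate for a single class across all the data sets. Because it takes the number of classes into account, MPCE is more robust as a baseline criterion [2].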
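Equation 11 can be evaluated with a few lines of code. The sketch assumes that `accuracies[k]` is the accuracy of one model on data set k and `num_classes[k]` is the number of classes in that data set; the function name `mpce` and the example numbers are made up for illustration.

```python
def mpce(accuracies, num_classes):
    """Mean per class error of one model over several data sets (Eq. 11)."""
    if len(accuracies) != len(num_classes):
        raise ValueError("need one accuracy and one class count per data set")
    # Per-class error of each data set: error rate divided by the number of classes.
    pce = [(1.0 - acc) / c for acc, c in zip(accuracies, num_classes)]
    return sum(pce) / len(pce)


# Two hypothetical data sets: 90% accuracy with 2 classes, 80% accuracy with 4 classes.
score = mpce([0.9, 0.8], [2, 4])
```

Both data sets contribute a per-class error of about 0.05, so the mean is also about 0.05 even though their raw error rates differ.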

Results and analysis

The experiment restricted the length of the time series: 14 classes with a time series length exceeding 350 were selected, and the dimensions of the original time series and of the time series obtained after extracting the statistical features were calculated. The dimensions of the two time series are shown in Table 1, where OTS denotes the original time series and SEF denotes the new sequence of extracted features.

Table 1 The dimensions of OTS and new SEF

Table 1 shows that, after the original time series is divided into multiple subsequences and statistical features are extracted, the dimension of the resulting new time series is significantly reduced. The new sequence retains the statistical characteristics of the original series while removing redundant parts. Feeding the new sequence to the FCN therefore simplifies the network's computation and improves efficiency while maintaining accuracy. The original sequence and the extracted features are first tested on the FCN. The final loss and accuracy for each class, as well as the network's running time, are shown in Table 2.

Table 2 Comparison of loss, accuracy, and time of OTS and characteristic subsequences on FCN

Table 2 shows that, for many time series classes, the new sequence based on statistical features performs better on the FCN than the original sequence in terms of accuracy and loss, and the overall performance is better. The FCN has a short classification time and a small average error. The loss and accuracy of the original sequence and of the new sequence based on statistical features are also visualized. Fig. 9 shows the accuracy of the original sequence and the feature subsequence on the FCN: the accuracy of the FCN based on the statistical feature subsequence is superior to that of the original sequence in most of the classes (the difference is more obvious in the last several classes), the trends of the two curves in the first few classes are fairly consistent, and the algorithm runs relatively. Fig. 10 shows the loss of the original sequence and the feature subsequence on the FCN: the loss of the FCN based on the statistical feature subsequence is generally smaller than that of the original sequence, and the trends of the two curves are relatively consistent.

Fig. 9 The accuracy of the original sequence and feature subsequence on FCN

Fig. 10 The loss of original sequence and feature subsequences on FCN

To verify the generalization ability of the method based on statistical features, the same experiments and analysis are performed on the residual network. The accuracy, loss, and time of the original sequence and feature subsequence on ResNet are shown in Table 3.

Table 3 Comparison of loss, accuracy, and time of OTS and characteristic subsequences on ResNet

Table 3 shows that ResNet performs well in classifying the statistical feature subsequences. For many time series classes, when the loss is equivalent, the new sequence based on statistical features is more accurate than the original sequence on ResNet, and its overall performance is better in terms of accuracy or loss. ResNet has a short classification time and a small per-class average error. The loss and accuracy of the original sequence and of the new sequence based on statistical features are also visualized. Fig. 11 shows that the accuracy of ResNet based on the statistical feature subsequence is still better than that of the original sequence in many classes, although the curves fluctuate somewhat. The performance of ResNet based on the statistical feature subsequence is improved in the earlier classes, and the classes after the seventh time series begin to improve. Fig. 12 shows that the loss of ResNet on the statistical feature subsequence is generally smaller than that on the original sequence, and the trends of the two curves are relatively consistent.

Fig. 11 The accuracy of the original sequence and feature subsequence on ResNet

Fig. 12 The loss of the original sequence and feature subsequences on ResNet

Improved experiments and result analysis

In this section, we improve the experiments from the following two aspects: increase the number of original sequences and extract more statistical features.

Increasing the number of original sequence classes

In the previous experiment, we used sequence data from the UCR time series data set, as shown in https://www.cs.ucr.edu/~eamonn/time_series_data_2018/. Fourteen classes with time series longer than 350 steps were selected from 85 classes. In this improvement, the number of original time series classes is increased to 22. In this case, the fully convolutional network still performs better on the subsequences based on statistical features. Accuracy and loss are shown in Figs. 13 and 14, respectively.

Fig. 13 Accuracy of SubSeq_FCN and traditional FCN

Fig. 14 Loss of SubSeq_FCN and traditional FCN

Extracting more statistical features

In the data experiment stage, more statistical features can be extracted, such as k-th order raw and central moments, the slope of the best linear fit, the peak value, the maximum change amplitude, and the average rate of change. We then run a comparison experiment in terms of accuracy and loss, as shown in Figs. 15 and 16.
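Some of these additional features could be computed per subsequence as in the sketch below. The precise definitions used in the experiments are not given in the text, so the formulas here (central moment about the mean, least-squares slope against the sample index, largest absolute step between neighbours, and end-to-end average rate of change) are assumptions, as are the function names.

```python
def central_moment(x, k):
    """k-th order central moment: mean of (value - mean) ** k."""
    mu = sum(x) / len(x)
    return sum((v - mu) ** k for v in x) / len(x)


def linear_fit_slope(x):
    """Slope of the least-squares line through the points (index, value)."""
    n = len(x)
    t_mean = (n - 1) / 2.0
    x_mean = sum(x) / n
    num = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def max_change(x):
    """Largest absolute difference between neighbouring values."""
    return max(abs(b - a) for a, b in zip(x, x[1:]))


def mean_rate_of_change(x):
    """Average change per step from the first value to the last."""
    return (x[-1] - x[0]) / (len(x) - 1)


window = [1.0, 3.0, 5.0, 7.0]
extra = [central_moment(window, 2), linear_fit_slope(window),
         max_change(window), mean_rate_of_change(window)]
```

All four functions expect a window with at least two values.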

Fig. 15 Accuracy of SubSeq_FCN and traditional FCN

Fig. 16 Loss of SubSeq_FCN and traditional FCN

Figs. 15 and 16 show that, although subsequences 9 and 19 are not as accurate as the original sequences, the accuracy of the subsequences on most classes is improved by the addition of some features.

Conclusions

FCN has been shown to achieve state-of-the-art performance on the task of TSC. Building on FCN, this paper studies time series preprocessing and finds a way to improve FCN classification performance. The method uses window slicing to divide the original time series into multiple equal-length subsequences, extracts statistical features from each subsequence to form a new sequence as the input to the neural network, applies fine-tuning during network training, and evaluates the classification performance of the network on a test set. Finally, the sample sequence complexity and the network's classification loss and accuracy are compared with those of the FCN trained on the original sequence. The results show that the FCN performs better on the statistical features extracted from subsequences. To show that the proposed method generalizes to other network structures, the same experiment and analysis were carried out on ResNet; the method based on statistical feature subsequences also improves the classification performance of ResNet.

Several problems remain unsolved. Interesting open problems include how to adjust the subsequence length and stride, how to regularize the network, and how to use LSTM-FCN and ALSTM-FCN [9], hierarchies of features [19], or time-aware methods [20, 21] to improve the performance of time series classification. In addition, we will try to apply the method to classification services in distributed cloud/fog environments [22,23,24,25,26,27,28,29,30,31,32] in the future.

Availability of data and materials

The original sequence data set comes from the UCR time series and the data can be found in https://www.cs.ucr.edu/~eamonn/time_series_data_2018/.

Abbreviations

DTW: Dynamic time warping

FCN: Fully convolutional neural networks

KNN: K-nearest neighbor

MPCE: Mean per class error

OTS: Original time series

SEF: Sequence of extracted features

SVM: Support vector machines

TSC: Time series classification

References

  1. 1.

    Z. Cui, W. Chen, Y. Chen, Multi-scale convolutional neural networks for time series classification [J]. CoRR1603(06995) (2016)

  2. 2.

    Z. Wang, W. Yan and T. Oates. Time series classification from scratch with deep neural networks: a strong baseline [J]. International Joint Conference on Neural Networks (IJCNN). 2017, 1578-1585.

  3. 3.

    J. Lin, E. Keogh, L. Wei, S. Lonardi, Experiencing SAX: a novel symbolic representation of time series [J]. Data Mining and knowledge discovery15(2), 107–144 (2007)

  4. 4.

    M.G. Baydogan, G. Runger, E. Tuv, A bag-of-features framework to classify time series. IEEE Transactions on Pattern Analysis and Machine Intelligence35(11), 2796–2802 (2013)

  5. 5.

    P. Schäfer. The boss is concerned with time series classification in the presence of noise [J] Data Mining and Knowledge Discovery, 2015, 29(6): 1505-1530.

  6. 6.

    P. Schäfer, Scalable time series classification [J]. Data Mining and Knowledge Discovery30(5), 1273–1298 (2016)

  7. 7.

    J. Lines, A. Bagnall, Time series classification with ensembles of elastic distance measures [J]. Data Mining and Knowledge Discovery29(3), 565–592 (2015)

  8. 8.

    A. Bagnall, J. Lines, J. Hills, A. Bostrom, Time-series classification with COTE: the collective of transformation-based ensembles [J]. IEEE Transactions on Knowledge and Data Engineering 27(9), 2522–2535 (2015)

  9. 9.

    F. Karim, S. Majumdar, H. Darabi, S. Chen, LSTM fully convolutional networks for time series classification [J]. IEEE Access., 1662–1669 (2018)

  10. 10.

    Y. Chen, E. Keogh, B. Hu, N. Begum, A. Bagnall, A. Mueen, and G. Batista, The UCR time series classification archive [Online], 2015, www.cs.ucr.edu/~eamonn/time_series_data/

  11. 11.

    H I Fawaz, G. Forestier, J. Weber, et al. Deep learning for time series classification: a review [J]. Data Mining and Knowledge Discovery, 2019, 33(4): 917-963, 2019.

  12. 12.

    C. Lea, R. Vidal, A. Reiter, and G. D. Hager, Temporal convolutional networks: a unified approach to action segmentation [C]. ECCV 2016: Computer Vision – ECCV 2016 Workshops, 2016: 47-54.

  13. 13.

    J. R. Chang, Y.S. Chen. Batch-normalized maxout network in network [J]. Computer Science, 2015.

  14. 14.

    He K, Zhang X, Ren S, et al. Deep residual learning for image recognition [C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.

  15. 15.

    S. Loffe and C. Szegedy. Batch normalization: accelerating deep network training by reducing internal covariate shift [J], International Conference on Machine Learning, pp.448-456, 2015.

  16. 16.

    N. Srivastava, G.E. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting [J]. Journal of Machine Learning Research15(1), 1929–1958 (2014)

  17. 17.

    Y. Zheng, Q. Liu, E. Chen, Y. Ge, J.L. Zhao, Exploiting multi-channels deep convolutional neural networks for multivariate time series classification [J]. Frontiers of Computer Science10(1), 96–112 (2016)

  18. 18.

    J. Long, E. Shelhamer, and T. Darrell, Fully convolutional networks for semantic segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431-3440.

  19. 19.

    Y. Lei , Y. Yan, Y. Han, and F. Jiang. The hierarchies of multivalued attribute domains and corresponding applications in data mining [J]. Wireless Communications and Mobile Computing, Volume 2018, Article ID 1789121, 1-11, 2018.

  20. 20.

    L.Y. Qi, R.L. Wang, S.C. Li, Q. He, X.L. Xu, C.H. Hu, Time-aware distributed service recommendation with privacy-preservation [J]. Information Sciences480, 354–364 (2019)

  21. 21.

    L. Y. Qi, X.Y. Zhang, W.C Dou, C.H. Hu, C. Yang, J.J. Chen. A two-stage locality-sensitive hashing based approach for privacy-preserving mobile service recommendation in cross-platform edge environment [J]. Future Generation Computer Systems, 2018, 88:636-643.

  22. 22.

    H.W. Liu, H.Z. Kou, C. Yan, L.Y. Qi, Link prediction in paper citation network to construct paper correlation graph [J]. EURASIP Journal on Wireless Communications and Networking (2019). https://doi.org/10.1186/s13638-019-1561-7

  23. 23.

    W. W. Gong, L.Y. Qi, Y.W. Xu. Privacy-aware multidimensional mobile service quality prediction and recommendation in distributed fog environment [J]. Wireless Communications and Mobile Computing, vol. 2018, Article ID 3075849, 8 pages, 2018.

  24.

    Y.W. Xu, L.Y. Qi, W.C. Dou, J.G. Yu, Privacy-preserving and scalable service recommendation based on SimHash in a distributed cloud environment [J]. Complexity, vol. 2017, Article ID 3437854, 9 pages, 2017.

  25.

    X.K. Wang, Laurence T. Yang, L.W. Kuang, X.G. Liu, Q.X. Zhang, M. Jamal Deen, A tensor-based big-data-driven routing recommendation approach for heterogeneous networks [J]. IEEE Network Magazine 33(1), 64–69 (2019)

  26.

    X.K. Wang, Laurence T. Yang, H.G. Li, M. Lin, J.J. Han, Bernady O. Apduhan, NQA: a nested anti-collision algorithm for RFID systems [J]. ACM Transactions on Embedded Computing Systems 18(4) (2019). https://doi.org/10.1145/3330139

  27.

    X.K. Wang, Laurence T. Yang, Y.H. Wang, X.G. Liu, Q.X. Zhang, M. Jamal Deen, A distributed tensor-train decomposition method for cyber-physical-social services [J]. ACM Transactions on Cyber-Physical Systems 3(4) (2019). https://doi.org/10.1145/3323926

  28.

    X.K. Wang, L.T. Yang, X. Xie, J.R. Jin, M.J. Deen, A cloud-edge computing framework for cyber-physical-social services [J]. IEEE Communications Magazine 55(11), 80–85 (2017)

  29.

    X.L. Xu, R.C. Mo, F. Dai, W.M. Lin, S.H. Wan, W.C. Dou, Dynamic resource provisioning with fault tolerance for data-intensive meteorological workflows in cloud [J]. IEEE Transactions on Industrial Informatics (2019). https://doi.org/10.1109/TII.2019.2959258

  30.

    X.L. Xu, X.H. Liu, Z.Y. Xu, C.J. Wang, S.H. Wan, X.X. Yang, Joint optimization of resource utilization and load balance with privacy preservation for edge services in 5G networks [J]. Mobile Networks and Applications (2019). https://doi.org/10.1007/s11036-019-01448-8

  31.

    X.L. Xu, S.C. Fu, W.M. Li, F. Dai, H.H. Gao, V. Chang, Multi-objective data placement for workflow management in cloud infrastructure using NSGA-II [J]. IEEE Transactions on Emerging Topics in Computational Intelligence (2019)

  32.

    X.L. Xu, C.X. He, Z.Y. Xu, L.Y. Qi, S.H. Wan, M.Z.A. Bhuiyan, Joint optimization of offloading utility and privacy for edge computing enabled IoT [J]. IEEE Internet of Things Journal (2019). https://doi.org/10.1109/JIOT.2019.2944007

Acknowledgements

Zhang Linkun and Wang Zhengyan helped us read through the text and polish some sentences. The authors thank the anonymous reviewers for their valuable suggestions and contributions to the quality of the paper.

Funding

This work was partly supported by the Natural Science Foundation of China (grant no. 61502272), the Promotive Research Fund for Excellent Young and Middle-Aged Scientists of Shandong Province (grant no. BS2014DX005), and the Undergraduate Education Reform Project in Shandong Province (no. Z2018S022).

Author information

YL and ZW designed the study. ZW performed the simulation experiments and wrote the paper. YL and ZW reviewed and edited the manuscript. Both authors read and approved the final manuscript.

Authors’ information

Yuxia Lei obtained his Ph.D. degree in computer software and theory from the Institute of Computing Technology, CAS, in 2010. He is an associate professor. His current research interests include concept lattice, artificial intelligence, and machine learning. Zhongqiang Wu received the B.Sc. degree in computer science from Qufu Normal University, Rizhao, China, in 2019.

Correspondence to Yuxia Lei.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Lei, Y., Wu, Z. Time series classification based on statistical features. J Wireless Com Network 2020, 46 (2020). https://doi.org/10.1186/s13638-020-1661-4

Keywords

  • Time series classification
  • Statistical features
  • Fully convolutional neural network