Web intrusion detection system combined with feature analysis and SVM optimization

Liu, Chao; Yang, Jing; Wu, Jinqiu

doi:10.1186/s13638-019-1591-1

Published: 03 February 2020

EURASIP Journal on Wireless Communications and Networking volume 2020, Article number: 33 (2020)

Abstract

Network traffic volumes are large and network attacks take many forms, so anomaly detection models based on machine learning are developing rapidly. Frequent Web Application Firewall (WAF) bypass attacks and the redundancy of data characteristics in the Hypertext Transfer Protocol (HTTP) make it difficult to extract useful features. This paper proposes an integrated Web intrusion detection system that combines feature analysis with support vector machine (SVM) optimization. Expert knowledge is used to analyze the characteristics of common Web attacks, and the relevant data features are selected by analyzing the HTTP protocol. For classification, the mature and robust SVM algorithm is used, with grid search for parameter optimization, which yields better detection of Web attacks. Experiments on the HTTP DATASET CSIC 2010 data set compare the detection capability of different kernel functions. The results show that the proposed system detects attacks well and can detect WAF bypass attacks effectively.

1 Introduction

The Q2 2017 threat intelligence report of the Global Threat Intelligence Center (GTIC) [1] pointed out that, among all types of attacks, attacks on Web applications have the highest proportion, accounting for 21%, of which Structured Query Language (SQL) injection accounts for 97%. Therefore, the prevention of Web attacks remains a priority. Although WAF products are constantly upgraded, WAF bypass attacks persist. Anomaly-based intrusion detection using data mining and machine learning has developed rapidly in order to better exploit intrusion characteristics. In 2014, Devaraju et al. used neural network algorithms in intrusion detection [2] to perform feature extraction and classification effectively. Zhao et al. applied the Markov model to intrusion detection systems (IDS) in combination with commonly used methods [3]. Mukkamala et al. applied the supervised standard SVM algorithm to intrusion detection [4] and obtained better detection than with the neural network method.

Jabbar Akhil proposed an intelligent network intrusion detection system that uses the averaged one-dependence estimator (AODE) algorithm to detect different types of attacks [6]. AODE is a recent enhancement of the naive Bayes algorithm: it addresses the independence assumption by averaging all models generated by the traditional one-dependence estimator and is well suited to incremental learning. In experiments, it achieved good detection results. Longzhi Yang proposed a data-driven network intrusion detection system [5]. Equipped with a sparse rule base, the system not only guarantees the online performance of intrusion detection but also allows security alerts to be generated in situations that are not directly covered by the existing knowledge base.

In machine learning-based anomaly detection, hidden Markov models, SVM, and neural networks have all been applied [7,8,9]. Among them, SVM handles over-fitting and generalizes better than the other algorithms, and it is well suited to small-sample, high-dimensional Web intrusion detection [10]. Experiments show that the performance of the SVM algorithm is directly affected by its parameters.

In 2014, Xuefeng Li proposed a network intrusion detection model that uses a genetic algorithm to select features and SVM parameters simultaneously [11]. The model targets the high-dimensional data generated by intrusion detection systems and the parameter optimization problem of SVM, improving detection accuracy while meeting real-time requirements. To overcome slow convergence and the tendency to fall into local optima during SVM parameter optimization, Zhang et al. proposed a particle swarm optimization algorithm [12]. Other researchers use the ant colony algorithm [13] to find the SVM parameters with the highest detection rate. Later, P.R.K. Varma et al. [14] proposed a set of network traffic features that can be extracted for real-time intrusion detection, together with a fuzzy entropy-based heuristic for ant colony optimization (ACO) that searches for the globally best, smallest set of such features. In practice, however, these algorithms are relatively complicated and time-consuming and do not necessarily reach an optimal solution. By comparison, the grid search algorithm is easy to implement and finds a suitable solution in a short time, which makes it better suited to a complex and variable network environment. This paper proposes an optimized Web intrusion detection system based on feature analysis and the SVM algorithm. The hidden Markov model is used to identify the parameter types, and the grid search algorithm is applied to optimize the SVM parameters and improve the intrusion detection rate.

The remainder of this paper is organized as follows. Section 2 analyzes the characteristics of common Web attacks, studies their attack techniques, and summarizes the features of bypass attacks for detection. Section 3 describes the characterization of the data, the SVM model, and the parameter optimization. Section 4 presents the overall design of the intrusion detection system and the experiments carried out on the CSIC data set. Conclusions are drawn in Section 5.

2 Related work

Before attacking a website, attackers often analyze its vulnerabilities and then attempt to access its sensitive resources with manually crafted uniform resource locator (URL) requests. The attackers then modify website resources or inject a Trojan, and the corresponding network packets change accordingly. Common Web attacks include SQL injection, cross-site scripting (XSS), command execution, remote code execution, and directory traversal. Among them, SQL injection usually relies on blind SQL injection to obtain hidden database information, with the aim of extracting database contents and obtaining a Web shell.

At present, most WAF products detect and intercept attacks using blacklists built from expert knowledge. However, attackers often bypass the detection rules. Table 1 analyzes common bypass Web attacks.

Table 1 Feature analysis of bypass Web attack

In addition, websites often have weaknesses in code detection and buffer overflow protection, so attackers try to attack them by encoding payloads or by making packets very long. To evade the detection of sensitive characters, the commonly used bypass methods include comments, equivalent functions or commands, special symbols, and encoding of sensitive characters. These bypass patterns are traceable, and the corresponding parameter features can be extracted from them to mark potential attack risk. The detection points are summarized in Table 2:

Table 2 Summary of feature detection points

According to the summarized detection points, key parameter features such as access request, parameter length, number of parameters, parameter type, parameter encoding, and sensitive characters are extracted. The extracted features are used as the attribute values of the model samples. They summarize the data features of Web attacks and address the limitations of large network traffic volume, high dimensionality, and difficult feature extraction.

3 Methods

3.1 Proposed model establishment

Based on the Web attack characteristics summarized in the previous section, and in particular the common bypass methods, the following features are extracted: URL identifier, parameter length, sensitive character detection identifier, abnormal encoding detection identifier, and parameter type identifier. The nonlinear SVM algorithm is then applied, and a kernel function suited to this scenario is selected. Finally, a grid search combined with cross-validation is used to find the best combination of parameters. Because access requests differ from one another, a separate model is established for each type of access request. The model establishment process is shown in Fig. 1.

3.2 Data characterization

After summarizing the characteristics of the Web attack in the previous section, the key parameter features are extracted. The characteristics of each part are as follows:

(1) Access request identifier: An HTTP access request consists of a domain name, a file path, and submitted parameters. The request is identified by its URL and parameter name, and the identifier is then hashed with MD5 for desensitization.
(2) Parameter length: The character length is obtained from the extracted “post” or “get” parameters. The sequence x₁, x₂, ⋯, x_n is transformed by $ {y}_i=\frac{x_i-\overline{x}}{s} $, where $ \overline{x}=\frac{1}{n}\sum \limits_{i=1}^n{x}_i $ and $ s=\sqrt{\frac{1}{n-1}\sum \limits_{i=1}^n{\left({x}_i-\overline{x}\right)}^2} $. The new sequence y₁, y₂, ⋯, y_n then has mean 0 and variance 1.
(3) Number of parameters: The parameters are separated to obtain each parameter name and its corresponding value. The number of parameters, denoted v_i, is normalized with the min-max transformation $ {y}_i=\frac{v_i-{v}_{\mathrm{min}}}{v_{\mathrm{max}}-{v}_{\mathrm{min}}},i=1,2,\cdots, n $, where $ {v}_{\mathrm{min}}=\underset{1\le i\le n}{\min}\left\{{v}_i\right\} $ and $ {v}_{\mathrm{max}}=\underset{1\le i\le n}{\max}\left\{{v}_i\right\} $. The normalized attribute values y₁, y₂, ⋯, y_n then fall in the interval [0, 1].
(4) Abnormal encoding detection identifier: Nine encoding types are defined: urldecode, md5, sha1, sha256, base64, unicode, utf8, html entity encoding, and undefine (normal). Among them, md5, sha1, and sha256 values differ in length but share the same pattern. The identifier is normalized with the min-max method.
(5) Sensitive character detection identifier: A sensitive character library is built for general attacks, SQL injection, and sensitive directory scanning. The identifier is then normalized with the min-max method.
(6) Parameter type identifier: For the different parameters of different URL requests, a hidden Markov model (HMM) is trained separately and its score is computed. The parameter type is thus expressed as a numerical value, which is then normalized with the z-score method. The identification of the parameter type is described in Section 3.3.
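The two normalizations and the hashed request identifier described in items (1) to (3) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `path?name` identifier format, and the handling of constant features are assumptions made for the example.

```python
import hashlib
import math


def request_identifier(url_path, param_name):
    """Hash the URL path and parameter name into an opaque identifier.

    The "path?name" concatenation is an assumed format for illustration.
    """
    return hashlib.md5(f"{url_path}?{param_name}".encode("utf-8")).hexdigest()


def z_score(values):
    """Standardize values with the sample standard deviation (n - 1 denominator).

    Requires at least two values.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        # Assumption: a constant feature is mapped to all zeros.
        return [0.0 for _ in values]
    return [(x - mean) / s for x in values]


def min_max(values):
    """Rescale values linearly into the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature is mapped to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical parameter lengths and parameter counts for a handful of requests.
lengths = [4, 8, 6, 10, 12]
counts = [1, 3, 2, 5]
normalized_lengths = z_score(lengths)
normalized_counts = min_max(counts)
```

Using the n - 1 denominator matches the definition of s in item (2), so the transformed sequence has sample variance 1.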

3.3 The identification of parameter type by the HMM algorithm

The HMM [15] describes a process in which a hidden Markov chain generates an unobservable state sequence, and each state in turn generates an observation, yielding the observation sequence.

The parameter values of HTTP access requests consist of letters, digits, connectors (-_\), other special characters (ASCII codes 32-47), and other characters such as Chinese characters. The hidden states are denoted S1, S2, S3, S4, and S5, and the parameter values are generalized as follows: [a-zA-Z] is generalized to A, [0-9] to B, [-_\] to C, characters with ASCII codes 32-47 to D, and Chinese and other characters to N, as shown in Fig. 2.
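The generalization step can be sketched as a character-by-character mapping. The function name `generalize` is hypothetical, and one detail is an assumption: the hyphen has ASCII code 45, which also lies in the range 32-47, so the sketch checks the connector class first.

```python
def generalize(value):
    """Map each character of a parameter value to one of the symbols A, B, C, D, N."""
    symbols = []
    for ch in value:
        if "a" <= ch <= "z" or "A" <= ch <= "Z":
            symbols.append("A")  # letters
        elif "0" <= ch <= "9":
            symbols.append("B")  # digits
        elif ch in "-_\\":
            symbols.append("C")  # connectors, checked before the 32-47 range (assumption)
        elif 32 <= ord(ch) <= 47:
            symbols.append("D")  # other special characters with ASCII codes 32-47
        else:
            symbols.append("N")  # Chinese and all other characters
    return "".join(symbols)


print(generalize("admin_01"))   # letters, one connector, digits
print(generalize("1' or 1=1"))  # a typical injection-style value
```

A normal user name collapses to a regular pattern of A, B, and C symbols, while an injection-style value introduces D and N symbols that the HMM has rarely or never seen during training.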

Since the range of parameter values differs between URLs, an HMM must be established for each parameter in each URL. The model parameters of each HMM are estimated with the Baum-Welch algorithm.

After the HMMs are obtained, all access requests are traversed: each request is matched to its corresponding HMMs, and the forward algorithm is used to calculate the probability of each parameter value in each URL. The probabilities of the different parameters in the same URL are summed to give the tag value of the URL's parameter type.
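The forward computation can be illustrated for a discrete HMM whose parameters are already known. The sketch below is a generic scaled forward algorithm, not the authors' implementation; it assumes the start, transition, and emission probabilities have been estimated beforehand (for example by Baum-Welch), and the two-state toy model is invented purely for the example.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) using the scaled forward algorithm.

    obs:   list of observation symbol indices
    start: start[i] is the probability of beginning in state i
    trans: trans[i][j] is the probability of moving from state i to state j
    emit:  emit[i][k] is the probability of state i emitting symbol k
    """
    if not obs:
        return 0.0  # an empty sequence has probability 1
    n_states = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        # Rescale to avoid underflow; the log of the scales accumulates log P.
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob


# Toy two-state model over two symbols (all values are illustrative).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.5], [0.1, 0.9]]
score = forward_log_likelihood([0, 1], start, trans, emit)
```

Working in log space keeps long parameter values from underflowing; a higher score means the generalized value is more typical of what the model saw in training.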

3.4 SVM algorithm modeling

SVM is a binary classification model: a linear classifier defined in the feature space with the maximum margin. The learning strategy is to maximize the margin, which can be formalized as a convex quadratic programming problem, so the SVM learning algorithm is an optimization algorithm for solving convex quadratic programs.

Assume that the training set is T = {(x₁, y₁), (x₂, y₂), ⋯, (x_n, y_n)}, where $ {x}_i\in \mathcal{X}={R}^m,{y}_i\in \mathcal{Y}=\left\{-1,+1\right\},i=1,2,\cdots, n $. Here x_i is the feature vector of the i-th sample, that is, the parameter vector participating in the computation, and y_i is its label: y_i = − 1 denotes an attack sample and y_i = + 1 a normal sample.

When the training data are not linearly separable, the soft-margin SVM is applied: a slack variable ξ_i ≥ 0 is introduced for each sample and a penalty parameter C ≥ 0 is added to the objective function, giving the following convex quadratic programming problem:

$$ {\displaystyle \begin{array}{ll}\underset{\omega, b,\xi }{\min }& \frac{1}{2}{\left\Vert \omega \right\Vert}^2+C\sum \limits_{i=1}^n{\xi}_i\\ {}s.t.& {y}_i\left(\omega \cdot {x}_i+b\right)\ge 1-{\xi}_i,\kern0.5em i=1,2,\cdots, n\\ {}& {\xi}_i\ge 0,\kern0.5em i=1,2,\cdots, n\end{array}} $$

(1)

Introducing a kernel function, the corresponding dual problem is:

$$ {\displaystyle \begin{array}{ll}\underset{\alpha }{\min }& \frac{1}{2}\sum \limits_{i=1}^n\sum \limits_{j=1}^n{\alpha}_i{\alpha}_j{y}_i{y}_j\mathrm{K}\left({x}_i,{x}_j\right)-\sum \limits_{i=1}^n{\alpha}_i\\ {}s.t.& \sum \limits_{i=1}^n{\alpha}_i{y}_i=0\\ {}& 0\le {\alpha}_i\le C,\kern0.5em i=1,2,\cdots, n\end{array}} $$

(2)

Here, C ≥ 0 is the penalty parameter and Κ(x_i, x_j) is the kernel function. The parameter characteristics of different access requests are independent of each other, so each access request is modeled separately to represent its characteristics more accurately. First, the list of URLs is obtained, the data set is sorted by URL, and the data set for each URL is extracted. The model for the i-th URL request is established as follows:

In the first step, the HMM algorithm is used to calculate the tag value of each parameter in the request, and the HMM model parameters are stored for data preprocessing during detection.

In the second step, the data are vectorized for computation, and the SVM algorithm is trained on them to obtain the model.

In the third step, the established model is stored so that it can be matched during intrusion detection. A corresponding model is created for each URL request in the list.
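The per-URL organization of these steps can be sketched as a small registry that trains and stores one model per URL and later dispatches each request to its matching model. This is an illustrative skeleton only: `train` and `predict` are placeholder callables standing in for the SVM, the majority-vote trainer exists just to make the example runnable, and returning `None` for an unseen URL is an assumption.

```python
from collections import defaultdict


def group_by_url(samples):
    """Group (url_id, feature_vector, label) samples by URL identifier."""
    groups = defaultdict(list)
    for url_id, features, label in samples:
        groups[url_id].append((features, label))
    return dict(groups)


def build_models(samples, train):
    """Train one model per URL; `train` is any callable (X, y) -> model."""
    models = {}
    for url_id, rows in group_by_url(samples).items():
        X = [features for features, _ in rows]
        y = [label for _, label in rows]
        models[url_id] = train(X, y)
    return models


def detect(models, url_id, features, predict):
    """Classify a request with the model stored for its URL."""
    model = models.get(url_id)
    if model is None:
        return None  # assumption: unseen URLs are handled by a separate policy
    return predict(model, features)


# Placeholder learner standing in for the SVM: predicts the majority label.
def train_majority(X, y):
    return 1 if sum(y) >= 0 else -1


def predict_majority(model, features):
    return model


samples = [
    ("/login", [0.1, 0.2], 1),
    ("/login", [0.9, 0.8], -1),
    ("/login", [0.2, 0.1], 1),
    ("/search", [0.5, 0.5], -1),
]
models = build_models(samples, train_majority)
```

Labels follow the convention of this section: +1 for a normal sample and -1 for an attack sample.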

3.5 Parameter optimization

Cross-validation (CV) [16, 17] divides the original data into training sets and test sets. The classifier is first trained on the training set and then verified on the test set, which makes CV a sound statistical method for testing classifier performance.
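A common way to realize this is k-fold cross-validation, in which every sample is used for testing exactly once. The paper does not state the number of folds, so the sketch below is a generic illustration with a hypothetical helper name; in practice the data would usually be shuffled before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, 3))
```

The classification accuracy reported for a parameter setting is then the average accuracy over the k test folds.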

The grid search method divides the parameter range into a grid and traverses all points of the grid within the given range. For each candidate parameter pair, the CV method is used to calculate the classification accuracy, and the parameter set with the highest classification accuracy is taken as the optimum. To avoid over-fitting, when several values of the penalty parameter C achieve the highest accuracy, the smallest one is selected.
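The search with its tie-breaking rule can be sketched as follows. The `cv_accuracy` callable is a placeholder for training and cross-validating an SVM at a given (C, γ) pair; the toy accuracy function is invented so that the tie-break is exercised, and the power-of-two grids mirror the default ranges of LIBSVM's grid.py as far as they are known to the editor.

```python
import itertools


def grid_search(c_values, gamma_values, cv_accuracy):
    """Return (best_C, best_gamma, best_accuracy).

    Ties in accuracy are broken toward the smaller C to limit over-fitting.
    """
    best = None
    for c, gamma in itertools.product(c_values, gamma_values):
        acc = cv_accuracy(c, gamma)
        if best is None or acc > best[2] or (acc == best[2] and c < best[0]):
            best = (c, gamma, acc)
    return best


# Exponentially spaced grids: C in 2^-5 .. 2^15, gamma in 2^-15 .. 2^3.
c_values = [2 ** k for k in range(-5, 16, 2)]
gamma_values = [2 ** k for k in range(-15, 4, 2)]


def toy_accuracy(c, gamma):
    """Hypothetical stand-in: accuracy plateaus once C >= 8 at gamma = 0.5."""
    return 0.95 if (c >= 8 and gamma == 0.5) else 0.80


best_c, best_gamma, best_acc = grid_search(c_values, gamma_values, toy_accuracy)
```

Because every grid point is evaluated independently, the loop is easy to parallelize, which is one reason grid search stays practical despite being exhaustive.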

4 Results and discussion

4.1 Design of the intrusion detection system

The Web intrusion detection system proposed in this paper, based on feature analysis and an optimized SVM algorithm, consists of three parts: data preprocessing, model analysis, and event response. The system structure is shown in Fig. 3.

Among them, the data preprocessing stage is mainly divided into two parts: parameter feature detection and data normalization processing:

1) Parameter feature detection: The data are characterized according to the detection points summarized in the Web attack feature analysis. A string matching algorithm is used to obtain the parameter matching value of each data packet, which is taken as the tag value of the attribute.
2) Data normalization processing: The detected feature values are normalized to facilitate the subsequent statistical analysis and to satisfy the data requirements of the SVM algorithm.
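The string matching in the parameter feature detection step might look like the sketch below. The pattern library is hypothetical and deliberately tiny (the paper's sensitive character library is not reproduced in the text), and the encoding check is a simple heuristic based on the fixed hexadecimal digest lengths of md5 (32), sha1 (40), and sha256 (64).

```python
import re

# Hypothetical, deliberately small pattern library for illustration only.
SENSITIVE_PATTERNS = {
    "sql": [r"\bunion\b", r"\bselect\b", r"--", r"/\*", r"\bor\b\s+\d+=\d+"],
    "xss": [r"<script", r"onerror\s*=", r"javascript:"],
    "traversal": [r"\.\./", r"/etc/passwd"],
}

HEX_DIGEST_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}


def count_sensitive(value):
    """Count pattern hits per attack category in a parameter value."""
    lowered = value.lower()
    return {
        category: sum(len(re.findall(pattern, lowered)) for pattern in patterns)
        for category, patterns in SENSITIVE_PATTERNS.items()
    }


def guess_encoding(value):
    """Heuristically label the encoding of a raw parameter value."""
    if re.fullmatch(r"[0-9a-fA-F]+", value) and len(value) in HEX_DIGEST_LENGTHS:
        return HEX_DIGEST_LENGTHS[len(value)]
    if re.search(r"%[0-9a-fA-F]{2}", value):
        return "urldecode"
    if re.search(r"&#?\w+;", value):
        return "html entity encoding"
    return "undefine"


hits = count_sensitive("1 UNION SELECT password FROM users--")
```

The resulting counts and encoding labels are raw tag values; they would still need to be mapped to numbers and normalized before being passed to the SVM.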

The model analysis part is mainly divided into two parts: model establishment and model detection:

1) Model establishment: First, the data set is grouped by URL to obtain the data set of each URL, and the data are preprocessed to obtain normalized data. The HMM algorithm is used to identify the parameter type, and the Python interface of LIBSVM is used to implement the SVM algorithm. Grid search is used to find appropriate parameters and ensure a good classification result. In the simulation stage, different kernel functions can be selected by changing the input parameters of the LIBSVM functions in order to find the best model in each case. A model is established for each URL identifier, and the resulting classification function is used for detection.
2) Model detection: First, the HTTP application-layer packets are captured and the protocol is parsed to extract the parameter features. The data are normalized with the preprocessing method described above, and the matching model is applied to detect whether an attack is present. If an attack is detected, the anomalous detection points are extracted for the event response.

The event response part is composed of two parts: event database establishment and event warning.

1) Event database establishment: The event database is generated from the parameter features identified in the Web attack feature analysis. The detected anomalous detection points are then matched against the event database to find the most likely attack event.
2) Event warning: The detection information is used to raise an early warning and to indicate the possible attack mode.

4.2 Simulation

4.2.1 Dataset

The data set is the HTTP DATASET CSIC 2010, developed by the Information Security Institute of CSIC (Spanish National Research Council). It contains traffic generated for an e-commerce Web application, including 36,000 normal requests and more than 25,000 anomalous requests. Each HTTP request is labeled as normal or anomalous. The data set includes SQL injection, buffer overflow, information gathering, file disclosure, Carriage Return/Line Feed (CRLF) injection, XSS, server-side include, parameter tampering, and other attacks. It is divided into three subsets. In the training phase, a subset with normal traffic is used. In the testing phase, two subsets are used, one containing normal traffic and the other malicious traffic.

4.2.2 Data pre-processing

The data set consists of HTTP requests. First, the access path and access parameters are extracted from each request, and the summarized parameter features are then detected to obtain the attribute values of the model.

Normalized data are obtained through data preprocessing. The training data are grouped by URL identifier, giving one training set per URL identifier, and a model is established for each group.
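Extracting the access path and parameters can be sketched with the standard library. The helper names and the example URL are illustrative assumptions. The values are deliberately kept in their raw, still-encoded form, because a decoding parser would erase the evidence needed for the abnormal encoding feature.

```python
from urllib.parse import urlsplit


def split_raw_params(query):
    """Split a query or form body into (name, raw_value) pairs without decoding."""
    pairs = []
    for item in query.split("&"):
        if not item:
            continue
        name, _, value = item.partition("=")
        pairs.append((name, value))
    return pairs


def extract_path_and_params(method, url, body=""):
    """Return the access path and the raw parameters of an HTTP request.

    GET parameters are read from the URL query string, POST parameters from the body.
    """
    parts = urlsplit(url)
    query = parts.query if method.upper() == "GET" else body
    return parts.path, split_raw_params(query)


# Illustrative request; the URL is made up for the example.
path, params = extract_path_and_params(
    "GET", "http://localhost:8080/shop/add.jsp?id=2&name=Jam%F3n&qty=41"
)
```

Each (path, parameter name) pair then identifies one model input, and the raw value is what the length, encoding, sensitive character, and type features are computed from.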

4.3 Experiment

The specific system information is as follows:

1) System: Linux (Ubuntu 16.04)
2) Software environment: Hadoop 2.7.3, openjdk-8-jdk, Anaconda, LIBSVM 3.22, hmmlearn.

Hadoop Distributed File System (HDFS) is used to store data sets, anaconda is used to complete data preprocessing, the hmmlearn library is used to implement HMM model, the libsvm library is used to implement SVM algorithm, and then, the model system is established.

Because of the difference between access requests, it is necessary to separately establish models for each type of access request. First, the data set is classified according to the URL, and the data set of each URL is obtained. Data preprocessing is then performed to obtain normalized data.

The parameter type tagging process using the hmmlearn library is divided into two steps:

In the first step, the data are grouped by the different parameters of the different URLs and then preprocessed to obtain the observation set of the parameter values, from which the model parameters are calculated. The Gaussian HMM is used to fit the model, and the resulting model parameters are saved.

In the second step, the fitted model is used to calculate the probability of each observation sequence, which identifies the parameter type of the URL. The different parameter values of the same URL are preprocessed and then written into the data set.

The score function is used to calculate the score of each parameter value, as shown in Table 3 for the username parameter of one URL. A larger score indicates a higher probability that the parameter value appears, and a smaller score a lower probability.

Table 3 HMM calculation score value

This section uses classification accuracy (ac) to evaluate the classification performance of the model. The data set for the SVM algorithm is obtained through data preprocessing, and the SVM computation uses the C-Support Vector Classification (C-SVC) formulation of the LIBSVM module. The time complexity of the SVM algorithm is Ο(m²n²), which depends closely on the number of samples (m) and the dimension of the features (n). The way the optimization problem is solved also directly affects the computation time. The Sequential Minimal Optimization (SMO) algorithm updates two samples per iteration, which can greatly shorten the calculation time.

First, the kernel function is selected. The radial basis function (RBF) kernel, the polynomial kernel, and the sigmoid kernel are each used to build the model. Table 4 shows the classification accuracy of the resulting models for three randomly selected URLs:

Table 4 Classification accuracy of different kernel functions

The experimental data show that the RBF kernel gives better classification accuracy than the polynomial kernel, and the polynomial kernel better than the sigmoid kernel. Therefore, this paper adopts the RBF kernel to establish the model and ensure classification accuracy.
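For reference, the three kernels compared above can be written out in the form LIBSVM uses for them. The sketch is a plain-Python illustration rather than the library's code; the default values `coef0=0.0` and `degree=3` follow LIBSVM's documented defaults as understood by the editor, and under the common convention γ = 1/(2σ²) the RBF width σ can be converted to γ.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma):
    """RBF kernel: exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def polynomial_kernel(u, v, gamma, coef0=0.0, degree=3):
    """Polynomial kernel: (gamma * u.v + coef0)^degree."""
    return (gamma * dot(u, v) + coef0) ** degree


def sigmoid_kernel(u, v, gamma, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * u.v + coef0)."""
    return math.tanh(gamma * dot(u, v) + coef0)


x1, x2 = [1.0, 2.0], [3.0, 4.0]
k_rbf = rbf_kernel(x1, x2, gamma=0.5)
k_poly = polynomial_kernel(x1, x2, gamma=1.0)
k_sig = sigmoid_kernel(x1, x2, gamma=0.1)
```

The RBF kernel depends only on the distance between two samples and has a single shape parameter, which keeps the grid search two-dimensional (C and γ).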

The SVM algorithm with the RBF kernel is implemented first, and the grid search parameter optimization is then applied. The detection rate of the SVM algorithm for different values of C and σ is shown in Table 5:

Table 5 Detection rate of different values of parameters

Experiments show that the parameter values greatly affect the detection performance, so selecting appropriate parameters is especially important. This paper aims to find parameters that give the established model a good detection rate within a short time, and the grid search algorithm is therefore selected. LIBSVM provides grid.py to implement the grid search. The parameters are varied with a fixed step size and the detection rate is calculated for each combination. The parameter set with the highest detection rate is taken as the optimum. Although this set may not be the global optimum, it guarantees that the model established for each type of URL has appropriate parameters and adequate detection capability.

In this paper, the HMM algorithm is used to identify the parameter type. To evaluate its effect, models were built with and without parameter type identification for the five most representative URLs. The classification accuracy of the models is shown in Table 6.

Table 6 Classification accuracy of different attributes

Experiments show that the HMM algorithm can distinguish abnormal parameters from normal parameters, which improves the detection performance of the model.

Finally, five sets of data were randomly selected and tested with the established model and the detection result is shown in Table 7.

Table 7 Detection result of the model

Experiments show that the model achieves a 95.8% detection rate and a 2.32% false alarm rate, which is a good result.
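The paper does not spell out how these rates are computed. Assuming the usual definitions (detection rate as the share of attack samples flagged as attacks, false alarm rate as the share of normal samples flagged as attacks), they can be obtained from the confusion counts as in the sketch below; the function name and the toy labels are illustrative.

```python
def detection_metrics(y_true, y_pred, attack_label=-1):
    """Compute accuracy, detection rate, and false alarm rate.

    Labels follow the paper's convention: -1 is an attack, +1 is normal.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == attack_label and pred == attack_label:
            tp += 1  # attack correctly detected
        elif truth == attack_label:
            fn += 1  # attack missed
        elif pred == attack_label:
            fp += 1  # normal request flagged as attack
        else:
            tn += 1  # normal request correctly passed
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Toy labels: four attacks and six normal requests.
y_true = [-1, -1, -1, -1, 1, 1, 1, 1, 1, 1]
y_pred = [-1, -1, -1, 1, 1, 1, 1, 1, 1, -1]
metrics = detection_metrics(y_true, y_pred)
```

Reporting both rates matters because a detector can reach a high detection rate trivially by flagging everything, at the cost of an unusable false alarm rate.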

For comparison, the detection results reported in existing research on hybrid intrusion detection models, obtained with the KDD CUP99 training set, are shown in Table 8.

Table 8 Detection result of the other existing model

In the existing research on hybrid intrusion detection models, the detection rate mostly reaches 90% and the false alarm rate is below 10%. By comparison, the intrusion detection model studied in this paper has a relatively good detection performance.

5 Conclusion

This paper summarizes the characteristics of common Web attacks and, combined with an analysis of the HTTP protocol, selects the relevant data features, which addresses the limitations of large network traffic volume, high dimensionality, and difficult feature extraction. Because the parameter characteristics of different access requests differ considerably, a separate model is established for each type of access request. The data set is sorted by URL and preprocessed to obtain normalized data. The parameter type is then identified by the HMM algorithm, the SVM algorithm is used for classification, and the grid search method is used for parameter optimization. The simulation and experimental results show that the SVM algorithm with the RBF kernel function gives the better detection performance. In addition, grid search accelerates the parameter optimization process and gives each model good detection capability within a short time, which is especially important in a dynamic and complex network environment.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

WAF: Web application firewall
HTTP: Hypertext Transfer Protocol
GTIC: Global Threat Intelligence Center
SQL: Structured Query Language
HMM: Hidden Markov Model
URL: Uniform Resource Locator
XSS: Cross-site scripting
CV: Cross-validation
CSIC: Spanish National Research Council (Consejo Superior de Investigaciones Científicas)
CRLF: Carriage Return Line Feed
C-SVC: C-Support Vector Classification
RBF: Radial Basis Function kernel
SMO: Sequential Minimal Optimization

References

[1] NTT Security, 2017 Global Threat Intelligence Center (GTIC) Quarterly Threat Intelligence Report Q2. http://www.nttcomsecurity.com/uploads/documentdatabase/NTT%20Security%20Q2%202017_FINAL.pdf
[2] S. Devaraju, Performance comparison for intrusion detection system using neural network with KDD Dataset [J]. Ictact J. Soft Comput. 4(3), 743–752 (2014)
[3] Y. Zhao, Network intrusion detection system model based on data mining [C]// IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. IEEE 1, 155–160 (2016)
[4] S. Mukkamala, G. Janoski, A. Sung, Intrusion detection using neural networks and support vector machines [C]// International Joint Conference on Neural Networks. IEEE (2002), pp. 1702–1707
[5] L. Yang, J. Li, G. Fehringer, et al., Intrusion detection system by fuzzy interpolation [C]// IEEE International Conference on Fuzzy Systems. IEEE (2017)
[6] J. Akhil, A. Sultana, Intelligent network intrusion detection system using data mining techniques [C]// ICATCCT16. IEEE (2016)
[7] D.P. Muni, N.R. Pal, J. Das, Genetic programming for simultaneous feature selection and classifier design [J]. IEEE Trans. Syst. Man. Cybern. B. Cybern. 36(1), 106–117 (2006)
[8] J. Kennedy, R.C. Eberhart, Particle swarm optimization [C]// Proceedings of IEEE International Conference On Neural Networks (IEEE, Perth, 1995), pp. 1942–1948
[9] C.-H. Chen, H.-Y. Kung, F.-J. Hwang, Deep learning techniques for agronomy applications [J]. Agronomy 03(9), 1 (2019)
[10] S.X. Tang, W.J. Cai, Intrusion detection based on unsupervised clustering and hybrid genetic algorithm [J]. J. Comput Appl. 28(2), 409–411 (2008)
[11] X. Li, Network intrusion detection using genetic algorithms for synchronized selection of features and support vector machine parameters [J]. Comput. Appl. Softw. 3, 301–303 (2014)
[12] T. Zhang, J. Wang, Network intrusion detection model based on CQPSO-LSSVM [J]. Comput. Eng. Appl. 51(2), 113–116 (2015)
[13] L.I. Zhengang, Q. Gan, University T C, Network intrusion detection model based on MACO-SVM [J]. Journal of Chongqing University of Posts and Telecommunications 26(6), 785–789 (2014)
[14] P.R.K. Varma, V.V. Kumari, S.S. Kumar, Feature selection using relative fuzzy entropy and ant colony optimization applied to real-time intrusion detection system [J]. Procedia Comput. Sci. 85, 503–510 (2016)
[15] L. Denoyer, H. Zaragoza, P. Gallinari, HMM-based passage models for document classification and ranking [J]. Ecir (2016), pp. 126–135
[16] Y. Zhang, B. Li, H. Lu, et al., Sample-specific SVM learning for person re-identification [C]// IEEE Conference on Computer Vision and Pattern Recognition. IEEE (2016), pp. 1278–1287
[17] J.I. Changming, T. Zhou, T. Xiang, et al., Application of support vector machine based on grid search and cross validation in implicit stochastic dispatch of cascaded hydropower stations [J]. Electric Power Automation Equip. 34(3), 125–131 (2014)
[18] B. Pfahringer, Winning the KDD99 classification cup: bagged boosting. ACM SIGKDD Explorations Newsletter 1(2), 65–66 (2000)
[19] I. Levin, KDD-99 classifier learning contest: LLSoft’s results overview. ACM SIGKDD Explorations Newsletter 1(2), 67–75 (2000)
[20] C. Xiang, S.M. Lim, Design of multiple-level hybrid classifier for intrusion detection system, in IEEE Workshop on Machine Learning for Signal Processing. IEEE (2005), pp. 117–122
[21] L. Kuang, M. Zulkemine, An anomaly intrusion detection method using the CSI-KNN algorithm, in Proceedings of the 2008 ACM symposium on Applied computing. ACM (2008), pp. 921–926
[22] Z. Ma, Application of hybrid soft computing technology in intrusion detection [Doctoral Dissertation] (Chongqing University, Chongqing, 2010)

Funding

The authors thank the project of the National Natural Science Foundation of China No.61672179, No.61370083, No.61402126, and No. 61901134; the Natural Science Foundation of Heilongjiang Province (Grant No. F2015030, 201336); the Fundamental Research Funds in Heilongjiang Provincial Department of Education No. 135109247; and the Science Foundation for Young Scientists of Qiqihar University No. 2014k-M09. The authors also thank the Technical Bureau of Qiqihar GYGG-201516 and GYGG-201622.

Author information

Authors and Affiliations

Harbin Engineering University, Harbin, China
Chao Liu, Jing Yang & Jinqiu Wu
Qiqihar University, Qiqihar, China
Chao Liu

Contributions

JY contributed to the conceptualization, software, manuscript reviewing, editing, and funding acquisition. CL contributed to the software development, investigation, methodology, simulations, result analysis, and draft manuscript writing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jing Yang.

Ethics declarations

Competing interests

The authors declare that they have no competing interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Liu, C., Yang, J. & Wu, J. Web intrusion detection system combined with feature analysis and SVM optimization. J Wireless Com Network 2020, 33 (2020). https://doi.org/10.1186/s13638-019-1591-1

Received: 15 July 2019
Accepted: 30 October 2019
Published: 03 February 2020
DOI: https://doi.org/10.1186/s13638-019-1591-1
