MFVT: An Anomaly Traffic Detection Method Merging Feature Fusion Network and Vision Transformer Architecture

Abstract
Network intrusion detection, which relies mainly on extracting and analyzing network traffic features, plays a vital role in network security protection. Current feature extraction and analysis for network intrusion detection mostly uses deep learning algorithms. However, deep learning requires a large amount of training resources and handles imbalanced data sets poorly. In this paper, a deep learning model (MFVT) based on a feature fusion network and the Vision Transformer architecture is proposed, which improves the processing of imbalanced data sets and reduces the sample data needed for training. In addition, to improve on traditional raw traffic feature extraction methods, a new raw traffic feature extraction method (CPR) is proposed, which uses the PCA algorithm to reduce all processed digital traffic features to a specified dimension. On the IDS 2017 and IDS 2012 datasets, ablation experiments show that the proposed MFVT model performs significantly better than other network intrusion detection models, and its detection accuracy reaches the state-of-the-art level. When the MFVT model is combined with the CPR algorithm, the detection accuracy further improves to 99.99%.


Introduction
The rapid development of the mobile Internet brings great convenience to network users and society, but it also allows criminals to launch a range of attacks on the network. These attacks seriously threaten the normal operation of the network, causing substantial economic losses and posing hidden dangers to national security. Behaviors that violate computer security policies such as confidentiality, integrity, and availability are defined as intrusions [1]. An intrusion detection system, a security protection system used to monitor computer networks, can detect suspicious behaviors and take corresponding measures to ensure the normal operation of the network and reduce economic losses; such systems have been in use since the 1980s [2,3]. Recently, with the rapid development of the mobile Internet, attacks on Internet-connected devices have been steadily increasing. As a result, many scholars have taken a strong interest in intrusion detection systems, and good detection results have been achieved [4].
The detection of anomalous network traffic is an important task in network intrusion detection. It is essentially a traffic classification problem [5]: researchers must make accurate judgments on the collected network traffic data and detect traffic with offensive behavior. To detect anomalous traffic more effectively, network packets are usually divided into flows according to source IP, destination IP, source port, destination port, protocol and timestamp [6]. Current anomaly traffic detection techniques fall into two main groups: traditional network anomaly traffic detection techniques and machine-learning-based methods. In this paper, deep learning methods are used to classify network traffic. Deep learning methods are end-to-end and extract features from network traffic data automatically, which avoids the cumbersome process of manual feature extraction, and they have good adaptability, self-organization and generalization ability. The use of deep learning can therefore give the detection system more stable performance and higher detection efficiency [7].
However, deep learning needs a large amount of labeled data for training, and labeling requires experts with specific knowledge to spend a great deal of time, which is laborious. In addition, most of the data sets used in deep learning are imbalanced. Both problems significantly affect the performance of deep learning models. Under-sampling and over-sampling are commonly used to address data imbalance, but under-sampling discards some data and so loses some features, while over-sampling adds data and so changes the original data distribution; both affect experimental accuracy [8]. In this paper, the traffic features learned by a two-layer convolutional network are fused, which alleviates the impact of data imbalance on accuracy. Motivated by the outstanding performance of the Transformer architecture in natural language processing (NLP) and the limitations of its application in computer vision, Dosovitskiy [9] adapted the Transformer and proposed the Vision Transformer architecture, which treats an image as a sequence of patches for image classification and achieved good results. Experiments also showed that the Vision Transformer requires fewer training resources. Inspired by the Vision Transformer architecture, this paper proposes a deep learning model (MFVT) based on a feature fusion network and the Vision Transformer architecture for network anomaly traffic detection. The MFVT model handles imbalanced data sets well and therefore effectively reduces the sample resources required for training. This paper also studies, based on the MFVT model, how changes in the learning rate and the number of training epochs affect experimental accuracy.
So far, there are many ways to process raw network traffic data, but there is no uniform standard. Since the inputs to a neural network must all have the same dimension, the extracted network traffic data must be reduced to a specific dimension before it can be used as the input of the neural network model. Most traditional methods directly truncate the network traffic data to a specific dimension. Although this works quite well, there is room for improvement. Therefore, the PCA algorithm is used in this paper to reduce all the processed digital traffic features to a specified dimension. The experimental accuracy obtained on the IDS 2017 [10] and IDS 2012 [11] data sets is significantly higher than that of the traditional methods.
In summary, the main contributions of this paper are as follows.
• (1) A deep learning model (MFVT) based on a feature fusion network and the Vision Transformer architecture is proposed, which effectively improves the detection accuracy while reducing the training resources. On the IDS 2017 and IDS 2012 datasets, the MFVT model achieves the best performance on all evaluation metrics.
• (2) A new raw traffic data extraction algorithm (CPR) is proposed, which uses the PCA [12] algorithm to reduce the processed digital traffic features to a specified dimension. The ablation experiment results show that the detection accuracy improves significantly compared with traditional methods.
• (3) Based on the MFVT model, the impact of the number of training epochs and of learning rate variation on the detection performance of the model is further studied.
The rest of this paper is organized as follows. Section 2 reviews work related to the model and method presented in this paper. Section 3 details the deep learning model and the raw network traffic data processing algorithm. Section 4 describes the ablation experiments and the experimental results of the MFVT model in detail. Finally, Section 5 summarizes our work.

Related Work
This section summarizes the literature related to this paper, covering intrusion detection and the Transformer architecture.

Intrusion detection
In 1980, Anderson [13] proposed the concept of intrusion detection, which aims to identify abnormal behaviors in the network in a timely manner and reduce the losses they cause. Over the past 40 years, many methods have been applied to intrusion detection, all of which aim to detect attacks with good predictive accuracy and to improve real-time prediction. These methods all attempt to extract a pattern from network traffic that distinguishes attack traffic from regular traffic.
Specifically, Table 1 briefly summarizes the methods used in intrusion detection. Currently, the traditional machine learning methods applied to intrusion detection are mainly supervised learning methods, such as support vector machine (SVM) [14][15][16], K-nearest neighbor (KNN) [17], random forest (RF) [18,19], and so on. These methods have a high false alarm rate and a low detection rate for attack traffic. Designing a feature set that accurately reflects traffic characteristics is a common problem in traditional machine learning methods, and the quality of the feature set directly affects classification performance. In recent years, although many researchers have worked on the design of feature sets [20,21], how to design a suitable traffic feature set remains an unresolved research topic.
Deep learning [22], in contrast, has good self-adaptability, self-organization, and generalization capabilities. It is therefore a good solution to the problem that traditional machine learning needs a manually designed feature set. Deep learning can give detection systems higher detection efficiency and has been widely studied in recent years. Yan [23] constructed an intrusion detection system based on a convolutional neural network (CNN) and applied a generative adversarial network to synthesize attack traces; experimental results verified the effectiveness of the system. Zhang [24] proposed a deep hierarchical network-based intrusion detection model that combines a CNN and a long short-term memory network (CNN LSTM), and the CNN LSTM model achieved good performance on the IDS2017 dataset. Lin [25] constructed a dynamic network anomaly detection system, which uses a long short-term memory network (LSTM) combined with an attention mechanism to detect anomalies. Zhang [26] proposed a two-layer parallel cross-fusion deep learning model (PCCN), which uses feature fusion to improve the extraction of features from small sample data, and ablation experiments showed good performance. Zhong [27] proposed HELAD, a network anomaly traffic detection algorithm integrating multiple deep learning techniques. Although HELAD has better adaptability and detection accuracy, its bit error rate is slightly higher.

Transformer Architecture
In 2018, the Transformer architecture [28] first appeared in the field of natural language processing (NLP), where it has since occupied an important position. The Transformer architecture has been continuously improved by subsequent scholars [29]. Vaswani [30] first constructed the Transformer architecture based on the attention mechanism. Devlin et al. [31] proposed BERT, a new language representation model, which pretrains a Transformer on unlabeled text by jointly conditioning on both left and right context. BERT achieved state-of-the-art results on 11 natural language processing tasks at the time. Influenced by the excellent performance of the Transformer architecture in NLP tasks, scholars began to extend it to computer vision and achieved good results. Chen et al. [32] constructed a sequence Transformer to perform regression prediction of pixels and obtained competitive results in image classification. In 2020, Dosovitskiy et al. [33] proposed the Vision Transformer architecture, which uses a pure Transformer to extract features directly from sequences of image blocks and obtained state-of-the-art performance on multiple image recognition benchmark data sets. Beyond basic image classification, Transformer models are gradually being applied to various computer vision tasks, and the number of vision models based on the Transformer architecture keeps growing.
In this paper, the latest intrusion detection model based on feature fusion is improved and integrated into the Vision Transformer architecture, yielding a deep learning model (MFVT) that combines a feature fusion network with the Vision Transformer architecture for network anomaly traffic detection. MFVT takes full advantage of the respective strengths of feature fusion and the Vision Transformer architecture, and further improves the detection accuracy for abnormal network traffic when combined with our proposed CPR algorithm.

Model and methods
This section mainly introduces the CPR algorithm and MFVT model.
To improve the ability of existing deep learning models to process imbalanced data sets and to reduce the required training set resources, this paper designs a new model, MFVT, and a new raw data processing algorithm, CPR. The MFVT model improves detection on small sample data sets and reduces the training set resources, and CPR effectively removes interfering features from the raw data. Figure 1 shows the entire detection process. The MFVT model is mainly composed of a feature fusion network and the Vision Transformer architecture. MFVT uses the raw features of network traffic to learn automatically the differences between categories of network traffic features and so classify anomalous traffic. However, the network model requires all input data to have the same dimensionality, so the CPR algorithm was proposed to extract the raw features of network traffic and bring them to the same dimension.

Data processing
The raw data processing algorithm (CPR) proposed in this paper extracts raw traffic data from pcap files and processes it into the two-dimensional matrix required by the network model. Figure 2 shows the entire data processing pipeline. Three steps are required to process the raw flow data into a two-dimensional matrix, as follows.
The first step is to extract the raw data of network traffic from the pcap file, and then convert the extracted byte type data into binary type data.
In the second step, the converted packets are divided into flows according to the five-tuple, and both the number of packets per flow and the number of bytes per packet are limited when dividing the flows. If a flow has too few packets, it is filled with the preceding item, and if a packet has too few bytes, it is filled with 0. For details of this step, refer to [34]. Through these operations, a data set with fixed dimensions is obtained. The pseudo code is shown in Algorithm 1.
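The second step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the limits `MAX_PACKETS` and `MAX_BYTES` are hypothetical (the paper does not give the values), the function names are invented for this example, and "fill in the preceding item" is read here as repeating earlier packets of the same flow.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical limits; the paper does not state the values it uses.
MAX_PACKETS = 3   # packets kept per flow
MAX_BYTES = 8     # bytes kept per packet


def fix_packet(payload: bytes, max_bytes: int = MAX_BYTES) -> List[int]:
    """Truncate a packet to max_bytes and right-pad it with zeros."""
    values = list(payload[:max_bytes])
    return values + [0] * (max_bytes - len(values))


def fix_flow(packets: List[bytes], max_packets: int = MAX_PACKETS,
             max_bytes: int = MAX_BYTES) -> List[List[int]]:
    """Keep at most max_packets packets; if there are fewer, repeat earlier ones."""
    rows = [fix_packet(p, max_bytes) for p in packets[:max_packets]]
    i = 0
    while rows and len(rows) < max_packets:
        rows.append(rows[i])  # assumed reading of "fill in the preceding item"
        i += 1
    return rows


def split_flows(packets: List[Tuple[tuple, bytes]]) -> Dict[tuple, List[List[int]]]:
    """Group (five_tuple, payload) pairs into fixed-size flows keyed by five-tuple."""
    flows = defaultdict(list)
    for five_tuple, payload in packets:
        flows[five_tuple].append(payload)
    return {key: fix_flow(pkts) for key, pkts in flows.items()}
```

Every flow returned by `split_flows` has exactly `MAX_PACKETS` rows of `MAX_BYTES` values, which is the fixed-dimension property the text requires.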

Algorithm 1 Raw data processing
Input: raw data (pcap)
Output: all_data[]
for each pcap do
    if the same five-tuple can be found in the attack labels then  # extract and tag malicious traffic from pcap files
        ...
        if five-tuple equal and count > threshold then  # the maximum number of packets per flow
        ...
        get flows based on the five-tuple information of the traffic packets
        for each flow do
            transform the flow pcap file into a txt file with Wireshark to get the flow's original hexadecimal data
        end for
    end if
end for
ls = os.listdir(fpath)  # the folder path corresponding to each type of attack traffic
for path in ls do
    file = open(path, 'r')  # open the txt file in which each flow is stored
    for line in file.readlines() do  # read data line by line
        convert the read data into hexadecimal data and store it in mid_data[]  # mid_data[] stores each packet of each flow
    end for
    all_data.append(mid_data)
    mid_data = []
end for

In the third step, the network traffic data obtained from the first two steps is high-dimensional and may contain redundant features that are useless for network training, so further extraction is needed. In this paper, the data obtained from the first two steps are fed directly into the PCA algorithm to obtain data of the required dimension, and the data are then processed into a two-dimensional matrix. The pseudo code is shown in Algorithm 2.

Algorithm 2
...
        data.append(mid_data)
    end if
    # same as above: len(all_data[i]) > size
    for i to (size - len(all_data[i])) do
        data.append(the data extracted from the previous)
    ...
    end if
end for
data = pca(data, dimension)  # reduce the data to a specified dimension using the PCA algorithm
MaxMin_Normalized(data)
save the reduced data to the specified csv file

The main idea of PCA is to map the N-dimensional features to K dimensions. The K-dimensional features are new orthogonal features, also known as principal components, reconstructed from the original N-dimensional features, as shown in Formulas 1, 2, 3, 4 and 5.
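The PCA reduction and the normalization at the end of Algorithm 2 can be sketched with NumPy as below. This is a sketch under assumptions rather than the authors' code: the covariance is normalized by 1/n (the paper's exact normalization is not legible here), `np.linalg.eigh` is used instead of a general eigen-solver because the covariance matrix is symmetric, and the function names are invented for this example.

```python
import numpy as np


def pca_reduce(X: np.ndarray, k: int) -> np.ndarray:
    """Project the rows of X (n samples, D features) onto the top-k principal components."""
    Xc = X - X.mean(axis=0)              # zero-center each feature column
    c = Xc.T @ Xc / Xc.shape[0]          # covariance matrix, D x D (1/n assumed)
    b, w = np.linalg.eigh(c)             # eigenvalues b (ascending), eigenvectors as columns of w
    order = np.argsort(b)[::-1][:k]      # indices of the k largest eigenvalues
    p = w[:, order].T                    # k x D matrix, one eigenvector per row
    return Xc @ p.T                      # reduced data, n x k


def min_max_normalize(Y: np.ndarray) -> np.ndarray:
    """Scale each column to [0, 1]; constant columns map to 0."""
    lo, hi = Y.min(axis=0), Y.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (Y - lo) / span
```

The names `c`, `b`, `w` and `p` follow the symbols the text uses for the covariance matrix, the eigenvalues, the eigenvectors and the selected projection matrix. The reduced vectors would then be reshaped into the two-dimensional matrix expected by the model.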
Formula 1 indicates that the original data $X$ is arranged into a matrix with $n$ rows and $D$ columns, and the matrix is then zero-centered. $x_{ij}$ represents the data in row $i$ and column $j$ of matrix $X$. In Formula 2, $c$ represents the covariance matrix of $X$. Formula 3 computes the eigenvalues and eigenvectors of the covariance matrix $c$: eig() is the function that returns them, $w$ denotes the eigenvectors, and $b$ denotes the corresponding eigenvalues. In Formula 4, the eigenvectors are arranged as rows of a matrix from top to bottom, ordered by their corresponding eigenvalues, and the first $k$ rows are taken to form the matrix $p$, where sort() is the sorting function and select() is the selection function. Formula 5 gives the data set $Y$ obtained after dimension reduction.
The first part is the feature fusion network, which is composed of two parallel convolutional layers. The first layer stacks two convolutions with a kernel size of 3: the first has a stride of 1 and the second has a stride of 2. The second layer consists of a convolutional layer and a pooling layer, where the convolutional layer has a kernel size of 3 and a stride of 1, and the pooling layer has a stride of 2. The padding size used in both layers is 1. To make full use of the features extracted by the convolution and pooling layers, the extracted features are fused, which improves feature extraction for small sample data. The whole calculation process of the feature fusion network is shown in Formulas 6 to 16. Formula 6 represents the padding operation, and Formula 7 gives the size of the output matrix of a convolution after the padding operation. Provided that the padding $n$ is equal to 1, a stride of 1 keeps the output size unchanged, and a stride of 2 halves the output size.
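The size behavior described above can be checked with the standard convolution output-size formula, floor((W + 2n - k) / s) + 1. The sketch below assumes that Formula 7 has this usual form; the input width of 32 in the usage note is hypothetical.

```python
def conv_output_size(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Output width (or height) of a convolution with symmetric zero padding."""
    return (size + 2 * padding - kernel) // stride + 1
```

With a kernel size of 3 and padding of 1, `conv_output_size(32, stride=1)` gives 32 and `conv_output_size(32, stride=2)` gives 16, matching the statement that a stride of 1 preserves the size and a stride of 2 halves it (for even input sizes).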

The structure of MFVT
$X_O$ represents the matrix data obtained after the original traffic data is processed by the CPR algorithm. Since the convolution operation changes the size of the input matrix, a padding operation is required to keep the size of the matrix unchanged. $X$ represents the matrix after the padding operation, and $X_{ij}$ represents a specific data value in the matrix. $W$ is the width of the matrix, and $H$ is the height. Formulas 6, 8, 9 and 10 represent the entire calculation process of the first layer in the feature fusion network. Formulas (3) and (5) represent the convolution operation, $V$ represents the convolution kernel matrix, $v_{ij}$ represents a specific value in the convolution kernel matrix, and $k$ represents the kernel size. $X^{1}_{1}$ represents the feature matrix obtained after the first convolution operation. Since the stride in Formula 7 is 1, the output size remains unchanged. $X^{2}$ represents the matrix obtained after the padding operation on $X^{1}_{1}$, and $X^{3}_{1}$ represents the feature matrix obtained after the second convolution; the output size is halved because the stride in Formula 7 is 2.
Formulas 6, 11 and 12 represent the entire computational process of the second layer in the feature fusion network, where $X^{1}_{2}$ denotes the feature matrix extracted by the convolution operation, whose stride of 1 does not change the output size, and $X^{2}_{2}$ denotes the feature matrix obtained after the max pooling operation, which halves the size of the output feature matrix.
Formula 13 shows the scale changes of the features extracted by the first and second layers of the feature fusion network. Formula 14 represents the fusion of the first-layer features with the second-layer features. Fusion here means concatenation along the channel dimension, so the channel counts add up; all other dimensions of the two feature maps must match. $C$ represents the number of channels: $C(1)$ means the number of channels is 1, $C(32)$ means the number of channels is 32, and so on. $X_f$ represents the features extracted by the feature fusion network.
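The fusion step can be illustrated with a small NumPy sketch. It is an illustration only: the 2x2 pooling window, the 32 channels per branch and the 32x32 input in the usage note are assumptions (the text gives only the pooling stride of 2), and the function names are invented here.

```python
import numpy as np


def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2 on a (C, H, W) array; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def fuse(branch1: np.ndarray, branch2: np.ndarray) -> np.ndarray:
    """Concatenate two (C, H, W) feature maps along the channel axis."""
    if branch1.shape[1:] != branch2.shape[1:]:
        raise ValueError("spatial sizes must match before fusion")
    return np.concatenate([branch1, branch2], axis=0)
```

For example, fusing a (32, 16, 16) output of the stride-2 convolution branch with a (32, 16, 16) output of the pooling branch yields a (64, 16, 16) feature map: the channels add up while height and width stay equal, which is why both branches must halve the input size.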
The second part is the Vision Transformer architecture. To combine the Vision Transformer architecture with the feature fusion network, the structure of the Vision Transformer is modified in this paper. The main components are feature embedding, a learnable embedding, and the Transformer encoder.
For feature embedding, the standard Transformer accepts a sequence of token embeddings as input. To process the feature $X_f$ learned by the feature fusion network, we reshape $X_f$ into a sequence of flattened 2D blocks $X_p$. Formula 15 shows this transformation.
For the learnable embedding, a learnable embedding $z_0^0 = x_{class}$ is prepended to the feature block embedding sequence. $x_{class}$ denotes the category vector, whose state $Z_L^0$ at the Transformer encoder output is used as the feature representation $y$, as shown in Formula 21. The learnable embedding is randomly initialized at the beginning of training and learned during training.
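The feature embedding and the learnable class embedding can be sketched with NumPy as below. This is a minimal sketch under assumptions: the projection matrix `E`, the function names and the toy shapes are hypothetical, and the sketch covers only the reshaping and the prepended class vector described above.

```python
import numpy as np


def to_patches(x_f: np.ndarray, patch: int) -> np.ndarray:
    """Split a (C, H, W) feature map into N flattened blocks of length patch*patch*C."""
    c, h, w = x_f.shape
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by the patch size")
    x = x_f.reshape(c, h // patch, patch, w // patch, patch)
    x = x.transpose(1, 3, 2, 4, 0)               # (H/P, W/P, P, P, C)
    return x.reshape(-1, patch * patch * c)      # N = H*W / P^2 rows


def embed(x_f: np.ndarray, patch: int, E: np.ndarray, x_class: np.ndarray) -> np.ndarray:
    """Project each block with E and prepend the class vector, giving the input Z_0."""
    tokens = to_patches(x_f, patch) @ E          # (N, dim)
    return np.vstack([x_class[None, :], tokens])  # (N + 1, dim)
```

In training, `E` and `x_class` would be learned parameters; here they are plain arrays so that the shapes can be inspected. Row 0 of the result is the class vector whose encoder output serves as the feature representation.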
The Transformer encoder consists of several blocks, each containing a Multi-Head Attention block and a Multi-Layer Perceptron (MLP) block, with normalization applied before each block and a residual connection applied after each block. As shown in Figure 4, the feature embedding block $X_P^1 E$ and the category vector $X_{class}$ form the embedding input vector $Z_0$. Formula 19 adopts a skip connection, where MSA represents the Multi-Head Attention operation, LN represents the normalization operation, $L$ represents the number of repeated blocks, and $Z'_l$ represents the $l$th output. Formula 20 also adopts a skip connection, where MLP represents the multi-layer perceptron block and $Z_l$ represents the $l$th output. $y$ represents the feature representation.
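Formulas 19 to 21 are not reproduced in this copy. For orientation, the standard Vision Transformer encoder equations that match the description above (normalization before each block, a skip connection after it, and the class-token output as the representation) are given below, writing the Multi-Head Attention operation as MSA. This is the standard formulation and is assumed, not confirmed, to coincide with the paper's own formulas.

```latex
\begin{aligned}
Z'_l &= \mathrm{MSA}\big(\mathrm{LN}(Z_{l-1})\big) + Z_{l-1}, & l &= 1, \dots, L \\
Z_l  &= \mathrm{MLP}\big(\mathrm{LN}(Z'_l)\big) + Z'_l,       & l &= 1, \dots, L \\
y    &= \mathrm{LN}\big(Z_L^0\big)
\end{aligned}
```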

Experiments and results analysis
This section first introduces the experimental environment, the datasets IDS 2017 and IDS 2012 used in the experiments, the evaluation criteria used in the experiments, and finally specifies the ablation experiments and some details of the experiments. In the ablation experiments, a series of advanced models were compared with the MFVT model.

The experimental environment of this paper
In this paper, ablation experiments were conducted on the MFVT model and the CPR data processing algorithm in the environment shown in Table 2.

Datasets
In this paper, a series of ablation experiments were designed using both the IDS 2012 and IDS 2017 datasets. The IDS 2012 dataset contains a week of network activity including both normal and malicious activity: three days consist entirely of normal traffic, and the remaining four days each consist of a large amount of normal traffic together with a specific type of attack traffic. The attack traffic in the IDS 2012 dataset includes internal penetration, HTTP denial of service, distributed denial of service using an IRC botnet, and brute force cracking of SSH [11].
The IDS 2017 data collection period lasts for five days from 9am on Monday, July 3, 2017 to 5pm on Friday, July 7, 2017, of which Mondays only include normal traffic. The attacks implemented included Brute Force FTP, Brute Force SSH, DoS, Heartbleed, Web Attack, Infiltration, Botnet, and DDoS [10]. Figure 5(a) is a bar chart of the amount of various attack traffics contained in the IDS 2017 dataset, and Figure 5

Evaluation Metrics
Authoritative evaluation metrics must be used to judge the merits of a network anomaly traffic detection method. The effectiveness of a machine-learning-based network anomaly traffic detection algorithm can be evaluated by the metrics shown in Formulas 22 to 25. $TP$ is the number of positive samples predicted to be positive by the model (true positives). $TN$ is the number of negative samples predicted to be negative by the model (true negatives). $FP$ is the number of negative samples predicted to be positive by the model (false alarms). $FN$ is the number of positive samples predicted to be negative by the model (missed detections) [27].
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{23}$$
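The metrics reported in the experiments (accuracy, precision, recall, F1-score and FPR) can be computed from the four counts as sketched below. Only the precision formula survives in this copy; the others are written in their standard forms, which is an assumption about the paper's Formulas 22, 24 and 25. The function name is invented for this example.

```python
def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # avoid division by zero on empty classes

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, fp + tn),  # false positive rate (false alarms)
    }
```

For a multi-class dataset, these values would be computed per attack class by treating that class as positive and all others as negative.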

Ablation experiment and results analysis
In this paper, the IDS 2012 and IDS 2017 datasets were used for ablation experiments. In addition, an exploratory study was carried out on the IDS 2012 dataset into the impact of model optimization methods on the detection performance of the MFVT model. In the MFVT model, the size of the kernels used in the convolutional neural network is 3 * 3, the segmentation (patch) size set in the Vision Transformer architecture is 11 * 11, the number of heads in the Multi-Head Attention is 12, and the number of blocks in the encoder is 12. During training, the batch size is 256, the number of training epochs is set to 100, and the stochastic gradient descent (SGD) optimizer is used to accelerate network convergence. The momentum is fixed at 0.9, the learning rate is fixed at 3e-2, the weight decay is set to 0, and the loss function is CrossEntropyLoss. All ablation experiments and results are described in detail below.
Figure 6 shows how the training loss, validation loss and validation accuracy of the MFVT model change during training on the IDS 2012 dataset. As shown in the figure, the MFVT model converges quickly, but there are large fluctuations in the later stages of training. Table 3 shows the experimental results on the IDS 2012 dataset. The table shows that the MFVT model combined with the CPR algorithm proposed in this paper is superior to the other methods on all evaluation metrics, reaching the state-of-the-art level. The table also shows that the MFVT model alone performs well: its detection accuracy is only slightly lower than that of DT (Decision Tree), while its precision is higher. To better demonstrate the ability of the MFVT model to deal with imbalanced data, the results of all evaluation metrics of the MFVT model for each type of attack traffic are shown in Table 4.

Ablation experiment based on IDS 2012
Combining Figure 5(b) and Table 4 (the experimental results of Infiltrating and Distributedenial, which account for a relatively large proportion, have been marked in red), it can be concluded that the traffic of HTTP and rutesh, which account for a relatively small proportion, still obtains good

Ablation experiment based on IDS 2017
The IDS 2012 dataset contains relatively few types of attack traffic, so results on this dataset alone cannot demonstrate that the MFVT model and the CPR data processing algorithm generalize. Ablation experiments were therefore also performed on the more complex IDS 2017 dataset. Figures 7 and 8 show the results of the ablation experiment. Figure 7 shows that the precision, recall, F1-score and accuracy of the MFVT model, and of the MFVT model combined with the CPR algorithm, all reach nearly 100%, which is significantly better than the comparison models. Figure 8 compares the FPR of the MFVT model with that of the comparison models, and the MFVT model is again the best. Combined with Figure 5(a) and Table 5 (the experimental results of DDos, Hulk and Portscan, which account for a large proportion of attacks, have been marked in red), it can be concluded that the MFVT model combined with the CPR algorithm has a better ability to recognize small samples. To further show the prediction errors of the MFVT model combined with CPR, the experimental results are presented as the heat map in Figure 9. The heat map shows that the MFVT model combined with CPR performs very well and that its prediction error rate is extremely low.
To verify that the MFVT model can reduce the sample resources required for training, we tested it on the IDS 2017 dataset, reducing the training set data volume according to Formula 26 with all other conditions held constant. Here $data_0$ is the initially assigned training set data volume, $data_n$ is the updated data volume, and $n$ is taken according to Formula 27, where $n_0 = 0.9$ is the initial value of $n$ and $N$ takes values in the range 1 to 7. Table 6 shows the test results. As can be seen from Figure 10, when the training set is reduced to 80% of its original size, the impact on the overall accuracy on the test set is very small. This experiment shows that the MFVT model combined with the CPR algorithm can effectively reduce the training resources while largely maintaining the accuracy on the test set.

Optimization of MFVT model
At the end of this section, we try to further improve the detection accuracy and stability of the model by increasing the number of training epochs and continuously adjusting the learning rate (lr) during training. IDS 2012 is used as the ablation dataset because it takes less time to train on than IDS 2017. Two groups of experiments were conducted. In the first group, our model was trained for 1,000 iterations and the results were recorded every 100 iterations.
In the second group, building on the first group, lr is changed every 100 iterations according to Formula 28, where $lr_i$ is the learning rate after each change, $lr_0$ is the initial learning rate, and the epoch counts blocks of one hundred iterations. To ensure rigor, the reported values were obtained after running both groups of experiments several times. Figure 11 shows that both increasing the training epochs and changing lr yield better prediction accuracy in some intermediate results, but the experimental results tend to stabilize in the end. In comparison, varying lr makes the experimental results fluctuate less.

Results and Discussion
Since most deep learning models need a lot of training resources, a network anomaly traffic detection model (MFVT) combining a feature fusion network with the Vision Transformer architecture was proposed. MFVT can reduce training resources while maintaining high detection accuracy. In this paper, a new raw traffic data extraction algorithm (CPR) was also proposed. The MFVT model combined with the CPR algorithm achieved nearly 100% detection accuracy on both the IDS 2012 and IDS 2017 datasets, with much better performance than the other methods in the comparison experiments. The MFVT model combined with the CPR algorithm is more capable of handling imbalanced data sets and further improves detection accuracy. Although the MFVT model combined with the CPR algorithm performs very well in anomaly traffic detection, the scalability of the model is weak, and its detection accuracy on new types of attack traffic that do not appear in the training set needs to be improved, given the increasingly complex network environment and the emergence of new attack types.
Considering the importance and practical significance of scalability, the scalability of the MFVT model will be further improved in future work to enhance the practical value of the model.

Figure 7: Partial experimental results. Blue represents precision, purple represents recall, green represents F1-score, and red represents accuracy.
Figure 8: The result of FPR.
Figure 9: Heat map of prediction results of the MFVT model combined with the CPR algorithm.
Figure 10: Experimental results for different training set data amounts.