Network traffic modeling significantly affects many aspects of networking, including network resource allocation, quality of service provisioning, network traffic management, congestion control, and bandwidth efficiency. These are also very important issues in network protocol design. In this paper, we attempt a comprehensive comparison of two modeling approaches, the adaptive neuro fuzzy inference system (ANFIS) and the autoregressive integrated moving average (ARIMA), for modeling wireless network traffic in terms of a typical statistical indicator and computational complexity. ARIMA has been widely used in this area for many years. ANFIS, on the other hand, is comparatively new, and to the best of our knowledge no network traffic modeling using ANFIS was attempted until recently. Moreover, we could not find a detailed comparative performance evaluation of ANFIS against other traffic modeling approaches in the existing literature. ANFIS reportedly provides good prediction precision in terms of statistical indicators and also describes network conditions at different times effectively. However, the computational complexity of ANFIS for traffic modeling is a major concern and deserves closer inspection. For our wireless network traffic, we find that the ANFIS model performs better than the best ARIMA model in all three scenarios considered.
Network traffic modeling plays an important role in many areas of computer networks, including but not limited to network traffic management, quality of service (QoS) provisioning, network protocol design, and bandwidth allocation. This has generated great interest among researchers in accurate modeling of network traffic. Early attempts concentrated mainly on Poisson modeling, which did not compare well with the actual observations made at that time . Then, the groundbreaking work of Leland et al.  showed that network traffic exhibits self-similarity, so its nature is entirely different from what Poisson models assume, which explains their inaccurate results. This seminal paper also laid the foundation for subsequent network traffic modeling attempts.
Many different modeling approaches have since been tried to capture this self-similar nature of network traffic accurately. One important category is statistical (or regressive) models, which include autoregressive (AR), autoregressive moving average (ARMA), generalized autoregressive moving average (GARMA), autoregressive integrated moving average (ARIMA), and fractional autoregressive integrated moving average (FARIMA) models . Another approach uses fractional Gaussian noise (fGn) and fractional Brownian motion (fBm), which generally give better accuracy than regressive models for long-range dependent data . Artificial neural network and fuzzy logic-based methods have also attracted significant attention [5–8]. Some authors have used modeling approaches based on least mean kurtosis  and chaos theory .
ARIMA is a widely used statistical model for time series analysis and has also been applied successfully to network traffic modeling [11, 12]. The adaptive neuro fuzzy inference system (ANFIS) model  has been applied to forecast Internet traffic time series in . Although other soft computing approaches had been tried earlier, to our knowledge ANFIS was not attempted prior to . ANFIS combines fuzzy logic and neural network approaches and inherently carries the advantages of both, which makes it an attractive option for this purpose. However, one major drawback of ANFIS is that it is computationally expensive and complex. This paper compares the ARIMA and ANFIS modeling approaches under different scenarios in order to assess their relative suitability for computer network traffic modeling.
The paper is organized as follows. Section 2 briefly reviews related work in this area. Section 3 describes ARIMA and ANFIS. Section 4 describes network traffic data collection and data pre-processing. Section 5 presents and discusses the modeling results. Section 6 contains conclusions.
2 Related work
ARIMA is discussed in [11, 12], with emphasis on its use in modeling and prediction of network traffic. The authors of  discuss ARIMA modeling of traffic in an institutional wireless network. A good discussion of the application of ANFIS to forecasting Internet traffic time series can be found in . ANFIS is compared with ARIMA in  for forecasting WiMAX traffic time series. The authors of  argue that ARIMA is better than ANFIS because, in their comparison, ARIMA showed lower root mean square error (RMSE) and processing time. However, they gave no proper reason for choosing the particular ARIMA model used in the comparison. Weather forecasting is another prominent area in which ARIMA and ANFIS have been compared, with contrasting results: some authors found ARIMA preferable to ANFIS , while others recommend ANFIS as the better approach . We note that these are very specific applications in which the datasets vary drastically from case to case, leading to different results. In , the two approaches are compared for forecasting electrical energy consumption, and the authors conclude that ANFIS is more appropriate than ARIMA.
3 Modeling approaches: ARIMA and ANFIS
We now describe the basic framework of modeling approaches of ARIMA and ANFIS.
3.1 Autoregressive integrated moving average
In the framework of regression models, the present output is computed as a linear combination of a pre-specified number of past outputs and a moving average of random white Gaussian noise .
Let $\Omega$ denote the lag operator, so that $\Omega X(t) = X(t-1)$ and, in general, $\Omega^{\tau}X(t) = X(t-\tau)$. Let $\Delta$ denote the difference operator, so that $\Delta X(t) = X(t) - X(t-1)$; it follows that $\Delta^{\tau}X(t) = (1-\Omega)^{\tau}X(t)$. We also define two polynomials in $\Omega$,

$$\phi(\Omega) = 1 - \phi_1\Omega - \phi_2\Omega^2 - \cdots - \phi_m\Omega^m, \qquad \theta(\Omega) = 1 - \theta_1\Omega - \theta_2\Omega^2 - \cdots - \theta_n\Omega^n,$$

where $\phi_1, \phi_2, \ldots, \phi_m$ and $\theta_1, \theta_2, \ldots, \theta_n$ are the coefficients of the two polynomials, and $m$ and $n$ are their respective degrees.
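The identity $\Delta^{\tau}X(t) = (1-\Omega)^{\tau}X(t)$ can be checked numerically. The following Python sketch (the function names `diff` and `diff_binomial` are illustrative, not from any library) applies $\Delta^{\tau}$ in two ways, by differencing $\tau$ times and by expanding $(1-\Omega)^{\tau}$ with binomial coefficients, and shows that both give the same series.

```python
from math import comb


def diff(x, tau=1):
    """Apply Delta^tau by differencing tau times; the result is tau samples shorter."""
    for _ in range(tau):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x


def diff_binomial(x, tau):
    """Apply (1 - Omega)^tau = sum_k C(tau, k) (-1)^k Omega^k, with Omega^k X(t) = X(t - k)."""
    return [
        sum(comb(tau, k) * (-1) ** k * x[t - k] for k in range(tau + 1))
        for t in range(tau, len(x))
    ]


series = [3, 5, 4, 8, 10, 9]
print(diff(series, 2))           # [-3, 5, -2, -3]
print(diff_binomial(series, 2))  # [-3, 5, -2, -3]
```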
Given these notations, the definition of regressive models follows next.
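An autoregressive model of order m, generally denoted by AR (m)  has the form

$$\phi(\Omega)X(t) = \varepsilon(t), \quad \text{i.e.,} \quad X(t) = \phi_1 X(t-1) + \cdots + \phi_m X(t-m) + \varepsilon(t),$$

where $\varepsilon(t)$ is random white Gaussian noise.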
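To make the AR(m) form concrete, the sketch below estimates $\phi_1, \ldots, \phi_m$ from data by conditional least squares, i.e., by regressing $X(t)$ on its $m$ lagged values. This estimator is chosen here for brevity and is not necessarily the one SPSS uses in Section 5; the simulated series and the value $\phi_1 = 0.7$ are hypothetical.

```python
import random


def solve(A, b):
    """Solve the linear system A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def fit_ar(x, m):
    """Estimate phi_1..phi_m of X(t) = phi_1 X(t-1) + ... + phi_m X(t-m) + eps(t)."""
    rows = [[x[t - k] for k in range(1, m + 1)] for t in range(m, len(x))]
    y = [x[t] for t in range(m, len(x))]
    # Normal equations (R^T R) phi = R^T y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    b = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(m)]
    return solve(A, b)


random.seed(0)
x = [0.0]
for _ in range(5000):
    x.append(0.7 * x[-1] + random.gauss(0.0, 1.0))  # simulated AR(1), phi_1 = 0.7
print(fit_ar(x, 1))  # an estimate close to 0.7
```

Like the definition above, this sketch has no constant term; for a normalized traffic series with a nonzero mean, one would typically subtract the sample mean first.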
An autoregressive moving average model of order (m, n), generally denoted by ARMA (m, n)  has the form

$$\phi(\Omega)X(t) = \theta(\Omega)\varepsilon(t).$$
An autoregressive integrated moving average model of order (m, τ, n), generally denoted by ARIMA (m, τ, n)  has the form

$$\phi(\Omega)\Delta^{\tau}X(t) = \phi(\Omega)(1-\Omega)^{\tau}X(t) = \theta(\Omega)\varepsilon(t).$$
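The ARIMA form can be read as "difference $\tau$ times, then apply an ARMA model." The sketch below illustrates this for the special case $n = 0$, i.e., ARIMA(m, τ, 0), with known (hypothetical) coefficients: it differences the series, forms the AR one-step forecast of the next differenced value with the noise term set to its zero mean, and then inverts $(1-\Omega)^{\tau}$ to return to the original scale. Coefficient estimation and moving average terms are omitted for brevity.

```python
from math import comb


def arima_forecast(x, phi, tau):
    """One-step forecast of ARIMA(m, tau, 0) with known AR coefficients phi (length m)."""
    # Step 1: w(t) = Delta^tau X(t).
    w = x
    for _ in range(tau):
        w = [w[t] - w[t - 1] for t in range(1, len(w))]
    # Step 2: AR forecast of w(t+1); eps(t+1) is replaced by its mean, 0.
    w_next = sum(phi[k] * w[-1 - k] for k in range(len(phi)))
    # Step 3: (1 - Omega)^tau X(t+1) = w(t+1)
    #   =>  X(t+1) = w(t+1) - sum_{k=1..tau} C(tau, k) (-1)^k X(t+1-k).
    return w_next - sum(comb(tau, k) * (-1) ** k * x[-k] for k in range(1, tau + 1))


print(arima_forecast([1, 2, 4, 7], [0.5], 1))  # 8.5
```

With $\tau = 1$ the differenced series is [1, 2, 3], its AR(1) forecast is 1.5, and adding back the last observation 7 gives 8.5.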
ARIMA is thus the most general of the three regressive models above: ARMA (m, n) is ARIMA (m, 0, n), and AR (m) is ARIMA (m, 0, 0). Although more general regressive models exist, ARIMA is the focus of our study in this paper.
3.2 Adaptive neuro fuzzy inference system
A fuzzy inference system (FIS) is a computational framework based on the concepts of fuzzy set theory, fuzzy if-then rules, and fuzzy reasoning [13, 19]. As shown in Figure 1, a FIS has three main conceptual components, viz., a rule base, a database, and a reasoning mechanism. The rule base is a collection of fuzzy if-then rules that determine the system's behavior and response under different possible situations. The database contains information about the membership functions (MFs), namely their type and shape. Finally, the reasoning (decision-making) mechanism infers and derives the output of the system. A FIS may need a fuzzification interface to convert crisp input values into fuzzy values suitable for processing; when the inputs are already fuzzy, this may not be required. Similarly, a defuzzification interface is used at the output side, because almost all real-world applications need a crisp output value.
Implementing the FIS framework discussed above as an adaptive neural network yields ANFIS. For illustration, Figure 2 shows a first-order Sugeno-type FIS (see ) and Figure 3 shows the equivalent ANFIS architecture. For two inputs x and y, a first-order Sugeno fuzzy model commonly uses the following rule set:
Rule 1. If x is P1 and y is Q1, then d1 = a1x + b1y + c1.
Rule 2. If x is P2 and y is Q2, then d2 = a2x + b2y + c2.
In the ANFIS architecture shown in Figure 3, all nodes in the same layer perform the same type of function. We denote the output of the i-th node in layer l by $O_{l,i}$.
In layer 1, each input is assigned a linguistic label through its membership grade, which is defined by a suitable membership function with appropriate parameters:

$$O_{1,i} = \mu_{P_i}(x), \quad i = 1, 2; \qquad O_{1,i} = \mu_{Q_{i-2}}(y), \quad i = 3, 4.$$

The parameters of these membership functions are called premise (or nonlinear) parameters.
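As an example of a layer-1 membership function, the sketch below uses the generalized bell function $\mu(x) = 1/(1 + |(x-c)/a|^{2b})$, a common ANFIS choice whose parameters a, b, c are premise parameters. The MF type is not fixed by the description above (Tables 2 to 4 give the actual specifications), so the bell shape and the parameter values here are assumptions for illustration.

```python
def gbell(x, a, b, c):
    """Generalized bell MF: 1 / (1 + |(x - c) / a|^(2b)).
    a sets the width, b the steepness of the flanks, c the centre."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


# Hypothetical "low" and "high" MFs on a normalized input in [0, 1].
low, high = (0.5, 2.0, 0.0), (0.5, 2.0, 1.0)
x = 0.25
print(gbell(x, *low), gbell(x, *high))  # about 0.941 and 0.165
```

The grade is 1 at the centre c and exactly 0.5 at c ± a, which is why a is often read as the half-width of the MF.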
In layer 2, a suitable T-norm operator (most commonly multiplication) performs the fuzzy AND of the incoming membership grades to give the output

$$O_{2,i} = w_i = \mu_{P_i}(x)\,\mu_{Q_i}(y), \quad i = 1, 2.$$
The output of this layer is often called the firing strength of the corresponding rule.
Layer 3 computes the ratio of each rule's firing strength to the sum of the firing strengths of all rules. This operation is called normalization of firing strengths:

$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2.$$
The output of layer 4 is given by

$$O_{4,i} = \bar{w}_i d_i = \bar{w}_i\,(a_i x + b_i y + c_i), \quad i = 1, 2.$$
Here, $a_i$, $b_i$, and $c_i$ (i = 1, 2) are called the consequent (or linear) parameters of ANFIS. The total number of parameters of ANFIS is the sum of the premise and consequent parameters.
Finally, layer 5 sums all incoming signals to produce the overall output of ANFIS:

$$O_{5,1} = \sum_{i=1}^{2} \bar{w}_i d_i = \frac{\sum_{i=1}^{2} w_i d_i}{\sum_{i=1}^{2} w_i}.$$
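The five layers can be traced in a short forward pass. This Python sketch implements the two-input, two-rule first-order Sugeno ANFIS of Figure 3, assuming generalized bell MFs and the product T-norm; the dictionary layout and all parameter values are hypothetical, and training (tuning premise and consequent parameters) is not shown.

```python
def gbell(x, a, b, c):
    """Generalized bell membership function with premise parameters a, b, c."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-input, two-rule first-order Sugeno ANFIS.
    premise:    {"P": [(a, b, c), (a, b, c)], "Q": [(a, b, c), (a, b, c)]}
    consequent: [(a1, b1, c1), (a2, b2, c2)]"""
    # Layer 1: membership grades mu_{P_i}(x) and mu_{Q_i}(y).
    mu_p = [gbell(x, *p) for p in premise["P"]]
    mu_q = [gbell(y, *q) for q in premise["Q"]]
    # Layer 2: firing strengths w_i using the product T-norm.
    w = [mu_p[i] * mu_q[i] for i in range(2)]
    # Layer 3: normalized firing strengths w_bar_i.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: w_bar_i * d_i with d_i = a_i x + b_i y + c_i.
    d = [a * x + b * y + c for a, b, c in consequent]
    layer4 = [wb * di for wb, di in zip(w_bar, d)]
    # Layer 5: overall output.
    return sum(layer4)


mfs = [(0.5, 2.0, 0.0), (0.5, 2.0, 1.0)]  # hypothetical "low"/"high" MFs
rules = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]  # hypothetical consequent parameters
print(anfis_forward(0.3, 0.6, {"P": mfs, "Q": mfs}, rules))
```

Because the normalized firing strengths sum to 1, the output is a weighted average of the rule outputs $d_1$ and $d_2$; this is also why the consequent parameters enter the output linearly.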
It must be noted that the structure of ANFIS explained above is not unique, and, in fact, arbitrary but meaningful assignment of node functions and configurations is possible.
Figure 4 shows a Sugeno-type FIS as implemented in the MATLAB Fuzzy Logic Toolbox.
4 Network traffic collection and data pre-processing
4.1 Network traffic data collection
Having briefly introduced the ARIMA and ANFIS modeling approaches, we now apply them to real-world network traffic data.
The first requirement is a real network traffic trace. Wireshark  is one of the most popular and sophisticated network traffic monitoring tools in the networking research community. It captures data packets from the network and reports important information such as packet size, packet transfer rate, and packet capture time, as well as the packet contents. We collected packet statistics from an institutional wireless network, discarding the user data for this study. Matshark  was used to extract the data from Wireshark and export it into the MATLAB workspace.
4.2 Data pre-processing
The collected network traffic data were not on a uniform time scale, whereas a time series requires samples at uniform intervals. We therefore extracted data samples from the traffic trace at intervals of 0.1 s in MATLAB, resulting in a time series.
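How each 0.1 s sample was derived from the packets is not stated above. A common choice, assumed here, is to sum the sizes of all packets captured within each 0.1 s interval, giving a bytes-per-interval series in which empty intervals count as zero. The sketch converts timestamps to integer microseconds so that floating-point rounding cannot place a packet in the wrong bin; the function name and its inputs (capture times in seconds, packet sizes in bytes) are hypothetical.

```python
def bin_trace(timestamps, sizes, dt=0.1):
    """Aggregate a nonuniformly timed packet trace into a uniform series.
    ASSUMPTION: each sample is the total bytes captured in one dt-wide interval,
    counted from the earliest capture time."""
    t_us = [round(t * 1_000_000) for t in timestamps]  # integer microseconds
    dt_us = round(dt * 1_000_000)
    t0 = min(t_us)
    series = [0] * ((max(t_us) - t0) // dt_us + 1)
    for t, size in zip(t_us, sizes):
        series[(t - t0) // dt_us] += size
    return series


times = [0.00, 0.04, 0.12, 0.30, 0.31]  # hypothetical capture times (s)
sizes = [60, 1500, 40, 1500, 52]        # hypothetical packet sizes (bytes)
print(bin_trace(times, sizes))          # [1560, 40, 0, 1552]
```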
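The data thus obtained were normalized using the operation

$$x_{\text{norm}}(t) = \frac{x(t) - x_{\min}}{x_{\max} - x_{\min}},$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the series. This ensures that all samples lie within the range [0, 1], so that the RMSE values obtained with the different models can be compared directly.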
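A minimal min-max normalization matching this operation might look as follows (the function name is hypothetical). If a model were later evaluated on held-out data, the minimum and maximum taken from the training portion would normally be reused for that data.

```python
def min_max_normalize(x):
    """Map a series linearly onto [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in x]


print(min_max_normalize([1560, 40, 0, 1552]))  # smallest value -> 0.0, largest -> 1.0
```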
RMSE was used as the statistical indicator for assessing the goodness of the different models; the lower the RMSE value, the better the model. Mathematically, RMSE is given by

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{m=1}^{N}\left(y_m - \hat{y}_m\right)^2},$$

where $y_m$ and $\hat{y}_m$ denote the m-th actual and model-trained data samples, respectively, and N is the sample size.
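The RMSE definition translates directly into code; the helper below is a plain-Python sketch with a hypothetical name.

```python
from math import sqrt


def rmse(actual, fitted):
    """Root mean square error between actual samples y_m and model outputs y_hat_m."""
    if len(actual) != len(fitted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sqrt(sum((y - yh) ** 2 for y, yh in zip(actual, fitted)) / len(actual))


print(rmse([0.2, 0.5, 0.9], [0.1, 0.5, 1.0]))  # about 0.0816
```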
5 Results and discussion
We considered three different cases to evaluate and compare the performances of ARIMA and ANFIS. In the first case, N = 500 samples of the time series data were used for modeling. In the second and third cases, N = 1,000 and N = 1,500 samples were used respectively. Below we describe the individual cases using each approach.
5.1 ARIMA approach
IBM SPSS Statistics 17.0  was used for ARIMA modeling. SPSS is a comprehensive and versatile tool for statistical analysis, and it provides a best-fit ARIMA model for user-supplied data. The institutional network traffic data, after the necessary pre-processing, were loaded into the SPSS workspace. The best-fit models were ARIMA (1,0,0) with an RMSE of 0.085 for case 1, ARIMA (1,0,0) with an RMSE of 0.089 for case 2, and ARIMA (0,0,6) with an RMSE of 0.083 for case 3. Table 1 presents the statistical description of the data in the three cases along with the best-fit model in each case.
5.2 ANFIS approach
MATLAB Fuzzy Logic Toolbox  was used for ANFIS modeling. The following discussion details the results obtained under each case.
An RMSE of 0.080 was obtained with the ANFIS model in case 1. Four past values of the data were used as inputs and the present value as the output, with two membership functions per input (2-2-2-2 architecture). Figure 5 shows the variation of error with increasing epoch number: the error keeps decreasing until about 140 epochs and then becomes almost constant. Table 2 presents the ANFIS specification for this case.
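The 2-2-2-2 setup corresponds to a training set in which each row holds four consecutive past samples followed by the present sample. The sketch below builds such rows from a time series. Its column layout (inputs first, output in the last column) matches what MATLAB's `anfis` function expects for training data, but the helper itself is hypothetical, since the data-preparation code is not given here.

```python
def lagged_rows(series, n_lags):
    """Rows [X(t - n_lags), ..., X(t - 1), X(t)]: n_lags inputs, then the target."""
    return [series[t - n_lags:t] + [series[t]] for t in range(n_lags, len(series))]


rows = lagged_rows([0.1, 0.4, 0.3, 0.8, 0.5, 0.6], n_lags=4)
print(rows)  # [[0.1, 0.4, 0.3, 0.8, 0.5], [0.4, 0.3, 0.8, 0.5, 0.6]]
```

Each row consumes n_lags warm-up samples, so with N = 500 samples and four lags this construction would yield 496 training rows.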
An RMSE of 0.087 was obtained with the ANFIS model in case 2. Six past values of the data were used as inputs and the present value as the output, with two membership functions per input (2-2-2-2-2-2 architecture). Figure 6 shows the variation of error with increasing epoch number; the error remains almost unchanged beyond seven epochs. Table 3 presents the ANFIS specification for this case.
Note that the error along the Y-axis in Figure 6 stays at about 0.0871 (shown to four decimal places only); that is, the error does not decrease appreciably with increasing epochs.
An RMSE of 0.081 was obtained with the ANFIS model in case 3. Six past values of the data were used as inputs and the present value as the output; two inputs had three membership functions each and the remaining four had two (2-2-3-2-3-2 architecture). Figure 7 shows the variation of error with increasing epoch number. Unlike in the first two cases, the error curve is smooth, and, as in case 2, the error does not decrease appreciably with increasing epochs. Table 4 presents the ANFIS specification for this case.
Noting that the number of parameters of ARIMA (m, τ, n) is m + n + 1, we summarize our results in Table 5.
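To see how quickly ANFIS parameter counts grow relative to ARIMA, the sketch below counts parameters for a grid-partitioned first-order Sugeno ANFIS: each MF contributes its premise parameters, and each rule (one per combination of input MFs) contributes k + 1 linear consequent parameters for k inputs. It assumes three parameters per MF, as for generalized bell MFs; since the actual MF types are listed in Tables 2 to 4, these ANFIS totals are illustrative and may differ from those in Table 5. The ARIMA count uses the m + n + 1 rule stated above.

```python
from math import prod


def arima_param_count(m, tau, n):
    """Parameters of ARIMA(m, tau, n) by the m + n + 1 rule; tau adds none."""
    return m + n + 1


def anfis_param_count(mfs_per_input, params_per_mf=3):
    """Parameters of a grid-partitioned first-order Sugeno ANFIS.
    ASSUMPTION: every MF has params_per_mf premise parameters (3 for a bell MF)."""
    k = len(mfs_per_input)
    premise = params_per_mf * sum(mfs_per_input)
    rules = prod(mfs_per_input)   # one rule per combination of input MFs
    consequent = rules * (k + 1)  # k input coefficients plus a constant per rule
    return premise + consequent


cases = {
    "case 1": ([2, 2, 2, 2], (1, 0, 0)),
    "case 2": ([2, 2, 2, 2, 2, 2], (1, 0, 0)),
    "case 3": ([2, 2, 3, 2, 3, 2], (0, 0, 6)),
}
for name, (mfs, order) in cases.items():
    print(f"{name}: ANFIS {anfis_param_count(mfs)}, ARIMA {arima_param_count(*order)}")
# case 1: ANFIS 104, ARIMA 2
# case 2: ANFIS 484, ARIMA 2
# case 3: ANFIS 1050, ARIMA 7
```

Under this assumption the rule count, and with it the consequent count, multiplies with every added input or MF, while the ARIMA count grows only additively with the model orders.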
From Table 5, we see that the ANFIS model yields a lower RMSE than ARIMA in all three cases considered here. At the same time, ANFIS has many more parameters than ARIMA in each case. Since computational cost generally grows with the number of parameters to be estimated, ANFIS is computationally more expensive and complex than ARIMA. The gap in the number of parameters widens further as the number of ANFIS inputs or the number of MFs per input increases; when the rule base is formed by grid partitioning, there is one rule for every combination of input MFs.
It is therefore clear from the above results that although ANFIS performs better than ARIMA, this comes at the cost of computational complexity, which must be taken into consideration when ANFIS is used for network traffic modeling.
6 Conclusions
Network traffic modeling demands algorithms capable of dealing with the self-similar behavior of traffic data, where conventional methods and assumptions fall short in accuracy. Two modeling approaches, autoregressive integrated moving average (ARIMA) and adaptive neuro fuzzy inference system (ANFIS), were applied to model institutional wireless network traffic data. Three cases were considered, with 500, 1,000, and 1,500 samples, respectively. ANFIS performs better than ARIMA in all three cases; however, this accuracy is achieved at the expense of computational complexity. We therefore recommend the ANFIS approach only in cases where the additional computation can be afforded.
In future work, the coactive neuro fuzzy inference system (CANFIS)  could be used for network traffic modeling. Since CANFIS is a generalized form of ANFIS that avoids some constraints inherent in the original ANFIS, we expect it to give even better modeling results.
Paxson V, Floyd S: Wide-area traffic: the failure of Poisson modeling. IEEE/ACM Trans Netw 1995, 3: 226-244. 10.1109/90.392383
Yadav RK: Modeling of self-similar traffic in wireless networks. In IEEE International Conference on High Performance Computing (HiPC), Proceedings of Workshop on Next Generation Wireless Networks (WoNGeN). Bangalore; 2011.
Piedra N, Chicaiza J, López J, García J: Study of the application of neural networks in internet traffic engineering. In Information Science and Computing. Institute of Information Theories and Applications, FOI ITHEA; 2008:3-47. ISSN: 1313–0455
Zhao H, Ansari N, Shi YQ: Self-similar traffic prediction using least mean kurtosis. In Proceedings of International Conference on Information Technology: Coding and Computing Computers and Communications ITCC 2003. Las Vegas; 2003:352-355.
Li D, Ji B, Xiang H: The on-line prediction of self-similar traffic based on chaos theory. In International Conference on Wireless Communications, Networking and Mobile Computing, 2006. WiCOM 2006. Wuhan: Wuhan University; 2006:1-4.
Rahman M, Islam AHMS, Nadvi SYM, Rahman RM: Comparative study of ANFIS and ARIMA model for weather forecasting in Dhaka. In International Conference on Informatics, Electronics and Vision (ICIEV). Dhaka; 2013:1-6.
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Yadav, R.K., Balakrishnan, M. Comparative evaluation of ARIMA and ANFIS for modeling of wireless network traffic time series.
J Wireless Com Network 2014, 15 (2014). https://doi.org/10.1186/1687-1499-2014-15