Forecasting WiMAX traffic by data mining methodology
Cristina Stolojescu-Crisan^{1} and
Alexandru Isar^{1}
https://doi.org/10.1186/1687-1499-2013-280
© Stolojescu-Crisan and Isar; licensee Springer. 2013
Received: 29 July 2013
Accepted: 23 November 2013
Published: 6 December 2013
Abstract
One of the most important objectives of wireless network service providers is to make the traffic as uniform as possible in different sectors of the network. In this paper, we analyze the uniformity of the traffic in a WiMAX network with the aid of a forecasting methodology. Taking into account the high volume of data transferred in a wireless network and the requirement of real time, we propose a forecasting methodology based on data mining. The theoretical basis of the proposed method is explained in detail. Its implementation is highlighted by diagrams, which explain each step of the algorithm. The method is applied on real data and the obtained results are discussed.
Keywords
Forecasting; Data mining; Traffic; Wavelets; WiMAX
1 Introduction
Worldwide Interoperability for Microwave Access (WiMAX) technology is a modern solution for wireless networks. One of the most difficult problems that appear in the exploitation of a WiMAX network is the non-uniformity of the traffic developed by different base stations. This behavior is induced by the ad hoc nature of wireless networks and concerns the service providers who administrate the network. The amount of traffic through a base station (BS) should not be higher than the capacity of that BS; if the amount of traffic approaches the capacity of the BS, then it saturates. Due to the traffic non-uniformity, different BS will saturate at different future moments. These moments can be predicted using traffic forecasting methodologies.
The methodology relies on the use of ARIMA models in the wavelet domain to estimate the overall tendency and the variability of the time series belonging to a wireline network traffic database. The first block in Figure 1 implements a multiresolution analysis (MRA), using the stationary wavelet transform (SWT), providing two types of wavelet coefficients: approximation coefficients, used for modeling the overall tendency of the traffic, and detail coefficients, used for modeling the variability around this overall tendency. The second block in Figure 1 implements an analysis of variance (ANOVA) procedure, which validates the MRA previously implemented. The third block in Figure 1 establishes the two statistical models, for the overall tendency and for the variability of the traffic, using ARIMA modeling and the Box-Jenkins methodology. Finally, the last block in Figure 1 estimates the risk of saturation of the considered server.
The second goal of this paper is of practical nature. We present the proposed forecasting algorithm, highlighting the differences between the wireline and wireless traffic and the results obtained for a WiMAX database.
The structure of the paper is as follows. In Section 2, the phases of design are described. Section 3 is dedicated to the implementation of the proposed forecasting system, each subsystem functionality being exemplified by figures. The results obtained applying the proposed forecasting methodology are presented in Section 4. Finally, Section 5 concludes the paper.
2 The WiMAX traffic forecasting method
In the following, details about the phases of the proposed algorithm in Figure 2 are presented.
2.1 Business understanding
The first phase of a data mining project involves understanding the objectives and the requirements of the project, defining the problem, and designing a preliminary plan to achieve these objectives. The objective of the proposed algorithm is to predict when upgrades of a given BS must take place. We compute an aggregate demand for each BS and look at its evolution at time scales larger than 1 h. The requirement of the project is to perform this prediction quickly and precisely. We have chosen the forecasting methodology proposed in [2], and our preliminary plan was to adapt this methodology to the case of a WiMAX network.
2.2 Data understanding
The data understanding phase involves collecting the initial data, then describing and exploring them. In our case, the data was obtained by monitoring the traffic of the 67 BS composing a WiMAX network. The duration of collection was 8 weeks. Our database is formed of numerical values representing the total number of packets/bytes on the uplink and downlink channels, for each BS. The values were recorded every 15 min, and the traces are accessible in two formats: bytes per second and packets per second. Supplementary details about the database are presented in [5]. We analyze the packets-per-second format, because it is easier to handle time series with smaller sample values. For estimating the moment when the traffic of each BS becomes comparable with the BS capacity, the downlink channel is more important, since the traffic volume is higher in downlink. Therefore, the results presented in the following correspond to the downlink channel. The risk of saturation of a BS in uplink is considerably smaller.
2.3 Data preparation
This phase includes selecting the data to be used for analysis and data cleaning, such as identification of potential errors in the data sets, handling missing values, and removal of noise or other unexpected artifacts that could appear during the acquisition process. Incomplete or missing data constitute a problem. Despite the efforts made to reduce their occurrence, in most cases missing values cannot be avoided. If the number of missing values is large, the results are not relevant. It is therefore essential to know how to minimize the amount of missing values and which strategy to select in order to handle missing data. There are several strategies for handling missing data, for example deleting all instances where there is at least one missing value, replacing missing attribute values by the attribute mean, or estimating each of the missing values using the values that are present in the dataset (interpolation) [6]. There are many different interpolation methods, such as linear, polynomial, cubic, or nearest-neighbor interpolation. We have chosen cubic interpolation because, for some BS, the missing values are situated on the first/last positions of the vector, and this fact prevents us from using, for example, linear interpolation.
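As an illustration (the paper's implementation is in Matlab), the gap-filling step can be sketched in Python. For brevity, the sketch uses linear interpolation for interior gaps and nearest-neighbor filling at the boundaries; the paper uses cubic interpolation, which handles boundary gaps directly.

```python
def fill_missing(trace):
    """Fill None entries in a traffic trace (illustrative sketch).

    Interior gaps: linear interpolation between the nearest valid
    neighbors. Boundary gaps: nearest valid value (the situation that
    forbids plain linear interpolation on first/last positions).
    """
    vals = list(trace)
    known = [i for i, v in enumerate(vals) if v is not None]
    if not known:
        raise ValueError("trace contains no valid samples")
    for i, v in enumerate(vals):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:            # gap at the start of the vector
            vals[i] = vals[right]
        elif right is None:         # gap at the end of the vector
            vals[i] = vals[left]
        else:                       # interior gap
            w = (i - left) / (right - left)
            vals[i] = (1 - w) * vals[left] + w * vals[right]
    return vals
```

For example, `fill_missing([None, 2.0, None, 4.0, None])` returns `[2.0, 2.0, 3.0, 4.0, 4.0]`.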
Next, a multi-time scale analysis is proposed. The SWT is used to decompose the original signal into a range of frequency bands. The level of decomposition (n) depends on the length of the original signal: for a discrete signal, in order to be able to apply the SWT at level n, 2^{ n } must evenly divide the length of the signal. Level n of decomposition gives n+1 signals for processing: one approximation signal, corresponding to the current level, and n detail sequences, corresponding to each of the n decomposition levels. The value of n gives the maximal number of resolutions which can be used in the MRA.
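The divisibility constraint can be checked with a short helper (an illustrative sketch, not from the paper's code): the maximal SWT level is simply the number of times 2 divides the signal length.

```python
def max_swt_level(length):
    """Largest n such that 2**n evenly divides `length` (0 for odd lengths)."""
    n = 0
    while length > 0 and length % 2 == 0:
        length //= 2
        n += 1
    return n
```

For example, a trace of 896 samples supports at most n = 7 decomposition levels, since 896 = 2^7 × 7.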
WiMAX traffic exhibits some periodicities which are better noticed if the sampling interval is modified from 15 to 90 min. So, by temporal decimation with a factor of 6, these time series can be transformed in signals with a temporal resolution of 1.5 h. This represents the highest time resolution which is used in the proposed MRA. Further on, these temporal series will be denoted by x(t). The derived temporal series x(2^{ p }t) have a temporal resolution of 2^{ p }×1.5 h.
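The decimation step amounts to keeping every sixth sample of the 15-min trace; the derived series x(2^{ p }t) are then obtained by further dyadic decimation. An illustrative Python sketch:

```python
def decimate(x, factor):
    # keep every `factor`-th sample of the sequence
    return x[::factor]

x15 = list(range(24))     # toy 15-min trace (illustrative values)
x = decimate(x15, 6)      # 1.5-h resolution: x(t)
x2 = decimate(x, 2)       # 3-h resolution, i.e. x(2t)
```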
where e(t) represents the error of the new statistical model.
For capacity planning purposes, one only needs to know the future traffic baseline, along with the possible fluctuations of the traffic around this baseline. Since our goal is not to forecast the exact amount of traffic on a particular day in the future, we calculate the weekly standard deviation as the average of the seven daily values computed within each week. Given that the sequence a_{6}(t) is a very smooth approximation of the original signal, we calculate its average across each week and create a new time series, capturing the long-term trend from one week to the next. Approximating the original signal using weekly average values for the overall long-term trend and the daily standard deviation results in a model which accurately captures the desired behavior. So, our database is now prepared for modeling the overall tendency of the traffic and the variability around this tendency.
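A minimal sketch of this preparation step (the helper name and the `samples_per_day` parameter are illustrative assumptions, not taken from the paper's code): the weekly mean of the smooth approximation a_{6}(t) gives the overall trend series, and the average of the seven daily standard deviations gives the variability series.

```python
from statistics import mean, pstdev

def weekly_trend_and_variability(a6, samples_per_day, days_per_week=7):
    """Split a smooth approximation into weekly trend and variability series."""
    spw = samples_per_day * days_per_week           # samples per week
    weeks = [a6[i:i + spw] for i in range(0, len(a6) - spw + 1, spw)]
    trend = [mean(w) for w in weeks]                # weekly average of a6
    variability = []
    for w in weeks:
        daily_sd = [pstdev(w[d * samples_per_day:(d + 1) * samples_per_day])
                    for d in range(days_per_week)]
        variability.append(mean(daily_sd))          # mean of 7 daily std devs
    return trend, variability
```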
2.4 Modeling
2.4.1 Basic stochastic models in time series analysis
In the following, some basic stochastic models in time series analysis are presented.
An autoregressive model of order p, AR(p), can be written $\phi_p(B)X_t = Z_t$, where X_{ t } represents the time series which the model must establish, ϕ_{ p }(.) is a pth degree polynomial, and Z_{ t } is a white noise time series.
A moving average model of order q, MA(q), can be written $X_t = \theta_q(B)Z_t$, where θ_{ q }(.) is a qth degree polynomial and Z_{ t } is a white noise random process with constant variance and zero mean [7].
Combining the two gives the ARMA(p,q) model, $\phi_p(B)X_t = \theta_q(B)Z_t$, where ϕ_{ p }(.) and θ_{ q }(.) are pth and qth degree polynomials and B is the backward shift operator ($B^{j}X_{t}=X_{t-j}$, $B^{j}Z_{t}=Z_{t-j}$, j=0,1,…).
The ARMA model fitting procedure assumes the data to be stationary. If the time series exhibits variations that violate the stationarity assumption, then there are specific approaches that can be used to render the time series stationary. Most statistical forecasting methods are based on the assumption that the time series can be rendered approximately stationary through the use of mathematical transformations. A stationarized series is relatively easy to predict, because its statistical properties will be the same in the future as they have been in the past. The predictions for the stationarized series can then be transformed, by reversing whatever mathematical transformations were previously used, to obtain predictions for the original series. Thus, finding the sequence of transformations needed to stationarize a time series often provides important clues in the search for an appropriate forecasting model. One of the operations which can be used for the stationarization of a time series is differencing. The first difference of a time series is the series of changes from one moment to the next. If Y(t) denotes the value of the time series Y at time t, then the first difference of Y at time t is equal to Y(t)−Y(t−1). If the first difference of Y is stationary but correlated, then a more sophisticated forecasting model, such as exponential smoothing or ARIMA, may be appropriate.
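The differencing operation, and its inverse, used to map forecasts of the stationarized series back to the original scale, can be sketched as follows (an illustration, not the paper's Matlab code):

```python
def diff(y):
    """First difference: the series of changes from one moment to the next."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

def undiff(dy, y0):
    """Invert the first difference, given the initial value y0."""
    out = [y0]
    for d in dy:
        out.append(out[-1] + d)
    return out
```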
The ARIMA model is a generalization of the ARMA model. In statistics and signal processing, ARIMA models, sometimes called Box-Jenkins models after the iterative Box-Jenkins methodology used to estimate them, are fitted to time series data, either to better understand the data or to predict future points in the series. They are applied in cases where the data show evidence of non-stationarity, when some initial differencing steps must be applied to remove the non-stationarity.
A generalization of standard ARIMA(p,d,q) processes is the fractional ARIMA model, referred to as FARIMA(p,d,q) [8]. The difference between ARIMA and FARIMA consists in the degree of differencing d, which for FARIMA models takes real values.
We have selected the ARIMA modeling procedure and implemented it with the aid of the Box-Jenkins methodology.
2.4.2 BoxJenkins methodology
The Box-Jenkins methodology [9] applies to ARMA or ARIMA models to find the best fit of a time series to its past values, in order to make forecasts. The original methodology uses an iterative three-stage modeling approach:

1. Model identification and model selection. The first goal is to make sure that the time series is stationary. Stationarity can be assessed from a run sequence plot. It can also be detected from an autocorrelation plot: non-stationarity is often indicated by an autocorrelation plot with very slow decay. The second goal is to identify seasonality in the dependent series. Seasonality (or periodicity) can usually be assessed from an autocorrelation plot, a seasonal subseries plot, or a spectral plot.

2. Parameter estimation. Once stationarity and seasonality have been addressed, the next step is to identify the orders (i.e., p and q) of the AR and MA parts. The primary tools for doing this are the autocorrelation (ACF) plot and the partial autocorrelation (PACF) plot. The sample ACF and PACF plots are compared to the theoretical behavior of these plots when the order is known. Specifically, for an AR(1) process, the sample ACF should have an exponentially decreasing appearance; higher-order AR processes are often a mixture of exponentially decreasing and damped sinusoidal components. The ACF of an MA(q) process becomes zero at lag q+1 and greater, so we examine the sample ACF to see where it essentially becomes zero.

3. Model checking, by testing whether the estimated model conforms to the specifications of a stationary univariate process. In particular, the residuals should be as small as possible and should exhibit no patterns. If the estimation is inadequate, we have to go back to step one and attempt to build a better model.
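The identification stage above relies on the sample ACF; a minimal sketch of its computation (illustrative, not the paper's Matlab code):

```python
from statistics import mean

def sample_acf(x, max_lag):
    """Sample autocorrelation at lags 1..max_lag."""
    m = mean(x)
    c0 = sum((v - m) ** 2 for v in x)               # lag-0 autocovariance
    acf = []
    for k in range(1, max_lag + 1):
        ck = sum((x[t] - m) * (x[t - k] - m) for t in range(k, len(x)))
        acf.append(ck / c0)
    return acf
```

For an MA(q) candidate, one inspects where the returned values essentially vanish (beyond lag q).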
The determination of an appropriate ARMA(p,q) model to represent an observed stationary time series involves the selection of the orders p and q and the estimation of the mean, the coefficients ϕ_{ p } and θ_{ q }, and the variance of the white noise, σ^{2}. When p and q are known, good estimators of ϕ and θ can be found by imagining the data to be observations of a stationary Gaussian time series and maximizing the likelihood with respect to the parameters ϕ_{ p }, θ_{ q }, and σ^{2}. The estimators obtained using this procedure are known as maximum likelihood estimators (MLE). The aim of this method is to determine the parameters that maximize the probability of the observations. A detailed theoretical approach regarding MLE is presented in [10].
In the following, the problem of selecting appropriate values for the orders p and q will be discussed. Several criteria have been proposed in the literature, since the problem of model selection arises frequently in statistics [11].
The AICC statistic is defined as $\mathrm{AICC} = n\ln{\widehat{\sigma}}^{2} + \frac{2(p+q+1)n}{n-p-q-2}$, where n is the number of samples and ${\widehat{\sigma}}^{2}$ is the estimated noise variance.
The AIC statistic is defined as $\mathrm{AIC} = -2\ln L + 2(p+q+1)$, where L is the likelihood function.
The BIC statistic is defined as $\mathrm{BIC} = -2\ln L + (p+q+1)\ln n$, where n is the number of samples.
In the case of the AICC and AIC statistics, for $n\to \infty $, the penalty factors $\frac{2(p+q+1)n}{n-p-q-2}$ and 2(p+q+1), respectively, are asymptotically equivalent.
The FPE criterion for an AR(p) model is $\mathrm{FPE} = {\widehat{\sigma}}^{2}\frac{n+p}{n-p}$, where ${\widehat{\sigma}}^{2}$ is the maximum likelihood estimator of σ^{2} (the white noise variance of the AR(p) model).
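Assuming the standard textbook forms of these statistics (an assumption, since the exact variants are not restated here), the order-selection criteria can be sketched as:

```python
from math import log

def aic(logL, p, q):
    # Akaike information criterion for an ARMA(p, q) fit with log-likelihood logL
    return -2 * logL + 2 * (p + q + 1)

def aicc(logL, p, q, n):
    # bias-corrected AIC; n is the number of samples
    return -2 * logL + 2 * (p + q + 1) * n / (n - p - q - 2)

def bic(logL, p, q, n):
    # Bayesian information criterion
    return -2 * logL + (p + q + 1) * log(n)

def fpe(sigma2, p, n):
    # Akaike's final prediction error for an AR(p) model with noise variance sigma2
    return sigma2 * (n + p) / (n - p)
```

The candidate model minimizing these statistics is preferred.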
The goal of the Box-Jenkins methodology is to find an appropriate model such that the residuals are as small as possible and exhibit no patterns [9]. The residuals represent all the influences on the time series which are not explained by its other components (trend, seasonal component, trade cycle). The steps involved in building the model are repeated multiple times, in order to find a formula that reproduces the patterns in the series as closely as possible and produces accurate forecasts. The input data must first be adjusted to form a stationary series; next, a basic model can be identified [9]. The initial model can be selected using the Matlab function idpoly.
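The requirement that the residuals exhibit no patterns can be checked with a portmanteau statistic; the following is an illustrative Ljung-Box-style sketch (the chi-squared quantile lookup needed for the formal test is omitted):

```python
from statistics import mean

def ljung_box_q(resid, h):
    """Ljung-Box Q statistic on residuals over lags 1..h.

    For an adequate model the residuals are white and Q stays small
    relative to a chi-squared quantile with h degrees of freedom.
    """
    n = len(resid)
    m = mean(resid)
    c0 = sum((r - m) ** 2 for r in resid)
    q = 0.0
    for k in range(1, h + 1):
        rk = sum((resid[t] - m) * (resid[t - k] - m)
                 for t in range(k, n)) / c0          # sample autocorrelation
        q += rk * rk / (n - k)
    return n * (n + 2) * q
```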
2.5 Evaluation
One of the most important phases of a data mining project is the evaluation phase, which interacts with all the preceding phases. The connection with the data preparation phase consists in the evaluation of the MRA, using ANOVA (see Figure 1).
2.5.1 Analysis of variance
ANOVA is a statistical method used to quantify the amount of variability accounted for by each term in a multiple linear regression model. It can be used in the process of reducing a multiple linear regression model, by identifying those terms in the original model that explain the most significant amount of variance.
where e(t) represents the error of the model.
The error sum of squares is $\mathrm{SSE}=\sum_{t}{\left(y(t)-\widehat{y}(t)\right)}^{2}$, where y(t) is the observed response of the model and $\widehat{y}(t)$ is the response predicted by the model.
The total variability of the response is measured by the total sum of squares, $\mathrm{SST}=\sum_{t}{\left(y(t)-\overline{y\left(t\right)}\right)}^{2}$, where $\overline{y\left(t\right)}$ represents the mean of y(t).
The ANOVA methodology splits this variability into two parts. One component is accounted for by the model and it corresponds to the reduction in uncertainty that occurs when the regression model is used to predict the response. The remaining component is the uncertainty that remains even after the model is used. The regression sum of squares, SSR, is defined as the difference between SST and SSE. This difference represents the sum of the squares explained by the regression.
The model is considered to be statistically significant if it can account for a large fraction of the variability in the response, i.e., if it yields large values of the coefficient of determination $R^{2}=\mathrm{SSR}/\mathrm{SST}$.
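A minimal sketch of this evaluation, given observed responses y and model predictions y_hat (names are illustrative):

```python
from statistics import mean

def anova_r2(y, y_hat):
    """ANOVA decomposition SST = SSR + SSE and coefficient of determination."""
    y_bar = mean(y)
    sst = sum((v - y_bar) ** 2 for v in y)              # total variability
    sse = sum((v - f) ** 2 for v, f in zip(y, y_hat))   # unexplained part
    ssr = sst - sse                                     # explained by the model
    return ssr / sst                                    # R^2
```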
2.6 Deployment
The final stage, deployment, involves the application of the model to new data in order to generate predictions. A statistical model is aggregated for each BS using the corresponding overall tendency and variability models and its trajectory is established. The saturation moment can be found at the intersection of this trajectory with a horizontal line which represents the BS saturation threshold.
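The deployment step then reduces to a line-threshold intersection; an illustrative sketch (the intercept b and slope mu_ot of the aggregated trajectory are assumed given, and the names are hypothetical):

```python
def saturation_moment(b, mu_ot, threshold):
    """Moment t_s at which the linear trajectory b + mu_ot * t
    crosses the horizontal BS saturation threshold."""
    if mu_ot <= 0:
        return float("inf")   # a non-growing BS never saturates
    return (threshold - b) / mu_ot
```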
3 Implementation
The representation contains the underlying overall trend, represented in red. The other two curves represent the deviation, plus (in green) and minus (in black), from the approximation signal. It can be observed that a large part of the traffic is contained between the green and black lines. The red line indicates an increase of the traffic in time, suggesting the possibility of saturation of BS1.
The approximation errors are higher in the second plot than in the third plot. This remark justifies the utilization of both weights β and γ. The approximation in the second plot is smoother than the approximation in the third plot. So, the utilization of the weight γ diminishes the highfrequency components of the errors.
The next step consists in modeling. We used the BoxJenkins methodology [9] to fit linear time series models, separately for the overall trend and for the variability, starting with the estimations in Figure 11. The estimations ‘mean approximation plus’ and ‘mean approximation minus’ are used for modeling the variability, while the estimation ‘approximation per week’ is used for modeling the overall trend.
Analyzing Figure 14, we reach the same conclusion as in the case of Figure 13. The sequence obtained by computing the second difference of the sequence of approximations is more stationary than the sequence obtained by computing the first difference of the approximations, or than the sequence of approximations itself. In fact, in [9] it is advised to apply the Box-Jenkins methodology two times: first, an initial model is established, and then it is optimized in the second run. To initialize the Matlab Box-Jenkins methodology (the function bj.m), some information is required: the data to be modeled (one of the three sequences: the approximation c_{6}, its first difference, or its second difference, in our case) and the model to be initialized (the values p and q and the initial coefficients of the polynomials ϕ of order p and θ of order q). The results of the function bj.m are the optimal values of the coefficients of the polynomials θ and ϕ (which permit the mathematical description of the model), the fitting error of the model (which must be as small as possible), and the value of the FPE, which must also be as small as possible. The orders p and q of the polynomials ϕ and θ can be identified based on their coefficients, but the value of the parameter d of the ARIMA model cannot be identified using the function bj.m. For this reason, it is identified based on stationarity tests.
One of the most important stages of the proposed WiMAX traffic forecasting methodology is the evaluation. In this phase, both models (for overall tendency and variability of the traffic) are evaluated and all the precedent steps are reviewed. In order to see if the new statistical model in (3) is representative, we used ANOVA and we computed the coefficient of determination. We have obtained good results for each BS for both statistical models, applying the forecasting algorithm to all the downlink traces from the database and obtaining statistically significant ARIMA models for each traffic overall tendency and variability of each BS. We have identified the model parameters (p and q) using MLE. The best model was chosen as the one that provides the smallest AICC, BIC and FPE while offering the smallest mean square prediction error for a number of weeks ahead.
The last stage of the proposed forecasting methodology consists in deployment. The models obtained for the longterm trend of the downlink traces from the database indicate that the first difference of those time series is consistent with a simple MA model (p=0) with one or two terms (q=1 and d=1 or q=2 and d=1) plus a constant value μ_{ ot }. Similar results were obtained for the models of variability of the traffic.
The need for one differencing operation at lag one and the presence of the term μ_{ ot } in the model indicate that the long-term trend of the downlink traffic follows simple exponential smoothing with growth. The trajectory of the long-term forecasts will be a sloping line, whose slope is equal to μ_{ ot }. Similar conclusions can be formulated for the variability of the downlink traffic. The trajectory of the variability forecast is a sloping line as well, but it has a much smaller slope. The sum of these sloping lines is a third line, parallel to the trajectory of the long-term forecast, which represents the trajectory of the overall forecast. Hence, the risk of saturation of a BS is proportional to the slope of its overall tendency. Given the estimates of μ_{ ot } across all models, corresponding to all BS, we can conclude, based on the positive values of those slopes, that all traces exhibit upward trends, but grow at different rates.
4 Results
The slope of the overall tendency line in Figure 15 is equal to μ_{ ot }. Analyzing Figure 15, it can be observed that a BS saturates faster if μ_{ ot } has a higher value. Indeed, for the same value of the saturation threshold, as μ_{ ot } grows, the value of t_{ s } (which indicates the moment of saturation in Figure 15) decreases. So, the value of μ_{ ot } is proportional to the saturation risk of the considered BS.
Table 1 BS risk of saturation

BS  μ_ot (Mb/s²)   BS  μ_ot (Mb/s²)   BS  μ_ot (Mb/s²)   BS  μ_ot (Mb/s²)
63  239.860        48  114.810        13  68.311         1   45.068
60  185.470        52  110.250        53  66.329         2   44.729
3   177.680        8   109.040        6   65.579         9   43.102
49  176.070        7   105.240        5   63.415         42  42.878
61  164.030        56  104.720        26  59.885         33  41.441
57  157.260        55  99.920         12  58.708         30  41.395
62  146.310        65  99.174         39  57.789         28  39.973
67  144.630        20  97.943         38  57.675         41  38.129
54  143.880        29  97.655         35  54.498         40  33.587
18  138.230        46  93.711         37  53.458         36  32.224
64  134.220        10  91.557         23  52.729         25  30.601
16  131.730        19  83.567         45  51.019         15  29.400
59  130.530        43  79.215         22  50.872         11  27.622
58  130.350        44  78.572         24  49.404         31  26.144
51  123.960        66  74.149         27  46.704         17  25.052
4   118.100        14  71.564         47  45.879         21  24.614
32  15.921
The base stations with the smallest risk of saturation are BS15, BS11, BS31, BS17, and BS21. Comparing the results in Table 1, it can be observed that the risk of saturation of BS63 is ten times higher than the risk of saturation of BS21. So, the traffic of different BS is far from uniform. This is a significant result, because it shows the existence of some limitations in the design of WiMAX networks. This design is based on Shannon's communication theory, which does not take into account some particularities of wireless networks, for example their ad hoc nature or the effects of different social influences. The traffic analysis presented in this paper is useful for network designers and administrators, being one of the first studies on WiMAX traffic forecasting.
5 Conclusions
This optimization is described in Equation 7. Because, in the case of WiMAX traffic, we have obtained smaller energy values for the sequence d_{3} than the corresponding values obtained in [2], we have additionally considered the sequence d_{4}.
The proposed forecasting methodology extracts the traffic trends from historical measurements and can identify the BS which exhibit higher growth rates and, thus, may require additional capacity in the future. It is capable of isolating the overall long-term trend and identifying the components that significantly contribute to its variability. Predictions based on approximations of those components provide accurate estimates with a minimal computational overhead. All our forecasts were obtained in seconds. All the procedures described are implemented as Matlab functions. We have found that the BS of the considered network are more heavily loaded in downlink than in uplink. This unbalanced behavior gives some indications about the user needs, and its analysis can provide useful information for the local operators regarding the number of users at different locations within the network. We cannot come up with a single WiMAX network-wide forecasting model for the aggregate demand. Different parts of the network grow at different rates (long-term trend) and experience different types of variation (deviation from the long-term trend). This behavior proves that other methods must be applied to make the traffic uniform. For example, it could be possible to optimize the positions of some BS [12].
This paper presents one of the first attempts to forecast the traffic of a WiMAX network. Taking into account the high speed of traffic analysis developed by the proposed method (it works practically in real time), we consider that it could be implemented by each WiMAX network service provider.
References
1. Rong Z, Qiu CR, Xia X, Guoping W: A case-based reasoning system for individual demand forecasting. In Proceedings of the 4th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM). Piscataway: IEEE; 2008:1-6.
2. Papagiannaki K, Taft N, Zhang Z, Diot C: Long-term forecasting of internet backbone traffic: observations and initial models. In Proceedings of the IEEE INFOCOM. Piscataway: IEEE; 2003:1178-1188.
3. Groschwitz NK, Polyzos GC: A time series model of long-term NSFNET backbone traffic. Proc. IEEE Int. Conf. Commun. 1994, 3:1400-1404.
4. Chapman P, Clinton J: CRISP-DM 1.0 Step-by-step data mining guide (2000). ftp://ftp.software.ibm.com/software/analytics/spss/support/Modeler/Documentation/14/UserManual/CRISP-DM.pdf. Accessed Nov 2013.
5. Stolojescu C: Forecasting WiMAX BS traffic by statistical processing in the wavelet domain. In Proceedings of the IEEE International Symposium on Signals, Circuits and Systems. Iasi; 9-10 July 2009:177-180.
6. Rokach L, Maimon O: The Data Mining and Knowledge Discovery Handbook: A Complete Guide for Researchers and Practitioners. New York: Springer; 2005.
7. Chatfield C: Time-Series Forecasting. Boca Raton: Chapman and Hall/CRC Press; 2000.
8. Moayedi HZ, Masnadi-Shirazi M: ARIMA model for network traffic prediction and anomaly detection. In Proceedings of the International Symposium on Information Technology. Piscataway: IEEE; 2008:1-6.
9. Box GEP, Jenkins GM, Reinsel G: Time-Series Analysis: Forecasting and Control. Hoboken: Wiley; 2008.
10. Hurvich CM, Tsai CL: Regression and time series model selection in small samples. Biometrika 1989, 76:297-307. doi:10.1093/biomet/76.2.297
11. Brockwell PJ, Davis RA: Introduction to Time Series and Forecasting. New York: Springer; 2002.
12. Stolojescu C, Moga S, Lenca P, Isar A: WiMAX traffic analysis and base stations classification in terms of LRD. J. Expert Syst. 2013, 30(4):285-293. doi:10.1111/exsy.12026
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.