In this section, we propose a defensive method that can be used for detecting an App-DDoS attack. We show how to represent a set of web browsing behaviors using sequence-order-independent attributes instead of web page request sequences. We then present a multiple PCA model to profile normal web browsing patterns and distinguish App-DDoS attacks. Since DDoS attack detection systems are required to handle an extremely large volume of traffic, we base our description of the web browsing patterns on PCA instead of nonlinear methods such as kernel methods and manifold learning [10, 11].
3.1. Sequence-order-independent attributes
We represent each request sequence as a vector of extracted attributes. Let us assume that N users browsed a web server where the total number of web pages is D. For user i who browsed this server, let $s_i$ be the web page request sequence and $\eta_{d,i}$ be the number of requests of user i for page d of the server. Then,
$$r_i = \sum_{d=1}^{D} \eta_{d,i}$$
is the total number of requests of user i and
$$\bar{r} = \frac{1}{N} \sum_{i=1}^{N} r_i$$
is the average number of requests of users.
Now, let us define several sequence-order-independent attributes for detecting App-DDoS attacks. To give a clearer representation of active user i, we introduce the attribute
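$$\alpha_i = \frac{r_i}{\bar{r}},$$
which is the ratio of the number of requests of user i, $r_i$, to the average number of requests over all users, $\bar{r}$. The next attribute,
$$h_{d,i} = \frac{\eta_{d,i}}{r_i},$$
is the proportion of page d among pages requested by user i; it shows how much user i was interested in page d of the server.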
To help determine whether incoming users are indicative of a DDoS attack, we supplement the two basic attributes with two other attributes that characterize the web browsing patterns of a user. The first supplementary attribute is defined as the proportion of all server pages requested by user i, i.e.,
$$\tau_i = \frac{1}{D} \sum_{d=1}^{D} b_{d,i},$$
where $b_{d,i}$ is an indicator that equals 1 if $\eta_{d,i} > 0$ and 0 otherwise. This attribute represents the breadth of user i's interest.
The next supplementary attribute shows the intensity of interest in the user's page of greatest interest. For user i, let $q_i = \arg\max_d \{\eta_{d,i}\}$ denote the most frequently requested page. The intensity of the user's interest in this page can then be represented by the ratio of the number of requests for page $q_i$ to the user's total number of page requests:
$$\gamma_i = \frac{\eta_{q_i,i}}{\sum_{d=1}^{D} \eta_{d,i}}.$$
From these attributes, we form an attribute vector $w_i = [h_i^T, \alpha_i, \tau_i, \gamma_i]^T$, where $h_i = [h_{1,i}, h_{2,i}, \ldots, h_{D,i}]^T$. We then form the attribute matrix $W = [w_1 \ldots w_N]$.
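As an illustration of the attribute extraction, the following minimal Python sketch builds one attribute vector from a request sequence. The function name `attribute_vector`, the toy sequences, and the use of integer page indices are assumptions made for the example. The assignment of alpha, tau, and gamma to the request-volume, breadth, and intensity attributes follows the order in which the attributes are introduced in the text.

```python
from collections import Counter

def attribute_vector(sequence, num_pages, mean_requests):
    """Build [h_1, ..., h_D, alpha, tau, gamma] from one non-empty request sequence.

    sequence      : list of page indices (0 .. num_pages - 1) requested by the user
    num_pages     : D, the total number of pages on the server
    mean_requests : average number of requests per user
    """
    counts = Counter(sequence)            # eta_{d,i}; the request order is discarded
    total = len(sequence)                 # total number of requests of this user
    h = [counts.get(d, 0) / total for d in range(num_pages)]  # page proportions
    alpha = total / mean_requests         # request volume relative to the average
    tau = len(counts) / num_pages         # breadth: fraction of pages requested
    gamma = max(counts.values()) / total  # intensity on the most requested page
    return h + [alpha, tau, gamma]

# Toy data (hypothetical): two users on a server with D = 4 pages.
sequences = [[0, 1, 0, 2], [3, 3, 3, 3, 3, 3, 3, 3]]
mean_requests = sum(len(s) for s in sequences) / len(sequences)
W = [attribute_vector(s, 4, mean_requests) for s in sequences]
```

Each vector has D + 3 entries. The second toy user, who requests a single page repeatedly, gets a small breadth value and the maximum intensity value, whereas the first user spreads requests over three pages.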
3.2. PCA for web browsing behaviors
PCA is a simple statistical method for transforming given data into new coordinates called principal components. By removing less important components, it can reduce the number of dimensions required to explain the given data. The reduced subspace best represents the given data in a least-squares sense.
We use PCA to model web browsing patterns; the modeling is based solely on normal users' attributes. To model the web browsing patterns, we first define a mean vector, $\mu_0$, and a covariance matrix, C, as
$$\mu_0 = \frac{1}{N} \sum_{i=1}^{N} w_i, \quad C = \frac{XX^T}{N},$$
where $X = [x_1 \ldots x_N]$, $x_i = w_i - \mu_0$, and i = 1,...,N. We then compute the eigenvectors and eigenvalues by applying singular value decomposition to the covariance matrix.
If we let $u_j$ be the jth most significant eigenvector of covariance matrix C, then the significant principal components are denoted by
$$U = [u_1 \ldots u_P],$$
where P (≪ D) is the number of significant principal components. Since the remaining eigenvectors $[u_{P+1} \ldots u_{D+3}]$ are less significant, we can reduce the dimensions of the data without significant loss when these eigenvectors are discarded.
If the attribute vectors are projected into the subspace spanned by P significant principal components, then we can represent the attribute of web user i in terms of the following P-dimensional coefficient vector:
$$a_i = U^T x_i = U^T (w_i - \mu_0).$$
This coefficient indicates how much each principal component contributes to the representation of the given attribute.
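The PCA steps of this subsection can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the function name `fit_pca` and the synthetic attribute matrix are hypothetical, and the matrix is stored with one column per user.

```python
import numpy as np

def fit_pca(W, P):
    """W: (D+3, N) attribute matrix, one column per user.

    Returns the mean vector, the P leading components, the coefficient
    vectors (as columns), and all eigenvalues of the covariance matrix.
    """
    mu0 = W.mean(axis=1, keepdims=True)   # mean vector mu_0
    X = W - mu0                           # centered data, x_i = w_i - mu_0
    C = X @ X.T / W.shape[1]              # covariance matrix C = X X^T / N
    # C is symmetric positive semi-definite, so its SVD yields the eigenvectors
    # (columns of U_full) and the eigenvalues, sorted in descending order.
    U_full, eigvals, _ = np.linalg.svd(C)
    U = U_full[:, :P]                     # P most significant components
    A = U.T @ X                           # coefficient vectors a_i
    return mu0, U, A, eigvals

rng = np.random.default_rng(0)
W = rng.random((7, 50))                   # synthetic stand-in for real attributes
mu0, U, A, eigvals = fit_pca(W, P=2)
```

Each column of `A` is the P-dimensional coefficient vector of one user. The eigenvalues are returned as well because their decay is a common guide for choosing P.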
3.3. Multiple PCA model
Describing real traffic via a single PCA model is difficult because the traffic data usually include many patterns, variations, and different types of noise. We therefore propose to use a multiple form of PCA for effective modeling of real traffic. For the multiple PCA model, we use the k-means clustering method to partition the given data into several clusters [12]. The k-means clustering is a well-known algorithm for unsupervised clustering, but is inappropriate for sparse or concave-shaped data [13, 14]. With our attributes, it is frequently the case that a particular data element may remain zero because some web pages may not be requested for a long period. Accordingly, our attribute matrix may have a high degree of sparsity. To overcome this problem, we perform k-means clustering on the values of the PCA coefficient, $a_i$, instead of the values of the raw data, $w_i$. Because the $a_i$ values are low-dimensional and not sparse, we can easily partition the given attributes into several clusters with the coefficient.
Next, we build a PCA model on each cluster. Let $w_i^{(k)}$ be user i's attribute vector that belongs to cluster k. For each cluster, we first normalize the attribute vector by $x_i^{(k)} = (w_i^{(k)} - \mu^{(k)})/(\sigma^{(k)})^2$, where $\mu^{(k)}$ and $(\sigma^{(k)})^2$ are the mean and variance vectors for cluster k, respectively. We then compute P significant principal components, $U^{(k)}$, for cluster k as described in the previous section. Once the principal components of cluster k are computed, we can reconstruct the original attribute vector with only P principal components. The reconstructed data, $\hat{x}_i^{(k)}$, of $x_i^{(k)}$ and their errors, $\varepsilon_i$, are denoted as follows:
$$\hat{x}_i^{(k)} = U^{(k)} U^{(k)T} x_i^{(k)}, \quad \varepsilon_i = \lVert x_i^{(k)} - \hat{x}_i^{(k)} \rVert.$$
Designed exclusively for normal traffic, our PCA model produces a good representation of the attributes of normal traffic but a poor representation of the attributes of unseen traffic. As a result, the reconstruction error is low for normal behavior but high for abnormal.
We regard the high reconstruction errors of the PCA as statistical outliers. Hence, we choose a threshold, $\delta^{(k)}$, of cluster k and use it as follows to determine whether the given web browsing behavior is normal:
$$\delta^{(k)} = E[\varepsilon] + \beta \sqrt{E[(\varepsilon - E[\varepsilon])^2]},$$
where $E[\varepsilon]$ and $E[(\varepsilon - E[\varepsilon])^2]$ are the mean and variance of the reconstruction errors for model k, respectively. The β value is a scale factor for defining the outlier range. According to studies on outlier detection, the outlier range should deviate from the mean by more than two or three standard deviations [15].
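A per-cluster model with its error threshold can be sketched as follows. Several details are assumptions of this sketch and are not stated in the text: the reconstruction error is taken as the Euclidean norm of the residual, the threshold is the mean error plus β standard deviations, and a small constant guards against zero variance. The name `fit_cluster_model` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_cluster_model(Wk, P, beta=3.0):
    """Wk: (D+3, N_k) attribute vectors of one cluster, one column per user."""
    mu = Wk.mean(axis=1, keepdims=True)
    var = Wk.var(axis=1, keepdims=True) + 1e-12   # guard against zero variance (assumption)
    X = (Wk - mu) / var                           # normalization as written in the text
    U = np.linalg.svd(X @ X.T / Wk.shape[1])[0][:, :P]  # P leading components
    X_hat = U @ (U.T @ X)                         # reconstruction from P components
    eps = np.linalg.norm(X - X_hat, axis=0)       # one reconstruction error per user
    delta = eps.mean() + beta * eps.std()         # outlier threshold for this cluster
    return {"mu": mu, "var": var, "U": U, "delta": delta}, eps

rng = np.random.default_rng(0)
Wk = rng.random((7, 40))                          # synthetic stand-in for one cluster
model, eps = fit_cluster_model(Wk, P=2)
```

With β = 3, at most one tenth of the training errors can exceed the threshold (by Cantelli's inequality), so the model accepts the large majority of the traffic on which it was trained.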
3.4. Detection method
If a new web user, t, requests a web page from the server, then we first form attribute vector $w_t$ and determine the best fitting model as follows:
$$\pi = \arg\min_k \lVert a_t - m_k \rVert,$$
where k = 1,...,K and $a_t$ is the PCA coefficient vector of $w_t$. The value K is the total number of clusters, and $m_k$ is the mean of the PCA coefficients for cluster k. After selecting the best fitting model, π, we normalize $w_t$ using $\mu^{(\pi)}$ and $\sigma^{(\pi)}$. Finally, the model compares the reconstruction error, $\varepsilon_t$, with the error threshold, $\delta^{(\pi)}$. If $\varepsilon_t > \delta^{(\pi)}$, then the model regards the current user as part of an App-DDoS attack. Figures 2 and 3 show the pseudocode of the proposed method, which includes model training and testing.
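The detection step can be sketched as follows. This is not the pseudocode of Figures 2 and 3. It is a minimal illustration that assumes the best fitting model is the one whose mean PCA coefficient is nearest in Euclidean distance, and that the reconstruction error is the Euclidean norm of the residual. The function name `detect` and all numeric values of the hand-built three-dimensional models are hypothetical.

```python
import numpy as np

def detect(w_t, mu0, U0, cluster_means, models):
    """Return (is_attack, pi, eps_t) for one attribute vector w_t."""
    a_t = U0.T @ (w_t - mu0)                        # PCA coefficient of the new user
    dists = np.linalg.norm(cluster_means - a_t, axis=1)
    pi = int(np.argmin(dists))                      # best fitting model
    m = models[pi]
    x_t = (w_t - m["mu"]) / m["var"]                # normalize with the cluster statistics
    x_hat = m["U"] @ (m["U"].T @ x_t)               # reconstruct from P components
    eps_t = float(np.linalg.norm(x_t - x_hat))      # reconstruction error
    return eps_t > m["delta"], pi, eps_t

# Hand-built toy models in 3 dimensions (all values hypothetical).
mu0 = np.zeros(3)
U0 = np.eye(3)[:, :2]                               # global components for model selection
cluster_means = np.array([[0.0, 0.0], [5.0, 5.0]])  # m_k, one row per cluster
U1 = np.array([[1.0], [0.0], [0.0]])                # one principal component per cluster
models = [
    {"mu": np.zeros(3), "var": np.ones(3), "U": U1, "delta": 0.5},
    {"mu": np.array([5.0, 5.0, 0.0]), "var": np.ones(3), "U": U1, "delta": 0.5},
]
normal = detect(np.array([4.8, 5.1, 0.0]), mu0, U0, cluster_means, models)
attack = detect(np.array([5.0, 5.0, 3.0]), mu0, U0, cluster_means, models)
```

Both toy users are assigned to the second cluster. The first lies close to that cluster's subspace and stays below the threshold, while the second has a large component outside the subspace and is flagged.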