SAViNE: social network analysis - inspired content delivery network deployment and experimentation

Leivadeas, Aris; Papagianni, Chrysa; Papavassiliou, Symeon

doi:10.1186/1687-1499-2014-198

Research
Open access
Published: 24 November 2014

EURASIP Journal on Wireless Communications and Networking volume 2014, Article number: 198 (2014)

Abstract

In recent years, the content delivery network (CDN) market has seen a global increase in the development of cloud-based CDNs, as they constitute a viable and cost-effective alternative to traditional commercial CDNs. In that direction, Social-Aware Virtual Network Embedding (SAViNE) for wireless content delivery within the OpenLab project aims at establishing, assessing, and prototyping a novel framework for deploying a CDN over the wireless cloud. Specifically, the goal of SAViNE is to provide experimental validation of the efficiency of social network analysis (SNA)-inspired surrogate server placement strategies, devised for fostering content delivery. The proposed framework has been integrated with the OpenLab experimental facilities, providing a run-time environment for CDN deployment, operation, and performance evaluation over the wireless environment. The framework’s feasibility and scalability are experimentally validated. The study is complemented with a repeatability evaluation of experimental measurements related to the resulting CDN’s operational efficiency.

1 Introduction

In today’s digital landscape, the mobile channel has emerged as the most significant factor in consumers’ lifestyle. Rich media content providers are quickly becoming key elements for mobile channel monetization, affecting a significant part of the global gross domestic product. To accommodate this traffic, numerous geographically disparate servers are needed, imposing significant cost overheads that are expected to multiply. To that end, content delivery network (CDN) providers (e.g., Akamai, Rackspace, etc.) facilitate the search and delivery of content.

Traditional CDNs have proven to be successful at optimizing and accelerating delivery of content. CDNs replicate content at cache and/or replica servers^a, deployed in multiple, geographically diverse locations, closer to the recipients of content[1]. End user requests are redirected to the most appropriate surrogate, based on network-related criteria (e.g., traffic volume, proximity, etc.) as well as service-oriented criteria (e.g., response time, availability, etc.).

Emerging cloud-based CDNs leverage cloud resources to reduce the cost associated with implementing content delivery services (e.g., Amazon CloudFront, MetaCDN). Cloud-based CDNs promote a different business model where content is distributed on a pay-as-you-go model, supporting economies of scale, while ‘CDN as a service’ (CDNaaS) emerges in accordance with the ‘everything as a service’ notion[2, 3].

Effective content distribution is crucial in CDNs[1]. To this end, the appropriate placement of surrogates at locations that are closer to end users is considered an important challenge in CDNs[4]. While the problem has been studied extensively in traditional CDNs (e.g.,[5–8], etc.), these results cannot be directly applied to cloud-based CDNs[2, 9]. Few studies in the literature investigate server placement for an overlay cloud-based CDN (i.e.,[2, 9, 10]).

Following a cloud-based CDN approach in conjunction with the current trend of using lightweight (wireless/mobile) handheld devices (e.g., smart-phones, tablets etc.) for accessing resource-voracious applications (e.g., media streaming, etc.), the Social-Aware Virtual Network Embedding (SAViNE) framework aims at establishing, assessing, and prototyping surrogate placement techniques for wireless cloud-based CDN deployment. SAViNE’s goal is to provide experimental validation on the efficiency of social network analysis (SNA)-inspired surrogate server placement strategies devised for fostering content delivery within the evolving wireless cloud environment. SAViNE serves also as a proof of concept for delivering CDNaaS over the Cloud, catering for the cost-efficient delivery of the long tail of the content. The evaluation is performed over a pure wireless multi-hop environment, in order to validate the ad hoc deployment of cloud-based CDNs, in the case that underlying wired infrastructure is not available or wireless infrastructure is preferred (e.g., wireless community networks).

1.1 Paper contributions and outline

The scope of this work is to provide experimental validation of the proposed framework for cloud-based CDN deployment in a realistic, large-scale wireless testing environment. The application of surrogate placement algorithms, followed by the deployment of the resulting CDNs, is conducted over a set of Future Internet (FI) research experimental infrastructures, namely (i) the w-iLab.t[11] wireless testbed for functional validation and feasibility check and (ii) PlanetLab[12] for scalability testing, utilizing appropriate techniques to emulate the wireless environment. These experimental infrastructures are federated within the context of the OpenLab[13] and Fed4FIRE Integrated Projects[14], as part of the Future Internet Research and Experimentation (FIRE) initiative in Europe[15].

The key contributions of this work are the following:

The proposed SAViNE framework has been integrated with the OpenLab experimental facilities, providing a run-time environment for CDN deployment, operation, and performance evaluation.
SAViNE’s feasibility is validated by investigating the operational efficiency of the proposed solution in a wireless environment, throughout each step of the CDN deployment process (server selection and placement, CDN deployment, CDN operation).
The validity of the proposed framework is reinforced by investigating SAViNE’s repeatability over the experimental infrastructures. Specifically, the repeatability testing methodology is streamlined, describing appropriate experiment scenarios and statistical analysis methods for the experimental data.
The scalability of the proposed solution is examined by incorporating the PlanetLab testbed and conducting cross-testbed experimentation.
An appropriate set of performance evaluation metrics is presented, related to the feasibility/operational efficiency and scalability of the proposed framework.

The rest of the paper is organized as follows. Section 2 gives a brief insight into the surrogate placement problem and the background work on the heuristics adopted for this study. Section 3 describes the architecture of the SAViNE framework, as integrated with the OpenLab experimental facilities. The experimentation results are presented in Section 4, emphasizing feasibility and scalability validation as well as repeatability evaluation. Section 5 concludes the paper with a set of closing remarks.

2 Background work

A decisive factor on the performance of a CDN is the number and location of surrogate servers. Optimizing surrogate placement enables the delivery of high quality services at low prices. Surrogate placement belongs to the NP-complete class of problems[16]. With regard to cloud-based CDNs, the replica placement is a complex, joint problem of building distribution paths and replicating content[9].

Within the SAViNE framework, the location and number of surrogates are optimized, taking into consideration metrics inspired by social network analysis. In SNA, the ‘popularity’ of nodes in networks has been successfully captured by the notion of ‘centrality’ and respective metrics[17]. Centrality in graph theory and network analysis is a quantification of the relative importance of a vertex within the graph. We model the network as an undirected weighted graph G(V,E) and let σ(j,k) be the number of shortest (j,k) paths and σ(j,k|i) the number of those paths that traverse node i. The shortest path betweenness centrality (SPBC)[18] c(i) of node i is then defined as the sum, over all node pairs (j,k), of the ratio of σ(j,k|i) to σ(j,k).

\begin{equation} c(i) = \sum_{\forall j,k} \frac{\sigma(j,k \mid i)}{\sigma(j,k)} \tag{1} \end{equation}
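As an illustration of Equation 1, the following minimal Python sketch computes SPBC on a small undirected weighted graph. It is not the implementation used in SAViNE; the function names and the adjacency-dictionary representation are assumptions made for this example. It further assumes positive integer edge weights (so that path lengths can be compared exactly), excludes the endpoints j and k from the nodes credited for a pair, and counts each unordered pair once.

```python
import heapq
from collections import defaultdict

def shortest_path_counts(graph, source):
    """Dijkstra from `source` on graph = {u: {v: weight}} (positive weights).
    Returns (dist, sigma): distances and number of shortest paths per node."""
    dist = {source: 0}
    sigma = defaultdict(int)
    sigma[source] = 1
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, w in graph[u].items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                sigma[v] = sigma[u]      # strictly shorter: restart the count
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                sigma[v] += sigma[u]     # another shortest path reaching v
    return dist, dict(sigma)

def spbc(graph):
    """c(i) = sum over pairs (j, k) of sigma(j,k|i) / sigma(j,k)."""
    nodes = list(graph)
    info = {s: shortest_path_counts(graph, s) for s in nodes}
    c = {i: 0.0 for i in nodes}
    for a, j in enumerate(nodes):
        dist_j, sigma_j = info[j]
        for k in nodes[a + 1:]:          # each unordered pair once
            if k not in dist_j:
                continue                 # j and k are disconnected
            for i in nodes:
                if i == j or i == k or i not in dist_j:
                    continue
                dist_i, sigma_i = info[i]
                # i lies on a shortest (j,k) path iff d(j,i) + d(i,k) = d(j,k)
                if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                    c[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return c
```

On a three-node path a-b-c, only the middle node b lies on a shortest path between the other two, so it is the only node with a non-zero score. With real-valued weights, the equality test would need a tolerance.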

For the sake of completeness, a brief description of the SAViNE heuristics introduced in[2] is provided.

2.1 Surrogate placement

SNA inspired - virtual surrogate placement heuristic

The SNA Inspired Virtual Surrogate Placement (SNA-VSP) is a sub-optimal algorithm, inspired by resource mapping algorithms in virtual networks[19]. The number and location of surrogates are optimized with regard to CDN deployment cost and SPBC, while taking into consideration quality of service (QoS) constraints. CDN deployment cost comprises update (upload), storage, and retrieval (download) costs (e.g.,[20]) in the cloud environment. For a surrogate that serves user requests, its update cost is incurred by incoming traffic at the node in case of cache miss, and its retrieval cost is incurred by outgoing traffic to serve user requests. In the case that the server relays traffic to other surrogates (transit node), the retrieval cost is incurred by traffic to provision these servers[9]. With regard to QoS, the maximum (routing) distance between the edge server and the end user is usually considered, as it captures the communication quality between the two nodes and can be measured by either hop count or delay. Alternatively, as in this particular case, geographic distance can be used as an indicator of delay[2]. The algorithm is executed in two phases: (i) the surrogate server selection phase and (ii) the content distribution path selection phase.
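The three cost components described above can be illustrated with a small Python sketch. The linear per-unit cost model, the function name `surrogate_cost`, and its parameters are assumptions made for this example only; the exact cost formulation is the one given in[2].

```python
def surrogate_cost(replica_gb, requested_gb, miss_ratio, unit):
    """Hypothetical linear cost of one surrogate serving end users.

    replica_gb   -- volume of content stored at the surrogate
    requested_gb -- volume requested by the end users it serves
    miss_ratio   -- fraction of requested volume not found in the cache
    unit         -- per-unit site costs: {"upload", "storage", "download"}
    """
    # Update cost: incoming traffic pulled into the node on cache misses.
    update = unit["upload"] * requested_gb * miss_ratio
    # Storage cost: content kept at the node.
    storage = unit["storage"] * replica_gb
    # Retrieval cost: outgoing traffic sent to serve user requests.
    retrieval = unit["download"] * requested_gb
    return update + storage + retrieval
```

For instance, with unit costs of 1, 2, and 3 for upload, storage, and download, a 10 GB replica, 100 GB of requests, and a 25% miss ratio, the three terms are 25, 20, and 300.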

Surrogate selection

We consider the joint set of potential surrogate/transit and origin server(s), along with their communication links, as an undirected physical network graph. The set of N end users is provided, where each user is associated with (i) a specific location determined by geographical coordinates, (ii) content request rate, and (iii) the size(s) of the request. The physical network graph is augmented by introducing one pseudo-node for each end user, having the same properties (e.g., coordinates, etc.). For each pseudo-node, a cluster is created, with a radius that matches the maximum QoS distance that the corresponding end user can have from a surrogate server. Each pseudo-node is connected with infinite bandwidth to the physical network graph nodes within its cluster, creating a set of pseudo-edges. These pseudo-edges are added to the edge set of the augmented graph.
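The augmentation step can be sketched as follows. This is a minimal illustration rather than the SAViNE code: the data layout, the function names, and the use of the haversine great-circle distance as the geographic distance are all assumptions made for the example.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def augment(node_coords, edges, users):
    """Add one pseudo-node per end user to the physical network graph.

    node_coords -- {node: (lat, lon)} for physical nodes
    edges       -- {(u, v): bandwidth} for physical links
    users       -- {user: {"coords": (lat, lon), "qos_km": float}}
    Returns (augmented nodes, augmented edges, clusters per user).
    """
    aug_nodes = dict(node_coords)
    aug_edges = dict(edges)
    clusters = {}
    for user, info in users.items():
        pseudo = ("pseudo", user)
        aug_nodes[pseudo] = info["coords"]   # same coordinates as the user
        # Cluster: physical nodes within the user's maximum QoS distance.
        clusters[user] = [n for n, c in node_coords.items()
                          if haversine_km(info["coords"], c) <= info["qos_km"]]
        for n in clusters[user]:
            aug_edges[(pseudo, n)] = math.inf  # infinite-bandwidth pseudo-edge
    return aug_nodes, aug_edges, clusters
```

A user whose cluster turns out empty cannot satisfy its QoS distance with any candidate node, which a caller would have to detect before the placement problem is solved.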

An end user in the resulting CDN will be served by either (i) a surrogate server in its cluster, on the path to the origin server, or (ii) by the origin server in the case of cache miss, as a non-cooperative pull-based approach is used. Therefore, we define N origin-destination pairs between the origin server and pseudo-nodes in the augmented graph, where communication demands are defined by end-user request pattern (request rate and size) as well as cache misses. A mixed integer programming (MIP) minimum cost N-commodity flow problem is formulated. The objective is essentially to minimize the overall CDN deployment cost as discussed earlier and maximize the average SPBC of the selected set of surrogates, while the QoS distance requirement for end users is satisfied, ensuring that the capacity of the physical resources is not exceeded. The detailed description of the MIP problem formulation is provided in[2].

Solving the particular flow allocation problem results in assigning end users (pseudo-nodes) to surrogates (physical network graph nodes), thus selecting the appropriate set of surrogates (cloud sites) that will be utilized for the CDN deployment. Since MIP problems are in general NP-hard and hence computationally intractable, the optimal fractional solution is computed for the problem’s linear programming relaxation, in which the integer variables are allowed to take fractional values; its objective value is at least as good as that of the optimal integer solution. The relaxed problem can be solved by any suitable linear programming method in polynomial time. A rounding technique is then applied to obtain an integer solution of the MIP problem[19].
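To make the rounding step concrete, the sketch below shows one simple deterministic rounding of a fractional user-to-surrogate assignment. It is only an illustration of the general idea: the rounding technique actually used is the one referenced in[19], and the function name, the input layout, and the capacity check are assumptions made for this example. The fractional values would come from an LP solver, which is not shown.

```python
def round_assignment(fractional, capacity, demand):
    """Round a fractional assignment to an integral one.

    fractional -- {user: {surrogate: x}} with 0 <= x <= 1 (LP solution)
    capacity   -- {surrogate: available capacity}
    demand     -- {user: required capacity}
    Each user goes to the surrogate with the largest fractional value
    that still has enough capacity left.
    """
    remaining = dict(capacity)
    assignment = {}
    # Handle the users with the most decisive fractional values first.
    order = sorted(fractional, key=lambda u: -max(fractional[u].values()))
    for user in order:
        choices = sorted(fractional[user], key=lambda s: -fractional[user][s])
        for s in choices:
            if fractional[user][s] > 0 and remaining[s] >= demand[user]:
                assignment[user] = s
                remaining[s] -= demand[user]
                break
        else:
            raise ValueError(f"no feasible surrogate for {user}")
    return assignment
```

Because rounding can only restrict the relaxed solution, the resulting integer assignment is generally no better, in objective value, than the fractional optimum it was derived from.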

Content distribution path selection

Once the set of surrogates has been selected and the end user to surrogate assignment has been completed, we compute content distribution paths between the origin and the surrogate servers by using a shortest path algorithm that takes into consideration the capacity constraints of the underlying physical network. Communication demands in the physical network graph are defined by the end-user request pattern (request rate and size) as well as cache misses. The selected content distribution paths also determine the set of servers to be used for relaying traffic (transits) in the resulting CDN.
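A capacity-aware shortest path computation of this kind can be sketched as below. The sketch assumes Dijkstra’s algorithm over links whose residual capacity can carry the demand, followed by reserving that capacity along the chosen path; the function name and data layout are hypothetical and the paper does not state which shortest path algorithm is used.

```python
import heapq

def constrained_shortest_path(adj, capacity, src, dst, demand):
    """Shortest src-dst path using only links with enough residual capacity.

    adj      -- {u: {v: cost}} undirected adjacency with positive costs
    capacity -- {frozenset((u, v)): residual capacity}, updated in place
    Returns the path as a list of nodes, or None if no feasible path exists.
    """
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, w in adj[u].items():
            if capacity[frozenset((u, v))] < demand:
                continue                  # link cannot carry this demand
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    for u, v in zip(path, path[1:]):
        capacity[frozenset((u, v))] -= demand  # reserve capacity on the path
    return path
```

Intermediate nodes on the returned path correspond to the transit servers of the resulting CDN. Calling the function once per surrogate, in some chosen order, lets earlier reservations steer later paths away from saturated links.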

SNA inspired - greedy virtual surrogate placement heuristic

Cronin et al.[6] proposed the transit node heuristic for mirror placement on the Internet, where mirrors are placed on candidate hosts in descending order of out-degree. The heuristic was based on the assumption that nodes with the highest out-degrees can reach more nodes with lower latencies. Following the same rationale, in the SNA Inspired - Greedy Virtual Surrogate Placement Heuristic (SNA-GVSP) the out-degree metric is replaced by the SPBC metric. Specifically, the goal of SNA-GVSP is to greedily assign end users to surrogate servers, maximizing the average SPBC of the selected set of surrogates. Each end user is assigned to at least one surrogate server within its cluster, so that the QoS distance constraint is satisfied, while ensuring that the capacity of the physical resources is not exceeded. The detailed description of the server placement algorithm is provided in[2]. Subsequently, to identify the content delivery paths from the origin to the surrogate servers, a shortest path algorithm is applied, like the one in the content distribution path selection phase of SNA-VSP, taking into consideration the capacity constraints of the underlying physical network.
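The greedy idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm of[2]: users are processed in the order given, each is assigned to exactly one surrogate, and the names `greedy_placement`, `clusters`, `spbc`, `capacity`, and `demand` are hypothetical.

```python
def greedy_placement(clusters, spbc, capacity, demand):
    """Assign each end user to the highest-SPBC node in its cluster.

    clusters -- {user: [candidate nodes within the QoS distance]}
    spbc     -- {node: shortest path betweenness centrality}
    capacity -- {node: available capacity}
    demand   -- {user: required capacity}
    Returns (assignment, selected surrogates).
    """
    remaining = dict(capacity)
    assignment = {}
    for user, candidates in clusters.items():
        # Try candidates in descending order of SPBC.
        for node in sorted(candidates, key=lambda n: spbc[n], reverse=True):
            if remaining[node] >= demand[user]:
                assignment[user] = node
                remaining[node] -= demand[user]
                break
        else:
            raise ValueError(f"no feasible surrogate for {user}")
    return assignment, set(assignment.values())
```

Restricting the candidates to each user’s cluster enforces the QoS distance constraint, while the `remaining` bookkeeping enforces the capacity constraint.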

3 SAViNE framework: cloud CDN design, deployment, and operation

The SAViNE framework has been integrated with the OpenLab facilities, namely the w-iLab.t and PlanetLab testbeds, to conduct experimental validation of the aforementioned server placement techniques. CDN design, deployment, and operation are realized in three consecutive phases:

Offline planning phase: The selected server placement algorithm is refined with the necessary information such as (i) information regarding the testbed’s physical infrastructure subject to disclosure policies by testbed providers (e.g., node/site location) and (ii) CDN-related information provided by the user of the SAViNE framework (e.g., the content provider), such as geographical dispersion[21], request rate and requested traffic volume of end users[10], replica size(s), and QoS distance[9].
Real-time design phase: Taking into consideration information acquired during the offline planning phase, the server placement algorithm is executed, providing the actual CDN topology that will be deployed on the testbed(s). During this phase, predetermined evaluation metrics (denoted hereafter as offline metrics) are gathered. Offline metrics signify (i) the efficiency of the surrogate placement strategy and (ii) the impact of the SNA-inspired metrics on the CDN design.
CDN deployment and operation phase: During this phase, CDN deployment takes place using the results produced from the real-time design phase. The resulting CDN is fully operational and administrable by the user of the SAViNE framework. During the actual operation of the CDN, a set of predefined evaluation metrics (denoted hereafter as online metrics) are acquired. These are crucial for the evaluation of the deployed CDN in terms of its performance in delivering content to CDN end users.

3.1 Architectural overview

The SAViNE framework is implemented as a loosely coupled set of components. Four main entities interact with each other during CDN design, deployment, and operation, namely the SAViNE GUI/CDN design engine, the SAViNE module, the CDN software, and the online measurement tool. The reference architecture is provided in Figure 1.

3.1.1 SAViNE GUI/CDN design engine

The CDN design engine is responsible for the offline planning and real-time design phase. The core of this component is based on the discrete event Java-based simulator, called Simulator for Controlling Virtual Infrastructures (CVI-SIM)[22]. CVI-SIM was initially used to evaluate the performance of the proposed server placement heuristics[2]. It provides an extensible simulation environment that was developed to facilitate research on the control of (virtualized) infrastructures. CVI-SIM acts also as an emulator since it is designed to support importing actual resource specification files (e.g., GENI v3 RSpec[23]).

In the context of SAViNE, CVI-SIM facilitates the execution of the proposed SAViNE heuristics based on the input parameters set by the user of the SAViNE framework and provides as output the CDN overlay to be deployed, along with CDN deployment related information and offline metrics, in distinct XML files. The set of input parameters to the SAViNE heuristics includes (i) RSpec advertisement files providing details on the location of wireless nodes and their functional properties, (ii) the set of nodes to be used as potential surrogates, (iii) the node to be used as origin server (by default, the node closest to the center of mass of end users[2]), (iv) the content to be distributed, (v) QoS distance, and (vi) site-related unit costs (storage/downloading/uploading).
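The default origin selection mentioned in item (iii) can be illustrated as follows. The sketch assumes planar (x, y) coordinates, an unweighted center of mass (every end user counts equally), and Euclidean distance; the function name and inputs are hypothetical and do not reflect the CVI-SIM API.

```python
import math

def default_origin(node_coords, user_coords):
    """Return the node closest to the center of mass of the end users.

    node_coords -- {node: (x, y)} candidate node positions
    user_coords -- list of (x, y) end-user positions
    """
    # Unweighted centroid of the end-user positions.
    cx = sum(x for x, _ in user_coords) / len(user_coords)
    cy = sum(y for _, y in user_coords) / len(user_coords)
    # Pick the candidate node with the smallest Euclidean distance to it.
    return min(node_coords, key=lambda n: math.dist(node_coords[n], (cx, cy)))
```

If end users differ widely in request rate, a centroid weighted by request rate would be a natural variant, although the source does not say whether such a weighting is applied.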

Via the SAViNE graphical user interface (GUI), the user may initiate each one of the three phases of the experiment as described in the previous section.

3.1.2 SAViNE module

The SAViNE module (Figure 1) is the control entity that initiates the overall experimental process and orchestrates the CDN deployment and operation phase. It is developed as a custom shell script, deployed on a single node per testbed (e.g., the experiment controller (EC) for OMF-based testbeds such as w-iLab.t). The module retrieves from the CDN design engine all necessary information (e.g., CDN topology, etc.) to deploy the CDN overlay over the testbed and set up its operation. The SAViNE module configures each CDN node based on its corresponding role(s) in the experiment (e.g., origin, transit, surrogate, etc.) and transfers the content to the origin server. It is also responsible for the aggregation of online metrics, communicating with each of the surrogate servers at the end of the CDN deployment and operation phase. The pseudo-code of the script is provided in Algorithm 1.

3.1.3 CDN software

To facilitate the automated deployment of the content delivery scenario, OpenCDN[24] (v0-7-7) software is used for the deployment of the application-level overlay content delivery network. In the OpenCDN terminology, an appropriately defined set of nodes act either as edge servers (cache/surrogates) or as transit servers - following a cooperative pull-based approach for content outsourcing - under the control of a centralized module named Request Routing and Distribution Manager (RRDM). RRDM orchestrates media distribution among CDN nodes and reports to an end-user portal the node address to which a viewer request should be routed. Origins provide content and publish the metadata describing it to the portal. For the purposes of SAViNE, the origin and RRDM roles coincide on a single node. Either Apple’s open source Darwin Streaming Server (v6.0.3) or VLC streamer Goldeneye is used for streaming content, depending on the testbed involved, and VLC media player is selected as the default client.

3.1.4 Online measurement tool

Apart from the OpenCDN software, the online measurement tool (OMT) is deployed at the edge servers. OMT periodically retrieves (i) server CPU load and throughput values and (ii) latency and cache hit ratio values from the OpenCDN log files. Moreover, it captures the link quality for each of the CDN end users the surrogate serves. OMT is also implemented as a custom shell script.

4 Experiment and performance evaluation

The goal of the SAViNE framework is to establish an efficient content distribution overlay in a wireless environment. The experimental evaluation is essentially conducted during the real-time design phase and the CDN deployment and operation phase. For the sake of comparison, apart from the SNA-inspired heuristics, the greedy site (GS) algorithm[9] is also used as a server placement technique.

4.1 Performance evaluation metrics

As defined in the previous section, performance evaluation metrics are classified as offline metrics and online metrics, used for the evaluation of the real-time design phase and the CDN deployment and operation phase, respectively.

Offline metrics, depicted in Table 1, are used to evaluate the efficiency of the server placement techniques. Specifically, the techniques can be compared on the basis of the CDN deployment cost and mapping cost, the number of surrogate servers selected, and other relevant QoS/geographical proximity-related metrics, e.g., the path length from the end user to the origin server of the CDN. Moreover, offline metrics are used to quantify the impact and effectiveness of the adopted SNA-inspired algorithms on the CDN design (e.g., average SPBC, the number of surrogate servers selected, etc.).

Table 1 Offline evaluation metrics
