Open Access

P2P and grid computing: opportunity for building next generation wireless multimedia digital library

  • Srinivasan Arulanandam1,
  • Suresh Jaganathan2 and
  • Damodaram Avula3
EURASIP Journal on Wireless Communications and Networking 2012, 2012:165

https://doi.org/10.1186/1687-1499-2012-165

Received: 3 February 2012

Accepted: 10 May 2012

Published: 10 May 2012

Abstract

Digital libraries have become a major source of information, shared across the globe in education, research and knowledge work. Their full potential will be realized only when people can access the material from any location. The advantage of multimedia is that people of all ages often understand material more clearly by seeing or hearing it than by reading it. Given the rapid growth of the supporting technologies, developing a wireless multimedia digital library is no longer a complicated task. Grid computing enables the virtualization of data resources, processing, network bandwidth and storage capacity to create a single system image, granting the user seamless access to vast IT capabilities. By adopting peer-to-peer overlay networks, which are taking a central position in information systems, the storage space problem can be solved, and by using grid computing, security can be maintained. In this article, we propose a framework for a wireless multimedia digital library built using Grid and P2P technology. In the proposed framework, digital data is stored in a cluster built of commodity components, and users can access the data securely from anywhere at any time. The advantages of this framework are: (i) existing capital investments are used for storing the multimedia files, (ii) increased access to data, (iii) balanced workloads among the systems connected as pGrid nodes, and (iv) authenticated and secured transfer of files. The benchmarks used to test this framework are: (i) file size vs. download time, (ii) simultaneous connections, (iii) bandwidth utilization, (iv) security, (v) scalability, and (vi) robustness.

Keywords

grid computing; peer-to-peer networks; wireless networks; multimedia data; digital library

1 Introduction

In today's academic libraries, learning resources in the form of audio and video make up a significant part of the collection. New strategies for collecting and maintaining material have replaced the traditional methods of the library profession, which are no longer acceptable given the strong demand for newer forms of service. Libraries that are unfamiliar with multimedia are therefore forced to become familiar with all relevant and currently popular multimedia formats. Interactive and multimedia learning resources are one of the most rapidly changing and exciting areas of education today. Recent entrants are computer-based training (CBT) and Web-based training (WBT) materials, especially interactive multimedia programs that run on personal computers. These new technologies give students, teachers and researchers access to materials that were not available to them before.

Multimedia can deliver large amounts of information in many ways, and it is manageable, approachable, and useful. The integration of multimedia programmes into libraries and classrooms promises to change not only the kinds of information available for learning, but also the way in which learning takes place. Multimedia data, being a non-textual knowledge resource, is harder to serve to users and depends entirely on a host of gadgets. The current environment therefore needs the latest digital technologies for building wireless multimedia digital libraries [1].

Today's information services offer libraries considerable power and visibility, and help the user community without compromising on quality or performance. Developments in media-related technologies, such as format migrations and resource-intensive maintenance, have added complexity to building an effective wireless digital library.

A peer-to-peer (P2P) computer network relies on diverse connectivity between the participants in a network and on their cumulative bandwidth. P2P networks are typically used for connecting large numbers of nodes via ad hoc connections, and such networks are useful for many purposes. A pure P2P network has no notion of clients or servers; it has only equal peer nodes that simultaneously function as both "clients" and "servers" to the other nodes on the network. This arrangement differs from the client-server model, where communication is usually to and from a central server. Grid computing [2, 3] refers to a collection of distributed resources that are shared among a group of users. It schedules and coordinates resources to offer a diverse collection of services over a network of connected devices, and it is a focus of both research institutions and industry.

Grid researchers are also looking at P2P architectures for data storage, and a number of experimental systems have been proposed and are being developed [4]. The need for massive data storage is not a recent problem, and with the growing use of large volumes of information it has become more relevant for institutions. Like large organizations, institutions have to maintain a large number of multimedia data files. As a result, valuable expertise has been built up, leading to the present initiative for building a digital library using P2P and Grid computing.

The proposed system serves three main purposes. The first and foremost is to use the latest technology to build an architecture that addresses the specific problem of massive storage for a large collection of digital and digitized items. The second is not only to resemble a traditional library, but also to make use of the available storage space and resources that can be reserved for it. The third is to ease access to multimedia data from anywhere. The proposed system provides an interface for operations such as browsing, searching and sharing, along with the traditional library services. The main services that this system is expected to assure are the storage, search, fast retrieval, and preservation of digital resources over wireless networks in a secure manner.

The rest of the article is organized as follows. Section 2 highlights some work related to digital libraries. Section 3 describes the proposed system for the digital library. Section 4 presents the design details. Section 5 explains the components available in it. Section 6 reports the experimental results and Section 7 concludes the article.

2 Related work

Traditionally, libraries have been the most important source of knowledge for people, so library managers have constantly sought ways to promote their services. As the internet grows and knowledge explodes, the concept of digital libraries has been widely accepted and a lot of work has been devoted to this field [5-7]. This section reviews some of the work done on digital libraries using technologies such as P2P and Grid computing.

2.1 Digital libraries using P2P

Designing digital libraries on the P2P paradigm makes it possible to perform collaborative searches among the peers in a decentralized manner. Centralized search engines, by contrast, lack the following: (i) democratic community search, (ii) implicit user feedback, (iii) user recommendations, (iv) replication, (v) caching and proactive dissemination, (vi) overlap-aware query routing, and (vii) adequate cost optimization.

To overcome these shortcomings of centralized search engines and to support collaborative search, a Chord-style peer-to-peer overlay network [8] was designed. It connects an a priori unlimited number of peers. Each peer posts a small amount of metadata to a conceptually global, but physically distributed, directory. This directory is used to select efficient peers to execute a query based on their local data.

Recent P2P systems, such as Chord [9], CAN [10], Pastry [11], and P-Grid [12], are based on various forms of distributed hash tables (DHTs) and support key-to-peer mappings in a decentralized manner, such that routing scales well with the number of peers in the system.
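The key-to-peer mapping that DHTs provide can be illustrated with a minimal sketch. It assumes a single in-memory ring and a SHA-1 based position function; the class name HashRingSketch and its methods are hypothetical and do not come from Chord, CAN, Pastry or P-Grid. It shows only the placement rule (a key belongs to the first peer at or after its position on the ring), not the decentralized routing that makes real DHTs scale.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.TreeMap;

/** Illustrative only: peers and keys are hashed onto one shared identifier ring. */
final class HashRingSketch {
    // Ring position -> peer name, kept sorted so the successor is a ceiling lookup.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    /** Maps a string to a non-negative ring position using the first 8 bytes of its SHA-1 digest. */
    static long position(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-1")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long v = 0;
            for (int i = 0; i < 8; i++) {
                v = (v << 8) | (d[i] & 0xFF);
            }
            return v & Long.MAX_VALUE; // clear the sign bit
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    void addPeer(String peer) {
        ring.put(position(peer), peer);
    }

    void removePeer(String peer) {
        ring.remove(position(peer));
    }

    /** Returns the peer responsible for a key: the first peer at or after the key, wrapping around. */
    String lookup(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no peers on the ring");
        }
        Long pos = ring.ceilingKey(position(key));
        return ring.get(pos != null ? pos : ring.firstKey());
    }
}
```

With this placement rule, a peer joining or leaving only affects the keys adjacent to it on the ring, which is one reason DHT-based systems cope well with changing membership.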

PlanetP [13] is a "publish" and "subscribe" service for P2P communities. It is the first system to support content-ranking search. PlanetP distinguishes between local indexes and a global index that describes all peers and their shared information. The global index is replicated using a gossiping algorithm. The system, however, is limited to a few thousand peers.

Odissea [14] assumes a two-layered search engine architecture with a global index structure distributed over the nodes in the system. A single node holds the entire index for a particular text term (i.e., keyword or word stem). Query execution uses a distributed version of Fagin's threshold algorithm [15]. The system appears to cause high network traffic when posting document metadata into the network, and the query execution method as currently presented seems limited to queries with one or two keywords.

In [16], collaborative search across a large number of digital libraries and query routing strategies in a peer-to-peer (P2P) environment are addressed. Both digital libraries and users are viewed equally as peers and thus as part of the P2P network. This system provides a versatile platform for a scalable search engine, combining the local index structures of autonomous peers with a global directory based on a distributed hash table (DHT) as an overlay network.

The P2P information retrieval framework [17] is designed for distributed search applications. A digital library is deployed over P2P, and an article or book is searched for using a search key that the sharing peer assigns when sharing it. The framework consists of peer nodes that agree to share their resources by joining the network. While joining the network, these nodes construct the active peer list. The files are distributed over the P2P network based on the keywords. A search request can be initiated at any peer in the network. The request is propagated incrementally across the nodes with the aim of finding the best node. The results obtained for each keyword are aggregated, and the final result is listed at the node that initiated the search.

It is practically impossible for a single library to collect all the books in the world and do all the work by itself, so libraries must collaborate. InterLibrary Cooperation (ILC) serves this purpose [18]. It is designed to satisfy the information requests of users worldwide. In brief, ILC means that two or more libraries cooperate to share their resources, which might be books, journals, catalogs, etc. In [18], the authors propose an InterLibrary Cooperation framework for digital libraries using P2P technology and present an application of this framework to a Faculty Publication Sharing System. In addition, a reputation model based on data mining is used to give libraries incentives to join the framework. Table 1 presents a summary of digital libraries developed using the P2P concept.
Table 1

Summary of digital libraries developed using P2P

| Name | Contributed by | Year | Concept | Remarks |
| --- | --- | --- | --- | --- |
| CHORD | Balakrishnan et al. | 2001 | 1. Decentralized; 2. Unlimited peers | Metadata stored in a centralized manner |
| PlanetP | Nguyen et al. | 2002 | 1. Publish and subscribe model; 2. Supports content search | Limited number of peers |
| ODISSEA | Shanmugasunderam et al. | 2003 | 1. Two-layered search engine architecture; 2. Uses Fagin's algorithm for query search | 1. High network traffic; 2. Query is limited to one or two words |
| ILC | Shian-Shyong Tseng et al. | 2006 | Two or more libraries collaborate to share resources | Data mining concepts make the system harder to understand |
| IRF | Renuga and Sudhasadasivam | 2009 | Search is based on keywords | Search time is increased by the incremental search fashion |

2.2 Digital libraries using grid computing

Data volumes are increasing rapidly. With the growing size and integration of digital libraries, various challenges arise in this field, including: (i) resource discovery, (ii) standardization of interfaces, (iii) digital library administration, (iv) copyright and licensing, and (v) cost optimization.

Access to global literature, books, and articles requires efficient data management and querying techniques. The requirements for building a digital library are: (i) very large storage resources, (ii) complex queries, (iii) interoperability, and (iv) scalability across a global environment. The integration of grid, data grid and digital library technologies addresses various issues raised by the globalization of digital libraries. In [19], Joshi and Jakharia propose a Grid-based digital library concept and examine the synergies between these data management systems, which would help the future evolution of digital libraries.

In the 21st century, accumulated knowledge has grown to a tremendous size, and the number of books available in every field keeps increasing. In [20], Yani and Ho use advances in Internet and information technology to build a new knowledge transfer model for digital contents such as books, literature and data. For this, the authors propose a Global Digital Library Grid concept. This digital library organization can flexibly include new members (new libraries or new museums) without affecting the system and framework of the original Grid members. As the organization grows, it will form an enormous Virtual Grid Digital Library that provides readers with a speedy knowledge service meeting their individual requirements.

Building an effective digital library requires a vast amount of storage space and a large investment in storage hardware. Emerging trends in distributed computing have brought new solutions to the existing storage problem: Grid computing proposes a distributed approach to data storage. In [21], Fei et al. introduce a Grid-based system (ARCO) developed for multimedia storage of large amounts of data. The system is being developed for the National Library of Portugal. Using a Grid information system and resource management, they develop a transparent system in which terabytes of data are stored in a Beowulf cluster built of commodity components, with a backup solution and error recovery mechanisms. Table 2 presents a summary of digital libraries developed using the Grid computing concept.
Table 2

Summary of digital libraries developed using Grid Computing

| Name | Contributed by | Year | Concept | Remarks |
| --- | --- | --- | --- | --- |
| ARCO | Han Fei et al. | 2004, 2005 | 1. Developed for the National Library of Portugal; 2. Beowulf cluster concept is used | Cluster maintenance |
| Grid digital library organization (DLO) | Chao-Tung Yani et al. | 2005 | A DLO is created instead of a VO | Problem persists when more DLOs are created |
| Grid DL | Hardik Joshi et al. | 2006 | Examines the concept of using Grid in DL | Theoretical explanation; implementation not available |

3 Proposed architecture

The goal of the proposed system is to provide a manageable, transparent layer for storing and accessing large amounts of multimedia data. Figure 1 depicts the proposed system architecture. It contains five layers: (i) layer 1, the network fabric layer, which lays the foundation for communication between heterogeneous peers or nodes; (ii) layer 2, the P2P middleware (JXTA) layer, which controls the connected peers; (iii) layer 3, the Grid middleware (Globus) layer, which is used for secure file transfer and takes care of data storage; (iv) layer 4, which holds the components of the proposed system; and (v) layer 5, a sample end user node, which can access services such as sharing, browsing, searching and downloading multimedia data.
Figure 1

System architecture.

The high-level interface (layer 5) interacts with the lower abstractions (layer 4) throughout the process. Layer 5 provides the graphical interfaces for pGrid node formation, data storing, browsing and downloading by and for the users. Layer 4 contains the building blocks of the proposed system: the digital library access interface (DLAI), integrated system management (ISM), data retrieve management (DRM) and pGrid node management (PNM). Layer 3 describes the interaction of the grid middleware interfaces with the proposed system components (layer 4) for downloading multimedia files, i.e., transport and data storage. Layer 2 uses the P2P middleware, which collects, connects and maintains all pGrid nodes in the network. Layer 1 forms the network communication for heterogeneous peers or nodes.

3.1 P2P layer

The proposed system uses the P2P middleware called Juxtapose, or JXTA. JXTA technology [22] is a network programming and computing platform designed to solve a number of problems in modern distributed computing, especially in peer-to-peer (P2P) computing. JXTA defines a common set of protocols for building P2P applications and thereby addresses the recurrent problem of existing P2P systems creating incompatible protocols.

JXTA is an open network computing platform. It provides the building blocks and services required to enable application connectivity for anything, anywhere. JXTA's building blocks help in building applications for client-server, web-based or distributed computing models. JXTA has a common set of open protocols backed by open source reference implementations for developing peer-to-peer applications. The JXTA protocols standardize the manner in which peers can i) discover each other, ii) self-organize into peer groups, iii) advertise and discover network resources, iv) communicate with each other and v) monitor each other. In total, it consists of six protocols that support core P2P operations such as peer discovery, organization, identification and messaging. It is also independent of the programming language, operating system, network topology and underlying communication protocol. The JXTA protocol implementations have undergone a series of changes aimed at improving their performance, scalability, and reliability. JXTA has the following advantages: i) interoperability, ii) platform independence and iii) ubiquity. It empowers end points (peers) by providing a unique addressing scheme (ID). Using these unique IDs, peers can migrate across physical networks, changing transports and network addresses, and remain addressable by other end points even when they are temporarily disconnected [23].

3.2 Grid middleware layer

Grid computing has been an active research area for several years, and several systems exist that utilize functional computational grids. The most notable of these are the NASA Information Power Grid [24] (run on the Globus toolkit) and the new grid being constructed for analyzing data from the Large Hadron Collider project at CERN [25]. The introduction of computational grids has given developers a considerable number of extra problems to overcome in order to make them work correctly and reliably, and has led to new middleware being built besides Globus, which is widely used. The Globus toolkit [26] is designed to enable people to create computational grids. It has been developed at the Argonne National Laboratory, Illinois, USA. It is an open source initiative aimed at creating grids capable of large-scale computing. As it is an open source project, anyone can download the software, examine it, install it and hopefully improve it. Through this constant stream of comments and improvements, new versions of the software can be developed with increased functionality and reliability. The Globus toolkit itself is made up of a number of components. Figure 2 shows the three main components of the Globus toolkit: (i) resource management (GRAM), (ii) information services (MDS), and (iii) data management (GridFTP). The Grid security infrastructure acts as a base for the three components [27].
Figure 2

Globus toolkit components.

Resource management. The resource management component provides support for i) resource allocation, ii) submitting jobs for remote execution and receiving the results, and iii) managing job status and progress.

Information services. The information services component provides support for collecting information in the grid and for querying this information, based on the Lightweight Directory Access Protocol (LDAP).

Data management. The data management component provides support for transferring files among machines in the grid and for managing the transfers in a secure manner.

The design of the toolkit is very modular and has been developed in a way that makes alterations and improvements easier, with less impact on connected components. The toolkit is written in the C programming language and the source is available for download. It is designed to work on a number of platforms, predominantly Linux, with limited support for Microsoft platforms. So far Globus has been a lead contender in the development of grid computing and is currently the only major effort with open source availability. The toolkit is designed to work in research environments, predominantly as an impetus to be redesigned and improved.

3.3 UDT protocol

Layer 3 plays a vital role in the proposed system. It incorporates the grid architecture in two ways: transfer and data storage. Transfer of multimedia data among pGrid nodes is done using the UDT protocol [28, 29]. UDT is a UDP-based approach, and it is reported to be the only UDP-based protocol that applies a congestion control algorithm targeting shared networks. It is an emerging application-level protocol with user-configurable control algorithms and a powerful extended API. UDT is a transport protocol with its own reliability control and flow/congestion control, built on top of UDP. It offers both a reliable data streaming service, similar to TCP, and partially reliable messaging. Applications use a UDT socket to transfer their data, which is passed on to the UDP socket. Figure 3 shows UDT in the application layer, above UDP.
Figure 3

Layer architecture (UDT).

An application exchanges its data through a UDT socket, which in turn uses a UDP socket to send or receive the data. Memory copying between the UDT socket and the UDP socket is bypassed in order to reduce processing time. The application can also supply a customized congestion control (CC) scheme.

3.4 Proposed framework components

The proposed framework components are divided into four layers: DLAI, ISM, DRM, and pGrid node monitorization (PNM). The operations carried out by users fall under these layers, and each layer contains sub-layers according to the type of tasks allowed to the users.

3.4.1 Digital library access interface (DLAI)

The proposed system can be abstracted into different levels from several viewing angles. First, the whole concept of the wireless digital library can be broken down into peer grid nodes and multimedia data. Second, actions can originate from groups and from users, and can be further divided into basic operations and final operations, which form objects at different abstraction levels of the system. This interface can be accessed by both the system administrator and other users. It offers several operations on digital data, such as store, browse, copy, and retrieve.

3.4.2 Integrated system management (ISM)

The ISM layer contains two sub-layers, Node Configuration and User Operations. The system administrator is responsible for performing the tasks concerning "pGrid Nodes" and "Users".

The Node Configuration sub-layer is where pGrid nodes are configured. Node configuration is handled by the P2P middleware JXTA, which discovers peers and their neighbors. Each peer is an authenticated peer. When a user starts the system, the peer is added and shown in the peer lists of the other peers.

In the User Operations sub-layer, the user can perform the operations available to system users, namely i) share and ii) retrieve.

3.4.3 Data retrieve management (DRM)

DRM incorporates a data grid architecture, which integrates data storage devices and data management services. Various Grid middleware packages are available for implementing a Grid environment, such as Globus, Legion, and Unicore. Since Globus is widely used and portable across open source systems, the Globus Toolkit is adopted in our system. It provides solutions for security, resource management, data management and information services. The storage system is a basic data grid component and supports various file systems [30]. The data access service is a mechanism for accessing, managing and transferring data in the storage system [31]. Resource management is the part of the data grid architecture that is responsible for the storage system, networks and other data grid resources. It assures end-to-end efficiency, technical assessment of the efficiency tests, and the availability of crucial resources. The Grid Security Infrastructure provides authorization and certification mechanisms for a large number of users.

3.4.4 pGrid node monitorization (PNM)

Monitoring is performed on each pGrid node. This layer analyzes node performance and behavior over time in order to raise alarms to the administrator. The main goal is to prevent system failures by warning the administrator and to inform users when a node is unavailable. It presents information about the actual state of the peers.
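One common way to detect unavailable nodes is a heartbeat timeout. The following is a minimal sketch under that assumption; the article does not state how PNM detects failures, and the class NodeMonitorSketch, its methods and the timeout parameter are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: flags nodes whose last heartbeat is older than a timeout. */
final class NodeMonitorSketch {
    private final long timeoutMillis;
    // Node ID -> time (in milliseconds) of the most recent heartbeat.
    private final Map<String, Long> lastSeen = new HashMap<>();

    NodeMonitorSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records that a node reported in at the given time. */
    void heartbeat(String nodeId, long nowMillis) {
        lastSeen.put(nodeId, nowMillis);
    }

    /** Returns the sorted IDs of nodes that have been silent for longer than the timeout. */
    List<String> unavailableNodes(long nowMillis) {
        List<String> down = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                down.add(e.getKey());
            }
        }
        Collections.sort(down);
        return down;
    }
}
```

The list returned by unavailableNodes could drive both the administrator alarm and the non-availability notice shown to users; passing the clock value in explicitly keeps the sketch easy to test.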

4 Design Details

Figure 4 depicts the design details of the proposed framework. It describes the layers of the framework: i) a low-level interface, which establishes the connection with underlying layers such as the Globus and JXTA middleware, ii) a functionalities layer, which contains the code through which the user can share, search and download files, and iii) a user interface layer, which represents the users (pGrid nodes).
Figure 4

Design details.

The framework provides the user with an interface to access the digital library from anywhere. To access the digital library, the user goes through the following steps: i) the user-related data interface checks whether the user is authenticated; ii) the user is asked to enter the key (generated by the system) and sets a username and password for his pGrid node; iii) the function call control interface finds all neighboring pGrid nodes and lists them on the user's screen. The pGrid nodes are the basic units of data storage and also the basic resource-discovering entities.

Figure 5 depicts the sequence diagram for searching for a file using the proposed framework. Consider two pGrid nodes, A and B: A has shared a file, and B wants to download and use it. The sequence of steps to achieve this is as follows:
Figure 5

Sequence diagram for file search.

  • Step 1. User A starts his system and logs in.

  • Step 2. After login, User A sends a broadcast message to establish his presence.

  • Step 3. The server monitors User A, receives his ID and stores it.

  • Step 4. User A shares a file and sends the file ID [key] to the server.

  • Step 5. The server updates the details.

  • Step 6. User B starts his system and logs in.

  • Step 7. After login, User B sends a broadcast message to establish his presence.

  • Step 8. The server monitors User B, receives his ID and stores it.

  • Step 9. User B searches for a file.

  • Step 10. The server gets the search string and searches for the file in the details of all nodes.

  • Step 11. The server finds the string in Node A (User A).

  • Step 12. The server sends the ID of User A to User B.

    Note: If the file is shared by more than one user, the server sends all the IDs to User B.

  • Step 13. User B selects the ID of User A and downloads the file directly from User A.
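The server-side bookkeeping in Steps 3 to 12 can be sketched as a small in-memory index. This is an illustration under stated assumptions, not the authors' implementation: the class DirectorySketch and its methods are hypothetical, and matching the search string as a case-insensitive substring of the file key is an assumption, since the article does not specify the matching rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative only: the server maps node IDs to the keys of the files they share. */
final class DirectorySketch {
    // Node ID -> search keys of the files that node currently shares.
    private final Map<String, Set<String>> sharedKeys = new LinkedHashMap<>();

    /** Steps 2-3 and 7-8: a node announces itself and the server stores its ID. */
    void register(String nodeId) {
        sharedKeys.putIfAbsent(nodeId, new LinkedHashSet<>());
    }

    /** Steps 4-5: a registered node publishes the key of a file it shares. */
    void share(String nodeId, String fileKey) {
        Set<String> keys = sharedKeys.get(nodeId);
        if (keys == null) {
            throw new IllegalStateException("unknown node: " + nodeId);
        }
        keys.add(fileKey.toLowerCase(Locale.ROOT));
    }

    /** Steps 9-12: returns the IDs of all nodes sharing a key that contains the query. */
    List<String> search(String query) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> owners = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : sharedKeys.entrySet()) {
            for (String key : e.getValue()) {
                if (key.contains(q)) {
                    owners.add(e.getKey());
                    break; // list each node once, even if several keys match
                }
            }
        }
        return owners;
    }
}
```

The server returns only node IDs; the transfer in Step 13 then happens directly between the two peers, so the server never handles the file content itself.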

Figure 6 shows the objects and operations available in the proposed framework. Different operations can be applied to different objects. The smallest basic object is the digital object. For instance, the digital object description gives the details of a multimedia data item: object ID, size, name, the location of the data, the full path on the node together with its IP address, and the username of the user who executed the operation.
Figure 6

Objects and operations.
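The digital object description can be pictured as a small record. The sketch below is an assumption-laden illustration: the class DigitalObjectSketch, its field names, and the tab-separated line format are all hypothetical, since the article lists the attributes but does not define a file layout.

```java
/** Illustrative only: one digital object description, stored as a tab-separated line. */
final class DigitalObjectSketch {
    final String objectId;
    final String name;
    final long sizeBytes;
    final String nodeAddress; // IP address of the pGrid node holding the data
    final String fullPath;    // full path of the file on that node
    final String username;    // user who executed the operation

    DigitalObjectSketch(String objectId, String name, long sizeBytes,
                        String nodeAddress, String fullPath, String username) {
        this.objectId = objectId;
        this.name = name;
        this.sizeBytes = sizeBytes;
        this.nodeAddress = nodeAddress;
        this.fullPath = fullPath;
        this.username = username;
    }

    /** Serializes the description as one line; fields must not contain tab characters. */
    String toLine() {
        return String.join("\t", objectId, name, Long.toString(sizeBytes),
                nodeAddress, fullPath, username);
    }

    /** Parses a line produced by toLine(). */
    static DigitalObjectSketch fromLine(String line) {
        String[] f = line.split("\t", -1);
        if (f.length != 6) {
            throw new IllegalArgumentException("expected 6 fields, got " + f.length);
        }
        return new DigitalObjectSketch(f[0], f[1], Long.parseLong(f[2]), f[3], f[4], f[5]);
    }
}
```

A one-line-per-object text format like this is easy to append to and to inspect by hand, at the cost of having to forbid or escape the separator inside field values.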

From the viewpoint of the digital library abstraction, the pGrid node is not only a component of the storage volume; it is also the basic unit of the computing grid environment for storage operations, and a basic unit of grid computational information and resource discovery. The MDS in a pGrid node provides resource information, and the system uses this MDS information, such as the host's fully qualified domain name and IP address.

For every action (i.e., share, search, backup, etc.), once the operation has finished, its execution information (multimedia data file accessed, size, etc.) is recorded in the job information statistics database. This is done by the statistics module, which collects time benchmarks at different points of processing and updates the statistics database, maintained as a text file on every local pGrid node. In addition to this text file there are i) a data object description file, which contains the details of the multimedia data, and ii) a grid node description file, which contains the details of the nodes.

5 Implementation

To provide stable operations for the proposed framework, several functions are used. The first function, wmdl_add_node (Algorithm 1), initiates a pGrid node. It adds a multimedia data file to the peer grid node, and the user assigns a relevant search key to it. The function wmdl_search (Algorithm 2) searches or queries for particular data: the user types a relevant search key, and the system displays the matches with their details in the pGrid node search list.

try {
    // Initialize and log into the network
    netPeerGroup = PeerGroupFactory.newNetPeerGroup();

    // Check for availability
    if (available in netPeerGroup()) {
        login
    } else {
        // Create a new login
        newpGridlogin();
        buildpipes();
        createpGridID();
        broadcasteID();
    }

    // Obtain the services
    myDiscoveryService = netPeerGroup.getDiscoveryService();
}

Algorithm 1: Adds a new pGrid node

create_ShareObject();

// Create the share object
cms = new Share();

// Initialize the group in which the object is shared
cms.init(myGroup);

// Obtain the content manager used for file sharing
ContentManager contentManager = null;
contentManager = cms.getContentManager();

// Share files in the current JXTA network
// get_filesList();
if (list[i].isFile()) {
    // Share the file together with its checksum in the network
    // share_files();
    contentManager.share(list[i], checkSum.getFileSum(list[i]));
}

Algorithm 2: Shares a file in a group
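Algorithm 2 shares each file together with a checksum obtained from checkSum.getFileSum, whose implementation is not shown in the article. The following is a minimal sketch of what such a helper might look like, assuming a SHA-256 digest rendered as lowercase hexadecimal; the digest algorithm, the class FileSumSketch and its method names are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: computes a hexadecimal SHA-256 checksum of a file or byte array. */
final class FileSumSketch {
    private static MessageDigest newDigest() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    /** Checksum of an in-memory byte array. */
    static String sumOf(byte[] data) {
        return toHex(newDigest().digest(data));
    }

    /** Checksum of a file, read in chunks so that large media files are not loaded at once. */
    static String fileSum(Path file) {
        MessageDigest md = newDigest();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return toHex(md.digest());
    }
}
```

Publishing a checksum next to the file lets the downloading peer verify that the copy it received matches what was shared, and lets identical files offered by different peers be recognized as the same content.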

The wmdl_browse operation (Algorithms 3 and 4) updates the list of files available on all pGrid nodes in the network and displays it in the browse list along with the pGrid node names. The operation wmdl_cp is a low-level operation that simply copies data between two remote nodes. It can be performed only by the administrator and is triggered when a pGrid node is permanently removed from the group.

// set_myGroup();
myDiscoveryService = myGroup.getDiscoveryService();

// Create the listener object and initialize the peer advertisement
discoveryEvent(DiscoveryEvent event) {
    DiscoveryResponseMsg res = event.getResponse();
    PeerAdvertisement peerAdv = res.getPeerAdvertisement();
    if (peerAdv != null) {
        name = peerAdv.getName();
    }

    // search_pGridnodelist();
    PeerAdvertisement myAdv = null;
    Enumeration en = res.getAdvertisements();
    Vector peerList = new Vector();
    while (en.hasMoreElements()) {
        myAdv = (PeerAdvertisement) en.nextElement();
        peerList.addElement(myAdv.getName());
    }

    // update_pGridnodelist(): update the list of peer names
    updatePeerList(peerList);
}

Algorithm 3: Updates nodes in the group

The operation wmdl_rm removes data from a pGrid node. It can be initiated by the user and removes the data only from that node. The proposed system first makes a copy of the data on some other remote node by calling the wmdl_backup operation. In this function, the node on which the data is located is identified first, and the data is moved to a temporary delete directory. The data is then placed on a remote node, and its status is updated to deleted in the data object description file. All of these procedures are submitted as remote jobs to the grid system and take place on remote nodes.

Initialize_pGridnodelist();
// search_filename();
notifyMoreResults()
     searchResult = search_file;
for (int i = 0; i < searchResult.length; i++)
     append results;
// Update the searched files in the requested pGrid node
update_file_pGridnodelist();

Algorithm 4: Searches a file in Peer Group
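The key-based lookup behind sharing and searching (a user assigns a search key when sharing a file, and another user later types a key to find it) can be illustrated with a small in-memory index. This is only a sketch under stated assumptions: the class SearchIndexSketch, its methods, and the case-insensitive exact-match rule are hypothetical, and the article's system resolves queries across pGrid nodes rather than in a single local map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class SearchIndexSketch {

    // Maps a normalized search key to entries of the form "fileName @ nodeName".
    private final Map<String, List<String>> index = new HashMap<>();

    // Registers a shared file under the key chosen by the user.
    void share(String key, String fileName, String nodeName) {
        index.computeIfAbsent(normalize(key), k -> new ArrayList<>())
             .add(fileName + " @ " + nodeName);
    }

    // Returns every entry registered under the key, or an empty list if there is none.
    List<String> search(String key) {
        List<String> hits = index.get(normalize(key));
        if (hits == null) {
            return Collections.<String>emptyList();
        }
        return Collections.unmodifiableList(hits);
    }

    // Assumption: keys match case-insensitively and ignore surrounding whitespace.
    private static String normalize(String key) {
        return key.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        SearchIndexSketch idx = new SearchIndexSketch();
        idx.share("lecture", "grid-intro.mp4", "nodeA");
        idx.share("Lecture", "p2p-basics.mp4", "nodeB");
        for (String hit : idx.search("lecture")) {
            System.out.println(hit);
        }
    }
}
```

Each result carries the node name, matching the article's description of a search list that shows the file together with the pGrid node that holds it.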

initialize_pGridnodelist();
select_pGridnode();
initiate_downloadprocess();
// GetRemoteFile is a subclass of GetContentRequest
GetRemoteFile(Groupname, filename, destinationpGridnode)
     call UDTTransfer
     // Inform the user about the current download progress
     notifyUpdate(int percentage)
     // Inform the user when the download has finished
     notifyDone()
     append("Downloading process is successfully finished");

Algorithm 5: Download file from a peer
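The progress callbacks named in Algorithm 5 (notifyUpdate and notifyDone) can be sketched with plain Java streams. This example is an assumption-laden illustration: it copies bytes between in-memory streams instead of calling UDTTransfer, and the DownloadListener interface and download method are hypothetical names, not the JXTA or UDT API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class DownloadProgressSketch {

    // Hypothetical listener mirroring notifyUpdate / notifyDone in Algorithm 5.
    interface DownloadListener {
        void notifyUpdate(int percentage);
        void notifyDone();
    }

    // Copies the source to the target and reports progress whenever the percentage changes.
    static void download(InputStream source, OutputStream target, long totalBytes,
                         DownloadListener listener) throws IOException {
        byte[] buffer = new byte[4096];
        long copied = 0;
        int lastReported = -1;
        int read;
        while ((read = source.read(buffer)) != -1) {
            target.write(buffer, 0, read);
            copied += read;
            int percentage = totalBytes == 0 ? 100 : (int) (copied * 100 / totalBytes);
            if (percentage != lastReported) {
                listener.notifyUpdate(percentage);
                lastReported = percentage;
            }
        }
        target.flush();
        listener.notifyDone();
    }

    public static void main(String[] args) throws IOException {
        byte[] file = new byte[10_000]; // stands in for a remote multimedia file
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        download(new ByteArrayInputStream(file), received, file.length, new DownloadListener() {
            @Override
            public void notifyUpdate(int percentage) {
                System.out.println("Downloaded " + percentage + "%");
            }

            @Override
            public void notifyDone() {
                System.out.println("Downloading process is successfully finished");
            }
        });
    }
}
```

Keeping the transport behind InputStream and OutputStream means the same listener logic would apply whether the bytes arrive over TCP or over a UDT-backed stream.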

The function wmdl_download (Algorithm 5) provides the programming interface to the fundamental Globus service. It offers two different interfaces, one for remote file transfer and one for copy. The operation wmdl_shutdown physically takes a node out of the system. When a node is to be shut down and removed, all the multimedia data located on that node is reloaded onto other nodes to guarantee data integrity.

6 Experimental results

The proposed system was evaluated on two network configurations: (i) P2P only and (ii) P2P with Grid computing technology. The P2P network uses TCP to transfer data between peers. With Grid computing, the transfer protocol is UDT [32]. UDT is a connection-oriented duplex protocol that supports both reliable data streaming and partially reliable messaging. A UDT entity has two parts: the sender and the receiver. The sender sends the application data, while the receiver is responsible for receiving data packets, detecting timer expiration, and sending and receiving control packets. All data and control packets in both directions are exchanged between a pair of UDP ports.

Figure 7 demonstrates the operation of UDT. Entity A sends a multimedia file to another entity B. The file is transferred from the sender of A to the receiver of B, whereas control information about that file is exchanged between the two receivers. Table 3 lists UDT's features for bulk data transfer and streaming data processing alongside those of TCP and UDP.
Figure 7

UDT protocol architecture.

Table 3

UDT services/features

| Services/features | UDT | TCP | UDP |
|---|---|---|---|
| Connection-oriented | yes | yes | no |
| Full duplex | yes | yes | yes |
| Reliable data transfer | yes | yes | no |
| Partial-reliable data transfer | yes | no | no |
| Flow control | yes | yes | no |
| Congestion control | yes | yes | no |
| Selective ACKs | yes | optional | no |
| Multi streaming | Dependent | yes | no |
| Multi homing | yes | no | no |

TCP cannot be used for this type of processing because it has two problems. First, the link has to be clean (nearly free of packet loss) for TCP to use the full bandwidth. Second, when two TCP streams start at about the same time, the stream with the longer RTT is starved because of the RTT bias problem. This results in a long wait for the analysis process and slows down the data stream. UDT supports selective streaming for each client when required, while TCP does not.
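The RTT bias mentioned above can be made concrete with a well-known steady-state approximation for TCP throughput (Mathis et al.), rate ~= (MSS / RTT) * (C / sqrt(p)), where p is the packet loss rate and C = sqrt(3/2). This model is not part of the article; it is brought in here purely as an illustration, and the segment size, loss rate, and RTT values below are hypothetical.

```java
class RttBiasSketch {

    // Mathis et al. approximation (an assumption, not taken from the article):
    // rate ~= (MSS / RTT) * (C / sqrt(p)), with C = sqrt(3/2).
    static double tcpRateBitsPerSec(double mssBytes, double rttSeconds, double lossRate) {
        double c = Math.sqrt(3.0 / 2.0);
        return (mssBytes * 8.0 / rttSeconds) * (c / Math.sqrt(lossRate));
    }

    public static void main(String[] args) {
        double mss = 1460.0; // hypothetical segment size in bytes
        double loss = 1e-4;  // hypothetical loss rate seen by both flows
        double shortRttFlow = tcpRateBitsPerSec(mss, 0.010, loss); // 10 ms RTT
        double longRttFlow = tcpRateBitsPerSec(mss, 0.100, loss);  // 100 ms RTT
        System.out.printf("short-RTT flow: %.0f bit/s%n", shortRttFlow);
        System.out.printf("long-RTT flow:  %.0f bit/s%n", longRttFlow);
        System.out.printf("ratio: %.1f%n", shortRttFlow / longRttFlow);
    }
}
```

Under this model, two flows that see the same loss rate obtain throughput in inverse proportion to their RTTs, which is the starvation effect the paragraph describes.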

6.1 Performance of TCP, UDP, and UDT

The performance of TCP, UDP, and UDT was checked using emulators. Network link emulators are easy to use; they delay or drop packets coming in or going out of a specific network interface to match the desired network characteristics (latency, packet loss, and bandwidth). Dummynet [33] is an open source tool integrated with FreeBSD. Figure 8 shows the Dummynet setup used. We use Dummynet for three main reasons: (i) it is of production quality, not a prototype; (ii) it is freely available on common operating systems (Linux and FreeBSD); and (iii) it is already in use by the research community. Dummynet captures incoming or outgoing packets. It uses a set of rules and queues to store the packets and to determine which packet is released to the operating system (for incoming packets) or to the network (for outgoing packets).
Figure 8

Dummynet setup.

We use an FTP payload for TCP traffic and a CBR payload for UDP traffic. During the simulation, we evaluated scenarios with link capacities of 256 Kb, 1 Mb, 5 Mb, and 11 Mb. The size of the TCP and UDP packets remains the same in all scenarios. Figure 9 compares TCP and UDP in detail. Throughput is computed with the following formula:

\[ \text{Throughput} = \frac{\text{Packet\_size} \times \text{Packet\_delivered}}{\text{Time}} \]
Figure 9

Throughput comparison of TCP and UDP w.r.t different link capacities.
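The throughput formula used for Figure 9 (packet size times packets delivered, divided by time) can be written as a small helper. This is a minimal sketch: the article does not state the units, so expressing the result in bits per second is an assumption, and the class name, method name, and sample values are hypothetical.

```java
class ThroughputSketch {

    // Throughput = packet size * packets delivered / time.
    // Assumption: packet size is given in bytes and the result is reported in bit/s.
    static double throughputBitsPerSec(int packetSizeBytes, long packetsDelivered, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return packetSizeBytes * 8.0 * packetsDelivered / seconds;
    }

    public static void main(String[] args) {
        // Hypothetical run: 500 packets of 1000 bytes delivered in 2 seconds.
        double bitsPerSec = throughputBitsPerSec(1000, 500, 2.0);
        System.out.printf("throughput: %.0f bit/s%n", bitsPerSec);
    }
}
```

Because the packet size is held constant across scenarios, differences in throughput between the protocols come only from the number of packets delivered per unit of time.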

Concurrent UDT flows share the available bandwidth fairly, and UDT also leaves enough bandwidth for TCP. Because it uses parallel streams, UDT achieves higher throughput than TCP and UDP. In some cases, however, throughput decreases as the number of flows increases, because the additional flows occupy more bandwidth. Figure 10 compares the throughput of UDT against the number of flows.
Figure 10

Throughput of UDT protocol w.r.t link capacity 100 MB.

Table 4 shows the time taken to transfer files using the TCP, UDP, and UDT protocols. A large file of 70 MB is transferred between two machines connected via Dummynet; Figure 11 shows the resulting transfer times. The transfer is memory-to-memory using RAM disks, and the transfer delay is set to 50 ms using Dummynet.
Table 4

Time taken to transfer files using TCP, UDP, and UDT

| File size (MB) | Rate (Kb/s) | TCP time (s) | UDP time (s) | UDT time (s) |
|---|---|---|---|---|
| 1 | 256 | 0.010 | 0.008 | 0.002 |
| 4 | 239.8 | 0.035 | 0.030 | 0.020 |
| 15 | 241.7 | 5.7 | 3.65 | 1.7 |
| 35 | 205.5 | 120.39 | 101.89 | 60.39 |
| 70 | 401.2 | 82.56 | 66.57 | 32.56 |

Figure 11

Distribution of transfer times using UDT.

From the graph plotted (Figure 11), we observe that the distribution of transfer times is close to a normal distribution. With UDT the transfer rate is higher, and the transfer time correspondingly lower, than with TCP and UDP, as Table 4 shows. The difference arises because UDT employs better flow control and congestion control mechanisms than the other protocols. Flow control plays a vital role at high transfer rates, because the receiver buffer at the network line card is frequently overrun and is sometimes not scheduled for transferring data.

6.2 Sample scenario

Figure 12 depicts a sample scenario for the proposed system. Each pGrid node has an interface with options such as share, browse, unshare, and download. The interface also displays the list of pGrid nodes running in the network. When a user shares a file and assigns a key, the presence of the file can be viewed or browsed by other nodes. An ordinary pGrid node has permission to unshare a file but cannot delete it physically from its own node. When the user chooses the unshare option, it notifies the server that the file has to be moved somewhere else, and the admin node deletes the file's entry.
Figure 12

Sample scenario of the proposed system.

We evaluated the proposed system using these benchmarks: (i) file size versus download time, (ii) simultaneous connections, (iii) bandwidth utilization, (iv) security, (v) scalability, and (vi) robustness. Table 5 shows the benchmark results for the system in two modes: (i) P2P only and (ii) P2P with Grid. Table 5 shows that file size matters in the P2P network: because it uses TCP for transfer, its speed is only normal compared with the combination of P2P and Grid computing, which uses the UDT protocol. Bandwidth utilization depends on the number of connections, i.e., the number of peers connected in the network, which is indirectly reflected in the simultaneous connections benchmark. When grid concepts are deployed in peer-to-peer networks, scalability increases, as there is no administrative node in the P2P concept. Robustness also increases because of the architecture the combined system supports.
Table 5

Process benchmarks

| Benchmarks | P2P | P2P with grid |
|---|---|---|
| File size vs. download time | Normal speed (TCP) | Increased speed (UDT) |
| Simultaneous connections | Supports, increased download time | Supports and minimal increased download time |
| Bandwidth utilization | Normal | Good |
| Security | Authentication by JXTA | GSI |
| Scalability | Supports | More scalable |
| Robustness | Limited support | Supports |

Apart from the above benchmarks, we tested the system and measured the download time in the two modes: (i) P2P only and (ii) P2P with Grid. Tables 6 and 7 tabulate the total time taken to retrieve multimedia data in the two modes, respectively. Tables 6 and 7 show that download time depends on the size of the file and on the link rate between the nodes. When the link rate goes down, download time increases regardless of file size. The achieved rates differ markedly between the two modes: they are higher with P2P and Grid than with P2P alone, which is due to the UDT protocol. Figure 13 plots the retrieval time for the two modes, i.e., P2P only and P2P with Grid. Figure 13 makes clear that the proposed system gives a lower retrieval time than using P2P alone. The steep decrease in retrieval time in the proposed system comes from UDT, which is incorporated in the grid architecture and used together with P2P. At some points the download time increases slightly; this is because of a low link rate between the two nodes.
Table 6

Time taken for retrieving multimedia data using P2P technology

| Size | Packet rate (sec) | Link capacity (Mbits/sec) | Rate (Mbits/sec) | Time taken (sec) |
|---|---|---|---|---|
| 1 | 773 | 851 | 2.884 | 4.1 |
| 10 | 973 | 1185 | 6.656 | 13.7 |
| 25 | 1125 | 1590 | 8.604 | 25.5 |
| 50 | 962 | 2413 | 5.398 | 83.9 |
| 75 | 976 | 1476 | 8.264 | 76.8 |
| 90 | 1111 | 1030 | 10.012 | 75.7 |
| 150 | 1321 | 1290 | 8.707 | 153.5 |

Table 7

Time taken for retrieving multimedia data using P2P and grid technology

| Size | Packet rate (sec) | Link capacity (Mbits/sec) | Rate (Mbits/sec) | Time taken (sec) |
|---|---|---|---|---|
| 1 | 1259 | 4315 | 5.388 | 1.68 |
| 10 | 3373 | 5221 | 13.444 | 6.30 |
| 25 | 3919 | 5114 | 22.717 | 8.87 |
| 50 | 5044 | 5395 | 25.521 | 15.62 |
| 75 | 5294 | 5529 | 24.99 | 24.56 |
| 90 | 5487 | 5789 | 26.226 | 28.31 |
| 150 | 6056 | 6190 | 27.902 | 45.48 |

Figure 13

Comparison of retrieval time for P2P and with grid.
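The gap between the two modes can be quantified directly from the "Time taken" columns of Tables 6 and 7. The sketch below computes, for each file size, how many times faster retrieval is with P2P and Grid than with P2P alone. The arrays copy the tabulated values; the class and method names are hypothetical, and the ratio is a derived illustration, not a figure reported by the authors.

```java
class RetrievalSpeedupSketch {

    // File sizes and retrieval times copied from Tables 6 and 7.
    static final int[] SIZES = {1, 10, 25, 50, 75, 90, 150};
    static final double[] P2P_SECONDS = {4.1, 13.7, 25.5, 83.9, 76.8, 75.7, 153.5};
    static final double[] GRID_SECONDS = {1.68, 6.30, 8.87, 15.62, 24.56, 28.31, 45.48};

    // Ratio of P2P-only retrieval time to P2P-with-Grid retrieval time for one table row.
    static double speedup(int row) {
        return P2P_SECONDS[row] / GRID_SECONDS[row];
    }

    public static void main(String[] args) {
        for (int row = 0; row < SIZES.length; row++) {
            System.out.printf("size %d: %.2fx faster with P2P and Grid%n", SIZES[row], speedup(row));
        }
    }
}
```

Every row yields a ratio above 1, which is consistent with the lower retrieval times the article attributes to UDT.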

7 Conclusion

In this article, we have proposed a framework for building a wireless digital library using grid computing and P2P technology. By adopting these technologies, large volumes of multimedia data can be stored efficiently across the network and accessed easily from anywhere. From the experimental results obtained, we claim that this framework works efficiently because its components work together to deliver increased performance. The proposed framework was deployed and evaluated in an institution, where it handles the discovery and dissemination of multimedia data within the available resources, and the results are tabulated. The results and benchmarks show that the proposed framework works well and operates efficiently when Grid computing and P2P technology are combined. The system has one weakness: when the number of simultaneous connections grows, i.e., when more pGrid nodes are formed, bandwidth utilization drops and retrieval time increases. Our future work is to optimize the results obtained by solving this node increase problem and to incorporate the framework into cloud computing technology.

Declarations

Authors’ Affiliations

(1)
Department of Information Technology, Misrimal Navajee Munoth Jain Engineering College, Anna University
(2)
Department of Computer Science and Engineering, Sri Sivasubramania Nadar College of Engineering, Anna University
(3)
Department of Computer Science and Engineering, Jawaharlal Nehru Technological University

References

  1. Fei Han, Almeida Nuno, Lourenço Miguel, Trezentos Paulo, Luis Borbinha José, Neves João: ARCO: a long-term digital library storage system based on grid computational infrastructure. In Proceedings of the Seventh International Conference on Enterprise Information Systems. Volume 1. Miami, USA; 2005:44-51. ISBN:972-8865-19-8
  2. Foster I: The anatomy of the Grid: Enabling scalable virtual organizations. Lecture notes in Computer Science 2001.
  3. Foster I, Kesselman C, Nick J, Tuecke S: The physiology of the grid: open grid services architecture for distributed system integration. Article in Open Grid Service Infrastructure WG, Global Grid Forum 2002.
  4. Kubiatowicz J, Bindel D, Chen Y, Czerwinski S, Eaton P, Gummadi R: OceanStore: an architecture for global scale persistent storage. In Proceedings of 9th International Conference on Architectural Support for Programming Language and OS. Volume 35. Cambridge, Massachusetts, USA; 2000:190-201. doi:10.1145/356989.357007
  5. Fox EA, Marchionini G: Toward a worldwide digital library. Commun. ACM 1998, 41(4):28-32.
  6. Borbinha José, Pedrosa Gilberto, Gil João, Martins Bruno, Freire Nuno, Dobreva Milena, Wyttenbach Alberto: Digital libraries and digitised maps: an early overview of the DIGMAP project. In Asian Digital Libraries. Volume 4822. LNCS, Springer Berlin/Heidelberg; 2007:383-386. ISBN:978-3-540-77093-0
  7. Lagoze C, Hunter J: The ABC ontology and model. J. Digital Inf 2002, 2(2):160-176.
  8. Aberer K, Punceva M, Hauswirth M, Schmidt R: Improving data access in p2p systems. IEEE Internet Comput 2002, 6(1):58-67. doi:10.1109/4236.978370
  9. Stoica I, Morris R, Karger D, Kaashoek MF, Balakrishnan H: Chord: a scalable peer-to-peer lookup service for internet applications. In Proceedings of the 2001 conference on applications, technologies, architectures, and protocols for computer communications, ACM Press, SIGCOMM01. San Diego, California, USA; 2001:149-160.
  10. Ratnasamy S, Francis P, Handley M, Karp R, Schenker S: A scalable content-addressable network. In Proceedings of ACM SIGCOMM. ACM Press, New York, NY, USA; 161-172. ISBN:1-58113-411-8, doi:10.1145/383059.383072
  11. Rowstron A, Druschel P: Pastry: scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In Proceeding of IFIP/ACM International Conference on Distributed Systems Platforms (Middleware). Heidelberg, Germany; 2001:329-350.
  12. Aberer Karl, Cudré-Mauroux Philippe, Datta Anwitaman, Despotovic Zoran, Hauswirth Manfred, Punceva Magdalena, Schmidt Roman: P-Grid: a self-organizing structured P2P system. SIGMOD Rec 2003, 32(3):29-33. doi:10.1145/945721.945729
  13. Cuenca-Acuna FM, Peery C, Martin RP, Nguyen TD: PlanetP, Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities. Technical Report DCS-TR-487, Rutgers University 2002.
  14. Suel Torsten, Mathur Chandan, Wu JoWen, Zhang Jiangong, Delis Alex, Kharrazi Mehdi, Long Xiaohui, Shanmugasundaram Kulesh: Odissea: A peer-to-peer architecture for scalable web search and information retrieval. Technical report, Polytechnic University 2003.
  15. Fagin R: Combining fuzzy information from multiple systems. J. Comput. Syst. Sci 1999, 58(1):83-99. doi:10.1006/jcss.1998.1600
  16. Bender Matthias, Michel Sebastian, Zimmer Christian, Weikum Gerhard: Towards Collaborative Search in Digital Libraries Using Peer-to-Peer Technology. Volume 2004. DELOS Workshop: Digital Library Architectures; 2004:61-72.
  17. Renuga AR, Sudhasadasivam G: P2P information retrieval framework for digital library system. J. Appl. Theor. Inf. Technol 2009, 5(3):301-306. ISSN 1992-8615
  18. Shih Wen-Chung, Yang Chao-Tung, Tseng Shian-Shyong: An interlibrary cooperation framework for digital libraries using P2P technology. In Proceedings of the Asia-Pacific Conference on Library & Information Education & Practice Singapore: School of Communication & Information, Nanyang Technological University. Singapore; 2006:155-159.
  19. Joshi H, Jakharia JC: Digital library grid: a Roadmap to Next generation digital libraries using grid technologies. 4th International Convention CALIBER-2006, Gulbarga, INFLIBNET Centre, Ahmedabad 2006.
  20. Yani C-T, Ho H-C: Using data grid technologies to construct a digital library environment, High-Performance Computing Laboratory Department of Computer Science and Information Engineering Tunghai University Taichung. 40704 Taiwan, ROC 2005. 0-7803-8932-8/05/
  21. Fei Han, Almeida Nuno, Lourenço Miguel, Trezentos Paulo, Luis Borbinha José, Neves João: ARCO: Moving Digital Library Storage to Grid Computing. Proceedings of the International Conference on Enterprise Information Systems 2004, 1: 64-69.
  22. Riasol E, Xhafa F: Juxta-cat: a jxta based platform for distributed computing. In ACM Proceedings of the 4th International symposium on Principles and practice of programming in Java. ACM, New York, NY, USA; 2006:72-81. doi:10.1145/1168054.1168065
  23. Antoniu G, Jan M, Noblet DA: Enabling JXTA for High Performance Grid Computing. Report: Department of Computer Science, University of New Hampshire 2005, 14.
  24. Wankar R: Grid computing with Globus: an overview and research challenges. Int. J. Comput. Sci. Appl 2008, 5(3):56-69.
  25. CERN open lab for data grid applications [http://public.web.cern.ch/public/]
  26. Globus Toolkit [http://www.globus.org/toolkit/]
  27. Globus: Grid Security Infrastructure (GSI) [http://www.globus.org/security]
  28. Gu Y: UDT: a high performance data transport protocol. In Thesis. Department of Computer Science, University of Illinois, Chicago; 2005.
  29. Gu Y, Grossman RL: UDT: UDP-based data transfer for high-speed wide area networks. Int. J. Comput. Telecommun. Netw 2007, 51(7).
  30. Vazhkudai Sudharshan, Tuecke Steven, Foster Ian: Replica selection in the Globus data grid, Cluster Computing Grid, Computing Research Repository, Report No:ANL/MCS-P869-0201. 2001.
  31. Fitzgerald Steven, Foster Ian, Kesselman Carl, von Laszewski Gregor, Smith Warren, Tuecke Steven: A directory service for configuring high performance distributed computations. In Proceedings of the 6th IEEE Symposium on High Performance Distributed Computing. IEEE Computer Society Washington DC, USA; 1997:365-375. ISBN:0-8186-8117-9
  32. Suresh J, Srinivasan A, Damodaram A: Performance analysis of various high speed data transfer protocols for streaming data in long fat networks. In International Conference on Recent Trends in Information, Telecommunication and Computing. Kerala, India; 2010:234-237. doi:10.1109/ITC.2010.64
  33. Rizzo L: Dummynet; a simple approach to the evaluation of network protocols. ACM Comput. Commun. Rev 1997, 27(1):31-41. doi:10.1145/251007.251012

Copyright

© Arulanandam et al; licensee Springer. 2012

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.