  • Research Article
  • Open access

Quality-Assured and Sociality-Enriched Multimedia Mobile Mashup


Mashups are becoming more complex with the addition of rich-media and real-time services. The new research challenges are how to guarantee the quality of the aggregated services and how to share them in a collaborative manner. This paper presents a metadata-based mashup framework for the Next Generation Wireless Network (NGWN), which guarantees quality and supports social interactions. In contrast to existing quality-assured approaches, the proposed mashup model addresses quality management from a new perspective by defining Quality of Service (QoS) metadata at two levels: fidelity (user perspective) and modality (application perspective). Quality is assured through quality-aware service selection and quality-adaptable service delivery. Furthermore, the mashup model is extended so that users can annotate services collaboratively. The annotation occurs in two ways: social tagging (e.g., ratings and comments) and QoS attributes (e.g., device type and access network). To apply this network-independent metadata model to the NGWN architecture, we further introduce a new entity named the Multimedia Mashup Engine (MME), which enables seamless access to the services and Adaptation Decision Taking (ADT). Finally, our prototype system and simulation results demonstrate the performance of the proposed work.

1. Introduction

The evolution of Web 2.0 has had a significant impact on Internet service provisioning by encouraging end users to contribute to content and service creation. This phenomenon, termed User-Generated Content (UGC) or User-Generated Service (UGS), aims to enlarge user personalization in a "Do It Yourself (DIY)" manner. Mashup, as a general term in the UGC/UGS domain, is an application that incorporates elements coming from more than one source into an integrated user experience [1]. Meanwhile, in telecommunications there is an ongoing process of transformation and migration from so-called legacy technology to IP-based Next Generation Networking (NGN), or Next Generation Wireless Network (NGWN), which enables people to access multimedia anytime and anywhere. With the advantage of an all-IP network, the opportunity for integration and convergence is amplified, the most prominent example being Web-NGN convergence. Toward the convergence of Web and NGN, mobile mashup is promising for next-generation user-driven multimedia delivery [2, 3].

With the proliferation of services available on the Internet and the emergence of user-centric technologies, millions of users are able to participate voluntarily in the development of their own interests and benefits by means of service composition [4]. The concept of composition is to create a new service by combining several existing elementary services. A number of composition mechanisms have been proposed, such as workflow techniques and Artificial Intelligence (AI) planning [5]. However, as most of the existing solutions are still oriented toward professional developers, the arduous development task discourages users from contributing to the service creation process. In this context, mashup, which is well known for its intrinsic advantages of easy and fast integration, is a promising choice for user-driven service composition. Generally, the mashup mechanism combines non-real-time Web services such as translation, search, and maps by leveraging Application Programming Interfaces (APIs). With the proliferation of mobile devices and wireless networks, real-time and resource-consuming multimedia services have become ubiquitous. Thus, in this paper we consider mashup as user-driven multimedia aggregation. We argue that user-driven multimedia delivery is more challenging than the provider-driven model. First, for nonexpert users it is desirable to have a mashup model that hides the backend complexity and simplifies the aggregation process. Second, emerging mashups are getting more and more complicated as rich-media and real-time services are aggregated, while the diverse terminals, heterogeneous networks, and various user requirements constrain multimedia mashup to low quality, especially in the mobile network environment. The third challenge arises from the sociality point of view. Since the great success of social networking has shown that user experiences are enriched by sharing, aggregating, and tagging collaboratively, the social phenomena behind mashup are worth exploring.

Our paper presents an NGWN-based mashup framework, which features an intermediate metadata model that guarantees quality and supports sociality. The metadata-based framework brings benefits in three aspects. Firstly, human-readable metadata is a higher-level description language than programming APIs, so it can hide the programming complexity from nonexperts. Secondly, scalable quality management can be enforced by Quality of Service (QoS) metadata. Scalability in this paper means that the aggregated media can be tailored and adapted to diverse terminals and heterogeneous networks with assured quality, with the aim of providing the best user experience across aggregated multimodal services. Thirdly, these metadata entities can be further enriched collaboratively by end users through social annotation. In this paper, we propose to extend the CAM4Home metadata as our mashup model. CAM4Home is an ITEA2 project enabling a novel way of multimedia provisioning by bundling different types of content and service into bundles on the level of metadata [6]. In our solution, rich-media services including video, audio, image, and even text can be encapsulated as Collaborative Aggregated Multimedia (CAM) Objects, which can then be aggregated into CAM Bundles. We further propose to integrate MPEG-21 metadata within the CAM4Home model. We enforce QoS in two ways: quality-aware service selection at design time and quality-adaptable service delivery at run time. The human-readable part of the QoS metadata first facilitates service selection and then enables adaptable delivery. Notably, our system supports collaborative annotation. The annotation occurs in two ways: social tagging (e.g., ratings and comments) and QoS tagging (e.g., device type and access network). The former may facilitate service selection, while the latter enhances QoS-aware mashup consumption.

The rest of the paper is organized as follows. Section 2 reviews the background and related work. In Section 3, we describe a scenario and present the metadata-based model, in which we illustrate QoS management and social metadata. Section 4 discusses the approach for applying the metadata model to the NGN-based service architecture. A prototype system and the performance evaluation are described in Section 5. Section 6 concludes the paper and presents some issues for future research.

2. Related Work

The past few years have witnessed the great success of user-driven models such as Wikipedia, blogs, and YouTube, which are known as UGC. The next big user-driven trend is expected in the service arena, that is, UGS. Considerable research has been conducted on mashup and service composition, most of which uses Web-based technologies (e.g., Web Service Description Language (WSDL) and Representational State Transfer (REST)) for the implementation. To facilitate the creation of mashups, several Web platforms have been proposed by different communities, among which Yahoo Pipes [7] and Microsoft Popfly [8] are well-known examples. Nevertheless, these platforms are far from being popular among ordinary users because of their complexity. It is desirable to have a mashup model that hides the backend complexity from users, simplifies the service creation interface, and satisfies varied service creation requirements.

Unlike traditional data services, multimedia services face more challenges in the heterogeneous environments. A lot of research works have been conducted in this area. Z. Yu et al. proposed a context-aware multimedia middleware which supports multimedia content filtering, recommendation, and adaptation according to changing context [9, 10]. The article in [11] described an approach for context-aware and QoS-enabled learning content provisioning. L. Zhou et al. presented a context-aware middleware system in heterogeneous network environments, which facilitates diverse multimedia services by combining an adaptive service provisioning middleware framework with a context-aware multimedia middleware framework [12]. The scheduling and resource allocating issues were discussed for multimedia delivery over wireless network [13, 14]. However, these systems or solutions usually targeted one type of media. When more and more rich-media services are aggregated or composed, the quality issue is getting more challenging. In addition, the social phenomena between users are ignored by the past research works.

Typically, a mashup process can be divided into three steps: service selection, service aggregation, and service execution. The quality issue spans all three steps, and research efforts were first devoted to QoS-aware service selection. A composite service can be constructed and deployed by combining independently developed component services, each of which may be offered by different providers with different nonfunctional QoS attributes. A random selection may not be optimal for the targeted execution environment and may incur inefficiencies and costs [15]. Therefore, a selection process is needed to identify which constituent services are to be used to construct a composite service that best meets the QoS requirements of its users. To formally define the QoS level required from the selected provider, the provider and the user may engage in a negotiation process, which culminates in the creation of a Service Level Agreement (SLA). The management of QoS-based SLAs has become a very active area of research, including QoS-aware service description, composition, and selection [16]. However, QoS-aware service selection is just the initial step toward guaranteeing quality. The other two steps may also have a large impact on the final quality. Most prominently, the context of service creation may differ from that of service execution, especially in the mobile environment, where the diverse terminals, heterogeneous networks, and various user requirements constrain multimedia access to low quality. This problem becomes more and more complicated when rich-media services are aggregated. As a result, a scalable model with QoS management is highly important for mashups, especially for mobile mashups in a highly dynamic service environment.

Since the mechanism of mashup is to combine data from different sources, it is desirable to have an overall quality model across aggregated services. T.C. Thang et al. [17–19] have intensively studied quality in multimedia delivery. They identified quality from two aspects: perceptual quality and semantic quality. The former, known as fidelity, refers to a user's satisfaction, while the latter is the amount of information the user obtains from the content. The former is sometimes referred to as Quality of Experience (QoE), while the latter is referred to as Information Quality (IQ). In some cases, the perceptual quality of a media service is unacceptable, or its semantic quality is much poorer than that of a substitute modality. A possible solution to this problem is to convert the modality. For example, when the available bandwidth is too low to support the video streaming service for a football match, a text-based statistics service would be more appropriate than adapted video with poor perceptual quality. This is a typical case of video-to-text modality adaptation. The combination of fidelity and modality can thus enhance user experiences. Dynamic adaptation is seen as an important feature enabling terminals and applications to adapt to changes in the access network and in available QoS due to the mobility of users, devices, or sessions [20]. Previous research on multimedia adaptation is more concerned with perceptual quality from the end user's perspective. However, the studies in [17–19] state that semantic quality should be considered in some cases. They argue that modality conversion could be a better choice than unrestricted adaptation of fidelity. The Overlapped Content Value (OCV) model is introduced in [17] to represent both quality and modality conceptually. To our knowledge, however, a quality model for mashup has not been reported in the literature. In this paper, we propose to apply both fidelity and modality to the quality of mashup. We argue that both perceptual quality and semantic quality need to be considered in order to provide quality-assured mashup.

Since video is the most prominent media type, we take video as the example for quality adaptation. Several issues cannot be ignored in video adaptation, such as complexity, flexibility, and optimization. In this regard, Scalable Video Coding (SVC) has emerged as a promising video format. SVC was developed as an extension of H.264/MPEG-4 Advanced Video Coding (AVC) [21]. SVC offers spatial, temporal, and quality scalability at the bit stream level, which enables easy adaptation of video by selecting a subset of the bit stream. As a result, SVC bit streams can be easily truncated in the spatial, temporal, and quality dimensions to meet the various constraints of heterogeneous environments [19]. This three-dimensional scalability offers great flexibility for customizing video streams for a wide range of terminals and networks. SVC thus allows very simple, fast, and flexible adaptation to heterogeneous networks and diverse terminals. M. Eberhard et al. have developed an SVC streaming test bed that allows dynamic video adaptation [22]. It is desirable to apply the advantages of SVC to mashup in order to cope with the quality issue.

Ubiquitous multimedia results in an overwhelming number of multimedia services, among which it has become difficult to retrieve specific ones. Semantic metadata is a solution to this abundance of resources, and the lack of semantic metadata is becoming a barrier to in-depth study and wide application. Recently, the great success of social networking has shown that user experiences are enriched by sharing, aggregating, and tagging collaboratively. Under this trend, folksonomy, also known as social tagging or collaborative annotation, draws more and more attention as a promising source of semantic metadata. Several works have exploited the knowledge of the masses to improve the composition process by considering either social networks or collaborative environments [23–25]. However, they only make use of sociality for service selection or recommendation. Sociality across the whole mashup process should be further explored, especially with regard to the quality issue.

In this paper, we present a mashup framework as illustrated in Figure 1. We enforce quality in two ways: quality-aware service selection and quality-adaptable service delivery. The proposed quality model considers both fidelity and modality to meet QoS requirements across diverse terminals, heterogeneous networks, and dynamic network conditions. We concentrate on both the user level, by specifying user-perceivable service parameters, and the application level, by adapting multimedia services according to the resource availability of the terminal and network. Furthermore, we extend the mashup model to allow users to annotate the services collaboratively.

Figure 1: Mashup model.

3. Mashup Model

This section firstly describes the concept of metadata-based mashup model through an example scenario, followed by the illustration of the model decomposition. The mashup model is further decomposed into three essential parts: multimodal service aggregation, metadata-based QoS management, and metadata-based social enrichment.

3.1. Concept of Mashup Model

Let us take a "Sports Live Broadcasting" service as an example. The scenario is the last round of a football league in which more than one team has the chance to win the championship. All teams start playing at the same time. Fans are watching the live TV broadcast of their team. At the same time, they may also want to be updated on events (e.g., goals, penalties, and red cards) in the other simultaneous matches. We assume that there are two relevant services from different providers. The first one is an Internet Protocol TV (IPTV) program delivering a live football game. The IPTV service component can be configured by a set of alternative operating parameters (e.g., frame sizes, frame rates, and bit rates), by which IPTV can be adjusted dynamically according to the user context. The second one is a real-time text broadcasting service delivering statistics synchronized to all football matches. A user composes the "Sports Live Broadcasting" mashup containing these two services. Before the multimedia session, the quality model first selects the service version according to the static capabilities of the terminal and network. During the session, the IPTV service element can be adapted according to dynamic network conditions or user preferences. Moreover, if the adapted IPTV service cannot provide the expected user-perceived quality, a cross-modal adaptation from IPTV to text may occur. Besides quality adaptation, the fan can share the metadata-based mashup with friends, as in file sharing, and annotate it with comments, ratings, and user-generated QoS parameters.

3.2. CAM4Home Metadata

The essential part of mashup model is the multimodal service aggregation. In this paper, we use CAM4Home framework as the metadata model for multimodal service aggregation. The CAM4Home is an ITEA2 project implementing the concept of Collaborative Aggregated Multimedia (CAM) [6]. The concept of CAM refers to aggregation and composition of individual multimedia contents into a content bundle that may include references to content-based services and can be delivered as a semantically coherent set of content and related services over various communication channels. This project creates a metadata-enabled content delivery framework by bundling semantically coherent contents and services on the level of metadata. The CAM4Home metadata model supports the representation of a wide variety of multimedia content and service in CAM Element as well as its descriptive metadata. CAM Object is the integrated representation of CAM Element and CAM Element Metadata on the association rule "isMetadataOf". CAM Bundles are the aggregation of two or more CAM Objects on the association rule "containsCAMObjectReference". CAM Object and CAM Bundle can be uniquely identified by "camElementMetadataID" and "camBundleMetadataID". Figure 2 illustrates a conceptual view of CAM Bundle and CAM Object. Moreover, some complicated rules such as spatial and synchronization are also defined for enhanced aggregation.

Figure 2: Conceptual view of CAM object and CAM bundle.

The taxonomy of CAM Element has two subclasses, Multimedia Element and Service Element. The Multimedia Element is the container of a specific multimedia content, which is further divided into four types: document, image, audio, and video. The Service Element is the container of a specific service. The physical content in a CAM Element is referenced by the attribute "EssenceFileIdentifier", which is a Uniform Resource Locator (URL). The Service Element includes a further attribute, "ServiceAccessMethod", indicating the methods used to access the service. Following the CAM concept, we use the metadata-based approach for content and service delivery. A service capability is described by a CAM Object containing a Service Element and related metadata, while the converged service is described by a CAM Bundle containing several CAM Objects of service capabilities. For instance, the attribute "EssenceFileIdentifier" can be used to indicate the Public Service Identity (PSI) of the service capability, and the attribute "ServiceAccessMethod" indicates the SIP methods (e.g., INVITE) used to access the service. However, the described services are not limited to SIP-based ones; this model can be used to encapsulate any type of service. In this paper, the CAM4Home metadata model is adopted as the rich-media aggregation model. Figure 3 shows an example for the aforementioned "Sports Live Broadcasting" service.

Figure 3: CAM4Home metadata example.
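To make the aggregation model of this subsection more concrete, the following is a minimal Python sketch of CAM Elements, CAM Objects, and a CAM Bundle for the "Sports Live Broadcasting" scenario. It is not the CAM4Home schema itself: the class layout, the `title` field, the identifier strings, and the `sip:...@example.org` addresses are illustrative assumptions; only the attribute names quoted in the text are taken from the model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CAMElement:
    """Container for a multimedia content item or a service."""
    kind: str                          # "ServiceElement", or a Multimedia Element type such as "video"
    essence_file_identifier: str       # "EssenceFileIdentifier": a URL, or the PSI of a service
    service_access_method: str = ""    # "ServiceAccessMethod", e.g. "INVITE" for a SIP service


@dataclass
class CAMObject:
    """A CAM Element together with its descriptive metadata ("isMetadataOf")."""
    cam_element_metadata_id: str
    element: CAMElement
    title: str = ""                    # hypothetical descriptive field


@dataclass
class CAMBundle:
    """Aggregation of CAM Objects by reference ("containsCAMObjectReference")."""
    cam_bundle_metadata_id: str
    object_refs: List[str] = field(default_factory=list)

    def add(self, obj: CAMObject) -> None:
        # Only the identifier is stored; the object itself stays in the repository.
        self.object_refs.append(obj.cam_element_metadata_id)


def build_sports_bundle() -> Tuple[CAMBundle, List[CAMObject]]:
    """Build the two-service mashup of the example scenario (all identifiers are made up)."""
    iptv = CAMObject(
        "obj-iptv",
        CAMElement("ServiceElement", "sip:iptv@example.org", "INVITE"),
        "Live football IPTV",
    )
    stats = CAMObject(
        "obj-text",
        CAMElement("ServiceElement", "sip:stats@example.org", "INVITE"),
        "Live match statistics",
    )
    bundle = CAMBundle("bundle-sports-live")
    bundle.add(iptv)
    bundle.add(stats)
    return bundle, [iptv, stats]
```

Storing references rather than embedded objects mirrors the idea that a bundle is built on the level of metadata, so the same CAM Object can appear in several bundles.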

3.3. QoS Metadata

It is necessary to provide quality-guaranteed and interoperable mashup delivery across various devices and heterogeneous networks, as well as optimized use of the bandwidth and QoS characteristics of the underlying delivery network. Generally, adaptation decision-taking, that is, choosing the right set of parameters that yield an adapted version, is a computationally intensive process. The computational efficiency of adaptation can be greatly enhanced if this process is simplified, in particular by using metadata that conveys precomputed relationships between feasible adaptation parameters and the media characteristics obtained by selecting them [26]. Moreover, the development of an interoperable multimedia content adaptation framework has become a key issue in coping with the heterogeneity of multimedia content formats, networks, and terminals.

Toward this purpose, MPEG-21 Digital Item Adaptation (DIA) specifying metadata for assisting adaptation has been finalized as part of the MPEG-21 Multimedia Framework [27]. MPEG-21 DIA aims to standardize various adaptation related metadata including those supporting decision-taking and the constraint specifications. MPEG-21 DIA specifies normative description tools in syntax and semantic to assist with the adaptation. The central tool is the Adaptation QoS (AQoS) representing the metadata supporting decision-taking. The aim of AQoS is to select optimal parameter settings that satisfy constraints imposed by a given external context while maximizing QoS. The adaptation constraints may be specified implicitly by a variety of Usage Environment Description (UED) tool describing user characteristics (e.g. user information, user preferences, and location), terminal capabilities, network characteristics, and natural environment characteristics (e.g., location, time). The constraints can also be specified explicitly by Universal Constraints Description (UCD). Syntactically, the AQoS description consists of two main components: Module and Input Output Pin (IOPin). Module provides a means to select an output value given one or several input values. There are three types of modules, namely, Look-Up Table (LUT), Utility Function (UF), and Stack Function (SF). IOPin provides an identifier to these input and output values.
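The role of a look-up-table module can be illustrated with a small Python sketch: given precomputed operating points, pick the feasible one with the highest quality under the constraints supplied by the usage environment. This is only an illustration of the idea, not the normative AQoS syntax (which is XML); the `OperatingPoint` type, the table values, and the two constraints used here are assumptions made for the example.

```python
from typing import List, NamedTuple, Optional


class OperatingPoint(NamedTuple):
    """One row of a hypothetical look-up table: adaptation parameters and resulting quality."""
    bitrate_kbps: int
    frame_rate: int
    width: int
    height: int
    quality: float   # precomputed utility in [0, 1]; the values below are invented


# Illustrative table; a real AQoS description would carry values measured by the provider.
LUT: List[OperatingPoint] = [
    OperatingPoint(1500, 30, 1280, 720, 0.95),
    OperatingPoint(800, 30, 640, 360, 0.80),
    OperatingPoint(400, 15, 640, 360, 0.60),
    OperatingPoint(150, 15, 320, 180, 0.35),
]


def select_operating_point(
    lut: List[OperatingPoint], available_kbps: int, display_width: int
) -> Optional[OperatingPoint]:
    """Return the highest-quality point that satisfies both constraints, or None."""
    feasible = [
        p for p in lut
        if p.bitrate_kbps <= available_kbps and p.width <= display_width
    ]
    return max(feasible, key=lambda p: p.quality) if feasible else None
```

Because the relationships are precomputed, the decision reduces to a filter and a maximum, which is the efficiency gain that metadata-assisted decision-taking aims for.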

The mashup QoS management is proposed on two levels: fidelity and modality. Fidelity management adapts one of the aggregated service components by adjusting its QoS parameters, that is, multimedia adaptation targeting perceptual quality from the perspective of the end user. Modality management selects the most appropriate modality among the aggregated multimodal service components, that is, modality conversion targeting semantic quality from the application point of view. The overall quality model is illustrated in Figure 4. We propose to integrate MPEG-21 DIA into the CAM4Home model to enable QoS management. Originally in MPEG-21 DIA, the output values are used by the Bitstream Syntax Description (BSD) for content-independent adaptation. In the proposed mashup model, however, the adaptation target is the CAM Bundle. Specifically, the AQoS is embedded in each CAM Object for quality adaptation as well as for modality adaptation. For quality adaptation, the output values (e.g., bit rate, frame rate, resolution) are used to yield an adapted version of a single service component.

Figure 4: Mashup quality model.
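The two-level decision described in this subsection can be sketched as follows: first try to adapt the preferred service component (fidelity), and only if its best feasible version falls below an acceptable value switch to another aggregated modality. This is a simplified illustration under stated assumptions, not the decision logic of the proposed system: the single scalar "content value" per operating point, the `min_value` threshold, and all numbers are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Each modality maps to a list of (bitrate_kbps, content_value) operating points.
Candidates = Dict[str, List[Tuple[int, float]]]


def decide(
    candidates: Candidates, available_kbps: int, min_value: float, preferred: str
) -> Optional[Tuple[str, int, float]]:
    """Return (modality, bitrate_kbps, content_value), or None if nothing is deliverable."""

    def best(modality: str) -> Optional[Tuple[int, float]]:
        feasible = [p for p in candidates.get(modality, []) if p[0] <= available_kbps]
        return max(feasible, key=lambda p: p[1]) if feasible else None

    # Fidelity level: adapt the preferred component within its own operating points.
    point = best(preferred)
    if point is not None and point[1] >= min_value:
        return (preferred, point[0], point[1])

    # Modality level: look for a substitute modality with acceptable content value.
    best_alt: Optional[Tuple[str, int, float]] = None
    for modality in candidates:
        if modality == preferred:
            continue
        alt = best(modality)
        if alt is not None and alt[1] >= min_value:
            if best_alt is None or alt[1] > best_alt[2]:
                best_alt = (modality, alt[0], alt[1])
    if best_alt is not None:
        return best_alt

    # Last resort: a degraded version of the preferred component, if any is feasible.
    return (preferred, point[0], point[1]) if point is not None else None
```

With, say, video points `[(1500, 0.95), (400, 0.6), (150, 0.3)]` and a text point `[(10, 0.5)]`, a drop to 200 kbps with `min_value=0.45` selects the text service, matching the video-to-text example of the football scenario.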

3.4. Social Metadata

Collaborative tagging describes the process by which many users add metadata in the form of keywords to shared content [28]. Social metadata is data generated by collaborative tagging, such as tags, ratings, and comments, added to content by individual users other than the content creators. Examples can be found all over the Web, such as ratings and comments on YouTube and tagging on Digg. Social metadata can help users navigate to relevant content more quickly, because members can use it to provide context and relevant descriptions for the content.

The proposed model takes advantage of social metadata to enrich the sociality of mashup in two respects: service discovery and QoS management. Accordingly, users are allowed to annotate the services collaboratively in two ways: social tagging (e.g., ratings and comments) and QoS attributes (e.g., device type and access network). For example, Bob can tag a CAM entity that is relevant to him and choose the tags he believes best describe the entity. The keywords Bob chooses help organize and categorize the service element in a way that is meaningful to him. Later, Bob or other members can use those tags to locate data using the meaningful keywords. In order to introduce ambiguous social tagging into structured metadata, the CAM4Home metadata framework defines several social metadata attributes, which include social tag, user comment, and user rating. As for the second way, QoS metadata can also be generated by users. For example, Bob can tag a CAM entity to indicate that the service inside is not suitable for a mobile device with limited bandwidth. Usually, it is the service provider in the service delivery value chain that takes responsibility for specifying these QoS parameters. However, this is costly and time consuming. User-generated QoS metadata can therefore complement provider-generated metadata.
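The two kinds of annotation can be pictured with a short Python sketch that keeps social tags, comments, and ratings next to user-generated QoS notes for one CAM entity. The class, its field names, and the majority-vote rule in `suitable_for` are assumptions made for illustration; the text only states which kinds of social metadata exist, not how they are stored or aggregated.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class SocialMetadata:
    """Hypothetical container for user annotations attached to one CAM entity."""
    tags: List[str] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)
    ratings: List[int] = field(default_factory=list)
    # User-generated QoS notes, e.g. {"device": "mobile", "network": "3G", "suitable": False}
    qos_notes: List[Dict] = field(default_factory=list)

    def average_rating(self) -> Optional[float]:
        """Mean of all ratings, or None when nobody has rated yet."""
        return mean(self.ratings) if self.ratings else None

    def suitable_for(self, device: str, network: str) -> Optional[bool]:
        """Majority vote of the QoS notes matching this device and network (assumed rule)."""
        votes = [
            note["suitable"]
            for note in self.qos_notes
            if note["device"] == device and note["network"] == network
        ]
        if not votes:
            return None   # no user has annotated this context
        return sum(votes) > len(votes) / 2
```

For example, if two of three users mark the IPTV object as unsuitable for a mobile device on a low-bandwidth network, `suitable_for` returns `False`, which a selection step could use alongside provider-generated QoS metadata.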

4. Mobile Mashup Architecture

In this section, we firstly describe the mashup framework in detail. Then we propose the extension of session negotiation.

4.1. NGN-Based Mobile Mashup Framework

IP Multimedia Subsystem (IMS) has been widely recognized to be the service architecture for NGN/NGWN, offering multimedia services and enabling service convergence independent to the transport layer and the access layer. The IMS architecture is made up of two layers: the service layer and the control layer. The service layer comprises a set of Application Servers (ASs) that host and execute multimedia services. Session signaling and media handling are performed in the control layer. The key IMS entity in this layer is the Call Session Control Function (CSCF) which is an SIP server responsible for session control. There are three kinds of CSCF, among which Serving CSCF (S-CSCF) is the core for session controlling and service invocation. Home Subscriber Server (HSS) is the central database storing the subscriber's profile. Regarding the media delivery, the key component is Media Resource Function (MRF) that can be seen as media server for content delivery.

The IMS-based mashup framework first supports the combined delivery of multimodal services based on the CAM4Home model. Further, the QoS management enforced by MPEG-21 DIA metadata is applied to the IMS service architecture. In particular, cross-modal adaptation is implemented as service switching among the aggregated services. The AS also interacts with the MRF in order to ensure adaptive delivery of media. Figure 5 illustrates the conceptual mashup framework in IMS. The essential component of the proposed mashup platform is the Multimedia Mashup Engine (MME) shown in Figure 5. MME provides the controlled network environment between the mashup clients and the service repository. MME enables easy and seamless access to the service repository and supports the delivery of quality-assured experiences across various devices, heterogeneous access networks, and multiple service models (e.g., Web-based, Telco-based). As mentioned above, mashup is a user-driven model for service delivery. Therefore, MME is first proposed as a generic component of the Service Delivery Platform, responsible for service-related functionalities such as service registration and service discovery. Services represented as CAM metadata entities (e.g., objects or bundles) are registered in MME. For end users, the rich semantic information may facilitate service composition and service discovery. The service repository holds both service objects and service bundles. Note that the service repository can reside either in MME or in an external database. For instance, the CAM4Home project provides a Web service platform for metadata generation, storage, and search. In this case, MME needs to access the external platform through Web service interfaces.

Figure 5: Conceptual mobile mashup framework.

Besides the above functionalities, the vital role of MME is service routing. MME performs address resolution and decides which AS a request is routed to. As shown in Figure 5, MME is located between the S-CSCF and the ASs. For scalability and extensibility, we collocate MME in a SIP AS behaving as a Back-to-Back User Agent (B2BUA). On one hand, MME is configured to connect with IMS. On the other hand, MME interfaces with the SIP ASs that host the aggregated service elements. In order to enable quality-assured mashup, we extend MME mainly in three respects: the Adaptation-Decision Taking Engine (ADTE), UED collection, and the social metadata interface. ADTE either selects appropriate content modalities among the aggregated service components or chooses adaptation parameters for a specific media service. Additionally, MME needs to collect UED as inputs to ADTE. For modality selection, MME can act on incoming requests and route them to an AS according to the outputs of ADTE. Thanks to MPEG-21 QoS management, this routing is more intelligent than the criterion in [29], which is based on the user-requested service element. In addition, MME supports the social metadata interface, through which end users may enrich the original CAM metadata collaboratively.

For quality adaptation, we hereafter take video as the target, since video is the most challenging media type. We introduce the Media Aware Network Element (MANE), as shown in Figure 5. A MANE is defined as a network element, such as a middlebox or application layer gateway, that is capable of adapting video in real time according to configured parameters. It is desirable to control the data rate without extensive processing of the incoming data, for example, by simply dropping packets. Because of the requirement to handle large amounts of data, MANEs have to identify removable packets as quickly as possible. In our solution, the objective of the MANE is to manipulate the forwarded SVC bit stream according to the network conditions or terminal capabilities. The target video configurations that can be generated include bit rate, resolution, and frame rate, which come from the outputs of ADTE.
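The packet-dropping behaviour of a MANE can be sketched in a few lines of Python. The sketch assumes that each packet has already been parsed into the three SVC layer identifiers (dependency, temporal, and quality) and that the target layers are supplied by the ADTE; the `SvcPacket` type and the function names are hypothetical, and real RTP/NAL header parsing is deliberately left out.

```python
from typing import List, NamedTuple


class SvcPacket(NamedTuple):
    """Simplified view of an SVC packet, reduced to its layer identifiers."""
    dependency_id: int   # spatial layer
    temporal_id: int     # temporal layer (frame rate)
    quality_id: int      # quality (SNR) layer
    size_bytes: int


def filter_packets(
    packets: List[SvcPacket], max_dependency: int, max_temporal: int, max_quality: int
) -> List[SvcPacket]:
    """Forward only packets at or below the target layers; everything else is dropped."""
    return [
        p for p in packets
        if p.dependency_id <= max_dependency
        and p.temporal_id <= max_temporal
        and p.quality_id <= max_quality
    ]


def forwarded_bytes(packets: List[SvcPacket]) -> int:
    """Total payload size of the forwarded sub-stream."""
    return sum(p.size_bytes for p in packets)
```

Since the decision for each packet depends only on three small header fields, no decoding or transcoding is required, which is why this style of adaptation suits network elements that handle large amounts of data.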

4.2. Session Negotiation Extension

The scalability we describe in this paper relies on the information exchange between client and server, which includes both static capabilities (e.g. terminal or network) and dynamic conditions (e.g. network or user preference). It allows participants to inform each other and negotiate about the QoS characteristics of the media components prior to session establishment. SIP together with Session Description Protocol (SDP) is used in IMS as the multimedia session negotiation protocol. However, the ability is very limited for SDP to indicate user environment information such as terminal capabilities and network characteristics. The User Agent Profile (UAProf) [30] is commonly used to specify user terminal and access network constraints. It is also not enough, because UAProf contains only static capabilities. Although RFC 3840 [31] specifies mechanisms by which an SIP user agent can convey its capabilities and characteristics to other user agents, it is not compatible with MPEG-21-based ADTE. It is important to reach interoperability between IETF approaches for multimedia session management and the MPEG-21 efforts for metadata-driven adaptation, in order to enable personalized multimedia delivery [32]. In our model, UCD and UED serve as the input of ADTE. These input values are in the format of XML document with a known schema. UCD includes the constraints imposed by service providers. We can assume that UCD is available for ADTE. However, UED should be collected for dynamic multimedia session in real time since it is the constraint imposed by external user environment. Therefore, there should be a way to query and monitor UED, particularly terminal capabilities and network characteristics.

In order to collect UED, we propose to extend the Offer/Answer mechanism. According to [33], SDP negotiation may occur in two ways, referred to as "Offer/Answer" and "Offer/Counter-Offer/Answer". In the first way, the offerer offers an SDP and the answerer is only allowed to reject or restrict the offer. In the second way, the answerer makes a "Counter-Offer" with additional elements or capabilities not listed in the original SDP offer. We slightly modify the second way to carry querying information in the "Counter-Offer". DIA defines a list of normative semantic references by means of a classification scheme [34], which includes normative terms for the network bandwidth, the horizontal and vertical resolution of a display, and so on. For instance, the termID "" describes the average available bandwidth in Network Condition. Table 1 shows some examples of the semantic references. To indicate these normative terms in SDP, we define a new attribute/value pair, as shown in Table 2. "Offer" and "Answer" are distinguished by "recvonly" and "sendonly", respectively. The value in an "Offer" is an optional threshold set by the offerer, whereas the value in an "Answer" is mandatory, since it carries the returned result. In the adaptation framework, MME extracts the semantic inputs of AQoS and formats them as SDP. During the Offer/Answer session negotiation procedure, the requested parameters are sent to the User Equipment (UE) in SDP. We assume that there is a module in the UE responsible for providing answers and monitoring dynamic conditions if necessary (e.g., as presented in [35]). Accordingly, the answering values are also conveyed in SDP and sent back to MME to activate adaptation.
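To make the offer and answer roles concrete, the following Python sketch builds and parses one SDP attribute line that carries a DIA term. The attribute name `x-dia-term`, the field order, and the term identifier used in the usage note are assumptions made purely for illustration; the actual attribute/value pair is the one defined in Table 2. Only the rules stated above are encoded: "recvonly" marks an offer whose value is optional, and "sendonly" marks an answer whose value is mandatory.

```python
from typing import NamedTuple, Optional

ATTR_NAME = "x-dia-term"  # hypothetical attribute name, not the one in Table 2


class DiaTerm(NamedTuple):
    term_id: str            # DIA classification-scheme termID
    direction: str          # "recvonly" marks an offer, "sendonly" an answer
    value: Optional[str]    # threshold in an offer, returned value in an answer


def _check(term: DiaTerm) -> None:
    if term.direction not in ("recvonly", "sendonly"):
        raise ValueError("direction must be 'recvonly' or 'sendonly'")
    if term.direction == "sendonly" and term.value is None:
        raise ValueError("an answer must carry a value")


def format_term(term: DiaTerm) -> str:
    """Serialize one term as an SDP attribute line (assumed syntax)."""
    _check(term)
    fields = [term.term_id, term.direction]
    if term.value is not None:
        fields.append(term.value)
    return f"a={ATTR_NAME}:" + " ".join(fields)


def parse_term(line: str) -> DiaTerm:
    """Parse an attribute line produced by format_term."""
    prefix = f"a={ATTR_NAME}:"
    if not line.startswith(prefix):
        raise ValueError("not a DIA term attribute")
    fields = line[len(prefix):].split()
    if len(fields) not in (2, 3):
        raise ValueError("malformed DIA term attribute")
    term = DiaTerm(fields[0], fields[1], fields[2] if len(fields) == 3 else None)
    _check(term)
    return term
```

For example, under these assumptions `format_term(DiaTerm("term-bw", "recvonly", "256"))` yields the line `a=x-dia-term:term-bw recvonly 256`, where `term-bw` is a placeholder rather than a real DIA termID.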

Table 1 Examples of semantic termID in DIA.
Table 2 SDP extension.

The proposed adaptation process is divided into three phases: session initiation, session monitoring, and session adaptation. In the session initiation phase, the party who invokes the service offers the default parameters in SDP via an SIP signaling message, normally SIP INVITE. Besides answering those well-known parameters, MME extracts the input parameters in AQoS and offers them again as a request. Some input parameters, such as terminal capabilities and network capabilities, can be answered immediately, which is enough for modality selection. Others, such as network conditions, need to be monitored in real time. If any parameter moves outside the threshold set by AQoS, an SIP UPDATE carrying the specific SDP is fed back to MME. Once ADTE in MME receives the inputs and makes a decision, the adaptation starts with session renegotiation. In the case of quality adaptation, MME commands the MANE with the new parameters.
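The monitoring step can be sketched as a simple threshold check on the UE side. The Python fragment below is only an illustration under stated assumptions: thresholds are modeled as closed intervals, and the parameter names in the usage note are hypothetical. A non-empty result stands for the condition under which an SIP UPDATE would be sent back to MME.

```python
from typing import Dict, List, Tuple

Threshold = Tuple[float, float]  # (lower bound, upper bound), both inclusive


def violated_parameters(samples: Dict[str, float],
                        thresholds: Dict[str, Threshold]) -> List[str]:
    """Return the monitored parameters whose current sample is out of range."""
    violated = []
    for name, (low, high) in thresholds.items():
        value = samples.get(name)
        # A parameter that has not been sampled yet cannot trigger an update.
        if value is not None and not (low <= value <= high):
            violated.append(name)
    return violated
```

For instance, with a hypothetical threshold `{"bandwidth_kbps": (200.0, 1000.0)}`, a sample of 150 kbps is reported as a violation, whereas a sample of 500 kbps is not.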

5. Prototype and Evaluation

To verify the proposed approach, we developed a prototype system to demonstrate the scenario described in Section 3. The prototype system integrates several open source projects, as illustrated in Figure 6. On the server side, Open IMS Core [36] is deployed as the IMS testbed. We use UCT Advanced IPTV [37] to provide the IPTV service. MME and Text AS are set up with Mobicents SIP Servlet [38] and configured to connect with Open IMS Core. The client is simulated separately in the signaling plane and in the media plane.

Figure 6 Prototype system.

The CAM4Home metadata are central to the proposed mashup model. As mentioned earlier, the CAM4Home project provides a web service platform for generating, storing, and searching metadata. In order to enable our client to access the service, we have deployed a gateway between IMS and CAM4Home. For metadata generation, a minimal set of data is required, such as title, description, and essence file identifier. In our case, CAM objects with QoS metadata (e.g., IPTV and Text) are generated by service providers and deposited in the platform. End users can search, aggregate, share, or annotate these multimedia resources through the gateway.

The system performance is analyzed in the signaling plane and in the media plane, respectively. In the signaling plane, we emulate the IMS signaling client with SIPp [39]. The prototype system demonstrates that the proposed SIP/SDP extension works compatibly with the standardized IMS platform. We observe two notable kinds of latency: UED collection and ADTE processing. The first depends mainly on the characteristics of the UED itself. For instance, if the screen size is considered in UED, it can be retrieved immediately by the UE, whereas the available bandwidth depends on the sampling time. Excluding UED collection, we further observe that the ADTE-incurred delay is 100 ms on average. To some extent, this result confirms that the metadata-based adaptation is efficient, because the precomputation saves significant time in parameter selection.

The media plane relates to quality adaptation. We simulate three types of terminal with different resolutions: mobile phone, smart phone, and laptop. These terminals are assumed to be connected to three kinds of access network, General Packet Radio Service (GPRS), Universal Mobile Telecommunications System (UMTS), and Worldwide Interoperability for Microwave Access (WiMAX), respectively. The terminal settings are listed in Table 3. The quality adaptation is simulated under the constraints of network bandwidth and terminal resolution. The SVC reference software JSVM 9.18 [40] is used as the video codec. The test sequence is ICE, which is encoded with three spatial layers (QCIF, CIF, and 4CIF), five temporal layers (1.875, 3.75, 7.5, 15, and 30 fps), and two quality layers. The supported bitrates at the various spatial and temporal qualities are summarized in Table 4. Figure 7 shows the average bitrates of the adapted videos. Figure 8 presents the output Peak Signal to Noise Ratio (PSNR) curves of the adapted videos.
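The two constraints used in the simulation, network bandwidth and terminal resolution, can be illustrated with a small selection routine. The Python sketch below is not the ADTE optimization itself: it assumes a simple rule (among the feasible operating points, pick the one with the highest bitrate), and the names `Candidate` and `select_operating_point` are hypothetical. The per-point bitrates are an input that would come from a table such as Table 4. The frame sizes are the standard QCIF, CIF, and 4CIF dimensions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Standard luma frame sizes (width, height) of the three spatial layers.
RESOLUTIONS = {"QCIF": (176, 144), "CIF": (352, 288), "4CIF": (704, 576)}


@dataclass(frozen=True)
class Candidate:
    """One SVC operating point together with its bitrate."""
    spatial: str          # "QCIF", "CIF", or "4CIF"
    frame_rate: float     # frames per second
    quality_layer: int    # index of the quality layer
    bitrate_kbps: float   # supplied by the caller


def select_operating_point(candidates: Iterable[Candidate],
                           display_width: int,
                           display_height: int,
                           bandwidth_kbps: float) -> Optional[Candidate]:
    """Pick the highest-bitrate point that fits the display and the bandwidth."""
    feasible = [
        c for c in candidates
        if RESOLUTIONS[c.spatial][0] <= display_width
        and RESOLUTIONS[c.spatial][1] <= display_height
        and c.bitrate_kbps <= bandwidth_kbps
    ]
    if not feasible:
        return None  # no operating point satisfies both constraints
    return max(feasible, key=lambda c: c.bitrate_kbps)
```

Using the highest feasible bitrate as a proxy for quality is a simplification chosen for this sketch; a decision engine could equally rank the feasible points by a utility value.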

Table 3 Terminal, access network, and settings.
Table 4 Average Bitrate.
Figure 7 Output bitrate of adapted video.

Figure 8 Output Y-PSNR of adapted video.

It can be seen that the average bitrates of the adapted videos are consistent with the settings. The adapted videos also have different qualities, as measured by PSNR, and the bitrates correlate with the PSNR values. These results suggest that SVC with the support of ADTE is well suited to quality-assured mashup. Since the media plane is more closely related to user experience, we plan to run usability tests in future work.
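For reference, the PSNR of a luma (Y) frame follows the usual definition, 10 log10(peak^2 / MSE). The Python sketch below computes it for two frames given as flat sequences of 8-bit samples; it assumes both frames have the same size, so a spatially downscaled video would first have to be brought to the reference resolution. The function name `y_psnr` is hypothetical.

```python
import math
from typing import Sequence


def y_psnr(reference: Sequence[int], adapted: Sequence[int], peak: int = 255) -> float:
    """Return the PSNR in dB between two equally sized luma frames."""
    if len(reference) != len(adapted) or len(reference) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    # Mean squared error over all luma samples.
    mse = sum((r - a) ** 2 for r, a in zip(reference, adapted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)
```

A per-sequence value, such as those plotted per frame in a PSNR curve, would be obtained by applying this function to each pair of frames.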

6. Conclusion

This paper presented a metadata-based multimedia mashup framework in NGWN. It not only provides scalable QoS management but also enhances the sociality of mashup. To achieve that, we proposed a flexible framework using the CAM4Home metadata model as a bundle of multimodal media. MPEG-21 DIA was further integrated into the CAM4Home model to meet end-to-end QoS requirements. We addressed the issues in supporting QoS from two aspects, namely, fidelity and modality, in order to tailor and adapt multimedia to the diverse terminals and the heterogeneous networks, as well as dynamic network conditions. The social annotations were used to enrich CAM4Home metadata collaboratively. Finally, a prototype system was developed on the IMS architecture to validate the proposed model. With the use of rich metadata, context awareness and personalization could be challenging topics for future work.


References

  1. Benslimane D, Dustdar S, Sheth A: Services mashups: the new generation of web applications. IEEE Internet Computing 2008, 12(5):13-15.

  2. Brodt A, Nicklas D: The TELAR mobile mashup platform for Nokia internet tablets. Proceedings of the 11th International Conference on Extending Database Technology (EDBT '08), March 2008 700-704.

  3. Falchuk B, Sinkar K, Loeb S, Dutta A: Mobile contextual mashup service for IMS. Proceedings of the 2nd International Conference on Internet Multimedia Services Architecture and Application (IMSAA '08), December 2008

  4. Liu XZ, Huang G, Mei H: A community-centric approach to automated service composition. Science in China, Series F 2010, 53(1):50-63. 10.1007/s11432-010-0013-0

  5. Rao J, Su X: A survey of automated Web service composition methods. Proceedings of the 1st International Workshop on Semantic Web Services and Web Process Composition, July 2004

  6. CAM4Home Official Website,

  7. Yahoo Pipes,

  8. Microsoft Popfly,

  9. Yu Z, Zhou X, Yu Z, Zhang D, Chin C-Y: An OSGI-based infrastructure for context-aware multimedia services. IEEE Communications Magazine 2006, 44(10):136-142.

  10. Yu Z, Zhou X, Zhang D, Chin C, Wang X, Men J: Supporting context-aware media recommendations for smart phones. IEEE Pervasive Computing 2006, 5(3):68-75. 10.1109/MPRV.2006.61

  11. Yu Z, Nakamura Y, Zhang D, Kajita S, Mase K: Content provisioning for ubiquitous learning. IEEE Pervasive Computing 2008, 7(4):62-70.

  12. Zhou L, Xiong N, Shu L, Vasilakos A, Yeo SS: Context-aware middleware for multimedia services in heterogeneous networks. IEEE Intelligent Systems 2010, 25(2):40-47.

  13. Zhou L, Wang X, Tu W, Muntean G, Geller B: Distributed scheduling scheme for video streaming over multi-channel multi-radio multi-hop wireless networks. IEEE Journal on Selected Areas in Communications 2010, 28(3):409-419.

  14. Zhou L, Geller B, Zheng B, Wei A, Cui J: Distributed resource allocation for multi-source multi-description multi-path video streaming over wireless networks. IEEE Transactions on Broadcasting 2009, 55(4):731-741.

  15. Wu Q, Iyengar A, Subramanian R, Rouvellou I, Silva-Lepe I, Mikalsen T: Combining quality of service and social information for ranking services. Proceedings of the 7th International Joint Conference on Service-Oriented Computing, 2009, Lecture Notes in Computer Science 5900: 561-575.

  16. Cardellini V, Casalicchio E, Grassi V, Lo Presti F: Scalable service selection for web service composition supporting differentiated QoS classes.

  17. Thang TC, Jung YJ, Ro YM: Modality conversion for QoS management in universal multimedia access. IEE Proceedings: Vision, Image and Signal Processing 2005, 152(3):374-384. 10.1049/ip-vis:20045084

  18. Thang TC, Jung YJ, Ro YM: Semantic quality for content-aware video adaptation. Proceedings of the IEEE 7th Workshop on Multimedia Signal Processing (MMSP '05), November 2005

  19. Thang TC, Kim J-G, Kang JW, Yoo J-J: SVC adaptation: standard tools and supporting methods. EURASIP Signal Processing: Image Communication 2009, 24(3):214-228. 10.1016/j.image.2008.12.006

  20. Eberhard M, Celetto L, Timmerer C, Quacchio E, Hellwagner H, Rovati FS: An interoperable multimedia delivery framework for scalable video coding based on MPEG-21 digital item adaptation. Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '08), June 2008 1607-1608.

  21. Schwarz H, Marpe D, Wiegand T: Overview of the scalable video coding extension of the H.264/AVC standard. IEEE Transactions on Circuits and Systems for Video Technology 2007, 17(9):1103-1120.

  22. Szwabe A, Schorr A, Hauck FJ, Kassler AJ: Dynamic multimedia stream adaptation and rate control for heterogeneous networks. Journal of Zhejiang University: Science A 2006, 7(1):63-69. 10.1631/jzus.2006.AS0063

  23. Schall D, Truong H-L, Dustdar S: Unifying human and software services in web-scale collaborations. IEEE Internet Computing 2008, 12(3):62-68.

  24. Maaradji A, Hacid H, Daigremont J, Crespi N: Social composer: a social-aware mashup creation environment. Proceedings of the ACM Conference on Computer Supported Cooperative Work, February 2010

  25. Treiber M, Kritikos K, Schall D, Plexousakis D, Dustdar S: Modeling context-aware and socially-enriched mashup. Proceedings of the 3rd International Workshop on Web APIs and Services Mashups, October 2009

  26. Mukherjee D, Delfosse E, Kim J-G, Wang Y: Optimal adaptation decision-taking for terminal and network quality-of-service. IEEE Transactions on Multimedia 2005, 7(3):454-462.

  27. Information Technology—Multimedia Framework (MPEG-21)—Part 1: Vision, Technologies, and Strategy, 2002

  28. Golder SA, Huberman BA: Usage patterns of collaborative tagging systems. Journal of Information Science 2006, 32(2):198-208. 10.1177/0165551506062337

  29. Zhang H, Nguyen H, Crespi N, Sivasothy S, Le TA, Wang H: A novel metadata-based approach for content and service combined delivery over IMS. Proceedings of the 8th Conference on Communication Networks and Services Research, May 2010

  30. OMA-UAPROF: User Agent Profiling Specification (UAPROF) 1.1. Open Mobile Alliance, December 2002

  31. IETF RFC 3840: Indicating User Agent Capabilities in the Session Initiation Protocol (SIP). August 2004

  32. Kassler A, Guenkova-Luy T, Schorr A, Schmidt H, Hauck F, Wolf I: Network-based content adaptation of streaming media using MPEG-21 DIA and SDPng. Proceedings of the 7th International Workshop on Image Analysis for Multimedia Interactive Services, 2006

  33. IETF RFC 3264: An Offer/Answer Model with the Session Description Protocol (SDP). June 2002

  34. Vetro A, Timmerer Ch, Devillers S: Information Technology—Multimedia Framework—Part 7: Digital Item Adaptation. ISO/IEC JTC 1/SC 29/WG11/N5933, October 2003

  35. Özçelebi T, Radovanović I, Chaudron M: Enhancing end-to-end QoS for multimedia streaming in IMS-based networks. Proceedings of the 2nd International Conference on Systems and Networks Communications (ICSNC '07), August 2007

  36. Magedanz T, Witaszek D, Knuettel K: The IMS playground @ FOKUS—an open testbed for next generation network multimedia services. Proceedings of the 1st International Conference on Testbeds and Research Infrastructures for the Development of Networks and Communities (Tridentcom '05), February 2005

  37. UCT Advanced IPTV,

  38. Mobicents,

  39. SIPp,

  40. SVC Reference Software (JSVM Software),


This paper was supported in part by the SERVERY and CAM4Home projects. The authors would like to thank all partners for their contributions and thank Hui Wang and Mengke Hu for their simulation work.

Author information

Corresponding author

Correspondence to Hongguang Zhang.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Zhang, H., Zhao, Z., Sivasothy, S. et al. Quality-Assured and Sociality-Enriched Multimedia Mobile Mashup. J Wireless Com Network 2010, 721312 (2010).
