
ESVD: An Integrated Energy Scalable Framework for Low-Power Video Decoding Systems


Video applications on mobile wireless devices are challenging because of the limited capacity of batteries. The high computational complexity of video decoding imposes heavy resource requirements. Thus, power-efficient control has become a more critical design concern as devices integrate complex video processing techniques. Previous work on power-efficient control in video decoding systems often aims at low-complexity design; it does not explicitly consider the scalable impact of the subfunctions of the decoding process, and it seldom considers the relationship with the features of the compressed video data. This paper develops an energy-scalable video decoding (ESVD) strategy for energy-limited mobile terminals. First, ESVD can dynamically adapt to variable energy resources through a device-aware technique. Second, ESVD couples decoder control with the decoded data by classifying the data into different partition profiles according to their characteristics. Third, it applies utility theory to the resource allocation process so as to maximize resource utilization. Finally, it adapts to different energy budgets and generates scalable video decoding output in energy-limited systems. Experimental results demonstrate the efficiency of the proposed approach.

1. Introduction

With the growing popularity of portable video applications, such as video-capable smartphones, mobile video terminals such as PDAs, and in-vehicle DVD devices, the energy consumption of video decoders has become an important design requirement. Many codecs have been released for the major video coding standards, including MPEG-4/2, H.264/3, and AVS. Generally, decoders focus on performance and rarely support a dynamic decoding process that adapts to variable energy resources. However, most portable video devices run on batteries with a limited energy supply. The battery capacity of a portable device is limited, and its usable capacity declines with use. Thus, power should be used economically to provide longer service time. How, then, can a video decoder adapt to the resources of a handheld device, and how can decoding quality be maximized under a battery constraint when video is played on a portable terminal? This paper tries to answer these questions.

In this paper, we propose simple, energy-scalable video decoding algorithms for energy-constrained terminals to save power and improve video quality. Moreover, we complement these algorithms with a device energy-aware method to lengthen the available time of video services. This is achieved by maximizing the number of decoded video frames available under a given power budget. The resulting framework, called ESVD, is an integrated energy-scalable video decoding framework for low-power video decoding applications. ESVD uses energy profiles as the guideline for scalable management, where each energy profile corresponds to an energy budget. Within ESVD, the algorithms use utility theory to find the best energy level for each subfunction of the decoder.

In the ESVD system, the video decoder can dynamically adapt to variable energy resources through an energy-aware technique. ESVD helps the decoder couple the decoded data with the decoding process. The decoder can work under variable energy constraints, expressed as different energy consumption budgets, and provides decoding output over a wide, adjustable range of energy consumption. In addition, ESVD uses utility theory to resolve the tradeoff between decoding quality and energy consumption, so as to obtain better performance at each energy level.

This paper is organized as follows. Section 2 describes related work; Section 3 presents the parsing and labeling method that provides the basis for ESVD; Section 4 describes the energy-scalable video decoding algorithms; Section 5 evaluates them; and Section 6 concludes.

2. Related Work and Background

The contributions of the paper are related to several areas of work, which we consider in turn.

2.1. Designing Low-Power Video Encoders

2.1.1. Scalable Video Decoders on Terminals

De Schrijver et al. [1] study scalable video coding. They relate the memory and processing power of the terminal to the bandwidth of the video fragments, so scalability derives from the encoded scalable video bitstream. Yanagihara et al. [2] propose a CPU-load-scalable video decoding algorithm that uses several DCT-domain manipulations, such as low-pass filtering and resolution conversion; the decoder targets multichannel multicast systems. Their work is foundational to ours. Landge et al. [3] propose a systematic framework to optimize the energy consumption of wavelet-based video decoders, using generic computational complexity metrics derived from the execution frequency of program basic blocks. Since the decoder often does not know the encoded streams beforehand, this scalability is obtained postmanufacturing and is specific to each codec system.

2.1.2. Designing Low-Power Video Decoders

Masselos et al. [4] design a low-power decoder based on replacing each image block with a selected codeword in the output image. In addition, they apply efficient transformations to the codewords to compensate for the quality degradation caused by the small codebook size on the encoder side. This method reduces memory requirements and hence power consumption. Lee and Kuo [5] let the encoder select appropriate inter-prediction modes when generating the video bitstream: the encoder estimates the decoding complexity and chooses the best inter-prediction mode to meet the complexity constraint of the target decoding platform. In short, these methods rely on the encoder to reduce the resource consumption of the decoder. From the integrated-circuit perspective, Liu et al. [6] derive fast algorithms for IDCT, deblocking filtering, and prediction, which reduce processing cycles, memory size, and memory access frequency. These are the main measures for lowering power consumption, and this work is also complementary to ours. Other low-power design techniques include skipping computation on zero components, using lower constant multipliers, reducing transitions in the data path, and self-adaptive techniques. These methods work well in the IDCT and prediction-compensation modules; examples include August and Ha [7] for IDCT and prediction, Tsai and Fang [8] for VLC, and Xu and Choy [9] for self-adaptive prediction. We combine ideas from scalable decoders with low-power design methods so as to achieve both scalability and efficiency.

2.1.3. Complexity Power Mapping in Video Decoders

On the encoder side, researchers have studied how to measure the power consumption of video encoders. He et al. [10] analyze the rate-distortion (R-D) behavior of video encoding systems under energy constraints. Based on the power-rate-distortion (P-R-D) model in [10], they show that power is tightly coupled with rate, so trading bits for joules is an efficient way to minimize energy [11]. Although these models were proposed for the encoder, they can serve as a reference for low-power decoder design. On the decoder side, existing approaches work in two steps. First, complexity metrics are measured, such as the number of basic operations [12] and memory access frequency and occupancy [13]. Second, a mapping between these complexity metrics and power or energy consumption is used to estimate the actual consumption [10, 14]. We combine complexity metrics with power mapping methods, which in turn guide the design of the control algorithm that optimizes energy consumption.
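The two-step approach can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that energy is a linear function of the step-1 metrics (basic operation count and memory accesses); `EnergyModel`, `frame_energy_nj`, `average_power_mw`, and the per-event costs are hypothetical and are not taken from [10] or [14], where the mappings are platform-specific.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyModel:
    """Hypothetical per-event energy costs (nJ); real values must be measured."""
    nj_per_op: float      # energy per basic operation (step-1 metric)
    nj_per_access: float  # energy per memory read/write (step-1 metric)


def frame_energy_nj(op_count: int, mem_accesses: int, model: EnergyModel) -> float:
    """Step 2: map the complexity metrics of one frame to energy (linear model)."""
    return op_count * model.nj_per_op + mem_accesses * model.nj_per_access


def average_power_mw(energy_nj_per_frame: float, fps: float) -> float:
    """nJ/frame * frames/s = nW; divide by 1e6 to get mW."""
    return energy_nj_per_frame * fps / 1e6


if __name__ == "__main__":
    model = EnergyModel(nj_per_op=0.5, nj_per_access=5.0)
    e = frame_energy_nj(2_000_000, 300_000, model)
    print(e, average_power_mw(e, fps=30))
```

For example, 2,000,000 operations and 300,000 memory accesses per frame at 0.5 nJ and 5 nJ per event give 2.5 mJ per frame, or 75 mW at 30 frames/s.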

2.2. Complexity Metrics in Video Codec

2.2.1. Complexity Evaluation in MPEG

It is widely recognized that the MPEG standards have played a major role in the emergence and development of multimedia communications and applications [15]. In terms of compression ratio, MPEG plays an important role in low-bit-rate video coding. In terms of complexity, MPEG provides three tools to evaluate codec complexity and thereby bound the resources required at the decoder; with these models, limits can be set on memory and computational requirements. The MPEG-4 standard defines a video buffering verifier mechanism that comprises three virtual buffer models: the video rate buffer verifier (VBV), the video complexity verifier (VCV), and the video reference memory verifier (VMV). The VCV model is applied to all macroblocks (MBs) in an MPEG-4 video bitstream and is used to verify the computational power required at the decoder. It is defined in terms of the VCV decoding rate in MBs per second and the VCV buffer size, and it is applied to all MBs in the scene [16, 17]. It mainly addresses processing speed and determines whether the required decoding resources stay within the values specified for the corresponding profile and level.

In the VCV model, the computational complexity of the decoder is defined by relating the data rate to the number of MBs per second that the decoder has to decode. However, the computational power required to decode each MB may vary widely with the MB type. According to the analysis in [16], the decoding complexity of encoded video data can be associated with the rates of the following parameters: the number of MBs, the number of MBs per shape type (such as boundary or transparent), the number of MBs per combination of texture and shape coding types, and the number of arithmetic instructions and memory read/write operations. Therefore, the number of MBs per combined coding type is a better indicator of the major factors that determine the actual decoding complexity of the compressed data. Based on this, an alternative VCV model is proposed in [18] that allows more efficient use of the available decoding resources. In this model, the decoding complexity is measured by combining the MB complexity types with the number of MBs of each type, so it can be evaluated and characterized by the combination of scene, shape, and texture coding tools. This model enhances the VCV model by adding some of the determining factors. Furthermore, the simplified control method in [18] can distinguish the various MB types in terms of decoding complexity, with complexity weights defined relative to the most complex MB type of each profile. This ensures that MPEG-4 decoders remain compliant even in the most critical cases, making the alternative model a useful supplement to the video complexity verifier.

2.2.2. Complexity Evaluation in H.264

H.264/AVC incorporates many advanced techniques of standard video coding technology and promises significant advances over the state of the art in a broad variety of applications [19, 20]. Reference [21] compares H.264/AVC with previous standards in terms of coding efficiency and hardware complexity. Assessing the complexity of a video coding standard is not a straightforward task, and H.264/AVC is no exception. Although the complexity depends heavily on the characteristics of the implementation platform, there are still metrics for evaluating implementation complexity. Reference [21] analyzes the complexity of H.264/AVC based on new versions of the executable H.264/AVC specification, which include updated tool definitions and achieve reduced complexity [22]. This analysis divides the H.264/AVC decoder into six parts: CABAC, RD-Lagrangian optimization, B-frames, Hadamard transform, deblocking filter, and displacement vector resolution, and it analyzes each part in detail in terms of access frequency and decoding time.

2.2.3. Complexity Metrics in Video Codec

In general, both the VCV model and the alternative VCV model measure decoding complexity in terms of the number of MBs. The relative complexity weight of each MB complexity type is obtained as the ratio between the maximum decoding time of that MB type and the highest maximum decoding time among all MB types relevant to the decoder profile. This method is widely adopted in video codecs, for example in [23].
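The weighting rule above can be sketched as follows. This is a minimal example assuming hypothetical MB type names and maximum decoding times; `relative_weights` and `weighted_mb_count` are illustrative helpers, not part of the VCV specification.

```python
def relative_weights(max_decode_time_us: dict[str, float]) -> dict[str, float]:
    """Weight of each MB type = its maximum decoding time divided by the
    highest maximum decoding time over all MB types of the profile."""
    worst = max(max_decode_time_us.values())
    return {mb_type: t / worst for mb_type, t in max_decode_time_us.items()}


def weighted_mb_count(mb_counts: dict[str, int], weights: dict[str, float]) -> float:
    """Decoding complexity of a frame, expressed as an equivalent number of
    MBs of the most complex type."""
    return sum(n * weights[mb_type] for mb_type, n in mb_counts.items())


# Hypothetical measurements (microseconds) and MB counts for one frame.
weights = relative_weights({"intra": 40.0, "inter_p": 20.0, "inter_b": 10.0})
print(weights)  # {'intra': 1.0, 'inter_p': 0.5, 'inter_b': 0.25}
print(weighted_mb_count({"intra": 10, "inter_p": 200, "inter_b": 100}, weights))  # 135.0
```

Normalizing by the worst-case MB type keeps all weights in (0, 1], so the weighted count can be compared directly with a decoding rate specified in worst-case MBs per second.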

The measurement flow of a video complexity evaluation system, such as one for a video codec, can typically be divided into the following main phases.

(1) Algorithmic development phase. This first phase focuses on algorithmic performance. The algorithmic specification is typically released as a standard description plus a software verification model [24]. In this phase, a complexity cost function based on C-level analysis is needed, and an efficient implementation of each algorithm is adopted as long as performance is preserved [25]. This phase focuses on reducing complexity, leading to high performance and enabling low-power realizations at the algorithm level.

(2) Evaluation phase. This phase deals with the actual system realization on a specific platform, where the true implementation complexity of the algorithm can be acquired. This stage determines the cost of each module or algorithm on a given series of terminals and, hence, affects its success and widespread adoption.

On the other hand, memory access is another key factor in power consumption. In video decoding, the primary design goal is to reduce memory transfers between large frame memories and the data paths. Many studies model the cost of a data transfer as a function of memory size, memory type, and access frequency, for example [5, 13, 26]. The relevant measure is the number of accesses per second rather than the clock frequency [26]. Accurately calculating the dynamic cost of each frame during decoding is difficult; therefore, [12] provides an upper bound on memory consumption.

3. Parsing and Labeling Video Decoding

The main low-power techniques for reducing processing cycles and memory requirements were described and discussed in Section 2. In this section, we analyze how to partition the decoder so as to provide scalable output.

In many cases, the residual battery capacity of a portable device is not sufficient for users to watch video programs whenever they wish. At the same time, in a typical video decoding system each module consumes a different amount of power and affects video quality to a different degree; that is, the modules contribute differently in an environment with energy or battery constraints. Therefore, balanced decoding control must trade off maximum battery lifetime against minimum distortion.

Given that the residual capacity of the battery can vary substantially, it makes sense to schedule modules and perform power management as if the scalable decoder were a heterogeneous system. On the other hand, in most current video decoders, especially for real-time mobile video applications, more effort has been devoted to improving robustness, for example, data partitioning in H.264, or decoders that work with little redundant information or little support from the encoder side. In this case, useful side information can be introduced to help the decoder. In this environment, there are three high-level control issues: the MB types in the coded data, the detailed MB partition information, and the effect of human visual properties within a single image. Based on these configurations, we present a set of energy-scalable algorithms for video decoding scheduling and energy management, aimed at minimizing power and maximizing video quality. The scheduling algorithms are intended to complement the scheduling criteria, such as priority and fairness, produced by the parsing and labeling control. The parsing and labeling process is detailed below.

3.1. MB Type Information

First, MB type information is used as the primary decision criterion, since an intra MB is decoded without referencing any MB in another picture [27] but may be referenced by other inter MBs. Intra MBs are therefore usually considered more important than inter MBs. Accordingly, intra MBs, inter MBs, and inter MBs in B frames each receive their own label, as given in (1). From the block-type perspective, this means that intra blocks and intra frames are assigned to, and processed in, higher energy profiles than inter blocks. The VCV model discussed in Section 2.2 also uses MB type information as a main decoding control term.


In (1), the label is the result of parsing the MB type information of the MB at a given position index.

3.2. MB Partition Information

Second, MB partition information is used as the secondary decision criterion.

Each intra MB can be coded in one of three intra modes, and each inter MB in a P frame can be partitioned in one of five inter modes. Altogether, an MB can use the following seven partition sizes: 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4. If a block is partitioned into 4×4 blocks, it is the finest block and may be assigned to the top-level profile, whereas a 16×16 block is a coarse block and belongs to the bottom-level profile. The MB partition information can easily be extracted after entropy decoding, so it becomes a criterion for assigning the MB to an energy profile. For simplicity, we use energy controlling parameters to mark the blocks or MBs, so as to obtain a reasonable distribution over the energy profiles. The controlling level and its values corresponding to the MB partition information are given in (2).


In general, an MB can be regarded as a combination of basic blocks that belong to different partitions. The basic block is the 4×4 block in H.264 [19] and the 8×8 block in MPEG-2 [28] and AVS [29]. Hence, the marked coefficient of an MB is derived from the partition results of its basic blocks; a weighted sum is adopted in this paper. For instance, consider an MB that consists of four 4×4 blocks in the top-left corner, two blocks in the top-right corner, two blocks in the bottom-left corner, and an 8×8 block in the bottom-right corner, as shown in Figure 1. The final coefficient, which assigns the MB to the appropriate energy profile, is the weighted sum of the coefficients of these blocks.

Figure 1: An example of computing the MB partition information.
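A minimal sketch of the weighted-sum rule, applied to the MB of Figure 1. The per-partition coefficients in `PARTITION_COEFF` are hypothetical stand-ins for the values of (2), weighting by block area is an assumption (the text states only that a weighted sum is used), and the two unspecified sub-block shapes of Figure 1 are assumed to be 8×4 and 4×8.

```python
# Hypothetical per-partition coefficients standing in for (2):
# finer partitions get larger values (higher energy profile).
PARTITION_COEFF = {"16x16": 0, "16x8": 1, "8x16": 1, "8x8": 2,
                   "8x4": 3, "4x8": 3, "4x4": 4}
MB_AREA = 16 * 16


def block_area(mode: str) -> int:
    width, height = (int(v) for v in mode.split("x"))
    return width * height


def mb_partition_coefficient(blocks: list[str]) -> float:
    """Area-weighted sum of block coefficients, normalised by the MB area."""
    covered = sum(block_area(b) for b in blocks)
    if covered != MB_AREA:
        raise ValueError(f"blocks cover {covered} pixels, expected {MB_AREA}")
    return sum(PARTITION_COEFF[b] * block_area(b) for b in blocks) / MB_AREA


# Figure 1 layout, assuming 8x4 blocks top-right and 4x8 blocks bottom-left.
figure1_mb = ["4x4"] * 4 + ["8x4"] * 2 + ["4x8"] * 2 + ["8x8"]
print(mb_partition_coefficient(figure1_mb))  # 3.0
```

Area weighting lets each basic block contribute in proportion to the pixels it covers, so an MB made entirely of one partition size simply receives that size's coefficient.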

3.3. Effect of Human Visual Properties

Third, the effect of human visual properties is used as the third decision criterion. In many video applications, viewers pay more attention to their regions of interest. For example, in head-and-shoulders video, which is common in video applications, the region of interest (ROI) of viewers is usually the human face rather than the background. Thus, the decoder should allocate more resources, including bits and computational power, according to human subjective perception to improve the overall visual quality [30]. From the objective perspective, [31] gives a detailed account of image segmentation strategies, analyzing the main segmentation approaches for multimedia services in terms of their features. The first approach estimates the segmentation from the positions of the transitions that mark the separation between neighboring regions. This approach has been most successful in the temporal case and has been applied to both spatial and temporal segmentation problems. The second approach estimates regions as sets of elements that are homogeneous in a given feature space; it has mostly been applied to spatial and spatio-temporal segmentation. Here, we apply these segmentation ideas and ROI techniques to image region decisions, marking the regions of an image according to the degree of human attention. ROI is adopted as an efficient tool for classifying image regions: it divides an image into several parts at different levels. When the available battery energy is insufficient, the ROI information is used to allocate the available energy optimally to the different parts of the image according to their relative levels. Since human viewers usually attend to the central region of an image first, blocks in the central region are allocated to a higher energy profile than those in the surrounding region.
As shown in Figure 2, the degree of human attention decreases from the central region toward the surrounding regions, so the energy controlling parameters can be marked as in (3), which is indexed by the position of the MB.
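The central-first rule can be sketched as below. The three-ring layout, the thresholds of one-third and two-thirds of the normalized distance from the frame center, and the label values 2/1/0 are all assumptions standing in for (3); `roi_label` is a hypothetical helper that takes the raster-order MB position index.

```python
def roi_label(mb_index: int, mbs_per_row: int, mbs_per_col: int) -> int:
    """Hypothetical ROI label: 2 = central region, 1 = middle ring, 0 = border.

    mb_index is the raster-order position index of the MB in the frame.
    """
    if not 0 <= mb_index < mbs_per_row * mbs_per_col:
        raise ValueError("MB index outside the frame")
    row, col = divmod(mb_index, mbs_per_row)
    # Distance of the MB centre from the frame centre, normalised so that
    # the frame border lies at 1.0 (Chebyshev distance gives rectangular rings).
    dx = abs(col + 0.5 - mbs_per_row / 2) / (mbs_per_row / 2)
    dy = abs(row + 0.5 - mbs_per_col / 2) / (mbs_per_col / 2)
    d = max(dx, dy)
    if d < 1 / 3:
        return 2
    if d < 2 / 3:
        return 1
    return 0


# CIF frame: 22 x 18 MBs.
print([roi_label(i, 22, 18) for i in (209, 77, 0)])  # [2, 1, 0]
```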

Figure 2: Utilities as a function of decoding effect and power consumption.

These parsing and labeling configurations provide the conditions for the energy-scalable algorithms that follow: energy-profile scheduling and energy-scalable management rely on the criteria produced by the parsing and labeling control, such as priority and fairness. In the next section, we develop a model of energy-scalable video decoding (ESVD). With these methods, the overall energy consumption can be optimized while ESVD provides the best achievable decoding quality under energy constraints.

4. Energy-Scalable Video Decoding Model

In this model, each energy profile corresponds to a different energy consumption level, and the video decoder runs at one of these profiles. With scalable energy profiles, the most obvious optimization goal is to maximize performance at a given power or energy budget. Given the complexity or power budget of this environment, designing the scheduling and energy or power management algorithms well requires a global optimization.

Section 2 reviewed possible algorithms for maximizing performance at a target power. To simplify the problem, the first step constructs the parsing and labeling process in the video decoder, as detailed in Section 3; this forms the foundation of ESVD. On the other hand, most video decoding systems, especially in mobile applications, have a limited energy supply. Most services or functions of mobile devices have estimable power consumption, which means that an upper bound on their consumption can be obtained. Generally speaking, total consumption is measured against the available battery capacity; that is, the available battery lifetime is inversely proportional to the energy consumption. Strictly speaking, the energy consumption of a typical video processing application results from a number of factors, including the number of functions in use, the operating system, the hardware, and battery life. Most studies distinguish between two types of power constraints, namely peak constraints and average constraints. Here, we propose another type, a bound constraint. Let $F$ denote a function, $f_i$ its $i$th subfunction, and $E_{\min}(F)$ the minimum energy required to implement $F$. A bound on energy consumption also exists for a video decoder. This implies that the optimal energy control method is obtained when the total energy consumption it yields approaches this bound as closely as possible. The video decoding function contains many subfunctions, such as interpolation (INTP), deblocking filter (DF), entropy decoding (END), and inverse transform (IDCT) [32]. By the definition of the bound constraint, designing a video decoding system with optimal energy or power consumption reduces to finding the best control of these subfunctions that achieves lower energy or power consumption, thereby prolonging the available battery life.

The above discussion shows that performance can be maximized at different target power levels. We solve the problem in two steps. First, parsing and labeling map the subfunctions of the video decoder at the MB level, so as to generate scalable video decoding output. Second, a power management algorithm finds the best subfunction configuration for each power profile, maximizing overall performance while keeping power consumption low.

4.1. Energy Scalable Management in Video Decoder

To compute the integrated weight of an MB and assign it to an appropriate energy profile, the three decision phases proposed in Section 3 are combined, which requires a mapping between the levels used in each phase. This problem is solved as follows. Consider the set of subfunctions of the MB-level video decoding function, where each subfunction can run at several levels, each with a corresponding power consumption level. The problem is to find the selection of power consumption levels for the subfunctions that maximizes decoding quality, subject to the constraint that the scalable power consumption of the whole decoding process stays below the budget of each energy/power profile. Our approach reduces this to a linear optimization problem. Overall, the results of the parsing and labeling procedure are mapped onto the energy/power profiles in order. Specifically, subfunctions are selected in order for the MBs and in a round-robin manner over the decoding of the whole video sequence.
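To make the selection problem concrete, the sketch below solves a small instance by exhaustive search rather than by the linear optimization used in ESVD. It assumes that the quality contributions and power costs of the subfunctions are additive, and all numbers are hypothetical; `best_levels` is an illustrative name. Exhaustive search is only practical for a handful of subfunctions, such as INTP, DF, END, and IDCT, with few levels each.

```python
from itertools import product
from typing import Optional


def best_levels(quality: list[list[float]], power: list[list[float]],
                budget: float) -> tuple[Optional[tuple[int, ...]], float]:
    """Exhaustively pick one level per subfunction.

    quality[s][l] and power[s][l] are the (assumed additive) quality gain and
    power cost of subfunction s running at level l.
    Returns (levels, total quality), or (None, -inf) if nothing fits the budget.
    """
    best, best_q = None, float("-inf")
    for choice in product(*(range(len(q)) for q in quality)):
        cost = sum(power[s][lvl] for s, lvl in enumerate(choice))
        if cost > budget:
            continue
        q = sum(quality[s][lvl] for s, lvl in enumerate(choice))
        if q > best_q:
            best, best_q = choice, q
    return best, best_q


# Hypothetical two-subfunction instance: index 0 = DF, index 1 = INTP.
quality = [[0.0, 2.0], [1.0, 2.5]]
power = [[0.0, 3.0], [2.0, 4.0]]
print(best_levels(quality, power, budget=6.0))  # ((1, 0), 3.0)
```

With a budget of 6 units, running DF at its higher level and INTP at its lower level is the best feasible choice; the search space grows as the product of the level counts, which is why a formal optimization is preferable at scale.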

For simplicity, a linear weighted combination of the three labels is used:

$$\lambda(i) = w_1\,\lambda_{\mathrm{type}}(i) + w_2\,\lambda_{\mathrm{part}}(i) + w_3\,\lambda_{\mathrm{ROI}}(i), \qquad (4)$$

where $\lambda_{\mathrm{type}}(i)$, $\lambda_{\mathrm{part}}(i)$, and $\lambda_{\mathrm{ROI}}(i)$ are the labels given by (1), (2), and (3) for the MB at position index $i$, and the weights $w_1$, $w_2$, and $w_3$ represent the effect of each phase on the total performance. The final value $\lambda(i)$ assigns the MB to an appropriate energy profile. As a simple example, suppose the video is encoded in AVS and the initial weights are set empirically; the marked coefficient of an MB then ranges from zero up to a maximum determined by the weights and label values, which gives the bound of the marked coefficient. Suitable levels can be defined either theoretically or empirically, each corresponding to an interval of this range. Three levels are used: a coarse level, a half-accurate level, and an accurate level; for clarity, the intervals are configured equally. For example, suppose that, after entropy decoding of a coded frame, an MB is found to be intra-coded, with a given partition mode, and located in the region adjacent to the image center. Its final marked coefficient is calculated through (4), and the resulting labeling energy index determines the energy profile to which the MB belongs.
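The linear combination and the equal-interval mapping to three levels can be sketched as follows. The MB type labels, the unit weights, and the maximum label of 8 are hypothetical values, not the paper's AVS configuration; the maximum follows from assuming MB type labels in [0, 2], partition coefficients in [0, 4], and ROI labels in {0, 1, 2}. `combined_label` and `energy_level` are illustrative names.

```python
# Hypothetical label values for the MB type phase.
TYPE_LABEL = {"intra": 2.0, "inter_p": 1.0, "inter_b": 0.0}


def combined_label(mb_type: str, partition_coeff: float, roi: int,
                   w: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Linear weighted combination of the three phase labels."""
    return w[0] * TYPE_LABEL[mb_type] + w[1] * partition_coeff + w[2] * roi


def energy_level(label: float, max_label: float, n_levels: int = 3) -> int:
    """Map a label in [0, max_label] to level 1 (coarse) .. n_levels (accurate)
    using equal-width intervals."""
    if not 0 <= label <= max_label:
        raise ValueError("label outside [0, max_label]")
    width = max_label / n_levels
    # The top edge (label == max_label) is clamped into the last level.
    return min(int(label // width), n_levels - 1) + 1


# Intra MB, partition coefficient 3.0, central region: 2 + 3 + 2 = 7.
lam = combined_label("intra", 3.0, 2)
print(lam, energy_level(lam, max_label=8.0))  # 7.0 3
```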

4.2. Utility Function in Power Control Scheme

Since decoding a frame consists of decoding its MBs, all MBs share a common resource constraint: each MB competes with the others for battery energy. Meanwhile, PSNR and bit rate measure the decoding quality of all MBs. Ideally, each MB would achieve normal decoding quality while expending a small amount of energy. In some cases, better decoding quality or longer playback is desired even when the available battery capacity is insufficient. For example, most mobile terminals can work in different battery states, including "Maximum battery life mode", "Battery optimized mode", "Maximum performance mode", and "Enhanced quality mode", each corresponding to a battery working mode of the device. These states are widely used in mobile devices and terminals, and the video decoder should provide decoding output that matches them. It is therefore necessary to optimize the video decoding process under the battery resource constraint, which amounts to a tradeoff between better decoding quality and lower energy consumption in the corresponding working state. Finding a good balance between these two conflicting objectives is the primary focus of the power control component of resource management. This tradeoff is illustrated by the conceptual line in Figure 2. If the decoder power is fixed, higher decoding quality can come only from a more reasonable allocation of system resources. If the decoding quality is fixed, increasing power consumption accelerates battery drain, which reduces the effective use of the mobile terminal.

An optimal power control algorithm for video decoding systems should maximize decoding quality. Traditionally, the objective is to achieve acceptable PSNR as the measure of decoding quality. However, this single objective is not enough for efficient video decoding, because power consumption is another important factor in applications. A high PSNR at the decoder output clearly results in better decoding quality, but achieving it often requires the terminal to work in a high power consumption state, which in turn shortens battery life. These issues can be quantified by the utility function of an MB decoding unit, defined in (5).


Here, the power terms are the battery power consumption of the decoder in its normal state and in the corresponding energy profile, respectively; likewise, the quality terms are the decoding quality in the full decoding state and in the corresponding energy profile. Utility as defined above combines decoding quality and power consumption, and this efficiency function has the desired properties. In the ideal case, the profile values coincide with the full-state values, meaning that the decoder performs full-state decoding; the mobile terminal can then work in "Maximum performance mode" or "Enhanced quality mode", and the decoding quality reaches its maximum. Moreover, the utility is a monotonically increasing function of the decoding quality. Hence, for a fixed target power consumption, the best strategy for MB decoding is to choose a level for each subfunction that maximizes the utility. This suggests that, in order to maximize utility, every MB in the video decoding system should improve its decoding quality while consuming as little energy as possible. The utility function is therefore suitable for measuring the power efficiency of video decoding systems.
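Because (5) is not reproduced here, the sketch below assumes one simple form with the stated properties: normalized quality divided by normalized power, so that the utility equals 1 in the full decoding state and increases with quality at fixed power. The form, the profile values, and the helper names `utility` and `best_profile` are assumptions for illustration only.

```python
from typing import Optional


def utility(q: float, p: float, q_full: float, p_full: float) -> float:
    """Assumed form: normalised quality per unit of normalised power.
    Equals 1.0 in the full decoding state (q = q_full, p = p_full)."""
    if min(q_full, p, p_full) <= 0:
        raise ValueError("quality and power references must be positive")
    return (q / q_full) / (p / p_full)


def best_profile(profiles: list[tuple[str, float, float]], q_full: float,
                 p_full: float, p_budget: float) -> Optional[str]:
    """Among profiles (name, quality, power) that fit the power budget,
    return the name of the one with the highest utility."""
    feasible = [prof for prof in profiles if prof[2] <= p_budget]
    if not feasible:
        return None
    return max(feasible, key=lambda prof: utility(prof[1], prof[2], q_full, p_full))[0]


# Hypothetical profiles: (name, PSNR in dB, decoder power in mW).
profiles = [("full", 40.0, 400.0), ("half", 38.0, 240.0), ("low", 24.0, 160.0)]
print(best_profile(profiles, q_full=40.0, p_full=400.0, p_budget=500.0))  # half
```

Under this assumed form, a profile that keeps most of the quality at a fraction of the power scores above 1, which matches the goal of improving quality while consuming as little energy as possible.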

4.3. Energy Allocation Scheme Based on Macroblock Tracking

As mentioned above, most mobile terminals provide several working states, such as "Maximum battery life mode", "Battery optimized mode", and "Maximum performance mode". Suppose, accordingly, that the video decoder provides decoding output matching these working states, with each energy profile corresponding to a decoding level. The goal is then to adjust the decoder state at the MB level to obtain the best decoding quality under the energy consumption budget of the profile. Following the arguments in (5), we obtain (6).


For example, for video in CIF format (352 × 288 pixels), a frame contains 22 × 18 = 396 MBs, and the parameters in (6) are set accordingly. From the discussion above, all MBs in a frame are parsed and labeled with scalar values, each representing the final labeling result of an MB, so the MBs of a frame can be allocated to different energy profile levels according to their labels. Since the decoder is divided into several levels at MB granularity, we associate the MBs with different decoding states to achieve fine-grained allocation. The number of decoder states is usually not equal to the number of MBs, which leads to an optimization problem: the MBs must be configured into suitable decoding states to obtain better decoding quality. From (6), we obtain (7).


As mentioned above, for the sake of simplicity we classify these MBs into three levels and assume that MBs in the same level have the same energy budget: every MB in level 1 receives the level-1 per-MB budget, and likewise every MB in level 2 and level 3 receives the budget of its level. Then (7) can be rewritten as
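With three levels, the frame-level constraint reduces to a weighted count: if n1, n2, and n3 MBs sit in levels 1 to 3 and each MB in level k costs e_k, the allocation is feasible when n1·e1 + n2·e2 + n3·e3 does not exceed the frame budget. This notation is ours and not necessarily that of the paper's equation. The sketch below assumes e1 > e2 > e3 and, when the frame is over budget, greedily demotes MBs one level at a time, least important label first and level-1 MBs before level-2 MBs. The greedy rule and all names are illustrative assumptions, not the paper's optimization.

```python
# Hypothetical sketch of fitting a three-level MB allocation into a frame
# energy budget by greedy demotion; not the paper's exact optimization.


def frame_energy(levels, cost):
    """Total energy n1*e1 + n2*e2 + n3*e3, computed MB by MB."""
    return sum(cost[level] for level in levels)


def fit_to_budget(labels, levels, budget, cost):
    """Demote MBs (1 -> 2, then 2 -> 3) until the frame fits `budget`.

    labels: per-MB importance; the least important MBs are demoted first.
    levels: initial per-MB levels in {1, 2, 3}.
    cost:   assumed per-MB energy of each level, e.g. {1: 3.0, 2: 2.0, 3: 1.0}.
    """
    levels = list(levels)
    used = frame_energy(levels, cost)
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    for src in (1, 2):  # exhaust level-1 demotions before touching level 2
        for i in order:
            if used <= budget:
                return levels
            if levels[i] == src:
                used -= cost[src] - cost[src + 1]
                levels[i] = src + 1
    if used > budget:
        raise ValueError("budget too small even with every MB at level 3")
    return levels


cost = {1: 3.0, 2: 2.0, 3: 1.0}
print(fit_to_budget([0.9, 0.5, 0.1], [1, 2, 3], budget=5.0, cost=cost))  # [2, 2, 3]
```

A greedy pass like this is cheap enough to run per frame, but it only approximates the optimum of the paper's objective.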


4.4. Decisions Using Learning Method

As is well known, it is difficult to obtain an accurate correlation between PSNR and energy consumption level. We therefore use machine learning tools [33] to exploit this correlation and derive a decision table that classifies the MBs into the corresponding decoding levels. Here, machine learning refers to studying the decoding states to acquire knowledge from experience: it derives new knowledge from existing rules and analyzes a set of experiments or examples to create a set of decision rules. The correlation problem is thus split into two subproblems: the first is to collect the variation of PSNR and energy consumption in each decoding state; the second is to classify these data into suitable modes according to their utilities. In the next section, we detail the design of the subfunctions in units of MBs and evaluate each subfunction in terms of its PSNR variation and energy consumption.

5. Implementation During Video Decoding

The overall energy consumption can be optimized with these methods, while ESVD guarantees the best video decoding quality under the energy constraint. For clarity, the total energy constraint is expressed as the sum of the constraints of the individual functions that the application supports. In practice, these functions cover a variety of applications. Alternatively, an average power constraint can be imposed on the overall power consumption in general user applications.

Motivated by the previous discussion, in which all macroblocks are classified into several energy/power profiles, we design a device resource perceptual module. This module acts as a bridge that maps the energy profile to the available device resources, and it has two parts. In part 1, the user specifies the working state of the video service. These states include "Maximum battery life mode," "Battery optimized mode," "Maximum performance mode," and "Enhanced quality mode"; as mentioned above, each state corresponds to a battery working mode of the device. In part 2, the working state of the video service is adapted automatically according to the perceived remaining battery capacity. For instance, "Maximum battery life mode" is configured automatically when the residual capacity is under 30%, while "Enhanced quality mode" is adopted automatically when the available battery capacity is above 80%. When the results of part 1 and part 2 do not match, for example when the user configures the device as "Enhanced quality mode" but the residual capacity is under 30%, the final profile is based on the perceived remaining battery capacity. That is, part 2 has higher priority than part 1, and the user can manually specify the working state only when the device resources are sufficient.
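The priority rule between the two parts can be sketched as a small selection function. The 30% and 80% thresholds come from the text; the enum, the function names, and the fallback mode used between the two thresholds when the user has chosen nothing (`DEFAULT_MODE`) are hypothetical, since the text does not specify them.

```python
# Hypothetical sketch of the device resource perceptual module:
# part 1 is the user's choice, part 2 is battery-driven and takes priority.
from enum import Enum
from typing import Optional


class Mode(Enum):
    MAX_BATTERY_LIFE = "Maximum battery life mode"
    BATTERY_OPTIMIZED = "Battery optimized mode"
    MAX_PERFORMANCE = "Maximum performance mode"
    ENHANCED_QUALITY = "Enhanced quality mode"


LOW_PCT = 30.0   # below this residual capacity, force maximum battery life
HIGH_PCT = 80.0  # above this residual capacity, enhanced quality is adopted
DEFAULT_MODE = Mode.BATTERY_OPTIMIZED  # assumption: the text gives no default


def auto_mode(residual_pct: float) -> Optional[Mode]:
    """Part 2: mode implied by the remaining battery capacity, if any."""
    if residual_pct < LOW_PCT:
        return Mode.MAX_BATTERY_LIFE
    if residual_pct > HIGH_PCT:
        return Mode.ENHANCED_QUALITY
    return None  # between the thresholds the text prescribes no automatic mode


def select_mode(user_mode: Optional[Mode], residual_pct: float) -> Mode:
    """Combine parts 1 and 2; a low battery overrides the user's choice."""
    auto = auto_mode(residual_pct)
    if auto is Mode.MAX_BATTERY_LIFE:
        return auto  # part 2 has higher priority than part 1
    if user_mode is not None:
        return user_mode  # resources are sufficient, so honor the user
    return auto if auto is not None else DEFAULT_MODE


print(select_mode(Mode.ENHANCED_QUALITY, 25.0).value)  # Maximum battery life mode
```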

It is widely accepted that END, IDCT, INTP, and DF are the four main subfunctions of a generic video decoder, so the following discussion is based on these four subfunctions. The implementation of each energy profile is illustrated in Figure 3, and the modules are described as follows.

Figure 3

Illustration of the power scalable control video decoding system.

5.1. IDCT SubFunction

The complexity of the IDCT subfunction is closely related to the number of nonzero coefficients in a block. Many scalable methods have been proposed; for example, [34] uses subrectangles of different proportions within blocks to obtain a computation-scalable IDCT. In general, the energy of the DCT coefficients is distributed along the zigzag scan of the block: the low-frequency components in the upper-left corner carry more energy, while the high-frequency components in the lower-right corner carry less. We therefore progressively omit data along the inverse zigzag scan, from the lower-right corner toward the upper-left corner, which keeps the output quality degradation small while making the energy consumption scalable. We classify the energy profile of the IDCT subfunction into four degrees: accurate-level, saving-level, coarse-level, and non-IDCT. When accurate-level is selected, all coefficients are computed, as shown in Figure 4(a); simplified methods such as 1D IDCT optimization can still be used to minimize energy consumption. Figures 4(b) and 4(c) show the optimal-level and matching-level cases, respectively. The main difference between the two levels is the number of coefficients computed, which determines the corresponding processing level.
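The pruning idea can be sketched as keeping only the first K coefficients of an 8×8 block in zigzag order and zeroing the rest before the IDCT. The number kept per level (`KEEP`) is a hypothetical choice that cuts at anti-diagonal boundaries (28 = first 7 diagonals, 10 = first 4); the paper's actual patterns in Figure 4 may differ, and the IDCT itself is omitted.

```python
# Hypothetical sketch of zigzag-order data pruning for an 8x8 coefficient block.
N = 8

# Assumed number of coefficients kept per IDCT level (illustrative only):
# 28 and 10 keep the first 7 and 4 anti-diagonals of the zigzag scan.
KEEP = {"accurate": 64, "saving": 28, "coarse": 10, "non-IDCT": 0}


def zigzag_order(n=N):
    """(row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # walk the anti-diagonals row + col == s
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diag.reverse()  # even diagonals run from bottom-left to top-right
        order.extend(diag)
    return order


def prune(block, level):
    """Zero every coefficient after the first KEEP[level] in zigzag order."""
    out = [[0] * N for _ in range(N)]
    for r, c in zigzag_order()[: KEEP[level]]:
        out[r][c] = block[r][c]
    return out


ones = [[1] * N for _ in range(N)]
print([sum(map(sum, prune(ones, lvl))) for lvl in KEEP])  # [64, 28, 10, 0]
```

Because the dropped coefficients lie at the high-frequency end of the scan, the IDCT can also skip the corresponding multiplications, which is where the energy saving comes from.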

Figure 4

Data pruning patterns in IDCT.

5.2. Motion Compensation and Interpolation SubFunction

Motion and residual information are obtained from the compressed bitstream after entropy decoding. Interpolation of reference samples to generate a motion-compensated prediction is generally performed for each intercoded macroblock [12] and accounts for most of the complexity of motion-compensated prediction. Thus, the average time required by the interpolation subfunction is approximately a function of the number of intercoded macroblocks. The most straightforward level of this subfunction performs full interpolation and full compensation. In the next level, quarter-pixel motion compensation is replaced by half-pixel operations, forming a saving mode that reduces computation with little quality decline. Accordingly, in the remaining energy profile, both quarter-pixel and half-pixel interpolation are replaced by integer-pixel compensation.
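One way to realize these interpolation levels is to round each motion vector to a coarser precision before motion compensation. The sketch assumes MVs are stored in quarter-pixel units and rounds ties away from zero; the rounding rule and all names are hypothetical, as the text does not specify them. An integer-pixel MV needs no interpolation filter at all, which is why the lowest level is the cheapest.

```python
# Hypothetical sketch: reduce motion vector precision per energy profile.
# MVs are assumed to be stored in quarter-pixel units.
STEP = {"quarter": 1, "half": 2, "integer": 4}


def round_to_step(v: int, step: int) -> int:
    """Round v to the nearest multiple of step, ties away from zero."""
    sign = -1 if v < 0 else 1
    return sign * ((abs(v) + step // 2) // step) * step


def downgrade_mv(mv, level):
    """Return the (x, y) MV rounded to the precision of `level`."""
    step = STEP[level]
    return tuple(round_to_step(c, step) for c in mv)


def needs_interpolation(mv) -> bool:
    """A component that is not a multiple of 4 quarter-pels is fractional."""
    return any(c % 4 != 0 for c in mv)


mv = (5, -3)  # (1.25, -0.75) pixels
for level in STEP:
    out = downgrade_mv(mv, level)
    print(level, out, needs_interpolation(out))
# quarter (5, -3) True
# half (6, -4) True
# integer (4, -4) False
```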

5.3. Deblocking Filter SubFunction

The deblocking filter, often referred to as the loop filter, is the final stage of the decoding process. The DF subfunction reduces the blocking artifacts that the encoding process introduces at block boundaries. Its comparatively high complexity is widely recognized: even after tremendous effort in speed optimization of the filtering algorithms, the filter can easily account for one-third of the computational complexity of a decoder [35]. The complexity stems mainly from the high adaptivity of the filter, which requires conditional processing at the block-edge and sample levels; the resulting many conditional branches lead to excessive power consumption. In addition, for a macroblock, vertical filtering starts at the left-most edge and proceeds from left to right through the three internal vertical edges; horizontal filtering then starts at the top edge and proceeds through the three internal horizontal edges from top to bottom. A large number of relevant and candidate pixels must be loaded into memory, which causes additional power consumption as well. Scalable energy can be achieved by classifying the filtering process into three levels: full, half, and rough filtering. Full filtering applies all filtering branches to the macroblock. Half filtering reduces the computational complexity by taking into account that image areas in past frames have already been filtered, and optimizing or omitting the filtering accordingly. Rough filtering skips the filter with low quality degradation and requires the lowest power consumption of the DF subfunction.
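The three DF levels can be sketched as an edge schedule that decides which MB edges are visited, in the order described above. This covers only the scheduling, not the filter arithmetic. The `mb_is_static` flag is a hypothetical stand-in for "this area was already filtered in a past frame"; the paper's half filtering may reuse past results in a finer-grained way.

```python
# Hypothetical sketch: which MB edges the deblocking filter visits per level.
def edges_to_filter(level, mb_is_static, mb_size=16, block_size=4):
    """Return ('V', x) / ('H', y) edge offsets in filtering order.

    Vertical edges go left to right, then horizontal edges top to bottom.
    Offset 0 is the MB boundary; the others are internal block edges.
    mb_is_static is an assumed flag meaning the MB area was already filtered
    in a past frame and is reused unchanged.
    """
    offsets = range(0, mb_size, block_size)
    full = [("V", x) for x in offsets] + [("H", y) for y in offsets]
    if level == "full":
        return full
    if level == "half":
        return [] if mb_is_static else full  # reuse past filtering when possible
    if level == "rough":
        return []  # skip the filter entirely
    raise ValueError(f"unknown level: {level}")


print(edges_to_filter("full", mb_is_static=False))
```

With 4×4 blocks this yields the boundary edge plus three internal edges in each direction; passing `block_size=8` gives the two-edge layout of 8×8 block codecs.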

In addition, learning tools are used to analyze the data sets of the decoder, and the resulting decision table determines the decoding mode of each MB. Inductive learning analyzes data sets to create a set of rules for making decisions.

A decision table is then built as the set of decoding rules. This table is derived from a set of experiments or examples collected as the training data set. We build an information database to gather the decoding states. The data set covers the following modes: (0) full decoding mode; (1) decoding without the deblocking filter, which corresponds to adjusting the deblocking filter subfunction; (2) quarter-pixel interpolation replaced by half-pixel interpolation; (3) both quarter-pixel and half-pixel interpolation replaced by integer-pixel interpolation, where modes (2) and (3) correspond to adjusting the motion compensation and interpolation subfunction; (4) IDCT data pruning at saving-level; (5) IDCT data pruning at coarse-level; and (6) IDCT data pruning at low-level, where modes (4) to (6) correspond to adjusting the IDCT subfunction. Figure 5 shows the influence of the different decoding rules on energy consumption and PSNR.
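As a minimal illustration of inductive rule learning, the sketch below learns a decision stump: a single threshold on one per-MB feature that splits training examples into two decoding modes with the fewest training errors. The paper does not say which learner from [33] it uses; the stump, the feature (e.g. some MB activity measure), and the training pairs are all hypothetical. The mode names follow the list above.

```python
# Hypothetical sketch: learn a one-threshold rule (decision stump) that maps an
# assumed per-MB feature to one of the decoding modes (0)-(6) listed above.
from collections import Counter

MODES = {
    0: "full decoding",
    1: "no deblocking filter",
    2: "quarter-pel -> half-pel interpolation",
    3: "quarter/half-pel -> integer-pel interpolation",
    4: "IDCT saving-level pruning",
    5: "IDCT coarse-level pruning",
    6: "IDCT low-level pruning",
}


def majority(modes):
    """Most frequent mode; ties go to the mode seen first."""
    return Counter(modes).most_common(1)[0][0]


def learn_stump(samples):
    """samples: (feature, mode) pairs. Returns (threshold, low_mode, high_mode)."""
    best = None
    for t in sorted({x for x, _ in samples}):
        low = [m for x, m in samples if x < t]
        high = [m for x, m in samples if x >= t]
        high_mode = majority(high)
        low_mode = majority(low) if low else high_mode
        errors = sum(m != low_mode for m in low) + sum(m != high_mode for m in high)
        if best is None or errors < best[0]:
            best = (errors, t, low_mode, high_mode)
    return best[1:]


def predict(rule, x):
    t, low_mode, high_mode = rule
    return low_mode if x < t else high_mode


train = [(0.1, 1), (0.2, 1), (0.7, 0), (0.9, 0)]  # illustrative, not measured
rule = learn_stump(train)
print(rule, MODES[predict(rule, 0.15)])  # (0.7, 1, 0) no deblocking filter
```

In practice the training labels would come from the information database, and a richer learner (a full decision tree over several features) would replace the single threshold.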

Figure 5

Influence on energy consumption and PSNR under different decoding rules. Panels: influence on energy consumption; influence on PSNR.

Affiliated subfunction: error concealment. Error concealment techniques aim to obtain a close approximation of the original signal, or to make the decoder output acceptable to the human eye [36]. Most error concealment techniques are based on block matching algorithms [37] or on block-level adaptive techniques such as [38]. Error concealment improves the decoding quality, but at the cost of additional computation. Since energy is consumed by computation, memory occupation, and memory access, error concealment increases power consumption by more than its computational complexity alone suggests. We therefore classify the error concealment operation into three levels to fit the scalable energy profiles; the classification is based on scene and region changes and is performed in units of blocks. A macroblock can thus belong to one of three energy profiles: accurate concealment when a scene change occurs, half concealment in regions with variability, and coarse concealment when little or no motion takes place.
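The three-way classification can be sketched as follows. The text names the criteria (scene change, regional variability, little or no motion) but not the detectors, so the features used here (a mean absolute luma difference between correctly received samples of consecutive frames, and the largest neighboring MV magnitude) and both thresholds are hypothetical.

```python
# Hypothetical sketch: choose an error concealment level per damaged block.
import math

SCENE_MAD = 30.0  # assumed: mean abs luma difference that signals a scene change
MOTION_PX = 1.0   # assumed: neighbor MV magnitude (pixels) that signals motion


def mean_abs_diff(cur, prev):
    """Mean absolute difference between two equally sized sample lists."""
    return sum(abs(a - b) for a, b in zip(cur, prev)) / len(cur)


def concealment_level(cur_samples, prev_samples, neighbor_mvs):
    """Return 'accurate', 'half', or 'coarse' for a damaged block.

    cur_samples/prev_samples: correctly received luma samples of the current
    and previous frame; neighbor_mvs: (dx, dy) MVs of neighbors, in pixels.
    """
    if mean_abs_diff(cur_samples, prev_samples) > SCENE_MAD:
        return "accurate"  # scene change: spend the most effort
    motion = max((math.hypot(dx, dy) for dx, dy in neighbor_mvs), default=0.0)
    if motion >= MOTION_PX:
        return "half"  # regional variability
    return "coarse"  # little or no motion, e.g. copy the co-located block


print(concealment_level([100] * 4, [100] * 4, [(3, 4)]))  # half
```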

Reference [39] analyzes the computational complexity of an H.264/AVC decoder, and [12] presents a detailed analysis of both computational complexity and memory complexity. For AVS video decoding, [32] provides an approximate complexity estimate. Generally speaking, for most video decoders, including H.264, MPEG-4, and AVS, computational power allocation with an emphasis on power-distortion (P-D) [10] can be expressed in the form of cost functions. We take the power consumption of video decoding into account by modifying the power-distortion-complexity (P-D-C) cost functions at the level of macroblocks and decoder subfunctions. Through the objective function in (8), dynamic scalable assignments provide a local quality optimum in each energy profile, which yields energy scalable video decoding (ESVD). Scalable video decoding inevitably degrades quality, so minimizing this degradation is another goal of ESVD.

6. Experimental Results

6.1. Building Energy Consumption Information Database

In this subsection, we use the Application Energy Graphing Tool [40], which measures the battery power consumption of an application over time and logs and graphs the resulting data, to profile the energy distribution of the decoding modes. To calculate the energy consumption of each subfunction mode, we assume that all other operations among the subfunctions run normally, except the mode under test. This reflects practical power control schemes, in which the decoded data still pass through all basic subfunction units even though some operations are skipped or simplified. Because compressed video data have diverse features, the decoding process varies with these features; for instance, with the same decoding program, the decoding time differs among typical sequences such as mother, waterfall, tennis, ship, bus, and paris. We therefore use these typical video sequences as the test set, in CIF format and coded with the AVS standard. We repeat the decoding process until 15000 frames have been decoded for each sequence. Figure 6 shows the total energy consumption and the corresponding PSNR for each decoding rule; the results are statistical averages over the experiments.
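Turning repeated measurements into database entries amounts to averaging each mode's runs and expressing them relative to full decoding, mode (0). The sketch assumes each run yields one (energy in joules, PSNR in dB) pair; the input layout, the function name, and the numbers in the example are hypothetical and are not the paper's measurements. PSNR is reported as a drop in dB here; a percentage can be derived the same way as the energy saving.

```python
# Hypothetical sketch: turn repeated energy/PSNR measurements into a per-mode
# table of average energy saving (%) and PSNR drop (dB) relative to mode (0).
from statistics import mean


def summarize(runs):
    """runs: {mode: [(energy_joules, psnr_db), ...]}; mode 0 is full decoding."""
    base_e = mean(e for e, _ in runs[0])
    base_p = mean(p for _, p in runs[0])
    table = {}
    for mode, samples in runs.items():
        e = mean(s[0] for s in samples)
        p = mean(s[1] for s in samples)
        table[mode] = (100.0 * (base_e - e) / base_e, base_p - p)
    return table


runs = {  # illustrative numbers only, not the paper's measurements
    0: [(200.0, 36.0), (200.0, 36.0)],
    1: [(180.0, 35.5), (180.0, 35.5)],
}
print(summarize(runs))  # {0: (0.0, 0.0), 1: (10.0, 0.5)}
```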

Figure 6

Statistics of energy consumption and PSNR for different video sequences under each decoding rule. Panels: energy consumption and PSNR for the sequences mother, waterfall, tennis, ship, bus, and paris.

The decision table is used to determine the decoding mode of each MB, based on the information gathered during the pre-analysis of the decoder; it can be made more accurate by updating this information during decoding. Figure 5 depicts the process of building the decision tables from the results in Figure 6. For example, when the decoder works in mode (1), decoding without the deblocking filter, little PSNR is lost but about 15% of the energy consumption is saved; when it works in mode (5), IDCT data pruning at coarse-level, only around 10% of the energy consumption is saved but an 85% PSNR loss occurs. That is, when the energy budget is insufficient for full-mode decoding, mode (1) is a better choice than mode (5).
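The comparison between mode (1) and mode (5) generalizes to a simple lookup: among the modes whose energy saving meets what the budget requires, pick the one with the smallest PSNR loss. The table format (energy saving in %, PSNR loss) and its values are illustrative assumptions chosen to mirror the example above, not the paper's data.

```python
# Hypothetical sketch: pick a single decoding mode from a decision table of
# (energy saving %, PSNR loss) per mode, given the saving the budget requires.


def choose_mode(table, required_saving_pct):
    """Return the feasible mode with the smallest PSNR loss, or None."""
    feasible = [
        (loss, mode)
        for mode, (saving, loss) in table.items()
        if saving >= required_saving_pct
    ]
    return min(feasible)[1] if feasible else None


table = {0: (0.0, 0.0), 1: (15.0, 0.1), 5: (10.0, 3.0)}  # illustrative values
print(choose_mode(table, 10.0))  # 1
print(choose_mode(table, 0.0))   # 0
print(choose_mode(table, 20.0))  # None
```

Returning `None` when no single mode suffices leaves room to combine modes, which is what the per-MB allocation in Section 4 does.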

6.2. The Performance of the ESVD Model

To evaluate the performance of the ESVD model and the energy scalable video decoding system, we implement the proposed ESVD model and energy scalability scheme in the AVS decoder software. The ESVD model is not tied to a particular video coding standard, so similar performance can be expected for other coding systems such as H.264 and MPEG-4. We randomly select the "waterfall" CIF sequence at 128 kB/s and 25 fps as the test sequence. We performed two sets of evaluations: one evaluates decoding scalability and the other evaluates quality scalability. We let the decoder work under four modes with descending energy consumption budgets. Figure 7 shows the scalable results in terms of PSNR and energy consumption, and Figure 8 shows the subjective quality in the different decoding modes. Each mode corresponds to an energy consumption budget ratio relative to the full decoding mode. These experiments show the scalability and efficiency of ESVD.

Figure 7

Influence on energy consumption and PSNR under different decoding modes. Panels: energy consumption in each decoding mode; PSNR in each decoding mode.

Figure 8

Subjective quality in the different scalable modes. Panels: 90%, 80%, 70%, 60%, 50%, and 40% budget modes; full mode.

7. Conclusion and Future Work

This paper proposed the ESVD framework for power-controlled video decoding systems. It aims to provide scalable decoding output that adapts to the available energy resources. It proposes a method that lets the video decoder adapt to its resources under battery constraints, which can be widely used in handheld devices. It also gives a method to maximize video decoding quality on portable terminals by building a decoding information database. The experiments demonstrate the efficiency of ESVD. In future research, we will study fine-grained energy scalable control by improving the scalability of each decoding module.


  1. De Schrijver D, Poppe C, Lerouge S, De Neve W, Van De Walle R: MPEG-21 bitstream syntax descriptions for scalable video codecs. Multimedia Systems 2006, 11(5):403-421. 10.1007/s00530-006-0021-5


  2. Yanagihara H, Sugano M, Yoneyama A, Nakajima Y: Scalable video decoder and its application to multi-channel multicast system. Proceedings of the International Conference on Consumer Electronics (ICCE '00), June 2000 232-233.


  3. Landge G, Van Der Schaar M, Akella V: Complexity metric driven energy optimization framework for implementing MPEG-21 scalable video decoders. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), March 2005 2: 1141-1144.


  4. Masselos K, Merakos P, Stouraitis T, Goutis CE: A novel algorithm for low-power image and video coding. IEEE Transactions on Circuits and Systems for Video Technology 1998, 8(3):258-263. 10.1109/76.678619


  5. Lee S-W, Kuo C-CJ: Complexity modeling for motion compensation in H.264/AVC decoder. Proceedings of the 14th IEEE International Conference on Image Processing (ICIP '07), September 2007 5: v-313-v-316.


  6. Liu T-M, Lin T-A, Wang S-Z, Lee C-Y: A low-power dual-mode video decoder for mobile applications. IEEE Communications Magazine 2006, 44(8):119-126.


  7. August NJ, Ha DS: Low power design of DCT and IDCT for low bit rate video codecs. IEEE Transactions on Multimedia 2004, 6(3):414-422. 10.1109/TMM.2004.827491


  8. Tsai T-H, Fang D-L: A novel design of CAVLC decoder with low power consideration. Proceedings of the IEEE Asian Solid-State Circuits Conference (A-SSCC '07), November 2007 196-199.


  9. Xu K, Choy C-S: A power-efficient and self-adaptive prediction engine for H.264/AVC decoding. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2008, 16(3):302-313.


  10. He Z, Liang Y, Chen L, Ahmad I, Wu D: Power-rate-distortion analysis for wireless video communication under energy constraints. IEEE Transactions on Circuits and Systems for Video Technology 2005, 15(5):645-658.


  11. He Z, Cheng W, Chen X: Energy minimization of portable video communication devices based on power-rate-distortion optimization. IEEE Transactions on Circuits and Systems for Video Technology 2008, 18(5):596-607.


  12. Horowitz M, Joch A, Kossentini F, Hallapuro A: H.264/AVC baseline profile decoder complexity analysis. IEEE Transactions on Circuits and Systems for Video Technology 2003, 13(7):704-716. 10.1109/TCSVT.2003.814967


  13. Wang T-H, Chiu C-T: Low power design of high performance memory access architecture for HDTV decoder. Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '07), July 2007 699-702.


  14. Nachtergaele L, Moolenaar D, Vanhoof B, Catthoor F, De Man H: System-level power optimization of video codecs on embedded cores: a systematic approach. Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology 1998, 18(2):89-109.


  15. Pereira F: MPEG multimedia standards: evolution and future developments. Proceedings of the 15th ACM International Conference on Multimedia (MM '07), September 2007 8-9.


  16. Valentim J, Nunes P, Pereira F: Evaluating MPEG-4 video decoding complexity for an alternative video complexity verifier model. IEEE Transactions on Circuits and Systems for Video Technology 2002, 12(11):1034-1044. 10.1109/TCSVT.2002.805497


  17. Valentim J, Nunes P, Pereira F: Evaluating MPEG-4 video decoding complexity. Proceedings of the 2nd Workshop and Exhibition on MPEG-4, March 2001


  18. Valentim J, Nunes P, Pereira F: An alternative complexity model for the MPEG-4 video verifier mechanism. Proceedings of the IEEE International Conference on Image Processing (ICIP '01), October 2001 1: 461-464.


  19. Wiegand T, Sullivan GJ, Bjøntegaard G, Luthra A: Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology 2003, 13(7):560-576.


  20. Stockhammer T, Hannuksela MM, Wiegand T: H.264/AVC in wireless environments. IEEE Transactions on Circuits and Systems for Video Technology 2003, 13(7):657-673. 10.1109/TCSVT.2003.815167


  21. Ostermann J, Bormans J, List P, Marpe D, Narroschke M, Pereira F, Stockhammer T, Wedi T: Video coding with H.264/AVC: tools, performance, and complexity. IEEE Circuits and Systems Magazine 2004, 4(1):7-28. 10.1109/MCAS.2004.1286980



  23. Karri R, Goodman DJ, Karri R: System-Level Power Optimization for Wireless Multimedia Communication. Kluwer Academic Publishers, Dordrecht, The Netherlands; 2006.


  24. Saponara S, Denolf K, Lafruit G, Blanch C, Bormans J: Performance and complexity co-evaluation of the advanced video coding standard for cost-effective multimedia communications. EURASIP Journal on Applied Signal Processing 2004, 2004(2):220-235. 10.1155/S111086570431019X


  25. Denolf K, Vos P, Bormans J, Bolsens I: Cost-efficient C-level design of an MPEG-4 video. Proceedings of the 10th International Workshop on Integrated Circuit Design, Power and Timing Modeling, Optimization and Simulation, 2000 233-242.


  26. Nachtergaele L, Catthoor F, Kapoor B, Janssens S, Moolenaar D: Low-power data transfer and storage exploration for H.263 video decoder system. IEEE Journal on Selected Areas in Communications 1998, 16(1):120-128. 10.1109/49.650925


  27. Ling N, Wang N-T: Real-time video decoding scheme for HDTV set-top boxes. IEEE Transactions on Broadcasting 2002, 48(4):353-360. 10.1109/TBC.2002.806796


  28. Chiariglione L: MPEG and multimedia communications. IEEE Transactions on Circuits and Systems for Video Technology 1997, 7(1):5-18. 10.1109/76.554414


  29. Gao W, Reader C, Wu F, et al.: AVS—The Chinese Next-Generation Video Coding Standard.

  30. Liu Y, Li ZG, Soh YC: Region-of-interest based resource allocation for conversational video communication of H.264/AVC. IEEE Transactions on Circuits and Systems for Video Technology 2008, 18(1):134-139.


  31. Salembier P, Marques F: Region-based representations of image and video: segmentation tools for multimedia services. IEEE Transactions on Circuits and Systems for Video Technology 1999, 9(8):1147-1169. 10.1109/76.809153


  32. Ji W, Chen Y, Lei C, Zhang H, Wang H, Xing Y, Tang B: Module and distortion analysis for video decoding on mobile devices. Proceedings of the 7th International Conference on Networking (ICN '08), April 2008 681-685.


  33. Russell S, Norvig P: Artificial Intelligence: A Modern Approach. 2nd edition. Prentice-Hall, Upper Saddle River, NJ, USA; 2006.

  34. Peng S: Complexity scalable video decoding via IDCT data pruning. Proceedings of the IEEE International Conference on Consumer Electronics (ICCE '01), June 2001 74-75.


  35. List P, Joch A, Lainema J, Bjøntegaard G, Karczewicz M: Adaptive deblocking filter. IEEE Transactions on Circuits and Systems for Video Technology 2003, 13(7):614-619.


  36. Wang Y, Zhu Q-F: Error control and concealment for video communication: a review. Proceedings of the IEEE 1998, 86(5):974-997. 10.1109/5.664283


  37. Chen M-J, Chen C-S, Chi M-C: Temporal error concealment algorithm by recursive block-matching principle. IEEE Transactions on Circuits and Systems for Video Technology 2005, 15(11):1385-1393.


  38. Hsia S-C, Cheng S-C, Chou S-W: Efficient adaptive error concealment technique for video decoding system. IEEE Transactions on Multimedia 2005, 7(5):860-868.


  39. Lappalainen V, Hallapuro A, Hamalainen TD: Complexity of optimized H.26L video decoder implementation. IEEE Transactions on Circuits and Systems for Video Technology 2003, 13(7):717-725. 10.1109/TCSVT.2003.814968





The authors acknowledge the support from the National Natural Science Foundation of China (NSFC), contract/grant number: 60872007; National 863 High Technology Program of China, contract/grant number: 2009AA01Z239; The Ministry of Science and Technology (MOST), International Science and Technology Collaboration Program, contract/grant number: 0903; The NAP of Korea Research Council of Fundamental Science & Technology; The MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) "(NIPA-2010-(C1090-1011-0004))".

Author information



Corresponding author

Correspondence to Xiaohu Ge.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Ji, W., Chen, M., Ge, X. et al. ESVD: An Integrated Energy Scalable Framework for Low-Power Video Decoding Systems. J Wireless Com Network 2010, 234131 (2010).




  • Energy Profile
  • Scalable Video
  • Error Concealment
  • Video Codec
  • IDCT