Adaptive multiple description video coding and transmission for scene change
© Zhang and Bai; licensee Springer. 2012
Received: 27 March 2012
Accepted: 24 July 2012
Published: 21 August 2012
Because it is fully compatible with standard source and channel codecs, multiple description coding (MDC) based on temporal sampling is an attractive choice for practical applications. However, for frames at a transition from one scene to another, temporal sampling may severely damage the temporal correlation, which leads to false estimation when the related frames are lost at the side decoder. Therefore, in this article, the frames containing a scene change are detected and duplicated before temporal sampling, which maintains better temporal correlation in each description. Furthermore, for better rate-distortion performance, temporal sampling is applied adaptively, that is, frames are skipped or up-sampled according to the motion characteristics of the original video. The experimental results show that the proposed scheme outperforms other schemes in both the on–off MDC environment and packet-lossy networks, with improvements of about 15 dB for the frames with scene change. It may therefore be a promising choice for video transmission over error-prone channels, especially over wireless networks.
In recent years, the increasing demand for multimedia communication has generated considerable research interest in novel image and video compression technologies. Because of network congestion and delay sensitivity, video transmission over lossy networks remains a great challenge. Multiple description coding (MDC) is an attractive approach to this problem. It can efficiently combat packet loss without any retransmission, thus satisfying the demands of real-time services and relieving network congestion [1].
MDC encodes the source message into several bit streams (descriptions) carrying different information, which can then be transmitted over multiple channels. In its simplest form, two parallel channels are assumed to connect the source with the destination. If only one channel works, the received description can be decoded on its own and guarantees a minimum fidelity of the reconstruction at the receiver. When both channels work, the descriptions from the two channels can be combined to yield a higher-fidelity reconstruction.
In the past years, many approaches to realizing MDC have been proposed, such as those using interleaved scalar quantizers [2], lattice vector quantizers [3, 4], pairwise correlating transforms [5], forward error correction (FEC) [6], and so on. Although these methods have shown good performance, they are incompatible with widely used standard codecs, such as H.26x and MPEG-x.
To overcome this limitation, MDC based on pre- and post-processing may be a good choice. In pre-processing, the original source is split into multiple sub-sources before encoding, and these sub-sources are then encoded separately by the standard codec to generate multiple descriptions. The study of [7] is a typical example, which employs hierarchical B pictures in the H.264/AVC scalable extension [8] for a pre-processing-based MDC. Furthermore, sub-sampling can also be applied in pre-processing to produce multiple sub-sources, as in the MD video coder based on spatial sampling [9] and the MD video coder based on temporal sampling [10]. In the spatial sampling method, zero padding is applied inside each individual frame, so only intra-frame correlation is exploited to improve side distortion and the inter-frame temporal correlation is neglected completely. In [10], temporal sampling is proposed to make motion-compensated interpolation (MCI) more efficient for lost frames, which in turn gives better rate-distortion performance. However, for video sequences containing different scenes, a simple MCI method may produce falsely interpolated reconstructions at the decoder. Recently, for robust transmission over wireless networks, new MD methods such as distributed MDC have been proposed in [11, 12] to achieve better rate-distortion performance through optimized redundancy allocation for the split video sub-sequences.
To address the problem caused by scene change, a preliminary scheme was presented in [13]. In this article, an improved MD coding scheme based on adaptive temporal sampling is proposed to ensure that the decoder works correctly when the scene changes. Based on the temporal correlation in the original video, the frames containing a scene change are detected before temporal sampling. These frames are then duplicated and transmitted over both channels for better side reconstruction. Furthermore, adaptive temporal sampling is applied, that is, frames are skipped or up-sampled according to the motion characteristics between frames, which achieves better rate-distortion performance.
The rest of this article is organized as follows. In the following section, the proposed MD video coding scheme is presented including pre- and post-processing stages and improvement for scene change. The performance of the proposed scheme is examined against some other relevant MD coders in Section “Experimental results”. We conclude the article in Section “Conclusions”.
Proposed MDC scheme for scene change
Adaptive temporal sampling in pre-processing
According to the principle of MDC, more correlation between descriptions yields higher quality of the side-decoded video because better error concealment is available, but the added redundancy also lowers the efficiency of the central decoder. The redundancy should therefore strike a tradeoff between reconstructed quality and compression efficiency. For this reason, sampling with a flexible frame rate is employed in the pre-processing stage to introduce adaptive redundancy. Since unstable motion between frames degrades error concealment at the side decoder, the sampling rate should vary according to the motion information between any two neighboring frames. More interpolated frames, that is, a higher up-sampling rate, are used to smooth abrupt motion between frames. On the other hand, if no abrupt motion occurs, no redundant frames are needed. Furthermore, if the motion between consecutive frames is stable enough, some intermediate frames can be skipped to maintain high compression efficiency. This pre-processing mainly aims to generate descriptions with regular motion, which makes better estimation of lost frames possible at the side decoder.
Then the thresholds T1 and T2 are set empirically as T1 = T and T2 = 2T.
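The three-way decision described in this subsection (up-sample, keep, or skip) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the scalar motion measure per neighboring frame pair (for example, a mean motion-vector magnitude), the function names, and the direction of the comparisons against T1 and T2 are all hypothetical. Only the relation T1 = T and T2 = 2T is taken from the text.

```python
def sampling_decision(motion, T):
    """Classify one pair of neighboring frames by its motion measure.

    `motion` is an ASSUMED scalar motion measure for the pair (hypothetical,
    e.g. a mean motion-vector magnitude); T is the empirical base threshold.
    """
    T1, T2 = T, 2 * T  # thresholds as stated in the text
    if motion > T2:
        return "upsample"  # assumed: abrupt motion, insert interpolated frame(s)
    if motion < T1:
        return "skip"      # assumed: stable motion, a middle frame may be skipped
    return "keep"          # otherwise no redundancy is added or removed


def plan_sampling(motions, T):
    """Apply the decision to every neighboring frame pair of a sequence."""
    return [sampling_decision(m, T) for m in motions]


if __name__ == "__main__":
    # Hypothetical motion measures for three frame pairs, with T = 1.0.
    print(plan_sampling([0.5, 1.5, 3.0], T=1.0))
```

With two thresholds the decision has a neutral band between T1 and T2, so frame pairs with moderate motion are left untouched and redundancy is spent only where motion is abrupt.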
In the pre-processing, interpolating new frames and skipping others may increase the encoding complexity to some extent. However, the added complexity comes mainly from deciding which frames to interpolate or skip. This decision is a simple one based on motion vectors, and afterwards only a few frames need to be interpolated or skipped. Therefore, the overall increase in complexity is small.
Additionally, each frame is given a label (‘O’, ‘I’, or ‘S’) to mark it as an original, interpolated, or skipped frame. The frames are then indexed with odd or even numbers and transmitted over the two channels, respectively. The labels are coded by an entropy encoder, and their cost is negligible compared with the total bit rate.
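The odd/even split of the labeled sequence can be illustrated with a short sketch. The function name `split_descriptions`, the (label, frame) pair representation, and counting positions from 1 are assumptions made for illustration. How skipped frames are represented in the stream is not specified here, so the sketch simply carries whatever labeled entries it is given.

```python
def split_descriptions(labeled_frames):
    """Split a labeled frame sequence into two descriptions by index parity.

    `labeled_frames` is a list of (label, frame) pairs with label in
    {'O', 'I', 'S'}. Positions are counted from 1 (an assumption): odd
    positions go to description 1, even positions to description 2.
    """
    for label, _ in labeled_frames:
        if label not in ("O", "I", "S"):
            raise ValueError(f"unknown label: {label!r}")
    odd = labeled_frames[0::2]   # positions 1, 3, 5, ...
    even = labeled_frames[1::2]  # positions 2, 4, 6, ...
    return odd, even


if __name__ == "__main__":
    # Frames are stand-in strings; real frames would be image arrays.
    seq = [("O", "f1"), ("I", "i1"), ("O", "f2"), ("O", "f3")]
    d1, d2 = split_descriptions(seq)
    print(d1)
    print(d2)
```

Keeping the label next to each frame lets a side decoder tell original frames from interpolated ones without access to the other description.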
Improvement for scene change
To distinguish the scenes in the video, new marks are needed: ‘A’ denotes the first frame of a scene and ‘Z’ denotes the last frame. When a scene change occurs, the first and last frames of each scene are transmitted over all channels so that these frames can be rebuilt intact. Take two channels as an example. At the encoder, the frames F_{k-1} and F_k on either side of the scene change are duplicated and labeled ‘ZZAA’. Adaptive temporal sampling is then applied as introduced in Section “Adaptive temporal sampling in pre-processing”. At the decoder, if both channels work, the duplicated frames are deleted to obtain the central reconstruction. If only one channel works, the side reconstruction is obtained according to the labels.
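A minimal sketch of the duplication step and its removal at the central decoder follows. The function names, the 0-based index `k` of the first frame of the new scene, and the use of ‘O’ for all other frames at this stage are assumptions. Scene-change detection itself is not shown and `k` is taken as given.

```python
def duplicate_scene_change(frames, k):
    """Duplicate frames k-1 and k and label the four copies 'Z','Z','A','A'.

    `k` is the 0-based index of the first frame of the new scene (assumed
    to be known from a separate detection step). All other frames are
    labeled 'O' here, which is an assumption for illustration.
    """
    if not 0 < k < len(frames):
        raise ValueError("k must index a frame that has a predecessor")
    labeled = []
    for i, frame in enumerate(frames):
        if i == k - 1:
            labeled += [("Z", frame), ("Z", frame)]  # last frame of old scene
        elif i == k:
            labeled += [("A", frame), ("A", frame)]  # first frame of new scene
        else:
            labeled.append(("O", frame))
    return labeled


def remove_duplicates(labeled):
    """Central decoder side: keep one copy of each duplicated 'Z'/'A' pair."""
    out, prev_label = [], None
    for label, frame in labeled:
        if label in ("Z", "A") and label == prev_label:
            prev_label = None  # second copy of the pair, drop it
            continue
        out.append((label, frame))
        prev_label = label
    return out


if __name__ == "__main__":
    frames = ["a1", "a2", "a3", "b1", "b2"]  # scene a, then scene b
    labeled = duplicate_scene_change(frames, k=3)
    print("".join(label for label, _ in labeled))
    print(labeled[0::2])  # one description under an odd/even split
    print(labeled[1::2])  # the other description
```

Because the two copies of each frame occupy consecutive positions, an odd/even split places one copy in each description, so either side decoder sees both the last frame of the old scene and the first frame of the new one.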
Adaptive temporal sampling in post-processing
Decoder design for the on–off environment
If the current label is ‘O’ but its following label is ‘I’, the represented frame is just the reconstructed one.
If the current label is ‘I’ but its following label is ‘O’, the represented frame is the interpolated frame and it can be regarded as the reconstructed one.
If the current label is ‘I’ and its following label is also ‘I’, the continuous frames represented by ‘I’ should be merged to a reconstructed frame.
If the current label is ‘O’, and its following label is also ‘O’, a new frame should be interpolated between the two frames denoted by ‘O’.
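The four label rules above can be sketched as a pass over the label sequence of the single received description. This is an illustrative reading, not the authors' decoder: the function name, the action tuples, and the treatment of a run of consecutive ‘I’ labels as one merge are assumptions, and the sketch only decides what to do, since the interpolation and merging operations themselves are not specified here. Labels other than ‘O’ and ‘I’ are rejected because the rules do not cover them.

```python
def side_decoder_actions(labels):
    """Map the labels of one received description to reconstruction actions.

    Returns a list of tuples (indices refer to positions in `labels`):
      ("keep", i)                    use frame i as the reconstructed frame
      ("interpolate_between", i, j)  create a new frame between frames i and j
      ("merge", i, j)                merge frames i..j (a run of 'I') into one
    """
    actions = []
    i, n = 0, len(labels)
    while i < n:
        if labels[i] == "O":
            actions.append(("keep", i))
            # 'O' followed by 'O': a frame is missing in between.
            if i + 1 < n and labels[i + 1] == "O":
                actions.append(("interpolate_between", i, i + 1))
            i += 1
        elif labels[i] == "I":
            j = i
            while j < n and labels[j] == "I":
                j += 1
            if j - i == 1:
                actions.append(("keep", i))          # single 'I': use as is
            else:
                actions.append(("merge", i, j - 1))  # run of 'I': merge
            i = j
        else:
            raise ValueError(f"label not covered by the rules: {labels[i]!r}")
    return actions


if __name__ == "__main__":
    for action in side_decoder_actions(["O", "O", "I", "O", "I", "I", "O"]):
        print(action)
```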
Decoder design for packet lossy network
In a packet-lossy network, both descriptions are received with packet losses, so only a central decoder needs to be designed in the post-processing stage, and it differs from the approach used in the on–off MDC environment.
Experimental results
Performance in on–off MDC channels
Two experiments are considered here to demonstrate the efficiency of adaptive sampling in the temporal domain. The first shows that the proposed scheme performs better than the conventional scheme without the pre-processing stage. The second illustrates the advantage of the proposed scheme over the scheme using spatial sampling [9].
In the first experiment, the standard test video “coastguard.qcif” is used at 30 frames per second. For a fair comparison, the same mode and parameters are chosen in the H.264 encoder and decoder [14].
Performance in packet lossy network
Here, the standard test video “coastguard.qcif” at 30 frames per second is chosen to examine the proposed scheme. For a fair comparison, the same mode and parameters are chosen in the H.264 encoder and decoder. Furthermore, the packets are organized in a checkerboard pattern in all cases.
Performance for scene change
Here, the test video “merged” in Section “Proposed MDC scheme for scene change” is used to show the better performance of the proposed scheme for scene change. In the proposed scheme, the three scene changes are taken into account explicitly when adaptive temporal sampling is applied, whereas in the compared scheme the scene changes are processed only through adaptive temporal sampling. For a fair comparison, the same method of adaptive temporal sampling is adopted in both schemes.
Conclusions
Because it is fully compatible with the standard codec, MD video coding based on pre- and post-processing may be a better choice in practical applications. In this article, adaptive frame skipping or up-sampling is employed in the pre-processing to obtain a better tradeoff between compression efficiency and reconstructed quality. Furthermore, when a scene change occurs in the original video, the pre-processing is improved to avoid erroneous frame interpolation and the severe quality degradation it can cause. As a result, the proposed MD video coding scheme demonstrates rate-distortion performance superior to that of the conventional MD video coder and the spatial sampling-based scheme, and it may be a promising choice for video transmission over error-prone channels, especially over wireless networks.
This study was supported in part by the 973 program (2012CB316400), the National Natural Science Foundation of China (Nos. 61103113, 60903066), the Beijing Natural Science Foundation (No. 4102049), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20090009120006), the Fundamental Research Funds for the Central Universities (No. 2011JBM214), the Jiangsu Provincial Natural Science Foundation (BK2011455) and PHR (IHLB) (PHR201008187).
- Goyal VK: Multiple description coding: compression meets the network. IEEE Signal Process. Mag. 2001, 18(5):74-93. 10.1109/79.952806
- Vaishampayan V, John S: Balanced interframe multiple description video compression. 3rd edition. International Conference on Image Processing (ICIP), Kobe, Japan; 1999:812-816.
- Bai H, Zhu C, Zhao Y: Optimized multiple description lattice vector quantization for wavelet image coding. IEEE Trans. Circuits Syst. Video Technol. 2007, 17(7):912-917.
- Liu M, Zhu C: M-description lattice vector quantization: index assignment and analysis. IEEE Trans. Signal Process. 2009, 57(6):2258-2274.
- Reibman AR, Jafarkhani H, Wang Y, Orchard MT, Puri R: Multiple-description video coding using motion-compensated temporal prediction. IEEE Trans. Circuits Syst. Video Technol. 2002, 12(3):193-204. 10.1109/76.993440
- Chang S, Cosman PC, Milstein LB: Performance analysis of channel symmetric FEC-based multiple description coding for OFDM networks. IEEE Trans. Image Process. 2011, 20(4):1061-1076.
- Zhu C, Liu M: Multiple description video coding based on hierarchical B pictures. IEEE Trans. Circuits Syst. Video Technol. 2009, 19(4):511-521.
- ITU-T and ISO/IEC JTC 1: Advanced Video Coding for Generic Audiovisual Services. ITU-T Rec. H.264 & ISO/IEC 14496-10; 2005. Version 4, July
- Wang D, Canagarajah N, Redmill D, Bull D: Multiple description video coding based on zero padding. International Symposium on Circuits and Systems (ISCAS), Vancouver, CA; 2004:205-208.
- Bai H, Zhao Y, Zhu C: Multiple description video coding using adaptive temporal sub-sampling. International Conference on Multimedia and Expo (ICME), Beijing China; 2007:1331-1334.
- Crave O, Guillemot C, Pesquet-Popescu B, Tillier C: Distributed temporal multiple description coding for robust video transmission. EURASIP J. Wirel. Commun. Netw. 2008. Article ID 183536 (2008)
- Wang A, Li Z, Zhao Y, Wang W, Bai H: Two-description distributed video coding for robust transmission. EURASIP J. Adv. Signal Process. 2011, 76: 2011.
- Zhang M, Bai H: Temporal sampling based multiple description video coding for scenes switching. Data Compression Conference (DCC), Snowbird Utah, USA; 2012:413-413.
- ITU-T Rec. H.264 ISO/IEC 14496-10 AVC: Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification. JVT Doc JVT-G050, Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6) 7th Meeting, Pattaya, Thailand; 2003. 7-14 March
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.