Impact of Video Coding on Delay and Jitter in 3G Wireless Video Multicast Services

We present an efficient method for supporting wireless video multicast services. One of the main goals of wireless video multicast services is to provide priority including dedicated bandwidth, controlled jitter (required by some real-time and interactive traffic), and improved loss characteristics. The proposed method is based on storing multiple differently encoded versions of the video stream at the server. The corresponding video streams are obtained by encoding the original uncompressed video file as a sequence of I-P(I)-frames using a different GOP pattern. Mechanisms for controlling the multicast service request are also presented and their effectiveness is assessed through extensive simulations. Wireless multicast video services are supported with considerably reduced additional delay and acceptable visual quality at the wireless client-end.


INTRODUCTION
Multimedia transport typically requires stringent QoS metrics (bandwidth, delay, and jitter guarantees). However, in addition to unreliable wireless channel effects, it is very hard to maintain an end-to-end route which is both stable and has enough bandwidth in an ad hoc network. The rapid growth of wireless communications and networking protocols will ultimately bring video to our lives anytime, anywhere, and on any device.
Until this goal is achieved, wireless video delivery faces numerous challenges, among them highly dynamic network topology, high error rates, limited and unpredictably varying bit rates, and scarcity of battery power. Most emerging and future mobile client devices will differ significantly from those used for speech communications only; handheld devices will be equipped with a color display and a camera, and will have sufficient processing power to allow presentation, recording, and encoding/decoding of video sequences. In addition, emerging and future wireless systems will provide sufficient bit rates to support video communication applications. Nevertheless, bit rates will always be scarce in wireless transmission environments due to physical bandwidth and power limitations; thus, efficient video compression is required [1,2].
In the last decade, video compression technologies have evolved through the series MPEG-1, MPEG-2, MPEG-4, and H.264 [3][4][5][6]. Given a bandwidth of several hundred kilobits per second, recent codecs such as MPEG-4 can efficiently transmit quality video.
An MPEG video stream comprises intra-frames (I), predicted frames (P), and interpolated frames (B) [3][4][5]. According to the MPEG coding standards, I-frames are coded such that they are independent of any other frames in the sequence; P-frames are coded using motion estimation and each one depends on the preceding I- or P-frame; finally, the coding of B-frames depends on the two "anchor" frames: the preceding I/P-frame and the following I/P-frame. An MPEG coded video sequence is typically partitioned into small intervals called GOPs (groups of pictures).
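The frame-type structure described above can be illustrated with a short sketch. It assumes the common parameterization of a GOP by its length and by the spacing between anchor (I/P) frames, which the paper later denotes N and M; the function name `gop_pattern` is hypothetical and the output is in display order, not transmission order.

```python
def gop_pattern(n: int, m: int) -> str:
    """Return the frame types of one GOP in display order.

    n: distance between successive I-frames (GOP length).
    m: distance between successive anchor (I/P) frames.
    """
    if n < 1 or m < 1 or n % m != 0:
        raise ValueError("n must be a positive multiple of m")
    types = []
    for k in range(n):
        if k == 0:
            types.append("I")   # GOP always starts with an intra-frame
        elif k % m == 0:
            types.append("P")   # anchor frame predicted from the previous anchor
        else:
            types.append("B")   # interpolated between the surrounding anchors
    return "".join(types)


print(gop_pattern(12, 3))  # IBBPBBPBBPBB
print(gop_pattern(5, 1))   # IPPPP
```

With m = 1 the pattern contains no B-frames, which is the frame layout used by the GOP format in the simulations.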
Streaming of live or stored video content to a group of mobile devices comes under the scope of the multimedia broadcast/multicast service (MBMS) standard [7]. MBMS standardization is still in progress, and full commercial deployment appears to be at least three years away. Typical applications include subscriptions to live sporting events, news, music, videos, traffic and weather reports, and live TV content. MBMS has two modes in practice: broadcast mode and multicast mode. The difference between the two is that the user does not need to subscribe to each broadcast service separately, whereas in multicast mode the services can be ordered separately. The subscription and group joining for multicast mode services could be done by the mobile network operator, the user him/herself, or a separate service provider. The current understanding is that broadcast mode services are not charged, whereas multicast mode can provide services that are billed. Specifically, the MBMS standard specifies the transmission of data packets from a single entity to multiple recipients. The multimedia broadcast/multicast service center should be able to accept and retrieve content from external sources and transmit it using error resilient schemes.
In recent years several error resilience techniques have been devised [8][9][10][11][12][13][14][15]. In [8], error resilience entropy coding (EREC) was proposed. In this method the incoming bitstream is reordered without adding redundancy, such that longer VLC blocks fill up the spaces left by shorter blocks in a number of VLC blocks that form a fixed-length EREC frame. The drawback of this method is that the codes between two synchronization markers are dropped, with the result that any VLC code in the EREC frame may be corrupted by transmission errors. A rate-distortion framework with analytical models that characterize the error propagation of a corrupted video bitstream subjected to bit errors was proposed in [9]. One drawback of this method is that it assumes that the actual rate-distortion characteristics are known, which makes the optimization difficult to realize in practice. In addition, error concealment is not considered. Error concealment has been available since H.261 and MPEG-2 [4]. The easiest and most practical approach is to hold the last frame that was successfully decoded. The best-known approach is to use motion vectors, which can adjust the image more naturally when holding the previous frame. More complicated error concealment techniques consist of a combination of spatial, spectral, and temporal interpolations with motion vector estimation. In [10] an error resilience transcoder for general packet radio service (GPRS) mobile access networks is presented. In this approach the bit allocation between error resilience insertion and video coding is not optimized. In [11] optimal error resilience insertion is divided into two subproblems: optimal mode selection for macroblocks and optimal resynchronization marker insertion. Moreover, in [12] an approach is proposed that recursively computes the expected decoder distortion with pixel-level precision to account for spatial and temporal error propagation in a packet loss environment. In both methods [11,12], interframe dependencies are not considered.
In the MPEG-4 video standard [5], application layer error resilience tools were developed. At the source coder layer, these tools provide synchronization and error recovery functionalities. Two efficient tools are the resynchronization marker and adaptive intra-frame refresh (AIR). The marker localizes transmission errors by inserting codes into the bitstream to mitigate them. AIR prevents error propagation by frequently performing intra-frame coding in motion regions. However, AIR is not effective in combating error propagation when I-frames are less frequent.
A survey of error resilient techniques for multicast applications for IP-based networks is reported in [13]. It presents algorithms that combine ARQ, FEC, and local recovery techniques where the retransmissions are conducted by multicast group members or intermediate nodes in the multicast tree.
Moreover, video resilience techniques using hierarchical algorithms are proposed where I-, P-, and B-frames are sent with varying levels of FEC protection. Some of the prior research works on error resilience for broadcast terminals focus on increasing FEC based on the feedback statistics for the user [14]. A comparison of different error resilience algorithms for wireless video multicasting on wireless local area networks is reported in [15]. However, in the literature surveyed none of the methods applied error resilience techniques at the video coding level to support multicasting services.
Error resilient (re-) encoding is a technique that enables robust streaming of stored video content over noisy channels. It is particularly useful when content has been produced independent of the transmission network conditions or under dynamically changing network conditions.
This paper focuses on signaling aspects of mobile clients, such as joining or leaving a multicast session of multimedia delivery. Developing an error resilience technique that provides a high quality of experience to the end mobile user is a challenging issue. In this paper we propose a very efficient error resilience technique for MBMS. Similar to [16], by encoding separate copies of the video, the multicast video stream is supported with minimal additional resources. The corresponding version is obtained by encoding every frame of the original (i.e., uncompressed) movie as a sequence of I-P(I)-frames using a different GOP pattern.
The paper is organized as follows. In Section 2 the multimedia broadcast/multicast service standard is briefly discussed. In Section 3 the problem of supporting multimedia broadcast/multicast service over wireless networks is formulated. In Section 4 the preprocessing steps required to support efficient multicast streaming services over wireless networks are detailed. Section 5 presents the extensive simulation results. Finally, conclusions are discussed in Section 6.

MULTIMEDIA BROADCAST/MULTICAST SERVICE
Third generation partnership project (3GPP) has standardized four types of visual content delivery services and technologies.
The first three mobile applications assume the point-to-point model, where two end-points (e.g., client and server) communicate with one another. As its name indicates, MBMS has two modes in practice: broadcast mode and multicast mode.
A broadcast service can be generalized to mean a unidirectional point-to-multipoint service in which data is transmitted from a single source to multiple terminals in the associated broadcast service area. A multicast service, on the other hand, can be defined as a unidirectional point-to-multipoint service in which data is transmitted from a single source to a multicast group in the associated multicast service area. Only the users that are subscribed to the specific multicast service and have joined the multicast group associated with the service can receive the multicast services. By contrast, a broadcast service can be received without separate indication from the customers. In practice multicast users need a return channel for the interaction procedures in order to be able to subscribe to the desired services.
Similar to PSS and MMS, two types of application of the MBMS standard are anticipated.
(i) MBMS download: to push a multimedia message to clients.
(ii) MBMS streaming: continuous media stream transmission and immediate playout.
The protocol stack is designed to accommodate the above applications as illustrated in Figure 1.
The streaming stack is very similar to that of PSS [18]. The download stack, on the other hand, is unique in its adoption of IETF reliable multicast/broadcast delivery in error-prone environments. As a protocol, FLUTE is fully specified and built on top of the asynchronous layered coding (ALC) protocol of the layered coding transport (LCT) building block. File transfer is administered by special-purpose objects, file description table (FDT) instances, which provide a running index of files and their essential reception parameters in-band of a FLUTE session. ALC is the adaptation protocol that extends LCT for multicast; it combines the LCT and FEC building blocks. LCT is designed as a layered multicast transport protocol for massively scalable, reliable, and asynchronous content delivery. An LCT session comprises multiple channels originating at a single sender that are used for some period of time to carry packets pertaining to the transmission of one or more objects that can be of interest to receivers. The FEC building block is optionally used together with the LCT building block to provide reliability. It allows the choice of an appropriate FEC code (e.g., Reed-Solomon) to be used with ALC, including the no-code FEC scheme, which simply sends the original data without FEC coding [7].

PROBLEM FORMULATION
The MBMS system introduces a new paradigm compared with traditional Internet- or satellite-based multicasting systems because of mobility. The system has to account for a wide variety of receiver conditions such as handover, speed of the receiver, interference, and fading. Moreover, the required bandwidth and power should be kept low for mobile devices.
Since mobility is expected during a session, there is typically significant packet loss during handover. If the packet loss occurs on an I-frame, it affects all the P- and B-frames that are predicted from that I-frame. In the case of P-frames, error concealment techniques can mitigate the loss; however, the distortion continues to propagate until the next I-frame is received. This propagation can also be managed using the intra-block refresh technique. On the other hand, the loss of a B-frame is limited to that particular frame and does not result in error propagation. When a mobile joins an existing multicast session, there is a delay before it can be synchronized. This delay is proportional to the distance between successive I-frames (that is, inversely proportional to their frequency), as determined by the streaming server. Since I-frames require more bits than P- and B-frames, compression efficiency decreases as the frequency of I-frames increases. Let $I_{\mathrm{number}}$ denote the frequency of I-frames and $F_{\mathrm{rate}}$ the frame rate of the compressed video. The worst-case initial delay, in seconds, is computed from these quantities and from N, the distance between two successive I-frames, which defines a "group of pictures" (GOP). N is in turn defined in terms of M, the distance between two successive P-frames (usually set to 3), and a nonnegative constant α (α ≥ 0). Figure 2 depicts the worst-case delay in seconds for different combinations of frame rate and number of I-frames in a group of pictures.
The graph in Figure 2 shows that the delay grows with the distance N between successive I-frames, that is, it falls as I-frames become more frequent. The application would therefore require more frequent transmission of I-frames so as to allow the user to join the ongoing session quickly. However, this would require more bandwidth.
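As a rough numerical illustration of the join delay, the sketch below assumes that a client joining just after an I-frame must wait one full GOP for the next one, so that the worst-case delay is N divided by the frame rate. This closed form is an assumption made for illustration and is not taken from the paper's equations; the function name `worst_case_join_delay` is hypothetical.

```python
def worst_case_join_delay(n: int, frame_rate: float) -> float:
    """Worst-case wait (seconds) for the next I-frame.

    Assumption: the client has just missed an I-frame and must wait
    n frame intervals, where n is the distance between I-frames.
    """
    if n < 1 or frame_rate <= 0:
        raise ValueError("n must be >= 1 and frame_rate must be positive")
    return n / frame_rate


# Tabulate the delay for a few hypothetical GOP lengths and frame rates.
for fps in (10, 15, 30):
    for n in (5, 12, 15, 30):
        delay = worst_case_join_delay(n, fps)
        print(f"frame rate {fps:2d} fps, N = {n:2d}: {delay:.3f} s")
```

Under this assumption, shortening the GOP reduces the join delay linearly, which is the trade-off against bandwidth discussed in the text.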
We assume that the ratio of frame sizes for I-, P-, and B-frames is 5 : 3 : 2. The MPEG bitstream used for simulation is the "Mobile" sequence with 180 frames. The average bandwidth is given by [20]

$$\text{bandwidth} = F_{\mathrm{rate}} \times \text{average (IPB) size} \times 8\ \text{bits/byte}, \tag{4}$$

where average (IPB) size is the average size, in bytes, of the I-, P-, and B-frames. Figure 3 shows the increase in the network bandwidth as a function of the group of pictures. It can be seen from this graph that more I-frames in a GOP result in an increase in the network bandwidth.
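The bandwidth relation (4) can be evaluated with a small sketch. It assumes one I-frame per GOP, a P-frame every M frames, B-frames elsewhere, and frame sizes in the 5 : 3 : 2 ratio stated above; the I-frame size of 5000 bytes and the function names are hypothetical values chosen only for illustration.

```python
def average_frame_size(n: int, m: int, i_size: float, ratio=(5, 3, 2)) -> float:
    """Mean frame size in bytes over one GOP of length n with anchor spacing m."""
    if n < 1 or m < 1 or n % m != 0:
        raise ValueError("n must be a positive multiple of m")
    num_i = 1
    num_p = n // m - 1
    num_b = n - num_i - num_p
    unit = i_size / ratio[0]  # bytes per ratio unit
    total = unit * (num_i * ratio[0] + num_p * ratio[1] + num_b * ratio[2])
    return total / n


def bandwidth_bps(frame_rate: float, n: int, m: int, i_size: float) -> float:
    """Average bandwidth in bits per second: frame rate x mean size x 8."""
    return frame_rate * average_frame_size(n, m, i_size) * 8


# Hypothetical I-frame size of 5000 bytes at 30 fps.
for n in (3, 6, 12, 15):
    print(f"N = {n:2d}: {bandwidth_bps(30, n, 3, 5000) / 1000:.0f} kbps")
```

In this sketch a shorter GOP spends a larger share of frames on I-frames, so the average bandwidth rises as N decreases.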
One other tool that is effective against error propagation is the intra-block refresh technique [5]. In this technique, a percentage of the blocks in P- and B-frames is intra-coded, and the criterion for selecting such intra-blocks depends on the algorithm. However, the intra-block refresh technique is not effective in combating error propagation when I-frames are less frequent.
Unlike traditional broadcasting/multicasting techniques, the MBMS system requires new technologies for error resilience. This is because MBMS does not allow retransmissions, and the temporal fading conditions of wireless channels could result in the corruption of certain frames. Due to the frame dependency within hybrid coding techniques, the errors propagate until an I-frame is decoded.

PROPOSED TECHNIQUE
In a typical video distribution scenario, video content is captured, then immediately compressed and stored on a local network. At this stage, compression efficiency of the video signal is most important, as the content is usually encoded with relatively high quality and independent of any actual channel characteristics. Note that the heterogeneity of client networks makes it difficult for the encoder to adaptively encode the video content for a wide range of channel conditions. This is especially true for wireless clients. It should also be noted that the transcoding (decode-(re-) encode) of stored video is as necessary as that for live video streaming. For instance, pre-analysis may be performed on stored video to gather useful information. If the server only has the original compressed bitstream (i.e., the original uncompressed sequence is unavailable), we can decode the bitstream.
The problem addressed is that of transmitting a sequence of frames of stored video using the minimum amount of energy subject to the video quality and bandwidth constraints imposed by the wireless network.
Assume that an I-frame is always the starting point for joining a multicast session. Since I-frames are decoded independently, switching from leaving to joining a multicast session can be done very efficiently. The corresponding video streams are obtained by encoding the original uncompressed video file as a sequence of I-P(I)-frames using a different GOP pattern (N = 5, M = 1). P(I)-frames are coded using motion estimation, and each one depends only on the preceding I-frame. As a result, the corruption of a P(I)-frame does not affect the decoding of the next P(I)-frame. On the other hand, this increases the P(I)-frame sizes.
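The effect of the I-P(I) dependency structure on error propagation can be sketched with a toy model. It assumes a frame is usable only if it was received and its reference frame is usable, and it ignores error concealment; the function `decodable_frames` and its `chained` flag are hypothetical names introduced for this illustration.

```python
def decodable_frames(num_frames: int, n: int, lost, chained: bool):
    """Return a list of booleans: True if frame k decodes without error.

    n:       GOP length (an I-frame every n frames, M = 1, no B-frames).
    lost:    indices of frames lost in transmission.
    chained: True  -> each P-frame references the previous frame (I-P-P-P...).
             False -> each P(I)-frame references only the GOP's I-frame.
    """
    lost = set(lost)
    ok = []
    for k in range(num_frames):
        gop_start = k - (k % n)
        if k in lost:
            ok.append(False)            # frame itself is missing
        elif k == gop_start:
            ok.append(True)             # I-frame: no reference needed
        elif chained:
            ok.append(ok[k - 1])        # depends on the preceding frame
        else:
            ok.append(ok[gop_start])    # depends only on the I-frame
    return ok


# Two GOPs of N = 5 with the first predicted frame lost.
conventional = decodable_frames(10, 5, lost=[1], chained=True)
proposed = decodable_frames(10, 5, lost=[1], chained=False)
print(sum(conventional), "of 10 frames usable with chained P-frames")
print(sum(proposed), "of 10 frames usable with I-P(I) coding")
```

In this model a single lost predicted frame invalidates the rest of the GOP under chained prediction, whereas under I-P(I) coding only the lost frame itself is affected.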
We consider a system where source coding decisions are made using the minimum amount of energy, $\min E_{q(i)}\{I, P(I)\}$, subject to the minimum distortion ($D_{\min}$) at the mobile client and the available channel rate ($C_{\mathrm{Rate}}$) required by the wireless network. Hence

$$\min E_{q(i)}\{I, P(I)\} \le C_{\mathrm{Rate}}, \qquad \min E_{q(i)}\{I, P(I)\} \le D_{\min}.$$

It should be emphasized that a major limitation in wireless networks is that mobile users must rely on a battery with a limited supply of energy. Effectively utilizing this energy is a key consideration in the design of wireless networks. Our goal is to properly select a quantizer q(i) in order to minimize the energy required to transmit the sequence of I-P(I)-frames subject to both distortion and channel constraints.
A common approach to controlling the size of an MPEG frame is to vary the quantization factor on a per-frame basis [21]. The amount of quantization may be varied; this is the mechanism that provides constant-quality rate control. The quantized coefficients QF[u, v] are computed from the DCT coefficients F[u, v], the quantization scale MQUANT, and a quantization matrix W[u, v], and the normalized quantization factor is denoted w[u, v]. The quantization step makes many of the values in the coefficient matrix zero, and it makes the rest smaller. The result is a significant reduction in the number of coded bits with no visually apparent difference between the decoded output and the original source data [22]. The quantization factor may be varied in two ways: by changing the quantization scale MQUANT or by changing the quantization matrix W[u, v].
To bound the size of predicted frames, a P(I)-frame is encoded such that its size fulfills the frame-size constraints (9). In the first encoding attempt, the encoding algorithm starts with the nominal quantization value that was used to encode the preceding I-frame. After the first encoding attempt, if the resulting frame size fulfills the constraints (9), the encoder proceeds to the next frame. Otherwise, the quantization factor (the quantization matrix W[u, v]) is adjusted and the same frame is re-encoded.
The quantization matrix can be modified by maintaining the same value at the near-dc coefficients but using a different slope towards the higher frequency coefficients. This procedure is repeated until the size of the compressed frame satisfies (9). The advantage of this scheme is that it tries to minimize the fluctuation in video quality while satisfying the channel condition.
Figure 4 shows two matrices, both with the same value at the near-dc coefficients but with different slopes towards the higher frequency coefficients. In other words, the quantization scale MQUANT is fixed and the quantization matrix W[u, v] varies.
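The re-encoding loop described above can be sketched on a single 8 x 8 block. Everything specific here is an assumption made for illustration: the linear matrix shape `base + slope * (u + v)`, the scaling constant 16 in the quantizer, the use of the nonzero-coefficient count as a stand-in for coded frame size, and a single upper bound `max_size` in place of the paper's constraints (9). None of these is taken from the paper.

```python
def quant_matrix(base, slope, size=8):
    """Matrix with a fixed value at dc and a linear slope towards high frequencies."""
    return [[base + slope * (u + v) for v in range(size)] for u in range(size)]


def quantize(block, w, mquant, scale=16):
    """Illustrative uniform quantizer; the constant `scale` is an assumption."""
    size = len(block)
    return [[int(scale * block[u][v] / (mquant * w[u][v])) for v in range(size)]
            for u in range(size)]


def coded_size(q):
    """Proxy for coded size: number of nonzero quantized coefficients."""
    return sum(1 for row in q for c in row if c != 0)


def encode_with_size_bound(block, mquant, max_size, base=8, max_slope=64):
    """Keep MQUANT fixed and steepen the matrix until the size bound is met."""
    slope = 0
    while True:
        w = quant_matrix(base, slope, len(block))
        size = coded_size(quantize(block, w, mquant))
        if size <= max_size or slope >= max_slope:
            return slope, size, w
        slope += 1  # re-encode the same block with a steeper matrix


# Hypothetical DCT block whose energy decays with frequency.
block = [[400.0 / (1 + u + v) for v in range(8)] for u in range(8)]
slope, size, w = encode_with_size_bound(block, mquant=8, max_size=32)
print("slope:", slope, "nonzero coefficients:", size, "dc entry:", w[0][0])
```

Because only the slope changes, the dc entry of the matrix stays at its base value, so low-frequency content is quantized identically across attempts while high-frequency coefficients are progressively discarded.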

SIMULATIONS RESULTS
There are two types of criteria that can be used for the evaluation of video quality: subjective and objective. Subjective rating is difficult because it is not mathematically repeatable. For this reason we measure the visual quality of the interactive mode using the peak signal-to-noise ratio (PSNR). We use the PSNR of the Y-component of a decoded frame.
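For reference, the luminance PSNR can be computed as in the following sketch, which uses the standard definition 10 log10(peak^2 / MSE) and assumes 8-bit samples (peak value 255) supplied as flat sequences. The function name `psnr_y` and the sample values are hypothetical.

```python
import math


def psnr_y(reference, decoded, peak=255):
    """PSNR in dB between two equally sized sequences of luminance samples."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


# Tiny hypothetical example: four luminance samples.
ref = [100, 100, 100, 100]
dec = [101, 99, 100, 100]
print(f"{psnr_y(ref, dec):.2f} dB")
```

A per-frame plot such as the one reported in the simulations would apply this function to the Y-plane of each decoded frame against the corresponding original frame.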
The MPEG-4 bitstream used for simulation is the "Mobile" sequence with 180 frames, at a frame rate of 30 fps. The GOP format was N = 5, M = 1. We consider a set of allowable channel rates, $C_{\mathrm{Rate}}$ = (300 kbps, 200 kbps, 100 kbps). In order to illustrate the advantage of the proposed algorithm, we consider as a reference a system where source coding decisions are made without any constraints, using the same GOP format (N = 5, M = 1). Figure 5 shows the PSNR plot per frame obtained with the proposed algorithm and the reference scheme for the allowable channel rates. Clearly the proposed algorithm yields a PSNR advantage of 1.42 dB, 1.39 dB, and 1.35 dB for the allowable channel rates 300 kbps, 200 kbps, and 100 kbps, respectively. Figure 6 depicts the performance of the proposed algorithm compared with the performance of the MPEG-4 simple profile [5] codec during frame loss. The percentage of frames that are dropped is varied, and it is clearly seen that the proposed approach maintains the quality. On the other hand, with the MPEG-4 simple profile codec, the quality degrades as the frame loss percentage increases. The "Mobile" sequence was used for these experiments with a bit rate of 100 kbps and a frame rate of 30 fps. The above figures show that the proposed algorithm minimizes jitter during a multicast session.

CONCLUSIONS
Error resilient (re-) encoding is a key technique that enables robust streaming of stored video content over noisy channels. It is particularly useful when content has been produced independent of the transmission network conditions. In this paper, we investigated the constraints of supporting multimedia multicast services in wireless clients. In order to overcome these constraints, we proposed the use of differently encoded versions of each video sequence. The differently coded sequences are obtained by encoding frames of the original (uncompressed) sequence as I-P(I)-frames using a different GOP pattern. The server responds to a multicast request by switching from leaving to joining a multicast session very efficiently. By properly encoding versions of the original video sequence, multicast video streaming services can be supported with considerably reduced additional delay and minimum jitter, which implies acceptable visual quality at the wireless client-end. Our future work includes developing a multilevel quality of service framework for fully interactive wireless multicast video services.

Figure 2: Relative increase in the delay as a function of GOP (N).

Figure 3: Relative increase in the bandwidth as a function of GOP (N).

Figure 5: PSNR for encoded frames in the multicast version.

Figure 6: PSNR as function of frames dropped.