Predictive side decoding for human-centered multiple description image coding

Multiple description coding (MDC) provides a favorable solution for human-centered image communication, which takes into account people’s varying watching situations as well as their demand for real-time image display. As an effective technique for MDC, three-description lattice vector quantization (3D-LVQ) is considered for image coding in this paper. Based on the intra- and inter-correlation in the 3D-LVQ index assignment as well as the wavelet intra-subband correlation, a novel predictive decoding method for 3D-LVQ-based image coding is proposed to enhance side decoding performance. The method predicts lost descriptions (sublattice points) from the received ones so that wavelet vectors (fine lattice points) are reconstructed more accurately in side decoding. Experimental results validate the effectiveness of the proposed decoding scheme in terms of rate-distortion performance.

communication to varying demands of different people, whose communication channels may have varying bandwidth and loss probabilities. When packet loss occurs during online image browsing, people tend to prefer viewing a degraded version of a whole image immediately instead of waiting and staring at a partially displayed fine image. The design of a human-centered image coding scheme that takes into account people's varying watching situations as well as their demand for real-time image display is a challenging problem.
Multiple description coding (MD coding or MDC) [18] provides a favorable solution to this problem. Although the reliability of multimedia communication can be improved from the perspective of multicore real-time system design [19][20][21] or load balancing of cloud-edge computing [22][23][24][25][26][27], MDC offers an error-resilient source coding method to combat information loss over lossy networks without retransmission. MDC generates different encoded versions of the same source. Each version is referred to as a description and is transmitted separately over unreliable networks. Each description can independently provide a degraded version of the source, while a finer reconstruction quality can be obtained as the number of received descriptions increases. Generally, the decoding of one description or a subset of the descriptions is known as side decoding and results in side distortions, while the decoding of all the descriptions is known as central decoding and results in a central distortion [28]. Using MDC, people with varying bandwidth can select a different number of descriptions corresponding to different reconstruction qualities. During network congestion, people can access a coarsely reconstructed source immediately, instead of waiting for retransmission of all the lost packets.
Vaishampayan introduced the earliest practical MD technique, known as the multiple description scalar quantizer (MDSQ) [29]. MDSQ generates descriptions by performing scalar quantization, followed by an index assignment. A wavelet image coding scheme based on MDSQ was developed in [30]. Another wavelet-based MD image coding scheme is proposed in [31] for image transmission with mixed impulse noise, where a multi-objective evolutionary algorithm is used to solve the side quantization optimization problem and the parameter optimization problem of the denoising filter simultaneously.
Multiple description lattice vector quantization (multiple description LVQ or MDLVQ) was later developed in [32], and a study on optimal MDLVQ design was presented in [33]. MDLVQ generates descriptions by performing vector quantization first, and then, an index assignment maps a fine lattice point to multiple sublattice points. An image coding scheme based on two-description LVQ was developed in [34], which shows better coding performance than the corresponding MDSQ-based counterpart [30]. In [35], the design of M-description LVQ is investigated, where the MDLVQ index assignment design is translated into a transportation problem. The effectiveness of the proposed index assignment design in [35] is verified under high-resolution assumption. In [36], an analytical expression for optimal entropy-constrained asymmetric MDLVQ design is presented, which allows unequal packet-loss probabilities and side entropies. In [37], the design of symmetric MD coinciding LVQ is proposed, where the coinciding sublattices refer to sublattices with the same index but generated by different generator matrices. The developed MD coinciding LVQ scheme is applied to standard test images.
Other MD schemes include those using forward error correction codes [38], MDC via polyphase transform and selective quantization [39], set partitioning in hierarchical trees (SPIHT)-based image MDC [40], and a JPEG 2000-based MD approach presented in [41]. In [42], a just noticeable difference (JND)-based MD image coding scheme is proposed utilizing the characteristics of the human visual model. In [43], an adaptive reconstruction-based MD image coding scheme is proposed with randomly offset quantizations. Deep learning approaches [44] have also been applied to MDC. In [45], a standard-compliant multiple description coding framework is proposed, where the input image is polyphase downsampled to form two descriptions for the standard codec, while during decoding deep convolutional neural networks are utilized to conduct artifact removal and image super-resolution to enhance reconstructed image quality. In [46], MDC and convolutional autoencoders are combined for image compression to achieve high coding efficiency. Besides traditional images, a few research works on MDC target 3D depth images or single-view and multiview video sequences. In [47], observing that 3D depth images have special characteristics and can be classified into edge blocks and smooth blocks, a two-description LVQ scheme is proposed for efficient compression of 3D depth images. In [48], a novel coding scheme is proposed for video sequences based on the spatial-temporal masking characteristics of the human visual system. In [49], the multiview sequence is spatially polyphase subsampled and grouped by "cross-interleaved" sampling to generate two subsequences, and an MDC scheme is proposed which directly reuses the computed modes and prediction vectors of one subsequence for the other.
This work is extended in [50], where one subsequence is directly coded by joint multiview video coding (JMVC) encoder, and the other subsequence selectively chooses the prediction mode and the prediction vector of the coded subsequence to improve the rate-distortion performance. On the decoder side, the side reconstruction quality is improved using a gradient-based interpolation.
Most of the abovementioned works center on two-channel MDC, that is, two-description coding. Compared with two-description MDC, coding with more descriptions provides better robustness against description loss, especially for networks with high loss ratios. However, redundancy increases noticeably with the number of descriptions. Three-description coding may thereby be a good trade-off in some cases. On the other hand, compared with MDSQ, MDLVQ exhibits better coding efficiency and is easier to extend to more descriptions. Therefore, a three-description lattice vector quantization (3D-LVQ)-based image coding scheme is considered in this paper.
The general design of 3D-LVQ is concerned with index assignment, which is discussed in [33] and [51]. Here, we consider how to take advantage of the index assignment result for better reconstruction quality in image decoding. For the vector reconstruction at the decoder side when some descriptions (i.e., sublattice points in MD-LVQ) are lost, the existing MD-LVQ coding schemes employ a simple side decoding of each vector individually based on the received sublattice points of that vector. We observe a strong correlation characteristic in the 3D-LVQ index assignment result, which can be exploited to enhance side decoding for sources with memory. Specifically, in the context of wavelet image coding, a predictive side decoding method is proposed to improve reconstruction quality in side decoding. Compared with the existing work in [33,51], which only decodes the received sublattice points during description losses, the proposed scheme can predict the lost sublattice points based on index correlation.
The main contributions of this paper can be summarized as follows:
• The intra- and inter-correlation between sublattice points in the 3D-LVQ index assignment has been analyzed and discussed, followed by the correlation discussion of wavelet intra-subbands.
• Based on the correlation discussion, a novel predictive decoding method for 3D-LVQ-based image coding is proposed to enhance side decoding performance. The performance of the proposed predictive decoding scheme is verified by experimental results.

The remainder of the paper is structured as follows. Section 2 provides a 3D-LVQ-based image coding scheme. Section 3 presents a novel predictive side decoding approach. Experimental settings and results are presented in Sections 4 and 5, respectively, while Section 6 concludes the paper.

Three-description LVQ-based image coding
In this section, we first provide a concise description of 3D-LVQ and then present a 3D-LVQ-based image coding scheme.

3D-LVQ
For a given lattice Λ in the L-dimensional Euclidean space, a sublattice Λ′ ⊆ Λ is said to be geometrically similar to Λ if Λ′ can be obtained from Λ by applying a scaling, rotation, or reflection. The index number N of the sublattice Λ′ is defined as the number of elements of Λ (fine lattice points) in each Voronoi cell of Λ′. 3D-LVQ aims to map one fine lattice point λ (λ ∈ Λ) to three sublattice points λ₁, λ₂, and λ₃ (λ₁, λ₂, λ₃ ∈ Λ′) based on a bijective labeling function α(·) (also known as index assignment) as:

$$\alpha(\lambda) = (\lambda_1, \lambda_2, \lambda_3) \tag{1}$$

for minimizing the side distortions when only one or two sublattice points are received. The overall 1-description side distortion D_{s1,λ} and the overall 2-description side distortion D_{s2,λ} are given as:

$$D_{s1,\lambda} = \sum_{i=1}^{3} \left\| \lambda - \lambda_i \right\|^2 \tag{2}$$

$$D_{s2,\lambda} = \sum_{1 \le i < j \le 3} \left\| \lambda - \frac{\lambda_i + \lambda_j}{2} \right\|^2 \tag{3}$$

respectively, where the midpoint of two received sublattice points is taken as the reconstructed vector for the 2-description-based side decoding. The optimal index assignment design to minimize the side distortions or the expected distortion is a challenging task, and the index assignment based on the A₂ lattice can be found in [33] and [51]. Figure 1 shows an example of the labeling function obtained with the index assignment result in [33] and [51] based on the A₂ lattice with index number N = 31, which has been shown to minimize the side distortions. For instance, the lattice point "OAB" in Fig. 1 is represented by the three sublattice points "O," "A," and "B," while another lattice point "BOO" is mapped to the three sublattice points "B," "O," and "O." In this paper, we consider the 3D-LVQ with the optimal index assignment as shown in Fig. 1.
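The two side distortions for a single fine lattice point can be evaluated with a few lines of code. The following is a minimal sketch, assuming that each overall distortion is the unweighted sum of squared Euclidean distances over the three possible loss patterns; the function names and the toy coordinates in the usage note are hypothetical and are not taken from the A₂ index assignment of Fig. 1.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def midpoint(a, b):
    """Midpoint of two points, used as the 2-description reconstruction."""
    return tuple((ai + bi) / 2 for ai, bi in zip(a, b))


def side_distortions(fine_point, label):
    """Return (D_s1, D_s2) for one fine lattice point and its 3-tuple label.

    Assumption: each overall distortion is the plain sum over the three
    possible loss patterns, with no channel-probability weighting.
    """
    # One description received: the received sublattice point is the output.
    d_s1 = sum(sq_dist(fine_point, s) for s in label)
    # Two descriptions received: the midpoint of the pair is the output.
    d_s2 = sum(sq_dist(fine_point, midpoint(a, b))
               for a, b in combinations(label, 2))
    return d_s1, d_s2
```

For example, `side_distortions((1, 0), [(0, 0), (2, 0), (0, 2)])` sums the three single-point errors and the three midpoint errors for a toy label; an index assignment search would compare such values across candidate labelings.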

3D-LVQ-based image coding
As in [34], a simple 3D-LVQ-based image encoding scheme is shown in Fig. 2. As a popular technique for image compression, the discrete wavelet transform (DWT) can provide multiresolution representation and subband decomposition for images and capture feature information in horizontal, vertical, and diagonal directions [52]. DWT is considered for image coding in this paper. After applying a DWT to the input image, an input vector x is constructed in a subband. It is then quantized to a (fine) lattice point λ(x), which is mapped to three sublattice points λ₁(x), λ₂(x), and λ₃(x) to be transmitted in separate channels after performing arithmetic coding. At the receiver, decoding is the exact reverse of encoding. Due to network congestion or channel errors, some channels of information (descriptions) may be lost. Therefore, three different types of 3D-LVQ decoders may be needed, that is, one-description-based and two-description-based side decoding as well as three-description-based central decoding. Denote by x̂ the reconstruction of the vector x. If all three sublattice points of vector x are received, the central decoder yields α⁻¹(λ₁(x), λ₂(x), λ₃(x)) = λ(x), where α⁻¹ is the inverse of the labeling function α. If two sublattice points are received while one is lost, the conventional two-description-based side decoder simply takes the average of the two received sublattice points λᵢ(x) and λⱼ(x) (1 ≤ i, j ≤ 3, i ≠ j) as the reconstructed vector:

$$\hat{x} = \frac{\lambda_i(x) + \lambda_j(x)}{2} \tag{4}$$

In the case of only one sublattice point λᵢ(x) being received, the conventional one-description-based side decoder just uses the received sublattice point for the reconstruction:

$$\hat{x} = \lambda_i(x) \tag{5}$$

In the following section, we will propose a more effective vector reconstruction method to improve the side decoding performance by taking advantage of the correlation of sublattice points in the 3D-LVQ index assignment and the wavelet intra-subband correlation characteristics.
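The three conventional decoders described above can be sketched in one routine. This is an illustrative sketch only: the function name, the use of `None` to mark a lost description, and the dictionary `inverse_labeling` (a stand-in for the inverse labeling function, keyed by ordered 3-tuples of sublattice points) are assumptions made for the example.

```python
def decode_conventional(received, inverse_labeling):
    """Conventional 3D-LVQ decoding of one vector.

    received: list of three entries, one per description; each entry is a
        sublattice point (tuple of coordinates) or None if that description
        was lost.
    inverse_labeling: hypothetical dict mapping an ordered 3-tuple of
        sublattice points to the fine lattice point it labels.
    """
    got = [p for p in received if p is not None]
    if len(got) == 3:
        # Central decoding: invert the index assignment.
        return inverse_labeling[tuple(received)]
    if len(got) == 2:
        # Two-description side decoding: midpoint of the received points.
        a, b = got
        return tuple((ai + bi) / 2 for ai, bi in zip(a, b))
    if len(got) == 1:
        # One-description side decoding: the received point itself.
        return got[0]
    raise ValueError("no description received for this vector")
```

Each vector is decoded on its own here, without looking at any neighboring vector, which is the behavior the predictive method in the next section sets out to improve.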

Correlation discussion
As can be seen from Fig. 1, each fine lattice point is mapped to an ordered 3-tuple with the three sublattice points being as close as possible to the fine lattice point for minimizing side distortions [33,51]. In this way, we can see that there is a strong intra-correlation among the three sublattice points for a fine lattice point. More importantly, there exists a substantial inter-correlation among neighboring fine lattice points in terms of their corresponding sublattice points. In other words, neighboring fine lattice points share most sublattice points in the index assignment. In Fig. 1, for instance, the fine lattice point labeled as "OOA" shares at least two sublattice points with its six closest neighbors "AOO," "OAO," "BOO," "OAB," "AOB," and "OAF," regardless of the order. Statistically, we observe from the figure that the immediately neighboring fine lattice points have the same three sublattice points (but in different order) with a probability of 78/186, while they share two sublattice points with a probability of 108/186. That is to say, these immediately neighboring fine lattice points share at least two sublattice points. As the distance between two fine lattice points increases, they have fewer sublattice points in common.
On the other hand, it is well known that a wavelet image normally exhibits strong intra-subband correlation, especially in low-frequency subbands, as the discrete wavelet transform re-distributes the energy of the image into different subbands. One-dimensional DWT passes the signal through a low-pass filter and a high-pass filter simultaneously, providing approximation coefficients (low-frequency subband) and detail coefficients (high-frequency subband), respectively. For two-dimensional DWT performed on images, one level of transform generates four subbands. The subband with low-pass filters in both horizontal and vertical directions is termed the "LL" subband. Similarly, the subbands resulting from a high-pass filter in the horizontal direction and a low-pass filter in the vertical direction, a low-pass filter in the horizontal direction and a high-pass filter in the vertical direction, and high-pass filters in both directions are termed the "HL," "LH," and "HH" subbands, respectively. As an example, a two-stage wavelet decomposition of the image "Couple" is shown in Fig. 3. It can be seen that coefficients in subband "LL" exhibit high correlation in both horizontal and vertical directions due to the fact that "LL" is the low-pass filtered version of the original image in both directions. Likewise, the coefficients in the "HL" and "LH" subbands are highly correlated either vertically or horizontally. However, the coefficients in subband "HH" are less correlated, since this subband is high-pass filtered in both directions. In view of the concurrent correlations in the 3D-LVQ index assignment and wavelet subbands, with properly constructed vectors based on the correlation of wavelet coefficients, the neighboring wavelet vectors will most likely share some sublattice points, which motivates us to develop a better side decoding approach by predicting lost descriptions (sublattice points) using neighboring information.
To exploit the directional correlations in the wavelet subbands, we consider constructing a vector for the "LH" subband with two horizontally neighboring coefficients, whereas for the "HL" subband, a vector is constructed with two vertically neighboring coefficients. For simplicity, vectors for the "LL" and "HH" subbands are also constructed horizontally.
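The subband-dependent vector construction can be sketched as follows. This is a minimal illustration, assuming that coefficients are grouped into non-overlapping pairs and that each subband has even dimensions along the pairing direction; the names `ORIENTATION` and `build_vectors` are hypothetical.

```python
# Pairing direction per subband, as described in the text above.
ORIENTATION = {"LL": "horizontal", "LH": "horizontal",
               "HL": "vertical", "HH": "horizontal"}


def build_vectors(coeffs, subband):
    """Group a subband (list of rows of coefficients) into 2-D vectors.

    Returns a grid (list of rows) of 2-tuples. Assumption: pairs do not
    overlap, so the grid is half as wide (horizontal pairing) or half as
    tall (vertical pairing) as the subband.
    """
    rows, cols = len(coeffs), len(coeffs[0])
    if ORIENTATION[subband] == "horizontal":
        if cols % 2:
            raise ValueError("subband width must be even")
        return [[(row[c], row[c + 1]) for c in range(0, cols, 2)]
                for row in coeffs]
    if rows % 2:
        raise ValueError("subband height must be even")
    return [[(coeffs[r][c], coeffs[r + 1][c]) for c in range(cols)]
            for r in range(0, rows, 2)]
```

With a 2 × 4 block `[[1, 2, 3, 4], [5, 6, 7, 8]]`, the "LH" setting pairs coefficients along each row, while the "HL" setting pairs each coefficient with the one directly below it.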

Proposed 3D-LVQ side decoding with prediction
Consider a wavelet vector x which is mapped to (λ₁(x), λ₂(x), λ₃(x)) in the 3D-LVQ coding, where λₖ(x) is assigned to the kth description. We will first study the two-description-based side decoding, that is, the reconstruction of the vector x if one description, say description k, is lost (λₖ(x) is missing). As discussed above, there is strong intra- and inter-correlation in the assignment of sublattice points for the 3D-LVQ mapping, and neighboring wavelet vectors will most likely share most or all of their sublattice points. Therefore, it is reasonable to predict the lost λₖ(x) from the received sublattice points of the vector x as well as from those of its neighboring vectors. A list of sublattice point candidates can be formed for the estimation of λₖ(x). Subsequently, we can reconstruct the vector x by taking each sublattice point in the list as an estimate of the missing sublattice point for decoding and finally averaging the decoded results.
As an example, we consider the vector x and its neighboring vector y labeled as (λ₁(y), λ₂(y), λ₃(y)) with description 1 being lost. Then, we receive {λ₂(x), λ₃(x)} for vector x and {λ₂(y), λ₃(y)} for vector y at the decoder side, while λ₁(x) and λ₁(y) in description 1 are missing. Based on the above discussion, the candidate list for estimating the lost λ₁(x) can be obtained as {λ₂(x), λ₃(x), λ₂(y), λ₃(y)}, in which each element may be a good prediction. Note that the list may contain duplicate sublattice points. We can thereby use all the candidates in the list one by one as an estimate of the missing sublattice point for decoding and then take the average as the reconstruction x̂. That can be represented as:

$$\hat{x} = \frac{1}{4}\Big[\alpha^{-1}\big(\lambda_2(x), \lambda_2(x), \lambda_3(x)\big) + \alpha^{-1}\big(\lambda_3(x), \lambda_2(x), \lambda_3(x)\big) + \alpha^{-1}\big(\lambda_2(y), \lambda_2(x), \lambda_3(x)\big) + \alpha^{-1}\big(\lambda_3(y), \lambda_2(x), \lambda_3(x)\big)\Big] \tag{6}$$

If there are more neighboring vectors of x, their sublattice points can be included in the candidate list. Note that the prediction scheme may produce some invalid 3-tuple combinations, which are not decodable by the inverse mapping function. In that case, the sublattice points causing invalid combinations are removed from the candidate list. Then, all the valid combinations based on the final candidate list are decoded and averaged as the final reconstruction of x.
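The two-description predictive decoding just described can be sketched as below. This is a hedged illustration rather than the paper's implementation: the function name, the dictionary `inverse_labeling` (a stand-in for the inverse labeling function), and the fallback to the conventional midpoint when no candidate yields a valid 3-tuple are all assumptions; the text does not say what happens when every candidate is invalid.

```python
def predictive_two_description_decode(received, lost, neighbor_points,
                                      inverse_labeling):
    """Reconstruct one vector when exactly one description is lost.

    received: dict {description index (1..3): sublattice point} holding the
        two received points of the current vector.
    lost: index (1..3) of the missing description.
    neighbor_points: received sublattice points of neighboring vectors.
    inverse_labeling: hypothetical dict mapping ordered 3-tuples of
        sublattice points to fine lattice points.
    """
    # Candidate list: own received points plus neighbors' points
    # (duplicates are kept, so repeated candidates weigh more).
    candidates = [p for _, p in sorted(received.items())]
    candidates += list(neighbor_points)

    decoded = []
    for cand in candidates:
        label = dict(received)
        label[lost] = cand
        key = (label[1], label[2], label[3])
        if key in inverse_labeling:  # skip invalid 3-tuples
            decoded.append(inverse_labeling[key])

    if not decoded:
        # Assumed fallback: averaging the two received points gives the
        # conventional midpoint reconstruction.
        decoded = list(received.values())

    dim = len(decoded[0])
    return tuple(sum(p[d] for p in decoded) / len(decoded)
                 for d in range(dim))
```

Keeping duplicates in the candidate list means that a sublattice point shared by several neighbors contributes several times to the average, which matches the "one by one" use of the list described above.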
We now consider one-description-based side decoding, where only one description is received while the other two are missing. Assuming description 1 and description 2 are lost, only the sublattice points {λ₃(x)} and {λ₃(y)} are received for the vector x and its neighboring vector y, respectively. Similarly, we can construct a candidate list {λ₃(x), λ₃(y)}. Estimating the two missing sublattice points from one received sublattice point and its neighbor would be difficult and unreliable. Instead, we simply use the sublattice points in the list as possible reconstructions for vector x, followed by an averaging, that is, x̂ = (λ₃(x) + λ₃(y))/2. As in the two-description-based side decoding, we also need to validate each candidate in the list by checking whether the candidate point is the same as, or immediately neighboring to, the received sublattice point λ₃(x). Invalid sublattice points are removed from the list. Then, all the valid sublattice points are averaged to obtain the final reconstruction x̂.
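A minimal sketch of the one-description predictive decoding follows. The validity check is an assumption: a candidate is treated as "immediately neighboring" when its distance to the received point does not exceed the minimum spacing of the sublattice, passed in as the hypothetical parameter `spacing`. The function names are likewise illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def predictive_one_description_decode(received_point, neighbor_points,
                                      spacing):
    """Reconstruct one vector when only one description is received.

    received_point: the single received sublattice point of the vector.
    neighbor_points: sublattice points received (in the same description)
        for neighboring vectors.
    spacing: assumed minimum distance between distinct sublattice points;
        candidates farther away than this are treated as invalid.
    """
    limit = spacing ** 2 * (1 + 1e-9)  # small tolerance for float input
    candidates = [received_point] + list(neighbor_points)
    # The received point itself always passes, so `valid` is never empty.
    valid = [p for p in candidates if sq_dist(p, received_point) <= limit]
    dim = len(received_point)
    return tuple(sum(p[d] for p in valid) / len(valid) for d in range(dim))
```

When no neighbor passes the check, the result reduces to the received sublattice point, which is the conventional one-description reconstruction.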
The above shows how to obtain the reconstruction of vector x given one neighboring vector, which can be extended to the case of more neighboring vectors. In a two-dimensional wavelet image, each vector has four directly neighboring vectors. Denote by λ(i, j) the current vector to be decoded, while λ(i − 1, j), λ(i + 1, j) and λ(i, j − 1), λ(i, j + 1) are the four adjacent vectors horizontally and vertically, respectively.
For the band "LL," in view of both horizontal and vertical correlation, prediction for the current vector λ(i, j) can utilize the four adjacent vectors. All the received sublattice points of vector λ(i, j) and these four neighboring vectors are put into the candidate list with possible duplicates. For the band "HL," which exhibits vertical correlation, the two vertically adjacent vectors λ(i, j − 1) and λ(i, j + 1) are employed for the prediction. Therefore, the candidate list consists of the received sublattice points for λ(i, j), λ(i, j − 1), and λ(i, j + 1). For the band "LH," which shows horizontal correlation, we use the horizontally adjacent vectors λ(i − 1, j) and λ(i + 1, j) in the prediction. Consequently, the candidate list comprises the received sublattice points for λ(i, j), λ(i − 1, j), and λ(i + 1, j). For the band "HH," no prediction is considered and the conventional MDLVQ decoding is performed, that is, the received sublattice point or the average of two received sublattice points is used as the reconstruction of the current vector for one-description-based or two-description-based side decoding. Figure 4 illustrates the predictive side decoding using neighboring vectors with respect to the different subbands.
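The subband-dependent choice of neighbors can be written compactly. The sketch below follows the indexing used above, with i as the horizontal index and j as the vertical index; the function name and the handling of vectors at the subband border (out-of-range neighbors are simply dropped) are assumptions not stated in the text.

```python
def neighbor_positions(i, j, subband, width, height):
    """Positions of the neighboring vectors used to predict vector (i, j).

    width, height: size of the vector grid of the subband. Neighbors that
    fall outside the grid are dropped (assumed border handling).
    """
    horizontal = [(i - 1, j), (i + 1, j)]
    vertical = [(i, j - 1), (i, j + 1)]
    offsets = {
        "LL": horizontal + vertical,  # correlated in both directions
        "HL": vertical,               # vertically correlated
        "LH": horizontal,             # horizontally correlated
        "HH": [],                     # no prediction, conventional decoding
    }[subband]
    return [(a, b) for a, b in offsets
            if 0 <= a < width and 0 <= b < height]
```

The received sublattice points of the vectors at the returned positions, together with those of the current vector, would then form the candidate list for the predictive decoders.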

Experimental methods
Five standard 512 × 512 images, "Lena," "Couple," "Baboon," "Aerial," and "Goldhill," were tested in the experiment. A discrete wavelet transform (DWT) was applied to the input image, where a four-stage decomposition with the 10/18 Daubechies wavelet was employed. As mentioned before, to exploit the directional correlations in the wavelet subbands, we constructed a 2 × 1 vector with two horizontally neighboring coefficients in the "LH" subband or two vertically neighboring coefficients in the "HL" subband, while the vectors in the "LL" and "HH" subbands could be formed horizontally or vertically (horizontally in our experiments). Such a vector x was then quantized to a (fine) lattice point λ(x), which was mapped to three sublattice points λ₁(x), λ₂(x), and λ₃(x) based on the predesigned index assignment. Lastly, adaptive three-order arithmetic coding was applied to compress the three sequences of sublattice indexes. The three produced descriptions may be transmitted in separate channels. At the receiver, the conventional decoding method and the proposed predictive decoding method were used to reconstruct images based on the received descriptions. Note that our focus is to test the effectiveness of the proposed side decoding in terms of rate-distortion performance, as compared to the conventional side decoding [51] as shown in (4) and (5). We implemented both algorithms with the sublattice index number N = 31.

Experimental results and discussion
Rate-distortion curves are plotted in Fig. 5 to compare the two decoding schemes on all five test images. It can be seen that our proposed predictive scheme consistently outperforms the conventional method in both one-description-based and two-description-based side decoding, where gains of up to 1.68 dB (at 0.531 bpp for "Goldhill") and 1.64 dB (at 0.531 bpp for "Goldhill") are obtained in the cases of 2-description side decoding and 1-description side decoding, respectively. Reconstructed images for "Lena" in the case of losses of one and two descriptions are shown in Fig. 6 for a subjective visual comparison. In the figure, the proposed scheme achieves a 1.37 dB gain at 0.537 bpp in the 2-description side decoding and a 1.25 dB gain at 1.012 bpp in the 1-description side decoding over the conventional method for "Lena." The coding gain tends to become more significant at lower bit rates where the side distortion is normally larger, as expected. With a higher coding bit rate, the conventional side decoding may also reconstruct a vector fairly well even with one or two received sublattice points due to the finer quantization in that case, leaving less room for improvement by the predictive side decoding.

Fig. 5 Rate-distortion performance comparison of reconstructed images using the proposed predictive side decoding and the conventional side decoding: a "Lena," b "Couple," c "Baboon," d "Aerial," and e "Goldhill"

Xu, EURASIP Journal on Wireless Communications and Networking (2020) 2020:93

Conclusions
In this paper, we consider the design of a human-centered image coding scheme that can adapt to people's varying watching situations and meet their demand for real-time image display. Specifically, a novel predictive side decoding scheme for 3D-LVQ-based image coding has been proposed. In view of the strong intra- and inter-correlation in the index assignment of the 3D-LVQ mapping as well as the intra-subband correlation exhibited in the low-frequency wavelet subbands, we have developed an effective prediction approach for lost descriptions (sublattice points) to enhance side decoding performance. The prediction scheme adapts to the different subbands with varying intra-subband correlation characteristics. Experimental results have substantiated the effectiveness of the proposed predictive side decoding in reducing side distortions for both two-description-based and one-description-based cases. As compared to the conventional side decoding method, the proposed decoding scheme has shown up to 1.68 dB and 1.64 dB performance gains in the cases of 2-description side decoding and 1-description side decoding, respectively, in our experiments.