Assume that the multimedia data comprise *N* video frames. A pixel of each video frame is denoted px(*i*, *j*), where *i* belongs to the interval [0, *X*] and *j* belongs to the interval [0, *Y*]. Here, *X* is the number of pixels of the video frame in the horizontal direction, and *Y* is the number of pixels in the vertical direction. TF(*i*, *j*) denotes the signal content of the multimedia data. The multimedia data content MC, based on the time domain and frequency domain, is given by formula (1).

$$ \mathrm{M}\mathrm{C}=N\frac{{\displaystyle {\sum}_{i=0}^X{\displaystyle {\sum}_{j=0}^Y{\left|\mathrm{T}\mathrm{F}\left(i+1,j+1\right)-\mathrm{T}\mathrm{F}\left(i,j\right)\right|}^2}}}{{\displaystyle {\prod}_{i=0}^X\mathrm{p}\mathrm{x}{\left(i,Y\right)}^{N\ast X}}} $$

(1)
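As a rough illustration of formula (1), the following Python sketch evaluates MC for one small frame. It is not the authors' implementation. It assumes that TF and px are available as nested lists indexed `[i][j]`, that *X* and *Y* are the largest valid indices, and that the double sum only covers positions whose diagonal neighbour (*i*+1, *j*+1) exists, because formula (1) does not state how the boundary is handled. The function names are hypothetical.

```python
def diagonal_variation(tf):
    """Sum of |TF(i+1, j+1) - TF(i, j)|^2 over positions with a diagonal neighbour.

    Assumption: the last row and column are skipped, since TF(X+1, Y+1)
    is not defined for a frame of this size.
    """
    rows, cols = len(tf), len(tf[0])
    return sum(
        abs(tf[i + 1][j + 1] - tf[i][j]) ** 2
        for i in range(rows - 1)
        for j in range(cols - 1)
    )


def multimedia_content(tf, px, n_frames):
    """MC = N * (diagonal variation of TF) / prod_i px(i, Y)^(N * X)."""
    x_max = len(px) - 1      # assumed meaning of X: last horizontal index
    y_max = len(px[0]) - 1   # assumed meaning of Y: last vertical index
    denominator = 1.0
    for i in range(x_max + 1):
        denominator *= px[i][y_max] ** (n_frames * x_max)
    if denominator == 0:
        raise ValueError("px(i, Y) must be non-zero for every i")
    return n_frames * diagonal_variation(tf) / denominator


if __name__ == "__main__":
    tf = [[0, 1], [2, 3]]   # toy signal content
    px = [[1, 2], [1, 2]]   # toy pixel values
    print(multimedia_content(tf, px, n_frames=2))
```

Because the denominator is a product of pixel values raised to the power *N*·*X*, it grows very quickly for realistic frame sizes, so a practical implementation would probably work with logarithms. That choice is outside what the text specifies.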

For multimedia data compression on the cloud platform, we consider the conversion between the time domain and the frequency domain. The compressed content MCC is given by formula (2).

$$ \left\{\begin{array}{l}\mathrm{M}\mathrm{C}\mathrm{C}={\displaystyle \underset{i=0}{\overset{N}{\int }}\mathrm{p}\mathrm{x}\left({t}_x,{f}_y\right)\mathrm{d}\mathrm{t}}\\ {}{t}_x=\sqrt{\frac{1}{X}{\displaystyle \sum_{i=0}^X{\displaystyle \sum_{j=0}^Y{\left|{t}_i-{t}_j\right|}^{\frac{X+Y}{N}}}}}\\ {}{f}_y=\frac{Y}{N}{\displaystyle \prod_{t=0}^X\sqrt{\left|{f}_t-{f}_{t+1}\right|}}\end{array}\right. $$

(2)

Here, *t*_{x} is the delay jitter and *f*_{y} is the frequency domain conversion. Both the frequency domain and the time domain strongly influence the efficiency and quality of multimedia compression on the cloud platform.
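The two auxiliary quantities in formula (2) can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not define the sequences behind the delay jitter and the frequency domain conversion, so `t` is assumed to be a list of time samples and `f` a list of frequency samples with at least *X* + 2 entries. The function names are hypothetical, and the integral for MCC is not evaluated here.

```python
import math


def delay_jitter(t, x_max, y_max, n_frames):
    """t_x = sqrt((1/X) * sum_i sum_j |t_i - t_j|^((X+Y)/N)).

    Assumption: t is a list of time samples covering indices 0..max(X, Y).
    """
    exponent = (x_max + y_max) / n_frames
    total = sum(
        abs(t[i] - t[j]) ** exponent
        for i in range(x_max + 1)
        for j in range(y_max + 1)
    )
    return math.sqrt(total / x_max)


def frequency_conversion(f, x_max, y_max, n_frames):
    """f_y = (Y/N) * prod_{k=0..X} sqrt(|f_k - f_{k+1}|).

    Assumption: f is a list of frequency samples with at least X + 2 entries.
    """
    product = 1.0
    for k in range(x_max + 1):
        product *= math.sqrt(abs(f[k] - f[k + 1]))
    return (y_max / n_frames) * product


if __name__ == "__main__":
    print(delay_jitter([0.0, 1.0, 3.0], x_max=2, y_max=2, n_frames=2))
    print(frequency_conversion([1, 2, 4, 8], x_max=2, y_max=2, n_frames=2))
```

Note that the product in the frequency term becomes zero as soon as two consecutive frequency samples are equal, which follows directly from the formula as written.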

Let the encoding rate be *K* pixels per second. Cloud encoding of the multimedia data yields a space gain, as shown in formula (3).

$$ \left\{\begin{array}{l}{S}_G=\left|TF-{\displaystyle \sum_{i=0,j=0}^{X,Y}\mathrm{p}\mathrm{x}\left(i,j\right)}\right|\frac{S_{\mathrm{MC}}}{S_{\mathrm{MC}\mathrm{C}}}\\ {}{S}_{\mathrm{MC}}={\displaystyle \sum_{i=0}^X{\displaystyle \sum_{j=0}^Y{\left|\mathrm{T}\mathrm{F}\left(i+1,j+1\right)-\mathrm{T}\mathrm{F}\left(i,j\right)\right|}^2}}\\ {}{S}_{\mathrm{MC}\mathrm{C}}={\displaystyle \sum_{j=0}^Y{\displaystyle \sum_{i=j+1}^X{\displaystyle \underset{i=0}{\overset{N}{\int }}\mathrm{p}\mathrm{x}\left({t}_x,{f}_y\right)}}}\end{array}\right. $$

(3)

Here, *S*_{G} is the space gain. *S*_{MC} is the space occupancy of the cloud platform before compression of the multimedia data. *S*_{MCC} is the space occupancy of the cloud platform after compression of the multimedia data.
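The first line of formula (3) can be illustrated with a short Python sketch. It assumes that the term TF written without arguments is a single scalar for the total signal content, and that the two occupancies are already known as numbers. Both are assumptions, since the text does not define either. The function name `space_gain` is hypothetical.

```python
def space_gain(tf_total, px, s_mc, s_mcc):
    """S_G = |TF - sum_{i,j} px(i, j)| * S_MC / S_MCC.

    Assumptions: tf_total is a scalar total signal content; s_mc and s_mcc
    are the space occupancies before and after compression.
    """
    if s_mcc == 0:
        raise ValueError("S_MCC must be non-zero")
    pixel_sum = sum(sum(row) for row in px)
    return abs(tf_total - pixel_sum) * s_mc / s_mcc


if __name__ == "__main__":
    px = [[1, 2], [3, 4]]  # toy pixel values, sum = 10
    print(space_gain(tf_total=12, px=px, s_mc=9, s_mcc=3))
```

In this reading, the gain scales with the ratio of occupancy before and after compression, weighted by how far the signal content deviates from the raw pixel sum.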

Thus, the distortion *D*_{t} is caused by the time domain conversion, and the distortion *D*_{f} is caused by the frequency domain conversion. The delay jitter *D*_{IS} caused by the multimedia data compression is the ratio of these two distortions, as shown in formula (4).

$$ {D}_{\mathrm{IS}}=\frac{D_t}{D_f} $$

(4)

The distortion ratio reflects the influence of the multimedia data content on the correlation between the time domain and the frequency domain. During cloud compression, the content of the multimedia data is optimized by analyzing the changes in *D*_{IS}, *D*_{t}, and *D*_{f}. The distortion of each video frame in the multimedia data can therefore be calculated by formula (5).

$$ \left\{\begin{array}{l}{D}_F\left(x,y\right)=\frac{\mathrm{TF}\left({\mathrm{VF}}_{x,y}\right)}{C_{cl}{\delta}^{\sqrt{N}}{\displaystyle {\prod}_{i=0}^{X+Y}{\mathrm{px}}_i}}\\ {}{\mathrm{VF}}_{x,y}={\displaystyle \prod_{i=x}^X{\displaystyle \sum_{j=y}^Y\mathrm{p}\mathrm{x}\left(i,j\right)}}\end{array}\right. $$

(5)

where *D*_{F}(*x*, *y*) denotes the distortion value of the multimedia video frame, VF_{x, y} expresses the multimedia content of these video frames, *δ* is the variance of *D*_{IS} over the time and frequency domains, and *C*_{cl} denotes the cloud compression ratio.
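Formulas (4) and (5) can be combined in a small Python sketch. It is a hedged illustration rather than the authors' method, and it rests on several assumptions that the text leaves open: *D*_{t} and *D*_{f} are given per frame as lists, *δ* is taken as the population variance of the resulting *D*_{IS} values, TF is passed in as a callable applied to the scalar VF_{x, y}, and the singly indexed px_{i} refers to the first *X* + *Y* + 1 entries of the flattened frame. All function names are hypothetical.

```python
import math
import statistics


def distortion_ratio(d_t, d_f):
    """Formula (4) per frame: D_IS = D_t / D_f."""
    return [t / f for t, f in zip(d_t, d_f)]


def frame_content(px, x, y):
    """VF_{x,y} = prod_{i=x..X} sum_{j=y..Y} px(i, j)."""
    vf = 1.0
    for i in range(x, len(px)):
        vf *= sum(px[i][y:])
    return vf


def frame_distortion(px, x, y, tf, c_cl, delta, n_frames):
    """D_F(x, y) = TF(VF_{x,y}) / (C_cl * delta^sqrt(N) * prod_i px_i)."""
    x_max, y_max = len(px) - 1, len(px[0]) - 1
    flat = [value for row in px for value in row]
    # Assumption: px_i indexes the flattened frame, for i = 0..X+Y.
    pixel_product = math.prod(flat[: x_max + y_max + 1])
    denominator = c_cl * delta ** math.sqrt(n_frames) * pixel_product
    if denominator == 0:
        raise ValueError("denominator of D_F is zero")
    return tf(frame_content(px, x, y)) / denominator


if __name__ == "__main__":
    d_is = distortion_ratio([2.0, 6.0], [2.0, 2.0])
    delta = statistics.pvariance(d_is)  # assumed definition of delta
    px = [[1, 2], [3, 4]]
    # Identity is used as a placeholder for the undefined one-argument TF.
    print(frame_distortion(px, 0, 0, lambda v: v, c_cl=0.5, delta=delta, n_frames=4))
```

The identity function stands in for TF only to keep the example runnable. Any other mapping from frame content to signal content could be passed instead.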

Combining Eqs. (1) to (5), the multimedia compressed content *C*_{MC} is given by Eq. (6); it depends on the conversion jitter of the cloud platform multimedia data in the time domain and frequency domain.

$$ \left\{\begin{array}{l}{C}_{\mathrm{MC}}=\frac{{\displaystyle \sum_{j=0}^{Y}\mathrm{TF}\left(X,j\right)}{\displaystyle \int_{0}^{+\infty}\mathrm{px}\left(N,y\right)\,\mathrm{d}y}}{{\displaystyle \delta^{2}\sum_{i=0}^{X}{D}_F\left(x,Y\right)\sum_{j=0}^{Y}\mathrm{TF}\left(X,y\right)}}\\ {}\mathrm{px}\left(N,Y\right)\ge {\displaystyle \prod_{i=x}^{X}\sum_{j=y}^{Y}\mathrm{px}\left(i,j\right)}\end{array}\right. $$

(6)

In cloud compression, a transition distortion gain in the frequency domain can be obtained from three different frequency directions. Fig. 1 shows how the three frequency directions are selected. The process of quantizing and reconstructing the residual image samples in content-centric multimedia data compression is equivalent to iterating over the object pixels of the video frames. Video frame reconstruction based on pixel expansion can assess the correlation of subsequent pixels, which provides reference pixels for cloud compression. The multimedia data of adjacent clouds are strongly correlated. Therefore, combining the video frames of the multimedia stream with the frame reconstruction pixels should yield better multimedia compression performance.

When compressing content-centric multimedia data, the expanded pixels serve as the reference pixels for compression. The filtering process can compensate for the video frame pixel distortion caused by compression. The fuzzy edge extension model of the video frame can reduce the space complexity of cloud compression. In cloud-compressed video frames, cooperative filtering by the cloud equipment can denoise the multimedia stream and thereby obtain more cloud compression gain. For the three frequency directions in Fig. 1, we select, according to the content requirements, the best cloud platform resources as the reference pixels for subsequent pixels in order to increase the cloud compression weight. The workflow is shown in Fig. 2.