TV2++: a novel spatial-temporal total variation for super resolution with exponential-type norm

Recently, many super-resolution algorithms have been proposed to recover high-resolution images to improve visualization and help better analyze images. Among them, total variation regularization (TV) methods have been proven to have a good effect in retaining image edge information. However, these TV methods do not consider the temporal correlation between images. Our algorithm designs a new TV regularization (TV2++) to take advantage of the time dimension information of the images, further improving the utilization of useful information in the images. In addition, the union of global low rank regularization and TV regularization further enhances the image super-resolution recovery. And we extend the exponential-type penalty (ETP) function on singular values of a matrix to enhance low-rank matrix recovery. A novel image super-resolution algorithm based on the ETP norm and TV2++ regularization is proposed. And the alternating direction method of multipliers (ADMM) is applied to solve the optimization problems effectively. Numerous experimental results prove that the proposed algorithm is superior to other algorithms.

based methods [14][15][16][17]. Recently, learning-based methods have gained huge attentions because of its efficiency and generalization. The learning-based methods acquire the co-occurrence prior knowledge between the low-resolution and high-resolution image blocks through the learning process. From sparse representation [15,18], anchored neighborhood regression [16] to later transformed self-exemplars [17,19] and deep convolutional networks [20][21][22], by learning from examples based on huge databases consisting of low-resolution and high-resolution image pairs, these methods can recover details and enhance texture information. Extracting the high frequency information based on local features from the training images and then adding the high frequency detail for low-resolution image can guide a high-resolution image reconstruction [14]; however, it also results that these methods deeply rely on selected data and sometimes may bring some artifacts. Moreover, learning-based methods usually require a large number of training sets and a large amount of time to complete the training, which makes it impossible to meet the requirements in real time. Because the model relies too heavily on the data set, it does not have independence [17,19]. A lot of time and pretreatment also make this method inefficient so that it cannot achieve the desired effect. The edge calculation based on big data [23,24] can improve the processing efficiency of the system model to a certain extent. We propose to use ADMM to optimize the calculation of the model, which greatly improves the running speed and efficiency.
Conversely, the bulk of reconstructed based methods usually get high-resolution image from a sequence of low-resolution images by optimizing the cost function which have prior restriction [10]. The reconstruction based methods are to invert high-resolution images from the low-resolution image through the image degradation model, which is to solve the inverse problem of the image, and often requires prior knowledge of gradient [25], sparse [15,26,27], spatial-temporal feature [28,29] and so on. Iterative back projection method (IP) proposed by Pejman et al. [11] obtained the high-resolution image through iterative back projection for deviation of simulated low-resolution image and observed image. Non-local means (NLM) [12,24] is a SR method based on the non-local property. It explains that analogous pixels are not limited to a local area, such as long edges and texture structures. As seen in [13,30], a framework of maximum a posteriori (MAP) estimation is proposed to performs joint blur identification and high-resolution image reconstruction. And the algorithm addresses blind image super resolution by fusing multiple blurred low-resolution images to render a high-resolution image. In recent years, many popular super-resolution algorithms based on total variation regularization (TV) [10,[31][32][33][34] have emerged through reconstruction perspective. The TV function is usually defined as the integral of absolute value of image gradient, as shown in [31,35]; the total variation of the signal is used as a regularizing functional. Since the signal with too much and possibly false details has a high total variation, i.e., the integral of the absolute gradient of the signal is high. According to this principle, the total variation of the signal is reduced to closely match the original signal, removing unnecessary details while retaining important details such as edges. In the case of low signal-to-noise ratio, total variation denoising is also very effective in retaining edges while smoothing out noise in flat areas. This also leads to a total deterioration that will make the restored image too smooth, and in some more detailed images, the restored image will lose details. Therefore, it is the focus of research to keep more details while removing noise.Farsiu et al [33,36] proposed a bilateral total variation (BTV) regularization prior algorithm, which used the same smoothing coefficient to process adjacent pixels at the same position, and smoothed the images in all directions while retaining certain edge information. In [5,37], the authors formulated a new time dependent convolutional model for super resolution based on the TV regularization. And in [38], a spatially weighted TV image super-resolution algorithm is proposed, in which the spatial information distributed in different image regions is added to constrain the super-resolution process. In addition, Li et al. [39] further improved the above algorithm based on TV regularization. They proposed a new regularization term named steering kernel regression total variation (SKRTV), in which the local structural regularity properties can be fully exploited. Furthermore, in [32,40], a partial differential equation (PDE) model is proposed for SR based on a constrained variational model which uses the nonlocal total variation (NLTV) as a regularization term. As known from these algorithms, TV regularization has a good effect on retaining image edge information [36,41].
However, these TV-based regularization methods act only on the space dimension and does not take into account the dependence of the time dimension [42,43]. And there are many images with temporal similarities between frames, such as sequence images [44] and medical CT images [45,46]. In addition, for the time dimension in TV regularization, Li et al. [47] proposed that one can directly treat the time component as an additional space variable and form the corresponding seminorm. In this form, all dimensions are coupled, and the expected behavior is uniform in all directions.The time component is directly regarded as a spatial variable, which ignores the similarity in the time dimension and the difference between the temporal variable and the spatial variable. Without considering time and space separately, picture details and edge information will be ignored. Therefore, considering the differing behaviors of the spatial and temporal components, we decouple the spatial dimensions from the temporal one when constructing TV2++. Moreover, TV2++ could allow for more flexibility in penalizing. Admittedly, TV regularization is so susceptible to noise that often causes staircase effects. To weaken this flaw, we added a global low-rank constraint on the image, as Feng et al. [10] did. The global low-rank constraint in the model is to enhance global consistency and improve noise immunity. In addition, we introduce the exponential-type penalty (ETP) norm [48] to constrain the image with low-rank. Non-convex penalty functions such as ETP are extended on singular values of a matrix as surrogate functions of -norm to enhance low-rank matrix recovery more effectively, which has been proven in [49]. The ETP is characterized by a positive parameter, which establishes a connection with the 0 and 1 penalties.
In this paper, we introduce the ETP global low-rank regularization [48,50] and design a TV regularization on the space dimension and the time dimension (TV2++) to construct the basic framework of the algorithm. Overall, our contributions are threefold: • The algorithm, which designs TV2++ regularization based on spatio-temporal, utilizes image time dimension similarity and further strengthens image detail recovery. • A novel framework based on TV2++ regularization and ETP norm low-rank regularization is constructed; TV2++ regularization helps retain image edge information; ETP norm low-rank regularization does well in global consistency and improving noise immunity.
• We use an efficient alternating direction method of multipliers (ADMM) [51]-based approach for problem optimization.

Background
In this part, we first introduce a basic model in the TV methods, and then we introduce the exponential-type penalty function and IRNN algorithm for solving the nonconvex low-rank minimization problem.

Basic model
We first suggest the basic model in the TV methods. In this model, we take noise and blur into consideration. Then, we solve the SR problem through this model. The basic observation model is: where T means the image we observed and X is the HR image we expect to recover. D symbolizes the down-sampling operator and S is the blurring operator. At the same time, N represents the noise which influencing our observation results.
We want the HR image to be as perfect as possible.In mathematics, we can get the optimal solution in the basic model. So, we deform the above model to build a new cost function as follows: When we solve the optimization problem, then we can get the optimal X, which we can define as X 0 . The cost function consist of one penalizing term about LR image T and HR image X.
To better solve this ill-posed problem, we can add regularization terms.
where ρ(X) means the regularization term. This term usually depends on prior knowledge and we will illustrate this term in the following part.λ is the parament controlling the weigh of regularization terms in the whole model.

Nonconvex low-rank minimization
Variable selection problems are typically addressed under a penalized optimization framework. Nonconvex penalties such as the exponential-type penalty (ETP) [48,52] have been demonstrated to have the properties of sparsity practically and theoretically. The exponential-type penalty is characterized by a positive parameter, which establishes a connection with the and penalties. More specifically,the limits of ETP are the 0 and 1 penalties when this parameter approaches ∞ and 0 respectively. And it is easy to extend the exponential-type penalty functions on singular values of a matrix as surrogate functions of -norm to enhance low-rank matrix recovery. In addition, matrix completion algorithm [53] is based on the assumption that recovered matrix has low rank and then the missing entries can be filled in by minimizing the difference between the input matrix and the estimated one.
However, different from convex optimization, solving the nonconvex low-rank minimization problem is much more challenging than the nonconvex sparse minimization problem. Iteratively reweighted least squares (IRLS) algorithm [54] has been recently extended to handle the nonconvex Schatten-p norm penalty. Actually, it solves a relaxed smooth problem which may require many iterations to achieve a low-rank solution. It cannot solve the general nonsmooth problem.
Recently, Lu et al. [49] propose an iteratively reweighted nuclear norm (IRNN) algorithm to solve the nonconvex nonsmooth low-rank minimization problem. The IRNN algorithm takes advantage of the following properties of exponential-type penalty function: the exponential-type penalty function is concave and monotonically increasing on [ 0, ∞], and its gradient is a decreasing function. IRNN algorithm iteratively solves a weighted singular value thresholding (WSVT) problem. By setting the weight vector as the gradient of the concave penalty function, the WSVT problem has a closed form solution.
where w is the weight coefficient, σ is the singular value, and ∇f X k is the super gradient.

Our method
In this part, we will propose our own model, namely LRTV2++. As we know, rank is the important property, often calculated by the minimum of columns and rows containing non-zero elements. Rank is often thought about the richness of information contained in images. In regularization, if a matrix has low rank property, we can recover some losing columns or rows linearly expressed by some known rows and columns. This theory is widely used in matrix completion. We often represent the low rank function about matrix X from the certain norm, in this paper, we use ETP function to approximate L 0 norm.
where X is the number of dimensions, and α i is the weigh. Especially, α i satisfies two constraints: Gao et al. [17] proposed a non-convex continuous surrogate function called exponential type penalty (ETP), as follows: And its super gradient is defined as: As we know, TV function on the vertical horizontal is usually defined as: where is spatial domain, and ⊆ R 2 . And then, we consider temporal dependency and get the difference term of the time dimension ∂ t X. Its formula is as follows: Therefore, TV2++ can be expressed as: The first of Eq. (11)is the spatial regularization term, and the latter is the temporal regularization term. Then, adding the ETP low rank regularization and TV2++ regularization, the super-resolution reconstruction model can be further expressed as: In this method, we separate ∂ t X from ∂ x X and ∂ y X, separating spatial dimensions from temporal dimensions, considering the difference of the temporal dimension and spatial dimensions.

The optimization
To solve the optimization problem of the cost function Eq. (11), we use the alternating direction method of multipliers (ADMM) algorithm. ADMM is proved to be an efficient algorithm and a widely used optimization method for constrained problems when a cost function includes multiple variables. This algorithm solve one variable through other dual variables and iterate this variable to solve other variables. Repeat this process, then we can complete the optimization problem of our cost functions. We will show how to use ADMM to optimize our cost functions in the following contents.
where M and U are the auxiliary variables. Further derivation of Eq. (13), the augmented Lagrangian function can be obtained as follows:

Sub-problem 1: updating
equivalent to: Take the derivative of the above function : From Eq. (17), we can easily get the equation of X in TV2++ method:

Sub-problem 2: updating
equivalent to: It can be further transformed into: where, As we all know, a globally optimal solution to the above problem is given by the weighted singular value thresholding: where, Eq. (24) is the SVD of Y.

Sub-problem 3: update U
which can be solved efficiently via a soft shrinkage operator: where,

Experimental results and analysis
We take CT images of the HNSCC-3DCT-RT dataset as original images to perform the experiment. And the low-resolution images are all obtained by MATLAB simulation. There are four comparison algorithms used in the experiment: L1TV, L2TV, NLM [12], and LRTV [10]. In addition, we employed two important characters to estimate the recovery quality: peak signal-to-noise ratio (PSNR) and structure similarity index (SSIM). PSNR is widely used to evaluate quality of SR recovery and its unit is decibels (dB). The mathematical formula of PSNR is defined as: where MSE is the mean square error between original HR image and recovered HR image. Aside from PSNR, SSIM is another important character we used to evaluate the similarity between images. SSIM is usually considered to accord with the visual perception of human naked eyes. Its mathematical formula can be written as: where μ x is the average value of the original HR image x and μ y is the average value of the recovered y, σ x is the variance of x and σ x is the variance of y. σ xy is the covariance of original HR image and recovered image. c 1 and c 2 are the balance parameters. SSIM is ranged from 0 to 1. When the recovered image is exactly same to original HR image, SSIM equals 1.

Performance under noise environment
In this part, we add Gaussian noise to the images (4-HEADNECK 2.0 B30s-50686) in order to test the robustness of super-resolution reconstruction. Gaussian noise is a special kind of noise whose probability density function obeys normal distribution. We set mean of Gaussian noise to be 0 and its variance to be 0.001, 0.003, 0.005, 0.006, 0.007. Each algorithm is tested in the same experimental environment, and the experimental results are shown in curves charts. Figure 1 and Fig. 2 show the PSNR and SSIM trends for all algorithm results as the variance of Gaussian noise increases from 0.001 to 0.007, where sigma represents the variance of the Gaussian noise. In Fig. 1 and Fig. 2, we can see that the proposed method has overwhelmingly advantage compared with other algorithms. When sigma increased gradually, the advantage tends to emerge. We can find that the index data curve of the proposed algorithm is much better than the others, which means that the proposed algorithm has better image super-resolution recovery ability. Figure 3 shows recovered images of different methods and enlarged regions. Compared to original HR image, recovered images of L1TV, L2TV, and NLM appear to be blurred while the recovered image of the proposed method is clearer and the details are better restored. This is because we enhance the efficiency of low rank regular terms by replacing the tracking norm with ETP non-convex penalty function constraints. In addition, the TV2++ regular term utilizes the temporal dimensional similarity of the image to further enhance the detail recovery capability of the image.

Recovery performance comparison
In this part, we will use more medical images from our dataset and show results by using different methods. We mainly show two other CT images in different part of bodies to evaluate the recovery performance of our proposed methods. We first chose one 3D CT of 12-08-2009-RTHEADNECK Adult-25471 series. By using the above methods, we will put results and recovered image and close-up view of selected regions in each image in the following part for comparison. From the Table 1, we can see that the proposed method gets best PSNR and SSIM index values. The recovered images and details in selected loca- tion are shown in Fig. 4. We can find that the result of the proposed method is clearer than other methods. When we compare these results to original image, we also find the proposed method is more similar to original image, resulting in high SSIM.
We further chose HEADNECK CT images to test and verify experimental results in the subsequent part. Processed images are shown in Fig. 5. We also show specific data of their PSNR and SSIM in the Table 2. We also tested other images in our database and got similar results to what we show by testing two examples. In general, the proposed method gets highest SSIM and PSNR than all other comparison methods. In addition, under blur interference, our proposed method is clearer than other methods.By selecting a close-up view of the location, this method displays the details better than other methods.

Ablation analysis
We use adult images 4-HEADNECK 2.0 B30s-50686 as the test image for parameter optimization experiments. For the parameter of low-rank term, we set α 1 = α 2 = 1 2 , which means weigh controlling each matrix unfolded in each dimensions plays the same role. We estimate the error for each iteration by calculating X k − X k − 1 . When the above error is less than 1e − 5, we stop the optimization of the parameters. First, we set λ 2 to 0.0001. We experiment with λ 2 being set between 0.0001 and 0.1, respectively. Through running our observed Fig. 6, it can be seen that the value of λ 1 is near 0.001 when the PSNR reaches a peak. Continuing to increase the value of λ 1 , the PSNR shows a monotonous decreasing trend. As can be seen from Fig. 7, when λ 1 increases between intervals 0 and 0.1, the value of SSIM shows a tendency to decrease slightly. Taking into account the performance of the above indicators, we believe that the experimental result obtained is the best when the value of λ 1 is set to 0.009.
Then, we fixed the parameter λ 1 and performed a parameter optimization experiment on the parameter λ 2 . We experiment with λ 2 being set between 0.0001 and 0.1, respectively. Through running our program, we acquire the value of PSNR and SSIM when λ 2 is 0.0001, 0.0002, 0.0005, 0.0007, 0.01, 0.1, and 0.5. And the line charts are shown in Fig. 8 and Fig. 9.
As can be seen from Fig. 8, the line chart has both a rising interval and a falling interval, and when λ 2 is around 0.1, the PSNR value reaches a peak. For the SSIM in Fig. 9, it can be seen that as λ 2 increases, the value of SSIM continues to decrease. Taking into account the performance of the above indicators, we believe that the experimental result obtained is the best when the value of λ 2 is set to 0.1.

Discussion
In this paper, we propose a super-resolution reconstruction algorithm based on TV2++ regularization and ETP norm low-rank regularization and optimize it by ADMM method. TV2++ regularization is based on spatio-temporal and utilizes image time dimension similarity. Considering the differing behaviors of the spatial and temporal components, we decouple the spatial dimensions from the temporal one which also allows for more  On the other hand, ETP norm are extended on singular values of a matrix as surrogate functions of L 0 -norm to enhance low-rank matrix recovery more effectively. Then, the experimental results reveals that the proposed method has better capacity of resisting noise and blur. In addition, our model is more complex than NLM, L1TV, and L2TV, although we use the ADMM algorithm for optimization. But it still takes more time than NLM, L1TV, and L2TV. The reason may be that the program takes a lot of time to solve the TV problem.