Infrared and visible image fusion based on nonlinear enhancement and NSST decomposition

In multi-scale geometric analysis (MGA)-based fusion methods for infrared and visible images, adopting the same representation for the two types of images makes the thermal radiation target in the fused image inconspicuous, so that it can hardly be distinguished from the background. To solve this problem, a novel fusion algorithm based on nonlinear enhancement and non-subsampled shearlet transform (NSST) decomposition is proposed. Firstly, NSST is used to decompose the two source images into low- and high-frequency sub-bands. Then, the wavelet transform (WT) is used to decompose the high-frequency sub-bands into approximate sub-bands and directional detail sub-bands. The "average" fusion rule is applied to the approximate sub-bands, and the "max-absolute" fusion rule to the directional detail sub-bands; the inverse WT then reconstructs the fused high-frequency sub-bands. To highlight the thermal radiation target, we construct a nonlinear transform function to determine the fusion weight of the low-frequency sub-bands, whose parameters can be adjusted to meet different fusion requirements. Finally, the inverse NSST is used to reconstruct the fused image. The experimental results show that the proposed method can simultaneously enhance the thermal target in infrared images and preserve the texture details in visible images, and that it is competitive with or even superior to state-of-the-art fusion methods in terms of both visual and quantitative evaluation.

information, while the target is inconspicuous and easily affected by smoke, bad weather conditions, and other factors. Therefore, fusing the two types of images can compensate for the limited imaging capability of the individual infrared and visible sensors [11]. The final fused image possesses clearer scene information as well as better target characteristics [12].
There are seven main categories of fusion methods: multi-scale geometric analysis (MGA)-based, sparse representation-based [13][14][15], neural network-based [16,17], subspace-based [18], saliency-based [19], hybrid models [20], and other methods. Among them, MGA-based methods are the most popular. They assume that an image can be represented by different coefficients at different scales. These methods decompose the source images into low- and high-frequency sub-bands, combine the corresponding sub-bands with specific fusion rules, and reconstruct the fused image with the inverse MGA transform [21]. The key to MGA-based methods is the choice of MGA transform, which determines how much useful information can be extracted from the source images and integrated into the fused image. Popular transforms for decomposition and reconstruction include the wavelet transform (WT) [22], wedgelet transform [23], curvelet transform [24,25], contourlet transform [26], NSCT [27,28], shearlet transform (ST) [29], and non-subsampled shearlet transform (NSST) [30]. Owing to its shift invariance, high sensitivity, strong directivity, fast operation, and multi-directional processing, NSST has been widely used in image fusion [31]. Many studies have shown that NSST is more consistent with human visual characteristics than other MGA transforms and yields fused images with better visual effects [32]. However, it may be inappropriate for infrared and visible image fusion. In infrared images, the target information is salient and easy to detect and recognize, whereas in visible images the detailed information is mainly carried by gradients. Therefore, adopting the same representation for the two types of images renders the thermal radiation target inconspicuous, so that it can hardly be distinguished from the background.
In MGA-based fusion methods, it is difficult to simultaneously preserve the thermal radiation information of infrared images and the appearance information of visible images.
To overcome this problem, we propose a new fusion algorithm based on nonlinear enhancement and NSST decomposition for infrared and visible images. Firstly, NSST is used to decompose the two source images into low- and high-frequency sub-bands. Then, the high-frequency sub-bands are fused with a WT-based method. To highlight the target, we construct a nonlinear transform function to determine the fusion weight of the low-frequency sub-bands, whose parameters can be adjusted to meet different fusion requirements. Finally, the inverse NSST is used to reconstruct the fused image. The experiments demonstrate that the proposed method not only enhances the thermal target in infrared images but also preserves the texture details in visible images, and that it is competitive with or even superior to other methods in terms of both visual and quantitative evaluation.
The rest of this paper is organized as follows. The theoretical basis and implementation steps of NSST are reviewed in Section 2. The details of the proposed image fusion method are presented in Section 3. Experimental results and comparisons are given in Section 4. The main conclusions are drawn in Section 5.

Related works
NSST is one of the most suitable multi-scale geometric analysis tools for fusion applications. It provides a sparse image representation that captures edges and fine details, and the inverse NSST introduces no artifacts or noise. In addition, the shearlet coefficients are well localized in tight frames over various locations and scales with anisotropic orientations. These properties enable a successful fusion process, producing higher image quality and clearer image details and edges [33].

Basic principle of NSST
The shearlet construction is based on non-subsampled pyramid filter banks, which provide the multi-scale decomposition, and directional filtering generated by a shear matrix, which provides multi-directional localization. When the dimension n = 2, the affine system with composite dilations is [10]

A_{AB}(\psi) = \left\{ \psi_{j,l,k}(x) = |\det A|^{j/2}\, \psi\!\left(B^{l} A^{j} x - k\right) : j, l \in \mathbb{Z},\ k \in \mathbb{Z}^2 \right\}, (1)

where ψ ∈ L²(ℝ²), and A and B are 2 × 2 invertible matrices with |det B| = 1. If A_{AB}(ψ) forms a Parseval tight frame for L²(ℝ²), the elements of the system are called composite wavelets; for any f ∈ L²(ℝ²),

\sum_{j,l,k} \left| \langle f, \psi_{j,l,k} \rangle \right|^2 = \| f \|^2. (2)

The matrices A^j and B^l are associated with scale and geometric transformations (such as rotation and shear operations), respectively. With

A_a = \begin{pmatrix} a & 0 \\ 0 & \sqrt{a} \end{pmatrix}, \qquad B_s = \begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix},

the system can be written as

\psi_{a,s,t}(x) = a^{-3/4}\, \psi\!\left( A_a^{-1} B_s^{-1} (x - t) \right). (3)

Equation (3) is a shearlet system, and ψ_{ast}(x) is a shearlet. Figure 1 shows the tiling of the frequency plane induced by the shearlets and the frequency supports of the shearlet elements. As seen in Fig. 1, each element ψ_{j,l,k}(x) is supported on a pair of trapezoids of size about 2^{2j} × 2^{j}, aligned along a line of slope l2^{−j}.

Implementation steps
The NSST can be realized in two steps: (1) Multi-scale decomposition. The non-subsampled pyramid (NSP) filter bank decomposes each source image into a set of high- and low-frequency sub-images to attain a multi-resolution decomposition. Firstly, the source image is decomposed into low- and high-frequency coefficients with the NSP. Then, the NSP decomposition at each layer iterates on the low-frequency component obtained by the upper-layer decomposition to capture the singular points. Without the down-sampling operation, the sub-band images have the same size as the source image. Finally, for a j-level decomposition, we obtain one low-pass image and j band-pass images.
(2) Directional localization. The shearlet filter bank decomposes these high-frequency sub-images to attain a multi-direction decomposition. Firstly, pseudo-polar coordinates are mapped to Cartesian coordinates. Then, the "Meyer" wavelet is used to construct the window function and generate the shearlet filters. Finally, each sub-band image is convolved with the "Meyer" window function to obtain the directional sub-band images.
The two-level decomposition structure is shown in Fig. 2. The NSP decomposes the source image f into a low-pass filtered image f_a^1 and a high-pass filtered image f_d^1. In each iteration, the NSP decomposes the low-pass filtered image from the upper layer until the specified number of decomposition layers is reached. Finally, one low-frequency image and a series of high-frequency images are obtained.
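The iterative NSP stage described above can be sketched in a few lines. This is a simplified illustration rather than the paper's implementation: a Gaussian low-pass whose width doubles at each level (in the spirit of the à trous scheme) stands in for the pyramid filters, and `nsp_decompose` is a hypothetical helper name.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def nsp_decompose(image, levels=3):
    """Nonsubsampled pyramid sketch: iteratively low-pass filter the
    previous approximation; each band-pass image is the difference.
    No down-sampling, so every sub-band keeps the source image size."""
    approx = image.astype(float)
    bandpass = []
    for j in range(levels):
        # widen the filter at each level instead of down-sampling
        low = gaussian_filter(approx, sigma=2 ** j)
        bandpass.append(approx - low)   # high-frequency detail at level j
        approx = low                    # iterate on the low-frequency part
    return approx, bandpass             # one low-pass + j band-pass images

img = np.random.rand(64, 64)
low, bands = nsp_decompose(img, levels=3)
# the additive split reconstructs the source exactly
assert np.allclose(low + sum(bands), img)
```

Because no sub-sampling is performed, the low-pass image plus all band-pass images sum back to the source, mirroring the shift-invariant reconstruction property of the NSP.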

Proposed method
In this section, we introduce the process of the proposed method and discuss the parameter settings. The low- and high-frequency components obtained from the NSST decomposition represent different feature information: the low-frequency components carry the approximate features of the source image, and the high-frequency components carry the detailed features. The approximate parts of images provide more visually significant information and contrast information, while the detailed parts provide more contour and edge information. Therefore, different fusion rules should be used for the low- and high-frequency components. According to the stage of the image data to be fused and the degree of information extraction in the fusion system, image fusion is divided into three levels: pixel level, feature level, and decision level. The proposed method operates at the pixel level. The specific fusion scheme is shown in Fig. 3. The steps of the proposed method are as follows:
Step 1: Decompose the infrared and visible images with NSST into low- and high-frequency coefficients.
Step 2: Fuse the low-frequency coefficients based on the nonlinear enhancement algorithm.
Step 3: Fuse the high-frequency coefficients with the WT-based method.
Step 4: Apply the inverse NSST to obtain the fused image.
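The four steps can be sketched end to end as a toy illustration. Two loudly stated assumptions: a single Gaussian low-pass split replaces the NSST decomposition and reconstruction, and the nonlinear weight is taken to be a logistic S-function, which the S-shaped curve in Fig. 4 suggests but does not fully specify; a plain max-absolute rule also replaces the WT-based high-frequency fusion.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fuse(ir, vis, lam=10.0, sigma=2.0):
    """Toy end-to-end sketch of the four steps (NOT the NSST pipeline)."""
    # Step 1 stand-in: additive low/high split via Gaussian low-pass
    ir_low = gaussian_filter(ir, sigma)
    vis_low = gaussian_filter(vis, sigma)
    ir_high, vis_high = ir - ir_low, vis - vis_low
    # Step 2: nonlinear weight driven by the infrared low-frequency band
    r = np.abs(ir_low)
    c_ir = 1.0 / (1.0 + np.exp(-lam * (r - r.mean())))  # assumed S-function
    low_f = c_ir * ir_low + (1.0 - c_ir) * vis_low
    # Step 3 stand-in: max-absolute rule on the high-frequency parts
    high_f = np.where(np.abs(ir_high) >= np.abs(vis_high), ir_high, vis_high)
    # Step 4 stand-in: inverse of the additive split
    return low_f + high_f

ir = np.random.rand(64, 64)
vis = np.random.rand(64, 64)
fused = fuse(ir, vis)
```

The skeleton shows the data flow only; the actual method relies on the NSST's multi-directional sub-bands, which this additive split cannot reproduce.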

Low-frequency sub-band fusion
The low-frequency components reflect the contour information of the image and contain most of the energy of the original image [34]. The weighted average method is commonly used to fuse low-frequency sub-bands; however, an unreasonable fusion weight causes loss of source-image information or poor image quality. To address these problems, we introduce a fusion strategy that constructs a nonlinear transform function to determine the fusion weight of the low-frequency sub-bands.
In infrared images, the target information is significant. Due to the large gray values, the target is easy to detect and recognize. In order to highlight the target in the fused image, we extract the coefficients in the low-frequency component of the infrared image to determine the low-frequency fusion weight.
Each coefficient of the low-frequency component is taken in absolute value:

R = \left| LFC_{IR} \right|, (4)

where LFC_IR represents the low-frequency sub-band of the decomposed infrared image and R represents the salient infrared characteristic distribution; R_mean denotes the average of LFC_IR. When R is larger than R_mean, the pixel is considered a bright point; when R is smaller than R_mean, a dark point. The bright points are regarded as targets, while the dark points are regarded as background. To highlight the target, a nonlinear transform function, Eq. (5), is introduced to control the degree of enhancement, where the parameter λ belongs to (0, ∞). The low-frequency fusion weights are then given by Eqs. (6) and (7), where C_IR is the fusion weight of the infrared image and C_VIS is that of the visible image; both belong to [0, 1].
As shown in Eqs. 5-7, the parameter λ directly affects the fusion weight of the infrared image; we can therefore adjust λ to control the proportion of infrared features in the fused image. In particular, the larger the value of C_IR, the more prominent the target. To strengthen the thermal radiation target, C_IR should be relatively large.
The final low-frequency sub-band fusion result is obtained as

LFC_F = C_{IR} \cdot LFC_{IR} + C_{VIS} \cdot LFC_{VIS}, (8)

where LFC_F represents the low-frequency component of the fused image and LFC_VIS represents the low-frequency component decomposed from the visible image.
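The low-frequency rule can be illustrated with a small numeric sketch. The logistic form of the S-function below is an assumption inferred from the S-shaped curve and the steepening effect of λ reported later in this section; the paper's exact function may differ.

```python
import numpy as np

def low_freq_fuse(lfc_ir, lfc_vis, lam=10.0):
    # R is the absolute infrared low-frequency coefficient
    r = np.abs(lfc_ir)
    r_mean = r.mean()
    # assumed logistic S-function: steeper around r_mean as lam grows,
    # pushing bright (target) points toward weight 1, dark points toward 0
    c_ir = 1.0 / (1.0 + np.exp(-lam * (r - r_mean)))
    c_vis = 1.0 - c_ir                      # both weights lie in [0, 1]
    return c_ir * lfc_ir + c_vis * lfc_vis  # weighted low-frequency fusion

lfc_ir = np.array([[0.9, 0.1], [0.8, 0.2]])   # bright points: thermal target
lfc_vis = np.array([[0.3, 0.6], [0.4, 0.5]])
fused = low_freq_fuse(lfc_ir, lfc_vis, lam=10.0)
# bright infrared points dominate the fused result; dark ones defer to visible
```

With λ = 10, the weight for the 0.9 coefficient is close to 1 (the target survives), while the weight for the 0.1 coefficient is close to 0 (the visible background wins).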

High-frequency sub-band fusion
High-frequency components reflect detailed information, such as the edges and contours of the source image. To retain more of this information, we use a WT-based method to fuse the high-frequency sub-bands of the infrared and visible images. Firstly, the WT decomposes each high-frequency sub-band into an approximate sub-band (LFC_IR and LFC_VIS) and directional detail sub-bands (HFC_IR and HFC_VIS). Here, the Haar wavelet is selected as the WT basis, and the number of decomposition layers is set to 1. Then, the "average" fusion rule is applied to the approximate sub-bands:

LFC_F = \frac{LFC_{IR} + LFC_{VIS}}{2}, (9)

and the "max-absolute" fusion rule to the directional detail sub-bands:

HFC_F = \begin{cases} HFC_{IR}, & \left| HFC_{IR} \right| \ge \left| HFC_{VIS} \right| \\ HFC_{VIS}, & \text{otherwise,} \end{cases} (10)

where LFC_F and HFC_F represent the fused approximate and directional detail sub-bands of the high-frequency sub-band images. Finally, the inverse WT is applied to LFC_F and HFC_F to obtain the high-frequency sub-bands of the fused image.
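The WT-based high-frequency fusion can be sketched with a one-level orthonormal Haar transform written directly in NumPy, matching the Haar basis and single decomposition layer used here; `haar_dwt2`/`haar_idwt2`/`fuse_high` are hypothetical helper names, not the paper's code.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2-D Haar transform (orthonormal), for even-sized arrays."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return ((a + b + c + d) / 2,            # approximate sub-band
            ((a + b - c - d) / 2,           # horizontal detail
             (a - b + c - d) / 2,           # vertical detail
             (a - b - c + d) / 2))          # diagonal detail

def haar_idwt2(approx, details):
    """Inverse of haar_dwt2 (the 4x4 transform matrix is its own inverse)."""
    h, v, dd = details
    out = np.empty((2 * approx.shape[0], 2 * approx.shape[1]))
    out[0::2, 0::2] = (approx + h + v + dd) / 2
    out[0::2, 1::2] = (approx + h - v - dd) / 2
    out[1::2, 0::2] = (approx - h + v - dd) / 2
    out[1::2, 1::2] = (approx - h - v + dd) / 2
    return out

def fuse_high(hf_ir, hf_vis):
    """'average' rule for the approximate sub-band, 'max-absolute' rule
    for each directional detail sub-band, then inverse WT (Eqs. 9-10)."""
    a_ir, d_ir = haar_dwt2(hf_ir)
    a_vis, d_vis = haar_dwt2(hf_vis)
    a_f = (a_ir + a_vis) / 2
    d_f = tuple(np.where(np.abs(di) >= np.abs(dv), di, dv)
                for di, dv in zip(d_ir, d_vis))
    return haar_idwt2(a_f, d_f)

hf_ir = np.random.rand(8, 8)
hf_vis = np.random.rand(8, 8)
fused = fuse_high(hf_ir, hf_vis)
# sanity check: the transform pair reconstructs exactly
assert np.allclose(haar_idwt2(*haar_dwt2(hf_ir)), hf_ir)
```

The averaging rule stabilizes the approximate band, while the max-absolute rule keeps whichever source has the stronger directional detail at each position.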

Analysis of parameter
In the nonlinear enhancement method, one main parameter influences the enhancement performance: λ. In this section, we draw the curve of the enhancement weight C_IR under different values of λ, shown in Fig. 4. The intensity of a target pixel in the fused image is determined by the value of C_IR: the larger the value, the more evident the target. As shown in Fig. 4, the curve of C_IR against the abscissa R (the gray level of the pixel) is S-shaped, which shows that target pixels obtain larger enhancement than background pixels. Moreover, the curve of C_IR becomes steeper as λ increases, so it is convenient to adjust λ to obtain different fusion results. Figure 5 shows the fused images for λ = 5, 10, 30, 50, 100, and 200. As seen in Fig. 5, the infrared pixel intensities are increasingly strengthened as λ grows; however, when λ becomes too large, distortion appears in the fused image. The parameter λ should therefore be only moderately large to meet different fusion requirements. In this paper, λ is set to 10. The proposed algorithm is summarized in Table 1.
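The steepening effect of λ can be checked numerically. The logistic form of the weight below is an assumption inferred from the S-shaped curve in Fig. 4, not the paper's exact function.

```python
import math

def s_weight(r, r_mean, lam):
    # assumed logistic S-function for the fusion weight C_IR
    return 1.0 / (1.0 + math.exp(-lam * (r - r_mean)))

# a pixel slightly brighter than the mean gray level
r, r_mean = 0.55, 0.5
for lam in (5, 10, 30, 50, 100, 200):
    print(f"lambda={lam:3d}  C_IR={s_weight(r, r_mean, lam):.3f}")
# As lambda grows, C_IR for the same pixel approaches 1: the curve
# steepens toward a step function, consistent with the stronger
# enhancement and eventual distortion observed for very large lambda.
```

A moderate λ (such as the value 10 used in the paper) keeps the transition around R_mean smooth enough to avoid saturating the target region.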

Experimental scheme
To evaluate the performance of the proposed algorithm, two groups of experiments were carried out. Firstly, we compare the proposed method with six MGA-based methods; then, we compare it with five other advanced methods; finally, qualitative and quantitative analyses of the experimental results are given. The infrared and visible images to be fused are collected from the TNO Image Fusion Dataset. All experiments are performed in MATLAB on a computer with a 2.6 GHz Intel Core CPU and 4 GB of memory.

Subjective evaluation
Subjective evaluation methods assess the quality of the fused image according to the evaluator's own experience and perception. To some extent, this is a simple, direct, fast, and convenient approach; however, its low efficiency and poor real-time performance limit its practical applications. Table 2 shows the commonly used subjective evaluation criteria.

Objective evaluation
According to the subject of comparison, the objective evaluation indicators of image fusion quality fall into three categories: the characteristics of the fused image itself, the relationship between the fused image and a standard reference image, and the relationship between the fused image and the source images [10]. We use A, B, and F to denote the infrared, visible, and fused images, respectively, and R to denote the ideal reference image. The five objective evaluation parameters used here are as follows.

(1) Entropy (E)

E directly measures the richness of image information; the larger the E value, the better the fusion effect. It is calculated as shown in Eq. (11):

E = -\sum_{i=0}^{L-1} p_i \log_2 p_i, (11)

where L is the total number of gray levels of the image and p_i is the probability of gray value i in the image.

Table 1 The proposed fusion algorithm
1. Decompose the infrared and visible images with NSST into low- and high-frequency components.
2. Set the parameter λ to 10; compute the fusion weights C_IR and C_VIS by the nonlinear function S(λ) (Eqs. 5-8).
3. Fuse the low-frequency components with C_IR and C_VIS to obtain the fused low-frequency component LFC_F.
4. Fuse the high-frequency components using the WT-based method with the "average" and "max-absolute" rules to obtain the fused high-frequency component HFC_F.
5. Repeat step 4 until all corresponding components are fused.
6. Obtain the final fused image F by the inverse NSST.

(2) Average gradient (AG)

AG reflects the micro-detail contrast and texture variation in the image; the larger the AG value, the more gradient information the fused image contains. It is calculated as shown in Eq. (12):

AG = \frac{1}{(M-1)(N-1)} \sum_{x=1}^{M-1} \sum_{y=1}^{N-1} \sqrt{\frac{\Delta F_x^2 + \Delta F_y^2}{2}}, (12)

where ΔF_x is the difference of the fused image F in the x direction and ΔF_y is the difference in the y direction.

(3) Standard deviation (SD)

SD reflects the distribution of pixel gray values and the contrast of the fused image. It is defined as

SD = \sqrt{\frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left( F(x, y) - \mu \right)^2}, (13)

where μ is the mean gray value of F.

(4) Spatial frequency (SF)

SF reflects the overall activity of the image in the spatial domain; the larger the SF, the better the fusion effect. It is defined through the row frequency RF and the column frequency CF, as shown in Eq. (16):

RF = \sqrt{\frac{1}{MN} \sum_{x=1}^{M} \sum_{y=2}^{N} \left( F(x, y) - F(x, y-1) \right)^2}, (14)

CF = \sqrt{\frac{1}{MN} \sum_{x=2}^{M} \sum_{y=1}^{N} \left( F(x, y) - F(x-1, y) \right)^2}, (15)

SF = \sqrt{RF^2 + CF^2}, (16)
where RF and CF are the row and column frequencies of the image, respectively.
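As a sketch, the first four metrics can be computed as follows; the exact normalizations of AG and SF vary slightly across papers, so treat the constants here as assumptions.

```python
import numpy as np

def entropy(img, levels=256):
    """E: Shannon entropy of the gray-level histogram (Eq. 11)."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]                     # 0 * log(0) terms contribute nothing
    return float(-np.sum(p * np.log2(p)))

def avg_gradient(img):
    """AG: mean magnitude of the x/y differences (Eq. 12)."""
    f = img.astype(float)
    dx = np.diff(f, axis=1)[:-1, :]  # differences along x, cropped to match
    dy = np.diff(f, axis=0)[:, :-1]  # differences along y, cropped to match
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2)))

def spatial_frequency(img):
    """SF: sqrt(RF^2 + CF^2) from row and column frequencies (Eq. 16)."""
    f = img.astype(float)
    rf = np.sqrt(np.mean(np.diff(f, axis=1) ** 2))  # row frequency
    cf = np.sqrt(np.mean(np.diff(f, axis=0) ** 2))  # column frequency
    return float(np.sqrt(rf ** 2 + cf ** 2))

img = (np.random.rand(32, 32) * 255).astype(np.uint8)
print(entropy(img), avg_gradient(img), np.std(img), spatial_frequency(img))
```

SD is simply the standard deviation of the pixel values (Eq. 13), here taken straight from `np.std`; all four scores grow with image richness, consistent with the larger-is-better reading used in the evaluation.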

(5) Edge information retention (Q AB/F )
Q^{AB/F} measures the amount of edge information transferred from the source images to the fused image. It is defined as

Q^{AB/F} = \frac{\sum_{x=1}^{M} \sum_{y=1}^{N} \left( Q^{AF}(x,y)\, w^{A}(x,y) + Q^{BF}(x,y)\, w^{B}(x,y) \right)}{\sum_{x=1}^{M} \sum_{y=1}^{N} \left( w^{A}(x,y) + w^{B}(x,y) \right)}, (17)

where w^A and w^B denote the weights of the importance of the infrared and visible images to the fused image, and Q^{AF} and Q^{BF} are edge preservation values calculated from the edges. A large Q^{AB/F} means that considerable edge information is transferred to the fused image; for a perfect fusion result, Q^{AB/F} = 1.
The key to MGA-based fusion schemes is the selection of the transform. WT- and CURV-based methods suffer from block artifacts, reduce image contrast, and cannot capture abundant directional information. The NSCT-based method captures the geometry of image edges well, but the number of directions at each level is fixed. In NSST-based methods, the number of directions can be set arbitrarily, so more detailed information can be obtained; however, more directions mean longer running time. We replaced LP with NSST in TEMST as a comparative experiment.
In the proposed method, the pyramid filter for NSST is set to "maxflat," the NSST decomposition level is set to 3, and the numbers of directions are set to {4,4,4}. The high-frequency sub-bands are decomposed into 1 level by the WT (with the Haar basis). The results are shown in Fig. 6. The first two rows in Fig. 6 show the infrared and visible images. The remaining rows show the fused images of our method, TEMST, NSST with weighted average, WT, NSST with WT, NSCT with WT, and CURV with WT. The subjective and objective evaluation parameters introduced earlier are used to analyze the fusion results.
The above five assessment indicators (i.e., E, AG, SD, SF, and Q AB/F ) on the five typical infrared and visible images are shown in Fig. 7. The larger their values, the better the fusion effects are.

Comparison with the state-of-the-art methods
In this part, seven typical infrared and visible image pairs from the TNO dataset (i.e., men in front of house, bunker, soldier_behind_smoke_1, Nato_camp_sequence, Kaptein_1123, lake, and barbed_wire_1) are chosen to evaluate the effectiveness of the proposed method. We compare the proposed method with five other advanced methods: the guided filtering-based weighted average technique (GF) [40], multi-resolution singular value decomposition (MSVD) [41], fourth-order partial differential equations (FPDE) [42], different resolutions via total variation (DRTV) [43], and visual attention saliency guided joint sparse representation (SGJSR) [44].
The fused images are shown in Fig. 8. The values of the five evaluations metrics on the seven infrared and visible images are shown in Fig. 9.

Results and discussion
As seen in Figs. 6, 7, 8, and 9, all 12 methods achieve effective fusion of infrared and visible images. In the other MGA-based methods, the fused image is dark and the target is not prominent, which is clearly visible in the sky regions of "Men in front of house" and "Kaptein_1123" in Fig. 6. The proposed method yields clearly visible and easily identifiable target information. In terms of the objective evaluation parameters, the proposed method generally scores higher than the other methods, as seen in Fig. 7. In short, the presented method is superior to the other MGA-based methods.
Compared with the five advanced methods, the presented method achieves the best visual quality, as shown in Fig. 8. Analyzing the objective evaluation parameters (i.e., E, AG, SD, SF, and Q^{AB/F}) in Fig. 9, however, shows some fluctuation: our method does not always obtain the highest values, but it delivers more stable image quality. Overall, our method is competitive with the five advanced fusion methods.

Conclusions
In this study, we propose a new fusion algorithm for infrared and visible images based on nonlinear enhancement and NSST decomposition. The experiments demonstrate that the algorithm not only retains the texture details of the visible image but also highlights the targets in the infrared image. Compared with other MGA-based and advanced algorithms, it is competitive or even superior in terms of both qualitative and quantitative evaluation, and the fusion performance is beneficial for target detection and tracking in complex environments.