### Using information measure to select HSI's spectra

#### Determination of candidate spectra sets

Entropy measures the degree of uncertainty of discrete random events, i.e., it is determined by the probabilities of their occurrence [44]. The larger the amount of information, the smaller the redundancy. Information measurement based on Shannon’s communication theory has proven to be very effective in identifying the redundancy of high-dimensional data sets. When these measurements were applied to hyperspectral images, each channel was treated as a random variable **X**, and all the pixels were considered as events *x*_{i} of **X**. The channels with little information were excluded by Shannon entropy to determine the candidate spectra sets as follows:

First, the entropy of each spectrum of the hyperspectral image *H*(*B*_{i}) was calculated. The random variable *B*_{i} is the *i*th spectrum (*i* = 1, 2, …, *n*), *x*_{j} is the *j*th pixel value of the spectrum (*j* = 1, 2, …, *N*, with *N* the number of pixels), \(p_{B_{i}}(x_{j})\) is the probability density function of the spectrum *B*_{i}, and *b* is the logarithm base.

$$\begin{array}{@{}rcl@{}} H(B_{i})=-\sum\limits_{j=1}^{N}p_{B_{i}}(x_{j})\log_{b}[p_{B_{i}}(x_{j})] \end{array} $$

(1)
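As a minimal sketch of Eq. (1) in Python, the empirical histogram of a band's quantized pixel values can stand in for \(p_{B_{i}}\); the function name `band_entropy` is illustrative, not from the original:

```python
import math
from collections import Counter

def band_entropy(pixels, base=2):
    """Eq. (1): Shannon entropy of one spectral band.

    pixels: iterable of quantized pixel values; the empirical
    frequency of each value approximates p_{B_i}(x_j).
    """
    counts = Counter(pixels)
    n = len(pixels)
    # H = -sum p * log_b(p) over the observed pixel values
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())
```

For four equiprobable values the entropy is 2 bits, and a constant band carries 0 bits.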

Second, the local average of the entropy of each spectrum was defined, where *m* is the window size, indicating the size of the neighborhood over which the entropy was averaged.

$$\begin{array}{@{}rcl@{}} \overline{H_{m}(B_{i})}=\frac{1}{m}\sum\limits_{p=-m/2}^{m/2}H(B_{i+p}) \end{array} $$

(2)

Finally, the spectra *B*_{i} that met the following condition were retained, where *σ* is the threshold factor. A spectrum was considered redundant if its entropy fell outside the interval obtained by floating the local average \(\overline {H_{m}(B_{i})}\) up and down by the factor *σ*.

$$\begin{array}{@{}rcl@{}} H(B_{i})\in[\overline{H_{m}(B_{i})}\times(1-\sigma),\overline{H_{m}(B_{i})}\times(1+\sigma)] \end{array} $$

(3)
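Eqs. (2) and (3) together act as a band filter. A minimal Python sketch follows; the function name `retained_bands` is illustrative, and the window is simply clipped at the spectrum boundaries (one plausible edge handling, not specified in the text):

```python
def retained_bands(entropies, m, sigma):
    """Eqs. (2)-(3): keep band i iff H(B_i) lies within a +/-sigma
    band around the local mean entropy over a window of size m."""
    kept = []
    half = m // 2
    for i, h in enumerate(entropies):
        # window [i - m/2, i + m/2], clipped at the spectrum edges
        lo = max(0, i - half)
        hi = min(len(entropies), i + half + 1)
        local_mean = sum(entropies[lo:hi]) / (hi - lo)
        # Eq. (3): retain if inside [mean*(1-sigma), mean*(1+sigma)]
        if local_mean * (1 - sigma) <= h <= local_mean * (1 + sigma):
            kept.append(i)
    return kept
```

A band whose entropy spikes far above its neighbors (band 3 below) drags the local mean away from the smooth bands, so both the spike and its neighbors fail the test.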

As shown in Fig. 2, the horizontal axis represents the spectral index, i.e., the spectral dimension, the vertical axis represents the entropy of each spectrum, and the blue curve is the entropy curve. The smoothness of the entropy curve determined the values of the window size *m* and the threshold factor *σ*. If the curve was smooth, the information of adjacent spectra changed little, the uncertainty of the spectral information was small, few spectra fell outside the relevant range, the probability of an uncorrelated spectrum was small, and few spectra were redundant. In this case, smaller values of *σ* and *m* were chosen to strengthen the exclusion of redundant spectra. On the contrary, if the curve fluctuated greatly, the information of adjacent spectra changed dramatically, the uncertainty of the spectral information was large, many spectra fell outside the correlation range, the probability of an uncorrelated spectrum was large, and there were many redundant spectra. In this case, larger values of *σ* and *m* were chosen to avoid eliminating spectra with valid information.

Next, based on the calculated entropy, the CIE 1931 standard colorimetric observer color matching function (CMF) [45], which describes the visual color characteristics of human eyes, was used to complete the determination of the candidate spectra. The CMF specifies the amounts of the three primary lights (red, blue, and green) needed to achieve the same visual effect as the monochromatic light at the corresponding wavelength. By applying CIE color matching to hyperspectral images in the visible range, the hyperspectral images can be visualized with correct color matching [46]. By setting a threshold value *t* for the CMF coefficients of the three primary colors, the candidate spectra sets for the three primary color channels, \( \text {Set}_{R}^{t}\), \( \text {Set}_{B}^{t}\), and \( \text {Set}_{G}^{t}\), were obtained.

In Fig. 3, two spectral thresholds (*t* = 0.1, *t* = 0.5) were set for the trend curve of the red light CMF coefficient. When the CMF coefficient was above the threshold, the corresponding spectra were preserved. However, it was very challenging to set the parameter *t* without a specific application. In this paper, an automatic threshold method was used, in which the optimal threshold *t* was defined as the value that maximized the amount of discarded information. \(S_{discard}^{t}\) was defined as the set of channels discarded by thresholding the CMF, and \(S_{selected}^{t}\) was the complementary set of \(S_{discard}^{t}\). The optimal threshold was defined as:

$$\begin{array}{@{}rcl@{}} t_{opt}&=&\operatorname{argmax}(t)\quad \text{subject to}\quad H\left(S_{discard}^{t}\right) < H\left(S_{selected}^{t}\right) \end{array} $$

(4)

where \(H(S_{discard}^{t})\) is the total entropy of the discarded spectra obtained by the above derivation, and \(H(S_{selected}^{t})\) is the total entropy of the selected spectra. Based on these results, the spectra were initially selected.
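One plausible reading of Eq. (4) is a search over candidate thresholds, keeping the largest *t* for which the discarded bands' total entropy is still below the selected bands' total entropy. A hedged Python sketch (the function name and the candidate-grid approach are assumptions, not from the original):

```python
def optimal_threshold(cmf, entropies, candidates):
    """Eq. (4), one plausible implementation: return the largest
    candidate t such that H(S_discard^t) < H(S_selected^t).

    cmf:        per-band CMF coefficient for one primary color
    entropies:  per-band Shannon entropy H(B_i)
    candidates: thresholds to try
    """
    for t in sorted(candidates, reverse=True):
        # bands with CMF coefficient below t are discarded (cf. Fig. 3)
        h_discard = sum(h for c, h in zip(cmf, entropies) if c < t)
        h_select = sum(h for c, h in zip(cmf, entropies) if c >= t)
        if h_discard < h_select:
            return t
    return None  # no candidate satisfies the constraint
```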

#### Spectra selection

Mutual information is a measure of shared information. It is defined as the amount of information contained in one random variable about another random variable. The mutual information between two random variables **X** and **Y** is defined as follows:

$$\begin{array}{@{}rcl@{}} I(\rm X,Y)&=&\sum_{\substack{i=1\dots n \\ j=1 \dots m}} P_{\text{X,Y}}(x_{i},y_{j})\log \frac{P_{\text{X,Y}}(x_{i},y_{j})}{P_{\rm{X}}(x_{i})\cdot P_{\rm{Y}}(y_{j})}\\ &=&{H(\rm X)}+{H(\rm Y)}-H(\rm X,Y) \end{array} $$

(5)

where *P*_{X}(*x*_{i}) is the probability density function of **X**, *P*_{Y}(*y*_{j}) is the probability density function of **Y**, and *P*_{X,Y}(*x*_{i},*y*_{j}) is the joint probability density of **X** and **Y**. *H*(*X*) is the entropy of the random variable **X**, *H*(*Y*) is the entropy of the random variable **Y**, and *H*(*X*,*Y*) is the joint entropy of the two random variables **X** and **Y**.
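Eq. (5)'s entropy form can be estimated directly from paired samples: the joint entropy is the entropy of the value pairs. A self-contained Python sketch (function names are illustrative):

```python
import math
from collections import Counter

def entropy(values, base=2):
    """Empirical Shannon entropy of a sample of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n, base)
                for c in Counter(values).values())

def mutual_information(x, y, base=2):
    """Eq. (5): I(X,Y) = H(X) + H(Y) - H(X,Y).

    The joint entropy H(X,Y) is the entropy of the (x_i, y_i) pairs.
    """
    return entropy(x, base) + entropy(y, base) - entropy(list(zip(x, y)), base)
```

A variable shares all its information with itself (I = H), and two independent variables share none (I = 0).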

Further, Bell proposed that the mutual information of three random variables **X**,**Y**, and **Z** can be defined as [47]

$$\begin{array}{@{}rcl@{}} I(\rm X,Y,Z)&=&{H(\rm X,Z)}+H{(\rm Y,Z)}-H{(\rm X,Y,Z)}\\ &&-~{H(\rm Z)}-I(\rm X,Y) \end{array} $$

(6)

where *H*(*X*,*Y*,*Z*) is the third-order joint entropy of the three random variables **X**,**Y**, and **Z**.
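Eq. (6) can be evaluated from the same empirical entropy estimator, since every term is an entropy of tuples. A self-contained sketch (names illustrative):

```python
import math
from collections import Counter

def entropy(values, base=2):
    """Empirical Shannon entropy of a sample of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n, base)
                for c in Counter(values).values())

def mutual_information(x, y, base=2):
    """Eq. (5): I(X,Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x, base) + entropy(y, base) - entropy(list(zip(x, y)), base)

def triple_mutual_information(x, y, z, base=2):
    """Eq. (6): I(X,Y,Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) - I(X,Y)."""
    return (entropy(list(zip(x, z)), base) + entropy(list(zip(y, z)), base)
            - entropy(list(zip(x, y, z)), base) - entropy(z, base)
            - mutual_information(x, y, base))
```

For two independent uniform bits with *z* = *x* XOR *y*, Eq. (6) evaluates to 1 bit: the third channel creates 1 bit of dependence between the other two, illustrating the sentence below about one channel increasing the mutual information of the other two.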

The above principle also applies to hyperspectral images: the information of one channel can increase the mutual information between the other two channels. When the overlap of information between two channels is smaller, the interdependence of the two random variables is lower and more distinct information is contained. Two criteria therefore need to be balanced when reducing the dimensionality of hyperspectral images, i.e., maximum information and minimum redundancy.

Pla proposed normalizing the mutual information [48]. In this paper, the *k*th-order normalized information (NI) of the spectra set \( S=\{B_{1},\dots,B_{k}\}\) was used as the normalized mutual information. NI was defined as follows:

$$\begin{array}{@{}rcl@{}} \text{NI}_{k}(S)=\frac{k\times I(S)}{\sum\limits_{i=1}^{k}H(B_{i})} \end{array} $$

(7)

where *I*(*S*) is the mutual information among the spectra of *B*_{1} to *B*_{k} and *H*(*B*_{i}) is the entropy of the spectrum *B*_{i}.
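Eq. (7) itself is a one-line normalization once *I*(*S*) and the entropies are known; a minimal sketch (the function name is illustrative):

```python
def normalized_information(mi, entropies):
    """Eq. (7): NI_k(S) = k * I(S) / sum_i H(B_i).

    mi:        mutual information I(S) among the k spectra
    entropies: list [H(B_1), ..., H(B_k)]
    """
    k = len(entropies)
    return k * mi / sum(entropies)
```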

In the previous section, the threshold *t* was set for the CMF coefficients of the three primary colors to obtain the three candidate spectra sets, i.e., \( \text {Set}_{R}^{t}\), \( \text {Set}_{B}^{t}\), and \( \text {Set}_{G}^{t}\). When the value of the normalized mutual information was smaller, the amount of information contained in the selected spectra was larger, and the dimensionality reduction effect on the hyperspectral image was better. Based on the mutual information, the three spectra *x*_{R}, *y*_{B}, *z*_{G} (\(x_{R}\in \text {Set}_{R}^{t}\), \(y_{B}\in \text {Set}_{B}^{t}\), \(z_{G} \in \text {Set}_{G}^{t}\)) were selected to minimize NI_{3}(*x*_{R},*y*_{B},*z*_{G}). These three spectra contained the most important information of the hyperspectral images.
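The selection step is an exhaustive search over triples drawn from the three candidate sets. A sketch follows; the wavelength labels and the NI_3 values in the lookup table are purely illustrative toy data, not results from the paper:

```python
import itertools

# Toy candidate sets (band labels) and a precomputed NI_3 lookup table;
# in practice NI_3 would be computed from Eqs. (6)-(7).
set_r = [700, 650]
set_b = [450]
set_g = [550, 520]
ni3 = {(700, 450, 550): 0.42, (700, 450, 520): 0.31,
       (650, 450, 550): 0.55, (650, 450, 520): 0.47}

# Pick the triple (x_R, y_B, z_G) minimizing NI_3
best = min(itertools.product(set_r, set_b, set_g), key=lambda s: ni3[s])
```

With real data the lookup would be replaced by an NI_3 evaluation per triple; the candidate sets are small after thresholding, so the exhaustive product stays tractable.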

### Pseudo-color image synthesis

To obtain the texture information for the construction of the virtual land environment, we adopt pseudo-color image synthesis based on the three spectra selected in the previous section. Besides, the pseudo-color images are also convenient for human eyes to observe. As an image enhancement technique, pseudo-color image synthesis combines multispectral monochrome images into pseudo-color images by adding or subtracting colors.

Using the three grayscale images *x*_{R}, *y*_{B}, *z*_{G} selected based on information measurement according to Section 3.1.1, pseudo-color image synthesis can be performed.

As shown in Fig. 4, the three grayscale images of the three-dimensional hyperspectral image were combined by the three-color synthesis method [49], and the three color-conversion functions corresponding to the three colors red, green, and blue were set to R(*x*,*y*), G(*x*,*y*), and B(*x*,*y*).

$$\begin{array}{@{}rcl@{}} {\mathrm{R}}(x,y)&=&{\text{Red}}(\rm Gray_{1}(x,y)),\\ {\mathrm{G}}(x,y)&=&{\text{Green}}(\rm Gray_{2}(x,y)),\\ {\mathrm{B}}(x,y)&=&{\text{Blue}}(\rm Gray_{3}(x,y)) \end{array} $$

(8)

where Gray_{i}(*x*,*y*) (*i* = 1, 2, 3) represents the grayscale data of the three grayscale images, and Red(Gray_{1}(*x*,*y*)), Green(Gray_{2}(*x*,*y*)), and Blue(Gray_{3}(*x*,*y*)) denote the red, green, and blue color conversions applied to the corresponding grayscale images, respectively. Finally, the three color-converted images were combined into a pseudo-color image. The fidelity of the generated pseudo-color image can be further improved through color correction technology [50].

$$\begin{array}{@{}rcl@{}} ({\text{R,G,B}})=({\mathrm{R}}(x,y),{\mathrm{G}}(x,y),{\mathrm{B}}(x,y)) \end{array} $$

(9)
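Eqs. (8) and (9) amount to assigning each grayscale image to one primary channel and merging them per pixel. A minimal Python sketch, assuming identity color-conversion functions (the general Red/Green/Blue transfer functions of Eq. (8) are not specified here) and list-of-lists images:

```python
def synthesize_pseudo_color(gray1, gray2, gray3):
    """Eqs. (8)-(9): map gray1 -> R, gray2 -> G, gray3 -> B and merge
    the three channels into per-pixel (R, G, B) triples.

    Identity transfer functions are assumed; a real pipeline would apply
    the Red/Green/Blue conversions of Eq. (8) before merging.
    """
    return [[(r, g, b) for r, g, b in zip(row1, row2, row3)]
            for row1, row2, row3 in zip(gray1, gray2, gray3)]
```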

### HSI classification based on CNN

#### HSI classification based on information measure

As shown in Fig. 5, the classification process of hyperspectral images based on information measure (IM for short) can be divided into the following two main steps. Firstly, three candidate spectra sets were selected based on entropy and the color matching function, and the three most important spectra were selected based on the minimum mutual information; these three spectra contained the most important spectral information. The grayscale images of these three spectra were synthesized into a pseudo-color image, and the neighborhood of the pixel to be classified was extracted to generate a patch with spatial-spectral information. Secondly, the patches were input into the CNN for training and testing.

As shown in Fig. 6, assume that the size of the hyperspectral data is *I*_{1}×*I*_{2}×*I*_{3}. From the three spectra *x*_{R}, *y*_{B}, *z*_{G}, an *m*×*m*×3 patch with spatial-spectral information was extracted and input into the CNN. From the perspective of the two spatial dimensions *I*_{1}×*I*_{2}, the input patch contained three layers of spatial information of size *m*×*m*. From the perspective of the spectral dimension *I*_{3}, the patch contained all the spectral information of the three spectra *x*_{R}, *y*_{B}, *z*_{G}. Therefore, the method used all the spatial-spectral information of the three spectra, and the number of input channels to the CNN was in_channels=3. The CNN consists of two convolution layers (Conv1 and Conv2), two pooling layers (Max-pooling1 and Max-pooling2), and two fully connected layers (Full_connected1 and Full_connected2). Finally, the classification of hyperspectral images is realized by a Softmax classifier. The training stage is as follows:

Step 1. Random initialization of network parameters, including Conv1, Conv2, Max-pooling1, Max-pooling2, and so on.

Step 2. The input patch is propagated forward through the convolution layers, the pooling layers, and the full connection layers to get the output value.

Step 3. Calculate the error between the output value and the target value of the network.

Step 4. When the error is greater than the expected value, the error is propagated back to the network, and the errors of the full connection layers, the pooling layers, and the convolution layers are obtained in turn; when the error is equal to or less than the expected value, the training is ended.

Step 5. Update the network parameters according to the obtained errors and return to Step 2.
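The patch extraction that feeds this pipeline (cutting an *m*×*m* neighborhood from each of the three selected bands) can be sketched in Python; the function names and the list-of-lists band representation are illustrative, and the center pixel is assumed to lie far enough from the image border to have a full neighborhood:

```python
def extract_patch(band, row, col, m):
    """Cut the m*m neighborhood (odd m) around (row, col) from one 2-D band."""
    half = m // 2
    return [r[col - half: col + half + 1]
            for r in band[row - half: row + half + 1]]

def spatial_spectral_patch(bands, row, col, m):
    """Stack the m*m neighborhoods of the three selected bands
    x_R, y_B, z_G into an m*m*3 patch (a list of three m*m layers)."""
    return [extract_patch(b, row, col, m) for b in bands]
```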

#### HSI classification based on information measure and enhanced spectral information

On the basis of the previous method, we propose HSI classification based on information measure and enhanced spectral information (IM_ESI for short), as shown in Fig. 7. Compared with the IM method, the main difference of IM_ESI is that it extracts all the spectral information of the pixel to be classified (the central pixel of the patch) and attaches it to the previous *m*×*m*×3 patch by combining, intercepting, and deforming the spectral information, finally generating a new *m*×*m*×6 patch as the CNN’s input.

As shown in Fig. 8, the one-dimensional spectral information of the patch center, of size 1×1×*I*_{3}, was first repeated *n* times to obtain a one-dimensional spectral vector of size 1×(*n*×*I*_{3}). From it, a one-dimensional spectral vector with the same number of elements as the two-dimensional spatial information, *m*×*m*, was intercepted. This 1×(*m*×*m*) spectrum was then deformed into an *m*×*m* two-dimensional spectral matrix and stacked in three layers. The result was combined with the three *m*×*m* spatial layers of the three spectra *x*_{R}, *y*_{B}, *z*_{G} to obtain a spatial-spectral patch of size *m*×*m*×6, which was put into the CNN with in_channels=6. The structure of the CNN is the same as that of the previous network.
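The tile-intercept-deform-stack construction above can be sketched in a few lines of Python; the function name is illustrative, and *n* is taken as the smallest repetition count that covers *m*×*m* samples (one plausible choice consistent with the text):

```python
def enhanced_patch(spatial_patch, spectrum, m):
    """IM_ESI patch (Fig. 8): tile the center pixel's full spectrum
    (length I_3) until it covers m*m samples, cut the first m*m,
    reshape to an m*m matrix, stack it 3 times, and append it to the
    three spatial layers -> an m*m*6 patch.
    """
    n = -(-m * m // len(spectrum))          # ceil(m*m / I_3) repetitions
    flat = (spectrum * n)[:m * m]           # intercepted 1 x (m*m) vector
    grid = [flat[r * m:(r + 1) * m] for r in range(m)]  # m x m matrix
    return spatial_patch + [grid, grid, grid]
```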