Related work
Assume that the images are derived from n sensors and that the image taken by sensor s has size M × N, i.e., M × N pixels. If feature information has been extracted from this image, the feature vectors of its pixels can be written as X_{S}(1, 1), …, X_{S}(M, N), S = 1, 2, …, n. Similarly, if the image provided by sensor s has D_{s} bands, X_{S}(i, j) denotes the gray-level vector over all bands of sensor s at location (i, j), i.e., X_{S}(i, j) = (X_{S}(i, j, 1), …, X_{S}(i, j, D_{s})), where X_{S}(i, j, g) is the gray value of band g. In this paper, n = 2: the considered images are a Sentinel-1 SAR image and a Landsat-8 OLI optical image.
It is assumed that the images from the n sensors contain K classes, namely ω_{1}, …, ω_{K}, with prior probabilities P(ω_{1}), …, P(ω_{K}). C = {C(i, j); 1 ≤ i ≤ M, 1 ≤ j ≤ N} denotes the label field of the whole scene, where C(i, j) ∈ {ω_{1}, ω_{2}, …, ω_{K}}. All the pixels of sensor s are represented by X_{S} = {X_{S}(i, j); 1 ≤ i ≤ M, 1 ≤ j ≤ N}.
The task of multisource classification is to maximize the posterior probability P(C ∣ X_{1}, …, X_{n}) of each pixel, depicted as [18]:
$$ P\left(C\mid {X}_1,\dots, {X}_n\right)=\frac{P\left({X}_1,\dots, {X}_n\mid C\right)P(C)}{P\left({X}_1,\dots, {X}_n\right)} $$
(1)
where P(X_{1}, …, X_{n} ∣ C) is the conditional probability of the feature vectors X_{1}, …, X_{n} given the label C, P(C) is the prior probability, and P(X_{1}, …, X_{n}) is the joint probability of the n sensors' data.
Assuming the images of the different sensors are conditionally independent, we obtain P(X_{1}, …, X_{n} ∣ C) = P(X_{1} ∣ C)⋯P(X_{n} ∣ C). A weight is assigned to each sensor's data according to its reliability factor. Thus, the posterior probability can be formulated as [18]:
$$ L\left(C\mid {X}_1,\dots, {X}_n\right)=P{\left({X}_1\mid C\right)}^{\alpha_1}\cdots P{\left({X}_n\mid C\right)}^{\alpha_n}P(C) $$
(2)
where α_{s} denotes a reliability factor with 0 ≤ α_{s} ≤ 1. If sensor s has low reliability, α_{s} is set to zero, so that \( P{\left({X}_s\mid C\right)}^{\alpha_s}=1 \) and the conditional probability has no effect on the likelihood function. For a sensor with a nonzero reliability factor, the closer α_{s} is to 0, the closer \( P{\left({X}_s\mid C\right)}^{\alpha_s} \) is to 1 and hence the larger its contribution to the posterior probability. By using spatial information, the prior probability of the class labels P(C) can be depicted as [20]
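The effect of the exponent in Eq. (2) can be checked numerically. The sketch below uses hypothetical per-class probabilities (the values and the function name are illustrative, not from the paper): an exponent of 0 removes a sensor entirely, and a smaller nonzero exponent pushes its factor toward 1.

```python
import numpy as np

def weighted_likelihood(p_sar, p_opt, alpha_sar, alpha_opt, prior):
    """Per-class likelihood of Eq. (2): P(X1|C)^a1 * P(X2|C)^a2 * P(C)."""
    return (p_sar ** alpha_sar) * (p_opt ** alpha_opt) * prior

# Hypothetical per-class conditional probabilities for two classes.
p_sar = np.array([0.7, 0.3])
p_opt = np.array([0.4, 0.6])
prior = np.array([0.5, 0.5])

# alpha_SAR = 0 removes the SAR term: P^0 = 1, so only the optical
# sensor and the prior determine the likelihood.
L_no_sar = weighted_likelihood(p_sar, p_opt, 0.0, 1.0, prior)
```

Since all probabilities lie in (0, 1], `p ** alpha` grows toward 1 as `alpha` shrinks, which is exactly the "closer to 0, larger factor" behavior described above.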
$$ {\displaystyle \begin{array}{l}P\left(C\left(i,j\right)\mid C\left(k,l\right),\left\{k,l\right\}\ne \left\{i,j\right\}\right)\\ {}=P\left(C\left(i,j\right)\mid C\left(k,l\right),\left\{k,l\right\}\in {\xi}_{ij}\right)\\ {}=\frac{1}{Z}{e}^{-U(C)/T}\end{array}} $$
(3)
where U is the potential energy function defined over the cliques, Z is the normalization constant, T is the temperature constant, and ξ_{ij} is the local neighborhood pixel set of pixel (i, j).
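The text does not specify the clique potential, so the following sketch assumes a standard Potts-style potential, in which each neighbor of (i, j) with a different label adds β to U(C); the normalized prior of Eq. (3) then follows from exp(−U/T). Function names and parameters are illustrative.

```python
import numpy as np

def potts_energy(label, neighbor_labels, beta=1.0):
    """Assumed Potts clique potential: beta per disagreeing neighbor in xi_ij."""
    return beta * sum(1 for n in neighbor_labels if n != label)

def spatial_prior(label, neighbor_labels, labels, beta=1.0, T=1.0):
    """Normalized prior P(C(i,j) | neighbors) in the form of Eq. (3)."""
    u = np.array([potts_energy(l, neighbor_labels, beta) for l in labels])
    p = np.exp(-u / T)
    return p[labels.index(label)] / p.sum()  # p.sum() plays the role of Z
```

With three of four neighbors labeled 0, the prior favors label 0, which is the spatial-smoothness behavior the MRF prior is meant to encode.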
Literature [21] pointed out that the reliability of a sensor determines how strongly its data should be promoted or demoted in the classification. The conditional probabilities can be grouped into a matrix R, denoted by
$$ R=\left[\begin{array}{cccc}P\left({\omega}_1\mid {X}_{S,1}\right)& P\left({\omega}_2\mid {X}_{S,1}\right)& \dots & P\left({\omega}_K\mid {X}_{S,1}\right)\\ {}P\left({\omega}_1\mid {X}_{S,2}\right)& P\left({\omega}_2\mid {X}_{S,2}\right)& \dots & P\left({\omega}_K\mid {X}_{S,2}\right)\\ {}\vdots & \vdots & \ddots & \vdots \\ {}P\left({\omega}_1\mid {X}_{S,Z}\right)& P\left({\omega}_2\mid {X}_{S,Z}\right)& \dots & P\left({\omega}_K\mid {X}_{S,Z}\right)\end{array}\right] $$
(4)
where X_{S, 1} represents the feature vector (or, for multiband data, the gray-level vector) of the first pixel of sensor image S, i.e., X_{S, 1} = X_{S}(1, 1), and X_{S, Z} represents that of the last pixel, i.e., X_{S, Z} = X_{S}(M, N) with Z = M × N. If the sensor data are completely reliable, the class label of each observation is unique: each row of the matrix contains a single 1 and zeros elsewhere. If the sensor data are extremely unreliable, the class label of each observation is random. Thus, an observation X_{s, i} of sensor s carries an uncertainty of log[1/P(ω_{j} ∣ X_{s, i})] about the class information ω_{j}, and the average uncertainty of the observation about the class information can be calculated as [21]
$$ H\left(\omega \mid {X}_{s,i}\right)=\sum \limits_jP\left({\omega}_j\mid {X}_{s,i}\right)\log \frac{1}{P\left({\omega}_j\mid {X}_{s,i}\right)} $$
(5)
The uncertainty of the whole sensor data, H(ω ∣ X_{s}), can then be expressed as
$$ {\displaystyle \begin{array}{l}H\left(\omega \mid {X}_s\right)=\sum \limits_iP\left({X}_{s,i}\right)H\left(\omega \mid {X}_{s,i}\right)\\ {}\kern4.25em =\sum \limits_j\sum \limits_iP\left({X}_{s,i}\right)P\left({\omega}_j\mid {X}_{s,i}\right)\log \frac{1}{P\left({\omega}_j\mid {X}_{s,i}\right)}\\ {}\kern4.25em =\sum \limits_j\sum \limits_iP\left({X}_{s,i},{\omega}_j\right)\log \frac{1}{P\left({\omega}_j\mid {X}_{s,i}\right)}\end{array}} $$
(6)
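Equations (5) and (6) are the per-pixel and sensor-level Shannon entropies of the posterior rows of matrix R. A minimal sketch (function names are illustrative; the clipping guard against log 0 is an implementation detail, not part of the paper):

```python
import numpy as np

def pixel_uncertainty(post):
    """Eq. (5): entropy of one row of R, i.e. P(omega_j | X_{s,i}) over j."""
    p = np.clip(np.asarray(post, float), 1e-12, 1.0)  # guard log(0)
    return -np.sum(p * np.log(p))

def sensor_uncertainty(posts, p_x):
    """Eq. (6): average of Eq. (5) over pixels, weighted by P(X_{s,i})."""
    return sum(px * pixel_uncertainty(row) for row, px in zip(posts, p_x))
```

A one-hot row (a perfectly reliable observation) has zero uncertainty, while a uniform row over K classes reaches the maximum log K, matching the reliable/unreliable extremes described for matrix R.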
If α_{s} = H(ω ∣ X_{s}) alone is taken as the reliability factor of an image, it directly influences the posterior probability (2): a small nonzero α_{s} gives the sensor data more effect on (2), while a large α_{s} gives it less. However, because different sensors have different advantages for certain land covers, such fixed reliability factors may cause low classification performance or misclassification. Taking two sensors as an example, suppose sensor 1 outperforms sensor 2 on land cover A, but the reliability factor of sensor 1 is large while that of sensor 2 is small; then sensor 2 contributes more to the classification, and the performance on land cover A will not reach that of sensor 1 alone. Worse, land cover A may be assigned to another class by the sensor with the small reliability factor, causing misclassifications. Therefore, for a given land cover, the sensor with the classification advantage should receive a small reliability factor, making the posterior probability large, while the sensor without that advantage should receive a large reliability factor, making its posterior contribution small. This improves the classification performance to a certain extent and reduces misclassifications.
To solve this problem, we use different reliability factors for different land covers: different sensors in the same region receive different reliability factors, so that the data with the better classification advantage is weighted more heavily, i.e., the sensor with the better classification ability receives the smaller reliability factor. Different sensor data have different classification abilities for different land covers: SAR images classify urban areas well but perform poorly in areas without much detail, whereas optical images, with their rich spectral information, discriminate such areas better. Therefore, we divide the image to be classified into an urban area and a nonurban area and assign different reliability factors to the different sensor data in each area.
Proposed method
Since the reliability factors of the two sensors are fixed by (6), they cannot give full play to the classification advantages of the sensors in different object regions. To solve this problem, we let different object areas adopt different reliability factors, so that the data with the better classification advantage plays a greater role for each land cover; that is, the sensor with the better classification ability receives the smaller reliability factor, and vice versa. Starting from the reliability factors, we split the sensor-level factor of Eq. (6) into two pixel-level quantities:
$$ \left\{\begin{array}{l}{\lambda}_{\mathrm{SAR},i}^{\prime }=H\left(\omega \mid {X}_{\mathrm{SAR},i}\right)=\sum \limits_jP\left({\omega}_j\mid {X}_{\mathrm{SAR},i}\right)\log \frac{1}{P\left({\omega}_j\mid {X}_{\mathrm{SAR},i}\right)}\\ {}\\ {}{\lambda}_{\mathrm{Optical},i}^{\prime }=H\left(\omega \mid {X}_{\mathrm{Optical},i}\right)=\sum \limits_jP\left({\omega}_j\mid {X}_{\mathrm{Optical},i}\right)\log \frac{1}{P\left({\omega}_j\mid {X}_{\mathrm{Optical},i}\right)}\end{array}\right. $$
(7)
where \( {\lambda}_{\mathrm{SAR},i}^{\prime } \) and \( {\lambda}_{\mathrm{Optical},i}^{\prime } \) are the reliability factors of the i-th pixel in the SAR image and the optical image, respectively, and X_{SAR, i} and X_{Optical, i} denote the i-th pixel of each image. Normalizing the two reliability factors makes them sum to 1 at each position. However, when the two reliability factors are close to each other, the advantage of one image cannot be highlighted, so the classification accuracy cannot be improved. To solve this problem, following [22], we introduce the idea of stretching. Stretching (7) gives
$$ \left\{\begin{array}{l}{\lambda}_{\mathrm{SAR},i}=\frac{1/\left(1+\exp \left(16\times {\lambda}_{\mathrm{SAR},i}^{\prime }+4\right)\right)}{1/\left(1+\exp \left(16\times {\lambda}_{\mathrm{SAR},i}^{\prime }+4\right)\right)+1/\left(1+\exp \left(16\times {\lambda}_{\mathrm{Optical},i}^{\prime }+4\right)\right)}\\ {}\\ {}{\lambda}_{\mathrm{Optical},i}=\frac{1/\left(1+\exp \left(16\times {\lambda}_{\mathrm{Optical},i}^{\prime }+4\right)\right)}{1/\left(1+\exp \left(16\times {\lambda}_{\mathrm{SAR},i}^{\prime }+4\right)\right)+1/\left(1+\exp \left(16\times {\lambda}_{\mathrm{Optical},i}^{\prime }+4\right)\right)}\\ {}\\ {}{\lambda}_{\mathrm{SAR},i}+{\lambda}_{\mathrm{Optical},i}=1\end{array}\right. $$
(8)
where λ_{SAR, i} and λ_{Optical, i} are the normalized reliability factors. The aim of (8) is to give the more reliable sensor data a larger contribution in the classification process.
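The stretch-and-normalize step of Eq. (8) can be written directly; the constants 16 and 4 are the ones in the equation, while the function names are illustrative. Because the logistic term 1/(1 + exp(16λ′ + 4)) is decreasing in λ′, the sensor with the smaller raw entropy receives the larger normalized factor, and the two factors always sum to 1.

```python
import numpy as np

def stretch(lam):
    """Logistic stretch of Eq. (8); constants 16 and 4 follow the text."""
    return 1.0 / (1.0 + np.exp(16.0 * lam + 4.0))

def normalized_factors(lam_sar, lam_opt):
    """Eq. (8): stretch both raw entropies, then normalize to sum to 1."""
    s, o = stretch(lam_sar), stretch(lam_opt)
    return s / (s + o), o / (s + o)
```

The stretching widens the gap between two nearly equal entropies, which is exactly the "highlight one image's advantage" behavior the text asks for.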
At medium resolution, SAR images classify textured areas such as urban areas more accurately than optical images [23], which means SAR images recognize urban areas better. In SAR images, building areas often contain many bright spots reflected by structures such as oblique roofs and sharp corners. Between the bright spots lie shadows, dark roads, and light gray blocks caused by vegetation, and buildings are usually arranged regularly, so they easily form a texture with a regular light-dark interval [24]. Therefore, we apply a uniformity measure to the SAR image to extract urban areas and obtain an image classification method with different reliability factors for the different sensor images.
The extraction of urban areas from SAR images has been reported before. Both [24] and [25] use the gray-level co-occurrence matrix texture [26] as the main means of building extraction in SAR images. In [25], the extracted urban area is used as a marking field and the SAR image is divided into an urban area and a nonurban area; a rule for the joint use of the SAR image and the multispectral image based on these two areas is then given.
Inspired by this idea, and in order to improve the measurement of the reliability factors, we introduce the urban area as the label field in the classification of SAR and optical images. For the extraction of urban areas, we use the entropy of the gray-level co-occurrence matrix as in [25]. The difference is that our method does not use the block-based urban-area extraction strategy of [25], because its edge fit is poor according to our experiments; instead, we use a pixel-based approach. Specifically, the gray-level co-occurrence matrix is first calculated for the SAR image, and the entropy is then computed from it [25].
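The pixel-based extraction can be sketched in pure NumPy: for each pixel, build the GLCM of a local window and threshold its entropy. The window size, quantization levels, horizontal offset of 1, and the assumption that the image is normalized to [0, 1] are all illustrative choices, not specified by the paper; the entropy here is unnormalized, so its scale differs from whatever scale the authors' 0.6 threshold refers to.

```python
import numpy as np

def glcm_entropy(window, levels=8):
    """Entropy of the GLCM (horizontal offset 1) of one window."""
    q = np.minimum((window * levels).astype(int), levels - 1)  # quantize
    glcm = np.zeros((levels, levels))
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[a, b] += 1
    p = glcm / glcm.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def urban_mask(img, win=7, thresh=0.6, levels=8):
    """Pixel-based mask: slide a window, threshold the local GLCM entropy."""
    h, w = img.shape
    r = win // 2
    mask = np.zeros((h, w), bool)
    for i in range(r, h - r):
        for j in range(r, w - r):
            patch = img[i - r:i + r + 1, j - r:j + r + 1]
            mask[i, j] = glcm_entropy(patch, levels) > thresh
    return mask
```

A flat patch yields a single-cell GLCM and zero entropy, while a textured patch spreads mass across many cells and yields high entropy, which is why entropy discriminates the regularly textured building areas.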
To improve the accuracy of the urban-area extraction, we make full use of the information of the SAR and optical images. The urban area is obtained by thresholding the entropy of the Sentinel-1 image at 0.6 (a parameter sensitivity analysis is given in the experimental section), which provides a coarse label field. Figure 1a–c show the Sentinel-1 image, the Landsat-8 image, and the coarsely extracted urban-area image of Xiamen, China. Figure 2a–c show the corresponding images of Neiye, Japan.
After obtaining the urban area, we propose a strategy for amending the reliability factors in the urban area and the nonurban area, so that the more reliable sensor data contributes more to the classification of the two sources. Let ω_{B} denote the urban-area label and \( {\omega}_{B^{\prime }} \) the nonurban-area label. The amendment factor is the reliability factor of (7) plus a controlling factor, with λ_{e} = 1 and \( {\lambda}_e^{\prime }=0 \). If the current pixel is in the urban area, the conditional probability of the urban class, \( P\left({X}_i\mid {\omega}_B\right)=P{\left({X}_{\mathrm{SAR},i}\mid {\omega}_B\right)}^{\lambda_{\mathrm{SAR},i}+{\lambda}_e^{\prime }}\times P{\left({X}_{\mathrm{Optical},i}\mid {\omega}_B\right)}^{\lambda_{\mathrm{Optical},i}+{\lambda}_e^{\prime }} \), should be increased, with amendment reliability factor \( {\alpha}_{s,i}={\lambda}_{s,i}+{\lambda}_e^{\prime } \), while the conditional probability of the nonurban class is \( P\left({X}_i\mid {\omega}_{B^{\prime }}\right)=P{\left({X}_{\mathrm{SAR},i}\mid {\omega}_{B^{\prime }}\right)}^{\lambda_{\mathrm{SAR},i}+{\lambda}_e}\times P{\left({X}_{\mathrm{Optical},i}\mid {\omega}_{B^{\prime }}\right)}^{\lambda_{\mathrm{Optical},i}+{\lambda}_e} \), with amendment factor α_{s, i} = λ_{s, i} + λ_{e}. The reason is that the conditional probability P(X_{s, i} ∣ ω_{j}) of the i-th pixel lies between 0 and 1; the smaller its exponentiated value, the less the sensor contributes to the classification.
If the current pixel is in the urban area, the amendment reliability factor of a sensor under the correct label should be as small as possible, and under the incorrect (nonbuilding) label as large as possible; hence \( {\lambda}_e^{\prime } \) is added in the former case and λ_{e} in the latter. Thus, the probability that the current pixel is judged nonurban becomes small and the probability that it is judged urban becomes large. Conversely, if the current pixel is in a nonurban area, the probability that it is judged urban should be as small as possible and the probability that it is judged nonurban as large as possible. This means that \( P\left({X}_{\mathrm{Fused},i}\mid {\omega}_B\right)=P{\left({X}_{\mathrm{SAR},i}\mid {\omega}_B\right)}^{\lambda_{\mathrm{SAR},i}+\frac{1}{\mathrm{ep}}}\times P{\left({X}_{\mathrm{Optical},i}\mid {\omega}_B\right)}^{\lambda_{\mathrm{Optical},i}+\frac{1}{\mathrm{ep}}} \) should be small for a pixel in the nonurban area, with amendment reliability factor \( {\alpha}_{s,i}={\lambda}_{s,i}+\frac{1}{\mathrm{ep}} \), where ep is a very small positive real number. The amendment reliability factors of the nonurban class in the nonurban area are α_{SAR, i} = λ_{SAR, i} + λ_{e} for the SAR image and \( {\alpha}_{\mathrm{Optical},i}={\lambda}_{\mathrm{Optical},i}+{\lambda}_e^{\prime } \) for the optical image. The reason is that, for a pixel in a nonurban area, the amendment reliability factor under the urban label ω_{B} should be as large as possible, and \( {\alpha}_{s,i}={\lambda}_{s,i}+\frac{1}{\mathrm{ep}} \) ensures this, while the amendment reliability factor under the nonbuilding label \( {\omega}_{B^{\prime }} \) should be as small as possible.
To emphasize the relative importance of the SAR and optical images in the nonurban area, \( {\lambda}_e^{\prime } \) and λ_{e} are introduced into α_{Optical, i} and α_{SAR, i}, ensuring α_{SAR, i} > α_{Optical, i}; this reflects the fact that the optical image contributes more than the SAR image to the classification of the nonurban area. From the above discussion, we get
$$ \left\{\begin{array}{l}\mathrm{if}\ {\mathrm{Mask}}_i=1,{\lambda}_e=1,{\lambda}_e^{\prime }=0:\\ {}P\left({X}_i\mid {\omega}_j\right)=\left\{\begin{array}{l}P{\left({X}_{\mathrm{SAR},i}\mid {\omega}_B\right)}^{\lambda_{\mathrm{SAR},i}+{\lambda}_e^{\prime }}\times P{\left({X}_{\mathrm{Optical},i}\mid {\omega}_B\right)}^{\lambda_{\mathrm{Optical},i}+{\lambda}_e^{\prime }}\\ {}P{\left({X}_{\mathrm{SAR},i}\mid {\omega}_{B^{\prime }}\right)}^{\lambda_{\mathrm{SAR},i}+{\lambda}_e}\times P{\left({X}_{\mathrm{Optical},i}\mid {\omega}_{B^{\prime }}\right)}^{\lambda_{\mathrm{Optical},i}+{\lambda}_e}\end{array}\right.\\ {}\\ {}\mathrm{if}\ {\mathrm{Mask}}_i\ne 1,{\lambda}_e=1,{\lambda}_e^{\prime }=0,\mathrm{ep}=0.00001:\\ {}P\left({X}_i\mid {\omega}_j\right)=\left\{\begin{array}{l}P{\left({X}_{\mathrm{SAR},i}\mid {\omega}_B\right)}^{\lambda_{\mathrm{SAR},i}+\frac{1}{\mathrm{ep}}}\times P{\left({X}_{\mathrm{Optical},i}\mid {\omega}_B\right)}^{\lambda_{\mathrm{Optical},i}+\frac{1}{\mathrm{ep}}}\\ {}P{\left({X}_{\mathrm{SAR},i}\mid {\omega}_{B^{\prime }}\right)}^{\lambda_{\mathrm{SAR},i}+{\lambda}_e}\times P{\left({X}_{\mathrm{Optical},i}\mid {\omega}_{B^{\prime }}\right)}^{\lambda_{\mathrm{Optical},i}+{\lambda}_e^{\prime }}\end{array}\right.\end{array}\right. $$
(9)
where Mask_{i} = 1 indicates that the current pixel is in the urban area and Mask_{i} ≠ 1 indicates that the current pixel is in a nonurban area.
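The case analysis of Eq. (9) reduces to choosing which controlling term is added to each sensor's normalized factor. A compact sketch (function name, signature, and the boolean encoding of mask and label are illustrative):

```python
def amendment_factors(lam_sar, lam_opt, in_urban, label_is_urban,
                      lam_e=1.0, lam_e_prime=0.0, ep=1e-5):
    """Amendment reliability factors (alpha_SAR_i, alpha_Optical_i), Eq. (9)."""
    if in_urban:
        # Urban pixel: add lam_e' under the urban label, lam_e otherwise.
        add = lam_e_prime if label_is_urban else lam_e
        return lam_sar + add, lam_opt + add
    if label_is_urban:
        # Nonurban pixel, urban label: huge exponent 1/ep suppresses it.
        return lam_sar + 1.0 / ep, lam_opt + 1.0 / ep
    # Nonurban pixel, nonurban label: asymmetric terms keep
    # alpha_SAR > alpha_Optical, favoring the optical image.
    return lam_sar + lam_e, lam_opt + lam_e_prime
```

The four branches correspond one-to-one with the four exponent combinations in Eq. (9).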
Setting U(C ∣ X_{1}, …, X_{n}) =  − log(L(C ∣ X_{1}, …, X_{n})), we introduce the amendment reliability factors of (8) into the following energy function [21]:
$$ U\left(C\mid {X}_1,\dots, {X}_n\right)=\sum \limits_{s=1}^n{\alpha}_s{U}_{\mathrm{data}}\left({X}_S\right)+{U}_{sp}(C) $$
(10)
Then, substituting (9) into (10), we obtain the objective functions of the MRF with amendment reliability factors for classification as follows:
(1) If the current pixel is in the building area, the energy function of the building class is
$$ {\displaystyle \begin{array}{l}{U}_{\mathrm{data}}\left({X}_{\mathrm{Fused}}\right)+{U}_{sp}(C)\\ {}=-\Big\{\left({\lambda}_{\mathrm{SAR},i}+{\lambda}_e^{\prime}\right)\log \left(P\left({X}_{\mathrm{SAR},i}\mid {\omega}_B\right)\right)\\ {}+\left({\lambda}_{\mathrm{Optical},i}+{\lambda}_e^{\prime}\right)\log \left(P\left({X}_{\mathrm{Optical},i}\mid {\omega}_B\right)\right)\Big\}+{U}_{sp}(C)\end{array}} $$
(11)

(2) If the current pixel is in the building area, the energy function of the nonbuilding class is
$$ {\displaystyle \begin{array}{l}{U}_{\mathrm{data}}\left({X}_{\mathrm{Fused}}\right)+{U}_{sp}(C)\\ {}=-\Big\{\left({\lambda}_{\mathrm{SAR},i}+{\lambda}_e\right)\log \left(P\left({X}_{\mathrm{SAR},i}\mid {\omega}_{B^{\prime }}\right)\right)\\ {}+\left({\lambda}_{\mathrm{Optical},i}+{\lambda}_e\right)\log \left(P\left({X}_{\mathrm{Optical},i}\mid {\omega}_{B^{\prime }}\right)\right)\Big\}+{U}_{sp}(C)\end{array}} $$
(12)

(3) If the current pixel is in a nonbuilding area, the energy function of the building class is
$$ {\displaystyle \begin{array}{l}{U}_{\mathrm{data}}\left({X}_{\mathrm{Fused}}\right)+{U}_{sp}(C)\\ {}=-\Big\{\left({\lambda}_{\mathrm{SAR},i}+\frac{1}{\mathrm{ep}}\right)\log \left(P\left({X}_{\mathrm{SAR},i}\mid {\omega}_B\right)\right)\\ {}+\left({\lambda}_{\mathrm{Optical},i}+\frac{1}{\mathrm{ep}}\right)\log \left(P\left({X}_{\mathrm{Optical},i}\mid {\omega}_B\right)\right)\Big\}+{U}_{sp}(C)\end{array}} $$
(13)

(4) If the current pixel is in a nonbuilding area, the energy function of the nonbuilding class is
$$ {\displaystyle \begin{array}{l}{U}_{\mathrm{data}}\left({X}_{\mathrm{Fused}}\right)+{U}_{sp}(C)\\ {}=-\Big\{\left({\lambda}_{\mathrm{SAR},i}+{\lambda}_e\right)\log \left(P\left({X}_{\mathrm{SAR},i}\mid {\omega}_{B^{\prime }}\right)\right)\\ {}+\left({\lambda}_{\mathrm{Optical},i}+{\lambda}_e^{\prime}\right)\log \left(P\left({X}_{\mathrm{Optical},i}\mid {\omega}_{B^{\prime }}\right)\right)\Big\}+{U}_{sp}(C)\end{array}} $$
(14)
If the current pixel is in the building area, the energy function of each class is calculated according to (11) and (12), and the label that minimizes the energy function is the final label of the current pixel. If the current pixel is in the nonurban area, the energy functions are calculated according to (13) and (14), and the class that minimizes the energy function is the final label of the current pixel.
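The per-pixel decision rule can be sketched as follows. The function name, signature, and binary label encoding (0 = building ω_B, 1 = nonbuilding) are illustrative; the data terms are taken as negative log-likelihoods so that, as stated above, the minimum-energy label wins.

```python
import numpy as np

def pixel_label(p_sar, p_opt, lam_sar, lam_opt, in_urban,
                u_sp=(0.0, 0.0), lam_e=1.0, lam_e_p=0.0, ep=1e-5):
    """Pick the label minimizing the energies of Eqs. (11)-(14) plus U_sp."""
    if in_urban:                        # Eqs. (11) and (12)
        exps = [(lam_sar + lam_e_p, lam_opt + lam_e_p),
                (lam_sar + lam_e,   lam_opt + lam_e)]
    else:                               # Eqs. (13) and (14)
        exps = [(lam_sar + 1.0 / ep, lam_opt + 1.0 / ep),
                (lam_sar + lam_e,   lam_opt + lam_e_p)]
    energies = [-(a_s * np.log(p_sar[k]) + a_o * np.log(p_opt[k])) + u_sp[k]
                for k, (a_s, a_o) in enumerate(exps)]
    return int(np.argmin(energies))
```

For a nonurban pixel, the huge exponent 1/ep inflates the energy of the building label even when the sensors slightly favor it, which is the veto behavior Eq. (13) is designed to produce.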