Related work
Assume that the images are derived from n sensors and that the image taken by sensor s has size M × N, i.e., M × N pixels. If feature information has been extracted from this image, the feature vectors of its pixels can be written as X_{S}(1, 1), …, X_{S}(M, N), S = 1, 2, …, n. Similarly, if the image provided by sensor s has D_{s} bands, X_{S}(i, j) denotes the gray-level vector over all bands of sensor s at location (i, j), i.e., X_{S}(i, j) = (X_{S}(i, j, 1), …, X_{S}(i, j, D_{s})), where X_{S}(i, j, g) is the gray value of band g. In this paper, n = 2: the considered images are a Sentinel-1 SAR image and a Landsat-8 OLI optical image.
It is assumed that the images from the n sensors contain K classes, namely ω_{1}, …, ω_{K}, with prior probabilities P(ω_{1}), …, P(ω_{K}). C = {C(i, j); 1 ≤ i ≤ M, 1 ≤ j ≤ N} denotes the label field of the whole scene, where C(i, j) ∈ {ω_{1}, ω_{2}, …, ω_{K}}. All the pixels of sensor s are represented by X_{S} = {X_{S}(i, j); 1 ≤ i ≤ M, 1 ≤ j ≤ N}.
The task of multisource classification is to maximize the posterior probability P(C ∣ X_{1}, …, X_{n}) of each pixel, depicted as [18]:
$$ P\left(C\mid {X}_1,\dots, {X}_n\right)=\frac{P\left({X}_1,\dots, {X}_n\mid C\right)P(C)}{P\left({X}_1,\dots, {X}_n\right)} $$
(1)
where P(X_{1}, …, X_{n} ∣ C) is the conditional probability of the feature vectors X_{1}, …, X_{n} given the label C, P(C) is the prior probability, and P(X_{1}, …, X_{n}) is the joint probability of the n sensors' data.
Assuming the images of the different sensors are conditionally independent, we obtain P(X_{1}, …, X_{n} ∣ C) = P(X_{1} ∣ C)⋯P(X_{n} ∣ C). A weight is assigned to each sensor's data according to its reliability factor. Thus, the posterior probability can be formulated as [18]:
$$ L\left(C\mid {X}_1,\dots, {X}_n\right)=P{\left({X}_1\mid C\right)}^{\alpha_1}\cdots P{\left({X}_n\mid C\right)}^{\alpha_n}P(C) $$
(2)
where α_{s} denotes a reliability factor with 0 ≤ α_{s} ≤ 1. If sensor s has low reliability, α_{s} is set to zero, so that \( P{\left({X}_s\mid C\right)}^{\alpha_s}=1 \) and the conditional probability has no effect on the likelihood function. For a sensor with a nonzero reliability factor, the closer α_{s} is to 0, the closer \( P{\left({X}_s\mid C\right)}^{\alpha_s} \) is to 1 and hence the larger its contribution to the posterior probability. By using spatial information, the prior probability of the class labels P(C) can be depicted as [20]
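The effect of the exponent in Eq. (2) can be checked numerically. The sketch below uses hypothetical per-class probabilities (the values and the function name are illustrative, not from the paper): an exponent of 0 removes a sensor entirely, and a smaller nonzero exponent pushes its factor toward 1.

```python
import numpy as np

def weighted_likelihood(p_sar, p_opt, alpha_sar, alpha_opt, prior):
    """Per-class likelihood of Eq. (2): P(X1|C)^a1 * P(X2|C)^a2 * P(C)."""
    return (p_sar ** alpha_sar) * (p_opt ** alpha_opt) * prior

# Hypothetical per-class conditional probabilities for two classes.
p_sar = np.array([0.7, 0.3])
p_opt = np.array([0.4, 0.6])
prior = np.array([0.5, 0.5])

# alpha_SAR = 0 removes the SAR term: P^0 = 1, so only the optical
# sensor and the prior determine the likelihood.
L_no_sar = weighted_likelihood(p_sar, p_opt, 0.0, 1.0, prior)
```

Since all probabilities lie in (0, 1], `p ** alpha` grows toward 1 as `alpha` shrinks, which is exactly the "closer to 0, larger factor" behavior described above.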
$$ {\displaystyle \begin{array}{l}P\left(C\left(i,j\right)\mid C\left(k,l\right),\left\{k,l\right\}\ne \left\{i,j\right\}\right)\\ {}=P\left(C\left(i,j\right)\mid C\left(k,l\right),\left\{k,l\right\}\in {\xi}_{ij}\right)\\ {}=\frac{1}{Z}{e}^{-U(C)/T}\end{array}} $$
(3)
where U is the potential energy function defined over the cliques, Z is the normalization constant, T is the temperature constant, and ξ_{ij} is the local neighborhood pixel set of pixel (i, j).
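The text does not specify the clique potential, so the following sketch assumes a standard Potts-style potential, in which each neighbor of (i, j) with a different label adds β to U(C); the normalized prior of Eq. (3) then follows from exp(−U/T). Function names and parameters are illustrative.

```python
import numpy as np

def potts_energy(label, neighbor_labels, beta=1.0):
    """Assumed Potts clique potential: beta per disagreeing neighbor in xi_ij."""
    return beta * sum(1 for n in neighbor_labels if n != label)

def spatial_prior(label, neighbor_labels, labels, beta=1.0, T=1.0):
    """Normalized prior P(C(i,j) | neighbors) in the form of Eq. (3)."""
    u = np.array([potts_energy(l, neighbor_labels, beta) for l in labels])
    p = np.exp(-u / T)
    return p[labels.index(label)] / p.sum()  # p.sum() plays the role of Z
```

With three of four neighbors labeled 0, the prior favors label 0, which is the spatial-smoothness behavior the MRF prior is meant to encode.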
Literature [21] pointed out that the reliability of a sensor determines how strongly its data should be promoted or demoted in the classification. The conditional probabilities can be grouped into a matrix R, denoted by
$$ R=\left[\begin{array}{cccc}P\left({\omega}_1\mid {X}_{S,1}\right)& P\left({\omega}_2\mid {X}_{S,1}\right)& \dots & P\left({\omega}_K\mid {X}_{S,1}\right)\\ {}P\left({\omega}_1\mid {X}_{S,2}\right)& P\left({\omega}_2\mid {X}_{S,2}\right)& \dots & P\left({\omega}_K\mid {X}_{S,2}\right)\\ {}\vdots & \vdots & \ddots & \vdots \\ {}P\left({\omega}_1\mid {X}_{S,Z}\right)& P\left({\omega}_2\mid {X}_{S,Z}\right)& \dots & P\left({\omega}_K\mid {X}_{S,Z}\right)\end{array}\right] $$
(4)
where X_{S, 1} represents the feature vector (or, for multiband data, the gray-level vector) of the first pixel of sensor image S, i.e., X_{S, 1} = X_{S}(1, 1), and X_{S, Z} represents that of the last pixel, i.e., X_{S, Z} = X_{S}(M, N) with Z = M × N. If the sensor data are completely reliable, the class label of each observation is unique: each row of the matrix contains a single 1 and zeros elsewhere. If the sensor data are extremely unreliable, the class label of each observation is random. Thus, an observation X_{s, i} of sensor s carries an uncertainty of log[1/P(ω_{j} ∣ X_{s, i})] about the class information ω_{j}, and the average uncertainty of the observation about the class information can be calculated as [21]
$$ H\left(\omega \mid {X}_{s,i}\right)=\sum \limits_jP\left({\omega}_j\mid {X}_{s,i}\right)\log \frac{1}{P\left({\omega}_j\mid {X}_{s,i}\right)} $$
(5)
The uncertainty of the whole sensor data, H(ω ∣ X_{s}), can then be expressed as
$$ {\displaystyle \begin{array}{l}H\left(\omega \mid {X}_s\right)=\sum \limits_iP\left({X}_{s,i}\right)H\left(\omega \mid {X}_{s,i}\right)\\ {}\kern4.25em =\sum \limits_j\sum \limits_iP\left({X}_{s,i}\right)P\left({\omega}_j\mid {X}_{s,i}\right)\log \frac{1}{P\left({\omega}_j\mid {X}_{s,i}\right)}\\ {}\kern4.25em =\sum \limits_j\sum \limits_iP\left({X}_{s,i},{\omega}_j\right)\log \frac{1}{P\left({\omega}_j\mid {X}_{s,i}\right)}\end{array}} $$
(6)
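Equations (5) and (6) are the per-pixel and sensor-level Shannon entropies of the posterior rows of matrix R. A minimal sketch (function names are illustrative; the clipping guard against log 0 is an implementation detail, not part of the paper):

```python
import numpy as np

def pixel_uncertainty(post):
    """Eq. (5): entropy of one row of R, i.e. P(omega_j | X_{s,i}) over j."""
    p = np.clip(np.asarray(post, float), 1e-12, 1.0)  # guard log(0)
    return -np.sum(p * np.log(p))

def sensor_uncertainty(posts, p_x):
    """Eq. (6): average of Eq. (5) over pixels, weighted by P(X_{s,i})."""
    return sum(px * pixel_uncertainty(row) for row, px in zip(posts, p_x))
```

A one-hot row (a perfectly reliable observation) has zero uncertainty, while a uniform row over K classes reaches the maximum log K, matching the reliable/unreliable extremes described for matrix R.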
If α_{s} = H(ω ∣ X_{s}) alone is taken as the reliability factor of an image, it directly influences the posterior probability (2): a small nonzero α_{s} gives the sensor data more effect on (2), while a large α_{s} gives it less. However, because different sensors have different advantages for certain land covers, such fixed reliability factors may cause low classification performance or misclassification. Taking two sensors as an example, suppose sensor 1 outperforms sensor 2 on land cover A, but the reliability factor of sensor 1 is large while that of sensor 2 is small; then sensor 2 contributes more to the classification, and the performance on land cover A will not reach that of sensor 1 alone. Worse, land cover A may be assigned to another class by the sensor with the small reliability factor, causing misclassifications. Therefore, for a given land cover, the sensor with the classification advantage should receive a small reliability factor, making the posterior probability large, while the sensor without that advantage should receive a large reliability factor, making its posterior contribution small. This improves the classification performance to a certain extent and reduces misclassifications.
To solve this problem, we use different reliability factors for different land covers: different sensors in the same region receive different reliability factors, so that the data with the better classification advantage is weighted more heavily, i.e., the sensor with the better classification ability receives the smaller reliability factor. Different sensor data have different classification abilities for different land covers: SAR images classify urban areas well but perform poorly in areas without much detail, whereas optical images, with their rich spectral information, discriminate such areas better. Therefore, we divide the image to be classified into an urban area and a nonurban area and assign different reliability factors to the different sensor data in each area.
Proposed method
Since the reliability factors of the two sensors are fixed by (6), they cannot give full play to the classification advantages of the sensors in different object regions. To solve this problem, we let different object areas adopt different reliability factors, so that the data with the better classification advantage plays a greater role for each land cover; that is, the sensor with the better classification ability receives the smaller reliability factor, and vice versa. Starting from the reliability factors, we split the sensor-level factor of Eq. (6) into two pixel-level quantities:
$$ \left\{\begin{array}{l}{\lambda}_{\mathrm{SAR},i}^{\prime }=H\left(\omega \mid {X}_{\mathrm{SAR},i}\right)=\sum \limits_jP\left({\omega}_j\mid {X}_{\mathrm{SAR},i}\right)\log \frac{1}{P\left({\omega}_j\mid {X}_{\mathrm{SAR},i}\right)}\\ {}\\ {}{\lambda}_{\mathrm{Optical},i}^{\prime }=H\left(\omega \mid {X}_{\mathrm{Optical},i}\right)=\sum \limits_jP\left({\omega}_j\mid {X}_{\mathrm{Optical},i}\right)\log \frac{1}{P\left({\omega}_j\mid {X}_{\mathrm{Optical},i}\right)}\end{array}\right. $$
(7)
where \( {\lambda}_{\mathrm{SAR},i}^{\prime } \) and \( {\lambda}_{\mathrm{Optical},i}^{\prime } \) are the reliability factors of the i-th pixel in the SAR image and the optical image, respectively, and X_{SAR, i} and X_{Optical, i} denote the i-th pixel of each image. Normalizing the two reliability factors makes them sum to 1 at each position. However, when the two reliability factors are close to each other, the advantage of one image cannot be highlighted, so the classification accuracy cannot be improved. To solve this problem, following [22], we introduce the idea of stretching. Stretching (7) gives
$$ \left\{\begin{array}{l}{\lambda}_{\mathrm{SAR},i}=\frac{1/\left(1+\exp \left(16\times {\lambda}_{\mathrm{SAR},i}^{\prime }+4\right)\right)}{1/\left(1+\exp \left(16\times {\lambda}_{\mathrm{SAR},i}^{\prime }+4\right)\right)+1/\left(1+\exp \left(16\times {\lambda}_{\mathrm{Optical},i}^{\prime }+4\right)\right)}\\ {}\\ {}{\lambda}_{\mathrm{Optical},i}=\frac{1/\left(1+\exp \left(16\times {\lambda}_{\mathrm{Optical},i}^{\prime }+4\right)\right)}{1/\left(1+\exp \left(16\times {\lambda}_{\mathrm{SAR},i}^{\prime }+4\right)\right)+1/\left(1+\exp \left(16\times {\lambda}_{\mathrm{Optical},i}^{\prime }+4\right)\right)}\\ {}\\ {}{\lambda}_{\mathrm{SAR},i}+{\lambda}_{\mathrm{Optical},i}=1\end{array}\right. $$
(8)
where λ_{SAR, i} and λ_{Optical, i} are the normalized reliability factors. The aim of (8) is to give the more reliable sensor data a larger contribution in the classification process.
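The stretch-and-normalize step of Eq. (8) can be written directly; the constants 16 and 4 are the ones in the equation, while the function names are illustrative. Because the logistic term 1/(1 + exp(16λ′ + 4)) is decreasing in λ′, the sensor with the smaller raw entropy receives the larger normalized factor, and the two factors always sum to 1.

```python
import numpy as np

def stretch(lam):
    """Logistic stretch of Eq. (8); constants 16 and 4 follow the text."""
    return 1.0 / (1.0 + np.exp(16.0 * lam + 4.0))

def normalized_factors(lam_sar, lam_opt):
    """Eq. (8): stretch both raw entropies, then normalize to sum to 1."""
    s, o = stretch(lam_sar), stretch(lam_opt)
    return s / (s + o), o / (s + o)
```

The stretching widens the gap between two nearly equal entropies, which is exactly the "highlight one image's advantage" behavior the text asks for.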
At medium resolution, SAR images classify textured areas such as urban areas more accurately than optical images [23], which means SAR images recognize urban areas better. In SAR images, building areas often contain many bright spots reflected by structures such as oblique roofs and sharp corners. Between the bright spots lie shadows, dark roads, and light gray blocks caused by vegetation, and buildings are usually arranged regularly, so they easily form a texture with a regular light-dark interval [24]. Therefore, we apply a uniformity measure to the SAR image to extract urban areas and obtain an image classification method with different reliability factors for the different sensor images.
The extraction of urban areas from SAR images has been reported before. Both [24] and [25] use the gray-level co-occurrence matrix texture [26] as the main means of building extraction in SAR images. In [25], the extracted urban area is used as a marking field and the SAR image is divided into an urban area and a nonurban area; a rule for the joint use of the SAR image and the multispectral image based on these two areas is then given.
Inspired by this idea, and in order to improve the measurement of the reliability factors, we introduce the urban area as the label field in the classification of SAR and optical images. For the extraction of urban areas, we use the entropy of the gray-level co-occurrence matrix as in [25]. The difference is that our method does not use the block-based urban-area extraction strategy of [25], because its edge fit is poor according to our experiments; instead, we use a pixel-based approach. Specifically, the gray-level co-occurrence matrix is first calculated for the SAR image, and the entropy is then computed from it [25].
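The pixel-based extraction can be sketched in pure NumPy: for each pixel, build the GLCM of a local window and threshold its entropy. The window size, quantization levels, horizontal offset of 1, and the assumption that the image is normalized to [0, 1] are all illustrative choices, not specified by the paper; the entropy here is unnormalized, so its scale differs from whatever scale the authors' 0.6 threshold refers to.

```python
import numpy as np

def glcm_entropy(window, levels=8):
    """Entropy of the GLCM (horizontal offset 1) of one window."""
    q = np.minimum((window * levels).astype(int), levels - 1)  # quantize
    glcm = np.zeros((levels, levels))
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[a, b] += 1
    p = glcm / glcm.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def urban_mask(img, win=7, thresh=0.6, levels=8):
    """Pixel-based mask: slide a window, threshold the local GLCM entropy."""
    h, w = img.shape
    r = win // 2
    mask = np.zeros((h, w), bool)
    for i in range(r, h - r):
        for j in range(r, w - r):
            patch = img[i - r:i + r + 1, j - r:j + r + 1]
            mask[i, j] = glcm_entropy(patch, levels) > thresh
    return mask
```

A flat patch yields a single-cell GLCM and zero entropy, while a textured patch spreads mass across many cells and yields high entropy, which is why entropy discriminates the regularly textured building areas.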
To improve the accuracy of the urban-area extraction, we make full use of the information of the SAR and optical images. The urban area is obtained by thresholding the entropy of the Sentinel-1 image at 0.6 (a parameter sensitivity analysis is given in the experimental section), which provides a coarse label field. Figure 1a–c show the Sentinel-1 image, the Landsat-8 image, and the coarsely extracted urban-area image of Xiamen, China. Figure 2a–c show the corresponding images of Neiye, Japan.
After obtaining the urban area, we propose a strategy for amending the reliability factors in the urban area and the nonurban area, so that the more reliable sensor data contributes more to the classification of the two sources. Let ω_{B} denote the urban-area label and \( {\omega}_{B^{\prime }} \) the nonurban-area label. The amendment factor is the reliability factor of (7) plus a controlling factor, with λ_{e} = 1 and \( {\lambda}_e^{\prime }=0 \). If the current pixel is in the urban area, the conditional probability of the urban class, \( P\left({X}_i\mid {\omega}_B\right)=P{\left({X}_{\mathrm{SAR},i}\mid {\omega}_B\right)}^{\lambda_{\mathrm{SAR},i}+{\lambda}_e^{\prime }}\times P{\left({X}_{\mathrm{Optical},i}\mid {\omega}_B\right)}^{\lambda_{\mathrm{Optical},i}+{\lambda}_e^{\prime }} \), should be increased, with amendment reliability factor \( {\alpha}_{s,i}={\lambda}_{s,i}+{\lambda}_e^{\prime } \), while the conditional probability of the nonurban class is \( P\left({X}_i\mid {\omega}_{B^{\prime }}\right)=P{\left({X}_{\mathrm{SAR},i}\mid {\omega}_{B^{\prime }}\right)}^{\lambda_{\mathrm{SAR},i}+{\lambda}_e}\times P{\left({X}_{\mathrm{Optical},i}\mid {\omega}_{B^{\prime }}\right)}^{\lambda_{\mathrm{Optical},i}+{\lambda}_e} \), with amendment factor α_{s, i} = λ_{s, i} + λ_{e}. The reason is that the conditional probability P(X_{s, i} ∣ ω_{j}) of the i-th pixel lies between 0 and 1; the smaller its exponentiated value, the less the sensor contributes to the classification.
If the current pixel is in the urban area, the amendment reliability factor of a sensor under the correct label should be as small as possible, and under the incorrect (nonbuilding) label as large as possible; hence \( {\lambda}_e^{\prime } \) is added in the former case and λ_{e} in the latter. Thus, the probability that the current pixel is judged nonurban becomes small and the probability that it is judged urban becomes large. Conversely, if the current pixel is in a nonurban area, the probability that it is judged urban should be as small as possible and the probability that it is judged nonurban as large as possible. This means that \( P\left({X}_{\mathrm{Fused},i}\mid {\omega}_B\right)=P{\left({X}_{\mathrm{SAR},i}\mid {\omega}_B\right)}^{\lambda_{\mathrm{SAR},i}+\frac{1}{\mathrm{ep}}}\times P{\left({X}_{\mathrm{Optical},i}\mid {\omega}_B\right)}^{\lambda_{\mathrm{Optical},i}+\frac{1}{\mathrm{ep}}} \) should be small for a pixel in the nonurban area, with amendment reliability factor \( {\alpha}_{s,i}={\lambda}_{s,i}+\frac{1}{\mathrm{ep}} \), where ep is a very small positive real number. The amendment reliability factors of the nonurban class in the nonurban area are α_{SAR, i} = λ_{SAR, i} + λ_{e} for the SAR image and \( {\alpha}_{\mathrm{Optical},i}={\lambda}_{\mathrm{Optical},i}+{\lambda}_e^{\prime } \) for the optical image. The reason is that, for a pixel in a nonurban area, the amendment reliability factor under the urban label ω_{B} should be as large as possible, and \( {\alpha}_{s,i}={\lambda}_{s,i}+\frac{1}{\mathrm{ep}} \) ensures this, while the amendment reliability factor under the nonbuilding label \( {\omega}_{B^{\prime }} \) should be as small as possible.
To emphasize the relative importance of the SAR and optical images in the nonurban area, \( {\lambda}_e^{\prime } \) and λ_{e} are introduced into α_{Optical, i} and α_{SAR, i}, ensuring α_{SAR, i} > α_{Optical, i}; this reflects the fact that the optical image contributes more than the SAR image to the classification of the nonurban area. From the above discussion, we get
$$ \left\{\begin{array}{l}\mathrm{if}\ {\mathrm{Mask}}_i=1,{\lambda}_e=1,{\lambda}_e^{\prime }=0:\\ {}P\left({X}_i\mid {\omega}_j\right)=\left\{\begin{array}{l}P{\left({X}_{\mathrm{SAR},i}\mid {\omega}_B\right)}^{\lambda_{\mathrm{SAR},i}+{\lambda}_e^{\prime }}\times P{\left({X}_{\mathrm{Optical},i}\mid {\omega}_B\right)}^{\lambda_{\mathrm{Optical},i}+{\lambda}_e^{\prime }}\\ {}P{\left({X}_{\mathrm{SAR},i}\mid {\omega}_{B^{\prime }}\right)}^{\lambda_{\mathrm{SAR},i}+{\lambda}_e}\times P{\left({X}_{\mathrm{Optical},i}\mid {\omega}_{B^{\prime }}\right)}^{\lambda_{\mathrm{Optical},i}+{\lambda}_e}\end{array}\right.\\ {}\\ {}\mathrm{if}\ {\mathrm{Mask}}_i\ne 1,{\lambda}_e=1,{\lambda}_e^{\prime }=0,\mathrm{ep}=0.00001:\\ {}P\left({X}_i\mid {\omega}_j\right)=\left\{\begin{array}{l}P{\left({X}_{\mathrm{SAR},i}\mid {\omega}_B\right)}^{\lambda_{\mathrm{SAR},i}+\frac{1}{\mathrm{ep}}}\times P{\left({X}_{\mathrm{Optical},i}\mid {\omega}_B\right)}^{\lambda_{\mathrm{Optical},i}+\frac{1}{\mathrm{ep}}}\\ {}P{\left({X}_{\mathrm{SAR},i}\mid {\omega}_{B^{\prime }}\right)}^{\lambda_{\mathrm{SAR},i}+{\lambda}_e}\times P{\left({X}_{\mathrm{Optical},i}\mid {\omega}_{B^{\prime }}\right)}^{\lambda_{\mathrm{Optical},i}+{\lambda}_e^{\prime }}\end{array}\right.\end{array}\right. $$
(9)
where Mask_{i} = 1 indicates that the current pixel is in the urban area and Mask_{i} ≠ 1 indicates that the current pixel is in a nonurban area.
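The case analysis of Eq. (9) reduces to choosing which controlling term is added to each sensor's normalized factor. A compact sketch (function name, signature, and the boolean encoding of mask and label are illustrative):

```python
def amendment_factors(lam_sar, lam_opt, in_urban, label_is_urban,
                      lam_e=1.0, lam_e_prime=0.0, ep=1e-5):
    """Amendment reliability factors (alpha_SAR_i, alpha_Optical_i), Eq. (9)."""
    if in_urban:
        # Urban pixel: add lam_e' under the urban label, lam_e otherwise.
        add = lam_e_prime if label_is_urban else lam_e
        return lam_sar + add, lam_opt + add
    if label_is_urban:
        # Nonurban pixel, urban label: huge exponent 1/ep suppresses it.
        return lam_sar + 1.0 / ep, lam_opt + 1.0 / ep
    # Nonurban pixel, nonurban label: asymmetric terms keep
    # alpha_SAR > alpha_Optical, favoring the optical image.
    return lam_sar + lam_e, lam_opt + lam_e_prime
```

The four branches correspond one-to-one with the four exponent combinations in Eq. (9).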
Setting U(C ∣ X_{1}, …, X_{n}) =  − log(L(C ∣ X_{1}, …, X_{n})), we introduce the amendment reliability factors of (8) into the following energy function [21]:
$$ U\left(C\mid {X}_1,\dots, {X}_n\right)=\sum \limits_{s=1}^n{\alpha}_s{U}_{\mathrm{data}}\left({X}_S\right)+{U}_{sp}(C) $$
(10)
Then, substituting (9) into (10), we obtain the objective functions of the MRF with amendment reliability factors for classification as follows:
(1) If the current pixel is in the building area, the energy function of the building class is
$$ {\displaystyle \begin{array}{l}{U}_{\mathrm{data}}\left({X}_{\mathrm{Fused}}\right)+{U}_{sp}(C)\\ {}=-\Big\{\left({\lambda}_{\mathrm{SAR},i}+{\lambda}_e^{\prime}\right)\log \left(P\left({X}_{\mathrm{SAR},i}\mid {\omega}_B\right)\right)\\ {}+\left({\lambda}_{\mathrm{Optical},i}+{\lambda}_e^{\prime}\right)\log \left(P\left({X}_{\mathrm{Optical},i}\mid {\omega}_B\right)\right)\Big\}+{U}_{sp}(C)\end{array}} $$
(11)

(2) If the current pixel is in the building area, the energy function of the nonbuilding class is
$$ {\displaystyle \begin{array}{l}{U}_{\mathrm{data}}\left({X}_{\mathrm{Fused}}\right)+{U}_{sp}(C)\\ {}=-\Big\{\left({\lambda}_{\mathrm{SAR},i}+{\lambda}_e\right)\log \left(P\left({X}_{\mathrm{SAR},i}\mid {\omega}_{B^{\prime }}\right)\right)\\ {}+\left({\lambda}_{\mathrm{Optical},i}+{\lambda}_e\right)\log \left(P\left({X}_{\mathrm{Optical},i}\mid {\omega}_{B^{\prime }}\right)\right)\Big\}+{U}_{sp}(C)\end{array}} $$
(12)

(3) If the current pixel is in a nonbuilding area, the energy function of the building class is
$$ {\displaystyle \begin{array}{l}{U}_{\mathrm{data}}\left({X}_{\mathrm{Fused}}\right)+{U}_{sp}(C)\\ {}=-\Big\{\left({\lambda}_{\mathrm{SAR},i}+\frac{1}{\mathrm{ep}}\right)\log \left(P\left({X}_{\mathrm{SAR},i}\mid {\omega}_B\right)\right)\\ {}+\left({\lambda}_{\mathrm{Optical},i}+\frac{1}{\mathrm{ep}}\right)\log \left(P\left({X}_{\mathrm{Optical},i}\mid {\omega}_B\right)\right)\Big\}+{U}_{sp}(C)\end{array}} $$
(13)

(4) If the current pixel is in a nonbuilding area, the energy function of the nonbuilding class is
$$ {\displaystyle \begin{array}{l}{U}_{\mathrm{data}}\left({X}_{\mathrm{Fused}}\right)+{U}_{sp}(C)\\ {}=-\Big\{\left({\lambda}_{\mathrm{SAR},i}+{\lambda}_e\right)\log \left(P\left({X}_{\mathrm{SAR},i}\mid {\omega}_{B^{\prime }}\right)\right)\\ {}+\left({\lambda}_{\mathrm{Optical},i}+{\lambda}_e^{\prime}\right)\log \left(P\left({X}_{\mathrm{Optical},i}\mid {\omega}_{B^{\prime }}\right)\right)\Big\}+{U}_{sp}(C)\end{array}} $$
(14)
If the current pixel is in the building area, the energy function of each class is calculated according to (11) and (12), and the label that minimizes the energy function is the final label of the current pixel. If the current pixel is in the nonurban area, the energy functions are calculated according to (13) and (14), and the class that minimizes the energy function is the final label of the current pixel.
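The per-pixel decision rule can be sketched as follows. The function name, signature, and binary label encoding (0 = building ω_B, 1 = nonbuilding) are illustrative; the data terms are taken as negative log-likelihoods so that, as stated above, the minimum-energy label wins.

```python
import numpy as np

def pixel_label(p_sar, p_opt, lam_sar, lam_opt, in_urban,
                u_sp=(0.0, 0.0), lam_e=1.0, lam_e_p=0.0, ep=1e-5):
    """Pick the label minimizing the energies of Eqs. (11)-(14) plus U_sp."""
    if in_urban:                        # Eqs. (11) and (12)
        exps = [(lam_sar + lam_e_p, lam_opt + lam_e_p),
                (lam_sar + lam_e,   lam_opt + lam_e)]
    else:                               # Eqs. (13) and (14)
        exps = [(lam_sar + 1.0 / ep, lam_opt + 1.0 / ep),
                (lam_sar + lam_e,   lam_opt + lam_e_p)]
    energies = [-(a_s * np.log(p_sar[k]) + a_o * np.log(p_opt[k])) + u_sp[k]
                for k, (a_s, a_o) in enumerate(exps)]
    return int(np.argmin(energies))
```

For a nonurban pixel, the huge exponent 1/ep inflates the energy of the building label even when the sensors slightly favor it, which is the veto behavior Eq. (13) is designed to produce.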