Decentralized Detection in Wireless Sensor Networks with Channel Fading Statistics

Existing channel aware signal processing designs for decentralized detection in wireless sensor networks typically assume the clairvoyant case, i.e., global information regarding the transmission channels is known at the design stage. In this paper, we consider the distributed detection problem where only the channel fading statistics, instead of the instantaneous channel state information (CSI), are available to the designer. We investigate the design of local decision rules for the following two cases: (1) the fusion center has the instantaneous CSI; (2) the fusion center does not have the instantaneous CSI. We show that, for both cases, the optimal local decision rules that minimize the error probability at the fusion center amount to a likelihood-ratio test (LRT), as in the previous work with known CSI. The proposed approach enables distributed design for a decentralized detection problem.


INTRODUCTION
While the study of decentralized decision making can be traced back to the early 1960s in the context of team decision problems (see, e.g., [1]), the effort has intensified significantly since the seminal work of [2]. Classical distributed detection [3][4][5][6][7], however, typically assumes error-free transmission between the local sensors and the fusion center. This is overly idealistic in emerging systems with stringent resource and delay constraints, such as wireless sensor networks (WSNs) with geographically dispersed low-power, low-cost sensor nodes. To account for nonideal transmission channels, channel aware signal processing for the distributed detection problem has been developed in [8][9][10]. The optimal local decision rule was still shown to be a monotone likelihood ratio partition of its observation space, provided the observations were conditionally independent across the sensors. It was noted recently that such optimality is preserved for a more general setting [11].
The work in [8][9][10] assumed a clairvoyant case, that is, global information regarding the transmission channels between the local sensors and the fusion center is available at the design stage. This approach is theoretically significant as it provides the best achievable detection performance, against which any suboptimal approach can be compared. However, its implementation requires exact knowledge of the global channel state information (CSI), which may be costly to acquire. In the case of fast fading channels, the sensor decision rules need to be synchronously updated for different channel states; this adds considerable overhead which may not be affordable in resource constrained systems.
To make the channel aware design more practical, the requirement of global CSI in the distributed signaling design needs to be relaxed. In the present work, only partial channel knowledge, instead of the global CSI, is assumed to be available. In the context of WSNs, a reasonable assumption is the availability of channel fading statistics, which may remain stationary for a sufficiently long period of time, so the decision rules need to be updated far less often. In this paper, we consider the distributed detection problem where the designer only has the channel fading statistics instead of the instantaneous CSI. In this case, a sensible performance measure is the average error probability at the fusion center, where the averaging is performed with respect to the channel state. We restrict ourselves to binary local sensor outputs and derive the necessary conditions for optimal local decision rules that minimize the average error probability at the fusion center for the following two cases: (1) CSIF: the fusion center has access to the instantaneous CSI. (2) NOCSIF: the fusion center does not have access to the instantaneous CSI. We note here that for the CSIF case, the design of sensor decision rules does not require CSI; the CSI is only used in the fusion rule design, which is reasonable given the typically generous resource budget at the fusion center. Its computation, however, is very involved and has to resort to exhaustive search. On the other hand, as elaborated later, the NOCSIF case can be reduced to the channel aware design where one averages the channel transition probability with respect to the fading channel using the known fading statistics.
EURASIP Journal on Wireless Communications and Networking
Figure 1: A block diagram for a wireless sensor network tasked with binary hypothesis testing with channel fading statistics.
We show that the sensor decision rules amount to local LRTs for both cases. Compared with the existing channel aware design based on CSI, the proposed approaches have an important practical advantage: the sensor decision rules remain the same for different CSI, as long as the fading statistics remain unchanged. This enables distributed design as no global CSI is used in determining the local decision rules. We also demonstrate through numerical examples that the proposed schemes suffer small performance loss compared with the CSI-based approach, as long as the CSI is available at the fusion center, that is, used in the fusion rule design.
The paper is organized as follows. Section 2 describes the system model and problem formulation. In Section 3, we establish, for both the CSIF and NOCSIF cases, the optimality of LRTs at local sensors for minimum average error probability at the fusion center. Numerical examples are presented in Section 4 to evaluate the performance of these two cases. Finally, we conclude in Section 5.

STATEMENT OF THE PROBLEM
Consider the problem of testing two hypotheses, denoted by $H_0$ and $H_1$, with respective prior probabilities $\pi_0$ and $\pi_1$. A total of $K$ sensors are used to collect observations $X_k$, for $k = 1, \ldots, K$. We assume throughout this paper that the observations are conditionally independent, that is,
\[ p(X_1, \ldots, X_K \mid H_i) = \prod_{k=1}^{K} p(X_k \mid H_i), \quad i = 0, 1. \tag{1} \]
Without this assumption, the distributed detection design becomes much more complicated from an algorithmic viewpoint [12]. The optimal design is not completely understood even in the simplest case [13]. Upon observing $X_k$, each local sensor makes a binary decision
\[ U_k = \gamma_k(X_k) \in \{0, 1\}, \quad k = 1, \ldots, K. \tag{2} \]
The decisions $U_k$, collected in $u = \{U_1, \ldots, U_K\}$, are sent to a fusion center through parallel transmission channels characterized by
\[ p(y \mid u, g) = \prod_{k=1}^{K} p(Y_k \mid U_k, g_k), \tag{3} \]
where $g = \{g_1, \ldots, g_K\}$ represents the CSI. Thus, from (3), the channels are orthogonal to each other, which can be achieved through, for example, partitioning time, frequency, or combinations thereof. For the CSIF case, the fusion center takes both the channel output $y = \{Y_1, \ldots, Y_K\}$ and the CSI, $g$, and makes a final decision using the optimal fusion rule to obtain $U_0 \in \{H_0, H_1\}$,
\[ U_0 = \gamma_0(y, g). \tag{4} \]
For the case of NOCSIF, the fusion output depends on the channel output and the channel fading statistics,
\[ U_0 = \gamma_0(y), \tag{5} \]
where the dependence on the fading channel statistics is implicit in the above expression. An error happens if $U_0$ differs from the true hypothesis. Thus, the error probability at the fusion center, conditioned on a given $g$, is
\[ P_{e0}(\gamma_0, \ldots, \gamma_K \mid g) = \Pr(U_0 \neq H \mid g), \tag{6} \]
where $H$ is the true hypothesis. Our goal is, therefore, to design the optimal mapping $\gamma_k(\cdot)$ for each sensor and the fusion center that minimizes the average error probability, defined as
\[ P_{e0}(\gamma_0, \ldots, \gamma_K) = \int_g P_{e0}(\gamma_0, \ldots, \gamma_K \mid g)\, p(g)\, dg, \tag{7} \]
where $p(g)$ is the distribution of the CSI. A simple diagram illustrating the model is given in Figure 1.
We point out here that integrating the transmission channels into the fusion rule design has been investigated before [14,15]. The optimal fusion rule in the Bayesian sense amounts to the maximum a posteriori probability (MAP) decision, that is,
\[ U_0 = \arg\max_{H_i,\, i \in \{0, 1\}} P(H_i \mid y, g) \]
for the CSIF case, with $P(H_i \mid y, g)$ replaced by $P(H_i \mid y)$ for the NOCSIF case. Given specified local sensor signaling schemes and the channel characterization, this MAP decision rule can be obtained in a straightforward manner. As such, we will focus on the local decision rule design without further elaborating on the optimal fusion rule design.

DESIGN OF OPTIMAL LOCAL DECISION RULES
As in [9,10], we adopt in the following a person-by-person optimization (PBPO) approach, that is, we optimize the local decision rule for the kth sensor given fixed decision rules at all other sensors and a fixed fusion rule. As such, the conditions obtained are necessary, but not sufficient, for optimality. With $P(u \mid H_i) = \prod_{k=1}^{K} P(U_k \mid H_i)$ and $P(U_k \mid H_i) = \int P(U_k \mid X_k)\, p(X_k \mid H_i)\, dX_k$, the average error probability at the fusion center is
\[ P_{e0} = \int_g \sum_{i=0}^{1} \pi_i \int_y P(U_0 = 1 - i \mid y, g) \sum_{u} p(y \mid u, g)\, P(u \mid H_i)\, dy\; p(g)\, dg, \tag{10} \]
where, different from the CSI-based channel aware design, the local decision rules $P(U_k \mid X_k)$ do not depend on the instantaneous CSI. Next, we will derive the optimal decision rules by further expanding the error probability with respect to the kth decision rule $\gamma_k(\cdot)$ for the two different cases.

The CSIF case
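We first consider the case where the fusion center knows the instantaneous CSI. Define, for $k = 1, \ldots, K$ and $i = 0, 1$,
\[ u^{ki} = [U_1, \ldots, U_{k-1}, U_k = i, U_{k+1}, \ldots, U_K], \]
and let $u^k = [U_1, \ldots, U_{k-1}, U_{k+1}, \ldots, U_K]$ collect the decisions of all sensors other than the kth. We can expand the average error probability in (10) with respect to the kth decision rule $\gamma_k(\cdot)$, and we get
\[ P_{e0} = \int_{X_k} P(U_k = 1 \mid X_k) \left[ \pi_0\, p(X_k \mid H_0)\, A_k - \pi_1\, p(X_k \mid H_1)\, B_k \right] dX_k + C, \tag{12} \]
where
\[ C = \sum_{i=0}^{1} \pi_i \int_g \int_y P(U_0 = 1 - i \mid y, g) \sum_{u^k} p(y \mid u^{k0}, g)\, P(u^k \mid H_i)\, dy\; p(g)\, dg \]
is a constant with regard to $U_k$, and
\[ A_k = \int_g \int_y P(U_0 = 1 \mid y, g) \sum_{u^k} \left[ p(y \mid u^{k1}, g) - p(y \mid u^{k0}, g) \right] P(u^k \mid H_0)\, dy\; p(g)\, dg, \tag{14} \]
\[ B_k = \int_g \int_y P(U_0 = 0 \mid y, g) \sum_{u^k} \left[ p(y \mid u^{k0}, g) - p(y \mid u^{k1}, g) \right] P(u^k \mid H_1)\, dy\; p(g)\, dg. \tag{15} \]
To minimize $P_{e0}$, one can see from (12) that the optimal decision rule for the kth sensor is
\[ P(U_k = 1 \mid X_k) = \begin{cases} 1, & \pi_0\, p(X_k \mid H_0)\, A_k < \pi_1\, p(X_k \mid H_1)\, B_k, \\ 0, & \text{otherwise}. \end{cases} \]
Let us further take a look at $A_k$ in (14). We can rewrite it as
\[ A_k = P(U_0 = 1 \mid U_k = 1, H_0) - P(U_0 = 1 \mid U_k = 0, H_0), \]
where both probabilities are averaged over the CSI $g$. Then, as shown in Appendix A, $A_k > 0$ as long as
\[ L(U_k) = \frac{P(U_k \mid H_1)}{P(U_k \mid H_0)} \]
is a monotone increasing function of $U_k$ (monotone LR index assignment), that is,
\[ L(U_k = 1) > L(U_k = 0). \]
A similar argument gives $B_k > 0$, so the optimal rule above reduces to an LRT on the local observation (Theorem 1):
\[ U_k = \gamma_k(X_k) = \begin{cases} 1, & \dfrac{p(X_k \mid H_1)}{p(X_k \mid H_0)} > \dfrac{\pi_0 A_k}{\pi_1 B_k}, \\ 0, & \text{otherwise}, \end{cases} \tag{20} \]
where $A_k$ and $B_k$ are defined in (14) and (15), respectively.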

An alternative derivation, along the line of [11], is given in Appendix B.
Although the optimal local decision rule for each local sensor is explicitly formulated in (20), it is not amenable to direct numerical evaluation: in (14) and (15), the integrand involves the fusion rule that is a highly nonlinear function of the CSI g, making the integration formidable. The only possible way of finding the optimal local decision rules appears to be an exhaustive search, whose complexity becomes prohibitive when K is large.
This numerical challenge motivates an alternative approach: instead of minimizing the average error probability directly, one can first marginalize the channel transition probability and then apply the standard channel aware design [8][9][10]. That is, we first compute $p(y \mid u)$ by marginalizing out the channel $g$ using the channel fading statistics:
\[ p(y \mid u) = \int_g p(y \mid u, g)\, p(g)\, dg. \tag{21} \]
With this marginalization, we can apply the channel aware design approach [9] to the "averaged" transmission channel. The difference between this alternative approach and that of minimizing (7) is in the way that the channel fading statistics, $p(g)$, are utilized. In using (7), the variable to be averaged is $P_{e0}(\gamma_0, \ldots, \gamma_K \mid g)$, which is a highly nonlinear function of the decision rules, whereas the alternative approach uses $p(g)$ to obtain the marginalized channel transition probability, thus enabling the direct application of the channel aware approach. This difference can also be explained using Figure 1. The alternative approach averages each transmission channel over the statistics of its own channel gain $g_i$ to obtain $p(Y_i \mid U_i)$, while the CSIF case averages all the transmission channels and the fusion center (the part in the dashed box in Figure 1) over the channel statistics $p(g)$ to obtain
\[ P(U_0 \mid u) = \int_g \int_y P(U_0 \mid y, g)\, p(y \mid u, g)\, p(g)\, dy\, dg. \tag{22} \]
It turns out that this alternative approach is a direct consequence of the NOCSIF case, as elaborated below.

The NOCSIF case
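Here we consider the case where the fusion center does not know the instantaneous CSI. Therefore, we have
\[ P(U_0 \mid y, g) = P(U_0 \mid y). \]
As the fusion rule no longer depends on $g$, the average error probability in (10) can be rewritten as
\[ P_{e0} = \sum_{i=0}^{1} \pi_i \int_y P(U_0 = 1 - i \mid y) \sum_{u} \left( \int_g p(y \mid u, g)\, p(g)\, dg \right) P(u \mid H_i)\, dy, \]
where integration with respect to $g$ is carried out first. Notice that the term $\left( \int_g p(y \mid u, g)\, p(g)\, dg \right)$ precisely describes the marginalized transmission channels (cf. (21)). This leads to the standard channel aware design where the transmission channels are characterized by $p(y \mid u)$. From [9], we have a result resembling that of Theorem 1 except that $A_k$, $B_k$, and $C$ are replaced by $A'_k$, $B'_k$, and $C'$, which are obtained by using the marginalized channel $p(y \mid u)$ and the fusion rule $P(U_0 \mid y)$ in place of their CSI-dependent counterparts. Contrary to the CSIF case, $A'_k$ and $B'_k$ for the NOCSIF case are much easier to evaluate.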
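To make the marginalization concrete, the following minimal sketch assumes on-off signaling over a Rayleigh fading channel, $Y = gU + W$ with $g \sim \mathcal{CN}(0, \sigma_g^2)$ and $W \sim \mathcal{CN}(0, \sigma_w^2)$. Under that assumption, averaging over $g$ gives $Y \mid U = u \sim \mathcal{CN}(0, u\sigma_g^2 + \sigma_w^2)$, so $|Y|^2$ is exponentially distributed. The function and parameter names (`marginal_pdf`, `sigma_g2`, `sigma_w2`) are illustrative and not taken from the paper.

```python
import math
import random

def marginal_pdf(z, u, sigma_g2=1.0, sigma_w2=1.0):
    """Density of z = |Y|^2 given U = u after averaging over the fading gain.

    Assumes Y = g*U + W with g ~ CN(0, sigma_g2), W ~ CN(0, sigma_w2) and
    u in {0, 1}, so |Y|^2 is exponential with mean u*sigma_g2 + sigma_w2.
    """
    mean = u * sigma_g2 + sigma_w2
    return math.exp(-z / mean) / mean

def complex_gaussian(var, rng):
    """Draw one sample from CN(0, var): real and imaginary parts are N(0, var/2)."""
    s = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def monte_carlo_mean_power(u, sigma_g2=1.0, sigma_w2=1.0, n=100_000, seed=0):
    """Estimate E[|Y|^2 | U = u] by sampling the channel directly."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        y = complex_gaussian(sigma_g2, rng) * u + complex_gaussian(sigma_w2, rng)
        total += abs(y) ** 2
    return total / n
```

Comparing `monte_carlo_mean_power(1, 1.0, 0.5)` with the closed-form mean `1.0 + 0.5` is a quick sanity check that the marginalized channel matches the simulated one.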

PERFORMANCE EVALUATION
In this section, we first use a two-sensor example to evaluate the performance of both the CSIF and NOCSIF cases and compare them with the clairvoyant case where the global channel information is known to the designer. For convenience, we call the clairvoyant case the CSI case. Consider the detection of a known signal $S$ in zero-mean complex Gaussian noises that are independent and identically distributed (i.i.d.) for the two sensors, that is, for $k = 1, 2$,
\[ H_0: X_k = N_k, \qquad H_1: X_k = S + N_k, \]
with $N_1$ and $N_2$ being i.i.d. $\mathcal{CN}(0, \sigma_1^2)$. Without loss of generality, we assume $S = 1$ and $\sigma_1^2 = 2$. Each sensor makes a binary decision based on its observation $X_k$, and then transmits it through a Rayleigh fading channel to the fusion center. Notice that $U_k \in \{0, 1\}$ implies on-off signaling, thus enabling detection at the fusion center in the absence of CSI (i.e., the NOCSIF case). The channel output is
\[ Y_k = g_k U_k + W_k, \quad k = 1, 2, \]
where $g_1$, $g_2$ are independently distributed with zero-mean complex Gaussian distributions $\mathcal{CN}(0, \sigma_{g1}^2)$ and $\mathcal{CN}(0, \sigma_{g2}^2)$, respectively, and $W_1$, $W_2$ are i.i.d. zero-mean complex Gaussian noises with distribution $\mathcal{CN}(0, \sigma_2^2)$. We first consider a symmetric case where the CSI is identically distributed, that is, $\sigma_{g1}^2 = \sigma_{g2}^2$. Without loss of generality, we assume $\sigma_{g1}^2 = \sigma_{g2}^2 = 1$. In Figure 2, with the assumption of equal prior probabilities, the average error probabilities as a function of the average signal-to-noise ratio (SNR) of the received signal at the fusion center are plotted for the CSIF and NOCSIF cases, along with the CSI case. We also plot a curve, labeled "CSIF1" in Figure 2, where the local sensors use thresholds obtained via the NOCSIF approach but the fusion center implements a fusion rule that utilizes the CSI $g$. The motivation is twofold. First, estimating $g$ at the fusion center is typically feasible. Second and more importantly, the threshold design for NOCSIF is much simpler compared with CSIF, as explained in Section 3.1.
B. Liu and B. Chen
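A CSI-aided fusion rule of the kind used for the CSIF1 curve can be sketched as follows. This is an illustrative MAP rule under the on-off model $Y_k = g_k U_k + W_k$ with $W_k \sim \mathcal{CN}(0, \sigma_w^2)$, not the authors' implementation; the names `map_fusion_with_csi`, `pd`, and `pf` (the local detection and false alarm probabilities fixed by the sensor thresholds) are assumptions made here.

```python
import math

def cn_pdf(y, mean, var):
    """Density of a circular complex Gaussian CN(mean, var) evaluated at y."""
    return math.exp(-abs(y - mean) ** 2 / var) / (math.pi * var)

def map_fusion_with_csi(y, g, pd, pf, sigma_w2=1.0, pi1=0.5):
    """Return 1 (decide H1) or 0 (decide H0) from channel outputs y and gains g.

    pd[k] = P(U_k = 1 | H1) and pf[k] = P(U_k = 1 | H0) are the local sensor
    operating points; the channels are assumed orthogonal, so the per-sensor
    log-likelihood ratios add up.
    """
    log_lr = math.log(pi1) - math.log(1.0 - pi1)
    for yk, gk, pdk, pfk in zip(y, g, pd, pf):
        on = cn_pdf(yk, gk, sigma_w2)         # p(Y_k | U_k = 1, g_k)
        off = cn_pdf(yk, 0.0, sigma_w2)       # p(Y_k | U_k = 0, g_k)
        num = pdk * on + (1.0 - pdk) * off    # p(Y_k | H1, g_k)
        den = pfk * on + (1.0 - pfk) * off    # p(Y_k | H0, g_k)
        log_lr += math.log(num) - math.log(den)
    return 1 if log_lr > 0 else 0
```

For example, with gains `g = [1 + 0j, 0.5j]` and operating points `pd = [0.8, 0.8]`, `pf = [0.2, 0.2]`, a received vector equal to `g` favors $H_1$, while an all-zero received vector favors $H_0$.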
In evaluating the performance of CSIF case, we consider the following two methods to get the optimal thresholds for local sensors.

Exhaustive search method
We first partition the space of likelihood ratios of the observations into many small disjoint cells. For each cell, we compute the average error probability at the fusion center by using the center point of the cell as the local thresholds (for small enough cell size, the center point can be considered "representative" of the whole cell). After evaluating the performance for all the cells, we choose the one with the smallest average error probability as our optimal thresholds. Intuitively, one can get arbitrarily close to the optimal thresholds by decreasing the cell size, at the cost of a growing number of cells to evaluate and hence higher computational complexity.
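The search over cells can be sketched in a deliberately simplified setting: a single sensor, the NOCSIF fusion rule, equal priors, and a threshold applied to the real part of the observation (which is equivalent to an LRT for a known real signal in complex Gaussian noise). The function names and the numerical integration grid are assumptions of this sketch; for $K$ sensors, the candidate list would instead enumerate all combinations of per-sensor grid points.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def avg_error_nocsif(t, s=1.0, sigma1_2=2.0, sigma_g2=1.0, sigma_w2=1.0,
                     pi1=0.5, n_grid=4000):
    """Fusion-center error probability for one sensor thresholding Re(X) at t."""
    std = math.sqrt(sigma1_2 / 2.0)          # std of the real part of the noise
    pf = q_func(t / std)                     # P(U = 1 | H0)
    pd = q_func((t - s) / std)               # P(U = 1 | H1)
    m0, m1 = sigma_w2, sigma_w2 + sigma_g2   # mean of |Y|^2 for U = 0 and U = 1
    dz = 40.0 * m1 / n_grid                  # integrate |Y|^2 over [0, 40*m1]
    pe = 0.0
    for i in range(n_grid):
        z = (i + 0.5) * dz                   # midpoint rule
        f0 = math.exp(-z / m0) / m0
        f1 = math.exp(-z / m1) / m1
        p_h0 = pf * f1 + (1.0 - pf) * f0     # marginalized p(z | H0)
        p_h1 = pd * f1 + (1.0 - pd) * f0     # marginalized p(z | H1)
        pe += min((1.0 - pi1) * p_h0, pi1 * p_h1) * dz  # MAP fusion error
    return pe

def exhaustive_search(cost, candidates):
    """Evaluate cost at every candidate and return (best_point, best_cost)."""
    best = min(candidates, key=cost)
    return best, cost(best)

grid = [-1.0 + 0.05 * i for i in range(61)]  # cell centers on [-1, 2]
best_t, best_pe = exhaustive_search(avg_error_nocsif, grid)
```

In this one-sensor, equal-prior setting the search should settle on the midpoint threshold $t = S/2$, which gives a simple way to check the routine before moving to multiple sensors.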
The greedy search method is much faster than the exhaustive search method since we do not need to compute the average error probability corresponding to every point. However, it can only guarantee convergence to a local minimum.
As seen from Figure 2, the CSI case has the best performance and NOCSIF case has the worst performance. This is true because the designer has the most information in the clairvoyant case and the least information in NOCSIF case. The CSIF case is only slightly worse than the CSI case but is much better than the NOCSIF case. The difference of CSIF1 and CSIF is almost indistinguishable. The explanation is that the performance is much more sensitive to the fusion rule than to the local sensor thresholds. This phenomenon has been observed before: the error probability versus threshold plot is rather flat near the optimum point, hence is robust to small changes in thresholds.
We also consider an asymmetric case where $\sigma_{g1}^2 \neq \sigma_{g2}^2$. As plotted in Figure 3, where $\sigma_{g1}^2 = 1$ and $\sigma_{g2}^2 = 3$, all three cases with known CSI at the fusion center have similar performance and are much better than the NOCSIF case. This is consistent with the symmetric case.
As we stated above, for the system with only two local sensors, a mixed approach (CSIF1) achieves almost the same performance as the CSIF case. In the following, we show that the same holds true even in the large system regime, that is, as $K$ increases. We first consider the Bayesian framework. As $K$ goes to infinity, all the local sensors use the same local decision rule [16] and the optimal local thresholds are determined by maximizing the Chernoff information to achieve the best error exponent,
\[ C = -\min_{0 \le s \le 1} \log \int p(Y \mid H_0)^{s}\, p(Y \mid H_1)^{1-s}\, dY. \]
For simplicity, we consider another performance measure, Bhattacharyya's distance, which is an approximation of the Chernoff information obtained by setting $s = 1/2$,
\[ B = -\log \int \sqrt{p(Y \mid H_0)\, p(Y \mid H_1)}\, dY. \]
From Figure 4, which gives Bhattacharyya's distance as a function of the local threshold for both the CSIF and NOCSIF cases with $\sigma_2^2 = 2$ and $\sigma_g^2 = 1$, we can observe that the optimal threshold obtained in the NOCSIF case is close to that of the CSIF case.
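The dependence of Bhattacharyya's distance on the local threshold can be explored numerically. The sketch below covers only the NOCSIF case and assumes the on-off Rayleigh model, in which $|Y|^2$ is a sufficient statistic with an exponential density given $U$; the function name, the default parameters, and the midpoint integration grid are choices made for this illustration.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bhattacharyya_nocsif(t, s=1.0, sigma1_2=2.0, sigma_g2=1.0, sigma_w2=2.0,
                         n_grid=4000):
    """Bhattacharyya distance between p(z | H0) and p(z | H1), z = |Y|^2,
    when the sensor thresholds Re(X) at t (NOCSIF, marginalized channel)."""
    std = math.sqrt(sigma1_2 / 2.0)
    pf = q_func(t / std)                     # P(U = 1 | H0)
    pd = q_func((t - s) / std)               # P(U = 1 | H1)
    m0, m1 = sigma_w2, sigma_w2 + sigma_g2   # mean of |Y|^2 for U = 0 and U = 1
    dz = 40.0 * m1 / n_grid
    coeff = 0.0
    for i in range(n_grid):
        z = (i + 0.5) * dz                   # midpoint rule
        f0 = math.exp(-z / m0) / m0
        f1 = math.exp(-z / m1) / m1
        p_h0 = pf * f1 + (1.0 - pf) * f0
        p_h1 = pd * f1 + (1.0 - pd) * f0
        coeff += math.sqrt(p_h0 * p_h1) * dz  # Bhattacharyya coefficient
    return -math.log(coeff)
```

Evaluating `bhattacharyya_nocsif(t)` over a grid of thresholds and picking the maximizer gives the asymptotically preferred threshold under this criterion; a threshold far from the signal level, such as `t = 3.0`, yields a much smaller distance than `t = 0.5`.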
Alternatively, under the Neyman-Pearson framework, the Kullback-Leibler (KL) distance gives the asymptotic error exponent,
\[ D\big(p(Y \mid H_0) \,\|\, p(Y \mid H_1)\big) = \int p(Y \mid H_0) \log \frac{p(Y \mid H_0)}{p(Y \mid H_1)}\, dY. \]
In Figure 5, with the same setting as in Figure 4, the KL distance as a function of the local threshold is plotted for both the CSIF and NOCSIF cases, and we can also observe that the optimal local thresholds obtained in the NOCSIF and CSIF cases are close to each other.

CONCLUSIONS
In this paper, we investigated the design of the distributed detection problem with only channel fading statistics available to the designer. Restricted to conditionally independent observations and binary local sensor decisions, we derived the necessary conditions for optimal local sensor decision rules that minimize the average error probability for the CSIF and NOCSIF cases. Numerical results indicate that a mixed approach, where the sensors use the decision rules from the NOCSIF approach while the fusion center implements a fusion rule using the CSI, achieves almost identical performance to that of the CSIF case.

A. PROOF OF
Since the observations, $X_1, \ldots, X_K$, are conditionally independent and the local decision in each sensor depends only on its own observation, the local decisions are also conditionally independent for a given hypothesis. In addition, the local decisions are transmitted through orthogonal channels, thus the channel output for one sensor is conditionally independent of the channel input from another sensor. Therefore, Defining the likelihood ratio function for the local decision at the kth sensor, which is a monotone increasing function of $L(U_k)$. Then, given a monotone LR index assignment for the local output, that is, $L(U_k = 1) > L(U_k = 0)$, $P(H_1 \mid y^k, U_k)$ is a monotone increasing function of $U_k$. Thus, Similarly, we can get The optimum fusion rule is a maximum a posteriori rule (i.e., to minimize error probability). Thus deciding $U_0 = 1$ for a given $y^k$ and $U_k = 0$ requires From (A.5) and (A.6), (A.7) implies Therefore, we must have $U_0 = 1$ for the same $y^k$ with $U_k = 1$. Thus and further, Similarly, one can show and further,
Assuming the cost that we want to minimize at the fusion center can be written as
\[ J = E\big[ C(\gamma_0(y, g), u, H) \big], \tag{B.3} \]
we further have
\[ \begin{aligned} J &= E\big[ C(\gamma_0(y, g), u, H) \big] \\ &= E\big[ C(\gamma_0(Y_k, y^k, g_k, g^k), \gamma_k(X_k), u^k, H) \big] \\ &= E\Big\{ E\big[ C(\gamma_0(Y_k, y^k, g_k, g^k), \gamma_k(X_k), u^k, H) \mid u^k, y^k, g_k, g^k, X_k, H \big] \Big\} \\ &= E\Big\{ E\big[ C(\gamma_0(Y_k, y^k, g_k, g^k), \gamma_k(X_k), u^k, H) \mid X_k, g_k \big] \Big\} \\ &= E\Big\{ \int_{Y_k} C(\gamma_0(Y_k, y^k, g_k, g^k), \gamma_k(X_k), u^k, H)\, p(Y_k \mid \gamma_k(X_k), g_k)\, dY_k \Big\}, \end{aligned} \tag{B.4} \]
where $g^k = [g_1, \ldots, g_{k-1}, g_{k+1}, \ldots, g_K]$ and (B.4) follows from the fact that, conditioned on $\gamma_k(X_k)$ and $g_k$, the channel output $Y_k$ is independent of everything else.

Defining $X = X_k$, $z = (y^k, u^k, g)$,
\[ F(U_k, z, H) = \int_{Y_k} C(\gamma_0(Y_k, y^k, g), U_k, u^k, H)\, p(Y_k \mid U_k, g_k)\, dY_k \tag{B.5} \]
and substituting them into (B.1), we obtain the optimal local decision rule for the kth sensor: $\gamma_k(X_k) = \arg\min$ In the approach proposed in this paper, the objective is to minimize the average error probability at the fusion center,
\[ J = \int_g P\big(\gamma_0(y, g) \neq H\big)\, p(g)\, dg. \tag{B.12} \]
Thus we have
\[ \alpha_k(H_j, U_k) = \int_y \int_g P(U_0 = 1 - j \mid y, g) \sum_{u^k} p(y \mid u^k, U_k, g)\, p(g)\, P(u^k \mid H_j)\, dg\, dy. \]