In this section, we present a novel privacy-preserving crowdsourcing-based site survey scheme which can estimate the distribution of the WiFi signal strengths at each specific location without leaking the privacy of each supplier. The proposed scheme consists of four phases which are detailed as follows.

### 5.1 Preparation and initiation

In this phase, *n* suppliers and an aggregator form a site survey group. Within this site survey group, every supplier \(u_{i}\) generates its public key \(\text{PK}_{i}\) and private key \(\text{PR}_{i}\) using the Paillier cryptosystem and then sends the public key \(\text{PK}_{i}\) to other suppliers and the aggregator. The above process can be executed offline and only needs to be performed once. If the aggregator wants to estimate the mean and variance of the WiFi signal strengths from the *j*th AP at location \(l_{s}\), it sends a request with \(\langle \text{AP}_{j}, l_{s}\rangle\) to every supplier in this group.

### 5.2 Adding noises

After receiving the aggregator’s request, every supplier \(u_{i}\) first queries its local dataset \(V_{i}\) to get a tuple \((m_{i},v_{i}^{sj})\), where \(m_{i}\) indicates whether the supplier \(u_{i}\) visited location \(l_{s}\) before and \(v_{i}^{sj}\) is the measured WiFi signal strength of the *j*th AP at location \(l_{s}\). If the supplier \(u_{i}\) visited location \(l_{s}\) before, \(m_{i}\) is set to 1 and \(v_{i}^{sj}\) is set to the measured WiFi signal strength. If the supplier \(u_{i}\) never visited location \(l_{s}\) before, \(m_{i}\) and \(v_{i}^{sj}\) are both set to 0.
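The lookup described above can be sketched as follows. This is only an illustration: the dictionary layout of the local dataset, the identifier strings, and the function name `query_local_dataset` are assumptions made here, not part of the scheme's specification.

```python
from typing import Dict, Tuple

# Hypothetical layout of a supplier's local dataset V_i:
# (location id, AP id) -> measured signal strength in dBm.
LocalDataset = Dict[Tuple[str, str], float]


def query_local_dataset(dataset: LocalDataset, location: str, ap: str) -> Tuple[int, float]:
    """Return the tuple (m_i, v_i^{sj}) for a request <AP_j, l_s>."""
    if (location, ap) in dataset:
        return 1, dataset[(location, ap)]  # visited: report the measurement
    return 0, 0.0                          # never visited: both entries are 0


V_i = {("l_3", "AP_1"): -62.0, ("l_7", "AP_2"): -80.5}
print(query_local_dataset(V_i, "l_3", "AP_1"))  # a visited location
print(query_local_dataset(V_i, "l_9", "AP_1"))  # an unvisited location
```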

To ensure that the presence or absence of the supplier \(u_{i}\) in the site survey group will not significantly increase her chance of being compromised (i.e., to achieve *ε*-differential privacy), every supplier \(u_{i}\) adds appropriately chosen random noises to \(v_{i}^{sj}\) and \(m_{i}\) as follows:

$$ v_{i}^{\prime }=v_{i}^{sj}+\mathcal{G}_{1}(n,\lambda_{1}) - \mathcal{G}_{2}(n,\lambda_{1}), $$

((7))

$$ m_{i}^{\prime }=m_{i} + \mathcal{G}_{3}(n,\lambda_{2}) - \mathcal{G}_{4}(n,\lambda_{2}), $$

((8))

where \(\mathcal {G}_{1}(n,\lambda _{1})\) and \(\mathcal {G}_{2}(n,\lambda _{1})\) are independent and identically distributed (i.i.d.) random variables having gamma distribution with probability density function (PDF)

$$ g(x,n,\lambda_{1})=\frac{(1/\lambda_{1})^{1/n}}{\Gamma (1/n)}x^{\frac{1}{n}-1}e^{-x/\lambda_{1} }, $$

((9))

and \(\mathcal {G}_{3}(n,\lambda _{2})\) and \(\mathcal {G}_{4}(n,\lambda _{2})\) are i.i.d. random variables having gamma distribution with PDF

$$ g(x,n,\lambda_{2})=\frac{(1/\lambda_{2})^{1/n}}{\Gamma (1/n)}x^{\frac{1}{n}-1}e^{-x/\lambda_{2} }. $$

((10))

Here \(\lambda_{1}=\Delta f_{1}/\varepsilon\) and \(\lambda_{2}=\Delta f_{2}/\varepsilon\), where \(\Delta f_{1}\) and \(\Delta f_{2}\) are the global sensitivities of the WiFi signal strength and of *m*, respectively. Since the WiFi signal strength ranges from −90 to 0 dBm and *m* ∈ {0,1}, we set \(\Delta f_{1}=90\) and \(\Delta f_{2}=1\). The parameter *ε* controls the trade-off between the desired privacy level and the data utility: a smaller *ε* yields a stronger privacy guarantee but requires more noise. In the evaluation section, we will investigate the impact of *ε* on the data utility. We will prove in the next section that the proposed scheme achieves *ε*-differential privacy.
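A minimal sketch of the perturbation step, assuming Python's standard `random.gammavariate(alpha, beta)` with shape `alpha = 1/n` and scale `beta = λ`, which matches the PDFs in Eqs. (9) and (10). The function names, the seed, and the group size are illustrative choices. The final lines check empirically a standard property that the construction appears to rely on: the sum of *n* i.i.d. gamma variables with shape 1/*n* is exponential with mean *λ*, and the difference of two such independent exponentials is Laplace with scale *λ*, so the aggregate noise over the whole group should behave like a single Laplace draw. The formal privacy argument is the one deferred to the next section.

```python
import random


def gamma_difference(n: int, lam: float, rng: random.Random) -> float:
    """Sample G(n, lam) - G'(n, lam), both with shape 1/n and scale lam."""
    return rng.gammavariate(1.0 / n, lam) - rng.gammavariate(1.0 / n, lam)


def perturb(v: float, m: int, n: int, eps: float, rng: random.Random):
    """Return (v_i', m_i') following Eqs. (7) and (8)."""
    lam1 = 90.0 / eps  # Delta f_1 = 90
    lam2 = 1.0 / eps   # Delta f_2 = 1
    return v + gamma_difference(n, lam1, rng), m + gamma_difference(n, lam2, rng)


rng = random.Random(0)  # fixed seed, for reproducibility of the sketch only
n, eps = 20, 1.0
print(perturb(-62.0, 1, n, eps, rng))

# Aggregate noise that the whole group contributes to the sum of the m_i.
totals = [sum(gamma_difference(n, 1.0 / eps, rng) for _ in range(n))
          for _ in range(5000)]
mean_abs = sum(abs(t) for t in totals) / len(totals)
# For a Laplace variable with scale b, E|X| = b.
print(f"empirical E|total noise| = {mean_abs:.3f}, Laplace scale = {1.0 / eps:.3f}")
```

Each supplier's individual noise is small in expectation when *n* is large, which is why the per-supplier perturbation alone is combined with the hiding step of the next phase.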

### 5.3 Encrypting data

After adding noise to her data, every supplier employs secret sharing [23] and the Paillier cryptosystem to hide her data from the aggregator. For simplicity, we only demonstrate how to hide \(v_{i}^{\prime }\). The way to hide \(m_{i}^{\prime }\) is the same. Each supplier \(u_{i}\) first splits \(v_{i}^{\prime }\) into *n* random shares as follows:

$$ \begin{aligned} & u_{1}: v_{1}^{\prime} = v_{11}^{\prime} + v_{12}^{\prime} + \ldots + v_{1i}^{\prime} + \ldots + v_{1n}^{\prime} \mod \eta\\ & u_{2}: v_{2}^{\prime} = v_{21}^{\prime} + v_{22}^{\prime} + \ldots + v_{2i}^{\prime} + \ldots + v_{2n}^{\prime} \mod \eta\\ & \ldots\\ & u_{i}: v_{i}^{\prime} = v_{i1}^{\prime} + v_{i2}^{\prime} + \ldots + v_{ii}^{\prime} + \ldots + v_{in}^{\prime} \mod \eta\\ & \ldots\\ & u_{n}: v_{n}^{\prime} = v_{n1}^{\prime} + v_{n2}^{\prime} + \ldots + v_{ni}^{\prime} + \ldots + v_{nn}^{\prime} \mod \eta,\\ \end{aligned} $$

((11))

where *η* is a large integer. Then, each supplier \(u_{i}\) keeps \(v_{ii}^{\prime }\) for herself, encrypts every other share \(v_{ij}^{\prime }\) (\(j \neq i\)) using the public key of supplier \(u_{j}\), and then sends \(E_{\text {PK}_{j}}(v_{ij}^{\prime })\) to the aggregator. After the aggregator receives the encrypted shares from all the suppliers, she combines the shares that are encrypted under the same public key, using the additively homomorphic property of the Paillier cryptosystem, as follows:

$$ \begin{aligned} &V_{1} = E_{\text{PK}_{1}}({\underset{j\neq 1}{\sum }v_{j1}^{\prime}}) = \underset{j\neq 1}{ \prod}E_{\text{PK}_{1}}(v_{j1}^{\prime})\\ &V_{2} = E_{\text{PK}_{2}}({\underset{j\neq 2}{\sum }v_{j2}^{\prime}}) = \underset{j\neq 2}{ \prod}E_{\text{PK}_{2}}(v_{j2}^{\prime})\\ & {\ldots}\\ &V_{i} = E_{\text{PK}_{i}}({\underset{j\neq i}{\sum }v_{ji}^{\prime}}) = \underset{j\neq i}{ \prod}E_{\text{PK}_{i}}(v_{ji}^{\prime})\\ & {\ldots}\\ &V_{n} = E_{\text{PK}_{n}}({\underset{j\neq n}{\sum }v_{jn}^{\prime}}) = \underset{j\neq n}{ \prod}E_{\text{PK}_{n}}(v_{jn}^{\prime}).\\ \end{aligned} $$

((12))

Then, the aggregator sends \(V_{i}\) to the supplier \(u_{i}\). Every supplier \(u_{i}\) decrypts \(V_{i}\) using her private key \(\text{PR}_{i}\), adds her own share \(v_{ii}^{\prime }\) to \(D_{\text {PK}_{i}}(V_{i})\) to get \(V_{i}^{\prime } = \sum _{j=1}^{n}v_{ji}^{\prime }\) in plaintext, and sends \(V_{i}^{\prime }\) to the aggregator. Adding all \(V_{i}^{\prime }\) (\(1\leq i\leq n\)) together, the aggregator obtains \(V^{\prime } = \sum _{i=1}^{n}V_{i}^{\prime }\), which is equal to \(\sum _{i=1}^{n}v_{i}^{\prime }\) (modulo *η*) because every share \(v_{ji}^{\prime }\) is counted exactly once. We will prove its correctness in the next section. In the same way, every supplier can hide \(m_{i}^{\prime }\) from the others, while the aggregator still obtains \(M^{\prime } = \sum _{i=1}^{n}m_{i}^{\prime }\).
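The share-and-aggregate round above can be illustrated with a toy, insecure implementation (Python 3.8 or later). Several details are assumptions introduced here because the text does not fix them: the noisy real values are mapped to integers with a fixed-point factor `SCALE`, negative values are represented modulo `ETA`, the Paillier keys use tiny primes found by trial division with generator *N* + 1, and all helper names are hypothetical. Each Paillier modulus must exceed (*n* − 1)·*η* so that the homomorphic sum does not wrap around before it is reduced modulo *η*.

```python
import math
import random

ETA = 2 ** 24   # share modulus eta (toy size)
SCALE = 100     # assumed fixed-point factor (0.01 dBm resolution)


def next_prime(k: int) -> int:
    """Smallest prime >= k, by trial division (toy sizes only)."""
    while any(k % d == 0 for d in range(2, math.isqrt(k) + 1)):
        k += 1
    return k


def keygen(start: int):
    """Toy Paillier key pair with g = N + 1; returns (N, (lam, mu))."""
    p = next_prime(start)
    q = next_prime(p + 1)
    N = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return N, (lam, pow(lam, -1, N))


def encrypt(N: int, m: int, rng: random.Random) -> int:
    r = rng.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = rng.randrange(1, N)
    return (1 + m * N) * pow(r, N, N * N) % (N * N)


def decrypt(N: int, priv, c: int) -> int:
    lam, mu = priv
    return (pow(c, lam, N * N) - 1) // N * mu % N


def split(value: int, n: int, rng: random.Random):
    """Split value into n additive shares modulo ETA, as in Eq. (11)."""
    shares = [rng.randrange(ETA) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % ETA)
    return shares


def aggregate(noisy_values, rng: random.Random) -> float:
    """Return the sum of the noisy values without revealing any single one."""
    n = len(noisy_values)
    keys = [keygen(30_000 + 1_000 * i) for i in range(n)]  # N_i > (n - 1) * ETA
    # shares[i][j] is v'_{ij}: the share supplier i prepares for supplier j.
    shares = [split(round(v * SCALE) % ETA, n, rng) for v in noisy_values]
    total = 0
    for i in range(n):
        N_i, priv_i = keys[i]
        V_i = 1  # aggregator: product of ciphertexts under PK_i, Eq. (12)
        for j in range(n):
            if j != i:
                V_i = V_i * encrypt(N_i, shares[j][i], rng) % (N_i * N_i)
        # supplier i: decrypt, add her own share v'_{ii}, return V'_i
        total += (decrypt(N_i, priv_i, V_i) + shares[i][i]) % ETA
    total %= ETA
    if total >= ETA // 2:  # map the residue back to a signed value
        total -= ETA
    return total / SCALE


noisy = [-61.37, -58.02, 0.41, -70.25]
print(aggregate(noisy, random.Random(1)), sum(noisy))
```

In this sketch a single process plays every role, so it demonstrates only the arithmetic of Eqs. (11) and (12), not the message flow or the security of a real deployment.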

### 5.4 Estimating the parameters

After getting \(V^{\prime } = \sum _{i=1}^{n}V_{i}^{\prime }\) and \(M^{\prime } = \sum _{i=1}^{n}m_{i}^{\prime }\), the aggregator can estimate the mean (denoted as \(\mu^{\prime}\)) of the WiFi signal strengths of \(\text{AP}_{j}\) at location \(l_{s}\) as follows:

$$ \mu^{\prime} = \frac{V^{\prime}}{M^{\prime}} = \frac{\sum_{i=1}^{n}v_{i}^{\prime}}{\sum_{i=1}^{n}m_{i}^{\prime}}. $$

((13))

Since every supplier adds controlled noises to her data, the estimated mean \(\mu^{\prime}\) is not exactly the same as \(\mu = \sum _{i=1}^{n}v_{i}^{sj}/\sum _{i=1}^{n}m_{i}\). The estimation error is controlled by the parameter *ε*. We will investigate the impact of *ε* on the estimation errors and show that the localization accuracy when we use \(\mu^{\prime}\) is comparable with that when we use *μ* in most cases.

To estimate the variance \(\delta^{\prime}\), the aggregator sends \(\mu^{\prime}\) back to every supplier \(u_{i}\), who can then compute \(\delta_{i}\) as follows:

$$ \delta_{i}= \left\{\begin{array}{ll} (v_{i}^{sj}- \mu^{\prime})^{2}, & \text{if }m_{i} = 1 \\ \quad \quad 0, & \text{if }m_{i} = 0.\end{array}\right. $$

((14))

Following the same rules described above, every supplier \(u_{i}\) adds a random noise \(\mathcal {G}_{5}(n,\lambda _{3}) - \mathcal {G}_{6}(n,\lambda _{3})\) to \(\delta_{i}\) to get \(\delta _{i}^{\prime } = \delta _{i} + \mathcal {G}_{5}(n,\lambda _{3}) - \mathcal {G}_{6}(n,\lambda _{3})\), and then sends \(\delta _{i}^{\prime }\) to the aggregator using the secret sharing and encryption procedure of Section 5.3. Here, \(\mathcal {G}_{5}(n,\lambda _{3})\) and \(\mathcal {G}_{6}(n,\lambda _{3})\) are i.i.d. random variables having gamma distribution with PDF

$$ g(x,n,\lambda_{3})=\frac{(1/\lambda_{3})^{1/n}}{\Gamma (1/n)}x^{\frac{1}{n}-1}e^{-x/\lambda_{3} }, $$

((15))

where \(\lambda_{3}=90^{2}/\varepsilon\). Following the same rules above, the aggregator computes \(\sum _{i=1}^{n}\delta _{i}^{\prime }\) without learning any individual \(\delta _{i}^{\prime }\), and the variance of the WiFi signal strengths of \(\text{AP}_{j}\) at location \(l_{s}\) can be estimated by \(\delta ^{\prime } = \sum _{i=1}^{n}\delta _{i}^{\prime }/M^{\prime }\).
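The two estimation rounds can be sketched end to end as below. To keep the example short, plain sums stand in for the secret-sharing and encryption round of Section 5.3, and the synthetic readings, group size, seed, and function names are all assumptions made for illustration. Note that the variance noise has scale 90²/*ε*, so for moderate group sizes the variance estimate is expected to be much noisier than the mean estimate.

```python
import random


def gamma_diff(n: int, lam: float, rng: random.Random) -> float:
    """Difference of two i.i.d. gamma variables with shape 1/n and scale lam."""
    return rng.gammavariate(1.0 / n, lam) - rng.gammavariate(1.0 / n, lam)


def estimate(records, eps: float, rng: random.Random):
    """records: list of (m_i, v_i) tuples; returns (mu', delta')."""
    n = len(records)
    # Round 1: noisy sums V' and M' (plain sums replace the encrypted round).
    V = sum(v + gamma_diff(n, 90.0 / eps, rng) for _, v in records)
    M = sum(m + gamma_diff(n, 1.0 / eps, rng) for m, _ in records)
    mu = V / M  # Eq. (13)
    # Round 2: each supplier computes delta_i from mu' (Eq. (14)) and
    # perturbs it with noise of scale lambda_3 = 90^2 / eps.
    D = sum(((v - mu) ** 2 if m == 1 else 0.0)
            + gamma_diff(n, 90.0 ** 2 / eps, rng)
            for m, v in records)
    return mu, D / M


rng = random.Random(7)  # fixed seed, for reproducibility of the sketch only
# Synthetic group: 2000 suppliers visited the location, 1000 did not.
records = [(1, rng.gauss(-60.0, 4.0)) for _ in range(2000)] + [(0, 0.0)] * 1000
mu_est, var_est = estimate(records, eps=1.0, rng=rng)
print(f"estimated mean = {mu_est:.2f} dBm, estimated variance = {var_est:.2f}")
```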