Privacy-preserving crowdsourced site survey in WiFi fingerprint-based localization

Li, Shujun; Li, Hong; Sun, Limin

doi:10.1186/s13638-016-0624-2

Research
Open access
Published: 04 May 2016

EURASIP Journal on Wireless Communications and Networking volume 2016, Article number: 123 (2016)

Abstract

Site survey is typically an indispensable phase of WiFi fingerprint-based localization, which is regarded as one of the most promising techniques for indoor localization. However, the site survey can leak the location privacy of the participants who contribute their WiFi fingerprint measurements. In this paper, we propose a privacy-preserving site survey scheme for WiFi fingerprint-based localization. In the proposed scheme, we use homomorphic encryption to protect the location privacy of the participants involved in the site survey. Further, we employ the differential privacy model to ensure that the released data will not breach an individual’s location privacy regardless of whether she is present in or absent from the site survey group. We theoretically analyze the security of the proposed scheme and use simulation experiments on a real-world dataset to validate its efficiency.

1 Introduction

Due to the increasing demand for location-based services (LBSs) and the lack of GPS signals in indoor environments, indoor localization has attracted increasing attention in recent years. Researchers have proposed a wide range of approaches, among which WiFi fingerprint-based indoor localization is one of the most promising [1–4]. A typical WiFi fingerprint-based localization algorithm consists of two phases: the offline site survey phase and the online operating phase. In the offline site survey phase, the service provider collects the WiFi signal strengths from multiple access points (APs) at every location in an area of interest. In the online operating phase, a client to be localized measures the signal strengths from nearby APs at her current location, and algorithms such as k-nearest neighbors [1, 3] or probability-based algorithms [5] are then used to infer her location from the measured WiFi signal strengths.

The site survey is usually conducted in a crowdsourced way [2, 6, 7]. Suppliers recruited by the service provider measure the WiFi signal strengths of nearby APs when they visit places of interest to the service provider, and then send the measured WiFi signal strengths and the corresponding locations to the service provider. The service provider aggregates the data contributed by the suppliers to estimate the parameters used in the online operating phase. Which parameters need to be estimated depends on the algorithm used in the online operating phase: k-nearest neighbor-based algorithms require the mean WiFi signal strength of every AP at every location, while probability-based algorithms require both the mean and the variance of the signal strength of every AP. Crowdsourcing is an efficient way to conduct the site survey, but the measurements contributed by the suppliers inevitably leak their location privacy, because the service provider can infer the locations that the suppliers visit from the data they contribute. Existing research indicates that location traces can reveal information about individuals’ habits, interests, activities, and relationships [8, 9]. Consequently, the loss of location privacy can expose the suppliers to unwanted advertisements and location-based spam or scams, may cause them reputational or economic damage, and can even make them victims of blackmail or physical violence.

Several approaches have been proposed to address the privacy issues of indoor localization algorithms. In [10], Shu et al. studied the privacy issues in range-based localization algorithms and proposed a scheme to protect users’ privacy during the localization process. In [11], Wang et al. developed a privacy-preserving fuzzy localization scheme based on CSI (channel state information) fingerprints. These schemes were not designed for WiFi fingerprint-based localization algorithms; thus, they cannot be used to address the privacy issues studied in this paper. The work most closely related to ours is that of Li et al. [12], who proposed a privacy-preserving scheme to address the privacy issues of the online operating phase in WiFi fingerprint-based localization. However, they did not consider privacy leakage in the offline site survey phase.

In this paper, we propose a privacy-preserving site survey scheme that protects the suppliers’ location privacy in crowdsourcing-based site surveys for WiFi fingerprint-based localization while ensuring the usability of the aggregated result for the service provider. Under this scheme, all the suppliers involved in the site survey form a group and cooperate to hide their measurements from the service provider using homomorphic encryption. Further, every supplier releases her measurements in a differentially private manner to guarantee that the released data will not breach an individual’s location privacy regardless of whether she is present in or absent from the group. The contributions of this paper are summarized as follows:

To the best of our knowledge, this work is the first to address the privacy issues of the site survey in WiFi fingerprint-based localization algorithms.
We propose a privacy-preserving site survey scheme for WiFi fingerprint-based localization based on homomorphic encryption and the differential privacy model.
We theoretically analyze the security of the proposed scheme and carry out simulation experiments on a real-world dataset to evaluate the performance of our scheme.

The rest of the paper is organized as follows. We first discuss the related work and introduce the background. Then, we present the detailed design of our scheme and the security analysis. Finally, we report the evaluation results and conclude this paper.

2 Related work

Location privacy in LBSs has been widely studied in the literature. Broadly, existing work can be classified into two categories: privacy-preserving service request and privacy-preserving localization.

2.1 Privacy-preserving service request

In LBSs, users send their locations to the service provider to obtain services, which inevitably leaks their privacy. Many schemes have been proposed to protect users’ location privacy when they request location-based services. k-anonymity [13, 14] provides a form of plausible deniability by ensuring that a client cannot be individually identified within a group of k clients. Mix zone-based schemes [15] divide the whole region into application zones and mix zones; clients report their locations in application zones and receive new, unused pseudonyms in mix zones. Cryptography-based approaches [16, 17] protect users’ location privacy using secure multiparty computation protocols. Since all the above schemes focus on the privacy issues that arise when users request location services, they cannot be used to address the privacy issues discussed in this paper.

2.2 Privacy-preserving localization

To address location privacy issues in localization, Shu et al. [10] studied the privacy leakage problem of range-based localization algorithms and proposed a scheme that prevents the leakage of the location information of both the target and the anchors. Wang et al. [11] developed a privacy-preserving fuzzy localization scheme with CSI fingerprints using homomorphic encryption and fuzzy logic. These schemes were not designed for WiFi fingerprint-based localization; thus, they cannot be used to address the privacy issues presented in this paper. Li et al. [12] studied the privacy issues in WiFi fingerprint-based localization and proposed a privacy-preserving scheme to protect the privacy of both the users and the service provider during the online operating phase. However, they did not consider privacy leakage during the site survey phase.

3 Background

3.1 WiFi fingerprint-based localization

The process of WiFi fingerprint-based localization can be divided into two phases: the offline site survey phase and the online operating phase. In the offline site survey phase, a supplier $u_i$ recruited by the service provider measures the WiFi signal strengths ${V_{s}^{i}}$ of nearby APs when she visits a place $l_s$ and sends $(l_{s}, {V_{s}^{i}})$ to the service provider, which aggregates the measurements and estimates the parameters used in the online operating phase. In the online operating phase, a user to be localized measures the WiFi signal strengths at her current location, denoted as $V^{\prime }=\left (v_{1}^{\prime },v_{2}^{\prime },\ldots,v_{j}^{\prime },\ldots,v_{N}^{\prime }\right)$, where N is the number of APs. Then, the service provider uses k-nearest neighbors or probability-based algorithms to determine the location of the user.

In k-nearest neighbor-based algorithms [1, 3], the service provider estimates the average WiFi signal strengths $\overline {V_{s}}$ at every location $l_s$ from the suppliers’ measurements and stores $(l_{s}, \overline {V_{s}})$ in the WiFi fingerprint database. In the online operating phase, the k nearest neighbors of $V'$ are identified in the database to estimate the location of the user. In probability-based algorithms [18, 19], the location loc of the user is determined using Bayes’ theorem, assuming that the signal strengths from different APs are conditionally independent given the location (the evidence $P(V')$ does not depend on l and is dropped):

$$ \begin{aligned} \text{loc} &=\underset{l\in L}{\arg \max }\,P\left(l|V^{\prime}\right) \\ &= \underset{l\in L}{\arg \max }\,P\left(l|v_{1}^{\prime},v_{2}^{\prime},\ldots,v_{N}^{\prime}\right)\\ &=\underset{l\in L}{\arg \max }\left(P\left(l\right)\cdot \prod_{j=1}^{N}P(v_{j}^{\prime}|l)\right). \end{aligned} \tag{1} $$

A common assumption is that the signal strength of $\text{AP}_i$ at location l follows a normal distribution with mean μ and variance δ. The parameters μ and δ are estimated by the service provider from the suppliers’ measurements.
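The decision rule in Eq. (1) combined with the Gaussian assumption can be sketched as follows. This is a minimal illustration, not the implementation used in [18, 19]: the fingerprint-database layout, the function names, and the uniform default prior are assumptions. Scores are summed in log space so that the product over N APs does not underflow.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical fingerprint database: location label -> (per-AP means, per-AP variances).
FingerprintDB = Dict[str, Tuple[List[float], List[float]]]


def gaussian_log_pdf(x: float, mean: float, var: float) -> float:
    """Log-density of the normal distribution N(mean, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def bayes_localize(observed: Sequence[float], db: FingerprintDB,
                   prior: Optional[Dict[str, float]] = None) -> str:
    """Return argmax_l P(l) * prod_j P(v_j' | l), evaluated in log space (Eq. 1).

    Assumes the readings from different APs are conditionally independent given
    the location and that each is normal with the stored mean and variance.
    """
    best_loc, best_score = "", -math.inf
    for loc, (means, variances) in db.items():
        p_l = prior[loc] if prior is not None else 1.0 / len(db)  # uniform prior by default
        score = math.log(p_l) + sum(
            gaussian_log_pdf(v, mu, var)
            for v, mu, var in zip(observed, means, variances)
        )
        if score > best_score:
            best_loc, best_score = loc, score
    return best_loc
```

For example, with two locations whose stored means are mirror images of each other, `bayes_localize([-42.0, -68.0], {"l1": ([-40.0, -70.0], [4.0, 4.0]), "l2": ([-70.0, -40.0], [4.0, 4.0])})` picks `"l1"`.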

3.2 Differential privacy

The concept of differential privacy was originally introduced by Dwork [20]. Differential privacy ensures that a supplier does not face a significantly increased privacy risk when her data are included in a statistical database. An algorithm $\mathcal {A}$ is ε-differentially private if, for any datasets $D_1$ and $D_2$ that differ in at most one record and for all subsets of possible outputs $S \subseteq \text {Range}(\mathcal {A})$,

$$ \Pr (\mathcal{A}(D_{1})\in S)\leq e^{\epsilon }\cdot \Pr (\mathcal{A}(D_{2})\in S). \tag{2} $$

The above inequality indicates that the output of $\mathcal {A}$ is insensitive to the modification of any single user’s data in the dataset (including its removal or addition). The parameter ε controls the balance between the level of privacy and the data utility: a smaller ε implies stronger privacy. A common way to achieve differential privacy is to add Laplace noise to the true output of a query, according to the following theorem.

Theorem 1.

For any function $f: D \to \mathbb{R}^d$, the mechanism $\mathcal {A}(D)=f(D)+\mathcal {L}(\Delta (f)/\epsilon)$ is ε-differentially private, where $\mathcal {L}(\Delta (f)/\epsilon)$ denotes independently generated noise following the Laplace distribution with scale $\Delta(f)/\epsilon$ (one draw per output coordinate), and Δ(f) is the global sensitivity of f, defined as follows:

$$ \Delta (f) = \underset{D_{1},D_{2}}{\max }\left\Vert f(D_{1})-f(D_{2})\right\Vert_{1} \tag{3} $$

for all $D_1$ and $D_2$ differing in at most one record.
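Theorem 1 can be illustrated for a scalar query with the following sketch. The function names are hypothetical, and the Laplace draw is produced as the difference of two i.i.d. exponential variables, a standard identity; for production use a vetted differential-privacy library is preferable, since naive floating-point noise sampling is known to have subtle leaks.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    # random.expovariate takes a rate, so rate = 1 / scale gives mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release f(D) + Lap(Delta(f) / epsilon) for a scalar-valued query f (Theorem 1)."""
    return true_answer + laplace_noise(sensitivity / epsilon, rng)
```

For instance, `laplace_mechanism(true_count, 1.0, 0.5, random.Random())` releases a counting query (sensitivity 1) under ε = 0.5, i.e., with noise of scale 2.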

3.3 The Paillier cryptosystem

In this work, we employ the Paillier cryptosystem as our cryptographic primitive. Invented by Pascal Paillier [21], the Paillier cryptosystem is a probabilistic public-key encryption scheme based on the decisional composite residuosity problem. It is summarized below to facilitate the understanding of our scheme.

Key generation: To construct the public and private keys, one first chooses two large primes p and q of equal length and computes $N=pq$, $g=N+1$, $\lambda=\varphi(N)=(p-1)(q-1)$, and $\mu=\varphi(N)^{-1} \bmod N$. The public key PK and private key PR are $(N,g)$ and $(\lambda,\mu)$, respectively.
Encryption: Let m be the plaintext to be encrypted. We denote the ciphertext of m by E(m), which is given by
$$ E(m) = g^{m}r^{N} \bmod N^{2}, \tag{4} $$

where $r\in \mathbb {Z}_{N}^{*}$ is a random number.
Decryption: Let c be a ciphertext. The plaintext m is recovered as
$$ m = D(c) = L(c^{\lambda} \bmod N^{2})\cdot\mu \bmod N, \tag{5} $$
where $L(x)=(x-1)/N$.

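The Paillier cryptosystem is additively homomorphic. Given only the public key, one can compute an encryption of $m_1+m_2$ from $E(m_1)$ and $E(m_2)$ by multiplying the two ciphertexts:

$$ E(m_{1}+m_{2}\bmod N) = E(m_{1}) \cdot E(m_{2}) \bmod N^{2}. \tag{6} $$

Here the equality means that the product is a valid encryption of $m_1+m_2 \bmod N$ (with randomness $r_1 r_2$), i.e., it decrypts to that sum.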
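The key generation, encryption, decryption, and additive homomorphism above can be exercised with a toy implementation. This is a sketch only: it uses the simplified variant with $g=N+1$, $\lambda=\varphi(N)$, $\mu=\varphi(N)^{-1} \bmod N$, tiny hard-coded primes that offer no security, and hypothetical function names; a real deployment should use a vetted library and a modulus of realistic size.

```python
import math
import random
from typing import Tuple

PublicKey = Tuple[int, int]   # (N, g)
PrivateKey = Tuple[int, int]  # (lambda, mu)


def keygen(p: int, q: int) -> Tuple[PublicKey, PrivateKey]:
    """Key generation with g = N + 1, lambda = phi(N), mu = phi(N)^-1 mod N."""
    n = p * q
    phi = (p - 1) * (q - 1)
    return (n, n + 1), (phi, pow(phi, -1, n))


def encrypt(pk: PublicKey, m: int, rng: random.Random) -> int:
    """E(m) = g^m * r^N mod N^2 with a fresh random unit r (Eq. 4)."""
    n, g = pk
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:  # r must be invertible modulo N
        r = rng.randrange(1, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2


def decrypt(pk: PublicKey, sk: PrivateKey, c: int) -> int:
    """D(c) = L(c^lambda mod N^2) * mu mod N with L(x) = (x - 1) / N (Eq. 5)."""
    n, _ = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n


def add_encrypted(pk: PublicKey, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the plaintexts modulo N (Eq. 6)."""
    n, _ = pk
    return c1 * c2 % (n * n)
```

The three-argument `pow` with exponent −1 (modular inverse) requires Python 3.8 or later. Because encryption is randomized, encrypting the same m twice normally yields different ciphertexts that both decrypt to m.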

4 System model and problem formulation

4.1 System model

A typical scenario of crowdsourcing-based site survey in WiFi fingerprint-based localization is depicted in Fig. 1. In general, there are n suppliers and an aggregator (i.e., the service provider). The suppliers could be volunteers or workers recruited by the service provider. Every supplier $u_i$ records the WiFi signal strengths ${V_{i}^{s}} = (v_{i}^{s1}, v_{i}^{s2},\ldots, v_{i}^{sj},\ldots)$ when she visits location $l_s$ and stores $(l_{s}, {V_{i}^{s}})$ in her local database $V_i$, where $v_{i}^{sj}$ is the measured WiFi signal strength of the jth AP, $1\leq i\leq n$, $l_s\in L$, and L is a location set defined by the service provider. The aggregator collects the measurements from the suppliers and aims to estimate the mean and variance of the WiFi signal strengths of every AP at every location in L.

4.2 Design goal

In the crowdsourcing-based site survey, the aggregator estimates the parameters from the data (i.e., the measured WiFi signal strengths and the corresponding locations) contributed by the suppliers. However, the released data inevitably leak the location privacy of the suppliers: the aggregator can learn the locations that the suppliers visit. The goal of this paper is to enable the aggregator to estimate the mean and variance of the WiFi signal strengths of every AP at every location in L without compromising the location privacy of the suppliers. Specifically, we want to achieve the following privacy goals:

Location privacy: Our scheme should ensure that the aggregator cannot learn the locations that the suppliers have visited. The WiFi signal strengths collected by the suppliers should not be revealed either, since the aggregator could infer the suppliers’ locations from them.
Differential privacy: In the crowdsourcing-based site survey, even if the measurements of every supplier are completely hidden from the aggregator, the aggregator can still infer private location information about a supplier $u_i$ by comparing the aggregated result when $u_i$ is in the site survey group with that when $u_i$ is not.¹ Therefore, our scheme should achieve differential privacy, which has been accepted as a standard for privacy preservation [20, 22]. Differential privacy guarantees that the aggregator can learn information about any supplier only up to a predefined threshold, regardless of the auxiliary information it has about that supplier.

In this paper, we adopt the “honest-but-curious” model, which assumes that each party honestly follows the designated protocols and procedures but tries to learn the other parties’ private information.

5 Privacy-preserving site survey

In this section, we present a novel privacy-preserving crowdsourcing-based site survey scheme which can estimate the distribution of the WiFi signal strengths at each specific location without leaking the privacy of each supplier. The proposed scheme consists of four phases which are detailed as follows.

5.1 Preparation and initiation

In this phase, n suppliers and an aggregator form a site survey group. Within this group, every supplier $u_i$ generates her public key $\text{PK}_i$ and private key $\text{PR}_i$ using the Paillier cryptosystem and sends $\text{PK}_i$ to the other suppliers and the aggregator. This process can be executed offline and only needs to be performed once. If the aggregator wants to estimate the mean and variance of the WiFi signal strengths from the jth AP at location $l_s$, it sends a request $\langle \text{AP}_j, l_s\rangle$ to every supplier in the group.

5.2 Adding noises

After receiving the aggregator’s request, every supplier $u_i$ first queries her local dataset $V_i$ to obtain a tuple $(m_{i},v_{i}^{sj})$, where $m_i$ indicates whether $u_i$ has visited location $l_s$ and $v_{i}^{sj}$ is the measured WiFi signal strength of the jth AP at location $l_s$. If $u_i$ has visited $l_s$, $m_i$ is set to 1 and $v_{i}^{sj}$ is set to the measured WiFi signal strength; otherwise, $m_i$ and $v_{i}^{sj}$ are both set to 0.

To ensure that the presence or absence of $u_i$ in the site survey group does not significantly increase her privacy risk (i.e., to achieve ε-differential privacy), every supplier $u_i$ adds appropriately chosen random noise to $v_{i}^{sj}$ and $m_i$ as follows:

$$ v_{i}^{\prime }=v_{i}^{sj}+\mathcal{G}_{1}(n,\lambda_{1}) - \mathcal{G}_{2}(n,\lambda_{1}), \tag{7} $$

$$ m_{i}^{\prime }=m_{i} + \mathcal{G}_{3}(n,\lambda_{2}) - \mathcal{G}_{4}(n,\lambda_{2}), \tag{8} $$

where $\mathcal {G}_{1}(n,\lambda _{1})$ and $\mathcal {G}_{2}(n,\lambda _{1})$ are independent and identically distributed (i.i.d.) gamma random variables with shape $1/n$, scale $\lambda_1$, and probability density function (PDF)

$$ g(x,n,\lambda_{1})=\frac{(1/\lambda_{1})^{1/n}}{\Gamma (1/n)}x^{\frac{1}{n}-1}e^{-x/\lambda_{1} }, \tag{9} $$

and $\mathcal {G}_{3}(n,\lambda _{2})$ and $\mathcal {G}_{4}(n,\lambda _{2})$ are i.i.d. random variables having gamma distribution with PDF

$$ g(x,n,\lambda_{2})=\frac{(1/\lambda_{2})^{1/n}}{\Gamma (1/n)}x^{\frac{1}{n}-1}e^{-x/\lambda_{2} }. \tag{10} $$

Here, $\lambda_1=\Delta f_1/\epsilon$ and $\lambda_2=\Delta f_2/\epsilon$, where $\Delta f_1$ and $\Delta f_2$ are the global sensitivities of the WiFi signal strength and of m, respectively. Since the WiFi signal strength ranges from −90 to 0 dBm and $m\in\{0,1\}$, we set $\Delta f_1=90$ and $\Delta f_2=1$. The parameter ε controls the trade-off between the desired privacy level and the data utility: a smaller ε yields a stronger privacy guarantee but introduces more noise. In the evaluation section, we investigate the impact of ε on the data utility. We prove that the proposed scheme achieves ε-differential privacy in the next section.
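The query and noise-addition steps can be sketched per supplier as follows. The dictionary-based local database, the function names, and the treatment of a visited location where the requested AP was not heard are assumptions for illustration. Python’s `random.gammavariate(alpha, beta)` takes a shape `alpha` and a scale `beta`, which matches the PDF in Eq. (9) with `alpha = 1/n` and `beta = λ`.

```python
import random
from typing import Dict, Tuple

# Hypothetical local database of one supplier: location -> {AP id -> RSS in dBm}.
LocalDB = Dict[str, Dict[str, float]]

DELTA_F1 = 90.0  # sensitivity of the RSS value (range -90 to 0 dBm)
DELTA_F2 = 1.0   # sensitivity of the visit indicator m in {0, 1}


def query_local(db: LocalDB, ap: str, loc: str) -> Tuple[int, float]:
    """Return (m_i, v_i^{sj}); (0, 0.0) if loc was never visited.

    Assumption: a visited location where `ap` was not heard is treated like an
    unvisited one; the paper does not specify this case.
    """
    readings = db.get(loc, {})
    if ap in readings:
        return 1, readings[ap]
    return 0, 0.0


def gamma_difference(n: int, scale: float, rng: random.Random) -> float:
    """G1 - G2 with G1, G2 i.i.d. Gamma(shape=1/n, scale=scale), as in Eqs. (7)-(10)."""
    return rng.gammavariate(1.0 / n, scale) - rng.gammavariate(1.0 / n, scale)


def noisy_report(db: LocalDB, ap: str, loc: str, n: int, epsilon: float,
                 rng: random.Random) -> Tuple[float, float]:
    """Return (m_i', v_i') for one <AP_j, l_s> request."""
    m, v = query_local(db, ap, loc)
    m_noisy = m + gamma_difference(n, DELTA_F2 / epsilon, rng)
    v_noisy = v + gamma_difference(n, DELTA_F1 / epsilon, rng)
    return m_noisy, v_noisy
```

Summing `gamma_difference(n, λ, rng)` over n independent suppliers yields one Laplace(0, λ) draw, the property used in Theorem 3; each individual share is smaller than the full Laplace noise (variance 2λ²/n rather than 2λ²).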

5.3 Encrypting data

After adding noise to her data, every supplier employs secret sharing [23] and the Paillier cryptosystem to hide her data from the aggregator. For simplicity, we only demonstrate how to hide $v_{i}^{\prime }$; $m_{i}^{\prime }$ is hidden in the same way. Each supplier $u_i$ first splits $v_{i}^{\prime }$ into n random shares as follows:

$$ \begin{aligned} & u_{1}: v_{1}^{\prime} = v_{11}^{\prime} + v_{12}^{\prime} + \cdots + v_{1i}^{\prime} + \cdots + v_{1n}^{\prime} \bmod \eta\\ & u_{2}: v_{2}^{\prime} = v_{21}^{\prime} + v_{22}^{\prime} + \cdots + v_{2i}^{\prime} + \cdots + v_{2n}^{\prime} \bmod \eta\\ & \quad\vdots\\ & u_{i}: v_{i}^{\prime} = v_{i1}^{\prime} + v_{i2}^{\prime} + \cdots + v_{ii}^{\prime} + \cdots + v_{in}^{\prime} \bmod \eta\\ & \quad\vdots\\ & u_{n}: v_{n}^{\prime} = v_{n1}^{\prime} + v_{n2}^{\prime} + \cdots + v_{ni}^{\prime} + \cdots + v_{nn}^{\prime} \bmod \eta, \end{aligned} \tag{11} $$

where η is a large integer. Then, each supplier $u_i$ keeps $v_{ii}^{\prime }$ for herself, encrypts each $v_{ij}^{\prime }$ ($j\neq i$) using the public key of supplier $u_j$, and sends $E_{\text {PK}_{j}}(v_{ij}^{\prime })$ to the aggregator. After the aggregator receives the encrypted shares from all the suppliers, it adds the shares encrypted under the same public key using the additively homomorphic property of the Paillier cryptosystem:

$$ \begin{aligned} &V_{1} = E_{\text{PK}_{1}}\Big(\sum_{j\neq 1}v_{j1}^{\prime}\Big) = \prod_{j\neq 1}E_{\text{PK}_{1}}(v_{j1}^{\prime})\\ &V_{2} = E_{\text{PK}_{2}}\Big(\sum_{j\neq 2}v_{j2}^{\prime}\Big) = \prod_{j\neq 2}E_{\text{PK}_{2}}(v_{j2}^{\prime})\\ &\quad\vdots\\ &V_{i} = E_{\text{PK}_{i}}\Big(\sum_{j\neq i}v_{ji}^{\prime}\Big) = \prod_{j\neq i}E_{\text{PK}_{i}}(v_{ji}^{\prime})\\ &\quad\vdots\\ &V_{n} = E_{\text{PK}_{n}}\Big(\sum_{j\neq n}v_{jn}^{\prime}\Big) = \prod_{j\neq n}E_{\text{PK}_{n}}(v_{jn}^{\prime}). \end{aligned} \tag{12} $$

Then, the aggregator sends $V_i$ to supplier $u_i$. Every supplier $u_i$ decrypts $V_i$ using her private key $\text{PR}_i$, adds her own share $v_{ii}^{\prime }$ to $D_{\text {PR}_{i}}(V_{i})$ to obtain $V_{i}^{\prime } = \sum _{j=1}^{n}v_{ji}^{\prime }$ in plaintext, and sends $V_{i}^{\prime }$ to the aggregator. By adding all $V_{i}^{\prime }$ $(1\leq i\leq n)$ together, the aggregator obtains $V^{\prime } = \sum _{i=1}^{n}V_{i}^{\prime } \bmod \eta$, which is equal to $\sum _{i=1}^{n}v_{i}^{\prime } \bmod \eta$. We prove its correctness in the next section. In the same way, every supplier can hide $m_{i}^{\prime }$ from the others, while the aggregator obtains $M^{\prime } = \sum _{i=1}^{n}m_{i}^{\prime }$.
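To make the data flow of Eqs. (11) and (12) concrete, here is a single-process simulation of one aggregation round. It is a sketch under several assumptions that are not in the paper: toy Paillier keys (simplified variant with $g=N+1$, $\lambda=\varphi(N)$, $\mu=\varphi(N)^{-1} \bmod N$) built from tiny primes that offer no security; a hypothetical share modulus `ETA` $=2^{16}$; and a hypothetical fixed-point `SCALE` for mapping real-valued noisy readings into $\mathbb{Z}_\eta$, since the paper does not say how that mapping is done. All names are hypothetical.

```python
import math
import random
from typing import List, Tuple

# Tiny primes for illustration only (NOT secure); one key pair per supplier.
PRIMES = [(1009, 1013), (1019, 1021), (1031, 1033), (1039, 1049)]
ETA = 2 ** 16   # hypothetical share modulus; (n - 1) * ETA must stay below every N_i
SCALE = 10      # hypothetical fixed-point scale for real-valued (noisy) readings


def keygen(p: int, q: int) -> Tuple[int, Tuple[int, int]]:
    n = p * q
    phi = (p - 1) * (q - 1)
    return n, (phi, pow(phi, -1, n))          # N, (lambda, mu); g = N + 1 implicitly


def encrypt(n: int, m: int, rng: random.Random) -> int:
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def decrypt(n: int, sk: Tuple[int, int], c: int) -> int:
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n


def encode(x: float) -> int:
    return round(x * SCALE) % ETA             # negative values wrap around modulo ETA


def decode(y: int) -> float:
    y %= ETA
    return (y - ETA if y >= ETA // 2 else y) / SCALE


def secure_sum(values: List[float], rng: random.Random) -> float:
    """Return sum(values) as the aggregator would compute it, without seeing any single value."""
    n = len(values)
    keys = [keygen(p, q) for p, q in PRIMES[:n]]          # supplier i owns keys[i]
    # Eq. (11): supplier j splits encode(values[j]) into n additive shares mod ETA.
    shares = []
    for v in values:
        row = [rng.randrange(ETA) for _ in range(n - 1)]
        row.append((encode(v) - sum(row)) % ETA)
        shares.append(row)
    # Supplier j encrypts shares[j][i] under PK_i (i != j); the aggregator multiplies
    # all ciphertexts under the same key, Eq. (12). The start value 1 encrypts 0.
    aggregated = []
    for i, (n_i, _) in enumerate(keys):
        c = 1
        for j in range(n):
            if j != i:
                c = c * encrypt(n_i, shares[j][i], rng) % (n_i * n_i)
        aggregated.append(c)
    # Supplier i decrypts V_i with PR_i and adds the share she kept, giving V'_i.
    partial = [decrypt(n_i, sk_i, aggregated[i]) + shares[i][i]
               for i, (n_i, sk_i) in enumerate(keys)]
    # The aggregator sees only the V'_i; their sum mod ETA is the encoded total.
    return decode(sum(partial))
```

For example, `secure_sum([-55.3, -61.0, -48.7, -70.1], random.Random(11))` returns approximately −235.1. The decrypted partial sums do not wrap modulo $N_i$ only because $(n-1)\cdot$`ETA` is smaller than every toy modulus; with a 1024-bit modulus there is ample room.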

5.4 Estimating the parameters

After obtaining $V^{\prime } = \sum _{i=1}^{n}V_{i}^{\prime }$ and $M^{\prime } = \sum _{i=1}^{n}m_{i}^{\prime }$, the aggregator estimates the mean (denoted as $\mu'$) of the WiFi signal strengths of $\text{AP}_j$ at location $l_s$ as follows:

$$ \mu^{\prime} = \frac{V^{\prime}}{M^{\prime}} = \frac{\sum_{i=1}^{n}v_{i}^{\prime}}{\sum_{i=1}^{n}m_{i}^{\prime}}. \tag{13} $$

Since every supplier adds controlled noise to her data, the estimated mean $\mu'$ is not exactly equal to $\mu = \sum _{i=1}^{n}v_{i}^{sj}/\sum _{i=1}^{n}m_{i}$. The estimation error is controlled by the parameter ε. We investigate the impact of ε on the estimation error and show that, in most cases, the localization accuracy obtained with $\mu'$ is comparable to that obtained with μ.

To estimate the variance $\delta'$, the aggregator sends $\mu'$ back to every supplier $u_i$, who then computes $\delta_i$ as follows:

$$ \delta_{i}= \left\{\begin{array}{ll} (v_{i}^{sj}- \mu^{\prime})^{2}, & \text{if }m_{i} = 1, \\ 0, & \text{if }m_{i} = 0.\end{array}\right. \tag{14} $$

Following the same rules as above, every supplier $u_i$ adds the random noise $\mathcal {G}_{5}(n,\lambda _{3}) - \mathcal {G}_{6}(n,\lambda _{3})$ to $\delta_i$ to obtain $\delta _{i}^{\prime } = \delta _{i} + \mathcal {G}_{5}(n,\lambda _{3}) - \mathcal {G}_{6}(n,\lambda _{3})$, and then sends $\delta _{i}^{\prime }$ to the aggregator using the same secret-sharing and encryption procedure as for $v_i'$. Here, $\mathcal {G}_{5}(n,\lambda _{3})$ and $\mathcal {G}_{6}(n,\lambda _{3})$ are i.i.d. gamma random variables with PDF

$$ g(x,n,\lambda_{3})=\frac{(1/\lambda_{3})^{1/n}}{\Gamma (1/n)}x^{\frac{1}{n}-1}e^{-x/\lambda_{3} }, \tag{15} $$

where $\lambda_3=90^2/\epsilon$, corresponding to a global sensitivity of $90^2$ for $\delta_i$. Using the same procedure as above, the aggregator computes $\sum _{i=1}^{n}\delta _{i}^{\prime }$ without learning any individual $\delta _{i}^{\prime }$, and estimates the variance of the WiFi signal strengths of $\text{AP}_j$ at location $l_s$ as $\delta ^{\prime } = \sum _{i=1}^{n}\delta _{i}^{\prime }/M^{\prime }$.
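The two rounds of this phase can be simulated end to end as a sketch. To keep it short, the secure-sum step is replaced by a plain sum, which by Theorem 2 is exactly the value the aggregator learns; the function names and the convention of `None` for a supplier who never visited $l_s$ are assumptions.

```python
import random
from typing import List, Optional, Tuple


def gamma_diff(n: int, scale: float, rng: random.Random) -> float:
    """One supplier's noise share: difference of two Gamma(1/n, scale) draws."""
    return rng.gammavariate(1.0 / n, scale) - rng.gammavariate(1.0 / n, scale)


def estimate(readings: List[Optional[float]], epsilon: float,
             rng: random.Random) -> Tuple[float, float]:
    """Return (mu', delta') for one <AP_j, l_s> request.

    readings[i] is supplier i's RSS at l_s in dBm, or None if she never visited l_s.
    """
    n = len(readings)
    lam1, lam2, lam3 = 90.0 / epsilon, 1.0 / epsilon, 90.0 ** 2 / epsilon
    # Round 1: noisy v'_i and m'_i (Eqs. 7-8); the aggregator forms mu' = V'/M' (Eq. 13).
    v_sum = sum((r if r is not None else 0.0) + gamma_diff(n, lam1, rng) for r in readings)
    m_sum = sum((1.0 if r is not None else 0.0) + gamma_diff(n, lam2, rng) for r in readings)
    mu = v_sum / m_sum
    # Round 2: each supplier computes delta_i from mu' (Eq. 14) and adds noise with scale lam3.
    d_sum = sum(((r - mu) ** 2 if r is not None else 0.0) + gamma_diff(n, lam3, rng)
                for r in readings)
    return mu, d_sum / m_sum
```

With a very large ε the noise vanishes and the estimates approach the exact mean and (population) variance; with small ε and few visitors, $M'$ itself becomes noisy and can even approach zero, which makes $\mu'$ unstable.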

6 Theoretical analysis of the proposed scheme

In this section, we will theoretically analyze the correctness of the proposed scheme and prove that the proposed scheme can achieve the desired privacy goals.

6.1 The correctness of the scheme

In our scheme, we employ secret sharing and homomorphic encryption to protect the privacy of every supplier. Every supplier $u_i$ splits her data $v_{i}^{\prime }$ into n shares and eventually submits only $V_{i}^{\prime }$ to the aggregator. By adding all $V_{i}^{\prime }$ $(1\leq i\leq n)$ together, the aggregator obtains $V^{\prime } = \sum _{i=1}^{n}V_{i}^{\prime }$. We claim that $V^{\prime }$ is equal to $\sum _{i=1}^{n}v_{i}^{\prime }$, which is supported by the following theorem.

Theorem 2.

Given only $V_{i}^{\prime }$ $(1\leq i\leq n)$, the aggregator can correctly compute $\sum _{i=1}^{n}v_{i}^{\prime }$ by adding the $V_{i}^{\prime }$ together.

Proof.

As described above, $V_{i}^{\prime } = D_{\text {PR}_{i}}(V_{i}) + v_{ii}^{\prime }$; thus, we have

$$\begin{array}{@{}rcl@{}} \sum_{i=1}^{n}V_{i}^{\prime} &=& \sum_{i=1}^{n}\left({D_{\text{PR}_{i}}(V_{i}) + v_{ii}^{\prime}}\right) \\ &=& \underset{i=1}{\overset{n}{\sum }}\left(D_{\text{PR}_{i}}\left({\underset{j\neq i}{ \prod}E_{\text{PK}_{i}}(v_{ji}^{\prime})}\right) +v_{ii}^{\prime }\right). \end{array} $$

Applying the additively homomorphic property of the Paillier cryptosystem, we have $ \underset {j\neq i}{ \prod }E_{\text {PK}_{i}}(v_{ji}^{\prime }) = E_{\text {PK}_{i}}({\underset {j\neq i}{\sum }v_{ji}^{\prime }})$, and thus

$$\begin{array}{@{}rcl@{}} \sum_{i=1}^{n}V_{i}^{\prime} &=& \underset{i=1}{\overset{n}{\sum }}\left(D_{\text{PR}_{i}}\left({E_{\text{PK}_{i}}({\underset{j\neq i}{\sum }v_{ji}^{\prime}})}\right) +v_{ii}^{\prime }\right)\\ &=& \underset{i=1}{\overset{n}{\sum }}\left(\underset{j\neq i}{\sum }v_{ji}^{\prime }+v_{ii}^{\prime }\right)\\ &=& \underset{i=1}{\overset{n}{\sum }}\underset{j=1}{\overset{n}{\sum }}v_{ji}^{\prime } = \underset{j=1}{\overset{n}{\sum }}\underset{i=1}{\overset{n}{\sum }}v_{ji}^{\prime } \equiv \sum_{j=1}^{n}v_{j}^{\prime} \pmod{\eta}. \end{array} $$

Then, we have $\sum _{i=1}^{n}V_{i}^{\prime } = \sum _{i=1}^{n}v_{i}^{\prime }$, which proves Theorem 2.

Following the same rules, we can also prove that the aggregator can correctly compute $\sum _{i=1}^{n}m_{i}^{\prime }$ and $\sum _{i=1}^{n}\delta _{i}^{\prime }$. Therefore, the correctness of the proposed scheme is proved.

6.2 The security of the scheme

In the proposed scheme, every supplier adds gamma-distributed random noise to achieve differential privacy and further employs secret sharing to hide her data. We claim that the proposed scheme achieves the desired privacy goals, which is supported by the following two theorems:

Theorem 3.

The proposed scheme satisfies ε-differential privacy.

Proof.

In the proposed scheme, $v_{i}^{\prime }=v_{i}+\mathcal {G}_{1}(n,\lambda _{1}) - \mathcal {G}_{2}(n,\lambda _{1})$, where $\mathcal {G}_{1}(n,\lambda _{1})$ and $\mathcal {G}_{2}(n,\lambda _{1})$ are i.i.d. random variables having gamma distribution with PDF

$$ g(x,n,\lambda_{1})=\frac{(1/\lambda_{1})^{1/n}}{\Gamma (1/n)}x^{\frac{1}{n}-1}e^{-x/\lambda_{1} }; \tag{16} $$

thus, we have

$$\begin{array}{@{}rcl@{}} \sum_{i=1}^{n}v_{i}^{\prime} &=& \sum_{i=1}^{n}\left({v_{i}+\mathcal{G}_{1}(n,\lambda_{1}) - \mathcal{G}_{2}(n,\lambda_{1})}\right)\\ &=& \sum_{i=1}^{n}v_{i} + \sum_{i=1}^{n}\left({\mathcal{G}_{1}(n,\lambda_{1}) - \mathcal{G}_{2}(n,\lambda_{1})}\right). \end{array} $$

Let $\mathcal {L}(\lambda_1)$ denote a random variable following the Laplace distribution with PDF $f(x,\lambda _{1})=\frac {1}{2\lambda _{1} }e^{-\frac {\left \vert x\right \vert }{\lambda _{1}}}$. According to [24], the Laplace distribution is infinitely divisible. Furthermore, for every integer $n\geq 1$, $\mathcal {L}(\lambda _{1})$ has the same distribution as $\sum _{i=1}^{n}\left [ {\mathcal {G}}_{1}(n,\lambda _{1})-{\mathcal {G}}_{2}(n,\lambda _{1})\right ]$, where all the $\mathcal {G}_{1}(n,\lambda _{1})$ and $\mathcal {G}_{2}(n,\lambda _{1})$ terms are independent gamma random variables with PDF $g(x,n,\lambda _{1})=\frac {(1/\lambda _{1})^{1/n}}{\Gamma (1/n)}x^{\frac {1}{n}-1}e^{-x/\lambda _{1} }$, $x\geq 0$. Thus, we have

$$\begin{array}{@{}rcl@{}} \sum_{i=1}^{n}v_{i}^{\prime} &=& \sum_{i=1}^{n}v_{i} + \mathcal{L}(\lambda_{1}). \end{array} $$

Similarly, we can have

$$\begin{array}{@{}rcl@{}} \sum_{i=1}^{n}m_{i}^{\prime} &=& \sum_{i=1}^{n}m_{i} + \mathcal{L}(\lambda_{2}), \end{array} $$

$$\begin{array}{@{}rcl@{}} \sum_{i=1}^{n}\delta_{i}^{\prime} &=& \sum_{i=1}^{n}\delta_{i} + \mathcal{L}(\lambda_{3}), \end{array} $$

where $\mathcal {L}(\lambda _{2})$ and $\mathcal {L}(\lambda _{3})$ are random variables following the Laplace distribution with PDFs $f(x,\lambda _{2})=\frac {1}{2\lambda _{2} }e^{-\frac {\left \vert x\right \vert }{\lambda _{2}}}$ and $f(x,\lambda _{3})=\frac {1}{2\lambda _{3} }e^{-\frac {\left \vert x\right \vert }{\lambda _{3}}}$, respectively. According to Theorem 1, the proposed scheme achieves ε-differential privacy.
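One standard way to check the decomposition borrowed from [24] is through characteristic functions. The short derivation below is added for illustration; $\mathcal{G}_{1,k}$ and $\mathcal{G}_{2,k}$ denote supplier $u_k$’s independent gamma draws with shape $1/n$ and scale λ (any of $\lambda_1,\lambda_2,\lambda_3$), and $\mathrm{i}$ is the imaginary unit:

$$ \begin{aligned} \varphi_{\mathcal{G}}(t) &= (1 - \mathrm{i}\lambda t)^{-1/n},\\ \varphi_{\mathcal{G}_{1,k}-\mathcal{G}_{2,k}}(t) &= (1 - \mathrm{i}\lambda t)^{-1/n}(1 + \mathrm{i}\lambda t)^{-1/n} = (1 + \lambda^{2}t^{2})^{-1/n},\\ \varphi_{\sum_{k=1}^{n}(\mathcal{G}_{1,k}-\mathcal{G}_{2,k})}(t) &= \left[(1 + \lambda^{2}t^{2})^{-1/n}\right]^{n} = \frac{1}{1 + \lambda^{2}t^{2}}. \end{aligned} $$

The last expression is the characteristic function of the zero-mean Laplace distribution with scale λ, so when all n suppliers add their noise shares, each aggregate receives exactly one $\mathcal{L}(\lambda)$ term.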

Theorem 4.

The proposed scheme can protect every supplier’s location privacy.

Proof.

In the proposed scheme, every supplier $u_i$ splits her data $v_{i}^{\prime }$, $m_{i}^{\prime }$, and $\delta _{i}^{\prime }$ into n random shares and sends the other n−1 shares, encrypted, to the aggregator. Even if the aggregator obtained the plaintexts of these n−1 shares, it still could not recover $v_{i}^{\prime }$, $m_{i}^{\prime }$, or $\delta _{i}^{\prime }$, since $u_i$ keeps one share after splitting the data. Therefore, the aggregator can learn neither whether $u_i$ visited $l_s$ nor the WiFi signal strength she measured there, which proves that the proposed scheme protects every supplier’s location privacy.
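The secret-sharing part of this argument can be checked exhaustively for toy parameters. The sketch below (function names hypothetical) models only the additive shares of Eq. (11), not the encryption layer, the revealed partial sums $V_i'$, or collusion. It enumerates all randomness for η = 5 and n = 3 and confirms that the n−1 shares leaving a supplier have the same distribution, uniform over $\mathbb{Z}_\eta^{n-1}$, whatever her secret is and whichever share she keeps.

```python
from collections import Counter
from itertools import product


def shares_from_randomness(secret: int, rand: tuple, eta: int) -> list:
    """Additive sharing: n-1 random shares plus one share that fixes the sum mod eta."""
    return list(rand) + [(secret - sum(rand)) % eta]


def view_distribution(secret: int, n: int, eta: int, kept: int) -> Counter:
    """Distribution of the n-1 shares sent out (all but index `kept`) over all randomness."""
    views = Counter()
    for rand in product(range(eta), repeat=n - 1):
        shares = shares_from_randomness(secret, rand, eta)
        views[tuple(s for idx, s in enumerate(shares) if idx != kept)] += 1
    return views
```

Since `view_distribution(s, 3, 5, kept)` is identical for every secret `s`, observing the outgoing shares alone tells the aggregator nothing about the secret.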

7 Evaluation

In this section, we evaluate the performance of the proposed scheme. We focus on two important metrics in the evaluation: the utility of the aggregated data and the efficiency of the proposed scheme.

7.1 Experiment setup

We implement the supplier side of the proposed scheme on an Android platform with a Qualcomm Snapdragon 600 quad-core 1.7 GHz CPU and 2 GB of RAM, and the aggregator side on a 32-bit computer with a 3.4 GHz Intel i7 CPU and 4 GB of memory. The Paillier modulus used in this work is 1024 bits long. In the experiments, we use a real-world WiFi fingerprint dataset to evaluate the performance of our algorithm. The dataset contains 1000 records in total, collected in a typical indoor environment. Each record contains the WiFi signal strengths from nearby APs. The total number of APs used in the experiments is 10 (i.e., N=10), and the total number of locations in the indoor environment is 76 (i.e., |L|=76). In the simulations, the data are randomly distributed to the suppliers, and the aggregator then tries to estimate the WiFi fingerprint at every location.

7.2 Utility evaluation

In the experiments, the aggregator estimates the WiFi fingerprint at every location based on the data with noises provided by the suppliers, and then uses the estimated data to offer localization service. In this section, we evaluate the impact of the added noises on the aggregated results and the accuracy of the localization.

In the proposed scheme, every supplier adds noise built from two gamma-distributed random variables to her measurements to achieve ε-differential privacy. In the experiments, we use the Euclidean distance as the metric to evaluate the usability of the noisy data. Figure 2 presents the cumulative distribution function (CDF) of the Euclidean distance between the WiFi fingerprint estimated without noise and that estimated with noise. The estimation accuracy increases as ε increases from 0.4 to 2.0, and 80% of the Euclidean distances between the noisy and the original WiFi fingerprints are smaller than 6. A smaller ε yields larger noise but provides stronger privacy preservation, which reflects the trade-off between data utility and privacy.

Further, we investigate the impact of our privacy-preserving scheme on the localization accuracy. In this paper, we employ k-nearest neighbors to determine the unknown locations in the online operating phase. Figure 3 shows the CDF of the localization errors for different values of ε and when no noise is added to the data. The localization accuracy increases as ε increases from 0.4 to 2.0, and it is highest when no noise is added. We also observe that 80% of the localization errors are within 5 m, which indicates high data utility.
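The paper does not specify the exact k-nearest-neighbor variant used in the online phase; the sketch below assumes the plain, unweighted version, with a hypothetical database that maps each location to its coordinates and its (possibly noisy) mean fingerprint. The position estimate is the centroid of the k fingerprints closest to the observed reading in Euclidean distance.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical database: location -> ((x, y) in meters, mean RSS per AP in dBm).
Database = Dict[str, Tuple[Tuple[float, float], List[float]]]


def knn_localize(observed: Sequence[float], db: Database, k: int = 3) -> Tuple[float, float]:
    """Estimate a position as the centroid of the k fingerprints nearest to `observed`."""
    ranked = sorted(db.values(), key=lambda entry: math.dist(observed, entry[1]))
    nearest = ranked[:k]
    x = sum(pos[0] for pos, _ in nearest) / len(nearest)
    y = sum(pos[1] for pos, _ in nearest) / len(nearest)
    return x, y
```

The localization error reported in Fig. 3 would then be the Euclidean distance between this estimate and the true position; `math.dist` requires Python 3.8 or later.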

7.3 Computational and communication overhead

In this work, we employ the Paillier cryptosystem as our cryptographic primitive to protect the suppliers' privacy, which inevitably incurs additional computational and communication overhead. In this section, we evaluate the computational and communication cost of the proposed scheme. In the experiments, we set ε=0.4 and investigate the impact of the number of suppliers (i.e., n) on the computational time and communication overhead.
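To show where this overhead comes from, here is a toy sketch of Paillier's additively homomorphic property. It is not the paper's implementation: it uses the common simplification g = n + 1, tiny primes that offer no security (the experiments use a 1024-bit modulus), and hypothetical helper names. Negative RSS readings in dBm are mapped into Z_n so that encrypted readings can be summed without decryption. Every operation is a modular exponentiation modulo n², which is what makes the scheme costly at realistic key sizes.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key pair from two distinct primes p and q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    # With g = n + 1, L(g^lam mod n^2) = lam mod n, so mu = lam^{-1} mod n.
    mu = pow(lam, -1, n)
    return n, (lam, mu)

def encrypt(n, m, rng):
    """E(m) = g^m * r^n mod n^2 with g = n + 1 and a random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, sk, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) / n."""
    lam, mu = sk
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n

def add_encrypted(n, c1, c2):
    """Homomorphic addition: E(m1) * E(m2) mod n^2 decrypts to m1 + m2 mod n."""
    return (c1 * c2) % (n * n)

def encode(n, value):
    """Map a possibly negative integer (e.g. an RSS value in dBm) into Z_n."""
    return value % n

def decode(n, m):
    """Inverse of encode for values with |value| < n / 2."""
    return m - n if m > n // 2 else m

if __name__ == "__main__":
    n, sk = keygen(1009, 1013)   # toy primes, far too small to be secure
    rng = random.Random(0)
    readings = [-40, -52, -47]   # hypothetical RSS readings from three suppliers
    total = encrypt(n, encode(n, 0), rng)
    for rss in readings:
        total = add_encrypted(n, total, encrypt(n, encode(n, rss), rng))
    print(decode(n, decrypt(n, sk, total)))  # -139
```

`pow(x, -1, n)` for the modular inverse requires Python 3.8 or later.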

Figure 4 shows the time cost on the supplier side and on the aggregator side for estimating the WiFi signal strength of one AP at every location. The computational time on the supplier side grows linearly with the number of suppliers: it is 1.6 s with 10 suppliers and 16 s with 100 suppliers. The computational time on the aggregator side grows approximately with the square of the number of suppliers: it is only 0.08 s with 10 suppliers but rises to 8.8 s with 100 suppliers.

Figure 5 shows the impact of the number of suppliers on the bandwidth cost of each supplier and of the aggregator. The bandwidth cost of each supplier is roughly proportional to the number of suppliers: it is 10 kb with 10 suppliers and 101 kb with 100 suppliers. The bandwidth cost on the aggregator side is roughly proportional to the square of the number of suppliers: it is only 110 kb with 10 suppliers and 10100 kb with 100 suppliers.
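The linear and quadratic trends stated for Figures 4 and 5 can be checked from the two reported data points alone. The short calculation below uses only the numbers given in this section; `growth_exponent` is a hypothetical helper that computes the slope in log-log space, where a slope of 1 means linear growth and a slope of 2 means quadratic growth.

```python
import math

def growth_exponent(n1, y1, n2, y2):
    """Slope of log(y) versus log(n) between two measurements."""
    return math.log(y2 / y1) / math.log(n2 / n1)

# Costs reported in this section for n = 10 and n = 100 suppliers
reported = {
    "supplier time (s)": (1.6, 16.0),
    "aggregator time (s)": (0.08, 8.8),
    "supplier bandwidth (kb)": (10.0, 101.0),
    "aggregator bandwidth (kb)": (110.0, 10100.0),
}

if __name__ == "__main__":
    for name, (y10, y100) in reported.items():
        print(f"{name}: exponent {growth_exponent(10, y10, 100, y100):.2f}")
```

The slopes come out at about 1.00, 2.04, 1.00 and 1.96, consistent with linear per-supplier cost and roughly quadratic aggregator cost.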

8 Conclusions

In this work, we propose a privacy-preserving site survey scheme for WiFi fingerprint-based localization. The scheme uses homomorphic encryption and the differential privacy model to protect the location privacy of the participants who take part in the site survey. We theoretically analyze the security of the scheme and use simulation experiments on real-world data to validate its efficiency.

9 Endnote

¹For example, assume that the mean of the WiFi signal strengths estimated by the aggregator is μ₁ when uᵢ is in the group and μ₂ when uᵢ is not in the group. The aggregator can then recover the WiFi signal strength measured by uᵢ as μ₁·n − μ₂·(n−1), where n is the number of suppliers in the group when uᵢ is included.
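A small worked example of this differencing attack, with hypothetical readings: n·μ₁ is the group sum including uᵢ and (n−1)·μ₂ is the sum without uᵢ, so their difference is exactly uᵢ's reading whenever the released means are noise-free. This is the leakage that the added differential-privacy noise is meant to mask.

```python
import statistics

def differencing_attack(mean_with, mean_without, n):
    """Recover u_i's reading: n * mu1 is the sum with u_i, (n - 1) * mu2 the sum without."""
    return mean_with * n - mean_without * (n - 1)

if __name__ == "__main__":
    others = [-50, -60, -55]                     # hypothetical readings of the other suppliers (dBm)
    target = -70                                 # u_i's reading, unknown to the aggregator
    mu1 = statistics.fmean(others + [target])    # n = 4 suppliers, including u_i
    mu2 = statistics.fmean(others)               # n - 1 = 3 suppliers
    print(differencing_attack(mu1, mu2, n=4))    # -70.0
```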

References

P Bahl, VN Padmanabhan, in Proc. of IEEE INFOCOM. Radar: an in-building RF-based user location and tracking system, (2000), pp. 775–784.
Z Yang, C Wu, Y Liu, in Proc. of ACM MobiCom. Locating in fingerprint space: wireless indoor localization with little human intervention, (2012), pp. 269–280.
H Liu, Y Gan, J Yang, S Sidhom, Y Wang, Y Chen, F Ye, in Proc. of ACM MobiCom. Push the limit of WiFi based localization for smartphones, (2012), pp. 305–316.
W Cheng, D Wu, X Cheng, D Chen, in WASA. Routing for information leakage reduction in multi-channel multi-hop ad-hoc social networks, (2012), pp. 31–42.
D Milioris, L Kriara, A Papakonstantinou, G Tzagkarakis, P Tsakalides, M Papadopouli, in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems. Empirical evaluation of signal-strength fingerprint positioning in wireless LANs (ACM, 2010), pp. 5–13.
J Niu, B Wang, L Cheng, JJ Rodrigues, in Communications (ICC) 2015 IEEE International Conference on. WicLoc: an indoor localization system based on WiFi fingerprints and crowdsourcing (IEEE, 2015), pp. 3008–3013.
J Li, Z Cai, M Yan, Y Li, in INFOCOM 2016, Proceedings IEEE. Using crowdsourced data in location-based social networks to explore influence maximization (IEEE, 2016).
R Shokri, G Theodorakopoulos, J Le Boudec, J Hubaux, in IEEE Symposium on Security and Privacy. Quantifying location privacy, (2011), pp. 247–262.
Y He, L Sun, W Yang, H Li, A game theory-based analysis of data privacy in vehicular sensor networks. Int. J. Distrib. Sens. Netw. 2014 (2014).
T Shu, Y Chen, J Yang, A Williams, in INFOCOM, 2014 Proceedings IEEE. Multi-lateral privacy-preserving localization in pervasive environments (IEEE, 2014), pp. 2319–2327.
X Wang, Y Liu, Z Shi, X Lu, L Sun, A privacy-preserving fuzzy localization scheme with CSI fingerprint, (2016).
H Li, L Sun, H Zhu, X Lu, X Cheng, in INFOCOM, 2014 Proceedings IEEE. Achieving privacy preservation in WiFi fingerprint-based localization (IEEE, 2014), pp. 2337–2345.
D Yang, X Fang, G Xue, in Proc. of IEEE INFOCOM. Truthful incentive mechanisms for k-anonymity location privacy, (2013), pp. 3094–3102.
X Liu, K Liu, L Guo, X Li, Y Fang, in Proc. of IEEE INFOCOM. A game-theoretic approach for achieving k-anonymity in location based services, (2013), pp. 3085–3093.
AR Beresford, F Stajano, in Proc. of the IEEE PerSec. Mix zones: user privacy in location-aware services, (2004), pp. 127–131.
J Shao, R Lu, X Lin, in INFOCOM, 2014 Proceedings IEEE. FINE: a fine-grained privacy-preserving location-based service framework for mobile devices (IEEE, 2014), pp. 244–252.
I Bilogrevic, M Jadliwala, K Kalkan, J-P Hubaux, I Aad, in Privacy Enhancing Technologies. Privacy in mobile computing for location-sharing-based services (Springer, 2011), pp. 77–96.
A Chen, C Harko, D Lambert, P Whiting, An algorithm for fast, model-free tracking indoors. ACM SIGMOBILE Mob. Comput. Commun. Rev. 11(3), 48–58 (2007).
D Milioris, G Tzagkarakis, A Papakonstantinou, M Papadopouli, P Tsakalides, Low-dimensional signal-strength fingerprint-based positioning in wireless LANs. Ad Hoc Netw. 12, 100–114 (2014).
C Dwork, in Encyclopedia of Cryptography and Security. Differential privacy (Springer, 2011), pp. 338–340.
P Paillier, in Proc. of EUROCRYPT. Public-key cryptosystems based on composite degree residuosity classes, (1999).
ME Andrés, NE Bordenabe, K Chatzikokolakis, C Palamidessi, in Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security. Geo-indistinguishability: differential privacy for location-based systems (ACM, 2013), pp. 901–914.
FD Garcia, B Jacobs, in Security and Trust Management. Privacy-friendly energy-metering via homomorphic encryption (Springer, 2011), pp. 226–238.
S Kotz, T Kozubowski, K Podgorski, The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance (Springer Science & Business Media, 2012).

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant No. 61472418).

Author information

Authors and Affiliations

School of Information Engineering, Yancheng Teachers University, Yancheng, China
Shujun Li
Beijing Key Laboratory of IOT Information Security Technology, Institute of Information Engineering, CAS, Beijing, China
Hong Li & Limin Sun

Corresponding author

Correspondence to Shujun Li.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Li, S., Li, H. & Sun, L. Privacy-preserving crowdsourced site survey in WiFi fingerprint-based localization. J Wireless Com Network 2016, 123 (2016). https://doi.org/10.1186/s13638-016-0624-2

Received: 18 December 2015
Accepted: 24 April 2016
Published: 04 May 2016
DOI: https://doi.org/10.1186/s13638-016-0624-2

Keywords