Privacy-preserving crowdsourced site survey in WiFi fingerprint-based localization
 Shujun Li^{1},
 Hong Li^{2} and
 Limin Sun^{2}
https://doi.org/10.1186/s1363801606242
© Li et al. 2016
Received: 18 December 2015
Accepted: 24 April 2016
Published: 4 May 2016
Abstract
Typically, a site survey is an inevitable phase of WiFi fingerprint-based localization, which is regarded as one of the most promising techniques for indoor localization. However, the site survey can leak the location privacy of the participants who contribute their WiFi fingerprint measurements. In this paper, we propose a privacy-preserving site survey scheme for WiFi fingerprint-based localization. In the proposed scheme, we use homomorphic encryption to protect the location privacy of the participants involved in the site survey. Further, we employ the differential privacy model to ensure that the released data will not breach an individual’s location privacy regardless of whether she is present in or absent from the site survey group. We theoretically analyze the security of the proposed scheme and use simulation experiments on a real-world dataset to validate its efficiency.
1 Introduction
Due to the increasing demand for location-based services (LBSs) and the lack of GPS signals in indoor environments, indoor localization has attracted growing attention in recent years. Researchers have proposed a vast range of approaches, among which WiFi fingerprint-based indoor localization is one of the most promising technologies [1–4]. A typical WiFi fingerprint-based localization algorithm consists of two phases: offline site survey and online operating. In the offline site survey phase, the service provider collects WiFi signal strengths from multiple access points (APs) at every location of an area of interest. Next, in the online operating phase, a to-be-localized client measures the signal strengths from nearby APs at a specific location, and then algorithms such as k-nearest neighbors [1, 3] or probability-based algorithms [5] are employed to infer the user’s location based on the measured WiFi signal strengths.
Usually, the site survey is conducted in a crowdsourced way [2, 6, 7]. Suppliers recruited by the service provider measure the WiFi signal strengths of nearby APs when they visit the places which the service provider is interested in, and then send the measured WiFi signal strengths and the corresponding locations to the service provider. The service provider aggregates the data contributed by the suppliers to estimate the parameters which will be used in the online operating phase. The parameters which need to be estimated depend on the algorithms used in the online operating phase. In k-nearest neighbor-based algorithms, the mean of the WiFi signal strength of every AP at every location needs to be estimated, while in probability-based algorithms, both the mean and the variance of the signal strength of every AP are required. Crowdsourcing is an efficient way to conduct the site survey, but the measurements contributed by the suppliers will inevitably leak their location privacy. The service provider can infer the locations that the suppliers visit based on the data they contribute. Existing research indicates that location traces can leak information about individuals’ habits, interests, activities, and relationships [8, 9]. Consequently, the loss of location privacy can expose the suppliers to unwanted advertisements and location-based spam and scams, may cause social reputation or economic damage, and can make them victims of blackmail or even physical violence.
Several approaches have been proposed to address the privacy issues of indoor localization algorithms. In [10], Shu et al. studied the privacy issues in range-based localization algorithms and proposed a scheme to protect users’ privacy during the localization process. In [11], Wang et al. developed a privacy-preserving fuzzy localization scheme with CSI (Channel State Information) fingerprint. These privacy-preserving schemes were not designed for WiFi fingerprint-based localization algorithms; thus, they cannot be used to address the privacy issues presented in this paper. The most closely related work to this paper is that of Li et al. [12], which proposed a privacy-preserving scheme to address the privacy issues of the online operating phase in WiFi fingerprint-based localization algorithms. However, they did not consider the privacy leaks of the offline site survey phase.
The main contributions of this paper are as follows.

To the best of our knowledge, this work is the first to address the privacy issues of the site survey in WiFi fingerprint-based localization algorithms.

We propose a privacy-preserving site survey scheme for WiFi fingerprint-based localization based on homomorphic encryption and the differential privacy model.

We theoretically analyze the security of the proposed scheme and carry out simulation experiments on a real-world dataset to evaluate the performance of our scheme.
The rest of the paper is organized as follows. We first discuss the related work and introduce the background. Then, we present the detailed design of our scheme and the security analysis. Finally, we report the evaluation results and conclude this paper.
2 Related work
Location privacy in LBSs has been widely studied in the literature. In general, the existing works can be classified into two categories: privacy-preserving service request and privacy-preserving localization.
2.1 Privacy-preserving service request
In LBSs, users send their locations to the service provider to obtain services, which inevitably leaks their privacy. Many schemes have been proposed to protect users’ location privacy when they request location-based services. k-anonymity [13, 14] provides a form of plausible deniability by ensuring that a client cannot be individually identified within a group of k clients. Mix zone-based schemes [15] divide the whole region into application zones and mix zones. Clients report their locations in application zones and receive new, unused pseudonyms at mix zones. Cryptography-based approaches [16, 17] protect users’ location privacy based on secure multiparty computation protocols. Since all the above schemes focus on the privacy issues that arise when users request location services, they cannot be used to address the privacy issues we discuss in this paper.
2.2 Privacy-preserving localization
Shu et al. [10] addressed the privacy leakage problem for range-based localization algorithms, preventing the leakage of the location information of both the target and the anchors. Wang et al. [11] developed a privacy-preserving fuzzy localization scheme with CSI fingerprint using homomorphic encryption and fuzzy logic. These privacy-preserving schemes were not designed for WiFi fingerprint-based localization; thus, they cannot be used to address the privacy issues presented in this paper. Li et al. [12] studied the privacy issues in WiFi fingerprint-based localization and proposed a privacy-preserving scheme to protect both the users’ and the service provider’s privacy during the online operating phase. However, they did not consider the privacy leaks during the site survey phase.
3 Background
3.1 WiFi fingerprint-based localization
The process of WiFi fingerprint-based localization can be divided into two phases: the offline site survey phase and the online operating phase. In the offline site survey phase, a supplier \(u_{i}\) recruited by the service provider measures the WiFi signal strengths \({V_{s}^{i}}\) of nearby APs when visiting a place \(l_{s}\) and sends \((l_{s}, {V_{s}^{i}})\) to the service provider, which aggregates the measurements and estimates the parameters that will be used in the online operating phase. In the online operating phase, a to-be-localized user measures the WiFi signal strengths at her current location, denoted as \(V^{\prime }=\left (v_{1}^{\prime },v_{2}^{\prime },\ldots,v_{j}^{\prime },\ldots,v_{N}^{\prime }\right)\). Then, the service provider uses k-nearest neighbors or probability-based algorithms to determine the location of the user.
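The online phase with k-nearest neighbors can be sketched as follows. This is a minimal illustration, not the method of any cited system: the radio map layout, the Euclidean distance in signal space, the centroid rule, and the names `knn_localize` and `radio_map` are all assumptions made for the example.

```python
import math

def knn_localize(fingerprints, measured, k=3):
    """Estimate a position as the centroid of the k closest fingerprints.

    fingerprints: list of ((x, y), [mean RSS per AP in dBm]) pairs (hypothetical layout)
    measured: RSS vector V' observed online, one value per AP
    """
    # Rank surveyed locations by Euclidean distance in signal space.
    ranked = sorted(fingerprints, key=lambda fp: math.dist(fp[1], measured))
    nearest = ranked[:k]
    x = sum(loc[0] for loc, _ in nearest) / len(nearest)
    y = sum(loc[1] for loc, _ in nearest) / len(nearest)
    return (x, y)

# Hypothetical radio map with three APs and four surveyed locations.
radio_map = [
    ((0.0, 0.0), [-40.0, -70.0, -80.0]),
    ((5.0, 0.0), [-55.0, -60.0, -75.0]),
    ((0.0, 5.0), [-60.0, -75.0, -50.0]),
    ((5.0, 5.0), [-70.0, -55.0, -45.0]),
]
estimate = knn_localize(radio_map, [-42.0, -69.0, -79.0], k=1)
```

The per-AP means stored in `radio_map` are exactly the quantities that the site survey has to estimate.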
A common assumption is that the signal strength of \(\text{AP}_{i}\) at location l follows a normal distribution with mean μ and variance δ. The parameters μ and δ are estimated by the service provider based on the measurements of the suppliers.
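Under this normal assumption, a probability-based localizer can pick the location whose model gives the measured vector the highest likelihood. The sketch below assumes that the APs are independent given the location and that the per-location means and variances are already available; the function names and the two-room model are hypothetical.

```python
import math

def log_likelihood(measured, means, variances):
    """Log-probability of an RSS vector under independent per-AP normal models."""
    total = 0.0
    for v, mu, var in zip(measured, means, variances):
        total += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
    return total

def most_likely_location(model, measured):
    """model maps a location label to (means, variances), one entry per AP."""
    return max(model, key=lambda loc: log_likelihood(measured, *model[loc]))

# Hypothetical model for two locations and two APs (values in dBm and dBm^2).
model = {
    "room_a": ([-40.0, -70.0], [4.0, 9.0]),
    "room_b": ([-65.0, -50.0], [6.0, 5.0]),
}
best = most_likely_location(model, [-43.0, -68.0])
```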
3.2 Differential privacy
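A randomized algorithm \(\mathcal {A}\) satisfies ε-differential privacy if, for any two datasets \(D_{1}\) and \(D_{2}\) differing in at most one record and for any set of outputs \(S \subseteq \text{Range}(\mathcal {A})\),$$ \Pr [\mathcal {A}(D_{1})\in S] \leq e^{\varepsilon }\cdot \Pr [\mathcal {A}(D_{2})\in S]. $$
The above equation indicates that the output of \(\mathcal {A}\) is insensitive to the modification of any single user’s data in the datasets (including its removal or addition). The parameter ε allows us to control the balance between the level of privacy and the data utility. A smaller ε implies stronger privacy. One common way to achieve differential privacy is to add Laplace noise to the original output of \(\mathcal {A}\) according to the following theorem.
Theorem 1.
For any function \(f:\mathcal {D}\rightarrow \mathbb {R}^{d}\), the algorithm \(\mathcal {A}\) that adds independently generated noise with distribution \(\mathcal {L}(\Delta f/\varepsilon)\) to each of the d outputs of f satisfies ε-differential privacy, where \(\mathcal {L}(\lambda)\) denotes the Laplace distribution with scale λ and the global sensitivity Δf is given by$$ \Delta f = \max _{D_{1},D_{2}}\left \Vert f(D_{1})-f(D_{2})\right \Vert _{1} $$
for all \(D_{1}\) and \(D_{2}\) differing in at most one record.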
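Adding Laplace noise calibrated to the global sensitivity can be sketched as follows. The sketch assumes a scalar query and draws a Laplace sample as the difference of two exponential samples with mean equal to the scale; the function names are illustrative and not taken from the paper.

```python
import random

def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Release true_value perturbed with Laplace noise of scale sensitivity / epsilon."""
    return true_value + laplace_noise(sensitivity / epsilon, rng)

# A smaller epsilon gives a larger scale, hence more noise and stronger privacy.
rng = random.Random(0)
released = laplace_mechanism(10.0, sensitivity=1.0, epsilon=0.5, rng=rng)
```

With sensitivity 1 and ε = 0.5 the noise scale is 2, so the released value has variance 2·2² = 8 around the true value.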
3.3 The Paillier cryptosystem
In this work, we employ the Paillier cryptosystem as our cryptographic primitive. Invented by Pascal Paillier [21], it is a probabilistic asymmetric encryption algorithm based on the decisional composite residuosity problem. The Paillier cryptosystem is summarized below to facilitate the understanding of our algorithm.

Key generation: To construct the public and private keys, one first chooses two large primes p, q of equivalent length and computes N=pq, λ=φ(N), g=N+1, and μ=φ(N)^{−1} mod N, where φ(N)=(p−1)(q−1). The public key PK and private key PR are (N,g) and (λ,μ), respectively.

Encryption: Let m be the plaintext to be encrypted. We denote the ciphertext of m by E(m), which is given by$$ E(m) = g^{m}r^{N} \bmod N^{2}, \tag{4} $$
where \(r\in \mathbb {Z}_{N}\) is a random number.

Decryption: Let c be the ciphertext. The plaintext D(c) is obtained by$$ D(c) = L(c^{\lambda} \bmod N^{2})\cdot \mu \bmod N, \tag{5} $$
where L(x)=(x−1)/N.
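The three steps above, together with the additive homomorphism that the scheme relies on, can be sketched in a few lines. This is a toy illustration under stated assumptions: the primes are far too small for real use, λ is taken as lcm(p−1, q−1), and μ is computed as the inverse of L(g^λ mod N²) modulo N, which gives a valid decryption constant for that choice of λ as well as for λ=φ(N). The function names are hypothetical.

```python
import math
import random

def L(x, n):
    """L(x) = (x - 1) / n, defined for x congruent to 1 modulo n."""
    return (x - 1) // n

def keygen(p, q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    mu = pow(L(pow(g, lam, n * n), n), -1, n)  # modular inverse (Python 3.8+)
    return (n, g), (lam, mu)

def encrypt(pk, m, rng=random):
    n, g = pk
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:  # r must be invertible modulo n
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    return (L(pow(c, lam, n * n), n) * mu) % n

def add_encrypted(pk, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    n, _ = pk
    return (c1 * c2) % (n * n)

pk, sk = keygen(1009, 1013)  # toy primes, for illustration only
c = add_encrypted(pk, encrypt(pk, 41), encrypt(pk, 17))
total = decrypt(pk, sk, c)
```

Here `total` equals 41 + 17 even though the party multiplying the two ciphertexts never sees either plaintext, which is the property the aggregator uses when combining encrypted shares.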
4 System model and problem formulation
4.1 System model
4.2 Design goal

Location privacy: Our scheme should ensure that the aggregator cannot learn the locations that the suppliers have visited. The WiFi signal strengths collected by the suppliers should not be revealed either, since the aggregator can infer the suppliers’ locations from their measured WiFi signal strengths.

Differential privacy: In a crowdsourcing-based site survey, even if the measurements of every supplier are completely hidden from the aggregator, the aggregator can still infer the location privacy of a supplier \(u_{i}\) by comparing the aggregation result obtained when \(u_{i}\) is in the site survey group with that obtained when \(u_{i}\) is not in the group.^{1} Therefore, our scheme should achieve differential privacy, which has been accepted as a standard for privacy preservation [20, 22]. Differential privacy guarantees that the aggregator can retrieve information about any supplier only up to a predefined threshold, no matter what auxiliary information it knows about that supplier.
In this paper, we adopt the “honest-but-curious” model, which assumes that each player honestly follows the designated protocols and procedures while trying to learn the others’ private information.
5 Privacypreserving site survey
In this section, we present a novel privacy-preserving crowdsourcing-based site survey scheme which can estimate the distribution of the WiFi signal strengths at each specific location without leaking the privacy of each supplier. The proposed scheme consists of four phases which are detailed as follows.
5.1 Preparation and initiation
In this phase, n suppliers and an aggregator form a site survey group. Within this site survey group, every supplier \(u_{i}\) generates its public key \(\text{PK}_{i}\) and private key \(\text{PR}_{i}\) using the Paillier cryptosystem and then sends the public key \(\text{PK}_{i}\) to the other suppliers and the aggregator. The above process can be executed offline and only needs to be performed once. If the aggregator wants to estimate the mean and variance of the WiFi signal strengths from the jth AP at location \(l_{s}\), it sends a request with \(\langle \text{AP}_{j}, l_{s}\rangle\) to every supplier in this group.
5.2 Adding noises
After receiving the aggregator’s request, every supplier \(u_{i}\) first queries its local dataset \(V_{i}\) to get a tuple \((m_{i},v_{i}^{sj})\), where \(m_{i}\) indicates whether the supplier \(u_{i}\) visited location \(l_{s}\) before and \(v_{i}^{sj}\) is the measured WiFi signal strength of the jth AP at location \(l_{s}\). If the supplier \(u_{i}\) visited location \(l_{s}\) before, \(m_{i}\) is set to 1 and \(v_{i}^{sj}\) is set to the measured WiFi signal strength. If the supplier \(u_{i}\) never visited location \(l_{s}\) before, \(m_{i}\) and \(v_{i}^{sj}\) are both set to 0.
The noise parameters are set to \(\lambda_{1}=\Delta f_{1}/\varepsilon\) and \(\lambda_{2}=\Delta f_{2}/\varepsilon\), where \(\Delta f_{1}\) and \(\Delta f_{2}\) are the global sensitivities of the WiFi signal strength and of m, respectively. Since the WiFi signal strength ranges from −90 to 0 dBm and m∈{0,1}, we set \(\Delta f_{1}=90\) and \(\Delta f_{2}=1\). The parameter ε controls the tradeoff between the desired privacy level and the data utility. A smaller ε yields a stronger privacy guarantee but generates more noise. In the evaluation section, we will investigate the impact of ε on the data utility. We will prove that the proposed scheme achieves ε-differential privacy in the next section.
5.3 Encrypting data
Then, the aggregator sends \(V_{i}\) to the supplier \(u_{i}\). Every supplier \(u_{i}\) decrypts \(V_{i}\) using her private key \(\text{PR}_{i}\) and adds her own share \(v_{ii}^{\prime }\) to \(D_{\text {PR}_{i}}(V_{i})\) to get \(V_{i}^{\prime } = \sum _{j=1}^{n}v_{ji}^{\prime }\) in plaintext, and then sends \(V_{i}^{\prime }\) to the aggregator. Adding all \(V_{i}^{\prime }(1\leq i\leq n)\) together, the aggregator obtains \(V^{\prime } = \sum _{i=1}^{n}V_{i}^{\prime }\), which is equal to \(\sum _{i=1}^{n}v_{i}^{\prime }\). We will prove its correctness in the next section. In the same way, every supplier can hide \(m_{i}^{\prime }\) from the others, while the aggregator still obtains \(M^{\prime } = \sum _{i=1}^{n}m_{i}^{\prime }\).
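The share arithmetic behind this exchange can be simulated in plaintext as below. The sketch leaves out the Paillier layer entirely and only checks that routing one share of every supplier’s value to every other supplier still lets the aggregator recover the total. The modulus, the integer encoding of the values, and the function names are assumptions made for the illustration.

```python
import random

MODULUS = 2**61 - 1  # hypothetical share modulus, chosen only for this sketch

def split_into_shares(value, n, rng):
    """Split value into n random shares that sum to value modulo MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def aggregate(values, rng):
    """Return the sum of all suppliers' values using only per-supplier partial sums."""
    n = len(values)
    # shares[j][i] plays the role of v'_{ji}: the share of supplier j's value routed to supplier i.
    shares = [split_into_shares(v, n, rng) for v in values]
    partial_sums = []
    for i in range(n):
        # V_i: the combination of the other suppliers' shares that u_i receives and decrypts.
        received = sum(shares[j][i] for j in range(n) if j != i) % MODULUS
        # u_i adds the share v'_{ii} she kept, which gives V'_i.
        partial_sums.append((received + shares[i][i]) % MODULUS)
    # The aggregator only ever sees the partial sums V'_i.
    return sum(partial_sums) % MODULUS

# Hypothetical non-negative integer encodings of four suppliers' values.
total = aggregate([55, 0, 62, 48], random.Random(7))
```

In a real deployment the noisy values are real-valued and possibly negative, so they would first need a fixed-point encoding into the plaintext space.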
5.4 Estimating the parameters
Since every supplier adds controlled noise to her data, the estimated mean \(\mu^{\prime}\) is not exactly the same as \(\mu = \sum _{i=1}^{n}v_{i}^{sj}/\sum _{i=1}^{n}m_{i}\). The estimation error is controlled by the parameter ε. We will investigate the impact of ε on the estimation errors and show that the localization accuracy when we use \(\mu^{\prime}\) is comparable with that when we use μ in most cases.
where \(\lambda_{3}=90^{2}/\varepsilon\). Following the same rules as above, the aggregator computes \(\sum _{i=1}^{n}\delta _{i}^{\prime }\) without knowing any individual \(\delta _{i}^{\prime }\), and the variance of the WiFi signal strengths of \(\text{AP}_{j}\) at location \(l_{s}\) can be estimated by \(\delta ^{\prime } = \sum _{i=1}^{n}\delta _{i}^{\prime }/M^{\prime }\).
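As a small worked example of the quantities in this section, the sketch below computes the three noise scales for a given ε and turns the aggregated sums into the two estimates. It assumes that the estimated mean is the ratio of the aggregated signal-strength sum to the aggregated count, mirroring the stated variance estimate; the function names and the numeric inputs are hypothetical.

```python
def noise_scales(epsilon):
    """Return (lambda_1, lambda_2, lambda_3) = (90/eps, 1/eps, 90^2/eps)."""
    return 90 / epsilon, 1 / epsilon, 90 ** 2 / epsilon

def estimate_parameters(noisy_sum_v, noisy_count, noisy_sum_delta):
    """Estimate mean and variance from the aggregated, noise-perturbed sums."""
    if noisy_count <= 0:
        raise ValueError("aggregated count is not positive; no estimate is possible")
    mean = noisy_sum_v / noisy_count
    variance = noisy_sum_delta / noisy_count
    return mean, variance

scales = noise_scales(0.5)
mean, variance = estimate_parameters(-550.0, 10.0, 250.0)
```

The guard on the count matters in practice, because with few visitors and a small ε the noisy count can come out close to zero or negative.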
6 Theoretical analysis of the proposed scheme
In this section, we will theoretically analyze the correctness of the proposed scheme and prove that the proposed scheme can achieve the desired privacy goals.
6.1 The correctness of the scheme
In our scheme, we employ secret sharing and homomorphic encryption to protect the privacy of every supplier. Every supplier \(u_{i}\) splits her data \(v_{i}^{\prime }\) into n shares and, after the shares have been exchanged, submits \(V_{i}^{\prime }\) to the aggregator. Adding all \(V_{i}^{\prime }(1\leq i\leq n)\) together, the aggregator can get \(V^{\prime } = \sum _{i=1}^{n}V_{i}^{\prime }\). We claim that \(V^{\prime } = \sum _{i=1}^{n}V_{i}^{\prime }\) is equal to \(\sum _{i=1}^{n}v_{i}^{\prime }\), which is supported by the following theorem.
Theorem 2.
Only given \(V_{i}^{\prime }(1\leq i\leq n)\), the aggregator can correctly compute \(\sum _{i=1}^{n}v_{i}^{\prime }\) by adding \(V_{i}^{\prime }(1\leq i\leq n)\) together.
Proof.
Then, we have \(\sum _{i=1}^{n}V_{i}^{\prime } = \sum _{i=1}^{n}v_{i}^{\prime }\), which proves Theorem 2.
Following the same rules, we can also prove that the aggregator can correctly compute \(\sum _{i=1}^{n}m_{i}^{\prime }\) and \(\sum _{i=1}^{n}\delta _{i}^{\prime }\). Therefore, the correctness of the proposed scheme is proved.
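The step behind Theorem 2 can be written out as an exchange of the order of summation. The derivation assumes that \(v_{ji}^{\prime }\) denotes the share of supplier \(u_{j}\)’s value that ends up with supplier \(u_{i}\), so that the n shares of \(u_{j}\) sum to \(v_{j}^{\prime }\), and it uses \(V_{i}^{\prime } = \sum _{j=1}^{n}v_{ji}^{\prime }\) from the description of the scheme:

$$ \sum _{i=1}^{n}V_{i}^{\prime } = \sum _{i=1}^{n}\sum _{j=1}^{n}v_{ji}^{\prime } = \sum _{j=1}^{n}\sum _{i=1}^{n}v_{ji}^{\prime } = \sum _{j=1}^{n}v_{j}^{\prime }. $$

The same argument applies unchanged to the shares of \(m_{i}^{\prime }\) and \(\delta _{i}^{\prime }\).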
6.2 The security of the scheme
In the proposed scheme, every supplier adds gamma-distributed random noise to achieve differential privacy and further employs secret sharing to hide her data. We claim that the proposed scheme can achieve the desired privacy goals, which is supported by the following two theorems.
Theorem 3.
The proposed scheme satisfies ε-differential privacy.
Proof.
where \(\mathcal {L}(\lambda _{2})\) and \(\mathcal {L}(\lambda _{3})\) are two random variables following the Laplace distribution with PDF \(f(x,\lambda _{2})=\frac {1}{2\lambda _{2} }e^{-\frac {\left \vert x\right \vert }{\lambda _{2}}}\) and \(f(x,\lambda _{3})=\frac {1}{2\lambda _{3} }e^{-\frac {\left \vert x\right \vert }{\lambda _{3}}}\), respectively. According to Theorem 1, the proposed scheme achieves ε-differential privacy.
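The link between per-supplier gamma noise and Laplace noise in the aggregate can be checked numerically. The sketch assumes the standard decomposition in which each of the n suppliers adds the difference of two independent gamma variables with shape 1/n and scale λ, so that the n contributions sum to a Laplace variable with scale λ; whether the scheme uses exactly this parameterization is an assumption, and the function names are hypothetical.

```python
import random

def supplier_noise(n, scale, rng):
    """One supplier's noise share: difference of two Gamma(shape=1/n, scale) samples."""
    return rng.gammavariate(1.0 / n, scale) - rng.gammavariate(1.0 / n, scale)

def total_noise(n, scale, rng):
    """Sum of the n suppliers' noise shares, distributed as Laplace(0, scale)."""
    return sum(supplier_noise(n, scale, rng) for _ in range(n))

rng = random.Random(42)
samples = [total_noise(10, 2.0, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
```

For scale 2 the empirical variance should be close to 2·2² = 8, the variance of a Laplace variable with that scale, while no single supplier’s share is large enough on its own to act as the full noise.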
Theorem 4.
The proposed scheme can protect every supplier’s location privacy.
Proof.
In the proposed scheme, every supplier \(u_{i}\) splits her data \(v_{i}^{\prime }\), \(m_{i}^{\prime }\), and \(\delta _{i}^{\prime }\) into n random shares, keeps one share, and sends the other n−1 shares to the aggregator in encrypted form. Even if the aggregator obtains the plaintexts of these n−1 shares, it still cannot learn \(v_{i}^{\prime }\), \(m_{i}^{\prime }\), and \(\delta _{i}^{\prime }\), since \(u_{i}\) keeps one share after splitting the data. Therefore, the aggregator cannot determine whether \(u_{i}\) visited \(l_{s}\) or what WiFi signal strength she measured, which proves that the proposed scheme can protect every supplier’s location privacy.
7 Evaluation
In this section, we evaluate the performance of the proposed scheme. We focus on two important metrics in the evaluation: the utility of the aggregated data and the efficiency of the proposed scheme.
7.1 Experiment setup
We implement the supplier side of the proposed scheme on an Android platform with a Qualcomm Snapdragon 600 quad-core 1.7 GHz CPU and 2 GB RAM, and the aggregator side on a 32-bit computer with a 3.4 GHz Intel i7 CPU and 4 GB memory. The Paillier modulus used in this work is set to 1024 bits. In the experiments, we use a real-world WiFi fingerprint dataset to evaluate the performance of our algorithm. The dataset has a total of 1000 records, which were collected in a typical indoor environment. Each record contains the WiFi signal strengths from nearby APs. The total number of APs used in the experiments is 10 (i.e., N=10) and the total number of locations in the indoor environment is 76 (i.e., L=76). In the simulations, the data are randomly distributed to the suppliers and then the aggregator tries to estimate the WiFi fingerprint at every location.
7.2 Utility evaluation
In the experiments, the aggregator estimates the WiFi fingerprint at every location based on the data with noises provided by the suppliers, and then uses the estimated data to offer localization service. In this section, we evaluate the impact of the added noises on the aggregated results and the accuracy of the localization.
7.3 Computational and communication overhead
In this work, we employ the Paillier cryptosystem as our cryptographic primitive to protect the suppliers’ privacy, which inevitably brings additional computational and communication overhead. In this section, we evaluate the computational and communication cost of the proposed scheme. In the experiments, we set ε=0.4 and investigate the impact of the number of suppliers (i.e., n) on the computational time and communication overhead.
8 Conclusions
In this work, we propose a privacy-preserving site survey scheme for WiFi fingerprint-based localization. The proposed scheme uses homomorphic encryption and the differential privacy model to protect the location privacy of the participants involved in the site survey process of WiFi fingerprint-based localization. We theoretically analyze the security of the scheme and use simulation experiments on real-world data to validate the efficiency of the proposed scheme.
9 Endnote
^{1}For example, assume that the mean of the WiFi signal strengths estimated by the aggregator is \(\mu_{1}\) when \(u_{i}\) is in the group, and \(\mu_{2}\) when \(u_{i}\) is not in the group. The aggregator can recover the measured WiFi signal strength of \(u_{i}\) by the formula \(\mu_{1}\cdot n-\mu_{2}\cdot (n-1)\), where n is the number of suppliers in the group when \(u_{i}\) is included.
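The arithmetic of this differencing attack can be checked with a few made-up numbers. The measurements below are hypothetical and the function name is illustrative; the sketch assumes exact, noise-free means, which is precisely the situation differential privacy is meant to rule out.

```python
def recover_measurement(mean_with, mean_without, n):
    """Recover one supplier's value from the means with and without her (n counts her)."""
    return mean_with * n - mean_without * (n - 1)

others = [-52.0, -60.0, -57.0, -49.0]  # hypothetical measurements in dBm
target = -71.0                         # the value the aggregator should not learn

mu_2 = sum(others) / len(others)                   # mean without the target supplier
mu_1 = (sum(others) + target) / (len(others) + 1)  # mean with the target supplier
leaked = recover_measurement(mu_1, mu_2, len(others) + 1)
```

Up to floating-point rounding, `leaked` equals `target`, so hiding individual measurements is not enough if exact aggregates are released.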
Declarations
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant No. 61472418).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 1. P Bahl, VN Padmanabhan, in Proc. of IEEE INFOCOM. Radar: an in-building RF-based user location and tracking system, (2000), pp. 775–784.
 2. Z Yang, C Wu, Y Liu, in Proc. of ACM MobiCom. Locating in fingerprint space: wireless indoor localization with little human intervention, (2012), pp. 269–280.
 3. H Liu, Y Gan, J Yang, S Sidhom, Y Wang, Y Chen, F Ye, in Proc. of ACM MobiCom. Push the limit of WiFi based localization for smartphones, (2012), pp. 305–316.
 4. W Cheng, D Wu, X Cheng, D Chen, in WASA. Routing for information leakage reduction in multi-channel multi-hop ad-hoc social networks, (2012), pp. 31–42.
 5. D Milioris, L Kriara, A Papakonstantinou, G Tzagkarakis, P Tsakalides, M Papadopouli, in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems. Empirical evaluation of signal-strength fingerprint positioning in wireless LANs (ACM, 2010), pp. 5–13.
 6. J Niu, B Wang, L Cheng, JJ Rodrigues, in Communications (ICC) 2015 IEEE International Conference on. WicLoc: an indoor localization system based on WiFi fingerprints and crowdsourcing (IEEE, 2015), pp. 3008–3013.
 7. J Li, Z Cai, M Yan, Y Li, in INFOCOM, 2016 Proceedings IEEE. Using crowdsourced data in location-based social networks to explore influence maximization (IEEE, 2016).
 8. R Shokri, G Theodorakopoulos, J Le Boudec, J Hubaux, in IEEE Symposium on Security and Privacy. Quantifying location privacy, (2011), pp. 247–262.
 9. Y He, L Sun, W Yang, H Li, A game theory-based analysis of data privacy in vehicular sensor networks. Int. J. Distrib. Sens. Networks. 2014 (2014).
 10. T Shu, Y Chen, J Yang, A Williams, in INFOCOM, 2014 Proceedings IEEE. Multi-lateral privacy-preserving localization in pervasive environments (IEEE, 2014), pp. 2319–2327.
 11. X Wang, Y Liu, Z Shi, X Lu, L Sun, A privacy-preserving fuzzy localization scheme with CSI fingerprint, (2016).
 12. H Li, L Sun, H Zhu, X Lu, X Cheng, in INFOCOM, 2014 Proceedings IEEE. Achieving privacy preservation in WiFi fingerprint-based localization (IEEE, 2014), pp. 2337–2345.
 13. D Yang, X Fang, G Xue, in Proc. of IEEE INFOCOM. Truthful incentive mechanisms for k-anonymity location privacy, (2013), pp. 3094–3102.
 14. X Liu, K Liu, L Guo, X Li, Y Fang, in Proc. of IEEE INFOCOM. A game-theoretic approach for achieving k-anonymity in location based services, (2013), pp. 3085–3093.
 15. AR Beresford, F Stajano, in Proc. of the IEEE PerSec. Mix zones: user privacy in location-aware services, (2004), pp. 127–131.
 16. J Shao, R Lu, X Lin, in INFOCOM, 2014 Proceedings IEEE. FINE: a fine-grained privacy-preserving location-based service framework for mobile devices (IEEE, 2014), pp. 244–252.
 17. I Bilogrevic, M Jadliwala, K Kalkan, JP Hubaux, I Aad, in Privacy Enhancing Technologies. Privacy in mobile computing for location-sharing-based services (Springer, 2011), pp. 77–96.
 18. A Chen, C Harko, D Lambert, P Whiting, An algorithm for fast, model-free tracking indoors. ACM SIGMOBILE Mob. Comput. Commun. Rev. 11(3), 48–58 (2007).
 19. D Milioris, G Tzagkarakis, A Papakonstantinou, M Papadopouli, P Tsakalides, Low-dimensional signal-strength fingerprint-based positioning in wireless LANs. Ad Hoc Netw. 12, 100–114 (2014).
 20. C Dwork, in Encyclopedia of Cryptography and Security. Differential privacy (Springer, 2011), pp. 338–340.
 21. P Paillier, in Proc. of ACM EUROCRYPT. Public-key cryptosystems based on composite degree residuosity classes, (1999).
 22. ME Andrés, NE Bordenabe, K Chatzikokolakis, C Palamidessi, in Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security. Geo-indistinguishability: differential privacy for location-based systems (ACM, 2013), pp. 901–914.
 23. FD Garcia, B Jacobs, in Security and Trust Management. Privacy-friendly energy-metering via homomorphic encryption (Springer, 2011), pp. 226–238.
 24. S Kotz, T Kozubowski, K Podgorski, The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance (Springer Science & Business Media, 2012).