Privacy-preserving crowdsourced site survey in WiFi fingerprint-based localization
EURASIP Journal on Wireless Communications and Networking volume 2016, Article number: 123 (2016)
Abstract
Typically, site survey is an inevitable phase of WiFi fingerprint-based localization, which is regarded as one of the most promising techniques for indoor localization. However, the site survey can leak the location privacy of the participants who contribute their WiFi fingerprint measurements. In this paper, we propose a privacy-preserving site survey scheme for WiFi fingerprint-based localization. In the proposed scheme, we use homomorphic encryption to protect the location privacy of the participants involved in the site survey. Further, we employ the differential privacy model to ensure that the released data will not breach an individual’s location privacy regardless of whether she is present or absent in the site survey group. We theoretically analyze the security of the proposed scheme and use simulation experiments on a real-world dataset to validate its efficiency.
Introduction
Due to the increasing demand for location-based services (LBSs) and the lack of GPS signals in indoor environments, indoor localization has become increasingly popular in recent years. Researchers have proposed a vast range of approaches, among which WiFi fingerprint-based indoor localization is one of the most promising technologies [1–4]. A typical WiFi fingerprint-based localization algorithm consists of two phases: offline site survey and online operating. In the offline site survey phase, the service provider collects WiFi signal strengths from multiple access points (APs) at every location of an area of interest. Next, in the online operating phase, a to-be-localized client measures the signal strengths from nearby APs at a specific location, and then algorithms such as k-nearest neighbors [1, 3] or probability-based algorithms [5] are employed to infer the user’s location based on the measured WiFi signal strengths.
Usually the site survey is conducted in a crowdsourced way [2, 6, 7]. Suppliers recruited by the service provider measure the WiFi signal strengths of nearby APs when they visit the places in which the service provider is interested, and then send the measured WiFi signal strengths and the corresponding locations to the service provider. The service provider aggregates the data contributed by the suppliers to estimate the parameters that will be used in the online operating phase. Which parameters need to be estimated depends on the algorithm used in the online operating phase. In k-nearest neighbor-based algorithms, the mean of the WiFi signal strength of every AP at every location needs to be estimated, while in probability-based algorithms, both the mean and variance of the signal strength of every AP are required. Crowdsourcing is an efficient way to conduct the site survey, but the measurements contributed by the suppliers will inevitably leak their location privacy: the service provider can infer the locations that the suppliers visit based on the data they contribute. Existing research indicates that location traces can leak information about individuals’ habits, interests, activities, and relationships [8, 9]. Consequently, the loss of location privacy can expose the suppliers to unwanted advertisements and location-based spam and scams, may cause social reputation or economic damage, and can make them victims of blackmail or even physical violence.
Several approaches have been proposed to address the privacy issues of indoor localization algorithms. In [10], Shu et al. studied the privacy issues in range-based localization algorithms and proposed a scheme to protect users’ privacy during the localization process. In [11], Wang et al. developed a privacy-preserving fuzzy localization scheme with CSI (Channel State Information) fingerprints. These privacy-preserving schemes were not designed for WiFi fingerprint-based localization algorithms; thus, they cannot be used to address the privacy issues presented in this paper. The work most closely related to this paper is that of Li et al. [12], which proposed a privacy-preserving scheme to address the privacy issues of the online operating phase in WiFi fingerprint-based localization algorithms. However, they did not consider the privacy leaks of the offline site survey phase.
In this paper, we propose a privacy-preserving site survey scheme which can protect the suppliers’ location privacy in crowdsourcing-based site survey for WiFi fingerprint-based localization and, at the same time, can ensure the usability of the aggregated result for the service provider. Under this scheme, all the suppliers involved in the site survey form a group and cooperate with each other to hide their measurements from the service provider based on homomorphic encryption. Further, every supplier releases her measurements in a differentially private manner to guarantee that the released data will not breach an individual’s location privacy regardless of whether she is present or absent in the group. The contributions of this paper are summarized as follows:

To the best of our knowledge, this work is the first to address the privacy issues of the site survey in WiFi fingerprint-based localization algorithms.

We propose a privacy-preserving site survey scheme for WiFi fingerprint-based localization based on homomorphic encryption and the differential privacy model.

We theoretically analyze the security of the proposed scheme and carry out simulation experiments on a real-world dataset to evaluate the performance of our scheme.
The rest of the paper is organized as follows. We first discuss the related work and introduce the background. Then, we present the detailed design of our scheme and the security analysis. Finally, we report the evaluation results and conclude this paper.
Related work
Location privacy in LBSs has been widely studied in the literature. In general, the existing works can be classified into two categories: privacy-preserving service request and privacy-preserving localization.
Privacy-preserving service request
In LBSs, users send their locations to the service provider to get the services, which will inevitably leak their privacy. Many schemes have been proposed to protect users’ location privacy when they request location-based services. k-anonymity [13, 14] provides a form of plausible deniability by ensuring that the client cannot be individually identified from a group of k clients. Mix zone-based schemes [15] divide the whole region into application and mix zones. Clients report their locations in application zones and receive new, unused pseudonyms at mix zones. Cryptography-based approaches [16, 17] protect users’ location privacy based on secure multiparty computation protocols. Since all the above schemes focus on the privacy issues that arise when users request location services, they cannot be used to address the privacy issues we discuss in this paper.
Privacy-preserving localization
To address the location privacy issues in localization, Shu et al. [10] addressed the privacy leakage problem for range-based localization algorithms, thus preventing the leakage of the location information of both the target and the anchors. Wang et al. [11] developed a privacy-preserving fuzzy localization scheme with CSI fingerprints using homomorphic encryption and fuzzy logic. These privacy-preserving schemes were not designed for WiFi fingerprint-based localization; thus, they cannot be used to address the privacy issues presented in this paper. Li et al. [12] studied the privacy issues in WiFi fingerprint-based localization and proposed a privacy-preserving scheme to protect both the users’ and the service provider’s privacy during the online operating phase. However, they did not consider the privacy leaks during the site survey phase.
Background
WiFi fingerprint-based localization
The process of WiFi fingerprint-based localization can be divided into two phases: the offline site survey phase and the online operating phase. In the offline site survey phase, a supplier \(u_{i}\) recruited by the service provider measures the WiFi signal strengths \({V_{s}^{i}}\) of nearby APs when she visits a place \(l_{s}\) and sends \((l_{s}, {V_{s}^{i}})\) to the service provider, which aggregates the measurements and estimates the parameters that will be used in the online operating phase. In the online operating phase, a to-be-localized user measures the WiFi signal strengths at her current location, denoted as \(V^{\prime }=\left (v_{1}^{\prime },v_{2}^{\prime },\ldots,v_{j}^{\prime },\ldots,v_{N}^{\prime }\right)\). Then, the service provider uses k-nearest neighbors or probability-based algorithms to determine the location of the user.
In k-nearest neighbor-based algorithms [1, 3], the service provider estimates the average WiFi signal strengths \(\overline {V_{s}}\) at every location \(l_{s}\) based on the suppliers’ measurements and stores \((l_{s}, \overline {V_{s}})\) in the WiFi fingerprint database. In the online operating phase, the k nearest neighbors of \(V^{\prime}\) are identified from the database to estimate the location of the user. In probability-based algorithms [18, 19], the location loc of the user is determined based on Bayes’ theorem:
$$ \text{loc} = \underset{l}{\arg\max}\; P(l \mid V^{\prime}) = \underset{l}{\arg\max}\; \frac{P(V^{\prime} \mid l)\,P(l)}{P(V^{\prime})}. $$
A common assumption is that the signal strength of \(\text{AP}_{i}\) at location \(l\) follows a normal distribution parameterized by mean μ and variance δ. The parameters μ and δ are estimated by the service provider based on the measurements of the suppliers.
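The two online-phase estimators described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper: the function names, the representation of locations as (x, y) coordinates, the averaging of the k nearest coordinates, and the uniform prior over locations are all assumptions made for the example.

```python
import math

def knn_localize(database, observed, k=3):
    """database: list of ((x, y), mean_rss_vector); observed: the vector V'."""
    # Rank stored fingerprints by Euclidean distance to the observed vector.
    ranked = sorted(database, key=lambda entry: math.dist(entry[1], observed))
    nearest = [loc for loc, _ in ranked[:k]]
    # Assumption: the estimate is the average coordinate of the k nearest entries.
    return (sum(x for x, _ in nearest) / len(nearest),
            sum(y for _, y in nearest) / len(nearest))

def bayes_localize(models, observed):
    """models: list of (location, [(mean, variance), ...]), one pair per AP."""
    def log_likelihood(params):
        # Independent normal model per AP, as in the common assumption above.
        return sum(-0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
                   for (mu, var), v in zip(params, observed))
    # With a uniform prior P(l), the posterior maximizer is the likelihood maximizer.
    return max(models, key=lambda entry: log_likelihood(entry[1]))[0]
```

Both functions only need the per-location means (and, for the second, the variances), which is why those are the parameters the site survey has to estimate.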
Differential privacy
The concept of differential privacy was originally introduced by Dwork [20]. Differential privacy ensures that a supplier does not face an increased privacy risk when she participates in a statistical database. An algorithm \(\mathcal {A}\) satisfies ε-differential privacy if, for any datasets \(D_{1}\) and \(D_{2}\) that differ in at most one record, and for all subsets of possible answers \(S \subseteq \text {Range}(\mathcal {A})\),
$$ \Pr\left[\mathcal{A}(D_{1}) \in S\right] \leq e^{\epsilon} \cdot \Pr\left[\mathcal{A}(D_{2}) \in S\right]. $$
The above inequality indicates that the output of \(\mathcal {A}\) is insensitive to the modification of any single user’s data in the datasets (including its removal or addition). The parameter ε allows us to control the balance between the level of privacy and the data utility. A smaller ε implies stronger privacy. One common way to achieve differential privacy is to add Laplace noise to the original output of \(\mathcal {A}\) according to the following theorem.
Theorem 1.
For all \(f: D \rightarrow \mathbb{R}^{d}\), the following mechanism \(\mathcal {A}\) is ε-differentially private: \(\mathcal {A}(D)=f(D)+\mathcal {L}(\Delta (f)/\epsilon)\), where \(\mathcal {L}(\Delta (f)/\epsilon)\) is an independently generated random variable following the Laplace distribution with scale \(\Delta (f)/\epsilon\) and Δ(f) denotes the global sensitivity of f, which is defined as follows:
$$ \Delta(f) = \max_{D_{1}, D_{2}} \left\Vert f(D_{1}) - f(D_{2}) \right\Vert_{1} $$
for all \(D_{1}\) and \(D_{2}\) differing in at most one record.
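As a rough illustration of Theorem 1 for a scalar query, the sketch below perturbs a true answer with Laplace noise of scale Δ(f)/ε. The function names are hypothetical, and the noise is sampled as the difference of two exponential variables, which is one standard way to obtain a Laplace variable with the Python standard library.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a zero-mean Laplace distribution with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    # Release f(D) + L(sensitivity / epsilon), as in Theorem 1.
    return true_value + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)
# Example: a counting query (sensitivity 1) released with epsilon = 0.5.
noisy_count = laplace_mechanism(10.0, 1.0, 0.5, rng)
```

Halving ε doubles the noise scale, which is the privacy and utility trade-off mentioned above.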
The Paillier cryptosystem
In this work, we employ the Paillier cryptosystem as our cryptographic primitive. Invented by Pascal Paillier [21], the Paillier cryptosystem is a probabilistic asymmetric algorithm based on the decisional composite residuosity problem. The Paillier cryptosystem is summarized below to facilitate the understanding of our algorithm.

Key generation: To construct the public and private keys, one first chooses two large primes p, q of equivalent length and computes \(N = pq\), \(\lambda = \text{lcm}(p-1, q-1)\), \(g = N+1\), and \(\mu = \lambda^{-1} \bmod N\). The public key PK and private key PR are (N,g) and (λ,μ), respectively.

Encryption: Let m be the plaintext to be encrypted. We denote the ciphertext of m by E(m), which is given by
$$ E(m) = g^{m} r^{N} \bmod N^{2}, \tag{4} $$
where \(r\in \mathbb {Z}_{N}^{*}\) is a random number.

Decryption: Let c be the ciphertext. The plaintext D(c) is obtained by
$$ D(c) = L(c^{\lambda} \bmod N^{2}) \cdot \mu \bmod N, \tag{5} $$
where \(L(x) = (x-1)/N\).
The Paillier cryptosystem is additively homomorphic. Given only the public key, one can compute \(E(m_{1}+m_{2})\) from \(E(m_{1})\) and \(E(m_{2})\) as follows:
$$ E(m_{1}+m_{2}) = E(m_{1}) \cdot E(m_{2}) \bmod N^{2}. $$
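The summary above can be exercised with a toy Python sketch. It assumes the simplified variant with \(g = N+1\) and \(\mu = \lambda^{-1} \bmod N\), and it uses tiny hard-coded primes purely for illustration; the function names are hypothetical, and a real deployment would rely on a vetted library and much larger keys.

```python
import math
import secrets

def keygen(p, q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                               # valid because g = n + 1
    return (n, n + 1), (lam, mu)                       # (public key, private key)

def encrypt(pk, m):
    n, g = pk
    while True:
        r = secrets.randbelow(n - 1) + 1               # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n                       # L(x) * mu mod n

def add(pk, c1, c2):
    n, _ = pk
    return c1 * c2 % (n * n)                           # E(m1) * E(m2) = E(m1 + m2)

pk, sk = keygen(1009, 1013)                            # toy primes, not secure
total = decrypt(pk, sk, add(pk, encrypt(pk, 17), encrypt(pk, 25)))
```

Multiplying the two ciphertexts and decrypting the product recovers the sum of the plaintexts (modulo N) without either plaintext being decrypted individually.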
System model and problem formulation
System model
A typical scenario of crowdsourcing-based site survey in WiFi fingerprint-based localization is depicted in Fig. 1. In general, there are n suppliers and an aggregator (i.e., the service provider). The suppliers could be volunteers or workers recruited by the service provider. Every supplier \(u_{i}\) records the WiFi signal strengths \({V_{i}^{s}} = (v_{i}^{s1}, v_{i}^{s2},\ldots, v_{i}^{sj},\ldots)\) when she visits location \(l_{s}\) and stores \((l_{s}, {V_{i}^{s}})\) in her local database \(V_{i}\), where \(v_{i}^{sj}\) is the measured WiFi signal strength of the jth AP, \(1 \leq i \leq n\), \(l_{s} \in L\), and L is a location set defined by the service provider. The aggregator collects the measurements from the suppliers and would like to estimate the mean and variance of the WiFi signal strengths of every AP at every specific location in L based on the measurements of the suppliers.
Design goal
In the crowdsourcing-based site survey, the aggregator estimates the parameters based on the data (i.e., the measured WiFi signal strengths and the corresponding locations) contributed by the suppliers. However, the released data inevitably leak the location privacy of the suppliers. The aggregator can learn the locations that the suppliers visit. The goal of this paper is to ensure that the aggregator can estimate the mean and variance of WiFi signal strengths of every AP at every location in L, and at the same time, the location privacy of the suppliers is not compromised. In detail, we want to achieve the following privacy goals:

Location privacy: Our scheme should ensure that the aggregator cannot learn the locations that the suppliers have visited. Also, the WiFi signal strengths collected by the suppliers should not be revealed, since the aggregator can infer the suppliers’ locations from their measured WiFi signal strengths.

Differential privacy: In the crowdsourcing-based site survey, even when the measurements of every supplier are completely hidden from the aggregator, it can still infer the location of a supplier \(u_{i}\) by comparing the aggregated result obtained when \(u_{i}\) is in the site survey group with that obtained when \(u_{i}\) is not in the site survey group.^{1} Therefore, our scheme should achieve differential privacy, which has been accepted as a standard for privacy preservation [20, 22]. Differential privacy can guarantee that the aggregator can retrieve information about any supplier only up to a predefined threshold, no matter what auxiliary information it knows about that supplier.
In this paper, we adopt the “honest-but-curious” model, which assumes that each player honestly follows the designated protocols and procedures while attempting to learn the other players’ private information.
Privacy-preserving site survey
In this section, we present a novel privacy-preserving crowdsourcing-based site survey scheme which can estimate the distribution of the WiFi signal strengths at each specific location without leaking the privacy of each supplier. The proposed scheme consists of four phases, which are detailed as follows.
Preparation and initiation
In this phase, n suppliers and an aggregator form a site survey group. Within this site survey group, every supplier \(u_{i}\) generates her public key \(\text{PK}_{i}\) and private key \(\text{PR}_{i}\) using the Paillier cryptosystem and then sends the public key \(\text{PK}_{i}\) to the other suppliers and the aggregator. The above process can be executed offline and only needs to be performed once. If the aggregator wants to estimate the mean and variance of the WiFi signal strengths from the jth AP at location \(l_{s}\), it sends a request with \(\langle \text{AP}_{j}, l_{s} \rangle\) to every supplier in this group.
Adding noises
After receiving the aggregator’s request, every supplier \(u_{i}\) first queries her local dataset \(V_{i}\) to get a tuple \((m_{i},v_{i}^{sj})\), where \(m_{i}\) indicates whether the supplier \(u_{i}\) visited location \(l_{s}\) before and \(v_{i}^{sj}\) is the measured WiFi signal strength of the jth AP at location \(l_{s}\). If the supplier \(u_{i}\) visited location \(l_{s}\) before, \(m_{i}\) is set to 1 and \(v_{i}^{sj}\) is set to the measured WiFi signal strength. If the supplier \(u_{i}\) never visited location \(l_{s}\), \(m_{i}\) and \(v_{i}^{sj}\) are both set to 0.
To ensure that the presence or absence of the supplier \(u_{i}\) in the site survey group will not significantly increase her chance of being compromised (i.e., to achieve ε-differential privacy), every supplier \(u_{i}\) adds appropriately chosen random noise to \(v_{i}^{sj}\) and \(m_{i}\) as follows:
$$ v_{i}^{\prime} = v_{i}^{sj} + \mathcal{G}_{1}(n,\lambda_{1}) - \mathcal{G}_{2}(n,\lambda_{1}), \qquad m_{i}^{\prime} = m_{i} + \mathcal{G}_{3}(n,\lambda_{2}) - \mathcal{G}_{4}(n,\lambda_{2}), $$
where \(\mathcal {G}_{1}(n,\lambda _{1})\) and \(\mathcal {G}_{2}(n,\lambda _{1})\) are independent and identically distributed (i.i.d.) random variables having a gamma distribution with probability density function (PDF)
$$ g(x,n,\lambda_{1}) = \frac{(1/\lambda_{1})^{1/n}}{\Gamma(1/n)}\, x^{\frac{1}{n}-1} e^{-x/\lambda_{1}}, \quad x \geq 0, $$
and \(\mathcal {G}_{3}(n,\lambda _{2})\) and \(\mathcal {G}_{4}(n,\lambda _{2})\) are i.i.d. random variables having a gamma distribution with PDF
$$ g(x,n,\lambda_{2}) = \frac{(1/\lambda_{2})^{1/n}}{\Gamma(1/n)}\, x^{\frac{1}{n}-1} e^{-x/\lambda_{2}}, \quad x \geq 0. $$
Here \(\lambda_{1} = \Delta f_{1}/\epsilon\) and \(\lambda_{2} = \Delta f_{2}/\epsilon\), where \(\Delta f_{1}\) and \(\Delta f_{2}\) are the global sensitivities of the WiFi signal strength and m, respectively. Since the WiFi signal strength ranges from −90 to 0 dBm and m∈{0,1}, we set \(\Delta f_{1} = 90\) and \(\Delta f_{2} = 1\). The parameter ε controls the tradeoff between the desired privacy level and the data utility. A smaller ε yields a stronger privacy guarantee but generates more noise. In the evaluation section, we will investigate the impact of ε on the data utility. We will prove that the proposed scheme can achieve ε-differential privacy in the next section.
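The per-supplier noise can be sketched with the standard library's gamma sampler, whose parameters are a shape and a scale; the sketch assumes shape 1/n and scale λ, matching the gamma PDF used in the security analysis. The helper names are hypothetical. Summing the noise of all n suppliers should behave like a single Laplace variable with scale λ, whose variance is \(2\lambda^{2}\).

```python
import random

def supplier_noise(n, lam, rng):
    # G(n, lam) is a gamma variable with shape 1/n and scale lam;
    # each supplier adds the difference of two independent draws.
    return rng.gammavariate(1.0 / n, lam) - rng.gammavariate(1.0 / n, lam)

def noise_scales(epsilon):
    # lambda_1 = 90 / epsilon for the signal strength, lambda_2 = 1 / epsilon for m.
    return 90.0 / epsilon, 1.0 / epsilon

rng = random.Random(1)
n = 10
lam1, lam2 = noise_scales(0.4)
noisy_v = -63.0 + supplier_noise(n, lam1, rng)  # one supplier's perturbed reading
noisy_m = 1 + supplier_noise(n, lam2, rng)      # the matching perturbed indicator
```

No single supplier adds full Laplace noise; the guarantee relies on the aggregate of all n contributions.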
Encrypting data
After adding noise to her data, every supplier employs secret sharing [23] and the Paillier cryptosystem to hide her data from the aggregator. For simplicity, we only demonstrate how to hide \(v_{i}^{\prime }\); \(m_{i}^{\prime }\) is hidden in the same way. Each supplier \(u_{i}\) first splits \(v_{i}^{\prime }\) into n random shares as follows:
$$ v_{i}^{\prime} = \sum_{j=1}^{n} v_{ij}^{\prime} \bmod \eta, $$
where η is a large integer. Then, each supplier \(u_{i}\) keeps \(v_{ii}^{\prime }\) for herself, encrypts every other share \(v_{ij}^{\prime }\) (\(j \neq i\)) using the public key of supplier \(u_{j}\), and then sends \(E_{\text {PK}_{j}}(v_{ij}^{\prime })\) to the aggregator. After the aggregator receives the encrypted shares from all the suppliers, it combines the shares that are encrypted under the same public key using the additively homomorphic property of the Paillier cryptosystem as follows:
$$ V_{i} = \prod_{j \neq i} E_{\text{PK}_{i}}(v_{ji}^{\prime}) = E_{\text{PK}_{i}}\Big(\sum_{j \neq i} v_{ji}^{\prime}\Big). $$
Then, the aggregator sends \(V_{i}\) to the supplier \(u_{i}\). Every supplier \(u_{i}\) decrypts \(V_{i}\) using her private key \(\text{PR}_{i}\), adds her own share \(v_{ii}^{\prime }\) to \(D_{\text {PK}_{i}}(V_{i})\) to get \(V_{i}^{\prime } = \sum _{j=1}^{n}v_{ji}^{\prime }\) in plaintext, and sends \(V_{i}^{\prime }\) to the aggregator. Adding all \(V_{i}^{\prime }(1\leq i\leq n)\) together, the aggregator obtains \(V^{\prime } = \sum _{i=1}^{n}V_{i}^{\prime }\), which equals \(\sum _{i=1}^{n}v_{i}^{\prime }\). We will prove its correctness in the next section. In the same way, every supplier can hide \(m_{i}^{\prime }\) from the others, while the aggregator still obtains \(M^{\prime } = \sum _{i=1}^{n}m_{i}^{\prime }\).
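The share-and-aggregate round described above can be simulated end to end. The sketch below makes several assumptions that are not in the text: values are non-negative integers, shares are drawn modulo η with the last share fixing the sum, the final sum is reduced modulo η, and each supplier uses a toy Paillier key built from small hard-coded primes. All function names are hypothetical.

```python
import math
import secrets

def keygen(p, q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n))            # modulus N, private key (lambda, mu)

def enc(n, m):
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return pow(n + 1, m, n * n) * pow(r, n, n * n) % (n * n)

def dec(n, sk, c):
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def split(value, count, eta):
    # count - 1 random shares; the last share makes the total equal value mod eta.
    shares = [secrets.randbelow(eta) for _ in range(count - 1)]
    shares.append((value - sum(shares)) % eta)
    return shares

def secure_sum(values, keys, eta):
    n = len(values)
    shares = [split(v, n, eta) for v in values]  # shares[i][j] plays the role of v'_ij
    partial = []
    for i in range(n):
        N_i, sk_i = keys[i]
        V_i = 1
        for j in range(n):
            if j != i:
                # Supplier j encrypts her share for supplier i; the aggregator
                # multiplies the ciphertexts that arrive under PK_i.
                V_i = V_i * enc(N_i, shares[j][i]) % (N_i * N_i)
        # Supplier i decrypts and adds the share she kept for herself.
        partial.append(dec(N_i, sk_i, V_i) + shares[i][i])
    return sum(partial) % eta                    # what the aggregator learns

keys = [keygen(1009, 1013), keygen(1019, 1021), keygen(1031, 1033)]
result = secure_sum([55, 60, 48], keys, 100003)
```

Here η is chosen so that the sum of the n − 1 shares encrypted under one key stays below that key's modulus; otherwise the decrypted partial sum would wrap around modulo N.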
Estimating the parameters
After getting \(V^{\prime } = \sum _{i=1}^{n}V_{i}^{\prime }\) and \(M^{\prime } = \sum _{i=1}^{n}m_{i}^{\prime }\), the aggregator can estimate the mean (denoted as \(\mu^{\prime}\)) of the WiFi signal strengths of \(\text{AP}_{j}\) at location \(l_{s}\) as follows:
$$ \mu^{\prime} = \frac{V^{\prime}}{M^{\prime}}. $$
Since every supplier adds controlled noises to her data, the estimated mean μ ^{′} is not exactly the same as \(\mu = \sum _{i=1}^{n}v_{i}^{sj}/\sum _{i=1}^{n}m_{i}\). The estimation error is controlled by the parameter ε. We will investigate the impact of ε on the estimation errors and show that the localization accuracy when we use μ ^{′} is comparable with that when we use μ in most cases.
To estimate the variance \(\delta^{\prime}\), the aggregator sends \(\mu^{\prime}\) back to every supplier \(u_{i}\), who can then compute \(\delta_{i}\) as follows:
$$ \delta_{i} = m_{i}\left(v_{i}^{sj} - \mu^{\prime}\right)^{2}. $$
Following the same rules described above, every supplier \(u_{i}\) adds a random noise \(\mathcal {G}_{5}(n,\lambda _{3}) - \mathcal {G}_{6}(n,\lambda _{3})\) to \(\delta_{i}\) to get \(\delta _{i}^{\prime } = \delta _{i} + \mathcal {G}_{5}(n,\lambda _{3}) - \mathcal {G}_{6}(n,\lambda _{3})\), and then sends \(\delta _{i}^{\prime }\) to the aggregator in the same secret way. \(\mathcal {G}_{5}(n,\lambda _{3})\) and \(\mathcal {G}_{6}(n,\lambda _{3})\) are i.i.d. random variables having a gamma distribution with PDF
$$ g(x,n,\lambda_{3}) = \frac{(1/\lambda_{3})^{1/n}}{\Gamma(1/n)}\, x^{\frac{1}{n}-1} e^{-x/\lambda_{3}}, \quad x \geq 0, $$
where \(\lambda_{3} = 90^{2}/\epsilon\). Following the same rules as above, the aggregator computes \(\sum _{i=1}^{n}\delta _{i}^{\prime }\) without learning any individual \(\delta _{i}^{\prime }\), and the variance of the WiFi signal strengths of \(\text{AP}_{j}\) at location \(l_{s}\) can be estimated by \(\delta ^{\prime } = \sum _{i=1}^{n}\delta _{i}^{\prime }/M^{\prime }\).
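The two estimation rounds can be illustrated without the encryption layer, since the aggregator only ever uses the three sums. The sketch below is an assumption-laden illustration: it computes the sums in the clear, takes each supplier's squared-deviation term to be zero when she did not visit the location, and uses hypothetical function names.

```python
import random

def gamma_noise(n, lam, rng):
    # Difference of two gamma draws with shape 1/n and scale lam.
    return rng.gammavariate(1.0 / n, lam) - rng.gammavariate(1.0 / n, lam)

def estimate(records, epsilon, rng):
    """records: list of (m_i, v_i) pairs, with (0, 0) for non-visitors."""
    n = len(records)
    lam1, lam2, lam3 = 90.0 / epsilon, 1.0 / epsilon, 90.0 ** 2 / epsilon
    # Round 1: noisy sums V' and M' give the mean estimate.
    V = sum(v + gamma_noise(n, lam1, rng) for _, v in records)
    M = sum(m + gamma_noise(n, lam2, rng) for m, _ in records)
    mu = V / M
    # Round 2: each supplier perturbs her squared deviation from mu'.
    D = sum(m * (v - mu) ** 2 + gamma_noise(n, lam3, rng) for m, v in records)
    return mu, D / M

rng = random.Random(7)
records = [(1, -60.0), (1, -70.0), (0, 0.0), (1, -65.0)]
mu_est, var_est = estimate(records, 2.0, rng)
```

With a very large ε the noise vanishes and the estimates approach the exact mean and variance over the suppliers who visited the location; with small ε and few suppliers, \(M^{\prime}\) can be far from the true count, which is the utility cost studied in the evaluation.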
Theoretical analysis of the proposed scheme
In this section, we will theoretically analyze the correctness of the proposed scheme and prove that the proposed scheme can achieve the desired privacy goals.
The correctness of the scheme
In our scheme, we employ secret sharing and homomorphic encryption to protect the privacy of every supplier. Every supplier \(u_{i}\) splits her data \(v_{i}^{\prime }\) into n shares and eventually submits the partial sum \(V_{i}^{\prime }\) to the aggregator. Adding all \(V_{i}^{\prime }(1\leq i\leq n)\) together, the aggregator can get \(V^{\prime } = \sum _{i=1}^{n}V_{i}^{\prime }\). We claim that \(V^{\prime } = \sum _{i=1}^{n}V_{i}^{\prime }\) is equal to \(\sum _{i=1}^{n}v_{i}^{\prime }\), which is supported by the following theorem.
Theorem 2.
Given only \(V_{i}^{\prime }(1\leq i\leq n)\), the aggregator can correctly compute \(\sum _{i=1}^{n}v_{i}^{\prime }\) by adding \(V_{i}^{\prime }(1\leq i\leq n)\) together.
Proof.
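As described above, \(V_{i}^{\prime } = D_{\text {PK}_{i}}(V_{i}) + v_{ii}^{\prime }\); thus, we have
$$ \sum_{i=1}^{n} V_{i}^{\prime} = \sum_{i=1}^{n}\Big[ D_{\text{PK}_{i}}\Big(\prod_{j \neq i} E_{\text{PK}_{i}}(v_{ji}^{\prime})\Big) + v_{ii}^{\prime} \Big]. $$
Applying the additively homomorphic property of the Paillier cryptosystem, we have \( \underset {j\neq i}{ \prod }E_{\text {PK}_{i}}(v_{ji}^{\prime }) = E_{\text {PK}_{i}}({\underset {j\neq i}{\sum }v_{ji}^{\prime }})\), thus
$$ \sum_{i=1}^{n} V_{i}^{\prime} = \sum_{i=1}^{n}\Big[ \sum_{j \neq i} v_{ji}^{\prime} + v_{ii}^{\prime} \Big] = \sum_{i=1}^{n}\sum_{j=1}^{n} v_{ji}^{\prime} = \sum_{j=1}^{n}\sum_{i=1}^{n} v_{ji}^{\prime} = \sum_{j=1}^{n} v_{j}^{\prime}, $$
where the last step holds because the n shares of supplier \(u_{j}\) sum to \(v_{j}^{\prime}\) by construction. Then, we have \(\sum _{i=1}^{n}V_{i}^{\prime } = \sum _{i=1}^{n}v_{i}^{\prime }\), which proves Theorem 2.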
Following the same rules, we can also prove that the aggregator can correctly compute \(\sum _{i=1}^{n}m_{i}^{\prime }\) and \(\sum _{i=1}^{n}\delta _{i}^{\prime }\). Therefore, the correctness of the proposed scheme is proved.
The security of the scheme
In the proposed scheme, every supplier adds random noise having a gamma distribution to achieve differential privacy and further employs secret sharing to hide her data. We claim that the proposed scheme can achieve the desired privacy goals, which is supported by the following two theorems:
Theorem 3.
The proposed scheme satisfies ε-differential privacy.
Proof.
In the proposed scheme, \(v_{i}^{\prime }=v_{i}+\mathcal {G}_{1}(n,\lambda _{1}) - \mathcal {G}_{2}(n,\lambda _{1})\), where \(\mathcal {G}_{1}(n,\lambda _{1})\) and \(\mathcal {G}_{2}(n,\lambda _{1})\) are i.i.d. random variables having a gamma distribution with PDF
$$ g(x,n,\lambda_{1}) = \frac{(1/\lambda_{1})^{1/n}}{\Gamma(1/n)}\, x^{\frac{1}{n}-1} e^{-x/\lambda_{1}}, \quad x \geq 0; $$
thus, we have
$$ \sum_{i=1}^{n} v_{i}^{\prime} = \sum_{i=1}^{n} v_{i} + \sum_{i=1}^{n}\left[\mathcal{G}_{1}(n,\lambda_{1}) - \mathcal{G}_{2}(n,\lambda_{1})\right]. $$
Let \(\mathcal {L}(\lambda_{1})\) denote a random variable which has a Laplace distribution with PDF \(f(x,\lambda _{1})=\frac {1}{2\lambda _{1} }e^{-\frac {\left \vert x\right \vert }{\lambda _{1}}}\). According to [24], the distribution of \(\mathcal {L}(\lambda _{1})\) is infinitely divisible. Furthermore, for every integer n≥1, \(\mathcal {L}(\lambda _{1}) = \sum _{i=1}^{n}\left [ {\mathcal {G}}_{1}(n,\lambda _{1}) - {\mathcal {G}}_{2}(n,\lambda _{1})\right ]\), where \(\mathcal {G}_{1}(n,\lambda _{1})\) and \(\mathcal {G}_{2}(n,\lambda _{1})\) are i.i.d. random variables having a gamma distribution with PDF \(g(x,n,\lambda _{1})=\frac {(1/\lambda _{1})^{1/n}}{\Gamma (1/n)}x^{\frac {1}{n}-1}e^{-x/\lambda _{1} }\), where x≥0. Thus, we have
$$ \sum_{i=1}^{n} v_{i}^{\prime} = \sum_{i=1}^{n} v_{i} + \mathcal{L}(\lambda_{1}). $$
Similarly, we have
$$ \sum_{i=1}^{n} m_{i}^{\prime} = \sum_{i=1}^{n} m_{i} + \mathcal{L}(\lambda_{2}), \qquad \sum_{i=1}^{n} \delta_{i}^{\prime} = \sum_{i=1}^{n} \delta_{i} + \mathcal{L}(\lambda_{3}), $$
where \(\mathcal {L}(\lambda _{2})\) and \(\mathcal {L}(\lambda _{3})\) are two random variables following the Laplace distribution with PDF \(f(x,\lambda _{2})=\frac {1}{2\lambda _{2} }e^{-\frac {\left \vert x\right \vert }{\lambda _{2}}}\) and \(f(x,\lambda _{3})=\frac {1}{2\lambda _{3} }e^{-\frac {\left \vert x\right \vert }{\lambda _{3}}}\), respectively. According to Theorem 1, the proposed scheme achieves ε-differential privacy.
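The decomposition of Laplace noise into gamma differences used in this proof can be checked with characteristic functions. The following is a standard argument, sketched here under the assumption that all noise terms are mutually independent; the symbol \(\varphi\) is notation introduced only for this check.

$$
\begin{aligned}
\varphi_{\mathcal{G}(n,\lambda)}(t) &= (1 - i\lambda t)^{-1/n}, \\
\varphi_{\mathcal{G}_{1} - \mathcal{G}_{2}}(t) &= (1 - i\lambda t)^{-1/n}\,(1 + i\lambda t)^{-1/n} = \left(1 + \lambda^{2} t^{2}\right)^{-1/n}, \\
\varphi_{\sum_{i=1}^{n}(\mathcal{G}_{1} - \mathcal{G}_{2})}(t) &= \left[\left(1 + \lambda^{2} t^{2}\right)^{-1/n}\right]^{n} = \frac{1}{1 + \lambda^{2} t^{2}}.
\end{aligned}
$$

The last expression is the characteristic function of a zero-mean Laplace distribution with scale λ, so the n per-supplier noise terms together amount to a single Laplace draw.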
Theorem 4.
The proposed scheme can protect every supplier’s location privacy.
Proof.
In the proposed scheme, every supplier \(u_{i}\) splits her data \(v_{i}^{\prime }\), \(m_{i}^{\prime }\), and \(\delta _{i}^{\prime }\) into n random shares and sends the other n−1 shares, encrypted, to the aggregator. Even if the aggregator obtained the plaintexts of these n−1 shares, it still could not learn \(v_{i}^{\prime }\), \(m_{i}^{\prime }\), and \(\delta _{i}^{\prime }\), since \(u_{i}\) keeps one share after splitting the data. Therefore, the aggregator cannot figure out whether \(u_{i}\) visited \(l_{s}\) or what WiFi signal strength she measured, which proves that the proposed scheme can protect every supplier’s location privacy.
Evaluation
In this section, we evaluate the performance of the proposed scheme. We focus on two important metrics in the evaluation: the utility of the aggregated data and the efficiency of the proposed scheme.
Experiment setup
We implement the supplier side of the proposed scheme on an Android platform with a Qualcomm Snapdragon 600 quad-core 1.7 GHz CPU and 2 GB of RAM, and the aggregator side of the proposed scheme on a 32-bit computer with a 3.4 GHz Intel i7 CPU and 4 GB of memory. The Paillier modulus used in this work is set to 1024 bits. In the experiments, we use a real-world WiFi fingerprint dataset to evaluate the performance of our algorithm. The dataset has a total of 1000 records which are collected in a typical indoor environment. Each record contains the WiFi signal strengths from nearby APs. The total number of APs used in the experiments is 10 (i.e., n=10) and the total number of locations in the indoor environment is 76 (i.e., |L|=76). In the simulations, the data are randomly distributed to the suppliers and then the aggregator tries to estimate the WiFi fingerprint at every location.
Utility evaluation
In the experiments, the aggregator estimates the WiFi fingerprint at every location based on the data with noises provided by the suppliers, and then uses the estimated data to offer localization service. In this section, we evaluate the impact of the added noises on the aggregated results and the accuracy of the localization.
In the proposed scheme, every supplier adds two random noise terms with gamma distribution to her measurements to achieve ε-differential privacy. In the experiments, we employ the Euclidean distance as the metric to evaluate the usability of the noisy data. Figure 2 presents the cumulative distribution function (CDF) of the Euclidean distance between the WiFi fingerprint estimated without noise and the WiFi fingerprint estimated with noise. It is observed that the accuracy of estimation increases when ε increases from 0.4 to 2.0, and 80 % of the Euclidean distances between the noisy WiFi fingerprint and the original WiFi fingerprint are smaller than 6. A smaller ε yields larger noise, which in turn provides stronger privacy preservation. There is thus a trade-off between the utility of the data and the privacy.
Further, we investigate the impact of our privacy-preserving scheme on the localization accuracy. In this paper, we employ k-nearest neighbors to determine the unknown locations in the online operating phase. Figure 3 shows the CDF of the localization errors when ε is set to different values and when no noise is added to the data. It is observed that the localization accuracy increases when ε increases from 0.4 to 2.0 and that the localization accuracy is the highest when no noise is added to the data. We also observe that 80 % of the localization errors are within 5 m, which implies high data usability.
Computational and communication overhead
In this work, we employ the Paillier cryptosystem as our cryptographic primitive to protect the suppliers’ privacy, which inevitably brings additional computational and communication overhead. In this section, we evaluate the computational and communication cost of the proposed scheme. In the experiments, we set ε=0.4 and investigate the impact of the number of suppliers (i.e., n) on the computational time and communication overhead.
Figure 4 shows the time cost on the supplier side and the aggregator side for estimating the WiFi signal strength of one AP at every location. We can see that the computational time on the supplier side is proportional to the number of the suppliers. When the number of the suppliers is set to 10, the time cost on the supplier side is 1.6 s. When the number of the suppliers reaches 100, the time cost on the supplier side is 16 s. The computational time on the aggregator side is proportional to the square of the number of the suppliers. When the number of the suppliers is 10, the computational time on the aggregator side is only 0.08 s. However, when the number of the suppliers reaches 100, the computational time on the aggregator side becomes 8.8 s.
Figure 5 shows the impact of the number of suppliers on the bandwidth cost of every supplier and the aggregator. It is observed that the bandwidth cost of a supplier is proportional to the number of suppliers. The bandwidth cost is 10 kb for every supplier when the number of suppliers is 10, and it reaches 101 kb for every supplier when the number of suppliers is 100. The bandwidth cost on the aggregator side is proportional to the square of the number of suppliers. When the number of suppliers is 10, the bandwidth cost of the aggregator is only 110 kb. When the number of suppliers is 100, the bandwidth cost becomes 10100 kb.
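The linear and quadratic trends reported above are consistent with a simple count of cryptographic operations per requested value, read off the protocol description. The sketch below is only such a count, under the assumption that plaintext messages and key distribution are ignored; the function name and dictionary keys are hypothetical.

```python
def protocol_counts(n):
    # Per requested value: each supplier encrypts one share for every other
    # supplier and decrypts one combined ciphertext; the aggregator handles
    # every encrypted share sent by every supplier.
    return {
        "supplier_encryptions": n - 1,
        "supplier_decryptions": 1,
        "aggregator_ciphertexts": n * (n - 1),
    }

small, large = protocol_counts(10), protocol_counts(100)
growth = large["aggregator_ciphertexts"] / small["aggregator_ciphertexts"]
```

Going from 10 to 100 suppliers multiplies the number of ciphertexts handled by the aggregator by 9900/90 = 110, the same factor as the reported increase in aggregator time from 0.08 s to 8.8 s.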
Conclusions
In this work, we propose a privacy-preserving site survey scheme for WiFi fingerprint-based localization. The proposed scheme uses homomorphic encryption and the differential privacy model to protect the location privacy of the participants involved in the site survey process of WiFi fingerprint-based localization. We theoretically analyze the security of the scheme and use simulation experiments on real-world data to validate the efficiency of the proposed scheme.
Endnote
^{1}For example, assume that the mean of the WiFi signal strengths estimated by the aggregator is \(\mu_{1}\) when \(u_{i}\) is in the group, and the mean of the WiFi signal strengths is \(\mu_{2}\) when \(u_{i}\) is not in the group. The aggregator can get the measured WiFi signal strength of \(u_{i}\) by the formula \(\mu_{1} \cdot n - \mu_{2} \cdot (n-1)\), where n is the number of suppliers in the group.
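The arithmetic of this differencing attack can be checked with a small Python sketch. The function name and the sample readings are made up for illustration, and the sketch assumes that every supplier in the group contributed a measurement.

```python
def recover_measurement(mean_with, mean_without, n):
    # mu_1 * n is the sum over all n suppliers; mu_2 * (n - 1) is the sum
    # without the target, so the difference is the target's own reading.
    return mean_with * n - mean_without * (n - 1)

readings = [-60.0, -70.0, -50.0]                 # the last reading is the target's
mu1 = sum(readings) / len(readings)              # mean with the target
mu2 = sum(readings[:-1]) / (len(readings) - 1)   # mean without the target
leaked = recover_measurement(mu1, mu2, len(readings))
```

Exact means therefore reveal an individual reading even when each measurement is hidden, which is what the added noise is meant to prevent.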
References
[1] P Bahl, VN Padmanabhan, in Proc. of IEEE INFOCOM. Radar: an in-building RF-based user location and tracking system, (2000), pp. 775–784.
[2] Z Yang, C Wu, Y Liu, in Proc. of ACM MobiCom. Locating in fingerprint space: wireless indoor localization with little human intervention, (2012), pp. 269–280.
[3] H Liu, Y Gan, J Yang, S Sidhom, Y Wang, Y Chen, F Ye, in Proc. of ACM MobiCom. Push the limit of WiFi based localization for smartphones, (2012), pp. 305–316.
[4] W Cheng, D Wu, X Cheng, D Chen, in WASA. Routing for information leakage reduction in multi-channel multi-hop ad-hoc social networks, (2012), pp. 31–42.
[5] D Milioris, L Kriara, A Papakonstantinou, G Tzagkarakis, P Tsakalides, M Papadopouli, in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems. Empirical evaluation of signal-strength fingerprint positioning in wireless LANs (ACM, 2010), pp. 5–13.
[6] J Niu, B Wang, L Cheng, JJ Rodrigues, in Communications (ICC) 2015 IEEE International Conference on. WicLoc: an indoor localization system based on WiFi fingerprints and crowdsourcing (IEEE, 2015), pp. 3008–3013.
[7] J Li, Z Cai, M Yan, Y Li, in INFOCOM, 2016 Proceedings IEEE. Using crowdsourced data in location-based social networks to explore influence maximization (IEEE, 2016).
[8] R Shokri, G Theodorakopoulos, J Le Boudec, J Hubaux, in IEEE Symposium on Security and Privacy. Quantifying location privacy, (2011), pp. 247–262.
[9] Y He, L Sun, W Yang, H Li, A game theory-based analysis of data privacy in vehicular sensor networks. Int. J. Distrib. Sens. Networks 2014 (2014).
[10] T Shu, Y Chen, J Yang, A Williams, in INFOCOM, 2014 Proceedings IEEE. Multilateral privacy-preserving localization in pervasive environments (IEEE, 2014), pp. 2319–2327.
[11] X Wang, Y Liu, Z Shi, X Lu, L Sun, A privacy-preserving fuzzy localization scheme with CSI fingerprint, (2016).
[12] H Li, L Sun, H Zhu, X Lu, X Cheng, in INFOCOM, 2014 Proceedings IEEE. Achieving privacy preservation in WiFi fingerprint-based localization (IEEE, 2014), pp. 2337–2345.
[13] D Yang, X Fang, G Xue, in Proc. of IEEE INFOCOM. Truthful incentive mechanisms for k-anonymity location privacy, (2013), pp. 3094–3102.
[14] X Liu, K Liu, L Guo, X Li, Y Fang, in Proc. of IEEE INFOCOM. A game-theoretic approach for achieving k-anonymity in location based services, (2013), pp. 3085–3093.
[15] AR Beresford, F Stajano, in Proc. of the IEEE PerSec. Mix zones: user privacy in location-aware services, (2004), pp. 127–131.
[16] J Shao, R Lu, X Lin, in INFOCOM, 2014 Proceedings IEEE. FINE: a fine-grained privacy-preserving location-based service framework for mobile devices (IEEE, 2014), pp. 244–252.
[17] I Bilogrevic, M Jadliwala, K Kalkan, JP Hubaux, I Aad, in Privacy Enhancing Technologies. Privacy in mobile computing for location-sharing-based services (Springer, 2011), pp. 77–96.
[18] A Chen, C Harko, D Lambert, P Whiting, An algorithm for fast, model-free tracking indoors. ACM SIGMOBILE Mob. Comput. Commun. Rev. 11(3), 48–58 (2007).
[19] D Milioris, G Tzagkarakis, A Papakonstantinou, M Papadopouli, P Tsakalides, Low-dimensional signal-strength fingerprint-based positioning in wireless LANs. Ad Hoc Netw. 12, 100–114 (2014).
[20] C Dwork, in Encyclopedia of Cryptography and Security. Differential privacy (Springer, 2011), pp. 338–340.
[21] P Paillier, in Proc. of ACM EUROCRYPT. Public-key cryptosystems based on composite degree residuosity classes, (1999).
[22] ME Andrés, NE Bordenabe, K Chatzikokolakis, C Palamidessi, in Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security. Geo-indistinguishability: differential privacy for location-based systems (ACM, 2013), pp. 901–914.
[23] FD Garcia, B Jacobs, in Security and Trust Management. Privacy-friendly energy-metering via homomorphic encryption (Springer, 2011), pp. 226–238.
[24] S Kotz, T Kozubowski, K Podgorski, The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance (Springer Science & Business Media, 2012).
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant No. 61472418).
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Li, S., Li, H. & Sun, L. Privacy-preserving crowdsourced site survey in WiFi fingerprint-based localization. J Wireless Com Network 2016, 123 (2016). https://doi.org/10.1186/s1363801606242
DOI: https://doi.org/10.1186/s1363801606242
Keywords
 Privacy
 Crowdsourcing
 Site survey
 Differential privacy