

Open Access

Robust indoor localization and tracking using GSM fingerprints

  • Ye Tian (1, corresponding author),
  • Bruce Denby (1),
  • Iness Ahriz (2),
  • Pierre Roussel (3) and
  • Gérard Dreyfus (3)
EURASIP Journal on Wireless Communications and Networking 2015, 2015:157

Received: 27 September 2014

Accepted: 29 May 2015

Published: 6 June 2015


Abstract

This article presents an easy-to-implement approach to indoor localization and navigation that combines Bayesian filtering with support vector machine classifiers in order to associate high-dimensionality received signal strength fingerprints from the cellular telephone network with distinct spatial regions. The technique employs a “space sampling” and a “time sampling” scheme in the training procedure, and the Bayesian filter allows a priori information on room layout and target trajectories to be introduced, resulting in robust room-level indoor localization.


Keywords: Indoor localization; Target tracking; Fingerprinting; Support vector machine; Bayesian filter

1 Introduction

A variety of localization and tracking approaches based on the Global Positioning System (GPS) and other satellite-based systems provide solutions for outdoor environments [1]. However, satellite signals do not penetrate buildings adequately, so such systems cannot function indoors. This has led to a variety of proposed methods for indoor localization and tracking that aim to provide seamless and ubiquitous services for mobile users [2–14].

Existing indoor localization techniques are implemented using such technologies as infrared, Bluetooth, radio-frequency identification (RFID), wireless local area networks (WLAN), Global System for Mobile Communications (GSM) networks, ultra-wideband (UWB), acoustic signals, etc. Methods based on the measurement of received signal strength (RSS) in RF networks, such as Wi-Fi and Bluetooth networks, have proven to be effective [2–9]. These are low-cost and simple to implement because wireless system receivers commonly possess RSS measurement capabilities. However, such short-range signals require the deployment and maintenance of dedicated networks, which is time- and labor-consuming.

Indoor localization approaches based on ambient radiotelephone networks such as GSM and CDMA have also been studied [10–12], and the results suggest that an appropriately programmed standard cellular mobile phone can provide a simple, inexpensive solution for room-level indoor localization [13]. In indoor environments, where it is rare to have a line-of-sight path between mobile terminals and base stations, RSS is significantly affected by shadowing and multipath effects, making it difficult to develop a mathematical model to estimate distances from RSS [14]. Classification of RSS values according to position, however, has been shown to be an interesting alternative; here, the mobile terminal’s position is estimated by classifying sets of measured RSS values, called fingerprints, obtained from distinct spatial regions, making use of a model constructed in a previous “training” phase.

This paper presents a room-level indoor localization method that uses support vector machines (SVMs) to classify RSS vectors containing very large numbers of GSM channels. Using a “space sampling” scheme, training and test data were recorded while walking randomly inside the rooms, which makes it possible to localize a mobile terminal at arbitrary positions, not only at a few representative points. The robustness of the received signal strength fingerprinting approach was investigated with experiments over several months, in which both the evolution of received signal strengths and the localization performance were examined. In order to combat the severe, performance-degrading fluctuations to which radiotelephone RSS values are susceptible [15], a “time sampling” scheme is introduced to capture as much of the RSS fluctuation as possible. A Bayesian filter is furthermore applied in order to use a priori information about the indoor environment and the mobile’s trajectory to correct possible errors in the raw SVM outputs. Tests in several rooms of an office building show that the space sampling and time sampling schemes allow the performance of our indoor room-level localization system to remain stable over a period of months, and that the Bayesian filter is able to correct most of the localization errors made by the classifier. Good results using the system in a railway and subway transfer station are also presented.

The structure of the article is as follows. The principle of indoor localization is described in “Section 2,” followed by a presentation of the experimental setup in “Section 3” and of the localization algorithms in “Section 4.” Results are presented in “Section 5,” while a conclusion appears in the last section.

2 Background

2.1 Fingerprint-based localization technique

Localization can be based on signals of various natures: electromagnetic (RSS, time of arrival, angle of arrival), magnetic, acoustic, etc. An RSS fingerprint consists of a set of location-dependent received signal strengths. For indoor localization, the fingerprinting approach consists of two phases as described below.

2.1.1 Training phase

In the training phase, also known as the offline or calibration phase, position-tagged RSS values over a wide range of positions are recorded and used to construct a model that relates RSS to position. The locations at which measurements are performed can correspond to grid points, specific reference points, or regions, depending on the targeted application and desired accuracy. Because these signals are variable and noisy, it is necessary to record as many measurements in each location as possible to cover a large diversity of fingerprints and get an accurate estimation of the distribution of fingerprints. An appropriate number of measurements taken during the training phase will assure adequate performance later on during the online localization phase [16].

2.1.2 Localization phase

In the localization phase, measurements are recorded online and sent to the localization model developed in the training phase, which in turn provides a location estimation.

2.2 GSM RSS fingerprint

GSM is the most widely deployed cellular telephony standard in the world, with networks provided in more than 220 countries by nearly 800 mobile operators worldwide [17]. Most cellphones today still support GSM, and GSM will continue to be used until at least 2021. Because 3G operates at relatively high frequencies and has poor penetration, and 4G is not yet deployed in most places, GSM was chosen for our indoor localization studies, since its ubiquity avoids the need for time- and labor-consuming infrastructure deployment and maintenance. In addition, it is shown in [10] that GSM signal strengths have smaller fluctuations in time than 2.4-GHz Wi-Fi signals.

In most installations, both a 900- and a 1800-MHz band are used for GSM, with some variations from country to country. In this work, all experiments used the 900- and 1800-MHz bands as defined in France. Each carrier is labeled with an absolute radio-frequency channel number (ARFCN) as shown in Table 1 [18].
Table 1

ARFCN of extended GSM-900 and GSM-1800

Band | ARFCN
Extended GSM-900 | 0–124, 975–1023
GSM-1800 | 512–885


There are overall 548 channels in the combined Extended GSM-900 (including the standard GSM-900 band) and GSM-1800 bands. Although one might expect broadcast control channels (BCCH, or beacon channels), in which data is transmitted at constant power, to be the most useful for localization, our system in fact scans RSS of all 548 channels without regard to the type of logical channel implemented. This has an added advantage of allowing a very rapid scan (see “Section 3.2”), since it is then not necessary to decode the base station identity code of each carrier. These “full-channel” fingerprints provide a rich measurement of the local radio environment.
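
The channel count above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the Extended GSM-900 ranges of Table 1 and the standard GSM-1800 allocation (ARFCN 512–885); the names `ARFCN_RANGES`, `channel_count`, and `fingerprint_arfcns` are illustrative and do not come from the authors' software.

```python
# ARFCN ranges for the bands scanned in this work (inclusive bounds).
# Extended GSM-900 values are those of Table 1; the GSM-1800 range is the
# standard allocation and is an assumption with respect to this text.
ARFCN_RANGES = {
    "Extended GSM-900": [(0, 124), (975, 1023)],
    "GSM-1800": [(512, 885)],
}


def channel_count(ranges):
    """Count the channels in a list of inclusive (first, last) ARFCN ranges."""
    return sum(last - first + 1 for first, last in ranges)


def fingerprint_arfcns(band_ranges):
    """Return the sorted ARFCNs, one per component of a full-channel fingerprint."""
    arfcns = []
    for ranges in band_ranges.values():
        for first, last in ranges:
            arfcns.extend(range(first, last + 1))
    return sorted(arfcns)


per_band = {band: channel_count(r) for band, r in ARFCN_RANGES.items()}
all_arfcns = fingerprint_arfcns(ARFCN_RANGES)
print(per_band, len(all_arfcns))
```

Under these assumptions, the two bands contribute 174 and 374 carriers, respectively, which gives the 548-dimensional fingerprint used throughout the paper.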

2.3 Desired localization accuracy

Accuracy is an important indicator of indoor localization performance. For most indoor location-based services, such as indoor navigation, advertising, and rescue, room-level accuracy, i.e., an accuracy of a few meters, is adequate. Furthermore, the fingerprinting classification method requires more detailed training datasets as region sizes are reduced, which is time- and labor-consuming and results in increased computation time in both the training phase and the localization phase. As a trade-off between accuracy and ease of implementation, room-level localization was therefore chosen in all of our tests.

3 Experimental setup

3.1 Experimental sites

Experimental data were recorded in two sites. The first, called the “laboratory site,” is a fourth-floor laboratory building in central Paris, France, consisting of a steel frame, concrete and plaster walls, and double-pane windows. Measurements were obtained in seven rooms of this building, as shown in the layout map of Fig. 1.
Fig. 1

Layout of the laboratory measurement site. The seven rooms have different sizes and shapes and are located on both sides of the corridor

The second site, referred to as the “station site”, is the “Gare de Lyon” railway and subway transfer station in southeast Paris, France, which consists of three floors with waiting halls and transit corridors. Experiments here were carried out in an area extending from the entrance of the subway station, on the second underground floor, to the waiting hall on the ground floor. For recording fingerprints, we defined location zones in this area. To make these zones more meaningful, they were based on conventionally defined areas such as halls, entrances, and escalators, as shown in Fig. 2.
Fig. 2

Location cells of the station measurement site. Cell 1 is the exit of the subway, cells 2 and 5 are two escalators, cells 3 and 4 are in the connecting hall, and cells 6 to 8 are in the waiting hall

3.2 Data acquisition and processing device

The data acquisition device used in our experiments was the Test Mobile System (TEMS), consisting of a standard Sony Ericsson W995 mobile phone to which network investigation software has been added by the manufacturer. In stand-alone mode, it can scan channels in either the GSM-900 band or the GSM-1800 band, while when interfaced to a PC running the TEMS™ investigation software, it is able to scan the entire GSM-900 and GSM-1800 bands in only about 300 ms. Since all mobile phones are required to be able to scan all channels in the GSM bands, our approach is potentially applicable to any commercial phone supporting GSM. This means that if cellphone manufacturers can be convinced that GSM-based indoor localization is viable, it should be relatively inexpensive to create new products. As for the power consumption, a single battery charge will last for up to 10 h when the TEMS mobile phone does uninterrupted scanning, which is longer than the nominal call time, 9 h. A 1-min scan occupies less than 300 kB of storage space.

All the data in our experiments are stored and processed on an HP™ Z800 workstation with two Intel Xeon E5620 central processing units (2.4-GHz clock frequency) and 16 GB of random-access memory, running a 64-bit Windows 7 operating system. Data are processed using 64-bit Matlab 2011b.

3.3 Data collection schemes

In general, data acquisition is simple. Since the TEMS records fingerprint examples automatically and very quickly, the data collection staff only need to perform a few minutes of random walk inside each room. In our experiments, we used two methods to record training datasets, which we refer to as space sampling and time sampling.

3.3.1 Space sampling scheme

To construct a “radio map” of the indoor environment, we need to know the distribution of signal strengths in each location area, since RSS values vary in space over the area. Since we use the room as the smallest location unit, we performed space sampling by collecting a large number of signal strengths in each room. This was done by recording the RSS with the TEMS held in hand during a random walk throughout the accessible space of each room, rather than, for example, using a grid or a set of special representative points.

3.3.2 Time sampling scheme

RSS suffers from fluctuations on different time scales due to shadowing, multipath, and environmental effects, such as network traffic, presence of people, and atmospheric conditions. To demonstrate this kind of signal strength variation, we recorded two datasets in room 1 and room 7 of the laboratory site over 2 days. Figure 3 shows the fluctuation over time of RSS averaged over all GSM channels. The day/night cycle is clearly apparent, along with shorter-time fluctuations.
Fig. 3

RSS fluctuation over time. There are a lot of small short-time fluctuations as well as day and night cycles

To counteract the effect of these fluctuations, a time sampling scheme was also used in recording the datasets. On each day for which we have measurements, we recorded fingerprints at different time periods from morning to evening. Such training data, over several days or even longer times, are thus expected to provide a better sampling of RSS fingerprint values.

3.4 Datasets

Three types of dataset were recorded in our experiments. The first type, which we call a “classification set,” was collected in the seven rooms of the laboratory site. Scans were recorded during random walks in all seven rooms and manually labeled with the corresponding room numbers. While scanning a room, the TEMS Pocket was turned on and held in hand while walking; then, after a few minutes, the scan was stopped. Datasets were recorded on 34 different days between December 15, 2012, and March 1, 2013.

The second type of dataset, called a “tracking set,” was also taken in the laboratory site. This type of set was recorded while a user, holding the mobile phone, walked between the seven rooms, continuously recording the RSS. The actual trajectories were recorded using the mobile phone camera. Nine such tracking datasets were recorded in our experiments; they are investigated in Section 5.2.

The third and final type of dataset, called the “station demo set,” was used for a practical demonstration at the station site. Two such training datasets were recorded during a random walk inside the station on May 24 and June 17, 2013, respectively, and labeled manually. An additional test “station demo” trace was recorded on June 19.

All datasets are available as an open-source project at

4 Methodology

Figure 4 depicts the complete location estimation algorithm, which consists of an offline training phase, an online localization phase, and a post-processing (Bayesian filtering) phase. As introduced above, before the localization system can be used, an offline training is first performed, including labeling of the rooms or regions of the site (“zoning”), RSS data acquisition, and training and validation to develop a localization model. The model is then used in the online localization phase: real-time RSS data is input to produce an estimate of location (i.e., the room or region as defined during zoning). Finally, a more reliable location estimate is obtained using Bayesian filtering to combine the raw classification result provided by the localization model with prior knowledge of the physical layout of the area under study.
Fig. 4

Overall localization algorithm, including an offline training phase, an online localization phase, and a Bayesian filtering phase

4.1 Classification algorithms

The room-level indoor localization problem is considered as a multiclass classification problem, where each room is a class. As is usual in data-driven classification problems, the algorithm works in a two-stage process. The first stage is offline training, in which the equations of the discriminant functions are determined using training data with known labels. The second stage is online testing, in which, given a fingerprint that is not present in the training dataset, the classifier must provide the label of the room where it was measured, using the previously defined separating surfaces.

4.1.1 Pairwise classifier

Since the number of variables is very large and the size of the training set is relatively limited, support vector machine (SVM) classifiers were deemed appropriate because of their built-in regularization mechanism [19].

Consider a set of M examples of items belonging to either of two classes A and B, each example being described by a p-dimensional vector x_i. Further assume that the examples are linearly separable, i.e., that there are, in descriptor space, linear surfaces of equation f(x) = 0 that separate all examples without error: f(x_i) > 0 for all examples belonging to class A and f(x_i) < 0 otherwise. The equation of linear separating surfaces has the form
$$ f(\mathbf{x})=\mathbf{w}\cdot \mathbf{x}+b \tag{1} $$
A linear SVM is a linear classifier such that all training examples are correctly classified and that the minimum distance between the separation surface f(x) = 0 and the examples that are closest to it (called support vectors) is maximum, thereby guaranteeing the best generalization given the available data. Figure 5 is an example of an SVM classifier with two classes in a two-dimensional descriptor space, where the squares are examples of class A and the circles are examples of class B. Squares and circles in red outline indicate the support vectors.
Fig. 5

An example of SVM classification. The SVM classifier separates the examples with the maximum margin γ

The values of the parameters in w and b of such a classifier are estimated by solving a quadratic optimization problem under linear inequality constraints: maximize the geometric margin γ (shown in Fig. 5) under the constraint that all training points are correctly classified. The support vectors are the points that lie on the margin.
Fig. 6

An example of soft-margin SVM. The soft-margin SVM classifier allows a small fraction of the training examples to be in the margin or even misclassified

If the examples are not linearly separable, one can resort to the “soft-margin” approach, whereby a small fraction of the training examples is in the margin or even misclassified (Fig. 6). The approach performs a trade-off between accuracy of classification of the training examples and ability to generalize; the price to pay is the introduction of a “regularization” constant C whose value must be chosen appropriately. An alternative solution consists in trying to find nonlinear separation surfaces by means of the “kernel trick”; as this approach was found not to be more efficient than the soft-margin approach for our data, it will not be described here.

To summarize, a GSM environment described by the fingerprint x is assigned to room A or room B according to the sign of f(x), defined by (1). Here, x_i is entry i of the fingerprint dataset, i.e., row i of the RSS matrix. The offline training step, i.e., the estimation of the values of w and b in relation (1) from the training data, may be computationally costly (typical training times are provided in Table 2). Localization then consists simply in computing the sign of f(x) for a given GSM fingerprint x, which is fast enough to be done online.
Table 2

Training times. Columns: training data window (days), training examples, total training time (min)


The SVMs used in our study were implemented using the SVMlight toolbox [20].
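
To make the online step concrete, the sketch below evaluates relation (1) and assigns a fingerprint to class A or B from the sign of f(x). It is not SVMlight code: the function names, the three-channel fingerprints, and the values of w and b are hypothetical and chosen only for illustration, whereas a real model would have 548 weights estimated during training.

```python
def decision_value(w, x, b):
    """Evaluate f(x) = w . x + b for weights w, fingerprint x, and bias b."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same dimension")
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify_pair(w, x, b):
    """Return 'A' if f(x) > 0 and 'B' otherwise (sign rule of relation (1))."""
    return "A" if decision_value(w, x, b) > 0 else "B"


# Hypothetical 3-channel model and fingerprints (RSS in dBm).
w = [0.5, -0.25, 0.0]
b = 10.0
x_room_a = [-60.0, -90.0, -75.0]
x_room_b = [-95.0, -55.0, -75.0]

print(classify_pair(w, x_room_a, b), classify_pair(w, x_room_b, b))
```

The cost of one online decision is a single dot product over the channels, which is why localization is fast once w and b are known.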

4.1.2 Decision rules for multiclass discrimination

When the discrimination problem involves more than two classes, it is necessary, for pairwise classifiers such as SVMs, to define a method that allows combining multiple pairwise classifiers into a single multiclass classifier [21]. We applied one-vs-all multiclass classifiers in this paper.

The one-vs-all approach consists of dividing the n-class problem into an ensemble of n pairwise classification problems, each of which is specialized in separating one class from all others. Figure 7 illustrates the procedure. In the first stage, each of the n classifiers is trained separately. In the second stage, the following decision rule is applied: the outputs of all n classifiers are first calculated, and, following the conventional procedure, the predicted class is taken to be that of the classifier with the largest value of f(x) (relation (1)). The one-vs-all technique is advantageous from a computational standpoint, in that it only requires a number of classifiers equal to the number of classes.
Fig. 7

One-vs-all decision rule. The test example is first input to each of the pairwise classifiers, each of which separates one class from all others, and the predicted class is then obtained from the decision rule
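
The one-vs-all decision can be sketched as follows. This is a minimal illustration under stated assumptions: each room has its own linear classifier (w, b) as in relation (1), the predicted room is the one whose classifier gives the largest signed output, and the two-channel weights below are hypothetical.

```python
def one_vs_all_predict(classifiers, x):
    """Pick the class whose one-vs-all classifier gives the largest f(x).

    classifiers maps a room label to a (w, b) pair; x is the fingerprint.
    Returns the predicted label and the dictionary of all scores.
    """
    scores = {
        label: sum(wi * xi for wi, xi in zip(w, x)) + b
        for label, (w, b) in classifiers.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical classifiers for three rooms, two-channel fingerprints.
classifiers = {
    1: ([1.0, 0.0], -1.0),
    2: ([0.0, 1.0], -1.0),
    3: ([-1.0, -1.0], 0.5),
}

label_a, scores_a = one_vs_all_predict(classifiers, [2.0, 0.5])
label_b, scores_b = one_vs_all_predict(classifiers, [0.0, 0.0])
print(label_a, scores_a)
print(label_b, scores_b)
```

In the first call the scores are 1.0, -0.5, and -2.0, so room 1 is selected even though the room 3 classifier has the largest absolute output; the sign of each output matters, because a strongly negative value means the example is confidently not in that room.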

4.2 Bayesian filter

Localization accuracy can be improved by taking into account time constraints (the receiver moves with a finite velocity) and space constraints (presence of walls and furniture, occupancy of the room, etc.). In this article, this is achieved by Bayesian filtering, which allows combining the current and previous SVM classifier outputs, and taking into account space constraints [22–24].

4.2.1 Recursive Bayesian filtering

For room-level indoor localization, the state x_k at discrete time k is the actual room number of the target, while the observation y_k is the output of the SVM classifier at the same time. We assume that the state at time k depends only on the state at time k − 1. From Bayes’ theorem [25], the probability of the target being in room x_k given the past and present outputs of the SVM classifier is
$$ P(x_k \mid y_k, y_{k-1}) = \frac{P(y_k \mid y_{k-1}, x_k)\, P(y_{k-1} \mid x_k)\, P(x_k)}{P(y_k, y_{k-1})} \tag{2} $$
Since the classifier at time k does not take into account its previous output, we can write
$$ P(y_k \mid y_{k-1}, x_k) = P(y_k \mid x_k) \tag{3} $$
Applying Bayes’ theorem to x_k and y_{k-1}, we have
$$ P(x_k \mid y_{k-1}) = \frac{P(y_{k-1} \mid x_k)\, P(x_k)}{P(y_{k-1})} \tag{4} $$
Therefore, relation (2) can be rewritten as
$$ \begin{aligned} P(x_k \mid y_k, y_{k-1}) &= \frac{P(y_k \mid x_k)\, P(x_k \mid y_{k-1})\, P(y_{k-1})}{P(y_k, y_{k-1})} \\ &= \frac{P(y_k \mid x_k)\, P(x_k \mid y_{k-1})}{P(y_k \mid y_{k-1})} \end{aligned} \tag{5} $$
where P(y_k | x_k) is the likelihood of observing y_k when the target is in room x_k and P(x_k | y_{k-1}) is the probability of the target being in location x_k given the label assigned by the SVM classifier at time k − 1. For our room-level indoor localization, we have a finite number of rooms numbered from 1 to 7. Therefore, we have
$$ P(x_k \mid y_{k-1}) = \sum_{x_{k-1}=1}^{7} P(x_k \mid x_{k-1})\, P(x_{k-1} \mid y_{k-1}) \tag{6} $$

where P(x_k | x_{k-1}) is the state transition probability from x_{k-1} to x_k, which is constrained by the prior information of room layout, target velocity, maximum room occupancy, etc., as described in the next subsection.

Finally, we have
$$ P(x_k \mid y_k, y_{k-1}) = \frac{P(y_k \mid x_k) \sum_{x_{k-1}=1}^{7} P(x_k \mid x_{k-1})\, P(x_{k-1} \mid y_{k-1})}{P(y_k \mid y_{k-1})} \tag{7} $$
The initial probabilities {P(x_0 | y_0) = P(x_0), x_0 = 1, …, 7} may be either estimated from prior knowledge or observations, or set to the same value for all rooms. Then, in principle, the posterior probabilities {P(x_k | y_k, y_{k-1}), x_k = 1, …, 7} are obtained, recursively, in two stages: prediction and update, as described in (6) and (7), respectively. The final estimation of location is taken to be that of the state with the largest posterior probability:
$$ \widehat{x}_k = \underset{x_k}{\arg\max}\; P(x_k \mid y_k, y_{k-1}) \tag{8} $$

4.2.2 Prior information

In our work, the aim is to obtain the most probable location of the device. For the laboratory site, the indoor environment was modeled as nodes and paths as shown in Fig. 8. Rooms are the nodes numbered from 1 to 7, and the corridor is split into three sections and modeled as three additional nodes numbered from 8 to 10. The edges between nodes denote feasible paths between rooms. It is desired that the Bayesian filter provides a trajectory that uses feasible paths and is consistent with the usual velocity of the target.
Fig. 8

Node and path model abstracted from the “laboratory site.” The rooms and corridor are abstracted as nodes, and the paths between nodes indicate the feasible transitions

Therefore, the state transition probability in this paper is defined as
$$ P(x_k \mid x_{k-1}) = \begin{cases} p_0, & \text{if } \operatorname{pathlength}(x_k, x_{k-1}) = 0 \\ p_1, & \text{if } \operatorname{pathlength}(x_k, x_{k-1}) = 1 \\ 0, & \text{if } \operatorname{pathlength}(x_k, x_{k-1}) > 1 \end{cases} \tag{9} $$

where pathlength(i, j) is the length of the path defined from room i to room j. In this work, the lengths of all the paths that directly link two rooms are assigned the value 1. Since the time interval of data acquisition is 300 ms, and considering the normal moving speed in indoor environments, p_0 is set to 0.95 and p_1 is set to 0.05 in our experiments.

The prior information for the station site is encoded similarly.
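
As an illustration of relation (9), the sketch below builds the transition probabilities from a node-and-path model. The three-node layout (rooms 1 and 2 opening onto corridor section 8) is hypothetical and is not the actual graph of Fig. 8; the function name `transition_matrix` is likewise an assumption. Relation (9) is applied literally, so rows are not renormalized to sum to 1, a point the text does not specify.

```python
def transition_matrix(nodes, edges, p0=0.95, p1=0.05):
    """Return T with T[prev][cur] = P(x_k = cur | x_{k-1} = prev), relation (9).

    Nodes joined by an edge have path length 1; a node has path length 0
    to itself; every other pair is treated as path length > 1.
    """
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    T = {}
    for prev in nodes:
        T[prev] = {}
        for cur in nodes:
            if cur == prev:
                T[prev][cur] = p0      # target stays in the same node
            elif cur in neighbors[prev]:
                T[prev][cur] = p1      # target moves along a feasible path
            else:
                T[prev][cur] = 0.0     # no direct path in one time step
    return T


# Hypothetical layout: rooms 1 and 2 both open onto corridor section 8.
T = transition_matrix(nodes=[1, 2, 8], edges=[(1, 8), (2, 8)])
print(T[1])
```

With this layout, a direct jump from room 1 to room 2 has probability zero: the target must first be observed in the corridor node, which is how the layout constrains the filtered trajectory.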

4.2.3 Observation model

The observations in our case are the decisions of the SVM classification described above. The likelihood P(y_k | x_k) is the probability that the classifier assigns the target to room y_k while it is actually in room x_k. Given the available data, this probability can be estimated from the SVM confusion matrix whose element C_ij is the number of examples that are assigned to class i while the target is actually in room j. Then,
$$ P(y_k = i \mid x_k = j) \approx \frac{C_{ij}}{\sum_{l=1}^{10} C_{il}} \tag{10} $$

The classifier provides the binary output of each SVM one-vs-all classifier; it can also provide probabilistic outputs, which can be used as observations as well. Using this more detailed information instead of the final decision did not improve the results described in the next sections.

4.2.4 The filtering procedure

For sequential fingerprint measurements, Bayesian filtering recursively estimates the current location as follows:
  • Making room predictions from the previous location estimate, based on the node-path room layout model and the transition probabilities P(x_k | x_{k-1}) from relation (9).

  • Inputting the current fingerprint measurement into the SVM classifiers and obtaining the room label, which is used to calculate the observation probabilities P(y_k | x_k) based on relation (10).

  • Obtaining the filtered location output for the current fingerprint measurement by updating the predictions with the observation probabilities using (7).
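
The three steps above can be sketched as a short recursive filter implementing relations (6) to (8). This is a minimal sketch under stated assumptions: the three-room chain 1–2–3, the likelihood values (0.8 on the diagonal, 0.1 elsewhere), and the function names are hypothetical, while p_0 = 0.95 and p_1 = 0.05 are the values given in the text. The denominator of (7) does not depend on x_k and is handled here by normalizing the posterior.

```python
def bayes_filter_step(prev_posterior, y, transition, likelihood):
    """One prediction/update step of the recursive filter.

    prev_posterior[x] : P(x_{k-1} = x | y_{k-1})
    transition[xp][x] : P(x_k = x | x_{k-1} = xp), relation (9)
    likelihood[y][x]  : P(y_k = y | x_k = x)
    """
    states = list(prev_posterior)
    # Prediction, relation (6): sum over the previous state.
    predicted = {
        x: sum(transition[xp][x] * prev_posterior[xp] for xp in states)
        for x in states
    }
    # Update: numerator of relation (7).
    unnormalized = {x: likelihood[y][x] * predicted[x] for x in states}
    total = sum(unnormalized.values())
    if total <= 0.0:
        raise ValueError("observation has zero probability under the model")
    return {x: v / total for x, v in unnormalized.items()}


def track(observations, states, transition, likelihood):
    """Filter a sequence of SVM labels; return the estimates of relation (8)."""
    posterior = {x: 1.0 / len(states) for x in states}  # uniform P(x_0)
    estimates = []
    for y in observations:
        posterior = bayes_filter_step(posterior, y, transition, likelihood)
        estimates.append(max(states, key=lambda x: posterior[x]))
    return estimates


# Hypothetical chain of three rooms: 1 - 2 - 3.
states = [1, 2, 3]
p0, p1 = 0.95, 0.05
adjacent = {(1, 2), (2, 1), (2, 3), (3, 2)}
transition = {
    xp: {x: p0 if x == xp else (p1 if (xp, x) in adjacent else 0.0)
         for x in states}
    for xp in states
}
likelihood = {y: {x: 0.8 if y == x else 0.1 for x in states} for y in states}

raw = [1, 1, 3, 1, 1]  # raw SVM labels with one spurious jump to room 3
filtered = track(raw, states, transition, likelihood)
print(filtered)
```

In this toy trace, the isolated label 3 is rejected because room 3 cannot be reached from room 1 in a single 300-ms step, so the filtered trajectory stays in room 1.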

5 Results

5.1 Results of SVM classification

Classification results are shown in Fig. 9, where we present the percentage of correctly classified test examples as a function of time for different sizes of the training set. The test data used consists of datasets taken from December 15, 2012, to March 1, 2013. The classifier used was a set of soft-margin linear one-vs-all SVMs (Fig. 6) with regularization constant C = 0.01. Elapsed time for data training is shown in Table 2. Figure 9 shows that the more extensive the training dataset, the better the results obtained. Furthermore, when the model is trained using a large dataset over a few days, the results are stable in time and in excess of 80 % correct classification, even more than 1 month after training.
Fig. 9

Classification results based on models using different training windows. Line 1 shows the test results of a model trained on data taken over a 1-day window, while the training windows of Line 2 and Line 3 are 2 and 8 days, respectively

Table 3

Confusion matrix of classification results of line 3 in Fig. 9. Rows: true room; columns: SVM output 1 to SVM output 7 (%)

Figure 9 also shows that after about 2 months, two sharp decreases in classification accuracy occur. These can be traced to significant RSS shifts of some GSM channels probably caused by an update or upgrade of local base stations. Figure 10 shows sudden shifts of RSS of two GSM channels (ARFCN 135 and 278) observed on February 19 and February 27. Such shifts are quite simple to detect, which suggests a simple scheme that could be introduced to render performance robust against such changes. Figure 11 shows the results of retraining the models after removing unstable channels from the RSS vectors examined by the classifier. The performance of the updated classifier is relatively stable and consistent with the more gradual performance degradation observed before the network changes occurred.
Fig. 10

RSS in channels 135 and 278 in room 1 of the “laboratory site.” In general, RSS of GSM signals are stable over time, but we also observed significant RSS shifts in some GSM channels

Fig. 11

Classification results after removing 20 unstable GSM channels, compared with the original results using all channels
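
One possible way to detect such shifts is sketched below. The detection rule is an assumption and is not described in the text: the mean RSS of each channel over a reference window is compared with its mean over a recent window, and channels whose mean moved by more than a threshold are flagged and dropped from the fingerprint. The 10-dB threshold, the channel names, and the RSS values are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def unstable_channels(reference, recent, threshold_db=10.0):
    """Flag channels whose mean RSS shifted by more than threshold_db.

    reference and recent map a channel identifier to a list of RSS
    values (dBm) recorded at the same place in two time windows.
    """
    flagged = []
    for channel, ref_values in reference.items():
        if channel not in recent:
            continue
        shift = abs(mean(recent[channel]) - mean(ref_values))
        if shift > threshold_db:
            flagged.append(channel)
    return sorted(flagged)


def drop_channels(fingerprint, flagged):
    """Return a copy of the fingerprint without the flagged channels."""
    return {ch: rss for ch, rss in fingerprint.items() if ch not in flagged}


# Illustrative values only: channel "ch_b" drops by about 15 dB.
reference = {"ch_a": [-70, -71, -69], "ch_b": [-80, -81, -79], "ch_c": [-60, -60, -61]}
recent = {"ch_a": [-70, -70, -71], "ch_b": [-95, -96, -94], "ch_c": [-61, -60, -60]}

flagged = unstable_channels(reference, recent)
cleaned = drop_channels({"ch_a": -70, "ch_b": -95, "ch_c": -60}, flagged)
print(flagged, cleaned)
```

After the flagged channels are removed, the classifiers have to be retrained on the reduced RSS vectors, as was done for the results of Fig. 11.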

In Table 3, we present the confusion matrix of test results corresponding to line 3 of Fig. 9. It can be seen that most confusions occur between adjacent rooms, and very few between rooms located on opposite sides of a corridor.

5.2 Results of Bayesian filtering

Bayesian filtering results on the tracking set are shown in Table 4, where we compare the raw results of SVM classification to the results obtained after Bayesian filtering. In this experiment, the model was trained on datasets obtained within the first 2 days, and the confusion matrix is obtained in a separate test including three corridor sections. The results presented only consider test examples actually recorded within rooms. In the actual traces, some fingerprints were in fact acquired in the corridor and thus were not taken into account, since the SVM only classifies the seven rooms. It is clear that Bayesian filtering provides a substantial improvement over raw classification results, even if the classifier is not optimally trained using a large dataset. In addition, the results are stable over time.
Table 4

Localization accuracy with and without Bayesian filtering. Columns: test dataset, accuracy without Bayesian filtering (%), accuracy with Bayesian filtering (%). Test datasets: December 24, 29, and 30, 2012; January 2, 3, 4, 6, and 7, 2013; February 3, 2013

Figure 12 shows the tracking results on one test trace (December 30), showing the real position, SVM classification results, and Bayesian filtering results. Locations 1 to 7 correspond to rooms 1 to 7, while locations 8 to 10 are the sections of the corridor illustrated in Fig. 8. The Bayesian filtering method correctly tracked the moving target, except for a few mistakes in the corridor.
Fig. 12

Target tracking results of trace (December 30). The Bayesian filtering method correctly tracked the moving target, except for a few mistakes in the corridor

5.3 Results of station demonstration

In the station demonstration, a classification model was trained on the datasets recorded on May 24 and June 17 during normal passenger traffic hours. The classifier was a set of linear one-vs-all SVMs with regularization constant C = 0.01.

Our localization/tracking demonstration was conducted on June 19. The mobile phone was handheld while walking from the waiting hall of the railway station to the subway entrance and back. The location sequence of the trace is 8-7-5-4-3-2-1-2-3-4-5-6, as shown in Fig. 2. The results are shown in Fig. 13, where the SVM classification result and the Bayesian filtering result can be compared. It is seen that our localization method correctly obtained the location in the test with only a few mistakes between the adjacent location units. The Bayesian filter has corrected most of the mistakes made by the classifier.
Fig. 13

Results of station demo trace. In the station demonstration, the real positions were not recorded. Filtered results, compared with raw classification results, avoid erratic jumps from one zone to one of its neighbors

6 Conclusions

We have presented an approach for indoor localization and navigation based on RSS vectors containing very large numbers of GSM carriers, coupled with an SVM classifier and a Bayesian filter. The approach was tested on datasets recorded in different scenarios over a period of months, under realistic conditions. Data were collected using both space sampling, to explore the full surface area of a room, and time sampling, to explore the variations of RSS over an extended time period. Experimental results show that this localization approach achieves room-level accuracy 98 % of the time. This performance is equivalent to that of previous work but was obtained in more practical scenarios and over a much longer period, demonstrating that the approach is accurate, stable, and practical.

Future work will investigate whether 3G and 4G network data, or other types of variables, can also be incorporated into our scans. Making RSS fingerprint measurement easily available is also under investigation.



Abbreviations

GPS: Global Positioning System
RFID: Radio-frequency identification
WLAN: Wireless local area networks
RSS: Received signal strength
SVM: Support vector machines
ARFCN: Absolute radio-frequency channel number



The authors wish to acknowledge the support of the China Scholarship Council and the French National Railway Company (SNCF).

Authors’ Affiliations

(1) Université Pierre et Marie Curie, Paris, France
(2) LAETITIA/CEDRIC Laboratory, Conservatoire National des Arts et Métiers, Paris, France
(3) Signal Processing and Machine Learning Laboratory, ESPCI ParisTech, Paris, France


References

  1. A Küpper, Location-Based Services: Fundamentals and Operation (John Wiley & Sons, New York, NY, USA, 2005)
  2. AM Ladd, KE Bekris, A Rudys, LE Kavraki, DS Wallach, On the feasibility of using wireless ethernet for indoor localization. IEEE Trans. Robot. Autom. 20(3), 555–559 (2004)
  3. Y Zhou, CL Law, YL Guan, FPS Chin, Indoor elliptical localization based on asynchronous UWB range measurement. IEEE Instrum. Meas. 6(1), 248–257 (2011)
  4. Q Yang, SJ Pan, V Wenchen Zheng, Estimating location using Wi-Fi. IEEE Intell. Syst 23(8), 13 (2008)
  5. S-H Fang, T-N Lin, P-C Lin, Location fingerprinting in a decorrelated space. IEEE Trans. Knowl. Data Eng. 20(5), 685–691 (2008)
  6. S-H Fang, T-N Lin, Indoor location system based on discriminant-adaptive neural network in IEEE 802.11 environments. IEEE Trans. Neural Netw. 19(11), 1973–1978 (2008)
  7. Y Kong, Y Kwon, G Park, Robust localization over obstructed interferences for inbuilding wireless applications. IEEE Trans. Consum. Electron. 55, 105–111 (2009)
  8. S-H Hong, B-K Kim, D-S Eom, Localization algorithm in wireless sensor networks with network mobility. IEEE Trans. Consum. Electron. 55(4), 1921–1928 (2009)
  9. S Mazuelas, A Bahillo, RM Lorenzo, P Fernandez, FA Lago, E Garcia, J Blas, EJ Abril, Robust indoor positioning provided by real-time RSSI values in unmodified WLAN networks. IEEE J. Sel. Topics Signal Process. 3(5), 821–831 (2009)
  10. V Otsason, A Varshavsky, A LaMarca, E de Lara, Accurate GSM indoor localization, in Proceedings of the 7th International Conference on Ubiquitous Computing, pp. 141–158, Tokyo, Japan, September 2005
  11. W ur Rehman, E de Lara, S Saroiu, CILoS, a CDMA indoor localization system, in Proceedings of the 10th International Conference on Ubiquitous Computing, pp. 21–24, Seoul, South Korea, September 2008
  12. Y Oussar, I Ahriz, B Denby, G Dreyfus, Indoor localization based on cellular telephony RSSI fingerprints containing very large numbers of carriers. EURASIP J. Wirel. Commun. Netw. 2011, 81 (2011)
  13. Y Tian, B Denby, I Ahriz, P Roussel, R Dubois, G Dreyfus, Practical indoor localization using ambient RF, in Proceedings of 2013 IEEE International Instrumentation and Measurement Technology Conference, pp. 1125–1129, Minneapolis, May 2013
  14. K Whitehouse, C Karlof, D Culler, A practical evaluation of radio signal strength for ranging-based localization. ACM SIGMOBILE Mobile Computing. Commun. Rev 11(1), 41–52 (2007)
  15. VW Zheng, EW Xiang, Q Yang, D Shen, Transferring localization models over time, in Proceedings of the 23rd National Conference on Artificial Intelligence, vol. 3, pp. 1421–1426, Chicago, Jul 2008
  16. Y Jin, W-S Soh, W-C Wong, Error analysis for fingerprint-based localization. IEEE Commun. Lett. 14(5), 393–395 (2010)
  17. M Page, M Molina, J Gordon, The mobile economy 2013 (ATKearney, 2013),
  18. M Rahnema, Overview of the GSM system and protocol architecture. IEEE Commun. Mag. 31(4), 92–100 (1993)
  19. N Cristianini, J Shawe-Taylor, Support Vector Machines and Other Kernel-Based Learning Methods (Cambridge University Press, Cambridge, UK, 2000)
  20. T Joachims, Making large-scale SVM learning practical, in Advances in Kernel Methods—Support Vector Learning (MIT-Press, MA, 1999)
  21. C-W Hsu, C-J Lin, A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw. 13, 415–425 (2002)
  22. AH Jazwinski, Stochastic Processes and Filtering Theory (Academic, New York, 1970)
  23. A Bekkali, M Matsumoto, Bayesian sensor model for indoor localization in ubiquitous sensor network, in Proceedings of the First ITU-T Kaleidoscope Academic Conference, 2008, 2008, pp. 285–292
  24. CH Chao, CY Chu, AY Wu, Location-constrained particle filter human positioning and tracking system, in Proceedings of the IEEE Workshop on Signal Processing Systems, 2008, 2008, pp. 73–76
  25. JM Bernardo, AFM Smith, Bayesian Theory, 2nd edn. (Wiley, New York, 1998)


© Tian et al. 2015

This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.