An improved k-NN algorithm for localization in multipath environments

To improve localization accuracy in multipath environments, this paper presents an effective localization approach that uses reference tags. In this approach, an improved k-nearest neighbor (k-NN) algorithm is proposed based on radio-frequency (RF) phases. The traditional k-NN algorithm only focuses on the weighting factors of the coordinates of the selected reference tags, while the improved k-NN algorithm aims at estimating the direct distance from a reader antenna to a target tag. Then, the location is estimated by linear least squares with a new reference selection scheme. Simulation results show that our approach is superior to the traditional localization approaches under multipath environments. In addition, we conclude that phase is superior to strength in the selection of reference tags for range estimation, and that range estimation is more accurate than coordinate estimation with the k-NN algorithm for localization.


Introduction
Indoor location awareness is an active research subject in many wireless systems because it enables a wide range of location-based services. Because of the multipath propagation channel in indoor environments, it is hard to achieve the high localization accuracy that both commercial and industrial applications urgently need. Many technologies and systems have been developed for indoor localization, and one of the most popular is ultrahigh frequency (UHF) radio-frequency identification (RFID). UHF RFID tags can be active or passive, depending on whether the tag has an internal source of energy. In this paper, we focus on passive RFID tags because of their lower cost, long lifetime, and maintenance-free characteristics.
Existing range measurement techniques for RFID-based applications include the time of arrival (TOA) technique [1-4], the time difference of arrival (TDOA) technique [5,6], the phase of arrival (POA) technique [7-13], and the received signal strength indicator (RSSI) technique [14,15]. The TOA technique measures the one-way propagation time, and the distance between the transmitter and the receiver is calculated using the propagation velocity of the signal. However, precise synchronization of all transmitters and receivers is required to guarantee a correct location estimate. The TDOA technique does not require synchronization between the transmitter and the receiver; it uses the difference in arrival time of a transmitted signal at several receivers, so the measurement at the reference receiver that is subtracted must be precise. The POA technique can measure the complete propagation distance of the signal between a transmitter and a receiver but suffers from whole-cycle phase ambiguities in the calculation. The RSSI technique uses the attenuation of the sent signal to estimate the distance between the transmitter and the receiver. Though this technique is less complex than the others, it suffers from poor accuracy because the signal strength is easily affected by environmental factors such as multipath. So, the impact of the wireless channel on the received signals [16] is significant. In radar sensor networks, information diversity has been utilized to improve detection accuracy [17-19] and communication reliability [20]. Based on the above-mentioned range measurement techniques, the target location can be estimated by using the geometric properties of triangles [21].
Because the accuracy of range measurement techniques is inadequate, localization algorithms using scene analysis have been developed to improve the overall accuracy. As an effective scene analysis localization algorithm, the k-nearest neighbor (k-NN) [22] algorithm was proposed, which makes use of reference tags. This algorithm estimates the target location from the coordinates of the reference tags that are closest to the target tag in terms of RSSI Euclidean distance. Since severe multipath and interference effects can easily affect signal strength, a distant reference tag may be selected as a candidate, significantly lowering the localization accuracy [23]. Even so, strength is still commonly used in scene analysis algorithms. In recent years, another way to compute the location has been proposed: estimating the distance from a reader antenna to the target tag with the help of selected reference tags, just as range measurement techniques do [24-26]. Therefore, we consider that a similar measurement value indicates a similar distance between a tag and a reader antenna, not only a nearby coordinate position in space.
In this paper, we propose an indoor localization approach that follows the reference-tag concept of the k-NN algorithm and extends the localization approach proposed in [24]. As in [24], we substitute phase for strength in the Euclidean distance calculation; however, we make several improvements.
This paper is organized as follows. In Section 2, the radio-frequency (RF) phase is measured with a single-frequency subcarrier amplitude modulation. In Section 3, an improved k-NN algorithm is proposed to estimate the direct distance between a reader antenna and the target tag, and the location is estimated by a linear least squares algorithm with a new reference selection scheme. Simulation results are presented in Section 4, where the improvement in localization accuracy demonstrates the effectiveness of the proposed approach. Finally, the conclusion is presented in Section 5.

RF Phase
Assuming that all reader antennas emit pure cosine signals, we can use the phase of arrival technique to measure distances that do not exceed half of the signal wavelength. The UHF RFID signal wavelength is tens of centimeters, whereas the actual distance between a reader and a tag is a few meters or even dozens of meters. So, if we use the extracted phases of UHF RFID signals directly, distance estimation suffers from whole-cycle carrier phase ambiguities. To avoid this problem, we should use a low-frequency signal with a longer wavelength to measure the phase accumulated while the signal travels from the reader to the tag and back.
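As a worked illustration of the half-wavelength limit, the following arithmetic takes the 910 MHz carrier and the 2 MHz subcarrier used later in the simulation setup and assumes $c \approx 3 \times 10^8$ m/s. The symbols $\lambda_c$, $\lambda_{sub}$, and $d_{\max}$ are introduced here only for this illustration.

$$\lambda_c = \frac{c}{f_c} = \frac{3 \times 10^8}{910 \times 10^6} \approx 0.33\ \text{m}, \qquad d_{\max} = \frac{\lambda_c}{2} \approx 0.165\ \text{m}$$

$$\lambda_{sub} = \frac{c}{f_{sub}} = \frac{3 \times 10^8}{2 \times 10^6} = 150\ \text{m}, \qquad d_{\max} = \frac{\lambda_{sub}}{2} = 75\ \text{m}$$

Under these assumptions, the carrier phase alone is unambiguous only up to about 16.5 cm, while the subcarrier phase covers reader-to-tag distances up to about 75 m, which comfortably exceeds typical indoor ranges.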

Traditional RF-phase-based ranging algorithm
To eliminate the whole-cycle carrier phase ambiguities, the frequency of the signal used for phase extraction should be low. In other words, the round-trip distance should be shorter than the wavelength of the signal. Meanwhile, we also need an extra energy signal to power up the passive tag. In summary, we should simultaneously use a low-frequency signal to measure the phases and a high-frequency signal to supply energy throughout the communication process. Following the concept of subcarrier amplitude modulation in [27], we use a single-frequency subcarrier in the communication process, where the low-frequency signal is taken as the subcarrier signal and the high-frequency signal is the carrier signal. The transmitted signal is expressed as Equation 1 and the received signal as Equation 2.
In these equations, $\omega_s$ stands for the angular frequency of the subcarrier; $\varphi_s$ is the original phase of the subcarrier; $\omega_c$ stands for the angular frequency of the carrier; $\theta_s$ is the original phase of the carrier; $A$ and $A'$ are the amplitudes of the carrier in transmitting and receiving, respectively; $m_a$ denotes the modulation index; and $n(t)$ is the additive noise of the wireless channel. $\varphi_r$ and $\theta_r$ are the phases of the received subcarrier and carrier, respectively, and $m(t)$ is the baseband signal of the tag. The zero-IF scheme is one way to realize the down conversion of the received signal. This scheme needs neither modification of the current structure of RFID readers nor extra hardware, which greatly reduces the cost compared with the sub-Nyquist sampling used in [24]. In this scheme, the received signal r(n) is converted in an in-phase/quadrature (I/Q) demodulator, which splits the received signal into two parts by mixing it with the local oscillator signal $\cos(\omega_c t + \varphi)$ and with the local oscillator signal shifted by 90°, respectively, as in Equations 3 and 4.
The I and Q component signals are filtered to remove the high-frequency components, leaving behind the subcarrier signal that carries the phase of interest, as in Equations 5 and 6. Then, we sample the signals at the sampling frequency $f_s$ and extract the subcarrier phase information in the digital domain with the all-phase FFT [28].
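The demodulation chain described above can be sketched in a few lines. This is a minimal, noise-free illustration and not the paper's implementation: the received signal is assumed to have the form A[1 + m_a cos(ω_s t + φ_r)] cos(ω_c t + θ_r), the oversampling factor is an arbitrary choice, and a plain single-bin DFT over exactly one subcarrier period stands in for both the low-pass filter and the all-phase FFT. All function and variable names are hypothetical.

```python
import cmath
import math

F_C = 910e6      # carrier frequency in Hz (value from the simulation setup)
F_SUB = 2e6      # subcarrier frequency in Hz (value from the simulation setup)
OVERSAMPLE = 8   # assumed samples per carrier period, for illustration only
FS = OVERSAMPLE * F_C                  # assumed sampling rate
N = OVERSAMPLE * round(F_C / F_SUB)    # samples in one subcarrier period


def received_sample(n, phi_r, theta_r, amp=1.0, m_a=0.5):
    """Noise-free AM signal (assumed form): amp*[1 + m_a*cos(w_s t + phi_r)]*cos(w_c t + theta_r)."""
    t = n / FS
    envelope = 1.0 + m_a * math.cos(2 * math.pi * F_SUB * t + phi_r)
    return amp * envelope * math.cos(2 * math.pi * F_C * t + theta_r)


def extract_subcarrier_phase(samples):
    """Mix to baseband (I/Q), then read the subcarrier phase from one DFT bin."""
    i_mix, q_mix = [], []
    for n, r in enumerate(samples):
        lo = 2 * math.pi * F_C * n / FS
        i_mix.append(r * math.cos(lo))    # in-phase branch
        q_mix.append(-r * math.sin(lo))   # quadrature branch (LO shifted by 90 degrees)
    # The averages of the two branches reveal the carrier phase offset theta_r.
    theta_hat = math.atan2(sum(q_mix), sum(i_mix))
    # Rotate I/Q onto one axis, then correlate with the subcarrier tone.
    # Summing over a whole subcarrier period also rejects the 2*F_C mixing products.
    bin_value = 0j
    for n, (i_val, q_val) in enumerate(zip(i_mix, q_mix)):
        aligned = i_val * math.cos(theta_hat) + q_val * math.sin(theta_hat)
        bin_value += aligned * cmath.exp(-2j * math.pi * n / N)
    return cmath.phase(bin_value)


samples = [received_sample(n, phi_r=0.7, theta_r=1.1) for n in range(N)]
estimated_phase = extract_subcarrier_phase(samples)
```

In this idealized setting, `estimated_phase` recovers the subcarrier phase that was put into the signal. With noise and a window that is not an integer number of periods, spectral leakage appears, which is the situation the all-phase FFT [28] is meant to handle.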
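So, the extracted subcarrier phase is the phase accumulated by the subcarrier over the round trip between the reader and the tag. In the end, the propagation distance is calculated from this phase and the subcarrier angular frequency $\omega_s$, with $c$ the speed of electromagnetic wave propagation.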
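As a small numerical illustration of this last step, the sketch below converts a round-trip subcarrier phase into a reader-to-tag distance. It assumes that the measured phase equals 2π f (2d/c), i.e., that it is purely the propagation phase over the path to the tag and back; the function name and the example values are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def phase_to_distance(phase_rad, f_sub_hz):
    """One-way distance from the round-trip subcarrier phase (assumed model)."""
    wrapped = phase_rad % (2 * math.pi)   # keep the phase in [0, 2*pi)
    return C * wrapped / (4 * math.pi * f_sub_hz)


# A tag 10 m away accumulates this phase on a 2 MHz subcarrier.
true_distance = 10.0
round_trip_phase = 4 * math.pi * 2e6 * true_distance / C
estimated_distance = phase_to_distance(round_trip_phase, 2e6)
```

Any additional phase term that is not removed (multipath, hardware, or tag backscatter offsets) translates directly into a ranging error under this model, which motivates the analysis in the next subsection.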

RF Phase analysis
The phase of the received signal in real propagation environments can be expressed as

$$\varphi = \varphi_{prop} + \varphi_m + \varphi_o + \varphi_b$$

where $\varphi_{prop}$ stands for the phase accumulated during the direct round trip between a reader antenna and a tag antenna; $\varphi_m$ is the phase offset caused by the multipath effect and by reflections from obstacles; $\varphi_o$ represents the phase shifts of the reader and antenna components, as well as cables; and $\varphi_b$ is the backscattered phase of the tag modulation [29].
For accurate location estimation, we should extract only the $\varphi_{prop}$ component for the distance calculation in Equation 10. As the other phase error components cannot be eliminated completely, extracting the exact phase for measuring the propagation distance is a tough task.
In our paper, we introduce reference tags just as in [22,26] but substitute phase for strength in the reference tag selection. The positions of the reference tags and reader antennas are known beforehand, which means that the direct distance between them can be calculated directly. As the reference tags are located in the same environment as the tags to be located, the direct distances between them can be used to improve localization accuracy.

Traditional k-NN algorithm
The LocAtioN iDentification based on dynaMic Active Rfid Calibration (LANDMARC) [22] system uses the k-NN algorithm with reference tags to estimate the locations of target tags. Let N be the number of reader antennas and M be the number of reference tags. The signal strength vector of the ith target tag is defined as $TS_i = (ts_{(i,1)}, ts_{(i,2)}, \ldots, ts_{(i,N)})$, where $ts_{(i,j)}$ denotes the signal strength of the ith target tag perceived by the jth reader antenna, $j \in (1, N)$. For the mth reference tag, the corresponding strength vector is $RS_m = (rs_{(m,1)}, rs_{(m,2)}, \ldots, rs_{(m,N)})$. The Euclidean distance in signal strength between the ith target tag and the mth reference tag is defined as

$$e_{(i,m)} = \sqrt{\sum_{j=1}^{N} \left(ts_{(i,j)} - rs_{(m,j)}\right)^2} \qquad (11)$$

So the ith target tag has its strength Euclidean distance vector $e_i = (e_{(i,1)}, e_{(i,2)}, \ldots, e_{(i,M)})$. After $e_i$ is sorted in ascending order, the first K values form, according to the k-NN algorithm, a new Euclidean distance vector $E_i = (E_{(i,1)}, E_{(i,2)}, \ldots, E_{(i,K)})$. The weighting factor for each selected reference tag is calculated by

$$w_{(i,k)} = \frac{1 / E_{(i,k)}^2}{\sum_{k'=1}^{K} 1 / E_{(i,k')}^2}$$

Finally, the estimated location of the ith target tag is given by

$$(\hat{x}_i, \hat{y}_i, \hat{z}_i) = \sum_{k=1}^{K} w_{(i,k)} \, (x_k, y_k, z_k)$$

where $(\hat{x}_i, \hat{y}_i, \hat{z}_i)$ is the estimated coordinate of the ith target tag, and $(x_k, y_k, z_k)$ denotes the coordinate of the kth selected reference tag.
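The traditional procedure can be sketched as follows. This is an illustrative implementation of weighted k-NN with inverse-square weights, as in LANDMARC; the function name, the small regularization constant, and the example data are assumptions made for the sketch.

```python
import math


def landmarc_locate(target_rss, ref_rss, ref_coords, k=4):
    """Weighted k-NN location estimate from signal-strength vectors."""
    # Euclidean distance in signal space between the target and each reference tag.
    dists = [math.dist(target_rss, rss) for rss in ref_rss]
    # Indices of the k reference tags with the smallest signal-space distance.
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    # Inverse-square weights; the small constant only guards against division by zero.
    inv = [1.0 / (dists[m] ** 2 + 1e-12) for m in nearest]
    total = sum(inv)
    dims = len(ref_coords[0])
    return tuple(
        sum(w / total * ref_coords[m][d] for w, m in zip(inv, nearest))
        for d in range(dims)
    )


# Hypothetical data: 2 reader antennas, 4 reference tags on a 2 m square at height 2 m.
ref_rss = [[-51.0, -50.0], [-50.0, -51.0], [-49.0, -50.0], [-50.0, -49.0]]
ref_coords = [(0.0, 0.0, 2.0), (2.0, 0.0, 2.0), (2.0, 2.0, 2.0), (0.0, 2.0, 2.0)]
estimate = landmarc_locate([-50.0, -50.0], ref_rss, ref_coords, k=4)
```

In this toy case all four reference tags are equally close in signal space, so the weights are equal and the estimate is the centroid of the square. The estimate can never leave the convex hull of the selected reference tags, which is why a wrongly selected distant tag degrades it so strongly.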

Phase-based improved k-NN algorithm
In this paper, we propose a phase-based improved k-NN algorithm (Phase (improved k-NN)). The phase vector of the ith target tag is defined as $TP_i = (tp_{(i,1)}, tp_{(i,2)}, \ldots, tp_{(i,N)})$, where $tp_{(i,j)}$ denotes the phase of the ith target tag perceived by the jth reader antenna, $j \in (1, N)$. We define the phases of the reference tags perceived by the jth reader antenna as $RP_j = (rp_{(1,j)}, rp_{(2,j)}, \ldots, rp_{(M,j)})$, where $rp_{(m,j)}$ denotes the phase of the mth reference tag perceived by the jth reader antenna, $m \in (1, M)$. The phase Euclidean distance between the ith target tag and the mth reference tag with respect to the jth reader antenna is defined as

$$ep_{(m,j)} = \left| tp_{(i,j)} - rp_{(m,j)} \right| \qquad (16)$$

An important difference in our algorithm is that we select the K nearest reference tags by calculating the difference between the target tag and the reference tags for each reader antenna separately, as in Equation 16, rather than summing over all reader antennas as in Equation 11. So, the phase Euclidean distance vector $ep_j$ for the jth reader antenna is defined as $ep_j = (ep_{(1,j)}, ep_{(2,j)}, \ldots, ep_{(M,j)})$. We also sort $ep_j$ in ascending order and select the first K values, i.e., $EP_j = (EP_{(1,j)}, EP_{(2,j)}, \ldots, EP_{(K,j)})$. Because the noise is random, we repeat the above calculation Q times and record the numbers of the selected reference tags and their corresponding $EP_{(k,j)}$ each time. Then, we obtain a list of the selected reference tag numbers and calculate the selection frequency of each reference tag number; the K reference tags with the highest frequency are taken as the final selected reference tags, i.e., $f_1, f_2, \ldots, f_K$. Meanwhile, we compute the averaged phase Euclidean distance of these selected reference tags, i.e., $ave\_EP_{(k,j)}$. The weighting factor $w_{(k,j)}$ for each neighboring reference tag is calculated from $ave\_EP_{(k,j)}$ by Equation 17. Once we have the weighting factor for each selected reference tag, we use these weighting factors to estimate the direct distance $\hat{d}_j$ between the ith target tag and the jth reader antenna:

$$\hat{d}_j = \sum_{k=1}^{K} w_{(k,j)} \, Rd_{(k,j)} \qquad (18)$$

where $Rd_{(k,j)}$ is the direct distance between the kth selected reference tag and the jth reader antenna; as the reference tags are fixed, $Rd_{(k,j)}$ is known beforehand. The ranging error is given by the absolute difference $|\hat{d}_j - d_j|$, where $d_j$ is the real distance of the target tag.
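The per-antenna ranging procedure can be sketched as below. This is only an interpretation of the description above, not the authors' code: the weights are assumed to follow the same inverse-square form as in the traditional k-NN algorithm, the averaged phase distance of a tag is taken over the rounds in which that tag was selected, and all names and data are hypothetical.

```python
from collections import Counter, defaultdict


def improved_knn_range(target_phases, ref_phases, ref_dists, k=4):
    """Estimate the target-to-antenna distance for one reader antenna.

    target_phases: Q phase readings of the target tag at this antenna.
    ref_phases:    Q lists, each holding the M reference-tag phases of that round.
    ref_dists:     known direct distances of the M reference tags to this antenna.
    """
    votes = Counter()               # how often each reference tag was selected
    ep_sums = defaultdict(float)    # accumulated phase distance per selected tag
    for tp, rps in zip(target_phases, ref_phases):
        ep = [abs(tp - rp) for rp in rps]
        nearest = sorted(range(len(ep)), key=ep.__getitem__)[:k]
        for m in nearest:
            votes[m] += 1
            ep_sums[m] += ep[m]
    # Final selection: the k tags chosen most frequently over the Q rounds.
    selected = [m for m, _ in votes.most_common(k)]
    ave_ep = {m: ep_sums[m] / votes[m] for m in selected}
    # Assumed weighting: normalized inverse square of the averaged phase distance.
    inv = {m: 1.0 / (ave_ep[m] ** 2 + 1e-12) for m in selected}
    total = sum(inv.values())
    return sum(inv[m] / total * ref_dists[m] for m in selected)


# Hypothetical noise-free example: 5 reference tags, Q = 3 identical rounds.
ref_dists = [1.0, 2.0, 3.0, 4.0, 20.0]
ref_phases = [[0.1, 0.2, 0.3, 0.4, 2.0]] * 3
target_phases = [0.25] * 3
d_hat = improved_knn_range(target_phases, ref_phases, ref_dists, k=4)
```

Here the target phase lies midway between the phases of the second and third reference tags, so the estimate is 2.5, the midpoint of their known distances. The far tag with phase 2.0 is never selected.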
The reason that we prefer the direct distance between the reader antenna and the reference tags to the coordinates of the reference tags is explained in Figure 1. We take one reader antenna as an example; the transmitted signal is sent via the downlink channel (marked as a solid line) to the tag and returned over the uplink channel (marked as a dotted line). The square stands for the target tag and the triangles (R1, R2, R3, and R4) stand for the reference tags. As the reference tags R1, R2, and R3 surround the target tag, their measurements are likely to be similar to those of the target tag. Reference tag R4 has the same direct distance from the reader antenna as the target tag. In multipath propagation environments, the measurement of reference tag R4 may therefore also be similar to that of the target tag. If we set the K value in the k-NN algorithm to 4, the selected reference tags will be R1, R2, R3, and R4. If we still used the coordinates of the selected tags, the location would be estimated with considerable error, because R4 is far from the target tag. In contrast, the direct distances of the four tags are approximately equal. So, we think range estimation is more appropriate for localization than coordinate estimation.

Localization algorithm
The distance $d_j$ between the ith target tag and the jth reader antenna defines a spherical surface around the jth reader antenna, and the localization equation is nonlinear, as in Equation 19:

$$(x_i - x_j)^2 + (y_i - y_j)^2 = d_j^2$$

where $(x_i, y_i)$ is the location of the ith target tag and $(x_j, y_j)$ is the location of the jth reader antenna. We can make Equation 20 linear by fixing the expression of the rth reader antenna, i.e., the reference reader antenna, and subtracting it from the rest of the equations. For the selection of the rth reader antenna, we propose a reference reader antenna selection scheme. Firstly, we calculate the distance $rd_{(m,j)}$ between the mth reference tag and the jth reader antenna by Equation 8 with the Q-averaged reference tag phases. Next, we compute the estimation error of each reader antenna by comparing $rd_{(m,j)}$ with the known direct distances of the reference tags. The antenna with the least estimation error is marked as the rth reader antenna. Then, we obtain the final location of the ith target tag by using the linear least squares (LLS) algorithm. In [30], the reference is selected by comparing the estimated distances $\hat{d}_j$ from the target tag to the reader antennas, where $\hat{d}_j$ is the result of Equation 18, so we name that scheme LLS-RS-TT.
As the reference selection scheme proposed in our paper compares the distances from the reference tags to the reader antennas, we name it LLS-RS-RT.
In the end, the location estimation error $e_i$ is given by

$$e_i = \sqrt{(\hat{x}_i - x_i)^2 + (\hat{y}_i - y_i)^2 + (\hat{z}_i - z_i)^2}$$

where $(\hat{x}_i, \hat{y}_i, \hat{z}_i)$ and $(x_i, y_i, z_i)$ denote the estimated coordinate and the real coordinate of the ith target tag, respectively.
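The linearization and the reference antenna selection can be sketched as follows, in two dimensions. Subtracting the equation of the reference antenna r from the equation of antenna j gives the linear relation 2(x_j − x_r)x + 2(y_j − y_r)y = d_r² − d_j² + x_j² + y_j² − x_r² − y_r², which is solved here through the 2 × 2 normal equations. The error measure used to pick the reference antenna (mean absolute ranging error over the reference tags) is an assumption, as are all names and the example geometry.

```python
import math


def select_reference_antenna(est_ref_dists, true_ref_dists):
    """Pick the antenna whose reference-tag ranges are most accurate (assumed metric).

    est_ref_dists[j][m] / true_ref_dists[j][m]: estimated / known distance
    between antenna j and reference tag m.
    """
    errors = [
        sum(abs(e - t) for e, t in zip(est_j, true_j)) / len(true_j)
        for est_j, true_j in zip(est_ref_dists, true_ref_dists)
    ]
    return min(range(len(errors)), key=errors.__getitem__)


def lls_locate(antennas, dists, r):
    """Linear least squares position from ranges, using antenna r as the reference."""
    xr, yr = antennas[r]
    rows, rhs = [], []
    for j, ((xj, yj), dj) in enumerate(zip(antennas, dists)):
        if j == r:
            continue
        rows.append((2 * (xj - xr), 2 * (yj - yr)))
        rhs.append(dists[r] ** 2 - dj ** 2 + xj ** 2 + yj ** 2 - xr ** 2 - yr ** 2)
    # Normal equations (A^T A) p = A^T b for the two unknowns x and y.
    a11 = sum(a * a for a, _ in rows)
    a12 = sum(a * b for a, b in rows)
    a22 = sum(b * b for _, b in rows)
    b1 = sum(a * v for (a, _), v in zip(rows, rhs))
    b2 = sum(b * v for (_, b), v in zip(rows, rhs))
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


# Hypothetical example: four antennas at the corners of a 4 m square.
antennas = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
target = (1.0, 3.0)
ranges = [math.dist(a, target) for a in antennas]
r = select_reference_antenna([[1.0, 2.0], [1.5, 2.5]], [[1.0, 2.1], [1.0, 2.0]])
position = lls_locate(antennas, ranges, r)
```

Because the range of the reference antenna enters every linearized equation, an error in that one range affects the whole system, which is why the choice of reference matters.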

Comparison with other localization approaches
To investigate the performance of the approach Phase (improved k-NN) + LLS-RS-RT proposed in our paper, we simulate some other localization approaches in the same scenario. These comparison approaches are as follows: 1) Strength (k-NN): It is processed just as in the LANDMARC [22] system. For comparison, the localization results of this approach are averaged over Q runs, as each simulation run yields one result. Location is estimated by LLS-RS-TT. 4) Phase (1-NN) + LLS-RS-TT: In [24], we use the 1-nearest neighbor (1-NN) to estimate the direct distance from the target tag to the reader antenna and use LLS-RS-TT to calculate the location. For comparison, we average the localization results of each target tag over Q runs. 5) Multi-frequency + LLS-RS-TT: Here we use the multi-frequency method FM2 in [31] to calculate the tag distance and estimate the location by LLS-RS-TT. 6) Phase (improved k-NN) + LLS-RS-TT: It has the same ranging procedure as Phase (improved k-NN) + LLS-RS-RT, except that the location is estimated by LLS-RS-TT.

Simulation
We evaluate the performance of Phase (improved k-NN) + LLS-RS-RT and compare it with the other six approaches explained in Section 3.4. The results indicate that our approach improves the accuracy significantly under multipath environments and is superior to the other approaches.

Simulation setup
The simulations are implemented in MATLAB and run on a PC with a 3.30 GHz Intel Core i3-3220 CPU and 4 GB of DDR3 SDRAM. We use eight reader antennas placed at the eight corners of an indoor scene that is modeled as a space of 4 × 4 × 4 m³. We choose 13 reference tags, which are numbered and arranged as shown in Figure 2, at a height of 2 m. The carrier frequency is 910 MHz, and the subcarrier frequency is 2 MHz. The Q value in the improved k-NN algorithm is 100, and the K value is set to 4 based on experience. We place 1,000 target tags randomly in the scenario, each at a height of 2 m, for spatial averaging. Because the Q factor used in the improved k-NN algorithm acts as a kind of time averaging, we also simulate the comparison approaches that do not use the improved k-NN algorithm Q times and average the results for comparison. In this paper, the root mean square error (RMSE) is used as the measure of localization accuracy.
We express it as

$$RMSE = \sqrt{E\left[(\hat{x}_i - x_i)^2 + (\hat{y}_i - y_i)^2 + (\hat{z}_i - z)^2\right]}$$

where $E$ denotes the averaging operator over the 1,000 target tags, $(\hat{x}_i, \hat{y}_i, \hat{z}_i)$ denotes the estimated position of the ith tag averaged over Q iterations, and $(x_i, y_i, z)$ denotes the real position of the ith tag. So, in our simulation, the RMSE is an average in both space and time. The smaller the RMSE, the higher the localization accuracy.
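The two levels of averaging in this metric can be made explicit with a short sketch: each tag's estimates are first averaged over its Q runs, and the squared error of that averaged position is then averaged over all tags. The function name and the toy data are hypothetical.

```python
import math


def rmse(estimates_per_tag, true_positions):
    """RMSE over tags of the error of each tag's run-averaged position estimate."""
    squared_errors = []
    for runs, truth in zip(estimates_per_tag, true_positions):
        # Time average: mean estimated coordinate over the Q runs of this tag.
        averaged = [sum(coord) / len(runs) for coord in zip(*runs)]
        squared_errors.append(sum((a - t) ** 2 for a, t in zip(averaged, truth)))
    # Space average over all tags, then the square root.
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy data: tag 1 has two runs whose mean hits the true position,
# tag 2 has one run that is 5 m off in the horizontal plane.
estimates = [[(0.0, 0.0, 2.0), (2.0, 0.0, 2.0)], [(3.0, 4.0, 2.0)]]
truths = [(1.0, 0.0, 2.0), (0.0, 0.0, 2.0)]
value = rmse(estimates, truths)
```

Note that averaging the estimates before computing the error rewards approaches whose individual runs scatter symmetrically around the true position, as the first tag in the toy data shows.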
The RFID channel consists of a downlink channel and an uplink channel. In our simulation, each of these two channels is modeled as a linear time-variant filter channel. The corresponding channel impulse response

$$h(t) = a_{los}\,\delta(t - \tau_{los}) + \sum_{i=1}^{P} a_{nlos,i}\,\delta(t - \tau_{nlos,i})$$

contains a direct line-of-sight (LOS) path with a time delay of $\tau_{los}$ and a path gain of $a_{los}$, and a sum of P non-line-of-sight (NLOS) paths with delays $\tau_{nlos,i}$ and path gains $a_{nlos,i}$. The path gain is modeled by path loss and lognormal shadowing, and the expressions for the reflection coefficients in [32] are also considered when calculating the path gain of each NLOS path. The complex path gain model of the deterministic multipath channel between the reader and the tag is given in [8]. Here, we suppose that there are four propagation paths in the multipath environment: the direct ray, two wall reflection paths, and the ground reflection path, as shown in Figure 3. For simplicity, we only consider the case in which the signal propagates back along the same path on which it was sent. Thus, the received RF signal can be expressed as

$$r(t) = x(t) * h_d(t) * h_u(t) + n(t)$$

where $*$ denotes convolution, $x(t)$ denotes the transmitted signal, and $h_d$ and $h_u$ are the channel impulse responses of the downlink and uplink channels, respectively, which are equal in our simulation. $n(t)$ is the additive Gaussian noise.
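To give a feel for how such a tapped-delay channel distorts the measured phase, the following sketch evaluates the channel's frequency response for a single tone and compares the two-way multipath phase with the two-way LOS phase. It is a deliberately simplified view: the path gains are fixed hypothetical numbers rather than the path-loss, shadowing, and reflection-coefficient model of the simulation, and the tone is treated in isolation.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def channel_response(freq_hz, paths):
    """Frequency response of a tapped-delay channel: sum of gain * exp(-j*2*pi*f*delay)."""
    return sum(gain * cmath.exp(-2j * math.pi * freq_hz * delay) for gain, delay in paths)


def round_trip_phase_error(freq_hz, paths):
    """Phase of the two-way multipath channel relative to the two-way LOS path alone.

    The first entry of `paths` is taken to be the LOS path; downlink and uplink
    are assumed equal, so the two-way response is the one-way response squared.
    """
    los_only = channel_response(freq_hz, paths[:1]) ** 2
    full = channel_response(freq_hz, paths) ** 2
    return cmath.phase(full / los_only)


# Hypothetical geometry: 3 m LOS path plus two weaker, longer reflected paths.
paths = [(1.0, 3.0 / C), (0.4, 5.0 / C), (0.3, 4.2 / C)]
error_rad = round_trip_phase_error(2e6, paths)
```

In this example the reflected paths add extra delay, so the phase error is negative (the signal appears to have travelled farther than the LOS distance), which corresponds to the $\varphi_m$ term discussed in Section 2.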

Effect of the factor Q
In our approach, the location of each target tag is estimated over Q iterations. So, Q is a key factor for improving localization performance. However, the number of iterations directly influences the running time of the calculation, so we must make a compromise between accuracy and time complexity. Figure 4 shows the accuracy comparison between different Q values. From the simulation results, we find that the location estimation error decreases as the Q value increases. But when the Q value is larger than 100, there is no obvious further gain in accuracy. So, 100 is an appropriate value for the factor Q in the following simulations.

Ranging accuracy
In this section, the averaged ranging error is calculated by averaging the errors of all target tags. Figure 5 shows the ranging performance comparison between the different approaches. From the figure, we can see that our algorithm is superior to the other three approaches. When the signal-to-noise ratio (SNR) is −20 dB, the error of Phase (improved k-NN) is almost the same as that of Strength (improved k-NN), but our algorithm is much better at higher SNR values. The results indicate that phase is an excellent measurement indicator for range estimation in the k-NN algorithm.

Localization accuracy
In this section, the averaged location estimation error is shown as the RMSE versus the SNR. As shown in Figure 6, Multi-frequency + LLS-RS-TT performs worst of all the approaches at low SNR values. Phase (1-NN) + LLS-RS-TT does not improve the accuracy effectively when the SNR increases, so its ranging procedure needs to be revised. Phase (k-NN) and Strength (k-NN) have the same performance when the SNR is low, but Strength (k-NN) works much better than Phase (k-NN) at higher SNR values. So, we can conclude that strength is more suitable than phase for non-ranging algorithms. Strength (improved k-NN) + LLS-RS-TT is better than the above four approaches, but it is still worse than Phase (improved k-NN) + LLS-RS-RT and Phase (improved k-NN) + LLS-RS-TT. This means that phase is superior to strength in the selection of the reference tags for range measurement, and that range estimation is more accurate than coordinate estimation with the k-NN algorithm for localization. The accuracy of Phase (improved k-NN) + LLS-RS-RT is superior to that of Phase (improved k-NN) + LLS-RS-TT in low-SNR environments and almost equal when the SNR is higher. So, we think LLS-RS-RT is the better reference selection scheme for the LLS algorithm with reference tags, while LLS-RS-TT remains an effective selection criterion for approaches without reference tags. In addition, the localization accuracy of our approach does not change markedly over the range of SNR values. So, our approach is robust to environmental interference. Figure 7 illustrates the cumulative distribution function (CDF) of the localization accuracy for the different approaches under two different SNR values. Figure 7a shows the CDF results when the SNR is −20 dB. Based on the statistics, the 80th percentile of the localization estimation error is under 0.78 m and the 90th percentile is under 0.90 m for Phase (improved k-NN) + LLS-RS-RT, the approach proposed in our paper. The corresponding values for Phase (improved k-NN) + LLS-RS-TT are 0.93 and 1.04 m, respectively. The results of the other approaches are much more than 1 m at both percentiles. Figure 7b illustrates the corresponding results when the SNR is 0 dB. The CDF curves for Phase (improved k-NN) + LLS-RS-RT and Phase (improved k-NN) + LLS-RS-TT overlap almost completely. As the SNR improves, the differences between the other approaches become more obvious. The error range of Phase (k-NN) is between 0.44 and 2.87 m. We think phase is more appropriate for ranging algorithms.
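Percentile figures such as those quoted above can be read from an empirical CDF as sketched below. The nearest-rank definition used here is one common convention and is an assumption, as the paper does not state how its percentiles were computed; the error values are made up for illustration.

```python
import math


def empirical_percentile(errors, pct):
    """Smallest error such that at least pct percent of the samples do not exceed it."""
    ordered = sorted(errors)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # nearest-rank definition
    return ordered[rank - 1]


# Hypothetical localization errors (in meters) of ten target tags.
errors = [0.5, 0.1, 0.4, 0.2, 0.3, 1.0, 0.9, 0.8, 0.7, 0.6]
p80 = empirical_percentile(errors, 80)
p90 = empirical_percentile(errors, 90)
```

With these ten samples, `p80` is 0.8 m and `p90` is 0.9 m: eight of the ten errors are at most 0.8 m, and nine are at most 0.9 m.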

Cost analysis
We analyze the cost of the different approaches compared in our paper. From the description of each approach, we know that Phase (improved k-NN) + LLS-RS-RT, Phase (improved k-NN) + LLS-RS-TT, and Strength (improved k-NN) + LLS-RS-TT all complete the location estimation of each target tag after Q iterations, while the other approaches obtain their final results in a single run. Considering that the Q-times averaged results are used to compare the performance of the different approaches, the time complexity of each approach is almost the same. We also record the running time of each approach in Table 1. From the results in Table 1, we conclude that the time complexity of each approach is of the same order.
With regard to space complexity, the memory size required by Phase (improved k-NN) + LLS-RS-RT is approximately QNM × 64 bits ≈ 0.66 Mbits. This is far lower than 50 Mbits, which is the standard maximum on-chip SRAM memory size, so we can consider the memory costs to be the same among these approaches.
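For reference, the memory estimate follows directly from the values given in the simulation setup (Q = 100 iterations, N = 8 reader antennas, M = 13 reference tags), assuming one 64-bit value is stored per iteration, antenna, and reference tag:

$$Q \times N \times M \times 64\ \text{bits} = 100 \times 8 \times 13 \times 64\ \text{bits} = 665{,}600\ \text{bits}$$

This is roughly two thirds of a megabit, in line with the figure of about 0.66 Mbits quoted above, and about 75 times smaller than the 50 Mbits bound.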
As the averaged accuracy performance of Phase (improved k-NN) + LLS-RS-RT is significantly better than the other approaches, considering the current hardware technology, performance of our approach is superior to the other approaches.

Conclusions
In this paper, we use the RF phase to evaluate the similarity of the reference tags in an improved k-NN algorithm for range estimation.And the location is estimated by linear least squares with a new reference selection scheme.To validate the performance of the approach proposed in our paper, we compare it with six different approaches.Simulation results show that our approach is superior to the traditional localization approaches under multipath environments.We also find that phase can provide a more effective selection of the reference tags than strength in ranging estimation.And, ranging estimation is more accurate than coordinate estimation with k-NN algorithm for localization.Considering the current hardware technology, the performance of our approach is superior to the other approaches.Our future work is to study the effects of the placement of the reference tags and the tag interaction on the phase measurement accuracy and develop more effective localization algorithms in the NLOS environments.

2) Phase (k-NN): It follows the same procedure as Strength (k-NN), but it uses the phase instead of the strength of the signals to compute the Euclidean distance. The final results are averaged over Q runs. 3) Strength (improved k-NN) + LLS-RS-TT: It calculates the strength Euclidean distance by Equation 16, in the same way as the phase Euclidean distance, and estimates the direct distances between the reader antennas and the target tag by Equations 17 and 18.

Figure 1 The analysis of improved k-NN algorithm.

Figure 2 The arrangement of reference tags from the top view.

Figure 3 The multipath propagation scene from the top view.

Figure 4 Effect of the factor Q.

Figure 6 RMSE versus SNR values for the different approaches.

Figure 7 CDF of localization accuracy for different approaches when (a) SNR is −20 dB, (b) SNR is 0 dB.

Table 1 Running time