A novel algorithm of low sampling rate GPS trajectories on map-matching

Map-matching is the process of matching a GPS trajectory to the road network of a digital map. Most existing map-matching algorithms assume a high sampling rate, and their accuracy drops sharply as the sampling interval increases. This paper therefore proposes a new map-matching algorithm for low-sampling-rate GPS trajectories. The algorithm takes into account the geometric and topological structure of the road network and the mutual influence between adjacent points (time and speed information), and determines the matching result by calculating a probability for each candidate point of every trajectory point. At the end of the paper, we use vehicle data from Beijing UCAR Inc. in a case study. The case study shows that, for low-sampling-rate track points on complex roads, the algorithm has a good running time and finds accurate matches.


Introduction
As Internet technology matures, smart cities can be built and developed quickly, and intelligent transportation infrastructure is an indispensable part of them. The construction of intelligent transportation covers several areas: vehicle navigation, traffic flow analysis, and satellite positioning, which has not been intensively studied. All of these applications are based on trajectories. Their core step is to position the GPS track data of vehicles accurately on the road, in other words, map-matching [1].
Typical GPS track data is a series of sequential track points. Each GPS point consists of latitude, longitude, and timestamp information. However, owing to the limitations of GPS itself, errors can arise when the data are sampled and measured and when the measurements are transmitted or received, which leads to inaccurate GPS data [2]. Therefore, the original data need to be processed before they are placed on the road network, that is to say, map-matched.

Application conditions of the algorithm
With the development of science and technology, the number of travel navigation devices, such as GPS-embedded PADs and smartphones, has increased sharply. Because these devices are so widespread, a large amount of track point data is available. In practice, however, GPS data can often be obtained only at a low sampling rate (e.g., one sample point every 2 min) because of energy consumption, cost, and so on. For example, there are more than 60,000 taxis in Beijing, and most of them are equipped with GPS [3,4]. These taxis are on the road most of the time, and to save energy the interval between reported GPS points is increased, which lowers the sampling rate of the GPS track data [5,6].
However, existing map-matching algorithms are designed for GPS data with a high sampling rate (usually one track point every 10~30 s) [7]. When they are applied to low-sampling-rate points, the matching error exceeds 50% [8,9]. This paper therefore puts forward an improved map-matching algorithm for track points with a low sampling rate. Here, a low sampling rate means that one track point is collected every 1.5 min or more [10,11].
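As an illustration of the threshold above, the following minimal sketch represents a track as a list of points (latitude, longitude, timestamp) and checks whether its sampling interval reaches 1.5 min. The names `TrackPoint`, `mean_sampling_interval`, and `is_low_sampling_rate` are hypothetical, and using the mean interval over the whole track is an assumption; the paper does not state how the interval is measured.

```python
from dataclasses import dataclass
from typing import List

LOW_RATE_THRESHOLD_S = 90.0  # 1.5 min, the threshold stated in the text


@dataclass
class TrackPoint:
    lat: float        # latitude in degrees
    lon: float        # longitude in degrees
    timestamp: float  # seconds since an arbitrary epoch


def mean_sampling_interval(track: List[TrackPoint]) -> float:
    """Mean time gap in seconds between consecutive track points."""
    if len(track) < 2:
        raise ValueError("need at least two track points")
    gaps = [b.timestamp - a.timestamp for a, b in zip(track, track[1:])]
    return sum(gaps) / len(gaps)


def is_low_sampling_rate(track: List[TrackPoint]) -> bool:
    # Assumption: a track is "low rate" when its mean interval is >= 90 s.
    return mean_sampling_interval(track) >= LOW_RATE_THRESHOLD_S
```

A track sampled every 2 min would be classified as low rate, while one sampled every 20 s would not.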

The algorithm design
The map-matching system for low-sampling-rate GPS navigation data consists of four parts: the preparation of candidate points, the analysis of spatial factors, the analysis of time factors, and the result matching.

The preparation of candidate point
The algorithm takes full account of the geometric structure of the road network when calculating the candidate points of a track point. This takes two main steps, followed by a filtering of the results. First, we find the road sections on which the track point may lie, in other words, the candidate sections. Second, we calculate the candidate points on those sections using the point-to-curve method from existing geometric map-matching algorithms.
1. Selection of candidate sections: To find the sections on which a track point may lie, a complete algorithm could compare each track point against the entire road network, but this has too high a time complexity. The algorithm must therefore narrow the range of segments to be compared. Existing algorithms use the error ellipse method from probability theory to narrow the comparison range. The disadvantage of this method is that the error ellipse may well contain no road node, in which case it is mistakenly concluded that there is no candidate road segment. This paper therefore uses the GeoHash algorithm to implement this step: following a fixed rule [12,13], a single string represents the two coordinates, latitude and longitude.
2. Calculating candidate points: After acquiring the road sections around each GPS point, the algorithm calculates the candidate point of the GPS point on each section. A perpendicular is dropped from the track point to the section. If the foot of the perpendicular lies on the section, the foot is the candidate point; if it does not, the nearest endpoint of the section is selected as the candidate point.
3. Filtering of candidate points: The search in the previous steps may return more than one road for a track point, because GeoHash can find several roads. Existing geometric map-matching algorithms show that the closer a candidate point is to the track point, the more likely it is to be the best match. The algorithm therefore stores in the database only the five candidate points, together with their sections, that are closest to the track point.
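The three steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `geohash_encode` follows the standard public GeoHash bit-interleaving scheme, while `candidate_on_segment` and `nearest_candidates` are hypothetical helper names. The projection assumes that points and segment endpoints are already in a local planar coordinate system (for example, metres), which the paper does not specify.

```python
import math
from typing import List, Tuple

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet
Point = Tuple[float, float]


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Encode latitude and longitude as one GeoHash string."""
    ranges = {"lat": [-90.0, 90.0], "lon": [-180.0, 180.0]}
    values = {"lat": lat, "lon": lon}
    chars: List[str] = []
    bits, nbits, axis = 0, 0, "lon"  # GeoHash starts with a longitude bit
    while len(chars) < precision:
        lo, hi = ranges[axis]
        mid = (lo + hi) / 2
        if values[axis] >= mid:
            bits = (bits << 1) | 1
            ranges[axis][0] = mid
        else:
            bits <<= 1
            ranges[axis][1] = mid
        axis = "lat" if axis == "lon" else "lon"
        nbits += 1
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


def candidate_on_segment(p: Point, a: Point, b: Point) -> Tuple[Point, float]:
    """Foot of the perpendicular from p to segment ab, or the nearest endpoint."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))  # clamping selects the nearest endpoint
    c = (a[0] + t * dx, a[1] + t * dy)
    return c, math.hypot(p[0] - c[0], p[1] - c[1])


def nearest_candidates(p: Point, segments: List[Tuple[Point, Point]],
                       k: int = 5) -> List[Tuple[Point, float]]:
    """Keep the k candidate points closest to p (k = 5 in the text)."""
    candidates = [candidate_on_segment(p, a, b) for a, b in segments]
    return sorted(candidates, key=lambda c: c[1])[:k]
```

Track points and road nodes that share a GeoHash prefix fall in the same grid cell, so a prefix lookup can narrow the set of segments before `nearest_candidates` is applied.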

Spatial analysis
In this step, we make full use of the geometry and topology information of the road network to evaluate the candidate points obtained in the first step. In this paper, geometry information is represented by the observation probability, and topology information is represented by the transmission probability. The analysis rests on two hypotheses:
1. Vehicles tend to run straight along the same road rather than take a detour: Consider the taxi trajectory in Fig. 1. A taxi travels from west to east on the horizontal road $e_1$, and $P_a$, $P_b$, and $P_c$ are three GPS trajectory points reported during the trip. The perpendicular distance from $P_b$ to the vertical road $e_2$ is smaller than the perpendicular distance from $P_b$ to the horizontal road $e_1$, so most current map-matching algorithms match $P_b$ to $e_2$. Given the movement trend of $P_a$ and $P_c$, the points before and after $P_b$, however, the taxi is unlikely to have taken a roundabout route from $e_1$ to $e_2$ and then back to $e_1$. This example shows that a better match may be obtained if the algorithm combines the location of the trajectory points with the topology information of the road network.
2. The speed of a vehicle is limited by the maximum speed of the road on which it travels: Consider the taxi trajectory in Fig. 2, traveling from south to north. During this time, the taxi reports two GPS locations, $P_a$ and $P_b$. Without velocity information, it is almost impossible to tell whether the two track points belong to the highway. However, if the average speed is calculated to be 80 km/h from the distance and timestamps of $P_a$ and $P_b$, the two points can be judged likely to lie on the highway $e_1$.
This example illustrates that making good use of time and vehicle speed information can effectively improve map-matching accuracy [14].
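The text does not give formulas for the observation and transmission probabilities. A common choice in the map-matching literature, assumed here, is a zero-mean Gaussian on the distance between a track point and its candidate, and the ratio of the straight-line distance between two track points to the shortest-path length between their candidates. The function names and the default standard deviation of 20 m are likewise assumptions made for this sketch.

```python
import math


def observation_probability(distance_m: float, sigma_m: float = 20.0) -> float:
    """Assumed Gaussian density of the track-point-to-candidate distance."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma_m)
    return coeff * math.exp(-(distance_m ** 2) / (2.0 * sigma_m ** 2))


def transmission_probability(straight_m: float, path_m: float) -> float:
    """Assumed ratio: straight-line distance over shortest-path length."""
    if path_m <= 0.0:
        return 1.0  # assumption: identical candidates involve no detour
    return min(1.0, straight_m / path_m)


def spatial_score(distance_m: float, straight_m: float, path_m: float,
                  sigma_m: float = 20.0) -> float:
    """Product of the two assumed probabilities for one candidate transition."""
    return (observation_probability(distance_m, sigma_m)
            * transmission_probability(straight_m, path_m))
```

Under these assumptions, the situation of Fig. 1 behaves as the first hypothesis expects: a candidate 12 m away reached without a detour (`spatial_score(12, 300, 300)`) scores higher than a candidate 8 m away that requires a path three times as long (`spatial_score(8, 300, 900)`).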

Time analysis
In most cases, spatial analysis lets the algorithm find the best candidate point, so that the true path $P_i$ can be chosen among the candidate paths $\{P_{i-1}, P_i, \ldots, P_n\}$. However, there is one kind of situation that spatial analysis cannot resolve, as shown in Fig. 3. In this figure, the thick yellow line is a highway and the thin blue line is an ordinary road. The two roads are very close, so if only spatial analysis is used to calculate the candidate points of $P_{i-1}$ and $P_i$, the results for the two roads may be the same. But if the average speed from $P_{i-1}$ to $P_i$ is 85 km/h, the two track points can be matched to the highway because of the road speed limits. Time analysis of the track points is therefore needed.
First, the algorithm calculates the average speed $v$ from $P_{i-1}$ to $P_i$: \[ v = \frac{\sum_{u=1}^{k} l_u}{\Delta t_{i-1 \to i}} \] Here the candidate point of $P_{i-1}$ is $C_{i-1}$, the candidate point of $P_i$ is $C_i$, and the shortest path from $C_{i-1}$ to $C_i$ is a sequence of sections $[e_1', e_2', \ldots, e_k']$. In the formula, $l_u = e_u'.l$ is the length of $e_u'$; the numerator is the length of the shortest path from $C_{i-1}$ to $C_i$; and the denominator $\Delta t_{i-1 \to i}$ is the time interval from $P_{i-1}$ to $P_i$.
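The average speed is thus the shortest-path length between two candidate points divided by the time interval between their track points. A minimal sketch follows, assuming the road network is given as an adjacency dictionary of section lengths in metres. The `Graph` type and the function names are hypothetical, and Dijkstra's algorithm is used for the shortest path.

```python
import heapq
import math
from typing import Dict, List, Tuple

# node -> list of (neighbour node, section length in metres); hypothetical layout
Graph = Dict[str, List[Tuple[str, float]]]


def shortest_path_length(graph: Graph, source: str, target: str) -> float:
    """Dijkstra's algorithm; returns math.inf if target is unreachable."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, length in graph.get(node, []):
            nd = d + length
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return math.inf


def average_speed(path_length_m: float, dt_s: float) -> float:
    """Average speed in m/s over the shortest path between two candidates."""
    if dt_s <= 0.0:
        raise ValueError("time interval must be positive")
    return path_length_m / dt_s
```

For example, a shortest path of 2000 m covered in 120 s gives about 16.7 m/s, that is, 60 km/h.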
The algorithm assumes that every section $e_u'$ in the road network has its own speed constraint value $e_u'.v$. This paper uses cosine similarity to describe how close the average speed from $C_{i-1}$ to $C_i$ is to the section constraint values $e_u'.v$. The time analysis using time and speed information is therefore defined as follows:
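One way to read this cosine definition, assumed here rather than taken from the paper, is to compare the vector of speed constraints $(e_1'.v, \ldots, e_k'.v)$ along the shortest path with a vector of the same length whose entries all equal the average speed $v$. The sketch below implements that reading; the function name `temporal_score` is hypothetical.

```python
import math
from typing import Sequence


def temporal_score(speed_limits: Sequence[float], avg_speed: float) -> float:
    """Cosine similarity between the speed limits along the path and a
    constant vector holding the average speed (an assumed reading)."""
    if not speed_limits:
        raise ValueError("the path must contain at least one section")
    dot = sum(limit * avg_speed for limit in speed_limits)
    norm_limits = math.sqrt(sum(limit * limit for limit in speed_limits))
    norm_speed = math.sqrt(len(speed_limits) * avg_speed * avg_speed)
    if norm_limits == 0.0 or norm_speed == 0.0:
        return 0.0  # undefined cosine; treated as no similarity
    return dot / (norm_limits * norm_speed)
```

Under this reading, a path whose sections all share one speed limit scores 1, and a path that mixes very different limits scores lower.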

The result matching
After the spatial and time analysis, a candidate graph $G'(V', T')$ can be built for the given track sequence $T: P_1 \to P_2 \to \ldots \to P_n$. $V'$ is the set of candidate points of the track points, and $T'$ is the set of edges, each represented by the shortest path between two adjacent candidate points.
Every candidate point in the candidate graph $G'$ is described by $N(C_i^{s})$, and every edge by the spatial and time analysis values between its two candidate points. To sum up, a probability function $F$ of the map-matching can be defined over candidate paths. Finally, for the whole track $T$ we obtain a collection of candidate paths $P_c: C_1^{s_1} \to C_2^{s_2} \to \ldots \to C_n^{s_n}$. The value $F(P_c)$ is calculated for each path, and the path with the maximum value is the final matching path.
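Selecting the maximum-value path does not require enumerating every candidate path. The sketch below uses dynamic programming over the candidate graph, under the assumption that the value of a path is the sum of the scores of its edges; the paper's exact probability function is not reproduced here. The names `best_matching_path` and `edge_score` are hypothetical, and `edge_score` stands for whatever combination of spatial and time analysis is used.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def best_matching_path(
    candidates: Sequence[Sequence[Hashable]],
    edge_score: Callable[[Hashable, Hashable], float],
) -> Tuple[List[Hashable], float]:
    """Return the candidate path with the highest summed edge score.

    candidates[i] lists the candidate points of track point i (non-empty).
    """
    best = [0.0] * len(candidates[0])  # best score ending at each candidate
    back: List[List[int]] = []         # back[i][j]: predecessor index in layer i
    for i in range(1, len(candidates)):
        layer_best, layer_back = [], []
        for cur in candidates[i]:
            scores = [best[j] + edge_score(prev, cur)
                      for j, prev in enumerate(candidates[i - 1])]
            j_star = max(range(len(scores)), key=scores.__getitem__)
            layer_best.append(scores[j_star])
            layer_back.append(j_star)
        best = layer_best
        back.append(layer_back)
    # Backtrack from the best final candidate.
    idx = max(range(len(best)), key=best.__getitem__)
    total = best[idx]
    path = [candidates[-1][idx]]
    for i in range(len(back) - 1, -1, -1):
        idx = back[i][idx]
        path.append(candidates[i][idx])
    path.reverse()
    return path, total
```

With at most five candidate points per track point, each step compares at most 25 edges, so the search grows linearly with the length of the track.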

The experimental results and analysis of the algorithm
The algorithm proposed in this paper was tested on real road network and track data. With the trajectory kept unchanged, only the number of candidate points per track point was varied in order to analyze its impact on the running time. The running time increases with the number of candidate points and surges once the number exceeds 5. If the number of candidate points is too small, however, the matching accuracy is lower. Balancing running time against matching precision, the algorithm therefore sets the maximum number of candidate points per trajectory point to 5.

Conclusions
Because of the energy and resource limitations of real life, the actual sampling intervals of GPS trajectories are very large, while existing map-matching algorithms are aimed at a high sampling rate. This paper therefore proposed a map-matching algorithm aimed specifically at GPS trajectories with a low sampling rate. Probabilistic matching is the core idea: the probability of each trajectory point is calculated from the road network geometry (observation probability), the topological structure (transmission probability), and the time and speed information of vehicles (time analysis). The best matching point is determined from the result. Finally, the paper verifies the matching precision and time complexity of the algorithm through experimental analysis with actual data. This method can be applied to map navigation systems as a supplement. When calculating the shortest path between the candidate points of adjacent track points, we use Dijkstra's algorithm for convenience. Dijkstra's algorithm is not the most efficient choice for this calculation, however; using the A* algorithm or the ALT algorithm can often improve the running time. This part of the algorithm can therefore be improved for shortest-path calculations.