In this section, we first explain a grid partition approach for real-time computation of facility influence, and then address two issues with this approach: ambiguous grid cells and grid size. Finally, we design a fast algorithm to compute facility influence.
Partition of region
Given N facilities, finding the nearest facility for one object requires N distance calculations. The computation overhead increases with the number of both objects and facilities: with N facilities and M objects, Θ(M·N) distance calculations are needed to obtain the influence values of all facilities. The overhead is exacerbated if vehicles are moving and update their locations frequently. Clearly, it is unnecessary to calculate the distance for every pair of facility f and object p at every time stamp.
More specifically, we partition a geographic region into equal-size grid cells and assign an index to each cell. Since different facilities have different scope areas, the number of cells covering each facility varies. If a cell is inside the scope of a facility, we store the value that denotes the influence of this cell on the facility. This value is simply the total number of objects inside the cell. With this approach, we do not need to compute the distances to all N facilities to find the nearest one (time complexity Θ(N)); instead, mapping the index to the value incurs much less computation overhead (time complexity Θ(1)).
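The index mapping described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the function name `cell_index`, the row-major numbering, and the 4×4 grid with cell side 10 are all assumptions made for the example.

```python
from collections import Counter


def cell_index(x, y, origin_x, origin_y, cell_size, n_cols):
    """Map a point to the integer index of the grid cell containing it.

    Cells are numbered row by row (row-major), which is an assumed convention.
    """
    col = int((x - origin_x) // cell_size)
    row = int((y - origin_y) // cell_size)
    return row * n_cols + col


# Hypothetical 4x4 grid with cells of side 10 covering [0, 40) x [0, 40).
objects = [(3.0, 4.0), (12.5, 7.0), (15.0, 2.0), (35.0, 38.0)]

# Per-cell object counts: the value stored for each cell.
counts = Counter(cell_index(x, y, 0.0, 0.0, 10.0, 4) for x, y in objects)
```

Each lookup costs a constant number of arithmetic operations, independent of the number of facilities.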
Let us use an example (Fig. 2) to walk through this grid-based approach. Two facilities \(f_1\) and \(f_2\) are in the region in Fig. 2a. Each has an associated scope and multiple objects around it. The cell influence view (Fig. 2b) shows that every grid cell has a nearest facility, and the objects view (Fig. 2c) shows the objects in every cell. We can overlay these two views and obtain the influence of all facilities very quickly (Fig. 2d).
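The overlay step of this example can be sketched as below. The two dictionaries stand in for the cell influence view and the objects view; their contents, the cell numbering, and the function name `overlay` are hypothetical and are not taken from Fig. 2.

```python
from collections import defaultdict


def overlay(cell_to_facility, cell_to_count):
    """Sum per-cell object counts into per-facility influence values."""
    influence = defaultdict(int)
    for cell, count in cell_to_count.items():
        facility = cell_to_facility.get(cell)
        if facility is not None:  # cells outside every scope are skipped
            influence[facility] += count
    return dict(influence)


# Hypothetical cell influence view: cell index -> nearest facility.
cell_to_facility = {0: "f1", 1: "f1", 2: "f2", 3: "f2"}
# Hypothetical objects view: cell index -> number of objects in the cell.
cell_to_count = {0: 2, 1: 1, 2: 4, 5: 3}

influence = overlay(cell_to_facility, cell_to_count)
```

Cell 5 holds objects but is not covered by any facility in this toy setup, so it contributes to neither total.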
Although the grid-based approach seems simple, two practical issues need to be addressed. First, ambiguous grid cells may lead to inaccuracy in the calculation of facility influence: some cells lie at the intersection of the ranges of more than one facility. Second, the choice of grid size significantly affects the overhead of computing facility influence. We next address these two practical issues.
Ambiguous grid
We define two types of grid cells (Fig. 3).
- Unique grid. A grid cell g is a unique cell if there exists a facility f such that NF(pt)=f for every point pt in g, where NF(·) outputs the nearest facility.
- Ambiguous grid. A grid cell g is an ambiguous cell if there exist two different points \(pt_1 \neq pt_2\) in g such that \(\text{NF}(pt_1) \neq \text{NF}(pt_2)\).
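The following theorem provides a method to classify a cell as unique or ambiguous.
Theorem 1.
Let \(pt_i\) (i=1,2,3,4) be the four vertices of grid cell g. If there exists a facility f such that \(\text{NF}(pt_i)=f\) for all four vertices, then g is a unique cell.
Proof.
Assume \(pt_1\) is the farthest vertex from facility f; then for every point pt in g, \(d(pt,f) \le d(pt_1,f)\), where d(x,y) is the Euclidean distance. Since \(\text{NF}(pt_1)=f\), we get NF(pt)=f. The result also follows from convexity: the set of points whose nearest facility is f is an intersection of half-planes and hence convex, so if it contains the four vertices of g, it contains every point of g. Hence, g is a unique cell.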
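Reading Theorem 1 as a test on the four vertices of a cell, the classification can be sketched as follows. The helper names, the facility coordinates, and the axis-aligned square cell given by its lower-left corner are assumptions for illustration; ties between equidistant facilities are broken arbitrarily here.

```python
import math


def nearest_facility(pt, facilities):
    """Return the id of the facility closest to pt (Euclidean distance)."""
    return min(facilities, key=lambda fid: math.dist(pt, facilities[fid]))


def classify_cell(x0, y0, size, facilities):
    """Classify a square cell by the nearest facility of its four vertices."""
    vertices = [
        (x0, y0),
        (x0 + size, y0),
        (x0, y0 + size),
        (x0 + size, y0 + size),
    ]
    nearest = {nearest_facility(v, facilities) for v in vertices}
    if len(nearest) == 1:
        return "unique", nearest.pop()
    return "ambiguous", None


# Hypothetical facilities; the bisector between them is the line x = 5.
facilities = {"f1": (0.0, 0.0), "f2": (10.0, 0.0)}
left_cell = classify_cell(0.0, 0.0, 2.0, facilities)
middle_cell = classify_cell(4.0, 0.0, 2.0, facilities)
```

The cell spanning x in [4, 6] straddles the bisector, so its vertices disagree and it is reported as ambiguous.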
Due to the existence of ambiguous cells, we use approximate influence instead of accurate facility influence. For every facility f in \(\mathcal {F}\), the approximate influence of facility f is defined as follows:
$$ \hat I_{\mathcal{P}}(\,f)=\sum_{p\in \mathcal{P}} \text{Pr}(\text{NF}(g)=f \mid g=H(p)) \tag{3} $$
where NF(g) denotes the nearest facility of grid cell g, H(p) outputs the cell that p is mapped to, and Pr(·) is the probability.
Since every object can be mapped to a cell, we separate the object set \(\mathcal {P}\) into two subsets: \(\mathcal {P'}\), the objects mapped to unique cells, and \(\mathcal {P''}\), the objects mapped to ambiguous cells. If the mapped cell g is a unique cell, then Pr(NF(g)=f|g=H(p))=1 for the facility f nearest to g; when the mapped cell g is an ambiguous cell, we design three methods to compute Pr(NF(g)=f|g=H(p)).
Case 1 (Centering): When object p is mapped to grid cell g, we use the center of g (denoted by c) to represent g when computing the probability.
$$ \text{Pr}(\text{NF}(g)=f \mid g=H(p)) = \begin{cases} 1, & \text{if } \text{NF}(c)=f \\ 0, & \text{otherwise} \end{cases} $$
Case 2 (Sampling): We randomly choose several points in a grid cell and compute the nearest facility in \(\mathcal {F}\) of every sampling point. The sampling points are uniformly distributed.
$$ \text{Pr}(\text{NF}(g)=f \mid g=H(p)) =\frac{|\{s \mid s \in \mathcal{S}_{g}\wedge \text{NF}(s)=f\}|}{|\mathcal{S}_{g}|} $$
where s is a sampling point and \(\mathcal {S}_{g}\) is the set of sampling points in g.
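The centering and sampling schemes can be sketched as below. The function names, the default of 1000 samples, the fixed seed, and the facility coordinates are assumptions chosen for the example; cells are taken to be axis-aligned squares given by their lower-left corner.

```python
import math
import random
from collections import Counter


def nearest_facility(pt, facilities):
    """Return the id of the facility closest to pt (Euclidean distance)."""
    return min(facilities, key=lambda fid: math.dist(pt, facilities[fid]))


def centering_probabilities(x0, y0, size, facilities):
    """Case 1: all probability mass goes to the facility nearest the center."""
    center = (x0 + size / 2, y0 + size / 2)
    winner = nearest_facility(center, facilities)
    return {fid: 1.0 if fid == winner else 0.0 for fid in facilities}


def sampling_probabilities(x0, y0, size, facilities, n_samples=1000, seed=0):
    """Case 2: fraction of uniform sample points whose nearest facility is f."""
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(n_samples):
        s = (x0 + rng.random() * size, y0 + rng.random() * size)
        hits[nearest_facility(s, facilities)] += 1
    return {fid: hits[fid] / n_samples for fid in facilities}


# Hypothetical facilities; the bisector x = 5 cuts the cell [4, 8] x [0, 4].
facilities = {"f1": (0.0, 0.0), "f2": (10.0, 0.0)}
by_center = centering_probabilities(4.0, 0.0, 4.0, facilities)
by_sampling = sampling_probabilities(4.0, 0.0, 4.0, facilities)
```

In this setup one quarter of the cell lies on the \(f_1\) side of the bisector, so the sampling estimate for \(f_1\) should be near 0.25, whereas centering assigns the whole cell to \(f_2\).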
Case 3 (Area): We partition a grid cell using the Voronoi diagram, then compute the area of each part and use the proportion belonging to facility f as the probability. Suppose \(R_1\) is the part of g in which every point has f as its nearest facility; then
$$ \text{Pr}(\text{NF}(g)=f \mid g=H(p)) =\frac{\text{Area}(R_{1})}{\text{Area}(g)} $$
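One way to obtain \(\text{Area}(R_1)\) without constructing the full Voronoi diagram is to clip the cell against the perpendicular-bisector half-plane of f and every other facility; the result is the intersection of the cell with the Voronoi region of f. The sketch below takes this substitute route, so it is not necessarily how the area scheme is implemented in this work, and all function names and coordinates are assumptions.

```python
def clip_half_plane(polygon, a, b):
    """Keep the part of a convex polygon where a[0]*x + a[1]*y <= b."""
    result = []
    n = len(polygon)
    for i in range(n):
        p, q = polygon[i], polygon[(i + 1) % n]
        sp = a[0] * p[0] + a[1] * p[1] - b
        sq = a[0] * q[0] + a[1] * q[1] - b
        if sp <= 0:
            result.append(p)  # p is inside (or on) the half-plane
        if (sp < 0 < sq) or (sq < 0 < sp):
            t = sp / (sp - sq)  # edge crosses the boundary at parameter t
            result.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return result


def polygon_area(polygon):
    """Shoelace formula; returns 0 for degenerate or empty polygons."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def area_probability(x0, y0, size, fid, facilities):
    """Fraction of the square cell whose nearest facility is fid."""
    fx, fy = facilities[fid]
    region = [(x0, y0), (x0 + size, y0), (x0 + size, y0 + size), (x0, y0 + size)]
    for other, (hx, hy) in facilities.items():
        if other == fid:
            continue
        # Points closer to f than to h satisfy (h - f) . x <= (|h|^2 - |f|^2) / 2.
        a = (hx - fx, hy - fy)
        b = (hx * hx + hy * hy - fx * fx - fy * fy) / 2
        region = clip_half_plane(region, a, b)
    return polygon_area(region) / (size * size)


# Hypothetical facilities; the bisector x = 5 cuts the cell [4, 8] x [0, 4].
facilities = {"f1": (0.0, 0.0), "f2": (10.0, 0.0)}
p_f1 = area_probability(4.0, 0.0, 4.0, "f1", facilities)
p_f2 = area_probability(4.0, 0.0, 4.0, "f2", facilities)
```

Because each clip is against a single half-plane, the region stays convex throughout, which is what makes this simple clipping loop sufficient.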
Centering is the simplest scheme, as it uses the center point to represent a grid cell and finding the nearest facility of a single point is easy. However, it is coarse-grained: it introduces error whenever an object does not lie in the same region as the center. If the locations mapped to a cell follow a uniform distribution, the sampling and area schemes are recommended. The area scheme is a generalization of the sampling scheme and can use the information of the Voronoi diagram [19] generated by facility set \(\mathcal {F}\).
A natural question now arises: what is the difference between a facility’s actual influence and approximate influence? If we assume objects follow uniform distribution, then we have the following theorem.
Theorem 2.
Given a region R, facility set \(\mathcal {F}\), object set \(\mathcal {P}\), and a grid partition with grid size l, for a facility \(f \in \mathcal {F}\), we have
$${\lim}_{l \to 0}{\hat I_{\mathcal{P}}(\,f)}=I_{\mathcal{P}}(\,f) $$
Proof.
Set \(\mathcal {P}\) consists of two subsets: a unique cell set \(\mathcal {P'}\) and an ambiguous cell set \(\mathcal {P''}\). When l gets smaller, the number of objects in \(\mathcal {P''}\) decreases and thus the ambiguity in the influence of facility f decreases. Therefore, there exists a small \(l^{*}\) such that for all \(l<l^{*}\), \(\mathcal {P''}=\emptyset \). Thus, we have
$$\begin{array}{*{20}l} {\lim}_{l \to 0} \hat I_{\mathcal{P}}(\,f)&={\lim}_{l \to 0} \sum_{p\in \mathcal{P'} \cup \mathcal{P''}} \text{Pr}(\text{NF}(g)=f \mid g=H(p)) \\ &={\lim}_{l \to 0} \sum_{p\in \mathcal{P'}} \text{Pr}(\text{NF}(g)=f \mid g=H(p))\\ &=I_{\mathcal{P}}(\,f) \end{array} $$
Theorem 2 shows that if l is small, the approximate influence of a facility is very close to its actual influence. We can use \(\hat I_{\mathcal {P}}(\,f)\) to estimate \(I_{\mathcal {P}}(\,f)\) if the grid size is not too large. In practice, the grid size should not be set too small either; we explain why below.
Grid size
Grid size l affects facility influence computation in three respects: memory requirements, computing overhead, and accuracy. We next analyze these three factors and discuss how to determine a suitable grid size.
Memory. Intuitively, a smaller grid size yields a larger number of grid cells in a given geographic region. For instance, assume Beijing has an area of S (approximately 16,800 km²); the grid count is then \(S/l^{2}\), where l is the grid length. If each cell stores only one value, i.e., facility influence (assume 32 bits), then the total memory needed is \((S/l^{2}) \times 32\) bits. Considering memory usage, we generally do not set the grid size too small.
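To make the memory formula concrete, the short sketch below evaluates \((S/l^{2}) \times 32\) bits for the area quoted above. The three grid lengths (1000 m, 100 m, 10 m) and the function name are illustrative assumptions, not values used in this work.

```python
def grid_memory_bytes(area_km2, cell_size_m, bits_per_cell=32):
    """Memory needed to store one value per cell: (S / l^2) * bits / 8."""
    n_cells = area_km2 * 1_000_000 / cell_size_m ** 2
    return n_cells * bits_per_cell / 8


# Hypothetical grid lengths in meters for S = 16,800 km^2.
memory_mb = {l: grid_memory_bytes(16_800, l) / 1e6 for l in (1000, 100, 10)}
```

Halving l quadruples the cell count, so memory grows quadratically as the grid is refined: with these numbers a 100 m grid needs about 6.7 MB, and a 10 m grid needs one hundred times as much.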
Computing overhead. Given N facilities and M objects, each object p falls into one grid cell, so the overhead of finding its nearest facility is Θ(1). The total computing overhead is Θ(M), which is independent of grid size.
Accuracy. If we set l larger, we have fewer unique cells and more ambiguous cells, which decreases the accuracy of facility influence. The principle is therefore not to set l too large. Empirically, we set l to the average distance between adjacent facilities. In the evaluation section, we discuss the impact of grid size on facility influence.
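The empirical rule above can be approximated as follows. The text does not define "adjacent" precisely, so this sketch assumes one possible reading, namely the mean distance from each facility to its nearest other facility; the function name and the coordinates are hypothetical.

```python
import math


def mean_nearest_neighbor_distance(points):
    """Average, over all points, of the distance to the closest other point.

    Requires at least two points. This is only one interpretation of
    "average distance of adjacent facilities".
    """
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


# Hypothetical facility coordinates (same unit as the grid length l).
facility_points = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
suggested_l = mean_nearest_neighbor_distance(facility_points)
```

Another reading would average over Voronoi-neighbor pairs instead, which generally gives a somewhat larger value.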
Facility influence computation
After the entire geographic region is divided into cells, we obtain a mapping between cells and facilities. Given an object’s location, the grid index indicates which cell the object is mapped to, and the corresponding nearest facility and its probability can be retrieved. Therefore, it is unnecessary to compute the distance between the object and all facilities to determine the nearest facility. Our iterative process is shown in Algorithm 1. When an object’s location is updated (i.e., changed), we use the grid index to get the new grid cell ID. If the current cell ID differs from the previous one, we extract the new nearest facility ID, which is stored in the grid data structure. We then compare the current facility ID with the previous one. If they are equal, we do nothing. Otherwise, we add the object’s influence value to the current facility and subtract it from the previous one. Finally, we calculate the facility influence using (3).
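The update procedure described above can be sketched as follows. This is a simplified illustration and not a transcription of Algorithm 1: the class name, the `(col, row)` cell keys, and the per-cell probability dictionaries are assumptions, and the comparison of facility IDs is generalized to a comparison of the stored probability weights so that ambiguous cells are handled by the same code path.

```python
from collections import defaultdict


class InfluenceTracker:
    """Maintain approximate facility influence under object location updates."""

    def __init__(self, cell_size, cell_probs):
        self.cell_size = cell_size
        self.cell_probs = cell_probs  # (col, row) -> {facility id: probability}
        self.object_cell = {}         # object id -> current cell
        self.influence = defaultdict(float)

    def _cell(self, x, y):
        # Grid index lookup: constant time, no distance computation.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def update(self, obj_id, x, y):
        new_cell = self._cell(x, y)
        old_cell = self.object_cell.get(obj_id)
        if new_cell == old_cell:
            return  # object stayed in the same cell
        self.object_cell[obj_id] = new_cell
        old = self.cell_probs.get(old_cell, {})
        new = self.cell_probs.get(new_cell, {})
        if old == new:
            return  # same facility weights, so influence is unchanged
        for fid, pr in old.items():
            self.influence[fid] -= pr  # remove contribution to previous facility
        for fid, pr in new.items():
            self.influence[fid] += pr  # add contribution to current facility


# Hypothetical grid: a unique cell for f1, an ambiguous cell, a unique cell for f2.
cell_probs = {
    (0, 0): {"f1": 1.0},
    (1, 0): {"f1": 0.5, "f2": 0.5},
    (2, 0): {"f2": 1.0},
}
tracker = InfluenceTracker(10.0, cell_probs)
tracker.update("a", 5.0, 5.0)    # enters the f1 cell
tracker.update("a", 15.0, 5.0)   # moves into the ambiguous cell
snapshot = dict(tracker.influence)
tracker.update("a", 25.0, 5.0)   # moves into the f2 cell
```

Each update touches only the old and new cells of the moved object, so its cost does not depend on the number of facilities.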
Analysis: The time complexity of Algorithm 1 is O(M), where M is the number of objects. In lines 2 and 3, the grid cell ID is obtained from the object’s location using the grid index, which takes constant time. Lines 5 and 6 retrieve data from the grid data structure, which also takes constant time. The upper bound of the algorithm is therefore determined by the number of objects.