We divide the shopping data mining procedure into two phases. In phase-I, our system discovers hot areas using the k-NN machine learning algorithm. In phase-II, feedback information is used to reveal implicit relationships among items, identifying hot and popular items by hierarchical agglomerative clustering.

### Phase-I: machine learning

During shopping, if a customer is fond of an item, he will pick it up or move it for a while, altering the item’s state from stationary to moving. Accordingly, the velocity pattern changes when the tag’s status changes, which naturally separates moved items from stationary ones; that is, by observing changes in velocity values, we can identify the popular products. In a real environment, however, the magnitude of the velocity alone cannot exactly determine the state of a tag. For example, when someone walks past a stationary tag, or other tags around it are moved, the tag exhibits a relative speed like the velocity patterns of the six items over 4 s shown in Fig. 4, making it hard to judge the state of the tag from a simple change in amplitude. Although the mobile tags’ velocity-pattern changes are more obvious than those of the stationary tags, the boundary between them is intangible. In Fig. 4, the left column shows a static tag’s velocity patterns: the first picture depicts the velocity pattern when there are no volunteers, the second when volunteers stroll around the shelf, and the third the fluctuating pattern when nearby objects are taken away. Likewise, the right column represents moving tags: the first shows an item being picked up and turned over, and the remaining two show items being carried by different customers. The measurements indicate that a k-NN classifier can divide the tags into three classes: no one near the tag (stationary tags), someone near the tag (unstable tags), and tags that were moved (mobile tags).

We use the number of unstable tags to find a hot area. In phase-I, our system discovers the unstable tags through machine learning, records their IDs, and counts both the number of unstable tags and the number of times they are passed by customers. Since the area to which each product belongs is already known, we can determine the hot area from the collected tag IDs.
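As a minimal sketch of this hot-area bookkeeping (the tag-to-area mapping `TAG_AREA`, the function name, and the tag IDs below are our own illustrative assumptions, not the paper’s implementation):

```python
from collections import Counter

# Hypothetical mapping from tag ID to the shelf area it belongs to,
# assumed known in advance as the text states.
TAG_AREA = {"tag01": "snacks", "tag02": "snacks", "tag03": "dairy"}

def hot_areas(unstable_tag_ids, top=1):
    """Count unstable-tag observations per area; return the busiest areas."""
    counts = Counter(TAG_AREA[t] for t in unstable_tag_ids if t in TAG_AREA)
    return counts.most_common(top)

# Tags flagged "unstable" by the phase-I classifier; a tag repeated across
# observation windows counts as a repeated customer pass.
print(hot_areas(["tag01", "tag02", "tag01", "tag03"]))
# [('snacks', 3)]
```

Repeated observations of the same tag are deliberately kept, so the counter reflects how often customers pass an area, not just how many distinct tags it holds.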

First, we need to train the model shown in Fig. 4 by collecting sequences of velocity vectors from tags whose states are already known, and then use this model to estimate a tag’s state for a given new velocity vector. Let the numbers of readers and reference tags be *N* and *M*, respectively. The RSS vector of the *i*-th target tag is defined as *T*_{i}=(*t*_{(i,1)},*t*_{(i,2)},⋯,*t*_{(i,N)}), where *t*_{(i,n)} denotes the signal strength of the *i*-th target tag received by the *n*-th reader and *n*∈(1,*N*). The velocity vector of the *i*-th target tag is defined as *V*_{t(i,n)}=(*v*_{t(i,1)},*v*_{t(i,2)},⋯,*v*_{t(i,N)}), where *v*_{t(i,n)} is the velocity of the *i*-th target tag as observed by the *n*-th reader. Correspondingly, the RSSI vector of the *j*-th reference tag is defined as *R*_{j}=(*r*_{(j,1)},*r*_{(j,2)},⋯,*r*_{(j,N)}), where *r*_{(j,n)} denotes the signal strength and *j*∈(1,*M*); the velocity vector of the *j*-th reference tag is defined as *V*_{r(j,n)}=(*v*_{r(j,1)},*v*_{r(j,2)},⋯,*v*_{r(j,N)}), where *v*_{r(j,n)} is the velocity of the *j*-th reference tag as observed by the *n*-th reader.

The Euclidean distance *E*_{(i,n)} between *V*_{t(i,n)} of the *i*-th target tag and *V*_{r(j,n)} of the *j*-th reference tag is calculated by:

$$ E_{(i,n)}=\sqrt{\sum_{n=1}^{N}\left(v_{t(i,n)}-v_{r(j,n)}\right)^{2}} $$

(7)
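Eq. (7) can be sketched as a short helper (function and variable names are ours, purely for illustration):

```python
import math

def euclidean_distance(v_target, v_ref):
    """Eq. (7): Euclidean distance between a target tag's and a reference
    tag's velocity vectors across the N readers."""
    return math.sqrt(sum((vt - vr) ** 2 for vt, vr in zip(v_target, v_ref)))

# Velocity readings from N = 3 readers for a target and a reference tag.
print(euclidean_distance([1.0, 2.0, 2.0], [0.0, 0.0, 0.0]))  # 3.0
```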

Unfortunately, items are usually placed densely side by side on supermarket shelves, and the RSS is easily affected by the multi-path effect and by changes in the radiation pattern (a tag antenna’s radiation pattern strongly affects adjacent tags due to mutual coupling, shielding, and reflection [9]). As a result, the *K* most similar reference tags often do not share the target tag’s state [10]. A further improvement was therefore proposed in [11] to mitigate the impact of velocity fluctuation on the estimation error.

Let *u*_{(j,n)} denote the mean velocity of the *j*-th reference tag collected by the *n*-th reader antenna, and *δ*_{(j,n)} its standard deviation. Because the target tag exhibits unequal velocities when it is close to different reference tags or is covered by an object, we use *u*_{(j,n)} and *δ*_{(j,n)} for normalization. The normalized velocity *n*_{j} of the *j*-th reference tag is calculated as follows:

$$ n_{j}=\sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(\frac{v_{r(j,n)}-u_{(j,n)}}{\delta_{(j,n)}}\right)^{2}} $$

(8)
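A sketch of this normalization, assuming each reader contributes one mean and one standard deviation per reference tag and the sum is averaged over the readers (names are illustrative):

```python
import math

def normalized_velocity(v_ref, means, stds):
    """Eq. (8): z-score each reader's velocity reading for a reference tag
    by that reader's historical mean/std, then take the root mean square."""
    n = len(v_ref)
    return math.sqrt(sum(((v - u) / s) ** 2
                         for v, u, s in zip(v_ref, means, stds)) / n)

# Two readers, each one standard deviation above its historical mean.
print(normalized_velocity([2.0, 2.0], [1.0, 1.0], [1.0, 1.0]))  # 1.0
```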

Hence, the revised Euclidean distance \(E_{(i, n)}^{'}\) is calculated by the following formula:

$$ E_{(i,n)}^{'}=\frac{E_{(i,n)}}{n_{j}} $$

(9)

Therefore, the *i*-th target tag has the Euclidean distance vector \(E_{(i, 1)}^{'}, E_{(i, 2)}^{'}, \cdots, E_{(i, N)}^{'}\), and a reference tag closer to the target tag is assumed to have a smaller Euclidean distance. We then obtain \(E_{i}^{''}\), the vector in which the revised Euclidean distances of \(E_{i}^{'}\) are sorted in ascending order, i.e., \(E_{(i, 1)}^{''}\leq E_{(i, 2)}^{''} \leq \cdots \leq E_{(i, N)}^{''}\). The first *K* reference tags are the nearest neighbors (NNs), whose states are used to identify the state of the target tag. The weighting factor for each selected reference tag is:

$$ w_{(i,k)}=\frac{1/{E_{(i,k)}^{^{\prime\prime}2}}}{\sum_{k=1}^{K}1/{E_{(i,k)}^{^{\prime\prime}2}}} $$

(10)

where *k*∈(1,*K*). The estimated state \(\hat{y}_{i}\) of the *i*-th target tag is given by

$$ \hat{y_{i}}=\sum_{k=1}^{K}w_{(i,k)}y_{k} $$

(11)

where *y*_{k} denotes the state of the *k*-th selected reference tag. This process effectively reduces the amount of data that the reader needs to compute.

After phase-I, we can separate the moving items from all the items. In phase-II, the reader only needs to deal with the information of moving items, greatly improving the time efficiency and reducing the amount of computation.

### Phase-II: hierarchical agglomerative clustering

In the clustering phase, our system first exploits velocity to discover correlated items, which are usually tried on or bought together; e.g., people who buy pasta often also buy tomatoes, and those who need a dress may also consider high-heeled shoes. A previous effort [4] proposed an RSS-based localization technique for correlated-item discovery, based on the intuition that correlated items held by the same person should be in close proximity. However, this method is not accurate in real-world applications, since items near the customer are also in close proximity and may mistakenly be taken as correlated items. The method also does not work well when different commodities are picked up one after another rather than simultaneously.

Our system uses the observation that correlated items, whether in the hands of a single customer or in the same shopping bag, follow a similar moving pattern with the customer and thus share the same velocity-time curve. We can therefore use a hierarchical agglomerative clustering approach to organize tags into groups, so that correlated items are aggregated into one group.

Since we do not know the number of target items to be tracked (new items may be added to the shopping cart or previous items discarded by customers at any time, and the number of customers is also unknown), clustering algorithms such as k-means [12], in which the number of clusters must be known a priori, cannot be applied. For this reason, we adopt the hierarchical agglomerative clustering (HAC) algorithm [13] to solve this challenge.

We define *K* as the set of velocities of the mobile tags identified in phase-I, and we divide each velocity series into *N* segments. Each segment of data is treated as a vector, so each tag *i*∈*K* has velocity vectors *v*_{i}. In the HAC algorithm, each vector is initially considered an independent cluster. At each iteration, the two most similar clusters are merged. The distance between two clusters, *S*_{i}∈*K* and *S*_{j}∈*K*, is measured with the average distance \(\bar {d}\):

$$ \bar{d}\left(S_{i},S_{j}\right)=\frac{\sum_{ti\in S_{i}} \sum_{tj\in S_{j}} \left \| v_{ti}-v_{tj} \right \|}{\left | S_{i}\right |\left | S_{j} \right |} $$

(12)

where *t* denotes the *t*-th segment of relative data and *i*, *j* are tag identifiers.

The iterations terminate when the minimum average speed similarity among the clusters is larger than a threshold *T*_{c}, which trades off classification accuracy against time efficiency (i.e., time delay for low values of *T*_{c}, and a low correct rate for high values of *T*_{c}). The optimal value of *T*_{c} used in the experiments was derived experimentally (see Section 5).

We use the example of Fig. 5 to explain our algorithm. At first, each tag is considered one individual cluster. According to Eq. 12, we calculate the similarity of the velocity vectors *v*_{i} from the measured data; the results are shown in Fig. 7. The smaller the distance \(\bar {d}\), the more similar the two tags. We put items whose distance values are lower than the threshold into one cluster; hence, beer and nappies are clustered after several iterations. The algorithm terminates when no \(\bar {d}\) is lower than the threshold *T*_{c} or only one cluster is left, and each resulting cluster is taken to contain the tags selected by the same person. There is, however, another situation, like the lipstick in Fig. 5, which ends up in a single-item cluster: it may merely have been picked up at some moment without being put into a shopping cart, i.e., it is simply a hot item. Therefore, for every commodity like the lipstick, the distance \(\bar {d}\) to the static goods is calculated again; if \(\bar {d}\) is lower than the threshold *T*_{c} in that time period, the commodity is considered a hot item. The popular items are then the mobile items identified in phase-I minus the hot items identified in phase-II.

In the clustering process, classification results may differ across time segments: a group of goods may grow or shrink in the next time period, corresponding to products newly added to the shopping cart or abandoned during shopping. Thus, we obtain the shopping order as a time series, which helps retailers optimize the pattern of commodity display.