System model
Figure 1 illustrates a system for predicting spatial information in real time. Mobile devices equipped with a sensor acquire sensor data and transmit them to a base station through an uplink channel. Here, we assume that vehicles equipped with image sensors (cameras) are the mobile devices. The base stations forward the received data to a server via a relay network. The server extracts the elements of spatial information from the collected sensor data and produces spatial information.
Figure 2 shows a block diagram of the system. Each mobile device consists of a sensor, pre-processor, transmitter, controller, and data storage, while the server consists of a receiver, converter, learner, predictor, and evaluator. In this figure, the solid lines indicate data flows, while the broken lines indicate control messages.
Image data collected by mobile devices must be delivered to the server in real time so that input data for real-time prediction can be extracted, even when the transmission speed is low because of limited bandwidth, uplink traffic congestion, or poor signal quality. Therefore, under such bandwidth limitations, only mobile devices holding higher-priority data are allowed to transmit their image data. The controller prioritizes the sensor data in accordance with the importance of each element of spatial information as determined by the evaluator. The server receives the sensor data, and the converter transforms them into a form that the predictor can use as input data for prediction. The predictor then produces the predicted spatial information by using the feature model, which is discussed in Section 3.2.
In contrast, training data for prediction can be collected from mobile devices as a background process. How these data are forwarded to the server is outside the scope of our framework, because they can be sent over mobile networks during off-peak hours or over high-bandwidth links such as WiFi or millimeter-wave transmission. The learner in the server produces a feature model from the received training data through the machine-learning training process discussed in Section 3.2.
Figure 3 illustrates the process flow of the proposed framework, which is split into the real-time and background flows. In this figure, the solid lines indicate the main flow of the process, while the broken lines indicate information updates.
In the real-time flow, sensor data are first acquired at each mobile device. The data are then prioritized at the mobile devices on the basis of the importance supplied by the background flow, and are transmitted in accordance with that priority. Here, we consider two cases: (1) the communication capacity can be estimated and (2) the communication capacity is not given. In the former case, an existing method for capacity estimation in communication networks would be used. A simple and classical approach is to measure the round-trip time, as suggested in prior works [18–20]. If the communication capacity can be estimated in advance, the data are limited before they are actually transmitted, so the total volume of transmitted data does not exceed the communication capacity. In the latter case, excess data are dropped by the channel access control protocol, as actually occurs in real systems such as wireless local area networks and cellular networks, so the total volume of delivered data again does not exceed the communication capacity. In both cases, in our framework, data with high importance are transmitted with high priority. At the server, data are extracted and used as input for prediction.
In the background flow, sensor data for training are acquired at each mobile device. As mentioned above, the way to prioritize and transmit data for training is outside the scope of this paper. Data used as input and output for training are extracted at the server. The ML model is updated using those data, and the updated model is used for prediction in the real-time flow. The importance of data is also updated, and it is used when data are prioritized for transmission in the real-time flow.
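The two capacity cases of the real-time flow can be illustrated with a minimal sketch. The functions `sender` and `channel`, the dictionary keys, and the tail-drop behavior of the channel are assumptions made here for illustration. They are a crude stand-in for a real channel access control protocol, not part of the framework's specification.

```python
def sender(data, estimated_capacity=None):
    """Return data in transmission order (highest importance first).

    Case (1): an estimate is given, so low-priority data are withheld
    at the sender before transmission.
    Case (2): no estimate is given, so everything is offered to the channel.
    """
    ordered = sorted(data, key=lambda d: d["importance"], reverse=True)
    if estimated_capacity is None:
        return ordered
    offered, used = [], 0
    for d in ordered:
        if used + d["size"] <= estimated_capacity:
            offered.append(d)
            used += d["size"]
    return offered


def channel(offered, true_capacity):
    """Hypothetical channel model: admit data in arrival order, drop the tail."""
    delivered, used = [], 0
    for d in offered:
        if used + d["size"] > true_capacity:
            break  # the channel is full; the remaining data are dropped
        delivered.append(d["name"])
        used += d["size"]
    return delivered


# Made-up data elements: name, data volume, importance score.
data = [
    {"name": "a", "size": 4, "importance": 0.9},
    {"name": "b", "size": 3, "importance": 0.5},
    {"name": "c", "size": 5, "importance": 0.7},
    {"name": "d", "size": 2, "importance": 0.1},
]

case1 = channel(sender(data, estimated_capacity=9), true_capacity=9)
case2 = channel(sender(data), true_capacity=9)
```

In this toy setting both cases deliver the same high-importance elements. The difference is only where the low-priority data are discarded: at the sender in case (1) and in the channel in case (2).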
Prediction and data assessment using machine learning
Figure 4 shows three example patterns of spatial information recorded on different days: d1, d2, and d3. Each value could represent the volume of any spatial information extracted from image sensor data, such as the volume of road traffic (the number of vehicles or pedestrians), for five geographical sections (sections A to E) in each time slot (1:01 to 1:05 pm). Each pattern consists of a set of input data and the expected result. The learner accumulates the recorded patterns. The predictor predicts the future result, which will actually be obtained at 1:00 pm, from the currently obtained input data if it finds the corresponding input data in the recorded patterns.
In the figure, the black elements are common to the three patterns, while white elements vary among the patterns. This means that the white elements are meaningful for prediction. In other words, the white elements are more important for prediction than the black ones.
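The intuition behind the black and white elements can be sketched with a few lines of code. The pattern values below are made up for illustration and are not taken from Figure 4, and the use of variance as a measure of variation is an assumption of this sketch rather than the importance measure used in the framework.

```python
from statistics import pvariance

# Hypothetical recorded patterns for days d1 to d3, one value per input element.
patterns = {
    "d1": [3, 7, 2, 5, 1],
    "d2": [3, 4, 2, 9, 1],
    "d3": [3, 6, 2, 8, 1],
}


def element_variability(patterns):
    """Population variance of each input element across the recorded patterns."""
    columns = zip(*patterns.values())
    return [pvariance(column) for column in columns]


variability = element_variability(patterns)

# Elements that never change ("black") carry no information for telling
# the patterns apart; elements that vary ("white") are candidates for it.
common = [i for i, v in enumerate(variability) if v == 0]
varying = [i for i, v in enumerate(variability) if v > 0]
```

Here elements 0, 2, and 4 are common to all three patterns, while elements 1 and 3 vary among them.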
However, there are two problems. First, exactly the same input is rarely found in the recorded patterns, so prediction must rely on similar previously recorded inputs. Second, elements rarely take exactly the same values across different records, so the importance of elements must be evaluated over records that are similar but not identical. Both problems are overcome by machine learning of data features.
By using a sufficient number of recorded patterns as training data, supervised learning using an NN or RF method produces generalized feature models that enable the system to perform prediction from the immediately acquired input even if the exact same input is not found in the recorded patterns. Moreover, machine learning enables the system to evaluate which elements are important for prediction. As we explained in Section 2.2, this capability is called feature selection and enables us to obtain the importance score of each data element.
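As a rough illustration of how an importance score can be obtained from a trained predictor, the sketch below uses permutation importance with a one-nearest-neighbor predictor. Both are swapped in here for brevity and are not the NN or RF feature selection used in the framework. The data, the function names, and the fixed cyclic shift used in place of a random shuffle are all assumptions of this sketch.

```python
def predict_1nn(train_X, train_y, x):
    """Return the recorded output of the most similar recorded input."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(row, x))
    best = min(range(len(train_X)), key=lambda i: sq_dist(train_X[i]))
    return train_y[best]


def mean_abs_error(train_X, train_y, X, y):
    errors = [abs(predict_1nn(train_X, train_y, x) - t) for x, t in zip(X, y)]
    return sum(errors) / len(errors)


def permutation_importance(train_X, train_y, X, y):
    """Score each element by the error increase when its column is permuted."""
    base = mean_abs_error(train_X, train_y, X, y)
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        shifted = column[1:] + column[:1]  # fixed permutation, for reproducibility
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, shifted)]
        scores.append(mean_abs_error(train_X, train_y, X_perm, y) - base)
    return scores


# Made-up records: element 0 is constant, element 1 determines the output,
# and element 2 varies but is unrelated to the output.
train_X = [[5, 10, 0], [5, 20, 1], [5, 30, 0], [5, 40, 1]]
train_y = [1, 2, 3, 4]

# For brevity the scores are evaluated on the training records themselves.
scores = permutation_importance(train_X, train_y, train_X, train_y)
```

In this toy case only element 1 receives a positive score. Element 2 varies across records but does not help prediction, which is why variation alone is not a sufficient measure of importance.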
Formulation of proposed framework
This section presents the key idea of our framework. The objective function and traffic-volume constraint of our framework can be formulated as
$$\max_{X(t)} A(X(t)), \tag{1}$$
$$\sum_{x \in X(t)} d_{x} \leq C(t), \tag{2}$$
where X(t) and A(X(t)) denote the set of input data received from mobile devices for prediction at time t and the accuracy of the prediction at time t achieved using X(t), respectively. In Eq. (2), d_x and C(t) denote the data volume of an input data element x and the capacity of the network at time t, respectively. Equation (2) is the constraint that the total volume of data transmitted by mobile devices must not exceed the capacity of the network. However, since the prediction system operates in real time, it is not feasible to search all possible sets X(t) for the optimal one. Therefore, in our framework, we convert Eq. (1) into
$$\max_{X(t)} \sum_{x \in X(t)} s_{x}, \tag{3}$$
where s_x denotes the importance score of input data element x obtained by using a feature selection method. The constraint in Eq. (2) still applies to Eq. (3). Equation (3) means that we need to maximize the total score of the input data used for prediction. Converting Eq. (1) into Eq. (3) is reasonable because, as mentioned above, feature selection methods give higher scores to input data that make larger contributions to prediction accuracy.
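For reference, Eqs. (2) and (3) can be written equivalently with binary selection variables, which makes the structure of the problem explicit. The indicator $z_x$ and the candidate set $\mathcal{X}(t)$ of all data elements available at time t are notation introduced here for illustration and do not appear in the formulation above.

$$\max_{z} \sum_{x \in \mathcal{X}(t)} s_{x} z_{x} \quad \text{subject to} \quad \sum_{x \in \mathcal{X}(t)} d_{x} z_{x} \leq C(t), \quad z_{x} \in \{0, 1\},$$

where $z_x = 1$ if element x is transmitted and $z_x = 0$ otherwise, so that $X(t) = \{x \in \mathcal{X}(t) : z_x = 1\}$.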
Finally, we mention how to solve Eq. (3). This problem can be regarded as a classical 0-1 knapsack problem [21]. A simple approach is to sort the elements x in descending order of s_x and then “greedily” pick as many x as possible from the top under the constraint of Eq. (2).
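The greedy approach can be sketched as follows. The function names, the tuple layout, and the example values are assumptions made for illustration. The sketch also assumes that an element that does not fit is skipped and the scan continues, which is one possible reading of picking as many elements as possible from the top. A small exhaustive search is included only to compare results on a toy instance.

```python
from itertools import combinations


def greedy_select(elements, capacity):
    """Pick elements in descending order of score s_x while Eq. (2) holds.

    elements: list of (name, score s_x, volume d_x) tuples.
    """
    chosen, used = [], 0
    for name, score, volume in sorted(elements, key=lambda e: e[1], reverse=True):
        if used + volume <= capacity:
            chosen.append(name)
            used += volume
    return chosen, used


def exhaustive_select(elements, capacity):
    """Check every subset; feasible only for very small inputs (2^n subsets)."""
    best, best_score = [], 0.0
    for r in range(len(elements) + 1):
        for subset in combinations(elements, r):
            if sum(e[2] for e in subset) <= capacity:
                score = sum(e[1] for e in subset)
                if score > best_score:
                    best, best_score = [e[0] for e in subset], score
    return best, best_score


# Made-up elements: (name, importance score, data volume).
elements = [("A", 0.9, 6), ("B", 0.6, 3), ("C", 0.5, 3), ("D", 0.1, 1)]

greedy_choice, greedy_used = greedy_select(elements, capacity=7)
optimal_choice, optimal_score = exhaustive_select(elements, capacity=7)
```

On this instance the greedy rule selects A and D (total score 1.0), whereas the exhaustive search finds B, C, and D (total score about 1.2). The greedy rule is therefore a fast heuristic rather than an exact solver, which suits real-time operation where checking all subsets is not feasible.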