Using vision-based object detection for link quality prediction in 5.6-GHz channel

Various smart connected devices, such as automated driving cars, autonomous robots, and remote-controlled construction vehicles, are emerging. These devices have vision systems so that they can operate without collision. Thanks to the great advances in deep learning technologies, machine vision is becoming more accessible as a means of perceiving self-position and/or the surrounding environment. The accurate perception information of these smart connected devices makes it possible to predict wireless link quality (LQ). This paper proposes an LQ prediction scheme that applies machine learning to HD camera output to forecast the influence of surrounding mobile objects on LQ. The proposed scheme utilizes object detection based on deep learning and learns the relationship between the detected object position information and the LQ. Outdoor experiments show that the proposed LQ prediction scheme can accurately predict the throughput around 1 s into the future in a 5.6-GHz wireless LAN channel.

strengthens [8]. To counter the wireless link quality (LQ) variations caused by the surrounding environment changes, LQ prediction plays an important role.
LQ prediction methods have been proposed to guarantee the quality of service (QoS) in wireless communication systems. Machine learning-based LQ prediction has been widely studied, as surveyed in [9]. Network performance prediction and related deep learning technologies for solving mobile networking problems are summarized in [10], and case studies and the use of machine learning for intelligent decision-making are described in [11]. The prediction of channel state information (CSI) was proposed in [10-14]. In [12], CSI is predicted from the position of the terminal, temperature, humidity, and weather. The prediction scheme in [13] enables CSI estimation with minimal pilot overhead. The CSI yielded by massive numbers of antennas was used to predict the channel statistical characteristics in millimeter wave (mmWave) environments in [14]. Xu et al. [15] showed that future network performance could be predicted by using appropriate metrics in major cellular networks. Wei et al. [16] focused on the transportation mode of the smartphone holder and predicted throughput by using the moving history. The LQ related to autonomous connected devices has been studied for connected vehicles [17] and unmanned aerial vehicles (UAVs) [18]. Yang et al. [17] proposed resource management to realize ultra-reliable and low-latency wireless communication for connected vehicles, and Almeida et al. [18] proposed a quality of service estimator using UAV base station positions and user traffic demand. Since these works do not consider the influence of surrounding objects, the LQ change caused by large mobile objects such as trucks cannot be predicted.
To consider surrounding mobile objects, LQ prediction based on cameras and sensors was studied in [8, 19-21]. RadMAC in [8] used radar to detect human obstacles and switched the beam pattern of a 60-GHz channel in a 4.65 m × 5.95 m room, and Wang et al. [19] proposed mmWave beam prediction using the receiver location and surrounding vehicles. Papers [20, 21] proposed machine learning-based LQ prediction using depth cameras in a 60-GHz channel, and the LQ degradation created by human body blocking was predicted for a transmitter and receiver pair separated by 4 m. Although the impact of mobile objects has been studied in EHF (millimeter wave) channels, mobile objects impact LQ even in microwave channels such as SHF. Since SHF has wider service areas than the EHF band, the vision system must recognize mobile objects in wider regions.
Object detection technologies have been advanced significantly thanks to deep neural networks [22,23]. The precision accuracy and inference speed of object detection have been improving every year, and many significant reports are now available. The typical object detection output is the object bounding-box with classification result. The object bounding-box indicates the position and size of a rectangular area delineating an object in the image. Classifications are derived from annotated images, and some detection algorithms also output confidence scores.
Autonomous connected devices have driven advances in object detection performance. Therefore, we present an LQ prediction scheme that uses advanced vision-based object detection methods. The LQ prediction proposal uses object bounding-boxes and their classification provided by advanced object detection algorithms. Experiments using HD cameras and wireless LAN systems are conducted in an outdoor environment, and LQ prediction performance is evaluated by using the relationship between the object bounding-box information and the throughput of SHF channels (5.6 GHz). The LQ prediction proposal consists of two-step machine learning: the first step is object detection using deep learning, and the second step uses random forest regression to predict the future LQ.
The key contribution of this paper is to introduce the two-step LQ prediction, which can take advantage of subsequent advances in object detection algorithms and yield explainable AI results by using the bounding-box information. It is confirmed that vision-based LQ prediction can accurately predict the throughput 1 s into the future in low-SHF channels (5.6 GHz) by using HD camera images captured in an outdoor environment in which various moving objects were present; the distance between the transmitter and receiver is 42 m.
The rest of this paper is organized as follows. Section 2 describes the target region of the vision-based LQ prediction and the system model. Section 3 details the LQ prediction process. Section 4 shows the outdoor experiment setup. Section 5 details the performance evaluation results and the discussions, and Section 6 concludes this paper.

System model
To satisfy the various requirements posed by wireless access services, LQ variation should be predicted and countered as needed. LQ prediction is one foundation technology of advanced wireless access management. The base station (BS) communicates with the terminal-connected devices by wireless access, such as IEEE 802.11, long-term evolution (LTE), and 5G. The LQ of the wireless access is determined by various factors, which are categorized into two parts: communication network and radio-wave propagation condition. The communication network factors include the transmission power, transmission beamforming, traffic, interference, modulation scheme, and error correction code. Their relationship with LQ has been well studied, and LQ can be improved by advanced signal processing. The other aspect is the radio-wave propagation condition between the transmitter and receiver. We separate the radio-wave propagation condition into three categories: surrounding mobile objects, static environment, and connected device status. The factors of static environment and connected device status such as position have been studied by papers [24,25]. The remaining factor, the influence of the surrounding mobile object, was the last piece to achieve advanced wireless access management based on the accurate LQ prediction.
The target radio frequency is important in considering the LQ prediction. Oguma et al. [20] proposed LQ prediction for millimeter wave communications by using depth cameras. However, LQ prediction for wireless systems in SHF is needed because the major wireless systems are operated in SHF. In SHF wireless systems, the service area is likely to be greater than that of millimeter wave wireless systems, and the influence of a mobile object varies widely depending on its size, movement, and type. The object type denotes the category of the object, such as car, truck, and person. Object detection algorithms that use the vision information obtained by cameras and sensors are promising for providing accurate information on the surrounding mobile objects. State-of-the-art object detection achieves accurate and real-time operation. In this paper, we used the leading object detection algorithms, M2Det [26] and YOLO v3 [27]; these algorithms can process each frame in less than 100 ms.
Since the movement of a physical object does not change in periods of the order of milliseconds, the vision-based object detection is expected to be used to predict LQ with lead time of around 1 s. Such a lead time allows negative changes in LQ to be countered effectively. The connected device switches to an expensive but more robust wireless link only when the predicted LQ degradation is excessive. Since Lauridsen et al. [28] showed that when transferring from the idle state to connected state in LTE networks the round trip time of ping packets can be several hundreds of milliseconds, long-term LQ prediction is attractive. In the other approach, the data-rate of the video streaming service is decreased to avoid fatal errors such as monitoring video freeze. Furthermore, this might, in combination with position information, yield enhanced movement control of the autonomous robot to optimize LQ.
In this paper, we consider that the wireless environments around connected devices are recognized using the images obtained by cameras and sensors, and that the LQ of wireless communication can be estimated using the recognized environmental information. Thus, we define our problem as accurately estimating the LQ about 1 s into the future by using past images taken by cameras. To simplify the scenario, we use a fixed transmitter and a fixed receiver with a dedicated wireless channel between them. In this system model, the wireless channel is disturbed by mobile objects, and the objects are found in the camera images. The target LQ at timing t is taken to be L[t]. The image features, Ω[t_0], are obtained from the images at the current timing t_0 and past timings. The relationship between the target LQ and the image features is denoted by introducing function f_I as follows:

$$L[t] = f_I(\Omega[t_0]) \qquad (1)$$

From the viewpoint of machine learning, our problem is to construct function f_I, given the training dataset (L[t], Ω[t_0]). Figure 1 shows the structure of the proposed LQ prediction. The proposal uses two-step machine learning: the first step realizes the object detection, and the second step predicts the LQ using the bounding-box information, which consists of the object category, position, and size in the processed image. The object detection block provides the bounding-box information of the surrounding objects by using vision information … [29] and the measured dataset of (L[t], Ω[t_0]), respectively. In this paper, random forest regression is used to implement the second machine learning block.

Fig. 1 Proposed two-step LQ prediction. The first object detection block uses the deep learning-based object detection algorithm, and the second LQ prediction block determines the future LQ from the bounding-box information
The benefits of the proposal are that it allows us to take advantage of subsequent advances in object detection algorithms and that we can understand what condition alters the LQ in the environment. The detection precision of object prediction using camera images is improving continually [23], and novel object detection schemes are expected to emerge in the future. Since the object bounding-box is used to measure object detection performance [23], it is expected that the future object detection algorithms will also provide the object bounding-box information and this approach allows us to benefit from future enhancements in object detection. LQ prediction must have explainable features to encourage the development of the technologies. By using the object bounding-box information, what condition impacts the LQ can be evaluated. Furthermore, we can separately consider the performance of the object detection block and the LQ prediction block.

LQ definition
This paper takes normalized throughput as the LQ to focus on the LQ variation caused by the surrounding mobile objects. The downlink throughput was measured using User Datagram Protocol (UDP) traffic with a full buffer condition. The throughput at timing t_i, R[t_i], was obtained every ΔT (t_i − t_{i−1} = ΔT) as the bit rate from t_{i−1} to t_i,

$$R[t_i] = \frac{1}{\Delta T} \sum_{t_{i-1} < t \le t_i} B[t],$$

where B[t] is the bit amount of the UDP packets successfully received at timing t. To focus on dynamic throughput changes, the normalized throughput, R̃[t_i], is defined as

$$\tilde{R}[t_i] = \frac{R[t_i]}{\mathrm{Median}\left(\{R[t_k]\}_{t_k \in T_A}\right)},$$

where Median() denotes the function that calculates the median value and T_A is the averaging time window (set to be greater than ΔT). Since the measured throughput contains extremely low values, albeit with very low probability, we adopt the median value instead of the average to alleviate the influence of outliers. R̃[t_i] (i > 0) is used as the LQ, L[t], in Eq. 1. To consider the LQ prediction performance corresponding to the difference between the target LQ timing and the current timing, the lead time, T_F, is defined as T_F = t_i − t_0. In this paper, the time interval for the normalized throughput,
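The normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample bit counts, and the choice of a trailing window of `window` samples for the median are all hypothetical, since the text does not specify how the window T_A is aligned.

```python
from statistics import median

def throughput(bits_per_interval, delta_t):
    """Bit rate R[t_i] for each measurement interval of length delta_t (seconds)."""
    return [b / delta_t for b in bits_per_interval]

def normalized_throughput(rates, window):
    """Divide each R[t_i] by the median of the last `window` samples.

    The trailing alignment of the window is an assumption of this sketch.
    """
    out = []
    for i, r in enumerate(rates):
        start = max(0, i - window + 1)
        ref = median(rates[start:i + 1])
        out.append(r / ref if ref > 0 else 0.0)
    return out

# Made-up bit counts per 0.5-s interval; the fourth interval is degraded.
bits = [10e6, 10e6, 10e6, 2e6, 10e6, 10e6]
rates = throughput(bits, delta_t=0.5)
norm = normalized_throughput(rates, window=5)
```

Because the median ignores the single degraded sample, the reference level stays at the undisturbed rate and the dip remains clearly visible in the normalized series.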

Object detection
The first machine learning block outputs detected object categories and object bounding-box information. Figure 2 shows the generation of object bounding-box information in the time domain. The object bounding-box acquisition timing is assumed to be asynchronous to the LQ acquisition timing. Thus, term t_{n,m} is defined as the m-th object bounding-box acquisition timing in the time window from t_n to t_{n+1}. The time interval of object bounding-box acquisition is denoted by Δτ. The object bounding-box information consists of object category, detection reliability, position, and size. Figure 3 shows an example of the object bounding-box data as gathered in the outdoor experiment. We can see that mobile objects are detected by rectangular bounding-boxes. The positions and sizes were obtained as

$$\{X_{\chi,j}[t_{n,m}],\ Z_{\chi,j}[t_{n,m}],\ W_{\chi,j}[t_{n,m}],\ H_{\chi,j}[t_{n,m}]\},$$

where the term χ denotes the object class defined in the MS COCO dataset, the term j is the object serial number within the same object class, and X_{χ,j}[t_{n,m}], Z_{χ,j}[t_{n,m}], W_{χ,j}[t_{n,m}], and H_{χ,j}[t_{n,m}] are, respectively, the x-axis position, z-axis position, width, and height of the j-th object of the object class χ at the timing t_{n,m}. In this paper, object class χ consists of "car," "bus (truck)," and "person." Since only two objects of class "truck" were observed in the experiments, "truck" was merged into the "bus" class. The object class "all," which contains the "car," "bus," "truck," and "person" object classes, is defined to evaluate the effectiveness of object detection. The performance of the LQ prediction using the bounding-box information of the object class "all" corresponds to using bounding-box information without the object categories.
Since the object detection block provides several object bounding-boxes for the same object, the overlapped objects are deleted by using the Intersection over Union (IoU), which is given by

$$\mathrm{IoU} = \frac{A_{\mathrm{intersection}}}{A_{\mathrm{union}}} \qquad (5)$$

where A_intersection and A_union are the overlapping area and total area of the bounding-boxes, respectively. To track the same object over consecutive timing intervals, the IoU values for the past 2 frames were calculated, and objects whose IoU is greater than 0.6 are recognized as the same object. The more recent object recognized as the same object is assigned the same serial number as the earlier object. Since all combinations of the bounding-boxes are checked by Eq. 5, it is not possible for the same object to belong to several categories.
Since the object bounding-box information includes a reliability score that ranges from 0 to 1.0, we chose the objects whose reliability scores are greater than threshold S_recog. Increasing threshold S_recog reduces the number of detected objects. Although higher thresholds prevent misrecognition, the object detection can be delayed. The number of objects of class χ is defined as N_χ, which depends on the object detection algorithm and threshold S_recog.
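A minimal sketch of the IoU computation and its two uses (removing overlapped boxes and carrying serial numbers across frames) is given below. Several details are assumptions not stated in the text: boxes are taken to be (x_min, z_min, width, height) tuples, the highest-score box is the one kept among overlapping boxes, the 0.6 threshold is reused for duplicate removal, and tracking looks at only one previous frame rather than two.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x_min, z_min, width, height)."""
    ax, az, aw, ah = a
    bx, bz, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(az + ah, bz + bh) - max(az, bz))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def drop_duplicates(boxes, scores, thr=0.6):
    """Return indices of boxes kept after removing overlaps (assumed rule: keep highest score)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thr for k in kept):
            kept.append(i)
    return sorted(kept)

def assign_ids(prev, current, next_id, thr=0.6):
    """Reuse the serial number of a previous box with IoU > thr, else issue a new one.

    prev maps serial number -> box from the previous frame.
    """
    ids = {}
    for box in current:
        match = next((s for s, p in prev.items()
                      if s not in ids and iou(box, p) > thr), None)
        if match is None:
            match, next_id = next_id, next_id + 1
        ids[match] = box
    return ids, next_id

boxes = [(0.0, 0.0, 2.0, 2.0), (0.1, 0.0, 2.0, 2.0), (5.0, 5.0, 1.0, 1.0)]
kept = drop_duplicates(boxes, scores=[0.9, 0.8, 0.7])
tracked, next_serial = assign_ids({0: (0.0, 0.0, 2.0, 2.0)},
                                  [(0.1, 0.0, 2.0, 2.0), (5.0, 5.0, 1.0, 1.0)],
                                  next_id=1)
```

In this toy example the second box overlaps the first almost completely and is dropped, while the slightly shifted box in the next frame inherits serial number 0.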
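The effect of the threshold on the number of detected objects can be illustrated with a short sketch. The detection records, their field names (`cls`, `score`), and the scores are made up for illustration; a real detector would supply its own output format.

```python
from collections import Counter

def filter_by_score(detections, s_recog):
    """Keep detections whose reliability score exceeds the threshold."""
    return [d for d in detections if d["score"] > s_recog]

def count_per_class(detections):
    """Number of detected objects per object class."""
    return Counter(d["cls"] for d in detections)

# Hypothetical detector output for one frame.
frame = [
    {"cls": "car", "score": 0.92},
    {"cls": "car", "score": 0.35},
    {"cls": "person", "score": 0.61},
    {"cls": "person", "score": 0.12},
]
loose = count_per_class(filter_by_score(frame, 0.1))
strict = count_per_class(filter_by_score(frame, 0.5))
```

Raising the threshold from 0.1 to 0.5 halves the count in this example, which mirrors the trade-off between misrecognition and delayed detection described above.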

Object bounding-box information as input features
Since the throughputs are obtained every ΔT, the object bounding-box information is translated for use as input features for the LQ prediction block. Since the time interval of the object bounding-box acquisition, Δτ, is shorter than that of the LQ acquisition, the bounding-box features, Φ_{χ,j}[t_n], are calculated as the median values of the positions and sizes of the object bounding-boxes acquired between the LQ acquisition timings t_{n−1} and t_n:

$$\Phi_{\chi,j}[t_n] = \{X_{\chi,j}[t_n],\ Z_{\chi,j}[t_n],\ W_{\chi,j}[t_n],\ H_{\chi,j}[t_n]\} = \{\mathrm{Median}_m(X_{\chi,j}[t_{n-1,m}]),\ \ldots,\ \mathrm{Median}_m(H_{\chi,j}[t_{n-1,m}])\}$$

$$\Delta\Phi_{\chi,j}[t_n] = \{\Delta X_{\chi,j}[t_n],\ \Delta Z_{\chi,j}[t_n],\ \Delta W_{\chi,j}[t_n],\ \Delta H_{\chi,j}[t_n]\}$$

where ΔΦ_{χ,j}[t_n] is the delta value of the object bounding-box information between t_{n−1} and t_n, and the timings t_{n−1,α} and t_{n−1,β} are the first and last acquisition timings of the bounding-box information in the time region t_{n−1} < t ≤ t_n. ΔΦ_{χ,j} can be obtained when there are at least two pieces of object bounding-box information between t_{n−1} and t_n, and a further delta value, Δ′Φ_{χ,j}, requires the object bounding-box information in the previous time slot t_{n−1}. In Fig. 2
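The per-slot feature extraction can be sketched as follows. The function name and the sample values are hypothetical, and computing the delta as the last sample minus the first sample in the slot is an assumption of this sketch, based on the description of the first and last acquisition timings.

```python
from statistics import median

def slot_features(samples):
    """Features of one tracked object for one LQ slot.

    samples: time-ordered (x, z, w, h) boxes acquired within t_{n-1} < t <= t_n.
    Returns (phi, delta_phi); delta_phi is None when fewer than two samples exist.
    """
    phi = tuple(median(s[k] for s in samples) for k in range(4))
    delta = None
    if len(samples) >= 2:
        first, last = samples[0], samples[-1]
        # Assumed definition: change between first and last acquisition in the slot.
        delta = tuple(last[k] - first[k] for k in range(4))
    return phi, delta

# Five frames (10 FPS over a 0.5-s slot) of an object moving along the x-axis.
samples = [(0.0, 0.5, 0.2, 0.3), (0.1, 0.5, 0.2, 0.3), (0.2, 0.5, 0.2, 0.3),
           (0.3, 0.5, 0.2, 0.3), (0.4, 0.5, 0.2, 0.3)]
phi, delta_phi = slot_features(samples)
single_phi, single_delta = slot_features(samples[:1])
```

With 10 frames per second and ΔT = 0.5 s, each slot holds about five boxes per object, so the median is robust to a single misplaced box.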

LQ prediction block
The second machine learning block predicts the future LQ by using the object bounding-box information. In this paper, random forest regression with 500 decision trees is used to evaluate the LQ prediction performance. The input features for the LQ prediction block, Ω[t_0], were chosen from the bounding-box features Φ_{χ,j}[t_n], ΔΦ_{χ,j}[t_n], and Δ′Φ_{χ,j}[t_n], where n ≤ 0. The LQ model f_I in Eq. 1 is pre-trained by using the dataset of (R̃[t_i], Ω[t_0]), and the future normalized throughput is given by

$$\hat{R}[t_i] = f_I(\Omega[t_0]).$$

In random forest regression, the output of function f_I is obtained as the average of the outputs of the 500 decision trees. The prediction error, E[t_i], is defined as

$$E[t_i] = \left|\hat{R}[t_i] - \tilde{R}[t_i]\right|,$$

where |A| denotes the absolute value of A. For performance comparison, the prediction performance of the LQ prediction using past LQ features, Θ[t_0], which are chosen from the past normalized throughput information R̃[t_n] (n ≤ 0), is also evaluated. The relationship between the target normalized throughput and the past normalized throughputs is pre-trained as function f_L by using the dataset (R̃[t_i], Θ[t_0]), while the relationship between the target normalized throughput and both bounding-box information and past normalized throughputs is also pre-trained.
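The averaging step and the error metric can be illustrated without a machine learning library. In the sketch below the three "trees" are hand-written stand-ins, not trained decision trees, and the feature names `x` and `w` are hypothetical; a practical implementation would fit 500 trees on the measured dataset with a random forest library.

```python
def forest_predict(trees, features):
    """Random-forest style output: the average of the per-tree outputs."""
    return sum(tree(features) for tree in trees) / len(trees)

def abs_error(predicted, measured):
    """Prediction error E[t_i] as the absolute difference."""
    return abs(predicted - measured)

# Toy stand-ins for trained trees (made-up rules and values).
trees = [
    lambda f: 0.5 if -0.15 <= f["x"] <= 0.35 else 1.0,
    lambda f: 0.6 if f["w"] > 0.3 else 1.0,
    lambda f: 1.0,
]

pred = forest_predict(trees, {"x": 0.1, "w": 0.4})
err = abs_error(pred, measured=0.75)
```

Averaging many trees smooths the piecewise-constant output of each individual tree, which is why the predicted throughput can take intermediate values between the levels seen in training.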

Experiment and dataset
The experiments evaluated the LQ prediction performance in an actual outdoor environment. The major parameters are shown in Table 1 [30]. No interference signals were observed in this environment. The bandwidth was set to 20 MHz. The normalized throughput, R̃[t_n], was measured every 0.5 s (ΔT = 0.5 s), and the time interval for image acquisition, Δτ, was set to 0.1 s, which corresponds to 10 frames per second (FPS). Figure 4 shows a photo of the connected device. The environment surrounding the connected device was captured by 2 HD cameras with fisheye lenses, providing a 360° view. The cameras and the laptop PC with the LQ measurement function were set at heights of 1.2 m and 0.4 m, respectively. A map of the experiment environment is shown in Fig. 5. A road and sidewalk lay between the connected device and the base station, which were separated by 42 m, and vehicles and pedestrians passed through the area. Figure 6 shows examples of the captured images and the coordinate system. Objects were detected by the object detection block, and the object bounding-box information was used in the LQ prediction block. The ranges of the x-axis and z-axis were set to −1 to 1 and 0 to 1, respectively. To evaluate LQ prediction performance in the presence of surrounding mobile objects, we defined an object transit event. In this paper, the output of the object detection block was used as the dataset to evaluate the LQ prediction performance in the LQ prediction block. The object transit event is the timing at which some object was detected in the transit window with x-axis boundaries of −0.15 to 0.35 (see Fig. 6). The dataset for the LQ prediction block was generated at the transit event timing, which is the period from 5 s before the transit event to 5 s after the transit event for all objects: "car," "bus (truck)," and "person." The vehicle event and person event correspond to the transit event timing of vehicle-related objects ("car" and "bus (truck)") and people ("person"), respectively.
By using the dataset, we generated the LQ model and evaluated the LQ prediction performance.
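The selection of dataset periods around transit events can be sketched as follows. The function names and the detection samples are hypothetical, and merging overlapping ±5-s periods into one interval is an assumption of this sketch; the transit window boundaries and the 5-s margin are the values given above.

```python
def transit_event_times(detections, x_min=-0.15, x_max=0.35):
    """Times at which a detected object lies inside the transit window.

    detections: list of (time_s, x_position) pairs.
    """
    return [t for t, x in detections if x_min <= x <= x_max]

def dataset_intervals(event_times, margin=5.0):
    """Periods from `margin` s before to `margin` s after each event, merged when they overlap."""
    intervals = []
    for t in sorted(event_times):
        lo, hi = t - margin, t + margin
        if intervals and lo <= intervals[-1][1]:
            intervals[-1][1] = max(intervals[-1][1], hi)
        else:
            intervals.append([lo, hi])
    return [tuple(i) for i in intervals]

# Made-up detections: one object crossing around t = 10-11 s, another at t = 40 s.
detections = [(10.0, -0.5), (10.5, 0.0), (11.0, 0.3), (11.5, 0.6), (40.0, 0.1)]
events = transit_event_times(detections)
periods = dataset_intervals(events)
```

The two crossings yield two separate periods, and the consecutive in-window detections of the first object collapse into a single period.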

Dataset and LQ prediction performance evaluation
The dataset for the LQ prediction block totaled 3490 s of data, containing transit events of 288 cars, 20 buses/trucks, and 36 persons. The vehicle-event data totaled 3061.5 s, while the person-event data totaled 976 s. The dataset includes 547.5 s of data corresponding to both a vehicle event and a person event. LQ prediction performance was evaluated by k-fold cross-validation. The dataset was divided into 10 parts, and nine tenths were used for training to generate the LQ model. LQ prediction was conducted using the remaining one-tenth of the dataset.
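The dataset bookkeeping and the 10-fold split can be checked with a small sketch. Splitting the samples into contiguous folds is an assumption here (the text does not say whether samples were shuffled), and the function name is hypothetical; the durations and ΔT = 0.5 s are taken from the text.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# Vehicle-event and person-event durations overlap by 547.5 s.
vehicle_s, person_s, both_s = 3061.5, 976.0, 547.5
total_s = vehicle_s + person_s - both_s

delta_t = 0.5                        # LQ sampling interval in seconds
n_samples = int(total_s / delta_t)   # number of LQ samples in the dataset
folds = list(k_fold_indices(n_samples, k=10))
train_idx, test_idx = folds[0]
```

Each test fold then covers 349 s of data and each training set 3141 s, and every sample is used for testing exactly once across the ten folds.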

Measured normalized throughput
The cumulative distribution functions of the normalized throughput, R̃[t_i], for all-transit-event timing, vehicle-event timing, and person-event timing are shown in Fig. 7. We can see that the distribution is far from Gaussian; this is considered to be due to the moving objects. The probabilities of the normalized throughput falling below 0.8 were 0.113, 0.105, and 0.280 for the all event, vehicle event, and person event, respectively. Because persons move slowly, they affect the normalized throughput for longer than vehicles do, so person events have a higher probability of low normalized throughput.

Object detection and object bounding-box
M2Det in 2019 [22] and YOLO v3 in 2018 [23], which are used as the object detection block in Fig. 1, are state-of-the-art detectors based on deep neural networks. The image processing time and average precision of M2Det are stated to be 30 ms and 37.6 in [22], while those of YOLO v3 are 51 ms and 33.0 in [23]. The average precision of object detection denotes the detection performance, and M2Det has better performance than YOLO v3. Both object detectors output object bounding-boxes, their categories, and reliability scores. We used the bounding-box information whose reliability score is greater than the threshold value, S_recog. Thus, the number of detected objects depends on S_recog. Table 2 shows the maximum number of objects detected by M2Det [26] and YOLO v3 [27] for several threshold values, S_recog. We checked the actual maximum number of objects by watching the video; the maximum numbers of "car," "bus (truck)," and "person" were 2, 1, and 4, respectively. Therefore, a larger number of detected objects means that the object detection block generated unnecessary bounding-boxes. We can see that the object number increases as the threshold is lowered. We can also see that the error of the detected number becomes 1 or 0 when the threshold S_recog is 0.5 and 0.8, respectively, for M2Det and YOLO v3.

Fig. 6 The view from the connected device and the x-axis and z-axis definition

Feature importance evaluation
Since the LQ prediction block uses the object bounding-box information to predict the LQ, the relationship between the elements of Φ … This means that the instantaneous information is needed to accurately predict the near-future condition, and the accuracy of the delta values is more important for predictions with greater lead times. Figure 9 plots the feature importance for the target normalized throughputs with T_F of 1.0 s when the vehicle event and person event were picked up from the dataset. The feature importance of the bounding-box features in the vehicle event was similar to that of the all-event dataset, and the x-axis information is the most important. In the person event, the height information H_{χ,j} is more important than the x-axis position, X_{χ,j}. This is because the width information of people is unstable compared to that of vehicles. People with their arms spread wide are detected as wide structures, and the x-axis position can be biased.

Input feature set for LQ prediction block
The input feature sets for the LQ prediction block were generated by

$$\Omega_{BB}[t_0] = \{\Phi_{\chi,j}[t_0]\}, \quad \chi \in \{\text{"car"}, \text{"bus (truck)"}, \text{"person"}\},\ 1 \le j \le N_\chi; \qquad (9)$$

$$\Omega_{BBA}[t_0] = \{\Phi_{\text{all},j}[t_0]\}, \quad 1 \le j \le N_{\text{all}}; \qquad (10)$$

$$\Omega_{BV}[t_0] = \{\Phi_{\chi,j}[t_0],\ \Delta X_{\chi,j}[t_0],\ \Delta H_{\chi,j}[t_0],\ \Delta' X_{\chi,j}[t_0],\ \Delta' H_{\chi,j}[t_0]\}, \quad \chi \in \{\text{"car"}, \text{"bus (truck)"}, \text{"person"}\},\ 1 \le j \le N_\chi. \qquad (11)$$

Ω_BB[t_0] is the basic bounding-box information set of the objects "car," "bus (truck)," and "person." The feature number is given by 4 × (N_car + N_bus + N_person). Ω_BBA[t_0] is the bounding-box information with the single object class "all." The feature number is 4 × N_all. Ω_BV[t_0] is the advanced bounding-box information that contains the delta values ΔX_{χ,j}, ΔH_{χ,j}, Δ′X_{χ,j}, and Δ′H_{χ,j}, where χ ∈ {"car", "bus (truck)", "person"}. ΔZ_{χ,j}, ΔW_{χ,j}, Δ′Z_{χ,j}, and Δ′W_{χ,j} were not used because of their low feature importance. As the conventional LQ prediction approach, the use of past LQ information [15] was also evaluated. The input feature set of current and past LQ information is given by

$$\Theta_{TH}[t_0] = \{\tilde{R}[t_0],\ \tilde{R}[t_{-1}],\ \ldots,\ \tilde{R}[t_{-5}]\},$$

where R̃[t_{−5}] corresponds to the normalized throughput 2.5 s in the past. The effectiveness of using past features is discussed in Section 5.3. Furthermore, the input feature set for the combination of the object bounding-box and past LQ was also generated as

$$\Omega_{BVT}[t_0] = \{\Omega_{BV}[t_0],\ \Theta_{TH}[t_0]\}.$$
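The flattening of per-object bounding-box features into a fixed-length input vector can be sketched as below. The function name, the padding of absent objects with a fill value, and the example N_χ values (2, 1, and 4, borrowed from the maximum object counts observed in the video) are assumptions of this sketch; the text does not state how missing objects are encoded.

```python
CLASSES = ("car", "bus (truck)", "person")

def build_omega_bb(boxes, n_max, fill=0.0):
    """Flatten per-class (x, z, w, h) features into one fixed-length vector.

    boxes: dict mapping class name -> list of (x, z, w, h) for the current slot.
    n_max: dict mapping class name -> N_chi, the number of slots per class.
    Absent objects are padded with `fill` (an assumed encoding).
    """
    vec = []
    for cls in CLASSES:
        objs = boxes.get(cls, [])[: n_max[cls]]
        for j in range(n_max[cls]):
            vec.extend(objs[j] if j < len(objs) else (fill,) * 4)
    return vec

n_max = {"car": 2, "bus (truck)": 1, "person": 4}
omega_bb = build_omega_bb({"car": [(0.1, 0.5, 0.2, 0.1)]}, n_max)
```

The vector length equals 4 × (N_car + N_bus + N_person), so it stays constant regardless of how many objects are visible in a given slot, which is what a tabular regressor such as a random forest requires.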

Calculation complexity in LQ prediction block
The computation time of the LQ prediction block was evaluated by using the all-event dataset of 3490 s. The training dataset (R̃[t_i], Ω_BB[t_0]), which consists of 3141 s (nine tenths of all the data), is used to construct function f_I, and the predicted throughput R̂[t_i] is calculated by using the remaining 349 s of data (one-tenth of all the data). It takes 0.51 s to provide the predicted normalized throughput R̂[t_i] for the 349 s of data by using the LQ model. One second of bounding-box information can be processed in 1.4 ms by random forest regression. This shows that the dominant computation load of the two-step LQ prediction proposal is the object detection block. Regarding the training computation load for the LQ model, it takes 218 s to generate function f_I by using the training dataset.

LQ prediction in time domain
The normalized throughput was predicted by using the input feature set, Ω_BV[t_0], provided by M2Det with S_recog of 0.5 and T_F of 1.0 s. Figure 10 shows the measured and predicted throughput in the 5.6-GHz channel for the 500-s dataset; the red solid line and black dashed line correspond to the predicted throughput and actual throughput, respectively. If the prediction is perfect, the lines overlap. The timing of the objects being present in the transit window shown in Fig. 6 is indicated as horizontal stripes. The yellow and blue stripes indicate the transit events of vehicles and persons; they were detected by M2Det with S_recog = 0.3. The threshold setting and dependencies on the object detection algorithms are discussed in Section 5.5. Figure 10 shows that the throughput degradation of the 5.6-GHz channel was predicted by using the bounding-box information from 1 s earlier. In particular, the vehicle-related throughput degradation was more clearly predicted than that caused by people. This is because vehicle movement was stable over time, while the people walking around the terminal changed movement speed and body position more freely. Figure 11a and b show the CDFs of the prediction errors for the probability ranges from 0 to 1.0 and from 0.8 to 1.0, respectively. The distribution from 0 to 80% of the CDF mainly corresponds to the timing when the moving object does not affect the LQ, while that from 80 to 100% corresponds to the timing when the moving object does impact the LQ. The horizontal distribution at the 99.99% value of the CDF denotes the LQ change that cannot be predicted by using the proposed LQ prediction. Since Fig. 7 shows that about 20% of the normalized throughputs of the all-event dataset are degraded by the mobile object, the highest 20% of the prediction error (80 to 100%) is considered to correspond to the LQ degradation caused by the mobile object. Thus, we focus on the 90% value of the CDF of the prediction error as the middle value between 80 and 100%.
Figure 11a shows that prediction using Ω_BVT performs best; its median error is 49.3% less than that of Θ_TH. Figure 11b shows that the 90% values of Ω_BVT and Ω_BV were 31.1% and 31.9% less than that of Θ_TH, respectively. The prediction performance using Ω_BV was slightly better than that using Ω_BVT at the 90% outage value. This indicates that past LQ information did not contribute to the prediction performance for the 5.6-GHz channel with T_F = 1.0 s when Ω_BV was available. The 90% outage values of Ω_BB and Ω_BBA were 19.5% and 13.3% less than that of Θ_TH, respectively. Thus, the object classification yields an improvement of 6.2 percentage points at the 90% value (Ω_BB over Ω_BBA), and the delta value information yields a further 12.4 percentage points (Ω_BV over Ω_BB). Figure 11c and d show the CDFs of the prediction errors corresponding to the vehicle event and person event, respectively, for the range from 0.8 to 1.0. In the vehicle event (c), the 90% outage values for Ω_BVT and Ω_BV were 31.6% and 31.4%, respectively, less than that for Θ_TH. In the person event (d), the 90% outage values for Ω_BVT and Ω_BV were 16.2% and 20.3%, respectively, less than that for Θ_TH. This suggests that LQ prediction is more effective for vehicle transit events than for person transit events. In the person event, the accuracy of the LQ prediction using Ω_BB and Ω_BBA was lower than that of Θ_TH-based LQ prediction, and the prediction error using Ω_BB was greater than that using Ω_BBA. This is caused by the shortage of training data for person transit events. Considering all objects as a single category in Ω_BBA yielded more efficient training of the LQ prediction model in the second machine learning block for our datasets.
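The 90% value of the error CDF and the relative reduction against a baseline, which are the quantities compared above, can be computed as in the following sketch. The function names are hypothetical, the percentile uses a simple nearest-rank rule (an assumption; the text does not state the interpolation rule), and the sample errors are made up.

```python
def percentile_value(errors, percent=90):
    """Smallest error e such that at least `percent` % of the samples are <= e."""
    ordered = sorted(errors)
    rank = -(-len(ordered) * percent // 100)   # ceil(n * percent / 100) with integers
    return ordered[max(rank, 1) - 1]

def relative_reduction(value, baseline):
    """Reduction of `value` relative to `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline

# Made-up absolute prediction errors for two feature sets.
baseline_errors = list(range(1, 101))          # stand-in for the past-LQ baseline
proposed_errors = [e / 2 for e in baseline_errors]

p90_base = percentile_value(baseline_errors)
p90_prop = percentile_value(proposed_errors)
gain = relative_reduction(p90_prop, p90_base)
```

Comparing the 90% values rather than the mean focuses the evaluation on the periods in which a mobile object actually degrades the link.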

Past information use
LQ prediction performance with past information was evaluated for the lead time T_F of 1.0 s by using M2Det with S_recog of 0.3. In this evaluation, the input feature set from t_0 − T_past to t_0 was used. Thus, the lines of Ω_BV and Θ_TH denote LQ prediction … Figure 12 shows that LQ prediction using past LQ information improved the prediction performance, while the past bounding-box information yielded no improvement. This shows that the latest object bounding-box information is the most important feature in predicting LQ.

Lead time dependency
The prediction performance against lead time T_F was evaluated by using M2Det with S_recog of 0.3. Figure 13a, b, and c plot the 90% values of LQ prediction with Θ_TH, Ω_BV, and Ω_BVT versus T_F for all-transit events, vehicle events, and person events, respectively. Although the prediction error with Ω_BV increases as T_F becomes large, the rate of degradation in LQ prediction performance with Ω_BV and Ω_BVT was gentle and the … We can see that old LQ information was less effective for lead times greater than 1 s, since there was no advantage to using Ω_BVT rather than Ω_BV.

Impact of detection threshold and algorithm
The proposed LQ prediction adopts the two-step machine learning, and each machine learning block must prepare its own model. Since the pre-trained model has much higher training cost than the prediction phase, the dependencies on the object detection model and detection threshold setting must be evaluated. If the relationship between the bounding-box information and the LQ depends on the detection algorithm and threshold setting, the second LQ prediction block must prepare all combinations to cover all possible object detection algorithms and threshold settings. The resulting computation load for training will be significant. On the other hand, if the dependency on object detection algorithms and threshold setting is not critical, the prediction model of the second machine learning block can be common and it is also expected that the second machine learning block can be developed independently by using the bounding-box information. Furthermore, it is also important to confirm that better object detection algorithms will enhance the LQ prediction performance. Therefore, the impact of the object detection algorithm used and threshold S recog was evaluated to confirm that the proposal can take advantage of advances in object detection algorithms.
The bounding-box information was generated by using M2Det and YOLO v3 with S recog values of 0.1, 0.3, 0.5, and 0.8, giving eight all-event datasets (two algorithms × four thresholds) for the input feature set Ω BV. Each of the eight all-event datasets was divided into 10 sub-datasets for k-fold cross-validation (k = 10) and duplicated to give 10 test sub-datasets and 10 training sub-datasets. Then, 9 of the training sub-datasets were used as the training data to build the LQ prediction model, and the normalized throughput was predicted for the remaining test sub-dataset, which corresponds to different timing than the training data. Table 3 shows the 90% outage value of the prediction error for all combinations of object detection algorithm and threshold used for training and test. Since the 90% absolute error of LQ prediction with old LQ information was 0.183, some of the combinations were worse than that with Θ TH. We can see that M2Det with S recog of 0.8 for training and M2Det with S recog of 0.1 for test provided poor prediction performance, and the same algorithm combination with S recog of 0.3 and 0.5 provided the best performance. Although almost all combinations outperformed LQ prediction with the old LQ information, using the same algorithm for training and test yielded better performance than using different algorithms for training and test. Since M2Det has better detection accuracy than YOLO v3, the best 90% error of M2Det (S recog of 0.3) is better than that of YOLO v3 (S recog of 0.5). This shows that the LQ prediction scheme can take advantage of advances in object detection algorithms if the LQ prediction model is updated by using the advanced object detection algorithms.
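The cross-combination evaluation above can be sketched as follows. This is only an outline under stated assumptions: all datasets are assumed to be time-aligned and of equal length, folds are contiguous in time, and the 90% value uses the nearest-rank percentile. The names `cross_evaluate`, `fold_indices`, and `fit_mean` are hypothetical, and `fit_mean` is a trivial stand-in for the LQ prediction model, not the model used in the paper.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]   # (features, normalized throughput)
Predictor = Callable[[Sequence[float]], float]
Fit = Callable[[List[Sample]], Predictor]


def fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    bounds = [i * n // k for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]


def percentile_90(errors: Sequence[float]) -> float:
    """Nearest-rank 90th percentile (assumed definition of the 90% value)."""
    ordered = sorted(errors)
    rank = -(-90 * len(ordered) // 100)  # integer ceiling, 1-based rank
    return ordered[rank - 1]


def cross_evaluate(datasets: Dict[str, List[Sample]], fit: Fit,
                   k: int = 10) -> Dict[Tuple[str, str], float]:
    """Return the 90% absolute error for every (train, test) dataset pair."""
    n = len(next(iter(datasets.values())))
    if any(len(d) != n for d in datasets.values()):
        raise ValueError("datasets must be time-aligned and of equal length")
    folds = fold_indices(n, k)
    table: Dict[Tuple[str, str], float] = {}
    for train_key, test_key in product(datasets, repeat=2):
        errors: List[float] = []
        for held_out in folds:
            held = set(held_out)
            # Train on the other k-1 folds of the training dataset.
            train = [s for i, s in enumerate(datasets[train_key])
                     if i not in held]
            predict = fit(train)
            # Test on the held-out fold of the test dataset.
            for i in held_out:
                features, target = datasets[test_key][i]
                errors.append(abs(predict(features) - target))
        table[(train_key, test_key)] = percentile_90(errors)
    return table


def fit_mean(train: List[Sample]) -> Predictor:
    """Placeholder model: always predicts the mean training throughput."""
    mean = sum(target for _, target in train) / len(train)
    return lambda features: mean
```

Because the held-out fold always covers time indices excluded from training, each test uses data from different timing than the training data, even when the training and test datasets come from different detectors or thresholds.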
The averaged values of each column and row are also listed in Table 3. The averaged prediction error for the training sub-datasets decreases as S recog is set to a lower value. On the other hand, the detection accuracy becomes more important for the prediction phase. The averaged prediction error for the test sub-datasets shows that S recog of 0.5 yielded the lowest error for both M2Det and YOLO v3. Therefore, when training the LQ prediction model, many mobile objects should be used even if the number of misrecognitions increases.
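The per-configuration averages can be sketched as below, assuming the error table is stored as a mapping from (training configuration, test configuration) to the 90% error. The function name `marginal_averages` is hypothetical, and which of the two averages appears as a row or a column in Table 3 is not assumed here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def marginal_averages(
    table: Dict[Tuple[str, str], float],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Average the errors per training and per test configuration.

    Returns (average by training configuration,
             average by test configuration).
    """
    by_train: Dict[str, List[float]] = defaultdict(list)
    by_test: Dict[str, List[float]] = defaultdict(list)
    for (train_key, test_key), error in table.items():
        by_train[train_key].append(error)
        by_test[test_key].append(error)

    def mean(values: List[float]) -> float:
        return sum(values) / len(values)

    return ({key: mean(vals) for key, vals in by_train.items()},
            {key: mean(vals) for key, vals in by_test.items()})
```

Comparing the two returned mappings separates the effect of the threshold chosen for training from the effect of the threshold chosen for the prediction phase.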

Conclusion
This paper presented a wireless link quality prediction scheme that uses two-step machine learning; the first machine learning block realizes object detection, while the second block predicts the future LQ using bounding-box information. Although the structure is simple, the proposed LQ prediction can well predict throughputs with lead times of more than 1 s. Proof-of-concept experiments were conducted in 5.6-GHz WLAN channels, and the relationship between the type of passing object and its impact on measured throughput was shown. Performance evaluation in the 5.6-GHz channel clarified the dependency on the future time, the input feature sets, and the advantages compared to LQ prediction based on the past throughput information. By using the object bounding-box information, the 90% values of the absolute prediction error in the proposed LQ prediction were 31.1% less than those of the LQ prediction using past LQ information. By using the LQ prediction, the connected device side can recognize the surrounding environment precisely. So far, wireless management techniques have been developed on the network side, as in LTE and 5G, because the network side can monitor the data traffic and obtain various types of information. The network side has much more abundant information than the terminal side, while the connected device wins in terms of freshness of obtained information. The vision of the smart connected devices is expected to be one of the keys for raising the next-generation wireless systems to a whole new level of service.

Table 3 The 90% value of absolute error with Ω BV using different object detection algorithms and thresholds