Feature extraction and analysis for device-free human activity recognition based on channel state information in B5G wireless communications

Feature extraction and analysis for human activity recognition (HAR) have been studied for decades and remain important in the 5th generation (5G) and beyond the 5th generation (B5G) era. Nowadays, with the extensive use of unmanned aerial vehicles (UAVs) in the civil field, integrating wireless signal receivers on UAVs could be a more convenient way to receive detectable signals. In recent years, CSI-based HAR systems that use WiFi radar have received widespread attention due to their low cost and privacy protection. However, existing CSI-based HAR systems have two disadvantages: (1) the detection threshold is set manually, which limits adaptability and instantaneity in different wireless environments; (2) a single classifier is used to complete the recognition, resulting in poor robustness and relatively low recognition accuracy. In this paper, we propose a CSI-based device-free HAR (CDHAR) system with WiFi-sensing radar integrated on UAVs to recognize everyday human activities. First, using machine learning, CDHAR applies kernel density estimation (KDE) to obtain adaptive detection thresholds and thereby extract activity durations. Second, we propose a random subspace classifier ensemble method for classification, which uses frequency domain features instead of time domain features and selects each kind of feature in equal amounts. Finally, we prototype CDHAR on commercial WiFi devices and evaluate its performance in both indoor and outdoor environments. The experimental results show that, even when the experimental scenario varies, the accuracy of activity duration extraction can reach 98% and 99.60% in outdoor and indoor environments, respectively. Based on the extracted data, the recognition accuracy in outdoor and indoor environments can reach 91.2% and 90.2%, respectively. CDHAR ensures high recognition accuracy while improving adaptability and instantaneity.


Introduction
Human activity recognition (HAR) is a vital technology that enables applications such as intelligent sensory games, smart homes, human body posture monitoring, and other individual applications [1][2][3][4][5][6][7]. Channel state information (CSI) is a kind of fine-grained physical layer information with high resolution [8][9][10]. Therefore, CSI-based HAR is popular with researchers and has been extensively studied [11][12][13][14][15][16]. Traditional HAR systems distinguish the CSI of the action phase (someone is interfering with the link) from that of the stationary phase (no one is interfering with the link) [10][17][18][19] and manually set the threshold to detect the start and the end of the activity. However, the threshold needs to be reset when the environment varies or new activities need to be identified, which limits the adaptability and instantaneity of such systems. Because the HAR system is sensitive to unexpected errors, relying on this single method will result in incorrect or incomplete extraction of activity durations. In addition, existing HAR systems ignore the frequency domain feature, which is a key parameter for recognition. Moreover, previous HAR systems mainly use a single classifier with poor robustness and a low recognition rate, such as the k-nearest neighbor (KNN) classification algorithm [20][21][22] and the support vector machine (SVM) [23]. When identifying similar activities, the recognition accuracy of such a single classifier is not satisfactory. Because ensemble learning is a way to train and combine multiple classifiers [24][25][26][27], a recent work applied an ensemble method, Bagging-SVM, to identify activities (cite R11). However, Bagging-SVM samples with replacement, so some samples may appear multiple times in the same training set while others may be ignored, which reduces the recognition accuracy.
In this paper, we propose a CSI-based device-free HAR (CDHAR) system that integrates WiFi-sensing radar on a UAV to overcome the shortcomings of existing HAR systems. First, CDHAR estimates the probability density distribution of the CSI of each subcarrier in the action phase and the stationary phase according to the fluctuation of the signal, obtains the adaptive detection threshold, and then uses this threshold to extract the activity duration. To make full use of the frequency domain features, the discrete wavelet transform (DWT) is used to extract the time-frequency component features of each activity [28][29][30][31]. In addition, CDHAR proposes a sampling criterion to choose subsets of components from the input feature matrix. Finally, SVM is applied to each subset to generate a set of classifiers, each classifier is given a weight by the weight assignment method, and the classification results are combined according to the obtained weight vector to get the final recognition result.
We show the framework of CDHAR in Fig. 1. CDHAR is composed of two parts: the offline phase and the online phase. In the offline phase, it extracts activity durations by using the proposed detection algorithm. Then, a random subspace ensemble classification method based on the support vector machine is designed. In the online phase, it extracts features with the same method used in the offline phase and classifies activities with the classifiers trained offline. Different from previous systems, which set the detection threshold manually and use a single classifier with time domain features, CDHAR obtains an adaptive threshold for the extraction of activity durations and completes recognition with the proposed ensemble method. The main contributions of this paper are summarized as follows.
• We propose a new algorithm to obtain the adaptive detection threshold in order to extract activity durations when the environment varies or other activities are added. The algorithm is adaptive and instantaneous in different wireless environments. The experimental results show that the activity duration extraction of CDHAR is accurate and meets the requirements of HAR.
• We propose a random subspace classifier ensemble method in which frequency domain features are used instead of time domain features and each kind of feature is selected in the same amount. As a result, the recognition accuracy of CDHAR is higher than that of existing HAR systems.
The rest of the paper is organized as follows. Section 2 introduces the proposed adaptive detection threshold and the detailed steps of the proposed random subspace ensemble method. Extensive experiments and evaluation are presented in Section 3. Finally, we conclude the paper in Section 4.

Extraction of activity durations based on adaptive detection threshold
Our system uses the link between commercial WiFi devices to detect human activities, as shown in Fig. 2. We consider the case in which the transmitter continuously sends WiFi frames to the receiver. When an action occurs, the signal reflected by the human body influences the signals traveling along the line-of-sight (LoS) path, as shown in Fig. 2. By monitoring the wireless channel state, receivers can measure the small signal changes caused by human movements and use these changes to recognize human activities. In order to extract the activity durations in the online phase, CDHAR estimates the adaptive detection threshold in the offline phase. Let $H(f_j, t)$ denote the amplitude of CSI, where $f_j$ and $t$ represent the frequency of the jth subcarrier and the time moment, respectively. To analyze the fluctuation feature of the received signal, we calculate the mean and the variance of $H(f_j, t)$ at moment $t$ in a sliding window with length $l$:

$$m_{j,t} = \frac{1}{l} \sum_{i=t}^{t+l-1} H(f_j, i) \tag{1}$$

$$v_{j,t} = \frac{1}{l} \sum_{i=t}^{t+l-1} \left( H(f_j, i) - m_{j,t} \right)^2 \tag{2}$$
where $m_{j,t}$ and $v_{j,t}$ represent the mean and variance of $H(f_j, t)$ in a sliding window, respectively. CDHAR employs $v_{j,t}$ as the feature of the signal fluctuation. Since kernel density estimation (KDE) [32][33][34] constructs the distribution model from the data itself instead of depending on a prior assumption about the distribution, CDHAR statistically estimates the $v_{j,t}$ extracted above and establishes the distribution model by KDE:

$$\hat{g}_j(x) = \frac{1}{n h_j} \sum_{t=1}^{n} K\left( \frac{x - v_{j,t}}{h_j} \right) \tag{3}$$
where $K(\cdot)$ represents the kernel function, and $h_j$ and $n$ denote the length (bandwidth) and the number of sliding windows, respectively. The type of $K(\cdot)$ has little effect on the estimation result, and CDHAR chooses the Gaussian kernel for its universality. Then, CDHAR calculates the optimal bandwidth by Eq. (4), whose robustness and practicality have been tested [35].
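where $\hat{\sigma}_j$ denotes the variance of $v_{j,t}$. Let $\hat{g}_{0,j}(x)$ and $\hat{g}_{1,j}(x)$ represent the probability density function (PDF) of $v_{j,t}$ in the stationary phase and in the action phase, respectively. Thus, we can obtain $\hat{g}_{0,j}(x)$ and $\hat{g}_{1,j}(x)$ by Eqs. (3) and (4). For a candidate threshold $\rho_j$, the false alarm probability $P_{e0,j}$ and the missed alarm probability $P_{e1,j}$ are given by Eqs. (5) and (6), respectively:

$$P_{e0,j} = \int_{\rho_j}^{+\infty} \hat{g}_{0,j}(x)\, dx \tag{5}$$

$$P_{e1,j} = \int_{-\infty}^{\rho_j} \hat{g}_{1,j}(x)\, dx \tag{6}$$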
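The total error detection probability of the jth subcarrier is $P_{e,j} = P_{e0,j} + P_{e1,j}$, and it is minimized by Eq. (7):

$$\rho_{j,opt} = \arg\min_{\rho_j} P_{e,j} \tag{7}$$

This minimization can be solved by differentiation: setting the derivative of $P_{e,j}$ with respect to $\rho_j$ to zero yields $\hat{g}_{0,j}(\rho_j) = \hat{g}_{1,j}(\rho_j)$, that is, the threshold lies at an intersection of the two estimated PDFs.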
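The threshold search described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes a Gaussian kernel, assumes a rule-of-thumb bandwidth (the form of Eq. (4) is not reproduced in the text, so `rule_of_thumb_bandwidth` is a stand-in), and replaces the derivative-based solution with a plain grid search. All function names and the sample values are hypothetical.

```python
import math


def rule_of_thumb_bandwidth(samples):
    """Gaussian rule-of-thumb bandwidth (ASSUMPTION: stands in for Eq. (4))."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    return (4.0 * std ** 5 / (3.0 * n)) ** 0.2


def kde_tail(samples, h, rho, upper):
    """Mass of a Gaussian KDE above rho (upper=True) or at/below rho (upper=False)."""
    sign = 1.0 if upper else -1.0
    scale = h * math.sqrt(2.0)
    # Each kernel is a normal density centred on one sample; erfc gives its tail mass.
    return sum(0.5 * math.erfc(sign * (rho - s) / scale) for s in samples) / len(samples)


def adaptive_threshold(stationary, action, grid_points=2000):
    """Grid search for the rho that minimises false alarm + missed alarm probability."""
    h0 = rule_of_thumb_bandwidth(stationary)
    h1 = rule_of_thumb_bandwidth(action)
    lo = min(min(stationary), min(action))
    hi = max(max(stationary), max(action))
    best_rho, best_err = lo, float("inf")
    for i in range(grid_points + 1):
        rho = lo + (hi - lo) * i / grid_points
        p_false_alarm = kde_tail(stationary, h0, rho, upper=True)  # stationary variance above rho
        p_missed = kde_tail(action, h1, rho, upper=False)          # action variance below rho
        if p_false_alarm + p_missed < best_err:
            best_rho, best_err = rho, p_false_alarm + p_missed
    return best_rho, best_err
```

Calling `adaptive_threshold(stationary_variances, action_variances)` for each subcarrier and keeping the subcarrier with the smallest returned error mirrors the selection of the detection threshold; the inputs must contain at least two distinct values each so that the bandwidth is positive.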
The detection threshold is set according to $j^{*} = \arg\min_j P_{e,j}$ and $\rho_{opt} = \rho_{j^{*},opt}$, where $j^{*}$ and $\rho_{opt}$ represent the index of the selected subcarrier and the adaptive detection threshold obtained from that subcarrier, respectively. These steps are done in the offline phase. During the online phase, $t_0$ is recorded as the beginning of an activity when $v_{j,t}$ stays larger than $\rho_{opt}$ for a fixed period, and $t_1$ is recorded as its ending when $v_{j,t}$ stays lower than $\rho_{opt}$ for a fixed period. The activity durations extracted from the 30 subcarriers form a matrix $H$ with 30 rows and $T$ columns, where $T$ represents the duration of one activity.
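The online extraction step can be illustrated with a short sketch. It assumes a forward-looking window of length `l` for the variance and models the "fixed period" as a hypothetical parameter `hold`, the number of consecutive samples that must stay above (or not above) the threshold; both function names are illustrative and not taken from the paper.

```python
def sliding_variance(amplitude, l):
    """Variance of amplitude[t : t + l] for every full window starting at t."""
    out = []
    for t in range(len(amplitude) - l + 1):
        window = amplitude[t:t + l]
        m = sum(window) / l
        out.append(sum((a - m) ** 2 for a in window) / l)
    return out


def extract_durations(variance, rho, hold):
    """Return (t0, t1) index pairs of detected activities.

    An activity starts at the first sample of a run of `hold` values above rho
    and ends at the first sample of a run of `hold` values not above rho.
    """
    durations = []
    active = False
    run = 0
    start = 0
    for t, v in enumerate(variance):
        above = v > rho
        if not active:
            run = run + 1 if above else 0
            if run == hold:
                start = t - hold + 1
                active = True
                run = 0
        else:
            run = run + 1 if not above else 0
            if run == hold:
                durations.append((start, t - hold + 1))
                active = False
                run = 0
    if active:
        # The activity was still in progress when the record ended.
        durations.append((start, len(variance)))
    return durations
```

A larger `hold` suppresses short spikes at the cost of reacting later, which is the same trade-off the window length `l` introduces.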

Random subspace classifier ensemble
CDHAR uses a random subspace classifier ensemble method for classification. First, it divides the feature space into subspaces through the proposed selection method, which takes into account the balance of each feature. Then, SVM is used to classify each subspace to generate a classifier. Next, CDHAR assigns a weight to each classifier through the proposed weight allocation method. Finally, the recognition result is obtained through the weighted sum of the results of all classifiers.
In order to extract important information from $H$ and reduce the time cost at the same time, principal component analysis (PCA) is used. The subcarriers are highly correlated with each other, and the first principal component contains a large amount of burst noise as a result of the instability of the device. Therefore, the second to fourth principal components are chosen in this paper, and the number of principal components is $k = 3$. The matrix $H_r$ with $k$ rows and $T$ columns is obtained after the PCA of $H$. Since the change rate of CSI amplitude can reflect the speed of the activity [11], different activities can be distinguished by the variation trend of CSI amplitude. Since DWT is a tool for numerical and time-frequency analysis, CDHAR utilizes it to extract the features in the frequency domain. $H_d$ with $k \times (q + 1)$ rows and $T$ columns is obtained after the q-layer DWT of $H_r$. Because each row of $H_d$ is long, using it directly would not satisfy the real-time requirement of the system. CDHAR therefore extracts the statistical information of $H_d$ to reduce the length of the features. The statistical features selected in this paper are the mean, standard deviation, interquartile range, 50th percentile, 68.3th percentile, and 95th percentile. For each row in $H_d$, 6 kinds of statistical features can be obtained, which means each activity is characterized by a vector of size $6 \times k \times (q + 1)$. Suppose there are $N$ activities in the database; they form a matrix $H_e$ with $N$ rows and $6 \times k \times (q + 1)$ columns. Each row represents the features of one activity and each column represents the same statistical feature of different activities. $H_e$ is standardized by

$$H_f(i, j) = \frac{H_e(i, j) - S_j}{S'_j} \tag{8}$$

where $S_j = \mathrm{mean}(H_e(\cdot, j))$ and $S'_j = \mathrm{std}(H_e(\cdot, j))$. It should be noted that $\mathrm{mean}(\cdot)$ denotes the mean of $(\cdot)$ and $\mathrm{std}(\cdot)$ represents the standard deviation of $(\cdot)$. Thus, $H_f$ is a matrix with $N$ rows and $6 \times k \times (q + 1)$ columns.
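The six statistics and the column-wise standardization can be sketched as follows. This is an illustrative example only: the percentile rule (linear interpolation) and the use of the population standard deviation are assumptions, since the paper does not state which conventions were used, and the function names are hypothetical.

```python
import math


def percentile(sorted_vals, p):
    """Linear-interpolation percentile (p in [0, 100]) of an already sorted list."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def row_features(row):
    """Mean, std, interquartile range, and the 50th, 68.3th, 95th percentiles of one row."""
    n = len(row)
    mean = sum(row) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in row) / n)  # population std (assumption)
    s = sorted(row)
    iqr = percentile(s, 75) - percentile(s, 25)
    return [mean, std, iqr, percentile(s, 50), percentile(s, 68.3), percentile(s, 95)]


def standardize(matrix):
    """Column-wise z-score of a feature matrix; constant columns are mapped to 0."""
    n = len(matrix)
    cols = list(zip(*matrix))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in matrix
    ]
```

Concatenating `row_features` over all $k \times (q + 1)$ rows of one activity gives one row of the feature matrix, and `standardize` is then applied to the stacked rows. For online data, the column means and standard deviations computed offline would be reused rather than recomputed.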
It is noted that features are standardized in both the online and offline phases. From the above, the feature matrix $H_f$ has $6 \times k \times (q + 1)$ feature dimensions. The space composed of all matrix elements is termed the full space, denoted as $V$, while the space composed of part of the matrix elements is termed a subspace. CDHAR selects $x$ subsets from $V$, following the idea of bagging, to construct the subspaces. The proposed selection method ensures that the final results are not biased toward any one of the principal components, frequency components, or statistics. $x$ is set to $x = \mathrm{lcm}(6, k, q + 1)$, where $\mathrm{lcm}(\cdot)$ denotes the least common multiple of $(\cdot)$. The subset selection is conducted by the following steps: ... $\times (q + 1))$. If $z \notin S_i$, add the ith statistical feature of group $z$ to $S$ and $z$ to $S_i$, remove $z$ from $U$, and set time = time + 1; otherwise, repeat step 4. It should be noted that $z$ is the column index of $V$. In other words, $S$ stores a series of column indices selected from $V$. The subsets $\{S_1, S_2, \cdots, S_c\}$ that construct the feature subspaces can then be obtained, and SVM is applied to generate the classifiers $\{model_1, model_2, \cdots, model_c\}$, where $c$ represents the number of repetitions. Because classifiers that contain certain features perform better, these classifiers should be given more weight. Then, the final result can be obtained through the weighted sum of the results of all classifiers. The idea of cross-validation is used to obtain the weight vector. The pseudo code for the proposed weight allocation method is shown below.
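As a rough illustration of the cross-validation idea, the sketch below scores each classifier on held-out samples and turns the counts of correct results into normalized weights. The exact update rule of the weight allocation method is not given in the text, so counting correct classifications and normalizing them is an assumption. Classifiers are modelled as plain callables, and `allocate_weights`, `models`, and `subsets` are hypothetical names.

```python
def allocate_weights(models, subsets, samples, labels):
    """Weight each classifier by its share of correct held-out classifications.

    models[p]  : callable mapping a feature sub-vector to a predicted label
    subsets[p] : column indices of the full feature vector used by models[p]
    samples    : held-out feature vectors (full space)
    labels     : true labels of the held-out samples
    """
    correct = [0] * len(models)
    for p, (model, cols) in enumerate(zip(models, subsets)):
        for x, y in zip(samples, labels):
            # Classify the sample using only the columns of this subspace.
            if model([x[c] for c in cols]) == y:
                correct[p] += 1
    total = sum(correct)
    if total == 0:
        # No classifier was ever right: fall back to uniform weights.
        return [1.0 / len(models)] * len(models)
    return [c / total for c in correct]
```

In practice the held-out samples would rotate over the folds of the training set so that every sample contributes to the weights exactly once.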
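The weight vector $w$ is calculated in the offline phase. In the online phase, when new data arrive, the features are extracted according to $\{S_1, S_2, \cdots, S_c\}$; thus, we obtain $c$ groups of features. Then, CDHAR gets the set of results $\{b_1, b_2, \cdots, b_c\}$ by classifying each group with the corresponding classifier. Let $w_p$ denote the weight of $model_p$ and let $\mathbb{1}(\cdot)$ be the indicator function. The recognition results are combined by Eqs. (9) and (10):

$$J(i) = \sum_{p=1}^{c} w_p \, \mathbb{1}(b_p = i) \tag{9}$$

$$opt = \arg\max_i J(i) \tag{10}$$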
where $i$ denotes the activity index of the ith activity in $\{b_1, b_2, \cdots, b_c\}$, $J(i)$ represents the weighted sum of the classifiers whose recognition result is $i$, and $opt$ indicates the activity index of the final recognition result.
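The weighted combination of classifier outputs can be written in a few lines. This is a minimal sketch of weighted voting as described above; the function name and the example labels are hypothetical.

```python
def weighted_vote(predictions, weights):
    """Sum the weights of the classifiers voting for each label; return the best label."""
    scores = {}
    for label, weight in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + weight
    # max() keeps the first label encountered when several labels tie.
    best = max(scores, key=scores.get)
    return best, scores
```

For example, with predictions `["run", "fall", "run", "walk"]` and weights `[0.2, 0.5, 0.2, 0.1]`, the two votes for "run" total 0.4 and lose to the single, more heavily weighted vote for "fall".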

Results and discussion
In order to verify the effectiveness of the proposed system, we test activities in two environments; the experimental scenarios are shown in Fig. 3. Figure 3a and b show the indoor and outdoor environments, respectively. The latter, with size 57.6 m × 51.0 m, is a typical outdoor environment with few obstacles and multi-path components. The transmitter is 10 m from the receiver. The former, with size 13.3 m × 7.8 m, is a typical indoor environment with more furniture, and the receiver is 7.6 m from the transmitter. We use the MS-B083 mini host equipped with an Intel 5300 network card. Both the transmitter and receiver use a single antenna, as shown in Fig. 4a. During the experiments, we tested 5 common activities shown in Fig. 4 to verify the effectiveness of the proposed system.

Pseudo code of the weight allocation method (steps 8 to 12):
8: Utilize $model_p$ to classify the yth sample;
9: if the recognition result is correct then
10: end if
11: end for
12: end for
Offline activity databases have been built in the two scenarios. The structure of the database is the same in indoor and outdoor environments. For the outdoor environment, the database includes the above 5 activities, each of which is repeated for 30 groups (30 times). To obtain the detection threshold, we collected data for 10 min without any action. Volunteers are invited to perform these 5 activities indoors and outdoors, and 100 groups are collected for each activity. In order to analyze how the length of the sliding window affects the extraction of the signal fluctuation feature, a volunteer performs "Run" 4 times back and forth through the link, and the results are shown in Fig. 5. We can see from Fig. 5a that when someone approaches the link, the CSI amplitude fluctuates significantly within 2 to 2.5 s, 4.2 to 5.8 s, 6.4 to 8 s, and 9.9 to 10.2 s as a result of multi-path. In particular, when the target is in the middle of the link at 2.5 s, 5 s, 7.5 s, and 10.2 s, the CSI amplitude decreases. From Fig. 5b and c, we find that when $l$ is relatively small, the variance of CSI amplitude in the sliding window drops to a low value while the target is still running in the link, which causes the detection to end early and the extracted data to be incomplete. Because the variance at time $t$ is calculated from the data in the following $l$ moments, the detection may start early when $l$ is relatively large, as shown in Fig. 5f. When $l$ is too long, the real-time performance of the system is affected. On the contrary, when $l$ is too short, the fluctuation of CSI amplitude cannot be extracted completely. As a result, CDHAR takes $l = 200$ in this paper.

Fig. 3 Two standard experimental scenarios. a Indoor, office with desks. b Outdoor, open-air platform
We observe a significant difference in the distribution of CSI amplitude variance between the action phase and the stationary phase, as shown in Fig. 6. A red solid dot marks the intersection of the blue and orange lines, and its abscissa is the adaptive detection threshold. From the ordinate ranges in Fig. 6a and b, we find that the variance of CSI amplitude in the outdoor environment is smaller than that in the indoor environment as a result of multi-path. For the same reason, the threshold in the indoor environment is higher than that in the outdoor environment.
In this paper, we apply three evaluation indexes to analyze the performance of the proposed detection algorithm, namely the false-positive rate (FPR), the false-negative rate (FNR), and the F1-measure. Because FPR and FNR are indicators of the error ratio, the lower their values, the better the system performance. The F1-measure is a comprehensive evaluation index of the detection algorithm and represents the harmonic mean of the precision and the recall [36,37]. Therefore, the higher the F1-measure, the better the system performance. We can see from Fig. 7 that in both outdoor and indoor scenarios, the FPR and the FNR are lower than 10% and the F1-measure exceeds 90%, so the proposed detection algorithm meets the requirements of activity extraction. Compared to the outdoor environment, the indoor FPR is higher and the FNR is lower, as a result of the indoor multi-path environment.
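For reference, the three indexes can be computed from the counts of true positives, false positives, true negatives, and false negatives. The sketch below uses the standard definitions; how a "positive" is counted (per sample, per window, or per activity) is not specified in the text, and the counts in the usage note are hypothetical.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (FPR, FNR, F1) from confusion-matrix counts."""
    fpr = fp / (fp + tn)          # stationary data wrongly flagged as activity
    fnr = fn / (fn + tp)          # activity data that was missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2.0 * precision * recall / (precision + recall)  # harmonic mean
    return fpr, fnr, f1
```

With hypothetical counts `tp=90, fp=5, tn=95, fn=10`, the function gives an FPR of 0.05, an FNR of 0.10, and an F1-measure of 180/195, about 0.923. The function assumes every denominator is non-zero.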
We design a series of experiments to verify the relationship between the CSI amplitude and the target's moving speed. Volunteers push a smooth plate across the link at a slow and a fast speed, respectively. We collect 5 groups of data in each case and extract the amplitude of CSI. At the same time, the instantaneous phase of CSI is obtained through the Hilbert transform. From Fig. 8a, the phase of CSI does not vary when the plate is stationary, while the phase changes quickly when the plate is moving. Comparing Fig. 8a with Fig. 8b, we find that when the plate moves faster, the phase changes more quickly. Because the phase is calculated from the CSI amplitude, this verifies that the CSI amplitude reflects the target speed to some extent.

Fig. 7 Instantaneous phase-time graph of CSI amplitude after descrambling. a Outdoor. b Indoor
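The instantaneous phase obtained through the Hilbert transform can be illustrated with a small, dependency-free sketch. It builds the analytic signal with a direct discrete Fourier transform, which is slow but adequate for short sequences; this is a generic construction under that assumption, not the authors' processing chain, and the function names are hypothetical.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a direct DFT (O(n^2))."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double positive frequencies, drop negative ones.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        gain[k] = 2.0
    filtered = [g * s for g, s in zip(gain, spectrum)]
    return [
        sum(filtered[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def instantaneous_phase(x):
    """Wrapped instantaneous phase (radians) of a real sequence."""
    return [cmath.phase(z) for z in analytic_signal(x)]
```

For a pure cosine, the phase advances by a constant step per sample that is proportional to its frequency, so a faster oscillation in the amplitude sequence shows up as a faster phase change.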
We use the Matlab Discrete Wavelet Transform (DWT) Toolbox to analyze the DWT performance after principal component analysis (PCA). Note that we have extended the duration of the extracted activity to include a stationary piece for further analysis. The time-frequency component diagram is shown in Fig. 9. Each layer represents a frequency range, the frequency increases from the first layer to the ninth layer, and the brightness indicates the amount of features in each frequency component. The darker a region is in Fig. 9, the more features there are in that frequency component. Fig. 9a shows that the frequency components of "Walk" reach no higher than the seventh layer. Fig. 9a and b show that the frequency components of "Run", with eight layers, reach one layer higher than those of "Walk". Fig. 9c and d show that the features of "Squat" and "Sit down" are mostly concentrated in the lower layers. Fig. 9e and b show that, similar to "Run", "Fall down" has more high-frequency components due to the sudden increase of speed at the moment of falling. The difference between "Squat" and "Sit down" is mainly reflected in the fourth and fifth layers, and the differences between "Run" and "Fall down" mostly exist in the intermediate frequency section. Fig. 9 also shows that the time-frequency components are concentrated in the first layer in the stationary phase, while the high-frequency components increase markedly in the action phase. The above results demonstrate that each activity can be well characterized by its time-frequency components.

Fig. 8 Performance of the proposed detection algorithm. a, b The values of FPR, FNR, and F1-measure in outdoor and indoor, respectively
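To make the layered decomposition concrete, the sketch below performs a q-level DWT with the Haar wavelet and reports the energy in each of the resulting q + 1 sub-bands. The wavelet family used in the paper is not stated, so Haar is an assumption chosen for brevity, and the sketch returns coefficient sequences rather than sub-band signals reconstructed to the original length. The function names are hypothetical.

```python
import math


def haar_dwt(signal, levels):
    """Return [approx_q, detail_q, ..., detail_1] for a q-level Haar DWT."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2:
            break  # nothing left to split
        if len(approx) % 2:
            approx.append(approx[-1])  # pad odd lengths by repeating the last sample
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2.0) for a, b in pairs])  # high-pass half
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]         # low-pass half
    return [approx] + details[::-1]


def band_energy(bands):
    """Sum of squared coefficients in each sub-band, lowest frequency first."""
    return [sum(c * c for c in band) for band in bands]
```

A constant sequence puts all of its energy in the lowest band, while a rapidly alternating one puts it in the highest band, which matches the observation that stationary data concentrate in the lowest layer and fast actions raise the high-frequency content.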
The offline activity database is trained in both indoor and outdoor environments. Figure 10 shows the confusion matrix of the recognition results. It can be seen from Fig. 10 that the outdoor recognition accuracy exceeds 90%, while the indoor misjudgment rate increases significantly. As a result of multi-path, the fluctuation of the signal is more random, which may cause the extracted features to deviate from a certain frequency component, thereby affecting the recognition result.
Most existing HAR systems apply conventional classifiers (such as KNN, SVM, and Bagging-SVM) to accomplish HAR. Using the same data, we compare the proposed algorithm with these common recognition algorithms to analyze the recognition performance of CDHAR. As can be seen from Fig. 11, the proposed algorithm performs well in both outdoor and indoor environments, and its recognition accuracy can reach 90%. The performance of KNN depends on the selection of the number of neighboring points K, and its robustness is insufficient. What is more, its recognition rate is further reduced if the feature sizes of the samples are uneven. Because the signals fluctuate, SVM is not stable, and in the indoor multi-path environment its misjudgment rate increases significantly. Because Bagging-SVM is limited by the samples and its sampling method, the training samples are uneven, which leads to insufficient training for certain types of activity. From the results of our experiments, CDHAR has good detection performance and meets the requirements for activity extraction. Moreover, the recognition rate of the proposed algorithm is slightly better than that of the conventional recognition methods widely used in existing HAR systems.

Conclusions
In this paper, an adaptive detection method is first proposed for extracting activity durations. It is adaptive and instantaneous in both indoor and outdoor wireless environments, and it remains applicable when the environment varies or other activities are added. Second, a recognition method based on frequency domain features is proposed, with a subspace selection scheme and a weight assignment scheme that overcome the poor robustness of the existing single classifier. Our experimental results show that, even across different experimental sites, the extraction accuracy of CDHAR can reach 99.80% in the outdoor environment and 99.60% in the indoor environment. Besides, the recognition accuracy can reach 91.2% in the outdoor environment and 90.2% in the indoor environment.