Application of data mining technology and wireless network sensing technology in sports training index analysis

The conventional analysis method can provide a general analysis of sports training indexes, but its capability is relatively low when analyzing niche data. To solve this problem, this paper proposes applying data mining technology. First, the classification of index parameters is determined; then the data mining technology is imported, the sports training analysis mechanism is established through this technology, and the construction of the index analysis model is completed. The model is used to analyze the process of niche data mining, and effective data on the training indicators are obtained. Deep learning is a method of machine learning based on the representation of data. Through a coverage test, an accuracy test, and a noise immunity test, the variable parameters of the comprehensive analysis capability are determined. Further calculation of this parameter shows that the comprehensive ability of the data mining analysis method is improved by 37.14% compared with the conventional method, which makes it suitable for analyzing niche sports training indicators of different data types.


Introduction
The conventional index analysis method adopts a statistical approach to make a general analysis of sports training indicators. When analyzing niche data, the small amount of statistical data leaves conventional methods with low comprehensive analysis capability [1,2]. Therefore, this paper proposes applying data mining technology to the analysis of sports training indexes. According to the characteristics of the data set, the classification of index parameters is determined; the sports training analysis mechanism is then established by importing data mining technology, and the analysis model is constructed. By analyzing the three processes of data preparation, data mining, and result interpretation, the data mining results of the training indicators are obtained and the data analysis is completed. Finally, the data mining technology is applied to the analysis of sports training indicators. In the simulation test environment, the proposed and conventional methods were both subjected to a coverage test, an accuracy test, and a noise immunity test to obtain the variable parameters for comprehensive analysis. Calculation and comparison of this parameter show that the proposed method improves the comprehensive analysis ability by 37.14% over the conventional method.
The specific contributions of this paper include the following: (1) this paper reviews the existing algorithms and analyzes their advantages and disadvantages; (2) this paper proposes a data mining technology combined with a deep learning algorithm and constructs a model for it; (3) a case study of applying data mining technology to sports training index is presented; and (4) the performance of the proposed algorithm is analyzed and compared with other existing algorithms.
The rest of this paper is organized as follows. Section 2 discusses the related work. Section 3 discusses the construction process of the data mining model. Section 4 discusses the application of data mining technology to sports training indexes. Section 5 presents the simulation results and summarizes the future research directions.

Import of data mining technology
The index parameters are determined from the characteristics of the data set by finding the concept description of each category, which represents the overall information of such data, that is, the intensional description of the category. The purpose of the classification is to analyze the input data. From the characteristics represented by the data, an accurate description is found for each type of data; such a description is often expressed as a predicate, and it is used to classify subsequent data. Although the class labels of these data are unknown, their categories can still be predicted [3].
The classification can be described as follows. Given a set T of training data, each record is described by several attributes, exactly one of which is the class attribute. A record is represented by a vector X = (X1, X2, …, Xn), where Xi (1 ≤ i ≤ n) is a non-category attribute that may have its own value range. When the value range of an attribute is continuous, it is called a continuous attribute; otherwise, it is called a discrete attribute. C = {C1, C2, …, Ck} represents the set of k different categories. T then determines a mapping function from vector X to C, that is, H : f(X) → C. The purpose of the classification is to use data mining techniques to express the implicit function H, whose expression is given by Eq. 1 [4]. In the second step, the test data is used to evaluate the model, as shown in Fig. 2. If the accuracy rate is acceptable, the classification rules are used to classify new data.
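The learn, evaluate, and then classify workflow described above can be illustrated with a minimal sketch. It is not the model used in this paper: the single-attribute rule learner, the attribute name `load`, the class labels, and the 0.8 acceptance threshold are all assumptions chosen only for illustration.

```python
from collections import Counter, defaultdict


def learn_rules(training_set, attribute):
    """Learn a mapping 'attribute value -> most frequent class' from labeled records."""
    counts = defaultdict(Counter)
    for record, label in training_set:
        counts[record[attribute]][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def classify(rules, record, attribute, default):
    """Apply the learned rules; fall back to a default class for unseen values."""
    return rules.get(record[attribute], default)


def accuracy(rules, test_set, attribute, default):
    """Fraction of test records whose predicted class matches the true class."""
    hits = sum(classify(rules, r, attribute, default) == y for r, y in test_set)
    return hits / len(test_set)


# Hypothetical toy data: (record, class label) pairs.
train = [({"load": "high"}, "fatigued"), ({"load": "high"}, "fatigued"),
         ({"load": "high"}, "fresh"), ({"load": "low"}, "fresh"),
         ({"load": "low"}, "fresh")]
test = [({"load": "high"}, "fatigued"), ({"load": "low"}, "fresh")]

rules = learn_rules(train, "load")                     # step 1: learn on the training set
acc = accuracy(rules, test, "load", default="fresh")   # step 2: evaluate on the test set
if acc >= 0.8:                                         # assumed acceptance threshold
    new_label = classify(rules, {"load": "low"}, "load", "fresh")
```

The rules are only applied to new data when the estimated accuracy is acceptable, which mirrors the two-step procedure in the text.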
Data mining methods are developed from artificial intelligence and machine learning methods, and they combine statistical analysis methods, fuzzy mathematical methods, and scientific computing visualization techniques. Data mining technology can be divided into the following six categories.
Inductive learning is currently the focus of research, and it is mainly divided into two categories: information theory methods and set theory methods. The information theory approach uses information theory to build a decision tree. In the field of sports training analysis, the decision tree is a simple index representation method that gradually classifies cases into different categories. This kind of method has good practical effect and great influence. Since the method finally produces a decision tree, it is generally called the decision tree method. The most representative information theory methods are ID3 and IBLE [5]. In recent years, with the development of rough set theory, the set theory approach has developed rapidly. It includes the method of covering positive examples and excluding negative examples (the AQ method), the concept tree method, and the rough set method. The relationship among these three methods is shown in Eq. 2 [6].
Typical biomimetic methods are neural network methods and genetic algorithms. These two methods have formed independent research systems, and they also play a major role in data mining. The neural network method is based on the MP (McCulloch-Pitts) model and the Hebb learning rule, on which three types of neural network models are established. The knowledge held by a neural network is stored in a distributed matrix structure, and learning is reflected in the gradual calculation of the network weights. Neural network techniques are particularly effective when it is difficult to obtain concepts from complex or inaccurate data. A trained neural network is like an "expert" with some kind of specialized knowledge, so it can learn from experience as people do [7].
The genetic algorithm simulates the process of biological evolution. It consists of three basic processes: reproduction (selection), crossover, and mutation. The algorithm has played a significant role in optimization and in classification-oriented machine learning.
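The three basic processes of a genetic algorithm can be sketched as follows. This is a generic illustration on a toy problem (maximizing the number of ones in a bit string), not the procedure used in this paper; the tournament selection, the elitism step, and every parameter value are assumptions.

```python
import random


def fitness(bits):
    """Toy objective (assumed): the number of ones in the bit string."""
    return sum(bits)


def evolve(pop_size=20, length=16, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    initial_best = max(fitness(ind) for ind in pop)
    for _ in range(generations):
        # Elitism (an added assumption): carry the best individual over unchanged.
        next_pop = [max(pop, key=fitness)[:]]
        while len(next_pop) < pop_size:
            # Reproduction: tournament selection of two parents.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # Crossover: splice the parents at a random cut point.
            cut = rng.randrange(1, length)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return initial_best, max(pop, key=fitness)


initial_best, best = evolve()
```

Because the best individual is always kept, the fitness of `best` can never fall below `initial_best`.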
Applying certain mathematical operations to several data variables yields the corresponding mathematical formulas. The statistical analysis method uses statistical principles to analyze the data. It includes common statistics, correlation analysis, regression analysis, variance analysis, cluster analysis, and discriminant analysis [8].
Fuzzy mathematics arises from the objective existence of ambiguity: the higher the complexity of a system, the stronger its ambiguity. This is the principle of incompatibility summarized by Zadeh. Fuzzy set theory can be used to perform fuzzy judgment, fuzzy decision-making, fuzzy pattern recognition, fuzzy association rule mining, and fuzzy cluster analysis on practical problems. The expression of the fuzzy mathematics method is shown in formula 3.

In formula 3, Pi(m) denotes the fuzzy mathematics method, and P represents the complexity of the system. Z1 represents the fuzzy judgment, Z2 is the fuzzy decision, x1 represents the fuzzy pattern recognition, x2 represents the fuzzy cluster, and λ represents the fuzzy association rule.
Visual data analysis technology broadens the traditional charting function and enables users to analyze the data more clearly, for example, by turning multi-dimensional data into a variety of graphics, which are very effective in revealing the inherent nature and regularity of the data. The purpose is to let users alternate between browsing the data and the mining process, and so improve the effect of mining. This technology plays an important role in all stages of data mining. In the preparation phase, the source data is displayed through scatter plots and histograms, which lays the foundation for better data selection. In the mining phase, the mining processes are presented in visual form, so the user can see from which database the data is extracted and how it is extracted, preprocessed, and mined. In the presentation phase, the technique makes the mined results easier to understand.

Establishment of training analysis mechanism
Data mining classification techniques include decision trees, Bayesian methods, neural networks, and rough sets [9]. This paper mainly studies the decision tree classification method, based on the following considerations.
First, the decision tree method generates easily understandable rules. Because the end users are teaching managers, who often do not have data mining expertise, the interpretability of the mining method is very important. The decision tree represents the final classification result in a tree structure, and it can also generate if-then rules. The theoretical expression is given by formula 4, in which E0 represents the theoretical expression function, n is the calculated length, a is the element record range, f is the discrete index, and e is the index range.
Second, the computational cost of the method is modest. This system is mainly a practical application, not algorithm research, so working efficiency is more important. The method can greatly shorten the calculation time and improve the execution efficiency of the system. The execution efficiency is given by formula 5 [10], in which E is the execution efficiency and p is the defining attribute of the function. In addition, the decision tree method can handle both continuous and discrete data. The database contains several types of data, with both qualitative and quantitative attributes, among which qualitative attributes account for the majority. The method works better with discrete data.
Finally, the decision tree clearly shows the importance of attributes. It chooses the splitting attribute by calculating the information entropy, which serves as the metric of attribute importance. Intuitively, the higher the level of a decision tree node, the more important the attribute represented by that node, while nodes on the same level play basically the same role.
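The entropy-based choice of a splitting attribute can be made concrete with a short sketch. It uses the standard Shannon entropy and information gain definitions; the attribute names `intensity` and `weekday` and the toy records are hypothetical and are not taken from the data set of this paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by partitioning the rows on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical records: 'intensity' separates the classes, 'weekday' does not.
rows = [{"intensity": "high", "weekday": "mon"},
        {"intensity": "high", "weekday": "tue"},
        {"intensity": "low", "weekday": "mon"},
        {"intensity": "low", "weekday": "tue"}]
labels = ["improved", "improved", "stable", "stable"]

gains = {a: information_gain(rows, labels, a) for a in ("intensity", "weekday")}
split_attribute = max(gains, key=gains.get)
```

The attribute with the largest gain is placed highest in the tree, which is why node level reflects attribute importance.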
In summary, this paper chooses the decision tree method for the analysis of sports training indexes. Its process function is given by formula 6 [11].
In formula 6, T represents the set of training data, also known as the training set or training database, t is the decision tree fancier, xj represents the j-layer data, O represents the split-choice attribute, τ represents the calculation width, and n is the range of conditions.
The decision tree method classifies data through a series of rules and is an inductive learning algorithm. It infers classification rules, represented as a decision tree, from a set of unordered and irregular instances. It adopts a top-down recursive approach, comparing attribute values at internal nodes and branching downwards according to the different attribute values. A leaf node is a class to be assigned. The path from the root node to a leaf node corresponds to one classification rule, and the entire tree represents a set of rules.
As Fig. 3 shows, the decision tree is a tree structure similar to a flow chart, consisting of decision nodes, branches, and leaves. Each internal node corresponds to a non-category attribute, each branch corresponds to one possible value of that attribute, and each leaf node represents a category. Internal nodes are usually drawn as rectangles, while leaf nodes are drawn as ellipses. At present, a variety of decision tree algorithms have been formed, such as CLS, ID3, CHAID, CART, FACT, C4.5, GINI, SEES, SLIQ, and SPRINT [12]. The most famous is the ID3 algorithm proposed by Quinlan.
Figure 4 describes the generation process of the decision tree, which is divided into learning and testing. The learning phase adopts a top-down recursive approach [13]. The algorithm has two steps: the generation of the tree and the pruning of the tree, which removes branches that may reflect noise or abnormal data.
The amount of noise and abnormal data removed is given by formula 7, in which Ln(x) represents the amount of noise removed, x represents a series, xi is the ith layer of the conclusion, xj represents the jth layer of the conclusion, C represents the decision tree, and n is the scope of the condition. The decision tree stops splitting when the data on a node all belong to the same category or when no attribute remains that can be used for segmentation. The decision tree can be built by scanning the database several times. This means that few resources are required and that situations with many predictors are easy to handle. Therefore, the decision tree model can be built very quickly and is suitable for large amounts of data. Through the determination of the index parameter classification, the data mining technology is imported and the analysis mechanism is established. Finally, the model is built.

The analysis of data mining process
Data mining is a multi-stage process, which mainly includes data preparation, data mining, and result interpretation. The data mining process for sports training indexes iterates over these three phases, as shown in Fig. 5 [14].
Data preparation accounts for the largest proportion of the entire mining process, usually around 60%. This stage is divided into three steps: data selection, data preprocessing, and data transformation. Data selection mainly refers to extracting data from the database to form the target data. Preprocessing cleans the extracted data so that it meets the requirements. The main purpose of the transformation is to reduce the data dimension. According to formula 4 and formula 5, the expression of the initial feature function is given by formula 8 [15], in which m represents the data feature variables, I is the data variability, N is the target data, v is the calculation magnitude, θ is the spelling records, l is the mining scope, E is the data mining, E1 represents the mining of initial conditions, E2 represents the mining of the working state, and i represents the data of the ith level.
The data mining stage first determines the mining task, such as data summarization, classification, clustering, association rule discovery, or sequence pattern discovery. Then, an algorithm is selected for this task. The choice of the algorithm directly affects the quality of the mining model. After these preparations are complete, the data mining algorithm can be run. This stage is the phase that data mining analysts and experts are most concerned about. It can also be called data mining in the real sense and is expressed by formula 9, in which D represents the data mining process, M(q) represents the sum of the condition vectors of Eq. 5 and Eq. 6, G(q) represents the sum of the mining state vectors, mx represents the difficulty of index analysis, and f represents the frequency of mining.
Ii represents the amount of data mined in period i, and Es represents the mining status. ER represents the mining conditions, where q ∈ Rn, |M(q)| ≤ d, Es is a constant, M(q) is an inverse matrix, and the integral of the value is a constant. Data mining tasks include association analysis, cluster analysis, classification, prediction, timing models, and deviation detection. Association analysis means that when the values of two or more data items appear together repeatedly and with high probability, there is an association between them. Association rules for these data items can be established to reflect the correlation between events. If there is an association between multiple attributes, the value of one attribute can be predicted from the others. For example, "90% of customers who buy bread also buy milk" is an association rule, and placing the two products together will increase their sales. Large databases contain many such association rules, which therefore need to be screened. Generally, the "support" and "confidence" values are used to filter out useless rules, as expressed by formula 10 [16], in which δ represents the useless association rule, L represents the value of the support, l is the value of the confidence, S is the data schedulability, F is the value of the data attribute, h represents the correlation coefficient, and d represents a motion index state.
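The screening of association rules by support and confidence can be illustrated with a short sketch. It uses the standard definitions of the two measures, not formula 10 of this paper; the transactions and both thresholds are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical shopping baskets, following the bread-and-milk example in the text.
baskets = [{"bread", "milk"}, {"bread", "milk"}, {"bread", "milk", "eggs"},
           {"bread"}, {"milk"}]

candidate_rules = [({"bread"}, {"milk"}), ({"milk"}, {"eggs"})]
min_support, min_confidence = 0.4, 0.7   # assumed thresholds

kept = [(a, c) for a, c in candidate_rules
        if support(baskets, a | c) >= min_support
        and confidence(baskets, a, c) >= min_confidence]
```

Here the rule "bread implies milk" survives the filter, while "milk implies eggs" is discarded because too few baskets contain both items.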
The timing model of data mining searches a time series for patterns with a high probability of recurrence. In this model, it is necessary to find the rules whose ratio always exceeds a certain minimum percentage within a certain minimum time [17]. These rules are adjusted as the situation changes. One of the most important methods in the model is "similar timing." With it, the temporal event database is viewed in chronological order, and one or more similar temporal events can be found.
The similar timing is expressed by formula 11, in which Ψ is the similar timing events, Nc represents the data mining time, Fs is the constant of the sports index, and ζ represents the data of the sports index.
Clustering groups data into several categories. Data within the same category are similar to each other, while data in different categories are relatively far apart. Clustering includes statistical analysis methods, machine learning methods, and neural network methods. Statistical analysis clusters on the basis of distance. It is a global comparison: all individuals must be examined to determine the division into classes. In machine learning, the distance used for clustering is determined from the concept description, which is called concept clustering. When the clustering objects increase incrementally, concept clustering is called concept formation. In neural networks, self-organizing methods such as the ART model and the Kohonen model are used for clustering; these are unsupervised learning methods [18]. Given a distance threshold, each sample is clustered according to that threshold. The clustering is expressed by formula 12, in which λ represents the clustering of data mining, Q represents the mining coefficient, N is the total amount of mining, Ns represents the overall amount of the s-layer, NsL represents the overall amount of the layer below the s-layer, and Ψ is the similar timing events.
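Clustering against a given distance threshold can be sketched as follows. The sketch swaps in a simple leader-style procedure with Euclidean distance, which is an assumption and not formula 12 of this paper; the sample points and the threshold are hypothetical.

```python
from math import dist


def threshold_cluster(samples, threshold):
    """Leader-style clustering: a sample joins the first cluster whose leader
    lies within the threshold distance; otherwise it starts a new cluster."""
    clusters = []  # list of (leader, members) pairs
    for s in samples:
        for leader, members in clusters:
            if dist(leader, s) <= threshold:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))
    return [members for _, members in clusters]


# Hypothetical two-dimensional training indicators, for example (load, recovery).
points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (1.1, 1.1)]
clusters = threshold_cluster(points, threshold=1.0)
```

The result depends on the order of the samples and on the threshold, which is the usual trade-off of this single-pass approach.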
Classification is the most widely used task in data mining. It finds the concept description of each category and uses this description to construct the model. The description represents the overall information of such data, that is, the intensional description. The intensional description is divided into the feature description and the discriminant description. The feature description captures the common features of the data in a class, and the discriminant description captures the differences between classes. The process of classification is to analyze the input data, find an accurate description for each class by calculating the characteristics of the data, and use this description to classify subsequent data [19].
Deviation detection finds abnormal situations in the data. Deviations carry much potentially useful information, such as anomalous instances in the classification, deviations of results from predictions, and changes in magnitude. The basic method of deviation detection is to find the difference between the observed result and the reference, which can be expressed by Eq. 13.

In Eq. 13, Pi(w) represents the difference between the observations and the reference, and Pi(m) is the fuzzy mathematical method.
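The basic idea of deviation detection, comparing each observation with its reference value, can be sketched in a few lines. The absolute tolerance test is an assumption standing in for Eq. 13, and the measurements are hypothetical.

```python
def detect_deviations(observations, reference, tolerance):
    """Return (index, difference) pairs where an observation departs from its
    reference value by more than the tolerance."""
    return [(i, obs - ref)
            for i, (obs, ref) in enumerate(zip(observations, reference))
            if abs(obs - ref) > tolerance]


# Hypothetical measured values against a constant reference level.
observed = [10.1, 9.8, 14.5, 10.0]
expected = [10.0, 10.0, 10.0, 10.0]
deviations = detect_deviations(observed, expected, tolerance=1.0)
```

Only the third measurement exceeds the tolerance, so it is the single entry reported as a deviation.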
Prediction uses historical data to find the law of change, establishes a model, and uses this model to predict the types and characteristics of future data. Regression analysis is a typical prediction method, which establishes a regression equation with time as the variable. In the prediction process, entering any time value yields the status at that time. The neural network method learns from nonlinear samples and can therefore fit nonlinear functions. Classification can also be used for prediction, but it is generally used for discrete values, whereas regression is used for continuous values; neural network prediction can be used for both continuous and discrete values [20,21].
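A regression equation with time as the variable can be illustrated by an ordinary least-squares line. The sketch below is a generic example rather than the model of this paper, and the weekly values are hypothetical.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values = slope * time + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def predict(model, t):
    """Evaluate the fitted regression equation at time t."""
    slope, intercept = model
    return slope * t + intercept


# Hypothetical indicator measured in weeks 1 to 4.
weeks = [1, 2, 3, 4]
scores = [12.0, 14.0, 16.0, 18.0]
model = fit_line(weeks, scores)
week5 = predict(model, 5)
```

Entering any time value into `predict` returns the estimated status at that time, as the text describes.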
The expression and interpretation of results depend on the user's purpose: the extracted information is analyzed, and the most valuable information is singled out.
The patterns found in the earlier stages are evaluated by the user, and useless patterns are deleted. If the user's requirements cannot be met, the process returns to the previous stage. In addition, the end users of data mining are people, so the discovered patterns must be presented in an understandable, visual form. For example, the decision tree is transformed into "if…then…" rules, whose process model is shown in Fig. 6.

Effective data analysis
Tree construction starts from a single node that holds the training samples. If the samples all belong to the same class, the node is a leaf node and is marked with this class. Otherwise, information gain is used as heuristic information to select the attribute that best classifies the samples, and this attribute becomes the "test" or "decision" attribute of the node. All attributes are assumed to be categorical, that is, to take discrete values. A branch is created for each known value of the test attribute, and the samples are divided accordingly.
The algorithm applies the same approach recursively to form a decision tree on each partition. Once an attribute appears on a node, it need not be considered on any descendant of that node. The recursion stops when one of the following conditions is true:
(1) All samples for the given node belong to the same class.
(2) There are no remaining attributes that can be used to further divide the samples.
(3) There is no sample in the branch. In this case, a leaf is created and labeled with the majority class of the training samples.
When the decision tree is created, many branches reflect anomalies in the training data. The pruning method uses statistical metrics to clip the least reliable branches and thereby improve the ability of the decision tree to classify correctly.
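The recursive construction and its three stopping conditions can be sketched as below. This is an ID3-style illustration under stated assumptions: the tree is stored as nested dictionaries, the attribute names, value domains, and records are hypothetical, and pruning is left out.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def majority(labels):
    """Most frequent class label."""
    return Counter(labels).most_common(1)[0][0]


def gain(rows, labels, attr):
    """Information gain of splitting on attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, attributes, domains):
    if len(set(labels)) == 1:        # condition (1): one class only
        return labels[0]
    if not attributes:               # condition (2): no attribute left
        return majority(labels)
    best = max(attributes, key=lambda a: gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]  # an attribute is used once per path
    node = {"attribute": best, "branches": {}}
    for value in domains[best]:
        subset = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        if not subset:               # condition (3): empty branch
            node["branches"][value] = majority(labels)
        else:
            sub_rows = [r for r, _ in subset]
            sub_labels = [y for _, y in subset]
            node["branches"][value] = build_tree(sub_rows, sub_labels, remaining, domains)
    return node


# Hypothetical training records and attribute domains.
rows = [{"intensity": "high", "rest": "short"}, {"intensity": "high", "rest": "long"},
        {"intensity": "low", "rest": "short"}, {"intensity": "low", "rest": "long"}]
labels = ["overtrained", "improved", "stable", "stable"]
domains = {"intensity": ["high", "medium", "low"], "rest": ["short", "long"]}
tree = build_tree(rows, labels, ["intensity", "rest"], domains)
```

In this toy data no record has medium intensity, so that branch becomes a leaf labeled with the majority class, which exercises condition (3).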
The pre-pruning method prunes by stopping the construction of the tree early. Once construction stops at a node, the node becomes a leaf, which holds the most frequent class in its subset of samples. When constructing a tree, measures such as information gain can be used to assess the quality of a split. If partitioning the samples at a node would result in a split that falls below a predefined threshold, the partitioning stops. However, choosing a proper threshold is difficult: a higher threshold may result in an oversimplified tree, and a lower threshold yields too little simplification. The method is expressed by formula 14, in which δ denotes the pre-pruning method, μ represents the sample contained in the subsets, Pi(m) represents the fuzzy mathematics method, and t represents the fancier of the decision tree.
Post-pruning cuts branches from a fully grown tree. A node is pruned by deleting its branches. In the cost-complexity pruning algorithm, the lowest unpruned node becomes a leaf.
For each non-leaf node, the expected error rate that would result from pruning its subtree is calculated. Then, the expected error rate without pruning is calculated from the error rate of each branch, combined with the weighting of the branches. If pruning results in a higher error rate, the subtree is preserved. An independent test set is used to evaluate the accuracy of each resulting tree, and the decision tree with the lowest expected error rate is chosen. Post-pruning requires more calculation than pre-pruning, but the resulting tree is more reliable; its formula is written as Eq. 15.

In Eq. 15, As represents the marker expression, A1 represents the error rate of each branch, A2 is the weight assessment of the branches, V is the range of element records, g0 represents the index of the index, F0 represents the index range, and H is the concept tree method.
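The post-pruning decision, which compares the expected error of a pruned node with the weighted error of its branches, can be sketched as follows. This illustrates the general idea only and is not Eq. 15 of this paper; the class counts and branch error rates are hypothetical and would normally be estimated from held-out data.

```python
def pruned_error(class_counts):
    """Error rate if the subtree is replaced by a leaf predicting the majority class."""
    total = sum(class_counts.values())
    return 1 - max(class_counts.values()) / total


def unpruned_error(branches):
    """Error rate of the intact subtree: branch error rates weighted by the
    share of samples that follow each branch."""
    total = sum(n for n, _ in branches)
    return sum(n / total * err for n, err in branches)


def should_prune(class_counts, branches):
    """Prune when the leaf would be no worse than the subtree."""
    return pruned_error(class_counts) <= unpruned_error(branches)


# Hypothetical node with 10 samples; branches are (sample count, error rate) pairs.
counts = {"improved": 8, "stable": 2}
noisy_branches = [(6, 0.5), (4, 0.25)]   # the subtree misclassifies often
clean_branches = [(6, 0.0), (4, 0.0)]    # the subtree is perfect

prune_noisy = should_prune(counts, noisy_branches)
prune_clean = should_prune(counts, clean_branches)
```

The noisy subtree is pruned because a single leaf would make fewer errors, while the clean subtree is preserved.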
When classification rules are extracted from the decision tree, they are expressed in "if-then" form. One rule is created for each path from the root to a leaf: each attribute-value pair along the path forms a conjunct of the rule antecedent (the "if" part), and the leaf node holds the class prediction, which forms the rule consequent (the "then" part). Based on the analysis of the three processes of data preparation, data mining, and result interpretation, the model for sports training index analysis obtains the data mining results of the index and thereby realizes its analysis. This completes the application of data mining technology to the analysis of sports training indexes.
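Turning each root-to-leaf path into an if-then rule can be sketched as a short recursive traversal. The nested-dictionary tree layout, the attribute names, and the class labels are assumptions made for illustration.

```python
def extract_rules(tree, conditions=()):
    """Return one 'IF ... THEN ...' rule per root-to-leaf path."""
    if not isinstance(tree, dict):   # a leaf holds the class prediction
        antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {antecedent} THEN class = {tree}"]
    rules = []
    for value, subtree in tree["branches"].items():
        # Each attribute-value test on the path adds one conjunct to the antecedent.
        rules.extend(extract_rules(subtree, conditions + ((tree["attribute"], value),)))
    return rules


# Hypothetical decision tree stored as nested dictionaries.
tree = {"attribute": "intensity",
        "branches": {"high": {"attribute": "rest",
                              "branches": {"short": "overtrained", "long": "improved"}},
                     "low": "stable"}}
rules = extract_rules(tree)
```

The tree above has three leaves, so three rules are produced, one per path.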

Experiment
To verify the effectiveness of the technology proposed in this paper, a simulation test analysis is performed. The test uses different types of sports training indexes as the objects of analysis. To ensure the validity of the test, the conventional index analysis method is used for comparison. The test data are presented in the same chart, and conclusions are reached by calculating the comprehensive analysis capability.

Data preparation
The test parameters are set to ensure the accuracy of the test. In this paper, different types of indexes are used as test objects. The two analysis methods are used to conduct simulation tests, and the results are analyzed. Because the two methods work differently and produce different results, the test environment must be kept consistent. The data set used in this paper is shown in Table 1.

The test of coverage simulation
The comparison curve of the results of the coverage simulation test is shown in Fig. 7.
The comparison curve shows that the overall coverage of the method designed in this paper is 87.41%, while the coverage of the traditional method is only 79.42%.

The test of the accuracy simulation
The comparison curve of the test results is shown in Fig. 8.
By comparison, the accuracy of the method designed in this paper is 89.92%, while that of the traditional analysis method is 74.12%.

The test of the noise immunity simulation
The comparison curve of the noise immunity simulation test results is shown in Fig. 9.
As can be seen in Fig. 9, the noise immunity of the designed method is 98.41%, while that of the traditional analysis method is 69.53%.

The calculation of comprehensive analysis ability
The test results of coverage, accuracy, and noise immunity are substituted into formula 16, in which D represents the coverage test results, g represents the results of the accuracy test, y represents the results of the noise immunity test, and k represents the simulation coefficient, which is taken as 0.98 in this paper.
The proposed method is denoted as χ1 and the conventional method as χ2. A positive Δχ = χ1 − χ2 indicates that the comprehensive analysis ability is improved, and a negative Δχ indicates that it is decreased. The calculation gives Δχ = χ1 − χ2 = 0.371409. Compared with the conventional method, the comprehensive ability of the data mining application is increased by 37.14%, so it can be applied to the analysis of sports training indexes of different data types.

Results and discussion
This paper puts forward the application of data mining technology in the analysis of sports training indicators. It relies on the construction of the index analysis model; through the analysis of the mining process and the data, the analysis of the sports training index is completed. The experimental data show that the designed method improves the comprehensive analysis ability by 37.14% over the conventional method. It is hoped that this research can provide a theoretical basis for the analysis methods of sports training indicators.
Abbreviation
SEES: Sage Extended Enterprise Suite

In the mapping H : f(X) → C, H represents the implicit function, H0 represents the initial state of the function, p represents the defining attribute of the function, a represents the range of the element record, n represents the range of the condition, e represents the range of the sports index, and f represents the discrete index of the sports index. Index parameter classification is generally divided into two steps: the first step is to import data mining technology through known data sets, and the second step is to use the obtained model for the classification operation. Before the model is used, its classification accuracy is estimated; if the accuracy is acceptable, the model can be used for classification. In the first step, as shown in Fig. 1, the training data set is used for learning: training sets are analyzed by classification algorithms to generate classification rules.

Fig. 5 Data mining process model

Fig. 6 The model of the data mining process

Fig. 7 The comparison curve of the coverage simulation test

Fig. 8

Fig. 9 The comparison curve of noise immunity simulation test