Table 1 The main process of data cleaning and preprocessing

From: RETRACTED ARTICLE: Data cleansing method of talent management data in wireless sensor network based on data mining technology

Step

The main process of data cleaning and preprocessing

Cleaning dirty data fields

The main purpose of this step is to remove data-entry errors. Simple errors in data records are corrected with the help of external functions and external source files, for example by checking whether the postal code corresponds to the city and whether the birthday is consistent with the age. This improves the accuracy and standardization of the data and avoids a problem in the later clustering process, where too many data errors prevent records of the same entity from appearing in the same cluster.
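The two consistency checks named above can be sketched roughly as follows. This is only an illustration, not the article's method: the lookup table POSTAL_TO_CITY stands in for an external source file, and the record field names (postal_code, city, birthday, age) are assumptions made for the example.

```python
from datetime import date

# Hypothetical reference table standing in for an external source file.
POSTAL_TO_CITY = {"100000": "Beijing", "200000": "Shanghai"}


def age_on(birthday: date, today: date) -> int:
    """Return the number of full years between birthday and today."""
    years = today.year - birthday.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birthday.month, birthday.day):
        years -= 1
    return years


def find_errors(record: dict, today: date) -> list:
    """Return a list of consistency problems found in one record."""
    errors = []
    expected_city = POSTAL_TO_CITY.get(record["postal_code"])
    if expected_city is None:
        errors.append("unknown postal code")
    elif expected_city != record["city"]:
        errors.append("postal code does not match city")
    if age_on(record["birthday"], today) != record["age"]:
        errors.append("age inconsistent with birthday")
    return errors
```

Records that return a non-empty list can be corrected or set aside before clustering, so that entry errors do not push records of the same entity into different clusters.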

Use a unified abbreviation

Based on the correspondence between each abbreviation and its full name, all data are standardized to a single form: either the unified abbreviation or the full name.
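A minimal sketch of this standardization, assuming a small hand-made correspondence table. The entries in ABBREVIATIONS and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical abbreviation-to-full-name correspondence table.
ABBREVIATIONS = {"HR": "Human Resources", "R&D": "Research and Development"}

# Reverse lookup, keyed by the lowercased full name.
FULL_TO_ABBR = {full.lower(): abbr for abbr, full in ABBREVIATIONS.items()}


def to_full_name(value: str) -> str:
    """Expand a known abbreviation; leave any other value unchanged."""
    key = value.strip()
    return ABBREVIATIONS.get(key.upper(), key)


def to_abbreviation(value: str) -> str:
    """Shorten a known full name; leave any other value unchanged."""
    key = value.strip()
    return FULL_TO_ABBR.get(key.lower(), key)
```

Applying only one of the two functions to a whole column gives a single representation throughout, which is the point of the step.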

Data conversion

This process mainly converts data that are stored in different formats. For example, one database may represent male with one value while another database represents it as “1”, which produces inconsistent data. Data conversion turns such inconsistent data into consistent data. The process can also transform one data table into several tables with different structures, according to specific requirements.
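The value conversion described above might look like the following sketch. The canonical codes "M" and "F", the set of accepted source spellings in GENDER_CODES, and the gender field name are all assumptions made for the example. The source only states that one database uses “1” for male.

```python
# Hypothetical mapping from source-specific encodings to one canonical code.
GENDER_CODES = {
    "male": "M", "m": "M", "1": "M",
    "female": "F", "f": "F", "2": "F",
}


def normalize_gender(raw) -> str:
    """Map a source-specific gender value to the canonical code."""
    key = str(raw).strip().lower()
    if key not in GENDER_CODES:
        # Fail loudly instead of guessing at an unknown encoding.
        raise ValueError(f"unrecognized gender code: {raw!r}")
    return GENDER_CODES[key]


def convert_records(records):
    """Return copies of the records with the gender field made consistent."""
    return [{**r, "gender": normalize_gender(r["gender"])} for r in records]
```

Raising an error on unknown values is a design choice for this sketch. A pipeline could instead collect such records for manual review.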