
Table 1 The main process of data cleaning and preprocessing

From: Data cleansing method of talent management data in wireless sensor network based on data mining technology

| Step | Main process of data cleaning and preprocessing |
| --- | --- |
| Cleaning dirty data fields | The main purpose of this step is to remove data input errors. Simple errors in data records are corrected using external functions and external source files, for example by checking whether the postal code corresponds to the city and whether the birthday and the age are consistent. This improves the accuracy and standardization of the data. It also avoids a failure in the clustering process, where too many data errors keep records of the same entity from appearing in the same cluster. |
| Use a unified abbreviation | According to the correspondence between each abbreviation and its full name, all data are standardized to a single representation, either the unified abbreviation or the full name. |
| Data conversion | This step mainly converts data that are stored in different formats. For example, one database may record a male as text while another database records it as “1”, which produces inconsistent data. The data conversion process turns such inconsistent data into consistent data. It can also transform one data table into several tables with different structures, according to given requirements. |
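
The three steps in the table can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the lookup tables, field names (`postal_code`, `city`, `birthday`, `age`) and the function names are all hypothetical, and the target gender codes "M" and "F" are an assumption made for the example.

```python
from datetime import date

# Hypothetical lookup tables. In practice these would be loaded from
# external source files, as the table describes.
POSTAL_CODE_TO_CITY = {"100000": "Beijing", "200000": "Shanghai"}
ABBREVIATION_TO_FULL = {"HR": "Human Resources", "R&D": "Research and Development"}
# Assumed encodings: the source only mentions that one database uses "1" for male.
GENDER_CODES = {"male": "M", "m": "M", "1": "M", "female": "F", "f": "F", "2": "F"}


def age_on(birthday, today):
    """Age in whole years on the given reference date."""
    had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
    return today.year - birthday.year - (0 if had_birthday else 1)


def find_dirty_fields(record, today):
    """Step 1: return the names of fields that fail the consistency checks."""
    dirty = []
    expected_city = POSTAL_CODE_TO_CITY.get(record["postal_code"])
    # Unknown postal codes are not flagged; only known mismatches are.
    if expected_city is not None and expected_city != record["city"]:
        dirty.append("city")
    if age_on(record["birthday"], today) != record["age"]:
        dirty.append("age")
    return dirty


def unify_abbreviation(value):
    """Step 2: expand a known abbreviation to its full name; keep other values."""
    return ABBREVIATION_TO_FULL.get(value, value)


def convert_gender(value):
    """Step 3: map differently encoded values to one code (KeyError if unknown)."""
    return GENDER_CODES[str(value).strip().lower()]


if __name__ == "__main__":
    record = {
        "postal_code": "100000",
        "city": "Shanghai",
        "birthday": date(1990, 6, 15),
        "age": 30,
        "department": "HR",
        "gender": 1,
    }
    print(find_dirty_fields(record, today=date(2020, 1, 1)))
    print(unify_abbreviation(record["department"]))
    print(convert_gender(record["gender"]))
```

Passing the reference date explicitly keeps the age check reproducible. Choosing the full name rather than the abbreviation as the unified form is arbitrary here; the table allows either, as long as one form is used throughout.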