An efficient privacy protection in mobility social network services with novel clustering-based anonymization
ZhiGuo Chen^{1},
HoSeok Kang^{1},
ShangNan Yin^{1} and
SungRyul Kim^{1}
https://doi.org/10.1186/s1363801607671
© The Author(s) 2016
Received: 31 May 2016
Accepted: 2 November 2016
Published: 29 November 2016
Abstract
With the rapid growth of social networks in the last few years, online social communication has become widespread. Facebook, Myspace, Twitter, LinkedIn, etc. have created huge amounts of data about social interactions. The same trend holds for offline scenarios, driven by the rapid growth of mobile devices such as smart phones, tablets, and laptops used for social interactions. These mobile devices enlarge the traditional social network services platform and lead to a greater amount of mobile social network data. These data contain more private information about individuals, such as location, habit, and health condition. However, many analytical, sociological, and economic questions can be answered using these data, so mobility data managers are expected to share the data with researchers, governments, and/or companies.
Therefore, mobile social network data must be anonymized before they are shared or analyzed widely. k-anonymization is a well-known clustering-based anonymization approach. However, implementing this basic approach has been a challenge, since much mobile social network data involves categorical values. In this paper, we propose an approach for categorical data clustering that combines a rough entropy method with the DBSCAN clustering algorithm to improve the performance of the k-anonymization approach. It can deal with uncertainty in the clustering process and can effectively find arbitrarily shaped clusters. We present the proposed approach and discuss its credibility through theoretical studies and examples. Experimental results on two benchmark data sets obtained from the UCI Machine Learning Repository show that our approach achieves local and global cluster purity at least as high as Fuzzy Centroids, MMeR, SDR, and ITDR. Since the clustering algorithm is a key point of k-anonymization for mobile social network data, these results indicate that the proposed algorithm can more effectively balance the utility of mobile social network data against the performance of anonymization.
1 Introduction
A social network service is an evolving platform that focuses on building and maintaining social networks or social relations among people who share common activities or interests. Popular social network services, such as Facebook, Twitter, Myspace, LinkedIn, and many more, are the basic carriers of a multidimensional space that aims at shaping a virtual society reflecting people's real life and status. In addition, rapidly spreading mobile devices provide substantial processing power to software applications. These devices can supply valuable user information to the social network and share the geographical location coordinates of the user. However, the mobile social network data uploaded by these devices are very sensitive. They contain a great deal of private information, which can lead to privacy leaks. Meanwhile, high-quality mobile social network data are of interest to researchers and companies in many disciplines, such as sociology, psychology, marketing, or habit research. Therefore, these data need to be anonymized more effectively so as to protect private information before they are published.
In recent years, simple and practical privacy-preserving anonymization approaches [1–7] have been proposed to prevent privacy leaks and the identification of individuals. However, a serious issue is that anonymization considerably decreases data quality (information loss). k-anonymization assigns all records to several groups so that each group contains at least k records. The observations in the same group have similar or identical values of their quasi-identifier. Hence, the efficiency and accuracy of the assignment affect the information loss and the performance [6] of an anonymization algorithm. Clustering is a useful technique that partitions a set of instances into subsets (called clusters) so that observations in the same cluster are similar to each other. A good clustering algorithm yields a highly accurate assignment. Hence, clustering is a key point of k-anonymization: if we improve the performance of the clustering algorithm, we can obtain an efficient k-anonymization that reduces the information loss of the data [6]. Most of the literature adopts k-means-based clustering algorithms for k-anonymization [4, 6, 7]. However, k-means and most other clustering algorithms are designed for clustering numerical data using some distance function. Clustering mobile social network data, which mostly involve categorical values, is therefore a challenging issue. Moreover, these data often have no sharp boundary between clusters, so an algorithm needs to be designed to handle uncertainty in the clustering process. Huang [8] and Kim et al. [9] applied fuzzy sets to the clustering of categorical data to address the uncertainty issue. In 2009, Kumar et al. [10] proposed an algorithm (MMeR) that uses basic rough set concepts to handle categorical attribute values as well as the uncertainty of data sets. In 2011, Panda et al. [5] used this MMeR algorithm in place of the clustering stage of OKA [6] and showed that MMeR can substantially improve the performance of k-anonymization for mobile social network data. Also in 2011, Tripathy et al. proposed an algorithm termed standard deviation roughness (SDR) [11] for clustering categorical data. In 2015, Park et al. proposed ITDR [12], and their experimental results demonstrate that it outperforms the previous approaches.
The DBSCAN clustering algorithm was proposed by Ester et al. in 1996 [13]. Although it employs a distance function designed for numerical data, it requires neither initial points nor the number of clusters, and it can find clusters of arbitrary shape. Therefore, to improve the performance of categorical data clustering for mobile social network data, we build on the basic theory of the DBSCAN algorithm in our work.
Rough set theory is a powerful theory proposed by Pawlak in 1982 [14, 15], which has been applied successfully to data mining, machine learning, pattern recognition, and feature selection [16–19]. The entropy of information theory proposed by Shannon [20] is a useful mechanism for measuring uncertainty in rough sets. Therefore, many papers combine rough set theory with Shannon's entropy for data labeling and outlier detection [21–23]. In particular, Reddy et al. presented a data labeling method based on cluster purity using relative rough entropy for categorical data clustering [23]. They apply an arbitrary clustering algorithm to cluster categorical data into several clusters and then use their proposed method to assign unlabeled data to clusters. Their experimental results demonstrate satisfactory performance. We employ their cluster purity theory in a different way to design a novel clustering algorithm for categorical mobile social network data.
In this paper, we propose a rough entropy method combined with the DBSCAN algorithm for clustering mobile social network data to improve the performance of the k-anonymization approach. We use the rough entropy method within the DBSCAN algorithm to calculate the purity of a cluster, which can handle uncertainty and improve the performance of categorical data clustering. After adding one data point, if the cluster purity decreases only to an acceptable level (threshold λ), we add this point into the cluster. Subsequently, we use the DBSCAN algorithm to recognize the next core point and generate new clusters. Finally, after merging clusters that have common data points, we obtain the objective clusters. We show that the proposed method achieves higher local and global purity than the Fuzzy Centroids, MMeR, SDR, and ITDR techniques [9–12]. Therefore, our clustering algorithm can improve the efficiency and accuracy of the assignment for k-anonymization.
The rest of this paper is organized as follows. Section 2 introduces mobile social network data, rough set theory, rough entropy, and the DBSCAN algorithm. Section 3 introduces our proposed method. In Section 4, a data set based on the one in [23] is used to illustrate our algorithm. Section 5 compares the performance of our proposed algorithm with that of related algorithms in terms of local and global purity, to check whether our algorithm is more efficient for clustering categorical mobile social network data. Finally, Section 6 concludes this paper.
2 Related Work
2.1 Mobile social network data
With the rapid development of mobile devices, users upload personal comments and share their location, habits, and emotions on social network services to reflect real life more conveniently. These mobile devices therefore enlarge the traditional social network service platform and lead to huge amounts of mobile social network data. These large data sets are of interest to governments and companies for big data analysis. The comments, locations, and publication times in these data can be used to analyze how the popularity of tourist attractions changes with the seasons, in order to improve the quality of the tourism environment. People's interests and habits can be used to design popular products. Governments can also analyze users' comments or emotions to adjust policies. However, these huge amounts of published data lead to privacy leaks and can even be linked to an individual. Hence, how to balance data quality and data privacy has become a challenging issue that must be addressed before mobility data managers publish mobile social network data.
Table 1 Mobile social network data
User  Live location     Birth year  Citizenship  Nickname  ...
1     Seoul,Gangnam     1980        Korean       kim       ⋯
2     Seoul,Dongdaemun  1985        Korean       park      ⋯
3     Seoul,Gangnam     1982        Korean       lee       ⋯
4     Seoul,Dongdaemun  1985        Korean       jin       ⋯
5     Seoul,Jongno      1982        Korean       jin       ⋯
6     Seoul,Gwangjin    1988        China        chen      ⋯
Privacy information (for each user): the time, location, and contents of posted comments, habit, emotion, etc.
The purpose of k-anonymization is to hide or generalize the values of the quasi-identifier within the same cluster before mobility data managers publish the data, so as to protect personal privacy. Therefore, clustering can be treated as a key point of k-anonymization for these mobile social network data.
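To make the role of the clusters concrete, the following minimal sketch generalizes the quasi-identifier values inside one cluster and then checks the k-anonymity condition. The suppression rule (replace disagreeing values by "*"), the function names, and the field names are illustrative assumptions, not the procedure of any cited method; the three records are users 2, 4, and 5 of Table 1.

```python
from collections import Counter

def generalize_cluster(records, quasi_identifiers):
    """Give all records of one cluster the same quasi-identifier values.

    Assumed rule: keep a value if the whole cluster agrees on it,
    otherwise suppress it with '*'.
    """
    shared = {}
    for attr in quasi_identifiers:
        values = {r[attr] for r in records}
        shared[attr] = values.pop() if len(values) == 1 else "*"
    return [{**r, **shared} for r in records]

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())

cluster = [
    {"user": 2, "location": "Seoul,Dongdaemun", "birth_year": 1985, "citizenship": "Korean"},
    {"user": 4, "location": "Seoul,Dongdaemun", "birth_year": 1985, "citizenship": "Korean"},
    {"user": 5, "location": "Seoul,Jongno", "birth_year": 1982, "citizenship": "Korean"},
]
qi = ["location", "birth_year", "citizenship"]
published = generalize_cluster(cluster, qi)
```

The fewer attributes a cluster disagrees on, the fewer values must be suppressed, which is why a purer clustering implies lower information loss.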
2.2 Rough set theory
In rough set theory, a set of data points is stored in a table referred to as an information system. This information system is defined as a quadruple IS=(U,A,V,f), where U is a nonempty finite set of data points, A is a nonempty finite set of attributes, V is the union of the attribute value sets, and f:U×A→V is an information function that associates a unique value of each attribute with every object belonging to U, such that f(x,a)∈V_{a} for any a∈A and x∈U. V_{a} is called the value set of attribute a [24–26].
Definition 1
Given an information system IS=(U,A,V,f) and a subset of attributes P⊆A, the relation IND(P)={(x,y)∈U×U : f(x,a)=f(y,a) for all a∈P} is called a P-indiscernibility relation. The partition of U induced by IND(P) is the family of all equivalence classes of IND(P) and is denoted by U/IND(P). If (x,y)∈IND(P), then objects x and y are indiscernible from each other by the attributes in P. The equivalence classes of the P-indiscernibility relation are denoted \([x]_{P}^{U}\).
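As a small illustration of Definition 1, the sketch below computes the partition U/IND(P) by grouping objects that take the same values on the attributes in P. The function name and the dictionary layout are illustrative assumptions; the three rows are borrowed from the example data set of Section 4.

```python
from collections import defaultdict

def partition(universe, attributes):
    """Return U/IND(P): objects grouped by their values on `attributes`."""
    classes = defaultdict(list)
    for name, row in universe.items():
        key = tuple(row[a] for a in attributes)
        classes[key].append(name)
    return list(classes.values())

U = {
    "x1": {"location": "Korea,Seoul", "birth_year": 1988, "citizenship": "Korean"},
    "x2": {"location": "Korea,Busan", "birth_year": 1988, "citizenship": "Korean"},
    "x3": {"location": "Korea,Seoul", "birth_year": 1988, "citizenship": "Chinese"},
}
by_citizenship = partition(U, ["citizenship"])
by_location_and_citizenship = partition(U, ["location", "citizenship"])
```

With P={citizenship}, x1 and x2 are indiscernible; adding the location attribute to P separates all three objects.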
2.3 Rough entropy
The entropy put forward by Shannon [20] as an effective measure of uncertainty has been widely used for characterizing information content in many fields.
Rough entropy is an extension of entropy to measure the uncertainty in rough sets.
Definition 2
|P_{i}|/|U| denotes the probability of any x∈U being in equivalence class P_{i}, where 1≤i≤m, and |M| denotes the cardinality of a set M.
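The displayed formula of Definition 2 does not appear in the text above. As a sketch, assuming the Shannon form of entropy [20] applied to the partition of U into equivalence classes, the rough entropy of IND(P) can be written as follows. This form is an assumption; it is chosen because it agrees with the probability interpretation given in Definition 2.

```latex
E(P) = -\sum_{i=1}^{m} \frac{|P_i|}{|U|} \log \frac{|P_i|}{|U|},
\qquad U/IND(P) = \{P_1, P_2, \ldots, P_m\}.
```

For example, an attribute that splits a two-point set into two singleton classes gives E(P) = -2 · (1/2) · log(1/2) = log 2, whereas an attribute on which both points agree gives E(P) = 0.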
2.4 DBSCAN algorithm

A point p is a core point if it has more than a specified number of points (MinPts) within ε (Eps).

A border point has fewer than MinPts within ε (Eps), but is in the neighborhood of a core point p.

A noise point is any point that is not a core point or a border point.
To summarize, if clusters have a common data point (i.e., they are reachable from each other), these clusters can be merged to generate a new cluster (x_{9} is not reachable from any other point, so it is a noise point).
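The three point types above can be sketched for numerical data as follows. This is a minimal illustration, not a full DBSCAN implementation: the function name and the sample coordinates are hypothetical, a point is assumed not to count as its own neighbour, and "more than MinPts" is taken strictly as in the text (implementations differ on both conventions).

```python
from math import dist

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border' or 'noise'."""
    # Neighbours within eps, excluding the point itself (assumption).
    neighbours = {
        i: [j for j, q in enumerate(points) if j != i and dist(p, q) <= eps]
        for i, p in enumerate(points)
    }
    # Core points have strictly more than min_pts neighbours.
    core = {i for i, nb in neighbours.items() if len(nb) > min_pts}
    labels = []
    for i in range(len(points)):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbours[i]):
            labels.append("border")  # not core, but next to a core point
        else:
            labels.append("noise")
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (5, 5)]
labels = classify_points(points, eps=1.0, min_pts=2)
```

Here only (1, 1) has three neighbours within ε and is a core point; its neighbours become border points, and the remaining points are noise.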
3 DBSCAN based on rough entropy
The rough entropy method is usually used for data labeling and outlier detection. However, researchers have paid little attention to its use in categorical data clustering. In the following, we introduce a procedure that performs categorical data clustering using rough entropy with the DBSCAN algorithm.
Definition 3
The sum of rough entropy on all attributes of cluster can be calculated as the total rough entropy.
Definition 4
The maximum total rough entropy (MaxRE) of any cluster C_{i} is defined as MaxRE(C_{i})=K×log(m), where K is the number of attributes of the data points and m is the number of data points in cluster C_{i} [23].
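Definitions 3 and 4 can be combined into a cluster purity value. The sketch below assumes the Shannon form of rough entropy and assumes that purity is 1 minus the ratio of total rough entropy to MaxRE; neither formula is stated explicitly in the text, but together they reproduce the purity values 0.667, 1.0, and 0.757 reported in Section 4. All function names are illustrative.

```python
from collections import Counter
from math import log

def rough_entropy(values):
    """Entropy of the partition induced by one attribute (assumed Shannon form)."""
    m = len(values)
    return -sum((c / m) * log(c / m) for c in Counter(values).values())

def total_rough_entropy(cluster):
    """Definition 3: sum of the rough entropy over all attributes."""
    return sum(rough_entropy(column) for column in zip(*cluster))

def max_rough_entropy(cluster):
    """Definition 4: MaxRE(C_i) = K * log(m)."""
    return len(cluster[0]) * log(len(cluster))

def cluster_purity(cluster):
    """Assumed purity: 1 - TRE / MaxRE; a single point counts as pure."""
    if len(cluster) < 2:
        return 1.0
    return 1.0 - total_rough_entropy(cluster) / max_rough_entropy(cluster)

# Rows x1, x2 and x8 of the example data set in Section 4.
x1 = ("Korea,Seoul", 1988, "Korean")
x2 = ("Korea,Busan", 1988, "Korean")
x8 = ("Korea,Seoul", 1988, "Korean")
purity_x1_x2 = cluster_purity([x1, x2])
purity_x1_x8 = cluster_purity([x1, x8])
```

Because numerator and denominator use the same logarithm, the purity does not depend on the logarithm base.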
In this section, we introduce our algorithm, which is called the RE-based DBSCAN algorithm. The related definitions have been discussed in the previous section. The DBSCAN algorithm can automatically find core points and generate clusters based on these points. Since a distance function is not suitable for categorical data, we use the variation of cluster purity (calculated by rough entropy) instead of the parameter ε (Eps).
Some social network data may contain a few numerical values. For these, we can compute the mean of the maximum and minimum values to group the numerical values into several intervals. In this way, the numerical values can be converted to categorical values.
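One possible reading of this conversion is equal-width binning between the minimum and the maximum, where the single cut point for two intervals is exactly the mean of the two extremes. The sketch below follows that reading; the function name, the `bins` parameter, and the interval label format are assumptions.

```python
def to_intervals(values, bins=2):
    """Map numbers to equal-width interval labels between min and max.

    With bins=2 the single cut point is (min + max) / 2.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return [f"[{lo:g}, {hi:g}]"] * len(values)
    width = (hi - lo) / bins
    labels = []
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        start, end = lo + idx * width, lo + (idx + 1) * width
        close = "]" if idx == bins - 1 else ")"
        labels.append(f"[{start:g}, {end:g}{close}")
    return labels

# Birth years of the six users in Table 1.
birth_years = [1980, 1985, 1982, 1985, 1982, 1988]
categories = to_intervals(birth_years, bins=2)
```

The resulting labels can then be treated like any other categorical attribute when computing rough entropy.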
After the DBSCAN algorithm recognizes one core point, an information system IS=(U,A,V,f) can be obtained, although this system has only one data point in U (regarded as cluster C_{i}). We then compute, for each remaining data point, the purity of the cluster formed by the core point and that point, and sort the remaining points by this value to obtain a sorted table (x_{1},x_{2},⋯,x_{k},⋯,x_{n}). Since clustering is the task of partitioning a set of instances into groups whose members are most similar to each other, objects in the same group need not be identical. Therefore, for each data point x_{k} of the sorted table, we add x_{k} into cluster C_{i} if the cluster purity decreases only to an acceptable level (threshold λ). Otherwise, the DBSCAN algorithm recognizes the next core point and generates a cluster by the above method. Finally, after merging clusters that have common data points, we obtain the objective clusters. If the number of points in a cluster is smaller than a threshold value v, we merge this cluster into the other cluster that yields the highest cluster purity.
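The growth of a single cluster around one core point can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the purity formula (1 minus total rough entropy over K×log(m)) is an assumption, the stop rule "stop when purity drops below λ or rises again" is inferred from the worked example in Section 4, and the toy data and all names are hypothetical.

```python
from collections import Counter
from math import log

def purity(rows):
    """Assumed cluster purity: 1 - total rough entropy / (K * log m)."""
    m = len(rows)
    if m < 2:
        return 1.0
    tre = -sum((c / m) * log(c / m)
               for col in zip(*rows) for c in Counter(col).values())
    return 1.0 - tre / (len(rows[0]) * log(m))

def grow_cluster(core, data, lam):
    """Grow one cluster around `core`; `data` maps point names to tuples."""
    # Rank the remaining points by their pairwise purity with the core point.
    others = sorted((n for n in data if n != core),
                    key=lambda n: purity([data[core], data[n]]), reverse=True)
    members, current = [core], 1.0
    for name in others:
        candidate = purity([data[p] for p in members] + [data[name]])
        # Stop once purity falls below lam or rises again (assumed stop rule).
        if candidate < lam or candidate > current + 1e-12:
            break
        members.append(name)
        current = candidate
    return members

toy = {
    "a": ("red", "small"),
    "b": ("red", "small"),
    "c": ("red", "large"),
    "d": ("blue", "large"),
}
cluster_a = grow_cluster("a", toy, lam=0.6)
```

Repeating `grow_cluster` for every point as core and then merging overlapping results gives the overall procedure described above.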
The RE-based DBSCAN algorithm treats every point as a core point and estimates the cluster purity to form clusters. The evaluation criterion λ (the counterpart of ε in the DBSCAN algorithm) is the same for every point. Analogous to the DBSCAN algorithm, if the value of λ is too large, we will suffer from the overfitting problem. However, our algorithm uses the cluster purity to partition the clusters. We can let the threshold λ=(N×log(m))/(K×log(m)) with N≤K. If N=K, λ is equal to 1, which means that all values of all attributes are identical; if N=0, all values of all attributes are different. Since we do not know the actual cluster number of an unseen data set, we can guess the distribution of clusters and control the threshold value λ so as to avoid the overfitting problem and obtain the objective clusters.
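Written out, the log(m) factors in the threshold cancel, so λ reduces to a ratio of attribute counts. The bound 0≤N≤K is an assumption read off from the two limiting cases N=K and N=0 discussed above.

```latex
\lambda = \frac{N \times \log(m)}{K \times \log(m)} = \frac{N}{K},
\qquad 0 \le N \le K .
```

For instance, with K=3 attributes and N=2, the threshold is 2/3≈0.67, which coincides with the value of λ used in the example of Section 4.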
4 Certification by example
Table 2 A data set with 12 data points
User    Live location    Birth year  Citizenship
x_{1}   Korea,Seoul      1988        Korean
x_{2}   Korea,Busan      1988        Korean
x_{3}   Korea,Seoul      1988        Chinese
x_{4}   Korea,Seoul      1988        Chinese
x_{5}   Germany,Berlin   1988        German
x_{6}   Germany,Munich   1988        German
x_{7}   Germany,Berlin   1999        German
x_{8}   Korea,Seoul      1988        Korean
x_{9}   Korea,Busan      2000        American
x_{10}  Germany,Munich   1999        German
x_{11}  Korea,Incheon    2000        American
x_{12}  Germany,Munich   1988        German
Privacy information (for each user): the time, location, and contents of posted comments, habit, emotion, etc.
Given an information system IS=(U,A,V,f), where U={x_{1},x_{2},⋯,x_{12}} and A={a,b,c}, let the thresholds be λ = 0.67 and v = 2.
For the core data point x_{1} of DBSCAN, calculate the cluster purity between x_{1} and each of the remaining points, and then sort these points by their cluster purity values.
Table 3 Data points in cluster with two unsorted data points
User   Live location  Birth year  Citizenship
x_{1}  Korea,Seoul    1988        Korean
x_{2}  Korea,Busan    1988        Korean
Analogously, we find that the cluster purity values between x_{1} and the remaining data points (x_{2} to x_{12}) are (0.667, 0.667, 0.667, 0.333, 0.333, 0, 1.0, 0, 0, 0, and 0.333). According to these cluster purity values, we sort the data points as follows: (x_{8},x_{2},x_{3},x_{4},x_{5},x_{6},x_{12},x_{7},x_{9},x_{10}, and x_{11}).
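The pairwise values and the resulting order can be checked with a short script. The purity formula (1 minus total rough entropy over K×log(m)) is an assumption, as are the variable names; for a pair of points it reduces to the fraction of attributes on which the two points agree. Ties are kept in their original order, which is what Python's stable sort does.

```python
from collections import Counter
from math import log

def purity(rows):
    """Assumed cluster purity: 1 - total rough entropy / (K * log m)."""
    m = len(rows)
    if m < 2:
        return 1.0
    tre = -sum((c / m) * log(c / m)
               for col in zip(*rows) for c in Counter(col).values())
    return 1.0 - tre / (len(rows[0]) * log(m))

data = {
    "x1": ("Korea,Seoul", 1988, "Korean"),
    "x2": ("Korea,Busan", 1988, "Korean"),
    "x3": ("Korea,Seoul", 1988, "Chinese"),
    "x4": ("Korea,Seoul", 1988, "Chinese"),
    "x5": ("Germany,Berlin", 1988, "German"),
    "x6": ("Germany,Munich", 1988, "German"),
    "x7": ("Germany,Berlin", 1999, "German"),
    "x8": ("Korea,Seoul", 1988, "Korean"),
    "x9": ("Korea,Busan", 2000, "American"),
    "x10": ("Germany,Munich", 1999, "German"),
    "x11": ("Korea,Incheon", 2000, "American"),
    "x12": ("Germany,Munich", 1988, "German"),
}
core = "x1"
# Purity of each two-point cluster {core, other}, rounded to three decimals.
pairwise = {n: round(purity([data[core], row]), 3)
            for n, row in data.items() if n != core}
ranked = sorted(pairwise, key=pairwise.get, reverse=True)
```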
Table 4 Data points in cluster with two sorted data points
User   Live location  Birth year  Citizenship
x_{1}  Korea,Seoul    1988        Korean
x_{8}  Korea,Seoul    1988        Korean
Analogously, we calculate the cluster purity as 1.0. This means that all points in the cluster C_{i} are identical to each other, i.e., the cluster is “clean”. Therefore, add data point x_{8} into cluster C_{i}.
Table 5 Data points in cluster with 4 sorted data points
User   Live location  Birth year  Citizenship
x_{1}  Korea,Seoul    1988        Korean
x_{8}  Korea,Seoul    1988        Korean
x_{2}  Korea,Busan    1988        Korean
x_{3}  Korea,Seoul    1988        Chinese
Therefore, add data point x_{3} into cluster C_{i}.
After adding data point x_{4}, the cluster C_{i} is U={x_{1},x_{8},x_{2},x_{3},x_{4}}, and the corresponding cluster purity is 0.757. Although this value is above the threshold λ, it increased after adding x_{4} into cluster C_{i}. This means that x_{4} is similar to other points (such as x_{3}) rather than to the core data point x_{1} of the DBSCAN method. Therefore, we stop this procedure and obtain one cluster that contains {x_{1},x_{8},x_{2},x_{3}}. Similarly, the clusters of all other core points can be found by repeating the above method.
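The sequence of purity values during this growth step can be traced with the sketch below. It assumes that cluster purity is 1 minus the total rough entropy divided by K×log(m); under this assumption the final value matches the 0.757 stated above, and the intermediate values show the decrease followed by the increase that triggers the stop.

```python
from collections import Counter
from math import log

def purity(rows):
    """Assumed cluster purity: 1 - total rough entropy / (K * log m)."""
    m = len(rows)
    if m < 2:
        return 1.0
    tre = -sum((c / m) * log(c / m)
               for col in zip(*rows) for c in Counter(col).values())
    return 1.0 - tre / (len(rows[0]) * log(m))

data = {
    "x1": ("Korea,Seoul", 1988, "Korean"),
    "x8": ("Korea,Seoul", 1988, "Korean"),
    "x2": ("Korea,Busan", 1988, "Korean"),
    "x3": ("Korea,Seoul", 1988, "Chinese"),
    "x4": ("Korea,Seoul", 1988, "Chinese"),
}
order = ["x1", "x8", "x2", "x3", "x4"]  # core point, then the sorted candidates
# Purity after the cluster has grown to 2, 3, 4 and 5 points.
trace = [round(purity([data[n] for n in order[:size]]), 3)
         for size in range(2, len(order) + 1)]
```

The trace falls from 1.0 to 0.807 and 0.730 while x8, x2, and x3 are added, and then rises to 0.757 when x4 is tried, so x4 is rejected for this core point.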
Finally, following the DBSCAN algorithm, merge these small clusters if they have common data points. The result of our proposed algorithm is therefore C_{1}={x_{1},x_{2},x_{3},x_{4},x_{8}}, C_{2}={x_{5},x_{6},x_{7},x_{10},x_{12}}, and C_{3}={x_{9},x_{11}}.
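The merging step can be sketched as repeatedly uniting clusters that share at least one point. In the example below only the first per-core cluster, {x1, x8, x2, x3}, is taken from the text; the other four are hypothetical inputs chosen for illustration so that the merged result equals C1, C2, and C3. The function name is also an assumption.

```python
def merge_overlapping(clusters):
    """Merge clusters that share at least one point."""
    merged = []  # invariant: the groups in `merged` are pairwise disjoint
    for cluster in map(set, clusters):
        # Absorb every already-merged group that overlaps the new cluster.
        overlapping = [group for group in merged if group & cluster]
        for group in overlapping:
            cluster |= group
            merged.remove(group)
        merged.append(cluster)
    return merged

per_core_clusters = [
    {"x1", "x8", "x2", "x3"},   # from the worked example
    {"x3", "x4"},               # hypothetical
    {"x5", "x6", "x12"},        # hypothetical
    {"x7", "x10", "x5"},        # hypothetical
    {"x9", "x11"},              # hypothetical
]
final_clusters = merge_overlapping(per_core_clusters)
```

Because the merged groups are kept pairwise disjoint, one pass over the per-core clusters is enough; no fixed-point iteration is needed.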
The data set in Table 2 is given for illustrating the behavior of our mobile social network data clustering algorithm. We use the rough entropy method to calculate the similarity between data points and employ the basic idea of the DBSCAN method to merge clusters that have common data points into the target clusters. Our algorithm analyzes the change of cluster purity when attempting to add a new data point to the cluster. If the reduction of cluster purity stays within the acceptable level λ after adding a data point, the data point is added into the cluster. Analogously, if the number of instances in one cluster is below the minimum threshold value v, this cluster is merged into another cluster by comparing the cluster purity.
5 Experimental results
According to the above measure, higher values of overall purity and local purity indicate a better clustering algorithm. If both values are 1, the clustering algorithm assigns every data point to its corresponding cluster. In this section, our proposed algorithm is compared with six algorithms on the Soybean and Zoo data sets to assess its superiority and efficiency.
Table 6 Local purity of clusters in Soybean data
Clusters  C_{1}  C_{2}  C_{3}  C_{4}  Sum  Purity
1         10     0      0      0      10   1
2         0      10     0      0      10   1
3         0      0      10     0      10   1
4         0      0      0      17     17   1
Table 6 contains information about the actual and predicted data distribution produced by a clustering algorithm. Each column of the table (C_{1},C_{2},...,C_{4}) represents the instances in an actual cluster, and each row (1, 2,..., 4) represents the instances in a cluster predicted by the clustering algorithm for the Soybean data set.
As shown in Table 6, all 47 objects are correctly classified into their corresponding clusters. Thus, the local and global purity of the clusters is 100%. Kim et al. [9], Kumar et al. [10], Tripathy et al. [11], and Park et al. [12] have evaluated different categorical data clustering algorithms on this data set. In this research, our algorithm is applied to the same data set, so we can compare it with these algorithms to check its superiority and efficiency.
Table 7 Local purity of clusters in Zoo data
Clusters  C_{1}  C_{2}  C_{3}  C_{4}  C_{5}  C_{6}  C_{7}  Sum  Purity
1         27     0      0      0      0      0      0      27   1
2         14     0      0      0      0      0      0      14   1
3         0      0      2      13     0      0      0      15   0.867
4         0      0      0      0      0      0      7      7    1
5         0      20     0      0      0      0      1      21   0.952
6         0      0      0      0      0      8      2      10   0.8
7         0      0      3      0      4      0      0      7    0.571
Similar to Table 6, each column of Table 7 (C_{1},C_{2},...,C_{7}) represents the instances in an actual cluster, and each row (1, 2,..., 7) represents the instances in a cluster predicted by the clustering algorithm for the Zoo data set.
Of the 101 objects, 93 are correctly classified into their corresponding clusters. Thus, the global purity of the clusters is 92%.
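The purity figures in the Zoo table can be recomputed from the counts. The sketch assumes the usual definitions, which are consistent with the tabulated values: the local purity of a predicted cluster is the share of its largest actual class, and the global purity is the sum of these largest counts divided by the number of objects. The function names are illustrative.

```python
def local_purity(row):
    """Share of the dominant actual class within one predicted cluster."""
    return max(row) / sum(row)

def overall_purity(table):
    """Sum of dominant-class counts divided by the total number of objects."""
    return sum(max(row) for row in table) / sum(sum(row) for row in table)

# Rows are predicted clusters 1..7, columns are actual clusters C1..C7.
zoo = [
    [27, 0, 0, 0, 0, 0, 0],
    [14, 0, 0, 0, 0, 0, 0],
    [0, 0, 2, 13, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 7],
    [0, 20, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 8, 2],
    [0, 0, 3, 0, 4, 0, 0],
]
zoo_local = [round(local_purity(row), 3) for row in zoo]
zoo_global = overall_purity(zoo)
```

The dominant counts are 27, 14, 13, 7, 20, 8, and 4, which sum to the 93 correctly classified objects out of 101.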
The comparison of global and local purity is shown as follows:
From the above two examples, we conclude that our algorithm achieves the highest purity among the compared algorithms for categorical data clustering. In particular, our experiments show that it is more efficient than MMeR, SDR, and ITDR, which employ rough set theory for clustering. Therefore, our clustering algorithm is suitable for k-anonymization; it can reduce the information loss and provide more useful mobile social network data to governments or companies.
6 Conclusions
Social networks are growing rapidly with the development of mobile devices. These devices supply valuable user information to the social network, including the user's geographical location coordinates. Meanwhile, mobile social network data provide interesting opportunities to researchers and/or companies in many disciplines, such as sociology, psychology, marketing, or habit research. Therefore, these data must be anonymized before they are published by mobility data managers. To balance the utility of these data against the performance of anonymization, we propose a new clustering method for anonymizing mobile social network data. Our algorithm requires neither initial points nor the number of clusters from users, and it can handle the impreciseness of data. The theoretical studies and experimental results show that our method is more efficient than most of the related algorithms, including MMeR, SDR, and ITDR, which are based on rough set theory. We believe that our approach is useful and credible for anonymization. Future work will focus on gathering high-dimensional mobile social network data to assess the feasibility of the proposed approach.
Declarations
Acknowledgements
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B02011964).
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
1. L Sweeney, k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowledge-Based Syst. 10(05), 557–570 (2002).
2. L Sweeney, Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertain. Fuzziness Knowledge-Based Syst. 10(05), 571–588 (2002).
3. G Aggarwal, T Feder, K Kenthapadi, S Khuller, R Panigrahy, D Thomas, A Zhu, in Proceedings of the Twenty-fifth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. Achieving anonymity via clustering (ACM, New York, 2006), pp. 153–162.
4. X Xu, M Numao, in 2015 Third International Symposium on Computing and Networking (CANDAR). An efficient generalized clustering method for achieving k-anonymization (IEEE, Sapporo, 2015), pp. 499–502.
5. G Panda, B Tripathy, S Jha, Security aspects in mobile cloud social network services. Int. J. 2(1) (2011).
6. JL Lin, MC Wei, in Proceedings of the 2008 International Workshop on Privacy and Anonymity in Information Society. An efficient clustering method for k-anonymization (ACM, New York, 2008), pp. 46–50.
7. JW Byun, A Kamra, E Bertino, N Li, in Advances in Databases: Concepts, Systems and Applications. Efficient k-anonymization using clustering techniques (Springer, Berlin, 2007), pp. 188–200.
8. Z Huang, Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining Knowl. Discov. 2(3), 283–304 (1998).
9. DW Kim, KH Lee, D Lee, Fuzzy clustering of categorical data using fuzzy centroids. Pattern Recognit. Lett. 25(11), 1263–1271 (2004).
10. P Kumar, B Tripathy, MMeR: an algorithm for clustering heterogeneous data using rough set theory. Int. J. Rapid Manuf. 1(2), 189–207 (2009).
11. B Tripathy, A Ghosh, in Recent Advances in Intelligent Computational Systems (RAICS), 2011 IEEE. SDR: an algorithm for clustering categorical data using rough set theory (IEEE, Trivandrum, 2011), pp. 867–872.
12. IK Park, GS Choi, Rough set approach for clustering categorical data using information-theoretic dependency measure. Inf. Syst. 48, 289–295 (2015).
13. M Ester, HP Kriegel, J Sander, X Xu, in KDD, 96. A density-based algorithm for discovering clusters in large spatial databases with noise (AAAI, Portland, 1996), pp. 226–231.
14. Z Pawlak, Rough sets. Int. J. Comput. Inf. Sci. 11(5), 341–356 (1982).
15. Z Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data, vol. 9 (Springer Science & Business Media, Berlin, 2012).
16. TY Lin, N Cercone, Rough Sets and Data Mining: Analysis of Imprecise Data (Springer Science & Business Media, Berlin, 2012).
17. Y Kaya, M Uyar, A hybrid decision support system based on rough set and extreme learning machine for diagnosis of hepatitis disease. Appl. Soft Comput. 13(8), 3429–3438 (2013).
18. BB Nair, V Mohandas, N Sakthivel, A decision tree rough set hybrid system for stock market trend prediction. Int. J. Comput. Appl. 6(9), 1–6 (2010).
19. Y Qian, J Liang, W Pedrycz, C Dang, Positive approximation: an accelerator for attribute reduction in rough set theory. Artif. Intell. 174(9), 597–618 (2010).
20. CE Shannon, A mathematical theory of communication. ACM SIGMOBILE Mobile Comput. Commun. Rev. 5(1), 3–55 (2001).
21. F Jiang, Y Sui, C Cao, An information entropy-based approach to outlier detection in rough sets. Expert Syst. Appl. 37(9), 6338–6344 (2010).
22. X Li, F Rao, An rough entropy based approach to outlier detection. J. Comput. Inf. Syst. 8(24), 10501–10508 (1050).
23. HV Reddy, S Viswanadha Raju, P Agrawal, in Advances in Computing, Communications and Informatics (ICACCI), 2013 International Conference On. Data labeling method based on cluster purity using relative rough entropy for categorical data clustering (IEEE, Mysore, 2013), pp. 500–506.
24. L Polkowski, S Tsumoto, TY Lin, Rough set methods and applications: new developments in knowledge discovery in information systems, vol. 56, Physica (Springer Science & Business Media, Berlin, 2012).
25. F Jiang, Y Sui, C Cao, A rough set approach to outlier detection. Int. J. General Syst. 37(5), 519–536 (2008).
26. F Jiang, Z Zhao, Y Ge, A supervised and multivariate discretization algorithm for rough sets, in Rough Set and Knowledge Technology (Springer, Berlin, Heidelberg, 2010).
27. Visualizing DBSCAN Clustering. http://www.naftaliharris.com/blog/visualizingdbscanclustering/. Accessed Jan 2016.
28. V Panchami, N Radhika, in Applications of Digital Information and Web Technologies (ICADIWT), 2014 Fifth International Conference on The. A novel approach for predicting the length of hospital stay with DBSCAN and supervised classification algorithms (IEEE, Bangalore, 2014), pp. 207–212.