A local-density based spatial clustering algorithm with noise

被引:172
作者
Duan, Lian [1 ]
Xu, Lida
Guo, Feng
Lee, Jun
Yan, Baopin
机构
[1] Chinese Acad Sci, Comp Network Informat Ctr, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[3] Zhejiang Univ, Hangzhou 310027, Peoples R China
[4] Old Dominion Univ, Norfolk, VA 23529 USA
关键词
data mining; local outlier factor; local reachability density; local-density-based clustering;
D O I
10.1016/j.is.2006.10.006
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Density-based clustering algorithms are attractive for the task of class identification in spatial database. However, in many cases, very different local-density clusters exist in different regions of data space, therefore, DBSCAN method [M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: E. Simoudis, J. Han, U.M. Fayyad (Eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI, Menlo Park, CA, 1996, pp. 226-231] using a global density parameter is not suitable. Although OPTICS [M. Ankerst, M.M. Breunig, H.-P. Kriegel, J. Sander, OPTICS: ordering points to identify the clustering structure, in: A. Delis, C. Faloutsos, S. Ghandeharizadeh (Eds.), Proceedings of ACM SIGMOD International Conference on Management of Data Philadelphia, PA, ACM, New York, 1999, pp. 49-60] provides an augmented ordering of the database to represent its density-based clustering structure, it only generates the clusters with local-density exceeds certain thresholds but not the cluster of similar local-density; in addition, it does not produce clusters of a data set explicitly. Furthermore, the parameters required by almost all the major clustering algorithms are hard to determine although they significantly impact on the clustering result. In this paper, a new clustering algorithm LDBSCAN relying on a local-density-based notion of clusters is proposed. In this technique, the selection of appropriate parameters is not difficult, it also takes the advantage of the LOF [M.M. Breunig, H.-P. Kriegel, R.T. Ng, J. Sander, LOF: identifying density-based local outliers, in: W. Chen, J.F. Naughton, P.A. Bernstein (Eds.), Proceedings of ACM SIGMOD International Conference on Management of Data, Dalles, TX, ACM, New York, 2000, pp. 93-104] to detect the noises comparing with other density-based clustering algorithms. The proposed algorithm has potential applications in business intelligence. (c) 2006 Elsevier B.V. All rights reserved.
引用
收藏
页码:978 / 986
页数:9
相关论文
共 13 条
[1]  
Ankerst M, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P49
[2]  
Berchtold S, 1996, PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P28
[3]   LOF: Identifying density-based local outliers [J].
Breunig, MM ;
Kriegel, HP ;
Ng, RT ;
Sander, J .
SIGMOD RECORD, 2000, 29 (02) :93-104
[4]  
Ester M., 1996, P 2 INT C KNOWL DISC, P226, DOI DOI 10.5555/3001460.3001507
[5]  
Hinneburg A., 1998, Proceedings Fourth International Conference on Knowledge Discovery and Data Mining, P58
[6]  
Jain K, 1988, Algorithms for clustering data
[7]   Feature space theory in data mining: transformations between extensions and intensions in knowledge representation [J].
Li, HX ;
Xu, LD ;
Wang, JY ;
Mo, ZW .
EXPERT SYSTEMS, 2003, 20 (02) :60-71
[8]   Feature space theory - a mathematical foundation for data mining [J].
Li, HX ;
Xu, LD .
KNOWLEDGE-BASED SYSTEMS, 2001, 14 (5-6) :253-257
[9]   A new shifting grid clustering algorithm [J].
Ma, EWM ;
Chow, TWS .
PATTERN RECOGNITION, 2004, 37 (03) :503-514
[10]  
Ma S, 2003, LECT NOTES COMPUT SC, V2762, P214