Cluster-based outlier detection

Cited by: 151
Authors
Duan, Lian [1]
Xu, Lida [2, 3]
Liu, Ying [4]
Lee, Jun [5]
Affiliations
[1] Univ Iowa, Dept Management Sci, Iowa City, IA 52242 USA
[2] Beijing Jiaotong Univ, Coll Econ & Management, Beijing 100044, Peoples R China
[3] Old Dominion Univ, Dept Informat Technol & Decis Sci, Norfolk, VA 23529 USA
[4] Chinese Acad Sci, Res Ctr Fictitious Econ & Data Sci, Beijing, Peoples R China
[5] Chinese Acad Sci, China Sci & Technol Network, Beijing, Peoples R China
Keywords
Outlier detection; Cluster-based outlier; LDBSCAN; Local outlier factor; FEATURE SPACE THEORY;
DOI
10.1007/s10479-008-0371-9
Chinese Library Classification
C93 [Management Science]; O22 [Operations Research];
Discipline classification codes
070105; 12; 1201; 1202; 120202;
Abstract
Outlier detection has important applications in data mining, such as fraud detection, customer behavior analysis, and intrusion detection. It is the process of detecting data objects that are grossly different from or inconsistent with the rest of the data. Outliers are traditionally considered to be single points; however, a key observation is that many abnormal events have both temporal and spatial locality and may form small clusters that also need to be deemed outliers. In other words, not only a single point but also a small cluster can be an outlier. In this paper, we present a new definition of outliers, the cluster-based outlier, which is meaningful and gives importance to local data behavior, and show how to detect such outliers with the clustering algorithm LDBSCAN (Duan et al. in Inf. Syst. 32(7):978-986, 2007), which is capable of finding clusters and assigning LOF (Breunig et al. in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, ACM Press, pp. 93-104, 2000) to single points.
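The observation in the abstract, that a point-wise score such as LOF can overlook a small cluster, can be illustrated with a minimal brute-force sketch of LOF as defined by Breunig et al. (2000). This is not the paper's LDBSCAN algorithm or its cluster-based outlier definition; the toy data, the helper names `lof_scores` and `flagged`, the neighbourhood sizes, and the threshold of 1.5 are all assumptions chosen for illustration.

```python
import math

def lof_scores(points, k):
    """Brute-force local outlier factor for every point (assumes no duplicates)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    k_dist, neigh = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]                      # distance to the k-th nearest neighbour
        k_dist.append(kd)
        neigh.append([j for d, j in others if d <= kd])  # ties at the k-distance are kept

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], dist[i][j]) for j in neigh[i]]
        return len(reach) / sum(reach)

    lrds = [lrd(i) for i in range(n)]
    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(lrds[j] for j in neigh[i]) / (len(neigh[i]) * lrds[i])
            for i in range(n)]

def flagged(points, k, threshold=1.5):
    """Indices whose LOF exceeds an (assumed) threshold."""
    return [i for i, s in enumerate(lof_scores(points, k)) if s > threshold]

# Toy data (hypothetical): a dense 5x5 grid, one isolated point, one tiny cluster.
grid = [(x, y) for x in range(5) for y in range(5)]   # indices 0..24
lone = [(20, 20)]                                     # index 25
small = [(-20, -20), (-20, -19), (-19, -20)]          # indices 26..28
points = grid + lone + small

print(flagged(points, k=2))   # small neighbourhood
print(flagged(points, k=5))   # neighbourhood larger than the tiny cluster
```

With `k=2` each member of the three-point cluster finds its neighbours inside that cluster, so its LOF stays close to 1 and only the isolated point is flagged. With `k=5` the neighbourhood reaches past the small cluster and its three points are flagged as well. The result therefore depends on choosing a neighbourhood size larger than the anomalous cluster, which is the kind of case a cluster-level definition is meant to address.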
Pages: 151-168
Page count: 18
Related papers
35 in total
[1] Agrawal R., 1998, SIGMOD Record, V27, P94, DOI 10.1145/276305.276314
[2] Ankerst M, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P49
[3] [Anonymous], 1980, IDENTIFICATION OUTLI, DOI 10.1007/978-94-015-3994-4
[4] [Anonymous], 2011, Pei. data mining concepts and techniques
[5] Barnett V., 1994, Wiley series in probability and mathematical statistics applied probability and statistics, P224
[6] Beyer K, 1999, LECT NOTES COMPUT SC, V1540, P217
[7] Breunig, MM; Kriegel, HP; Ng, RT; Sander, J. LOF: Identifying density-based local outliers [J]. SIGMOD RECORD, 2000, 29 (02): 93-104
[8] CARVALHO RA, 2007, ENTERPRISE INFORM SY, V1, P197, DOI 10.1080/17517570701356208
[9] Crovella, ME; Bestavros, A. Self-similarity in World Wide Web traffic: Evidence and possible causes [J]. IEEE-ACM TRANSACTIONS ON NETWORKING, 1997, 5 (06): 835-846
[10] Duan, Lian; Xu, Lida; Guo, Feng; Lee, Jun; Yan, Baopin. A local-density based spatial clustering algorithm with noise [J]. INFORMATION SYSTEMS, 2007, 32 (07): 978-986