An effective and efficient algorithm for high-dimensional outlier detection

Cited by: 224
Authors
Aggarwal, CC [1]
Yu, PS [1]
Affiliation
[1] IBM TJ Watson Res Ctr, Hawthorne, NY 10532 USA
Keywords
data mining; high-dimensional spaces; outlier detection
DOI
10.1007/s00778-004-0125-5
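Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812 [Computer science and technology]
Abstract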
The outlier detection problem has important applications in the field of fraud detection, network robustness analysis, and intrusion detection. Most such applications are most important for high-dimensional domains in which the data can contain hundreds of dimensions. Many recent algorithms have been proposed for outlier detection that use several concepts of proximity in order to find the outliers based on their relationship to the other points in the data. However, in high-dimensional space, the data are sparse and concepts using the notion of proximity fail to retain their effectiveness. In fact, the sparsity of high-dimensional data can be understood in a different way so as to imply that every point is an equally good outlier from the perspective of distance-based definitions. Consequently, for high-dimensional data, the notion of finding meaningful outliers becomes substantially more complex and nonobvious. In this paper, we discuss new techniques for outlier detection that find the outliers by studying the behavior of projections from the data set.
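The abstract's claim that distance-based definitions lose their meaning in sparse high-dimensional data can be illustrated with a small experiment. The sketch below is not the authors' algorithm. It assumes points drawn uniformly at random from the unit hypercube, and the function name `relative_contrast` and its parameters are hypothetical, chosen only for this illustration. It measures how far the farthest point is from a query relative to the nearest point; as the dimension grows this ratio tends to shrink, so no point stands out by distance alone.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (max distance - min distance) / min distance from a random query.

    Assumption for illustration: query and data points are uniform in [0, 1]^dim.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # Print the contrast for a few dimensionalities to compare them.
    for dim in (2, 10, 100, 500):
        print(dim, round(relative_contrast(dim), 3))
```

A low contrast value means the nearest and farthest neighbors are almost equally far away, which is the situation in which every point looks like an equally good distance-based outlier.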
Pages: 211-221
Page count: 11