Outlier detection for high dimensional data

Cited by: 158
Authors
Aggarwal, CC [1]
Yu, PS [1]
Affiliation
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Heights, NY 10598 USA
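Keywords
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 [Computer Science and Technology];
Abstract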
The outlier detection problem has important applications in the field of fraud detection, network robustness analysis, and intrusion detection. Most such applications are high dimensional domains in which the data can contain hundreds of dimensions. Many recent algorithms use concepts of proximity in order to find outliers based on their relationship to the rest of the data. However, in high dimensional space, the data is sparse and the notion of proximity fails to retain its meaningfulness. In fact, the sparsity of high dimensional data implies that every point is an almost equally good outlier from the perspective of proximity-based definitions. Consequently, for high dimensional data, the notion of finding meaningful outliers becomes substantially more complex and non-obvious. In this paper, we discuss new techniques for outlier detection which find the outliers by studying the behavior of projections from the data set.
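The abstract's claim that proximity loses meaning in high dimensions can be illustrated with a small experiment. The following is a minimal sketch, not the method of the paper: it assumes uniformly random points in the unit hypercube and measures the relative contrast (d_max - d_min) / d_min of Euclidean distances from one query point. The function name `relative_contrast` and its parameters are hypothetical, chosen only for this illustration.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points uniform random points in [0, 1]^dim.

    Assumption for illustration only: uniform, independent coordinates.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # The contrast is expected to shrink as the dimensionality grows.
    for dim in (2, 10, 100, 1000):
        print(dim, relative_contrast(500, dim))
```

Under these assumptions the contrast is large in two dimensions and small in a thousand, so the nearest and farthest points become hard to tell apart by distance alone. This is one way to read the abstract's remark that every point looks like an almost equally good outlier under proximity-based definitions.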
Pages: 37-46
Number of pages: 10