Conditional anomaly detection

Cited by: 183
Authors
Song, Xiuyao [1]
Wu, Mingxi [1]
Jermaine, Christopher [1]
Ranka, Sanjay [1]
Affiliations
[1] Univ Florida, Comp & Informat Sci & Engn Dept, Gainesville, FL 32611 USA
Funding
US National Science Foundation
Keywords
data mining; mining methods and algorithms; likelihood
DOI
10.1109/TKDE.2007.1009
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
When anomaly detection software is used as a data analysis tool, finding the hardest-to-detect anomalies is not the most critical task. Rather, it is often more important to ensure that the anomalies reported to the user are in fact interesting. If too many unremarkable data points are returned to the user labeled as candidate anomalies, the software will soon fall into disuse. One way to ensure that returned anomalies are useful is to make use of domain knowledge provided by the user. Often, the data in question includes a set of environmental attributes whose values a user would never consider to be directly indicative of an anomaly. However, such attributes cannot be ignored, because they have a direct effect on the expected distribution of the result attributes whose values can indicate an anomalous observation. This paper describes a general-purpose method called conditional anomaly detection for taking such differences among attributes into account, and proposes three different expectation-maximization algorithms for learning the model used in conditional anomaly detection. Experiments with more than 13 different data sets compare our algorithms with several other more standard methods for outlier or anomaly detection.
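The environmental-versus-result distinction in the abstract can be illustrated with a minimal sketch. This is not the paper's model (which uses Gaussian mixtures learned by EM); it is a simplified single-Gaussian version of the same idea: instead of scoring a point by its marginal likelihood, score the result attribute `y` by its likelihood *conditional on* the environmental attribute `x`. The synthetic data, variable names, and the linear relationship `y ≈ 2x` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: environmental attribute x (e.g., outdoor temperature)
# and a result attribute y that depends on x (e.g., power consumption).
n = 500
x = rng.normal(20.0, 5.0, size=n)
y = 2.0 * x + rng.normal(0.0, 1.0, size=n)

# Inject one conditional anomaly: y = 40 is unremarkable marginally,
# but far from the ~60 expected when x = 30.
x_all = np.append(x, 30.0)
y_all = np.append(y, 40.0)

# Fit a joint Gaussian and derive the conditional p(y | x).
data = np.column_stack([x_all, y_all])
mu = data.mean(axis=0)
cov = np.cov(data, rowvar=False)

# Conditional mean and variance of y given x for a bivariate Gaussian.
beta = cov[1, 0] / cov[0, 0]
cond_mean = mu[1] + beta * (x_all - mu[0])
cond_var = cov[1, 1] - beta * cov[0, 1]

# Conditional log-likelihood of each y given its x; low values are
# conditional anomalies even if y is common in the marginal sense.
log_lik = -0.5 * (np.log(2 * np.pi * cond_var)
                  + (y_all - cond_mean) ** 2 / cond_var)

print(int(np.argmin(log_lik)))  # index of the injected point: 500
```

A marginal detector looking only at `y` would not flag the injected point, since `y = 40` is well inside the overall range of `y`; conditioning on the environmental attribute is what exposes it.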
Pages: 631-645
Page count: 15