A survey of outlier detection methodologies

Cited by: 1968
Authors
Hodge V.J. [1]
Austin J. [1]
Affiliation
[1] Department of Computer Science, University of York, York
Keywords
Anomaly; Detection; Deviation; Noise; Novelty; Outlier; Recognition
DOI
10.1023/B:AIRE.0000045502.10941.a9
Abstract
Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques are now used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.
Pages: 85-126
Page count: 41