A survey of outlier detection methodologies

Cited by: 1968
Authors
Hodge V.J. [1]
Austin J. [1]
Affiliation
[1] Department of Computer Science, University of York, York
Keywords
Anomaly; Detection; Deviation; Noise; Novelty; Outlier; Recognition
DOI
10.1023/B:AIRE.0000045502.10941.a9
Abstract
Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise from mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error, or simply natural deviations in populations. Detecting them can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques are now used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.
Pages: 85-126
Page count: 41