BACON: blocked adaptive computationally efficient outlier nominators

Cited by: 356
Authors
Billor, N
Hadi, AS
Velleman, PF
Affiliations
[1] Cukurova Univ, Dept Math, Adana, Turkey
[2] Cornell Univ, Dept Stat Sci, Ithaca, NY 14853 USA
Keywords
data mining; Mahalanobis distance; multivariate outliers; outlier detection; prediction error; regression outliers; residuals; robust distance; robust statistics
DOI
10.1016/S0167-9473(99)00101-2
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Although it is customary to assume that data are homogeneous, in fact, they often contain outliers or subgroups. Methods for identifying multiple outliers and subgroups must deal with the challenge of establishing a metric that is not itself contaminated by inhomogeneities by which to measure how extraordinary a data point is. For samples of a sufficient size to support sophisticated methods, the computation cost often makes outlier detection unattractive. All multiple outlier detection methods have suffered in the past from a computational cost that escalated rapidly with the sample size. We propose a new general approach, based on the methods of Hadi (1992a, 1994) and Hadi and Simonoff (1993), that can be computed quickly - often requiring less than five evaluations of the model being fit to the data, regardless of the sample size. Two cases of this approach are presented in this paper (algorithms for the detection of outliers in multivariate and regression data). The algorithms, however, can be applied more broadly than to these two cases. We show that the proposed methods match the performance of more computationally expensive methods on standard test problems and demonstrate their superior performance on large simulated challenges. © 2000 Elsevier Science B.V. All rights reserved.
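The idea summarized in the abstract (grow a clean "basic subset" and measure every point against it) can be illustrated with a small sketch. This is not the authors' code: it is a simplified, bivariate-only illustration that omits the small-sample correction factor used in the paper, and the function names, the initial subset size `m`, and the choice of starting from the points nearest the coordinate-wise median are all assumptions made for this example. For two dimensions the chi-square quantile has a closed form, so no external library is needed.

```python
import math
import random
import statistics


def mean_cov_2d(points):
    """Sample mean and covariance (sxx, sxy, syy) of bivariate points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def sq_mahalanobis(point, center, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def nominate_outliers_2d(points, m=8, alpha=0.05, max_iter=50):
    """Hypothetical helper: iteratively grow a basic subset, then flag the rest."""
    n = len(points)
    # Assumed starting rule: the m points closest to the coordinate-wise median.
    med = (statistics.median(x for x, _ in points),
           statistics.median(y for _, y in points))
    order = sorted(range(n), key=lambda i: math.dist(points[i], med))
    subset = sorted(order[:m])
    # Chi-square quantile with 2 degrees of freedom at level 1 - alpha/n
    # (closed form; the paper's correction factor is deliberately left out).
    cutoff = -2.0 * math.log(alpha / n)
    for _ in range(max_iter):
        center, cov = mean_cov_2d([points[i] for i in subset])
        if cov[0] * cov[2] - cov[1] ** 2 <= 0.0:
            break  # degenerate subset: stop rather than divide by zero
        new_subset = [i for i in range(n)
                      if sq_mahalanobis(points[i], center, cov) < cutoff]
        if len(new_subset) < 3 or new_subset == subset:
            break  # subset is stable (or too small to estimate a covariance)
        subset = new_subset
    kept = set(subset)
    return subset, [i for i in range(n) if i not in kept]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    data += [(10 + 0.1 * k, 10 - 0.1 * k) for k in range(5)]  # planted outliers
    basic, flagged = nominate_outliers_2d(data)
    print(len(basic), flagged)
```

Each pass fits only a mean and a covariance to the current subset, which is why the cost per iteration stays linear in the sample size; the number of passes in this simplified version may differ from what the paper reports.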
Pages: 279-298
Page count: 20
Related papers
48 in total
  • [1] FAST VERY ROBUST METHODS FOR THE DETECTION OF MULTIPLE OUTLIERS
    ATKINSON, AC
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1994, 89 (428) : 1329 - 1339
  • [2] Atkinson AC., 1985, Plots, transformations and regression
  • [3] an introduction to graphical methods of diagnostic regression analysis
  • [4] Bacon Francis., 1994, NOVUM ORGANUM
  • [5] Barnett V., 1984, Outliers in Statistical Data, 2nd ed.
  • [6] BARRETT BE, 1997, P STAT COMP SECT AM, P130
  • [7] Belsley D.A., 1980, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity
  • [8] Chatterjee S., 1988, Sensitivity Analysis in Linear Regression, DOI 10.1002/9780470316764
  • [9] Cook R. D., 1982, RESIDUALS INFLUENCE
  • [10] COOK RD, 1990, J AM STAT ASSOC, V85, P640, DOI 10.2307/2289996