A fast algorithm for the minimum covariance determinant estimator

Cited by: 1667
Authors
Rousseeuw, PJ
Van Driessen, K
Affiliations
[1] Univ Instelling Antwerp, Dept Math & Comp Sci, B-2610 Wilrijk, Belgium
[2] Univ Faculteiten St Ignatius, Fac Appl Econ, B-2000 Antwerp, Belgium
Keywords
breakdown value; multivariate location and scatter; outlier detection; regression; robust estimation
DOI
10.2307/1270566
Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
The minimum covariance determinant (MCD) method of Rousseeuw is a highly robust estimator of multivariate location and scatter. Its objective is to find h observations (out of n) whose covariance matrix has the lowest determinant. Until now, applications of the MCD were hampered by the computation time of existing algorithms, which were limited to a few hundred objects in a few dimensions. We discuss two important applications of larger size, one about a production process at Philips with n = 677 objects and p = 9 variables, and a dataset from astronomy with n = 137,256 objects and p = 27 variables. To deal with such problems we have developed a new algorithm for the MCD, called FAST-MCD. The basic ideas are an inequality involving order statistics and determinants, and techniques which we call "selective iteration" and "nested extensions." For small datasets, FAST-MCD typically finds the exact MCD, whereas for larger datasets it gives more accurate results than existing algorithms and is faster by orders of magnitude. Moreover, FAST-MCD is able to detect an exact fit, that is, a hyperplane containing h or more observations. The new algorithm makes the MCD method available as a routine tool for analyzing multivariate data. We also propose the distance-distance plot (D-D plot), which displays MCD-based robust distances versus Mahalanobis distances, and illustrate it with some examples.
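As an illustration of the objective stated in the abstract (find the h observations out of n whose covariance matrix has the lowest determinant), the following is a minimal sketch that solves the problem by exhaustive search for bivariate data. It is not FAST-MCD and is not taken from the paper; the function names `cov_det_2d` and `exact_mcd_2d` and the toy dataset are assumptions made up for this example.

```python
from itertools import combinations

def cov_det_2d(points):
    """Determinant of the sample covariance matrix of bivariate points."""
    m = len(points)
    mean_x = sum(x for x, _ in points) / m
    mean_y = sum(y for _, y in points) / m
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (m - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (m - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (m - 1)
    return sxx * syy - sxy ** 2

def exact_mcd_2d(data, h):
    """Brute force (hypothetical helper): try every h-subset and keep the
    one whose covariance determinant is smallest."""
    best_det, best_idx = float("inf"), None
    for idx in combinations(range(len(data)), h):
        det = cov_det_2d([data[i] for i in idx])
        if det < best_det:
            best_det, best_idx = det, idx
    return best_det, best_idx

# Toy data (made up): six points near the line y = x, plus two outliers.
data = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.9), (4.0, 4.1),
        (5.0, 5.0), (10.0, -8.0), (12.0, -9.0)]
det, subset = exact_mcd_2d(data, h=6)
print(subset, det)
```

The search visits every one of the C(n, h) subsets, which is only 28 here but grows combinatorially with n; this is the kind of cost that makes exhaustive search impractical at the dataset sizes mentioned in the abstract.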
Pages: 212-223
Number of pages: 12