A fast procedure for outlier diagnostics in large regression problems

Cited by: 46
Authors
Peña, D [1]
Yohai, V
Affiliations
[1] Univ Carlos III Madrid, Dept Stat & Econometr, E-28903 Getafe, Spain
[2] Univ Buenos Aires, RA-1053 Buenos Aires, DF, Argentina
Keywords
masking; outliers; robust regression
DOI
10.2307/2670164
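Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract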
We propose a procedure for computing a fast approximation to regression estimates based on the minimization of a robust scale. The procedure can be applied with a large number of independent variables, for which the usual algorithms require infeasible or extremely costly computing time. Also, it can be incorporated in any high-breakdown estimation method and may improve it with only a little additional computing time. The procedure minimizes the robust scale over a set of tentative parameter vectors estimated by least squares after eliminating a set of possible outliers, which are obtained as follows. We represent each observation by the vector of changes of the least squares forecasts of the observation when each of the data points is deleted. Then we obtain the sets of possible outliers as the extreme points in the principal components of these vectors, or as the set of points with large residuals. The good performance of the procedure allows identification of multiple outliers, avoiding masking effects. We investigate the procedure's efficiency for robust estimation and power as an outlier detection tool in a large real dataset and in a simulation study.
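
The steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' published algorithm: the function names (`sensitivity_matrix`, `residual_scale`, `fast_robust_fit`), the cutoff of 2.5, and the use of the normalized median absolute residual as the robust scale are all assumptions made for illustration. The sketch relies on the standard least squares identity that deleting point j changes the forecast of point i by h_ij e_j / (1 - h_jj), where h_ij are hat-matrix entries and e_j is the residual of point j.

```python
import numpy as np


def sensitivity_matrix(X, y):
    """R[i, j] = change in the LS forecast of point i when point j is deleted."""
    Q, _ = np.linalg.qr(X)          # orthonormal basis of the column space of X
    H = Q @ Q.T                     # hat matrix
    e = y - H @ y                   # least squares residuals
    h = np.diag(H)                  # leverages
    # Scale column j of H by e_j / (1 - h_jj).
    return H * (e / (1.0 - h))[None, :]


def residual_scale(r):
    """Assumed robust scale: normalized median of absolute residuals."""
    return 1.4826 * np.median(np.abs(r))


def fast_robust_fit(X, y, cutoff=2.5):
    """Pick the LS fit, over candidate outlier deletions, with smallest robust scale."""
    n, p = X.shape
    R = sensitivity_matrix(X, y)
    # Principal directions of the rows of R (R has rank at most p).
    _, _, Vt = np.linalg.svd(R, full_matrices=False)

    # Candidate deletion sets; the empty set gives plain least squares.
    candidates = [np.zeros(n, dtype=bool)]
    for v in Vt[:p]:
        z = R @ v                                   # projections on one component
        dev = np.abs(z - np.median(z))
        s = 1.4826 * np.median(dev)
        if s > 0:
            candidates.append(dev / s > cutoff)     # extreme points on this component

    # Also consider the points with large least squares residuals.
    beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
    r_ls = y - X @ beta_ls
    s_ls = residual_scale(r_ls)
    if s_ls > 0:
        candidates.append(np.abs(r_ls) / s_ls > cutoff)

    best = None
    for mask in candidates:
        keep = ~mask
        if keep.sum() <= p:                         # too few points left to fit
            continue
        beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        scale = residual_scale(y - X @ beta)        # scale over all observations
        if best is None or scale < best[1]:
            best = (beta, scale, mask)
    return best
```

Calling `fast_robust_fit(X, y)` on a design matrix `X` (with an intercept column if one is wanted) returns the selected coefficient vector, its robust scale, and the boolean mask of points that were deleted before fitting. Because plain least squares is always among the candidates, the returned scale is never larger than that of the least squares fit.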
Pages: 434-445
Number of pages: 12