Identification of multiple high leverage points in logistic regression

Cited by: 12
Authors
Imon, A. H. M. Rahmatullah [1]
Hadi, Ali S. [2]
Affiliations
[1] Ball State Univ, Dept Math Sci, Muncie, IN 47306 USA
[2] Amer Univ Cairo, Dept Math, Cairo, Egypt
Keywords
logistic regression; covariates; high leverage points; masking; swamping; group deletion; robust regression; deletion median distance from the median; Monte Carlo simulation; LINEAR-REGRESSION; INFLUENTIAL OBSERVATIONS; DIAGNOSTICS
DOI
10.1080/02664763.2013.822057
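Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
070103 [Probability Theory and Mathematical Statistics]; 140311 [Social Design and Social Innovation];
Abstract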
Leverage values are used in regression diagnostics as measures of unusual observations in the X-space. Detecting high leverage points (HLP) is crucial because they are responsible for masking outliers. In linear regression, HLP are those that stand far apart from the center (mean) of the data, and hence the most extreme points in the covariate space get the highest leverage. But Hosmer and Lemeshow [Applied logistic regression, Wiley, New York, 1980] pointed out that in logistic regression the leverage measure contains a component which can make the leverage values of genuine HLP misleadingly small, which creates a problem for the correct identification of such cases. Attempts have been made to identify HLP based on the median distances from the mean, but since these methods are designed to identify a single high leverage point, they may not be very effective in the presence of multiple HLP because of masking (false-negative) and swamping (false-positive) effects. In this paper we propose a new method for the identification of multiple HLP in logistic regression, in which the suspect cases are identified by a robust group deletion technique and then confirmed using diagnostic techniques. The usefulness of the proposed method is investigated through several well-known examples and a Monte Carlo simulation.
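The contrast drawn in the abstract between linear and logistic leverage can be illustrated numerically. The sketch below is not the method proposed in the paper; it assumes the standard weighted hat matrix for logistic regression, H = V^{1/2} X (X'VX)^{-1} X' V^{1/2} with V = diag(p_i(1 - p_i)), and uses a hypothetical coefficient vector `beta` and simulated covariates chosen purely for illustration.

```python
import numpy as np

def linear_leverage(X):
    """Diagonal of the linear-regression hat matrix X (X'X)^{-1} X'."""
    XtX_inv = np.linalg.inv(X.T @ X)
    return np.einsum("ij,jk,ik->i", X, XtX_inv, X)

def logistic_leverage(X, beta):
    """Diagonal of V^{1/2} X (X'VX)^{-1} X' V^{1/2}, with V = diag(p(1-p))."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
    v = p * (1.0 - p)                        # binomial variance weights
    XtVX_inv = np.linalg.inv(X.T @ (v[:, None] * X))
    return v * np.einsum("ij,jk,ik->i", X, XtVX_inv, X)

# Simulated covariate: 30 ordinary points plus one extreme point at x = 10.
rng = np.random.default_rng(0)
x = np.append(rng.normal(size=30), 10.0)
X = np.column_stack([np.ones_like(x), x])   # intercept + one covariate

beta = np.array([0.0, 1.5])                 # hypothetical coefficients (assumption)

h_lin = linear_leverage(X)
h_log = logistic_leverage(X, beta)

print("linear leverage of extreme point:  ", h_lin[-1])
print("logistic leverage of extreme point:", h_log[-1])
```

Under these assumptions the extreme point has the largest linear leverage, while its logistic leverage is close to zero because its fitted probability is near 1 and the weight p(1 - p) almost vanishes. This is the component the abstract describes as making the leverage of genuine HLP misleadingly small.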
Pages: 2601-2616
Number of pages: 16