Stepwise selection in small data sets: A simulation study of bias in logistic regression analysis

Cited by: 362
Authors
Steyerberg, EW [1 ]
Eijkemans, MJC [1 ]
Habbema, JDF [1 ]
Affiliations
[1] Erasmus Univ, Dept Publ Hlth, Ctr Clin Decis Sci, NL-3000 DR Rotterdam, Netherlands
Keywords
regression analysis; logistic models; bias; variable selection;
DOI
10.1016/S0895-4356(99)00103-1
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)];
Abstract
Stepwise selection methods are widely applied to identify covariables for inclusion in regression models. One of the problems of stepwise selection is biased estimation of the regression coefficients. We illustrate this "selection bias" with logistic regression in the GUSTO-I trial (40,830 patients with an acute myocardial infarction). Random samples were drawn that included 3, 5, 10, 20, or 40 events per variable (EPV). Backward stepwise selection was applied in models containing 8 or 16 pre-specified predictors of 30-day mortality. We found a considerable overestimation of regression coefficients of selected covariables. The selection bias decreased with increasing EPV. For EPV 3, 10, or 40, the bias exceeded 25% for 7, 3, and 1 covariables in the 8-predictor model, respectively, when a conventional selection criterion was used (alpha = 0.05). For these EPV values, the bias was less than 20% for all covariables when no selection was applied. We conclude that stepwise selection may result in a substantial bias of estimated regression coefficients. (C) 1999 Elsevier Science Inc.
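The mechanism behind this selection bias can be illustrated with a deliberately simplified sketch. It is not the simulation reported in the abstract, which drew samples from GUSTO-I and ran backward stepwise logistic regression. Instead it assumes a single coefficient whose estimate is normally distributed around its true value and keeps the estimate only when a two-sided Wald test is significant at alpha = 0.05. The true coefficient (`TRUE_BETA`), the rule that the standard error shrinks as 1/sqrt(events), and the function name `selection_bias` are all hypothetical choices made for illustration.

```python
import random
import statistics


def selection_bias(true_beta, se, n_sim=20000, z_crit=1.96, seed=0):
    """Return (relative bias in %, fraction selected) for estimates that
    pass a significance filter. Idealized: estimate ~ Normal(true_beta, se)."""
    rng = random.Random(seed)
    selected = []
    for _ in range(n_sim):
        est = rng.gauss(true_beta, se)      # assumed sampling distribution
        if abs(est / se) > z_crit:          # two-sided Wald test, alpha = 0.05
            selected.append(est)
    if not selected:
        return float("nan"), 0.0
    rel_bias = 100.0 * (statistics.fmean(selected) - true_beta) / true_beta
    return rel_bias, len(selected) / n_sim


TRUE_BETA = 0.5       # hypothetical true log odds ratio
N_PREDICTORS = 8      # matches the 8-predictor model in the abstract

results = {}
for epv in (3, 5, 10, 20, 40):
    events = epv * N_PREDICTORS             # EPV = events / predictors
    se = 2.0 / events ** 0.5                # assumed: SE falls as 1/sqrt(events)
    results[epv] = selection_bias(TRUE_BETA, se)
    bias, frac = results[epv]
    print(f"EPV {epv:2d}: {events:3d} events, selected {frac:.0%}, "
          f"relative bias of selected estimates {bias:+.1f}%")
```

Under these assumptions the estimates that survive the filter are on average too large, and the overestimation shrinks as EPV grows, because a smaller standard error lets estimates close to the true value pass the test. The exact percentages depend entirely on the assumed coefficient and standard-error scaling and should not be compared with the figures in the abstract.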
Pages: 935-942
Number of pages: 8
Related papers
42 in total
[1]   BOOTSTRAP INVESTIGATION OF THE STABILITY OF A COX REGRESSION-MODEL [J].
ALTMAN, DG ;
ANDERSEN, PK .
STATISTICS IN MEDICINE, 1989, 8 (07) :771-783
[2]   RANDOMIZATION AND BASE-LINE COMPARISONS IN CLINICAL-TRIALS [J].
ALTMAN, DG ;
DORE, CJ .
LANCET, 1990, 335 (8682) :149-153
[3]  
[Anonymous], 1996, Multivariable Analysis
[4]  
[Anonymous], 1990, SUBSET SELECTION REG, DOI DOI 10.1007/978-1-4899-2939-6
[5]  
ATKINSON AC, 1980, BIOMETRIKA, V67, P413, DOI 10.1093/biomet/67.2.413
[6]  
BANCROFT TA, 1977, INT STAT REV, V45, P117
[7]   Model selection: An integral part of inference [J].
Buckland, ST ;
Burnham, KP ;
Augustin, NH .
BIOMETRICS, 1997, 53 (02) :603-618
[8]   MODEL UNCERTAINTY, DATA MINING AND STATISTICAL-INFERENCE [J].
CHATFIELD, C .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY, 1995, 158 :419-466
[9]   THE BOOTSTRAP AND IDENTIFICATION OF PROGNOSTIC FACTORS VIA COX PROPORTIONAL HAZARDS REGRESSION-MODEL [J].
CHEN, CH ;
GEORGE, SL .
STATISTICS IN MEDICINE, 1985, 4 (01) :39-46
[10]   Importance of events per independent variable in proportional hazards analysis .1. Background, goals, and general strategy [J].
Concato, J ;
Peduzzi, P ;
Holford, TR ;
Feinstein, AR .
JOURNAL OF CLINICAL EPIDEMIOLOGY, 1995, 48 (12) :1495-1501