Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis

Cited by: 2067
Authors
Steyerberg, EW
Harrell, FE
Borsboom, GJJM
Eijkemans, MJC
Vergouwe, Y
Habbema, JDF
Affiliations
[1] Erasmus Univ, Dept Publ Hlth, Ctr Clin Decis Sci, NL-3000 DR Rotterdam, Netherlands
[2] Univ Virginia, Dept Hlth Evaluat Sci, Div Biostat & Epidemiol, Charlottesville, VA USA
Keywords
predictive models; internal validation; logistic regression analysis; bootstrapping
DOI
10.1016/S0895-4356(01)00341-9
Chinese Library Classification
R19 [Health care organization and services (health services administration)]
Abstract
The performance of a predictive model is overestimated when simply determined on the sample of subjects that was used to construct the model. Several internal validation methods are available that aim to provide a more accurate estimate of model performance in new subjects. We evaluated several variants of split-sample, cross-validation and bootstrapping methods with a logistic regression model that included eight predictors for 30-day mortality after an acute myocardial infarction. Random samples with a size between n = 572 and n = 9165 were drawn from a large data set (GUSTO-I; n = 40,830; 2851 deaths) to reflect modeling in data sets with between 5 and 80 events per variable. Independent performance was determined on the remaining subjects. Performance measures included discriminative ability, calibration and overall accuracy. We found that split-sample analyses gave overly pessimistic estimates of performance with large variability. Cross-validation on 10% of the sample had low bias and low variability, but was not suitable for all performance measures. Internal validity could best be estimated with bootstrapping, which provided stable estimates with low bias. We conclude that split-sample validation is inefficient, and recommend bootstrapping for estimation of internal validity of a predictive logistic regression model. © 2001 Elsevier Science Inc. All rights reserved.
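The bootstrap approach recommended in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are simulated (not GUSTO-I), the only performance measure is the c-statistic (discriminative ability), and the procedure shown is the commonly used optimism-corrected bootstrap, which may differ from the specific variants the authors compared. The function names (`fit_logistic`, `c_statistic`, `bootstrap_validated_c`) are hypothetical helpers, not part of any library.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Fit a logistic regression by Newton-Raphson; returns [intercept, slopes...]."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
        w = p * (1.0 - p)
        grad = Xd.T @ (y - p)
        # Tiny ridge term only keeps the Hessian numerically invertible.
        hess = (Xd * w[:, None]).T @ Xd + ridge * np.eye(Xd.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

def linear_predictor(beta, X):
    return beta[0] + X @ beta[1:]

def c_statistic(y, score):
    """Area under the ROC curve via the rank-sum formula (ties get average ranks)."""
    y = np.asarray(y)
    _, inv, counts = np.unique(score, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inv]
    n1 = int(np.sum(y == 1))
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2.0) / (n1 * n0)

def bootstrap_validated_c(X, y, n_boot=100, seed=0):
    """Return (apparent c, optimism-corrected c) for a logistic model."""
    rng = np.random.default_rng(seed)
    n = len(y)
    apparent = c_statistic(y, linear_predictor(fit_logistic(X, y), X))
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample subjects with replacement
        Xb, yb = X[idx], y[idx]
        if yb.min() == yb.max():                # skip samples with one outcome class
            continue
        beta_b = fit_logistic(Xb, yb)           # refit the model in the bootstrap sample
        c_boot = c_statistic(yb, linear_predictor(beta_b, Xb))  # performance in bootstrap sample
        c_orig = c_statistic(y, linear_predictor(beta_b, X))    # same model on original sample
        optimism.append(c_boot - c_orig)
    return apparent, apparent - float(np.mean(optimism))

# Simulated example: 600 subjects, eight predictors (assumed effect sizes).
rng = np.random.default_rng(42)
X = rng.normal(size=(600, 8))
true_beta = np.array([0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0, 0.0])
prob = 1.0 / (1.0 + np.exp(-(-2.0 + X @ true_beta)))
y = rng.binomial(1, prob)

apparent, corrected = bootstrap_validated_c(X, y)
print(f"apparent c = {apparent:.3f}, optimism-corrected c = {corrected:.3f}")
```

In this sketch the apparent c-statistic is computed on the same subjects used to fit the model, while the corrected value subtracts the average optimism observed across bootstrap refits. A full analysis along the lines of the abstract would apply the same idea to calibration and overall accuracy measures as well.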
Pages: 774-781
Number of pages: 8