Validation and updating of predictive logistic regression models: a study on sample size and shrinkage

Cited by: 421
Authors
Steyerberg, EW
Borsboom, GJJM
van Houwelingen, HC
Eijkemans, MJC
Habbema, JDF
Affiliations
[1] Erasmus MC, Ctr Clin Decis Sci, Dept Publ Hlth, NL-3000 DR Rotterdam, Netherlands
[2] Leiden Univ, Dept Med Stat, Ctr Med, NL-2300 RA Leiden, Netherlands
Keywords
logistic regression; validation; updating; shrinkage
DOI
10.1002/sim.1844
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
A logistic regression model may be used to provide predictions of outcome for individual patients at a centre other than the one where the model was developed. When empirical data are available from this centre, the validity of predictions can be assessed by comparing observed outcomes and predicted probabilities. Subsequently, the model may be updated to improve predictions for future patients. As an example, we analysed 30-day mortality after acute myocardial infarction in a large data set (GUSTO-I, n=40830). We validated and updated a previously published model from another study (TIMI-II, n=3339) in validation samples ranging from small (200 patients, 14 deaths) to large (10000 patients, 700 deaths). Updated models were tested on independent patients. Updating methods included re-calibration (re-estimation of the intercept or slope of the linear predictor) and more structural model revisions (re-estimation of some or all regression coefficients, model extension with more predictors). We applied heuristic shrinkage approaches in the model revision methods, such that regression coefficients were shrunken towards their re-calibrated values. Parsimonious updating methods were found preferable to more extensive model revisions, which should only be attempted with relatively large validation samples in combination with shrinkage. Copyright © 2004 John Wiley & Sons, Ltd.
Pages: 2567-2586
Page count: 20
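The abstract names two re-calibration methods: re-estimating the intercept alone, and re-estimating the intercept and slope of the linear predictor. The sketch below illustrates them and is not the authors' code. It runs on simulated data rather than GUSTO-I or TIMI-II. The helper `fit_logistic`, the simulated miscalibration (true intercept 0.5, true slope 0.8), and the sample size are all assumptions made for illustration. The shrinkage step is left out because the abstract does not give its formula.

```python
import numpy as np

def sigmoid(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, offset, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression with a fixed offset,
    fitted by Newton-Raphson (hypothetical helper, numpy only)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta + offset)
        grad = X.T @ (y - p)                         # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated validation sample (assumption): lp is the linear predictor of a
# previously published model; the true log-odds at the new centre are
# 0.5 + 0.8 * lp, so the original model is miscalibrated here.
rng = np.random.default_rng(0)
n = 5000
lp = rng.normal(-2.5, 1.0, size=n)
y = rng.binomial(1, sigmoid(0.5 + 0.8 * lp)).astype(float)

ones = np.ones((n, 1))

# Re-calibration of the intercept only: the slope is held at 1 by passing
# lp as an offset, so a single correction term a0 is estimated.
a0 = fit_logistic(ones, y, offset=lp)[0]

# Re-calibration of intercept and slope: logit(p) = a + b * lp.
a, b = fit_logistic(np.column_stack([ones, lp]), y, offset=np.zeros(n))

print("intercept update:", a0)
print("intercept and slope:", a, b)
```

After the intercept-only update, the mean predicted probability equals the observed event rate in the validation sample. This follows from the score equation for the intercept. The slope estimate `b` also shows whether the original predictions were too extreme (`b` below 1) or not extreme enough (`b` above 1).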