Assessing the Performance of Prediction Models: A Framework for Traditional and Novel Measures

Cited by: 3459
Authors
Steyerberg, Ewout W. [1 ]
Vickers, Andrew J. [2 ]
Cook, Nancy R. [3 ]
Gerds, Thomas [4 ]
Gonen, Mithat [2 ]
Obuchowski, Nancy [5 ]
Pencina, Michael J. [6 ]
Kattan, Michael W. [5 ]
Affiliations
[1] Erasmus MC, Dept Publ Hlth, NL-3000 CA Rotterdam, Netherlands
[2] Mem Sloan Kettering Canc Ctr, Dept Epidemiol & Biostat, New York, NY 10021 USA
[3] Harvard Univ, Brigham & Womens Hosp, Sch Med, Boston, MA 02115 USA
[4] Univ Copenhagen, Inst Publ Hlth, Copenhagen, Denmark
[5] Cleveland Clin, Dept Quantitat Hlth Sci, Cleveland, OH 44106 USA
[6] Boston Univ, Dept Math & Stat, Boston, MA 02215 USA
Keywords
OPERATING CHARACTERISTIC CURVE; RANDOMIZED CONTROLLED-TRIALS; ROC CURVE; PROBABILISTIC DIAGNOSIS; COVARIATE ADJUSTMENT; PROGNOSTIC MODELS; TESTICULAR CANCER; CLINICAL-PRACTICE; RISK PREDICTION; MASS HISTOLOGY;
DOI
10.1097/EDE.0b013e3181c30fb2
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene];
Subject classification codes
1004 ; 120402 ;
Abstract
The performance of prediction models can be assessed using a variety of methods and metrics. Traditional measures for binary and survival outcomes include the Brier score to indicate overall model performance, the concordance (or c) statistic for discriminative ability (or area under the receiver operating characteristic [ROC] curve), and goodness-of-fit statistics for calibration. Several new measures have recently been proposed that can be seen as refinements of discrimination measures, including variants of the c statistic for survival, reclassification tables, net reclassification improvement (NRI), and integrated discrimination improvement (IDI). Moreover, decision-analytic measures have been proposed, including decision curves to plot the net benefit achieved by making decisions based on model predictions. We aimed to define the role of these relatively novel approaches in the evaluation of the performance of prediction models. For illustration, we present a case study of predicting the presence of residual tumor versus benign tissue in patients with testicular cancer (n = 544 for model development, n = 273 for external validation). We suggest that reporting discrimination and calibration will always be important for a prediction model. Decision-analytic measures should be reported if the predictive model is to be used for clinical decisions. Other measures of performance may be warranted in specific applications, such as reclassification metrics to gain insight into the value of adding a novel predictor to an established model.
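The three families of measures named in the abstract (overall performance, discrimination, and decision-analytic value) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy outcome and prediction vectors, and the convention of classifying a patient as positive when the predicted risk is at or above the threshold are all assumptions made for illustration. The net benefit follows the standard decision-curve definition, true positives per patient minus false positives per patient weighted by the odds of the threshold probability.

```python
def brier_score(y, p):
    """Overall performance: mean squared difference between the
    binary outcome (0/1) and the predicted probability."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def c_statistic(y, p):
    """Discrimination: share of (event, non-event) pairs in which the
    event has the higher predicted probability; ties count as 0.5."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


def net_benefit(y, p, threshold):
    """Decision-analytic value at one threshold probability:
    TP/n - FP/n * threshold / (1 - threshold).
    Assumption: a patient is classified positive when p >= threshold."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical toy data (not from the testicular cancer case study).
y = [1, 1, 1, 0, 0, 0, 1, 0]
p = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

print(round(brier_score(y, p), 3))  # 0.125
print(c_statistic(y, p))            # 0.9375
print(net_benefit(y, p, 0.5))       # 0.25
```

Evaluating `net_benefit` over a grid of thresholds, and comparing it with the treat-all strategy (every prediction set to 1) and the treat-none strategy (net benefit 0), would give the points of a decision curve.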
Pages: 128-138
Number of pages: 11