Simplifying a prognostic model: a simulation study based on clinical data

Cited by: 103
Authors
Ambler, G
Brady, AR
Royston, P
Affiliations
[1] UCL, Dept Stat Sci, London WC1E 7HB, England
[2] Intens Care Natl Audit & Res Ctr, London WC1H 9HR, England
[3] MRC, Clin Trials Unit, London NW1 2DA, England
Keywords
prognostic models; variable selection; penalisation; lassos; ROC; AIC
DOI
10.1002/sim.1422
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Prognostic models are designed to predict a clinical outcome in individuals or groups of individuals with a particular disease or condition. To avoid bias many researchers advocate the use of full models developed by prespecifying predictors. Variable selection is not employed and the resulting models may be large and complicated. In practice more parsimonious models that retain most of the prognostic information may be preferred. We investigate the effect on various performance measures, including mean square error and prognostic classification, of three methods for estimating full models (including penalized estimation and Tibshirani's lasso) and consider two methods (backwards elimination and a new proposal called stepdown) for simplifying full models. Simulation studies based on two medical data sets suggest that simplified models can be found that perform nearly as well as, or sometimes even better than, full models. Optimizing the Akaike information criterion appears to be appropriate for choosing the degree of simplification. Copyright © 2002 John Wiley & Sons, Ltd.
Pages: 3803-3822
Number of pages: 20
Related papers
28 in total
[1]  
Altman DG, 2000, STAT MED, V19, P453, DOI 10.1002/(SICI)1097-0258(20000229)19:4<453::AID-SIM350>3.3.CO;2-X
[3]   Fractional polynomial model selection procedures: Investigation of type I error rate [J].
Ambler, G ;
Royston, P .
JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2001, 69 (01) :89-108
[4]  
Brier G.W., 1950, Monthly Weather Review, V78, P1
[5]   FLEXIBLE REGRESSION-MODELS WITH CUBIC-SPLINES [J].
DURRLEMAN, S ;
SIMON, R .
STATISTICS IN MEDICINE, 1989, 8 (05) :551-561
[6]   FLEXIBLE METHODS FOR ANALYZING SURVIVAL-DATA USING SPLINES, WITH APPLICATIONS TO BREAST-CANCER PROGNOSIS [J].
GRAY, RJ .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1992, 87 (420) :942-951
[7]   THE MEANING AND USE OF THE AREA UNDER A RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE [J].
HANLEY, JA ;
MCNEIL, BJ .
RADIOLOGY, 1982, 143 (01) :29-36
[8]   REGRESSION MODELING STRATEGIES FOR IMPROVED PROGNOSTIC PREDICTION [J].
HARRELL, FE ;
LEE, KL ;
CALIFF, RM ;
PRYOR, DB ;
ROSATI, RA .
STATISTICS IN MEDICINE, 1984, 3 (02) :143-152
[9]   REGRESSION-MODELS IN CLINICAL-STUDIES - DETERMINING RELATIONSHIPS BETWEEN PREDICTORS AND RESPONSE [J].
HARRELL, FE ;
LEE, KL ;
POLLOCK, BG .
JOURNAL OF THE NATIONAL CANCER INSTITUTE, 1988, 80 (15) :1198-1202
[10]  
Harrell FE, 2001, Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis, V2nd