A comparison of regression trees, logistic regression, generalized additive models, and multivariate adaptive regression splines for predicting AMI mortality

Cited by: 125
Authors
Austin, Peter C.
Affiliations
[1] Inst Clin Evaluat Sci, Toronto, ON M4N 3M5, Canada
[2] Univ Toronto, Dept Publ Hlth Sci, Toronto, ON, Canada
[3] Univ Toronto, Dept Hlth Management Policy & Evaluat, Toronto, ON, Canada
Keywords
logistic regression; regression trees; classification trees; predictive model; validation; recursive partitioning; generalized additive models; multivariate adaptive regression splines; acute myocardial infarction;
DOI
10.1002/sim.2770
Chinese Library Classification
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Clinicians and health service researchers are frequently interested in predicting patient-specific probabilities of adverse events (e.g. death, disease recurrence, post-operative complications, hospital readmission). There is an increasing interest in the use of classification and regression trees (CART) for predicting outcomes in clinical studies. We compared the predictive accuracy of logistic regression with that of regression trees for predicting mortality after hospitalization with an acute myocardial infarction (AMI). We also examined the predictive ability of two other types of data-driven models: generalized additive models (GAMs) and multivariate adaptive regression splines (MARS). We used data on 9484 patients admitted to hospital with an AMI in Ontario. We used repeated split-sample validation: the data were randomly divided into derivation and validation samples. Predictive models were estimated using the derivation sample and the predictive accuracy of the resultant model was assessed using the area under the receiver operating characteristic (ROC) curve in the validation sample. This process was repeated 1000 times: the initial data set was randomly divided into derivation and validation samples 1000 times, and the predictive accuracy of each method was assessed each time. The mean ROC curve area for the regression tree models in the 1000 derivation samples was 0.762, while the mean ROC curve area of a simple logistic regression model was 0.845. The mean ROC curve areas for the other methods ranged from a low of 0.831 to a high of 0.851. Our study shows that regression trees do not perform as well as logistic regression for predicting mortality following AMI. However, the logistic regression model had performance comparable to that of more flexible, data-driven models such as GAMs and MARS. Copyright (c) 2006 John Wiley & Sons, Ltd.
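The repeated split-sample validation described in the abstract can be illustrated with a minimal Python sketch. It is not the author's code: the data are synthetic, the model is a one-predictor logistic regression fitted by plain gradient ascent, and the names `make_synthetic`, `fit_logistic`, `auc`, and `repeated_split_validation` are hypothetical. The 50/50 split and the small number of repeats are assumptions too, since the abstract does not state the split proportion and the study used 1000 repeats.

```python
import math
import random


def auc(scores, labels):
    """Area under the ROC curve as the share of (event, non-event) pairs
    in which the event has the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(xs, ys, steps=200, lr=0.5):
    """One-predictor logistic regression fitted by batch gradient ascent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def make_synthetic(n, seed=0):
    """Synthetic stand-in data (assumption): one risk score, binary outcome."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(-1.0 + 1.5 * x)))
        xs.append(x)
        ys.append(1 if rng.random() < p else 0)
    return xs, ys


def repeated_split_validation(xs, ys, repeats=20, frac=0.5, seed=0):
    """Mean validation-sample AUC over repeated random splits."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    aucs = []
    for _ in range(repeats):
        rng.shuffle(idx)
        cut = int(frac * len(idx))
        deriv, valid = idx[:cut], idx[cut:]
        # Estimate the model on the derivation sample only.
        b0, b1 = fit_logistic([xs[i] for i in deriv], [ys[i] for i in deriv])
        # Score the held-out validation sample and record its AUC.
        scores = [b0 + b1 * xs[i] for i in valid]
        aucs.append(auc(scores, [ys[i] for i in valid]))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    xs, ys = make_synthetic(400, seed=1)
    print(round(repeated_split_validation(xs, ys), 3))
```

Averaging the AUC over many random splits reduces the dependence of the estimate on any single partition of the data, which is presumably why the study repeated the split 1000 times.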
Pages: 2937-2957
Number of pages: 21