Automated variable selection methods for logistic regression produced unstable models for predicting acute myocardial infarction mortality

Cited by: 255
Authors
Austin, PC
Tu, JV
Affiliations
[1] Inst Clin Evaluat Sci, Toronto, ON M4N 3M5, Canada
[2] Univ Toronto, Dept Publ Hlth Sci, Toronto, ON M5S 1A8, Canada
[3] Univ Toronto, Dept Hlth Policy Management & Evaluat, Toronto, ON M5S 1A8, Canada
[4] Sunnybrook & Womens Coll, Hlth Sci Ctr, Clin Epidemiol & Hlth Care Res Program, Toronto, ON M4N 3M5, Canada
[5] Sunnybrook & Womens Coll, Hlth Sci Ctr, Div Gen Internal Med, Toronto, ON M4N 3M5, Canada
Funding
Canadian Institutes of Health Research;
Keywords
regression models; multivariate analysis; variable selection; logistic regression; acute myocardial infarction; epidemiology;
DOI
10.1016/j.jclinepi.2004.04.003
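Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)];
Discipline classification code
Abstract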
Objectives: Automated variable selection methods are frequently used to determine the independent predictors of an outcome. The objective of this study was to determine the reproducibility of logistic regression models developed using automated variable selection methods. Study Design and Setting: An initial set of 29 candidate variables was considered for predicting mortality after acute myocardial infarction (AMI). We drew 1,000 bootstrap samples from a dataset consisting of 4,911 patients admitted to hospital with an AMI. Using each bootstrap sample, logistic regression models predicting 30-day mortality were obtained using backward elimination, forward selection, and stepwise selection. The agreement between the different model selection methods and the agreement across the 1,000 bootstrap samples were compared. Results: Using 1,000 bootstrap samples, backward elimination identified 940 unique models for predicting mortality. Similar results were obtained for forward and stepwise selection. Three variables were identified as independent predictors of mortality among all bootstrap samples. Over half the candidate prognostic variables were identified as independent predictors in less than half of the bootstrap samples. Conclusion: Automated variable selection methods result in models that are unstable and not reproducible. The variables selected as independent predictors are sensitive to random fluctuations in the data. © 2004 Elsevier Inc. All rights reserved.
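The stability check described in the abstract (resample, rerun selection, then count unique models and how often each variable is chosen) can be sketched as below. This is a minimal illustration, not the authors' code. The function `bootstrap_selection_stability`, the `select` callable, and the synthetic records are hypothetical names introduced here. `toy_select` is a simple rate-difference screen standing in for a real selector; it is not backward elimination, forward selection, or stepwise selection, which in practice would come from a statistics package wrapped behind the same `select` interface.

```python
import random
from collections import Counter


def bootstrap_selection_stability(rows, select, n_boot=1000, seed=0):
    """Run `select` on n_boot bootstrap resamples of `rows`.

    rows:   list of records (any type the selector understands)
    select: hypothetical user-supplied callable mapping a list of rows to an
            iterable of selected variable names
    Returns (number of unique models, {variable: inclusion fraction}).
    """
    rng = random.Random(seed)
    n = len(rows)
    models = Counter()     # selected variable set -> number of samples
    inclusion = Counter()  # variable -> number of samples selecting it
    for _ in range(n_boot):
        # A bootstrap sample: n rows drawn with replacement.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        chosen = frozenset(select(sample))
        models[chosen] += 1
        inclusion.update(chosen)
    freq = {var: count / n_boot for var, count in inclusion.items()}
    return len(models), freq


def toy_select(sample, threshold=0.15):
    """Stand-in selector (NOT backward elimination): keep a binary variable
    when the death rate differs by more than `threshold` between patients
    with and without it."""
    chosen = []
    for var in ("shock", "diabetes", "smoker"):
        with_var = [r["died"] for r in sample if r[var]]
        without = [r["died"] for r in sample if not r[var]]
        if with_var and without:
            diff = sum(with_var) / len(with_var) - sum(without) / len(without)
            if abs(diff) > threshold:
                chosen.append(var)
    return chosen


def make_toy_data(n=200, seed=1):
    """Synthetic records (assumption); only `shock` truly affects the outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        r = {v: rng.random() < 0.3 for v in ("shock", "diabetes", "smoker")}
        r["died"] = rng.random() < (0.6 if r["shock"] else 0.1)
        rows.append(r)
    return rows


if __name__ == "__main__":
    n_models, freq = bootstrap_selection_stability(
        make_toy_data(), toy_select, n_boot=200
    )
    print("unique models:", n_models)
    print("inclusion frequency:", sorted(freq.items()))
```

Keeping the selector as a plain callable means the same tallying loop can compare several selection methods on identical resamples by reusing the same `seed`.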
Pages: 1138-1146
Number of pages: 9