Using Ensemble-Based Methods for Directly Estimating Causal Effects: An Investigation of Tree-Based G-Computation

Cited by: 32
Authors
Austin, Peter C. [1, 2]
Affiliations
[1] Inst Clin Evaluat Sci, Toronto, ON M4N 3M5, Canada
[2] Univ Toronto, Toronto, ON M5S 1A1, Canada
Funding
Canadian Institutes of Health Research
Keywords
PROPENSITY SCORE ESTIMATION; REGRESSION; INFERENCE;
DOI
10.1080/00273171.2012.640600
Chinese Library Classification
O1 [Mathematics]
Subject classification codes
0701 ; 070101 ;
Abstract
Researchers are increasingly using observational or nonrandomized data to estimate causal treatment effects. Essential to the production of high-quality evidence is the ability to reduce or minimize the confounding that frequently occurs in observational studies. When using the potential outcome framework to define causal treatment effects, one requires the potential outcome under each possible treatment. However, only the outcome under the actual treatment received is observed, whereas the potential outcomes under the other treatments are considered missing data. Some authors have proposed that parametric regression models be used to estimate potential outcomes. In this study, we examined the use of ensemble-based methods (bagged regression trees, random forests, and boosted regression trees) to directly estimate average treatment effects by imputing potential outcomes. We used an extensive series of Monte Carlo simulations to estimate bias, variance, and mean squared error of treatment effects estimated using different ensemble methods. For comparative purposes, we compared the performance of these methods with inverse probability of treatment weighting using the propensity score when logistic regression or ensemble methods were used to estimate the propensity score. Using boosted regression trees of depth 3 or 4 to impute potential outcomes tended to result in estimates with bias equivalent to that of the best performing methods. Using an empirical case study, we compared inferences on the effect of in-hospital smoking cessation counseling on subsequent mortality in patients hospitalized with an acute myocardial infarction.
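The imputation approach described in the abstract can be illustrated with a minimal Python sketch. It is not the author's code: the use of scikit-learn's `GradientBoostingRegressor`, the simulated data, the continuous outcome, the true effect of 2.0, and the helper names `g_computation_ate` and `simulate` are all assumptions made for illustration. The sketch fits one boosted regression tree model of depth 3 on the covariates plus the treatment indicator, predicts each subject's outcome under treatment and under control, and averages the difference.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def g_computation_ate(X, treatment, outcome, max_depth=3, random_state=0):
    """Estimate the average treatment effect by imputing both potential outcomes."""
    # Fit a single outcome model on the covariates plus the treatment indicator.
    features = np.column_stack([X, treatment])
    model = GradientBoostingRegressor(max_depth=max_depth, random_state=random_state)
    model.fit(features, outcome)

    # Impute each subject's outcome with treatment set to 1, then set to 0.
    n = X.shape[0]
    y1 = model.predict(np.column_stack([X, np.ones(n)]))
    y0 = model.predict(np.column_stack([X, np.zeros(n)]))
    return float(np.mean(y1 - y0))


def simulate(n=2000, true_effect=2.0, seed=0):
    """Hypothetical data: X[:, 0] confounds treatment assignment and outcome."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    # Treatment probability rises with X[:, 0], which also raises the outcome.
    p = 1.0 / (1.0 + np.exp(-X[:, 0]))
    treatment = rng.binomial(1, p)
    outcome = true_effect * treatment + 1.5 * X[:, 0] + X[:, 1] + rng.normal(size=n)
    return X, treatment, outcome


X, treatment, outcome = simulate()
ate_gcomp = g_computation_ate(X, treatment, outcome)
# Unadjusted comparison of group means, biased by the confounder.
naive = outcome[treatment == 1].mean() - outcome[treatment == 0].mean()
print(f"G-computation ATE: {ate_gcomp:.2f}, naive difference: {naive:.2f}")
```

In this simulated setting the naive difference in group means absorbs the confounding through `X[:, 0]`, while the imputed contrast should land closer to the assumed true effect. Changing `max_depth` is one way to explore the tree-depth comparison the abstract mentions.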
Pages: 115-135
Number of pages: 21