Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples

Cited by: 4476
Author
Austin, Peter C. [1, 2, 3]
Affiliations
[1] Inst Clin Evaluat Sci, Toronto, ON M4N 3M5, Canada
[2] Univ Toronto, Dalla Lana Sch Publ Hlth, Toronto, ON, Canada
[3] Univ Toronto, Dept Hlth Policy Management & Evaluat, Toronto, ON M5S 1A1, Canada
Funding
Canadian Institutes of Health Research
Keywords
balance; goodness-of-fit; observational study; propensity score; matching; propensity-score matching; standardized difference; bias; ACUTE MYOCARDIAL-INFARCTION; HEART-FAILURE; ODDS RATIO; MODELS; PRINCIPLES; REGRESSION;
DOI
10.1002/sim.3697
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
The propensity score is a subject's probability of treatment, conditional on observed baseline covariates. Conditional on the true propensity score, treated and untreated subjects have similar distributions of observed baseline covariates. Propensity-score matching is a popular method of using the propensity score in the medical literature. Using this approach, matched sets of treated and untreated subjects with similar values of the propensity score are formed. Inferences about treatment effect made using propensity-score matching are valid only if, in the matched sample, treated and untreated subjects have similar distributions of measured baseline covariates. In this paper we discuss the following methods for assessing whether the propensity score model has been correctly specified: comparing means and prevalences of baseline characteristics using standardized differences; ratios comparing the variance of continuous covariates between treated and untreated subjects; comparison of higher order moments and interactions; five-number summaries; and graphical methods such as quantile-quantile plots, side-by-side boxplots, and non-parametric density plots for comparing the distribution of baseline covariates between treatment groups. We describe methods to determine the sampling distribution of the standardized difference when the true standardized difference is equal to zero, thereby allowing one to determine the range of standardized differences that are plausible with the propensity score model having been correctly specified. We highlight the limitations of some previously used methods for assessing the adequacy of the specification of the propensity-score model. In particular, methods based on comparing the distribution of the estimated propensity score between treated and untreated subjects are uninformative. Copyright (C) 2009 John Wiley & Sons, Ltd.
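As an illustration of the first two diagnostics named in the abstract, the following is a minimal Python sketch, not code from the paper. It assumes the commonly used definitions: for a continuous covariate, the difference in group means divided by the square root of the average of the two sample variances; for a binary covariate, the difference in prevalences divided by the analogous pooled binomial standard deviation. The function names and the toy data are hypothetical.

```python
import math
from statistics import mean, variance


def std_diff_continuous(treated, control):
    """Standardized difference of means for a continuous covariate.

    Assumed definition: (mean_t - mean_c) / sqrt((var_t + var_c) / 2),
    using sample variances. Raises ZeroDivisionError if both groups
    have zero variance.
    """
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def std_diff_binary(treated, control):
    """Standardized difference of prevalences for a 0/1 covariate.

    Assumed definition: (p_t - p_c) / sqrt((p_t(1-p_t) + p_c(1-p_c)) / 2).
    """
    p_t, p_c = mean(treated), mean(control)
    pooled_sd = math.sqrt((p_t * (1 - p_t) + p_c * (1 - p_c)) / 2)
    return (p_t - p_c) / pooled_sd


def variance_ratio(treated, control):
    """Ratio of sample variances (treated over control) for a continuous covariate."""
    return variance(treated) / variance(control)


# Hypothetical matched sample: ages and a diabetes indicator (made-up values).
age_treated = [62, 70, 58, 66, 74]
age_control = [60, 68, 59, 65, 73]
diabetes_treated = [1, 1, 0, 0]
diabetes_control = [1, 0, 0, 0]

print("std. diff (age):", std_diff_continuous(age_treated, age_control))
print("std. diff (diabetes):", std_diff_binary(diabetes_treated, diabetes_control))
print("variance ratio (age):", variance_ratio(age_treated, age_control))
```

In a matched sample, values of the standardized difference near zero and variance ratios near one would be consistent with balance on that covariate; which range counts as plausible under a correctly specified model is the question the paper's sampling-distribution methods address.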
Pages: 3083-3107
Page count: 25