Too much ado about propensity score models? Comparing methods of propensity score matching

Cited by: 272
Author
Baser, Onur [1]
Affiliation
[1] Thomson Medstat, Ann Arbor, MI 48108 USA
Keywords
propensity score matching; randomization; selection bias
DOI
10.1111/j.1524-4733.2006.00130.x
Chinese Library Classification number
F [Economics]
Subject classification code
02
Abstract
Objectives: A large number of techniques are available for conducting matching procedures, yet coherent guidelines for selecting the most appropriate one do not yet exist. In this article we evaluate several matching techniques and propose a guideline for selecting the best technique.

Methods: The main purpose of a matching procedure is to reduce selection bias by increasing the balance between the treatment and control groups. The following approach, consisting of five quantifiable steps, is proposed to check for balance: 1) using two-sample t-statistics to compare the means of the treatment and control groups for each explanatory variable; 2) comparing the mean difference as a percentage of the average standard deviation; 3) comparing the percent reduction of bias in the means of the explanatory variables before and after matching; 4) comparing treatment and control density estimates for the explanatory variables; and 5) comparing the density estimates of the propensity scores of the control units with those of the treated units. We investigated seven different matching techniques and how they performed with regard to the five proposed steps. We also estimated the average treatment effect with multivariate analysis and compared the results with the estimates from the propensity score matching techniques. The Medstat MarketScan Data Base provided data for empirical examples of the utility of several matching methods. We conducted nearest-neighbor matching (NNM) analyses in seven ways: with replacement, 2-to-1 matching, Mahalanobis matching (MM), MM with caliper, kernel matching, radius matching, and the stratification method.

Results: Comparing techniques according to the above criteria revealed that the choice of matching technique has significant effects on outcomes. Patients with asthma were compared with patients without asthma, and the estimated cost of illness ranged from $2040 to $4463 depending on the type of matching. After matching, we looked for insignificant differences, or larger P-values, in the mean values (criterion 1); low mean differences as a percentage of the average standard deviation (criterion 2); 100% bias reduction in the means of the explanatory variables (criterion 3); and insignificant differences when comparing the density estimates of the treatment and control groups (criteria 4 and 5). Mahalanobis matching with caliper yielded the best results according to all five criteria (Mean = $4463, SD = $3252). We also applied multivariate analysis to the matched sample, which decreased the standard deviation of the cost-of-illness estimates more than threefold (Mean = $4456, SD = $996).

Conclusions: Sensitivity analysis of the matching techniques is especially important because none of the methods proposed in the literature is a priori superior to the others. The suggested joint consideration of propensity score matching and multivariate analysis offers an approach to assessing the robustness of the estimates.
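The five balance checks listed under Methods can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation. It uses only the standard library. It approximates the criterion-1 p-value with the normal distribution rather than the t distribution, which is reasonable only for large samples. It assumes "average standard deviation" means sqrt((s_t^2 + s_c^2) / 2). It also swaps in the two-sample Kolmogorov-Smirnov distance as a numeric stand-in for the density comparisons of criteria 4 and 5. The helper names (`t_test`, `match_nearest`, `demo`, and so on) and the toy data are hypothetical. `match_nearest` performs plain 1:1 nearest-neighbor matching on the propensity score, with replacement and a caliper. It is not the Mahalanobis matching that the study found best.

```python
import math
from statistics import mean, variance


def t_test(treated, control):
    """Criterion 1: Welch two-sample t-statistic with a two-sided p-value.
    Assumption: normal approximation to the t distribution (large samples);
    requires non-constant samples so the standard error is non-zero."""
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    t = (mean(treated) - mean(control)) / se
    return t, math.erfc(abs(t) / math.sqrt(2))  # 2 * (1 - Phi(|t|))


def standardized_difference(treated, control):
    """Criterion 2: mean difference as a percentage of the average standard
    deviation, assumed here to be sqrt((s_t^2 + s_c^2) / 2)."""
    avg_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return 100 * (mean(treated) - mean(control)) / avg_sd


def percent_bias_reduction(before_t, before_c, after_t, after_c):
    """Criterion 3: percentage of the pre-matching mean difference removed
    by matching (100 means the means are identical after matching)."""
    d_before = abs(mean(before_t) - mean(before_c))
    d_after = abs(mean(after_t) - mean(after_c))
    return 100 * (d_before - d_after) / d_before


def ks_distance(a, b):
    """Criteria 4 and 5 (stand-in): two-sample Kolmogorov-Smirnov D, the
    largest vertical gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past every value <= x so i/len(a) and j/len(b) are the ECDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def match_nearest(treated_ps, control_ps, caliper=None):
    """Hypothetical helper: 1:1 nearest-neighbor matching on the propensity
    score, with replacement. Treated units with no control inside the
    caliper are dropped. Returns (treated_index, control_index) pairs."""
    pairs = []
    for ti, score in enumerate(treated_ps):
        ci = min(range(len(control_ps)), key=lambda k: abs(control_ps[k] - score))
        if caliper is None or abs(control_ps[ci] - score) <= caliper:
            pairs.append((ti, ci))
    return pairs


def demo():
    # Invented toy data: one covariate (age) and an already-estimated
    # propensity score for each treated and control unit.
    t_age, t_ps = [50, 55, 60, 65], [0.60, 0.70, 0.80, 0.90]
    c_age, c_ps = [30, 35, 48, 56, 61, 66], [0.10, 0.20, 0.58, 0.71, 0.79, 0.92]

    pairs = match_nearest(t_ps, c_ps, caliper=0.05)
    m_t_age = [t_age[ti] for ti, _ in pairs]
    m_c_age = [c_age[ci] for _, ci in pairs]
    m_t_ps = [t_ps[ti] for ti, _ in pairs]
    m_c_ps = [c_ps[ci] for _, ci in pairs]

    return {
        "p_before": t_test(t_age, c_age)[1],
        "p_after": t_test(m_t_age, m_c_age)[1],
        "std_diff_before": standardized_difference(t_age, c_age),
        "std_diff_after": standardized_difference(m_t_age, m_c_age),
        "bias_reduction_pct": percent_bias_reduction(t_age, c_age, m_t_age, m_c_age),
        "ks_age_before": ks_distance(t_age, c_age),
        "ks_age_after": ks_distance(m_t_age, m_c_age),
        "ks_ps_after": ks_distance(m_t_ps, m_c_ps),
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.3f}")
```

Run as a script, the sketch prints each diagnostic. After matching, the p-value rises and the standardized difference and Kolmogorov-Smirnov distances shrink. This is the pattern the five criteria look for. A fuller analysis would repeat these checks for every explanatory variable and every matching technique being compared.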
Pages: 377-385
Page count: 9