Estimating and using propensity scores with partially missing data

Times cited: 198

Authors:

D'Agostino, RB

Rubin, DB

Affiliations:

[1] Wake Forest Univ, Bowman Gray Sch Med, Dept Publ Hlth Sci, Biostat Sect, Winston Salem, NC 27157 USA

[2] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA

Source:

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION | 2000 / Vol. 95 / No. 451

Keywords:

general location model; ignorability; iterative proportional fitting; log-linear model; matching; matched sampling; maximum likelihood estimation; missing data; observational study; pattern-mixture model

DOI:

10.2307/2669455

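Chinese Library Classification (CLC) numbers:

O21 [Probability theory and mathematical statistics]; C8 [Statistics]

Discipline classification codes:

020208; 070103; 0714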

Abstract:

Investigators in observational studies have no control over treatment assignment. As a result, large differences can exist between the treatment and control groups on observed covariates, which can lead to badly biased estimates of treatment effects. Propensity score methods are increasingly popular for balancing the distribution of the covariates in the two groups to reduce this bias; for example, using matching or subclassification, sometimes in combination with model-based adjustment. To estimate propensity scores, which are the conditional probabilities of being treated given a vector of observed covariates, we must model the distribution of the treatment indicator given these observed covariates. Much work has been done in the case where covariates are fully observed. We address the problem of calculating propensity scores when covariates can have missing values. In such cases, which commonly arise in practice, the pattern of missing covariates can be prognostically important, and then propensity scores should condition both on observed values of covariates and on the observed missing-data indicators. Using the resulting generalized propensity scores to adjust for the observed background differences between treatment and control groups leads, in expectation, to balanced distributions of observed covariates in the treatment and control groups, as well as balanced distributions of patterns of missing data. The methods are illustrated using the generalized propensity scores to create matched samples in a study of the effects of postterm pregnancy.
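As a rough illustration of the generalized propensity score described in the abstract, the following Python sketch assumes that every covariate is categorical and estimates P(T = 1 | observed covariate values, missing-data indicators) by raw cell frequencies, i.e. a saturated model. This is not the authors' estimation procedure, which the keywords suggest relies on the general location model, log-linear models and maximum likelihood; the functions `cell_key` and `generalized_propensity_scores` and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, List, Optional, Sequence, Tuple


def cell_key(x: Sequence[Optional[Hashable]]) -> Tuple:
    # Pair each missing-data indicator (True = observed) with the observed
    # value (None when missing), so the key encodes both the observed
    # covariate values and the pattern of missingness.
    return tuple((v is not None, v) for v in x)


def generalized_propensity_scores(
    X: Sequence[Sequence[Optional[Hashable]]], treated: Sequence[int]
) -> List[float]:
    # counts[key] = [number treated, number of units] in each cell
    counts = defaultdict(lambda: [0, 0])
    keys = [cell_key(x) for x in X]
    for key, t in zip(keys, treated):
        counts[key][0] += int(t)
        counts[key][1] += 1
    # Estimated P(T = 1 | observed values, missingness pattern) for each unit
    return [counts[k][0] / counts[k][1] for k in keys]


# Hypothetical toy data: two categorical covariates, None marks a missing value.
X = [("a", "y"), ("a", "y"), ("a", None), ("a", None), ("b", None), ("b", None)]
treated = [1, 0, 1, 1, 0, 1]
scores = generalized_propensity_scores(X, treated)
print(scores)  # [0.5, 0.5, 1.0, 1.0, 0.5, 0.5]
```

Because units with the same observed values but different missingness patterns fall into different cells, matching on these scores also balances the missing-data patterns. A score of 1.0 (here, the cell with the first covariate equal to "a" and the second missing) flags a cell with no controls, where matching is impossible; with many covariates the saturated model becomes too sparse, which is why a smoothed model is needed in practice.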


Pages: 749-759

Number of pages: 11

Number of references: 47
