Monte Carlo EM for missing covariates in parametric regression models

Cited by: 90
Authors
Ibrahim, JG
Chen, MH
Lipsitz, SR
Affiliations
[1] Harvard Univ, Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
[2] Dana Farber Canc Inst, Boston, MA 02115 USA
[3] Worcester Polytech Inst, Dept Math Sci, Worcester, MA 01609 USA
Keywords
EM algorithm; generalized linear model; Gibbs sampler; maximum likelihood estimation; missing data mechanism; Poisson regression; proportional hazards; Weibull regression;
DOI
10.1111/j.0006-341X.1999.00591.x
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
We propose a method for estimating parameters for general parametric regression models with an arbitrary number of missing covariates. We allow any pattern of missing data and assume that the missing data mechanism is ignorable throughout. When the missing covariates are categorical, a useful technique for obtaining parameter estimates is the EM algorithm by the method of weights proposed in Ibrahim (1990, Journal of the American Statistical Association 85, 765-769). We extend this method to continuous or mixed categorical and continuous covariates, and for arbitrary parametric regression models, by adapting a Monte Carlo version of the EM algorithm as discussed by Wei and Tanner (1990, Journal of the American Statistical Association 85, 699-704). In addition, we discuss the Gibbs sampler for sampling from the conditional distribution of the missing covariates given the observed data and show that the appropriate complete conditionals are log-concave. The log-concavity property of the conditional distributions will facilitate a straightforward implementation of the Gibbs sampler via the adaptive rejection algorithm of Gilks and Wild (1992, Applied Statistics 41, 337-348). We assume the model for the response given the covariates is an arbitrary parametric regression model, such as a generalized linear model, a parametric survival model, or a nonlinear model. We model the marginal distribution of the covariates as a product of one-dimensional conditional distributions. This allows us a great deal of flexibility in modeling the distribution of the covariates and reduces the number of nuisance parameters that are introduced in the E-step. We present examples involving both simulated and real data.
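To make the Monte Carlo EM idea in the abstract concrete, the following is a minimal sketch under strong simplifying assumptions, not the authors' implementation. It assumes a normal linear model y = b0 + b1*x + error with a single covariate x ~ N(mu, t2) that is missing at random (coded as NaN). In that special case the conditional distribution of a missing x given y is itself normal, so the E-step draws from it directly instead of using the Gibbs sampler with adaptive rejection described above. The function name `mcem_linear` and all parameter names are hypothetical.

```python
import numpy as np

def mcem_linear(y, x, n_iter=50, m=200, seed=0):
    """Monte Carlo EM sketch: y = b0 + b1*x + N(0, s2), x ~ N(mu, t2); NaN in x = missing."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(x)
    obs = ~miss
    n = len(y)
    # Starting values from the complete cases only.
    b1, b0 = np.polyfit(x[obs], y[obs], 1)   # polyfit returns [slope, intercept]
    s2 = np.var(y[obs] - b0 - b1 * x[obs])
    mu, t2 = x[obs].mean(), x[obs].var()
    for _ in range(n_iter):
        # E-step (Monte Carlo): x_mis | y is normal here, so draw m samples directly.
        prec = 1.0 / t2 + b1 ** 2 / s2
        cond_mean = (mu / t2 + b1 * (y[miss] - b0) / s2) / prec
        draws = cond_mean + rng.standard_normal((m, miss.sum())) / np.sqrt(prec)
        # Augmented data: observed rows get weight 1, each imputed row weight 1/m.
        xa = np.concatenate([x[obs], draws.ravel()])
        ya = np.concatenate([y[obs], np.tile(y[miss], m)])
        w = np.concatenate([np.ones(obs.sum()), np.full(draws.size, 1.0 / m)])
        # M-step: weighted maximum likelihood for the regression and covariate models.
        X = np.column_stack([np.ones_like(xa), xa])
        WX = X * w[:, None]
        b0, b1 = np.linalg.solve(X.T @ WX, WX.T @ ya)
        s2 = np.sum(w * (ya - b0 - b1 * xa) ** 2) / n
        mu = np.sum(w * xa) / n
        t2 = np.sum(w * (xa - mu) ** 2) / n
    return {"b0": b0, "b1": b1, "s2": s2, "mu": mu, "t2": t2}

# Usage on simulated data (true b0 = 1, b1 = 2), 30% of x missing completely at random.
rng = np.random.default_rng(1)
x_full = rng.normal(0.0, 1.0, 2000)
y = 1.0 + 2.0 * x_full + rng.normal(0.0, 1.0, 2000)
x = np.where(rng.random(2000) < 0.3, np.nan, x_full)
fit = mcem_linear(y, x)
```

For non-normal response models or covariate distributions the conditional of the missing covariates is generally not available in closed form, which is where a sampler such as the one the abstract describes would replace the direct normal draw.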
Pages: 591-596
Page count: 6
Related papers
10 in total
[1]  
[Anonymous], APPL STAT, DOI 10.2307/2347565
[2]   A RANDOMIZED PHASE-II STUDY OF ACIVICIN AND 4'DEOXYDOXORUBICIN IN PATIENTS WITH HEPATOCELLULAR-CARCINOMA IN AN EASTERN COOPERATIVE ONCOLOGY GROUP-STUDY [J].
FALKSON, G ;
CNAAN, A ;
SIMSON, IW ;
DAYAL, Y ;
FALKSON, H ;
SMITH, TJ ;
HALLER, DG .
AMERICAN JOURNAL OF CLINICAL ONCOLOGY-CANCER CLINICAL TRIALS, 1990, 13 (06) :510-515
[3]   HEPATOCELLULAR-CARCINOMA - AN ECOG RANDOMIZED PHASE-II STUDY OF INTERFERON-BETA AND MENAGORIL [J].
FALKSON, G ;
LIPSITZ, S ;
BORDEN, E ;
SIMSON, I ;
HALLER, D .
AMERICAN JOURNAL OF CLINICAL ONCOLOGY-CANCER CLINICAL TRIALS, 1995, 18 (04) :287-292
[4]   INCOMPLETE DATA IN GENERALIZED LINEAR-MODELS [J].
IBRAHIM, JG .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1990, 85 (411) :765-769
[5]   Parameter estimation from incomplete data in binomial regression when the missing data mechanism is nonignorable [J].
Ibrahim, JG ;
Lipsitz, SR .
BIOMETRICS, 1996, 52 (03) :1071-1078
[6]  
Lipsitz S R, 1996, Lifetime Data Anal, V2, P5, DOI 10.1007/BF00128467
[7]   A conditional model for incomplete covariates in parametric regression models [J].
Lipsitz, SR ;
Ibrahim, JG .
BIOMETRIKA, 1996, 83 (04) :916-922
[8]   MAXIMUM-LIKELIHOOD ESTIMATION FOR MIXED CONTINUOUS AND CATEGORICAL-DATA WITH MISSING VALUES [J].
LITTLE, RJA ;
SCHLUCHTER, MD .
BIOMETRIKA, 1985, 72 (03) :497-512
[9]  
RUBIN DB, 1976, BIOMETRIKA, V63, P581, DOI 10.1093/biomet/63.3.581
[10]   A MONTE-CARLO IMPLEMENTATION OF THE EM ALGORITHM AND THE POOR MANS DATA AUGMENTATION ALGORITHMS [J].
WEI, GCG ;
TANNER, MA .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1990, 85 (411) :699-704