Missing covariates in generalized linear models when the missing data mechanism is non-ignorable

Cited by: 195
Authors
Ibrahim, JG
Lipsitz, SR
Affiliations
[1] Harvard Univ, Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
[2] Dana Farber Canc Inst, Boston, MA 02115 USA
[3] Worcester Polytech Inst, Worcester, MA 01609 USA
Keywords
EM algorithm; Gibbs sampling; logistic regression; maximum likelihood estimation; missing data mechanism; Monte Carlo EM algorithm
DOI
10.1111/1467-9868.00170
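Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract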
We propose a method for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. We use a multinomial model for the missing data indicators and propose a joint distribution for them which can be written as a sequence of one-dimensional conditional distributions, with each one-dimensional conditional distribution consisting of a logistic regression. We allow the covariates to be either categorical or continuous. The joint covariate distribution is also modelled via a sequence of one-dimensional conditional distributions, and the response variable is assumed to be completely observed. We derive the E- and M-steps of the EM algorithm with non-ignorable missing covariate data. For categorical covariates, we derive a closed form expression for the E- and M-steps of the EM algorithm for obtaining the maximum likelihood estimates (MLEs). For continuous covariates, we use a Monte Carlo version of the EM algorithm to obtain the MLEs via the Gibbs sampler. Computational techniques for Gibbs sampling are proposed and implemented. The parametric form of the assumed missing data mechanism itself is not 'testable' from the data, and thus the non-ignorable modelling considered here can be viewed as a sensitivity analysis concerning a more complicated model. Therefore, although a model may have 'passed' the tests for a certain missing data mechanism, this does not mean that we have captured, even approximately, the correct missing data mechanism. Hence, model checking for the missing data mechanism and sensitivity analyses play an important role in this problem and are discussed in detail. Several simulations are given to demonstrate the methodology. In addition, a real data set from a melanoma cancer clinical trial is presented to illustrate the methods proposed.
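For the categorical-covariate case, the E-step described in the abstract reduces to computing a weight for each possible value of the missing covariate. The following is a minimal sketch of that idea for a single binary covariate, not the authors' implementation. The function names, the parameter layout (`beta`, `alpha`, `phi`), the single-covariate logistic forms, and the convention that the indicator equals 1 when the covariate is missing are all assumptions made for illustration.

```python
import math


def expit(t):
    """Logistic function 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def bernoulli(p, v):
    """Probability of outcome v (0 or 1) under success probability p."""
    return p if v == 1 else 1.0 - p


def e_step_weights(y, beta, alpha, phi):
    """Weights for a missing binary covariate x in {0, 1}.

    Hypothetical parameterization (assumed here, not taken from the paper):
      response:    P(y = 1 | x)    = expit(beta[0] + beta[1] * x)
      covariate:   P(x = 1)        = expit(alpha)
      missingness: P(r = 1 | y, x) = expit(phi[0] + phi[1] * y + phi[2] * x)
    where r = 1 is assumed to mean "x is missing".

    Returns [w0, w1] with w_x proportional to
    P(y | x) * P(x) * P(r = 1 | y, x), normalized to sum to 1.
    """
    joint = []
    for x in (0, 1):
        p_y = bernoulli(expit(beta[0] + beta[1] * x), y)
        p_x = bernoulli(expit(alpha), x)
        p_r = expit(phi[0] + phi[1] * y + phi[2] * x)
        joint.append(p_y * p_x * p_r)
    total = sum(joint)
    return [j / total for j in joint]


# With phi[2] = 0 the missingness does not depend on x, so it cancels
# out of the weights; a non-zero phi[2] shifts weight between x values.
w_ignorable = e_step_weights(y=1, beta=(0.0, 0.0), alpha=0.0, phi=(0.0, 0.0, 0.0))
w_nonignorable = e_step_weights(y=1, beta=(0.0, 0.0), alpha=0.0, phi=(0.0, 0.0, 2.0))
```

In a full EM iteration under these assumptions, each subject with a missing covariate would contribute one weighted pseudo-observation per possible value, and the M-step would maximize the resulting weighted complete-data log-likelihood. Varying `phi[2]` over a grid is one simple way to carry out the kind of sensitivity analysis the abstract recommends.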
Pages: 173-190
Page count: 18