Regression models for mixed discrete and continuous responses with potentially missing values

Times cited: 56
Authors
Fitzmaurice, GM [1]
Laird, NM [1]
Affiliations
[1] HARVARD UNIV,SCH PUBL HLTH,DEPT BIOSTAT,BOSTON,MA 02115
Keywords
EM algorithm; location model; marginal model; missing data; MAXIMUM-LIKELIHOOD; BINARY; INFERENCE; VARIABLES;
DOI
10.2307/2533101
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
In this paper a likelihood-based method for analyzing mixed discrete and continuous regression models is proposed. We focus on marginal regression models, that is, models in which the marginal expectation of the response vector is related to covariates by known link functions. The proposed model is based on an extension of the general location model of Olkin and Tate (1961, Annals of Mathematical Statistics 32, 448-465), and can accommodate missing responses. When there are no missing data, our particular choice of parameterization yields maximum likelihood estimates of the marginal mean parameters that are robust to misspecification of the association between the responses. This robustness property does not, in general, hold for the case of incomplete data. There are a number of potential benefits of a multivariate approach over separate analyses of the distinct responses. First, a multivariate analysis can exploit the correlation structure of the response vector to address intrinsically multivariate questions. Second, multivariate test statistics allow for control over the inflation of the type I error that results when separate analyses of the distinct responses are performed without accounting for multiple comparisons. Third, it is generally possible to obtain more precise parameter estimates by accounting for the association between the responses. Finally, separate analyses of the distinct responses may be difficult to interpret when there is nonresponse because different sets of individuals contribute to each analysis. Furthermore, separate analyses can introduce bias when the missing responses are missing at random (MAR). A multivariate analysis can circumvent both of these problems. The proposed methods are applied to two biomedical datasets.
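As a rough illustration of the kind of parameterization the abstract refers to, consider the simplest case of one binary response $Y_b$ and one continuous response $Y_c$ with covariate vector $x$. The symbols $\beta_b$, $\beta_c$, $\gamma$, and $\sigma^2$, and the choice of logit and identity links, are assumptions made here for illustration; they are not taken from this record. A minimal sketch of a location-type model written in terms of marginal means:

```latex
\begin{align*}
Y_b &\sim \mathrm{Bernoulli}(\mu_b), & \operatorname{logit}(\mu_b) &= x^\top \beta_b, \\
Y_c \mid Y_b = y_b &\sim N\bigl(\mu_c + \gamma\,(y_b - \mu_b),\ \sigma^2\bigr), & \mu_c &= x^\top \beta_c .
\end{align*}
% Centering y_b at its mean gives
%   E(Y_c) = \mu_c + \gamma\,(E(Y_b) - \mu_b) = \mu_c,
% so \beta_c describes the marginal mean of Y_c for any value of \gamma.
```

In this sketch $\gamma$ carries the association between the two responses, while $\beta_b$ and $\beta_c$ keep their marginal interpretation. That separation is the sort of property the robustness statement for complete data would rest on.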
Pages: 110-122
Number of pages: 13