Zero-Inflated and Hurdle Models of Count Data with Extra Zeros: Examples from an HIV-Risk Reduction Intervention Trial

Cited by: 228
Authors
Hu, Mei-Chen [2]
Pavlicova, Martina [3]
Nunes, Edward V. [1, 2]
Affiliations
[1] New York State Psychiat Inst & Hosp, New York, NY 10032 USA
[2] Columbia Univ, Dept Psychiat, New York, NY USA
[3] Columbia Univ, Mailman Sch Publ Hlth, Dept Biostat, New York, NY USA
Keywords
overdispersion; extra zeros; Poisson; negative binomial; hurdle model;
DOI
10.3109/00952990.2011.597280
Chinese Library Classification (CLC) number
B849 [Applied Psychology]
Discipline classification code
040203
Abstract
Background: In clinical trials of behavioral health interventions, outcome variables often take the form of counts, such as days using substances or episodes of unprotected sex. Classically, count data follow a Poisson distribution; in practice, however, such data often display greater heterogeneity in the form of excess zeros (zero-inflation), greater spread in the values (overdispersion), or both. Greater sample heterogeneity may be especially common in community-based effectiveness trials, where broad eligibility criteria are implemented to achieve a generalizable sample.
Objectives: This article reviews the characteristics of the Poisson model and the related models that have been developed to handle overdispersion (negative binomial (NB) model), zero-inflation (zero-inflated Poisson (ZIP) and Poisson hurdle (PH) models), or both (zero-inflated negative binomial (ZINB) and negative binomial hurdle (NBH) models).
Methods: All six models were used to model the effect of an HIV-risk reduction intervention on the count of unprotected sexual occasions (USOs), using data from a previously completed clinical trial among female patients (N = 515) participating in community-based substance abuse treatment (Tross et al. Effectiveness of HIV/AIDS sexual risk reduction groups for women in substance abuse treatment programs: Results of NIDA Clinical Trials Network Trial. J Acquir Immune Defic Syndr 2008; 48(5):581-589). Goodness of fit and the estimates of treatment effect derived from each model were compared.
Results: The ZINB model provided the best fit, yielding a medium-sized effect of intervention.
Conclusions and Scientific Significance: This article illustrates the consequences of applying models with different distributional assumptions to the data. If the model used does not closely fit the shape of the data distribution, the estimate of the intervention effect may be biased, either over- or underestimating it.
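Illustrative note: the abstract contrasts zero-inflated and hurdle models without giving their definitions. The sketch below, using the standard textbook forms of the zero-inflated Poisson and Poisson hurdle probability mass functions, shows how the two treat zeros differently. The function names and the parameter values (`lam = 3.0`, a zero-component probability of `0.4`) are assumptions chosen for illustration; they are not estimates from the trial or code from the article.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Ordinary Poisson probability of observing count k with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson: with probability pi the count is a structural
    zero; otherwise it is drawn from Poisson(lam), which can also yield 0."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def hurdle_poisson_pmf(k: int, lam: float, p_zero: float) -> float:
    """Poisson hurdle: all zeros come from one binary process (probability
    p_zero); positive counts follow a zero-truncated Poisson(lam)."""
    if k == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))


if __name__ == "__main__":
    lam, zero_prob = 3.0, 0.4  # hypothetical values, for illustration only
    print("P(Y=0) Poisson:", poisson_pmf(0, lam))
    print("P(Y=0) ZIP:    ", zip_pmf(0, lam, zero_prob))
    print("P(Y=0) hurdle: ", hurdle_poisson_pmf(0, lam, zero_prob))
```

In the zero-inflated form, zeros arise from two sources (structural zeros and sampling zeros from the count process), whereas in the hurdle form every zero comes from the binary component. The negative binomial variants (ZINB, NBH) follow the same construction with the Poisson component replaced by a negative binomial one.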
Pages: 367-375
Number of pages: 9