Modelling count data with excessive zeros: The need for class prediction in zero-inflated models and the issue of data generation in choosing between zero-inflated and generic mixture models for dental caries data

Cited by: 35

Authors:

Gilthorpe, Mark S. ^{[1]}

Frydenberg, Morten ^{[2]}

Cheng, Yaping ^{[1]}

Baelum, Vibeke ^{[3,4]}

Affiliations:

[1] Univ Leeds, Div Biostat, Ctr Biostat & Epidemiol, Leeds LS2 9JT, W Yorkshire, England

[2] Univ Aarhus, Dept Biostat, Inst Publ Hlth, Fac Hlth Sci, Aarhus, Denmark

[3] Univ Aarhus, Dept Epidemiol, Inst Publ Hlth, Aarhus, Denmark

[4] Univ Aarhus, Sch Dent, Fac Hlth Sci, Aarhus, Denmark

Source:

STATISTICS IN MEDICINE | 2009 / Volume 28 / Issue 28

Keywords:

zero-inflated; mixture modelling; oral health; dmft/DMFT; latent variable; POISSON; PREVALENCE; REGRESSION;

DOI:

10.1002/sim.3699

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Discipline classification code:

090105 [Crop Production Systems and Ecological Engineering];

Abstract:

Count data may possess an 'excess' of zeros relative to standard distributions. Zero-inflated Poisson (ZiP) or binomial (ZiB) and generic mixture models have been proposed to deal with such data. We consider biomedical count data with an excess number of zeros and seek to address the following: (i) do zero-inflated models need covariates in the distribution part to predict class membership; (ii) what model-fit criteria have clinical relevance to predicted counts; (iii) can very different model parameterizations have near-identical fit; and (iv) how could model selection and hence model interpretation be aided by considering data generation processes? We show that covariates in the distribution part of zero-inflated models are needed to predict class membership. A range of model-fit criteria should be considered, as consensus is rarely achieved, and considering predicted outcomes may be just as valuable as likelihood-based criteria. Zero-inflated and generic mixture models may be indistinguishable according to both likelihood-based model-fit criteria and predicted outcomes, in which case model differentiation, hence, model selection and interpretation, might be guided by the consideration of a priori data generation processes. Zero-inflated models reflect whether or not there are (or have been) risk differences in disease onset and disease progression, while generic mixture models identify sub-types of individuals with similar risks of disease onset and progression. One or both modelling strategies may be used, though a priori knowledge or clinical impression of data generation might help to distinguish between two or more parameterizations that exhibit similar fit and yield near-identical predicted counts. Copyright (C) 2009 John Wiley & Sons, Ltd.
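Illustrative note (not part of the original record): the zero-inflated Poisson structure named in the abstract can be sketched in a few lines. The sketch below assumes the textbook ZiP form, in which an observation is a structural zero with probability `pi` and otherwise a Poisson count with mean `lam`. The function names `zip_pmf` and `prob_structural_zero` and the parameter values are hypothetical, chosen only for illustration, and this is not the authors' code or fitting procedure. It shows why the probability that an observed zero belongs to the structural-zero class depends on the Poisson mean as well as on `pi`, which is one way to read the abstract's point (i).

```python
import math


def zip_pmf(y, pi, lam):
    """P(Y = y) under a textbook zero-inflated Poisson.

    pi  : assumed probability of a structural (always-zero) observation
    lam : assumed Poisson mean for the count component
    """
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    structural = pi if y == 0 else 0.0
    return structural + (1.0 - pi) * poisson


def prob_structural_zero(y, pi, lam):
    """Posterior probability that an observation came from the
    structural-zero class, given the observed count y (Bayes' rule)."""
    if y > 0:
        # A positive count can only come from the Poisson component.
        return 0.0
    return pi / zip_pmf(0, pi, lam)


if __name__ == "__main__":
    # Same mixing probability, different Poisson means (hypothetical values).
    for lam in (0.5, 2.0):
        p = prob_structural_zero(0, pi=0.3, lam=lam)
        print(f"lam={lam}: P(structural zero | y=0) = {p:.3f}")
```

With `pi` held fixed, a larger Poisson mean makes a sampling zero less likely, so an observed zero is attributed more strongly to the structural-zero class. Any covariate that shifts `lam` therefore also shifts predicted class membership under this form.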


Pages: 3539-3553

Page count: 15
