Global goodness-of-fit tests in logistic regression with sparse data

Cited by: 44
Author
Kuss, O [1]
Affiliation
[1] Univ Halle Wittenberg, Inst Med Epidemiol Biostat & Informat, D-06097 Halle An Der Saale, Germany
Keywords
logistic regression; goodness-of-fit; sparse data; Hosmer-Lemeshow test
DOI
10.1002/sim.1421
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
The logistic regression model has become the standard tool for analysing binary responses in medical statistics. Methods for assessing goodness-of-fit, however, are less developed, and this problem is especially pronounced when performing global goodness-of-fit tests with sparse data, that is, when the data contain only a small number of observations for each pattern of covariate values. In this situation it has long been known that the standard goodness-of-fit tests (residual deviance and Pearson chi-square) behave unsatisfactorily if p-values are calculated from the χ² distribution. As a remedy, the Hosmer-Lemeshow test is frequently recommended; it regroups the observations to avoid sparseness, with the grouping based on the probabilities estimated from the model. It has been shown, however, that the Hosmer-Lemeshow test also has deficiencies; for example, it depends heavily on the algorithm used to compute it, so different implementations might lead to different conclusions regarding the fit of the model. We present some alternative tests from the statistical literature which should also perform well with sparse data. Results from a simulation study show that some goodness-of-fit tests (for example, the Farrington test) have good properties regarding size and power and even outperform the Hosmer-Lemeshow test. We illustrate the various tests with an example from dermatology on occupational hand eczema in hairdressers. Copyright (C) 2002 John Wiley & Sons, Ltd.
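To make the statistics named in the abstract concrete, here is a minimal Python sketch (standard library only) that computes the Pearson chi-square, the residual deviance and the Hosmer-Lemeshow statistic from binary outcomes and fitted probabilities for ungrouped data, where every observation is its own covariate pattern (the sparsest case). It is not the author's code: the function names, the equal-size grouping rule with g = 10, and the simulated data in the demo are illustrative assumptions, and the fitted probabilities are assumed to come from an already fitted logistic model. The Farrington test and the other alternatives studied in the paper are not implemented here.

```python
import math
import random


def pearson_deviance(y, p):
    """Pearson X^2 and residual deviance D for ungrouped binary data.

    Assumes y is coded 0/1 and every fitted p lies strictly in (0, 1).
    With one observation per covariate pattern, the sums run over observations.
    """
    x2 = 0.0
    dev = 0.0
    for yi, pi in zip(y, p):
        x2 += (yi - pi) ** 2 / (pi * (1.0 - pi))
        # The saturated binary model has log-likelihood 0, so D = -2 * log L(fitted)
        dev += -2.0 * math.log(pi if yi == 1 else 1.0 - pi)
    return x2, dev


def hosmer_lemeshow(y, p, g=10):
    """Hosmer-Lemeshow statistic with g groups of (nearly) equal size.

    Observations are sorted by fitted probability and cut into g consecutive
    groups. Ties at a group boundary are split by input order (sorted() is
    stable), which is one way implementations can differ.
    Returns the statistic and the usual reference df, g - 2.
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for k in range(g):
        idx = order[k * n // g:(k + 1) * n // g]
        n_k = len(idx)
        if n_k == 0:
            continue
        obs = sum(y[i] for i in idx)  # observed events in group k
        exp = sum(p[i] for i in idx)  # expected events in group k
        # Denominator n_k * pbar_k * (1 - pbar_k), with pbar_k = exp / n_k
        stat += (obs - exp) ** 2 / (exp * (1.0 - exp / n_k))
    return stat, g - 2


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability for even df via the Poisson identity:
    P(X > x) = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical simulation settings, not taken from the paper
    rng = random.Random(2002)
    n = 200
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    # Stand-in for fitted probabilities: the true model logit(p) = -0.5 + x
    ps = [1.0 / (1.0 + math.exp(-(-0.5 + x))) for x in xs]
    ys = [1 if rng.random() < pi else 0 for pi in ps]
    x2, dev = pearson_deviance(ys, ps)
    hl, df = hosmer_lemeshow(ys, ps, g=10)
    print(f"Pearson X^2 = {x2:.2f}, deviance D = {dev:.2f} ({n} patterns)")
    print(f"Hosmer-Lemeshow C = {hl:.2f}, df = {df}, p = {chi2_sf_even(hl, df):.3f}")
```

With one observation per covariate pattern, referring X^2 or D to a χ² distribution is exactly the practice the abstract warns against, so the sketch reports them without p-values. Changing `g`, or keeping tied fitted probabilities in the same group instead of splitting them by input order, can change the Hosmer-Lemeshow value, which illustrates the implementation dependence described above. For odd degrees of freedom, a general chi-square survival function such as `scipy.stats.chi2.sf` could replace the hypothetical `chi2_sf_even`.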
Pages: 3789-3801
Number of pages: 13