Binomial Confidence Intervals and Contingency Tests: Mathematical Fundamentals and the Evaluation of Alternative Methods

Cited by: 220
Authors
Wallis, Sean [1]
Affiliation
[1] UCL, Survey of English Usage, London WC1E 6BT, England
Keywords
PROPORTION;
DOI
10.1080/09296174.2013.799918
Chinese Library Classification number
H0 [Linguistics];
Discipline classification code
050103 [Chinese Language and Philology];
Abstract
Many statistical methods rely on an underlying mathematical model of probability based on a simple approximation, one that is simultaneously well-known and yet frequently misunderstood. The Normal approximation to the Binomial distribution underpins a range of statistical tests and methods, including the calculation of accurate confidence intervals, performing goodness of fit and contingency tests, line- and model-fitting, and computational methods based upon these. A common mistake is in assuming that, since the probable distribution of error about the true value in the population is approximately Normally distributed, the same can be said for the error about an observation. This paper is divided into two parts: fundamentals and evaluation. First, we examine the estimation of confidence intervals using three initial approaches: the Wald (Normal) interval, the Wilson score interval and the exact Clopper-Pearson Binomial interval. Whereas the first two can be calculated directly from formulae, the Binomial interval must be approximated by computational search, and is computationally expensive. However, this interval provides the most precise significance test, and therefore will form the baseline for our later evaluations. We also consider two further refinements: employing log-likelihood in intervals (also requiring search) and the effect of adding a continuity correction. Second, we evaluate each approach in three test paradigms. These are the single proportion interval or 2 × 1 goodness of fit test, and two variations on the common 2 × 2 contingency test. We evaluate the performance of each approach by a practitioner strategy. Since standard advice is to fall back to exact Binomial tests in conditions when approximations are expected to fail, we report the proportion of instances where one test obtains a significant result when the equivalent exact test does not, and vice versa, across an exhaustive set of possible values.
We demonstrate that optimal methods are based on continuity-corrected versions of the Wilson interval or Yates' test, and that commonly-held beliefs about weaknesses of χ² tests are misleading. Log-likelihood, often proposed as an improvement on χ², performs disappointingly. Finally, we note that at this level of precision we may distinguish two types of 2 × 2 test according to whether the independent variable partitions data into independent populations, and we make practical recommendations for their use.
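The three initial interval methods named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's own code: the function names, the bisection routine, the tolerance and the fixed 95% level are all assumptions made for illustration, and the continuity-corrected and log-likelihood variants are not shown. The Wald and Wilson intervals follow from closed-form formulae, while the Clopper-Pearson bounds are located by searching for the population proportions whose Binomial tail areas equal α/2.

```python
import math

Z_95 = 1.959963984540054  # two-tailed 95% critical value of the standard Normal


def wald_interval(p, n, z=Z_95):
    """Wald (Normal) interval: symmetric about the observed proportion p."""
    e = z * math.sqrt(p * (1 - p) / n)
    return p - e, p + e


def wilson_interval(p, n, z=Z_95):
    """Wilson score interval (no continuity correction)."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def binom_cdf(k, n, P):
    """P(X <= k) for X ~ Binomial(n, P), summed term by term."""
    return sum(math.comb(n, i) * P**i * (1 - P) ** (n - i) for i in range(k + 1))


def _bisect_increasing(g, tol=1e-10):
    """Root of an increasing function g on [0, 1] (illustrative helper)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson_interval(k, n, alpha=0.05):
    """Exact Binomial interval for k successes in n trials, found by search."""
    # Lower bound: the P at which P(X >= k) rises to alpha / 2.
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda P: (1 - binom_cdf(k - 1, n, P)) - alpha / 2)
    # Upper bound: the P at which P(X <= k) falls to alpha / 2.
    upper = 1.0 if k == n else _bisect_increasing(
        lambda P: alpha / 2 - binom_cdf(k, n, P))
    return lower, upper


if __name__ == "__main__":
    k, n = 5, 20
    print("Wald:           ", wald_interval(k / n, n))
    print("Wilson:         ", wilson_interval(k / n, n))
    print("Clopper-Pearson:", clopper_pearson_interval(k, n))
```

With 5 successes in 20 trials, the Wald interval is symmetric about 0.25, whereas the Wilson and Clopper-Pearson intervals are skewed towards 0.5. The search step also shows why the exact interval costs more to compute: each bound requires repeated evaluation of the cumulative Binomial sum.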
Pages: 178-208
Page count: 31