The p-value interval as an inferential tool

Cited by: 9
Authors
Berger, VW [1]
Affiliation
[1] NCI, Div Canc Prevent, Biometry Res Grp, Bethesda, MD 20892 USA
Keywords
conservatism; exact test; mid-p-value; permutation test; randomization test; robustness
DOI
10.1111/1467-9884.00262
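Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract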
In the phase III randomized clinical trial setting, design-based analyses for between-group comparisons are permutation tests which strictly preserve the type I error rate. However, the conservatism of permutation tests can cause a loss of power sufficient to prevent statistical significance from being reached. Arguments regarding the use of permutation tests thus tend to be broad brush: permutation tests should be used routinely for strict preservation of the type I error, or permutation tests should not be used at all because of the loss of power due to conservatism. Lost in these arguments is the fact that the conservatism of any particular permutation test can be measured, to allow for a more moderate decision rule: use a permutation test, but only if it is not overly conservative. We propose reversing the measure of conservatism from data independent and alpha dependent to data dependent and alpha independent, to reflect the practice of reporting p-values rather than reject or no-reject decisions at a given alpha-level. Specifically, we define the p-value interval, whose lower end point is the smallest the p-value could have been without conservatism, and whose upper end point is the (generally conservative) traditional p-value. The length of the p-value interval is the null probability of the observed outcome and measures the conservatism of the test. The p-value interval allows for an explicit quantification of the extent to which more discriminating (or secondary) test statistics help to reduce conservatism by generating more outcomes, each with a lower null probability. Higher order p-value intervals can also be used to assess the robustness of statistical significance in terms of the number of patients required to switch treatment groups to break the observed statistical significance.
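The interval described in the abstract can be illustrated with a minimal sketch. It assumes a setting the record does not spell out: a two-arm trial with a binary response, analysed by a one-sided permutation test on the number of responders in the treatment arm (Fisher's exact test). The function names and the 8-of-10 versus 3-of-10 data are hypothetical and chosen only for illustration.

```python
from math import comb

def hypergeom_pmf(k, n_treat, n_ctrl, n_resp):
    """Null probability that k of the n_resp responders are in the treatment arm."""
    return comb(n_treat, k) * comb(n_ctrl, n_resp - k) / comb(n_treat + n_ctrl, n_resp)

def p_value_interval(k_obs, n_treat, n_ctrl, n_resp):
    """Return (lower, upper) for a one-sided test that rejects for large k.

    upper: traditional p-value, P(K >= k_obs) under the null.
    lower: P(K > k_obs), i.e. the upper end point minus the null
           probability of the observed outcome.
    """
    k_max = min(n_treat, n_resp)
    upper = sum(hypergeom_pmf(k, n_treat, n_ctrl, n_resp)
                for k in range(k_obs, k_max + 1))
    lower = sum(hypergeom_pmf(k, n_treat, n_ctrl, n_resp)
                for k in range(k_obs + 1, k_max + 1))
    return lower, upper

# Hypothetical data: 8 of 10 treated patients respond, 3 of 10 controls respond.
lower, upper = p_value_interval(k_obs=8, n_treat=10, n_ctrl=10, n_resp=11)
length = upper - lower          # null probability of the observed outcome
mid_p = (lower + upper) / 2     # the mid-p-value is the midpoint of the interval
print(f"p-value interval: [{lower:.4f}, {upper:.4f}], length {length:.4f}, mid-p {mid_p:.4f}")
```

In this sketch a wide interval signals a coarse null distribution and hence a conservative test; a more discriminating statistic would split the observed outcome into several outcomes with smaller null probabilities and shorten the interval.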
Pages: 79-85
Number of pages: 7