Abandon Statistical Significance

Cited by: 580
Authors
McShane, Blakeley B. [1]
Gal, David [2]
Gelman, Andrew [3, 4]
Robert, Christian [5]
Tackett, Jennifer L. [6]
Affiliations
[1] Northwestern Univ, Kellogg Sch Management, Dept Mkt, 2211 Campus Dr, Evanston, IL 60208 USA
[2] Univ Illinois, Coll Business Adm, Dept Managerial Studies, Chicago, IL USA
[3] Columbia Univ, Dept Stat, New York, NY USA
[4] Columbia Univ, Dept Polit Sci, New York, NY 10027 USA
[5] Univ Paris 09, Ctr Rech Math Decis CEREMADE, Paris, France
[6] Northwestern Univ, Dept Psychol, Evanston, IL 60208 USA
Funding
US National Science Foundation
Keywords
Null hypothesis significance testing; p-Value; Replication; Sociology of science; Statistical significance; NULL-HYPOTHESIS; P-VALUES; REVISED STANDARDS; METAANALYSIS; REPLICATION; KNOWLEDGE; INFERENCE;
DOI
10.1080/00031305.2018.1527253
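Chinese Library Classification (CLC) numbers
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
070103 [Probability theory and mathematical statistics]; 140311 [Social design and social innovation]
Abstract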
We discuss the problems that the null hypothesis significance testing (NHST) paradigm poses for replication, and more broadly for the biomedical and social sciences, as well as how these problems remain unresolved by proposals involving modified p-value thresholds, confidence intervals, and Bayes factors. We then discuss our own proposal, which is to abandon statistical significance. We recommend dropping the NHST paradigm (and the p-value thresholds intrinsic to it) as the default statistical paradigm for research, publication, and discovery in the biomedical and social sciences. Specifically, we propose that the p-value be demoted from its threshold screening role and instead, treated continuously, be considered along with currently subordinate factors (e.g., related prior evidence, plausibility of mechanism, study design and data quality, real-world costs and benefits, novelty of finding, and other factors that vary by research domain) as just one among many pieces of evidence. We have no desire to "ban" p-values or other purely statistical measures. Rather, we believe that such measures should not be thresholded and that, thresholded or not, they should not take priority over the currently subordinate factors. We also argue that it seldom makes sense to calibrate evidence as a function of p-values or other purely statistical measures. We offer recommendations for how our proposal can be implemented in the scientific publication process as well as in statistical decision making more broadly.
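To make the "threshold screening role" concrete, here is a minimal illustrative sketch (not taken from the paper). It assumes a two-sided z-test with a standard normal null and uses two hypothetical z-statistics, 1.95 and 1.97, chosen only for illustration. The function names `two_sided_p` and `threshold_verdict` are likewise hypothetical. Nearly identical evidence receives opposite labels under a 0.05 cutoff, while the continuous p-values show how little actually separates the two results.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-statistic, assuming a standard normal null."""
    # P(|Z| >= |z|) = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def threshold_verdict(p: float, alpha: float = 0.05) -> str:
    """Dichotomous NHST label: the thresholding the abstract argues against."""
    return "significant" if p < alpha else "not significant"


# Hypothetical z-statistics from two similar studies (illustrative values only).
for z in (1.95, 1.97):
    p = two_sided_p(z)
    print(f"z = {z:.2f}  p = {p:.4f}  -> {threshold_verdict(p)}")
```

The two p-values come out at roughly 0.051 and 0.049, so a difference of 0.02 in the test statistic flips the thresholded verdict. Reporting and weighing the continuous value, alongside the other factors the abstract lists, avoids this artificial split.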
Pages: 235-245
Number of pages: 11