Inferential Statistics as Descriptive Statistics: There Is No Replication Crisis if We Don't Expect Replication

Cited by: 251

Authors:

Amrhein, Valentin [1]

Trafimow, David [2,3,4]

Greenland, Sander [3,4]

Affiliations:

[1] Univ Basel, Zool Inst, CH-4051 Basel, Switzerland

[2] New Mexico State Univ, Dept Psychol, Las Cruces, NM 88003 USA

[3] Univ Calif Los Angeles, Dept Epidemiol, Los Angeles, CA USA

[4] Univ Calif Los Angeles, Dept Stat, Los Angeles, CA USA

Source:

AMERICAN STATISTICIAN | 2019 / Vol. 73

Keywords:

Auxiliary hypotheses; Confidence interval; Hypothesis test; P-value; Posterior probability; Replication; Selective reporting; Significance test; Statistical model; Unreplicable research; P-VALUES; BAYES;

DOI:

10.1080/00031305.2018.1543137

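Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification code:

020208; 070103; 0714;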

Abstract:

Statistical inference often fails to replicate. One reason is that many results may be selected for drawing inference because some threshold of a statistic like the P-value was crossed, leading to biased reported effect sizes. Nonetheless, considerable non-replication is to be expected even without selective reporting, and generalizations from single studies are rarely if ever warranted. Honestly reported results must vary from replication to replication because of varying assumption violations and random variation; excessive agreement itself would suggest deeper problems, such as failure to publish results in conflict with group expectations or desires. A general perception of a "replication crisis" may thus reflect failure to recognize that statistical tests not only test hypotheses, but countless assumptions and the entire environment in which research takes place. Because of all the uncertain and unknown assumptions that underpin statistical inferences, we should treat inferential statistics as highly unstable local descriptions of relations between assumptions and data, rather than as providing generalizable inferences about hypotheses or models. And that means we should treat statistical results as being much more incomplete and uncertain than is currently the norm. Acknowledging this uncertainty could help reduce the allure of selective reporting: Since a small P-value could be large in a replication study, and a large P-value could be small, there is simply no need to selectively report studies based on statistical results. Rather than focusing our study reports on uncertain conclusions, we should thus focus on describing accurately how the study was conducted, what problems occurred, what data were obtained, what analysis methods were used and why, and what output those methods produced.
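The abstract's claim that a small P-value could be large in a replication, and vice versa, can be illustrated with a small simulation. The following is only a minimal sketch and is not taken from the paper: it assumes a one-sample z-test with known standard deviation, and the true effect (0.3), sample size (30), number of replications (1000), and the function names `two_sided_p` and `simulate_p_values` are all hypothetical choices made for illustration.

```python
import math
import random


def two_sided_p(sample, sigma=1.0):
    """Two-sided P-value for a z-test of mean 0, assuming sigma is known."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n) / sigma
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_p_values(true_effect=0.3, n=30, replications=1000, seed=1):
    """Repeat the same idealized study many times and collect its P-values.

    All parameter values are illustrative assumptions, not from the paper.
    """
    rng = random.Random(seed)
    return [
        two_sided_p([rng.gauss(true_effect, 1.0) for _ in range(n)])
        for _ in range(replications)
    ]


p_values = simulate_p_values()
share_below = sum(p < 0.05 for p in p_values) / len(p_values)
print(f"smallest P = {min(p_values):.4f}, largest P = {max(p_values):.4f}")
print(f"share of replications with P < 0.05: {share_below:.2f}")
```

Even in this idealized setting, where every model assumption holds and the true effect never changes, the P-values of exact replications spread widely on both sides of 0.05. Real studies add varying assumption violations on top of this purely random variation.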


Pages: 262-270

Page count: 9
