Significance tests harm progress in forecasting

Cited by: 77
Authors
Armstrong, J. Scott [1 ]
Affiliations
[1] Univ Penn, Wharton Sch, Philadelphia, PA 19104 USA
Keywords
accuracy measures; combining forecasts; confidence intervals; effect size; M-competition; meta-analysis; null hypothesis; practical significance; replications;
DOI
10.1016/j.ijforecast.2007.03.004
Chinese Library Classification
F [Economics]
Discipline Classification Code
02
Abstract
I briefly summarize prior research showing that tests of statistical significance are improperly used even in leading scholarly journals. Attempts to educate researchers to avoid pitfalls have had little success. Even when done properly, however, statistical significance tests are of no value. Other researchers have discussed reasons for these failures. I was unable to find empirical evidence to support the use of significance tests under any conditions. I then show that tests of statistical significance are harmful to the development of scientific knowledge because they distract the researcher from the use of proper methods. I illustrate the dangers of significance tests by examining a re-analysis of the M3-Competition. Although the authors of the re-analysis conducted a proper series of statistical tests, they suggested that the original M3-Competition was not justified in concluding that combined forecasts reduce errors, and that the selection of the best method depends on the selection of a proper error measure. I show that the original conclusions were correct. Authors should avoid tests of statistical significance; instead, they should report on effect sizes, confidence intervals, replications/extensions, and meta-analyses. Practitioners should ignore significance tests, and journals should discourage them. (c) 2007 Published by Elsevier B.V. on behalf of International Institute of Forecasters.
Pages: 321-327
Page count: 7
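
As a minimal sketch of the reporting practice the abstract recommends, the following Python example reports an effect size and a confidence interval for the error reduction from combining forecasts, rather than a p-value. The error data are invented for illustration only; they are not from the paper or the M3-Competition.

```python
import numpy as np
from scipy import stats

# Hypothetical absolute percentage errors from two forecasting approaches,
# measured on the same 50 series (illustrative data, not real results).
rng = np.random.default_rng(42)
errors_single = rng.gamma(shape=2.0, scale=6.0, size=50)    # single best method
errors_combined = rng.gamma(shape=2.0, scale=5.0, size=50)  # combined forecasts

# Per-series error reduction achieved by combining.
diff = errors_single - errors_combined

# Effect size: Cohen's d for the paired differences.
d = diff.mean() / diff.std(ddof=1)

# 95% confidence interval for the mean error reduction.
ci = stats.t.interval(0.95, df=len(diff) - 1,
                      loc=diff.mean(),
                      scale=stats.sem(diff))

print(f"Mean error reduction from combining: {diff.mean():.2f} points")
print(f"Effect size (Cohen's d): {d:.2f}")
print(f"95% CI for the reduction: [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Reported this way, a reader sees both how large the improvement is and how precisely it is estimated, which is the information a bare significance test conceals.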