Evolution of Reporting P Values in the Biomedical Literature, 1990-2015

Cited by: 263
Authors
Chavalarias, David [1 ,2 ]
Wallach, Joshua David [3 ,4 ,5 ]
Li, Alvin Ho Ting [6 ]
Ioannidis, John P. A. [5 ,7 ,8 ,9 ]
Affiliations
[1] EHESS CNRS UMR8557, CAMS, Paris, France
[2] Complex Syst Inst Paris Ile de France ISC PIF, UPS3611, Paris, France
[3] Stanford Univ, Dept Hlth Res, Stanford, CA 94305 USA
[4] Stanford Univ, Dept Policy, Stanford, CA 94305 USA
[5] Stanford Univ, Meta Res Innovat Ctr Stanford METR, 1265 Welch Rd,MSOB X306, Stanford, CA 94305 USA
[6] Univ Western Ontario, Dept Epidemiol & Biostat, London, ON, Canada
[7] Stanford Univ, Dept Med, 1265 Welch Rd,MSOB X306, Stanford, CA 94305 USA
[8] Stanford Univ, Dept Hlth Res & Policy, 1265 Welch Rd,MSOB X306, Stanford, CA 94305 USA
[9] Stanford Univ, Dept Stat, 1265 Welch Rd,MSOB X306, Stanford, CA 94305 USA
Source
JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION | 2016 / Vol. 315 / Issue 11
Keywords
GENOME-WIDE SIGNIFICANCE; PUBLICATION BIAS; MEDICAL STATISTICS; EFFECT SIZE; ABSTRACTS; TRIALS; FALSE;
DOI
10.1001/jama.2016.1952
Chinese Library Classification (CLC) number
R5 [Internal Medicine];
Discipline classification code
1002 ; 100201 ;
Abstract
IMPORTANCE The use and misuse of P values has generated extensive debates.
OBJECTIVE To evaluate in large scale the P values reported in the abstracts and full text of biomedical research articles over the past 25 years and determine how frequently statistical information is presented in ways other than P values.
DESIGN Automated text-mining analysis was performed to extract data on P values reported in 12 821 790 MEDLINE abstracts and in 843 884 abstracts and full-text articles in PubMed Central (PMC) from 1990 to 2015. Reporting of P values in 151 English-language core clinical journals and specific article types as classified by PubMed also was evaluated. A random sample of 1000 MEDLINE abstracts was manually assessed for reporting of P values and other types of statistical information; of those abstracts reporting empirical data, 100 articles were also assessed in full text.
MAIN OUTCOMES AND MEASURES P values reported.
RESULTS Text mining identified 4 572 043 P values in 1 608 736 MEDLINE abstracts and 3 438 299 P values in 385 393 PMC full-text articles. Reporting of P values in abstracts increased from 7.3% in 1990 to 15.6% in 2014. In 2014, P values were reported in 33.0% of abstracts from the 151 core clinical journals (n = 29 725 abstracts), 35.7% of meta-analyses (n = 5620), 38.9% of clinical trials (n = 4624), 54.8% of randomized controlled trials (n = 13 544), and 2.4% of reviews (n = 71 529). The distribution of reported P values in abstracts and in full text showed strong clustering at P values of .05 and of .001 or smaller. Over time, the "best" (most statistically significant) reported P values were modestly smaller and the "worst" (least statistically significant) reported P values became modestly less significant. Among the MEDLINE abstracts and PMC full-text articles with P values, 96% reported at least 1 P value of .05 or lower, with the proportion remaining steady over time in PMC full-text articles. In 1000 abstracts that were manually reviewed, 796 were from articles reporting empirical data; P values were reported in 15.7% (125/796 [95% CI, 13.2%-18.4%]) of abstracts, confidence intervals in 2.3% (18/796 [95% CI, 1.3%-3.6%]), Bayes factors in 0% (0/796 [95% CI, 0%-0.5%]), effect sizes in 13.9% (111/796 [95% CI, 11.6%-16.5%]), other information that could lead to estimation of P values in 12.4% (99/796 [95% CI, 10.2%-14.9%]), and qualitative statements about significance in 18.1% (181/1000 [95% CI, 15.8%-20.6%]); only 1.8% (14/796 [95% CI, 1.0%-2.9%]) of abstracts reported at least 1 effect size and at least 1 confidence interval. Among 99 manually extracted full-text articles with data, 55 reported P values, 4 presented confidence intervals for all reported effect sizes, none used Bayesian methods, 1 used false-discovery rates, 3 used sample size/power calculations, and 5 specified the primary outcome.
CONCLUSIONS AND RELEVANCE In this analysis of P values reported in MEDLINE abstracts and in PMC articles from 1990-2015, more MEDLINE abstracts and articles reported P values over time, almost all abstracts and articles with P values reported statistically significant results, and, in a subgroup analysis, few articles included confidence intervals, Bayes factors, or effect sizes. Rather than reporting isolated P values, articles should include effect sizes and uncertainty metrics.
Pages: 1141-1148
Page count: 8