Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome

Cited by: 62
Authors
Garijo, Daniel [1]
Kinnings, Sarah [2]
Xie, Li [3]
Xie, Lei [4]
Zhang, Yinliang [5]
Bourne, Philip E. [3]
Gil, Yolanda [6, 7]
Affiliations
[1] Univ Politecn Madrid, Fac Informat, Ontol Engn Grp, Madrid, Spain
[2] Univ Calif San Diego, Dept Chem & Biochem, La Jolla, CA 92093 USA
[3] Univ Calif San Diego, Skaggs Sch Pharm & Pharmaceut Sci, La Jolla, CA 92093 USA
[4] CUNY Hunter Coll, Dept Comp Sci, New York, NY 10021 USA
[5] Univ Sci & Technol China, Sch Life Sci, Hefei, Anhui, Peoples R China
[6] Univ So Calif, Inst Informat Sci, Los Angeles, CA USA
[7] Univ So Calif, Dept Comp Sci, Los Angeles, CA 90089 USA
Funding
U.S. National Science Foundation
Keywords
REPEATABILITY
DOI
10.1371/journal.pone.0080278
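Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general]
Subject classification codes
070301 [Inorganic chemistry]; 070403 [Astrophysics]; 070507 [Natural resources and territorial spatial planning]; 090105 [Crop production systems and ecological engineering]
Abstract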
How easy is it to reproduce the results found in a typical computational biology paper? Either through experience or intuition the reader will already know that the answer is "with difficulty" or "not at all". In this paper we attempt to quantify this difficulty by reproducing a previously published paper for different classes of users (ranging from users with little expertise to domain experts) and suggest ways in which the situation might be improved. Quantification is achieved by estimating the time required to reproduce each of the steps in the method described in the original paper and make them part of an explicit workflow that reproduces the original results. Reproducing the method took several months of effort, and required using new versions and new software that posed challenges to reconstructing and validating the results. The quantification leads to "reproducibility maps" that reveal that novice researchers would only be able to reproduce a few of the steps in the method, and that only expert researchers with advanced knowledge of the domain would be able to reproduce the method in its entirety. The workflow itself is published as an online resource together with supporting software and data. The paper concludes with a brief discussion of the complexities of requiring reproducibility in terms of cost versus benefit, and a set of desiderata with our observations and guidelines for improving reproducibility. This has implications not only for reproducing the work of others from published papers, but also for reproducing work from one's own laboratory.
Pages: 11