Reproducibility of computational workflows is automated using continuous analysis

Cited by: 89
Authors
Beaulieu-Jones, Brett K. [1]
Greene, Casey S. [2]
Affiliations
[1] Univ Penn, Perelman Sch Med, Genom & Computat Biol Grad Grp, Philadelphia, PA 19104 USA
[2] Univ Penn, Perelman Sch Med, Dept Syst Pharmacol & Translat Therapeut, Philadelphia, PA 19104 USA
Funding
National Institutes of Health (USA)
DOI
10.1038/nbt.3780
Chinese Library Classification (CLC) number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Replication, validation and extension of experiments are crucial for scientific progress. Computational experiments are scriptable and should be easy to reproduce. However, computational analyses are designed and run in a specific computing environment, which may be difficult or impossible to match using written instructions. We report the development of continuous analysis, a workflow that enables reproducible computational analyses. Continuous analysis combines Docker, a container technology akin to virtual machines, with continuous integration, a software development technique, to automatically rerun a computational analysis whenever updates or improvements are made to source code or data. This enables researchers to reproduce results without contacting the study authors. Continuous analysis allows reviewers, editors or readers to verify reproducibility without manually downloading and rerunning code and can provide an audit trail for analyses of data that cannot be shared.
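The rerun-on-change idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: a real continuous integration service would typically trigger on a commit to the repository, whereas this sketch stands in for that trigger by hashing the tracked files. The function names (`fingerprint`, `needs_rerun`, `docker_command`), the `/analysis` mount point, and the image and script names are all hypothetical. The sketch only builds the `docker run` argument list and does not start a container.

```python
import hashlib
from pathlib import Path


def fingerprint(paths):
    """Return one SHA-256 digest covering the names and contents of the given files."""
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in paths):
        digest.update(str(path).encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


def needs_rerun(paths, last_fingerprint):
    """True when source code or data changed since the last recorded run."""
    return fingerprint(paths) != last_fingerprint


def docker_command(image, workdir, script):
    """Build the argument list for rerunning the analysis inside a container.

    Assumption: the analysis is a Python script and the project directory
    is mounted at /analysis inside the container.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:/analysis",
        "-w", "/analysis",
        image, "python", script,
    ]
```

Under these assumptions, a scheduler could call `needs_rerun` on the tracked code and data files and, when it returns `True`, pass the result of `docker_command` to `subprocess.run` so that the analysis executes in the same pinned environment every time.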
Pages: 342 / +
Number of pages: 7