'Big data', Hadoop and cloud computing in genomics

Cited by: 246
Authors
O'Driscoll, Aisling [1]
Daugelaite, Jurate [2]
Sleator, Roy D. [2]
Affiliations
[1] Cork Inst Technol, Dept Comp, Cork, Ireland
[2] Cork Inst Technol, Dept Biol Sci, Cork, Ireland
Keywords
Cloud computing; Bioinformatics; Big data; Genomics; Hadoop; BIOINFORMATICS; FRAMEWORK; TOOL; PREDICTION; PLATFORM; ERA
DOI
10.1016/j.jbi.2013.07.001
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Since the completion of the Human Genome project at the turn of the century, there has been an unprecedented proliferation of genomic sequence data. A consequence of this is that the medical discoveries of the future will largely depend on our ability to process and analyse large genomic data sets, which continue to expand as the cost of sequencing decreases. Herein, we provide an overview of cloud computing and big data technologies, and discuss how such expertise can be used to deal with biology's big data sets. In particular, big data technologies such as the Apache Hadoop project, which provides distributed and parallelised data processing and analysis of petabyte (PB)-scale data sets, will be discussed, together with an overview of the current usage of Hadoop within the bioinformatics community. (C) 2013 Elsevier Inc. All rights reserved.
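The abstract refers to Hadoop's distributed, parallelised data processing, which Hadoop implements through the MapReduce programming model. As a minimal illustrative sketch (not taken from the paper and not using the Hadoop API), the three MapReduce phases (map, shuffle, reduce) can be simulated in a single Python process for a typical genomics task: counting k-mers across sequencing reads. The function names `map_kmers`, `shuffle`, `reduce_counts` and `count_kmers`, and the choice of k-mer counting as the example task, are hypothetical assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_kmers(read: str, k: int) -> Iterator[Tuple[str, int]]:
    """Map phase (hypothetical helper): emit (k-mer, 1) for every k-length substring of one read."""
    read = read.upper()
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort phase: group all emitted values by key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the partial counts emitted for one k-mer."""
    return key, sum(values)


def count_kmers(reads: Iterable[str], k: int) -> Dict[str, int]:
    # Assumption: in a real Hadoop job, map tasks would run in parallel on
    # separate blocks of the input; here every phase runs sequentially.
    emitted = (pair for read in reads for pair in map_kmers(read, k))
    grouped = shuffle(emitted)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGT"]
    print(count_kmers(reads, 3))  # {'ACG': 2, 'CGT': 2, 'GTA': 2, 'TAC': 2}
```

The map and reduce functions hold no shared state, which is what lets a framework such as Hadoop distribute them across many machines; only the shuffle step requires moving data between nodes.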
Pages: 774–781
Number of pages: 8