Galaxy CloudMan: delivering cloud compute clusters

Cited by: 106
Authors
Afgan, Enis [1, 2]
Baker, Dannon [1, 2]
Coraor, Nate [3]
Chapman, Brad [4]
Nekrutenko, Anton [3]
Taylor, James [1, 2]
Affiliations
[1] Emory Univ, Dept Biol, Atlanta, GA 30322 USA
[2] Emory Univ, Dept Math & Comp Sci, Atlanta, GA 30322 USA
[3] Penn State Univ, Huck Inst Life Sci, University Pk, PA 16802 USA
[4] Massachusetts Gen Hosp, Dept Mol Biol, Simches Res Ctr, Boston, MA 02114 USA
Source
BMC BIOINFORMATICS | 2010 / Volume 11
Funding
US National Science Foundation;
Keywords
Computing power;
DOI
10.1186/1471-2105-11-S12-S4
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010; 081704;
Abstract
Background: Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of the computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is "cloud computing", which, in principle, offers on-demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate "as is" use by experimental biologists.
Results: We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon's EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add to or customize an otherwise available cloud system to better meet their needs.
Conclusions: The knowledge and effort required to deploy a compute cluster in the Amazon EC2 cloud are not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge.
Pages: 6