Resources and Costs for Microbial Sequence Analysis Evaluated Using Virtual Machines and Cloud Computing

Cited by: 45
Authors
Angiuoli, Samuel V. [1, 2]
White, James R. [1]
Matalka, Malcolm [1]
White, Owen [1]
Fricke, W. Florian [1]
Affiliations
[1] Univ Maryland, IGS, Baltimore, MD 21201 USA
[2] Univ Maryland, Ctr Bioinformat & Computat Biol, College Pk, MD 20742 USA
Source
PLOS ONE | 2011 / Vol. 6 / Issue 10
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
PROTEIN FAMILIES; RNA GENES; ANNOTATION; MEDICINE; DATABASE; SOFTWARE; DNA;
DOI
10.1371/journal.pone.0026624
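Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09
Abstract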
Background: The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly.
Results: We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers.
Conclusions: Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.
Pages: 10