Analysis and comparison of very large metagenomes with fast clustering and functional annotation

Cited by: 76
Authors
Li, Weizhong [1]
Affiliations
[1] Univ Calif San Diego, Calif Inst Telecommun & Informat Technol, La Jolla, CA 92093 USA
Source
BMC BIOINFORMATICS | 2009 / Volume 10
Keywords
PHYLOGENETIC CLASSIFICATION; GENE PREDICTION; PROTEIN; GENERATION; SEQUENCES; PATTERNS; GENOMICS; PROGRAM; SETS;
DOI
10.1186/1471-2105-10-359
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. Results: The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". Conclusion: RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.
Pages: 9
Related papers
35 in total
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   The marine viromes of four oceanic regions [J].
Angly, Florent E. ;
Felts, Ben ;
Breitbart, Mya ;
Salamon, Peter ;
Edwards, Robert A. ;
Carlson, Craig ;
Chan, Amy M. ;
Haynes, Matthew ;
Kelley, Scott ;
Liu, Hong ;
Mahaffy, Joseph M. ;
Mueller, Jennifer E. ;
Nulton, Jim ;
Olson, Robert ;
Parsons, Rachel ;
Rayhawk, Steve ;
Suttle, Curtis A. ;
Rohwer, Forest .
PLOS BIOLOGY, 2006, 4 (11) :2121-2131
[3]  
[Anonymous], IMPROVED HMMERHEAD B
[4]   The RAST server: Rapid annotations using subsystems technology [J].
Aziz, Ramy K. ;
Bartels, Daniela ;
Best, Aaron A. ;
DeJongh, Matthew ;
Disz, Terrence ;
Edwards, Robert A. ;
Formsma, Kevin ;
Gerdes, Svetlana ;
Glass, Elizabeth M. ;
Kubal, Michael ;
Meyer, Folker ;
Olsen, Gary J. ;
Olson, Robert ;
Osterman, Andrei L. ;
Overbeek, Ross A. ;
McNeil, Leslie K. ;
Paarmann, Daniel ;
Paczian, Tobias ;
Parrello, Bruce ;
Pusch, Gordon D. ;
Reich, Claudia ;
Stevens, Rick ;
Vassieva, Olga ;
Vonstein, Veronika ;
Wilke, Andreas ;
Zagnitko, Olga .
BMC GENOMICS, 2008, 9 (1)
[5]   Community genomics among stratified microbial assemblages in the ocean's interior [J].
DeLong, EF ;
Preston, CM ;
Mincer, T ;
Rich, V ;
Hallam, SJ ;
Frigaard, NU ;
Martinez, A ;
Sullivan, MB ;
Edwards, R ;
Brito, BR ;
Chisholm, SW ;
Karl, DM .
SCIENCE, 2006, 311 (5760) :496-503
[6]   Functional metagenomic profiling of nine biomes [J].
Dinsdale, Elizabeth A. ;
Edwards, Robert A. ;
Hall, Dana ;
Angly, Florent ;
Breitbart, Mya ;
Brulc, Jennifer M. ;
Furlan, Mike ;
Desnues, Christelle ;
Haynes, Matthew ;
Li, Linlin ;
McDaniel, Lauren ;
Moran, Mary Ann ;
Nelson, Karen E. ;
Nilsson, Christina ;
Olson, Robert ;
Paul, John ;
Brito, Beltran Rodriguez ;
Ruan, Yijun ;
Swan, Brandon K. ;
Stevens, Rick ;
Valentine, David L. ;
Thurber, Rebecca Vega ;
Wegley, Linda ;
White, Bryan A. ;
Rohwer, Forest .
NATURE, 2008, 452 (7187) :629-U8
[7]   Profile hidden Markov models [J].
Eddy, SR .
BIOINFORMATICS, 1998, 14 (09) :755-763
[8]   Metagenomic analysis of the human distal gut microbiome [J].
Gill, Steven R. ;
Pop, Mihai ;
DeBoy, Robert T. ;
Eckburg, Paul B. ;
Turnbaugh, Peter J. ;
Samuel, Buck S. ;
Gordon, Jeffrey I. ;
Relman, David A. ;
Fraser-Liggett, Claire M. ;
Nelson, Karen E. .
SCIENCE, 2006, 312 (5778) :1355-1359
[9]   Gene prediction in metagenomic fragments: A large scale machine learning approach [J].
Hoff, Katharina J. ;
Tech, Maike ;
Lingner, Thomas ;
Daniel, Rolf ;
Morgenstern, Burkhard ;
Meinicke, Peter .
BMC BIOINFORMATICS, 2008, 9 (1)
[10]   MEGAN analysis of metagenomic data [J].
Huson, Daniel H. ;
Auch, Alexander F. ;
Qi, Ji ;
Schuster, Stephan C. .
GENOME RESEARCH, 2007, 17 (03) :377-386