Scalable metagenomic taxonomy classification using a reference genome database

Cited by: 127
Authors
Ames, Sasha K. [1, 2]
Hysom, David A. [1, 2]
Gardner, Shea N. [2, 3]
Lloyd, G. Scott [1, 2]
Gokhale, Maya B. [1, 2]
Allen, Jonathan E. [2, 3]
Affiliations
[1] Ctr Appl Sci Comp, Livermore, CA 94551 USA
[2] Lawrence Livermore Natl Lab, Livermore, CA 94551 USA
[3] Global Secur Directorate, Livermore, CA 94551 USA
Keywords
SEQUENCES; ALGORITHM; ACCURATE;
DOI
10.1093/bioinformatics/btt389
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and to accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge.
Results: A method is presented that shifts computational costs to an off-line computation by creating a taxonomy/genome index that supports scalable metagenomic classification. Scalable performance is demonstrated on real and simulated data, showing accurate classification in the presence of novel organisms on samples that include viruses, prokaryotes, fungi and protists. Taxonomic classification of the previously published 150-gigabase Tyrolean Iceman dataset took <20 h on a single-node, 40-core, large-memory machine and provided new insights into the metagenomic contents of the sample.
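The split between an off-line index build and a cheap on-line lookup can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not specify the index layout, so the use of fixed-length k-mers, the majority-vote rule, the function names (`build_index`, `classify_read`), the taxon labels, and the sequences are all assumptions made for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(genomes, k):
    """Off-line step: map each k-mer to the set of taxa whose genome contains it."""
    index = {}
    for taxon, seq in genomes.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(taxon)
    return index


def classify_read(read, index, k):
    """On-line step: each indexed k-mer in the read votes for its taxa.

    Returns the taxon with the most votes, or None if no k-mer is indexed.
    """
    votes = Counter()
    for kmer in kmers(read, k):
        for taxon in index.get(kmer, ()):
            votes[taxon] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical reference "genomes" (made-up sequences, not real data).
genomes = {
    "taxon_A": "ACGTACGTGGCCTTAA",
    "taxon_B": "TTGACCATGCATGCAA",
}

index = build_index(genomes, k=5)  # done once, ahead of time

print(classify_read("GTACGTGGCC", index, k=5))
print(classify_read("CCATGCATGC", index, k=5))
print(classify_read("NNNNNNNNNN", index, k=5))
```

In this sketch the expensive work (scanning every reference genome) happens once in `build_index`, so classifying a read costs only a handful of dictionary lookups. A read with no indexed k-mers is left unclassified rather than forced into a taxon.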
Pages: 2253-2260
Number of pages: 8