Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences

Cited by: 420
Authors
Rideout, Jai Ram [1 ,2 ]
He, Yan [3 ,4 ]
Navas-Molina, Jose A. [5 ]
Walters, William A. [6 ]
Ursell, Luke K. [7 ]
Gibbons, Sean M. [8 ,11 ]
Chase, John [9 ]
McDonald, Daniel [5 ,10 ]
Gonzalez, Antonio [10 ]
Robbins-Pianka, Adam [5 ,10 ]
Clemente, Jose C. [2 ]
Gilbert, Jack A. [11 ,12 ]
Huse, Susan M. [13 ]
Zhou, Hong-Wei [3 ,4 ]
Knight, Rob [10 ,14 ]
Caporaso, J. Gregory [1 ,9 ]
Affiliations
[1] No Arizona Univ, Ctr Microbial Genet & Gen, Flagstaff, AZ 86011 USA
[2] Icahn Sch Med Mt Sinai, Dept Genet & Genom Sci, New York, NY 10029 USA
[3] Southern Med Univ, Sch Publ Hlth & Trop Med, State Key Lab Organ Failure Prevent, Guangzhou, Guangdong, Peoples R China
[4] Southern Med Univ, Sch Publ Hlth & Trop Med, Dept Environm Hlth, Guangzhou, Guangdong, Peoples R China
[5] Univ Colorado, Dept Comp Sci, Boulder, CO 80309 USA
[6] Univ Colorado, Dept Mol Cellular & Dev Biol, Boulder, CO 80309 USA
[7] Univ Colorado, Dept Chem & Biochem, Boulder, CO 80309 USA
[8] Univ Chicago, Grad Program Biophys Sci, Chicago, IL 60637 USA
[9] No Arizona Univ, Dept Biol Sci, Flagstaff, AZ 86011 USA
[10] Univ Colorado, BioFrontiers Inst, Boulder, CO 80309 USA
[11] Argonne Natl Lab, Inst Genom & Syst Biol, Lemont, IL USA
[12] Univ Chicago, Dept Ecol & Evolut, Chicago, IL 60637 USA
[13] Brown Univ, Warren Alpert Med Sch, Dept Pathol & Lab Sci, Providence, RI 02912 USA
[14] Univ Colorado, Howard Hughes Med Inst, Boulder, CO 80309 USA
Source
PEERJ | 2014 / Vol. 2
Funding
US National Science Foundation;
Keywords
OTU picking; Microbial ecology; Microbiome; QIIME; Bioinformatics; Greengenes
DOI
10.7717/peerj.545
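Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract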
We present a performance-optimized algorithm, subsampled open-reference OTU picking, for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation sequencing platforms to operational taxonomic units (OTUs) for microbial community analysis. This algorithm provides benefits over de novo OTU picking (clustering can be performed largely in parallel, reducing runtime) and closed-reference OTU picking (all reads are clustered, not only those that match a reference database sequence with high similarity). Because more of our algorithm can be run in parallel relative to "classic" open-reference OTU picking, it makes open-reference OTU picking tractable on massive amplicon sequence data sets (though on smaller data sets, "classic" open-reference OTU clustering is often faster). We illustrate this by applying it to the first 15,000 samples sequenced for the Earth Microbiome Project (1.3 billion V4 16S rRNA amplicons). To the best of our knowledge, this is the largest OTU picking run ever performed, and we estimate that our new algorithm runs in less than one-fifth of the time that "classic" open-reference OTU picking would require. Through comparisons on three well-studied datasets, we show that subsampled open-reference OTU picking yields results that are highly correlated with those generated by "classic" open-reference OTU picking. An implementation of this algorithm is provided in the popular QIIME software package, which uses uclust for read clustering. All analyses were performed using QIIME's uclust wrappers, though we provide details (aided by the open-source code in our GitHub repository) that will allow implementation of subsampled open-reference OTU picking independently of QIIME (e.g., in a compiled programming language, where runtimes should be further reduced). Our analyses should generalize to other implementations of these OTU picking algorithms.
Finally, we present a comparison of parameter settings in QIIME's OTU picking workflows and make recommendations on settings for these free parameters to optimize runtime without reducing the quality of the results. These optimized parameters can vastly decrease the runtime of uclust-based OTU picking in QIIME.
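The abstract names the algorithm but does not list its steps. The following is a minimal Python sketch of the workflow as it is generally described for this method: closed-reference assignment, de novo clustering of a random subsample of the failures to build new reference sequences, closed-reference assignment of all failures against those new references, and a final de novo clustering of whatever remains. It is an illustration under stated assumptions, not the QIIME implementation: the function names, the `New.Reference`/`New.Cleanup` prefixes, and the position-wise `identity` score (a toy stand-in for uclust) are all hypothetical.

```python
import math
import random


def identity(a, b):
    """Toy similarity: fraction of matching positions (no alignment).
    Assumption: stands in for a real clustering tool such as uclust."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest


def closed_reference(reads, refs, threshold):
    """Assign each read to its best-matching reference, or record a failure."""
    otus, failures = {}, {}
    for read_id, seq in reads.items():
        best_ref, best_score = None, 0.0
        for ref_id, ref_seq in refs.items():
            score = identity(seq, ref_seq)
            if score >= threshold and (best_ref is None or score > best_score):
                best_ref, best_score = ref_id, score
        if best_ref is None:
            failures[read_id] = seq
        else:
            otus.setdefault(best_ref, []).append(read_id)
    return otus, failures


def de_novo(reads, threshold, prefix):
    """Greedy centroid clustering: a read joins the first centroid it matches,
    otherwise it becomes the centroid of a new OTU."""
    centroids, otus = {}, {}
    for read_id, seq in reads.items():
        for otu_id, centroid in centroids.items():
            if identity(seq, centroid) >= threshold:
                otus[otu_id].append(read_id)
                break
        else:
            otu_id = f"{prefix}{len(centroids)}"
            centroids[otu_id] = seq
            otus[otu_id] = [read_id]
    return centroids, otus


def subsampled_open_reference(reads, refs, threshold=0.97, fraction=0.1, seed=0):
    # Step 1: closed-reference against the existing reference collection.
    otus, failures = closed_reference(reads, refs, threshold)
    if not failures:
        return otus
    # Step 2: de novo cluster a random subsample of the failures; the
    # resulting centroids serve as new reference sequences.
    k = min(len(failures), max(1, math.ceil(len(failures) * fraction)))
    sampled_ids = random.Random(seed).sample(sorted(failures), k)
    sampled = {read_id: failures[read_id] for read_id in sampled_ids}
    new_refs, _ = de_novo(sampled, threshold, "New.Reference")
    # Step 3: closed-reference of all step-1 failures against the new references.
    new_otus, leftovers = closed_reference(failures, new_refs, threshold)
    # Step 4: de novo cluster the reads that still failed to match.
    _, cleanup_otus = de_novo(leftovers, threshold, "New.Cleanup")
    otus.update(new_otus)
    otus.update(cleanup_otus)
    return otus
```

In this sketch, steps 1 and 3 handle each read independently of every other read, so they could be split across workers, while the serial de novo steps see only the subsample and the leftovers. That is the intuition behind the runtime benefit the abstract claims.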
Pages: 25