Clustering 16S rRNA for OTU prediction: a method of unsupervised Bayesian clustering

Cited by: 190
Authors
Hao, Xiaolin [1]
Jiang, Rui [2, 3]
Chen, Ting [1]
Affiliations
[1] Univ So Calif, Dept Biol, Mol & Computat Biol Program, Los Angeles, CA 90089 USA
[2] Tsinghua Univ, Dept Automat, MOE Key Lab Bioinformat, TNLIST, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Dept Automat, TNLIST, Bioinformat Div, Beijing 100084, Peoples R China
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
MULTIPLE SEQUENCE ALIGNMENT; MICROBIAL DIVERSITY; UNKNOWN NUMBER; RARE BIOSPHERE; COMPONENTS; SEARCH;
DOI
10.1093/bioinformatics/btq725
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: With the advancements of next-generation sequencing technology, it is now possible to study samples directly obtained from the environment. Particularly, 16S rRNA gene sequences have been frequently used to profile the diversity of organisms in a sample. However, such studies are still taxed to determine both the number of operational taxonomic units (OTUs) and their relative abundance in a sample.
Results: To address these challenges, we propose an unsupervised Bayesian clustering method termed Clustering 16S rRNA for OTU Prediction (CROP). CROP can find clusters based on the natural organization of data without setting a hard cut-off threshold (3%/5%) as required by hierarchical clustering methods. By applying our method to several datasets, we demonstrate that CROP is robust against sequencing errors and that it produces more accurate results than conventional hierarchical clustering methods.
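Illustrative note: the hard cut-off that the abstract contrasts CROP against can be sketched as follows. This is not CROP and is not taken from the paper; it is a minimal complete-linkage baseline, assuming pre-aligned sequences of equal length and a simple mismatch fraction as the distance. The names `dissimilarity` and `cluster_otus` and the toy reads are hypothetical.

```python
from itertools import combinations


def dissimilarity(a, b):
    """Fraction of mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_otus(seqs, cutoff=0.03):
    """Complete-linkage agglomerative clustering, stopped at a hard cutoff.

    Returns a list of clusters, each a list of indices into `seqs`.
    """
    clusters = [[i] for i in range(len(seqs))]

    def linkage(c1, c2):
        # Complete linkage: distance between the two most distant members.
        return max(dissimilarity(seqs[i], seqs[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters (p < q by construction).
        (p, q), d = min(
            (((p, q), linkage(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break  # hard threshold: no remaining pair is close enough to merge
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Toy reads of length 100 (hypothetical data, for illustration only).
reads = [
    "A" * 100,             # reference
    "C" + "A" * 99,        # 1% from the reference
    "CC" + "A" * 98,       # 2% from the reference
    "G" * 50 + "A" * 50,   # 50% from the reference
]

otus_at_3_percent = cluster_otus(reads, cutoff=0.03)
otus_at_1_5_percent = cluster_otus(reads, cutoff=0.015)
```

On these toy reads the number of predicted OTUs changes with the chosen cutoff (two clusters at 3%, three at 1.5%), which is the kind of threshold dependence the abstract says CROP is designed to avoid.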
Pages: 611-618
Number of pages: 8