M-pick, a modularity-based method for OTU picking of 16S rRNA sequences

Times cited: 29
Authors
Wang, Xiaoyu [1,2]
Yao, Jin [1,2,3]
Sun, Yijun [4]
Mai, Volker [1,2]
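Affiliations
[1] University of Florida, College of Public Health and Health Professions, Department of Epidemiology, Gainesville, FL 32610, USA
[2] University of Florida, College of Medicine, Emerging Pathogens Institute, Gainesville, FL 32610, USA
[3] University of Florida, Department of Electrical and Computer Engineering, Gainesville, FL 32610, USA
[4] State University of New York at Buffalo, Department of Computer Science and Engineering, Center of Excellence in Bioinformatics and Life Sciences, Department of Microbiology and Immunology, Buffalo, NY 14214, USA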
Source
BMC Bioinformatics, 2013, Vol. 14
Funding
National Science Foundation (USA)
Keywords
PROGRAM
DOI
10.1186/1471-2105-14-43
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Binning 16S rRNA sequences into operational taxonomic units (OTUs) is a crucial initial step in analyzing the large sequence datasets generated to determine microbial community composition in various environments, including the human gut. Various methods have been developed, but most are either inaccurate or unable to handle the millions of sequences generated in current studies. Furthermore, existing binning methods usually require a priori decisions about binning parameters, such as the distance level used to define an OTU.

Results: We present a novel modularity-based approach (M-pick) to address these problems. The method borrows ideas from community detection in graphs: sequences are viewed as vertices of a weighted graph, each pair of sequences is connected by an imaginary edge, and the similarity of the pair is the weight of that edge. M-pick first builds a graph from pairwise sequence distances and then applies a modularity-based community detection technique to the graph, so that the resulting OTUs capture the community structure in the sequence data. To compare the performance of M-pick with that of existing methods, specifically CROP and ESPRIT-Tree, sequence data from different hypervariable regions of 16S rRNA were used and the binning results were compared.

Conclusions: This study develops a new modularity-based clustering method for OTU picking of 16S rRNA sequences. The algorithm does not require a predetermined cut-off level, and our simulation studies suggest that it outperforms existing methods that require a specified distance level to define OTUs. The source code is available at http://plaza.ufl.edu/xywang/Mpick.htm.
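As a rough illustration of the two steps described in the abstract (pairwise distances, then a weighted graph, then modularity-based community detection), the Python sketch below assumes that a symmetric pairwise distance matrix with values in [0, 1] has already been computed from the sequences and turns each distance d into an edge weight 1 - d. That conversion, the function names build_graph, modularity and local_moving, and the toy data are all hypothetical. For the community detection step it swaps in a single-level greedy local-moving pass in the style of the Louvain method (Blondel et al., 2008), which tries to maximise the Newman-Girvan modularity Q = (1/2m) Σ_ij [w_ij - k_i k_j / (2m)] δ(c_i, c_j), where w_ij is the edge weight, k_i the weighted degree of sequence i, 2m = Σ_i k_i, and δ(c_i, c_j) = 1 when sequences i and j share a community. It is a sketch of the general idea, not the authors' implementation.

```python
from collections import defaultdict


def build_graph(dist):
    """Turn a symmetric distance matrix (values in [0, 1]) into a complete
    weighted graph. Every pair of sequences gets an edge whose weight is the
    similarity 1 - distance (an assumed conversion); there are no self-loops."""
    n = len(dist)
    return [[0.0 if i == j else 1.0 - dist[i][j] for j in range(n)]
            for i in range(n)]


def modularity(w, labels):
    """Newman-Girvan modularity Q of a partition of a weighted undirected graph."""
    n = len(w)
    k = [sum(row) for row in w]          # weighted degree k_i of each vertex
    two_m = sum(k)                       # 2m: total edge weight counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:   # delta(c_i, c_j) = 1
                q += w[i][j] - k[i] * k[j] / two_m
    return q / two_m


def local_moving(w, max_sweeps=100):
    """One level of Louvain-style greedy optimisation: start from singleton
    communities and repeatedly move each vertex to the community that gives
    the largest modularity gain, until a full sweep makes no move."""
    n = len(w)
    k = [sum(row) for row in w]
    two_m = sum(k)
    labels = list(range(n))
    tot = k[:]                           # total degree of each community, by label
    for _ in range(max_sweeps):
        moved = False
        for i in range(n):
            ci = labels[i]
            links = defaultdict(float)   # edge weight from i into each community
            for j in range(n):
                if j != i:
                    links[labels[j]] += w[i][j]
            tot[ci] -= k[i]              # take i out of its current community
            # Gain (up to the constant factor 1/m) of inserting i into community c.
            best_c = ci
            best_gain = links[ci] - tot[ci] * k[i] / two_m
            for c, weight in links.items():
                gain = weight - tot[c] * k[i] / two_m
                if gain > best_gain:
                    best_c, best_gain = c, gain
            tot[best_c] += k[i]          # put i into the best community found
            if best_c != ci:
                labels[i] = best_c
                moved = True
        if not moved:
            break
    return labels


if __name__ == "__main__":
    # Hypothetical toy data: two groups of three "sequences", close within a
    # group (distance 0.05) and far apart between groups (distance 0.6).
    toy = [[0.0 if i == j else (0.05 if (i < 3) == (j < 3) else 0.6)
            for j in range(6)] for i in range(6)]
    graph = build_graph(toy)
    otus = local_moving(graph)
    print(otus, modularity(graph, otus))
```

On this toy matrix the first three and the last three sequences end up sharing two different labels, and no distance cut-off is passed in: the number of groups comes out of the modularity optimisation. A full Louvain run would also collapse each community into a super-vertex and repeat the pass, and the O(n²) loops over a complete graph make this sketch suitable only for small examples.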
Pages: 8
References (29 in total)
[1] Amigo E, Gonzalo J, Artiles J, Verdejo F. A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information Retrieval, 2009, 12(4):461-486.
[2] Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008.
[3] Cai Y, Sun Y. ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time. Nucleic Acids Research, 2011, 39(14):e95.
[4] Cheng L, Walker AW, Corander J. Bayesian estimation of bacterial community composition from 454 sequencing data. Nucleic Acids Research, 2012, 40(12):5240-5249.
[5] Cole JR, Chai B, Farris RJ, Wang Q, Kulam SA, McGarrell DM, Garrity GM, Tiedje JM. The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Research, 2005, 33:D294-D296.
[6] DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Applied and Environmental Microbiology, 2006, 72(7):5069-5072.
[7] Dunn JC. Journal of Cybernetics, 1973, 3:32. DOI 10.1080/01969727308546046.
[8] Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 2010, 26(19):2460-2461.
[9] Fortunato S, Barthelemy M. Resolution limit in community detection. Proceedings of the National Academy of Sciences of the United States of America, 2007, 104(1):36-41.
[10] Fortunato S. Community detection in graphs. Physics Reports, 2010, 486(3-5):75-174.