The Effects of Alignment Quality, Distance Calculation Method, Sequence Filtering, and Region on the Analysis of 16S rRNA Gene-Based Studies

Cited by: 263
Authors
Schloss, Patrick D. [1]
Affiliation
[1] Univ Michigan, Dept Microbiol & Immunol, Ann Arbor, MI 48109 USA
Funding
US National Science Foundation;
Keywords
DIVERSITY; DATABASE; MUSCLE; SPACE;
DOI
10.1371/journal.pcbi.1000844
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Pyrosequencing of PCR-amplified fragments that target variable regions within the 16S rRNA gene has quickly become a powerful method for analyzing the membership and structure of microbial communities. This approach has revealed and introduced questions that were not fully appreciated by those carrying out traditional Sanger sequencing-based methods. These include the effects of alignment quality, the best method of calculating pairwise genetic distances for 16S rRNA genes, whether it is appropriate to filter variable regions, and how the choice of variable region relates to the genetic diversity observed in full-length sequences. I used a diverse collection of 13,501 high-quality full-length sequences to assess each of these questions. First, alignment quality had a significant impact on distance values and downstream analyses. Specifically, the greengenes alignment, which does a poor job of aligning variable regions, predicted higher genetic diversity, richness, and phylogenetic diversity than the SILVA and RDP-based alignments. Second, the effect of different gap treatments in determining pairwise genetic distances was strongly affected by the variation in sequence length for a region; however, the effect of different calculation methods was subtle when determining the sample's richness or phylogenetic diversity for a region. Third, applying a sequence mask to remove variable positions had a profound impact on genetic distances by muting the observed richness and phylogenetic diversity. Finally, the genetic distances calculated for each of the variable regions did a poor job of correlating with the full-length gene. Thus, while it is tempting to apply traditional cutoff levels derived for full-length sequences to these shorter sequences, it is not advisable. Analysis of beta-diversity metrics showed that each of these factors can have a significant impact on the comparison of community membership and structure. 
Taken together, these results urge caution in the design and interpretation of analyses using pyrosequencing data.
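The abstract's second finding, that gap treatment changes pairwise genetic distances most where sequence lengths vary, can be illustrated with a minimal sketch. The function name `pairwise_distance` and the three `gap_mode` labels are hypothetical, chosen for illustration; the abstract does not specify which gap treatments were compared, so the three below (skip gapped columns, count every gapped column, count each gap run once) are assumptions rather than the paper's exact methods.

```python
def pairwise_distance(a: str, b: str, gap_mode: str = "ignore") -> float:
    """Uncorrected distance between two aligned sequences of equal length.

    gap_mode (illustrative labels, not taken from the paper):
      "ignore": skip every column where exactly one sequence has a gap
      "each":   count every such gapped column as one mismatch
      "one":    count each contiguous gap run in one sequence as one mismatch
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if gap_mode not in {"ignore", "each", "one"}:
        raise ValueError(f"unknown gap_mode: {gap_mode}")

    mismatches = 0
    compared = 0
    run_owner = None  # which sequence ("a" or "b") owns the current gap run
    for x, y in zip(a.upper(), b.upper()):
        x_gap, y_gap = x == "-", y == "-"
        if x_gap and y_gap:
            continue  # gap in both sequences: column carries no information
        if x_gap or y_gap:
            if gap_mode == "ignore":
                continue
            owner = "a" if x_gap else "b"
            if gap_mode == "one" and owner == run_owner:
                continue  # still inside a gap run that was already counted
            run_owner = owner
            mismatches += 1
            compared += 1
            continue
        run_owner = None  # a base-to-base column ends any gap run
        compared += 1
        if x != y:
            mismatches += 1
    return mismatches / compared if compared else 0.0


if __name__ == "__main__":
    seq_a = "ACGTACGTAC"
    seq_b = "ACG---GTTC"  # toy alignment with one 3-column gap
    for mode in ("ignore", "each", "one"):
        print(mode, round(pairwise_distance(seq_a, seq_b, mode), 3))
```

For this toy pair, a single three-column gap moves the distance from about 0.143 (gaps skipped) to 0.25 (gap run counted once) to 0.4 (every gapped column counted), which suggests why regions with more length variation would be more sensitive to the choice.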
Pages: 16