Implications of Pyrosequencing Error Correction for Biological Data Interpretation

Cited by: 18
Authors
Bakker, Matthew G. [1]
Tu, Zheng J. [2]
Bradeen, James M. [1]
Kinkel, Linda L. [1]
Affiliations
[1] Univ Minnesota, Dept Plant Pathol, St Paul, MN 55108 USA
[2] Univ Minnesota, Inst Supercomp, Minneapolis, MN 55455 USA
Source
PLOS ONE | 2012 / Vol. 7 / Issue 08
Funding
National Science Foundation (USA);
Keywords
DIVERSITY; COMMUNITIES; QUALITY; READS; SOIL;
DOI
10.1371/journal.pone.0044357
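Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code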
07 ; 0710 ; 09 ;
Abstract
There has been a rapid proliferation of approaches for processing and manipulating second generation DNA sequence data. However, users are often left with uncertainties about how the choice of processing methods may impact biological interpretation of data. In this report, we probe differences in output between two different processing pipelines: a de-noising approach using the AmpliconNoise algorithm for error correction, and a standard approach using quality filtering and preclustering to reduce error. There was a large overlap in reads culled by each method, although AmpliconNoise removed a greater net number of reads. Most OTUs produced by one method had a clearly corresponding partner in the other. Although each method resulted in OTUs consisting entirely of reads that were culled by the other method, there were many more such OTUs formed in the standard pipeline. Total OTU richness was reduced by AmpliconNoise processing, but per-sample OTU richness, diversity and evenness were increased. Increases in per-sample richness and diversity may be a result of AmpliconNoise processing producing a more even OTU rank-abundance distribution. Because communities were randomly subsampled to equalize sample size across communities, and because rare sequence variants are less likely to be selected during subsampling, fewer OTUs were lost from individual communities when subsampling AmpliconNoise-processed data. In contrast to taxon-based diversity estimates, phylogenetic diversity was reduced even on a per-sample basis by de-noising, and samples switched widely in diversity rankings. This work illustrates the significant impacts of processing pipelines on the biological interpretations that can be made from pyrosequencing surveys. This study provides important cautions for analyses of contemporary data, for requisite data archiving (processed vs. non-processed data), and for drawing comparisons among studies performed using distinct data processing pipelines.
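The subsampling argument in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the two toy communities, the subsampling depth of 100 reads, and every function name below are hypothetical, chosen only to show why random subsampling to equal depth tends to retain more OTUs from a community with an even rank-abundance distribution than from one dominated by a single OTU with many rare variants. Richness, Shannon diversity, and Pielou evenness are computed with their standard definitions.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Randomly draw `depth` reads without replacement from an OTU count table."""
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the community")
    return Counter(rng.sample(reads, depth))


def richness(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over observed OTUs."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)


def pielou_evenness(counts):
    """Pielou evenness J' = H' / ln(S); defined as 0.0 when S <= 1."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


# Hypothetical toy communities (assumption): 20 OTUs and 1000 reads each.
even = {f"otu{i}": 50 for i in range(20)}
skewed = {"otu0": 981, **{f"otu{i}": 1 for i in range(1, 20)}}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
even_sub = rarefy(even, 100, rng)
skewed_sub = rarefy(skewed, 100, rng)

for name, sub in (("even", even_sub), ("skewed", skewed_sub)):
    print(name, richness(sub), round(shannon(sub), 3),
          round(pielou_evenness(sub), 3))
```

Both communities start with the same total richness, yet the skewed one loses most of its singleton OTUs on subsampling, which mirrors the abstract's explanation for higher per-sample richness after de-noising.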
Pages: 9