PANDAseq: PAired-eND Assembler for Illumina sequences

Cited by: 1739
Authors
Masella, Andre P. [1]
Bartram, Andrea K. [1]
Truszkowski, Jakub M. [2]
Brown, Daniel G. [2]
Neufeld, Josh D. [1]
Affiliations
[1] Univ Waterloo, Dept Biol, Waterloo, ON N2L 3G1, Canada
[2] Univ Waterloo, David R Cheriton Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
16S RIBOSOMAL-RNA; DIVERSITY;
DOI
10.1186/1471-2105-13-31
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
070307 [Chemical Biology];
Abstract
Background: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information. Results: PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods. Conclusions: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naive assembly with negligible loss of "good" sequence.
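The idea in the abstract, assembling an overlapping read pair and using quality information to resolve mismatches and uncalled bases, can be illustrated with a minimal sketch. This is not PANDAseq's algorithm, which is not described in this record. The names `assemble_pair` and `min_overlap`, the scoring rule (matches minus mismatches) and the "keep the higher-quality base" rule are all assumptions chosen for illustration.

```python
# Illustrative sketch only: a simple overlap assembler for one read pair.
# NOT PANDAseq's method; scoring and tie-breaking rules are assumptions.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def assemble_pair(fwd, fwd_qual, rev, rev_qual, min_overlap=4):
    """Merge a forward read with the reverse complement of its mate.

    fwd, rev: read sequences; fwd_qual, rev_qual: per-base Phred scores
    (lists of ints). Returns the merged sequence, or None when no overlap
    has a positive score.
    """
    rc = reverse_complement(rev)
    rc_qual = rev_qual[::-1]

    best_overlap, best_score = None, 0
    # Try every overlap length, longest first; a longer overlap wins ties.
    for overlap in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        start = len(fwd) - overlap
        score = 0
        for i in range(overlap):
            a, b = fwd[start + i], rc[i]
            if "N" in (a, b):
                continue  # an uncalled base counts neither for nor against
            score += 1 if a == b else -1
        if score > best_score:
            best_overlap, best_score = overlap, score

    if best_overlap is None:
        return None

    start = len(fwd) - best_overlap
    merged = []
    for i in range(best_overlap):
        a, b = fwd[start + i], rc[i]
        if a == b:
            merged.append(a)
        elif a == "N":
            merged.append(b)  # fill an uncalled base from the mate
        elif b == "N":
            merged.append(a)
        else:
            # Mismatch: keep the base with the higher quality score.
            merged.append(a if fwd_qual[start + i] >= rc_qual[i] else b)

    return fwd[:start] + "".join(merged) + rc[best_overlap:]
```

For example, with a hypothetical 18-base template read as two 12-base reads that overlap by 6 bases, a low-quality miscall in the forward read's overlap region is replaced by the mate's higher-quality base, and the full template is recovered.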
Pages: 7