FLASH: fast length adjustment of short reads to improve genome assemblies

Cited by: 11181
Authors
Magoc, Tanja [1]
Salzberg, Steven L. [1]
Affiliations
[1] Johns Hopkins Univ, Sch Med, McKusick Nathans Inst Genet Med, Baltimore, MD 21205 USA
Funding
National Institutes of Health (USA)
DOI
10.1093/bioinformatics/btr507
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome.
Results: We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of <1%. With adequately set parameters, FLASH correctly merged reads over 90% of the time even when the reads contained up to 5% errors. When FLASH was used to extend reads prior to assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds.
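The idea of overlapping and merging a read pair can be illustrated with a minimal Python sketch. This is not FLASH's implementation: the function names `reverse_complement` and `merge_pair`, the parameters `min_overlap` and `max_mismatch_ratio`, their default values, and the rule of picking the overlap with the lowest mismatch ratio are all assumptions made for illustration. Base quality scores are ignored.

```python
from typing import Optional

# Complement table for DNA bases; N stays N.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def merge_pair(
    read1: str,
    read2: str,
    min_overlap: int = 10,            # hypothetical default, for illustration
    max_mismatch_ratio: float = 0.25, # hypothetical default, for illustration
) -> Optional[str]:
    """Merge a paired-end read pair into one longer sequence, if possible.

    read2 is assumed to come from the opposite strand, so it is
    reverse-complemented first. Every overlap length from the longest
    possible down to min_overlap is tried; the overlap with the lowest
    mismatch ratio wins, and ties go to the longer overlap.
    Returns None when no acceptable overlap exists.
    """
    mate = reverse_complement(read2)
    best = None  # (mismatch_ratio, overlap_length)
    for olen in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        tail = read1[-olen:]  # end of the forward read
        head = mate[:olen]    # start of the reverse-complemented mate
        mismatches = sum(1 for a, b in zip(tail, head) if a != b)
        ratio = mismatches / olen
        if ratio <= max_mismatch_ratio and (best is None or ratio < best[0]):
            best = (ratio, olen)
    if best is None:
        return None
    # Simplification: in the overlap, the bases of read1 are kept as-is.
    return read1 + mate[best[1]:]
```

For example, two 20-base reads taken from opposite ends of a 30-base fragment overlap by 10 bases, so `merge_pair` would return a single 30-base sequence, while a pair with no acceptable overlap yields `None`.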
Pages: 2957-2963
Page count: 7