SHREC: a short-read error correction method

Cited by: 99
Authors
Schroeder, Jan [1, 3]
Schroeder, Heiko [2 ]
Puglisi, Simon J. [2 ]
Sinha, Ranjan [3 ]
Schmidt, Bertil [4 ]
Affiliations
[1] Univ Kiel, Inst Informat, D-24118 Kiel, Germany
[2] RMIT Univ, Sch Comp Sci & Informat Technol, Melbourne, Vic 3000, Australia
[3] Univ Melbourne, Dept Comp Sci & Software Engn, Melbourne, Vic 3010, Australia
[4] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
Funding
Australian Research Council;
Keywords
SEQUENCING TECHNOLOGY; MILLIONS;
DOI
10.1093/bioinformatics/btp379
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Motivation: Second-generation sequencing technologies produce a massive amount of short reads in a single experiment. However, sequencing errors can cause major problems when using this approach for de novo sequencing applications. Moreover, existing error correction methods have been designed and optimized for shotgun sequencing. Therefore, there is an urgent need for the design of fast and accurate computational methods and tools for error correction of large amounts of short-read data. Results: We present SHREC, a new algorithm for correcting errors in short-read data that uses a generalized suffix trie on the read data as the underlying data structure. Our results show that the method can identify erroneous reads with sensitivity and specificity of over 99% and 96% for simulated data with error rates of up to 3% as well as for real data. Furthermore, it achieves an error correction accuracy of over 80% for simulated data and over 88% for real data. These results are clearly superior to previously published approaches. SHREC is available as an efficient open-source Java implementation that allows processing of 10 million short reads on a standard workstation.
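Illustrative note: the abstract names a generalized suffix trie over the reads as the underlying data structure but gives no further detail. The following is a minimal Python sketch of that general idea only, not the SHREC algorithm or its Java implementation. The names `build_suffix_trie` and `suspicious_branches`, the `max_depth`, `depth`, and `min_count` parameters, and the rule "a rarely seen branch next to a frequently seen sibling is a candidate error" are all assumptions made for illustration.

```python
class TrieNode:
    """One trie node; `count` is the number of read suffixes passing through it."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0


def build_suffix_trie(reads, max_depth):
    """Insert every suffix of every read, truncated to `max_depth` characters."""
    root = TrieNode()
    for read in reads:
        for start in range(len(read)):
            node = root
            for base in read[start:start + max_depth]:
                node = node.children.setdefault(base, TrieNode())
                node.count += 1
    return root


def suspicious_branches(root, depth, min_count):
    """Return (rare_path, frequent_sibling_path) pairs found at the given depth.

    Hypothetical rule: a branch seen fewer than `min_count` times is flagged
    when a sibling branch under the same parent is seen at least `min_count` times.
    """
    found = []
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        if len(path) == depth - 1:
            if len(node.children) > 1:
                best_base, best = max(node.children.items(),
                                      key=lambda kv: kv[1].count)
                for base, child in node.children.items():
                    if child.count < min_count <= best.count:
                        found.append((path + base, path + best_base))
            continue
        for base, child in node.children.items():
            stack.append((child, path + base))
    return sorted(found)


# Five identical reads plus one read that differs at the fourth base.
reads = ["ACGTAC"] * 5 + ["ACGAAC"]
trie = build_suffix_trie(reads, max_depth=4)
print(suspicious_branches(trie, depth=4, min_count=2))  # [('ACGA', 'ACGT')]
```

In this toy input the branch `ACGA` is supported by a single read while its sibling `ACGT` is supported by five, so it is reported as a candidate error with `ACGT` as the likely correction. A real tool would need a principled threshold and a far more memory-efficient trie.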
Pages: 2157-2163
Page count: 7