HiTEC: accurate error correction in high-throughput sequencing data

Cited by: 89
Authors
Ilie, Lucian [1]
Fazayeli, Farideh [1]
Ilie, Silvana [2]
Affiliations
[1] Univ Western Ontario, Dept Comp Sci, London, ON N6A 5B7, Canada
[2] Ryerson Univ, Dept Math, Toronto, ON M5B 2K3, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
SHORT DNA-SEQUENCES; READS; ALIGNMENT; GENOME; OLIGONUCLEOTIDES; TECHNOLOGY; ALGORITHM; MILLIONS; PROGRAM
DOI
10.1093/bioinformatics/btq653
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: High-throughput sequencing technologies produce very large amounts of data and sequencing errors constitute one of the major problems in analyzing such data. Current algorithms for correcting these errors are not very accurate and do not automatically adapt to the given data. Results: We present HiTEC, an algorithm that provides a highly accurate, robust and fully automated method to correct reads produced by high-throughput sequencing methods. Our approach provides significantly higher accuracy than previous methods. It is time and space efficient and works very well for all read lengths, genome sizes and coverage levels.
Pages: 295-302
Number of pages: 8