Correcting errors in short reads by multiple alignments

Cited by: 119
Authors
Salmela, Leena [1 ]
Schroeder, Jan [2 ]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, Helsinki Inst Informat Technol HIT, SF-00510 Helsinki, Finland
[2] Univ Melbourne, Dept Comp Sci & Software Engn, NICTA Victorian Res Lab, Melbourne, Vic, Australia
Funding
Australian Research Council; Academy of Finland;
Keywords
SEQUENCE;
DOI
10.1093/bioinformatics/btr170
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: Current sequencing technologies produce a large number of erroneous reads. The sequencing errors present a major challenge in utilizing the data in de novo sequencing projects, as assemblers have difficulties in dealing with errors. Results: We present Coral, which corrects sequencing errors by forming multiple alignments. Unlike previous tools for error correction, Coral can also utilize bases distant from the error in the correction process, because the whole read is present in the alignment. Coral is easily adjustable to reads produced by different sequencing technologies, such as the Illumina Genome Analyzer and Roche/454 Life Sciences sequencing platforms, because the sequencing error model can be defined by the user. We show that our method is able to reduce the error rate of reads more than previous methods.
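The idea of correcting a read against a multiple alignment can be illustrated with a minimal sketch. This is not Coral's implementation: it assumes the reads have already been aligned and padded with '-' to a common width, handles substitutions only, and replaces the user-defined error model mentioned in the abstract with a single hypothetical threshold, `min_support`. The function name `correct_by_consensus` is likewise an illustrative assumption.

```python
from collections import Counter

GAP = "-"  # padding for positions a read does not cover (assumption of this sketch)


def correct_by_consensus(aligned_reads, min_support=0.6):
    """Replace each base with the column consensus when support is high enough.

    aligned_reads: equal-length strings, already aligned and padded with '-'.
    min_support:   hypothetical threshold; fraction of covering reads that
                   must agree before a column is corrected.
    """
    if not aligned_reads:
        return []
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    # Column-wise majority vote over the reads that cover each column.
    consensus = []
    for col in range(width):
        counts = Counter(read[col] for read in aligned_reads if read[col] != GAP)
        total = sum(counts.values())
        if total == 0:
            consensus.append(None)  # no read covers this column
            continue
        base, n = counts.most_common(1)[0]
        consensus.append(base if n / total >= min_support else None)

    # Rewrite every read; padding and low-support columns are left untouched.
    return [
        "".join(
            ch if ch == GAP or consensus[i] is None else consensus[i]
            for i, ch in enumerate(read)
        )
        for read in aligned_reads
    ]


reads = ["ACGTAC--", "-CGTTCGA", "ACGTACGA", "--GTACGA"]
corrected = correct_by_consensus(reads)
```

In this toy input the second read carries a 'T' where the other three reads agree on 'A', so only that base is changed. Because each whole read sits in the alignment, agreement at positions far from the error still contributes to placing the read, which is the property the abstract highlights.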
Pages: 1455-1461
Number of pages: 7
Related papers
20 in total
  • [1] Fragment assembly with short reads
    Chaisson, M
    Pevzner, P
    Tang, HX
    [J]. BIOINFORMATICS, 2004, 20 (13) : 2067 - 2074
  • [2] Short read fragment assembly of bacterial genomes
    Chaisson, Mark J.
    Pevzner, Pavel A.
    [J]. GENOME RESEARCH, 2008, 18 (02) : 324 - 330
  • [3] De novo fragment assembly with short mate-paired reads: Does the read length matter?
    Chaisson, Mark J.
    Brinza, Dumitru
    Pevzner, Pavel A.
    [J]. GENOME RESEARCH, 2009, 19 (02) : 336 - 346
  • [4] Substantial biases in ultra-short read data sets from high-throughput DNA sequencing
    Dohm, Juliane C.
    Lottaz, Claudio
    Borodina, Tatiana
    Himmelbauer, Heinz
    [J]. NUCLEIC ACIDS RESEARCH, 2008, 36 (16)
  • [5] Single-molecule DNA sequencing technologies for future genomics research
    Gupta, Pushpendra K.
    [J]. TRENDS IN BIOTECHNOLOGY, 2008, 26 (11) : 602 - 611
  • [6] De novo bacterial genome sequencing: Millions of very short reads assembled on a desktop computer
    Hernandez, David
    Francois, Patrice
    Farinelli, Laurent
    Osteras, Magne
    Schrenzel, Jacques
    [J]. GENOME RESEARCH, 2008, 18 (05) : 802 - 809
  • [7] HiTEC: accurate error correction in high-throughput sequencing data
    Ilie, Lucian
    Fazayeli, Farideh
    Ilie, Silvana
    [J]. BIOINFORMATICS, 2011, 27 (03) : 295 - 302
  • [8] Whole-genome sequence assembly for mammalian genomes: Arachne 2
    Jaffe, DB
    Butler, J
    Gnerre, S
    Mauceli, E
    Lindblad-Toh, K
    Mesirov, JP
    Zody, MC
    Lander, ES
    [J]. GENOME RESEARCH, 2003, 13 (01) : 91 - 96
  • [9] Quake: quality-aware detection and correction of sequencing errors
    Kelley, David R.
    Schatz, Michael C.
    Salzberg, Steven L.
    [J]. GENOME BIOLOGY, 2010, 11 (11):
  • [10] SOAP: short oligonucleotide alignment program
    Li, Ruiqiang
    Li, Yingrui
    Kristiansen, Karsten
    Wang, Jun
    [J]. BIOINFORMATICS, 2008, 24 (05) : 713 - 714