Quake: quality-aware detection and correction of sequencing errors

Cited by: 400
Authors
Kelley, David R. [1, 2]
Schatz, Michael C. [3]
Salzberg, Steven L. [1, 2]
Affiliations
[1] Univ Maryland, Inst Adv Comp Studies, Ctr Bioinformat & Computat Biol, College Pk, MD 20742 USA
[2] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
[3] Cold Spring Harbor Lab, Simons Ctr Quantitat Biol, Cold Spring Harbor, NY 11724 USA
Source
GENOME BIOLOGY | 2010 / Volume 11 / Issue 11
Funding
National Science Foundation (USA);
Keywords
GENOME SEQUENCE; READS; ASSEMBLER; MODEL;
DOI
10.1186/gb-2010-11-11-r116
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Subject classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
We introduce Quake, a program to detect and correct errors in DNA sequencing reads. Using a maximum likelihood approach incorporating quality values and nucleotide specific miscall rates, Quake achieves the highest accuracy on realistically simulated reads. We further demonstrate substantial improvements in de novo assembly and SNP detection after using Quake. Quake can be used for any size project, including more than one billion human reads, and is freely available as open source software from http://www.cbcb.umd.edu/software/quake.
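The abstract's idea of scoring corrections with quality values and nucleotide-specific miscall rates can be illustrated with a minimal sketch. This is not Quake's implementation: the function names, the `UNIFORM_MISCALL` table, and the simple per-base independence model are assumptions made for illustration. The only standard fact used is the Phred convention that a quality value Q corresponds to an error probability of 10^(-Q/10).

```python
import math

BASES = "ACGT"

def phred_to_error_prob(q):
    """Convert a Phred quality value Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)

# Hypothetical miscall rates: P(observed base | true base, given an error).
# Uniform 1/3 values are placeholders; real rates would be estimated from data.
UNIFORM_MISCALL = {t: {o: 1 / 3 for o in BASES if o != t} for t in BASES}

def log_likelihood(observed, candidate, quals, miscall=UNIFORM_MISCALL):
    """Log-probability of reading `observed` if the true sequence is `candidate`.

    Assumes independent errors per base and quality values greater than 0.
    """
    total = 0.0
    for obs, true, q in zip(observed, candidate, quals):
        p_err = phred_to_error_prob(q)
        if obs == true:
            total += math.log(1 - p_err)  # base was called correctly
        else:
            total += math.log(p_err * miscall[true][obs])  # specific miscall
    return total

if __name__ == "__main__":
    quals = [30, 30, 5, 30]
    for cand in ("ACGT", "ACTT", "TCGT"):
        print(cand, log_likelihood("ACGT", cand, quals))
```

Under this toy model, a candidate that changes a low-quality base is penalized far less than one that changes a high-quality base, which is the intuition behind preferring corrections at low-quality positions.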
Pages: 13