Automated correction of genome sequence errors

Cited by: 33
Authors
Gajer, P [1]
Schatz, M [1]
Salzberg, SL [1]
Affiliation
[1] Inst Gen Res, Rockville, MD 20850 USA
DOI
10.1093/nar/gkh216
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
By using information from an assembly of a genome, a new program called AutoEditor significantly improves base calling accuracy over that achieved by previous algorithms. This in turn improves the overall accuracy of genome sequences and facilitates the use of these sequences for polymorphism discovery. We describe the algorithm and its application in a large set of recent genome sequencing projects. The number of erroneous base calls in these projects was reduced by 80%. In an analysis of over one million corrections, we found that AutoEditor made just one error per 8828 corrections. By substantially increasing the accuracy of base calling, AutoEditor can dramatically accelerate the process of finishing genomes, which involves closing all gaps and ensuring minimum quality standards for the final sequence. It also greatly improves our ability to discover single nucleotide polymorphisms (SNPs) between closely related strains and isolates of the same species.
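To put the reported figures on a common scale, the short sketch below converts "one error per 8828 corrections" into an error rate and an accuracy percentage, and shows what an 80% reduction means for a given error count. The Phred-style value is only an illustrative conversion (Q = -10 log10 p applied to the correction error rate), not a quality score reported by the authors, and the starting count of 1000 erroneous base calls is a hypothetical number chosen for the example.

```python
import math

# Figures taken from the abstract.
CORRECTIONS_PER_ERROR = 8828   # one wrong correction per 8828 corrections
ERROR_REDUCTION = 0.80         # erroneous base calls reduced by 80%

# Probability that a single correction is itself wrong.
error_rate = 1 / CORRECTIONS_PER_ERROR
accuracy = 1 - error_rate

# Illustrative only: the same rate expressed on the Phred scale.
phred_equivalent = -10 * math.log10(error_rate)


def remaining_errors(initial_errors: float, reduction: float = ERROR_REDUCTION) -> float:
    """Erroneous base calls left after removing the given fraction."""
    return initial_errors * (1 - reduction)


if __name__ == "__main__":
    print(f"error rate per correction: {error_rate:.6f}")
    print(f"correction accuracy:       {accuracy:.4%}")
    print(f"Phred-scale equivalent:    Q{phred_equivalent:.1f}")
    # Hypothetical project with 1000 erroneous base calls before correction.
    print(f"errors left out of 1000:   {remaining_errors(1000):.0f}")
```

On this reading, a wrong correction is rare enough to sit close to the Q40 range, while an 80% reduction leaves one in five of the original erroneous base calls in place.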
Pages: 562-569
Page count: 8