A greedy algorithm for aligning DNA sequences

被引:4036
作者
Zhang, Z
Schwartz, S
Wagner, L
Miller, W [1 ]
机构
[1] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
[2] Natl Lib Med, Natl Ctr Biotechnol Informat, NIH, Bethesda, MD 20894 USA
关键词
sequence alignment; greedy algorithms; dynamic programming;
D O I
10.1089/10665270050081478
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and Set produce an alignment that is guaranteed to be theoretically optimal. We introduce a nem greedy alignment algorithm with particularly good performance and show that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data. An implementation of this algorithm is currently used in a program that assembles the UniGene database at the National Center for Biotechnology Information.
引用
收藏
页码:203 / 214
页数:12
相关论文
共 18 条
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]  
ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
[3]  
Chao KM, 1997, COMPUT APPL BIOSCI, V13, P75
[4]   Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence [J].
Cole, ST ;
Brosch, R ;
Parkhill, J ;
Garnier, T ;
Churcher, C ;
Harris, D ;
Gordon, SV ;
Eiglmeier, K ;
Gas, S ;
Barry, CE ;
Tekaia, F ;
Badcock, K ;
Basham, D ;
Brown, D ;
Chillingworth, T ;
Connor, R ;
Davies, R ;
Devlin, K ;
Feltwell, T ;
Gentles, S ;
Hamlin, N ;
Holroyd, S ;
Hornby, T ;
Jagels, K ;
Krogh, A ;
McLean, J ;
Moule, S ;
Murphy, L ;
Oliver, K ;
Osborne, J ;
Quail, MA ;
Rajandream, MA ;
Rogers, J ;
Rutter, S ;
Seeger, K ;
Skelton, J ;
Squares, R ;
Squares, S ;
Sulston, JE ;
Taylor, K ;
Whitehead, S ;
Barrell, BG .
NATURE, 1998, 393 (6685) :537-+
[5]   Alignment of whole genomes [J].
Delcher, AL ;
Kasif, S ;
Fleischmann, RD ;
Peterson, J ;
White, O ;
Salzberg, SL .
NUCLEIC ACIDS RESEARCH, 1999, 27 (11) :2369-2376
[6]   Base-calling of automated sequencer traces using phred.: I.: Accuracy assessment [J].
Ewing, B ;
Hillier, L ;
Wendl, MC ;
Green, P .
GENOME RESEARCH, 1998, 8 (03) :175-185
[7]   A computer program for aligning a cDNA sequence with a genomic DNA sequence [J].
Florea, L ;
Hartzell, G ;
Zhang, Z ;
Rubin, GM ;
Miller, W .
GENOME RESEARCH, 1998, 8 (09) :967-974
[8]   AN IMPROVED ALGORITHM FOR MATCHING BIOLOGICAL SEQUENCES [J].
GOTOH, O .
JOURNAL OF MOLECULAR BIOLOGY, 1982, 162 (03) :705-708
[9]   Generation and analysis of 280,000 human expressed sequence tags [J].
Hillier, L ;
Lennon, G ;
Becker, M ;
Bonaldo, MF ;
Chiapelli, B ;
Chissoe, S ;
Dietrich, N ;
DuBuque, T ;
Favello, A ;
Gish, W ;
Hawkins, M ;
Hultman, M ;
Kucaba, T ;
Lacy, M ;
Le, M ;
Le, N ;
Mardis, E ;
Moore, B ;
Morris, M ;
Parsons, J ;
Prange, C ;
Rifkin, L ;
Rohlfing, T ;
Schellenberg, K ;
Soares, MB ;
Tan, F ;
ThierryMeg, J ;
Trevaskis, E ;
Underwood, K ;
Wohldman, P ;
Waterston, R ;
Wilson, R ;
Marra, M .
GENOME RESEARCH, 1996, 6 (09) :807-828
[10]   A FILE COMPARISON PROGRAM [J].
MILLER, W ;
MYERS, EW .
SOFTWARE-PRACTICE & EXPERIENCE, 1985, 15 (11) :1025-1040