A generalized global alignment algorithm

被引:49
作者
Huang, XQ
Chao, KM
机构
[1] Iowa State Univ, Dept Comp Sci, Ames, IA 50011 USA
[2] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei 10764, Taiwan
关键词
D O I
10.1093/bioinformatics/19.2.228
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: Homologous sequences are sometimes similar over some regions but different over other regions. Homologous sequences have a much lower global similarity if the different regions are much longer than the similar regions. Results: We present a generalized global alignment algorithm for comparing sequences with intermittent similarities, an ordered list of similar regions separated by different regions. A generalized global alignment model is defined to handle sequences with intermittent similarities. A dynamic programming algorithm is designed to compute an optimal general alignment in time proportional to the product of sequence lengths and in space proportional to the sum of sequence lengths. The algorithm is implemented as a computer program named GAP3 (Global Alignment Program Version 3). The generalized global alignment model is validated by experimental results produced with GAP3 on both DNA and protein sequences. The GAP3 program extends the ability of standard global alignment programs to recognize homologous sequences of lower similarity. Availability: The GAP3 program is freely available for academic use at http://bioinformatics.iastate.edu/aat/align/ align.html. Contact: xqhuang@cs.iastate.edu; kmchao@csie.ntu.edu.tw.
引用
收藏
页码:228 / 233
页数:6
相关论文
共 22 条
  • [1] ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
  • [2] A new approach to sequence comparison:: normalired sequence alignment
    Arslan, AN
    Egecioglu, Ö
    Pevzner, PA
    [J]. BIOINFORMATICS, 2001, 17 (04) : 327 - 337
  • [3] Human and mouse gene structure: Comparative analysis and application to exon prediction
    Batzoglou, S
    Pachter, L
    Mesirov, JP
    Berger, B
    Lander, ES
    [J]. GENOME RESEARCH, 2000, 10 (07) : 950 - 958
  • [4] BURKHARDT S, 1999, 3 ANN INT C COMP MOL, P11
  • [5] Alignment of whole genomes
    Delcher, AL
    Kasif, S
    Fleischmann, RD
    Peterson, J
    White, O
    Salzberg, SL
    [J]. NUCLEIC ACIDS RESEARCH, 1999, 27 (11) : 2369 - 2376
  • [6] Profile hidden Markov models
    Eddy, SR
    [J]. BIOINFORMATICS, 1998, 14 (09) : 755 - 763
  • [7] AN IMPROVED ALGORITHM FOR MATCHING BIOLOGICAL SEQUENCES
    GOTOH, O
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1982, 162 (03) : 705 - 708
  • [8] LINEAR SPACE ALGORITHM FOR COMPUTING MAXIMAL COMMON SUBSEQUENCES
    HIRSCHBERG, DS
    [J]. COMMUNICATIONS OF THE ACM, 1975, 18 (06) : 341 - 343
  • [9] HUANG X, 2002, CURRENT TOPICS COMPU, P45
  • [10] A TIME-EFFICIENT, LINEAR-SPACE LOCAL SIMILARITY ALGORITHM
    HUANG, XQ
    MILLER, W
    [J]. ADVANCES IN APPLIED MATHEMATICS, 1991, 12 (03) : 337 - 357