Fast and sensitive alignment of large genomic sequences

Times cited: 31
Authors
Brudno, M [1]
Morgenstern, B [1]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Source
CSB2002: IEEE COMPUTER SOCIETY BIOINFORMATICS CONFERENCE | 2002
DOI
10.1109/CSB.2002.1039337
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Comparative analysis of syntenic genome sequences can be used to identify functional sites such as exons and regulatory elements. The first step is to align two or more evolutionarily related sequences, and in recent years a number of computer programs have been developed for the alignment of large genomic sequences. Some of these programs are extremely fast, but their time efficiency is often achieved at the expense of sensitivity. One way of combining speed and sensitivity is to use an anchored-alignment approach. In a first step, a fast heuristic identifies a chain of strong sequence similarities that serve as anchor points. In a second step, the regions between these anchor points are aligned using a slower but more sensitive method. We present CHAOS, a novel algorithm for rapid identification of chains of local sequence similarities among large genomic sequences. Similarities identified by CHAOS are used as anchor points to improve the running time of the DIALIGN alignment program. Systematic test runs show that this method can reduce the running time of DIALIGN by more than 93% while affecting the quality of the resulting alignments by only 1%.
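The two-step anchored-alignment idea described in the abstract can be illustrated with a minimal sketch. This is not the CHAOS algorithm, whose details the abstract does not give. As a stand-in, it assumes exact k-mer matches as the "strong similarities" and a simple quadratic-time dynamic program for chaining. The function names `find_seeds`, `chain_seeds`, and `regions_between` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def find_seeds(a, b, k):
    """Return (i, j) pairs where a[i:i+k] == b[j:j+k] (exact k-mer matches).

    Assumption: exact matches stand in for the 'strong similarities'
    that a real anchoring heuristic would detect.
    """
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    return [(i, j)
            for j in range(len(b) - k + 1)
            for i in index.get(b[j:j + k], [])]


def chain_seeds(seeds, k):
    """Pick a longest chain of non-overlapping seeds that advance in both sequences."""
    seeds = sorted(seeds)
    if not seeds:
        return []
    best = [1] * len(seeds)    # length of the best chain ending at each seed
    prev = [-1] * len(seeds)   # predecessor of each seed in that chain
    for n, (i, j) in enumerate(seeds):
        for m in range(n):
            pi, pj = seeds[m]
            if pi + k <= i and pj + k <= j and best[m] + 1 > best[n]:
                best[n], prev[n] = best[m] + 1, m
    n = max(range(len(seeds)), key=best.__getitem__)
    chain = []
    while n != -1:
        chain.append(seeds[n])
        n = prev[n]
    return chain[::-1]


def regions_between(chain, k, len_a, len_b):
    """Return (a_start, a_end, b_start, b_end) for each region left between anchors.

    These are the regions a slower, more sensitive aligner would process.
    """
    regions, a_pos, b_pos = [], 0, 0
    for i, j in chain:
        regions.append((a_pos, i, b_pos, j))
        a_pos, b_pos = i + k, j + k
    regions.append((a_pos, len_a, b_pos, len_b))
    return regions


if __name__ == "__main__":
    a, b, k = "ACGTTTGGCA", "ACGTCCGGCA", 4
    anchors = chain_seeds(find_seeds(a, b, k), k)
    for a0, a1, b0, b1 in regions_between(anchors, k, len(a), len(b)):
        print(repr(a[a0:a1]), repr(b[b0:b1]))
```

In this toy input the two anchors cover the matching ends of the sequences, so only the short middle region remains for the sensitive aligner. That is the source of the running-time saving: the expensive method is applied only to the regions between anchors rather than to the full sequences.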
Pages: 138-147
Page count: 10
Related papers
33 in total
  • [1] EFFICIENT STRING MATCHING - AID TO BIBLIOGRAPHIC SEARCH
    AHO, AV
    CORASICK, MJ
    [J]. COMMUNICATIONS OF THE ACM, 1975, 18 (06) : 333 - 340
  • [2] ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
  • [3] BAFNA V, 2000, P 8 INT C INT SYST M
  • [4] Human and mouse gene structure: Comparative analysis and application to exon prediction
    Batzoglou, S
    Pachter, L
    Mesirov, JP
    Berger, B
    Lander, ES
    [J]. GENOME RESEARCH, 2000, 10 (07) : 950 - 958
  • [5] Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences
    Bergman, CM
    Kreitman, M
    [J]. GENOME RESEARCH, 2001, 11 (08) : 1335 - 1345
  • [6] Algorithms for phylogenetic footprinting
    Blanchette, M
    Schwikowski, B
    Tompa, M
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2002, 9 (02) : 211 - 223
  • [7] Discovery of regulatory elements by a computational method for phylogenetic footprinting
    Blanchette, M
    Tompa, M
    [J]. GENOME RESEARCH, 2002, 12 (05) : 739 - 748
  • [8] Alignment of whole genomes
    Delcher, AL
    Kasif, S
    Fleischmann, RD
    Peterson, J
    White, O
    Salzberg, SL
    [J]. NUCLEIC ACIDS RESEARCH, 1999, 27 (11) : 2369 - 2376
  • [9] TRIE MEMORY
    FREDKIN, E
    [J]. COMMUNICATIONS OF THE ACM, 1960, 3 (09) : 490 - 499
  • [10] Gene recognition via spliced sequence alignment
    Gelfand, MS
    Mironov, AA
    Pevzner, PA
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1996, 93 (17) : 9061 - 9066