LAGAN and Multi-LAGAN: Efficient tools for large-scale multiple alignment of genomic DNA

Times cited: 823
Authors
Brudno, M
Do, CB
Cooper, GM
Kim, MF
Davydov, E
Green, ED
Sidow, A
Batzoglou, S [1]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Pathol, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[4] NHGRI, Genome Technol Branch, NIH, Bethesda, MD 20892 USA
[5] NHGRI, Intramural Sequencing Ctr, NIH, Bethesda, MD 20892 USA
DOI
10.1101/gr.926603
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences and accurate enough to correctly align the conserved biological features between distant species. We present LAGAN, a system for rapid global alignment of two homologous genomic sequences, and Multi-LAGAN, a system for multiple global alignment of genomic sequences. We tested our systems on a data set consisting of more than 12 Mb of high-quality sequence from 12 vertebrate species. All the sequence was derived from the genomic region orthologous to an ~1.5-Mb region on human chromosome 7q31.3. We found that both LAGAN and Multi-LAGAN compare favorably with other leading alignment methods in correctly aligning protein-coding exons, especially between distant homologs such as human and chicken, or human and fugu. Multi-LAGAN produced the most accurate alignments, while requiring just 75 minutes on a personal computer to obtain the multiple alignment of all 12 sequences. Multi-LAGAN is a practical method for generating multiple alignments of long genomic sequences at any evolutionary distance. Our systems are publicly available at http://lagan.stanford.edu.
Pages: 721-731
Page count: 11