Enredo and Pecan: Genome-wide mammalian consistency-based multiple alignment with paralogs

Cited by: 194
Authors
Paten, Benedict [1]
Herrero, Javier [2]
Beal, Kathryn [2]
Fitzgerald, Stephen [2]
Birney, Ewan [2]
Affiliations
[1] Univ Calif Santa Cruz, Ctr Biomol Sci & Engn, Santa Cruz, CA 95064 USA
[2] EMBL European Bioinformat Inst, Cambridge CB10 1SD, England
DOI
10.1101/gr.076554.108
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Pairwise whole-genome alignment involves the creation of a homology map, capable of performing a near complete transformation of one genome into another. For multiple genomes this problem is generalized to finding a set of consistent homology maps for converting each genome in the set of aligned genomes into any of the others. The problem can be divided into two principal stages. First, the partitioning of the input genomes into a set of colinear segments, a process which essentially deals with the complex processes of rearrangement. Second, the generation of a base pair level alignment map for each colinear segment. We have developed a new genome-wide segmentation program, Enredo, which produces colinear segments from extant genomes handling rearrangements, including duplications. We have then applied the new alignment program Pecan, which makes the consistency alignment methodology practical at a large scale, to create a new set of genome-wide mammalian alignments. We test both Enredo and Pecan using novel and existing assessment analyses that incorporate both real biological data and simulations, and show that both independently and in combination they outperform existing programs. Alignments from our pipeline are publicly available within the Ensembl genome browser.
Pages: 1814-1828
Page count: 15