Exact and heuristic algorithms for the Indel Maximum Likelihood Problem

被引:28
作者
Diallo, Abdoulaye Banire
Makarenkov, Vladimir
Blanchette, Mathieu [1 ]
机构
[1] McGill Univ, Ctr Bioinformat, Montreal, PQ H3A 2T5, Canada
[2] McGill Univ, Sch Comp Sci, Montreal, PQ H3A 2T5, Canada
[3] Univ Quebec, Dept Informat, Montreal, PQ H3C 3P8, Canada
关键词
ancestral genome reconstruction; ancestral mammalian genomes; Indel Maximum Likelihood Problem; insertions and deletions; tree-HMM;
D O I
10.1089/cmb.2007.A006
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Given a multiple alignment of orthologous DNA sequences and a phylogenetic tree for these sequences, we investigate the problem of reconstructing the most likely scenario of insertions and deletions capable of explaining the gaps observed in the alignment. This problem, that we called the Indel Maximum Likelihood Problem ( IMLP), is an important step toward the reconstruction of ancestral genomics sequences, and is important for studying evolutionary processes, genome function, adaptation and convergence. We solve the IMLP using a new type of tree hidden Markov model whose states correspond to single-base evolutionary scenarios and where transitions model dependencies between neighboring columns. The standard Viterbi and Forward-backward algorithms are optimized to produce the most likely ancestral reconstruction and to compute the level of confidence associated to specific regions of the reconstruction. A heuristic is presented to make the method practical for large data sets, while retaining an extremely high degree of accuracy. The methods are illustrated on a 1-Mb alignment of the CFTR regions from 12 mammals.
引用
收藏
页码:446 / 461
页数:16
相关论文
共 28 条
  • [1] The past as the key to the present: Resurrection of ancient proteins from eosinophils
    Benner, SA
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2002, 99 (08) : 4760 - 4761
  • [2] Reconstructing large regions of an ancestral mammalian genome in silico
    Blanchette, M
    Green, ED
    Miller, W
    Haussler, D
    [J]. GENOME RESEARCH, 2004, 14 (12) : 2412 - 2423
  • [3] Aligning multiple genomic sequences with the threaded blockset aligner
    Blanchette, M
    Kent, WJ
    Riemer, C
    Elnitski, L
    Smit, AFA
    Roskin, KM
    Baertsch, R
    Rosenbloom, K
    Clawson, H
    Green, ED
    Haussler, D
    Miller, W
    [J]. GENOME RESEARCH, 2004, 14 (04) : 708 - 715
  • [4] MAVID: Constrained ancestral alignment of multiple sequences
    Bray, N
    Pachter, L
    [J]. GENOME RESEARCH, 2004, 14 (04) : 693 - 699
  • [5] LAGAN and Multi-LAGAN: Efficient tools for large-scale multiple alignment of genomic DNA
    Brudno, M
    Do, CB
    Cooper, GM
    Kim, MF
    Davydov, E
    Green, ED
    Sidow, A
    Batzoglou, S
    [J]. GENOME RESEARCH, 2003, 13 (04) : 721 - 731
  • [6] Chindelevitch Leonid, 2006, Journal of Bioinformatics and Computational Biology, V4, P721, DOI 10.1142/S0219720006002168
  • [7] Durbin R., 1998, BIOL SEQUENCE ANAL
  • [8] The ENCODE (ENCyclopedia of DNA elements) Project
    Feingold, EA
    Good, PJ
    Guyer, MS
    Kamholz, S
    Liefer, L
    Wetterstrand, K
    Collins, FS
    Gingeras, TR
    Kampa, D
    Sekinger, EA
    Cheng, J
    Hirsch, H
    Ghosh, S
    Zhu, Z
    Pate, S
    Piccolboni, A
    Yang, A
    Tammana, H
    Bekiranov, S
    Kapranov, P
    Harrison, R
    Church, G
    Struhl, K
    Ren, B
    Kim, TH
    Barrera, LO
    Qu, C
    Van Calcar, S
    Luna, R
    Glass, CK
    Rosenfeld, MG
    Guigo, R
    Antonarakis, SE
    Birney, E
    Brent, M
    Pachter, L
    Reymond, A
    Dermitzakis, ET
    Dewey, C
    Keefe, D
    Denoeud, F
    Lagarde, J
    Ashurst, J
    Hubbard, T
    Wesselink, JJ
    Castelo, R
    Eyras, E
    Myers, RM
    Sidow, A
    Batzoglou, S
    [J]. SCIENCE, 2004, 306 (5696) : 636 - 640
  • [9] A hidden Markov Model approach to variation among sites in rate of evolution
    Felsenstein, J
    Churchill, GA
    [J]. MOLECULAR BIOLOGY AND EVOLUTION, 1996, 13 (01) : 93 - 104
  • [10] EVOLUTIONARY TREES FROM DNA-SEQUENCES - A MAXIMUM-LIKELIHOOD APPROACH
    FELSENSTEIN, J
    [J]. JOURNAL OF MOLECULAR EVOLUTION, 1981, 17 (06) : 368 - 376