Birth-death prior on phylogeny and speed dating

Cited by: 16
Authors
Akerborg, Orjan [1, 2]
Sennblad, Bengt [1]
Lagergren, Jens [1, 2]
Affiliations
[1] Stockholm Univ, Stockholm Bioinformat Ctr, SE-10691 Stockholm, Sweden
[2] Royal Inst Technol, Sch Comp Sci & Commun, SE-10044 Stockholm, Sweden
DOI
10.1186/1471-2148-8-77
Chinese Library Classification
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Background: In recent years there has been a trend away from the strict molecular clock when inferring the dates of speciations and other evolutionary events. Explicit modeling of substitution rates and divergence times makes it possible to formulate informative prior distributions for branch lengths. Models with birth-death priors on tree branching and auto-correlated or iid substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, however, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach that requires hours or days of computation when applied to large phylogenies.
Results: We demonstrate that a hill-climbing maximum a posteriori (MAP) adaptation of the MCMC scheme yields a considerable gain in computational efficiency. We also demonstrate that a novel dynamic programming (DP) algorithm for branch length factorization, useful in both the hill-climbing and the MCMC setting, further reduces computation time. For the problem of inferring rate and time parameters on a fixed tree, we perform simulations, compare hill-climbing with MCMC on a plant rbcL gene dataset, and carry out a dating analysis on an animal mtDNA dataset, showing that our methodology enables efficient, highly accurate analysis of very large trees. Datasets that require several days of computation with MCMC can be accurately analysed with our MAP algorithm in less than a minute. From the results of our example analyses, we conclude that our methodology generally avoids getting trapped early in local optima. Where this can nevertheless be a problem, for instance when the tree topology is inferred in addition to the parameters, we show that it can be avoided with a simulated-annealing-like (SAL) method that favours tree swaps early in the inference and shifts the focus towards rate and time parameter changes later on.
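The hill-climbing MAP adaptation and the SAL schedule described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy log posterior, the three candidate "topologies" with fixed scores, the linear decay of the tree-move probability, and all parameter values are hypothetical. Proposals are generated as an MCMC sampler would generate them, but a proposal is accepted only if it increases the log posterior.

```python
import random

# Hypothetical toy problem: three candidate "topologies" with fixed score offsets,
# plus two continuous rate/time parameters whose optimum is at (1.0, -2.0).
TOPOLOGY_SCORE = [-3.0, 0.0, -1.5]
TARGET = (1.0, -2.0)


def log_posterior(topology, params):
    """Toy stand-in for the log posterior of a (topology, rates/times) state."""
    return TOPOLOGY_SCORE[topology] - sum((p - t) ** 2 for p, t in zip(params, TARGET))


def tree_move_probability(step, n_steps, p_start=0.9, p_end=0.1):
    """SAL-like schedule (hypothetical linear decay): favour tree moves early,
    parameter moves later."""
    frac = step / max(1, n_steps - 1)
    return p_start + (p_end - p_start) * frac


def hill_climb_map(n_steps=5000, step_size=0.2, seed=1):
    rng = random.Random(seed)
    topology, params = 0, [0.0, 0.0]
    best = log_posterior(topology, params)
    for step in range(n_steps):
        if rng.random() < tree_move_probability(step, n_steps):
            # Tree move: propose another topology, keep the parameters unchanged.
            new_topology, new_params = rng.randrange(len(TOPOLOGY_SCORE)), params
        else:
            # Parameter move: Gaussian perturbation of one coordinate, as in MCMC.
            new_topology, new_params = topology, list(params)
            i = rng.randrange(len(new_params))
            new_params[i] += rng.gauss(0.0, step_size)
        score = log_posterior(new_topology, new_params)
        # MAP hill-climbing: accept only improvements (no Metropolis-Hastings ratio).
        if score > best:
            topology, params, best = new_topology, new_params, score
    return topology, params, best
```

With a real phylogeny, `log_posterior` would combine the sequence likelihood with the rate and time priors, and the tree move would be a topology rearrangement such as a nearest-neighbour interchange; both are outside the scope of this sketch.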
Conclusion: Our contribution opens the way for fast and accurate dating analysis of nucleotide sequence data. Modeling branch substitution rates and divergence times separately allows us to include birth-death priors on the times without assuming a molecular clock. The methodology is easily adapted to take fossil-record data into account, and it can be used with a broad range of rate and substitution models.
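The separation of rates and times can be illustrated with a small Python sketch of one simple case: iid substitution rates with an assumed gamma prior (the abstract does not name the rate distribution, and auto-correlated rates are not covered). Each branch length l_e, in expected substitutions per site, is written as l_e = r_e * t_e, where the duration t_e comes from the divergence times; the change of variables r_e = l_e / t_e adds a Jacobian term -log t_e. The function names, the tree encoding and the gamma parameterization are hypothetical, the birth-death prior on the times themselves is not shown, and this is not the paper's DP factorization algorithm.

```python
import math


def gamma_logpdf(x, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def branch_durations(parent, node_time):
    """Branch durations t_e = time(parent) - time(child) for every non-root node.

    parent:    dict child -> parent (hypothetical tree encoding)
    node_time: dict node -> time before present (leaves at 0.0)
    """
    return {v: node_time[p] - node_time[v] for v, p in parent.items()}


def log_lengths_given_times(lengths, durations, shape, scale):
    """log p(lengths | times) when l_e = r_e * t_e and r_e ~ iid Gamma(shape, scale)."""
    total = 0.0
    for v, length in lengths.items():
        t = durations[v]
        rate = length / t  # implied substitution rate on branch v
        total += gamma_logpdf(rate, shape, scale) - math.log(t)  # Jacobian of r = l / t
    return total


# Hypothetical three-leaf tree ((A,B)X,C)R with times in arbitrary units.
parent = {"A": "X", "B": "X", "X": "R", "C": "R"}
node_time = {"A": 0.0, "B": 0.0, "C": 0.0, "X": 0.4, "R": 1.0}
lengths = {"A": 0.2, "B": 0.3, "X": 0.1, "C": 0.5}
durations = branch_durations(parent, node_time)
print(log_lengths_given_times(lengths, durations, shape=2.0, scale=0.5))
```

Under a strict molecular clock every r_e would share a single value; letting each branch carry its own rate is what removes that assumption while the times remain free to receive their own prior.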
Pages: 14