Improving the Accuracy of Demographic and Molecular Clock Model Comparison While Accommodating Phylogenetic Uncertainty

Cited by: 942
Authors
Baele, Guy [1]
Lemey, Philippe [1]
Bedford, Trevor [2]
Rambaut, Andrew [2]
Suchard, Marc A. [3, 4, 5]
Alekseyenko, Alexander V. [6]
Affiliations
[1] Katholieke Univ Leuven, Dept Microbiol & Immunol, Louvain, Belgium
[2] Univ Edinburgh, Inst Evolutionary Biol, Edinburgh, Midlothian, Scotland
[3] Univ Calif Los Angeles, David Geffen Sch Med, Dept Biomath, Los Angeles, CA 90095 USA
[4] Univ Calif Los Angeles, David Geffen Sch Med, Dept Human Genet, Los Angeles, CA 90095 USA
[5] Univ Calif Los Angeles, Sch Publ Hlth, Dept Biostat, Los Angeles, CA 90024 USA
[6] NYU, Sch Med, Ctr Hlth Informat & Bioinformat, Dept Med, New York, NY 10003 USA
Funding
Wellcome Trust (UK); National Institutes of Health (USA)
Keywords
model comparison; marginal likelihood; Bayes factors; path sampling; stepping-stone sampling; demographic models; molecular clock; Bayesian inference; phylogeny; BEAST; MARGINAL LIKELIHOOD ESTIMATION; MONTE-CARLO METHOD; BAYES FACTORS; SEQUENCE DATA; INFERENCE; EVOLUTION; TIME; INTEGRATION; SELECTION; VIRUSES;
DOI
10.1093/molbev/mss084
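Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
070307 [Chemical Biology]; 071010 [Biochemistry and Molecular Biology]
Abstract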
Recent developments in marginal likelihood estimation for model selection in the field of Bayesian phylogenetics and molecular evolution have emphasized the poor performance of the harmonic mean estimator (HME). Although these studies have shown the merits of new approaches applied to standard normally distributed examples and small real-world data sets, not much is currently known concerning the performance and computational issues of these methods when fitting complex evolutionary and population genetic models to empirical real-world data sets. Further, these approaches have not yet seen widespread application in the field due to the lack of implementations of these computationally demanding techniques in commonly used phylogenetic packages. Here we investigate the performance of some of these new marginal likelihood estimators, specifically, path sampling (PS) and stepping-stone (SS) sampling, for comparing models of demographic change and relaxed molecular clocks, using synthetic data and real-world examples for which unexpected inferences were made using the HME. Given the drastically increased computational demands of PS and SS sampling, we also investigate a posterior simulation-based analogue of Akaike's information criterion (AIC) through Markov chain Monte Carlo (MCMC), termed AICM, a model comparison approach that shares with the HME the appealing feature of having a low computational overhead over the original MCMC analysis. We confirm that the HME systematically overestimates the marginal likelihood and fails to yield reliable model classification and show that the AICM performs better and may be a useful initial evaluation of model choice but that it is also, to a lesser degree, unreliable. We show that PS and SS sampling substantially outperform these estimators and adjust the conclusions made concerning previous analyses for the three real-world data sets that we reanalyzed. The methods used in this article are now available in BEAST, a powerful user-friendly software package to perform Bayesian evolutionary analyses.
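The contrast the abstract draws between the HME, PS, and SS sampling can be illustrated on a toy problem where the marginal likelihood is known in closed form. The sketch below is not the BEAST implementation and does not reproduce the authors' analyses. It assumes a normal model with known variance and a normal prior on the mean, so that every power posterior can be sampled exactly instead of by MCMC. All function names, the parameter values, and the power schedule (evenly spaced points raised to the power 1/0.3, which places most steps near the prior) are illustrative assumptions.

```python
import numpy as np


def log_lik(mu, y, sigma):
    """Log-likelihood of i.i.d. N(mu, sigma^2) data, one value per entry of mu."""
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    sq = np.sum((y[None, :] - mu[:, None]) ** 2, axis=1)
    return -0.5 * y.size * np.log(2 * np.pi * sigma**2) - 0.5 * sq / sigma**2


def sample_power_posterior(beta, y, sigma, m0, s0, size, rng):
    """Exact draws from p(mu | y, beta), proportional to L(mu)^beta * N(mu; m0, s0^2).

    Assumption: conjugacy lets us skip MCMC; a real phylogenetic model cannot.
    """
    prec = 1.0 / s0**2 + beta * y.size / sigma**2
    mean = (m0 / s0**2 + beta * y.sum() / sigma**2) / prec
    return rng.normal(mean, np.sqrt(1.0 / prec), size=size)


def log_mean_exp(x):
    """Numerically stable log(mean(exp(x)))."""
    m = np.max(x)
    return m + np.log(np.mean(np.exp(x - m)))


def exact_log_ml(y, sigma, m0, s0):
    """Closed-form log marginal likelihood: y ~ N(m0 * 1, sigma^2 I + s0^2 11^T)."""
    n, r = y.size, y - m0
    denom = sigma**2 + n * s0**2
    log_det = (n - 1) * np.log(sigma**2) + np.log(denom)
    quad = (np.sum(r**2) - s0**2 * np.sum(r) ** 2 / denom) / sigma**2
    return float(-0.5 * (n * np.log(2 * np.pi) + log_det + quad))


def harmonic_mean(y, sigma, m0, s0, size, rng):
    """HME: minus the log of the posterior mean of 1 / likelihood."""
    mu = sample_power_posterior(1.0, y, sigma, m0, s0, size, rng)
    return float(-log_mean_exp(-log_lik(mu, y, sigma)))


def stepping_stone(betas, y, sigma, m0, s0, size, rng):
    """SS: sum of log ratios of normalizing constants between adjacent powers."""
    total = 0.0
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        mu = sample_power_posterior(b_prev, y, sigma, m0, s0, size, rng)
        total += log_mean_exp((b_next - b_prev) * log_lik(mu, y, sigma))
    return float(total)


def path_sampling(betas, y, sigma, m0, s0, size, rng):
    """PS: trapezoid rule over the expected log-likelihood along the path."""
    means = np.array([
        log_lik(sample_power_posterior(b, y, sigma, m0, s0, size, rng), y, sigma).mean()
        for b in betas
    ])
    return float(np.sum(0.5 * (means[1:] + means[:-1]) * np.diff(betas)))


def run_demo(seed=0, n=50, size=2000, n_steps=32):
    """Simulate data and return all four log marginal likelihood values."""
    rng = np.random.default_rng(seed)
    sigma, m0, s0 = 1.0, 0.0, 10.0  # illustrative values, not from the paper
    y = rng.normal(1.0, sigma, size=n)
    betas = np.linspace(0.0, 1.0, n_steps + 1) ** (1.0 / 0.3)
    return {
        "exact": exact_log_ml(y, sigma, m0, s0),
        "harmonic_mean": harmonic_mean(y, sigma, m0, s0, size, rng),
        "stepping_stone": stepping_stone(betas, y, sigma, m0, s0, size, rng),
        "path_sampling": path_sampling(betas, y, sigma, m0, s0, size, rng),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name:>15s}: {value:.3f}")
```

In this toy setting the SS and PS values should land close to the exact value, while the HME tends to come out too high because it depends on rare posterior draws far from the likelihood peak. The cost difference is also visible: the HME uses one set of posterior draws, whereas PS and SS need draws at every power along the path.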
Pages: 2157-2167
Page count: 11