The importance of data partitioning and the utility of Bayes factors in Bayesian phylogenetics

Cited by: 276
Authors
Brown, Jeremy M. [1]
Lemmon, Alan R. [1]
Affiliations
[1] Univ Texas, Sect Integrat Biol, Austin, TX 78712 USA
Funding
National Science Foundation (USA);
Keywords
DOI
10.1080/10635150701546249
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
As larger, more complex data sets are being used to infer phylogenies, accuracy of these phylogenies increasingly requires models of evolution that accommodate heterogeneity in the processes of molecular evolution. We investigated the effect of improper data partitioning on phylogenetic accuracy, as well as the type I error rate and sensitivity of Bayes factors, a commonly used method for choosing among different partitioning strategies in Bayesian analyses. We also used Bayes factors to test empirical data for the need to divide data in a manner that has no expected biological meaning. Posterior probability estimates are misleading when an incorrect partitioning strategy is assumed. The error was greatest when the assumed model was underpartitioned. These results suggest that model partitioning is important for large data sets. Bayes factors performed well, giving a 5% type I error rate, which is remarkably consistent with standard frequentist hypothesis tests. The sensitivity of Bayes factors was found to be quite high when the across-class model heterogeneity reflected that of empirical data. These results suggest that Bayes factors represent a robust method of choosing among partitioning strategies. Lastly, results of tests for the inclusion of unexpected divisions in empirical data mirrored the simulation results, although the outcome of such tests is highly dependent on accounting for rate variation among classes. We conclude by discussing other approaches for partitioning data, as well as other applications of Bayes factors.
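As a rough illustration of how a Bayes factor can be used to choose between two partitioning strategies, the sketch below computes 2 ln(BF) from two log marginal likelihoods and maps it onto the evidence categories of Kass and Raftery (1995). It is not the authors' code: the function names and the log marginal likelihood values are hypothetical, and in practice those values would come from a Bayesian phylogenetics program.

```python
def two_ln_bayes_factor(log_ml_1: float, log_ml_0: float) -> float:
    """Return 2*ln(BF_10) from natural-log marginal likelihoods.

    log_ml_1: log marginal likelihood of the more partitioned model (model 1)
    log_ml_0: log marginal likelihood of the less partitioned model (model 0)
    """
    return 2.0 * (log_ml_1 - log_ml_0)


def kass_raftery_category(two_ln_bf: float) -> str:
    """Map 2*ln(BF_10) to the evidence scale of Kass and Raftery (1995)."""
    if two_ln_bf < 0:
        return "evidence favors model 0"
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf < 10:
        return "strong"
    return "very strong"


# Hypothetical values, chosen only for illustration.
log_ml_partitioned = -12340.0
log_ml_unpartitioned = -12352.5

statistic = two_ln_bayes_factor(log_ml_partitioned, log_ml_unpartitioned)
print(statistic, kass_raftery_category(statistic))
```

Working on the log scale avoids underflow, since marginal likelihoods for phylogenetic data sets are far too small to represent directly as floating-point numbers.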
Pages: 643-655
Number of pages: 13
Related papers
26 in total
[11]   BAYES FACTORS [J].
KASS, RE ;
RAFTERY, AE .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1995, 90 (430) :773-795
[12]   Computing Bayes factors using thermodynamic integration [J].
Lartillot, N ;
Philippe, H .
SYSTEMATIC BIOLOGY, 2006, 55 (02) :195-207
[13]   A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process [J].
Lartillot, N ;
Philippe, H .
MOLECULAR BIOLOGY AND EVOLUTION, 2004, 21 (06) :1095-1109
[14]   The importance of proper model assumption in Bayesian phylogenetics [J].
Lemmon, AR ;
Moriarty, EC .
SYSTEMATIC BIOLOGY, 2004, 53 (02) :265-277
[15]   Accurate branch length estimation in partitioned Bayesian analyses requires accommodation of among-partition rate variation and attention to branch length priors [J].
Marshall, David C. ;
Simon, Chris ;
Buckley, Thomas R. .
SYSTEMATIC BIOLOGY, 2006, 55 (06) :993-1003
[16]   Morphological homoplasy, life history evolution, and historical biogeography of plethodontid salamanders inferred from complete mitochondrial genomes [J].
Mueller, RL ;
Macey, JR ;
Jaekel, M ;
Wake, DB ;
Boore, JL .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2004, 101 (38) :13820-13825
[17]  
NEWTON MA, 1994, J R STAT SOC B, V56, P3
[18]  
Nylander J.A. A., 2004, PROGRAM DISTRIBUTED
[19]   Bayesian phylogenetic analysis of combined data [J].
Nylander, JAA ;
Ronquist, F ;
Huelsenbeck, JP ;
Nieves-Aldrey, JL .
SYSTEMATIC BIOLOGY, 2004, 53 (01) :47-67
[20]   MODELTEST: testing the model of DNA substitution [J].
Posada, D ;
Crandall, KA .
BIOINFORMATICS, 1998, 14 (09) :817-818