Performance-based selection of likelihood models for phylogeny estimation

Cited by: 347
Authors
Minin, V
Abdo, Z
Joyce, P
Sullivan, J
Affiliations
[1] Univ Idaho, Dept Math, Moscow, ID 83844 USA
[2] Univ Idaho, Initiat Bioinformat & Evolutionary Studies, Moscow, ID 83844 USA
[3] Univ Calif Los Angeles, Sch Med, Dept Biomath, Los Angeles, CA 90025 USA
[4] Univ Idaho, Dept Biol Sci, Moscow, ID 83844 USA
Keywords
Bayesian model selection; decision theory; incorrect models; likelihood ratio test; maximum likelihood; nucleotide-substitution model; phylogeny
DOI
10.1080/10635150390235494
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Phylogenetic estimation has largely come to rely on explicitly model-based methods. This approach requires that a model be chosen and that that choice be justified. To date, justification has largely been accomplished through use of likelihood-ratio tests (LRTs) to assess the relative fit of a nested series of reversible models. While this approach certainly represents an important advance over arbitrary model selection, the best fit of a series of models may not always provide the most reliable phylogenetic estimates for finite real data sets, where all available models are surely incorrect. Here, we develop a novel approach to model selection, which is based on the Bayesian information criterion, but incorporates relative branch-length error as a performance measure in a decision theory (DT) framework. This DT method includes a penalty for overfitting, is applicable prior to running extensive analyses, and simultaneously compares all models being considered and thus does not rely on a series of pairwise comparisons of models to traverse model space. We evaluate this method by examining four real data sets and by using those data sets to define simulation conditions. In the real data sets, the DT method selects the same or simpler models than conventional LRTs. In order to lend generality to the simulations, codon-based models (with parameters estimated from the real data sets) were used to generate simulated data sets, which are therefore more complex than any of the models we evaluate. On average, the DT method selects models that are simpler than those chosen by conventional LRTs. Nevertheless, these simpler models provide estimates of branch lengths that are more accurate both in terms of relative error and absolute error than those derived using the more complex (yet still wrong) models chosen by conventional LRTs. This method is available in a program called DT-ModSel.
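The abstract does not state the formulas behind the DT method, so the following is only a minimal sketch of the general idea, not the DT-ModSel implementation. It assumes that each candidate model's BIC is turned into an approximate posterior weight proportional to exp(-BIC/2), that the loss for choosing one model when another is preferred is the Euclidean distance between their branch-length vectors on a shared tree, and that the selected model is the one with the smallest weighted loss. The function names, the toy log-likelihoods, the parameter counts, and the branch lengths are all hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: -2 ln L + k ln n."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def bic_weights(bics):
    """Approximate posterior model weights, proportional to exp(-BIC/2).

    The smallest BIC is subtracted first to avoid numerical underflow.
    """
    best = min(bics)
    raw = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def branch_length_distance(a, b):
    """Euclidean distance between two branch-length vectors (same tree)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_model(models, n_sites):
    """Return (name, risks) for the model with the smallest expected loss.

    `models` is a list of dicts with keys "name", "lnL", "k", and
    "branch_lengths" (assumed layout, for illustration only).
    """
    bics = [bic(m["lnL"], m["k"], n_sites) for m in models]
    weights = bic_weights(bics)
    risks = {}
    for m_i in models:
        # Expected branch-length error of model i, weighted by how
        # plausible each alternative model j is under the BIC weights.
        risks[m_i["name"]] = sum(
            w_j * branch_length_distance(m_i["branch_lengths"], m_j["branch_lengths"])
            for m_j, w_j in zip(models, weights)
        )
    best_name = min(risks, key=risks.get)
    return best_name, risks


# Hypothetical toy inputs; real values would come from fitting each
# model to an alignment on a fixed tree.
toy_models = [
    {"name": "JC", "lnL": -1050.0, "k": 0, "branch_lengths": [0.10, 0.20, 0.30]},
    {"name": "HKY", "lnL": -1000.0, "k": 4, "branch_lengths": [0.12, 0.22, 0.33]},
    {"name": "GTR+G", "lnL": -998.0, "k": 9, "branch_lengths": [0.12, 0.23, 0.33]},
]

chosen, risks = select_model(toy_models, n_sites=500)
print(chosen, risks)
```

In this toy setup all models are scored in a single pass, which mirrors the abstract's point that the method compares all candidates simultaneously instead of walking through a sequence of pairwise tests.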
Pages: 674-683
Number of pages: 10