Evaluating the robustness of phylogenetic methods to among-site variability in substitution processes

Times cited: 32
Authors
Holder, Mark T. [1]
Zwickl, Derrick J. [1]
Dessimoz, Christophe [2,3]
Affiliations
[1] Univ Kansas, Dept Ecol & Evolutionary Biol, Lawrence, KS 66045 USA
[2] Swiss Fed Inst Technol, Inst Computat Sci, CH-8092 Zurich, Switzerland
[3] Swiss Inst Bioinformat, CH-1211 Geneva, Switzerland
Keywords
simulation; phylogenetic inference; codon model; mixture model; partitioned model; RY coding
DOI
10.1098/rstb.2008.0162
Chinese Library Classification number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Computer simulations provide a flexible method for assessing the power and robustness of phylogenetic inference methods. Unfortunately, simulated data are often obviously atypical of data encountered in studies of molecular evolution. Unrealistic simulations can lead to conclusions that are irrelevant to real-data analyses or can provide a biased view of which methods perform well. Here, we present a software tool designed to generate data under a complex codon model that allows each residue in the protein sequence to have a different set of equilibrium amino acid frequencies. The software can obtain maximum-likelihood estimates of the parameters of the Halpern and Bruno model from empirical data and a fixed tree; given an arbitrary tree and a fixed set of parameters, the software can then simulate artificial datasets. We present the results of a simulation experiment using randomly generated tree shapes and substitution parameters estimated from 1610 mammalian cytochrome b sequences. We tested tree inference at the amino acid, nucleotide and codon levels and under parsimony, maximum-likelihood, Bayesian and distance criteria (for a total of more than 650 analyses on each dataset). Based on these simulations, nucleotide-level analyses seem to be more accurate than amino acid and codon analyses. The performance of distance-based phylogenetic methods appears to be quite sensitive to the choice of model and the form of rate heterogeneity used. Further studies are needed to assess the generality of these conclusions. For example, fitting parameters of the Halpern and Bruno model to sequences from other genes will reveal the extent to which our conclusions were influenced by the choice of cytochrome b. Incorporating codon bias and more sources of heterogeneity into the simulator will be crucial to determining whether the present results reflect a bias in this simulation study in favour of nucleotide analyses.
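As a rough illustration of the kind of model described above, the sketch below builds a Halpern and Bruno style rate matrix for a single site from a mutation-rate matrix M and that site's own target equilibrium frequencies π, then simulates the site down a small tree. The off-diagonal rate is taken as Q_ij = M_ij · S / (1 - e^(-S)) with S = ln(π_j M_ji / (π_i M_ij)). This is not the authors' software. Assumptions: the states are a generic 4-state toy alphabet (the study works at the codon level, with site-specific amino acid frequencies), the mutation rates are uniform, the nested-tuple tree format and the helpers `hb_rate_matrix`, `evolve_state` and `simulate_site` are hypothetical, and branches are simulated with a Gillespie-style jump process rather than through matrix exponentials.

```python
import math
import random

# Minimal sketch (hypothetical helpers, not the authors' tool): a
# Halpern-Bruno style rate matrix for ONE site, plus a Gillespie-style
# simulation of that site down a rooted tree. States are a generic toy
# alphabet; the study itself works with codons.


def hb_rate_matrix(mut, pi):
    """Return Q with Q[i][j] = mut[i][j] * S / (1 - exp(-S)) for i != j,
    where S = ln(pi[j] * mut[j][i] / (pi[i] * mut[i][j])).

    mut: mutation rates between states (off-diagonal entries must be > 0).
    pi:  this site's target equilibrium frequencies (all entries > 0).
    """
    n = len(pi)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            s = math.log((pi[j] * mut[j][i]) / (pi[i] * mut[i][j]))
            # S / (1 - e^-S) -> 1 as S -> 0: a neutral change keeps the mutation rate
            factor = 1.0 if abs(s) < 1e-12 else s / (1.0 - math.exp(-s))
            q[i][j] = mut[i][j] * factor
        q[i][i] = -sum(q[i])  # diagonal is still 0 here, so this sums off-diagonals
    return q


def evolve_state(q, state, branch_length, rng):
    """Evolve one state along a branch by drawing exponential waiting times."""
    t = 0.0
    while True:
        rate_out = -q[state][state]
        if rate_out <= 0.0:
            return state  # no outgoing rate: the state cannot change
        t += rng.expovariate(rate_out)
        if t >= branch_length:
            return state
        # Pick the destination with probability proportional to q[state][j].
        r = rng.random() * rate_out
        acc = 0.0
        nxt = state
        for j, rate in enumerate(q[state]):
            if j == state:
                continue
            nxt = j
            acc += rate
            if r < acc:
                break
        state = nxt


def simulate_site(tree, q, pi, rng):
    """Draw a root state from pi and push it down the tree.

    tree: hypothetical nested format (name, [(child_tree, branch_length), ...]);
    leaves have an empty child list. Returns {leaf_name: state}.
    """
    tips = {}

    def descend(node, state):
        name, children = node
        if not children:
            tips[name] = state
        for child, length in children:
            descend(child, evolve_state(q, state, length, rng))

    descend(tree, rng.choices(range(len(pi)), weights=pi)[0])
    return tips


# Usage: two sites with different equilibrium frequencies on the same toy tree.
rng = random.Random(1)
mut = [[1.0] * 4 for _ in range(4)]  # assumed uniform mutation rates
site_freqs = [[0.1, 0.2, 0.3, 0.4], [0.7, 0.1, 0.1, 0.1]]
tree = ("root", [(("A", []), 0.1),
                 (("inner", [(("B", []), 0.2), (("C", []), 0.3)]), 0.05)])
alignment = [simulate_site(tree, hb_rate_matrix(mut, pi), pi, rng)
             for pi in site_freqs]
```

Because the factor S / (1 - e^(-S)) makes π_i Q_ij equal π_j Q_ji, each site's process is stationary at its own frequency vector, which is what lets every residue position have a different equilibrium composition while sharing one mutation matrix.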
Pages: 4013-4021
Page count: 9