Modeling compositional heterogeneity

Cited by: 341
Author
Foster, PG [1]
Affiliation
[1] Nat Hist Museum, Dept Zool, London SW7 5BD, England
Keywords
Compositional heterogeneity; Markov chain Monte Carlo; maximum likelihood; model assessment; model selection; phylogenetics
DOI
10.1080/10635150490445779
Chinese Library Classification number
Q [Biological sciences]
Discipline classification code
07; 0710; 09
Abstract
Compositional heterogeneity among lineages can compromise phylogenetic analyses, because models in common use assume compositionally homogeneous data. Models that can accommodate compositional heterogeneity with few extra parameters are described here, and used in two examples where the true tree is known with confidence. It is shown using likelihood ratio tests that adequate modeling of compositional heterogeneity can be achieved with few composition parameters, that the data may not need to be modelled with separate composition parameters for each branch in the tree. Tree searching and placement of composition vectors on the tree are done in a Bayesian framework using Markov chain Monte Carlo (MCMC) methods. Assessment of fit of the model to the data is made in both maximum likelihood (ML) and Bayesian frameworks. In an ML framework, overall model fit is assessed using the Goldman-Cox test, and the fit of the composition implied by a (possibly heterogeneous) model to the composition of the data is assessed using a novel tree- and model-based composition fit test. In a Bayesian framework, overall model fit and composition fit are assessed using posterior predictive simulation. It is shown that when composition is not accommodated, then the model does not fit, and incorrect trees are found; but when composition is accommodated, the model then fits, and the known correct phylogenies are obtained.
Pages: 485-495
Number of pages: 11