Assessing site-interdependent phylogenetic models of sequence evolution

Cited by: 52
Authors
Rodrigue, Nicolas [1]
Philippe, Herve
Lartillot, Nicolas
Affiliations
[1] Univ Montreal, Dept Biochim, Canadian Inst Adv Res, Montreal, PQ H3C 3J7, Canada
[2] Univ Montpellier 2, URM 5506, Lab Informat Robot & Microelect Montpellier, Montpellier 2, France
Keywords
Bayes factor; Markov chain Monte Carlo; thermodynamic integration; posterior predictive distributions; protein structure; statistical potentials
DOI
10.1093/molbev/msl041
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
In recent works, methods have been proposed for applying phylogenetic models that allow for a general interdependence between the amino acid positions of a protein. As of yet, such models have focused on site interdependencies resulting from sequence-structure compatibility constraints, using simplified structural representations in combination with a set of statistical potentials. This structural compatibility criterion is meant as a proxy for sequence fitness, and the methods developed thus far can incorporate different site-interdependent fitness proxies based on other measurements. However, no methods have been proposed for comparing and evaluating the adequacy of alternative fitness proxies in this context, or for more general comparisons with canonical models of protein evolution. In the present work, we apply Bayesian methods of model selection, based on numerical calculations of marginal likelihoods and posterior predictive checks, to evaluate models encompassing the site-interdependent framework. Our application of these methods indicates that considering site interdependencies, as done here, leads to an improved model fit for all data sets studied. Yet, we find that the use of pairwise contact potentials alone does not suitably account for across-site rate heterogeneity or amino acid exchange propensities; for such complexities, site-independent treatments are still called for. The most favored models combine the use of statistical potentials with a suitably rich site-independent model. Altogether, the methodology employed here should allow for a more rigorous and systematic exploration of different ways of modeling explicit structural constraints, or any other site-interdependent criterion, while best exploiting the richness of previously proposed models.
Pages: 1762-1775
Page count: 14