Computational Methods for Evaluating Phylogenetic Models of Coding Sequence Evolution with Dependence between Codons

Cited by: 36
Authors
Rodrigue, Nicolas [1]
Kleinman, Claudia L. [2]
Philippe, Herve [2]
Lartillot, Nicolas [2]
Affiliations
[1] Univ Ottawa, Dept Biol, Ctr Adv Res Environm Genom, Ottawa, ON K1N 6N5, Canada
[2] Univ Montreal, Dept Biochim, Ctr Robert Cedergren, Montreal, PQ H3C 3J7, Canada
Funding
Natural Sciences and Engineering Research Council of Canada; Canadian Institutes of Health Research
Keywords
Markov chain Monte Carlo; data augmentation; auxiliary variables; posterior predictive checking; Bayes factors; protein tertiary structure; PREDICTIVE P-VALUES; PROTEIN EVOLUTION; SUBSTITUTION MODELS; TERTIARY STRUCTURE; NUCLEOTIDE SUBSTITUTION; LIKELIHOOD APPROACH; SAMPLING METHODS; DNA-SEQUENCES; MARKOV-CHAINS; SELECTION;
DOI
10.1093/molbev/msp078
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
In recent years, molecular evolutionary models formulated as site-interdependent Markovian codon substitution processes have been proposed as means of mechanistically accounting for selective features over long-range evolutionary scales. Under such models, site interdependencies are reflected in the use of a simplified protein tertiary structure representation and predefined statistical potential, which, along with mutational parameters, mediate nonsynonymous rates of substitution; rates of synonymous events are solely mediated by mutational parameters. Although theoretically attractive, the models are computationally challenging, and the methods used to manipulate them still do not allow for quantitative model evaluations in a multiple-sequence context. Here, we describe Markov chain Monte Carlo computational methodologies for sampling parameters from their posterior distribution under site-interdependent codon substitution models within a phylogenetic context and allowing for Bayesian model assessment and ranking. Specifically, the techniques we expound here can form the basis of posterior predictive checking under these models and can be embedded within thermodynamic integration algorithms for computing Bayes factors. We illustrate the methods using two data sets and find that although current forms of site-interdependent models of codon substitution provide an improved fit, they are outperformed by the extended site-independent versions. Altogether, the methodologies described here should enable a quantified contrasting of alternative ways of modeling structural constraints, or other site-interdependent criteria, and establish if such formulations can match (or supplant) site-independent model extensions.
Pages: 1663-1676
Page count: 14