Computational Methods for Evaluating Phylogenetic Models of Coding Sequence Evolution with Dependence between Codons

Cited by: 36
Authors
Rodrigue, Nicolas [1]
Kleinman, Claudia L. [2]
Philippe, Herve [2]
Lartillot, Nicolas [2]
Affiliations
[1] Univ Ottawa, Dept Biol, Ctr Adv Res Environm Genom, Ottawa, ON K1N 6N5, Canada
[2] Univ Montreal, Dept Biochim, Ctr Robert Cedergren, Montreal, PQ H3C 3J7, Canada
Funding
Natural Sciences and Engineering Research Council of Canada; Canadian Institutes of Health Research
Keywords
Markov chain Monte Carlo; data augmentation; auxiliary variables; posterior predictive checking; Bayes factors; protein tertiary structure; predictive p-values; protein evolution; substitution models; tertiary structure; nucleotide substitution; likelihood approach; sampling methods; DNA sequences; Markov chains; selection
DOI
10.1093/molbev/msp078
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
In recent years, molecular evolutionary models formulated as site-interdependent Markovian codon substitution processes have been proposed as means of mechanistically accounting for selective features over long-range evolutionary scales. Under such models, site interdependencies are reflected in the use of a simplified protein tertiary structure representation and predefined statistical potential, which, along with mutational parameters, mediate nonsynonymous rates of substitution; rates of synonymous events are solely mediated by mutational parameters. Although theoretically attractive, the models are computationally challenging, and the methods used to manipulate them still do not allow for quantitative model evaluations in a multiple-sequence context. Here, we describe Markov chain Monte Carlo computational methodologies for sampling parameters from their posterior distribution under site-interdependent codon substitution models within a phylogenetic context and allowing for Bayesian model assessment and ranking. Specifically, the techniques we expound here can form the basis of posterior predictive checking under these models and can be embedded within thermodynamic integration algorithms for computing Bayes factors. We illustrate the methods using two data sets and find that although current forms of site-interdependent models of codon substitution provide an improved fit, they are outperformed by the extended site-independent versions. Altogether, the methodologies described here should enable a quantified contrasting of alternative ways of modeling structural constraints, or other site-interdependent criteria, and establish if such formulations can match (or supplant) site-independent model extensions.
Pages: 1663-1676
Page count: 14