An empirical codon model for protein sequence evolution

Cited by: 128
Authors
Kosiol, Carolin [1]
Holmes, Ian
Goldman, Nick
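Affiliations
[1] European Bioinformatics Institute, European Molecular Biology Laboratory, Hinxton, England
[2] Cornell University, Department of Biological Statistics and Computational Biology, Ithaca, NY, USA
[3] University of California, Berkeley, Department of Bioengineering, Berkeley, CA 94720, USA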
Keywords
protein evolution; codon models; Markov models; maximum likelihood; phylogenetic inference
DOI
10.1093/molbev/msm064
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
In the past, two kinds of Markov models have been considered to describe protein sequence evolution. Codon-level models have been mechanistic, with a small number of parameters designed to take into account features such as transition-transversion bias, codon frequency bias, and synonymous-nonsynonymous amino acid substitution bias. Amino acid models have been empirical, attempting to summarize the replacement patterns observed in large quantities of data and not explicitly considering the distinct factors that shape protein evolution. We have estimated the first empirical codon model (ECM). Previous codon models assume that protein evolution proceeds only by successive single nucleotide substitutions, but our results indicate that model accuracy is significantly improved by incorporating instantaneous doublet and triplet changes. We also find that the affiliations between codons, the amino acid each encodes, and the physicochemical properties of the amino acids are the main factors driving the process of codon evolution. Neither multiple nucleotide changes nor the strong influence of the genetic code nor amino acids' physicochemical properties form a part of standard mechanistic models and their views of how codon evolution proceeds. We have implemented the ECM for likelihood-based phylogenetic analysis, and an assessment of its ability to describe protein evolution shows that it consistently outperforms comparable mechanistic codon models. We point out the biological interpretation of our ECM and possible consequences for studies of selection.
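To make the abstract's distinction between single, doublet, and triplet changes concrete, the following minimal sketch enumerates all pairs of sense codons under the standard genetic code and groups them by how many nucleotide positions differ and whether the change is synonymous. It is an illustration only, not the authors' ECM or their estimation procedure; the names `CODE`, `SENSE`, `classify`, and `counts` are hypothetical and introduced here.

```python
from itertools import combinations, product

BASES = "TCAG"
# Standard genetic code with codons in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = [codon for codon, aa in CODE.items() if aa != "*"]


def classify(a, b):
    """Return (number of differing positions, 'synonymous' or 'nonsynonymous')."""
    n_diff = sum(x != y for x, y in zip(a, b))
    kind = "synonymous" if CODE[a] == CODE[b] else "nonsynonymous"
    return n_diff, kind


# Tally every unordered pair of distinct sense codons by its class.
counts = {}
for a, b in combinations(SENSE, 2):
    key = classify(a, b)
    counts[key] = counts.get(key, 0) + 1

for key in sorted(counts):
    print(key, counts[key])
```

A model restricted to single nucleotide substitutions assigns a nonzero instantaneous rate only to the pairs with one differing position; a model that allows doublet and triplet changes can give nonzero rates to the remaining pairs as well.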
Pages: 1464-1479
Page count: 16