An empirical examination of the utility of codon-substitution models in phylogeny reconstruction

Cited by: 93
Authors
Ren, FR [1]
Tanaka, H
Yang, ZH
Affiliations
[1] Tokyo Med & Dent Univ, Ctr Informat Med, Tokyo, Japan
[2] UCL, Dept Biol, London WC1E 6BT, England
Funding
Biotechnology and Biological Sciences Research Council (UK)
Keywords
codon models; divergence dates; maximum likelihood; phylogenetics; phylogenetic information;
DOI
10.1080/10635150500354688
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Models of codon substitution have been commonly used to compare protein-coding DNA sequences and are particularly effective in detecting signals of natural selection acting on the protein. Their utility in reconstructing molecular phylogenies and in dating species divergences has not been explored. Codon models naturally accommodate synonymous and nonsynonymous substitutions, which occur at very different rates and may be informative for recent and ancient divergences, respectively. Thus codon models may be expected to make efficient use of the phylogenetic information in protein-coding DNA sequences. Here we applied codon models to 106 protein-coding genes from eight yeast species to reconstruct phylogenies using the maximum likelihood method, in comparison with nucleotide- and amino acid-based analyses. The results appeared to confirm that expectation. Nucleotide-based analyses, under simplistic substitution models, were efficient in recovering recent divergences, whereas amino acid-based analyses performed better at recovering deep divergences. Codon models appeared to combine the advantages of amino acid and nucleotide data and performed well at recovering both recent and deep divergences. Estimation of relative species divergence times using amino acid and codon models suggested that translating gene sequences into proteins led to an information loss ranging from 30% for deep nodes to 66% for recent nodes. Although their computational burden makes codon models infeasible for tree search in large data sets, we suggest that they may be useful for comparing candidate trees. Nucleotide models that accommodate the differences in evolutionary dynamics at the three codon positions also performed well, at much lower computational cost. We discuss the relationship between a model's fit to data and its utility in phylogeny reconstruction and caution against the use of overly complex substitution models.
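The distinction between synonymous and nonsynonymous changes, and the loss of information on translation, can be illustrated with a toy sketch. This is not the maximum likelihood analysis described in the abstract. It only counts differing codons between two aligned sequences and shows that the synonymous ones disappear once the sequences are translated. The function names (`translate`, `count_codon_differences`) and the two short sequences are hypothetical and chosen for illustration, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split an in-frame coding sequence into a list of codons."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate an in-frame coding sequence into amino acids."""
    return "".join(CODON_TABLE[c] for c in codons(seq))


def count_codon_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    A differing codon pair is synonymous if both codons encode the same
    amino acid, and nonsynonymous otherwise. Multiple changes within one
    codon are counted as a single difference.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    syn = nonsyn = 0
    for ca, cb in zip(codons(seq_a), codons(seq_b)):
        if ca == cb:
            continue
        if CODON_TABLE[ca] == CODON_TABLE[cb]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Hypothetical aligned sequences, four codons each.
seq_a = "ATGCTTAAAGGC"
seq_b = "ATGCTCAGAGGT"

syn, nonsyn = count_codon_differences(seq_a, seq_b)
protein_diffs = sum(a != b for a, b in zip(translate(seq_a), translate(seq_b)))
print(syn, nonsyn, protein_diffs)
```

In this made-up pair, three codons differ at the DNA level but only one difference remains visible after translation. That is the kind of signal amino acid-based analyses cannot use.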
Pages: 808-818
Page count: 11