Impact of taxon sampling on the estimation of rates of evolution at sites

Times cited: 20
Authors
Blouin, C [1]
Butt, D
Roger, AJ
Affiliations
[1] Dalhousie Univ, Dept Biochem & Mol Biol, Halifax, NS, Canada
[2] Canadian Inst Adv Res, Program Evolutionary Biol, Toronto, ON, Canada
[3] Dalhousie Univ, Fac Comp Sci, Halifax, NS, Canada
Keywords
protein; evolutionary rate; functional divergence; maximum likelihood; simulation
DOI
10.1093/molbev/msi065
CLC classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
The function of individual sites within a protein influences their rate of accepted point mutation. During the computation of phylogenetic likelihoods, rate heterogeneity can be modeled on a site-by-site basis, with relative rates drawn from a discretized gamma distribution. Site-rate estimates (e.g., the rate with the highest posterior probability given the data at a site) can then be used as a measure of the evolutionary constraints imposed by function. However, if sequence availability is limited, the estimation of rates is subject to sampling error. This article presents a simulation study that evaluates the robustness of evolutionary site-rate estimates for both small and phylogenetically unbalanced samples. The sampling error on rate estimates was first evaluated for alignments of 5-45 sequences, sampled by jackknifing from a master alignment containing 968 sequences. We observed that the potentially enhanced resolution among site rates gained by including a larger number of rate categories is negated by the difficulty of correctly estimating intermediate rates. This effect is marked for data sets with fewer than 30 sequences. Although the likelihood computation theoretically accounts for phylogenetic distances through branch lengths, the introduction of a single long-branch outlier sequence had a significant negative effect on site-rate estimates. Finally, a shift in rates of evolution between related lineages can be diagnostic of a gain or loss of function within a protein family. Our analyses indicate that detecting these rate shifts is a harder problem than estimating rates, partly because the difference in rates depends on two rate estimates, each with its own intrinsic uncertainty. The performance of four methods for detecting these site-rate shifts is evaluated and compared. Guidelines are suggested for preparing data sets that are minimally influenced by error introduced by sequence sampling.
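The quantities the abstract relies on, discretized gamma rate categories and the highest-posterior rate at a site, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the category-mean discretization of a gamma distribution with shape = rate = alpha (so the mean relative rate is 1), an equal prior weight of 1/K per category, and per-category site likelihoods L(data | r_k) supplied by an external phylogenetic likelihood calculation. All function names are hypothetical, and the likelihood values in the demo are invented. `prob_rate_increase` is a simple independence-based illustration of why a rate shift inherits uncertainty from two estimates; it is not meant to represent any of the four methods compared in the article.

```python
import math


def lower_reg_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x), computed from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a  # first series term: x^0 / a
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total) and n < 10_000:
        n += 1
        term *= x / (a + n)  # x^n / (a (a+1) ... (a+n))
        total += term
    return min(1.0, math.exp(-x + a * math.log(x) - math.lgamma(a)) * total)


def gamma_quantile(a: float, p: float) -> float:
    """Solve P(a, x) = p for x by bisection."""
    lo, hi = 0.0, 1.0
    while lower_reg_gamma(a, hi) < p:
        hi *= 2.0  # expand the bracket until it contains the quantile
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if lower_reg_gamma(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, k: int) -> list[float]:
    """Mean relative rate of each of k equal-probability gamma categories.

    Assumes shape = rate = alpha; the mean rate in a category is k times the
    Gamma(alpha + 1) probability mass between that category's cut points.
    """
    cuts = [0.0] + [gamma_quantile(alpha, i / k) for i in range(1, k)] + [math.inf]
    upper = [1.0 if c == math.inf else lower_reg_gamma(alpha + 1.0, c) for c in cuts]
    return [k * (upper[i + 1] - upper[i]) for i in range(k)]


def site_rate_posterior(site_likelihoods: list[float]) -> list[float]:
    """Posterior over rate categories at one site; equal priors cancel out."""
    total = sum(site_likelihoods)
    return [lk / total for lk in site_likelihoods]


def map_site_rate(site_likelihoods: list[float], rates: list[float]) -> float:
    """Rate of the category with the highest posterior probability."""
    post = site_rate_posterior(site_likelihoods)
    best = max(range(len(rates)), key=post.__getitem__)
    return rates[best]


def prob_rate_increase(post_a: list[float], post_b: list[float]) -> float:
    """P(category in lineage B > category in lineage A), assuming independence.

    Categories are ordered by increasing rate, so a higher index means a higher rate.
    """
    return sum(
        pa * pb
        for i, pa in enumerate(post_a)
        for j, pb in enumerate(post_b)
        if j > i
    )


if __name__ == "__main__":
    rates = discrete_gamma_rates(alpha=0.5, k=4)
    # Hypothetical per-category likelihoods for one site in two lineages.
    lik_a = [1e-3, 4e-3, 2e-3, 5e-4]
    lik_b = [1e-4, 5e-4, 2e-3, 6e-3]
    print("category rates:", [round(r, 4) for r in rates])
    print("MAP rates:", map_site_rate(lik_a, rates), map_site_rate(lik_b, rates))
    print("P(shift up):", prob_rate_increase(
        site_rate_posterior(lik_a), site_rate_posterior(lik_b)))
```

Because the priors are equal, the posterior at a site is just the normalized likelihood vector, so the MAP rate switches between adjacent categories whenever their likelihoods swap order. The shift probability multiplies two such posteriors, which makes concrete how the uncertainty of both estimates enters a rate-shift call.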
Pages: 784-791
Number of pages: 8