Identifying sites under positive selection with uncertain parameter estimates

Cited by: 4
Author
Aris-Brosou, Stephane [1]
Affiliation
[1] Univ Ottawa, Dept Biol, Ottawa, ON K1N 6N5, Canada
Keywords
codon substitution models; empirical Bayes; Bayes empirical Bayes; full Bayes; ROC curves; AIC
DOI
10.1139/G06-038
Chinese Library Classification (CLC) number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Codon-based substitution models are routinely used to measure selective pressures acting on protein-coding genes. To this effect, the nonsynonymous to synonymous rate ratio (dN/dS = omega) is estimated. The proportion of amino-acid sites potentially under positive selection, as indicated by omega > 1, is inferred by fitting a probability distribution where some sites are permitted to have omega > 1. These sites are then inferred by means of an empirical Bayes or by a Bayes empirical Bayes approach that, respectively, ignores or accounts for sampling errors in maximum-likelihood estimates of the distribution used to infer the proportion of sites with omega > 1. Here, we extend a previous full-Bayes approach to include models with high power and low false-positive rates when inferring sites under positive selection. We propose some heuristics to alleviate the computational burden, and show that (i) full Bayes can be superior to empirical Bayes when analyzing a small data set or small simulated data, (ii) full Bayes has only a small advantage over Bayes empirical Bayes with our small test data, and (iii) Bayesian methods appear relatively insensitive to mild misspecifications of the random process generating adaptive evolution in our simulations, but in practice can prove extremely sensitive to model specification. We suggest that the codon model used to detect amino acids under selection should be carefully selected, for instance using Akaike information criterion (AIC).
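As a rough illustration of the empirical Bayes step described in the abstract, the sketch below computes the posterior probability that one site belongs to each omega class, treating the estimated class proportions as if they were known without error (the "naive" empirical Bayes behaviour the abstract contrasts with Bayes empirical Bayes and full Bayes). The function name, the three omega classes, and all numeric values are hypothetical and are not taken from the paper.

```python
def neb_site_posteriors(class_props, site_likelihoods):
    """Naive empirical Bayes posterior of each omega class at a single site.

    class_props      -- estimated proportion p_k of sites in each class (sums to 1)
    site_likelihoods -- f(x_h | omega_k): likelihood of the site's data under each class
    Both are plugged in as fixed values, so sampling error in the estimates is ignored.
    """
    # Bayes' rule: posterior_k is proportional to p_k * f(x_h | omega_k)
    joint = [p * f for p, f in zip(class_props, site_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical example with three classes (e.g. omega < 1, omega = 1, omega > 1)
props = [0.7, 0.2, 0.1]      # assumed maximum-likelihood class proportions
liks = [1e-6, 2e-6, 8e-6]    # assumed per-class site likelihoods
post = neb_site_posteriors(props, liks)

# A site is typically flagged when the posterior of the omega > 1 class
# exceeds a chosen cutoff (0.95 is a common but arbitrary choice).
flagged = post[2] > 0.95
```

Bayes empirical Bayes and full Bayes differ from this sketch in that they integrate over uncertainty in the class proportions and omega values instead of plugging in point estimates.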
Pages: 767-776
Number of pages: 10