Estimation of rates-across-sites distributions in phylogenetic substitution models

Cited by: 43
Authors
Susko, E [1]
Field, C
Blouin, C
Roger, AJ
Affiliations
[1] Dalhousie Univ, Dept Math & Stat, Halifax, NS B3H 3J5, Canada
[2] Dalhousie Univ, Dept Biochem & Mol Biol, Program Evolutionary Biol, Canadian Inst Adv Res, Halifax, NS B3H 4H7, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
gamma model; Markov models; maximum likelihood; molecular evolution; phylogenetics; rate distribution
DOI
10.1080/10635150390235395
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Previous work has shown that it is often essential to account for the variation in rates at different sites in phylogenetic models in order to avoid phylogenetic artifacts such as long branch attraction. In most current models, the gamma distribution is used for the rates-across-sites distributions and is implemented as an equal-probability discrete gamma. In this article, we introduce discrete distribution estimates with large numbers of equally spaced rate categories allowing us to investigate the appropriateness of the gamma model. With large numbers of rate categories, these discrete estimates are flexible enough to approximate the shape of almost any distribution. Likelihood ratio statistical tests and a nonparametric bootstrap confidence-bound estimation procedure based on the discrete estimates are presented that can be used to test the fit of a parametric family. We applied the methodology to several different protein data sets, and found that although the gamma model often provides a good parametric model for this type of data, rate estimates from an equal-probability discrete gamma model with a small number of categories will tend to underestimate the largest rates. In cases when the gamma model assumption is in doubt, rate estimates coming from the discrete rate distribution estimate with a large number of rate categories provide a robust alternative to gamma estimates. An alternative implementation of the gamma distribution is proposed that, for equal numbers of rate categories, is computationally more efficient during optimization than the standard gamma implementation and can provide more accurate estimates of site rates.
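The remark about an equal-probability discrete gamma with few categories can be illustrated with a minimal sketch. It assumes the usual construction in which a mean-one gamma distribution (shape and rate both equal to alpha) is cut into k equal-probability intervals and each category rate is the mean of its interval. The function names, the series and bisection numerics, and the value alpha = 0.5 are illustrative assumptions, not the article's implementation.

```python
import math


def reg_lower_gamma(s, x):
    """Regularized lower incomplete gamma P(s, x), via its power series."""
    if x <= 0.0:
        return 0.0
    if math.isinf(x):
        return 1.0
    total, term, n = 1.0, 1.0, 0
    # All terms are positive, so summing until they become negligible is safe.
    while term > 1e-16 * total:
        n += 1
        term *= x / (s + n)
        total += term
    return total * math.exp(s * math.log(x) - x - math.lgamma(s + 1.0))


def gamma_quantile(p, alpha):
    """Quantile of a mean-one gamma (shape = rate = alpha), by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(alpha, alpha * hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(alpha, alpha * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k):
    """Mean rate within each of k equal-probability gamma categories."""
    bounds = [0.0] + [gamma_quantile(i / k, alpha) for i in range(1, k)]
    bounds.append(math.inf)
    # For a mean-one gamma, the partial mean up to b equals the CDF of a
    # gamma with shape alpha + 1 (same rate) evaluated at b.
    cum = [reg_lower_gamma(alpha + 1.0, alpha * b) for b in bounds]
    return [k * (cum[i + 1] - cum[i]) for i in range(k)]


if __name__ == "__main__":
    alpha = 0.5  # hypothetical shape parameter, chosen only for illustration
    for k in (4, 16):
        rates = discrete_gamma_rates(alpha, k)
        print(k, "categories, largest rate:", round(rates[-1], 4))
```

Under these assumptions no site can be assigned a rate above the top category mean, which is about 2.89 for alpha = 0.5 with four categories and grows as categories are added. That ceiling is one way to see why a small number of categories can understate the fastest sites.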
Pages: 594-603
Page count: 10