A nonparametric method for accommodating and testing across-site rate variation

Times cited: 44
Authors
Huelsenbeck, John P. [1]
Suchard, Marc A. [2, 3, 4]
Affiliations
[1] Univ Calif Berkeley, Dept Integrat Biol, Berkeley, CA 94720 USA
[2] Univ Calif Los Angeles, Dept Biomath, David Geffen Sch Med, Los Angeles, CA 90095 USA
[3] Univ Calif Los Angeles, Dept Human Genet, David Geffen Sch Med, Los Angeles, CA 90095 USA
[4] Univ Calif Los Angeles, Sch Publ Hlth, Dept Biostat, Los Angeles, CA 90095 USA
Keywords
across-site rate variation; Bayesian estimation; Dirichlet process prior; Markov chain Monte Carlo
DOI
10.1080/10635150701670569
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07 [Natural sciences]; 0710 [Biology]; 09 [Agriculture]
Abstract
Substitution rates are one of the most fundamental parameters in a phylogenetic analysis and are represented in phylogenetic models as the branch lengths on a tree. Variation in substitution rates across an alignment of molecular sequences is well established and likely caused by variation in functional constraint across the genes encoded in the sequences. Rate variation across alignment sites is important to accommodate in a phylogenetic analysis; failure to account for across-site rate variation can cause biased estimates of phylogeny or other model parameters. Traditionally, rate variation across sites has been modeled by treating the rate for a site as a random variable drawn from some probability distribution (such as the gamma probability distribution) or by partitioning sites to different rate classes and estimating the rate for each class independently. We consider a different approach, related to site-specific models in which sites are partitioned to rate classes. However, instead of treating the partitioning scheme in which sites are assigned to rate classes as a fixed assumption of the analysis, we treat the rate partitioning as a random variable under a Dirichlet process prior. We find that the Dirichlet process prior model for across-site rate variation fits alignments of DNA sequence data better than commonly used models of across-site rate variation. The method appears to identify the underlying codon structure of protein-coding genes; rate partitions that were sampled by the Markov chain Monte Carlo procedure were closer to a partition in which sites are assigned to rate classes by codon position than to randomly permuted partitions but still allow for additional variability across sites.
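The idea of treating the rate partition as a random variable under a Dirichlet process prior can be illustrated with a short simulation. The sketch below is not the authors' implementation: it only draws one partition of sites into rate classes from a Chinese restaurant process, a standard way of sampling from a Dirichlet process prior. The function names, the concentration parameter `alpha`, and the choice of a mean-one gamma base distribution for class rates are all assumptions made for illustration.

```python
import random


def sample_rate_partition(num_sites, alpha, rng):
    """Draw one assignment of sites to rate classes from a Chinese restaurant process.

    alpha is the concentration parameter (hypothetical value chosen by the caller).
    """
    assignments = []   # assignments[i] is the rate class of site i
    class_sizes = []   # class_sizes[k] is the number of sites in class k
    for _ in range(num_sites):
        # An existing class k is chosen with weight class_sizes[k];
        # a new class is opened with weight alpha.
        weights = class_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(class_sizes):
            class_sizes.append(1)
        else:
            class_sizes[k] += 1
        assignments.append(k)
    return assignments, class_sizes


def sample_class_rates(num_classes, shape, rng):
    """Draw one rate per class from a gamma base distribution with mean 1 (an assumption)."""
    return [rng.gammavariate(shape, 1.0 / shape) for _ in range(num_classes)]


rng = random.Random(1)
assignments, class_sizes = sample_rate_partition(30, 1.0, rng)
rates = sample_class_rates(len(class_sizes), 2.0, rng)
site_rates = [rates[k] for k in assignments]  # the rate applied at each site
```

Larger values of `alpha` tend to produce more rate classes, so both the number of classes and the assignment of sites to them vary from draw to draw rather than being fixed in advance. A full analysis would embed such draws in a Markov chain Monte Carlo procedure that also evaluates the phylogenetic likelihood, which this sketch omits.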
Pages: 975-987
Number of pages: 13