A simple hierarchical approach to modeling distributions of substitution rates

Cited by: 48
Authors
Pond, SLK [1]
Frost, SDW [1]
Affiliations
[1] Univ Calif San Diego, Antiviral Res Ctr, San Diego, CA 92103 USA
Keywords
substitution rates; hierarchical model; adaptive evolution; hepatitis C; model selection; parallel algorithms
DOI
10.1093/molbev/msi009
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Genetic sequence data typically exhibit variability in substitution rates across sites. In practice, there is often too little variation to fit a different rate for each site in the alignment, but the distribution of rates across sites may not be well modeled using simple parametric families. Mixtures of different distributions can capture more complex patterns of rate variation, but are often parameter-rich and difficult to fit. We present a simple hierarchical model in which a baseline rate distribution, such as a gamma distribution, is discretized into several categories, the quantiles of which are estimated using a discretized beta distribution. Although this approach involves adding only two extra parameters to a standard distribution, a wide range of rate distributions can be captured. Using simulated data, we demonstrate that a "beta-gamma" model can reproduce the moments of the rate distribution more accurately than the distribution used to simulate the data, even when the baseline rate distribution is misspecified. Using hepatitis C virus and mammalian mitochondrial sequences, we show that a beta-gamma model can fit as well as or better than a model with multiple discrete rate categories, and compares favorably with a model which fits a separate rate category to each site. We also demonstrate this discretization scheme in the context of codon models specifically aimed at identifying individual sites undergoing adaptive or purifying evolution.
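The discretization described in the abstract can be illustrated with a short sketch. It is one possible reading of the scheme, not the authors' implementation. It assumes that a beta distribution with parameters p and q is cut into K-1 equiprobable classes, that the mean of each class serves as an interior cumulative-probability breakpoint for a mean-one gamma distribution with shape alpha, and that each of the resulting K gamma categories is represented by its conditional mean. The function names `beta_breakpoints` and `beta_gamma_categories` are hypothetical, and NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy import special, stats


def beta_breakpoints(p, q, n_classes):
    """Interior breakpoints in (0, 1) from a discretized Beta(p, q).

    Assumption: the beta is split into n_classes - 1 equiprobable bins
    and the mean of each bin is used as a breakpoint.
    """
    k = n_classes - 1
    edges = stats.beta.ppf(np.linspace(0.0, 1.0, k + 1), p, q)
    # Partial expectation E[X; X <= x] of Beta(p, q) is p/(p+q) * I_x(p+1, q).
    partial = p / (p + q) * special.betainc(p + 1.0, q, edges)
    # Each bin has probability 1/k, so the bin mean is the difference times k.
    return np.diff(partial) * k


def beta_gamma_categories(alpha, p, q, n_classes):
    """Rates and weights of a gamma discretized at beta-derived quantiles.

    The baseline is a gamma with shape alpha and scale 1/alpha (mean one).
    Returns (rates, weights); the weighted mean rate equals one.
    """
    interior = beta_breakpoints(p, q, n_classes)
    cum = np.concatenate(([0.0], interior, [1.0]))
    weights = np.diff(cum)
    # Rate values at the interior cumulative probabilities.
    x = stats.gamma.ppf(interior, a=alpha, scale=1.0 / alpha)
    # Partial expectation E[X; X <= x] of the mean-one gamma is
    # the regularized lower incomplete gamma P(alpha + 1, alpha * x).
    partial = np.concatenate(([0.0], special.gammainc(alpha + 1.0, alpha * x), [1.0]))
    rates = np.diff(partial) / weights
    return rates, weights


if __name__ == "__main__":
    rates, weights = beta_gamma_categories(alpha=0.5, p=1.0, q=1.0, n_classes=4)
    for r, w in zip(rates, weights):
        print(f"rate={r:.4f}  weight={w:.4f}")
    print("weighted mean rate:", float(np.sum(rates * weights)))
```

Under these assumptions, p and q are the only parameters added to the baseline gamma, and changing them moves the breakpoints so that the categories no longer need to be equiprobable. Even a uniform beta (p = q = 1) with four categories gives weights of 1/6, 1/3, 1/3, and 1/6 in this sketch rather than four equal weights.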
Pages: 223-234
Number of pages: 12