A Gamma mixture model better accounts for among site rate heterogeneity

Cited by: 90
Authors
Mayrose, I
Friedman, N
Pupko, T [1]
Affiliations
[1] Tel Aviv Univ, George S Wise Fac Life Sci, Dept Cell Res & Immunol, IL-69978 Tel Aviv, Israel
[2] Hebrew Univ Jerusalem, Sch Engn & Comp Sci, IL-91904 Jerusalem, Israel
Funding
Israel Science Foundation
DOI
10.1093/bioinformatics/bti1125
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Variation of substitution rates across nucleotide and amino acid sites has long been recognized as a characteristic of molecular sequence evolution. Evolutionary models that account for this rate heterogeneity usually use a gamma density function to model the rate distribution across sites. This density function, however, may not fit real datasets, especially when there is a multimodal distribution of rates. Here, we present a novel evolutionary model based on a mixture of gamma density functions. This model better describes the among-site rate variation characteristic of molecular sequence evolution. The use of this model may improve the accuracy of various phylogenetic methods, such as reconstructing phylogenetic trees, dating divergence events, inferring ancestral sequences and detecting conserved sites in proteins.

Results: Using diverse sets of protein sequences we show that the gamma mixture model better describes the stochastic process underlying protein evolution. We show that the proposed gamma mixture model fits protein datasets significantly better than the single-gamma model in 9 out of 10 datasets tested. We further show that using the gamma mixture model improves the accuracy of model-based prediction of conserved residues in proteins.
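
Illustrative note: the idea of replacing a single gamma density with a mixture of gamma densities can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the authors' implementation. The shape/rate parameterization, the choice of two components, and all numeric values below are hypothetical and are not taken from the paper. The sketch shows that a weighted sum of gamma densities can be bimodal, which a single gamma density cannot be.

```python
import math


def gamma_pdf(r, alpha, beta):
    """Gamma density with shape alpha and rate beta, evaluated at rate r."""
    if r <= 0:
        return 0.0
    # Work in log space to avoid overflow for large shape parameters.
    log_density = (alpha * math.log(beta) - math.lgamma(alpha)
                   + (alpha - 1.0) * math.log(r) - beta * r)
    return math.exp(log_density)


def gamma_mixture_pdf(r, components):
    """Weighted sum of gamma densities.

    components is a list of (weight, alpha, beta) tuples whose weights
    are assumed to sum to 1.
    """
    return sum(w * gamma_pdf(r, a, b) for w, a, b in components)


def mixture_mean(components):
    """Mean rate of the mixture: sum of weight * (alpha / beta)."""
    return sum(w * a / b for w, a, b in components)


# Hypothetical example: 70% slowly evolving sites, 30% fast sites.
components = [(0.7, 2.0, 8.0), (0.3, 12.0, 4.0)]

if __name__ == "__main__":
    for r in (0.125, 1.2, 2.75):
        print(f"f({r}) = {gamma_mixture_pdf(r, components):.4f}")
    print(f"mean rate = {mixture_mean(components):.3f}")
```

With these hypothetical values the density has one peak near the slow component's mode and another near the fast component's mode, with a low-density valley between them.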
Pages: 151-158
Page count: 8