Modeling Protein Evolution with Several Amino Acid Replacement Matrices Depending on Site Rates

Cited by: 157
Authors
Le, Si Quang [1, 2]
Dang, Cuong Cao [3]
Gascuel, Olivier [1]
Affiliations
[1] Univ Montpellier 2, CNRS, Methodes & Algorithmes Bioinformat LIRMM & IBC, Montpellier 5, France
[2] Wellcome Trust Sanger Inst, Hinxton, England
[3] Vietnam Natl Univ, Univ Engn & Technol, Hanoi, Vietnam
Keywords
amino acid substitutions; replacement matrices; gamma and distribution-free rate models; maximum likelihood estimations; phylogenetic inference; SECONDARY STRUCTURE; MIXTURE MODEL; SOLVENT ACCESSIBILITY; DNA-SEQUENCES; SUBSTITUTION; SELECTION; ALGORITHMS; PHYLOGENY; INFERENCE; TREES;
DOI
10.1093/molbev/mss112
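Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
070307 [Chemical Biology]; 071010 [Biochemistry and Molecular Biology];
Abstract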
Most protein substitution models use a single amino acid replacement matrix summarizing the biochemical properties of amino acids. However, site evolution is highly heterogeneous and depends on many factors that influence the substitution patterns. In this paper, we investigate the use of different substitution matrices for different site evolutionary rates. Indeed, the variability of evolutionary rates corresponds to one of the most apparent heterogeneity factors among sites, and there is no reason to assume that the substitution patterns remain identical regardless of the evolutionary rate. We first introduce LG4M, which is composed of four matrices, each corresponding to one discrete gamma rate category (of four). These matrices differ in their amino acid equilibrium distributions and in their exchangeabilities, contrary to the standard gamma model where only the global rate differs from one category to another. Next, we present LG4X, which also uses four different matrices, but leaves aside the gamma distribution and follows a distribution-free scheme for the site rates. All these matrices are estimated from a very large alignment database, and our two models are tested using a large sample of independent alignments. Detailed analysis of resulting matrices and models shows the complexity of amino acid substitutions and the advantage of flexible models such as LG4M and LG4X. Both significantly outperform single-matrix models, providing gains of dozens to hundreds of log-likelihood units for most data sets. LG4X obtains substantial gains compared with LG4M, thanks to its distribution-free scheme for site rates. Since LG4M and LG4X display such advantages but require the same memory space and have comparable running times to standard models, we believe that LG4M and LG4X are relevant alternatives to single replacement matrices. Our models, data, and software are available from http://www.atgc-montpellier.fr/models/lg4x.
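The mixture structure described in the abstract can be illustrated with a small numerical sketch. It is not the authors' software and does not use the published LG4M or LG4X parameters: the exchangeabilities, equilibrium frequencies, rates, and weights below are random or hand-picked placeholders, and the function names (`build_rate_matrix`, `transition_probs`, `pair_site_likelihood`, `make_toy_components`) are hypothetical. For simplicity it evaluates the likelihood of one site in a two-sequence alignment, where each of the four components has its own matrix, its own frequencies, and its own rate.

```python
import numpy as np

N_STATES = 20  # amino acid alphabet size


def build_rate_matrix(exch, freqs):
    """Reversible rate matrix with Q[i, j] = exch[i, j] * freqs[j] for i != j,
    scaled so that the expected number of substitutions per unit time is 1."""
    q = exch * freqs[np.newaxis, :]
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    scale = -np.dot(freqs, np.diag(q))
    return q / scale


def transition_probs(q, freqs, t):
    """P(t) = exp(Q t), computed through the symmetric matrix
    D^(1/2) Q D^(-1/2) with D = diag(freqs), which exists because Q is reversible."""
    sqrt_pi = np.sqrt(freqs)
    sym = (sqrt_pi[:, np.newaxis] * q) / sqrt_pi[np.newaxis, :]
    evals, evecs = np.linalg.eigh(sym)
    exp_sym = (evecs * np.exp(evals * t)) @ evecs.T
    return (exp_sym / sqrt_pi[:, np.newaxis]) * sqrt_pi[np.newaxis, :]


def pair_site_likelihood(x, y, t, components):
    """Likelihood of observing states x and y at one site of two sequences
    separated by distance t, under a mixture of (weight, rate, exch, freqs)."""
    total = 0.0
    for weight, rate, exch, freqs in components:
        q = build_rate_matrix(exch, freqs)
        p = transition_probs(q, freqs, rate * t)
        total += weight * freqs[x] * p[x, y]
    return total


def make_toy_components(seed=0, weights=None, rates=None):
    """Four placeholder components. The defaults mimic the LG4M layout
    (equal weights, one matrix per rate category); passing free weights and
    rates mimics the distribution-free layout of LG4X. All values are made up."""
    rng = np.random.default_rng(seed)
    weights = [0.25, 0.25, 0.25, 0.25] if weights is None else weights
    rates = [0.1, 0.4, 1.0, 2.5] if rates is None else rates  # placeholder rates
    components = []
    for weight, rate in zip(weights, rates):
        a = rng.random((N_STATES, N_STATES))
        exch = (a + a.T) / 2.0                  # symmetric exchangeabilities
        freqs = rng.dirichlet(np.ones(N_STATES))  # equilibrium frequencies
        components.append((weight, rate, exch, freqs))
    return components


lg4m_like = make_toy_components()
lg4x_like = make_toy_components(weights=[0.4, 0.3, 0.2, 0.1],
                                rates=[0.2, 0.6, 1.5, 3.0])
site_lik_m = pair_site_likelihood(0, 5, 0.3, lg4m_like)
site_lik_x = pair_site_likelihood(0, 5, 0.3, lg4x_like)
```

In a standard gamma model the four components would share one matrix and differ only in `rate`. Here each component also carries its own `exch` and `freqs`, which is the distinction the abstract draws. A full phylogenetic likelihood would replace the two-sequence formula with Felsenstein's pruning over a tree, which this sketch leaves out.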
Pages: 2921-2936
Page count: 16