A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach

Cited by: 2206
Authors
Whelan, S [1]
Goldman, N [1]
Affiliations
[1] Univ Cambridge, Dept Zool, Cambridge CB2 3EJ, England
Keywords
amino acid replacement; general reversible model; maximum likelihood; protein evolution;
DOI
10.1093/oxfordjournals.molbev.a003851
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Phylogenetic inference from amino acid sequence data uses mainly empirical models of amino acid replacement and is therefore dependent on those models. Two of the more widely used models, the Dayhoff and JTT models, are estimated using similar methods that can utilize large numbers of sequences from many unrelated protein families but are somewhat unsatisfactory because they rely on assumptions that may lead to systematic error and discard a large amount of the information within the sequences. The alternative method of maximum-likelihood estimation may utilize the information in the sequence data more efficiently and suffers from no systematic error, but it has previously been applicable to relatively few sequences related by a single phylogenetic tree. Here, we combine the best attributes of these two methods using an approximate maximum-likelihood method. We implemented this approach to estimate a new model of amino acid replacement from a database of globular protein sequences comprising 3,905 amino acid sequences split into 182 protein families. While the new model has an overall structure similar to those of other commonly used models, there are significant differences. The new model outperforms the Dayhoff and JTT models with respect to maximum-likelihood values for a large majority of the protein families in our database. This suggests that it provides a better overall fit to the evolutionary process in globular proteins and may lead to more accurate phylogenetic tree estimates. Potentially, this matrix, and the methods used to generate it, may also be useful in other areas of research, such as biological sequence database searching, sequence alignment, and protein structure prediction, for which an accurate description of amino acid replacement is required.
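As an illustration of the "general reversible model" named in the keywords, the following is a minimal sketch of how such a replacement model is commonly parameterized: a symmetric matrix of exchangeabilities combined with equilibrium frequencies gives a rate matrix Q, and replacement probabilities over a branch of length t are P(t) = exp(Qt). The function names and the four-state toy values are hypothetical and chosen only for illustration; they are not the 20-state matrix estimated in the paper, and this is not the authors' estimation procedure.

```python
import numpy as np

def reversible_rate_matrix(exch, freqs):
    """Build Q with Q[i, j] = exch[i, j] * freqs[j] for i != j (exch symmetric)."""
    exch = np.asarray(exch, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    q = exch * freqs[np.newaxis, :]        # scale column j by freqs[j]
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))    # each row sums to zero
    mean_rate = -np.dot(freqs, np.diag(q)) # expected replacements per unit time
    return q / mean_rate                   # normalize to one replacement per unit time

def transition_probabilities(q, freqs, t):
    """Compute P(t) = exp(Q t) using the symmetry implied by reversibility."""
    root = np.sqrt(np.asarray(freqs, dtype=float))
    sym = (root[:, None] * q) / root[None, :]   # D^(1/2) Q D^(-1/2) is symmetric
    evals, evecs = np.linalg.eigh(sym)
    exp_sym = (evecs * np.exp(evals * t)) @ evecs.T
    return (exp_sym / root[:, None]) * root[None, :]

# Hypothetical toy alphabet of four states (assumed values, not from the paper).
toy_exch = [[0, 1, 2, 1],
            [1, 0, 1, 3],
            [2, 1, 0, 1],
            [1, 3, 1, 0]]
toy_freqs = [0.1, 0.2, 0.3, 0.4]

toy_q = reversible_rate_matrix(toy_exch, toy_freqs)
toy_p = transition_probabilities(toy_q, toy_freqs, 0.5)
```

Under these assumptions, each row of `toy_p` is a probability distribution, and the matrix satisfies detailed balance with respect to `toy_freqs`, which is the defining property of a reversible model.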
Pages: 691-699
Number of pages: 9
Related papers
22 in total
[1]   Plastid genome phylogeny and a model of amino acid substitution for proteins encoded by chloroplast DNA [J].
Adachi, J ;
Waddell, PJ ;
Martin, W ;
Hasegawa, M .
JOURNAL OF MOLECULAR EVOLUTION, 2000, 50 (04) :348-358
[2]  
Adachi J, 1996, J MOL EVOL, V42, P459
[3]  
[Anonymous], 1978, Atlas of protein sequence and structure
[4]  
CAO Y, 1994, J MOL EVOL, V39, P519
[5]  
DAYHOFF MO, 1972, ATLAS PROTEIN SEQUEN, V5, P89
[6]  
Felsenstein J., 1995, PHYLIP PHYLOGENETIC
[7]   A nuclear gene for higher level phylogenetics: Phosphoenolpyruvate carboxykinase tracks Mesozoic-age divergences within Lepidoptera (Insecta) [J].
Friedlander, TP ;
Regier, JC ;
Mitter, C ;
Wagner, DL .
MOLECULAR BIOLOGY AND EVOLUTION, 1996, 13 (04) :594-604
[8]   Using evolutionary trees in protein secondary structure prediction and other comparative sequence analyses [J].
Goldman, N ;
Thorne, JL ;
Jones, DT .
JOURNAL OF MOLECULAR BIOLOGY, 1996, 263 (02) :196-208
[9]  
Goldman N, 1998, GENETICS, V149, P445
[10]   THE RAPID GENERATION OF MUTATION DATA MATRICES FROM PROTEIN SEQUENCES [J].
JONES, DT ;
TAYLOR, WR ;
THORNTON, JM .
COMPUTER APPLICATIONS IN THE BIOSCIENCES, 1992, 8 (03) :275-282