Modeling coding-sequence evolution within the context of residue solvent accessibility

Cited by: 26
Authors
Scherrer, Michael P.
Meyer, Austin G.
Wilke, Claus O. [1]
Affiliations
[1] Univ Texas Austin, Inst Cellular & Mol Biol, Ctr Computat Biol & Bioinformat, Austin, TX 78712 USA
Source
BMC Evolutionary Biology | 2012 / Vol. 12
Funding
National Science Foundation (USA);
Keywords
OPTIMAL CODONS ASSOCIATE; PROTEIN EVOLUTION; STRUCTURAL DETERMINANTS; NUCLEOTIDE SUBSTITUTION; PHYLOGENETIC MODELS; SECONDARY STRUCTURE; SITES; STABILITY; SELECTION; DEPENDENCE;
DOI
10.1186/1471-2148-12-179
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Background: Protein structure mediates site-specific patterns of sequence divergence. In particular, residues in the core of a protein (solvent-inaccessible residues) tend to be more evolutionarily conserved than residues on the surface (solvent-accessible residues). Results: Here, we present a model of sequence evolution that explicitly accounts for the relative solvent accessibility of each residue in a protein. Our model is a variant of the Goldman-Yang 1994 (GY94) model in which all model parameters can be functions of the relative solvent accessibility (RSA) of a residue. We apply this model to a data set comprising nearly 600 yeast genes, and find that an evolutionary-rate ratio omega that varies linearly with RSA provides a better model fit than an RSA-independent omega or an omega that is estimated separately in individual RSA bins. We further show that the branch length t and the transition-transversion ratio kappa also vary with RSA. The RSA-dependent GY94 model performs better than an RSA-dependent Muse-Gaut 1994 (MG94) model in which the synonymous and non-synonymous rates individually are linear functions of RSA. Finally, protein core size affects the slope of the linear relationship between omega and RSA, and gene expression level affects both the intercept and the slope. Conclusions: Structure-aware models of sequence evolution provide a significantly better fit than traditional models that neglect structure. The linear relationship between omega and RSA implies that genes are better characterized by their omega slope and intercept than by just their mean omega.
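To make the three omega parameterizations compared in the abstract concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the function names, the assumption that RSA is normalized to the range 0 to 1, the equal-width binning, and all numeric values are hypothetical and not taken from the paper.

```python
from typing import Sequence


def _check_rsa(rsa: float) -> None:
    # Assumption: RSA is normalized to [0, 1]; the paper's exact
    # normalization is not stated in the abstract.
    if not 0.0 <= rsa <= 1.0:
        raise ValueError("rsa is assumed to lie in [0, 1]")


def omega_constant(rsa: float, omega: float) -> float:
    """RSA-independent omega: the same value at every site."""
    _check_rsa(rsa)
    return omega


def omega_binned(rsa: float, bin_omegas: Sequence[float]) -> float:
    """One omega per RSA bin (equal-width bins are an assumption)."""
    _check_rsa(rsa)
    n_bins = len(bin_omegas)
    # rsa == 1.0 is assigned to the last bin.
    index = min(int(rsa * n_bins), n_bins - 1)
    return bin_omegas[index]


def omega_linear(rsa: float, intercept: float, slope: float) -> float:
    """Omega as a linear function of RSA, described by two numbers."""
    _check_rsa(rsa)
    return intercept + slope * rsa


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for rsa in (0.0, 0.5, 1.0):
        print(rsa, omega_linear(rsa, intercept=0.05, slope=0.2))
```

Under the linear form, a gene is summarized by its intercept (omega at fully buried sites) and its slope (how quickly omega rises toward the surface), which is the pair of quantities the abstract proposes in place of a single mean omega.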
Pages: 11