Integrating Sequence Variation and Protein Structure to Identify Sites under Selection

Cited by: 32
Authors
Meyer, Austin G. [1]
Wilke, Claus O. [1]
Affiliations
[1] Univ Texas Austin, Ctr Computat Biol & Bioinformat, Inst Cellular & Mol Biol, Sect Integrat Biol, Austin, TX 78712 USA
Keywords
positive selection; protein evolution; relative solvent accessibility; influenza; DETECTING POSITIVE SELECTION; CODON-SUBSTITUTION MODELS; AMINO-ACID SITES; SOLVENT ACCESSIBILITY; NUCLEOTIDE SUBSTITUTION; PHYLOGENETIC MODELS; SECONDARY STRUCTURE; TERTIARY STRUCTURE; EVOLUTION; DETERMINANTS;
DOI
10.1093/molbev/mss217
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010 ; 081704 ;
Abstract
We present a novel method to identify sites under selection in protein-coding genes. Our method combines the traditional Goldman-Yang model of coding-sequence evolution with the information obtained from the 3D structure of the evolving protein, specifically the relative solvent accessibility (RSA) of individual residues. We develop a random-effects likelihood sites model in which rate classes are RSA dependent. The RSA dependence is modeled with linear functions. We demonstrate that our RSA-dependent model provides a significantly better fit to molecular sequence data than does a traditional, RSA-independent model. We further show that our model provides a natural, RSA-dependent neutral baseline for the evolutionary rate ratio omega = dN/dS. Sites that deviate from this neutral baseline likely experience selection pressure for function. We apply our method to the influenza proteins hemagglutinin and neuraminidase. For hemagglutinin, our method recovers positively selected sites near the sialic acid-binding site and negatively selected sites that may be important for trimerization. For neuraminidase, our method recovers the oseltamivir resistance site and otherwise suggests that few sites deviate from the neutral baseline. Our method is broadly applicable to any protein sequences for which structural data are available or can be obtained via homology modeling or threading.
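The idea of an RSA-dependent baseline can be illustrated with a toy sketch. This is not the authors' random-effects likelihood model: it only assumes a single linear function omega(RSA) = intercept + slope * RSA and labels a site by how far its omega estimate lies from that line. The function names, the coefficient values, and the fixed tolerance are all hypothetical and chosen purely for illustration; a real analysis would estimate the linear functions by maximum likelihood and judge deviations statistically.

```python
def baseline_omega(rsa, intercept=0.1, slope=0.5):
    """Hypothetical linear baseline for omega = dN/dS as a function of RSA.

    The intercept and slope are made-up illustration values,
    not estimates taken from the paper.
    """
    return intercept + slope * rsa


def classify_site(site_omega, rsa, tolerance=0.2):
    """Label a site by its deviation from the assumed linear baseline.

    The fixed tolerance is an illustrative stand-in for a proper
    statistical criterion.
    """
    expected = baseline_omega(rsa)
    if site_omega > expected + tolerance:
        return "above baseline"
    if site_omega < expected - tolerance:
        return "below baseline"
    return "near baseline"


if __name__ == "__main__":
    # Made-up (site omega, RSA) pairs: one exposed fast site,
    # one exposed slow site, one site close to the baseline.
    for site_omega, rsa in [(1.5, 0.2), (0.05, 0.9), (0.3, 0.4)]:
        print(rsa, site_omega, classify_site(site_omega, rsa))
```

Under these assumptions, an exposed site that evolves slowly falls below the baseline even though its raw omega might look unremarkable under an RSA-independent threshold, which is the kind of contrast the abstract describes.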
Pages: 36-44
Number of pages: 9