Unified rational protein engineering with sequence-based deep representation learning

Cited by: 606
Authors
Alley, Ethan C. [1,2]
Khimulya, Grigory
Biswas, Surojit [1,3]
AlQuraishi, Mohammed [4]
Church, George M. [1,5]
Affiliations
[1] Harvard Univ, Wyss Inst Biol Inspired Engn, Boston, MA 02115 USA
[2] MIT, Media Lab, Cambridge, MA 02139 USA
[3] Harvard Med Sch, Dept Biomed Informat, Boston, MA 02115 USA
[4] Harvard Med Sch, Dept Syst Biol, Boston, MA 02115 USA
[5] Harvard Med Sch, Dept Genet, Boston, MA 02115 USA
Keywords
FITNESS LANDSCAPE; REMOTE HOMOLOGY; PREDICTION; STABILITY; FLUORESCENT; DESIGN
DOI
10.1038/s41592-019-0598-1
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Rational protein engineering requires a holistic understanding of protein function. Here, we apply deep learning to unlabeled amino-acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily and biophysically grounded. We show that the simplest models built on top of this unified representation (UniRep) are broadly applicable and generalize to unseen regions of sequence space. Our data-driven approach predicts the stability of natural and de novo designed proteins, and the quantitative function of molecularly diverse mutants, competitively with the state-of-the-art methods. UniRep further enables two orders of magnitude efficiency improvement in a protein engineering task. UniRep is a versatile summary of fundamental protein features that can be applied across protein engineering informatics.
Pages: 1315-+
Page count: 12