Improved protein structure prediction by deep learning irrespective of co-evolution information

Cited by: 133
Authors
Xu, Jinbo [1]
McPartlon, Matthew [1, 2]
Li, Jin [1, 2]
Affiliations
[1] Toyota Technol Inst, Chicago, IL 60637 USA
[2] Univ Chicago, Dept Comp Sci, Chicago, IL 60637 USA
Funding
US National Science Foundation; US National Institutes of Health;
Keywords
RESIDUE-RESIDUE CONTACTS; COMPUTATIONAL DESIGN; SEQUENCE;
DOI
10.1038/s42256-021-00348-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Predicting the tertiary structure of a protein from its primary sequence has been greatly improved by integrating deep learning and co-evolutionary analysis, as shown in CASP13 and CASP14. We describe our latest study of this idea, analysing the effect of network size and co-evolution data, and the method's performance on both natural and designed proteins. We show that a large ResNet (convolutional residual neural network) can predict structures of correct folds for 26 out of 32 CASP13 free-modelling targets and L/5 long-range contacts with precision over 80%. When co-evolution is not used, ResNet can still predict structures of correct folds for 18 CASP13 free-modelling targets, greatly exceeding previous methods that do not use co-evolution either. Even with only the primary sequence, ResNet can predict the structures of correct folds for all tested human-designed proteins. In addition, ResNet may fare better for the designed proteins when trained without co-evolution than with co-evolution. These results suggest that ResNet does not simply de-noise co-evolution signals, but instead may learn important protein sequence-structure relationships. This has important implications for protein design and engineering, especially when co-evolutionary data are unavailable.
In the last few years, computational protein structure prediction has greatly advanced by combining deep learning, including convolutional residual networks (ResNets), with co-evolution data. A new study finds that using deeper and wider ResNets improves predictions in the absence of co-evolution information, suggesting that the ResNets do not simply de-noise co-evolution signals, but instead may learn important protein sequence-structure relationships.
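The "L/5 long-range contacts" figure in the abstract refers to a contact-precision metric whose definition is not given in this record. The sketch below illustrates how such a metric is commonly computed, under assumptions that are not taken from the source: two residues count as in contact when their distance is below 8 Å, "long-range" means a sequence separation of at least 24 residues, and the function name `top_l_over_k_precision` and its parameters are hypothetical. It is an illustration of the metric, not the authors' evaluation code.

```python
import numpy as np


def top_l_over_k_precision(pred_prob, true_dist, k=5, min_sep=24,
                           contact_cutoff=8.0):
    """Precision of the top L/k predicted long-range contacts.

    pred_prob : (L, L) array of predicted contact probabilities.
    true_dist : (L, L) array of native inter-residue distances in angstroms.
    Only the upper triangle (i < j) of both arrays is read.
    The defaults (k=5, min_sep=24, contact_cutoff=8.0) are assumed
    conventions, not values stated in the record above.
    """
    pred_prob = np.asarray(pred_prob, dtype=float)
    true_dist = np.asarray(true_dist, dtype=float)
    length = pred_prob.shape[0]

    # All residue pairs (i, j) with j - i >= min_sep.
    i, j = np.triu_indices(length, k=min_sep)
    if i.size == 0:
        raise ValueError("sequence too short for any long-range pair")

    # Keep the L/k pairs with the highest predicted probability.
    n_top = max(1, length // k)
    order = np.argsort(-pred_prob[i, j], kind="stable")[:n_top]

    # Fraction of the selected pairs that are true contacts.
    is_contact = true_dist[i[order], j[order]] < contact_cutoff
    return float(is_contact.mean())
```

For a protein of length L = 150, this scores the 30 most confident long-range predictions; a result above 0.8 would correspond to the "precision over 80%" wording in the abstract.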
Pages: 601-+
Page count: 10