Deep learning extends de novo protein modelling coverage of genomes using iteratively predicted structural constraints

Cited by: 128
Authors
Greener, Joe G. [1 ,2 ]
Kandathil, Shaun M. [1 ,2 ]
Jones, David T. [1 ,2 ]
Affiliations
[1] UCL, Dept Comp Sci, Gower St, London WC1E 6BT, England
[2] Francis Crick Inst, 1 Midland Rd, London NW1 1AT, England
Funding
European Research Council; Medical Research Council (UK); Wellcome Trust (UK);
Keywords
IMPROVED CONTACT PREDICTIONS; SECONDARY STRUCTURE; RECOGNITION;
DOI
10.1038/s41467-019-11994-0
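Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes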
07 ; 0710 ; 09 ;
Abstract
The inapplicability of amino acid covariation methods to small protein families has limited their use for structural annotation of whole genomes. Recently, deep learning has shown promise in allowing accurate residue-residue contact prediction even for shallow sequence alignments. Here we introduce DMPfold, which uses deep learning to predict inter-atomic distance bounds, the main chain hydrogen bond network, and torsion angles, which it uses to build models in an iterative fashion. DMPfold produces more accurate models than two popular methods for a test set of CASP12 domains, and works just as well for transmembrane proteins. Applied to all Pfam domains without known structures, confident models for 25% of these so-called dark families were produced in under a week on a small 200 core cluster. DMPfold provides models for 16% of human proteome UniProt entries without structures, generates accurate models with fewer than 100 sequences in some cases, and is freely available.
Pages: 13