Highly accurate protein structure prediction for the human proteome

Cited by: 1785
Authors
Tunyasuvunakool, Kathryn [1 ,2 ]
Adler, Jonas [1 ]
Wu, Zachary [1 ]
Green, Tim [1 ]
Zielinski, Michal [1 ]
Zidek, Augustin [1 ]
Bridgland, Alex [1 ]
Cowie, Andrew [1 ]
Meyer, Clemens [1 ]
Laydon, Agata [1 ]
Velankar, Sameer [2 ]
Kleywegt, Gerard J. [2 ]
Bateman, Alex [2 ]
Evans, Richard [1 ]
Pritzel, Alexander [1 ]
Figurnov, Michael [1 ]
Ronneberger, Olaf [1 ]
Bates, Russ [1 ]
Kohl, Simon A. A. [1 ]
Potapenko, Anna [1 ]
Ballard, Andrew J. [1 ]
Romera-Paredes, Bernardino [1 ]
Nikolov, Stanislav [1 ]
Jain, Rishub [1 ]
Clancy, Ellen [1 ]
Reiman, David [1 ]
Petersen, Stig [1 ]
Senior, Andrew W. [1 ]
Kavukcuoglu, Koray [1 ]
Birney, Ewan [2 ]
Kohli, Pushmeet [1 ]
Jumper, John [1 ,2 ]
Hassabis, Demis [1 ,2 ]
Affiliations
[1] DeepMind, London, England
[2] European Bioinformatics Institute, European Molecular Biology Laboratory, Hinxton, England
Keywords
INTRINSIC DISORDER; SCORING FUNCTION; OPTIMIZATION; DISCOVERY; TOPOLOGY; DATABASE; MODELS;
DOI
10.1038/s41586-021-03828-1
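Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code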
07 ; 0710 ; 09 ;
Abstract
Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure(1). Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold(2), at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.
Pages: 590 / +
Page count: 19
Related papers
75 in total
  • [51] ModBase, a database of annotated comparative protein structure models and associated resources
    Pieper, Ursula
    Webb, Benjamin M.
    Dong, Guang Qiang
    Schneidman-Duhovny, Dina
    Fan, Hao
    Kim, Seung Joong
    Khuri, Natalia
    Spill, Yannick G.
    Weinkam, Patrick
    Hammel, Michal
    Tainer, John A.
    Nilges, Michael
    Sali, Andrej
    [J]. NUCLEIC ACIDS RESEARCH, 2014, 42 (D1) : D336 - D346
  • [52] Wolfram syndrome and WFS1 gene
    Rigoli, L.
    Lombardo, F.
    Di Bella, C.
    [J]. CLINICAL GENETICS, 2011, 79 (02) : 103 - 117
  • [53] Functional Innovation in the Evolution of the Calcium-Dependent System of the Eukaryotic Endoplasmic Reticulum
    Schaffer, Daniel E.
    Iyer, Lakshminarayan M.
    Burroughs, A. Maxwell
    Aravind, L.
    [J]. FRONTIERS IN GENETICS, 2020, 11
  • [54] Schrodinger L., 2020, PYMOL
  • [55] Improved protein structure prediction using potentials from deep learning
    Senior, Andrew W.
    Evans, Richard
    Jumper, John
    Kirkpatrick, James
    Sifre, Laurent
    Green, Tim
    Qin, Chongli
    Zidek, Augustin
    Nelson, Alexander W. R.
    Bridgland, Alex
    Penedones, Hugo
    Petersen, Stig
    Simonyan, Karen
    Crossan, Steve
    Kohli, Pushmeet
    Jones, David T.
    Silver, David
    Kavukcuoglu, Koray
    Hassabis, Demis
    [J]. NATURE, 2020, 577 (7792) : 706 - +
  • [56] Genome3D: integrating a collaborative data pipeline to expand the depth and breadth of consensus protein structure annotation
    Sillitoe, Ian
    Andreeva, Antonina
    Blundell, Tom L.
    Buchan, Daniel W. A.
    Finn, Robert
    Gough, Julian
    Jones, David
    Kelley, Lawrence A.
    Paysan-Lafosse, Typhaine
    Lam, Su Datt
    Murzin, Alexey G.
    Pandurangan, Arun Prasad
    Salazar, Gustavo A.
    Skwark, Marcin J.
    Sternberg, Michael J. E.
    Velankar, Sameer
    Orengo, Christine
    [J]. NUCLEIC ACIDS RESEARCH, 2020, 48 (D1) : D314 - D319
  • [57] CATH: expanding the horizons of structure-based functional annotations for genome sequences
    Sillitoe, Ian
    Dawson, Natalie
    Lewis, Tony E.
    Das, Sayoni
    Lees, Jonathan G.
    Ashford, Paul
    Tolulope, Adeyelu
    Scholes, Harry M.
    Senatorov, Ilya
    Bujan, Andra
    Rodriguez-Conde, Fatima Ceballos
    Dowling, Benjamin
    Thornton, Janet
    Orengo, Christine A.
    [J]. NUCLEIC ACIDS RESEARCH, 2019, 47 (D1) : D280 - D284
  • [58] The challenge of protein structure determination - lessons from structural genomics
    Slabinski, Lukasz
    Jaroszewski, Lukasz
    Rodrigues, Ana P. C.
    Rychlewski, Leszek
    Wilson, Ian A.
    Lesley, Scott A.
    Godzik, Adam
    [J]. PROTEIN SCIENCE, 2007, 16 (11) : 2472 - 2482
  • [59] The crystal structure of pertussis toxin
    Stein, P. E.
    Boodhoo, A.
    Armstrong, G. D.
    Cockle, S. A.
    Klein, M. H.
    Read, R. J.
    [J]. STRUCTURE, 1994, 2 (01) : 45 - 57
  • [60] Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold
    Steinegger, Martin
    Mirdita, Milot
    Soeding, Johannes
    [J]. NATURE METHODS, 2019, 16 (07) : 603 - +