Highly accurate protein structure prediction for the human proteome

Cited by: 1785
Authors
Tunyasuvunakool, Kathryn [1 ,2 ]
Adler, Jonas [1 ]
Wu, Zachary [1 ]
Green, Tim [1 ]
Zielinski, Michal [1 ]
Zidek, Augustin [1 ]
Bridgland, Alex [1 ]
Cowie, Andrew [1 ]
Meyer, Clemens [1 ]
Laydon, Agata [1 ]
Velankar, Sameer [2 ]
Kleywegt, Gerard J. [2 ]
Bateman, Alex [2 ]
Evans, Richard [1 ]
Pritzel, Alexander [1 ]
Figurnov, Michael [1 ]
Ronneberger, Olaf [1 ]
Bates, Russ [1 ]
Kohl, Simon A. A. [1 ]
Potapenko, Anna [1 ]
Ballard, Andrew J. [1 ]
Romera-Paredes, Bernardino [1 ]
Nikolov, Stanislav [1 ]
Jain, Rishub [1 ]
Clancy, Ellen [1 ]
Reiman, David [1 ]
Petersen, Stig [1 ]
Senior, Andrew W. [1 ]
Kavukcuoglu, Koray [1 ]
Birney, Ewan [2 ]
Kohli, Pushmeet [1 ]
Jumper, John [1 ,2 ]
Hassabis, Demis [1 ,2 ]
Affiliations
[1] DeepMind, London, England
[2] European Bioinformatics Institute, European Molecular Biology Laboratory, Hinxton, England
Keywords
INTRINSIC DISORDER; SCORING FUNCTION; OPTIMIZATION; DISCOVERY; TOPOLOGY; DATABASE; MODELS
DOI
10.1038/s41586-021-03828-1
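Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract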
Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure (ref. 1). Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold (ref. 2), at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.
Pages: 590 / +
Page count: 19
Related papers
75 in total
  • [1] Targeting diacylglycerol acyltransferase 2 for the treatment of nonalcoholic steatohepatitis
    Amin, Neeta B.
    Carvajal-Gonzalez, Santos
    Purkal, Julie
    Zhu, Tong
    Crowley, Collin
    Perez, Sylvie
    Chidsey, Kristin
    Kim, Albert M.
    Goodwin, Bryan
    [J]. SCIENCE TRANSLATIONAL MEDICINE, 2019, 11 (520)
  • [2] The SCOP database in 2020: expanded classification of representative family and superfamily domains of known protein structures
    Andreeva, Antonina
    Kulesha, Eugene
    Gough, Julian
    Murzin, Alexey G.
    [J]. NUCLEIC ACIDS RESEARCH, 2020, 48 (D1) : D376 - D382
  • [3] Gene Ontology: tool for the unification of biology
    Ashburner, M
    Ball, CA
    Blake, JA
    Botstein, D
    Butler, H
    Cherry, JM
    Davis, AP
    Dolinski, K
    Dwight, SS
    Eppig, JT
    Harris, MA
    Hill, DP
    Issel-Tarver, L
    Kasarskis, A
    Lewis, S
    Matese, JC
    Richardson, JE
    Ringwald, M
    Rubin, GM
    Sherlock, G
    [J]. NATURE GENETICS, 2000, 25 (01) : 25 - 29
  • [4] Why do eukaryotic proteins contain more intrinsically disordered regions?
    Basile, Walter
    Salvatore, Marco
    Bassot, Claudio
    Elofsson, Arne
    [J]. PLOS COMPUTATIONAL BIOLOGY, 2019, 15 (07)
  • [5] UniProt: the universal protein knowledgebase in 2021
    Bateman, Alex
    Martin, Maria-Jesus
    Orchard, Sandra
    Magrane, Michele
    Agivetova, Rahat
    Ahmad, Shadab
    Alpi, Emanuele
    Bowler-Barnett, Emily H.
    Britto, Ramona
    Bursteinas, Borisas
    Bye-A-Jee, Hema
    Coetzee, Ray
    Cukura, Austra
    Da Silva, Alan
    Denny, Paul
    Dogan, Tunca
    Ebenezer, ThankGod
    Fan, Jun
    Castro, Leyla Garcia
    Garmiri, Penelope
    Georghiou, George
    Gonzales, Leonardo
    Hatton-Ellis, Emma
    Hussein, Abdulrahman
    Ignatchenko, Alexandr
    Insana, Giuseppe
    Ishtiaq, Rizwan
    Jokinen, Petteri
    Joshi, Vishal
    Jyothi, Dushyanth
    Lock, Antonia
    Lopez, Rodrigo
    Luciani, Aurelien
    Luo, Jie
    Lussi, Yvonne
    Mac-Dougall, Alistair
    Madeira, Fabio
    Mahmoudy, Mahdi
    Menchi, Manuela
    Mishra, Alok
    Moulang, Katie
    Nightingale, Andrew
    Oliveira, Carla Susana
    Pundir, Sangya
    Qi, Guoying
    Raj, Shriya
    Rice, Daniel
    Lopez, Milagros Rodriguez
    Saidi, Rabie
    Sampson, Joseph
    [J]. NUCLEIC ACIDS RESEARCH, 2021, 49 (D1) : D480 - D489
  • [6] Improving homology modeling from low-sequence identity templates in Rosetta: A case study in GPCRs
    Bender, Brian Joseph
    Marlow, Brennica
    Meiler, Jens
    [J]. PLOS COMPUTATIONAL BIOLOGY, 2020, 16 (10)
  • [7] Finding Our Way in the Dark Proteome
    Bhowmick, Asmit
    Brookes, David H.
    Yost, Shane R.
    Dyson, H. Jane
    Forman-Kay, Julie D.
    Gunter, Daniel
    Head-Gordon, Martin
    Hura, Gregory L.
    Pande, Vijay S.
    Wemmer, David E.
    Wright, Peter E.
    Head-Gordon, Teresa
    [J]. JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, 2016, 138 (31) : 9730 - 9742
  • [8] Discovery of a Potent, Selective, and Orally Efficacious Pyrimidinooxazinyl Bicyclooctaneacetic Acid Diacylglycerol Acyltransferase-1 Inhibitor
    Birch, Alan M.
    Birtles, Susan
    Buckett, Linda K.
    Kemmitt, Paul D.
    Smith, Graham J.
    Smith, Tim J. D.
    Turnbull, Andrew V.
    Wang, Steven J. Y.
    [J]. JOURNAL OF MEDICINAL CHEMISTRY, 2009, 52 (06) : 1558 - 1568
  • [9] Protein Data Bank: the single global archive for 3D macromolecular structure data
    Burley, Stephen K.
    Berman, Helen M.
    Bhikadiya, Charmi
    Bi, Chunxiao
    Chen, Li
    Di Costanzo, Luigi
    Christie, Cole
    Duarte, Jose M.
    Dutta, Shuchismita
    Feng, Zukang
    Ghosh, Sutapa
    Goodsell, David S.
    Green, Rachel Kramer
    Guranovic, Vladimir
    Guzenko, Dmytro
    Hudson, Brian P.
    Liang, Yuhe
    Lowe, Robert
    Peisach, Ezra
    Periskova, Irina
    Randle, Chris
    Rose, Alexander
    Sekharan, Monica
    Shao, Chenghua
    Tao, Yi-Ping
    Valasatava, Yana
    Voigt, Maria
    Westbrook, John
    Young, Jasmine
    Zardecki, Christine
    Zhuravleva, Marina
    Kurisu, Genji
    Nakamura, Haruki
    Kengaku, Yumiko
    Cho, Hasumi
    Sato, Junko
    Kim, Ju Yaen
    Ikegawa, Yasuyo
    Nakagawa, Atsushi
    Yamashita, Reiko
    Kudou, Takahiro
    Bekker, Gert-Jan
    Suzuki, Hirofumi
    Iwata, Takeshi
    Yokochi, Masashi
    Kobayashi, Naohiro
    Fujiwara, Toshimichi
    Velankar, Sameer
    Kleywegt, Gerard J.
    Anyango, Stephen
    [J]. NUCLEIC ACIDS RESEARCH, 2019, 47 (D1) : D520 - D528
  • [10] Structure-function analysis of diacylglycerol acyltransferase sequences from 70 organisms
    Cao H.
    [J]. BMC Research Notes, 4 (1)