Highly accurate protein structure prediction for the human proteome

Cited by: 1785

Authors:

Tunyasuvunakool, Kathryn [1,2]

Adler, Jonas [1]

Wu, Zachary [1]

Green, Tim [1]

Zielinski, Michal [1]

Zidek, Augustin [1]

Bridgland, Alex [1]

Cowie, Andrew [1]

Meyer, Clemens [1]

Laydon, Agata [1]

Velankar, Sameer [2]

Kleywegt, Gerard J. [2]

Bateman, Alex [2]

Evans, Richard [1]

Pritzel, Alexander [1]

Figurnov, Michael [1]

Ronneberger, Olaf [1]

Bates, Russ [1]

Kohl, Simon A. A. [1]

Potapenko, Anna [1]

Ballard, Andrew J. [1]

Romera-Paredes, Bernardino [1]

Nikolov, Stanislav [1]

Jain, Rishub [1]

Clancy, Ellen [1]

Reiman, David [1]

Petersen, Stig [1]

Senior, Andrew W. [1]

Kavukcuoglu, Koray [1]

Birney, Ewan [2]

Kohli, Pushmeet [1]

Jumper, John [1,2]

Hassabis, Demis [1,2]

Affiliations:

[1] DeepMind, London, England

[2] European Bioinformatics Institute, European Molecular Biology Laboratory, Hinxton, England

Source:

NATURE | 2021 / Volume 596 / Issue 7873

Keywords:

INTRINSIC DISORDER; SCORING FUNCTION; OPTIMIZATION; DISCOVERY; TOPOLOGY; DATABASE; MODELS

DOI:

10.1038/s41586-021-03828-1

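Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]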

Discipline classification codes:

07; 0710; 09

Abstract:

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure(1). Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold(2), at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.


Pages: 590 / +

Page count: 19

References: 75 in total

