Genome3D: integrating a collaborative data pipeline to expand the depth and breadth of consensus protein structure annotation

Cited by: 11
Authors
Sillitoe, Ian [1]
Andreeva, Antonina [2]
Blundell, Tom L. [3]
Buchan, Daniel W. A. [4,5]
Finn, Robert [6]
Gough, Julian [2]
Jones, David [4,5]
Kelley, Lawrence A. [7]
Paysan-Lafosse, Typhaine [6]
Lam, Su Datt [1,8]
Murzin, Alexey G. [2]
Pandurangan, Arun Prasad [2]
Salazar, Gustavo A. [6]
Skwark, Marcin J. [3]
Sternberg, Michael J. E. [7]
Velankar, Sameer [6]
Orengo, Christine [1]
Affiliations
[1] UCL, Inst Struct & Mol Biol, Gower St, London WC1E 6BT, England
[2] MRC Lab Mol Biol, Francis Crick Ave, Cambridge CB2 0QH, England
[3] Univ Cambridge, Dept Biochem, Old Addenbrookes Site,80 Tennis Court Rd, Cambridge CB2 0QH, England
[4] UCL, Dept Comp Sci, Gower St, London WC1E 6BT, England
[5] Francis Crick Inst, 1 Midland Rd, London NW1 1AT, England
[6] European Bioinformat Inst, Wellcome Trust Genome Campus, Hinxton CB10 1SD, Cambs, England
[7] Imperial Coll London, Ctr Bioinformat, Dept Life Sci, London SW7 2AZ, England
[8] Univ Kebangsaan Malaysia, Fac Sci & Technol, Bangi 43600, Selangor, Malaysia
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC)
Keywords
SEQUENCES; SCOP;
DOI
10.1093/nar/gkz967
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Genome3D (https://www.genome3d.eu) is a freely available resource that provides consensus structural annotations for representative protein sequences taken from a selection of model organisms. Since the last NAR update in 2015, the method of data submission has been overhauled, with annotations now being 'pushed' to the database via an API. As a result, contributing groups are now able to manage their own structural annotations, making the resource more flexible and maintainable. The new submission protocol brings a number of additional benefits, including instant validation of data and no need to synchronise releases between resources. It also makes it possible to implement the submission of these structural annotations as an automated part of existing internal workflows. In turn, these improvements make it easier to open Genome3D up to new prediction algorithms and groups. For the latest release of Genome3D (v2.1), the underlying dataset of sequences used as prediction targets has been updated using the latest reference proteomes available in UniProtKB. Several new reference proteomes of particular interest to the wider scientific community have also been added: cow, pig, wheat and Mycobacterium tuberculosis. These additions, along with improvements to the underlying predictions from contributing resources, have ensured that the number of annotations in Genome3D has nearly doubled since the last NAR update article. The new API has also been used to facilitate the dissemination of Genome3D data into InterPro, thereby widening the visibility of both the annotation data and annotation algorithms.
Pages: D314-D319
Page count: 6