Gene3D: Multi-domain annotations for protein sequence and comparative genome analysis

Cited by: 39
Authors
Lees, Jonathan G. [1]
Lee, David [1]
Studer, Romain A.
Dawson, Natalie L. [1]
Sillitoe, Ian [1]
Das, Sayoni [1]
Yeats, Corin [2]
Dessailly, Benoit H. [1]
Rentzsch, Robert [3]
Orengo, Christine A. [1]
Affiliations
[1] UCL, Inst Struct & Mol Biol, Div Biosci, London WC1E 6BT, England
[2] Univ London Imperial Coll Sci Technol & Med, Dept Infect Dis Epidemiol, London W2 1PG, England
[3] Res Grp Bioinformat Ng4, Robert Koch Inst, D-13353 Berlin, Germany
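Funding
Wellcome Trust (UK); Biotechnology and Biological Sciences Research Council (UK); Swiss National Science Foundation; National Institutes of Health (USA);
Keywords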
DATABASE; RESOURCE; CATH;
DOI
10.1093/nar/gkt1205
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010 ; 081704 ;
Abstract
Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel search algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year.
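The abstract mentions resolving overlaps between domain annotations from several sources to build a composite multi-domain architecture. The following is a minimal sketch of one common way to do this, a greedy selection by score. It is not claimed to be the procedure Gene3D uses. The `DomainHit` record, the `max_overlap` parameter, the underscore-joined architecture string, and all coordinates and scores in the example are illustrative assumptions.

```python
from typing import List, NamedTuple


class DomainHit(NamedTuple):
    """One predicted domain on a protein sequence (hypothetical record layout)."""
    family: str   # family identifier, e.g. a CATH superfamily or Pfam accession
    source: str   # annotation source
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    score: float  # assumed convention: higher is better


def overlap(a: DomainHit, b: DomainHit) -> int:
    """Number of residues shared by two hits (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def resolve_overlaps(hits: List[DomainHit], max_overlap: int = 0) -> List[DomainHit]:
    """Greedy resolution: take hits in descending score order and keep a hit
    only if it overlaps every already-kept hit by at most max_overlap residues."""
    chosen: List[DomainHit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if all(overlap(hit, kept) <= max_overlap for kept in chosen):
            chosen.append(hit)
    # Report domains in sequence order, N-terminus to C-terminus.
    return sorted(chosen, key=lambda h: h.start)


def architecture(hits: List[DomainHit]) -> str:
    """Join family identifiers in sequence order into an architecture string."""
    return "_".join(h.family for h in hits)


# Made-up coordinates and scores, for illustration only.
hits = [
    DomainHit("3.40.50.300", "CATH", 10, 180, 95.0),
    DomainHit("PF00005", "Pfam", 20, 170, 60.0),
    DomainHit("PF00664", "Pfam", 200, 450, 80.0),
]
print(architecture(resolve_overlaps(hits)))  # prints "3.40.50.300_PF00664"
```

In this sketch the lower-scoring Pfam hit at 20-170 is discarded because it overlaps the higher-scoring CATH hit, while the non-overlapping Pfam hit at 200-450 fills a region the CATH hit does not cover. A greedy pass is simple but not guaranteed to maximize total score; an exact alternative would be a weighted interval scheduling formulation.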
Pages: D240-D245
Page count: 6