The InterPro protein families database: the classification resource after 15 years

Cited by: 958
Authors
Mitchell, Alex [1 ]
Chang, Hsin-Yu [1 ]
Daugherty, Louise [1 ]
Fraser, Matthew [1 ]
Hunter, Sarah [1 ]
Lopez, Rodrigo [1 ]
McAnulla, Craig [1 ]
McMenamin, Conor [1 ]
Nuka, Gift [1 ]
Pesseat, Sebastien [1 ]
Sangrador-Vegas, Amaia [1 ]
Scheremetjew, Maxim [1 ]
Rato, Claudia [1 ]
Yong, Siew-Yit [1 ]
Bateman, Alex [1 ]
Punta, Marco
Attwood, Teresa K. [2 ,3 ]
Sigrist, Christian J. A. [4 ]
Redaschi, Nicole [4 ]
Rivoire, Catherine [4 ]
Xenarios, Ioannis [4 ,5 ,6 ]
Kahn, Daniel [7 ]
Guyot, Dominique [7 ]
Bork, Peer [8 ]
Letunic, Ivica [8 ]
Gough, Julian [9 ]
Oates, Matt [9 ]
Haft, Daniel [10 ]
Huang, Hongzhan [11 ]
Natale, Darren A. [11 ]
Wu, Cathy H. [11 ,12 ]
Orengo, Christine [13 ]
Sillitoe, Ian [13 ]
Mi, Huaiyu [14 ]
Thomas, Paul D. [14 ]
Finn, Robert D. [1 ]
Affiliations
[1] European Bioinformat Inst EMBL EBI, European Mol Biol Lab, Cambridge CB10 1SD, England
[2] Univ Manchester, Fac Life Sci, Manchester M13 9PL, Lancs, England
[3] Univ Manchester, Sch Comp Sci, Manchester M13 9PL, Lancs, England
[4] SIB, CH-1211 Geneva 4, Switzerland
[5] Univ Lausanne, Ctr Integrat Genom, CH-1015 Lausanne, Switzerland
[6] Univ Geneva, Dept Biochem, CH-1211 Geneva, Switzerland
[7] Univ Lyon 1, PRABI, F-69622 Villeurbanne, France
[8] European Mol Biol Lab EMBL, D-69117 Heidelberg, Germany
[9] Univ Bristol, Dept Comp Sci, Bristol BS8 1UB, Avon, England
[10] JCVI, Rockville, MD 20850 USA
[11] Georgetown Univ, Med Ctr, Washington, DC 20007 USA
[12] Univ Delaware, Ctr Bioinformat & Computat Biol, Newark, DE 19711 USA
[13] UCL, Struct & Mol Biol Dept, London WC1E 6BT, England
[14] Univ So Calif, Dept Prevent Med, Div Bioinformat, Los Angeles, CA 90089 USA
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC);
Keywords
GENE ONTOLOGY; ANNOTATION; TOPOLOGY; PROJECT; SCOP;
DOI
10.1093/nar/gku1243
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments with the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping of Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36 766 member database signatures integrated into 26 238 InterPro entries, an increase of over 3993 entries (5081 signatures), since 2012.
Pages: D213-D221
Page count: 9