Manual GO annotation of predictive protein signatures: the InterPro approach to GO curation

Cited by: 71
Authors
Burge, Sarah [1]
Kelly, Elizabeth [2]
Lonsdale, David [1]
Mutowo-Muellenet, Prudence [1]
McAnulla, Craig [1]
Mitchell, Alex [1]
Sangrador-Vegas, Amaia [1]
Yong, Siew-Yit [1]
Mulder, Nicola [2]
Hunter, Sarah [1]
Affiliations
[1] EMBL EBI, Hinxton CB10 1SD, Cambs, England
[2] Univ Cape Town, Dept Clin Lab Sci, Inst Infect Dis & Mol Med, Computat Biol Grp, Med Sch, ZA-7925 Cape Town, South Africa
Source
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 2012
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC);
Keywords
GENE ONTOLOGY; DATABASE; DOMAINS; FAMILY; SETS;
DOI
10.1093/database/bar068
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
InterPro amalgamates predictive protein signatures from a number of well-known partner databases into a single resource. To aid with interpretation of results, InterPro entries are manually annotated with terms from the Gene Ontology (GO). The InterPro2GO mappings comprise the cross-references between these two resources and are the largest source of GO annotation predictions for proteins. Here, we describe the protocol by which InterPro curators integrate GO terms into the InterPro database. We discuss the unique challenges involved in integrating specific GO terms with entries that may describe a diverse set of proteins, and we illustrate, with examples, how InterPro hierarchies reflect GO terms of increasing specificity. We describe a revised protocol for GO mapping that enables us to assign GO terms to domains based on the function of the individual domain, rather than the function of the families in which the domain is found. We also discuss how taxonomic constraints are dealt with and those cases where we are unable to add any appropriate GO terms. Expert manual annotation of InterPro entries with GO terms enables users to infer function, process or subcellular information for uncharacterized sequences based on sequence matches to predictive models.
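The inference step described in the abstract can be illustrated with a minimal sketch. It is not the InterPro pipeline: the entry names, GO labels, the `INTERPRO2GO` and `PARENT_OF` tables, and both functions are hypothetical placeholders, and the sketch assumes that a match to a child entry also implies the less specific terms assigned to its parent entry.

```python
# Illustrative sketch only. All identifiers below are made-up placeholders,
# not real InterPro accessions or GO term IDs.

# Hypothetical InterPro2GO-style table: entry -> GO terms assigned by curators.
INTERPRO2GO = {
    "IPR_PARENT": {"GO:general_activity"},
    "IPR_CHILD": {"GO:specific_activity"},
    "IPR_DOMAIN": {"GO:binding"},
}

# Hypothetical entry hierarchy: child entry -> parent entry.
PARENT_OF = {"IPR_CHILD": "IPR_PARENT"}


def entries_with_ancestors(entry):
    """Yield the entry itself, then each of its ancestors in the hierarchy."""
    while entry is not None:
        yield entry
        entry = PARENT_OF.get(entry)


def infer_go_terms(matched_entries):
    """Collect GO terms for every matched entry and for its ancestors.

    Assumption: terms on a parent entry also apply to proteins that match
    one of its children.
    """
    terms = set()
    for matched in matched_entries:
        for entry in entries_with_ancestors(matched):
            terms |= INTERPRO2GO.get(entry, set())
    return terms


# An uncharacterized sequence that matches a child family and a domain entry.
predicted = infer_go_terms(["IPR_CHILD", "IPR_DOMAIN"])
print(sorted(predicted))
```

In this sketch an entry with no mapping contributes nothing, which mirrors the cases mentioned in the abstract where no appropriate GO term can be added.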
Pages: 6