Towards fully automated structure-based function prediction in structural genomics: A case study

Cited by: 55
Authors
Watson, James D.
Sanderson, Steve
Ezersky, Alexandra
Savchenko, Alexei
Edwards, Aled
Orengo, Christine
Joachimiak, Andrzej
Laskowski, Roman A.
Thornton, Janet M.
Affiliations
[1] European Bioinformat Inst, EMBL, Cambridge CB10 1SD, England
[2] Univ Toronto, Banting & Best Dept Med Res, Toronto, ON, Canada
[3] Univ Hlth Network, Clin Genom Ctr Proteom, Toronto, ON, Canada
[4] UCL, London WC1E 6BT, England
[5] Argonne Natl Lab, Biosci Div & Struct Biol Ctr, Argonne, IL 60439 USA
Funding
National Institutes of Health (USA)
Keywords
structural genomics; function prediction from structure; Gene Ontology; GO-slims; protein function prediction
DOI
10.1016/j.jmb.2007.01.063
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
As the global Structural Genomics projects have picked up pace, the number of structures annotated in the Protein Data Bank as hypothetical protein or unknown function has grown significantly. A major challenge now involves the development of computational methods to assign functions to these proteins accurately and automatically. As part of the Midwest Center for Structural Genomics (MCSG) we have developed a fully automated functional analysis server, ProFunc, which performs a battery of analyses on a submitted structure. The analyses combine a number of sequence-based and structure-based methods to identify functional clues. After the first stage of the Protein Structure Initiative (PSI), we review the success of the pipeline and the importance of structure-based function prediction. As a dataset, we have chosen all structures solved by the MCSG during the 5 years of the first PSI. Our analysis suggests that two of the structure-based methods are particularly successful and provide examples of local similarity that is difficult to identify using current sequence-based methods. No one method is successful in all cases, so, through the use of a number of complementary sequence and structural approaches, the ProFunc server increases the chances that at least one method will find a significant hit that can help elucidate function. Manual assessment of the results is a time-consuming process and subject to individual interpretation and human error. We present a method based on the Gene Ontology (GO) schema using GO-slims that can allow the automated assessment of hits with a success rate approaching that of expert manual assessment. (c) 2007 Elsevier Ltd. All rights reserved.
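The abstract describes automating the assessment of hits by mapping Gene Ontology annotations onto GO-slims. The following is a minimal sketch of that general idea, not the authors' implementation. It propagates the GO terms of a hit and the known GO terms of the query up an is_a hierarchy until they reach a slim, and counts the hit as agreeing when the two slim sets overlap. The hierarchy (`TOY_GO_PARENTS`), the slim (`SLIM`), the term names and the overlap criterion are all hypothetical illustrations. Real use would load the full ontology and a published GO-slim.

```python
# Hypothetical toy is_a hierarchy: child term -> list of parent terms.
# This is NOT the real Gene Ontology; names and links are illustrative only.
TOY_GO_PARENTS = {
    "molecular_function": [],
    "catalytic activity": ["molecular_function"],
    "binding": ["molecular_function"],
    "hydrolase activity": ["catalytic activity"],
    "transferase activity": ["catalytic activity"],
    "peptidase activity": ["hydrolase activity"],
    "esterase activity": ["hydrolase activity"],
    "methyltransferase activity": ["transferase activity"],
    "metal ion binding": ["binding"],
}

# Hypothetical GO-slim: the coarse terms that annotations are collapsed onto.
SLIM = {"hydrolase activity", "transferase activity", "binding"}


def ancestors(term, parents):
    """Return the term plus every term reachable by following parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def to_slim(terms, parents, slim):
    """Map a set of GO terms onto the slim terms that subsume them."""
    return {a for t in terms for a in ancestors(t, parents) if a in slim}


def hit_agrees(predicted, known, parents, slim):
    """Assumed criterion: a hit agrees if its slim set overlaps the query's."""
    return bool(to_slim(predicted, parents, slim) & to_slim(known, parents, slim))


if __name__ == "__main__":
    # Two different hydrolases collapse onto the same slim term: agreement.
    print(hit_agrees({"peptidase activity"}, {"esterase activity"},
                     TOY_GO_PARENTS, SLIM))
    # A transferase versus a hydrolase maps to different slim terms.
    print(hit_agrees({"methyltransferase activity"}, {"peptidase activity"},
                     TOY_GO_PARENTS, SLIM))
```

The granularity of the slim sets how strict the check is. A coarse slim accepts near-miss predictions, such as two different hydrolases, as correct. A finer slim would reject them.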
Pages: 1511-1522
Page count: 12