Towards fully automated structure-based function prediction in structural genomics: A case study

Times cited: 55

Authors:

Watson, James D.

Sanderson, Steve

Ezersky, Alexandra

Savchenko, Alexei

Edwards, Aled

Orengo, Christine

Joachimiak, Andrzej

Laskowski, Roman A.

Thornton, Janet M.

Affiliations:

[1] European Bioinformat Inst, EMBL, Cambridge CB10 1SD, England

[2] Univ Toronto, Banting & Best Dept Med Res, Toronto, ON, Canada

[3] Univ Hlth Network, Clin Genom Ctr Proteom, Toronto, ON, Canada

[4] UCL, London WC1E 6BT, England

[5] Argonne Natl Lab, Biosci Div & Struct Biol Ctr, Argonne, IL 60439 USA

Source:

JOURNAL OF MOLECULAR BIOLOGY | 2007 / Vol. 367 / No. 5

Funding:

US National Institutes of Health (NIH)

Keywords:

structural genomics; function prediction from structure; Gene Ontology; GO-slims; protein function prediction

DOI:

10.1016/j.jmb.2007.01.063

Chinese Library Classification (CLC) codes:

Q5 [Biochemistry]; Q7 [Molecular Biology]

Subject classification codes:

071010; 081704

Abstract:

As the global Structural Genomics projects have picked up pace, the number of structures annotated in the Protein Data Bank as "hypothetical protein" or "unknown function" has grown significantly. A major challenge now is to develop computational methods that assign functions to these proteins accurately and automatically. As part of the Midwest Center for Structural Genomics (MCSG) we have developed a fully automated functional analysis server, ProFunc, which performs a battery of analyses on a submitted structure. The analyses combine a number of sequence-based and structure-based methods to identify functional clues. Now that the first stage of the Protein Structure Initiative (PSI) is complete, we review the success of the pipeline and the importance of structure-based function prediction. As our dataset, we use all structures solved by the MCSG during the 5 years of the first PSI. Our analysis suggests that two of the structure-based methods are particularly successful and provide examples of local similarity that is difficult to identify using current sequence-based methods. No single method succeeds in all cases; by combining a number of complementary sequence-based and structure-based approaches, the ProFunc server increases the chance that at least one method will find a significant hit that helps elucidate function. Manual assessment of the results is time-consuming and subject to individual interpretation and human error. We present a method, based on the Gene Ontology (GO) schema and using GO-slims, that allows hits to be assessed automatically with a success rate approaching that of expert manual assessment. (c) 2007 Elsevier Ltd. All rights reserved.
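The abstract mentions assessing hits automatically by mapping Gene Ontology annotations onto GO-slims. The sketch below illustrates only the general idea of slim-based comparison and is not the paper's published procedure: the ontology, the slim subset, and the agreement rule (a hit counts as correct if it shares at least one slim term with the query) are all hypothetical assumptions made for illustration, and the term names are toy labels rather than real GO identifiers.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy ontology: each term maps to its direct parents (is_a edges).
# Real GO is a large DAG; these names are illustrative placeholders only.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "catalytic": {"root"},
    "binding": {"root"},
    "transferase": {"catalytic"},
    "hydrolase": {"catalytic"},
    "kinase": {"transferase"},
    "protease": {"hydrolase"},
    "dna_binding": {"binding"},
}

# Hypothetical slim: a small subset of general terms from the toy ontology.
SLIM: Set[str] = {"transferase", "hydrolase", "binding"}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus every term reachable via parent edges."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS.get(t, ()))
    return seen


def map_to_slim(terms: Iterable[str]) -> Set[str]:
    """Collapse specific annotations onto the slim terms that subsume them."""
    slim_terms: Set[str] = set()
    for t in terms:
        slim_terms |= ancestors(t) & SLIM
    return slim_terms


def hit_agrees(query_terms: Iterable[str], hit_terms: Iterable[str]) -> bool:
    """Assumed rule (not from the paper): agree if any slim term is shared."""
    return bool(map_to_slim(query_terms) & map_to_slim(hit_terms))
```

For example, `hit_agrees({"kinase"}, {"transferase"})` is true because both annotations collapse to the slim term `transferase`, while `hit_agrees({"kinase"}, {"protease"})` is false because they fall under different slim branches. Mapping to a coarse slim is what lets annotations at different levels of specificity be compared without a human judging whether they describe the same function.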


Pages: 1511-1522

Number of pages: 12

References (31 in total):

[1] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].