Predicting enzyme class from protein structure without alignments

Times cited: 126
Authors
Dobson, PD [1]
Doig, AJ [1]
Affiliation
[1] Univ Manchester, Dept Biomol Sci, Manchester M60 1QD, Lancs, England
Funding
Biotechnology and Biological Sciences Research Council (BBSRC), UK
Keywords
protein function prediction; structure; EC number; machine learning; structural genomics
DOI
10.1016/j.jmb.2004.10.024
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline codes
071010; 081704
Abstract
Methods for predicting protein function from structure are becoming more important because structures are now being solved faster than their functions can be characterized experimentally. As a result, protein structures now frequently lack functional annotations. Most methods for predicting protein function rely on identifying a similar protein and transferring its annotations to the query protein. This approach fails when no similar protein can be identified, or when the similar proteins that are identified also lack reliable annotations. Here, we describe a method that assigns function from structure without using alignment-based algorithms. We describe each enzyme in a non-redundant set using simple attributes that can be calculated from any crystal structure, such as secondary structure content, amino acid propensities, surface properties and ligands. The set is split according to Enzyme Commission (EC) number. We combine the predictions of one-class versus one-class support vector machine models to assign EC numbers with an accuracy of 35% for the top-ranked prediction, rising to 60% when the top two ranks are considered. In doing so, we demonstrate the utility of simple structural attributes in protein function prediction and shed light on the link between structure and function. We apply our method to predict the function of every currently unclassified protein in the Protein Data Bank. (C) 2004 Elsevier Ltd. All rights reserved.
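The step of combining one-versus-one classifiers into a ranked list of EC classes, then scoring the top one or top two ranks, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that pairwise results are combined by simple majority voting and that prediction is made at the first EC digit (six classes). It also replaces trained SVMs with a hypothetical `toy_winner` function so that the example runs without a fitted model.

```python
from collections import Counter
from itertools import combinations

# First EC digit: oxidoreductases, transferases, hydrolases,
# lyases, isomerases, ligases (assumed prediction level).
EC_CLASSES = [1, 2, 3, 4, 5, 6]


def rank_by_pairwise_votes(x, classes, pairwise_winner):
    """Rank classes by how many one-vs-one contests each wins for sample x.

    pairwise_winner(a, b, x) must return a or b. In a real system it would
    wrap a binary SVM trained on classes a and b only (an assumption here).
    """
    votes = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        votes[pairwise_winner(a, b, x)] += 1
    # Most votes first; ties broken by class label so the ranking is deterministic.
    return sorted(classes, key=lambda c: (-votes[c], c))


def top_k_accuracy(rankings, true_labels, k):
    """Fraction of samples whose true class is among the first k ranks."""
    hits = sum(1 for ranking, y in zip(rankings, true_labels) if y in ranking[:k])
    return hits / len(true_labels)


def toy_winner(a, b, x):
    """Hypothetical stand-in for a trained pairwise SVM: x maps class -> score."""
    return a if x[a] >= x[b] else b


if __name__ == "__main__":
    samples = [
        {1: 0.9, 2: 0.1, 3: 0.5, 4: 0.2, 5: 0.3, 6: 0.4},
        {1: 0.2, 2: 0.8, 3: 0.7, 4: 0.1, 5: 0.0, 6: 0.3},
    ]
    true_labels = [1, 3]
    rankings = [rank_by_pairwise_votes(x, EC_CLASSES, toy_winner) for x in samples]
    print(rankings)                                    # [[1, 3, 6, 5, 4, 2], [2, 3, 6, 1, 4, 5]]
    print(top_k_accuracy(rankings, true_labels, k=1))  # 0.5
    print(top_k_accuracy(rankings, true_labels, k=2))  # 1.0
```

In this toy run, the second sample is missed at rank one but recovered at rank two. That is the same kind of gain the abstract reports when moving from top-one to top-two accuracy, although the numbers here are invented for illustration only.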
Pages: 187-199
Page count: 13