Automated discovery of 3D motifs for protein function annotation

Cited by: 56
Authors
Polacco, BJ [1]
Babbitt, PC [1]
Affiliations
[1] Univ Calif San Francisco, Dept Biopharmaceut Sci, San Francisco, CA 94143 USA
Funding
National Science Foundation (USA);
DOI
10.1093/bioinformatics/btk038
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Function inference from structure is facilitated by the use of patterns of residues (3D motifs), normally identified by expert knowledge, that correlate with function. As an alternative to often limited expert knowledge, we use machine-learning techniques to identify patterns of 3-10 residues that maximize function prediction. This approach allows us to test the assumption that residues that provide function are the most informative for predicting function.
Results: We apply our method, GASPS, to the haloacid dehalogenase, enolase, amidohydrolase and crotonase superfamilies and to the serine proteases. The motifs found by GASPS are as good at function prediction as 3D motifs based on expert knowledge. The GASPS motifs with the greatest ability to predict protein function consist mainly of known functional residues. However, several residues with no known functional role are equally predictive. For four groups, we show that the predictive power of our 3D motifs is comparable with or better than approaches that use the entire fold (Combinatorial-Extension) or sequence profiles (PSI-BLAST).
Pages: 723-730
Number of pages: 8