Enhanced functional annotation of protein sequences via the use of structural descriptors

Cited by: 38
Authors
Di Gennaro, JA
Siew, N
Hoffman, BT
Zhang, L
Skolnick, J
Neilson, LI
Fetrow, JS
Affiliations
[1] GeneFormat Inc, San Diego, CA 92121 USA
[2] Scripps Res Inst, Dept Mol Biol, La Jolla, CA 92037 USA
[3] Danforth Plant Sci Ctr, Lab Computat Genomes, St Louis, MO 63141 USA
Keywords
Bacillus subtilis; bioinformatics; disulfide oxidoreductase; Drosophila melanogaster; FFF (fuzzy functional form); functional annotation; protein function prediction; protein tyrosine phosphatase; structural genomics
DOI
10.1006/jsbi.2001.4391
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
To circumvent the limitations of sequence-based methods in making functional predictions for proteins, we have developed a methodology that uses a sequence-to-structure-to-function paradigm. First, an approximate three-dimensional structure is predicted. Then, a three-dimensional descriptor of the functional site, termed a Fuzzy Functional Form, or FFF, is used to screen the structure for the presence of the functional site of interest (Fetrow et al., 1998; Fetrow and Skolnick, 1998). Previously, a disulfide oxidoreductase FFF was developed and applied to predicted structures obtained from a small structural database. Here, using a substantially larger structural database, we expand the analysis of the disulfide oxidoreductase FFF to the B. subtilis genome. To ascertain the performance of the FFF, its results are compared to those obtained using both the sequence alignment method BLAST and three local sequence motif databases: PRINTS, Prosite, and Blocks. The FFF method is then compared in detail to Blocks, and it is shown that the FFF is more flexible and sensitive in finding a specific function in a set of unknown proteins. In addition, the estimated false positive rate of function prediction is significantly lower using the FFF structural motif than using the standard sequence motif methods. We also present a second FFF and describe a specific example of the results of its whole-genome application to D. melanogaster using a newer threading algorithm. Our results from all of these studies indicate that the addition of three-dimensional structural information adds significant value in the prediction of biochemical function of genomic sequences. (C) 2001 Academic Press.
Pages: 232-245
Page count: 14
Related papers
67 in total
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   Iterated profile searches with PSI-BLAST - a tool for discovery in protein databases [J].
Altschul, SF ;
Koonin, EV .
TRENDS IN BIOCHEMICAL SCIENCES, 1998, 23 (11) :444-447
[3]   Automated genome sequence analysis and annotation [J].
Andrade, MA ;
Brown, NP ;
Leroy, C ;
Hoersch, S ;
de Daruvar, A ;
Reich, C ;
Franchini, A ;
Tamames, J ;
Valencia, A ;
Ouzounis, C ;
Sander, C .
BIOINFORMATICS, 1999, 15 (05) :391-412
[4]   Automatic extraction of keywords from scientific text: application to the knowledge domain of protein families [J].
Andrade, MA ;
Valencia, A .
BIOINFORMATICS, 1998, 14 (07) :600-607
[5]   A GRAPH-THEORETIC APPROACH TO THE IDENTIFICATION OF 3-DIMENSIONAL PATTERNS OF AMINO-ACID SIDE-CHAINS IN PROTEIN STRUCTURES [J].
ARTYMIUK, PJ ;
POIRRETTE, AR ;
GRINDLEY, HM ;
RICE, DW ;
WILLETT, P .
JOURNAL OF MOLECULAR BIOLOGY, 1994, 243 (02) :327-344
[6]   The quest to deduce protein function from sequence: the role of pattern databases [J].
Attwood, TK .
INTERNATIONAL JOURNAL OF BIOCHEMISTRY & CELL BIOLOGY, 2000, 32 (02) :139-155
[7]   The PRINTS protein fingerprint database in its fifth year [J].
Attwood, TK ;
Beck, ME ;
Flower, DR ;
Scordis, P ;
Selley, JN .
NUCLEIC ACIDS RESEARCH, 1998, 26 (01) :304-308
[8]   CRYSTAL-STRUCTURE OF HUMAN PROTEIN-TYROSINE-PHOSPHATASE 1B [J].
BARFORD, D ;
FLINT, AJ ;
TONKS, NK .
SCIENCE, 1994, 263 (5152) :1397-1404
[9]   Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins [J].
Bateman, A ;
Birney, E ;
Durbin, R ;
Eddy, SR ;
Finn, RD ;
Sonnhammer, ELL .
NUCLEIC ACIDS RESEARCH, 1999, 27 (01) :260-262
[10]   Characterization of an lrp-like (lrpC) gene from Bacillus subtilis [J].
Beloin, C ;
Ayora, S ;
Exley, R ;
Hirschbein, L ;
Ogasawara, N ;
Kasahara, Y ;
Alonso, JC ;
LeHegarat, F .
MOLECULAR & GENERAL GENETICS, 1997, 256 (01) :63-71