From fold predictions to function predictions: Automation of functional site conservation analysis for functional genome predictions

Cited by: 47
Authors
Zhang, BH [1]
Rychlewski, L [1]
Pawlowski, K [1]
Fetrow, JS [1]
Skolnick, J [1]
Godzik, A [1]
Affiliations
[1] Scripps Res Inst, La Jolla, CA 92037 USA
Keywords
fold assignments; function predictions; genome analysis
DOI
10.1110/ps.8.5.1104
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification codes
071010 ; 081704 ;
Abstract
A database of functional sites for proteins with known structures, SITE, is constructed and used in conjunction with a simple pattern matching program, SiteMatch, to evaluate possible function conservation in a recently constructed database of fold predictions for Escherichia coli proteins (Rychlewski L et al., 1999, Protein Sci 8:614-624). In this and other prediction databases, fold predictions are based on algorithms that can recognize weak sequence similarities and putatively assign new proteins to already characterized protein families. It is not clear whether such sequence similarities arise from distant homologies or from general similarity of physicochemical features along the sequence. Leaving aside the important question of the nature of relations within fold superfamilies, it is possible to assess function conservation by looking at the pattern of conservation of crucial functional residues. SITE consists of a multilevel function description based on structure annotations and structure analyses. In particular, active site residues, ligand binding residues, and patterns of hydrophobic residues on the protein surface are used to describe different functional features. SiteMatch is designed to check the conservation of residues involved in protein activity in alignments generated by any alignment method. Here, this procedure is used to study conservation of functional features in alignments between protein sequences from the E. coli genome and their optimal structural templates. The optimal templates and the corresponding alignments were taken from the database of genomic structural predictions described in a previous publication (Rychlewski L et al., 1999, Protein Sci 8:614-624). An automated assessment of function conservation is used to analyze the relation between fold and function similarity for a large number of fold predictions.
For instance, it is shown that identifying low-significance predictions with a high level of functional residue conservation can be used to extend the sensitivity of fold prediction methods. Over 100 new fold/function predictions in this class were obtained in the E. coli genome. At the same time, about 30% of our previous fold predictions are not confirmed as function predictions, further highlighting the problem of function divergence in fold superfamilies.
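The conservation check described in the abstract can be illustrated with a minimal sketch. This is not the SiteMatch program itself: the function names, the 1-based template numbering, the gap character, and the strict identity criterion are all assumptions made for illustration, and the toy sequences are invented. Given a pairwise alignment of a template and a query, and a list of template positions annotated as functional, it reports which of those residues are identical in the query.

```python
from typing import Dict, List, Tuple

GAP = "-"  # assumed gap character in the aligned strings


def map_template_positions(aligned_template: str, aligned_query: str) -> Dict[int, str]:
    """Map each 1-based template residue index to the query character aligned with it."""
    if len(aligned_template) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    mapping: Dict[int, str] = {}
    template_index = 0
    for t_char, q_char in zip(aligned_template, aligned_query):
        if t_char == GAP:
            continue  # insertion in the query: no template residue in this column
        template_index += 1
        mapping[template_index] = q_char
    return mapping


def site_conservation(
    aligned_template: str,
    aligned_query: str,
    site_positions: List[int],
) -> Tuple[float, List[Tuple[int, str, str]]]:
    """Return the fraction of functional-site residues that are identical in the
    query, plus a per-site list of (position, template residue, query residue)."""
    mapping = map_template_positions(aligned_template, aligned_query)
    template_residues = aligned_template.replace(GAP, "")
    details: List[Tuple[int, str, str]] = []
    conserved = 0
    for pos in site_positions:
        t_res = template_residues[pos - 1]
        q_res = mapping[pos]  # GAP if the site is deleted in the query
        details.append((pos, t_res, q_res))
        if q_res == t_res:
            conserved += 1
    fraction = conserved / len(site_positions) if site_positions else 0.0
    return fraction, details


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences); sites are template positions 2, 5 and 7.
    fraction, details = site_conservation("ACD-EFGH", "ACDKE-GY", [2, 5, 7])
    print(f"conserved fraction: {fraction:.2f}")
    for pos, t_res, q_res in details:
        print(pos, t_res, q_res)
```

A strict identity test is the simplest possible criterion. A more permissive variant could accept conservative substitutions or score each site class (active site, ligand binding, surface hydrophobic patch) separately, but the abstract does not specify how SiteMatch scores matches, so that is left open here.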
Pages: 1104-1115
Page count: 12