The relationship between protein structure and function: a comprehensive survey with application to the yeast genome

Times cited: 294
Authors
Hegyi, H [1]
Gerstein, M [1]
Affiliation
[1] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06520 USA
Keywords
structure-function; fold classification; structural convergence; functional divergence; yeast genomics
DOI
10.1006/jmbi.1999.2661
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
For most proteins in the genome databases, function is predicted via sequence comparison. In spite of the popularity of this approach, the extent to which it can be reliably applied is unknown. We address this issue by systematically investigating the relationship between protein function and structure. We focus initially on enzymes functionally classified by the Enzyme Commission (EC) and relate these to structurally classified domains in the SCOP database. We find that the major SCOP fold classes have different propensities to carry out certain broad categories of functions. For instance, alpha/beta folds are disproportionately associated with enzymes, especially transferases and hydrolases, and all-alpha and small folds with non-enzymes, while alpha+beta folds have an equal tendency either way. These observations for the database overall are largely true for specific genomes. We focus, in particular, on yeast, analyzing it with many classifications in addition to SCOP and EC (i.e. COGs, CATH, MIPS), and find clear tendencies for fold-function association across a broad spectrum of functions. Analysis with the COGs scheme also suggests that the functions of the most ancient proteins are more evenly distributed among different structural classes than those of more modern ones. For the database overall, we identify the most versatile functions, i.e. those that are associated with the most folds, and the most versatile folds, associated with the most functions. The two most versatile enzymatic functions (hydro-lyases and O-glycosyl glucosidases) are associated with seven folds each. The five most versatile folds (TIM-barrel, Rossmann, ferredoxin, alpha-beta hydrolase, and P-loop NTP hydrolase) are all mixed alpha-beta structures. They stand out as generic scaffolds, accommodating from six to as many as 16 functions (for the exceptional TIM-barrel).
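The abstract rests on two tallies: how strongly a structural class is over- or under-represented within a functional category (e.g. enzymes), and how many distinct functions a fold hosts (or how many folds a function occupies), which is what "most versatile" measures. The following is a minimal Python sketch of both, assuming each annotated domain has already been reduced to a (fold, function) or (structural class, category) label pair. The function names, the propensity definition (share within the category divided by share overall, one common choice), and the toy labels are illustrative assumptions, not the authors' code, method, or data.

```python
from collections import defaultdict

def versatility(pairs):
    """Count distinct partners on each side of a (fold, function) relation.

    pairs: iterable of (fold, function) tuples, one per annotated domain.
    Returns (functions_per_fold, folds_per_function) as {key: count} dicts.
    """
    functions_of = defaultdict(set)
    folds_of = defaultdict(set)
    for fold, function in pairs:
        functions_of[fold].add(function)
        folds_of[function].add(fold)
    return ({k: len(v) for k, v in functions_of.items()},
            {k: len(v) for k, v in folds_of.items()})

def most_versatile(counts, k):
    """Return the k keys with the highest counts (ties broken by name)."""
    return sorted(counts, key=lambda key: (-counts[key], key))[:k]

def propensity(domains, category):
    """Propensity of each structural class for one functional category.

    domains: list of (structural_class, functional_category) tuples.
    Propensity = (share of `category` domains falling in the class)
               / (share of all domains falling in the class).
    Values above 1 mean the class is over-represented in `category`.
    """
    in_category = [cls for cls, cat in domains if cat == category]
    if not in_category:
        raise ValueError(f"no domains with category {category!r}")
    result = {}
    for cls in {c for c, _ in domains}:
        share_in_category = in_category.count(cls) / len(in_category)
        share_overall = sum(1 for c, _ in domains if c == cls) / len(domains)
        result[cls] = share_in_category / share_overall
    return result

# Hypothetical toy input; these labels are placeholders, not the paper's data.
pairs = [("fold_A", "EC_1"), ("fold_A", "EC_2"), ("fold_A", "EC_3"),
         ("fold_B", "EC_1"), ("fold_B", "EC_4"), ("fold_C", "EC_1")]
per_fold, per_function = versatility(pairs)
print(most_versatile(per_fold, 1), most_versatile(per_function, 1))
# ['fold_A'] ['EC_1']
```

Counting distinct partners through sets, rather than raw pair counts, keeps a fold that is annotated many times with the same function from looking versatile.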
At the conclusion of our analysis we are able to construct a graph giving the chance that a functional annotation can be reliably transferred at different degrees of sequence and structural similarity. Supplemental information is available from http://bioinfo.mbb.yale.edu/genome/foldfunc. (C) 1999 Academic Press.
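The closing result, a graph of how likely a functional annotation is to transfer correctly at a given level of similarity, amounts to binning protein pairs by similarity and recording the fraction that share a function. Below is a minimal Python sketch of that tally along the sequence-identity axis only. It assumes each pair is already labelled with a percent identity and a boolean "same function" flag (for example, identical EC numbers). The name `transfer_reliability`, the 10-point bins, and the toy pairs are hypothetical. A structural-similarity axis could be added by keying the tally on a fold-relationship label as well as the bin. This is not the authors' procedure.

```python
from collections import defaultdict

def transfer_reliability(pairs, bin_width=10):
    """Fraction of protein pairs that share a function, per identity bin.

    pairs: iterable of (percent_identity, same_function) tuples, where
    percent_identity is in [0, 100] and same_function is a bool (for example,
    whether the two proteins carry the same EC number).
    Returns {bin_lower_edge: fraction_sharing_function}, sorted by bin.
    """
    if bin_width <= 0 or 100 % bin_width:
        raise ValueError("bin_width must be a positive divisor of 100")
    totals = defaultdict(int)
    shared = defaultdict(int)
    for identity, same_function in pairs:
        if not 0 <= identity <= 100:
            raise ValueError(f"identity out of range: {identity}")
        # Fold 100% identity into the top bin instead of giving it its own bin.
        lower = min(int(identity // bin_width) * bin_width, 100 - bin_width)
        totals[lower] += 1
        shared[lower] += int(bool(same_function))
    return {lower: shared[lower] / totals[lower] for lower in sorted(totals)}

# Hypothetical toy pairs, not measurements from the paper.
toy = [(15, False), (18, True), (45, True), (92, False), (100, True)]
print(transfer_reliability(toy))
# {10: 0.5, 40: 1.0, 90: 0.5}
```

Bins with few pairs give noisy fractions, so a real analysis would also report the pair count per bin.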
Pages: 147-164
Page count: 18