Relations of the numbers of protein sequences, families and folds

Cited by: 20
Author
Zhang, CT
Affiliation
[1] Department of Physics, Tianjin University
Source
PROTEIN ENGINEERING | 1997 / Vol. 10 / No. 07
Keywords
degeneracy; degenerate degree; distribution of degenerate degrees; numerical relations; protein families; protein folds; protein sequences;
DOI
10.1093/protein/10.7.757
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010 ; 081704 ;
Abstract
The relations among the numbers of protein sequences, families and folds have been studied theoretically. It is found that the number of families is related to the natural logarithm of the number of sequences. This logarithmic relation should hold regardless of the value of the homology threshold applied in the protein sequence comparison routines. To study the relation between the numbers of families and folds, the degenerate degree of a fold is introduced: it is the number of protein families that adopt the same fold. The distribution of the degenerate degrees of folds is found to be very likely exponential. Based on this distribution, the average degenerate degree d̄ is calculated. The number of folds is simply equal to the number of families divided by the average degenerate degree of folds. It is shown that d̄ is an increasing function of time. The current value of d̄ is about 2. It will continue to increase and reach a value of at least 3.3 in some years. Using this result, the numbers of protein folds for four species have been estimated. In particular, the number of folds for human proteins is estimated to be less than or equal to 5200.
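The arithmetic in the abstract (folds = families / average degenerate degree) can be illustrated with a minimal Python sketch. The abstract does not give the exact form of the exponential distribution, so the sketch assumes, purely for illustration, that P(d) is proportional to exp(-lam * d) for d = 1, 2, 3, ...; the function names, the parameter `lam`, and the family count of 1000 are hypothetical and are not taken from the paper.

```python
import math


def mean_degenerate_degree(lam: float) -> float:
    """Mean of d when P(d) is proportional to exp(-lam * d), d = 1, 2, 3, ...

    ASSUMPTION: this discrete exponential form is illustrative only;
    the abstract says the distribution is "very likely exponential"
    but does not state its parameters.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    # With q = exp(-lam), P(d) = (1 - q) * q**(d - 1) and the mean is 1 / (1 - q).
    return 1.0 / (1.0 - math.exp(-lam))


def estimate_folds(n_families: float, d_bar: float) -> float:
    """Number of folds = number of families / average degenerate degree."""
    if d_bar < 1:
        raise ValueError("each fold is adopted by at least one family")
    return n_families / d_bar


# Under the assumed form, lam = ln 2 gives an average degenerate degree of 2,
# the value the abstract reports as current.
d_bar_now = mean_degenerate_degree(math.log(2))

# Hypothetical family count, chosen only to show the division.
folds_now = estimate_folds(1000, d_bar_now)
folds_later = estimate_folds(1000, 3.3)  # with the projected value of 3.3
```

Because the average degenerate degree appears in the denominator, a larger value yields fewer folds for the same number of families. This is consistent with the abstract stating the human estimate as an upper bound while the projected value of at least 3.3 is a lower bound.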
Pages: 757-761
Number of pages: 5