Monte Carlo estimation of the number of possible protein folds: Effects of sampling bias and folds distributions

Cited by: 13
Authors
Leonov, H
Mitchell, JSB
Arkin, IT [1]
Affiliations
[1] Hebrew Univ Jerusalem, Alexander Silberman Inst Life Sci, Dept Biol Chem, IL-91904 Jerusalem, Israel
[2] Hebrew Univ Jerusalem, Sch Comp Sci & Engn, Jerusalem, Israel
[3] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
Source
PROTEINS-STRUCTURE FUNCTION AND GENETICS | 2003 / Vol. 51 / Issue 03
Keywords
protein folds; proteomics; Monte Carlo
DOI
10.1002/prot.10336
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The estimation of the number of protein folds in nature is a matter of considerable interest. In this study, a Monte Carlo method employing the broken stick model is used to assign a given number of proteins to a given number of folds. Subsequently, random, non-repeating integers are generated to simulate the process of fold discovery. Within this conceptual framework, the effects of two factors on the fold identification process were investigated: (1) the nature of the fold distribution and (2) preferential sampling bias toward previously identified folds. Depending on the type of distribution, dividing 100,000 proteins into 1,000 folds resulted in 10-30% of the folds having 10 proteins or fewer per fold, approximately 10% having 10-20 proteins per fold, 31-45% having 20-100 proteins per fold, and >30% having more than 100 proteins per fold. After randomly sampling one tenth of the proteins, 68-96% of the folds were identified. These percentages depend on both the fold distribution and whether the sampling was biased. Only upon increasing the sampling bias for previously identified folds to 1,000 did the model result in a reduction of the number of proteins identified by an order of magnitude (approximately 9%). Thus, assuming the structures of one tenth of the population of proteins in nature have been solved, the results of the Monte Carlo simulation are more consistent with recent lower estimates of the number of folds, less than or equal to 1,000. Any deviation from this estimate would reflect significant bias in the experimental sampling of protein structure and/or a substantially nonuniform fold distribution, manifested in a large number of single-fold proteins. © 2003 Wiley-Liss, Inc.
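The procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes the broken stick model means cutting the range of protein indices at uniformly random, distinct points, and it covers only the unbiased sampling case (the preferential bias toward known folds is not modeled, since the abstract does not specify its mechanism). The function names `broken_stick` and `folds_found` are hypothetical.

```python
import bisect
import itertools
import random


def broken_stick(n_proteins, n_folds, rng):
    """Split n_proteins into n_folds non-empty groups via random cut points.

    Assumption: cut points are distinct integers drawn uniformly, so every
    fold receives at least one protein.
    """
    cuts = sorted(rng.sample(range(1, n_proteins), n_folds - 1))
    edges = [0] + cuts + [n_proteins]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


def folds_found(sizes, n_samples, rng):
    """Draw n_samples distinct protein indices and count the folds they hit."""
    # upper[i] is the exclusive upper protein index of fold i.
    upper = list(itertools.accumulate(sizes))
    picks = rng.sample(range(upper[-1]), n_samples)
    # bisect_right maps a protein index to the fold whose range contains it.
    return len({bisect.bisect_right(upper, p) for p in picks})


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = broken_stick(100_000, 1_000, rng)
    found = folds_found(sizes, 10_000, rng)  # sample one tenth of the proteins
    print(f"{found / len(sizes):.1%} of folds identified")
```

Repeating the run over many seeds would give a distribution for the fraction of folds identified. Other fold distributions could be substituted for `broken_stick` as long as they return a list of fold sizes.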
Pages: 352-359
Page count: 8