The PDB is a covering set of small protein structures

Cited by: 89
Authors
Kihara, D [1]
Skolnick, J [1]
Affiliation
[1] Univ Buffalo, Ctr Excellence Bioinformat, Buffalo, NY 14203 USA
Keywords
PDB; protein structure comparison; protein structure space; relative RMSD; fragments
DOI
10.1016/j.jmb.2003.10.027
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline codes
071010; 081704
Abstract
We have performed structure comparisons of all representative proteins. Using the relative root mean square deviation (RMSD) from the native structure allows the statistical significance of structure alignments of different lengths to be assessed in terms of a Z-score. Two conclusions emerge. First, proteins with their native fold can be distinguished by their Z-score. Second, and somewhat surprisingly, all small proteins up to 100 residues in length have significant structure alignments to other proteins in a different secondary structure and fold class; in particular, 24.0% of them have 60% coverage by a template protein with an RMSD below 3.5 Å, and 6.0% have 70% coverage. If the restriction to aligning only proteins with different secondary structure types is removed, then in a representative benchmark set of proteins of 200 residues or fewer, 93% can be aligned to a single template structure (with an average sequence identity of 9.8%) with an RMSD below 4 Å and an average coverage of 79%. In this sense, the current Protein Data Bank (PDB) is almost a covering set of small protein structures. The length of the aligned region (relative to the whole protein length) does not differ among the top hit proteins, indicating that protein structure space is highly dense. For larger proteins, unrelated proteins can cover a significant portion of the structure. Moreover, these top hit proteins are aligned to different parts of the target protein, so that almost the entire molecule can be covered when they are combined. The number of proteins required to cover a target protein is very small; for example, the top ten hit proteins can give 90% coverage below an RMSD of 3.5 Å for proteins up to 320 residues long. These results give a new view of the nature of protein structure space, and their implications for protein structure prediction are discussed.
© 2003 Elsevier Ltd. All rights reserved.
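The abstract relies on three quantities: the RMSD of an alignment, a Z-score measuring its significance, and the coverage of a target obtained by combining several top hits. The sketch below illustrates these ideas under stated assumptions; it is not the authors' method. The paper's exact definition of relative RMSD is not given in this record, so the sketch uses plain RMSD on C-alpha coordinates that are assumed to be already optimally superimposed (for example, by a Kabsch fit done elsewhere), together with a generic Z-score against a background set of scores. The function names `rmsd`, `z_score`, and `combined_coverage` are hypothetical, and the 3.5 Å cutoff is the threshold quoted in the abstract.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Segment = Tuple[int, int]  # inclusive, 0-based residue range on the target


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two equal-length coordinate lists.

    Assumption: the structures are already optimally superimposed;
    no rotation or translation is applied here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
             for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(sq / len(a))


def z_score(score: float, background: Sequence[float]) -> float:
    """Generic Z-score of `score` against a background distribution,
    e.g. scores of unrelated alignments of the same length.

    For a distance-like score such as RMSD, a more negative value
    means the alignment is better than the background.
    """
    mu = statistics.mean(background)
    sd = statistics.stdev(background)  # sample standard deviation
    return (score - mu) / sd


def combined_coverage(length: int,
                      hits: List[Tuple[float, List[Segment]]],
                      rmsd_cutoff: float = 3.5) -> float:
    """Fraction of target residues covered by the union of aligned
    segments from all hits whose RMSD is below `rmsd_cutoff`."""
    covered = set()
    for hit_rmsd, segments in hits:
        if hit_rmsd >= rmsd_cutoff:
            continue  # ignore templates that do not meet the cutoff
        for start, end in segments:
            covered.update(range(max(start, 0), min(end, length - 1) + 1))
    return len(covered) / length
```

For example, on a hypothetical 10-residue target, one template aligned to residues 0 to 4 (RMSD 2.1 Å) and another aligned to residues 3 to 7 (RMSD 3.0 Å) jointly cover 8 of the 10 residues, while a third hit at 5.0 Å is excluded by the cutoff. Taking the union of segments mirrors the abstract's observation that different top hits align to different parts of the target, so their combined coverage can exceed that of any single template.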
Pages: 793-802
Page count: 10