Small libraries of protein fragments model native protein structures accurately

Cited by: 143

Authors:

Kolodny, R

Koehl, P

Guibas, L

Levitt, M

Affiliations:

[1] Stanford Univ, Sch Med, Dept Biol Struct, Stanford, CA 94305 USA

[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA

Source:

JOURNAL OF MOLECULAR BIOLOGY | 2002 / Vol. 323 / Issue 02

Keywords:

protein representations; discrete models;

DOI:

10.1016/S0022-2836(02)00942-7

Chinese Library Classification:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Prediction of protein structure depends on the accuracy and complexity of the models used. Here, we represent the polypeptide chain by a sequence of rigid fragments that are concatenated without any degrees of freedom. Fragments chosen from a library of representative fragments are fit to the native structure using a greedy build-up method. This gives a one-dimensional representation of native protein three-dimensional structure whose quality depends on the nature of the library. We use a novel clustering method to construct libraries that differ in the fragment length (four to seven residues) and the number of representative fragments they contain (25-300). Each library is characterized by the quality of fit (accuracy) and the number of allowed states per residue (complexity). We find that the accuracy depends on the complexity and varies from 2.9 Å for a 2.7-state model on the basis of fragments of length 7 to 0.76 Å for a 15-state model on the basis of fragments of length 5. Our goal is to find representations that are both accurate and economical (low complexity). The models defined here are substantially better in this regard: with ten states per residue we approximate native protein structure to 1 Å compared to over 20 states per residue needed previously. For the same complexity, we find that longer fragments provide better fits. Unfortunately, libraries of longer fragments must be much larger (for ten states per residue, a seven-residue library is 100 times larger than a five-residue library). As the number of known protein native structures increases, it will be possible to construct larger libraries to better exploit this correlation between neighboring residues. Our fragment libraries, which offer a wide range of optimal fragments suited to different accuracies of fit, may prove to be useful for generating better decoy sets for ab initio protein folding and for generating accurate loop conformations in homology modeling.
(C) 2002 Elsevier Science Ltd. All rights reserved.
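
The link between library size and "states per residue" in the abstract can be illustrated with a small calculation. The sketch below is not taken from the record itself: it assumes that consecutive fragments overlap by three residues, so a fragment of length f contributes f - 3 new residues and a library of s fragments corresponds to s^(1/(f - 3)) states per residue. The overlap value and the function names are assumptions made for illustration. Under that assumption the arithmetic reproduces the abstract's statement that, at ten states per residue, a seven-residue library is 100 times larger than a five-residue one.

```python
def states_per_residue(library_size: int, fragment_length: int, overlap: int = 3) -> float:
    """Complexity of a fragment library, in allowed states per residue.

    ASSUMPTION: consecutive fragments share `overlap` residues, so each
    fragment adds (fragment_length - overlap) new residues to the chain.
    """
    new_residues = fragment_length - overlap
    if new_residues <= 0:
        raise ValueError("fragment_length must exceed overlap")
    return library_size ** (1.0 / new_residues)


def library_size_for(states: float, fragment_length: int, overlap: int = 3) -> float:
    """Library size needed to reach a given number of states per residue
    (the inverse of states_per_residue, under the same assumption)."""
    new_residues = fragment_length - overlap
    if new_residues <= 0:
        raise ValueError("fragment_length must exceed overlap")
    return states ** new_residues


if __name__ == "__main__":
    # A 225-fragment library of length-5 fragments gives a 15-state model.
    print(states_per_residue(225, 5))
    # Ratio of seven-residue to five-residue library sizes at ten states per residue.
    print(library_size_for(10, 7) / library_size_for(10, 5))
```

The exponent explains why longer fragments become expensive so quickly: holding complexity fixed, each extra residue of fragment length multiplies the required library size by the number of states per residue.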


Pages: 297-307

Page count: 11
