Data mining the protein data bank: Residue interactions

Cited by: 24
Author
Oldfield, TJ [1]
Affiliation
[1] Univ York, Accelrys Inc, Dept Chem, York YO1 5DD, N Yorkshire, England
Keywords
mathematical data mining; active sites; binding sites; protein structure; templates; superposition; residue configurations
DOI
10.1002/prot.10221
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
The protein databank contains a vast wealth of structural and functional information. The analysis of this macromolecular information has been the subject of considerable work in order to advance knowledge beyond the collection of molecular coordinates. This article presents a method that determines local structural information within proteins using mathematical data mining techniques. The mine program described returns many known configurations of residues such as the catalytic triad, metal binding sites and the N-linked glycosylation site; as well as many other multiple residue interactions not previously categorized. Because mathematical constructs are used as targets, this method can identify new information not previously known, and also provide unbiased results of typical structure and their expected deviations. Because the results are defined mathematically, they cannot indicate the biological implications of the results. Therefore two support programs are described that provide insight into the biological context for the mine results. The first allows a weighted RMSD search between a template set of coordinates and a list of PDB files, and the second allows the labeling of a protein with the template results from mining to aid in the classification of this protein. (C) 2002 Wiley-Liss, Inc.
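The weighted RMSD search mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting scheme or the superposition procedure, so everything here is an assumption: the function names `weighted_rmsd` and `search_templates` are hypothetical, the formula is the common definition sqrt(sum(w_i * d_i^2) / sum(w_i)), and the coordinate sets are assumed to be already superposed and listed in matching atom order.

```python
import math


def weighted_rmsd(template, candidate, weights):
    """Weighted RMSD between two equally sized lists of (x, y, z) tuples.

    Assumption: both coordinate sets are already superposed; no rotation
    or translation is fitted here.
    """
    if not (len(template) == len(candidate) == len(weights)):
        raise ValueError("template, candidate and weights must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sq = 0.0
    for (x1, y1, z1), (x2, y2, z2), w in zip(template, candidate, weights):
        # Squared distance between paired atoms, scaled by the atom's weight.
        weighted_sq += w * ((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)
    return math.sqrt(weighted_sq / total_weight)


def search_templates(template, weights, candidates, cutoff):
    """Return (name, rmsd) pairs at or below the cutoff, best match first.

    `candidates` maps a hypothetical site label to its coordinate list.
    """
    hits = []
    for name, coords in candidates.items():
        score = weighted_rmsd(template, coords, weights)
        if score <= cutoff:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1])
```

In this sketch a larger weight makes a deviation at that atom count more heavily toward the score, so atoms considered central to a site can dominate the match. A real search over PDB files would also need to parse coordinates and fit the superposition, which is left out here.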
Pages: 510-528
Page count: 19