DATABASE OF HOMOLOGY-DERIVED PROTEIN STRUCTURES AND THE STRUCTURAL MEANING OF SEQUENCE ALIGNMENT

Cited by: 1408
Authors
SANDER, C
SCHNEIDER, R
Affiliation
[1] European Molecular Biology Laboratory, Heidelberg
Source
PROTEINS-STRUCTURE FUNCTION AND GENETICS | 1991 / Vol. 9 / Issue 01
Keywords
SECONDARY STRUCTURE; TERTIARY STRUCTURE; RESIDUE CONSERVATION; SEQUENCE VARIABILITY; SEQUENCE PROFILE; FOLDING UNITS;
DOI
10.1002/prot.340090107
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.
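The per-structure quantities named in the abstract (pairwise sequence similarity with its alignment length, sequence variability, and sequence profile) can be illustrated with a small sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the toy alignment, the use of percent identity as the similarity measure, and the use of Shannon entropy as the variability measure. The homology threshold curve itself is not given in the abstract, so the sketch stops at returning the identity and aligned length that such a curve would be compared against.

```python
from collections import Counter
from math import log2

GAP = "-"  # assumed gap symbol for this toy alignment


def percent_identity(a, b):
    """Return (percent identity, aligned length) over positions where
    neither sequence has a gap. Both values would be needed to compare
    a pair against a length-dependent threshold."""
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        return 0.0, 0
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs), len(pairs)


def column_profile(column):
    """Relative frequency of each residue in one alignment column,
    ignoring gaps (a simple stand-in for a sequence profile)."""
    residues = [r for r in column if r != GAP]
    if not residues:
        return {}
    counts = Counter(residues)
    return {res: n / len(residues) for res, n in counts.items()}


def column_entropy(profile):
    """Shannon entropy in bits, used here as an assumed variability
    measure; 0.0 means the column is fully conserved."""
    return sum(-p * log2(p) for p in profile.values())


def summarize(alignment):
    """Per-column profiles and variabilities for equal-length rows."""
    profiles = [column_profile(col) for col in zip(*alignment)]
    return profiles, [column_entropy(p) for p in profiles]


# Hypothetical aligned sequences; the first row plays the role of the
# protein of known structure.
alignment = ["MKVLA", "MKILA", "MRVL-"]
profiles, variability = summarize(alignment)
identity, length = percent_identity(alignment[0], alignment[2])
```

In this toy case the first column is fully conserved (variability 0.0), while the second column, with two K and one R, has nonzero variability. A real pipeline would accept an aligned sequence only if its identity exceeds the threshold for its aligned length.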
Pages: 56-68
Number of pages: 13