A SEQUENCE PROPERTY APPROACH TO SEARCHING PROTEIN DATABASES

Cited by: 148
Authors
HOBOHM, U [1]
SANDER, C [1]
Affiliation
[1] EUROPEAN MOLEC BIOL LAB, D-69012 HEIDELBERG, GERMANY
Keywords
AMINO ACID COMPOSITION; DATABASE SEARCH; STRUCTURAL HOMOLOGS
DOI
10.1006/jmbi.1995.0442
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Currently available sequence alignment programs are generally not capable of detecting functional and structural homologs in the twilight zone of sequence similarity, i.e. when the sequence identity falls below about 25%. Here we attempt to detect such weak similarities using an approach based on a notion of protein sequence similarity radically different from that used in sequential alignment. The approach defines protein sequence dissimilarity (or distance) as a weighted sum of differences of compositional properties such as singlet and doublet amino acid composition, molecular weight, isoelectric point (protein property search or PropSearch). With PropSearch, either single sequences can be used for a database query, or multiple sequences can be merged into an "average" sequence reflecting the average composition of a protein family. First, we show that members of structural protein families have a low mutual PropSearch distance when the weights are optimized to discriminate maximally between structural families. Second, we demonstrate the results of database searches using the PropSearch method. Such searches are very rapid when scanning a preprocessed database and do not require alignments. In cases in which conventional alignment tools fail to detect similarities, PropSearch can be used to generate hypotheses about possible structural or functional relationships between a new sequence and sequences in the database.
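The distance described in the abstract can be illustrated with a minimal sketch. It is not the PropSearch implementation: it uses only singlet amino acid composition (doublet composition, molecular weight and isoelectric point are omitted), takes absolute differences, defaults to uniform weights where the paper optimizes them, and averages per-sequence fractions to form a family profile. The function names `singlet_composition`, `family_profile` and `property_distance` are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def singlet_composition(seq):
    """Return the fraction of each standard residue in seq (20 values)."""
    counts = Counter(seq.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def family_profile(seqs):
    """Average the composition vectors of several sequences.

    Assumption: each sequence contributes equally, regardless of length.
    """
    vectors = [singlet_composition(s) for s in seqs]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def property_distance(props_a, props_b, weights=None):
    """Weighted sum of absolute differences between two property vectors."""
    if weights is None:
        weights = [1.0] * len(props_a)  # assumption: uniform weights
    return sum(w * abs(a - b) for w, a, b in zip(weights, props_a, props_b))
```

Under this sketch, two sequences with the same residues in a different order have a distance of zero, which shows why no alignment is needed. A database scan would precompute one property vector per entry and rank entries by `property_distance` to the query vector or to a `family_profile`.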
Pages: 390-399
Page count: 10
Related papers
32 in total
  • [1] BASIC LOCAL ALIGNMENT SEARCH TOOL
    ALTSCHUL, SF
    GISH, W
    MILLER, W
    MYERS, EW
    LIPMAN, DJ
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) : 403 - 410
  • [2] THE SWISS-PROT PROTEIN-SEQUENCE DATA-BANK
    BAIROCH, A
    BOECKMANN, B
    [J]. NUCLEIC ACIDS RESEARCH, 1991, 19 : 2247 - 2248
  • [3] PROTEIN DATA BANK - COMPUTER-BASED ARCHIVAL FILE FOR MACROMOLECULAR STRUCTURES
    BERNSTEIN, FC
    KOETZLE, TF
    WILLIAMS, GJB
    MEYER, EF
    BRICE, MD
    RODGERS, JR
    KENNARD, O
    SHIMANOUCHI, T
    TASUMI, M
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1977, 112 (03) : 535 - 542
  • [4] FAST COMPUTER-SEARCH FOR SIMILAR DNA-SEQUENCES
    BISHOP, M
    THOMPSON, E
    [J]. NUCLEIC ACIDS RESEARCH, 1984, 12 (13) : 5471 - 5474
  • [6] COMPREHENSIVE SEQUENCE-ANALYSIS OF THE 182 PREDICTED OPEN READING FRAMES OF YEAST CHROMOSOME-III
    BORK, P
    OUZOUNIS, C
    SANDER, C
    SCHARF, M
    SCHNEIDER, R
    SONNHAMMER, E
    [J]. PROTEIN SCIENCE, 1992, 1 (12) : 1677 - 1690
  • [7] NORMAL DEVELOPMENT AND BEHAVIOR OF MICE LACKING THE NEURONAL CELL-SURFACE PRP PROTEIN
    BUELER, H
    FISCHER, M
    LANG, Y
    BLUETHMANN, H
    LIPP, HP
    DEARMOND, SJ
    PRUSINER, SB
    AGUET, M
    WEISSMANN, C
    [J]. NATURE, 1992, 356 (6370) : 577 - 582
  • [8] PREDICTION OF PROTEIN FOLDING CLASS FROM AMINO-ACID-COMPOSITION
    DUBCHAK, I
    HOLBROOK, SR
    KIM, SH
    [J]. PROTEINS-STRUCTURE FUNCTION AND GENETICS, 1993, 16 (01) : 79 - 91
  • [9] DOES THE HIV NEF PROTEIN MIMIC THE MHC
    HOBOHM, U
    SANDER, C
    [J]. FEBS LETTERS, 1993, 333 (03) : 211 - 213
  • [10] HOBOHM U, 1992, PROTEIN SCI, V1, P409