RECOGNITION OF DISTANTLY RELATED PROTEIN SEQUENCES USING CONSERVED MOTIFS AND NEURAL NETWORKS

Cited by: 20
Authors
FRISHMAN, D [1 ]
ARGOS, P [1 ]
Affiliation
[1] ACAD SCI ST PETERSBURG, INST EVOLUT PHYSIOL & BIOCHEM, ST PETERSBURG 194223, RUSSIA
Keywords
SEQUENCE COMPARISON; CONSERVED SEQUENCE PATTERNS; SEQUENCE MOTIFS; NEURAL NETWORKS;
DOI
10.1016/0022-2836(92)90877-M
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
A sensitive technique for protein sequence motif recognition based on neural networks has been developed. It involves three major steps. (1) At each appropriate alignment position of a set of N matched sequences, a set of N aligned oligopeptides is specified with preselected window length. N neural nets are subsequently and successively trained on N-1 amino acid spans after eliminating each ith oligopeptide. A test for recognition of each of the ith spans is performed. The average neural net recognition over N such trials is used as a measure of conservation for the particular windowed region of the multiple alignment. This process is repeated for all possible spans of given length in the multiple alignment. (2) The M most conserved regions are regarded as motifs and the oligopeptides within each are used to train intensively M individual neural networks. (3) The M networks are then applied in a search for related primary structures in a databank of known protein sequences. The oligopeptide spans in the database sequence with strongest neural net output for each of the M networks are saved and then scored according to the output signals and the proper combination that follows the expected N- to C-terminal sequence order. The motifs from the database with highest similarity scores can then be used to retrain the M neural nets, which can be subsequently utilized for further searches in the databank, thus providing even greater sensitivity to recognize distant familial proteins. This technique was successfully applied to the integrase, DNA-polymerase and immunoglobulin families. © 1992.
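Step (3) of the abstract combines the strongest outputs of the M networks in a way that respects the expected N- to C-terminal order of the motifs. The abstract does not give the exact scoring formula, so the following is only a minimal sketch under stated assumptions: the combined score is taken to be the plain sum of network outputs, all motifs share one window length, and consecutive motif windows may not overlap. The function name `best_ordered_hits` and the `outputs` layout are hypothetical, and the output values in the example are made up for illustration.

```python
from typing import List, Tuple


def best_ordered_hits(outputs: List[List[float]], window: int) -> Tuple[float, List[int]]:
    """Pick one window start per motif so that motifs appear in N- to C-terminal order.

    outputs[m][p] is the (assumed) output of network m for the window that
    starts at position p of a database sequence. Motif m+1 must start at or
    after the end of motif m's window. The score is the sum of the chosen
    outputs (an assumption, not the paper's formula).
    """
    n_motifs = len(outputs)
    n_pos = len(outputs[0])
    neg_inf = float("-inf")

    # best[m][p]: best total for motifs 0..m with motif m starting at p
    best = [[neg_inf] * n_pos for _ in range(n_motifs)]
    back = [[-1] * n_pos for _ in range(n_motifs)]
    best[0] = list(outputs[0])

    for m in range(1, n_motifs):
        run_best, run_arg = neg_inf, -1  # running maximum over allowed predecessors
        for p in range(n_pos):
            q = p - window  # latest allowed start of the previous motif
            if q >= 0 and best[m - 1][q] > run_best:
                run_best, run_arg = best[m - 1][q], q
            if run_arg >= 0:
                best[m][p] = run_best + outputs[m][p]
                back[m][p] = run_arg

    last = max(range(n_pos), key=lambda p: best[-1][p])
    if best[-1][last] == neg_inf:
        raise ValueError("sequence too short to place all motifs in order")

    # Trace back the chosen start positions from the last motif to the first.
    starts = [last]
    for m in range(n_motifs - 1, 0, -1):
        starts.append(back[m][starts[-1]])
    starts.reverse()
    return best[-1][last], starts


# Illustrative outputs for two motifs over six window positions.
example = [
    [0.125, 0.75, 0.25, 0.125, 0.5, 0.0],  # network for motif 1
    [1.0, 0.125, 0.25, 0.5, 0.125, 0.25],  # network for motif 2
]
score, starts = best_ordered_hits(example, window=2)  # (1.25, [1, 3])
```

In the example, the second network's strongest output is at position 0, but that position precedes every admissible placement of the first motif, so the order constraint selects position 3 instead. Ranking database sequences by this kind of combined score is one plausible reading of the scoring step; the published method may weight or combine the outputs differently.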
Pages: 951-962
Page count: 12