COMPREHENSIVE SEQUENCE-ANALYSIS OF THE 182 PREDICTED OPEN READING FRAMES OF YEAST CHROMOSOME-III

Cited by: 90
Authors
BORK, P
OUZOUNIS, C
SANDER, C
SCHARF, M
SCHNEIDER, R
SONNHAMMER, E
Affiliation
[1] European Molecular Biology Laboratory, Heidelberg
Keywords
COMPUTER METHODS; GENOME PROJECTS; PREDICTION OF PROTEIN FUNCTION; PREDICTION OF PROTEIN STRUCTURE; PROTEIN SEQUENCE ANALYSIS
DOI
10.1002/pro.5560011216
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010 ; 081704 ;
Abstract
With the completion of the first phase of the European yeast genome sequencing project, the complete DNA sequence of chromosome III of Saccharomyces cerevisiae has become available (Oliver, S.G., et al., 1992, Nature 357, 38-46). We have tested the predictive power of computer sequence analysis on the 176 probable protein products of this chromosome, after exclusion of six problem cases. When the results of database similarity searches are pooled with prior knowledge, a likely function can be assigned to 42% of the proteins, and a predicted three-dimensional structure to a third of these (14% of the total). The function of the other 58% remains to be determined. Of these, about one-third have one or more probable transmembrane segments. Among the most interesting proteins with predicted functions are a new member of the type X polymerase family, a transcription factor with an N-terminal DNA-binding domain related to GAL4, a "fork head" DNA-binding domain previously known only in Drosophila and in mammals, and a putative methyltransferase. Our analysis increased the number of known significant sequence similarities on chromosome III by 13, to a total of 67. Although the nearly 40% success rate in identifying unknown protein function by sequence analysis is surprisingly high, the information gap between known protein sequences and unknown function is expected to widen and become a major bottleneck of genome projects in the near future. Based on the experience gained in this test study, we suggest that the development of an automated computer workbench for protein sequence analysis must be an important item in genome projects.
Pages: 1677-1690
Page count: 14