Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins

Cited by: 98
Authors
Jansen, R [1]
Gerstein, M [1]
Affiliations
[1] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06520 USA
Keywords
DOI
10.1093/nar/28.6.1481
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
We analyzed 10 genome expression data sets by large-scale cross-referencing against broad structural and functional categories. The data sets, generated by different techniques (e.g. SAGE and gene chips), provide various representations of the yeast transcriptome (the set of all yeast genes, weighted by transcript abundance). Our analysis enabled us to determine features more prevalent in the transcriptome than the genome: i.e. those that are common to highly expressed proteins. Starting with the simplest categories, we find that, relative to the genome, the transcriptome is enriched in Ala and Gly and depleted in Asn and very long proteins. We find, furthermore, that protein length and maximum expression level have a roughly inverse relationship. To relate expression level and protein structure, we assigned transmembrane helices and known folds (using PSI-BLAST) to each protein in the genome; this allowed us to determine that the transcriptome is enriched in mixed alpha-beta structures and depleted in membrane proteins relative to the genome. In particular, some enzymatic folds, such as the TIM barrel and the G3P dehydrogenase fold, are much more prevalent in the transcriptome than the genome, whereas others, such as the protein-kinase and leucine-zipper folds, are depleted. The TIM barrel, in fact, is overwhelmingly the 'top fold' in the transcriptome, while it only ranks fifth in the genome. The most highly enriched functional categories in the transcriptome (based on the MIPS system) are energy production and protein synthesis, while categories such as transcription, transport and signaling are depleted. Furthermore, for a given functional category, transcriptome enrichment varies quite substantially between the different expression data sets, with a variation an order of magnitude larger than for the other categories cross-referenced (e.g. amino acids).
One can readily see how the enrichment and depletion of the various functional categories relate directly to those of particular folds. Further information can be found at http://bioinfo.mbb.yale.edu/genome/expression.
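
The abstract's central comparison, a category's share of the genome (each gene counted once) against its share of the transcriptome (each gene weighted by transcript abundance), can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy gene labels and abundances, and the choice of the relative difference (T - G) / G as the enrichment measure. It is not presented as the authors' actual procedure or data.

```python
from collections import defaultdict


def category_fractions(gene_category, weights=None):
    """Return each category's fraction of the total weight.

    gene_category: dict mapping gene name -> category label.
    weights: dict mapping gene name -> weight (e.g. transcript abundance).
             If None, every gene counts once, which gives the genome view.
    """
    totals = defaultdict(float)
    for gene, category in gene_category.items():
        weight = 1.0 if weights is None else weights.get(gene, 0.0)
        totals[category] += weight
    grand_total = sum(totals.values())
    return {category: total / grand_total for category, total in totals.items()}


def enrichment(gene_category, expression):
    """Relative difference (T - G) / G per category (assumed measure).

    Positive values mean the category is more prevalent in the
    abundance-weighted transcriptome than in the genome.
    """
    genome = category_fractions(gene_category)
    transcriptome = category_fractions(gene_category, expression)
    return {
        category: (transcriptome.get(category, 0.0) - g) / g
        for category, g in genome.items()
    }


# Hypothetical toy data: four genes in two functional categories.
toy_categories = {
    "geneA": "energy",
    "geneB": "energy",
    "geneC": "transcription",
    "geneD": "transcription",
}
# Hypothetical transcript abundances (arbitrary units).
toy_expression = {"geneA": 60.0, "geneB": 20.0, "geneC": 15.0, "geneD": 5.0}

result = enrichment(toy_categories, toy_expression)
for category, value in sorted(result.items()):
    print(f"{category}: {value:+.2f}")
```

In this toy case both categories hold half of the genes, but the "energy" genes carry most of the transcript abundance, so that category comes out enriched and "transcription" comes out depleted by the same relative amount.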
Pages: 1481-1488
Page count: 8
Related papers
47 in total
  • [1] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [2] ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
  • [3] Arkin IT, 1997, PROTEINS, V28, P465, DOI 10.1002/(SICI)1097-0134(199708)28:4<465::AID-PROT1>3.0.CO;2-9
  • [5] How many membrane proteins are there?
    Boyd, D
    Schierle, C
    Beckwith, J
    [J]. PROTEIN SCIENCE, 1998, 7 (01) : 201 - 205
  • [6] Predicting gene regulatory elements in silico on a genomic scale
    Brazma, A
    Jonassen, I
    Vilo, J
    Ukkonen, E
    [J]. GENOME RESEARCH, 1998, 8 (11) : 1202 - 1215
  • [7] CHAKRABARTTY A, 1994, PROTEIN SCI, V3, P843
  • [8] The transcriptional program of sporulation in budding yeast
    Chu, S
    DeRisi, J
    Eisen, M
    Mulholland, J
    Botstein, D
    Brown, PO
    Herskowitz, I
    [J]. SCIENCE, 1998, 282 (5389) : 699 - 705
  • [9] Cluster analysis and display of genome-wide expression patterns
    Eisen, MB
    Spellman, PT
    Brown, PO
    Botstein, D
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (25) : 14863 - 14868
  • [10] IDENTIFYING NONPOLAR TRANSBILAYER HELICES IN AMINO-ACID-SEQUENCES OF MEMBRANE-PROTEINS
    ENGELMAN, DM
    STEITZ, TA
    GOLDMAN, A
    [J]. ANNUAL REVIEW OF BIOPHYSICS AND BIOPHYSICAL CHEMISTRY, 1986, 15 : 321 - 353