A Bayesian system integrating expression data with sequence patterns for localizing proteins: Comprehensive application to the yeast genome

Times cited: 105
Authors
Drawid, A
Gerstein, M
Affiliations
[1] Yale Univ, Dept Mol Biophys, New Haven, CT 06520 USA
[2] Yale Univ, Dept Biochem, New Haven, CT 06520 USA
[3] Yale Univ, Dept Comp Sci, New Haven, CT 06520 USA
Funding
US National Institutes of Health (NIH);
Keywords
proteomics; bioinformatics; machine learning; cDNA microarray analysis; subcellular localization;
DOI
10.1006/jmbi.2000.3968
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
We develop a probabilistic system for predicting the subcellular localization of proteins and estimating the relative population of the various compartments in yeast. Our system employs a Bayesian approach, updating a protein's probability of being in a compartment based on a diverse range of 30 features. These range from specific motifs (e.g. signal sequences or the HDEL motif) to overall properties of a sequence (e.g. surface composition or isoelectric point) to whole-genome data (e.g. absolute mRNA expression levels or their fluctuations). The strength of our approach is the easy integration of many features, particularly the whole-genome expression data. We construct a training and testing set of ~1300 yeast proteins with an experimentally known localization by merging, filtering, and standardizing the annotation in the MIPS, Swiss-Prot and YPD databases, and we achieve 75% accuracy on individual protein predictions using this dataset. Moreover, we are able to estimate the relative protein population of the various compartments without requiring a definite localization for every protein. This approach, which is based on an analogy to formalism in quantum mechanics, gives better accuracy in determining relative compartment populations than that obtained by simply tallying the localization predictions for individual proteins (on the yeast proteins with known localization, 92% versus 74%). Our training and testing also highlight which of the 30 features are informative and which are redundant (19 being particularly useful). After developing our system, we apply it to the 4700 yeast proteins with currently unknown localization and estimate the relative population of the various compartments in the entire yeast genome. An unbiased prior is essential to this extrapolated estimate; for this, we use the MIPS localization catalogue, and adapt recent results on the localization of yeast proteins obtained by Snyder and colleagues using a minitransposon system. Our final localizations for all ~6000 proteins in the yeast genome are available over the web at: http://bioinfo.mbb.yale.edu/genome/localize. © 2000 Academic Press.
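The two ideas in the abstract, feature-by-feature Bayesian updating and estimating compartment populations without forcing a definite call per protein, can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the function names, the treatment of features as conditionally independent, and the reading of the population estimate as a sum of per-protein probability vectors are not taken from the paper, and the numbers are invented.

```python
def bayes_update(prior, likelihood):
    """One Bayesian step: posterior[c] is proportional to prior[c] * P(feature | c)."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("feature has zero likelihood in every compartment")
    return [u / total for u in unnormalized]


def localize(prior, feature_likelihoods):
    """Apply one update per observed feature (assumes independent features)."""
    posterior = list(prior)
    for likelihood in feature_likelihoods:
        posterior = bayes_update(posterior, likelihood)
    return posterior


def population_by_tally(posteriors):
    """Assign each protein to its most probable compartment, then count."""
    counts = [0] * len(posteriors[0])
    for p in posteriors:
        counts[p.index(max(p))] += 1
    return [c / len(posteriors) for c in counts]


def population_by_summing(posteriors):
    """Sum the probability vectors directly, with no per-protein assignment."""
    n = len(posteriors[0])
    totals = [sum(p[k] for p in posteriors) for k in range(n)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


if __name__ == "__main__":
    # Hypothetical three-compartment example with made-up numbers.
    prior = [0.5, 0.3, 0.2]
    features = [
        [0.1, 0.8, 0.1],  # P(feature 1 observed | compartment)
        [0.4, 0.4, 0.2],  # P(feature 2 observed | compartment)
    ]
    print(localize(prior, features))

    # Three proteins that are each only weakly biased toward compartment 0.
    posteriors = [[0.6, 0.4], [0.6, 0.4], [0.6, 0.4]]
    print(population_by_tally(posteriors))
    print(population_by_summing(posteriors))
```

In the toy case at the end, tallying assigns every protein to the first compartment, while summing retains the minority probability mass. That difference is one plausible reason a non-collapsing estimate could track relative populations more closely, though the sketch does not reproduce the paper's reported figures.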
Pages: 1059 - 1075
Page count: 17
Related papers
62 in total
  • [41] Nielsen H, 1997, Int J Neural Syst, V8, P581, DOI 10.1142/S0129065797000537
  • [42] Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites
    Nielsen, H
    Engelbrecht, J
    Brunak, S
    von Heijne, G
    [J]. PROTEIN ENGINEERING, 1997, 10 (01) : 1 - 6
  • [43] Machine learning approaches for the prediction of signal peptides and other protein sorting signals
    Nielsen, H
    Brunak, S
    von Heijne, G
    [J]. PROTEIN ENGINEERING, 1999, 12 (01) : 3 - 9
  • [44] Using neural networks for prediction of the subcellular location of proteins
    Reinhardt, A
    Hubbard, T
    [J]. NUCLEIC ACIDS RESEARCH, 1998, 26 (09) : 2230 - 2236
  • [45] Rost B, 1994, Proteins, V20, P216
  • [46] Large-scale analysis of the yeast genome by transposon tagging and gene disruption
    Ross-Macdonald, P
    Coelho, PSR
    Roemer, T
    Agarwal, S
    Kumar, A
    Jansen, R
    Cheung, KH
    Sheehan, A
    Symoniatis, D
    Umansky, L
    Heidtman, M
    Nelson, FK
    Iwasaki, H
    Hager, K
    Gerstein, M
    Miller, P
    Roeder, GS
    Snyder, M
    [J]. NATURE, 1999, 402 (6760) : 413 - 418
  • [47] Rost B, 1995, Protein Sci, V4, P521
  • [48] Rost B, 1996, METHOD ENZYMOL, V266, P525
  • [49] Quantitative phenotypic analysis of yeast deletion mutants using a highly parallel molecular bar-coding strategy
    Shoemaker, DD
    Lashkari, DA
    Morris, D
    Mittmann, M
    Davis, RW
    [J]. NATURE GENETICS, 1996, 14 (04) : 450 - 456
  • [50] Predicting the topology of eukaryotic membrane proteins
    Sipos, L
    von Heijne, G
    [J]. EUROPEAN JOURNAL OF BIOCHEMISTRY, 1993, 213 (03) : 1333 - 1340