Mapping gene ontology to proteins based on protein-protein interaction data

Cited by: 104
Authors
Deng, MH [1 ]
Tu, ZD [1 ]
Sun, FZ [1 ]
Chen, T [1 ]
Affiliations
[1] Univ So Calif, Dept Biol Sci, Mol & Computat Biol Program, Los Angeles, CA 90089 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA);
DOI
10.1093/bioinformatics/btg500
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: The Gene Ontology (GO) consortium provides a structured description of protein function that serves as a common language for gene annotation in many organisms. Large-scale techniques have generated many valuable protein-protein interaction datasets that are useful for studying protein function. Combining GO with protein-protein interaction data allows the function of unknown proteins to be predicted.
Results: We apply a Markov random field method to predict yeast protein function from multiple protein-protein interaction datasets. We assign functions to unknown proteins with a probability representing the confidence of each prediction. The functions are drawn from the three general categories defined in GO: cellular component, molecular function and biological process. The yeast proteins are those defined in the Saccharomyces Genome Database (SGD). The protein-protein interaction datasets, which include physical and genetic interactions, are obtained from the Munich Information Center for Protein Sequences (MIPS). The performance of our prediction is measured by applying leave-one-out validation to a functional path matching scheme, which compares the prediction with the GO description of a protein's function from the abstract level to the detailed level along the GO structure. For biological process, leave-one-out validation shows 52% precision and recall for our method, much better than the simple guilt-by-association methods.
Supplementary material: http://www.cmb.usc.edu/~msms/gomapping
Pages: 895-902
Page count: 8
Related papers
33 in total
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]  
[Anonymous], P 7 INT C COMP MOL B
[3]  
Ashburner M, 2001, GENOME RES, V11, P1425
[4]   Gene Ontology: tool for the unification of biology [J].
Ashburner, M ;
Ball, CA ;
Blake, JA ;
Botstein, D ;
Butler, H ;
Cherry, JM ;
Davis, AP ;
Dolinski, K ;
Dwight, SS ;
Eppig, JT ;
Harris, MA ;
Hill, DP ;
Issel-Tarver, L ;
Kasarskis, A ;
Lewis, S ;
Matese, JC ;
Richardson, JE ;
Ringwald, M ;
Rubin, GM ;
Sherlock, G .
NATURE GENETICS, 2000, 25 (01) :25-29
[5]  
Bader GD, 2003, NUCLEIC ACIDS RES, V31, P248, DOI 10.1093/nar/gkg056
[6]   The GRID: The General Repository for Interaction Datasets [J].
Breitkreutz, BJ ;
Stark, C ;
Tyers, M .
GENOME BIOLOGY, 2003, 4 (03)
[7]   Knowledge-based analysis of microarray gene expression data by using support vector machines [J].
Brown, MPS ;
Grundy, WN ;
Lin, D ;
Cristianini, N ;
Sugnet, CW ;
Furey, TS ;
Ares, M ;
Haussler, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (01) :262-267
[8]   YPD™, PombePD™ and WormPD™: model organism volumes of the BioKnowledge™ Library, an integrated resource for protein information [J].
Costanzo, MC ;
Crawford, ME ;
Hirschman, JE ;
Kranz, JE ;
Olsen, P ;
Robertson, LS ;
Skrzypek, MS ;
Braun, BR ;
Hopkins, KL ;
Kondu, P ;
Lengieza, C ;
Lew-Smith, JE ;
Tillberg, M ;
Garrels, JI .
NUCLEIC ACIDS RESEARCH, 2001, 29 (01) :75-79
[9]   First InP/InGaAs PNPHBT grown by metal organic chemical vapor deposition [J].
Cui, DL ;
Hsu, S ;
Pavlidis, D .
2001 INTERNATIONAL CONFERENCE ON INDIUM PHOSPHIDE AND RELATED MATERIALS, CONFERENCE PROCEEDINGS, 2001, :224-227
[10]   Protein interactions - Two methods for assessment of the reliability of high throughput observations [J].
Deane, CM ;
Salwinski, L ;
Xenarios, I ;
Eisenberg, D .
MOLECULAR & CELLULAR PROTEOMICS, 2002, 1 (05) :349-356