An integrated probabilistic model for functional prediction of proteins

Cited by: 85
Authors
Deng, MH [1]
Chen, T [1]
Sun, FZ [1]
Affiliation
[1] Univ So Calif, Dept Biol Sci, Mol & Computat Biol Program, Los Angeles, CA 90089 USA
Keywords
function prediction; Pfam domain; protein-protein interaction; Markov random field; Gibbs sampler
DOI
10.1089/1066527041410346
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
We develop an integrated probabilistic model that combines protein physical interactions, genetic interactions, highly correlated gene expression networks, protein complex data, and the domain structures of individual proteins to predict protein functions. The model extends our previous model for protein function prediction based on Markov random field theory. The model is flexible: other pairwise relationships between proteins and other features of individual proteins can easily be incorporated. Two features distinguish the integrated approach from other available methods for protein function prediction. First, it uses all available sources of information, assigning different weights to different data sources; it is a global approach that takes the whole network into consideration. Second, it assigns each protein a posterior probability of having the function of interest, which indicates how confident we are in assigning that function to the protein. We apply the integrated approach to predict the functions of yeast proteins based on the MIPS protein function classification and on interaction networks derived from MIPS physical and genetic interactions, gene expression profiles, tandem affinity purification (TAP) protein complex data, and protein domain information. We assess the recall and precision of the integrated approach with different sources of information using leave-one-out cross-validation. Compared with using MIPS physical interactions alone, the integrated approach combining all of the information increases recall from 57% to 87% at a precision of 57%, an increase of 30 percentage points.
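The abstract describes a Markov random field whose posterior probabilities are estimated with a Gibbs sampler (both appear in the keywords). The following is a minimal sketch of that general idea for a single function of interest; it is not the authors' model. It assumes a simplified Ising-style parameterization in which the log-odds that protein i has the function equal a baseline `alpha` plus, for each data source k, `+w_k` for every neighbour that has the function and `-w_k` for every neighbour that does not. The function name `gibbs_posterior`, its edge-list input format, and all parameter values are hypothetical; in a real analysis the weights and baseline would be estimated from the data rather than fixed by hand.

```python
import math
import random


def gibbs_posterior(n, networks, weights, alpha, known,
                    n_iter=5000, burn_in=500, seed=0):
    """Estimate P(X_i = 1 | annotations) for every unannotated protein.

    Hypothetical simplified MRF (not the paper's exact parameterization):
      log-odds(X_i = 1 | rest) = alpha + sum_k w_k * sum_{j adjacent to i in source k} (+1 if X_j = 1 else -1)

    n        -- number of proteins, indexed 0..n-1
    networks -- one edge list [(a, b), ...] per data source
    weights  -- one weight per data source (assumed already estimated)
    alpha    -- baseline log-odds of having the function
    known    -- {protein: 0 or 1} for annotated proteins; these stay fixed
    """
    rng = random.Random(seed)

    # Adjacency lists kept separate per data source, so each source has its own weight.
    adj = [[[] for _ in range(n)] for _ in networks]
    for k, edges in enumerate(networks):
        for a, b in edges:
            adj[k][a].append(b)
            adj[k][b].append(a)

    unknown = [i for i in range(n) if i not in known]
    # Annotated labels are clamped; unannotated labels start at random.
    x = [known[i] if i in known else rng.randint(0, 1) for i in range(n)]
    ones = {i: 0 for i in unknown}

    for sweep in range(n_iter):
        for i in unknown:
            # Conditional log-odds of X_i = 1 given the current neighbour labels.
            s = alpha
            for k, w in enumerate(weights):
                for j in adj[k][i]:
                    s += w if x[j] == 1 else -w
            # Numerically safe logistic function.
            if s >= 0:
                p = 1.0 / (1.0 + math.exp(-s))
            else:
                p = math.exp(s) / (1.0 + math.exp(s))
            x[i] = 1 if rng.random() < p else 0
            if sweep >= burn_in:
                ones[i] += x[i]

    kept = n_iter - burn_in
    # Posterior estimate = fraction of post-burn-in sweeps with X_i = 1.
    return {i: ones[i] / kept for i in unknown}


if __name__ == "__main__":
    # Toy chain 0-1-2-3-4 from one data source; proteins 2 and 3 are unannotated.
    post = gibbs_posterior(
        n=5,
        networks=[[(0, 1), (1, 2), (2, 3), (3, 4)]],
        weights=[1.0],
        alpha=0.0,
        known={0: 1, 1: 1, 4: 0},
    )
    for protein, prob in sorted(post.items()):
        print(f"protein {protein}: P(has function) ~ {prob:.2f}")
```

For this toy chain the exact posteriors under the assumed model are 2/(3 + e^-2) ≈ 0.64 for protein 2 and ≈ 0.36 for protein 3, so the sampled estimates should land near those values. The protein adjacent to annotated members receives the higher probability, and every unannotated protein receives a graded confidence score rather than a hard label, which is the property the abstract emphasizes.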
Pages: 463-475
Number of pages: 13
Number of references: 44