Protein classification using probabilistic chain graphs and the Gene Ontology structure

Cited by: 26
Authors
Carroll, Steven [1]
Pavlovic, Vladimir [1]
Affiliations
[1] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ 08854 USA
DOI
10.1093/bioinformatics/btl187
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Motivation: Probabilistic graphical models have previously been developed for protein classification, and in many cases classifications obtained from the Gene Ontology have been used to validate them. In this work we incorporate the structure of the Gene Ontology directly into the graphical representation for protein classification. We present a method in which each protein is represented by a replicate of the Gene Ontology structure, effectively modeling each protein in its own 'annotation space'. Proteins are also connected to one another according to different measures of functional similarity, after which belief propagation is run to make predictions at all ontology terms.

Results: The proposed method was evaluated on a set of 4879 proteins from the Saccharomyces Genome Database whose interactions were also recorded in the GRID project. The results indicate that using the Gene Ontology directly improves predictive ability, outperforming traditional models that do not take advantage of dependencies among functional terms. Averaged over three similarity measures and three subontologies, the accuracy (precision) of positive and negative term predictions increased by 27.8% (2.0%).
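
The construction described in the abstract (one copy of the ontology per protein, links between similar proteins, then belief propagation) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses a chain graph, whereas the sketch below simplifies to a pairwise Markov random field with binary term states. The three-term toy ontology, the potentials `HIER` and `SIM`, the priors, and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict

# Toy ontology (hypothetical terms): child -> parent.
PARENT = {"binding": "molecular_function", "dna_binding": "binding"}
TERMS = ["molecular_function", "binding", "dna_binding"]

# Pairwise potentials indexed [state_of_first][state_of_second]; 0 = negative, 1 = positive.
# Assumed values: HIER[parent][child] penalizes a positive child under a negative parent.
HIER = [[1.0, 0.05], [1.0, 1.0]]
# Assumed values: SIM rewards two similar proteins agreeing on the same term.
SIM = [[2.0, 1.0], [1.0, 2.0]]


def build_graph(proteins, similar_pairs, known):
    """Replicate the ontology per protein and link similar proteins term by term."""
    nodes = [(p, t) for p in proteins for t in TERMS]
    edges = {}
    for p in proteins:
        for child, parent in PARENT.items():
            edges[((p, parent), (p, child))] = HIER
    for p, q in similar_pairs:
        for t in TERMS:
            edges[((p, t), (q, t))] = SIM
    prior = {n: (0.5, 0.5) for n in nodes}  # unannotated: uninformative prior
    for n, positive in known.items():
        prior[n] = (0.01, 0.99) if positive else (0.99, 0.01)
    return nodes, edges, prior


def normalize(v):
    s = v[0] + v[1]
    return (v[0] / s, v[1] / s)


def loopy_bp(nodes, edges, prior, iters=50):
    """Synchronous sum-product loopy belief propagation on a binary pairwise MRF."""
    nbrs = defaultdict(list)
    pot = {}
    for (u, v), psi in edges.items():
        nbrs[u].append(v)
        nbrs[v].append(u)
        pot[(u, v)] = psi
        # Transposed table for messages sent in the reverse direction.
        pot[(v, u)] = [[psi[0][0], psi[1][0]], [psi[0][1], psi[1][1]]]
    msg = {(u, v): (0.5, 0.5) for u in list(nbrs) for v in nbrs[u]}
    for _ in range(iters):
        new = {}
        for (u, v) in msg:
            # Combine u's prior with all incoming messages except the one from v.
            h = list(prior[u])
            for w in nbrs[u]:
                if w != v:
                    h[0] *= msg[(w, u)][0]
                    h[1] *= msg[(w, u)][1]
            psi = pot[(u, v)]
            new[(u, v)] = normalize((h[0] * psi[0][0] + h[1] * psi[1][0],
                                     h[0] * psi[0][1] + h[1] * psi[1][1]))
        msg = new
    beliefs = {}
    for n in nodes:
        b = list(prior[n])
        for w in nbrs[n]:
            b[0] *= msg[(w, n)][0]
            b[1] *= msg[(w, n)][1]
        beliefs[n] = normalize(b)
    return beliefs


# Protein A is annotated with dna_binding; B is similar to A; C has no similar partner.
nodes, edges, prior = build_graph(["A", "B", "C"], [("A", "B")],
                                  {("A", "dna_binding"): True})
beliefs = loopy_bp(nodes, edges, prior)
for n in nodes:
    print(n, round(beliefs[n][1], 3))
```

In this toy setting, the annotation on A raises the belief in A's ancestor terms through the hierarchy edges, and raises B's belief in `dna_binding` relative to the unconnected protein C through the similarity edges. A model that treats each term independently would miss the first of these effects.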
Pages: 1871-1878
Page count: 8