Information-theoretic evaluation of predicted ontological annotations

Cited by: 84
Authors
Clark, Wyatt T. [1 ]
Radivojac, Predrag [1 ]
Affiliations
[1] Indiana Univ, Dept Comp Sci & Informat, Bloomington, IN 47405 USA
Keywords
SEMANTIC SIMILARITY; PROTEIN FUNCTION; GENE ONTOLOGY;
DOI
10.1093/bioinformatics/btt228
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: The development of effective methods for the prediction of ontological annotations is an important goal in computational biology, with protein function prediction and disease gene prioritization gaining wide recognition. Although various algorithms have been proposed for these tasks, evaluating their performance is difficult owing to problems caused both by the structure of biomedical ontologies and biased or incomplete experimental annotations of genes and gene products.
Results: We propose an information-theoretic framework to evaluate the performance of computational protein function prediction. We use a Bayesian network, structured according to the underlying ontology, to model the prior probability of a protein's function. We then define two concepts, misinformation and remaining uncertainty, that can be seen as information-theoretic analogs of precision and recall. Finally, we propose a single statistic, referred to as semantic distance, that can be used to rank classification models. We evaluate our approach by analyzing the performance of three protein function predictors of Gene Ontology terms and provide evidence that it addresses several weaknesses of currently used metrics. We believe this framework provides useful insights into the performance of protein function prediction tools.
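Illustration: the abstract names three quantities (remaining uncertainty, misinformation, semantic distance) without giving formulas. The sketch below shows one way they could be computed and rests on assumptions not stated in this record: each ontology term carries a precomputed information value (the hypothetical `ia` dictionary, standing in for values derived from the Bayesian network prior), remaining uncertainty sums that value over true terms the predictor missed, misinformation sums it over predicted terms that are not true, and semantic distance is the smallest k-norm of the two across decision thresholds. All function and variable names are illustrative.

```python
def remaining_uncertainty(true_terms, predicted_terms, ia):
    """Information in true terms the prediction missed (recall analog)."""
    return sum(ia[t] for t in true_terms - predicted_terms)


def misinformation(true_terms, predicted_terms, ia):
    """Information in predicted terms that are not true (precision analog)."""
    return sum(ia[t] for t in predicted_terms - true_terms)


def semantic_distance(ru_mi_pairs, k=2):
    """Smallest k-norm of (ru, mi) over the supplied decision thresholds.

    Assumption: each pair holds the two quantities at one threshold.
    """
    return min((ru ** k + mi ** k) ** (1.0 / k) for ru, mi in ru_mi_pairs)


# Hypothetical per-term information values, in bits.
ia = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
true_terms = {"A", "B", "C"}
predicted_terms = {"A", "B", "D"}

ru = remaining_uncertainty(true_terms, predicted_terms, ia)  # missed "C"
mi = misinformation(true_terms, predicted_terms, ia)         # wrong "D"
print(ru, mi, semantic_distance([(ru, mi)]))
```

Under these assumptions, a predictor that returns no terms has zero misinformation but maximal remaining uncertainty, which mirrors the precision and recall trade-off the abstract alludes to.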
Pages: 53-61
Page count: 9