A GRAPH-BASED SEMANTIC SIMILARITY MEASURE FOR THE GENE ONTOLOGY

Cited by: 15
Authors
Alvarez, Marco A. [1]
Yan, Changhui [2]
Affiliations
[1] Utah State Univ, Dept Comp Sci, Logan, UT 84322 USA
[2] N Dakota State Univ, Dept Comp Sci, Fargo, ND 58102 USA
Keywords
Semantics; ontology; graph; EXPRESSION;
DOI
10.1142/S0219720011005641
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Existing methods for calculating semantic similarities between pairs of Gene Ontology (GO) terms and gene products often rely on external databases, such as Gene Ontology Annotation (GOA), that annotate gene products using the GO terms. This dependency leads to some limitations in real applications. Here, we present a semantic similarity algorithm (SSA) that relies exclusively on the GO. When calculating the semantic similarity between a pair of input GO terms, SSA takes into account the shortest path between them, the depth of their nearest common ancestor, and a novel similarity score calculated between the definitions of the involved GO terms. In our work, we use SSA to calculate semantic similarities between pairs of proteins by combining pairwise semantic similarities between the GO terms that annotate the involved proteins. The reliability of SSA was evaluated by comparing the resulting semantic similarities between proteins with the functional similarities between proteins derived from expert annotations or sequence similarity. Comparisons with existing state-of-the-art methods showed that SSA is highly competitive. SSA provides a reliable measure of semantic similarity that is independent of external databases of functional-annotation observations.
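The abstract names three ingredients of a term-to-term score (shortest path, depth of the nearest common ancestor, and a definition-based score) and a pairwise combination step for proteins, but it does not give the formulas. The following is a minimal Python sketch of how such ingredients could be computed on a toy graph. Everything in it is an assumption made for illustration: the term names and the `PARENTS` graph are hypothetical, the path is restricted to routes through a common ancestor, depth is taken as the shortest distance to the root, the word-overlap score merely stands in for the paper's definition-based score, `toy_term_sim` is a placeholder rather than the SSA formula, and the best-match average is one common way to combine pairwise scores, not necessarily the one the authors use.

```python
from collections import deque

# Hypothetical toy is_a graph (child -> parents); these are not real GO identifiers.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A"],
    "E": ["C", "B"],
}

def ancestors_with_distance(term):
    """Map every ancestor of term (and term itself) to its minimum upward edge count."""
    dist = {term: 0}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for parent in PARENTS[node]:
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist

def depth(term):
    """Assumed definition of depth: shortest distance from term up to the root."""
    return ancestors_with_distance(term)["root"]

def path_and_nca_depth(t1, t2):
    """Return (shortest path length via a common ancestor, depth of that ancestor)."""
    d1, d2 = ancestors_with_distance(t1), ancestors_with_distance(t2)
    common = set(d1) & set(d2)
    # Nearest common ancestor: smallest combined distance, ties broken by greater depth.
    nca = min(common, key=lambda a: (d1[a] + d2[a], -depth(a)))
    return d1[nca] + d2[nca], depth(nca)

def definition_overlap(def1, def2):
    """Jaccard overlap of word sets; a stand-in, not the paper's definition score."""
    w1, w2 = set(def1.lower().split()), set(def2.lower().split())
    union = w1 | w2
    return len(w1 & w2) / len(union) if union else 0.0

def toy_term_sim(t1, t2):
    """Placeholder term score (NOT the SSA formula): deeper ancestor, shorter path -> higher."""
    shortest, nca_depth = path_and_nca_depth(t1, t2)
    return (nca_depth + 1) / (nca_depth + 1 + shortest)

def protein_similarity(terms1, terms2, term_sim):
    """Best-match average over the two annotation sets (an assumed combination rule)."""
    best1 = [max(term_sim(a, b) for b in terms2) for a in terms1]
    best2 = [max(term_sim(a, b) for a in terms1) for b in terms2]
    return (sum(best1) + sum(best2)) / (len(best1) + len(best2))
```

Calling `path_and_nca_depth("C", "D")` on this toy graph walks both terms up to their shared parent `A`, and `protein_similarity(["C"], ["C", "D"], toy_term_sim)` averages each term's best match in the other set. Note that none of these functions consults an annotation database, which is the property the abstract emphasizes.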
Pages: 681-695
Page count: 15