Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation

Cited by: 529
Authors
Lord, PW [1]
Stevens, RD [1]
Brass, A [1]
Goble, CA [1]
Affiliations
[1] Univ Manchester, Dept Comp Sci, Manchester M13 9PL, Lancs, England
Funding
UK Engineering and Physical Sciences Research Council; UK Biotechnology and Biological Sciences Research Council
DOI
10.1093/bioinformatics/btg153
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Many bioinformatics data resources hold data not only in the form of sequences, but also as annotation. In the majority of cases, annotation is written as scientific natural language: this is suitable for humans, but not particularly useful for machine processing. Ontologies offer a mechanism by which knowledge can be represented in a form capable of such processing. In this paper we investigate the use of ontological annotation to measure the similarities in knowledge content, or 'semantic similarity', between entries in a data resource. This allows a bioinformatician to perform a similarity measure over annotation in a manner analogous to those performed over sequences. A measure of semantic similarity for the knowledge component of bioinformatics resources should afford a biologist a new tool in their repertoire of analyses.
Results: We present the results from experiments that investigate the validity of using semantic similarity by comparison with sequence similarity. We show a simple extension that enables a semantic search of the knowledge held within sequence databases.
Pages: 1275-1283
Page count: 9