A graph-search framework for associating gene identifiers with documents

Cited by: 13
Authors
Cohen, William W.
Minkov, Einat
Affiliations
[1] Carnegie Mellon Univ, Dept Machine Learning, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
[3] Carnegie Mellon Univ, Ctr Bioimage Informat, Pittsburgh, PA 15213 USA
DOI
10.1186/1471-2105-7-440
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: One step in the model organism database curation process is to find, for each article, the identifier of every gene discussed in the article. We consider a relaxation of this problem suitable for semi-automated systems, in which each article is associated with a ranked list of possible gene identifiers, and experimentally compare methods for solving this geneId-ranking problem. In addition to baseline approaches based on combining named entity recognition (NER) systems with a "soft dictionary" of gene synonyms, we evaluate a graph-based method that combines the outputs of multiple NER systems, as well as other sources of information, and a learning method for reranking the output of the graph-based method.
Results: We show that NER systems with similar F-measure performance can have significantly different performance when used with a soft dictionary for geneId-ranking. The graph-based approach can outperform any of its component NER systems, even without learning, and learning can further improve the performance of the graph-based ranking approach.
Conclusion: The utility of an NER system for geneId-finding may not be accurately predicted by its entity-level F1 performance, the most common performance measure. GeneId-ranking systems are best implemented by combining several NER systems. With appropriate combination methods, usefully accurate geneId-ranking systems can be constructed based on easily available resources, without resorting to problem-specific, engineered components.
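The baseline idea in the abstract, matching NER output against a "soft dictionary" of gene synonyms to produce a ranked list of identifiers, can be illustrated with a minimal sketch. This is not the paper's method or its graph-based framework: the token-level Jaccard similarity is an assumed stand-in for whatever soft string match the authors used, the summed-score combination across NER systems is an assumption, and all function names, gene identifiers, synonyms, and mentions below are hypothetical.

```python
from collections import defaultdict


def tokens(name):
    """Lowercase a name and split it into a set of alphanumeric tokens."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in name)
    return set(cleaned.split())


def soft_match(mention, synonym):
    """Token-level Jaccard similarity (an assumed stand-in for a soft match)."""
    a, b = tokens(mention), tokens(synonym)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_gene_ids(mentions_by_system, synonyms_by_gene_id):
    """Rank gene identifiers by the summed best-synonym score of every mention."""
    scores = defaultdict(float)
    for mentions in mentions_by_system.values():
        for mention in mentions:
            for gene_id, synonyms in synonyms_by_gene_id.items():
                best = max((soft_match(mention, s) for s in synonyms), default=0.0)
                scores[gene_id] += best
    # Drop identifiers that matched nothing; sort by score, then by identifier.
    ranked = [(g, s) for g, s in scores.items() if s > 0]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy data: identifiers, synonyms, and NER outputs are invented.
synonyms_by_gene_id = {
    "GENE:0001": ["alpha kinase 1", "ak1"],
    "GENE:0002": ["beta kinase 2", "bk2"],
    "GENE:0003": ["gamma receptor"],
}
mentions_by_system = {
    "ner_a": ["alpha kinase", "bk2"],
    "ner_b": ["alpha kinase 1"],
}
ranking = rank_gene_ids(mentions_by_system, synonyms_by_gene_id)
```

Summing scores over several NER systems means an identifier supported by more than one system rises in the list, which loosely mirrors the abstract's point that combining NER systems helps; the actual combination in the paper is done through graph search and learned reranking, which this sketch does not attempt.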
Pages: 16