Exploiting the performance of dictionary-based bio-entity name recognition in biomedical literature

Cited by: 110
Authors
Yang, Zhihao [1 ]
Lin, Hongfei [1 ]
Li, Yanpeng [1 ]
Affiliations
[1] Dalian Univ Technol, Dept Comp Sci & Engn, Dalian 116023, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
text mining; entity recognition; edit distance; conditional random fields
DOI
10.1016/j.compbiolchem.2008.03.008
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07 [Science]; 0710 [Biology]; 09 [Agriculture]
Abstract
Bio-entity name recognition is the key step for information extraction from biomedical literature. This paper presents a dictionary-based bio-entity name recognition approach. The approach expands the bio-entity name dictionary via the Abbreviation Definitions identifying algorithm, improves the recall rate through the improved edit distance algorithm and adopts some post-processing methods including Pre-keyword and Post-keyword expansion, Part of Speech expansion, merge of adjacent bio-entity names and the exploitation of the contextual cues to further improve the performance. Experiment results show that with this approach even an internal dictionary-based system could achieve a fairly good performance. (C) 2008 Elsevier Ltd. All rights reserved.
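As an illustration of the edit-distance matching step mentioned in the abstract, the following is a minimal sketch of approximate dictionary lookup. It uses the classic unit-cost Levenshtein distance, not the paper's improved edit distance algorithm, whose details are not given here. The function names, the sample dictionary, and the max_ratio threshold are all hypothetical choices made for the example.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance with unit costs (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def match_dictionary(candidate: str,
                     dictionary: Iterable[str],
                     max_ratio: float = 0.2) -> List[str]:
    """Return entries whose length-normalised distance to the candidate is
    at most max_ratio (an assumed threshold), closest first."""
    cand = candidate.lower()
    hits = []
    for entry in dictionary:
        d = edit_distance(cand, entry.lower())
        if d / max(len(cand), len(entry)) <= max_ratio:
            hits.append((d, entry))
    return [entry for _, entry in sorted(hits)]


if __name__ == "__main__":
    # Hypothetical three-entry dictionary, for illustration only.
    names = ["NF-kappa B", "interleukin-2", "p53"]
    print(match_dictionary("NF-kappaB", names))  # -> ['NF-kappa B']
```

Normalising by string length keeps short names such as "p53" from matching unrelated short tokens. A tolerant threshold of this kind is one way a dictionary-based recogniser might trade some precision for recall.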
Pages: 287-291
Page count: 5