Exploiting the contextual cues for bio-entity name recognition in biomedical literature

Cited by: 16
Authors
Yang, Zhihao [1]
Lin, Hongfei [1]
Li, Yanpeng [1]
Affiliations
[1] Dalian Univ Technol, Dept Comp Sci & Engn, Dalian 116023, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
text mining; information extraction; named entity recognition; conditional random fields; contextual cue
DOI
10.1016/j.jbi.2008.01.002
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
To extract biomedical information about bio-entities from the vast biomedical literature, the first key step is recognizing their names in the text, which remains a challenging task because of the irregularities and ambiguities in bio-entity nomenclature. The recognition performance of the currently popular methods, machine learning techniques, still leaves considerable room for improvement. This paper presents a Conditional Random Field-based approach for recognizing the names of bio-entities, including genes, proteins, cell types and cell lines, and studies methods of improving its performance by exploiting contextual cues, including bracket pairs, heuristic syntax structures and interaction-word cues. Experimental results on both the JNLPBA2004 and BioCreative2004 task 1A datasets show that these methods can improve Conditional Random Field-based recognition performance by more than 2 points in F-score. (C) 2008 Elsevier Inc. All rights reserved.
Pages: 580-587
Page count: 8