Experimental study on a two phase method for biomedical named entity recognition

Cited by: 7
Authors
Kim, Seonho [1]
Yoon, Juntae
Affiliations
[1] Sogang Univ, Seoul 121742, South Korea
[2] Damusoft Inc, Seoul, South Korea
Source
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2007 / Vol. E90-D / No. 7
Keywords
information extraction; named entity recognition; two-phase model; ME; CRF; SVM; FST
DOI
10.1093/ietisy/e90-d.7.1103
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this paper, we describe a two-phase method for biomedical named entity recognition that consists of term boundary detection and biomedical category labeling. Term boundary detection can be defined as a sequence labeling task that assigns a label sequence to a given sentence, whereas biomedical category labeling can be viewed as a local classification problem that does not require knowledge of the labels of other named entities in the sentence. The advantage of dividing the recognition process into two phases is that the effectiveness of the models can be measured at each phase, and an appropriate model can be selected separately for each subtask. To obtain better performance in biomedical named entity recognition, we conducted comparative experiments using several learning methods at each phase. In addition, the outputs of these machine-learning-based models are refined by rule-based postprocessing. We tested our methods on the JNLPBA 2004 shared task and the GENIA corpus.
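The two-phase decomposition described in the abstract can be illustrated with a minimal Python sketch. It assumes that phase 1 has already produced boundary-only B/I/O tags (in practice these would come from a trained sequence labeler such as a CRF; here they are hard-coded), and it replaces the learned phase-2 classifier with a hypothetical rule-based stand-in, `head_word_classifier`. The function names, the example sentence, and the labeling rules are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # [start, end) token offsets of a detected term


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Phase 1 output: turn a boundary-only B/I/O tag sequence into term spans.

    Tags carry no category; an I tag that does not continue an open term
    is treated as starting a new term.
    """
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:  # a B tag closes the previous term
                spans.append((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i))
            start = None
        # tag == "I" with an open term: the current term simply extends
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def head_word_classifier(term: List[str]) -> str:
    """HYPOTHETICAL stand-in for a learned local classifier (e.g. ME or SVM).

    Looks only at the term's last token; the rules are illustrative only.
    """
    head = term[-1].lower()
    if head in {"cell", "cells"}:
        return "cell_type"
    if head in {"gene", "promoter"}:
        return "DNA"
    if head in {"protein", "receptor"} or head.endswith("ase"):
        return "protein"
    return "protein"  # hypothetical fallback label


def two_phase_ner(
    tokens: List[str],
    tags: List[str],
    classify: Callable[[List[str]], str],
) -> List[Tuple[int, int, str]]:
    """Phase 2: label each detected term independently of the others."""
    entities = []
    for start, end in bio_to_spans(tags):
        # The classifier sees only this term, never the labels of other terms.
        entities.append((start, end, classify(tokens[start:end])))
    return entities


tokens = ["IL-2", "gene", "expression", "requires", "NF-kappa", "B",
          "in", "activated", "T", "cells"]
tags = ["B", "I", "O", "O", "B", "I", "O", "O", "B", "I"]
print(two_phase_ner(tokens, tags, head_word_classifier))
# → [(0, 2, 'DNA'), (4, 6, 'protein'), (8, 10, 'cell_type')]
```

Because `two_phase_ner` receives the classifier as a parameter and consumes the boundary tags as plain input, either phase can be replaced and evaluated on its own, which mirrors the abstract's point that a model can be chosen separately for each subtask.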
Pages: 1103-1110
Number of pages: 8