Two-phase biomedical named entity recognition using CRFs

Cited by: 64
Authors
Li, Lishuang [1]
Zhou, Rongpeng [1]
Huang, Degen [1]
Affiliation
[1] Dalian University of Technology, Department of Computer Science and Engineering, Dalian 116023, People's Republic of China
Keywords
Text mining; Biomedical named entity recognition; Named entity detection; Named entity classification; Conditional random fields; TEXT
DOI
10.1016/j.compbiolchem.2009.07.004
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
As a fundamental step in biomedical text mining, Biomedical Named Entity Recognition (Bio-NER) remains a challenging task. This paper explores a two-phase approach to identifying biomedical entities, in which the recognition task is divided into two subtasks handled in separate phases: Named Entity Detection (NED) and Named Entity Classification (NEC). In the first phase, a Conditional Random Fields (CRFs) model identifies each named entity without determining its type; in the second phase, another CRFs model determines the correct entity type for each identified entity. This division significantly reduces training time and allows more relevant features to be selected for each subtask. To improve performance further, post-processing algorithms are applied before the NEC subtask. Experiments on the JNLPBA2004 datasets show that the two-phase approach achieves an F-score of 74.31%, which outperforms most state-of-the-art systems. © 2009 Elsevier Ltd. All rights reserved.
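The split into NED and NEC described in the abstract can be illustrated at the level of training labels. The following is a minimal sketch, not the authors' implementation: it assumes BIO-style typed tags (for example `B-protein`, `I-DNA`, `O`), which the abstract does not specify, and it contains no CRF model at all. The function names `to_detection_labels`, `extract_spans`, and `classification_examples` are hypothetical helpers introduced only for this illustration.

```python
from typing import List, Tuple


def to_detection_labels(tags: List[str]) -> List[str]:
    """Phase 1 (NED) targets: drop the entity type, keep only B/I/O.

    Assumes BIO-style tags such as 'B-protein' or 'I-DNA' (an assumption,
    not stated in the abstract).
    """
    return [t.split("-", 1)[0] for t in tags]


def extract_spans(bio: List[str]) -> List[Tuple[int, int]]:
    """Collect (start, end) token spans, end exclusive, from untyped B/I/O labels."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, label in enumerate(bio):
        if label == "B" or (label == "I" and start is None):
            # A new entity begins; close any entity that is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif label == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(bio)))
    return spans


def classification_examples(
    tokens: List[str], tags: List[str]
) -> List[Tuple[str, str]]:
    """Phase 2 (NEC) training pairs: (entity text, entity type) per detected span."""
    examples = []
    for start, end in extract_spans(to_detection_labels(tags)):
        entity_type = tags[start].split("-", 1)[1]
        examples.append((" ".join(tokens[start:end]), entity_type))
    return examples


# Toy sentence with made-up annotations, for illustration only.
tokens = ["IL-2", "gene", "expression", "requires", "NF-kappa", "B"]
tags = ["B-DNA", "I-DNA", "O", "O", "B-protein", "I-protein"]

ned_labels = to_detection_labels(tags)
nec_pairs = classification_examples(tokens, tags)
```

In this sketch the first model would be trained on `ned_labels` (three labels only), and the second on `nec_pairs`. With a smaller label set in the detection phase, each model has fewer label transitions to learn, which is one plausible reading of the training-time reduction the abstract reports.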
Pages: 334-338
Page count: 5