A system for identifying named entities in biomedical text: how results from two evaluations reflect on both the system and the evaluations

Cited by: 26
Authors
Dingare, S
Nissim, M
Finkel, J
Manning, C
Grover, C
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Univ Edinburgh, Inst Communicating & Collaborat Syst, Edinburgh EH8 9LW, Midlothian, Scotland
Source
COMPARATIVE AND FUNCTIONAL GENOMICS | 2005 / Vol. 6 / Issue 1-2
Keywords
automatic named entity recognition; manual annotation of biomedical texts;
DOI
10.1002/cfg.457
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
We present a maximum entropy-based system for identifying named entities (NEs) in biomedical abstracts and present its performance in the only two biomedical named entity recognition (NER) comparative evaluations that have been held to date, namely BioCreative and Coling BioNLP. Our system obtained an exact match F-score of 83.2% in the BioCreative evaluation and 70.1% in the BioNLP evaluation. We discuss our system in detail, including its rich use of local features, attention to correct boundary identification, innovative use of external knowledge resources, including parsing and web searches, and rapid adaptation to new NE sets. We also discuss in depth problems with data annotation in the evaluations which caused the final performance to be lower than optimal. Copyright (c) 2005 John Wiley & Sons, Ltd.
Pages: 77-85
Page count: 9
Related papers
26 in total
[11]  
FUKUDA K, 1998, PAC S BIOCOMPUT, V3, P705
[12]  
GREFENSTETTE G, 1999, P ASLIB 99 TRANSL CO, V21
[13]  
HIRSCHMAN L, 2003, USING BIOL RESOURCES
[14]   Using the Web to obtain frequencies for unseen bigrams [J].
Keller, F ;
Lapata, M .
COMPUTATIONAL LINGUISTICS, 2003, 29 (03) :459-484
[15]  
Kilgarriff A., 1997, International Journal of Lexicography, V10, P135, DOI 10.1093/ijl/10.2.135
[16]   Accurate unlexicalized parsing [J].
Klein, D ;
Manning, CD .
41ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, 2003, :423-430
[17]  
Klein D., 2003, P 7 C NATURAL LANGUA, P180
[18]  
KOICHI T, 2003, P WORKSH NAT LANG PR, P7
[19]  
Makino T., 2002, ACL-02 Workshop on Natural Language Processing in the Biomedical Domain, P1, DOI 10.3115/1118149.1118150
[20]  
MARKERT K, 2003, P EACL WORKSH COMP T, P39