Rich features based Conditional Random Fields for biological named entities recognition

Cited by: 30
Authors
Sun, Chengjie [1]
Guan, Yi [1]
Wang, Xiaolong [1]
Lin, Lei [1]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci, Harbin 150001, Heilongjiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Conditional Random Fields; named entities recognition; chunking; sequential labeling problem; text mining;
DOI
10.1016/j.compbiomed.2006.12.002
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Biological named entity recognition is a critical task for automatically mining knowledge from biological literature. In this paper, the task is cast as a sequential labeling problem, and a Conditional Random Fields model is introduced to solve it. Within the Conditional Random Fields framework, rich features are used, including literal, contextual, and semantic features. Among these, shallow syntactic features are introduced for the first time, and they effectively improve the model's performance. Experiments show that the method achieves an F-measure of 71.2% on an open evaluation data set, which is better than most state-of-the-art systems. © 2007 Elsevier Ltd. All rights reserved.
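As an illustration of the sequential labeling formulation described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes BIO-style labels, and the feature names, the entity types (`DNA`, `protein`), and the helpers `token_features` and `bio_to_spans` are hypothetical choices made for this example. Each token is mapped to a feature dictionary that combines literal, context, and shallow syntactic (chunk tag) information, which is the kind of input a CRF toolkit typically consumes. The semantic features mentioned in the abstract are not covered here.

```python
from typing import Dict, List, Tuple


def token_features(tokens: List[str], chunks: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for token i (feature names are illustrative only)."""
    word = tokens[i]
    feats: Dict[str, object] = {
        # Literal (orthographic) features
        "word.lower": word.lower(),
        "has_digit": any(c.isdigit() for c in word),
        "has_upper": any(c.isupper() for c in word),
        "has_hyphen": "-" in word,
        "suffix3": word[-3:].lower(),
        # Shallow syntactic feature: chunk tag assumed to come from an external chunker
        "chunk": chunks[i],
    }
    # Context features: neighbouring words within a window of one token
    feats["prev.lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next.lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


def bio_to_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO labels into (start, end_exclusive, type) entity spans."""
    spans: List[Tuple[int, int, str]] = []
    start, kind = None, None
    for i, label in enumerate(labels + ["O"]):  # sentinel "O" flushes the last span
        inside = label.startswith("I-") and kind == label[2:]
        if start is not None and not inside:
            spans.append((start, i, kind))
            start, kind = None, None
        if label.startswith("B-"):
            start, kind = i, label[2:]
    return spans


# Hypothetical example sentence with hand-assigned chunk tags and entity labels
tokens = ["IL-2", "gene", "expression", "requires", "NF-kappa", "B"]
chunks = ["B-NP", "I-NP", "I-NP", "B-VP", "B-NP", "I-NP"]
labels = ["B-DNA", "I-DNA", "O", "O", "B-protein", "I-protein"]

features = [token_features(tokens, chunks, i) for i in range(len(tokens))]
spans = bio_to_spans(labels)
```

In this sketch, a trained CRF would predict the `labels` sequence from `features`, and `bio_to_spans` would recover entity mentions from the predicted labels, which is the unit on which an F-measure is usually computed.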
Pages: 1327-1333
Page count: 7
Related papers
19 items in total
  • [1] [Anonymous], 2002, THESIS U EDINBURGH
  • [2] BURR S, 2004, JOINT WORKSH NAT LAN, P104
  • [3] A survey of current work in biomedical text mining
    Cohen, AM
    Hersh, WR
    [J]. BRIEFINGS IN BIOINFORMATICS, 2005, 6 (01) : 57 - 71
  • [4] Data preparation and interannotator agreement: BioCreAtIvE task IB
    Colosimo, ME
    Morgan, AA
    Yeh, AS
    Colombe, JB
    Hirschman, L
    [J]. BMC BIOINFORMATICS, 2005, 6 (Suppl 1)
  • [5] Finkel J., 2004, Joint Workshop on Natural Language Processing in Biomedicine and its Applications JNLPBA, P91
  • [6] GuoDong Z, 2004, P INT JOINT WORKSH N, P96, DOI DOI 10.3115/1567594.1567616
  • [7] HIRSCHMAN L, 2005, BMC BIOINFORMATIC S1, V6
  • [8] Kim J-D, 2004, P INT JOINT WORKSHOP, P70
  • [9] High-recall protein entity recognition using a dictionary
    Kou, ZZ
    Cohen, WW
    Murphy, RF
    [J]. BIOINFORMATICS, 2005, 21 : I266 - I273
  • [10] Lafferty J., 2001, PROC 18 INT C MACHIN, DOI [DOI 10.1038/NPROT.2006.61, 10.1038/nprot.2006.61]