Exploring the boundaries: gene and protein identification in biomedical text

Cited by: 48
Authors
Finkel, J
Dingare, S
Manning, CD [1]
Nissim, M
Alex, B
Grover, C
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[2] Univ Edinburgh, Inst Communicating & Collaborat Syst, Edinburgh EH8 9YL, Midlothian, Scotland
Keywords
External Resource; Named Entity Recognition; Biomedical Domain; Biomedical Text; GENIA Corpus
DOI
10.1186/1471-2105-6-S1-S5
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools. Methods: We present a maximum-entropy based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts. Results: This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and recall of 0.84 in the "open" evaluation and a precision of 0.78 and recall of 0.85 in the "closed" evaluation. Conclusion: Central contributions are rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources including full MEDLINE abstracts and web searches.
Pages: 9