Beyond genes, proteins, and abstracts: Identifying scientific claims from full-text biomedical articles

Citations: 45
Authors
Blake, Catherine [1]
Affiliations
[1] Univ Illinois, Sch Lib & Informat Sci, Champaign, IL 61820 USA
Funding
US National Science Foundation;
Keywords
Information extraction; Relationship extraction; Biomedical informatics; Scientific discovery; Text mining; Natural language processing; Corpus annotation; KNOWLEDGE;
DOI
10.1016/j.jbi.2009.11.001
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Massive increases in electronically available text have spurred a variety of natural language processing methods to automatically identify relationships from text; however, existing annotated collections comprise only bioinformatics (gene-protein) or clinical informatics (treatment-disease) relationships. This paper introduces the Claim Framework that reflects how authors across the biomedical spectrum communicate findings in empirical studies. The Framework captures different levels of evidence by differentiating between explicit and implicit claims, and by capturing under-specified claims such as correlations, comparisons, and observations. The results from 29 full-text articles show that authors report fewer than 7.84% of scientific claims in an abstract, thus revealing the urgent need for text mining systems to consider the full text of an article rather than just the abstract. The results also show that authors typically report explicit claims (77.12%) rather than observations (9.23%), correlations (5.39%), comparisons (5.11%) or implicit claims (2.7%). Informed by the initial manual annotations, we introduce an automated approach that uses syntax and semantics to identify explicit claims automatically and measure the degree to which each feature contributes to the overall precision and recall. Results show that a combination of semantics and syntax is required to achieve the best system performance. (C) 2009 Elsevier Inc. All rights reserved.
Pages: 173-189
Page count: 17