Biomedical negation scope detection with conditional random fields

Cited by: 44
Authors
Agarwal, Shashank
Yu, Hong [1, 2]
Affiliations
[1] Univ Wisconsin, Dept Hlth Sci, Milwaukee, WI 53201 USA
[2] Univ Wisconsin, Dept Comp Sci, Milwaukee, WI 53201 USA
DOI
10.1136/jamia.2010.003228
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Objective: Negation is a linguistic phenomenon that marks the absence of an entity or event. Negated events are frequently reported in both biological literature and clinical notes. Text mining applications benefit from the detection of negation and its scope. However, due to the complexity of language, identifying the scope of negation in a sentence is not a trivial task.
Design: Conditional random fields (CRF), a supervised machine-learning algorithm, were used to train models to detect negation cue phrases and their scope in both biological literature and clinical notes. The models were trained on the publicly available BioScope corpus.
Measurement: The performance of the CRF models in identifying negation cue phrases and their scope was evaluated by calculating recall, precision and F1-score. The models were compared with four competitive baseline systems.
Results: The best CRF-based model performed statistically better than all baseline systems and NegEx, achieving F1-scores of 98% and 95% on detecting negation cue phrases and their scope, respectively, in clinical notes, and F1-scores of 97% and 85% on detecting negation cue phrases and their scope, respectively, in biological literature.
Conclusions: This approach is robust, as it can identify negation scope in both biological and clinical text. To benefit text mining applications, the system is publicly available as a Java API and as an online application at http://negscope.askhermes.org.
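The recall, precision and F1-score named in the abstract can be illustrated with a small sketch. It assumes token-level scoring and a hypothetical two-label scheme ("IN" for tokens inside a negation scope, "O" for all others); the abstract does not state the granularity or label set the authors used, and the example sentence and predictions are invented for illustration.

```python
from typing import List, Tuple


def precision_recall_f1(
    gold: List[str], predicted: List[str], positive: str = "IN"
) -> Tuple[float, float, float]:
    """Token-level precision, recall and F1 for one positive label.

    The "IN"/"O" labels are an assumption made for this sketch,
    not the paper's documented tagging scheme.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Invented example: the cue "no" opens a scope covering "no fever or cough".
tokens = ["The", "patient", "has", "no", "fever", "or", "cough", "."]
gold = ["O", "O", "O", "IN", "IN", "IN", "IN", "O"]
# A hypothetical system output that ends the scope too early.
pred = ["O", "O", "O", "IN", "IN", "O", "O", "O"]

precision, recall, f1 = precision_recall_f1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this invented case every predicted scope token is correct, but half of the gold scope is missed, so precision is high while recall and F1 are lower. This is the kind of trade-off the F1-score summarizes.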
Pages: 696-701
Page count: 6