Cue-based assertion classification for Swedish clinical text – Developing a lexicon for pyConTextSwe

Cited by: 7
Authors
Velupillai, Sumithra [1]
Skeppstedt, Maria [1]
Kvist, Maria [1, 2]
Mowery, Danielle [3]
Chapman, Brian E. [4]
Dalianis, Hercules [1]
Chapman, Wendy W. [5]
Affiliations
[1] Stockholm Univ, Dept Comp & Syst Sci DSV, S-16440 Kista, Sweden
[2] Karolinska Inst, Dept Learning Informat Management & Eth LIME, Solna, Sweden
[3] Univ Pittsburgh, Dept Biomed Informat, Pittsburgh, PA 15206 USA
[4] Univ Utah, Dept Radiol, Salt Lake City, UT 84108 USA
[5] Univ Utah, Dept Biomed Informat, Salt Lake City, UT 84112 USA
Keywords
Assertion classification; Clinical text mining; Dictionaries; Medical Language Processing; Information extraction; Electronic health records; NEGATION; LANGUAGE
DOI
10.1016/j.artmed.2014.01.001
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Objective: The ability of a cue-based system to accurately assert whether a disorder is affirmed, negated, or uncertain depends, in part, on its cue lexicon. In this paper, we continue our study of porting an assertion system (pyConTextNLP) from English to Swedish (pyConTextSwe) by creating an optimized assertion lexicon for clinical Swedish.
Methods and material: We integrated cues from four external lexicons, along with generated inflections and combinations. We used subsets of a clinical corpus in Swedish. We applied four assertion classes (definite existence, probable existence, probable negated existence and definite negated existence) and two binary classes (existence yes/no and uncertainty yes/no) to pyConTextSwe. We compared pyConTextSwe's performance with and without the added cues on a development set, and improved the lexicon further after an error analysis. On a separate evaluation set, we calculated the system's final performance.
Results: Following integration steps, we added 454 cues to pyConTextSwe. The optimized lexicon developed after an error analysis resulted in statistically significant improvements on the development set (83% F-score, overall). The system's final F-score on the evaluation set was 81% (overall). For the individual assertion classes, F-score results were 88% (definite existence), 81% (probable existence), 55% (probable negated existence), and 63% (definite negated existence). For the binary classifications existence yes/no and uncertainty yes/no, final system performance was 97%/87% and 78%/86% F-score, respectively.
Conclusions: We have successfully ported pyConTextNLP to Swedish (pyConTextSwe). We have created an extensive and useful assertion lexicon for Swedish clinical text, which could form a valuable resource for similar studies, and which is publicly available. © 2014 The Authors. Published by Elsevier B.V. All rights reserved.
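As a rough illustration of how the four assertion classes relate to the two binary classes and how a per-class F-score is computed, the following Python sketch may help. It is not taken from pyConTextSwe: the label strings, the function names, and the mapping itself (existence "yes" for the two non-negated classes, uncertainty "yes" for the two "probable" classes) are assumptions inferred from the class names in the abstract.

```python
# Hypothetical label names; the paper's own identifiers may differ.
FOUR_CLASSES = (
    "definite_existence",
    "probable_existence",
    "probable_negated_existence",
    "definite_negated_existence",
)


def to_existence(label):
    """Collapse a four-class label to existence yes/no (assumed mapping)."""
    if label not in FOUR_CLASSES:
        raise ValueError(f"unknown assertion label: {label}")
    return "yes" if label in ("definite_existence", "probable_existence") else "no"


def to_uncertainty(label):
    """Collapse a four-class label to uncertainty yes/no (assumed mapping)."""
    if label not in FOUR_CLASSES:
        raise ValueError(f"unknown assertion label: {label}")
    return "yes" if label.startswith("probable") else "no"


def f_score(gold, predicted, target):
    """Balanced F-score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only (not from the paper's corpus).
gold = list(FOUR_CLASSES)
predicted = [
    "definite_existence",
    "definite_existence",  # a probable case misread as definite
    "probable_negated_existence",
    "definite_negated_existence",
]

four_class_f = f_score(gold, predicted, "definite_existence")
existence_f = f_score(
    [to_existence(g) for g in gold],
    [to_existence(p) for p in predicted],
    "yes",
)
uncertainty_f = f_score(
    [to_uncertainty(g) for g in gold],
    [to_uncertainty(p) for p in predicted],
    "yes",
)
```

In this toy data the single error confuses two classes on the same side of the existence split, so it lowers the four-class and uncertainty scores but leaves the existence score untouched. Binary scores can therefore sit above the four-class scores for this kind of error.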
Pages: 137-144
Number of pages: 8