Mayo Clinic NLP system for patient smoking status identification

Cited by: 72
Authors
Savova, Guergana K. [1]
Ogren, Philip V. [1]
Duffy, Patrick H. [1]
Buntrock, James D. [1]
Chute, Christopher G. [1]
Affiliations
[1] Mayo Clinic, Rochester, MN 55902 USA
DOI
10.1197/jamia.M2437
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This article describes our system entry for the 2006 I2B2 contest "Challenges in Natural Language Processing for Clinical Data" for the task of identifying the smoking status of patients. Our system makes the simplifying assumption that patient-level smoking status can be determined by accurately classifying individual sentences from a patient's record. We created our system with reusable text analysis components built on the Unstructured Information Management Architecture and Weka. This reuse of code minimized the development effort related specifically to our smoking status classifier. We report precision, recall, F-score, and 95% exact confidence intervals for each metric. Recasting the classification task at the sentence level and reusing code from other text analysis projects allowed us to quickly build a classification system that achieves a system F-score of 92.64 in tests on held-out data and 85.57 on the formal evaluation data. Our general medical natural language engine is easily adaptable to a real-world medical informatics application. Its limitations for this use case include negation detection and temporal resolution.
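The abstract reports 95% exact confidence intervals for precision, recall, and F-score but does not say how they were computed. As a minimal sketch, assuming "exact" means the Clopper-Pearson binomial interval, the bounds for a proportion such as precision could be found as follows. The function names and the example counts are hypothetical and are not taken from the paper.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_decreasing(f, iters=100):
    """Bisection root of a function that decreases from positive to negative on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided interval for k successes in n trials."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    half = alpha / 2
    # Lower bound: the p at which P(X >= k) equals alpha/2 (0 when k == 0).
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: half - (1 - binom_cdf(k - 1, n, p))
    )
    # Upper bound: the p at which P(X <= k) equals alpha/2 (1 when k == n).
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p) - half
    )
    return lower, upper


# Hypothetical counts: 92 correct labels out of 100 predicted for one class.
low, high = clopper_pearson(92, 100)
```

For precision, `k` would be the number of true positives and `n` the number of predicted positives. For recall, `n` would be the number of actual positives. The F-score is not a simple binomial proportion, so its interval would need separate treatment.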
Pages: 25-28
Page count: 4