Identifying smokers with a medical extraction system

Cited by: 42
Authors
Clark, Cheryl [1]
Good, Kathleen [2]
Jezierny, Lesley [2]
Macpherson, Melissa [2]
Wilson, Brian [2]
Chajewska, Urszula [2]
Affiliations
[1] MITRE Corp, Bedford, MA 01730 USA
[2] Nuance Communications Inc, Dictaphone Healthcare Solutions, Burlington, MA USA
Keywords
DOI
10.1197/jamia.M2442
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The Clinical Language Understanding group at Nuance Communications has developed a medical information extraction system that combines a rule-based extraction engine with machine learning algorithms to identify and categorize references to patient smoking in clinical reports. The extraction engine identifies smoking references; documents that contain no smoking references are classified as UNKNOWN. For the remaining documents, the extraction engine uses linguistic analysis to associate features such as status and time with smoking mentions. Machine learning is then used to classify the documents based on these features. This approach achieves overall accuracy in the 90% range on all data sets used. Classification using both engine-generated and word-based features outperforms classification using only word-based features on all data sets, although the difference narrows as the data set size increases. These techniques could be applied to identify other risk factors, such as drug and alcohol use, or a family history of a disease.
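The two-stage flow described in the abstract (rule-based detection of smoking references, a default of UNKNOWN when none are found, then classification over extracted features) can be sketched roughly as follows. This is only an illustration under stated assumptions: the regular expressions, the feature names, the `toy_model` stand-in for a trained classifier, and every label other than UNKNOWN are hypothetical and are not taken from the system itself.

```python
import re
from collections import Counter

# Hypothetical cue patterns; the real extraction engine's rules are not
# described in the abstract.
SMOKING_RE = re.compile(r"\b(smok\w*|tobacco|cigarettes?|nicotine)\b", re.IGNORECASE)
NEGATION_RE = re.compile(r"\b(no|denies|denied|never|non)\b", re.IGNORECASE)
PAST_RE = re.compile(r"\b(quit|former|formerly|stopped|ex|years ago)\b", re.IGNORECASE)


def extract_features(text):
    """Count status/time cues in sentences that mention smoking.

    Returns an empty Counter when the document has no smoking reference.
    """
    features = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if not SMOKING_RE.search(sentence):
            continue
        features["mention"] += 1
        if NEGATION_RE.search(sentence):
            features["negated"] += 1
        if PAST_RE.search(sentence):
            features["past"] += 1
    return features


def toy_model(features):
    """Hand-written stand-in for a trained classifier (assumption)."""
    if features["past"]:
        return "PAST"          # illustrative label
    if features["negated"]:
        return "NEGATED"       # illustrative label
    return "CURRENT"           # illustrative label


def classify(text, model=toy_model):
    """Documents without any smoking reference are labeled UNKNOWN."""
    features = extract_features(text)
    if not features:
        return "UNKNOWN"
    return model(features)


if __name__ == "__main__":
    for report in [
        "Chest x-ray is clear.",
        "Patient denies smoking.",
        "He quit smoking 5 years ago.",
        "Smokes one pack per day.",
    ]:
        print(report, "->", classify(report))
```

In a setup closer to the one described, `toy_model` would be replaced by a classifier trained on the engine-generated features, optionally combined with word-based features.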
Pages: 36-39
Page count: 4