Automated annotation and classification of BI-RADS assessment from radiology reports

Cited by: 58
Authors
Castro, Sergio M. [1]
Tseytlin, Eugene [1]
Medvedeva, Olga [1]
Mitchell, Kevin [1]
Visweswaran, Shyam [1]
Bekhuis, Tanja [1]
Jacobson, Rebecca S. [1]
Affiliations
[1] Univ Pittsburgh, Sch Med, Dept Biomed Informat, Off Baum, 5607 Baum Blvd,BAUM 423, Pittsburgh, PA 15206 USA
Keywords
Breast Imaging Reporting and Data System (BI-RADS); Information extraction; Natural language processing; Imaging informatics; Machine learning; MEDICATION INFORMATION; MAMMOGRAPHY REPORTS; CLINICAL NOTES; EXTRACTION; SYSTEM
DOI
10.1016/j.jbi.2017.04.011
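Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
080201 [Mechanical manufacturing and automation]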
Abstract
The Breast Imaging Reporting and Data System (BI-RADS) was developed to reduce variation in the descriptions of findings. Manual analysis of breast radiology report data is challenging but necessary for clinical and healthcare quality assurance activities. The objective of this study is to develop a natural language processing (NLP) system that automatically extracts BI-RADS categories from breast radiology reports. We evaluated an existing rule-based NLP algorithm, and then developed and evaluated our own method using a supervised machine learning approach. We divided the BI-RADS category extraction task into two specific tasks: (1) annotation of all BI-RADS category values within a report, and (2) classification of the laterality of each BI-RADS category value. We used one algorithm for task 1 and evaluated three algorithms for task 2. Across all evaluations and model training, we used a total of 2159 radiology reports from 18 hospitals, dated from 2003 to 2015. Performance with the existing rule-based algorithm was not satisfactory. Conditional random fields performed well on task 1, with an F-1 measure of 0.95. The rules from partial decision trees (PART) algorithm showed the best performance across classes for task 2, with a weighted F-1 measure of 0.91 for BI-RADS 0-6 and 0.93 for BI-RADS 3-5. Classification performance by class improved for all classes from Naive Bayes to Support Vector Machine (SVM), and again from SVM to PART. Our system is able to annotate and classify all BI-RADS mentions present in a single radiology report and can serve as the foundation for future studies that will leverage automated BI-RADS annotation to provide feedback to radiologists as part of a learning health system loop. © 2017 The Authors. Published by Elsevier Inc.
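The abstract reports a weighted F-1 measure for the laterality classification task without defining it. The sketch below assumes the common definition, in which the per-class F-1 scores are averaged with weights proportional to each class's share of the true labels; the record does not confirm that the authors computed it exactly this way. The function name `weighted_f1` and the laterality labels in the usage note are hypothetical and chosen only for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F-1 scores.

    Assumption: each class's weight is its share of the true labels.
    This is an illustrative sketch, not the authors' evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n / total) * f1  # weight by class support
    return score
```

For instance, calling `weighted_f1(["left", "left", "right", "bilateral"], ["left", "right", "right", "bilateral"])` on these made-up labels weights the "left" class twice as heavily as the other two, because it accounts for half of the true labels.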
Pages: 177-187
Page count: 11