Deep Learning to Distinguish Recalled but Benign Mammography Images in Breast Cancer Screening

Cited by: 96

Authors:

Aboutalib, Sarah S. [1]

Mohamed, Aly A. [2]

Berg, Wendie A. [2,3]

Zuley, Margarita L. [2,3]

Sumkin, Jules H. [2,3]

Wu, Shandong [4,5,6,7]

Affiliations:

[1] Univ Pittsburgh, Sch Med, Dept Biomed Informat, Pittsburgh, PA USA

[2] Univ Pittsburgh, Sch Med, Dept Radiol, Pittsburgh, PA USA

[3] Univ Pittsburgh, Magee Womens Hosp, Med Ctr, Pittsburgh, PA 15213 USA

[4] Univ Pittsburgh, Dept Radiol, Pittsburgh, PA 15260 USA

[5] Univ Pittsburgh, Dept Biomed Informat, Pittsburgh, PA USA

[6] Univ Pittsburgh, Dept Bioengn, Pittsburgh, PA USA

[7] Univ Pittsburgh, Dept Intelligent Syst, Pittsburgh, PA USA

Source:

CLINICAL CANCER RESEARCH | 2018 / Vol. 24 / No. 23


DOI:

10.1158/1078-0432.CCR-18-1115

Chinese Library Classification (CLC) number:

R73 [Oncology]

Discipline classification code:

100214 [Oncology]

Abstract:

Purpose: False positives in digital mammography screening lead to high recall rates, resulting in unnecessary medical procedures for patients and added health care costs. This study aimed to investigate revolutionary deep learning methods to distinguish recalled but benign mammography images from negative exams and from those with malignancy.

Experimental Design: Deep learning convolutional neural network (CNN) models were constructed to classify mammography images into malignant (breast cancer), negative (breast cancer-free), and recalled-benign categories. A total of 14,860 images of 3,715 patients from two independent mammography datasets, the Full-Field Digital Mammography Dataset (FFDM) and a digitized film dataset, the Digital Database for Screening Mammography (DDSM), were used in various settings for training and testing the CNN models. The ROC curve was generated and the AUC was calculated as a metric of classification accuracy.

Results: Training and testing using only the FFDM dataset resulted in AUCs ranging from 0.70 to 0.81. When the DDSM dataset was used, AUCs ranged from 0.77 to 0.96. When the datasets were combined for training and testing, AUCs ranged from 0.76 to 0.91. When pretrained on a large nonmedical dataset and DDSM, the models showed consistent improvements in AUC ranging from 0.02 to 0.05 (all P > 0.05) compared with pretraining only on the nonmedical dataset.

Conclusions: This study demonstrates that automatic deep learning CNN methods can identify nuanced mammographic imaging features to distinguish recalled-benign images from malignant and negative cases, which may lead to a computerized clinical toolkit to help reduce false recalls.
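The abstract reports classification accuracy as ROC AUC. As a minimal sketch (not the authors' code), the AUC for a single binary comparison can be computed with the Mann-Whitney formulation: it is the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative image, with ties counted as one half. The abstract does not say how the three-class outputs were reduced to binary ROC analyses, so this assumes one comparison (for example, recalled-benign versus negative) scored by a per-image probability. The helper name `auc_mann_whitney` and the score values are hypothetical and for illustration only.

```python
from itertools import product


def auc_mann_whitney(pos_scores, neg_scores):
    """Return the AUC as P(score_pos > score_neg), counting ties as 0.5.

    Hypothetical helper: pos_scores are model outputs for positive-class
    images, neg_scores are outputs for negative-class images.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    # Compare every positive/negative pair of scores.
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical CNN probabilities for the "recalled-benign" class.
recalled_benign = [0.9, 0.7, 0.6, 0.4]  # images that are truly recalled-benign
negative = [0.8, 0.3, 0.2, 0.1]         # images that are truly negative
print(auc_mann_whitney(recalled_benign, negative))  # 13 of 16 pairs ranked correctly -> 0.8125
```

The pairwise loop is O(n·m) and chosen for readability; a rank-based or library implementation would be preferable for thousands of images.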

Pages: 5902-5909

Number of pages: 8

References: 37 in total