Deep Learning in Mammography: Diagnostic Accuracy of a Multipurpose Image Analysis Software in the Detection of Breast Cancer

Times cited: 281
Authors
Becker, Anton S. [1]
Marcon, Magda [1]
Ghafoor, Soleen [1]
Wurnig, Moritz C. [1]
Frauenfelder, Thomas [1]
Boss, Andreas [1]
Affiliation
[1] Univ Hosp Zurich, Inst Diagnost & Intervent Radiol, Raemistr 100, CH-8091 Zurich, Switzerland
Keywords
mammography; breast cancer; artificial neural network; artificial intelligence; machine learning; deep learning; diagnostic accuracy; BREAST; MODEL
DOI
10.1097/RLI.0000000000000358
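Chinese Library Classification number
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline classification code
100231 [Clinical Pathology]; 100902 [Aerospace Medicine]
Abstract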
Objectives: The aim of this study was to evaluate the diagnostic accuracy of a multipurpose image analysis software based on deep learning with artificial neural networks for the detection of breast cancer in an independent, dual-center mammography data set.
Materials and Methods: In this retrospective, Health Insurance Portability and Accountability Act-compliant study, all patients undergoing mammography in 2012 at our institution were reviewed (n = 3228). All of their prior and follow-up mammograms from a 7-year time span (2008-2015) were considered as a reference for the clinical diagnosis. After exclusion criteria were applied (missing reference standard, prior procedures or therapies), patients with a first diagnosis of a malignancy or borderline lesion were selected (n = 143). Histology or long-term clinical follow-up served as the reference standard. In a first step, a breast density- and age-matched control cohort (n = 143) was selected from the remaining patients with more than 2 years of follow-up (n = 1003). The neural network was trained with this data set. From the publicly available Breast Cancer Digital Repository data set, patients with cancer and a matched control cohort were selected (n = 35 × 2), and the trained neural network was also tested on this external data set. Three radiologists (3, 5, and 10 years of experience) evaluated the test data set. In a second step, the neural network was trained with all cases from January to September and tested with cases from October to December 2012 (screening-like cohort). The radiologists also evaluated this second test data set. The areas under the receiver operating characteristic curve (AUCs) of the readers and the neural network were compared. A Bonferroni-corrected P value of less than 0.016 was considered statistically significant.
Results: Mean age was 59.6 years (range, 35-88 years) in patients with a lesion and 59.1 years (range, 35-83 years) in controls. Breast density distribution (A/B/C/D) was 21/59/42/21 and 22/60/41/20, respectively. Histologic diagnoses were invasive ductal carcinoma in 90, ductal carcinoma in situ in 13, invasive lobular carcinoma in 13, mucinous carcinoma in 3, and borderline lesion in 12 patients. In the first step, the AUC of the trained neural network was 0.81 and comparable on the test cases (0.79; P = 0.63). One of the radiologists showed almost equal performance (0.83; P = 0.17), whereas 2 were significantly better (0.91 and 0.94; P < 0.016). In the second step, the performance of the neural network (0.82) was not significantly different from that of the radiologists (0.77-0.87; P > 0.016); however, the radiologists were consistently less sensitive and more specific than the neural network.
Conclusions: Current state-of-the-art artificial neural networks for general image analysis are able to detect cancer in mammograms with accuracy similar to that of radiologists, even in a screening-like cohort with low breast cancer prevalence.
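The abstract states that the control cohort was breast density- and age-matched to the cases but does not describe how the matching was done. The Python sketch below shows one common way such a cohort could be assembled, assuming greedy 1:1 nearest-age matching without replacement within the same ACR density category (A-D). The `Patient` type and `match_controls` function are hypothetical illustrations, not the authors' procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical record: ID, age in years, ACR density category A-D."""
    patient_id: str
    age: int
    density: str


def match_controls(
    cases: list[Patient], pool: list[Patient]
) -> list[tuple[Patient, Patient]]:
    """Greedy 1:1 matching without replacement (an assumed scheme).

    For each case, in input order, pick the unused control with the same
    density category and the smallest absolute age difference; ties go to
    the control listed first. Cases with no control left in their density
    category are skipped.
    """
    available = list(pool)  # copy so the caller's pool is not modified
    pairs: list[tuple[Patient, Patient]] = []
    for case in cases:
        candidates = [c for c in available if c.density == case.density]
        if not candidates:
            continue  # no control left in this density category
        best = min(candidates, key=lambda c: abs(c.age - case.age))
        available.remove(best)  # without replacement: each control used once
        pairs.append((case, best))
    return pairs


if __name__ == "__main__":
    # Toy records for illustration only; not data from the study.
    cases = [Patient("case-1", 60, "B"), Patient("case-2", 45, "C")]
    pool = [
        Patient("ctrl-1", 59, "B"),
        Patient("ctrl-2", 70, "B"),
        Patient("ctrl-3", 44, "C"),
        Patient("ctrl-4", 46, "C"),
    ]
    for case, control in match_controls(cases, pool):
        print(case.patient_id, "->", control.patient_id)
```

With these toy records, `case-1` is paired with `ctrl-1` and `case-2` with `ctrl-3` (a tie with `ctrl-4` resolved by list order). Greedy matching depends on the order in which cases are processed; an optimal assignment (for example, via the Hungarian algorithm) would instead minimize the total age difference across all pairs.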
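Readers and the network were compared by the area under the ROC curve (AUC) at a Bonferroni-corrected significance level. As a hedged illustration, the sketch below computes an empirical AUC as the fraction of (cancer, control) pairs ranked correctly, sensitivity and specificity at a single cutoff, and the Bonferroni per-comparison level. It assumes three comparisons (each radiologist vs. the network), which gives 0.05 / 3 ≈ 0.0167 and is consistent with the stated threshold of 0.016. The function names and scores are hypothetical, and because the abstract does not name the test used to compare AUCs (DeLong's test is one common choice), that step is not shown.

```python
def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Empirical ROC AUC: the fraction of (cancer, control) pairs in which
    the cancer case receives the higher score, counting ties as 1/2.
    This equals the Mann-Whitney U statistic divided by n_pos * n_neg."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sensitivity_specificity(
    pos_scores: list[float], neg_scores: list[float], cutoff: float
) -> tuple[float, float]:
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    true_pos = sum(s >= cutoff for s in pos_scores)
    true_neg = sum(s < cutoff for s in neg_scores)
    return true_pos / len(pos_scores), true_neg / len(neg_scores)


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_comparisons


if __name__ == "__main__":
    # Toy scores for illustration only; not data from the study.
    cancers = [0.9, 0.8, 0.4]
    controls = [0.7, 0.3, 0.2, 0.1]
    print("AUC:", roc_auc(cancers, controls))
    print("sens/spec at 0.5:", sensitivity_specificity(cancers, controls, 0.5))
    # Assumed: 3 comparisons (each radiologist vs. the network).
    print("per-comparison alpha:", bonferroni_alpha(0.05, 3))
```

With the toy scores, 11 of the 12 pairs are ranked correctly (AUC ≈ 0.917), and a cutoff of 0.5 yields a sensitivity of 2/3 and a specificity of 0.75. Raising the cutoff can only keep or lower sensitivity and keep or raise specificity, so two readers with similar AUCs can still operate at different points on that trade-off, as the radiologists and the network did in the second step.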
Pages: 434-440
Page count: 7