Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis

Cited by: 538
Authors
Aggarwal, Ravi [1]
Sounderajah, Viknesh [1]
Martin, Guy [1]
Ting, Daniel S. W. [2]
Karthikesalingam, Alan [1]
King, Dominic [1]
Ashrafian, Hutan [1]
Darzi, Ara [1]
Affiliations
[1] Imperial Coll London, Inst Global Hlth Innovat, London, England
[2] Singapore Natl Eye Ctr, Singapore Eye Res Inst, Singapore, Singapore
Keywords
CONVOLUTIONAL NEURAL-NETWORK; DIABETIC-RETINOPATHY; ARTIFICIAL-INTELLIGENCE; ANALYSIS-SOFTWARE; BREAST-CANCER; BIG DATA; IMAGES; CLASSIFICATION; PERFORMANCE; VALIDATION;
DOI
10.1038/s41746-021-00438-z
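Chinese Library Classification number
R19 [Health care organization and services (health services administration)];
Discipline classification code
100404 [Child and adolescent health and maternal and child health care];
Abstract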
Deep learning (DL) has the potential to transform medical diagnostics. However, the diagnostic accuracy of DL is uncertain. Our aim was to evaluate the diagnostic accuracy of DL algorithms to identify pathology in medical imaging. Searches were conducted in Medline and EMBASE up to January 2020. We identified 11,921 studies, of which 503 were included in the systematic review. Eighty-two studies in ophthalmology, 82 in breast disease and 115 in respiratory disease were included for meta-analysis. Two hundred twenty-four studies in other specialities were included for qualitative review. Peer-reviewed studies that reported on the diagnostic accuracy of DL algorithms to identify pathology using medical imaging were included. Primary outcomes were measures of diagnostic accuracy, study design and reporting standards in the literature. Estimates were pooled using random-effects meta-analysis. In ophthalmology, AUCs ranged between 0.933 and 1 for diagnosing diabetic retinopathy, age-related macular degeneration and glaucoma on retinal fundus photographs and optical coherence tomography. In respiratory imaging, AUCs ranged between 0.864 and 0.937 for diagnosing lung nodules or lung cancer on chest X-ray or CT scan. For breast imaging, AUCs ranged between 0.868 and 0.909 for diagnosing breast cancer on mammogram, ultrasound, MRI and digital breast tomosynthesis. Heterogeneity was high between studies, and extensive variation in methodology, terminology and outcome measures was noted. This can lead to an overestimation of the diagnostic accuracy of DL algorithms on medical imaging. There is an immediate need for the development of artificial intelligence-specific EQUATOR guidelines, particularly STARD, in order to provide guidance around key issues in this field.
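The abstract states that estimates were pooled using random-effects meta-analysis and that heterogeneity between studies was high. As a rough illustration of what such pooling involves, the sketch below uses the DerSimonian-Laird estimator on the raw AUC scale. Both choices are assumptions: the abstract names neither the estimator nor the scale used. The function name `pool_random_effects` and the input values are hypothetical and are not data from the review.

```python
import math


def pool_random_effects(estimates, std_errors):
    """Pool per-study estimates with a DerSimonian-Laird random-effects model.

    Assumption: this estimator is one common choice; the abstract does not
    say which one the authors used.
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching inputs")

    variances = [se ** 2 for se in std_errors]
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    sum_w = sum(weights)

    # Fixed-effect estimate, needed to compute Cochran's Q.
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = k - 1

    # Between-study variance (tau^2), truncated at zero.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum_re
    pooled_se = math.sqrt(1.0 / sum_re)

    # I^2: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    return {"pooled": pooled, "se": pooled_se, "tau2": tau2, "q": q, "i2": i2}


if __name__ == "__main__":
    # Hypothetical AUCs and standard errors, for illustration only.
    result = pool_random_effects([0.80, 0.90, 1.00], [0.02, 0.02, 0.02])
    low = result["pooled"] - 1.96 * result["se"]
    high = result["pooled"] + 1.96 * result["se"]
    print(f"pooled AUC {result['pooled']:.3f} (95% CI {low:.3f} to {high:.3f})")
    print(f"tau^2 {result['tau2']:.4f}, I^2 {result['i2']:.0%}")
```

A large I^2 in such a calculation corresponds to the high between-study heterogeneity the abstract reports, which is one reason pooled accuracy figures should be read with caution.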
Pages: 23