Computer vs human: Deep learning versus perceptual training for the detection of neck of femur fractures

Cited by: 101

Authors:

Adams, Matthew [1]

Chen, Weijia [2]

Holcdorf, David [1]

McCusker, Mark W. [1,3]

Howe, Piers D. L. [2]

Gaillard, Frank [1,3]

Affiliations:

[1] Royal Melbourne Hosp, Radiol Dept, Melbourne, Vic, Australia

[2] Univ Melbourne, Sch Psychol Sci, Melbourne, Vic, Australia

[3] Univ Melbourne, Radiol Dept, Melbourne, Vic, Australia

Source:

JOURNAL OF MEDICAL IMAGING AND RADIATION ONCOLOGY | 2019 / Vol. 63 / Issue 01

Keywords:

femoral neck fractures; learning; radiology; supervised machine learning; X-rays; OPERATING CHARACTERISTIC CURVES; ARTIFICIAL-INTELLIGENCE; CLASSIFICATION;

DOI:

10.1111/1754-9485.12828

Chinese Library Classification (CLC):

R8 [Special Medicine]; R445 [Diagnostic Imaging];

Subject classification codes:

1002 ; 100207 ; 1009 ;

Abstract:

Introduction: To evaluate the accuracy of deep convolutional neural networks (DCNNs) for detecting neck of femur (NoF) fractures on radiographs, in comparison with perceptual training in medically-naive individuals. Methods: This study extends a previous study that conducted perceptual training in medically-naive individuals for the detection of NoF fractures on a variety of dataset sizes. The same anteroposterior hip radiograph dataset was used to train two DCNNs (AlexNet and GoogLeNet) to detect NoF fractures. For direct comparison with perceptual training results, deep learning was completed across a variety of dataset sizes (200, 320 and 640 images) with images split into training (80%) and validation (20%). An additional 160 images were used as the final test set. Multiple pre-processing and augmentation techniques were utilised. Results: AlexNet and GoogLeNet DCNNs NoF fracture detection accuracy increased with larger training dataset sizes and mildly with augmentation. Accuracy increased from 81.9% and 88.1% to 89.4% and 94.4% for AlexNet and GoogLeNet respectively. Similarly, the test accuracy for the perceptual training in top-performing medically-naive individuals increased from 87.6% to 90.5% when trained on 640 images compared with 200 images. Conclusions: Single detection tasks in radiology are commonly used in DCNN research with their results often used to make broader claims about machine learning being able to perform as well as subspecialty radiologists. This study suggests that as impressive as recognising fractures is for a DCNN, similar learning can be achieved by top-performing medically-naive humans with less than 1 hour of perceptual training.
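The split arithmetic and accuracy changes quoted in the abstract can be worked through in a few lines. This is only an illustrative sketch: the helper `split_counts` and the constant names are hypothetical, integer rounding of the 80/20 split is an assumption, and the pairing of each "before" and "after" accuracy with a specific dataset size or augmentation setting is not fully specified in the abstract.

```python
# Dataset sizes and split percentage as stated in the abstract.
DATASET_SIZES = [200, 320, 640]
TRAIN_PERCENT = 80      # remaining 20% is used for validation
TEST_SET_SIZE = 160     # separate final test set


def split_counts(n_images, train_percent=TRAIN_PERCENT):
    """Return (n_train, n_validation) for a percentage split (hypothetical helper)."""
    n_train = n_images * train_percent // 100
    return n_train, n_images - n_train


splits = {n: split_counts(n) for n in DATASET_SIZES}

# Lowest and highest test accuracies (%) quoted in the abstract.
REPORTED = {
    "AlexNet": (81.9, 89.4),
    "GoogLeNet": (88.1, 94.4),
    "Top-performing naive humans": (87.6, 90.5),
}

# Gain in percentage points, rounded to one decimal place.
gains = {name: round(after - before, 1) for name, (before, after) in REPORTED.items()}

for n, (n_train, n_val) in splits.items():
    print(f"{n} images: {n_train} training, {n_val} validation, {TEST_SET_SIZE} held-out test")
for name, gain in gains.items():
    print(f"{name}: +{gain} percentage points")
```

Laid out this way, the figures show that the largest configuration trains on 512 images, and that both networks gained more percentage points than the top-performing human participants over the reported range.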

Citation

Pages: 27-32

Number of pages: 6