Deep learning predicts hip fracture using confounding patient and healthcare variables

Cited by: 183
Authors
Badgeley, Marcus A. [1,2,3]
Zech, John R. [4]
Oakden-Rayner, Luke [5]
Glicksberg, Benjamin S. [6]
Liu, Manway [1]
Gale, William [7]
McConnell, Michael V. [1,8]
Percha, Bethany [2]
Snyder, Thomas M. [1]
Dudley, Joel T. [2,3]
Affiliations
[1] Verily Life Sci LLC, San Francisco, CA USA
[2] Icahn Sch Med Mt Sinai, Inst Next Generat Healthcare, New York, NY 10029 USA
[3] Icahn Sch Med Mt Sinai, Dept Genet & Genom Sci, New York, NY 10029 USA
[4] Calif Pacific Med Ctr, Dept Med, San Francisco, CA USA
[5] Univ Adelaide, Sch Publ Hlth, Adelaide, SA, Australia
[6] Univ Calif San Francisco, Bakar Computat Hlth Sci Inst, San Francisco, CA 94143 USA
[7] Univ Adelaide, Sch Comp Sci, Adelaide, SA, Australia
[8] Stanford Sch Med, Div Cardiovasc Med, Stanford, CA 94305 USA
Funding
US National Institutes of Health
Keywords
ARTIFICIAL-INTELLIGENCE; VERTEBRAL FRACTURES; GENE-EXPRESSION; MORTALITY; MODELS; WOMEN;
DOI
10.1038/s41746-019-0105-1
Chinese Library Classification (CLC)
R19 [Health care organization and services (health service administration)]
Abstract
Hip fractures are a leading cause of death and disability among older adults. They are also the most commonly missed diagnosis on pelvic radiographs, and delayed diagnosis leads to higher cost and worse outcomes. Computer-aided diagnosis (CAD) algorithms have shown promise for helping radiologists detect fractures, but the image features underpinning their predictions are notoriously difficult to understand. In this study, we trained deep-learning models on 17,587 radiographs to classify fracture, 5 patient traits, and 14 hospital process variables. All 20 variables could be individually predicted from a radiograph, with the best performances on scanner model (AUC = 1.00), scanner brand (AUC = 0.98), and whether the order was marked "priority" (AUC = 0.79). Fracture was predicted moderately well from the image alone (AUC = 0.78) and better when image features were combined with patient data (AUC = 0.86, DeLong paired AUC comparison, p = 2e-9) or with patient plus hospital process data (AUC = 0.91, p = 1e-21). Fracture prediction on a test set that balanced fracture risk across patient variables was significantly lower than on a random test set (AUC = 0.67, DeLong unpaired AUC comparison, p = 0.003); on a test set with fracture risk balanced across both patient and hospital process variables, the model performed no better than chance (AUC = 0.52, 95% CI 0.46-0.58), indicating that these variables were the main source of the model's fracture predictions. A single model that directly combines image features, patient data, and hospital process data outperforms a Naive Bayes ensemble of an image-only model's prediction with the patient and hospital process data. If CAD algorithms inexplicably leverage patient and process variables in their predictions, it is unclear how radiologists should interpret those predictions in the context of other known patient data. Further research is needed to illuminate deep-learning decision processes so that computers and clinicians can cooperate effectively.
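The abstract's central comparison is between an image-only fracture model and models that also see patient and hospital-process covariates, evaluated by test-set AUC with DeLong tests. The sketch below is a minimal illustration of that kind of comparison, not the authors' code: it uses synthetic stand-in features (a hypothetical CNN embedding plus a few covariates), scikit-learn logistic regressions in place of the paper's deep-learning and multimodal models, and a paired bootstrap on the AUC difference as a stand-in for DeLong's test.

```python
# Illustrative sketch only: synthetic data, logistic-regression stand-ins for the
# paper's models, and a paired bootstrap instead of DeLong's test.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
image_features = rng.normal(size=(n, 16))   # hypothetical CNN image embedding
covariates = rng.normal(size=(n, 6))        # hypothetical patient/process variables

# Simulate an outcome that depends on both the image and the covariates.
logits = image_features[:, 0] + 1.5 * covariates[:, 0]
y = (logits + rng.normal(size=n) > 0).astype(int)

X_img = image_features
X_all = np.hstack([image_features, covariates])
Xi_tr, Xi_te, Xa_tr, Xa_te, y_tr, y_te = train_test_split(
    X_img, X_all, y, test_size=0.3, random_state=0, stratify=y)

img_model = LogisticRegression(max_iter=1000).fit(Xi_tr, y_tr)  # image-only
all_model = LogisticRegression(max_iter=1000).fit(Xa_tr, y_tr)  # image + covariates

p_img = img_model.predict_proba(Xi_te)[:, 1]
p_all = all_model.predict_proba(Xa_te)[:, 1]
auc_img = roc_auc_score(y_te, p_img)
auc_all = roc_auc_score(y_te, p_all)

# Paired bootstrap CI for the AUC difference (stand-in for a DeLong comparison).
idx = np.arange(len(y_te))
diffs = []
for _ in range(2000):
    b = rng.choice(idx, size=len(idx), replace=True)
    if len(np.unique(y_te[b])) < 2:         # skip resamples with a single class
        continue
    diffs.append(roc_auc_score(y_te[b], p_all[b]) - roc_auc_score(y_te[b], p_img[b]))
lo, hi = np.percentile(diffs, [2.5, 97.5])

print(f"image-only AUC        = {auc_img:.3f}")
print(f"image+covariates AUC  = {auc_all:.3f}")
print(f"AUC difference 95% CI = [{lo:.3f}, {hi:.3f}]")
```

If the bootstrap interval on the AUC difference excludes zero, the covariate-augmented model is credibly better on that test set, which is the flavor of evidence the abstract reports (AUC 0.78 image-only vs. 0.86 and 0.91 with added metadata, DeLong p-values).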
Pages: 10