Machine learning for identifying Randomized Controlled Trials: An evaluation and practitioner's guide

Cited by: 233
Authors
Marshall, Iain J. [1]
Noel-Storr, Anna [2]
Kuiper, Joel [3]
Thomas, James [4]
Wallace, Byron C. [5]
Affiliations
[1] Kings Coll London, London, England
[2] Univ Oxford, Oxford, England
[3] Doctor Evidence, Santa Monica, CA USA
[4] UCL, London, England
[5] Northeastern Univ, Boston, MA 02115 USA
Funding
US National Institutes of Health; UK Medical Research Council; US Agency for Healthcare Research and Quality
Keywords
MEDLINE
DOI
10.1002/jrsm.1287
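Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
090105 [Crop Production Systems and Ecological Engineering]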
Abstract
Machine learning (ML) algorithms have proven highly accurate for identifying Randomized Controlled Trials (RCTs) but are little used in practice, in part because the best way to make use of the technology in a typical workflow is unclear. In this work, we evaluate ML models for RCT classification (support vector machines, convolutional neural networks, and ensemble approaches). We trained and optimized support vector machine and convolutional neural network models on the titles and abstracts of the Cochrane Crowd RCT set. We evaluated the models on an external dataset (Clinical Hedges), allowing direct comparison with traditional database search filters. We estimated the area under the receiver operating characteristic curve (AUROC) using the Clinical Hedges dataset. We demonstrate that ML approaches better discriminate between RCTs and non-RCTs than widely used traditional database search filters at all sensitivity levels; our best-performing model also achieved the best results to date for ML in this task (AUROC 0.987, 95% CI, 0.984-0.989). We provide practical guidance on the role of ML in (1) systematic reviews (high-sensitivity strategies) and (2) rapid reviews and clinical question answering (high-precision strategies), together with recommended probability cutoffs for each use case. Finally, we provide open-source software to enable these approaches to be used in practice.
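The abstract's two ideas, AUROC as a threshold-free measure of discrimination and probability cutoffs tuned to a use case, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the labels and scores are invented toy data, the function names are hypothetical, and the resulting cutoffs are not the ones recommended in the paper, nor is this the authors' released software.

```python
from typing import Sequence, Tuple

# Toy data (invented for illustration): 1 = RCT, 0 = non-RCT,
# paired with a classifier's predicted probability of "RCT".
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.60, 0.30, 0.70, 0.40, 0.20, 0.10, 0.05, 0.02]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random RCT outscores a random non-RCT.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_precision(
    labels: Sequence[int], scores: Sequence[float], cutoff: float
) -> Tuple[float, float]:
    """Sensitivity and precision when scores >= cutoff are called RCTs."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


def cutoff_for_sensitivity(
    labels: Sequence[int], scores: Sequence[float], target: float
) -> float:
    """Highest observed score usable as a cutoff with sensitivity >= target."""
    for cutoff in sorted(set(scores), reverse=True):
        sensitivity, _ = sensitivity_precision(labels, scores, cutoff)
        if sensitivity >= target:
            return cutoff
    raise ValueError("no cutoff reaches the target sensitivity")


print("AUROC:", auroc(labels, scores))

# High-sensitivity use (systematic-review style): keep every RCT.
high_sens_cutoff = cutoff_for_sensitivity(labels, scores, 1.0)
print("high-sensitivity cutoff:", high_sens_cutoff,
      sensitivity_precision(labels, scores, high_sens_cutoff))

# High-precision use (rapid-review style): accept missing some RCTs.
high_prec_cutoff = cutoff_for_sensitivity(labels, scores, 0.5)
print("high-precision cutoff:", high_prec_cutoff,
      sensitivity_precision(labels, scores, high_prec_cutoff))
```

The same scores yield very different screening behaviour depending on the cutoff: a low cutoff keeps every RCT at the cost of more false positives, while a high cutoff returns fewer but cleaner results. In practice a cutoff would be chosen on a held-out labelled set rather than on the data being screened.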
Pages: 602-614
Page count: 13