What is Machine Learning? A Primer for the Epidemiologist

Cited by: 435
Authors
Bi, Qifang [1]
Goodman, Katherine E. [1]
Kaminsky, Joshua [1]
Lessler, Justin [1]
Affiliations
[1] Johns Hopkins Univ, Bloomberg Sch Publ Hlth, Dept Epidemiol, Baltimore, MD USA
Keywords
Big Data; ensemble models; machine learning; SUPPORT VECTOR MACHINE; GENE-EXPRESSION; NEURAL-NETWORKS; LOGISTIC-REGRESSION; GLOBAL DISTRIBUTION; DECISION TREE; BRAIN-TUMORS; NAIVE BAYES; CLASSIFICATION; PREDICTION
DOI
10.1093/aje/kwz189
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene]
Discipline classification code
100235 [Preventive Medicine]
Abstract
Machine learning is a branch of computer science that has the potential to transform epidemiologic sciences. Amid a growing focus on "Big Data," it offers epidemiologists new tools to tackle problems for which classical methods are not well-suited. In order to critically evaluate the value of integrating machine learning algorithms and existing methods, however, it is essential to address language and technical barriers between the two fields that can make it difficult for epidemiologists to read and assess machine learning studies. Here, we provide an overview of the concepts and terminology used in machine learning literature, which encompasses a diverse set of tools with goals ranging from prediction to classification to clustering. We provide a brief introduction to 5 common machine learning algorithms and 4 ensemble-based approaches. We then summarize epidemiologic applications of machine learning techniques in the published literature. We recommend approaches to incorporate machine learning in epidemiologic research and discuss opportunities and challenges for integrating machine learning and existing epidemiologic research methods.
Pages: 2222-2239
Page count: 18