Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models

Times cited: 1843
Authors
Kung, Tiffany H. [1, 2]
Cheatham, Morgan [3]
Medenilla, Arielle [1]
Sillos, Czarina [1]
De Leon, Lorie [1]
Elepano, Camille
Madriaga, Maria [1]
Aggabao, Rimel [1]
Diaz-Candido, Giezel [1]
Maningo, James [1]
Tseng, Victor [1, 4]
Affiliations
[1] AnsibleHealth Inc, Mountain View, CA 94043 USA
[2] Harvard Sch Med, Massachusetts Gen Hosp, Dept Anesthesiol, Boston, MA USA
[3] Brown Univ, Warren Alpert Med Sch, Providence, RI USA
[4] UWorld LLC, Dept Med Educ, Dallas, TX 75019 USA
Source
PLOS DIGITAL HEALTH | 2023 / Vol. 2 / No. 2
Keywords
STUDENT;
DOI
10.1371/journal.pdig.0000198
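Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)];
Discipline classification code
100404 [Child, adolescent, and maternal health];
Abstract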
We evaluated the performance of a large language model called ChatGPT on the United States Medical Licensing Exam (USMLE), which consists of three exams: Step 1, Step 2CK, and Step 3. ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations. These results suggest that large language models may have the potential to assist with medical education, and potentially, clinical decision-making.
Pages: 12