Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models

Cited by: 1843

Authors:

Kung, Tiffany H. [1,2]

Cheatham, Morgan [3]

Medenilla, Arielle [1]

Sillos, Czarina [1]

De Leon, Lorie [1]

Elepano, Camille

Madriaga, Maria [1]

Aggabao, Rimel [1]

Diaz-Candido, Giezel [1]

Maningo, James [1]

Tseng, Victor [1,4]

Affiliations:

[1] AnsibleHealth Inc, Mountain View, CA 94043 USA

[2] Harvard Medical School, Massachusetts General Hospital, Department of Anesthesiology, Boston, MA USA

[3] Brown University, Warren Alpert Medical School, Providence, RI USA

[4] UWorld LLC, Department of Medical Education, Dallas, TX 75019 USA

Source:

PLOS DIGITAL HEALTH | 2023 / Vol. 2 / Issue 2

Keywords:

STUDENT;

DOI:

10.1371/journal.pdig.0000198

Chinese Library Classification (CLC) number:

R19 [Health care organization and services (health services administration)];

Discipline classification code:

100404 [Child and Adolescent Health and Maternal and Child Health Care];

Abstract:

We evaluated the performance of a large language model called ChatGPT on the United States Medical Licensing Exam (USMLE), which consists of three exams: Step 1, Step 2CK, and Step 3. ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations. These results suggest that large language models may have the potential to assist with medical education, and potentially, clinical decision-making.

Pages: 12