Extracting COVID-19 diagnoses and symptoms from clinical text: A new annotated corpus and neural event extraction framework

Cited by: 42
Authors
Lybarger, Kevin [1]
Ostendorf, Mari [2]
Thompson, Matthew [3]
Yetisgen, Meliha [1]
Affiliations
[1] Univ Washington, Biomed & Hlth Informat, Box 358047, Seattle, WA 98109 USA
[2] Univ Washington, Dept Elect & Comp Engn, Campus Box 352500 185, Seattle, WA 98195 USA
[3] Univ Washington, Dept Family Med, Box 354696, Seattle, WA 98195 USA
Funding
US National Institutes of Health
Keywords
COVID-19; Coronavirus; Machine learning; Natural language processing; Information extraction; METAMAP;
DOI
10.1016/j.jbi.2021.103761
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
080201 [Mechanical manufacturing and automation]
Abstract
Coronavirus disease 2019 (COVID-19) is a global pandemic. Although much has been learned about the novel coronavirus since its emergence, there are many open questions related to tracking its spread, describing symptomology, predicting the severity of infection, and forecasting healthcare utilization. Free-text clinical notes contain critical information for resolving these questions. Data-driven, automatic information extraction models are needed to use this text-encoded information in large-scale studies. This work presents a new clinical corpus, referred to as the COVID-19 Annotated Clinical Text (CACT) Corpus, which comprises 1,472 notes with detailed annotations characterizing COVID-19 diagnoses, testing, and clinical presentation. We introduce a span-based event extraction model that jointly extracts all annotated phenomena, achieving high performance in identifying COVID-19 and symptom events with associated assertion values (0.83-0.97 F1 for events and 0.73-0.79 F1 for assertions). Our span-based event extraction model outperforms an extractor built on MetaMapLite for the identification of symptoms with assertion values. In a secondary use application, we predicted COVID-19 test results using structured patient data (e.g. vital signs and laboratory results) and automatically extracted symptom information, to explore the clinical presentation of COVID-19. Automatically extracted symptoms improve COVID-19 prediction performance, beyond structured data alone.
Pages: 13