A software tool for removing patient identifying information from clinical documents

Cited by: 68
Authors
Friedlin, F. Jeff [1]
McDonald, Clement J. [1]
Affiliation
[1] Regenstrief Institute, Inc., Medical Informatics, Indianapolis, IN 46202, USA
DOI
10.1197/jamia.M2702
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Our objective was to create a software tool that accurately removes all patient-identifying information from various kinds of clinical documents, including laboratory and narrative reports. We developed the Medical De-identification System (MeDS), a software tool that de-identifies clinical documents, and evaluated it twice. The first evaluation used 2,400 Health Level Seven (HL7) messages from 10 different HL7 message producers. After modifying the software based on the results of this evaluation, we performed a second evaluation using 7,190 HL7 pathology report messages. In both evaluations, we compared the output of the MeDS de-identification process with a gold standard of identifying strings established by human review, counted the successful scrubs, missed identifiers, and over-scrubs committed by MeDS, and assessed the readability and interpretability of the scrubbed messages. We classified every missed identifier into one of 3 groups: (1) complete HIPAA-specified identifiers, (2) HIPAA-specified identifier fragments, and (3) non-HIPAA-specified identifiers (such as provider names and addresses). In the first evaluation, MeDS scrubbed 11,273 (99.06%) of the 11,380 HIPAA-specified identifiers and 38,095 (98.26%) of the 38,768 non-HIPAA-specified identifiers. In the second evaluation, performed after the software was modified, MeDS scrubbed 79,993 (99.47%) of the 80,418 HIPAA-specified identifiers and 12,689 (96.93%) of the 13,091 non-HIPAA-specified identifiers. Approximately 95% of the scrubbed messages were both readable and interpretable. We conclude that MeDS successfully de-identified a wide range of medical documents from numerous sources and produced scrubbed reports that retained their interpretability, thereby preserving their usefulness for research.
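The evaluation compares tool output against a human-reviewed gold standard and reports each scrub rate as the number of scrubbed identifiers divided by the number of gold-standard identifiers. The following is a hypothetical sketch of that bookkeeping, not the MeDS implementation: it assumes each identifier occurrence is an exact (start, end) character span, and the names `ScrubTally`, `tally`, and `rate` are invented for this illustration. The final calls recompute the percentages reported in the abstract from the abstract's own counts.

```python
from dataclasses import dataclass

# Hypothetical representation: an identifier occurrence is a (start, end)
# character span within one message. Exact-span matching is an assumption
# of this sketch; it does not model partially scrubbed identifiers.
Span = tuple[int, int]


@dataclass(frozen=True)
class ScrubTally:
    scrubbed: int       # gold-standard identifiers the tool removed
    missed: int         # gold-standard identifiers the tool left in place
    over_scrubbed: int  # removed spans that the gold standard did not mark


def tally(gold: set[Span], removed: set[Span]) -> ScrubTally:
    """Compare the spans a tool removed with a human-reviewed gold standard."""
    return ScrubTally(
        scrubbed=len(gold & removed),
        missed=len(gold - removed),
        over_scrubbed=len(removed - gold),
    )


def rate(scrubbed: int, total: int) -> float:
    """Scrub rate in percent, rounded to two decimals as in the abstract."""
    return round(100.0 * scrubbed / total, 2)


if __name__ == "__main__":
    # Toy message: the gold standard marks three identifier spans; the tool
    # removes two of them correctly, misses one, and over-scrubs one span.
    gold = {(0, 10), (25, 33), (50, 62)}
    removed = {(0, 10), (25, 33), (70, 75)}
    print(tally(gold, removed))

    # Counts taken from the abstract.
    print(rate(11_273, 11_380))  # HIPAA-specified, first evaluation
    print(rate(38_095, 38_768))  # non-HIPAA-specified, first evaluation
    print(rate(79_993, 80_418))  # HIPAA-specified, second evaluation
    print(rate(12_689, 13_091))  # non-HIPAA-specified, second evaluation
```

Exact-span matching is the simplest choice for a sketch; a fuller evaluation would also need a rule for partially removed identifiers, which the abstract counts separately as HIPAA-specified identifier fragments.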
Pages: 601-610
Page count: 10