Representing information in patient reports using natural language processing and the extensible markup language

Cited by: 81
Authors
Friedman, C
Hripcsak, G
Shagina, L
Liu, HF
Affiliations
[1] Columbia Univ, Dept Med Informat, New York, NY 10032 USA
[2] CUNY Queens Coll, New York, NY USA
DOI
10.1136/jamia.1999.0060076
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: To design a document model that provides reliable and efficient access to clinical information in patient reports for a broad range of clinical applications, and to implement an automated method using natural language processing that maps textual reports to a form consistent with the model.
Methods: A document model that encodes structured clinical information in patient reports while retaining the original contents was designed using the Extensible Markup Language (XML), and a document type definition (DTD) was created. An existing natural language processor (NLP) was modified to generate output consistent with the model. Two hundred reports were processed using the modified NLP system, and the generated XML output was validated using a validating XML parser.
Results: The modified NLP system successfully processed all 200 reports. The output of one report was invalid, and 199 reports were valid XML forms consistent with the DTD.
Conclusions: Natural language processing can be used to automatically create an enriched document that contains a structured component whose elements are linked to portions of the original textual report. This integrated document model provides a representation in which documents containing specific information can be accurately and efficiently retrieved by querying the structured components. If manual review of the documents is desired, the salient information in the original reports can also be identified and highlighted. Using an XML model of tagging provides an additional benefit in that software tools that manipulate XML documents are readily available.
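The idea of a structured component whose elements point back into the original text can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the element names (`report`, `structured`, `finding`, `text`, `sent`), the attributes (`name`, `certainty`, `idref`, `id`), and the helper `find_supporting_text` are hypothetical and are not taken from the paper's DTD. The sketch only parses and queries the document; Python's standard `xml.etree.ElementTree` does not validate against a DTD, so the validation step described in the abstract would need a separate validating parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical enriched report. Element and attribute names are illustrative
# only and are NOT the ones defined in the paper's DTD.
REPORT_XML = """
<report>
  <structured>
    <finding name="pneumonia" certainty="high" idref="s1"/>
    <finding name="effusion" certainty="no" idref="s2"/>
  </structured>
  <text>
    <sent id="s1">There is a right lower lobe pneumonia.</sent>
    <sent id="s2">No pleural effusion is seen.</sent>
  </text>
</report>
"""


def find_supporting_text(root, finding_name):
    """Return the original sentences linked to findings with the given name."""
    # Index the retained original text by sentence id.
    sentences = {s.get("id"): s.text for s in root.iter("sent")}
    # Query the structured component, then follow each link into the text.
    return [
        sentences[f.get("idref")]
        for f in root.iter("finding")
        if f.get("name") == finding_name
    ]


root = ET.fromstring(REPORT_XML)
print(find_supporting_text(root, "pneumonia"))
```

Retrieval runs against the structured `finding` elements, while the `idref` links make it possible to highlight the matching sentences when a reviewer wants to read the original report.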
Pages: 76-87
Page count: 12