Evaluation of a Method to Identify and Categorize Section Headers in Clinical Documents

Cited by: 74
Authors
Denny, Joshua C. [1, 2]
Spickard, Anderson, III [1, 2]
Johnson, Kevin B. [1, 3]
Peterson, Neeraja B. [2]
Peterson, Josh F. [1, 2, 4]
Miller, Randolph A. [1]
Affiliations
[1] Vanderbilt Univ, Dept Biomed Informat, Sch Med, Nashville, TN USA
[2] Vanderbilt Univ, Dept Med, Div Gen Internal Med & Publ Hlth, Sch Med, Nashville, TN USA
[3] Vanderbilt Univ, Sch Med, Dept Pediat, Nashville, TN 37212 USA
[4] Vet Adm, Tennessee Valley Geriatr Res Educ Clin Ctr, Tennessee Valley Healthcare Syst, Nashville, TN USA
Keywords
PHYSICAL DIAGNOSIS SKILLS; MEDICAL REFERENCE; DECISION-SUPPORT; ENTRY; PERFORMANCE; CURRICULUM; SOFTWARE; STUDENTS; CAPTURE; SYSTEMS;
DOI
10.1197/jamia.M3037
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: Clinical notes, typically written in natural language, often contain substructure that divides them into sections, such as "History of Present Illness" or "Family Medical History." The authors designed and evaluated an algorithm ("SecTag") to identify both labeled and unlabeled (implied) note section headers in "history and physical examination" documents ("H&P notes"). Design: The SecTag algorithm uses a combination of natural language processing techniques, word variant recognition with spelling correction, terminology-based rules, and naive Bayesian scoring methods to identify note section headers. Eleven physicians evaluated SecTag's performance on 319 randomly chosen H&P notes. Measurements: The primary outcomes were the algorithm's recall and precision in identifying all document sections and a predefined list of twenty-nine major sections. A secondary outcome was the algorithm's ability to recognize the correct start and end boundaries of identified sections. Results: The SecTag algorithm identified 16,036 total sections and 7,858 major sections. Physician evaluators classified 15,329 as true positives and identified 160 sections omitted by SecTag. The recall and precision of the SecTag algorithm were 99.0% and 95.6% for all sections, 98.6% and 96.2% for major sections, and 96.6% and 86.8% for unlabeled sections. The algorithm determined the correct starting and ending text boundaries for 94.8% of labeled sections and 85.9% of unlabeled sections. Conclusions: The SecTag algorithm accurately identified both labeled and unlabeled sections in history and physical documents. This type of algorithm may assist in natural language processing applications, such as clinical decision support systems or competency assessment for medical trainees. J Am Med Inform Assoc. 2009;16:806-815. DOI 10.1197/jamia.M3037.
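The all-sections recall and precision reported in the abstract can be checked from the counts it gives. The sketch below is not the authors' evaluation code. It assumes the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)), reads the true-positive count as 15,329, treats every identified section not confirmed by the physician evaluators as a false positive, and treats the 160 omitted sections as false negatives. The function and variable names are illustrative only.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) using the standard definitions."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Counts taken from the abstract (all sections).
identified = 16036   # sections SecTag identified
tp = 15329           # identified sections judged correct by physicians
omitted = 160        # sections the physicians found that SecTag missed

# Assumption: any identified section not judged a true positive is a false positive.
fp = identified - tp

precision, recall = precision_recall(tp, fp, omitted)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
# prints: precision = 95.6%, recall = 99.0%
```

Under these assumptions the computed values agree with the 99.0% recall and 95.6% precision stated for all sections.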
Pages: 806-815
Page count: 10