Recognition of medication information from discharge summaries using ensembles of classifiers

Cited by: 22
Authors
Doan, Son [1]
Collier, Nigel [1]
Xu, Hua [2]
Pham Hoang Duy [3]
Tu Minh Phuong [3]
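Affiliations
[1] Research Organization of Information and Systems, National Institute of Informatics, Chiyoda-ku, Tokyo 101-8430, Japan
[2] Vanderbilt University, School of Medicine, Department of Biomedical Informatics, Nashville, TN 37212, USA
[3] Posts and Telecommunications Institute of Technology, Department of Computer Science, Hanoi, Vietnam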
Keywords
Extraction system
DOI
10.1186/1472-6947-12-36
Chinese Library Classification (CLC) number
R-058
Abstract
Background: Extraction of clinical information, such as medications or problems, from clinical text is an important task in clinical natural language processing (NLP). Rule-based methods are often used in clinical NLP systems because they are easy to adapt and customize. Recently, supervised machine learning methods have also proven effective in clinical NLP. However, combining different classifiers to further improve the performance of clinical entity recognition systems has not been investigated extensively. Combining classifiers into an ensemble presents both challenges and opportunities for improving performance on such NLP tasks.

Methods: We investigated ensemble classifiers that used different voting strategies to combine the outputs of three individual classifiers: a rule-based system, a support vector machine (SVM)-based system, and a conditional random field (CRF)-based system. We proposed three voting methods (simple majority voting, local SVM-based voting, and local CRF-based voting) and evaluated them on the annotated data sets from the 2009 i2b2 NLP challenge.

Results: Evaluation on 268 manually annotated discharge summaries from the i2b2 challenge showed that the local CRF-based voting method achieved the best F-score, 90.84% (94.11% precision, 87.81% recall), in 10-fold cross-validation. We then compared our systems with the first-ranked system in the challenge using the same training and test sets. Our majority-voting system achieved an F-score of 89.65% (93.91% precision, 85.76% recall), higher than the 89.19% (93.78% precision, 85.03% recall) previously reported for the first-ranked system.

Conclusions: Our experimental results on the 2009 i2b2 challenge data sets showed that ensemble classifiers that combine individual classifiers through voting could outperform a single classifier at recognizing medication information in clinical text. This suggests that simple, easily implemented strategies such as majority voting have the potential to significantly improve clinical entity recognition.
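Simple majority voting over three classifiers can be illustrated at the token level. The sketch below is a minimal illustration, not the authors' implementation: it assumes each classifier emits one BIO-style label per token (the tag names such as `B-m` for medication or `B-do` for dosage are hypothetical), and that a token with no strict majority falls back to the outside tag `O`.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]], default: str = "O") -> List[str]:
    """Combine per-token labels from several classifiers by strict majority.

    predictions[k][i] is the label classifier k assigned to token i.
    A label wins only if more than half of the classifiers agree on it;
    otherwise the token gets `default` (assumed here to be the outside tag "O").
    """
    n_classifiers = len(predictions)
    if len({len(p) for p in predictions}) > 1:
        raise ValueError("all classifiers must label the same token sequence")
    combined = []
    for token_labels in zip(*predictions):
        label, count = Counter(token_labels).most_common(1)[0]
        combined.append(label if count * 2 > n_classifiers else default)
    return combined


# Hypothetical outputs for the tokens "aspirin 81 mg daily".
rule_based = ["B-m", "B-do", "I-do", "B-f"]
svm_based = ["B-m", "B-do", "I-do", "O"]
crf_based = ["B-m", "O", "I-do", "B-f"]
print(majority_vote([rule_based, svm_based, crf_based]))
```

With three voters, any label chosen by two of them wins; a three-way split yields `O`. Because each token is voted on independently, the combined sequence can in principle contain an `I-` tag without a preceding `B-` tag, so a real system may need a repair step afterwards.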
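The abstract does not spell out how the local SVM- and CRF-based voting methods work. One common family of learned voters replaces the fixed majority rule with a combiner trained on the base classifiers' outputs. The sketch below uses a Behavior-Knowledge-Space (BKS) lookup table as a simple, swapped-in stand-in for that idea; it is not the SVM or CRF voter evaluated in the paper, and the class name `BKSCombiner` and the tag names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


class BKSCombiner:
    """Behavior-Knowledge-Space combiner (illustrative stand-in, not the paper's method).

    It learns, for each tuple of base-classifier labels seen in training,
    the gold label that most often co-occurred with that tuple.
    """

    def __init__(self, default: str = "O") -> None:
        self.default = default
        self.table: Dict[Tuple[str, ...], str] = {}

    def fit(self, predictions: Sequence[Sequence[str]], gold: Sequence[str]) -> "BKSCombiner":
        counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
        # Each key is the tuple of labels the base classifiers gave one token.
        for key, label in zip(zip(*predictions), gold):
            counts[key][label] += 1
        self.table = {key: c.most_common(1)[0][0] for key, c in counts.items()}
        return self

    def predict(self, predictions: Sequence[Sequence[str]]) -> List[str]:
        out = []
        for key in zip(*predictions):
            if key in self.table:
                out.append(self.table[key])
            else:
                # Unseen label combination: fall back to strict majority voting.
                label, count = Counter(key).most_common(1)[0]
                out.append(label if count * 2 > len(key) else self.default)
        return out


# Hypothetical training data in which the third classifier is right
# when the other two both say "O".
train_preds = [["B-m", "O", "B-f"], ["B-m", "O", "O"], ["B-m", "B-do", "O"]]
train_gold = ["B-m", "B-do", "B-f"]
combiner = BKSCombiner().fit(train_preds, train_gold)
print(combiner.predict([["O"], ["O"], ["B-do"]]))
```

Unlike simple majority voting, the learned table can side with a minority classifier when training data shows it is reliable for a particular disagreement pattern; the price is that it needs annotated data for the combiner itself.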
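The reported F-scores are the harmonic mean of precision and recall, F = 2PR / (P + R). The short check below recomputes them from the rounded precision and recall values given in the abstract; small differences in the second decimal place are what rounding of P and R would be expected to produce.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-measure: weighted harmonic mean of precision and recall (F1 when beta = 1)."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision %, recall %, reported F-score %) taken from the abstract.
reported = {
    "local CRF-based voting, 10-fold CV": (94.11, 87.81, 90.84),
    "majority voting, challenge split": (93.91, 85.76, 89.65),
    "first-ranked i2b2 2009 system": (93.78, 85.03, 89.19),
}
for name, (p, r, f) in reported.items():
    print(f"{name}: recomputed F = {f_score(p, r):.2f}% (reported {f:.2f}%)")
```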
Pages: 10