Evaluation of a generalizable approach to clinical information retrieval using the automated retrieval console (ARC)

Cited by: 51
Authors
D'Avolio, Leonard W. [1, 2, 3, 4]
Nguyen, Thien M. [1]
Farwell, Wildon R. [1, 3, 4, 5]
Chen, Yongming [1]
Fitzmeyer, Felicia [1]
Harris, Owen M. [1]
Fiore, Louis D. [1, 6, 7]
Affiliations
[1] VA Boston Healthcare Syst, Coordinating Ctr, Cooperat Studies, Massachusetts Vet Epidemiol Res & Informat Ctr MA, Jamaica Plain, MA 02130 USA
[2] Brigham & Womens Hosp, Ctr Surg & Publ Hlth, Boston, MA 02115 USA
[3] Brigham & Womens Hosp, Dept Med, Div Ageing, Boston, MA 02115 USA
[4] Harvard Univ, Sch Med, Boston, MA USA
[5] VA Boston Healthcare Syst, Dept Med, Boston, MA USA
[6] Boston Univ, Sch Publ Hlth, Boston, MA USA
[7] Boston Univ, Sch Med, Boston, MA 02118 USA
Keywords
DE-IDENTIFICATION; MEDICAL-RECORDS; RADIOLOGY; QUALITY; ERA
DOI
10.1136/jamia.2009.001412
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Reducing custom software development effort is an important goal in information retrieval (IR). This study evaluated a generalizable approach involving no custom software or rules development. The study used documents "consistent with cancer" to evaluate system performance in the domains of colorectal (CRC), prostate (PC), and lung (LC) cancer. Using an end-user-supplied reference set, the automated retrieval console (ARC) iteratively calculated the performance of combinations of natural language processing-derived features and supervised classification algorithms. Training and testing involved 10-fold cross-validation for three sets of 500 documents each. Performance metrics included recall, precision, and F-measure. Annotation time for five physicians was also measured. Top-performing algorithms had recall, precision, and F-measure values as follows: for CRC, 0.90, 0.92, and 0.89, respectively; for PC, 0.97, 0.95, and 0.94; and for LC, 0.76, 0.80, and 0.75. In all but one case, conditional random fields outperformed maximum entropy-based classifiers. Algorithms performed well without custom code or rules development, but performance varied by specific application.
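The evaluation setup named in the abstract (10-fold cross-validation scored by recall, precision, and F-measure) can be illustrated with a minimal Python sketch. This is not ARC's code: the function names `precision_recall_f` and `k_fold_indices`, the binary 0/1 label encoding, the balanced F-measure (F1), and the round-robin fold assignment are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence, Tuple


def precision_recall_f(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for binary labels, where 1 marks a relevant document."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0  # harmonic mean
    return precision, recall, f1


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k (train, test) pairs; assumed round-robin assignment."""
    splits = []
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        splits.append((train, test))
    return splits


# Hypothetical toy labels, not data from the study.
gold = [1, 1, 1, 0, 0]
pred = [1, 1, 0, 1, 0]
p, r, f = precision_recall_f(gold, pred)

# 500 documents per set, as in the abstract, gives 450 training and 50 test documents per fold.
splits = k_fold_indices(500, 10)
```

In a cross-validated setting the metrics are typically computed on each held-out fold and then averaged. An average of per-fold F-measures generally differs from the F-measure computed from averaged precision and recall.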
Pages: 375-382
Number of pages: 8