An adaptive annotation approach for biomedical entity and relation recognition

Cited by: 14
Authors
Yimam S.M. [1]
Biemann C. [1]
Majnaric L. [2]
Šabanović Š. [2]
Holzinger A. [3]
Affiliations
[1] TU Darmstadt CS Department, FG Language Technology, Darmstadt
[2] Josip Juraj Strossmayer University of Osijek Faculty of Medicine Osijek, Osijek
[3] Research Unit HCI-KDD Institute for Medical Informatics, Statistics and Documentation Medical University Graz, Auenbruggerplatz 2, Graz
Keywords
Biomedical entity recognition; Data mining; Human in the loop; Interactive annotation; Knowledge discovery; Machine learning; Relation learning
DOI
10.1007/s40708-016-0036-4
Abstract
In this article, we demonstrate the impact of interactive machine learning: we develop a biomedical entity recognition dataset using a human-in-the-loop approach. In contrast to classical machine learning, human-in-the-loop approaches do not operate on predefined training or test sets, but assume that human input regarding system improvement is supplied iteratively. Here, during annotation, a machine learning model is built on previous annotations and used to propose labels for subsequent annotation. To demonstrate that such interactive and iterative annotation speeds up the development of quality dataset annotation, we conduct three experiments. In the first experiment, we carry out a simulated iterative annotation experiment and show that only a handful of medical abstracts need to be annotated to produce suggestions that increase annotation speed. In the second experiment, clinical doctors conducted a case study, annotating medical terms in documents relevant to their research. The third experiment explores the annotation of semantic relations with relation instance learning across documents. The experiments validate our method qualitatively and quantitatively, and give rise to a more personalized, responsive information extraction technology. © 2016, The Author(s).
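The iterative loop described in the abstract (train on previous annotations, propose labels for the next document, let the annotator correct them) can be illustrated with a minimal sketch. It is not the authors' system: the abstract does not name the learning model, so a simple token lookup stands in for it here, and the function names, the label set (DRUG, DISEASE, O), and the toy sentences are all hypothetical.

```python
from collections import Counter, defaultdict


def train(annotated):
    """Build a token -> most frequent label lookup from annotated sentences."""
    counts = defaultdict(Counter)
    for sentence in annotated:
        for token, label in sentence:
            counts[token.lower()][label] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


def suggest(model, tokens):
    """Propose one label per token; unseen tokens default to 'O' (no entity)."""
    return [model.get(tok.lower(), "O") for tok in tokens]


def simulate(gold_sentences):
    """Annotate sentences one at a time, retraining before each one.

    Returns, per sentence, the fraction of suggestions matching the gold labels.
    """
    annotated, accuracies = [], []
    for sentence in gold_sentences:
        model = train(annotated)  # model built on previous annotations only
        tokens = [tok for tok, _ in sentence]
        gold = [lab for _, lab in sentence]
        proposed = suggest(model, tokens)
        correct = sum(p == g for p, g in zip(proposed, gold))
        accuracies.append(correct / len(gold))
        annotated.append(sentence)  # simulated annotator corrects to gold
    return accuracies


# Toy data invented for illustration; not taken from the paper.
GOLD = [
    [("Aspirin", "DRUG"), ("reduces", "O"), ("fever", "DISEASE")],
    [("Aspirin", "DRUG"), ("treats", "O"), ("fever", "DISEASE")],
    [("Aspirin", "DRUG"), ("reduces", "O"), ("fever", "DISEASE")],
]
accuracies = simulate(GOLD)
```

In this toy run the suggestions for the first sentence come from an empty model, while later sentences benefit from the corrections already collected, which is the effect the simulated experiment in the abstract is meant to measure.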
Pages: 157-168
Page count: 11