Boosting automatic event extraction from the literature using domain adaptation and coreference resolution

Cited by: 96
Authors
Miwa, Makoto [1, 2]
Thompson, Paul [1, 2]
Ananiadou, Sophia [1, 2]
Affiliations
[1] Univ Manchester, Natl Ctr Text Min NaCTeM, Manchester M1 7DN, Lancs, England
[2] Univ Manchester, Sch Comp Sci, Manchester M1 7DN, Lancs, England
Funding
Biotechnology and Biological Sciences Research Council (BBSRC), UK
Keywords
CORPUS; CLASSIFICATION;
DOI
10.1093/bioinformatics/bts237
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
070307 [Chemical Biology]
Abstract
Motivation: In recent years, several biomedical event extraction (EE) systems have been developed. However, the nature of the annotated training corpora, as well as the training process itself, can limit the performance of the trained EE systems. In particular, most event-annotated corpora do not deal adequately with coreference. This impairs the trained systems' ability to recognize biomedical entities, and thus their accuracy in extracting events. Additionally, the fact that most EE systems are trained on a single annotated corpus further restricts their coverage.
Results: We have enhanced our existing EE system, EventMine, in two ways. First, we developed a new coreference resolution (CR) system and integrated it with EventMine. The standalone performance of our CR system in resolving anaphoric references to proteins is considerably higher than that of the best-ranked system in the COREF subtask of the BioNLP'11 Shared Task. Second, the improved EventMine incorporates domain adaptation (DA) methods, which extend EE coverage by allowing several different annotated corpora to be used during training. Combined with a novel set of methods to increase the generality and efficiency of EventMine, the integration of both CR and DA has resulted in significant improvements in EE, ranging between 0.5% and 3.4% F-score. The enhanced EventMine outperforms the highest-ranked systems from the BioNLP'09 Shared Task, and from the GENIA and Infectious Diseases subtasks of the BioNLP'11 Shared Task.
Pages: 1759-1765
Page count: 7