Large Scale Application of Neural Network Based Semantic Role Labeling for Automated Relation Extraction from Biomedical Texts

Cited by: 33
Authors
Barnickel, Thorsten
Weston, Jason
Collobert, Ronan
Mewes, Hans-Werner
Stuempflen, Volker
Affiliations
[1] Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Bioinformatics and Systems Biology (MIPS), Neuherberg
[2] NEC Laboratories America, Inc., Princeton, NJ
[3] Department of Genome-Oriented Bioinformatics, Technische Universität München, Life and Food Science Center Weihenstephan, Freising-Weihenstephan
Source
PLOS ONE | 2009 / Vol. 4 / Issue 07
Keywords
DOI
10.1371/journal.pone.0006393
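Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code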
07 ; 0710 ; 09 ;
Abstract
To reduce the increasing amount of time spent on literature search in the life sciences, several methods for automated knowledge extraction have been developed. Co-occurrence-based approaches can process large text corpora like MEDLINE in an acceptable time but are not able to extract any specific type of semantic relation. Semantic relation extraction methods based on syntax trees, on the other hand, are computationally expensive, and the interpretation of the generated trees is difficult. Several natural language processing (NLP) approaches for the biomedical domain exist that focus specifically on the detection of a limited set of relation types. For systems biology, generic approaches are needed that detect a multitude of relation types and can also process large text corpora, but the number of systems meeting both requirements is very limited. We introduce the use of SENNA ("Semantic Extraction using a Neural Network Architecture"), a fast and accurate neural network based Semantic Role Labeling (SRL) program, for the large scale extraction of semantic relations from the biomedical literature. A comparison of processing times of SENNA and other SRL systems or syntactical parsers used in the biomedical domain revealed that SENNA is the fastest Proposition Bank (PropBank) conforming SRL program currently available. 89 million biomedical sentences were tagged with SENNA on a 100 node cluster within three days. The accuracy of the presented relation extraction approach was evaluated on two test sets of annotated sentences, resulting in precision/recall values of 0.71/0.43. We show that the accuracy as well as the processing speed of the proposed semantic relation extraction approach is sufficient for its large scale application on biomedical text. The proposed approach is highly generalizable regarding the supported relation types and appears to be especially suited for general-purpose, broad-scale text mining systems.
The presented approach bridges the gap between fast, co-occurrence-based approaches lacking semantic relations and highly specialized and computationally demanding NLP approaches.
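As a rough illustration of the scale and accuracy figures quoted in the abstract, the following minimal sketch derives an average per-node throughput and an F1 score from them. The function names are hypothetical, the even split of work across nodes is an assumption, and neither derived value is reported in the abstract itself; because the run finished "within three days", the throughput is only a lower bound.

```python
# Figures quoted in the abstract; everything derived from them is illustrative.
SENTENCES = 89_000_000   # biomedical sentences tagged with SENNA
NODES = 100              # cluster nodes
DAYS = 3                 # upper bound: the run finished "within three days"
PRECISION = 0.71
RECALL = 0.43


def per_node_throughput(sentences: int, nodes: int, days: float) -> float:
    """Average sentences per second per node, assuming an even split across nodes."""
    seconds = days * 24 * 60 * 60
    return sentences / (nodes * seconds)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    rate = per_node_throughput(SENTENCES, NODES, DAYS)
    print(f"{rate:.2f} sentences/s per node (lower bound)")
    print(f"F1 = {f1_score(PRECISION, RECALL):.3f}")
```

Under these assumptions the cluster averaged at least about 3.4 sentences per second per node, and the quoted precision/recall pair corresponds to an F1 of roughly 0.54.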
Pages: 6