A novel machine learning framework for automated biomedical relation extraction from large-scale literature repositories

Cited by: 48
Authors
Hong, Lixiang [1]
Lin, Jinjian [1]
Li, Shuya [1]
Wan, Fangping [1]
Yang, Hui [2]
Jiang, Tao [3, 4, 5, 6]
Zhao, Dan [1]
Zeng, Jianyang [1, 5]
Affiliations
[1] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing, Peoples R China
[2] Silexon Co Ltd, Nanjing, Peoples R China
[3] Tsinghua Univ, MOE Key Lab Bioinformat, TNLIST, Bioinformat Div, Beijing, Peoples R China
[4] Tsinghua Univ, Ctr Synthet & Syst Biol, Beijing, Peoples R China
[5] Tsinghua Univ, MOE Key Lab Bioinformat, Beijing, Peoples R China
[6] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
Funding
National Natural Science Foundation of China;
Keywords
NEURAL-NETWORKS; DATABASE; SYNTAX;
DOI
10.1038/s42256-020-0189-y
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Much of the scientific literature is unstructured, which makes extracting information for biomedical databases difficult. Hong and colleagues show that a distant supervision approach, using latent tree learning and recurrent units, can extract previously unknown drug-target interactions from the literature.
Knowledge about the relations between biomedical entities (such as drugs and targets) is widely distributed across more than 30 million research articles and consistently plays an important role in the development of biomedical science. In this work, we propose a novel machine learning framework, named BERE, for automatically extracting biomedical relations from large-scale literature repositories. BERE uses a hybrid encoding network to better represent each sentence from both semantic and syntactic aspects, and employs a feature aggregation network to make predictions after considering all relevant statements. More importantly, BERE can also be trained without any human annotation via a distant supervision technique. Through extensive tests, BERE has demonstrated promising performance in extracting biomedical relations, and can also find meaningful relations that were not reported in existing databases, thus providing useful hints to guide wet-lab experiments and advance the biological knowledge discovery process.
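The distant supervision idea mentioned in the abstract can be illustrated with a minimal sketch. It is not BERE's implementation: the function name `build_bags`, the relation label `"interacts"`, the negative label `"NA"`, and the toy sentences are all assumptions made for illustration. The sketch only shows the general technique of grouping sentences that mention the same entity pair into a bag and labelling the whole bag from an existing database, so that no sentence is annotated by hand.

```python
from collections import defaultdict


def build_bags(sentences, knowledge_base, negative_label="NA"):
    """Group sentences by entity pair and label each bag from a knowledge base.

    sentences: iterable of (text, head_entity, tail_entity) tuples.
    knowledge_base: dict mapping (head_entity, tail_entity) to a relation label.
    Pairs absent from the knowledge base receive `negative_label`.
    """
    grouped = defaultdict(list)
    for text, head, tail in sentences:
        grouped[(head, tail)].append(text)

    # The label is attached to the bag, not to any single sentence.
    return {
        pair: {
            "label": knowledge_base.get(pair, negative_label),
            "sentences": texts,
        }
        for pair, texts in grouped.items()
    }


# Hypothetical toy data, not taken from the paper.
knowledge_base = {("imatinib", "ABL1"): "interacts"}
sentences = [
    ("Imatinib inhibits ABL1 kinase activity.", "imatinib", "ABL1"),
    ("ABL1 was measured after imatinib treatment.", "imatinib", "ABL1"),
    ("Aspirin was given alongside TP53 screening.", "aspirin", "TP53"),
]

bags = build_bags(sentences, knowledge_base)
for pair, bag in bags.items():
    print(pair, bag["label"], len(bag["sentences"]))
```

A model trained on such bags has to aggregate evidence across all sentences in a bag, since any individual sentence may not actually express the labelled relation; this is the role the abstract assigns to the feature aggregation network.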
Pages: 347 / +
Page count: 12