A hybrid method for relation extraction from biomedical literature

Cited by: 18
Authors
Huang, Minlie
Zhu, Xiaoyan [1]
Li, Ming
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China
[2] Univ Waterloo, Bioinformat Lab, Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
[3] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada; National Natural Science Foundation of China
Keywords
natural language processing; NLP; information extraction; relation extraction; shallow parsing; pattern matching; appositive structure; coordinative structure; protein-protein interaction
DOI
10.1016/j.ijmedinf.2005.06.010
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Purpose: In recent years, interest in extracting entities and relations from biomedical literature has grown. Many systems and approaches have been proposed to extract biological relations, but none achieves satisfactory results. These methods are either parsing-based or pattern-based; they either cannot handle the grammatical complexity of biomedical texts or are too complicated to adapt. Appositive and coordinative structures, and similar grammatical constructions, are extremely common in biomedical texts, particularly in full texts, yet most researchers have left these problems unaddressed.
Methods: We propose a hybrid approach that combines shallow parsing and pattern matching to extract relations between proteins from biomedical scientific papers. Appositive and coordinative structures are interpreted on the basis of shallow parsing analysis, using both syntactic and semantic constraints. Long sentences are then split into sub-sentences, from which relations are extracted by a greedy pattern matching algorithm using automatically generated patterns.
Results: We evaluated the approach on extracting protein-protein interactions from full biomedical texts. It achieved an average F-score of 80% on individual verbs and 66% on all verbs. Shallow parsing analysis improves pattern matching remarkably: compared with the traditional pattern matching algorithm, our approach improves both precision and F-score by about 7%. Its performance is comparable to that of the best existing systems. A demo system is available at http://spies.cs.tsinghua.edu.cn. (c) 2005 Elsevier Ireland Ltd. All rights reserved.
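The pipeline described in the abstract (resolve coordinative structures, split the sentence into sub-sentences, then match patterns greedily) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the patterns here are hand-written rather than automatically generated, the coordination rule is a single hard-coded case rather than the output of a shallow parser, and the names `PATTERNS`, `split_coordination`, `match_at`, and `extract`, as well as the protein names `ProtA`, `ProtB`, and `ProtC`, are hypothetical.

```python
import re

# Hypothetical hand-written patterns (the paper generates patterns automatically).
# Each pattern is a sequence of literal words or the PROTEIN slot.
PATTERNS = [
    ("PROTEIN", "interact", "with", "PROTEIN"),
    ("PROTEIN", "bind", "to", "PROTEIN"),
    ("PROTEIN", "bind", "PROTEIN"),
]


def split_coordination(tokens, proteins):
    """Expand a leading 'A and B <rest>' into 'A <rest>' and 'B <rest>'.

    A stand-in for shallow-parsing-based analysis: it only handles two
    coordinated protein names at the start of the sentence.
    """
    if (len(tokens) >= 3 and tokens[1] == "and"
            and tokens[0] in proteins and tokens[2] in proteins):
        rest = tokens[3:]
        return [[tokens[0]] + rest, [tokens[2]] + rest]
    return [tokens]


def match_at(tokens, start, pattern, proteins):
    """Return the protein fillers if `pattern` matches at `start`, else None."""
    if start + len(pattern) > len(tokens):
        return None
    fillers = []
    for tok, slot in zip(tokens[start:], pattern):
        if slot == "PROTEIN":
            if tok not in proteins:
                return None
            fillers.append(tok)
        elif tok != slot:
            return None
    return fillers


def extract(sentence, proteins):
    """Extract (protein, verb, protein) triples from one sentence."""
    tokens = re.findall(r"[\w-]+", sentence)
    relations = []
    for sub in split_coordination(tokens, proteins):
        i = 0
        while i < len(sub):
            # Greedy: try longer patterns first and skip past a match.
            for pattern in sorted(PATTERNS, key=len, reverse=True):
                fillers = match_at(sub, i, pattern, proteins)
                if fillers:
                    relations.append((fillers[0], pattern[1], fillers[1]))
                    i += len(pattern) - 1
                    break
            i += 1
    return relations
```

For example, `extract("ProtA and ProtB bind to ProtC", {"ProtA", "ProtB", "ProtC"})` expands the coordinated subject into two sub-sentences and yields one triple per sub-sentence. A realistic system would also need named-entity recognition and verb normalization, which this sketch omits.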
Pages: 443-455
Number of pages: 13