Discovering patterns to extract protein-protein interactions from the literature: Part II

Cited by: 53
Authors
Hao, Y
Zhu, XY [1]
Huang, ML
Li, M
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China
[2] Univ Waterloo, Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
[3] City Univ Hong Kong, Kowloon, Hong Kong, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada; National Natural Science Foundation of China
DOI
10.1093/bioinformatics/bti493
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: An enormous number of protein-protein interaction relationships are buried in millions of research articles published over the years, and the number is growing. Rediscovering them automatically is a challenging bioinformatics task. Solutions to this problem also reach far beyond bioinformatics.
Results: We study a new approach that involves automatically discovering English expression patterns, optimizing them and using them to extract protein-protein interactions. In a sister paper, we described how to generate English expression patterns related to protein-protein interactions, and this approach alone has already achieved precision and recall rates significantly higher than those of other automatic systems. This paper continues to present our theory, focusing on how to improve the patterns. A minimum description length (MDL)-based pattern-optimization algorithm is designed to reduce and merge patterns. This has significantly increased generalization power, and hence the recall and precision rates, as confirmed by our experiments.
Availability: http://spies.cs.tsinghua.edu.cn
Contact: zxy-dcs@tsinghua.edu.cn
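To make the idea of MDL-based pattern merging concrete, the following is a toy sketch only, not the algorithm from the paper. It assumes patterns are fixed-length token tuples, that two patterns may be merged when they differ in exactly one position (the differing token becomes a hypothetical `*` wildcard), and that a merge is kept only if it lowers a simple two-part code length, L(patterns) + L(sentences | patterns). The names `description_length`, `merge`, `optimize`, the `PROT` placeholder and the vocabulary size are all illustrative assumptions.

```python
import math
from itertools import combinations

WILDCARD = "*"  # hypothetical slot marker, not taken from the paper


def matches(pattern, sentence):
    """True if the pattern covers the sentence token by token."""
    return len(pattern) == len(sentence) and all(
        p == WILDCARD or p == s for p, s in zip(pattern, sentence)
    )


def description_length(patterns, sentences, vocab_size):
    """Two-part code length in bits: L(patterns) + L(sentences | patterns)."""
    # Model cost: every pattern symbol comes from the vocabulary plus the wildcard.
    model_bits = sum(len(p) for p in patterns) * math.log2(vocab_size + 1)
    data_bits = 0.0
    for s in sentences:
        # Data cost: index of the chosen pattern plus one token per wildcard slot.
        costs = [
            math.log2(len(patterns)) + p.count(WILDCARD) * math.log2(vocab_size)
            for p in patterns
            if matches(p, s)
        ]
        if not costs:
            return math.inf  # an uncovered sentence cannot be encoded
        data_bits += min(costs)
    return model_bits + data_bits


def merge(a, b):
    """Merge two equal-length patterns that differ in exactly one position."""
    if len(a) != len(b):
        return None
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) != 1:
        return None
    i = diff[0]
    return a[:i] + (WILDCARD,) + a[i + 1:]


def optimize(patterns, sentences, vocab_size):
    """Greedily apply the merge that most reduces the description length."""
    patterns = set(patterns)
    best = description_length(patterns, sentences, vocab_size)
    while True:
        candidates = []
        for a, b in combinations(sorted(patterns), 2):
            merged = merge(a, b)
            if merged is not None:
                new = (patterns - {a, b}) | {merged}
                dl = description_length(new, sentences, vocab_size)
                candidates.append((dl, sorted(new)))
        if not candidates:
            break
        dl, new = min(candidates)
        if dl >= best:
            break  # no merge shortens the code, so stop
        patterns, best = set(new), dl
    return sorted(patterns), best


sentences = [
    ("PROT", "interacts", "with", "PROT"),
    ("PROT", "binds", "with", "PROT"),
    ("PROT", "associates", "with", "PROT"),
    ("PROT", "is", "phosphorylated", "by", "PROT"),
]
initial_bits = description_length(set(sentences), sentences, 1000)
optimized, final_bits = optimize(sentences, sentences, 1000)
```

In this sketch the three four-token patterns collapse into one wildcard pattern, which is also what lets it cover an unseen verb in the same frame. A real system would need a richer pattern language and a check that merged patterns do not start matching non-interaction sentences.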
Pages: 3294-3300
Page count: 7
Related papers
21 in total
[1]   BIND - The Biomolecular Interaction Network Database [J].
Bader, GD ;
Donaldson, I ;
Wolting, C ;
Ouellette, BFF ;
Pawson, T ;
Hogue, CWV .
NUCLEIC ACIDS RESEARCH, 2001, 29 (01) :242-245
[2]  
Brill E, 1995, COMPUT LINGUIST, V21, P543
[3]  
Friedman C, 2001, Bioinformatics, V17 Suppl 1, pS74
[4]   Accomplishments and challenges in literature data mining for biology [J].
Hirschman, L ;
Park, JC ;
Tsujii, J ;
Wong, L ;
Wu, CH .
BIOINFORMATICS, 2002, 18 (12) :1553-1561
[5]   Discovering patterns to extract protein-protein interactions from full texts [J].
Huang, ML ;
Zhu, XY ;
Hao, Y ;
Payan, DG ;
Qu, KB ;
Li, M .
BIOINFORMATICS, 2004, 20 (18) :3604-3612
[6]  
LEROY G, 2002, PACIFIC S BIOCOMPUTI, V7, P350
[7]  
Li M., 2008, INTRO KOLMOGOROV COM
[8]   Mining literature for protein-protein interactions [J].
Marcotte, EM ;
Xenarios, I ;
Eisenberg, D .
BIOINFORMATICS, 2001, 17 (04) :359-363
[9]  
Ng, 1999, Genome Inform Ser Workshop Genome Inform, V10, P104
[10]   Automated extraction of information on protein-protein interactions from the biological literature [J].
Ono, T ;
Hishigaki, H ;
Tanigami, A ;
Takagi, T .
BIOINFORMATICS, 2001, 17 (02) :155-161