Discovering patterns to extract protein-protein interactions from full texts

Cited by: 135
Authors
Huang, ML
Zhu, XY [1]
Hao, Y
Payan, DG
Qu, KB
Li, M
Affiliations
[1] Tsinghua Univ, State Key Lab Intelligent Technol & Syst, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Rigel Pharmaceut Inc, San Francisco, CA 94080 USA
[3] Univ Waterloo, Sch Comp Sci, Bioinformat Lab, Waterloo, ON N2L 3G1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
DOI
10.1093/bioinformatics/bth451
Chinese Library Classification (CLC)
Q5 [Biochemistry]
Discipline codes
071010; 081704
Abstract
Motivation: Although several databases store protein-protein interactions, most such data still exist only in the scientific literature, scattered across texts written in natural language that resist data mining. Extracting protein pathways from the literature therefore takes considerable time and labor. Our aim is to develop a robust and powerful methodology for mining protein-protein interactions from biomedical texts.

Results: We present a novel and robust approach for extracting protein-protein interactions from the literature. Our method uses a dynamic programming algorithm to compute distinguishing patterns by aligning relevant sentences and the key verbs that describe protein interactions. A matching algorithm is designed to extract the interactions between proteins. Equipped only with a dictionary of protein names, our system achieves a recall of 80.0% and a precision of 80.5%.
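The align-then-match idea described in the Results can be illustrated with a short Python sketch. It assumes sentences are already tokenized and that dictionary protein names have been replaced by a placeholder token `PTN`. It uses standard Needleman-Wunsch global alignment (a classic dynamic programming method) with a made-up scoring scheme (`MATCH`, `MISMATCH`, `GAP`), keeps the tokens two sentences share as a pattern, and turns each unaligned stretch into a `*` wildcard. The function names `align`, `extract_pattern`, and `match`, the scoring values, and the example sentences are all hypothetical. Neither the scoring nor the naive backtracking matcher should be read as the authors' published algorithm.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical scoring scheme, chosen for illustration only (not from the paper).
MATCH, MISMATCH, GAP = 2, -1, -1


def align(a: List[str], b: List[str]) -> List[Tuple[Optional[str], Optional[str]]]:
    """Needleman-Wunsch global alignment of two token sequences (None marks a gap)."""
    def sub(x: str, y: str) -> int:
        return MATCH if x == y else MISMATCH

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + GAP,
                              score[i][j - 1] + GAP)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Tuple[Optional[str], Optional[str]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    return pairs[::-1]


def extract_pattern(a: List[str], b: List[str]) -> List[str]:
    """Keep tokens the two sentences share; collapse each unaligned run into one '*'."""
    pattern: List[str] = []
    for x, y in align(a, b):
        token = x if x == y else "*"
        if token == "*" and pattern and pattern[-1] == "*":
            continue
        pattern.append(token)
    return pattern


def match(pattern: List[str], tokens: List[str], proteins: Set[str]) -> Optional[List[str]]:
    """Return the proteins bound to PTN slots if the sentence fits the pattern, else None.

    '*' matches zero or more tokens; PTN matches any name in the protein dictionary.
    The pattern is padded with '*' so it may occur anywhere in the sentence.
    """
    full = ["*"] + pattern + ["*"]

    def rec(p: int, t: int) -> Optional[List[str]]:
        if p == len(full):
            return [] if t == len(tokens) else None
        tok = full[p]
        if tok == "*":
            # Try every possible length for the wildcard (naive backtracking).
            for k in range(t, len(tokens) + 1):
                rest = rec(p + 1, k)
                if rest is not None:
                    return rest
            return None
        if t == len(tokens):
            return None
        if tok == "PTN" and tokens[t] in proteins:
            rest = rec(p + 1, t + 1)
            return None if rest is None else [tokens[t]] + rest
        return rec(p + 1, t + 1) if tok == tokens[t] else None

    return rec(0, 0)


if __name__ == "__main__":
    # Illustrative training sentences, with protein names already replaced by PTN.
    pattern = extract_pattern(["PTN", "binds", "to", "PTN"],
                              ["PTN", "strongly", "binds", "to", "PTN"])
    print(pattern)  # ['PTN', '*', 'binds', 'to', 'PTN']
    sentence = ["We", "show", "that", "Cdc28", "directly", "binds", "to", "Cln2"]
    print(match(pattern, sentence, {"Cdc28", "Cln2"}))  # ['Cdc28', 'Cln2']
```

Because the wildcard matcher backtracks, it can take exponential time on long sentences containing many wildcards; memoizing `rec` or compiling patterns into an automaton would be the usual remedies.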
Pages: 3604-3612
Page count: 9