Evaluating contributions of natural language parsers to protein-protein interaction extraction

Cited by: 88
Authors
Miyao, Yusuke [1]
Sagae, Kenji [2]
Saetre, Rune [1]
Matsuzaki, Takuya [1]
Tsujii, Jun'ichi [1, 3, 4]
Affiliations
[1] Univ Tokyo, Dept Comp Sci, Tokyo, Japan
[2] Univ So Calif, Inst Creat Technol, Los Angeles, CA 90089 USA
[3] Univ Manchester, Sch Comp Sci, Manchester, Lancs, England
[4] Natl Ctr Text Min, Manchester, Lancs, England
Keywords
INFORMATION;
DOI
10.1093/bioinformatics/btn631
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: While text mining technologies for biomedical research have gained popularity as a way to take advantage of the explosive growth of information in text form in biomedical papers, selecting appropriate natural language processing (NLP) tools is still difficult for researchers who are not familiar with recent advances in NLP. This article provides a comparative evaluation of several state-of-the-art natural language parsers, focusing on the task of extracting protein-protein interactions (PPI) from biomedical papers. We measure how each parser, and its output representation, contributes to accuracy improvement when the parser is used as a component in a PPI system. Results: All the parsers attained improvements in accuracy of PPI extraction. The levels of accuracy obtained with these different parsers vary slightly, while differences in parsing speed are larger. The best accuracy in this work was obtained when we combined Miyao and Tsujii's Enju parser and Charniak and Johnson's reranking parser, and the accuracy is better than the state-of-the-art results on the same data.
Pages: 394-400
Number of pages: 7
Related papers
42 in total
[1]  
Airola A., 2008, P WORKSH CURR TRENDS, P1, DOI 10.3115/1572306.1572308
[2]  
[Anonymous], 2003, P 41 ANN M ASS COMP
[3]  
[Anonymous], 1993, Comput. Linguist., DOI 10.21236/ADA273556
[4]  
[Anonymous], P 2 BIOCREATIVE CHAL
[5]   Intricacies of Collins' parsing model [J].
Bikel, DM .
COMPUTATIONAL LINGUISTICS, 2004, 30 (04) :479-511
[6]   Comparative experiments on learning information extractors for proteins and their interactions [J].
Bunescu, R ;
Ge, RF ;
Kate, RJ ;
Marcotte, EM ;
Mooney, RJ ;
Ramani, AK ;
Wong, YW .
ARTIFICIAL INTELLIGENCE IN MEDICINE, 2005, 33 (02) :139-155
[7]  
Bunescu R., 2004, Proceedings of ACL, P439
[8]  
BUNESCU RC, 2005, P 19 ANN C NEUR INF
[9]  
Charniak E, 2000, 6TH APPLIED NATURAL LANGUAGE PROCESSING CONFERENCE/1ST MEETING OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE AND PROCEEDINGS OF THE ANLP-NAACL 2000 STUDENT RESEARCH WORKSHOP, pA132
[10]  
Charniak Eugene, 2005, P 43 ANN M ASS COMP