Extraction of Protein Interaction Data: A Comparative Analysis of Methods in Use

Cited by: 9
Authors
Jose, Hena [1]
Vadivukarasi, Thangavel [1]
Devakumar, Jyothi [1]
Affiliations
[1] Jubilant Biosys Ltd, 96 Ind Suburb, 2nd Stage, Bangalore 560022, Karnataka, India
Source
EURASIP JOURNAL ON BIOINFORMATICS AND SYSTEMS BIOLOGY | 2007 / Issue 01
Keywords
DOI
10.1155/2007/53096
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Subject classification code
07; 0710; 09;
Abstract
Several natural language processing (NLP) tools, both commercial and freely available, are used to extract protein interactions from publications. The methods used by these tools range from pattern matching to dynamic programming, each with its own recall and precision rates. A methodical survey of these tools against manual analysis, keeping in mind the minimum interaction information a researcher would need, has not been carried out. We compared data generated using some of the selected NLP tools with manually curated protein interaction data (PathArt and IMaps) to determine their recall and precision rates. The rates were found to be lower than the published scores when a normalized definition of interaction is considered. Each data point captured wrongly or not picked up by a tool was analyzed. Our evaluation brings forth critical failures of NLP tools and provides pointers for the development of an ideal NLP tool. Copyright (C) 2007 Hena Jose et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
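As a rough illustration of the comparison the abstract describes, the sketch below scores a tool's output against a manually curated set using the standard definitions of precision and recall. The tuple representation (protein A, protein B, interaction type), the function name `precision_recall`, and the example data are all assumptions made for illustration; the record does not state how the authors normalized or scored interactions.

```python
def precision_recall(extracted, curated):
    """Return (precision, recall) of extracted interactions against a curated set.

    Both arguments are iterables of hashable interaction records. An
    extracted record counts as correct only if it matches a curated
    record exactly (an assumed, strict notion of a match).
    """
    extracted, curated = set(extracted), set(curated)
    true_positives = len(extracted & curated)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(curated) if curated else 0.0
    return precision, recall


# Hypothetical data, not taken from the paper:
# (protein A, protein B, interaction type)
curated = {
    ("TP53", "MDM2", "binds"),
    ("RAF1", "MAP2K1", "phosphorylates"),
    ("GRB2", "SOS1", "binds"),
    ("AKT1", "GSK3B", "phosphorylates"),
}
extracted = {
    ("TP53", "MDM2", "binds"),
    ("RAF1", "MAP2K1", "binds"),  # right pair, wrong interaction type
    ("GRB2", "SOS1", "binds"),
}

precision, recall = precision_recall(extracted, curated)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this strict matching rule, the pair reported with the wrong interaction type counts as an error, which is one way scores can drop once more than the bare protein pair is required of an interaction.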
Pages: 9
Related papers
17 in total
[1]  
Ahmed S. T., 2005, ASS COMPUTATIONAL LI, P54
[2]   BioRAT: extracting biological information from full-length papers [J].
Corney, DPA ;
Buxton, BF ;
Langdon, WB ;
Jones, DT .
BIOINFORMATICS, 2004, 20 (17) :3206-3213
[3]   Extracting human protein interactions from MEDLINE using a full-sentence parser [J].
Daraselia, N ;
Yuryev, A ;
Egorov, S ;
Novichkova, S ;
Nikitin, A ;
Mazo, I .
BIOINFORMATICS, 2004, 20 (05) :604-U43
[4]   PreBIND and Textomy - mining the biomedical literature for protein-protein interactions using a support vector machine [J].
Donaldson, I ;
Martin, J ;
de Bruijn, B ;
Wolting, C ;
Lay, V ;
Tuekam, B ;
Zhang, SD ;
Baskin, B ;
Bader, GD ;
Michalickova, K ;
Pawson, T ;
Hogue, CWV .
BMC BIOINFORMATICS, 2003, 4 (1)
[5]  
Friedman C, 2001, Bioinformatics, V17 Suppl 1, pS74
[6]  
Fukuda K, 1998, Pac Symp Biocomput, P707
[7]   Literature mining and database annotation of protein phosphorylation using a rule-based system [J].
Hu, ZZ ;
Narayanaswamy, M ;
Ravikumar, KE ;
Vijay-Shanker, K ;
Wu, CH .
BIOINFORMATICS, 2005, 21 (11) :2759-2765
[8]   Discovering patterns to extract protein-protein interactions from full texts [J].
Huang, ML ;
Zhu, XY ;
Hao, Y ;
Payan, DG ;
Qu, KB ;
Li, M .
BIOINFORMATICS, 2004, 20 (18) :3604-3612
[9]   Biomedical language processing: What's beyond PubMed? [J].
Hunter, L ;
Cohen, KB .
MOLECULAR CELL, 2006, 21 (05) :589-594
[10]  
Jae-Hong Eom, 2004, Genomics & Informatics, V2, P99