SR4GN: A Species Recognition Software Tool for Gene Normalization

Cited by: 52
Authors
Wei, Chih-Hsuan [1 ,2 ]
Kao, Hung-Yu [2 ]
Lu, Zhiyong [1 ]
Affiliations
[1] Natl Lib Med, Natl Ctr Biotechnol Informat, Bethesda, MD 20894 USA
[2] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 70101, Taiwan
Funding
National Institutes of Health (USA);
Keywords
ENTITIES;
DOI
10.1371/journal.pone.0038460
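Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code
070301 [Inorganic Chemistry]; 070403 [Astrophysics]; 070507 [Natural Resources and Territorial Spatial Planning]; 090105 [Crop Production Systems and Ecological Engineering];
Abstract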
As suggested in recent studies, species recognition and disambiguation is one of the most critical and challenging steps in many downstream text-mining applications, such as gene normalization and protein-protein interaction extraction. We report SR4GN, an open source tool for species recognition and disambiguation in biomedical text. In addition to the species detection function found in existing tools, SR4GN is optimized for the Gene Normalization task: it is developed to link detected species with the corresponding gene mentions in a document. SR4GN achieves 85.42% accuracy and compares favorably to other state-of-the-art techniques in benchmark experiments. Finally, SR4GN is implemented as a standalone software tool, making it convenient and robust for use in many text-mining applications. SR4GN can be downloaded at: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/downloads/SR4GN
Pages: 5