The GNAT library for local and remote gene mention normalization

Cited by: 46
Authors
Hakenberg, Joerg [1]
Gerner, Martin [2]
Haeussler, Maximilian [2]
Solt, Illes [3]
Plake, Conrad [4]
Schroeder, Michael [5]
Gonzalez, Graciela [6]
Nenadic, Goran [7]
Bergman, Casey M. [2]
Affiliations
[1] Hoffmann La Roche Inc, Pharma Res & Early Dev, Nutley, NJ 07110 USA
[2] Univ Manchester, Fac Life Sci, Manchester M13 9PT, Lancs, England
[3] Humboldt Univ, D-10090 Berlin, Germany
[4] Max Delbruck Ctr Mol Med, D-13092 Berlin, Germany
[5] Tech Univ Dresden, Ctr Biotechnol, D-01307 Dresden, Germany
[6] Arizona State Univ, Biomed Informat Dept, Phoenix, AZ 85004 USA
[7] Univ Manchester, Sch Comp Sci, Manchester M13 9PL, Lancs, England
Funding
Biotechnology and Biological Sciences Research Council (BBSRC), UK
DOI
10.1093/bioinformatics/btr455
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Identifying mentions of named entities, such as genes or diseases, and normalizing them to database identifiers have become an important step in many text and data mining pipelines. Despite this need, very few entity normalization systems are publicly available as source code or web services for biomedical text mining. Here we present the GNAT Java library for text retrieval, named entity recognition, and normalization of gene and protein mentions in biomedical text. The library can be used as a component to be integrated with other text-mining systems, as a framework to add user-specific extensions, and as an efficient stand-alone application for the identification of gene and protein names for data analysis. On the BioCreative III test data, the current version of GNAT achieves a TAP-20 score of 0.1987.
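To make the idea of "normalizing mentions to database identifiers" concrete, here is a minimal, hypothetical sketch of the simplest form of dictionary-based gene normalization. It is not GNAT's API or algorithm (GNAT is a Java library). The function names `canonical_form`, `build_index` and `normalize`, and the two-entry toy lexicon keyed by human Entrez Gene IDs, are illustrative assumptions. The sketch shows only the core lookup: surface variants such as "p-53" and "Tumor Protein P53" collapse to one canonical string that maps to a single identifier. Real systems must also resolve ambiguous names and decide which species a mention refers to, which this sketch ignores.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon: gene identifier -> known synonyms.
LEXICON: Dict[str, List[str]] = {
    "7157": ["TP53", "p53", "tumor protein p53"],
    "672": ["BRCA1", "breast cancer 1"],
}


def canonical_form(mention: str) -> str:
    """Lowercase a mention and drop every non-alphanumeric character."""
    return re.sub(r"[^a-z0-9]", "", mention.lower())


def build_index(lexicon: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each canonicalised synonym to its gene identifier."""
    index: Dict[str, str] = {}
    for gene_id, synonyms in lexicon.items():
        for synonym in synonyms:
            index[canonical_form(synonym)] = gene_id
    return index


def normalize(mentions: List[str], index: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each mention with its identifier, or None if the lexicon has no match."""
    return [(m, index.get(canonical_form(m))) for m in mentions]


if __name__ == "__main__":
    idx = build_index(LEXICON)
    for mention, gene_id in normalize(["p-53", "BRCA 1", "Tumor Protein P53", "XYZ"], idx):
        print(mention, "->", gene_id)
```

Canonicalising both the lexicon and the query with the same function keeps lookup a single dictionary access. The trade-off is that aggressive stripping can merge names that should stay distinct, which is one reason practical normalizers add context-based disambiguation on top.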
Pages: 2769-2771
Number of pages: 3