GeneTUKit: a software for document-level gene normalization

Cited by: 51
Authors
Huang, Minlie [1]
Liu, Jingchen [1]
Zhu, Xiaoyan [1]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
DOI
10.1093/bioinformatics/btr042
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Linking gene mentions in an article to entries in biological databases can greatly facilitate indexing and querying of the biological literature. The task is particularly challenging because gene names are highly ambiguous. Manual annotation is costly, time-consuming and labor-intensive, so assistive tools that support the task are of high value.
Results: We developed GeneTUKit, a document-level gene normalization tool for full-text articles. It uses both the local context surrounding each gene mention and the global context of the whole full-text document, and it can normalize genes of different species simultaneously. In BioCreAtIvE III, the system performed well among 37 runs: on the 507 full-text test articles it ranked first, fourth and seventh in terms of TAP-20, TAP-10 and TAP-5, respectively.
Pages: 1032-1033
Page count: 2