Gene Name Disambiguation Using Multi-Scope Species Detection

Times cited: 1
Authors
Hsiao, Jui-Chen [1]
Wei, Chih-Hsuan [2]
Kao, Hung-Yu [1,2]
Affiliations
[1] Natl Cheng Kung Univ, Inst Med Informat, Tainan 701, Taiwan
[2] Natl Cheng Kung Univ, Inst Comp Sci & Informat, Tainan 701, Taiwan
Keywords
Biomedical text mining; gene name disambiguation; focus species detection; protein interactions; normalization; text; extraction; articles
DOI
10.1109/TCBB.2013.139
CLC classification number
Q5 [Biochemistry]
Discipline code
070307 [Chemical Biology]
Abstract
Species detection is an important topic in text mining. Because related research problems such as species assignment to genes and document focus species detection are important in their own right, some studies have been dedicated to individual problems. However, no study to date has treated species detection as a general problem. We therefore developed a multi-scope species detection model that identifies the focus species at different scopes: the gene mention, the sentence, the paragraph, and the global scope of the entire article. Species assignment is one of the bottlenecks of gene name disambiguation, and in our evaluation, recognizing the focus species of a gene mention at these four scopes improved gene name disambiguation. We used species cue words extracted from articles to estimate the relevance between an article and a species. The relevance score was calculated with our proposed entities frequency-augmented invert species frequency (EF-AISF) formula, which represents the importance of an entity to a species, and was normalized with a relation guide factor (RGF) that we also defined. Our method outperformed previous methods and can also handle articles that do not explicitly mention a species; on the DECA corpus, it achieved an accuracy of 88.22 percent.
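The abstract describes EF-AISF only as a TF-IDF-style measure of an entity's importance to a species and does not give its exact form. The Python sketch below therefore assumes an augmented entity frequency, 0.5 + 0.5 * f / max_f, multiplied by an inverse species frequency, log(N / sf), and it picks the focus species by backing off from the narrowest scope (gene mention) to the widest (whole article). Both choices, the toy cue-word counts, and the helper names `ef_aisf_weights`, `score_species`, and `focus_species` are hypothetical. The RGF normalization is omitted because the abstract does not define it.

```python
import math
from collections import Counter


def ef_aisf_weights(species_lexicon):
    """Weight each (species, entity) pair in a TF-IDF style.

    Hypothetical form (not the published formula): augmented entity
    frequency 0.5 + 0.5 * f / max_f, times inverse species frequency
    log(N / sf), where sf counts species whose lexicon contains the entity.
    """
    n_species = len(species_lexicon)
    sf = Counter()
    for counts in species_lexicon.values():
        sf.update(set(counts))  # count each entity once per species
    weights = {}
    for species, counts in species_lexicon.items():
        max_f = max(counts.values())
        for entity, f in counts.items():
            ef = 0.5 + 0.5 * f / max_f
            isf = math.log(n_species / sf[entity])  # 0 if shared by all species
            weights[(species, entity)] = ef * isf
    return weights


def score_species(tokens, weights, species_list):
    """Sum the weights of the cue words that occur in one scope's tokens."""
    counts = Counter(tokens)
    return {
        s: sum(c * weights.get((s, t), 0.0) for t, c in counts.items())
        for s in species_list
    }


def focus_species(scopes, weights, species_list):
    """Return the top-scoring species at the narrowest scope with evidence.

    `scopes` is ordered narrowest to widest (gene mention, sentence,
    paragraph, article); this backoff order is an assumption.
    """
    for tokens in scopes:
        scores = score_species(tokens, weights, species_list)
        best = max(scores, key=scores.get)
        if scores[best] > 0:
            return best
    return None  # no species cue words at any scope


# Toy cue-word counts per species, invented for illustration only.
lexicon = {
    "human": Counter({"human": 5, "patient": 3, "cell": 2}),
    "mouse": Counter({"mouse": 5, "mice": 4, "cell": 2}),
    "yeast": Counter({"yeast": 5, "cerevisiae": 2, "cell": 1}),
}
species = list(lexicon)
w = ef_aisf_weights(lexicon)
scopes = [
    ["Brca1"],                                # gene mention
    ["Brca1", "expression", "in", "cell"],    # sentence
    ["knockout", "mice", "lacked", "Brca1"],  # paragraph
    ["human", "patient", "mice", "Brca1"],    # whole article
]
print(focus_species(scopes, w, species))  # → mouse
```

Here "cell" appears in every species' lexicon, so its inverse species frequency is zero and the sentence scope gives no evidence; the paragraph cue "mice" then decides the species before the article-level "human" is consulted. Whether the published model backs off in this way or combines the four scopes jointly is not stated in the abstract.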
Pages: 55-62
Page count: 8