Which species is it? Species-driven gene name disambiguation using random walks over a mixture of adjacency matrices

Cited by: 4
Authors
Harmston, Nathan [1]
Filsell, Wendy [2]
Stumpf, Michael P. H. [1]
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Ctr Bioinformat, Div Mol Biosci, London SW7 2AZ, England
[2] Unilever R&D, Sharnbrook MK44 1LQ, Beds, England
Funding
Biotechnology and Biological Sciences Research Council (BBSRC), UK;
Keywords
SYSTEMS BIOLOGY; TEXT; NORMALIZATION; ENTITIES; IDENTIFICATION; PROTEIN; MODEL; GNAT;
DOI
10.1093/bioinformatics/btr640
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
070307 [Chemical Biology];
Abstract
Results: Our method performs well in terms of both micro- and macro-averaged performance, achieving micro-F1 of 0.76 and macro-F1 of 0.36 on the publicly available DECA corpus. Re-curation of the DECA corpus was performed, with our method achieving 0.88 micro-F1 and 0.51 macro-F1. Our method improves over standard classification techniques [such as support vector machines (SVMs)] in a number of ways: flexibility, interpretability and its resistance to the effects of class bias in the training data. Good performance is achieved without the need for computationally expensive parse tree generation or 'bag of words classification'.
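The gap between the micro- and macro-averaged scores reported above is typical of corpora dominated by a few species. The following is a minimal sketch, not the authors' evaluation code, of how the two averages can be computed for single-label species assignments. The function names, the toy labels and the choice to average the macro score over every label seen in either the gold or the predicted set are assumptions made for illustration.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; taken to be 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """Return (micro_f1, macro_f1) for paired gold/predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[g] += 1  # gold label gains a false negative
    # Assumption: macro average runs over labels seen in gold or pred.
    labels = set(gold) | set(pred)
    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Toy data invented for illustration: 8 human mentions, 1 mouse, 1 yeast,
# scored against a system that always answers "human".
gold = ["human"] * 8 + ["mouse", "yeast"]
pred = ["human"] * 10
micro, macro = micro_macro_f1(gold, pred)
print(f"micro-F1 = {micro:.2f}, macro-F1 = {macro:.2f}")
```

In this toy case the majority-class guesser scores well on the pooled (micro) average but poorly on the per-species (macro) average, because the two rare species each contribute an F1 of zero. This is why both figures are informative when the training data are biased towards a few species.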
Pages: 254-260
Number of pages: 7