Integration of gene normalization stages and co-reference resolution using a Markov logic network

Cited by: 10
Authors
Dai, Hong-Jie [1, 2]
Chang, Yen-Ching [2, 3, 4]
Tsai, Richard Tzong-Han [5]
Hsu, Wen-Lian [1, 2]
Affiliations
[1] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu 30043, Taiwan
[2] Acad Sinica, Inst Informat Sci, Intelligent Agent Syst Lab, Taipei, Taiwan
[3] Natl Yang Ming Univ, Dept Life Sci, Taipei 112, Taiwan
[4] Natl Yang Ming Univ, Inst Genome Sci, Taipei 112, Taiwan
[5] Yuan Ze Univ, Dept Comp Sci & Engn, Chungli, Taiwan
Keywords
KNOWLEDGE;
DOI
10.1093/bioinformatics/btr358
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
070307 [Chemical Biology];
Abstract
Motivation: Gene normalization (GN) is the task of normalizing a textual gene mention to a unique gene database ID. Traditional top-performing GN systems usually need to consider several constraints when making normalization decisions, such as filtering out false positives or disambiguating an ambiguous gene mention, to improve system performance. However, these constraints are usually executed in several separate stages and cannot use each other's input/output interactively. In this article, we propose a novel approach that employs a Markov logic network (MLN) to model the constraints used in the GN task. First, we show how various constraints can be formulated and combined in an MLN. Second, we are the first to apply the two main concepts of co-reference resolution (discourse salience in centering theory, and transitivity) to GN models. Furthermore, to make our results more relevant to developers of information extraction applications, we adopt the instance-based precision/recall/F-measure (PRF) in addition to the article-wide PRF to assess system performance.
Results: Experimental results show that our system outperforms baseline and state-of-the-art systems under two evaluation schemes. Through further analysis, we have found several unexplored challenges in the GN task.
Pages: 2586-2594
Page count: 9