Large-scale directional relationship extraction and resolution

Cited by: 24
Authors
Giles, Cory B. [1 ]
Wren, Jonathan D. [1 ]
Affiliations
[1] Oklahoma Med Res Fdn, Arthritis & Immunol Res Program, Oklahoma City, OK 73104 USA
DOI
10.1186/1471-2105-9-S9-S11
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: Relationships between entities such as genes, chemicals, metabolites, phenotypes and diseases in MEDLINE are often directional. That is, one may affect the other in a positive or negative manner. Detecting causality and direction is key to piecing pathways together and to examining the possible implications of experimental results. Because of the size and growth of the biomedical literature, it is increasingly important to automate this process as much as possible.
Results: Here we present a method of relation extraction using dependency graph parsing with SVM classification. We first tested the SVM classifier on gold standard corpora from GENIA and found that it achieved 82% precision and 94.8% recall (F-measure of 87.9) on these standardized test sets. We then applied the entire system to all available MEDLINE abstracts for two target interactions with known effects. We found that while some directional relations are extracted with low ambiguity, others are apparently contradictory, at least when considered in isolation. On examination, some of these relations depend on the surrounding context (e.g. whether the relationship referred to short-term or long-term effects, or whether the focus was extracellular versus intracellular).
Conclusion: Thesaurus-based directional relation extraction can be done with reasonable accuracy, but is prone to false positives on larger corpora due to noun modifiers. Furthermore, methods of resolving or disambiguating relationship context and contingencies are important for large-scale corpora.
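As a quick consistency check on the figures reported in the abstract, the sketch below recomputes the F-measure from the stated precision and recall. It assumes the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not say which variant was used, and the function name `f_measure` is purely illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1.0 gives the balanced F1 (assumed here)."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract (GENIA test sets).
precision = 0.82
recall = 0.948

f1 = f_measure(precision, recall)
print(f"{f1:.1%}")  # prints "87.9%"
```

Under the F1 assumption the result agrees with the reported F-measure of 87.9.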
Pages: 13