Complex event extraction at PubMed scale

Times cited: 65
Authors
Bjorne, Jari [1, 2]
Ginter, Filip [1]
Pyysalo, Sampo [3]
Tsujii, Jun'ichi [3, 4]
Salakoski, Tapio [1, 2]
Affiliations
[1] Univ Turku, Dept Informat Technol, Turku, Finland
[2] Turku Ctr Comp Sci TUCS, Turku, Finland
[3] Univ Tokyo, Dept Comp Sci, Tokyo, Japan
[4] Univ Manchester, Natl Ctr Text Min, Manchester, Lancs, England
Funding
Academy of Finland
Keywords
NETWORK
DOI
10.1093/bioinformatics/btq180
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: There has recently been a notable shift in biomedical information extraction (IE) from relation models toward the more expressive event model, facilitated by the maturation of basic tools for biomedical text analysis and the availability of manually annotated resources. The event model allows detailed representation of complex natural language statements and can support a range of advanced text mining applications, from semantic search to pathway extraction. A recent collaborative evaluation demonstrated the potential of event extraction systems, yet so far there have been no studies of either the generalization ability of these systems or the feasibility of large-scale extraction.
Results: This study considers event-based IE at PubMed scale. We introduce a system combining publicly available, state-of-the-art methods for domain parsing, named entity recognition and event extraction, and test the system on a representative 1% sample of all PubMed citations. We present the first evaluation of the generalization performance of event extraction systems at this scale and show that, despite its computational complexity, event extraction from the entire PubMed is feasible. We further illustrate the value of the extraction approach through a number of analyses of the extracted information.
Availability: The event detection system and extracted data are available under an open-source license at http://bionlp.utu.fi/.
Contact: jari.bjorne@utu.fi
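To make the abstract's contrast between relation models and the event model concrete, here is a minimal Python sketch. It is not taken from the authors' system. It assumes event type and role names (Phosphorylation, Positive_regulation, Theme, Cause) in the style of the BioNLP'09 shared task, which we take to be the collaborative evaluation mentioned in the abstract; the classes, helper functions and example sentence are all hypothetical. An event is anchored to a trigger word, and its arguments may themselves be events, so nested statements can be represented. Flattening an event to an entity tuple, as a pairwise relation model would, discards the event types and triggers.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Tuple, Union

# Illustrative sketch only: class names, helpers and event/role labels
# are assumptions (BioNLP'09-style), not the paper's actual data model.


@dataclass(frozen=True)
class Entity:
    """A named-entity mention, such as a gene or protein name."""
    text: str


@dataclass(frozen=True)
class Event:
    """A typed event anchored to a trigger word.

    Each argument is a (role, participant) pair; a participant may be an
    Entity or another Event, which is what allows nested statements.
    """
    event_type: str
    trigger: str
    arguments: Tuple[Tuple[str, Union[Entity, Event]], ...] = ()


def nesting_depth(node: Union[Entity, Event]) -> int:
    """Return 0 for an entity, otherwise 1 + the deepest argument."""
    if isinstance(node, Entity):
        return 0
    return 1 + max((nesting_depth(arg) for _, arg in node.arguments), default=0)


def entities_in(node: Union[Entity, Event]) -> List[str]:
    """Collect entity names reachable from a node, in argument order."""
    if isinstance(node, Entity):
        return [node.text]
    names: List[str] = []
    for _, arg in node.arguments:
        names.extend(entities_in(arg))
    return names


def as_binary_relation(event: Event) -> Tuple[str, ...]:
    """Flatten an event to the entity tuple a relation model would keep.

    Event types, triggers and roles are discarded, illustrating the
    information a pairwise relation representation cannot express.
    """
    return tuple(entities_in(event))


# Hypothetical sentence: "IL-2 induces phosphorylation of STAT5."
phosphorylation = Event("Phosphorylation", "phosphorylation",
                        (("Theme", Entity("STAT5")),))
regulation = Event("Positive_regulation", "induces",
                   (("Theme", phosphorylation), ("Cause", Entity("IL-2"))))

print(nesting_depth(regulation))       # 2
print(as_binary_relation(regulation))  # ('STAT5', 'IL-2')
```

Flattening the nested event leaves only the pair of protein names. The information that IL-2 causes STAT5 to be phosphorylated is lost, which is the expressiveness gap between relation and event models that the abstract describes.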
Pages: i382-i390
Number of pages: 9