BioContext: an integrated text mining system for large-scale extraction and contextualization of biomolecular events

Cited by: 35
Authors
Gerner, Martin [1]
Sarafraz, Farzaneh [2]
Bergman, Casey M. [1]
Nenadic, Goran [2]
Affiliations
[1] Univ Manchester, Fac Life Sci, Manchester M13 9PT, Lancs, England
[2] Univ Manchester, Sch Comp Sci, Manchester M13 9PL, Lancs, England
Funding
Biotechnology and Biological Sciences Research Council (BBSRC), UK
Keywords
PROTEINS; GENES; TOOL;
DOI
10.1093/bioinformatics/bts332
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Although the amount of data in biology is rapidly increasing, critical information for understanding biological events like phosphorylation or gene expression remains locked in the biomedical literature. Most current text mining (TM) approaches to extract information about biological events are focused on either limited-scale studies and/or abstracts, with data extracted lacking context and rarely available to support further research.
Results: Here we present BioContext, an integrated TM system which extracts, extends and integrates results from a number of tools performing entity recognition, biomolecular event extraction and contextualization. Application of our system to 10.9 million MEDLINE abstracts and 234 000 open-access full-text articles from PubMed Central yielded over 36 million mentions representing 11.4 million distinct events. Event participants included over 290 000 distinct genes/proteins that are mentioned more than 80 million times and linked where possible to Entrez Gene identifiers. Over a third of events contain contextual information such as the anatomical location of the event occurrence or whether the event is reported as negated or speculative.
Pages: 2154-2161
Page count: 8