Benchmarking of the 2010 BioCreative Challenge III text-mining competition by the BioGRID and MINT interaction databases

Cited by: 12
Authors
Chatr-aryamontri, Andrew [1]
Winter, Andrew [1]
Perfetto, Livia [2]
Briganti, Leonardo [2]
Licata, Luana [2]
Iannuccelli, Marta [2]
Castagnoli, Luisa [2]
Cesareni, Gianni [2,3]
Tyers, Mike [1,4]
Affiliations
[1] Univ Edinburgh, Sch Biol Sci, Edinburgh EH9 3JR, Midlothian, Scotland
[2] Univ Roma Tor Vergata, Dept Biol, I-00133 Rome, Italy
[3] Fdn Santa Lucia, IRCCS, I-00143 Rome, Italy
[4] Mt Sinai Hosp, Samuel Lunenfeld Res Inst, Ctr Syst Biol, Toronto, ON M5G 1X5, Canada
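Source
BMC BIOINFORMATICS | 2011 / Vol. 12
Funding
National Institutes of Health (USA); Biotechnology and Biological Sciences Research Council (UK); Canadian Institutes of Health Research;
Keywords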
NETWORKS; SYSTEMS; COMMUNITY;
DOI
10.1186/1471-2105-12-S8-S8
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Background: The vast amount of data published in the primary biomedical literature represents a challenge for the automated extraction and codification of individual data elements. Biological databases that rely solely on manual extraction by expert curators are unable to comprehensively annotate the information dispersed across the entire biomedical literature. The development of efficient tools based on natural language processing (NLP) systems is essential for the selection of relevant publications, identification of data attributes and partially automated annotation. One of the tasks of the BioCreative 2010 Challenge III was devoted to the evaluation of NLP systems developed to identify articles for curation and to extract protein-protein interaction (PPI) data.
Results: The BioCreative 2010 competition addressed three tasks: gene normalization, article classification and interaction method identification. The BioGRID and MINT protein interaction databases both participated in the generation of the test publication set for gene normalization, annotated the development and test sets for article classification, and curated the test set for interaction method classification. These test datasets served as a gold standard for the evaluation of data extraction algorithms.
Conclusion: The development of efficient tools for extraction of PPI data is a necessary step to achieve full curation of the biomedical literature. NLP systems can in the first instance facilitate expert curation by refining the list of candidate publications that contain PPI data; more ambitiously, NLP approaches may be able to directly extract relevant information from full-text articles for rapid inspection by expert curators. Close collaboration between biological databases and NLP systems developers will continue to facilitate the long-term objectives of both disciplines.
Pages: 8