Benchmarking of the 2010 BioCreative Challenge III text-mining competition by the BioGRID and MINT interaction databases

Cited by: 12
Authors
Chatr-aryamontri, Andrew [1]
Winter, Andrew [1]
Perfetto, Livia [2]
Briganti, Leonardo [2]
Licata, Luana [2]
Iannuccelli, Marta [2]
Castagnoli, Luisa [2]
Cesareni, Gianni [2, 3]
Tyers, Mike [1, 4]
Affiliations
[1] Univ Edinburgh, Sch Biol Sci, Edinburgh EH9 3JR, Midlothian, Scotland
[2] Univ Roma Tor Vergata, Dept Biol, I-00133 Rome, Italy
[3] Fdn Santa Lucia, IRCCS, I-00143 Rome, Italy
[4] Mt Sinai Hosp, Samuel Lunenfeld Res Inst, Ctr Syst Biol, Toronto, ON M5G 1X5, Canada
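Source
BMC BIOINFORMATICS | 2011 / Volume 12
Funding
US National Institutes of Health; UK Biotechnology and Biological Sciences Research Council; Canadian Institutes of Health Research;
Keywords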
NETWORKS; SYSTEMS; COMMUNITY;
DOI
10.1186/1471-2105-12-S8-S8
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: The vast amount of data published in the primary biomedical literature represents a challenge for the automated extraction and codification of individual data elements. Biological databases that rely solely on manual extraction by expert curators are unable to comprehensively annotate the information dispersed across the entire biomedical literature. The development of efficient tools based on natural language processing (NLP) systems is essential for the selection of relevant publications, identification of data attributes and partially automated annotation. One of the tasks of the BioCreative 2010 Challenge III was devoted to the evaluation of NLP systems developed to identify articles for curation and to extract protein-protein interaction (PPI) data.
Results: The BioCreative 2010 competition addressed three tasks: gene normalization, article classification and interaction method identification. The BioGRID and MINT protein interaction databases both participated in the generation of the test publication set for gene normalization, annotated the development and test sets for article classification, and curated the test set for interaction method classification. These test datasets served as a gold standard for the evaluation of data extraction algorithms.
Conclusion: The development of efficient tools for extraction of PPI data is a necessary step to achieve full curation of the biomedical literature. NLP systems can, in the first instance, facilitate expert curation by refining the list of candidate publications that contain PPI data; more ambitiously, NLP approaches may be able to directly extract relevant information from full-text articles for rapid inspection by expert curators. Close collaboration between biological databases and NLP system developers will continue to facilitate the long-term objectives of both disciplines.
Pages: 8