Benchmarking of the 2010 BioCreative Challenge III text-mining competition by the BioGRID and MINT interaction databases

Times cited: 12

Authors:

Chatr-aryamontri, Andrew [1]

Winter, Andrew [1]

Perfetto, Livia [2]

Briganti, Leonardo [2]

Licata, Luana [2]

Iannuccelli, Marta [2]

Castagnoli, Luisa [2]

Cesareni, Gianni [2, 3]

Tyers, Mike [1, 4]

Affiliations:

[1] Univ Edinburgh, Sch Biol Sci, Edinburgh EH9 3JR, Midlothian, Scotland

[2] Univ Roma Tor Vergata, Dept Biol, I-00133 Rome, Italy

[3] Fdn Santa Lucia, IRCCS, I-00143 Rome, Italy

[4] Mt Sinai Hosp, Samuel Lunenfeld Res Inst, Ctr Syst Biol, Toronto, ON M5G 1X5, Canada

Source:

BMC BIOINFORMATICS | 2011 / Vol. 12

Funding:

US National Institutes of Health; UK Biotechnology and Biological Sciences Research Council; Canadian Institutes of Health Research;

Keywords:

NETWORKS; SYSTEMS; COMMUNITY;

DOI:

10.1186/1471-2105-12-S8-S8

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Subject classification codes:

071010; 081704;

Abstract:

Background: The vast amount of data published in the primary biomedical literature represents a challenge for the automated extraction and codification of individual data elements. Biological databases that rely solely on manual extraction by expert curators are unable to comprehensively annotate the information dispersed across the entire biomedical literature. The development of efficient tools based on natural language processing (NLP) systems is essential for the selection of relevant publications, identification of data attributes and partially automated annotation. One of the tasks of the BioCreative 2010 Challenge III was devoted to the evaluation of NLP systems developed to identify articles for curation and extraction of protein-protein interaction (PPI) data.

Results: The BioCreative 2010 competition addressed three tasks: gene normalization, article classification and interaction method identification. The BioGRID and MINT protein interaction databases both participated in the generation of the test publication set for gene normalization, annotated the development and test sets for article classification, and curated the test set for interaction method classification. These test datasets served as a gold standard for the evaluation of data extraction algorithms.

Conclusion: The development of efficient tools for extraction of PPI data is a necessary step to achieve full curation of the biomedical literature. NLP systems can in the first instance facilitate expert curation by refining the list of candidate publications that contain PPI data; more ambitiously, NLP approaches may be able to directly extract relevant information from full-text articles for rapid inspection by expert curators. Close collaboration between biological databases and NLP systems developers will continue to facilitate the long-term objectives of both disciplines.

Pages: 8

References: 42 in total
