An Overview of BioCreative II.5

Cited by: 76
Authors
Leitner, Florian [1]
Mardis, Scott A. [2]
Krallinger, Martin [1]
Cesareni, Gianni [3, 4]
Hirschman, Lynette A. [2]
Valencia, Alfonso [1]
Affiliations
[1] Spanish Natl Canc Res Ctr CNIO, Struct Biol & BioComp Programme, Madrid, Spain
[2] Mitre Corp, Ctr Informat Technol, Bedford, MA 01730 USA
[3] Univ Roma Tor Vergata, Dept Biol, I-00173 Rome, Italy
[4] IRCSS Santa Lucia Rome, Rome, Italy
Funding
US National Science Foundation
Keywords
Text mining; text analysis; natural language processing; molecular biology; biological curation; TEXT; COMMUNITY; BIOLOGY; SYSTEMS;
DOI
10.1109/TCBB.2010.61
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
We present the results of the BioCreative II.5 evaluation in association with the FEBS Letters experiment, where authors created Structured Digital Abstracts to capture information about protein-protein interactions. The BioCreative II.5 challenge evaluated automatic annotations from 15 text mining teams based on a gold standard created by reconciling annotations from curators, authors, and automated systems. The tasks were to rank articles for curation based on curatable protein-protein interactions; to identify the interacting proteins (using UniProt identifiers) in the positive articles (61); and to identify interacting protein pairs. There were 595 full-text articles in the evaluation test set, including those both with and without curatable protein interactions. The principal evaluation metrics were the interpolated area under the precision/recall curve (AUC iP/R) and the (balanced) F-measure. For article classification, the best AUC iP/R was 0.70; for interacting proteins, the best system achieved good macroaveraged recall (0.73) and AUC iP/R (0.58), after filtering incorrect species and mapping homonymous orthologs; for interacting protein pairs, the top (filtered, mapped) recall was 0.42 and AUC iP/R was 0.29. Ensemble systems improved performance for the interacting protein task.
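The two metrics named in the abstract can be illustrated with a short sketch. This is not the official BioCreative II.5 scoring code. It assumes one common formulation: the balanced F-measure is the harmonic mean of precision and recall, and the interpolated precision at a recall level is the highest precision reached at that recall or any higher one, with the area summed over the recall steps of a ranked result list. The function names `f_measure` and `auc_ipr` are hypothetical.

```python
from typing import List


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def auc_ipr(ranked_labels: List[bool], total_positives: int) -> float:
    """Area under the interpolated precision/recall curve (assumed definition).

    ranked_labels: gold labels of the results, best-ranked first.
    total_positives: number of positives in the gold standard.
    """
    # Collect one (recall, precision) point at each correctly ranked hit.
    points = []
    true_positives = 0
    for rank, is_positive in enumerate(ranked_labels, start=1):
        if is_positive:
            true_positives += 1
            points.append((true_positives / total_positives,
                           true_positives / rank))

    # Interpolate: precision at recall r becomes the maximum precision
    # observed at any recall >= r (scan from the highest recall down).
    best = 0.0
    interpolated = []
    for recall, precision in reversed(points):
        best = max(best, precision)
        interpolated.append((recall, best))
    interpolated.reverse()

    # Sum the rectangles between consecutive recall levels.
    area = 0.0
    previous_recall = 0.0
    for recall, precision in interpolated:
        area += precision * (recall - previous_recall)
        previous_recall = recall
    return area


if __name__ == "__main__":
    # Toy ranking: hits at ranks 1 and 3, two positives in the gold standard.
    print(auc_ipr([True, False, True, False], total_positives=2))
    print(f_measure(precision=0.5, recall=1.0))
```

Under this definition, positives that a system never returns contribute no area, so a ranking that retrieves only part of the gold standard cannot reach an AUC iP/R of 1.0.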
Pages: 385-399
Number of pages: 15