Overview of BioCreAtIvE task IB: normalized gene lists

Cited by: 96
Authors
Hirschman, L [1 ]
Colosimo, M [1 ]
Morgan, A [1 ]
Yeh, A [1 ]
Affiliation
[1] Mitre Corp, Bedford, MA 01730 USA
Keywords
Word Sense Disambiguation; Gene Identifier; Gene Mention; Lexical Resource; Model Organism Database
DOI
10.1186/1471-2105-6-S1-S11
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Our goal in BioCreAtIvE has been to assess the state of the art in text mining, with emphasis on applications that reflect real biological tasks, e.g., the curation process for model organism databases. This paper summarizes BioCreAtIvE task IB, the "Normalized Gene List" task, which was inspired by the gene list supplied for each curated paper in a model organism database. The task was to produce the correct list of unique gene identifiers for the genes and gene products mentioned in sets of abstracts from three model organisms (Yeast, Fly, and Mouse).

Results: Eight groups fielded systems for the three data sets (Yeast, Fly, and Mouse). For Yeast, the top-scoring system (out of 15) achieved an F-measure (the harmonic mean of precision and recall) of 0.92. For Mouse and Fly, the task was more difficult because of larger numbers of genes, more ambiguity in the gene naming conventions (particularly for Fly), and complex gene names (for Mouse). For Fly, the top F-measure was 0.82 out of 11 systems; for Mouse, it was 0.79 out of 16 systems.

Conclusion: This assessment demonstrates that multiple groups were able to perform a real biological task across a range of organisms. Performance depended on the organism, and specifically on the naming conventions associated with each organism. These results hold out promise that the technology can provide partial automation of the curation process in the near future.
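The scores above are F-measures, the harmonic mean of precision P and recall R, i.e. F = 2PR / (P + R). The Python sketch below illustrates how a system's normalized gene lists might be scored against gold-standard lists. It is not the official BioCreAtIvE scorer: the function name `score_gene_lists` and the `GENE:` identifiers are hypothetical, and pooling (abstract, identifier) pairs over the whole collection is an assumption about how scores are aggregated.

```python
from collections.abc import Iterable, Mapping


def score_gene_lists(
    predicted: Mapping[str, Iterable[str]],
    gold: Mapping[str, Iterable[str]],
) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) for normalized gene lists.

    Each mapping goes from an abstract ID to the gene identifiers listed for it.
    Identifiers are de-duplicated per abstract (a gene counts once per abstract),
    and (abstract, identifier) pairs are pooled over the whole collection
    (an assumed aggregation scheme, not necessarily the official one).
    """
    pred_pairs = {(doc, gene) for doc, genes in predicted.items() for gene in genes}
    gold_pairs = {(doc, gene) for doc, genes in gold.items() for gene in genes}

    true_pos = len(pred_pairs & gold_pairs)  # identifiers correctly listed
    precision = true_pos / len(pred_pairs) if pred_pairs else 0.0
    recall = true_pos / len(gold_pairs) if gold_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical abstracts and placeholder identifiers, not task data.
    gold = {"abs1": ["GENE:A", "GENE:B"], "abs2": ["GENE:C"]}
    predicted = {"abs1": ["GENE:A", "GENE:X", "GENE:Y"], "abs2": ["GENE:C", "GENE:C"]}
    p, r, f = score_gene_lists(predicted, gold)
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.500 R=0.667 F=0.571
```

Treating each list as a set reflects the task definition of a list of *unique* identifiers: the repeated `GENE:C` is counted once, while the spurious `GENE:X` and `GENE:Y` lower precision and the missed `GENE:B` lowers recall.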
Pages: 10