Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup

Cited by: 84
Authors
Yeh, Alexander S. [1 ]
Hirschman, Lynette [1 ]
Morgan, Alexander A. [1 ]
Affiliation
[1] Mitre Corp, Bedford, MA 01730 USA
Keywords
text mining; evaluation; curation; genomics; data management
DOI
10.1093/bioinformatics/btg1046
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The biological literature is a major repository of knowledge. Many biological databases draw much of their content from a careful curation of this literature. However, as the volume of literature increases, the burden of curation increases. Text mining may provide useful tools to assist in the curation process. To date, the lack of standards has made it impossible to determine whether text mining techniques are sufficiently mature to be useful.

Results: We report on a Challenge Evaluation task that we created for the Knowledge Discovery and Data Mining (KDD) Challenge Cup. We provided a training corpus of 862 articles consisting of journal articles curated in FlyBase, along with the associated lists of genes and gene products, as well as the relevant data fields from FlyBase. For the test, we provided a corpus of 213 new ('blind') articles; the 18 participating groups provided systems that flagged articles for curation, based on whether the article contained experimental evidence for gene expression products. We report on the evaluation results and describe the techniques used by the top performing groups.
Pages: i331-i339
Number of pages: 9