Facts from text: can text mining help to scale-up high-quality manual curation of gene products with ontologies?

Cited by: 55
Authors
Winnenburg, Rainer [1]
Waechter, Thomas [1]
Plake, Conrad [1]
Doms, Andreas [1]
Schroeder, Michael [1]
Affiliations
[1] Tech Univ Dresden, Ctr Biotechnol, BIOTEC, Bioinformat Grp, D-01307 Dresden, Germany
DOI
10.1093/bib/bbn043
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The biomedical literature can be seen as a large, integrated, but unstructured data repository. Extracting facts from literature and making them accessible is approached from two directions: manual curation efforts develop ontologies and vocabularies to annotate gene products based on statements in papers. Text mining aims to automatically identify entities and their relationships in text using information retrieval and natural language processing techniques. Manual curation is highly accurate but time-consuming, and does not scale with the ever-increasing growth of literature. Text mining as a high-throughput computational technique scales well, but is error-prone due to the complexity of natural language. How can both be married to combine scalability and accuracy? Here, we review the state-of-the-art text mining approaches that are relevant to annotation and discuss available online services analysing biomedical literature by means of text mining techniques, which could also be utilised by annotation projects. We then examine how far text mining has already been utilised in existing annotation projects and conclude how these techniques could be tightly integrated into the manual annotation process through novel authoring systems to scale-up high-quality manual curation.
Pages: 466-478
Page count: 13