Information extraction from full text scientific articles: Where are the keywords?

Times cited: 117
Authors
Shah, PK
Perez-Iratxeta, C
Bork, P [1]
Andrade, MA
Affiliations
[1] European Mol Biol Lab, Heidelberg, Germany
[2] Max Delbruck Ctr Mol Med, Dept Bioinformat, Berlin, Germany
DOI
10.1186/1471-2105-4-20
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: To date, many methods for extracting biological information from scientific articles are restricted to the abstract of the article. However, full-text articles in electronic form, which offer larger sources of data, are now available. Several questions arise as to whether the effort of scanning full-text articles is worthwhile, and whether the information that can be extracted from the different sections of an article is relevant. Results: In this work we address these questions, showing that the keyword content of the different sections of a standard scientific article (abstract, introduction, methods, results, and discussion) is very heterogeneous. Conclusions: Although the abstract contains the best ratio of keywords to total words, other sections of the article may be a better source of biologically relevant data.
Pages: 9