Improving MeSH classification of biomedical articles using citation contexts

Cited by: 25
Authors
Aljaber, Bader [1]
Martinez, David [1, 2]
Stokes, Nicola [3]
Bailey, James [1, 2]
Affiliations
[1] Univ Melbourne, Dept Comp Sci & Software Engn, Melbourne, Vic 3010, Australia
[2] Univ Melbourne, Victoria Res Lab, NICTA, Melbourne, Vic 3010, Australia
[3] Univ Coll Dublin, Sch Comp Sci & Informat, Dublin 4, Ireland
Keywords
Citation contexts; Document expansion; Biomedical text classification; MeSH terms;
DOI
10.1016/j.jbi.2011.05.007
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Medical Subject Headings (MeSH) are used to index the majority of databases generated by the National Library of Medicine. Essentially, MeSH terms are designed to make information, such as scientific articles, more retrievable and accessible to users of systems such as PubMed. This paper proposes a novel method for automating the assignment of MeSH terms to biomedical publications that takes advantage of citation references to these publications. Our findings show that analysing the citation references that point to a document can provide a useful source of terms that are not present in the document itself. The use of these citation contexts, as they are known, can thus provide a richer document feature representation, which in turn can improve text mining and information retrieval applications, in our case MeSH term classification. In this paper, we also explore new methods of selecting and utilising citation contexts. In particular, we assess the effect of weighting the importance of citation terms (found in the citation contexts) according to two aspects: (i) the section of the paper they appear in and (ii) their distance to the citation marker. We conduct intrinsic and extrinsic evaluations of citation term quality. For the intrinsic evaluation, we rely on the UMLS Metathesaurus conceptual database to explore the semantic characteristics of the mined citation terms. We also analyse the "informativeness" of these terms using a class-entropy measure. For the extrinsic evaluation, we run a series of automatic document classification experiments over MeSH terms. Our experimental evaluation shows that citation contexts contain terms that are related to the original document, and that the integration of this knowledge results in better classification performance compared to two state-of-the-art MeSH classification systems: MeSHUP and MTI.
Our experiments also demonstrate that the consideration of Section and Distance factors can lead to statistically significant improvements in citation feature quality, thus opening the way for better document feature representation in other biomedical text processing applications. © 2011 Elsevier Inc. All rights reserved.
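The abstract mentions a class-entropy measure of term "informativeness" without defining it here. As a rough illustration only, the sketch below computes the Shannon entropy of the class labels of the documents in which a term occurs; a term concentrated in one class scores low (more informative), while a term spread evenly across classes scores high. The function name, the toy labels, and the one-label-per-document simplification are assumptions made for this example, not the paper's actual formulation.

```python
import math
from collections import Counter

def class_entropy(labels):
    """Shannon entropy (in bits) of the class labels of documents containing a term.

    Assumption: each document contributes exactly one class label.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    # Sum p * log2(1/p) over the observed classes.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# Hypothetical toy data: class labels of the documents in which each term occurs.
term_labels = {
    "apoptosis": ["Neoplasms", "Neoplasms", "Neoplasms", "Neoplasms"],
    "study": ["Neoplasms", "Diabetes Mellitus", "Asthma", "Hypertension"],
}

entropies = {term: class_entropy(labels) for term, labels in term_labels.items()}
for term, value in entropies.items():
    print(f"{term}: {value:.2f} bits")
```

In this toy setting the class-specific term has lower entropy than the generic one, which is the sense in which lower class entropy suggests a more discriminative feature.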
Pages: 881-896
Page count: 16
Related papers
52 in total
  • [1] Document clustering of scientific texts using citation contexts
    Aljaber, Bader
    Stokes, Nicola
    Bailey, James
    Pei, Jian
    [J]. INFORMATION RETRIEVAL, 2010, 13 (02) : 101 - 131
  • [2] [Anonymous], 2009, Proc. of Empirical Methods on Natural Lang. Proc
  • [3] [Anonymous], 2010, P 19 INT C WORLD WID, DOI DOI 10.1145/1772690.1772734
  • [4] [Anonymous], 1998, Computer networks and ISDN systems, DOI [10.1016/S0169-7552(98)00110-X, DOI 10.1016/S0169-7552(98)00110-X]
  • [5] Billerbeck Bodo, 2005, P 10 AUSTR DOC COMP, P34
  • [6] Bradshaw S, 2003, LECT NOTES COMPUT SC, V2769, P499
  • [7] Bradshaw S., 2002, IUI 02. 2002 International Conference on Intelligent User Interfaces, P180
  • [8] BRADSHAW S, 2002, THESIS NW U EVANSTON
  • [9] BRADSHAW S, NWUCS017
  • [10] A survey of current work in biomedical text mining
    Cohen, AM
    Hersh, WR
    [J]. BRIEFINGS IN BIOINFORMATICS, 2005, 6 (01) : 57 - 71