Hierarchical text categorization using neural networks

Cited by: 164
Authors
Ruiz, ME [1]
Srinivasan, P [1]
Affiliations
[1] Univ Iowa, Sch Lib & Informat Sci, Main Lib 3087, Iowa City, IA 52242 USA
Source
INFORMATION RETRIEVAL | 2002 / Vol. 5 / Issue 01
Keywords
automatic text categorization; applied neural networks; hierarchical classifiers
DOI
10.1023/A:1012782908347
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper presents the design and evaluation of a text categorization method based on the Hierarchical Mixture of Experts model. This model uses a divide-and-conquer principle to define smaller categorization problems based on a predefined hierarchical structure. The final classifier is a hierarchical array of neural networks. The method is evaluated using the UMLS Metathesaurus as the underlying hierarchical structure and the OHSUMED test set of MEDLINE records. Comparisons with an optimized version of the traditional Rocchio algorithm adapted for text categorization, as well as with flat neural network classifiers, are provided. The results show that the use of the hierarchical structure improves text categorization performance with respect to an equivalent flat model. The optimized Rocchio algorithm achieves performance comparable to that of the hierarchical neural networks.
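The divide-and-conquer idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` class, the `categorize` function, the hand-set weights, the 0.5 threshold, and the category names are all hypothetical, and each "network" is reduced to a single sigmoid unit over term frequencies. The sketch assumes that internal nodes act as gates that decide whether a document is passed down to their subtree, and that leaf nodes act as category experts.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class Node:
    """A gate (internal node) or an expert (leaf) over a bag-of-words document."""

    def __init__(self, name, weights, bias=0.0, children=None):
        self.name = name
        self.weights = weights          # term -> weight (hand-set, hypothetical)
        self.bias = bias
        self.children = children or []  # empty list means this node is an expert

    def score(self, doc):
        # doc maps term -> frequency; a single sigmoid unit stands in for a network.
        total = self.bias + sum(self.weights.get(t, 0.0) * f for t, f in doc.items())
        return sigmoid(total)


def categorize(node, doc, threshold=0.5):
    """Return the names of leaf categories reached through open gates."""
    if node.score(doc) < threshold:
        return []                       # gate closed: the whole subtree is skipped
    if not node.children:
        return [node.name]              # expert fired: assign this category
    assigned = []
    for child in node.children:
        assigned.extend(categorize(child, doc, threshold))
    return assigned


# Hypothetical two-level hierarchy: one gate with two experts below it.
tree = Node(
    "Diseases", {"disease": 2.0, "infection": 2.0}, bias=-1.0,
    children=[
        Node("Virus Diseases", {"virus": 3.0}, bias=-1.0),
        Node("Heart Diseases", {"cardiac": 3.0}, bias=-1.0),
    ],
)

routed = categorize(tree, {"infection": 1, "virus": 1})
blocked = categorize(tree, {"cardiac": 1})
```

In this toy setup, the second document would satisfy the "Heart Diseases" expert on its own, but the gate above it stays closed, so no category is assigned. A flat model would have no such gate, which is the contrast the abstract draws between hierarchical and flat classifiers.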
Pages: 87-118
Page count: 32