Integrating external knowledge to supplement training data in semi-supervised learning for text categorization

Cited by: 6
Authors
Benkhalifa, M
Mouradi, A
Bouyakhf, H
Affiliations
[1] Al Akhawayn Univ Ifrane, Sch Sci & Engn, Ifrane 53000, Morocco
[2] Mohammed V Univ, ENSIAS, Rabat, Morocco
Source
INFORMATION RETRIEVAL | 2001, Vol. 4, No. 2
Keywords
semi-supervised fuzzy c-means; text categorization; vector space model; WordNet lexical database; Reuters database
DOI
10.1023/A:1011458711300
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text categorization (TC) is the automated assignment of text documents to predefined categories based on their contents. Many learning approaches have been applied to TC and have proved effective. Nevertheless, TC poses many challenges for machine learning. In this paper, we propose integrating external lexical information from WordNet to supplement the training data of a semi-supervised clustering algorithm, which can learn from both training and test documents in order to classify new, unseen documents. This algorithm is the "Semi-Supervised Fuzzy c-Means" (ssFCM). Our experiments use the Reuters-21578 database and consist of binary classifications for categories selected from the 115 TOPICS classes of the Reuters collection. Using the Vector Space Model, each document is represented by its original feature vector augmented with an external feature vector generated using WordNet. We verify experimentally that integrating WordNet helps ssFCM improve its performance, effectively addresses the classification of documents into categories with few training documents, and does not interfere with the use of training data.
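As a rough illustration of the representation described in the abstract, the sketch below concatenates a term-count vector with a second vector of concept counts. It is a minimal sketch under stated assumptions, not the authors' method: `TOY_LEXICON` is a hypothetical stand-in for WordNet, the function names are invented for this example, and raw counts are used where the paper may apply a different weighting.

```python
from collections import Counter

# Hypothetical stand-in for WordNet: maps a word to a broader concept label.
# The paper's actual use of WordNet is not specified in this record.
TOY_LEXICON = {
    "wheat": "grain",
    "corn": "grain",
    "barley": "grain",
    "dollar": "currency",
    "yen": "currency",
}


def term_vector(tokens, vocabulary):
    """Original features: raw count of each vocabulary term in the document."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def external_vector(tokens, lexicon, concepts):
    """External features: count of each concept reached through the lexicon."""
    counts = Counter(lexicon[tok] for tok in tokens if tok in lexicon)
    return [counts[concept] for concept in concepts]


def augmented_vector(tokens, vocabulary, lexicon, concepts):
    """Concatenate the original vector with the externally derived one."""
    return term_vector(tokens, vocabulary) + external_vector(tokens, lexicon, concepts)


vocabulary = ["wheat", "export", "yen"]
concepts = sorted(set(TOY_LEXICON.values()))  # ['currency', 'grain']
doc = "wheat and corn export rose as the yen fell".split()
vec = augmented_vector(doc, vocabulary, TOY_LEXICON, concepts)
print(vec)
```

In this toy case the word "corn" is outside the vocabulary, yet it still contributes to the "grain" concept count, which hints at how external lexical information could help categories with few training documents.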
Pages: 91-113
Page count: 23