Class-dependent projection based method for text categorization

Cited by: 10
Authors
Chen, Lifei [1 ]
Guo, Gongde [1 ]
Wang, Kaijun [1 ]
Affiliations
[1] Fujian Normal Univ, Sch Math & Comp Sci, Fuzhou 350108, Fujian, Peoples R China
Keywords
Text categorization; Classification; Projection; Class-dependence; Feature weighting
DOI
10.1016/j.patrec.2011.01.018
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text categorization presents unique challenges to traditional classification methods because real-world text datasets have a very large number of features and a large number of training samples. In high-dimensional document data, the classes are typically categorized only by subsets of features, and these subsets typically differ between classes of different topics. This paper presents a simple but effective classifier for text categorization using a class-dependent projection-based method. By projecting onto a set of individual subspaces, one per class, the samples belonging to different document classes are separated so that they are easy to classify. This is achieved by developing a new supervised feature weighting algorithm that learns the optimized subspaces for all the document classes. Experiments carried out on common benchmark corpora showed that the proposed method achieved both higher classification accuracy and lower computational cost than some well-established classifiers in text categorization, especially for datasets that include document categories with overlapping topics. (C) 2011 Elsevier B.V. All rights reserved.
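The abstract does not give the weighting algorithm itself, so the following is only a minimal sketch of the general idea of class-dependent projection, not the authors' method. It assumes a stand-in heuristic (each class weights a feature by the inverse of that feature's within-class variance) and classifies a document by its weighted distance to each class centroid. The function names `fit_class_subspaces` and `predict`, the `eps` parameter, and the toy term-count vectors are all hypothetical.

```python
from collections import defaultdict


def fit_class_subspaces(X, y, eps=1e-6):
    """Return {label: (centroid, weights)} with one weight vector per class.

    Assumed heuristic (not from the paper): a feature that varies little
    inside a class gets a large weight for that class.
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    for label, rows in by_class.items():
        n, d = len(rows), len(rows[0])
        centroid = [sum(r[j] for r in rows) / n for j in range(d)]
        # Per-feature within-class variance around the class centroid.
        var = [sum((r[j] - centroid[j]) ** 2 for r in rows) / n for j in range(d)]
        # Inverse-variance weights, normalized to sum to 1 within the class.
        inv = [1.0 / (v + eps) for v in var]
        total = sum(inv)
        model[label] = (centroid, [w / total for w in inv])
    return model


def predict(model, x):
    """Assign x to the class whose weighted subspace places it closest."""
    def weighted_distance(label):
        centroid, weights = model[label]
        return sum(w * (xi - ci) ** 2 for w, xi, ci in zip(weights, x, centroid))
    return min(model, key=weighted_distance)


# Toy term-count vectors: feature 0 is stable for "sports", feature 1 is
# stable for "finance", and feature 2 is noisy in both classes.
X = [[5, 0, 1], [5, 0, 9], [5, 1, 4], [0, 5, 2], [1, 5, 8], [0, 5, 5]]
y = ["sports", "sports", "sports", "finance", "finance", "finance"]
model = fit_class_subspaces(X, y)
print(predict(model, [4, 1, 7]))
print(predict(model, [1, 4, 3]))
```

Because each class carries its own weight vector, the noisy third feature contributes almost nothing to either distance in this toy data, which is the sense in which each class is compared in its own subspace.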
Pages: 1493-1501
Page count: 9
Related papers
20 in total
  • [1] Multispace KL for pattern representation and classification
    Cappelli, R
    Maio, D
    Maltoni, D
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2001, 23 (09) : 977 - 996
  • [2] Chen LF, 2008, IEEE DATA MINING, P755, DOI 10.1109/ICDM.2008.15
  • [3] Cohen M.C., 2008, P INT C PATT REC BIO, P170
  • [4] NEAREST NEIGHBOR PATTERN CLASSIFICATION
    COVER, TM
    HART, PE
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 1967, 13 (01) : 21 - +
  • [5] Locally adaptive metrics for clustering high dimensional data
    Domeniconi, Carlotta
    Gunopulos, Dimitrios
    Ma, Sheng
    Yan, Bojun
    Al-Razgan, Muna
    Papadopoulos, Dimitris
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2007, 14 (01) : 63 - 97
  • [6] Estébanez C, 2005, PROC WRLD ACAD SCI E, V7, P56
  • [7] Using kNN model for automatic text categorization
    Guo, GD
    Wang, H
    Bell, D
    Bi, YX
    Greer, K
    [J]. SOFT COMPUTING, 2006, 10 (05) : 423 - 430
  • [8] A real-coded genetic algorithm for constructive induction
    HajAbedi, Z.
    [J]. 2009 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1-5, 2009, : 2036 - 2042
  • [9] Han E., 2000, PKDD 00, P424, DOI 10.1007/3-540-45372-5_46
  • [10] Hinneburg A, 1999, PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P506