SVM-based feature selection of latent semantic features

Cited by: 40
Authors
Shima, K [1]
Todoriki, M [1]
Suzuki, A [1]
Affiliations
[1] Univ Tokyo, Dept Quantum Engn & Syst Sci, Bunkyo Ku, Tokyo 1138656, Japan
Keywords
support vector machines; text categorization; latent semantic indexing; feature selection
DOI
10.1016/j.patrec.2004.03.002
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Latent Semantic Indexing (LSI) is an effective method for extracting features that capture the underlying latent semantic structure in word usage across documents. However, the subspace selected by this method may not be the most appropriate one for classifying documents, since it orders the extracted features by their variance, not by their classification power. We propose applying a feature ordering method based on support vector machines to select the LSI features that are suited for classification. Experimental results suggest that the method improves classification performance with a considerably more compact representation. (C) 2004 Elsevier B.V. All rights reserved.
Pages: 1051-1057
Number of pages: 7