SVM-based feature selection of latent semantic features

Cited by: 40
Authors
Shima, K [1 ]
Todoriki, M [1 ]
Suzuki, A [1 ]
Affiliations
[1] Univ Tokyo, Dept Quantum Engn & Syst Sci, Bunkyo Ku, Tokyo 1138656, Japan
Keywords
support vector machines; text categorization; latent semantic indexing; feature selection
DOI
10.1016/j.patrec.2004.03.002
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Latent Semantic Indexing (LSI) is an effective method for extracting features that capture the underlying latent semantic structure in word usage across documents. However, the subspace selected by this method may not be the most appropriate one for classifying documents, since it orders the extracted features by their variances, not by their classification power. We propose applying a feature ordering method based on support vector machines to select the LSI features that are suited for classification. Experimental results suggest that the method improves classification performance with a considerably more compact representation. (C) 2004 Elsevier B.V. All rights reserved.
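The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the ordering criterion, so the code below assumes that LSI features are ranked by the magnitude of their weights in a linear SVM, trained here with a simple Pegasos-style subgradient loop. All function names, parameters, and the synthetic data are hypothetical and chosen only for illustration. The toy matrix is built so that the highest-variance latent direction is unrelated to the class label, which is the situation the abstract describes.

```python
import numpy as np

def lsi_features(X, k):
    """Project documents (rows of X) onto the top-k latent dimensions via SVD.
    Columns come out ordered by singular value, i.e. by (uncentered) variance."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

def train_linear_svm(Z, y, lam=0.01, epochs=100, seed=0):
    """Assumed trainer: Pegasos-style subgradient descent on the
    L2-regularized hinge loss, without a bias term."""
    rng = np.random.default_rng(seed)
    w = np.zeros(Z.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * (Z[i] @ w) < 1.0
            w *= 1.0 - eta * lam          # shrink step from the regularizer
            if violated:
                w += eta * y[i] * Z[i]    # hinge-loss subgradient step
    return w

def rank_features_by_svm(Z, y):
    """Assumed criterion: order features by |w_j| of a linear SVM
    trained on features scaled to unit standard deviation."""
    Zs = Z / (Z.std(axis=0) + 1e-12)
    w = train_linear_svm(Zs, y)
    return np.argsort(-np.abs(w))

# Synthetic data: three latent directions mixed into six "terms".
rng = np.random.default_rng(42)
n = 120
y = np.array([1.0, -1.0] * (n // 2))
latent = np.column_stack([
    rng.normal(0.0, 5.0, n),          # large variance, unrelated to the label
    y + rng.normal(0.0, 0.1, n),      # small variance, carries the label
    rng.normal(0.0, 0.3, n),          # low-variance noise
])
Q, _ = np.linalg.qr(rng.normal(size=(6, 3)))   # orthonormal mixing into term space
X = latent @ Q.T

Z = lsi_features(X, k=3)
ranking = rank_features_by_svm(Z, y)
print("LSI order (by variance):", list(range(Z.shape[1])))
print("SVM-based order:", ranking.tolist())
```

Keeping only the first few entries of `ranking` instead of the first few LSI dimensions is one way to obtain the more compact, classification-oriented representation the abstract refers to.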
Pages: 1051-1057
Number of pages: 7