Combinatorial PCA and SVM methods for feature selection in learning classifications (applications to text categorization)

Cited by: 3
Authors:
Anghelescu, AV [1]
Muchnik, IB [1]
Affiliation:
[1] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ 08854 USA
Source:
INTERNATIONAL CONFERENCE ON INTEGRATION OF KNOWLEDGE INTENSIVE MULTI-AGENT SYSTEMS: KIMAS'03: MODELING, EXPLORATION, AND ENGINEERING | 2003
DOI: 10.1109/KIMAS.2003.1245090
Chinese Library Classification: TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
In this paper we describe a purely combinatorial approach to obtaining meaningful representations of text data. More precisely, we describe two methods that realize this approach, which we call combinatorial principal component analysis (cPCA) and combinatorial support vector machines (cSVM). These names emphasize mathematical analogies between the well-known PCA and SVM, on the one hand, and our respective methods on the other. To evaluate the selected feature spaces, we used the evaluation environment set up for TREC 2002 together with a very common classifier, the 1-nearest neighbour (1-NN). We compared the results obtained on the feature sets computed by the proposed procedures (cPCA and cSVM) with those obtained on the original feature space, and showed that with a selected feature space on average 50 times smaller than the original, classifier performance decreases by no more than 2%.
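The evaluation protocol described in the abstract (run 1-NN on the full feature space, then on a much smaller selected subset, and compare accuracies) can be sketched as follows. This is an illustrative toy example only, not the paper's cPCA/cSVM procedures: the data and the selected feature indices are made up, and a plain Euclidean 1-NN stands in for the TREC 2002 setup.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def one_nn_predict(train_X, train_y, x):
    """1-nearest-neighbour: return the label of the closest training vector."""
    i = min(range(len(train_X)), key=lambda j: dist(train_X[j], x))
    return train_y[i]

def accuracy_on_features(train_X, train_y, test_X, test_y, feats):
    """1-NN test accuracy after restricting every vector to the indices in `feats`."""
    proj = lambda v: [v[f] for f in feats]
    tX = [proj(v) for v in train_X]
    hits = sum(one_nn_predict(tX, train_y, proj(x)) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)

# Toy 3-dimensional data (hypothetical): classes differ on features 0 and 1.
train_X = [[0.0, 0.0, 5.0], [0.1, 0.1, -4.0], [1.0, 1.0, 4.9], [0.9, 1.1, -5.0]]
train_y = [0, 0, 1, 1]
test_X  = [[0.05, 0.0, 4.8], [0.95, 1.0, -4.9]]
test_y  = [0, 1]

# Compare the classifier on the full space vs. a (hypothetical) selected subset.
full     = accuracy_on_features(train_X, train_y, test_X, test_y, [0, 1, 2])
selected = accuracy_on_features(train_X, train_y, test_X, test_y, [0, 1])
```

In the paper's setting, the subset passed as `feats` would come from cPCA or cSVM and would be roughly 50 times smaller than the original text feature space; the comparison of `full` against `selected` mirrors the reported at-most-2% performance drop.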
Pages: 491-496 (6 pages)
References (10):
[1] ALMUALLIM H, 1991, P 9 NAT C ART INT AA, V2, P547
[2] ANGHELESCU A, 2003, COMPLETE SET RESULTS
[3] KIRA K, 1992, MACHINE LEARNING, P249
[4] KUZNETSOV E, 1985, NONNUMERICAL INFORMA
[5] ROSE T, 2002, P 3 INT C LANG RES E
[6] SCHUTZE H, 1995, RES DEV INFORMATION, P229
[7] VOORHEES EM, 2002, P 2002 TEXT RETR C
[8] Weston J, 2001, ADV NEUR IN, V13, P668
[9] Wiener E., 1995, P 4 ANN S DOCUMENT A, P317
[10] YANG Y, 1996, J AM SOC INFORMATION, V47