CLDA: Feature selection for text categorization based on constrained LDA

Cited by: 7
Authors
Cui Zifeng [1]
Xu Baowen [1]
Zhang Weifeng [2]
Jiang Dawei [1]
Xu Junling [1]
Affiliations
[1] SouthEast Univ, Sch Comp Sci & Engn, Nanjing 210018, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Dept CS&E, Nanjing, Peoples R China
Source
ICSC 2007: INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING, PROCEEDINGS | 2007
Funding
National Natural Science Foundation of China
Keywords
DOI
10.1109/ICSC.2007.108
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Feature selection is a necessary step before pattern classification, machine learning, and data mining. It now faces challenges in high-dimensional spaces, such as text categorization in information retrieval. Linear Discriminant Analysis (LDA) is an excellent dimensionality reduction method that transforms the original data into a low-dimensional feature space. However, it changes the original physical features and makes them uninterpretable. This motivates us to select features rather than transform them, using the LDA idea of preserving between-class and within-class structure information for text categorization. In this paper, a new feature selection approach based on Constrained LDA (CLDA) is proposed, which models feature selection as a search problem in a subspace and finds the optimal solution subject to some restrictions. Further, the CLDA optimization problem is transformed into a process of scoring and sorting features. Experiments on 20 Newsgroups and Reuters-21578 show that CLDA is consistently better than information gain and the chi-square test, with lower computational complexity.
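The "scoring and sorting" idea in the abstract can be illustrated with a minimal sketch. It is not the CLDA criterion itself, which this record does not give. As an assumption, it substitutes a classical Fisher-style score: each feature's between-class scatter divided by its within-class scatter. Features are then sorted by score and the top k are kept. The function names `fisher_scores` and `select_top_k` and the toy term-count matrix are hypothetical.

```python
from collections import defaultdict

def fisher_scores(X, y):
    """Score each feature (column) by between-class / within-class scatter.

    Assumption: a per-feature Fisher-style ratio, used here only as a
    stand-in for the CLDA score, which the abstract does not specify.
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for rows in by_class.values():
            values = [r[j] for r in rows]
            class_mean = sum(values) / len(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in values)
        # Small constant avoids division by zero for constant features.
        scores.append(between / (within + 1e-12))
    return scores

def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]

# Toy term-count matrix: 4 documents, 3 terms, 2 classes (made-up data).
X = [[3, 1, 0], [4, 1, 2], [0, 1, 3], [1, 1, 4]]
y = ["sport", "sport", "tech", "tech"]
print(select_top_k(X, y, 2))
```

Because the selected features are original terms rather than linear combinations, they stay interpretable, which is the motivation the abstract gives for selecting instead of transforming.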
Pages: 702 / +
Number of pages: 2