Efficient Sparse Kernel Feature Extraction Based on Partial Least Squares

Cited by: 31
Authors
Dhanjal, Charanpal [1 ]
Gunn, Steve R. [1 ]
Shawe-Taylor, John [2 ]
Affiliations
[1] Univ Southampton, Sch Elect & Comp Sci, Informat Signals Images Syst Res Grp, Southampton SO17 1BJ, Hants, England
[2] UCL, Dept Comp Sci, Ctr Computat Stat & Machine Learning, London WC1E 6BT, England
Keywords
Machine learning; kernel methods; feature extraction; partial least squares (PLS); statistical variables; regression; complex
DOI
10.1109/TPAMI.2008.171
CLC Classification Number
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The presence of irrelevant features in training data is a significant obstacle for many machine learning tasks. One approach to this problem is to extract appropriate features and, often, one selects a feature extraction method based on the inference algorithm. Here, we formalize a general framework for feature extraction, based on Partial Least Squares, in which one can select a user-defined criterion to compute projection directions. The framework draws together a number of existing results and provides additional insights into several popular feature extraction methods. Two new sparse kernel feature extraction methods are derived under the framework, called Sparse Maximal Alignment (SMA) and Sparse Maximal Covariance (SMC), respectively. Key advantages of these approaches include simple implementation and a training time which scales linearly in the number of examples. Furthermore, one can project a new test example using only k kernel evaluations, where k is the output dimensionality. Computational results on several real-world data sets show that SMA and SMC extract features which are as predictive as those found using other popular feature extraction methods. Additionally, on large text retrieval and face detection data sets, they produce features which match the performance of the original ones in conjunction with a Support Vector Machine.
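The abstract's claim that a new test example can be projected with only k kernel evaluations follows from the sparsity of the extracted directions: each direction is expressed through a single training example. The sketch below is an illustration of that idea only, not the authors' SMA or SMC algorithms; the greedy covariance criterion, the deflation step, the function name greedy_sparse_features, and the RBF kernel in the usage lines are all simplifying assumptions.

import numpy as np

def greedy_sparse_features(K, y, k):
    # K: (n, n) kernel matrix over training examples; y: centred (n,) label vector.
    # Greedily pick k training examples whose (residual) kernel columns have
    # maximal |covariance| with y, deflating K after each choice.
    n = K.shape[0]
    K_res = K.astype(float).copy()
    available = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(k):
        scores = np.abs(K_res.T @ y)          # covariance of each residual column with y
        scores[~available] = -np.inf          # never pick the same example twice
        i = int(np.argmax(scores))
        chosen.append(i)
        available[i] = False
        tau = K_res[:, i]
        tau = tau / np.linalg.norm(tau)       # unit projection direction (dual form)
        P = np.eye(n) - np.outer(tau, tau)    # remove the extracted direction
        K_res = P @ K_res @ P
    return chosen                             # projecting a test point needs only these k kernel values

# Usage on toy data with a hypothetical RBF kernel:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.sign(X[:, 0]); y = y - y.mean()
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq)
print(greedy_sparse_features(K, y, k=3))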
Pages: 1347-1361
Page count: 15
Related Papers
47 in total
[1]  
[Anonymous], 1687 AI MIT CTR BIOL
[2]  
[Anonymous], P SIGKDD INT C KNOWL
[3]  
Arenas-Garcia, J., 2006, Advances in Neural Information Processing Systems, V19, P33
[4]   Partial least squares for discrimination [J].
Barker, M ;
Rayens, W .
JOURNAL OF CHEMOMETRICS, 2003, 17 (03) :166-173
[5]  
Bartlett P. L., 2003, Journal of Machine Learning Research, V3, P463, DOI 10.1162/153244303321897690
[6]  
Bi J., 2003, Journal of Machine Learning Research, V3, P1229, DOI 10.1162/153244303322753643
[7]  
Boser B. E., 1992, Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, P144, DOI 10.1145/130385.130401
[8]   LIBSVM: A Library for Support Vector Machines [J].
Chang, Chih-Chung ;
Lin, Chih-Jen .
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2011, 2 (03)
[9]  
Crammer, K., 2002, Neural Information Processing Systems, P537
[10]  
Cristianini, N., 2001, Advances in Neural Information Processing Systems, V14, P10