Online feature extraction based on accelerated kernel principal component analysis for data stream

Cited by: 23
Authors
Joseph A.A. [1, 2]
Tokumoto T. [1]
Ozawa S. [1]
Affiliations
[1] Graduate School of Engineering, Kobe University, Rokko-dai, Nada, Kobe
[2] Universiti Malaysia Sarawak, Kota Samarahan, 94300, Sarawak
Funding
Japan Society for the Promotion of Science
Keywords
Feature extraction; Incremental learning; Kernel principal component analysis; Online learning
DOI
10.1007/s12530-015-9131-7
Abstract
Kernel principal component analysis (KPCA) is a well-known nonlinear feature extraction method. Takeuchi et al. proposed an incremental type of KPCA (IKPCA) that updates an eigenspace incrementally for a sequence of data. In IKPCA, however, an eigenvalue decomposition must be carried out for every single data point, even when a chunk of data is given at one time. To reduce the computational cost of learning chunk data, this paper proposes an extended IKPCA called Chunk IKPCA (CIKPCA), in which a chunk of multiple data points is learned with a single eigenvalue decomposition. To further reduce computation time and memory usage, a large data chunk is first divided into several smaller chunks, and only useful data are selected based on the accumulation ratio. In the proposed CIKPCA, a small set of independent data is first selected from the reduced set of data so that eigenvectors in a high-dimensional feature space can be represented as linear combinations of these independent data. The eigenvectors are then updated incrementally while keeping only an eigenspace model, a sextuplet that includes the independent data, coefficients, eigenvalues, and mean information. CIKPCA can augment the eigen-feature space based on the accumulation ratio, which can itself be updated without keeping all past data, and the eigen-feature space is rotated by solving an eigenvalue problem once per data chunk. Experimental results show that the learning time of CIKPCA is greatly reduced compared with KPCA and IKPCA, without sacrificing recognition accuracy. © 2015, Springer-Verlag Berlin Heidelberg.
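The abstract relies on the accumulation ratio to decide when the eigen-feature space should be augmented, but does not spell out the criterion. The sketch below assumes the common definition, the cumulative share of the total eigenvalue sum captured by the k largest eigenvalues, and picks the smallest k that reaches a threshold. The function names `accumulation_ratio` and `select_dimension` and the default threshold of 0.9 are illustrative assumptions, not taken from the paper, and the incremental update of the ratio without past data is not reproduced here.

```python
def accumulation_ratio(eigenvalues, k):
    """Share of the total eigenvalue sum captured by the k largest eigenvalues.

    Assumed definition: A(k) = sum of top-k eigenvalues / sum of all eigenvalues.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    return sum(ordered[:k]) / total


def select_dimension(eigenvalues, threshold=0.9):
    """Smallest number of eigen-axes whose accumulation ratio reaches threshold.

    The threshold value is a hypothetical choice for illustration.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    # Only reachable when threshold > 1: keep every axis.
    return len(ordered)
```

In a chunk-wise setting, one would presumably recompute this ratio after each chunk and add eigen-axes only when it falls below the threshold; that control loop is left out because the abstract gives no details on it.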
Pages: 15-27
Page count: 12
Related papers
31 entries in total
[1]
Abe S., Support vector machines for pattern classification, Advances in pattern recognition, (2010)
[2]
Aoki D., Omori T., Ozawa S., A robust incremental principal component analysis for feature extraction from stream data with missing values, In: Proceedings of international joint conference on neural networks, pp. 1-8, (2013)
[3]
Asuncion A., Newman D.J., UCI machine learning repository, (2007)
[4]
Babcock B., Babu S., Datar M., Motwani R., Widom J., Models and issues in data stream systems, In: Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, (2002)
[5]
Baudat G., Anouar F., Feature vector selection and projection using kernels, Neurocomputing, 55, pp. 21-38, (2003)
[6]
Case J., Jain S., Lange S., Zeugmann T., Incremental concept learning for bounded data mining, Inf Comput, 152, pp. 74-110, (1999)
[7]
Chin T.J., Suter D., Incremental kernel principal component analysis, IEEE Trans Image Process, 16, pp. 1662-1674, (2007)
[8]
Domingos P., Hulten G., Catching up with the data: research issues in mining data streams, (2001)
[9]
Elwell R., Polikar R., Incremental learning of concept drift in nonstationary environments, IEEE Trans Neural Netw, 22, pp. 1517-1531, (2011)
[10]
Honeine P., Online kernel principal component analysis: a reduced-order model, IEEE Trans Pattern Anal Mach Intell, 34, pp. 1814-1826, (2012)