Incremental partial least squares analysis of big streaming data

Cited by: 66
Authors
Zeng, Xue-Qiang [1, 2]
Li, Guo-Zheng [3]
Affiliations
[1] Nanchang Univ, Ctr Comp, Nanchang 330031, Peoples R China
[2] Tongji Univ, Minist Educ, Key Lab Embedded Syst & Serv Comp, Shanghai 201804, Peoples R China
[3] Tongji Univ, Dept Control Sci & Engn, Shanghai 201804, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
Feature extraction; Incremental learning; Large-scale data; Partial least squares; Streaming data; Principal component analysis; Linear discriminant analysis; Dimensionality reduction; Classification; Efficient; Algorithm; Selection; PCA
DOI
10.1016/j.patcog.2014.05.022
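Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
140502 [Artificial intelligence]
Abstract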
Incremental feature extraction is effective for facilitating the analysis of large-scale streaming data. However, most current incremental feature extraction methods are not suitable for processing streaming data with high feature dimensions because only a few methods have low time complexity, which is linear with both the number of samples and features. In addition, feature extraction methods need to improve the performance of further classification. Therefore, incremental feature extraction methods need to be more efficient and effective. Partial least squares (PLS) is known to be an effective dimension reduction technique for classification. However, the application of PLS to streaming data is still an open problem. In this study, we propose a highly efficient and powerful dimension reduction algorithm called incremental PLS (IPLS), which comprises a two-stage extraction process. In the first stage, the PLS target function is adapted so it is incremental by updating the historical mean to extract the leading projection direction. In the second stage, the other projection directions are calculated based on the equivalence between the PLS vectors and the Krylov sequence. We compared the performance of IPLS with other state-of-the-art incremental feature extraction methods such as incremental principal components analysis, incremental maximum margin criterion, and incremental inter-class scatter using real streaming datasets. Our empirical results showed that IPLS performed better than other methods in terms of its efficiency and further classification accuracy. © 2014 The Authors. Published by Elsevier Ltd.
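Illustrative sketch (not the authors' implementation): the two-stage idea in the abstract can be approximated in a few lines of Python. The class name `StreamingPLSSketch` and its methods are hypothetical. The sketch assumes a single numeric response `y` (for example a 0/1 class label) and, for simplicity, keeps an explicit feature scatter matrix in memory; that choice is an assumption made here for readability and costs memory quadratic in the number of features, so it does not reproduce the linear complexity the abstract claims for IPLS. Stage one maintains running means and a running centered cross-product, whose direction is the leading PLS direction. Stage two builds an orthonormal basis of the Krylov sequence generated from that direction.

```python
import numpy as np


class StreamingPLSSketch:
    """Hypothetical sketch of mean-updating PLS plus a Krylov basis."""

    def __init__(self, n_features):
        self.n = 0
        self.mean_x = np.zeros(n_features)
        self.mean_y = 0.0
        # Running sum of (x - mean_x) * (y - mean_y); its direction is
        # the leading PLS projection direction.
        self.cross = np.zeros(n_features)
        # Running centered scatter matrix X_c^T X_c. Assumption of this
        # sketch only: stored explicitly, so memory is O(n_features^2).
        self.scatter = np.zeros((n_features, n_features))

    def update(self, x, y):
        """Fold one sample (x, y) into the running statistics."""
        x = np.asarray(x, dtype=float)
        self.n += 1
        dx_old = x - self.mean_x            # deviation from the old mean
        self.mean_x += dx_old / self.n      # update the historical mean
        self.mean_y += (y - self.mean_y) / self.n
        dx_new = x - self.mean_x            # deviation from the new mean
        # Welford-style co-moment updates (exact, no second pass needed).
        self.cross += dx_old * (y - self.mean_y)
        self.scatter += np.outer(dx_old, dx_new)

    def directions(self, k):
        """Return up to k orthonormal vectors spanning the Krylov
        sequence c, Sc, S^2 c, ... with c = cross and S = scatter."""
        basis = []
        v = self.cross.copy()
        for _ in range(k):
            for b in basis:                 # Gram-Schmidt step
                v = v - (b @ v) * b
            norm = np.linalg.norm(v)
            if norm < 1e-12:                # Krylov space exhausted
                break
            b = v / norm
            basis.append(b)
            v = self.scatter @ b            # next Krylov vector
        return np.array(basis)
```

A possible usage pattern is to call `update(x, y)` once per arriving sample and `directions(k)` whenever a projection is needed; new data can then be reduced with `(X - model.mean_x) @ model.directions(k).T`.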
Pages: 3726-3735
Number of pages: 10