A fast method for robust principal components with applications to chemometrics

Cited by: 178

Authors:

Hubert, M [1]

Rousseeuw, PJ [1]

Verboven, S [1]

Affiliations:

[1] Univ Instelling Antwerp, Dept Math & Comp Sci, B-2610 Wilrijk, Belgium

Source:

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS | 2002 / Vol. 60 / Issue 1-2

Keywords:

principal component analysis; projection pursuit; algorithm;

DOI:

10.1016/S0169-7439(01)00188-5

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

When faced with high-dimensional data, one often uses principal component analysis (PCA) for dimension reduction. Classical PCA constructs a set of uncorrelated variables, which correspond to eigenvectors of the sample covariance matrix. However, it is well known that this covariance matrix is strongly affected by anomalous observations. It is therefore necessary to apply robust methods that are resistant to possible outliers. Li and Chen [J. Am. Stat. Assoc. 80 (1985) 759] proposed a solution based on projection pursuit (PP). The idea is to search for the direction in which the projected observations have the largest robust scale. In subsequent steps, each new direction is constrained to be orthogonal to all previous directions. This method is very well suited for high-dimensional data, even when the number of variables p is higher than the number of observations n. However, the algorithm of Li and Chen has a high computational cost. In the references [C. Croux, A. Ruiz-Gazen, in COMPSTAT: Proceedings in Computational Statistics 1996, Physica-Verlag, Heidelberg, 1996, pp. 211-217; C. Croux and A. Ruiz-Gazen, High Breakdown Estimators for Principal Components: the Projection-Pursuit Approach Revisited, 2000, submitted for publication], a computationally much more attractive method is presented, but in high dimensions (large p) it has a numerical accuracy problem and still consumes much computation time. In this paper, we construct a faster two-step algorithm that is more stable numerically. The new algorithm is illustrated on a data set with four dimensions and on two chemometrical data sets with 1200 and 600 dimensions. (C) 2002 Elsevier Science B.V. All rights reserved.
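The projection-pursuit idea described in the abstract (pick the direction with the largest robust scale, then repeat under an orthogonality constraint) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the two-step algorithm of the paper, whose details the abstract does not give. Several choices here are assumptions: the function names `robust_scale` and `pp_robust_pca` are hypothetical, the robust scale is the median absolute deviation (MAD), the center is the coordinatewise median, and the candidate directions are restricted to the normalized centered observations.

```python
import numpy as np


def robust_scale(proj):
    """Column-wise MAD, scaled by 1.4826 for consistency at the normal.

    Assumption: MAD stands in for whatever robust scale the paper uses.
    """
    med = np.median(proj, axis=0)
    return 1.4826 * np.median(np.abs(proj - med), axis=0)


def pp_robust_pca(X, k):
    """Projection-pursuit sketch: return k orthonormal directions and their scales."""
    X = np.asarray(X, dtype=float)
    # Assumption: the coordinatewise median serves as the robust center.
    Y = X - np.median(X, axis=0)
    # Points shorter than this are skipped as candidates, since normalizing
    # them would amplify rounding error.
    tol = 1e-8 * max(np.linalg.norm(Y, axis=1).max(), 1.0)
    directions, scales = [], []
    for _ in range(k):
        norms = np.linalg.norm(Y, axis=1)
        keep = norms > tol
        if not keep.any():
            raise ValueError("k exceeds the rank of the centered data")
        # Candidate directions: the (deflated) observations, normalized to unit length.
        candidates = Y[keep] / norms[keep][:, None]
        # Column i holds the projections of all observations on candidate i.
        s = robust_scale(Y @ candidates.T)
        best = int(np.argmax(s))
        v = candidates[best]
        directions.append(v)
        scales.append(s[best])
        # Deflate: remove the component along v, so that every later
        # candidate is orthogonal to all directions found so far.
        Y = Y - np.outer(Y @ v, v)
    return np.array(directions), np.array(scales)
```

Because the candidates are the observations themselves, the search never needs a p x p covariance matrix, which is why this style of search remains usable when p exceeds n. The projection matrix is n x n, so the cost grows quadratically with the number of observations.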

Citation

Pages: 101-111

Page count: 11

References (17 in total):

[1] Croux C., 1996, COMPSTAT. Proceedings in Computational Statistics. 12th Symposium, P211
[2] Principal component analysis based on robust estimators of the covariance or correlation matrix: Influence functions and efficiencies
Croux, C
Haesbroeck, G
[J]. BIOMETRIKA, 2000, 87 (03) : 603 - 618
[3] CROUX C, 2000, UNPUB HIGH BREAKDOWN
[4] ASYMPTOTIC-BEHAVIOR OF S-ESTIMATES OF MULTIVARIATE LOCATION PARAMETERS AND DISPERSION MATRICES
DAVIES, PL
[J]. ANNALS OF STATISTICS, 1987, 15 (03) : 1269 - 1292
[5] ROBUST ESTIMATION AND OUTLIER DETECTION WITH CORRELATION-COEFFICIENTS
DEVLIN, SJ
GNANADESIKAN, R
KETTENRING, JR
[J]. BIOMETRIKA, 1975, 62 (03) : 531 - 545
[6] Hossjer O., 1995, NONPARAMETRIC STAT, V4, P293
[7] PROJECTION-PURSUIT APPROACH TO ROBUST DISPERSION MATRICES AND PRINCIPAL COMPONENTS - PRIMARY THEORY AND MONTE-CARLO
LI, GY
CHEN, ZL
[J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1985, 80 (391) : 759 - 766
[8] Robust principal component analysis for functional data
N. Locantore
J. S. Marron
D. G. Simpson
N. Tripoli
J. T. Zhang
K. L. Cohen
Graciela Boente
Ricardo Fraiman
Babette Brumback
Christophe Croux
Jianqing Fan
Alois Kneip
John I. Marden
Daniel Peña
Javier Prieto
Jim O. Ramsay
Mariano J. Valderrama
Ana M. Aguilera
[J]. Test, 1999, 8 (1) : 1 - 73
[9] Generalized linear regression on sampled signals and curves: A P-spline approach
Marx, BD
Eilers, PHC
[J]. TECHNOMETRICS, 1999, 41 (01) : 1 - 13
[10] APPLICATION OF NEAR-INFRARED REFLECTANCE SPECTROSCOPY TO THE COMPOSITIONAL ANALYSIS OF BISCUITS AND BISCUIT DOUGHS
OSBORNE, BG
FEARN, T
MILLER, AR
DOUGLAS, S
[J]. JOURNAL OF THE SCIENCE OF FOOD AND AGRICULTURE, 1984, 35 (01) : 99 - 105
