A fast method for robust principal components with applications to chemometrics

Cited by: 178
Authors
Hubert, M [1]
Rousseeuw, PJ [1]
Verboven, S [1]
Affiliation
[1] Univ Instelling Antwerp, Dept Math & Comp Sci, B-2610 Wilrijk, Belgium
Keywords
principal component analysis; projection pursuit; algorithm;
DOI
10.1016/S0169-7439(01)00188-5
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
When faced with high-dimensional data, one often uses principal component analysis (PCA) for dimension reduction. Classical PCA constructs a set of uncorrelated variables, which correspond to eigenvectors of the sample covariance matrix. However, it is well known that this covariance matrix is strongly affected by anomalous observations. It is therefore necessary to apply robust methods that are resistant to possible outliers. Li and Chen [J. Am. Stat. Assoc. 80 (1985) 759] proposed a solution based on projection pursuit (PP). The idea is to search for the direction in which the projected observations have the largest robust scale. In subsequent steps, each new direction is constrained to be orthogonal to all previous directions. This method is very well suited for high-dimensional data, even when the number of variables p is higher than the number of observations n. However, the algorithm of Li and Chen has a high computational cost. In the references [C. Croux, A. Ruiz-Gazen, in COMPSTAT: Proceedings in Computational Statistics 1996, Physica-Verlag, Heidelberg, 1996, pp. 211-217; C. Croux and A. Ruiz-Gazen, High Breakdown Estimators for Principal Components: the Projection-Pursuit Approach Revisited, 2000, submitted for publication], a computationally much more attractive method is presented, but in high dimensions (large p) it has a numerical accuracy problem and still consumes much computation time. In this paper, we construct a faster two-step algorithm that is more stable numerically. The new algorithm is illustrated on a data set with four dimensions and on two chemometrical data sets with 1200 and 600 dimensions. (C) 2002 Elsevier Science B.V. All rights reserved.
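The projection-pursuit idea described in the abstract (pick the direction with the largest robust scale, then repeat in the orthogonal complement) can be sketched as follows. This is a minimal illustration under stated assumptions, not the two-step algorithm of the paper: it assumes the MAD as the robust scale, the coordinatewise median as the center, and a finite candidate set made of the directions through the centered data points. The function name `pp_robust_pca` and its parameters are hypothetical.

```python
import numpy as np


def pp_robust_pca(X, k):
    """Sketch of projection-pursuit robust PCA (illustrative assumptions only).

    Returns the center, up to k orthonormal directions (as rows), and the
    robust scale (MAD) of the data projected on each direction.
    """
    X = np.asarray(X, dtype=float)
    center = np.median(X, axis=0)      # assumption: coordinatewise median
    R = X - center                     # data deflated after every step
    directions, scales = [], []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=1)
        keep = norms > 1e-10 * max(norms.max(), 1.0)
        if not keep.any():             # no variation left to explain
            break
        # Candidate directions: unit vectors through the remaining points.
        cands = R[keep] / norms[keep, None]
        proj = R @ cands.T             # column i = projections on candidate i
        med = np.median(proj, axis=0)
        mad = 1.4826 * np.median(np.abs(proj - med), axis=0)
        best = int(np.argmax(mad))     # largest robust scale wins
        v = cands[best]
        directions.append(v)
        scales.append(mad[best])
        # Project onto the orthogonal complement of v, so every later
        # candidate is orthogonal to all previous directions.
        R = R - np.outer(R @ v, v)
    return center, np.array(directions), np.array(scales)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(size=(50, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
    data[:3] += 100.0                  # a few gross outliers
    c, V, s = pp_robust_pca(data, 2)
    print("directions:\n", V)
    print("robust scales:", s)
```

Because the candidates are taken from the data themselves, the sketch also runs when p exceeds n; its cost per component grows with n squared times p, which hints at why faster variants are of interest in high dimensions.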
Pages: 101-111
Number of pages: 11
Related papers
17 in total
  • [1] Croux C., 1996, COMPSTAT. Proceedings in Computational Statistics. 12th Symposium, P211
  • [2] Principal component analysis based on robust estimators of the covariance or correlation matrix: Influence functions and efficiencies
    Croux, C
    Haesbroeck, G
    [J]. BIOMETRIKA, 2000, 87 (03) : 603 - 618
  • [3] CROUX C, 2000, UNPUB HIGH BREAKDOWN
  • [5] ROBUST ESTIMATION AND OUTLIER DETECTION WITH CORRELATION-COEFFICIENTS
    DEVLIN, SJ
    GNANADESIKAN, R
    KETTENRING, JR
    [J]. BIOMETRIKA, 1975, 62 (03) : 531 - 545
  • [6] Hossjer O., 1995, NONPARAMETRIC STAT, V4, P293
  • [7] PROJECTION-PURSUIT APPROACH TO ROBUST DISPERSION MATRICES AND PRINCIPAL COMPONENTS - PRIMARY THEORY AND MONTE-CARLO
    LI, GY
    CHEN, ZL
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1985, 80 (391) : 759 - 766
  • [8] Robust principal component analysis for functional data
    N. Locantore
    J. S. Marron
    D. G. Simpson
    N. Tripoli
    J. T. Zhang
    K. L. Cohen
    Graciela Boente
    Ricardo Fraiman
    Babette Brumback
    Christophe Croux
    Jianqing Fan
    Alois Kneip
    John I. Marden
    Daniel Peña
    Javier Prieto
    Jim O. Ramsay
    Mariano J. Valderrama
    Ana M. Aguilera
    [J]. Test, 1999, 8 (1) : 1 - 73
  • [9] Generalized linear regression on sampled signals and curves: A P-spline approach
    Marx, BD
    Eilers, PHC
    [J]. TECHNOMETRICS, 1999, 41 (01) : 1 - 13
  • [10] APPLICATION OF NEAR-INFRARED REFLECTANCE SPECTROSCOPY TO THE COMPOSITIONAL ANALYSIS OF BISCUITS AND BISCUIT DOUGHS
    OSBORNE, BG
    FEARN, T
    MILLER, AR
    DOUGLAS, S
    [J]. JOURNAL OF THE SCIENCE OF FOOD AND AGRICULTURE, 1984, 35 (01) : 99 - 105