A robust PCR method for high-dimensional regressors

Cited by: 69
Authors
Hubert, M
Verboven, S
Affiliations
[1] Katholieke Univ Leuven, Dept Math, B-3001 Heverlee, Belgium
[2] Univ Antwerp, Dept Math & Comp Sci, B-2020 Antwerp, Belgium
Keywords
principal component analysis; principal component regression; robust regression; multivariate calibration
DOI
10.1002/cem.783
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We consider the multivariate calibration model, which assumes that the concentrations of several constituents of a sample are linearly related to its spectrum. Principal component regression (PCR) is widely used to estimate the regression parameters in this model. In the classical approach it combines principal component analysis (PCA) on the regressors with least squares regression. However, both stages yield very unreliable results when the data set contains outlying observations. We present a robust PCR (RPCR) method which also consists of two parts. First we apply a robust PCA method for high-dimensional data to the regressors, then we regress the response variables on the scores using a robust regression method. A robust RMSECV value and a robust R² value are proposed as exploratory tools to select the number of principal components. The prediction error is also estimated in a robust way. Moreover, we introduce several diagnostic plots which are helpful to visualize and classify the outliers. The robustness of RPCR is demonstrated through simulations and the analysis of a real data set. Copyright © 2003 John Wiley & Sons, Ltd.
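The two-stage structure described in the abstract (PCA on the regressors, then regression of the response on the scores) can be illustrated with a minimal sketch. This is not the RPCR method of the paper: the abstract does not specify the robust estimators used, so the sketch swaps in simple stand-ins, namely spherical PCA (median centering, then PCA of the data projected onto the unit sphere) for the first stage and a Huber M-estimator fitted by iteratively reweighted least squares for the second. The function names, tuning constant, and simulated data are all assumptions made for illustration, and a Huber fit in particular is not protected against outlying scores (leverage points).

```python
import numpy as np

def classical_pcr(X, y, k):
    """Classical PCR: PCA on mean-centered X, then least squares on the first k scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                                   # p x k loadings
    gamma, *_ = np.linalg.lstsq(Xc @ P, y - y_mean, rcond=None)
    beta = P @ gamma                               # coefficients in regressor space
    return y_mean - x_mean @ beta, beta

def robust_pcr_sketch(X, y, k, n_iter=50, c=1.345):
    """Illustrative stand-in only: spherical PCA followed by Huber IRLS on the scores."""
    center = np.median(X, axis=0)                  # coordinatewise median as robust center
    Xc = X - center
    norms = np.linalg.norm(Xc, axis=1)
    norms[norms == 0] = 1.0
    # Stage 1: PCA of the directions only, so far-away points cannot dominate.
    _, _, Vt = np.linalg.svd(Xc / norms[:, None], full_matrices=False)
    P = Vt[:k].T
    A = np.column_stack([np.ones(len(y)), Xc @ P])  # intercept column + scores
    # Stage 2: Huber M-regression via iteratively reweighted least squares.
    w = np.ones(len(y))
    for _ in range(n_iter):
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        r = y - A @ coef
        scale = 1.4826 * np.median(np.abs(r - np.median(r)))  # MAD-based scale
        if scale == 0:
            break
        u = np.abs(r) / scale
        w = c / np.maximum(u, c)                   # weight 1 if u <= c, else c / u
    beta = P @ coef[1:]
    return coef[0] - center @ beta, beta

# Simulated calibration-like data (hypothetical): 2 latent components, 20 regressors.
rng = np.random.default_rng(0)
n, p, k = 100, 20, 2
Q, _ = np.linalg.qr(rng.normal(size=(p, k)))       # orthonormal loadings
X = (rng.normal(size=(n, k)) * [3.0, 2.0]) @ Q.T + 0.01 * rng.normal(size=(n, p))
beta_true = Q @ np.array([1.0, -2.0])
y_clean = 1.0 + X @ beta_true
y_obs = y_clean + rng.normal(scale=0.05, size=n)
y_obs[:10] += 50.0                                 # 10% vertical outliers in the response

b0_c, beta_c = classical_pcr(X, y_obs, k)
b0_r, beta_r = robust_pcr_sketch(X, y_obs, k)
rmse_classical = np.sqrt(np.mean((b0_c + X @ beta_c - y_clean) ** 2))
rmse_robust = np.sqrt(np.mean((b0_r + X @ beta_r - y_clean) ** 2))
print(f"RMSE vs. clean signal: classical={rmse_classical:.3f}, robust sketch={rmse_robust:.3f}")
```

In this simulated setting only the response is contaminated, so the sketch mainly shows how a least squares second stage is pulled toward the outliers while a downweighting estimator is not. Handling outlying spectra as well would require high-breakdown estimators in both stages, which this stand-in does not provide.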
Pages: 438-452
Page count: 15