Centering and scaling in component analysis

Times cited: 302
Authors
Bro, R [1]
Smilde, AK
Affiliations
[1] Royal Vet & Agr Univ, Chemometr Grp, Dept Dairy & Food Sci, DK-1958 Frederiksberg C, Denmark
[2] Univ Amsterdam, Dept Chem Engn, NL-1018 WV Amsterdam, Netherlands
Keywords
offset; weighted least squares; preprocessing; two-way; three-way; multiway; missing data; PCA; PARAFAC
DOI
10.1002/cem.773
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper the purpose and use of centering and scaling are discussed in depth. The main focus is on two-way bilinear data analysis, but the results can easily be generalized to multiway data analysis. In fact, one of the aims of this paper is to show that if two-way centering and scaling are understood, then multiway centering and scaling is quite straightforward. In the literature it is often stated that preprocessing of multiway arrays is difficult, but here it is shown that most of the difficulties do not pertain to three- and higher-way modeling in particular. It is shown that centering is most conveniently seen as a projection step, where the data are projected onto certain well-defined spaces within a given mode. This view of centering helps to explain why, for example, centering data with missing elements is likely to be suboptimal if there are many missing elements. Building a model for data consists of two parts: postulating a structural model and using a method to estimate the parameters. Centering has to do with the first part: when centering, a model including offsets is postulated. Scaling has to do with the second part: when scaling, another way of fitting the model is employed. It is shown that centering is simply a convenient technique to estimate model parameters for models with certain offsets, but this does not work for all types of offsets. It is also shown that scaling is a way to fit models with a weighted least squares loss function and that sometimes this change in objective function cannot be performed by a simple scaling step. Further practical aspects of and alternatives to centering and scaling are discussed, and examples are used throughout to show that the conclusions in the paper are not only of theoretical interest but can have an impact on practical data analysis. Copyright (C) 2003 John Wiley & Sons, Ltd.
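The projection view of centering mentioned in the abstract can be illustrated with a small sketch. This is not code from the paper: it assumes ordinary column centering of a complete two-way array, and the helper names (`column_center`, `centering_projector`, `matmul`) and the toy data are hypothetical. It shows that subtracting column means gives the same result as multiplying by the projector P = I - (1/n) 1 1^T.

```python
# Illustrative sketch only (assumed setup: column centering, no missing data).

def column_center(X):
    """Subtract each column's mean, the usual way centering is computed."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]

def centering_projector(n):
    """Build P = I - (1/n) * 1 1^T, which projects onto the orthogonal
    complement of the all-ones vector."""
    return [[(1.0 if i == k else 0.0) - 1.0 / n for k in range(n)]
            for i in range(n)]

def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Hypothetical toy data: 3 samples (rows) by 2 variables (columns).
X = [[1.0, 10.0], [2.0, 20.0], [6.0, 30.0]]

P = centering_projector(len(X))
Xc_subtract = column_center(X)   # centering by subtracting column means
Xc_project = matmul(P, X)        # centering as a projection, P @ X
PP = matmul(P, P)                # a projector is idempotent: P @ P == P
```

Both routes give the same centered array here. The equivalence relies on every element being observed, which is consistent with the abstract's remark that centering data with many missing elements is likely to be suboptimal.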
Pages: 16-33
Page count: 18