Dealing with missing data in MSPC: several methods, different interpretations, some examples

Cited by: 158
Authors
Arteaga, F
Ferrer, A
Affiliations
[1] Dpto Metodos Cuantitativos, Fac Estudios Empresa, E-46008 Valencia, Spain
[2] Univ Politecn Valencia, Dpto Estadist & IO, E-46022 Valencia, Spain
Keywords
principal component analysis (PCA); missing data; sensor failure; NIPALS; multivariate statistical process control (MSPC)
DOI
10.1002/cem.750
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This paper addresses the problem of using future multivariate observations with missing data to estimate latent variable scores from an existing principal component analysis (PCA) model. This is a critical issue in multivariate statistical process control (MSPC) schemes where the process is continuously interrogated based on an underlying PCA model. We present several methods for estimating the scores of new individuals with missing data: a so-called trimmed score method (TRI), a single-component projection method (SCP), a method of projection to the model plane (PMP), a method based on the iterative imputation of missing data, a method based on the minimization of the squared prediction error (SPE), a conditional mean replacement method (CMR) and two least squares-based methods: one based on a regression on known data (KDR) and the other based on a regression on trimmed scores (TSR). The basis for each method and the expressions for the score estimators, their covariance matrices and the estimation errors are developed. Some of the methods discussed have already been proposed in the literature (SCP, PMP and CMR), some are original (TRI and TSR) and others are shown to be equivalent to methods already developed by other authors: the iterative imputation and SPE methods are equivalent to PMP; KDR is equivalent to CMR. These methods can be seen as different ways to impute values for the missing variables. The efficiency of the methods is studied through simulations based on an industrial data set. The KDR method is shown to be statistically superior to the other methods, except the TSR method, in which the matrix to be inverted is much smaller. Copyright © 2002 John Wiley & Sons, Ltd.
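As a rough illustration of four of the methods named in the abstract (TRI, PMP, KDR and TSR), the following is a minimal sketch, not the authors' implementation. It assumes NumPy, mean-centered data, and NaN as the marker for a missing value. The function names `fit_pca` and `estimate_scores` are hypothetical. KDR and TSR are written here as ordinary least-squares fits on the training data, since the abstract does not give the paper's closed-form expressions.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the column means and the loadings P (K x A) of a PCA model."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def estimate_scores(X_train, mean, P, z, method="TSR"):
    """Estimate the scores of one observation z whose missing entries are NaN."""
    known = ~np.isnan(z)
    z_known = z[known] - mean[known]   # centered measured variables
    P_known = P[known, :]              # loading rows of the measured variables
    Xc = X_train - mean
    T = Xc @ P                         # scores of the training observations
    if method == "TRI":
        # Trimmed scores: missing centered values are treated as zero.
        return P_known.T @ z_known
    if method == "PMP":
        # Least-squares projection of the known part onto the model plane.
        return np.linalg.lstsq(P_known, z_known, rcond=None)[0]
    if method == "KDR":
        # Regress the training scores on the known variables.
        B = np.linalg.lstsq(Xc[:, known], T, rcond=None)[0]
        return z_known @ B
    if method == "TSR":
        # Regress the training scores on the training trimmed scores.
        T_trim = Xc[:, known] @ P_known
        B = np.linalg.lstsq(T_trim, T, rcond=None)[0]
        return (z_known @ P_known) @ B
    raise ValueError(f"unknown method: {method}")
```

In this sketch TSR solves a regression with only A predictors (the number of components), whereas KDR uses one predictor per known variable, which is consistent with the abstract's remark about the size of the matrix to be inverted.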
Pages: 408-418
Page count: 11