Discarding or downweighting high-noise variables in factor analytic models

Cited by: 495
Authors
Paatero, P
Hopke, PK
Affiliations
[1] Clarkson Univ, Dept Chem Engn, Potsdam, NY 13699 USA
[2] Univ Helsinki, Dept Phys Sci, FIN-00014 Helsinki, Finland
Keywords
principal component analysis; positive matrix factorization; signal-to-noise; scaling of variables; autoscaling; weak variables; Givens rotations
DOI
10.1016/S0003-2670(02)01643-4
Chinese Library Classification (CLC) number
O65 [Analytical chemistry]
Discipline classification codes
070302; 081704
Abstract
This work examines the factor analysis of matrices where the proportion of signal and noise is very different in different columns (variables). Such matrices often occur when measuring elemental concentrations in environmental samples. In the strongest variables, the error level may be a few percent. For the weakest variables, the data may consist almost entirely of noise. This paper demonstrates that the proper scaling of weak variables is critical. It is found that if a few weak variables are scaled to too high a weight in the analysis, the errors in computed factors would grow, possibly obscuring the weakest factor(s) by the increased noise level. The mathematical explanation of this phenomenon is explored by means of Givens rotations. It is shown that the customary form of principal component analysis (PCA), based on autoscaling the original data, is generally very ineffective because the scaling of weak variables becomes much too high. Practical advice is given for dealing with noisy data in both PCA and positive matrix factorization (PMF). (C) 2003 Elsevier Science B.V. All rights reserved.
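The abstract's central claim, that autoscaling gives noise-only variables far too much weight and can hide the weakest factor, can be illustrated numerically. The sketch below is a toy example of our own, not taken from the paper. It assumes a hypothetical two-factor data set with four strong variables (noise of a few percent of the mean) and four variables that are pure noise. It also assumes the per-column noise standard deviation `sigma` is known. It compares autoscaling with dividing each column by its noise standard deviation, which is similar in spirit to the uncertainty weighting used in PMF, though it is not necessarily the exact scaling the authors recommend. All names (`autoscale`, `noise_scale`, `gap_ratio`) are hypothetical.

```python
import numpy as np

def autoscale(X):
    """Customary PCA pretreatment: center each column, then divide by its sample std."""
    Xc = X - X.mean(axis=0)
    return Xc / Xc.std(axis=0, ddof=1)

def noise_scale(X, sigma):
    """Assumed alternative: center each column, then divide by its known noise std,
    so the noise has unit variance in every column regardless of signal strength."""
    return (X - X.mean(axis=0)) / sigma

def gap_ratio(Z):
    """lambda2 / lambda3 of the sample covariance of Z: how clearly the
    weaker of the two factors stands out above the largest noise eigenvalue."""
    lam = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]  # largest first
    return lam[1] / lam[2]

# Hypothetical data: 500 samples, 2 positive factors, 8 variables.
rng = np.random.default_rng(0)
n = 500
G = rng.lognormal(size=(n, 2))                   # factor contributions
F = np.array([[1.0, 0.8, 0.1, 0.5],              # hypothetical profiles of
              [0.1, 0.3, 1.0, 0.6]])             # the 4 strong variables
signal = np.hstack([G @ F, np.zeros((n, 4))])    # 4 weak variables carry no signal
sigma = np.full(8, 0.05)                         # equal noise std in every column
X = signal + rng.normal(scale=sigma, size=(n, 8))

print("autoscaled   lambda2/lambda3:", round(gap_ratio(autoscale(X)), 1))
print("noise-scaled lambda2/lambda3:", round(gap_ratio(noise_scale(X, sigma)), 1))
```

Autoscaling inflates each pure-noise column to unit variance, so four noise eigenvalues near 1 crowd the weaker factor and the gap ratio stays small. With noise scaling, the noise eigenvalues also sit near 1, but both factors remain far above them, so the second factor is clearly separated.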
Pages: 277-289
Page count: 13