POSITIVE MATRIX FACTORIZATION - A NONNEGATIVE FACTOR MODEL WITH OPTIMAL UTILIZATION OF ERROR-ESTIMATES OF DATA VALUES

Cited by: 4197
Authors
PAATERO, P
TAPPER, U
Affiliations
[1] University of Helsinki, Department of Physics, Helsinki, SF-00170
[2] Technical Research Centre of Finland (VTT), Aerosol Technology Group, Espoo, SF-02151
Keywords
FACTOR ANALYSIS; PRINCIPAL COMPONENT ANALYSIS; WEIGHTED LEAST SQUARES; ALTERNATING REGRESSION; ERROR ESTIMATES; SCALING; REPETITIVE MEASUREMENTS
DOI
10.1002/env.3170050203
Chinese Library Classification (CLC)
X [Environmental Science, Safety Science]
Subject classification code
08; 0830
Abstract
A new variant 'PMF' of factor analysis is described. It is assumed that X is a matrix of observed data and sigma is the known matrix of standard deviations of elements of X. Both X and sigma are of dimensions n x m. The method solves the bilinear matrix problem X = GF + E where G is the unknown left-hand factor matrix (scores) of dimensions n x p, F is the unknown right-hand factor matrix (loadings) of dimensions p x m, and E is the matrix of residuals. The problem is solved in the weighted least squares sense: G and F are determined so that the Frobenius norm of E divided (element-by-element) by sigma is minimized. Furthermore, the solution is constrained so that all the elements of G and F are required to be non-negative. It is shown that the solutions by PMF are usually different from any solutions produced by the customary factor analysis (FA, i.e. principal component analysis (PCA) followed by rotations). Usually PMF produces a better fit to the data than FA. Also, the result of PMF is guaranteed to be non-negative, while the result of FA often cannot be rotated so that all negative entries would be eliminated. Different possible application areas of the new method are briefly discussed. In environmental data, the error estimates of data can be widely varying and non-negativity is often an essential feature of the underlying models. Thus it is concluded that PMF is better suited than FA or PCA in many environmental applications. Examples of successful applications of PMF are shown in companion papers.
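As an illustration of the problem stated in the abstract, the following minimal sketch minimizes Q = sum over i,j of ((X - GF)_ij / sigma_ij)^2 subject to G >= 0 and F >= 0. It is not the algorithm of the paper: it swaps in the widely used weighted multiplicative-update scheme for non-negative factorization, chosen only because it is short. The function names weighted_q and weighted_nmf, the parameters n_iter, seed and eps, and the random initialization are all assumptions made for this example, and the sketch assumes that X itself has no negative entries.

```python
import numpy as np


def weighted_q(X, sigma, G, F):
    """Weighted sum of squares: squared Frobenius norm of (X - GF) / sigma."""
    return float(np.sum(((X - G @ F) / sigma) ** 2))


def weighted_nmf(X, sigma, p, n_iter=200, seed=0, eps=1e-12):
    """Illustrative weighted non-negative factorization X ~ G F.

    Uses multiplicative updates with weights W = 1 / sigma**2.
    This is a stand-in technique, NOT the solver described in the paper.
    Assumes X >= 0 and sigma > 0 element-wise.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = 1.0 / sigma**2           # element-wise weights
    WX = W * X
    # Strictly positive start so that the multiplicative updates can move.
    G = rng.random((n, p)) + 0.1  # scores, n x p
    F = rng.random((p, m)) + 0.1  # loadings, p x m
    history = [weighted_q(X, sigma, G, F)]
    for _ in range(n_iter):
        # Each factor is multiplied by a non-negative ratio, so G and F
        # stay non-negative; eps only guards against division by zero.
        G *= (WX @ F.T) / ((W * (G @ F)) @ F.T + eps)
        F *= (G.T @ WX) / (G.T @ (W * (G @ F)) + eps)
        history.append(weighted_q(X, sigma, G, F))
    return G, F, history
```

Calling weighted_nmf(X, sigma, p=2) on an n x m array X with a matching array sigma returns the two factor matrices and the history of Q values, which can be inspected to see how the weighted fit improves. Elements with large sigma receive small weight, which is how the error estimates enter the fit.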
Pages: 111-126
Number of pages: 16
Related papers
12 in total
  • [1] Paatero P., Tapper U., Aalto P., Kulmala M., Matrix factorization methods for analysing diffusion battery data, Journal of Aerosol Science, 22, (1991)
  • [2] Paatero P., Tapper U., Analysis of different modes of factor analysis as least squares fit problems, Chemometrics and Intelligent Laboratory Systems, 18, pp. 183-194, (1993)
  • [3] Tapper U., Robust modelling of data errors in non‐negative factor analysis of bulk wet deposition, (1994)
  • [4] Henry R.C., Multivariate receptor models, Receptor Modeling for Air Quality Management, (1991)
  • [5] Shen J., Israel G.W., A receptor model using a specific non‐negative transformation technique for ambient aerosol, Atmospheric Environment, 23, pp. 2289-2298, (1989)
  • [6] Karjalainen E., Karjalainen U., Mathematical chromatography — resolution of overlapping spectra in GC‐MS, Medical Informatics Europe, 85, pp. 572-578, (1985)
  • [7] Karjalainen E., Karjalainen U., Component reconstruction in the primary space of spectra and concentrations. Alternating regression and related direct methods, Analytica Chimica Acta, 250, pp. 169-179, (1991)
  • [8] Karjalainen E., (1993)
  • [9] Juntto S., Paatero P., Analysis of daily precipitation data by positive matrix factorization, Environmetrics, 5, pp. 127-144, (1994)
  • [10] Currie L.A., Gerlach R.W., Lewis C.W., Balfour W.D., Cooper J.A., Dattner S.L., De Cesar R.T., Gordon G.E., Heisler S.L., Hopke P.K., Shah J.J., Thurston G.D., Williamson H.J., Interlaboratory comparison of source apportionment procedures: results for simulated data sets, Atmospheric Environment, 18, pp. 1517-1537, (1984)