Principal component analysis for compositional data with outliers

Cited by: 408
Authors
Filzmoser, Peter [1]
Hron, Karel [2]
Reimann, Clemens [3]
Affiliations
[1] Vienna Univ Technol, Dept Stat & Probabil Theory, A-1040 Vienna, Austria
[2] Palacky Univ Olomouc, Dept Math Anal & Applicat Math, CZ-77100 Olomouc, Czech Republic
[3] Geol Survey Norway NGU, N-7491 Trondheim, Norway
Keywords
robust statistics; compositional data; isometric logratio transformation; principal component analysis
DOI
10.1002/env.966
Chinese Library Classification number
X [Environmental science, safety science];
Discipline classification code
08 ; 0830 ;
Abstract
Compositional data (almost all data in geochemistry) are closed data, that is, they usually sum up to a constant (e.g., weight percent, wt.%) and carry only relative information. Thus, the covariance structure of compositional data is strongly biased and results of many multivariate techniques become doubtful without a proper transformation of the data. The centred logratio transformation (clr) is often used to open closed data. However, clr-transformed data do not have full rank and therefore cannot be used for robust multivariate techniques such as principal component analysis (PCA). Here we propose to use the isometric logratio transformation (ilr) instead. However, the ilr transformation has the disadvantage that the resulting new variables are no longer directly interpretable in terms of the originally entered variables. We therefore propose a technique by which the resulting scores and loadings of a robust PCA on ilr-transformed data can be back-transformed and interpreted. The procedure is demonstrated using a real data set from regional geochemistry and compared to results from non-transformed and non-robust versions of PCA. It turns out that the procedure using ilr-transformed data and robust PCA delivers superior results to all other approaches. The examples demonstrate that, due to the compositional nature of geochemical data, PCA should not be carried out without an appropriate transformation. Furthermore, a robust approach is preferable if the dataset contains outliers. Copyright (C) 2009 John Wiley & Sons, Ltd.
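The rank argument and the back-transformation idea in the abstract can be illustrated with a small sketch. It is not the authors' code: the orthonormal basis `V`, the simulated data, and all function names are assumptions chosen for illustration, and the classical sample covariance stands in for the robust covariance estimator a robust PCA would use. The sketch shows that the clr covariance is singular, that ilr coordinates have one dimension fewer, and that loadings computed in ilr space can be mapped to clr space by multiplying with `V`, where each row corresponds to one original part.

```python
import numpy as np

def clr(x):
    """Centred logratio: log of the parts minus the row-wise mean of the logs."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def ilr_basis(d):
    """One possible orthonormal basis V (d x (d-1)) of the clr hyperplane."""
    v = np.zeros((d, d - 1))
    for j in range(1, d):
        v[:j, j - 1] = 1.0 / j          # first j parts share a positive weight
        v[j, j - 1] = -1.0              # contrasted against part j
        v[:, j - 1] *= np.sqrt(j / (j + 1.0))  # scale the column to unit length
    return v

def ilr(x, v):
    """Isometric logratio coordinates: clr data expressed in the basis V."""
    return clr(x) @ v

# Simulated 4-part compositions closed to 100 (illustrative data only).
rng = np.random.default_rng(0)
raw = rng.lognormal(size=(50, 4))
x = raw / raw.sum(axis=1, keepdims=True) * 100.0

V = ilr_basis(x.shape[1])
y = clr(x)
z = ilr(x, V)

# The clr covariance has a zero eigenvalue (rows of y sum to zero),
# which is what rules out estimators that need a full-rank covariance.
clr_eigvals = np.linalg.eigvalsh(np.cov(y, rowvar=False))

# PCA in ilr space. Assumption: a robust covariance estimate would
# replace np.cov here; the classical estimate keeps the sketch short.
cov_z = np.cov(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov_z)
order = np.argsort(eigvals)[::-1]
loadings_ilr = eigvecs[:, order]
scores = (z - z.mean(axis=0)) @ loadings_ilr

# Back-transform: loadings in clr space, one row per original part.
loadings_clr = V @ loadings_ilr
scores_from_clr = (y - y.mean(axis=0)) @ loadings_clr
```

Under these assumptions the scores are the same whether they are computed from the ilr coordinates or from the clr data with the back-transformed loadings, so only the loadings need to be mapped back for interpretation.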
Pages: 621-632
Number of pages: 12