Dimensionality reduction when data are density functions

Cited by: 59
Authors
Delicado, P. [1]
Affiliations
[1] Univ Politecn Cataluna, Dept Estadist & Invest Operat, ES-08034 Barcelona, Spain
Keywords
Compositional data; Functional data analysis; Graphical output; Kullback-Leibler divergence; L-p distance; Multidimensional scaling; Population pyramids; Principal components analysis;
DOI
10.1016/j.csda.2010.05.008
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Functional Data Analysis (FDA) deals with samples where a whole function is observed for each individual. A relevant case of FDA is when the observed functions are density functions. Among the particular characteristics of density functions, special use is made of the fact that they are an example of infinite-dimensional compositional data (parts of some whole which carry only relative information). Several dimensionality reduction methods for this particular type of data are compared: functional principal components analysis with or without a previous data transformation, and multidimensional scaling for different inter-density distances, one of them taking into account the compositional nature of density functions. The emphasis is on the steps before and after the application of a particular dimensionality reduction method: care must be taken in choosing the right density function transformation and/or the appropriate distance between densities before performing dimensionality reduction; subsequently, the graphical representation of dimensionality reduction results must take into account that the observed objects are density functions. The different methods are applied to artificial and real data (population pyramids for 223 countries in the year 2000). As a global conclusion, the use of multidimensional scaling based on compositional distance is recommended. © 2010 Elsevier B.V. All rights reserved.
Pages: 401-420
Number of pages: 20
Related papers
19 items in total
[1]  
Anderson T.W., 1986, STAT ANAL DATA, V2nd, DOI 10.1007/978-94-009-4109-0
[2]  
[Anonymous], 2007, Lecture Notes on Compositional Data Analysis
[3]  
Borg I., 2005, Modern multidimensional scaling: theory and applications
[4]  
Davidian M, 2004, STAT SINICA, V14, P613
[5]   Principal curves of oriented points: Theoretical and computational improvements [J].
Delicado, P ;
Huerta, M .
COMPUTATIONAL STATISTICS, 2003, 18 (02) :293-315
[6]   Hilbert space of probability density functions based on Aitchison geometry [J].
Egozcue, J. J. ;
Diaz-Barrero, J. L. ;
Pawlowsky-Glahn, V. .
ACTA MATHEMATICA SINICA-ENGLISH SERIES, 2006, 22 (04) :1175-1182
[7]  
Ferraty F., 2006, SPR S STAT
[8]   Statistics for functional data [J].
Gonzalez Manteiga, Wenceslao ;
Vieu, Philippe .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 51 (10) :4788-4792
[9]  
*IPC, 2000, INT DAT BAS
[10]   DISPLAYING THE IMPORTANT FEATURES OF LARGE COLLECTIONS OF SIMILAR CURVES [J].
JONES, MC ;
RICE, JA .
AMERICAN STATISTICIAN, 1992, 46 (02) :140-145