DD-HDS: A method for visualization and exploration of high-dimensional data

Cited by: 45
Authors
Lespinats, Sylvain [1]
Verleysen, Michel
Giron, Alain
Fertil, Bernard
Affiliations
[1] Univ Paris 06, INSERM, UMR 678, F-75634 Paris, France
[2] Univ Paris 01, F-75634 Paris 13, France
[3] Catholic Univ Louvain, B-1348 Louvain, Belgium
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2007 / Vol. 18 / No. 5
Keywords
high-dimensional data; multidimensional scaling (MDS); neighborhood visualization; nonlinear mapping
DOI
10.1109/TNN.2007.891682
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Mapping high-dimensional data in a low-dimensional space, for example for visualization, is a problem of growing concern in data analysis. This paper presents data-driven high-dimensional scaling (DD-HDS), a nonlinear mapping method that follows the multidimensional scaling (MDS) approach, based on the preservation of distances between pairs of data. It improves on the performance of existing competitors with respect to the representation of high-dimensional data in two ways. It introduces 1) a specific weighting of distances between data that takes into account the concentration of measure phenomenon and 2) a symmetric handling of short distances in the original and output spaces, which avoids false neighbor representations while still allowing some necessary tears in the original distribution. More precisely, the weighting is set according to the effective distribution of distances in the data set, with the exception of a single user-defined parameter that sets the tradeoff between local neighborhood preservation and global mapping. The stress criterion designed for the mapping is optimized by "force-directed placement" (FDP). Mappings of low- and high-dimensional data sets are presented to illustrate the features and advantages of the proposed algorithm. The weighting function specific to high-dimensional data and the symmetric handling of short distances can easily be incorporated into most distance-preservation-based nonlinear dimensionality reduction methods.
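The abstract describes the method only at a high level. The Python sketch below illustrates the two ideas it names: a distance-preservation stress in which each pair's squared error is weighted by a decreasing function of min(d, d*) (so a pair that is short in *either* the input or the output space counts as local), and a simple force-directed loop that moves output points to lower that stress. It rests on loud assumptions: the weight 1 - Φ((u - μ)/σ), the rule μ = mean - λ·std, the step schedule, and every function name (`weight`, `weight_params`, `stress`, `force_directed_layout`) are hypothetical choices made here for illustration, not the exact DD-HDS formulas or FDP procedure from the paper.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise(points):
    """Full symmetric matrix of pairwise Euclidean distances."""
    n = len(points)
    return [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]


def weight(u, mu, sigma):
    """HYPOTHETICAL decreasing weight 1 - Phi((u - mu) / sigma).

    Close to 1 for short distances, close to 0 for long ones.
    """
    return 0.5 * math.erfc((u - mu) / (sigma * math.sqrt(2.0)))


def weight_params(D, lam):
    """HYPOTHETICAL rule deriving (mu, sigma) from the input distance spread.

    lam plays the role of the single user-set trade-off parameter: a larger
    lam lowers mu, concentrating the weight on shorter (more local) pairs.
    """
    n = len(D)
    d = [D[i][j] for i in range(n) for j in range(i + 1, n)]
    mean = sum(d) / len(d)
    std = math.sqrt(sum((x - mean) ** 2 for x in d) / len(d)) or 1.0
    return mean - lam * std, std


def stress(D, Dstar, mu, sigma):
    """Weighted squared mismatch between input (D) and output (Dstar) distances."""
    n = len(D)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Symmetric handling: the weight uses the shorter of the two distances.
            u = min(D[i][j], Dstar[i][j])
            total += weight(u, mu, sigma) * (D[i][j] - Dstar[i][j]) ** 2
    return total


def force_directed_layout(data, dim=2, lam=0.5, step=0.02, iters=300, seed=0):
    """Generic force loop (an assumption, not the paper's FDP schedule).

    Each pair pushes or pulls its two output points along the line joining
    them; weights are held fixed within a step. Returns the layout and the
    stress after every iteration.
    """
    rng = random.Random(seed)
    n = len(data)
    D = pairwise(data)
    mu, sigma = weight_params(D, lam)
    Y = [[rng.random() for _ in range(dim)] for _ in range(n)]
    history = [stress(D, pairwise(Y), mu, sigma)]
    for _ in range(iters):
        Dstar = pairwise(Y)
        moves = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j or Dstar[i][j] == 0.0:
                    continue
                w = weight(min(D[i][j], Dstar[i][j]), mu, sigma)
                # c > 0 when the pair is too close in the output: move i away from j.
                c = step * w * (D[i][j] - Dstar[i][j]) / Dstar[i][j]
                for k in range(dim):
                    moves[i][k] += c * (Y[i][k] - Y[j][k])
        Y = [[y + m for y, m in zip(Yi, Mi)] for Yi, Mi in zip(Y, moves)]
        history.append(stress(D, pairwise(Y), mu, sigma))
    return Y, history


# Usage: a flat 3 x 3 grid embedded in 3-D, mapped back to 2-D.
grid = [(x, y, 0.0) for x in range(3) for y in range(3)]
layout, history = force_directed_layout(grid)
print(f"stress: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because the weight depends on min(d, d*), swapping the input and output distance matrices leaves the stress unchanged, which is one way to read "symmetric handling": a pair that collapses in the output (a false neighbor) is penalized just as strongly as a true neighbor that is torn apart.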
Pages: 1265-1279
Page count: 15