Mapping high-dimensional data onto a relative distance plane - an exact method for visualizing and characterizing high-dimensional patterns

Cited by: 25
Authors
Somorjai, RL [1]
Dolenko, B [1]
Demko, A [1]
Mandelzweig, M [1]
Nikulin, AE [1]
Baumgartner, R [1]
Pizzi, NJ [1]
Affiliation
[1] Natl Res Council Canada, Inst Biodiagnost, Winnipeg, MB R3B 1Y6, Canada
DOI
10.1016/j.jbi.2004.07.005
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
We introduce a distance (similarity)-based mapping for the visualization of high-dimensional patterns and their relative relationships. The mapping preserves exactly the original distances between points with respect to any two reference patterns in a special two-dimensional coordinate system, the relative distance plane (RDP). As only a single calculation of a distance matrix is required, this method is computationally efficient, an essential requirement for any exploratory data analysis. The data visualization afforded by this representation permits a rapid assessment of class pattern distributions. In particular, we can determine with a simple statistical test whether both training and validation sets of a 2-class, high-dimensional dataset derive from the same class distributions. We can explore any dataset in detail by identifying the subset of reference pairs whose members belong to different classes, cycling through this subset, and for each pair, mapping the remaining patterns. These multiple viewpoints facilitate the identification and confirmation of outliers. We demonstrate the effectiveness of this method on several complex biomedical datasets. Because of its efficiency, effectiveness, and versatility, one may use the RDP representation as an initial, data mining exploration that precedes classification by some classifier. Once final enhancements to the RDP mapping software are completed, we plan to make it freely available to researchers. Crown Copyright (C) 2004 Published by Elsevier Inc. All rights reserved.
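The abstract does not give the mapping formulas, so the following is only a minimal sketch of one way to build a two-dimensional layout with the stated property: every pattern keeps its exact distances to two chosen reference patterns. It assumes Euclidean distance and a simple triangulation (first reference at the origin, second on the x-axis). The function names `distance_matrix` and `rdp_coordinates` are hypothetical and are not taken from the authors' software.

```python
import math


def distance_matrix(patterns):
    """Compute the full pairwise Euclidean distance matrix once."""
    n = len(patterns)
    D = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            D[a][b] = D[b][a] = math.dist(patterns[a], patterns[b])
    return D


def rdp_coordinates(D, ref1, ref2):
    """Place every pattern in 2-D so that its distances to the two
    reference patterns match the original distances in D.

    Assumed construction (not from the paper): ref1 sits at (0, 0),
    ref2 at (d12, 0), and each pattern is located by triangulation.
    """
    d12 = D[ref1][ref2]
    if d12 == 0:
        raise ValueError("the two reference patterns must be distinct")
    coords = []
    for k in range(len(D)):
        d1, d2 = D[k][ref1], D[k][ref2]
        # Law of cosines gives the projection onto the ref1-ref2 axis.
        x = (d1 * d1 - d2 * d2 + d12 * d12) / (2 * d12)
        # Clamp at zero to absorb floating-point round-off.
        y = math.sqrt(max(d1 * d1 - x * x, 0.0))
        coords.append((x, y))
    return coords


# Four illustrative 4-dimensional patterns (made-up values).
patterns = [(0, 0, 0, 0), (3, 4, 0, 0), (1, 1, 1, 1), (2, 0, 2, 1)]
D = distance_matrix(patterns)
coords = rdp_coordinates(D, ref1=0, ref2=1)
```

Because `D` is computed once, cycling through other reference pairs only requires calling `rdp_coordinates` again with different indices. Distances between two non-reference patterns are generally not preserved in such a layout; only distances to the two references are exact.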
Pages: 366-379
Page count: 14