Cluster stability and the use of noise in interpretation of clustering

Cited by: 42
Authors
Davidson, GS [1]
Wylie, BN [1]
Boyack, KW [1]
Affiliations
[1] Sandia Natl Labs, Livermore, CA 94550 USA
Source
IEEE SYMPOSIUM ON INFORMATION VISUALIZATION 2001, PROCEEDINGS | 2001
DOI
10.1109/INFVIS.2001.963275
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A clustering and ordination algorithm suitable for mining extremely large databases, including those produced by microarray expression studies, is described and analyzed for stability. Data from a yeast cell cycle experiment with 6000 genes and 18 experimental measurements per gene are used to test the algorithm under practical conditions. Ordination, the process of assigning each database object an (X, Y) coordinate, is shown to be stable with respect to random starting conditions and to minor perturbations in the starting similarity estimates. Careful analysis of the way clusters typically co-locate, versus the occasional large displacements under different starting conditions, is shown to be useful in interpreting the data. This extra stability information is lost when only a single cluster is reported, which is currently the accepted practice. However, the authors believe that the approaches presented here should become a standard part of best practices in analyzing computer clustering of large data collections.
Pages: 23-30
Page count: 8