Decision-making method using a visual approach for cluster analysis problems; indicative classification algorithms and grouping scope

Cited by: 43
Authors
Bittmann, Ran M. [1]
Gelbard, Roy M. [1]
Affiliations
[1] Bar Ilan Univ, Informat Syst Program, Grad Sch Business Adm, IL-52900 Ramat Gan, Israel
Keywords
cluster analysis; visualization techniques; decision support system
DOI
10.1111/j.1468-0394.2007.00428.x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Currently, classifying samples into a fixed number of clusters (i.e. supervised cluster analysis) as well as unsupervised cluster analysis are limited in their ability to support 'cross-algorithms' analysis. It is well known that each cluster analysis algorithm yields different results (i.e. a different classification); even running the same algorithm with two different similarity measures commonly yields different results. Researchers usually choose the preferred algorithm and similarity measure according to analysis objectives and data set features, but they have neither a formal method nor tool that supports comparisons and evaluations of the different classifications that result from the diverse algorithms. Current research development and prototype decisions support a methodology based upon formal quantitative measures and a visual approach, enabling presentation, comparison and evaluation of multiple classification suggestions resulting from diverse algorithms. This methodology and tool were used in two basic scenarios: (I) a classification problem in which a 'true result' is known, using the Fisher iris data set; (II) a classification problem in which there is no 'true result' to compare with. In this case, we used a small data set from a user profile study (a study that tries to relate users to a set of stereotypes based on sociological aspects and interests). In each scenario, ten diverse algorithms were executed. The suggested methodology and decision support system produced a cross-algorithms presentation; all ten resultant classifications are presented together in a 'Tetris-like' format. Each column represents a specific classification algorithm, each line represents a specific sample, and formal quantitative measures analyse the 'Tetris blocks', arranging them according to their best structures, i.e. best classification.
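The cross-algorithm layout described in the abstract (one column per algorithm, one row per sample) can be illustrated with a minimal sketch. The abstract does not specify the paper's quantitative measures, so everything below is an assumption made for illustration: the function names (`align_labels`, `cross_algorithm_table`, `agreement`), the brute-force label alignment, the simple majority-agreement score, and the toy label lists, which are not data from the study.

```python
from collections import Counter
from itertools import permutations


def align_labels(reference, labels):
    """Relabel `labels` so they agree with `reference` as often as possible.

    Cluster ids are arbitrary, so two algorithms can find the same partition
    under different ids. This tries every permutation of the ids, which is
    only practical for a small number of clusters.
    """
    ids = sorted(set(reference) | set(labels))
    best_mapping, best_hits = None, -1
    for perm in permutations(ids):
        mapping = dict(zip(ids, perm))
        hits = sum(mapping[l] == r for l, r in zip(labels, reference))
        if hits > best_hits:
            best_mapping, best_hits = mapping, hits
    return [best_mapping[l] for l in labels]


def cross_algorithm_table(results):
    """Return algorithm names and one row of aligned labels per sample."""
    names = list(results)
    reference = results[names[0]]  # first algorithm serves as the reference
    aligned = [align_labels(reference, results[n]) for n in names]
    return names, [list(row) for row in zip(*aligned)]


def agreement(row):
    """Fraction of algorithms that chose the row's most common cluster."""
    return Counter(row).most_common(1)[0][1] / len(row)


# Hypothetical outputs of three algorithms on six samples (illustrative only).
results = {
    "algo_a": [0, 0, 1, 1, 2, 2],
    "algo_b": [1, 1, 0, 0, 2, 2],  # same partition as algo_a, ids swapped
    "algo_c": [0, 0, 1, 2, 2, 2],  # disagrees on the fourth sample
}

names, rows = cross_algorithm_table(results)
print(names)
for i, row in enumerate(rows):
    print(i, row, round(agreement(row), 2))
```

Rows with agreement below 1.0 mark the samples on which the algorithms disagree. Sorting rows or columns by such a score is one possible way to arrange the blocks, not necessarily the arrangement used in the paper.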
Pages: 171-187
Number of pages: 17