A scalable parallel algorithm for self-organizing maps with applications to sparse data mining problems

被引:57
作者
Lawrence, RD [1 ]
Almasi, GS [1 ]
Rushmeier, HE [1 ]
机构
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
关键词
parallel processing; parallel IO; scalable data mining; clustering; Kohonen self-organizing maps; data visualization;
D O I
10.1023/A:1009817804059
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We describe a scalable parallel implementation of the self organizing map (SOM) suitable for data-mining applications involving clustering or segmentation against large data sets such as those encountered in the analysis of customer spending patterns. The parallel algorithm is based on the batch SOM formulation in which the neural weights are updated at the end of each pass over the training data. The underlying serial algorithm is enhanced to take advantage of the sparseness often encountered in these data sets. Analysis of a realistic test problem shows that the batch SOM algorithm captures key features observed using the conventional on-line algorithm, with comparable convergence rates. Performance measurements on an SP2 parallel computer are given for two retail data sets and a publicly available set of census data.These results demonstrate essentially linear speedup for the parallel batch SOM algorithm, using both a memory-contained sparse formulation as well as a separate implementation in which the mining data is accessed directly from a parallel file system. We also present visualizations of the census data to illustrate the value of the clustering information obtained via the parallel SOM method.
引用
收藏
页码:171 / 195
页数:25
相关论文
共 24 条
[1]  
[Anonymous], P 1990 INT JOINT C N
[2]  
[Anonymous], SELFORGANIZING MAPS
[3]  
BUHUSI CV, 1993, P 9 ROM S COMP SCI 9, P51
[4]   COMPETITIVE NEURAL NETWORKS ON MESSAGE-PASSING PARALLEL COMPUTERS [J].
CECCARELLI, M ;
PETROSINO, A ;
VACCARO, R .
CONCURRENCY-PRACTICE AND EXPERIENCE, 1993, 5 (06) :449-470
[5]  
Gropp W. D., 1994, Using MPI-Portable Parallel Programming with the Message -Parsing Interface
[6]  
HONKELA T, 1998, WEBSOM SELFORGANIZIN
[7]   Modified self-organizing feature map algorithms for efficient digital hardware implementation [J].
Ienne, P ;
Thiran, P ;
Vassilas, N .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1997, 8 (02) :315-330
[8]   THE NEURAL PHONETIC TYPEWRITER [J].
KOHONEN, T .
COMPUTER, 1988, 21 (03) :11-72
[9]  
KOHONEN T, 1996, P ART NEUR NETW ICAN
[10]  
KOHONEN T, 1993, P IEEE INT C NEUR NE, P1147