A new agglomerative hierarchical clustering algorithm implementation based on the map reduce framework

Cited by: 7
Authors
Gao H. [1, 3]
Jiang J. [2]
She L. [3]
Fu Y. [3]
Affiliations
[1] Web Sciences Center, School of Computer Science and Engineering
Keywords
Agglomerative hierarchical clustering; Initial classification; Map reduce; Text clustering
DOI
10.4156/jdcta.vol4.issue3.9
Abstract
Text clustering is a difficult and actively studied problem in text mining. Combining the Map Reduce framework with the neuron initialization method of the VPSOM (vector pressing Self-Organizing Model) algorithm, a new text clustering algorithm is presented. It divides a large text vector dataset into data blocks, each of which is then processed on a different distributed data node of the Map Reduce framework with an agglomerative hierarchical clustering algorithm. The experimental results indicate that the improved algorithm has higher efficiency and better accuracy.
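The block-wise scheme described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the abstract does not state the linkage criterion, the block size, how per-block results are combined, or how the VPSOM initialization is used, so centroid linkage, a fixed `block_size`, and a reduce step that simply collects local centroids are all assumptions. The function names (`agglomerate`, `map_phase`, `reduce_phase`) are hypothetical, and the map and reduce steps are plain Python functions standing in for real Map Reduce tasks.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def agglomerate(block, k):
    """Agglomerative clustering of one block (assumed centroid linkage):
    repeatedly merge the two clusters with the closest centroids
    until only k clusters remain."""
    clusters = [[p] for p in block]  # start with one cluster per vector
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


def split_into_blocks(vectors, block_size):
    """Divide the dataset into consecutive blocks of at most block_size."""
    return [vectors[s:s + block_size] for s in range(0, len(vectors), block_size)]


def map_phase(blocks, k):
    """Stand-in for the map step: each block is clustered independently."""
    return [agglomerate(block, k) for block in blocks]


def reduce_phase(per_block_clusters):
    """Stand-in for the reduce step (assumed): one centroid per local cluster."""
    return [centroid(c) for clusters in per_block_clusters for c in clusters]


# Toy 2-D vectors; real input would be high-dimensional text vectors.
vectors = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0),
           (0.5, 0.5), (10.5, 10.5), (1.0, 0.0), (11.0, 10.0)]
blocks = split_into_blocks(vectors, block_size=4)
local = map_phase(blocks, k=2)
centroids = reduce_phase(local)
print(centroids)
```

Because each block is clustered without reference to the others, the per-block work can run on separate nodes; the quadratic pair search then applies to the block size rather than to the full dataset.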
Pages: 95-100