An Efficient Density-Based Clustering Algorithm for Data Streams Based on k-Means Partitioning

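Cited by: 7
Authors
Ni Weiwei (倪巍伟)
Lu Jieping (陆介平)
Chen Geng (陈耿)
Sun Zhihui (孙志挥)
Affiliation
[1] Department of Computer Science and Engineering, Southeast University
Funding
Specialized Research Fund for the Doctoral Program of Higher Education
Keywords
data stream clustering; mean reference points; density-based clustering
DOI
Not available
Chinese Library Classification number
TP311.13
Subject classification code
1201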
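Abstract
Data stream clustering is an important topic in data stream mining. Most existing data stream clustering algorithms rely on k-medoid (k-means) methods, which cannot effectively cluster irregularly distributed data or high-dimensional data streams. This paper proposes a density-based clustering algorithm for data streams built on k-means partitioning. The stream is first divided into partitions, and k-means clustering is run on each partition to produce intermediate clustering results (a set of mean reference points). Density-based clustering is then applied to these mean reference points. Theoretical analysis and experimental results indicate that the algorithm handles irregularly distributed data and high-dimensional data streams effectively and that the approach is feasible.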
Pages: 83-87
Page count: 5