Clustering distributed data streams in peer-to-peer environments

被引:80
作者
Bandyopadhyay, Sanghamitra [1 ]
Giannella, Chris [1 ]
Maulik, Ujjwal [1 ]
Kargupta, Hillol [1 ]
Liu, Kun [1 ]
Datta, Souptik [1 ]
机构
[1] Univ Maryland Baltimore Cty, Dept Comp Sci & Elect Engn, Baltimore, MD 21250 USA
关键词
data mining; data streams; cluster analysis; peer-to-peer;
D O I
10.1016/j.ins.2005.11.007
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
This paper describes a technique for clustering homogeneously distributed data in a peer-to-peer environment like sensor networks. The proposed technique is based on the principles of the K-Means algorithm. It works in a localized asynchronous manner by communicating with the neighboring nodes. The paper offers extensive theoretical analysis of the algorithm that bounds the error in the distributed clustering process compared to the centralized approach that requires downloading all the observed data to a single site. Experimental results show that, in contrast to the case when all the data is transmitted to a central location for application of the conventional clustering algorithm, the communication cost (an important consideration in sensor networks which are typically equipped with limited battery power) of the proposed approach significantly smaller. At the same time, the accuracy of the obtained centroids is high and the number of samples which are incorrectly labeled is also small. (C) 2005 Elsevier Inc. All rights reserved.
引用
收藏
页码:1952 / 1985
页数:34
相关论文
共 54 条
[1]  
Aggarwal C.C., 2003, P 29 INT C VER LARG, P81, DOI DOI 10.1016/B978-012722442-8/50016-1
[2]   A survey on sensor networks [J].
Akyildiz, IF ;
Su, WL ;
Sankarasubramaniam, Y ;
Cayirci, E .
IEEE COMMUNICATIONS MAGAZINE, 2002, 40 (08) :102-114
[3]   Wireless sensor networks: a survey [J].
Akyildiz, IF ;
Su, W ;
Sankarasubramaniam, Y ;
Cayirci, E .
COMPUTER NETWORKS, 2002, 38 (04) :393-422
[4]  
[Anonymous], 2002, J. Mach. Learn. Res
[5]  
[Anonymous], SIGKDD EXPLOR NEWSL
[6]  
[Anonymous], 2001, P 18 INT C MACH LEAR
[7]  
BABCOCK B, 2003, ACM SIGMOD PRINC DAT
[8]  
CERPA A, 2002, P 21 INT ANN JOINT C
[9]   SMOTE: Synthetic minority over-sampling technique [J].
Chawla, Nitesh V. ;
Bowyer, Kevin W. ;
Hall, Lawrence O. ;
Kegelmeyer, W. Philip .
2002, American Association for Artificial Intelligence (16)
[10]   Dynamic clustering for acoustic target tracking in wireless sensor networks [J].
Chen, WP ;
Hou, JC ;
Sha, L .
IEEE TRANSACTIONS ON MOBILE COMPUTING, 2004, 3 (03) :258-271