On Evolutionary Spectral Clustering

被引:96
作者
Chi, Yun [1 ]
Song, Xiaodan [2 ]
Zhou, Dengyong [3 ]
Hino, Koji [1 ]
Tseng, Belle L. [4 ]
机构
[1] NEC Labs Amer, Cupertino, CA 95014 USA
[2] Google Inc, Mountain View, CA 94043 USA
[3] Microsoft Res, Redmond, WA 98052 USA
[4] YAHOO Inc, Santa Clara, CA 95054 USA
关键词
Algorithms; Experimentation; Measurement; Theory;
D O I
10.1145/1631162.1631165
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Evolutionary clustering is an emerging research area essential to important applications such as clustering dynamic Web and blog contents and clustering data streams. In evolutionary clustering, a good clustering result should fit the current data well, while simultaneously not deviate too dramatically from the recent history. To fulfill this dual purpose, a measure of temporal smoothness is integrated in the overall measure of clustering quality. In this article, we propose two frameworks that incorporate temporal smoothness in evolutionary spectral clustering. For both frameworks, we start with intuitions gained from the well-known k-means clustering problem, and then propose and solve corresponding cost functions for the evolutionary spectral clustering problems. Our solutions to the evolutionary spectral clustering problems provide more stable and consistent clustering results that are less sensitive to short-term noises while at the same time are adaptive to long-term cluster drifts. Furthermore, we demonstrate that our methods provide the optimal solutions to the relaxed versions of the corresponding evolutionary k-means clustering problems. Performance experiments over a number of real and synthetic data sets illustrate our evolutionary spectral clustering methods provide more robust clustering results that are not sensitive to noise and can adapt to data drifts.
引用
收藏
页数:30
相关论文
共 34 条
  • [1] AGGARWAL C. C., 2003, P 12 VLDB C
  • [2] [Anonymous], 2006, Advances in Neural Information Processing Systems, DOI DOI 10.1145/1117454.1117459
  • [3] Asur S, 2007, KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, P913
  • [4] Chakrabarti D., 2006, P 12 ACM SIGKDD INT, P554, DOI DOI 10.1145/1150402.1150467
  • [5] CHAPIKAR M., 1997, P 29 STOC C
  • [6] Chatfield C., 2003, The analysis of time series: An introduction
  • [7] Chi Y, 2007, KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, P153
  • [8] Chung F., 1997, Spectral graph theory
  • [9] DE LATHAUWER L., 2000, SIAM J MATRIX ANAL A, V21, P4
  • [10] DHILLON I. S., 2004, P 10 ACM SIGKDD C