Online clustering of parallel data streams

Cited by: 137
Authors
Beringer, Juergen [1]
Huellermeier, Eyke [1]
Affiliations
[1] Otto Von Guericke Univ, Fak Informat, Magdeburg, Germany
Keywords
data mining; clustering; data streams; fuzzy sets
DOI
10.1016/j.datak.2005.05.009
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, the management and processing of so-called data streams has become a topic of active research in several fields of computer science, such as distributed systems, database systems, and data mining. A data stream can roughly be thought of as a transient, continuously growing sequence of time-stamped data. In this paper, we consider the problem of clustering parallel streams of real-valued data, that is, continuously evolving time series. In other words, we are interested in grouping data streams whose evolution over time is similar in a specific sense. To maintain an up-to-date clustering structure, it is necessary to analyze the incoming data in an online manner, tolerating no more than a constant time delay. For this purpose, we develop an efficient online version of the classical K-means clustering algorithm. Our method's efficiency is mainly due to a scalable online transformation of the original data, which allows for fast computation of approximate distances between streams. (c) 2005 Elsevier B.V. All rights reserved.
Pages: 180-204
Number of pages: 25