SWAT: Hierarchical stream summarization in large networks

被引:26
作者
Bulut, A [1 ]
Singh, AK [1 ]
机构
[1] Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106 USA
来源
19TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS | 2003年
关键词
D O I
10.1109/ICDE.2003.1260801
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The problem of statistics and aggregate maintenance over data streams has gained popularity in recent years especially in telecommunications network monitoring, trend-related analysis, web-click streams, stock tickers, and other time-variant data. The amount of data generated in such applications can become too large to store, or if stored too large to scan multiple times. We consider queries over data streams that are biased towards the more recent values. We develop a technique that summarizes a dynamic stream incrementally at multiple resolutions. This approximation can be used to answer point queries, range queries, and inner product queries. Moreover the precision of answers can be changed adoptively by a client. Later we extend the above technique to work in a distributed setting, specifically in a large network where a central site summarizes the stream and clients ask queries. We minimize the message overhead by deciding what and where to replicate by using an adaptive replication scheme. We maintain a hierarchy of approximations that change adaptively based on the query and update rates. We show experimentally that our technique performs better than existing techniques: up to 50 times better in terms of approximation quality, up to four orders of magnitude times better in response time, and up to five times better in terms of message complexity.
引用
收藏
页码:303 / 314
页数:12
相关论文
共 18 条
[1]  
Alon N., 1996, Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, P20, DOI 10.1145/237814.237823
[2]  
[Anonymous], P IEEE PDIS SEP
[3]  
Babcock B., 2002, Proceedings of the Twenty-First ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), P1, DOI DOI 10.1145/543613.543615
[4]  
BULUT A, 2002, SWAT HIERARCHICAL ST
[5]  
Datar M, 2002, SIAM PROC S, P635
[6]  
DOBRA A, 2002, ACM SIGMOD, P61
[7]  
Gehrke J, 2001, SIGMOD REC, V30, P13, DOI 10.1145/376284.375665
[8]  
GILBERT AC, 2001, VLDB J, P79
[9]   Approximating a data stream for querying and estimation: Algorithms and performance evaluation [J].
Guha, S ;
Koudas, N .
18TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2002, :567-576
[10]   Clustering data streams [J].
Guha, S ;
Mishra, N ;
Motwani, R ;
O'Callaghan, L .
41ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2000, :359-366