Learn more, sample less: Control of volume and variance in network measurement

被引:91
作者
Duffield, N [1 ]
Lund, C [1 ]
Thorup, M [1 ]
机构
[1] AT&T Labs Res, Shannon Lab, Florham Pk, NJ 07932 USA
关键词
estimation; flows; internet measurement; sampling; variance reduction;
D O I
10.1109/TIT.2005.846400
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
This paper deals with sampling objects from a large stream. Each object possesses a size, and the aim is to be able to estimate the total size of an arbitrary subset of objects whose composition is not known at the time of sampling. This problem is motivated from network measurements in which the objects are flow records exported by routers and the sizes are the number of packet or bytes reported in the record. Subsets of interest could be flows from a certain customer or flows from a worm attack. This paper introduces threshold sampling as a sampling scheme that optimally controls the expected volume of samples and the variance of estimators over any classification of flows. It provides algorithms for dynamic control of sample volumes and evaluates them on flow data gathered from a commercial Internet Protocol (IP) network. The algorithms are simple to implement and robust to variation in network conditions. The work reported here has been applied in the measurement infrastructure of the commercial IP network. To not have employed sampling would have entailed an order of magnitude greater capital expenditure to accommodate the measurement traffic and its processing.
引用
收藏
页码:1756 / 1775
页数:20
相关论文
共 29 条
[1]  
[Anonymous], ACM SIGCOMM INT MEAS
[2]  
[Anonymous], P ACM SIGCOMM
[3]  
[Anonymous], P ACM SIGCOMM SEP
[4]  
[Anonymous], P ACM SIGCOMM 98 VAN
[5]  
Babcock B., 2002, Proceedings of the Twenty-First ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), P1, DOI DOI 10.1145/543613.543615
[6]   Measurement and analysis of IP network usage and behavior [J].
Cáceres, R ;
Duffield, N ;
Feldmann, A ;
Friedmann, JD ;
Greenberg, A ;
Greer, R ;
Johnson, T ;
Kalmanek, CR ;
Krishnamurthy, B ;
Lavelle, D ;
Mishra, PP ;
Rexford, J ;
Ramakrishnan, KK ;
True, FD ;
van der Merwe, JE .
IEEE COMMUNICATIONS MAGAZINE, 2000, 38 (05) :144-151
[7]  
Claffy K. C., 1993, Computer Communication Review, V23, P194, DOI 10.1145/167954.166256
[8]   A PARAMETERIZABLE METHODOLOGY FOR INTERNET TRAFFIC FLOW PROFILING [J].
CLAFFY, KC ;
BRAUN, HW ;
POLYZOS, GC .
IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 1995, 13 (08) :1481-1494
[9]  
COCHRAN WG, 1987, SAMPLING TECHNIQUES
[10]  
Cormen T. H., 1990, INTRO ALGORITHMS