False Negative Problem of Counting Bloom Filter

被引:45
作者
Guo, Deke [1 ]
Liu, Yunhao [2 ]
Li, Xiangyang [3 ,4 ]
Yang, Panlong [5 ]
机构
[1] Natl Univ Def Technol, Sch Informat Syst & Management, Key Lab Technol C4ISR, Changsha 410073, Hunan, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Comp Sci, Hong Kong, Peoples R China
[3] Hangzhou Dianzi Univ, Inst Comp Applicat Technol, Hangzhou, Zhejiang, Peoples R China
[4] IIT, Dept Comp Sci, Chicago, IL 60616 USA
[5] PLA Univ Sci & Technol, Inst Commun Engn, Nanjing, Jiangsu, Peoples R China
基金
国家高技术研究发展计划(863计划);
关键词
Bloom filter; false negative; multichoice counting Bloom filter;
D O I
10.1109/TKDE.2009.209
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Bloom filter is effective, space-efficient data structure for concisely representing a data set and supporting approximate membership queries. Traditionally, researchers often believe that it is possible that a Bloom filter returns a false positive, but it will never return a false negative under well-behaved operations. By investigating the mainstream variants, however, we observe that a Bloom filter does return false negatives in many scenarios. In this work, we show that the undetectable incorrect deletion of false positive items and detectable incorrect deletion of multiaddress items are two general causes of false negative in a Bloom filter. We then measure the potential and exposed false negatives theoretically and practically. Inspired by the fact that the potential false negatives are usually not fully exposed, we propose a novel Bloom filter scheme, which increases the ratio of bits set to a value larger than one without decreasing the ratio of bits set to zero. Mathematical analysis and comprehensive experiments show that this design can reduce the number of exposed false negatives as well as decrease the likelihood of false positives. To the best of our knowledge, this is the first work dealing with both the false positive and false negative problems of Bloom filter systematically when supporting standard usages of item insertion, query, and deletion operations.
引用
收藏
页码:651 / 664
页数:14
相关论文
共 28 条
[1]   Scalable Bloom Filters [J].
Almeida, Paulo Sergio ;
Baquero, Carlos ;
Preguica, Nuno ;
Hutchison, David .
INFORMATION PROCESSING LETTERS, 2007, 101 (06) :255-261
[2]  
BENOIT D, 2006, P ACM C EM NETW EXP
[3]   SPACE/TIME TRADE/OFFS IN HASH CODING WITH ALLOWABLE ERRORS [J].
BLOOM, BH .
COMMUNICATIONS OF THE ACM, 1970, 13 (07) :422-&
[4]  
Bonomi F., 2006, P ACM SIGCOMM, P315
[5]  
Broder Andrei, 2002, Internet mathematics, P636, DOI DOI 10.1080/15427951.2004.10129096
[6]  
Chazelle Bernard, 2004, P 15 ANN ACM SIAM S, P30
[7]  
Cohen S., 2003, P ACM INT C MAN DAT, P241, DOI [10.1145/872757.872787, DOI 10.1145/872757.872787]
[8]  
DENG F, 2006, P 25 ACM SIGMOD
[9]   Summary cache: A scalable wide-area Web cache sharing protocol [J].
Fan, L ;
Cao, P ;
Almeida, J ;
Broder, AZ .
IEEE-ACM TRANSACTIONS ON NETWORKING, 2000, 8 (03) :281-293
[10]  
FORSGREN D, 2010, OBJECTIVE END TO END