Boosting and Differential Privacy

Cited by: 503
Authors
Dwork, Cynthia [1]
Rothblum, Guy N. [2,3]
Vadhan, Salil [4,5]
Affiliations
[1] Microsoft Res, 1065 La Ave, Mountain View, CA 94043 USA
[2] Princeton Univ, Ctr Computat Intractabil, Princeton, NJ 08544 USA
[3] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
[4] Harvard Univ, Sch Engn & Appl Sci, Cambridge, MA 02138 USA
[5] Harvard Univ, Ctr Res Computat & Soc, Cambridge, MA 02138 USA
Source
2010 IEEE 51st Annual Symposium on Foundations of Computer Science | 2010
Funding
US National Science Foundation
DOI
10.1109/FOCS.2010.12
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Boosting is a general method for improving the accuracy of learning algorithms. We use boosting to construct improved privacy-preserving synopses of an input database. These are data structures that yield, for a given set Q of queries over an input database, reasonably accurate estimates of the responses to every query in Q, even when the number of queries is much larger than the number of rows in the database. Given a base synopsis generator that takes a distribution on Q and produces a "weak" synopsis that yields "good" answers for a majority of the weight in Q, our Boosting for Queries algorithm obtains a synopsis that is good for all of Q. We ensure privacy for the rows of the database, but the boosting is performed on the queries. We also provide the first synopsis generators for arbitrary sets of arbitrary low-sensitivity queries, i.e., queries whose answers do not vary much under the addition or deletion of a single row. In the execution of our algorithm, certain tasks, each incurring some privacy loss, are performed many times. To analyze the cumulative privacy loss, we obtain an O(ε^2) bound on the expected privacy loss from a single ε-differentially private mechanism. Combining this with evolution of confidence arguments from the literature, we get stronger bounds on the expected cumulative privacy loss due to multiple mechanisms, each of which provides ε-differential privacy or one of its relaxations, and each of which operates on (potentially) different, adaptively chosen, databases.
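Illustration (not part of the original record): the abstract's claim that expected privacy loss scales as O(ε^2) while worst-case loss scales as ε can be checked numerically on a simple mechanism. The sketch below assumes binary randomized response, a standard ε-differentially private mechanism chosen here only as an example; it is not the paper's construction, and all function names are hypothetical. Expected privacy loss is taken to be the KL divergence between the output distributions on two neighbouring inputs.

```python
import math


def randomized_response_probs(eps):
    """Return (P[report truth], P[report lie]) for eps-DP binary randomized response."""
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return p_truth, 1.0 - p_truth


def worst_case_privacy_loss(eps):
    """Largest log-ratio of output probabilities on neighbouring inputs."""
    p, q = randomized_response_probs(eps)
    return math.log(p / q)


def expected_privacy_loss(eps):
    """KL divergence between the output distributions on inputs 1 and 0."""
    p, q = randomized_response_probs(eps)
    # On input 1 the outputs (1, 0) have probabilities (p, q); on input 0 they are swapped.
    return p * math.log(p / q) + q * math.log(q / p)


for eps in (0.01, 0.1, 0.5, 1.0):
    # Columns: eps, worst-case loss, expected loss, and the reference value eps^2 / 2.
    print(eps, worst_case_privacy_loss(eps), expected_privacy_loss(eps), eps ** 2 / 2)
```

For this particular mechanism the expected loss works out to ε·tanh(ε/2), which never exceeds ε^2/2, so it shrinks quadratically as ε decreases while the worst-case loss shrinks only linearly.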
Pages: 51-60
Page count: 10