Privacy-preserving clustering with distributed EM mixture modeling

Cited by: 106
Authors
Lin, XD
Clifton, C
Zhu, M
Affiliations
[1] Univ Cincinnati, Dept Math Sci, Cincinnati, OH 45221 USA
[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[3] Purdue Univ, Dept Stat, W Lafayette, IN 47907 USA
Keywords
privacy; security; clustering
DOI
10.1007/s10115-004-0148-7
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
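Discipline classification codes
081104 [Pattern Recognition and Intelligent Systems]; 0812 [Computer Science and Technology]; 0835 [Software Engineering]; 1405 [Intelligent Science and Technology]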
Abstract
Privacy and security considerations can prevent sharing of data, derailing data mining projects. Distributed knowledge discovery can alleviate this problem. We present a technique that uses EM mixture modeling to perform clustering on distributed data. This method controls data sharing, preventing disclosure of individual data items or any results that can be traced to an individual site.
Pages: 68-81
Page count: 14