Dare to share: Protecting sensitive knowledge with data sanitization

Cited by: 75
Authors
Amiri, Ali [1]
Affiliation
[1] Oklahoma State Univ, Dept MSIS, Coll Business, Stillwater, OK 74078 USA
Keywords
data mining; sensitive knowledge protection; data sanitization; data utility
DOI
10.1016/j.dss.2006.08.007
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data sanitization is a process used to promote the sharing of transactional databases among organizations while alleviating the concerns of individual organizations by preserving the confidentiality of their sensitive knowledge, which takes the form of sensitive association rules. It hides the frequent itemsets corresponding to the sensitive association rules by modifying the sensitive transactions that contain those itemsets. The process is guided by the need to minimize the impact on the data utility of the sanitized database, so that as much as possible of the non-sensitive knowledge, in the form of non-sensitive association rules, can still be mined from the sanitized database. We propose three heuristic approaches for the sanitization problem. Results from extensive tests conducted on publicly available real datasets indicate that the approaches are effective and outperform a previous approach in terms of data utility, at the expense of computational speed. The proposed approaches also sanitize the databases with great data accuracy, resulting in little distortion of the released databases. We recommend that the database owner sanitize the database using the third proposed approach, the hybrid approach. (c) 2006 Elsevier B.V. All rights reserved.
Pages: 181-191
Number of pages: 11
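
The hiding step described in the abstract (lowering the support of a sensitive itemset by modifying the transactions that contain it) can be illustrated with a minimal sketch. This is not any of the paper's three heuristics, which the abstract does not specify. It is a naive greedy stand-in, and the names `support` and `sanitize`, the absolute-count threshold `min_support`, and the victim-selection rule are all assumptions made for illustration.

```python
from typing import Iterable, List, Set


def support(db: Iterable[Set[str]], itemset: Iterable[str]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    target = set(itemset)
    return sum(1 for t in db if target <= t)


def sanitize(
    transactions: Iterable[Iterable[str]],
    sensitive_itemsets: Iterable[Iterable[str]],
    min_support: int,
) -> List[Set[str]]:
    """Return a copy of the database in which each sensitive itemset
    has support below `min_support` (an absolute count, assumed here).

    Naive greedy rule (an assumption, not the paper's method): for each
    sensitive itemset, remove one of its items from just enough
    supporting transactions, shortest transactions first.
    """
    db = [set(t) for t in transactions]  # work on a copy
    for itemset in sensitive_itemsets:
        target = set(itemset)
        supporting = [t for t in db if target <= t]
        # Number of supporting transactions that must stop supporting it.
        excess = len(supporting) - (min_support - 1)
        if excess <= 0:
            continue  # already infrequent, nothing to hide
        supporting.sort(key=len)  # hypothetical tie-break rule
        victim = min(target)      # deterministic, arbitrary item choice
        for t in supporting[:excess]:
            t.discard(victim)     # modifies the set held in `db`
    return db
```

Removing items can only lower supports, so an itemset hidden earlier in the loop stays hidden. The sketch makes no attempt to protect non-sensitive itemsets, which is the data-utility objective that the proposed approaches are designed around.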