Semantic Disclosure Control: semantics meets data privacy

Cited by: 7
Authors
Batet, Montserrat [1]
Sanchez, David [2]
Affiliations
[1] Univ Oberta Catalunya, Internet Interdisciplinary Inst IN3, Barcelona, Spain
[2] Univ Rovira & Virgili, Dept Comp Sci & Math, CYBERCAT Ctr Cybersecur Res Catalonia, UNESCO Chair Data Privacy, Tarragona, Spain
Funding
European Union Horizon 2020
Keywords
Semantics; Knowledge; Privacy; Personal data protection; ANONYMIZATION; REDACTION; PROTECT;
DOI
10.1108/OIR-03-2017-0090
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Purpose - To overcome the limitations of purely statistical approaches to data protection, this paper proposes Semantic Disclosure Control (SeDC): an inherently semantic privacy protection paradigm that, by relying on state-of-the-art semantic technologies, rethinks privacy and data protection in terms of the meaning of the data.
Design/methodology/approach - The need for data protection mechanisms able to manage data from a semantic perspective is discussed, and the limitations of statistical approaches are highlighted. Then, SeDC is presented by detailing how it can be enforced to detect and protect sensitive data.
Findings - So far, data privacy has been tackled from a statistical perspective; that is, available solutions focus only on the distribution of the data values. This contrasts with the semantic way in which humans understand and manage (sensitive) data. As a result, current solutions present limitations both in preventing disclosure risks and in preserving the semantics (utility) of the protected data.
Practical implications - SeDC captures more general, realistic and intuitive notions of privacy and information disclosure than purely statistical methods. As a result, it is better suited to protecting heterogeneous and unstructured data, which are the most common in current data release scenarios. Moreover, SeDC preserves the semantics of the protected data better than statistical approaches, which is crucial when using protected data for research.
Social implications - Individuals are increasingly aware of the privacy threats that the uncontrolled collection and exploitation of their personal data may produce. In this respect, SeDC offers an intuitive notion of privacy protection that users can easily understand. It also naturally captures the (non-quantitative) privacy notions stated in current legislation on personal data protection.
Originality/value - In contrast to statistical approaches to data protection, SeDC assesses disclosure risks and enforces data protection from a semantic perspective. As a result, it offers more general, intuitive, robust and utility-preserving protection of data, regardless of their type and structure.
Pages: 290-303
Number of pages: 14