Evaluating re-identification risks with respect to the HIPAA privacy rule

Cited by: 176
Authors
Benitez, Kathleen [1]
Malin, Bradley [1,2]
Affiliations
[1] Vanderbilt Univ, Sch Med, Dept Biomed Informat, Nashville, TN 37203 USA
[2] Vanderbilt Univ, Sch Engn, Dept Elect Engn & Comp Sci, Nashville, TN 37203 USA
Keywords
FRAMEWORK;
DOI
10.1136/jamia.2009.000026
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Objective: Many healthcare organizations follow data protection policies that specify which patient identifiers must be suppressed to share "de-identified" records. Such policies, however, are often applied without knowledge of the risk of "re-identification". The goals of this work are: (1) to estimate re-identification risk for data sharing policies of the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule; and (2) to evaluate the risk of a specific re-identification attack using voter registration lists.

Measurements: We define several risk metrics: (1) expected number of re-identifications; (2) estimated proportion of a population in a group of size g or less; and (3) monetary cost per re-identification. For each US state, we estimate the risk posed to hypothetical datasets, protected by the HIPAA Safe Harbor and Limited Dataset policies, by an attacker with full knowledge of patient identifiers and by an attacker with limited knowledge in the form of voter registries.

Results: The percentage of a state's population estimated to be vulnerable to unique re-identification (ie, g=1) when protected via Safe Harbor and Limited Datasets ranges from 0.01% to 0.25% and 10% to 60%, respectively. In the voter attack, this number drops for many states, and for some states is 0%, due to the variable availability of voter registries in the real world. We also find that re-identification cost ranges from $0 to $17,000, further confirming risk variability.

Conclusions: This work illustrates that blanket protection policies, such as Safe Harbor, leave different organizations vulnerable to re-identification at different rates. It provides justification for locally performed re-identification risk estimates prior to sharing data.
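The second risk metric in the abstract, the proportion of a population falling in a group of size g or less, can be illustrated with a minimal Python sketch. This is not the authors' estimation procedure, which works from state-level population statistics; it only counts groups in an explicit list of records. The function name `proportion_in_small_groups`, the choice of quasi-identifier fields, and the toy population are all hypothetical.

```python
from collections import Counter


def proportion_in_small_groups(records, g):
    """Return the fraction of records whose quasi-identifier combination
    is shared by at most g records (g=1 means unique individuals)."""
    if not records:
        return 0.0
    # Size of each group of records with identical quasi-identifier values.
    group_sizes = Counter(records)
    # Count every record that sits in a group of size g or less.
    at_risk = sum(size for size in group_sizes.values() if size <= g)
    return at_risk / len(records)


# Hypothetical toy population; each tuple is an assumed quasi-identifier
# combination (year of birth, gender, 3-digit ZIP code).
population = [
    (1950, "F", "372"),
    (1950, "F", "372"),
    (1950, "M", "372"),
    (1961, "F", "370"),
    (1961, "F", "370"),
    (1961, "F", "370"),
    (1975, "M", "371"),
    (1975, "M", "371"),
]

unique_share = proportion_in_small_groups(population, g=1)
pair_share = proportion_in_small_groups(population, g=2)
```

In this toy population one of the eight records is unique on the chosen fields, so `unique_share` is 0.125; five records sit in groups of size two or less, so `pair_share` is 0.625.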
Pages: 169-177
Number of pages: 9
Related papers
32 in total
[1]   Securing electronic health records without impeding the flow of information [J].
Agrawal, Rakesh ;
Johnson, Christopher .
INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2007, 76 (5-6) :471-479
[2]  
Alexander Kim., 2004, Voter privacy in the digital age
[3]  
[Anonymous], AM FACTFINDER
[4]   Stimulating the Adoption of Health Information Technology. [J].
Blumenthal, David .
NEW ENGLAND JOURNAL OF MEDICINE, 2009, 360 (15) :1477-1479
[5]  
Department of Health and Human Services, 2002, Presidential Determination
[6]   Evaluating Predictors of Geographic Area Population Size Cut-offs to Manage Re-identification Risk [J].
El Emam, Khaled ;
Brown, Ann ;
AbdelMalik, Philip .
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2009, 16 (02) :256-266
[7]   Anonymizing classification data for privacy preservation [J].
Fung, Benjamin C. M. ;
Wang, Ke ;
Yu, Philip S. .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2007, 19 (05) :711-725
[8]   k-Anonymization with Minimal Loss of Information [J].
Gionis, Aristides ;
Tassa, Tamir .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (02) :206-219
[9]  
GOLAB A, 2007, CHICAGO SUN TIM 0123, P13
[10]  
Golle P., 2006, Proceedings of the 5th ACM workshop on Privacy in electronic society, P77, DOI 10.1145/1179601.1179615