How (not) to protect genomic data privacy in a distributed network: using trail re-identification to evaluate and design anonymity protection systems

Cited by: 142
Authors
Malin, B [1]
Sweeney, L [1]
Affiliation
[1] Carnegie Mellon Univ, Sch Comp Sci, Data Privacy Lab, Pittsburgh, PA 15213 USA
Funding
Andrew W. Mellon Foundation (United States)
Keywords
privacy; anonymity; re-identification; genomics; DNA databases
DOI
10.1016/j.jbi.2004.04.005
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The increasing integration of patient-specific genomic data into clinical practice and research raises serious privacy concerns. Various systems have been proposed that protect privacy by removing or encrypting explicitly identifying information, such as name or social security number, into pseudonyms. Though these systems claim to protect identity from being disclosed, they lack formal proofs. In this paper, we study the erosion of privacy when genomic data, either pseudonymous or data believed to be anonymous, are released into a distributed healthcare environment. Several algorithms are introduced, collectively called RE-Identification of Data In Trails (REIDIT), which link genomic data to named individuals in publicly available records by leveraging unique features in patient-location visit patterns. Algorithmic proofs of re-identification are developed and we demonstrate, with experiments on real-world data, that susceptibility to re-identification is neither trivial nor the result of bizarre isolated occurrences. We propose that such techniques can be applied as system tests of privacy protection capabilities. © 2004 Elsevier Inc. All rights reserved.
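The linkage idea in the abstract can be illustrated with a small sketch. This is not the authors' REIDIT implementation; it is a simplified illustration that assumes each location releases two separate lists (de-identified DNA records and named records), that both lists are complete, and that a "trail" is the set of locations at which a record appears. The function names, location labels, and sample data below are all hypothetical.

```python
from collections import defaultdict


def build_trails(releases, locations):
    """Map each record key to a tuple of 0/1 flags, one per location.

    `releases` maps a location name to the set of record keys it released.
    """
    trails = defaultdict(lambda: [0] * len(locations))
    for idx, loc in enumerate(locations):
        for key in releases.get(loc, ()):
            trails[key][idx] = 1
    return {key: tuple(flags) for key, flags in trails.items()}


def group_by_trail(trails):
    """Invert a key -> trail mapping into trail -> list of keys."""
    groups = defaultdict(list)
    for key, trail in trails.items():
        groups[trail].append(key)
    return groups


def link_unique_trails(dna_trails, name_trails):
    """Link a DNA record to a name when both share a trail that is
    unique in each table (illustrative assumption: complete releases)."""
    dna_groups = group_by_trail(dna_trails)
    name_groups = group_by_trail(name_trails)
    links = {}
    for trail, dna_keys in dna_groups.items():
        names = name_groups.get(trail, [])
        if len(dna_keys) == 1 and len(names) == 1:
            links[dna_keys[0]] = names[0]
    return links


# Hypothetical data: three hospitals, four patients.
locations = ["H1", "H2", "H3"]
dna_releases = {
    "H1": {"dna_a", "dna_b"},
    "H2": {"dna_a"},
    "H3": {"dna_b", "dna_c", "dna_d"},
}
name_releases = {
    "H1": {"alice", "bob"},
    "H2": {"alice"},
    "H3": {"bob", "carol", "dave"},
}

links = link_unique_trails(
    build_trails(dna_releases, locations),
    build_trails(name_releases, locations),
)
print(links)
```

In this toy data, `dna_a` and `dna_b` have visit patterns that no other record shares, so they link to `alice` and `bob`. The records `dna_c` and `dna_d` share the same trail (H3 only), as do `carol` and `dave`, so they stay unlinked. A check of this kind, run against a proposed release, is one way to read the abstract's suggestion of using re-identification as a system test.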
Pages: 179-192
Page count: 14