Evaluating Predictors of Geographic Area Population Size Cut-offs to Manage Re-identification Risk

Cited by: 41
Authors
El Emam, Khaled [1, 2]
Brown, Ann
AbdelMalik, Philip [3]
Affiliations
[1] Childrens Hosp Eastern Ontario, Res Inst, Ottawa, ON K1H 8L1, Canada
[2] Univ Ottawa, Fac Med, Ottawa, ON, Canada
[3] Publ Hlth Agcy Canada, GIS Infrastruct, Off Publ Hlth Practice, Ottawa, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
INFORMATION-SYSTEMS; PUBLIC-HEALTH; CONSENT; DISCLOSURE; PATIENT; BIAS; CONFIDENTIALITY; IDENTIFICATION; COLLECTION; MICRODATA;
DOI
10.1197/jamia.M2902
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Objective: In public health and health services research, the inclusion of geographic information in data sets is critical. Because of concerns over the re-identification of patients, data from small geographic areas are either suppressed or the geographic areas are aggregated into larger ones. Our objective is to estimate the population size cut-off at which a geographic area is sufficiently large so that no data suppression or further aggregation is necessary.
Design: The 2001 Canadian census data were used to conduct a simulation to model the relationship between geographic area population size and uniqueness for some common demographic variables. Cut-offs were computed for geographic area population size, and prediction models were developed to estimate the appropriate cut-offs.
Measurements: Re-identification risk was measured using uniqueness. Geographic area population size cut-offs were estimated using the maximum number of possible values in the data set and a traditional entropy measure.
Results: The model that predicted population cut-offs using the maximum number of possible values in the data set had R² values around 0.9, and relative error of prediction less than 0.02 across all regions of Canada. The models were then applied to assess the appropriate geographic area size for the prescription records provided by retail and hospital pharmacies to commercial research and analysis firms.
Conclusions: To manage re-identification risk, the prediction models can be used by public health professionals, health researchers, and research ethics boards to decide when the geographic area population size is sufficiently large.
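The three quantities named in the abstract (uniqueness, the maximum number of possible values, and an entropy measure) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy records, and the choice to count distinct observed values per variable are all assumptions made for illustration, since the abstract does not give the exact definitions.

```python
from collections import Counter
from math import log2, prod


def uniqueness(records):
    """Fraction of records whose combination of values occurs exactly once."""
    counts = Counter(records)
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / len(records)


def max_combinations(records):
    """Product of the number of distinct values observed for each variable.

    Assumption: stands in for the "maximum number of possible values";
    the paper's exact definition may differ.
    """
    return prod(len(set(column)) for column in zip(*records))


def entropy_bits(records):
    """Shannon entropy (in bits) of the observed value combinations."""
    counts = Counter(records)
    n = len(records)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Hypothetical geographic area: (age group, sex, marital status) per person.
area = [
    ("30-39", "F", "married"),
    ("30-39", "F", "married"),
    ("40-49", "M", "single"),
    ("50-59", "F", "widowed"),
]

print(uniqueness(area))        # share of people unique on these variables
print(max_combinations(area))  # size of the value space
print(entropy_bits(area))      # spread of the observed combinations
```

In this toy area two of the four people are unique on the chosen variables. A cut-off in the sense of the abstract would be the population size above which such uniqueness falls to an acceptable level.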
Pages: 256-266
Number of pages: 11