Threshold protocol for the exchange of confidential medical data

Cited by: 4
Author
Berman J.J. [1]
Affiliation
[1] Cancer Diagnosis Program, National Cancer Institute, Bethesda, MD
Keywords
Original Text; Unified Medical Language System; Confidential Information; Threshold Algorithm; Negotiation Protocol
DOI
10.1186/1471-2288-2-12
Abstract
Background: Medical researchers often need to share clinical data without violating patient confidentiality. Threshold cryptographic protocols divide messages into multiple pieces, with no single piece containing enough information to reconstruct the original message. The author describes and implements a novel threshold protocol that can be used to search, annotate or transform confidential data without breaching patient confidentiality.

Methods: The basic threshold protocol is:
1) Text is divided into short phrases;
2) Each phrase is converted by a one-way hash algorithm into a seemingly random set of characters;
3) Threshold Piece 1 is composed of the list of all phrases, with each phrase followed by its one-way hash;
4) Threshold Piece 2 is composed of the text with all phrases replaced by their one-way hash values, and with high-frequency words preserved.
Neither Piece 1 nor Piece 2 contains information linking patients to their records. The original text can be reconstructed from Piece 1 and Piece 2 together.

Results: The threshold algorithm produces two files (threshold pieces). In typical usage, Piece 2 is held by the data owner, and Piece 1 is freely distributed. Piece 1 can be annotated and returned to the owner of the original data to enhance the complete data set. Collections of Piece 1 files can be merged and distributed without identifying patient records. Variations of the threshold protocol are described. The author's Perl implementation is freely available.

Conclusions: Threshold files are safe in the sense that they are de-identified and can be used for research purposes. The threshold protocol is particularly useful when the receiver of the threshold file needs to obtain certain concepts or data types found in the original data, but does not need to fully understand the original data set.
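
The four steps of the basic protocol can be illustrated with a minimal Python sketch. This is not the author's Perl implementation. The stop-word list, the use of high-frequency words as phrase boundaries, the choice of SHA-256 truncated to 12 hex characters, and the function names `phrase_hash`, `make_threshold_pieces` and `reconstruct` are all assumptions made for illustration. The sketch also lowercases the text and drops punctuation, so it reconstructs a normalized form of the input rather than the exact original.

```python
import hashlib
import re

# Hypothetical stop-word list; the abstract only says "high-frequency words".
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "was", "with"}


def phrase_hash(phrase: str) -> str:
    """One-way hash of a phrase (assumed here: SHA-256, truncated to 12 hex chars)."""
    return hashlib.sha256(phrase.encode("utf-8")).hexdigest()[:12]


def make_threshold_pieces(text: str):
    """Split text into two threshold pieces.

    piece1: dict mapping each phrase to its one-way hash.
    piece2: list of tokens in text order, with every phrase replaced by
            its hash and stop words kept as-is.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    piece1, piece2, current = {}, [], []

    def flush():
        # Close the phrase collected so far, if any, and record it in both pieces.
        if current:
            phrase = " ".join(current)
            digest = phrase_hash(phrase)
            piece1[phrase] = digest
            piece2.append(digest)
            current.clear()

    for word in words:
        if word in STOP_WORDS:
            flush()               # a stop word ends the current phrase
            piece2.append(word)   # and is preserved in Piece 2
        else:
            current.append(word)
    flush()
    return piece1, piece2


def reconstruct(piece1, piece2) -> str:
    """Rebuild the normalized text by looking up each hash from Piece 2 in Piece 1."""
    lookup = {digest: phrase for phrase, digest in piece1.items()}
    return " ".join(lookup.get(token, token) for token in piece2)
```

For example, `make_threshold_pieces("The patient was diagnosed with adenocarcinoma of the colon.")` returns a Piece 1 that maps the four phrases to their hashes and a Piece 2 in which only the stop words remain readable. Passing both pieces to `reconstruct` returns the normalized sentence. In this sketch Piece 1 keeps the phrases in text order, so a distributed copy would presumably be sorted or shuffled first, for example with `sorted(piece1.items())`.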
Pages: 1 / 6
Number of pages: 5