An empirical comparison of record linkage procedures

Cited by: 78
Authors
Gomatam, S
Carter, R
Ariet, M
Mitchell, G
Affiliations
[1] Univ S Florida, Tampa, FL 33620 USA
[2] Univ Florida, Div Biostat, Gainesville, FL USA
[3] Univ Florida, Div Comp Sci, Dept Med, Gainesville, FL USA
[4] USF, Florida Policy Exchange Ctr Aging, State Data Ctr Aging, Tampa, FL USA
Keywords
exact matching; hierarchical linkage strategies; document linkage; probabilistic matching; AUTOMATCH; stepwise deterministic linkage;
DOI
10.1002/sim.1147
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
We consider the problem of record linkage in the situation where we have only non-unique identifiers, like names, sex, race etc., as common identifiers in databases to be linked. For such situations much work on probabilistic methods of record linkage can be found in the statistical literature. However, although many groups undoubtedly still use deterministic procedures, not much literature is available on deterministic strategies. Furthermore, there appears to exist almost no documentation on the comparison of results for the two strategies. In this work we compare a stepwise deterministic linkage strategy with a probabilistic strategy, as implemented in AUTOMATCH, for a situation in which the truth is known. The comparison was carried out on a linkage between medical records from the Regional Perinatal Intensive Care Centers database and educational records from the Florida Department of Education. Social security numbers, available in both databases, were used to decide the true status of each record pair after matching. Match rates and error rates for the two strategies are compared and a discussion of their similarities and differences, strengths and weaknesses is presented. Copyright (C) 2002 John Wiley & Sons, Ltd.
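The evaluation design described in the abstract (link on non-unique identifiers, then use social security numbers only to judge the result) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the field names, the three matching passes in `PASSES`, the toy records, and the definitions of `match_rate` and `false_link_rate` are hypothetical and are not taken from the paper or from AUTOMATCH.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical matching passes, strictest first. Each pass lists the
# non-unique identifier fields that must agree exactly.
PASSES = [
    ("last_name", "first_name", "dob", "sex"),
    ("last_name", "dob", "sex"),
    ("first_name", "dob", "sex"),
]


def stepwise_link(file_a: List[Record], file_b: List[Record],
                  passes=PASSES) -> List[Tuple[int, int]]:
    """Link records pass by pass; linked records leave the pool."""
    links: List[Tuple[int, int]] = []
    used_a, used_b = set(), set()
    for keys in passes:
        # Index the still-unlinked records of file B by this pass's key.
        index: Dict[tuple, List[int]] = {}
        for j, rec in enumerate(file_b):
            if j not in used_b:
                index.setdefault(tuple(rec[k] for k in keys), []).append(j)
        for i, rec in enumerate(file_a):
            if i in used_a:
                continue
            key = tuple(rec[k] for k in keys)
            candidates = [j for j in index.get(key, []) if j not in used_b]
            # Accept only an unambiguous (single-candidate) agreement.
            if len(candidates) == 1:
                j = candidates[0]
                links.append((i, j))
                used_a.add(i)
                used_b.add(j)
    return links


def evaluate(links, file_a, file_b, truth_key="ssn"):
    """Score links against the truth identifier (assumed definitions)."""
    true_pairs = {(i, j)
                  for i, a in enumerate(file_a)
                  for j, b in enumerate(file_b)
                  if a[truth_key] == b[truth_key]}
    linked = set(links)
    correct = len(linked & true_pairs)
    return {
        # Share of true pairs that were found.
        "match_rate": correct / len(true_pairs) if true_pairs else 0.0,
        # Share of accepted links that are wrong.
        "false_link_rate": (len(linked) - correct) / len(linked) if linked else 0.0,
    }


# Toy data: the ssn field is never used for linking, only for scoring.
file_a = [
    {"last_name": "SMITH", "first_name": "ANN", "dob": "1990-01-02", "sex": "F", "ssn": "111"},
    {"last_name": "JONES", "first_name": "BOB", "dob": "1991-03-04", "sex": "M", "ssn": "222"},
    {"last_name": "LEE", "first_name": "CAL", "dob": "1992-05-06", "sex": "M", "ssn": "333"},
]
file_b = [
    {"last_name": "SMITH", "first_name": "ANN", "dob": "1990-01-02", "sex": "F", "ssn": "111"},
    {"last_name": "JONES", "first_name": "ROBERT", "dob": "1991-03-04", "sex": "M", "ssn": "222"},
    {"last_name": "LEA", "first_name": "CAL", "dob": "1992-05-06", "sex": "M", "ssn": "999"},
]

links = stepwise_link(file_a, file_b)
scores = evaluate(links, file_a, file_b)
```

In this toy run the looser passes recover a true pair that the strictest pass misses, but they also accept one false link. That is the kind of trade-off between match rate and error rate that such a comparison has to quantify. A probabilistic strategy would instead score each candidate pair by weighted field agreement and compare the score with thresholds; that part is not sketched here.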
Pages: 1485-1496
Number of pages: 12