Linking temporal records

Cited by: 4
Authors
Li, Pei [1]
Dong, Xin Luna [2]
Maurino, Andrea [1]
Srivastava, Divesh [2]
Affiliations
[1] Univ Milano Bicocca, Dept Informat Syst & Commun, I-20126 Milan, Italy
[2] AT&T Labs Res, Data Management Dept, Florham Pk, NJ 07932 USA
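Keywords
temporal data; record linkage; data integration; databases
DOI
10.1007/s11704-012-2002-5
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
080201 [Mechanical manufacturing and automation]
Abstract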
Many data sets contain temporal records which span a long period of time; each record is associated with a time stamp and describes some aspects of a real-world entity at a particular time (e.g., author information in DBLP). In such cases, we often wish to identify records that describe the same entity over time and so be able to perform interesting longitudinal data analysis. However, existing record linkage techniques ignore temporal information and fall short for temporal data. This article studies linking temporal records. First, we apply time decay to capture the effect of elapsed time on entity value evolution. Second, instead of comparing each pair of records locally, we propose clustering methods that consider the time order of the records and make global decisions. Experimental results show that our algorithms significantly outperform traditional linkage methods on various temporal data sets.
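The time-decay idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the exponential decay form, the per-attribute half-lives, the record fields, and the names `decayed_weight` and `record_similarity` are all assumptions chosen for illustration. The sketch only shows the general effect, namely that an attribute likely to change over time (such as affiliation) should count for less when two records are far apart in time.

```python
import math


def decayed_weight(base_weight: float, gap_years: float, half_life_years: float) -> float:
    """Shrink an attribute's weight as the time gap between two records grows.

    Assumption: exponential decay with a per-attribute half-life.
    """
    return base_weight * math.exp(-math.log(2) * gap_years / half_life_years)


def record_similarity(r1: dict, r2: dict, weights: dict, half_lives: dict) -> float:
    """Weighted average of exact-match scores, with time-decayed weights."""
    gap = abs(r1["year"] - r2["year"])
    numerator = 0.0
    denominator = 0.0
    for attr, base_weight in weights.items():
        w = decayed_weight(base_weight, gap, half_lives[attr])
        match = 1.0 if r1[attr] == r2[attr] else 0.0
        numerator += w * match
        denominator += w
    return numerator / denominator if denominator else 0.0


# Hypothetical settings: names rarely change, affiliations change often.
weights = {"name": 0.5, "affiliation": 0.5}
half_lives = {"name": 50.0, "affiliation": 2.0}

# Hypothetical records with the same name but different affiliations.
early = {"name": "Xin Dong", "affiliation": "University A", "year": 2004}
same_year_other = {"name": "Xin Dong", "affiliation": "Company B", "year": 2004}
later_other = {"name": "Xin Dong", "affiliation": "Company B", "year": 2009}

same_year = record_similarity(early, same_year_other, weights, half_lives)
five_years = record_similarity(early, later_other, weights, half_lives)
```

Under these assumed settings, the affiliation mismatch is penalized less for the pair five years apart than for the pair from the same year, so `five_years` comes out higher than `same_year`. A linkage method without time information would score both pairs identically.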
Pages: 293-312
Page count: 20