Similarity flooding: A versatile graph matching algorithm and its application to schema matching

Cited by: 515
Authors
Melnik, S [1]
Garcia-Molina, H [1]
Rahm, E [1]
Affiliation
[1] Stanford Univ, Stanford, CA 94305 USA
Source
18TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS | 2002
DOI
10.1109/ICDE.2002.994702
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Matching elements of two data schemas or two data instances plays a key role in data warehousing, e-business, or even biochemical applications. In this paper we present a matching algorithm based on a fixpoint computation that is usable across different scenarios. The algorithm takes two graphs (schemas, catalogs, or other data structures) as input, and produces as output a mapping between corresponding nodes of the graphs. Depending on the matching goal, a subset of the mapping is chosen using filters. After our algorithm runs, we expect a human to check and, if necessary, adjust the results. In fact, we evaluate the 'accuracy' of the algorithm by counting the number of needed adjustments. We conducted a user study in which our accuracy metric was used to estimate the labor savings that users could obtain by using our algorithm to produce an initial matching. Finally, we illustrate how our matching algorithm is deployed as one of several high-level operators in an implemented testbed for managing information models and mappings.
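The abstract describes the pipeline only at a high level: two graphs go in, a fixpoint computation produces a node-to-node mapping, and a filter selects a subset of it. The Python sketch below is one minimal way such a computation could look, and it should not be read as the authors' implementation. Everything concrete in it is an assumption made for illustration: the function names `similarity_flood` and `best_match_per_node`, the (source, label, target) edge triples, the uniform starting similarity, the equal-share propagation weights, the normalization by the maximum value, and the best-match-per-node filter.

```python
from itertools import product


def similarity_flood(edges_a, edges_b, max_iter=100, eps=1e-4):
    """Toy fixpoint propagation over node pairs of two labeled graphs.

    edges_a, edges_b: iterables of (source, label, target) triples.
    Returns a dict mapping (node_a, node_b) to a similarity in [0, 1].
    All design choices here are illustrative assumptions.
    """
    edges_a, edges_b = list(edges_a), list(edges_b)
    nodes_a = sorted({n for s, _, t in edges_a for n in (s, t)})
    nodes_b = sorted({n for s, _, t in edges_b for n in (s, t)})

    # Two node pairs are neighbors when both graphs contain an edge with the
    # same label between the corresponding nodes. Similarity flows both ways.
    neighbors = {pair: set() for pair in product(nodes_a, nodes_b)}
    for (s1, l1, t1), (s2, l2, t2) in product(edges_a, edges_b):
        if l1 == l2:
            neighbors[(s1, s2)].add((t1, t2))
            neighbors[(t1, t2)].add((s1, s2))

    if not neighbors:
        return {}

    # Assumed starting point: every pair is equally similar.
    sigma = {pair: 1.0 for pair in neighbors}
    for _ in range(max_iter):
        nxt = {}
        for pair in sigma:
            # Each neighbor passes on an equal share of its current score.
            incoming = sum(sigma[n] / len(neighbors[n]) for n in neighbors[pair])
            nxt[pair] = sigma[pair] + incoming
        top = max(nxt.values())
        nxt = {pair: value / top for pair, value in nxt.items()}
        delta = max(abs(nxt[pair] - sigma[pair]) for pair in sigma)
        sigma = nxt
        if delta < eps:  # scores stopped changing: treat as the fixpoint
            break
    return sigma


def best_match_per_node(sigma):
    """Hypothetical filter: keep the highest-scoring partner for each node of graph A."""
    best = {}
    for (node_a, node_b), score in sigma.items():
        if node_a not in best or score > best[node_a][1]:
            best[node_a] = (node_b, score)
    return {node_a: node_b for node_a, (node_b, _) in best.items()}


# Two tiny graphs with the same shape but different node names.
graph_a = [("a", "l1", "a1"), ("a", "l2", "a2")]
graph_b = [("b", "l1", "b1"), ("b", "l2", "b2")]
scores = similarity_flood(graph_a, graph_b)
mapping = best_match_per_node(scores)
```

In this sketch a human reviewer would then inspect `mapping` and correct it where needed, which is the step the abstract's accuracy measure counts. A different filter, for example one enforcing a one-to-one assignment, could be substituted without changing the propagation step.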
Pages: 117-128
Page count: 12