Matching large schemas: Approaches and evaluation

Times cited: 121
Authors
Do, Hong-Hai
Rahm, Erhard
Affiliations
[1] University of Leipzig, Interdisciplinary Center for Bioinformatics, D-04107 Leipzig, Germany
[2] University of Leipzig, Department of Computer Science, D-04109 Leipzig, Germany
Keywords
schema matching; schema matching evaluation; data integration; schema integration; model management
DOI
10.1016/j.is.2006.09.002
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Current schema matching approaches still have to improve for large and complex schemas. The large search space increases the likelihood of false matches as well as execution times. Further difficulties for schema matching are posed by the high expressive power and versatility of modern schema languages, in particular user-defined types and classes, component reuse capabilities, and support for distributed schemas and namespaces. To better assist the user in matching complex schemas, we have developed a new generic schema matching tool, COMA++, providing a library of individual matchers and a flexible infrastructure to combine the matchers and refine their results. Different match strategies can be applied, including a new scalable approach to identify context-dependent correspondences between schemas with shared elements and a fragment-based match approach that decomposes a large match task into smaller tasks. We conducted a comprehensive evaluation of the match strategies using large e-Business standard schemas. Besides providing helpful insights for future match implementations, the evaluation demonstrated the practicability of our system for matching large schemas. (c) 2006 Elsevier B.V. All rights reserved.
Pages: 857-885
Number of pages: 29
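The abstract names two mechanisms: combining the results of several individual matchers, and a fragment-based strategy that splits a large match task into smaller ones. The sketch below illustrates both, under simplifying assumptions that do not come from the paper. Schemas are reduced to element names. The two name matchers (trigram Dice similarity and a `difflib` ratio) are illustrative stand-ins. Scores are aggregated by averaging and selected with a fixed threshold. Fragments are given as a hypothetical mapping from a fragment name to its element names. None of the function names below belong to COMA++, and this is not the tool's implementation.

```python
from difflib import SequenceMatcher
from itertools import product


def trigrams(name):
    """Set of lower-case character trigrams (the whole name if shorter than 3)."""
    s = name.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)} or {s}


def trigram_sim(a, b):
    """Dice coefficient over character trigrams, in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def edit_sim(a, b):
    """Case-insensitive similarity ratio from difflib, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combine(src, tgt, matchers, threshold=0.5):
    """Run every matcher on every element pair, average the scores,
    and keep pairs whose average reaches the threshold."""
    result = []
    for s, t in product(src, tgt):
        avg = sum(m(s, t) for m in matchers) / len(matchers)
        if avg >= threshold:
            result.append((s, t, round(avg, 2)))
    return result


def fragment_match(src_frags, tgt_frags, matchers, threshold=0.5):
    """Hypothetical fragment-based strategy: first match fragment names,
    then match elements only inside the fragment pairs that were selected."""
    frag_pairs = combine(list(src_frags), list(tgt_frags), matchers, threshold)
    correspondences = []
    for fs, ft, _ in frag_pairs:
        correspondences.extend(
            combine(src_frags[fs], tgt_frags[ft], matchers, threshold))
    return correspondences


if __name__ == "__main__":
    matchers = [trigram_sim, edit_sim]
    src = {"ShipTo": ["street", "city"], "BillTo": ["street"]}
    tgt = {"shipTo": ["Street", "City"]}
    print(fragment_match(src, tgt, matchers))
```

Matching fragments first restricts element comparisons to pairs of similar fragments, which is what shrinks the search space in this toy example. Here `BillTo` is rejected at the fragment level, so its elements are never compared. A real matcher would also use schema structure and element context, which this sketch ignores.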