A holistic paradigm for large scale schema matching.

Cited by: 17
Authors
He, B [1]
Chang, KCC [1]
Affiliation
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
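Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 [Computer Science and Technology];
Abstract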
Schema matching is a critical problem for integrating heterogeneous information sources. Traditionally, the problem of matching multiple schemas has essentially relied on finding pairwise-attribute correspondences in isolation. In contrast, we propose a new matching paradigm, holistic schema matching, to match many schemas at the same time and find all matchings at once. By handling a set of schemas together, we can explore their context information that reflects the semantic correspondences among attributes. Such information is not available when schemas are matched only in pairs. As realizations of holistic schema matching, we develop two alternative approaches: global evaluation and local evaluation. Global evaluation exhaustively assesses all possible "models," where a model expresses all attribute matchings. In particular, we propose the MGS framework for such global evaluation, building upon the hypothesis of the existence of a hidden schema model that probabilistically generates the schemas we observed. On the other hand, local evaluation independently assesses every single matching to incrementally construct such a model. In particular, we develop the DCM framework for local evaluation, building upon the observation that co-occurrence patterns across schemas often reveal the complex relationships of attributes. We apply our approaches to match query interfaces on the deep Web. The results show the effectiveness of both the MGS and DCM approaches, which together demonstrate the promise of holistic schema matching.
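The co-occurrence observation in the abstract can be illustrated with a toy sketch. This is not the DCM algorithm: the schemas, the `min_support` threshold, and the heuristic that two frequent attributes which never appear together are synonym candidates are all assumptions made here for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical query-interface schemas (attribute sets); not data from the paper.
schemas = [
    {"author", "title", "isbn"},
    {"writer", "title", "isbn", "subject"},
    {"author", "title", "subject"},
    {"writer", "title"},
    {"author", "isbn", "subject"},
]

def cooccurrence_stats(schemas):
    """Count how often each attribute and each attribute pair occurs."""
    single = Counter()
    pair = Counter()
    for schema in schemas:
        single.update(schema)
        pair.update(frozenset(p) for p in combinations(sorted(schema), 2))
    return single, pair

def rarely_cooccurring(schemas, min_support=2):
    """Return pairs of frequent attributes that never share a schema.

    Assumed heuristic: such pairs are candidates for naming the same concept.
    """
    single, pair = cooccurrence_stats(schemas)
    frequent = sorted(a for a, count in single.items() if count >= min_support)
    return [
        (a, b)
        for a, b in combinations(frequent, 2)
        if pair[frozenset((a, b))] == 0
    ]

candidates = rarely_cooccurring(schemas)
print(candidates)
```

On this toy input the only candidate pair is `author` and `writer`. The signal only exists because several schemas are examined together, which is the point of the holistic setting.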
Pages: 20-25
Number of pages: 6