eTuner: tuning schema matching software using synthetic scenarios

Cited by: 67
Authors
Lee, Yoonkyong [1]
Sayyadian, Mayssam
Doan, AnHai
Rosenthal, Arnon S.
Affiliations
[1] Univ Illinois, Urbana, IL 61801 USA
[2] Millipore Corp, Bedford, MA 01730 USA
Keywords
schema matching; tuning; synthetic schemas; machine learning; compositional approach;
DOI
10.1007/s00778-006-0024-z
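Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812 [Computer science and technology];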
Abstract
Most recent schema matching systems assemble multiple components, each employing a particular matching technique. The domain user must then tune the system: select the right components to be executed and correctly adjust their numerous "knobs" (e.g., thresholds, formula coefficients). Tuning is skill- and time-intensive, but (as we show) without it the matching accuracy is significantly inferior. We describe eTuner, an approach to automatically tune schema matching systems. Given a schema S, we match S against synthetic schemas, for which the ground truth mapping is known, and find a tuning that demonstrably improves the performance of matching S against real schemas. To efficiently search the huge space of tuning configurations, eTuner works sequentially, starting with tuning the lowest level components. To increase the applicability of eTuner, we develop methods to tune a broad range of matching components. While the tuning process is completely automatic, eTuner can also exploit user assistance (whenever available) to further improve the tuning quality. We employed eTuner to tune four recently developed matching systems on several real-world domains. The results show that eTuner produced tuned matching systems that achieve higher accuracy than the same systems tuned with currently possible tuning methods.
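The core idea in the abstract (match S against synthetic schemas with a known ground truth, then keep the tuning that scores best) can be illustrated with a small sketch. This is not eTuner's implementation: the perturbation rules, the string-similarity matcher, the single threshold knob, and every function name below are hypothetical stand-ins chosen for illustration, and the sketch tunes one knob only rather than working sequentially through levels of components.

```python
import random
from difflib import SequenceMatcher


def perturb(name, rng):
    """Hypothetical perturbation rule: drop vowels or append a suffix."""
    if rng.random() < 0.5:
        return name[0] + "".join(c for c in name[1:] if c not in "aeiou")
    return name + "_" + rng.choice(["id", "val", "info"])


def synthesize(schema, rng, keep_prob=0.7):
    """Build one synthetic scenario from schema S.

    Returns (synthetic_schema, truth), where truth maps each kept original
    column to its perturbed counterpart. Dropped columns have no counterpart,
    so a good tuning should leave them unmatched.
    """
    truth = {c: perturb(c, rng) for c in schema if rng.random() < keep_prob}
    synthetic = list(truth.values())
    rng.shuffle(synthetic)
    return synthetic, truth


def match(source, target, threshold):
    """Toy matcher: pair each source column with its most similar target
    column, keeping the pair only if similarity reaches the threshold."""
    if not target:
        return {}
    predicted = {}
    for s in source:
        best = max(target, key=lambda t: SequenceMatcher(None, s, t).ratio())
        if SequenceMatcher(None, s, best).ratio() >= threshold:
            predicted[s] = best
    return predicted


def f1(predicted, truth):
    """F1 score of a predicted mapping against the known ground truth."""
    correct = sum(1 for s, t in predicted.items() if truth.get(s) == t)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(truth)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(schema, candidates, n_scenarios=20, seed=0):
    """Pick the candidate threshold with the best average F1 over
    synthetic scenarios generated from the given schema."""
    rng = random.Random(seed)
    scenarios = [synthesize(schema, rng) for _ in range(n_scenarios)]

    def avg_f1(th):
        scores = [f1(match(schema, syn, th), truth) for syn, truth in scenarios]
        return sum(scores) / len(scores)

    return max(candidates, key=avg_f1)


# Usage: the schema and candidate thresholds are made-up examples.
schema = ["customer_name", "order_date", "total_price", "shipping_address"]
print(tune_threshold(schema, [0.3, 0.5, 0.7, 0.9]))
```

Because each synthetic schema is derived from S itself, the correct mapping comes for free, which is what lets the candidate settings be scored without hand-labeled matches. Dropping some columns in each scenario keeps the threshold meaningful: a setting that accepts every best match is penalized for pairing columns that have no true counterpart.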
Pages: 97-122
Page count: 26