Semantic matching across heterogeneous data sources

Cited by: 22

Author:

Zhao, Huimin [1]

Affiliation:

[1] Univ Wisconsin, Sheldon B Lubar Sch Business Adm, Milwaukee, WI 53201 USA

Source:

COMMUNICATIONS OF THE ACM | 2007 / Vol. 50 / No. 01

DOI:

10.1145/1188913.1188916

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

The role of semantic correspondences in the semantic integration of data sources and in data integration across disparate databases is discussed. The growth of the Internet has increased the need for semantic interoperability across heterogeneous data sources. Semantic correspondences across heterogeneous data sources include schema-level correspondences and instance-level correspondences. Cluster analysis techniques are better suited to identifying schema-level correspondences, and classification techniques are better suited to detecting instance-level correspondences. Semantically related attributes tend to be highly correlated and can be identified through correlation analysis. Regression analysis can then be used to determine the actual relationship among correlated attributes. Corresponding records can be integrated into a single data set so that statistical analysis can be used to further analyze the relationships among attributes.
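The correlation-then-regression step described in the abstract can be illustrated with a minimal sketch. It assumes the records of the two sources have already been matched (instance-level correspondence), so position i in every column refers to the same real-world entity. The attribute names (`temp_c`, `weight_kg`, `age`, `temp_f`, `weight_lb`), the data, and the 0.95 correlation threshold are all hypothetical. Each attribute of source A is paired with the most strongly correlated attribute of source B, and an ordinary least-squares line is then fitted to estimate how values convert between them. This illustrates the general idea only; it is not the paper's own procedure.

```python
from math import sqrt

# Hypothetical, already-linked records: position i in every column of
# source_a refers to the same entity as position i in source_b.
source_a = {
    "temp_c": [0, 25, 10, 40, 30],
    "weight_kg": [70, 55, 90, 60, 80],
    "age": [34, 29, 51, 42, 38],
}
source_b = {
    "temp_f": [32, 77, 50, 104, 86],         # temp_c * 1.8 + 32
    "weight_lb": [154, 121, 198, 132, 176],  # weight_kg * 2.2 (approximate factor)
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation signal
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def match_attributes(src_a, src_b, threshold=0.95):
    """Pair each attribute of src_a with its most correlated attribute in src_b.

    Pairs whose |r| falls below the (hypothetical) threshold stay unmatched.
    Returns {attr_a: (attr_b, r, slope, intercept)}.
    """
    matches = {}
    for name_a, col_a in src_a.items():
        scored = [(abs(pearson(col_a, col_b)), name_b)
                  for name_b, col_b in src_b.items()]
        best_abs_r, best_b = max(scored)
        if best_abs_r >= threshold:
            # Correlation found a candidate pair; regression estimates the rule.
            r = pearson(col_a, src_b[best_b])
            slope, intercept = fit_line(col_a, src_b[best_b])
            matches[name_a] = (best_b, r, slope, intercept)
    return matches


if __name__ == "__main__":
    for a, (b, r, slope, intercept) in match_attributes(source_a, source_b).items():
        print(f"{a} -> {b}: r = {r:.3f}, {b} ~ {slope:.3f} * {a} + {intercept:.3f}")
```

A high |r| only suggests that two attributes are semantically related; the fitted slope and intercept (here close to the Celsius-to-Fahrenheit and kilogram-to-pound conversions) indicate how values in one source would have to be transformed to agree with the other. `age` has no sufficiently correlated counterpart and is left unmatched. Nonlinear or categorical relationships would need other techniques.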

Pages: 45-50

Number of pages: 6

References (12 in total; the first 10 are listed):

[1] Bell, GB; Sethi, A. Matching records in a national medical patient index [J]. COMMUNICATIONS OF THE ACM, 2001, 44 (09): 83-88

[2] Doan, A; Domingos, P; Halevy, A. Learning to match the schemas of data sources: A multistrategy approach [J]. MACHINE LEARNING, 2003, 50 (03): 279-301

[3] Fan, WG; Lu, HJ; Madnick, SE; Cheung, D. DIRECT: a system for mining data value conversion rules from disparate data sources [J]. DECISION SUPPORT SYSTEMS, 2002, 34 (01): 19-39

[4] Fellegi, IP; Sunter, AB. A theory for record linkage [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1969, 64 (328): 1183 ff.

[5] Hansen, M. LECTURE NOTES IN COMPUTER SCIENCE, 2003, 2590: 165

[6] Hernandez, MA; Stolfo, SJ. Real-world data is dirty: Data cleansing and the merge/purge problem [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 2 (01): 9-37

[7] Li, WS; Clifton, C. SEMINT: A tool for identifying attribute correspondences in heterogeneous databases using neural networks [J]. DATA & KNOWLEDGE ENGINEERING, 2000, 33 (01): 49-84

[8] Ram, S. P WORKSH INF TECHN S, 2001: 187

[9] Tejada, S; Knoblock, CA; Minton, S. Learning object identification rules for information integration [J]. INFORMATION SYSTEMS, 2001, 26 (08): 607-633

[10] Verykios, VS; Elmagarmid, AK; Houstis, EN. Automating the approximate record-matching process [J]. INFORMATION SCIENCES, 2000, 126 (1-4): 83-98
