Instance-based attribute identification in database integration

Cited by: 5
Authors
Chua, CEH [1]
Chiang, RHL
Lim, EP
Affiliations
[1] Georgia State Univ, J Mack Robinson Coll Business, Atlanta, GA 30303 USA
[2] Univ Cincinnati, Coll Business Adm, Cincinnati, OH 45221 USA
[3] Nanyang Technol Univ, Sch Comp Engn, Singapore, Singapore
Keywords
attribute identification; database integration; measures of association
DOI
10.1007/s00778-003-0088-y
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Most research on attribute identification in database integration has focused on integrating attributes using schema and summary information derived from the attribute values. No research has attempted to fully explore the use of attribute values to perform attribute identification. We propose an attribute identification method that employs schema and summary instance information as well as properties of attributes derived from their instances. Unlike other attribute identification methods that match only single attributes, our method matches attribute groups for integration. Because our attribute identification method fully explores data instances, it can identify corresponding attributes to be integrated even when schema information is misleading. Three experiments were performed to validate our attribute identification method. In the first experiment, the heuristic rules derived for attribute classification were evaluated on 119 attributes from nine public domain data sets. The second was a controlled experiment validating the robustness of the proposed attribute identification method by introducing erroneous data. The third experiment evaluated the proposed attribute identification method on five data sets extracted from online music stores. The results demonstrated the viability of the proposed method.
Pages: 228-243
Number of pages: 16