Efficient Selection and Integration of Hidden Web Database

Cited by: 5
Authors
Xian, Xuefeng [1, 2]
Zhao, Pengpeng [1, 2]
Yang, Yuanfeng [1, 2]
Xin, Jie [2]
Cui, Zhiming [1, 2]
Affiliations
[1] JiangSu Prov Support Software Engn R&D Ctr Modern, Suzhou, Peoples R China
[2] Soochow Univ, Inst Intelligent Informat Proc & Applicat, Suzhou, Peoples R China
Keywords
hidden web; data integration; web database selection
DOI
10.4304/jcp.5.4.500-507
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
An ever-increasing amount of valuable information is stored in web databases, "hidden" behind search interfaces, and a new application area is emerging for information retrieval and integration. There may be hundreds or thousands of web databases providing data relevant to a particular domain. A primary challenge for internet-scale hidden web database integration is therefore to determine which web databases to include in the integration system, so that the system contains as much high-quality data as possible with the least overlap. In this paper, we present an approach that iteratively selects and integrates candidate web databases. The core of this approach is a benefit function that evaluates how much benefit a web database would bring to the integration system, in its current state, if it were integrated. We devise a benefit function based on the volume and quality of the new data that integrating the web database adds to the integration system. We show in practice how to apply our approach efficiently to select and integrate web databases. Our experiments on real hidden web databases indicate that the set of web databases selected and integrated by our approach yields an integration system with significantly higher utility than a wide range of other strategies.
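The iterative selection described in the abstract can be illustrated with a minimal greedy sketch. This is not the paper's benefit function, which the abstract does not specify. It assumes, hypothetically, that each database is fully known as a mapping from record id to a quality score in [0, 1], and that benefit is the summed quality of records not yet in the integration system. The names `benefit`, `greedy_select`, and `toy` are illustrative only.

```python
from typing import Dict, List, Set

# Hypothetical representation: database name -> {record id: quality score in [0, 1]}.
# A real hidden web database would have to be sampled through its search interface.
Databases = Dict[str, Dict[str, float]]


def benefit(records: Dict[str, float], integrated: Set[str]) -> float:
    """Assumed benefit: summed quality of records not yet in the integration system."""
    return sum(q for rid, q in records.items() if rid not in integrated)


def greedy_select(databases: Databases, k: int) -> List[str]:
    """Pick up to k databases, each round taking the one with the highest benefit."""
    integrated: Set[str] = set()      # record ids already in the integration system
    remaining = dict(databases)       # candidates not yet selected
    chosen: List[str] = []
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda name: benefit(remaining[name], integrated))
        if benefit(remaining[best], integrated) <= 0:
            break                     # no candidate contributes new data
        chosen.append(best)
        integrated.update(remaining.pop(best))  # adds the record ids (dict keys)
    return chosen


toy = {
    "A": {"r1": 0.9, "r2": 0.8, "r3": 0.7},
    "B": {"r1": 0.9, "r2": 0.8, "r4": 0.2},
    "C": {"r5": 0.6, "r6": 0.5},
}
selected = greedy_select(toy, 2)
```

In this toy input, database B is larger than C but overlaps heavily with A, so after A is integrated its benefit falls below that of C. The benefit is recomputed against the current state of the integration system in every round, which is what penalizes overlap.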
Pages: 500-507
Page count: 8