A probabilistic approach to metasearching with adaptive probing

Times cited: 4
Authors
Liu, ZY [1]
Luo, C [1]
Cho, JH [1]
Chu, WW [1]
Affiliation
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
Source
20TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS | 2004
DOI
10.1109/ICDE.2004.1320026
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
An ever-increasing amount of valuable information is stored in Web databases, "hidden" behind search interfaces. To save users the effort of manually exploring each database, metasearchers automatically select the databases most relevant to a user's query [2, 5, 16, 21, 27, 18]. In this paper we focus on one of the technical challenges in metasearching, namely database selection. Past research uses a pre-collected summary of each database to estimate its "relevancy" to the query, and in many cases makes incorrect database selections. We propose two techniques: probabilistic relevancy modelling and adaptive probing. First, we model the relevancy of each database to a given query as a probability distribution, derived by sampling that database. Using this probabilistic model, the user can explicitly specify a desired level of certainty for database selection. Second, the adaptive probing technique decides which databases, and how many, to contact in order to satisfy the user's requirement. Our experiments on real Hidden-Web databases indicate that our approach significantly improves the accuracy of database selection at the cost of probing a small number of databases.
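The abstract describes the two techniques only at a high level. The following is a hypothetical illustration, not the authors' algorithm. It assumes that each database's relevancy is a small discrete distribution (for instance, over the number of matching documents), that these distributions are independent, that probing a database reveals its exact relevancy, and that "certainty" means the probability that the selected database is (one of) the most relevant. The rule of probing the highest-variance unprobed database first is a simple heuristic chosen for illustration. All function names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Tuple

# Hypothetical representation: a relevancy distribution is a list of
# (relevancy value, probability) pairs whose probabilities sum to 1.
Dist = List[Tuple[float, float]]


def variance(d: Dist) -> float:
    """Variance of a discrete distribution; used as an uncertainty measure."""
    mean = sum(v * p for v, p in d)
    return sum(p * (v - mean) ** 2 for v, p in d)


def certainty_of_each(dists: List[Dist]) -> List[float]:
    """For each database i, P(relevancy_i >= relevancy_j for every j).

    Assumes independent distributions and counts ties as correct.
    Enumerates all joint outcomes exactly, so it is only practical
    for a handful of databases with small supports.
    """
    scores = [0.0] * len(dists)
    for outcome in product(*dists):
        p = 1.0
        for _, prob in outcome:
            p *= prob
        values = [v for v, _ in outcome]
        best = max(values)
        for i, v in enumerate(values):
            if v == best:
                scores[i] += p
    return scores


def select_with_probing(
    dists: List[Dist],
    threshold: float,
    probe: Callable[[int], float],
) -> Tuple[int, float, List[int]]:
    """Pick the most-likely-best database, probing until its certainty
    reaches `threshold`. Returns (chosen index, certainty, probed indices).

    `probe(i)` is a hypothetical callback that contacts database i and
    returns its exact relevancy for the query.
    """
    dists = [list(d) for d in dists]  # work on a copy
    probed: List[int] = []
    while True:
        scores = certainty_of_each(dists)
        chosen = max(range(len(dists)), key=lambda i: scores[i])
        unprobed = [i for i in range(len(dists)) if i not in probed]
        if scores[chosen] >= threshold or not unprobed:
            return chosen, scores[chosen], probed
        # Heuristic (an assumption): probe the most uncertain database.
        target = max(unprobed, key=lambda i: variance(dists[i]))
        # A probed database's relevancy is known exactly: a point mass.
        dists[target] = [(probe(target), 1.0)]
        probed.append(target)


if __name__ == "__main__":
    # Made-up distributions and "true" relevancies, for illustration only.
    estimates = [[(10, 0.5), (30, 0.5)], [(20, 1.0)], [(5, 0.9), (25, 0.1)]]
    true_relevancy = [10, 20, 5]
    print(select_with_probing(estimates, 0.85, lambda i: true_relevancy[i]))
    # prints (1, 0.9, [0])
```

With a threshold of 0.85, the initial best guess (database 0) has a certainty of only 0.5, so the sketch probes database 0, which is the most uncertain. The observed value of 10 leaves database 1 as the most relevant with certainty 0.9, and selection stops after a single probe.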
Pages: 547-558
Number of pages: 12
References (27 in total)
[1] BAUMGARTEN C, 1999, P ACM SIGIR 99 CA
[2] BERGMAN MK, 2000, DEEP WEB SURFACING H
[3] BORGES JA, 1996, P ACM SIGCHI 96
[4] CALLAN JP, 1995, P ACM SIGIR 95 WA
[5] CHANG K, 2002, P ACM SIGMOD 02 WI
[6] CHANG KCC, 2003, UIUCDCSR20032321
[7] CHAUDHURI S, 1996, P ACM SIGMOD 96 CAN
[8] CHAUDHURI S, 1999, P VLDB 99 SCOTL
[9] CLYDE A, 2002, TEACHER LIB, V29
[10] CRASWELL, 2000, P ACM C DIG LIB 00 T