A decision-theoretic approach to database selection in networked IR

Times cited: 73
Authors
Fuhr, N [1]
Affiliation
[1] Univ Dortmund, D-44221 Dortmund, Germany
Keywords
theory; networked retrieval; probabilistic retrieval; probability ranking principle; resource discovery
DOI
10.1145/314516.314517
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In networked IR, a client submits a query to a broker, which is in contact with a large number of databases. To yield a maximum number of documents at minimum cost, the broker has to estimate the retrieval cost of each database. It then decides, for each database, whether or not to use it for the current query and, if so, how many documents to retrieve from it. For this purpose, we develop a general decision-theoretic model and discuss different cost structures. Besides the costs of retrieving relevant versus nonrelevant documents, we consider the following parameters for each database: expected retrieval quality, expected number of relevant documents in the database, and cost factors for query processing and document delivery. A divide-and-conquer algorithm is given for computing the overall optimum. If there are several brokers, each knowing different databases, a preselection of brokers can only be performed heuristically, but the optimum can be computed much as in the single-broker case. In addition, we derive a formula that estimates the number of relevant documents in a database based on dictionary information.
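The database selection problem described in the abstract can be illustrated with a small Python sketch. This is written under stated assumptions and is not the paper's model or algorithm. It assumes the client asks for a fixed total of N documents. It also assumes that the expected cost of taking the top s documents from a database is the sum of four terms: a fixed query-processing cost (paid only if the database is used), a per-document delivery cost, c_rel per expected relevant document, and c_nonrel per expected nonrelevant document. Finally, it assumes each database supplies a hypothetical table expected_relevant[s]. The names Database, expected_cost and solve, and all parameter names, are hypothetical. The divide-and-conquer step splits the list of databases in half and solves each half for every n from 0 to N. It then merges the two cost tables by a min-plus convolution.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Database:
    """Hypothetical per-database parameters (illustrative names, not from the paper)."""
    name: str
    query_cost: float               # fixed cost of processing the query here (assumed)
    delivery_cost: float            # cost per delivered document (assumed)
    expected_relevant: List[float]  # expected_relevant[s]: expected relevant docs in top s


def expected_cost(db: Database, s: int, c_rel: float, c_nonrel: float) -> float:
    """Expected cost of retrieving the top s documents from db; 0 if db is not used."""
    if s == 0:
        return 0.0
    if s >= len(db.expected_relevant):
        return math.inf  # db cannot deliver that many documents
    rel = db.expected_relevant[s]
    return db.query_cost + s * db.delivery_cost + c_rel * rel + c_nonrel * (s - rel)


def solve(dbs: List[Database], n_total: int, c_rel: float,
          c_nonrel: float) -> Tuple[List[float], List[Tuple[int, ...]]]:
    """Divide and conquer over the database list.

    Returns (costs, allocs), where costs[n] is the minimum expected cost of
    retrieving exactly n documents from dbs, and allocs[n] holds the
    per-database document counts achieving it.
    """
    if not dbs:
        raise ValueError("need at least one database")
    if len(dbs) == 1:
        costs = [expected_cost(dbs[0], n, c_rel, c_nonrel) for n in range(n_total + 1)]
        return costs, [(n,) for n in range(n_total + 1)]
    mid = len(dbs) // 2
    left_costs, left_allocs = solve(dbs[:mid], n_total, c_rel, c_nonrel)
    right_costs, right_allocs = solve(dbs[mid:], n_total, c_rel, c_nonrel)
    costs = [math.inf] * (n_total + 1)
    allocs: List[Tuple[int, ...]] = [()] * (n_total + 1)
    for n in range(n_total + 1):
        for k in range(n + 1):  # k documents from the left half, n - k from the right
            c = left_costs[k] + right_costs[n - k]
            if c < costs[n]:
                costs[n] = c
                allocs[n] = left_allocs[k] + right_allocs[n - k]
    return costs, allocs


# Usage with made-up numbers: database A can deliver at most two documents.
a = Database("A", query_cost=1.0, delivery_cost=0.1, expected_relevant=[0.0, 0.9, 1.7])
b = Database("B", query_cost=0.5, delivery_cost=0.2, expected_relevant=[0.0, 0.7, 1.1, 1.5])
costs, allocs = solve([a, b], n_total=3, c_rel=0.0, c_nonrel=1.0)
print(allocs[3], round(costs[3], 2))  # prints: (2, 1) 2.5
```

In this example, A alone cannot deliver three documents. Splitting the request as 2 + 1 (expected cost 2.5) beats taking all three documents from B (2.6), even though using both databases incurs both query-processing costs. Each merge costs O(N^2), so the recursion runs in O(m N^2) time for m databases.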
Pages: 229-249
Number of pages: 21