Robust Result Merging Using Sample-Based Score Estimates

Cited by: 26
Authors
Shokouhi, Milad [1]
Zobel, Justin [1]
Affiliations
[1] RMIT Univ, Melbourne, Vic, Australia
Keywords
Algorithms; Result merging; result fusion; distributed information retrieval; uncooperative collections; COLLECTION SELECTION; SEARCH; PERFORMANCE;
DOI
10.1145/1508850.1508852
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In federated information retrieval, a query is routed to multiple collections and a single answer list is constructed by combining the results. Such metasearch provides a mechanism for locating documents on the hidden Web and, by use of sampling, can proceed even when the collections are uncooperative. However, the similarity scores for documents returned from different collections are not comparable, and, in uncooperative environments, document scores are unlikely to be reported. We introduce a new merging method for uncooperative environments, in which similarity scores for the sampled documents held for each collection are used to estimate global scores for the documents returned per query. This method requires no assumptions about properties such as the retrieval models used. Using experiments on a wide range of collections, we show that in many cases our merging methods are significantly more effective than previous techniques.
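The idea of estimating global scores from sampled documents can be illustrated with a minimal sketch. This is not the authors' published algorithm: the abstract does not specify how the estimates are computed. It assumes, purely for illustration, that each collection's sample is uniform (so the i-th best sampled document sits near rank i * collection_size / sample_size in the full collection), that all sampled documents are scored by one central retrieval model, and that score falls off linearly in the logarithm of rank. The names `fit_line`, `estimate_scores`, and `merge` are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Only one distinct x value: fall back to a flat line.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_scores(sample_scores, collection_size, num_returned):
    """Estimate scores for the top `num_returned` documents of one collection.

    sample_scores: query scores of that collection's sampled documents,
    all computed by the same central retrieval model (assumption).
    """
    ranked = sorted(sample_scores, reverse=True)
    if not ranked:
        # No sampled evidence for this collection: rank its documents last.
        return [float("-inf")] * num_returned
    scale = collection_size / len(ranked)
    # Assumed position of each sampled document in the full ranking.
    xs = [math.log(i * scale) for i in range(1, len(ranked) + 1)]
    slope, intercept = fit_line(xs, ranked)
    # Extrapolate the fitted curve to ranks 1..num_returned.
    return [slope * math.log(r) + intercept for r in range(1, num_returned + 1)]

def merge(results, sample_scores, collection_sizes):
    """Merge per-collection ranked lists by estimated global score.

    results: {collection: [doc_id, ...]} in the order each collection returned them.
    sample_scores: {collection: [score, ...]} for its sampled documents.
    collection_sizes: {collection: estimated number of documents}.
    """
    scored = []
    for name, docs in results.items():
        estimates = estimate_scores(sample_scores[name], collection_sizes[name], len(docs))
        scored.extend(zip(estimates, docs))
    # Highest estimated score first; sort is stable for ties.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]
```

Note that the sketch never reads scores reported by the collections themselves, only their returned rankings and the centrally scored samples, which matches the uncooperative setting the abstract describes.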
Pages: 29