Collection-integral source selection for uncooperative distributed information retrieval environments

Times cited: 14
Authors
Paltoglou, Georgios [1]
Salampasis, Michail [2]
Satratzemi, Maria [1]
Affiliations
[1] Univ Macedonia, Thessaloniki, Greece
[2] Alexander Technol Educ Inst Thessaloniki, Thessaloniki 57400, Greece
Keywords
Source selection; Distributed information retrieval; Federated search; SEARCH; WEB; PERFORMANCE
DOI
10.1016/j.ins.2010.03.020
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose a new integral-based source selection algorithm for uncooperative distributed information retrieval environments. The algorithm models each source as a plot, using the relevance score and the intra-collection position of its sampled documents with reference to a centralized sample index. Based on this model, the algorithm locates the collections that contain the most relevant documents. A number of transformations are applied to the original plot in order to reward collections that have higher-scoring documents and to dampen the effect of collections returning an excessive number of documents. The family of linear interpolant functions that pass through the points of the modified plot is computed for each available source, and the area that they cover in the rank-relevance space is calculated. Information sources are ranked by the area that they cover. Based on this novel metric for collection relevance, the algorithm is tested on a variety of testbeds in both recall-oriented and precision-oriented settings, and its performance is found to be better than, or at least equal to, previous state-of-the-art approaches, overall constituting a very effective and robust solution. © 2010 Elsevier Inc. All rights reserved.
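The core idea in the abstract (plot each source's sampled documents, interpolate linearly, rank sources by covered area) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `rank_sources` and `trapezoid_area` and the input format are assumptions made for illustration, and the plot transformations mentioned in the abstract are omitted because the abstract does not specify them.

```python
from collections import defaultdict


def trapezoid_area(points):
    """Area under the piecewise-linear interpolant through (x, y) points.

    Points must be sorted by x. A single point covers no area in this
    simplified sketch.
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def rank_sources(central_ranking):
    """Rank collections by the area their sampled documents cover.

    central_ranking: list of (collection_id, relevance_score) pairs, sorted
    by descending score, as might be returned by querying a centralized
    sample index (assumed input format).
    """
    points = defaultdict(list)
    for collection_id, score in central_ranking:
        # x is the document's position among this collection's own sampled
        # documents (1, 2, 3, ...); y is its relevance score.
        position = len(points[collection_id]) + 1
        points[collection_id].append((position, score))

    areas = {cid: trapezoid_area(pts) for cid, pts in points.items()}
    # Larger covered area means the collection is ranked higher.
    return sorted(areas.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    results = [("A", 0.9), ("B", 0.8), ("A", 0.7), ("A", 0.5), ("B", 0.2)]
    for collection_id, area in rank_sources(results):
        print(collection_id, round(area, 3))
```

In this simplified form a collection with only one sampled document in the results gets an area of zero, and a collection that returns many low-scoring documents can still accumulate area; the transformations described in the abstract are presumably what addresses such effects.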
Pages: 2763-2776
Page count: 14