Collection-integral source selection for uncooperative distributed information retrieval environments

Cited by: 14
Authors
Paltoglou, Georgios [1]
Salampasis, Michail [2]
Satratzemi, Maria [1]
Affiliations
[1] Univ Macedonia, Thessaloniki, Greece
[2] Alexander Technol Educ Inst Thessaloniki, Thessaloniki 57400, Greece
Keywords
Source selection; Distributed information retrieval; Federated search; SEARCH; WEB; PERFORMANCE
DOI
10.1016/j.ins.2010.03.020
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose a new integral-based source selection algorithm for uncooperative distributed information retrieval environments. The algorithm models each source as a plot, using the relevance score and the intra-collection position of its sampled documents with reference to a centralized sample index. Based on this modeling, the algorithm locates the collections that contain the most relevant documents. A number of transformations are applied to the original plot in order to reward collections that have higher-scoring documents and to dampen the effect of collections returning an excessive number of documents. The family of linear interpolant functions that pass through the points of the modified plot is computed for each available source, and the area that they cover in the rank-relevance space is calculated. Information sources are ranked based on the area that they cover. Based on this novel metric for collection relevance, the algorithm is tested on a variety of testbeds in both recall- and precision-oriented settings, and its performance is found to be better than or at least equal to that of previous state-of-the-art approaches, overall constituting a very effective and robust solution. (C) 2010 Elsevier Inc. All rights reserved.
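The area-based ranking described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the transformations applied to the plot, so they are omitted here, and the function name `rank_collections`, the input format, and the use of the trapezoidal rule for the area under the piecewise-linear interpolant are all assumptions made for illustration.

```python
from collections import defaultdict


def rank_collections(sample_results):
    """Rank collections by the area under their rank-relevance plot.

    sample_results is a hypothetical input format: a list of
    (collection_id, score) pairs, sorted by descending score, as might be
    returned by running a query against a centralized sample index.
    """
    # Build one plot per collection:
    # x = intra-collection position (1, 2, 3, ...), y = relevance score.
    points = defaultdict(list)
    for collection_id, score in sample_results:
        position = len(points[collection_id]) + 1
        points[collection_id].append((position, score))

    # Area under the piecewise-linear interpolant through the points,
    # computed with the trapezoidal rule. The plot transformations that
    # the abstract mentions are NOT applied (they are unspecified there).
    # Note: a collection with a single point gets area 0 in this sketch.
    areas = {}
    for collection_id, pts in points.items():
        area = 0.0
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            area += (x1 - x0) * (y0 + y1) / 2.0
        areas[collection_id] = area

    # Larger area first.
    return sorted(areas.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    sample = [("A", 0.9), ("B", 0.8), ("A", 0.7), ("A", 0.5), ("B", 0.2)]
    for collection_id, area in rank_collections(sample):
        print(collection_id, round(area, 3))
```

In this toy input, collection A contributes three sampled documents and B two, so A covers a larger area in the rank-relevance space and is ranked first. How the original algorithm rewards high scores and dampens long result lists would change these areas and is not reproduced here.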
Pages: 2763-2776
Number of pages: 14