Modeling score distributions in information retrieval

Cited by: 20
Authors
Arampatzis, Avi [1]
Robertson, Stephen [2]
Affiliations
[1] Democritus Univ Thrace, Dept Elect & Comp Engn, GR-67100 Xanthi, Greece
[2] Microsoft Res, Cambridge, England
Source
Information Retrieval | 2011 / Vol. 14 / Issue 1
Keywords
Score distribution; Normalization; Distributed retrieval; Fusion; Filtering
DOI
10.1007/s10791-010-9145-5
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We review the history of modeling score distributions, focusing on the mixture of normal-exponential by investigating the theoretical as well as the empirical evidence supporting its use. We discuss previously suggested conditions which valid binary mixture models should satisfy, such as the Recall-Fallout Convexity Hypothesis, and formulate two new hypotheses considering the component distributions, individually as well as in pairs, under some limiting conditions of parameter values. From all the mixtures suggested in the past, the current theoretical argument points to the two gamma as the most-likely universal model, with the normal-exponential being a usable approximation. Beyond the theoretical contribution, we provide new experimental evidence showing vector space or geometric models, and BM25, as being 'friendly' to the normal-exponential, and that the non-convexity problem that the mixture possesses is practically not severe. Furthermore, we review recent non-binary mixture models, speculate on graded relevance, and consider methods such as logistic regression for score calibration.
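As an illustration of the normal-exponential mixture named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes relevant-document scores follow a normal density and non-relevant scores follow an exponential density, and applies Bayes' rule to obtain the probability of relevance at a given score. The function names, the mixing weight `g`, and all parameter values are hypothetical choices made for this example. The sketch also shows one common reading of the non-convexity issue: because the normal tail decays faster than the exponential tail, the estimated probability of relevance eventually falls again at very high scores.

```python
import math


def normal_pdf(s, mu, sigma):
    """Density of a normal distribution (assumed model for relevant scores)."""
    z = (s - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def exponential_pdf(s, lam):
    """Density of an exponential distribution (assumed model for non-relevant scores)."""
    return lam * math.exp(-lam * s) if s >= 0 else 0.0


def prob_relevant(s, g, mu, sigma, lam):
    """Posterior P(relevant | score s) under the two-component mixture.

    g is the (hypothetical) prior fraction of relevant documents.
    """
    rel = g * normal_pdf(s, mu, sigma)
    non = (1.0 - g) * exponential_pdf(s, lam)
    total = rel + non
    return rel / total if total > 0 else 0.0


# Illustrative parameters only; they are not taken from the paper.
params = dict(g=0.1, mu=0.6, sigma=0.15, lam=8.0)

p_low = prob_relevant(0.0, **params)   # far below the normal mean
p_mid = prob_relevant(0.6, **params)   # at the normal mean
p_high = prob_relevant(3.0, **params)  # far above the normal mean

# The posterior rises towards the normal mean, then drops again at very
# high scores, where the exponential tail outweighs the normal tail.
print(p_low < p_mid, p_high < p_mid)
```

With these illustrative values the posterior is not monotonic in the score. Whether this matters in practice depends on whether real scores ever reach the region where the drop occurs.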
Pages: 26-46
Page count: 21