The whens and hows of learning to rank for web search

Cited: 66
Authors
Macdonald, Craig [1 ]
Santos, Rodrygo L. T. [1 ]
Ounis, Iadh [1 ]
Affiliations
[1] Univ Glasgow, Sch Comp Sci, Glasgow G12 8QQ, Lanark, Scotland
Source
INFORMATION RETRIEVAL | 2013, Vol. 16, Issue 5
Keywords
Learning to rank; Evaluation; Web search; Sample size; Document representations; Loss function; INFORMATION-RETRIEVAL;
DOI
10.1007/s10791-012-9209-9
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline code
080201 [Mechanical manufacturing and automation];
Abstract
Web search engines increasingly combine many features using learning to rank techniques. However, various practical questions remain concerning how learning to rank should be deployed. For instance, a sample of documents with sufficient recall is first retrieved, such that re-ranking the sample with the learned model brings the relevant documents to the top. However, the properties of this document sample, such as its minimum effective size (i.e. when to stop ranking), remain unstudied. Similarly, effective listwise learning to rank techniques minimise a loss function corresponding to a standard information retrieval evaluation measure. However, how best to calculate this loss function (i.e. the choice of the learning evaluation measure and the rank depth at which it should be calculated) remains unclear. In this paper, we address these issues by formulating various hypotheses and research questions, before performing exhaustive experiments using multiple learning to rank techniques and different types of information needs on the ClueWeb09 and LETOR corpora. Among many conclusions, we find, for instance, that the smallest effective sample for a given query set depends on the type of information need of the queries, the document representation used during sampling, and the test evaluation measure. As the sample size is varied, the selected features change markedly; for instance, link analysis features are favoured for smaller document samples. Moreover, despite reflecting a more realistic user model, the recently proposed ERR measure is less effective than the traditional NDCG as a learning loss function. Overall, our comprehensive experiments provide the first empirical derivation of best practices for learning to rank deployments.
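The abstract contrasts NDCG and ERR as candidate learning loss functions. For context, here is a minimal Python sketch of the two measures computed over a ranked list of graded relevance labels; the function names and the maximum grade parameter `g_max` are illustrative assumptions, not definitions taken from the paper:

```python
import math

def dcg(grades, k=None):
    """Discounted cumulative gain at depth k, with gain 2^grade - 1."""
    k = len(grades) if k is None else k
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades[:k]))

def ndcg(grades, k=None):
    """NDCG: DCG normalised by the DCG of the ideal (descending-grade) ranking."""
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal > 0 else 0.0

def err(grades, k=None, g_max=4):
    """Expected Reciprocal Rank (Chapelle et al., 2009): models a user who
    scans down the ranking and stops at the first satisfying document."""
    k = len(grades) if k is None else k
    p_not_stopped, total = 1.0, 0.0
    for i, g in enumerate(grades[:k]):
        r = (2 ** g - 1) / 2 ** g_max  # probability of being satisfied at rank i+1
        total += p_not_stopped * r / (i + 1)
        p_not_stopped *= 1 - r
    return total
```

The cascade-style stopping probability in `err` is what makes ERR reflect a more realistic user model than NDCG's position-based discount, even though, as the paper finds, it is the less effective choice as a learning loss.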
Pages: 584-628
Page count: 45