Fusion via a linear combination of scores

Cited by: 161

Authors:

Vogt C.C. [1]

Cottrell G.W. [1]

Affiliation:

[1] Computer Science and Engineering, University of California San Diego, San Diego

Source:

Information Retrieval | 1999 / Vol. 1 / Issue 3

Keywords:

Fusion; Linear combination; Neural networks; Performance evaluation; Routing

DOI:

10.1023/A:1009980820262

Abstract:

We present a thorough analysis of the capabilities of the linear combination (LC) model for fusion of information retrieval systems. The LC model combines the results lists of multiple IR systems by scoring each document using a weighted sum of the scores from each of the component systems. We first present both empirical and analytical justification for the hypotheses that such a model should only be used when the systems involved have high performance, a large overlap of relevant documents, and a small overlap of nonrelevant documents. The empirical approach allows us to very accurately predict the performance of a combined system. We also derive a formula for a theoretically optimal weighting scheme for combining 2 systems. We introduce d - the difference between the average score on relevant documents and the average score on nonrelevant documents - as a performance measure which not only allows mathematical reasoning about system performance, but also allows the selection of weights which generalize well to new documents. We describe a number of experiments involving large numbers of different IR systems which support these findings. © 1999 Kluwer Academic Publishers.
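The two quantities named in the abstract can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the dictionary-based score format, and the choice to treat a document missing from one system as scoring 0 there are all hypothetical, and the abstract does not say how scores are normalized before combination.

```python
from statistics import mean


def linear_combination(scores_by_system, weights):
    """Fuse systems by a weighted sum: fused[doc] = sum_i weights[i] * score_i(doc).

    scores_by_system: list of {doc_id: score} dicts, one per component system.
    weights: one weight per system, in the same order.
    Assumption: a document a system did not return contributes 0 for that system.
    """
    fused = {}
    for system_scores, weight in zip(scores_by_system, weights):
        for doc, score in system_scores.items():
            fused[doc] = fused.get(doc, 0.0) + weight * score
    return fused


def d_measure(scores, relevant):
    """d = mean score on relevant documents - mean score on nonrelevant documents."""
    rel = [s for doc, s in scores.items() if doc in relevant]
    non = [s for doc, s in scores.items() if doc not in relevant]
    return mean(rel) - mean(non)


# Hypothetical toy data: two systems, four documents, two of them relevant.
system_a = {"doc1": 0.9, "doc2": 0.4, "doc3": 0.1}
system_b = {"doc1": 0.6, "doc2": 0.2, "doc4": 0.8}
relevant = {"doc1", "doc4"}

fused = linear_combination([system_a, system_b], weights=[0.5, 0.5])
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking, d_measure(fused, relevant))
```

A larger d means the fused scores separate relevant from nonrelevant documents more widely, so under these assumptions candidate weightings could be compared by the d they produce on judged documents.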

Pages: 151-173

Page count: 22

