Fusion via a linear combination of scores

Cited by: 161
Authors
Vogt C.C. [1]
Cottrell G.W. [1]
Affiliation
[1] Computer Science and Engineering, University of California San Diego, San Diego
Source
Information Retrieval | 1999 / Vol. 1 / Issue 3
Keywords
Fusion; Linear combination; Neural networks; Performance evaluation; Routing
DOI
10.1023/A:1009980820262
Abstract
We present a thorough analysis of the capabilities of the linear combination (LC) model for fusion of information retrieval systems. The LC model combines the result lists of multiple IR systems by scoring each document using a weighted sum of the scores from each of the component systems. We first present both empirical and analytical justification for the hypothesis that such a model should only be used when the systems involved have high performance, a large overlap of relevant documents, and a small overlap of nonrelevant documents. The empirical approach allows us to predict the performance of a combined system very accurately. We also derive a formula for a theoretically optimal weighting scheme for combining two systems. We introduce d, the difference between the average score on relevant documents and the average score on nonrelevant documents, as a performance measure that not only allows mathematical reasoning about system performance but also allows the selection of weights that generalize well to new documents. We describe a number of experiments involving large numbers of different IR systems which support these findings. © 1999 Kluwer Academic Publishers.
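As an illustration of the two quantities named in the abstract, the following Python sketch computes an LC fused score as a weighted sum of component scores, and a d-style measure as the mean score on relevant documents minus the mean score on nonrelevant documents. It is not the authors' implementation. The function names, the dictionary representation of a result list, and the choice to treat a document missing from a system's list as contributing 0 are all assumptions made here, and any score normalization used in the paper is not reproduced.

```python
from typing import Dict, List, Set

Scores = Dict[str, float]  # document id -> retrieval score (assumed representation)


def linear_combination(runs: List[Scores], weights: List[float]) -> Scores:
    """Fuse several result lists by a weighted sum of per-system scores.

    Assumption: a document absent from a system's list contributes 0
    for that system.
    """
    if len(runs) != len(weights):
        raise ValueError("need exactly one weight per component system")
    fused: Scores = {}
    for run, weight in zip(runs, weights):
        for doc_id, score in run.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * score
    return fused


def mean_score_gap(scores: Scores, relevant: Set[str]) -> float:
    """Mean score on relevant documents minus mean score on nonrelevant ones.

    This follows the verbal definition of d in the abstract only.
    """
    rel = [s for doc_id, s in scores.items() if doc_id in relevant]
    non = [s for doc_id, s in scores.items() if doc_id not in relevant]
    if not rel or not non:
        raise ValueError("need at least one relevant and one nonrelevant document")
    return sum(rel) / len(rel) - sum(non) / len(non)


# Hypothetical toy data: two systems, three documents, d2 judged relevant.
system_a = {"d1": 1.0, "d2": 0.5}
system_b = {"d2": 1.0, "d3": 0.5}
fused = linear_combination([system_a, system_b], [1.0, 2.0])
gap = mean_score_gap(fused, {"d2"})
```

In this toy case the relevant document d2 is retrieved by both systems, so its fused score rises above those of the documents found by only one system, which is the overlap effect the abstract describes.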
Pages: 151-173
Page count: 22