A statistical interpretation of term specificity and its application in retrieval

Cited by: 40
Authors
Sparck-Jones, K [1]
Affiliation
[1] Univ Cambridge, Comp Lab, Cambridge CB2 3QG, England
Keywords
information research; information retrieval; information science and documentation
DOI
10.1108/00220410410560573
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The exhaustivity of document descriptions and the specificity of index terms are usually regarded as independent. It is suggested that specificity should be interpreted statistically, as a function of term use rather than of term meaning. The effects on retrieval of variations in term specificity are examined, experiments with three test collections showing, in particular, that frequently-occurring terms are required for good overall performance. It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms. Results for the test collections show that considerable improvements in performance are obtained with this very simple procedure.
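The weighting idea in the abstract can be illustrated with a minimal sketch. This is not taken from the record: the weight formula log2(N / n) + 1 (N documents in the collection, n of them containing the term), the function names `collection_frequency_weights` and `score`, and the toy collection are all assumptions chosen for illustration. The "+ 1" is an assumed choice so that matches on very frequent terms still count for something, in line with the abstract's remark that frequent terms are needed for good overall performance.

```python
import math
from collections import Counter


def collection_frequency_weights(docs):
    """Return {term: log2(N / n) + 1} for a list of tokenised documents.

    N is the number of documents and n is the number of documents that
    contain the term. The exact formula is an illustrative assumption.
    """
    n_docs = len(docs)
    # Count each term once per document (document frequency, not raw counts).
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log2(n_docs / n) + 1 for term, n in doc_freq.items()}


def score(query, doc, weights):
    """Sum the weights of the query terms that also occur in the document."""
    return sum(weights.get(term, 0.0) for term in set(query) & set(doc))


# Hypothetical toy collection: "retrieval" is frequent, the rest are rarer.
docs = [
    ["retrieval", "index", "term"],
    ["retrieval", "weight"],
    ["retrieval", "index", "frequency"],
    ["retrieval", "specificity"],
]
weights = collection_frequency_weights(docs)
```

With these weights, a match on a term found in one document out of four contributes three times as much to `score` as a match on a term found in every document, which is the behaviour the abstract argues for.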
Pages: 493-502
Page count: 10