Shortest-substring retrieval and ranking

Cited by: 21
Authors
Clarke, CLA
Cormack, G
Affiliations
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S 3G4, Canada
[2] Univ Waterloo, Dept Comp Sci, Waterloo, ON N2L 3G1, Canada
Keywords
algorithms; performance; Boolean retrieval model; passage retrieval; relevance ranking
DOI
10.1145/333135.333137
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present a model for arbitrary passage retrieval using Boolean queries. The model is applied to the task of ranking documents, or other structural elements, in the order of their expected relevance. Features such as phrase matching, truncation, and stemming integrate naturally into the model. Properties of Boolean algebra are obeyed, and the exact-match semantics of Boolean retrieval are preserved. Simple inverted-list file structures provide an efficient implementation. Retrieval effectiveness is comparable to that of standard ranking techniques. Since global statistics are not used, the method is of particular value in distributed environments. Since ranking is based on arbitrary passages, the structural elements to be ranked may be specified at query time and do not need to be restricted to predefined elements.
Pages: 44-78
Page count: 35