ON THE EFFICIENCY OF BEST-MATCH CLUSTER SEARCHES

Cited by: 5
Author
CAN, F
Affiliation
[1] Department of Systems Analysis, Miami University, Oxford
Keywords
DOI
10.1016/0306-4573(94)90049-3
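Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 [Computer science and technology];
Abstract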
The efficiency of various cluster-based retrieval (CBR) strategies is analyzed. The possibility of combining CBR and inverted index search (IIS) is investigated. A method for combining the two approaches is proposed and shown to be cost effective in terms of paging and CPU time. In the new method, the selection of documents from the best-matching clusters is done using the inverted index for all documents. Although this is counterintuitive to the concept of best-match CBR, the observations prove that it is much more efficient than conventional approaches. In the experiments, the effects of the number of selected clusters, page size, centroid length, and matching function are considered. The experiments show that the storage overhead of the new method would be moderately higher than that of IIS.
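The combined strategy described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: it assumes dot-product matching, centroids computed as the average of member document vectors, and small in-memory dictionaries, and every name in it (`build_inverted_index`, `build_centroids`, `combined_search`, the toy documents) is hypothetical. The query is first matched against cluster centroids to pick the best-matching clusters. Document scores are then accumulated through an inverted index built over all documents, and only documents belonging to the selected clusters are kept.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a posting list of (doc_id, weight) over ALL documents."""
    index = defaultdict(list)
    for doc_id, terms in docs.items():
        for term, weight in terms.items():
            index[term].append((doc_id, weight))
    return index


def build_centroids(docs, members):
    """Assumed centroid definition: average of the member document vectors."""
    centroids = {}
    for cluster_id, doc_ids in members.items():
        total = defaultdict(float)
        for doc_id in doc_ids:
            for term, weight in docs[doc_id].items():
                total[term] += weight
        centroids[cluster_id] = {t: w / len(doc_ids) for t, w in total.items()}
    return centroids


def dot(query, vector):
    """Dot-product matching function (an assumption; other functions are possible)."""
    return sum(w * vector.get(t, 0.0) for t, w in query.items())


def combined_search(query, centroids, members, index, n_clusters, n_docs):
    # Step 1: rank clusters by query-centroid similarity, keep the best ones.
    ranked = sorted(centroids, key=lambda c: dot(query, centroids[c]), reverse=True)
    allowed = {d for c in ranked[:n_clusters] for d in members[c]}

    # Step 2: accumulate document scores via the inverted index for all documents.
    scores = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, d_weight in index.get(term, ()):
            scores[doc_id] += q_weight * d_weight

    # Step 3: keep only documents that belong to the selected clusters.
    hits = [(d, s) for d, s in scores.items() if d in allowed]
    hits.sort(key=lambda pair: (-pair[1], pair[0]))
    return hits[:n_docs]


# Toy data, purely illustrative.
docs = {
    "d1": {"cluster": 1.0, "search": 1.0},
    "d2": {"cluster": 1.0, "index": 1.0},
    "d3": {"paging": 1.0, "cpu": 1.0},
    "d4": {"search": 1.0, "paging": 1.0},
}
members = {"c1": ["d1", "d2"], "c2": ["d3", "d4"]}
index = build_inverted_index(docs)
centroids = build_centroids(docs, members)

query = {"cluster": 1.0, "search": 1.0}
result = combined_search(query, centroids, members, index, n_clusters=1, n_docs=10)
print(result)  # [('d1', 2.0), ('d2', 1.0)]
```

In this toy run, `d4` matches the query term "search" through the inverted index but is filtered out because its cluster `c2` was not selected. The sketch only shows the control flow; it says nothing about the paging or CPU costs that the paper measures.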
Pages: 343-361
Page count: 19