INCREMENTAL CLUSTERING FOR DYNAMIC INFORMATION-PROCESSING

Cited by: 76
Author
CAN, F
Affiliation
[1] Department of Systems Analysis, Miami University, Oxford
Keywords
BEST-MATCH CLUSTER SEARCH; CLUSTER VALIDITY; COVER COEFFICIENT; DYNAMIC INFORMATION RETRIEVAL ENVIRONMENT; INFORMATION RETRIEVAL; INFORMATION RETRIEVAL EFFECTIVENESS; INFORMATION RETRIEVAL EFFICIENCY
DOI
10.1145/130226.134466
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Clustering of very large document databases is useful for both searching and browsing. The periodic updating of clusters is required due to the dynamic nature of databases. An algorithm for incremental clustering is introduced. The complexity and cost analysis of the algorithm together with an investigation of its expected behavior are presented. Through empirical testing it is shown that the algorithm achieves cost effectiveness and generates statistically valid clusters that are compatible with those of reclustering. The experimental evidence shows that the algorithm creates an effective and efficient retrieval environment.
Pages: 143-164
Page count: 22
Related papers
32 in total
[1] ANDERBERG MR, 1973, CLUSTER ANAL APPLICA
[2] BELKIN NJ, 1987, ANNU REV INFORM SCI, V22, P109
[3] CONCEPTS AND EFFECTIVENESS OF THE COVER-COEFFICIENT-BASED CLUSTERING METHODOLOGY FOR TEXT DATABASES [J]. CAN, F; OZKARAHAN, EA. ACM TRANSACTIONS ON DATABASE SYSTEMS, 1990, 15 (04): 483-517
[4] DYNAMIC CLUSTER MAINTENANCE [J]. CAN, F; OZKARAHAN, EA. INFORMATION PROCESSING & MANAGEMENT, 1989, 25 (03): 275-291
[5] CAN F, 1987, 10TH P ANN INT ACM S, P123
[6] CAN F, 1989, 1989 P CAN C EL COMP, P572
[7] CAN F, 1990, 1990 P S APPL COMP F, P61
[8] CAN F, 1991, 91002 MIAM U DEP SYS
[9] CAN F, UNPUB EFFICIENCY BES
[10] COMPARISON OF HIERARCHIC AGGLOMERATIVE CLUSTERING METHODS FOR DOCUMENT-RETRIEVAL [J]. ELHAMDOUCHI, A; WILLETT, P. COMPUTER JOURNAL, 1989, 32 (03): 220-227