CONCEPTS AND EFFECTIVENESS OF THE COVER-COEFFICIENT-BASED CLUSTERING METHODOLOGY FOR TEXT DATABASES

Cited by: 57
Authors
CAN, F [1]
OZKARAHAN, EA [1]
Affiliation
[1] PENN STATE UNIV,SCH BUSINESS,ERIE,PA 16563
Source
ACM TRANSACTIONS ON DATABASE SYSTEMS | 1990 / Vol. 15 / No. 4
Keywords
ALGORITHMS; DESIGN; PERFORMANCE; THEORY; VERIFICATION; CLUSTERING-INDEXING RELATIONSHIPS; CLUSTER VALIDITY; COVER COEFFICIENT; DECOUPLING COEFFICIENT; DOCUMENT RETRIEVAL; RETRIEVAL EFFECTIVENESS;
DOI
10.1145/99935.99938
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
A new algorithm for document clustering is introduced. The base concept of the algorithm, the cover coefficient (CC) concept, provides a means of estimating the number of clusters within a document database and relates indexing and clustering analytically. The CC concept is used also to identify the cluster seeds and to form clusters with these seeds. It is shown that the complexity of the clustering process is very low. The retrieval experiments show that the information-retrieval effectiveness of the algorithm is compatible with a very demanding complete linkage clustering method that is known to have good retrieval performance. The experiments also show that the algorithm is 15.1 to 63.5 (with an average of 47.5) percent better than four other clustering algorithms in cluster-based information retrieval. The experiments have validated the indexing-clustering relationships and the complexity of the algorithm and have shown improvements in retrieval effectiveness. In the experiments, two document databases are used: TODS214 and INSPEC. The latter is a common database with 12,684 documents.
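The abstract states that the cover coefficient concept gives an estimate of the number of clusters, but this record does not reproduce the formulas. The sketch below is a minimal illustration under the commonly cited definition, which is an assumption here and not confirmed by this record: for a document-by-term matrix D, c_ij = alpha_i * sum_k(d_ik * beta_k * d_jk), where alpha_i is the reciprocal of the i-th row sum and beta_k is the reciprocal of the k-th column sum; the diagonal entry c_ii is taken as the decoupling coefficient of document i, and the sum of the diagonal as the estimated cluster count. The function names and the toy matrix are hypothetical.

```python
def cover_coefficients(D):
    """Return the m x m cover-coefficient matrix for a document-by-term matrix D.

    Assumes (not stated in this record) that every document has at least one
    term and every term occurs in at least one document, so no sum is zero.
    """
    m, n = len(D), len(D[0])
    alpha = [1.0 / sum(row) for row in D]                        # reciprocal row sums
    beta = [1.0 / sum(D[i][k] for i in range(m)) for k in range(n)]  # reciprocal column sums
    return [
        [alpha[i] * sum(D[i][k] * beta[k] * D[j][k] for k in range(n))
         for j in range(m)]
        for i in range(m)
    ]


def estimate_cluster_count(D):
    """Sum of the diagonal entries c_ii (assumed decoupling coefficients)."""
    C = cover_coefficients(D)
    return sum(C[i][i] for i in range(len(C)))


# Toy example: documents 0 and 1 share both of their terms, document 2 stands alone.
toy = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
n_clusters = estimate_cluster_count(toy)
```

In this toy matrix the two identical documents each contribute 0.5 and the isolated document contributes 1.0, so the estimate is 2. Under this definition each row of the matrix sums to 1.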
Pages: 483-517
Number of pages: 35