COMPARISON OF HIERARCHIC AGGLOMERATIVE CLUSTERING METHODS FOR DOCUMENT-RETRIEVAL

Cited by: 71
Authors
ELHAMDOUCHI, A [1]
WILLETT, P [1]
Affiliation
[1] UNIV SHEFFIELD, DEPT INFORMAT STUDIES, SHEFFIELD S10 2TN, S YORKSHIRE, ENGLAND
Keywords
Data Processing--File Organization;
DOI
10.1093/comjnl/32.3.220
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
This paper considers the use of the single linkage, complete linkage, group average and Ward hierarchic agglomerative clustering methods for document retrieval. The methods are used to cluster seven document test collections for which queries and relevance judgements are available. Several retrieval strategies are described which allow searches to be carried out of the clustered document files resulting from the use of the four methods. These searches suggest that the group average method is the most suitable for document clustering purposes; however, searches of the unclustered document collections and of a simpler type of clustered file (based on pairs of nearest neighbours) usually result in better levels of retrieval effectiveness than searches of the clustered collections.
Pages: 220-227
Page count: 8
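
The four methods named in the abstract (single linkage, complete linkage, group average and Ward) can all be written as one agglomerative loop that differs only in how the dissimilarity to a newly merged cluster is updated. The sketch below uses the standard Lance-Williams recurrence for that update. It is an illustration only, not the authors' implementation: the function names `lance_williams` and `agglomerate`, the toy dissimilarity matrix, and the naive search for the closest pair are all assumptions made here, and the record above does not say which similarity coefficient or algorithm the paper used.

```python
def lance_williams(method, ni, nj, nk):
    """Coefficients (alpha_i, alpha_j, beta, gamma) for merging clusters i and j,
    as seen from another cluster k; ni, nj, nk are the cluster sizes."""
    if method == "single":
        return 0.5, 0.5, 0.0, -0.5
    if method == "complete":
        return 0.5, 0.5, 0.0, 0.5
    if method == "average":  # group average
        n = ni + nj
        return ni / n, nj / n, 0.0, 0.0
    if method == "ward":  # assumes squared Euclidean dissimilarities as input
        n = ni + nj + nk
        return (ni + nk) / n, (nj + nk) / n, -nk / n, 0.0
    raise ValueError("unknown method: " + method)


def agglomerate(matrix, method):
    """Naive agglomerative clustering of a symmetric dissimilarity matrix.
    Returns a list of (merged_members, merge_height), one entry per merge."""
    n = len(matrix)
    members = {i: (i,) for i in range(n)}
    # Pairwise dissimilarities, keyed by (smaller_id, larger_id).
    d = {(i, j): float(matrix[i][j]) for i in range(n) for j in range(i + 1, n)}

    def get(a, b):
        return d[(a, b) if a < b else (b, a)]

    merges = []
    next_id = n
    while len(members) > 1:
        # Pick the closest pair of current clusters.
        (i, j), height = min(d.items(), key=lambda kv: kv[1])
        # Dissimilarity from every other cluster k to the merged cluster.
        updated = {}
        for k in members:
            if k == i or k == j:
                continue
            ai, aj, beta, gamma = lance_williams(
                method, len(members[i]), len(members[j]), len(members[k]))
            dki, dkj = get(k, i), get(k, j)
            updated[k] = ai * dki + aj * dkj + beta * height + gamma * abs(dki - dkj)
        # Drop entries for the two old clusters and add the merged one.
        d = {p: v for p, v in d.items() if i not in p and j not in p}
        for k, v in updated.items():
            d[(k, next_id)] = v
        merged = tuple(sorted(members.pop(i) + members.pop(j)))
        members[next_id] = merged
        merges.append((merged, height))
        next_id += 1
    return merges


# Hypothetical toy data: four items at positions 0, 1, 3 and 7 on a line.
toy = [
    [0, 1, 3, 7],
    [1, 0, 2, 6],
    [3, 2, 0, 4],
    [7, 6, 4, 0],
]
single_merges = agglomerate(toy, "single")
complete_merges = agglomerate(toy, "complete")
```

On this toy matrix the four methods merge the items in the same order and differ only in the heights at which the merges occur. On real document collections the resulting hierarchies generally differ, which is what makes a comparison of retrieval effectiveness across the methods meaningful.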