An approach to discover and recommend cross-domain bridge-keywords in document banks

Cited by: 10
Authors
Su, Yu-Min [2]
Hsu, Ping-Yu [1]
Pai, Ning-Yao [3]
Affiliations
[1] Natl Cent Univ, Dept Business Adm, Tao Yuan, Taiwan
[2] Natl Chengchi Univ, Dept Comp Sci, Taipei 11623, Taiwan
[3] Natl Chiao Tung Univ, Inst Informat Management, Hsinchu, Taiwan
Keywords
Databases; Data handling; Information retrieval; Cluster analysis; AUTHOR COCITATION ANALYSIS; CLASSIFICATION; NETWORKS; SCIENCE; SEARCH; SPACE
DOI
10.1108/02640471011081951
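Chinese Library Classification number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract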
Purpose - Co-word analysis is commonly used to cluster related keywords into the same keyword domain. However, traditional co-word analysis cannot assign the same keyword to more than one keyword domain, and so disregards the multi-domain property of keywords. The purpose of this paper is to propose an innovative keyword co-citation approach called the "Complete Keyword Pair (CKP) method", which groups the complete keyword sets of reference papers into clusters and thus finds keywords belonging to more than one keyword domain, namely bridge-keywords.
Design/methodology/approach - The approach treats the complete author keywords of a paper as a complete keyword set when computing the relations among keywords. Any two complete keyword sets whose corresponding papers are co-referenced by the same paper are recorded as a CKP. The complete keyword sets are then clustered using the correlation matrix computed from the frequency counts of the CKPs. Since a keyword may appear in more than one complete keyword set, the same keyword may end up in different clusters.
Findings - The results show that the CKP method can discover bridge-keywords with an average precision of 80 per cent in the Journal of the Association for Computing Machinery citation bank for 2000-2006, when compared against the benchmark of the Association for Computing Machinery Computing Classification System.
Originality/value - Traditional co-word analysis focuses on the co-occurrence of keywords and therefore cannot cluster the same keyword into more than one keyword domain. The CKP approach considers the complete author keyword sets of reference papers in order to discover bridge-keywords. A keyword recommendation system based on CKP can therefore recommend keywords across multiple keyword domains via the bridge-keywords.
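The steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy papers, keyword sets and reference lists are hypothetical, the function names are invented for illustration, and the clustering step is a deliberate simplification. Instead of the correlation matrix and clustering method used in the paper (which the abstract does not specify), the sketch simply links two papers whenever their CKP count reaches an assumed threshold `min_count` and takes the connected groups as clusters.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy data: paper id -> complete author keyword set.
keyword_sets = {
    "A": {"clustering", "information retrieval"},
    "B": {"clustering", "search"},
    "C": {"clustering", "networks"},
    "D": {"networks", "citation analysis"},
}

# Hypothetical citing papers, each given as the list of papers it references.
reference_lists = [
    ["A", "B"],
    ["A", "B"],
    ["C", "D"],
    ["C", "D"],
    ["B", "C"],
]


def count_ckps(reference_lists):
    """Count how often two papers (and so their keyword sets) are co-referenced."""
    counts = Counter()
    for refs in reference_lists:
        for p, q in combinations(sorted(set(refs)), 2):
            counts[(p, q)] += 1
    return counts


def cluster_papers(papers, ckp_counts, min_count=2):
    """Simplified stand-in for the paper's clustering step (an assumption):
    papers joined by a CKP count >= min_count end up in the same cluster."""
    parent = {p: p for p in papers}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (p, q), n in ckp_counts.items():
        if n >= min_count:
            parent[find(p)] = find(q)

    groups = defaultdict(set)
    for p in papers:
        groups[find(p)].add(p)
    return sorted(groups.values(), key=sorted)


def bridge_keywords(clusters, keyword_sets):
    """Return keywords whose keyword sets fall into more than one cluster."""
    seen = defaultdict(set)
    for i, cluster in enumerate(clusters):
        for paper in cluster:
            for kw in keyword_sets[paper]:
                seen[kw].add(i)
    return {kw for kw, ids in seen.items() if len(ids) > 1}


ckp_counts = count_ckps(reference_lists)
clusters = cluster_papers(keyword_sets, ckp_counts, min_count=2)
print(clusters)
print(bridge_keywords(clusters, keyword_sets))
```

In this toy example the keyword "clustering" appears in keyword sets on both sides of the split, so it is reported as a bridge-keyword. Because whole keyword sets are clustered rather than individual keywords, a keyword can land in several clusters, which is the property the abstract contrasts with traditional co-word analysis.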
Pages: 669-687
Number of pages: 19