Optimal and hierarchical clustering of large-scale hybrid networks for scientific mapping

Cited by: 19
Authors
Liu, Xinhai [1 ,2 ]
Glanzel, Wolfgang [3 ,4 ]
De Moor, Bart [5 ,6 ]
Affiliations
[1] Peoples Bank China, Credit Reference Ctr, Dept Postdoctoral Res, Beijing 100800, Peoples R China
[2] Peoples Bank China, Financial Res Inst, Dept Postdoctoral Res, Beijing 100800, Peoples R China
[3] Katholieke Univ Leuven, Dept MSI, Ctr R&D Monitoring ECOOM, B-3000 Louvain, Belgium
[4] Hungarian Acad Sci, IRPS, Budapest, Hungary
[5] Katholieke Univ Leuven, ESAT SCD, B-3001 Louvain, Belgium
[6] Katholieke Univ Leuven, KU Leuven IBBT Future Hlth Dept, B-3001 Louvain, Belgium
Funding
National Natural Science Foundation of China;
Keywords
Optimal and hierarchical clustering; Text mining; Bibliometric analysis; Modularity optimization; Network analysis; COMMUNITY STRUCTURE; COMBINED COCITATION; WORD ANALYSIS; INFORMATION; SCIENCE;
DOI
10.1007/s11192-011-0600-x
Chinese Library Classification number
TP39 [Computer applications];
Subject classification code
081203 ; 0835 ;
Abstract
Previous studies have shown that hybrid clustering methods based on textual and citation information outperform clustering methods that use only one of these components. However, former methods focus on the vector space model. In this paper we apply a hybrid clustering method which is based on the graph model to map the Web of Science database in the mirror of the journals covered by the database. Compared with former hybrid clustering strategies, our method is very fast and even achieves better clustering accuracy. In addition, it detects the number of clusters automatically and provides a top-down hierarchical analysis, which fits in with the practical application. We quantitatively and qualitatively assess the added value of such an integrated analysis, and we investigate whether the clustering outcome provides an appropriate representation of the field structure by comparing it with a text-only or citation-only clustering and with another hybrid method based on a linear combination of distance matrices. Our dataset consists of about 8,000 journals published in the period 2002-2006. The cognitive analysis, including the ranked journals, term annotation and the visualization of cluster structure, demonstrates the efficiency of our strategy.
Pages: 473-493
Number of pages: 21