Hybrid clustering for validation and improvement of subject-classification schemes

Cited by: 96
Authors
Janssens, Frizo [1 ,2 ,3 ]
Zhang, Lin [1 ,4 ]
De Moor, Bart [3 ]
Glanzel, Wolfgang [1 ,5 ]
Affiliations
[1] Katholieke Univ Leuven, Ctr R&D Monitoring ECOOM, Dept MSI, Louvain, Belgium
[2] Attentio SA NV, B-1000 Brussels, Belgium
[3] Katholieke Univ Leuven, ESAT SCD, Louvain, Belgium
[4] Dalian Univ Technol, WISE Lab, Dalian, Peoples R China
[5] Hungarian Acad Sci, IRPS, Budapest, Hungary
Keywords
Subject classification; Journal cross-citation; Mapping of science; Hybrid clustering; COMBINING FULL-TEXT; COMBINED COCITATION; WORD ANALYSIS; SCIENCE; INFORMATION
DOI
10.1016/j.ipm.2009.06.003
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A hybrid text/citation-based method is used to cluster journals covered by the Web of Science database in the period 2002-2006. The objective is to use this clustering to validate and, if possible, to improve existing journal-based subject-classification schemes. Cross-citation links are determined on an item-by-paper procedure for individual papers assigned to the corresponding journal. Text mining for the textual component is based on the same principle; textual characteristics of individual papers are attributed to the journals in which they have been published. In a first step, the 22-field subject-classification scheme of the Essential Science Indicators (ESI) is evaluated and visualised. In a second step, the hybrid clustering method is applied to classify the approximately 8300 journals meeting the selection criteria concerning continuity, size and impact. The hybrid method proves superior to its two components when applied separately. The choice of 22 clusters also allows a direct field-to-cluster comparison, and we substantiate that the science areas resulting from cluster analysis form a more coherent structure than the "intellectual" reference scheme, the ESI subject scheme. Moreover, the textual component of the hybrid method allows labelling the clusters using cognitive characteristics, while the citation component allows visualising the cross-citation graph and determining representative journals suggested by the PageRank algorithm. Finally, the analysis of journal 'migration' allows the improvement of existing classification schemes on the basis of the concordance between fields and clusters. (C) 2009 Elsevier Ltd. All rights reserved.
Pages: 683-702
Page count: 20