Finding cohesive clusters for analyzing knowledge communities

Cited by: 19
Authors
Kandylas, Vasileios [1]
Upham, S. Phineas [2]
Ungar, Lyle H. [1]
Affiliations
[1] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
[2] Univ Penn, Wharton Sch, Philadelphia, PA 19104 USA
DOI
10.1007/s10115-008-0135-5
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Documents and authors can be clustered into "knowledge communities" based on the overlap in the papers they cite. We introduce a new clustering algorithm, Streemer, which finds cohesive foreground clusters embedded in a diffuse background, and use it to identify knowledge communities as foreground clusters of papers which share common citations. To analyze the evolution of these communities over time, we build predictive models with features based on the citation structure, the vocabulary of the papers, and the affiliations and prestige of the authors. Findings include that scientific knowledge communities tend to grow more rapidly if their publications build on diverse information and if they use a narrow vocabulary.
Pages: 335-354
Page count: 20