Tracking evolving communities in large linked networks

被引:167
作者
Hopcroft, J
Khan, O
Kulis, B
Selman, B [1 ]
机构
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
[2] Google Inc, Mountain View, CA 94043 USA
[3] Univ Texas, Dept Comp Sci, Austin, TX 78712 USA
关键词
D O I
10.1073/pnas.0307750100
中图分类号
O [数理科学和化学]; P [天文学、地球科学]; Q [生物科学]; N [自然科学总论];
学科分类号
07 ; 0710 ; 09 ;
摘要
We are interested in tracking changes in large-scale data by periodically creating an agglomerative clustering and examining the evolution of clusters (communities) over time. We examine a large real-world data set: the NEC CiteSeer database, a linked network of >250,000 papers. Tracking changes over time requires a clustering algorithm that produces clusters stable under small perturbations of the input data. However, small perturbations of the CiteSeer data lead to significant changes to most of the clusters. One reason for this is that the order in which papers within communities are combined is somewhat arbitrary. However, certain subsets of papers, called natural communities, correspond to real structure in the CiteSeer database and thus appear in any clustering. By identifying the subset of clusters that remain stable under multiple clustering runs, we get the set of natural communities that we can track over time. We demonstrate that such natural communities allow us to identify emerging communities and track temporal changes in the underlying structure of our network data.
引用
收藏
页码:5249 / 5253
页数:5
相关论文
共 24 条
[1]  
Adler RJ, 1998, PRACTICAL GUIDE TO HEAVY TAILS, P133
[2]  
Aggarwal CC, 2001, LECT NOTES COMPUT SC, V1973, P420
[3]  
[Anonymous], 2003, KDD 03
[4]  
Barabasi A.L., 2002, The formula: the universal laws of success
[5]   The simultaneous evolution of author and paper networks [J].
Börner, K ;
Maru, JT ;
Goldstone, RL .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2004, 101 :5266-5273
[6]   The anatomy of a large-scale hypertextual Web search engine [J].
Brin, S ;
Page, L .
COMPUTER NETWORKS AND ISDN SYSTEMS, 1998, 30 (1-7) :107-117
[7]  
COHEN W, 2000, P AS SCOMP MACH SPEC, V6, P255
[8]  
Duda R. O., 1973, PATTERN CLASSIFICATI
[9]  
ERDOS P, 1960, B INT STATIST INST, V38, P343
[10]  
FLAKE GW, 2000, P ASS COMP MACH SPEC, V6, P255