Tracking evolving communities in large linked networks

Cited by: 167
Authors
Hopcroft, J
Khan, O
Kulis, B
Selman, B [1]
Affiliations
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
[2] Google Inc, Mountain View, CA 94043 USA
[3] Univ Texas, Dept Comp Sci, Austin, TX 78712 USA
DOI
10.1073/pnas.0307750100
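Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract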
We are interested in tracking changes in large-scale data by periodically creating an agglomerative clustering and examining the evolution of clusters (communities) over time. We examine a large real-world data set: the NEC CiteSeer database, a linked network of >250,000 papers. Tracking changes over time requires a clustering algorithm that produces clusters stable under small perturbations of the input data. However, small perturbations of the CiteSeer data lead to significant changes to most of the clusters. One reason for this is that the order in which papers within communities are combined is somewhat arbitrary. However, certain subsets of papers, called natural communities, correspond to real structure in the CiteSeer database and thus appear in any clustering. By identifying the subset of clusters that remain stable under multiple clustering runs, we get the set of natural communities that we can track over time. We demonstrate that such natural communities allow us to identify emerging communities and track temporal changes in the underlying structure of our network data.
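The stability filter described in the abstract can be illustrated with a minimal sketch. It assumes the clustering runs have already been produced elsewhere and are given as lists of sets of paper IDs. The names `match_score` and `stable_clusters`, the overlap measure, and the thresholds `min_score` and `min_fraction` are hypothetical choices made for illustration; the abstract does not specify them.

```python
def match_score(a, b):
    """Overlap between two non-empty clusters: 1.0 if identical, 0.0 if disjoint.

    Assumed measure: the smaller of the two containment fractions.
    """
    common = len(a & b)
    return min(common / len(a), common / len(b))


def stable_clusters(reference, perturbed_runs, min_score=0.7, min_fraction=0.6):
    """Return the reference clusters that reappear in enough perturbed runs.

    A cluster "reappears" in a run if its best-matching cluster in that run
    scores at least min_score. Both thresholds are illustrative assumptions.
    """
    stable = []
    for cluster in reference:
        hits = sum(
            1
            for run in perturbed_runs
            if max((match_score(cluster, other) for other in run), default=0.0)
            >= min_score
        )
        if hits / len(perturbed_runs) >= min_fraction:
            stable.append(cluster)
    return stable


# Toy data: integers stand in for paper IDs; each run is one clustering
# of a slightly perturbed copy of the input.
reference = [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7}), frozenset({8, 9})]
perturbed_runs = [
    [frozenset({1, 2, 3}), frozenset({5, 6, 8}), frozenset({7, 9})],
    [frozenset({1, 2, 3, 4}), frozenset({5, 9}), frozenset({6, 7, 8})],
    [frozenset({2, 3, 4}), frozenset({5, 6, 7, 8}), frozenset({1, 9})],
]
natural = stable_clusters(reference, perturbed_runs)
print(natural)
```

In this toy input only the cluster {1, 2, 3, 4} finds a close match in every perturbed run, so it is the only one kept; the other two reference clusters are split or merged differently from run to run and are discarded.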
Pages: 5249-5253
Page count: 5