Clustering based on random graph model embedding vertex features

被引:48
作者
Zanghi, Hugo [1 ]
Volant, Stevenn [2 ]
Ambroise, Christophe [3 ]
机构
[1] Exalead, F-75008 Paris, France
[2] Agroparistech, UMR 518, F-75231 Paris, France
[3] INRA 1152, CNRS, UMR Stat & Genome 8071, F-91000 Evry, France
关键词
Variational EM algoritm; Graph clustering; Vertex features; EM ALGORITHM; LIKELIHOOD;
D O I
10.1016/j.patrec.2010.01.026
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Large datasets with interactions between objects are common to numerous scientific fields including the social sciences and biology, as well as being a feature of specific phenomena such as the internet. The interactions naturally define a graph, and a common way of exploring and summarizing such datasets is graph clustering. Most techniques for clustering graph vertices use only the topology of connections, while ignoring information about the vertices' features. In this paper we provide a clustering algorithm that harnesses both types of data, based on a statistical model with a latent structure characterizing each vertex both by a vector of features and by its connectivity. We perform simulations to compare our algorithm with existing approaches, and also evaluate our method using real datasets based on hypertext documents. We find that our algorithm successfully exploits whatever information is found both in the connectivity pattern and in the features. (c) 2010 Elsevier B.V. All rights reserved.
引用
收藏
页码:830 / 836
页数:7
相关论文
共 26 条
[1]  
Ambroise C, 1997, QUANT GEO G, V9, P493
[2]  
[Anonymous], 1977, INRIA
[3]  
[Anonymous], 1971, Journal of Mathematical Sociology, DOI 10.1080/0022250X.1971.9989788
[4]  
BEKKERMAN R, 2006, WEB PAGE CLUSTERING
[5]  
BERTIN P, 2002, Patent No. 1182581
[6]   Assessing a mixture model for clustering with the integrated completed likelihood [J].
Biernacki, C ;
Celeux, G ;
Govaert, G .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2000, 22 (07) :719-725
[7]  
Cook D.J., 2007, Mining graph data
[8]   A mixture model for random graphs [J].
Daudin, J. -J. ;
Picard, F. ;
Robin, S. .
STATISTICS AND COMPUTING, 2008, 18 (02) :173-183
[9]  
DAVISON BD, 2000, ACM SIGIR C RES DEV, P272
[10]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38