Comparison and validation of community structures in complex networks

被引:39
作者
Gustafsson, Mika [1 ]
Hornquist, Michael [1 ]
Lombardi, Anna [1 ]
机构
[1] Linkoping Univ, Dept Sci & Technol, Div Phys & Elect, SE-60174 Norrkoping, Sweden
关键词
network; community; validation; distance measure; hierarchical clustering; K-means; GO; CLUSTER; ORGANIZATION;
D O I
10.1016/j.physa.2005.12.017
中图分类号
O4 [物理学];
学科分类号
070305 [高分子化学与物理];
摘要
The issue of partitioning a network into communities has attracted a great deal of attention recently. Most authors seem to equate this issue with the one of finding the maximum value of the modularity, as defined by Newman. Since the problem formulated this way is believed to be NP-hard, most effort has gone into the construction of search algorithms, and less to the question of other measures of community structures, similarities between various partitionings and the validation with respect to external information. Here we concentrate on a class of computer generated networks and on three well-studied real networks which constitute a bench-mark for network studies; the karate club, the US college football teams and a gene network of yeast. We utilize some standard ways of clustering data (originally not designed for finding community structures in networks) and show that these classical methods sometimes outperform the newer ones. We discuss various measures of the strength of the modular structure, and show by examples features and drawbacks. Further, we compare different partitions by applying some graph-theoretic concepts of distance, which indicate that one of the quality measures of the degree of modularity corresponds quite well with the distance from the true partition. Finally, we introduce a way to validate the partitionings with respect to external data when the nodes are classified but the network structure is unknown. This is here possible since we know everything of the computer generated networks, as well as the historical answer to how the karate club and the football teams are partitioned in reality. The partitioning of the gene network is validated by use of the Gene Ontology database, where we show that a community in general corresponds to a biological process. (c) 2006 Elsevier B.V. All rights reserved.
引用
收藏
页码:559 / 576
页数:18
相关论文
共 31 条
[1]
Gene Ontology: tool for the unification of biology [J].
Ashburner, M ;
Ball, CA ;
Blake, JA ;
Botstein, D ;
Butler, H ;
Cherry, JM ;
Davis, AP ;
Dolinski, K ;
Dwight, SS ;
Eppig, JT ;
Harris, MA ;
Hill, DP ;
Issel-Tarver, L ;
Kasarskis, A ;
Lewis, S ;
Matese, JC ;
Richardson, JE ;
Ringwald, M ;
Rubin, GM ;
Sherlock, G .
NATURE GENETICS, 2000, 25 (01) :25-29
[2]
A cluster validity framework for genome expression data [J].
Azuaje, F .
BIOINFORMATICS, 2002, 18 (02) :319-320
[3]
Cluster validation techniques for genome expression data [J].
Bolshakova, N ;
Azuaje, F .
SIGNAL PROCESSING, 2003, 83 (04) :825-833
[4]
Comparing community structure identification -: art. no. P09008 [J].
Danon, L ;
Díaz-Guilera, A ;
Duch, J ;
Arenas, A .
JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT, 2005, :219-228
[5]
Community detection in complex networks using extremal optimization [J].
Duch, J ;
Arenas, A .
PHYSICAL REVIEW E, 2005, 72 (02)
[6]
Complex networks [J].
Evans, TS .
CONTEMPORARY PHYSICS, 2004, 45 (06) :455-474
[7]
Iterative optimization and simplification of hierarchical clusterings [J].
Fisher, D .
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 1996, 4 :147-179
[8]
Community structure in social and biological networks [J].
Girvan, M ;
Newman, MEJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2002, 99 (12) :7821-7826
[9]
Guimerà R, 2004, PHYS REV E, V70, DOI 10.1103/PhysRevE.70.025101
[10]
Partition-distance: A problem and class of perfect graphs arising in clustering [J].
Gusfield, D .
INFORMATION PROCESSING LETTERS, 2002, 82 (03) :159-164