Graph clustering via a discrete uncoupling process

被引:400
作者
Van Dongen, Stijn [1 ]
机构
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
关键词
stochastic uncoupling; graph clustering; Markov graph; Markov matrix; diagonal similarity; positive semi-definite matrices; circulant matrices;
D O I
10.1137/040608635
中图分类号
O29 [应用数学];
学科分类号
070104 ;
摘要
A discrete uncoupling process for finite spaces is introduced, called the Markov Cluster Process or the MCL process. The process is the engine for the graph clustering algorithm called the MCL algorithm. The MCL process takes a stochastic matrix as input, and then alternates expansion and inflation, each step de. ning a stochastic matrix in terms of the previous one. Expansion corresponds with taking the kth power of a stochastic matrix, where k epsilon N. Inflation corresponds with a parametrized operator Gamma(r), r >= 0, that maps the set of (column) stochastic matrices onto itself. The image Gamma M-r is obtained by raising each entry in M to the rth power and rescaling each column to have sum 1 again. In practice the process converges very fast towards a limit that is invariant under both matrix multiplication and inflation, with quadratic convergence around the limit points. The heuristic behind the process is its expected behavior for (Markov) graphs possessing cluster structure. The process is typically applied to the matrix of random walks on a given graph G, and the connected components of (the graph associated with) the process limit generically allow a clustering interpretation of G. The limit is in general extremely sparse and iterands are sparse in a weighted sense, implying that the MCL algorithm is very fast and highly scalable. Several mathematical properties of the MCL process are established. Most notably, the process (and algorithm) iterands posses structural properties generalizing the mapping from process limits onto clusterings. The inflation operator Gr maps the class of matrices that are diagonally similar to a symmetric matrix onto itself. The phrase diagonally positive semi-definite (dpsd) is used for matrices that are diagonally similar to a positive semi-definite matrix. For r. N and for M a stochastic dpsd matrix, the image GrM is again dpsd. Determinantal inequalities satisfied by a dpsd matrix M imply a natural ordering among the diagonal elements of M, generalizing the mapping of process limits onto clusterings. The spectrum of Gamma(infinity) M is of the form {0(n-k), 1(k)}, where k is the number of endclasses of the ordering associated with M, and n is the dimension of M. This attests to the uncoupling effect of the inflation operator.
引用
收藏
页码:121 / 141
页数:21
相关论文
共 62 条
[1]   LAMBDA-1, ISOPERIMETRIC-INEQUALITIES FOR GRAPHS, AND SUPERCONCENTRATORS [J].
ALON, N ;
MILMAN, VD .
JOURNAL OF COMBINATORIAL THEORY SERIES B, 1985, 38 (01) :73-88
[2]  
[Anonymous], 1971, WILEY SERIES PROBABI
[3]  
[Anonymous], AMS C PUBLICATIONS
[4]  
[Anonymous], FILOMAT
[5]  
BERMAN A, 1994, APPL MATH, V9
[6]  
BHATIA R, 1997, GRADUATE TEXTS MATH, P169
[7]   Evaluation of clustering algorithms for protein-protein interaction networks [J].
Brohee, Sylvain ;
van Helden, Jacques .
BMC BIOINFORMATICS, 2006, 7 (1)
[8]   HILBERTS METRIC AND POSITIVE CONTRACTION MAPPINGS IN A BANACH-SPACE [J].
BUSHELL, PJ .
ARCHIVE FOR RATIONAL MECHANICS AND ANALYSIS, 1973, 52 (04) :330-338
[9]  
Chen YJ, 2005, NUCLEIC ACIDS RES, V33, pD169
[10]  
COHEN JE, 1979, MATH P CAMBRIDGE PHI, V86, P351