Correlation clustering in general weighted graphs

被引:182
作者
Demaine, Erik D.
Emanuel, Dotan
Fiat, Amos
Immorlica, Nicole
机构
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[2] Tel Aviv Univ, Sch Math Sci, Dept Comp Sci, IL-69978 Tel Aviv, Israel
关键词
clustering; partitioning; approximation algorithms; minimum multicut;
D O I
10.1016/j.tcs.2006.05.008
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We consider the following general correlation-clustering problem [N. Bansal, A. Blum, S. Chawla, Correlation clustering, in: Proc. 43rd Annu. IEEE Symp. on Foundations of Computer Science, Vancouver, Canada, November 2002, pp. 238-250]: given a graph with real nonnegative edge weights and a (+)/(-) edge labelling, partition the vertices into clusters to minimize the total weight of cut (+) edges and uncut (-) edges. Thus, (+) edges with large weights (representing strong correlations between endpoints) encourage those endpoints to belong to a common cluster while (-) edges with large weights encourage the endpoints to belong to different clusters. In contrast to most clustering problems, correlation clustering specifies neither the desired number of clusters nor a distance threshold for clustering; both of these parameters are effectively chosen to be the best possible by the problem definition. Correlation clustering was introduced by Bansal et al. [Correlation clustering, in: Proc. 43rd Annu. IEEE Symp. on Foundations of Computer Science, Vancouver, Canada, November 2002, pp. 238-250], motivated by both document clustering and agnostic learning. They proved NP-hardness and gave constant-factor approximation algorithms for the special case in which the graph is complete (full information) and every edge has the same weight. We give an 0 (log n) -approximation algorithm for the general case based on a linear-programming rounding and the "region-growing" technique. We also prove that this linear program has a gap of Omega(log n), and therefore our approximation is tight under this approach. We also give an O(r(3))-approximation algorithm for K-r,K-r-minor-free graphs. On the other hand, we show that the problem is equivalent to minimum multicut, and therefore APX-hard and difficult to approximate better than Theta(log n). (C) 2006 Elsevier B.V. All rights reserved.
引用
收藏
页码:172 / 187
页数:16
相关论文
共 30 条
[1]  
Bansal N, 2002, ANN IEEE SYMP FOUND, P238, DOI 10.1109/SFCS.2002.1181947
[2]  
BEJERANO Y, 2003, P 9 ANN INT C MOB CO, P109
[3]   Maximizing quadratic programs: extending Grothendieck's inequality [J].
Charikar, M ;
Wirth, A .
45TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2004, :54-60
[4]   Clustering with qualitative information [J].
Charikar, M ;
Guruswami, V ;
Wirth, A .
44TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2003, :524-533
[5]   THE COMPLEXITY OF MULTITERMINAL CUTS [J].
DAHLHAUS, E ;
JOHNSON, DS ;
PAPADIMITRIOOU, CH ;
SEYMOUR, PD ;
YANNAKAKIS, M .
SIAM JOURNAL ON COMPUTING, 1994, 23 (04) :864-894
[6]  
DEMAINE ED, 2003, P 6 INT WORKSH APPR, P1
[7]  
EMANUEL D, 2003, P 11 ANN EUR S ALG, P208
[8]  
Ester M., 1998, KI, V12, P18
[9]   Approximate max-flow min-(multi)cut theorems and their applications [J].
Garg, N ;
Vazirani, VV ;
Yannakakis, M .
SIAM JOURNAL ON COMPUTING, 1996, 25 (02) :235-251
[10]   Primal-dual approximation algorithms for integral flow and multicut in trees [J].
Garg, N ;
Vazirani, VV ;
Yannakakis, M .
ALGORITHMICA, 1997, 18 (01) :3-20