Correlation Clustering

被引:20
作者
Nikhil Bansal
Avrim Blum
Shuchi Chawla
机构
[1] Carnegie Mellon University,Computer Science Department
来源
Machine Learning | 2004年 / 56卷
关键词
clustering; approximation algorithm; document classification;
D O I
暂无
中图分类号
学科分类号
摘要
We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or − depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters, plus the number of − edges between clusters (equivalently, minimizes the number of disagreements: the number of − edges inside clusters plus the number of + edges between clusters). This formulation is motivated from a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of “agnostic learning” problem.
引用
收藏
页码:89 / 113
页数:24
相关论文
共 26 条
  • [1] Alon N.(2000)Efficient testing of large graphs Combinatorica 20 451-476
  • [2] Fischer E.(2002)A new rounding procedure for the assignment problem with applications to dense graph arrangements Mathematical Programming 92 1-36
  • [3] Krivelevich M.(1999)Polynomial time approximation schemes for dense instances of NP-Hard problems JCSS 58 193-210
  • [4] Szegedy M.(1999)Algorithms for graph partitioning on the planted partition model Random Structures and Algorithms 18 116-140
  • [5] Arora S.(1996)MAX-CUT has a randomized approximation scheme in dense graphs Random Structures and Algorithms 8 187-198
  • [6] Frieze A.(1998)Property testing and its connection to learning and approximation JACM 45 653-750
  • [7] Kaplan H.(1986)A unified approach to approximation algorithms for bottleneck problems JACM 33 533-550
  • [8] Arora S.(2001)Approximation algorithms for metric facility location and k-Median problems using the primal-dual schema and Lagrangian relaxation JACM 48 274-296
  • [9] Karger D.(1998)Efficient noise-tolerant learning from statistical queries JACM 45 983-1006
  • [10] Karpinski M.(1994)Toward efficient agnostic learning Machine Learning 17 115-142