Correlation Clustering

被引:20
作者
Nikhil Bansal
Avrim Blum
Shuchi Chawla
机构
[1] Carnegie Mellon University,Computer Science Department
来源
Machine Learning | 2004年 / 56卷
关键词
clustering; approximation algorithm; document classification;
D O I
暂无
中图分类号
学科分类号
摘要
We consider the following clustering problem: we have a complete graph on n vertices (items), where each edge (u, v) is labeled either + or − depending on whether u and v have been deemed to be similar or different. The goal is to produce a partition of the vertices (a clustering) that agrees as much as possible with the edge labels. That is, we want a clustering that maximizes the number of + edges within clusters, plus the number of − edges between clusters (equivalently, minimizes the number of disagreements: the number of − edges inside clusters plus the number of + edges between clusters). This formulation is motivated from a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of “agnostic learning” problem.
引用
收藏
页码:89 / 113
页数:24
相关论文
共 26 条
  • [11] Condon A.(2002)Testing the diameter of graphs Random Structures and Algorithms 20 165-183
  • [12] Karp R.(undefined)undefined undefined undefined undefined-undefined
  • [13] de la Vega F.(undefined)undefined undefined undefined undefined-undefined
  • [14] Goldreich O.(undefined)undefined undefined undefined undefined-undefined
  • [15] Goldwasser S.(undefined)undefined undefined undefined undefined-undefined
  • [16] Ron D.(undefined)undefined undefined undefined undefined-undefined
  • [17] Hochbaum D.(undefined)undefined undefined undefined undefined-undefined
  • [18] Shmoys D.(undefined)undefined undefined undefined undefined-undefined
  • [19] Jain K.(undefined)undefined undefined undefined undefined-undefined
  • [20] Vazirani V.(undefined)undefined undefined undefined undefined-undefined