On weighting clustering

Cited by: 168

Authors:

Nock, Richard

Nielsen, Frank

Affiliations:

[1] Univ Antilles Guyane, GRIMAAG Lab, Dept Sci Interfac, F-97278 Schoelcher, Martinique, France

[2] Sony Comp Sci Labs Inc, Shinagawa Ku, Tokyo 1410022, Japan

Source:

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2006 / Vol. 28 / No. 8

Keywords:

clustering; Bregman divergences; k-means; fuzzy k-means; expectation maximization; harmonic means clustering

DOI:

10.1109/TPAMI.2006.168

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Recent papers and patents in iterative unsupervised learning have emphasized a new trend in clustering. It basically consists of penalizing solutions via weights on the instance points, somehow making clustering move toward the hardest points to cluster. The motivations come principally from an analogy with powerful supervised classification methods known as boosting algorithms. However, interest in this analogy has so far been mainly borne out from experimental studies only. This paper is, to the best of our knowledge, the first attempt at its formalization. More precisely, we handle clustering as a constrained minimization of a Bregman divergence. Weight modifications rely on the local variations of the expected complete log-likelihoods. Theoretical results show benefits resembling those of boosting algorithms and bring modified (weighted) versions of clustering algorithms such as k-means, fuzzy c-means, Expectation Maximization (EM), and k-harmonic means. Experiments are provided for all these algorithms, with a readily available code. They display the advantages that subtle data reweighting may bring to clustering.
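The general idea of reweighting instance points during clustering can be illustrated with a minimal sketch. It assumes squared Euclidean distance (one example of a Bregman divergence) and a hypothetical boosting-style multiplicative weight update. The function names, the `eta` parameter, and the update rule itself are illustrative assumptions; they are not the update rule derived in the paper, which this record does not reproduce.

```python
import math


def sq_euclidean(x, c):
    """Squared Euclidean distance, the Bregman divergence generated by ||x||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def weighted_kmeans(points, centers, n_iter=10, eta=1.0):
    """Illustrative reweighted k-means (assumed update rule, not the paper's)."""
    n = len(points)
    weights = [1.0 / n] * n          # start from uniform weights
    centers = [list(c) for c in centers]
    labels = [0] * n
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: sq_euclidean(p, centers[j]))
            for p in points
        ]
        # Update step: each center becomes the weighted mean of its members.
        for j in range(len(centers)):
            members = [i for i in range(n) if labels[i] == j]
            total = sum(weights[i] for i in members)
            if total > 0:
                dim = len(centers[j])
                centers[j] = [
                    sum(weights[i] * points[i][d] for i in members) / total
                    for d in range(dim)
                ]
        # Reweighting step (assumption): points with larger distortion
        # receive multiplicatively larger weight, then weights are normalized.
        losses = [sq_euclidean(points[i], centers[labels[i]]) for i in range(n)]
        max_loss = max(losses) or 1.0  # avoid dividing by zero
        raw = [weights[i] * math.exp(eta * losses[i] / max_loss) for i in range(n)]
        z = sum(raw)
        weights = [r / z for r in raw]
    return centers, labels, weights
```

A call such as `weighted_kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])` returns the final centers, the cluster labels, and the normalized point weights. Setting `eta=0` leaves the weights uniform, so the sketch reduces to ordinary k-means.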

Pages: 1223-1235

Page count: 13
