Feature weighting in k-means clustering

Cited by: 292

Authors:

Modha, DS [1]

Spangler, WS [1]

Affiliation:

[1] IBM Corp, Almaden Res Ctr, San Jose, CA 95120 USA

Source:

MACHINE LEARNING | 2003, Vol. 52, No. 3

Keywords:

clustering; convexity; convex k-means algorithm; feature combination; feature selection; Fisher's discriminant analysis; text mining; unsupervised learning;

DOI:

10.1023/A:1024016609528

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Data sets with multiple, heterogeneous feature spaces occur frequently. We present an abstract framework for integrating multiple feature spaces in the k-means clustering algorithm. Our main ideas are (i) to represent each data object as a tuple of multiple feature vectors, (ii) to assign a suitable (and possibly different) distortion measure to each feature space, (iii) to combine distortions on different feature spaces, in a convex fashion, by assigning (possibly) different relative weights to each, (iv) for a fixed weighting, to cluster using the proposed convex k-means algorithm, and (v) to determine the optimal feature weighting to be the one that yields the clustering that simultaneously minimizes the average within-cluster dispersion and maximizes the average between-cluster dispersion along all the feature spaces. Using precision/recall evaluations and known ground truth classifications, we empirically demonstrate the effectiveness of feature weighting in clustering on several different application domains.
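
As a rough illustration of steps (i) to (v), the following Python sketch clusters objects described by several feature spaces and then picks a weighting by grid search. It is only a sketch under stated assumptions: every feature space uses squared Euclidean distortion, the grid search handles exactly two feature spaces, and the selection score (within-cluster over between-cluster dispersion, multiplied across spaces) is one plausible reading of the abstract rather than the paper's exact definition. All function names are hypothetical.

```python
import numpy as np

def weighted_distortion(spaces, centroids, alpha):
    """Convex combination of per-space squared Euclidean distortions.

    spaces: list of (n, d_l) arrays, one per feature space.
    centroids: list of (k, d_l) arrays, one per feature space.
    alpha: non-negative weights, assumed to sum to 1.
    Returns an (n, k) matrix of combined distortions.
    """
    total = 0.0
    for a, X, C in zip(alpha, spaces, centroids):
        diff = X[:, None, :] - C[None, :, :]
        total = total + a * np.sum(diff ** 2, axis=2)
    return total

def convex_kmeans(spaces, k, alpha, n_iter=50, seed=0):
    """k-means on tuples of feature vectors for one fixed weighting."""
    spaces = [np.asarray(X, dtype=float) for X in spaces]
    rng = np.random.default_rng(seed)
    n = spaces[0].shape[0]
    init = rng.choice(n, size=k, replace=False)
    centroids = [X[init].copy() for X in spaces]
    labels = np.full(n, -1)
    for _ in range(n_iter):
        new_labels = np.argmin(weighted_distortion(spaces, centroids, alpha), axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = labels == j
            if members.any():  # an empty cluster keeps its old centroid
                for C, X in zip(centroids, spaces):
                    C[j] = X[members].mean(axis=0)
    return labels, centroids

def dispersion_ratio(spaces, labels, centroids):
    """Assumed score: product over spaces of within / between dispersion."""
    idx = np.arange(len(labels))
    score = 1.0
    for X, C in zip(spaces, centroids):
        X = np.asarray(X, dtype=float)
        D = np.sum((X[:, None, :] - C[None, :, :]) ** 2, axis=2)
        within = D[idx, labels].sum()   # distortion to the assigned centroid
        between = D.sum() - within      # distortion to all other centroids
        score *= within / between
    return score

def best_weighting(spaces, k, grid):
    """Grid search over (a, 1 - a) for exactly two feature spaces."""
    results = []
    for a in grid:
        alpha = (a, 1.0 - a)
        labels, centroids = convex_kmeans(spaces, k, alpha)
        results.append((dispersion_ratio(spaces, labels, centroids), alpha, labels))
    return min(results, key=lambda r: r[0])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Space 1 carries two separated groups; space 2 is unstructured noise.
    space1 = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(10, 0.5, (20, 2))])
    space2 = rng.normal(0, 1.0, (40, 3))
    score, alpha, labels = best_weighting([space1, space2], k=2, grid=[0.1, 0.5, 0.9])
    print("chosen weights:", alpha, "score:", score)
```

A grid over the weights is used here only because it is the simplest way to compare fixed weightings; with more than two feature spaces the grid would have to cover the whole weight simplex.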

Pages: 217-237

Page count: 21

