An entropy weighting mixture model for subspace clustering of high-dimensional data

Cited by: 17
Authors
Peng, Liuqing [1]
Zhang, Junying [1]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Subspace clustering; High-dimensional data; Gaussian mixture models; Local feature relevance; Shape volume;
DOI
10.1016/j.patrec.2011.03.003
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In high-dimensional data, clusters of objects usually exist in subspaces; in addition, different clusters are likely to have different shape volumes. Most existing methods for high-dimensional data clustering, however, consider only the former factor and ignore the latter by assuming the same shape volume for all clusters. In this paper we propose a new Gaussian mixture model (GMM) type algorithm for discovering clusters with various shape volumes in subspaces. We extend the GMM clustering method to calculate a local weight vector as well as a local variance within each cluster, and use the weight and variance values to capture the main properties that discriminate between clusters, including subsets of relevant dimensions and shape volumes. This is achieved by introducing the negative entropy of the weight vectors, along with adaptively chosen coefficients, into the objective function of the extended GMM. Experimental results on both synthetic and real datasets show that the proposed algorithm outperforms its competitors, especially when applied to high-dimensional datasets. © 2011 Elsevier B.V. All rights reserved.
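The role of the negative-entropy term can be illustrated with a minimal sketch. It is not the paper's update rule: it uses the standard closed form from entropy-weighted clustering, where minimizing sum_j w_j * D_j + gamma * sum_j w_j * log(w_j) subject to sum_j w_j = 1 gives w_j proportional to exp(-D_j / gamma). The function names, the fixed coefficient `gamma`, and the toy data are assumptions made for illustration; the abstract states that the paper's coefficients are chosen adaptively.

```python
import numpy as np

def cluster_dispersion(X, resp, mean):
    """Responsibility-weighted squared deviation of one cluster, per dimension."""
    diff2 = (X - mean) ** 2
    return (resp[:, None] * diff2).sum(axis=0) / resp.sum()

def entropy_weights(dispersion, gamma):
    """Closed-form minimizer of sum_j w_j*D_j + gamma*sum_j w_j*log(w_j), sum_j w_j = 1.

    gamma is a fixed, hypothetical coefficient here (assumption, not adaptive).
    """
    d = np.asarray(dispersion, dtype=float)
    # Subtracting the minimum leaves the normalized result unchanged
    # and avoids underflow for large dispersions.
    w = np.exp(-(d - d.min()) / gamma)
    return w / w.sum()

# Toy cluster: dimensions 0 and 1 are compact (relevant), 2 and 3 are spread out.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([0.1, 0.1, 3.0, 3.0])
resp = np.ones(200)  # every point fully assigned to this single cluster
D = cluster_dispersion(X, resp, X.mean(axis=0))
w = entropy_weights(D, gamma=1.0)
```

In this sketch the compact dimensions receive nearly all of the weight, which is the sense in which a local weight vector identifies a cluster's relevant subspace. A larger `gamma` flattens the weights toward uniform.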
Pages: 1154-1161
Number of pages: 8