An entropy weighting mixture model for subspace clustering of high-dimensional data

Cited by: 17
Authors
Peng, Liuqing [1]
Zhang, Junying [1]
Affiliation
[1] Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Subspace clustering; High-dimensional data; Gaussian mixture models; Local feature relevance; Shape volume;
DOI
10.1016/j.patrec.2011.03.003
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In high-dimensional data, clusters of objects usually exist in subspaces; in addition, different clusters are likely to have different shape volumes. Most existing methods for high-dimensional data clustering, however, consider only the former factor and ignore the latter by assuming the same shape volume for all clusters. In this paper we propose a new Gaussian mixture model (GMM) type algorithm for discovering clusters with various shape volumes in subspaces. We extend the GMM clustering method to calculate a local weight vector as well as a local variance within each cluster, and use the weight and variance values to capture the main properties that discriminate between clusters, including subsets of relevant dimensions and shape volumes. This is achieved by introducing the negative entropy of the weight vectors, along with adaptively chosen coefficients, into the objective function of the extended GMM. Experimental results on both synthetic and real datasets show that the proposed algorithm outperforms its competitors, especially when applied to high-dimensional datasets. © 2011 Elsevier B.V. All rights reserved.
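The abstract does not give the update formulas, so the following is only a minimal sketch of the general idea of entropy-regularized local feature weights, not the paper's algorithm. It assumes the standard result that minimizing sum_j w_j*D_j + gamma * sum_j w_j*log(w_j) subject to sum_j w_j = 1 gives w_j proportional to exp(-D_j / gamma). The function names, the toy cluster, and the fixed coefficient `gamma` are hypothetical; the paper chooses its coefficients adaptively and also estimates a local variance, neither of which is modelled here.

```python
import math


def per_dimension_dispersion(points, center):
    """Mean squared deviation from the center, computed separately per dimension."""
    n = len(points)
    return [
        sum((p[j] - center[j]) ** 2 for p in points) / n
        for j in range(len(center))
    ]


def entropy_weights(dispersions, gamma):
    """Closed-form minimizer of sum_j w_j*D_j + gamma*sum_j w_j*log(w_j), sum_j w_j = 1.

    gamma is a hypothetical fixed coefficient (assumption, not from the paper).
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # Shift by the smallest dispersion so the largest exponent is 0 (numerical stability).
    d_min = min(dispersions)
    unnormalized = [math.exp(-(d - d_min) / gamma) for d in dispersions]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical toy cluster: compact in dimensions 0 and 1, widely spread in dimension 2.
cluster = [(1.0, 2.0, -5.0), (1.1, 2.1, 4.0), (0.9, 1.9, 0.5), (1.0, 2.0, 9.0)]
center = [sum(p[j] for p in cluster) / len(cluster) for j in range(3)]
dispersions = per_dimension_dispersion(cluster, center)
weights = entropy_weights(dispersions, gamma=1.0)
print(weights)
```

In this sketch the two compact dimensions receive nearly all of the weight and the spread-out dimension receives almost none, which is the sense in which a local weight vector can mark the subset of dimensions relevant to a cluster. A larger `gamma` pushes the weights toward uniform; a smaller one concentrates them on the lowest-dispersion dimensions.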
Pages: 1154-1161
Number of pages: 8