基于自适应Nystrm采样的大数据谱聚类算法

被引：52

作者：

丁世飞 ^{[1
,2
]}

贾洪杰 ^{[1
,2
]}

史忠植 ^{[2
]}

机构：

[1] 中国矿业大学计算机科学与技术学院

[2] 中国科学院计算技术研究所智能信息处理重点实验室

来源：

软件学报 | 2014年 / 25卷 / 09期

关键词：

大数据; 谱聚类; 特征分解; Nystrm扩展; 自适应采样;

D O I：

10.13328/j.cnki.jos.004643

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

面对结构复杂的数据集,谱聚类是一种灵活而有效的聚类方法,它基于谱图理论,通过将数据点映射到一个由特征向量构成的低维空间,优化数据的结构,得到令人满意的聚类结果.但在谱聚类的过程中,特征分解的计算复杂度通常为O(n3),限制了谱聚类算法在大数据中的应用.Nystrm扩展方法利用数据集中的部分抽样点,进行近似计算,逼近真实的特征空间,可以有效降低计算复杂度,为大数据谱聚类算法提供了新思路.抽样策略的选择对Nystrm扩展技术至关重要,设计了一种自适应的Nystrm采样方法,每个数据点的抽样概率都会在一次采样完成后及时更新,而且从理论上证明了抽样误差会随着采样次数的增加呈指数下降.基于自适应的Nystrm采样方法,提出一种适用于大数据的谱聚类算法,并对该算法的可行性和有效性进行了实验验证.

引用

页码：2037 / 2049

页数：13

共 14 条

[1] Large scale spectral clustering with landmark-based representation. Chen XL,Deng C. Proc.of the 25th AAAI Conf.on Artificial Intelligence . 2011
[2] Power iteration clustering. Lin F,Cohen WW. Proc.of the Int’’l Conf.on Machine Learning . 2010
[3] Parallel spectral clustering. Song YQ,Chen WY,Bai HJ,Lin CJ,Chang EY. Machine Learning and Knowledge Discovery in Databases . 2008
[4] 聚类算法研究
孙吉贵
刘杰
赵连宇
[J]. 软件学报, 2008, (01) : 48 - 61
[5] Spectral methods in machine learning and new strategies for very large datasets. Mohamed-Ali Belabbas,Patrick J. Wol. Proceedings of the National Academy of Sciences of the United States of America . 2009
[6] Making Large-Scale Nystrom Approximation Possible. Mu L,James T K,Bao-Liang L. Proceedings of the 27th International Conference on Machine Learning . 2010
[7] Using the Nystrom Method to Speed up Kernel Machines. Williams C K I,Seeger M. Advances in Neural Information Processing Systems . 2001
[8] Motion segmentation and tracking using normalized cuts. Shi J B,Malik J. IEEE International Conference on Computer Vision (ICCV) . 1998
[9] Ensemble Nystrom method. Kumar S,Mohri M,Talwalkar A. Advances in Neural Information Processing Systems . 2009
[10] Cluster ability Analysis and Incremental Sampling for Nystrom Extension Based Spectral Clustering. Xianchao Zhang,Quanzeng You. 2011 IEEE International Conference on Data Mining . 2011

← 1 2 →