Fast spectral clustering algorithm without eigen-decomposition

Journal of Computer Applications, 2020, Vol. 40, Issue (12): 3413-3422. DOI: 10.11772/j.issn.1001-9081.2020061040

2020 China Conference on Granular Computing and Knowledge Discovery (CGCKD 2020)

LIU Jingshu¹, WANG Li¹, LIU Jinglei²

1. College of Data Science, Taiyuan University of Technology, Jinzhong Shanxi 030600, China;
2. School of Computer and Control Engineering, Yantai University, Yantai Shandong 264005, China

Received: 2020-06-12  Revised: 2020-09-21  Online: 2020-12-10  Published: 2020-10-20
Corresponding author: WANG Li (1971-), female, native of Taiyuan, Shanxi, professor, Ph.D., CCF member; research interests: online social network computing, mobile communication. wangli@tyut.edu.cn
About the authors: LIU Jingshu (1997-), female, native of Yantai, Shandong, M.S. candidate; research interests: big data, matrix factorization. LIU Jinglei (1970-), male, native of Yantai, Shandong, professor, Ph.D., CCF member; research interests: graphical model inference, matrix factorization.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61872260) and the Natural Science Foundation of Shanxi Province (201703D421013).

Abstract

When the number of samples is very large, the traditional spectral clustering algorithm spends too much time on eigen-decomposition. To solve this problem, a fast spectral clustering algorithm without eigen-decomposition was proposed, which reduces the time overhead through multiplicative update iterations. Firstly, the Nyström method was used for random sampling to establish the relationship between the sampled matrix and the original matrix. Then, the indicator matrix was updated iteratively based on the multiplicative update principle. Finally, the correctness and convergence of the designed algorithm were analyzed theoretically. The proposed algorithm was tested on five widely used real datasets and three synthetic datasets. Experimental results on the real datasets show that the average Normalized Mutual Information (NMI) of the proposed algorithm is 0.45, 12.50% higher than that of the k-means clustering algorithm; its running time is 61.73 s, 61.13% less than that of the traditional spectral clustering algorithm; and it also outperforms the hierarchical clustering algorithm. These results verify the effectiveness of the proposed algorithm.

Key words: spectral clustering, Nyström sampling, convergence analysis, eigen-decomposition, multiplicative update iteration
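The abstract describes the pipeline only at a high level (Nyström sampling, then multiplicative updates of a nonnegative indicator matrix) and does not state the paper's update rule. The Python sketch below illustrates the general idea under assumptions made for this example only: a Gaussian affinity with a fixed width `sigma`, `m` uniformly sampled landmarks, a small `ridge` term added to the landmark block for numerical stability, and a standard symmetric-NMF multiplicative rule, H ← H ∘ (1/2 + (SH) / (2HHᵀH)), applied to the normalized affinity S = D^(-1/2) W D^(-1/2) as a stand-in for the authors' indicator update. The function names (`gaussian_affinity`, `nystrom_symnmf`, `nmi`) and all parameter values are hypothetical. What the sketch shows is that every product with the n × n affinity W is routed through the n × m sample block C, so neither W nor its eigenvectors are ever formed, and each iteration costs O(nmk).

```python
import numpy as np


def gaussian_affinity(X, Y, sigma):
    """Gaussian affinities exp(-||x - y||^2 / (2 sigma^2)) between rows of X and Y."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def nystrom_symnmf(X, k, m=30, sigma=1.0, ridge=1e-3, iters=1000, restarts=5, seed=0):
    """Cluster X into k groups without building the n x n affinity or eigen-decomposing it.

    Illustrative assumptions (not the paper's exact method):
      * Nystrom approximation W ~= C (A + ridge*I)^-1 C^T from m uniform landmarks.
      * Symmetric-NMF multiplicative rule H <- H * (1/2 + 1/2 * (S H) / (H H^T H))
        on the normalized affinity S = D^-1/2 W D^-1/2.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)       # random Nystrom sample
    C = gaussian_affinity(X, X[idx], sigma)          # n x m: all points vs landmarks
    A = C[idx]                                       # m x m: landmarks vs landmarks
    K = np.linalg.solve(A + ridge * np.eye(m), C.T)  # m x n, so that W ~= C @ K

    def W_times(M):
        # Approximate W @ M in O(n m k) time without ever forming W.
        return C @ (K @ M)

    d = W_times(np.ones((n, 1)))[:, 0]               # approximate degrees
    s = 1.0 / np.sqrt(np.maximum(d, 1e-12))          # diagonal of D^-1/2, guarded

    def S_times(H):
        # Normalized affinity times H: D^-1/2 W D^-1/2 H.
        return s[:, None] * W_times(s[:, None] * H)

    best_H, best_obj = None, np.inf
    for _ in range(restarts):
        H = rng.random((n, k))                       # nonnegative random start
        for _ in range(iters):
            num = np.maximum(S_times(H), 0.0)        # clip: Nystrom W may dip below 0
            den = H @ (H.T @ H) + 1e-12
            H *= 0.5 + 0.5 * num / den               # multiplicative update keeps H >= 0
        # ||S - H H^T||_F^2 up to an additive constant; keep the best restart.
        obj = -2.0 * np.sum(H * S_times(H)) + np.sum((H.T @ H) ** 2)
        if obj < best_obj:
            best_H, best_obj = H, obj
    return best_H.argmax(axis=1), best_H


def nmi(a, b):
    """Normalized mutual information with geometric-mean normalization."""
    _, a = np.unique(a, return_inverse=True)
    _, b = np.unique(b, return_inverse=True)
    P = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(P, (a, b), 1.0)                        # contingency table
    P /= P.sum()
    pa, pb = P.sum(1), P.sum(0)
    nz = P > 0
    mi = np.sum(P[nz] * np.log(P[nz] / np.outer(pa, pb)[nz]))
    ha = -np.sum(pa[pa > 0] * np.log(pa[pa > 0]))
    hb = -np.sum(pb[pb > 0] * np.log(pb[pb > 0]))
    return mi / np.sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
    y = np.repeat(np.arange(3), 100)
    X = centers[y] + 0.5 * rng.standard_normal((300, 2))
    labels, H = nystrom_symnmf(X, k=3)
    print(f"NMI = {nmi(y, labels):.3f}")
```

On three well-separated synthetic blobs one would expect an NMI close to 1. Clipping negative numerators (the Nyström estimate of W is not guaranteed to be entrywise nonnegative) and keeping the best of several random restarts are design choices of this sketch, not details taken from the paper.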

CLC number: TP181

LIU Jingshu, WANG Li, LIU Jinglei. Fast spectral clustering algorithm without eigen-decomposition[J]. Journal of Computer Applications, 2020, 40(12): 3413-3422.

