Fast spectral clustering algorithm without eigen-decomposition

doi:10.11772/j.issn.1001-9081.2020061040

Abstract

Traditional spectral clustering spends too much time on eigen-decomposition when the number of samples is very large. To address this problem, a fast spectral clustering algorithm without eigen-decomposition is proposed, which reduces the time overhead through multiplicative update iterations. First, the Nyström method is used for random sampling to establish the relationship between the sampled matrix and the original matrix. Then, the indicator matrix is updated iteratively according to the multiplicative update principle. Finally, the correctness and convergence of the algorithm are analyzed theoretically. The proposed algorithm was tested on five widely used real datasets and three synthetic datasets. Experimental results on the real datasets show that the average Normalized Mutual Information (NMI) of the proposed algorithm is 0.45, which is 12.5% higher than that of the k-means clustering algorithm; its computing time is 61.73 s, which is 61.13% lower than that of the traditional spectral clustering algorithm; and its performance is superior to that of the hierarchical clustering algorithm. These results verify the effectiveness of the proposed algorithm.
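The two ingredients named in the abstract, Nyström sampling and multiplicative updates of an indicator matrix, can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not state the update rule, so the sketch assumes a Gaussian similarity and a commonly used damped multiplicative rule for symmetric nonnegative factorization, H ← H ∘ (1 − β + β · (WH) / (HHᵀH)) with β = 0.5. The function names `rbf_kernel`, `nystrom_factors`, and `multiplicative_clustering` and all parameters are hypothetical.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian similarity between the rows of X and the rows of Y (assumed kernel)."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def nystrom_factors(X, m, gamma=1.0, rng=None):
    """Sample m landmark points at random and return C and pinv(A),
    so that the full similarity matrix is approximated by W ~ C @ pinv(A) @ C.T."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(X.shape[0], size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)            # n x m: all points vs. landmarks
    A_pinv = np.linalg.pinv(C[idx], rcond=1e-6)  # m x m landmark block, truncated for stability
    return C, A_pinv


def multiplicative_clustering(C, A_pinv, k, n_iter=200, beta=0.5, eps=1e-12, rng=None):
    """Update a nonnegative n x k indicator-like matrix H without any
    eigen-decomposition and without ever forming the n x n matrix W."""
    rng = np.random.default_rng(rng)
    H = rng.random((C.shape[0], k)) + 0.1       # strictly positive start
    for _ in range(n_iter):
        WH = C @ (A_pinv @ (C.T @ H))           # approximates W @ H in O(n m k)
        WH = np.maximum(WH, 0.0)                # the approximation can dip slightly below zero
        HHtH = H @ (H.T @ H)
        H *= (1.0 - beta) + beta * WH / (HHtH + eps)  # assumed damped rule; keeps H >= 0
    return H.argmax(axis=1), H                  # label = column with the largest entry
```

A call such as `C, A_pinv = nystrom_factors(X, m=20, gamma=0.5)` followed by `labels, H = multiplicative_clustering(C, A_pinv, k=2)` returns one label per row of `X`. Each iteration costs O(nmk) for n samples, m landmarks, and k clusters, which is where the saving over a full eigen-decomposition would come from in a scheme of this kind.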

Key words: spectral clustering, Nyström sampling, convergence analysis, eigen-decomposition, multiplication update iteration

CLC Number: TP181

LIU Jingshu (刘静姝), WANG Li (王莉), LIU Jinglei (刘惊雷). Fast spectral clustering algorithm without eigen-decomposition[J]. Journal of Computer Applications (计算机应用), 2020, 40(12): 3413-3422.
