Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (12): 3413-3422.DOI: 10.11772/j.issn.1001-9081.2020061040

• 2020 China Conference on Granular Computing and Knowledge Discovery(CGCKD 2020) •     Next Articles

Fast spectral clustering algorithm without eigen-decomposition

LIU Jingshu1, WANG Li1, LIU Jinglei2   

  1. 1. College of Data Science, Taiyuan University of Technology, Jinzhong Shanxi 030600, China;
    2. School of Computer and Control Engineering, Yantai University, Yantai Shandong 264005, China
  • Received:2020-06-12 Revised:2020-09-21 Online:2020-12-10 Published:2020-10-20
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61872260), the Natural Science Foundation of Shanxi Province (201703D421013).

无需特征分解的快速谱聚类算法

刘静姝1, 王莉1, 刘惊雷2   

  1. 1. 太原理工大学 大数据学院, 山西 晋中 030600;
    2. 烟台大学 计算机与控制工程学院, 山东 烟台 264005
  • 通讯作者: 王莉(1971-),女,山西太原人,教授,博士,CCF会员,主要研究方向:在线社会网络计算、移动通信。wangli@tyut.edu.cn
  • 作者简介:刘静姝(1997-),女,山东烟台人,硕士研究生,主要研究方向:大数据、矩阵分解;刘惊雷(1970-),男,山东烟台人,教授,博士,CCF会员,主要研究方向:图模型推理、矩阵分解
  • 基金资助:
    国家自然科学基金资助项目(61872260);山西省自然科学基金资助项目(201703D421013)。

Abstract: The traditional spectral clustering algorithm needs too much time to perform eigen-decomposition when the number of samples is very large. In order to solve the problem, a fast spectral clustering algorithm without eigen-decomposition was proposed to reduce the time overhead by multiplication update iteration. Firstly, the Nyström algorithm was used for random sampling in order to establish the relationship between the sampling matrix and the original matrix. Then, the indicator matrix was updated iteratively based on the principle of multiplication update iteration. Finally, the correctness and convergence analysis of the designed algorithm were given theoretically. The proposed algorithm was tested on five widely used real datasets and three synthetic datasets. Experimental results on real datasets show that:the average Normalized Mutual Information (NMI) of the proposed algorithm is 0.45, which is improved by 12.5% compared with that of the k-means clustering algorithm; the computing time of the proposed algorithm achieves 61.73 s, which is decreased by 61.13% compared with that of the traditional spectral clustering algorithm; and the performance of the proposed algorithm is superior to that of the hierarchical clustering algorithm, which verify the effectiveness of the proposed algorithm.

Key words: spectral clustering, Nyström sampling, convergence analysis, eigen-decomposition, multiplication update iteration

摘要: 为了解决样本数较大时,传统谱聚类算法执行特征分解消耗时间过大的问题,提出了一种无需特征分解的快速谱聚类算法,通过乘法更新迭代来降低时间开销。首先,利用Nyström方法进行随机采样,建立了采样矩阵和原始矩阵之间的关系;其次,基于乘法更新原理实现矩阵指示器矩阵的迭代更新;最后,在理论上对所设计算法进行了正确性和收敛性分析。在广泛使用的五个真实数据集和三个人工合成数据集上进行测试。实验结果表明,在真实数据集上,所提算法的标准互信息(NMI)平均值为0.45,与k-means聚类算法相比提高了12.50%;运行时间为61.73 s,与传统谱聚类算法相比减少了61.13%;而且表现性能优于层次聚类算法,验证了该算法的有效性。

关键词: 谱聚类, Nyström采样, 收敛性分析, 特征分解, 乘法更新迭代

CLC Number: