计算机应用 ›› 2020, Vol. 40 ›› Issue (12): 3413-3422.DOI: 10.11772/j.issn.1001-9081.2020061040

• 2020年中国粒计算与知识发现学术会议(CGCKD 2020) •    下一篇

无需特征分解的快速谱聚类算法

刘静姝1, 王莉1, 刘惊雷2   

  1. 1. 太原理工大学 大数据学院, 山西 晋中 030600;
    2. 烟台大学 计算机与控制工程学院, 山东 烟台 264005
  • 收稿日期:2020-06-12 修回日期:2020-09-21 出版日期:2020-12-10 发布日期:2020-10-20
  • 通讯作者: 王莉(1971-),女,山西太原人,教授,博士,CCF会员,主要研究方向:在线社会网络计算、移动通信。wangli@tyut.edu.cn
  • 作者简介:刘静姝(1997-),女,山东烟台人,硕士研究生,主要研究方向:大数据、矩阵分解;刘惊雷(1970-),男,山东烟台人,教授,博士,CCF会员,主要研究方向:图模型推理、矩阵分解
  • 基金资助:
    国家自然科学基金资助项目(61872260);山西省自然科学基金资助项目(201703D421013)。

Fast spectral clustering algorithm without eigen-decomposition

LIU Jingshu1, WANG Li1, LIU Jinglei2   

  1. 1. College of Data Science, Taiyuan University of Technology, Jinzhong Shanxi 030600, China;
    2. School of Computer and Control Engineering, Yantai University, Yantai Shandong 264005, China
  • Received:2020-06-12 Revised:2020-09-21 Online:2020-12-10 Published:2020-10-20
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61872260), the Natural Science Foundation of Shanxi Province (201703D421013).

摘要: 为了解决样本数较大时,传统谱聚类算法执行特征分解消耗时间过大的问题,提出了一种无需特征分解的快速谱聚类算法,通过乘法更新迭代来降低时间开销。首先,利用Nyström方法进行随机采样,建立了采样矩阵和原始矩阵之间的关系;其次,基于乘法更新原理实现矩阵指示器矩阵的迭代更新;最后,在理论上对所设计算法进行了正确性和收敛性分析。在广泛使用的五个真实数据集和三个人工合成数据集上进行测试。实验结果表明,在真实数据集上,所提算法的标准互信息(NMI)平均值为0.45,与k-means聚类算法相比提高了12.50%;运行时间为61.73 s,与传统谱聚类算法相比减少了61.13%;而且表现性能优于层次聚类算法,验证了该算法的有效性。

关键词: 谱聚类, Nyström采样, 收敛性分析, 特征分解, 乘法更新迭代

Abstract: The traditional spectral clustering algorithm needs too much time to perform eigen-decomposition when the number of samples is very large. In order to solve the problem, a fast spectral clustering algorithm without eigen-decomposition was proposed to reduce the time overhead by multiplication update iteration. Firstly, the Nyström algorithm was used for random sampling in order to establish the relationship between the sampling matrix and the original matrix. Then, the indicator matrix was updated iteratively based on the principle of multiplication update iteration. Finally, the correctness and convergence analysis of the designed algorithm were given theoretically. The proposed algorithm was tested on five widely used real datasets and three synthetic datasets. Experimental results on real datasets show that:the average Normalized Mutual Information (NMI) of the proposed algorithm is 0.45, which is improved by 12.5% compared with that of the k-means clustering algorithm; the computing time of the proposed algorithm achieves 61.73 s, which is decreased by 61.13% compared with that of the traditional spectral clustering algorithm; and the performance of the proposed algorithm is superior to that of the hierarchical clustering algorithm, which verify the effectiveness of the proposed algorithm.

Key words: spectral clustering, Nyström sampling, convergence analysis, eigen-decomposition, multiplication update iteration

中图分类号: