Journal of Computer Applications, 2020, Vol. 40, Issue 11: 3211-3216. DOI: 10.11772/j.issn.1001-9081.2020020228

• Data Science and Technology •

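Multi-view spectral clustering algorithm based on shared nearest neighbor

SONG Yan, YIN Jun

  1. College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
  • Received: 2020-03-05  Revised: 2020-06-15  Publication date: 2020-11-10  Release date: 2020-07-07
  • Corresponding author: YIN Jun (born 1984), male, from Yangzhou, Jiangsu; associate professor, Ph.D., CCF member. Research interests: machine learning, pattern recognition. junyin@shmtu.edu.cn
  • About the author: SONG Yan (born 1996), female, from Nantong, Jiangsu; M.S. candidate. Research interests: data clustering, big data analysis.
  • Funding:
    National Natural Science Foundation of China (61603243); China Postdoctoral Science Foundation (2017M611503).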

Multi-view spectral clustering algorithm based on shared nearest neighbor

SONG Yan, YIN Jun   

  1. College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
  • Received: 2020-03-05  Revised: 2020-06-15  Published: 2020-11-10  Online: 2020-07-07
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61603243) and the China Postdoctoral Science Foundation (2017M611503).

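Abstract: To address the problem that the similarity matrix constructed in spectral clustering algorithms cannot ensure that data points within a cluster are highly similar, a multi-view spectral clustering algorithm based on shared nearest neighbors (MV-SNN) is presented. First, the algorithm raises the similarity of any two data points that have many shared nearest neighbors, so that data in the same cluster are more similar to each other. Then, the improved similarity matrices of the multiple views are summed to form a global similarity matrix. Finally, because general spectral clustering algorithms still need a later k-means step to partition the data points, a rank constraint on the Laplacian matrix is introduced, so that the final cluster structure is obtained directly from the global similarity matrix. Experimental results show that, compared with several other multi-view spectral clustering algorithms, MV-SNN improves performance on three clustering measures (accuracy, purity and normalized mutual information) by 1%-20% and cuts clustering time by about 50%; MV-SNN thus clusters better and in less time.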

Keywords: unsupervised learning, multi-view clustering, similarity matrix, spectral clustering, shared nearest neighbor

Abstract: To address the problem that the similarity matrix constructed in spectral clustering does not guarantee high similarity among data points within the same cluster, a Multi-View spectral clustering algorithm based on Shared Nearest Neighbor (MV-SNN) was proposed. Firstly, the similarity between two data points that share a large number of nearest neighbors was increased, so that data points in the same cluster became more similar to each other. Then, the improved similarity matrices of the multiple views were integrated into a global similarity matrix. Finally, because general spectral clustering methods still need the k-means clustering algorithm to partition the data points at a later stage, a rank constraint on the Laplacian matrix was imposed so that the final cluster structure could be obtained directly from the global similarity matrix. Experimental results show that, compared with several other multi-view spectral clustering algorithms, MV-SNN improves the three clustering evaluation metrics (accuracy, purity and normalized mutual information) by 1%-20% and reduces the clustering time by about 50%, which indicates that MV-SNN achieves better clustering performance in less time.

Key words: unsupervised learning, multi-view clustering, similarity matrix, spectral clustering, shared nearest neighbor
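
The three steps named in the abstract (boosting similarity by shared nearest neighbors, fusing the views, and reading clusters directly from the graph structure) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The Gaussian k-NN similarity, the boost factor `1 + shared / k`, the toy data and the helper names `snn_similarity` and `connected_components` are all assumptions made for illustration. The rank-constrained optimization itself is not reproduced. The sketch only checks the property that such a constraint enforces: a Laplacian of rank n - c has exactly c zero eigenvalues, so the graph has c connected components and no k-means step is needed.

```python
import numpy as np

def snn_similarity(X, k, sigma=1.0):
    """k-NN Gaussian similarity, boosted by the number of shared nearest neighbors.

    ASSUMPTION: the boost factor (1 + shared / k) is illustrative only; the
    exact weighting used by MV-SNN is not stated in the abstract.
    """
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # squared distances
    np.fill_diagonal(sq, np.inf)             # a point is not its own neighbor
    knn = np.argsort(sq, axis=1)[:, :k]      # indices of the k nearest neighbors
    A = np.zeros((n, n))
    A[np.arange(n)[:, None], knn] = 1.0      # A[i, j] = 1 if j is in kNN(i)
    W = np.exp(-sq / (2.0 * sigma ** 2)) * np.maximum(A, A.T)  # symmetric k-NN graph
    shared = A @ A.T                         # shared[i, j] = size of kNN(i) & kNN(j)
    return W * (1.0 + shared / k)

def connected_components(S):
    """Label the connected components of the graph with weight matrix S."""
    n = S.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:                         # depth-first traversal
            i = stack.pop()
            for j in np.flatnonzero(S[i] > 0):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Two toy views of the same 8 points: points 0-3 form one cluster, 4-7 the other.
view1 = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                  [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
view2 = np.array([[0.0], [0.5], [1.0], [1.5],
                  [20.0], [20.5], [21.0], [21.5]])

k = 2
# Global similarity matrix: sum of the per-view improved similarity matrices.
S = sum(snn_similarity(X, k) for X in (view1, view2))

L = np.diag(S.sum(axis=1)) - S               # unnormalized graph Laplacian
eigvals = np.linalg.eigvalsh(L)
n_clusters = int(np.sum(eigvals < 1e-8))     # zero eigenvalues = connected components
labels = connected_components(S)
print(n_clusters, labels.tolist())
```

On this toy data the fused graph already splits into two components, so the Laplacian has two zero eigenvalues and the labels follow from the components alone. On real data the graph is generally connected, which is why a rank constraint has to be imposed during optimization rather than merely checked afterwards.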
