Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (2): 409-413. DOI: 10.11772/j.issn.1001-9081.2018061381

Clustering by fast search and find of density peaks based on spectrum analysis

HAN Zhonghua1,2, BI Kaiyuan1, SI Wen1, LYU Zhe1   

  1. Faculty of Information and Control Engineering, Shenyang Jianzhu University, Shenyang Liaoning 110168, China;
    2. Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang Liaoning 110016, China
  • Received:2018-07-03 Revised:2018-08-29 Online:2019-02-10 Published:2019-02-15
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61503259), the Liaoning Provincial Science and Technology Department Project (201602608), the Basic Research Project of Liaoning Higher Education Institutions (LJZ2017015), and the Liaoning Provincial Archives Science and Technology Project (L-2018-X-10).

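  • Corresponding author: BI Kaiyuan
  • About the authors: HAN Zhonghua (born 1977), male, from Shenyang, Liaoning, professor, Ph.D.; research interests: production and operations management, enterprise automation system integration, production scheduling methods. BI Kaiyuan (born 1994), male, from Dandong, Liaoning, M.S. candidate; research interests: machine learning, data mining. SI Wen (born 1995), female, from Tangshan, Hebei, M.S. candidate; research interests: machine learning, data mining. LYU Zhe (born 1993), male, from Lanzhou, Gansu, M.S.; research interests: machine learning, data mining.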

Abstract: To address the inconsistent clustering performance of the Clustering by Fast Search and Find of Density Peaks (CFSFDP) algorithm across different datasets, an improved CFSFDP algorithm based on spectral clustering, named CFSFDP-SA (CFSFDP based on Spectrum Analysis), was proposed. First, the high-dimensional nonlinear dataset was mapped into a low-dimensional subspace for dimension reduction, and the clustering problem was transformed into an optimal graph-partitioning problem to improve the algorithm's adaptability to the global structure of the data. Then, the CFSFDP algorithm was used to cluster the processed dataset. By combining the advantages of the two clustering algorithms, the clustering performance was further improved. Clustering results on two artificial linear datasets, three artificial nonlinear datasets and four real datasets from the UCI repository show that, compared with CFSFDP, the CFSFDP-SA algorithm achieves higher clustering accuracy, with an improvement of up to 14% on high-dimensional datasets, indicating that CFSFDP-SA adapts better to the original datasets.
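
For orientation, the density-peaks stage that the abstract applies to the processed dataset can be sketched as follows. This is a minimal illustration of the generic CFSFDP procedure, not the authors' implementation: the function name `cfsfdp`, the cutoff distance `d_c`, the explicit `n_clusters` argument and the cutoff-count density are all assumptions chosen for the example. The spectral mapping that precedes this stage in CFSFDP-SA is not shown.

```python
import math


def cfsfdp(points, d_c, n_clusters):
    """Hypothetical density-peaks sketch: returns (labels, centers)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point earlier in the density order;
    # nearest_higher[i] records which point that is.
    delta = [0.0] * n
    nearest_higher = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour: use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j

    # Cluster centers: the n_clusters points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # Every remaining point inherits the label of its nearest denser neighbour,
    # which has already been labelled because of the visiting order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers
```

In a CFSFDP-SA-style pipeline, `points` would presumably be the low-dimensional coordinates produced by the spectral mapping rather than the raw data; how that mapping is built is not specified in the abstract.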

Key words: data clustering, adaptability, dimension reduction, Clustering by Fast Search and Find of Density Peaks (CFSFDP), spectrum analysis
