Journal of Computer Applications ›› 2015, Vol. 35 ›› Issue (3): 761-765. DOI: 10.11772/j.issn.1001-9081.2015.03.761

• Artificial intelligence •

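Spectral embedded clustering algorithm based on kernel function

WANG Weidong, LIU Bing, GUAN Hongjie, ZHOU Yong, XIA Shixiong

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou Jiangsu 221116, China
  • Received: 2014-10-10 Revised: 2014-12-03 Online: 2015-03-10 Published: 2015-03-13
  • Corresponding author: WANG Weidong
  • About the authors: WANG Weidong (1990-), male, born in Zhengning, Gansu, master's student, CCF member; research interests: machine learning, data mining. LIU Bing (1981-), male, born in Yongcheng, Henan, associate professor, Ph.D., CCF member; research interests: machine learning, pattern recognition. GUAN Hongjie (1976-), male, born in Wangdu, Hebei, associate professor, Ph.D. candidate, CCF member; research interests: intelligent information data processing, remote sensing image processing. ZHOU Yong (1974-), male, born in Xuzhou, Jiangsu, associate professor, Ph.D., CCF member; research interests: data mining, intelligent computing. XIA Shixiong (1961-), male, born in Fushun, Liaoning, professor, Ph.D., CCF member; research interests: intelligent information processing.
  • Supported by:

    the Young Scientists Fund of the National Natural Science Foundation of China (61403394); the National High Technology Research and Development Program (863 Program) of China (2012AA011004, 2012AA0622022); the Fundamental Research Funds for the Central Universities (2014QNA45); the Doctoral Program Foundation of the Ministry of Education of China (20100095110003, 20110095110010)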

Abstract:

The Spectral Embedded Clustering (SEC) algorithm requires that samples satisfy the manifold assumption and that their class labels can always be embedded in a linear space. This offers a new approach to spectral embedded clustering of linearly separable data, but the linear mapping function used by SEC is not suited to high-dimensional nonlinear data. To address this problem, the linear mapping function was kernelized and a Spectral Embedded Clustering based on Kernel function (KSEC) model was built. The model both overcomes the inability of the linear mapping function to handle nonlinear data and performs kernel dimensionality reduction on high-dimensional data. Experimental results on real data sets show that the proposed algorithm improves clustering accuracy by 13.11% on average and by up to 31.62%; on high-dimensional data the average improvement is 16.53%. Parameter sensitivity experiments also show that the proposed algorithm is more stable. The improved algorithm therefore clusters high-dimensional nonlinear data well, achieving higher clustering accuracy and better clustering performance than the traditional spectral embedded clustering algorithm. The proposed method can be applied to the processing of complex images such as remote sensing imagery.

Key words: spectral clustering, spectral embedding, kernel function, high-dimensional data
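
The abstract's central idea, kernelizing a linear mapping function so that it can handle nonlinear data, can be illustrated with a small sketch. This is not the KSEC model from the paper, whose formulation is not given on this page. It only compares a regularized linear map F ≈ XW + b with its kernelized counterpart F ≈ Kα on synthetic data. The helper names (`rbf_kernel`, `fit_linear_map`, `fit_kernel_map`, `two_rings`), the Gaussian kernel, the parameter values, and the two-ring toy data set are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_linear_map(X, F, reg=1e-3):
    """Ridge fit of F ≈ X W + b (a plain linear mapping)."""
    mean_x, mean_f = X.mean(0), F.mean(0)
    Xc = X - mean_x
    W = np.linalg.solve(Xc.T @ Xc + reg * np.eye(X.shape[1]), Xc.T @ (F - mean_f))
    b = mean_f - mean_x @ W
    return lambda Z: Z @ W + b

def fit_kernel_map(X, F, reg=1e-3, gamma=1.0):
    """Kernel ridge fit of F ≈ K alpha (the same mapping after kernelization)."""
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + reg * np.eye(len(X)), F)
    return lambda Z: rbf_kernel(Z, X, gamma) @ alpha

def two_rings(n_per_ring=100, seed=0):
    """Hypothetical toy data: two concentric rings, not linearly separable."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, 2 * n_per_ring)
    radius = np.repeat([1.0, 3.0], n_per_ring) + rng.normal(0.0, 0.05, 2 * n_per_ring)
    X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
    y = np.repeat([0, 1], n_per_ring)
    return X, y

X, y = two_rings()
F = np.eye(2)[y]  # one-hot cluster indicator matrix used as the mapping target

# Fraction of points whose indicator is recovered by each mapping.
linear_acc = (fit_linear_map(X, F)(X).argmax(1) == y).mean()
kernel_acc = (fit_kernel_map(X, F)(X).argmax(1) == y).mean()
print(f"linear map: {linear_acc:.2f}, kernelized map: {kernel_acc:.2f}")
```

On this toy data a linear map cannot represent the ring indicators, because any decision it induces is a half-plane, whereas the kernelized map can. The sketch fits the maps to known labels only to isolate the effect of kernelization; in a clustering setting the indicator matrix would itself be unknown and estimated by the algorithm.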
