Nonlinear discriminant K-means clustering on manifold

Journal of Computer Applications, 2011, Vol. 31, Issue (12): 3247-3251.

GAO Li-pin (高丽平)¹, ZHOU Xue-yan (周雪燕)¹, ZHAN Yu-bin (詹宇斌)²

1. School of Computer, Zhongyuan University of Technology, Zhengzhou Henan 450000, China
2. School of Computer, National University of Defense Technology, Changsha Hunan 410073, China

Received: 2011-07-04 Revised: 2011-08-16 Online: 2011-12-12 Published: 2011-12-01
Contact: GAO Li-pin
Supported by:
National Natural Science Foundation of China; Science and Technology Research Program of Henan Province

Abstract

In real-world pattern recognition and computer vision applications, high-dimensional data often lie approximately on a low-dimensional manifold. How to exploit this manifold structure to improve the clustering of high-dimensional data is an active research topic in the machine learning and data mining communities. This paper proposes a clustering algorithm, Nonlinear Discriminant K-means Clustering (NDisKmeans), that takes the manifold structure of high-dimensional data into account. Using spectral regularization, NDisKmeans first represents the desired low-dimensional coordinates as linear combinations of smooth vectors predefined on the data manifold; it then clusters the data by maximizing the ratio of between-cluster scatter to total scatter in the low-dimensional space. A convergent iterative procedure is devised to solve for the combination coefficient matrix and the cluster assignment matrix. Because it accounts for the manifold structure, NDisKmeans overcomes the limitation of the linear mapping used in the discriminant K-means (DisKmeans) algorithm and therefore significantly improves clustering performance on high-dimensional data. Extensive experiments on UCI and real-world data sets show the effectiveness of the proposed NDisKmeans method.

Key words: clustering, manifold, K-means clustering, spectral regularization, spectral clustering
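The alternating scheme described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: it assumes the "smooth vectors" are the low-frequency eigenvectors of a k-nearest-neighbour graph Laplacian, that the scatter ratio is taken in the trace-ratio form used by discriminant K-means, and that a small ridge term `gamma` regularizes the total scatter. The function names and the parameters `m`, `k`, `gamma`, and `outer` are all hypothetical.

```python
import numpy as np

def knn_laplacian(X, k=8):
    """Unnormalized Laplacian of a symmetrized k-nearest-neighbour graph."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    np.fill_diagonal(d2, np.inf)                     # exclude self-neighbours
    idx = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)                           # symmetrize
    return np.diag(W.sum(axis=1)) - W

def kmeans(Y, c, rng, iters=50):
    """Plain Lloyd iterations; returns one integer label per row of Y."""
    centers = Y[rng.choice(len(Y), c, replace=False)]
    for _ in range(iters):
        dist = ((Y[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = np.argmin(dist, axis=1)
        for j in range(c):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

def ndiskmeans_sketch(X, c, m=10, k=8, gamma=1e-3, outer=10, seed=0):
    """Alternate between cluster assignment and coefficient matrix A (c >= 2)."""
    rng = np.random.default_rng(seed)
    # Assumption: smooth basis = m Laplacian eigenvectors with smallest eigenvalues.
    _, vecs = np.linalg.eigh(knn_laplacian(X, k))
    F = vecs[:, :m]
    F = F - F.mean(axis=0)                           # centre the basis
    St = F.T @ F + gamma * np.eye(m)                 # regularized total scatter
    Lc = np.linalg.cholesky(St)
    labels = kmeans(F, c, rng)                       # initial assignment
    for _ in range(outer):
        L = np.eye(c)[labels]                        # n x c indicator matrix
        L = L[:, L.sum(axis=0) > 0]                  # drop empty clusters
        G = F.T @ L
        Sb = G @ np.linalg.solve(L.T @ L, G.T)       # between-cluster scatter
        # Maximize trace((A^T St A)^-1 A^T Sb A) through a whitened
        # symmetric eigenproblem: M = Lc^-1 Sb Lc^-T.
        M = np.linalg.solve(Lc, np.linalg.solve(Lc, Sb).T)
        M = (M + M.T) / 2.0
        _, U = np.linalg.eigh(M)
        A = np.linalg.solve(Lc.T, U[:, -(c - 1):])   # top c-1 directions
        Y = F @ A                                    # low-dimensional embedding
        new_labels = kmeans(Y, c, rng)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, Y
```

Calling `ndiskmeans_sketch(X, c=2)` on an `(n, d)` array returns a label vector of length `n` and an `(n, c-1)` embedding. Because the embedding is a combination of graph eigenvectors rather than a linear map of the input features, it can follow curved cluster shapes, which is the property the abstract contrasts with DisKmeans. The result depends on the K-means initialization, so this sketch gives no guarantee of recovering any particular partition.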

CLC Number:

TP181

GAO Li-pin, ZHOU Xue-yan, ZHAN Yu-bin. Nonlinear discriminant K-means clustering on manifold[J]. Journal of Computer Applications, 2011, 31(12): 3247-3251.
