Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (7): 1981-1987. DOI: 10.11772/j.issn.1001-9081.2016.07.1981

• Big Data •

Clustering algorithm with maximum distance between clusters based on improved kernel fuzzy C-means

LI Bin, DI Lan, WANG Shaohua, YU Xiaotong

  1. School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2015-12-08 Revised: 2016-03-20 Online: 2016-07-10 Published: 2016-07-14
  • Corresponding author: LI Bin
  • About the authors: LI Bin (born 1991), male, from Taizhou, Jiangsu, M.S. candidate; research interests: pattern recognition, data mining. DI Lan (born 1965), female, from Nanjing, Jiangsu, associate professor, M.S., CCF member; research interests: pattern recognition, digital image processing. WANG Shaohua (born 1991), male, from Jiujiang, Jiangxi, M.S. candidate; research interests: image processing, data mining. YU Xiaotong (born 1989), male, from Qingdao, Shandong, M.S. candidate; research interests: image processing, data mining.
  • Supported by:
    This work is partially supported by the Six Talent Peaks Project in Jiangsu Province (DZXX-028) and the Industry-University-Research Project in Jiangsu Province (BY2014023-33).

Abstract: Traditional kernel clustering considers only the relationships among elements within a cluster and ignores the relationships between clusters. When it is applied to data sets whose cluster boundaries are fuzzy or contain noise points, boundary points are easily misclassified. To solve this problem, a new clustering algorithm based on the Kernel Fuzzy C-Means (KFCM) clustering algorithm was proposed, called Kernel Fuzzy C-Means with Maximum distance between clusters (MKFCM). The algorithm takes both within-cluster and between-cluster relationships into account: it introduces a between-cluster maximization penalty term defined in the high-dimensional feature space, together with a control factor, so that the distance between cluster centers is enlarged and samples near the boundaries are partitioned better. In experiments on simulated data sets, the proposed algorithm markedly reduced the offset distance of the cluster centers compared with traditional clustering algorithms. On synthetic Gaussian data sets, its accuracy (ACC), Normalized Mutual Information (NMI) and Rand Index (RI) rose to 0.9132, 0.7575 and 0.9138, respectively. The proposed algorithm is therefore of theoretical research significance for data sets with fuzzy or noisy boundaries.

Key words: kernel clustering, Fuzzy C-Means (FCM) clustering, maximum penalty term between centers, fuzzy boundary
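
The idea summarized in the abstract, a KFCM-style update with an extra term that pushes cluster centers apart, can be illustrated with a minimal sketch. The abstract does not give the objective function, so everything specific below is an assumption made for illustration only: a Gaussian kernel, the objective J = Σ_i Σ_j u_ij^m · 2(1 − K(x_j, v_i)) − lam · Σ_{i<k} 2(1 − K(v_i, v_k)), and the hypothetical names `mkfcm_sketch`, `sigma` and `lam` (standing in for the control factor). It should not be read as the authors' exact formulation.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """K(a, b) = exp(-||a - b||^2 / sigma^2) for every pair of rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)

def mkfcm_sketch(X, V0, m=2.0, sigma=2.0, lam=0.1, n_iter=100, tol=1e-6):
    """Illustrative kernel fuzzy C-means with a between-center penalty.

    Assumed (hypothetical) objective, NOT taken from the paper:
        J = sum_ij u_ij^m * 2(1 - K(x_j, v_i)) - lam * sum_{i<k} 2(1 - K(v_i, v_k))
    X: (n, d) samples, V0: (c, d) initial centers. Returns memberships U (c, n)
    and centers V (c, d).
    """
    X = np.asarray(X, dtype=float)
    V = np.asarray(V0, dtype=float).copy()
    for _ in range(n_iter):
        K = gaussian_kernel(V, X, sigma)                  # (c, n)
        # Squared feature-space distance for a Gaussian kernel: 2(1 - K).
        d2 = np.maximum(2.0 * (1.0 - K), 1e-12)
        # Standard fuzzy membership update; the penalty does not depend on U.
        U = d2 ** (-1.0 / (m - 1.0))
        U /= U.sum(axis=0, keepdims=True)
        # Center update from setting dJ/dv_i = 0 (fixed-point form).
        W = (U ** m) * K                                  # (c, n)
        Kvv = gaussian_kernel(V, V, sigma)
        np.fill_diagonal(Kvv, 0.0)                        # no self-repulsion
        num = W @ X - lam * (Kvv @ V)
        den = W.sum(axis=1) - lam * Kvv.sum(axis=1)       # must stay positive: keep lam small
        V_new = num / den[:, None]
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:
            break
    return U, V

if __name__ == "__main__":
    # Two small, well-separated groups of points (made-up data).
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                  [5, 5], [5, 6], [6, 5], [6, 6]], dtype=float)
    U, V = mkfcm_sketch(X, V0=[[0.2, 0.3], [5.5, 5.2]])
    print(np.argmax(U, axis=0))   # hard labels from the fuzzy memberships
    print(V)                      # estimated cluster centers
```

Under this assumed objective, each center update equals the usual KFCM weighted mean shifted away from the other centers, and the shift grows with `lam`; setting `lam=0` recovers a plain KFCM-style update. The denominator can become non-positive if `lam` is too large, which is why the sketch keeps it small.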

CLC Number: