Journal of Computer Applications ›› 2012, Vol. 32 ›› Issue (09): 2476-2479. DOI: 10.3724/SP.J.1087.2012.02476

• Database Technology •

Improved GK clustering algorithm

ZHANG Fang-fang*, QIAN Xue-zhong

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi Jiangsu 214122, China
  • Received: 2012-03-12 Revised: 2012-05-03 Online: 2012-09-01 Published: 2012-09-01
  • Corresponding author: ZHANG Fang-fang
  • About the authors: ZHANG Fang-fang (born 1985), female, from Huaibei, Anhui, M.S. candidate; main research interest: fuzzy clustering. QIAN Xue-zhong (born 1967), male, from Wuxi, Jiangsu, associate professor; main research interests: databases, data mining, network security.
  • Supported by:

    National Natural Science Foundation of China (61103129); Science and Technology Support Program of Jiangsu Province (BE2009009)

Abstract: The traditional Gustafson-Kessel (GK) clustering algorithm cannot determine the number of clusters automatically and is sensitive to the initial cluster centers. To address these defects, an improved GK clustering algorithm is proposed. First, a new validity index, based on the weighted sum of inter-cluster separation and intra-cluster compactness, is used to determine the optimal number of clusters. Then, an improved entropy clustering approach is used to determine the initial cluster centers. Finally, the data are clustered using the cluster number given by the new index and the new initial cluster centers. Experimental results show that the new index accurately identifies the optimal number of clusters for data sets with overlapping clusters, and that the improved algorithm achieves higher clustering accuracy.

Key words: cluster number, cluster validity index, initial cluster center, entropy clustering, Gustafson-Kessel (GK) clustering algorithm
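
For orientation, the final clustering step can be sketched with the textbook GK iteration. This is not the authors' improved algorithm: the abstract gives no formulas for the validity index or the entropy-based initialization, so neither is reproduced here. The function name `gk_cluster`, the `init_centers` hook (a hypothetical entry point for externally chosen starting centers), the unit cluster volumes, and the small covariance regularization are all assumptions made for illustration.

```python
import numpy as np


def gk_cluster(X, c, m=2.0, init_centers=None, max_iter=100, tol=1e-5, seed=0):
    """Textbook Gustafson-Kessel fuzzy clustering with unit cluster volumes.

    X: (N, d) data; c: number of clusters; m: fuzzifier (> 1).
    init_centers: optional (c, d) starting centers (hypothetical hook,
    not taken from the paper). Returns (centers, U), U of shape (c, N).
    """
    X = np.asarray(X, dtype=float)
    N, d = X.shape
    rng = np.random.default_rng(seed)

    if init_centers is None:
        # Random fuzzy partition: each column of U sums to 1.
        U = rng.random((c, N))
        U /= U.sum(axis=0, keepdims=True)
    else:
        # Starting memberships from squared Euclidean distances to the centers.
        V0 = np.asarray(init_centers, dtype=float)
        d2 = ((X[None, :, :] - V0[:, None, :]) ** 2).sum(axis=2) + 1e-12
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)

    for _ in range(max_iter):
        W = U ** m                                    # fuzzified weights, (c, N)
        V = (W @ X) / W.sum(axis=1, keepdims=True)    # cluster centers, (c, d)
        D2 = np.empty((c, N))
        for i in range(c):
            diff = X - V[i]                           # (N, d)
            # Fuzzy covariance matrix of cluster i.
            F = (W[i, :, None] * diff).T @ diff / W[i].sum()
            F += 1e-8 * np.eye(d)                     # regularization (assumption)
            # Norm-inducing matrix with det(A) = 1 (cluster volume rho_i = 1).
            A = np.linalg.det(F) ** (1.0 / d) * np.linalg.inv(F)
            # Squared adaptive distance of every point to center i.
            D2[i] = np.einsum("nj,jk,nk->n", diff, A, diff)
        D2 = np.maximum(D2, 1e-12)
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)  # membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U
```

In this sketch the cluster number `c` and `init_centers` are plain inputs; in the setting the abstract describes, they would be supplied by the validity index and the entropy-based initialization, respectively. For example, `gk_cluster(X, 2, init_centers=X[[0, 50]])` starts the iteration from two chosen data points instead of a random partition.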

CLC Number: