Journal of Computer Applications ›› 2011, Vol. 31 ›› Issue (01): 97-100.

• Artificial Intelligence •

Rough K-Modes clustering algorithm

Li Renkan1, Ye Dongyi2

  1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian Province
  2. College of Mathematics and Computer Science, Fuzhou University
  • Received: 2010-06-03 Revised: 2010-08-16 Online: 2011-01-12 Published: 2011-01-01
  • Corresponding author: Li Renkan
  • Supported by:
    National Natural Science Foundation of China

Abstract: Michael K. Ng et al. proposed a new K-Modes clustering algorithm that uses a heuristic dissimilarity measure based on relative frequency and effectively improves clustering accuracy. However, when computing the frequency of each attribute category within a cluster, it assumes that all objects in the cluster contribute equally to the clustering. To account for the different influence that objects have on the cluster center, a rough K-Modes algorithm is proposed, which uses the upper and lower approximations of rough set theory to measure the importance of each object within its cluster. The proposed algorithm not only achieves better clustering results than the new K-Modes algorithm, but also, while maintaining clustering quality, reduces the computational complexity relative to the rough-set-based improved K-Modes algorithm proposed by Bai Liang et al. Experimental results on several UCI data sets demonstrate the good performance of the proposed algorithm.

Key words: clustering, K-Modes algorithm, rough set, cluster center, clustering accuracy
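
The relative-frequency dissimilarity mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the commonly cited form of the measure: a mismatch between an object and the cluster mode on an attribute costs 1, and a match costs 1 minus the relative frequency of that category in the cluster. The optional `weights` argument is a hypothetical stand-in for a per-object importance score; the abstract does not give the formula that derives such scores from the upper and lower approximations, so none is implemented here. All function names are illustrative.

```python
from collections import Counter


def weighted_category_frequency(cluster, weights=None):
    """Relative frequency of each category, per attribute, within one cluster.

    cluster: list of equal-length tuples of categorical values.
    weights: optional per-object weights (hypothetical importance scores);
             None means every object contributes equally.
    """
    if weights is None:
        weights = [1.0] * len(cluster)
    total = sum(weights)
    freqs = []
    for j in range(len(cluster[0])):
        acc = Counter()
        for obj, w in zip(cluster, weights):
            acc[obj[j]] += w  # accumulate weight instead of a plain count
        freqs.append({cat: s / total for cat, s in acc.items()})
    return freqs


def cluster_mode(freqs):
    """Mode of the cluster: the most frequent category on each attribute."""
    return tuple(max(f, key=f.get) for f in freqs)


def dissimilarity(obj, mode, freqs):
    """Mismatch costs 1; a match costs 1 - relative frequency of the category."""
    d = 0.0
    for j, (x, z) in enumerate(zip(obj, mode)):
        d += 1.0 if x != z else 1.0 - freqs[j].get(x, 0.0)
    return d


if __name__ == "__main__":
    cluster = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]
    freqs = weighted_category_frequency(cluster)  # uniform contribution
    mode = cluster_mode(freqs)
    print(mode, dissimilarity(("a", "x"), mode, freqs))
```

With uniform weights this reduces to plain category counting, which is the equal-contribution assumption the abstract criticizes; passing non-uniform weights shows where an importance measure would enter the frequency computation.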