计算机应用 ›› 2012, Vol. 32 ›› Issue (07): 1973-1977.DOI: 10.3724/SP.J.1087.2012.01973

• 人工智能 • 上一篇    下一篇

基于粒计算的K-medoids聚类算法

马箐,谢娟英   

  1. 陕西师范大学 计算机科学学院,西安710062
  • 收稿日期:2011-12-12 修回日期:2012-03-20 发布日期:2012-07-05 出版日期:2012-07-01
  • 通讯作者: 谢娟英
  • 作者简介:马箐(1986-),女,陕西合阳人,硕士研究生,主要研究方向:智能计算;谢娟英(1971-),女,陕西西安人,副教授,博士,主要研究方向:机器学习、模式识别、数据挖掘、智能计算。
  • 基金资助:

    中央高校基本科研业务费专项(GK201102007);中央高校基本科研业务费专项(GK201102007);陕西师范大学2011年研究生培养创新基金资助项目(2011CXS029)

New K-medoids clustering algorithm based on granular computing

MA Qing,XIE Juan-ying   

  1. School of Computer Science, Shaanxi Normal University, Xi'an Shaanxi 710062, China
  • Received:2011-12-12 Revised:2012-03-20 Online:2012-07-05 Published:2012-07-01
  • Contact: XIE Juan-ying

摘要: 传统K-medoids聚类算法的聚类结果随初始中心点不同而波动,且计算复杂度较高不适于处理大规模数据集;快速K-medoids聚类算法通过选择合适的初始聚类中心改进了传统K-medoids聚类算法,但是快速K-medoids聚类算法的初始聚类中心有可能位于同一类簇。为克服传统K-medoids聚类算法和快速K-medoids聚类算法的缺陷,提出一种基于粒计算的K-medoids聚类算法。算法引入粒度概念,定义新的样本相似度函数,基于等价关系产生粒子,根据粒子包含样本多少定义粒子密度,选择密度较大的前K个粒子的中心样本点作为K-medoids聚类算法的初始聚类中心,实现K-medoids聚类。UCI机器学习数据库数据集以及随机生成的人工模拟数据集实验测试,证明了基于粒计算的K-medoids聚类算法能得到更好的初始聚类中心,聚类准确率和聚类误差平方和优于传统K-medoids和快速K-medoids聚类算法,具有更稳定的聚类结果,且适用于大规模数据集。

关键词: 传统K-medoids聚类算法, 快速K-medoids聚类算法, 粒计算, 等价关系, 聚类

Abstract: Traditional K-medoids clustering algorithm has some drawbacks, such as its clustering results being sensitive to initial cluster centers and its deficiency in large datasets. Although the fast K-medoids algorithm overcame the shortcomings of traditional K-medoids, it has the potential disadvantages of selecting the exemplars in the same cluster as initial seeds for different clusters. To overcome the shortcomings of the traditional K-medoids and the fast K-medoids clustering algorithms, a granular computing based K-medoids clustering algorithm was proposed in this paper. The algorithm defined a new similarity function between samples via pooling granularity, where the granules were produced via the equivalence relationship. The density of a granule was defined according to the number of samples in it, after that the K samples closest to the centers of the first K granules were selected as the initial centers for K-medoids clustering algorithm to cluster datasets. The experimental results on the datasets from UCI machine learning repository and on the synthetic datasets all demonstrate that the new granular computing based K-medoids clustering algorithm can find much better initial centers. Its clustering accuracy and its clustering error are better than those of the traditional K-medoids and the fast K-medoids clustering algorithms. It can get much more stable results and can be applied to cluster large datasets.

Key words: traditional K-medoids clustering algorithm, fast K-medoids clustering algorithm, granular computing, equivalence relation, clustering

中图分类号: