Journal of Computer Applications ›› 2015, Vol. 35 ›› Issue (5): 1302-1305.DOI: 10.11772/j.issn.1001-9081.2015.05.1302

Previous Articles     Next Articles

Novel K-medoids clustering algorithm based on breadth-first search

YAN Hongwen1, ZHOU Yamei1, PAN Chu1,2   

  1. 1. School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha Hunan 410114, China;
    2. College of Computer Science and Electronic Engineering, Hunan University, Changsha Hunan 410082, China
  • Received:2014-12-05 Revised:2015-01-12 Online:2015-05-10 Published:2015-05-14

基于宽度优先搜索的K-medoids聚类算法

颜宏文1, 周雅梅1, 潘楚1,2   

  1. 1. 长沙理工大学 计算机与通信工程学院, 长沙 410114;
    2. 湖南大学 信息科学与工程学院, 长沙 410082
  • 通讯作者: 潘楚
  • 作者简介:颜宏文(1968-),女,湖南株洲人,教授,博士,主要研究方向:数据挖掘、电网数据通信; 周雅梅(1989-),女,湖南娄底人,硕士研究生,主要研究方向:数据挖掘; 潘楚(1986-),男,湖南怀化人,博士研究生,主要研究方向:数据挖掘、复杂网络.
  • 基金资助:

    国家自然科学基金资助项目(51277015);湖南省研究生科研创新项目(CX2014B386).

Abstract:

Due to the disadvantages such as sensitivity to the initial selection of the center, random selection of centers and poor accuracy in traditional K-medoids clustering algorithm, a breadth-first search strategy for centers was proposed on the basis of granular computing effective initialization. The new algorithm selected K granules firstly using granular computing and selected their corresponding centers as the K initial centers. Secondly, according to the similarity between objects, the proposed algorithm set up binary tree of similar objects separately where the corresponding initial centers were taken as the root nodes, and then used breadth-first search to traverse the binary tree to find out K optimal centers. What's more, the fitness function was optimized by using within-cluster distance and between-cluster distance. The experimental results on standard data set Iris and Wine in UCI show that this proposed algorithm effectively reduces the number of iterations and guarantees the accuracy of clustering at the same time.

Key words: K-medoids clustering algorithm, granular computing, binary tree of similar object, breadth-first search, fitness function

摘要:

针对传统K-medoids聚类算法对初始值敏感、中心点随机选择以及聚类精度不够高等缺点,在粒计算有效初始化的基础上,提出中心点宽度优先搜索策略. 首先,利用粒计算初始化获取K个有效粒子,遴选该K个粒子所对应的K个中心点作为K个初始中心点;然后,根据对象间的相似性分别对K个粒子中的对象建立以中心点为根节点的相似对象二叉树,通过宽度优先搜索遍历二叉树迭代出最优中心点, 同时采用簇间距离和簇内距离优化准则函数. 实验结果表明,所提算法在UCI中Iris和Wine标准数据集中测试,在有效缩短迭代次数的同时保证了算法聚类准确率.

关键词: K-medoids聚类算法, 粒计算, 相似对象二叉树, 宽度优先搜索, 适应度函数

CLC Number: