Novel K-medoids clustering algorithm based on breadth-first search

doi:10.11772/j.issn.1001-9081.2015.05.1302

Journal of Computer Applications, 2015, Vol. 35, Issue (5): 1302-1305. DOI: 10.11772/j.issn.1001-9081.2015.05.1302

YAN Hongwen¹, ZHOU Yamei¹, PAN Chu¹,²

1. School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha Hunan 410114, China;
2. College of Computer Science and Electronic Engineering, Hunan University, Changsha Hunan 410082, China

Received: 2014-12-05; Revised: 2015-01-12; Online: 2015-05-14; Published: 2015-05-10

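Corresponding author: PAN Chu
About the authors: YAN Hongwen (born 1968), female, from Zhuzhou, Hunan, professor, Ph.D.; main research interests: data mining, power grid data communication. ZHOU Yamei (born 1989), female, from Loudi, Hunan, M.S. candidate; main research interest: data mining. PAN Chu (born 1986), male, from Huaihua, Hunan, Ph.D. candidate; main research interests: data mining, complex networks.
Supported by:
National Natural Science Foundation of China (51277015); Hunan Provincial Innovation Foundation for Postgraduates (CX2014B386).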

Abstract

The traditional K-medoids clustering algorithm is sensitive to the initial choice of centers, selects centers at random, and has limited accuracy. To address these drawbacks, a breadth-first search strategy for centers is proposed, built on an effective initialization by granular computing. First, the algorithm uses granular computing to select K granules and takes their corresponding centers as the K initial centers. Second, according to the similarity between objects, it builds a binary tree of similar objects for each granule, with the corresponding initial center as the root node, and traverses each tree by breadth-first search to find the K optimal centers. In addition, the fitness function is optimized using both within-cluster distance and between-cluster distance. Experimental results on the Iris and Wine standard data sets from UCI show that the proposed algorithm effectively reduces the number of iterations while maintaining clustering accuracy.

Key words: K-medoids clustering algorithm, granular computing, binary tree of similar objects, breadth-first search, fitness function
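
The abstract does not give the tree construction rule or the exact fitness formula, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes that the similarity tree is a complete binary tree whose nodes are ordered by increasing distance to the initial center, that a candidate center is scored by the sum of member-to-center distances, and that the fitness is the ratio of within-cluster distance to between-cluster distance (lower is better). The names build_similarity_tree, bfs_best_center, fitness and the max_depth parameter are hypothetical and do not come from the paper.

```python
from collections import deque
from itertools import combinations
import math


def cluster_cost(center, members):
    """Sum of Euclidean distances from every member to a candidate center."""
    return sum(math.dist(center, m) for m in members)


def build_similarity_tree(center, members):
    """Level-order array of a complete binary tree rooted at `center`.

    Assumption (not from the paper): objects are ordered by increasing
    distance to the root. Children of index i are 2*i + 1 and 2*i + 2.
    """
    others = sorted((m for m in members if m != center),
                    key=lambda m: math.dist(center, m))
    return [center] + others


def bfs_best_center(center, members, max_depth=2):
    """Breadth-first search of the tree for the lowest-cost candidate center."""
    tree = build_similarity_tree(center, members)
    best, best_cost = center, cluster_cost(center, members)
    queue = deque([(0, 0)])  # (index in tree, depth of that node)
    while queue:
        i, depth = queue.popleft()
        cost = cluster_cost(tree[i], members)
        if cost < best_cost:
            best, best_cost = tree[i], cost
        if depth < max_depth:  # hypothetical depth limit on the search
            for child in (2 * i + 1, 2 * i + 2):
                if child < len(tree):
                    queue.append((child, depth + 1))
    return best, best_cost


def fitness(centers, clusters):
    """Assumed fitness: within-cluster distance over between-cluster distance."""
    within = sum(cluster_cost(c, ms) for c, ms in zip(centers, clusters))
    between = sum(math.dist(a, b) for a, b in combinations(centers, 2))
    return within / between


# Usage: start from a poor initial center and let the search improve it.
granule = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
center, cost = bfs_best_center((0.0, 0.0), granule)
```

Running the search once per granule and then reassigning objects to the nearest center would give one iteration of a K-medoids style loop; the depth limit trades search cost against how far a center may move from its initial position.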

CLC Number: TP301

YAN Hongwen, ZHOU Yamei, PAN Chu. Novel K-medoids clustering algorithm based on breadth-first search[J]. Journal of Computer Applications, 2015, 35(5): 1302-1305.
