High efficient K-means algorithm for determining optimal number of clusters

Journal of Computer Applications, 2014, Vol. 34, Issue 5: 1331-1335. DOI: 10.11772/j.issn.1001-9081.2014.05.1331

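WANG Yong, TANG Jing, RAO Qinfei, YUAN Chaoyan

College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China

Received: 2013-11-25 Revised: 2013-12-25 Online: 2014-05-01 Published: 2014-05-30
Corresponding author: WANG Yong
About the authors: WANG Yong (1974-), male, born in Chongqing, associate professor, Ph.D.; main research interests: multimedia and networks. TANG Jing (1988-), female, born in Yongzhou, Hunan, master's student; main research interest: image processing. RAO Qinfei (1990-), male, born in Ji'an, Jiangxi, master's student; main research interest: image processing. YUAN Chaoyan (1987-), female, born in Hefei, Anhui, master's student; main research interests: wireless sensor networks and embedded technology.
Supported by:
a project funded by the Chongqing Municipal Education Commission; the Graduate Innovation Fund of Chongqing University of Technology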

Abstract

The K-means clustering algorithm usually cannot determine the number of clusters in advance, and setting the initial number of clusters manually tends to produce unstable clustering results. To address this problem, a new high-efficiency algorithm for determining the optimal number of clusters for K-means is proposed. The algorithm stratifies the sample data to obtain an upper bound on the search range for the number of clusters, and it defines a clustering validity index that evaluates the degree of within-cluster and between-cluster similarity after clustering. The optimal number of clusters is then selected within this search range. Simulation results show that the algorithm obtains the optimal number of clusters quickly and efficiently, and that it clusters the test datasets well.
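The abstract names two ingredients, an upper bound on the number of clusters and a validity index scored over the resulting search range, but it does not give either formula. The sketch below only illustrates that general search pattern and is not the authors' method: it assumes the common rule-of-thumb bound `k_max = floor(sqrt(n))` in place of the paper's stratification step, and a Calinski-Harabasz-style ratio of between-cluster to within-cluster dispersion in place of the paper's index. All function names, and the deterministic farthest-point seeding, are hypothetical choices made for illustration.

```python
import math


def farthest_point_init(points, k):
    # Deterministic seeding (an assumption, not from the paper): start at
    # points[0], then repeatedly add the point farthest from chosen centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def assign(points, centers):
    # Index of the nearest center for every point.
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]


def kmeans(points, k, max_iter=100):
    # Plain Lloyd iterations: assign points, recompute means, stop when stable.
    centers = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centers[j])  # keep the old center if a cluster is empty
        if updated == centers:
            break
        centers = updated
    return assign(points, centers), centers


def validity_index(points, labels, centers):
    # Stand-in index (Calinski-Harabasz style), NOT the paper's index:
    # between-cluster dispersion per degree of freedom divided by
    # within-cluster dispersion per degree of freedom. Larger is better.
    n, k = len(points), len(centers)
    grand_mean = tuple(sum(x) / n for x in zip(*points))
    within = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    between = sum(
        labels.count(j) * math.dist(centers[j], grand_mean) ** 2 for j in range(k)
    )
    if within == 0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))


def best_cluster_count(points):
    # Assumed upper bound floor(sqrt(n)); the paper derives its bound by
    # stratifying the sample data instead. Requires at least 4 points.
    k_max = math.isqrt(len(points))
    scores = {}
    for k in range(2, k_max + 1):
        labels, centers = kmeans(points, k)
        scores[k] = validity_index(points, labels, centers)
    return max(scores, key=scores.get), scores


# Usage: three well-separated 3x3 grids of points (27 points, so k is
# searched over 2..5).
points = [
    (cx + dx, cy + dy)
    for cx, cy in [(0, 0), (10, 10), (20, 0)]
    for dx in (-1, 0, 1)
    for dy in (-1, 0, 1)
]
best_k, scores = best_cluster_count(points)
print(best_k)  # 3 for these three well-separated blobs
```

Bounding the search range matters because each candidate value of k costs a full K-means run; a tighter upper bound, such as the one the paper obtains from stratification, reduces the number of runs directly.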

CLC Number:

TP393

WANG Yong, TANG Jing, RAO Qinfei, YUAN Chaoyan. High efficient K-means algorithm for determining optimal number of clusters[J]. Journal of Computer Applications, 2014, 34(5): 1331-1335.
