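Improved K-means clustering algorithm based on multi-dimensional grid space

Journal of Computer Applications, 2018, Vol. 38, Issue 10: 2850-2855. DOI: 10.11772/j.issn.1001-9081.2018040830

SHAO Lun, ZHOU Xinzhi, ZHAO Chengping, ZHANG Xu

College of Electronics and Information Engineering, Sichuan University, Chengdu Sichuan 610065, China

Received: 2018-04-23  Revised: 2018-06-24  Online: 2018-10-13  Published: 2018-10-10
Corresponding author: ZHOU Xinzhi
About the authors: SHAO Lun (born 1992), male, from Jingzhou, Hubei, M.S. candidate; research interests: intelligent control, data mining. ZHOU Xinzhi (born 1966), male, from Deyang, Sichuan, professor, Ph.D.; research interests: artificial intelligence, intelligent control, distributed measurement and control systems. ZHAO Chengping (born 1975), female, from Taiyuan, Shanxi, associate professor, Ph.D.; research interests: pattern recognition, intelligent systems. ZHANG Xu (born 1992), male, from Fuyang, Anhui, M.S. candidate; research interests: intelligent control, data mining.
Supported by:
This work is partially supported by the National Basic Research Program (973 Program) of China (2013CB328903-2).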


Abstract

K-means is a widely used clustering algorithm, but the traditional algorithm selects its initial clustering centers at random, which makes it prone to falling into a local optimum and yields unstable clustering results. To solve this problem, the idea of a multi-dimensional grid space was introduced into the selection of the initial clustering centers. Firstly, the sample set was mapped to a virtual multi-dimensional grid space structure. Secondly, the sub-grids that contained the most samples and were far away from each other were searched out of this structure as the initial cluster center grids. Finally, the mean point of the samples in each initial cluster center grid was calculated and used as an initial clustering center. The initial clustering centers chosen in this way fit the actual clustering centers closely, so the final clustering result can be obtained stably and efficiently. Tests on computer-simulated data sets and UCI machine learning data sets show that the iteration count and error rate of the improved algorithm are stable and are both lower than the corresponding averages of the traditional K-means algorithm. The improved algorithm effectively avoids falling into a local optimum and produces stable clustering results.

Key words: K-means algorithm, clustering algorithm, initial clustering center, multi-dimensional grid space, mean point
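The three steps described in the abstract (map the samples to a grid, pick dense sub-grids that are far apart, take the mean point of each) can be sketched in a few lines of Python. This is only an illustrative sketch and not the authors' implementation: the abstract does not give the grid resolution or the exact "far away" criterion, so the function name `grid_initial_centers`, the parameters `bins` and `min_gap`, and the greedy scan with a Chebyshev distance between cell indices are all assumptions made for the example.

```python
from collections import defaultdict


def grid_initial_centers(samples, k, bins=5, min_gap=2):
    """Return k initial centers taken from dense, well-separated grid cells.

    `bins` (cells per dimension) and `min_gap` (minimum cell separation)
    are hypothetical parameters; the abstract does not specify them.
    """
    dims = len(samples[0])
    lo = [min(s[d] for s in samples) for d in range(dims)]
    hi = [max(s[d] for s in samples) for d in range(dims)]
    # Cell width per dimension; fall back to 1.0 for a constant feature.
    width = [(hi[d] - lo[d]) / bins or 1.0 for d in range(dims)]

    # Step 1: map every sample to the integer index of its grid cell.
    cells = defaultdict(list)
    for s in samples:
        index = tuple(
            min(int((s[d] - lo[d]) / width[d]), bins - 1) for d in range(dims)
        )
        cells[index].append(s)

    # Step 2 (assumed rule): scan cells from densest to sparsest and keep a
    # cell only if it lies at least `min_gap` cells (Chebyshev distance)
    # away from every cell chosen so far.
    chosen = []
    for index in sorted(cells, key=lambda c: -len(cells[c])):
        if all(
            max(abs(a - b) for a, b in zip(index, other)) >= min_gap
            for other in chosen
        ):
            chosen.append(index)
        if len(chosen) == k:
            break
    if len(chosen) < k:
        raise ValueError("fewer than k well-separated cells; adjust bins or min_gap")

    # Step 3: the mean point of each chosen cell becomes an initial center.
    return [
        tuple(sum(s[d] for s in cells[c]) / len(cells[c]) for d in range(dims))
        for c in chosen
    ]
```

The returned points would then be passed to an ordinary K-means loop in place of randomly drawn centers. Because nothing in the procedure is random, repeated runs on the same data start from the same centers, which is consistent with the stability the abstract reports.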

CLC number: TP301.6

SHAO Lun, ZHOU Xinzhi, ZHAO Chengping, ZHANG Xu. Improved K-means clustering algorithm based on multi-dimensional grid space[J]. Journal of Computer Applications, 2018, 38(10): 2850-2855.

