Journal of Computer Applications, 2014, Vol. 34, Issue (5): 1331-1335. DOI: 10.11772/j.issn.1001-9081.2014.05.1331

• Artificial Intelligence

High efficient K-means algorithm for determining optimal number of clusters

WANG Yong, TANG Jing, RAO Qinfei, YUAN Chaoyan

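  1. College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
  • Received: 2013-11-25 Revised: 2013-12-25 Online: 2014-05-01 Published: 2014-05-30
  • Corresponding author: WANG Yong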
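  • About the authors: WANG Yong (1974-), male, from Chongqing, associate professor, Ph.D.; research interests: multimedia, networks. TANG Jing (1988-), female, from Yongzhou, Hunan, M.S. candidate; research interest: image processing. RAO Qinfei (1990-), male, from Ji'an, Jiangxi, M.S. candidate; research interest: image processing. YUAN Chaoyan (1987-), female, from Hefei, Anhui, M.S. candidate; research interests: wireless sensor networks, embedded technology.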
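  • Funding:

    Project funded by the Chongqing Municipal Education Commission; Graduate Innovation Fund of Chongqing University of Technology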

Abstract:

The number of clusters usually cannot be set in advance for the K-means clustering algorithm, and an initial number of clusters chosen by hand easily leads to unstable clustering results. To address this problem, a new, highly efficient algorithm for determining the optimal number of clusters for K-means was proposed. The algorithm obtained an upper bound on the search range for the number of clusters by stratifying the sample data, and designed a new cluster validity index that evaluates the degree of similarity within clusters and between clusters after clustering. The optimal number of clusters was then obtained within this search range. Simulation results show that the algorithm obtains the optimal number of clusters quickly and efficiently, and clusters the datasets well.
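The abstract describes the procedure only at a high level: bound the search range for the number of clusters, run K-means for each candidate k, and keep the k that scores best on a validity index balancing within-cluster and between-cluster similarity. The Python sketch below illustrates that search loop under stated assumptions and does not reproduce the authors' method. The upper bound uses the common rule of thumb k_max = floor(sqrt(n)) in place of the paper's sample-stratification bound; the validity index is a hard-partition form of the Xie-Beni index (SSE divided by n times the smallest squared centroid gap), not the index designed in the paper; and K-means is seeded deterministically by farthest-point selection. All function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points: List[Point], k: int) -> List[Point]:
    """Assumed deterministic seeding: start from the first point, then keep
    adding the point that is farthest from its nearest chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd iterations; returns (centers, labels)."""
    centers = farthest_point_init(points, k)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assign every point to its nearest center (ties go to the lower index).
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


def xie_beni_hard(points: List[Point], centers: List[Point], labels: List[int]) -> float:
    """Stand-in validity index (smaller is better): compact clusters give a
    small SSE, well-separated centroids give a large denominator."""
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    min_gap = min(sq_dist(centers[i], centers[j])
                  for i in range(len(centers)) for j in range(i + 1, len(centers)))
    return math.inf if min_gap == 0 else sse / (len(points) * min_gap)


def best_k(points: List[Point], k_max: Optional[int] = None) -> Tuple[int, Dict[int, float]]:
    """Search k in [2, k_max] and return the k with the smallest index value."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    if k_max is None:
        k_max = max(2, math.isqrt(n))  # assumed rule of thumb: k_max = floor(sqrt(n))
    k_max = min(k_max, n)
    scores: Dict[int, float] = {}
    for k in range(2, k_max + 1):  # the index needs at least two centroids
        centers, labels = kmeans(points, k)
        scores[k] = xie_beni_hard(points, centers, labels)
    return min(scores, key=scores.get), scores


# Usage: three well-separated 3x3 grids of points (made-up toy data).
grid = [(dx, dy) for dx in (0.0, 0.5, 1.0) for dy in (0.0, 0.5, 1.0)]
data = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in grid]
k, scores = best_k(data)
print(k)  # 3
```

With 27 points the sketch searches k = 2 to 5; splitting a true group (k = 4 or 5) shrinks the smallest centroid gap sharply, while merging groups (k = 2) inflates the SSE, so the index bottoms out at k = 3. Replacing the bound or the index with the paper's own definitions would change only the `k_max` default and `xie_beni_hard`.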
