计算机应用 (Journal of Computer Applications), 2018, Vol. 38, Issue (1): 104-109. DOI: 10.11772/j.issn.1001-9081.2017071716

• Artificial Intelligence •

K-means clustering algorithm based on cluster degree and distance equilibrium optimization

WANG Rihong, CUI Xingmei

  College of Computer Engineering, Qingdao University of Technology, Qingdao Shandong 266033, China
  • Received: 2017-07-17 Revised: 2017-09-04 Online: 2018-01-10 Published: 2018-01-22
  • Corresponding author: WANG Rihong
  • About the authors: WANG Rihong (born 1964), male, from Fushan, Shandong, professor, M. S.; research interests: intelligent information processing, data mining. CUI Xingmei (born 1990), female, from Zibo, Shandong, M. S. candidate; research interests: intelligent information processing.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61502262) and the Shandong Graduate Education Innovation Program (SDYY16023).


Abstract: To address the sensitivity of the traditional K-means algorithm to the selection of initial cluster centers, a K-Means clustering algorithm based on Clustering degree and Distance equalization optimization (K-MCD) was proposed. Firstly, initial cluster centers were preliminarily selected based on the idea of "cluster degree". Secondly, the final initial cluster centers were obtained with a selection strategy based on equilibrium optimization of the total distance among all cluster centers. Finally, the text set was vectorized, the text cluster centers were reselected according to the optimized algorithm, and text clustering analysis was performed using clustering evaluation criteria. Simulation experiments on text datasets were analyzed in terms of accuracy and stability. Compared with the K-means algorithm, K-MCD improved clustering accuracy on four text sets by 18.6, 17.5, 24.3 and 24.6 percentage points respectively, and reduced the variance of the average number of evolution generations (iterations) by 36.99 percentage points. The simulation results show that K-MCD can effectively improve text clustering accuracy and has good stability.

Key words: initial clustering center, K-means algorithm, cluster degree, distance equalization optimization, text clustering
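The abstract outlines K-MCD's seeding in two steps but does not define "cluster degree" or the exact distance-equilibrium criterion. The Python sketch below illustrates that outline under loudly stated assumptions and is not the authors' implementation: `cluster_degree` is a hypothetical density count (neighbours within a radius set to the mean pairwise distance), and the second step is approximated by greedily adding the candidate with the largest summed distance to the centers already chosen. Ordinary Lloyd-style K-means then runs from these seeds and reports how many iterations it needed.

```python
import math
from itertools import combinations


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_degree(points, radius):
    """Hypothetical 'cluster degree': how many other points lie within `radius`."""
    return [sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= radius)
            for i, p in enumerate(points)]


def init_centers(points, k):
    """Sketch: cluster-degree candidates, then a distance-spreading selection."""
    n = len(points)
    if n < 2 or not 1 <= k <= n:
        raise ValueError("need at least 2 points and 1 <= k <= number of points")
    # Assumption: neighbourhood radius = mean pairwise distance.
    pair_d = [dist(points[i], points[j]) for i, j in combinations(range(n), 2)]
    radius = sum(pair_d) / len(pair_d)
    degree = cluster_degree(points, radius)
    # Step 1: points whose cluster degree is at least the average become candidates.
    avg = sum(degree) / n
    candidates = [i for i in range(n) if degree[i] >= avg]
    chosen = [max(candidates, key=lambda i: degree[i])]  # densest candidate first
    # Step 2 (greedy stand-in for distance-equilibrium optimization): add the
    # candidate whose summed distance to the already chosen centers is largest.
    while len(chosen) < k:
        pool = [i for i in candidates if i not in chosen]
        if not pool:  # too few candidates: fall back to all unchosen points
            pool = [i for i in range(n) if i not in chosen]
        chosen.append(max(pool, key=lambda i: sum(dist(points[i], points[c]) for c in chosen)))
    return [list(points[i]) for i in chosen]


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations; returns (labels, centers, iterations used)."""
    centers = [list(c) for c in centers]
    for it in range(1, max_iter + 1):
        # Assign every point to its nearest center.
        labels = [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Recompute the mean; keep the old center if the cluster is empty.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:  # converged: centers stopped moving
            return labels, centers, it
        centers = new_centers
    return labels, centers, max_iter


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
seeds = init_centers(points, k=2)
labels, centers, iterations = kmeans(points, seeds)
print(seeds, labels, iterations)  # [[0, 0], [11, 11]] [0, 0, 0, 0, 1, 1, 1, 1] 2
```

Because this seeding has no random component, repeated runs on the same data give the same seeds and the same iteration count, which is the kind of run-to-run stability the abstract's variance comparison is concerned with.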

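The accuracy gains are reported in percentage points, that is, the absolute difference between two accuracies expressed in percent rather than a relative improvement. The abstract does not say how clustering accuracy is computed; the sketch below assumes a common definition in which each cluster is mapped to a distinct class so that the number of correctly assigned items is maximized. It brute-forces that mapping, which is practical only for a few clusters (the Hungarian algorithm is the usual choice otherwise), and the toy labels are invented for illustration.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Share of items correct under the best one-to-one cluster-to-class mapping.

    Brute force over permutations: fine for a handful of clusters only.
    Assumes there are no more clusters than classes.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        correct = sum(1 for t, c in zip(true_labels, cluster_ids) if mapping[c] == t)
        best = max(best, correct)
    return best / len(true_labels)


def percentage_point_gain(acc_new, acc_old):
    """Absolute accuracy difference in percentage points, rounded to 0.1."""
    return round((acc_new - acc_old) * 100, 1)


# Toy labels, invented for illustration only.
truth = ["a", "a", "b", "b", "c", "c"]
run_new = [2, 2, 0, 0, 1, 1]  # perfect clustering up to relabelling
run_old = [0, 0, 0, 1, 1, 1]  # only two clusters found
acc_new = clustering_accuracy(truth, run_new)
acc_old = clustering_accuracy(truth, run_old)
print(acc_new, round(acc_old, 4), percentage_point_gain(acc_new, acc_old))  # 1.0 0.6667 33.3
```

A relative improvement would instead be `(acc_new - acc_old) / acc_old`, which for the same toy runs is 50%, so the two measures should not be confused.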