Journal of Computer Applications ›› 2012, Vol. 32 ›› Issue (08): 2186-2192. DOI: 10.3724/SP.J.1087.2012.02186

• Database Technology •

Selection algorithm for K-means initial clustering center

ZHENG Dan1,2, WANG Qian-ping2

  1. Department of Personnel, Jiangsu Normal University, Xuzhou, Jiangsu 221116, China
  2. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
  • Received: 2012-02-03 Revised: 2012-02-26 Online: 2012-08-28 Published: 2012-08-01
  • Contact: ZHENG Dan
  • About the authors: ZHENG Dan (born 1980), male, from Xuzhou, Jiangsu; laboratory technician, M.S.; main research interest: data mining.
    WANG Qian-ping (born 1964), male, from Anqing, Anhui; professor, doctoral supervisor, Ph.D.; main research interests: wireless sensor networks and data mining.
  • Funding:
    National Key Technology R&D Program of China (2008BAH37B05095)

Abstract: The K-means algorithm selects its initial cluster centers at random, which tends to produce low clustering accuracy and unstable results. To address this problem, an algorithm for selecting the initial cluster centers is proposed. By analyzing the Difference of K-dist (DK) graph, the position of each data point on the k-dist graph is determined, and the point with the smallest k-dist value on each main density-level curve is selected as an initial cluster center. Experiments show that the improved algorithm selects a unique set of initial cluster centers, yields stable clustering results, achieves higher clustering accuracy, and requires fewer iterations.

Key words: clustering, K-means algorithm, k-dist graph, Difference of K-dist (DK) graph, density

CLC Number:
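
The idea summarized in the abstract can be illustrated with a rough Python sketch. It is not the authors' procedure, which the abstract does not spell out. It assumes that k-dist means the distance from a point to its k-th nearest neighbour, that the k-dist values are sorted in ascending order, and that DK is the difference between successive sorted values. The function names, the `radius_factor` parameter, the rule that treats everything after the largest DK jump as a sparse tail, and the greedy exclusion of nearby points are all hypothetical simplifications introduced here for illustration.

```python
import numpy as np


def k_dist(points, k):
    """Return each point's distance to its k-th nearest neighbour, plus all pairwise distances."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Column 0 of each sorted row is the point itself (distance 0).
    return np.sort(dists, axis=1)[:, k], dists


def dk_values(kd):
    """Sort k-dist values ascending; return the sort order and successive differences (DK)."""
    order = np.argsort(kd, kind="stable")
    return order, np.diff(kd[order])


def select_initial_centers(points, n_clusters, k=4, radius_factor=2.0):
    """Pick deterministic seed indices from dense regions (illustrative only)."""
    points = np.asarray(points, dtype=float)
    kd, dists = k_dist(points, k)
    order, dk = dk_values(kd)

    # ASSUMPTION: points after the largest DK jump form a sparse tail and are skipped.
    # This would misfire on data with several well-separated density levels.
    cut = int(np.argmax(dk)) + 1 if dk.size and dk.max() > 0 else len(points)
    available = np.zeros(len(points), dtype=bool)
    available[order[:cut]] = True

    centers = []
    for idx in order[:cut]:  # ascending k-dist, so the densest points come first
        if len(centers) == n_clusters:
            break
        if not available[idx]:
            continue
        centers.append(int(idx))
        # ASSUMPTION: points within radius_factor * k-dist share the seed's dense region.
        available[dists[idx] <= radius_factor * kd[idx]] = False
    return centers


# Toy data: three tight 3x3 grids and one distant outlier (last row).
grid = np.array([[i * 0.1, j * 0.1] for i in range(3) for j in range(3)])
data = np.vstack([grid, grid + [5.0, 5.0], grid + [10.0, 0.0], [[20.0, 20.0]]])
seeds = select_initial_centers(data, n_clusters=3)
print(seeds)
```

Because no random numbers are involved, repeated calls return the same seed indices, which is the property the abstract contrasts with random initialization. The rows `data[seeds]` could then be passed as the starting centers to any K-means implementation that accepts explicit initial centers.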