Journal of Computer Applications (计算机应用) ›› 2011, Vol. 31 ›› Issue (06): 1675-1677. DOI: 10.3724/SP.J.1087.2011.01675

• Database Technology •

Accurate property weighted K-means clustering algorithm based on information entropy

YUAN Fuyong (原福永), ZHANG Xiaocai (张晓彩), LUO Sibiao (罗思标)

  1. College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
  • Received: 2010-12-22 Revised: 2011-01-20 Online: 2011-06-20 Published: 2011-06-01
  • Contact: ZHANG Xiaocai
  • About the authors: YUAN Fuyong (born 1958), male, from Jixi, Heilongjiang, professor; main research interests: web information retrieval, databases. ZHANG Xiaocai (born 1985), female, from Shijiazhuang, Hebei, master's student; main research interests: web information retrieval, databases. LUO Sibiao (born 1984), male, from Ji'an, Jiangxi, master's student; main research interests: computational geometry, robot path planning.

Abstract: To further improve clustering accuracy, this paper addresses two weaknesses of the traditional K-means algorithm, namely how the initial cluster centers are generated and how similarity between data objects is judged, and proposes an accurate property weighted K-means clustering algorithm based on information entropy. First, the entropy method assigns a weight to each attribute (property) of the data objects, and these weights are used to correct the Euclidean distance between objects. Next, the weighted per-cluster objective function values of the preliminary clusters are compared to select high-quality initial cluster centers, which yields more accurate and more stable clustering. Finally, the algorithm is implemented in Matlab. Experiments show that its clustering accuracy and stability are clearly higher than those of the traditional K-means algorithm.

Key words: K-means, accuracy, information entropy, property weight, initial clustering center
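
The two steps named in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the authors' Matlab, and it rests on assumptions, because the abstract does not give the paper's formulas. The attribute weights follow the standard entropy weight method (min-max scaling, normalised Shannon entropy, weight proportional to 1 minus entropy). The initial centers are chosen by drawing several random candidate sets and keeping the one with the lowest weighted objective, which may differ from the authors' selection procedure. All function names (`entropy_weights`, `weighted_distance`, `weighted_kmeans`) and the `trials` parameter are hypothetical.

```python
import math
import random


def entropy_weights(data):
    """Entropy-method weights for each column of `data` (needs >= 2 rows)."""
    n, m = len(data), len(data[0])
    diversities = []
    for j in range(m):
        col = [row[j] for row in data]
        lo, hi = min(col), max(col)
        # Min-max scale to [0, 1]; a constant column is treated as uniform.
        scaled = [(v - lo) / (hi - lo) if hi > lo else 1.0 for v in col]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        # Shannon entropy normalised to [0, 1]; 0 * ln(0) is taken as 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        diversities.append(max(0.0, 1.0 - entropy))
    total_div = sum(diversities)
    if total_div == 0:
        return [1.0 / m] * m  # no attribute is informative: equal weights
    return [d / total_div for d in diversities]


def weighted_distance(x, y, weights):
    """Euclidean distance with each squared difference scaled by its weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def assign(data, centers, weights):
    """Index of the nearest center (weighted distance) for every point."""
    return [
        min(range(len(centers)),
            key=lambda c: weighted_distance(p, centers[c], weights))
        for p in data
    ]


def objective(data, centers, labels, weights):
    """Weighted sum of squared distances from points to their centers."""
    return sum(weighted_distance(p, centers[l], weights) ** 2
               for p, l in zip(data, labels))


def weighted_kmeans(data, k, trials=10, max_iter=100, seed=0):
    rng = random.Random(seed)
    weights = entropy_weights(data)
    # Assumed selection rule: keep the candidate center set whose
    # preliminary clustering has the lowest weighted objective.
    best_centers, best_cost = None, float("inf")
    for _ in range(trials):
        candidate = rng.sample(data, k)
        cost = objective(data, candidate,
                         assign(data, candidate, weights), weights)
        if cost < best_cost:
            best_centers, best_cost = candidate, cost
    centers = [list(c) for c in best_centers]
    labels = assign(data, centers, weights)
    for _ in range(max_iter):
        for c in range(k):
            members = [p for p, l in zip(data, labels) if l == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = assign(data, centers, weights)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers, weights
```

Calling `weighted_kmeans(data, k)` on a list of numeric rows returns the cluster labels, the final centers, and the attribute weights. Under the entropy method, an attribute whose scaled values are spread unevenly across objects has lower entropy and therefore receives a larger weight in the distance.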
