Accurate property weighted K-means clustering algorithm based on information entropy

doi:10.3724/SP.J.1087.2011.01675

Journal of Computer Applications, 2011, Vol. 31, Issue (06): 1675-1677. DOI: 10.3724/SP.J.1087.2011.01675

• Database technology •

YUAN Fuyong, ZHANG Xiaocai, LUO Sibiao

College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China

Received: 2010-12-22 Revised: 2011-01-20 Online: 2011-06-20 Published: 2011-06-01
Contact: ZHANG Xiaocai

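About the authors: YUAN Fuyong (born 1958), male, from Jixi, Heilongjiang, professor; main research interests: network information retrieval and databases. ZHANG Xiaocai (born 1985), female, from Shijiazhuang, Hebei, master's student; main research interests: network information retrieval and databases. LUO Sibiao (born 1984), male, from Ji'an, Jiangxi, master's student; main research interests: computational geometry and robot path planning.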

Abstract: To address two weaknesses of the traditional K-means algorithm, namely how the initial cluster centers are generated and how data similarity is judged, the paper proposes an accurate property weighted K-means clustering algorithm based on information entropy, with the aim of further improving clustering accuracy. First, the entropy method is used to assign a weight to each property (attribute) of the data objects, and these weights correct the Euclidean distance between objects. Then, high-quality initial cluster centers are selected by comparing the weighted objective cost function of the preliminary clusters, which gives more accurate and more stable clustering. Finally, the algorithm is implemented in Matlab. The experimental results show that the accuracy and stability of the algorithm are significantly higher than those of the traditional K-means algorithm.

Key words: K-means, accuracy, information entropy, property weight, initial clustering center
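
The abstract names the two steps but gives no formulas, so the following is only a minimal sketch of how they could fit together, written in Python rather than the authors' Matlab. It assumes the commonly used form of the entropy method (min-max scaling, column proportions, entropy normalized by ln n, weights proportional to 1 minus entropy). It also assumes that initial centers are chosen by drawing several random candidate sets and keeping the one with the lowest weighted objective after a single assignment pass. The function names and the `trials` and `seed` parameters are hypothetical and do not come from the paper.

```python
import math
import random


def entropy_weights(data):
    """Entropy-method weights for the columns (properties) of `data` (assumed form)."""
    n, m = len(data), len(data[0])
    diversity = []
    for j in range(m):
        col = [row[j] for row in data]
        lo, hi = min(col), max(col)
        if hi == lo:
            diversity.append(0.0)  # a constant property carries no information
            continue
        scaled = [(v - lo) / (hi - lo) for v in col]  # min-max scale to [0, 1]
        total = sum(scaled)
        probs = [v / total for v in scaled]
        # Shannon entropy normalized by ln(n); terms with p == 0 contribute 0.
        e = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        diversity.append(1.0 - e)
    s = sum(diversity)
    if s == 0:
        return [1.0 / m] * m  # fall back to equal weights
    return [d / s for d in diversity]


def weighted_sq_dist(x, y, w):
    """Squared Euclidean distance with one weight per property."""
    return sum(wj * (a - b) ** 2 for wj, a, b in zip(w, x, y))


def assign(data, centers, w):
    """Label each object with the index of its nearest center."""
    return [min(range(len(centers)),
                key=lambda c: weighted_sq_dist(x, centers[c], w))
            for x in data]


def objective(data, centers, labels, w):
    """Weighted within-cluster sum of squared distances."""
    return sum(weighted_sq_dist(x, centers[l], w) for x, l in zip(data, labels))


def pick_initial_centers(data, k, w, trials=10, seed=0):
    """Assumed selection rule: keep the candidate set with the lowest objective."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(trials):
        centers = rng.sample(data, k)
        cost = objective(data, centers, assign(data, centers, w), w)
        if cost < best_cost:
            best, best_cost = centers, cost
    return best


def weighted_kmeans(data, k, iters=100, seed=0):
    """K-means using entropy-weighted distances and the selected initial centers."""
    w = entropy_weights(data)
    centers = pick_initial_centers(data, k, w, seed=seed)
    labels = assign(data, centers, w)
    for _ in range(iters):
        new_centers = []
        for c in range(k):
            members = [x for x, l in zip(data, labels) if l == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        new_labels = assign(data, new_centers, w)
        centers = new_centers
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels, w
```

Calling `weighted_kmeans(data, k)` on a list of equal-length numeric rows returns the final centers, one label per row, and the property weights. Because the weights are fixed per property, the plain mean of a cluster's members still minimizes the weighted objective, which is why the center update is unchanged from standard K-means.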

CLC Number: TP301

YUAN Fuyong, ZHANG Xiaocai, LUO Sibiao. Accurate property weighted K-means clustering algorithm based on information entropy[J]. Journal of Computer Applications, 2011, 31(06): 1675-1677.
