K-prototypes based clustering algorithm for data mixed with numeric and categorical values

Journal of Computer Applications, 2010, Vol. 30, Issue (8): 2003-2005.

CHEN Wei¹, WANG Lei², JIANG Ziyun³

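1. School of Software, Hunan University
2. Hunan University
3. Institute of Electronic Innovation, School of Information Science and Engineering, Central South University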

Corresponding author: CHEN Wei
Funding: National High Technology Research and Development Program of China (863 Program)

Received: 2010-02-07; Revised: 2010-03-07; Online: 2010-07-30; Published: 2010-08-01

Abstract

Abstract: This paper studies the clustering of data with mixed numeric and categorical attributes based on the K-prototypes algorithm. First, the formula that K-prototypes uses to compute the dissimilarity of categorical attributes is improved so that it reflects the differences between samples more precisely. On this basis, a clustering algorithm for data mixed with numeric and categorical values is proposed and applied to the clustering analysis of English loanword data. Experimental results show that, compared with the original K-prototypes algorithm, the improved algorithm is more stable and achieves higher precision.

Key words: clustering, k-prototypes algorithm, data with mixed numeric and categorical values, dissimilarity
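The abstract does not reproduce the paper's improved categorical dissimilarity formula, so the sketch below shows only the baseline it modifies: the classic K-prototypes measure, which adds the squared Euclidean distance over the numeric attributes to a weight gamma times the number of mismatched categorical attributes, with each prototype updated as the per-attribute mean (numeric) and mode (categorical) of its cluster. This is a minimal Python sketch under stated assumptions: the function names, the `(numeric, categorical)` record layout, the random initialization, and the simplified batch update loop are illustrative choices, not the authors' implementation, and the loop is not necessarily identical to the original K-prototypes reallocation scheme.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical record layout: (numeric attributes, categorical attributes).
Record = Tuple[Sequence[float], Sequence[str]]


def dissimilarity(x: Record, q: Record, gamma: float) -> float:
    """Classic K-prototypes measure: squared Euclidean distance over the
    numeric attributes plus gamma times the number of categorical mismatches."""
    x_num, x_cat = x
    q_num, q_cat = q
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, q_num))
    mismatches = sum(1 for a, b in zip(x_cat, q_cat) if a != b)
    return numeric + gamma * mismatches


def update_prototype(members: List[Record]) -> Record:
    """New prototype: per-attribute mean (numeric) and mode (categorical)."""
    n_num, n_cat = len(members[0][0]), len(members[0][1])
    means = [sum(m[0][j] for m in members) / len(members) for j in range(n_num)]
    modes = [Counter(m[1][j] for m in members).most_common(1)[0][0]
             for j in range(n_cat)]
    return means, modes


def k_prototypes(data: List[Record], k: int, gamma: float,
                 max_iter: int = 100, seed: int = 0):
    """Simplified batch K-prototypes (hypothetical helper); returns (labels, prototypes)."""
    rng = random.Random(seed)
    # Initialize prototypes with k records sampled without replacement.
    prototypes = [(list(num), list(cat)) for num, cat in rng.sample(data, k)]
    labels: List[int] = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: each record joins its least-dissimilar prototype.
        new_labels = [
            min(range(k), key=lambda i: dissimilarity(x, prototypes[i], gamma))
            for x in data
        ]
        if new_labels == labels:  # no record changed cluster: converged
            break
        labels = new_labels
        # Update step: recompute the prototype of every non-empty cluster.
        for i in range(k):
            members = [x for x, lab in zip(data, labels) if lab == i]
            if members:  # an empty cluster keeps its previous prototype
                prototypes[i] = update_prototype(members)
    return labels, prototypes
```

For example, four records forming two well-separated groups, each record with two numeric and two categorical attributes, end up as two clusters of two. The improvement described in the abstract targets the categorical term, which in this sketch is the plain mismatch count inside `dissimilarity`.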

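CHEN Wei, WANG Lei, JIANG Ziyun. K-prototypes based clustering algorithm for data mixed with numeric and categorical values[J]. Journal of Computer Applications, 2010, 30(8): 2003-2005.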
