Clustering algorithms for mixed attributes based on rough set

Journal of Computer Applications (计算机应用), 2010, Vol. 30, Issue 12: 3377-3379.


范黎林 (FAN Lilin)¹, 王娟 (WANG Juan)²

1. Henan Normal University
2.

Received: 2010-06-24  Revised: 2010-07-17  Online: 2010-12-22  Published: 2010-12-01
Corresponding author: 范黎林 (FAN Lilin)



Abstract

Conventional clustering algorithms assign each object strictly to a single cluster, yet boundary objects often cannot be assigned so strictly. The rough set based k-means clustering algorithm and the rough set based leader clustering algorithm use rough set theory to place each data object in either the upper approximation or the lower approximation of a cluster. This offers a new perspective on handling uncertainty and effectively addresses the problem of uncertain cluster boundaries. However, neither algorithm can handle mixed-attribute data, and both produce clustering results that depend markedly on the initial values. To overcome these shortcomings, this paper defines a distance measure suited to mixed-attribute data, proposes an improved method for selecting the initial values, and presents a rough set based clustering algorithm for mixed-attribute data. Simulation experiments show that, when the number of clusters is uncertain, the proposed algorithm achieves significantly higher clustering accuracy than the traditional k-means algorithm.

Key words: clustering, rough set, k-means algorithm, mixed attribute
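The abstract does not give the paper's distance definition or its initial-value method. The following is therefore only a sketch, under stated assumptions, of how the named pieces can fit together; it is not the authors' algorithm. It uses the Lingras and West rough k-means update, in which an object goes to the lower approximation of its nearest cluster unless another cluster is within `eps`, in which case it goes to the boundary (upper approximation) of all those clusters. Centroids then weight the lower approximation by `w_lower` and the boundary by `1 - w_lower`. For the paper's unstated mixed-attribute distance it substitutes a HEOM-style measure: numeric attributes are normalized by their range, and categorical attributes count 0 on a match and 1 on a mismatch. Categorical centroid values are taken as a weighted mode, which is also an assumption. The centroids are seeded with the leader algorithm, so the threshold `tau` rather than a fixed k decides the number of clusters. The helper names (`make_distance`, `leaders`, `centroid`, `rough_kmeans`) and the defaults `eps=0.1` and `w_lower=0.7` are hypothetical.

```python
import math
from collections import Counter


def make_distance(data, num_idx):
    """Hypothetical HEOM-style distance for mixed records (NOT the paper's
    definition): numeric attributes are scaled by their observed range,
    categorical attributes add 0 on a match and 1 on a mismatch."""
    ranges = {}
    for i in num_idx:
        col = [row[i] for row in data]
        ranges[i] = (max(col) - min(col)) or 1.0  # avoid division by zero

    def dist(a, b):
        total = 0.0
        for i, (x, y) in enumerate(zip(a, b)):
            if i in ranges:
                total += ((x - y) / ranges[i]) ** 2
            else:
                total += 0.0 if x == y else 1.0
        return math.sqrt(total)

    return dist


def leaders(data, dist, tau):
    """Leader algorithm as an assumed seeding step: a single pass in which a
    record becomes a new leader only if it is farther than tau from every
    existing leader. The number of clusters k therefore follows from tau."""
    found = []
    for row in data:
        if all(dist(row, lead) > tau for lead in found):
            found.append(list(row))
    return found


def centroid(lower, boundary, num_idx, n_attrs, w_lower):
    """Lingras-West weighted centroid; categorical attributes take the
    weighted mode (an assumption for mixed data)."""
    groups = [(g, w) for g, w in ((lower, w_lower), (boundary, 1 - w_lower)) if g]
    if len(groups) == 1:  # only one region is non-empty: use it alone
        groups = [(groups[0][0], 1.0)]
    c = []
    for i in range(n_attrs):
        if i in num_idx:
            c.append(sum(w * sum(r[i] for r in g) / len(g) for g, w in groups))
        else:
            votes = Counter()
            for g, w in groups:
                for r in g:
                    votes[r[i]] += w / len(g)
            c.append(votes.most_common(1)[0][0])
    return c


def rough_kmeans(data, num_idx, tau, eps=0.1, w_lower=0.7, iters=20):
    """Sketch of rough k-means on mixed records. Returns the centroids, the
    lower approximations and the boundary regions (upper = lower + boundary)."""
    num_idx = set(num_idx)
    dist = make_distance(data, num_idx)
    cents = leaders(data, dist, tau)
    k, n_attrs = len(cents), len(data[0])
    for _ in range(iters):
        lower = [[] for _ in range(k)]
        boundary = [[] for _ in range(k)]
        for row in data:
            d = [dist(row, c) for c in cents]
            best = min(range(k), key=d.__getitem__)
            # other clusters whose distance is within eps of the nearest one
            close = [j for j in range(k) if j != best and d[j] - d[best] <= eps]
            if close:  # ambiguous: boundary of the nearest and all close clusters
                for j in [best] + close:
                    boundary[j].append(row)
            else:  # unambiguous: lower (hence also upper) approximation
                lower[best].append(row)
        new = [centroid(lower[j], boundary[j], num_idx, n_attrs, w_lower)
               if lower[j] or boundary[j] else cents[j] for j in range(k)]
        if new == cents:  # assignments stopped changing
            break
        cents = new
    return cents, lower, boundary
```

For example, take three `"red"` records near 1.0, three `"blue"` records near 5.0 and one `"green"` record at 3.05. Calling `rough_kmeans(data, [0], tau=1.2)` then yields two clusters, and the green record lands only in the boundary regions of both. The difference threshold `eps` follows Lingras and West; a ratio threshold is a common alternative.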

FAN Lilin, WANG Juan. Clustering algorithms for mixed attributes based on rough set[J]. Journal of Computer Applications, 2010, 30(12): 3377-3379.
