Journal of Computer Applications, 2010, Vol. 30, Issue (12): 3377-3379.

• Database and Data Mining •

Clustering algorithms for mixed attributes based on rough set

FAN Li-lin¹, WANG Juan²

  1. Henan Normal University
  2.
  • Received: 2010-06-24 Revised: 2010-07-17 Online: 2010-12-22 Published: 2010-12-01
  • Corresponding author: FAN Li-lin



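Abstract: Conventional clustering methods assign each object strictly to a single cluster, yet boundary objects often cannot be assigned so strictly. The rough-set-based k-means clustering algorithm and the rough-set-based leader clustering algorithm use rough set theory to place each data object in the upper approximation or the lower approximation of a cluster. This offers a new perspective on handling uncertainty and resolves the problem of uncertain boundaries well. Their drawbacks, however, are that they cannot handle mixed-attribute data and that their clustering results depend markedly on the initial values. To address these shortcomings, this paper defines a distance measure suited to mixed-attribute data, proposes an improved method for selecting the initial values, and presents a rough-set-based clustering algorithm for mixed-attribute data. Simulation experiments show that, when the number of clusters is uncertain, the clustering accuracy of the algorithm is clearly higher than that of the traditional k-means algorithm.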
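The abstract does not state the paper's distance definition, its initial-value rule, or its parameter values, so the following Python sketch only illustrates the general idea under loudly labeled assumptions. The mixed distance used here (range-normalised squared differences on numeric attributes plus a weight `gamma` times the number of categorical mismatches) is a k-prototypes-style choice, not necessarily the authors'. Objects go to the lower approximation of their nearest cluster unless another centre lies within a hypothetical difference threshold `epsilon`, in which case they go only to the upper approximations of all such clusters. Centres mix lower-approximation and boundary members with a hypothetical weight `w_lower`, and the max-min (farthest-first) seeding in `max_min_init` is a stand-in for the paper's improved initialisation, not a reproduction of it. All function names are illustrative.

```python
import math
from collections import defaultdict


def mixed_distance(a, b, num_idx, cat_idx, ranges, gamma=1.0):
    """ASSUMED mixed-attribute distance: range-normalised squared differences on
    numeric attributes plus gamma * (number of categorical mismatches)."""
    num = sum(((a[i] - b[i]) / (ranges[i] or 1.0)) ** 2 for i in num_idx)
    cat = sum(1.0 for j in cat_idx if a[j] != b[j])
    return math.sqrt(num + gamma * cat)


def max_min_init(data, k, dist):
    """HYPOTHETICAL seeding (max-min / farthest-first): start from object 0, then
    repeatedly add the object whose nearest chosen centre is farthest away."""
    centres = [data[0]]
    while len(centres) < k:
        far = max(data, key=lambda x: min(dist(x, c) for c in centres))
        centres.append(far)
    return [list(c) for c in centres]


def rough_centroid(lower, boundary, num_idx, cat_idx, w_lower=0.7):
    """Weighted centre: w_lower for the lower approximation, 1 - w_lower for the
    boundary; numeric attributes use weighted means, categorical ones a weighted mode."""
    if not boundary:
        parts = [(lower, 1.0)]
    elif not lower:
        parts = [(boundary, 1.0)]
    else:
        parts = [(lower, w_lower), (boundary, 1.0 - w_lower)]
    centre = [None] * len((lower or boundary)[0])
    for i in num_idx:
        centre[i] = sum(w * sum(x[i] for x in objs) / len(objs) for objs, w in parts)
    for j in cat_idx:
        votes = defaultdict(float)
        for objs, w in parts:
            for x in objs:
                votes[x[j]] += w / len(objs)
        # Sorting first makes tie-breaking deterministic (alphabetically first wins).
        centre[j] = max(sorted(votes), key=lambda v: votes[v])
    return centre


def rough_kmeans(data, k, num_idx, cat_idx, epsilon=0.2, gamma=1.0,
                 w_lower=0.7, max_iter=50):
    """Returns (lower, upper, centres); lower/upper are lists of object-index sets."""
    ranges = {i: max(x[i] for x in data) - min(x[i] for x in data) for i in num_idx}

    def dist(a, b):
        return mixed_distance(a, b, num_idx, cat_idx, ranges, gamma)

    centres = max_min_init(data, k, dist)
    lower, upper = [], []
    for _ in range(max_iter):
        new_lower = [set() for _ in range(k)]
        new_upper = [set() for _ in range(k)]
        for n, x in enumerate(data):
            d = [dist(x, c) for c in centres]
            h = min(range(k), key=lambda m: d[m])
            # Clusters almost as close as the nearest one make x ambiguous.
            close = [m for m in range(k) if m != h and d[m] - d[h] <= epsilon]
            new_upper[h].add(n)
            for m in close:
                new_upper[m].add(n)   # boundary object: upper approximations only
            if not close:
                new_lower[h].add(n)   # certain member: lower (hence upper) approximation
        if new_lower == lower and new_upper == upper:
            break                     # memberships unchanged: converged
        lower, upper = new_lower, new_upper
        for c in range(k):
            low = [data[n] for n in sorted(lower[c])]
            bnd = [data[n] for n in sorted(upper[c] - lower[c])]
            if low or bnd:            # an empty cluster keeps its previous centre
                centres[c] = rough_centroid(low, bnd, num_idx, cat_idx, w_lower)
    return lower, upper, centres


if __name__ == "__main__":
    data = [(1.0, 1.0, "red"), (1.2, 0.9, "red"), (0.9, 1.1, "red"),
            (8.0, 8.0, "blue"), (8.2, 7.9, "blue"), (7.9, 8.1, "blue"),
            (4.5, 4.5, "green")]  # last object sits between the two groups
    lower, upper, _ = rough_kmeans(data, 2, num_idx=[0, 1], cat_idx=[2])
    print([sorted(s) for s in lower])  # [[0, 1, 2], [3, 4, 5]]
    print([sorted(s) for s in upper])  # [[0, 1, 2, 6], [3, 4, 5, 6]]
```

In this toy run the in-between object (index 6) ends up in both upper approximations but in neither lower approximation, which is the kind of boundary uncertainty the abstract describes. A larger `epsilon` widens the boundary region, and a larger `w_lower` lets certain members dominate each centre.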

Key words: clustering, rough set, k-means algorithm, mixed attribute

