Journal of Computer Applications, 2016, Vol. 36, Issue (8): 2061-2065. DOI: 10.11772/j.issn.1001-9081.2016.08.2061

• The 6th China Conference on Data Mining (CCDM 2016) •

Automatic three-way decision clustering algorithm based on k-means

YU Hong, MAO Chuankai

  1. Chongqing Key Laboratory of Computational Intelligence (Chongqing University of Posts and Telecommunications), Chongqing 400065, China
  • Received: 2016-03-01 Revised: 2016-05-11 Online: 2016-08-10 Published: 2016-08-10
  • Corresponding author: YU Hong
  • About the authors: YU Hong (born 1972), female, from Chongqing, professor, Ph.D., CCF member; main research interests: rough sets, three-way decisions, intelligent information processing, Web intelligence, and data mining. MAO Chuankai (born 1989), male, from Ziyang, Sichuan, master's student; main research interests: clustering and data mining.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61379114, 61533020).

Abstract: The widely used k-means algorithm produces a two-way decision result: each object either belongs to a cluster or does not. Such a decision scheme is hard to apply in situations that involve uncertainty. A three-way decision clustering method is therefore proposed to describe three possible relationships between an object and a cluster: the object definitely belongs to the cluster, the object may belong to the cluster, or the object definitely does not belong to the cluster. Two-way decision is thus a special case of three-way decision. In addition, a separation index and a clustering validity index are defined based on intra-cluster compactness and on inter-cluster separation that takes neighboring clusters into account, and an automatic three-way decision clustering algorithm is proposed. The method offers a new approach to automatically determining the number of clusters within the k-means framework when the data contain uncertain information. Preliminary comparative experiments on artificial data sets and real UCI data sets indicate that the proposed method is effective.

Key words: clustering, three-way decision, validity index, k-means algorithm
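
The three relationships described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give their assignment rule, separation index, or validity index, so the function `three_way_assign`, the `ratio` threshold, and the region names `core` and `fringe` are all assumptions made for illustration. Given fixed centroids (for example, from an ordinary k-means run), an object is placed in the core of a cluster when it definitely belongs there, in the fringe of several clusters when it may belong to each of them, and in neither region of a cluster it definitely does not belong to.

```python
import math


def three_way_assign(points, centroids, ratio=1.5):
    """Split each cluster into a core set and a fringe set of point indices.

    Hypothetical rule (not taken from the paper): a point is a core member
    of its nearest cluster when no other centroid lies within `ratio` times
    its nearest-centroid distance; otherwise it is a fringe member of every
    cluster whose centroid lies within that radius.
    """
    k = len(centroids)
    core = [set() for _ in range(k)]
    fringe = [set() for _ in range(k)]
    for i, p in enumerate(points):
        dists = [math.dist(p, c) for c in centroids]
        nearest = min(dists)
        # Clusters that are "close enough" to compete for this point.
        close = [j for j, d in enumerate(dists) if d <= ratio * nearest]
        if len(close) == 1:
            core[close[0]].add(i)      # definitely belongs to one cluster
        else:
            for j in close:
                fringe[j].add(i)       # may belong to each close cluster
    return core, fringe


# Toy data: two tight groups plus one ambiguous point between them.
points = [(0.0, 0.0), (0.2, 0.1), (4.0, 4.0), (3.9, 4.2), (2.0, 1.9)]
centroids = [(0.0, 0.0), (4.0, 4.0)]

core, fringe = three_way_assign(points, centroids)
# core   -> [{0, 1}, {2, 3}]
# fringe -> [{4}, {4}]

# With ratio=1.0 only the nearest centroid qualifies (barring exact ties),
# so every fringe set is empty and the result is a two-way clustering.
hard_core, hard_fringe = three_way_assign(points, centroids, ratio=1.0)
```

In this sketch, setting `ratio=1.0` empties every fringe set, which mirrors the abstract's remark that two-way decision is a special case of three-way decision.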
