Journal of Computer Applications

• Artificial Intelligence and Simulation •

NCCA2017+14+Design of hybrid data clustering algorithm based on density peak

LI Ye (李晔)1,2, CHEN Yiyan (陈奕延)2, ZHANG Shufen (张淑芬)3

  1. Institute of Quantitative & Technological Economics, Chinese Academy of Social Sciences
  2. Service Quality Professional Committee, China Marketing Association
  3. Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology
  • Received: 2017-08-21 Online: 2017-08-21 Published: 2017-09-15
  • Contact: Yi-Yan Chen

Abstract:

The k-prototypes algorithm cannot identify the number of clusters automatically and cannot discover clusters of arbitrary shape. To address these limitations, a clustering algorithm for mixed-type data based on searching for density peaks was proposed. Firstly, the CFSFDP (Clustering by Fast Search and Find of Density Peaks) algorithm was extended to mixed-type data sets: after a distance between mixed-type data objects was defined, the cluster centers were determined following the CFSFDP procedure, which also fixed the number of clusters automatically; the remaining points were then assigned to clusters in descending order of density. Secondly, the selection of the threshold (cutoff distance) and the weight in the proposed algorithm was studied. The threshold in the density formula was extracted automatically by calculating the potential entropy of the data field, and the weight in the distance formula was defined using statistics that measure the clustering tendency of numerical data sets and categorical data sets. Finally, tests on three real mixed-type data sets showed that the proposed algorithm improved clustering accuracy compared with the traditional k-prototypes algorithm.

Key words: cluster analysis, mixed data, data field, clustering tendency, density peaks
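
The abstract gives no formulas, so the following is only a minimal sketch of how a density-peak procedure might run on mixed-type data, not the authors' implementation. It makes several assumptions. The mixed distance is taken as Euclidean distance on the numerical attributes plus a weight times the number of mismatched categorical attributes (in the style of k-prototypes). The density is a simple cutoff count. The weight and the cutoff are passed in by hand rather than derived from clustering-tendency statistics or potential entropy. Centers are picked with two hypothetical thresholds, `rho_min` and `delta_min`, that stand in for reading the CFSFDP decision graph. The function and parameter names are illustrative, and categorical attributes are assumed to be integer-coded.

```python
import numpy as np


def mixed_distance(num, cat, weight):
    """Pairwise distance for mixed data (assumed form, not from the paper):
    Euclidean distance on numerical columns plus `weight` times the
    number of mismatching categorical columns."""
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat)
    d_num = np.sqrt(((num[:, None, :] - num[None, :, :]) ** 2).sum(axis=2))
    d_cat = (cat[:, None, :] != cat[None, :, :]).sum(axis=2)
    return d_num + weight * d_cat


def density_peaks(dist, cutoff, rho_min, delta_min):
    """Density-peak clustering on a precomputed distance matrix.
    `rho_min` and `delta_min` are hypothetical thresholds for center selection."""
    n = dist.shape[0]
    # Local density: number of other points closer than the cutoff distance.
    rho = (dist < cutoff).sum(axis=1) - 1
    # Process points from highest to lowest density (ties keep index order).
    order = np.argsort(-rho, kind="stable")

    delta = np.zeros(n)
    nearest_denser = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbor; use its largest distance.
            delta[i] = dist[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        nearest_denser[i] = j

    # Centers: the densest point, plus points with high density and high delta.
    centers = [int(i) for i in order
               if i == order[0] or (rho[i] >= rho_min and delta[i] >= delta_min)]
    labels = np.full(n, -1)
    for c, i in enumerate(centers):
        labels[i] = c
    # Remaining points inherit the label of their nearest denser neighbor,
    # visited in descending order of density so that neighbor is already labeled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels, centers
```

The number of clusters is not an input here: it falls out of how many points pass the center test, which is the property the abstract contrasts with k-prototypes. Assigning each remaining point to the cluster of its nearest denser neighbor follows the original CFSFDP procedure.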
