Journal of Computer Applications, 2022, Vol. 42, Issue (5): 1472-1479. DOI: 10.11772/j.issn.1001-9081.2021030515

• Data science and technology •

Clustering algorithm based on local gravity and distance

Jie DU, Yan MA, Hui HUANG

  1. College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China
  • Received: 2021-04-06  Revised: 2021-07-09  Accepted: 2021-07-14  Online: 2022-06-11  Published: 2022-05-10
  • Contact: Yan MA
  • About author: DU Jie, born in 1996, M.S. candidate. Her research interests include pattern recognition and image processing.
    MA Yan, born in 1970, Ph.D., professor. Her research interests include pattern recognition and image processing.
    HUANG Hui, born in 1981, Ph.D., associate professor. Her research interests include pattern recognition and medical image processing.
  • Supported by:
    National Natural Science Foundation of China (61373004)

基于局部引力和距离的聚类算法

杜洁, 马燕, 黄慧

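  1. College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China
  • Corresponding author: MA Yan (马燕)
  • About the authors: DU Jie (杜洁), born in 1996, female, from Huzhou, Zhejiang, M.S. candidate. Main research interests: pattern recognition, image processing.
    MA Yan (马燕), born in 1970, female, from Haining, Zhejiang, professor, Ph.D., CCF member. Main research interests: pattern recognition, image processing. ma‑yan@shnu.edu.cn
    HUANG Hui (黄慧), born in 1981, female, from Rizhao, Shandong, associate professor, Ph.D. Main research interests: pattern recognition, medical image processing.
  • Supported by:
    National Natural Science Foundation of China (61373004)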

Abstract:

The Density Peak Clustering (DPC) algorithm cannot accurately select cluster centers for datasets with varying densities and complex shapes, and the Clustering by Local Gravitation (LGC) algorithm has many parameters that must be adjusted manually. To address these issues, a new Clustering algorithm based on Local Gravity and Distance (LGDC) was proposed. Firstly, the local gravity model was used to calculate the concentration (CE) of each data point, and the distance between each point and the points with higher CE values was determined according to CE. Then, the data points with both high CE and large distance were selected as cluster centers. Finally, the remaining data points were allocated based on the idea that the CE of the internal points of a cluster is much higher than that of its boundary points. At the same time, the balanced k-nearest neighbor was used to adjust the parameters automatically. Experimental results show that LGDC achieves better clustering results on four synthetic datasets. Compared with algorithms such as DPC and LGC, LGDC improves the Adjusted Rand Index (ARI) by 0.1447 on average on real datasets such as Wine, SCADI and Soybean.
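The center-selection step described above (a high per-point score combined with a large distance to any higher-scoring point) can be illustrated with a minimal sketch. The abstract does not give the CE formula, so `standin_score` below is a hypothetical placeholder (inverse mean distance to the k nearest neighbors) and is not the paper's local gravity model. The function names, the ranking by `score * delta`, and the convention for the top-scoring point are likewise assumptions borrowed from common DPC practice, not taken from the paper.

```python
import math


def standin_score(points, k):
    """Hypothetical stand-in for CE: inverse of the mean distance to the
    k nearest neighbors. This is NOT the paper's local gravity model."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(1.0 / (sum(dists[:k]) / k))
    return scores


def nearest_higher_distance(points, score):
    """For each point, the distance to the nearest point with a strictly
    higher score. A point with no higher-scoring point gets its largest
    distance to any other point (a common DPC convention, assumed here)."""
    n = len(points)
    delta = []
    for i in range(n):
        higher = [math.dist(points[i], points[j])
                  for j in range(n) if score[j] > score[i]]
        if higher:
            delta.append(min(higher))
        else:
            delta.append(max(math.dist(points[i], points[j])
                             for j in range(n) if j != i))
    return delta


def pick_centers(points, score, n_centers):
    """Return the indices of the n_centers points with the largest
    score * delta product (an assumed ranking rule), in ascending order."""
    delta = nearest_higher_distance(points, score)
    ranked = sorted(range(len(points)),
                    key=lambda i: score[i] * delta[i], reverse=True)
    return sorted(ranked[:n_centers])


# Two well-separated groups; indices 0 and 5 are the group middles.
points = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
          (10, 10), (10.5, 10), (10, 10.5), (9.5, 10), (10, 9.5)]
print(pick_centers(points, standin_score(points, k=2), n_centers=2))  # prints [0, 5]
```

In this toy example the number of centers is passed in by hand; how LGDC decides the number of centers and how it assigns the remaining points is not specified in the abstract, so neither is sketched here.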

Key words: density peak clustering, gravity clustering, local gravity model, concentration, distance

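Abstract (translated from the Chinese):

The Density Peak Clustering (DPC) algorithm cannot accurately select cluster centers for datasets with diverse densities and complex shapes, while the Clustering by Local Gravitation (LGC) algorithm has many parameters that must be tuned manually. To address these problems, a clustering algorithm based on local gravity and distance (LGDC) is proposed. First, the local gravity model is used to compute the concentration (CE) of each data point, and the distance between each data point and the points with higher concentration is determined from the concentration. Then, data points with both high concentration and high distance values are selected as cluster centers. Finally, the remaining data points are assigned based on the idea that the concentration of a cluster's interior points is much higher than that of its boundary points, and the balanced k-nearest neighbor is used to adjust the parameters automatically. Experimental results show that LGDC achieves better clustering results on four synthetic datasets, and that on real datasets such as Wine, SCADI and Soybean, its Adjusted Rand Index (ARI) is on average 0.1447 higher than that of algorithms such as DPC and LGC.

Key words (translated from the Chinese): density peak clustering, gravity clustering, local gravity model, concentration, distance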

CLC Number: