Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (12): 3433-3437. DOI: 10.11772/j.issn.1001-9081.2018040739

• Data Science and Technology •

Gaussian mixture model clustering algorithm fused with density peaks

陶志勇1, 刘晓芳1,2, 王和章1   

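  1. School of Electronic and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105, China;
  2. Fuxin Lixing Technology Company Limited, Fuxin, Liaoning 123000, China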
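  • Received: 2018-04-11 Revised: 2018-06-11 Online: 2018-12-10 Published: 2018-12-15
  • Corresponding author: LIU Xiaofang
  • About the authors: TAO Zhiyong (1978-), male, born in Wudalianchi, Heilongjiang, Ph.D., associate professor, CCF member; main research interest: multimedia communication. LIU Xiaofang (1995-), female, born in Anshan, Liaoning, M.S. candidate; main research interest: machine learning. WANG Hezhang (1992-), male, born in Xingtai, Hebei, M.S. candidate; main research interests: wireless sensor networks and the Internet of Things.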
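  • Funding:
    Doctoral Research Start Foundation of Liaoning Province (20170520098); Natural Science Foundation of Liaoning Province (2015020100); Research Project of Liaoning Provincial Higher Education Undergraduate Teaching Reform (551610001095); General Project of Liaoning Provincial Education Department (LJ2017QL013).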

Clustering algorithm of Gaussian mixture model based on density peaks

TAO Zhiyong1, LIU Xiaofang1,2, WANG Hezhang1   

  1. School of Electronic and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105, China;
  2. Fuxin Lixing Technology Company Limited, Fuxin, Liaoning 123000, China
  • Received: 2018-04-11 Revised: 2018-06-11 Online: 2018-12-10 Published: 2018-12-15
  • Contact: LIU Xiaofang
  • Supported by:
    This work is partially supported by the Doctoral Research Start Foundation of Liaoning Province (20170520098), the Natural Science Foundation of Liaoning Province (2015020100), the Research Project of Liaoning Provincial Higher Education Undergraduate Teaching Reform (551610001095), and the General Project of Liaoning Provincial Education Department (LJ2017QL013).

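Abstract: To address the sensitivity of the Gaussian Mixture Model (GMM) clustering algorithm to initial values and its tendency to become trapped in local minima, the strong global search capability of the Density Peaks (DP) algorithm is used to optimize the initial cluster centers of GMM, and a GMM clustering algorithm fused with DP (DP-GMMC) is proposed. First, cluster centers are found with the DP algorithm, which yields the initial parameters of the mixture model. Next, the Expectation Maximization (EM) algorithm iteratively estimates the parameters of the mixture model. Finally, the data points are clustered according to the Bayesian posterior probability criterion. On the Iris dataset, the clustering accuracy of DP-GMMC can reach 96.67%, which is 33.6 percentage points higher than that of the traditional GMM algorithm, and the dependence on the initial cluster centers is resolved. The experimental results show that DP-GMMC clusters low-dimensional datasets well.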
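For orientation, the three steps named in the abstract can be written out with the usual textbook definitions. The abstract itself gives no formulas, so every symbol below (the local density ρ_i, the separation distance δ_i, the cutoff d_c, the mixture parameters π_k, μ_k, Σ_k, and the posterior γ_ik) is an assumption taken from the standard density-peaks and GMM formulations, and the paper's own variant may differ, for example in the density kernel.

```latex
% Step 1 (DP): local density and distance to the nearest denser point.
% d_{ij} is the distance between x_i and x_j; d_c is a cutoff distance.
% For the densest point, \delta_i is conventionally set to \max_j d_{ij}.
\[
\rho_i = \sum_{j \neq i} \exp\!\left(-\frac{d_{ij}^2}{d_c^2}\right), \qquad
\delta_i = \min_{j:\,\rho_j > \rho_i} d_{ij}
\]

% Mixture density with K components.
\[
p(x) = \sum_{k=1}^{K} \pi_k \,\mathcal{N}(x \mid \mu_k, \Sigma_k)
\]

% Step 2 (EM), E-step: posterior probability of component k for point x_i.
\[
\gamma_{ik} = \frac{\pi_k \,\mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
                   {\sum_{l=1}^{K} \pi_l \,\mathcal{N}(x_i \mid \mu_l, \Sigma_l)}
\]

% Step 2 (EM), M-step: re-estimate the parameters from the posteriors.
\[
N_k = \sum_{i=1}^{N} \gamma_{ik}, \quad
\pi_k = \frac{N_k}{N}, \quad
\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik}\, x_i, \quad
\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik}\,(x_i - \mu_k)(x_i - \mu_k)^{\mathsf{T}}
\]

% Step 3: assign each point to the component with the largest posterior.
\[
c_i = \arg\max_{k} \gamma_{ik}
\]
```

Points with both large ρ_i and large δ_i are the candidate cluster centers; in this reading they supply the initial μ_k from which EM starts.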

Keywords: clustering, Gaussian mixture model, expectation maximization algorithm, density peaks

Abstract: The Gaussian Mixture Model (GMM) clustering algorithm is sensitive to its initial values and easily falls into local minima. To address these problems, the strong global search ability of the Density Peaks (DP) algorithm was used to optimize the initial cluster centers of GMM, and a GMM clustering algorithm based on DP (DP-GMMC) was proposed. Firstly, cluster centers were found by the DP algorithm to obtain the initial parameters of the mixture model. Then, the Expectation Maximization (EM) algorithm was used to iteratively estimate the parameters of the mixture model. Finally, the data points were clustered according to the Bayesian posterior probability criterion. On the Iris dataset, the clustering accuracy of DP-GMMC can reach 96.67%, which is 33.6 percentage points higher than that of the traditional GMM algorithm, and the dependence on the initial cluster centers is resolved. The experimental results show that the proposed DP-GMMC clusters low-dimensional datasets well.
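The pipeline described in the abstract (DP for initial centers, EM for parameter estimation, posterior-based assignment) can be sketched roughly as follows. This is an illustrative sketch and not the authors' implementation: the function names, the Gaussian-kernel density, the cutoff percentile `dc_percentile`, the selection of centers by the largest ρ·δ product, and the initialization of weights and covariances (uniform weights, shared global covariance) are all assumptions, since the abstract does not specify them.

```python
import numpy as np


def density_peak_centers(X, k, dc_percentile=2.0):
    """Return indices of the k points with the largest rho * delta (assumed rule)."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    dc = np.percentile(d[np.triu_indices(n, k=1)], dc_percentile)  # cutoff distance
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0  # kernel density, self excluded
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        # Distance to the nearest denser point; the densest point gets its max distance.
        delta[i] = d[i, higher].min() if higher.any() else d[i].max()
    return np.argsort(rho * delta)[-k:]


def log_gaussian(X, mean, cov):
    """Log density of N(mean, cov) at each row of X, via a Cholesky factor."""
    dim = X.shape[1]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (X - mean).T)
    log_det = 2.0 * np.log(np.diag(chol)).sum()
    return -0.5 * (dim * np.log(2.0 * np.pi) + log_det + (z ** 2).sum(axis=0))


def posteriors(X, weights, means, covs):
    """E-step: posterior of each component per point, plus the total log-likelihood."""
    log_p = np.stack(
        [np.log(w) + log_gaussian(X, m, c) for w, m, c in zip(weights, means, covs)],
        axis=1,
    )
    log_norm = np.logaddexp.reduce(log_p, axis=1)
    return np.exp(log_p - log_norm[:, None]), log_norm.sum()


def dp_gmm_cluster(X, k, n_iter=100, tol=1e-6, reg=1e-6):
    """DP-initialized EM for a GMM; returns hard labels and fitted means."""
    n, dim = X.shape
    means = X[density_peak_centers(X, k)].copy()  # DP centers as initial means
    weights = np.full(k, 1.0 / k)                 # assumed: uniform initial weights
    covs = np.tile(np.cov(X.T) + reg * np.eye(dim), (k, 1, 1))  # assumed: global covariance
    prev_ll = -np.inf
    for _ in range(n_iter):
        resp, ll = posteriors(X, weights, means, covs)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
        # M-step: re-estimate weights, means, and covariances from the posteriors.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(dim)
    # Assign each point to the component with the largest posterior probability.
    resp, _ = posteriors(X, weights, means, covs)
    return resp.argmax(axis=1), means
```

Calling `dp_gmm_cluster(X, k)` on an `(n, dim)` array returns one label per row. Because the starting means come from the data's density structure rather than from random draws, repeated runs on the same data give the same result, which is the initialization dependence the abstract targets.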

Key words: clustering, Gaussian Mixture Model (GMM), Expectation Maximization (EM) algorithm, Density Peaks (DP)

CLC number: