Journal of Computer Applications (《计算机应用》), the only official website

Clustering ensemble algorithm with high-order consistency learning

Jianwen GAN1, Yan CHEN2, Peng ZHOU3, Liang DU1,4

  1. School of Computer and Information Technology, Shanxi University, Taiyuan Shanxi 030006, China
  2. College of Computer Science, Sichuan University, Chengdu Sichuan 610065, China
  3. School of Computer Science and Technology, Anhui University, Hefei Anhui 230601, China
  4. Institute of Big Data Science and Industry, Shanxi University, Taiyuan Shanxi 030006, China
  • Received: 2022-09-12  Revised: 2022-10-28  Online: 2023-07-03
  • Corresponding author: Liang DU
  • Supported by:
    National Natural Science Foundation of China (61976129)

Abstract: Most existing research on clustering ensemble focuses on designing effective ensemble algorithms. However, base clusterers vary in quality, and low-quality base clusterers degrade the performance of the clustering ensemble. To address this problem from a data mining perspective, the intrinsic connections among data were mined on the basis of the base clusterers, and a high-order information fusion algorithm, namely Clustering Ensemble with High-order Consensus learning (HCLCE), was proposed to represent the connections between data from different dimensions. Firstly, each type of high-order information was fused into a new structured consistency matrix. Then, the resulting consistency matrices were fused together. Finally, the multiple types of information were fused into a single consistent result. Experimental results show that, compared with the second-best algorithm, Locally Weighted Evidence Accumulation (LWEA), HCLCE improves clustering accuracy by an average of 7.22% and Normalized Mutual Information (NMI) by an average of 9.19%. These results indicate that HCLCE obtains better clustering results than existing clustering ensemble algorithms and than using any single type of information alone.
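The three fusion steps described in the abstract can be illustrated with a toy, dependency-free pipeline. This is a minimal sketch, not the authors' HCLCE, and it rests on assumptions the abstract does not state: first-order information is taken to be the classic co-association (evidence accumulation) matrix; the hypothetical "order-k" information is the k-th power of that matrix (k-step connectivity); the doubly stochastic constraint named in the keywords is approximated by plain Sinkhorn-Knopp row/column normalization; the matrices are fused with equal weights; and the final partition is read off as connected components above a hypothetical threshold `tau`. All function names (`co_association`, `sinkhorn`, `ensemble`, and so on) are illustrative.

```python
"""Illustrative clustering-ensemble sketch, NOT the HCLCE algorithm itself.
Assumptions: order-k information = k-th power of the co-association matrix,
doubly stochastic constraint ~ Sinkhorn-Knopp scaling, equal fusion weights,
final labels = connected components of the fused graph above `tau`."""
from typing import List, Tuple

Matrix = List[List[float]]


def co_association(base_labels: List[List[int]]) -> Matrix:
    """First-order information: fraction of base clusterings that put i and j together."""
    m, n = len(base_labels), len(base_labels[0])
    return [[sum(lab[i] == lab[j] for lab in base_labels) / m for j in range(n)]
            for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain square matrix product, kept dependency-free on purpose."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def high_order_matrices(c: Matrix, max_order: int = 2) -> List[Matrix]:
    """Hypothetical order-k information: C, C^2, ..., C^max_order (k-step connectivity)."""
    mats, power = [c], c
    for _ in range(max_order - 1):
        power = matmul(power, c)
        mats.append(power)
    return mats


def sinkhorn(a: Matrix, iters: int = 500) -> Matrix:
    """Approximate a doubly stochastic matrix by alternately normalizing rows and
    columns, then symmetrize. Assumes every row and column has a positive entry,
    which holds here because C (and hence every power of C) has a positive diagonal."""
    s = [row[:] for row in a]
    n = len(s)
    for _ in range(iters):
        for i in range(n):
            r = sum(s[i])
            s[i] = [v / r for v in s[i]]
        for j in range(n):
            c = sum(s[i][j] for i in range(n))
            for i in range(n):
                s[i][j] /= c
    return [[(s[i][j] + s[j][i]) / 2 for j in range(n)] for i in range(n)]


def fuse(mats: List[Matrix]) -> Matrix:
    """Equal-weight average; any learned weighting used by HCLCE is not reproduced."""
    k, n = len(mats), len(mats[0])
    return [[sum(m[i][j] for m in mats) / k for j in range(n)] for i in range(n)]


def components(s: Matrix, tau: float) -> List[int]:
    """Final labels: connected components of the graph with edges s[i][j] > tau."""
    n = len(s)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and s[i][j] > tau:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def ensemble(base_labels: List[List[int]], max_order: int = 2,
             tau: float = 0.05) -> Tuple[Matrix, List[int]]:
    """Co-association -> high-order matrices -> Sinkhorn -> fusion -> labels."""
    c = co_association(base_labels)
    normalized = [sinkhorn(m) for m in high_order_matrices(c, max_order)]
    fused = fuse(normalized)
    return fused, components(fused, tau)


if __name__ == "__main__":
    base = [[0, 0, 0, 1, 1, 1],   # matches the two-group structure
            [0, 0, 1, 2, 3, 3],   # splits each group
            [0, 1, 1, 2, 2, 3]]   # splits each group differently
    fused, labels = ensemble(base)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

Running the script prints `[0, 0, 0, 1, 1, 1]`: the groups {0, 1, 2} and {3, 4, 5} are recovered even though the second and third base clusterings each split them. The fixed matrix powers, equal weights and threshold are placeholders chosen only to keep the sketch short; they stand in for whatever the actual method learns.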

Key words: clustering ensemble, consistency learning, high-order information, doubly stochastic constraint, structuring, similarity matrix

CLC number: