Clustering ensemble algorithm with high-order consistency learning

doi:10.11772/j.issn.1001-9081.2022091406

Journal of Computer Applications (official website)


Jianwen GAN¹, Yan CHEN², Peng ZHOU³, Liang DU¹,⁴

1. School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China
2. College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
3. School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
4. Institute of Big Data Science and Industry, Shanxi University, Taiyuan, Shanxi 030006, China

Received: 2022-09-12 Revised: 2022-10-28 Online: 2023-07-03
Corresponding author: Liang DU
Supported by: National Natural Science Foundation of China (61976129)

Abstract

Abstract: Most existing research on clustering ensembles focuses on designing effective ensemble algorithms. Base clusterers vary in quality, and low-quality base clusterers degrade the performance of the ensemble. To address this problem from the perspective of data mining, the intrinsic connections among data were mined on the basis of the base clusterers, and a high-order information fusion algorithm, Clustering Ensemble with High-order Consensus learning (HCLCE), was proposed to represent the connections among data from different dimensions. Firstly, each type of high-order information was fused into a new structured consistency matrix. Then, the resulting consistency matrices were fused together. Finally, the multiple types of information were fused into a single consistent result. Experimental results show that, compared with the second-best algorithm, Locally Weighted Evidence Accumulation (LWEA), HCLCE improves clustering accuracy by 7.22% on average and Normalized Mutual Information (NMI) by 9.19% on average. HCLCE therefore obtains better clustering results than both the other clustering ensemble algorithms and the use of any single type of information alone.

Key words: clustering ensemble, consistency learning, high-order information, doubly stochastic constraint, structuring, similarity matrix
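The abstract describes the fusion pipeline only at a high level, so the following is a minimal sketch under stated assumptions rather than the authors' HCLCE algorithm. It assumes that first-order information is the standard co-association matrix built from the base clusterings, that "high-order information" can be approximated by powers of that matrix, that the doubly stochastic constraint can be imitated by Sinkhorn row/column rescaling, and that fusion is a plain average. The names `co_association`, `sinkhorn`, `fuse_high_order`, and `max_order` are hypothetical and do not come from the paper.

```python
from typing import List

Matrix = List[List[float]]


def co_association(partitions: List[List[int]]) -> Matrix:
    """Fraction of base clusterings that place samples i and j together."""
    n = len(partitions[0])
    m = len(partitions)
    return [
        [sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
        for i in range(n)
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain square matrix product."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def sinkhorn(s: Matrix, iters: int = 200) -> Matrix:
    """Alternately rescale rows and columns so both sum to about 1.

    Assumption: this stands in for the doubly stochastic constraint
    named in the keywords; the paper may enforce it differently.
    """
    n = len(s)
    s = [row[:] for row in s]
    for _ in range(iters):
        for i in range(n):
            row_sum = sum(s[i])
            s[i] = [v / row_sum for v in s[i]]
        for j in range(n):
            col_sum = sum(s[i][j] for i in range(n))
            for i in range(n):
                s[i][j] /= col_sum
    return s


def fuse_high_order(partitions: List[List[int]], max_order: int = 3) -> Matrix:
    """Average the normalized 1st..max_order-th powers of the co-association matrix.

    Assumption: matrix powers and simple averaging are illustrative
    stand-ins for the paper's high-order information and learned fusion.
    """
    first = co_association(partitions)
    n = len(first)
    orders = []
    current = first
    for _ in range(max_order):
        orders.append(sinkhorn(current))
        current = matmul(current, first)
    return [
        [sum(o[i][j] for o in orders) / len(orders) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Three hypothetical base clusterings of four samples.
    base = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]
    for row in fuse_high_order(base):
        print([round(v, 3) for v in row])
```

In this sketch, samples 0 and 3 never share a cluster, so their first-order similarity is zero, yet the higher-order terms still link them through samples 1 and 2. That is the intuition behind using high-order information; a final clustering would be obtained by running any similarity-based method, such as spectral clustering, on the fused matrix.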

CLC number: TP181

Jianwen GAN, Yan CHEN, Peng ZHOU, Liang DU. Clustering ensemble algorithm with high-order consistency learning[J]. Journal of Computer Applications, 2023, 43(9): 2665-2672. DOI: 10.11772/j.issn.1001-9081.2022091406.
