Journal of Computer Applications, 2023, Vol. 43, Issue (5): 1481-1488. DOI: 10.11772/j.issn.1001-9081.2022071094

• Data science and technology

Semi-supervised three-way clustering ensemble based on Seeds set and pairwise constraints

Chunmao JIANG1, Peng WU2, Zhicong LI2

  1. School of Computer Science and Mathematics, Fujian University of Technology, Fuzhou Fujian 350118, China
    2. College of Computer Science and Information Engineering, Harbin Normal University, Harbin Heilongjiang 150025, China
  • Received: 2022-07-19  Revised: 2022-10-03  Accepted: 2022-11-04  Online: 2023-05-08  Published: 2023-05-10
  • Contact: Zhicong LI
  • About the authors: JIANG Chunmao, born in 1972 in Zhuanghe, Liaoning, Ph.D., professor, CCF senior member. His research interests include three-way decision and three-way computing, cloud computing, and big data mining.
    WU Peng, born in 1997 in Yantai, Shandong, M.S. candidate. His research interests include three-way decision.
    LI Zhicong, born in 1972 in Suihua, Heilongjiang, M.S., associate professor, CCF member. His research interests include data mining. Email: lizhicong72@163.com
  • Supported by:
    Natural Science Foundation of Heilongjiang Province (LH2020F031); Fujian University of Technology Research Start Fund Project (GY-Z220212)

Abstract:

By fusing multiple diverse base clustering members with an appropriate strategy, clustering ensemble can effectively improve the stability, robustness and accuracy of clustering results. However, current research on clustering ensemble rarely makes use of known prior information, and when facing complex data it is difficult to describe a clear membership relationship between objects and clusters. Therefore, a semi-supervised three-way clustering ensemble method based on Seeds set and pairwise constraints was proposed. Firstly, based on the existing label information, a new three-way label propagation algorithm was proposed to construct the base cluster members. Secondly, a semi-supervised three-way clustering ensemble framework was designed to integrate the base cluster members into a consistent similarity matrix, which was then optimized using pairwise constraint information. Finally, three-way spectral clustering was employed as the consensus function to cluster the similarity matrix and obtain the final ensemble result. Experimental results on several real-world UCI datasets show that, compared with semi-supervised clustering ensemble algorithms including Cluster-based Similarity Partitioning Algorithm (CSPA), HyperGraph Partitioning Algorithm (HGPA), Meta-CLustering Algorithm (MCLA), Label Propagation Algorithm (LPA) and Cop-Kmeans, the proposed method achieves the best Normalized Mutual Information (NMI), Adjusted Rand Index (ARI) and F-measure on most of the datasets, yielding relatively better clustering ensemble results.
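The abstract does not give the paper's formulas, so the following is only a minimal Python sketch of the general idea behind the second step: building a consistent (co-association style) similarity matrix from three-way base members and then adjusting it with pairwise constraints. It assumes each base member is a list of clusters, each represented as a (core, fringe) pair of object-index sets. The fringe weight of 0.5, the "keep the strongest evidence" rule for objects lying in several fringes, and overwriting constrained entries with 1 or 0 are illustrative assumptions, not the authors' method. The three-way spectral clustering step is not shown.

```python
from itertools import combinations

# Hypothetical weight for pairs where at least one object is only in a fringe region
# (an assumption for illustration, not a value from the paper).
FRINGE_WEIGHT = 0.5


def member_similarity(member, n):
    """Pairwise similarity contributed by one three-way base clustering.

    member: list of (core, fringe) tuples of object-index sets.
    """
    sim = [[0.0] * n for _ in range(n)]
    for core, fringe in member:
        region = core | fringe
        for i, j in combinations(sorted(region), 2):
            # Both objects in the core -> full co-membership; otherwise a reduced weight.
            w = 1.0 if (i in core and j in core) else FRINGE_WEIGHT
            # An object may sit in the fringe of several clusters; keep the strongest evidence.
            if w > sim[i][j]:
                sim[i][j] = sim[j][i] = w
    for i in range(n):
        sim[i][i] = 1.0
    return sim


def consensus_matrix(members, n):
    """Average the per-member similarity matrices into one consistent matrix."""
    total = [[0.0] * n for _ in range(n)]
    for member in members:
        s = member_similarity(member, n)
        for i in range(n):
            for j in range(n):
                total[i][j] += s[i][j]
    m = len(members)
    return [[v / m for v in row] for row in total]


def apply_constraints(sim, must_link, cannot_link):
    """Overwrite entries in place: must-link pairs -> 1.0, cannot-link pairs -> 0.0."""
    for i, j in must_link:
        sim[i][j] = sim[j][i] = 1.0
    for i, j in cannot_link:
        sim[i][j] = sim[j][i] = 0.0
    return sim


if __name__ == "__main__":
    # Four objects; in member_a, object 2 lies in the fringe of both clusters.
    member_a = [({0, 1}, {2}), ({3}, {2})]
    member_b = [({0, 1, 2}, set()), ({3}, set())]
    sim = consensus_matrix([member_a, member_b], 4)
    print(sim[0][2], sim[2][3])  # 0.75 0.25
    apply_constraints(sim, must_link=[(1, 2)], cannot_link=[(2, 3)])
    print(sim[1][2], sim[2][3])  # 1.0 0.0
```

In this toy run, object 2 is uncertain in one member and a core member in the other, so its averaged similarity to objects 0 and 1 falls between the two extremes; the constraints then fix the pairs whose relationship is known. The resulting matrix would be the input to a consensus function such as spectral clustering.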

Key words: three-way decision, clustering ensemble, three-way clustering, pairwise constraint, semi-supervised, Seeds set
