Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (6): 1673-1682. DOI: 10.11772/j.issn.1001-9081.2023060813

• The 38th CCF National Conference on Computer Applications (CCF NCCA 2023) •

不完整多视图聚类综述

董瑶1,2,3, 付怡雪1,2,3, 董永峰1,2,3, 史进1, 陈晨1,2,3

  1. 河北工业大学 人工智能与数据科学学院, 天津 300401
  2. 河北省大数据计算重点实验室(河北工业大学), 天津 300401
  3. 河北省数据驱动工业智能工程研究中心(河北工业大学), 天津 300401
  • Received: 2023-06-25 Revised: 2023-07-18 Accepted: 2023-08-03 Online: 2023-08-21 Published: 2024-06-10
  • Corresponding author: Yongfeng DONG
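  • About the authors: DONG Yao (born 1982), female, from Tangshan, Hebei; senior experimentalist, Ph.D. candidate, CCF member. Her research interests include artificial intelligence and graph data mining.
    FU Yixue (born 1999), female, from Tangshan, Hebei; M.S. candidate. Her research interests include knowledge graphs and multi-view clustering.
    SHI Jin (born 1981), male, from Zhangjiakou, Hebei; assistant research fellow, M.S. His research interests include artificial intelligence and data mining.
    CHEN Chen (born 1997), female, from Hegang, Heilongjiang; M.S. candidate. Her research interests include knowledge graphs and graph clustering.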
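  • Supported by:
    Science and Technology Research Project of Hebei Province Colleges and Universities (QN2021213); Hebei Higher Education Teaching Reform Research and Practice Project (2020GJJG027)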

Survey of incomplete multi-view clustering

Yao DONG1,2,3, Yixue FU1,2,3, Yongfeng DONG1,2,3, Jin SHI1, Chen CHEN1,2,3

  1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
  2. Hebei Province Key Laboratory of Big Data Computing (Hebei University of Technology), Tianjin 300401, China
  3. Hebei Engineering Research Center of Data-Driven Industrial Intelligence (Hebei University of Technology), Tianjin 300401, China
  • Received: 2023-06-25 Revised: 2023-07-18 Accepted: 2023-08-03 Online: 2023-08-21 Published: 2024-06-10
  • Contact: Yongfeng DONG
  • About the authors: DONG Yao, born in 1982, Ph.D. candidate, senior experimentalist. Her research interests include artificial intelligence and graph data mining.
    FU Yixue, born in 1999, M.S. candidate. Her research interests include knowledge graphs and multi-view clustering.
    SHI Jin, born in 1981, M.S., assistant research fellow. His research interests include artificial intelligence and data mining.
    CHEN Chen, born in 1997, M.S. candidate. Her research interests include knowledge graphs and graph clustering.
  • Supported by:
    Science and Technology Research Project of Hebei Province Colleges and Universities (QN2021213); Hebei Higher Education Teaching Reform Research and Practice Project (2020GJJG027)

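Abstract:

Multi-view clustering has become a research hotspot in graph data mining in recent years. Limitations of data collection technology and human factors often cause views or samples to be missing. Reducing the impact of multi-view incompleteness on clustering results is a major challenge currently facing multi-view clustering. A comprehensive study of the recent development of Incomplete Multi-view Clustering (IMC) therefore has important theoretical significance and practical value. First, the types of missing data in incomplete multi-view data were summarized and analyzed. Second, four categories of IMC methods, based on Multiple Kernel Learning (MKL), Matrix Factorization (MF) learning, deep learning, and graph learning, were compared in detail, and the technical characteristics and differences of representative methods were analyzed. Third, 22 public incomplete multi-view datasets were summarized in terms of dataset type, numbers of views and categories, and application field. Next, the evaluation metrics were summarized, and the performance of existing IMC methods on homogeneous and heterogeneous datasets was systematically analyzed. Finally, the current problems, future development directions, and existing application fields of IMC were summarized and analyzed.

Keywords: incompleteness, multi-view clustering, graph data mining, missing view, multi-view learning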

Abstract:

Multi-view clustering has recently become a hot topic in graph data mining. However, because of limitations in data collection technology or human factors, multi-view data often suffer from missing views or samples. Reducing the impact of incomplete views on clustering performance is a major challenge currently faced by multi-view clustering. A comprehensive review of the recent development of Incomplete Multi-view Clustering (IMC) is therefore of great theoretical significance and practical value. Firstly, the missing types of incomplete multi-view data were summarized and analyzed. Secondly, four types of IMC methods, based on Multiple Kernel Learning (MKL), Matrix Factorization (MF) learning, deep learning, and graph learning, were compared in detail, and the technical characteristics and differences of representative methods were analyzed. Thirdly, twenty-two public incomplete multi-view datasets were summarized from the perspectives of dataset type, numbers of views and categories, and application field. Then, the evaluation metrics were outlined, and the performance of existing IMC methods on homogeneous and heterogeneous datasets was systematically analyzed. Finally, the existing problems, future research directions, and existing application fields of IMC were discussed.

Key words: incompleteness, multi-view clustering, graph data mining, missing view, multi-view learning

CLC Number: