Journal of Computer Applications: sole official website

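Incomplete multi-view clustering algorithm based on attention mechanism

YANG Chenghao¹, HU Jie¹, WANG Hongjun², PENG Bo³

  1. School of Computing and Artificial Intelligence, Southwest Jiaotong University
  2. Southwest Jiaotong University
  3. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031
  • Received: 2024-01-08  Revised: 2024-03-14  Online: 2024-03-22  Published: 2024-03-22
  • Corresponding author: HU Jie
  • Supported by:
    National Natural Science Foundation of China; Key Research and Development Program of Sichuan Province; 2023 Southwest Jiaotong University International Student Education Management Research Project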

Abstract: Traditional deep incomplete multi-view clustering algorithms suffer from uncertainty when completing missing view data, a lack of robustness in embedding learning, and low model generalization. To address these problems, an Incomplete Multi-View Clustering algorithm based on Attention Mechanism (IMVCAM) was proposed. First, K-Nearest Neighbors (KNN) was used to complete the missing data in each view, so that the training data are complementary. Then, the data were passed through a linear encoding layer, and the resulting embeddings were fed into an attention layer to improve their quality. Finally, the k-means clustering algorithm (k-means) was applied to the embeddings learned for each view, with the view weights determined by the Pearson correlation coefficient. Experiments were conducted on five classic datasets, and IMVCAM achieved the best results on the Fashion dataset. On Fashion, compared with the second-best method, DSIMVC (Deep Safe Incomplete Multi-View Clustering), IMVCAM improved clustering accuracy by 2.85 and 4.35 percentage points at missing rates of 0.1 and 0.3, respectively. In addition, on the Caltech101-20 dataset, compared with the second-best method, IMVCSAF (Incomplete Multi-View Clustering algorithm based on Self-Attention Fusion), IMVCAM improved clustering accuracy by 7.68 and 3.48 percentage points at missing rates of 0.1 and 0.3, respectively.

Key words: Incomplete multi-view clustering, Attention mechanism, K-Nearest Neighbors (KNN), k-means clustering algorithm (k-means), Pearson correlation coefficient
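
The abstract describes a pipeline of KNN completion, linear encoding, an attention layer, and Pearson-weighted k-means. The dependency-free Python sketch below shows one way these stages could fit together on toy data; it is not the authors' implementation, and every function name in it is hypothetical. Several details are assumptions because the abstract does not specify them: the learned linear encoder is replaced by mean-centering; attention is a parameter-free scaled dot-product self-attention across samples with a residual connection; each view's weight is its Pearson correlation (clipped at zero, then normalized) between its pairwise-distance profile and the cross-view average profile; and fusion runs k-means once on the weighted concatenation of the view embeddings rather than once per view.

```python
import math

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def knn_impute(reference, target, k=2):
    """Fill each missing row (None) of `target` with the mean of the `target`
    rows of its k nearest samples, measured in the fully observed `reference` view."""
    observed = [j for j, row in enumerate(target) if row is not None]
    filled = [row[:] if row is not None else None for row in target]
    for i, row in enumerate(target):
        if row is None:
            nearest = sorted(observed, key=lambda j: sq_dist(reference[i], reference[j]))[:k]
            dim = len(target[nearest[0]])
            filled[i] = [sum(target[j][c] for j in nearest) / len(nearest) for c in range(dim)]
    return filled

def center(X):
    """Subtract column means (assumed stand-in for the learned linear encoder)."""
    means = [sum(col) / len(X) for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def self_attention(X):
    """Parameter-free scaled dot-product self-attention across samples, plus a
    residual connection. Queries, keys and values are all X (no learned weights)."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in X]
        top = max(scores)
        w = [math.exp(s - top) for s in scores]  # numerically stable softmax
        z = sum(w)
        attended = [sum(w[j] * X[j][c] for j in range(len(X))) / z for c in range(d)]
        out.append([x + a for x, a in zip(q, attended)])  # residual connection
    return out

def pearson(u, v):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0

def kmeans(X, k, iters=20):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [X[0][:]]
    while len(centers) < k:
        far = max(X, key=lambda x: min(sq_dist(x, c) for c in centers))
        centers.append(far[:])
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(x, centers[c])) for x in X]
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_views(views, n_clusters):
    """Attention embeddings per view, Pearson-based view weights, then one
    k-means run on the weighted concatenation (hypothetical fusion rule)."""
    n = len(views[0])
    embeddings = [self_attention(center(v)) for v in views]
    # Pairwise-distance profile of each view's embedding.
    profiles = [[sq_dist(e[i], e[j]) for i in range(n) for j in range(i + 1, n)]
                for e in embeddings]
    consensus = [sum(col) / len(col) for col in zip(*profiles)]
    raw = [max(pearson(p, consensus), 0.0) for p in profiles]
    total = sum(raw)
    weights = [r / total for r in raw] if total > 0 else [1.0 / len(views)] * len(views)
    # Scaling view v by sqrt(w_v) makes squared distances on the concatenation
    # equal the weighted sum of per-view squared distances.
    fused = [[math.sqrt(w) * x for w, e in zip(weights, embeddings) for x in e[i]]
             for i in range(n)]
    return kmeans(fused, n_clusters), weights

# Toy data: 6 samples, 2 views; sample 2 is missing from view B.
view_a = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
view_b = [[1.0, 1.0], [1.1, 0.9], None, [8.0, 8.2], [7.9, 8.1], [8.1, 7.9]]
view_b = knn_impute(view_a, view_b, k=2)
labels, weights = cluster_views([view_a, view_b], n_clusters=2)
print("imputed row:", view_b[2])
print("labels:", labels, "view weights:", weights)
```

Concatenating the views after scaling each by the square root of its weight is a convenient choice, because k-means on the fused vectors then minimizes the weighted sum of per-view squared distances. Running k-means separately on each view and then fusing the labels by weight, as the abstract's wording may suggest, would additionally require aligning cluster labels across views.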
