Image clustering algorithm based on semantic optimization

doi:10.11772/j.issn.1001-9081.2023040468

Abstract:

Semantic features obtained by contrastive learning in deep clustering carry insufficient information. To address this problem, an algorithm that optimizes the semantic features is proposed. In the pre-training stage, a reconstruction loss is used as a regularization term to increase the mutual information between the feature representation and the input, which approximately introduces more information relevant to the clustering task and reduces the risk of contrastive learning overfitting to the shared information. In the fine-tuning stage, the traditional scheme in which the clustering algorithm and the clustering network are updated simultaneously is abandoned; instead, the difference in similarity between an image and its nearest neighbors is used as the loss to update the clustering network, so as to make maximal use of the semantic information among neighboring images. Experimental results on the CIFAR10, CIFAR100 and STL10 datasets show that the proposed algorithm improves accuracy on STL10 by 2.7 percentage points over the second-best algorithm, SCAN (Semantic Clustering by Adopting Nearest neighbors), and also leads on the Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI) metrics, which validates the effectiveness of the proposed algorithm.

Key words: deep clustering, contrastive learning, semantic feature, overfitting, regularization
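
The pre-training objective described in the abstract, a contrastive loss regularized by a reconstruction loss, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact form of either term, so a SimCLR-style NT-Xent loss and a mean squared reconstruction error are assumed here, and the weight `lam`, the `temperature` value, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nt_xent(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss over two augmented views (assumed form).

    z1[i] and z2[i] are embeddings of two views of the same image.
    """
    z = z1 + z2                      # 2N embeddings; positives are (i, i + N)
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)      # index of the other view of the same image
        logits = [cosine(z[i], z[j]) / temperature
                  for j in range(2 * n) if j != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        total += math.log(sum(math.exp(l) for l in logits)) - pos_logit
    return total / (2 * n)


def mse(x, x_hat):
    """Mean squared error between inputs and their reconstructions."""
    count = sum(len(row) for row in x)
    return sum((a - b) ** 2
               for row, row_hat in zip(x, x_hat)
               for a, b in zip(row, row_hat)) / count


def pretrain_loss(z1, z2, x, x_hat, lam=1.0, temperature=0.5):
    """Contrastive loss plus reconstruction regularizer; lam is a hypothetical weight."""
    return nt_xent(z1, z2, temperature) + lam * mse(x, x_hat)
```

With a perfect reconstruction the regularizer vanishes and only the contrastive term remains; a larger `lam` pushes the encoder to retain more information about the input.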

CLC number:

TP391.4

Kai ZHANG, Chengyun SONG. Image clustering algorithm based on semantic optimization[J]. Journal of Computer Applications, 2023, 43(S2): 117-121.

Dataset	Number of classes	Number of images	Image size
CIFAR10	10	60 000	32×32×3
CIFAR100	20	60 000	32×32×3
STL10	10	130 000	96×96×3

Algorithm	CIFAR10			CIFAR100-20			STL10
Algorithm	ACC	NMI	ARI	ACC	NMI	ARI	ACC	NMI	ARI
K-means	0.229	0.087	0.049	0.130	0.084	0.028	0.192	0.125	0.061
VAE	0.291	0.245	0.167	0.152	0.108	0.040	0.282	0.200	0.146
DEC	0.301	0.257	0.161	0.185	0.136	0.050	0.359	0.276	0.186
DCCM	0.623	0.496	0.408	0.327	0.285	0.173	0.482	0.376	0.262
PICA	0.696	0.591	0.512	0.337	0.310	0.171	0.713	0.611	0.531
DCDC	0.699	0.585	0.506	0.349	0.310	0.179	0.734	0.621	0.547
SCAN	0.862	0.765	0.751	0.465	0.482	0.325	0.761	0.660	0.599
IDFD	0.828	0.714	0.679	0.425	0.432	0.244	0.756	0.636	0.569
C3	0.838	0.748	0.707	0.451	0.434	0.275	—	—	—
Proposed algorithm	0.874	0.779	0.754	0.485	0.493	0.332	0.788	0.673	0.605
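
The three metrics reported in the table above (ACC, NMI, ARI) can be computed from ground-truth classes and predicted cluster ids as sketched below. This is an illustrative sketch using standard textbook definitions, not the evaluation code behind the reported numbers. Two choices are assumptions: the accuracy searches all one-to-one mappings by brute force (practical only for a handful of clusters; the Hungarian algorithm is the usual choice at scale), and NMI is normalized by the arithmetic mean of the two entropies. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(y_true, y_pred):
    """Best accuracy over one-to-one mappings of cluster ids to class ids.

    Assumes the number of clusters does not exceed the number of classes.
    """
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    pair_counts = Counter(zip(y_pred, y_true))
    best = 0
    for perm in permutations(classes, len(clusters)):
        hits = sum(pair_counts[(c, k)] for c, k in zip(clusters, perm))
        best = max(best, hits)
    return best / len(y_true)


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(y_true, y_pred):
    """NMI = 2 I(Y; C) / (H(Y) + H(C)) (arithmetic-mean normalization, assumed)."""
    n = len(y_true)
    ct, cp = Counter(y_true), Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))
    mi = sum((c / n) * math.log(c * n / (ct[t] * cp[p]))
             for (t, p), c in joint.items())
    denom = entropy(y_true) + entropy(y_pred)
    return 2 * mi / denom if denom > 0 else 1.0


def adjusted_rand_index(y_true, y_pred):
    """Rand index corrected for chance agreement between two partitions."""
    n = len(y_true)
    index = sum(math.comb(c, 2) for c in Counter(zip(y_true, y_pred)).values())
    a = sum(math.comb(c, 2) for c in Counter(y_true).values())
    b = sum(math.comb(c, 2) for c in Counter(y_pred).values())
    expected = a * b / math.comb(n, 2)
    maximum = (a + b) / 2
    if maximum == expected:
        return 1.0
    return (index - expected) / (maximum - expected)
```

All three metrics are invariant to how cluster ids are numbered, which is why a clustering can be scored without labels ever being used in training. On this scale, the STL10 accuracies 0.788 and 0.761 in the table differ by 2.7 percentage points.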

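Pre-training stage	Fine-tuning stage	ACC
Contrastive loss	K-means	0.659
Contrastive loss	Nearest-neighbor loss	0.860
Contrastive loss + reconstruction loss	K-means	0.716
Contrastive loss + reconstruction loss	Nearest-neighbor loss	0.874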
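
The ablation above contrasts K-means with a nearest-neighbor loss in the fine-tuning stage. The exact loss of the proposed algorithm is not given in this excerpt, so the sketch below substitutes the neighbor-consistency objective of SCAN, the baseline named in the abstract, purely to illustrate how neighbor similarity can drive a clustering network. The entropy weight, the cosine-similarity neighbor mining, and all function names are assumptions.

```python
import math


def mine_neighbors(features, k):
    """For each feature vector, indices of its k most cosine-similar other samples."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))

    neighbors = []
    for i, f in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        others.sort(key=lambda j: cos(f, features[j]), reverse=True)
        neighbors.append(others[:k])
    return neighbors


def neighbor_consistency_loss(probs, neighbors, entropy_weight=5.0, eps=1e-8):
    """SCAN-style loss (stand-in, not the paper's exact loss).

    probs[i] is the soft cluster assignment of image i and neighbors[i]
    lists the indices of its mined nearest neighbors.
    """
    n, k = len(probs), len(probs[0])
    # Consistency term: an image and its neighbors should share a cluster.
    pair_terms = []
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            dot = sum(a * b for a, b in zip(probs[i], probs[j]))
            pair_terms.append(-math.log(dot + eps))
    consistency = sum(pair_terms) / len(pair_terms)
    # Negative entropy of the mean assignment: penalizes collapsing
    # every image into a single cluster.
    mean_p = [sum(p[c] for p in probs) / n for c in range(k)]
    neg_entropy = sum(p * math.log(p + eps) for p in mean_p)
    return consistency + entropy_weight * neg_entropy
```

In use, the neighbors would be mined once from the pre-trained features and kept fixed while the clustering network is updated, so no separate clustering algorithm such as K-means runs alongside the network.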