Multimodal fusion recommendation algorithm based on joint self-supervised learning

doi:10.11772/j.issn.1001-9081.2024060824

Abstract

To address the data sparsity problem in multimodal recommendation, and the tendency of existing Self-Supervised Learning (SSL) algorithms to apply SSL to a single feature of a dataset while ignoring the possibility of learning multiple features jointly, a multimodal fusion recommendation algorithm based on joint self-supervised learning, called SFELMMR (SelF-supErvised Learning for MultiModal Recommendation), was proposed. Firstly, existing SSL strategies were integrated and optimized so that data features from different modalities are learned jointly, which significantly enhances data representation capability and thereby alleviates data sparsity. Secondly, a method for constructing a multimodal latent semantic graph was designed that integrates deep item relationships from a global perspective with direct interactions from a local perspective, enabling the algorithm to capture complex relationships among items more accurately. Finally, experiments were carried out on three datasets (TikTok, Baby, and Sports). The results show that, compared with the best-performing existing multimodal recommendation algorithms, the proposed algorithm improves Recall@10 by 5.49%, 2.56%, and 2.99%, NDCG@10 by 1.17%, 1.98%, and 3.52%, Precision@10 by 4.69%, 2.74%, and 1.22%, and MAP@10 by 0.81%, 1.59%, and 3.11% on the three datasets, respectively. Ablation experiments further verify the effectiveness of the proposed algorithm.
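
The abstract does not state the joint SSL objective. A common choice for cross-modal self-supervision, used for example by SimCLR [24], is an InfoNCE-style contrastive loss: the two views of the same item are positives and every other item in the batch is a negative. The sketch below is an assumption, not the authors' loss; `info_nce`, `joint_cross_modal_loss`, and the temperature `tau` are hypothetical names, and "joint" is modelled simply as averaging InfoNCE over every pair of modality views.

```python
import itertools
import numpy as np

def info_nce(view_a: np.ndarray, view_b: np.ndarray, tau: float = 0.2) -> float:
    """InfoNCE: row i of view_a should match row i of view_b against all other rows."""
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / tau                          # (n, n) cosine similarity / temperature
    logits -= logits.max(axis=1, keepdims=True)     # stabilise exp() without changing the softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))       # positive pairs sit on the diagonal

def joint_cross_modal_loss(views, tau: float = 0.2) -> float:
    """Hypothetical joint objective: average InfoNCE over every pair of modality views."""
    pairs = list(itertools.combinations(range(len(views)), 2))
    return sum(info_nce(views[i], views[j], tau) for i, j in pairs) / len(pairs)

rng = np.random.default_rng(0)
z = rng.normal(size=(32, 16))                       # toy item embeddings
visual = z + 0.05 * rng.normal(size=z.shape)        # two noisy "modality views" of the same items
textual = z + 0.05 * rng.normal(size=z.shape)
aligned = joint_cross_modal_loss([z, visual, textual])
shuffled = info_nce(visual, rng.permutation(textual))  # rows no longer correspond
```

Views that agree on the same items give a much lower loss than views whose rows have been shuffled, which is the signal such an objective uses to align modalities.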

Key words: recommendation system, multimodal, Self-Supervised Learning (SSL), Graph Convolutional neural Network (GCN), feature fusion
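
The abstract describes the multimodal latent semantic graph only at a high level (global item relationships fused with local interactions). The sketch below assumes one plausible design: the global view is an item-item kNN graph over each modality's features, in the spirit of LATTICE [4] and FREEDOM [5]; the local view links items that share at least one user; the two are mixed with a hypothetical weight `alpha` and symmetrically normalised as in GCN-style propagation. All function names and parameters here are illustrative, not the paper's.

```python
import numpy as np

def knn_graph(features: np.ndarray, k: int) -> np.ndarray:
    """Global view: connect each item to its k most cosine-similar items in one modality."""
    norms = np.maximum(np.linalg.norm(features, axis=1, keepdims=True), 1e-12)
    unit = features / norms
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)                  # never select the item itself
    top_k = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros(sim.shape)
    adj[np.arange(len(sim))[:, None], top_k] = 1.0
    return np.maximum(adj, adj.T)                   # make the graph undirected

def co_interaction_graph(interactions: np.ndarray) -> np.ndarray:
    """Local view: link two items if at least one user interacted with both."""
    co = interactions.T @ interactions
    np.fill_diagonal(co, 0)
    return (co > 0).astype(float)

def sym_normalise(adj: np.ndarray) -> np.ndarray:
    """D^{-1/2} A D^{-1/2}; isolated items keep all-zero rows."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    return inv_sqrt[:, None] * adj * inv_sqrt[None, :]

def latent_item_graph(modal_features, interactions, k=2, alpha=0.5):
    """Hypothetical fusion: weighted sum of averaged modal kNN graphs and the co-interaction graph."""
    global_view = sum(knn_graph(f, k) for f in modal_features) / len(modal_features)
    local_view = co_interaction_graph(interactions)
    return sym_normalise(alpha * global_view + (1 - alpha) * local_view)

rng = np.random.default_rng(0)
visual = rng.normal(size=(6, 8))                    # 6 items, toy visual features
text = rng.normal(size=(6, 4))                      # same 6 items, toy text features
clicks = (rng.random((5, 6)) < 0.4).astype(int)     # 5 users x 6 items, binary interactions
G = latent_item_graph([visual, text], clicks, k=2, alpha=0.5)
```

The result is a symmetric, self-loop-free item-item adjacency that a graph convolution could propagate item embeddings over; the real model may weight, prune, or learn these views differently.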

CLC Number: TP391.3

Zonghang WU, Dong ZHANG, Guanyu LI. Multimodal fusion recommendation algorithm based on joint self-supervised learning[J]. Journal of Computer Applications, 2025, 45(6): 1858-1868.

References

1	LIU J L, LI X G. Techniques for recommendation system: a survey[J]. Computer Science, 2020, 47(7): 47-55. (in Chinese)
2	HE R, McAULEY J. VBPR: visual Bayesian personalized ranking from implicit feedback[C]// Proceedings of the 30th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2016: 144-150.
3	WANG Q, WEI Y, YIN J, et al. DualGNN: dual graph neural network for multimedia recommendation[J]. IEEE Transactions on Multimedia, 2023, 25: 1074-1084.
4	ZHANG J, ZHU Y, LIU Q, et al. Mining latent structures for multimedia recommendation[C]// Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021: 3872-3880.
5	ZHOU X, SHEN Z. A tale of two graphs: freezing and denoising graph structures for multimodal recommendation[C]// Proceedings of the 31st ACM International Conference on Multimedia. New York: ACM, 2023: 935-943.
6	ZHU Y, XU Y, YU F, et al. Graph contrastive learning with adaptive augmentation[C]// Proceedings of the Web Conference 2021. New York: ACM, 2021: 2069-2080.
7	XUN J, ZHANG S, ZHAO Z, et al. Why do we click: visual impression-aware news recommendation[C]// Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021: 3881-3890.
8	ZHOU X, ZHOU H, LIU Y, et al. Bootstrap latent representations for multi-modal recommendation[C]// Proceedings of the ACM Web Conference 2023. New York: ACM, 2023: 845-854.
9	WEI W, HUANG C, XIA L, et al. Multi-modal self-supervised learning for recommendation[C]// Proceedings of the ACM Web Conference 2023. New York: ACM, 2023: 790-800.
10	TAO Z, WEI Y, WANG X, et al. MGAT: multimodal graph attention network for recommendation[J]. Information Processing and Management, 2020, 57(5): No.102277.
11	TAO Z, LIU X, XIA Y, et al. Self-supervised learning for multimedia recommendation[J]. IEEE Transactions on Multimedia, 2023, 25: 5107-5116.
12	SUN R, CAO X, ZHAO Y, et al. Multi-modal knowledge graphs for recommender systems[C]// Proceedings of the 29th ACM International Conference on Information and Knowledge Management. New York: ACM, 2020: 1405-1414.
13	ZHOU H, ZHOU X, ZHANG L, et al. Enhancing dyadic relations with homogeneous graphs for multimodal recommendation[C]// Proceedings of the 26th European Conference on Artificial Intelligence/the 12th Conference on Prestigious Applications of Intelligent Systems. Amsterdam: IOS Press, 2023: 3123-3130.
14	ZHOU X, SUN A, LIU Y, et al. SelfCF: a simple framework for self-supervised collaborative filtering[J]. ACM Transactions on Recommender Systems, 2023, 1(2): No.9.
15	WU J, WANG X, FENG F, et al. Self-supervised graph learning for recommendation[C]// Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2021: 726-735.
16	YU J, YIN H, XIA X, et al. Are graph augmentations necessary? simple graph contrastive learning for recommendation[C]// Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2022: 1294-1303.
17	XIA L, HUANG C, SHI J, et al. Graph-less collaborative filtering[C]// Proceedings of the ACM Web Conference 2023. New York: ACM, 2023: 17-27.
18	YI Z, WANG X, OUNIS I, et al. Multi-modal graph contrastive learning for micro-video recommendation[C]// Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2022: 1807-1811.
19	ZHOU J, CUI G, HU S, et al. Graph neural networks: a review of methods and applications[J]. AI Open, 2020, 1: 57-81.
20	WEI Y, WANG X, NIE L, et al. MMGCN: multi-modal graph convolution network for personalized recommendation of micro-video[C]// Proceedings of the 27th ACM International Conference on Multimedia. New York: ACM, 2019: 1437-1445.
21	HE X, DENG K, WANG X, et al. LightGCN: simplifying and powering graph convolution network for recommendation[C]// Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2020: 639-648.
22	XU J, CHEN Z, YANG S, et al. MENTOR: multi-level self-supervised learning for multimodal recommendation[EB/OL]. [2024-06-05].
23	CHEN T, KORNBLITH S, SWERSKY K, et al. Big self-supervised models are strong semi-supervised learners[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 22243-22255.
24	CHEN T, KORNBLITH S, NOROUZI M, et al. A simple framework for contrastive learning of visual representations[C]// Proceedings of the 37th International Conference on Machine Learning. New York: JMLR.org, 2020: 1597-1607.
25	RENDLE S, FREUDENTHALER C, GANTNER Z, et al. BPR: Bayesian personalized ranking from implicit feedback[C]// Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence. Arlington, VA: AUAI Press, 2009: 452-461.
26	McAULEY J, TARGETT C, SHI Q, et al. Image-based recommendations on styles and substitutes[C]// Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2015: 43-52.
27	KINGMA D P, BA J L. Adam: a method for stochastic optimization[EB/OL]. [2024-06-05].
28	ZHOU X, LIN D, LIU Y, et al. Layer-refined graph convolutional networks for recommendation[C]// Proceedings of the IEEE 39th International Conference on Data Engineering. Piscataway: IEEE, 2023: 1247-1259.
29	WEI Y, WANG X, NIE L, et al. Graph-refined convolutional network for multimedia recommendation with implicit feedback[C]// Proceedings of the 28th ACM International Conference on Multimedia. New York: ACM, 2020: 3541-3549.
30	YU P, TAN Z, LU G, et al. Multi-view graph convolutional network for multimedia recommendation[C]// Proceedings of the 31st ACM International Conference on Multimedia. New York: ACM, 2023: 6576-6585.
31	ZHOU X. MMRec: simplifying multimodal recommendation[C]// Proceedings of the 5th ACM International Conference on Multimedia in Asia Workshops. New York: ACM, 2023: No.6.

Dataset	Users	Items	Interactions
TikTok	9,308	6,710	68,722
Baby	19,445	7,050	160,792
Sports	35,598	18,357	296,337
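
The abstract motivates SFELMMR by data sparsity. The counts in the table make this concrete: density is interactions / (users x items), and sparsity is one minus that. The short sketch below computes both from the table's numbers; `DATASETS` and `density` are illustrative names.

```python
# (users, items, interactions) copied from the dataset statistics table above.
DATASETS = {
    "TikTok": (9_308, 6_710, 68_722),
    "Baby": (19_445, 7_050, 160_792),
    "Sports": (35_598, 18_357, 296_337),
}

def density(users: int, items: int, interactions: int) -> float:
    """Fraction of the user-item matrix that holds an observed interaction."""
    return interactions / (users * items)

for name, counts in DATASETS.items():
    d = density(*counts)
    print(f"{name}: density {d:.5f}, sparsity {1 - d:.2%}")
# TikTok: density 0.00110, sparsity 99.89%
# Baby: density 0.00117, sparsity 99.88%
# Sports: density 0.00045, sparsity 99.95%
```

All three interaction matrices are more than 99.8% empty, with Sports the sparsest.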

Algorithm	TikTok				Baby				Sports
	Recall@10	NDCG@10	Precision@10	MAP@10	Recall@10	NDCG@10	Precision@10	MAP@10	Recall@10	NDCG@10	Precision@10	MAP@10
Gain of best over runner-up/%	5.49	1.17	4.69	0.81	2.56	1.98	2.74	1.59	2.99	3.52	1.22	3.11
SelfCF	0.0586	0.0292	0.0059	0.0203	0.0521	0.0279	0.0058	0.0199	0.0630	0.0344	0.0070	0.0248
LayerGCN	0.0594	0.0339	0.0059	0.0263	0.0518	0.0277	0.0058	0.0196	0.0616	0.0336	0.0069	0.0241
MMGCN	0.0552	0.0297	0.0055	0.0220	0.0420	0.0218	0.0047	0.0151	0.0388	0.0206	0.0044	0.0144
GRCN	0.0488	0.0231	0.0049	0.0154	0.0528	0.0282	0.0059	0.0200	0.0573	0.0309	0.0064	0.0220
MGCN	0.0619	0.0325	0.0062	0.0236	0.0613	0.0329	0.0068	0.0235	0.0733	0.0402	0.0081	0.0292
LATTICE	0.0578	0.0308	0.0058	0.0226	0.0549	0.0291	0.0061	0.0205	0.0622	0.0341	0.0069	0.0247
FREEDOM	0.0537	0.0316	0.0064	0.0245	0.0628	0.0329	0.0068	0.0227	0.0713	0.0382	0.0079	0.0272
DRAGON	0.0620	0.0328	0.0062	0.0239	0.0656	0.0346	0.0072	0.0244	0.0726	0.0396	0.0082	0.0289
BM3	0.0617	0.0322	0.0062	0.0234	0.0551	0.0290	0.0062	0.0224	0.0635	0.0343	0.0071	0.0245
SLMRec	0.0460	0.0232	0.0046	0.0165	0.0551	0.0295	0.0061	0.0210	0.0676	0.0374	0.0075	0.0272
MENTOR	0.0638	0.0342	0.0064	0.0246	0.0665	0.0354	0.0073	0.0252	0.0735	0.0398	0.0081	0.0286
SFELMMR	0.0673	0.0346	0.0067	0.0248	0.0682	0.0361	0.0075	0.0256	0.0757	0.0412	0.0083	0.0298
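
The percentages in the first row of the table are consistent with the usual relative-gain formula, (best - runner-up) / runner-up x 100. For Recall@10 the runner-up is MENTOR on all three datasets, so the abstract's Recall@10 figures can be reproduced from the table's SFELMMR and MENTOR rows. The sketch below does exactly that; `relative_gain` is an illustrative name.

```python
def relative_gain(proposed: float, baseline: float) -> float:
    """Percentage improvement of `proposed` over `baseline`."""
    return (proposed - baseline) / baseline * 100.0

# Recall@10 of SFELMMR and MENTOR, copied from the comparison table above.
RECALL_AT_10 = {
    "TikTok": (0.0673, 0.0638),
    "Baby": (0.0682, 0.0665),
    "Sports": (0.0757, 0.0735),
}

for dataset, (sfelmmr, mentor) in RECALL_AT_10.items():
    print(f"{dataset}: +{relative_gain(sfelmmr, mentor):.2f}%")
# TikTok: +5.49%
# Baby: +2.56%
# Sports: +2.99%
```

Because the metric values are small (around 0.03 to 0.07), a gain of a few tenths of a thousandth in absolute terms already shows up as several percent in relative terms.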

Algorithm	TikTok				Baby				Sports
	Recall@10	NDCG@10	Precision@10	MAP@10	Recall@10	NDCG@10	Precision@10	MAP@10	Recall@10	NDCG@10	Precision@10	MAP@10
SFELMMR_K	0.0594	0.0306	0.0059	0.0220	0.0658	0.0347	0.0072	0.0245	0.0739	0.0399	0.0081	0.0286
SFELMMR_T	0.0648	0.0324	0.0065	0.0228	0.0653	0.0353	0.0072	0.0253	0.0746	0.0407	0.0082	0.0294
SFELMMR_CMA	0.0661	0.0329	0.0052	0.0230	0.0670	0.0355	0.0074	0.0252	0.0745	0.0401	0.0082	0.0288
SFELMMR_FE	0.0666	0.0344	0.0067	0.0247	0.0668	0.0356	0.0074	0.0252	0.0752	0.0407	0.0082	0.0293
SFELMMR_FE-C	0.0668	0.0354	0.0066	0.0249	0.0678	0.0360	0.0075	0.0256	0.0754	0.0408	0.0083	0.0293
SFELMMR_GP	0.0660	0.0348	0.0066	0.0248	0.0661	0.0356	0.0073	0.0255	0.0750	0.0407	0.0082	0.0294
SFELMMR_GP-ProJH	0.0669	0.0344	0.0067	0.0247	0.0679	0.0361	0.0074	0.0255	0.0755	0.0406	0.0083	0.0291
SFELMMR	0.0673	0.0346	0.0067	0.0248	0.0682	0.0361	0.0075	0.0256	0.0757	0.0412	0.0083	0.0298
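
Both tables report Recall, NDCG, Precision, and MAP at cutoff 10. The paper's evaluation code is not given here, so the sketch below uses common binary-relevance definitions for a single user's ranked list; reported numbers are normally averages of these per-user values over all test users. The normalisation of MAP@K in particular differs between toolkits (here it divides by min(|relevant|, K)), so treat this as an illustration rather than the exact protocol. `metrics_at_k` is a hypothetical name.

```python
import math

def metrics_at_k(ranked, relevant, k=10):
    """Binary-relevance Recall/Precision/NDCG/MAP at cutoff k for one user."""
    hits = [1 if item in relevant else 0 for item in ranked[:k]]
    n_hits = sum(hits)
    recall = n_hits / len(relevant)
    precision = n_hits / k
    # DCG discounts a hit at 0-based rank i by log2(i + 2); IDCG places all hits first.
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    idcg = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    ndcg = dcg / idcg
    # Average precision: mean of precision@i taken at each hit position.
    ap, seen = 0.0, 0
    for i, h in enumerate(hits):
        if h:
            seen += 1
            ap += seen / (i + 1)
    map_k = ap / min(len(relevant), k)
    return {"recall": recall, "precision": precision, "ndcg": ndcg, "map": map_k}

m = metrics_at_k(ranked=[3, 1, 7, 9, 2], relevant={1, 2}, k=5)
```

In this toy list both relevant items appear in the top 5 (at ranks 2 and 5), so Recall@5 is 1.0 while Precision@5 is 0.4 and MAP@5 is (1/2 + 2/5) / 2 = 0.45; NDCG@5 is below 1 because the hits are not at the top.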