Multimodal fusion recommendation algorithm based on joint self-supervised learning

Journal of Computer Applications, 2025, Vol. 45, Issue (6): 1858-1868. DOI: 10.11772/j.issn.1001-9081.2024060824

Zonghang WU, Dong ZHANG, Guanyu LI

Information Science and Technology College, Dalian Maritime University, Dalian Liaoning 116026, China

Received: 2024-06-20 Revised: 2024-09-18 Accepted: 2024-09-19 Online: 2024-10-11 Published: 2025-06-10
Contact: Guanyu LI
About author: WU Zonghang, born in 2002 in Gongzhuling, Jilin, M. S. candidate, CCF member. His research interests include recommender systems and intelligent information processing.
ZHANG Dong, born in 1996 in Haicheng, Liaoning, Ph. D. candidate. His research interests include natural language processing and knowledge graphs.
Supported by: National Natural Science Foundation of China (61976032)

Abstract

To address the data sparsity problem in multimodal recommendation algorithms, and the tendency of existing Self-Supervised Learning (SSL) algorithms to apply SSL to a single feature of the dataset while ignoring the possibility of joint learning over multiple features, a multimodal fusion recommendation algorithm based on joint SSL was proposed, called SFELMMR (SelF-supErvised Learning for MultiModal Recommendation). Firstly, existing SSL strategies were integrated and optimized so that data features from different modalities are learned jointly, which enhances the representation capability of the data and thereby alleviates data sparsity. Secondly, a method for constructing a multimodal latent semantic graph was designed by fusing deep item relationships from a global perspective with direct interactions from a local perspective, enabling the algorithm to capture complex relationships among items more accurately. Finally, experiments were carried out on three datasets (TikTok, Baby, and Sports). The results show that, compared with the best-performing existing multimodal recommendation algorithm, the proposed algorithm improves multiple recommendation performance metrics. Specifically, across the three datasets, Recall@10 improves by 5.49%, 2.56%, and 2.99%, NDCG@10 by 1.17%, 1.98%, and 3.52%, Precision@10 by 4.69%, 2.74%, and 1.22%, and Map@10 by 0.81%, 1.59%, and 3.11%, respectively. In addition, ablation experiments verify the effectiveness of the proposed algorithm.

Key words: recommendation system, multimodal, Self-Supervised Learning (SSL), Graph Convolutional neural Network (GCN), feature fusion

CLC Number: TP391.3

Zonghang WU, Dong ZHANG, Guanyu LI. Multimodal fusion recommendation algorithm based on joint self-supervised learning[J]. Journal of Computer Applications, 2025, 45(6): 1858-1868.
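The abstract reports Recall@10, NDCG@10, Precision@10, and Map@10, which the result tables abbreviate as R@10, N@10, P@10, and M@10. This page does not give the exact formulas used in the paper, so the following is only a minimal sketch of the common binary-relevance definitions for a single user. The function name `metrics_at_k` and the choice to normalize average precision by min(number of relevant items, k) are assumptions for illustration, not the authors' evaluation code.

```python
import math


def metrics_at_k(ranked_items, relevant_items, k=10):
    """Return Recall@k, Precision@k, NDCG@k and AP@k for one user.

    ranked_items: item ids ordered by predicted score (best first).
    relevant_items: ids of held-out items the user actually interacted with.
    """
    relevant = set(relevant_items)
    top_k = ranked_items[:k]
    hits = [1 if item in relevant else 0 for item in top_k]
    num_hits = sum(hits)

    recall = num_hits / len(relevant) if relevant else 0.0
    precision = num_hits / k

    # DCG with binary gains; the ideal ranking puts every relevant item first.
    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    ndcg = dcg / idcg if idcg > 0 else 0.0

    # Average precision: precision measured at each hit position, then
    # averaged over min(|relevant|, k) (an assumed normalization).
    running_hits, precision_sum = 0, 0.0
    for rank, h in enumerate(hits, start=1):
        if h:
            running_hits += 1
            precision_sum += running_hits / rank
    ap = precision_sum / ideal_hits if ideal_hits > 0 else 0.0

    return {"recall": recall, "precision": precision, "ndcg": ndcg, "ap": ap}


# Toy example: two of the user's three held-out items appear in the top 10.
example = metrics_at_k([5, 2, 9, 1, 7, 3, 8, 4, 6, 0], {2, 7, 11}, k=10)
print(example)
```

Averaging each value over all test users would give dataset-level figures comparable in form to those in the tables; Map@10 would be the mean of the per-user AP@10 values.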

Dataset	Users	Items	Interactions
TikTok	9,308	6,710	68,722
Baby	19,445	7,050	160,792
Sports	35,598	18,357	296,337
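To make the data sparsity that motivates the algorithm concrete, the short sketch below computes the fraction of empty cells in each user-item interaction matrix from the dataset statistics above. The helper name `sparsity` is introduced here for illustration, and the calculation assumes each counted interaction corresponds to a distinct user-item pair.

```python
# Dataset statistics copied from the table: (users, items, interactions).
datasets = {
    "TikTok": (9308, 6710, 68722),
    "Baby": (19445, 7050, 160792),
    "Sports": (35598, 18357, 296337),
}


def sparsity(users, items, interactions):
    """Fraction of user-item pairs with no observed interaction."""
    return 1.0 - interactions / (users * items)


for name, (users, items, interactions) in datasets.items():
    print(f"{name}: {sparsity(users, items, interactions):.4%} of cells are empty")
```

Under that assumption, all three interaction matrices are more than 99.8% empty.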

Algorithm	TikTok				Baby				Sports
Algorithm	R@10	N@10	P@10	M@10	R@10	N@10	P@10	M@10	R@10	N@10	P@10	M@10
SelfCF	0.0586	0.0292	0.0059	0.0203	0.0521	0.0279	0.0058	0.0199	0.0630	0.0344	0.0070	0.0248
LayerGCN	0.0594	0.0339	0.0059	0.0263	0.0518	0.0277	0.0058	0.0196	0.0616	0.0336	0.0069	0.0241
MMGCN	0.0552	0.0297	0.0055	0.0220	0.0420	0.0218	0.0047	0.0151	0.0388	0.0206	0.0044	0.0144
GRCN	0.0488	0.0231	0.0049	0.0154	0.0528	0.0282	0.0059	0.0200	0.0573	0.0309	0.0064	0.0220
MGCN	0.0619	0.0325	0.0062	0.0236	0.0613	0.0329	0.0068	0.0235	0.0733	0.0402	0.0081	0.0292
LATTICE	0.0578	0.0308	0.0058	0.0226	0.0549	0.0291	0.0061	0.0205	0.0622	0.0341	0.0069	0.0247
FREEDOM	0.0537	0.0316	0.0064	0.0245	0.0628	0.0329	0.0068	0.0227	0.0713	0.0382	0.0079	0.0272
DRAGON	0.0620	0.0328	0.0062	0.0239	0.0656	0.0346	0.0072	0.0244	0.0726	0.0396	0.0082	0.0289
BM3	0.0617	0.0322	0.0062	0.0234	0.0551	0.0290	0.0062	0.0224	0.0635	0.0343	0.0071	0.0245
SLMRec	0.0460	0.0232	0.0046	0.0165	0.0551	0.0295	0.0061	0.0210	0.0676	0.0374	0.0075	0.0272
MENTOR	0.0638	0.0342	0.0064	0.0246	0.0665	0.0354	0.0073	0.0252	0.0735	0.0398	0.0081	0.0286
SFELMMR	0.0673	0.0346	0.0067	0.0248	0.0682	0.0361	0.0075	0.0256	0.0757	0.0412	0.0083	0.0298
Improvement of best over second-best/%	5.49	1.17	4.69	0.81	2.56	1.98	2.74	1.59	2.99	3.52	1.22	3.11
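The percentages in the improvement row are relative gains. As a worked check, the sketch below recomputes the four TikTok values from the SFELMMR and MENTOR rows of the table; the helper name `relative_improvement` is introduced here for illustration.

```python
def relative_improvement(proposed, baseline):
    """Relative gain of `proposed` over `baseline`, in percent."""
    return (proposed - baseline) / baseline * 100.0


# TikTok columns copied from the SFELMMR and MENTOR rows of the table.
sfelmmr = {"R@10": 0.0673, "N@10": 0.0346, "P@10": 0.0067, "M@10": 0.0248}
mentor = {"R@10": 0.0638, "N@10": 0.0342, "P@10": 0.0064, "M@10": 0.0246}

for metric in sfelmmr:
    gain = relative_improvement(sfelmmr[metric], mentor[metric])
    print(f"{metric}: {gain:.2f}%")
```

Rounded to two decimals, these reproduce the 5.49, 1.17, 4.69, and 0.81 reported for TikTok.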

Algorithm	TikTok				Baby				Sports
Algorithm	R@10	N@10	P@10	M@10	R@10	N@10	P@10	M@10	R@10	N@10	P@10	M@10
SFELMMR_K	0.0594	0.0306	0.0059	0.0220	0.0658	0.0347	0.0072	0.0245	0.0739	0.0399	0.0081	0.0286
SFELMMR_T	0.0648	0.0324	0.0065	0.0228	0.0653	0.0353	0.0072	0.0253	0.0746	0.0407	0.0082	0.0294
SFELMMR_CMA	0.0661	0.0329	0.0052	0.0230	0.0670	0.0355	0.0074	0.0252	0.0745	0.0401	0.0082	0.0288
SFELMMR_FE	0.0666	0.0344	0.0067	0.0247	0.0668	0.0356	0.0074	0.0252	0.0752	0.0407	0.0082	0.0293
SFELMMR_FE-C	0.0668	0.0354	0.0066	0.0249	0.0678	0.0360	0.0075	0.0256	0.0754	0.0408	0.0083	0.0293
SFELMMR_GP	0.0660	0.0348	0.0066	0.0248	0.0661	0.0356	0.0073	0.0255	0.0750	0.0407	0.0082	0.0294
SFELMMR_GP-ProJH	0.0669	0.0344	0.0067	0.0247	0.0679	0.0361	0.0074	0.0255	0.0755	0.0406	0.0083	0.0291
SFELMMR	0.0673	0.0346	0.0067	0.0248	0.0682	0.0361	0.0075	0.0256	0.0757	0.0412	0.0083	0.0298
