Semantic graph enhanced multi-modal recommendation algorithm

doi:10.11772/j.issn.1001-9081.2024010145

Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (2): 421-427. DOI: 10.11772/j.issn.1001-9081.2024010145

• Artificial Intelligence •

Qijian CAI, Wei TAN

School of Computer Science and Technology, Dongguan University of Technology, Dongguan Guangdong 523808, China

Received: 2024-02-07 Revised: 2024-04-03 Accepted: 2024-04-07 Online: 2024-05-09 Published: 2025-02-10
Contact: Wei TAN
About author: CAI Qijian, born in 1998 in Zhanjiang, Guangdong, M. S. candidate, CCF member. His research interests include data mining and recommender systems.
Supported by:
Guangdong Provincial Basic and Applied Basic Research Fund - Natural Science Fund (2021A1515010506)

Abstract

To mine the latent isomorphic semantic relationships within multi-modal information and learn better item representations, a Semantic Graph Enhanced Multi-modal Recommendation (SGEMR) algorithm is proposed. First, auxiliary multi-modal information is used to complement historical user-item interactions and capture user preferences in each modality. Then, based on metric learning, the loosely connected item sequences are rebuilt into a dense item-item semantic graph, and a semantic hierarchical attention mechanism is designed to fuse the multi-modal information of items. In addition, a graph reconstruction loss function is proposed so that item representations retain more semantic relationships, which improves recommendation performance. Experimental results on three real-world datasets show that, compared with the best-performing baseline FREEDOM (FREEzes the item-item graph and DenOises the user-item interaction graph simultaneously for Multimodal recommendation), the proposed algorithm improves Recall@10 by 6.70%, 11.30%, and 5.09%, and NDCG@10 by 9.09%, 12.73%, and 7.62%, respectively. Several ablation experiments further verify the effectiveness of the proposed algorithm.

Key words: recommendation algorithm, Graph Neural Network (GNN), multi-modal fusion, attention mechanism, graph structure learning
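The graph-construction step described in the abstract can be illustrated with a minimal sketch. The abstract only states that the item-item semantic graph is built through metric learning; the choice of cosine similarity, the top-k neighbor rule, the toy feature values, and the function names `cosine` and `build_knn_graph` are all assumptions made here for illustration and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def build_knn_graph(features, k):
    """Return {item: [(neighbor, similarity), ...]} keeping the k most similar items."""
    graph = {}
    for i, fi in enumerate(features):
        # Score every other item against item i (hypothetical metric: cosine).
        sims = [(j, cosine(fi, fj)) for j, fj in enumerate(features) if j != i]
        sims.sort(key=lambda pair: pair[1], reverse=True)
        graph[i] = sims[:k]
    return graph

# Toy modality features for four items (made-up values, for illustration only).
toy_features = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
]
graph = build_knn_graph(toy_features, k=1)
for item, neighbors in graph.items():
    print(item, neighbors)
```

In this toy example items 0 and 1 link to each other, as do items 2 and 3, because their feature vectors point in nearly the same direction. A real pipeline would build one such graph per modality before fusing them.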

CLC Number: TP391

Cite this article:
Qijian CAI, Wei TAN. Semantic graph enhanced multi-modal recommendation algorithm[J]. Journal of Computer Applications, 2025, 45(2): 421-427.

Figures/Tables (8)

References (25)

1	HE R, McAULEY J. VBPR: visual Bayesian personalized ranking from implicit feedback[C]// Proceedings of the 30th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2016: 144-150.
2	CHEN J, ZHANG H, HE X, et al. Attentive collaborative filtering: multimedia recommendation with item- and component-level attention[C]// Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2017: 335-344.
3	WANG X, HE X, WANG M, et al. Neural graph collaborative filtering[C]// Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2019: 165-174.
4	HE X, DENG K, WANG X, et al. LightGCN: simplifying and powering graph convolution network for recommendation[C]// Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2020: 639-648.
5	FAN W, MA Y, LI Q, et al. Graph neural networks for social recommendation[C]// Proceedings of the 2019 World Wide Web Conference. New York: ACM, 2019: 417-426.
6	WANG X, HE X, CAO Y, et al. KGAT: knowledge graph attention network for recommendation[C]// Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2019: 950-958.
7	DELDJOO Y, SCHEDL M, CREMONESI P, et al. Recommender systems leveraging multimedia content[J]. ACM Computing Surveys, 2021, 53(5): No.106.
8	WEI Y, WANG X, NIE L, et al. MMGCN: multi-modal graph convolution network for personalized recommendation of micro-video[C]// Proceedings of the 27th ACM International Conference on Multimedia. New York: ACM, 2019: 1437-1445.
9	TAO Z, WEI Y, WANG X, et al. MGAT: multimodal graph attention network for recommendation[J]. Information Processing and Management, 2020, 57(5): No.102277.
10	WANG Q, WEI Y, YIN J, et al. DualGNN: dual graph neural network for multimedia recommendation[J]. IEEE Transactions on Multimedia, 2023, 25: 1074-1084.
11	ZHANG J, ZHU Y, LIU Q, et al. Mining latent structures for multimedia recommendation[C]// Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021: 3872-3880.
12	ZHOU X, SHEN Z. A tale of two graphs: freezing and denoising graph structures for multimodal recommendation[C]// Proceedings of the 31st ACM International Conference on Multimedia. New York: ACM, 2023: 935-943.
13	ZHOU H, ZHOU X, ZENG Z, et al. A comprehensive survey on multimodal recommender systems: taxonomy, evaluation, and future directions[EB/OL]. [2024-02-09]..
14	LIU F, CHENG Z, SUN C, et al. User diverse preference modeling by multimodal attentive metric learning[C]// Proceedings of the 27th ACM International Conference on Multimedia. New York: ACM, 2019: 1526-1534.
15	LIU S, CHEN Z, LIU H, et al. User-video co-attention network for personalized micro-video recommendation[C]// Proceedings of the 2019 World Wide Web Conference. New York: ACM, 2019: 3020-3026.
16	WEI Y, WANG X, NIE L, et al. Graph-refined convolutional network for multimedia recommendation with implicit feedback[C]// Proceedings of the 28th ACM International Conference on Multimedia. New York: ACM, 2020: 3541-3549.
17	MU Z, ZHUANG Y, TAN J, et al. Learning hybrid behavior patterns for multimedia recommendation[C]// Proceedings of the 30th ACM International Conference on Multimedia. New York: ACM, 2022: 376-384.
18	ZHU Y, XU W, ZHANG J, et al. A survey on graph structure learning: progress and opportunities[EB/OL]. [2024-03-04]..
19	CHEN Y, WU L, ZAKI M J. Iterative deep graph learning for graph neural networks: better and robust node embeddings[C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 19314-19326.
20	ZHAO J, WANG X, SHI C, et al. Heterogeneous graph structure learning for graph neural networks[C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 4697-4705.
21	SAHA A, MENDEZ O, RUSSELL C, et al. Learning adaptive neighborhoods for graph neural networks[C]// Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2023: 22484-22493.
22	LUO D, CHENG W, YU W, et al. Learning to drop: robust graph neural network via topological denoising[C]// Proceedings of the 14th ACM International Conference on Web Search and Data Mining. New York: ACM, 2021: 779-787.
23	KREUZER D, BEAINI D, HAMILTON W L, et al. Rethinking graph transformers with spectral attention[C]// Proceedings of the 35th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2021: 21618-21629.
24	ZHOU H, ZHOU X, ZHANG L, et al. Enhancing dyadic relations with homogeneous graphs for multimodal recommendation[C]// Proceedings of the 26th European Conference on Artificial Intelligence. Amsterdam: IOS Press, 2023: 3123-3130.
25	WANG X, JI H, SHI C, et al. Heterogeneous graph attention network[C]// Proceedings of the 2019 World Wide Web Conference. New York: ACM, 2019: 2022-2032.

Dataset	Users	Items	Interactions	Sparsity/%
Baby	19,445	7,050	160,792	99.88
Sports	35,598	18,357	296,337	99.95
Clothing	39,387	23,033	278,677	99.97
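The sparsity column can be reproduced from the other three columns. The sketch below assumes the usual definition, sparsity = 1 − interactions / (users × items); the page does not state the formula, so this definition and the helper name `sparsity_percent` are assumptions.

```python
# Dataset statistics copied from the table above: (users, items, interactions).
datasets = {
    "Baby": (19445, 7050, 160792),
    "Sports": (35598, 18357, 296337),
    "Clothing": (39387, 23033, 278677),
}

def sparsity_percent(users, items, interactions):
    """Percentage of user-item pairs with no observed interaction."""
    return 100.0 * (1.0 - interactions / (users * items))

sparsity = {
    name: round(sparsity_percent(*stats), 2) for name, stats in datasets.items()
}
for name, value in sparsity.items():
    print(f"{name}: {value:.2f}%")
```

Under this definition the computed values round to the figures reported in the table (99.88, 99.95, and 99.97).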


	Baby				Sports				Clothing
Algorithm	R@10	R@20	N@10	N@20	R@10	R@20	N@10	N@20	R@10	R@20	N@10	N@20
LightGCN	0.0479	0.0754	0.0257	0.0328	0.0569	0.0864	0.0311	0.0387	0.0361	0.0544	0.0197	0.0243
VBPR	0.0423	0.0663	0.0223	0.0284	0.0558	0.0856	0.0307	0.0384	0.0281	0.0415	0.0158	0.0192
MMGCN	0.0412	0.0664	0.0219	0.0284	0.0390	0.0629	0.0204	0.0266	0.0224	0.0369	0.0116	0.0153
GRCN	0.0532	0.0824	0.0282	0.0358	0.0599	0.0919	0.0330	0.0413	0.0421	0.0657	0.0224	0.0284
DualGNN	0.0513	0.0803	0.0278	0.0352	0.0588	0.0899	0.0324	0.0404	0.0452	0.0675	0.0242	0.0298
LATTICE	0.0547	0.0850	0.0292	0.0370	0.0620	0.0953	0.0335	0.0421	0.0492	0.0733	0.0268	0.0330
FREEDOM	0.0627	0.0992	0.0330	0.0424	0.0717	0.1089	0.0385	0.0481	0.0629	0.0941	0.0341	0.0420
SGEMR	0.0669	0.1032	0.0360	0.0452	0.0798	0.1182	0.0434	0.0532	0.0661	0.0968	0.0367	0.0446
Note: R@K denotes Recall@K and N@K denotes NDCG@K.
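The relative improvements quoted in the abstract follow from the FREEDOM and SGEMR rows of this table. As a worked check, the sketch below computes (SGEMR − FREEDOM) / FREEDOM for Recall@10 and NDCG@10; the helper name `relative_gain` is hypothetical.

```python
# Recall@10 and NDCG@10 copied from the results table as (FREEDOM, SGEMR) pairs.
recall_at_10 = {
    "Baby": (0.0627, 0.0669),
    "Sports": (0.0717, 0.0798),
    "Clothing": (0.0629, 0.0661),
}
ndcg_at_10 = {
    "Baby": (0.0330, 0.0360),
    "Sports": (0.0385, 0.0434),
    "Clothing": (0.0341, 0.0367),
}

def relative_gain(baseline, proposed):
    """Relative improvement of `proposed` over `baseline`, in percent (2 decimals)."""
    return round(100.0 * (proposed - baseline) / baseline, 2)

recall_gain = {name: relative_gain(*pair) for name, pair in recall_at_10.items()}
ndcg_gain = {name: relative_gain(*pair) for name, pair in ndcg_at_10.items()}

for name in recall_at_10:
    print(f"{name}: Recall@10 +{recall_gain[name]:.2f}%, NDCG@10 +{ndcg_gain[name]:.2f}%")
```

The results match the abstract: Recall@10 gains of 6.70%, 11.30%, and 5.09%, and NDCG@10 gains of 9.09%, 12.73%, and 7.62% on Baby, Sports, and Clothing.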

