Semantic graph enhanced multi-modal recommendation algorithm

doi:10.11772/j.issn.1001-9081.2024010145

Abstract:

To mine the latent isomorphic semantic relationships within multi-modal information and learn better item representations, a Semantic Graph Enhanced Multi-modal Recommendation (SGEMR) algorithm was proposed. First, auxiliary multi-modal information was used to complement historical user-item interactions, capturing user preferences in each modality. Then, based on metric learning, the loosely connected item sequence was reconstructed into a dense item-item semantic graph, and a semantic hierarchical attention mechanism was designed to fuse the multi-modal information of items. In addition, a graph reconstruction loss function was proposed so that item representations retain more semantic relationships, thereby improving recommendation performance. Experimental results on three real-world datasets (Baby, Sports, and Clothing) show that, compared with the best-performing baseline, FREEDOM (FREEzes the item-item graph and DenOises the user-item interaction graph simultaneously for Multimodal recommendation), the proposed algorithm improves Recall@10 by 6.70%, 11.30%, and 5.09%, and NDCG@10 by 9.09%, 12.73%, and 7.62%, respectively. Several ablation experiments further validate the effectiveness of the proposed algorithm.
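The abstract gives no formulas for the semantic graph, so what follows is only a minimal sketch under stated assumptions. It borrows two ideas: the weighted-cosine metric used in graph structure learning work such as IDGL [19], and the top-k sparsification used for multimodal item-item graphs in LATTICE [11]. With these, it builds one item-item graph for a single modality from item feature vectors. The function names `weighted_cosine` and `knn_semantic_graph`, the fixed weight vector (which would be learnable in a real model), and the value of `k` are all hypothetical. They are not SGEMR's actual design.

```python
import math


def weighted_cosine(x, y, w):
    """Cosine similarity after element-wise weighting by w (IDGL-style metric, assumed)."""
    xw = [a * b for a, b in zip(x, w)]
    yw = [a * b for a, b in zip(y, w)]
    dot = sum(a * b for a, b in zip(xw, yw))
    nx = math.sqrt(sum(a * a for a in xw))
    ny = math.sqrt(sum(b * b for b in yw))
    if nx == 0.0 or ny == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (nx * ny)


def knn_semantic_graph(features, weights, k):
    """Hypothetical helper: keep each item's top-k most similar items.

    features: per-item feature vectors for one modality
    weights:  metric weight vector (learnable in practice, fixed here)
    Returns a dict mapping item index -> list of (neighbor index, similarity).
    """
    n = len(features)
    graph = {}
    for i in range(n):
        sims = [
            (j, weighted_cosine(features[i], features[j], weights))
            for j in range(n)
            if j != i
        ]
        # Highest similarity first; ties broken by the smaller item index.
        sims.sort(key=lambda p: (-p[1], p[0]))
        graph[i] = sims[:k]
    return graph


# Toy example: items 0/1 and items 2/3 have similar (made-up) features.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = knn_semantic_graph(features, weights=[1.0, 1.0], k=1)
for item, neighbors in graph.items():
    print(item, [j for j, _ in neighbors])
```

On this toy input, items 0 and 1 select each other, and so do items 2 and 3. A multi-modal model would presumably build one such graph per modality and then fuse them. Here that fusion step is left to the attention mechanism the abstract describes.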

Key words: recommendation algorithm, Graph Neural Network (GNN), multi-modal fusion, attention mechanism, graph structure learning

CLC Number: TP391

Qijian CAI, Wei TAN. Semantic graph enhanced multi-modal recommendation algorithm[J]. Journal of Computer Applications, 2025, 45(2): 421-427.

Figures/Tables 8

References 25

1	HE R, McAULEY J. VBPR: visual Bayesian personalized ranking from implicit feedback [C]// Proceedings of the 30th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2016: 144-150.
2	CHEN J, ZHANG H, HE X, et al. Attentive collaborative filtering: multimedia recommendation with item- and component-level attention [C]// Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2017: 335-344.
3	WANG X, HE X, WANG M, et al. Neural graph collaborative filtering [C]// Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2019: 165-174.
4	HE X, DENG K, WANG X, et al. LightGCN: simplifying and powering graph convolution network for recommendation [C]// Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2020: 639-648.
5	FAN W, MA Y, LI Q, et al. Graph neural networks for social recommendation [C]// Proceedings of the 2019 World Wide Web Conference. New York: ACM, 2019: 417-426.
6	WANG X, HE X, CAO Y, et al. KGAT: knowledge graph attention network for recommendation [C]// Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2019: 950-958.
7	DELDJOO Y, SCHEDL M, CREMONESI P, et al. Recommender systems leveraging multimedia content [J]. ACM Computing Surveys, 2021, 53(5): No.106.
8	WEI Y, WANG X, NIE L, et al. MMGCN: multi-modal graph convolution network for personalized recommendation of micro-video [C]// Proceedings of the 27th ACM International Conference on Multimedia. New York: ACM, 2019: 1437-1445.
9	TAO Z, WEI Y, WANG X, et al. MGAT: multimodal graph attention network for recommendation [J]. Information Processing and Management, 2020, 57(5): No.102277.
10	WANG Q, WEI Y, YIN J, et al. DualGNN: dual graph neural network for multimedia recommendation [J]. IEEE Transactions on Multimedia, 2023, 25: 1074-1084.
11	ZHANG J, ZHU Y, LIU Q, et al. Mining latent structures for multimedia recommendation [C]// Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021: 3872-3880.
12	ZHOU X, SHEN Z. A tale of two graphs: freezing and denoising graph structures for multimodal recommendation [C]// Proceedings of the 31st ACM International Conference on Multimedia. New York: ACM, 2023: 935-943.
13	ZHOU H, ZHOU X, ZENG Z, et al. A comprehensive survey on multimodal recommender systems: taxonomy, evaluation, and future directions [EB/OL]. [2024-02-09].
14	LIU F, CHENG Z, SUN C, et al. User diverse preference modeling by multimodal attentive metric learning [C]// Proceedings of the 27th ACM International Conference on Multimedia. New York: ACM, 2019: 1526-1534.
15	LIU S, CHEN Z, LIU H, et al. User-video co-attention network for personalized micro-video recommendation [C]// Proceedings of the 2019 World Wide Web Conference. New York: ACM, 2019: 3020-3026.
16	WEI Y, WANG X, NIE L, et al. Graph-refined convolutional network for multimedia recommendation with implicit feedback [C]// Proceedings of the 28th ACM International Conference on Multimedia. New York: ACM, 2020: 3541-3549.
17	MU Z, ZHUANG Y, TAN J, et al. Learning hybrid behavior patterns for multimedia recommendation [C]// Proceedings of the 30th ACM International Conference on Multimedia. New York: ACM, 2022: 376-384.
18	ZHU Y, XU W, ZHANG J, et al. A survey on graph structure learning: progress and opportunities [EB/OL]. [2024-03-04].
19	CHEN Y, WU L, ZAKI M J. Iterative deep graph learning for graph neural networks: better and robust node embeddings [C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2020: 19314-19326.
20	ZHAO J, WANG X, SHI C, et al. Heterogeneous graph structure learning for graph neural networks [C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 4697-4705.
21	SAHA A, MENDEZ O, RUSSELL C, et al. Learning adaptive neighborhoods for graph neural networks [C]// Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2023: 22484-22493.
22	LUO D, CHENG W, YU W, et al. Learning to drop: robust graph neural network via topological denoising [C]// Proceedings of the 14th ACM International Conference on Web Search and Data Mining. New York: ACM, 2021: 779-787.
23	KREUZER D, BEAINI D, HAMILTON W L, et al. Rethinking graph transformers with spectral attention [C]// Proceedings of the 35th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2021: 21618-21629.
24	ZHOU H, ZHOU X, ZHANG L, et al. Enhancing dyadic relations with homogeneous graphs for multimodal recommendation [C]// Proceedings of the 26th European Conference on Artificial Intelligence. Amsterdam: IOS Press, 2023: 3123-3130.
25	WANG X, JI H, SHI C, et al. Heterogeneous graph attention network [C]// Proceedings of the 2019 World Wide Web Conference. New York: ACM, 2019: 2022-2032.

Dataset	Users	Items	Interactions	Sparsity/%
Baby	19,445	7,050	160,792	99.88
Sports	35,598	18,357	296,337	99.95
Clothing	39,387	23,033	278,677	99.97
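The sparsity column can be checked against the other three columns. This assumes sparsity is the percentage of the user-item matrix with no observed interaction, that is, 1 - interactions / (users × items). The function name below is illustrative only.

```python
def sparsity_percent(users: int, items: int, interactions: int) -> float:
    """Share of the user-item matrix with no observed interaction, in percent."""
    return 100.0 * (1.0 - interactions / (users * items))


# (users, items, interactions) copied from the dataset table above.
datasets = {
    "Baby": (19445, 7050, 160792),
    "Sports": (35598, 18357, 296337),
    "Clothing": (39387, 23033, 278677),
}

for name, (u, i, n) in datasets.items():
    print(f"{name}: {sparsity_percent(u, i, n):.2f}%")
```

Each computed value rounds to the figure in the table (99.88, 99.95, 99.97), which is consistent with this reading of the column.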

R@K = Recall@K, N@K = NDCG@K.
Algorithm	Baby R@10	Baby R@20	Baby N@10	Baby N@20	Sports R@10	Sports R@20	Sports N@10	Sports N@20	Clothing R@10	Clothing R@20	Clothing N@10	Clothing N@20
LightGCN	0.0479	0.0754	0.0257	0.0328	0.0569	0.0864	0.0311	0.0387	0.0361	0.0544	0.0197	0.0243
VBPR	0.0423	0.0663	0.0223	0.0284	0.0558	0.0856	0.0307	0.0384	0.0281	0.0415	0.0158	0.0192
MMGCN	0.0412	0.0664	0.0219	0.0284	0.0390	0.0629	0.0204	0.0266	0.0224	0.0369	0.0116	0.0153
GRCN	0.0532	0.0824	0.0282	0.0358	0.0599	0.0919	0.0330	0.0413	0.0421	0.0657	0.0224	0.0284
DualGNN	0.0513	0.0803	0.0278	0.0352	0.0588	0.0899	0.0324	0.0404	0.0452	0.0675	0.0242	0.0298
LATTICE	0.0547	0.0850	0.0292	0.0370	0.0620	0.0953	0.0335	0.0421	0.0492	0.0733	0.0268	0.0330
FREEDOM	0.0627	0.0992	0.0330	0.0424	0.0717	0.1089	0.0385	0.0481	0.0629	0.0941	0.0341	0.0420
SGEMR	0.0669	0.1032	0.0360	0.0452	0.0798	0.1182	0.0434	0.0532	0.0661	0.0968	0.0367	0.0446
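The percentages quoted in the abstract appear to be relative gains of SGEMR over FREEDOM at K = 10. The sketch below assumes the usual definition (SGEMR - FREEDOM) / FREEDOM and recomputes them from the table. The helper name is illustrative.

```python
def relative_gain_percent(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# (FREEDOM, SGEMR) scores at K = 10, copied from the comparison table.
recall_at_10 = {"Baby": (0.0627, 0.0669), "Sports": (0.0717, 0.0798), "Clothing": (0.0629, 0.0661)}
ndcg_at_10 = {"Baby": (0.0330, 0.0360), "Sports": (0.0385, 0.0434), "Clothing": (0.0341, 0.0367)}

for metric, scores in (("Recall@10", recall_at_10), ("NDCG@10", ndcg_at_10)):
    for name, (base, ours) in scores.items():
        print(f"{metric} {name}: +{relative_gain_percent(ours, base):.2f}%")
```

Rounded to two decimals, the output matches the abstract: Recall@10 +6.70%, +11.30%, +5.09% and NDCG@10 +9.09%, +12.73%, +7.62% on Baby, Sports, and Clothing, respectively.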