Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (2): 421-427. DOI: 10.11772/j.issn.1001-9081.2024010145

• Artificial intelligence •

Semantic graph enhanced multi-modal recommendation algorithm

Qijian CAI, Wei TAN

  1. School of Computer Science and Technology, Dongguan University of Technology, Dongguan, Guangdong 523808, China
  • Received: 2024-02-07 Revised: 2024-04-03 Accepted: 2024-04-07 Online: 2024-05-09 Published: 2025-02-10
  • Contact: Wei TAN
  • About author: CAI Qijian, born in 1998, a native of Zhanjiang, Guangdong, M.S. candidate, CCF member. His research interests include data mining and recommender systems.
  • Supported by:
    Guangdong Provincial Basic and Applied Basic Research Fund - Natural Science Fund (2021A1515010506)

Abstract:

To mine the latent isomorphic semantic relationships within multi-modal information and learn better item representations, a Semantic Graph Enhanced Multi-modal Recommendation (SGEMR) algorithm is proposed. First, auxiliary multi-modal information is used to complement historical user-item interactions, capturing user preferences in each modality. Then, based on metric learning, the loosely connected item sequences are reconstructed into a dense item-item semantic graph, and a semantic hierarchical attention mechanism is designed to fuse the multi-modal information of items. In addition, a graph reconstruction loss function is proposed so that item representations retain more semantic relationships, thereby improving recommendation performance. Experimental results on three real-world datasets show that, compared with the best-performing baseline algorithm FREEDOM (FREEzes the item-item graph and DenOises the user-item interaction graph simultaneously for Multimodal recommendation), the proposed algorithm improves Recall@10 by 6.70%, 11.30%, and 5.09%, and NDCG@10 by 9.09%, 12.73%, and 7.62%, respectively. Several ablation experiments further validate the effectiveness of the proposed algorithm.
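
The two structural steps named in the abstract, building an item-item semantic graph from item features and fusing per-modality item embeddings with attention weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the metric or the attention form, so the sketch assumes a fixed cosine similarity with k-nearest-neighbour selection in place of the learned metric, and takes the attention scores as given inputs rather than learning them. The names `knn_semantic_graph` and `fuse_modalities` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_semantic_graph(features, k):
    """Link each item to its k most similar items.

    Assumption: cosine similarity stands in for the learned metric
    mentioned in the abstract. Returns {item: [(neighbor, similarity), ...]}.
    """
    graph = {}
    for i, feat_i in enumerate(features):
        sims = [(j, cosine(feat_i, feat_j))
                for j, feat_j in enumerate(features) if j != i]
        sims.sort(key=lambda pair: pair[1], reverse=True)
        graph[i] = sims[:k]
    return graph


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(modal_embeddings, scores):
    """Attention-weighted sum of one item's per-modality embeddings.

    Assumption: `scores` are supplied by the caller; in a trained model
    they would come from a learned attention layer.
    """
    weights = softmax(scores)
    dim = len(modal_embeddings[0])
    return [sum(w * emb[d] for w, emb in zip(weights, modal_embeddings))
            for d in range(dim)]


if __name__ == "__main__":
    # Toy 2-D features for four items (made-up values for illustration).
    features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    graph = knn_semantic_graph(features, k=1)
    for item, neighbors in graph.items():
        print(item, neighbors)
    # Equal scores give equal weight to the two modalities.
    print(fuse_modalities([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

In this toy input, items 0 and 1 are near-duplicates, as are items 2 and 3, so each item's single neighbour is its counterpart. A graph reconstruction loss of the kind the abstract describes would then penalise item representations that fail to reproduce these edges; its exact form is not given in the abstract and is not sketched here.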

Key words: recommendation algorithm, Graph Neural Network (GNN), multi-modal fusion, attention mechanism, graph structure learning
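
For readers unfamiliar with the reported metrics, the sketch below computes Recall@K and NDCG@K under their common binary-relevance definitions, plus a relative gain over a baseline. The abstract does not describe the evaluation protocol, so these definitions, and the reading of the reported percentages as relative gains, are assumptions. The function names and the numbers in the usage example are hypothetical and are not results from the paper.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent.

    Assumption: the percentages in the abstract are relative gains of this kind.
    """
    return (new - base) / base * 100.0


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d"]   # hypothetical recommendation list
    relevant = {"a", "d"}           # hypothetical held-out interactions
    print(recall_at_k(ranked, relevant, k=3))
    print(ndcg_at_k(ranked, relevant, k=3))
    print(relative_gain(0.11, 0.10))  # made-up scores, not from the paper
```

With K = 3, only item "a" is retrieved out of two relevant items, so recall is one half, while NDCG also accounts for where in the list the hit occurs.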

CLC number: