《计算机应用》唯一官方网站 ›› 2026, Vol. 46 ›› Issue (4): 1058-1068.DOI: 10.11772/j.issn.1001-9081.2025050528

• 人工智能 •

基于语义融合和对比增强的多模态推荐方法

赵海华1, 胡怡君1, 唐瑞2, 莫先1()   

  1. 宁夏大学 信息工程学院,银川 750021
    2. 四川大学 网络空间安全学院,成都 610065
  • 收稿日期:2025-05-15 修回日期:2025-07-18 接受日期:2025-08-06 发布日期:2025-08-12 出版日期:2026-04-10
  • 通讯作者: 莫先
  • 作者简介:赵海华(1999—),男,宁夏中卫人,硕士研究生,CCF会员,主要研究方向:图学习、推荐系统
    胡怡君(2000—),男,山东济宁人,硕士研究生,主要研究方向:图学习、推荐系统
    唐瑞(1990—),男,四川泸州人,副研究员,博士,CCF会员,主要研究方向:社交网络分析、网络表示学习
  • 基金资助:
    国家自然科学基金资助项目(62306157);宁夏自然科学基金资助项目(2024AAC05011)

Multimodal recommendation method based on semantic fusion and contrast enhancement

Haihua ZHAO1, Yijun HU1, Rui TANG2, Xian MO1()   

  1. School of Information Engineering, Ningxia University, Yinchuan Ningxia 750021, China
    2. School of Cyberspace Security, Sichuan University, Chengdu Sichuan 610065, China
  • Received:2025-05-15 Revised:2025-07-18 Accepted:2025-08-06 Online:2025-08-12 Published:2026-04-10
  • Contact: Xian MO
  • About author:ZHAO Haihua, born in 1999, M. S. candidate. His research interests include graph learning, recommender systems.
    HU Yijun, born in 2000, M. S. candidate. His research interests include graph learning, recommender systems.
    TANG Rui, born in 1990, Ph. D., associate research fellow. His research interests include social network analysis, network representation learning.
  • Supported by:
    National Natural Science Foundation of China(62306157);Natural Science Foundation of Ningxia(2024AAC05011)

摘要:

多模态推荐旨在通过融合多模态信息增强用户和项目的特征表示,提升推荐性能。然而,现有方法存在跨模态语义信息融合不足、多模态特征冗余及噪声干扰问题。针对这些问题,提出一种基于语义融合和对比增强的多模态推荐方法(SFCERec)。首先,设计跨模态语义一致性增强框架,通过多模态语义特征筛选机制构建全局关联图,动态聚合多模态共性特征并抑制噪声传播;同时,提出多粒度属性解耦模块,从模态特征中分离粗粒度共性特征与用户行为驱动的细粒度特征,缓解特征冗余。其次,提出多层次对比学习范式,联合跨模态一致性对齐、用户行为相似性建模、项目语义关联性约束及显式-潜在特征互信息最大化这4类任务,通过对比学习强化表征的判别性。最后,进一步结合图扰动增强策略,通过添加噪声与双重对比正则化,提升模型对稀疏数据与噪声干扰的鲁棒性。在Amazon-Baby、Amazon-Sports和Amazon-Clothing数据集上的实验结果表明,该方法在Recall@20和NDCG@20指标上均优于所有基线模型,尤其在稀疏场景下。消融实验结果也验证了该方法的有效性。
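The cross-modal consistency alignment task in the multi-level contrastive paradigm above is typically realized with an InfoNCE-style objective. The following is a minimal NumPy sketch, not the paper's actual implementation: the function name `info_nce`, the temperature value 0.2, and the use of in-batch negatives are illustrative assumptions.

```python
import numpy as np

def info_nce(z_a, z_b, tau=0.2):
    """InfoNCE loss aligning two views of the same items (e.g., visual
    vs. textual embeddings). Row i of z_a and row i of z_b form the
    positive pair; all other rows in the batch serve as negatives."""
    # L2-normalize so that dot products become cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / tau                    # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # positives on the diagonal
```

Minimizing this loss pulls matching cross-modal pairs together while pushing non-matching items apart, which is the discriminability-enhancing effect the abstract attributes to contrastive learning.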

关键词: 推荐系统, 多模态, 对比学习, 语义融合, 特征解耦

Abstract:

Multimodal recommendation aims to enhance user and item feature representations by integrating multimodal information, so as to improve recommendation performance. However, the existing methods still face challenges including insufficient cross-modal semantic information fusion, redundant multimodal features, and noise interference. To address these issues, a multimodal Recommendation method based on Semantic Fusion and Contrast Enhancement (SFCERec) was proposed. Firstly, a cross-modal semantic consistency enhancement framework was designed, in which a global correlation graph was constructed through a multimodal semantic feature filtering mechanism, so as to aggregate common multimodal features dynamically while suppressing noise propagation. Concurrently, a multi-granularity attribute disentanglement module was introduced to disentangle coarse-grained common features and user behavior-driven fine-grained features from modal features, so as to mitigate feature redundancy. Secondly, a multi-level contrastive learning paradigm was proposed to combine four tasks jointly: cross-modal consistency alignment, user behavior similarity modeling, item semantic relevance constraint, and explicit-latent feature mutual information maximization, thereby enhancing representation discriminability through contrastive learning. Finally, a graph perturbation enhancement strategy was further incorporated, employing noise injection and dual contrastive regularization to improve the model's robustness to sparse data and noise interference. Experimental results on the Amazon-Baby, Amazon-Sports, and Amazon-Clothing datasets demonstrate that the proposed method outperforms all baseline models on both Recall@20 and NDCG@20, particularly in sparse scenarios. Ablation studies further validate the effectiveness of the proposed method.
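The Recall@20 and NDCG@20 metrics used in the experiments above are standard top-K ranking measures. The sketch below shows how they are conventionally computed for a single user; the function name `recall_ndcg_at_k` is illustrative, and a non-empty ground-truth set is assumed.

```python
import numpy as np

def recall_ndcg_at_k(ranked_items, ground_truth, k=20):
    """Recall@K and NDCG@K for one user.
    ranked_items: item ids sorted by predicted score, descending.
    ground_truth: non-empty set of held-out items the user interacted with."""
    top_k = ranked_items[:k]
    hits = [1.0 if item in ground_truth else 0.0 for item in top_k]
    recall = sum(hits) / len(ground_truth)
    # DCG discounts hits by log2 of their rank; IDCG is the best achievable
    # DCG, i.e., all relevant items ranked first
    dcg = sum(h / np.log2(i + 2) for i, h in enumerate(hits))
    idcg = sum(1.0 / np.log2(i + 2)
               for i in range(min(len(ground_truth), k)))
    return recall, dcg / idcg
```

Per-user values are then averaged over all test users; NDCG additionally rewards placing relevant items nearer the top of the list, which is why the two metrics are commonly reported together.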

Key words: Recommender System (RS), multimodal, contrastive learning, semantic fusion, feature disentanglement
