Multimodal recommendation aims to enhance user and item feature representations by integrating multimodal information, so as to improve recommendation performance. However, existing methods still face challenges such as insufficient cross-modal semantic fusion, redundant multimodal features, and noise interference. To address these issues, a multimodal Recommendation method based on Semantic Fusion and Contrast Enhancement (SFCERec) was proposed. Firstly, a cross-modal semantic consistency enhancement framework was designed, in which a global correlation graph was constructed through a multimodal semantic feature filtering mechanism, so as to dynamically aggregate common multimodal features while suppressing noise propagation. Concurrently, a multi-granularity attribute disentanglement module was introduced to separate coarse-grained common features and user behavior-driven fine-grained features from the modal features, thereby mitigating feature redundancy. Secondly, a multi-level contrastive learning paradigm was proposed to jointly optimize four tasks: cross-modal consistency alignment, user behavior similarity modeling, item semantic relevance constraint, and explicit-latent feature mutual information maximization, thereby enhancing the discriminability of the learned representations. Finally, a graph perturbation enhancement strategy was incorporated, employing noise injection and dual contrastive regularization to improve the model's robustness against sparse data and noise interference. Experimental results on the Amazon-Baby, Amazon-Sports, and Amazon-Clothing datasets demonstrate that the proposed method outperforms all baseline models on both the Recall@20 and NDCG@20 metrics, particularly in sparse scenarios. Ablation studies further validate the effectiveness of the proposed method.
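As a rough illustration of two building blocks named above, the following is a minimal PyTorch sketch, not the paper's implementation: info_nce is the standard InfoNCE contrastive objective, which could realize the cross-modal consistency alignment task between visual and textual item embeddings, and perturb shows SimGCL-style random noise injection, one common way to implement graph perturbation enhancement. All function names, tensor shapes, and hyperparameters (temperature 0.2, noise scale 0.1) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.2) -> torch.Tensor:
    """InfoNCE loss: row i of z1 and row i of z2 form a positive pair;
    every other row in the batch serves as a negative."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                   # cosine-similarity logits
    labels = torch.arange(z1.size(0), device=z1.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)

def perturb(emb: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """SimGCL-style noise injection (an assumed realization of the abstract's
    'noise injection'): add a small random vector whose direction is aligned
    with the sign pattern of the original embedding."""
    noise = F.normalize(torch.rand_like(emb), dim=-1) * eps
    return emb + torch.sign(emb) * noise

# Hypothetical usage: align visual and textual embeddings of the same 64 items,
# plus a perturbation-based contrastive term for robustness.
visual = torch.randn(64, 128, requires_grad=True)   # assumed visual item embeddings
textual = torch.randn(64, 128, requires_grad=True)  # assumed textual item embeddings
loss = info_nce(visual, textual) + info_nce(perturb(visual), perturb(textual))
loss.backward()
```

In a full model, the same InfoNCE form could in principle also serve the user behavior similarity, item semantic relevance, and explicit-latent mutual information tasks by swapping in the appropriate pair of views; how SFCERec actually weights and combines the four terms is specified in the paper, not here.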