Multimodal recommendation aims to enhance user and item feature representations by integrating multimodal information, so as to improve recommendation performance. However, existing methods still face challenges such as insufficient cross-modal semantic fusion, redundant multimodal features, and noise interference. To address these issues, a multimodal Recommendation method based on Semantic Fusion and Contrast Enhancement (SFCERec) was proposed. Firstly, a cross-modal semantic consistency enhancement framework was designed, in which a global correlation graph was constructed through a multimodal semantic feature filtering mechanism, so as to dynamically aggregate common multimodal features while suppressing noise propagation. Concurrently, a multi-granularity attribute disentanglement module was introduced to separate coarse-grained common features and user behavior-driven fine-grained features from the modal features, thereby mitigating feature redundancy. Secondly, a multi-level contrastive learning paradigm was proposed to jointly optimize four tasks: cross-modal consistency alignment, user behavior similarity modeling, item semantic relevance constraint, and explicit-latent feature mutual information maximization, thereby enhancing the discriminability of the learned representations. Finally, a graph perturbation enhancement strategy was incorporated, employing noise injection and dual contrastive regularization to improve the model's robustness against sparse data and noise interference. Experimental results on the Amazon-Baby, Amazon-Sports, and Amazon-Clothing datasets demonstrate that the proposed method outperforms all baseline models on both the Recall@20 and NDCG@20 metrics, particularly in sparse scenarios. Ablation studies further validate the effectiveness of the proposed method.
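As a rough illustration of two building blocks named above, the following is a minimal PyTorch sketch, not the paper's implementation: info_nce is the standard InfoNCE contrastive objective, which could realize the cross-modal consistency alignment task between visual and textual item embeddings, and perturb shows SimGCL-style random noise injection, one common way to implement graph perturbation enhancement. All function names, tensor shapes, and hyperparameters (temperature 0.2, noise scale 0.1) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.2) -> torch.Tensor:
    """InfoNCE loss: row i of z1 and row i of z2 form a positive pair;
    every other row in the batch serves as a negative."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                   # cosine-similarity logits
    labels = torch.arange(z1.size(0), device=z1.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)

def perturb(emb: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """SimGCL-style noise injection (an assumed realization of the abstract's
    'noise injection'): add a small random vector whose direction is aligned
    with the sign pattern of the original embedding."""
    noise = F.normalize(torch.rand_like(emb), dim=-1) * eps
    return emb + torch.sign(emb) * noise

# Hypothetical usage: align visual and textual embeddings of the same 64 items,
# plus a perturbation-based contrastive term for robustness.
visual = torch.randn(64, 128, requires_grad=True)   # assumed visual item embeddings
textual = torch.randn(64, 128, requires_grad=True)  # assumed textual item embeddings
loss = info_nce(visual, textual) + info_nce(perturb(visual), perturb(textual))
loss.backward()
```

In a full model, the same InfoNCE form could in principle also serve the user behavior similarity, item semantic relevance, and explicit-latent mutual information tasks by swapping in the appropriate pair of views; how SFCERec actually weights and combines the four terms is specified in the paper, not here.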