To address the data sparsity problem in multimodal recommendation and the limitation of existing Self-Supervised Learning (SSL) algorithms, which typically apply SSL to a single feature of a dataset and ignore the possibility of jointly learning multiple features, a multimodal fusion recommendation algorithm based on joint self-supervised learning, called SFELMMR (SelF-supErvised Learning for MultiModal Recommendation), was proposed. Firstly, existing SSL strategies were integrated and optimized to learn data features from different modalities jointly, which significantly enhanced data representation and alleviated the data sparsity issue. Secondly, a method for constructing a multimodal latent semantic graph was designed by integrating deep item relationships from a global perspective with direct interactions from a local perspective, enabling the algorithm to capture complex relationships among items more accurately. Finally, experiments were carried out on three datasets. The results demonstrate that the proposed algorithm achieves significant improvements in multiple recommendation metrics compared with the best-performing existing multimodal recommendation algorithms: on the three datasets, Recall@10 is improved by 5.49%, 2.56%, and 2.99%, NDCG@10 by 1.17%, 1.98%, and 3.52%, Precision@10 by 4.69%, 2.74%, and 1.22%, and MAP@10 by 0.81%, 1.59%, and 3.11%, respectively. In addition, the effectiveness of the algorithm is verified through ablation experiments.
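The abstract gives no implementation details, so the following is only a minimal sketch of what a joint self-supervised objective over multiple modality features could look like; it is not the SFELMMR formulation. The function names (`infonce`, `joint_ssl_loss`), the InfoNCE-style contrastive loss, the temperature, and the per-modality weights are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def infonce(z_a, z_b, temperature=0.2):
    """Symmetric InfoNCE loss aligning two views of the same items.

    z_a, z_b: (batch, dim) embeddings; row i of each tensor describes item i.
    (Assumed form, not taken from the paper.)
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature              # (batch, batch) similarities
    labels = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))


def joint_ssl_loss(id_emb, img_emb, txt_emb, weights=(1.0, 1.0, 1.0)):
    """Illustrative joint objective: contrast ID embeddings against each
    modality feature and the two modalities against each other, so the
    features are learned together rather than one at a time."""
    w_img, w_txt, w_cross = weights
    return (w_img * infonce(id_emb, img_emb) +
            w_txt * infonce(id_emb, txt_emb) +
            w_cross * infonce(img_emb, txt_emb))


if __name__ == "__main__":
    # Toy usage with random tensors standing in for item ID, image, and text embeddings.
    batch, dim = 64, 128
    id_emb = torch.randn(batch, dim)
    img_emb = torch.randn(batch, dim)
    txt_emb = torch.randn(batch, dim)
    print(joint_ssl_loss(id_emb, img_emb, txt_emb).item())
```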