Comments on social media platforms sometimes express attitudes towards events through sarcasm, so sarcasm detection enables more accurate analysis of user sentiments and opinions. However, traditional models based on vocabulary and syntactic structure ignore the role of textual sentiment information in sarcasm detection and suffer performance degradation due to data noise. To address these limitations, a Multimodal Sarcasm Detection model integrating Contrastive learning with Sentiment analysis (MSDCS) is proposed. First, BERT (Bidirectional Encoder Representations from Transformers) is used to extract text features, and ViT (Vision Transformer) is used to extract image features. Then, a shallow model is trained with a contrastive loss so that the image and text features are aligned before fusion. Finally, the cross-modal features are combined with the sentiment features for classification, maximizing the use of information across modalities to achieve sarcasm detection. Experimental results on a public multimodal sarcasm detection dataset show that the accuracy and F1 score of MSDCS are at least 1.85% and 1.99% higher than those of the baseline Decomposition and Relation Network (D&R Net), verifying the effectiveness of sentiment information and contrastive learning in multimodal sarcasm detection.
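The contrastive alignment step described above can be illustrated with a minimal sketch. This is a hypothetical symmetric InfoNCE-style loss (as popularized by CLIP-like models), not the exact MSDCS formulation; the temperature value and function names here are assumptions for illustration.

```python
import numpy as np

def contrastive_align_loss(text_feats, image_feats, temperature=0.07):
    """Symmetric contrastive loss that pulls matching text/image pairs
    together and pushes mismatched pairs apart before fusion.
    Hypothetical sketch; the paper's exact loss may differ."""
    # L2-normalize so dot products become cosine similarities
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    v = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    logits = t @ v.T / temperature      # (batch, batch) similarity matrix
    idx = np.arange(len(t))             # i-th text matches i-th image

    def xent(l):
        # cross-entropy with the diagonal entries as the positive pairs
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # average the text-to-image and image-to-text directions
    return (xent(logits) + xent(logits.T)) / 2
```

After training with this loss, matched text/image pairs sit close in the shared embedding space, so the subsequent fusion and sentiment-aware classification operate on aligned representations.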