Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (5): 1432-1438. DOI: 10.11772/j.issn.1001-9081.2024050731
• 2024 China Granular Computing and Knowledge Discovery Conference (CGCKD2024) •
Wenbin HU, Tianxiang CAI, Tianle HAN, Zhaoman ZHONG, Changxia MA
Received: 2024-06-03
Revised: 2024-07-02
Accepted: 2024-07-05
Online: 2024-07-25
Published: 2025-05-10
Contact: Tianxiang CAI
About author: HU Wenbin, born in 1976, Ph. D., associate professor, CCF member, native of Lianyungang, Jiangsu. Her research interests include personal privacy protection, social network analysis, and pattern recognition.
Wenbin HU, Tianxiang CAI, Tianle HAN, Zhaoman ZHONG, Changxia MA. Multimodal sarcasm detection model integrating contrastive learning with sentiment analysis[J]. Journal of Computer Applications, 2025, 45(5): 1432-1438.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024050731
| Sample set | Sarcastic | Non-sarcastic | Total |
|---|---|---|---|
| Overall | 10 647 | 13 988 | 24 635 |
| Training set | 8 642 | 11 174 | 19 816 |
| Validation set | 1 000 | 1 410 | 2 410 |
| Test set | 1 005 | 1 404 | 2 409 |
Tab. 1 Twitter dataset used in experiments
| Modality | Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| Image | ResNet | 0.702 8 | 0.716 6 | 0.644 0 | 0.678 3 |
| Image | ViT | 0.732 6 | 0.676 4 | 0.688 5 | 0.682 4 |
| Text | BiLSTM | 0.735 5 | 0.692 4 | 0.658 7 | 0.675 1 |
| Text | BERT | 0.793 6 | 0.749 5 | 0.759 2 | 0.754 3 |
| Multimodal (image + text) | TextCNN-ResNet | 0.775 4 | 0.720 8 | 0.709 2 | 0.715 0 |
| Multimodal (image + text) | BERT-LSTM-ResNet | 0.756 0 | 0.702 4 | 0.715 0 | 0.708 6 |
| Multimodal (image + text) | VLMo-base | 0.837 3 | 0.802 0 | 0.810 0 | 0.805 9 |
| Multimodal (image + text) | Res-BERT | 0.824 2 | 0.768 7 | 0.807 7 | 0.787 7 |
| Multimodal (image + text) | ALBEF | 0.829 8 | 0.799 7 | 0.787 0 | 0.793 3 |
| Multimodal (image + text) | InCrossMGs | 0.832 1 | 0.732 4 | 0.821 7 | 0.773 5 |
| Multimodal (image + text) | HFM | 0.834 4 | 0.765 7 | 0.798 6 | 0.801 8 |
| Multimodal (image + text) | D&R Net | 0.840 2 | 0.779 7 | 0.834 2 | 0.806 0 |
| Multimodal (image + text) | MSDCS | 0.855 8 | 0.810 8 | 0.833 8 | 0.822 1 |
Tab. 2 Comparison of evaluation metrics for experimental models
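The metrics in the comparison table are the standard binary-classification quantities computed over the confusion matrix, with the sarcastic class as positive. A minimal sketch of how they relate, using illustrative confusion counts (not taken from the paper) chosen only to roughly match MSDCS's reported precision and recall on the 2 409-sample test set:

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # correct positives among predicted positives
    recall = tp / (tp + fn)             # correct positives among actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Illustrative counts only, not the paper's results:
acc, p, r, f1 = binary_metrics(tp=838, fp=196, fn=167, tn=1208)
```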
| Experiment | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| MSDCS | 0.855 8 | 0.810 8 | 0.833 8 | 0.822 1 |
| w/o-S | 0.837 3 | 0.802 0 | 0.810 0 | 0.805 9 |
| w/o-E | 0.826 8 | 0.808 4 | 0.798 0 | 0.803 2 |
| w/o-S-E | 0.817 3 | 0.793 8 | 0.785 1 | 0.789 4 |
| w/o-M | 0.801 9 | 0.777 6 | 0.764 3 | 0.770 9 |
Tab. 3 Ablation experimental results
| Dataset | Model | Accuracy | Macro-F1 |
|---|---|---|---|
| Twitter-15 | MSDCS | 0.789 8 | 0.770 8 |
| Twitter-15 | TomBERT | 0.761 8 | 0.712 7 |
| Twitter-17 | MSDCS | 0.719 8 | 0.687 3 |
| Twitter-17 | TomBERT | 0.705 0 | 0.680 4 |
Tab. 4 Sentiment classification results
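Tab. 4 reports Macro-F1, the unweighted mean of the per-class F1 scores, which weights every sentiment class equally regardless of its frequency. A minimal sketch of the definition (the label names below are a toy example, not data from the paper):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (Macro-F1)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        # One-vs-rest confusion counts for class c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Toy three-class sentiment example:
y_true = ["neg", "neu", "pos", "pos", "neu", "neg"]
y_pred = ["neg", "pos", "pos", "pos", "neu", "neu"]
score = macro_f1(y_true, y_pred)
```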
| 1 | LIU H, WEI R, TU G, et al. Sarcasm driven by sentiment: a sentiment-aware hierarchical fusion network for multimodal sarcasm detection[J]. Information Fusion, 2024, 108: No. 102353. |
| 2 | GONZÁLEZ-IBÁÑEZ R, MURESAN S, WACHOLDER N. Identifying sarcasm in Twitter: a closer look[C]// Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2011: 581-586. |
| 3 | LUNANDO E, PURWARIANTI A. Indonesian social media sentiment analysis with sarcasm detection[C]// Proceedings of the 2013 International Conference on Advanced Computer Science and Information Systems. Piscataway: IEEE, 2013: 195-198. |
| 4 | JOSHI A, SHARMA V, BHATTACHARYYA P. Harnessing context incongruity for sarcasm detection[C]// Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Stroudsburg: ACL, 2015: 757-762. |
| 5 | CAI Y, CAI H, WAN X. Multi-modal sarcasm detection in twitter with hierarchical fusion model[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 2506-2515. |
| 6 | SAVINI E, CARAGEA C. Intermediate-task transfer learning with BERT for sarcasm detection[J]. Mathematics, 2022, 10(5): No. 844. |
| 7 | VITMAN O, KOSTIUK Y, SIDOROV G, et al. Sarcasm detection framework using context, emotion and sentiment features[J]. Expert Systems with Applications, 2023, 234: No. 121068. |
| 8 | ILIĆ S, MARRESE-TAYLOR E, BALAZS J A, et al. Deep contextualized word representations for detecting sarcasm and irony[C]// Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. Stroudsburg: ACL, 2018: 2-7. |
| 9 | MAJUMDER N, PORIA S, PENG H, et al. Sentiment and sarcasm classification with multitask learning[J]. IEEE Intelligent Systems, 2019, 34(3): 38-43. |
| 10 | RAZALI M S, HALIN A A, NOROWI N M, et al. The importance of multimodality in sarcasm detection for sentiment analysis[C]// Proceedings of the 2017 IEEE 15th Student Conference on Research and Development. Piscataway: IEEE, 2017: 56-60. |
| 11 | BOUAZIZI M, OHTSUKI T. Sarcasm detection in Twitter: “all your products are incredibly amazing!!!” — are they really?[C]// Proceedings of the 2015 IEEE Global Communications Conference. Piscataway: IEEE, 2015: 1-6. |
| 12 | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[EB/OL]. [2024-06-02]. |
| 13 | RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]// Proceedings of the 38th International Conference on Machine Learning. New York: PMLR, 2021: 8748-8763. |
| 14 | KIM W, SON B, KIM I. ViLT: vision-and-language Transformer without convolution or region supervision[C]// Proceedings of the 38th International Conference on Machine Learning. New York: PMLR, 2021: 5583-5594. |
| 15 | LI J, SELVARAJU R R, GOTMARE A D, et al. Align before fuse: vision and language representation learning with momentum distillation[C]// Proceedings of the 35th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2021: 9694-9705. |
| 16 | BAO H, WANG W, DONG L, et al. VLMo: unified vision-language pre-training with mixture-of-modality-experts[C]// Proceedings of the 36th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2022: 32897-32912. |
| 17 | DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg: ACL, 2019: 4171-4186. |
| 18 | DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2009: 248-255. |
| 19 | JIA M, XIE C, JING L. Debiasing multimodal sarcasm detection with contrastive learning[EB/OL]. [2024-07-04]. |
| 20 | LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[C]// Proceedings of the 20th Chinese National Conference on Computational Linguistics. Beijing: Chinese Information Processing Society of China, 2021: 1218-1227. |
| 21 | DEMSZKY D, MOVSHOVITZ-ATTIAS D, KO J, et al. GoEmotions: a dataset of fine-grained emotions[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 4040-4054. |
| 22 | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. |
| 23 | ZHANG S, ZHENG D, HU X, et al. Bidirectional long short-term memory networks for relation classification[C]// Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation. Stroudsburg: ACL, 2015: 73-78. |
| 24 | PAN H, LIN Z, FU P, et al. Modeling intra and inter-modality incongruity for multi-modal sarcasm detection[C]// Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg: ACL, 2020: 1383-1392. |
| 25 | LIANG B, LOU C, LI X, et al. Multi-modal sarcasm detection with interactive in-modal and cross-modal graphs[C]// Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021: 4707-4715. |
| 26 | XU N, ZENG Z, MAO W. Reasoning with multimodal sarcastic tweets via modeling cross-modality contrast and semantic association[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 3777-3786. |
| 27 | YU J, JIANG J. Adapting BERT for target-oriented multimodal sentiment classification[C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann Publishers Inc., 2019: 5408-5414. |