Multimodal sarcasm detection model integrating contrastive learning with sentiment analysis

DOI: 10.11772/j.issn.1001-9081.2024050731

Journal of Computer Applications, 2025, Vol. 45, Issue 5: 1432-1438.

Special section: 2024 Chinese Conference on Granular Computing and Knowledge Discovery

Wenbin HU, Tianxiang CAI, Tianle HAN, Zhaoman ZHONG, Changxia MA

School of Computer Engineering, Jiangsu Ocean University, Lianyungang Jiangsu 222005, China

Received: 2024-06-03 Revised: 2024-07-02 Accepted: 2024-07-05 Online: 2024-07-25 Published: 2025-05-10
Corresponding author: Tianxiang CAI
About the authors: HU Wenbin, born in 1976, from Lianyungang, Jiangsu, Ph. D., associate professor, CCF member. Her research interests include personal privacy protection, social network analysis, pattern recognition.
CAI Tianxiang, born in 2000, from Suizhou, Hubei, M. S. candidate. His research interests include public opinion analysis, sarcasm detection.
HAN Tianle, born in 2000, from Nanjing, Jiangsu, M. S. candidate. His research interests include public opinion analysis, sentiment analysis.
ZHONG Zhaoman, born in 1977, from Lianyungang, Jiangsu, Ph. D., professor, CCF member. His research interests include artificial intelligence, natural language processing, big data collection and analysis, social network analysis.
MA Changxia, born in 1975, from Lianyungang, Jiangsu, Ph. D., associate professor, CCF member. Her research interests include pattern recognition and intelligent systems, machine learning.
Supported by: National Natural Science Foundation of China (72174079)

Abstract:

Comments on social media platforms sometimes use sarcasm to express attitudes towards events, and detecting sarcasm allows user sentiments and opinions to be analyzed more accurately. However, traditional models based on lexical and syntactic structure ignore the role of textual sentiment information in sarcasm detection, and their performance degrades in the presence of data noise. To address these limitations, a Multimodal Sarcasm Detection model integrating Contrastive learning with Sentiment analysis (MSDCS) was proposed. Firstly, BERT (Bidirectional Encoder Representations from Transformers) was used to extract text features, and ViT (Vision Transformer) was used to extract image features. Secondly, a contrastive loss was used to train a shallow model so that the image and text features were aligned before fusion. Finally, the cross-modal features were fused with sentiment features for classification, making maximal use of the information shared across modalities. Experimental results on an open multimodal sarcasm detection dataset show that MSDCS improves accuracy and F1 score by at least 1.85% and 1.99% (relative) over the baselines, the strongest of which is the Decomposition and Relation Network (D&R Net), verifying the effectiveness of using sentiment information and contrastive learning in multimodal sarcasm detection.

Key words: social media, sarcasm detection, sentiment analysis, contrastive learning, momentum distillation
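The abstract states only that a contrastive loss aligns image and text features before fusion, and the keywords mention momentum distillation; the exact formulation is not given here. The sketch below assumes an ALBEF-style setup (reference [15]): a symmetric image-text contrastive (InfoNCE) loss in which the i-th image and the i-th text of a batch form the only positive pair, plus the exponential-moving-average (EMA) weight update that momentum distillation relies on. The temperature 0.07 and momentum 0.995 are assumed defaults, the function names are hypothetical, and plain Python lists stand in for encoder outputs.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a feature vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy(logits: Vector, target: int) -> float:
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def itc_loss(img_feats: List[Vector], txt_feats: List[Vector],
             temperature: float = 0.07) -> float:
    """Symmetric image-text contrastive loss (assumed InfoNCE form).

    Pair i (image i, text i) is the positive; every other pairing in the
    batch acts as a negative.
    """
    imgs = [l2_normalize(v) for v in img_feats]
    txts = [l2_normalize(v) for v in txt_feats]
    n = len(imgs)
    # sim[i][j] = cos(image_i, text_j) / temperature
    sim = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Image-to-text: each image should pick out its own text
    image_to_text = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    # Text-to-image: each text should pick out its own image (column i)
    text_to_image = sum(cross_entropy([sim[j][i] for j in range(n)], i)
                        for i in range(n)) / n
    return (image_to_text + text_to_image) / 2


def ema_update(momentum_params: Vector, online_params: Vector,
               m: float = 0.995) -> Vector:
    """Momentum-encoder update: theta_m <- m * theta_m + (1 - m) * theta."""
    return [m * pm + (1.0 - m) * p for pm, p in zip(momentum_params, online_params)]


if __name__ == "__main__":
    aligned = itc_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    swapped = itc_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(f"aligned={aligned:.6f} swapped={swapped:.4f}")
    print(ema_update([0.0, 1.0], [1.0, 1.0], m=0.9))
```

Matching image-text pairs give a near-zero loss, while swapping the texts makes every positive pair dissimilar and the loss large. In an actual model the vectors would come from the BERT and ViT encoders and the loss would be minimized by gradient descent; the soft-target distillation term that uses the momentum encoder's predictions is omitted from this sketch.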

CLC number: TP391

Citation: Wenbin HU, Tianxiang CAI, Tianle HAN, Zhaoman ZHONG, Changxia MA. Multimodal sarcasm detection model integrating contrastive learning with sentiment analysis[J]. Journal of Computer Applications, 2025, 45(5): 1432-1438.


References

[1] LIU H, WEI R, TU G, et al. Sarcasm driven by sentiment: a sentiment-aware hierarchical fusion network for multimodal sarcasm detection[J]. Information Fusion, 2024, 108: No. 102353.
[2] GONZÁLEZ-IBÁÑEZ R, MURESAN S, WACHOLDER N. Identifying sarcasm in Twitter: a closer look[C]// Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2011: 581-586.
[3] LUNANDO E, PURWARIANTI A. Indonesian social media sentiment analysis with sarcasm detection[C]// Proceedings of the 2013 International Conference on Advanced Computer Science and Information Systems. Piscataway: IEEE, 2013: 195-198.
[4] JOSHI A, SHARMA V, BHATTACHARYYA P. Harnessing context incongruity for sarcasm detection[C]// Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Stroudsburg: ACL, 2015: 757-762.
[5] CAI Y, CAI H, WAN X. Multi-modal sarcasm detection in twitter with hierarchical fusion model[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 2506-2515.
[6] SAVINI E, CARAGEA C. Intermediate-task transfer learning with BERT for sarcasm detection[J]. Mathematics, 2022, 10(5): No. 844.
[7] VITMAN O, KOSTIUK Y, SIDOROV G, et al. Sarcasm detection framework using context, emotion and sentiment features[J]. Expert Systems with Applications, 2023, 234: No. 121068.
[8] ILIĆ S, MARRESE-TAYLOR E, BALAZS J A, et al. Deep contextualized word representations for detecting sarcasm and irony[C]// Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. Stroudsburg: ACL, 2018: 2-7.
[9] MAJUMDER N, PORIA S, PENG H, et al. Sentiment and sarcasm classification with multitask learning[J]. IEEE Intelligent Systems, 2019, 34(3): 38-43.
[10] RAZALI M S, HALIN A A, NOROWI N M, et al. The importance of multimodality in sarcasm detection for sentiment analysis[C]// Proceedings of the 2017 IEEE 15th Student Conference on Research and Development. Piscataway: IEEE, 2017: 56-60.
[11] BOUAZIZI M, OHTSUKI T. Sarcasm detection in Twitter: "all your products are incredibly amazing!!!" - are they really?[C]// Proceedings of the 2015 IEEE Global Communications Conference. Piscataway: IEEE, 2015: 1-6.
[12] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[EB/OL]. [2024-06-02]. .
[13] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]// Proceedings of the 38th International Conference on Machine Learning. New York: PMLR, 2021: 8748-8763.
[14] KIM W, SON B, KIM I. ViLT: vision-and-language Transformer without convolution or region supervision[C]// Proceedings of the 38th International Conference on Machine Learning. New York: PMLR, 2021: 5583-5594.
[15] LI J, SELVARAJU R R, GOTMARE A D, et al. Align before fuse: vision and language representation learning with momentum distillation[C]// Proceedings of the 35th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2021: 9694-9705.
[16] BAO H, WANG W, DONG L, et al. VLMo: unified vision-language pre-training with mixture-of-modality-experts[C]// Proceedings of the 36th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2022: 32897-32912.
[17] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg: ACL, 2019: 4171-4186.
[18] DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2009: 248-255.
[19] JIA M, XIE C, JING L. Debiasing multimodal sarcasm detection with contrastive learning[EB/OL]. [2024-07-04]. .
[20] LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[C]// Proceedings of the 20th Chinese National Conference on Computational Linguistics. Beijing: Chinese Information Processing Society of China, 2021: 1218-1227.
[21] DEMSZKY D, MOVSHOVITZ-ATTIAS D, KO J, et al. GoEmotions: a dataset of fine-grained emotions[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 4040-4054.
[22] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
[23] ZHANG S, ZHENG D, HU X, et al. Bidirectional long short-term memory networks for relation classification[C]// Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation. Stroudsburg: ACL, 2015: 73-78.
[24] PAN H, LIN Z, FU P, et al. Modeling intra and inter-modality incongruity for multi-modal sarcasm detection[C]// Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg: ACL, 2020: 1383-1392.
[25] LIANG B, LOU C, LI X, et al. Multi-modal sarcasm detection with interactive in-modal and cross-modal graphs[C]// Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021: 4707-4715.
[26] XU N, ZENG Z, MAO W. Reasoning with multimodal sarcastic tweets via modeling cross-modality contrast and semantic association[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 3777-3786.
[27] YU J, JIANG J. Adapting BERT for target-oriented multimodal sentiment classification[C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann Publishers Inc., 2019: 5408-5414.

Number of samples per split
Sample set	Sarcastic	Non-sarcastic	Total
Total	10,647	13,988	24,635
Training set	8,642	11,174	19,816
Validation set	1,000	1,410	2,410
Test set	1,005	1,404	2,409
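The split counts are internally consistent, and the class ratio gives a useful floor for reading the results that follow: a trivial classifier that always predicts "non-sarcastic" would already reach about 0.583 accuracy on the test set. The small check below uses only the numbers in the table; the function names are illustrative, not part of the paper.

```python
# Sample counts copied from the dataset table: (sarcastic, non-sarcastic)
SPLITS = {
    "train": (8642, 11174),
    "validation": (1000, 1410),
    "test": (1005, 1404),
}
TOTAL = (10647, 13988)


def totals_match(splits: dict, total: tuple) -> bool:
    """Check that the per-split counts add up to the reported overall counts."""
    sarcastic = sum(s for s, _ in splits.values())
    non_sarcastic = sum(n for _, n in splits.values())
    return (sarcastic, non_sarcastic) == total


def majority_class_accuracy(sarcastic: int, non_sarcastic: int) -> float:
    """Accuracy of a trivial classifier that always predicts the larger class."""
    return max(sarcastic, non_sarcastic) / (sarcastic + non_sarcastic)


if __name__ == "__main__":
    print(totals_match(SPLITS, TOTAL))
    for name, (s, n) in SPLITS.items():
        print(f"{name}: sarcastic share {s / (s + n):.3f}, "
              f"majority baseline {majority_class_accuracy(s, n):.4f}")
```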

Modality	Model	Accuracy	Precision	Recall	F1
Image	ResNet	0.7028	0.7166	0.6440	0.6783
Image	ViT	0.7326	0.6764	0.6885	0.6824
Text	BiLSTM	0.7355	0.6924	0.6587	0.6751
Text	BERT	0.7936	0.7495	0.7592	0.7543
Multimodal (image + text)	TextCNN-ResNet	0.7754	0.7208	0.7092	0.7150
Multimodal (image + text)	BERT-LSTM-ResNet	0.7560	0.7024	0.7150	0.7086
Multimodal (image + text)	VLMo-base	0.8373	0.8020	0.8100	0.8059
Multimodal (image + text)	Res-BERT	0.8242	0.7687	0.8077	0.7877
Multimodal (image + text)	ALBEF	0.8298	0.7997	0.7870	0.7933
Multimodal (image + text)	InCrossMGs	0.8321	0.7324	0.8217	0.7735
Multimodal (image + text)	HFM	0.8344	0.7657	0.7986	0.8018
Multimodal (image + text)	D&R Net	0.8402	0.7797	0.8342	0.8060
Multimodal (image + text)	MSDCS	0.8558	0.8108	0.8338	0.8221
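The F1 column is the harmonic mean of precision and recall, and the gains over D&R Net quoted in the abstract appear to be relative rather than absolute: MSDCS is 1.56 points higher in accuracy and 1.61 points higher in F1, which corresponds to roughly 1.86% and 2.00% relative improvement (1.85% and 1.99% when truncated to two decimals). A minimal check using the table values; the helper names are illustrative.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def relative_gain_percent(new: float, old: float) -> float:
    """Improvement of `new` over `old`, as a percentage of `old`."""
    return (new - old) / old * 100


# Values copied from the results table
MSDCS = {"accuracy": 0.8558, "precision": 0.8108, "recall": 0.8338, "f1": 0.8221}
DR_NET = {"accuracy": 0.8402, "precision": 0.7797, "recall": 0.8342, "f1": 0.8060}

if __name__ == "__main__":
    for name, row in (("MSDCS", MSDCS), ("D&R Net", DR_NET)):
        print(name, "recomputed F1:", round(f1(row["precision"], row["recall"]), 4))
    for metric in ("accuracy", "f1"):
        absolute = (MSDCS[metric] - DR_NET[metric]) * 100
        relative = relative_gain_percent(MSDCS[metric], DR_NET[metric])
        print(f"{metric}: +{absolute:.2f} points, +{relative:.2f}% relative")
```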

Setting	Accuracy	Precision	Recall	F1
MSDCS	0.8558	0.8108	0.8338	0.8221
w/o-S	0.8373	0.8020	0.8100	0.8059
w/o-E	0.8268	0.8084	0.7980	0.8032
w/o-S-E	0.8173	0.7938	0.7851	0.7894
w/o-M	0.8019	0.7776	0.7643	0.7709
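Reading the ablation table as differences from the full model makes the contribution of each removed component easier to compare. The components behind the labels w/o-S, w/o-E, w/o-S-E and w/o-M are not defined in this excerpt, so the sketch keeps the labels as given and simply subtracts each variant's accuracy from that of MSDCS, reporting the drop in percentage points.

```python
# Accuracy values copied from the ablation table
FULL_MODEL_ACCURACY = 0.8558
ABLATION_ACCURACY = {
    "w/o-S": 0.8373,
    "w/o-E": 0.8268,
    "w/o-S-E": 0.8173,
    "w/o-M": 0.8019,
}


def accuracy_drops(full: float, variants: dict) -> dict:
    """Accuracy lost by each ablated variant, in percentage points."""
    return {name: round((full - acc) * 100, 2) for name, acc in variants.items()}


if __name__ == "__main__":
    drops = accuracy_drops(FULL_MODEL_ACCURACY, ABLATION_ACCURACY)
    for name, drop in sorted(drops.items(), key=lambda kv: kv[1]):
        print(f"{name}: -{drop:.2f} points")
```

The drops are 1.85, 2.90, 3.85 and 5.39 points respectively, so removing both S and E costs more than removing either alone, and w/o-M shows the largest loss.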


Dataset	Model	Accuracy	Macro-F1
Twitter-15	MSDCS	0.7898	0.7708
Twitter-15	TomBERT	0.7618	0.7127
Twitter-17	MSDCS	0.7198	0.6873
Twitter-17	TomBERT	0.7050	0.6804
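Macro-F1 averages the per-class F1 scores without weighting by class frequency, so a model that neglects a minority class is penalized even when its accuracy is high; this is why the two columns above can diverge. The sketch assumes the three-way positive/neutral/negative labelling used with TomBERT (reference [27]) on Twitter-15 and Twitter-17; the function name and toy labels are illustrative.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gold = ["positive", "positive", "neutral", "negative"]
    pred = ["positive", "neutral", "neutral", "negative"]
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    print(f"accuracy={accuracy:.4f} macro-F1={macro_f1(gold, pred):.4f}")
```

In practice, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity.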
