Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (5): 1432-1438. DOI: 10.11772/j.issn.1001-9081.2024050731
• 2024 China Granular Computing and Knowledge Discovery Conference (CGCKD2024) •
Wenbin HU, Tianxiang CAI, Tianle HAN, Zhaoman ZHONG, Changxia MA
Received: 2024-06-03
Revised: 2024-07-02
Accepted: 2024-07-05
Online: 2024-07-25
Published: 2025-05-10
Contact: Tianxiang CAI
About author: HU Wenbin, born in 1976, Ph. D., associate professor, CCF member, native of Lianyungang, Jiangsu. Her research interests include personal privacy protection, social network analysis, and pattern recognition.
Wenbin HU, Tianxiang CAI, Tianle HAN, Zhaoman ZHONG, Changxia MA. Multimodal sarcasm detection model integrating contrastive learning with sentiment analysis[J]. Journal of Computer Applications, 2025, 45(5): 1432-1438.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024050731
| Sample set | Sarcastic | Non-sarcastic | Total |
|---|---|---|---|
| Overall | 10 647 | 13 988 | 24 635 |
| Training set | 8 642 | 11 174 | 19 816 |
| Validation set | 1 000 | 1 410 | 2 410 |
| Test set | 1 005 | 1 404 | 2 409 |
Tab. 1 Twitter dataset used in experiments
| Modality | Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| Image | ResNet | 0.702 8 | 0.716 6 | 0.644 0 | 0.678 3 |
| Image | ViT | 0.732 6 | 0.676 4 | 0.688 5 | 0.682 4 |
| Text | BiLSTM | 0.735 5 | 0.692 4 | 0.658 7 | 0.675 1 |
| Text | BERT | 0.793 6 | 0.749 5 | 0.759 2 | 0.754 3 |
| Multimodal (image + text) | TextCNN-ResNet | 0.775 4 | 0.720 8 | 0.709 2 | 0.715 0 |
| Multimodal (image + text) | BERT-LSTM-ResNet | 0.756 0 | 0.702 4 | 0.715 0 | 0.708 6 |
| Multimodal (image + text) | VLMo-base | 0.837 3 | 0.802 0 | 0.810 0 | 0.805 9 |
| Multimodal (image + text) | Res-BERT | 0.824 2 | 0.768 7 | 0.807 7 | 0.787 7 |
| Multimodal (image + text) | ALBEF | 0.829 8 | 0.799 7 | 0.787 0 | 0.793 3 |
| Multimodal (image + text) | InCrossMGs | 0.832 1 | 0.732 4 | 0.821 7 | 0.773 5 |
| Multimodal (image + text) | HFM | 0.834 4 | 0.765 7 | 0.798 6 | 0.801 8 |
| Multimodal (image + text) | D&R Net | 0.840 2 | 0.779 7 | 0.834 2 | 0.806 0 |
| Multimodal (image + text) | MSDCS | 0.855 8 | 0.810 8 | 0.833 8 | 0.822 1 |
Tab. 2 Comparison of evaluation metrics for experimental models
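The metrics in the comparison table are the standard binary-classification quantities computed over the confusion matrix, with the sarcastic class as positive. A minimal sketch of how they relate, using illustrative confusion counts (not taken from the paper) chosen only to roughly match MSDCS's reported precision and recall on the 2 409-sample test set:

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # correct positives among predicted positives
    recall = tp / (tp + fn)             # correct positives among actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Illustrative counts only, not the paper's results:
acc, p, r, f1 = binary_metrics(tp=838, fp=196, fn=167, tn=1208)
```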
| Experiment | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| MSDCS | 0.855 8 | 0.810 8 | 0.833 8 | 0.822 1 |
| w/o-S | 0.837 3 | 0.802 0 | 0.810 0 | 0.805 9 |
| w/o-E | 0.826 8 | 0.808 4 | 0.798 0 | 0.803 2 |
| w/o-S-E | 0.817 3 | 0.793 8 | 0.785 1 | 0.789 4 |
| w/o-M | 0.801 9 | 0.777 6 | 0.764 3 | 0.770 9 |
Tab. 3 Ablation experimental results
| Dataset | Model | Accuracy | Macro-F1 |
|---|---|---|---|
| Twitter-15 | MSDCS | 0.789 8 | 0.770 8 |
| Twitter-15 | TomBERT | 0.761 8 | 0.712 7 |
| Twitter-17 | MSDCS | 0.719 8 | 0.687 3 |
| Twitter-17 | TomBERT | 0.705 0 | 0.680 4 |
Tab. 4 Sentiment classification results
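Tab. 4 reports Macro-F1, the unweighted mean of the per-class F1 scores, which weights every sentiment class equally regardless of its frequency. A minimal sketch of the definition (the label names below are a toy example, not data from the paper):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (Macro-F1)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        # One-vs-rest confusion counts for class c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Toy three-class sentiment example:
y_true = ["neg", "neu", "pos", "pos", "neu", "neg"]
y_pred = ["neg", "pos", "pos", "pos", "neu", "neu"]
score = macro_f1(y_true, y_pred)
```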
| 1 | LIU H, WEI R, TU G, et al. Sarcasm driven by sentiment: a sentiment-aware hierarchical fusion network for multimodal sarcasm detection[J]. Information Fusion, 2024, 108: No. 102353. |
| 2 | GONZÁLEZ-IBÁÑEZ R, MURESAN S, WACHOLDER N. Identifying sarcasm in Twitter: a closer look[C]// Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2011: 581-586. |
| 3 | LUNANDO E, PURWARIANTI A. Indonesian social media sentiment analysis with sarcasm detection[C]// Proceedings of the 2013 International Conference on Advanced Computer Science and Information Systems. Piscataway: IEEE, 2013: 195-198. |
| 4 | JOSHI A, SHARMA V, BHATTACHARYYA P. Harnessing context incongruity for sarcasm detection[C]// Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Stroudsburg: ACL, 2015: 757-762. |
| 5 | CAI Y, CAI H, WAN X. Multi-modal sarcasm detection in twitter with hierarchical fusion model[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 2506-2515. |
| 6 | SAVINI E, CARAGEA C. Intermediate-task transfer learning with BERT for sarcasm detection[J]. Mathematics, 2022, 10(5): No. 844. |
| 7 | VITMAN O, KOSTIUK Y, SIDOROV G, et al. Sarcasm detection framework using context, emotion and sentiment features[J]. Expert Systems with Applications, 2023, 234: No. 121068. |
| 8 | ILIĆ S, MARRESE-TAYLOR E, BALAZS J A, et al. Deep contextualized word representations for detecting sarcasm and irony[C]// Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. Stroudsburg: ACL, 2018: 2-7. |
| 9 | MAJUMDER N, PORIA S, PENG H, et al. Sentiment and sarcasm classification with multitask learning[J]. IEEE Intelligent Systems, 2019, 34(3): 38-43. |
| 10 | RAZALI M S, HALIN A A, NOROWI N M, et al. The importance of multimodality in sarcasm detection for sentiment analysis[C]// Proceedings of the 2017 IEEE 15th Student Conference on Research and Development. Piscataway: IEEE, 2017: 56-60. |
| 11 | BOUAZIZI M, OHTSUKI T. Sarcasm detection in Twitter: “all your products are incredibly amazing!!!” — are they really?[C]// Proceedings of the 2015 IEEE Global Communications Conference. Piscataway: IEEE, 2015: 1-6. |
| 12 | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[EB/OL]. [2024-06-02]. |
| 13 | RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]// Proceedings of the 38th International Conference on Machine Learning. New York: PMLR, 2021: 8748-8763. |
| 14 | KIM W, SON B, KIM I. ViLT: vision-and-language Transformer without convolution or region supervision[C]// Proceedings of the 38th International Conference on Machine Learning. New York: PMLR, 2021: 5583-5594. |
| 15 | LI J, SELVARAJU R R, GOTMARE A D, et al. Align before fuse: vision and language representation learning with momentum distillation[C]// Proceedings of the 35th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2021: 9694-9705. |
| 16 | BAO H, WANG W, DONG L, et al. VLMo: unified vision-language pre-training with mixture-of-modality-experts[C]// Proceedings of the 36th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2022: 32897-32912. |
| 17 | DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg: ACL, 2019: 4171-4186. |
| 18 | DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2009: 248-255. |
| 19 | JIA M, XIE C, JING L. Debiasing multimodal sarcasm detection with contrastive learning[EB/OL]. [2024-07-04]. |
| 20 | LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[C]// Proceedings of the 20th Chinese National Conference on Computational Linguistics. Beijing: Chinese Information Processing Society of China, 2021: 1218-1227. |
| 21 | DEMSZKY D, MOVSHOVITZ-ATTIAS D, KO J, et al. GoEmotions: a dataset of fine-grained emotions[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 4040-4054. |
| 22 | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. |
| 23 | ZHANG S, ZHENG D, HU X, et al. Bidirectional long short-term memory networks for relation classification[C]// Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation. Stroudsburg: ACL, 2015: 73-78. |
| 24 | PAN H, LIN Z, FU P, et al. Modeling intra and inter-modality incongruity for multi-modal sarcasm detection[C]// Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg: ACL, 2020: 1383-1392. |
| 25 | LIANG B, LOU C, LI X, et al. Multi-modal sarcasm detection with interactive in-modal and cross-modal graphs[C]// Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021: 4707-4715. |
| 26 | XU N, ZENG Z, MAO W. Reasoning with multimodal sarcastic tweets via modeling cross-modality contrast and semantic association[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 3777-3786. |
| 27 | YU J, JIANG J. Adapting BERT for target-oriented multimodal sentiment classification[C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann Publishers Inc., 2019: 5408-5414. |