| 1 | LIU H, WEI R, TU G, et al. Sarcasm driven by sentiment: a sentiment-aware hierarchical fusion network for multimodal sarcasm detection[J]. Information Fusion, 2024, 108: No. 102353. |
| 2 | GONZÁLEZ-IBÁÑEZ R, MURESAN S, WACHOLDER N. Identifying sarcasm in Twitter: a closer look[C]// Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2011: 581-586. |
| 3 | LUNANDO E, PURWARIANTI A. Indonesian social media sentiment analysis with sarcasm detection[C]// Proceedings of the 2013 International Conference on Advanced Computer Science and Information Systems. Piscataway: IEEE, 2013: 195-198. |
| 4 | JOSHI A, SHARMA V, BHATTACHARYYA P. Harnessing context incongruity for sarcasm detection[C]// Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Stroudsburg: ACL, 2015: 757-762. |
| 5 | CAI Y, CAI H, WAN X. Multi-modal sarcasm detection in Twitter with hierarchical fusion model[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 2506-2515. |
| 6 | SAVINI E, CARAGEA C. Intermediate-task transfer learning with BERT for sarcasm detection[J]. Mathematics, 2022, 10(5): No. 844. |
| 7 | VITMAN O, KOSTIUK Y, SIDOROV G, et al. Sarcasm detection framework using context, emotion and sentiment features[J]. Expert Systems with Applications, 2023, 234: No. 121068. |
| 8 | ILIĆ S, MARRESE-TAYLOR E, BALAZS J A, et al. Deep contextualized word representations for detecting sarcasm and irony[C]// Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. Stroudsburg: ACL, 2018: 2-7. |
| 9 | MAJUMDER N, PORIA S, PENG H, et al. Sentiment and sarcasm classification with multitask learning[J]. IEEE Intelligent Systems, 2019, 34(3): 38-43. |
| 10 | RAZALI M S, HALIN A A, NOROWI N M, et al. The importance of multimodality in sarcasm detection for sentiment analysis[C]// Proceedings of the 2017 IEEE 15th Student Conference on Research and Development. Piscataway: IEEE, 2017: 56-60. |
| 11 | BOUAZIZI M, OHTSUKI T. Sarcasm detection in Twitter: “all your products are incredibly amazing!!!” - are they really?[C]// Proceedings of the 2015 IEEE Global Communications Conference. Piscataway: IEEE, 2015: 1-6. |
| 12 | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[EB/OL]. [2024-06-02]. |
| 13 | RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]// Proceedings of the 38th International Conference on Machine Learning. New York: PMLR, 2021: 8748-8763. |
| 14 | KIM W, SON B, KIM I. ViLT: vision-and-language Transformer without convolution or region supervision[C]// Proceedings of the 38th International Conference on Machine Learning. New York: PMLR, 2021: 5583-5594. |
| 15 | LI J, SELVARAJU R R, GOTMARE A D, et al. Align before fuse: vision and language representation learning with momentum distillation[C]// Proceedings of the 35th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2021: 9694-9705. |
| 16 | BAO H, WANG W, DONG L, et al. VLMo: unified vision-language pre-training with mixture-of-modality-experts[C]// Proceedings of the 36th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2022: 32897-32912. |
| 17 | DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg: ACL, 2019: 4171-4186. |
| 18 | DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2009: 248-255. |
| 19 | JIA M, XIE C, JING L. Debiasing multimodal sarcasm detection with contrastive learning[EB/OL]. [2024-07-04]. |
| 20 | LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach[C]// Proceedings of the 20th Chinese National Conference on Computational Linguistics. Beijing: Chinese Information Processing Society of China, 2021: 1218-1227. |
| 21 | DEMSZKY D, MOVSHOVITZ-ATTIAS D, KO J, et al. GoEmotions: a dataset of fine-grained emotions[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 4040-4054. |
| 22 | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. |
| 23 | ZHANG S, ZHENG D, HU X, et al. Bidirectional long short-term memory networks for relation classification[C]// Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation. Stroudsburg: ACL, 2015: 73-78. |
| 24 | PAN H, LIN Z, FU P, et al. Modeling intra and inter-modality incongruity for multi-modal sarcasm detection[C]// Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg: ACL, 2020: 1383-1392. |
| 25 | LIANG B, LOU C, LI X, et al. Multi-modal sarcasm detection with interactive in-modal and cross-modal graphs[C]// Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021: 4707-4715. |
| 26 | XU N, ZENG Z, MAO W. Reasoning with multimodal sarcastic tweets via modeling cross-modality contrast and semantic association[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 3777-3786. |
| 27 | YU J, JIANG J. Adapting BERT for target-oriented multimodal sentiment classification[C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann Publishers Inc., 2019: 5408-5414. |