Journal of Computer Applications

Multimodal Fact Verification with Cross-Modal Semantic Association

  • Received: 2025-05-14  Revised: 2025-07-14  Accepted: 2025-08-08  Online: 2025-08-22  Published: 2025-08-22
  • Supported by:
    the Fundamental Research Funds for the Central Universities

LIU Huanxian1, WANG Hongtao2, WANG Xian'ao2, WANG Hongmei1, XU Weifeng2

  1. North China Electric Power University
  2. North China Electric Power University (Baoding)
  • Corresponding author: XU Weifeng

Abstract: Multimodal fact verification aims to classify the veracity of claims by leveraging multimodal evidence, overcoming the limitations of traditional unimodal methods on complex claims. However, existing approaches struggle during feature fusion to bridge the semantic gaps between the different modalities of evidence, and between claims and evidence. To address this, a multimodal fact verification method based on cross-modal semantic association is proposed. It achieves cross-level semantic alignment and adaptive feature interaction through a cross-modal attention mechanism, effectively mitigating semantic discrepancies among multi-source information and improving classification performance on complex claims. In the evidence retrieval phase, textual evidence is retrieved for the claim and then used to filter semantically related image evidence, ensuring that the multimodal evidence is highly relevant. In the verification phase, the Contrastive Language-Image Pre-training (CLIP) model semantically aligns the claim text with the multimodal evidence, and a claim-evidence joint attention module strengthens the semantic associations among the claim text, textual evidence, and image evidence. Experimental results on the MOCHEG and CEAD datasets show that the proposed method significantly outperforms existing approaches in accuracy, recall, and F1 score, demonstrating its effectiveness for multimodal fact verification.
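
As an illustration of the verification stage described above, the following is a minimal PyTorch sketch of an evidence-filtering step and a claim-evidence joint attention block operating on CLIP features. All names, the embedding dimension (512), the cosine-similarity threshold, and the three-way label space (supported / refuted / not enough information, as in MOCHEG) are assumptions for illustration, not the paper's exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F

def filter_image_evidence(txt_evid_emb, image_embs, threshold=0.25):
    """Keep candidate images semantically related to the textual evidence.

    txt_evid_emb: (d,)  pooled CLIP text embedding of the retrieved evidence
    image_embs:   (N, d) CLIP image embeddings of candidate images
    The 0.25 cosine-similarity threshold is an assumed hyperparameter.
    """
    sims = F.cosine_similarity(image_embs, txt_evid_emb.unsqueeze(0), dim=-1)
    return image_embs[sims >= threshold]

class ClaimEvidenceJointAttention(nn.Module):
    """Claim attends to textual and image evidence in CLIP's shared space."""

    def __init__(self, dim=512, heads=8, num_classes=3):
        super().__init__()
        self.txt_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Veracity classifier over the claim plus its two
        # evidence-conditioned views.
        self.classifier = nn.Linear(3 * dim, num_classes)

    def forward(self, claim, txt_evid, img_evid):
        # claim:    (B, 1, d)  CLIP embedding of the claim text
        # txt_evid: (B, Nt, d) CLIP embeddings of textual evidence
        # img_evid: (B, Ni, d) CLIP embeddings of image evidence
        c_txt, _ = self.txt_attn(claim, txt_evid, txt_evid)
        c_img, _ = self.img_attn(claim, img_evid, img_evid)
        fused = torch.cat([self.norm(claim), self.norm(c_txt),
                           self.norm(c_img)], dim=-1)
        return self.classifier(fused.squeeze(1))

# Toy usage with random tensors standing in for real CLIP features.
model = ClaimEvidenceJointAttention()
logits = model(torch.randn(2, 1, 512),   # claim
               torch.randn(2, 5, 512),   # textual evidence
               torch.randn(2, 3, 512))   # image evidence
print(logits.shape)  # torch.Size([2, 3])

Attention masking for variable evidence counts and any alignment losses used during training are omitted; the sketch only shows how the claim can adaptively attend to each evidence modality in CLIP's shared embedding space.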

Key words: multimodal fact verification, semantic association, attention mechanism, Contrastive Language–Image Pretraining (CLIP), feature fusion

