Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (4): 1069-1076. DOI: 10.11772/j.issn.1001-9081.2025050526

• Artificial Intelligence •


Multimodal fact verification with cross-modal semantic association

Huanxian LIU1, Hongtao WANG1,2, Xian’ao WANG1, Hongmei WANG1, Weifeng XU1,3()   

  1. Department of Computer, North China Electric Power University, Baoding, Hebei 071003, China
    2. Engineering Research Center of Intelligent Computing for Complex Energy Systems, Ministry of Education (North China Electric Power University), Baoding, Hebei 071003, China
    3. Hebei Key Laboratory of Knowledge Computing for Energy and Power (North China Electric Power University), Baoding, Hebei 071003, China
  • Received:2025-05-14 Revised:2025-07-14 Accepted:2025-08-08 Online:2025-08-22 Published:2026-04-10
  • Contact: Weifeng XU
  • About author:LIU Huanxian, born in 2001, M. S. candidate. Her research interests include natural language processing.
    WANG Hongtao, born in 1983, Ph. D., associate professor. His research interests include natural language processing, AI security, privacy computing, knowledge computing.
    WANG Xian’ao, born in 2001, M. S. candidate. His research interests include AI security.
    WANG Hongmei, born in 1981, M. S., lecturer. Her research interests include statistical machine learning, probabilistic graph theory, numerical computation.
  • Supported by:
    Fundamental Research Funds for the Central Universities(2023JC006)


Abstract:

To address the semantic differences between different modalities of evidence, and between claims and evidence, that arise during multimodal feature fusion, a Multimodal Fact Verification (MFV) method based on Cross-Modal Semantic Association (CMSA) was proposed to realize cross-level semantic alignment and adaptive feature interaction, bridging the semantic gaps among multi-source information and improving classification performance on complex claim verification. In the evidence retrieval stage, relevant textual evidence was retrieved with the textual claim, and semantically related image evidence was then filtered with the retrieved textual evidence, so as to ensure high relevance of the multimodal evidence. In the claim verification stage, semantic alignment between text and multimodal evidence was achieved with the CLIP (Contrastive Language-Image Pretraining) model, and a Linked Claim and Evidence Attention (LCEA) module was designed to further strengthen the semantic associations among the textual claim, the textual evidence, and the image evidence. Experimental results show that CMSA improves the F1 score over the MOCHEG model by at least 7.27% and 6.65% on the public dataset and the self-constructed CEAD (Cross-modal Evidence Augmented Dataset), respectively, demonstrating its effectiveness in MFV tasks.
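The two-stage retrieval described in the abstract (claim → textual evidence → image evidence) can be sketched as similarity-based ranking over embedding vectors. This is only an illustrative sketch: the function names, the cosine-similarity ranking, and the top-k cutoffs are assumptions for exposition, not the paper's actual retrieval implementation.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between each row of a and each row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def retrieve_evidence(claim_vec, text_vecs, image_vecs, k_text=2, k_image=1):
    """Two-stage retrieval: the claim selects top-k textual evidence,
    then the selected textual evidence filters the image evidence."""
    # Stage 1: rank textual evidence by similarity to the claim.
    text_scores = cosine_sim(claim_vec[None, :], text_vecs)[0]
    top_text = np.argsort(text_scores)[::-1][:k_text]
    # Stage 2: rank images by their best similarity to any selected text evidence.
    img_scores = cosine_sim(text_vecs[top_text], image_vecs).max(axis=0)
    top_img = np.argsort(img_scores)[::-1][:k_image]
    return top_text, top_img
```

In this toy setting the embeddings would come from a pretrained encoder (e.g., CLIP's text and image towers); here plain vectors stand in for them.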

Key words: Multimodal Fact Verification (MFV), semantic association, attention mechanism, CLIP (Contrastive Language-Image Pretraining) model, feature fusion
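The abstract names an LCEA module but does not give its internals here. As a rough illustration of joint claim-evidence attention, a generic scaled dot-product cross-attention in which the claim attends over the pooled text and image evidence, and vice versa, might look like the following; the fusion scheme (mean-pooling plus concatenation) is an assumption, not the paper's design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, keys, values):
    """Scaled dot-product attention: query rows attend over keys/values."""
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)      # (n_query, n_key)
    return softmax(scores, axis=-1) @ values  # (n_query, d)

def joint_claim_evidence_attention(claim, text_ev, image_ev):
    """Claim features attend over concatenated text and image evidence
    features, and the evidence attends back over the claim."""
    evidence = np.concatenate([text_ev, image_ev], axis=0)
    claim_ctx = cross_attention(claim, evidence, evidence)  # evidence-aware claim
    ev_ctx = cross_attention(evidence, claim, claim)        # claim-aware evidence
    # Illustrative fusion: mean-pool each stream and concatenate.
    return np.concatenate([claim_ctx.mean(axis=0), ev_ctx.mean(axis=0)])
```

In practice such a module would use learned projection matrices and multiple heads; this sketch keeps only the attention pattern among the three inputs.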
