Journal of Computer Applications (《计算机应用》): sole official website


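Multimodal Named Entity Recognition Under Causal Intervention

Meng Jiana (孟佳娜), Bai Chenhao (白晨皓), Zhao Di (赵迪), Wang Bolin (王博林), Gao Linlin (高临霖)

  1. Dalian Minzu University
  • Received: 2024-11-29 Revised: 2025-03-25 Online: 2025-04-08 Published: 2025-04-08
  • Corresponding author: Zhao Di (赵迪)
  • Funding:
    Research on Intelligent Internet Dissemination of Chinese Culture Based on Knowledge Graphs; Research on Public Opinion Guidance by Government Microblogs for Online Rumors in Public Emergencies; Research on Identifying Public Sentiment Risk in Government Microblogs from the Perspective of Artificial Intelligence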

Abstract: As artificial intelligence technology advances, multimodal named entity recognition, which identifies entities with specific meanings in paired text and image data, has become an active research topic. Existing methods, however, still handle two problems poorly: data bias and the modality gap. Data bias misleads the attention module into focusing on spurious correlations in the training data, which harms the model's generalization ability. The modality gap hinders correct semantic alignment between text and images, which degrades model performance. To address both problems, this paper proposes a multimodal named entity recognition method under causal intervention (CMNER). Drawing on causal intervention theory, the method applies backdoor intervention in the text modality to handle observable confounders and front-door intervention in the image modality to handle confounders that cannot be observed directly, thereby mitigating the harmful effects of data bias. It also draws on mutual information theory to narrow the semantic "distance" between text and images. Experimental results show that the proposed method effectively alleviates data bias and the modality gap and improves performance on the multimodal named entity recognition task.

Key words: multimodal named entity recognition, causal intervention, mutual information, data bias, modality gap
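
The two interventions named in the abstract can be illustrated on a toy discrete model. The sketch below is not the CMNER implementation. It only assumes a hypothetical structural causal model with a confounder U, an input X, a mediator M and an outcome Y, with made-up probabilities. It compares plain conditioning with backdoor adjustment, which needs U to be observed, and front-door adjustment, which uses only X, M and Y.

```python
from itertools import product

# Hypothetical toy structural causal model (all numbers are made up):
#   U -> X, U -> Y   (U is the confounder)
#   X -> M -> Y      (M is the mediator used by the front-door criterion)
P_U = {0: 0.6, 1: 0.4}
P_X1_GIVEN_U = {0: 0.2, 1: 0.8}   # P(X=1 | U=u)
P_M1_GIVEN_X = {0: 0.3, 1: 0.9}   # P(M=1 | X=x)
P_Y1_GIVEN_MU = {(0, 0): 0.1, (0, 1): 0.5,
                 (1, 0): 0.6, (1, 1): 0.9}   # P(Y=1 | M=m, U=u)


def bern(p_one, value):
    """Probability that a Bernoulli variable with P(1)=p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


def joint(u, x, m, y):
    """Observational joint P(u, x, m, y) implied by the toy model."""
    return (P_U[u] * bern(P_X1_GIVEN_U[u], x)
            * bern(P_M1_GIVEN_X[x], m) * bern(P_Y1_GIVEN_MU[(m, u)], y))


def prob(**fixed):
    """Marginal probability of the assignment in `fixed`, e.g. prob(x=1, y=1)."""
    total = 0.0
    for u, x, m, y in product((0, 1), repeat=4):
        values = {"u": u, "x": x, "m": m, "y": y}
        if all(values[k] == v for k, v in fixed.items()):
            total += joint(u, x, m, y)
    return total


def true_do(x):
    """Ground truth P(Y=1 | do(X=x)), computed by cutting the U -> X edge."""
    return sum(P_U[u] * bern(P_M1_GIVEN_X[x], m) * P_Y1_GIVEN_MU[(m, u)]
               for u, m in product((0, 1), repeat=2))


def naive(x):
    """Plain conditioning P(Y=1 | X=x), which is biased by the confounder."""
    return prob(y=1, x=x) / prob(x=x)


def backdoor(x):
    """Backdoor adjustment, usable when the confounder U is observed:
    P(Y=1 | do(x)) = sum_u P(Y=1 | x, u) P(u)."""
    return sum(prob(y=1, x=x, u=u) / prob(x=x, u=u) * prob(u=u)
               for u in (0, 1))


def frontdoor(x):
    """Front-door adjustment, which never conditions on U:
    P(Y=1 | do(x)) = sum_m P(m | x) sum_x' P(Y=1 | m, x') P(x')."""
    total = 0.0
    for m in (0, 1):
        p_m_given_x = prob(m=m, x=x) / prob(x=x)
        inner = sum(prob(y=1, m=m, x=xp) / prob(m=m, x=xp) * prob(x=xp)
                    for xp in (0, 1))
        total += p_m_given_x * inner
    return total


if __name__ == "__main__":
    for x in (0, 1):
        print(x, naive(x), backdoor(x), frontdoor(x), true_do(x))
```

In this toy setting both adjustments recover the interventional quantity, while plain conditioning does not. That is the sense in which causal intervention removes bias introduced by a confounder. How the paper instantiates the confounders and the mediator for text and images is not specified in the abstract.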

CLC number: