Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (12): 3796-3803. DOI: 10.11772/j.issn.1001-9081.2024111681

• Artificial intelligence •

Multimodal named entity recognition under causal intervention

Jiana MENG, Chenhao BAI, Di ZHAO, Bolin WANG, Linlin GAO   

  1. Computer Science and Engineering College, Dalian Minzu University, Dalian Liaoning 116600, China
  • Received: 2024-12-02 Revised: 2025-03-24 Accepted: 2025-04-01 Online: 2025-04-08 Published: 2025-12-10
  • Contact: Di ZHAO
  • About author: MENG Jiana, born in 1972, Ph. D., professor. Her research interests include machine learning, text mining.
    BAI Chenhao, born in 1998, M. S. candidate. His research interests include multimodal named entity recognition.
    ZHAO Di, born in 1991, Ph. D., lecturer. His research interests include data mining, natural language processing.
    WANG Bolin, born in 1993, Ph. D., lecturer. Her research interests include text representation, biomedical text mining, semantic similarity calculation.
    GAO Linlin, born in 2001, M. S. candidate. His research interests include multimodal fake news detection.
  • Supported by:
    Humanities and Social Sciences Research Planning Fund of Ministry of Education (23YJA860010); Social Sciences Planning Fund of Liaoning Province in 2024 (L24BTQ002); Research Project of Dalian Academy of Social Sciences in 2024 (2024dlsky024)

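Multimodal named entity recognition under causal intervention

MENG Jiana, BAI Chenhao, ZHAO Di, WANG Bolin, GAO Linlin   

  1. College of Computer Science and Engineering, Dalian Minzu University, Dalian, Liaoning 116600
  • Corresponding author: ZHAO Di
  • About the authors: MENG Jiana (born in 1972), female, from Siping, Jilin; professor, Ph. D., CCF member. Main research interests: machine learning, text mining.
    BAI Chenhao (born in 1998), male, from Xingtai, Hebei; M. S. candidate. Main research interests: multimodal named entity recognition.
    ZHAO Di (born in 1991), male, from Siping, Jilin; lecturer, Ph. D., CCF member. Main research interests: data mining, natural language processing.
    WANG Bolin (born in 1993), female, from Tieling, Liaoning; lecturer, Ph. D., CCF member. Main research interests: text representation, biomedical text mining, semantic similarity calculation.
    GAO Linlin (born in 2001), male, from Heze, Shandong; M. S. candidate. Main research interests: multimodal fake news detection.
  • Funding:
    Humanities and Social Sciences Research Planning Fund of the Ministry of Education (23YJA860010); Social Sciences Planning Fund of Liaoning Province in 2024 (L24BTQ002); Research Project of Dalian Academy of Social Sciences in 2024 (2024dlsky024)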

Abstract:

The Multimodal Named Entity Recognition (MNER) task aims to recognize entities with specific meanings from joint text and image data. However, current methods handle two problems poorly: data bias and the modality gap. Data bias misleads the attention module into focusing on spurious correlations in the training data, which damages the generalization ability of the model; the modality gap hinders correct semantic alignment between text and image, which degrades the performance of the model. To solve these two problems, a method of Multimodal Named Entity Recognition under Causal intervention (CMNER) was proposed. Based on causal intervention theory, the method applies backdoor intervention in the text modality to handle observable confounding factors, and frontdoor intervention in the image modality to handle confounding factors that cannot be observed directly, so as to mitigate the harmful effects of data bias. At the same time, Mutual Information (MI) theory was used to shorten the semantic “distance” between text and image. The entity recognition performance of the proposed method was evaluated on multimodal data. Experimental results on the Twitter-2015 and Twitter-2017 datasets show that CMNER reaches F1-scores of 76.00% and 88.60%, respectively, which are 0.58 and 0.53 percentage points higher than those of the second-best method, achieving the best results in the comparison. These results indicate that CMNER can effectively alleviate data bias and reduce the modality gap, thereby improving performance on MNER tasks.
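
The abstract names backdoor intervention, frontdoor intervention, and mutual information without stating formulas. For orientation, the standard textbook forms of these three quantities are sketched below. The symbols are assumptions made for illustration and do not come from the paper: X is the input, Y the predicted label, Z an observable confounder, M a mediator between X and Y, and T and V are discrete text and image representations. How CMNER instantiates or approximates these quantities inside its network is not specified in the abstract.

```latex
% Backdoor adjustment: the confounder Z is observable
% (the setting the abstract associates with the text modality).
P(Y \mid do(X)) = \sum_{z} P(Y \mid X, Z = z)\, P(Z = z)

% Frontdoor adjustment: the confounder is unobserved, but a mediator M is available
% (the setting the abstract associates with the image modality).
P(Y \mid do(X)) = \sum_{m} P(M = m \mid X) \sum_{x'} P(Y \mid M = m, X = x')\, P(X = x')

% Mutual information between text representation T and image representation V;
% a larger value corresponds to a smaller semantic "distance" between the modalities.
I(T; V) = \sum_{t, v} p(t, v) \log \frac{p(t, v)}{p(t)\, p(v)}
```

In both adjustment formulas the sum replaces ordinary conditioning on X, which is what removes the spurious path through the confounder.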

Key words: Multimodal Named Entity Recognition (MNER), causal intervention, Mutual Information (MI), data bias, modality gap

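Abstract:

The Multimodal Named Entity Recognition (MNER) task aims to identify entities with specific meanings from joint text and image data; however, current methods fall short in handling two problems, data bias and the modality gap. Data bias introduces harmful biases that mislead the attention module into attending to spurious correlations in the training data, which harms the generalization ability of the model; the modality gap hinders the establishment of correct semantic alignment between text and image, which affects the performance of the model. To solve these two problems, a method of Multimodal Named Entity Recognition under Causal intervention (CMNER) is proposed. Using causal intervention theory, the method applies backdoor intervention in the text modality to handle observable confounding factors and frontdoor causal intervention in the image modality to handle confounding factors that cannot be observed directly, thereby mitigating the harmful effects of data bias; at the same time, it draws on Mutual Information (MI) theory to shorten the semantic “distance” between text and image. The entity recognition performance of the proposed method is verified in the multimodal domain. Experimental results on the Twitter-2015 and Twitter-2017 datasets show that the F1-scores of CMNER reach 76.00% and 88.60%, respectively, which are 0.58 and 0.53 percentage points higher than those of the second-best method, reaching the best level. It can be seen that CMNER can effectively alleviate data bias and narrow the modality gap, thereby improving performance on the MNER task.

Key words: multimodal named entity recognition, causal intervention, mutual information, data bias, modality gap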

CLC Number: