Journal of Computer Applications official website ›› 2026, Vol. 46 ›› Issue (4): 1077-1085. DOI: 10.11772/j.issn.1001-9081.2025050563

• Artificial Intelligence •


Multimodal event extraction based on text-image dual-channel feature gated fusion mechanism

Delong WANG1,2, Haoyi WANG1,2(), Qingchuan ZHANG1,2, Zexi SONG1,2   

  1. National Engineering Research Centre for Agri-product Quality Traceability, Beijing Technology and Business University, Beijing 100048, China
    2. School of Computer and Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China
  • Received: 2025-05-26 Revised: 2025-08-07 Accepted: 2025-08-08 Online: 2025-08-15 Published: 2026-04-10
  • Contact: Haoyi WANG
  • About author:WANG Delong, born in 2000, M. S. candidate. His research interests include natural language processing, deep learning.
    ZHANG Qingchuan, born in 1982, Ph. D., associate professor. His research interests include natural language processing, information extraction.
    SONG Zexi, born in 2001, M. S. candidate. His research interests include natural language processing, deep learning.
  • Supported by:
    Technology Project of the State Administration for Market Regulation(2023MK169);Project of Construction and Support for High-level Innovative Teams of Beijing Municipal Institutions(BPHR20220104);Beijing Scholars Program(099)


Abstract:

To improve the alignment accuracy and fusion efficiency between features of different modalities in multimodal event extraction, and to enhance the model's understanding of the semantic relationship between images and text, a multimodal event extraction model based on a dual-channel "text-image" feature gated fusion mechanism, named MEE-DF (Multimodal Event Extraction based on Dual-channel Fusion), was proposed. Firstly, an image-to-text description channel was added to mine event arguments implicit in images and enrich the information representation for event extraction. Secondly, a Locality Constrained Cross Attention (LCCA) mechanism was built to generate geometric alignment graphs for embedding image information and extracting highly discriminative image features. Thirdly, an adversarial gating mechanism based on interactive attention maps was constructed to achieve fine-grained alignment between text entities and image objects. Finally, a dual-channel feature fusion strategy was used to select important patch features, remove redundant information, and improve feature integration efficiency. Experimental results on the MEED and M2E2 public datasets show that MEE-DF achieves F1 scores of 90.9% and 88.8%, respectively, on the event type detection task, and 73.3% and 68.1%, respectively, on the Event Argument Extraction (EAE) task, outperforming existing event extraction models. Ablation experiments further demonstrate that each module of MEE-DF contributes significantly to the improvement of event extraction performance.
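As a rough illustration of the gated-fusion idea described above (not the authors' implementation — all function names, weights, and dimensions here are hypothetical), a sigmoid gate can blend a text feature vector and an image feature vector per dimension:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(text_feat, image_feat, w_gate, b_gate):
    """Fuse two modality vectors with a learned sigmoid gate.

    The gate g in [0, 1] decides, per dimension, how much of the
    text feature passes through relative to the image feature,
    so the output is a convex combination of the two inputs.
    """
    combined = np.concatenate([text_feat, image_feat])  # shape (2d,)
    g = sigmoid(w_gate @ combined + b_gate)             # gate vector, shape (d,)
    return g * text_feat + (1.0 - g) * image_feat

# Toy example with random features and untrained gate weights.
rng = np.random.default_rng(0)
d = 8
text_feat = rng.standard_normal(d)
image_feat = rng.standard_normal(d)
w_gate = rng.standard_normal((d, 2 * d)) * 0.1
b_gate = np.zeros(d)

fused = gated_fusion(text_feat, image_feat, w_gate, b_gate)
print(fused.shape)  # (8,)
```

In a trained model, `w_gate` and `b_gate` would be learned end-to-end so the gate suppresses the less informative modality per dimension; the sketch only shows the fusion arithmetic, not the LCCA or adversarial components.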

Key words: text description, geometric alignment graph, interactive attention map, feature alignment, adversarial learning

CLC number: