Multimodal event extraction based on text-image dual-channel feature gated fusion mechanism
Delong WANG, Haoyi WANG, Qingchuan ZHANG, Zexi SONG
Journal of Computer Applications    2026, 46 (4): 1077-1085.   DOI: 10.11772/j.issn.1001-9081.2025050563

To improve the alignment accuracy and fusion efficiency between features of different modalities in multimodal event extraction, and to strengthen the model's understanding of the semantic relationships between images and text, a multimodal event extraction model based on a dual-channel "text-image" feature gated fusion mechanism, named MEE-DF (Multimodal Event Extraction based on Dual-channel Fusion), was proposed. Firstly, a channel that generates text descriptions from images was added, so that event arguments implicit in the images were mined and the information representation for event extraction was enriched. Secondly, a Locality Constrained Cross Attention (LCCA) mechanism was built, in which geometric alignment graphs were generated to embed image information and highly discriminative image features were extracted. Thirdly, an adversarial gating mechanism based on interactive attention maps was constructed to achieve fine-grained alignment of text entities and image objects. Finally, a dual-channel feature fusion strategy was used to filter important patch features, remove redundant information, and improve feature integration efficiency. Experimental results on the public MEED and M2E2 datasets show that MEE-DF achieves F1 values of 90.9% and 88.8%, respectively, on the event type detection task, and F1 values of 73.3% and 68.1%, respectively, on the Event Argument Extraction (EAE) task. These results indicate that MEE-DF outperforms existing event extraction models. Ablation experiments further demonstrate that each module of the proposed model contributes significantly to the improvement in event extraction performance.
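
For readers unfamiliar with gated fusion, the general idea behind combining a text feature and an image feature through a learned gate can be sketched as follows. This is a generic, minimal illustration and not the MEE-DF implementation: the abstract does not specify the gate's form, so the function name `gated_fusion`, the per-dimension weights `w_text` and `w_image`, and the `bias` term are all assumptions made for this example. A sigmoid gate decides, per dimension, how much of the text feature versus the image feature is kept.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_vec, image_vec, w_text, w_image, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    Hypothetical simplification: each dimension has its own scalar
    weights (a diagonal gate) rather than full weight matrices.
    gate  = sigmoid(w_text * t + w_image * v + bias)
    fused = gate * t + (1 - gate) * v
    """
    if not (len(text_vec) == len(image_vec) == len(w_text)
            == len(w_image) == len(bias)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for t, v, wt, wv, b in zip(text_vec, image_vec, w_text, w_image, bias):
        gate = sigmoid(wt * t + wv * v + b)
        # A gate near 1 keeps the text feature; near 0 keeps the image feature.
        fused.append(gate * t + (1.0 - gate) * v)
    return fused


# With zero weights and zero bias the gate is 0.5 everywhere,
# so the fused vector is the plain average of the two inputs.
example = gated_fusion([1.0, 2.0], [3.0, 4.0],
                       [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
```

In a trained model the gate parameters would be learned, so that uninformative or redundant dimensions of one modality are suppressed in favor of the other; here they are fixed by hand purely for illustration.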
