Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (2): 419-425.DOI: 10.11772/j.issn.1001-9081.2021071184

• Artificial intelligence •

Multi-modal deep fusion for false information detection

Jie MENG1, Li WANG1, Yanjie YANG1, Biao LIAN2

  1. College of Data Science, Taiyuan University of Technology, Taiyuan Shanxi 030600, China
    2. North Automatic Control Technology Institute, Taiyuan Shanxi 030006, China
  • Received: 2021-07-09 Revised: 2021-07-18 Accepted: 2021-07-21 Online: 2022-02-11 Published: 2022-02-10
  • Contact: Li WANG
  • About author: MENG Jie, born in 1994 in Changzhi, Shanxi, M. S. candidate. His research interests include natural language processing and false information detection.
    WANG Li, born in 1971 in Taiyuan, Shanxi, Ph. D., professor, senior member of CCF. Her research interests include big data computing and analysis, and data mining.
    YANG Yanjie, born in 1995 in Yuanping, Shanxi, M. S. candidate. His research interests include natural language processing and data mining.
    LIAN Biao, born in 1987 in Taiyuan, Shanxi, M. S. His research interests include software development and data mining.
  • Supported by:
    National Natural Science Foundation of China (61872260)

Abstract:

To address the problems of insufficient image feature extraction and the neglect of intra-modal relations and of the interactions between single-modal and multi-modal representations, a Multi-Modal Deep Fusion (MMDF) model based on text and image information was proposed. Firstly, a Bi-directional Gated Recurrent Unit (Bi-GRU) was used to extract rich semantic features from the text, and a multi-branch Convolutional-Recurrent Neural Network (CNN-RNN) was used to extract multi-level features from the image. Then, inter-modal and intra-modal attention mechanisms were established to capture the high-level interactions between the language and vision domains, yielding a multi-modal joint representation. Finally, the original representation of each modality and the fused multi-modal joint representation were re-fused according to their attention weights to strengthen the role of the original information. Compared with the Multimodal Variational AutoEncoder (MVAE) model, the proposed model improves the accuracy by 1.9 percentage points and 2.4 percentage points on the China Computer Federation (CCF) competition dataset and the Weibo dataset, respectively. Experimental results show that the proposed model can fully fuse multi-modal information and effectively improve the accuracy of false information detection.
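The pipeline described in the abstract (Bi-GRU text features, a CNN-RNN image branch, intra- and inter-modal attention, and attention-weighted re-fusion with the original modality representations) can be illustrated with a minimal PyTorch sketch. The layer sizes, the simplified single-branch image encoder, and all class and function names (TextEncoder, ImageEncoder, MMDF) are assumptions introduced here for illustration only, not the authors' released implementation.

# Minimal sketch of the fusion idea, assuming PyTorch; all hyperparameters are illustrative.
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Bi-GRU over word embeddings, as the abstract describes for the text branch."""
    def __init__(self, vocab_size=30000, emb_dim=200, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bigru = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)

    def forward(self, token_ids):                      # (B, T)
        h, _ = self.bigru(self.embed(token_ids))       # (B, T, 2*hid_dim)
        return h

class ImageEncoder(nn.Module):
    """Stand-in for the multi-branch CNN-RNN image branch: a small CNN whose
    feature-map rows are read by a Bi-GRU to give a sequence of region features."""
    def __init__(self, hid_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.rnn = nn.GRU(64 * 8, hid_dim, batch_first=True, bidirectional=True)

    def forward(self, images):                          # (B, 3, H, W)
        f = self.cnn(images)                            # (B, 64, 8, 8)
        seq = f.permute(0, 2, 1, 3).flatten(2)          # (B, 8, 64*8): rows as time steps
        h, _ = self.rnn(seq)                            # (B, 8, 2*hid_dim)
        return h

class MMDF(nn.Module):
    """Intra- and inter-modal attention, a joint representation, and re-fusion of
    the original modality vectors with the joint one via learned attention weights."""
    def __init__(self, dim=256, heads=4, num_classes=2):
        super().__init__()
        self.text_enc = TextEncoder(hid_dim=dim // 2)
        self.img_enc = ImageEncoder(hid_dim=dim // 2)
        self.intra_t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.intra_v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inter_tv = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inter_vt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.refuse = nn.Linear(3 * dim, 3)              # weights over {text, image, joint}
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, token_ids, images):
        t = self.text_enc(token_ids)                     # (B, T, dim)
        v = self.img_enc(images)                         # (B, R, dim)
        t, _ = self.intra_t(t, t, t)                     # intra-modal attention (text)
        v, _ = self.intra_v(v, v, v)                     # intra-modal attention (image)
        t2v, _ = self.inter_tv(t, v, v)                  # text attends to image regions
        v2t, _ = self.inter_vt(v, t, t)                  # image attends to words
        t_vec, v_vec = t.mean(1), v.mean(1)
        joint = (t2v.mean(1) + v2t.mean(1)) / 2          # multi-modal joint representation
        w = torch.softmax(self.refuse(torch.cat([t_vec, v_vec, joint], dim=-1)), dim=-1)
        fused = w[:, 0:1] * t_vec + w[:, 1:2] * v_vec + w[:, 2:3] * joint   # re-fusion
        return self.classifier(fused)

# Toy forward pass on random inputs.
logits = MMDF()(torch.randint(0, 30000, (2, 20)), torch.randn(2, 3, 224, 224))
print(logits.shape)                                      # torch.Size([2, 2])

The softmax re-fusion step mirrors the abstract's idea of weighting the original text and image representations against the joint representation, so the fused vector never discards the unimodal evidence; the specific averaging and pooling choices here are placeholders.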

Key words: false information detection, multi-modal fusion, Bi-directional Gated Recurrent Unit (Bi-GRU), attention mechanism, joint representation

