Journal of Computer Applications, 2025, Vol. 45, Issue (10): 3146-3153. DOI: 10.11772/j.issn.1001-9081.2024101453

• Artificial Intelligence •

Multimodal harmful content detection method based on weakly supervised modality semantic enhancement

Jinwen LIU 1,2,3; Lei WANG 1,2,3; Bo MA 1,2,3; Rui DONG 1,2,3; Yating YANG 1,2,3; Ahtamjan Ahmat 1,2,3; Xinyue WANG 4

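  1. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, Xinjiang 830011, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi, Xinjiang 830011, China
  4. Hohai University, Nanjing, Jiangsu 210000, China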
  • Received: 2024-10-14  Revised: 2024-12-05  Accepted: 2024-12-09  Online: 2024-12-23  Published: 2025-10-10
  • Corresponding author: Lei WANG
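  • About the authors: LIU Jinwen, born in 1999 in Lüliang, Shanxi, female, M.S. candidate. Her research interests include harmful information detection.
    WANG Lei, born in 1974 in Nanyang, Henan, male, Ph.D., research fellow. His research interests include multilingual intelligent information processing. Email: wanglei@ms.xjb.ac.cn
    MA Bo, born in 1984 in Anshan, Liaoning, male, Ph.D., research fellow, CCF senior member. His research interests include multilingual intelligent information processing.
    DONG Rui, born in 1985 in Weihai, Shandong, male, Ph.D., associate research fellow, CCF senior member. His research interests include natural language processing and metaphor detection.
    YANG Yating, born in 1985 in Qitai, Xinjiang, female, Ph.D., research fellow, CCF senior member. Her research interests include multilingual intelligent information processing.
    Ahtamjan Ahmat, born in 1997 in Kashgar, Xinjiang, male (Uyghur), Ph.D. candidate. His research interests include multilingual and multimodal semantic modeling.
    WANG Xinyue, born in 2003 in Nanyang, Henan, female. Her research interests include intelligent science and technology.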
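  • Supported by:
    Key Project of the Xinjiang Uygur Autonomous Region Natural Science Foundation (2023D01D17); "Tianshan Talents" Scientific and Technological Innovation Leading Talent Project (2022TSYCLJ0046); Xinjiang Uygur Autonomous Region Key Research and Development Program (2023B03024); Youth Innovation Promotion Association of the Chinese Academy of Sciences (Y2021112); Xinjiang Uygur Autonomous Region "Tianshan Talents" Training Program (2023TSYCCX0041)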

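Abstract:

The proliferation of multimodal harmful content on social media damages public interests and seriously disrupts social order, so effective detection methods are urgently needed. Existing studies rely on pre-trained models to extract and fuse multimodal features; they overlook the limitations of general-purpose semantics in harmful content detection and do not adequately account for the complex and varied ways in which harmful content is combined. To address this, a multimodal harmful content detection method based on weakly Supervised modality semantic enhancement (weak-S) is proposed. The method introduces weakly supervised modality information to help align the harmful semantics of multimodal features, and designs a multimodal gated integration mechanism based on low-rank bilinear pooling to distinguish the contributions of different information. Experimental results show that, compared with SOTA (State-Of-The-Art) models, the proposed method improves the F1 score by 2.2 and 3.2 percentage points on the Harm-P and MultiOFF datasets, respectively, which verifies the importance of weakly supervised modality semantics in multimodal harmful content detection. In addition, the proposed method achieves improved generalization performance on the multimodal exaggeration detection task.

Key words: unimodal weak supervision, contrastive learning, gated integration, multimodal, harmful content detection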
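
The abstract names a gated integration mechanism built on low-rank bilinear pooling but does not give its equations. The following is only a minimal sketch of the general technique, not the authors' implementation: it assumes the gate is a sigmoid over a low-rank bilinear interaction of a text vector and an image vector, and that the gate then blends two same-sized modality vectors. All function names, dimensions, and the random weights are hypothetical.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a (rows x cols) list-of-lists matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def low_rank_bilinear_gate(text_feat, image_feat, U, V, P):
    """Assumed gate: g = sigmoid(P (tanh(U t) * tanh(V v))).

    U is (rank x d_text), V is (rank x d_image), P is (d_out x rank).
    The elementwise product of the two rank-sized projections replaces
    a full bilinear weight tensor with three small matrices.
    """
    text_proj = [math.tanh(a) for a in matvec(U, text_feat)]
    image_proj = [math.tanh(a) for a in matvec(V, image_feat)]
    joint = [a * b for a, b in zip(text_proj, image_proj)]
    return [sigmoid(a) for a in matvec(P, joint)]


def gated_fusion(text_hidden, image_hidden, gate):
    """Blend two same-sized vectors; a gate value near 1 favours text."""
    return [g * t + (1.0 - g) * v
            for g, t, v in zip(gate, text_hidden, image_hidden)]


rng = random.Random(0)
d_text, d_image, rank, d_out = 6, 8, 3, 4  # illustrative sizes only
U = random_matrix(rank, d_text, rng)
V = random_matrix(rank, d_image, rng)
P = random_matrix(d_out, rank, rng)

text_feat = [rng.uniform(-1, 1) for _ in range(d_text)]
image_feat = [rng.uniform(-1, 1) for _ in range(d_image)]
text_hidden = [rng.uniform(-1, 1) for _ in range(d_out)]
image_hidden = [rng.uniform(-1, 1) for _ in range(d_out)]

gate = low_rank_bilinear_gate(text_feat, image_feat, U, V, P)
fused = gated_fusion(text_hidden, image_hidden, gate)
```

Because each gate value lies strictly between 0 and 1, every fused component is a convex combination of the corresponding text and image components, which is one simple way to weight the contribution of each modality per dimension.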

CLC Number: