Journal of Computer Applications


Human-object interaction detection method based on class decoupling feature enhancement

YE Qing, YANG Tao, ZHANG Yongmei   

  1. School of Artificial Intelligence and Computer Science, North China University of Technology
  • Received: 2025-10-09 Revised: 2025-12-28 Online: 2026-03-16 Published: 2026-03-16
  • About author: YE Qing, born in 1977, Ph.D., associate professor. Her research interests include artificial intelligence, image processing, pattern recognition, and intelligent video surveillance. YANG Tao, born in 2002, M.S. candidate. His research interests include image processing and pattern recognition. ZHANG Yongmei, born in 1967, Ph.D., professor. Her research interests include artificial intelligence, image processing, and pattern recognition.
  • Supported by:
    Planning Fund Project of Humanities and Social Sciences Research of the Ministry of Education (24YJA880097)

  • Corresponding author: YE Qing
  • About author (additional details): YE Qing, born in 1977 in Baoding, Hebei, Ph.D., associate professor; YANG Tao, born in 2002 in Langfang, Hebei, M.S. candidate; ZHANG Yongmei, born in 1967 in Taiyuan, Shanxi, Ph.D., professor, CCF senior member.

Abstract: To address the problems of insufficient feature expression and utilization in current human-object interaction detection methods, and of their weak ability to perceive and discriminate interaction instances with few training samples, a human-object interaction detection method based on class decoupling feature enhancement was proposed. To address insufficient feature expression, a Focus-Diffusion Feature Enhancement Network (FDFENet) was proposed, which adaptively enhanced the extracted middle- and high-level features to improve their expressive power in the model. To address the low detection accuracy on hard-to-classify interaction categories, a Feature Enhancement Algorithm based on Class Decoupling (FEACD) was proposed. In this algorithm, a vision-semantic fusion module was first used to fully fuse visual and semantic features; the fused visual and semantic features were then decoupled into three classes of features, namely human, object, and action, and the similarity between the corresponding visual and semantic features of each class was computed. Based on the obtained similarities, a loss function was designed for each of the three classes, providing per-class feedback during training. In addition, a focal loss term was added to the loss function; focal loss helps the model pay more attention to hard-to-classify samples while reducing the attention to, and enhancement of, easy-to-classify samples. Experimental results show that the proposed method achieves the highest mean Average Precision (mAP) under the Scenario 1 setting of the standard human-object interaction detection dataset V-COCO, as well as under the Full (Default) setting of the HICO-DET dataset, demonstrating its effectiveness.
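The per-class similarity losses and the focal term described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the cosine-similarity choice, the mapping of similarity to a probability, and the `alpha`/`gamma` constants are all assumptions made for the example.

```python
import math

def cosine_similarity(u, v):
    # Cosine similarity between a visual and a semantic feature vector.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def focal_loss(p, gamma=2.0, alpha=0.25):
    # Focal loss for a positive sample with predicted probability p:
    # (1 - p)^gamma down-weights easy samples (p close to 1), so training
    # focuses on hard-to-classify samples.
    return -alpha * (1.0 - p) ** gamma * math.log(p)

def class_decoupled_loss(visual, semantic, gamma=2.0, alpha=0.25):
    # One similarity-based loss term per decoupled class (human, object,
    # action). The similarity is mapped to (0, 1) and treated as a matching
    # probability; this mapping is an assumption for illustration.
    losses = {}
    for cls in ("human", "object", "action"):
        sim = cosine_similarity(visual[cls], semantic[cls])
        p = 0.5 * (sim + 1.0)            # map [-1, 1] -> [0, 1]
        p = min(max(p, 1e-7), 1 - 1e-7)  # numerical safety for log()
        losses[cls] = focal_loss(p, gamma, alpha)
    return losses
```

A well-aligned class (visual and semantic features pointing the same way) yields a near-zero loss, while a poorly aligned class contributes a larger, focal-weighted penalty, which is the per-class feedback the method adds during training.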

Key words: human-object interaction detection, focus diffusion feature enhancement, semantic feature, cross attention, class decoupling feature enhancement 
