Journal of Computer Applications


Human-object interaction detection method based on class decoupling feature enhancement

YE Qing, YANG Tao, ZHANG Yongmei   

  1. School of Artificial Intelligence and Computer Science, North China University of Technology
  • Received: 2025-10-09 Revised: 2025-12-28 Online: 2026-03-16 Published: 2026-03-16
  • About author: YE Qing, born in 1977, Ph.D., associate professor. Her research interests include artificial intelligence, image processing, pattern recognition, and intelligent video surveillance. YANG Tao, born in 2002, M.S. candidate. His research interests include image processing and pattern recognition. ZHANG Yongmei, born in 1967, Ph.D., professor. Her research interests include artificial intelligence, image processing, and pattern recognition.
  • Supported by:
    Planning Fund Project of Humanities and Social Sciences Research of the Ministry of Education (24YJA880097)

  • Corresponding author: YE Qing
  • About author (additional details): YE Qing, born in 1977 in Baoding, Hebei, Ph.D., associate professor; YANG Tao, born in 2002 in Langfang, Hebei, M.S. candidate; ZHANG Yongmei, born in 1967 in Taiyuan, Shanxi, Ph.D., professor, CCF senior member.

Abstract: To address the problems of insufficient feature expression and utilization in current human-object interaction detection methods, and of their weak ability to perceive and discriminate interaction instances with few training samples, a human-object interaction detection method based on class decoupling feature enhancement was proposed. To address insufficient feature expression, a Focus-Diffusion Feature Enhancement Network (FDFENet) was proposed, which adaptively enhanced the extracted middle- and high-level features to improve their expressive power in the model. To address the low detection accuracy on hard-to-classify interaction categories, a Feature Enhancement Algorithm based on Class Decoupling (FEACD) was proposed. In this algorithm, a vision-semantic fusion module was first used to fully fuse visual and semantic features; the fused visual and semantic features were then decoupled into three classes of features, namely human, object, and action, and the similarity between the corresponding visual and semantic features of each class was computed. Based on the obtained similarities, a loss function was designed for each of the three classes, providing per-class feedback during training. In addition, a focal loss term was added to the loss function; focal loss helps the model pay more attention to hard-to-classify samples while reducing the attention to, and enhancement of, easy-to-classify samples. Experimental results show that the proposed method achieves the highest mean Average Precision (mAP) under the Scenario 1 setting of the standard human-object interaction detection dataset V-COCO, as well as under the Full (Default) setting of the HICO-DET dataset, demonstrating its effectiveness.
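The per-class similarity losses and the focal term described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the cosine-similarity choice, the mapping of similarity to a probability, and the `alpha`/`gamma` constants are all assumptions made for the example.

```python
import math

def cosine_similarity(u, v):
    # Cosine similarity between a visual and a semantic feature vector.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def focal_loss(p, gamma=2.0, alpha=0.25):
    # Focal loss for a positive sample with predicted probability p:
    # (1 - p)^gamma down-weights easy samples (p close to 1), so training
    # focuses on hard-to-classify samples.
    return -alpha * (1.0 - p) ** gamma * math.log(p)

def class_decoupled_loss(visual, semantic, gamma=2.0, alpha=0.25):
    # One similarity-based loss term per decoupled class (human, object,
    # action). The similarity is mapped to (0, 1) and treated as a matching
    # probability; this mapping is an assumption for illustration.
    losses = {}
    for cls in ("human", "object", "action"):
        sim = cosine_similarity(visual[cls], semantic[cls])
        p = 0.5 * (sim + 1.0)            # map [-1, 1] -> [0, 1]
        p = min(max(p, 1e-7), 1 - 1e-7)  # numerical safety for log()
        losses[cls] = focal_loss(p, gamma, alpha)
    return losses
```

A well-aligned class (visual and semantic features pointing the same way) yields a near-zero loss, while a poorly aligned class contributes a larger, focal-weighted penalty, which is the per-class feedback the method adds during training.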

Key words: human-object interaction detection, focus diffusion feature enhancement, semantic feature, cross attention, class decoupling feature enhancement 
