Human-object interaction detection method based on class decoupling feature enhancement

doi:10.11772/j.issn.1001-9081.2025091174

Journal of Computer Applications (official website)

YE Qing, YANG Tao, ZHANG Yongmei

School of Artificial Intelligence and Computer Science, North China University of Technology

Received: 2025-10-09 Revised: 2025-12-28 Online: 2026-03-16 Published: 2026-03-16
Corresponding author: YE Qing
About the authors: YE Qing, born in 1977 in Baoding, Hebei, Ph.D., associate professor. Her research interests include artificial intelligence, image processing, pattern recognition, and intelligent video surveillance. YANG Tao, born in 2002 in Langfang, Hebei, M.S. candidate. His research interests include image processing and pattern recognition. ZHANG Yongmei, born in 1967 in Taiyuan, Shanxi, Ph.D., professor, CCF senior member. Her research interests include artificial intelligence, image processing, and pattern recognition.
Supported by:
Planning Fund Project of Humanities and Social Sciences Research of the Ministry of Education (24YJA880097)

Abstract

Current human-object interaction detection methods express and exploit features insufficiently, and they perceive and discriminate poorly between interaction instances that have few training samples. To address these problems, a human-object interaction detection method based on class decoupling feature enhancement is proposed. To address insufficient feature expression, a Focus-Diffusion Feature Enhancement Network (FDFENet) is proposed, which adaptively enhances the extracted middle-level and high-level features to improve their expressive power within the model. To address the low detection accuracy on hard-to-classify interaction categories, a Feature Enhancement Algorithm based on Class Decoupling (FEACD) is proposed. The algorithm first uses a vision-semantic fusion module to fully fuse visual and semantic features. The fused visual and semantic features are then decoupled into three types of features (human, object, and action), and the similarity between the corresponding visual and semantic features is computed for each type. Based on these similarities, a loss function is designed for each of the three categories, so that each category receives its own feedback during training. In addition, focal loss is added to the loss function; it helps the model pay more attention to hard-to-classify samples and reduces the attention and enhancement given to easy-to-classify samples. Experimental results show that the proposed method achieves the best mean Average Precision (mAP) under the Scenario 1 setting of the standard human-object interaction detection dataset V-COCO, as well as on the Full metric under the Default setting of the HICO-DET dataset, which verifies its effectiveness.

Key words: human-object interaction detection, focus-diffusion feature enhancement, semantic feature, cross attention, class decoupling feature enhancement
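
The abstract names two loss ingredients without giving their formulas: a per-category loss driven by visual-semantic similarity, and a focal term. The sketch below is only an illustration of those two ideas, not the paper's formulation. It assumes cosine similarity as the similarity measure, `1 - similarity` as the per-branch loss, and the standard binary focal loss with the commonly used defaults `gamma=2.0` and `alpha=0.25`. All function names and the toy vectors are hypothetical.

```python
import math

BRANCHES = ("human", "object", "action")  # the three decoupled feature types


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def decoupled_similarity_losses(visual, semantic):
    """One loss per branch, here assumed to be 1 - cosine similarity.

    visual, semantic: dicts mapping each branch name to a feature vector.
    The exact loss used in the paper is not stated in the abstract.
    """
    return {
        name: 1.0 - cosine_similarity(visual[name], semantic[name])
        for name in BRANCHES
    }


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Standard binary focal loss for one prediction.

    p: predicted probability of the positive class, strictly in (0, 1)
    y: ground-truth label, 0 or 1
    gamma, alpha: assumed defaults, not values reported by the paper.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical toy features: the human branch is aligned, the action branch is not.
visual = {"human": [1.0, 0.0], "object": [1.0, 1.0], "action": [0.0, 1.0]}
semantic = {"human": [1.0, 0.0], "object": [1.0, 0.0], "action": [1.0, 0.0]}
branch_losses = decoupled_similarity_losses(visual, semantic)

easy = binary_focal_loss(0.95, 1)  # confident, correct prediction
hard = binary_focal_loss(0.30, 1)  # positive sample the model nearly misses
```

In this toy setup the misaligned action branch receives the largest similarity loss, and the hard sample contributes far more focal loss than the easy one, which is the behavior the abstract describes qualitatively.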

CLC number: TP183

Cite this article: YE Qing, YANG Tao, ZHANG Yongmei. Human-object interaction detection method based on class decoupling feature enhancement[J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025091174.

