

Multi-dimensional frequency domain feature fusion for human-object interaction detection

樊跃波1,陈明轩1,汤显1,高永彬1,李文超2   

  1. Shanghai University of Engineering Science
    2. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science
  • Received: 2025-03-10  Revised: 2025-05-10  Online: 2025-06-10  Published: 2025-06-10
  • Corresponding author: 陈明轩

Abstract: Human-object interaction (HOI) detection aims to identify all interactions between humans and objects in an image. Most existing methods adopt an encoder-decoder architecture trained end to end, which typically relies on absolute positional encoding and yields limited performance in complex multi-object interaction scenes. To address the difficulty of capturing the relative spatial relationships between humans and objects caused by this reliance on absolute positional encoding, as well as the insufficient integration of local and global information in complex multi-object interaction scenes, a new HOI detection model combining cross-dimensional interaction feature extraction with frequency-domain feature fusion was proposed. First, the conventional Transformer encoder was improved by introducing an additional positional encoding; fused with the absolute positional encoding, it allows the relative relationships between humans and objects to be modeled. Second, a new feature extraction module was introduced to strengthen the integration of image information: cross-dimensional interaction captures interaction features across the channel, spatial, and feature dimensions to improve representational power, while the discrete cosine transform extracts frequency-domain features that capture richer local and global information. Finally, the Wise-IoU loss function was adopted to improve detection accuracy and class discrimination, allowing the model to handle targets of different categories more flexibly. Experiments were conducted on two public datasets, HICO-DET and V-COCO. Compared with the GEN-VLKT model, the proposed model improves mAP by 0.95 percentage points on all categories of HICO-DET and AP by 0.9 percentage points in scenario 1 of V-COCO.

Key words: human-object interaction detection, object detection, relative position encoding, frequency-domain features, discrete cosine transform
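
The abstract describes extracting frequency-domain features with the discrete cosine transform (DCT) and fusing them with the spatial features, but the page gives no implementation details. The following is a minimal sketch of that general idea only, assuming a PyTorch setting and a channel-attention-style fusion; the names dct_matrix, FreqChannelAttention, freq_hw, and reduction are illustrative assumptions and do not come from the paper.

```python
# Illustrative sketch only -- NOT the authors' module. It shows one common way
# to turn low-frequency 2D-DCT coefficients into channel weights and fuse them
# back into the spatial feature map.
import math
import torch
import torch.nn as nn


def dct_matrix(n: int) -> torch.Tensor:
    """Orthonormal DCT-II basis matrix D of shape (n, n), so that X_freq = D @ x."""
    i = torch.arange(n, dtype=torch.float32)                # spatial index
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)   # frequency index
    d = torch.cos(math.pi / n * (i + 0.5) * k)               # (n, n)
    d[0] *= 1.0 / math.sqrt(n)
    d[1:] *= math.sqrt(2.0 / n)
    return d


class FreqChannelAttention(nn.Module):
    """Channel reweighting driven by low-frequency 2D-DCT coefficients (hypothetical)."""

    def __init__(self, channels: int, height: int, width: int,
                 freq_hw: int = 2, reduction: int = 16):
        super().__init__()
        self.freq_hw = freq_hw
        # Fixed DCT bases for the expected spatial size of the feature map.
        self.register_buffer("dct_h", dct_matrix(height))
        self.register_buffer("dct_w", dct_matrix(width))
        hidden = max(channels // reduction, 8)
        self.fc = nn.Sequential(
            nn.Linear(channels * freq_hw * freq_hw, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) spatial features from the backbone/encoder.
        b, c, _, _ = x.shape
        # 2D DCT-II per channel: D_h @ X @ D_w^T -> frequency map of shape (B, C, H, W).
        freq = torch.einsum("ph,bchw,qw->bcpq", self.dct_h, x, self.dct_w)
        # Keep only a small low-frequency block as a compact frequency descriptor.
        desc = freq[:, :, :self.freq_hw, :self.freq_hw].reshape(b, -1)
        # Map the descriptor to per-channel gates and fuse them with the spatial features.
        gates = self.fc(desc).view(b, c, 1, 1)
        return x * gates


# Usage: reweight a 256-channel feature map of spatial size 20x20.
feat = torch.randn(2, 256, 20, 20)
attn = FreqChannelAttention(channels=256, height=20, width=20)
out = attn(feat)   # (2, 256, 20, 20)
```

In the described model, such a frequency branch would sit alongside the cross-dimensional interaction module and the modified positional encoding; those components are omitted here.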

CLC number: