Journal of Computer Applications
姬张建,王思源
Abstract: Human pose estimation is a critical task in computer vision, and high-resolution networks (HRNets) are widely used for it because of their outstanding performance. However, improvements in model accuracy are often accompanied by a surge in parameter count and computational cost. To address this issue, a lightweight improved module, Frequency Domain Cross Attention Convolution (FCAC), is proposed. First, the basic convolutional module is reconstructed by fusing wavelet transform convolutions (WTconv) with depthwise separable convolutions (DSC). The enlarged receptive field of WTconv compensates for the weak contextual connections of DSC, improving HRNet's performance while keeping its complexity low. Next, an attention mechanism guides the low-frequency components to learn from the high-frequency components, and the learned results are aggregated through a dynamic channel reweighting mechanism. In addition, SENet is used to compensate for the wavelet convolution's limited learning of inter-channel feature relationships. Finally, FCAC is deployed at key nodes of the backbone network. Experiments on the COCO dataset show that, although the parameter count of the improved HRNet-W32 model increased from 28.5M to 34.8M, its computational cost was reduced by 7.6% (from 7.1 GFLOPs to 6.6 GFLOPs), its AP was improved by 3.4% (from 73.4 to 76.9), and its convergence was greatly enhanced.
Key words: wavelet transform, cross-attention, depthwise separable convolution, HRNet, human pose estimation
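The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' FCAC implementation. It only shows how a single-level 2D wavelet transform splits a feature map into one low-frequency and three high-frequency subbands, and why a depthwise separable convolution needs fewer weights than a standard one. The Haar basis, the function names, and the bias-free parameter counts are assumptions made for illustration; the abstract does not specify them.

```python
def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform (assumed wavelet basis).

    x is a 2D list with even height and width. Returns four subbands of
    half the size: one low-frequency approximation and three
    high-frequency detail bands.
    """
    low, d_rows, d_cols, d_diag = [], [], [], []
    for i in range(0, len(x), 2):
        r_low, r_rows, r_cols, r_diag = [], [], [], []
        for j in range(0, len(x[0]), 2):
            a, b = x[i][j], x[i][j + 1]          # top row of the 2x2 patch
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom row of the patch
            r_low.append((a + b + c + d) / 2)    # local average (low frequency)
            r_rows.append((a + b - c - d) / 2)   # top minus bottom
            r_cols.append((a - b + c - d) / 2)   # left minus right
            r_diag.append((a - b - c + d) / 2)   # diagonal difference
        low.append(r_low)
        d_rows.append(r_rows)
        d_cols.append(r_cols)
        d_diag.append(r_diag)
    return low, d_rows, d_cols, d_diag


def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dsc_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a
    pointwise 1 x 1 convolution (bias ignored)."""
    return k * k * c_in + c_in * c_out
```

For the patch `[[1, 2], [3, 4]]`, `haar_dwt2` returns a low-frequency value of 5.0 and detail values of -2.0, -1.0, and 0.0. With 32 input channels, 32 output channels, and a 3 x 3 kernel, `conv_params` gives 9216 weights and `dsc_params` gives 1312.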
CLC Number: TP391.41
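姬张建, 王思源. Lightweight human pose estimation framework based on an enhanced low-frequency wavelet attention mechanism [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025081009.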
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025081009