Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (5): 1398-1406. DOI: 10.11772/j.issn.1001-9081.2021030512
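Received: 2021-04-04
Revised: 2021-06-02
Accepted: 2021-06-03
Online: 2022-06-11
Published: 2022-05-10
Corresponding author: Ning HE
About the author: SUN Qixiang, born in 1994 in Daxing'anling, Heilongjiang, M. S. candidate. His research interests include digital image processing and computer vision.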
Qixiang SUN, Ning HE, Jingzun ZHANG, Chen HONG
About the author:
HONG Chen, born in 1974, Ph. D., associate professor. His research interests include multimedia information processing.
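Abstract:
Human pose estimation is one of the fundamental tasks in computer vision, with applications in action recognition, games, and animation production. Most current deep network models are designed to gain performance by deepening the network. As a result, their demand for computing resources exceeds the capacity of embedded and mobile devices, and they fail to meet the requirements of practical applications. To address this problem, a lightweight network model incorporating the Ghost module structure is proposed: the basic modules of the original high-resolution network are replaced with Ghost modules, which reduces the number of model parameters. In addition, a non-local high-resolution network is designed, in which a non-local network module is fused at the 1/32-resolution stage of the network so that the network can capture global features. This improves the accuracy of human pose estimation and reduces the number of network parameters while preserving model accuracy. Experimental results on the MPII and COCO human pose estimation datasets show that, compared with the original high-resolution network, the proposed model improves human pose estimation accuracy by 1.8 percentage points while reducing the number of model parameters by 40%.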
Qixiang SUN, Ning HE, Jingzun ZHANG, Chen HONG. Lightweight human pose estimation method based on non-local high-resolution network[J]. Journal of Computer Applications, 2022, 42(5): 1398-1406.
Method | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean |
---|---|---|---|---|---|---|---|---|
Ref. [ | 96.5 | 96.0 | 90.3 | 85.4 | 88.8 | 85.0 | 81.9 | 89.2 |
Ref. [ | 96.5 | 96.0 | 90.4 | 86.0 | 89.5 | 85.2 | 82.3 | 89.6 |
Ref. [ | 95.6 | 95.9 | 90.7 | 86.5 | 89.9 | 86.6 | 82.5 | 89.8 |
Ref. [ | 97.1 | 95.9 | 90.3 | 86.4 | 89.1 | 87.1 | 83.3 | 90.3 |
Proposed method | 97.3 | 96.0 | 90.9 | 86.8 | 89.2 | 87.5 | 83.0 | 90.5 |

Tab. 1 Experimental results on MPII validation set (PCKh@0.5, per-joint and mean accuracy in %)
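As a reading aid for the PCKh@0.5 metric used in Tab. 1, here is a minimal sketch of how such a score can be computed: a predicted keypoint counts as correct when its distance to the ground truth is at most 0.5 times the person's head size. The function name `pckh`, the data layout, and the toy coordinates are assumptions for illustration; the official MPII evaluation also handles details (such as unannotated joints) that this sketch ignores.

```python
import math


def pckh(pred, gt, head_sizes, alpha=0.5):
    """Fraction of keypoints whose error is within alpha * head size.

    pred, gt: per-person lists of (x, y) keypoints, in the same joint order.
    head_sizes: per-person head size in pixels (hypothetical input format).
    """
    correct = 0
    total = 0
    for person_pred, person_gt, head in zip(pred, gt, head_sizes):
        for (px, py), (gx, gy) in zip(person_pred, person_gt):
            dist = math.hypot(px - gx, py - gy)
            correct += dist <= alpha * head  # True counts as 1
            total += 1
    return correct / total if total else 0.0


# Toy example: one person, three joints, head size 20 px (threshold 10 px).
pred = [[(10.0, 10.0), (50.0, 52.0), (90.0, 120.0)]]
gt = [[(12.0, 10.0), (50.0, 50.0), (80.0, 100.0)]]
head_sizes = [20.0]
score = pckh(pred, gt, head_sizes)
print(f"PCKh@0.5 = {score:.3f}")
```

In the toy data the first two joints are 2 px off and the third is about 22 px off, so two of the three joints fall within the 10 px threshold.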
Method | Backbone | Input size | Params/10^6 | GFLOPs | AP/% | AP50/% | AP75/% | APM/% | APL/% | AR/% |
---|---|---|---|---|---|---|---|---|---|---|
Stacked Hourglass[ | Hourglass | 256×192 | 25.1 | 14.3 | 66.9 | - | - | - | - | - |
CPN[ | ResNet-50 | 256×192 | 27.0 | 6.2 | 68.6 | - | - | - | - | - |
CPN+OHKM[ | ResNet-50 | 256×192 | 27.0 | 6.2 | 69.4 | - | - | - | - | - |
SimpleBaseline[ | ResNet-50 | 256×192 | 34.0 | 8.9 | 70.4 | 88.6 | 78.3 | 67.1 | 77.2 | 76.3 |
SimpleBaseline[ | ResNet-101 | 256×192 | 53.0 | 12.4 | 71.4 | 89.3 | 79.3 | 68.1 | 78.1 | 77.1 |
SimpleBaseline[ | ResNet-152 | 256×192 | 68.6 | 15.7 | 72.0 | 89.3 | 79.8 | 68.7 | 78.9 | 77.8 |
HRNet W32[ | HRNet-W32 | 256×192 | 28.5 | 7.1 | 74.4 | 90.5 | 81.9 | 70.8 | 81.0 | 79.8 |
HRNet W48[ | HRNet-W48 | 256×192 | 63.6 | 14.6 | 75.1 | 90.6 | 82.2 | 71.5 | 81.8 | 80.4 |
Proposed method W32 | NGHRNet-W32 | 256×192 | 14.6 | 4.3 | 76.2 | 92.6 | 83.6 | 72.0 | 81.7 | 81.1 |
Proposed method W48 | NGHRNet-W48 | 256×192 | 33.5 | 7.87 | 76.9 | 93.2 | 83.9 | 72.6 | 82.0 | 81.9 |
SimpleBaseline[ | ResNet-152 | 384×288 | 68.6 | 35.6 | 74.3 | 89.6 | 81.1 | 70.5 | 79.7 | 79.7 |
HRNet W32[ | HRNet-W32 | 384×288 | 28.5 | 16.0 | 75.8 | 90.6 | 82.7 | 71.9 | 82.8 | 81.0 |
HRNet W48[ | HRNet-W48 | 384×288 | 63.6 | 32.9 | 76.3 | 90.8 | 82.9 | 72.3 | 83.4 | 81.2 |
Proposed method W32 | NGHRNet-W32 | 384×288 | 14.6 | 8.4 | 77.6 | 92.9 | 83.2 | 73.1 | 83.5 | 81.9 |
Proposed method W48 | NGHRNet-W48 | 384×288 | 33.5 | 17.1 | 78.0 | 93.2 | 84.4 | 74.1 | 84.4 | 82.0 |

Tab. 2 Performance comparison of different methods on COCO dataset
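The trade-off reported in Tab. 2 can be checked with a few lines of arithmetic. The sketch below copies the 256×192 rows for HRNet and the proposed NGHRNet from the table and computes the relative reduction in parameters and GFLOPs together with the AP gain in percentage points. The helper name `compare` and the dictionary layout are illustrative choices, not part of the paper.

```python
# (params in millions, GFLOPs, AP in %) copied from Tab. 2, 256x192 input.
rows = {
    "HRNet-W32": (28.5, 7.1, 74.4),
    "NGHRNet-W32": (14.6, 4.3, 76.2),
    "HRNet-W48": (63.6, 14.6, 75.1),
    "NGHRNet-W48": (33.5, 7.87, 76.9),
}


def compare(baseline, proposed):
    """Relative savings and AP gain of `proposed` over `baseline`."""
    base_params, base_flops, base_ap = rows[baseline]
    new_params, new_flops, new_ap = rows[proposed]
    return {
        "param_reduction_pct": round(100 * (base_params - new_params) / base_params, 1),
        "flops_reduction_pct": round(100 * (base_flops - new_flops) / base_flops, 1),
        "ap_gain_points": round(new_ap - base_ap, 1),
    }


w32 = compare("HRNet-W32", "NGHRNet-W32")
w48 = compare("HRNet-W48", "NGHRNet-W48")
print("W32:", w32)
print("W48:", w48)
```

With the table values, both the W32 and W48 comparisons give an AP gain of 1.8 percentage points, and the parameter count drops by a little under half in each case.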
Stage No. | Params/10^6 | GFLOPs | AP/% |
---|---|---|---|
0 | 29.1 | 7.1 | 76.7 |
1 | 22.9 | 6.0 | 76.5 |
1, 2 | 18.8 | 5.1 | 76.4 |
1, 2, 3 | 16.0 | 4.6 | 76.3 |
1, 2, 3, 4 | 14.6 | 4.3 | 76.2 |

Tab. 3 Ablation experimental results of network lightweighting
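To give a feel for why replacing basic modules with Ghost modules shrinks the parameter count in Tab. 3, the following sketch counts weights for a single layer. It assumes the layout analyzed in the GhostNet paper: a primary convolution produces `c_out / ratio` intrinsic feature maps, and cheap depthwise filters generate the remaining ones. The channel width of 64, the `ratio` of 2, and the 3×3 cheap kernel are hypothetical values chosen for illustration and are not taken from this paper.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k, ratio=2, cheap_k=3):
    """Weight count of a Ghost module under the assumed GhostNet layout.

    primary: k x k convolution producing c_out / ratio intrinsic maps.
    cheap:   (ratio - 1) depthwise cheap_k x cheap_k filters per intrinsic map.
    """
    if c_out % ratio:
        raise ValueError("c_out must be divisible by ratio in this sketch")
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k
    cheap = intrinsic * (ratio - 1) * cheap_k * cheap_k
    return primary + cheap


standard = conv_params(64, 64, 3)
ghost = ghost_params(64, 64, 3)
saving = 1 - ghost / standard
print(f"standard={standard}, ghost={ghost}, saving={saving:.1%}")
```

Under these assumptions a single layer needs roughly half the weights, which is in line with the overall trend in Tab. 3, where the parameter count falls from 29.1×10^6 to 14.6×10^6 as more stages are listed.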
Stage No. | Params/10^6 | GFLOPs | AP/% |
---|---|---|---|
0 | 14.3 | 4.3 | 74.2 |
1 | 14.3 | 4.3 | 74.4 |
2 | 14.4 | 4.3 | 75.2 |
3 | 14.5 | 4.3 | 75.5 |
4 | 14.6 | 4.3 | 76.2 |

Tab. 4 Ablation experimental results of non-local network module
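The non-local module evaluated in Tab. 4 lets every spatial position aggregate information from all other positions. The sketch below is a deliberately simplified, pure-Python version of that idea: each output is the input plus a softmax-weighted average of all feature vectors, with weights given by dot-product similarity. It omits the learned projections used in non-local neural networks, so it should be read as an illustration of the operation rather than the module used in the paper. The function name `non_local` and the toy features are assumptions.

```python
import math


def non_local(features):
    """Simplified non-local operation with a residual connection.

    features: list of N feature vectors, one per spatial position.
    Returns x_i + sum_j softmax_j(x_i . x_j) * x_j for every position i.
    """
    out = []
    for xi in features:
        # Pairwise similarity between position i and every position j.
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in features]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        norm = sum(weights)
        aggregated = [
            sum(w * xj[d] for w, xj in zip(weights, features)) / norm
            for d in range(len(xi))
        ]
        out.append([a + b for a, b in zip(xi, aggregated)])
    return out


# Toy feature map flattened to four positions with two channels each.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
result = non_local(feats)
print(result)
```

The pairwise comparison costs O(N²) in the number of positions. At 1/32 resolution a 256×192 input has only 8×6 = 48 positions, which may help explain why the module adds little to the parameter and GFLOPs columns of Tab. 4.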
1 | WANG R. A research of human pose estimation based on deep convolutional neural network [D]. Chengdu: University of Electronic Science and Technology of China, 2016: 11-12. |
2 | YAN S J, XIONG Y J, LIN D H. Spatial temporal graph convolutional networks for skeleton-based action recognition [C]// Proceedings of the 2018 32nd AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2018: 7444-7452. |
3 | FANG H S, XIE S Q, TAI Y W, et al. RMPE: regional multi-person pose estimation [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2353-2362. DOI: 10.1109/iccv.2017.256 |
4 | CHEN Y L, WANG Z C, PENG Y X, et al. Cascaded pyramid network for multi-person pose estimation [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7103-7112. DOI: 10.1109/cvpr.2018.00742 |
5 | HUANG S L, GONG M M, TAO D C. A coarse-fine network for keypoint localization [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 3047-3056. DOI: 10.1109/iccv.2017.329 |
6 | XIAO B, WU H P, WEI Y C. Simple baselines for human pose estimation and tracking [C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11210. Cham: Springer, 2018: 472-487. |
7 | NEWELL A, YANG K Y, DENG J. Stacked hourglass networks for human pose estimation [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9912. Cham: Springer, 2016: 483-499. |
8 | YANG W, LI S, OUYANG W L, et al. Learning feature pyramids for human pose estimation [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1290-1299. DOI: 10.1109/iccv.2017.144 |
9 | PISHCHULIN L, INSAFUTDINOV E, TANG S Y, et al. DeepCut: joint subset partition and labeling for multi person pose estimation [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4929-4937. DOI: 10.1109/cvpr.2016.533 |
10 | IQBAL U, GALL J. Multi-person pose estimation with local joint-to-person associations [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9914. Cham: Springer, 2016: 627-642. |
11 | INSAFUTDINOV E, PISHCHULIN L, ANDRES B, et al. DeeperCut: a deeper, stronger, and faster multi-person pose estimation model [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9910. Cham: Springer, 2016: 34-50. |
12 | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI: 10.1109/cvpr.2016.90 |
13 | LEVINKOV E, UHRIG J, TANG S Y, et al. Joint graph decomposition & node labeling: problem, algorithms, applications [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 1904-1912. DOI: 10.1109/cvpr.2017.206 |
14 | VARADARAJAN S, DATTA P, TICKOO O. A greedy part assignment algorithm for real-time multi-person 2D pose estimation [C]// Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2018: 418-426. DOI: 10.1109/wacv.2018.00052 |
15 | CAO Z, SIMON T, WEI S E, et al. Realtime multi-person 2D pose estimation using part affinity fields [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 1302-1310. DOI: 10.1109/cvpr.2017.143 |
16 | NEWELL A, HUANG Z A, DENG J. Associative embedding: end-to-end learning for joint detection and grouping [C]// Proceedings of the 2017 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 2274-2284. |
17 | XIA F, WANG P, CHEN X, et al. Joint multi-person pose estimation and semantic part segmentation [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6769-6778. DOI: 10.1109/cvpr.2017.644 |
18 | PAPANDREOU G, ZHU T, CHEN L C, et al. PersonLab: person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model [C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11218. Cham: Springer, 2018: 282-299. |
19 | TANG W, YU P, WU Y. Deeply learned compositional models for human pose estimation [C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11207. Cham: Springer, 2018: 197-214. |
20 | SUN Q X, HE N, ZHANG C C, et al. A lightweight graph convolution human skeleton action recognition method [J/OL]. Computer Engineering. [2021-06-25]. |
21 | VANHOUCKE V, SENIOR A, MAO M Z. Improving the speed of neural networks on CPUs [EB/OL]. [2021-02-12]. |
22 | GONG Y C, LIU L, YANG M, et al. Compressing deep convolutional networks using vector quantization [EB/OL]. [2021-06-25]. |
23 | COURBARIAUX M, BENGIO Y, DAVID J P. BinaryConnect: training deep neural networks with binary weights during propagations [C]// Proceedings of the 2015 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 3123-3131. |
24 | COURBARIAUX M, HUBARA I, SOUDRY D, et al. Binarized neural networks: training deep neural networks with weights and activations constrained to +1 or -1 [EB/OL]. [2021-06-25]. |
25 | RASTEGARI M, ORDONEZ V, REDMON J, et al. XNOR-Net: ImageNet classification using binary convolutional neural networks [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9908. Cham: Springer, 2016: 525-542. |
26 | HANSON S J, PRATT L Y. Comparing biases for minimal network construction with back-propagation [C]// Proceedings of the 1988 1st International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 1988: 177-185. |
27 | LI X X, LI X A. An improved correlation pruning algorithm for artificial neural network [J]. Electronic Design Engineering, 2013, 21(8): 65-67. DOI: 10.3969/j.issn.1674-6236.2013.08.020 |
28 | ZHAO R, TANG C Q, LIU W L, et al. A new BP neural network pruning algorithm based on grey relational analysis [J]. Technological Innovation and Application, 2016(13): 17-18. |
29 | IANDOLA F N, HAN S, MOSKEWICZ M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size [EB/OL]. [2021-06-25]. |
30 | HOWARD A G, ZHU M, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications [EB/OL]. [2021-06-25]. DOI: 10.1109/cvpr.2018.00286 |
31 | SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5686-5696. DOI: 10.1109/cvpr.2019.00584 |
32 | HAN K, WANG Y H, TIAN Q, et al. GhostNet: more features from cheap operations [C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 1577-1586. DOI: 10.1109/cvpr42600.2020.00165 |
33 | WANG X L, GIRSHICK R, GUPTA A, et al. Non-local neural networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7794-7803. DOI: 10.1109/cvpr.2018.00813 |
34 | ANDRILUKA M, PISHCHULIN L, GEHLER P, et al. 2D human pose estimation: new benchmark and state of the art analysis [C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 3686-3693. DOI: 10.1109/cvpr.2014.471 |
35 | LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context [C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8693. Cham: Springer, 2014: 740-755. |
36 | BUADES A, COLL B, MOREL J M. A non-local algorithm for image denoising [C]// Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2005: 60-65. DOI: 10.1109/cvpr.2005.38 |
37 | KINGMA D P, BA J L. Adam: a method for stochastic optimization [EB/OL]. [2021-06-25]. |