Journal of Computer Applications, 2022, Vol. 42, Issue (8): 2407-2414. DOI: 10.11772/j.issn.1001-9081.2021061103
Special Issue: Artificial intelligence
• Artificial intelligence •
Received: 2021-06-29
Revised: 2021-09-21
Accepted: 2021-09-28
Online: 2022-08-09
Published: 2022-08-10
Contact: Qing HOU
About author: LI Kun, born in 1997 in Weifang, Shandong, M. S. candidate. His research interests include image processing and computer vision.
Kun LI, Qing HOU. Lightweight human pose estimation based on attention mechanism[J]. Journal of Computer Applications, 2022, 42(8): 2407-2414.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021061103
Model | Backbone | Input size | Params/10^6 | GFLOPs | mAP/% | AP50/% | AP75/% | APM/% | APL/% | AR/% |
---|---|---|---|---|---|---|---|---|---|---|
Hourglass | Hourglass | 256×192 | 25.1 | 14.3 | 66.9 | — | — | — | — | — |
CPN | ResNet-50 | 256×192 | 27.0 | 6.2 | 68.6 | — | — | — | — | — |
CPN+OHKM | ResNet-50 | 256×192 | 27.0 | 6.2 | 69.4 | — | — | — | — | — |
SimpleBaseline | ResNet-50 | 256×192 | 34.0 | 8.9 | 70.4 | 88.6 | 78.3 | 67.1 | 77.2 | 76.3 |
HRNet | HRNet | 256×192 | 28.5 | 7.1 | 73.4 | 89.5 | 80.7 | 70.2 | 80.1 | 78.9 |
Lite-HRNet | Lite-HRNet-18 | 256×192 | 1.1 | 0.2 | 64.8 | 86.7 | 73.0 | 62.1 | 70.5 | 71.2 |
Lite-HRNet | Lite-HRNet-30 | 256×192 | 1.8 | 0.3 | 67.2 | 88.0 | 75.0 | 64.3 | 73.1 | 73.3 |
SCANet | HRNet | 256×192 | 13.5 | 2.8 | 72.3 | 90.0 | 79.6 | 69.3 | 79.1 | 78.0 |
Tab. 1 Performance comparison on COCO validation set
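The AP columns in Tab. 1 are reported in the COCO keypoint style. Assuming the standard COCO evaluation protocol was used, a prediction is scored against ground truth with object keypoint similarity (OKS): AP50 and AP75 use OKS thresholds of 0.5 and 0.75, and mAP averages over thresholds from 0.50 to 0.95. The sketch below computes OKS for one person. The function name `oks` and its argument names are illustrative and do not come from the paper.

```python
import math

def oks(pred, gt, visibility, area, k):
    """Object keypoint similarity for one person (illustrative sketch).

    pred, gt   : lists of (x, y) keypoint coordinates
    visibility : per-keypoint flags; keypoints with flag 0 are unlabelled and ignored
    area       : object area in pixels, used as the squared scale s^2
    k          : per-keypoint falloff constants
    """
    total = 0.0
    labelled = 0
    for (px, py), (gx, gy), v, ki in zip(pred, gt, visibility, k):
        if v > 0:
            d2 = (px - gx) ** 2 + (py - gy) ** 2
            # Gaussian falloff of the squared distance, scaled by object size
            total += math.exp(-d2 / (2.0 * area * ki ** 2))
            labelled += 1
    return total / labelled if labelled else 0.0
```

A perfect prediction gives an OKS of 1.0, and the score decays toward 0 as predicted keypoints move away from the annotations relative to the object scale.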
Model | Backbone | Input size | Params/10^6 | GFLOPs | mAP/% | AP50/% | AP75/% | APM/% | APL/% | AR/% |
---|---|---|---|---|---|---|---|---|---|---|
HRNet | HRNet | 384×288 | 28.5 | 16.0 | 74.9 | 92.5 | 82.8 | 71.3 | 80.9 | 80.1 |
SCANet | HRNet | 384×288 | 13.5 | 6.2 | 72.8 | 92.6 | 80.7 | 69.8 | 79.9 | 79.0 |
Tab. 2 Performance comparison on COCO test set
Model | Params/10^6 | GFLOPs | Head/% | Shoulder/% | Elbow/% | Wrist/% | Hip/% | Knee/% | Ankle/% | Mean/% |
---|---|---|---|---|---|---|---|---|---|---|
Hourglass | 25.1 | 19.1 | 96.5 | 95.3 | 88.4 | 82.5 | 87.1 | 83.5 | 78.3 | 87.5 |
SimpleBaseline | 68.6 | 20.9 | 96.7 | 95.4 | 88.6 | 82.9 | 87.5 | 83.8 | 79.0 | 87.9 |
HRNet | 28.5 | 9.5 | 97.0 | 95.5 | 90.0 | 85.2 | 88.1 | 85.1 | 81.0 | 89.3 |
Lite-HRNet-18 | 1.1 | 0.3 | — | — | — | — | — | — | — | 86.1 |
Lite-HRNet-30 | 1.8 | 0.4 | — | — | — | — | — | — | — | 87.0 |
SCANet | 13.5 | 3.7 | 97.2 | 95.4 | 89.9 | 83.7 | 88.9 | 84.6 | 79.8 | 88.7 |
Tab. 3 Performance comparison on MPII validation set (PCKh@0.5)
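The per-joint accuracies in Tab. 3 are PCKh@0.5 values: a predicted keypoint counts as correct when its distance to the ground-truth keypoint is at most 0.5 times the person's head segment length. A minimal sketch of that computation follows. The function name `pckh`, its arguments, and the assumption that head sizes are supplied directly are illustrative and not taken from the paper.

```python
import math

def pckh(pred, gt, head_sizes, alpha=0.5):
    """Percentage of correct keypoints normalised by head size (illustrative sketch).

    pred, gt   : per-person lists of (x, y) keypoints
    head_sizes : head segment length for each person, in pixels
    alpha      : threshold as a fraction of the head size (0.5 for PCKh@0.5)
    """
    correct = 0
    total = 0
    for person_pred, person_gt, head in zip(pred, gt, head_sizes):
        for (px, py), (gx, gy) in zip(person_pred, person_gt):
            dist = math.hypot(px - gx, py - gy)
            correct += dist <= alpha * head
            total += 1
    return 100.0 * correct / total
```

For one person with a head size of 10 pixels, a keypoint that is 3 pixels off counts as correct and one that is 8 pixels off does not, so `pckh([[(0, 0), (10, 0)]], [[(0, 3), (10, 8)]], [10.0])` evaluates to 50.0.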
Model | Params/10^6 | GFLOPs | Head/% | Shoulder/% | Elbow/% | Wrist/% | Hip/% | Knee/% | Ankle/% | Mean/% |
---|---|---|---|---|---|---|---|---|---|---|
HRNet | 28.5 | 9.5 | 98.3 | 96.6 | 92.5 | 88.1 | 91.2 | 88.0 | 84.2 | 91.7 |
SCANet | 13.5 | 3.7 | 98.5 | 96.4 | 92.4 | 86.8 | 91.8 | 86.7 | 83.1 | 91.0 |
Tab. 4 Performance comparison on MPII test set (PCKh@0.5)
Model | Params/10^6 | GFLOPs | Mean accuracy/% |
---|---|---|---|
HRNet | 28.5 | 9.5 | 89.3 |
SCANet | 13.5 | 3.7 | 88.7 |
SCANet (without attention) | 9.1 | 3.6 | 88.0 |
Tab. 5 Ablation experiment
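The trade-offs in Tab. 5 can be read off as row differences. The short sketch below hard-codes the three rows of the table and subtracts them; the dictionary `rows`, the English row labels, and the helper `delta` are illustrative names, not part of the paper.

```python
# Rows of Tab. 5 as (params in 10^6, GFLOPs, mean accuracy in %)
rows = {
    "HRNet": (28.5, 9.5, 89.3),
    "SCANet": (13.5, 3.7, 88.7),
    "SCANet (without attention)": (9.1, 3.6, 88.0),
}

def delta(a, b):
    """Column-wise difference a - b, rounded to one decimal like the table."""
    return tuple(round(x - y, 1) for x, y in zip(rows[a], rows[b]))

# Cost and gain of the attention module inside SCANet
attention_effect = delta("SCANet", "SCANet (without attention)")

# SCANet relative to the HRNet baseline
versus_hrnet = delta("SCANet", "HRNet")
```

Here `attention_effect` is `(4.4, 0.1, 0.7)`: by the table's figures, the attention module adds 4.4×10^6 parameters and 0.1 GFLOPs for a 0.7-point gain in mean accuracy. `versus_hrnet` is `(-15.0, -5.8, -0.6)`, so SCANet has 15.0×10^6 fewer parameters and 5.8 fewer GFLOPs than HRNet at a cost of 0.6 points.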
1 | FISCHLER M A, ELSCHLAGER R A. The representation and matching of pictorial structures[J]. IEEE Transactions on Computers, 1973, C-22(1): 67-92. 10.1109/t-c.1973.223602 |
2 | KIEFEL M, GEHLER P V. Human pose estimation with fields of parts [C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8693. Cham: Springer, 2014: 331-346. |
3 | TOSHEV A, SZEGEDY C. DeepPose: human pose estimation via deep neural networks [C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 1653-1660. 10.1109/cvpr.2014.214 |
4 | SUN K, XIAO B, LIU D, et al. Deep high-resolution representation learning for human pose estimation [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5686-5696. 10.1109/cvpr.2019.00584 |
5 | NEWELL A, YANG K Y, DENG J. Stacked hourglass networks for human pose estimation [C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9912. Cham: Springer, 2016: 483-499. |
6 | ZHOU D Q, HOU Q B, CHEN Y P, et al. Rethinking bottleneck structure for efficient mobile network design [C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12348. Cham: Springer, 2020: 680-697. |
7 | HOU Q B, ZHOU D Q, FENG J S. Coordinate attention for efficient mobile network design [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 13708-13717. 10.1109/cvpr46437.2021.01350 |
8 | WANG D F, CHEN C B, MA T L, et al. YOLOv3 pedestrian detection algorithm based on depth-wise separable convolution[J]. Computer Applications and Software, 2020, 37(6): 218-223. 10.3969/j.issn.1000-386x.2020.06.038 |
9 | DONG Y C, SHAN Y G, YUAN J. Pedestrian detection based on improved SSD[J]. Computer Engineering and Design, 2020, 41(10): 2921-2926. 10.16208/j.issn1000-7024.2020.10.037 |
10 | HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2980-2988. 10.1109/iccv.2017.322 |
11 | REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks [C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 91-99. |
12 | CAO Z, SIMON T, WEI S E, et al. Realtime multi-person 2D pose estimation using part affinity fields [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 1302-1310. 10.1109/cvpr.2017.143 |
13 | WEI S E, RAMAKRISHNA V, KANADE T, et al. Convolutional pose machines [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4724-4732. 10.1109/cvpr.2016.511 |
14 | SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: inverted residuals and linear bottlenecks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 4510-4520. 10.1109/cvpr.2018.00474 |
15 | MA N N, ZHANG X Y, ZHENG H T, et al. ShuffleNet V2: practical guidelines for efficient CNN architecture design [C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11218. Cham: Springer, 2018: 122-138. |
16 | HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. 10.1109/cvpr.2018.00745 |
17 | WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11211. Cham: Springer, 2018: 3-19. |
18 | XIAO Z J, YANG X D, WEI X, et al. Improved lightweight network in image recognition[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(4): 743-753. 10.3778/j.issn.1673-9418.2004057 |
19 | KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks [C]// Proceedings of the 25th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2012: 1097-1105. |
20 | ANDRILUKA M, PISHCHULIN L, GEHLER P, et al. 2D human pose estimation: new benchmark and state of the art analysis [C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 3686-3693. 10.1109/cvpr.2014.471 |
21 | LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context [C]// Proceedings of the 2014 European Conference on Computer Vision. Cham: Springer, 2014: 740-755. 10.1007/978-3-319-10602-1_48 |
22 | CHEN Y L, WANG Z C, PENG Y X, et al. Cascaded pyramid network for multi-person pose estimation [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7103-7112. 10.1109/cvpr.2018.00742 |
23 | XIAO B, WU H P, WEI Y C. Simple baselines for human pose estimation and tracking [C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11210. Cham: Springer, 2018: 472-487. |
24 | YU C Q, XIAO B, GAO C X, et al. Lite-HRNet: a lightweight high-resolution network [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 10435-10445. 10.1109/cvpr46437.2021.01030 |
[1] | Zhiqiang ZHAO, Peihong MA, Xinhong HEI. Crowd counting method based on dual attention mechanism [J]. Journal of Computer Applications, 2024, 44(9): 2886-2892. |
[2] | Jing QIN, Zhiguang QIN, Fali LI, Yueheng PENG. Diagnosis of major depressive disorder based on probabilistic sparse self-attention neural network [J]. Journal of Computer Applications, 2024, 44(9): 2970-2974. |
[3] | Liting LI, Bei HUA, Ruozhou HE, Kuang XU. Multivariate time series prediction model based on decoupled attention mechanism [J]. Journal of Computer Applications, 2024, 44(9): 2732-2738. |
[4] | Kaipeng XUE, Tao XU, Chunjie LIAO. Multimodal sentiment analysis network with self-supervision and multi-layer cross attention [J]. Journal of Computer Applications, 2024, 44(8): 2387-2392. |
[5] | Pengqi GAO, Heming HUANG, Yonghong FAN. Fusion of coordinate and multi-head attention mechanisms for interactive speech emotion recognition [J]. Journal of Computer Applications, 2024, 44(8): 2400-2406. |
[6] | Rui SHI, Yong LI, Yanhan ZHU. Adversarial sample attack algorithm of modulation signal based on equalization of feature gradient [J]. Journal of Computer Applications, 2024, 44(8): 2521-2527. |
[7] | Zhonghua LI, Yunqi BAI, Xuejin WANG, Leilei HUANG, Chujun LIN, Shiyu LIAO. Low illumination face detection based on image enhancement [J]. Journal of Computer Applications, 2024, 44(8): 2588-2594. |
[8] | Shangbin MO, Wenjun WANG, Ling DONG, Shengxiang GAO, Zhengtao YU. Single-channel speech enhancement based on multi-channel information aggregation and collaborative decoding [J]. Journal of Computer Applications, 2024, 44(8): 2611-2617. |
[9] | Li LIU, Haijin HOU, Anhong WANG, Tao ZHANG. Generative data hiding algorithm based on multi-scale attention [J]. Journal of Computer Applications, 2024, 44(7): 2102-2109. |
[10] | Song XU, Wenbo ZHANG, Yifan WANG. Lightweight video salient object detection network based on spatiotemporal information [J]. Journal of Computer Applications, 2024, 44(7): 2192-2199. |
[11] | Dahai LI, Zhonghua WANG, Zhendong WANG. Dual-branch low-light image enhancement network combining spatial and frequency domain information [J]. Journal of Computer Applications, 2024, 44(7): 2175-2182. |
[12] | Wenliang WEI, Yangping WANG, Biao YUE, Anzheng WANG, Zhe ZHANG. Deep learning model for infrared and visible image fusion based on illumination weight allocation and attention [J]. Journal of Computer Applications, 2024, 44(7): 2183-2191. |
[13] | Wu XIONG, Congjun CAO, Xuefang SONG, Yunlong SHAO, Xusheng WANG. Handwriting identification method based on multi-scale mixed domain attention mechanism [J]. Journal of Computer Applications, 2024, 44(7): 2225-2232. |
[14] | Huanhuan LI, Tianqiang HUANG, Xuemei DING, Haifeng LUO, Liqing HUANG. Public traffic demand prediction based on multi-scale spatial-temporal graph convolutional network [J]. Journal of Computer Applications, 2024, 44(7): 2065-2072. |
[15] | Dianhui MAO, Xuebo LI, Junling LIU, Denghui ZHANG, Wenjing YAN. Chinese entity and relation extraction model based on parallel heterogeneous graph and sequential attention mechanism [J]. Journal of Computer Applications, 2024, 44(7): 2018-2025. |