Journal of Computer Applications, 2026, Vol. 46, Issue (6): 1989-1997. DOI: 10.11772/j.issn.1001-9081.2025050663
• Multimedia computing and computer simulation •
Qiuyan YIN1, Jing DING2, Zhigang NIE1
Received: 2025-06-23
Revised: 2025-09-13
Accepted: 2025-09-18
Online: 2025-09-25
Published: 2026-06-10
Contact: Jing DING
About author: YIN Qiuyan, born in 1998 in Liaocheng, Shandong, M. S. candidate, CCF member. Her research interests include pose recognition and object detection.
Qiuyan YIN, Jing DING, Zhigang NIE. YOLO-AirPose: human pose estimation algorithm in UAV aerial view[J]. Journal of Computer Applications, 2026, 46(6): 1989-1997.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025050663
| Module | Box(P) | Pose(P) | Pose(R) | Pose(mAP@0.5) |
|---|---|---|---|---|
| Baseline | 0.940 | 0.757 | 0.633 | 0.601 |
| C2BRA | 0.989 | 0.729 | 0.646 | 0.583 |
| C2CGA | 0.965 | 0.723 | 0.548 | 0.509 |
| C2DA | 0.921 | 0.796 | 0.599 | 0.569 |
Tab.1 Comparison of C2 modules
| Module | Box(P) | Pose(P) | Pose(R) | Pose(mAP@0.5) |
|---|---|---|---|---|
| DAttention | 0.968 | 0.758 | 0.689 | 0.645 |
| DWR | 0.933 | 0.650 | 0.616 | 0.543 |
| DLKA | 0.962 | 0.725 | 0.596 | 0.536 |
Tab.2 Comparison of C3k2 modules
| Module | Box(P) | Pose(P) | Pose(R) | Pose(mAP@0.5) |
|---|---|---|---|---|
| AIFI | 0.979 | 0.736 | 0.649 | 0.583 |
| SPPF-LSKA | 0.970 | 0.776 | 0.615 | 0.540 |
| FocalModulation | 0.966 | 0.703 | 0.611 | 0.522 |
| AIFIRepBN | 0.862 | 0.628 | 0.481 | 0.417 |
Tab.3 Comparison of SPPF series
| C3k2-DAttention | C2BRA | AIFI | Box(P) | Pose(P) | Pose(R) | Pose(mAP@0.5) |
|---|---|---|---|---|---|---|
| - | - | - | 0.940 | 0.757 | 0.633 | 0.601 |
| √ | - | - | 0.968 | 0.758 | 0.689 | 0.645 |
| - | √ | - | 0.989 | 0.729 | 0.644 | 0.583 |
| - | - | √ | 0.979 | 0.736 | 0.649 | 0.583 |
| √ | √ | - | 0.970 | 0.744 | 0.676 | 0.636 |
| √ | - | √ | 0.969 | 0.745 | 0.631 | 0.566 |
| - | √ | √ | 0.985 | 0.656 | 0.581 | 0.517 |
| √ | √ | √ | 0.970 | 0.807 | 0.679 | 0.669 |
Tab.4 Results of ablation experiments
| Method | Box(P) | Pose(P) | Pose(R) | Pose(mAP@0.5) |
|---|---|---|---|---|
| YOLOv11n-Pose | 0.940 | 0.757 | 0.633 | 0.601 |
| YOLOv8n-Pose | 0.923 | 0.631 | 0.558 | 0.498 |
| YOLOv12n | 0.953 | 0.667 | 0.481 | |
| YOLOv13n | 0.975 | 0.669 | 0.584 | 0.529 |
| HyperYOLO-tiny | 0.887 | 0.524 | 0.471 | 0.370 |
| Mamba-YOLO-tiny | 0.871 | 0.562 | 0.365 | 0.305 |
| Proposed method | 0.970 | 0.807 | 0.679 | 0.669 |
Tab.5 Results of comparison experiments