YOLO-AirPose: human pose estimation algorithm in UAV aerial view

Journal of Computer Applications, 2026, Vol. 46, Issue (6): 1989-1997. DOI: 10.11772/j.issn.1001-9081.2025050663

Section: Multimedia computing and computer simulation

Qiuyan YIN¹, Jing DING², Zhigang NIE¹

¹ College of Information Science and Technology, Gansu Agricultural University, Lanzhou Gansu 730070, China
² Department of Physical Education, Gansu Agricultural University, Lanzhou Gansu 730070, China

Received: 2025-06-23 Revised: 2025-09-13 Accepted: 2025-09-18 Online: 2025-09-25 Published: 2026-06-10
Contact: Jing DING
About the authors: YIN Qiuyan, born in 1998, from Liaocheng, Shandong, M. S. candidate, CCF member. Her research interests include pose recognition and object detection.
NIE Zhigang, born in 1980, from Zhangye, Gansu, Ph. D., professor. His research interests include computer vision and smart agriculture.
First contact person: DING Jing, born in 1979, from Xianyang, Shaanxi, M. S., associate professor. Her research interests include sports training and rehabilitation, and human posture correction.
Supported by: Youth Mentor Support Program of Gansu Agricultural University (GAU-QDFC-2022-19); Top-notch Talent Program of Gansu Province (GSBJLJ-2023-09)

Abstract

To address complex background interference, keypoint localization deviation, and target occlusion in human pose estimation from Unmanned Aerial Vehicle (UAV) aerial views, an enhanced human pose estimation algorithm for non-ground-view scenarios, named YOLO-AirPose, was proposed. Firstly, a symmetric flip augmentation strategy constrained by keypoint topology, IPSFA (Index-Preserved Symmetric Flip Augmentation), was designed to improve generalization in multi-view scenarios. Secondly, a C2BRA (C2 Bi-level Routing Attention) module integrating the BRA (Bi-level Routing Attention) mechanism was constructed to replace the original C2PSA (Cross stage Partial with Spatial Attention) module, enhancing the model's perception of small targets and occluded keypoints. Thirdly, to exploit the spatial modeling ability of Transformers, an AIFI (Adaptive Interaction Feature Integration) module was embedded into the backbone network, using 2D positional encoding to improve keypoint localization. Finally, a C3k2-DAttention module based on the deformable attention mechanism was designed to strengthen the network's global modeling and receptive-field adjustment. Experimental results show that, compared with the baseline model YOLO-Pose, YOLO-AirPose improves object detection precision by 3.0 percentage points and pose estimation precision, recall, and mAP@0.5 by 5.0, 4.6, and 6.8 percentage points, respectively, while keeping computational cost and parameter count low. The proposed algorithm thus offers an improved solution to the limited accuracy of human pose estimation from UAV aerial views and adapts better to complex human poses.
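
The abstract does not describe how IPSFA is implemented. A common way to make a horizontal flip "index-preserving" is to mirror the x-coordinates and then swap every left/right keypoint pair, so that each keypoint index keeps its anatomical meaning. The sketch below assumes the 17-keypoint COCO layout and continuous pixel coordinates in [0, W]; `hflip_with_index_swap` and `COCO_FLIP_IDX` are illustrative names, not the authors' code.

```python
import numpy as np

# Assumed COCO-17 order: 0 nose, then (left, right) pairs for eyes, ears,
# shoulders, elbows, wrists, hips, knees, ankles.
COCO_FLIP_IDX = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]


def hflip_with_index_swap(image, keypoints, flip_idx=COCO_FLIP_IDX):
    """Horizontally flip an image and its keypoints, preserving left/right semantics.

    image:     (H, W, C) array
    keypoints: (N, K, 3) array of (x, y, visibility), x in [0, W]
    """
    w = image.shape[1]
    flipped_img = image[:, ::-1].copy()
    kpts = keypoints.copy()
    # Mirror x about the vertical centre line of the image
    kpts[..., 0] = w - kpts[..., 0]
    # Re-index so the "left wrist" slot still holds a left wrist after mirroring
    kpts = kpts[:, flip_idx]
    # Keep unlabelled keypoints (visibility 0) at all-zero
    kpts[kpts[..., 2] == 0] = 0
    return flipped_img, kpts
```

Without the index swap, a mirrored left wrist would still be labelled "left" while sitting where a right wrist belongs, which feeds the model contradictory left/right supervision.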

Key words: human pose estimation, Unmanned Aerial Vehicle (UAV), YOLOv11-Pose, object detection, region awareness

CLC number: TP391.4

Qiuyan YIN, Jing DING, Zhigang NIE. YOLO-AirPose: human pose estimation algorithm in UAV aerial view[J]. Journal of Computer Applications, 2026, 46(6): 1989-1997.

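Comparison of C2-series modules (Box: object detection; Pose: pose estimation; P: precision; R: recall)
Module	Box (P)	Pose (P)	Pose (R)	Pose (mAP@0.5)
Baseline	0.940	0.757	0.633	0.601
C2BRA	0.989	0.729	0.646	0.583
C2CGA	0.965	0.723	0.548	0.509
C2DA	0.921	0.796	0.599	0.569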
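
C2BRA builds on bi-level routing attention (BRA) from BiFormer. To illustrate only the routing idea, the NumPy sketch below implements a simplified single-head BRA. It first performs region-level top-k routing, then runs token-level attention over the tokens of the routed regions. Multi-head splitting and BiFormer's local context enhancement branch are omitted, and the projection matrices are arbitrary inputs. The function name and arguments are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def bi_level_routing_attention(x, wq, wk, wv, s=2, topk=2):
    """Simplified single-head BRA on x of shape (H, W, C); H and W divisible by s."""
    h, w, c = x.shape
    rh, rw = h // s, w // s
    # Group tokens by region: (s*s, rh*rw, C)
    regions = x.reshape(s, rh, s, rw, c).transpose(0, 2, 1, 3, 4).reshape(s * s, rh * rw, c)
    q, k, v = regions @ wq, regions @ wk, regions @ wv
    # Level 1: region-to-region affinity from region-averaged queries and keys
    affinity = q.mean(axis=1) @ k.mean(axis=1).T          # (s*s, s*s)
    idx = np.argsort(-affinity, axis=1)[:, :topk]         # top-k routed regions per region
    # Gather keys/values of the routed regions: (s*s, topk*rh*rw, C)
    k_g = k[idx].reshape(s * s, topk * rh * rw, c)
    v_g = v[idx].reshape(s * s, topk * rh * rw, c)
    # Level 2: token-to-token attention restricted to the gathered tokens
    attn = softmax(q @ k_g.transpose(0, 2, 1) / np.sqrt(c))
    out = attn @ v_g                                      # (s*s, rh*rw, C)
    # Undo the region partition
    return out.reshape(s, s, rh, rw, c).transpose(0, 2, 1, 3, 4).reshape(h, w, c)
```

Each query attends to `topk * (H/s) * (W/s)` tokens instead of all `H * W`. Setting `topk = s * s` keeps every region and recovers ordinary global attention, which makes a convenient sanity check.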

Comparison of modules integrated into C3k2
Module	Box (P)	Pose (P)	Pose (R)	Pose (mAP@0.5)
DAttention	0.968	0.758	0.689	0.645
DWR	0.933	0.650	0.616	0.543
DLKA	0.962	0.725	0.596	0.536
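
C3k2-DAttention is based on deformable attention. In the Deformable Attention Transformer (DAT) formulation, keys and values are sampled with bilinear interpolation at a uniform grid of reference points shifted by learned offsets, so every query attends to a small set of data-dependent locations. The sketch below is a single-head NumPy simplification built on that assumption. Offsets come from one linear map of the queries (DAT uses a small convolutional offset network), relative position bias is omitted, and `wq`, `wk`, `wv`, `w_off` are arbitrary placeholder weights.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def bilinear_sample(feat, ys, xs):
    """Bilinearly sample feat (H, W, C) at float coordinates ys, xs (each shape (M,))."""
    h, w, _ = feat.shape
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[:, None]
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])


def deformable_attention(x, wq, wk, wv, w_off, stride=2, offset_range=1.0):
    """Single-head deformable attention sketch on x of shape (H, W, C)."""
    h, w, c = x.shape
    q = x.reshape(-1, c) @ wq                               # (H*W, C) queries
    # Uniform reference grid: one point every `stride` pixels
    gy, gx = np.meshgrid(np.arange(0, h, stride), np.arange(0, w, stride), indexing="ij")
    gy, gx = gy.ravel(), gx.ravel()
    # Offsets (dy, dx) predicted from the queries at the reference points, bounded by tanh
    offsets = np.tanh(q.reshape(h, w, c)[gy, gx] @ w_off) * offset_range   # (M, 2)
    # Keys/values come from features sampled at the shifted (deformed) points
    sampled = bilinear_sample(x, gy + offsets[:, 0], gx + offsets[:, 1])  # (M, C)
    k, v = sampled @ wk, sampled @ wv
    # Every query attends to the same M deformed sampling locations
    attn = softmax(q @ k.T / np.sqrt(c))                    # (H*W, M)
    return (attn @ v).reshape(h, w, c), offsets
```

With zero offsets the module reduces to attention over a fixed, strided grid; learned offsets let the sampling points drift toward informative regions, which is the receptive-field adjustment the abstract refers to.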

Comparison of feature-interaction modules in the backbone
Module	Box (P)	Pose (P)	Pose (R)	Pose (mAP@0.5)
AIFI	0.979	0.736	0.649	0.583
SPPF-LSKA	0.970	0.776	0.615	0.540
FocalModulation	0.966	0.703	0.611	0.522
AIFIRepBN	0.862	0.628	0.481	0.417
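
The AIFI module appears to follow the one introduced in RT-DETR. There, a Transformer encoder layer is applied to the flattened top-level feature map, and a fixed 2D sine-cosine positional encoding is added to the queries and keys. The sketch below assumes that design. It builds the encoding, with one quarter of the channels each for sin(x), cos(x), sin(y), and cos(y), and uses it in a single-head attention step with a residual connection. Layer normalization and the feed-forward sublayer are omitted, and all names are illustrative.

```python
import numpy as np


def sincos_pos_embed_2d(h, w, dim=256, temperature=10000.0):
    """Fixed 2D sine-cosine positional encoding for an h x w map, shape (h*w, dim)."""
    assert dim % 4 == 0, "dim must be divisible by 4 (sin/cos for x and y)"
    pos_dim = dim // 4
    omega = 1.0 / temperature ** (np.arange(pos_dim) / pos_dim)   # frequencies
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    out_x = xs.reshape(-1, 1) * omega[None]                         # (h*w, pos_dim)
    out_y = ys.reshape(-1, 1) * omega[None]
    return np.concatenate([np.sin(out_x), np.cos(out_x), np.sin(out_y), np.cos(out_y)], axis=1)


def aifi_like_block(feat, wq, wk, wv):
    """Single-head self-attention over all positions of feat (h, w, c), plus residual."""
    h, w, c = feat.shape
    tokens = feat.reshape(-1, c)
    pos = sincos_pos_embed_2d(h, w, c)
    # Positions are added to queries and keys only; values use the raw features
    q, k, v = (tokens + pos) @ wq, (tokens + pos) @ wk, tokens @ wv
    a = q @ k.T / np.sqrt(c)
    a = np.exp(a - a.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)
    return (tokens + a @ v).reshape(h, w, c)
```

Because the encoding is a fixed function of (x, y), it adds no parameters, which is consistent with the abstract's claim of low parameter cost.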

Comparison with other models
Method	Box (P)	Pose (P)	Pose (R)	Pose (mAP@0.5)
YOLOv11n-Pose	0.940	0.757	0.633	0.601
YOLOv8n-Pose	0.923	0.631	0.558	0.498
YOLOv12n	0.953	0.667	0.641	0.481
YOLOv13n	0.975	0.669	0.584	0.529
HyperYOLO-tiny	0.887	0.524	0.471	0.370
Mamba-YOLO-tiny	0.871	0.562	0.365	0.305
YOLO-AirPose (proposed)	0.970	0.807	0.679	0.669
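
The pose columns above are precision, recall, and mAP@0.5. In COCO-style keypoint evaluation, which the Ultralytics pose validator also follows, predictions are matched by object keypoint similarity (OKS) rather than box IoU. A prediction counts as correct at the 0.5 threshold if its OKS with an unmatched ground truth is at least 0.5. The paper's evaluation code is not shown here. The sketch below assumes COCO's 17 per-keypoint sigmas and computes OKS, plus precision and recall at one threshold using greedy score-ordered matching. Full AP would additionally integrate precision over recall.

```python
import numpy as np

# COCO per-keypoint sigmas for the 17-keypoint layout (assumed here)
COCO_SIGMAS = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72,
                        .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0


def oks(pred, gt, area, sigmas=COCO_SIGMAS):
    """OKS between one prediction and one ground truth, each (K, 3) = (x, y, v).

    Only ground-truth keypoints with v > 0 are scored; `area` is the object scale s^2.
    """
    d2 = np.sum((pred[:, :2] - gt[:, :2]) ** 2, axis=1)
    k2 = (2 * sigmas) ** 2
    e = d2 / k2 / (area + np.spacing(1)) / 2
    visible = gt[:, 2] > 0
    return float(np.mean(np.exp(-e[visible]))) if visible.any() else 0.0


def precision_recall_at(preds, scores, gts, areas, thr=0.5):
    """Match predictions in descending score order to the best unmatched GT with OKS >= thr."""
    matched, tp = set(), 0
    for i in np.argsort(-np.asarray(scores)):
        best_j, best = -1, thr
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            s = oks(preds[i], gt, areas[j])
            if s >= best:
                best_j, best = j, s
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    return tp / max(len(preds), 1), tp / max(len(gts), 1)
```

Because the per-keypoint tolerance scales with the square root of the object area, small people seen from a UAV leave only a few pixels of slack, so a small localization error costs far more OKS than it would for a ground-level subject.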

Ablation study (√: module included; -: module not included)
C3k2-DAttention	C2BRA	AIFI	Box (P)	Pose (P)	Pose (R)	Pose (mAP@0.5)
-	-	-	0.940	0.757	0.633	0.601
√	-	-	0.968	0.758	0.689	0.645
-	√	-	0.989	0.729	0.644	0.583
-	-	√	0.979	0.736	0.649	0.583
√	√	-	0.970	0.744	0.676	0.636
√	-	√	0.969	0.745	0.631	0.566
-	√	√	0.985	0.656	0.581	0.517
√	√	√	0.970	0.807	0.679	0.669
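
The gains quoted in the abstract are absolute differences, in percentage points, between the full model (all three modules) and the baseline row of the ablation table. The short check below recomputes them from those two rows and contrasts the absolute gain with the relative gain. `pp_gain` is an illustrative helper.

```python
# Baseline (no module) and full YOLO-AirPose rows from the ablation table
metrics = ["Box (P)", "Pose (P)", "Pose (R)", "Pose (mAP@0.5)"]
baseline = [0.940, 0.757, 0.633, 0.601]
full = [0.970, 0.807, 0.679, 0.669]


def pp_gain(new, old):
    """Absolute improvement in percentage points (not a relative percentage)."""
    return round((new - old) * 100, 1)


gains = {m: pp_gain(n, o) for m, n, o in zip(metrics, full, baseline)}
print(gains)  # {'Box (P)': 3.0, 'Pose (P)': 5.0, 'Pose (R)': 4.6, 'Pose (mAP@0.5)': 6.8}

# The same mAP@0.5 change expressed relative to the baseline value
rel_map = round((full[3] - baseline[3]) / baseline[3] * 100, 1)
print(rel_map)  # 11.3
```

A 6.8 percentage-point gain in mAP@0.5 is therefore about an 11.3% relative improvement over the baseline's 0.601.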
