Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (11): 3639-3646. DOI: 10.11772/j.issn.1001-9081.2023101379
• Frontier and Comprehensive Applications •
Received: 2023-10-13
Revised: 2024-01-16
Accepted: 2024-01-18
Online: 2024-11-13
Published: 2024-11-10
Contact: Hui YANG
About author: LIANG Ruiyan, born in 1998, M. S. candidate. His research interests include pose estimation and graph convolutional networks.
Abstract:
Conventional keypoint detection models built on the ViT (Vision Transformer) architecture usually adopt 2D sine position embedding, which tends to lose the key 2D shape information of an image and thus degrades accuracy; in action classification, the conventional Spatial Temporal Graph Convolutional Network (ST-GCN) suffers, under its single-label partition strategy, from missing correlations between joint connections that are not physically linked. To address these problems, a lightweight real-time fall detection framework was designed to detect falls quickly and accurately. The framework consists of a keypoint detection model, RPEpose (Relative Position Encoding pose estimation), and an action classification model, XJ-GCN (Cross-Joint attention Graph Convolutional Network). On the one hand, RPEpose adopts relative position encoding to overcome the position insensitivity of the original position embedding, improving the performance of the ViT architecture on keypoint detection; on the other hand, an X-Joint (Cross-Joint) attention mechanism is proposed: after reconstructing the partition strategy into an XJL (X-Joint Labeling) partition strategy, the dependencies between all joint connections are modeled, so that the latent correlations between joint connections are captured with excellent classification performance and a small number of parameters. Experimental results show that, on the COCO 2017 validation set, RPEpose requires only 8.2 GFLOPs (Giga FLOating Point operations) at an input resolution of 256×192 while achieving a test Average Precision (AP) of 74.3%; on the NTU RGB+D dataset under the cross-subject (X-Sub) split, XJ-GCN achieves a test Top-1 accuracy of 89.6%, and the proposed RPEpose+XJ-GCN framework runs at 30 frame/s with a prediction accuracy of 87.2%, demonstrating high real-time capability and accuracy.
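The page does not spell out RPE-I, the relative position encoding variant used by RPEpose; Table 1 below compares it against the bias and contextual modes of Wu et al.[18]. As a hedged illustration of the general mechanism only, the following PyTorch sketch adds a learnable bias, indexed by the 2D offset between two tokens, to the attention logits, which is what makes attention sensitive to relative image position. All names (`RelPosSelfAttention`, `bias_table`, etc.) are assumptions for the sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RelPosSelfAttention(nn.Module):
    """Self-attention over an h x w token grid with a learnable 2D relative
    position bias added to the attention logits (illustrative sketch)."""
    def __init__(self, dim: int, num_heads: int, h: int, w: int):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # One bias per head for each possible 2D offset: (2h-1) x (2w-1) offsets.
        self.bias_table = nn.Parameter(torch.zeros((2 * h - 1) * (2 * w - 1), num_heads))
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        coords = torch.stack([ys.flatten(), xs.flatten()])      # (2, N), N = h*w
        rel = coords[:, :, None] - coords[:, None, :]           # (2, N, N) offsets
        rel[0] += h - 1                                         # shift offsets to >= 0
        rel[1] += w - 1
        self.register_buffer("rel_index", rel[0] * (2 * w - 1) + rel[1])  # (N, N)

    def forward(self, x: torch.Tensor) -> torch.Tensor:        # x: (B, N, C)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]                        # (B, heads, N, d)
        attn = (q @ k.transpose(-2, -1)) * self.scale           # (B, heads, N, N)
        bias = self.bias_table[self.rel_index]                  # (N, N, heads)
        attn = (attn + bias.permute(2, 0, 1)).softmax(dim=-1)   # add relative bias
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

For a 256×192 input the token grid would be, for example, 64×48 under a 4×4 patch embedding (an assumption here); the contextual mode compared in Table 1 additionally makes the bias depend on the query content rather than on the offset alone.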
Ruiyan LIANG, Hui YANG. Lightweight fall detection algorithm framework based on RPEpose and XJ-GCN[J]. Journal of Computer Applications, 2024, 44(11): 3639-3646.
Tab. 1 Comparison of different position embeddings (unit: %)

| Position embedding | AP | AR |
|---|---|---|
| 2D sine position embedding | 71.7 | 77.1 |
| Bias mode | 72.9 | 77.4 |
| Contextual mode | 73.3 | 77.6 |
| RPE-I (ours) | 74.3 | 78.2 |
Tab. 2 Performance comparison of different keypoint detection models

| Model | Input resolution | Computational cost/GFLOPs | AP/% | AR/% |
|---|---|---|---|---|
| TransPose-H-A4[5] | 256×192 | 10.2 | 74.2 | 78.0 |
| CPN+[4] | 384×288 | 29.2 | 73.0 | 79.0 |
| AlphaPose[19] | 320×256 | 26.7 | 72.3 | — |
| Simple Baseline[20] | 384×288 | 35.6 | 72.3 | 79.0 |
| OpenPose[2] | — | — | 65.3 | — |
| YOLO-Pose[3] | 960×960 | — | 68.5 | 75.0 |
| OpenPifPaf[21] | — | — | 71.9 | — |
| RPEpose | 256×192 | 8.2 | 74.3 | 78.2 |
Tab. 3 Top-1 accuracy comparison of XJ-GCN on skeleton data of different dimensions

| Input dimension | Top-1 accuracy (X-Sub)/% | Top-1 accuracy (X-View)/% |
|---|---|---|
| 2D | 88.4 | 95.2 |
| 3D | 89.6 | 94.6 |
Tab. 4 Performance comparison of different models on the NTU RGB+D dataset

| Model | Parameters/MB | Top-1 accuracy (X-Sub)/% | Top-1 accuracy (X-View)/% |
|---|---|---|---|
| S-TR[22] | 3.1 | 86.8 | 93.8 |
| HCN[23] | 1.1 | 86.5 | 91.1 |
| ST-GCN[9] | 3.1 | 81.5 | 88.3 |
| 2s-AGCN[24] | 7.1 | 88.5 | 95.1 |
| AS-GCN[10] | 7.6 | 86.8 | 94.2 |
| SR-TSL[25] | 19.2 | 84.8 | 92.4 |
| AGC-LSTM[26] | 23.4 | 87.5 | 93.5 |
| VA-CNN[27] | 24.1 | 88.7 | 94.3 |
| CoST-GCN[11] | 3.1 | 86.0 | 93.4 |
| XJ-GCN | 1.4 | 89.6 | 94.6 |
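The X-Joint attention mechanism is described above only at the level of the abstract: after re-labeling with the XJL partition strategy, dependencies between all joint connections are modeled rather than only the physically adjacent ones. As a rough sketch of that idea (PyTorch; all names are hypothetical, and for brevity attention is computed over joints rather than the paper's joint connections), the module below builds a fully connected, data-dependent adjacency by attending over every joint pair:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossJointAttention(nn.Module):
    """Illustrative sketch: a data-dependent, fully connected joint adjacency,
    so physically unconnected joints (e.g. hand and foot) can still interact."""
    def __init__(self, in_channels: int, out_channels: int, embed: int = 64):
        super().__init__()
        self.query = nn.Conv2d(in_channels, embed, kernel_size=1)
        self.key = nn.Conv2d(in_channels, embed, kernel_size=1)
        self.value = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, V) skeleton features over T frames and V joints.
        q = self.query(x).mean(dim=2)                  # (B, E, V), pooled over time
        k = self.key(x).mean(dim=2)                    # (B, E, V)
        attn = torch.einsum("bev,bew->bvw", q, k) / (q.shape[1] ** 0.5)
        attn = F.softmax(attn, dim=-1)                 # (B, V, V) joint-pair weights
        v = self.value(x)                              # (B, C', T, V)
        # Each output joint aggregates features from all V joints.
        return torch.einsum("bctv,bwv->bctw", v, attn)
```

Because the attention matrix is recomputed from the input rather than fixed by the skeleton graph, correlations between non-physically-connected joints, which the single-label partition strategy of ST-GCN misses, can be captured, while the added parameter count stays small (three 1×1 convolutions).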
Tab. 5 Accuracy comparison of different fall detection algorithm frameworks (unit: %)

| Fall detection framework | Accuracy |
|---|---|
| OpenPose+CoST-GCN | 85.1 |
| OpenPose+XJ-GCN | 85.9 |
| OpenPifPaf+CoST-GCN | 85.8 |
| OpenPifPaf+XJ-GCN | 86.3 |
| RPEpose+CoST-GCN | 86.4 |
| RPEpose+XJ-GCN | 87.2 |
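The 30 frame/s figure refers to the end-to-end framework, in which RPEpose converts each frame to a skeleton and XJ-GCN classifies a sliding window of skeletons. A minimal sketch of such a two-stage loop is given below, assuming hypothetical `rpepose` and `xjgcn` callables (the authors' interfaces are not given on this page):

```python
import time
from collections import deque
import torch

def detect_falls(frames, rpepose, xjgcn, window: int = 30):
    """Two-stage loop: per-frame pose estimation feeding a sliding-window
    action classifier; yields (predicted class, instantaneous frame rate)."""
    skeletons = deque(maxlen=window)                   # most recent `window` skeletons
    for frame in frames:
        start = time.perf_counter()
        skeletons.append(rpepose(frame))               # stage 1: (V, 2) joint coordinates
        if len(skeletons) == window:
            clip = torch.stack(list(skeletons))        # (T, V, 2)
            clip = clip.permute(2, 0, 1).unsqueeze(0)  # (1, 2, T, V) as GCN input
            label = xjgcn(clip).argmax(dim=-1).item()  # stage 2: action class
            yield label, 1.0 / (time.perf_counter() - start)
```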
References

[1] PIERLEONI P, BELLI A, PALMA L, et al. A high reliability wearable device for elderly fall detection[J]. IEEE Sensors Journal, 2015, 15(8): 4544-4553.
[2] CAO Z, HIDALGO G, SIMON T, et al. OpenPose: realtime multi-person 2D pose estimation using part affinity fields[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(1): 172-186.
[3] MAJI D, NAGORI S, MATHEW M, et al. YOLO-Pose: enhancing YOLO for multi person pose estimation using object keypoint similarity loss[C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 2636-2645.
[4] CHEN Y, WANG Z, PENG Y, et al. Cascaded pyramid network for multi-person pose estimation[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7103-7112.
[5] YANG S, QUAN Z, NIE M, et al. TransPose: keypoint localization via Transformer[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 11782-11792.
[6] RAMACHANDRAN P, PARMAR N, VASWANI A, et al. Stand-alone self-attention in vision models[C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2019: 68-80.
[7] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[EB/OL]. [2023-10-11].
[8] LIN T-Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]// Proceedings of the 13th European Conference on Computer Vision. Cham: Springer, 2014: 740-755.
[9] YAN S, XIONG Y, LIN D. Spatial temporal graph convolutional networks for skeleton-based action recognition[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2018: 7444-7452.
[10] LI M, CHEN S, CHEN X, et al. Actional-structural graph convolutional networks for skeleton-based action recognition[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 3590-3598.
[11] HEDEGAARD L, HEIDARI N, IOSIFIDIS A. Continual spatio-temporal graph convolutional networks[J]. Pattern Recognition, 2023, 140: 109528.
[12] SHAHROUDY A, LIU J, NG T-T, et al. NTU RGB+D: a large scale dataset for 3D human activity analysis[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1010-1019.
[13] XU Y, ZHANG J, ZHANG Q, et al. ViTPose: simple vision Transformer baselines for human pose estimation[C]// Proceedings of the 36th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2022: 38571-38584.
[14] YUAN Y, FU R, HUANG L, et al. HRFormer: high-resolution vision Transformer for dense prediction[C]// Proceedings of the 35th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2021: 7281-7293.
[15] CAO J R, LYU J J, WU X Y, et al. Fall detection algorithm integrating motion features and deep learning[J]. Journal of Computer Applications, 2021, 41(2): 583-589.
[16] MA J Q, LEI H, CHEN M Y. Fall behavior detection algorithm for the elderly based on AlphaPose optimization model[J]. Journal of Computer Applications, 2022, 42(1): 294-301.
[17] DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2009: 248-255.
[18] WU K, PENG H, CHEN M, et al. Rethinking and improving relative position encoding for vision Transformer[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 10033-10041.
[19] FANG H-S, XIE S, TAI Y-W, et al. RMPE: regional multi-person pose estimation[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2353-2362.
[20] XIAO B, WU H, WEI Y. Simple baselines for human pose estimation and tracking[C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 472-487.
[21] KREISS S, BERTONI L, ALAHI A. OpenPifPaf: composite fields for semantic keypoint detection and spatio-temporal association[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(8): 13498-13511.
[22] PLIZZARI C, CANNICI M, MATTEUCCI M. Skeleton-based action recognition via spatial and temporal Transformer networks[J]. Computer Vision and Image Understanding, 2021, 208/209: 103219.
[23] LI C, ZHONG Q, XIE D, et al. Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation[EB/OL]. [2023-08-22].
[24] SHI L, ZHANG Y, CHENG J, et al. Two-stream adaptive graph convolutional networks for skeleton-based action recognition[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 12018-12027.
[25] SI C, JING Y, WANG W, et al. Skeleton-based action recognition with spatial reasoning and temporal stack learning[C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 106-121.
[26] SI C, CHEN W, WANG W, et al. An attention enhanced graph convolutional LSTM network for skeleton-based action recognition[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 1227-1236.
[27] ZHANG P, LAN C, XING J, et al. View adaptive neural networks for high performance skeleton-based human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1963-1978.