Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (11): 3639-3646. DOI: 10.11772/j.issn.1001-9081.2023101379

• Frontier and Comprehensive Applications •

Lightweight fall detection algorithm framework based on RPEpose and XJ-GCN

LIANG Ruiyan, YANG Hui

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, China
  • Received: 2023-10-13; Revised: 2024-01-16; Accepted: 2024-01-18; Online: 2024-11-13; Published: 2024-11-10
  • Corresponding author: YANG Hui
  • About the author: LIANG Ruiyan, born in 1998 in Foshan, Guangdong, M. S. candidate. His research interests include pose estimation and graph convolutional networks.
  • Supported by:
    Open Fund of the State Key Laboratory of Advanced Optical Communication Systems and Networks

Lightweight fall detection algorithm framework based on RPEpose and XJ-GCN

Ruiyan LIANG, Hui YANG

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 610031, China
  • Received: 2023-10-13; Revised: 2024-01-16; Accepted: 2024-01-18; Online: 2024-11-13; Published: 2024-11-10
  • Contact: Hui YANG
  • About author: LIANG Ruiyan, born in 1998, M. S. candidate. His research interests include pose estimation and graph convolutional networks.
  • Supported by:
    Open Fund of State Key Laboratory of Advanced Optical Communication Systems and Networks

Abstract:

Traditional joint keypoint detection models that take the ViT (Vision Transformer) model as the baseline architecture usually adopt 2D sine position embedding, which easily loses the key two-dimensional shape information of the image and causes a drop in accuracy; among behavior classification models, the traditional Spatio-Temporal Graph Convolutional Network (ST-GCN) suffers, under the uni-labeling partitioning strategy, from missing correlations between joint connections that are not physically connected. To address the above problems, a lightweight real-time fall detection algorithm framework was designed to detect fall behavior quickly and accurately. The framework consists of a joint keypoint detection model, RPEpose (Relative Position Encoding pose estimation), and a behavior classification model, XJ-GCN (Cross-Joint attention Graph Convolutional Network). On the one hand, the RPEpose model adopts relative position encoding to overcome the position insensitivity of the original position encoding and improve the performance of the ViT architecture in joint keypoint detection. On the other hand, an X-Joint (Cross-Joint) attention mechanism is proposed: after the partitioning strategy is reconstructed into the XJL (X-Joint Labeling) partitioning strategy, the dependencies among all joint connections are modeled, so that the latent correlations between joint connections can be captured, with the advantages of excellent classification performance and a small number of parameters. Experimental results show that, on the COCO 2017 validation set, the computational overhead of the RPEpose model is only 8.2 GFLOPs (Giga FLOating Point Operations) for images with a resolution of 256×192, with a test Average Precision (AP) of 74.3%; on the NTU RGB+D dataset partitioned by Cross-Subject (X-Sub), the test Top-1 accuracy of the XJ-GCN model is 89.6%, and the proposed RPEpose+XJ-GCN framework reaches a processing speed of 30 frame/s with a prediction accuracy of 87.2%, showing high real-time performance and accuracy.
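To make the relative position encoding idea concrete, below is a minimal sketch, assuming a learnable 2D relative position bias added to the attention scores, which is one common way to realize relative position encoding in ViT-style backbones; the class name, the bias-table formulation and the fixed h×w token grid are illustrative assumptions, not the released RPEpose implementation.

```python
# Minimal sketch (not the paper's code): multi-head self-attention with a learnable
# 2D relative position bias, replacing absolute 2D sine embeddings.
import torch
import torch.nn as nn

class RelPosAttention(nn.Module):
    def __init__(self, dim, num_heads, h, w):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # One learnable bias per head for every possible 2D offset: (2h-1)*(2w-1) entries.
        self.bias_table = nn.Parameter(torch.zeros((2 * h - 1) * (2 * w - 1), num_heads))
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        coords = torch.stack([ys.flatten(), xs.flatten()])           # (2, N), N = h*w
        rel = coords[:, :, None] - coords[:, None, :]                # (2, N, N) pairwise offsets
        rel[0] += h - 1                                              # shift offsets to be >= 0
        rel[1] += w - 1
        self.register_buffer("index", rel[0] * (2 * w - 1) + rel[1]) # (N, N) flat offset id

    def forward(self, x):                                            # x: (B, N, C), N = h*w tokens
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]                             # each (B, heads, N, d)
        attn = (q @ k.transpose(-2, -1)) * self.scale                # content-content term
        bias = self.bias_table[self.index].permute(2, 0, 1)          # (heads, N, N) relative term
        attn = (attn + bias).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

Because the bias depends only on the 2D offset between two token positions, the spatial layout of the feature map is preserved inside the attention computation, which is the property an absolute sine embedding does not guarantee.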

Key words: fall detection, joint keypoint detection, relative position encoding, spatio-temporal graph convolutional network, attention mechanism

Abstract:

The traditional joint keypoint detection models based on the Vision Transformer (ViT) architecture usually adopt 2D sine position embedding, which is prone to losing key two-dimensional shape information of the image, leading to a decrease in accuracy. Among behavior classification models, the traditional Spatio-Temporal Graph Convolutional Network (ST-GCN) suffers from missing correlations between non-physically connected joint connections under the uni-labeling partitioning strategy. To address the above problems, a lightweight real-time fall detection algorithm framework was designed to detect fall behavior quickly and accurately. The framework contains a joint keypoint detection model, RPEpose (Relative Position Encoding pose estimation), and a behavior classification model, XJ-GCN (Cross-Joint attention Graph Convolutional Network). On the one hand, relative position encoding was adopted by the RPEpose model to overcome the position insensitivity of the original position encoding and improve the performance of the ViT architecture in joint keypoint detection. On the other hand, an X-Joint (Cross-Joint) attention mechanism was proposed: after the partitioning strategy was reconstructed into the XJL (X-Joint Labeling) partitioning strategy, the dependencies between all joint connections were modelled to obtain the potential correlations between joint connections, giving excellent classification performance with few parameters. Experimental results indicate that, on the COCO 2017 validation set, the RPEpose model requires only 8.2 GFLOPs (Giga FLOating Point Operations) of computational overhead while achieving a test Average Precision (AP) of 74.3% for images with a resolution of 256×192; on the NTU RGB+D dataset with Cross-Subject (X-Sub) as the partitioning standard, the Top-1 test accuracy of the XJ-GCN model is 89.6%, and the proposed RPEpose+XJ-GCN framework achieves a prediction accuracy of 87.2% at a processing speed of 30 frame/s, verifying its high real-time performance and accuracy.
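As a rough illustration of how attention over all joint pairs can supply the links missing between non-physically connected joints in an ST-GCN-style layer, here is a minimal sketch; the layer name, the single-partition setup, and the additive fusion of the learned attention map with the physical adjacency are assumptions for illustration, not the published XJ-GCN/XJL definition.

```python
# Illustrative only: a spatial graph convolution whose adjacency is augmented with a
# feature-driven attention map over all joint pairs, so that joints without a physical
# (skeletal) connection can still exchange information.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossJointGCNLayer(nn.Module):
    def __init__(self, in_channels, out_channels, num_joints, embed=16):
        super().__init__()
        self.theta = nn.Conv2d(in_channels, embed, kernel_size=1)    # per-joint "query" features
        self.phi = nn.Conv2d(in_channels, embed, kernel_size=1)      # per-joint "key" features
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        # Normalized physical skeleton adjacency; an identity placeholder in this sketch.
        self.register_buffer("A_phys", torch.eye(num_joints))

    def forward(self, x):                                            # x: (B, C, T, V)
        q = self.theta(x).mean(dim=2)                                # (B, e, V), pooled over time
        k = self.phi(x).mean(dim=2)                                  # (B, e, V)
        attn = F.softmax(torch.einsum("bev,bew->bvw", q, k), dim=-1) # (B, V, V) joint-pair attention
        A = self.A_phys + attn                                       # physical + learned connections
        y = torch.einsum("bctv,bvw->bctw", self.conv(x), A)          # graph convolution
        return F.relu(y)
```

In this sketch the physical skeleton adjacency is a fixed buffer, while the data-dependent attention term lets any pair of joints (for example hands and feet during a fall) interact directly, which is the intuition behind modelling dependencies between all joint connections.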

Key words: fall detection, joint keypoint detection, relative position encoding, Spatio-Temporal Graph Convolutional Network (ST-GCN), attention mechanism
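Finally, a hypothetical end-to-end loop shows how a pose estimator and a skeleton-based classifier could be chained for real-time fall detection as described in the abstract; `pose_model`, `action_model`, `FALL_CLASS` and the 30-frame window are placeholder assumptions, not the authors' interface.

```python
# Hypothetical pipeline sketch: an RPEpose-style estimator feeds a sliding window of
# skeletons to an XJ-GCN-style classifier, one decision per incoming frame.
from collections import deque
import torch

WINDOW = 30                        # frames per classification window (assumed)
buffer = deque(maxlen=WINDOW)      # rolling buffer of per-frame joint coordinates

def process_frame(frame, pose_model, action_model, FALL_CLASS=0):
    """Return True if the current window is classified as a fall."""
    with torch.no_grad():
        joints = pose_model(frame)                 # (V, 2) keypoints for one person (assumed API)
        buffer.append(joints)
        if len(buffer) < WINDOW:
            return False                           # not enough temporal context yet
        clip = torch.stack(list(buffer))           # (T, V, 2) skeleton sequence
        clip = clip.permute(2, 0, 1).unsqueeze(0)  # (1, C=2, T, V) ST-GCN-style layout
        logits = action_model(clip)                # (1, num_classes)
        return logits.argmax(dim=-1).item() == FALL_CLASS
```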

CLC number: