Journal of Computer Applications


Action recognition based on fusion of temporal convolution and multi-dimensional attention

LI Yuchen, LI Wanggen, WANG Cheng, GAO Shangshu, ZHANG Chunsheng   

  1. School of Computer & Information, Anhui Normal University
  • Received: 2025-09-24 Revised: 2025-12-16 Online: 2025-12-30 Published: 2025-12-30
  • About author: LI Yuchen, born in 1999, M.S. candidate. His research interests include action recognition. LI Wanggen, born in 1973, Ph.D., professor. His research interests include biological computing and computational intelligence. WANG Cheng, born in 1989, Ph.D. His research interests include intelligent analysis of medical images. GAO Shangshu, born in 2000, M.S. candidate. His research interests include depression screening. ZHANG Chunsheng, born in 2001, M.S. candidate. His research interests include human pose estimation.
  • Supported by:
    National Natural Science Foundation of China (61976006)

Action recognition based on fusion of temporal convolution and multi-dimensional attention

LI Yuchen, LI Wanggen, WANG Cheng, GAO Shangshu, ZHANG Chunsheng   

  1. School of Computer & Information, Anhui Normal University
  • Corresponding author: LI Wanggen
  • About author: LI Yuchen (born 1999 in Suzhou, Anhui), male, M.S. candidate; his research interests include action recognition. LI Wanggen (born 1973 in Taihu, Anhui), male, Ph.D., professor; his research interests include biological computing and computational intelligence. WANG Cheng (born 1989 in Lu'an, Anhui), male, Ph.D.; his research interests include intelligent analysis of medical images. GAO Shangshu (born 2000 in Suzhou, Anhui), male, M.S. candidate; his research interests include depression screening. ZHANG Chunsheng (born 2001 in Chizhou, Anhui), male, M.S. candidate; his research interests include human pose estimation.
  • Supported by:
    National Natural Science Foundation of China (61976006)

Abstract: To address the issues of insufficient temporal representation and inadequate feature aggregation in most existing skeleton-based action recognition algorithms, a skeleton-based action recognition method using dual-branch temporal convolution and multi-dimensional attention was developed. Firstly, progressive cross-scale temporal convolution (PCST-Conv) was employed to extract local temporal dependencies of actions, which reduced model parameters while improving recognition performance. Secondly, a dual-branch parallel temporal block (DBPT-Block) was utilized to capture both long- and short-term temporal dependencies effectively through a parallel temporal convolution architecture. Meanwhile, a cross-temporal adaptive fusion (CTAF) module was introduced to enhance the representation of key action frames through dynamic weight allocation, addressing the problem that traditional multi-scale fusion tends to ignore temporal variations across different action moments. Finally, hybrid multi-dimensional attention (HMDA) was applied to efficiently aggregate multi-dimensional features and further optimize feature representation. Experimental results showed that the method achieved accuracies of 91.7% on the cross-subject (CS) benchmark and 96.0% on the cross-view (CV) benchmark of the NTU-RGB+D 60 dataset. On the NTU-RGB+D 120 dataset, 87.4% accuracy was achieved on the cross-subject (CS-120) benchmark and 88.3% on the cross-setup (SS-120) benchmark. Moreover, compared with existing mainstream methods, higher recognition accuracy was achieved with fewer parameters, demonstrating significant advantages in both accuracy and efficiency. 
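The core idea the abstract describes — two parallel temporal-convolution branches whose outputs are fused by dynamically allocated per-frame weights — can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, kernel sizes, and dilation choices below are assumptions, and the sketch stands in for one channel-and-joint slice of a skeleton sequence rather than the full (batch, channel, frame, joint) tensor the method operates on.

```python
import numpy as np

# Hypothetical sketch of a dual-branch temporal block: one branch with a small
# temporal kernel for short-range dependencies, one dilated branch for longer
# range, fused by per-frame softmax weights (the "dynamic weight allocation"
# described in the abstract). All names and hyperparameters are illustrative.

def temporal_conv(x, kernel, dilation=1):
    """Depthwise 1-D convolution along the time axis.
    x: (C, T) features for one joint; kernel: (K,) weights shared across channels."""
    K = len(kernel)
    pad = dilation * (K - 1) // 2          # "same" padding for odd K
    xp = np.pad(x, ((0, 0), (pad, pad)))
    T = x.shape[1]
    out = np.zeros_like(x)
    for k in range(K):
        out += kernel[k] * xp[:, k * dilation : k * dilation + T]
    return out

def dual_branch_fuse(x, w_short, w_long, gate):
    """Run both temporal branches and fuse with per-frame softmax weights.
    gate: (2, T) unnormalized scores, softmaxed over the branch axis so the
    two branch contributions sum to 1 at every frame."""
    short = temporal_conv(x, w_short, dilation=1)   # local dependencies
    long_ = temporal_conv(x, w_long, dilation=2)    # dilated, longer range
    a = np.exp(gate - gate.max(axis=0, keepdims=True))
    a = a / a.sum(axis=0, keepdims=True)            # (2, T) fusion weights
    return a[0] * short + a[1] * long_

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))                    # C=8 channels, T=16 frames
y = dual_branch_fuse(x, rng.standard_normal(3), rng.standard_normal(3),
                     rng.standard_normal((2, 16)))
print(y.shape)  # (8, 16): same temporal length as the input
```

Because the gate is softmaxed per frame, the block can emphasize the short-range branch at fast transitions and the dilated branch elsewhere, which is one plausible reading of how key action frames get strengthened relative to a fixed multi-scale average.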

Key words: action recognition, skeleton sequence, adaptive fusion, multi-dimensional attention, dual-branch temporal convolution

Abstract: Aiming at the problems of insufficient temporal representation capability and inadequate feature aggregation in most existing skeleton-based human action recognition algorithms, a skeleton-based action recognition method using dual-branch temporal convolution and multi-dimensional attention was proposed. Firstly, progressive cross-scale temporal convolution (PCST-Conv) was used to extract the local temporal dependencies of actions, improving recognition performance while reducing the number of model parameters. Secondly, a dual-branch parallel temporal block (DBPT-Block) was employed to effectively capture the temporal dependencies of both long- and short-range actions. Meanwhile, a cross-temporal adaptive fusion (CTAF) module was introduced to strengthen the representation of key action frames through dynamic weight allocation, solving the problem that traditional multi-scale fusion ignores the differences between different moments of an action. Finally, hybrid multi-dimensional attention (HMDA) was used to efficiently aggregate multi-dimensional features and further optimize feature representation. Experimental results show that the method achieves 91.7% on the cross-subject (CS) benchmark and 96.0% on the cross-view (CV) benchmark of the NTU-RGB+D 60 dataset, and recognition accuracies of 87.4% and 88.3% on the cross-subject (CS-120) and cross-setup (SS-120) benchmarks of the NTU-RGB+D 120 dataset, respectively. Moreover, compared with existing mainstream methods, the method achieves higher recognition accuracy with fewer parameters, showing significant advantages in both accuracy and efficiency.

Key words: action recognition, skeleton sequence, adaptive fusion, multi-dimensional attention, dual-branch temporal convolution

CLC Number: