Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (S1): 15-18. DOI: 10.11772/j.issn.1001-9081.2022060836

• Artificial Intelligence •

• About the authors: DENG Fan, born in 1988 in Shanghang, Fujian, senior engineer, M.S.; her research interests include power systems and automation, and artificial intelligence.
    ZENG Yuan, born in 1992 in Xingcheng, Liaoning, engineer, M.S.; his research interests include power systems and automation, and artificial intelligence.
    LIU Bowen, born in 1989 in Mudanjiang, Heilongjiang, engineer, M.S.; his research interests include power systems and automation, and artificial intelligence.
    JIANG Boyuan, born in 1998 in Binzhou, Shandong, M.S.; his research interests include computer graphics and human pose estimation.
    ZHONG Chongyang, born in 1994 in Chongqing, Ph.D.; his research interests include computer graphics and human motion simulation. zhongchongyang@ict.ac.cn
    XIA Shihong, born in 1974 in Pingchang, Sichuan, research professor, Ph.D.; his research interests include computer graphics, virtual reality, and artificial intelligence.

Gait recognition model based on temporal feature aggregation with Transformer

Fan DENG1, Yuan ZENG1, Bowen LIU1, Boyuan JIANG2,3, Chongyang ZHONG2,3(), Shihong XIA2,3   

  1. State Grid Beijing Urban Power Supply Company, Beijing 110102, China
    2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    3. School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received:2022-06-10 Revised:2022-09-01 Accepted:2022-09-05 Online:2023-07-04 Published:2023-06-30
  • Contact: Chongyang ZHONG


Abstract:

Gait recognition is one of the most promising video-based biometric technologies. Currently, most gait recognition methods focus on improving the ability of neural networks to extract spatial features while neglecting feature aggregation in the temporal dimension. To address this lack of temporal feature extraction ability in gait recognition, a gait recognition model based on temporal feature aggregation with Transformer was proposed. Firstly, features were extracted from gait silhouette sequences by convolutional neural networks and combined with positional encoding. Then, temporal features were aggregated along the temporal dimension by a Transformer encoder. Finally, gait recognition was performed by a linear classification layer. Experiments were conducted on CASIA-B, the most popular gait recognition dataset. The recognition accuracy of the proposed model is 3.4 percentage points higher than that of the GaitSet model on NM#5-6, 1.5 percentage points higher on BG#1-2, and 11.6 percentage points higher on CL#1-2. Experimental results show that the Transformer improves the network's ability to aggregate temporal features and reduces the model's sensitivity to coats and carried objects.
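The pipeline described above (per-frame CNN features from silhouettes, positional encoding, a Transformer encoder aggregating over time, then a linear classifier) can be sketched in PyTorch. All layer sizes, depths, the learnable positional encoding, and the mean pooling over time are illustrative assumptions, not the paper's actual configuration:

```python
import torch
import torch.nn as nn

class GaitTemporalTransformer(nn.Module):
    """Sketch of the described pipeline: per-frame CNN features from
    silhouettes + positional encoding, aggregated over the temporal
    dimension by a Transformer encoder, then a linear classifier.
    Hyperparameters here are placeholders, not the paper's values."""

    def __init__(self, num_classes=74, d_model=128, max_len=64):
        super().__init__()
        # Per-frame feature extractor, applied to each silhouette frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, d_model),
        )
        # Learnable positional encoding over the temporal dimension.
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):  # x: (batch, frames, 1, height, width)
        b, t = x.shape[:2]
        # Extract per-frame features, restore the temporal axis.
        f = self.cnn(x.flatten(0, 1)).view(b, t, -1)  # (b, t, d_model)
        f = f + self.pos[:, :t]          # add positional encoding
        f = self.encoder(f)              # temporal aggregation
        return self.head(f.mean(dim=1))  # pool over time, classify

model = GaitTemporalTransformer()
# A batch of 2 sequences, 30 frames each, 64x44 binary silhouettes.
logits = model(torch.randn(2, 30, 1, 64, 44))
print(logits.shape)  # torch.Size([2, 74])
```

Because the Transformer encoder attends across all frames, the aggregated representation is not limited to adjacent-frame relations, which is the property the abstract credits for the reduced sensitivity to clothing and carried objects.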

Key words: gait recognition, neural network, feature extraction, Transformer, positional encoding

CLC number: