Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (S1): 15-18. DOI: 10.11772/j.issn.1001-9081.2022060836

• Artificial Intelligence •

• About the authors: DENG Fan, born in 1988 in Shanghang, Fujian, senior engineer, M.S.; her research interests include power systems and automation, and artificial intelligence.
    ZENG Yuan, born in 1992 in Xingcheng, Liaoning, engineer, M.S.; his research interests include power systems and automation, and artificial intelligence.
    LIU Bowen, born in 1989 in Mudanjiang, Heilongjiang, engineer, M.S.; his research interests include power systems and automation, and artificial intelligence.
    JIANG Boyuan, born in 1998 in Binzhou, Shandong, M.S.; his research interests include computer graphics and human pose estimation.
    ZHONG Chongyang, born in 1994 in Chongqing, Ph.D.; his research interests include computer graphics and human motion simulation. zhongchongyang@ict.ac.cn
    XIA Shihong, born in 1974 in Pingchang, Sichuan, research professor, Ph.D.; his research interests include computer graphics, virtual reality, and artificial intelligence.

Gait recognition model based on temporal feature aggregation with Transformer

Fan DENG1, Yuan ZENG1, Bowen LIU1, Boyuan JIANG2,3, Chongyang ZHONG2,3(), Shihong XIA2,3   

  1. State Grid Beijing Urban Power Supply Company, Beijing 110102, China
    2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    3. School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received:2022-06-10 Revised:2022-09-01 Accepted:2022-09-05 Online:2023-07-04 Published:2023-06-30
  • Contact: Chongyang ZHONG


Abstract:

Gait recognition is one of the most promising video-based biometric technologies. Currently, most gait recognition methods focus on improving the ability of neural networks to extract spatial features while neglecting feature aggregation in the temporal dimension. To address this lack of temporal feature extraction ability in gait recognition, a gait recognition model based on temporal feature aggregation with Transformer was proposed. Firstly, features were extracted from gait silhouette sequences by convolutional neural networks and combined with positional encoding. Then, temporal features were aggregated along the temporal dimension by a Transformer encoder. Finally, gait recognition was performed by a linear classification layer. Experiments were conducted on CASIA-B, the most popular gait recognition dataset. The recognition accuracy of the proposed model is 3.4 percentage points higher than that of the GaitSet model on NM#5-6, 1.5 percentage points higher on BG#1-2, and 11.6 percentage points higher on CL#1-2. Experimental results show that the Transformer improves the network's ability to aggregate temporal features and reduces the model's sensitivity to coats and carried objects.
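The pipeline described above (per-frame CNN features from silhouettes, positional encoding, a Transformer encoder aggregating over time, then a linear classifier) can be sketched in PyTorch. All layer sizes, depths, the learnable positional encoding, and the mean pooling over time are illustrative assumptions, not the paper's actual configuration:

```python
import torch
import torch.nn as nn

class GaitTemporalTransformer(nn.Module):
    """Sketch of the described pipeline: per-frame CNN features from
    silhouettes + positional encoding, aggregated over the temporal
    dimension by a Transformer encoder, then a linear classifier.
    Hyperparameters here are placeholders, not the paper's values."""

    def __init__(self, num_classes=74, d_model=128, max_len=64):
        super().__init__()
        # Per-frame feature extractor, applied to each silhouette frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, d_model),
        )
        # Learnable positional encoding over the temporal dimension.
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):  # x: (batch, frames, 1, height, width)
        b, t = x.shape[:2]
        # Extract per-frame features, restore the temporal axis.
        f = self.cnn(x.flatten(0, 1)).view(b, t, -1)  # (b, t, d_model)
        f = f + self.pos[:, :t]          # add positional encoding
        f = self.encoder(f)              # temporal aggregation
        return self.head(f.mean(dim=1))  # pool over time, classify

model = GaitTemporalTransformer()
# A batch of 2 sequences, 30 frames each, 64x44 binary silhouettes.
logits = model(torch.randn(2, 30, 1, 64, 44))
print(logits.shape)  # torch.Size([2, 74])
```

Because the Transformer encoder attends across all frames, the aggregated representation is not limited to adjacent-frame relations, which is the property the abstract credits for the reduced sensitivity to clothing and carried objects.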

Key words: gait recognition, neural network, feature extraction, Transformer, positional encoding

CLC number: