Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (3): 728-735. DOI: 10.11772/j.issn.1001-9081.2022010034
Special topic: Artificial Intelligence
Received: 2022-01-13
Revised: 2022-03-10
Accepted: 2022-03-14
Online: 2022-05-31
Published: 2023-03-10
Contact: Xiaoyan JIANG
About author: YAO Yingmao, born in 1997 in Mengzhou, Henan, M. S. candidate. His research interests include person re-identification.
Abstract:
To address the poor performance of video-based person re-identification caused by occlusion, spatial misalignment, and background clutter in cross-camera network videos, a video-based person re-identification method built on Graph Convolutional Network (GCN) and Self-Attention Graph Pooling (SAGP) was proposed. First, the correlations between different regions across video frames were mined through patch relation graph modeling, and GCN was used to refine the region features of each frame, alleviating occlusion and misalignment. Then, the SAGP mechanism removed regions that contribute little to the pedestrian representation, avoiding interference from background clutter. Finally, a weighted loss strategy was proposed: center loss was used to refine the classification results, and Online soft mining and Class-aware attention Loss (OCL) was used to address the under-utilization of available samples during hard-sample mining. Experimental results show that on the MARS dataset, compared with the second-best method AITL, the proposed method improves mean Average Precision (mAP) and Rank-1 by 1.3 and 2.0 percentage points respectively. The proposed method makes effective use of the spatio-temporal information in videos to extract more discriminative pedestrian features and improves the performance of person re-identification.
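The two core steps of the pipeline (GCN message passing over a region graph, then self-attention graph pooling that keeps only the highest-scoring region nodes) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the region count, feature dimensions, random graph, and weight matrices are all assumptions.

```python
import numpy as np

def gcn_layer(X, A, W):
    """One graph-convolution step (Kipf & Welling style): symmetrically
    normalize the adjacency with self-loops, propagate region features,
    then apply a linear map and ReLU."""
    A_hat = A + np.eye(A.shape[0])                       # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = d_inv_sqrt @ A_hat @ d_inv_sqrt             # D^-1/2 (A+I) D^-1/2
    return np.maximum(A_norm @ X @ W, 0.0)               # ReLU

def sag_pool(X, A, w, ratio=0.25):
    """Self-attention graph pooling: score each node with a 1-d GCN
    projection, keep the top ceil(ratio*N) nodes, gate their features
    by the attention score, and slice the adjacency accordingly."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    scores = np.tanh(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ w)  # (N,)
    k = max(1, int(np.ceil(ratio * X.shape[0])))
    keep = np.argsort(scores)[::-1][:k]                  # top-k region indices
    return X[keep] * scores[keep, None], A[np.ix_(keep, keep)], keep

# Toy example: 16 region nodes (e.g. a few frames split into parts),
# 8-dim features, a random symmetric region graph.
rng = np.random.default_rng(0)
N, F = 16, 8
X = rng.normal(size=(N, F))
A = (rng.random((N, N)) < 0.3).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0.0)

H = gcn_layer(X, A, rng.normal(size=(F, F)))
H_pool, A_pool, keep = sag_pool(H, A, rng.normal(size=F), ratio=0.25)
print(H_pool.shape, A_pool.shape)   # (4, 8) (4, 4)
```

With `ratio=0.25` (matching the best pooling ratio r=25 in Table 4), 4 of the 16 region nodes survive pooling; in the paper's setting the discarded nodes correspond to background-clutter regions.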
Yingmao YAO, Xiaoyan JIANG. Video-based person re-identification method based on graph convolution network and self-attention graph pooling[J]. Journal of Computer Applications, 2023, 43(3): 728-735.
| Method | MARS mAP | MARS R1 | MARS R5 | MARS R20 | Duke mAP | Duke R1 | Duke R5 | Duke R20 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CNN+CQDA | 47.6 | 65.3 | 82.0 | 89.0 | — | — | — | — |
| TAM+SRM | 50.7 | 70.6 | 90.0 | 97.6 | — | — | — | — |
| SSA+CASE | 76.1 | 86.3 | 94.7 | 98.2 | — | — | — | — |
| 3DCNN+NLA | 79.5 | 88.6 | 96.4 | 98.8 | 93.7 | 95.5 | 99.7 | — |
| COSAM | 79.9 | 84.9 | 95.5 | 97.9 | 94.1 | 95.4 | — | — |
| STA | 80.8 | 86.3 | 95.7 | 98.1 | 94.6 | — | — | 99.6 |
| MA | 80.9 | 87.3 | — | — | 94.8 | 96.7 | — | — |
| STE-NVAN | 81.2 | 88.9 | — | — | 93.5 | 95.2 | — | — |
| VKD | 83.1 | — | 96.8 | — | 93.5 | 95.2 | 98.6 | — |
| AITL | — | 88.2 | 96.5 | — | 95.4 | — | 99.6 | 99.9 |
| Proposed | 85.7 | 90.2 | 96.7 | 98.1 | 95.8 | 96.7 | — | 99.9 |

Tab. 1 Comparison of different methods on MARS and DukeMTMC-VideoReID ("Duke") datasets (unit: %)
| Model | mAP | R1 | R5 | R20 |
| --- | --- | --- | --- | --- |
| Baseline | 84.2 | 88.7 | 96.0 | 97.7 |
| Baseline+GCN | 85.3 | 88.7 | 96.6 | 98.3 |
| Baseline+GCN+SAGP | 85.4 | 89.2 | 96.4 | 98.2 |
| Baseline+CL+OCL | 85.1 | 88.9 | 96.7 | 98.3 |
| Baseline+GCN+SAGP+CL+OCL | 85.7 | 90.2 | 96.7 | 98.1 |

Tab. 2 Ablation experimental results on MARS dataset (unit: %)
| Number of split parts | mAP | R1 | R5 | R20 |
| --- | --- | --- | --- | --- |
| 2 | 84.8 | 88.8 | 96.1 | 98.1 |
| 4 | 85.7 | 90.2 | 96.7 | 98.1 |
| 8 | 85.2 | 89.0 | 96.1 | 98.4 |

Tab. 3 Comparison of feature segmentation strategies (unit: %)
| Pooling ratio r/% | mAP | R1 | R5 | R20 |
| --- | --- | --- | --- | --- |
| 10 | 85.3 | 89.5 | 96.2 | 98.1 |
| 20 | 85.1 | 89.2 | 96.2 | 98.1 |
| 25 | 85.7 | 90.2 | 96.7 | 98.1 |
| 30 | 85.3 | 89.6 | 96.5 | 98.1 |
| 50 | 85.2 | 89.1 | 96.2 | 98.0 |
| 75 | 85.2 | 89.3 | 96.3 | 98.2 |
| 90 | 85.3 | 89.0 | 96.6 | 98.3 |

Tab. 4 Comparative experimental results of graph pooling ratio (unit: %)
| λ | mAP | R1 | R5 | R20 |
| --- | --- | --- | --- | --- |
| 10 | 85.1 | 89.2 | 96.3 | 98.2 |
| 30 | 85.5 | 88.8 | 96.2 | 98.1 |
| 40 | 85.3 | 89.0 | 96.5 | 98.1 |
| 50 | 85.7 | 90.2 | 96.7 | 98.1 |
| 60 | 84.7 | 89.1 | 96.1 | 98.2 |
| 70 | 85.2 | 89.4 | 96.1 | 97.9 |
| 90 | 84.4 | 89.1 | 96.4 | 98.1 |

Tab. 5 Comparison on weighting parameters of loss function (unit: %)
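Table 5 sweeps the weight λ applied to the auxiliary loss terms. The abstract combines a classification loss with center loss and OCL; the following is an assumption-laden sketch of the center-loss component and a hypothetical weighted sum (the paper's exact combination formula and the OCL term are not reproduced here):

```python
import numpy as np

def center_loss(features, labels, centers):
    """Center loss (Wen et al.): half the mean squared distance between
    each sample's feature and its class center; pulls same-identity
    features together to sharpen the classification result."""
    diff = features - centers[labels]
    return 0.5 * float(np.mean(np.sum(diff ** 2, axis=1)))

def total_loss(ce, cl, ocl, lam):
    """Hypothetical weighted combination of the three loss terms;
    lam plays the role of the weight swept in Table 5."""
    return ce + lam * cl + ocl

# Toy example: three 2-d features, two identities.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
labels = np.array([0, 0, 1])
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
cl = center_loss(feats, labels, centers)
print(round(cl, 6))   # 0.003333
```

Only the second sample deviates from its center, so the loss is small; in training, the centers themselves are also updated as features drift.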
[1] YE Y, WANG Z, LIANG C, et al. A survey on multi-source person re-identification[J]. Acta Automatica Sinica, 2020, 46(9): 1869-1884. (in Chinese) DOI: 10.16383/j.aas.c190278
[2] HAN J D, LI X Y. Pedestrian re-identification method based on multi-scale feature fusion[J]. Journal of Computer Applications, 2021, 41(10): 2991-2996. (in Chinese) DOI: 10.11772/j.issn.1001-9081.2020121908
[3] CHUNG D, TAHBOUB K, DELP E J. A two stream siamese convolutional neural network for person re-identification[C]// Proceedings of the 16th IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1983-1991. DOI: 10.1109/iccv.2017.218
[4] ZHOU Z, HUANG Y, WANG W, et al. See the forest for the trees: joint spatial and temporal recurrent neural networks for video-based person re-identification[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 4747-4756. DOI: 10.1109/cvpr.2017.717
[5] LIU Y, YUAN Z, ZHOU W, et al. Spatial and temporal mutual promotion for video-based person re-identification[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2019, 33(1): 8786-8793. DOI: 10.1609/aaai.v33i01.33018786
[6] LI J, ZHANG S, HUANG T. Multi-scale temporal cues learning for video person re-identification[J]. IEEE Transactions on Image Processing, 2020, 29: 4461-4473. DOI: 10.1109/tip.2020.2972108
[7] LIAO X, HE L, YANG Z, et al. Video-based person re-identification via 3D convolutional networks and non-local attention[C]// Proceedings of the 14th Asian Conference on Computer Vision, LNCS 11366. Cham: Springer, 2019: 620-634. DOI: 10.1007/978-3-030-20876-9_39
[8] FU Y, WANG X, WEI Y, et al. STA: spatial-temporal attention for large-scale video-based person re-identification[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2019, 33(1): 8287-8294. DOI: 10.1609/aaai.v33i01.33018287
[9] LIU C T, WU C W, WANG Y C F, et al. Spatially and temporally efficient non-local attention network for video-based person re-identification[C]// Proceedings of the 2019 British Machine Vision Conference. Durham: BMVA Press, 2019: No.77.
[10] CHEN D, LI H, XIAO T, et al. Video person re-identification with competitive snippet-similarity aggregation and co-attentive snippet embedding[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 1169-1178. DOI: 10.1109/cvpr.2018.00128
[11] SUBRAMANIAM A, NAMBIAR A, MITTAL A. Co-segmentation inspired attention networks for video-based person re-identification[C]// Proceedings of the 17th IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 562-572. DOI: 10.1109/iccv.2019.00065
[12] WU Y, BOURAHLA O E F, LI X, et al. Adaptive graph representation learning for video person re-identification[J]. IEEE Transactions on Image Processing, 2020, 29: 8821-8830. DOI: 10.1109/tip.2020.3001693
[13] KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. (2017-02-22) [2021-08-13]. DOI: 10.48550/arXiv.1609.02907
[14] LEE J, LEE I, KANG J. Self-attention graph pooling[C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR.org, 2019: 3734-3743.
[15] WEN Y, ZHANG K, LI Z, et al. A discriminative feature learning approach for deep face recognition[C]// Proceedings of the 14th European Conference on Computer Vision, LNCS 9911. Cham: Springer, 2016: 499-515.
[16] WANG X, HUA Y, KODIROV E, et al. Deep metric learning by online soft mining and class-aware attention[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2019, 33(1): 5361-5368. DOI: 10.1609/aaai.v33i01.33015361
[17] SCARSELLI F, GORI M, TSOI A C, et al. The graph neural network model[J]. IEEE Transactions on Neural Networks, 2009, 20(1): 61-80. DOI: 10.1109/tnn.2008.2005605
[18] CHEN L, ZHANG H, XIAO J, et al. Counterfactual critic multi-agent training for scene graph generation[C]// Proceedings of the 17th IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 4613-4623. DOI: 10.1109/iccv.2019.00471
[19] WANG Y, SUN Y, LIU Z, et al. Dynamic graph CNN for learning on point clouds[J]. ACM Transactions on Graphics, 2019, 38(5): No.146. DOI: 10.1145/3326362
[20] LIU Z, ZHANG H, CHEN Z, et al. Disentangling and unifying graph convolutions for skeleton-based action recognition[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 143-152. DOI: 10.1109/cvpr42600.2020.00022
[21] BAO L, MA B, CHANG H, et al. Masked graph attention network for person re-identification[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2019: 1496-1505. DOI: 10.1109/cvprw.2019.00191
[22] YANG J, ZHENG W S, YANG Q, et al. Spatial-temporal graph convolutional network for video-based person re-identification[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 3289-3299. DOI: 10.1109/cvpr42600.2020.00335
[23] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI: 10.1109/cvpr.2016.90
[24] LI S, BAK S, CARR P, et al. Diversity regularized spatiotemporal attention for video-based person re-identification[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 369-378. DOI: 10.1109/cvpr.2018.00046
[25] SUN Y, ZHENG L, YANG Y, et al. Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline)[C]// Proceedings of the 15th European Conference on Computer Vision, LNCS 11208. Cham: Springer, 2018: 480-496.
[26] GAO H, JI S. Graph U-nets[C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR.org, 2019: 2083-2092.
[27] HERMANS A, BEYER L, LEIBE B. In defense of the triplet loss for person re-identification[EB/OL]. (2017-11-21) [2021-10-21].
[28] ZHENG L, BIE Z, SUN Y, et al. MARS: a video benchmark for large-scale person re-identification[C]// Proceedings of the 14th European Conference on Computer Vision, LNCS 9910. Cham: Springer, 2016: 868-884.
[29] RISTANI E, SOLERA F, ZOU R, et al. Performance measures and a data set for multi-target, multi-camera tracking[C]// Proceedings of the 14th European Conference on Computer Vision, LNCS 9914. Cham: Springer, 2016: 17-35.
[30] KIRAN M, BHUIYAN A, BLAIS-MORIN L A, et al. Flow guided mutual attention for person re-identification[J]. Image and Vision Computing, 2021, 113: 104246. DOI: 10.1016/j.imavis.2021.104246
[31] PORRELLO A, BERGAMINI L, CALDERARA S. Robust re-identification by multiple views knowledge distillation[C]// Proceedings of the 16th European Conference on Computer Vision, LNCS 12355. Cham: Springer, 2020: 93-110.
[32] CHEN Z, LI A, JIANG S, et al. Attribute-aware identity-hard triplet loss for video-based person re-identification[EB/OL]. [2021-07-24].
[33] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]// Proceedings of the 16th IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 618-626. DOI: 10.1109/iccv.2017.74