Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (3): 728-735. DOI: 10.11772/j.issn.1001-9081.2022010034
Special topic: Artificial Intelligence
Received: 2022-01-13
Revised: 2022-03-10
Accepted: 2022-03-14
Online: 2022-05-31
Published: 2023-03-10
Contact: Xiaoyan JIANG
About author: YAO Yingmao, born in 1997 in Mengzhou, Henan, M. S. candidate. His research interests include person re-identification.
Abstract:
To address the poor performance of video-based person re-identification caused by occlusion, spatial misalignment, and background clutter in cross-camera network videos, a video-based person re-identification method built on Graph Convolutional Network (GCN) and Self-Attention Graph Pooling (SAGP) was proposed. First, the correlations between different regions across video frames were mined through patch relation graph modeling, and GCN was used to refine the region features of each frame, alleviating occlusion and misalignment. Then, the SAGP mechanism removed regions that contribute little to the pedestrian representation, avoiding interference from background clutter. Finally, a weighted loss strategy was proposed: center loss was used to refine the classification results, and Online soft mining and Class-aware attention Loss (OCL) was used to address the under-utilization of available samples during hard-sample mining. Experimental results show that on the MARS dataset, compared with the second-best method AITL, the proposed method improves mean Average Precision (mAP) and Rank-1 by 1.3 and 2.0 percentage points respectively. The proposed method makes effective use of the spatio-temporal information in videos to extract more discriminative pedestrian features and improves the performance of person re-identification.
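The two core steps of the pipeline (GCN message passing over a region graph, then self-attention graph pooling that keeps only the highest-scoring region nodes) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the region count, feature dimensions, random graph, and weight matrices are all assumptions.

```python
import numpy as np

def gcn_layer(X, A, W):
    """One graph-convolution step (Kipf & Welling style): symmetrically
    normalize the adjacency with self-loops, propagate region features,
    then apply a linear map and ReLU."""
    A_hat = A + np.eye(A.shape[0])                       # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = d_inv_sqrt @ A_hat @ d_inv_sqrt             # D^-1/2 (A+I) D^-1/2
    return np.maximum(A_norm @ X @ W, 0.0)               # ReLU

def sag_pool(X, A, w, ratio=0.25):
    """Self-attention graph pooling: score each node with a 1-d GCN
    projection, keep the top ceil(ratio*N) nodes, gate their features
    by the attention score, and slice the adjacency accordingly."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    scores = np.tanh(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ w)  # (N,)
    k = max(1, int(np.ceil(ratio * X.shape[0])))
    keep = np.argsort(scores)[::-1][:k]                  # top-k region indices
    return X[keep] * scores[keep, None], A[np.ix_(keep, keep)], keep

# Toy example: 16 region nodes (e.g. a few frames split into parts),
# 8-dim features, a random symmetric region graph.
rng = np.random.default_rng(0)
N, F = 16, 8
X = rng.normal(size=(N, F))
A = (rng.random((N, N)) < 0.3).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0.0)

H = gcn_layer(X, A, rng.normal(size=(F, F)))
H_pool, A_pool, keep = sag_pool(H, A, rng.normal(size=F), ratio=0.25)
print(H_pool.shape, A_pool.shape)   # (4, 8) (4, 4)
```

With `ratio=0.25` (matching the best pooling ratio r=25 in Table 4), 4 of the 16 region nodes survive pooling; in the paper's setting the discarded nodes correspond to background-clutter regions.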
Yingmao YAO, Xiaoyan JIANG. Video-based person re-identification method based on graph convolution network and self-attention graph pooling[J]. Journal of Computer Applications, 2023, 43(3): 728-735.
| Method | MARS mAP | MARS R1 | MARS R5 | MARS R20 | Duke mAP | Duke R1 | Duke R5 | Duke R20 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CNN+CQDA | 47.6 | 65.3 | 82.0 | 89.0 | — | — | — | — |
| TAM+SRM | 50.7 | 70.6 | 90.0 | 97.6 | — | — | — | — |
| SSA+CASE | 76.1 | 86.3 | 94.7 | 98.2 | — | — | — | — |
| 3DCNN+NLA | 79.5 | 88.6 | 96.4 | 98.8 | 93.7 | 95.5 | 99.7 | — |
| COSAM | 79.9 | 84.9 | 95.5 | 97.9 | 94.1 | 95.4 | — | — |
| STA | 80.8 | 86.3 | 95.7 | 98.1 | 94.6 | — | — | 99.6 |
| MA | 80.9 | 87.3 | — | — | 94.8 | 96.7 | — | — |
| STE-NVAN | 81.2 | 88.9 | — | — | 93.5 | 95.2 | — | — |
| VKD | 83.1 | — | 96.8 | — | 93.5 | 95.2 | 98.6 | — |
| AITL | — | 88.2 | 96.5 | — | 95.4 | — | 99.6 | 99.9 |
| Proposed | 85.7 | 90.2 | 96.7 | 98.1 | 95.8 | 96.7 | — | 99.9 |

Tab. 1 Comparison of different methods on MARS and DukeMTMC-VideoReID ("Duke") datasets (unit: %)
| Model | mAP | R1 | R5 | R20 |
| --- | --- | --- | --- | --- |
| Baseline | 84.2 | 88.7 | 96.0 | 97.7 |
| Baseline+GCN | 85.3 | 88.7 | 96.6 | 98.3 |
| Baseline+GCN+SAGP | 85.4 | 89.2 | 96.4 | 98.2 |
| Baseline+CL+OCL | 85.1 | 88.9 | 96.7 | 98.3 |
| Baseline+GCN+SAGP+CL+OCL | 85.7 | 90.2 | 96.7 | 98.1 |

Tab. 2 Ablation experimental results on MARS dataset (unit: %)
| Number of split parts | mAP | R1 | R5 | R20 |
| --- | --- | --- | --- | --- |
| 2 | 84.8 | 88.8 | 96.1 | 98.1 |
| 4 | 85.7 | 90.2 | 96.7 | 98.1 |
| 8 | 85.2 | 89.0 | 96.1 | 98.4 |

Tab. 3 Comparison of feature segmentation strategies (unit: %)
| Pooling ratio r/% | mAP | R1 | R5 | R20 |
| --- | --- | --- | --- | --- |
| 10 | 85.3 | 89.5 | 96.2 | 98.1 |
| 20 | 85.1 | 89.2 | 96.2 | 98.1 |
| 25 | 85.7 | 90.2 | 96.7 | 98.1 |
| 30 | 85.3 | 89.6 | 96.5 | 98.1 |
| 50 | 85.2 | 89.1 | 96.2 | 98.0 |
| 75 | 85.2 | 89.3 | 96.3 | 98.2 |
| 90 | 85.3 | 89.0 | 96.6 | 98.3 |

Tab. 4 Comparative experimental results of graph pooling ratio (unit: %)
| λ | mAP | R1 | R5 | R20 |
| --- | --- | --- | --- | --- |
| 10 | 85.1 | 89.2 | 96.3 | 98.2 |
| 30 | 85.5 | 88.8 | 96.2 | 98.1 |
| 40 | 85.3 | 89.0 | 96.5 | 98.1 |
| 50 | 85.7 | 90.2 | 96.7 | 98.1 |
| 60 | 84.7 | 89.1 | 96.1 | 98.2 |
| 70 | 85.2 | 89.4 | 96.1 | 97.9 |
| 90 | 84.4 | 89.1 | 96.4 | 98.1 |

Tab. 5 Comparison on weighting parameters of loss function (unit: %)
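Table 5 sweeps the weight λ applied to the auxiliary loss terms. The abstract combines a classification loss with center loss and OCL; the following is an assumption-laden sketch of the center-loss component and a hypothetical weighted sum (the paper's exact combination formula and the OCL term are not reproduced here):

```python
import numpy as np

def center_loss(features, labels, centers):
    """Center loss (Wen et al.): half the mean squared distance between
    each sample's feature and its class center; pulls same-identity
    features together to sharpen the classification result."""
    diff = features - centers[labels]
    return 0.5 * float(np.mean(np.sum(diff ** 2, axis=1)))

def total_loss(ce, cl, ocl, lam):
    """Hypothetical weighted combination of the three loss terms;
    lam plays the role of the weight swept in Table 5."""
    return ce + lam * cl + ocl

# Toy example: three 2-d features, two identities.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
labels = np.array([0, 0, 1])
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
cl = center_loss(feats, labels, centers)
print(round(cl, 6))   # 0.003333
```

Only the second sample deviates from its center, so the loss is small; in training, the centers themselves are also updated as features drift.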
[1] YE Y, WANG Z, LIANG C, et al. A survey on multi-source person re-identification[J]. Acta Automatica Sinica, 2020, 46(9): 1869-1884. (in Chinese) DOI: 10.16383/j.aas.c190278
[2] HAN J D, LI X Y. Pedestrian re-identification method based on multi-scale feature fusion[J]. Journal of Computer Applications, 2021, 41(10): 2991-2996. (in Chinese) DOI: 10.11772/j.issn.1001-9081.2020121908
[3] CHUNG D, TAHBOUB K, DELP E J. A two stream siamese convolutional neural network for person re-identification[C]// Proceedings of the 16th IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1983-1991. DOI: 10.1109/iccv.2017.218
[4] ZHOU Z, HUANG Y, WANG W, et al. See the forest for the trees: joint spatial and temporal recurrent neural networks for video-based person re-identification[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 4747-4756. DOI: 10.1109/cvpr.2017.717
[5] LIU Y, YUAN Z, ZHOU W, et al. Spatial and temporal mutual promotion for video-based person re-identification[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2019, 33(1): 8786-8793. DOI: 10.1609/aaai.v33i01.33018786
[6] LI J, ZHANG S, HUANG T. Multi-scale temporal cues learning for video person re-identification[J]. IEEE Transactions on Image Processing, 2020, 29: 4461-4473. DOI: 10.1109/tip.2020.2972108
[7] LIAO X, HE L, YANG Z, et al. Video-based person re-identification via 3D convolutional networks and non-local attention[C]// Proceedings of the 14th Asian Conference on Computer Vision, LNCS 11366. Cham: Springer, 2019: 620-634. DOI: 10.1007/978-3-030-20876-9_39
[8] FU Y, WANG X, WEI Y, et al. STA: spatial-temporal attention for large-scale video-based person re-identification[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2019, 33(1): 8287-8294. DOI: 10.1609/aaai.v33i01.33018287
[9] LIU C T, WU C W, WANG Y C F, et al. Spatially and temporally efficient non-local attention network for video-based person re-identification[C]// Proceedings of the 2019 British Machine Vision Conference. Durham: BMVA Press, 2019: No.77.
[10] CHEN D, LI H, XIAO T, et al. Video person re-identification with competitive snippet-similarity aggregation and co-attentive snippet embedding[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 1169-1178. DOI: 10.1109/cvpr.2018.00128
[11] SUBRAMANIAM A, NAMBIAR A, MITTAL A. Co-segmentation inspired attention networks for video-based person re-identification[C]// Proceedings of the 17th IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 562-572. DOI: 10.1109/iccv.2019.00065
[12] WU Y, BOURAHLA O E F, LI X, et al. Adaptive graph representation learning for video person re-identification[J]. IEEE Transactions on Image Processing, 2020, 29: 8821-8830. DOI: 10.1109/tip.2020.3001693
[13] KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. (2017-02-22) [2021-08-13]. DOI: 10.48550/arXiv.1609.02907
[14] LEE J, LEE I, KANG J. Self-attention graph pooling[C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR.org, 2019: 3734-3743.
[15] WEN Y, ZHANG K, LI Z, et al. A discriminative feature learning approach for deep face recognition[C]// Proceedings of the 14th European Conference on Computer Vision, LNCS 9911. Cham: Springer, 2016: 499-515.
[16] WANG X, HUA Y, KODIROV E, et al. Deep metric learning by online soft mining and class-aware attention[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2019, 33(1): 5361-5368. DOI: 10.1609/aaai.v33i01.33015361
[17] SCARSELLI F, GORI M, TSOI A C, et al. The graph neural network model[J]. IEEE Transactions on Neural Networks, 2009, 20(1): 61-80. DOI: 10.1109/tnn.2008.2005605
[18] CHEN L, ZHANG H, XIAO J, et al. Counterfactual critic multi-agent training for scene graph generation[C]// Proceedings of the 17th IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 4613-4623. DOI: 10.1109/iccv.2019.00471
[19] WANG Y, SUN Y, LIU Z, et al. Dynamic graph CNN for learning on point clouds[J]. ACM Transactions on Graphics, 2019, 38(5): No.146. DOI: 10.1145/3326362
[20] LIU Z, ZHANG H, CHEN Z, et al. Disentangling and unifying graph convolutions for skeleton-based action recognition[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 143-152. DOI: 10.1109/cvpr42600.2020.00022
[21] BAO L, MA B, CHANG H, et al. Masked graph attention network for person re-identification[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2019: 1496-1505. DOI: 10.1109/cvprw.2019.00191
[22] YANG J, ZHENG W S, YANG Q, et al. Spatial-temporal graph convolutional network for video-based person re-identification[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 3289-3299. DOI: 10.1109/cvpr42600.2020.00335
[23] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. DOI: 10.1109/cvpr.2016.90
[24] LI S, BAK S, CARR P, et al. Diversity regularized spatiotemporal attention for video-based person re-identification[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 369-378. DOI: 10.1109/cvpr.2018.00046
[25] SUN Y, ZHENG L, YANG Y, et al. Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline)[C]// Proceedings of the 15th European Conference on Computer Vision, LNCS 11208. Cham: Springer, 2018: 480-496.
[26] GAO H, JI S. Graph U-nets[C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR.org, 2019: 2083-2092.
[27] HERMANS A, BEYER L, LEIBE B. In defense of the triplet loss for person re-identification[EB/OL]. (2017-11-21) [2021-10-21].
[28] ZHENG L, BIE Z, SUN Y, et al. MARS: a video benchmark for large-scale person re-identification[C]// Proceedings of the 14th European Conference on Computer Vision, LNCS 9910. Cham: Springer, 2016: 868-884.
[29] RISTANI E, SOLERA F, ZOU R, et al. Performance measures and a data set for multi-target, multi-camera tracking[C]// Proceedings of the 14th European Conference on Computer Vision, LNCS 9914. Cham: Springer, 2016: 17-35.
[30] KIRAN M, BHUIYAN A, BLAIS-MORIN L A, et al. Flow guided mutual attention for person re-identification[J]. Image and Vision Computing, 2021, 113: 104246. DOI: 10.1016/j.imavis.2021.104246
[31] PORRELLO A, BERGAMINI L, CALDERARA S. Robust re-identification by multiple views knowledge distillation[C]// Proceedings of the 16th European Conference on Computer Vision, LNCS 12355. Cham: Springer, 2020: 93-110.
[32] CHEN Z, LI A, JIANG S, et al. Attribute-aware identity-hard triplet loss for video-based person re-identification[EB/OL]. [2021-07-24].
[33] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]// Proceedings of the 16th IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 618-626. DOI: 10.1109/iccv.2017.74