Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (1): 164-169.DOI: 10.11772/j.issn.1001-9081.2020060909

Special Issue: The 8th China Conference on Data Mining (CCDM 2020)

• China Conference on Data Mining 2020 (CCDM 2020) •

Video-based person re-identification method combining evenly sampling-random erasing and global temporal feature pooling

CHEN Li, WANG Hongyuan, ZHANG Yunpeng, CAO Liang, YIN Yuchang   

  1. School of Computer Science and Artificial Intelligence / Aliyun School of Big Data, Changzhou University, Changzhou, Jiangsu 213164, China
  • Received: 2020-05-31  Revised: 2020-07-16  Online: 2021-01-10  Published: 2021-01-16
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61976028).


  • Corresponding author: WANG Hongyuan
  • About the authors: CHEN Li (1995-), female, born in Yancheng, Jiangsu, M.S. candidate; her research interests include computer vision. WANG Hongyuan (1960-), male, born in Changzhou, Jiangsu, Ph.D., professor, CCF member; his research interests include computer vision. ZHANG Yunpeng (1995-), male, born in Huai'an, Jiangsu, M.S. candidate; his research interests include computer vision. CAO Liang (1996-), male, born in Yancheng, Jiangsu, M.S. candidate; his research interests include computer vision. YIN Yuchang (1996-), male, born in Yancheng, Jiangsu, M.S. candidate; his research interests include computer vision.

Abstract: To address the low accuracy of video-based person re-identification caused by occlusion, background interference, and similarity in person appearance and posture in video surveillance, a video-based person re-identification method combining Evenly Sampling-random Erasing (ESE) and global temporal feature pooling was proposed. First, for cases where the target person is disturbed or partially occluded, a data augmentation method of evenly sampling-random erasing was adopted to effectively alleviate occlusion and improve the generalization ability of the model, so that persons can be matched more accurately. Second, to further improve the accuracy of video-based person re-identification and learn more discriminative feature representations, a 3D Convolutional Neural Network (3DCNN) was used to extract spatio-temporal features, and a Global Temporal Feature Pooling (GTFP) layer was added to the network before the output of the person feature representation, so as to capture contextual spatial information while refining inter-frame temporal information. Extensive experiments on three public video datasets, MARS, DukeMTMC-VideoReID and PRID-2011, show that the proposed method combining evenly sampling-random erasing and global temporal feature pooling is competitive with state-of-the-art video-based person re-identification methods.
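The pipeline described in the abstract can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the exact chunking rule of ESE, the erased-region fill values, whether one rectangle is shared across all sampled frames, and the feature extractor (here a simple flatten standing in for the 3DCNN) are all assumptions made for the sketch.

```python
import numpy as np

def evenly_sample(num_frames, num_samples, rng):
    """'Evenly sampling': split the tracklet into equal chunks and draw one
    random frame index from each chunk (assumed reading of ESE sampling)."""
    edges = np.linspace(0, num_frames, num_samples + 1).astype(int)
    return np.array([rng.integers(lo, max(lo + 1, hi))
                     for lo, hi in zip(edges[:-1], edges[1:])])

def random_erase(frames, rng, prob=0.5, area=(0.02, 0.2), aspect=(0.3, 3.3)):
    """Random erasing: with probability `prob`, blank out one rectangle per
    clip with random noise. Sharing the same rectangle across all sampled
    frames is an assumption for illustration."""
    frames = frames.copy()
    if rng.random() > prob:
        return frames
    t, h, w, c = frames.shape
    for _ in range(100):  # retry until a rectangle fits inside the frame
        target = rng.uniform(*area) * h * w
        ratio = rng.uniform(*aspect)
        eh = int(round(np.sqrt(target * ratio)))
        ew = int(round(np.sqrt(target / ratio)))
        if eh < h and ew < w:
            y, x = rng.integers(0, h - eh), rng.integers(0, w - ew)
            frames[:, y:y + eh, x:x + ew, :] = rng.uniform(0, 1, (t, eh, ew, c))
            return frames
    return frames

def global_temporal_pool(frame_features):
    """Global temporal feature pooling: collapse the time axis of per-frame
    features into one clip-level descriptor (mean pooling assumed)."""
    return frame_features.mean(axis=0)

rng = np.random.default_rng(0)
clip = rng.uniform(0, 1, size=(40, 64, 32, 3))   # a 40-frame tracklet
idx = evenly_sample(len(clip), 8, rng)           # 8 frames, one per chunk
sampled = random_erase(clip[idx], rng)           # ESE-augmented clip
feats = sampled.reshape(8, -1)                   # stand-in for 3DCNN features
clip_feature = global_temporal_pool(feats)       # single clip-level vector
```

In this reading, ESE guarantees temporal coverage of the whole tracklet (one frame per chunk) while random erasing simulates partial occlusion, and GTFP turns the per-frame features into a single descriptor for matching.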

Key words: video-based person re-identification, 3D Convolutional Neural Network (3DCNN), global temporal feature representation, Evenly Sampling-random Erasing (ESE), data augmentation
