Abstract: To address the low accuracy of video-based person re-identification caused by factors such as occlusion, background interference, and similarity in person appearance and posture in video surveillance, a video-based person re-identification method combining Evenly Sampling-random Erasing (ESE) and global temporal feature pooling was proposed. Firstly, for cases where the target person is disturbed or partially occluded, an evenly sampling-random erasing data augmentation method was adopted. This effectively alleviates the occlusion problem and improves the generalization ability of the model, so that persons can be matched more accurately. Secondly, to further improve the accuracy of video-based person re-identification and learn more discriminative feature representations, a 3D Convolutional Neural Network (3DCNN) was used to extract temporal and spatial features. A Global Temporal Feature Pooling (GTFP) layer was added to the network before the output of the person feature representations, so as to capture contextual spatial information and refine the intra-frame temporal information. Extensive experiments on three public video datasets, MARS, DukeMTMC-VideoReID and PRID-2011, show that the method combining evenly sampling-random erasing and global temporal feature pooling is competitive with several state-of-the-art video-based person re-identification methods.
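The abstract does not spell out how ESE or GTFP are implemented. The following is therefore only a minimal NumPy sketch under stated assumptions, not the authors' code. It assumes three things. First, "evenly sampling" splits a tracklet into `seq_len` equal chunks and takes one frame per chunk. Second, the erasing step is the Random Erasing augmentation of Zhong et al., with that paper's default hyperparameters (`p=0.5`, `s_l=0.02`, `s_h=0.4`, `r_1=0.3`), applied independently to each sampled frame. Third, GTFP is approximated as global average pooling of a `(C, T, H, W)` 3D-CNN feature map over time and space. The names `evenly_sample`, `random_erase`, `global_temporal_pool` and `seq_len` are hypothetical.

```python
import math

import numpy as np


def evenly_sample(num_frames, seq_len, rng, train=True):
    """Pick seq_len temporally ordered frame indices spread evenly over a tracklet.

    Assumed scheme: split [0, num_frames) into seq_len equal chunks and take one
    index per chunk (a random one during training, the chunk start at test time).
    """
    if num_frames < seq_len:
        # Short tracklet: repeat frames in temporal order to reach seq_len frames.
        return [i * num_frames // seq_len for i in range(seq_len)]
    bounds = np.linspace(0, num_frames, seq_len + 1).astype(int)
    return [int(rng.integers(lo, hi)) if train else int(lo)
            for lo, hi in zip(bounds[:-1], bounds[1:])]


def random_erase(frame, rng, p=0.5, s_l=0.02, s_h=0.4, r_1=0.3, max_tries=100):
    """Random Erasing: with probability p, overwrite one random rectangle that
    covers s_l..s_h of the frame area (aspect ratio in [r_1, 1/r_1]) with
    uniform noise in [0, 1). Assumes frames are (H, W[, C]) floats in [0, 1].
    Returns a new array when erasing happens; the input is never modified."""
    if rng.random() >= p:
        return frame
    h, w = frame.shape[:2]
    for _ in range(max_tries):
        target_area = rng.uniform(s_l, s_h) * h * w
        aspect = rng.uniform(r_1, 1.0 / r_1)
        eh = int(round(math.sqrt(target_area * aspect)))
        ew = int(round(math.sqrt(target_area / aspect)))
        if 0 < eh < h and 0 < ew < w:  # rectangle must fit inside the frame
            y = rng.integers(0, h - eh + 1)
            x = rng.integers(0, w - ew + 1)
            out = frame.copy()
            out[y:y + eh, x:x + ew] = rng.uniform(0.0, 1.0, size=(eh, ew) + frame.shape[2:])
            return out
    return frame  # no rectangle fitted within max_tries; leave the frame unchanged


def global_temporal_pool(features):
    """Assumed form of GTFP: average a (C, T, H, W) feature map over time and
    space, giving one C-dimensional clip descriptor."""
    return features.mean(axis=(1, 2, 3))


rng = np.random.default_rng(0)
tracklet = rng.uniform(0, 1, size=(50, 64, 32, 3))  # 50 RGB frames, H=64, W=32
idx = evenly_sample(len(tracklet), 8, rng)
clip = np.stack([random_erase(tracklet[i], rng) for i in idx])  # (8, 64, 32, 3)
# A 3D CNN would map the clip to a (C, T', H', W') feature map; random values stand in here.
feat = global_temporal_pool(rng.standard_normal((512, 2, 4, 2)))  # (512,)
```

Sampling one frame per chunk keeps the clip spread over the whole tracklet, so a partially occluded stretch of frames cannot dominate it. Random erasing simulates additional occlusion. Erasing the same rectangle in every frame of a clip would be an equally plausible reading of "evenly sampling-random erasing". The abstract does not settle this.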