Video-based person re-identification method by jointing evenly sampling-random erasing and global temporal feature pooling

doi:10.11772/j.issn.1001-9081.2020060909

Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (1): 164-169. DOI: 10.11772/j.issn.1001-9081.2020060909

Special topic: The 8th China Conference on Data Mining (CCDM 2020)

CHEN Li, WANG Hongyuan, ZHANG Yunpeng, CAO Liang, YIN Yuchang

School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, Changzhou University, Changzhou Jiangsu 213164, China

Received: 2020-05-31 Revised: 2020-07-16 Online: 2021-01-10 Published: 2021-01-16
Corresponding author: WANG Hongyuan
About the authors: CHEN Li (born 1995), female, from Yancheng, Jiangsu, M.S. candidate, main research interest: computer vision; WANG Hongyuan (born 1960), male, from Changzhou, Jiangsu, professor, Ph.D., CCF member, main research interest: computer vision; ZHANG Yunpeng (born 1995), male, from Huai'an, Jiangsu, M.S. candidate, main research interest: computer vision; CAO Liang (born 1996), male, from Yancheng, Jiangsu, M.S. candidate, main research interest: computer vision; YIN Yuchang (born 1996), male, from Yancheng, Jiangsu, M.S. candidate, main research interest: computer vision.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61976028).

Abstract

To address the low accuracy of video-based person re-identification caused by occlusion, background interference, and similarity in person appearance and posture in video surveillance, a video-based person re-identification method combining Evenly Sampling-random Erasing (ESE) and Global Temporal Feature Pooling (GTFP) was proposed. Firstly, for cases in which the target person is disturbed or partially occluded, the ESE data augmentation method was adopted to alleviate occlusion and improve the generalization ability of the model, so that persons are matched more accurately. Secondly, to further improve the accuracy of video-based person re-identification and learn more discriminative feature representations, a 3D Convolutional Neural Network (3DCNN) was used to extract spatio-temporal features, and a GTFP layer was added before the network outputs the person feature representation, which both captures the spatial information of the context and refines the temporal information between frames. Extensive experiments on three public video datasets, MARS, DukeMTMC-VideoReID and PRID-2011, show that the proposed method combining ESE and GTFP is competitive with some state-of-the-art video-based person re-identification methods.

Key words: video-based person re-identification, 3D Convolutional Neural Network (3DCNN), global temporal feature representation, Evenly Sampling-random Erasing (ESE), data augmentation
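
The two components named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and the abstract does not give these details, so everything below is an assumption: the function names (`evenly_sample_indices`, `random_erase`, `ese`, `global_temporal_pool`) are hypothetical, a frame is modelled as a nested list of pixel values, the erased rectangle is filled with a constant and clamped to the frame, the area and aspect-ratio ranges are illustrative, and the temporal pooling is taken to be a plain average over time steps after spatial pooling.

```python
import random


def evenly_sample_indices(num_frames, num_chunks, rng):
    """Split range(num_frames) into num_chunks near-equal chunks
    and draw one frame index at random from each chunk."""
    if not 1 <= num_chunks <= num_frames:
        raise ValueError("need 1 <= num_chunks <= num_frames")
    bounds = [i * num_frames // num_chunks for i in range(num_chunks + 1)]
    return [rng.randrange(bounds[i], bounds[i + 1]) for i in range(num_chunks)]


def random_erase(frame, rng, area_ratio=(0.02, 0.4), fill=0):
    """Return a copy of a 2-D frame (list of rows) with one random
    rectangle overwritten by `fill`. Assumption: the rectangle is
    clamped to the frame instead of being re-drawn when too large."""
    height, width = len(frame), len(frame[0])
    target_area = rng.uniform(*area_ratio) * height * width
    aspect = rng.uniform(0.3, 1 / 0.3)  # height-to-width ratio of the patch
    erase_h = min(height, max(1, round((target_area * aspect) ** 0.5)))
    erase_w = min(width, max(1, round((target_area / aspect) ** 0.5)))
    top = rng.randrange(height - erase_h + 1)
    left = rng.randrange(width - erase_w + 1)
    erased = [row[:] for row in frame]  # leave the input frame untouched
    for r in range(top, top + erase_h):
        erased[r][left:left + erase_w] = [fill] * erase_w
    return erased


def ese(tracklet, num_chunks, rng):
    """Evenly sample one frame per chunk, then erase a patch in each."""
    indices = evenly_sample_indices(len(tracklet), num_chunks, rng)
    return [random_erase(tracklet[i], rng) for i in indices]


def global_temporal_pool(features):
    """Average a T x C list of per-time-step feature vectors over the
    time axis, giving one C-dimensional clip descriptor."""
    steps = len(features)
    return [sum(channel) / steps for channel in zip(*features)]


if __name__ == "__main__":
    rng = random.Random(0)
    tracklet = [[[1] * 8 for _ in range(16)] for _ in range(10)]  # 10 frames, 16x8
    clip = ese(tracklet, num_chunks=4, rng=rng)
    print(len(clip), "frames kept after ESE")
    print(global_temporal_pool([[1.0, 2.0], [3.0, 6.0]]))
```

Drawing one frame per chunk keeps the sampled clip spread over the whole tracklet, so an occlusion confined to a few consecutive frames cannot dominate the input. In a real pipeline the per-time-step features would come from the 3DCNN rather than from hand-written lists.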

CLC Number:

TP391.41

CHEN Li, WANG Hongyuan, ZHANG Yunpeng, CAO Liang, YIN Yuchang. Video-based person re-identification method by jointing evenly sampling-random erasing and global temporal feature pooling[J]. Journal of Computer Applications, 2021, 41(1): 164-169.

