Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (1): 164-169.DOI: 10.11772/j.issn.1001-9081.2020060909

Special Issue: The 8th China Conference on Data Mining (CCDM 2020)

• China Conference on Data Mining 2020 (CCDM 2020) •

Video-based person re-identification method combining evenly sampling-random erasing and global temporal feature pooling

CHEN Li, WANG Hongyuan, ZHANG Yunpeng, CAO Liang, YIN Yuchang   

  1. School of Computer Science and Artificial Intelligence / Aliyun School of Big Data, Changzhou University, Changzhou, Jiangsu 213164, China
  • Received: 2020-05-31  Revised: 2020-07-16  Online: 2021-01-10  Published: 2021-01-16
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61976028).


  • Corresponding author: WANG Hongyuan
  • About the authors: CHEN Li (1995-), female, born in Yancheng, Jiangsu, M.S. candidate; her research interests include computer vision. WANG Hongyuan (1960-), male, born in Changzhou, Jiangsu, Ph.D., professor, CCF member; his research interests include computer vision. ZHANG Yunpeng (1995-), male, born in Huai'an, Jiangsu, M.S. candidate; his research interests include computer vision. CAO Liang (1996-), male, born in Yancheng, Jiangsu, M.S. candidate; his research interests include computer vision. YIN Yuchang (1996-), male, born in Yancheng, Jiangsu, M.S. candidate; his research interests include computer vision.

Abstract: To address the low accuracy of video-based person re-identification caused by occlusion, background interference, and similarity in person appearance and posture in video surveillance, a video-based person re-identification method combining Evenly Sampling-random Erasing (ESE) and global temporal feature pooling was proposed. First, for cases where the target person is disturbed or partially occluded, a data augmentation method of evenly sampling-random erasing was adopted to effectively alleviate occlusion and improve the generalization ability of the model, so that persons can be matched more accurately. Second, to further improve the accuracy of video-based person re-identification and learn more discriminative feature representations, a 3D Convolutional Neural Network (3DCNN) was used to extract spatio-temporal features, and a Global Temporal Feature Pooling (GTFP) layer was added to the network before the output of the person feature representation, so as to capture contextual spatial information while refining inter-frame temporal information. Extensive experiments on three public video datasets, MARS, DukeMTMC-VideoReID and PRID-2011, show that the proposed method combining evenly sampling-random erasing and global temporal feature pooling is competitive with state-of-the-art video-based person re-identification methods.
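The pipeline described in the abstract can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the exact chunking rule of ESE, the erased-region fill values, whether one rectangle is shared across all sampled frames, and the feature extractor (here a simple flatten standing in for the 3DCNN) are all assumptions made for the sketch.

```python
import numpy as np

def evenly_sample(num_frames, num_samples, rng):
    """'Evenly sampling': split the tracklet into equal chunks and draw one
    random frame index from each chunk (assumed reading of ESE sampling)."""
    edges = np.linspace(0, num_frames, num_samples + 1).astype(int)
    return np.array([rng.integers(lo, max(lo + 1, hi))
                     for lo, hi in zip(edges[:-1], edges[1:])])

def random_erase(frames, rng, prob=0.5, area=(0.02, 0.2), aspect=(0.3, 3.3)):
    """Random erasing: with probability `prob`, blank out one rectangle per
    clip with random noise. Sharing the same rectangle across all sampled
    frames is an assumption for illustration."""
    frames = frames.copy()
    if rng.random() > prob:
        return frames
    t, h, w, c = frames.shape
    for _ in range(100):  # retry until a rectangle fits inside the frame
        target = rng.uniform(*area) * h * w
        ratio = rng.uniform(*aspect)
        eh = int(round(np.sqrt(target * ratio)))
        ew = int(round(np.sqrt(target / ratio)))
        if eh < h and ew < w:
            y, x = rng.integers(0, h - eh), rng.integers(0, w - ew)
            frames[:, y:y + eh, x:x + ew, :] = rng.uniform(0, 1, (t, eh, ew, c))
            return frames
    return frames

def global_temporal_pool(frame_features):
    """Global temporal feature pooling: collapse the time axis of per-frame
    features into one clip-level descriptor (mean pooling assumed)."""
    return frame_features.mean(axis=0)

rng = np.random.default_rng(0)
clip = rng.uniform(0, 1, size=(40, 64, 32, 3))   # a 40-frame tracklet
idx = evenly_sample(len(clip), 8, rng)           # 8 frames, one per chunk
sampled = random_erase(clip[idx], rng)           # ESE-augmented clip
feats = sampled.reshape(8, -1)                   # stand-in for 3DCNN features
clip_feature = global_temporal_pool(feats)       # single clip-level vector
```

In this reading, ESE guarantees temporal coverage of the whole tracklet (one frame per chunk) while random erasing simulates partial occlusion, and GTFP turns the per-frame features into a single descriptor for matching.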

Key words: video-based person re-identification, 3D Convolutional Neural Network (3DCNN), global temporal feature representation, Evenly Sampling-random Erasing (ESE), data augmentation
