Person re-identification in video sequence based on spatial-temporal regularization

doi:10.11772/j.issn.1001-9081.2019051084

Journal of Computer Applications, 2019, Vol. 39, Issue 11: 3216-3220. DOI: 10.11772/j.issn.1001-9081.2019051084

The 2019 CCF Conference on Artificial Intelligence (CCFAI2019)

LIU Baocheng, PIAO Yan, TANG Yue

College of Electronic Information Engineering, Changchun University of Science and Technology, Changchun Jilin 130012, China

Received: 2019-05-24 Revised: 2019-06-24 Online: 2019-09-11 Published: 2019-11-10
Supported by:
This work is partially supported by the Science and Technology Support Project of Jilin Province (20180201091GX) and the Project of Jilin Provincial Science and Technology Innovation Center (20180623039TC).

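Corresponding author: PIAO Yan
About the authors: LIU Baocheng (born 1995), male, from Baishan, Jilin, is a master's student and a CCF member; his main research interests are machine learning and computer vision. PIAO Yan (born 1965), female, from Changchun, Jilin, is a professor with a Ph.D.; her main research interests are computer vision and pattern recognition. TANG Yue (born 1994), female, from Changchun, Jilin, is a master's student; her main research interests are deep learning and computer vision.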

Abstract

Abstract: In real-world scenes, many interfering factors can cause errors in person re-identification. To improve its accuracy, a person re-identification algorithm based on spatial-temporal regularization is proposed. First, a ResNet-50 network extracts features from the input video sequence frame by frame, and the resulting frame-level features are fed into a spatial-temporal regularization network that generates a weight score for each frame. A weighted average of the frame-level features then gives the sequence-level feature. To keep the weight scores from concentrating on a single frame, frame-level regularization is used to limit the differences between frames. Finally, the optimal result is obtained by minimizing the loss. Extensive experiments on the MARS and DukeMTMC-ReID datasets show that the proposed algorithm improves both mean Average Precision (mAP) and accuracy compared with the Triplet algorithm, and that it performs well under changes in human pose and viewing angle and under interference from targets with similar appearance.

Key words: machine vision, person re-identification, attention mechanism, Convolutional Neural Network (CNN), temporal modeling
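
The pooling step described in the abstract (per-frame weight scores, a weighted average of frame-level features, and a penalty that discourages all weight from landing on one frame) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the form of the scoring network or of the regularizer, so the softmax normalization and the squared-deviation-from-uniform penalty below are assumptions, and the names `softmax`, `pool_sequence` and `frame_regularizer` are hypothetical. Frame features are plain lists standing in for ResNet-50 outputs.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw per-frame scores into weights that sum to 1 (assumed normalization)."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_sequence(
    frame_features: Sequence[Sequence[float]], scores: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Weighted average of frame-level features -> one sequence-level feature."""
    weights = softmax(scores)
    dim = len(frame_features[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


def frame_regularizer(weights: Sequence[float]) -> float:
    """Assumed penalty: squared deviation of the weights from a uniform split.

    It is 0 when every frame gets the same weight and grows as the weight
    concentrates on a single frame.
    """
    uniform = 1.0 / len(weights)
    return sum((w - uniform) ** 2 for w in weights)


# Toy sequence of three 2-D "frame features" (stand-ins for CNN outputs).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
flat_feature, flat_weights = pool_sequence(frames, [0.0, 0.0, 0.0])
peaked_feature, peaked_weights = pool_sequence(frames, [10.0, 0.0, 0.0])
print(frame_regularizer(flat_weights), frame_regularizer(peaked_weights))
```

In a training setup of this kind, the penalty would be added to the re-identification loss with some weighting factor, so that minimizing the total loss trades matching accuracy against how evenly the frames contribute.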

CLC Number:

TP391.41

LIU Baocheng, PIAO Yan, TANG Yue. Person re-identification in video sequence based on spatial-temporal regularization[J]. Journal of Computer Applications, 2019, 39(11): 3216-3220.
