Person re-identification in video sequence based on spatial-temporal regularization

doi:10.11772/j.issn.1001-9081.2019051084

Journal of Computer Applications, 2019, Vol. 39, Issue (11): 3216-3220.

Paper of the 2019 CCF Conference on Artificial Intelligence (CCFAI2019)

LIU Baocheng, PIAO Yan, TANG Yue

College of Electronic Information Engineering, Changchun University of Science and Technology, Changchun Jilin 130012, China

Received: 2019-05-24 Revised: 2019-06-24 Online: 2019-09-11 Published: 2019-11-10
Corresponding author: PIAO Yan
About the authors: LIU Baocheng (born 1995), male, from Baishan, Jilin, M.S. candidate, CCF member; research interests: machine learning, computer vision. PIAO Yan (born 1965), female, from Changchun, Jilin, professor, Ph.D.; research interests: computer vision, pattern recognition. TANG Yue (born 1994), female, from Changchun, Jilin, M.S. candidate; research interests: deep learning, computer vision.
Supported by:
This work is partially supported by the Science and Technology Support Project of Jilin Province (20180201091GX) and the Project of Jilin Provincial Science and Technology Innovation Center (20180623039TC).

Abstract

In complex real-world scenes, interference from various factors can cause person re-identification to return wrong matches. To improve the accuracy of person re-identification, an algorithm based on spatial-temporal regularization is proposed. First, a ResNet-50 network extracts features from the input video sequence frame by frame, and the resulting frame-level features are fed into the spatial-temporal regularization network to generate the corresponding weight scores. Then, a weighted average of the frame-level features gives the sequence-level feature; to keep the weight scores from concentrating on a single frame, frame-level regularization is used to limit the difference between frames. Finally, the optimal result is obtained by minimizing the loss. Extensive tests were carried out on the DukeMTMC-ReID and MARS datasets. The experimental results show that, compared with the Triplet algorithm, the proposed method effectively improves the mean Average Precision (mAP) and the accuracy of person re-identification, and that it achieves excellent performance under human pose variation, viewpoint changes, and interference from targets with similar appearance.

Key words: machine vision, person re-identification, attention mechanism, Convolutional Neural Network (CNN), temporal modeling
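
The pooling step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the form of the weight scores or of the frame-level regularizer, so the softmax normalization, the squared-deviation penalty, and all function names below are assumptions chosen only to show the idea of weighted averaging with a penalty on uneven weights.

```python
import math

def softmax(scores):
    """Turn raw per-frame scores into weights that sum to 1 (assumed normalization)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pool_sequence(frame_features, weights):
    """Weighted average of frame-level features, giving one sequence-level feature."""
    dim = len(frame_features[0])
    return [sum(w * f[d] for w, f in zip(weights, frame_features))
            for d in range(dim)]

def frame_regularizer(weights):
    """Hypothetical penalty: squared deviation of the weights from uniform.

    It is zero when every frame has weight 1/T and grows as the weights
    concentrate on a single frame.
    """
    t = len(weights)
    return sum((w - 1.0 / t) ** 2 for w in weights)

# Toy input: 3 frames with 2-dimensional features (stand-ins for ResNet-50 outputs).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]  # equal scores give uniform weights
weights = softmax(scores)
sequence_feature = pool_sequence(features, weights)
penalty = frame_regularizer(weights)
```

In training, a penalty of this kind would presumably be added, with some coefficient, to the main re-identification loss before minimization; the abstract does not state how the terms are combined, so that weighting is left open here.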

CLC number: TP391.41

LIU Baocheng, PIAO Yan, TANG Yue. Person re-identification in video sequence based on spatial-temporal regularization[J]. Journal of Computer Applications, 2019, 39(11): 3216-3220.

