Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (11): 3216-3220. DOI: 10.11772/j.issn.1001-9081.2019051084

• Papers of the 2019 China Computer Federation Conference on Artificial Intelligence (CCFAI2019) •

Person re-identification in video sequence based on spatial-temporal regularization

LIU Baocheng, PIAO Yan, TANG Yue

  1. College of Electronic Information Engineering, Changchun University of Science and Technology, Changchun, Jilin 130012, China
  • Received: 2019-05-24 Revised: 2019-06-24 Online: 2019-09-11 Published: 2019-11-10
  • Corresponding author: PIAO Yan
  • About the authors: LIU Baocheng (born 1995), male, from Baishan, Jilin, M.S. candidate, CCF member; main research interests: machine learning and computer vision. PIAO Yan (born 1965), female, from Changchun, Jilin, professor, Ph.D.; main research interests: computer vision and pattern recognition. TANG Yue (born 1994), female, from Changchun, Jilin, M.S. candidate; main research interests: deep learning and computer vision.
  • Supported by:
    This work is partially supported by the Science and Technology Support Project of Jilin Province (20180201091GX) and the Project of Jilin Provincial Science and Technology Innovation Center (20180623039TC).

Abstract: Interference from many factors in complex real-world scenes can cause identification errors in person re-identification. To improve accuracy, a person re-identification algorithm based on spatial-temporal regularization was proposed. First, a ResNet-50 network extracted features from the input video sequence frame by frame, and the resulting frame-level features were fed into a spatial-temporal regularization network to generate a corresponding weight score for each frame. Then, a weighted average of the frame-level features yielded the sequence-level feature; to keep the weight scores from concentrating on a single frame, frame-level regularization was used to limit the differences between frames. Finally, the optimal result was obtained by minimizing the loss. Extensive tests were performed on the DukeMTMC-ReID and MARS datasets. The experimental results show that, compared with the Triplet algorithm, the proposed algorithm effectively improves the mean Average Precision (mAP) and the accuracy of person re-identification, and that it performs strongly under human pose variation, viewpoint changes, and interference from targets with similar appearance.

Key words: machine vision, person re-identification, attention mechanism, Convolutional Neural Network (CNN), temporal modeling
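
The pooling step described in the abstract (per-frame weight scores, a weighted average over frames, and a penalty that discourages the weights from collapsing onto one frame) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the softmax normalization, the squared-deviation-from-uniform penalty, the function names, and the coefficient `lam` are all hypothetical choices, and the frame features are plain lists standing in for ResNet-50 outputs.

```python
import math


def softmax(scores):
    """Turn raw per-frame scores into weights that sum to 1 (assumed normalization)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_sequence(frame_features, scores):
    """Weighted average of frame-level features, giving one sequence-level feature."""
    weights = softmax(scores)
    dim = len(frame_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


def frame_regularizer(weights):
    """Hypothetical penalty: squared deviation of the weights from uniform.

    It is zero when every frame gets the same weight and grows as the
    weights concentrate on a single frame.
    """
    uniform = 1.0 / len(weights)
    return sum((w - uniform) ** 2 for w in weights)


def total_loss(task_loss, weights, lam=0.1):
    """Task loss plus the weighted regularizer; lam is an assumed coefficient."""
    return task_loss + lam * frame_regularizer(weights)
```

With two frames and equal scores, `pool_sequence` reduces to a plain mean and the penalty vanishes; a strongly peaked score vector keeps almost all weight on one frame and is penalized accordingly.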

CLC number: