Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (2): 530-536. DOI: 10.11772/j.issn.1001-9081.2020050739

Special Topic: Multimedia Computing and Computer Simulation

• Multimedia Computing and Computer Simulation •

• Corresponding author: LIU Ziyan
  • About the authors: LIU Ziyan (1974-), female, a native of Duyun, Guizhou, M.S., associate professor, CCF member. Her research interests include wireless communication systems, mobile robots, and big data mining and analysis. ZHU Mingcheng (1993-), male, a native of Jiangyin, Jiangsu, M.S. candidate. His research interests include person re-identification. YUAN Lei (1995-), male, a native of Sinan, Guizhou, M.S. candidate. His research interests include object detection. MA Shanshan (1996-), female, a native of Zunyi, Guizhou, M.S. candidate. Her research interests include deep learning. CHEN Lingzhouting (1981-), male, a native of Qingtian, Zhejiang, Ph.D., associate professor. His research interests include intelligent control.

Video person re-identification based on non-local attention and multi-feature fusion

LIU Ziyan1, ZHU Mingcheng1, YUAN Lei1, MA Shanshan1, CHEN Lingzhouting2   

  1. College of Big Data and Information Engineering, Guizhou University, Guiyang Guizhou 550025, China;
    2. School of Aerospace Engineering, Guizhou Institute of Technology, Guiyang Guizhou 550003, China
  • Received: 2020-06-01 Revised: 2020-07-27 Online: 2021-02-10 Published: 2020-08-14
  • Supported by:
    This work is partially supported by the Natural Science Foundation of Guizhou Province ([2016]1054), the Joint Natural Science Foundation of Guizhou Province (LH[2017]7226), the 2017 Special Project of New Academic Talent Training and Innovation Exploration of Guizhou University ([2017]5788), the Guizhou Provincial Science and Technology Program ([2017]1069), the Major Research Program of Innovation Groups of Guizhou Educational Department ([2018]026), the Project of Engineering Research Center of Guizhou Colleges and Universities ([2018]007), and the Key Project of Science and Technology Plan of Guizhou Province ([2019]1416).



Abstract: Existing video person re-identification methods cannot effectively extract the spatiotemporal information between consecutive frames of a video. To address this, a person re-identification network based on non-local attention and multi-feature fusion was proposed to extract global and local representation features as well as temporal information. Firstly, non-local attention modules were embedded to extract global features. Then, multi-feature fusion was realized by extracting low-level and middle-level features as well as local features, so as to obtain the salient features of the person. Finally, similarity measurement and ranking were performed on the person features to compute the accuracy of video person re-identification. On the large datasets MARS and DukeMTMC-VideoReID, the proposed model significantly outperforms the existing Multi-scale 3D Convolution (M3D) and Learned Clip Similarity Aggregation (LCSA) models, reaching a mean Average Precision (mAP) of 81.4% and 93.4% and a Rank-1 of 88.7% and 95.3%, respectively. On the small dataset PRID2011, the proposed model also reaches a Rank-1 of 94.8%.
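The non-local attention mentioned in the abstract is a standard self-attention building block in which every feature position aggregates information from all other positions. As an illustration only (the paper's exact architecture is not given here), a minimal NumPy sketch of the embedded-Gaussian non-local operation follows; all function and parameter names are hypothetical, and the projection matrices stand in for weights that a real network would learn:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, w_theta, w_phi, w_g, w_out):
    """Embedded-Gaussian non-local operation on a flattened feature map.

    x: (N, C) array of N spatial/temporal positions with C channels.
    w_theta, w_phi, w_g: (C, C') projection matrices (learned in practice).
    w_out: (C', C) projection back to the input channel count.
    Every output position attends to every other position, then a
    residual connection adds the result back to the input.
    """
    theta = x @ w_theta                        # queries, (N, C')
    phi = x @ w_phi                            # keys,    (N, C')
    g = x @ w_g                                # values,  (N, C')
    attn = softmax(theta @ phi.T, axis=-1)     # (N, N) pairwise weights
    y = attn @ g                               # aggregate over all positions
    return x + y @ w_out                       # residual connection
```

Because the attention matrix is N-by-N over all positions, the block captures long-range dependencies that stacked local convolutions reach only slowly, which is what makes it useful for relating a person's appearance across distant frames.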

Key words: video person re-identification, spatiotemporal information, global feature, non-local attention, feature fusion
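The final stage of the pipeline, measuring similarity between sequence-level person features and ranking gallery candidates, can be sketched with cosine similarity (one common choice in re-identification; the paper's actual metric is not specified here, and all names below are hypothetical):

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Rank gallery features by cosine similarity to a query feature.

    query_feat: (C,) sequence-level feature of the query track.
    gallery_feats: (M, C) features of M gallery tracks.
    Returns gallery indices sorted from most to least similar; Rank-1
    accuracy then asks whether index 0 shares the query's identity.
    """
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                  # cosine similarity per gallery track
    return np.argsort(-sims)      # indices in descending similarity
```

Evaluation metrics such as Rank-1 and mAP are then computed from these ranked lists over all queries, comparing the identities of the top-ranked gallery tracks against the query identity.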

CLC number: