Journal of Computer Applications, 2022, Vol. 42, Issue (10): 3200-3208. DOI: 10.11772/j.issn.1001-9081.2021081510

• Multimedia computing and computer simulation

Cross-modal person re-identification model based on dynamic dual-attention mechanism

Dawei LI1,2, Zhiyong ZENG1,2   

  1. College of Computer and Cyber Security, Fujian Normal University, Fuzhou, Fujian 350117, China
  2. Digital Fujian Institute of Big Data Security Technology, Fujian Normal University, Fuzhou, Fujian 350117, China
  • Received: 2021-08-24; Revised: 2021-12-06; Accepted: 2021-12-06; Online: 2022-01-07; Published: 2022-10-10
  • Contact: Zhiyong ZENG
  • About the authors: LI Dawei, born in 1997, M.S. candidate. His research interests include person re-identification.
    ZENG Zhiyong, born in 1965, Ph.D., associate professor. His research interests include object detection and face recognition.

基于动态双注意力机制的跨模态行人重识别模型

李大伟1,2, 曾智勇1,2   

  1. 福建师范大学 计算机与网络空间安全学院,福州 350117
  2. 福建师范大学 数字福建大数据安全技术研究所,福州 350117
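  • Corresponding author: 曾智勇 (ZENG Zhiyong)
  • About the authors: First contact: 李大伟 (LI Dawei), born in 1997, male, from Lu'an, Anhui, M.S. candidate. His main research interest is person re-identification.
    曾智勇 (ZENG Zhiyong), born in 1965, male, from Longnan, Jiangxi, associate professor, Ph.D. His main research interests are object detection and face recognition. zzyong@fjnu.edu.cn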

Abstract:

To address the large modality gap between images in cross-modal person re-identification, most existing methods rely on pixel alignment and feature alignment to match images. To further improve the matching accuracy between images of the two modalities, a multi-input dual-stream network model based on a dynamic dual-attention mechanism was designed. Firstly, images of the same person taken by different cameras were added to each training batch, so that the neural network could learn sufficient feature information from a limited number of samples. Secondly, the gray-scale images obtained by homogeneous augmentation were used as an intermediate bridge, retaining the structural information of the visible-light images while eliminating their color information. The use of gray-scale images weakened the network's dependence on color information, thereby strengthening the model's ability to mine structural information. Finally, a Weighted Six-Directional triple Ranking (WSDR) loss suitable for images of three modalities was proposed, which made full use of the cross-modal triplet relationships under different views, optimized the relative distances between the features of multiple modalities, and improved the robustness to modality changes. Experimental results on the SYSU-MM01 dataset show that, compared with the Dynamic Dual-attentive AGgregation (DDAG) learning model, the proposed model improves the evaluation indexes Rank-1 and mean Average Precision (mAP) by 4.66 and 3.41 percentage points respectively.

Key words: cross-modal, person re-identification, multi-input dual-stream network, homogeneous augmentation, Weighted Six-Directional triple Ranking (WSDR) loss
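
The gray-scale bridge described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which gray-scale conversion was used, so the ITU-R BT.601 luma weights below are an assumption, and the names `to_grayscale_3ch` and `build_tri_modal_sample` are hypothetical. Images are represented as plain nested lists of `(r, g, b)` tuples to keep the example dependency-free.

```python
# Assumption: ITU-R BT.601 luma weights. The source does not specify
# which gray-scale conversion the authors used.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def to_grayscale_3ch(image):
    """Convert an RGB image (rows of (r, g, b) tuples, values 0-255) into a
    three-channel gray-scale image of the same shape.

    Spatial structure is preserved pixel by pixel, while color is removed
    because all three output channels carry the same luma value.
    """
    gray_image = []
    for row in image:
        gray_row = []
        for r, g, b in row:
            luma = round(
                LUMA_WEIGHTS[0] * r + LUMA_WEIGHTS[1] * g + LUMA_WEIGHTS[2] * b
            )
            gray_row.append((luma, luma, luma))
        gray_image.append(gray_row)
    return gray_image


def build_tri_modal_sample(visible, infrared):
    """Hypothetical helper: bundle the three views of one training sample.

    The gray-scale view is derived from the visible image, so it shares the
    visible image's structure but not its color.
    """
    return {
        "visible": visible,
        "gray": to_grayscale_3ch(visible),
        "infrared": infrared,
    }
```

Replicating the luma value across three channels keeps the gray-scale input the same shape as the visible input, so under this assumption both could pass through the same network stream without architectural changes.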

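Abstract (translated from the Chinese version):

To address the large modality gap between images in cross-modal person re-identification, most existing methods use pixel alignment and feature alignment to match images. To further improve the matching accuracy between images of the two modalities, a multi-input dual-stream network model based on a dynamic dual-attention mechanism was designed. Firstly, images of the same person under different cameras were added to each training batch, so that the neural network could learn sufficient feature information from limited samples. Secondly, gray-scale images obtained by homogeneous augmentation were used as an intermediate bridge, which retained the structural information of the visible-light images while eliminating the color information; the use of gray-scale images weakened the network's dependence on color information and thus strengthened the model's ability to mine structural information. Finally, a Weighted Six-Directional triple Ranking (WSDR) loss suitable for images of three modalities was proposed; it makes full use of the cross-modal triplet relationships under different views, optimizes the relative distances between the features of multiple modalities, and improves the robustness to modality changes. Experimental results show that, on the SYSU-MM01 dataset, the proposed model improves Rank-1 and mean Average Precision (mAP) by 4.66 and 3.41 percentage points respectively compared with the Dynamic Dual-attentive AGgregation (DDAG) learning model.

Key words (translated from the Chinese version): cross-modal, person re-identification, multi-input dual-stream network, homogeneous augmentation, weighted six-directional triple ranking loss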
