Cross-modal person re-identification model based on dynamic dual-attention mechanism

doi:10.11772/j.issn.1001-9081.2021081510

Abstract

To address the large modality discrepancy between images in cross-modal person re-identification, most existing methods match images through pixel alignment and feature alignment. To further improve the accuracy of matching images across the two modalities, a multi-input dual-stream network model based on a dynamic dual-attention mechanism was designed. Firstly, images of the same person captured by different cameras were added to each training batch, enabling the neural network to learn sufficient feature information from a limited number of samples. Secondly, gray-scale images obtained by homogeneous augmentation were used as an intermediate bridge that retains the structural information of the visible-light images while eliminating their color information. Using gray-scale images weakens the network's dependence on color information and thereby strengthens the model's ability to mine structural information. Finally, a Weighted Six-Directional triple Ranking (WSDR) loss suitable for images of three modalities was proposed. It makes full use of cross-modal triplet relationships from different perspectives, optimizes the relative distances between features of multiple modalities, and improves robustness to modality changes. Experimental results on the SYSU-MM01 dataset show that, compared with the Dynamic Dual-attentive AGgregation (DDAG) learning model, the proposed model improves the evaluation metrics Rank-1 and mean Average Precision (mAP) by 4.66 and 3.41 percentage points, respectively.
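The first two steps above (adding same-person images from different cameras to each batch, and deriving gray-scale bridge images by homogeneous augmentation) can be sketched as follows. This is a minimal illustration, not the authors' code: images are plain nested lists of (R, G, B) tuples, the ITU-R BT.601 luma weights are an assumption (the exact conversion is not stated here), and the data layout `vis[pid][cam]`, `ir[pid]` and the helpers `to_gray3` and `sample_batch` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Pixel = Tuple[float, float, float]   # (R, G, B) values in [0, 255]
Image = List[List[Pixel]]            # an image as rows of pixels

# Assumption: ITU-R BT.601 luma weights; the paper's exact conversion is not given here.
LUMA = (0.299, 0.587, 0.114)


def to_gray3(img: Image) -> Image:
    """Gray-scale "bridge" image: each pixel becomes its luma value repeated over
    three channels, so structure (edges, silhouettes) is kept but color is not."""
    out: Image = []
    for row in img:
        new_row = []
        for r, g, b in row:
            y = LUMA[0] * r + LUMA[1] * g + LUMA[2] * b
            new_row.append((y, y, y))
        out.append(new_row)
    return out


def sample_batch(vis: Dict[int, Dict[int, List[Image]]],
                 ir: Dict[int, List[Image]],
                 num_ids: int,
                 rng: random.Random):
    """Hypothetical multi-input batch: for each sampled identity, draw visible images
    from two different visible cameras when possible, derive their gray versions,
    and pair each with an infrared image of the same identity."""
    ids = rng.sample(sorted(vis), num_ids)
    visible, gray, infrared, labels = [], [], [], []
    for pid in ids:
        cams = sorted(vis[pid])
        # Prefer two distinct cameras; fall back to repeating the only camera.
        chosen = rng.sample(cams, 2) if len(cams) >= 2 else [cams[0], cams[0]]
        for cam in chosen:
            img = rng.choice(vis[pid][cam])
            visible.append(img)
            gray.append(to_gray3(img))
            infrared.append(rng.choice(ir[pid]))
            labels.append(pid)
    return visible, gray, infrared, labels
```

The three returned lists line up index by index, so one label list serves the visible, gray-scale and infrared inputs of a multi-input network.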

Key words: cross-modal, person re-identification, multi-input dual-stream network, homogeneous augmentation, Weighted Six-Directional triple Ranking (WSDR) loss

CLC Number: TP391

Dawei LI, Zhiyong ZENG. Cross-modal person re-identification model based on dynamic dual-attention mechanism[J]. Journal of Computer Applications, 2022, 42(10): 3200-3208.

References

1	SONG W R, ZHAO Q Q, CHEN C H, et al. Survey on pedestrian re-identification research[J]. CAAI Transactions on Intelligent Systems, 2017, 12(6): 770-780. 10.11992/tis.201706084
2	YE M, SHEN J B, SHAO L. Visible-infrared person re-identification via homogeneous augmented tri-modal learning[J]. IEEE Transactions on Information Forensics and Security, 2021, 16: 728-739. 10.1109/tifs.2020.3001665
3	DAI P, JI R, WANG H, et al. Cross-modality person re-identification with generative adversarial training[C]// Proceedings of the 27th International Joint Conference on Artificial Intelligence. California: ijcai.org, 2018: 677-683. 10.24963/ijcai.2018/94
4	WANG G A, ZHANG T Z, CHENG J, et al. RGB-infrared cross-modality person re-identification via joint pixel and feature alignment[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 3622-3631. 10.1109/iccv.2019.00372
5	YE M, SHEN J B, CRANDALL D J, et al. Dynamic dual-attentive aggregation learning for visible-infrared person re-identification[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12362. Cham: Springer, 2020: 229-247.
6	YI D, LEI Z, LIAO S C, et al. Deep metric learning for person re-identification[C]// Proceedings of the 22nd International Conference on Pattern Recognition. Piscataway: IEEE, 2014: 34-39. 10.1109/icpr.2014.16
7	JÜNGLING K, BODENSTEINER C, ARENS M. Person re-identification in multi-camera networks[C]// Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2011: 55-61. 10.1109/cvprw.2011.5981771
8	ZHENG L, BIE Z, SUN Y F, et al. MARS: a video benchmark for large-scale person re-identification[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9910. Cham: Springer, 2016: 868-884.
9	FELZENSZWALB P, McALLESTER D, RAMANAN D. A discriminatively trained, multiscale, deformable part model[C]// Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2008: 1-8. 10.1109/cvpr.2008.4587597
10	ZHENG W S, GONG S G, XIANG T. Reidentification by relative distance comparison[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(3): 653-668. 10.1109/tpami.2012.138
11	WEINBERGER K Q, SAUL L K. Distance metric learning for large margin nearest neighbor classification[J]. Journal of Machine Learning Research, 2009, 10: 207-244.
12	LIAO S C, HU Y, ZHU X Y, et al. Person re-identification by local maximal occurrence representation and metric learning[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 2197-2206. 10.1109/cvpr.2015.7298832
13	ZHENG W S, GONG S G, XIANG T. Person re-identification by probabilistic relative distance comparison[C]// Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2011: 649-656. 10.1109/cvpr.2011.5995598
14	PEDAGADI S, ORWELL J, VELASTIN S, et al. Local Fisher discriminant analysis for pedestrian re-identification[C]// Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2013: 3318-3325. 10.1109/cvpr.2013.426
15	WANG J Y, ZHU X T, GONG S G, et al. Transferable joint attribute-identity deep learning for unsupervised person re-identification[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 2275-2284. 10.1109/cvpr.2018.00242
16	ZHANG X, LUO H, FAN X, et al. AlignedReID: surpassing human-level performance in person re-identification[EB/OL]. (2018-01-31)[2021-10-10]..
17	SUN Y F, XU Q, LI Y L, et al. Perceive where to focus: learning visibility-aware part-level features for partial person re-identification[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 393-402. 10.1109/cvpr.2019.00048
18	SUN Y F, ZHENG L, YANG Y, et al. Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline)[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11208. Cham: Springer, 2018: 501-518.
19	ZHENG F, DENG C, SUN X, et al. Pyramidal person re-identification via multi-loss dynamic training[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 8506-8514. 10.1109/cvpr.2019.00871
20	WANG Z X, WANG Z, ZHENG Y Q, et al. Learning to reduce dual-level discrepancy for infrared-visible person re-identification[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 618-626. 10.1109/cvpr.2019.00071
21	YE M, LAN X Y, LI J W, et al. Hierarchical discriminative learning for visible thermal person re-identification[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2018: 7501-7508. 10.1609/aaai.v32i1.12293
22	LI S, XIAO T, LI H S, et al. Identity-aware textual-visual matching with latent co-attention[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1908-1917. 10.1109/iccv.2017.209
23	PANG L, WANG Y W, SONG Y Z, et al. Cross-domain adversarial feature learning for sketch re-identification[C]// Proceedings of the 26th ACM International Conference on Multimedia. New York: ACM, 2018: 609-617. 10.1145/3240508.3240606
24	WU A C, ZHENG W S, YU H X, et al. RGB-infrared cross-modality person re-identification[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 5390-5399. 10.1109/iccv.2017.575
25	YE M, LAN X Y, WANG Z, et al. Bi-directional center-constrained top-ranking for visible thermal person re-identification[J]. IEEE Transactions on Information Forensics and Security, 2020, 15: 407-419. 10.1109/tifs.2019.2921454
26	ZHU Y X, YANG Z, WANG L, et al. Hetero-center loss for cross-modality person re-identification[J]. Neurocomputing, 2020, 386: 97-109. 10.1016/j.neucom.2019.12.100
27	HAO Y, WANG N N, LI J, et al. HSME: hypersphere manifold embedding for visible thermal person re-identification[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2019: 8385-8392. 10.1609/aaai.v33i01.33018385
28	YE M, LAN X Y, LENG Q M. Modality-aware collaborative learning for visible thermal person re-identification[C]// Proceedings of the 27th ACM International Conference on Multimedia. New York: ACM, 2019: 347-355. 10.1145/3343031.3351043
29	FENG Z X, LAI J H, XIE X H. Learning modality-specific representations for visible-infrared person re-identification[J]. IEEE Transactions on Image Processing, 2020, 29: 579-590. 10.1109/tip.2019.2928126
30	LIU C T, WU C W, WANG Y C F, et al. Spatially and temporally efficient non-local attention network for video-based person re-identification[C]// Proceedings of the 2019 British Machine Vision Conference. Durham: BMVA Press, 2019: No.77. 10.1145/3377170.3377253
31	SHAO R, LAN X Y, LI J W, et al. Multi-adversarial discriminative deep domain generalization for face presentation attack detection[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 10015-10023. 10.1109/cvpr.2019.01026
32	YIN J H, MA Z Y, XIE J Y, et al. DF²AM: dual-level feature fusion and affinity modeling for RGB-infrared cross-modality person re-identification[EB/OL]. (2021-04-01)[2021-06-10].. 10.1016/j.neucom.2022.09.077
33	HARWOOD B, VIJAY K B G, CARNEIRO G, et al. Smart mining for deep metric learning[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2840-2848. 10.1109/iccv.2017.307
34	WANG Y M, CHOI J, MORARIU V I, et al. Mining discriminative triplets of patches for fine-grained classification[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1163-1172. 10.1109/cvpr.2016.131
35	WANG C, ZHANG Q, HUANG C, et al. Mancs: a multi-task attentional network with curriculum sampling for person re-identification[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11208. Cham: Springer, 2018: 384-400.
36	NGUYEN D T, HONG H G, KIM K W, et al. Person recognition system based on a combination of body images from visible light and thermal cameras[J]. Sensors, 2017, 17(3): No.605. 10.3390/s17030605
37	HERMANS A, BEYER L, LEIBE B. In defense of the triplet loss for person re-identification[EB/OL]. (2017-11-21)[2020-10-10].. 10.21203/rs.3.rs-1501673/v1
38	DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]// Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2005: 886-893. 10.1109/cvpr.2005.4
39	YE M, SHEN J B, LIN G J, et al. Deep learning for person re-identification: a survey and outlook[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(6): 2872-2893.

	All-search					Indoor-search
Model	Rank-1	Rank-5	Rank-10	Rank-20	mAP	Rank-1	Rank-5	Rank-10	Rank-20	mAP
B	0.5475	0.8231	0.9039	0.9581	0.5302	0.6102	0.8713	0.9406	0.9841	0.6798
B+H0	0.5681	0.8257	0.9124	0.9641	0.5342	0.6291	0.8810	0.9356	0.9791	0.6899
B+DHHI	0.5724	0.8293	0.9157	0.9664	0.5425	0.6362	0.8885	0.9412	0.9821	0.6913
B+DHHI+SDR	0.5937	0.8523	0.9298	0.9724	0.5631	0.6507	0.8995	0.9567	0.9865	0.7155
B+DHHI+WSDR	0.5941	0.8549	0.9345	0.9758	0.5643	0.6525	0.9011	0.9595	0.9897	0.7189
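The abstract describes WSDR as a weighted triplet ranking loss applied in six directions across the visible, gray-scale and infrared modalities, and the ablation rows above contrast it with an unweighted SDR variant. The exact WSDR formula is not given in this section, so the following is only a sketch under stated assumptions: it enumerates the six ordered modality pairs (anchor modality, target modality; 3 × 2 = 6) and, for each anchor, replaces hard-mined distances with softmax-weighted positive and negative distances in the style of the weighted regularization triplet described in [39], wrapped in a soft-margin softplus. Features are plain Python lists, and `wsdr_loss`, `direction_loss` and the other helpers are hypothetical names.

```python
import math
from itertools import permutations
from typing import Dict, List, Sequence

Vec = Sequence[float]
# Assumption: softmax-weighted distances (weighted-regularization-triplet style, see [39])
# stand in for the paper's unspecified weighting scheme.


def euclidean(a: Vec, b: Vec) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def softmax_weighted_mean(values: List[float], sign: float) -> float:
    """Weighted mean with weights proportional to exp(sign * value).
    sign=+1 emphasizes far positives; sign=-1 emphasizes close negatives."""
    m = max(sign * v for v in values)              # shift for numerical stability
    exps = [math.exp(sign * v - m) for v in values]
    return sum(e * v for e, v in zip(exps, values)) / sum(exps)


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def direction_loss(anchors: List[Vec], a_ids: List[int],
                   targets: List[Vec], t_ids: List[int]) -> float:
    """Ranking loss for one direction: anchors from one modality,
    positives and negatives drawn from another modality."""
    losses = []
    for a, aid in zip(anchors, a_ids):
        pos = [euclidean(a, t) for t, tid in zip(targets, t_ids) if tid == aid]
        neg = [euclidean(a, t) for t, tid in zip(targets, t_ids) if tid != aid]
        if not pos or not neg:
            continue  # no valid triplet for this anchor in this direction
        d_pos = softmax_weighted_mean(pos, +1.0)
        d_neg = softmax_weighted_mean(neg, -1.0)
        losses.append(softplus(d_pos - d_neg))   # small when positives are closer
    return sum(losses) / len(losses) if losses else 0.0


def wsdr_loss(feats: Dict[str, List[Vec]], ids: Dict[str, List[int]]) -> float:
    """Sum of direction losses over all ordered modality pairs
    (six pairs for three modalities)."""
    return sum(direction_loss(feats[s], ids[s], feats[t], ids[t])
               for s, t in permutations(sorted(feats), 2))
```

Because every modality serves as the anchor against both of the others, each cross-modal pair is constrained in both directions, which is what distinguishes a six-directional loss from a three-directional one.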

	All-search		Indoor-search
Loss	Rank-1	mAP	Rank-1	mAP
Triplet (Hard) [37]	0.5391	0.5176	0.5857	0.6589
WTDR [2]	0.5642	0.5332	0.6254	0.6872
WSDR	0.5823	0.5508	0.6410	0.7039
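The tables report Rank-k accuracy (points on the cumulative matching characteristic curve) and mAP. A minimal sketch of both metrics computed from a query-by-gallery distance matrix is given below. It is a simplified protocol: the official SYSU-MM01 evaluation additionally filters gallery images by camera and averages over repeated random gallery selections, and both steps are omitted here. The helper name `evaluate` is hypothetical.

```python
from typing import List, Sequence, Tuple


def evaluate(dist: Sequence[Sequence[float]], q_ids: Sequence[int],
             g_ids: Sequence[int], max_rank: int = 20) -> Tuple[List[float], float]:
    """Return (CMC curve up to max_rank, mAP) for a query x gallery distance matrix.
    cmc[k - 1] is the Rank-k accuracy. Simplified: no camera filtering, one gallery."""
    cmc = [0.0] * max_rank
    aps: List[float] = []
    for row, qid in zip(dist, q_ids):
        order = sorted(range(len(row)), key=lambda j: row[j])   # nearest first
        matches = [g_ids[j] == qid for j in order]
        if not any(matches):
            continue  # a query without any true match is skipped
        first = matches.index(True)
        for k in range(first, max_rank):
            cmc[k] += 1.0                       # a correct match lies within the top k+1
        hits, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / rank)  # precision at each true match
        aps.append(sum(precisions) / len(precisions))
    if not aps:
        raise ValueError("no query has a true match in the gallery")
    n = len(aps)
    return [c / n for c in cmc], sum(aps) / n
```

For example, with distances `[[0.1, 0.5, 0.9], [0.8, 0.2, 0.3]]`, query identities `[1, 2]` and gallery identities `[2, 1, 2]`, neither query's nearest gallery image is a true match, so Rank-1 is 0 while Rank-2 is 1, and the mAP is 13/24.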

	All-search		Indoor-search
Strategy	Rank-1	mAP	Rank-1	mAP
Base	0.5733	0.5426	0.6341	0.6850
Base+IWPA	0.5839	0.5504	0.6412	0.6934
Base+CGSA	0.5735	0.5480	0.6350	0.6903
Base+IWPA+CGSA	0.5941	0.5643	0.6525	0.7189
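The Base+IWPA+CGSA row reports the same values as the B+DHHI+WSDR row of the first results table. Assuming row B of that table is the DDAG baseline, the gains quoted in the abstract are plain differences expressed in percentage points. The short sketch below recomputes them; the dictionary only copies the all-search Rank-1 and mAP values of the B rows, and `gain_pp` is a hypothetical helper.

```python
# All-search (Rank-1, mAP) values copied from the B rows of the first results table.
ALL_SEARCH = {
    "B": (0.5475, 0.5302),
    "B+H0": (0.5681, 0.5342),
    "B+DHHI": (0.5724, 0.5425),
    "B+DHHI+SDR": (0.5937, 0.5631),
    "B+DHHI+WSDR": (0.5941, 0.5643),
}


def gain_pp(row: str, base: str = "B") -> tuple:
    """Improvement of `row` over `base` in percentage points, rounded to 0.01."""
    (r1, m), (b1, bm) = ALL_SEARCH[row], ALL_SEARCH[base]
    return round((r1 - b1) * 100, 2), round((m - bm) * 100, 2)


print(gain_pp("B+DHHI+WSDR"))   # -> (4.66, 3.41)
for name in ALL_SEARCH:
    d_r1, d_map = gain_pp(name)
    print(f"{name:<12} Rank-1 +{d_r1:.2f} pp, mAP +{d_map:.2f} pp")
```

Percentage points are used rather than relative percentages: 0.5941 versus 0.5475 is a 4.66-point gain, which corresponds to a relative improvement of roughly 8.5%.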