Cross-domain person re-identification method based on attention mechanism with learning intra-domain variance

doi:10.11772/j.issn.1001-9081.2021030459

Abstract

To address the severe performance degradation of person re-identification models under cross-domain transfer, a cross-domain person re-identification method based on an attention mechanism with intra-domain variance learning was proposed. First, ResNet50 was used as the backbone network and adapted to the person re-identification task, and the Instance-Batch Normalization Network (IBN-Net) was introduced to improve the generalization ability of the model. A region attention branch was also added to the backbone network to learn more discriminative pedestrian features. Training on the source domain was treated as a classification task: cross-entropy loss was used for supervised learning, and triplet loss was introduced to mine the details of source-domain samples and improve classification performance on the source domain. For training on the target domain, intra-domain variance was learned to adapt to the difference in data distribution between the source domain and the target domain. In the test phase, the output of the ResNet50 pool-5 layer was used as the image feature, and the Euclidean distance between the query image and each candidate image was computed to measure their similarity. In experiments on two large-scale public datasets, with Market-1501 and DukeMTMC-reID as the respective target domains, the proposed method achieved Rank-1 accuracies of 80.1% and 67.7% and mean Average Precision (mAP) values of 49.5% and 44.2%, respectively, compared with baseline Rank-1 accuracies of 45.6% and 48.7%. The experimental results show that the proposed method effectively improves the generalization ability of the model.

Key words: unsupervised domain adaptation, intra-domain variance, person re-identification, attention mechanism, discriminative feature
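
The source-domain objective described in the abstract (cross-entropy for identity classification plus a triplet loss) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which triplet variant, margin, or loss weighting was used, so the batch-hard mining, `margin=0.3`, `weight=1.0`, and all function names here are hypothetical. Plain Python lists stand in for the logits and features a network would produce.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using log-sum-exp for stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet(features, labels, margin=0.3):
    """Batch-hard triplet loss (assumed variant): for each anchor, use its
    farthest same-identity sample and its nearest different-identity sample."""
    losses = []
    for i, (f_a, y_a) in enumerate(zip(features, labels)):
        pos = [euclidean(f_a, f)
               for j, (f, y) in enumerate(zip(features, labels))
               if y == y_a and j != i]
        neg = [euclidean(f_a, f)
               for f, y in zip(features, labels) if y != y_a]
        if not pos or not neg:
            continue  # anchor has no valid triplet in this batch
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


def source_loss(logits, features, labels, margin=0.3, weight=1.0):
    """Mean cross-entropy plus weighted triplet loss (weight is an assumption)."""
    ce = sum(cross_entropy(l, y) for l, y in zip(logits, labels)) / len(labels)
    return ce + weight * batch_hard_triplet(features, labels, margin)


if __name__ == "__main__":
    # Toy batch: two identities, two samples each, well separated in feature space.
    feats = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
    ids = [0, 0, 1, 1]
    logits = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
    print(source_loss(logits, feats, ids))
```

When identities are already separated by more than the margin, the triplet term vanishes and only the classification term contributes; the triplet term becomes active for anchors whose hardest negative lies closer than their hardest positive plus the margin.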

CLC Number: TP391.41

Daili CHEN, Guoliang XU. Cross-domain person re-identification method based on attention mechanism with learning intra-domain variance[J]. Journal of Computer Applications, 2022, 42(5): 1391-1397.

Method	DukeMTMC-reID → Market-1501		Market-1501 → DukeMTMC-reID	
	mAP (%)	Rank-1 (%)	mAP (%)	Rank-1 (%)
Baseline	20.2	45.6	19.4	48.7
Baseline+GFB	47.9	78.6	43.4	65.8
Baseline+RAB	49.5	78.9	44.3	66.4
Baseline+GFB+RAB	49.5	80.1	44.2	67.7

Method	DukeMTMC-reID → Market-1501				Market-1501 → DukeMTMC-reID			
	Rank-1 (%)	Rank-5 (%)	Rank-10 (%)	mAP (%)	Rank-1 (%)	Rank-5 (%)	Rank-10 (%)	mAP (%)
LOMO	27.2	41.6	49.1	8.0	12.3	21.3	26.6	4.8
BoW	35.8	52.4	60.3	14.8	17.1	28.8	34.9	8.3
SSG	80.0	90.0	92.4	58.3	73.0	80.6	83.2	53.4
MAR	67.7	81.9	—	40.0	67.1	79.8	—	48.0
SPGAN	51.5	70.1	76.8	22.8	41.1	56.6	63.0	22.3
PTGAN	38.6	—	66.7	—	27.4	—	50.7	—
CamStyle	58.8	78.2	84.3	27.4	48.4	62.5	68.9	25.1
CSGAN	61.9	78.8	84.4	29.7	47.8	63.5	67.2	26.3
ECN	75.1	78.8	84.0	43.0	63.3	75.8	80.4	40.4
D-MMD	70.6	87.0	90.2	48.8	63.5	78.8	83.9	46.0
ICE	90.8	95.8	97.2	73.8	80.2	88.5	91.6	66.4
Proposed method	80.1	91.1	93.9	49.5	67.7	79.1	82.5	44.2
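
The Rank-k and mAP figures in this comparison come from ranking candidate images by Euclidean distance to each query, as the abstract describes for the test phase. The sketch below shows one way such metrics can be computed. It is a simplified illustration, not the evaluation code behind the table: the function names are hypothetical, features are plain Python lists rather than ResNet50 pool-5 outputs, and the filtering of same-camera matches used in the standard Market-1501 and DukeMTMC-reID protocols is omitted.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_gallery(query_feat, gallery_feats):
    """Gallery indices sorted by ascending distance to the query."""
    dists = [euclidean(query_feat, g) for g in gallery_feats]
    return sorted(range(len(gallery_feats)), key=lambda i: dists[i])


def average_precision(ranked_ids, query_id):
    """Mean of the precision values at each rank where a correct match occurs."""
    hits, precisions = 0, []
    for rank, gid in enumerate(ranked_ids, start=1):
        if gid == query_id:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


def evaluate(query_feats, query_ids, gallery_feats, gallery_ids, ks=(1, 5, 10)):
    """Return ({'Rank-k': accuracy}, mAP) over all queries."""
    cmc = {k: 0 for k in ks}
    aps = []
    for feat, qid in zip(query_feats, query_ids):
        order = rank_gallery(feat, gallery_feats)
        ranked_ids = [gallery_ids[i] for i in order]
        for k in ks:
            if qid in ranked_ids[:k]:  # a correct match appears in the top k
                cmc[k] += 1
        aps.append(average_precision(ranked_ids, qid))
    n = len(query_ids)
    return {f"Rank-{k}": cmc[k] / n for k in ks}, sum(aps) / n


if __name__ == "__main__":
    # Toy gallery of four images from two identities on a 1-D line.
    gallery_feats = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]]
    gallery_ids = [1, 2, 1, 2]
    query_feats = [[0.1, 0.0], [5.2, 0.0]]
    query_ids = [1, 2]
    ranks, mean_ap = evaluate(query_feats, query_ids, gallery_feats, gallery_ids)
    print(ranks, mean_ap)
```

Rank-k only asks whether at least one correct match appears among the k nearest candidates, whereas mAP rewards placing every correct match early in the ranking, which is why a method can have a high Rank-1 score and a much lower mAP.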
