基于多模态图卷积神经网络的行人重识别方法

doi:10.11772/j.issn.1001-9081.2022060827

Journal of Computer Applications (official website) ›› 2023, Vol. 43 ›› Issue (7): 2182-2189. DOI: 10.11772/j.issn.1001-9081.2022060827

Topic: Artificial intelligence

何嘉明¹, 杨巨成¹, 吴超¹, 闫潇宁², 许能华²

1. 天津科技大学人工智能学院，天津 300457
2. 深圳市安软科技股份有限公司，广东深圳 518131

收稿日期:2022-06-10 修回日期:2022-09-02 接受日期:2022-09-09 发布日期:2022-10-11 出版日期:2023-07-10
通讯作者: 杨巨成
作者简介:何嘉明（1995—），男，广东清远人，硕士研究生，CCF会员，主要研究方向：行人重识别；
杨巨成（1980—），男，湖北天门人，教授，博士，CCF杰出会员，主要研究方向：图像处理、模式识别；
吴超（1974—），男，湖北咸宁人，讲师，博士，主要研究方向：图像处理、模式识别；
闫潇宁（1989—），男，山西大同人，硕士，主要研究方向：视频图像人工智能；
许能华（1982—），男，江西上饶人，主要研究方向：视频图像人工智能。

Person re-identification method based on multi-modal graph convolutional neural network

Jiaming HE¹, Jucheng YANG¹, Chao WU¹, Xiaoning YAN², Nenghua XU²

1. College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China
2. Shenzhen Softsz Technology Company Limited, Shenzhen, Guangdong 518131, China

Received: 2022-06-10 Revised: 2022-09-02 Accepted: 2022-09-09 Online: 2022-10-11 Published: 2023-07-10
Corresponding author: Jucheng YANG
About the authors: HE Jiaming, born in 1995, from Qingyuan, Guangdong, M.S. candidate, CCF member. His research interests include person re-identification.
YANG Jucheng, born in 1980, from Tianmen, Hubei, Ph.D., professor, CCF distinguished member. His research interests include image processing and pattern recognition.
WU Chao, born in 1974, from Xianning, Hubei, Ph.D., lecturer. His research interests include image processing and pattern recognition.
YAN Xiaoning, born in 1989, from Datong, Shanxi, M.S. His research interests include artificial intelligence for video images.
XU Nenghua, born in 1982, from Shangrao, Jiangxi. His research interests include artificial intelligence for video images.

摘要/Abstract

摘要：

针对行人重识别中行人文本属性信息未被充分利用以及文本属性之间语义联系未被挖掘的问题，提出一种基于多模态的图卷积神经网络（GCN）行人重识别方法。首先使用深度卷积神经网络（DCNN）学习行人文本属性与行人图像特征；然后借助GCN有效的关系挖掘能力，将文本属性特征与图像特征作为GCN的输入，通过图卷积运算来传递文本属性节点间的语义信息，从而学习文本属性间隐含的语义联系信息，并将该语义信息融入图像特征中；最后GCN输出鲁棒的行人特征。该多模态的行人重识别方法在Market-1501数据集上获得了87.6%的平均精度均值（mAP）和95.1%的Rank-1准确度；在DukeMTMC-reID数据集上获得了77.3%的mAP和88.4%的Rank-1准确度，验证了所提方法的有效性。

关键词: 行人重识别, 多模态, 图卷积神经网络, 行人文本属性, 隐含语义联系

Abstract:

To address the problems that person textual attribute information is not fully utilized in person re-identification and that the semantic relationships among textual attributes are left unexplored, a person re-identification method based on a multi-modal Graph Convolutional neural Network (GCN) is proposed. First, a Deep Convolutional Neural Network (DCNN) is used to learn person textual attributes and person image features. Then, exploiting the relationship-mining ability of GCN, the textual attribute features and the image features are fed into the GCN as input, and graph convolution operations pass semantic information among the textual attribute nodes, so that the implicit semantic relationships among the textual attributes are learned and incorporated into the image features. Finally, the GCN outputs robust person features. The proposed multi-modal method achieves a mean Average Precision (mAP) of 87.6% and a Rank-1 accuracy of 95.1% on the Market-1501 dataset, and an mAP of 77.3% and a Rank-1 accuracy of 88.4% on the DukeMTMC-reID dataset, which verifies its effectiveness.
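
As a rough illustration of the graph-convolution step described in the abstract, the sketch below applies one layer of the propagation rule from Kipf and Welling [26], H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W), to a toy graph with three attribute nodes and one image node. The graph layout, feature values, weights, and all function names are hypothetical; this page does not give the graph construction or the dimensions the method actually uses, so this is a sketch under those assumptions and not the authors' implementation.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square, non-negative adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weights):
    """One graph-convolution layer: ReLU(normalized_adj @ features @ weights)."""
    propagated = matmul(normalize_adjacency(adj), features)
    projected = matmul(propagated, weights)
    return [[max(0.0, v) for v in row] for row in projected]


# Hypothetical toy graph: nodes 0-2 are textual attributes, node 3 is the image feature.
adj = [
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 0.0],
]
features = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [0.5, 0.5],
]
weights = [
    [1.0, -1.0],
    [0.5, 1.0],
]

fused = gcn_layer(adj, features, weights)
image_feature = fused[3]  # image node after receiving information from the attribute nodes
```

Because the image node is connected to every attribute node in this toy graph, its output row mixes its own feature with those of the attributes, which is the general idea of merging attribute semantics into the image feature.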

Key words: person re-identification, multi-modal, Graph Convolutional neural Network (GCN), person textual attribute, implicit semantic relationship

CLC number:

TP391.41

何嘉明, 杨巨成, 吴超, 闫潇宁, 许能华. 基于多模态图卷积神经网络的行人重识别方法[J]. 计算机应用, 2023, 43(7): 2182-2189.

Jiaming HE, Jucheng YANG, Chao WU, Xiaoning YAN, Nenghua XU. Person re-identification method based on multi-modal graph convolutional neural network[J]. Journal of Computer Applications, 2023, 43(7): 2182-2189.

Figures and tables (6)

图1 本文方法总览

Fig. 1 Overview of the proposed method

图2 行人文本属性的相关矩阵

Fig. 2 Correlation matrix of person textual attributes

表1 Market-1501与DukeMTMC-reID数据集上的实验结果对比 ( %)

Tab. 1 Comparison of experimental results on Market-1501 and DukeMTMC-reID datasets (%)

Method	Market-1501 mAP	Market-1501 Rank-1	DukeMTMC-reID mAP	DukeMTMC-reID Rank-1
HA-CNN [15]	75.7	91.2	63.8	80.5
PCB [12]	77.3	92.4	69.2	83.3
OSNet [13]	84.9	94.8	73.5	88.6
APR [20]	66.9	87.0	55.5	73.9
ACRN [21]	62.6	83.4	52.0	72.6
AANet [22]	82.5	93.9	72.6	86.4
SGGNN [23]	82.8	92.3	68.2	81.1
MGAT [24]	76.5	91.5	—	—
HOReID [25]	84.9	94.2	75.6	86.9
CLFA [27]	85.9	94.5	76.4	86.4
DCC [28]	88.6	—	—	—
Base-CNN (ours)	85.6	94.2	76.2	86.3
MMGCN (ours)	87.6	95.1	77.3	88.4
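
Table 1 reports mAP and Rank-1. The following minimal sketch shows how these two retrieval metrics are commonly computed from ranked gallery lists. The function names and the toy rankings are illustrative assumptions, and the sketch leaves out the dataset-specific filtering (for example, discarding same-camera or junk gallery images) that standard re-identification evaluation protocols apply.

```python
def average_precision(ranked_matches):
    """Average precision for one query.

    ranked_matches[i] is True when the i-th ranked gallery image
    has the same identity as the query.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


def evaluate(all_ranked_matches):
    """Return (mAP, Rank-1 accuracy) as fractions in [0, 1]."""
    n = len(all_ranked_matches)
    mean_ap = sum(average_precision(r) for r in all_ranked_matches) / n
    # Rank-1: share of queries whose top-ranked gallery image is a correct match.
    rank1 = sum(1 for r in all_ranked_matches if r and r[0]) / n
    return mean_ap, rank1


# Hypothetical ranked results for two queries.
queries = [
    [True, False, True, False],   # AP = (1/1 + 2/3) / 2
    [False, True, False, False],  # AP = 1/2
]
mean_ap, rank1 = evaluate(queries)
```

Multiplying both values by 100 gives percentages in the form used in Table 1.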

图3 不同前m个属性的实验结果

Fig. 3 Experimental results with different top-m attributes

图4 不同图卷积层数的实验结果

Fig. 4 Experimental results with different numbers of graph convolutional layers

图5 可视化实验结果

Fig. 5 Visualized experimental results

References (43)

1	KHAMIS S， KUO C H， SINGH V K， et al. Joint learning for attribute-consistent person re-identification ［C］// Proceedings of the 2014 European Conference on Computer Vision， LNCS 8927. Cham： Springer， 2015： 134-146.
2	KÖSTINGER M， HIRZER M， WOHLHART P， et al. Large scale metric learning from equivalence constraints ［C］// Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2012： 2288-2295. 10.1109/cvpr.2012.6247939
3	LI W， WANG X G. Locally aligned feature transforms across views ［C］// Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2013： 3594-3601. 10.1109/cvpr.2013.461
4	ZHENG W S， GONG S G， XIANG T. Reidentification by relative distance comparison［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2013， 35（3）： 653-668. 10.1109/tpami.2012.138
5	DALAL N， TRIGGS B. Histograms of oriented gradients for human detection ［C］// Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition — Volume 1. Piscataway： IEEE， 2005： 886-893. 10.1109/cvpr.2005.4
6	LOWE D G. Object recognition from local scale-invariant features ［C］// Proceedings of the 7th IEEE International Conference on Computer Vision — Volume 2. Piscataway： IEEE， 1999： 1150-1157. 10.1109/iccv.1999.790410
7	WU Z H， PAN S R， CHEN F W， et al. A comprehensive survey on graph neural networks［J］. IEEE Transactions on Neural Networks and Learning Systems， 2021， 32（1）： 4-24. 10.1109/tnnls.2020.2978386
8	杨永胜，邓淼磊，李磊，等.基于深度学习的行人重识别综述［J］.计算机工程与应用， 2022， 58（9）： 51-66. 10.3778/j.issn.1002-8331.2110-0300
	YANG Y S， DENG M L， LI L， et al. Overview of pedestrian re-identification based on deep learning［J］. Computer Engineering and Applications， 2022， 58（9）： 51-66. 10.3778/j.issn.1002-8331.2110-0300
9	杨锋，许玉，尹梦晓，等.基于深度学习的行人重识别综述［J］.计算机应用， 2020， 40（5）： 1243-1252. 10.11772/j.issn.1001-9081.2019091703
	YANG F， XU Y， YIN M X， et al. Review on deep learning-based pedestrian re-identification［J］. Journal of Computer Applications， 2020， 40（5）： 1243-1252. 10.11772/j.issn.1001-9081.2019091703
10	罗浩，姜伟，范星，等.基于深度学习的行人重识别研究进展［J］.自动化学报， 2019， 45（11）： 2032-2049. 10.16383/j.aas.c180154
	LUO H， JIANG W， FAN X， et al. A survey on deep learning based person re-identification［J］. Acta Automatica Sinica， 2019， 45（11）： 2032-2049. 10.16383/j.aas.c180154
11	LI W， ZHAO R， XIAO T， et al. DeepReID： deep filter pairing neural network for person re-identification ［C］// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2014： 152-159. 10.1109/cvpr.2014.27
12	SUN Y F， ZHENG L， YANG Y， et al. Beyond part models： person retrieval with refined part pooling （and a strong convolutional baseline）［C］// Proceedings of the 2018 European Conference on Computer Vision， LNCS 11208. Cham： Springer， 2018： 501-518.
13	ZHOU K Y， YANG Y X， CAVALLARO A， et al. Omni-scale feature learning for person re-identification ［C］// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway： IEEE， 2019： 3701-3711. 10.1109/iccv.2019.00380
14	邓轩，廖开阳，郑元林，等.基于深度多视图特征距离学习的行人重识别［J］.计算机应用， 2019， 39（8）： 2223-2229. 10.11772/j.issn.1001-9081.2018122505
	DENG X， LIAO K Y， ZHENG Y L， et al. Person re-identification based on deep multi-view feature distance learning［J］. Journal of Computer Applications， 2019， 39（8）： 2223-2229. 10.11772/j.issn.1001-9081.2018122505
15	LI W， ZHU X T， GONG S G. Harmonious attention network for person re-identification ［C］// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2018： 2285-2294. 10.1109/cvpr.2018.00243
16	LI S， BAK S， CARR P， et al. Diversity regularized spatiotemporal attention for video-based person re-identification ［C］// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2018： 369-378. 10.1109/cvpr.2018.00046
17	LIU H， FENG J S， QI M B， et al. End-to-end comparative attention networks for person re-identification［J］. IEEE Transactions on Image Processing， 2017， 26（7）： 3492-3506. 10.1109/tip.2017.2700762
18	LIU X H， ZHAO H Y， TIAN M Q， et al. HydraPlus-Net： attentive deep features for pedestrian analysis ［C］// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway： IEEE， 2017： 350-359. 10.1109/iccv.2017.46
19	刘紫燕，万培佩.基于注意力机制的行人重识别特征提取方法［J］.计算机应用， 2020， 40（3）： 672-676.
	LIU Z Y， WAN P P. Pedestrian re-identification feature extraction method based on attention mechanism［J］. Journal of Computer Applications， 2020， 40（3）： 672-676.
20	LIN Y T， ZHENG L， ZHENG Z D， et al. Improving person re-identification by attribute and identity learning［J］. Pattern Recognition， 2019， 95： 151-161. 10.1016/j.patcog.2019.06.006
21	SCHUMANN A， STIEFELHAGEN R. Person re-identification by deep learning attribute-complementary information ［C］// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Piscataway： IEEE， 2017： 1435-1443. 10.1109/cvprw.2017.186
22	TAY C P， ROY S， YAP K H. AANet： attribute attention network for person re-identifications ［C］// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2019： 7127-7136. 10.1109/cvpr.2019.00730
23	SHEN Y T， LI H S， YI S， et al. Person re-identification with deep similarity-guided graph neural network ［C］// Proceedings of the 2018 European Conference on Computer Vision， LNCS 11219. Cham： Springer， 2018： 508-526.
24	BAO L Q， MA B P， CHANG H， et al. Masked graph attention network for person re-identification ［C］// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway： IEEE， 2019： 1496-1505. 10.1109/cvprw.2019.00191
25	WANG G A， YANG S， LIU H Y， et al. High-order information matters： learning relation and topology for occluded person re-identification ［C］// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2020： 6448-6457. 10.1109/cvpr42600.2020.00648
26	KIPF T N， WELLING M. Semi-supervised classification with graph convolutional networks［EB/OL］. （2017-02-22）［2022-03-22］. 10.48550/arXiv.1609.02907
27	CHEN Q Y， ZHANG W， FAN J P. Cluster-level feature alignment for person re-identification［EB/OL］. （2020-08-15）［2022-07-12］.
28	YAO H T， XU C S. Dual cluster contrastive learning for object re-identification［EB/OL］. （2022-04-21）［2022-07-12］.
29	ZHENG L， SHEN L Y， TIAN L， et al. Scalable person re-identification： a benchmark ［C］// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway： IEEE， 2015： 1116-1124. 10.1109/iccv.2015.133
30	RISTANI E， SOLERA F， ZOU R， et al. Performance measures and a data set for multi-target， multi-camera tracking ［C］// Proceedings of the 2016 European Conference on Computer Vision， LNCS 9914. Cham： Springer， 2016： 17-35.
31	ROSENBLATT F. The perceptron： a probabilistic model for information storage and organization in the brain［J］. Psychological Review， 1958， 65（6）： 386-408. 10.1037/h0042519
32	JIA J， HUANG H J， YANG W J， et al. Rethinking of pedestrian attribute recognition： realistic datasets and a strong baseline［EB/OL］. （2020-05-26）［2022-03-24］.
33	HE K M， ZHANG X Y， REN S Q， et al. Deep residual learning for image recognition ［C］// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway： IEEE， 2016： 770-778. 10.1109/cvpr.2016.90
34	LI D W， CHEN X T， ZHANG Z， et al. Pose guided deep model for pedestrian attribute recognition in surveillance scenarios ［C］// Proceedings of the 2018 IEEE International Conference on Multimedia and Expo. Piscataway： IEEE， 2018： 1-6. 10.1109/icme.2018.8486604
35	MIKOLOV T， SUTSKEVER I， CHEN K， et al. Distributed representations of words and phrases and their compositionality ［C］// Proceedings of the 26th International Conference on Neural Information Processing Systems — Volume 2. Red Hook， NY： Curran Associates Inc.， 2013： 3111-3119.
36	PENNINGTON J， SOCHER R， MANNING C D. GloVe： global vectors for word representation ［C］// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg， PA： ACL， 2014： 1532-1543. 10.3115/v1/d14-1162
37	ZHANG G H， LIANG G Y， SU F， et al. Cross-domain attribute representation based on convolutional neural network ［C］// Proceedings of the 2018 International Conference on Intelligent Computing， LNCS 10956. Cham： Springer， 2018： 134-142.
38	XU S M， LUO L K， HU S Q. Attention-based model with attribute classification for cross-domain person re-identification ［C］// Proceedings of the 25th International Conference on Pattern Recognition. Piscataway： IEEE， 2021： 9149-9155. 10.1109/icpr48806.2021.9413309
39	XU B L， LIU J X， HOU X X， et al. Cross domain person re-identification with large scale attribute annotated datasets［J］. IEEE Access， 2019， 7： 21623-21634. 10.1109/access.2019.2896663
40	XIAO Q Q， CAO K L， CHEN H N， et al. Cross domain knowledge transfer for person re-identification［EB/OL］. （2016-11-18）［2022-03-25］.
41	LUO H， GU Y Z， LIAO X Y， et al. Bag of tricks and a strong baseline for deep person re-identification ［C］// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway： IEEE， 2019： 1487-1495. 10.1109/cvprw.2019.00190
42	WEN Y D， ZHANG K P， LI Z F， et al. A discriminative feature learning approach for deep face recognition ［C］// Proceedings of the 2016 European Conference on Computer Vision， LNCS 9911. Cham： Springer， 2016： 499-515.
43	LI X Y， JIANG S Q. Know more say less： image captioning based on scene graphs［J］. IEEE Transactions on Multimedia， 2019， 21（8）： 2117-2130. 10.1109/tmm.2019.2896516
