Pedestrian attribute recognition based on two-domain self-attention mechanism

Journal of Computer Applications, 2021, Vol. 41, Issue 2: 372-378. DOI: 10.11772/j.issn.1001-9081.2020060850

Special Issue: Artificial intelligence

WU Rui^1,2, LIU Yu^2, FENG Kai^1,2

1. Wuhan Research Institute of Posts and Telecommunications, Wuhan Hubei 430074, China;
2. Nanjing Fiberhome Starrysky Company Limited, Nanjing Jiangsu 210019, China

Received: 2020-06-19 Revised: 2020-09-18 Online: 2020-12-17 Published: 2021-02-10
Supported by:
This work is partially supported by the National Key Research and Development Program of China (2017YFBI400704).

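Corresponding author: WU Rui
About the authors: WU Rui (born 1996), male, from Liping, Guizhou, M.S. candidate; research interests: computer vision, pedestrian attribute recognition, person re-identification. LIU Yu (born 1974), male, from Liaoyuan, Jilin, M.S., senior engineer; research interests: Internet applications, network security, big data. FENG Kai (born 1996), male, from Yan'an, Shaanxi, M.S. candidate; research interests: computer vision, deep learning.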
Abstract

Abstract: In pedestrian attribute recognition, different attributes place different demands on feature granularity and feature dependence. To address this, a pedestrian attribute recognition model was proposed that is built on a two-domain self-attention mechanism combining spatial self-attention and channel self-attention. First, ResNet50 was used as the backbone network to extract features carrying a certain level of semantic information. Then, these features were fed into a two-branch network: one branch extracted self-attention features with spatial dependence and semantic relevance, and the other extracted global features carrying overall information. Finally, the features of the two branches were concatenated, and Batch Normalization (BN) together with a weighted loss was used to reduce the impact of imbalanced pedestrian attribute samples. Experimental results on two pedestrian attribute datasets, PETA and RAP, show that the proposed model improves mean accuracy by 3.91 and 4.05 percentage points respectively over the benchmark model, and is strongly competitive among existing pedestrian attribute recognition models. The proposed method can be used to produce structured descriptions of pedestrians in surveillance scenarios, thereby improving the accuracy and efficiency of pedestrian analysis and retrieval tasks.

Key words: pedestrian attribute recognition, spatial self-attention, channel self-attention, feature dependence, semantic relevance
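
The two-domain self-attention described in the abstract can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the authors' implementation: the function names are hypothetical, the learned query/key/value projections and learnable scale factors that such modules normally carry are omitted, and summing the spatial and channel outputs before pooling is an assumed fusion choice that the abstract does not specify. The input stands in for a backbone feature map of shape (C, H, W).

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_self_attention(feat):
    """Each spatial position attends to all positions (N = H * W)."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)           # (C, N)
    attn = softmax(x.T @ x, axis=-1)     # (N, N), each row sums to 1
    out = x @ attn.T                     # (C, N): weighted sum over positions
    return feat + out.reshape(c, h, w)   # residual connection


def channel_self_attention(feat):
    """Each channel attends to all channels."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)           # (C, N)
    attn = softmax(x @ x.T, axis=-1)     # (C, C), each row sums to 1
    out = attn @ x                       # (C, N): weighted sum over channels
    return feat + out.reshape(c, h, w)   # residual connection


def two_domain_features(feat):
    """Concatenate pooled attention-branch and global-branch features.

    Assumption: the two attention outputs are summed before pooling.
    """
    attended = spatial_self_attention(feat) + channel_self_attention(feat)
    attn_vec = attended.mean(axis=(1, 2))   # global average pooling, (C,)
    global_vec = feat.mean(axis=(1, 2))     # global branch, (C,)
    return np.concatenate([attn_vec, global_vec])  # (2C,)


# Toy feature map standing in for a backbone output: C=8, H=4, W=3.
feat = np.random.default_rng(0).standard_normal((8, 4, 3))
print(two_domain_features(feat).shape)  # (16,)
```

In a full model, the concatenated vector would pass through BN and a per-attribute classifier trained with a weighted loss; those parts are left out here because the abstract gives no details on them.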

CLC Number:

TP391.4

WU Rui, LIU Yu, FENG Kai. Pedestrian attribute recognition based on two-domain self-attention mechanism[J]. Journal of Computer Applications, 2021, 41(2): 372-378.
