EfficientNet based dual-branch multi-scale integrated learning for pedestrian re-identification

doi:10.11772/j.issn.1001-9081.2021050852

Abstract:

To address the low pedestrian re-identification rate in video images caused by small-target pedestrians, occlusion, and variable pedestrian poses, a dual-branch multi-scale joint learning method based on EfficientNet is proposed. First, the efficient EfficientNet-B1 network is used as the backbone. Then, a weighted Bidirectional Feature Pyramid Network (BiFPN) branch fuses the global features extracted at different scales, yielding global features that carry semantic information from different levels and thereby improving the identification rate for small-target pedestrians. Next, a Part-based Convolutional Baseline (PCB) branch extracts deep local features to mine non-salient pedestrian information and to reduce the influence of occlusion and pose variability on the identification rate. Finally, in the training stage, the pedestrian features extracted by the two branches are each passed through the Softmax loss function to obtain separate sub-losses, which are summed for joint representation; in the test stage, the global features and deep local features are concatenated, and the Euclidean distance is computed to obtain the re-identification matching results. On the Market1501 and DukeMTMC-Reid datasets, the method reaches Rank-1 accuracies of 95.1% and 89.1%, which are 3.9 and 2.3 percentage points higher, respectively, than those of the original EfficientNet-B1 backbone. The experimental results show that the proposed model effectively improves pedestrian re-identification accuracy.

Key words: pedestrian re-identification, EfficientNet, local feature extraction, multi-scale feature extraction, integrated learning
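
The test-stage matching described in the abstract (concatenate the global and deep local features, then rank gallery images by Euclidean distance) can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the function names, the toy feature values, and the gallery identifiers are all hypothetical.

```python
import math

def concat_features(global_feat, local_feats):
    """Concatenate one global feature vector with a list of local (part) vectors."""
    fused = list(global_feat)
    for part in local_feats:
        fused.extend(part)
    return fused

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_gallery(query, gallery):
    """Return gallery IDs sorted by ascending distance to the query descriptor."""
    return sorted(gallery, key=lambda gid: euclidean(query, gallery[gid]))

# Toy descriptors (hypothetical values): one 2-D global feature and
# two 2-D part features per image.
query = concat_features([0.9, 0.1], [[0.2, 0.8], [0.5, 0.5]])
gallery = {
    "id_a": concat_features([0.8, 0.2], [[0.1, 0.9], [0.5, 0.4]]),
    "id_b": concat_features([0.1, 0.9], [[0.9, 0.1], [0.2, 0.7]]),
}
ranking = rank_gallery(query, gallery)
```

The first entry of `ranking` is the gallery image whose fused descriptor lies closest to the query, which is the image counted when Rank-1 accuracy is evaluated.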

CLC number: TP391.4

Cite this article: Tianhao QIU, Shurong CHEN. EfficientNet based dual-branch multi-scale integrated learning for pedestrian re-identification[J]. Journal of Computer Applications, 2022, 42(7): 2065-2071.

Figures and tables (12)

Fig. 1 EfficientNet based dual-branch multi-scale integrated learning network structure
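
For the training stage of the network in Fig. 1, the abstract states that the features of each branch pass through a Softmax loss and that the resulting sub-losses are added. A minimal sketch of that summation is given below; the number of classifier heads, the logits, and the function names are hypothetical, and the real model would compute this over mini-batches in a deep learning framework.

```python
import math

def softmax_cross_entropy(logits, label):
    """Softmax loss for one sample: -log(softmax(logits)[label]), computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def joint_loss(branch_logits, label):
    """Sum the Softmax sub-losses of all classifier heads."""
    return sum(softmax_cross_entropy(logits, label) for logits in branch_logits)

# Hypothetical logits over 3 identities: one head for the global (BiFPN)
# branch and two heads for local (PCB) parts.
heads = [
    [2.0, 0.5, -1.0],  # global branch head
    [1.5, 0.2, 0.1],   # local part head 1
    [0.3, 0.3, 0.3],   # local part head 2
]
total = joint_loss(heads, label=0)
```

Because the sub-losses are simply added, every head receives a gradient from its own classification error, while the shared backbone receives the sum of all of them.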

Fig. 2 Dataset image effect after random erasing augmentation
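
The random erasing augmentation illustrated in Fig. 2 overwrites a randomly placed rectangle of the input image with random values. The sketch below shows the idea on a plain nested list; the parameter names and default values are illustrative assumptions, not settings reported on this page.

```python
import random

def random_erase(image, p=0.5, area_range=(0.02, 0.4), min_aspect=0.3,
                 rng=None, max_tries=100):
    """Return a copy of `image` (list of pixel rows) with one random rectangle
    overwritten by random values in [0, 255]; applied with probability p."""
    if rng is None:
        rng = random.Random()
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    if rng.random() > p:
        return out  # leave the image unchanged with probability 1 - p
    for _ in range(max_tries):
        # Sample the rectangle's area and aspect ratio, then derive its size.
        target_area = rng.uniform(*area_range) * height * width
        aspect = rng.uniform(min_aspect, 1.0 / min_aspect)
        h = int(round((target_area * aspect) ** 0.5))
        w = int(round((target_area / aspect) ** 0.5))
        if 0 < h < height and 0 < w < width:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            for y in range(top, top + h):
                for x in range(left, left + w):
                    out[y][x] = rng.randint(0, 255)
            return out
    return out  # no valid rectangle found; return the unmodified copy
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible, which is convenient when comparing loss curves with and without preprocessing.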

Fig. 3 MBConv6 module

Fig. 4 BiFPN structure
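
Each fusion node of the BiFPN in Fig. 4 combines feature maps from different scales with learnable weights. The sketch below assumes the fast normalized fusion rule from the EfficientDet paper (reference [9]), O = Σ wᵢ·Iᵢ / (ε + Σ wⱼ) with non-negative weights; whether this work uses exactly that variant is not stated here, and the function name and toy feature maps are hypothetical.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped 2-D feature maps as sum(w_i * f_i) / (eps + sum(w_i)),
    with each w_i clamped to be non-negative."""
    if len(features) != len(weights):
        raise ValueError("need one weight per feature map")
    w = [max(0.0, wi) for wi in weights]  # ReLU keeps the weights non-negative
    norm = eps + sum(w)
    rows, cols = len(features[0]), len(features[0][0])
    return [
        [sum(wi * f[r][c] for wi, f in zip(w, features)) / norm
         for c in range(cols)]
        for r in range(rows)
    ]

# Two hypothetical 2x2 feature maps, already resized to a common resolution.
shallow = [[1.0, 2.0], [3.0, 4.0]]
deep = [[5.0, 6.0], [7.0, 8.0]]
fused = fast_normalized_fusion([shallow, deep], weights=[1.0, 3.0])
```

In a trained network the weights are learned parameters, so the model itself decides how much each scale contributes at every fusion node.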

Fig. 5 PCB local feature extraction process
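
PCB-style local feature extraction, as in Fig. 5, partitions the backbone feature map into horizontal stripes and pools each stripe into one part descriptor. The sketch below illustrates the partition-and-average step on a single-channel map; the function name and the toy values are hypothetical, and reading `num_parts` as the quantity L varied in Table 3 is an assumption.

```python
def part_average_pool(feature_map, num_parts):
    """Split a 2-D feature map (rows x cols) into `num_parts` horizontal
    stripes of equal height and return the average value of each stripe."""
    height = len(feature_map)
    if num_parts <= 0 or height % num_parts != 0:
        raise ValueError("height must be divisible by num_parts")
    stripe = height // num_parts
    pooled = []
    for p in range(num_parts):
        rows = feature_map[p * stripe:(p + 1) * stripe]
        values = [v for row in rows for v in row]
        pooled.append(sum(values) / len(values))
    return pooled

# Hypothetical single-channel 4x2 feature map; a real map has many channels,
# and each channel would be pooled in the same way.
fmap = [[1.0, 1.0], [3.0, 3.0], [5.0, 7.0], [9.0, 11.0]]
parts = part_average_pool(fmap, num_parts=2)
```

With `num_parts=1` the function reduces to global average pooling, so the part count controls how fine-grained the local descriptors are.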

Tab. 1 Pedestrian re-identification performance comparison of five backbone networks

Network	Rank-1/%	mAP/%	Params/M
ResNet50	88.8	71.6	26.0
ResNet101	88.9	71.8	46.0
DenseNet121	90.2	74.1	17.0
MobileNet-V3	60.8	37.5	4.2
EfficientNet-B1	91.2	77.7	7.8
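
The Rank-1 and mAP columns in Table 1 are the two standard retrieval metrics for re-identification. The sketch below shows one common way to compute them from ranked gallery lists; the function name and toy identifiers are hypothetical, and the sketch omits dataset-specific rules such as excluding gallery images taken by the same camera as the query.

```python
def rank1_and_map(rankings, relevant):
    """Compute Rank-1 accuracy and mean average precision, both in [0, 1].

    rankings: {query_id: [gallery_id, ...]} sorted from best to worst match.
    relevant: {query_id: set of gallery_ids showing the same person}.
    """
    hits, ap_sum = 0, 0.0
    for q, ranked in rankings.items():
        truth = relevant[q]
        if ranked and ranked[0] in truth:
            hits += 1  # the top-ranked gallery image is a correct match
        found, precision_sum = 0, 0.0
        for position, gid in enumerate(ranked, start=1):
            if gid in truth:
                found += 1
                precision_sum += found / position  # precision at this hit
        ap_sum += precision_sum / len(truth) if truth else 0.0
    n = len(rankings)
    return hits / n, ap_sum / n

# Toy example with two queries (hypothetical IDs).
rankings = {"q1": ["g1", "g2", "g3"], "q2": ["g4", "g5", "g6"]}
relevant = {"q1": {"g1", "g3"}, "q2": {"g5"}}
rank1, mean_ap = rank1_and_map(rankings, relevant)
```

Rank-1 only checks the top match, whereas mAP rewards placing every correct gallery image near the top, which is why the two columns can move by different amounts between methods.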

Fig. 6 Loss curve comparison of original and preprocessed datasets

Tab. 2 Influence of data preprocessing and BiFPN on identification results (%)

Method	Rank-1	mAP
EfficientNet-B1 (without data preprocessing)	89.9	75.2
EfficientNet-B1 (with data preprocessing)	91.2	77.7
EfficientNet-B1 (data preprocessing and BiFPN)	92.4	77.8
EfficientNet-B1 (without data preprocessing) (small-target pedestrian retrieval)	80.2	68.1
EfficientNet-B1 (with data preprocessing) (small-target pedestrian retrieval)	82.4	70.3
EfficientNet-B1 (data preprocessing and BiFPN) (small-target pedestrian retrieval)	86.3	71.9

Fig. 7 Small target pedestrian images

Tab. 3 Pedestrian re-identification results on Market1501 and DukeMTMC-Reid datasets under different L (%)

L	Market1501 Rank-1	Market1501 mAP	DukeMTMC-Reid Rank-1	DukeMTMC-Reid mAP
1	93.1	78.6	87.2	75.4
2	93.8	81.9	87.8	75.9
3	94.7	83.7	89.0	77.1
4	95.1	86.3	89.1	77.2
6	94.1	86.1	88.2	76.6
8	92.5	80.3	88.1	76.5

Tab. 4 Comparison of pedestrian re-identification accuracy of the proposed method and other methods (%)

Method	Market1501 Rank-1	Market1501 mAP	DukeMTMC-Reid Rank-1	DukeMTMC-Reid mAP
IDE (ResNet50) [15]	90.6	80.1	-	-
PCB (ResNet50) [4]	92.4	77.3	81.9	65.3
PCB+RPP (ResNet50) [4]	93.1	81.0	82.9	68.5
HPM (ResNet50) [16]	93.7	83.4	-	-
MGN (ResNet50) [17]	95.7	86.9	88.7	78.4
Proposed method (EfficientNet-B1)	95.1	86.3	89.1	77.2

Fig. 8 Pedestrian re-identification visualization results

References (17)

1	LIU N. Person re-identification based on convolutional neural networks[D]. Shanghai: East China Normal University, 2017: 977-983. 10.1109/icassp.2017.7952461
2	LI J, ZHANG X H, ZHU H, et al. Person re-identification via multiple confidences re-ranking[J]. Pattern Recognition and Artificial Intelligence, 2017, 30(11): 995-1002.
3	AHMED E, JONES M, MARKS T K. An improved deep learning architecture for person re-identification[C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 3908-3916. 10.1109/cvpr.2015.7299016
4	SUN Y F, ZHENG L, YANG Y, et al. Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline)[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11208. Cham: Springer, 2018: 501-518.
5	ZHENG F, DENG C, SUN X, et al. Pyramidal person re-identification via multi-loss dynamic training[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 8506-8514. 10.1109/cvpr.2019.00871
6	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. 10.1109/cvpr.2016.90
7	ZHANG J P, JIANG F. Multi-level supervised network for person re-identification[C]// Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2019: 2072-2076. 10.1109/icassp.2019.8683858
8	TAN M X, LE Q V. EfficientNet: rethinking model scaling for convolutional neural networks[C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR.org, 2019: 6105-6114.
9	TAN M X, PANG R M, LE Q V. EfficientDet: scalable and efficient object detection[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 10778-10787. 10.1109/cvpr42600.2020.01079
10	ZHONG Z, ZHENG L, KANG G L, et al. Random erasing data augmentation[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2020: 13001-13008. 10.1609/aaai.v34i07.7000
11	HU J, SHEN L, ALBANIE S, et al. Squeeze-and-excitation networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8): 2011-2023. 10.1109/tpami.2019.2913372
12	RISTANI E, SOLERA F, ZOU R, et al. Performance measures and a data set for multi-target, multi-camera tracking[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9914. Cham: Springer, 2016: 17-35.
13	LI W, ZHAO R, XIAO T, et al. DeepReID: deep filter pairing neural network for person re-identification[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014: 152-159. 10.1109/cvpr.2014.27
14	KINGMA D P, BA J L. Adam: a method for stochastic optimization[EB/OL]. (2017-01-30)[2020-01-20].
15	ZHENG Z, ZHENG L, YANG Y. A discriminatively learned CNN embedding for person re-identification[EB/OL]. (2016-11-17)[2020-02-21]. 10.1145/3159171
16	FU Y, WEI Y C, ZHOU Y Q, et al. Horizontal pyramid matching for person re-identification[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2019: 8295-8302. 10.1609/aaai.v33i01.33018295
17	WANG G S, YUAN Y F, CHEN X, et al. Learning discriminative features with multiple granularities for person re-identification[C]// Proceedings of the 26th ACM International Conference on Multimedia. New York: ACM, 2018: 274-282. 10.1145/3240508.3240552
