One-shot video-based person re-identification with multi-loss learning and joint metric

doi:10.11772/j.issn.1001-9081.2021040788

Journal of Computer Applications, 2022, Vol. 42, Issue (3): 764-769. DOI: 10.11772/j.issn.1001-9081.2021040788

Topic: Artificial intelligence; 2021 CCF Conference on Artificial Intelligence (CCFAI 2021)


Yuchang YIN¹, Hongyuan WANG¹, Li CHEN¹, Zundeng FENG¹, Yu XIAO²

^1.School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, Changzhou University, Changzhou Jiangsu 213000, China
^2.Changzhou Vocational Institute of Engineering, Changzhou Jiangsu 213000, China

Received: 2021-05-17 Revised: 2021-06-03 Accepted: 2021-06-15 Online: 2021-11-09 Published: 2022-03-10
Corresponding author: Hongyuan WANG
About the authors: YIN Yuchang, born in 1996, native of Yancheng, Jiangsu, M. S. candidate. His research interests include computer vision.
CHEN Li, born in 1995, native of Yancheng, Jiangsu, M. S. candidate. Her research interests include computer vision.
FENG Zundeng, born in 1996, native of Suzhou, Anhui, M. S. candidate. His research interests include computer vision.
XIAO Yu, born in 1981, native of Yichun, Heilongjiang, M. S., associate professor. Her research interests include digital media technology, graphics and image processing.
Supported by: National Natural Science Foundation of China (61976028)

Abstract

To reduce the high annotation cost of person re-identification, a one-shot video-based person re-identification method with multi-loss learning and joint metric is proposed. Because only a small number of labeled samples is available, the learned model is not robust enough. To address this, a Multi-Loss Learning (MLL) strategy is proposed: in each training round, different loss functions are applied to different kinds of data, which improves the discriminative ability of the model. In addition, a Joint Distance Metric (JDM) is proposed for label estimation. It combines the sample distance with the nearest-neighbor distance to further improve the accuracy of pseudo-label prediction. JDM alleviates two problems: the low accuracy of label estimation for unlabeled data, and the unstable training caused by unlabeled data not being fully exploited. Experimental results show that, when the ratio of pseudo-labeled samples added in each iteration is 0.10, the proposed method reaches rank-1 accuracies of 65.5% on MARS and 76.2% on DukeMTMC-VideoReID. These are 7.6 and 5.2 percentage points higher, respectively, than those of the one-shot Progressive Learning (PL) method.

Key words: video-based person re-identification, one-shot learning, semi-supervised learning, label estimation, distance metric
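The gains quoted in the abstract are absolute differences in percentage points, not relative improvements. The following is a quick check against the rank-1 values in Tab. 1 at p = 0.10. The dictionary and function names are ours, for illustration only.

```python
# Rank-1 accuracy (%) at p = 0.10, copied from Tab. 1.
PL_RANK1 = {"MARS": 57.90, "DukeMTMC-VideoReID": 71.00}
MLL_JDM_RANK1 = {"MARS": 65.50, "DukeMTMC-VideoReID": 76.20}


def point_gain(baseline: float, proposed: float) -> float:
    """Absolute gain in percentage points, rounded to one decimal."""
    return round(proposed - baseline, 1)


def relative_gain(baseline: float, proposed: float) -> float:
    """Gain relative to the baseline, in percent, rounded to one decimal."""
    return round(100.0 * (proposed - baseline) / baseline, 1)


for dataset, base in PL_RANK1.items():
    ours = MLL_JDM_RANK1[dataset]
    print(dataset, point_gain(base, ours), relative_gain(base, ours))
# MARS 7.6 13.1
# DukeMTMC-VideoReID 5.2 7.3
```

On MARS, the 7.6-point gain therefore corresponds to roughly a 13% relative improvement over PL.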

CLC number: TP391.10

Yuchang YIN, Hongyuan WANG, Li CHEN, Zundeng FENG, Yu XIAO. One-shot video-based person re-identification with multi-loss learning and joint metric[J]. Journal of Computer Applications, 2022, 42(3): 764-769.

Figures and tables (9)

Fig. 1 Overall iterative framework of the proposed method
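Fig. 1 is not reproduced here. In the progressive framework of EUG [9] and PL [10], on which this method builds, each iteration enlarges the pseudo-labeled set by a fraction p of the unlabeled data. It keeps the samples whose estimated labels look most reliable. The sketch below is a minimal version under that reading. The function names are hypothetical, and "reliable" is assumed to mean a small label-estimation distance.

```python
def selection_sizes(num_unlabeled: int, p: float) -> list:
    """Size of the pseudo-labeled set after each iteration (grows by p * n_u)."""
    step = max(1, round(p * num_unlabeled))
    sizes, m = [], 0
    while m < num_unlabeled:
        m = min(num_unlabeled, m + step)
        sizes.append(m)
    return sizes


def select_most_reliable(distances: list, m: int) -> list:
    """Indices of the m unlabeled samples with the smallest estimation distance."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    return sorted(order[:m])


print(selection_sizes(100, 0.10))  # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(select_most_reliable([0.9, 0.1, 0.5, 0.3], 2))  # [1, 3]
```

With p = 0.05, the same 100 unlabeled samples take 20 iterations instead of 10. In Tab. 1 and Tab. 3, the smaller p gives higher accuracy, at the cost of more training rounds.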

Fig. 2 MLL strategy
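The abstract states only that MLL applies different loss functions to different data. This excerpt does not say which losses are paired with which data. The sketch below assumes the pairing used in PL [10]: an identity cross-entropy loss for labeled and pseudo-labeled tracklets, and an exclusive loss for the still-unlabeled ones, where each unlabeled sample is treated as its own class against a feature memory. The dictionary keys, the memory layout, the unweighted average and the temperature of 0.1 are all illustrative assumptions.

```python
import math


def softmax_nll(logits: list, target: int) -> float:
    """Negative log-softmax probability of `target`, computed stably."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[target]


def exclusive_loss(feature: list, memory: list, own_index: int,
                   temperature: float = 0.1) -> float:
    """Each unlabeled sample is its own class; rows are assumed L2-normalized."""
    sims = [sum(a * b for a, b in zip(feature, row)) / temperature for row in memory]
    return softmax_nll(sims, own_index)


def multi_loss(batch: list, memory: list) -> float:
    """Average loss over a batch, choosing the loss by the kind of each sample."""
    total = 0.0
    for sample in batch:
        if sample["kind"] in ("labeled", "pseudo"):
            # identity classification loss on (pseudo-)labeled tracklets
            total += softmax_nll(sample["logits"], sample["label"])
        else:
            # exclusive loss on tracklets that are still unlabeled
            total += exclusive_loss(sample["feature"], memory, sample["index"])
    return total / len(batch)


memory = [[1.0, 0.0], [0.0, 1.0]]
batch = [
    {"kind": "labeled", "logits": [0.0, 0.0], "label": 0},
    {"kind": "unlabeled", "feature": [1.0, 0.0], "index": 0},
]
print(multi_loss(batch, memory))
```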

Fig. 3 Schematic diagram of JDM for label estimation
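According to the abstract, JDM combines the sample distance with the nearest-neighbor distance. Tab. 4 and Tab. 5 tune a weight α and a neighborhood size K, with the best results at α = 0.5 and K = 3. This excerpt does not give the exact formula, so the sketch below is only one plausible reading: α·d(u, l) + (1 - α)·d_N(u, l). Here d_N is assumed, hypothetically, to be the mean distance from the unlabeled sample u to the K unlabeled samples closest to the labeled sample l. Each unlabeled sample then takes the label of the one-shot sample with the smallest joint distance.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def joint_distance(u: Vector, anchor: Vector, pool: List[Vector],
                   alpha: float = 0.5, k: int = 3) -> float:
    """alpha * sample distance + (1 - alpha) * neighbor distance (assumed form)."""
    sample_dist = euclidean(u, anchor)
    # the K pool samples closest to the labeled anchor (u itself may be among them)
    neighbors = sorted(pool, key=lambda x: euclidean(anchor, x))[:k]
    neighbor_dist = sum(euclidean(u, n) for n in neighbors) / len(neighbors)
    return alpha * sample_dist + (1 - alpha) * neighbor_dist


def estimate_labels(unlabeled: List[Vector], labeled: List[Vector], labels: List[str],
                    alpha: float = 0.5, k: int = 3) -> List[Tuple[str, float]]:
    """Give each unlabeled sample the label of its closest one-shot sample under JDM."""
    estimates = []
    for u in unlabeled:
        dists = [joint_distance(u, anchor, unlabeled, alpha, k) for anchor in labeled]
        best = min(range(len(labeled)), key=dists.__getitem__)
        estimates.append((labels[best], dists[best]))
    return estimates


labeled = [(0.0, 0.0), (10.0, 10.0)]
unlabeled = [(0.5, 0.0), (0.0, 0.5), (1.0, 1.0), (9.0, 10.0), (10.0, 9.0), (9.5, 9.5)]
for label, dist in estimate_labels(unlabeled, labeled, ["A", "B"]):
    print(label, round(dist, 3))
```

With α = 1, the metric reduces to the plain sample distance. The returned distance can also serve as the reliability score when choosing which pseudo-labels to keep.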

Tab. 1 Performance comparison of different methods on two large-scale datasets (%)

Method	$p$	MARS				DukeMTMC-VideoReID
		rank-1	rank-5	rank-20	mAP	rank-1	rank-5	rank-20	mAP
Baseline (one-shot) [10]		36.20	50.20	61.90	15.50	39.60	56.80	67.00	33.30
DGM+IDE [17]		36.80	54.00	68.50	16.90	42.40	57.90	69.30	33.60
Stepwise [25]		41.20	55.60	66.80	19.70	56.30	70.40	79.20	46.80
EUG [9]	$p = 0.10$	57.62	69.64	78.08	34.68	70.79	83.61	89.60	61.76
EUG [9]	$p = 0.05$	62.67	74.94	82.57	42.45	72.79	84.18	91.45	63.23
BUC [27]		55.10	68.30	-	29.40	74.80	86.80	-	66.70
LGF [20]		58.80	69.00	78.50	36.20	86.30	96.00	98.60	82.70
SCLU [19]	$p = 0.10$	61.97	76.52	84.34	41.47	72.79	84.19	91.03	62.99
SCLU [19]	$p = 0.05$	63.74	78.44	85.51	42.74	72.79	85.04	90.31	63.15
PL [10]	$p = 0.10$	57.90	70.30	79.30	34.90	71.00	83.80	90.30	61.90
PL [10]	$p = 0.05$	62.80	75.20	83.80	42.60	72.90	84.30	91.40	63.30
MLL+JDM	$p = 0.10$	65.50	78.50	86.60	44.20	76.20	87.20	93.30	67.50
MLL+JDM	$p = 0.05$	68.50	80.80	88.60	47.80	76.50	88.70	93.20	68.70
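Tab. 1 reports cumulative matching characteristic (CMC) accuracy at ranks 1, 5 and 20, together with mean average precision (mAP). The following is a simplified, self-contained sketch of both metrics computed from a query-by-gallery distance matrix. The standard evaluation for these benchmarks also ignores gallery tracklets that share both identity and camera with the query. This sketch omits that filtering, so it illustrates the definitions rather than reproducing the official protocol.

```python
from typing import Dict, List, Sequence, Tuple


def evaluate(dist: List[Sequence[float]], query_ids: Sequence[int],
             gallery_ids: Sequence[int],
             ranks: Tuple[int, ...] = (1, 5, 20)) -> Tuple[Dict[int, float], float]:
    """Return CMC accuracy (%) at the given ranks and mAP (%)."""
    hits = {r: 0 for r in ranks}
    average_precisions = []
    for q, row in enumerate(dist):
        order = sorted(range(len(row)), key=row.__getitem__)  # nearest gallery first
        matches = [gallery_ids[j] == query_ids[q] for j in order]
        if not any(matches):
            continue  # queries without a true match are skipped
        first_hit = matches.index(True)
        for r in ranks:
            if first_hit < r:
                hits[r] += 1
        found, precisions = 0, []
        for position, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / position)  # precision at each true match
        average_precisions.append(sum(precisions) / found)
    n = len(average_precisions)
    cmc = {r: 100.0 * hits[r] / n for r in ranks}
    return cmc, 100.0 * sum(average_precisions) / n


cmc, m_ap = evaluate([[0.1, 0.2, 0.3], [0.1, 0.5, 0.2]], [1, 2], [1, 2, 1], ranks=(1, 3))
print(cmc, round(m_ap, 2))  # {1: 50.0, 3: 100.0} 58.33
```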

Tab. 2 Comparison of label estimation precision among different methods (%)

Method	MARS	DukeMTMC-VideoReID
PUL [26]	37.29	61.24
EUG [9] (dis)	36.40	43.78
EUG [9] (cls)	55.56	69.75
SCLU [19] (dis)	58.76	70.41
SCLU [19] (con)	62.92	76.80
PL [10]	55.70	71.20
MLL+JDM	66.30	76.80
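Tab. 2 compares label estimation precision. This excerpt does not define the term. We read it as the percentage of selected pseudo-labeled samples whose estimated identity matches the ground truth. The sketch below computes that quantity for a hypothetical toy selection. Ground-truth identities are used only in this evaluation.

```python
from typing import Hashable, Sequence


def label_estimation_precision(estimated: Sequence[Hashable],
                               ground_truth: Sequence[Hashable],
                               selected: Sequence[int]) -> float:
    """Percentage of selected samples whose estimated label is correct."""
    if not selected:
        return 0.0  # convention chosen here for an empty selection
    correct = sum(1 for i in selected if estimated[i] == ground_truth[i])
    return 100.0 * correct / len(selected)


estimated = ["A", "B", "A", "C"]
truth = ["A", "A", "A", "C"]
print(round(label_estimation_precision(estimated, truth, [0, 1, 3]), 2))  # 66.67
```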

Fig. 4 Ablation experiment results on two datasets with p=0.10

Tab. 3 Ablation experiment results on MARS and DukeMTMC-VideoReID datasets with different p values (%)

$p$	Method	MARS		DukeMTMC-VideoReID
		rank-1	mAP	rank-1	mAP
0.30	PL [10]	44.5	22.1	66.1	56.3
	MLL	49.2	25.9	68.7	59.9
	JDM	48.3	25.6	67.1	58.0
	MLL+JDM	48.5	26.8	69.5	60.2
0.20	PL [10]	49.6	27.2	69.1	59.6
	MLL	55.1	30.7	69.9	60.8
	JDM	54.7	31.0	70.1	60.5
	MLL+JDM	58.0	34.7	71.1	61.8
0.10	PL [10]	57.9	34.9	71.0	61.9
	MLL	61.9	39.5	73.4	65.2
	JDM	61.2	38.2	72.2	63.4
	MLL+JDM	65.5	44.2	76.2	67.5
0.05	PL [10]	62.8	42.6	72.9	63.3
	MLL	64.5	43.3	73.5	66.0
	JDM	63.8	42.6	73.1	64.0
	MLL+JDM	68.5	47.8	76.5	68.7

Tab. 4 Performance comparison of JDM with different α on DukeMTMC-VideoReID (%)

$α$	rank-1	rank-5	rank-20	mAP
0.3	73.4	88.8	92.2	65.3
0.4	73.8	86.0	93.0	65.5
0.5	76.2	87.2	93.3	67.5
0.6	73.5	86.3	92.9	65.2

Tab. 5 Performance comparison of JDM with different K on DukeMTMC-VideoReID (%)

$K$	rank-1	rank-5	rank-20	mAP
2	75.2	86.3	91.9	66.9
3	76.2	87.2	93.3	67.5
4	73.4	85.6	91.7	65.4
5	73.2	85.9	91.3	65.0

References (27)

1	DAI C C, WANG H Y, NI T G, et al. Person re-identification based on deep convolutional generative adversarial network and expanded neighbor reranking[J]. Journal of Computer Research and Development, 2019, 56(8): 1632-1641. 10.7544/issn1000-1239.2019.20190195
2	WANG H, DING Z, ZHANG J, et al. Person reidentification by semisupervised dictionary rectification learning with retraining module[J]. Journal of Electronic Imaging, 2018, 27(4): 043043. 10.1117/1.jei.27.4.043043
3	GU X, CHANG H, MA B, et al. Appearance-preserving 3D convolution for video-based person re-identification[C]// Proceedings of the 16th European Conference on Computer Vision, LNCS 12347. Cham: Springer, 2020: 228-243. 10.1007/978-3-030-58536-5_14
4	CHEN D, XU D, LI H, et al. Group consistent similarity learning via deep CRF for person re-identification[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2018: 8649-8658. 10.1109/cvpr.2018.00902
5	ZHENG Z, ZHENG L, YANG Y. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Washington, DC: IEEE Computer Society, 2017: 3754-3762. 10.1109/ICCV.2017.405
6	LIU X, SONG M, TAO D, et al. Semi-supervised coupled dictionary learning for person re-identification[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2014: 3550-3557. 10.1109/cvpr.2014.454
7	MA A J, LI P. Semi-supervised ranking for re-identification with few labeled image pairs[C]// Proceedings of the 2014 Asian Conference on Computer Vision, LNCS 9006. Cham: Springer, 2014: 598-613.
8	BAK S, CARR P. One-shot metric learning for person re-identification[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2017: 2990-2999. 10.1109/cvpr.2017.171
9	WU Y, LIN Y, DONG X, et al. Exploit the unknown gradually: one-shot video-based person re-identification by stepwise learning[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2018: 5177-5186. 10.1109/cvpr.2018.00543
10	WU Y, LIN Y, DONG X, et al. Progressive learning for person re-identification with one example[J]. IEEE Transactions on Image Processing, 2019, 28(6): 2872-2881. 10.1109/tip.2019.2891895
11	SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2016: 2818-2826. 10.1109/cvpr.2016.308
12	CHEN L, YANG H, GAO Z. Joint attentive spatial-temporal feature aggregation for video-based person re-identification[J]. IEEE Access, 2019, 7: 41230-41240. 10.1109/access.2019.2907274
13	HOU R, MA B, CHANG H, et al. VRSTC: occlusion-free video person re-identification[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2019: 7183-7192. 10.1109/cvpr.2019.00735
14	SUBRAMANIAM A, NAMBIAR A, MITTAL A. Co-segmentation inspired attention networks for video-based person re-identification[C]// Proceedings of the 2019 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2019: 562-572. 10.1109/iccv.2019.00065
15	WU Y, BOURAHLA O E F, LI X, et al. Adaptive graph representation learning for video person re-identification[J]. IEEE Transactions on Image Processing, 2020, 29: 8821-8830. 10.1109/tip.2020.3001693
16	YAN Y, QIN J, CHEN J, et al. Learning multi-granular hypergraphs for video-based person re-identification[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 2899-2908. 10.1109/cvpr42600.2020.00297
17	YE M, MA A J, ZHENG L, et al. Dynamic label graph matching for unsupervised video re-identification[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Washington, DC: IEEE Computer Society, 2017: 5142-5150. 10.1109/iccv.2017.550
18	XU T I, LI J, WU H, et al. Feature space regularization for person re-identification with one sample[C]// Proceedings of the IEEE 31st International Conference on Tools with Artificial Intelligence. Piscataway: IEEE, 2019: 1463-1470. 10.1109/ictai.2019.00208
19	YIN J, LI B, WAN F, et al. A new data selection strategy for one-shot video-based person re-identification[C]// Proceedings of the 2019 IEEE International Conference on Image Processing. Piscataway: IEEE, 2019: 1227-1231. 10.1109/icip.2019.8803723
20	ZHAO C, ZHANG Z, YAN J, et al. Local-global feature for video-based one-shot person re-identification[C]// Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2020: 3662-3666. 10.1109/icassp40776.2020.9053134
21	LI H, XIAO J, SUN M, et al. Progressive sample mining and representation learning for one-shot person re-identification with adversarial samples[EB/OL]. [2021-05-10]. . 10.1016/j.patcog.2020.107614
22	XIN X, WANG J, XIE R, et al. Semi-supervised person re-identification using multi-view clustering[J]. Pattern Recognition, 2019, 88: 285-297. 10.1016/j.patcog.2018.11.025
23	LIU C T, LI Y J, CHIEN S Y, et al. Semantics-guided clustering with deep progressive learning for semi-supervised person re-identification[EB/OL]. [2021-05-10]. . 10.48550/arXiv.2010.01148
24	ZHENG L, BIE Z, SUN Y, et al. MARS: a video benchmark for large-scale person re-identification[C]// Proceedings of the 2016 European Conference on Computer Vision. Cham: Springer, 2016: 868-884. 10.1007/978-3-319-46466-4_52
25	LIU Z, WANG D, LU H. Stepwise metric promotion for unsupervised video person re-identification[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Washington, DC: IEEE Computer Society, 2017: 2429-2438. 10.1109/iccv.2017.266
26	FAN H, ZHENG L, YAN C, et al. Unsupervised person re-identification: clustering and fine-tuning[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2018, 14(4): 1-18. 10.1145/3243316
27	LIN Y, DONG X, ZHENG L, et al. A bottom-up clustering approach to unsupervised person re-identification[C]// Proceedings of the 2019 AAAI Conference on Artificial Intelligence. Menlo Park, CA: AAAI, 2019: 8738-8745. 10.1609/aaai.v33i01.33018738
