One-shot video-based person re-identification with multi-loss learning and joint metric

doi:10.11772/j.issn.1001-9081.2021040788

Journal of Computer Applications, 2022, Vol. 42, Issue (3): 764-769. DOI: 10.11772/j.issn.1001-9081.2021040788

2021 CCF Conference on Artificial Intelligence (CCFAI 2021)

Yuchang YIN¹, Hongyuan WANG¹, Li CHEN¹, Zundeng FENG¹, Yu XIAO²

1. School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, Changzhou University, Changzhou Jiangsu 213000, China
2. Changzhou Vocational Institute of Engineering, Changzhou Jiangsu 213000, China

Received: 2021-05-17 Revised: 2021-06-03 Accepted: 2021-06-15 Online: 2021-11-09 Published: 2022-03-10
Contact: Hongyuan WANG
About the authors: YIN Yuchang, born in 1996, M. S. candidate. His research interests include computer vision.
CHEN Li, born in 1995, M. S. candidate. Her research interests include computer vision.
FENG Zundeng, born in 1996, M. S. candidate. His research interests include computer vision.
XIAO Yu, born in 1981, M. S., associate professor. Her research interests include digital media technology, graphics and image processing.
Supported by: National Natural Science Foundation of China (61976028)

Abstract:

To reduce the high labeling cost of person re-identification, a one-shot video-based person re-identification method with multi-loss learning and a joint metric is proposed. First, because few labeled samples are available and the resulting model is not robust enough, a Multi-Loss Learning (MLL) strategy is proposed: in each training round, different loss functions are applied to different data to improve the discriminative ability of the model. Second, a Joint Distance Metric (JDM) is proposed for label estimation; it combines the sample distance with the nearest-neighbor distance to further improve the accuracy of pseudo-label prediction. JDM addresses two problems: the low accuracy of label estimation for unlabeled data, and the training instability caused by unlabeled data not being fully utilized. Experimental results show that when the ratio of pseudo-labeled samples added per iteration is 0.10, the proposed method reaches rank-1 accuracies of 65.5% and 76.2% on the MARS and DukeMTMC-VideoReID datasets, which are 7.6 and 5.2 percentage points higher, respectively, than the one-shot progressive learning method PL (Progressive Learning).
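The abstract describes JDM only at a high level: a sample distance combined with a nearest-neighbor distance. The sketch below is one plausible reading, not the paper's formula. The blend weight `alpha` and neighbor count `k` mirror the α and K hyperparameters swept in Tab.4 and Tab.5; the function names, the Euclidean metric, and the way the neighbor term is averaged are all assumptions made for illustration.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distances between rows of a (n, d) and rows of b (m, d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def joint_distance(unlabeled, labeled, alpha=0.5, k=3):
    """Hypothetical joint metric: blend direct and neighbor-averaged distances.

    Assumption (not from the paper): the neighbor term is the mean distance
    from each unlabeled sample's k nearest unlabeled neighbors to each
    labeled sample. Requires k <= n_unlabeled - 1.
    """
    d_ul = pairwise_dist(unlabeled, labeled)    # (n_u, n_l) sample distance
    d_uu = pairwise_dist(unlabeled, unlabeled)  # (n_u, n_u)
    np.fill_diagonal(d_uu, np.inf)              # a sample is not its own neighbor
    nn_idx = np.argsort(d_uu, axis=1)[:, :k]    # (n_u, k) neighbor indices
    d_nn = d_ul[nn_idx].mean(axis=1)            # (n_u, n_l) neighbor distance
    return alpha * d_ul + (1.0 - alpha) * d_nn

def estimate_labels(unlabeled, labeled, labeled_ids, alpha=0.5, k=3):
    """Give each unlabeled sample the identity of its closest labeled sample."""
    d = joint_distance(unlabeled, labeled, alpha, k)
    nearest = d.argmin(axis=1)
    # The minimum joint distance doubles as a confidence proxy (smaller = better).
    return np.asarray(labeled_ids)[nearest], d.min(axis=1)
```

With `alpha=1.0` the sketch falls back to plain nearest-labeled-sample matching, which makes it easy to compare the two behaviours on the same features.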

Key words: video-based person re-identification, one-shot learning, semi-supervised learning, label estimation, distance metric

CLC Number: TP391.10

Yuchang YIN, Hongyuan WANG, Li CHEN, Zundeng FENG, Yu XIAO. One-shot video-based person re-identification with multi-loss learning and joint metric[J]. Journal of Computer Applications, 2022, 42(3): 764-769.

Figures/Tables 9

Fig. 1 Overall iterative framework of the proposed method

Fig. 2 MLL strategy

Fig. 3 Schematic diagram of JDM for label estimation

Tab.1 Performance comparison of different methods on two large-scale datasets

Method	p	MARS (%)				DukeMTMC-VideoReID (%)			
		rank-1	rank-5	rank-20	mAP	rank-1	rank-5	rank-20	mAP
Baseline (one-shot) [10]		36.20	50.20	61.90	15.50	39.60	56.80	67.00	33.30
DGM+IDE [17]		36.80	54.00	68.50	16.90	42.40	57.90	69.30	33.60
Stepwise [25]		41.20	55.60	66.80	19.70	56.30	70.40	79.20	46.80
EUG [9]	0.10	57.62	69.64	78.08	34.68	70.79	83.61	89.60	61.76
EUG [9]	0.05	62.67	74.94	82.57	42.45	72.79	84.18	91.45	63.23
BUC [27]		55.10	68.30	—	29.40	74.80	86.80	—	66.70
LGF [20]		58.80	69.00	78.50	36.20	86.30	96.00	98.60	82.70
SCLU [19]	0.10	61.97	76.52	84.34	41.47	72.79	84.19	91.03	62.99
SCLU [19]	0.05	63.74	78.44	85.51	42.74	72.79	85.04	90.31	63.15
PL [10]	0.10	57.90	70.30	79.30	34.90	71.00	83.80	90.30	61.90
PL [10]	0.05	62.80	75.20	83.80	42.60	72.90	84.30	91.40	63.30
MLL+JDM	0.10	65.50	78.50	86.60	44.20	76.20	87.20	93.30	67.50
MLL+JDM	0.05	68.50	80.80	88.60	47.80	76.50	88.70	93.20	68.70
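As a quick arithmetic check, the gains of MLL+JDM over PL can be recomputed from the Tab.1 rows. The snippet below copies the rank-1 and mAP values (in %) from the table; the dictionary layout and the `gains` helper are just illustrative names.

```python
# Values (%) copied from Tab.1, ordered as
# (MARS rank-1, MARS mAP, DukeMTMC-VideoReID rank-1, DukeMTMC-VideoReID mAP).
pl = {0.10: (57.9, 34.9, 71.0, 61.9), 0.05: (62.8, 42.6, 72.9, 63.3)}
ours = {0.10: (65.5, 44.2, 76.2, 67.5), 0.05: (68.5, 47.8, 76.5, 68.7)}

def gains(p):
    """Percentage-point improvement of MLL+JDM over PL at selection ratio p."""
    return tuple(round(o - b, 1) for o, b in zip(ours[p], pl[p]))

for p in (0.10, 0.05):
    print(p, gains(p))
# 0.1 (7.6, 9.3, 5.2, 5.6)
# 0.05 (5.7, 5.2, 3.6, 5.4)
```

The rank-1 gains at p = 0.10 (7.6 and 5.2 points) match the figures quoted in the abstract.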

Tab.2 Comparison of label estimation precision among different methods

Method	MARS (%)	DukeMTMC-VideoReID (%)
PUL [26]	37.29	61.24
EUG [9] (dis)	36.40	43.78
EUG [9] (cls)	55.56	69.75
SCLU [19] (dis)	58.76	70.41
SCLU [19] (con)	62.92	76.80
PL [10]	55.70	71.20
MLL+JDM	66.30	76.80
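This page does not define how the label estimation precision in Tab.2 is measured. A common reading, assumed here, is the percentage of selected pseudo-labeled samples whose estimated identity matches the ground truth. The function name and arguments below are hypothetical.

```python
import numpy as np

def label_estimation_precision(pseudo, truth, selected):
    """Percentage of selected samples whose pseudo label equals the true label.

    Assumed definition, not taken from the paper. `selected` is a boolean
    mask marking the samples that were admitted to the training set.
    """
    pseudo = np.asarray(pseudo)
    truth = np.asarray(truth)
    selected = np.asarray(selected, dtype=bool)
    if not selected.any():
        return float("nan")  # nothing selected, precision is undefined
    return 100.0 * float((pseudo[selected] == truth[selected]).mean())
```

Only the selected subset is scored, so a method can trade coverage for precision by admitting fewer, more confident samples.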

Fig. 4 Ablation experiment results on two datasets with p=0.10

Tab.3 Ablation experiment results on MARS and DukeMTMC-VideoReID datasets with different p values

p	Method	MARS rank-1 (%)	MARS mAP (%)	DukeMTMC-VideoReID rank-1 (%)	DukeMTMC-VideoReID mAP (%)
0.30	PL [10]	44.5	22.1	66.1	56.3
0.30	MLL	49.2	25.9	68.7	59.9
0.30	JDM	48.3	25.6	67.1	58.0
0.30	MLL+JDM	48.5	26.8	69.5	60.2
0.20	PL [10]	49.6	27.2	69.1	59.6
0.20	MLL	55.1	30.7	69.9	60.8
0.20	JDM	54.7	31.0	70.1	60.5
0.20	MLL+JDM	58.0	34.7	71.1	61.8
0.10	PL [10]	57.9	34.9	71.0	61.9
0.10	MLL	61.9	39.5	73.4	65.2
0.10	JDM	61.2	38.2	72.2	63.4
0.10	MLL+JDM	65.5	44.2	76.2	67.5
0.05	PL [10]	62.8	42.6	72.9	63.3
0.05	MLL	64.5	43.3	73.5	66.0
0.05	JDM	63.8	42.6	73.1	64.0
0.05	MLL+JDM	68.5	47.8	76.5	68.7
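Here p is the ratio of pseudo-labeled samples added per iteration. A minimal sketch of how such a ratio could drive a progressive schedule follows, assuming the selected set grows linearly by p times the unlabeled pool each iteration and that the most confident samples (smallest label-estimation distance) are taken first. The exact rule used in the paper is not stated on this page, and both function names are hypothetical.

```python
import math
import numpy as np

def selection_sizes(n_unlabeled, p):
    """Pseudo-labeled set size at each iteration, assuming linear growth by p."""
    steps = math.ceil(1.0 / p)
    return [min(round(t * p * n_unlabeled), n_unlabeled)
            for t in range(1, steps + 1)]

def select_most_confident(distances, m):
    """Indices of the m samples with the smallest label-estimation distance."""
    return np.argsort(distances)[:m]

print(selection_sizes(1000, 0.30))  # [300, 600, 900, 1000]
```

Under this assumption a smaller p means more, smaller steps: p = 0.05 takes 20 iterations to cover the pool where p = 0.30 takes 4, which is the cost side of the accuracy trend across p values in Tab.3.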

Tab.4 Performance comparison of JDM with different α on DukeMTMC-VideoReID

$α$	rank-1	rank-5	rank-20	mAP
0.3	73.4	88.8	92.2	65.3
0.4	73.8	86.0	93.0	65.5
0.5	76.2	87.2	93.3	67.5
0.6	73.5	86.3	92.9	65.2

Tab.5 Performance comparison of JDM with different K on DukeMTMC-VideoReID

$K$	rank-1	rank-5	rank-20	mAP
2	75.2	86.3	91.9	66.9
3	76.2	87.2	93.3	67.5
4	73.4	85.6	91.7	65.4
5	73.2	85.9	91.3	65.0

References 27

1	DAI C C, WANG H Y, NI T G, et al. Person re-identification based on deep convolutional generative adversarial network and expanded neighbor reranking[J]. Journal of Computer Research and Development, 2019, 56(8): 1632-1641. DOI: 10.7544/issn1000-1239.2019.20190195
2	WANG H, DING Z, ZHANG J, et al. Person reidentification by semisupervised dictionary rectification learning with retraining module[J]. Journal of Electronic Imaging, 2018, 27(4): 043043. DOI: 10.1117/1.jei.27.4.043043
3	GU X, CHANG H, MA B, et al. Appearance-preserving 3D convolution for video-based person re-identification[C]// Proceedings of the 16th European Conference on Computer Vision, LNCS 12347. Cham: Springer, 2020: 228-243. DOI: 10.1007/978-3-030-58536-5_14
4	CHEN D, XU D, LI H, et al. Group consistent similarity learning via deep CRF for person re-identification[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2018: 8649-8658. DOI: 10.1109/cvpr.2018.00902
5	ZHENG Z, ZHENG L, YANG Y. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Washington, DC: IEEE Computer Society, 2017: 3754-3762. DOI: 10.1109/ICCV.2017.405
6	LIU X, SONG M, TAO D, et al. Semi-supervised coupled dictionary learning for person re-identification[C]// Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2014: 3550-3557. DOI: 10.1109/cvpr.2014.454
7	MA A J, LI P. Semi-supervised ranking for re-identification with few labeled image pairs[C]// Proceedings of the 2014 Asian Conference on Computer Vision, LNCS 9006. Cham: Springer, 2014: 598-613.
8	BAK S, CARR P. One-shot metric learning for person re-identification[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2017: 2990-2999. DOI: 10.1109/cvpr.2017.171
9	WU Y, LIN Y, DONG X, et al. Exploit the unknown gradually: one-shot video-based person re-identification by stepwise learning[C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2018: 5177-5186. DOI: 10.1109/cvpr.2018.00543
10	WU Y, LIN Y, DONG X, et al. Progressive learning for person re-identification with one example[J]. IEEE Transactions on Image Processing, 2019, 28(6): 2872-2881. DOI: 10.1109/tip.2019.2891895
11	SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2016: 2818-2826. DOI: 10.1109/cvpr.2016.308
12	CHEN L, YANG H, GAO Z. Joint attentive spatial-temporal feature aggregation for video-based person re-identification[J]. IEEE Access, 2019, 7: 41230-41240. DOI: 10.1109/access.2019.2907274
13	HOU R, MA B, CHANG H, et al. VRSTC: occlusion-free video person re-identification[C]// Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 2019: 7183-7192. DOI: 10.1109/cvpr.2019.00735
14	SUBRAMANIAM A, NAMBIAR A, MITTAL A. Co-segmentation inspired attention networks for video-based person re-identification[C]// Proceedings of the 2019 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2019: 562-572. DOI: 10.1109/iccv.2019.00065
15	WU Y, BOURAHLA O E F, LI X, et al. Adaptive graph representation learning for video person re-identification[J]. IEEE Transactions on Image Processing, 2020, 29: 8821-8830. DOI: 10.1109/tip.2020.3001693
16	YAN Y, QIN J, CHEN J, et al. Learning multi-granular hypergraphs for video-based person re-identification[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 2899-2908. DOI: 10.1109/cvpr42600.2020.00297
17	YE M, MA A J, ZHENG L, et al. Dynamic label graph matching for unsupervised video re-identification[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Washington, DC: IEEE Computer Society, 2017: 5142-5150. DOI: 10.1109/iccv.2017.550
18	XU T I, LI J, WU H, et al. Feature space regularization for person re-identification with one sample[C]// Proceedings of the IEEE 31st International Conference on Tools with Artificial Intelligence. Piscataway: IEEE, 2019: 1463-1470. DOI: 10.1109/ictai.2019.00208
19	YIN J, LI B, WAN F, et al. A new data selection strategy for one-shot video-based person re-identification[C]// Proceedings of the 2019 IEEE International Conference on Image Processing. Piscataway: IEEE, 2019: 1227-1231. DOI: 10.1109/icip.2019.8803723
20	ZHAO C, ZHANG Z, YAN J, et al. Local-global feature for video-based one-shot person re-identification[C]// Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2020: 3662-3666. DOI: 10.1109/icassp40776.2020.9053134
21	LI H, XIAO J, SUN M, et al. Progressive sample mining and representation learning for one-shot person re-identification with adversarial samples[EB/OL]. [2021-05-10]. DOI: 10.1016/j.patcog.2020.107614
22	XIN X, WANG J, XIE R, et al. Semi-supervised person re-identification using multi-view clustering[J]. Pattern Recognition, 2019, 88: 285-297. DOI: 10.1016/j.patcog.2018.11.025
23	LIU C T, LI Y J, CHIEN S Y, et al. Semantics-guided clustering with deep progressive learning for semi-supervised person re-identification[EB/OL]. [2021-05-10]. DOI: 10.48550/arXiv.2010.01148
24	ZHENG L, BIE Z, SUN Y, et al. MARS: a video benchmark for large-scale person re-identification[C]// Proceedings of the 2016 European Conference on Computer Vision. Cham: Springer, 2016: 868-884. DOI: 10.1007/978-3-319-46466-4_52
25	LIU Z, WANG D, LU H. Stepwise metric promotion for unsupervised video person re-identification[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Washington, DC: IEEE Computer Society, 2017: 2429-2438. DOI: 10.1109/iccv.2017.266
26	FAN H, ZHENG L, YAN C, et al. Unsupervised person re-identification: clustering and fine-tuning[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2018, 14(4): 1-18. DOI: 10.1145/3243316
27	LIN Y, DONG X, ZHENG L, et al. A bottom-up clustering approach to unsupervised person re-identification[C]// Proceedings of the 2019 AAAI Conference on Artificial Intelligence. Menlo Park, CA: AAAI, 2019: 8738-8745. DOI: 10.1609/aaai.v33i01.33018738
