Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (2): 354-361. DOI: 10.11772/j.issn.1001-9081.2024020212

• Artificial intelligence •

Contrastive knowledge distillation method for object detection

Sheng YANG1,2,3, Yan LI1,2

  1. State Key Laboratory of Robotics (Shenyang Institute of Automation, Chinese Academy of Sciences), Shenyang, Liaoning 110169, China
    2. Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang, Liaoning 110016, China
    3. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2024-03-04 Revised: 2024-04-23 Accepted: 2024-04-24 Online: 2024-06-04 Published: 2025-02-10
  • Contact: Yan LI
  • About author: YANG Sheng, born in 1999, M. S. candidate. His research interests include model compression and knowledge distillation.


Abstract:

Knowledge distillation is one of the most effective model compression methods for tasks such as image classification, but its application to complex tasks such as object detection remains relatively limited. Existing knowledge distillation methods mainly focus on constructing information graphs to filter out noise from foreground or background regions during feature extraction by the teacher and the student, and then minimizing the mean square error loss between their features. However, the objective functions of these methods are difficult to optimize further, and they rely only on the teacher's supervision signal, so the student receives no targeted information about incorrect knowledge. To address this, a Contrastive Knowledge Distillation (CKD) method for object detection was proposed. The distillation framework and loss function were redesigned so that, besides the teacher's supervision signal, constructed negative samples also provide guidance for knowledge distillation, allowing the student to acquire the teacher's knowledge while gaining additional knowledge through self-learning. Experimental comparisons with the baselines on the Pascal VOC and COCO2014 datasets using the GFocal (Generalized Focal loss) and YOLOv5 models show that CKD improves the mean Average Precision (mAP) by 5.6 percentage points and AP50 (Average Precision@0.50) by 5.6 percentage points with the GFocal model on Pascal VOC, and improves mAP by 1.1 percentage points and AP50 by 1.7 percentage points with the YOLOv5 model on COCO2014.
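The abstract does not give the exact form of the CKD objective, but the idea of pairing each student feature with its teacher counterpart as the positive sample and with constructed negatives as contrast can be illustrated with an InfoNCE-style loss. The sketch below is an assumption for illustration only: the function name contrastive_distill_loss, the pooled region features of shape (N, D), the temperature value, and the choice of other regions' teacher features as negatives are not taken from the paper.

import torch
import torch.nn.functional as F

def contrastive_distill_loss(student_feats, teacher_feats, temperature=0.1):
    # student_feats, teacher_feats: pooled region features of shape (N, D),
    # where row i of both tensors corresponds to the same image region.
    s = F.normalize(student_feats, dim=1)
    t = F.normalize(teacher_feats, dim=1)
    logits = s @ t.t() / temperature          # (N, N) cosine-similarity logits
    targets = torch.arange(s.size(0), device=s.device)
    # Diagonal entries are student-teacher pairs for the same region (positives);
    # off-diagonal entries act as constructed negatives, so the InfoNCE objective
    # reduces to a cross-entropy over the similarity matrix.
    return F.cross_entropy(logits, targets)

# Usage sketch: add the contrastive term to the ordinary detection loss when
# training the student, with a hypothetical weight lambda_ckd:
# total_loss = detection_loss + lambda_ckd * contrastive_distill_loss(s_feats, t_feats)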

Key words: deep neural network, knowledge distillation, contrastive learning, model compression, object detection


CLC Number: