《计算机应用》(Journal of Computer Applications), 2021, Vol. 41, Issue (12): 3426-3431. DOI: 10.11772/j.issn.1001-9081.2021060923

• The 18th China Conference on Machine Learning (CCML 2021) •

Specific knowledge learning based on knowledge distillation

Zhaoxia DAI1, Yudong CAO2, Guangming ZHU2,3, Peiyi SHEN2,3, Xu XU2,4, Lin MEI2,4, Liang ZHANG2,3()

  1. The 30th Research Institute of China Electronics Technology Group Corporation, Chengdu, Sichuan 610041, China
    2. School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China
    3. Xi'an Key Laboratory of Intelligent Software Engineering, Xi'an, Shaanxi 710071, China
    4. The Third Research Institute of the Ministry of Public Security, Shanghai 200031, China
  • Received: 2021-05-12; Revised: 2021-08-19; Accepted: 2021-08-31; Online: 2021-10-18; Published: 2021-12-10
  • Corresponding author: Liang ZHANG
  • About the authors: DAI Zhaoxia (born 1972), female, from Hengshan, Hunan, senior engineer. Her research interests include network information security and network management.
    CAO Yudong (born 1996), male, from Jiexiu, Shanxi, M.S. candidate. His research interests include model compression and knowledge distillation.
    ZHU Guangming (born 1987), male, from Zhoukou, Henan, Ph.D., associate professor, CCF member. His research interests include gesture recognition and behavior recognition.
    SHEN Peiyi (born 1969), male, from Shaoxing, Zhejiang, Ph.D., professor, CCF member. His research interests include image processing and scene graph generation.
    XU Xu (born 1981), male, from Suzhou, Anhui, M.S., research fellow, CCF member. His research interests include knowledge graphs.
    MEI Lin (born 1971), male, from Fuyang, Anhui, Ph.D., professor, CCF member. His research interests include video big data.
  • Supported by:
    the National Natural Science Foundation of China (62072358); the National Key Research and Development Program of China (2020YFF0304900); the Key Research and Development Program of Shaanxi Province (2018ZDXM-GY-036)

Specific knowledge learning based on knowledge distillation

Zhaoxia DAI1, Yudong CAO2, Guangming ZHU2,3, Peiyi SHEN2,3, Xu XU2,4, Lin MEI2,4, Liang ZHANG2,3()   

  1. The 30th Research Institute of China Electronics Technology Group Corporation, Chengdu, Sichuan 610041, China
    2. School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China
    3. Xi'an Key Laboratory of Intelligent Software Engineering, Xi'an, Shaanxi 710071, China
    4. The Third Research Institute of the Ministry of Public Security, Shanghai 200031, China
  • Received: 2021-05-12; Revised: 2021-08-19; Accepted: 2021-08-31; Online: 2021-10-18; Published: 2021-12-10
  • Contact: Liang ZHANG
  • About the authors: DAI Zhaoxia, born in 1972, senior engineer. Her research interests include network information security and network management.
    CAO Yudong, born in 1996, M.S. candidate. His research interests include model compression and knowledge distillation.
    ZHU Guangming, born in 1987, Ph.D., associate professor. His research interests include gesture recognition and behavior recognition.
    SHEN Peiyi, born in 1969, Ph.D., professor. His research interests include image processing and scene graph generation.
    XU Xu, born in 1981, M.S., research fellow. His research interests include knowledge graphs.
    MEI Lin, born in 1971, Ph.D., professor. His research interests include video big data.
  • Supported by:
    the National Natural Science Foundation of China (62072358); the National Key Research and Development Program of China (2020YFF0304900); the Key Research and Development Program of Shaanxi Province (2018ZDXM-GY-036)

Abstract:

In the traditional knowledge distillation framework, the teacher network passes all of its knowledge to the student network, and there has been almost no research on transferring only partial or specific knowledge. Considering that industrial settings are characterized by a single scene and a small number of classes, the recognition performance of neural network models on specific categories needs to be evaluated with particular attention. Based on the attention feature transfer distillation algorithm, three specific knowledge learning algorithms were proposed to improve the classification performance of the student network on specific categories. First, the training dataset was filtered for the specific classes to exclude the training data of all other non-specific classes; on this basis, the other non-specific classes were treated as background and the background knowledge was suppressed during distillation, further reducing the influence of irrelevant class knowledge on the specific class knowledge; finally, the network structure was modified so that background class knowledge was suppressed only in the high-level layers of the network, while the learning of basic visual features in the low-level layers was retained. Experimental results show that a student network trained with the specific knowledge learning algorithms can match or even surpass, on specific category classification, a teacher network with six times as many parameters.
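
As a concrete illustration of the first step (filtering the training set down to the specific classes), the following is a minimal sketch assuming PyTorch/torchvision and integer class labels; the dataset choice (CIFAR-10), the helper name filter_specific_classes and the selected class indices are illustrative assumptions only, since the abstract does not describe the data pipeline.

```python
# Minimal sketch of specific-class filtering (assumed pipeline, not the
# authors' released code).
from torch.utils.data import Subset
from torchvision import datasets, transforms

def filter_specific_classes(dataset, specific_classes):
    """Keep only the samples whose label belongs to the specific classes."""
    keep = set(specific_classes)
    labels = getattr(dataset, 'targets', None)    # CIFAR-style datasets expose .targets
    if labels is None:
        labels = [label for _, label in dataset]  # generic fallback
    indices = [i for i, label in enumerate(labels) if label in keep]
    return Subset(dataset, indices)

# Hypothetical usage: keep only classes 3 ("cat") and 5 ("dog") of CIFAR-10.
train_set = datasets.CIFAR10(root='./data', train=True, download=True,
                             transform=transforms.ToTensor())
specific_train_set = filter_specific_classes(train_set, specific_classes=[3, 5])
```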

Key words: model compression, deep convolutional neural network, residual network, knowledge distillation, deep learning

Abstract:

In the framework of traditional knowledge distillation, the teacher network transfers all of its own knowledge to the student network, and there is almost no research on transferring partial or specific knowledge. Considering that the industrial field has the characteristics of a single scene and a small number of classes, the recognition performance of neural network models on specific categories needs to be evaluated with emphasis. Based on the attention feature transfer distillation algorithm, three specific knowledge learning algorithms were proposed to improve the classification performance of student networks on specific categories. Firstly, the training dataset was filtered for the specific classes to exclude the training data of other non-specific classes. On this basis, the other non-specific classes were treated as background and the background knowledge was suppressed in the distillation process, so as to further reduce the impact of irrelevant class knowledge on the specific class knowledge. Finally, the network structure was changed, that is, the background knowledge was suppressed only at the high-level layers of the network, while the learning of basic graphic features was retained at the low-level layers. Experimental results show that the student network trained by the specific knowledge learning algorithms can match or even surpass the classification performance, on specific categories, of a teacher network whose parameter scale is six times that of the student network.
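
To make the second and third steps more concrete, below is a minimal PyTorch sketch of an attention-transfer distillation loss in which the teacher's "background" (non-specific-class) knowledge is suppressed in the softened distribution. It is a sketch under assumptions, not the authors' implementation: the function name specific_kd_loss, the suppression scheme (restricting the softened distributions to the specific classes) and the weights T, alpha and beta are all illustrative.

```python
# Sketch of attention-transfer distillation with background-class
# suppression (illustrative assumptions, not the paper's exact scheme).
import torch
import torch.nn.functional as F

def attention_map(feat):
    """Spatial attention map: channel-wise mean of squared activations,
    flattened and L2-normalized (attention-transfer style)."""
    a = feat.pow(2).mean(dim=1)            # (N, C, H, W) -> (N, H, W)
    return F.normalize(a.flatten(1), dim=1)

def specific_kd_loss(logits_s, logits_t, feats_s, feats_t, targets,
                     specific_classes, T=4.0, alpha=0.9, beta=1e3):
    """Cross-entropy on the class-filtered labels + KD on teacher logits
    with background classes suppressed + attention transfer on matched
    student/teacher feature pairs."""
    ce = F.cross_entropy(logits_s, targets)

    # Suppress background knowledge: keep only the specific-class logits
    # before softening, so the teacher's distribution over irrelevant
    # classes is not distilled.
    p_t = F.softmax(logits_t[:, specific_classes] / T, dim=1)
    log_p_s = F.log_softmax(logits_s[:, specific_classes] / T, dim=1)
    kd = F.kl_div(log_p_s, p_t, reduction='batchmean') * (T * T)

    # Attention feature transfer over matched feature pairs.
    at = sum((attention_map(fs) - attention_map(ft)).pow(2).mean()
             for fs, ft in zip(feats_s, feats_t))

    return (1 - alpha) * ce + alpha * kd + beta * at
```

For the third variant described above, one would presumably apply the suppression only to the knowledge distilled from the high-level layers while matching the low-level feature pairs as usual, so that basic visual features are still learned; the abstract does not detail the exact mechanism, so this sketch leaves that switch out.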

Key words: model compression, deep convolutional neural network, residual network, knowledge distillation, deep learning

CLC number: