《计算机应用》(Journal of Computer Applications), 2021, Vol. 41, Issue (12): 3426-3431. DOI: 10.11772/j.issn.1001-9081.2021060923

• The 18th China Conference on Machine Learning (CCML 2021) •

Specific knowledge learning based on knowledge distillation

Zhaoxia DAI1, Yudong CAO2, Guangming ZHU2,3, Peiyi SHEN2,3, Xu XU2,4, Lin MEI2,4, Liang ZHANG2,3()

  1. The 30th Research Institute of China Electronics Technology Group Corporation, Chengdu, Sichuan 610041, China
    2. School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China
    3. Xi'an Key Laboratory of Intelligent Software Engineering, Xi'an, Shaanxi 710071, China
    4. The Third Research Institute of the Ministry of Public Security, Shanghai 200031, China
  • Received: 2021-05-12; Revised: 2021-08-19; Accepted: 2021-08-31; Online: 2021-10-18; Published: 2021-12-10
  • Corresponding author: Liang ZHANG
  • About the authors: DAI Zhaoxia (born 1972), female, from Hengshan, Hunan, senior engineer. Her research interests include network information security and network management.
    CAO Yudong (born 1996), male, from Jiexiu, Shanxi, M.S. candidate. His research interests include model compression and knowledge distillation.
    ZHU Guangming (born 1987), male, from Zhoukou, Henan, Ph.D., associate professor, CCF member. His research interests include gesture recognition and behavior recognition.
    SHEN Peiyi (born 1969), male, from Shaoxing, Zhejiang, Ph.D., professor, CCF member. His research interests include image processing and scene graph generation.
    XU Xu (born 1981), male, from Suzhou, Anhui, M.S., research fellow, CCF member. His research interests include knowledge graphs.
    MEI Lin (born 1971), male, from Fuyang, Anhui, Ph.D., professor, CCF member. His research interests include video big data.
  • Supported by:
    the National Natural Science Foundation of China (62072358); the National Key Research and Development Program of China (2020YFF0304900); the Key Research and Development Program of Shaanxi Province (2018ZDXM-GY-036)

Specific knowledge learning based on knowledge distillation

Zhaoxia DAI1, Yudong CAO2, Guangming ZHU2,3, Peiyi SHEN2,3, Xu XU2,4, Lin MEI2,4, Liang ZHANG2,3()   

  1. The 30th Research Institute of China Electronics Technology Group Corporation, Chengdu, Sichuan 610041, China
    2. School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China
    3. Xi'an Key Laboratory of Intelligent Software Engineering, Xi'an, Shaanxi 710071, China
    4. The Third Research Institute of the Ministry of Public Security, Shanghai 200031, China
  • Received: 2021-05-12; Revised: 2021-08-19; Accepted: 2021-08-31; Online: 2021-10-18; Published: 2021-12-10
  • Contact: Liang ZHANG
  • About the authors: DAI Zhaoxia, born in 1972, senior engineer. Her research interests include network information security and network management.
    CAO Yudong, born in 1996, M.S. candidate. His research interests include model compression and knowledge distillation.
    ZHU Guangming, born in 1987, Ph.D., associate professor. His research interests include gesture recognition and behavior recognition.
    SHEN Peiyi, born in 1969, Ph.D., professor. His research interests include image processing and scene graph generation.
    XU Xu, born in 1981, M.S., research fellow. His research interests include knowledge graphs.
    MEI Lin, born in 1971, Ph.D., professor. His research interests include video big data.
  • Supported by:
    the National Natural Science Foundation of China (62072358); the National Key Research and Development Program of China (2020YFF0304900); the Key Research and Development Program of Shaanxi Province (2018ZDXM-GY-036)

Abstract:

In the traditional knowledge distillation framework, the teacher network passes all of its knowledge to the student network, and there has been almost no research on transferring only partial or specific knowledge. Considering that industrial settings are characterized by a single scene and a small number of classes, the recognition performance of neural network models on specific categories needs to be evaluated with particular attention. Based on the attention feature transfer distillation algorithm, three specific knowledge learning algorithms were proposed to improve the classification performance of the student network on specific categories. First, the training dataset was filtered for the specific classes to exclude the training data of all other non-specific classes; on this basis, the other non-specific classes were treated as background and the background knowledge was suppressed during distillation, further reducing the influence of irrelevant class knowledge on the specific class knowledge; finally, the network structure was modified so that background class knowledge was suppressed only in the high-level layers of the network, while the learning of basic visual features in the low-level layers was retained. Experimental results show that a student network trained with the specific knowledge learning algorithms can match or even surpass, on specific category classification, a teacher network with six times as many parameters.
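
As a concrete illustration of the first step (filtering the training set down to the specific classes), the following is a minimal sketch assuming PyTorch/torchvision and integer class labels; the dataset choice (CIFAR-10), the helper name filter_specific_classes and the selected class indices are illustrative assumptions only, since the abstract does not describe the data pipeline.

```python
# Minimal sketch of specific-class filtering (assumed pipeline, not the
# authors' released code).
from torch.utils.data import Subset
from torchvision import datasets, transforms

def filter_specific_classes(dataset, specific_classes):
    """Keep only the samples whose label belongs to the specific classes."""
    keep = set(specific_classes)
    labels = getattr(dataset, 'targets', None)    # CIFAR-style datasets expose .targets
    if labels is None:
        labels = [label for _, label in dataset]  # generic fallback
    indices = [i for i, label in enumerate(labels) if label in keep]
    return Subset(dataset, indices)

# Hypothetical usage: keep only classes 3 ("cat") and 5 ("dog") of CIFAR-10.
train_set = datasets.CIFAR10(root='./data', train=True, download=True,
                             transform=transforms.ToTensor())
specific_train_set = filter_specific_classes(train_set, specific_classes=[3, 5])
```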

Key words: model compression, deep convolutional neural network, residual network, knowledge distillation, deep learning

Abstract:

In the framework of traditional knowledge distillation, the teacher network transfers all of its own knowledge to the student network, and there is almost no research on transferring partial or specific knowledge. Considering that the industrial field has the characteristics of a single scene and a small number of classes, the recognition performance of neural network models on specific categories needs to be evaluated with emphasis. Based on the attention feature transfer distillation algorithm, three specific knowledge learning algorithms were proposed to improve the classification performance of student networks on specific categories. Firstly, the training dataset was filtered for the specific classes to exclude the training data of other non-specific classes. On this basis, the other non-specific classes were treated as background and the background knowledge was suppressed in the distillation process, so as to further reduce the impact of irrelevant class knowledge on the specific class knowledge. Finally, the network structure was changed, that is, the background knowledge was suppressed only at the high-level layers of the network, while the learning of basic graphic features was retained at the low-level layers. Experimental results show that the student network trained by the specific knowledge learning algorithms can match or even surpass the classification performance, on specific categories, of a teacher network whose parameter scale is six times that of the student network.
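
To make the second and third steps more concrete, below is a minimal PyTorch sketch of an attention-transfer distillation loss in which the teacher's "background" (non-specific-class) knowledge is suppressed in the softened distribution. It is a sketch under assumptions, not the authors' implementation: the function name specific_kd_loss, the suppression scheme (restricting the softened distributions to the specific classes) and the weights T, alpha and beta are all illustrative.

```python
# Sketch of attention-transfer distillation with background-class
# suppression (illustrative assumptions, not the paper's exact scheme).
import torch
import torch.nn.functional as F

def attention_map(feat):
    """Spatial attention map: channel-wise mean of squared activations,
    flattened and L2-normalized (attention-transfer style)."""
    a = feat.pow(2).mean(dim=1)            # (N, C, H, W) -> (N, H, W)
    return F.normalize(a.flatten(1), dim=1)

def specific_kd_loss(logits_s, logits_t, feats_s, feats_t, targets,
                     specific_classes, T=4.0, alpha=0.9, beta=1e3):
    """Cross-entropy on the class-filtered labels + KD on teacher logits
    with background classes suppressed + attention transfer on matched
    student/teacher feature pairs."""
    ce = F.cross_entropy(logits_s, targets)

    # Suppress background knowledge: keep only the specific-class logits
    # before softening, so the teacher's distribution over irrelevant
    # classes is not distilled.
    p_t = F.softmax(logits_t[:, specific_classes] / T, dim=1)
    log_p_s = F.log_softmax(logits_s[:, specific_classes] / T, dim=1)
    kd = F.kl_div(log_p_s, p_t, reduction='batchmean') * (T * T)

    # Attention feature transfer over matched feature pairs.
    at = sum((attention_map(fs) - attention_map(ft)).pow(2).mean()
             for fs, ft in zip(feats_s, feats_t))

    return (1 - alpha) * ce + alpha * kd + beta * at
```

For the third variant described above, one would presumably apply the suppression only to the knowledge distilled from the high-level layers while matching the low-level feature pairs as usual, so that basic visual features are still learned; the abstract does not detail the exact mechanism, so this sketch leaves that switch out.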

Key words: model compression, deep convolutional neural network, residual network, knowledge distillation, deep learning

CLC number: