Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (9): 2652-2658. DOI: 10.11772/j.issn.1001-9081.2021071201

• Artificial Intelligence •


Model distillation model based on training weak teacher networks for few-shot problems

Chunhao CAI, Jianliang LI

  1. School of Science, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094, China
  • Received: 2021-07-12  Revised: 2021-09-06  Accepted: 2021-09-08  Online: 2021-09-14  Published: 2022-09-10
  • Contact: Jianliang LI
  • About author: CAI Chunhao, born in 1997 in Wuxi, Jiangsu, M. S. candidate. His research interests include model distillation, deep learning and image recognition.
  • Supported by:
    Equipment Pre-Research and CETC Joint Fund (6141B08231109)


Abstract:

Aiming at the lack of training data for deep neural networks in image recognition, as well as the loss of detailed features and the heavy distillation computation in multi-model distillation, a model distillation model based on training weak teacher networks for few-shot problems was proposed. Firstly, a set of weak teacher networks was trained through the Bootstrap aggregating (Bagging) algorithm from ensemble learning, which retained the detailed features of the image dataset while allowing parallel computation to improve the efficiency of network generation. Then, a knowledge merging algorithm was incorporated, and a single high-quality, high-complexity teacher network was formed from the feature maps of the weak teacher networks, thereby obtaining image feature maps with more prominent details. Finally, on the basis of current advanced model distillation methods, an ensemble distillation model was proposed in which the meta-network was improved for the combined feature maps; this model reduced the computation of meta-network training while enabling the target network to be trained on few-shot datasets. Experimental results show that the proposed model achieved a 6.39% relative improvement in accuracy over the distillation scheme that simply uses a high-quality network as the teacher network. Comparing the accuracy of the model obtained by training teacher networks with the Adaptive Boosting (AdaBoost) algorithm and then distilling them against that of the ensemble distillation model, the difference is within the given error range, while the network generation rate of the ensemble distillation model was increased by 4.76 times compared with that of the AdaBoost algorithm. Therefore, the proposed model can effectively improve the accuracy and training efficiency of the target model for few-shot problems.
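
The pipeline described in the abstract (weak teachers trained on Bagging resamples of a small dataset, their feature maps merged into one richer representation, and a target network distilled from the result) can be pictured with a minimal PyTorch sketch. Everything below is an assumption made for illustration only: the tiny convolutional backbones, the concatenation-plus-1×1-convolution stand-in for the knowledge-merging step, and the temperature and loss weights are not the paper's actual networks or its improved meta-network.

```python
# Illustrative sketch only: Bagging-trained weak teachers, a simple feature-map
# merge, and logit/feature distillation into a student. Architectures, the 1x1-conv
# merge, and loss weights are assumptions for this example, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeakTeacher(nn.Module):
    """A deliberately weak backbone (assumed, for illustration)."""
    def __init__(self, out_ch=16, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, out_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Linear(out_ch * 4 * 4, n_classes)

    def forward(self, x):
        f = self.features(x)                      # feature map kept for merging
        return self.head(f.flatten(1)), f

# --- few-shot data stand-in (random tensors instead of a real dataset) ---
N, n_classes = 64, 10
X = torch.randn(N, 3, 32, 32)
y = torch.randint(0, n_classes, (N,))

# 1) Bagging: train each weak teacher on a bootstrap resample of the small set.
teachers = []
for _ in range(3):
    idx = torch.randint(0, N, (N,))               # sample with replacement
    t = WeakTeacher()
    opt = torch.optim.Adam(t.parameters(), lr=1e-3)
    for _ in range(5):                            # a few steps, just to illustrate
        logits, _ = t(X[idx])
        loss = F.cross_entropy(logits, y[idx])
        opt.zero_grad(); loss.backward(); opt.step()
    teachers.append(t.eval())

# 2) Merge teacher feature maps into one combined map (simple concat + 1x1 conv;
#    the paper's knowledge-merging step is richer than this).
merge = nn.Conv2d(16 * len(teachers), 16, kernel_size=1)

# 3) Distill: the student matches the merged feature map and averaged soft logits.
student = WeakTeacher()
opt = torch.optim.Adam(list(student.parameters()) + list(merge.parameters()), lr=1e-3)
T = 4.0                                           # softmax temperature (assumed)
for _ in range(5):
    with torch.no_grad():
        outs = [t(X) for t in teachers]
        soft = torch.stack([o[0] for o in outs]).mean(0) / T
    merged = merge(torch.cat([o[1] for o in outs], dim=1))
    s_logits, s_feat = student(X)
    loss = (F.kl_div(F.log_softmax(s_logits / T, dim=1),
                     F.softmax(soft, dim=1), reduction="batchmean") * T * T
            + F.mse_loss(s_feat, merged)
            + F.cross_entropy(s_logits, y))
    opt.zero_grad(); loss.backward(); opt.step()
```

In the paper, the merging of teacher knowledge and the meta-network that guides distillation are learned, more elaborate components; the sketch only shows where those steps sit in the overall flow.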

Key words: few-shot, model distillation, Ensemble Learning (EL), meta learning, feature merging

CLC Number: