Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (9): 2652-2658.DOI: 10.11772/j.issn.1001-9081.2021071201
Special Issue: Artificial Intelligence
• Artificial Intelligence •
Received: 2021-07-12
Revised: 2021-09-06
Accepted: 2021-09-08
Online: 2021-09-14
Published: 2022-09-10
Contact: Jianliang LI (李建良)
About author: CAI Chunhao (蔡淳豪), born in 1997 in Wuxi, Jiangsu, M. S. candidate. His research interests include model distillation, deep learning, and image recognition.
CLC Number:
Chunhao CAI, Jianliang LI. Model distillation model based on training weak teacher networks about few-shots[J]. Journal of Computer Applications, 2022, 42(9): 2652-2658.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021071201
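The paper builds on knowledge distillation, in which a small student network is trained to match the temperature-softened outputs of a teacher. As background, here is a minimal NumPy sketch of the classic soft-label distillation loss of Hinton et al.; it is illustrative only and is not the authors' weak-teacher ensemble method, and the function names, default temperature, and weighting are assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax: higher T produces a softer distribution.
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, labels_onehot,
                      T=4.0, alpha=0.5):
    """Classic soft-label distillation loss: a weighted sum of
    (a) KL divergence between the teacher's and student's
    temperature-softened outputs and (b) ordinary cross-entropy
    against the hard labels."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)))  # KL(teacher || student)
    ce = -np.sum(np.asarray(labels_onehot) * np.log(softmax(student_logits)))
    # The T^2 factor rescales soft-target gradients so the two terms
    # stay comparable in magnitude, as in the original formulation.
    return alpha * (T ** 2) * kl + (1 - alpha) * ce
```

When the student already matches the teacher exactly, the KL term vanishes and only the hard-label cross-entropy remains; the `alpha` weight trades off imitation of the teacher against fitting the labels.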
| Model | | | | | | |
|---|---|---|---|---|---|---|
| Meta-learning model | 1 | 1 | 0 | 2 | 0 | 1 |
| Ensemble distillation model + Boosting | 1 | 20 | 3 | 0 | 1 | 0 |
| Ensemble distillation model | 1 | 20 | 3 | 0 | 1 | 0 |
| Ensemble distillation model + external network | 2 | 21 | 1 | 1 | 1 | 1 |

Tab. 1 Hyperparameter settings of different models on CUB200 dataset
| Model | Accuracy/% | Running time/h |
|---|---|---|
| Classical model distillation | 42.15±0.75 | 5.75 |
| Meta-learning model | 65.05±1.19 | 5.35 |
| Ensemble distillation model + Boosting | 69.37±0.66 | 32.72 |
| Ensemble distillation model | 58.07±0.73 | 5.68 |
| Ensemble distillation model + external network | 69.21±0.82 | 6.55 |

Tab. 2 Accuracy and computing time comparison of different models on CUB200 dataset
| Model | Number of training images per class | | | | |
|---|---|---|---|---|---|
| | 100 | 200 | 400 | 700 | 1 000 |
| Classical model | 51.90±0.58 | 58.58±0.86 | 67.34±0.97 | 77.53±1.30 | 81.08±1.27 |
| Attention model | 62.32±0.91 | 67.97±1.43 | 77.43±0.49 | 81.44±1.76 | 83.17±0.79 |
| Meta-learning model | 67.95±1.26 | 71.91±1.31 | 79.42±1.20 | 83.10±0.32 | 85.46±0.72 |
| Ensemble distillation model | 72.56±1.13 | 81.57±0.73 | 81.57±0.73 | 84.60±1.19 | 86.62±1.52 |

Tab. 3 Accuracies of different models on CIFAR-10 dataset's images with different scales