Journal of Computer Applications: 12-17. DOI: 10.11772/j.issn.1001-9081.2023111716

• Artificial intelligence •

Data-free class incremental learning based on knowledge distillation

Zhanyang LIU, Jinfeng LIU

  1. School of Information Engineering, Ningxia University, Yinchuan Ningxia 750021, China
  • Received:2023-12-11 Revised:2024-03-23 Accepted:2024-06-27 Online:2024-06-28 Published:2024-12-31
  • Contact: Jinfeng LIU

  • About the authors: LIU Zhanyang (1997—), male, born in Meizhou, Guangdong, M. S. candidate; his research interests include deep learning and incremental learning.
    LIU Jinfeng (1971—), male, born in Yinchuan, Ningxia, professor, Ph. D. candidate, CCF member; his research interests include deep learning and intelligent information processing.
  • Supported by:
    Natural Science Foundation of Ningxia (2023AAC03126)

Abstract:

Previous data-free class incremental learning methods can generate data for the classes of previously learned tasks through techniques such as model inversion, but they cannot effectively alleviate the model's plasticity-stability dilemma, and these synthesis techniques tend to ignore the diversity of the generated data. To address these issues, a knowledge distillation-based incremental learning strategy was proposed. Firstly, a local cross-entropy loss was used to encourage the model to learn knowledge of the new classes. Secondly, a combination of distillation losses on output features was introduced to reduce forgetting of knowledge of the old classes. Finally, distillation on relational features was applied to alleviate the model's conflict between learning representations of new classes and retaining representations of old classes. Furthermore, to increase the diversity of the generated data, a regularization term was introduced on top of model inversion to prevent the generated samples from being excessively similar. Experimental results show that compared with Relation-guided representation learning for Data-Free Class Incremental Learning (R-DFCIL), the proposed model improves the average incremental accuracy by 0.25 and 0.18 percentage points in the 5-task and 10-task settings on the CIFAR-100 dataset, and by 0.21 and 0.07 percentage points respectively on the Tiny-ImageNet dataset. Besides, the proposed model does not require fine-tuning with an additional classifier, and the proposed diversity regularization term offers a direction for improvement in data-free class incremental learning.
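
To illustrate the diversity regularization idea described in the abstract, the following PyTorch sketch shows one plausible form of such a term: a penalty on pairwise cosine similarity within a batch of samples synthesized by model inversion. The function name, the weight lambda_div, and the exact penalty are illustrative assumptions, not the paper's formulation.

# A minimal sketch (an assumption, not the paper's exact loss) of a diversity
# regularization term for model-inversion-based synthesis: it penalizes pairwise
# cosine similarity within a batch so generated samples do not collapse into
# near-duplicates.
import torch
import torch.nn.functional as F

def diversity_regularization(synth_batch: torch.Tensor) -> torch.Tensor:
    """Mean off-diagonal cosine similarity of a batch of synthesized images (B, C, H, W)."""
    b = synth_batch.size(0)
    flat = F.normalize(synth_batch.reshape(b, -1), dim=1)   # unit-norm rows, shape (B, D)
    sim = flat @ flat.t()                                    # (B, B) cosine similarity matrix
    off_diag = sim - torch.eye(b, device=sim.device)         # drop self-similarity on the diagonal
    return off_diag.abs().sum() / (b * (b - 1))              # average over ordered pairs

# Hypothetical use inside an inversion loop: adding the term to the usual inversion
# objective pushes the synthesized batch apart during optimization.
#   images = torch.randn(32, 3, 32, 32, requires_grad=True)
#   loss = inversion_loss(old_model, images) + lambda_div * diversity_regularization(images)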

Key words: knowledge distillation, class incremental learning, model inversion, diversity regularization, deep learning
